CysDBase: Database Collection Of Cysteine Post-Translational Modifications
About the CysDBase

Information on the Data

Introduction

Cysteine (Cys) is a sulfur-containing amino acid and one of the most reactive of the twenty standard amino acids. Owing to its electron-rich nature, it can exhibit multiple oxidation states. Its side-chain thiol group participates in many catalytic reactions. The reactivity of the thiol group makes it susceptible to many post-translational modifications (PTMs), through which it takes part in a wide range of biological activities, including alteration of enzymatic reactions, protein-protein interactions, the cellular redox environment, ROS-mediated signalling pathways, cell proliferation, differentiation, migration and angiogenesis. Cys undergoes many different types of PTMs, such as oxidation, disulfide formation, thioether formation, glutathionylation, S-nitrosylation, S-palmitoylation, S-sulfenylation, metal binding, S-sulfinylation, S-sulfonylation, persulfidation and sulfhydration (Figure 1).

Figure 1. Different post-translational modifications of cysteine residues. Modifications that generally occur under stress are shown in the left panel, and those that often occur under normal conditions are shown in the right panel.

Curation Of The Data

Data for the database were curated from UniProt and other existing resources. Data for the specific post-translational modifications (PTMs) were curated from UniProt using UniProt keywords (Table 01). Sequence information was also extracted using some of the biological process keywords available in UniProt. A different set of keywords was used to extract the data for each modification.

Keywords: Curation of PTM Data

Keyword | Cys PTM
Disulfide bond | Disulfide
S-glutathionyl cysteine | S-Glutathionylation
2Fe-2S | Metal-Binding
3Fe-4S | Metal-Binding
4Fe-4S | Metal-Binding
Iron-sulfur | Metal-Binding
Metal-thiolate cluster | Metal-Binding
Metallothionein | Metal-Binding
Prenylation | S-Palmitoylation
Palmitate | S-Palmitoylation
S-diacylglycerol cysteine | S-Palmitoylation
Cysteine sulfenic acid (-SOH) | S-Sulfenylation
S-methylcysteine | Thioether
Lanthionine (Cys-Ser) | Thioether
Beta-methyllanthionine (Thr-Cys) | Thioether
S-(2-aminovinyl)-3-methyl-D-cysteine (Thr-Cys) | Thioether
5-amino-piperideine-2,5-dicarboxylic acid (Ser-Cys) | Thioether
(4S)-Thiazoline-4-carboxylic acid (Thr-Cys) | Thioether
Thiazole-4-carboxylic acid (Ile-Cys) | Thioether
Thiazole-4-carboxylic acid (Thr-Cys) | Thioether
Thiazole-4-carboxylic acid (Ser-Cys) | Thioether
2-(S-cysteinyl)-histidine (Cys-His) | Thioether
3-(S-cysteinyl)-tyrosine (Cys-Tyr) | Thioether
4-cysteinyl-glutamic acid (Cys-Glu) | Thioether
3-cysteinyl-aspartic acid (Cys-Asp) | Thioether
4’-cysteinyl-tryptophylquinone (Cys-Trp) | Thioether
Oxazoline-4-carboxylic acid (Cys-Ser) | Thioether
Cyclopeptide (Cys-Arg) (Arg-Cys) | Thioether
S-(2,3-dicarboxypropyl)cysteine | Thioether
Thioether bond | Thioether
S-nitrosylation | S-Nitrosylation
S-nitrosocysteine | S-Nitrosylation

Table 01. List of UniProt keywords used to curate datasets for the Cys PTMs.
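
The keyword-to-modification assignment in Table 01 can be expressed as a simple lookup. The following is a minimal sketch and not the actual curation script: the names KEYWORD_TO_PTM and classify_keyword are hypothetical, the dictionary is abbreviated to a few rows of the table, and case-insensitive matching is an assumption.

```python
# Abbreviated keyword -> Cys PTM mapping transcribed from Table 01.
# The names used here are illustrative, not part of CysDBase itself.
KEYWORD_TO_PTM = {
    "Disulfide bond": "Disulfide",
    "S-glutathionyl cysteine": "S-Glutathionylation",
    "2Fe-2S": "Metal-Binding",
    "3Fe-4S": "Metal-Binding",
    "4Fe-4S": "Metal-Binding",
    "Iron-sulfur": "Metal-Binding",
    "Metallothionein": "Metal-Binding",
    "Palmitate": "S-Palmitoylation",
    "Cysteine sulfenic acid (-SOH)": "S-Sulfenylation",
    "Thioether bond": "Thioether",
    "S-nitrosocysteine": "S-Nitrosylation",
}

# Lower-cased copy so that lookups ignore case (an assumption of this sketch).
_LOOKUP = {key.lower(): value for key, value in KEYWORD_TO_PTM.items()}


def classify_keyword(keyword):
    """Return the Cys PTM class for a UniProt keyword, or None if unlisted."""
    return _LOOKUP.get(keyword.strip().lower())
```

For example, classify_keyword("4Fe-4S") returns "Metal-Binding", while a keyword that is not in the table returns None.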

How the Database has been made

The data were curated from different resources and databases using a Python script. The output parameters are Cys residue, Cys post-translational modification, sequence, location, pathway, PubMed ID and PDB ID, together with the microenvironment (MENV) descriptors buried fraction and rHpy (relative hydrophobicity). The UniProt annotations are downloaded in GFF (General Feature Format), saved as a CSV file and finally stored in the database.



Figure 2. Schematic representation of database development.
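
The GFF-to-CSV step of the workflow can be sketched as follows. This is an illustrative outline only, not the script used to build the database: the function names, the column labels and the two-line example record (accession P00000) are hypothetical, and the code assumes the standard nine tab-separated GFF columns.

```python
import csv
import io

# Standard nine GFF columns; the labels are chosen for this sketch.
GFF_COLUMNS = ["accession", "source", "feature", "start", "end",
               "score", "strand", "phase", "attributes"]


def gff_to_rows(gff_text):
    """Parse GFF text into a list of dicts, skipping comments and blank lines."""
    rows = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # ignore malformed lines
        row = dict(zip(GFF_COLUMNS, fields[:9]))
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize parsed rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=GFF_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical GFF fragment, made up for illustration.
EXAMPLE_GFF = (
    "##gff-version 3\n"
    "P00000\tUniProtKB\tDisulfide bond\t30\t45\t.\t.\t.\tNote=Illustrative\n"
    "P00000\tUniProtKB\tModified residue\t62\t62\t.\t.\t.\tNote=S-nitrosocysteine\n"
)

if __name__ == "__main__":
    print(rows_to_csv(gff_to_rows(EXAMPLE_GFF)))
```

Keeping parsing and serialization in separate functions makes it straightforward to filter the rows (for example, to Cys-related features) before they are written out.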

Output Parameters in the Database

Sequence Parameter Information

The sequence-related information is curated from the UniProt database. The curated fields are Protein Name, Organism, Length of the protein and EC Number.

  • Protein Name:

  • The protein name given in the UniProt database is curated and stored. Proteins have been curated across diverse species.

  • Organism:

  • The Cys PTM information has been curated from diverse species, which makes this the first database for Cys PTMs.

  • Length:

  • The length of the protein is also curated and stored.

  • EC Number:

  • The EC number (Enzyme Commission number) describes the enzyme class to which the protein belongs. It consists of four numbers separated by periods, and the first number denotes the enzyme class (e.g., in EC 1.1.2.3 the leading 1 denotes Oxidoreductases).

  • 1 = Oxidoreductases

  • Oxidoreductases are enzymes that catalyze oxidation or reduction reactions by transferring electrons, hydrogen or oxygen from a reductant molecule to an oxidant molecule.

  • 2 = Transferases

  • Transferases are enzymes that catalyze the transfer of a functional group (e.g., an acetyl group) from a donor molecule to an acceptor molecule.

  • 3 = Hydrolases

  • Hydrolases are enzymes that catalyze the cleavage of a chemical bond through hydrolysis.

  • 4 = Lyases

  • Lyases are enzymes that catalyze the breaking of various chemical bonds by means other than hydrolysis and oxidation.

  • 5 = Isomerases

  • Isomerases are enzymes that catalyze the structural rearrangement of a molecule into one of its isomers.

  • 6 = Ligases

  • Ligases are enzymes that catalyze the joining of two molecules through the formation of a new chemical bond.

  • 7 = Translocases

  • Translocases are enzymes that catalyze the movement of ions or molecules across membranes.

  • Cell Organelle:

  • Cell Organelle describes the cellular location in which the Cys PTM occurs.
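
The EC classes listed above can be resolved from an EC number with a small lookup. The following is a minimal sketch; the names EC_CLASSES and ec_class are illustrative and not part of the database, and tolerating an optional "EC" prefix is an assumption.

```python
# First number of the EC number -> enzyme class, as listed above.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def ec_class(ec_number):
    """Return the enzyme class for an EC number such as '1.1.2.3', or None."""
    text = ec_number.strip()
    # Tolerate an optional "EC" prefix, e.g. "EC 1.1.2.3" or "EC=1.1.2.3".
    if text.upper().startswith("EC"):
        text = text[2:].lstrip(" =:")
    first = text.split(".")[0]
    if not first.isdigit():
        return None
    return EC_CLASSES.get(int(first))
```

With this sketch, ec_class("1.1.2.3") gives "Oxidoreductases", matching the example in the text.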

    Sequences

    The sequences are trimmed to a window size of n = 7 with the Cys at the center, so each trimmed sequence has a total length of 2n + 1 = 2*7 + 1 = 15 residues (Figure 3).


    Figure 3. Window size representation of the sequence (window size = 7).
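
The 2n + 1 windowing described above can be sketched as follows. This is an illustration under stated assumptions rather than the database's own code: the function name cys_windows is hypothetical, and padding with "-" when a Cys lies closer than n residues to either terminus is an assumption, since the text does not say how such cases are handled.

```python
def cys_windows(sequence, n=7, pad="-"):
    """Return (position, window) pairs for every Cys in the sequence.

    Positions are 1-based. Each window has 2n + 1 characters with the Cys
    in the middle; positions beyond the termini are filled with `pad`
    (an assumption of this sketch).
    """
    padded = pad * n + sequence + pad * n
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "C":
            # Residue i of the sequence sits at index i + n of the padded
            # string, so the window spans padded[i : i + 2n + 1].
            windows.append((i + 1, padded[i:i + 2 * n + 1]))
    return windows


if __name__ == "__main__":
    for position, window in cys_windows("MKTAYIAKQRCISFVKSHFSRQ"):
        print(position, window)
```

Every returned window has length 15 for n = 7, and the character at index n of each window is the central Cys.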

    Structure Analysis

    Structure Information

    The PDB ID is a four-character code that can be obtained from the PDB database. The PDB structure is required for computing the protein microenvironment and is downloaded from the PDB database.
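
A PDB ID can be checked before a structure is requested. The sketch below is illustrative only: the function names are hypothetical, the pattern assumes the classic four-character form (a digit from 1 to 9 followed by three letters or digits), and the RCSB file-download address is an assumption about which PDB site is used.

```python
import re

# Classic four-character PDB ID: a digit 1-9 followed by three alphanumerics.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_valid_pdb_id(pdb_id):
    """Return True if the string has the classic four-character PDB ID form."""
    return PDB_ID_PATTERN.fullmatch(pdb_id) is not None


def pdb_download_url(pdb_id):
    """Build a download address for a PDB entry (RCSB file service assumed)."""
    if not is_valid_pdb_id(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"
```

The address returned by pdb_download_url can then be passed to any HTTP client to fetch the structure file.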

    Protein Microenvironment

    The protein microenvironment (MENV) is a tool for structure-function prediction, analysis and parameter development for calculating properties of proteins. It is quantified in the form of the buried fraction and the relative hydrophobicity (rHpy). The protein microenvironment approach was developed previously in our lab to characterize the hydrophobicity or hydrophilicity of the microenvironment in which a given amino acid side chain is immersed, by calculating a quantitative property descriptor (QPD) based on the relative hydrophobicity of the MENV.

    Buried Fraction :

    Defined as the normalized surface area of the Cys thiol group that is buried inside the protein. The value ranges from 0.0 to 1.0, where 0 indicates that the thiol group is completely exposed to the solvent and 1 indicates that it is completely buried.
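
One way to read this definition is as one minus the ratio of the exposed thiol surface area to a reference area for a fully exposed thiol. The sketch below shows only that arithmetic: the normalization, the function name buried_fraction and both parameters are assumptions made for illustration, and the surface areas are taken as given rather than computed from a structure.

```python
def buried_fraction(exposed_area, reference_area):
    """Return the fraction of the thiol surface that is buried (0.0 to 1.0).

    exposed_area:   solvent-exposed thiol surface area in the protein.
    reference_area: surface area of a fully exposed thiol, same units.
    Both inputs and the normalization are assumptions of this sketch.
    """
    if reference_area <= 0:
        raise ValueError("reference_area must be positive")
    fraction = 1.0 - exposed_area / reference_area
    # Clamp so that numerical noise cannot push the value outside [0, 1].
    return min(1.0, max(0.0, fraction))
```

Under this reading, a fully exposed thiol gives 0.0 and a thiol with no exposed surface gives 1.0, in line with the range stated above.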

    Relative Hydrophobicity (rHpy) :

    A microenvironment property descriptor that describes the relative hydrophilic contribution of the protein and the solvent toward the Cys thiol group within its first contact shell. According to the mathematical formulation, a value at the upper limit of one means that the thiol group is embedded in a purely aqueous solvent. rHpy is a non-local descriptor.

    Buried fraction and rHpy together constitute the protein microenvironment space around the Cys thiol group.

    Literature Information

    PubMed ID :

    PubMed IDs are curated from the UniProt database to provide the literature information for the respective proteins.
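
PubMed IDs can be pulled out of an annotation string with a regular expression. This is a minimal sketch, not the curation script: the function name extract_pubmed_ids is hypothetical, and it assumes the identifiers appear as "PubMed:<digits>" or "PMID:<digits>" inside the attribute text. The attribute string and the IDs in the usage example are made up.

```python
import re

# Matches identifiers written as "PubMed:12345678" or "PMID:12345678".
PUBMED_PATTERN = re.compile(r"(?:PubMed|PMID):(\d+)")


def extract_pubmed_ids(attributes):
    """Return the unique PubMed IDs in a string, in order of first appearance."""
    seen = []
    for pmid in PUBMED_PATTERN.findall(attributes):
        if pmid not in seen:
            seen.append(pmid)
    return seen


if __name__ == "__main__":
    # Hypothetical attribute text; the IDs are placeholders.
    example = ("Note=Illustrative;"
               "evidence=ECO:0000269|PubMed:11111111,ECO:0000269|PubMed:22222222;"
               "Dbxref=PMID:11111111")
    print(extract_pubmed_ids(example))
```

Removing duplicates while keeping the original order ensures that each reference is stored once per annotation.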