1
Wang X, Li C, Li F, Sharma VS, Song J, Webb GI. SIMLIN: a bioinformatics tool for prediction of S-sulphenylation in the human proteome based on multi-stage ensemble-learning models. BMC Bioinformatics 2019; 20:602. [PMID: 31752668 PMCID: PMC6868744 DOI: 10.1186/s12859-019-3178-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/04/2019] [Accepted: 10/28/2019] [Indexed: 12/12/2022]
Abstract
BACKGROUND S-sulphenylation is a ubiquitous protein post-translational modification (PTM) in which an S-hydroxyl (-SOH) bond is formed via the reversible oxidation of the sulfhydryl group of cysteine (C). Recent experimental studies have revealed that S-sulphenylation plays critical roles in many biological functions, such as protein regulation and cell signaling. State-of-the-art bioinformatic advances have facilitated high-throughput in silico screening of protein S-sulphenylation sites, thereby significantly reducing the time and labour costs traditionally required for the experimental investigation of S-sulphenylation. RESULTS In this study, we propose a novel hybrid computational framework, termed SIMLIN, for accurate prediction of protein S-sulphenylation sites using a multi-stage neural-network-based ensemble-learning model that integrates both protein sequence-derived and protein structural features. Benchmarking experiments against the current state-of-the-art predictors for S-sulphenylation demonstrated that SIMLIN delivered competitive prediction performance. On the independent testing dataset, SIMLIN achieved 88.0% prediction accuracy and an AUC score of 0.82, outperforming existing methods. CONCLUSIONS In summary, SIMLIN predicts human S-sulphenylation sites with high accuracy, thereby facilitating biological hypothesis generation and experimental validation. The web server, datasets, and online instructions are freely available at http://simlin.erc.monash.edu/ for academic purposes.
Affiliation(s)
- Xiaochuan Wang
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800 Australia
- Division of Cancer Epidemiology, Cancer Council Victoria, Melbourne, VIC 3004 Australia
- Chen Li
- Institute of Molecular Systems Biology, Department of Biology, ETH Zürich, 8093 Zürich, Switzerland
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
- Fuyi Li
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800 Australia
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
- Varun S. Sharma
- Institute of Molecular Systems Biology, Department of Biology, ETH Zürich, 8093 Zürich, Switzerland
- Jiangning Song
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800 Australia
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
- ARC Centre of Excellence for Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800 Australia
- Geoffrey I. Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800 Australia
2
Verification and characterization of an alternative low density lipoprotein receptor-related protein 1 splice variant. PLoS One 2017; 12:e0180354. [PMID: 28662213 PMCID: PMC5491174 DOI: 10.1371/journal.pone.0180354] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 02/27/2017] [Accepted: 06/14/2017] [Indexed: 12/22/2022]
Abstract
BACKGROUND Low density lipoprotein (LDL) receptor-related protein 1 (LRP1) is a ubiquitously expressed multi-ligand endocytosis receptor implicated in a wide range of signalling, including in tumour biology. Tumour-associated genomic mutations of the LRP1 gene have been described, but nothing is known about the cancer-associated expression of LRP1 splice variants. Therefore, this study focused on an annotated truncated LRP1 splice variant (BC072015.1; NCBI GenBank), referred to as smLRP1, which was initially identified in prostate and lung carcinoma. METHODS Using PCR and quantitative PCR, the expression of LRP1 and smLRP1 was screened in different human tissues and tumour cell lines, and compared in tumour biopsies of head and neck squamous cell carcinoma (HNSCC). Using a recently developed anti-smLRP1 antibody, the expression of the putative LRP1 protein isoform in tumour cell lines was further investigated by Western blot and immunofluorescence staining. RESULTS Similar to LRP1, the alternative transcript smLRP1 is ubiquitously expressed in 12 human cell lines of different origin and in 22 tissues. In most tumour cell lines, the expression of smLRP1 relative to LRP1 was shifted towards smLRP1 compared with healthy tissue. The expression of both LRP1 and smLRP1 is decreased in HNSCC cell lines compared with healthy mucosa. The in vitro results were verified in primary HNSCC. Furthermore, expression of the smLRP1 protein isoform (32 kDa) was confirmed in human tumour cell lines. CONCLUSIONS Similar to LRP1, the truncated splice variant smLRP1 is ubiquitously expressed in healthy human tissues but is altered in tumours, pointing to a potential role of smLRP1 in cancer. Comparative results suggest a shift in expression in favour of smLRP1 in tumour cells that warrants further evaluation. The protein isoform is suggested to be secreted.
3
Bui VM, Weng SL, Lu CT, Chang TH, Weng JTY, Lee TY. SOHSite: incorporating evolutionary information and physicochemical properties to identify protein S-sulfenylation sites. BMC Genomics 2016; 17 Suppl 1:9. [PMID: 26819243 PMCID: PMC4895302 DOI: 10.1186/s12864-015-2299-1] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Indexed: 12/21/2022]
Abstract
Background Protein S-sulfenylation is a type of post-translational modification (PTM) involving the covalent binding of a hydroxyl group to the thiol of a cysteine residue. Recent evidence has shown the importance of S-sulfenylation in various biological processes, including transcriptional regulation, apoptosis and cytokine signaling. Determining the specific sites of S-sulfenylation is fundamental to understanding the structures and functions of S-sulfenylated proteins. However, the current lack of reliable tools often limits researchers to expensive and time-consuming laboratory techniques for identifying S-sulfenylation sites. We were therefore motivated to develop a bioinformatics method for investigating S-sulfenylation sites based on amino acid composition and physicochemical properties. Results In this work, physicochemical properties were used not only to identify S-sulfenylation sites from 1,096 experimentally verified S-sulfenylated proteins, but also to compare their predictive effectiveness with that of other characteristics such as amino acid composition (AAC), amino acid pair composition (AAPC), solvent-accessible surface area (ASA), amino acid substitution matrix (BLOSUM62), position-specific scoring matrix (PSSM), and positional weighted matrix (PWM). Various prediction models were built using support vector machines (SVMs) and evaluated by five-fold cross-validation. The model constructed from hybrid features, including PSSM and physicochemical properties, yielded the best performance, with sensitivity, specificity, accuracy and MCC of 0.746, 0.737, 0.738 and 0.337, respectively. The selected model also achieved a promising accuracy (0.693) on an independent testing dataset. Additionally, we employed TwoSampleLogo to help identify differences in amino acid composition among S-sulfenylation, S-glutathionylation and S-nitrosylation sites.
Conclusion This work proposed a computational method to explore informative features and functions for protein S-sulfenylation. Evaluation by five-fold cross-validation indicated that the selected features were effective in identifying S-sulfenylation sites. Moreover, the independent testing results demonstrated that the proposed method could provide a feasible means of conducting preliminary analyses of protein S-sulfenylation. We also anticipate that the uncovered differences in amino acid composition may facilitate future studies of the extensive crosstalk among S-sulfenylation, S-glutathionylation and S-nitrosylation. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-2299-1) contains supplementary material, which is available to authorized users.
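Sensitivity, specificity, accuracy and MCC, as reported for SOHSite, are standard confusion-matrix statistics. A minimal sketch of how they are computed from true/false positive/negative counts; the counts in the example are hypothetical and not taken from the paper.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)   # fraction of true sites recovered
    specificity = tn / (tn + fp)   # fraction of non-sites correctly rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts for illustration; not taken from the SOHSite paper.
print(binary_metrics(tp=80, fp=30, tn=70, fn=20))
```

Because MCC uses all four cells of the confusion matrix, it falls when negatives greatly outnumber positives even if sensitivity and specificity stay fixed. For example, sensitivity and specificity near 0.74 on a balanced set give an MCC of about 0.48, so a lower MCC at similar sensitivity and specificity is consistent with an imbalanced evaluation set.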
Affiliation(s)
- Van-Minh Bui
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
- Shun-Long Weng
- Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsin-Chu, 300, Taiwan.
- Mackay Junior College of Medicine, Nursing and Management, Taipei, 112, Taiwan.
- Department of Medicine, Mackay Medical College, New Taipei City, 252, Taiwan.
- Cheng-Tsung Lu
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
- Tzu-Hao Chang
- Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, 110, Taiwan.
- Julia Tzu-Ya Weng
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
- Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan.
- Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
- Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan.
4
Bui VM, Lu CT, Ho TT, Lee TY. MDD-SOH: exploiting maximal dependence decomposition to identify S-sulfenylation sites with substrate motifs. Bioinformatics 2015; 32:165-72. [PMID: 26411868 DOI: 10.1093/bioinformatics/btv558] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/22/2015] [Accepted: 09/18/2015] [Indexed: 01/12/2023]
Abstract
S-sulfenylation (S-sulphenylation, or sulfenic acid), the covalent attachment of S-hydroxyl (-SOH) to cysteine thiol, plays a significant role in the redox regulation of protein functions. Although sulfenic acid is transient and labile, most of its physiological activities occur under the control of S-hydroxylation. Identifying the substrate sites of S-sulfenylated proteins is therefore an essential task in computational biology for advancing the understanding of protein structures and functions. Research into S-sulfenylated proteins is currently very limited, and no dedicated tools are available for the computational identification of SOH sites. Using a total of 1096 experimentally verified S-sulfenylated proteins from humans, this study carries out a bioinformatics investigation of SOH sites based on amino acid composition and solvent-accessible surface area. A TwoSampleLogo indicates that the positively and negatively charged amino acids flanking the SOH sites may affect the formation of S-sulfenylation in closed three-dimensional environments. In addition, the substrate motifs of SOH sites are studied using maximal dependence decomposition (MDD). Treating the task as binary classification between SOH and non-SOH sites, a support vector machine (SVM) is applied to learn the predictive model from MDD-identified substrate motifs. In 5-fold cross-validation, the integrated SVM model learned from substrate motifs yields an average accuracy of 0.87, significantly improving the prediction of SOH sites. Furthermore, the integrated SVM model also effectively improves predictive performance on an independent testing set. Finally, the integrated SVM model is used to implement an effective web resource, named MDD-SOH, to identify SOH sites with their corresponding substrate motifs. AVAILABILITY AND IMPLEMENTATION MDD-SOH is freely available to all interested users at http://csb.cse.yzu.edu.tw/MDDSOH/.
All datasets used in this work are also available for download from the website. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online. CONTACT francis@saturn.yzu.edu.tw.
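As a hedged illustration of the generic MDD idea (not MDD-SOH's own code): for each window position i, take its consensus residue, use a chi-square statistic to measure how strongly "consensus at i or not" depends on the residue at every other position j, sum these into a score, and split the sequences on the highest-scoring position. A full MDD recurses on each subset until a size or significance threshold is reached; the thresholds, residue grouping and window length used by MDD-SOH are not given in the abstract, so the sketch shows a single split only.

```python
from collections import Counter


def chi_square(table: dict) -> float:
    """Pearson chi-square statistic for a contingency table {(row, col): count}."""
    total = sum(table.values())
    rows: Counter = Counter()
    cols: Counter = Counter()
    for (r, c), n in table.items():
        rows[r] += n
        cols[c] += n
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / total
            stat += (table.get((r, c), 0) - expected) ** 2 / expected
    return stat


def mdd_split(seqs: list[str]) -> tuple[int, list[str], list[str]]:
    """One MDD step on equal-length aligned windows."""
    length = len(seqs[0])
    consensus = [Counter(s[i] for s in seqs).most_common(1)[0][0] for i in range(length)]
    best_pos, best_score = 0, -1.0
    for i in range(length):
        # S_i: summed dependence between "consensus at i" and the residue at every j != i
        score = sum(
            chi_square(Counter((s[i] == consensus[i], s[j]) for s in seqs))
            for j in range(length) if j != i
        )
        if score > best_score:
            best_pos, best_score = i, score
    with_cons = [s for s in seqs if s[best_pos] == consensus[best_pos]]
    without = [s for s in seqs if s[best_pos] != consensus[best_pos]]
    return best_pos, with_cons, without
```

A position whose consensus residue is invariant contributes no dependence and is never chosen over a variable position, which is why MDD tends to split on positions that co-vary with the rest of the motif.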
Affiliation(s)
- Van-Minh Bui
- Department of Computer Science and Engineering and
- Thi-Trang Ho
- Department of Computer Science and Engineering and
- Tzong-Yi Lee
- Department of Computer Science and Engineering and Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan 320, Taiwan
5
Chen YJ, Lu CT, Huang KY, Wu HY, Chen YJ, Lee TY. GSHSite: exploiting an iteratively statistical method to identify s-glutathionylation sites with substrate specificity. PLoS One 2015; 10:e0118752. [PMID: 25849935 PMCID: PMC4388702 DOI: 10.1371/journal.pone.0118752] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 05/12/2014] [Accepted: 01/06/2015] [Indexed: 01/13/2023]
Abstract
S-glutathionylation, the covalent attachment of a glutathione (GSH) to the sulfur atom of cysteine, is a selective and reversible protein post-translational modification (PTM) that regulates protein activity, localization, and stability. Despite its implication in the regulation of protein functions and cell signaling, the substrate specificity of cysteine S-glutathionylation remains unknown. Based on a total of 1783 experimentally identified S-glutathionylation sites from mouse macrophages, this work presents an informatics investigation of S-glutathionylation sites, including structural factors such as the composition of flanking amino acids and the accessible surface area (ASA). TwoSampleLogo analysis shows that positively charged amino acids flanking the S-glutathionylated cysteine may influence the formation of S-glutathionylation in a closed three-dimensional environment. A statistical method is further applied to iteratively detect conserved substrate motifs with statistical significance. A support vector machine (SVM) is then applied to generate predictive models that consider the substrate motifs. In five-fold cross-validation, the SVMs trained with substrate motifs achieve enhanced sensitivity, specificity, and accuracy, and they provide promising performance on an independent test set. The effectiveness of the proposed method is demonstrated by the correct identification of previously reported S-glutathionylation sites of mouse thioredoxin (TXN) and human protein tyrosine phosphatase 1b (PTP1B). Finally, the constructed models are used to implement an effective web-based tool, named GSHSite (http://csb.cse.yzu.edu.tw/GSHSite/), for identifying uncharacterized GSH substrate sites in protein sequences.
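TwoSampleLogo highlights residues that are over- or under-represented at each window position in a positive set compared with a negative set. The sketch below computes only the raw per-position frequency differences and omits the significance test the tool applies; the peptide windows are hypothetical.

```python
from collections import Counter


def positional_enrichment(positives: list[str],
                          negatives: list[str]) -> list[dict[str, float]]:
    """Per-position residue frequency in the positive set minus that in the negative set."""
    length = len(positives[0])
    result = []
    for i in range(length):
        pos_freq = Counter(s[i] for s in positives)
        neg_freq = Counter(s[i] for s in negatives)
        residues = set(pos_freq) | set(neg_freq)
        result.append({
            aa: pos_freq[aa] / len(positives) - neg_freq[aa] / len(negatives)
            for aa in sorted(residues)
        })
    return result


# Hypothetical cysteine-centred windows (sites vs. non-sites).
for offset, diff in enumerate(positional_enrichment(["KCR", "RCK"], ["DCE", "KCD"])):
    print(offset, diff)
```

Positive values mark residues enriched around modified cysteines (here K and R flanking the site), negative values mark depleted ones.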
Affiliation(s)
- Yi-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei, Taiwan
- Cheng-Tsung Lu
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
- Kai-Yao Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
- Hsin-Yi Wu
- Institute of Chemistry, Academia Sinica, Taipei, Taiwan
- Yu-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei, Taiwan
- Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
- Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, Taiwan
6
Zhao X, Ning Q, Ai M, Chai H, Yin M. PGluS: prediction of protein S-glutathionylation sites with multiple features and analysis. Mol Biosyst 2015; 11:923-9. [PMID: 25599514 DOI: 10.1039/c4mb00680a] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 01/19/2023]
Abstract
S-Glutathionylation is a reversible protein post-translational modification that generates mixed disulfides between glutathione (GSH) and cysteine residues and plays an important role in protein stability, activity, and redox regulation. To fully understand S-glutathionylation mechanisms, identifying the substrates and the specific S-glutathionylated sites is crucial. Compared with labor-intensive and time-consuming experimental approaches, computational prediction of S-glutathionylated sites is highly desirable owing to its convenience and speed. In this study, therefore, a new bioinformatics tool named PGluS was developed to predict S-glutathionylated sites based on multiple features and support vector machines. PGluS achieved an accuracy of 71.41% and an MCC of 0.431 in 5-fold cross-validation on the training dataset. Additionally, PGluS was evaluated on an independent testing dataset, yielding an accuracy of 71.25%, which indicates that PGluS is promising for predicting S-glutathionylated sites. Furthermore, feature analysis showed that all features adopted in this method contributed to the S-glutathionylation process. A site-specific analysis showed that S-glutathionylation was intimately correlated with the features derived from its surrounding sites. The conclusions derived from this study may help to further elucidate the S-glutathionylation mechanism and guide related experimental validation. For public access, PGluS is freely accessible at .
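Five-fold cross-validation, used by PGluS and several other tools in this list, splits the training data into five folds, trains on four and tests on the held-out fold, rotating through all five. A minimal sketch of a stratified split, assuming labels 1 = site and 0 = non-site; whether PGluS stratified its folds is not stated in the abstract, and the dataset sizes below are hypothetical.

```python
import random


def stratified_k_fold(labels: list[int], k: int = 5, seed: int = 0) -> list[list[int]]:
    """Split sample indices into k folds with roughly the class ratio of the full set."""
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for n, i in enumerate(idx):
            folds[n % k].append(i)   # deal each class round-robin across folds
    return folds


labels = [1] * 10 + [0] * 40   # hypothetical: 10 sites, 40 non-sites
for fold, test_idx in enumerate(stratified_k_fold(labels, k=5)):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A classifier (e.g. an SVM) would be trained on train_idx and scored on test_idx.
    print(fold, len(train_idx), len(test_idx))
```

Stratifying keeps each test fold's site/non-site ratio close to the full dataset's, so per-fold accuracy and MCC are not distorted by an unlucky split.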
Affiliation(s)
- Xiaowei Zhao
- School of Computer Science and Information Technology, Northeast Normal University, Changchun, 130117, China.
7
Yaseen A, Li Y. Dinosolve: a protein disulfide bonding prediction server using context-based features to enhance prediction accuracy. BMC Bioinformatics 2013; 14 Suppl 13:S9. [PMID: 24267383 PMCID: PMC3849605 DOI: 10.1186/1471-2105-14-s13-s9] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022]
Abstract
Background Disulfide bonds play an important role in protein folding and structural stability. Accurately predicting disulfide bonds from protein sequences is important for modeling the structural and functional characteristics of many proteins. Methods In this work, we introduce an approach that enhances disulfide bonding prediction accuracy by taking advantage of context-based features. We first derive first-order and second-order mean-force potentials from the amino acid environment around the cysteine residues in a large number of cysteine samples. The mean-force potentials are integrated as context-based scores to estimate the favorability of a cysteine residue being in the disulfide-bonded state, as well as of a cysteine pair being connected by a disulfide bond. These context-based scores are then incorporated as features, together with other sequence and evolutionary information, to train neural networks for disulfide bonding state prediction and connectivity prediction. Results The 10-fold cross-validated accuracy in classifying an individual cysteine residue as bonded or free is 90.8% at the residue level and 85.6% at the protein level, an improvement of around 2% in accuracy. The average accuracy of disulfide bonding connectivity prediction is also improved, with an overall sensitivity of 73.42% and specificity of 91.61%. Conclusions Our computational results show that the context-based scores are effective features for enhancing the accuracy of both disulfide bonding state prediction and connectivity prediction. Our disulfide prediction algorithm is implemented on a web server named "Dinosolve", available at: http://hpcr.cs.odu.edu/dinosolve.
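The abstract does not give Dinosolve's exact formulas. As a hedged illustration of the general idea behind a first-order mean-force (statistical) potential, the sketch scores each residue type at each window offset by the negative log ratio of its frequency around bonded cysteines to its frequency around all cysteines, then sums those terms over a window. The pseudocount, window layout and example windows are assumptions, and second-order (pairwise) terms are omitted.

```python
import math
from collections import Counter


def first_order_potential(bonded: list[str], all_windows: list[str],
                          pseudo: float = 1.0) -> list[dict[str, float]]:
    """E(a, p) = -ln(P(a at p | bonded) / P(a at p)) for every residue a and offset p.
    Negative values mean residue a at offset p is enriched around bonded cysteines."""
    length = len(all_windows[0])
    alphabet = sorted({aa for w in all_windows for aa in w})
    potential = []
    for p in range(length):
        bonded_counts = Counter(w[p] for w in bonded)
        all_counts = Counter(w[p] for w in all_windows)
        row = {}
        for aa in alphabet:
            # pseudocounts keep unseen residues from producing log(0)
            p_bonded = (bonded_counts[aa] + pseudo) / (len(bonded) + pseudo * len(alphabet))
            p_all = (all_counts[aa] + pseudo) / (len(all_windows) + pseudo * len(alphabet))
            row[aa] = -math.log(p_bonded / p_all)
        potential.append(row)
    return potential


def context_score(window: str, potential: list[dict[str, float]]) -> float:
    """Sum per-offset potentials over a cysteine-centred window (lower = more bond-like)."""
    return sum(potential[p].get(aa, 0.0) for p, aa in enumerate(window))


pot = first_order_potential(["ACA", "ACA"], ["ACA", "ACA", "GCG", "GCG"])
print(context_score("ACA", pot), context_score("GCG", pot))
```

A score like this can then be fed to a classifier as one feature among others, which matches the way the abstract describes the context-based scores being combined with sequence and evolutionary information.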
8
Prediction of S-glutathionylation sites based on protein sequences. PLoS One 2013; 8:e55512. [PMID: 23418443 PMCID: PMC3572087 DOI: 10.1371/journal.pone.0055512] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 12/26/2011] [Accepted: 12/30/2012] [Indexed: 01/10/2023]
Abstract
S-glutathionylation, the reversible formation of mixed disulfides between glutathione (GSH) and cysteine residues in proteins, is a specific form of post-translational modification that plays important roles in various biological processes, including signal transduction, redox homeostasis, and metabolism inside cells. Experimentally identifying S-glutathionylation sites is labor-intensive and time-consuming, whereas bioinformatics methods provide an alternative approach by predicting S-glutathionylation sites in silico. Bioinformatics approaches provide not only candidate sites for further experimental verification but also biochemical insights into the mechanism of S-glutathionylation. In this paper, we first collect experimentally determined S-glutathionylated proteins and their corresponding modification sites from the literature, and then propose a new method for predicting S-glutathionylation sites by employing machine learning methods based on protein sequence data. Our method obtains promising results, with an AUC (area under the ROC curve) score of 0.879 in 5-fold cross-validation, which demonstrates the predictive power of the proposed method. The datasets used in this work are available at http://csb.shu.edu.cn/SGDB.
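The AUC reported here (and by SIMLIN above) equals the probability that a randomly chosen true site receives a higher predicted score than a randomly chosen non-site. A minimal sketch of that rank-based (Mann-Whitney) formulation; the scores are hypothetical, and the pairwise loop is quadratic, so a sort-based version would be preferable for large datasets.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative
    (Mann-Whitney U formulation); tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictor outputs for two sites (label 1) and two non-sites (label 0).
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, independent of any decision threshold.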
9
Singh R, Murad W. Protein disulfide topology determination through the fusion of mass spectrometric analysis and sequence-based prediction using Dempster-Shafer theory. BMC Bioinformatics 2013; 14 Suppl 2:S20. [PMID: 23368815 PMCID: PMC3549834 DOI: 10.1186/1471-2105-14-s2-s20] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
Background Disulfide bonds constitute one of the most important cross-linkages in proteins and significantly influence protein structure and function. Various methodological frameworks have been proposed for identifying disulfide bonds. These include, among others, mass spectrometry-based methods, sequence-based predictive approaches, and techniques such as crystallography and NMR. Each of these frameworks has its advantages and disadvantages in terms of prerequisites for applicability, throughput, and accuracy. Furthermore, the results from different methods may partly concur or conflict. Results In this paper, we propose a novel and theoretically rigorous framework for disulfide bond determination based on information fusion from different methods using an extended formulation of Dempster-Shafer theory. A key advantage of our approach is that it can automatically deal with concurring as well as conflicting evidence in a data-driven manner. Using the proposed framework, we have developed a method for disulfide bond determination that combines results from sequence-based prediction and mass spectrometric inference. This method leads to more accurate disulfide bond determination than any of the constituent methods taken individually. Furthermore, experiments indicate that the method improves the accuracy of bond identification compared with leading existing methods. Finally, the proposed framework is extensible in that results from any number of approaches can be incorporated. Results obtained using this framework can be especially useful in cases where the complexity of the bonding patterns, coupled with specificities of the fragmentation pattern or limitations of computational models, prevents any single method from performing consistently across a diverse set of molecules.
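Dempster-Shafer theory expresses each evidence source as a mass function over sets of hypotheses. The paper uses an extended formulation that the abstract does not spell out, so the sketch shows only the classical Dempster rule of combination: m(A) = sum of m1(B) * m2(C) over all B, C with B ∩ C = A, divided by (1 - K), where K is the total mass falling on empty intersections (the conflict). The two sources and their masses in the example are invented for illustration.

```python
from itertools import product


def dempster_combine(m1: dict[frozenset, float],
                     m2: dict[frozenset, float]) -> dict[frozenset, float]:
    """Classical Dempster rule of combination for two mass functions."""
    combined: dict[frozenset, float] = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc   # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


# Hypothetical evidence about one cysteine pair: bonded (B) or free (F).
B, F = "bonded", "free"
sequence_predictor = {frozenset({B}): 0.6, frozenset({B, F}): 0.4}
mass_spec = {frozenset({B}): 0.7, frozenset({F}): 0.2, frozenset({B, F}): 0.1}
for hypothesis, mass in dempster_combine(sequence_predictor, mass_spec).items():
    print(sorted(hypothesis), round(mass, 3))
```

Mass placed on the whole frame {B, F} represents ignorance, which lets a source abstain rather than forcing it to split belief between the hypotheses.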
Affiliation(s)
- Rahul Singh
- Department of Computer Science, San Francisco State University, San Francisco, CA 94132, USA.
10
Savojardo C, Fariselli P, Martelli PL, Casadio R. Prediction of disulfide connectivity in proteins with machine-learning methods and correlated mutations. BMC Bioinformatics 2013; 14 Suppl 1:S10. [PMID: 23368835 PMCID: PMC3548674 DOI: 10.1186/1471-2105-14-s1-s10] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
Abstract
Background Recently, information derived from correlated mutations in proteins has regained relevance for predicting protein contacts. This is due both to new forms of mutual information analysis that have proven better suited to highlighting direct coupling between pairs of residues in protein structures and to the large number of protein chains that are currently available for statistical validation. It was previously discussed that disulfide bond topology in proteins is also constrained by correlated mutations. Results In this paper we exploit information derived from a corrected mutual information analysis and from the inverse of the covariance matrix to address the problem of predicting the topology of disulfide bonds in Eukaryotes. We have recently shown that Support Vector Regression (SVR) can improve the prediction of disulfide connectivity patterns. Here we show that including correlated mutation information increases SVR performance by 5 percentage points (from 54% to 59%). When this approach is used in combination with a method previously developed by us that performs at the state of the art in predicting both the location and topology of disulfide bonds in Eukaryotes (DisLocate), the per-protein accuracy is 38%, 2 percentage points higher than that previously obtained. Conclusions In this paper we show that including information derived from correlated mutations can improve the performance of state-of-the-art methods for predicting disulfide connectivity patterns in Eukaryotic proteins. Our analysis also supports the notion that improved methods for extracting evolutionary information from multiple sequence alignments contribute greatly to the performance of predictors designed to detect relevant features from protein chains.
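Correlated-mutation analysis starts from the mutual information between pairs of alignment columns. The sketch computes plain (uncorrected) mutual information only; the abstract refers to a corrected variant and to an inverse-covariance approach, neither of which is shown, and the specific correction used is not stated. Corrections such as the average product correction (APC) are commonly applied to remove background signal. The toy columns are hypothetical, and gap handling is ignored.

```python
import math
from collections import Counter


def column_mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (in nats) between two columns of a multiple sequence alignment.
    Each column is a string holding one residue per aligned sequence."""
    n = len(col_i)
    p_i = Counter(col_i)
    p_j = Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


# Two columns that change together (C with C, S with S) versus two independent ones.
print(column_mutual_information("CCSS", "CCSS"))
print(column_mutual_information("CCSS", "CSCS"))
```

High MI between two cysteine columns suggests that they are gained and lost together across homologues, which is the kind of signal a disulfide connectivity predictor can exploit.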
Affiliation(s)
- Castrense Savojardo
- Department of Computer Science and Engineering, University of Bologna, Via Mura Anteo Zamboni 7, 41029 Bologna, Italy
11
Bianco G, Labella C, Pepe A, Cataldi TRI. Scrambling of autoinducing precursor peptides investigated by infrared multiphoton dissociation with electrospray ionization and Fourier transform ion cyclotron resonance mass spectrometry. Anal Bioanal Chem 2012. [PMID: 23208287 DOI: 10.1007/s00216-012-6583-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 10/27/2022]
Abstract
Two synthetic precursor peptides, H₂N-CVGIW and H₂N-LVMCCVGIW, involved in the quorum sensing of Lactobacillus plantarum WCFS1, were characterized by mass spectrometry (MS) with electrospray ionization on a 7-T Fourier transform ion cyclotron resonance (ESI-FTICR) instrument. Cell-free bacterial supernatant solutions were analyzed by reversed-phase liquid chromatography with ESI-FTICR MS to verify the occurrence of both the pentapeptide and the nonapeptide in the bacterial broth. The structural characterization of both protonated peptides was performed by infrared multiphoton dissociation using a continuous CO₂ laser source at a wavelength of 10.6 μm. Because their fragmentation behavior cannot be directly derived from the primary peptide structure, all anomalous fragments were interpreted as neutral losses of amino acids from the interior of the peptides, i.e., loss of V, G, and VG from H₂N-CVGIW, and of M, MC, V, and CC from H₂N-LVMCCVGIW. Mechanisms for this scrambling are proposed. FTICR MS provides accurate masses of all fragment ions with very low absolute mass errors (<1.6 ppm), which facilitated the reliable assignment of their elemental compositions. The resolving power was more than sufficient to resolve closely isobaric product ions with routine sub-ppm mass accuracies. Only the pentapeptide was found in the cell-free culture of L. plantarum grown in Waymouth's medium broth, at a low concentration of 5.2 ± 2.6 μM by external calibration. Most of it was present as oxidized H₂N-CVGIW, that is, the soluble disulfide pentapeptide, at a roughly tenfold higher level (50 ± 4 μM, n = 3).
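The sub-1.6 ppm figure quoted above is a relative mass error: the difference between measured and theoretical m/z divided by the theoretical value, scaled by one million. A minimal sketch; the m/z values in the example are hypothetical and are not those of the peptides studied.

```python
def mass_error_ppm(measured_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


# Hypothetical fragment ion: theoretical m/z 500.0000, measured m/z 500.0008.
print(round(mass_error_ppm(500.0008, 500.0), 3))
```

At m/z 500, an error of 1.6 ppm corresponds to only 0.0008 m/z units, which is what makes elemental-composition assignment reliable at FTICR resolution.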
Affiliation(s)
- Giuliana Bianco
- Dipartimento di Scienze, Università degli Studi della Basilicata, Potenza, Italy
12
CMD: A Database to Store the Bonding States of Cysteine Motifs with Secondary Structures. Adv Bioinformatics 2012; 2012:849830. [PMID: 23091487 PMCID: PMC3474208 DOI: 10.1155/2012/849830] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/30/2012] [Accepted: 09/06/2012] [Indexed: 11/18/2022]
Abstract
Computational approaches to predicting the disulphide bonding state and its connectivity pattern are based on various descriptors. One such descriptor is the amino acid sequence motif flanking the cysteine residue. Despite the existence of disulphide bonding information in many databases and applications, no complete reference or motif query resource is currently available. The cysteine motif database (CMD) is the first online resource that stores all cysteine residues, their flanking motifs with their secondary structure, and propensity values derived from laboratory data. We extracted more than 3 million cysteine motifs from PDB and UniProt data, annotated with secondary structure assignment, propensity value assignment, frequency of occurrence, and coefficiency of their bonding status. Removal of redundancies generated 15875 unique flanking motifs that are always bonded and 41577 unique patterns that are always nonbonded. Queries can be made by protein ID, FASTA sequence, sequence motif, or secondary structure, individually or in batch format, using the provided APIs, which allow remote users to query the database via third-party software and/or high-throughput screening or querying. CMD offers extensive information about bonded and free cysteine residues and their motifs, allowing in-depth characterization of sequence motif composition.
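CMD indexes the sequence motifs flanking each cysteine. A minimal sketch of extracting such cysteine-centred windows from a protein sequence; the window half-width and the padding character are assumptions, since the abstract does not state the motif length CMD uses.

```python
def cysteine_windows(sequence: str, flank: int = 3, pad: str = "-") -> list[tuple[int, str]]:
    """Return (1-based position, window) for every cysteine, with `flank` residues
    on each side; positions beyond the sequence ends are filled with `pad`."""
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, aa in enumerate(sequence):
        if aa == "C":
            # in the padded string this residue sits at index i + flank,
            # so the window spans padded[i] .. padded[i + 2 * flank]
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


# Hypothetical sequence with two cysteines.
print(cysteine_windows("MCKLCA", flank=2))
```

Padding keeps every window the same length, so cysteines near either terminus can be stored and queried alongside interior ones.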
13
Berkmen M. Production of disulfide-bonded proteins in Escherichia coli. Protein Expr Purif 2012; 82:240-51. [DOI: 10.1016/j.pep.2011.10.009] [Citation(s) in RCA: 115] [Impact Index Per Article: 9.6] [Received: 10/19/2011] [Revised: 10/24/2011] [Accepted: 10/27/2011] [Indexed: 10/15/2022]
14
Kondov I, Verma A, Wenzel W. Performance assessment of different constraining potentials in computational structure prediction for disulfide-bridged proteins. Comput Biol Chem 2011; 35:230-9. [PMID: 21864792 DOI: 10.1016/j.compbiolchem.2011.04.012] [Received: 03/15/2011] [Revised: 04/18/2011] [Accepted: 04/20/2011]
Abstract
The presence of disulfide bonds in proteins has important implications for the three-dimensional structure and folding of proteins. An adequate treatment of disulfide bonds in de novo protein simulations is therefore essential. Here we present a computational study of a set of small disulfide-bridged proteins using an all-atom stochastic search approach and including various constraining potentials to describe the disulfide bonds. The proposed potentials can easily be implemented in any code based on all-atom force fields and employed in simulations to achieve improved prediction of protein structure. By exploring different potential parameters and comparing the resulting structures, by means of a scoring function, with those from unconstrained simulations and with experimental structures, we demonstrate that including constraining potentials significantly improves the quality of the final structures. For some proteins (1KVG and 1PG1), the native conformation is visited only in simulations with constraints. Overall, we found that the Morse potential performs best, in particular for the β-sheet proteins.
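The Kondov et al. abstract reports that a Morse potential worked best as a disulfide constraint. As a hedged illustration of that functional form only, the standard Morse expression can be written as a sulfur-sulfur distance restraint. The defaults for the well depth `d_e`, width `a` and equilibrium distance `r_e` are placeholder assumptions (2.04 Å is close to a typical S-S bond length), not the parameters used in the study.

```python
import math


def morse_restraint(r: float, d_e: float = 1.0, a: float = 2.0, r_e: float = 2.04) -> float:
    """Morse-shaped restraint energy for a sulfur-sulfur distance r (angstroms).

    V(r) = d_e * (1 - exp(-a * (r - r_e)))**2
    The energy is zero at r == r_e, rises steeply when the sulfurs are pushed
    closer together, and levels off at d_e as the pair is pulled apart.
    All default parameter values are illustrative placeholders.
    """
    return d_e * (1.0 - math.exp(-a * (r - r_e))) ** 2
```

Unlike a harmonic restraint, the Morse form saturates at `d_e`, so a badly separated cysteine pair is not pulled back with an ever-growing force.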
Affiliation(s)
- Ivan Kondov
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Karlsruhe, Germany.
|
15
|
Savojardo C, Fariselli P, Alhamdoosh M, Martelli PL, Pierleoni A, Casadio R. Improving the prediction of disulfide bonds in Eukaryotes with machine learning methods and protein subcellular localization. Bioinformatics 2011; 27:2224-30. [PMID: 21715467 DOI: 10.1093/bioinformatics/btr387]
Abstract
MOTIVATION Disulfide bonds stabilize protein structures and play relevant roles in their functions. Their formation requires an oxidizing environment, so their stability depends on the ambient redox potential, which may differ according to the subcellular compartment. Several methods are available to predict cysteine bonding state and connectivity patterns. However, none of them takes protein subcellular localization into consideration. RESULTS Here we develop DISLOCATE, a two-step method based on machine learning models for predicting both the bonding state and the connectivity patterns of cysteine residues in a protein chain. We find that the inclusion of protein subcellular localization improves the performance of these predictive steps by 3 and 2 percentage points, respectively. When compared with previously developed methods for predicting disulfide bonds from sequence, DISLOCATE improves the overall performance by more than 10 percentage points. AVAILABILITY The method and the dataset are available at the Web page http://www.biocomp.unibo.it/savojard/Dislocate.html. GRHCRF code is available at http://www.biocomp.unibo.it/savojard/biocrf.html. CONTACT piero.fariselli@unibo.it.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, University of Bologna, CIRI-Life Science and Health Technologies and Department of Biology, Via San Giacomo 9/2, Bologna, Italy
|
16
|
Esque J, Oguey C, de Brevern AG. Comparative Analysis of Threshold and Tessellation Methods for Determining Protein Contacts. J Chem Inf Model 2011; 51:493-507. [DOI: 10.1021/ci100195t]
Affiliation(s)
- Jeremy Esque
- LPTM, CNRS UMR 8089, Université de Cergy Pontoise, 2 av. Adolphe Chauvin, 95302 Cergy-Pontoise, France
- INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Université Paris Diderot, Paris 7, INTS, 6, rue Alexandre Cabanel, 75739 Paris Cedex 15, France
- Christophe Oguey
- LPTM, CNRS UMR 8089, Université de Cergy Pontoise, 2 av. Adolphe Chauvin, 95302 Cergy-Pontoise, France
- Alexandre G. de Brevern
- INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Université Paris Diderot, Paris 7, INTS, 6, rue Alexandre Cabanel, 75739 Paris Cedex 15, France
|
17
|
Guang X, Guo Y, Xiao J, Wang X, Sun J, Xiong W, Li M. Predicting the state of cysteines based on sequence information. J Theor Biol 2010; 267:312-8. [DOI: 10.1016/j.jtbi.2010.09.002] [Received: 02/19/2010] [Revised: 08/16/2010] [Accepted: 09/01/2010]
|
18
|
Elumalai P, Wu JW, Liu HL. Current advances in disulfide connectivity predictions. J Taiwan Inst Chem Eng 2010. [DOI: 10.1016/j.jtice.2010.05.011]
|
19
|
Analysis of factors that induce cysteine bonding state. Comput Biol Med 2009; 39:332-9. [PMID: 19246035 DOI: 10.1016/j.compbiomed.2009.01.006] [Received: 10/07/2008] [Accepted: 01/19/2009]
Abstract
Because protein structure is principally encoded in sequence, the bonding state of cysteine has attracted considerable attention owing to its significance in the formation of protein structure. Given the lack of evident influence of free cysteines on protein structure, one might expect that only half-cystines carry encoded information. However, the analysis of amino acid distributions in the proximity of both states of cysteine explicitly indicated that prerequisite information for inducing the cysteine bonding state is present even in the flanking amino acid sequences of free cysteines.
|
20
|
Thangudu RR, Manoharan M, Srinivasan N, Cadet F, Sowdhamini R, Offmann B. Analysis on conservation of disulphide bonds and their structural features in homologous protein domain families. BMC Struct Biol 2008; 8:55. [PMID: 19111067 PMCID: PMC2628669 DOI: 10.1186/1472-6807-8-55] [Received: 05/04/2008] [Accepted: 12/26/2008]
Abstract
Background Disulphide bridges are well known to play key roles in the stability, folding and functions of proteins. Introduction or deletion of disulphides by site-directed mutagenesis has produced varying effects on stability and folding, depending upon the protein and the location of the disulphide in the 3-D structure. Given the lack of complete understanding, it is worthwhile to learn from an analysis of the extent of conservation of disulphides in homologous proteins. We have also addressed the question of which structural interactions replace a disulphide of one homologue in another homologue. Results Using a dataset involving 34,752 pairwise comparisons of homologous protein domains corresponding to 300 protein domain families of known 3-D structures, we provide a comprehensive analysis of the extent of conservation of disulphide bridges and their structural features. We report that only 54% of all the disulphide bonds compared between the homologous pairs are conserved, even though a small fraction of the non-conserved disulphides do include cytoplasmic proteins. Also, only about one fourth of the distinct disulphides are conserved in all the members of protein families. We note that while conservation of disulphides is common in many families, disulphide bond mutations are quite prevalent. Interestingly, we note that there is no clear relationship between the sequence identity of two homologous proteins and disulphide bond conservation. Our analysis of structural features at the sites where cysteines forming a disulphide in one homologue are replaced by non-Cys residues shows that the elimination of a disulphide in a homologue need not always result in stabilizing interactions between equivalent residues. Conclusion We observe that in homologous proteins, disulphide bonds are conserved only to a modest extent. Very interestingly, we note that the extent of conservation of disulphides in homologous proteins is unrelated to the overall sequence identity between homologues. The non-conserved disulphides are often associated with variable structural features that were recruited to be associated with differentiation or specialisation of protein function.
Affiliation(s)
- Ratna R Thangudu
- Laboratoire de Biochimie et Génétique Moléculaire, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France.
|
21
|
Shehu A, Kavraki LE, Clementi C. Unfolding the fold of cyclic cysteine-rich peptides. Protein Sci 2008; 17:482-93. [PMID: 18287281 PMCID: PMC2248317 DOI: 10.1110/ps.073142708] [Received: 07/25/2007] [Revised: 11/02/2007] [Accepted: 12/14/2007]
Abstract
We propose a method to extensively characterize the native state ensemble of cyclic cysteine-rich peptides. The method uses minimal information, namely, amino acid sequence and cyclization, as a topological feature that characterizes the native state. The method does not assume a specific disulfide bond pairing for cysteines and allows the possibility of unpaired cysteines. A detailed view of the conformational space relevant for the native state is obtained through a hierarchic multi-resolution exploration. A crucial feature of the exploration is a geometric approach that efficiently generates a large number of distinct cyclic conformations independently of one another. A spatial and energetic analysis of the generated conformations associates a free-energy landscape to the explored conformational space. Application to three long cyclic peptides of different folds shows that the conformational ensembles and cysteine arrangements associated with free energy minima are fully consistent with available experimental data. The results provide a detailed analysis of the native state features of cyclic peptides that can be further tested in experiment.
Affiliation(s)
- Amarda Shehu
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
|
22
|
Rubinstein R, Fiser A. Predicting disulfide bond connectivity in proteins by correlated mutations analysis. Bioinformatics 2008; 24:498-504. [PMID: 18203772 DOI: 10.1093/bioinformatics/btm637]
Abstract
MOTIVATION Prediction of disulfide bond connectivity facilitates structural and functional annotation of proteins. Previous studies suggest that cysteines of a disulfide bond mutate in a correlated manner. RESULTS We developed a method that analyzes correlated mutation patterns in multiple sequence alignments in order to predict disulfide bond connectivity. Proteins with known experimental structures and varying numbers of disulfide bonds, and that spanned various evolutionary distances, were aligned. We observed frequent variation of disulfide bond connectivity within members of the same protein families, and it was also observed that in 99% of the cases, cysteine pairs forming non-conserved disulfide bonds mutated in concert. Our data support the notion that substitution of a cysteine in a disulfide bond prompts the substitution of its cysteine partner and that oxidized cysteines appear in pairs. The method we developed predicts disulfide bond connectivity patterns with accuracies of 73, 69 and 61% for proteins with two, three and four disulfide bonds, respectively.
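Rubinstein and Fiser base their predictions on cysteine pairs that mutate in concert across a multiple sequence alignment. The sketch below uses a deliberately crude co-occurrence score, the fraction of aligned sequences in which two columns are both cysteine or both non-cysteine; it is a stand-in for illustration, not the correlated-mutation statistic from the paper, and both function names are hypothetical.

```python
def cysteine_concert_score(alignment: list[str], i: int, j: int) -> float:
    """Fraction of aligned sequences in which columns i and j are either both
    cysteine or both non-cysteine (a crude proxy for substitution in concert)."""
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    agree = sum((seq[i] == "C") == (seq[j] == "C") for seq in alignment)
    return agree / len(alignment)


def rank_cysteine_pairs(alignment: list[str], query_index: int = 0) -> list[tuple[float, int, int]]:
    """Score every pair of cysteine columns in the query sequence, best pair first."""
    query = alignment[query_index]
    cys = [pos for pos, aa in enumerate(query) if aa == "C"]
    pairs = [
        (cysteine_concert_score(alignment, a, b), a, b)
        for idx, a in enumerate(cys)
        for b in cys[idx + 1 :]
    ]
    return sorted(pairs, reverse=True)
```

With the toy alignment `["ACGGCAC", "ACGGCAS", "ASGGSAC", "ACGGCAS"]`, columns 1 and 4 are always both cysteine or both serine, so that pair ranks first with a score of 1.0.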
Affiliation(s)
- Rotem Rubinstein
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA.
|
23
|
Vincent M, Passerini A, Labbé M, Frasconi P. A simplified approach to disulfide connectivity prediction from protein sequences. BMC Bioinformatics 2008; 9:20. [PMID: 18194539 PMCID: PMC2375136 DOI: 10.1186/1471-2105-9-20] [Received: 09/04/2007] [Accepted: 01/14/2008]
Abstract
Background Prediction of disulfide bridges from protein sequences is useful for characterizing structural and functional properties of proteins. Several methods based on different machine learning algorithms have been applied to solve this problem, and public-domain prediction services exist. These methods are, however, still potentially subject to significant improvements, both in terms of prediction accuracy and overall architectural complexity. Results We introduce new methods for predicting disulfide bridges from protein sequences. The methods take advantage of two new decomposition kernels for measuring the similarity between protein sequences according to the amino acid environments around cysteines. Disulfide connectivity is predicted in two passes. First, a binary classifier is trained to predict whether a given protein chain has at least one intra-chain disulfide bridge. Second, a multiclass classifier (implemented by 1-nearest neighbor) is trained to predict connectivity patterns. The two passes can be easily cascaded to obtain connectivity prediction from sequence alone. We report an extensive experimental comparison on several data sets that have been previously employed in the literature to assess the accuracy of cysteine bonding state and disulfide connectivity predictors. Conclusion We reach state-of-the-art results on bonding state prediction with a simple method that classifies chains rather than individual residues. The prediction accuracy reached by our connectivity prediction method compares favorably with all but the most complex other approaches. On the other hand, our method does not need any model selection or hyperparameter tuning, a property that makes it less prone to overfitting and prediction accuracy overestimation.
Affiliation(s)
- Marc Vincent
- Machine Learning and Neural Networks Group, Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Via di Santa Marta 3, 50139 Firenze, Italy.
|
24
|
Stevanin G, Santorelli FM, Azzedine H, Coutinho P, Chomilier J, Denora PS, Martin E, Ouvrard-Hernandez AM, Tessa A, Bouslam N, Lossos A, Charles P, Loureiro JL, Elleuch N, Confavreux C, Cruz VT, Ruberg M, Leguern E, Grid D, Tazir M, Fontaine B, Filla A, Bertini E, Durr A, Brice A. Mutations in SPG11, encoding spatacsin, are a major cause of spastic paraplegia with thin corpus callosum. Nat Genet 2007; 39:366-72. [PMID: 17322883 DOI: 10.1038/ng1980] [Received: 07/26/2006] [Accepted: 01/18/2007]
Abstract
Autosomal recessive hereditary spastic paraplegia (ARHSP) with thin corpus callosum (TCC) is a common and clinically distinct form of familial spastic paraplegia that is linked to the SPG11 locus on chromosome 15 in most affected families. We analyzed 12 ARHSP-TCC families, refined the SPG11 candidate interval and identified ten mutations in a previously unidentified gene expressed ubiquitously in the nervous system but most prominently in the cerebellum, cerebral cortex, hippocampus and pineal gland. The mutations were either nonsense or insertions and deletions leading to a frameshift, suggesting a loss-of-function mechanism. The identification of the function of the gene will provide insight into the mechanisms leading to the degeneration of the corticospinal tract and other brain structures in this frequent form of ARHSP.
Affiliation(s)
- Giovanni Stevanin
- INSERM, UMR679, Federal Institute for Neuroscience Research, Pitié-Salpêtrière Hospital, Paris, France.
|
25
|
Thangudu RR, Sharma P, Srinivasan N, Offmann B. Analycys: A database for conservation and conformation of disulphide bonds in homologous protein domains. Proteins 2007; 67:255-61. [PMID: 17285632 DOI: 10.1002/prot.21318]
Abstract
Disulphide bonds in proteins are known to play diverse roles ranging from folding to structure to function. Thorough knowledge of the conservation status and structural state of disulphide bonds will help in understanding the differences in homologous proteins. Here we present a database for the analysis of conservation and conformation of disulphide bonds in SCOP structural families. This database has a wide range of applications, including mapping of disulphide bond mutation patterns, identification of disulphide bonds important for folding and stabilization, modeling of protein tertiary structures and protein engineering. The database can be accessed at: http://bioinformatics.univ-reunion.fr/analycys/.
Affiliation(s)
- Ratna R Thangudu
- Laboratoire de Biochimie et Génétique Moléculaire, Université de La Réunion, BP 7151, 15 Avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
|
26
|
Ceroni A, Passerini A, Vullo A, Frasconi P. DISULFIND: a disulfide bonding state and cysteine connectivity prediction server. Nucleic Acids Res 2006; 34:W177-81. [PMID: 16844986 PMCID: PMC1538823 DOI: 10.1093/nar/gkl266]
Abstract
DISULFIND is a server for predicting the disulfide bonding state of cysteines and their disulfide connectivity starting from sequence alone. Optionally, disulfide connectivity can be predicted from sequence and a bonding state assignment given as input. The output is a simple visualization of the assigned bonding state (with confidence degrees) and the most likely connectivity patterns. The server is available at .
Affiliation(s)
- Alessio Ceroni
- Machine Learning and Neural Networks Group, Università degli Studi di Firenze, Dipartimento di Sistemi e Informatica, Via di Santa Marta 3, 50139 Firenze, Italy
- Andrea Passerini
- Machine Learning and Neural Networks Group, Università degli Studi di Firenze, Dipartimento di Sistemi e Informatica, Via di Santa Marta 3, 50139 Firenze, Italy
- Alessandro Vullo
- School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland
- Paolo Frasconi
- Machine Learning and Neural Networks Group, Università degli Studi di Firenze, Dipartimento di Sistemi e Informatica, Via di Santa Marta 3, 50139 Firenze, Italy
- To whom correspondence should be addressed. Tel: +39 0554796362; Fax: +39 0554796363
|
27
|
Passerini A, Punta M, Ceroni A, Rost B, Frasconi P. Identifying cysteines and histidines in transition-metal-binding sites using support vector machines and neural networks. Proteins 2006; 65:305-16. [PMID: 16927295 DOI: 10.1002/prot.21135]
Abstract
Accurate predictions of metal-binding sites in proteins by using sequence as the only source of information can significantly help in the prediction of protein structure and function, genome annotation, and in the experimental determination of protein structure. Here, we introduce a method for identifying histidines and cysteines that participate in binding of several transition metals and iron complexes. The method predicts histidines as being in either of two states (free or metal bound) and cysteines in either of three states (free, metal bound, or in disulfide bridges). The method uses only sequence information by utilizing position-specific evolutionary profiles as well as more global descriptors such as protein length and amino acid composition. Our solution is based on a two-stage machine-learning approach. The first stage consists of a support vector machine trained to locally classify the binding state of single histidines and cysteines. The second stage consists of a bidirectional recurrent neural network trained to refine local predictions by taking into account dependencies among residues within the same protein. A simple finite state automaton is employed as a postprocessing in the second stage in order to enforce an even number of disulfide-bonded cysteines. We predict histidines and cysteines in transition-metal-binding sites at 73% precision and 61% recall. We observe significant differences in performance depending on the ligand (histidine or cysteine) and on the metal bound. We also predict cysteines participating in disulfide bridges at 86% precision and 87% recall. Results are compared to those that would be obtained by using expert information as represented by PROSITE motifs and, for disulfide bonds, to state-of-the-art methods.
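The Passerini et al. abstract mentions a finite state automaton that enforces an even number of disulfide-bonded cysteines. One way to realise such a constraint, sketched here under the assumption that each cysteine comes with an independent bonding probability, is a dynamic program over a two-state parity automaton (even or odd number of bonded cysteines so far). This illustrates the idea only; it is not the authors' implementation, and `even_bonded_decode` is a hypothetical name.

```python
import math


def even_bonded_decode(probs: list[float]) -> list[int]:
    """Most likely 0/1 labelling (1 = disulfide-bonded) with an even number of 1s.

    Dynamic programming over a two-state parity automaton: state 0 means an even
    number of bonded cysteines so far, state 1 means an odd number.
    """
    eps = 1e-12
    # best[parity] = (log-likelihood, labels) of the best prefix ending in that parity
    best = {0: (0.0, []), 1: (-math.inf, [])}
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        log_on, log_off = math.log(p), math.log(1.0 - p)
        new = {}
        for parity in (0, 1):
            # Label this cysteine free: the parity is unchanged.
            stay = (best[parity][0] + log_off, best[parity][1] + [0])
            # Label it bonded: we must have arrived from the opposite parity.
            flip = (best[1 - parity][0] + log_on, best[1 - parity][1] + [1])
            new[parity] = max(stay, flip, key=lambda t: t[0])
        best = new
    # Accept only in the even-parity state.
    return best[0][1]
```

For probabilities `[0.9, 0.8, 0.7]` the unconstrained labelling would mark all three cysteines as bonded; the parity constraint instead returns `[1, 1, 0]`, dropping the least confident one.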
Affiliation(s)
- Andrea Passerini
- Università degli Studi di Firenze, Dipartimento di Sistemi e Informatica Via di Santa Marta 3, 50139 Firenze, Italy.
|
28
|
Song J, Wang M, Burrage K. Exploring synonymous codon usage preferences of disulfide-bonded and non-disulfide bonded cysteines in the E. coli genome. J Theor Biol 2006; 241:390-401. [PMID: 16427089 DOI: 10.1016/j.jtbi.2005.12.004] [Received: 08/21/2005] [Revised: 10/31/2005] [Accepted: 12/05/2005]
Abstract
High-quality data about protein structures and their gene sequences are essential to the understanding of the relationship between protein folding and protein coding sequences. Firstly we constructed the EcoPDB database, which is a high-quality database of Escherichia coli genes and their corresponding PDB structures. Based on EcoPDB, we presented a novel approach based on information theory to investigate the correlation between cysteine synonymous codon usages and local amino acids flanking cysteines, the correlation between cysteine synonymous codon usages and synonymous codon usages of local amino acids flanking cysteines, as well as the correlation between cysteine synonymous codon usages and the disulfide bonding states of cysteines in the E. coli genome. The results indicate that the nearest neighboring residues and their synonymous codons of the C-terminus have the greatest influence on the usages of the synonymous codons of cysteines and the usage of the synonymous codons has a specific correlation with the disulfide bond formation of cysteines in proteins. The correlations may result from the regulation mechanism of protein structures at gene sequence level and reflect the biological function restriction that cysteines pair to form disulfide bonds. The results may also be helpful in identifying residues that are important for synonymous codon selection of cysteines to introduce disulfide bridges in protein engineering and molecular biology. The approach presented in this paper can also be utilized as a complementary computational method and be applicable to analyse the synonymous codon usages in other model organisms.
Affiliation(s)
- Jiangning Song
- Advanced Computational Modelling Centre, The University of Queensland, Brisbane, Qld 4072, Australia.
|
29
|
Abstract
Correctly predicting the disulfide bond topology in a protein is of crucial importance for understanding protein function and can be of great help to tertiary structure prediction methods. The web server http://clavius.bc.edu/~clotelab/DiANNA/ outputs the disulfide connectivity prediction given a protein sequence as input. The following procedure is performed. First, PSIPRED is run to predict the protein's secondary structure, then PSIBLAST is run against the non-redundant SwissProt to obtain a multiple alignment of the input sequence. The predicted secondary structure and the profile arising from this alignment are used in the training phase of our neural network. Next, cysteine oxidation state is predicted, then each pair of cysteines in the protein sequence is assigned a likelihood of forming a disulfide bond; this is performed by means of a novel architecture (diresidue neural network). Finally, Rothberg's implementation of Gabow's maximum weighted matching algorithm is applied to the diresidue neural network scores in order to produce the final connectivity prediction. Our novel neural network-based approach achieves results that are comparable to, and in some cases better than, the current state-of-the-art methods.
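DiANNA's final step turns pairwise diresidue scores into a connectivity pattern with a maximum weighted matching. To make that step concrete, the sketch below finds a maximum-weight matching by exhaustive recursion rather than Gabow's algorithm, which keeps it short but limits it to a handful of cysteines. The score-dictionary format and the name `max_weight_connectivity` are assumptions for illustration.

```python
def max_weight_connectivity(n: int, scores: dict[tuple[int, int], float]):
    """Exhaustive maximum-weight matching over cysteines numbered 0..n-1.

    scores[(i, j)] with i < j is the bonding score of pair (i, j); pairs absent
    from the dictionary cannot bond. Returns (total score, list of bonded pairs
    ordered by first cysteine). Exponential time: a stand-in for Gabow's
    polynomial-time algorithm, suitable only for small n.
    """

    def best(remaining: tuple[int, ...]):
        if len(remaining) < 2:
            return 0.0, []
        first, rest = remaining[0], remaining[1:]
        # Option 1: leave the first cysteine unpaired (free).
        best_score, best_pairs = best(rest)
        # Option 2: pair it with each later cysteine that has a score.
        for k, partner in enumerate(rest):
            if (first, partner) in scores:
                sub_score, sub_pairs = best(rest[:k] + rest[k + 1 :])
                total = scores[(first, partner)] + sub_score
                if total > best_score:
                    best_score, best_pairs = total, [(first, partner)] + sub_pairs
        return best_score, best_pairs

    return best(tuple(range(n)))
```

For more than a few cysteines, a polynomial-time matcher such as networkx's `max_weight_matching` on a graph whose edge weights are the pair scores would be the natural substitute.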
Affiliation(s)
- F. Ferrè
- Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
- P. Clote
- Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
- Department of Computer Science (courtesy appointment), Boston College, Chestnut Hill, MA 02467, USA
- To whom correspondence should be addressed. Tel: +1 617 552 1332; Fax: +1 617 552 2011
|
30
|
Alland C, Moreews F, Boens D, Carpentier M, Chiusa S, Lonquety M, Renault N, Wong Y, Cantalloube H, Chomilier J, Hochez J, Pothier J, Villoutreix BO, Zagury JF, Tufféry P. RPBS: a web resource for structural bioinformatics. Nucleic Acids Res 2005; 33:W44-9. [PMID: 15980507 PMCID: PMC1160237 DOI: 10.1093/nar/gki477]
Abstract
RPBS (Ressource Parisienne en Bioinformatique Structurale) is a resource dedicated primarily to structural bioinformatics. It is the result of a joint effort by several teams to set up an interface that offers original and powerful methods in the field. As an illustration, we focus here on three such methods uniquely available at RPBS: AUTOMAT for sequence databank scanning, YAKUSA for structure databank scanning and WLOOP for homology loop modelling. The RPBS server can be accessed at and the specific services at .
Affiliation(s)
- D. Boens
- Department of Structural Biology, IMPMC, CNRS UMR 7590, Paris, France
- S. Chiusa
- Department of Structural Biology, IMPMC, CNRS UMR 7590, Paris, France
- M. Lonquety
- Department of Structural Biology, IMPMC, CNRS UMR 7590, Paris, France
- N. Renault
- Department of Structural Biology, IMPMC, CNRS UMR 7590, Paris, France
- H. Cantalloube
- Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France
- J. Chomilier
- Department of Structural Biology, IMPMC, CNRS UMR 7590, Paris, France
- J.-F. Zagury
- Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France
- P. Tufféry
- To whom correspondence should be addressed. Tel: +33 1 44 27 77 33; Fax: +33 1 43 26 38 30
|
31
|
Ferrè F, Clote P. Disulfide connectivity prediction using secondary structure information and diresidue frequencies. Bioinformatics 2005; 21:2336-46. [PMID: 15741247 DOI: 10.1093/bioinformatics/bti328]
Abstract
MOTIVATION We describe a stand-alone algorithm to predict disulfide bond partners in a protein given only the amino acid sequence, using a novel neural network architecture (the diresidue neural network), and given input of symmetric flanking regions of N-terminus and C-terminus half-cystines augmented with residue secondary structure (helix, coil, sheet) as well as evolutionary information. The approach is motivated by the observation of a bias in the secondary structure preferences of free cysteines and half-cystines, and by promising preliminary results we obtained using diresidue position-specific scoring matrices. RESULTS As calibrated by receiver operating characteristic curves from 4-fold cross-validation, our conditioning on secondary structure allows our novel diresidue neural network to perform as well as, and in some cases better than, the current state-of-the-art method. A slight drop in performance is seen when secondary structure is predicted rather than being derived from three-dimensional protein structures.
Affiliation(s)
- F Ferrè
- Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
|
32
|
Jiang-Ning S, Wei-Jiang L, Wen-Bo X. Cooperativity of the oxidization of cysteines in globular proteins. J Theor Biol 2004; 231:85-95. [PMID: 15363931 DOI: 10.1016/j.jtbi.2004.06.002] [Received: 11/25/2003] [Revised: 06/01/2004] [Accepted: 06/07/2004]
Abstract
Based on 639 non-homologous proteins with 2910 cysteine-containing segments of well-resolved three-dimensional structures, a novel approach has been proposed to predict the disulfide-bonding state of cysteines in proteins by constructing a two-stage classifier that combines a global linear discriminator based on amino acid composition with a local support vector machine classifier. The overall prediction accuracy of this hybrid classifier for the disulfide-bonding state of cysteines reached 84.1% and 80.1% when measured on a per-cysteine and per-protein basis, respectively, using the rigorous jack-knife procedure. This shows that whether cysteines form disulfide bonds depends not only on the global structural features of proteins but also on the local sequence environment. The result demonstrates the applicability of this novel method, which provides prediction performance comparable to existing methods for the prediction of the oxidation states of cysteines in proteins.
Affiliation(s)
- Song Jiang-Ning
- The Key Laboratory of Industrial Biotechnology, Ministry of Education, Southern Yangtze University, 170 Huihe Road, Wuxi 214036, China.
33
Lenffer J, Lai P, El Mejaber W, Khan AM, Koh JLY, Tan PTJ, Seah SH, Brusic V. CysView: protein classification based on cysteine pairing patterns. Nucleic Acids Res 2004; 32:W350-5. [PMID: 15215409 PMCID: PMC441613 DOI: 10.1093/nar/gkh475]
Abstract
CysView is a web-based application that identifies and classifies proteins according to their disulfide connectivity patterns. It accepts a dataset of annotated protein sequences in various formats and returns a graphical representation of the cysteine pairing patterns of the records that carry disulfide annotations, which can be viewed grouped by connectivity pattern. CysView's utility as an analysis tool was demonstrated by the rapid and correct classification of scorpion toxin entries from GenPept on the basis of their disulfide pairing patterns. It has also proved useful for rapidly detecting irrelevant and partial records, or those with incomplete annotations, and can be used to provide supporting evidence of distant homology between proteins. CysView is publicly available at http://research.i2r.a-star.edu.sg/CysView/.
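Grouping records by connectivity pattern requires a representation that ignores loop lengths. One simple way, sketched below under stated assumptions, is to number the bonded cysteines 1..n in sequence order and write each bond as a pair of ordinals. This is a hypothetical illustration, not CysView's actual code; in particular it numbers only cysteines that appear in a bond, whereas a real tool may also count free cysteines.

```python
from collections import defaultdict


def pairing_pattern(bonds):
    """Turn disulfide bonds given as residue positions into an ordinal pattern string.

    Bonded cysteines are numbered 1..n in sequence order, so proteins with the same
    topology get the same string regardless of loop lengths, e.g. "1-6,2-4,3-5".
    """
    positions = sorted({pos for bond in bonds for pos in bond})
    rank = {pos: k + 1 for k, pos in enumerate(positions)}
    pairs = sorted(tuple(sorted((rank[a], rank[b]))) for a, b in bonds)
    return ",".join(f"{x}-{y}" for x, y in pairs)


def group_by_pattern(records):
    """Group record identifiers by connectivity pattern (hypothetical helper)."""
    groups = defaultdict(list)
    for rec_id, bonds in records.items():
        if bonds:  # records without disulfide annotations are skipped
            groups[pairing_pattern(bonds)].append(rec_id)
    return dict(groups)
```

For instance, bonds at residues (12, 63), (22, 44), (28, 50) and at (3, 40), (10, 25), (14, 30) both reduce to `"1-6,2-4,3-5"` and therefore fall into the same group, even though the spacing between cysteines differs.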
Affiliation(s)
- Johann Lenffer
- Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613 Singapore
34
Song JN, Wang ML, Li WJ, Xu WB. Prediction of the disulfide-bonding state of cysteines in proteins based on dipeptide composition. Biochem Biophys Res Commun 2004; 318:142-7. [PMID: 15110765 DOI: 10.1016/j.bbrc.2004.03.189] [Received: 03/24/2004]
Abstract
In this paper, a novel approach is introduced that predicts the disulfide-bonding state of cysteines using a linear discriminator based on the dipeptide composition of the proteins. Prediction is performed on a newly enlarged dataset of 8114 cysteine-containing segments extracted from 1856 non-homologous proteins of well-resolved three-dimensional structure. The oxidation of cysteines exhibits clear cooperativity: almost all cysteines in disulfide-bond-containing proteins are in the oxidized form. This cooperativity is well described by the protein's dipeptide composition, on which basis the oxidation state of cysteines is predicted with accuracies as high as 89.1% per cysteine and 85.2% per protein under the rigorous jack-knife procedure. The results demonstrate the applicability of this relatively simple new method, which outperforms existing methods for predicting the oxidation states of cysteines in proteins.
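The two ingredients described above, a 400-dimensional dipeptide composition and a protein-level decision that reflects cooperativity, can be sketched as follows. The normalization (dividing by the number of counted dipeptides), the handling of non-standard residues, and the weight vector and bias are assumptions for illustration; the paper's trained discriminator is not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
INDEX = {dp: k for k, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq):
    """Return the 400-dimensional vector of normalized dipeptide frequencies.

    Dipeptides containing non-standard residues are ignored; counts are divided by
    the number of counted dipeptides so that the vector sums to 1.
    """
    counts = [0] * len(DIPEPTIDES)
    for a, b in zip(seq, seq[1:]):
        k = INDEX.get(a + b)
        if k is not None:
            counts[k] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def predict_cysteine_states(seq, weights, bias):
    """Apply a linear discriminant (hypothetical weights and bias) to the composition.

    Reflecting cooperativity, every cysteine in the chain receives the same label:
    True = oxidized (disulfide-bonded), False = reduced.
    """
    x = dipeptide_composition(seq)
    oxidized = sum(w * xi for w, xi in zip(weights, x)) + bias > 0
    return {i: oxidized for i, aa in enumerate(seq) if aa == "C"}
```

For `"ACAC"` the dipeptides are AC, CA, AC, so the vector holds 2/3 at `INDEX["AC"]` and 1/3 at `INDEX["CA"]`. Because the label is decided once per chain, this design cannot represent proteins that mix bonded and free cysteines, which is the trade-off the cooperativity observation justifies.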
Affiliation(s)
- Jiang-Ning Song
- The Key Laboratory of Industrial Biotechnology, Ministry of Education, Southern Yangtze University, Wuxi 214036, China.
35
Chen YC, Lin YS, Lin CJ, Hwang JK. Prediction of the bonding states of cysteines using the support vector machines based on multiple feature vectors and cysteine state sequences. Proteins 2004; 55:1036-42. [PMID: 15146500 DOI: 10.1002/prot.20079]
Abstract
The support vector machine (SVM) method is used to predict the bonding states of cysteines. Besides local descriptors such as the local sequences, we include global information about the proteins, such as amino acid compositions and the patterns of cysteine states (bonded or nonbonded), referred to as cysteine state sequences. For a data set of 4136 cysteine-containing segments extracted from 969 nonhomologous proteins, SVMs based on local sequences or on global amino acid compositions yielded similar prediction accuracies. However, an SVM based on multiple feature vectors (combining local sequences and global amino acid compositions) significantly improves the prediction accuracy, from 80% to 86%. When coupled with cysteine state sequences, the SVM based on multiple feature vectors reaches 90% overall prediction accuracy and a Matthews correlation coefficient of 0.77, around 10% and 22% higher than the corresponding values obtained by an SVM based on local sequence information alone.
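A "multiple feature vector" in this sense is a concatenation of local and global descriptors fed to a single classifier. The sketch below builds such a vector and computes the Matthews correlation coefficient used to report the results. The one-hot window encoding, the default half-width of 8 and the function names are assumptions; the cysteine state sequence feature and the SVM training step are not shown.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def local_window(seq, center, half_width):
    """One-hot encode residues around a cysteine; out-of-range positions are all zeros."""
    vec = []
    for i in range(center - half_width, center + half_width + 1):
        aa = seq[i] if 0 <= i < len(seq) else None
        vec.extend(1.0 if aa == a else 0.0 for a in AMINO_ACIDS)
    return vec


def global_composition(seq):
    """Fraction of each standard amino acid in the whole chain."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def multiple_feature_vector(seq, center, half_width=8):
    """Concatenate local and global descriptors into one classifier input (hypothetical)."""
    return local_window(seq, center, half_width) + global_composition(seq)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

The MCC ranges from -1 to 1 and, unlike plain accuracy, is not inflated when one class (for example free cysteines) dominates the data set, which is why it is reported alongside the accuracy figures.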
Affiliation(s)
- Yu-Ching Chen
- Institute of Bioinformatics, National Chiao Tung University, HsinChu, Taiwan, ROC
36
Martelli PL, Fariselli P, Malaguti L, Casadio R. Prediction of the disulfide bonding state of cysteines in proteins with hidden neural networks. Protein Eng Des Sel 2002; 15:951-3. [PMID: 12601133 DOI: 10.1093/protein/15.12.951]
Abstract
A hybrid system (hidden neural network) based on a hidden Markov model (HMM) and neural networks (NN) was trained to predict the bonding states of cysteines in proteins starting from the residue chain. Training was performed using 4136 cysteine-containing segments extracted from 969 non-homologous proteins of well-resolved 3D structure and without chain breaks. In a 20-fold cross-validation procedure, neural networks based on evolutionary information reach a prediction accuracy of 80%. When the whole protein is taken into account by means of an HMM whose emission probabilities are computed from the NN output (hidden neural networks), the predictor accuracy increases to 88%. Further, when tested on a protein basis, the hybrid system correctly predicts 84% of the chains in the data set, a gain of at least 27% over the NN predictor.
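The gain from the HMM comes from decoding all cysteines of a chain jointly instead of one at a time. A minimal sketch of that idea is a two-state (free/bonded) Viterbi decoder whose emission scores are derived from per-cysteine NN outputs. Converting NN posteriors into scaled likelihoods by dividing by class priors is a common hybrid NN/HMM convention assumed here, and the transition and start probabilities are hypothetical inputs; this is not the published model.

```python
import math

STATES = ("free", "bonded")


def viterbi(nn_probs, trans, start, priors):
    """Most likely bonding-state path for the cysteines of one chain.

    nn_probs: per-cysteine NN output P(bonded | local window), in sequence order.
    trans:    trans[s][t] = P(state t | previous state s).
    start:    start[s] = P(first cysteine in state s).
    priors:   class priors used to turn NN posteriors into scaled likelihoods.
    """
    def emit(p_bonded, s):
        post = p_bonded if s == "bonded" else 1.0 - p_bonded
        return math.log(max(post, 1e-12)) - math.log(priors[s])

    score = {s: math.log(start[s]) + emit(nn_probs[0], s) for s in STATES}
    back = []  # back[k][t] = best predecessor of state t at cysteine k + 1
    for p in nn_probs[1:]:
        prev, score, ptr = score, {}, {}
        for t in STATES:
            best = max(STATES, key=lambda s: prev[s] + math.log(trans[s][t]))
            score[t] = prev[best] + math.log(trans[best][t]) + emit(p, t)
            ptr[t] = best
        back.append(ptr)
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With "sticky" transitions (0.9 for staying in the same state), an ambiguous middle cysteine with NN output 0.45 between two confident bonded calls (0.9 and 0.8) is decoded as bonded, whereas with uniform transitions it stays free. This mirrors, in simplified form, how whole-chain context can correct isolated NN errors.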
Affiliation(s)
- Pier Luigi Martelli
- Laboratory of Biocomputing, CIRB/Department of Biology, University of Bologna, via Irnerio 42, 40126 Bologna, Italy
37
Martelli PL, Fariselli P, Malaguti L, Casadio R. Prediction of the disulfide-bonding state of cysteines in proteins at 88% accuracy. Protein Sci 2002; 12:1578. [PMID: 15452953 PMCID: PMC2323920 DOI: 10.1110/ps.0219602]
Abstract
The task of predicting the cysteine-bonding state in proteins starting from the residue chain is addressed by a new hybrid system that combines a neural network and a hidden Markov model (hidden neural network). Training is performed using 4136 cysteine-containing segments extracted from 969 nonhomologous proteins of well-resolved three-dimensional structure. In a 20-fold cross-validation procedure, prediction accuracy reaches 88% and 84% when measured on a cysteine and a protein basis, respectively. These results outperform previously described methods for the same task.