For: Schlessinger A, Punta M, Rost B. Natively unstructured regions in proteins identified from contact predictions. Bioinformatics 2007;23:2376-84. [PMID: 17709338 DOI: 10.1093/bioinformatics/btm349] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.6] [Indexed: 11/14/2022]

Cited by Other Article(s)

Heinzinger M, Rost B. Artificial Intelligence Learns Protein Prediction. Cold Spring Harb Perspect Biol 2024;16:a041458. [PMID: 38858069 PMCID: PMC11368192 DOI: 10.1101/cshperspect.a041458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2024]

Yang J, Cheng WX, Wu G, Sheng S, Zhang P. Prediction of folding patterns for intrinsic disordered protein. Sci Rep 2023;13:20343. [PMID: 37990040 PMCID: PMC10663623 DOI: 10.1038/s41598-023-45969-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 10/26/2023] [Indexed: 11/23/2023]

Subbaiah S P V, Uttamrao PP, Das U, Sundaresan S, Rathinavelan T. Concentration and time-dependent amyloidogenic characteristics of intrinsically disordered N-terminal region of Saccharomyces cerevisiae Stm1. Front Microbiol 2023;14:1206945. [PMID: 37928673 PMCID: PMC10620681 DOI: 10.3389/fmicb.2023.1206945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Accepted: 10/03/2023] [Indexed: 11/07/2023]

Tang YJ, Yan K, Zhang X, Tian Y, Liu B. Protein intrinsically disordered region prediction by combining neural architecture search and multi-objective genetic algorithm. BMC Biol 2023;21:188. [PMID: 37674132 PMCID: PMC10483879 DOI: 10.1186/s12915-023-01672-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Accepted: 07/31/2023] [Indexed: 09/08/2023]

Chao TH, Rekhi S, Mittal J, Tabor DP. Data-Driven Models for Predicting Intrinsically Disordered Protein Polymer Physics Directly from Composition or Sequence. Molecular Systems Design & Engineering 2023;8:1146-1155. [PMID: 38222029 PMCID: PMC10786636 DOI: 10.1039/d3me00053b] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/16/2024]

Redl I, Fisicaro C, Dutton O, Hoffmann F, Henderson L, Owens BJ, Heberling M, Paci E, Tamiola K. ADOPT: intrinsic protein disorder prediction through deep bidirectional transformers. NAR Genom Bioinform 2023;5:lqad041. [PMID: 37138579 PMCID: PMC10150328 DOI: 10.1093/nargab/lqad041] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/10/2022] [Revised: 02/07/2023] [Accepted: 04/17/2023] [Indexed: 05/05/2023]

Intrinsically Disordered Proteins: An Overview. Int J Mol Sci 2022;23:ijms232214050. [PMID: 36430530 PMCID: PMC9693201 DOI: 10.3390/ijms232214050] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 09/04/2022] [Revised: 11/07/2022] [Accepted: 11/08/2022] [Indexed: 11/16/2022]

Ilzhöfer D, Heinzinger M, Rost B. SETH predicts nuances of residue disorder from protein embeddings. Frontiers in Bioinformatics 2022;2:1019597. [PMID: 36304335 PMCID: PMC9580958 DOI: 10.3389/fbinf.2022.1019597] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/15/2022] [Accepted: 09/20/2022] [Indexed: 11/07/2022]

Abstract

Predictions for millions of protein three-dimensional structures are only a few clicks away since the release of AlphaFold2 results for UniProt. However, many proteins have so-called intrinsically disordered regions (IDRs) that do not adopt unique structures in isolation. These IDRs are associated with several diseases, including Alzheimer's Disease. We showed that three recent disorder measures of AlphaFold2 predictions (pLDDT, "experimentally resolved" prediction and "relative solvent accessibility") correlated to some extent with IDRs. However, expert methods predict IDRs more reliably by combining complex machine learning models with expert-crafted input features and evolutionary information from multiple sequence alignments (MSAs). MSAs are not always available, especially for IDRs, and are computationally expensive to generate, limiting the scalability of the associated tools. Here, we present the novel method SETH that predicts residue disorder from embeddings generated by the protein Language Model ProtT5, which explicitly only uses single sequences as input. Thereby, our method, relying on a relatively shallow convolutional neural network, outperformed much more complex solutions while being much faster, allowing predictions for the human proteome to be created in about 1 hour on a consumer-grade PC with one NVIDIA GeForce RTX 3060. Trained on a continuous disorder scale (CheZOD scores), our method captured subtle variations in disorder, thereby providing important information beyond the binary classification of most methods. High performance paired with speed revealed that SETH's nuanced disorder predictions for entire proteomes capture aspects of the evolution of organisms. Additionally, SETH could also be used to filter out regions or proteins with probable low-quality AlphaFold2 3D structures to prioritize running the compute-intensive predictions for large data sets. SETH is freely and publicly available at: https://github.com/Rostlab/SETH.

Zhang H, Huang Y, Bei Z, Ju Z, Meng J, Hao M, Zhang J, Zhang H, Xi W. Inter-Residue Distance Prediction From Duet Deep Learning Models. Front Genet 2022;13:887491. [PMID: 35651930 PMCID: PMC9148999 DOI: 10.3389/fgene.2022.887491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Accepted: 03/30/2022] [Indexed: 12/04/2022]

Tang YJ, Pang YH, Liu B. DeepIDP-2L: protein intrinsically disordered region prediction by combining convolutional attention network and hierarchical attention network. Bioinformatics 2022;38:1252-1260. [PMID: 34864847 DOI: 10.1093/bioinformatics/btab810] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/17/2021] [Revised: 11/02/2021] [Accepted: 11/26/2021] [Indexed: 01/05/2023]

Tamburrini KC, Pesce G, Nilsson J, Gondelaud F, Kajava AV, Berrin JG, Longhi S. Predicting Protein Conformational Disorder and Disordered Binding Sites. Methods Mol Biol 2022;2449:95-147. [PMID: 35507260 DOI: 10.1007/978-1-0716-2095-3_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]

Zhao B, Kurgan L. Surveying over 100 predictors of intrinsic disorder in proteins. Expert Rev Proteomics 2021;18:1019-1029. [PMID: 34894985 DOI: 10.1080/14789450.2021.2018304] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/15/2022]

Wang YP, Wu EJ, Lurwanu Y, Ding JP, He DC, Waheed A, Nkurikiyimfura O, Liu ST, Li WY, Wang ZH, Yang L, Zhan J. Evidence for a synergistic effect of post-translational modifications and genomic composition of eEF-1α on the adaptation of Phytophthora infestans. Ecol Evol 2021;11:5484-5496. [PMID: 34026022 PMCID: PMC8131795 DOI: 10.1002/ece3.7442] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/13/2021] [Revised: 02/19/2021] [Accepted: 02/21/2021] [Indexed: 12/18/2022]

Abstract

Genetic variation plays a fundamental role in pathogens' adaptation to environmental stresses. Pathogens with low genetic variation tend to survive and proliferate more poorly due to their lack of genotypic/phenotypic polymorphisms in responding to fluctuating environments. Evolutionary theory hypothesizes that the adaptive disadvantage of genes with low genomic variation can be compensated for by structural diversity of proteins through post-translational modification (PTM), but this theory is rarely tested experimentally and its implications for sustainable disease management are hardly discussed. In this study, we analyzed nucleotide characteristics of the eukaryotic translation elongation factor-1α (eEF-1α) gene from 165 Phytophthora infestans isolates and the physical and chemical properties of its derived proteins. We found low sequence variation in the eEF-1α protein, possibly attributable to purifying selection and a lack of intra-genic recombination rather than reduced mutation. Of the only two isoforms detected by the study, the major one accounted for >95% of the pathogen collection and displayed a significantly higher fitness than the minor one. High lysine representation enhances the opportunity for the eEF-1α protein to be methylated, and the absence of disulfide bonds is consistent with the structural prediction showing that many disordered regions exist in the protein. Methylation, structural disordering, and possibly other PTMs ensure the ability of the protein to modify its functions during biological, cellular and biochemical processes, and compensate for its adaptive disadvantage caused by sequence conservation. Our results indicate that PTMs may function synergistically with nucleotide codes to regulate the adaptive landscape of eEF-1α, and possibly of other housekeeping genes, in P. infestans. Compensatory evolution between the pre- and post-translational phases in eEF-1α could enable pathogens to adapt quickly to disease management strategies while efficiently maintaining the critical roles the protein plays in biological, cellular, and biochemical activities. Implications of these results for sustainable plant disease management are discussed.

Hosoya Y, Ohkanda J. Intrinsically Disordered Proteins as Regulators of Transient Biological Processes and as Untapped Drug Targets. Molecules 2021;26:2118. [PMID: 33917117 PMCID: PMC8067799 DOI: 10.3390/molecules26082118] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/27/2021] [Revised: 03/29/2021] [Accepted: 03/30/2021] [Indexed: 12/31/2022]

Tang YJ, Pang YH, Liu B. IDP-Seq2Seq: identification of intrinsically disordered regions based on sequence to sequence learning. Bioinformatics 2021;36:5177-5186. [PMID: 32702119 DOI: 10.1093/bioinformatics/btaa667] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Received: 05/01/2020] [Revised: 06/21/2020] [Accepted: 07/17/2020] [Indexed: 12/29/2022]

Abstract

MOTIVATION

Related to many important biological functions, intrinsically disordered regions (IDRs) are widely distributed in proteins. Accurate prediction of IDRs is critical for the protein structure and function analysis. However, the existing computational methods construct the predictive models solely in the sequence space, failing to convert the sequence space into the 'semantic space' to reflect the structure characteristics of proteins. Furthermore, although the length-dependent predictors showed promising results, new fusion strategies should be explored to improve their predictive performance and the generalization.

RESULTS

In this study, we applied Sequence to Sequence Learning (Seq2Seq), derived from natural language processing (NLP), to map protein sequences to 'semantic space' to reflect the structure patterns with the help of predicted residue-residue contacts (CCMs) and other sequence-based features. Furthermore, the Attention mechanism was used to capture the global associations between all residue pairs in the proteins. Three length-dependent predictors were constructed: IDP-Seq2Seq-L for long disordered region prediction, IDP-Seq2Seq-S for short disordered region prediction and IDP-Seq2Seq-G for both long and short disordered region predictions. Finally, these three predictors were fused into one predictor called IDP-Seq2Seq to improve the discriminative power and generalization. Experimental results on four independent test datasets and the CASP test dataset showed that IDP-Seq2Seq is insensitive to the ratios of long and short disordered regions and outperforms other competing methods.

AVAILABILITY AND IMPLEMENTATION

For the convenience of most experimental scientists, a user-friendly and publicly accessible web-server for the powerful new predictor has been established at http://bliulab.net/IDP-Seq2Seq/. It is anticipated that IDP-Seq2Seq will become a very useful tool for identification of IDRs.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Kurgan L, Li M, Li Y. The Methods and Tools for Intrinsic Disorder Prediction and their Application to Systems Medicine. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11320-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]

Dark Proteome Database: Studies on Disorder. High Throughput 2020;9:ht9030015. [PMID: 32629790 PMCID: PMC7563470 DOI: 10.3390/ht9030015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2020] [Revised: 06/17/2020] [Accepted: 06/18/2020] [Indexed: 12/17/2022]

Heinzinger M, Elnaggar A, Wang Y, Dallago C, Nechaev D, Matthes F, Rost B. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics 2019;20:723. [PMID: 31847804 PMCID: PMC6918593 DOI: 10.1186/s12859-019-3220-8] [Citation(s) in RCA: 241] [Impact Index Per Article: 48.2] [Received: 05/03/2019] [Accepted: 11/13/2019] [Indexed: 12/15/2022]

Abstract

BACKGROUND

Predicting protein function and structure from sequence is one important challenge for computational biology. For 26 years, most state-of-the-art approaches combined machine learning and evolutionary information. However, for some applications retrieving related proteins is becoming too time-consuming. Additionally, evolutionary information is less powerful for small families, e.g. for proteins from the Dark Proteome. Both these problems are addressed by the new methodology introduced here.

RESULTS

We introduced a novel way to represent protein sequences as continuous vectors (embeddings) by using the language model ELMo taken from natural language processing. By modeling protein sequences, ELMo effectively captured the biophysical properties of the language of life from unlabeled big data (UniRef50). We refer to these new embeddings as SeqVec (Sequence-to-Vector) and demonstrate their effectiveness by training simple neural networks for two different tasks. At the per-residue level, secondary structure (Q3 = 79% ± 1, Q8 = 68% ± 1) and regions with intrinsic disorder (MCC = 0.59 ± 0.03) were predicted significantly better than through one-hot encoding or through Word2vec-like approaches. At the per-protein level, subcellular localization was predicted in ten classes (Q10 = 68% ± 1) and membrane-bound proteins were distinguished from water-soluble proteins (Q2 = 87% ± 1). Although SeqVec embeddings generated the best predictions from single sequences, no solution improved over the best existing method using evolutionary information. Nevertheless, our approach improved over some popular methods using evolutionary information and for some proteins even beat the best. Thus, the embeddings prove to condense the underlying principles of protein sequences. Overall, the important novelty is speed: where the lightning-fast HHblits needed on average about two minutes to generate the evolutionary information for a target protein, SeqVec created embeddings on average in 0.03 s. As this speed-up is independent of the size of growing sequence databases, SeqVec provides a highly scalable approach for the analysis of big data in proteomics, i.e. microbiome or metaproteome analysis.
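
The per-residue disorder result above is reported as a Matthews correlation coefficient (MCC). As a minimal sketch of how such a value could be computed from binary per-residue labels, the snippet below uses the standard MCC formula; the function name `mcc` and the toy labels are illustrative assumptions and are not taken from the SeqVec paper or its code.

```python
import math


def mcc(predicted, observed):
    """Matthews correlation coefficient for binary labels (1 = disordered, 0 = ordered).

    Hypothetical helper for illustration; returns 0.0 when the denominator is zero.
    """
    if len(predicted) != len(observed):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p == 1 and o == 1)
    tn = sum(1 for p, o in pairs if p == 0 and o == 0)
    fp = sum(1 for p, o in pairs if p == 1 and o == 0)
    fn = sum(1 for p, o in pairs if p == 0 and o == 1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example (made-up labels): 8 residues, 3 observed as disordered.
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]
print(round(mcc(predicted, observed), 3))
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is one reason it is commonly reported for disorder prediction, where ordered residues far outnumber disordered ones.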

CONCLUSION

Transfer-learning succeeded in extracting information from unlabeled sequence databases relevant for various protein prediction tasks. SeqVec modeled the language of life, namely the principles underlying protein sequences, better than any features suggested by textbooks and prediction methods. The exception is evolutionary information; however, that information is not available on the level of a single sequence.

Affiliation(s)

Michael Heinzinger: Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
Ahmed Elnaggar: Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
Yu Wang: Leibniz Supercomputing Centre, Boltzmannstr. 1, 85748, Garching/Munich, Germany.
Christian Dallago: Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
Dmitrii Nechaev: Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
Florian Matthes: TUM Department of Informatics, Software Engineering and Business Information Systems, Boltzmannstr. 1, 85748, Garching/Munich, Germany.
Burkhard Rost: Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany; TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany; Department of Biochemistry and Molecular Biophysics & New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, 701 West, 168th Street, New York, NY, 10032, USA.

Liu Y, Wang X, Liu B. A comprehensive review and comparison of existing computational methods for intrinsically disordered protein and region prediction. Brief Bioinform 2019;20:330-346. [PMID: 30657889 DOI: 10.1093/bib/bbx126] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Received: 07/19/2017] [Indexed: 01/06/2023]

Identification of Intrinsically Disordered Proteins and Regions by Length-Dependent Predictors Based on Conditional Random Fields. Molecular Therapy-Nucleic Acids 2019;17:396-404. [PMID: 31307006 PMCID: PMC6626971 DOI: 10.1016/j.omtn.2019.06.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/25/2019] [Revised: 06/06/2019] [Accepted: 06/07/2019] [Indexed: 01/24/2023]

Liu Y, Wang X, Liu B. IDP-CRF: Intrinsically Disordered Protein/Region Identification Based on Conditional Random Fields. Int J Mol Sci 2018;19:E2483. [PMID: 30135358 PMCID: PMC6164615 DOI: 10.3390/ijms19092483] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/21/2018] [Revised: 08/14/2018] [Accepted: 08/18/2018] [Indexed: 12/16/2022]

Kaur G, Pandey B, Kumar A, Garewal N, Grover A, Kaur J. Drug targeted virtual screening and molecular dynamics of LipU protein of Mycobacterium tuberculosis and Mycobacterium leprae. J Biomol Struct Dyn 2018;37:1254-1269. [PMID: 29557724 DOI: 10.1080/07391102.2018.1454852] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 10/17/2022]

Hanson J, Yang Y, Paliwal K, Zhou Y. Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks. Bioinformatics 2017;33:685-692. [PMID: 28011771 DOI: 10.1093/bioinformatics/btw678] [Citation(s) in RCA: 102] [Impact Index Per Article: 14.6] [Received: 06/07/2016] [Accepted: 10/26/2016] [Indexed: 11/12/2022]

Dosztányi Z. Prediction of protein disorder based on IUPred. Protein Sci 2017;27:331-340. [PMID: 29076577 DOI: 10.1002/pro.3334] [Citation(s) in RCA: 119] [Impact Index Per Article: 17.0] [Received: 08/25/2017] [Revised: 10/25/2017] [Accepted: 10/25/2017] [Indexed: 12/19/2022]

Hücker SM, Ardern Z, Goldberg T, Schafferhans A, Bernhofer M, Vestergaard G, Nelson CW, Schloter M, Rost B, Scherer S, Neuhaus K. Discovery of numerous novel small genes in the intergenic regions of the Escherichia coli O157:H7 Sakai genome. PLoS One 2017;12:e0184119. [PMID: 28902868 PMCID: PMC5597208 DOI: 10.1371/journal.pone.0184119] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/16/2017] [Accepted: 08/20/2017] [Indexed: 12/29/2022]

Abstract

In the past, short protein-coding genes were often disregarded by genome annotation pipelines. Transcriptome sequencing (RNAseq) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription. Therefore, in addition to the transcriptome, the translatome (RIBOseq) of the enteric pathogen Escherichia coli O157:H7 strain Sakai was determined at two optimal growth conditions and a severe stress condition combining low temperature and high osmotic pressure. All intergenic open reading frames potentially encoding a protein of ≥ 30 amino acids were investigated with regard to coverage by transcription and translation signals and their translatability expressed by the ribosomal coverage value. This led to discovery of 465 unique, putative novel genes not yet annotated in this E. coli strain, which are evenly distributed over both DNA strands of the genome. For 255 of the novel genes, annotated homologs in other bacteria were found, and a machine-learning algorithm, trained on small protein-coding E. coli genes, predicted that 89% of these translated open reading frames represent bona fide genes. The remaining 210 putative novel genes without annotated homologs were compared to the 255 novel genes with homologs and to 250 short annotated genes of this E. coli strain. All three groups turned out to be similar with respect to their translatability distribution, fractions of differentially regulated genes, secondary structure composition, and the distribution of evolutionary constraint, suggesting that both novel groups represent legitimate genes. However, the machine-learning algorithm only recognized a small fraction of the 210 genes without annotated homologs. It is possible that these genes represent a novel group of genes, which have unusual features dissimilar to the genes of the machine-learning algorithm training set.

Meng F, Uversky VN, Kurgan L. Comprehensive review of methods for prediction of intrinsic disorder and its molecular functions. Cell Mol Life Sci 2017;74:3069-3090. [PMID: 28589442 PMCID: PMC11107660 DOI: 10.1007/s00018-017-2555-4] [Citation(s) in RCA: 130] [Impact Index Per Article: 18.6] [Received: 05/18/2017] [Accepted: 06/01/2017] [Indexed: 12/19/2022]

Wang S, Ma J, Xu J. AUCpreD: proteome-level protein disorder prediction by AUC-maximized deep convolutional neural fields. Bioinformatics 2017;32:i672-i679. [PMID: 27587688 DOI: 10.1093/bioinformatics/btw446] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Indexed: 12/17/2022]

Abstract

MOTIVATION

Protein intrinsically disordered regions (IDRs) play an important role in many biological processes. Two key properties of IDRs are (i) the occurrence is proteome-wide and (ii) the ratio of disordered residues is about 6%, which makes it challenging to accurately predict IDRs. Most IDR prediction methods use sequence profiles to improve accuracy, which prevents their application to proteome-wide prediction since it is time-consuming to generate sequence profiles. On the other hand, methods that do not use sequence profiles fare much worse than those that do.

METHOD

This article formulates IDR prediction as a sequence labeling problem and employs a new machine learning method called Deep Convolutional Neural Fields (DeepCNF) to solve it. DeepCNF is an integration of deep convolutional neural networks (DCNN) and conditional random fields (CRF); it can model not only complex sequence-structure relationship in a hierarchical manner, but also correlation among adjacent residues. To deal with highly imbalanced order/disorder ratio, instead of training DeepCNF by widely used maximum-likelihood, we develop a novel approach to train it by maximizing area under the ROC curve (AUC), which is an unbiased measure for class-imbalanced data.
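
The point about class imbalance can be illustrated with a small sketch. It is not the DeepCNF training procedure described above; it only evaluates AUC as a pairwise ranking statistic (the fraction of disordered/ordered residue pairs that a scorer ranks correctly) and compares it with plain accuracy at a disorder ratio of about 6%. The function names and the toy data are assumptions made for this example.

```python
def auc_pairwise(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of residues,
    so intended only for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one residue of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of residues classified correctly at a fixed score threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


# Toy protein (made-up data): 100 residues, 6 of them disordered (label 1).
labels = [1] * 6 + [0] * 94

# A trivial scorer that calls every residue ordered.
always_ordered = [0.0] * 100
print(accuracy(always_ordered, labels))      # 0.94
print(auc_pairwise(always_ordered, labels))  # 0.5
```

The trivial scorer reaches 94% accuracy while its AUC stays at chance level, which illustrates why a ranking-based objective can be more informative than accuracy when only a small fraction of residues is disordered.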

RESULTS

Our experimental results show that our IDR prediction method AUCpreD outperforms existing popular disorder predictors. More importantly, AUCpreD works very well even without sequence profile, comparing favorably to or even outperforming many methods using sequence profile. Therefore, our method works for proteome-wide disorder prediction while yielding similar or better accuracy than the others.

AVAILABILITY AND IMPLEMENTATION

http://raptorx2.uchicago.edu/StructurePropertyPred/predict/

CONTACT

wangsheng@uchicago.edu, jinboxu@gmail.com

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Salt-bridge networks within globular and disordered proteins: characterizing trends for designable interactions. J Mol Model 2017. [PMID: 28626846 DOI: 10.1007/s00894-017-3376-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 10/19/2022]

Abstract

There has been considerable debate about the contribution of salt bridges to the stabilization of protein folds, in spite of their participation in crucial protein functions. Salt bridges appear to contribute to the activity-stability trade-off within proteins by bringing high-entropy charged amino acids into close contacts during the course of their functions. The current study analyzes the modes of association of salt bridges (in terms of networks) within globular proteins and at protein-protein interfaces. While the most common and trivial type of salt bridge is the isolated salt bridge, bifurcated salt bridge appears to be a distinct salt-bridge motif having a special topology and geometry. Bifurcated salt bridges are found ubiquitously in proteins and interprotein complexes. Interesting and attractive examples presenting different modes of interaction are highlighted. Bifurcated salt bridges appear to function as molecular clips that are used to stitch together large surface contours at interacting protein interfaces. The present work also emphasizes the key role of salt-bridge-mediated interactions in the partial folding of proteins containing long stretches of disordered regions. Salt-bridge-mediated interactions seem to be pivotal to the promotion of "disorder-to-order" transitions in small disordered protein fragments and their stabilization upon binding. The results obtained in this work should help to guide efforts to elucidate the modus operandi of these partially disordered proteins, and to conceptualize how these proteins manage to maintain the required amount of disorder even in their bound forms. This work could also potentially facilitate explorations of geometrically specific designable salt bridges through the characterization of composite salt-bridge networks.

Basu S, Söderquist F, Wallner B. Proteus: a random forest classifier to predict disorder-to-order transitioning binding regions in intrinsically disordered proteins. J Comput Aided Mol Des 2017;31:453-466. [PMID: 28365882 PMCID: PMC5429364 DOI: 10.1007/s10822-017-0020-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 10/18/2016] [Accepted: 03/24/2017] [Indexed: 12/03/2022]

Abstract

The focus of the computational structural biology community has taken a dramatic shift over the past one-and-a-half decades from the classical protein structure prediction problem to the possible understanding of intrinsically disordered proteins (IDP) or proteins containing regions of disorder (IDPR). The current interest lies in the unraveling of a disorder-to-order transitioning code embedded in the amino acid sequences of IDPs/IDPRs. Disordered proteins are characterized by an enormous amount of structural plasticity which makes them promiscuous in binding to different partners, multi-functional in cellular activity and atypical in folding energy landscapes resembling partially folded molten globules. Also, their involvement in several deadly human diseases (e.g. cancer, cardiovascular and neurodegenerative diseases) makes them attractive drug targets, and important for a biochemical understanding of the disease(s). The study of the structural ensemble of IDPs is rather difficult, in particular for transient interactions. When bound to a structured partner, an IDPR adopts an ordered conformation in the complex. The residues that undergo this disorder-to-order transition are called protean residues and are generally found in short contiguous stretches; the first step in understanding the modus operandi of an IDP/IDPR would be to predict these residues. There are a few available methods which predict these protean segments from their amino acid sequences; however, their performance reported in the literature leaves clear room for improvement. With this background, the current study presents ‘Proteus’, a random forest classifier that predicts the likelihood of a residue undergoing a disorder-to-order transition upon binding to a potential partner protein. The prediction is based on features that can be calculated using the amino acid sequence alone. Proteus compares favorably with existing methods, predicting twice as many true positives as the second best method (55 vs. 27%) with a much higher precision on an independent data set. The current study also sheds some light on a possible ‘disorder-to-order’ transitioning consensus, untangled, yet embedded in the amino acid sequence of IDPs. Some guidelines have also been suggested for proceeding with a real-life structural modeling involving an IDPR using Proteus.

Zhang T, Faraggi E, Li Z, Zhou Y. Intrinsic Disorder and Semi-disorder Prediction by SPINE-D. Methods Mol Biol 2017;1484:159-174. [PMID: 27787826 DOI: 10.1007/978-1-4939-6406-2_12] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]

Lieutaud P, Ferron F, Uversky AV, Kurgan L, Uversky VN, Longhi S. How disordered is my protein and what is its disorder for? A guide through the "dark side" of the protein universe. INTRINSICALLY DISORDERED PROTEINS 2016;4:e1259708. [PMID: 28232901 DOI: 10.1080/21690707.2016.1259708] [Citation(s) in RCA: 80] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/22/2016] [Revised: 11/03/2016] [Accepted: 11/04/2016] [Indexed: 12/18/2022]

AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES : EUROPEAN CONFERENCE, ECML PKDD ... : PROCEEDINGS. ECML PKDD (CONFERENCE) 2016;9852:1-16. [PMID: 28884168 PMCID: PMC5584645 DOI: 10.1007/978-3-319-46227-1_1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]

Neuhaus K, Landstorfer R, Fellner L, Simon S, Schafferhans A, Goldberg T, Marx H, Ozoline ON, Rost B, Kuster B, Keim DA, Scherer S. Translatomics combined with transcriptomics and proteomics reveals novel functional, recently evolved orphan genes in Escherichia coli O157:H7 (EHEC). BMC Genomics 2016;17:133. [PMID: 26911138 PMCID: PMC4765031 DOI: 10.1186/s12864-016-2456-1] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2015] [Accepted: 02/09/2016] [Indexed: 12/30/2022] Open

Abstract

Background

Genomes of E. coli, including that of the human pathogen Escherichia coli O157:H7 (EHEC) EDL933, still harbor undetected protein-coding genes which, apparently, have escaped annotation due to their small size and non-essential function. To find such genes, global gene expression of EHEC EDL933 was examined, using strand-specific RNAseq (transcriptome), ribosomal footprinting (translatome) and mass spectrometry (proteome).

Results

Using the above methods, 72 short, non-annotated protein-coding genes were detected. All of these showed signals in the ribosomal footprinting assay indicating mRNA translation. Seven were verified by mass spectrometry. Fifty-seven genes are annotated in other Enterobacteriaceae, mainly as hypothetical genes; the remaining 15 genes constitute novel discoveries. In addition, protein structure and function were predicted computationally and compared between EHEC-encoded proteins and 100-times randomly shuffled proteins. Based on this comparison, 61 of the 72 novel proteins exhibit predicted structural and functional features similar to those of annotated proteins. Many of the novel genes show differential transcription across eleven diverse growth conditions, suggesting environmental regulation. Three genes were found to confer a phenotype in previous studies, e.g., decreased cattle colonization.

Conclusions

These findings demonstrate that ribosomal footprinting can be used to detect novel protein coding genes, contributing to the growing body of evidence that hypothetical genes are not annotation artifacts and opening an additional way to study their functionality. All 72 genes are taxonomically restricted and, therefore, appear to have evolved relatively recently de novo.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-2456-1) contains supplementary material, which is available to authorized users.

Affiliation(s)

Klaus Neuhaus Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.
Richard Landstorfer Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.
Lea Fellner Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.
Svenja Simon Lehrstuhl für Datenanalyse und Visualisierung, Fachbereich Informatik und Informationswissenschaft, Universität Konstanz, Box 78, 78457, Konstanz, Germany.
Andrea Schafferhans Department of Informatics - Bioinformatics & TUM-IAS, Technische Universität München, Boltzmannstraße 3, 85748, Garching, Germany.
Tatyana Goldberg Department of Informatics - Bioinformatics & TUM-IAS, Technische Universität München, Boltzmannstraße 3, 85748, Garching, Germany.
Harald Marx Chair of Proteomics and Bioanalytics, Wissenschaftszentrum Weihenstephan, Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354, Freising, Germany.
Olga N Ozoline Institute of Cell Biophysics, Russian Academy of Sciences, Moscow Region, 142290, Pushchino, Russia.
Burkhard Rost Department of Informatics - Bioinformatics & TUM-IAS, Technische Universität München, Boltzmannstraße 3, 85748, Garching, Germany.
Bernhard Kuster Chair of Proteomics and Bioanalytics, Wissenschaftszentrum Weihenstephan, Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354, Freising, Germany; Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), Technische Universität München, Gregor-Mendel-Str. 4, 85354, Freising, Germany.
Daniel A Keim Lehrstuhl für Datenanalyse und Visualisierung, Fachbereich Informatik und Informationswissenschaft, Universität Konstanz, Box 78, 78457, Konstanz, Germany.
Siegfried Scherer Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.

Lieutaud P, Ferron F, Longhi S. Predicting Conformational Disorder. Methods Mol Biol 2016;1415:265-299. [PMID: 27115638 DOI: 10.1007/978-1-4939-3572-7_14] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]

Vicedo E, Gasik Z, Dong YA, Goldberg T, Rost B. Protein disorder reduced in Saccharomyces cerevisiae to survive heat shock. F1000Res 2015;4:1222. [PMID: 26673203 PMCID: PMC4670006 DOI: 10.12688/f1000research.7178.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 11/02/2015] [Indexed: 11/20/2022] Open

DisPredict: A Predictor of Disordered Protein Using Optimized RBF Kernel. PLoS One 2015;10:e0141551. [PMID: 26517719 PMCID: PMC4627842 DOI: 10.1371/journal.pone.0141551] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2015] [Accepted: 10/09/2015] [Indexed: 12/02/2022] Open

Vicedo E, Schlessinger A, Rost B. Environmental Pressure May Change the Composition Protein Disorder in Prokaryotes. PLoS One 2015;10:e0133990. [PMID: 26252577 PMCID: PMC4529154 DOI: 10.1371/journal.pone.0133990] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2014] [Accepted: 07/03/2015] [Indexed: 12/24/2022] Open

DeepCNF-D: Predicting Protein Order/Disorder Regions by Weighted Deep Convolutional Neural Fields. Int J Mol Sci 2015;16:17315-30. [PMID: 26230689 PMCID: PMC4581195 DOI: 10.3390/ijms160817315] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2015] [Revised: 07/15/2015] [Accepted: 07/16/2015] [Indexed: 12/14/2022] Open

Baruah A, Rani P, Biswas P. Conformational Entropy of Intrinsically Disordered Proteins from Amino Acid Triads. Sci Rep 2015;5:11740. [PMID: 26138206 PMCID: PMC4490338 DOI: 10.1038/srep11740] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2014] [Accepted: 05/26/2015] [Indexed: 12/17/2022] Open

Sun HP, Huang Y, Wang XF, Zhang Y, Shen HB. Improving accuracy of protein contact prediction using balanced network deconvolution. Proteins 2015;83:485-96. [PMID: 25524593 DOI: 10.1002/prot.24744] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2014] [Revised: 11/20/2014] [Accepted: 12/02/2014] [Indexed: 12/28/2022]

Punta M, Simon I, Dosztányi Z. Prediction and analysis of intrinsically disordered proteins. Methods Mol Biol 2015;1261:35-59. [PMID: 25502193 DOI: 10.1007/978-1-4939-2230-7_3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]

Roston RL, Wang K, Kuhn LA, Benning C. Structural determinants allowing transferase activity in SENSITIVE TO FREEZING 2, classified as a family I glycosyl hydrolase. J Biol Chem 2014;289:26089-26106. [PMID: 25100720 DOI: 10.1074/jbc.m114.576694] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open

Yachdav G, Kloppmann E, Kajan L, Hecht M, Goldberg T, Hamp T, Hönigschmid P, Schafferhans A, Roos M, Bernhofer M, Richter L, Ashkenazy H, Punta M, Schlessinger A, Bromberg Y, Schneider R, Vriend G, Sander C, Ben-Tal N, Rost B. PredictProtein--an open resource for online prediction of protein structural and functional features. Nucleic Acids Res 2014;42:W337-43. [PMID: 24799431 PMCID: PMC4086098 DOI: 10.1093/nar/gku366] [Citation(s) in RCA: 435] [Impact Index Per Article: 43.5] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Affiliation(s)

Guy Yachdav Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany; Biosof LLC, New York, NY 10001, USA; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), TUM (Technische Universität München), Garching/Munich 85748, Germany
Edda Kloppmann Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany; New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, New York, NY 10032, USA
Laszlo Kajan Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
Maximilian Hecht Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), TUM (Technische Universität München), Garching/Munich 85748, Germany
Tatyana Goldberg Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), TUM (Technische Universität München), Garching/Munich 85748, Germany
Tobias Hamp Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
Peter Hönigschmid Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising 85354, Germany
Andrea Schafferhans Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
Manfred Roos Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
Michael Bernhofer Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
Lothar Richter Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany
Haim Ashkenazy The Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
Marco Punta Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK; Institute for Food and Plant Sciences WZW-Weihenstephan, Alte Akademie 8, Freising 85350, Germany
Avner Schlessinger European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SD, UK
Yana Bromberg Biosof LLC, New York, NY 10001, USA; Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
Reinhard Schneider Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08901, USA
Gerrit Vriend Luxembourg University & Luxembourg Centre for Systems Biomedicine, 4362 Belval, Luxembourg
Chris Sander CMBI, NCMLS, Radboudumc Nijmegen Medical Centre, 6525 GA Nijmegen, The Netherlands
Nir Ben-Tal Computational Biology Program, Memorial Sloan Kettering Cancer Center, New York, 10065 NY, USA
Burkhard Rost Department of Informatics, Bioinformatics & Computational Biology i12, TUM (Technische Universität München), Garching/Munich 85748, Germany; Biosof LLC, New York, NY 10001, USA; New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, New York, NY 10032, USA; The Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel; Department of Biochemistry and Molecular Biophysics & New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, New York, NY 10032, USA; Institute for Advanced Study (TUM-IAS), Garching/Munich 85748, Germany

Zhang F, Moniz HA, Walcott B, Moremen KW, Wang L, Linhardt RJ. Probing the impact of GFP tagging on Robo1-heparin interaction. Glycoconj J 2014;31:299-307. [PMID: 24748467 PMCID: PMC4118743 DOI: 10.1007/s10719-014-9522-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2014] [Revised: 03/21/2014] [Accepted: 03/24/2014] [Indexed: 11/26/2022]

Falero A, Marrero K, Trigueros S, Fando R. Characterization of the RstB2 protein, the DNA-binding protein of CTXϕ phage from Vibrio cholerae. Virus Genes 2014;48:518-27. [PMID: 24643345 DOI: 10.1007/s11262-014-1053-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2013] [Accepted: 02/28/2014] [Indexed: 11/25/2022]

Combination of site directed mutagenesis and secondary structure analysis predicts the amino acids essential for stability of M. leprae MurE. Interdiscip Sci 2014;6:40-7. [PMID: 24464703 DOI: 10.1007/s12539-014-0185-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2012] [Revised: 03/22/2013] [Accepted: 04/23/2013] [Indexed: 10/25/2022]

Kukic P, Mirabello C, Tradigo G, Walsh I, Veltri P, Pollastri G. Toward an accurate prediction of inter-residue distances in proteins using 2D recursive neural networks. BMC Bioinformatics 2014;15:6. [PMID: 24410833 PMCID: PMC3893389 DOI: 10.1186/1471-2105-15-6] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2013] [Accepted: 12/20/2013] [Indexed: 11/21/2022] Open

Abstract

Background

Protein inter-residue contact maps provide a translation and rotation invariant topological representation of a protein. They can be used as an intermediary step in protein structure predictions. However, the prediction of contact maps represents an unbalanced problem as far fewer examples of contacts than non-contacts exist in a protein structure.

In this study we explore the possibility of completely eliminating the unbalanced nature of the contact map prediction problem by predicting real-value distances between residues. Predicting full inter-residue distance maps and applying them in protein structure predictions has been relatively unexplored in the past.
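The imbalance described above can be illustrated with a minimal sketch. It is not the authors' code: the 8 Å contact threshold, the function names, and the toy straight-line coordinates are all assumptions chosen for illustration. The sketch builds a real-valued distance map from 3D coordinates, thresholds it into a binary contact map, and reports what fraction of residue pairs count as contacts.

```python
import math

def distance_map(coords):
    """Pairwise Euclidean distances between residue coordinates (in angstroms)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def contact_map(dmap, threshold=8.0, min_separation=1):
    """Binary map: 1 where two distinct residues are closer than the threshold.

    The 8 A cutoff is an assumed, commonly used value, not one taken from the abstract.
    """
    n = len(dmap)
    return [
        [1 if abs(i - j) >= min_separation and dmap[i][j] < threshold else 0
         for j in range(n)]
        for i in range(n)
    ]

def contact_fraction(cmap, min_separation=1):
    """Fraction of residue pairs (i < j) that are labeled as contacts."""
    n = len(cmap)
    pairs = [(i, j) for i in range(n) for j in range(i + min_separation, n)]
    if not pairs:
        return 0.0
    return sum(cmap[i][j] for i, j in pairs) / len(pairs)

# Toy example: 20 pseudo-residues on a straight line, 3.8 A apart (hypothetical data).
coords = [(3.8 * i, 0.0, 0.0) for i in range(20)]
dmap = distance_map(coords)
cmap = contact_map(dmap)
print(f"fraction of pairs in contact: {contact_fraction(cmap):.3f}")
```

In this toy chain only about 19% of pairs are contacts, so a binary classifier sees far more negatives than positives, whereas every entry of the distance map carries a usable real-valued target. Because the map depends only on pairwise distances, translating or rotating the coordinates leaves it unchanged.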

Results

We initially demonstrate that the use of native-like distance maps is able to reproduce 3D structures almost identical to the targets, giving an average RMSD of 0.5Å. In addition, the corrupted physical maps with an introduced random error of ±6Å are able to reconstruct the targets within an average RMSD of 2Å.

After demonstrating the reconstruction potential of distance maps, we develop two classes of predictors using two-dimensional recursive neural networks: an ab initio predictor that relies only on the protein sequence and evolutionary information, and a template-based predictor in which additional structural homology information is provided. We find that the ab initio predictor is able to reproduce distances with an RMSD of 6Å, regardless of the evolutionary content provided. Furthermore, we show that the template-based predictor exploits both sequence and structure information even in cases of dubious homology and outperforms the best template hit with a clear margin of up to 3.7Å.

Lastly, we demonstrate the ability of the two predictors to reconstruct the CASP9 targets shorter than 200 residues, producing results similar to the state-of-the-art machine learning approach implemented in the Distill server.

Conclusions

The methodology presented here, if complemented by more complex reconstruction protocols, can represent a possible path to improve machine learning algorithms for 3D protein structure prediction. Moreover, it can be used as an intermediary step in protein structure predictions either on its own or complemented by NMR restraints.

Zhuang J, Zhu B. Analysis of Brassica napus ESTs: gene discovery and expression patterns of AP2/ERF-family transcription factors. Mol Biol Rep 2013;41:45-56. [PMID: 24186851 DOI: 10.1007/s11033-013-2836-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2012] [Accepted: 10/26/2013] [Indexed: 12/23/2022]

Peng Z, Mizianty MJ, Kurgan L. Genome-scale prediction of proteins with long intrinsically disordered regions. Proteins 2013;82:145-58. [PMID: 23798504 DOI: 10.1002/prot.24348] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2013] [Accepted: 06/06/2013] [Indexed: 12/24/2022]

Abstract

Proteins with long disordered regions (LDRs), defined as having 30 or more consecutive disordered residues, are abundant in eukaryotes, and these regions are recognized as a distinct class of biologically functional domains. LDRs facilitate various cellular functions and are important for target selection in structural genomics. Motivated by the lack of methods that directly predict proteins with LDRs, we designed Super-fast predictor of proteins with Long Intrinsically DisordERed regions (SLIDER). SLIDER utilizes logistic regression that takes an empirically chosen set of numerical features, which consider selected physicochemical properties of amino acids, sequence complexity, and amino acid composition, as its inputs. Empirical tests show that SLIDER offers competitive predictive performance combined with low computational cost. It outperforms, by at least a modest margin, a comprehensive set of modern disorder predictors (that can indirectly predict LDRs) and is 16 times faster compared to the best currently available disorder predictor. Utilizing our time-efficient predictor, we characterized abundance and functional roles of proteins with LDRs over 110 eukaryotic proteomes. Similar to related studies, we found that eukaryotes have many (on average 30.3%) proteins with LDRs, with the majority of proteomes having between 25 and 40%, where higher abundance is characteristic of proteomes that have larger proteins. Our first-of-its-kind large-scale functional analysis shows that these proteins are enriched in a number of cellular functions and processes including certain binding events, regulation of catalytic activities, cellular component organization, biogenesis, biological regulation, and some metabolic and developmental processes. A webserver that implements SLIDER is available at http://biomine.ece.ualberta.ca/SLIDER/.
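The LDR definition used in this abstract (30 or more consecutive disordered residues) can be sketched in a few lines. This illustrates only the definition, that is, the indirect route of scanning per-residue disorder labels; it is not SLIDER, which predicts at the protein level with logistic regression. The label alphabet ('D' for disordered, 'O' for ordered) and the function names are assumptions made for this example.

```python
from typing import List, Tuple

def long_disordered_regions(labels: str, min_len: int = 30) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of 'D' runs with length >= min_len.

    `labels` is a per-residue annotation string in an assumed alphabet:
    'D' marks a disordered residue, any other character an ordered one.
    """
    regions = []
    start = None  # index where the current run of 'D' began, if any
    for i, ch in enumerate(labels):
        if ch == "D":
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(labels) - start >= min_len:
        regions.append((start, len(labels)))
    return regions

def has_ldr(labels: str, min_len: int = 30) -> bool:
    """True if the protein contains at least one long disordered region."""
    return bool(long_disordered_regions(labels, min_len))

# Hypothetical annotation: one 30-residue run (an LDR) and one 29-residue run (too short).
example = "O" * 5 + "D" * 30 + "O" * 3 + "D" * 29
print(long_disordered_regions(example))
print(has_ldr(example))
```

Applying `has_ldr` to the output of any per-residue disorder predictor gives the protein-level label that the abstract says existing methods can only produce indirectly.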

Light S, Sagit R, Sachenkova O, Ekman D, Elofsson A. Protein Expansion Is Primarily due to Indels in Intrinsically Disordered Regions. Mol Biol Evol 2013;30:2645-53. [DOI: 10.1093/molbev/mst157] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open