1
Jänes J, Beltrao P. Deep learning for protein structure prediction and design-progress and applications. Mol Syst Biol 2024; 20:162-169. [PMID: 38291232 PMCID: PMC10912668 DOI: 10.1038/s44320-024-00016-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 12/21/2023] [Accepted: 01/11/2024] [Indexed: 02/01/2024]
Abstract
Proteins are the key molecular machines that orchestrate all biological processes of the cell. Most proteins fold into three-dimensional shapes that are critical for their function. Studying the 3D shape of proteins can inform us of the mechanisms that underlie biological processes in living cells and can have practical applications in the study of disease mutations or the discovery of novel drug treatments. Here, we review the progress made in sequence-based prediction of protein structures with a focus on applications that go beyond the prediction of single monomer structures. This includes the application of deep learning methods for the prediction of structures of protein complexes, different conformations, the evolution of protein structures and the application of these methods to protein design. These developments create new opportunities for research that will have impact across many areas of biomedical research.
Affiliation(s)
- Jürgen Jänes
  - Institute of Molecular Systems Biology, ETH Zürich, 8093, Zürich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Pedro Beltrao
  - Institute of Molecular Systems Biology, ETH Zürich, 8093, Zürich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
2
Akdel M, Pires DEV, Pardo EP, Jänes J, Zalevsky AO, Mészáros B, Bryant P, Good LL, Laskowski RA, Pozzati G, Shenoy A, Zhu W, Kundrotas P, Serra VR, Rodrigues CHM, Dunham AS, Burke D, Borkakoti N, Velankar S, Frost A, Basquin J, Lindorff-Larsen K, Bateman A, Kajava AV, Valencia A, Ovchinnikov S, Durairaj J, Ascher DB, Thornton JM, Davey NE, Stein A, Elofsson A, Croll TI, Beltrao P. A structural biology community assessment of AlphaFold2 applications. Nat Struct Mol Biol 2022; 29:1056-1067. [PMID: 36344848 PMCID: PMC9663297 DOI: 10.1038/s41594-022-00849-w] [Citation(s) in RCA: 198] [Impact Index Per Article: 99.0] [Received: 10/25/2021] [Accepted: 09/20/2022] [Indexed: 11/09/2022]
Abstract
Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure predictions have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled when compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models can be used across diverse applications equally well compared with experimentally determined structures, when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research.
Affiliation(s)
- Mehmet Akdel
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Douglas E V Pires
  - School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- Eduard Porta Pardo
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Jürgen Jänes
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Arthur O Zalevsky
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russian Federation
- Patrick Bryant
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Lydia L Good
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Roman A Laskowski
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Gabriele Pozzati
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Aditi Shenoy
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Wensi Zhu
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Petras Kundrotas
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Carlos H M Rodrigues
  - School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- Alistair S Dunham
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- David Burke
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Neera Borkakoti
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Sameer Velankar
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Adam Frost
  - Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA
- Jérôme Basquin
  - Department of Structural Cell Biology, Max Planck Institute of Biochemistry, Martinsried, Germany
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Andrey V Kajava
  - Université de Montpellier, Centre de Recherche en Biologie Cellulaire de Montpellier (CRBM) CNRS, Montpellier, France
- Sergey Ovchinnikov
  - Faculty of Arts and Sciences, Division of Science, Harvard University, Cambridge, MA, USA
- David B Ascher
  - School of Chemistry and Molecular Biology, University of Queensland, Brisbane, Queensland, Australia
- Janet M Thornton
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Amelie Stein
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Arne Elofsson
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Tristan I Croll
  - Cambridge Institute for Medical Research, Department of Haematology, The University of Cambridge, Cambridge, UK
- Pedro Beltrao
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
3
Mehrabiani KM, Cheng RR, Onuchic JN. Expanding Direct Coupling Analysis to Identify Heterodimeric Interfaces from Limited Protein Sequence Data. J Phys Chem B 2021; 125:11408-11417. [PMID: 34618469 DOI: 10.1021/acs.jpcb.1c07145] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
Direct coupling analysis (DCA) is a global statistical approach that uses information encoded in protein sequence data to predict spatial contacts in a three-dimensional structure of a folded protein. DCA has been widely used to predict the monomeric fold at amino acid resolution and to identify biologically relevant interaction sites within a folded protein. Going beyond single proteins, DCA has also been used to identify spatial contacts that stabilize the interaction in protein complex formation. However, extracting this higher order information necessary to predict dimer contacts presents a significant challenge. A DCA evolutionary signal is much stronger at the single protein level (intraprotein contacts) than at the protein-protein interface (interprotein contacts). Therefore, if DCA-derived information is to be used to predict the structure of these complexes, there is a need to identify statistically significant DCA predictions. We propose a simple Z-score measure that can filter good predictions despite noisy, limited data. This new methodology not only improves our prediction ability but also provides a quantitative measure for the validity of the prediction.
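The Z-score filtering idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact definition, so the version below is an assumption: it standardizes each interprotein coupling score against the mean and standard deviation of all candidate pair scores and keeps only pairs above a threshold. The function name `zscore_filter`, the dictionary input format, and the default threshold of 3.0 are all hypothetical choices for illustration.

```python
import statistics


def zscore_filter(scores, threshold=3.0):
    """Keep residue pairs whose coupling score is a statistical outlier.

    scores: dict mapping (i, j) residue-index pairs to a coupling score
            (hypothetical input format, e.g. interprotein DCA scores).
    threshold: minimum Z-score to keep a pair (illustrative default).
    Returns a list of ((i, j), z) tuples sorted by decreasing Z-score.
    """
    values = list(scores.values())
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # All scores identical: nothing stands out from the background.
        return []
    kept = []
    for pair, score in scores.items():
        z = (score - mean) / stdev
        if z > threshold:
            kept.append((pair, z))
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Toy example: 19 background pairs with zero signal and one strong pair.
toy_scores = {(0, j): 0.0 for j in range(19)}
toy_scores[(5, 7)] = 1.0
significant = zscore_filter(toy_scores)
```

With noisy, limited alignments most interprotein scores sit near the background, so a relative measure of this kind ranks predictions by how far they stand out rather than by their raw magnitude.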
Affiliation(s)
- Kareem M Mehrabiani
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Systems, Synthetic, and Physical Biology, Rice University, Houston, Texas 77005, United States
- Ryan R Cheng
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- José N Onuchic
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Systems, Synthetic, and Physical Biology, Rice University, Houston, Texas 77005, United States
  - Department of Physics & Astronomy, Rice University, Houston, Texas 77005, United States
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Department of Biosciences, Rice University, Houston, Texas 77005, United States
4
Machine learning in protein structure prediction. Curr Opin Chem Biol 2021; 65:1-8. [PMID: 34015749 DOI: 10.1016/j.cbpa.2021.04.005] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 03/31/2021] [Accepted: 04/10/2021] [Indexed: 12/31/2022]
Abstract
Prediction of protein structure from sequence has been intensely studied for many decades, owing to the problem's importance and its uniquely well-defined physical and computational bases. While progress has historically ebbed and flowed, the past two years saw dramatic advances driven by the increasing "neuralization" of structure prediction pipelines, whereby computations previously based on energy models and sampling procedures are replaced by neural networks. The extraction of physical contacts from the evolutionary record; the distillation of sequence-structure patterns from known structures; the incorporation of templates from homologs in the Protein Databank; and the refinement of coarsely predicted structures into finely resolved ones have all been reformulated using neural networks. Cumulatively, this transformation has resulted in algorithms that can now predict single protein domains with a median accuracy of 2.1 Å, setting the stage for a foundational reconfiguration of the role of biomolecular modeling within the life sciences.
5
Small design from big alignment: engineering proteins with multiple sequence alignment as the starting point. Biotechnol Lett 2020; 42:1305-1315. [PMID: 32430802 DOI: 10.1007/s10529-020-02914-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/31/2020] [Accepted: 05/14/2020] [Indexed: 02/08/2023]
Abstract
Multiple sequence alignment (MSA) is a fundamental way to gain information that cannot be obtained from the analysis of any individual sequence included in the alignment. It provides ways to investigate the relationship between sequence and function from an evolutionary perspective. Thus, the MSA of proteins can be employed as a reference for protein engineering. In this paper, we review recent advances to highlight how protein engineering has benefited from the MSA of proteins. These methods include (1) engineering the thermostability or solubility of proteins by introducing site mutations that bring them closer to the consensus sequence of the alignment; (2) structure-based engineering of proteins with comparative modeling; (3) creating paleoenzymes featuring high thermostability and promiscuity by constructing the ancestral sequences derived from multiple sequence alignment; and (4) incorporating site mutations targeting the evolutionarily coupled sites identified from multiple sequence alignment.
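Method (1) above, moving a protein toward the consensus of its alignment, can be sketched as follows. This is a minimal illustration under stated assumptions, not a protocol from the review: the alignment is a list of equal-length strings, `-` marks a gap, the wild-type sequence is already aligned to the same columns, and the helper names `consensus` and `consensus_mutations` are hypothetical.

```python
from collections import Counter


def consensus(msa, gap="-"):
    """Return the most frequent non-gap residue at each alignment column."""
    result = []
    for col in zip(*msa):
        counts = Counter(res for res in col if res != gap)
        # A column made only of gaps stays a gap in the consensus.
        result.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(result)


def consensus_mutations(wild_type, msa, gap="-"):
    """List substitutions (e.g. 'R2K') that move wild_type toward consensus.

    Positions are 1-based alignment columns; gap positions are skipped.
    """
    cons = consensus(msa, gap)
    mutations = []
    for pos, (wt_res, cons_res) in enumerate(zip(wild_type, cons), start=1):
        if wt_res != cons_res and gap not in (wt_res, cons_res):
            mutations.append(f"{wt_res}{pos}{cons_res}")
    return mutations


# Toy alignment of three homologs and an aligned wild-type sequence.
toy_msa = ["MKVL", "MKIL", "MRIL"]
toy_consensus = consensus(toy_msa)
toy_mutations = consensus_mutations("MRVL", toy_msa)
```

Each column is treated independently here, which is the simplification that correlation-based methods such as item (4) are meant to address.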
6
Goyal VD, Sullivan BJ, Magliery TJ. Phylogenetic spread of sequence data affects fitness of consensus enzymes: Insights from triosephosphate isomerase. Proteins 2019; 88:274-283. [PMID: 31407418 DOI: 10.1002/prot.25799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2018] [Revised: 07/26/2019] [Accepted: 08/08/2019] [Indexed: 11/08/2022]
Abstract
The concept of consensus in multiple sequence alignments (MSAs) has been used to design and engineer proteins previously with some success. However, consensus design implicitly assumes that all amino acid positions function independently, whereas in reality, the amino acids in a protein interact with each other and work cooperatively to produce the optimum structure required for its function. Correlation analysis is a tool that can capture the effect of such interactions. In a previously published study, we made consensus variants of the triosephosphate isomerase (TIM) protein using MSAs that included sequences from both prokaryotic and eukaryotic organisms. These variants were not completely native-like and were also surprisingly different from each other in terms of oligomeric state, structural dynamics, and activity. Extensive correlation analysis of the TIM database has revealed some clues about factors leading to the unusual behavior of the previously constructed consensus proteins. Among other things, we have found that the more ill-behaved consensus mutant had more broken correlations than the better-behaved consensus variant. Moreover, we report three correlation and phylogeny-based consensus variants of TIM. These variants were more native-like than the previous consensus mutants and considerably more stable than a wild-type TIM from a mesophilic organism. This study highlights the importance of choosing the appropriate diversity of MSA for consensus analysis and provides information that can be used to engineer stable enzymes.
Affiliation(s)
- Venuka Durani Goyal
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio
- Brandon J Sullivan
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio
  - Ohio State Biochemistry Program, The Ohio State University, Columbus, Ohio
- Thomas J Magliery
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio
7
Ping Z, Zhou F, Lin X, Su H. Coupled Mutations-Enabled Glycerol Transportation in an Aquaporin Z Mutant. ACS Omega 2018; 3:4113-4122. [PMID: 31458647 PMCID: PMC6641515 DOI: 10.1021/acsomega.8b00126] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/22/2018] [Accepted: 03/27/2018] [Indexed: 05/26/2023]
Abstract
Aquaporins are transmembrane channel proteins whose key function is the transport of water or other small substrates. Escherichia coli Aqp Z transports water molecules only, whereas Glp F is permeable to glycerol. It is intriguing to explore the possibility of inducing glycerol permeability in Aqp Z by targeted mutations. The Aqp Z mutants with mutated selectivity filter (SF) residues exhibit poor permeability for both glycerol and water. To address the complexity of protein systems, pair correlation information in protein sequence analyses is instructive for identifying residues that are coupled by coevolution and motion. In this study, we analyze the correlation between residues and unravel the clustering patterns of coupled residues, beyond SF residues, in aquaglyceroporins (AQGPs). The identified coupled motifs are proposed to be introduced into the aquaporin (Aqp Z) sequence to confer glycerol permeability. These residues are located in the vicinity of the SF region, C-loop, and M6-M7 linkage domain. Significant enlargement of the SF pore size of the proposed Aqp Z mutant is observed in an all-atom replica exchange molecular dynamics simulation, which is critical to facilitate considerable glycerol passage as characterized in calculated free-energy landscapes. Clearly, the hidden connections among residues play crucial roles in water/glycerol selectivity. In contrast, a single-site mutation-based scheme may even lead to undesirable effects in AQGPs, such as the blocking of water transportation by an aromatic π-stacked gate. As demonstrated in this work, rational mutagenesis guided by pair correlation analysis provides a feasible strategy to modulate proteins' functions.
Affiliation(s)
- Zhi Ping
  - Institute of Advanced Studies, Nanyang Technological University, 60 Nanyang View, 639673 Singapore
- Feng Zhou
  - Institute of Advanced Studies, Nanyang Technological University, 60 Nanyang View, 639673 Singapore
- Xin Lin
  - Institute of Advanced Studies, Nanyang Technological University, 60 Nanyang View, 639673 Singapore
- Haibin Su
  - Institute of Advanced Studies, Nanyang Technological University, 60 Nanyang View, 639673 Singapore
  - Department of Chemistry, The Hong Kong University of Science and Technology, Hong Kong, China
8
Abstract
Background: The importance of mutations in disease phenotype has been studied, with information available in databases such as OMIM. However, clustering amino acid residues based on an underlying interaction, such as co-evolution, to understand how mutations in these related sites can lead to different disease phenotypes remains a research challenge. Results: This paper presents an integrative approach to identify groups of co-evolving residues, known as protein sectors. By studying a protein family using multiple sequence alignments and statistical coupling analysis, we attempted to determine whether these groups of residues could be related to disease phenotypes. After the protein sectors were identified, disease-associated residues within these groups of amino acids were mapped to a structure representing the protein family. In this study, we used the proposed pipeline to analyze two test cases: spermine synthase and Rab GDP dissociation inhibitor. Conclusions: The results suggest that there is a possible link between certain groups of co-evolving residues and different disease phenotypes. The pipeline described in this work could also be used to study other protein families associated with human diseases.
9
Taylor WR, Hamilton RS, Sadowski MI. Prediction of contacts from correlated sequence substitutions. Curr Opin Struct Biol 2013; 23:473-9. [PMID: 23680395 DOI: 10.1016/j.sbi.2013.04.001] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 01/08/2013] [Revised: 03/12/2013] [Accepted: 04/02/2013] [Indexed: 11/26/2022]
Abstract
Recent work has led to a substantial improvement in the accuracy of predictions of contacts between amino acids using evolutionary information derived from multiple sequence alignments. Where large numbers of diverse sequence relatives are available and can be aligned to the sequence of a protein of unknown structure it is now possible to generate high-resolution models without recourse to the structure of a template. In this review we describe these exciting new techniques and critically assess the state-of-the-art in contact prediction in the light of these. While concentrating on methods, we also discuss applications to protein and RNA structure prediction as well as potential future developments.
Affiliation(s)
- William R Taylor
  - Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK
10
Abstract
Recent work has led to a substantial improvement in the accuracy of predictions of contacts between amino acids using evolutionary information derived from multiple sequence alignments. Where large numbers of diverse sequence relatives are available and can be aligned to the sequence of a protein of unknown structure, it is now possible to generate high-resolution models without recourse to the structure of a template. In this review, we describe these exciting new techniques and critically assess the state of the art in contact prediction in light of them. We discuss areas for immediate research and development as well as potential future developments.
11
Durani V, Magliery TJ. Protein engineering and stabilization from sequence statistics: variation and covariation analysis. Methods Enzymol 2013; 523:237-56. [PMID: 23422433 DOI: 10.1016/b978-0-12-394292-0.00011-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 01/20/2023]
Abstract
The concepts of consensus and correlation in multiple sequence alignments (MSAs) have been used in the past to understand and engineer proteins. However, there are multiple ways of acquiring MSA databases and also numerous mathematical metrics that can be applied to calculate each of the parameters. This chapter describes an overall methodology that we have chosen to employ for acquiring and statistically analyzing MSAs. We have provided a step-by-step protocol for calculating relative entropy and mutual information metrics and describe how they can be used to predict mutations that have a high probability of stabilizing a protein. This protocol allows for flexibility for modification of formulae and parameters without using anything more complicated than Microsoft Excel. We have also demonstrated various aspects of data analysis by carrying out a sample analysis on the BPTI-Kunitz family of proteins and identified mutations that would be predicted to stabilize this protein based on consensus and correlation values.
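The two metrics named in this abstract, relative entropy and mutual information, can be computed from alignment columns as in the sketch below. The chapter's own protocol (described as spreadsheet-based) is not reproduced here; this is a generic textbook formulation under stated assumptions: the alignment is gap-free, sequences are equal-length strings, the background distribution is supplied by the caller and covers every residue seen in the column, and the function names are hypothetical.

```python
import math
from collections import Counter


def column(msa, i):
    """Residues found at alignment column i (0-based)."""
    return [seq[i] for seq in msa]


def relative_entropy(msa, i, background):
    """Relative entropy (bits) of column i against background frequencies.

    High values indicate a column that is conserved relative to background.
    """
    n = len(msa)
    counts = Counter(column(msa, i))
    return sum(
        (c / n) * math.log2((c / n) / background[res])
        for res, c in counts.items()
    )


def mutual_information(msa, i, j):
    """Mutual information (bits) between columns i and j.

    High values indicate positions whose residues co-vary.
    """
    n = len(msa)
    ci = Counter(column(msa, i))
    cj = Counter(column(msa, j))
    cij = Counter(zip(column(msa, i), column(msa, j)))
    return sum(
        (c / n) * math.log2((c / n) / ((ci[a] / n) * (cj[b] / n)))
        for (a, b), c in cij.items()
    )


# Toy alignments: perfectly coupled columns versus independent columns.
coupled = ["AK", "AK", "GE", "GE"]
independent = ["AK", "AE", "GK", "GE"]
uniform = {res: 1 / 20 for res in "ACDEFGHIKLMNPQRSTVWY"}

mi_coupled = mutual_information(coupled, 0, 1)
mi_independent = mutual_information(independent, 0, 1)
re_col0 = relative_entropy(coupled, 0, uniform)
```

Relative entropy flags conserved positions where a consensus substitution is a candidate, while mutual information flags coupled pairs where changing one position alone may break a correlation.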
Affiliation(s)
- Venuka Durani
  - Department of Chemistry, The Ohio State University, Columbus, Ohio, USA
12
Taylor WR, Jones DT, Sadowski MI. Protein topology from predicted residue contacts. Protein Sci 2011; 21:299-305. [PMID: 22102360 DOI: 10.1002/pro.2002] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Received: 10/09/2011] [Revised: 11/08/2011] [Accepted: 11/10/2011] [Indexed: 11/12/2022]
Abstract
Residue contacts predicted from correlated positions in a multiple sequence alignment are often sparse and uncertain. To some extent, these limitations in the data can be overcome by grouping the contacts by secondary structure elements and enumerating the possible packing arrangements of these elements in a combinatorial manner. Strong interactions appear frequently but inconsistent interactions are down-weighted and missing interactions up-weighted. The resulting improved consistency in the predicted interactions has allowed the method to be successfully applied to proteins up to 200 residues in length which is larger than any structure previously predicted using sequence data alone.
Affiliation(s)
- William R Taylor
  - Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London, United Kingdom
13
Taylor WR, Sadowski MI. Structural constraints on the covariance matrix derived from multiple aligned protein sequences. PLoS One 2011; 6:e28265. [PMID: 22194819 PMCID: PMC3237328 DOI: 10.1371/journal.pone.0028265] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 09/13/2011] [Accepted: 11/04/2011] [Indexed: 11/18/2022]
Abstract
Residue contact predictions were calculated based on the mutual information observed between pairs of positions in large multiple protein sequence alignments. Where previously only the statistical properties of these data have been considered important, we introduce new measures to impose constraints that make the contact map more consistent with a three-dimensional structure. These included global (bulk) properties and local secondary structure properties. The latter allowed the contact constraints to be employed at the level of filtering pairs of secondary structure contacts, which led to a more efficient (lower-level) implementation in the PLATO structure prediction server. Where previously the measure of success with this method had been whether the correct fold was predicted in the top 10 ranked models, with the current implementation, our summary statistic is the number of correct folds included in the top 10 models, which is on average over 50 percent.
Affiliation(s)
- William R Taylor
  - Division of Mathematical Biology, MRC National Institute for Medical Research, London, United Kingdom
14
Sadowski MI, Maksimiak K, Taylor WR. Direct correlation analysis improves fold recognition. Comput Biol Chem 2011; 35:323-32. [PMID: 22000804 PMCID: PMC3267019 DOI: 10.1016/j.compbiolchem.2011.08.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 07/15/2011] [Revised: 08/11/2011] [Accepted: 08/11/2011] [Indexed: 11/23/2022]
Abstract
The extraction of correlated mutations through the method of direct information (DI) provides predicted contact residue pairs that can be used to constrain the three dimensional structures of proteins. We apply this method to a large set of decoy protein folds consisting of many thousand well-constructed models, only tens of which have the correct fold. We find that DI is able to greatly improve the ranking of the true (native) fold but others still remain high scoring that would be difficult to discard due to small shifts in the core beta sheets.
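One simple way to use predicted contact pairs to rank decoy models is to score each model by the fraction of predicted pairs that are actually close in space. The abstract does not describe the authors' scoring function, so the sketch below is an assumption made for illustration only: the 8 Å cutoff, the one-coordinate-per-residue representation, and the names `contact_satisfaction` and `rank_decoys` are hypothetical.

```python
import math


def contact_satisfaction(coords, predicted_pairs, cutoff=8.0):
    """Fraction of predicted contact pairs within `cutoff` in one model.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    predicted_pairs: list of (i, j) 0-based residue index pairs.
    """
    if not predicted_pairs:
        return 0.0
    satisfied = sum(
        1 for i, j in predicted_pairs
        if math.dist(coords[i], coords[j]) <= cutoff
    )
    return satisfied / len(predicted_pairs)


def rank_decoys(decoys, predicted_pairs, cutoff=8.0):
    """Return decoy names sorted from best to worst contact satisfaction."""
    return sorted(
        decoys,
        key=lambda name: contact_satisfaction(decoys[name], predicted_pairs, cutoff),
        reverse=True,
    )


# Toy decoy set: a compact model and a fully extended one.
toy_decoys = {
    "extended": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)],
    "compact": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)],
}
toy_pairs = [(0, 3), (0, 2)]
toy_ranking = rank_decoys(toy_decoys, toy_pairs)
```

A score of this kind cannot separate models that differ only by small shifts in the core beta sheets, since shifted strands can still satisfy most of the same contacts, which is consistent with the limitation the abstract reports.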
Affiliation(s)
- William R. Taylor
  - Corresponding author. Tel.: +44 208 816 2298; fax: +44 208 816 2460.
15
Cetin H, Sasaki TN, Sasai M. The Fragment-based Consistency Score in Model Quality Assessment for De Novo Prediction of Protein Structures. Chem-Bio Informatics Journal 2011. [DOI: 10.1273/cbij.11.63] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Affiliation(s)
- Hikmet Cetin
  - Department of Computational Science and Engineering, Nagoya University
- Masaki Sasai
  - Department of Computational Science and Engineering, Nagoya University
  - School of Computational Sciences, Korea Institute for Advanced Study
  - Okazaki Institute for Integrative Bioscience
16
Jeong CS, Kim D. Linear predictive coding representation of correlated mutation for protein sequence alignment. BMC Bioinformatics 2010; 11 Suppl 2:S2. [PMID: 20406500 PMCID: PMC3165164 DOI: 10.1186/1471-2105-11-s2-s2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
Background: Although both conservation and correlated mutation (CM) are important information reflecting different sorts of context in a multiple sequence alignment, most alignment methods use sequence profiles that represent only conservation. There is not yet a general way to represent correlated mutation and incorporate it into sequence alignment. Methods: We develop a novel method, CM profile, to represent correlated mutation as the spectral feature derived by using linear predictive coding, where correlated mutations among different positions are represented by a fixed number of values. We combine CM profile with the conventional sequence profile to improve alignment quality. Results: For distantly related protein pairs, using CM profile improves the profile-profile alignment with or without predicted secondary structure. In particular, at the superfamily level, combining CM profile with sequence profile improves profile-profile alignment by 9.5%, while predicted secondary structure does so by 6.0%. More significantly, using both of them improves profile-profile alignment by 13.9%. We also exemplify the effectiveness of CM profile by demonstrating that the resulting alignment preserves shared coevolution and contacts. Conclusions: In this work, we introduce a novel method, CM profile, which represents correlated mutation information as paralleled form, and apply it to the protein sequence alignment problem. When combined with the conventional sequence profile, CM profile improves alignment quality significantly more than predicted secondary structure information does, which should be beneficial for target-template alignment in protein structure prediction. Because of the generality of CM profile, it can be used for other bioinformatics applications in the same way as a sequence profile.
Affiliation(s)
- Chan-seok Jeong
  - Department of Bio and Brain Engineering, KAIST, 373-1 Guseong-dong, Yuseong-gu, Daejeon, 305-701, Korea
17
Weimer KME, Shane BL, Brunetto M, Bhattacharyya S, Hati S. Evolutionary basis for the coupled-domain motions in Thermus thermophilus leucyl-tRNA synthetase. J Biol Chem 2009; 284:10088-99. [PMID: 19188368 PMCID: PMC2665063 DOI: 10.1074/jbc.m807361200] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 09/23/2008] [Revised: 01/30/2009] [Indexed: 11/06/2022]
Abstract
Aminoacyl-tRNA synthetases are multidomain proteins that catalyze the covalent attachment of amino acids to their cognate transfer RNA. Various domains of an aminoacyl-tRNA synthetase perform their specific functions in a highly coordinated manner to maintain high accuracy in protein synthesis in cells. The coordination of their function, therefore, requires communication between domains. In this study we explored the relevance of enzyme motion in domain-domain communications. Specifically, we attempted to probe whether the communication between distantly located domains of a multidomain protein is accomplished through a coordinated movement of structural elements. We investigated the collective motion in Thermus thermophilus leucyl-tRNA synthetase by studying the low frequency normal modes. We identified the mode that best described the experimentally observed conformational changes of T. thermophilus leucyl-tRNA synthetase upon substrate binding and analyzed the correlated and anticorrelated motions between different domains. Furthermore, we used statistical coupling analysis to explore if the amino acid pairs and/or clusters whose motions are thermally coupled have also coevolved. Our study demonstrates that a small number of residues belong to the category whose coupled thermal motions correspond to evolutionary coupling as well. These residue clusters constitute a distinguished set of interacting networks that are sparsely distributed in the domain interface. Residues of these networking clusters are within van der Waals contact, and we suggest that they are critical in the propagation of long range mechanochemical motions in T. thermophilus leucyl-tRNA synthetase.