1
Hollmann F, Sanchis J, Reetz MT. Learning from Protein Engineering by Deconvolution of Multi-Mutational Variants. Angew Chem Int Ed Engl 2024; 63:e202404880. [PMID: 38884594 DOI: 10.1002/anie.202404880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 06/05/2024] [Accepted: 06/06/2024] [Indexed: 06/18/2024]
Abstract
This review analyzes a development in biochemistry, enzymology and biotechnology that originally came as a surprise. Following the establishment of directed evolution of stereoselective enzymes in organic chemistry, the concept of partial or complete deconvolution of selective multi-mutational variants was introduced. Early deconvolution experiments of stereoselective variants led to the finding that mutations can interact cooperatively or antagonistically with one another, not just additively. During the past decade, this phenomenon was shown to be general. In some studies, molecular dynamics (MD) and quantum mechanics/molecular mechanics (QM/MM) computations were performed in order to shed light on the origin of non-additivity at all stages of an evolutionary upward climb. Data of complete deconvolution can be used to construct unique multi-dimensional rugged fitness pathway landscapes, which provide mechanistic insights different from traditional fitness landscapes. Along a related line, biochemists have long tested the result of introducing two point mutations in an enzyme for mechanistic reasons, followed by a comparison of the respective double mutant in so-called double mutant cycles, which originally showed only additive effects, but more recently also uncovered cooperative and antagonistic non-additive effects. We conclude with suggestions for future work, and call for a unified overall picture of non-additivity and epistasis.
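The additive, cooperative, and antagonistic outcomes described in this abstract can be illustrated with a minimal sketch. It assumes that each variant's effect is measured relative to the wild type on an additive scale where larger is better (for example, a free-energy difference). The function name, the tolerance, and the numbers are hypothetical and are not taken from the review.

```python
def classify_interaction(effect_a, effect_b, effect_ab, tol=1e-9):
    """Compare a double mutant with the sum of its two single mutants.

    All effects are relative to the wild type (assumed convention:
    larger means better). Returns a label and the interaction term.
    """
    # Interaction term: what the double mutant shows beyond simple additivity.
    interaction = effect_ab - (effect_a + effect_b)
    if abs(interaction) <= tol:
        return "additive", interaction
    if interaction > 0:
        return "cooperative", interaction
    return "antagonistic", interaction


# Hypothetical effects for mutations A and B and the double mutant AB.
for effect_ab in (1.5, 2.5, 0.75):
    label, term = classify_interaction(1.0, 0.5, effect_ab)
    print(f"AB = {effect_ab}: {label} (interaction = {term:+.2f})")
```

Applying the same comparison to every intermediate of a multi-mutational variant is one way to picture complete deconvolution, although the review's own analysis may use different conventions.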
Affiliation(s)
- Frank Hollmann
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, 2629HZ, Delft, Netherlands
- Joaquin Sanchis
  - Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, 3052, Australia
- Manfred T Reetz
  - Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45481, Mülheim, Germany
  - Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
2
Middendorf L, Ravi Iyengar B, Eicholt LA. Sequence, Structure, and Functional Space of Drosophila De Novo Proteins. Genome Biol Evol 2024; 16:evae176. [PMID: 39212966 PMCID: PMC11363682 DOI: 10.1093/gbe/evae176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/29/2024] [Indexed: 09/04/2024] Open
Abstract
During de novo emergence, new protein coding genes emerge from previously nongenic sequences. The de novo proteins they encode are dissimilar in composition and predicted biochemical properties to conserved proteins. However, functional de novo proteins indeed exist. Both identification of functional de novo proteins and their structural characterization are experimentally laborious. To identify functional and structured de novo proteins in silico, we applied recently developed machine learning based tools and found that most de novo proteins are indeed different from conserved proteins both in their structure and sequence. However, some de novo proteins are predicted to adopt known protein folds, participate in cellular reactions, and to form biomolecular condensates. Apart from broadening our understanding of de novo protein evolution, our study also provides a large set of testable hypotheses for focused experimental studies on structure and function of de novo proteins in Drosophila.
Affiliation(s)
- Lasse Middendorf
  - Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
- Bharat Ravi Iyengar
  - Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
- Lars A Eicholt
  - Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
3
Arpita K, Sharma S, Srivastava H, Kumar K, Mushtaq M, Gupta P, Jain R, Gaikwad K. Genome-wide survey, molecular evolution and expression analysis of Auxin Response Factor (ARF) gene family indicating their key role in seed number per pod in pigeonpea (C. cajan L. Millsp.). Int J Biol Macromol 2023; 253:126833. [PMID: 37709218 DOI: 10.1016/j.ijbiomac.2023.126833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 08/26/2023] [Accepted: 09/06/2023] [Indexed: 09/16/2023]
Abstract
Auxin Response Factors (ARF) are a family of transcription factors that mediate auxin signalling and regulate multiple biological processes. Their crucial role in increasing plant biomass/yield motivated this study, in which a systematic analysis of the ARF gene family was carried out to identify the key proteins controlling embryo/seed developmental pathways in pigeonpea. A genome-wide scan revealed the presence of 12 ARF genes in pigeonpea, distributed across chromosomes 1, 3, 4, 8 and 11. Domain analysis of ARF proteins showed the presence of B3 DNA binding, AUX response, and IAA domains. The majority of them are of nuclear origin, and do not exhibit the level of genomic expansion observed in Glycine max (51 members). The duplication events seem to range from 31.6 to 42.3 million years ago (mya). Promoter analysis revealed the presence of multiple cis-acting elements related to stress responses, hormone signalling and other developmental processes. The expression atlas data highlighted the expression of CcARF8 in hypocotyl, bud and flower, whereas CcARF7 expression was significantly high in pod. The real-time expression of CcARF2, CcARF3 and CcARF18 was highest in genotypes with high seed number, indicating their key role in regulating embryo development and determining seed set in pigeonpea.
Affiliation(s)
- Kumari Arpita
  - ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India
- Sandhya Sharma
  - ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India
- Harsha Srivastava
  - ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India
- Kuldeep Kumar
  - ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India
  - ICAR-Indian Institute of Pulses Research, Kanpur, Uttar Pradesh 208024, India
- Muntazir Mushtaq
  - Shoolini University of Biotechnology and Management Sciences, Himachal Pradesh 173229, India
- Palak Gupta
  - ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India
- Rishu Jain
  - ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India
- Kishor Gaikwad
  - ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India
4
Dubreuil B, Levy ED. Abundance Imparts Evolutionary Constraints of Similar Magnitude on the Buried, Surface, and Disordered Regions of Proteins. Front Mol Biosci 2021; 8:626729. [PMID: 33996892 PMCID: PMC8119896 DOI: 10.3389/fmolb.2021.626729] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/06/2020] [Accepted: 03/29/2021] [Indexed: 12/02/2022] Open
Abstract
An understanding of the forces shaping protein conservation is key, both for the fundamental knowledge it represents and to allow for optimal use of evolutionary information in practical applications. Sequence conservation is typically examined at one of two levels. The first is a residue-level, where intra-protein differences are analyzed and the second is a protein-level, where inter-protein differences are studied. At a residue level, we know that solvent-accessibility is a prime determinant of conservation. By inverting this logic, we inferred that disordered regions are slightly more solvent-accessible on average than the most exposed surface residues in domains. By integrating abundance information with evolutionary data within and across proteins, we confirmed a previously reported strong surface-core association in the evolution of structured regions, but we found a comparatively weak association between disordered and structured regions. The facts that disordered and structured regions experience different structural constraints and evolve independently provide a unique setup to examine an outstanding question: why is a protein’s abundance the main determinant of its sequence conservation? Indeed, any structural or biophysical property linked to the abundance-conservation relationship should increase the relative conservation of regions concerned with that property (e.g., disordered residues with mis-interactions, domain residues with misfolding). Surprisingly, however, we found the conservation of disordered and structured regions to increase in equal proportion with abundance. This observation implies that either abundance-related constraints are structure-independent, or multiple constraints apply to different regions and perfectly balance each other.
Affiliation(s)
- Benjamin Dubreuil
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Emmanuel D Levy
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
5
Spence MA, Mortimer MD, Buckle AM, Minh BQ, Jackson CJ. A Comprehensive Phylogenetic Analysis of the Serpin Superfamily. Mol Biol Evol 2021; 38:2915-2929. [PMID: 33744972 PMCID: PMC8233489 DOI: 10.1093/molbev/msab081] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Indexed: 12/26/2022] Open
Abstract
Serine protease inhibitors (serpins) are found in all kingdoms of life and play essential roles in multiple physiological processes. Owing to the diversity of the superfamily, phylogenetic analysis is challenging and prokaryotic serpins have been speculated to have been acquired from Metazoa through horizontal gene transfer due to their unexpectedly high homology. Here, we have leveraged a structural alignment of diverse serpins to generate a comprehensive 6,000-sequence phylogeny that encompasses serpins from all kingdoms of life. We show that in addition to a central “hub” of highly conserved serpins, there has been extensive diversification of the superfamily into many novel functional clades. Our analysis indicates that the hub proteins are ancient and are similar because of convergent evolution, rather than the alternative hypothesis of horizontal gene transfer. This work clarifies longstanding questions in the evolution of serpins and provides new directions for research in the field of serpin biology.
Affiliation(s)
- Matthew A Spence
  - Research School of Chemistry, Australian National University, Canberra, ACT, Australia
- Matthew D Mortimer
  - Research School of Chemistry, Australian National University, Canberra, ACT, Australia
- Ashley M Buckle
  - Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Melbourne, VIC, Australia
- Bui Quang Minh
  - Research School of Computing and Research School of Biology, Australian National University, Canberra, ACT, Australia
- Colin J Jackson
  - Research School of Chemistry, Australian National University, Canberra, ACT, Australia
  - Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT, Australia
  - Australian Research Council Centre of Excellence in Synthetic Biology, Research School of Chemistry, Australian National University, Canberra, ACT, Australia
6
Auslander N, Gussow AB, Koonin EV. Incorporating Machine Learning into Established Bioinformatics Frameworks. Int J Mol Sci 2021; 22:2903. [PMID: 33809353 PMCID: PMC8000113 DOI: 10.3390/ijms22062903] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 02/15/2021] [Revised: 03/08/2021] [Accepted: 03/10/2021] [Indexed: 12/23/2022] Open
Abstract
The exponential growth of biomedical data in recent years has urged the application of numerous machine learning techniques to address emerging problems in biology and clinical research. By enabling the automatic feature extraction, selection, and generation of predictive models, these methods can be used to efficiently study complex biological systems. Machine learning techniques are frequently integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the best interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework with techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning, and, in particular, deep learning in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.
Affiliation(s)
- Eugene V. Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
7
Structure and function of naturally evolved de novo proteins. Curr Opin Struct Biol 2021; 68:175-183. [PMID: 33567396 DOI: 10.1016/j.sbi.2020.11.010] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 10/01/2020] [Revised: 11/16/2020] [Accepted: 11/27/2020] [Indexed: 01/05/2023]
Abstract
Comparative evolutionary genomics has revealed that novel protein coding genes can emerge randomly from non-coding DNA. While most of the myriad of transcripts which continuously emerge vanish rapidly, some attain regulatory regions, become translated and survive. More surprisingly, sequence properties of de novo proteins are almost indistinguishable from randomly obtained sequences, yet de novo proteins may gain functions and integrate into eukaryotic cellular networks quite easily. We here discuss current knowledge on de novo proteins, their structures, functions and evolution. Since the existence of de novo proteins seems at odds with decade-long attempts to construct proteins with novel structures and functions from scratch, we suggest that a better understanding of de novo protein evolution may fuel new strategies for protein design.
8
Cambridge SB. Hypothesis: protein and RNA attributes are continuously optimized over time. BMC Genomics 2019; 20:1012. [PMID: 31870287 PMCID: PMC6929361 DOI: 10.1186/s12864-019-6371-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/16/2019] [Accepted: 12/05/2019] [Indexed: 02/01/2023] Open
Abstract
Background: Little is known about why proteins and RNAs exhibit half-lives varying over several orders of magnitude. Despite many efforts, a conclusive link between half-lives and gene function could not be established, suggesting that other determinants may influence these molecular attributes.
Results: Here, I find that with increasing gene age there is a gradual and significant increase of protein and RNA half-lives, protein structure, and other molecular attributes that tend to affect protein abundance. These observations are accommodated in a hypothesis which posits that new genes at ‘birth’ are not optimized and thus their products exhibit low half-lives and less structure, but continuous mutagenesis eventually improves these attributes. Thus, the protein and RNA products of the oldest genes obtained their high degrees of stability and structure only after billions of years, while the products of younger genes had less time to be optimized and are therefore less stable and structured. Because more stable proteins with lower turnover require less transcription to maintain the same level of abundance, reduced transcription-associated mutagenesis (TAM) would fixate the changes by increasing gene conservation.
Conclusions: Consequently, the currently observed diversity of molecular attributes is a snapshot of gene products being at different stages along their temporal path of optimization.
Affiliation(s)
- Sidney B Cambridge
  - Department of Functional Neuroanatomy, Heidelberg University, Heidelberg, Germany
9
Razban RM, Gilson AI, Durfee N, Strobelt H, Dinkla K, Choi JM, Pfister H, Shakhnovich EI. ProteomeVis: a web app for exploration of protein properties from structure to sequence evolution across organisms' proteomes. Bioinformatics 2018; 34:3557-3565. [PMID: 29741573 PMCID: PMC6184454 DOI: 10.1093/bioinformatics/bty370] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/01/2017] [Revised: 03/27/2018] [Accepted: 05/03/2018] [Indexed: 01/27/2023] Open
Abstract
Motivation: Protein evolution spans time scales and its effects span the length of an organism. A web app named ProteomeVis is developed to provide a comprehensive view of protein evolution in the Saccharomyces cerevisiae and Escherichia coli proteomes. ProteomeVis interactively creates protein chain graphs, where edges between nodes represent structure and sequence similarities within user-defined ranges, to study the long time scale effects of protein structure evolution. The short time scale effects of protein sequence evolution are studied by sequence evolutionary rate (ER) correlation analyses with protein properties that span from the molecular to the organismal level.
Results: We demonstrate the utility and versatility of ProteomeVis by investigating the distribution of edges per node in organismal protein chain universe graphs (oPCUGs) and putative ER determinants. S. cerevisiae and E. coli oPCUGs are scale-free with scaling constants of 1.79 and 1.56, respectively. Both scaling constants can be explained by a previously reported theoretical model describing protein structure evolution. Protein abundance most strongly correlates with ER among properties in ProteomeVis, with Spearman correlations of -0.49 (P-value < 10^-10) and -0.46 (P-value < 10^-10) for S. cerevisiae and E. coli, respectively. This result is consistent with previous reports that found protein expression to be the most important ER determinant.
Availability and implementation: ProteomeVis is freely accessible at http://proteomevis.chem.harvard.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rostam M Razban
  - Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Amy I Gilson
  - Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Niamh Durfee
  - Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Hendrik Strobelt
  - School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Kasper Dinkla
  - School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Jeong-Mo Choi
  - Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Hanspeter Pfister
  - School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Eugene I Shakhnovich
  - Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
10
Lacombe-Harvey MÈ, Brzezinski R, Beaulieu C. Chitinolytic functions in actinobacteria: ecology, enzymes, and evolution. Appl Microbiol Biotechnol 2018; 102:7219-7230. [PMID: 29931600 PMCID: PMC6097792 DOI: 10.1007/s00253-018-9149-4] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 04/05/2018] [Revised: 05/25/2018] [Accepted: 05/28/2018] [Indexed: 12/20/2022]
Abstract
Actinobacteria, a large group of Gram-positive bacteria, secrete a wide range of extracellular enzymes involved in the degradation of organic compounds and biopolymers including the ubiquitous aminopolysaccharides chitin and chitosan. While chitinolytic enzymes are distributed in all kingdoms of life, actinobacteria are recognized as particularly good decomposers of chitinous material and several members of this taxon carry impressive sets of genes dedicated to chitin and chitosan degradation. Degradation of these polymers in actinobacteria is dependent on endo- and exo-acting hydrolases as well as lytic polysaccharide monooxygenases. Actinobacterial chitinases and chitosanases belong to nine major families of glycosyl hydrolases that share no sequence similarity. In this paper, the distribution of chitinolytic actinobacteria within different ecosystems is examined and their chitinolytic machinery is described and compared to those of other chitinolytic organisms.
Affiliation(s)
- Ryszard Brzezinski
  - Département de biologie, Université de Sherbrooke, Sherbrooke, QC, J1K 2R1, Canada
- Carole Beaulieu
  - Département de biologie, Université de Sherbrooke, Sherbrooke, QC, J1K 2R1, Canada
11
Schiffrin B, Brockwell DJ, Radford SE. Outer membrane protein folding from an energy landscape perspective. BMC Biol 2017; 15:123. [PMID: 29268734 PMCID: PMC5740924 DOI: 10.1186/s12915-017-0464-5] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Indexed: 01/04/2023] Open
Abstract
The cell envelope is essential for the survival of Gram-negative bacteria. This specialised membrane is densely packed with outer membrane proteins (OMPs), which perform a variety of functions. How OMPs fold into this crowded environment remains an open question. Here, we review current knowledge about OMP folding mechanisms in vitro and discuss how the need to fold to a stable native state has shaped their folding energy landscapes. We also highlight the role of chaperones and the β-barrel assembly machinery (BAM) in assisting OMP folding in vivo and discuss proposed mechanisms by which this fascinating machinery may catalyse OMP folding.
Affiliation(s)
- Bob Schiffrin
  - Astbury Centre for Structural Molecular Biology, School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK
- David J Brockwell
  - Astbury Centre for Structural Molecular Biology, School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK
- Sheena E Radford
  - Astbury Centre for Structural Molecular Biology, School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK
12
Williams PD, Pollock DD, Goldstein RA. Functionality and the Evolution of Marginal Stability in Proteins: Inferences from Lattice Simulations. Evol Bioinform Online 2017. [DOI: 10.1177/117693430600200013] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 11/16/2022] Open
Abstract
It has been known for some time that many proteins are marginally stable. This has inspired several explanations. Having noted that the functionality of many enzymes is correlated with subunit motion, flexibility, or general disorder, some have suggested that marginally stable proteins should have an evolutionary advantage over proteins of differing stability. Others have suggested that stability and functionality are contradictory qualities, and that selection for both criteria results in marginally stable proteins, optimised to satisfy the competing design pressures. While these explanations are plausible, recent research simulating the evolution of model proteins has shown that selection for stability, ignoring any aspects of functionality, can result in marginally stable proteins because of the underlying makeup of protein sequence-space. We extend this research by simulating the evolution of proteins, using a computational protein model that equates functionality with binding and catalysis. In the model, marginal stability is not required for ligand-binding functionality and we observe no competing design pressures. The resulting proteins are marginally stable, again demonstrating that neutral evolution is sufficient for explaining marginal stability in observed proteins.
Affiliation(s)
- Paul D. Williams
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA
- David D. Pollock
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, 70803, USA
- Richard A. Goldstein
  - Mathematical Biology, National Institute for Medical Sciences, The Ridgeway, Mill Hill, London MW7 1AA, UK
13
Tian P, Best RB. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis. Biophys J 2017; 113:1719-1730. [PMID: 29045866 PMCID: PMC5647607 DOI: 10.1016/j.bpj.2017.08.039] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 04/24/2017] [Revised: 08/03/2017] [Accepted: 08/08/2017] [Indexed: 12/23/2022] Open
Abstract
Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance.
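The contrast between absolute and relative sequence capacity in this abstract can be made concrete with a small arithmetic sketch. It assumes a 20-letter amino acid alphabet, a hypothetical chain length of 35 residues for a small domain, and an absolute capacity equal to the Avogadro constant, which the abstract gives only as a value that is exceeded. None of these numbers are estimates from the paper.

```python
import math

AVOGADRO = 6.02214076e23  # Avogadro constant, per mole


def log10_total_sequences(length, alphabet=20):
    """log10 of the number of possible sequences of a given length."""
    return length * math.log10(alphabet)


def log10_relative_capacity(log10_capacity, length, alphabet=20):
    """log10 of (foldable sequences / all possible sequences)."""
    return log10_capacity - log10_total_sequences(length, alphabet)


length = 35  # hypothetical length of a small domain
total = log10_total_sequences(length)
relative = log10_relative_capacity(math.log10(AVOGADRO), length)
print(f"log10(all sequences of length {length}) = {total:.1f}")
print(f"log10(relative capacity at Avogadro-scale capacity) = {relative:.1f}")
```

Working in log10 keeps the numbers manageable and shows why a very large absolute capacity can still be a tiny fraction of sequence space: the total number of sequences grows as 20 raised to the chain length.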
Affiliation(s)
- Pengfei Tian
  - Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
- Robert B Best
  - Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
14
The Role of Evolutionary Selection in the Dynamics of Protein Structure Evolution. Biophys J 2017; 112:1350-1365. [PMID: 28402878 DOI: 10.1016/j.bpj.2017.02.029] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 10/31/2016] [Revised: 02/16/2017] [Accepted: 02/22/2017] [Indexed: 02/05/2023] Open
Abstract
Homology modeling is a powerful tool for predicting a protein's structure. This approach is successful because proteins whose sequences are only 30% identical still adopt the same structure, while structure similarity rapidly deteriorates beyond the 30% threshold. By studying the divergence of protein structure as sequence evolves in real proteins and in evolutionary simulations, we show that this nonlinear sequence-structure relationship emerges as a result of selection for protein folding stability in divergent evolution. Fitness constraints prevent the emergence of unstable protein evolutionary intermediates, thereby enforcing evolutionary paths that preserve protein structure despite broad sequence divergence. However, on longer timescales, evolution is punctuated by rare events where the fitness barriers obstructing structure evolution are overcome and discovery of new structures occurs. We outline biophysical and evolutionary rationale for broad variation in protein family sizes, prevalence of compact structures among ancient proteins, and more rapid structure evolution of proteins with lower packing density.
15
Abrusán G, Marsh JA. Alpha Helices Are More Robust to Mutations than Beta Strands. PLoS Comput Biol 2016; 12:e1005242. [PMID: 27935949 PMCID: PMC5147804 DOI: 10.1371/journal.pcbi.1005242] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 05/26/2016] [Accepted: 11/08/2016] [Indexed: 12/30/2022] Open
Abstract
The rapidly increasing amount of data on human genetic variation has resulted in a growing demand to identify pathogenic mutations computationally, as their experimental validation is currently beyond reach. Here we show that alpha helices and beta strands differ significantly in their ability to tolerate mutations: helices can accumulate more mutations than strands without change, due to the higher numbers of inter-residue contacts in helices. This results in two patterns: a) the same number of mutations causes less structural change in helices than in strands; b) helices diverge more rapidly in sequence than strands within the same domains. Additionally, both helices and strands are significantly more robust than coils. Based on this observation we show that human missense mutations that change secondary structure are more likely to be pathogenic than those that do not. Moreover, inclusion of predicted secondary structure changes shows significant utility for improving upon state-of-the-art pathogenicity predictions.
The factors that determine the robustness and evolvability of proteins are still largely unknown. In this work the authors show that different secondary structure elements of proteins (helices and strands) differ in their ability to tolerate mutations, and demonstrate that this is caused by differences in the number of non-covalent residue interactions within these secondary structure units. The results suggest that engineering de novo all-alpha proteins should be easier than all-beta ones, as more sequences can fold to the same topology. Additionally, secondary structure can be used to improve current methods of pathogenicity prediction; mutations that change secondary structure are more likely to be pathogenic than mutations that do not, due to their strong destabilizing effect on protein structure.
Affiliation(s)
- György Abrusán
  - MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, United Kingdom
  - Institute of Biochemistry, Biological Research Centre of the Hungarian Academy of Sciences, Szeged, Temesvári krt. 62, Hungary
- Joseph A. Marsh
  - MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, United Kingdom
|
16
|
Echave J, Spielman SJ, Wilke CO. Causes of evolutionary rate variation among protein sites. Nat Rev Genet 2016; 17:109-21. [PMID: 26781812 DOI: 10.1038/nrg.2015.18] [Citation(s) in RCA: 180] [Impact Index Per Article: 22.5] [Indexed: 12/13/2022]
Abstract
It has long been recognized that certain sites within a protein, such as sites in the protein core or catalytic residues in enzymes, are evolutionarily more conserved than other sites. However, our understanding of rate variation among sites remains surprisingly limited. Recent progress to address this includes the development of a wide array of reliable methods to estimate site-specific substitution rates from sequence alignments. In addition, several molecular traits have been identified that correlate with site-specific mutation rates, and novel mechanistic biophysical models have been proposed to explain the observed correlations. Nonetheless, current models explain, at best, approximately 60% of the observed variance, highlighting the limitations of current methods and models and the need for new research directions.
Affiliation(s)
- Julian Echave
  - Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina
- Stephanie J Spielman
  - Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
- Claus O Wilke
  - Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
|
17
|
Sammond DW, Kastelowitz N, Himmel ME, Yin H, Crowley MF, Bomble YJ. Comparing Residue Clusters from Thermophilic and Mesophilic Enzymes Reveals Adaptive Mechanisms. PLoS One 2016; 11:e0145848. [PMID: 26741367 PMCID: PMC4704809 DOI: 10.1371/journal.pone.0145848] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 09/02/2015] [Accepted: 12/09/2015] [Indexed: 11/18/2022] Open
Abstract
Understanding how proteins adapt to function at high temperatures is important for deciphering the energetics that dictate protein stability and folding. While multiple principles important for thermostability have been identified, we lack a unified understanding of how the internal structural and chemical environment of a protein determines the qualitative or quantitative impact of evolutionary mutations. In this work we compare equivalent clusters of spatially neighboring residues between paired thermophilic and mesophilic homologues to evaluate adaptations under the selective pressure of high temperature. We find the residue clusters in thermophilic enzymes generally display improved atomic packing compared to mesophilic enzymes, in agreement with previous research. Unlike residue clusters from mesophilic enzymes, however, thermophilic residue clusters do not have significant cavities. In addition, anchor residues found in many clusters are highly conserved with respect to atomic packing between both thermophilic and mesophilic enzymes. Thus the improvements in atomic packing observed in thermophilic homologues are not derived from these anchor residues but from neighboring positions, which may serve to expand optimized protein core regions.
Affiliation(s)
- Deanne W Sammond
  - Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado, 80401, United States of America
- Noah Kastelowitz
  - Department of Chemistry & Biochemistry and the BioFrontiers Institute, University of Colorado, Boulder, Colorado, 80309, United States of America
- Michael E Himmel
  - Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado, 80401, United States of America
- Hang Yin
  - Department of Chemistry & Biochemistry and the BioFrontiers Institute, University of Colorado, Boulder, Colorado, 80309, United States of America
- Michael F Crowley
  - Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado, 80401, United States of America
- Yannick J Bomble
  - Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado, 80401, United States of America
|
18
|
Tóth-Petróczy A, Tawfik DS. The robustness and innovability of protein folds. Curr Opin Struct Biol 2014; 26:131-8. [PMID: 25038399 DOI: 10.1016/j.sbi.2014.06.007] [Citation(s) in RCA: 93] [Impact Index Per Article: 9.3] [Received: 11/26/2013] [Revised: 06/26/2014] [Accepted: 06/26/2014] [Indexed: 11/30/2022]
Abstract
Assignment of protein folds to functions indicates that >60% of folds carry out one or two enzymatic functions, while few folds, for example, the TIM-barrel and Rossmann folds, exhibit hundreds. Are there structural features that make a fold amenable to functional innovation (innovability)? Do these features relate to robustness--the ability to readily accumulate sequence changes? We discuss several hypotheses regarding the relationship between the architecture of a protein and its evolutionary potential. We describe how, in a seemingly paradoxical manner, opposite properties, such as high stability and rigidity versus conformational plasticity and structural order versus disorder, promote robustness and/or innovability. We hypothesize that polarity--differentiation and low connectivity between a protein's scaffold and its active-site--is a key prerequisite for innovability.
Affiliation(s)
- Agnes Tóth-Petróczy
  - Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
- Dan S Tawfik
  - Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
|
19
|
Merging molecular mechanism and evolution: theory and computation at the interface of biophysics and evolutionary population genetics. Curr Opin Struct Biol 2014; 26:84-91. [PMID: 24952216 DOI: 10.1016/j.sbi.2014.05.005] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 02/03/2014] [Revised: 04/19/2014] [Accepted: 05/16/2014] [Indexed: 11/24/2022]
Abstract
The variation among sequences and structures in nature is determined both by physical laws and by evolutionary history. However, these two factors are traditionally investigated by disciplines with different emphasis and philosophy: molecular biophysics on the one hand and evolutionary population genetics on the other. Here, we review recent theoretical and computational approaches that address the crucial need to integrate these two disciplines. We first articulate the elements of these approaches. Then, we survey their contribution to our mechanistic understanding of molecular evolution, the polymorphisms in coding regions, the distribution of fitness effects (DFE) of mutations, the observed folding stability of proteins in nature, and the distribution of protein folds in genomes.
|
20
|
Waldispühl J, O'Donnell CW, Will S, Devadas S, Backofen R, Berger B. Simultaneous alignment and folding of protein sequences. J Comput Biol 2014; 21:477-91. [PMID: 24766258 DOI: 10.1089/cmb.2013.0163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Accurate comparative analysis tools for low-homology proteins remain a difficult challenge in computational biology, especially for sequence alignment and consensus folding problems. We present partiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm's complexity is polynomial in time and space. Algorithmically, partiFold-Align exploits sparsity in the set of super-secondary structure pairings and alignment candidates to achieve an effectively cubic running time for simultaneous pairwise alignment and folding. We demonstrate the efficacy of these techniques on transmembrane β-barrel proteins, an important yet difficult class of proteins with few known three-dimensional structures. Testing against structurally derived sequence alignments, partiFold-Align significantly outperforms state-of-the-art pairwise and multiple sequence alignment tools in the most difficult low-sequence homology case. It also improves secondary structure prediction where current approaches fail. Importantly, partiFold-Align requires no prior training. These general techniques are widely applicable to many more protein families (partiFold-Align is available at http://partifold.csail.mit.edu/ ).
|
21
|
Zhang X, Perica T, Teichmann SA. Evolution of protein structures and interactions from the perspective of residue contact networks. Curr Opin Struct Biol 2013; 23:954-63. [DOI: 10.1016/j.sbi.2013.07.004] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 06/20/2013] [Revised: 07/02/2013] [Accepted: 07/04/2013] [Indexed: 10/26/2022]
|
22
|
Mannige RV. Two modes of protein sequence evolution and their compositional dependencies. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2013; 87:062714. [PMID: 23848722 DOI: 10.1103/physreve.87.062714] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/14/2013] [Revised: 05/10/2013] [Indexed: 06/02/2023]
Abstract
Protein sequence evolution has resulted in a vast repertoire of molecular functionality crucial to life. Despite the central importance of sequence evolution to biology, our fundamental understanding of how sequence composition affects evolution is incomplete. This report describes the utilization of lattice model simulations of directed evolution, which indicate that, on average, peptide and protein evolvability is strongly dependent on initial sequence composition. The report also discusses two distinct regimes of sequence evolution by point mutation: (a) the "classical" mode where sequences "crawl" over free energy barriers towards acquiring a target fold, and (b) the "quantum" mode where sequences appear to "tunnel" through large energy barriers generally insurmountable by means of a crawl. Finally, the simulations indicate that oily and charged peptides are the most efficient substrates for evolution at the "classical" and "quantum" regimes, respectively, and that their respective response to temperature is commensurate with analogies made to barrier crossing in classical and quantum systems. On the whole, these results show that sequence composition can tune both the evolvability and the optimal mode of evolution of peptides and proteins.
Affiliation(s)
- Ranjan V Mannige
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA
|
23
|
Mohanty S, Purwar M, Srinivasan N, Rekha N. Tethering preferences of domain families co-occurring in multi-domain proteins. MOLECULAR BIOSYSTEMS 2013; 9:1708-25. [PMID: 23571467 DOI: 10.1039/c3mb25481j] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/25/2022]
Abstract
Genomic data of several organisms have revealed the presence of a vast repertoire of multi-domain proteins. The role played by individual domains in a multi-domain protein has a profound influence on the overall function of the protein. In the present analysis an attempt has been made to better understand the tethering preferences of domain families that occur in multi-domain proteins. The analysis has been carried out on an exhaustive dataset of 2 961 898 sequences of proteins from 930 organisms, where 741 274 proteins are comprised of at least two domain families. For every domain family, the number of other domain families with which it co-occurs within a protein in this dataset has been enumerated and is referred to as the tethering number of the domain family. It was found that, in the general dataset, the AAA ATPase family and the family of Ser/Thr kinases have the highest tethering numbers of 450 and 444 respectively. Further analysis reveals significant correlation between the number of members in a family and its tethering number. Positive correlation was also observed between the extent of sequence and functional diversity within a family and the tethering numbers of domain families. Domain families that are present ubiquitously in diverse organisms tend to have large tethering numbers, while organism/kingdom-specific families have low tethering numbers. Thus, the analysis uncovers how domain families recombine and evolve to give rise to multi-domain proteins.
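The tethering number defined in this abstract can be illustrated with a minimal Python sketch. The function name `tethering_numbers`, the input layout (one list of domain-family labels per protein) and the toy data are assumptions made for illustration; they are not taken from the paper or its dataset.

```python
from collections import defaultdict


def tethering_numbers(proteins):
    """Count, for each domain family, the distinct other families it co-occurs with.

    `proteins` is an iterable of domain-family label lists, one per protein
    (a hypothetical input layout chosen for this sketch).
    """
    partners = defaultdict(set)
    for families in proteins:
        unique = set(families)  # repeated domains within a protein count once
        for fam in unique:
            # Every other family in the same protein is a tethering partner.
            partners[fam] |= unique - {fam}
    return {fam: len(others) for fam, others in partners.items()}


# Hypothetical toy architectures, not real data.
toy_proteins = [
    ["AAA", "Kinase"],
    ["AAA", "SH3", "Kinase"],
    ["SH2"],
    ["SH3", "SH2"],
]
print(tethering_numbers(toy_proteins))
```

In this toy set, a family that only ever appears alone would get a tethering number of 0, which matches the definition as the count of other co-occurring families.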
Affiliation(s)
- Smita Mohanty
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
|
24
|
Dellus-Gur E, Toth-Petroczy A, Elias M, Tawfik DS. What makes a protein fold amenable to functional innovation? Fold polarity and stability trade-offs. J Mol Biol 2013; 425:2609-21. [PMID: 23542341 DOI: 10.1016/j.jmb.2013.03.033] [Citation(s) in RCA: 112] [Impact Index Per Article: 10.2] [Received: 02/11/2013] [Revised: 03/18/2013] [Accepted: 03/24/2013] [Indexed: 12/30/2022]
Abstract
Protein evolvability includes two elements--robustness (or neutrality, mutations having no effect) and innovability (mutations readily inducing new functions). How are these two conflicting demands bridged? Does the ability to bridge them relate to the observation that certain folds, such as TIM barrels, accommodate numerous functions, whereas other folds support only one? Here, we hypothesize that the key to innovability is polarity--an active site composed of flexible, loosely packed loops alongside a well-separated, highly ordered scaffold. We show that highly stabilized variants of TEM-1 β-lactamase exhibit selective rigidification of the enzyme's scaffold while the active-site loops maintained their conformational plasticity. Polarity therefore results in stabilizing, compensatory mutations not trading off, but instead promoting the acquisition of new activities. Indeed, computational analysis indicates that in folds that accommodate only one function throughout evolution, for example, dihydrofolate reductase, ≥ 60% of the active-site residues belong to the scaffold. In contrast, folds associated with multiple functions such as the TIM barrel show high scaffold-active-site polarity (~20% of the active site comprises scaffold residues) and >2-fold higher rates of sequence divergence at active-site positions. Our work suggests structural measures of fold polarity that appear to be correlated with innovability, thereby providing new insights regarding protein evolution, design, and engineering.
Affiliation(s)
- Eynat Dellus-Gur
  - Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
|
25
|
Micheletti C. Comparing proteins by their internal dynamics: exploring structure-function relationships beyond static structural alignments. Phys Life Rev 2012. [PMID: 23199577 DOI: 10.1016/j.plrev.2012.10.009] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.8] [Indexed: 01/27/2023]
Abstract
The growing interest in comparing protein internal dynamics owes much to the realisation that protein function can be accompanied or assisted by structural fluctuations and conformational changes. Analogously to the case of functional structural elements, those aspects of protein flexibility and dynamics that are functionally oriented should be subject to evolutionary conservation. Accordingly, dynamics-based protein comparisons or alignments could be used to detect protein relationships that are more elusive to sequence and structural alignments. Here we provide an account of the progress that has been made in recent years towards developing and applying general methods for comparing proteins in terms of their internal dynamics and advance the understanding of the structure-function relationship.
Affiliation(s)
- Cristian Micheletti
  - Scuola Internazionale Superiore di Studi Avanzati, via Bonomea 265, Trieste, Italy
|
26
|
Li Y, Fang J. PROTS-RF: a robust model for predicting mutation-induced protein stability changes. PLoS One 2012; 7:e47247. [PMID: 23077576 PMCID: PMC3471942 DOI: 10.1371/journal.pone.0047247] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 04/30/2012] [Accepted: 09/11/2012] [Indexed: 11/19/2022] Open
Abstract
The ability to improve protein thermostability via protein engineering is of great scientific interest and also has significant practical value. In this report we present PROTS-RF, a robust model based on the Random Forest algorithm capable of predicting thermostability changes induced by not only single-, but also double- or multiple-point mutations. The model is built using 41 features including evolutionary information, secondary structure, solvent accessibility and a set of fragment-based features. It achieves accuracies of 0.799, 0.782 and 0.787, and areas under receiver operating characteristic (ROC) curves of 0.873, 0.868 and 0.862 for single-, double- and multiple-point mutation datasets, respectively. Contrary to previous suggestions, our results clearly demonstrate that a robust predictive model trained for predicting single point mutation induced thermostability changes can be capable of predicting double and multiple point mutations. It also shows high levels of robustness in the tests using hypothetical reverse mutations. We demonstrate that testing datasets created based on physical principles can be highly useful for testing the robustness of predictive models.
Affiliation(s)
- Yunqi Li
  - Applied Bioinformatics Laboratory, The University of Kansas, Lawrence, Kansas, United States of America
- Jianwen Fang
  - Applied Bioinformatics Laboratory, The University of Kansas, Lawrence, Kansas, United States of America
|
27
|
Rorick M. Quantifying protein modularity and evolvability: a comparison of different techniques. Biosystems 2012; 110:22-33. [PMID: 22796584 DOI: 10.1016/j.biosystems.2012.06.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/15/2011] [Revised: 06/20/2012] [Accepted: 06/27/2012] [Indexed: 10/28/2022]
Abstract
Modularity increases evolvability by reducing constraints on adaptation and by allowing preexisting parts to function in new contexts for novel uses. Protein evolution provides an excellent context to study the causes and consequences of biological modularity. In order to address such questions, however, an index for protein modularity is necessary. This paper proposes a simple index for protein modularity, "module density", which is the number of evolutionarily independent modules that compose a protein divided by the number of amino acids in the protein. The decomposition of proteins into constituent modules can be accomplished by either of two classes of methods. The first class of methods relies on "suppositional" criteria to assign amino acids to modules, whereas the second class of methods relies on "coevolutionary" criteria for this task. One simple and practical method from the first class consists of approximating the number of modules in a protein as the number of regular secondary structure elements (i.e., helices and sheets). Methods based on coevolutionary criteria require more elaborate data, but they have the advantage of being able to specify modules without prior assumptions about why they exist. Given the increasing availability of datasets sampling protein mutational spectra (e.g., from comparative genomics, experimental evolution, and computational prediction), methods based on coevolutionary criteria will likely become more promising in the near future. The ability to meaningfully quantify protein modularity via simple indices has the potential to aid future efforts to understand protein evolutionary rate determinants, improve molecular evolution models and engineer novel proteins.
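As a rough illustration of the "module density" index, the sketch below uses the secondary-structure approximation mentioned in the abstract: the number of regular secondary structure elements divided by the number of residues. The per-residue string encoding (`H` for helix, `E` for strand, anything else for coil), the function names and the toy string are assumptions for this example, not the paper's own implementation.

```python
from itertools import groupby


def count_sse(ss):
    """Count contiguous runs of helix ('H') or strand ('E') in a per-residue string."""
    return sum(1 for state, _run in groupby(ss) if state in ("H", "E"))


def module_density(ss):
    """Approximate module density: secondary structure elements per residue."""
    if not ss:
        raise ValueError("secondary structure string must not be empty")
    return count_sse(ss) / len(ss)


# Hypothetical 19-residue assignment: helix, strand, helix separated by coil.
toy_ss = "CCHHHHCCEEEECCHHHCC"
print(count_sse(toy_ss), module_density(toy_ss))
```

A coevolution-based decomposition, the second class of methods described in the abstract, would replace `count_sse` with a module count derived from covariation data; that is outside the scope of this sketch.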
Affiliation(s)
- Mary Rorick
  - University of Michigan, Department of Ecology and Evolutionary Biology, Ann Arbor, MI 48109-1048, United States
|
28
|
Hogeweg P. Toward a theory of multilevel evolution: long-term information integration shapes the mutational landscape and enhances evolvability. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2012; 751:195-224. [PMID: 22821460 DOI: 10.1007/978-1-4614-3567-9_10] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]
Abstract
Most of evolutionary theory has abstracted away from how information is coded in the genome and how this information is transformed into traits on which selection takes place. While in the earliest stages of biological evolution, in the RNA world, the mapping from the genotype into function was largely predefined by the physical-chemical properties of the evolving entities (RNA replicators, e.g. from sequence to folded structure and catalytic sites), in present-day organisms, the mapping itself is the result of evolution. I will review results of several in silico evolutionary studies which examine the consequences of evolving the genetic coding, and the ways this information is transformed, while adapting to prevailing environments. Such multilevel evolution leads to long-term information integration. Through genome, network, and dynamical structuring, the occurrence and/or effect of random mutations becomes nonrandom, and facilitates rapid adaptation. This is what does happen in the in silico experiments. Is it also what did happen in biological evolution? I will discuss some data that suggest that it did. In any case, these results provide us with novel search images to tackle the wealth of biological data.
Affiliation(s)
- Paulien Hogeweg
  - Theoretical Biology and Bioinformatics Group, Utrecht University, Utrecht, The Netherlands
|
29
|
The evolution of protein structures and structural ensembles under functional constraint. Genes (Basel) 2011; 2:748-62. [PMID: 24710290 PMCID: PMC3927589 DOI: 10.3390/genes2040748] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Received: 09/24/2011] [Revised: 10/15/2011] [Accepted: 10/19/2011] [Indexed: 02/06/2023] Open
Abstract
Protein sequence, structure, and function are inherently linked through evolution and population genetics. Our knowledge of protein structure comes from solved structures in the Protein Data Bank (PDB), our knowledge of sequence through sequences found in the NCBI sequence databases (http://www.ncbi.nlm.nih.gov/), and our knowledge of function through a limited set of in-vitro biochemical studies. How these intersect through evolution is described in the first part of the review. In the second part, our understanding of a series of questions is addressed. This includes how sequences evolve within structures, how evolutionary processes enable structural transitions, how the folding process can change through evolution and what the fitness impacts of this might be. Moving beyond static structures, the evolution of protein kinetics (including normal modes) is discussed, as is the evolution of conformational ensembles and structurally disordered proteins. This ties back to a question of the role of neostructuralization and how it relates to selection on sequences for functions. The relationship between metastability, the fitness landscape, sequence divergence, and organismal effective population size is explored. Lastly, a brief discussion of modeling the evolution of sequences of ordered and disordered proteins is entertained.
|
30
|
Rorick MM, Wagner GP. Protein structural modularity and robustness are associated with evolvability. Genome Biol Evol 2011; 3:456-75. [PMID: 21602570 PMCID: PMC3134980 DOI: 10.1093/gbe/evr046] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 01/21/2023] Open
Abstract
Theory suggests that biological modularity and robustness allow for maintenance of fitness under mutational change, and when this change is adaptive, for evolvability. Empirical demonstrations that these traits promote evolvability in nature remain scant, however. This is in part because modularity, robustness, and evolvability are difficult to define and measure in real biological systems. Here, we address whether structural modularity and/or robustness confer evolvability at the level of proteins by looking for associations between indices of protein structural modularity, structural robustness, and evolvability. We propose a novel index for protein structural modularity: the number of regular secondary structure elements (helices and strands) divided by the number of residues in the structure. We index protein evolvability as the proportion of sites with evidence of being under positive selection multiplied by the average rate of adaptive evolution at these sites, and we measure this as an average over a phylogeny of 25 mammalian species. We use contact density as an index of protein designability, and thus, structural robustness. We find that protein evolvability is positively associated with structural modularity as well as structural robustness and that the effect of structural modularity on evolvability is independent of the structural robustness index. We interpret these associations to be the result of reduced constraints on amino acid substitutions in highly modular and robust protein structures, which results in faster adaptation through natural selection.
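The evolvability index described in this abstract (proportion of positively selected sites multiplied by the mean adaptive rate at those sites) can be written down in a few lines. The sketch below is an illustration under assumptions: the function name, the per-site input lists and the toy values are hypothetical, and the averaging over a 25-species phylogeny mentioned in the abstract is not modelled.

```python
def evolvability_index(adaptive_rates, under_selection):
    """Proportion of positively selected sites times their mean adaptive rate.

    `adaptive_rates` holds one rate per site; `under_selection` holds one
    boolean per site (both are hypothetical inputs for this sketch).
    """
    if len(adaptive_rates) != len(under_selection):
        raise ValueError("inputs must have one entry per site")
    if not adaptive_rates:
        raise ValueError("at least one site is required")
    selected = [r for r, flag in zip(adaptive_rates, under_selection) if flag]
    if not selected:
        return 0.0  # no sites under positive selection
    proportion = len(selected) / len(adaptive_rates)
    mean_rate = sum(selected) / len(selected)
    return proportion * mean_rate


# Hypothetical four-site protein with two positively selected sites.
toy_rates = [0.0, 1.5, 0.0, 2.5]
toy_flags = [False, True, False, True]
print(evolvability_index(toy_rates, toy_flags))
```

Algebraically the product equals the summed adaptive rate at selected sites divided by the total number of sites, so the index grows both with how many sites are under selection and with how fast they evolve.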
Affiliation(s)
- Mary M Rorick
  - Department of Genetics, Yale University, New Haven, Connecticut, USA
|
31
|
The relationship between relative solvent accessibility and evolutionary rate in protein evolution. Genetics 2011; 188:479-88. [PMID: 21467571 DOI: 10.1534/genetics.111.128025] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.7] [Indexed: 11/18/2022] Open
Abstract
Recent work with Saccharomyces cerevisiae shows a linear relationship between the evolutionary rate of sites and the relative solvent accessibility (RSA) of the corresponding residues in the folded protein. Here, we aim to develop a mathematical model that can reproduce this linear relationship. We first demonstrate that two models that both seem reasonable choices (a simple model in which selection strength correlates with RSA and a more complex model based on RSA-dependent amino acid distributions) fail to reproduce the observed relationship. We then develop a model on the basis of observed site-specific amino acid distributions and show that this model behaves appropriately. We conclude that evolutionary rates are directly linked to the distribution of amino acids at individual sites. Because of this link, any future insight into the biophysical mechanisms that determine amino acid distributions will improve our understanding of evolutionary rates.
|
32
|
Systematic assessment of accuracy of comparative model of proteins belonging to different structural fold classes. J Mol Model 2011; 17:2831-7. [PMID: 21301906 DOI: 10.1007/s00894-011-0976-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/08/2010] [Accepted: 01/17/2011] [Indexed: 10/18/2022]
Abstract
In the absence of experimental structures, comparative modeling continues to be the chosen method for retrieving structural information on target proteins. However, models lack the accuracy of experimental structures. Alignment error and structural divergence (between target and template) influence model accuracy the most. Here, we examine the potential additional impact of backbone geometry, as our previous studies have suggested that the structural class (all-α, αβ, all-β) of a protein may influence the accuracy of its model. In the twilight zone (sequence identity ≤ 30%) and at a similar level of target-template divergence, the accuracy of protein models does indeed follow the trend all-α > αβ > all-β. This is mainly because the alignment accuracy follows the same trend (all-α > αβ > all-β), with backbone geometry playing only a minor role. Differences in the diversity of sequences belonging to different structural classes lead to the observed accuracy differences, thus enabling the accuracy of alignments/models to be estimated a priori in a class-dependent manner. This study provides a systematic description of and quantifies the structural class-dependent effect in comparative modeling. The study also suggests that datasets for large-scale sequence/structure analyses should have equal representations of different structural classes to avoid class-dependent bias.
|
33
|
Deeds EJ, Shakhnovich EI. A structure-centric view of protein evolution, design, and adaptation. ADVANCES IN ENZYMOLOGY AND RELATED AREAS OF MOLECULAR BIOLOGY 2010; 75:133-91, xi-xii. [PMID: 17124867 DOI: 10.1002/9780471224464.ch2] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
Proteins, by virtue of their central role in most biological processes, represent one of the key subjects of the study of molecular evolution. Inherent in the indispensability of proteins for living cells is the fact that a given protein can adopt a specific three-dimensional shape that is specified solely by the protein's sequence of amino acids. Over the past several decades, structural biologists have demonstrated that the array of structures that proteins may adopt is quite astounding, and this has led to a strong interest in understanding how protein structures change and evolve over time. In this review we consider a large body of recent work that attempts to illuminate this structure-centric picture of protein evolution. Much of this work has focused on the question of how completely new protein structures (i.e., new folds or topologies) are discovered by protein sequences as they evolve. Pursuant to this question of structural innovation has been a desire to describe and understand the observation that certain types of protein structures are far more abundant than others and how this uneven distribution of proteins bears on the process through which new shapes are discovered. We consider a number of theoretical models that have been successful at explaining this heterogeneity in protein populations and discuss the increasing amount of evidence that indicates that the process of structural evolution involves the divergence of protein sequences and structures from one another. We also consider the topic of protein designability, which concerns itself with understanding how a protein's structure influences the number of sequences that can fold successfully into that structure. Understanding and quantifying the relationship between the physical feature of a structure and its designability has been a long-standing goal of the study of protein structure and evolution, and we discuss a number of recent advances that have yielded a promising answer to this question. Finally, we review the relatively new field of protein structural phylogeny, an area of study in which information about the distribution of protein structures among different organisms is used to reconstruct the evolutionary relationships between them. Taken together, the work that we review presents an increasingly coherent picture of how these unique polymers have evolved over the course of life on Earth.
Affiliation(s)
- Eric J Deeds
- Department of Molecular and Cellular Biology, Harvard University, 7 Divinity Avenue, Cambridge, MA 02138, USA
|
34
|
|
35
|
Kleinman CL, Rodrigue N, Lartillot N, Philippe H. Statistical potentials for improved structurally constrained evolutionary models. Mol Biol Evol 2010; 27:1546-60. [PMID: 20159780 DOI: 10.1093/molbev/msq047] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Indexed: 12/27/2022] Open
Abstract
Assessing the influence of three-dimensional protein structure on sequence evolution is a difficult task, mainly because of the assumption of independence between sites required by probabilistic phylogenetic methods. Recently, models that include an explicit treatment of protein structure and site interdependencies have been developed: a statistical potential (an energy-like scoring system for sequence-structure compatibility) is used to evaluate the probability of fixation of a given mutation, assuming a coarse-grained protein structure that is constant through evolution. Yet, due to the novelty of these models and the small degree of overlap between the fields of structural and evolutionary biology, only simple representations of protein structure have been used so far. In this work, we present new forms of statistical potentials using a probabilistic framework recently developed for evolutionary studies. Terms related to pairwise distance interactions, torsion angles, solvent accessibility, and flexibility of the residues are included in the potentials, so as to study the effects of the main factors known to influence protein structure. The new potentials, with a more detailed representation of the protein structure, yield a better fit than the previously used scoring functions, with pairwise interactions contributing to more than half of this improvement. In a phylogenetic context, however, the structurally constrained models are still outperformed by some of the available site-independent models in terms of fit, possibly indicating that alternatives to coarse-grained statistical potentials should be explored in order to better model structural constraints.
Affiliation(s)
- Claudia L Kleinman
- Département de Biochimie, Centre Robert Cedergren, Université de Montréal, Montreal, Quebec, Canada.
|
36
|
Covariation of branch lengths in phylogenies of functionally related genes. PLoS One 2009; 4:e8487. [PMID: 20041191 PMCID: PMC2793527 DOI: 10.1371/journal.pone.0008487] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/23/2009] [Accepted: 11/25/2009] [Indexed: 12/05/2022] Open
Abstract
Recent studies have shown evidence for the coevolution of functionally-related genes. This coevolution is a result of constraints to maintain functional relationships between interacting proteins. The studies have focused on the correlation in gene tree branch lengths of proteins that are directly interacting with each other. We here hypothesize that the correlation in branch lengths is not limited only to proteins that directly interact, but also extends to proteins that operate within the same pathway. Using generalized linear models as a basis of identifying correlation, we attempted to predict the gene ontology (GO) terms of a gene based on its gene tree branch lengths. We applied our method to a dataset consisting of proteins from ten prokaryotic species. We found that the degree of accuracy to which we could predict the function of the proteins from their gene tree varied substantially with different GO terms. In particular, our model could accurately predict genes involved in translation and certain ribosomal activities with an area under the receiver-operator curve of up to 92%. Further analysis showed that the similarity between the trees of genes labeled with similar GO terms was not limited to genes that physically interacted, but also extended to genes functioning within the same pathway. We discuss the relevance of our findings as they relate to the use of phylogenetic methods in comparative genomics.
|
37
|
Lan T, Yang ZL, Yang X, Liu YJ, Wang XR, Zeng QY. Extensive functional diversification of the Populus glutathione S-transferase supergene family. The Plant Cell 2009; 21:3749-66. [PMID: 19996377 PMCID: PMC2814494 DOI: 10.1105/tpc.109.070219] [Citation(s) in RCA: 151] [Impact Index Per Article: 10.1] [Received: 07/24/2009] [Revised: 10/31/2009] [Accepted: 11/16/2009] [Indexed: 05/17/2023]
Abstract
Identifying how genes and their functions evolve after duplication is central to understanding gene family radiation. In this study, we systematically examined the functional diversification of the glutathione S-transferase (GST) gene family in Populus trichocarpa by integrating phylogeny, expression, substrate specificity, and enzyme kinetic data. GSTs are ubiquitous proteins in plants that play important roles in stress tolerance and detoxification metabolism. Genome annotation identified 81 GST genes in Populus that were divided into eight classes with distinct divergence in their evolutionary rate, gene structure, expression responses to abiotic stressors, and enzymatic properties of encoded proteins. In addition, when all the functional parameters were examined, clear divergence was observed within tandem clusters and between paralogous gene pairs, suggesting that subfunctionalization has taken place among duplicate genes. The two domains of GST proteins appear to have evolved under differential selective pressures. The C-terminal domain seems to have been subject to more relaxed functional constraints or divergent directional selection, which may have allowed rapid changes in substrate specificity, affinity, and activity, while maintaining the primary function of the enzyme. Our findings shed light on mechanisms that facilitate the retention of duplicate genes, which can result in a large gene family with a broad substrate spectrum and a wide range of reactivity toward different substrates.
Affiliation(s)
- Ting Lan
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Graduate School, Chinese Academy of Sciences, Beijing 100049, China
- Zhi-Ling Yang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Graduate School, Chinese Academy of Sciences, Beijing 100049, China
- Xue Yang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Graduate School, Chinese Academy of Sciences, Beijing 100049, China
- Yan-Jing Liu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Xiao-Ru Wang
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Department of Ecology and Environmental Science, Umeå Plant Science Centre, Umeå University, SE-901 87 Umeå, Sweden
- Qing-Yin Zeng
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Address correspondence to
|
38
|
Liu X, Zhao YP. Donut-shaped fingerprint in homologous polypeptide relationships--a topological feature related to pathogenic structural changes in conformational disease. J Theor Biol 2009; 258:294-301. [PMID: 19248793 PMCID: PMC7094133 DOI: 10.1016/j.jtbi.2009.02.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 09/19/2008] [Revised: 01/06/2009] [Accepted: 02/11/2009] [Indexed: 02/05/2023]
Abstract
Features of the homologous relationships of proteins can provide us with a general picture of the protein universe, assist protein design and analysis, and further our comprehension of the evolution of organisms. Here we carried out a study of the evolution of protein molecules by investigating homologous relationships among residue segments. The motivation was to identify detailed topological features of homologous relationships for short residue segments in the whole protein universe. Based on the data of a large number of non-redundant proteins, the universe of non-membrane polypeptides was analyzed by considering both residue mutations and structural conservation. By connecting homologous segments with edges, we obtained a homologous relationship network of the whole universe of short residue segments, which we named the graph of polypeptide relationships (GPR). Since the network is extremely complicated for topological transitions, to obtain an in-depth understanding, only subgraphs composed of vital nodes of the GPR were analyzed. Such analysis of vital subgraphs of the GPR revealed a donut-shaped fingerprint. Utilization of this topological feature revealed the switch sites (188-202; where previously hidden fibril-forming "hot spots" begin to be exposed, providing a further opportunity for protein aggregation) of the conformational conversion of the normal alpha-helix-rich prion protein PrP(C) to the beta-sheet-rich PrP(Sc) that is thought to be responsible for a group of fatal neurodegenerative diseases, transmissible spongiform encephalopathies. Efforts in analyzing other proteins related to various conformational diseases are also introduced.
Affiliation(s)
- Xin Liu
- Institute of Mechanics, Chinese Academy of Sciences, Beijing 100080, China
- Ya-Pu Zhao
- The State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences. Beijing 100190, China
|
39
|
Trifonov EN, Frenkel ZM. Evolution of protein modularity. Curr Opin Struct Biol 2009; 19:335-40. [PMID: 19386484 DOI: 10.1016/j.sbi.2009.03.007] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 02/05/2009] [Accepted: 03/16/2009] [Indexed: 10/20/2022]
Abstract
Proteins in their evolution appear to follow several discrete stages, which is reflected in their modular organization. The sequences of the protein modules are highly variable while their functions and structures are rather conserved. The relatedness of the variable sequences is well represented by the networks in natural protein sequence space, which also suggest evolutionary connections.
Affiliation(s)
- Edward N Trifonov
- Genome Diversity Center, Institute of Evolution, University of Haifa, Haifa 31905, Israel.
|
40
|
Gu J, Hilser VJ. Predicting the energetics of conformational fluctuations in proteins from sequence: a strategy for profiling the proteome. Structure 2009; 16:1627-37. [PMID: 19000815 DOI: 10.1016/j.str.2008.08.016] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 06/10/2008] [Revised: 08/07/2008] [Accepted: 08/19/2008] [Indexed: 11/30/2022]
Abstract
The abundance of dynamic and disordered regions in proteins suggests that structural determinants alone may not be sufficient to describe function. Instead, descriptors that account for the dynamic features of the energy landscape populated by the protein ensemble may be required. Here, we show that the thermodynamics of the dynamical complexity that imparts biological function can be largely reconstructed using sequence information alone, allowing thermodynamic characterization of entire proteomes without the need for structural analysis. We show that this tool can be used to analyze conserved energetic signatures within classes of proteins, as well as to compare the thermodynamic character of different proteomes.
Affiliation(s)
- Jenny Gu
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX, 77555-1068, USA
|
41
|
Abstract
Neutralism and selectionism are extremes of an explanatory spectrum for understanding patterns of molecular evolution and the emergence of evolutionary innovation. Although recent genome-scale data from protein-coding genes argue against neutralism, molecular engineering and protein evolution data argue that neutral mutations and mutational robustness are important for evolutionary innovation. Here I propose a reconciliation in which neutral mutations prepare the ground for later evolutionary adaptation. Key to this perspective is an explicit understanding of molecular phenotypes that has only become accessible in recent years.
|
42
|
Peto M, Kloczkowski A, Honavar V, Jernigan RL. Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable. BMC Bioinformatics 2008; 9:487. [PMID: 19014713 PMCID: PMC2655094 DOI: 10.1186/1471-2105-9-487] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 04/05/2008] [Accepted: 11/18/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: By using a standard Support Vector Machine (SVM) with a Sequential Minimal Optimization (SMO) method of training, Naïve Bayes, and other machine learning algorithms, we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations, and those folding to poorly- or non-designable conformations. RESULTS: First, we generate all possible compact lattice conformations for the specified shape (a hexagon or a triangle) on the 2D triangular lattice. Then we generate all possible binary hydrophobic/polar (H/P) sequences and, by using a specified energy function, thread them through all of these compact conformations. If, for a given sequence, the lowest energy is obtained for a particular lattice conformation, we assume that this sequence folds to that conformation. Highly-designable conformations have many H/P sequences folding to them, while poorly-designable conformations have few or no H/P sequences. We classify sequences as folding to either highly- or poorly-designable conformations. We have randomly selected subsets of the sequences belonging to highly-designable and poorly-designable conformations and used them to train several different standard machine learning algorithms. CONCLUSION: By using these machine learning algorithms with ten-fold cross-validation we are able to classify the two classes of sequences with high accuracy, in some cases exceeding 95%.
Affiliation(s)
- Myron Peto
- Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011-3020, USA
- Andrzej Kloczkowski
- Laurence H Baker Center for Bioinformatics and Biological Statistics, 112 Office and Lab Bldg, Iowa State University, Ames, IA 50011-3020, USA
- Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011-3020, USA
- Vasant Honavar
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Robert L Jernigan
- Laurence H Baker Center for Bioinformatics and Biological Statistics, 112 Office and Lab Bldg, Iowa State University, Ames, IA 50011-3020, USA
- Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011-3020, USA
|
43
|
Ferrada E, Wagner A. Protein robustness promotes evolutionary innovations on large evolutionary time-scales. Proc Biol Sci 2008; 275:1595-602. [PMID: 18430649 DOI: 10.1098/rspb.2007.1617] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.5] [Indexed: 11/12/2022] Open
Abstract
Recent laboratory experiments suggest that a molecule's ability to evolve neutrally is important for its ability to generate evolutionary innovations. In contrast to laboratory experiments, life unfolds on time-scales of billions of years. Here, we ask whether a molecule's ability to evolve neutrally (a measure of its robustness) facilitates evolutionary innovation also on these large time-scales. To this end, we use protein designability, the number of sequences that can adopt a given protein structure, as an estimate of the structure's ability to evolve neutrally. Based on two complementary measures of functional diversity (catalytic diversity and molecular functional diversity in gene ontology), we show that more robust proteins have a greater capacity to produce functional innovations. Significant associations among structural designability, folding rate and intrinsic disorder also exist, underlining the complex relationship of the structural factors that affect protein evolution.
Affiliation(s)
- Evandro Ferrada
- Department of Biochemistry, University of Zurich, Building Y27, Winterthurerstrasse 190, 8057 Zurich, Switzerland.
|
44
|
Zeldovich KB, Chen P, Shakhnovich BE, Shakhnovich EI. A first-principles model of early evolution: emergence of gene families, species, and preferred protein folds. PLoS Comput Biol 2008; 3:e139. [PMID: 17630830 PMCID: PMC1914367 DOI: 10.1371/journal.pcbi.0030139] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Received: 04/04/2007] [Accepted: 06/04/2007] [Indexed: 11/19/2022] Open
Abstract
In this work we develop a microscopic physical model of early evolution where phenotype—organism life expectancy—is directly related to genotype—the stability of its proteins in their native conformations—which can be determined exactly in the model. Simulating the model on a computer, we consistently observe the “Big Bang” scenario whereby exponential population growth ensues as soon as favorable sequence–structure combinations (precursors of stable proteins) are discovered. Upon that, random diversity of the structural space abruptly collapses into a small set of preferred proteins. We observe that protein folds remain stable and abundant in the population at timescales much greater than mutation or organism lifetime, and the distribution of the lifetimes of dominant folds in a population approximately follows a power law. The separation of evolutionary timescales between discovery of new folds and generation of new sequences gives rise to emergence of protein families and superfamilies whose sizes are power-law distributed, closely matching the same distributions for real proteins. On the population level we observe emergence of species—subpopulations that carry similar genomes. Further, we present a simple theory that relates stability of evolving proteins to the sizes of emerging genomes. Together, these results provide a microscopic first-principles picture of how first-gene families developed in the course of early evolution. Here, we address the question of how Darwinian evolution of organisms determines molecular evolution of their proteins and genomes. We developed a microscopic ab initio model of early biological evolution where the fitness (essentially lifetime) of an organism is explicitly related to the evolving sequences of its proteins. The main assumption of the model is that the death rate of an organism is determined by the stability of the least stable of their proteins. 
A lattice model is used to calculate stability of all proteins in a genome from their amino acid sequence. The simulation of the model starts from 100 identical organisms, each carrying the same random gene, and proceeds via random mutations, gene duplication, organism births via replication, and organism deaths. We find that exponential population growth is possible only after the discovery of a very small number of specific advantageous protein structures. The number of genes in the evolving organisms depends on the mutation rate, demonstrating the intricate relationship between the genome sizes and protein stability requirements. Further, the model explains the observed power-law distributions of protein family and superfamily sizes, as well as the scale-free character of protein structural similarity graphs. Together, these results and their analysis suggest a plausible comprehensive scenario of emergence of the protein universe in early biological evolution.
Affiliation(s)
- Konstantin B Zeldovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Peiqiu Chen
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Physics, Harvard University, Cambridge, Massachusetts, United States of America
- Boris E Shakhnovich
- Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Eugene I Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- * To whom correspondence should be addressed. E-mail:
|
45
|
Zeldovich KB, Shakhnovich EI. Understanding protein evolution: from protein physics to Darwinian selection. Annu Rev Phys Chem 2008; 59:105-27. [PMID: 17937598 DOI: 10.1146/annurev.physchem.58.032806.104449] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Indexed: 11/09/2022]
Abstract
Efforts in whole-genome sequencing and structural proteomics start to provide a global view of the protein universe, the set of existing protein structures and sequences. However, approaches based on the selection of individual sequences have not been entirely successful at the quantitative description of the distribution of structures and sequences in the protein universe because evolutionary pressure acts on the entire organism, rather than on a particular molecule. In parallel to this line of study, studies in population genetics and phenomenological molecular evolution established a mathematical framework to describe the changes in genome sequences in populations of organisms over time. Here, we review both microscopic (physics-based) and macroscopic (organism-level) models of protein-sequence evolution and demonstrate that bridging the two scales provides the most complete description of the protein universe starting from clearly defined, testable, and physiologically relevant assumptions.
Affiliation(s)
- Konstantin B Zeldovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA.
|
46
|
Shakhnovich BE, Shakhnovich EI. Improvisation in evolution of genes and genomes: whose structure is it anyway? Curr Opin Struct Biol 2008; 18:375-81. [PMID: 18487041 DOI: 10.1016/j.sbi.2008.02.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/10/2008] [Accepted: 02/13/2008] [Indexed: 01/31/2023]
Abstract
Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that made it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales, from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides the first robust answers to these questions.
Affiliation(s)
- Boris E Shakhnovich
- Department of Molecular and Cellular Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138, United States
|
47
|
Goldstein RA. The structure of protein evolution and the evolution of protein structure. Curr Opin Struct Biol 2008; 18:170-7. [DOI: 10.1016/j.sbi.2008.01.006] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Received: 10/25/2007] [Revised: 12/20/2007] [Accepted: 01/09/2008] [Indexed: 11/29/2022]
|
48
|
Zhou T, Drummond DA, Wilke CO. Contact density affects protein evolutionary rate from bacteria to animals. J Mol Evol 2008; 66:395-404. [PMID: 18379715 DOI: 10.1007/s00239-008-9094-4] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Received: 11/09/2007] [Revised: 02/16/2008] [Accepted: 02/25/2008] [Indexed: 12/29/2022]
Abstract
The density of contacts or the fraction of buried sites in a protein structure is thought to be related to a protein's designability, and genes encoding more designable proteins should evolve faster than other genes. Several recent studies have tested this hypothesis but have found conflicting results. Here, we investigate how a gene's evolutionary rate is affected by its protein's contact density, considering the four species Escherichia coli, Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. We find for all four species that contact density correlates positively with evolutionary rate, and that these correlations do not seem to be confounded by gene expression level. The strength of this signal, however, varies widely among species. We also study the effect of contact density on domain evolution in multidomain proteins and find that a domain's contact density influences the domain's evolutionary rate. Within the same protein, a domain with higher contact density tends to evolve faster than a domain with lower contact density. Our study provides evidence that contact density can increase evolutionary rates, and that it acts similarly on the level of entire proteins and of individual protein domains.
Affiliation(s)
- Tong Zhou
- Center for Computational Biology and Bioinformatics, Section of Integrative Biology, University of Texas at Austin, Austin, TX 78731, USA
|
49
|
Wong P, Frishman D. Designability and disease. Methods Mol Biol 2008; 484:491-504. [PMID: 18592197 DOI: 10.1007/978-1-59745-398-1_29] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
Structural designability is the number of ways it is possible to encode for structure. A protein's designability has been equated with the size of sequence space encoding for the protein's structure, a measure that reflects the structure's robustness to mutation. Current evidence suggests that designability is fundamental to our understanding of the evolvability and distribution of structures in nature and is a significant factor associated with human disease. Here, we describe definitions and principles underlying the concept of designability and discuss its relation to disease.
Affiliation(s)
- Philip Wong
- Institute for Bioinformatics, GSF-National Research Center for Environment and Health, Neuherberg, Germany
|
50
|
Galzitskaya OV, Reifsnyder DC, Bogatyreva NS, Ivankov DN, Garbuzynskiy SO. More compact protein globules exhibit slower folding rates. Proteins 2007; 70:329-32. [PMID: 17876831 DOI: 10.1002/prot.21619] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 11/07/2022]
Abstract
We have demonstrated that, among proteins of the same size, alpha/beta proteins have on the average a greater number of contacts per residue due to their more compact (more "spherical") structure, rather than due to tighter packing. We have examined the relationship between the average number of contacts per residue and folding rates in globular proteins according to general protein structural class (all-alpha, all-beta, alpha/beta, alpha+beta). Our analysis demonstrates that alpha/beta proteins have both the greatest number of contacts and the slowest folding rates in comparison to proteins from the other structural classes. Because alpha/beta proteins are also known to be the oldest proteins, it can be suggested that proteins have evolved to pack more quickly and into looser structures.
Affiliation(s)
- Oxana V Galzitskaya
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, Russia.
|