1
Liu S, Kou Y, Chen L. Novel Few-Shot Learning Neural Network for Predicting Carbohydrate-Active Enzyme Affinity Toward Fructo-Oligosaccharides. J Comput Biol 2021; 28:1208-1218. [PMID: 34898254 DOI: 10.1089/cmb.2021.0091] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open
Abstract
The enzymatic activity of the microbiome toward carbohydrates in the human digestive system is of enormous health significance. Predicting how dietary carbohydrates may affect the distribution and balance of gut microbiota remains a major challenge. Understanding the enzyme/substrate specificity relationship of the carbohydrate-active enzyme (CAZyme) encoded by the vast gut microbiome will be an important step toward addressing this question. In this study, we seek to establish an in silico approach to studying the enzyme/substrate binding interaction. We focused on the key CAZyme and established a novel Poisson noise-based few-shot learning neural network (pFSLNN) for predicting the binding affinity of indigestible carbohydrates. This approach achieved higher accuracy than other classic FSLNNs, and we have also formulated new algorithms for feature generation using only a few amino acid (AA) sequences. Sliding bin regression is integrated with minimum redundancy maximum relevance for feature selection. The resulting pFSLNN is an efficient model to predict the binding affinity between CAZyme and common oligosaccharides. This model can potentially be applied to the binding affinity prediction of other protein/ligand interactions based on limited AA sequences.
Affiliation(s)
- Shaoxun Liu
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
- Yi Kou
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
- Lin Chen
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
2
Allostery and Epistasis: Emergent Properties of Anisotropic Networks. Entropy 2020; 22:e22060667. [PMID: 33286439 PMCID: PMC7517209 DOI: 10.3390/e22060667] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/24/2020] [Revised: 06/02/2020] [Accepted: 06/08/2020] [Indexed: 11/17/2022]
Abstract
Understanding the underlying mechanisms behind protein allostery and non-additivity of substitution outcomes (i.e., epistasis) is critical when attempting to predict the functional impact of mutations, particularly at non-conserved sites. In an effort to model these two biological properties, we extend the framework of our metric to calculate dynamic coupling between residues, the Dynamic Coupling Index (DCI) to two new metrics: (i) EpiScore, which quantifies the difference between the residue fluctuation response of a functional site when two other positions are perturbed with random Brownian kicks simultaneously versus individually to capture the degree of cooperativity of these two other positions in modulating the dynamics of the functional site and (ii) DCIasym, which measures the degree of asymmetry between the residue fluctuation response of two sites when one or the other is perturbed with a random force. Applied to four independent systems, we successfully show that EpiScore and DCIasym can capture important biophysical properties in dual mutant substitution outcomes. We propose that allosteric regulation and the mechanisms underlying non-additive amino acid substitution outcomes (i.e., epistasis) can be understood as emergent properties of an anisotropic network of interactions where the inclusion of the full network of interactions is critical for accurate modeling. Consequently, mutations which drive towards a new function may require a fine balance between functional site asymmetry and strength of dynamic coupling with the functional sites. These two tools will provide mechanistic insight into both understanding and predicting the outcome of dual mutations.
3
Bao Q, Hotz-Wagenblatt A, Betts MJ, Hipp M, Hugo A, Pougialis G, Lei-Rossmann J, Löchelt M. Shared and cell type-specific adaptation strategies of Gag and Env yield high titer bovine foamy virus variants. Infection, Genetics and Evolution 2020; 82:104287. [PMID: 32179148 DOI: 10.1016/j.meegid.2020.104287] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/11/2019] [Revised: 03/05/2020] [Accepted: 03/11/2020] [Indexed: 12/27/2022]
Abstract
During in vitro selection and evolution screens to adapt the tightly cell-associated bovine foamy virus BFV to high titer cell-free transmission, common, cell-type specific and concurrent adaptive changes in Gag and Env, the major players of foamy virus particle assembly and release, were detected. Upon early establishment of cell type-independent pioneering mutations in Env and, subsequently in Gag, a diverse virus pool emerged that was characterized by the occurrence of shared and additional cell type-specific exchanges. At late passages and saturated titers, remarkably homogeneous virus populations characterized by functionally important mutations developed which may be partly due to stochastic evolutionary events that occurred earlier during adaptation. Reverse genetics showed that defined mutations were functionally important for high titer cell-free transmission.
Affiliation(s)
- Qiuying Bao
- Division of Viral Transformation Mechanisms, Research Focus Infection, Inflammation and Cancer, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ), Heidelberg, Germany.
- Matthew J Betts
- CellNetworks, Bioquant, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany.
- Michaela Hipp
- Division of Viral Transformation Mechanisms, Research Focus Infection, Inflammation and Cancer, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ), Heidelberg, Germany.
- Annette Hugo
- Division of Viral Transformation Mechanisms, Research Focus Infection, Inflammation and Cancer, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ), Heidelberg, Germany.
- Georgios Pougialis
- Division of Viral Transformation Mechanisms, Research Focus Infection, Inflammation and Cancer, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ), Heidelberg, Germany.
- Janet Lei-Rossmann
- Division of Viral Transformation Mechanisms, Research Focus Infection, Inflammation and Cancer, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ), Heidelberg, Germany.
- Martin Löchelt
- Division of Viral Transformation Mechanisms, Research Focus Infection, Inflammation and Cancer, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ), Heidelberg, Germany.
4
The Influence of Protein Stability on Sequence Evolution: Applications to Phylogenetic Inference. Methods Mol Biol 2019; 1851:215-231. [PMID: 30298399 DOI: 10.1007/978-1-4939-8736-8_11] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/13/2023]
Abstract
Phylogenetic inference from protein data is traditionally based on empirical substitution models of evolution that assume that protein sites evolve independently of each other and under the same substitution process. However, it is well known that the structural properties of a protein site in the native state affect its evolution, in particular the sequence entropy and the substitution rate. Starting from the seminal proposal by Halpern and Bruno, where structural properties are incorporated in the evolutionary model through site-specific amino acid frequencies, several models have been developed to tackle the influence of protein structure on sequence evolution. Here we describe stability-constrained substitution (SCS) models that explicitly consider the stability of the native state against both unfolded and misfolded states. One of them, the mean-field model, provides an independent sites approximation that can be readily incorporated in maximum likelihood methods of phylogenetic inference, including ancestral sequence reconstruction. Next, we describe its validation with simulated and real proteins and its limitations and advantages with respect to empirical models that lack site specificity. We finally provide guidelines and recommendations to analyze protein data accounting for stability constraints, including computer simulations and inferences of protein evolution based on maximum likelihood. Some practical examples are included to illustrate these procedures.
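The idea of encoding structural constraints as site-specific amino acid frequencies can be illustrated with a minimal Python sketch of a Halpern-Bruno style rate calculation. This is not the SCS or mean-field model described in the chapter; the function name `halpern_bruno_rates`, the three-state toy alphabet, and the default of equal mutation rates are assumptions made purely for illustration.

```python
import math

def halpern_bruno_rates(pi, mu=None):
    """Substitution rates at one site from its equilibrium frequencies.

    pi: dict mapping each state to its site-specific frequency (all > 0).
    mu: optional dict mapping (i, j) to a mutation rate; if omitted, every
        mutation rate is assumed to be 1.0 (an illustrative simplification).
    """
    rates = {}
    for i in pi:
        for j in pi:
            if i == j:
                continue
            m_ij = 1.0 if mu is None else mu[(i, j)]
            m_ji = 1.0 if mu is None else mu[(j, i)]
            ratio = (pi[j] * m_ji) / (pi[i] * m_ij)
            if math.isclose(ratio, 1.0):
                fixation = 1.0  # limit of ln(r) / (1 - 1/r) as r -> 1
            else:
                fixation = math.log(ratio) / (1.0 - 1.0 / ratio)
            # rate = mutation rate times relative fixation probability
            rates[(i, j)] = m_ij * fixation
    return rates

# Toy site that strongly prefers alanine (hypothetical frequencies).
site_freqs = {"A": 0.6, "L": 0.3, "V": 0.1}
site_rates = halpern_bruno_rates(site_freqs)
```

With equal mutation rates, substitutions toward the more frequent state are faster than the reverse, and the rates satisfy detailed balance with respect to the chosen frequencies, which is what lets each site carry its own stationary distribution.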
5
Liberles DA, Teufel AI. Evolution and Structure of Proteins and Proteomes. Genes (Basel) 2018; 9:E583. [PMID: 30487453 PMCID: PMC6315575 DOI: 10.3390/genes9120583] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/21/2018] [Accepted: 11/26/2018] [Indexed: 12/13/2022] Open
Abstract
This themed issue centered on the evolution and structure of proteins and proteomes is comprised of seven published manuscripts. [...].
Affiliation(s)
- David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA 19122, USA.
- Ashley I Teufel
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX 78712, USA.
6
Pareek V, Samanta M, Joshi NV, Balaram H, Murthy MRN, Balaram P. Connecting Active-Site Loop Conformations and Catalysis in Triosephosphate Isomerase: Insights from a Rare Variation at Residue 96 in the Plasmodial Enzyme. Chembiochem 2016; 17:620-9. [PMID: 26762569 DOI: 10.1002/cbic.201500532] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 10/13/2015] [Indexed: 12/12/2022]
Abstract
Despite extensive research into triosephosphate isomerases (TIMs), there exists a gap in understanding of the remarkable conjunction between catalytic loop-6 (residues 166-176) movement and the conformational flip of Glu165 (catalytic base) upon substrate binding that primes the active site for efficient catalysis. The overwhelming occurrence of serine at position 96 (98% of the 6277 unique TIM sequences), spatially proximal to E165 and the loop-6 residues, raises questions about its role in catalysis. Notably, Plasmodium falciparum TIM has an extremely rare residue--phenylalanine--at this position whereas, curiously, the mutant F96S was catalytically defective. We have obtained insights into the influence of residue 96 on the loop-6 conformational flip and E165 positioning by combining kinetic and structural studies on the PfTIM F96 mutants F96Y, F96A, F96S/S73A, and F96S/L167V with sequence conservation analysis and comparative analysis of the available apo and holo structures of the enzyme from diverse organisms.
Affiliation(s)
- Vidhi Pareek
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India
- Moumita Samanta
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India
- Niranjan V Joshi
- Centre for Ecological Sciences, Indian Institute of Science, Bangalore, 560012, India
- Hemalatha Balaram
- Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur, Bangalore, 560064, India
- Mathur R N Murthy
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India
- Padmanabhan Balaram
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India.
7
Sammond DW, Kastelowitz N, Himmel ME, Yin H, Crowley MF, Bomble YJ. Comparing Residue Clusters from Thermophilic and Mesophilic Enzymes Reveals Adaptive Mechanisms. PLoS One 2016; 11:e0145848. [PMID: 26741367 PMCID: PMC4704809 DOI: 10.1371/journal.pone.0145848] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 09/02/2015] [Accepted: 12/09/2015] [Indexed: 11/18/2022] Open
Abstract
Understanding how proteins adapt to function at high temperatures is important for deciphering the energetics that dictate protein stability and folding. While multiple principles important for thermostability have been identified, we lack a unified understanding of how internal protein structural and chemical environment determine qualitative or quantitative impact of evolutionary mutations. In this work we compare equivalent clusters of spatially neighboring residues between paired thermophilic and mesophilic homologues to evaluate adaptations under the selective pressure of high temperature. We find the residue clusters in thermophilic enzymes generally display improved atomic packing compared to mesophilic enzymes, in agreement with previous research. Unlike residue clusters from mesophilic enzymes, however, thermophilic residue clusters do not have significant cavities. In addition, anchor residues found in many clusters are highly conserved with respect to atomic packing between both thermophilic and mesophilic enzymes. Thus the improvements in atomic packing observed in thermophilic homologues are not derived from these anchor residues but from neighboring positions, which may serve to expand optimized protein core regions.
Affiliation(s)
- Deanne W Sammond
- Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado, 80401, United States of America
- Noah Kastelowitz
- Department of Chemistry & Biochemistry and the BioFrontiers Institute, University of Colorado, Boulder, Colorado, 80309, United States of America
- Michael E Himmel
- Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado, 80401, United States of America
- Hang Yin
- Department of Chemistry & Biochemistry and the BioFrontiers Institute, University of Colorado, Boulder, Colorado, 80309, United States of America
- Michael F Crowley
- Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado, 80401, United States of America
- Yannick J Bomble
- Biosciences Center, National Renewable Energy Laboratory, Golden, Colorado, 80401, United States of America
8
Abstract
The process of amino acid replacement in proteins is context-dependent, with substitution rates influenced by local structure, functional role, and amino acids at other locations. Predicting how these differences affect replacement processes is difficult. To make such inference easier, it is often assumed that the acceptabilities of different amino acids at a position are constant. However, evolutionary interactions among residue positions will tend to invalidate this assumption. Here, we use simulations of purple acid phosphatase evolution to show that amino acid propensities at a position undergo predictable change after an amino acid replacement at that position. After a replacement, the new amino acid and similar amino acids tend to become gradually more acceptable over time at that position. In other words, proteins tend to equilibrate to the presence of an amino acid at a position through replacements at other positions. Such a shift is reminiscent of the spectroscopy effect known as the Stokes shift, where molecules receiving a quantum of energy and moving to a higher electronic state will adjust to the new state and emit a smaller quantum of energy whenever they shift back down to the original ground state. Predictions of changes in stability in real proteins show that mutation reversals become less favorable over time, and thus, broadly support our results. The observation of an evolutionary Stokes shift has profound implications for the study of protein evolution and the modeling of evolutionary processes.
9
The coevolution of phycobilisomes: molecular structure adapting to functional evolution. Comp Funct Genomics 2011; 2011:230236. [PMID: 21904470 PMCID: PMC3166575 DOI: 10.1155/2011/230236] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/16/2011] [Revised: 05/22/2011] [Accepted: 06/19/2011] [Indexed: 12/30/2022] Open
Abstract
The phycobilisome (PBS) is the major light-harvesting complex in cyanobacteria and red algae. It consists of phycobiliproteins (PBPs) and their associated linker peptides, which play key roles in the absorption and unidirectional transfer of light energy and in the stability of the whole complex, respectively. Previous research on the evolution of PBPs and linker peptides has mainly focused on phylogenetic analysis and selective evolution. Coevolution occurs when a mutation disrupts the conformation of one residue and a compensatory change is then selected for in its interacting partner. Here, coevolutionary analysis of allophycocyanin, phycocyanin, and phycoerythrin and covariation analysis of linker peptides were performed. The coevolution analyses reveal sites that are significantly correlated, providing strong evidence of the functional and structural importance of interactions among these residues. According to the interprotein coevolution analysis, less interaction was found between PBPs and linker peptides. Our results also indicate that coevolution and adaptive selection in the PBS are not directly related, but are probably linked through sites coupled by physicochemical interactions.
10
Lakner C, Holder MT, Goldman N, Naylor GJP. What's in a Likelihood? Simple Models of Protein Evolution and the Contribution of Structurally Viable Reconstructions to the Likelihood. Syst Biol 2011; 60:161-74. [DOI: 10.1093/sysbio/syq088] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022] Open
Affiliation(s)
- Clemens Lakner
- Department of Biological Science, Section of Ecology and Evolution
- Department of Scientific Computing, Florida State University, Tallahassee, FL 32306-4120, USA
- Mark T. Holder
- Department of Ecology and Evolution, University of Kansas, 6031 Haworth, 1200 Sunnyside Avenue, Lawrence, KS 66045
- Nick Goldman
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Gavin J. P. Naylor
- Department of Scientific Computing, Florida State University, Tallahassee, FL 32306-4120, USA
11
Exploiting models of molecular evolution to efficiently direct protein engineering. J Mol Evol 2010; 72:193-203. [PMID: 21132281 DOI: 10.1007/s00239-010-9415-2] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 07/13/2010] [Accepted: 11/19/2010] [Indexed: 10/18/2022]
Abstract
Directed evolution and protein engineering approaches used to generate novel or enhanced biomolecular function often use the evolutionary sequence diversity of protein homologs to rationally guide library design. To fully capture this sequence diversity, however, libraries containing millions of variants are often necessary. Screening libraries of this size is often undesirable due to inaccuracies of high-throughput assays, costs, and time constraints. The ability to effectively cull sequence diversity while still generating the functional diversity within a library thus holds considerable value. This is particularly relevant when high-throughput assays are not amenable to select/screen for certain biomolecular properties. Here, we summarize our recent attempts to develop an evolution-guided approach, Reconstructing Evolutionary Adaptive Paths (REAP), for directed evolution and protein engineering that exploits phylogenetic and sequence analyses to identify amino acid substitutions that are likely to alter or enhance function of a protein. To demonstrate the utility of this technique, we highlight our previous work with DNA polymerases in which a REAP-designed small library was used to identify a DNA polymerase capable of accepting non-standard nucleosides. We anticipate that the REAP approach will be used in the future to facilitate the engineering of biopolymers with expanded functions and will thus have a significant impact on the developing field of 'evolutionary synthetic biology'.
12
Lovell SC, Robertson DL. An integrated view of molecular coevolution in protein-protein interactions. Mol Biol Evol 2010; 27:2567-75. [PMID: 20551042 DOI: 10.1093/molbev/msq144] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.6] [Indexed: 01/21/2023] Open
Abstract
Protein-protein interactions effectively mediate molecular function. They are the result of specific interactions between protein interfaces and are maintained by the action of evolutionary pressure on the regions of the interacting proteins that contribute to binding. For the most part, selection restricts amino acid replacements, accounting for the conservation of binding interfaces. However, in some cases, change in one protein will be mitigated by compensatory change in its binding partner, maintaining function in the face of evolutionary change. There have been several attempts to use correlations in sequence evolution to predict interactions of proteins. Most commonly, these approaches use the entire sequence to identify correlations and so infer probable binding. However, other factors such as shared evolutionary history and similarities in the rates of evolution confound these whole-sequence-based approaches. Here, we discuss recent work on this topic and argue that both site-specific coevolutionary change and whole-sequence evolution contribute to evolutionary signals in sets of interacting proteins. We discuss the relative effects of both types of selection and how they might be identified. This permits an integrated view of protein-protein interactions, their evolution, and coevolution.
Affiliation(s)
- Simon C Lovell
- Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester, United Kingdom.
13
Davis BH, Poon AFY, Whitlock MC. Compensatory mutations are repeatable and clustered within proteins. Proc Biol Sci 2009; 276:1823-7. [PMID: 19324785 PMCID: PMC2674493 DOI: 10.1098/rspb.2008.1846] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Indexed: 11/12/2022] Open
Abstract
Compensatory mutations improve fitness in genotypes that contain deleterious mutations but have no beneficial effects otherwise. As such, compensatory mutations represent a very specific form of epistasis. We show that intragenic compensatory mutations occur non-randomly over gene sequence. Compensatory mutations are more likely to appear at some sites than others. Moreover, the sites of compensatory mutations are more likely than expected by chance to be near the site of the original deleterious mutation. Furthermore, compensatory mutations tend to occur more commonly in certain regions of the protein even when controlling for clustering around the site of the deleterious mutation. These results suggest that compensatory evolution at the protein level is partially predictable and may be convergent.
Affiliation(s)
- Brad H Davis
- Department of Zoology, University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z4.
14
Williams SG, Lovell SC. The effect of sequence evolution on protein structural divergence. Mol Biol Evol 2009; 26:1055-65. [PMID: 19193735 DOI: 10.1093/molbev/msp020] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 12/14/2022] Open
Abstract
The complex constraints imposed by protein structure and function result in varied rates of sequence and structural divergence in proteins. Analysis of sequence differences between homologous proteins can advance our understanding of structural divergence and some of the constraints that govern the evolution of these molecules. Here, we assess the relationship between amino acid sequence and structural divergence. Firstly, we demonstrate that the relationship between protein sequence and structural divergence is governed by a variety of evolutionary constraints, including solvent exposure and secondary structure. Secondly, although compensatory substitutions are widespread, we find many radical size-changing mutations that are not compensated by neighboring complementary changes. Instead, these noncompensated substitutions are mitigated by alteration of protein structure. These results suggest a combined mechanism of accommodating substitutions in proteins, involving both coevolution and structural accommodation. Such a mechanism can explain previously observed correlated substitutions of residues that are distant both in sequence and structure, allowing an integrated view of sequence and structural divergence of proteins.
Affiliation(s)
- Simon G Williams
- Faculty of Life Sciences, University of Manchester, Manchester, UK
15
Caporaso JG, Smit S, Easton BC, Hunter L, Huttley GA, Knight R. Detecting coevolution without phylogenetic trees? Tree-ignorant metrics of coevolution perform as well as tree-aware metrics. BMC Evol Biol 2008; 8:327. [PMID: 19055758 PMCID: PMC2637866 DOI: 10.1186/1471-2148-8-327] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 04/08/2008] [Accepted: 12/03/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Identifying coevolving positions in protein sequences has myriad applications, ranging from understanding and predicting the structure of single molecules to generating proteome-wide predictions of interactions. Algorithms for detecting coevolving positions can be classified into two categories: tree-aware, which incorporate knowledge of phylogeny, and tree-ignorant, which do not. Tree-ignorant methods are frequently orders of magnitude faster, but are widely held to be insufficiently accurate because of a confounding of shared ancestry with coevolution. We conjectured that by using a null distribution that appropriately controls for the shared-ancestry signal, tree-ignorant methods would exhibit equivalent statistical power to tree-aware methods. Using a novel t-test transformation of coevolution metrics, we systematically compared four tree-aware and five tree-ignorant coevolution algorithms, applying them to myoglobin and myosin. We further considered the influence of sequence recoding using reduced-state amino acid alphabets, a common tactic employed in coevolutionary analyses to improve both statistical and computational performance. RESULTS: Consistent with our conjecture, the transformed tree-ignorant metrics (particularly Mutual Information) often outperformed the tree-aware metrics. Our examination of the effect of recoding suggested that charge-based alphabets were generally superior for identifying the stabilizing interactions in alpha helices. Performance was not always improved by recoding however, indicating that the choice of alphabet is critical. CONCLUSION: The results suggest that t-test transformation of tree-ignorant metrics can be sufficient to control for patterns arising from shared ancestry.
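As a rough illustration of a tree-ignorant metric, the following Python sketch computes plain Mutual Information between two alignment columns, with an optional charge-based recoding step. It does not implement the t-test transformation or the null distribution used in the study; the function names, the toy alignments, and the three-class charge alphabet (K/R positive, D/E negative, everything else neutral) are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical reduced alphabet: '+' for K/R, '-' for D/E, '0' otherwise.
CHARGE_CLASS = {"K": "+", "R": "+", "D": "-", "E": "-"}

def recode_by_charge(column):
    """Map each residue in a column to its (assumed) charge class."""
    return [CHARGE_CLASS.get(res, "0") for res in column]

def get_column(msa, index):
    """Extract one column from an alignment given as equal-length strings."""
    return [seq[index] for seq in msa]

def mutual_information(col_a, col_b):
    """Mutual Information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi

# Toy alignments: columns covary perfectly in the first, not at all in the second.
coupled = ["AK", "AK", "DE", "DE"]
independent = ["AK", "AE", "DK", "DE"]
mi_coupled = mutual_information(get_column(coupled, 0), get_column(coupled, 1))
mi_independent = mutual_information(get_column(independent, 0), get_column(independent, 1))
```

Because the metric uses only column frequencies, it needs no tree, which is the source of both its speed and its sensitivity to shared ancestry.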
Affiliation(s)
- J Gregory Caporaso
- Department of Chemistry and Biochemistry, University of Colorado at Boulder, Boulder, CO, USA.
16
Castoe TA, Jiang ZJ, Gu W, Wang ZO, Pollock DD. Adaptive evolution and functional redesign of core metabolic proteins in snakes. PLoS One 2008; 3:e2201. [PMID: 18493604 PMCID: PMC2376058 DOI: 10.1371/journal.pone.0002201] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.9] [Received: 12/12/2007] [Accepted: 04/01/2008] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND: Adaptive evolutionary episodes in core metabolic proteins are uncommon, and are even more rarely linked to major macroevolutionary shifts. METHODOLOGY/PRINCIPAL FINDINGS: We conducted extensive molecular evolutionary analyses on snake mitochondrial proteins and discovered multiple lines of evidence suggesting that the proteins at the core of aerobic metabolism in snakes have undergone remarkably large episodic bursts of adaptive change. We show that snake mitochondrial proteins experienced unprecedented levels of positive selection, coevolution, convergence, and reversion at functionally critical residues. We examined Cytochrome C oxidase subunit I (COI) in detail, and show that it experienced extensive modification of normally conserved residues involved in proton transport and delivery of electrons and oxygen. Thus, adaptive changes likely altered the flow of protons and other aspects of function in CO, thereby influencing fundamental characteristics of aerobic metabolism. We refer to these processes as "evolutionary redesign" because of the magnitude of the episodic bursts and the degree to which they affected core functional residues. CONCLUSIONS/SIGNIFICANCE: The evolutionary redesign of snake COI coincided with adaptive bursts in other mitochondrial proteins and substantial changes in mitochondrial genome structure. It also generally coincided with or preceded major shifts in ecological niche and the evolution of extensive physiological adaptations related to lung reduction, large prey consumption, and venom evolution. The parallel timing of these major evolutionary events suggests that evolutionary redesign of metabolic and mitochondrial function may be related to, or underlie, the extreme changes in physiological and metabolic efficiency, flexibility, and innovation observed in snake evolution.
Affiliation(s)
- Todd A. Castoe
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America
| | - Zhi J. Jiang
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, Louisiana, United States of America
| | - Wanjun Gu
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America
| | - Zhengyuan O. Wang
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, Louisiana, United States of America
| | - David D. Pollock
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America
17
|
Codoñer FM, O'Dea S, Fares MA. Reducing the false positive rate in the non-parametric analysis of molecular coevolution. BMC Evol Biol 2008; 8:106. [PMID: 18402697 PMCID: PMC2362121 DOI: 10.1186/1471-2148-8-106] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 09/05/2007] [Accepted: 04/10/2008] [Indexed: 11/14/2022]
Abstract
Background The strength of selective constraints operating on amino acid sites of proteins has a multifactorial nature. In fact, amino acid sites within proteins coevolve due to their functional and/or structural relationships. Different methods have been developed that attempt to account for the evolutionary dependencies between amino acid sites. Researchers have invested a significant effort to increase the sensitivity of such methods. However, the difficulty in disentangling functional co-dependencies from historical covariation has fuelled the scepticism over their power to detect biologically meaningful results. In addition, the biological parameters connecting linear sequence evolution to structure evolution remain elusive. For these reasons, most of the evolutionary studies aimed at identifying functional dependencies among protein domains have focused on the structural properties of proteins rather than on the information extracted from linear multiple sequence alignments (MSA). Non-parametric methods to detect coevolution have been reported to be especially susceptible to produce false positive results based on the properties of MSAs. However, no formal statistical analysis has been performed to definitively test the differential effects of these properties on the sensitivity of such methods. Results Here we test the effect that variations on the MSA properties have over the sensitivity of non-parametric methods to detect coevolution. We test the effect that the size of the MSA (number of sequences), mean pairwise amino acid distance per site and the strength of the coevolution signal have on the ability of non-parametric methods to detect coevolution. Our results indicate that all three factors have significant effects on the accuracy of non-parametric methods. Further, introducing statistical filters improves the sensitivity and increases the statistical power of the methods to detect functional coevolution. 
Statistical analysis of the physico-chemical properties of amino acid sites in the context of the protein structure reveals striking dependencies among amino acid sites. Results indicate a covariation trend in the hydrophobicities and molecular weight characteristics of amino acid sites when analysing a non-redundant set of 8000 protein structures. Using this biological information as filter in coevolutionary analyses minimises the false positive rate of these methods. Application of these filters to three different proteins with known functional domains supports the importance of using biological filters to detect coevolution. Conclusion Coevolutionary analyses using non-parametric methods have proved difficult and highly prone to provide spurious results depending on the properties of MSAs and on the strength of coevolution between amino acid sites. The application of statistical filters to the number of pairs detected as coevolving reduces significantly the number of artifactual results. Analysis of the physico-chemical properties of amino acid sites in the protein structure context reveals their structure-dependent covariation. The application of this known biological information to the analysis of covariation greatly enhances the functional coevolutionary signal and removes historical covariation. Simultaneous use of statistical and biological data is instrumental in the detection of functional amino acid sites dependencies and compensatory changes at the protein level.
Affiliation(s)
- Francisco M Codoñer
- Evolutionary Genetics and Bioinformatics Laboratory, Department of Genetics, Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin, Ireland.
18
|
Wang ZO, Pollock DD. Coevolutionary Patterns in Cytochrome c Oxidase Subunit I Depend on Structural and Functional Context. J Mol Evol 2007; 65:485-95. [DOI: 10.1007/s00239-007-9018-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 10/22/2022]
19
|
Yeang CH, Haussler D. Detecting coevolution in and among protein domains. PLoS Comput Biol 2007; 3:e211. [PMID: 17983264 PMCID: PMC2098842 DOI: 10.1371/journal.pcbi.0030211] [Citation(s) in RCA: 133] [Impact Index Per Article: 7.8] [Received: 06/13/2007] [Accepted: 09/17/2007] [Indexed: 01/17/2023]
Abstract
Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature in coevolutionary sequence analysis, previous methods often have to trade off between generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporate phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We employ this model to large-scale screenings on the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests executed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level. The sequences of different components within and across genes often undergo coordinated changes in order to maintain the structures or functions of the genes. Identifying the coordinated changes—the “coevolution”—of those components in the context of evolution is important in predicting the structures, interactions, and functions of genes. 
The authors conduct a large-scale screening of all the known protein sequences and build a compendium of the coevolving relations of all protein domains (subunits of proteins). The majority of the coevolving protein domains either belong to the same proteins, appear in the same protein complexes, or share the same functional annotations. Furthermore, coevolving positions in the same proteins or protein complexes are spatially coupled, as they tend to be closer than random positions in the 3-D structures of the proteins/protein complexes. More strikingly, many coevolving positions are located at functionally important sites of the molecules. The results provide useful insights about the relations between sequence evolution and protein structures and functions.
Affiliation(s)
- Chen-Hsiang Yeang
- Simons Center for Systems Biology, Institute for Advanced Study, Princeton, New Jersey, United States of America.
20
|
Nahum LA, Reynolds MT, Wang ZO, Faith JJ, Jonna R, Jiang ZJ, Meyer TJ, Pollock DD. EGenBio: a data management system for evolutionary genomics and biodiversity. BMC Bioinformatics 2006; 7 Suppl 2:S7. [PMID: 17118150 PMCID: PMC1683573 DOI: 10.1186/1471-2105-7-s2-s7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Background Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio) to begin to address this. Description EGenBio is a system for manipulation and filtering of large numbers of sequences, integrating curated sequence alignments and phylogenetic trees, managing evolutionary analyses, and visualizing their output. EGenBio is organized into three conceptual divisions, Evolution, Genomics, and Biodiversity. The Genomics division includes tools for selecting pre-aligned sequences from different genes and species, and for modifying and filtering these alignments for further analysis. Species searches are handled through queries that can be modified based on a tree-based navigation system and saved. The Biodiversity division contains tools for analyzing individual sequences or sequence alignments, whereas the Evolution division contains tools involving phylogenetic trees. Alignments are annotated with analytical results and modification history using our PRAED format. A miscellaneous Tools section and Help framework are also available. EGenBio was developed around our comparative genomic research and a prototype database of mtDNA genomes. It utilizes MySQL-relational databases and dynamic page generation, and calls numerous custom programs. Conclusion EGenBio was designed to serve as a platform for tools and resources to ease combined analysis in evolution, genomics, and biodiversity.
Affiliation(s)
- Laila A Nahum
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, LA 70803 USA
- Josephine Bay Paul Center, Marine Biological Laboratory, Woods Hole, MA 02543, USA
- Matthew T Reynolds
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, LA 70803 USA
- Zhengyuan O Wang
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, LA 70803 USA
- Jeremiah J Faith
- Bioinformatics Program, Boston University, Boston, MA 02215, USA
- Rahul Jonna
- Division of Developmental Disabilities, Arizona State Department, Phoenix, AZ 85012 USA
- Zhi J Jiang
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, LA 70803 USA
- Thomas J Meyer
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, LA 70803 USA
- David D Pollock
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, LA 70803 USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
21
|
Xu YO, Hall RW, Goldstein RA, Pollock DD. Divergence, recombination and retention of functionality during protein evolution. Hum Genomics 2006; 2:158-67. [PMID: 16197733 PMCID: PMC2943960 DOI: 10.1186/1479-7364-2-3-158] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
We have only a vague idea of precisely how protein sequences evolve in the context of protein structure and function. This is primarily because structural and functional contexts are not easily predictable from the primary sequence, and evaluating patterns of evolution at individual residue positions is also difficult. As a result of increasing biodiversity in genomics studies, progress is being made in detecting context-dependent variation in substitution processes, but it remains unclear exactly what context-dependent patterns we should be looking for. To address this, we have been simulating protein evolution in the context of structure and function using lattice models of proteins and ligands (or substrates). These simulations include thermodynamic features of protein stability and population dynamics. We refer to this approach as 'ab initio evolution' to emphasise the fact that the equilibrium details of fitness distributions arise from the physical principles of the system and not from any preconceived notions or arbitrary mathematical distributions. Here, we present results on the retention of functionality in homologous recombinants following population divergence. A central result is that protein structure characteristics can strongly influence recombinant functionality. Exceptional structures with many sequence options evolve quickly and tend to retain functionality--even in highly diverged recombinants. By contrast, the more common structures with fewer sequence options evolve more slowly, but the fitness of recombinants drops off rapidly as homologous proteins diverge. These results have implications for understanding viral evolution, speciation and directed evolutionary experiments. Our analysis of the divergence process can also guide improved methods for accurately approximating folding probabilities in more complex but realistic systems.
Affiliation(s)
- Yanlong O Xu
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, LA 70803, USA
- Department of Chemistry, Louisiana State University, Baton Rouge, LA 70803, USA
- Randall W Hall
- Department of Chemistry, Louisiana State University, Baton Rouge, LA 70803, USA
- Department of Physics and Astronomy, Louisiana State University, Baton Rouge, LA 70803, USA
- Richard A Goldstein
- Division of Mathematical Biology, National Institute for Medical Research, Mill Hill, London NW7 1AA, UK
- David D Pollock
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, LA 70803, USA
- Department of Physics and Astronomy, Louisiana State University, Baton Rouge, LA 70803, USA
22
|
Stern A, Pupko T. An evolutionary space-time model with varying among-site dependencies. Mol Biol Evol 2005; 23:392-400. [PMID: 16267143 DOI: 10.1093/molbev/msj044] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Abstract
It is now widely accepted that sites in a protein do not undergo independent evolutionary processes. The underlying assumption is that proteins are composed of conserved and variable linear domains, and thus rates at neighboring sites are correlated. In this paper, we comprehensively examine the performance of an autocorrelation model of evolutionary rates in protein sequences. We further develop a model in which the level of correlation between rates at adjacent sites is not equal at all sites of the protein. High correlation is expected, for example, in linear functional domains. On the other hand, when we consider nonlinear functional regions (e.g., active sites), low correlation is expected because the interaction between distant sites imposes independence of rates in the linear sequence. Our model is based on a hidden Markov model, which accounts for autocorrelation at certain regions of the protein and rate independence at others. We study the differences between the novel model and models which assume either independence or a fixed level of dependence throughout the protein. Using a diverse set of protein data sets we show that the novel model better fits most data sets. We further analyze the potassium-channel protein family and illustrate the relationship between the dependence of rates at adjacent sites and the tertiary structure of the protein.
Affiliation(s)
- Adi Stern
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Israel
23
|
Buck MJ, Atchley WR. Networks of coevolving sites in structural and functional domains of serpin proteins. Mol Biol Evol 2005; 22:1627-34. [PMID: 15858204 DOI: 10.1093/molbev/msi157] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022]
Abstract
Amino acids do not occur randomly in proteins; rather, their occurrence at any given site is strongly influenced by the amino acid composition at other sites, the structural and functional aspects of the region of the protein in which they occur, and the evolutionary history of the protein. The goal of our research study is to identify networks of coevolving sites within the serpin proteins (serine protease inhibitors) and classify them as being caused by structural-functional constraints or by evolutionary history. To address this, a matrix of pairwise normalized mutual information (NMI) values was computed among amino acid sites for the serpin proteins. The NMI matrix was partitioned into orthogonal patterns of amino acid variability by factor analysis. Each common factor pattern was interpreted as having phylogenetic and/or structural-functional explanations. In addition, we used a bootstrap factor analysis technique to limit the effects of phylogenetic history on our factor patterns. Our results show an extensive network of correlations among amino acid sites in key functional regions (reactive center loop, shutter, and breach). Additionally, we have discovered long-range coevolution for packed amino acids within the serpin protein core. Lastly, we have discovered a group of serpin sites which coevolve in the hydrophobic core region (s5B and s4B) and appear to represent sites important for formation of the "native" instead of the "latent" serpin structure. This research provides a better understanding on how protein structure evolves; in particular, it elucidates the selective forces creating coevolution among protein sites.
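The pairwise NMI matrix described in this abstract can be illustrated with a minimal Python sketch. The abstract does not state how the mutual information was normalized, so dividing by the joint entropy is an assumption here, and the function names (`entropy`, `normalized_mutual_information`, `nmi_matrix`) and the toy alignment are hypothetical. The factor analysis and bootstrap steps are not shown.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def normalized_mutual_information(col_a, col_b):
    """MI between two alignment columns, divided by their joint entropy.

    Assumption: normalization by joint entropy; the abstract does not
    say which normalization the authors used.
    """
    h_a = entropy(col_a)
    h_b = entropy(col_b)
    h_ab = entropy(list(zip(col_a, col_b)))
    if h_ab == 0.0:
        # Both columns are invariant, so there is no signal to normalize.
        return 0.0
    return (h_a + h_b - h_ab) / h_ab


def nmi_matrix(alignment):
    """Symmetric matrix of pairwise NMI values over all alignment columns.

    `alignment` is a list of equal-length aligned sequences (strings).
    """
    columns = list(zip(*alignment))
    n = len(columns)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = normalized_mutual_information(columns[i], columns[j])
            matrix[i][j] = matrix[j][i] = value
    return matrix


# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# column 2 is invariant.
toy_alignment = ["ACK", "ACK", "GTK", "GTK"]
toy_matrix = nmi_matrix(toy_alignment)
```

In this toy case the two covarying columns score 1 against each other, while any pair involving the invariant column scores 0. Real alignments would also need gap handling and a correction for phylogenetic history, which the abstract addresses with bootstrap factor analysis.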
Affiliation(s)
- Michael J Buck
- Department of Genetics and The Center for Computational Biology, North Carolina State University, USA.