1
Shishir TA, Saha O, Rajia S, Mondol SM, Masum MHU, Rahaman MM, Hossen F, Bahadur NM, Ahmed F, Naser IB, Amin MR. Genome-wide study of globally distributed respiratory syncytial virus (RSV) strains implicates diversification utilizing phylodynamics and mutational analysis. Sci Rep 2023; 13:13531. [PMID: 37598270 PMCID: PMC10439963 DOI: 10.1038/s41598-023-40760-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/11/2023] [Accepted: 08/16/2023] [Indexed: 08/21/2023] Open
Abstract
Respiratory syncytial virus (RSV) is a common respiratory pathogen that causes mild cold-like symptoms and severe lower respiratory tract infections, causing hospitalizations in children, the elderly and immunocompromised individuals. Due to genetic variability, this virus causes life-threatening pneumonia and bronchiolitis in young infants. Thus, we examined 3600 whole genome sequences submitted to GISAID by 31 December 2022 to examine the genetic variability of RSV. While RSVA and RSVB coexist throughout RSV seasons, RSVA is more prevalent, fatal, and epidemic-prone in several countries, including the United States, the United Kingdom, Australia, and China. Additionally, the virus's attachment glycoprotein and fusion protein were highly mutated, with RSVA having higher Shannon entropy than RSVB. The genetic makeup of these viruses contributes significantly to their prevalence and epidemic potential. Several strain-specific SNPs co-occurred with specific haplotypes of RSVA and RSVB, followed by different haplotypes of the viruses. RSVA and RSVB have the highest linkage probability at loci T12844A/T3483C and G13959T/C2198T, respectively. The results indicate that specific haplotypes and SNPs may significantly affect their spread. Overall, this analysis presents a promising strategy for tracking the evolving epidemic situation and genetic variants of RSV, which could aid in developing effective control, prophylactic, and treatment strategies.
Affiliation(s)
- Tushar Ahmed Shishir
  - Department of Mathematics and Natural Sciences, BRAC University, Dhaka, Bangladesh
- Otun Saha
  - Department of Microbiology, Noakhali Science and Technology University, Noakhali, Bangladesh
- Sultana Rajia
  - Department of Microbiology, Noakhali Science and Technology University, Noakhali, Bangladesh
- Md Habib Ullah Masum
  - Department of Microbiology, Noakhali Science and Technology University, Noakhali, Bangladesh
- Foysal Hossen
  - Department of Microbiology, Noakhali Science and Technology University, Noakhali, Bangladesh
- Firoz Ahmed
  - Department of Microbiology, Noakhali Science and Technology University, Noakhali, Bangladesh
- Iftekhar Bin Naser
  - Department of Mathematics and Natural Sciences, BRAC University, Dhaka, Bangladesh
- Mohammad Ruhul Amin
  - Department of Microbiology, Noakhali Science and Technology University, Noakhali, Bangladesh
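The abstract of entry 1 compares RSVA and RSVB by Shannon entropy. As a minimal sketch of how per-site entropy is commonly computed from an alignment (not the authors' pipeline), the example below scores each column of a toy nucleotide alignment. The function names and the toy sequences are hypothetical, and ignoring gaps and ambiguous bases is an assumption made here for simplicity.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of one alignment column.

    Assumption: only unambiguous bases A, C, G, T are counted;
    gaps and ambiguity codes are ignored.
    """
    symbols = [c for c in column.upper() if c in "ACGT"]
    if not symbols:
        return 0.0
    total = len(symbols)
    counts = Counter(symbols)
    # H = sum over bases of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropy(sequences: list[str]) -> list[float]:
    """Per-site entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy("".join(col)) for col in zip(*sequences)]


# Hypothetical toy alignment: sites 1-2 are invariant, sites 3-4 are split 50/50.
toy_alignment = ["ACGT", "ACGA", "ACTA", "ACTT"]
print(alignment_entropy(toy_alignment))
```

Higher values mark more variable sites; a fully conserved column scores 0 and an even two-base split scores 1 bit.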
2
Stability and expression of SARS-CoV-2 spike-protein mutations. Mol Cell Biochem 2022; 478:1269-1280. [PMID: 36302994 PMCID: PMC9612610 DOI: 10.1007/s11010-022-04588-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Accepted: 10/12/2022] [Indexed: 12/02/2022]
Abstract
Protein fold stability likely plays a role in SARS-CoV-2 S-protein evolution, together with ACE2 binding and antibody evasion. While few thermodynamic stability data are available for S-protein mutants, many systematic experimental data exist for their expression. In this paper, we explore whether such expression levels relate to the thermodynamic stability of the mutants. We studied mutation-induced SARS-CoV-2 S-protein fold stability, as computed by three very distinct methods and eight different protein structures to account for method- and structure-dependencies. For all methods and structures used (24 comparisons), computed stability changes correlate significantly (99% confidence level) with experimental yeast expression from the literature, such that higher expression is associated with relatively higher fold stability. Also significant, albeit weaker, correlations were seen between stability and ACE2 binding effects. The effect of thermodynamic fold stability may be direct or a correlate of amino acid or site properties, notably the solvent exposure of the site. Correlation between computed stability and experimental expression and ACE2 binding suggests that functional properties of the SARS-CoV-2 S-protein mutant space are largely determined by a few simple features, due to underlying correlations. Our study lends promise to the development of computational tools that may ideally aid in understanding and predicting SARS-CoV-2 S-protein evolution.
3
Baek KT, Kepp KP. Data set and fitting dependencies when estimating protein mutant stability: Toward simple, balanced, and interpretable models. J Comput Chem 2022; 43:504-518. [PMID: 35040492 DOI: 10.1002/jcc.26810] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/06/2021] [Revised: 12/13/2021] [Accepted: 01/03/2022] [Indexed: 12/27/2022]
Abstract
Accurate prediction of protein stability changes upon mutation (ΔΔG) is increasingly important to evolution studies, protein engineering, and screening of disease-causing gene variants but is challenged by biases in training data. We investigated 45 linear regression models trained on data sets that account systematically for destabilization bias and mutation-type bias BM. The models were externally validated on three test data sets probing different pathologies and for internal consistency (symmetry and neutrality). Model structure and performance substantially depended on training data and even fitting method. We developed two final models: SimBa-IB for typical natural mutations and SimBa-SYM for situations where stabilizing and destabilizing mutations occur to a similar extent. SimBa-SYM, despite its simplicity, is essentially non-biased (vs. the Ssym data set) while still performing well for all data sets (R ~ 0.46-0.54, MAE = 1.16-1.24 kcal/mol). The simple models provide advantages in terms of interpretability, use and future improvement, and are freely available on GitHub.
Affiliation(s)
- Kasper P Kepp
  - DTU Chemistry, Technical University of Denmark, Lyngby, Denmark
4
Abstract
The spike protein (S-protein) of SARS-CoV-2, the protein that enables the virus to infect human cells, is the basis for many vaccines and a hotspot of concerning virus evolution. Here, we discuss the outstanding progress in structural characterization of the S-protein and how these structures facilitate analysis of virus function and evolution. We emphasize the differences in reported structures and that analysis of structure-function relationships is sensitive to the structure used. We show that the average residue solvent exposure in nearly complete structures is a good descriptor of open vs closed conformation states. Because of structural heterogeneity of functionally important surface-exposed residues, we recommend using averages of a group of high-quality protein structures rather than a single structure before reaching conclusions on specific structure-function relationships. To illustrate these points, we analyze some significant chemical tendencies of prominent S-protein mutations in the context of the available structures. In the discussion of new variants, we emphasize the selectivity of binding to ACE2 vs prominent antibodies rather than simply the antibody escape or ACE2 affinity separately. We note that larger chemical changes, in particular increased electrostatic charge or side-chain volume of exposed surface residues, are recurring in mutations of concern, plausibly related to adaptation to the negative surface potential of human ACE2. We also find indications that the fixated mutations of the S-protein in the main variants are less destabilizing than would be expected on average, possibly pointing toward a selection pressure on the S-protein. The richness of available structures for all of these situations provides an enormously valuable basis for future research into these structure-function relationships.
Affiliation(s)
- Rukmankesh Mehra
  - Department of Chemistry, Indian Institute of Technology Bhilai, Sejbahar, Raipur 492015, Chhattisgarh, India
- Kasper P. Kepp
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kongens Lyngby, Denmark
5
Vargas LDB, Beltrame MH, Ho B, Marin WM, Dandekar R, Montero-Martín G, Fernández-Viña MA, Hurtado AM, Hill KR, Tsuneto LT, Hutz MH, Salzano FM, Petzl-Erler ML, Hollenbach JA, Augusto DG. Remarkably low KIR and HLA diversity in Amerindians reveals signatures of strong purifying selection shaping the centromeric KIR region. Mol Biol Evol 2021; 39:6388041. [PMID: 34633459 PMCID: PMC8763117 DOI: 10.1093/molbev/msab298] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/14/2022] Open
Abstract
The killer-cell immunoglobulin-like receptors (KIR) recognize human leukocyte antigen (HLA) molecules to regulate the cytotoxic and inflammatory responses of natural killer cells. KIR genes are encoded by a rapidly evolving gene family on chromosome 19 and present an unusual variation of presence and absence of genes and high allelic diversity. Although many studies have associated KIR polymorphism with susceptibility to several diseases over the last decades, the high-resolution allele-level haplotypes have only recently started to be described in populations. Here, we use a highly innovative custom next-generation sequencing method that provides a state-of-the-art characterization of KIR and HLA diversity in 706 individuals from eight unique South American populations: five Amerindian populations from Brazil (three Guarani and two Kaingang); one Amerindian population from Paraguay (Aché); and two urban populations from Southern Brazil (European and Japanese descendants from Curitiba). For the first time, we describe complete high-resolution KIR haplotypes in South American populations, exploring copy number, linkage disequilibrium, and KIR-HLA interactions. We show that all Amerindians analyzed to date exhibit the lowest numbers of KIR-HLA interactions among all described worldwide populations, and that 83-97% of their KIR-HLA interactions rely on a few HLA-C molecules. Using multiple approaches, we found signatures of strong purifying selection on the KIR centromeric region, which codes for the strongest NK cell educator receptors, possibly driven by the limited HLA diversity in these populations. Our study expands the current knowledge of KIR genetic diversity in populations to understand KIR-HLA coevolution and its impact on human health and survival.
Affiliation(s)
- Luciana de Brito Vargas
  - Programa de Pós-Graduação em Genética, Departamento de Genética, Universidade Federal do Paraná, Curitiba, PR, 81531-980, Brazil
- Marcia H Beltrame
  - Programa de Pós-Graduação em Genética, Departamento de Genética, Universidade Federal do Paraná, Curitiba, PR, 81531-980, Brazil
- Brenda Ho
  - Weill Institute for Neurosciences, Department of Neurology, University of California, San Francisco, San Francisco, CA, 94158, USA
- Wesley M Marin
  - Weill Institute for Neurosciences, Department of Neurology, University of California, San Francisco, San Francisco, CA, 94158, USA
- Ravi Dandekar
  - Weill Institute for Neurosciences, Department of Neurology, University of California, San Francisco, San Francisco, CA, 94158, USA
- Gonzalo Montero-Martín
  - Department of Pathology, Stanford University School of Medicine, Palo Alto, CA, 94304, USA
- A Magdalena Hurtado
  - School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, 85287, USA
- Kim R Hill
  - School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, 85287, USA
- Luiza T Tsuneto
  - Departamento de Análises Clínicas, Universidade Estadual de Maringá, Maringá, PR, 87020-900, Brazil
- Mara H Hutz
  - Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, 91501-970, Brazil
- Francisco M Salzano
  - Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, 91501-970, Brazil
- Maria Luiza Petzl-Erler
  - Programa de Pós-Graduação em Genética, Departamento de Genética, Universidade Federal do Paraná, Curitiba, PR, 81531-980, Brazil
- Jill A Hollenbach
  - Weill Institute for Neurosciences, Department of Neurology, University of California, San Francisco, San Francisco, CA, 94158, USA
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, 94158, USA
- Danillo G Augusto
  - Programa de Pós-Graduação em Genética, Departamento de Genética, Universidade Federal do Paraná, Curitiba, PR, 81531-980, Brazil
  - Weill Institute for Neurosciences, Department of Neurology, University of California, San Francisco, San Francisco, CA, 94158, USA
6
Wangchuk J, Chatterjee A, Patil S, Madugula SK, Kondabagil K. The coevolution of large and small terminases of bacteriophages is a result of purifying selection leading to phenotypic stabilization. Virology 2021; 564:13-25. [PMID: 34598064 DOI: 10.1016/j.virol.2021.09.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2021] [Revised: 09/14/2021] [Accepted: 09/14/2021] [Indexed: 10/20/2022]
Abstract
Genome packaging in many dsDNA phages requires a series of precisely coordinated actions of two phage-coded proteins, namely, large terminase (TerL) and small terminase (TerS), with DNA and ATP, and with each other. Despite the strict functional conservation, TerL and TerS homologs exhibit large sequence variations. We investigated the sequence variability across eight phage types and observed a coevolutionary framework wherein the genealogy of TerL homologs mirrored that of the corresponding TerS homologs. Furthermore, the strong purifying selection observed (dN/dS ≪ 1) indicated strong structural constraints on both TerL and TerS, and we identified coevolving residues in TerL and TerS of phages T4 and lambda. Using the highly coevolving (correlation coefficient of 0.99) TerL and TerS of phage N4, we show that their biochemical features are similar to those of the phylogenetically divergent phage λ terminases. We also demonstrate using the surface plasmon resonance (SPR) technique that phage N4 TerL transiently interacts with TerS.
Affiliation(s)
- Jigme Wangchuk
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
- Anirvan Chatterjee
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
- Supriya Patil
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
- Santhosh Kumar Madugula
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
- Kiran Kondabagil
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
7
Latrille T, Lartillot N. Quantifying the impact of changes in effective population size and expression level on the rate of coding sequence evolution. Theor Popul Biol 2021; 142:57-66. [PMID: 34563555 DOI: 10.1016/j.tpb.2021.09.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/27/2021] [Revised: 09/08/2021] [Accepted: 09/11/2021] [Indexed: 02/07/2023]
Abstract
Molecular sequences are shaped by selection, where the strength of selection relative to drift is determined by effective population size (Ne). Populations with high Ne are expected to undergo stronger purifying selection, and consequently to show a lower substitution rate for selected mutations relative to the substitution rate for neutral mutations (ω). However, computational models based on biophysics of protein stability have suggested that ω can also be independent of Ne. Together, the response of ω to changes in Ne depends on the specific mapping from sequence to fitness. Importantly, an increase in protein expression level has been found empirically to result in a decrease of ω, an observation predicted by theoretical models assuming selection for protein stability. Here, we derive a theoretical approximation for the response of ω to changes in Ne and expression level, under an explicit genotype-phenotype-fitness map. The method is generally valid for additive traits and log-concave fitness functions. We applied these results to proteins undergoing selection for their conformational stability and corroborate our findings with simulations under more complex models. We predict a weak response of ω to changes in either Ne or expression level, which are interchangeable. Based on empirical data, we propose that fitness based on the conformational stability may not be a sufficient mechanism to explain the empirically observed variation in ω across species. Other aspects of protein biophysics might be explored, such as protein-protein interactions, which can lead to a stronger response of ω to changes in Ne.
Affiliation(s)
- T Latrille
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France
  - École Normale Supérieure de Lyon, Université de Lyon, Université Lyon 1, Lyon, France
- N Lartillot
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France
8
Caldararu O, Blundell TL, Kepp KP. Three Simple Properties Explain Protein Stability Change upon Mutation. J Chem Inf Model 2021; 61:1981-1988. [PMID: 33848149 DOI: 10.1021/acs.jcim.1c00201] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 11/29/2022]
Abstract
Accurate prediction of protein stability upon mutation enables rational engineering of new proteins and insights into protein evolution and monogenetic diseases caused by single-point amino acid substitutions. Many tools have been developed to this aim, ranging from energy-based models to machine-learning methods that use large amounts of experimental data. However, as the methods become more complex, the interpretation of the chemistry underlying the protein stability effects becomes obscure. It is thus of interest to identify the simplest prediction model that retains complete amino acid specific interpretation; for a given number of input descriptors, we expect such a model to be almost universal. In this study, we identify such a limiting model, SimBa, a simple multilinear regression model trained on a substitution-type-balanced experimental data set. The model accounts only for the solvent accessibility of the site, volume difference, and polarity difference caused by mutation. Our results show that this very simple and directly applicable model performs comparably to other much more complex, widely used protein stability prediction methods. This suggests that a hard limit of ∼1 kcal/mol numerical accuracy and an R ∼ 0.5 trend accuracy exists and that new features, such as account of unfolded states, water colocalization, and amino acid correlations, are required to improve accuracy to, e.g., 1/2 kcal/mol.
Affiliation(s)
- Octav Caldararu
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kgs. Lyngby, Denmark
- Tom L Blundell
  - Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, United Kingdom
- Kasper P Kepp
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kgs. Lyngby, Denmark
9
In Silico Molecular Docking Analysis of α-Pinene: An Antioxidant and Anticancer Drug Obtained from Myrtus communis. International Journal of Cancer Management 2021. [DOI: 10.5812/ijcm.89116] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
Background: Testis-specific protein on Y chromosome (TSPY) is the product of a tandem gene cluster. TSPY expression has been observed in gonadoblastoma and numerous distinct kinds of germ cell tumors, such as carcinoma in situ/intratubular germ cell neoplasia, seminoma, and extragonadal intracranial germ cell tumors (GCT). Myrtus communis extract rich in α-pinene showed high antioxidant and anticancer activity against TSPY. Methods: The molecular weight and theoretical isoelectric point of the TSPY proteins were calculated using the ExPASy ProtParam tool. MEGA 6, BioEdit, NEBcutter (New England Biolabs), and CAP3 were used to analyze clustering and find restriction enzyme sites on the TSPY sequence. To evaluate the nucleotide diversity of all sequences, the number of diverse situations and Tajima's and Watterson's estimators of theta were assessed. Nucleotide polymorphism can be measured by several parameters, such as haplotype diversity, nucleotide diversity, and theta, using DnaSP software. The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) was used to find protein-protein interaction networks and SWISS-MODEL to predict the 3D structure, whereas SwissDock, GalaxyWEB, and CABS-dock were employed for interaction-based protein-peptide docking. Results: We report a high (0.91) dN/dS index, positive Tajima's D and Fu and Li's tests, and a non-significant D test, suggesting the occurrence of old modifications or a decrease of newborn mutations in the TSPY gene family. Interestingly, several hub proteins produced a strong chain or an operative module within their protein groups, such as nucleosome assembly protein (1NAP1L), RBMXL2, TBL1Y, and AMELY, which are all associated with the same cellular appliance elements and/or genetic uses. Docking of the TSPY target with α-pinene revealed that the computationally predicted lowest-energy networks of TSPY are established by intermolecular hydrogen bonds and stacking interactions. Conclusions: The results of this study demonstrated that α-pinene interacts with the TSPY protein target and could be developed as a promising candidate for a new anticancer agent.
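The Methods of entry 9 mention nucleotide diversity and Watterson's estimator of theta. The sketch below shows the textbook definitions of both on a toy alignment; it is not the DnaSP implementation, the function names and sequences are hypothetical, and it assumes an ungapped alignment of equal-length sequences. Tajima's D is based on the difference between the two estimates, but its variance normalization is omitted here.

```python
from itertools import combinations


def segregating_sites(seqs: list[str]) -> int:
    """Number of alignment columns with more than one distinct base (S)."""
    return sum(1 for col in zip(*seqs) if len(set(col)) > 1)


def nucleotide_diversity(seqs: list[str]) -> float:
    """Pi: mean number of pairwise differences, per site."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / len(seqs[0])


def watterson_theta(seqs: list[str]) -> float:
    """Watterson's theta per site: S / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n / len(seqs[0])


# Hypothetical toy alignment: 4 sequences, 8 sites, 2 segregating sites.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
print("S        =", segregating_sites(toy))
print("pi       =", nucleotide_diversity(toy))
print("theta_W  =", watterson_theta(toy))
```

When pi exceeds Watterson's theta, Tajima's D is positive, which is the direction reported in the abstract's Results.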
10
Caldararu O, Blundell TL, Kepp KP. A base measure of precision for protein stability predictors: structural sensitivity. BMC Bioinformatics 2021; 22:88. [PMID: 33632133 PMCID: PMC7908712 DOI: 10.1186/s12859-021-04030-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/30/2020] [Accepted: 02/15/2021] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Prediction of the change in fold stability (ΔΔG) of a protein upon mutation is of major importance to protein engineering and screening of disease-causing variants. Many prediction methods can use 3D structural information to predict ΔΔG. While the performance of these methods has been extensively studied, a new problem has arisen due to the abundance of crystal structures: How precise are these methods in terms of structure input used, which structure should be used, and how much does it matter? Thus, there is a need to quantify the structural sensitivity of protein stability prediction methods. RESULTS We computed the structural sensitivity of six widely-used prediction methods by use of saturated computational mutagenesis on a diverse set of 87 structures of 25 proteins. Our results show that structural sensitivity varies massively and surprisingly falls into two very distinct groups, with methods that take detailed account of the local environment showing a sensitivity of ~0.6 to 0.8 kcal/mol, whereas machine-learning methods display much lower sensitivity (~0.1 kcal/mol). We also observe that the precision correlates with the accuracy for mutation-type-balanced data sets but not generally reported accuracy of the methods, indicating the importance of mutation-type balance in both contexts. CONCLUSIONS The structural sensitivity of stability prediction methods varies greatly and is caused mainly by the models and less by the actual protein structural differences. As a new recommended standard, we therefore suggest that ΔΔG values are evaluated on three protein structures when available and the associated standard deviation reported, to emphasize not just the accuracy but also the precision of the method in a specific study. Our observation that machine-learning methods deemphasize structure may indicate that folded wild-type structures alone, without the folded mutant and unfolded structures, only add modest value for assessing protein stability effects, and that side-chain-sensitive methods overstate the significance of the folded wild-type structure.
Affiliation(s)
- Octav Caldararu
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800, Kgs. Lyngby, Denmark
- Tom L Blundell
  - Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- Kasper P Kepp
  - DTU Chemistry, Technical University of Denmark, Building 206, 2800, Kgs. Lyngby, Denmark
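The Conclusions of entry 10 recommend evaluating ΔΔG on three protein structures and reporting the associated standard deviation. A minimal sketch of that reporting step follows. The structure labels and ΔΔG values are hypothetical placeholders rather than results from the paper, and using the sample (n - 1) standard deviation is an assumption, since the abstract does not say which estimator is meant.

```python
from statistics import mean, stdev


def summarize_ddg(ddg_by_structure: dict[str, float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of the ddG values, in kcal/mol,
    predicted for the same mutation on several structures of one protein."""
    values = list(ddg_by_structure.values())
    if len(values) < 2:
        raise ValueError("need at least two structures to estimate a spread")
    return mean(values), stdev(values)


# Hypothetical predictions for one mutation on three structures of one protein.
example = {"structure_A": 1.2, "structure_B": 0.8, "structure_C": 1.0}
avg, spread = summarize_ddg(example)
print(f"ddG = {avg:.2f} +/- {spread:.2f} kcal/mol (n = {len(example)})")
```

Reporting the spread next to the mean makes the precision of a predictor visible alongside its accuracy, which is the point the abstract argues for.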
11
Del Amparo R, Branco C, Arenas J, Vicens A, Arenas M. Analysis of selection in protein-coding sequences accounting for common biases. Brief Bioinform 2021; 22:6105943. [PMID: 33479739 DOI: 10.1093/bib/bbaa431] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/16/2020] [Revised: 12/17/2020] [Accepted: 12/22/2020] [Indexed: 12/16/2022] Open
Abstract
The evolution of protein-coding genes is usually driven by selective processes, which favor some evolutionary trajectories over others, optimizing the subsequent protein stability and activity. The analysis of selection in this type of genetic data is broadly performed with the metric nonsynonymous/synonymous substitution rate ratio (dN/dS). However, most of the well-established methodologies to estimate this metric make crucial assumptions, such as lack of recombination or invariable codon frequencies along genes, which can bias the estimation. Here, we review the most relevant biases in the dN/dS estimation and provide a detailed guide to estimate this metric using state-of-the-art procedures that account for such biases, along with illustrative practical examples and recommendations. We also discuss the traditional interpretation of the estimated dN/dS emphasizing the importance of considering complementary biological information such as the role of the observed substitutions on the stability and function of proteins. This review is oriented to help evolutionary biologists that aim to accurately estimate selection in protein-coding sequences.
Affiliation(s)
- Roberto Del Amparo
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Catarina Branco
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Jesús Arenas
  - Unit of Microbiology and Immunology, University of Zaragoza, 50013 Zaragoza, Spain
- Alberto Vicens
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Miguel Arenas
  - CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
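For readers less familiar with the metric reviewed in entry 11, the standard definition and its conventional reading can be written as follows. This is the textbook form, not a formula taken from the review; the symbol ω matches the notation used in entries 7 and 12.

```latex
\[
  \omega = \frac{d_N}{d_S},
  \qquad
  \begin{cases}
    \omega < 1 & \text{purifying (negative) selection} \\
    \omega \approx 1 & \text{neutral evolution} \\
    \omega > 1 & \text{positive (diversifying) selection}
  \end{cases}
\]
```

Here $d_N$ is the number of nonsynonymous substitutions per nonsynonymous site and $d_S$ the number of synonymous substitutions per synonymous site. As the abstract stresses, this reading can be biased when assumptions such as absence of recombination are violated.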
12
Liang Q, Shu F, Dong X, Feng P. The evolution of a bitter taste receptor gene in primates. Chem Senses 2021; 46:6449468. [PMID: 34864939 DOI: 10.1093/chemse/bjab049] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022] Open
Abstract
Bitter taste perception is critical to prevent animals from ingesting potentially harmful substances. The aim of this study was to characterize the evolution of T2R4 and test the hypothesis that different regions of the T2R gene are subject to disparate selective pressures, with extracellular regions (ECs) being erratic while transmembrane (TMs) and intracellular regions (ICs) being constrained. Thus, we examined the selective pressures acting on T2R4 and its different regions in 37 primates, and discovered that T2R4 and ECs were subject to neutral evolution and purifying selection, respectively, whereas both TMs and ICs showed purifying selection, as suggested by the hypothesis. We attribute this result to the relatively conservative property of T2R4 gene and the limited number of bitter tastants that T2R4 can respond to. Furthermore, we found that positive selection had acted on the first loop of extracellular regions (EL1). In contrast, the second loop (EL2) and transmembrane region-3, -6, -7 (TM367) were subject to purifying selection, and the third loop (EL3) was subject to neutral evolution. This discovery is probably because EL2, EL3, and TMs play a crucial role in the ligand-binding process, and EL1 is involved in the tastant recognition process. We further tested whether the ω of T2R4 differs among species with different diets and found that a specialized diet affected the evolution of T2R4. Feeding habits, fewer T2Rs, and a dietary shift may account for the results. This study can help to uncover the evolution of T2Rs during the primate evolutionary course.
Affiliation(s)
- Qiufang Liang
  - Key Laboratory of Ecology of Rare and Endangered Species and Environmental Protection (Guangxi Normal University), Ministry of Education, Guilin, Guangxi, China
  - Guangxi Key Laboratory of Rare and Endangered Animal Ecology, Guangxi Normal University, Guilin, Guangxi, China
- Fanglan Shu
  - Key Laboratory of Ecology of Rare and Endangered Species and Environmental Protection (Guangxi Normal University), Ministry of Education, Guilin, Guangxi, China
  - Guangxi Key Laboratory of Rare and Endangered Animal Ecology, Guangxi Normal University, Guilin, Guangxi, China
- Xiaoyan Dong
  - Key Laboratory of Ecology of Rare and Endangered Species and Environmental Protection (Guangxi Normal University), Ministry of Education, Guilin, Guangxi, China
  - Guangxi Key Laboratory of Rare and Endangered Animal Ecology, Guangxi Normal University, Guilin, Guangxi, China
- Ping Feng
  - Key Laboratory of Ecology of Rare and Endangered Species and Environmental Protection (Guangxi Normal University), Ministry of Education, Guilin, Guangxi, China
  - Guangxi Key Laboratory of Rare and Endangered Animal Ecology, Guangxi Normal University, Guilin, Guangxi, China
13
|
Abstract
Darwin's theory of evolution emphasized that positive selection of functional proficiency provides the fitness that ultimately determines the structure of life, a view that has dominated biochemical thinking of enzymes as perfectly optimized for their specific functions. The 20th-century modern synthesis, structural biology, and the central dogma explained the machinery of evolution, and nearly neutral theory explained how selection competes with random fixation dynamics that produce molecular clocks essential e.g. for dating evolutionary histories. However, quantitative proteomics revealed that selection pressures not relating to optimal function play much larger roles than previously thought, acting perhaps most importantly via protein expression levels. This paper first summarizes recent progress in the 21st century toward recovering this universal selection pressure. Then, the paper argues that proteome cost minimization is the dominant, underlying 'non-function' selection pressure controlling most of the evolution of already functionally adapted living systems. A theory of proteome cost minimization is described and argued to have consequences for understanding evolutionary trade-offs, aging, cancer, and neurodegenerative protein-misfolding diseases.
|
14
|
Arenas M, Bastolla U. ProtASR2: Ancestral reconstruction of protein sequences accounting for folding stability. Methods Ecol Evol 2020. [DOI: 10.1111/2041-210x.13341] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Affiliation(s)
- Miguel Arenas
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
  - Biomedical Research Center (CINBIO), University of Vigo, Vigo, Spain
- Ugo Bastolla
  - Bioinformatics Unit, Centre for Molecular Biology Severo Ochoa (CSIC), Madrid, Spain
|
15
|
Wei K, Ma L, Zhang T. Characterization of gene promoters in pig: conservative elements, regulatory motifs and evolutionary trend. PeerJ 2019; 7:e7204. [PMID: 31275764 PMCID: PMC6598670 DOI: 10.7717/peerj.7204] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/18/2019] [Accepted: 05/29/2019] [Indexed: 02/04/2023]
Abstract
It is vital to understand the conservation and evolution of gene promoter sequences in order to understand environmental adaptation. The level of promoter conservation varies greatly between housekeeping (HK) and tissue-specific (TS) genes, denoting differences in the strength of the evolutionary constraints. Here, we analyzed promoter conservation and evolution to exploit differential regulation between HK and TS genes. The analysis of conserved elements showed CpG islands, short tandem repeats and G-quadruplex sequences are highly enriched in HK promoters relative to TS promoters. In addition, the type and density of regulatory motifs in TS promoters are much higher than HK promoters, indicating that TS genes show more complex regulatory patterns than HK genes. Moreover, the evolutionary dynamics of promoters showed similar evolutionary trend to coding sequences. HK promoters suffer more stringent selective pressure in the long-term evolutionary process. HK genes tend to show increased upstream sequence conservation due to stringent selection pressures acting on the promoter regions. The specificity of TS gene expression may be due to complex regulatory motifs acting in different tissues or conditions. The results from this study can be used to deepen our understanding of adaptive evolution.
Affiliation(s)
- Kai Wei
  - College of Life Science, Shihezi University, Shihezi, Xinjiang, China
  - Center of Life and Food Sciences Weihenstephan, Technische Universität München, Freising, Bayern, Germany
- Lei Ma
  - College of Life Science, Shihezi University, Shihezi, Xinjiang, China
- Tingting Zhang
  - College of Life Science, Shihezi University, Shihezi, Xinjiang, China
|
16
|
Pascual-García A, Arenas M, Bastolla U. The Molecular Clock in the Evolution of Protein Structures. Syst Biol 2019; 68:987-1002. [DOI: 10.1093/sysbio/syz022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/08/2018] [Revised: 03/20/2019] [Accepted: 04/09/2019] [Indexed: 12/11/2022]
Abstract
The molecular clock hypothesis, which states that substitutions accumulate in protein sequences at a constant rate, plays a fundamental role in molecular evolution but it is violated when selective or mutational processes vary with time. Such violations of the molecular clock have been widely investigated for protein sequences, but not yet for protein structures. Here, we introduce a novel statistical test (Significant Clock Violations) and perform a large scale assessment of the molecular clock in the evolution of both protein sequences and structures in three large superfamilies. After validating our method with computer simulations, we find that clock violations are generally consistent in sequence and structure evolution, but they tend to be larger and more significant in structure evolution. Moreover, changes of function assessed through Gene Ontology and InterPro terms are associated with large and significant clock violations in structure evolution. We found that almost one third of significant clock violations are significant in structure evolution but not in sequence evolution, highlighting the advantage to use structure information for assessing accelerated evolution and gathering hints of positive selection. Clock violations between closely related pairs are frequently significant in sequence evolution, consistent with the observed time dependence of the substitution rate attributed to segregation of neutral and slightly deleterious polymorphisms, but not in structure evolution, suggesting that these substitutions do not affect protein structure although they may affect stability. These results are consistent with the view that natural selection, both negative and positive, constrains more strongly protein structures than protein sequences. Our code for computing clock violations is freely available at https://github.com/ugobas/Molecular_clock.
Affiliation(s)
- Alberto Pascual-García
  - Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
  - Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot, UK
  - Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
- Miguel Arenas
  - Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Spain
- Ugo Bastolla
  - Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
|
17
|
Dasmeh P, Serohijos AWR. Estimating the contribution of folding stability to nonspecific epistasis in protein evolution. Proteins 2018; 86:1242-1250. [DOI: 10.1002/prot.25588] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/28/2018] [Revised: 06/28/2018] [Accepted: 07/18/2018] [Indexed: 12/28/2022]
Affiliation(s)
- Pouria Dasmeh
  - Department of Biochemistry, University of Montreal, Montreal, Quebec, Canada
  - Cedergren Center for Bioinformatics and Genomics, University of Montreal, Montreal, Quebec, Canada
  - Department of Biochemistry and Institute for Data Valorization (IVADO), University of Montreal, Montreal, Quebec, Canada
- Adrian W. R. Serohijos
  - Department of Biochemistry, University of Montreal, Montreal, Quebec, Canada
  - Cedergren Center for Bioinformatics and Genomics, University of Montreal, Montreal, Quebec, Canada
|
18
|
Ansari S, Solouki M, Fakheri B, Fazeli-Nasab B, Mahdinezhad N. Assesment of molecular diversity of internal transcribed spacer region in some lines and landrace of Persian clover (Trifolium resupinatum L.). POTRAVINARSTVO 2018. [DOI: 10.5219/960] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
Clover, an herbaceous, annual, self-pollinated plant, belongs to the Fabaceae family (legumes) and has become naturalized in Iran, Asia Minor, and the countries bordering the eastern Mediterranean. The aim of the present study was a molecular evaluation of the internal transcribed spacer (ITS) region of the nuclear ribosomal genes in lines and landraces of Persian clover. The sequences were aligned with the ClustalW method in MegAlign software, and a dendrogram and a matrix of the relationships between the sequences were drawn. The results showed little genetic diversity between the lines and the landrace. The conserved sequence of the analyzed region in Persian clover is 561 bases long. In total, 740 loci (69 with and 671 without deletions and insertions), 9 singletons, and 5 haplotypes were identified. The highest transition rate was observed for pyrimidines (16.3%). The dN/dS ratio was 0.86; because it was less than 1, purifying selection is inferred to have acted on the studied region. The lines and landraces were not separated by geographic location. In general, the results indicated that the highest regional diversity belonged to the clover plants of the Lorestan region. Moreover, ITS markers did not seem suitable for evaluating intraspecific genetic variation, but they were well suited for interspecific or intergeneric evaluation.
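The dN/dS reasoning in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis: the function name `classify_omega` and the tolerance `tol` are assumptions introduced here, and a point estimate on its own does not establish selection without a formal test (for example a likelihood-ratio test).

```python
def classify_omega(omega: float, tol: float = 0.05) -> str:
    """Give the conventional reading of a dN/dS (omega) point estimate.

    `tol` is a hypothetical band around 1.0 treated as "neutral";
    it is an illustrative choice, not a statistical criterion.
    """
    if omega < 0:
        raise ValueError("omega must be non-negative")
    if abs(omega - 1.0) <= tol:
        return "neutral"
    # omega < 1: nonsynonymous changes are removed faster than synonymous ones
    # omega > 1: nonsynonymous changes are fixed faster than synonymous ones
    return "purifying" if omega < 1.0 else "positive"


# The value reported in the abstract (0.86) falls on the purifying side.
label = classify_omega(0.86)
```

Under these assumptions a ratio of 0.86 is labelled "purifying", but it sits much closer to 1 than a strongly constrained gene would, which is why a significance test matters before drawing conclusions.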
|
19
|
Abstract
Genotype-phenotype relationships are notoriously complicated. Idiosyncratic interactions between specific combinations of mutations occur and are difficult to predict. Yet it is increasingly clear that many interactions can be understood in terms of global epistasis. That is, mutations may act additively on some underlying, unobserved trait, and this trait is then transformed via a nonlinear function to the observed phenotype as a result of subsequent biophysical and cellular processes. Here we infer the shape of such global epistasis in three proteins, based on published high-throughput mutagenesis data. To do so, we develop a maximum-likelihood inference procedure using a flexible family of monotonic nonlinear functions spanned by an I-spline basis. Our analysis uncovers dramatic nonlinearities in all three proteins; in some proteins a model with global epistasis accounts for virtually all of the measured variation, whereas in others we find substantial local epistasis as well. This method allows us to test hypotheses about the form of global epistasis and to distinguish variance components attributable to global epistasis, local epistasis, and measurement error.
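The idea of global epistasis described in this abstract (additive effects on a latent trait, followed by a monotonic nonlinearity) can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the abstract describes an I-spline family fitted by maximum likelihood, whereas the logistic curve used here is only a stand-in, and all function names are hypothetical.

```python
import math


def latent_trait(effects, genotype):
    """Additive latent trait: sum of the effects of the mutations present (genotype entries are 0 or 1)."""
    return sum(e * g for e, g in zip(effects, genotype))


def logistic(phi, lower=0.0, upper=1.0):
    """Stand-in monotonic nonlinearity mapping the latent trait to the observed phenotype."""
    return lower + (upper - lower) / (1.0 + math.exp(-phi))


def pairwise_epistasis(effects, nonlinearity=logistic):
    """Deviation from additivity on the observed scale for a two-mutation system."""
    def phenotype(genotype):
        return nonlinearity(latent_trait(effects, genotype))

    return phenotype((1, 1)) - phenotype((1, 0)) - phenotype((0, 1)) + phenotype((0, 0))


# Two mutations that are perfectly additive on the latent scale...
eps_nonlinear = pairwise_epistasis([1.0, 1.0])
# ...show no epistasis when the map from latent trait to phenotype is the identity.
eps_identity = pairwise_epistasis([1.0, 2.0], nonlinearity=lambda phi: phi)
```

With the saturating logistic map the double mutant gains less than the sum of the single-mutant gains, so the apparent epistasis comes entirely from the shape of the nonlinearity rather than from any specific interaction between the two mutations.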
|
20
|
Wei K, Zhang T, Ma L. Divergent and convergent evolution of housekeeping genes in human-pig lineage. PeerJ 2018; 6:e4840. [PMID: 29844985 PMCID: PMC5971102 DOI: 10.7717/peerj.4840] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 12/04/2017] [Accepted: 05/03/2018] [Indexed: 11/27/2022]
Abstract
Housekeeping genes are ubiquitously expressed and maintain basic cellular functions across tissue/cell type conditions. The present study aimed to develop a set of pig housekeeping genes and compare the structure, evolution and function of housekeeping genes in the human–pig lineage. By using RNA sequencing data, we identified 3,136 pig housekeeping genes. Compared with human housekeeping genes, we found that pig housekeeping genes were longer and subjected to slightly weaker purifying selection pressure and faster neutral evolution. Common housekeeping genes, shared by the two species, achieve stronger purifying selection than species-specific genes. However, pig- and human-specific housekeeping genes have similar functions. Some species-specific housekeeping genes have evolved independently to form similar protein active sites or structure, such as the classical catalytic serine–histidine–aspartate triad, implying that they have converged for maintaining the basic cellular function, which allows them to adapt to the environment. Human and pig housekeeping genes have varied structures and gene lists, but they have converged to maintain basic cellular functions essential for the existence of a cell, regardless of its specific role in the species. The results of our study shed light on the evolutionary dynamics of housekeeping genes.
Affiliation(s)
- Kai Wei
  - College of Life Science, Shihezi University, Shihezi, Xinjiang, China
- Tingting Zhang
  - College of Life Science, Shihezi University, Shihezi, Xinjiang, China
- Lei Ma
  - College of Life Science, Shihezi University, Shihezi, Xinjiang, China
|
21
|
Adaptive evolution of osmoregulatory-related genes provides insight into salinity adaptation in Chinese mitten crab, Eriocheir sinensis. Genetica 2018; 146:303-311. [DOI: 10.1007/s10709-018-0021-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/12/2017] [Accepted: 04/30/2018] [Indexed: 12/18/2022]
|
22
|
Platt A, Weber CC, Liberles DA. Protein evolution depends on multiple distinct population size parameters. BMC Evol Biol 2018; 18:17. [PMID: 29422024 PMCID: PMC5806465 DOI: 10.1186/s12862-017-1085-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/25/2017] [Accepted: 11/20/2017] [Indexed: 01/08/2023]
Abstract
That population size affects the fate of new mutations arising in genomes, modulating both how frequently they arise and how efficiently natural selection is able to filter them, is well established. It is therefore clear that these distinct roles for population size that characterize different processes should affect the evolution of proteins and need to be carefully defined. Empirical evidence is consistent with a role for demography in influencing protein evolution, supporting the idea that functional constraints alone do not determine the composition of coding sequences. Given that the relationship between population size, mutant fitness and fixation probability has been well characterized, estimating fitness from observed substitutions is well within reach with well-formulated models. Molecular evolution research has, therefore, increasingly begun to leverage concepts from population genetics to quantify the selective effects associated with different classes of mutation. However, in order for this type of analysis to provide meaningful information about the intra- and inter-specific evolution of coding sequences, a clear definition of concepts of population size, what they influence, and how they are best parameterized is essential. Here, we present an overview of the many distinct concepts that “population size” and “effective population size” may refer to, what they represent for studying proteins, and how this knowledge can be harnessed to produce better specified models of protein evolution.
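The relationship between population size, mutant fitness and fixation probability mentioned in this abstract is commonly written using Kimura's diffusion approximation. The sketch below is illustrative and rests on stated assumptions: a new mutation with genic selection coefficient `s` in a diploid population whose effective size equals its census size `n`. The function name is hypothetical and does not come from the cited article.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation for a single new mutation in a diploid population.

    Assumes effective size == census size == n and initial frequency 1/(2n):
        P_fix = (1 - exp(-2 s)) / (1 - exp(-4 n s))
    """
    if n <= 0:
        raise ValueError("population size must be positive")
    if abs(s) < 1e-12:
        # Neutral limit: fixation probability equals the initial frequency.
        return 1.0 / (2 * n)
    exponent = -4.0 * n * s
    if exponent > 700.0:
        # Strongly deleterious relative to drift: probability underflows to zero.
        return 0.0
    # expm1 keeps precision when the exponents are small.
    return math.expm1(-2.0 * s) / math.expm1(exponent)


p_neutral = fixation_probability(0.0, 1000)
p_beneficial = fixation_probability(0.01, 1000)
p_deleterious = fixation_probability(-0.01, 1000)
```

The three example values show the two roles of population size that the abstract distinguishes: the neutral probability depends on `n` only through the initial frequency, while the efficiency of selection depends on the product of `n` and `s`.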
Affiliation(s)
- Alexander Platt
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, 19121, USA
- Claudia C Weber
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, 19121, USA
- David A Liberles
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, 19121, USA
|
23
|
Divergence of protein sensing (TLR 4, 5) and nucleic acid sensing (TLR 3, 7) within the reptilian lineage. Mol Phylogenet Evol 2017; 119:210-224. [PMID: 29196206 DOI: 10.1016/j.ympev.2017.11.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/05/2017] [Revised: 10/22/2017] [Accepted: 11/27/2017] [Indexed: 11/21/2022]
|
24
|
Dasmeh P, Girard É, Serohijos AWR. Highly expressed genes evolve under strong epistasis from a proteome-wide scan in E. coli. Sci Rep 2017; 7:15844. [PMID: 29158562 PMCID: PMC5696520 DOI: 10.1038/s41598-017-16030-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/10/2017] [Accepted: 11/06/2017] [Indexed: 11/11/2022]
Abstract
Epistasis or the non-additivity of mutational effects is a major force in protein evolution, but it has not been systematically quantified at the level of a proteome. Here, we estimated the extent of epistasis for 2,382 genes in E. coli using several hundreds of orthologs for each gene within the class Gammaproteobacteria. We found that the average epistasis is ~41% across genes in the proteome and that epistasis is stronger among highly expressed genes. This trend is quantitatively explained by the prevailing model of sequence evolution based on minimizing the fitness cost of protein unfolding and aggregation. The genes with the highest epistasis are also functionally involved in the maintenance of proteostasis, translation and central metabolism. In contrast, genes evolving with low epistasis mainly encode for membrane proteins and are involved in transport activity. Our results highlight the coupling between selection and epistasis in the long-term evolution of a proteome.
Affiliation(s)
- Pouria Dasmeh
  - Departement de Biochimie, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
  - Centre Robert Cedergren en Bioinformatique et Génomique, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
- Éric Girard
  - Departement de Biochimie, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
  - Centre Robert Cedergren en Bioinformatique et Génomique, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
- Adrian W R Serohijos
  - Departement de Biochimie, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
  - Centre Robert Cedergren en Bioinformatique et Génomique, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
|
25
|
Fan W, Xu Y, Zhang P, Chen P, Zhu Y, Cheng Z, Zhao X, Liu Y, Liu J. Analysis of molecular evolution of nucleocapsid protein in Newcastle disease virus. Oncotarget 2017; 8:97127-97136. [PMID: 29228598 PMCID: PMC5722550 DOI: 10.18632/oncotarget.21373] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 06/14/2017] [Accepted: 08/30/2017] [Indexed: 11/25/2022]
Abstract
The present study investigated the molecular evolution of nucleocapsid protein (NP) in different Newcastle disease virus (NDV) genotypes. The evolutionary timescale and rate were estimated using the Bayesian Markov chain Monte Carlo (MCMC) method. The p-distance, Bayesian skyline plot (BSP), and positively selected sites were also analyzed. The MCMC tree indicated that NDV diverged about 250 years ago with a rapid evolution rate (1.059 × 10⁻² substitutions/site/year) and that different NDV genotypes formed three lineages. The p-distance results reflected the great genetic diversity of NDV. BSP analysis suggested that the effective population size of NDV has been increasing since 2000 and that the basic reproductive number (R0) of NDV ranged from 1.003 to 1.006. The abundance of negatively selected sites in the NP and the mean dN/dS value of 0.07 indicated that the NP of NDV may have undergone purifying selection. However, the predicted positively selected site at position 370 was located in the known effective epitopic region of the NP. In conclusion, although NDV evolved at a high rate and showed great genetic diversity, the structure and function of the NP had been well conserved. However, R0>1 suggests that NDV might have been causing an epidemic since the time of radiation.
Affiliation(s)
- Wentao Fan
  - College of Animal Medicine and Veterinary Medicine, Shandong Agricultural University, Tai'an 271018, PR China
  - Shandong Provincial Engineering Technology Research Center of Animal Disease Control and Prevention, Shandong Agricultural University, Tai'an 271018, China
- Yuliang Xu
  - Research Center for Animal Disease Control Engineering Shandong Province, Shandong Agricultural University, Tai'an 271018, PR China
- Pu Zhang
  - Central Hospital of Tai'an City, Tai'an 271018, China
- Peng Chen
  - Research Center for Animal Disease Control Engineering Shandong Province, Shandong Agricultural University, Tai'an 271018, PR China
- Yiran Zhu
  - College of Animal Medicine and Veterinary Medicine, Shandong Agricultural University, Tai'an 271018, PR China
- Ziqiang Cheng
  - College of Animal Medicine and Veterinary Medicine, Shandong Agricultural University, Tai'an 271018, PR China
- Xiaona Zhao
  - College of Animal Medicine and Veterinary Medicine, Shandong Agricultural University, Tai'an 271018, PR China
- Yongxia Liu
  - College of Animal Medicine and Veterinary Medicine, Shandong Agricultural University, Tai'an 271018, PR China
- Jianzhu Liu
  - College of Animal Medicine and Veterinary Medicine, Shandong Agricultural University, Tai'an 271018, PR China
  - Research Center for Animal Disease Control Engineering Shandong Province, Shandong Agricultural University, Tai'an 271018, PR China
  - Shandong Provincial Engineering Technology Research Center of Animal Disease Control and Prevention, Shandong Agricultural University, Tai'an 271018, China
|
26
|
The Adaptive Evolution Database (TAED): A New Release of a Database of Phylogenetically Indexed Gene Families from Chordates. J Mol Evol 2017; 85:46-56. [PMID: 28795237 DOI: 10.1007/s00239-017-9806-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/19/2017] [Accepted: 08/03/2017] [Indexed: 12/11/2022]
Abstract
With the large collections of gene and genome sequences, there is a need to generate curated comparative genomic databases that enable interpretation of results in an evolutionary context. Such resources can facilitate an understanding of the co-evolution of genes in the context of a genome mapped onto a phylogeny, of a protein structure, and of interactions within a pathway. A phylogenetically indexed gene family database, the adaptive evolution database (TAED), is presented that organizes gene families and their evolutionary histories in a species tree context. Gene families include alignments, phylogenetic trees, lineage-specific dN/dS ratios, reconciliation with the species tree to enable both the mapping and the identification of duplication events, mapping of gene families onto pathways, and mapping of amino acid substitutions onto protein structures. In addition to organization of the data, new phylogenetic visualization tools have been developed to aid in interpreting the data that are also available, including TreeThrasher and TAED Tree Viewer. A new resource of gene families organized by species and taxonomic lineage promises to be a valuable comparative genomics database for molecular biologists, evolutionary biologists, and ecologists. The new visualization tools and database framework will be of interest to both evolutionary biologists and bioinformaticians.
|
27
|
Bastolla U, Dehouck Y, Echave J. What evolution tells us about protein physics, and protein physics tells us about evolution. Curr Opin Struct Biol 2017; 42:59-66. [DOI: 10.1016/j.sbi.2016.10.020] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 08/23/2016] [Revised: 10/19/2016] [Accepted: 10/24/2016] [Indexed: 12/21/2022]
|
28
|
Suvorov A, Jensen NO, Sharkey CR, Fujimoto MS, Bodily P, Wightman HMC, Ogden TH, Clement MJ, Bybee SM. Opsins have evolved under the permanent heterozygote model: insights from phylotranscriptomics of Odonata. Mol Ecol 2016; 26:1306-1322. [DOI: 10.1111/mec.13884] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 03/30/2016] [Revised: 09/24/2016] [Accepted: 10/04/2016] [Indexed: 02/04/2023]
Affiliation(s)
- Anton Suvorov
  - Department of Biology; Brigham Young University; Provo UT 84602 USA
- Paul Bodily
  - Computer Science Department; Brigham Young University; Provo UT 84602 USA
- T. Heath Ogden
  - Department of Biology; Utah Valley University; Orem UT 84058 USA
- Mark J. Clement
  - Computer Science Department; Brigham Young University; Provo UT 84602 USA
- Seth M. Bybee
  - Department of Biology; Brigham Young University; Provo UT 84602 USA
|
29
|
Bershtein S, Serohijos AW, Shakhnovich EI. Bridging the physical scales in evolutionary biology: from protein sequence space to fitness of organisms and populations. Curr Opin Struct Biol 2016; 42:31-40. [PMID: 27810574 DOI: 10.1016/j.sbi.2016.10.013] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 08/25/2016] [Accepted: 10/14/2016] [Indexed: 01/11/2023]
Abstract
Bridging the gap between the molecular properties of proteins and organismal/population fitness is essential for understanding evolutionary processes. This task requires the integration of the several physical scales of biological organization, each defined by a distinct set of mechanisms and constraints, into a single unifying model. The molecular scale is dominated by the constraints imposed by the physico-chemical properties of proteins and their substrates, which give rise to trade-offs and epistatic (non-additive) effects of mutations. At the systems scale, biological networks modulate protein expression and can either buffer or enhance the fitness effects of mutations. The population scale is influenced by the mutational input, selection regimes, and stochastic changes affecting the size and structure of populations, which eventually determine the evolutionary fate of mutations. Here, we summarize the recent advances in theory, computer simulations, and experiments that advance our understanding of the links between various physical scales in biology.
Affiliation(s)
- Shimon Bershtein
  - Department of Life Sciences, Ben-Gurion University of the Negev, Beer-Sheva 84501, Israel
- Adrian WR Serohijos
  - Département de Biochimie, Centre Robert-Cedergren en Bioinformatique & Génomique, Université de Montréal, Montréal, QC H3T 1J4, Canada
- Eugene I Shakhnovich
  - Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138, United States
|
30
|
Orlenko A, Teufel AI, Chi PB, Liberles DA. Selection on metabolic pathway function in the presence of mutation-selection-drift balance leads to rate-limiting steps that are not evolutionarily stable. Biol Direct 2016; 11:31. [PMID: 27393343 PMCID: PMC4938953 DOI: 10.1186/s13062-016-0133-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/21/2016] [Accepted: 07/02/2016] [Indexed: 11/15/2022]
Abstract
Background: While it is commonly assumed in the biochemistry community that the control of metabolic pathways is critical to cellular function, it is unclear if metabolic pathways generally have evolutionarily stable rate limiting (flux controlling) steps.
Results: A set of evolutionary simulations using a kinetic model of a metabolic pathway was performed under different conditions to evaluate the evolutionary stability of rate limiting steps. Simulations used combinations of selection for steady state flux, selection against the cost of molecular biosynthesis, and selection against the accumulation of high concentrations of a deleterious intermediate. Two mutational regimes were used, one with mutations that on average were neutral to molecular phenotype and a second with a preponderance of activity-destroying mutations. The evolutionary stability of rate limiting steps was low in all simulations with non-neutral mutational processes. Clustering of parameter co-evolution showed divergent inter-molecular evolutionary patterns under different evolutionary regimes.
Conclusions: This study provides a null model for pathway evolution when compensatory processes dominate with potential applications to predicting pathway functional change. This result also suggests a possible mechanism in which studies in statistical genetics that aim to associate a genotype to a phenotype assuming independent action of variants may be mis-specified through a mis-characterization of the link between individual gene function and pathway function. A better understanding of the genotype-phenotype map has potential applications in differentiating between compensatory changes and directional selection on pathways as well as detecting SNPs and fixed differences that might have phenotypic effects.
Reviewers: This article was reviewed by Arne Elofsson, David Ardell, and Shamil Sunyaev.
Affiliation(s)
- Alena Orlenko
  - Center for Computational Genetics and Genomics and Department of Biology, Temple University, Bio-Life Building, 1900 N. 12th Street, Philadelphia, PA, 19122-1801, USA
  - Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
- Ashley I Teufel
  - Center for Computational Genetics and Genomics and Department of Biology, Temple University, Bio-Life Building, 1900 N. 12th Street, Philadelphia, PA, 19122-1801, USA
  - Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
- Peter B Chi
  - Center for Computational Genetics and Genomics and Department of Biology, Temple University, Bio-Life Building, 1900 N. 12th Street, Philadelphia, PA, 19122-1801, USA
  - Department of Mathematics and Computer Science, Ursinus College, Collegeville, PA, 19426, USA
- David A Liberles
  - Center for Computational Genetics and Genomics and Department of Biology, Temple University, Bio-Life Building, 1900 N. 12th Street, Philadelphia, PA, 19122-1801, USA
  - Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
|
31
|
Chi PB, Liberles DA. Selection on protein structure, interaction, and sequence. Protein Sci 2016; 25:1168-78. [PMID: 26808055 PMCID: PMC4918422 DOI: 10.1002/pro.2886] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 11/12/2015] [Revised: 01/18/2016] [Accepted: 01/19/2016] [Indexed: 11/10/2022]
Abstract
Characterizing the probabilities of observing amino acid substitutions at specific sites in a protein over evolutionary time is a major goal in the field of molecular evolution. While purely statistical approaches at different levels of complexity exist, approaches rooted in underlying biological processes are necessary both to characterize the context-dependence of sequence changes (epistasis) and to extrapolate to sequences not observed in biological databases. To develop such approaches, an understanding of the different selective forces that act on amino acid substitution is necessary. Here, an overview is presented of selection on, and corresponding modeling of, folding stability, folding specificity, binding affinity and specificity for ligands, the evolution of new binding sites on protein surfaces, protein dynamics, intrinsic disorder, and protein aggregation, as well as the interplay with protein expression level (concentration) and biased mutational processes.
Affiliation(s)
- Peter B Chi
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122
- Department of Mathematics and Computer Science, Ursinus College, Collegeville, Pennsylvania, 19426
- David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122
|
32
|
Shahmoradi A, Wilke CO. Dissecting the roles of local packing density and longer-range effects in protein sequence evolution. Proteins 2016; 84:841-54. [PMID: 26990194 PMCID: PMC5292938 DOI: 10.1002/prot.25034] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 08/02/2015] [Revised: 02/01/2016] [Accepted: 02/24/2016] [Indexed: 11/07/2022]
Abstract
What are the structural determinants of protein sequence evolution? A number of site-specific structural characteristics have been proposed, most of which are broadly related to either the density of contacts or the solvent accessibility of individual residues. Most importantly, there has been disagreement in the literature over the relative importance of solvent accessibility and local packing density for explaining site-specific sequence variability in proteins. We show that this discussion has been confounded by the definition of local packing density. The most commonly used measures of local packing, such as contact number and the weighted contact number, represent the combined effects of local packing density and longer-range effects. As an alternative, we propose a truly local measure of packing density around a single residue, based on the Voronoi cell volume. We show that the Voronoi cell volume, when calculated relative to the geometric center of amino-acid side chains, behaves nearly identically to the relative solvent accessibility, and each individually can explain, on average, approximately 34% of the site-specific variation in evolutionary rate in a data set of 209 enzymes. An additional 10% of variation can be explained by nonlocal effects that are captured in the weighted contact number. Consequently, evolutionary variation at a site is determined by the combined effects of the immediate amino-acid neighbors of that site and effects mediated by more distant amino acids. We conclude that instead of contrasting solvent accessibility and local packing density, future research should emphasize the relative importance of immediate contacts and longer-range effects on evolutionary variation. Proteins 2016; 84:841-854. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Amir Shahmoradi
- Department of Physics, The University of Texas at Austin
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin
- Institute for Cellular and Molecular Biology, The University of Texas at Austin
- Claus O. Wilke
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin
- Institute for Cellular and Molecular Biology, The University of Texas at Austin
- Department of Integrative Biology, The University of Texas at Austin
|
33
|
Orlenko A, Hermansen RA, Liberles DA. Flux Control in Glycolysis Varies Across the Tree of Life. J Mol Evol 2016; 82:146-61. [PMID: 26920685 DOI: 10.1007/s00239-016-9731-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 10/26/2015] [Accepted: 02/17/2016] [Indexed: 11/29/2022]
Abstract
Biochemical thought posits that rate-limiting steps (defined here as points of flux control) are strongly selected as points of pathway regulation and control and are thus expected to be evolutionarily conserved. Conversely, population genetic thought based upon the concepts of mutation-selection-drift balance at the pathway level might suggest variation in flux controlling steps over evolutionary time. Glycolysis, as one of the most conserved and best characterized pathways, was studied to evaluate its evolutionary conservation. The flux controlling step in glycolysis was found to vary over the tree of life. Further, phylogenetic analysis suggested at least 60 events of gene duplication and additional events of putative positive selection that might alter pathway kinetic properties. Together, these results suggest that even with presumed largely negative selection on pathway output in glycolysis, the co-evolutionary process under the hood is dynamic.
Affiliation(s)
- Alena Orlenko
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA; Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
- Russell A Hermansen
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA; Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
- David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA; Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA
|
34
|
Selection maintaining protein stability at equilibrium. J Theor Biol 2016; 391:21-34. [DOI: 10.1016/j.jtbi.2015.12.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/31/2015] [Revised: 11/29/2015] [Accepted: 12/01/2015] [Indexed: 11/24/2022]
|
35
|
Hermansen RA, Mannakee BK, Knecht W, Liberles DA, Gutenkunst RN. Characterizing selective pressures on the pathway for de novo biosynthesis of pyrimidines in yeast. BMC Evol Biol 2015; 15:232. [PMID: 26511837 PMCID: PMC4625875 DOI: 10.1186/s12862-015-0515-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 08/12/2015] [Accepted: 10/20/2015] [Indexed: 12/05/2022] Open
Abstract
Background Selection on proteins is typically measured with the assumption that each protein acts independently. However, selection more likely acts at higher levels of biological organization, requiring an integrative view of protein function. Here, we built a kinetic model for de novo pyrimidine biosynthesis in the yeast Saccharomyces cerevisiae to relate pathway function to selective pressures on individual protein-encoding genes. Results Gene families across yeast were constructed for each member of the pathway and the ratio of nonsynonymous to synonymous nucleotide substitution rates (dN/dS) was estimated for each enzyme from S. cerevisiae and closely related species. We found a positive relationship between the influence that each enzyme has on pathway function and its selective constraint. Conclusions We expect this trend to be locally present for enzymes that have pathway control, but over longer evolutionary timescales we expect that mutation-selection balance may change the enzymes that have pathway control. Electronic supplementary material The online version of this article (doi:10.1186/s12862-015-0515-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Russell A Hermansen
- Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA; Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
- Brian K Mannakee
- Division of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ, 85721, USA
- Wolfgang Knecht
- Department of Biology and Lund Protein Production Platform, Lund University, 22362, Lund, Sweden
- David A Liberles
- Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA; Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
- Ryan N Gutenkunst
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, 85721, USA
|
36
|
Padhi A, Ma L. A testis-specific gene within a widely expressed gene: Contrasting evolutionary patterns of two differentially expressed mammalian proteins encoded by a single gene, CAMK4. Anim Genet 2015; 46:683-92. [PMID: 26388303 DOI: 10.1111/age.12358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/19/2015] [Indexed: 11/28/2022]
Abstract
Understanding the patterns of genetic variation within fertility-related genes and the evolutionary forces that shape such variation is crucial in predicting the fitness landscapes of subsequent generations. This study reports distinct evolutionary features of two differentially expressed mammalian proteins [CaMKIV (Ca²⁺/calmodulin-dependent protein kinase IV) and CaS (calspermin)] that are encoded by a single gene, CAMK4. The multifunctional CaMKIV, which is expressed in multiple tissues including testis and ovary, is evolving at a relatively low rate (0.46-0.64 × 10⁻⁹ nucleotide substitutions/site/year), whereas the testis-specific CaS gene, which is predominantly expressed in post-meiotic cells, evolves at least three to four times faster (1.48-1.98 × 10⁻⁹ substitutions/site/year). Concomitantly, maximum-likelihood-based selection analyses revealed that the ubiquitously expressed CaMKIV is constrained by intense purifying selection and has therefore remained functionally highly conserved throughout mammalian evolution, whereas the testis-specific CaS gene is under strong positive selection. The substitution rates of different mammalian lineages within both genes are positively correlated with GC content, indicating the possible influence of GC-biased gene conversion on the estimated substitution rates. The unusually high GC content of the CaS gene (≈74%), particularly in the lineage that comprises the bovine species, suggests a possible role of GC-biased gene conversion in the evolution of CaS that mimics positive selection.
Affiliation(s)
- Abinash Padhi
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
- Li Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
|
37
|
Padhi A, Ma L. Time-dependent selection pressure on two arthropod-borne RNA viruses in the same serogroup. Infection, Genetics and Evolution 2015; 32:255-64. [PMID: 25801608 DOI: 10.1016/j.meegid.2015.03.019] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/16/2015] [Revised: 03/11/2015] [Accepted: 03/15/2015] [Indexed: 12/20/2022]
Abstract
Understanding the genetic basis of viral adaptation to taxonomically diverse groups of host species inhabiting different eco-climatic zones is crucial for the discovery of factors underpinning the successful establishment of these infectious pathogens in new hosts/environments. To gain insights into the dynamics of nonsynonymous (dN) and synonymous (dS) substitutions and the ratio between the two (ω = dN/dS), we analyzed the complete nucleotide coding sequence data of the M segment, which encodes the glycoproteins, of two negative-sense RNA viruses that belong to the same serogroup, Akabane virus (AKV) and Schmallenberg virus (SBV). While AKV is relatively older and has been circulating in ruminant populations since the 1970s, SBV was first reported in 2011. The ω was estimated to be 1.67 and 0.09 for SBV and AKV, respectively, and the estimated mutation rate of SBV is at least 25 times higher than that of AKV. Given the different evolutionary stages of the two viruses, most of the slightly deleterious mutations were likely purged or kept at low frequency in the AKV genome, whereas positive selection together with the accumulation of slightly deleterious mutations might contribute to the inflated mutation rate of SBV. The evolutionary distance (d) is nonlinearly and negatively correlated with ω, but is positively correlated with dN and dS. Collectively, the different patterns in ω, dN, dS, and d between AKV and SBV identified in this study provide empirical evidence for a time-dependent selection pressure.
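As a reading aid for the ω values quoted in this abstract, the ratio ω = dN/dS and its conventional interpretation can be sketched as follows. This is a minimal illustration, not the authors' method: the function names, the tolerance band around ω = 1, and the example dN and dS inputs are all hypothetical, and real analyses estimate ω with codon-substitution models rather than from a simple quotient.

```python
def omega(dn: float, ds: float) -> float:
    """Return the ratio omega = dN/dS for given substitution rates."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def selection_regime(w: float, tol: float = 0.05) -> str:
    """Map omega to the conventional interpretation.

    `tol` is an illustrative band around 1 (an assumption, not a standard).
    """
    if w > 1 + tol:
        return "positive (diversifying) selection"
    if w < 1 - tol:
        return "purifying selection"
    return "approximately neutral"


# Hypothetical dN and dS values, used only to show the quotient.
example_ratio = omega(0.2, 0.4)

# Omega estimates reported in the abstract for the two viruses.
reported = {"SBV": 1.67, "AKV": 0.09}
for virus, w in reported.items():
    print(f"{virus}: omega = {w} -> {selection_regime(w)}")
```

Under this convention, the reported value for SBV falls above 1 and the value for AKV well below 1, which matches the contrast the abstract draws between the two viruses.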
Affiliation(s)
- Abinash Padhi
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
- Li Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
|