1
Gupta MK, Vadde R. Next-generation development and application of codon model in evolution. Front Genet 2023; 14:1091575. [PMID: 36777719 PMCID: PMC9911445 DOI: 10.3389/fgene.2023.1091575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 01/17/2023] [Indexed: 01/28/2023]
Abstract
To date, numerous nucleotide, amino acid, and codon substitution models have been developed to estimate the evolutionary history of sequences and organisms more comprehensively. Of these three classes, the codon substitution model is the most powerful. Codon models have been used extensively to detect selective pressure on proteins, to study codon usage bias, and for ancestral and phylogenetic reconstruction. However, because they are more computationally demanding than nucleotide and amino acid substitution models, only a few studies have employed codon substitution models to understand the heterogeneity of the evolutionary process in genome-scale analyses. Hence, an open question is how to develop codon substitution models that are more robust yet less computationally demanding and that give more accurate results. In this review article, the authors attempt to explain the basis on which different types of codon substitution models were developed and how this information can be used to develop more robust but less computationally demanding models. Codon substitution models can detect the selection regime under which a gene or gene region is evolving, codon usage bias in an organism or tissue-specific region, and phylogenetic relationships between lineages more accurately than nucleotide and amino acid substitution models. Thus, in the near future, these codon models could be applied in conservation, breeding and medicine.
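To make concrete what a codon substitution model parameterises, the sketch below computes a relative substitution rate between two codons. It assumes a GY94-style parameterisation (single-nucleotide changes only, a transition/transversion ratio `kappa`, a nonsynonymous/synonymous ratio `omega`, and a target codon frequency `pi_dst`); this is one common formulation chosen for illustration, not necessarily the one the review discusses, and all names are hypothetical.

```python
# Sketch of a GY94-style codon rate (an assumed parameterisation, for illustration only).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order at each position.
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
PURINES = {"A", "G"}


def codon_rate(src, dst, kappa, omega, pi_dst):
    """Relative instantaneous rate from codon src to codon dst."""
    if "*" in (CODE[src], CODE[dst]):
        return 0.0  # stop codons are excluded from the state space
    diffs = [(x, y) for x, y in zip(src, dst) if x != y]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes have a nonzero rate
    x, y = diffs[0]
    rate = pi_dst
    if (x in PURINES) == (y in PURINES):
        rate *= kappa  # transition (purine<->purine or pyrimidine<->pyrimidine)
    if CODE[src] != CODE[dst]:
        rate *= omega  # nonsynonymous change
    return rate
```

Under this formulation, `omega` greater than 1 at a site or on a branch is read as positive selection, `omega` near 1 as neutrality, and `omega` below 1 as purifying selection.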
2
Adaptive evolution of peptidoglycan recognition protein family regulates the innate signaling against microbial pathogens in vertebrates. Microb Pathog 2020; 147:104361. [PMID: 32622926 DOI: 10.1016/j.micpath.2020.104361] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/05/2020] [Revised: 04/28/2020] [Accepted: 06/22/2020] [Indexed: 12/16/2022]
Abstract
The innate immune system is the first line of defense in vertebrates against microbial pathogens. This defense system depends on peptidoglycan recognition proteins (PGRPs), which exist in both invertebrates and vertebrates. Although some studies have revealed structural and functional differences between them, the evolutionary history of these genes and the selection pressures acting on them during adaptive evolution are poorly understood. In this study, we examined four genes (PGLYRP1, PGLYRP2, PGLYRP3, and PGLYRP4) that are conserved across vertebrates, in 127 vertebrate species, to evaluate positive selection pressure driven by adaptive evolution. Codons under positive selection were identified through likelihood ratio tests comparing different models based on ω ratios across the vertebrate species. The positive selection test used two pairs of models, M1a vs. M2a and M7 vs. M8. The M1a vs. M2a test was not significant for these genes, with a likelihood ratio statistic of 2ΔlnL = 0, whereas in M7 vs. M8 the statistics were 2ΔlnL = 12.386, 4.9283, 24.031, and 103.39 for PGLYRP1, PGLYRP2, PGLYRP3, and PGLYRP4, respectively. Our study identified evidence of robust positive selection for these four genes across the vertebrates. These prominent changes in the evolution of vertebrate PGRPs reveal their role in innate immunity. Our study provides PGRP-based insight into the evolution of host-pathogen interactions, which may support the development of novel approaches to immune diseases that involve proteins linked to pathogen recognition.
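The M7 vs. M8 comparison above can be sketched as a likelihood ratio test. The block below assumes the statistic is compared against a chi-square distribution with 2 degrees of freedom, which is the conventional choice for this model pair; the abstract does not state which reference distribution the authors used, so treat this as an illustration rather than a reproduction of their analysis. The function and variable names are hypothetical.

```python
import math


def lrt_pvalue_df2(stat):
    """P-value for a likelihood ratio statistic under chi-square with 2 d.f."""
    # With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    return math.exp(-stat / 2.0)


# 2ΔlnL values for M7 vs. M8 as reported in the abstract.
reported = {"PGLYRP1": 12.386, "PGLYRP2": 4.9283,
            "PGLYRP3": 24.031, "PGLYRP4": 103.39}

# 5% critical value for 2 d.f., obtained by inverting exp(-x / 2) = 0.05.
CRITICAL_5PCT_DF2 = -2.0 * math.log(0.05)

for gene, stat in reported.items():
    p = lrt_pvalue_df2(stat)
    print(f"{gene}: 2ΔlnL = {stat}, p = {p:.3g}, "
          f"exceeds 5% critical value: {stat > CRITICAL_5PCT_DF2}")
```

Under this assumption the 5% critical value is about 5.99, so the reported statistic for PGLYRP2 (4.9283) would fall just below it, while the other three exceed it.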
3
Adaptive Molecular Evolution of AKT3 Gene for Positive Diversifying Selection in Mammals. BIOMED RESEARCH INTERNATIONAL 2020; 2020:2584627. [PMID: 32550227 PMCID: PMC7256775 DOI: 10.1155/2020/2584627] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/25/2019] [Revised: 01/12/2020] [Accepted: 02/14/2020] [Indexed: 01/17/2023]
Abstract
The V-Akt Murine Thymoma Viral Oncogene Homolog 3 (AKT3) gene belongs to the serine/threonine-protein kinase family and influences the production of milk fats and cholesterol by acting on the sterol regulatory element-binding protein (SREBP). The AKT3 gene is highly conserved in animals, and its expression increases during lactation in cattle. AKT3 is expressed in the digestive system, mammary gland, and immune cells. A phylogenetic analysis by maximum likelihood was performed to clarify the evolutionary role of AKT3. The AKT3 gene sequence data of various mammalian species was evident even with animals undergoing breeding selection. Across the 39 mammalian species studied, there was a signal of positive diversifying selection with Hominidae at 13Q, 16G, 23R, 24P, 121P, 294K, 327V, 376L, 397K, 445T, and 471F, among other codon sites of the AKT3 gene. These sites code for amino acids such as arginine, proline, lysine, and leucine, indicating major roles in the function of immunological proteins; in particular, the study highlights the importance of changes in AKT3 gene expression for immunity.
4
Adaptation to host-specific bacterial pathogens drive rapid evolution of novel PhoP/PhoQ regulation pathway modulating the virulence. Microb Pathog 2020; 141:103997. [PMID: 31982569 DOI: 10.1016/j.micpath.2020.103997] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/20/2019] [Revised: 01/14/2020] [Accepted: 01/22/2020] [Indexed: 01/18/2023]
Abstract
The presence of the PhoP-PhoQ system usually differs among bacterial groups, suggesting that PhoP can control the expression of different genes in different species. However, little is known about the evolution of the PhoP-PhoQ system among bacterial pathogens. Here, we study the evolution of PhoP and PhoQ regulation in 15 species of the Enterobacteriaceae family. We determined that the regulatory targets of PhoP and PhoQ largely differ, as a result of horizontal gene transfer events and even of changes in gene content between closely related species. We compared several likelihood ratio tests (M1 vs. M2 and M7 vs. M8) to detect positive selection. Parameter estimates under M1 and M2 indicated positive selection under M2 for both proteins, with significant proportions of positively selected sites and ω = 4.53076 for PhoP and ω = 4.21041 for PhoQ. M8 was also significant for the PhoP and PhoQ proteins. To further confirm the positive selection results, we used the Selecton server to infer positive selection at individual sites with the Mechanistic-Empirical Combination model, and several sites were identified as being under selection pressure during evolution. The results strongly indicated positive selection in the bacterial PhoP and PhoQ genes. Using REL and IFEL, positive selection in PhoP was detected at 14 and 11 codon sites, respectively. The positively selected sites encode amino acids such as arginine, alanine, lysine, and leucine, which are important for signal production. Our results suggest that positive selection on the PhoP-PhoQ genes during host adaptation may cause subtle variations in the actions of PhoP-PhoQ and increase the opportunities for changes in protein structure that drive the evolution of increased pathogenicity in bacterial pathogens.
5
Jones CT, Youssef N, Susko E, Bielawski JP. Phenomenological Load on Model Parameters Can Lead to False Biological Conclusions. Mol Biol Evol 2019; 35:1473-1488. [PMID: 29596684 DOI: 10.1093/molbev/msy049] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/15/2022]
Abstract
When a substitution model is fitted to an alignment using maximum likelihood, its parameters are adjusted to account for as much site-pattern variation as possible. A parameter might therefore absorb a substantial quantity of the total variance in an alignment (or more formally, bring about a substantial reduction in the deviance of the fitted model) even if the process it represents played no role in the generation of the data. When this occurs, we say that the parameter estimate carries phenomenological load (PL). Large PL in a parameter estimate is a concern because it not only invalidates its mechanistic interpretation (if it has one) but also increases the likelihood that it will be found to be statistically significant. The problem of PL was not identified in the past because most off-the-shelf substitution models make simplifying assumptions that preclude the generation of realistic levels of variation. In this study, we use the more realistic mutation-selection framework as the basis of a generating model formulated to produce data that mimic an alignment of mammalian mitochondrial DNA. We show that a parameter estimate can carry PL when 1) the substitution model is underspecified and 2) the parameter represents a process that is confounded with other processes represented in the data-generating model. We then provide a method that can be used to identify signal for the process that a given parameter represents despite the existence of PL.
Affiliation(s)
- Christopher T Jones
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
- Noor Youssef
  - Department of Biology, Dalhousie University, Halifax, NS, Canada
- Edward Susko
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
6
Rizzato F, Rodriguez A, Biarnés X, Laio A. Predicting Amino Acid Substitution Probabilities Using Single Nucleotide Polymorphisms. Genetics 2017; 207:643-652. [PMID: 28754661 PMCID: PMC5629329 DOI: 10.1534/genetics.117.300078] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/23/2016] [Accepted: 07/18/2017] [Indexed: 11/18/2022]
Abstract
Fast genome sequencing offers invaluable opportunities for building updated and improved models of protein sequence evolution. We here show that Single Nucleotide Polymorphisms (SNPs) can be used to build a model capable of predicting the probability of substitution between amino acids in variants of the same protein in different species. The model is based on a substitution matrix inferred from the frequency of codon interchanges observed in a suitably selected subset of human SNPs, and predicts the substitution probabilities observed in alignments between Homo sapiens and related species at 85-100% of sequence identity better than any other approach we are aware of. The model gradually loses its predictive power at lower sequence identity. Our results suggest that SNPs can be employed, together with multiple sequence alignment data, to model protein sequence evolution. The SNP-based substitution matrix developed in this work can be exploited to better align protein sequences of related organisms, to refine the estimate of the evolutionary distance between protein variants from related species in phylogenetic trees and, in perspective, might become a useful tool for population analysis.
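As a rough illustration of turning observed interchange counts into substitution probabilities, the sketch below row-normalises a table of (from, to) amino acid counts. The counts are invented for the example, and the abstract does not describe the authors' actual estimation procedure or SNP selection criteria, so this is only the simplest possible reading of "a substitution matrix inferred from the frequency of codon interchanges".

```python
from collections import defaultdict


def substitution_probabilities(pair_counts):
    """Row-normalise observed (from, to) amino acid counts into probabilities."""
    totals = defaultdict(float)
    for (src, _dst), n in pair_counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src]
            for (src, dst), n in pair_counts.items()
            if totals[src] > 0}


# Hypothetical counts of amino acid interchanges implied by a set of SNPs.
example_counts = {("A", "V"): 30, ("A", "T"): 50, ("A", "S"): 20,
                  ("V", "A"): 25, ("V", "I"): 75}
probabilities = substitution_probabilities(example_counts)
```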
Affiliation(s)
- Francesca Rizzato
  - Scuola Internazionale Superiore di Studi Avanzati (SISSA), 34136 Trieste, Italy
- Alex Rodriguez
  - Scuola Internazionale Superiore di Studi Avanzati (SISSA), 34136 Trieste, Italy
- Xevi Biarnés
  - Laboratory of Biochemistry, Institut Químic de Sarrià (IQS), Universitat Ramon Llull (URL), 08017 Barcelona, Spain
- Alessandro Laio
  - Scuola Internazionale Superiore di Studi Avanzati (SISSA), 34136 Trieste, Italy
  - The Abdus Salam International Centre for Theoretical Physics (ICTP), 34151 Trieste, Italy
7
Ahmad HI, Liu G, Jiang X, Liu C, Chong Y, Huarong H. Adaptive molecular evolution of MC1R gene reveals the evidence for positive diversifying selection in indigenous goat populations. Ecol Evol 2017; 7:5170-5180. [PMID: 28770057 PMCID: PMC5528238 DOI: 10.1002/ece3.2919] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 09/30/2016] [Revised: 02/10/2017] [Accepted: 02/13/2017] [Indexed: 12/16/2022]
Abstract
Detecting signatures of selection can provide new insight into the mechanisms of contemporary breeding and artificial selection and can reveal the causal genes associated with phenotypic variation. However, signatures of selection on genes underlying profitable traits in Chinese commercial and indigenous goats have been poorly characterized. We detected footprints of positive selection at the MC1R gene using SNPs genotyped in five Chinese native goat breeds. An empirical distribution of FST was built from estimates of FST for each SNP across the five breeds. We identified selection using the high-FST outlier method and found that the MC1R candidate gene shows evidence of positive selection. Furthermore, adaptive selection pressure on specific codons was assessed using different codon-based maximum‐likelihood methods, and signatures of positive selection in mammalian MC1R were explored at individual codons. Evolutionary analyses were performed under maximum-likelihood models using the HyPhy package as implemented in the Datamonkey web server. The codon-level results showed positive diversifying selection at sites mainly involved in the development of genetic variation in coat color in various mammalian species. Positive diversifying selection, inferred together with recent evolutionary changes in domesticated goat MC1R, suggests that the evolution of the gene may have been modulated by domestication events in goats.
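The high-FST outlier idea described above can be sketched as follows. This is a minimal illustration that assumes Nei's FST for biallelic SNPs with equally weighted populations and a simple empirical quantile cutoff; the abstract does not say which FST estimator or threshold the authors used, and the function names and the 0.95 default are hypothetical.

```python
def nei_fst(freqs):
    """Nei's F_ST for one biallelic SNP from per-population allele frequencies."""
    p_bar = sum(freqs) / len(freqs)
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # monomorphic across all populations
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


def high_fst_outliers(snp_freqs, quantile=0.95):
    """Return SNPs at or above an empirical F_ST quantile, plus the cutoff."""
    fst = {snp: nei_fst(freqs) for snp, freqs in snp_freqs.items()}
    ranked = sorted(fst.values())
    # Index of the chosen quantile in the sorted empirical distribution.
    cutoff = ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]
    return {snp: value for snp, value in fst.items() if value >= cutoff}, cutoff
```

A call such as `high_fst_outliers({"snp1": [0.5, 0.5, 0.4, 0.6, 0.5], ...})` takes one list of allele frequencies per SNP, with one entry per breed. Outlier status alone does not prove selection, since demography can also inflate FST at individual loci.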
Affiliation(s)
- Hafiz Ishfaq Ahmad
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of the Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
- Guiqiong Liu
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of the Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
- Xunping Jiang
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of the Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
- Chenhui Liu
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of the Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
- Yuqing Chong
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of the Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
- Huang Huarong
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of the Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
8
Miyazawa S. Superiority of a mechanistic codon substitution model even for protein sequences in phylogenetic analysis. BMC Evol Biol 2013; 13:257. [PMID: 24256155 PMCID: PMC4225520 DOI: 10.1186/1471-2148-13-257] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/21/2013] [Accepted: 11/14/2013] [Indexed: 11/25/2022]
Abstract
Background: Nucleotide and amino acid substitution tendencies are characteristic of each species, organelle, and protein family. Hence, various empirical amino acid substitution rate matrices have needed to be estimated for phylogenetic analysis: JTT, WAG, and LG for nuclear proteins, mtREV for mitochondrial proteins, cpREV10 and cpREV64 for chloroplast-encoded proteins, and FLU for influenza proteins. On the other hand, in a mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the ratio of fixation depending on the type of amino acid replacement, mutation rates and the strength of selective constraint on amino acids can be tailored to each protein family with additional 11 parameters. As a result, in the evolutionary analysis of codon sequences it outperforms codon substitution models equivalent to empirical amino acid substitution matrices. Is it superior even for amino acid sequences, among which synonymous substitutions cannot be identified?
Results: Nucleotide mutations are assumed to occur independently of codon positions but multiple nucleotide changes in infinitesimal time are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene with a linear function of a given estimate of selective constraints, which were estimated by maximizing the likelihood of an empirical amino acid or codon substitution frequency matrix, each of JTT, WAG, LG, and KHG. It is shown that the mechanistic codon substitution model with the assumption of equal codon usage yields better values of Akaike and Bayesian information criteria for all three phylogenetic trees of mitochondrial, chloroplast, and influenza-A hemagglutinin proteins than the empirical amino acid substitution models with mtREV, cpREV64, and FLU, which were designed specifically for those protein families, respectively. The variation of selective constraint across sites fits the datasets significantly better than variable codon mutation rates, confirming that substitution rate variations across sites detected by amino acid substitution models are caused primarily by the variation of selective constraint against amino acid substitutions rather than the variation of codon mutation rate.
Conclusions: The mechanistic codon substitution model is superior to amino acid substitution models even in the evolutionary analysis of protein sequences.
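The model comparison by Akaike and Bayesian information criteria mentioned above follows the standard definitions, sketched below. The log-likelihoods, parameter counts, and number of sites in the example are invented for illustration and are not values from the paper.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; n_obs is the number of observations (sites)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical fits of two models to the same alignment of 500 sites.
fits = {"empirical_matrix": (-10250.0, 0), "mechanistic_codon": (-10180.0, 11)}
for name, (lnl, k) in fits.items():
    print(f"{name}: AIC = {aic(lnl, k):.1f}, BIC = {bic(lnl, k, 500):.1f}")
```

Both criteria penalise extra parameters, so a model with 11 additional parameters is preferred only if its likelihood gain outweighs that penalty.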
9
Abstract
Paired epistatic interactions, such as those in the stem regions of RNA, play an important role in many biological processes. However, unlike protein-coding regions, paired epistatic interactions have lacked the appropriate statistical tools for the detection of departures from selective neutrality. Here, a model is presented for the analysis of paired epistatic regions that draws upon the population genetics of the compensatory substitution process to detect the relative strength of natural selection acting against deleterious combinations of alleles. The method is based upon the relative rates of double and single substitution, and can differentiate between nonindependent interactions and negatively epistatic ones. The model is implemented in a fully Bayesian framework for parameter estimation and is demonstrated using a 5S rRNA data set. In addition to the detection of selection, modeling the double and single substitution processes in this manner inherently accounts for a substantial proportion of rate variation among stem positions.
10
Miyazawa S. Prediction of contact residue pairs based on co-substitution between sites in protein structures. PLoS One 2013; 8:e54252. [PMID: 23342110 PMCID: PMC3546969 DOI: 10.1371/journal.pone.0054252] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 08/28/2012] [Accepted: 12/10/2012] [Indexed: 11/18/2022]
Abstract
Residue-residue interactions that fold a protein into a unique three-dimensional structure and make it play a specific function impose structural and functional constraints in varying degrees on each residue site. Selective constraints on residue sites are recorded in amino acid orders in homologous sequences and also in the evolutionary trace of amino acid substitutions. A challenge is to extract direct dependences between residue sites by removing phylogenetic correlations and indirect dependences through other residues within a protein or even through other molecules. Rapid growth of protein families with unknown folds requires an accurate de novo prediction method for protein structure. Recent attempts of disentangling direct from indirect dependences of amino acid types between residue positions in multiple sequence alignments have revealed that inferred residue-residue proximities can be sufficient information to predict a protein fold without the use of known three-dimensional structures. Here, we propose an alternative method of inferring coevolving site pairs from concurrent and compensatory substitutions between sites in each branch of a phylogenetic tree. Substitution probability and physico-chemical changes (volume, charge, hydrogen-bonding capability, and others) accompanied by substitutions at each site in each branch of a phylogenetic tree are estimated with the likelihood of each substitution, and their direct correlations between sites are used to detect concurrent and compensatory substitutions. In order to extract direct dependences between sites, partial correlation coefficients of the characteristic changes along branches between sites, in which linear multiple dependences on feature vectors at other sites are removed, are calculated and used to rank coevolving site pairs. 
Accuracy of contact prediction based on the present coevolution score is comparable to that achieved by a maximum entropy model of protein sequences for 15 protein families taken from the Pfam release 26.0. Besides, this excellent accuracy indicates that compensatory substitutions are significant in protein evolution.
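The role of partial correlation in separating direct from indirect dependences can be illustrated with the first-order case, where the linear effect of a single third variable is removed. This is a deliberate simplification: the abstract describes removing multiple linear dependences on feature vectors at other sites, which this sketch does not attempt, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# Two sites whose changes along branches are both driven by a third site z.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [zi + e for zi, e in zip(z, [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5])]
y = [zi + e for zi, e in zip(z, [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5])]
print(f"raw r = {pearson(x, y):.3f}, partial r = {partial_correlation(x, y, z):.3f}")
```

In this toy data the raw correlation between `x` and `y` is high only because both track `z`; once `z` is controlled for, the partial correlation is close to zero, which is the sense in which indirect dependences are removed.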
Affiliation(s)
- Sanzo Miyazawa
  - Graduate School of Engineering, Gunma University, Kiryu, Gunma, Japan.
11
Vanneste K, Van de Peer Y, Maere S. Inference of genome duplications from age distributions revisited. Mol Biol Evol 2012; 30:177-90. [PMID: 22936721 DOI: 10.1093/molbev/mss214] [Citation(s) in RCA: 112] [Impact Index Per Article: 9.3] [Indexed: 12/20/2022]
Abstract
Whole-genome duplications (WGDs), thought to facilitate evolutionary innovations and adaptations, have been uncovered in many phylogenetic lineages. WGDs are frequently inferred from duplicate age distributions, where they manifest themselves as peaks against a small-scale duplication background. However, the interpretation of duplicate age distributions is complicated by the use of K(S), the number of synonymous substitutions per synonymous site, as a proxy for the age of paralogs. Two particular concerns are the stochastic nature of synonymous substitutions leading to increasing uncertainty in K(S) with increasing age since duplication and K(S) saturation caused by the inability of evolutionary models to fully correct for the occurrence of multiple substitutions at the same site. K(S) stochasticity is expected to erode the signal of older WGDs, whereas K(S) saturation may lead to artificial peaks in the distribution. Here, we investigate the consequences of these effects on K(S)-based age distributions and WGD inference by simulating the evolution of duplicated sequences according to predefined real age distributions and re-estimating the corresponding K(S) distributions. We show that, although K(S) estimates can be used for WGD inference far beyond the commonly accepted K(S) threshold of 1, K(S) saturation effects can cause artificial peaks at higher ages. Moreover, K(S) stochasticity and saturation may lead to confounded peaks encompassing multiple WGD events and/or saturation artifacts. We argue that K(S) effects need to be properly accounted for when inferring WGDs from age distributions and that the failure to do so could lead to false inferences.
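Why saturation and stochasticity grow with age can be illustrated with a much simpler model than the codon models used for K(S). The sketch below uses the Jukes-Cantor (JC69) nucleotide correction purely as an analogy: it is not how K(S) is estimated in the study, and the 300-site alignment length is an arbitrary assumption. It shows that as the true distance grows, the expected proportion of differing sites approaches its ceiling of 0.75 and the standard error of the corrected distance increases sharply.

```python
import math


def jc69_expected_p(d):
    """Expected proportion of differing sites at true distance d (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p):
    """Corrected distance from an observed proportion p of differing sites."""
    if p >= 0.75:
        return math.inf  # fully saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_std_error(p, n_sites):
    """Large-sample standard error of the JC69 distance estimate."""
    return math.sqrt(p * (1.0 - p) / n_sites) / (1.0 - 4.0 * p / 3.0)


for d in (0.1, 0.5, 1.0, 2.0, 3.0):
    p = jc69_expected_p(d)
    print(f"d = {d:.1f}  expected p = {p:.3f}  "
          f"SE (300 sites) = {jc69_std_error(p, 300):.3f}")
```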
Affiliation(s)
- Kevin Vanneste
  - Department of Plant Systems Biology, VIB, Ghent, Belgium