1
Patel R, Carnevale V, Kumar S. Epistasis Creates Invariant Sites and Modulates the Rate of Molecular Evolution. Mol Biol Evol 2022; 39:msac106. [PMID: 35575390 PMCID: PMC9156017 DOI: 10.1093/molbev/msac106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
Invariant sites are a common feature of amino acid sequence evolution. The presence of invariant sites is frequently attributed to the need to preserve function through site-specific conservation of amino acid residues. Amino acid substitution models without a provision for invariant sites often fit the data significantly worse than those that allow for an excess of invariant sites beyond those predicted by models that only incorporate rate variation among sites (e.g., a Gamma distribution). An alternative is epistasis between sites to preserve residue interactions that can create invariant sites. Through computer-simulated sequence evolution, we evaluated the relative effects of site-specific preferences and site-site couplings in the generation of invariant sites and the modulation of the rate of molecular evolution. In an analysis of ten major families of protein domains with diverse sequence and functional properties, we find that the negative selection imposed by epistasis creates many more invariant sites than site-specific residue preferences alone. Further, epistasis plays an increasingly larger role in creating invariant sites over longer evolutionary periods. Epistasis also dictates rates of domain evolution over time by exerting significant additional purifying selection to preserve site couplings. These patterns illuminate the mechanistic role of epistasis in the processes underlying observed site invariance and evolutionary rates.
Affiliation(s)
- Ravi Patel
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Vincenzo Carnevale
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
2
Cornuault J, Sanmartín I. A road map for phylogenetic models of species trees. Mol Phylogenet Evol 2022; 173:107483. [DOI: 10.1016/j.ympev.2022.107483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2021] [Revised: 03/09/2022] [Accepted: 04/05/2022] [Indexed: 10/18/2022]
3
Structure of the space of taboo-free sequences. J Math Biol 2020; 81:1029-1057. [PMID: 32940748 PMCID: PMC7560954 DOI: 10.1007/s00285-020-01535-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2019] [Revised: 08/19/2020] [Indexed: 11/29/2022]
Abstract
Models of sequence evolution typically assume that all sequences are possible. However, restriction enzymes that cut DNA at specific recognition sites provide an example where carrying a recognition site can be lethal. Motivated by this observation, we studied the set of strings over a finite alphabet with taboos, that is, with prohibited substrings. The taboo-set is referred to as $\mathbb{T}$ and any allowed string as a taboo-free string. We consider the so-called Hamming graph $\Gamma_n(\mathbb{T})$, whose vertices are taboo-free strings of length $n$ and whose edges connect two taboo-free strings if their Hamming distance equals one. Any (random) walk on this graph describes the evolution of a DNA sequence that avoids taboos. We describe the construction of the vertex set of $\Gamma_n(\mathbb{T})$. Then we state conditions under which $\Gamma_n(\mathbb{T})$ and its suffix subgraphs are connected. Moreover, we provide an algorithm that determines whether all these graphs are connected for an arbitrary $\mathbb{T}$. As an application of the algorithm, we show that about 87% of bacteria listed in REBASE have a taboo-set that induces connected taboo-free Hamming graphs, because they have fewer than four type II restriction enzymes. On the other hand, four properly chosen taboos are enough to disconnect one suffix subgraph, and consequently the connectivity of taboo-free Hamming graphs could change depending on the composition of restriction sites.
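The definition of the taboo-free Hamming graph $\Gamma_n(\mathbb{T})$ can be made concrete with a small brute-force sketch. It enumerates every string of length $n$, keeps the taboo-free ones, joins strings at Hamming distance one, and tests connectivity by breadth-first search.

This is not the paper's vertex-set construction or its connectivity algorithm. It is only a direct translation of the definition. Its cost is exponential in $n$, so it is practical only for short strings. The function names and the binary toy alphabet `"AB"` are illustrative assumptions.

```python
from collections import deque
from itertools import product


def is_taboo_free(s, taboos):
    """True if s contains none of the prohibited substrings in taboos."""
    return not any(t in s for t in taboos)


def taboo_free_hamming_graph(n, taboos, alphabet="ACGT"):
    """Adjacency lists of the taboo-free Hamming graph on strings of length n.

    Vertices are taboo-free strings; an edge joins two vertices whose
    Hamming distance is exactly one (a single point substitution).
    """
    vertices = ["".join(p) for p in product(alphabet, repeat=n)]
    vertices = [v for v in vertices if is_taboo_free(v, taboos)]
    vertex_set = set(vertices)
    adjacency = {v: [] for v in vertices}
    for v in vertices:
        for i, old in enumerate(v):
            for c in alphabet:
                if c == old:
                    continue
                w = v[:i] + c + v[i + 1:]  # substitute one position
                if w in vertex_set:
                    adjacency[v].append(w)
    return adjacency


def is_connected(adjacency):
    """Breadth-first search from an arbitrary vertex; True if all are reached."""
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adjacency)


if __name__ == "__main__":
    # A single EcoRI recognition site (GAATTC) as the only taboo.
    g = taboo_free_hamming_graph(6, {"GAATTC"})
    print(len(g), is_connected(g))  # 4095 True
    # Binary toy case: forbidding "AB" and "BA" leaves only AAA and BBB,
    # which differ at three positions, so the graph is disconnected.
    print(is_connected(taboo_free_hamming_graph(3, {"AB", "BA"}, alphabet="AB")))  # False
```

The second example shows how a taboo-set can isolate vertices. Once every intermediate string on a path of single substitutions is forbidden, no taboo-avoiding walk can connect the endpoints.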
4
Moshe A, Pupko T. Ancestral sequence reconstruction: accounting for structural information by averaging over replacement matrices. Bioinformatics 2020; 35:2562-2568. [PMID: 30590382 DOI: 10.1093/bioinformatics/bty1031] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 07/26/2018] [Revised: 12/03/2018] [Accepted: 12/16/2018] [Indexed: 11/13/2022]
Abstract
MOTIVATION Ancestral sequence reconstruction (ASR) is widely used to understand protein evolution, structure and function. Current ASR methodologies do not fully consider differences in evolutionary constraints among positions imposed by the three-dimensional (3D) structure of the protein. Here, we developed an ASR algorithm that allows different protein sites to evolve according to different mixtures of replacement matrices. We show that assigning replacement matrices to protein positions based on their solvent accessibility leads to ASR with higher log-likelihoods compared to naïve models that assume a single replacement matrix for all sites. Improved ASR log-likelihoods are also demonstrated when solvent accessibility is predicted from protein sequences rather than inferred from a known 3D structure. Finally, we show that using such structure-aware mixture models results in substantial differences in the inferred ancestral sequences. AVAILABILITY AND IMPLEMENTATION http://fastml.tau.ac.il. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
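The idea of letting a site evolve under a mixture of replacement matrices, with site-specific weights, can be illustrated on the smallest possible tree: one ancestor with two descendant leaves. The sketch below computes the marginal posterior of the ancestral state, averaged over two mixture components.

Everything here is a toy assumption, not the FastML implementation:

- a 4-letter alphabet stands in for the 20 amino acids;
- F81-type matrices stand in for empirical replacement matrices;
- the frequency vectors `BURIED` and `EXPOSED` are made up;
- a hypothetical per-site weight stands in for a solvent-accessibility-based assignment.

```python
import math

STATES = "HPKD"  # hypothetical toy alphabet (real protein ASR uses 20 amino acids)

# Hypothetical equilibrium frequencies for two replacement-matrix components.
BURIED = [0.55, 0.25, 0.10, 0.10]
EXPOSED = [0.10, 0.20, 0.35, 0.35]


def f81_matrix(pi, t):
    """F81 transition probabilities P(t), scaled to one expected substitution per unit time."""
    beta = 1.0 / (1.0 - sum(p * p for p in pi))
    decay = math.exp(-beta * t)
    n = len(pi)
    return [[(decay if i == j else 0.0) + (1.0 - decay) * pi[j] for j in range(n)]
            for i in range(n)]


def ancestral_posterior(leaf_a, leaf_b, t_a, t_b, components):
    """Marginal posterior of the ancestor's state, averaged over a matrix mixture.

    components: list of (weight, pi) pairs whose weights sum to one.
    """
    a, b = STATES.index(leaf_a), STATES.index(leaf_b)
    joint = [0.0] * len(STATES)
    for weight, pi in components:
        p_a, p_b = f81_matrix(pi, t_a), f81_matrix(pi, t_b)
        for x in range(len(STATES)):
            joint[x] += weight * pi[x] * p_a[x][a] * p_b[x][b]
    total = sum(joint)  # site likelihood under the mixture
    return {s: v / total for s, v in zip(STATES, joint)}


if __name__ == "__main__":
    # Hypothetical weight on the "buried" component, e.g. from predicted accessibility.
    for w_buried in (0.9, 0.1):
        post = ancestral_posterior("H", "K", 0.3, 0.3,
                                   [(w_buried, BURIED), (1.0 - w_buried, EXPOSED)])
        print(w_buried, {s: round(p, 3) for s, p in post.items()})
```

With leaves `H` and `K`, shifting weight toward the component that favours `H` raises the posterior probability of an ancestral `H`. This illustrates, in miniature, how structure-aware weights can change the inferred ancestral sequence.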
Affiliation(s)
- Asher Moshe
- Department of Cell Research and Immunology, School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Tal Pupko
- Department of Cell Research and Immunology, School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
5
Crotty SM, Minh BQ, Bean NG, Holland BR, Tuke J, Jermiin LS, von Haeseler A. GHOST: Recovering Historical Signal from Heterotachously Evolved Sequence Alignments. Syst Biol 2019; 69:249-264. [DOI: 10.1093/sysbio/syz051] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 08/04/2017] [Revised: 07/18/2019] [Accepted: 07/22/2019] [Indexed: 01/01/2023]
Abstract
Molecular sequence data that have evolved under the influence of heterotachous evolutionary processes are known to mislead phylogenetic inference. We introduce the General Heterogeneous evolution On a Single Topology (GHOST) model of sequence evolution, implemented under a maximum-likelihood framework in the phylogenetic program IQ-TREE (http://www.iqtree.org). Simulations show that using the GHOST model, IQ-TREE can accurately recover the tree topology, branch lengths, and substitution model parameters from heterotachously evolved sequences. We investigate the performance of the GHOST model on empirical data by sampling phylogenomic alignments of varying lengths from a plastome alignment. We then carry out inference under the GHOST model on a phylogenomic data set composed of 248 genes from 16 taxa, where we find the GHOST model concurs with the currently accepted view, placing turtles as a sister lineage of archosaurs, in contrast to results obtained using traditional variable rates-across-sites models. Finally, we apply the model to a data set composed of a sodium channel gene of 11 fish taxa, finding that the GHOST model is able to elucidate a subtle component of the historical signal, linked to the previously established convergent evolution of the electric organ in two geographically distinct lineages of electric fish. We compare inference under the GHOST model to partitioning by codon position and show that, owing to the minimization of model constraints, the GHOST model offers unique biological insights when applied to empirical data.
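The abstract does not spell out the likelihood. In a mixture of this kind, each site is assumed to belong to one of several classes that share the topology but carry their own branch lengths, and the site likelihood is the class-weighted sum.

The sketch below shows only that mixture structure, on a three-leaf star tree under the Jukes-Cantor model. It uses two hypothetical classes whose branch lengths differ (heterotachy). It is not IQ-TREE's implementation and does no parameter estimation; the weights, branch lengths and columns are made up.

```python
import math
from itertools import product


def jc_prob(same, t):
    """Jukes-Cantor probability of ending in a given state after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def star_site_likelihood(site, branch_lengths):
    """Likelihood of one alignment column on a star tree, summing over the centre state."""
    return sum(
        0.25 * math.prod(jc_prob(x == y, t) for y, t in zip(site, branch_lengths))
        for x in "ACGT"
    )


def mixture_log_likelihood(columns, classes):
    """Sum over sites of log(sum_c w_c * L_c(site)); classes = [(w_c, branch_lengths_c)]."""
    return sum(
        math.log(sum(w * star_site_likelihood(site, bl) for w, bl in classes))
        for site in columns
    )


if __name__ == "__main__":
    # Two hypothetical classes on the same 3-leaf topology: in class 1 leaf 3
    # evolves fast, in class 2 leaf 1 does (heterotachy).
    classes = [(0.6, (0.05, 0.05, 0.8)), (0.4, (0.8, 0.05, 0.05))]
    columns = ["AAA", "AAC", "GAA", "TTT", "CCG"]
    print(mixture_log_likelihood(columns, classes))
    # Sanity check: probabilities of all 64 possible columns sum to one.
    print(sum(star_site_likelihood(s, (0.1, 0.2, 0.3)) for s in product("ACGT", repeat=3)))
```

A column such as `AAC` is better explained by the class in which leaf 3 evolves fast. The mixture lets each site draw on whichever branch-length set suits it, rather than forcing one set of branch lengths on all sites.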
Affiliation(s)
- Stephen M Crotty
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna and Medical University of Vienna, Vienna, Austria
- School of Mathematical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
- Bui Quang Minh
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna and Medical University of Vienna, Vienna, Austria
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Nigel G Bean
- School of Mathematical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
- ARC Centre of Excellence for Mathematical and Statistical Frontiers, The University of Adelaide, Adelaide, SA, Australia
- Barbara R Holland
- School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
- Jonathan Tuke
- School of Mathematical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
- ARC Centre of Excellence for Mathematical and Statistical Frontiers, The University of Adelaide, Adelaide, SA, Australia
- Lars S Jermiin
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- CSIRO Land & Water, Black Mountain Laboratories, Canberra, ACT 2601, Australia
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
- Earth Institute, University College Dublin, Belfield, Dublin 4, Ireland
- Arndt Von Haeseler
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna and Medical University of Vienna, Vienna, Austria
- Bioinformatics & Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria
6
Maciel AO, de Castro TM, Sturaro MJ, Costa Silva IE, Ferreira JG, dos Santos R, Risse-Quaioto B, Barboza BA, Oliveira JC, Sampaio I, Schneider H. Phylogenetic systematics of the Neotropical caecilian amphibian Luetkenotyphlus (Gymnophiona: Siphonopidae) including the description of a new species from the vulnerable Brazilian Atlantic Forest. ZOOL ANZ 2019. [DOI: 10.1016/j.jcz.2019.07.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/26/2022]
7
Maciel AO, Sampaio MI, Hoogmoed MS, Schneider H. Description of Two New Species of Rhinatrema (Amphibia: Gymnophiona) from Brazil and the Return of Epicrionops niger to Rhinatrema. SOUTH AMERICAN JOURNAL OF HERPETOLOGY 2018. [DOI: 10.2994/sajh-d-17-00054.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/04/2022]
Affiliation(s)
- Adriano O. Maciel
- Programa de Capacitação Institucional, Museu Paraense Emílio Goeldi, Coordenação de Zoologia. Av. Perimetral 1901, Terra Firme, CEP 66077-830, Belém, PA, Brazil
- Maria I.C. Sampaio
- Instituto de Estudos Costeiros, Universidade Federal do Pará, 68600-000, Bragança, Pará, Brazil
- Marinus S. Hoogmoed
- Coordenação de Zoologia, Museu Paraense Emílio Goeldi, Perimetral, 1901, Terra Firme, CEP 66077-830, Belém, Pará, Brazil
- Horacio Schneider
- Instituto de Estudos Costeiros, Universidade Federal do Pará, 68600-000, Bragança, Pará, Brazil
8
DeBlasio D, Kececioglu J. Adaptive Local Realignment of Protein Sequences. J Comput Biol 2018; 25:780-793. [PMID: 29889553 DOI: 10.1089/cmb.2018.0045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
While mutation rates can vary markedly over the residues of a protein, multiple sequence alignment tools typically use the same values for their scoring-function parameters across a protein's entire length. We present a new approach, called adaptive local realignment, that in contrast automatically adapts to the diversity of mutation rates along protein sequences. This builds upon a recent technique known as parameter advising, which finds global parameter settings for an aligner, to now adaptively find local settings. Our approach in essence identifies local regions with low estimated accuracy, constructs a set of candidate realignments using a carefully chosen collection of parameter settings, and replaces the region if a realignment has higher estimated accuracy. This new method of local parameter advising, when combined with prior methods for global advising, boosts alignment accuracy by as much as 26% over the best default setting on hard-to-align protein benchmarks, and by 6.4% over global advising alone. Adaptive local realignment has been implemented within the Opal aligner using the Facet accuracy estimator.
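The three steps named in the abstract can be written as a generic loop:

1. find regions of low estimated accuracy;
2. realign each such region under several parameter settings;
3. keep a realignment only if its estimated accuracy is higher.

This sketch makes several assumptions and does not reproduce the authors' implementation:

- the accuracy estimator and the realigner are hypothetical callables standing in for Facet and Opal;
- regions are fixed, non-overlapping column windows, whereas the paper's region selection may differ;
- the toy stubs in the example only pad sequences with gaps.

```python
from typing import Callable, List, Sequence

Alignment = List[str]  # equal-length rows, "-" marks a gap


def ungapped(rows: Sequence[str]) -> List[str]:
    return [r.replace("-", "") for r in rows]


def adaptive_local_realignment(
    alignment: Alignment,
    window: int,
    threshold: float,
    parameter_sets: Sequence[object],
    estimate_accuracy: Callable[[Alignment], float],
    realign: Callable[[List[str], object], Alignment],
) -> Alignment:
    rows = list(alignment)
    ncols = len(rows[0])
    # Visit non-overlapping windows right to left, so that replacing a region
    # with one of a different width never shifts windows not yet visited.
    for start in reversed(range(0, ncols, window)):
        end = min(start + window, ncols)
        region = [r[start:end] for r in rows]
        best, best_score = region, estimate_accuracy(region)
        if best_score >= threshold:
            continue  # estimated accuracy is acceptable; leave region alone
        for params in parameter_sets:
            candidate = realign(ungapped(region), params)
            score = estimate_accuracy(candidate)
            if score > best_score:  # keep only strict improvements
                best, best_score = candidate, score
        rows = [r[:start] + b + r[end:] for r, b in zip(rows, best)]
    return rows


# --- Hypothetical toy stand-ins for an accuracy estimator and an aligner ---
def toy_accuracy(rows: Alignment) -> float:
    """Fraction of gap-free columns (a crude placeholder, not Facet)."""
    columns = list(zip(*rows))
    if not columns:
        return 1.0
    return sum("-" not in c for c in columns) / len(columns)


def toy_realign(seqs: List[str], side: object) -> Alignment:
    """Align sequences flush left or flush right with gap padding (not Opal)."""
    width = max(len(s) for s in seqs)
    if side == "left":
        return [s.ljust(width, "-") for s in seqs]
    return [s.rjust(width, "-") for s in seqs]


if __name__ == "__main__":
    aln = ["AC--GTTA", "A-C-GT-A", "ACGTGTTA"]
    out = adaptive_local_realignment(aln, 4, 0.9, ["left", "right"], toy_accuracy, toy_realign)
    print("\n".join(out))
```

Replacing a window only when a candidate strictly beats the current estimate mirrors the "replace if higher estimated accuracy" rule. The residues in each row are never changed; only gap placement is.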
Affiliation(s)
- Dan DeBlasio
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania
- John Kececioglu
- Department of Computer Science, The University of Arizona, Tucson, Arizona
9
Zhai Y, Bouchard-Côté A. A Poissonian Model of Indel Rate Variation for Phylogenetic Tree Inference. Syst Biol 2018; 66:698-714. [PMID: 28204784 DOI: 10.1093/sysbio/syx033] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/04/2015] [Accepted: 01/27/2017] [Indexed: 01/22/2023]
Abstract
While indel rate variation has been observed and analyzed in detail, it is not taken into account by current indel-aware phylogenetic reconstruction methods. In this work, we introduce a continuous time stochastic process, the geometric Poisson indel process, that generalizes the Poisson indel process by allowing insertion and deletion rates to vary across sites. We design an efficient algorithm for computing the probability of a given multiple sequence alignment based on our new indel model. We describe a method to construct phylogeny estimates from a fixed alignment using neighbor joining. Using simulation studies, we show that ignoring indel rate variation may have a detrimental effect on the accuracy of the inferred phylogenies, and that our proposed method can sidestep this issue by inferring latent indel rate categories. We also show that our phylogenetic inference method may be more stable to taxa subsampling than methods that either ignore indels or indel rate variation. [evolutionary stochastic process; indel rate variation; Poisson indel process; TKF91]
Affiliation(s)
- Yongliang Zhai
- Department of Statistics, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
- Alexandre Bouchard-Côté
- Department of Statistics, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
10
Uzzell T, Pilbeam D. PHYLETIC DIVERGENCE DATES OF HOMINOID PRIMATES: A COMPARISON OF FOSSIL AND MOLECULAR DATA. Evolution 2017; 25:615-635. [DOI: 10.1111/j.1558-5646.1971.tb01921.x] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 05/14/1971] [Indexed: 11/30/2022]
Affiliation(s)
- Thomas Uzzell
- Departments of Biology and Anthropology and Peabody Museum of Natural History; Yale University; New Haven Connecticut 06520
- David Pilbeam
- Departments of Biology and Anthropology and Peabody Museum of Natural History; Yale University; New Haven Connecticut 06520
11
Maciel AO, Sampaio MIC, Hoogmoed MS, Schneider H. Phylogenetic relationships of the largest lungless tetrapod (Gymnophiona, Atretochoana) and the evolution of lunglessness in caecilians. ZOOL SCR 2016. [DOI: 10.1111/zsc.12206] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Affiliation(s)
- Adriano O. Maciel
- Coordenação de Zoologia; Museu Paraense Emílio Goeldi; Avenida Perimetral, 1901, Terra Firme CEP 66077-530 Belém Pará Brazil
- Maria I. C. Sampaio
- Instituto de Estudos Costeiros; Universidade Federal do Pará; 68600-000 Bragança Pará Brazil
- Marinus S. Hoogmoed
- Coordenação de Zoologia; Museu Paraense Emílio Goeldi; Perimetral, 1901, Terra Firme CEP 66077-530 Belém Pará Brazil
- Horacio Schneider
- Instituto de Estudos Costeiros; Universidade Federal do Pará; 68600-000 Bragança Pará Brazil
12
13
Wang CJ, Chan YL, Shien CH, Yeh KW. Molecular characterization of fruit-specific class III peroxidase genes in tomato (Solanum lycopersicum). JOURNAL OF PLANT PHYSIOLOGY 2015; 177:83-92. [PMID: 25703772 DOI: 10.1016/j.jplph.2015.01.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/05/2014] [Revised: 01/16/2015] [Accepted: 01/16/2015] [Indexed: 06/04/2023]
Abstract
In this study, expression of four peroxidase genes, LePrx09, LePrx17, LePrx35 and LePrxA, was identified in immature tomato fruits, and their function in the regulation of fruit growth was characterized. Analysis of amino acid sequences revealed that these genes code for class III peroxidases, containing B, D and F conserved domains, which bind heme groups, and a buried salt bridge motif. LePrx35 and LePrxA were identified as novel peroxidase genes in Solanum lycopersicum (L.). The temporal expression patterns at various fruit growth stages revealed that LePrx35 and LePrxA were expressed only in immature green (IMG) fruits, whereas LePrx17 and LePrx09 were expressed in both immature and mature green fruits. Tissue-specific expression profiles indicated that only LePrx09 was expressed in the mesocarp but not the inner tissue of immature fruits. The effects of hormone treatments and stresses on the four genes were examined; only the expression levels of LePrx17 and LePrx09 were altered. Transcription of LePrx17 was up-regulated by jasmonic acid (JA) and pathogen infection, and expression of LePrx09 was induced by ethephon, salicylic acid (SA) and JA, in particular, as well as by wounding, pathogen infection and H2O2 stress. Tomato plants over-expressing LePrx09 displayed enhanced resistance to H2O2 stress, suggesting that LePrx09 may participate in the H2O2 signaling pathway to regulate fruit growth and disease resistance in tomato fruits.
Affiliation(s)
- Chii-Jeng Wang
- Institute of Plant Biology, National Taiwan University, Taipei, Taiwan; Hualien District Agricultural Research and Extension Station, Council of Agriculture, Hualien, Taiwan
- Yuan-Li Chan
- AVRDC-The World Vegetable Center, PO Box 42, Shanhua, Tainan 74199, Taiwan
- Chin Hui Shien
- Ecological Materials Technology Department, Green Energy & Eco-technology System Center, ITRI South Campus, Industrial Technology Research Institute, Tainan, Taiwan
- Kai-Wun Yeh
- Institute of Plant Biology, National Taiwan University, Taipei, Taiwan
14
Jia F, Lo N, Ho SYW. The impact of modelling rate heterogeneity among sites on phylogenetic estimates of intraspecific evolutionary rates and timescales. PLoS One 2014; 9:e95722. [PMID: 24798481 PMCID: PMC4010409 DOI: 10.1371/journal.pone.0095722] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 12/09/2013] [Accepted: 03/28/2014] [Indexed: 12/23/2022]
Abstract
Phylogenetic analyses of DNA sequence data can provide estimates of evolutionary rates and timescales. Nearly all phylogenetic methods rely on accurate models of nucleotide substitution. A key feature of molecular evolution is the heterogeneity of substitution rates among sites, which is often modelled using a discrete gamma distribution. A widely used derivative of this is the gamma-invariable mixture model, which assumes that a proportion of sites in the sequence are completely resistant to change, while substitution rates at the remaining sites are gamma-distributed. For data sampled at the intraspecific level, however, biological assumptions involved in the invariable-sites model are commonly violated. We examined the use of these models in analyses of five intraspecific data sets. We show that using 6-10 rate categories for the discrete gamma distribution of rates among sites is sufficient to provide a good approximation of the marginal likelihood. Increasing the number of gamma rate categories did not have a substantial effect on estimates of the substitution rate or coalescence time, unless rates varied strongly among sites in a non-gamma-distributed manner. The assumption of a proportion of invariable sites provided a better approximation of the asymptotic marginal likelihood when the number of gamma categories was small, but had minimal impact on estimates of rates and coalescence times. However, the estimated proportion of invariable sites was highly susceptible to changes in the number of gamma rate categories. The concurrent use of gamma and invariable-site models for intraspecific data is not biologically meaningful and has been challenged on statistical grounds; here we have found that the assumption of a proportion of invariable sites has no obvious impact on Bayesian estimates of rates and timescales from intraspecific data.
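The discrete gamma and gamma-invariable (+I+Γ) models discussed above can be sketched in a few lines.

- The category rates use Yang's equal-probability discretization, taking the mean rate within each category.
- The +I+Γ site likelihood mixes an invariable class of weight $p_{\mathrm{inv}}$ with $K$ equally weighted gamma categories.
- Variable-site rates are scaled by $1/(1-p_{\mathrm{inv}})$ so that the mean rate over all sites stays one. This is a common convention, assumed here rather than taken from the paper.
- The two-sequence Jukes-Cantor site likelihood in the example is only a toy.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while n < 100_000:
        term *= x / (a + n)
        total += term
        if term < total * 1e-16:
            break
        n += 1
    return min(1.0, total * math.exp(-x + a * math.log(x) - math.lgamma(a)))


def gamma_quantile(a, p):
    """Quantile of Gamma(shape=a, scale=1), by bisection on P(a, x) = p."""
    lo, hi = 0.0, max(1.0, a)
    while reg_lower_gamma(a, hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k):
    """Mean rates of k equal-probability categories of a gamma with mean 1.

    For R ~ Gamma(shape alpha, rate alpha), E[R; alpha * R < c] = P(alpha + 1, c),
    so each category mean is k times a difference of P(alpha + 1, .) at the cut points.
    """
    cuts = [gamma_quantile(alpha, i / k) for i in range(1, k)]
    partial = [0.0] + [reg_lower_gamma(alpha + 1.0, c) for c in cuts] + [1.0]
    return [k * (partial[i + 1] - partial[i]) for i in range(k)]


def plus_i_gamma_site_likelihood(lik_at_rate, p_inv, invariant_lik, alpha, k):
    """Site likelihood under +I+G.

    lik_at_rate(r): likelihood of the site if all branches are scaled by rate r.
    invariant_lik: likelihood of the site under the invariable class (the
    frequency of its state if the site is constant, otherwise 0).
    """
    scale = 1.0 / (1.0 - p_inv)  # assumed convention: mean rate over all sites is 1
    rates = discrete_gamma_rates(alpha, k)
    variable = sum(lik_at_rate(r * scale) for r in rates) / k
    return p_inv * invariant_lik + (1.0 - p_inv) * variable


if __name__ == "__main__":
    print([round(r, 4) for r in discrete_gamma_rates(0.5, 4)])
    d = 0.2  # hypothetical distance between two sequences, JC model

    def same_state(rate):  # both sequences show the same nucleotide
        return 0.25 * (0.25 + 0.75 * math.exp(-4.0 * rate * d / 3.0))

    print(plus_i_gamma_site_likelihood(same_state, 0.3, 0.25, 0.5, 4))
```

Because a constant site can be explained either by the invariable class or by a slow gamma category, $p_{\mathrm{inv}}$ and $\alpha$ trade off against each other. This is consistent with the abstract's observation that the estimated proportion of invariable sites is sensitive to the number of gamma categories.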
Affiliation(s)
- Fangzhi Jia
- School of Biological Sciences, University of Sydney, Sydney, New South Wales, Australia
- Nathan Lo
- School of Biological Sciences, University of Sydney, Sydney, New South Wales, Australia
- Simon Y. W. Ho
- School of Biological Sciences, University of Sydney, Sydney, New South Wales, Australia
15
Kitada S, Fujikake C, Asakura Y, Yuki H, Nakajima K, Vargas KM, Kawashima S, Hamasaki K, Kishino H. Molecular and morphological evidence of hybridization between native Ruditapes philippinarum and the introduced Ruditapes form in Japan. CONSERV GENET 2013. [DOI: 10.1007/s10592-013-0467-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 10/27/2022]
Abstract
Marine aquaculture and stock enhancement are major causes of the introduction of alien species. A good example of such an introduction is the Japanese shortneck clam Ruditapes philippinarum, one of the most important fishery resources in the world. To meet the domestic shortage of R. philippinarum caused by depleted catches, clams were imported to Japan from China and the Korean peninsula. The imported clam is an alien species that has a very similar morphology, and was misidentified as R. philippinarum (hereafter, Ruditapes form). We genotyped 1,186 clams of R. philippinarum and R. form at four microsatellite loci, sequenced mitochondrial DNA (COI gene fragment) of 485 clams, 34 of which were R. variegatus, and measured morphometric and meristic characters of 754 clams from 12 populations in Japan and China, including the Ariake Sea and Tokyo Bay, where large numbers of R. form were released. Our analyses confirmed that R. form was from the genus Ruditapes, and the genetic differentiation between R. philippinarum and R. form was distinct, but small, compared with five bivalve outgroups. However, R. form had distinct shell morphology, especially larger numbers of radial ribs on the shell surface, suggesting that R. form might be a new Ruditapes species or a variation of R. philippinarum that originated from southern China. A genetic affinity of the sample from the Ariake Sea to R. form was found with the intermediate shell morphology and number of radial ribs, and the hybrid proportion was estimated at 51.3 ± 4.6 % in the Ariake Sea.
16
Maximum Parsimony Phylogenetic Inference Using Simulated Annealing. ADVANCES IN INTELLIGENT SYSTEMS AND COMPUTING 2013. [DOI: 10.1007/978-3-642-31519-0_12] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 02/13/2023]
17
Abstract
Although several decades of study have revealed the ubiquity of variation of evolutionary rates among sites, reliable methods for studying rate variation were not developed until very recently. Early methods fit theoretical distributions to the numbers of changes at sites inferred by parsimony and substantially underestimate the rate variation. Recent analyses show that failure to account for rate variation can have drastic effects, leading to biased dating of speciation events, biased estimation of the transition:transversion rate ratio, and incorrect reconstruction of phylogenies.
Affiliation(s)
- Z Yang
- Ziheng Yang is at the Dept of Integrative Biology, University of California, Berkeley, CA 94720, USA; College of Animal Science and Technology, Beijing Agricultural University, Beijing 100094, China
18
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 2011. [PMID: 21546353 DOI: 10.1093/molbev/msr121] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
19
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 2011; 28:2731-9. [PMID: 21546353 DOI: 10.1093/molbev/msr121] [Citation(s) in RCA: 27996] [Impact Index Per Article: 2153.5] [Indexed: 01/11/2023]
Abstract
Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net.
Affiliation(s)
- Koichiro Tamura
- Department of Biological Sciences, Tokyo Metropolitan University, Hachioji, Tokyo, Japan
20
Dang CC, Le QS, Gascuel O, Le VS. FLU, an amino acid substitution model for influenza proteins. BMC Evol Biol 2010; 10:99. [PMID: 20384985 PMCID: PMC2873421 DOI: 10.1186/1471-2148-10-99] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 09/21/2009] [Accepted: 04/12/2010] [Indexed: 01/28/2023]
Abstract
Background The amino acid substitution model is the core component of many protein analysis systems such as sequence similarity search, sequence alignment, and phylogenetic inference. Although several general amino acid substitution models have been estimated from large and diverse protein databases, they remain poorly suited to analyzing specific groups of organisms, e.g., viruses. Emerging epidemics of influenza viruses raise the need for comprehensive studies of these dangerous viruses. We propose an influenza-specific amino acid substitution model to enhance the understanding of the evolution of influenza viruses. Results A maximum likelihood approach was applied to estimate an amino acid substitution model (FLU) from ~113,000 influenza protein sequences, consisting of ~20 million residues. FLU outperforms 14 widely used models in constructing maximum likelihood phylogenetic trees for the majority of influenza protein alignments. On average, FLU gains ~42 log likelihood points with an alignment of 300 sites. Moreover, topologies of trees constructed using FLU and other models are frequently different. FLU therefore affects both the likelihood and the inferred tree topology. It was implemented in PhyML and can be downloaded from ftp://ftp.sanger.ac.uk/pub/1000genomes/lsq/FLU or used through the PhyML 3.0 server at http://www.atgc-montpellier.fr/phyml/. Conclusions FLU should be useful for any influenza protein analysis system that requires an accurate description of amino acid substitutions.
Affiliation(s)
- Cuong Cao Dang
- College of Technology, Vietnam National University Hanoi, Cau Giay, Hanoi, Vietnam
21
Molecular systematics: A synthesis of the common methods and the state of knowledge. Cell Mol Biol Lett 2010; 15:311-41. [PMID: 20213503 PMCID: PMC6275913 DOI: 10.2478/s11658-010-0010-8] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 08/16/2009] [Accepted: 03/01/2010] [Indexed: 11/21/2022]
Abstract
The comparative and evolutionary analysis of molecular data has allowed researchers to tackle biological questions that have long remained unresolved. The evolution of DNA and amino acid sequences can now be modeled accurately enough that the information conveyed can be used to reconstruct the past. The methods to infer phylogeny (the pattern of historical relationships among lineages of organisms and/or sequences) range from the simplest, based on parsimony, to more sophisticated and highly parametric ones based on likelihood and Bayesian approaches. In general, molecular systematics provides a powerful statistical framework for hypothesis testing and the estimation of evolutionary processes, including the estimation of divergence times among taxa. The field of molecular systematics has experienced a revolution in recent years, and, although there are still methodological problems and pitfalls, it has become an essential tool for the study of evolutionary patterns and processes at different levels of biological organization. This review aims to present a brief synthesis of the approaches and methodologies that are most widely used in the field of molecular systematics today, as well as indications of future trends and state-of-the-art approaches.
22
Ahrens D, Ribera I. Inferring speciation modes in a clade of Iberian chafers from rates of morphological evolution in different character systems. BMC Evol Biol 2009; 9:234. [PMID: 19754949 PMCID: PMC2753572 DOI: 10.1186/1471-2148-9-234] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 05/18/2009] [Accepted: 09/15/2009] [Indexed: 11/30/2022]
Abstract
BACKGROUND Studies of speciation mode based on phylogenies usually test the predicted effect on diversification patterns or on the geographical distribution of closely related species. Here we outline an approach to infer the prevalent speciation mode in Iberian Hymenoplia chafers by comparing the evolutionary rates of morphological character systems likely to be related to sexual or ecological selection. Assuming that mitochondrial evolution is neutral and not related to measured phenotypic differences among the species, we contrast hypothetical outcomes of three speciation modes: 1) geographic isolation with subsequent random morphological divergence, resulting in overall change proportional to the mtDNA rate; 2) sexual selection on size and shape of the male intromittent organs, resulting in an evolutionary rate decoupled from that of the mtDNA; and 3) ecological segregation, reflected in character systems presumably related to ecological or biological adaptations, with rates decoupled from that of the mtDNA. RESULTS The evolutionary rate of qualitative external body characters was significantly correlated with that of the mtDNA both for the overall root-to-tip patristic distances and for the individual inter-node branches, as measured with standard statistics and by randomization of a global comparison metric (the z-score). The rate of change in body morphospace was significantly correlated with that of the mtDNA only for the individual branches, not for the patristic distances, whereas that of the paramere outline was significantly correlated with mtDNA rates only for the patristic distances, not for the individual branches. CONCLUSION Structural morphological characters, often used for species recognition, have evolved at a rate proportional to that of the mtDNA, with no evidence of directional or stabilising selection according to our measures.
The change in body morphospace seems to have been random in the short term, but the overall change differs from that expected under pure random drift or randomly fluctuating selection, reflecting either directional or stabilising selection or developmental constraints. Short-term changes in paramere shape possibly reflect sexual selection, but their overall amount of change was unconstrained, possibly reflecting their lack of functionality. Our approach may be useful for gaining indirect insights into the prevalence of different speciation modes in entire lineages when direct evidence is lacking.
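The RESULTS above rest on testing whether morphological and mtDNA rates covary across branches, partly by randomization. As a generic illustration only, here is a minimal permutation test for the correlation between two sets of per-branch values. It assumes Pearson's r as the statistic and summarizes the observed value as a z-score against the permuted distribution; the function names are hypothetical, and the study's own global comparison metric may be defined differently.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences (neither constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_test(x, y, n_perm=999, seed=0):
    """Return (observed r, z-score vs. permutations, two-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    y_perm = list(y)
    perm_rs = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the pairing, keep both marginal distributions
        perm_rs.append(pearson_r(x, y_perm))
    mean = sum(perm_rs) / n_perm
    sd = math.sqrt(sum((r - mean) ** 2 for r in perm_rs) / (n_perm - 1))
    z = (observed - mean) / sd
    # Count permutations at least as extreme as observed; include the observed one.
    hits = sum(1 for r in perm_rs if abs(r) >= abs(observed) - 1e-12)
    p_value = (hits + 1) / (n_perm + 1)
    return observed, z, p_value
```

In such an analysis `x` and `y` would be, for example, mtDNA and morphological branch lengths measured on the same tree; shuffling one vector destroys any pairing while preserving each set of values.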
Affiliation(s)
- Dirk Ahrens
- Zoologische Staatssammlung München, Münchhausenstr. 21, 81247 Munich, Germany
- Department of Entomology, Natural History Museum, Cromwell Road, London SW7 5BD, UK
- Ignacio Ribera
- Museo Nacional de Ciencias Naturales, José Gutiérrez Abascal 2, 28006 Madrid, Spain
- Institute of Evolutionary Biology (CSIC-UPF), Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona, Spain
23
Wang DP, Wan HL, Zhang S, Yu J. Gamma-MYN: a new algorithm for estimating Ka and Ks with consideration of variable substitution rates. Biol Direct 2009; 4:20. [PMID: 19531225 PMCID: PMC2702329 DOI: 10.1186/1745-6150-4-20] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.3] [Received: 06/13/2009] [Accepted: 06/16/2009] [Indexed: 12/11/2022]
Abstract
Background Over the past two decades, several approximate methods that adopt different mutation models have been used to estimate nonsynonymous and synonymous substitution rates (Ka and Ks) from protein-coding sequences across species or even different evolutionary lineages. Among them, the MYN method (a modified version of the Yang-Nielsen method) considers three major dynamic features of evolving DNA sequences (bias in transition/transversion rate, nucleotide frequency, and unequal transitional substitution) but leaves out another important feature: unequal substitution rates among different sites or nucleotide positions. Results We incorporated this feature, unequal substitution rates among different sites, into the MYN method and proposed a modified version, γ (gamma)-MYN, based on the assumption that the evolutionary rate at each site follows a γ-distribution. We applied γ-MYN to analyze the key estimator of selective pressure ω (Ka/Ks) and other relevant parameters in comparison with two related methods, YN and MYN, and found that neglecting the variation of substitution rates among different sites may lead to biased estimates of ω. Our new method appears to have minimal deviations when relevant parameters vary within normal ranges defined by empirical data. Conclusion Our results indicate that unequal substitution rates among different sites have variable influences on ω under different evolutionary rates, while both the transition/transversion rate ratio and unequal nucleotide frequencies affect Ka and Ks and thus the selective pressure ω. Reviewers This paper was reviewed by Kateryna Makova, David A. Liberles (nominated by David H Ardell), Zhaolei Zhang (nominated by Mark Gerstein), and Shamil Sunyaev.
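γ-MYN itself counts synonymous and nonsynonymous sites under a model with transition/transversion bias and unequal base frequencies, which is beyond a short example. The sketch below swaps in a much simpler technique to show the effect described above: it converts proportions of differences into distances with the Jukes-Cantor correction, with and without a gamma correction for among-site rate variation, and takes their ratio as ω. Applying one shape parameter `alpha` to both nonsynonymous and synonymous sites is a simplifying assumption, and the proportions `p_n` and `p_s` are taken as given; this is not the γ-MYN algorithm.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def gamma_jc_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates (shape alpha, mean 1)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


def omega(p_n, p_s, alpha=None):
    """Ka/Ks from proportions of nonsynonymous (p_n) and synonymous (p_s) differences.

    alpha=None assumes equal rates among sites; otherwise one shared gamma
    shape is applied to both site classes (a simplifying assumption).
    """
    if alpha is None:
        return jc_distance(p_n) / jc_distance(p_s)
    return gamma_jc_distance(p_n, alpha) / gamma_jc_distance(p_s, alpha)
```

With illustrative values p_n = 0.05 and p_s = 0.30, the equal-rates ω is about 0.14, while with alpha = 0.5 it drops to about 0.08: the gamma correction enlarges the more saturated synonymous distance much more than the nonsynonymous one, so ignoring rate variation inflates ω in this case.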
Affiliation(s)
- Da-Peng Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, PR China.
24
Schneider A, Cannarozzi GM. Support patterns from different outgroups provide a strong phylogenetic signal. Mol Biol Evol 2009; 26:1259-72. [PMID: 19240194 DOI: 10.1093/molbev/msp034] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 12/23/2022]
Abstract
It is known that the accuracy of phylogenetic reconstruction decreases when more distant outgroups are used. We quantify this phenomenon with a novel scoring method, the outgroup score pOG. This score expresses whether the support for a particular branch of a tree decreases with increasingly distant outgroups. Large-scale simulations confirmed that the outgroup support follows this expectation and that the pOG score captures this pattern. The score often identifies the correct topology even when the primary reconstruction methods fail, particularly in the presence of model violations. In simulations of problematic phylogenetic scenarios such as rate variation among lineages (which can lead to long-branch attraction artifacts) and quartet-based reconstruction, the pOG analysis outperformed the primary reconstruction methods. Because the pOG method does not make any assumptions about the evolutionary model (besides the decreasing support from increasingly distant outgroups), it can detect model violations that are not treated by a specific model or are too strong to be fully corrected. When used as an optimization criterion in the construction of a tree of 23 mammals, the outgroup signal confirmed many well-accepted mammalian orders and superorders. It supports Atlantogenata, a clade of Afrotheria and Xenarthra, and suggests an Artiodactyla-Chiroptera clade.
Affiliation(s)
- Adrian Schneider
- ETH Zurich, Department of Computer Science, Zurich, Switzerland.
25
Ries JB, Anderson MA, Hill RT. Seawater Mg/Ca controls polymorph mineralogy of microbial CaCO3: a potential proxy for calcite-aragonite seas in Precambrian time. Geobiology 2008; 6:106-119. [PMID: 18380873 DOI: 10.1111/j.1472-4669.2007.00134.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 05/26/2023]
Abstract
A previously published hydrothermal brine-river water mixing model driven by ocean crust production suggests that the molar Mg/Ca ratio of seawater (mMg/Ca(sw)) has varied significantly (approximately 1.0-5.2) over Precambrian time, resulting in six intervals of aragonite-favouring seas (mMg/Ca(sw) > 2) and five intervals of calcite-favouring seas (mMg/Ca(sw) < 2) since the Late Archaean. To evaluate the viability of microbial carbonates as a mineralogical proxy for Precambrian calcite-aragonite seas, calcifying microbial marine biofilms were cultured in experimental seawaters formulated over the range of Mg/Ca ratios believed to have characterized Precambrian seawater. Biofilms cultured in experimental aragonite seawater (mMg/Ca(sw) = 5.2) precipitated primarily aragonite with lesser amounts of high-Mg calcite (mMg/Ca(calcite) = 0.16), while biofilms cultured in experimental calcite seawater (mMg/Ca(sw) = 1.5) precipitated exclusively lower magnesian calcite (mMg/Ca(calcite) = 0.06). Furthermore, Mg/Ca(calcite) varied proportionally with Mg/Ca(sw). This nearly abiotic mineralogical response of the biofilm CaCO3 to altered Mg/Ca(sw) is consistent with the assertion that biofilm calcification proceeds more through the elevation of [CO3^2-], via metabolic removal of CO2 and/or H+, than through the elevation of Ca2+, which would alter the Mg/Ca ratio of the biofilm's calcifying fluid, causing its pattern of CaCO3 polymorph precipitation (aragonite vs. calcite; Mg-incorporation in calcite) to deviate from that of abiotic calcification.
If previous assertions are correct that the physicochemical properties of Precambrian seawater were such that Mg/Ca(sw) was the primary variable influencing CaCO3 polymorph mineralogy, then the observed response of the biofilms' CaCO3 polymorph mineralogy to variations in Mg/Ca(sw), combined with the ubiquity of such microbial carbonates in Precambrian strata, suggests that the original polymorph mineralogy and Mg/Ca(calcite) of well-preserved microbial carbonates may be an archive of calcite-aragonite seas throughout Precambrian time. These results invite a systematic evaluation of microbial carbonate primary mineralogy to empirically constrain Precambrian seawater Mg/Ca.
Affiliation(s)
- J B Ries
- Department of Geology and Geophysics, The Woods Hole Oceanographic Institution, MS #23, Woods Hole, Massachusetts 02543, USA.
26
Autio KJ, Kastaniotis AJ, Pospiech H, Miinalainen IJ, Schonauer MS, Dieckmann CL, Hiltunen JK. An ancient genetic link between vertebrate mitochondrial fatty acid synthesis and RNA processing. FASEB J 2007; 22:569-78. [PMID: 17898086 DOI: 10.1096/fj.07-8986] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Indexed: 11/11/2022]
Abstract
In bacteria, functionally related gene products are often encoded by a common transcript. Such polycistronic transcripts are rare in eukaryotes. Here we isolated several clones from human cDNA libraries, which rescued the respiratory-deficient phenotype of a yeast mitochondrial 3-hydroxyacyl thioester dehydratase 2 (htd2) mutant strain. All complementing cDNAs were derived from the RPP14 transcript previously described to encode the RPP14 subunit of the human ribonuclease P (RNase P) complex. We identified a second, 3' open reading frame (ORF) on the RPP14 transcript encoding a protein showing similarity to known dehydratases and hydratase 2 enzymes. The protein was localized in mitochondria, and the recombinant enzyme exhibited (3R)-specific hydratase 2 activity. Based on our results, we named the protein human 3-hydroxyacyl-thioester dehydratase 2 (HsHTD2), which is involved in mitochondrial fatty acid synthesis. The bicistronic arrangement of RPP14 and HsHTD2, as well as the general exon structure of the gene, is conserved in vertebrates from fish to humans, indicating a genetic link conserved for 400 million years between RNA processing and mitochondrial fatty acid synthesis.
Affiliation(s)
- Kaija J Autio
- Department of Biochemistry and Biocenter Oulu, University of Oulu, P.O. Box 3000, FIN-90014 Oulu, Finland.
27
Affiliation(s)
- Naoyuki Takahata
- Graduate University for Advanced Studies (Sokendai), Hayama, Kanagawa 240-0193, Japan.
28
Abstract
Background The rate of evolution varies spatially along genomes and over time. The presence of evolutionary rate variation is an informative signal that often marks functional regions of genomes and historical selection events. Many tests for temporal rate variation, or heterotachy, start by partitioning sampled sequences into two or more groups and testing rate homogeneity among the groups. I develop a Bayesian method to infer phylogenetic trees with a divergence point, that is, a dramatic temporal shift in selection pressure that affects many nucleotide sites simultaneously, located at an unknown position in the tree. Results Simulation demonstrates that the method is best able to detect divergence points when rate variation and the number of affected sites are high, though not beyond biologically relevant values. The method is applied to two viral data sets. A divergence point is identified separating the B and C subtypes, two genetically distinct variants of HIV that have spread into different human populations with the AIDS epidemic. In contrast, no strong signal of temporal rate variation is found in a sample of F and H genotypes, two genetic variants of HBV that have likely evolved with humans during their immigration and expansion into the Americas. Conclusion Temporal shifts in evolutionary rate of sufficient magnitude are detectable in the history of sampled sequences. The ability to detect such divergence points without the need to specify a prior hypothesis about the location or timing of the divergence point should help scientists identify historically important selection events and decipher mechanisms of evolution.
Affiliation(s)
- Karin S Dorman
- Department of Statistics, and the Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, USA.
29
Simon C, Buckley TR, Frati F, Stewart JB, Beckenbach AT. Incorporating Molecular Evolution into Phylogenetic Analysis, and a New Compilation of Conserved Polymerase Chain Reaction Primers for Animal Mitochondrial DNA. Annu Rev Ecol Evol Syst 2006. [DOI: 10.1146/annurev.ecolsys.37.091305.110018] [Citation(s) in RCA: 429] [Impact Index Per Article: 23.8] [Indexed: 11/09/2022]
Affiliation(s)
- Chris Simon
- Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269
- School of Biological Sciences, Victoria University of Wellington, Wellington 6014, New Zealand
- Francesco Frati
- Department of Evolutionary Biology, University of Siena, 53100 Siena, Italy
- James B. Stewart
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
- Department of Laboratory Medicine, Division of Metabolic Diseases, Karolinska Institutet, Novum, 141 86 Stockholm, Sweden
- Andrew T. Beckenbach
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
30
Abstract
MOTIVATION Evolutionary conservation estimated from a multiple sequence alignment is a powerful indicator of the functional significance of a residue and helps to predict active sites, ligand binding sites, and protein interaction interfaces. Many algorithms that calculate conservation work well, provided an accurate and balanced alignment is used. However, such a strong dependence on the alignment makes the results highly variable. We attempted to improve the conservation prediction algorithm by making it more robust and less sensitive to (1) local alignment errors, (2) overrepresentation of sequences in some branches and (3) occasional presence of unrelated sequences. RESULTS A novel method is presented for robust constrained Bayesian estimation of evolutionary rates that avoids overfitting independent rates and satisfies the above requirements. The method is evaluated and compared with an entropy-based conservation measure on a set of 1494 protein interfaces. We demonstrated that approximately 62% of the analyzed protein interfaces are more conserved than the remaining surface at the 5% significance level. A consistent method to incorporate alignment reliability is proposed and demonstrated to reduce arbitrary variation of calculated rates upon inclusion of distantly related or unrelated sequences into the alignment.
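The abstract benchmarks its Bayesian rate estimates against an entropy-based conservation measure. A minimal sketch of such a baseline, assuming plain Shannon entropy per alignment column with gaps ignored and no sequence weighting (the paper's exact entropy measure may differ, and this is not the Bayesian method itself):

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-."):
    """Shannon entropy (bits) of the residues in one alignment column.

    Gaps are ignored. 0 means fully conserved; log2(20), about 4.32, is the
    maximum, reached when all 20 amino acids are equally frequent.
    """
    residues = [c.upper() for c in column if c not in gap_chars]
    if not residues:
        return float("nan")  # all-gap column: conservation undefined
    n = len(residues)
    counts = Counter(residues)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def alignment_entropies(sequences):
    """Per-column entropies for equal-length aligned sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    return [column_entropy([s[i] for s in sequences]) for i in range(length)]
```

Because each column is scored from raw residue counts, the score shifts whenever redundant or unrelated sequences are added to the alignment, which is the sensitivity the method above is designed to reduce.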
Affiliation(s)
- Andrew J Bordner
- Molsoft LLC, 3366 North Torrey Pines Court, Suite 300, La Jolla, CA 92037, USA.
31
Ting CT, Tsaur SC, Sun S, Browne WE, Chen YC, Patel NH, Wu CI. Gene duplication and speciation in Drosophila: evidence from the Odysseus locus. Proc Natl Acad Sci U S A 2004; 101:12232-5. [PMID: 15304653 PMCID: PMC514461 DOI: 10.1073/pnas.0401975101] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.8] [Received: 03/22/2004] [Accepted: 06/30/2004] [Indexed: 11/18/2022]
Abstract
The importance of gene duplication in evolution has long been recognized. Because duplicated genes are prone to diverge in function, gene duplication could plausibly play a role in species differentiation. However, experimental evidence linking gene duplication with speciation is scarce. Here, we show that a hybrid-male sterility gene, Odysseus (OdsH), arose by gene duplication in the Drosophila genome. OdsH has evolved at a very high rate, whereas its closest paralog, unc-4, is nearly identical among species in the Drosophila melanogaster subgroup. The disparity in their sequence evolution is echoed by the divergence in their expression patterns in both soma and reproductive tissues. We suggest that duplicated genes that have yet to evolve a stable function at the time of speciation may be candidates for "speciation genes," which are broadly defined as genes that contribute to differential adaptation between species.
Affiliation(s)
- Chau-Ti Ting
- Department of Life Science, National Tsing Hua University, Hsinchu 300, Taiwan.
32
Reyes A, Pesole G, Saccone C. Long-branch attraction phenomenon and the impact of among-site rate variation on rodent phylogeny. Gene 2000; 259:177-87. [PMID: 11163975 DOI: 10.1016/s0378-1119(00)00438-8] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Abstract
The phylogenetic relationships among the major lineages of rodents are among the issues most debated by both paleontologists and molecular biologists. In the present study, we have analyzed all complete mammalian mitochondrial genomes available in the databases, including five rodent species (rat, mouse, dormouse, squirrel and guinea-pig). Phylogenetic analyses were performed on H-strand amino acid sequences by means of maximum likelihood and on H-strand protein-coding and ribosomal genes by means of distance methods. Log-likelihood ratio tests were also applied to different tree topologies under the assumption of rodent monophyly, paraphyly or polyphyly. The analyses significantly rejected rodent monophyly and showed the existence of two differentiated clades, one containing non-murids (dormouse, squirrel and guinea-pig) and the other containing murids (rat and mouse). Long-branch attraction between murids and the outgroups could not be responsible for the existence of two different rodent clades, as no significant differences in evolutionary rate were observed, except in the case of the squirrel, which shows a lower rate. The impact of among-site rate variation on the phylogeny of rodents was evaluated using the gamma distribution model. The results showed that relationships among rodents remained unchanged and the general topology of the tree was not affected, even though some branches were not properly resolved, most likely due to a lack of fit between estimated and real rate heterogeneity parameters.
Affiliation(s)
- A Reyes
- Centro di Studio sui Mitocondri e Metabolismo Energetico, CNR, via Amendola 165/A, 70126 Bari, Italy
33
Morozov P, Sitnikova T, Churchill G, Ayala FJ, Rzhetsky A. A new method for characterizing replacement rate variation in molecular sequences. Application of the Fourier and wavelet models to Drosophila and mammalian proteins. Genetics 2000; 154:381-95. [PMID: 10628997 PMCID: PMC1460898 DOI: 10.1093/genetics/154.1.381] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]
Abstract
We propose models for describing replacement rate variation in genes and proteins, in which the profile of relative replacement rates along the length of a given sequence is defined as a function of the site number. We consider here two types of functions, one derived from the cosine Fourier series, and the other from discrete wavelet transforms. The number of parameters used for characterizing the substitution rates along the sequences can be flexibly changed and in their most parameter-rich versions, both Fourier and wavelet models become equivalent to the unrestricted-rates model, in which each site of a sequence alignment evolves at a unique rate. When applied to a few real data sets, the new models appeared to fit data better than the discrete gamma model when compared with the Akaike information criterion and the likelihood-ratio test, although the parametric bootstrap version of the Cox test performed for one of the data sets indicated that the difference in likelihoods between the two models is not significant. The new models are applicable to testing biological hypotheses such as the statistical identity of rate variation profiles among homologous protein families. These models are also useful for determining regions in genes and proteins that evolve significantly faster or slower than the sequence average. We illustrate the application of the new method by analyzing human immunoglobulin and Drosophilid alcohol dehydrogenase sequences.
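The core idea, a rate profile written as a function of site number with an adjustable number of parameters, can be sketched with a cosine (DCT) basis. This is an illustration under assumptions, not the authors' parameterization: here the cosine series describes the logarithm of the rate (to keep rates positive), and the profile is rescaled to mean 1. With as many coefficients as sites the series reproduces any positive profile exactly, mirroring the point that the most parameter-rich version is equivalent to the unrestricted-rates model. The wavelet variant is not shown.

```python
import math


def dct2(x):
    """Unnormalized DCT-II: X_k = sum_i x_i * cos(pi * k * (i + 0.5) / n)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n)]


def rate_profile(coeffs, n_sites):
    """Relative rates at sites 0..n_sites-1 from leading cosine coefficients.

    The log-rate at site i is the inverse DCT (DCT-III scaling) truncated to
    len(coeffs) terms; rates are exponentiated and rescaled to mean 1.
    """
    rates = []
    for i in range(n_sites):
        log_r = coeffs[0] / n_sites
        for k in range(1, len(coeffs)):
            log_r += (2 / n_sites) * coeffs[k] * math.cos(math.pi * k * (i + 0.5) / n_sites)
        rates.append(math.exp(log_r))
    mean = sum(rates) / n_sites
    return [r / mean for r in rates]
```

Truncating `coeffs` to a few leading terms gives a smooth profile with few parameters; in a real analysis those coefficients would be fitted by maximum likelihood, which is omitted here.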
Affiliation(s)
- P Morozov
- Columbia Genome Center, Columbia University, New York, New York 10032, USA
34
Chang JT. Inconsistency of evolutionary tree topology reconstruction methods when substitution rates vary across characters. Math Biosci 1996; 134:189-215. [PMID: 8664540 DOI: 10.1016/0025-5564(95)00172-7] [Citation(s) in RCA: 114] [Impact Index Per Article: 4.1] [Indexed: 02/01/2023]
Abstract
A fundamental problem in reconstructing the evolutionary history of a set of species is to infer the topology of the evolutionary tree that relates those species. A statistical method for estimating such a topology from character data is called consistent if, given data from more and more characters, the method is sure to converge to the true topology. A number of popular methods are based on modeling the evolution of each character as a Markov process along the evolutionary tree. The standard models further assume that each character has in fact evolved according to the same Markov process. This homogeneity assumption is unrealistic; for example, different types of characters are known to experience substitutions at different rates. Certain distance and maximum likelihood methods for topology estimation have been shown to be consistent under the homogeneity assumption. Here we give examples showing that these methods can fail to be consistent when the homogeneity assumption is relaxed. The examples are very simple, requiring only four taxa, binary characters, and characters that evolve at two different rates.
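The following sketch is not Chang's four-taxon construction; it only illustrates the kind of misspecification involved. Under the symmetric two-state (Cavender-Farris-Neyman) model for binary characters, a pairwise distance that assumes every character evolves at the same rate is applied to data in which characters fall into two rate classes whose rates average 1. The rate values and weights are hypothetical.

```python
import math


def p_diff(t):
    """Expected proportion of differing binary characters at distance t
    under the symmetric two-state (Cavender-Farris-Neyman) model."""
    return 0.5 * (1 - math.exp(-2 * t))


def homogeneous_distance(p):
    """Invert p_diff, assuming every character evolves at the same rate."""
    if not 0 <= p < 0.5:
        raise ValueError("p must lie in [0, 0.5)")
    return -0.5 * math.log(1 - 2 * p)


def mixed_p_diff(t, rates, weights):
    """Proportion of differing characters when characters fall into rate classes."""
    return sum(w * p_diff(r * t) for r, w in zip(rates, weights))


# Hypothetical two-rate setting: rates 0.2 and 1.8 in equal proportions (mean rate 1).
true_t = 1.0
p_obs = mixed_p_diff(true_t, rates=[0.2, 1.8], weights=[0.5, 0.5])
estimate = homogeneous_distance(p_obs)
```

Here the homogeneous estimator returns roughly 0.53 instead of 1: fast characters saturate, so long distances are systematically underestimated while short ones are nearly unaffected. Unequal distortion of long and short distances is one way a method that assumes homogeneity can be misled.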
Affiliation(s)
- J T Chang
- Yale University Statistics Department, New Haven, Connecticut 06520-8290, USA
35
Abstract
A new model of molecular evolution is introduced that allows for heterogeneous rates across the sequence positions. The development of this model was motivated by two issues: first, a number of studies have shown that the positions in a DNA sequence evolve at different rates, and second, it has been shown that not accounting for this heterogeneity can lead to biased estimates of evolutionary parameters. The authors generalize the Markovian model of molecular evolution to allow for heterogeneous rates and explore some of the consequences of such a model. In particular, they quantify the biases incurred by incorrectly assuming an equal-rate model and consider what can be learned about evolutionary parameters under a heterogeneous model.
Affiliation(s)
- C Kelly
- Department of Computer Science and Statistics, University of Rhode Island, Kingston 02881, USA
36
Yang Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J Mol Evol 1994; 39:306-14. [PMID: 7932792 DOI: 10.1007/bf00160154] [Citation(s) in RCA: 1736] [Impact Index Per Article: 57.9] [Indexed: 01/27/2023]
Abstract
Two approximate methods are proposed for maximum likelihood phylogenetic estimation, which allow variable rates of substitution across nucleotide sites. Three data sets with quite different characteristics were analyzed to examine empirically the performance of these methods. The first, called the "discrete gamma model," uses several categories of rates to approximate the gamma distribution, with equal probability for each category. The mean of each category is used to represent all the rates falling in the category. The performance of this method is found to be quite good, and four such categories appear to be sufficient to produce both an optimum, or near-optimum fit by the model to the data, and also an acceptable approximation to the continuous distribution. The second method, called "fixed-rates model", classifies sites into several classes according to their rates predicted assuming the star tree. Sites in different classes are then assumed to be evolving at these fixed rates when other tree topologies are evaluated. Analyses of the data sets suggest that this method can produce reasonable results, but it seems to share some properties of a least-squares pairwise comparison; for example, interior branch lengths in nonbest trees are often found to be zero. The computational requirements of the two methods are comparable to that of Felsenstein's (1981, J Mol Evol 17:368-376) model, which assumes a single rate for all the sites.
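The discrete gamma model described above can be sketched with the standard library alone. Assumptions: rates follow a gamma distribution with shape α and mean 1 (rate parameter equal to α), categories have equal probability, and each category is represented by its mean, as the abstract states. The regularized incomplete gamma function is computed by its power series and quantiles by bisection, which is adequate for a handful of categories but not tuned for speed.

```python
import math


def reg_lower_gamma(s, x):
    """Regularized lower incomplete gamma P(s, x), summed as a power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / s
    total = term
    n = 0
    while True:
        n += 1
        term *= x / (s + n)
        total += term
        if term < total * 1e-15:
            break
    return min(1.0, total * math.exp(s * math.log(x) - x - math.lgamma(s)))


def gamma_quantile(p, shape):
    """z with P(shape, z) = p, i.e. the quantile of Gamma(shape, rate 1), by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(shape, hi) < p:
        hi *= 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(shape, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k=4):
    """Mean rate of each of k equal-probability categories of Gamma(alpha, rate alpha)."""
    # Category boundaries on the scale z = alpha * r, so P(alpha, z) = i / k.
    bounds = [gamma_quantile(i / k, alpha) for i in range(1, k)]
    # E[r; r < b] = P(alpha + 1, alpha * b) when the mean rate is 1.
    cdf1 = [0.0] + [reg_lower_gamma(alpha + 1, z) for z in bounds] + [1.0]
    return [k * (cdf1[i + 1] - cdf1[i]) for i in range(k)]
```

For α = 0.5 and four categories this gives rates of roughly 0.033, 0.252, 0.820 and 2.894, which average to 1; a site's likelihood is then the average of the likelihoods computed with branch lengths scaled by each category rate.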
Affiliation(s)
- Z Yang
- Department of Zoology, Natural History Museum, London, United Kingdom
37
Wakeley J. Substitution rate variation among sites in hypervariable region 1 of human mitochondrial DNA. J Mol Evol 1993; 37:613-23. [PMID: 8114114 DOI: 10.1007/bf00182747] [Citation(s) in RCA: 163] [Impact Index Per Article: 5.3] [Indexed: 01/28/2023]
Abstract
More than an order of magnitude difference in substitution rate exists among sites within hypervariable region 1 of the control region of human mitochondrial DNA. A two-rate Poisson mixture and a negative binomial distribution are used to describe the distribution of the inferred number of changes per nucleotide site in this region. When three data sets are pooled, however, the two-rate model cannot explain the data. The negative binomial distribution always fits, suggesting that substitution rates are approximately gamma distributed among sites. Simulations presented here provide support for the use of a biased, yet commonly employed, method of examining rate variation. The use of parsimony in the method to infer the number of changes at each site introduces systematic errors into the analysis. These errors preclude an unbiased quantification of variation in substitution rate but make the method conservative overall. The method can be used to distinguish sites with highly elevated rates, and 29 such sites are identified in hypervariable region 1. Variation does not appear to be clustered within this region. Simulations show that biases in rates of substitution among nucleotides and non-uniform base composition can mimic the effects of variation in rate among sites. However, these factors contribute little to the levels of rate variation observed in hypervariable region 1.
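The two descriptions compared above can be written down directly. The negative binomial is exactly a Poisson distribution whose rate is gamma distributed, which is why a consistently good negative binomial fit suggests approximately gamma-distributed rates. The sketch below gives both probability mass functions, a log-likelihood helper, and a method-of-moments estimate of the gamma shape. It assumes per-site change counts are already available (e.g., inferred by parsimony, with the biases noted in the abstract); maximum likelihood fitting is left out, and all names are illustrative.

```python
import math
import statistics


def poisson_pmf(k, lam):
    """Poisson probability of k events with mean lam."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def two_rate_poisson_pmf(k, p, lam1, lam2):
    """Two-rate mixture: a fraction p of sites has mean lam1, the rest lam2."""
    return p * poisson_pmf(k, lam1) + (1 - p) * poisson_pmf(k, lam2)


def neg_binomial_pmf(k, mean, alpha):
    """Poisson counts whose rate is gamma distributed (shape alpha, given mean > 0)."""
    q = alpha / (alpha + mean)
    log_p = (math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
             + alpha * math.log(q) + k * math.log(1 - q))
    return math.exp(log_p)


def nb_moment_alpha(counts):
    """Method-of-moments gamma shape: alpha = mean^2 / (variance - mean)."""
    m = statistics.fmean(counts)
    v = statistics.pvariance(counts)
    if v <= m:
        return math.inf  # no overdispersion: counts are consistent with one Poisson rate
    return m * m / (v - m)


def log_likelihood(counts, pmf):
    """Sum of log probabilities of per-site counts under a one-argument pmf."""
    return sum(math.log(pmf(k)) for k in counts)
```

A comparison would evaluate, for example, `log_likelihood(counts, lambda k: neg_binomial_pmf(k, m, a))` against the two-rate mixture; note that the mixture has three free parameters and the negative binomial two, which matters when judging the fits.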
Affiliation(s)
- J Wakeley
- Department of Integrative Biology, University of California, Berkeley 94720
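The link between the negative binomial and gamma-distributed rates in the Wakeley abstract above works like this. If the number of changes at each site is Poisson and the site rates are gamma distributed, the counts follow a negative binomial. The sketch below is a hypothetical stdlib Python illustration, not Wakeley's fitting procedure. It writes the negative binomial in terms of its mean `m` and the gamma shape `alpha`. It then estimates `alpha` by the method of moments, α = m²/(s² − m), which is defined only when the sample variance s² exceeds the mean (overdispersion). The per-site counts are made up for illustration.

```python
import math
from statistics import mean, variance


def neg_binomial_pmf(k: int, m: float, alpha: float) -> float:
    """P(k changes at a site) for Poisson counts whose rate is gamma(alpha) with mean m > 0."""
    log_p = (
        math.lgamma(k + alpha)
        - math.lgamma(alpha)
        - math.lgamma(k + 1)
        + alpha * math.log(alpha / (alpha + m))
        + k * math.log(m / (alpha + m))
    )
    return math.exp(log_p)


def moment_alpha(counts: list[int]) -> float:
    """Method-of-moments gamma shape estimated from per-site change counts."""
    m = mean(counts)
    s2 = variance(counts)  # sample variance with an n - 1 denominator
    if s2 <= m:
        # No overdispersion: the counts look Poisson, so no rate variation is detectable.
        return math.inf
    return m * m / (s2 - m)


# Hypothetical inferred numbers of changes at 12 sites.
counts = [0, 0, 0, 1, 0, 2, 0, 0, 5, 1, 0, 7]
alpha_hat = moment_alpha(counts)
m_hat = mean(counts)
print(alpha_hat, [round(neg_binomial_pmf(k, m_hat, alpha_hat), 3) for k in range(4)])
```

A small fitted `alpha` indicates strong rate heterogeneity. The abstract's caveat still applies: counts inferred by parsimony are biased, so such estimates are at best conservative.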
38
Abstract
We introduce a general class of models for sequence evolution that includes network phylogenies. Networks, a generalization of strictly tree-like phylogenies, are proposed to model situations where multiple lineages contribute to the observed sequences. An algorithm to compute the probability distribution of binary character-state configurations is presented and statistical inference for this model is developed in a likelihood framework. A stepwise procedure based on likelihood ratios is used to explore the space of models. Starting with a star phylogeny, new splits (nontrivial bipartitions of the sequence set) are successively added to the model until no significant change in the likelihood is observed. A novel feature of our approach is that the new splits are not necessarily constrained to be consistent with a treelike mode of evolution. The fraction of invariable sites is estimated by maximum likelihood simultaneously with other model parameters and is essential to obtain a good fit to the data. The effect of finite sequence length on the inference methods is discussed. Finally, we provide an illustrative example using aligned VP1 genes from the foot and mouth disease viruses (FMDV). The different serotypes of the FMDV exhibit a range of treelike and network evolutionary relationships.
Affiliation(s)
- A von Haeseler
- Department of Zoology, University of Munich, Federal Republic of Germany
39
White SH, Jacobs RE. The evolution of proteins from random amino acid sequences. I. Evidence from the lengthwise distribution of amino acids in modern protein sequences. J Mol Evol 1993; 36:79-95. [PMID: 8433379 DOI: 10.1007/bf02407307]
Abstract
We examine in this paper one of the expected consequences of the hypothesis that modern proteins evolved from random heteropeptide sequences. Specifically, we investigate the lengthwise distributions of amino acids in a set of 1,789 protein sequences with little sequence identity using the run test statistic (ro) of Mood (1940, Ann. Math. Stat. 11, 367-392). The probability density of ro for a collection of random sequences has mean = 0 and variance = 1 [the N(0,1) distribution] and can be used to measure the tendency of amino acids of a given type to cluster together in a sequence relative to that of a random sequence. We implement the run test using binary representations of protein sequences in which the amino acids of interest are assigned a value of 1 and all others a value of 0. We consider individual amino acids and sets of various combinations of them based upon hydrophobicity (4 sets), charge (3 sets), volume (4 sets), and secondary structure propensity (3 sets). We find that any sequence chosen randomly has a 90% or greater chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation regardless of amino acid type. We regard this as strong support for the random-origin hypothesis. However, we do observe significant deviations from the random expectation, as might be expected after billions of years of evolution. Two important global trends are found: (1) Amino acids with a strong alpha-helix propensity show a strong tendency to cluster whereas those with beta-sheet or reverse-turn propensity do not. (2) Clustered rather than evenly distributed patterns tend to be preferred by the individual amino acids, and this is particularly so for methionine. Finally, we consider the problem of reconciling the random nature of protein sequences with structurally meaningful periodic "patterns" that can be detected by sliding-window, autocorrelation, and Fourier analyses. Two examples, rhodopsin and bacteriorhodopsin, show that such patterns are a natural feature of random sequences.
Affiliation(s)
- S H White
- Department of Physiology and Biophysics, University of California, Irvine 92717
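The binary run test in the White and Jacobs abstract above can be sketched in a few lines of Python. This sketch uses the standard Wald–Wolfowitz normal approximation for the number of runs in a 0/1 sequence: E[R] = 1 + 2n₁n₀/N and Var[R] = 2n₁n₀(2n₁n₀ − N)/(N²(N − 1)). The sign convention and exact form of the paper's ro statistic may differ. The residue set `HYDROPHOBIC` and the function name `run_test_z` are hypothetical choices for illustration, and the paper itself uses several hydrophobicity sets.

```python
import math

# Hypothetical binary class: residues in this set are scored 1, all others 0.
HYDROPHOBIC = set("AVLIMFWC")


def run_test_z(seq: str, members: set[str]) -> float:
    """Standardized number of runs in the 1/0 encoding of seq (1 = residue in members).

    Negative values mean fewer runs than expected (clustering); positive values
    mean more runs than expected (alternation).
    """
    bits = [1 if aa in members else 0 for aa in seq.upper()]
    n = len(bits)
    n1 = sum(bits)
    n0 = n - n1
    if n < 3 or n1 == 0 or n0 == 0:
        raise ValueError("need at least 3 residues with both classes present")
    # A new run starts wherever two adjacent positions differ.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    expected = 1 + 2 * n1 * n0 / n
    var = 2 * n1 * n0 * (2 * n1 * n0 - n) / (n * n * (n - 1))
    return (runs - expected) / math.sqrt(var)


print(run_test_z("LLLLLKKKKK", HYDROPHOBIC))  # clustered: negative
print(run_test_z("LKLKLKLKLK", HYDROPHOBIC))  # alternating: positive
```

For a random sequence the statistic is approximately N(0,1). This matches the abstract's use of the run test as a yardstick for clustering relative to random expectation.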
40
Abstract
Mitochondrial DNA data have been used extensively to study evolution and early human origins. These applications require estimates of the rate at which nucleotide substitutions occur in the DNA sequence. We consider the problem of estimating substitution rates in the presence of site-to-site rate variation. A coalescent model is presented that allows for different substitution rates for purines and pyrimidines, as well as more detailed models that allow fast and slow rates within each of the purine and pyrimidine classes. A method for estimating such rates is presented. Even for these simple models of site heterogeneity, there are, typically, insufficient data to obtain reliable estimates of site-specific substitution rates. However, estimates of the average rate across all sites appear to be relatively stable even in the presence of site heterogeneity. Simulations of models with site-to-site variation in mutation rate show that hypervariable sites can produce peaks in the pairwise difference curves that have previously been attributed to population dynamics.
Affiliation(s)
- R Lundstrom
- Collaborative Research, Inc., Waltham, Massachusetts 02154
41
Sidow A, Nguyen T, Speed TP. Estimating the fraction of invariable codons with a capture-recapture method. J Mol Evol 1992; 35:253-60. [PMID: 1518092 DOI: 10.1007/bf00178601]
Abstract
A codon-based approach to estimating the number of variable sites in a protein is presented. When first and second positions of codons are assumed to be replacement positions, a capture-recapture model can be used to estimate the number of variable codons from every pair of homologous and aligned sequences. The capture-recapture estimate is compared to a maximum likelihood estimate of the number of variable codons and to previous approaches that estimate the number of variable sites (not codons) in a sequence. Computer simulations are presented that show under which circumstances the capture-recapture estimate can be used to correct biases in distance matrices. Analysis of published sequences of two genes, calmodulin and serum albumin, shows that distance corrections that employ a capture-recapture estimate of the number of variable sites may be considerably different from corrections that assume that the number of variable sites is equal to the total number of positions in the sequence.
Affiliation(s)
- A Sidow
- Department of Molecular and Cell Biology, University of California, Berkeley 94720
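The capture-recapture idea in the Sidow et al. abstract above can be illustrated with a short Python sketch. Here, differences at first codon positions "capture" variable codons and differences at second positions "recapture" them. The overlap m then gives the classic Lincoln–Petersen estimate N̂ = n₁n₂/m of the number of variable codons. This is the textbook estimator under the assumption that the two "samples" are independent; the paper's exact estimator may differ. Chapman's bias-corrected form is included as a swapped-in alternative for small counts. All function names are hypothetical.

```python
def codon_capture_counts(seq_a: str, seq_b: str) -> tuple[int, int, int]:
    """Count codons differing at the 1st position, at the 2nd position, and at both."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("expected aligned coding sequences of equal length, a multiple of 3")
    n1 = n2 = both = 0
    for i in range(0, len(seq_a), 3):
        d1 = seq_a[i] != seq_b[i]          # "captured" at the first position
        d2 = seq_a[i + 1] != seq_b[i + 1]  # "captured" at the second position
        n1 += int(d1)
        n2 += int(d2)
        both += int(d1 and d2)
    return n1, n2, both


def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Estimated number of variable codons, N = n1 * n2 / m."""
    if m == 0:
        raise ValueError("no codon differs at both positions; estimate undefined")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's less biased variant, usable even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


n1, n2, m = codon_capture_counts("ATGGCTAAACGT", "CTGGATTCACGT")
print(n1, n2, m, lincoln_petersen(n1, n2, m), chapman(n1, n2, m))
```

Dividing the estimate by the total number of codons gives an estimated fraction of variable codons. A distance correction can then apply to that subset rather than to every position.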
42
White SH, Jacobs RE. Statistical distribution of hydrophobic residues along the length of protein chains. Implications for protein folding and evolution. Biophys J 1990; 57:911-21. [PMID: 2188687 PMCID: PMC1280792 DOI: 10.1016/s0006-3495(90)82611-4]
Abstract
We consider in this paper the statistical distribution of hydrophobic residues along the length of protein chains. For this purpose we used a binary hydrophobicity scale which assigns hydrophobic residues a value of one and non-hydrophobes a value of zero. The resulting binary sequences are tested for randomness using the standard run test. For the majority of the 5,247 proteins examined, the distribution of hydrophobic residues along a sequence cannot be distinguished from that expected for a random distribution. This suggests that (a) functional proteins may have originated from random sequences, (b) the folding of proteins into compact structures may be much more permissive with less sequence specificity than previously thought, and (c) the clusters of hydrophobic residues along chains which are revealed by hydrophobicity plots are a natural consequence of a random distribution and can be conveniently described by binomial statistics.
Affiliation(s)
- S H White
- Department of Physiology and Biophysics, University of California, Irvine 92717
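The claim in the 1990 White and Jacobs abstract above, that hydrophobic clusters follow binomial statistics, can be made concrete with a small sketch. Suppose each residue is independently hydrophobic with probability p. Then the number of hydrophobes in any window of w residues is Binomial(w, p). By linearity of expectation, a random chain of length n is expected to contain (n − w + 1)·P(X ≥ h) windows holding at least h hydrophobes, even though the windows overlap. The parameter values below are hypothetical, and the paper's exact binomial treatment may differ.

```python
from math import comb


def binomial_tail(w: int, h: int, p: float) -> float:
    """P(X >= h) for X ~ Binomial(w, p)."""
    return sum(comb(w, j) * p**j * (1 - p) ** (w - j) for j in range(h, w + 1))


def expected_dense_windows(n: int, w: int, h: int, p: float) -> float:
    """Expected number of length-w windows with >= h hydrophobes in a random length-n chain."""
    if not 0 < w <= n:
        raise ValueError("window length must satisfy 0 < w <= n")
    return (n - w + 1) * binomial_tail(w, h, p)


# Hypothetical example: 300-residue chain, 40% hydrophobic, windows of 7 with >= 6 hydrophobes.
print(expected_dense_windows(n=300, w=7, h=6, p=0.4))
```

If a hydrophobicity plot of a real sequence shows about this many dense windows, the clusters need no special explanation under the random model.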
43
Palumbi SR. Rates of molecular evolution and the fraction of nucleotide positions free to vary. J Mol Evol 1989; 29:180-7. [PMID: 2509718 DOI: 10.1007/bf02100116]
Abstract
Selective constraints on DNA sequence change were incorporated into a model of DNA divergence by restricting substitutions to a subset of nucleotide positions. A simple model showed that both mutation rate and the fraction of nucleotide positions free to vary are strong determinants of DNA divergence over time. When divergence between two species approaches the fraction of positions free to vary, standard methods that correct for multiple mutations yield severe underestimates of the number of substitutions per site. A modified method appropriate for use with DNA sequence, restriction site, or thermal renaturation data is derived taking this fraction into account. The model also showed that the ratio of divergence in two gene classes (e.g., nuclear and mitochondrial) may vary widely over time even if the ratio of mutation rates remains constant. DNA sequence divergence data are used increasingly to detect differences in rates of molecular evolution. Often, variation in divergence rate is assumed to represent variation in mutation rate. The present model suggests that differing divergence rates among comparisons (either among gene classes or taxa) should be interpreted cautiously. Differences in the fraction of nucleotide positions free to vary can serve as an important alternative hypothesis to explain differences in DNA divergence rates.
Affiliation(s)
- S R Palumbi
- Department of Zoology, University of Hawaii, Honolulu 96822
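The Palumbi abstract above says that standard multiple-hit corrections underestimate divergence when only a fraction f of positions can change. The sketch below illustrates this under an assumption that is a swapped-in simplification, not necessarily Palumbi's exact model: a Jukes–Cantor process acting only on the variable sites. With d as substitutions per site over all sites, the expected proportion of differing sites is p = (3f/4)(1 − e^(−4d/(3f))). Inverting this gives d = −(3f/4) ln(1 − 4p/(3f)), and f = 1 recovers the standard Jukes–Cantor distance. The restriction-site and renaturation variants mentioned in the abstract are not covered. Function names are hypothetical.

```python
import math


def expected_p(d: float, f: float) -> float:
    """Expected proportion of differing sites when only a fraction f of sites can change.

    Assumes a Jukes-Cantor process on the variable sites; d is substitutions per
    site averaged over all sites.
    """
    return 0.75 * f * (1.0 - math.exp(-4.0 * d / (3.0 * f)))


def jc_distance(p: float, f: float = 1.0) -> float:
    """Jukes-Cantor distance corrected for a fraction f of variable sites (f = 1: standard JC)."""
    arg = 1.0 - 4.0 * p / (3.0 * f)
    if arg <= 0.0:
        raise ValueError("p is at or beyond saturation for this f")
    return -0.75 * f * math.log(arg)


# True divergence 0.5 substitutions per site, but only 30% of sites free to vary.
p = expected_p(d=0.5, f=0.3)
print(p, jc_distance(p, f=0.3), jc_distance(p))
```

The f-corrected inverse recovers the true value. The standard correction, which assumes all sites are free to vary, returns a much smaller distance, which is the underestimate the abstract warns about.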
44
Syvanen M, Hartman H, Stevens PF. Classical plant taxonomic ambiguities extend to the molecular level. J Mol Evol 1989; 28:536-44. [PMID: 2549257 DOI: 10.1007/bf02602934]
Abstract
The molecular evolution of cytochrome c from angiosperms is compared to that from vertebrates. On the basis of a cladistic analysis of 26 plant species, compared with one of 27 vertebrate species, we find that although the vertebrate sequences yield reasonably well-defined minimal trees that are congruent with the biological tree, the plant sequences yield multiple minimal trees that are not only highly incongruent with each other, but none of which is congruent with any reasonable biological tree. That is, the plant sequence set is much more homoplastic than the vertebrate set. However, as judged by the relative rate test, the extent of divergence, and the degree of functional constraint, cytochrome c evolution in plants does not appear to differ from that in vertebrates.
Affiliation(s)
- M Syvanen
- Department of Medical Microbiology and Immunology, Medical School, University of California, Davis 95616
45
Fitch WM. The estimate of total nucleotide substitutions from pairwise differences is biased. Philos Trans R Soc Lond B Biol Sci 1986; 312:317-24. [PMID: 2870524 DOI: 10.1098/rstb.1986.0010]
Abstract
A nomographic method is presented that estimates the number of nucleotide substitutions since the common ancestor of two nucleotide sequences with no assumption about the proportion of transition and transversion substitutions except that it is constant over time. Of two previous methods of estimating this number, that of M. Kimura (Proc. natn. Acad. Sci. U.S.A. 78, 454-458 (1981)) obtains the same result, and is thus confirmed by this work, while that of W. M. Brown, E. M. Prager, A. Wang & A. C. Wilson (J. molec. Evol. 18, 225-239 (1982)) does not get the same result. The method presented here also obtains the fraction of all substitutions that are transitions. If one has three or more homologous sequences to compare, one can test the validity of the model by examining the constancy of the estimated proportion of substitutions that are transitions across the various pairs of sequences in a simple visual way. The method is general for any pair of mutually exclusive nucleotide substitutional categories, not just transitions and transversions. Mitochondrial data provide evidence that, for this and probably other current models correcting for superimposed substitutions, one or more of the underlying assumptions is incorrect. This is because there is some unknown systematic bias affecting this evolutionary process. It is suggested that at least part of the bias arises from incorrectly assuming that all sites are variable. In the absence of evidence that this bias is not present in other data, all estimates of the number of substitutions based upon pairs of sequences and current methods of estimating superimposed substitutions at a single site should be viewed as uncertain.
46
Nei M, Tateno Y. Statistical properties of the Jukes-Holmquist method of estimating the number of nucleotide substitutions: reply to Holmquist and Conroy's criticism. J Mol Evol 1981; 17:182-7. [PMID: 6167733 DOI: 10.1007/bf01733912]
Abstract
Conducting computer simulations, Nei and Tateno (1978) have shown that Jukes and Holmquist's (1972) method of estimating the number of nucleotide substitutions tends to give an overestimate and the estimate obtained has a large variance. Holmquist and Conroy (1980) repeated some parts of our simulation and claim that the overestimation of nucleotide substitutions in our paper occurred mainly because we used selected data. Examination of Holmquist and Conroy's simulation indicates that their results are essentially the same as ours when the Jukes-Holmquist method is used, but since they used a different method of computation their estimates of nucleotide substitutions differed substantially from ours. Another problem in Holmquist and Conroy's Letter is that they confused the expected number of nucleotide substitutions with the number in a sample. This confusion has resulted in a number of unnecessary arguments. They also criticized our χ² measure, but this criticism is apparently due to a misunderstanding of the assumptions of our method and a failure to use our method in the way we described. We believe that our earlier conclusions remain unchanged.
47
Fitch WM. Estimating the total number of nucleotide substitutions since the common ancestor of a pair of homologous genes: comparison of several methods and three beta hemoglobin messenger RNA's. J Mol Evol 1980; 16:153-209. [PMID: 7009879 DOI: 10.1007/bf01804976]
48
49
Margoliash E. Evolutionary adaptation of mitochondrial cytochrome c to its functional milieu. UCLA Forum Med Sci 1980:299-321. [PMID: 233495 DOI: 10.1016/b978-0-12-643150-6.50024-5]
50
Boyer SH, Scott AF, Kunkel LM, Smith KD. The proportion of all point mutations which are unacceptable: an estimate based on hemoglobin amino acid and nucleotide sequences. Can J Genet Cytol 1978; 20:111-37. [PMID: 350360 DOI: 10.1139/g78-013]
Abstract
Statistical analysis of the distribution of 156 kinds of human hemoglobin β (Hbβ) chain variants suggests that mutations are essentially random in their location. Thus differential fitness, not differential mutability, is the principal source of the nonrandom distribution of interspecies differences in Hbβ amino acid sequence. Similar analyses of both the location and the kind of interspecies differences detected among primates support this viewpoint and lead us to estimate that at least 95% of all amino acid substitutions, i.e., nonsynonymous mutations, in Hbβ are functionally unacceptable in the homozygous state. Through the combined use of this estimate and the number of nonsynonymous and synonymous substitutions per nucleotide site inferred from comparisons of entire human and rabbit Hbβ mRNA nucleotide sequences, we calculate that (a) approximately 70% of synonymous Hbβ mutations are adaptively undesirable and (b) the mutation rate underlying all changes is at most 10⁻⁸ nucleotide substitutions per nucleotide site per year. Apart from such calculations, analyses of nucleotide patterns in Hbβ mRNA as well as in rat preproinsulin mRNA reinforce the notion that a large portion of synonymous mutations are functionally unacceptable and rendered so by selective constraint, at a pretranslational level, on the abundance of particular nucleotide doublets such as CpG.