1
Cunningham CW, Zhu H, Hillis DM. Best-fit maximum-likelihood models for phylogenetic inference: empirical tests with known phylogenies. Evolution 1998; 52:978-87. [DOI: 10.1111/j.1558-5646.1998.tb01827.x] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 09/22/1997] [Accepted: 04/16/1998] [Indexed: 12/01/2022]
Affiliation(s)
- H. Zhu
- Zoology Department, Duke University, Durham, North Carolina 27708
- D. M. Hillis
- Department of Zoology and Institute of Cellular and Molecular Biology, University of Texas, Austin, Texas 78712

2
Harrison LB, Larsson HCE. Among-character rate variation distributions in phylogenetic analysis of discrete morphological characters. Syst Biol 2014; 64:307-24. [DOI: 10.1093/sysbio/syu098] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Indexed: 11/14/2022]
Affiliation(s)
- Luke B. Harrison
- Redpath Museum, McGill University, 859 Sherbrooke Street West, Montreal, Quebec, Canada H3A 0C4
- Hans C. E. Larsson
- Redpath Museum, McGill University, 859 Sherbrooke Street West, Montreal, Quebec, Canada H3A 0C4

3
Abstract
Gene sequences are routinely used to determine the topologies of unrooted phylogenetic trees, but many of the most important questions in evolution require knowing both the topologies and the roots of trees. However, general algorithms for calculating rooted trees from gene and genomic sequences in the absence of gene paralogs are few. Using the principles of evolutionary parsimony (EP) (Lake JA. 1987a. A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol. 4:167–181) and its extensions (Cavender J. 1989. Mechanized derivation of linear invariants. Mol Biol Evol. 6:301–316; Nguyen T, Speed TP. 1992. A derivation of all linear invariants for a nonbalanced transversion model. J Mol Evol. 35:60–76), we explicitly enumerate all linear invariants that solely contain rooting information and derive algorithms for rooting gene trees directly from gene and genomic sequences. These new EP linear rooting invariants allow one to determine rooted trees, even in the complete absence of outgroups and gene paralogs. EP rooting invariants are explicitly derived for three-taxon trees, and rules for their extension to four or more taxa are provided. The method is demonstrated using 18S ribosomal DNA to illustrate how the new animal phylogeny (Aguinaldo AMA et al. 1997. Evidence for a clade of nematodes, arthropods, and other moulting animals. Nature 387:489–493; Lake JA. 1990. Origin of the metazoa. Proc Natl Acad Sci USA 87:763–766) may be rooted directly from sequences, even when they are short and paralogs are unavailable. These results are consistent with the current root (Philippe H et al. 2011. Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470:255–260).
Affiliation(s)
- Janet S. Sinsheimer
- Human Genetics Department, University of California, Los Angeles
- Biomathematics Department, University of California, Los Angeles
- Biostatistics Department, University of California, Los Angeles
- James A. Lake
- Human Genetics Department, University of California, Los Angeles
- Molecular, Cell and Developmental Biology, University of California, Los Angeles
- *Corresponding author: E-mail:

4
Addario-Berry L, Chor B, Hallett M, Lagergren J, Panconesi A, Wareham T. Ancestral maximum likelihood of evolutionary trees is hard. J Bioinform Comput Biol 2004; 2:257-71. [PMID: 15297981 DOI: 10.1142/s0219720004000557] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 10/10/2003] [Revised: 01/19/2003] [Accepted: 01/26/2003] [Indexed: 11/18/2022]
Abstract
Maximum likelihood (ML) (Neyman, 1971) is an increasingly popular optimality criterion for selecting evolutionary trees. Finding optimal ML trees appears to be a very hard computational task: in particular, algorithms and heuristics for ML take longer to run than algorithms and heuristics for maximum parsimony (MP). However, while MP has been known to be NP-complete for over 20 years, no such hardness result has been obtained so far for ML. In this work we make a first step in this direction by proving that ancestral maximum likelihood (AML) is NP-complete. The input to this problem is a set of aligned sequences of equal length, and the goal is to find a tree and an assignment of ancestral sequences for all of that tree's internal vertices such that the likelihood of generating both the ancestral and contemporary sequences is maximized. Our NP-hardness proof follows that for MP given in (Day, Johnson and Sankoff, 1986) in that we use the same reduction from VERTEX COVER; however, the proof of correctness for this reduction relative to AML is different and substantially more involved.
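To make the AML objective concrete, the following is a minimal sketch (not the authors' construction) that scores one fixed tree together with one fixed assignment of ancestral sequences. It assumes a two-state symmetric model with a uniform root distribution and lets each edge take its own change probability. For an edge with k differences over n sites, that probability is maximized at p = k/n, capped at 1/2. The tree encoding and the function names are hypothetical. The hardness result concerns searching over all trees and ancestral assignments, which this sketch does not attempt.

```python
import math

def edge_loglik(seq_a: str, seq_b: str) -> float:
    """Maximized log-likelihood of seq_b given seq_a on one edge.

    Assumes a two-state symmetric model; the per-edge change probability
    p is set to its optimum k/n, capped at 1/2 (the saturation limit).
    """
    n = len(seq_a)
    k = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    p = min(k / n, 0.5)
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll

def aml_loglik(edges, labels, root):
    """Log-likelihood of ancestral plus contemporary sequences on a fixed tree.

    edges  : list of (parent, child) vertex names (hypothetical encoding)
    labels : dict vertex -> binary string, for internal vertices and leaves
    root   : vertex whose sequence is drawn from the uniform root distribution
    """
    n = len(labels[root])
    ll = n * math.log(0.5)  # uniform prior over the two root states at each site
    for parent, child in edges:
        ll += edge_loglik(labels[parent], labels[child])
    return ll
```

For example, a star tree with internal vertex `r` and leaves `a`, `b`, `c` can be scored with `aml_loglik([("r", "a"), ("r", "b"), ("r", "c")], labels, "r")`. An AML search would have to maximize this score over both the topology and the ancestral labels.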

5
Waddell PJ, Ota R, Penny D. Measuring fit of sequence data to phylogenetic model: gain of power using marginal tests. J Mol Evol 2009; 69:289-99. [PMID: 19851702 DOI: 10.1007/s00239-009-9268-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 03/15/2009] [Accepted: 07/28/2009] [Indexed: 11/29/2022]
Abstract
Testing fit of data to model is fundamentally important to any science, but publications in the field of phylogenetics rarely do this. Such analyses discard fundamental aspects of science as prescribed by Karl Popper. Indeed, not without cause, Popper (Unended quest: an intellectual autobiography. Fontana, London, 1976) once argued that evolutionary biology was unscientific as its hypotheses were untestable. Here we trace developments in assessing fit from Penny et al. (Nature 297:197-200, 1982) to the present. We compare the general log-likelihood ratio statistic (the G or G² statistic) between the evolutionary tree model and the multinomial model with that of marginalized tests applied to an alignment (using placental mammal coding sequence data). The most general test does not reject the fit of data to model (P ≈ 0.5), but the marginalized tests do. Tests on pairwise frequency (F) matrices strongly (P < 0.001) reject the most general phylogenetic (GTR) models commonly in use. It is also clear (P < 0.01) that the sequences are not stationary in their nucleotide composition. Deviations from stationarity and homogeneity seem to be unevenly distributed amongst taxa, and not necessarily in those expected from examining other regions of the genome. By marginalizing the 4^t site patterns (for t taxa) of the i.i.d. model to observed and expected parsimony counts, that is, from constant sites, to singletons, to parsimony-informative characters of a minimum possible length, the likelihood ratio test regains power, and it too rejects the evolutionary model with P << 0.001. Given such behavior over relatively recent evolutionary time, readers in general should maintain a healthy skepticism of results, as the scale of the systematic errors in published trees may be far larger than the analytical methods (e.g., bootstrap) report.
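The G statistic compares observed site-pattern counts with the counts expected under a fitted model. The following is a minimal sketch in Python. It assumes the expected counts (N times the model's pattern probabilities) have already been computed elsewhere. For a marginalized test, the same function would be applied to coarser classes (for example constant sites, singletons and informative patterns) instead of all 4^t patterns. Choosing the reference distribution for the statistic is not covered here.

```python
import math

def g_statistic(observed, expected):
    """G (G^2) = 2 * sum over classes of O * ln(O / E).

    Classes with O == 0 contribute 0 (the limit of O ln O as O -> 0).
    Every class with O > 0 must have E > 0.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    g = 0.0
    for o, e in zip(observed, expected):
        if o > 0:
            g += o * math.log(o / e)
    return 2.0 * g
```

For example, `g_statistic([30, 10], [20, 20])` gives about 10.46, and a perfect fit gives 0.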
Affiliation(s)
- Peter J Waddell
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47906, USA.

6
von Reumont BM, Meusemann K, Szucsich NU, Dell'Ampio E, Gowri-Shankar V, Bartel D, Simon S, Letsch HO, Stocsits RR, Luan YX, Wägele JW, Pass G, Hadrys H, Misof B. Can comprehensive background knowledge be incorporated into substitution models to improve phylogenetic analyses? A case study on major arthropod relationships. BMC Evol Biol 2009; 9:119. [PMID: 19473484 PMCID: PMC2695459 DOI: 10.1186/1471-2148-9-119] [Citation(s) in RCA: 101] [Impact Index Per Article: 6.7] [Received: 09/29/2008] [Accepted: 05/27/2009] [Indexed: 01/27/2023]
Abstract
BACKGROUND: Whenever different data sets arrive at conflicting phylogenetic hypotheses, only testable causal explanations of sources of error in at least one of the data sets allow us to choose critically among the conflicting hypotheses of relationships. The large (28S) and small (18S) subunit rRNAs are among the most popular markers for studies of deep phylogenies. However, some nodes supported by these data are suspected of being artifacts caused by peculiarities of the evolution of these molecules. Arthropod phylogeny is an especially controversial subject, dotted with conflicting hypotheses that depend on the data set and method of reconstruction. We assume that phylogenetic analyses based on these genes can be improved further by i) enlarging the taxon sample, ii) employing more realistic models of sequence evolution that incorporate non-stationary substitution processes, and iii) considering covariation and pairing of sites in rRNA genes. RESULTS: We analyzed a large set of arthropod sequences, applied new tools for quality control of data prior to tree reconstruction, and increased the biological realism of substitution models. Although the split-decomposition network indicated a high noise content in the data set, our measures were able both to improve the analyses and to give causal explanations for some incongruities reported from analyses of rRNA sequences. However, misleading effects did not completely disappear. CONCLUSION: Analyses of data sets that result in ambiguous phylogenetic hypotheses demand methods that not only filter stochastic noise but also differentiate phylogenetic signal from systematic biases. Such methods can only rely on our findings regarding the evolution of the analyzed data. Analyses of independent data sets are then crucial to test the plausibility of the results.
Our approach can easily be extended to genomic data as well, whereby layers of quality assessment are set up that are applicable to phylogenetic reconstructions in general.
Affiliation(s)
- Karen Meusemann
- Molecular Lab, Zoologisches Forschungsmuseum A. Koenig, Bonn, Germany
- Daniela Bartel
- Department of Evolutionary Biology, University Vienna, Vienna, Austria
- Sabrina Simon
- ITZ, Ecology & Evolution, Stiftung Tieraerztliche Hochschule Hannover, Hannover, Germany
- Harald O Letsch
- Molecular Lab, Zoologisches Forschungsmuseum A. Koenig, Bonn, Germany
- Roman R Stocsits
- Molecular Lab, Zoologisches Forschungsmuseum A. Koenig, Bonn, Germany
- Yun-xia Luan
- Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, PR China
- Günther Pass
- Department of Evolutionary Biology, University Vienna, Vienna, Austria
- Heike Hadrys
- ITZ, Ecology & Evolution, Stiftung Tieraerztliche Hochschule Hannover, Hannover, Germany
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
- Bernhard Misof
- UHH Biozentrum Grindel und Zoologisches Museum, University of Hamburg, Hamburg, Germany

7
Gruenheit N, Lockhart PJ, Steel M, Martin W. Difficulties in testing for covarion-like properties of sequences under the confounding influence of changing proportions of variable sites. Mol Biol Evol 2008; 25:1512-20. [PMID: 18424773 DOI: 10.1093/molbev/msn098] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 11/12/2022]
Abstract
The covarion (COV)-like properties of sequences are poorly described, and their impact on phylogenetic analyses is poorly understood. We demonstrate using simulations that, under an evolutionary model where the proportion of variable sites changes in nonadjacent lineages, log-likelihood values for rates-across-sites (RAS) and COV models become similar, making the models difficult to distinguish. Further, although COV and RAS models provide a great improvement in likelihood scores over a homogeneous model with these simulated data, the reconstruction accuracy of tree building is low, suggesting caution when proportions of variable sites are suspected to differ among evolutionary lineages. We study the performance of a recently developed contingency test, modified for protein data, that detects the presence of COV-type evolution. We report that if proportions of variable sites (p_var) change in a lineage-specific manner such that their distributions in different lineages become sufficiently nonoverlapping, the contingency test can incorrectly suggest a homogeneous model. Also of concern is the possibility of different proportions of variable sites between the groups being studied. In a study of chloroplast proteins, interpretation of the test is found to be susceptible to different partitionings of taxon groups, making the test very subjective in its implementation. Extreme intergroup differences in the extent of divergence and in proportions of variable sites could be contributing to this effect.
Affiliation(s)
- Nicole Gruenheit
- Institute of Botany III, University of Düsseldorf, Düsseldorf, Germany.

8
Kelchner SA, Thomas MA. Model use in phylogenetics: nine key questions. Trends Ecol Evol 2006; 22:87-94. [PMID: 17049674 DOI: 10.1016/j.tree.2006.10.004] [Citation(s) in RCA: 120] [Impact Index Per Article: 6.7] [Received: 04/10/2006] [Revised: 09/19/2006] [Accepted: 10/05/2006] [Indexed: 11/16/2022]
Abstract
Models of character evolution underpin all phylogeny estimations, thus model adequacy remains a crucial issue for phylogenetics and its many applications. Although progress has been made in selecting appropriate models for phylogeny estimation, there is still concern about their purpose and proper use. How do we interpret models in a phylogenetic context? What are their effects on phylogeny estimation? How can we improve confidence in the models that we choose? That the phylogenetics community is asking such questions denotes an important stage in the use of explicit models. Here, we examine these and other common questions and draw conclusions about how the community is using and choosing models, and where this process will take us next.
Affiliation(s)
- Scot A Kelchner
- Department of Biological Sciences, Idaho State University, Pocatello, ID 83209-8007, USA.

9
Waddell PJ. Measuring the fit of sequence data to phylogenetic model: allowing for missing data. Mol Biol Evol 2004; 22:395-401. [PMID: 15470228 DOI: 10.1093/molbev/msi002] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022]
Abstract
It is fundamentally important to assess the fit of data to model in phylogenetic and evolutionary studies. Phylogenetic methods using molecular sequences typically start with a multiple alignment. It is possible to measure the fit of data to model expectations, for example via the likelihood-ratio (G) test or the χ² test, if all sites in all sequences have an unambiguous residue. However, nearly all alignments of interest contain sites (columns of the alignment) with missing data, that is, ambiguous nucleotides, gaps, or unsequenced regions, which must presently be removed before using the above tests. Unfortunately, this is often either undesirable or impractical, as it will discard much of the data. Here, we show how iterative ML estimators may directly estimate the site-pattern probabilities for columns with missing data, given only standard i.i.d. assumptions. The optimization may use an EM or Newton algorithm, or any other hill-climbing approach. The resulting optimal likelihood under the unconstrained or multinomial model may be compared directly with the likelihood of the data under the tree model (a G statistic). Alternatively, the modified observed and the expected frequencies of site patterns may be compared using a χ² test. The distribution of such statistics is best assessed using appropriate simulations. The new method is applicable to models using codons or paired sites. The methods are also useful with Hadamard conjugations (spectral analysis) and are illustrated with these and with ML evolutionary models that allow site-rate variability.
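The EM route mentioned above can be sketched for the unconstrained multinomial over all 4^t site patterns. This is a minimal illustration, not the paper's implementation, and it rests on several assumptions. The number of taxa t must be small enough that all 4^t patterns can be enumerated. The characters `?`, `-` and `N` are treated as fully missing, and partial ambiguity codes are not handled. A fixed number of iterations replaces a convergence test. The function names are hypothetical.

```python
import itertools
import math

MISSING = set("?-N")  # assumed: these characters mean "any nucleotide"

def compatible(column, patterns):
    """Complete patterns consistent with a column that may contain missing entries."""
    return [p for p in patterns
            if all(c in MISSING or c == b for c, b in zip(column, p))]

def em_pattern_probs(columns, n_iter=200):
    """EM estimate of unconstrained site-pattern probabilities with missing data.

    columns: list of strings, one character per taxon.
    Returns (probs, loglik); probs maps each complete pattern to its estimate.
    """
    t = len(columns[0])
    patterns = ["".join(p) for p in itertools.product("ACGT", repeat=t)]
    probs = {p: 1.0 / len(patterns) for p in patterns}  # uniform starting point
    compat = [compatible(col, patterns) for col in columns]
    n = len(columns)
    for _ in range(n_iter):
        counts = dict.fromkeys(patterns, 0.0)
        # E-step: split each column's unit weight over its compatible patterns.
        for cset in compat:
            total = sum(probs[p] for p in cset)
            for p in cset:
                counts[p] += probs[p] / total
        # M-step: new probabilities are expected counts divided by the column count.
        probs = {p: c / n for p, c in counts.items()}
    loglik = sum(math.log(sum(probs[p] for p in cset)) for cset in compat)
    return probs, loglik
```

With no missing data, the estimates reduce to the observed pattern frequencies. The returned log-likelihood is the unconstrained value that a G statistic would compare with the tree model's likelihood.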
Affiliation(s)
- Peter J Waddell
- Department of Statistics, Department of Biological Sciences, University of South Carolina, Columbia, USA.

10
Cejchan PA. LUCA, or just a conserved Archaeon?: Comments on Xue et al. (2003). Gene 2004; 333:47-50. [PMID: 15177679 DOI: 10.1016/j.gene.2004.02.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/27/2003] [Revised: 09/24/2003] [Accepted: 02/05/2004] [Indexed: 11/24/2022]
Abstract
In their recent paper, Xue et al. used an unusual technique for rooting the universal phylogenetic tree, which placed the last universal common ancestor within Archaea. The present paper offers some criticisms of their methods and results.
Affiliation(s)
- Peter A Cejchan
- Laboratory of Paleobiology and Paleoecology, IG ASCR, Rozvojova 135, Prague CZ-16502, Czech Republic.

11
Wägele JW, Holland B, Dreyer H, Hackethal B. Searching factors causing implausible non-monophyly: ssu rDNA phylogeny of Isopoda Asellota (Crustacea: Peracarida) and faster evolution in marine than in freshwater habitats. Mol Phylogenet Evol 2003; 28:536-51. [PMID: 12927137 DOI: 10.1016/s1055-7903(03)00053-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
This contribution addresses two questions: which alignment patterns are causing non-monophyly of the Asellota, and what is the phylogenetic history of this group? The Asellota are small benthic crustaceans occurring in most aquatic habitats. In view of the complex morphological apomorphies known for this group, monophyly of the Asellota has never been questioned. Using ssu rDNA sequences of outgroups and of 16 asellote species from fresh water, littoral marine habitats and deep-sea localities, the early divergence between the lineages in fresh water and in the ocean, and the monophyly of the deep-sea taxon Munnopsidae, are confirmed. Relative substitution rates of freshwater species are much lower than those of other isopod species, rates being highest in some littoral marine genera (Carpias and Jaera). Furthermore, more sequence sites are variable in marine than in freshwater species; the latter conserve outgroup character states. Monophyly is recovered with parsimony methods, but not with distance and maximum likelihood analyses, which separate the marine from the freshwater species. The information content of alignments was studied with spectra of supporting positions. The scarcity of signal (= apomorphic nucleotides) supporting monophyly of the Asellota is attributed to a short stem-line of this group or to erosion of signal in fast-evolving marine species. Parametric bootstrapping in combination with spectra indicates that a tree model cannot explain the data and that monophyly of the Asellota should not be rejected even though many topologies do not recover this taxon.
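A spectrum of supporting positions tallies, for each bipartition of the taxa, how many alignment columns support it. The following is a simplified sketch of that idea. It counts only columns in which exactly two nucleotides occur and each nucleotide is carried by at least two taxa, so singletons, gaps and other symbols are ignored. This definition of a "supporting position" is an assumption made for illustration, and the spectra used in the paper may count support differently. The function name is hypothetical.

```python
from collections import Counter

def split_spectrum(alignment):
    """Count columns supporting each bipartition of the taxa (simplified).

    alignment: dict taxon -> aligned sequence (all the same length).
    A column counts only if it has exactly two states from A, C, G, T and
    at least two taxa on each side. Each split is keyed by the side that
    does NOT contain the alphabetically first taxon (canonical form).
    """
    taxa = sorted(alignment)
    length = len(alignment[taxa[0]])
    spectrum = Counter()
    for i in range(length):
        column = {t: alignment[t][i] for t in taxa}
        states = set(column.values())
        if len(states) != 2 or not states <= set("ACGT"):
            continue
        ref = column[taxa[0]]
        other_side = frozenset(t for t in taxa if column[t] != ref)
        if 2 <= len(other_side) <= len(taxa) - 2:
            spectrum[other_side] += 1
    return spectrum
```

Sorting the resulting counts gives the spectrum. A group whose split gets few supporting columns, relative to conflicting splits, would show the kind of signal scarcity described above.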
Affiliation(s)
- Johann-Wolfgang Wägele
- Lehrstuhl Spezielle Zoologie, Fakultät Biologie, Ruhr-Universität Bochum, 44780 Bochum, Germany.

12

13
Shpak M, Churchill GA. The information content of a character under a Markov model of evolution. Mol Phylogenet Evol 2000; 17:231-43. [PMID: 11083937 DOI: 10.1006/mpev.2000.0846] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
Abstract
The rate of evolutionary change associated with a character determines its utility for the reconstruction of phylogenetic history. For a given age of lineage splits, we examine the information content of a character to assess the magnitude and range of an optimal rate of substitution. On the one hand, an optimal transition rate must provide sufficiently many character changes to distinguish subclades; on the other hand, changes must be sufficiently rare that reversals on a single branch (and hence homoplasy) are uncommon. In this study, we evolve binary characters over three tree topologies with fixed branch lengths, while varying transition rate as a parameter. We use the character state distribution obtained to measure the "information content" of a character given a transition rate. This is done with respect to several criteria: the probability of obtaining the correct tree using parsimony, the probability of inferring the correct ancestral state, and Shannon-Weaver and Fisher information measures on the configuration of probability distributions. All of the information measures support the intuitive result that optimal rates exist for phylogeny reconstruction. This nonzero optimum is less pronounced if one conditions on there having been a change, in which case the parsimony-based result that minimum change is most informative tends to hold.
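The trade-off between too few and too many changes can be illustrated numerically. The following sketch illustrates the idea and is not the authors' simulation. It assumes a four-taxon tree ((1,2),(3,4)) with equal pendant branch lengths and a fixed internal branch length, under a two-state symmetric model. For each rate on a grid, it computes exactly the probability that a single binary character favors the true split 12|34, minus the probability that it favors one of the two wrong splits. The branch lengths, the rate grid and the function names are arbitrary, hypothetical choices.

```python
import itertools
import math

def p_change(rate: float, length: float) -> float:
    """P(different state after time `length`) under a two-state symmetric model."""
    return 0.5 * (1.0 - math.exp(-2.0 * rate * length))

def net_parsimony_support(rate, pendant=0.1, internal=0.1):
    """P(character favors true split 12|34) - P(it favors 13|24 or 14|23)."""
    pp = p_change(rate, pendant)
    pi = p_change(rate, internal)

    def trans(a, b, p):
        return p if a != b else 1.0 - p

    correct = wrong = 0.0
    for u, v, x1, x2, x3, x4 in itertools.product((0, 1), repeat=6):
        # u joins taxa 1 and 2, v joins taxa 3 and 4; root at u with state prob 1/2.
        prob = (0.5 * trans(u, v, pi)
                * trans(u, x1, pp) * trans(u, x2, pp)
                * trans(v, x3, pp) * trans(v, x4, pp))
        if x1 == x2 and x3 == x4 and x1 != x3:
            correct += prob
        elif x1 != x2 and ((x1 == x3 and x2 == x4) or (x1 == x4 and x2 == x3)):
            wrong += prob
    return correct - wrong

rates = [0.1 * k for k in range(1, 201)]  # scan rates 0.1 .. 20.0
scores = [net_parsimony_support(r) for r in rates]
best_rate = rates[scores.index(max(scores))]
```

Under these assumptions, the net support is small at very low rates because few changes occur. It is largest at an intermediate rate and becomes negative near saturation, where homoplastic patterns favoring the two wrong splits outnumber the true one.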
Affiliation(s)
- M Shpak
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut 06520-8106, USA.

14
Martin P, Kaygorodova I, Sherbakov DY, Verheyen E. Rapidly evolving lineages impede the resolution of phylogenetic relationships among Clitellata (Annelida). Mol Phylogenet Evol 2000; 15:355-68. [PMID: 10860645 DOI: 10.1006/mpev.1999.0764] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
The phylogenetic relationships of the Clitellata were investigated using a data set with published and new complete or partial 18S rRNA and mtCOI gene sequences of 13 and 49 taxa representing 8 and 14 families, respectively. Three different alignments were considered for 18S, and the possible influence of departures from rate constancy among sites was evaluated by analyses using a Gamma model of rate heterogeneity. Maximum-likelihood estimates of the shape parameter alpha of the Gamma distribution were very low, whatever the alignment or the gene considered, suggesting that phylogenetic reconstructions taking into account the rate heterogeneity among sites are likely to be the most reliable. Analyzed separately, the two genes did not resolve the relationships among the Clitellata, but the consensus tree was congruent with the morphology-based relationships. Our data suggest the inclusion of the Euhirudinea, Acanthobdellida, and Branchiobdellida in the Oligochaeta and suggest the Lumbriculidae as the link between both assemblages. Although separate analyses of both genes, as well as different alignments for the 18S rRNA sequences, yielded conflicting results concerning the phylogenetic position of leeches and leech-like worms vis-à-vis the Oligochaeta, subsequent analyses using the Gamma model greatly reduced the observed inconsistencies. Our analyses show that among the Clitellata, the leeches and the leech-like and gutless worms represent significantly faster evolving lineages. It is suggested that the observed higher mutation rates may be explained by the fact that these lineages contain almost exclusively commensal and/or parasitic taxa.
Affiliation(s)
- P Martin
- Freshwater Biology Section, Taxonomy and Systematics Section, Royal Belgian Institute of Natural Sciences, rue Vautier 29, Brussels, B-1000, Belgium

15
Ota R, Waddell PJ, Hasegawa M, Shimodaira H, Kishino H. Appropriate likelihood ratio tests and marginal distributions for evolutionary tree models with constraints on parameters. Mol Biol Evol 2000; 17:798-803. [PMID: 10779540 DOI: 10.1093/oxfordjournals.molbev.a026358] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.0] [Indexed: 11/12/2022]
Abstract
We show how to make appropriate likelihood ratio tests for evolutionary tree models when parameters such as edge (internode or branch) lengths have nonnegativity constraints. In such cases, under the null model of an edge length being zero, the marginal distribution of this parameter is proven to be a "half-normal", that is, 50% zero values and 50% the positive half of a normal distribution. Other constrained parameters, such as the proportion of invariant sites, give similar results. For likelihood ratio tests between nested models, e.g., H₀: homogeneous site rates, and H₁: site rates follow a gamma distribution with variance 1/k, the distribution of the test statistic under H₀ asymptotically (as sequence length increases) becomes a mixture of χ² distributions, in this case 50% χ²₀ and 50% χ²₁ (where the subscript denotes degrees of freedom), i.e., not the usually assumed 100% χ²₁, which leads to a conservative test. Such mixtures are sometimes called chi-bar-squared (χ̄²) distributions. Simulations show that even with sequences as short as 125 sites, some parameters, including the proportion of invariant sites, fit asymptotic distributions closely.
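For the boundary case described above, the p-value of an observed likelihood-ratio statistic comes from the 50:50 mixture of χ²₀ (a point mass at zero) and χ²₁, rather than from χ²₁ alone. The following is a minimal sketch using only the standard library. It relies on the identity P(χ²₁ ≥ x) = erfc(√(x/2)). The function names are hypothetical, and the sketch covers only this single-constrained-parameter case.

```python
import math

def chi2_1_sf(x: float) -> float:
    """Upper tail P(X >= x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def mixture_pvalue(lr_stat: float) -> float:
    """p-value under the 50:50 mixture of chi2_0 (point mass at 0) and chi2_1."""
    if lr_stat <= 0.0:
        return 1.0  # the whole mixture lies at or above 0
    return 0.5 * chi2_1_sf(lr_stat)
```

For example, a statistic of about 2.71 is significant at roughly the 5% level under the mixture. The naive χ²₁ reference would require about 3.84, which is why the usual test is conservative.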
Affiliation(s)
- R Ota
- The Graduate University for Advanced Studies and The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo, Japan

16
Abstract
Great interest is given to species emerging early in phylogenetic reconstruction because they are often assumed to represent an ancestor. Recent studies indicate, however, that species branching deep in molecular trees are often fast-evolving ones, misplaced because of the long-branch artefact. The detection of genuinely deep-branching organisms remains an elusive task.
Affiliation(s)
- H Philippe
- Laboratoire de Biologie Cellulaire (URA CNRS 2227), Bâtiment 444, Université Paris-Sud, 91405 Orsay Cedex, France. herve.philippe@bio4.bc4.u-psud.fr

17
Gu X, Li WH. Estimation of evolutionary distances under stationary and nonstationary models of nucleotide substitution. Proc Natl Acad Sci U S A 1998; 95:5899-905. [PMID: 9600890 PMCID: PMC34493 DOI: 10.1073/pnas.95.11.5899] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.3] [Indexed: 02/07/2023]
Abstract
Estimation of evolutionary distances has always been a major issue in the study of molecular evolution because evolutionary distances are required for estimating the rate of evolution in a gene, the divergence dates between genes or organisms, and the relationships among genes or organisms. Other closely related issues are the estimation of the pattern of nucleotide substitution, the estimation of the degree of rate variation among sites in a DNA sequence, and statistical testing of the molecular clock hypothesis. Mathematical treatments of these problems are considerably simplified by the assumption of a stationary process in which the nucleotide compositions of the sequences under study have remained approximately constant over time, and there now exist fairly extensive studies of stationary models of nucleotide substitution, although some problems remain to be solved. Nonstationary models are much more complex, but significant progress has been recently made by the development of the paralinear and LogDet distances. This paper reviews recent studies on the above issues and reports results on correcting the estimation bias of evolutionary distances, the estimation of the pattern of nucleotide substitution, and the estimation of rate variation among the sites in a sequence.
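The paralinear distance, one of the nonstationary distances mentioned above, can be illustrated with a short sketch. It is closely related to LogDet and is defined as d = -1/4 [ln det F - 1/2 ln(det Πx · det Πy)]. Here F is the 4×4 matrix of joint site-pair frequencies, and Πx and Πy are diagonal matrices of the two sequences' base frequencies. The sketch assumes ungapped sequences that use only A, C, G and T. It also assumes that all four bases occur in each sequence and that det F > 0; otherwise the logarithm is undefined and `math.log` raises an error. It is not the bias-corrected estimator studied in the paper, and the helper names are hypothetical.

```python
import math

BASES = "ACGT"

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result

def paralinear_distance(x: str, y: str) -> float:
    """Paralinear distance between two aligned, ungapped DNA sequences."""
    n = len(x)
    idx = {b: i for i, b in enumerate(BASES)}
    f = [[0.0] * 4 for _ in range(4)]
    for a, b in zip(x, y):
        f[idx[a]][idx[b]] += 1.0 / n                          # joint frequency matrix F
    px = [sum(row) for row in f]                              # base frequencies of x
    py = [sum(f[i][j] for i in range(4)) for j in range(4)]   # base frequencies of y
    log_det_pi = sum(math.log(v) for v in px) + sum(math.log(v) for v in py)
    return -0.25 * (math.log(det(f)) - 0.5 * log_det_pi)
```

Because the base frequencies of each sequence enter separately, the distance does not assume that composition stayed constant between the two sequences. This is what makes such distances suited to nonstationary data.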
Affiliation(s)
- X Gu
- Institute of Molecular Evolutionary Genetics, 328 Mueller Laboratory, Pennsylvania State University, University Park, PA 16802, USA

18
Waddell PJ, Steel MA. General time-reversible distances with unequal rates across sites: mixing gamma and inverse Gaussian distributions with invariant sites. Mol Phylogenet Evol 1997; 8:398-414. [PMID: 9417897 DOI: 10.1006/mpev.1997.0452] [Citation(s) in RCA: 138] [Impact Index Per Article: 5.1] [Indexed: 02/05/2023]
Abstract
A series of new results useful to the study of DNA sequences using Markov models of substitution are presented with proofs. General time-reversible distances can be extended to accommodate any fixed distribution of rates across sites by replacing the logarithmic function of a matrix with the inverse of a moment generating function. Estimators are presented assuming a gamma distribution, the inverse Gaussian distribution, or a mixture of either of these with invariant sites. Also considered are the different ways invariant sites may be removed and how these differences may affect estimated distances. Through collaboration, we implemented these distances in PAUP in 1994. The variance of these new distances is approximated via the delta method. It is also shown how to predict the divergence expected for a pair of sequences given a rate matrix and a distribution of rates across sites, allowing iterated ML estimates of distances under any reversible model. A simple test of whether a rate matrix is time reversible is also presented. These new methods are used to estimate the divergence time of humans and chimps from mtDNA sequence data. These analyses support suggestions that the human lineage has an enhanced transition rate relative to other hominoids. These studies also show that transversion distances differ substantially from the overall distances, which are dominated by transitions. Transversions alone apparently suggest a very recent divergence time for humans versus chimps and/or a very old (> 16 myr) divergence time for humans versus orangutans. This work illustrates graphically ways to interpret the reliability of distance-based transformations, using the corrected transition to transversion ratio returned for pairs of sequences which are successively more diverged.
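Replacing the logarithm by the inverse of a moment generating function is easiest to see in the simplest special case, the Jukes-Cantor (JC) model. This sketch assumes JC instead of the full GTR treatment in the paper and includes no invariant-site component. Under JC, the expected proportion p of differing sites satisfies 1 - 4p/3 = M(-4d/3), where M is the moment generating function of the site-rate distribution, scaled to mean 1. With equal rates, M(s) = e^s, which gives the usual logarithmic distance. With gamma-distributed rates of shape a, M(s) = (1 - s/a)^(-a), and inverting this gives d = (3a/4)[(1 - 4p/3)^(-1/a) - 1]. The function names are hypothetical.

```python
import math

def jc_distance(p: float) -> float:
    """Jukes-Cantor distance with equal rates (inverse of M(s) = exp(s))."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc_gamma_distance(p: float, a: float) -> float:
    """JC distance with gamma(shape a, mean 1) rates (inverse of M(s) = (1 - s/a)^(-a))."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return 0.75 * a * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / a) - 1.0)
```

Smaller shape parameters, which mean stronger rate variation, give larger corrected distances for the same observed p. As a grows large, the gamma distance approaches the equal-rates JC distance.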
Affiliation(s)
- P J Waddell
- School of Biological Sciences, Massey University, Palmerston North, New Zealand.