Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	[Subscribe] [Scholar Register]

Number

Cited by Other Article(s)

Williams TA, Cox CJ, Foster PG, Szöllősi GJ, Embley TM. Phylogenomics provides robust support for a two-domains tree of life. Nat Ecol Evol 2020;4:138-147. [PMID: 31819234 PMCID: PMC6942926 DOI: 10.1038/s41559-019-1040-x] [Citation(s) in RCA: 125] [Impact Index Per Article: 31.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2019] [Accepted: 10/15/2019] [Indexed: 11/09/2022]

Weber CC, Whelan S. Physicochemical Amino Acid Properties Better Describe Substitution Rates in Large Populations. Mol Biol Evol 2019;36:679-690. [PMID: 30668757 DOI: 10.1093/molbev/msz003] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open

Beaulieu JM, O’Meara BC, Zaretzki R, Landerer C, Chai J, Gilchrist MA. Population Genetics Based Phylogenetics Under Stabilizing Selection for an Optimal Amino Acid Sequence: A Nested Modeling Approach. Mol Biol Evol 2019;36:834-851. [PMID: 30521036 PMCID: PMC6445302 DOI: 10.1093/molbev/msy222] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open

Goldstein RA, Pollock DD. The tangled bank of amino acids. Protein Sci 2016;25:1354-62. [PMID: 27028523 PMCID: PMC4918418 DOI: 10.1002/pro.2930] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2016] [Revised: 03/24/2016] [Accepted: 03/24/2016] [Indexed: 12/01/2022]

Kück P, Wägele JW. Plesiomorphic character states cause systematic errors in molecular phylogenetic analyses: a simulation study. Cladistics 2015;32:461-478. [DOI: 10.1111/cla.12132] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/15/2015] [Indexed: 01/17/2023] Open

Money D, Whelan S. GeLL: a generalized likelihood library for phylogenetic models. Bioinformatics 2015;31:2391-3. [PMID: 25725494 DOI: 10.1093/bioinformatics/btv126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2014] [Accepted: 02/23/2015] [Indexed: 11/13/2022] Open

Md Mukarram Hossain AS, Blackburne BP, Shah A, Whelan S. Evidence of Statistical Inconsistency of Phylogenetic Methods in the Presence of Multiple Sequence Alignment Uncertainty. Genome Biol Evol 2015;7:2102-16. [PMID: 26139831 PMCID: PMC4558847 DOI: 10.1093/gbe/evv127] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open

Abstract

Evolutionary studies usually use a two-step process to investigate sequence data. Step one estimates a multiple sequence alignment (MSA) and step two applies phylogenetic methods to ask evolutionary questions of that MSA. Modern phylogenetic methods infer evolutionary parameters using maximum likelihood or Bayesian inference, mediated by a probabilistic substitution model that describes sequence change over a tree. The statistical properties of these methods mean that more data directly translates to an increased confidence in downstream results, providing the substitution model is adequate and the MSA is correct. Many studies have investigated the robustness of phylogenetic methods in the presence of substitution model misspecification, but few have examined the statistical properties of those methods when the MSA is unknown. This simulation study examines the statistical properties of the complete two-step process when inferring sequence divergence and the phylogenetic tree topology. Both nucleotide and amino acid analyses are negatively affected by the alignment step, both through inaccurate guide tree estimates and through overfitting to that guide tree. For many alignment tools these effects become more pronounced when additional sequences are added to the analysis. Nucleotide sequences are particularly susceptible, with MSA errors leading to statistical support for long-branch attraction artifacts, which are usually associated with gross substitution model misspecification. Amino acid MSAs are more robust, but do tend to arbitrarily resolve multifurcations in favor of the guide tree. No inference strategies produce consistently accurate estimates of divergence between sequences, although amino acid MSAs are again more accurate than their nucleotide counterparts. We conclude with some practical suggestions about how to limit the effect of MSA uncertainty on evolutionary inference.

Collapse

Spielman SJ, Wilke CO. The relationship between dN/dS and scaled selection coefficients. Mol Biol Evol 2015;32:1097-108. [PMID: 25576365 PMCID: PMC4379412 DOI: 10.1093/molbev/msv003] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open

Abstract

Numerous computational methods exist to assess the mode and strength of natural selection in protein-coding sequences, yet how distinct methods relate to one another remains largely unknown. Here, we elucidate the relationship between two widely used phylogenetic modeling frameworks: dN/dS models and mutation-selection (MutSel) models. We derive a mathematical relationship between dN/dS and scaled selection coefficients, the focal parameters of MutSel models, and use this relationship to gain deeper insight into the behaviors, limitations, and applicabilities of these two modeling frameworks. We prove that, if all synonymous changes are neutral, standard MutSel models correspond to dN/dS ≤ 1. However, if synonymous codons differ in fitness, dN/dS can take on arbitrarily high values even if all selection is purifying. Thus, the MutSel modeling framework cannot necessarily accommodate positive, diversifying selection, while dN/dS cannot distinguish between purifying selection on synonymous codons and positive selection on amino acids. We further propose a new benchmarking strategy of dN/dS inferences against MutSel simulations and demonstrate that the widely used Goldman-Yang-style dN/dS models yield substantially biased dN/dS estimates on realistic sequence data. In contrast, the less frequently used Muse-Gaut-style models display much less bias. Strikingly, the least-biased and most precise dN/dS estimates are never found in the models with the best fit to the data, measured through both AIC and BIC scores. Thus, selecting models based on goodness-of-fit criteria can yield poor parameter estimates if the models considered do not precisely correspond to the underlying mechanism that generated the data. In conclusion, establishing mathematical links among modeling frameworks represents a novel, powerful strategy to pinpoint previously unrecognized model limitations and strengths.

Collapse

Brown JM. Detection of implausible phylogenetic inferences using posterior predictive assessment of model fit. Syst Biol 2014;63:334-48. [PMID: 24415681 DOI: 10.1093/sysbio/syu002] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Mallatt J, Chittenden KD. The GC content of LSU rRNA evolves across topological and functional regions of the ribosome in all three domains of life. Mol Phylogenet Evol 2014;72:17-30. [PMID: 24394731 DOI: 10.1016/j.ympev.2013.12.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2013] [Revised: 11/28/2013] [Accepted: 12/24/2013] [Indexed: 12/21/2022]

Abstract

Large-subunit rRNA is the ribozyme that catalyzes protein synthesis by translation, and many of its features vary along a deep-to-superficial gradient. By measuring the G+C proportions in this rRNA in all three domains of life (60 bacteria, 379 eukaryote, and 23 archaean sequences), we tested whether the proportion of GC nucleotides varies along this in-out gradient. The rRNA regions used were several zones identified by Bokov and Steinberg (2009) as being arranged from deep to superficial within the LSU. To the Bokov-Steinberg zones, we added the most superficial zone of all, the divergent domains (expansion segments), which are greatly enlarged in eukaryotes. Regression lines constructed from the hundreds of species of organisms revealed the expected in-out gradient, showing that species with high %GC (or high %AT) in their rRNA distribute more of these abundant nucleotides into the peripheral zones. This could be explained by the evolutionary rates of replacement of all nucleotides (A, C, G, T), because these latter rates are fastest at the periphery and slowest near the conserved core. As an overall explanation, we propose that when extrinsic factors (whole-genome nucleotide composition, or environmental temperature) demand the percentage of GC in the rRNA of a species be high or low, then the deep-lying zones are buffered against GC variation because they are the slowest to evolve. The deep, conserved zones are also the most involved in translation, hinting that stabilizing selection there prevents a high GC variability that would diminish LSU rRNA's core functions. We found only a few domain-specific trends in rRNA-GC distribution, which relate to many Archaea living at high temperatures or to the highly complex genes and adaptations of Eukaryota. Use of rRNA sequences in molecular phylogenetic studies, for reconstructing the relationships of organisms across the tree of life, requires accurate models of how rRNA evolves. The demonstration that GC distributes in regular patterns across rRNA regions can improve these tree-reconstruction models in the future and should yield phylogenies of greater accuracy.

Collapse

Roquet C, Thuiller W, Lavergne S. Building megaphylogenies for macroecology: taking up the challenge. ECOGRAPHY 2013;36:13-26. [PMID: 24790290 PMCID: PMC4001083 DOI: 10.1111/j.1600-0587.2012.07773.x] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/05/2023]

Warnow T. Large-Scale Multiple Sequence Alignment and Phylogeny Estimation. MODELS AND ALGORITHMS FOR GENOME EVOLUTION 2013. [DOI: 10.1007/978-1-4471-5298-9_6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]

Wu CH, Suchard MA, Drummond AJ. Bayesian selection of nucleotide substitution models and their site assignments. Mol Biol Evol 2012;30:669-88. [PMID: 23233462 PMCID: PMC3563969 DOI: 10.1093/molbev/mss258] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open

Holland BR, Jarvis PD, Sumner JG. Low-Parameter Phylogenetic Inference Under the General Markov Model. Syst Biol 2012;62:78-92. [DOI: 10.1093/sysbio/sys072] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Sumner JG, Fernández-Sánchez J, Jarvis PD. Lie Markov models. J Theor Biol 2011;298:16-31. [PMID: 22212913 DOI: 10.1016/j.jtbi.2011.12.017] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2011] [Revised: 12/15/2011] [Accepted: 12/16/2011] [Indexed: 10/14/2022]

Jordan G, Goldman N. The Effects of Alignment Error and Alignment Filtering on the Sitewise Detection of Positive Selection. Mol Biol Evol 2011;29:1125-39. [DOI: 10.1093/molbev/msr272] [Citation(s) in RCA: 156] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open

Addressing inter-gene heterogeneity in maximum likelihood phylogenomic analysis: yeasts revisited. PLoS One 2011;6:e22783. [PMID: 21850235 PMCID: PMC3151265 DOI: 10.1371/journal.pone.0022783] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2011] [Accepted: 07/05/2011] [Indexed: 11/19/2022] Open

Regier JC, Zwick A. Sources of signal in 62 protein-coding nuclear genes for higher-level phylogenetics of arthropods. PLoS One 2011;6:e23408. [PMID: 21829732 PMCID: PMC3150433 DOI: 10.1371/journal.pone.0023408] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2011] [Accepted: 07/15/2011] [Indexed: 11/25/2022] Open

Abstract

BACKGROUND

This study aims to investigate the strength of various sources of phylogenetic information that led to recent seemingly robust conclusions about higher-level arthropod phylogeny and to assess the role of excluding or downweighting synonymous change for arriving at those conclusions.

METHODOLOGY/PRINCIPAL FINDINGS

The current study analyzes DNA sequences from 68 gene segments of 62 distinct protein-coding nuclear genes for 80 species. Gene segments analyzed individually support numerous nodes recovered in combined-gene analyses, but few of the higher-level nodes of greatest current interest. However, neither is there support for conflicting alternatives to these higher-level nodes. Gene segments with higher rates of nonsynonymous change tend to be more informative overall, but those with lower rates tend to provide stronger support for deeper nodes. Higher-level nodes with bootstrap values in the 80% - 99% range for the complete data matrix are markedly more sensitive to substantial drops in their bootstrap percentages after character subsampling than those with 100% bootstrap, suggesting that these nodes are likely not to have been strongly supported with many fewer data than in the full matrix. Data set partitioning of total data by (mostly) synonymous and (mostly) nonsynonymous change improves overall node support, but the result remains much inferior to analysis of (unpartitioned) nonsynonymous change alone. Clusters of genes with similar nonsynonymous rate properties (e.g., faster vs. slower) show some distinct patterns of node support but few conflicts. Synonymous change is shown to contribute little, if any, phylogenetic signal to the support of higher-level nodes, but it does contribute nonphylogenetic signal, probably through its underlying heterogeneous nucleotide composition. Analysis of seemingly conservative indels does not prove useful.

CONCLUSIONS

Generating a robust molecular higher-level phylogeny of Arthropoda is currently possible with large amounts of data and an exclusive reliance on nonsynonymous change.

Collapse

Ané C. Detecting phylogenetic breakpoints and discordance from genome-wide alignments for species tree reconstruction. Genome Biol Evol 2011;3:246-58. [PMID: 21362638 PMCID: PMC3070431 DOI: 10.1093/gbe/evr013] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Wang HC, Susko E, Roger AJ. Fast statistical tests for detecting heterotachy in protein evolution. Mol Biol Evol 2011;28:2305-15. [PMID: 21343603 DOI: 10.1093/molbev/msr050] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023] Open

Abstract

The w statistic introduced by Lockhart et al. (1998. A covariotide model explains apparent phylogenetic structure of oxygenic photosynthetic lineages. Mol Biol Evol. 15:1183-1188) is a simple and easily calculated statistic intended to detect heterotachy by comparing amino acid substitution patterns between two monophyletic groups of protein sequences. It is defined as the difference between the fraction of varied sites in both groups and the fraction of varied sites in each group. The w test has been used to distinguish a covarion process from equal rates and rates variation across sites processes. Using simulation we show that the w test is effective for small data sets and for data sets that have low substitution rates in the groups but can have difficulties when these conditions are not met. Using site entropy as a measure of variability of a sequence site, we modify the w statistic to a w' statistic by assigning as varied in one group those sites that are actually varied in both groups but have a large entropy difference. We show that the w' test has more power to detect two kinds of heterotachy processes (covarion and bivariate rate shifts) in large and variable data. We also show that a test of Pearson's correlation of the site entropies between two monophyletic groups can be used to detect heterotachy and has more power than the w' test. Furthermore, we demonstrate that there are settings where the correlation test as well as w and w' tests do not detect heterotachy signals in data simulated under a branch length mixture model. In such cases, it is sometimes possible to detect heterotachy through subselection of appropriate taxa. Finally, we discuss the abilities of the three statistical tests to detect a fourth mode of heterotachy: lineage-specific changes in proportion of variable sites.

Collapse

den Bakker HC, Bundrant BN, Fortes ED, Orsi RH, Wiedmann M. A population genetics-based and phylogenetic approach to understanding the evolution of virulence in the genus Listeria. Appl Environ Microbiol 2010;76:6085-100. [PMID: 20656873 PMCID: PMC2937515 DOI: 10.1128/aem.00447-10] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2010] [Accepted: 07/12/2010] [Indexed: 11/20/2022] Open

Whelan S, Blackburne BP, Spencer M. Phylogenetic substitution models for detecting heterotachy during plastid evolution. Mol Biol Evol 2010;28:449-58. [PMID: 20724379 DOI: 10.1093/molbev/msq215] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Stöver BC, Müller KF. TreeGraph 2: combining and visualizing evidence from different phylogenetic analyses. BMC Bioinformatics 2010;11:7. [PMID: 20051126 PMCID: PMC2806359 DOI: 10.1186/1471-2105-11-7] [Citation(s) in RCA: 895] [Impact Index Per Article: 63.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2009] [Accepted: 01/05/2010] [Indexed: 01/22/2023] Open

Abstract

Background

Today it is common to apply multiple potentially conflicting data sources to a given phylogenetic problem. At the same time, several different inference techniques are routinely employed instead of relying on just one. In view of both trends it is becoming increasingly important to be able to efficiently compare different sets of statistical values supporting (or conflicting with) the nodes of a given tree topology, and merging this into a meaningful representation. A tree editor supporting this should also allow for flexible editing operations and be able to produce ready-to-publish figures.

Results

We developed TreeGraph 2, a GUI-based graphical editor for phylogenetic trees (available from http://treegraph.bioinfweb.info). It allows automatically combining information from different phylogenetic analyses of a given dataset (or from different subsets of the dataset), and helps to identify and graphically present incongruences. The program features versatile editing and formatting options, such as automatically setting line widths or colors according to the value of any of the unlimited number of variables that can be assigned to each node or branch. These node/branch data can be imported from spread sheets or other trees, be calculated from each other by specified mathematical expressions, filtered, copied from and to other internal variables, be kept invisible or set visible and then be freely formatted (individually or across the whole tree). Beyond typical editing operations such as tree rerooting and ladderizing or moving and collapsing of nodes, whole clades can be copied from other files and be inserted (along with all node/branch data and legends), but can also be manually added and, thus, whole trees can quickly be manually constructed de novo. TreeGraph 2 outputs various graphic formats such as SVG, PDF, or PNG, useful for tree figures in both publications and presentations.

Conclusion

TreeGraph 2 is a user-friendly, fully documented application to produce ready-to-publish trees. It can display any number of annotations in several ways, and permits easily importing and combining them. Additionally, a great number of editing- and formatting-operations is available.

Collapse

Wang HC, Susko E, Roger AJ. PROCOV: maximum likelihood estimation of protein phylogeny under covarion models and site-specific covarion pattern analysis. BMC Evol Biol 2009;9:225. [PMID: 19737395 PMCID: PMC2758850 DOI: 10.1186/1471-2148-9-225] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2009] [Accepted: 09/08/2009] [Indexed: 11/12/2022] Open

Abstract

Background

The covarion hypothesis of molecular evolution holds that selective pressures on a given amino acid or nucleotide site are dependent on the identity of other sites in the molecule that change throughout time, resulting in changes of evolutionary rates of sites along the branches of a phylogenetic tree. At the sequence level, covarion-like evolution at a site manifests as conservation of nucleotide or amino acid states among some homologs where the states are not conserved in other homologs (or groups of homologs). Covarion-like evolution has been shown to relate to changes in functions at sites in different clades, and, if ignored, can adversely affect the accuracy of phylogenetic inference.

Results

PROCOV (protein covarion analysis) is a software tool that implements a number of previously proposed covarion models of protein evolution for phylogenetic inference in a maximum likelihood framework. Several algorithmic and implementation improvements in this tool over previous versions make computationally expensive tree searches with covarion models more efficient and analyses of large phylogenomic data sets tractable. PROCOV can be used to identify covarion sites by comparing the site likelihoods under the covarion process to the corresponding site likelihoods under a rates-across-sites (RAS) process. Those sites with the greatest log-likelihood difference between a 'covarion' and an RAS process were found to be of functional or structural significance in a dataset of bacterial and eukaryotic elongation factors.

Conclusion

Covarion models implemented in PROCOV may be especially useful for phylogenetic estimation when ancient divergences between sequences have occurred and rates of evolution at sites are likely to have changed over the tree. It can also be used to study lineage-specific functional shifts in protein families that result in changes in the patterns of site variability among subtrees.

Collapse

Wu J, Susko E. General heterotachy and distance method adjustments. Mol Biol Evol 2009;26:2689-97. [PMID: 19687305 DOI: 10.1093/molbev/msp184] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Whelan S. The genetic code can cause systematic bias in simple phylogenetic models. Philos Trans R Soc Lond B Biol Sci 2009;363:4003-11. [PMID: 18852102 DOI: 10.1098/rstb.2008.0171] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Abstract

Phylogenetic analysis depends on inferential methodology estimating accurately the degree of divergence between sequences. Inaccurate estimates can lead to misleading evolutionary inferences, including incorrect tree topology estimates and poor dating of historical species divergence. Protein coding sequences are ubiquitous in phylogenetic inference, but many of the standard methods commonly used to describe their evolution do not explicitly account for the dependencies between sites in a codon induced by the genetic code. This study evaluates the performance of several standard methods on datasets simulated under a simple substitution model, describing codon evolution under a range of different types of selective pressures. This approach also offers insights into the relative performance of different phylogenetic methods when there are dependencies acting between the sites in the data. Methods based on statistical models performed well when there was no or limited purifying selection in the simulated sequences (low degree of dependency between sites in a codon), although more biologically realistic models tended to outperform simpler models. Phylogenetic methods exhibited greater variability in performance for sequences simulated under strong purifying selection (high degree of the dependencies between sites in a codon). Simple models substantially underestimate the degree of divergence between sequences, and underestimation was more pronounced on the internal branches of the tree. This underestimation resulted in some statistical methods performing poorly and exhibiting evidence for systematic bias in tree inference. Amino acid-based and nucleotide models that contained generic descriptions of spatial and temporal heterogeneity, such as mixture and temporal hidden Markov models, coped notably better, producing more accurate estimates of evolutionary divergence and the tree topology.

Collapse

Spencer M, Sangaralingam A. A phylogenetic mixture model for gene family loss in parasitic bacteria. Mol Biol Evol 2009;26:1901-8. [PMID: 19435739 DOI: 10.1093/molbev/msp102] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open