51
Simmons MP, Sloan DB, Gatesy J. The effects of subsampling gene trees on coalescent methods applied to ancient divergences. Mol Phylogenet Evol 2016; 97:76-89. [PMID: 26768112 DOI: 10.1016/j.ympev.2015.12.013] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 10/09/2015] [Revised: 12/03/2015] [Accepted: 12/20/2015] [Indexed: 10/22/2022]
Abstract
Gene-tree-estimation error is a major concern for coalescent methods of phylogenetic inference. We sampled eight empirical studies of ancient lineages with diverse numbers of taxa and genes for which the original authors applied one or more coalescent methods. We found that the average pairwise congruence among gene trees varied greatly both between studies and also often within a study. We recommend that presenting plots of pairwise congruence among gene trees in a dataset be treated as a standard practice for empirical coalescent studies so that readers can readily assess the extent and distribution of incongruence among gene trees. ASTRAL-based coalescent analyses generally outperformed MP-EST and STAR with respect to both internal consistency (congruence between analyses of subsamples of genes with the complete dataset of all genes) and congruence with the concatenation-based topology. We evaluated the approach of subsampling gene trees that are, on average, more congruent with other gene trees as a method to reduce artifacts caused by gene-tree-estimation errors on coalescent analyses. We suggest that this method is well suited to testing whether gene-tree-estimation error is a primary cause of incongruence between concatenation- and coalescent-based results, to reconciling conflicting phylogenetic results based on different coalescent methods, and to identifying genes affected by artifacts that may then be targeted for reciprocal illumination. We provide scripts that automate the process of calculating pairwise gene-tree incongruence and subsampling trees while accounting for differential taxon sampling among genes. Finally, we assert that multiple tree-search replicates should be implemented as a standard practice for empirical coalescent studies that apply MP-EST.
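The subsampling idea in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' scripts: it assumes a precomputed symmetric matrix of pairwise gene-tree incongruence (the abstract does not name the metric), it ignores differential taxon sampling among genes, and the names `mean_incongruence`, `subsample_most_congruent`, and `keep_fraction` are hypothetical.

```python
from typing import List, Sequence


def mean_incongruence(dist: Sequence[Sequence[float]]) -> List[float]:
    """Average incongruence of each gene tree against all other gene trees."""
    n = len(dist)
    if n < 2:
        raise ValueError("need at least two gene trees")
    return [
        sum(dist[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def subsample_most_congruent(
    dist: Sequence[Sequence[float]], keep_fraction: float
) -> List[int]:
    """Return indices of the gene trees with the lowest mean incongruence.

    keep_fraction is an assumed tuning parameter (0 < keep_fraction <= 1).
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    means = mean_incongruence(dist)
    n_keep = max(1, round(keep_fraction * len(means)))
    # Rank by mean incongruence; break ties by index for reproducibility.
    ranked = sorted(range(len(means)), key=lambda i: (means[i], i))
    return sorted(ranked[:n_keep])


# Toy matrix (made-up values): gene tree 3 conflicts strongly with the rest.
toy = [
    [0, 2, 2, 8],
    [2, 0, 2, 8],
    [2, 2, 0, 6],
    [8, 8, 6, 0],
]
kept = subsample_most_congruent(toy, keep_fraction=0.75)
print(kept)
```

The retained subset could then be passed to a coalescent method and the result compared with the analysis of all genes, which is the kind of internal-consistency check the abstract describes.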
Affiliation(s)
- Mark P Simmons
  - Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
- Daniel B Sloan
  - Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
- John Gatesy
  - Department of Biology, University of California, Riverside, CA 92521, USA
52
Springer MS, Gatesy J. The gene tree delusion. Mol Phylogenet Evol 2016; 94:1-33. [DOI: 10.1016/j.ympev.2015.07.018] [Citation(s) in RCA: 145] [Impact Index Per Article: 18.1] [Received: 03/19/2015] [Revised: 06/04/2015] [Accepted: 07/22/2015] [Indexed: 10/23/2022]
53
Huang JP, Knowles LL. The Species versus Subspecies Conundrum: Quantitative Delimitation from Integrating Multiple Data Types within a Single Bayesian Approach in Hercules Beetles. Syst Biol 2015; 65:685-99. [PMID: 26681696 DOI: 10.1093/sysbio/syv119] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 06/30/2015] [Accepted: 12/10/2015] [Indexed: 11/13/2022] Open
Abstract
With the recent attention and focus on quantitative methods for species delimitation, an overlooked but equally important issue regards what has actually been delimited. This study investigates the apparent arbitrariness of some taxonomic distinctions, and in particular how species and subspecies are assigned. Specifically, we use a recently developed Bayesian model-based approach to show that in the Hercules beetles (genus Dynastes) there is no statistical difference in the probability that putative taxa represent different species, irrespective of whether they were given species or subspecies designations. By considering multiple data types, as opposed to relying exclusively on genetic data alone, we also show that both previously recognized species and subspecies represent a variety of points along the speciation spectrum (i.e., previously recognized species are not systematically further along the continuum than subspecies). For example, based on evolutionary models of divergence, some taxa are statistically distinguishable on more than one axis of differentiation (e.g., along both phenotypic and genetic dimensions), whereas other taxa can only be delimited statistically from a single data type. Because both phenotypic and genetic data are analyzed in a common Bayesian framework, our study provides a framework for investigating whether disagreements in species boundaries among data types reflect (i) actual discordance with the actual history of lineage splitting, or instead (ii) differences among data types in the amount of time required for differentiation to become apparent among the delimited taxa. We discuss what the answers to these questions imply about what characters are used to delimit species, as well as the diverse processes involved in the origin and maintenance of species boundaries. With this in mind, we then reflect more generally on how quantitative methods for species delimitation are used to assign taxonomic status.
Affiliation(s)
- Jen-Pan Huang
  - Department of Ecology and Evolutionary Biology, 1109 Geddes Avenue, Museum of Zoology, University of Michigan, Ann Arbor, MI 48109-1079, USA
- L Lacey Knowles
  - Department of Ecology and Evolutionary Biology, 1109 Geddes Avenue, Museum of Zoology, University of Michigan, Ann Arbor, MI 48109-1079, USA
54
Wade EJ, Hertach T, Gogala M, Trilar T, Simon C. Molecular species delimitation methods recover most song-delimited cicada species in the European Cicadetta montana complex. J Evol Biol 2015; 28:2318-36. [DOI: 10.1111/jeb.12756] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 05/01/2015] [Revised: 07/23/2015] [Accepted: 09/06/2015] [Indexed: 12/30/2022]
Affiliation(s)
- E. J. Wade
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- T. Hertach
  - Department of Environmental Sciences, Biogeography, University of Basel, Basel, Switzerland
- M. Gogala
  - Slovenian Academy of Sciences and Arts, Ljubljana, Slovenia
- T. Trilar
  - Slovenian Museum of Natural History, Ljubljana, Slovenia
- C. Simon
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
  - School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
55
De Maio N, Schrempf D, Kosiol C. PoMo: An Allele Frequency-Based Approach for Species Tree Estimation. Syst Biol 2015; 64:1018-31. [PMID: 26209413 PMCID: PMC4604832 DOI: 10.1093/sysbio/syv048] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 03/21/2014] [Accepted: 06/11/2015] [Indexed: 11/24/2022] Open
Abstract
Incomplete lineage sorting can cause incongruencies of the overall species-level phylogenetic tree with the phylogenetic trees for individual genes or genomic segments. If these incongruencies are not accounted for, it is possible to incur several biases in species tree estimation. Here, we present a simple maximum likelihood approach that accounts for ancestral variation and incomplete lineage sorting. We use a POlymorphisms-aware phylogenetic MOdel (PoMo) that we have recently shown to efficiently estimate mutation rates and fixation biases from within and between-species variation data. We extend this model to perform efficient estimation of species trees. We test the performance of PoMo in several different scenarios of incomplete lineage sorting using simulations and compare it with existing methods both in accuracy and computational speed. In contrast to other approaches, our model does not use coalescent theory but is allele frequency based. We show that PoMo is well suited for genome-wide species tree estimation and that on such data it is more accurate than previous approaches.
Affiliation(s)
- Nicola De Maio
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien 1210, Austria; Vienna Graduate School of Population Genetics, Wien, Austria; and Nuffield Department of Clinical Medicine, University of Oxford, Oxford OX3 7BN, UK
- Dominik Schrempf
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien 1210, Austria; Vienna Graduate School of Population Genetics, Wien, Austria
- Carolin Kosiol
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien 1210, Austria
56
Ruane S, Raxworthy CJ, Lemmon AR, Lemmon EM, Burbrink FT. Comparing species tree estimation with large anchored phylogenomic and small Sanger-sequenced molecular datasets: an empirical study on Malagasy pseudoxyrhophiine snakes. BMC Evol Biol 2015; 15:221. [PMID: 26459325 PMCID: PMC4603904 DOI: 10.1186/s12862-015-0503-1] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 07/27/2015] [Accepted: 10/01/2015] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND Using molecular data generated by high throughput next generation sequencing (NGS) platforms to infer phylogeny is becoming common as costs go down and the ability to capture loci from across the genome goes up. While there is a general consensus that greater numbers of independent loci should result in more robust phylogenetic estimates, few studies have compared phylogenies resulting from smaller datasets for commonly used genetic markers with the large datasets captured using NGS. Here, we determine how a 5-locus Sanger dataset compares with a 377-locus anchored genomics dataset for understanding the evolutionary history of the pseudoxyrhophiine snake radiation centered in Madagascar. The Pseudoxyrhophiinae comprise ~86% of Madagascar's serpent diversity, yet they are poorly known with respect to ecology, behavior, and systematics. Using the 377-locus NGS dataset and the summary statistics species-tree methods STAR and MP-EST, we estimated a well-supported species tree that provides new insights concerning intergeneric relationships for the pseudoxyrhophiines. We also compared how these and other methods performed with respect to estimating tree topology using datasets with varying numbers of loci.
METHODS Using Sanger sequencing and an anchored phylogenomics approach, we sequenced datasets comprised of 5 and 377 loci, respectively, for 23 pseudoxyrhophiine taxa. For each dataset, we estimated phylogenies using both gene-tree (concatenation) and species-tree (STAR, MP-EST) approaches. We determined the similarity of resulting tree topologies from the different datasets using Robinson-Foulds distances. In addition, we examined how subsets of these data performed compared to the complete Sanger and anchored datasets for phylogenetic accuracy using the same tree inference methodologies, as well as the program *BEAST to determine if a full coalescent model for species tree estimation could generate robust results with fewer loci compared to the summary statistics species tree approaches. We also examined the individual gene trees in comparison to the 377-locus species tree using the program MetaTree.
RESULTS Using the full anchored dataset under a variety of methods gave us the same, well-supported phylogeny for pseudoxyrhophiines. The African pseudoxyrhophiine Duberria is the sister taxon to the Malagasy pseudoxyrhophiine genera, providing evidence for a monophyletic radiation in Madagascar. In addition, within Madagascar, the two major clades inferred correspond largely to the aglyphous and opisthoglyphous genera, suggesting that feeding specializations associated with tooth venom delivery may have played a major role in the early diversification of this radiation. The comparison of tree topologies from the concatenated and species-tree methods using different datasets indicated the 5-locus dataset cannot be used to infer a correct phylogeny for the pseudoxyrhophiines under any method tested here and that summary statistics methods require 50 or more loci to consistently recover the species-tree inferred using the complete anchored dataset. However, as few as 15 loci may infer the correct topology when using the full coalescent species tree method *BEAST. MetaTree analyses of each gene tree from the Sanger and anchored datasets found that none of the individual gene trees matched the 377-locus species tree, and that no gene trees were identical with respect to topology.
CONCLUSIONS Our results suggest that ≥50 loci may be necessary to confidently infer phylogenies when using summary species-tree methods, but that the coalescent-based method *BEAST consistently recovers the same topology using only 15 loci. These results reinforce that datasets with small numbers of markers may result in misleading topologies, and further, that the method of inference used to generate a phylogeny also has a major influence on the number of loci necessary to infer robust species trees.
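The Robinson-Foulds comparison mentioned in the methods of this abstract can be illustrated with a short sketch. It is a simplified illustration and not the study's pipeline: it assumes trees are given as nested tuples of taxon names over the same taxon set, it treats them as unrooted, and it reports the unnormalized distance (the abstract does not say whether distances were normalized). The names `leaf_set`, `bipartitions`, and `rf_distance` are hypothetical.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumed representation: a leaf is a taxon name, an internal node is a tuple.
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """All taxon names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial splits of the unrooted tree.

    Each split is stored as the side that lacks a fixed reference taxon,
    so the result does not depend on where the tree is rooted.
    """
    taxa = leaf_set(tree)
    ref = min(taxa)
    splits: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = taxa - below if ref in below else below
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(taxa) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(a: Tree, b: Tree) -> int:
    """Number of splits found in exactly one of the two trees."""
    if leaf_set(a) != leaf_set(b):
        raise ValueError("trees must share the same taxon set")
    return len(bipartitions(a) ^ bipartitions(b))


# Made-up five-taxon example; t3 is t1 with a different root position.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = (("A", "B"), ("C", ("D", "E")))
print(rf_distance(t1, t2), rf_distance(t1, t3))
```

A distance of zero means the two topologies are identical as unrooted trees, which is the sense in which analyses of locus subsets can be said to recover the same topology as the full dataset.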
Affiliation(s)
- Sara Ruane
  - Department of Herpetology, American Museum of Natural History, Central Park West at 79th Street, New York, NY, 10024, USA
- Christopher J Raxworthy
  - Department of Herpetology, American Museum of Natural History, Central Park West at 79th Street, New York, NY, 10024, USA
- Alan R Lemmon
  - Department of Biology, Florida State University, 319 Stadium Drive, P.O. Box 3064295, Tallahassee, FL, 32306-4295, USA
- Emily Moriarty Lemmon
  - Department of Biology, Florida State University, 319 Stadium Drive, P.O. Box 3064295, Tallahassee, FL, 32306-4295, USA
- Frank T Burbrink
  - Department of Herpetology, American Museum of Natural History, Central Park West at 79th Street, New York, NY, 10024, USA
  - Biology Department, College of Staten Island/CUNY, 2800 Victory Boulevard, Staten Island, NY, 10314, USA
57
Gruenstaeudl M, Reid NM, Wheeler GL, Carstens BC. Posterior predictive checks of coalescent models: P2C2M, an R package. Mol Ecol Resour 2015; 16:193-205. [DOI: 10.1111/1755-0998.12435] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 11/24/2014] [Revised: 05/22/2015] [Accepted: 05/26/2015] [Indexed: 02/04/2023]
Affiliation(s)
- Michael Gruenstaeudl
  - Department of Evolution, Ecology & Organismal Biology, Ohio State University, Columbus, OH 43210, USA
- Noah M. Reid
  - Department of Environmental Toxicology, University of California, Davis, CA 95616, USA
- Gregory L. Wheeler
  - Department of Evolution, Ecology & Organismal Biology, Ohio State University, Columbus, OH 43210, USA
- Bryan C. Carstens
  - Department of Evolution, Ecology & Organismal Biology, Ohio State University, Columbus, OH 43210, USA
58
Monte Carlo Strategies for Selecting Parameter Values in Simulation Experiments. Syst Biol 2015; 64:741-51. [DOI: 10.1093/sysbio/syv030] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/27/2014] [Accepted: 05/12/2015] [Indexed: 11/14/2022] Open
59
Giarla TC, Esselstyn JA. The Challenges of Resolving a Rapid, Recent Radiation: Empirical and Simulated Phylogenomics of Philippine Shrews. Syst Biol 2015; 64:727-40. [DOI: 10.1093/sysbio/syv029] [Citation(s) in RCA: 113] [Impact Index Per Article: 12.6] [Received: 02/24/2015] [Accepted: 05/07/2015] [Indexed: 01/30/2023] Open
60
Liu L, Xi Z, Wu S, Davis CC, Edwards SV. Estimating phylogenetic trees from genome-scale data. Ann N Y Acad Sci 2015; 1360:36-53. [DOI: 10.1111/nyas.12747] [Citation(s) in RCA: 129] [Impact Index Per Article: 14.3] [Indexed: 11/30/2022]
Affiliation(s)
- Liang Liu
  - Department of Statistics, University of Georgia, Athens, Georgia
  - Institute of Bioinformatics, University of Georgia, Athens, Georgia
- Zhenxiang Xi
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
- Shaoyuan Wu
  - Department of Biochemistry and Molecular Biology & Tianjin Key Laboratory of Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Charles C. Davis
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
- Scott V. Edwards
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
61
Multi-locus fossil-calibrated phylogeny of Atheriniformes (Teleostei, Ovalentaria). Mol Phylogenet Evol 2015; 86:8-23. [PMID: 25769409 DOI: 10.1016/j.ympev.2015.03.001] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 12/29/2014] [Revised: 02/21/2015] [Accepted: 03/02/2015] [Indexed: 11/21/2022]
Abstract
Phylogenetic relationships among families within the order Atheriniformes have been difficult to resolve on the basis of morphological evidence. Molecular studies so far have been fragmentary and based on a small number of taxa and loci. In this study, we provide a new phylogenetic hypothesis based on sequence data collected for eight molecular markers for a representative sample of 103 atheriniform species, covering 2/3 of the genera in this order. The phylogeny is calibrated with six carefully chosen fossil taxa to provide an explicit timeframe for the diversification of this group. Our results support the subdivision of Atheriniformes into two suborders (Atherinopsoidei and Atherinoidei), the nesting of Notocheirinae within Atherinopsidae, and the monophyly of tribe Menidiini, among others. We propose taxonomic changes for Atherinopsoidei, but a few weakly supported nodes in our phylogeny suggest that further study is necessary to support a revised taxonomy of Atherinoidei. The time-calibrated phylogeny was used to infer ancestral habitat reconstructions to explain the current distribution of marine and freshwater taxa. Based on these results, the current distribution of Atheriniformes is likely due to widespread marine dispersal along the margins of continents, infrequent trans-oceanic dispersal, and repeated invasion of freshwater habitats. This conclusion is supported by post-Gondwanan divergence times among families within the order, and a high probability of a marine ancestral habitat.
62
Tonini J, Moore A, Stern D, Shcheglovitova M, Ortí G. Concatenation and Species Tree Methods Exhibit Statistically Indistinguishable Accuracy under a Range of Simulated Conditions. PLOS CURRENTS 2015; 7. [PMID: 25901289 PMCID: PMC4391732 DOI: 10.1371/currents.tol.34260cc27551a527b124ec5f6334b6be] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Indexed: 11/18/2022]
Abstract
Phylogeneticists have long understood that several biological processes can cause a gene tree to disagree with its species tree. In recent years, molecular phylogeneticists have increasingly foregone traditional supermatrix approaches in favor of species tree methods that account for one such source of error, incomplete lineage sorting (ILS). While gene tree-species tree discordance no doubt poses a significant challenge to phylogenetic inference with molecular data, researchers have only recently begun to systematically evaluate the relative accuracy of traditional and ILS-sensitive methods. Here, we report on simulations demonstrating that concatenation can perform as well as or better than methods that attempt to account for sources of error introduced by ILS. Based on these and similar results from other researchers, we argue that concatenation remains a useful component of the phylogeneticist's toolbox and highlight that phylogeneticists should continue to make explicit comparisons of results produced by contemporaneous and classical methods.
Affiliation(s)
- João Tonini
  - Department of Biological Sciences, The George Washington University, Washington, District of Columbia, USA
- Andrew Moore
  - Department of Biological Sciences, The George Washington University, Washington, District of Columbia, USA
- David Stern
  - Computational Biology Institute, Department of Biological Sciences, The George Washington University, Washington, District of Columbia, USA
- Maryia Shcheglovitova
  - Department of Geography & Environmental Systems, University of Maryland Baltimore County, Baltimore, MD, USA
- Guillermo Ortí
  - Department of Biological Sciences, The George Washington University, Washington, District of Columbia, USA
63
Likelihood-based tree reconstruction on a concatenation of aligned sequence data sets can be statistically inconsistent. Theor Popul Biol 2015; 100C:56-62. [DOI: 10.1016/j.tpb.2014.12.005] [Citation(s) in RCA: 174] [Impact Index Per Article: 19.3] [Received: 09/06/2014] [Revised: 11/10/2014] [Accepted: 12/18/2014] [Indexed: 01/14/2023]
64
Lanier HC, Knowles LL. Applying species-tree analyses to deep phylogenetic histories: Challenges and potential suggested from a survey of empirical phylogenetic studies. Mol Phylogenet Evol 2015; 83:191-9. [DOI: 10.1016/j.ympev.2014.10.022] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 03/12/2014] [Revised: 08/30/2014] [Accepted: 10/29/2014] [Indexed: 10/24/2022]
65
Chifman J, Kubatko L. Quartet inference from SNP data under the coalescent model. Bioinformatics 2014; 30:3317-24. [PMID: 25104814 PMCID: PMC4296144 DOI: 10.1093/bioinformatics/btu530] [Citation(s) in RCA: 637] [Impact Index Per Article: 63.7] [Received: 03/18/2014] [Revised: 07/30/2014] [Accepted: 08/01/2014] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION Increasing attention has been devoted to estimation of species-level phylogenetic relationships under the coalescent model. However, existing methods either use summary statistics (gene trees) to carry out estimation, ignoring an important source of variability in the estimates, or involve computationally intensive Bayesian Markov chain Monte Carlo algorithms that do not scale well to whole-genome datasets. RESULTS We develop a method to infer relationships among quartets of taxa under the coalescent model using techniques from algebraic statistics. Uncertainty in the estimated relationships is quantified using the nonparametric bootstrap. The performance of our method is assessed with simulated data. We then describe how our method could be used for species tree inference in larger taxon samples, and demonstrate its utility using datasets for Sistrurus rattlesnakes and for soybeans. AVAILABILITY AND IMPLEMENTATION The method to infer the phylogenetic relationship among quartets is implemented in the software SVDquartets, available at www.stat.osu.edu/~lkubatko/software/SVDquartets.
Affiliation(s)
- Julia Chifman
  - Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, NC 27157, Department of Statistics, The Ohio State University, Columbus, OH 43210 and Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, USA
- Laura Kubatko
  - Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, NC 27157, Department of Statistics, The Ohio State University, Columbus, OH 43210 and Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, USA
66
Gatesy J, Springer MS. Phylogenetic analysis at deep timescales: Unreliable gene trees, bypassed hidden support, and the coalescence/concatalescence conundrum. Mol Phylogenet Evol 2014; 80:231-66. [DOI: 10.1016/j.ympev.2014.08.013] [Citation(s) in RCA: 239] [Impact Index Per Article: 23.9] [Received: 05/10/2014] [Revised: 07/26/2014] [Accepted: 08/10/2014] [Indexed: 11/16/2022]
67
Wielstra B, Arntzen JW, van der Gaag KJ, Pabijan M, Babik W. Data concatenation, Bayesian concordance and coalescent-based analyses of the species tree for the rapid radiation of Triturus newts. PLoS One 2014; 9:e111011. [PMID: 25337997 PMCID: PMC4206468 DOI: 10.1371/journal.pone.0111011] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 06/19/2014] [Accepted: 09/22/2014] [Indexed: 11/18/2022] Open
Abstract
The phylogenetic relationships for rapid species radiations are difficult to disentangle. Here we study one such case, namely the genus Triturus, which is composed of the marbled and crested newts. We analyze data for 38 genetic markers, positioned in 3-prime untranslated regions of protein-coding genes, obtained with 454 sequencing. Our dataset includes twenty Triturus newts and represents all nine species. Bayesian analysis of population structure allocates all individuals to their respective species. The branching patterns obtained by data concatenation, Bayesian concordance analysis and coalescent-based estimations of the species tree differ from one another. The data concatenation based species tree shows high branch support but branching order is considerably affected by allele choice in the case of heterozygotes in the concatenation process. Bayesian concordance analysis expresses the conflict between individual gene trees for part of the Triturus species tree as low concordance factors. The coalescent-based species tree is relatively similar to a previously published species tree based upon morphology and full mtDNA and any conflicting internal branches are not highly supported. Our findings reflect high gene tree discordance due to incomplete lineage sorting (possibly aggravated by hybridization) in combination with low information content of the markers employed (as can be expected for relatively recent species radiations). This case study highlights the complexity of resolving rapid radiations and we acknowledge that to convincingly resolve the Triturus species tree even more genes will have to be consulted.
Affiliation(s)
- Ben Wielstra
  - Naturalis Biodiversity Center, Leiden, The Netherlands
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
- Maciej Pabijan
  - Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland
  - Institute of Zoology, Jagiellonian University, Kraków, Poland
- Wieslaw Babik
  - Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland
68
A time-calibrated, multi-locus phylogeny of piranhas and pacus (Characiformes: Serrasalmidae) and a comparison of species tree methods. Mol Phylogenet Evol 2014; 81:242-57. [PMID: 25261120 DOI: 10.1016/j.ympev.2014.06.018] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 09/12/2013] [Revised: 06/17/2014] [Accepted: 06/18/2014] [Indexed: 12/13/2022]
Abstract
The phylogeny of piranhas, pacus, and relatives (family Serrasalmidae) was inferred on the basis of DNA sequences from eleven gene fragments that include the mitochondrial control region plus 10 nuclear genes (two exons and eight introns). The new data were obtained for a representative sampling of 53 specimens, collected from all major South American rivers, accounting for over 40% of the valid species and all genera excluding Utiaritichthys. Two fossil calibration points and relaxed-clock Bayesian analyses were used to estimate the timing of diversification. The new multilocus dataset also is used to compare several species-tree approaches against the results obtained using the concatenated alignment analyzed under maximum likelihood and Bayesian inference. Individual gene trees showed substantial topological discordance, but analyses based on concatenation and Bayesian and maximum likelihood-based species-tree approaches converged onto a single phylogeny. The resulting phylogenetic hypothesis is robust and supports a division of the family into three major clades, consistent with previous results based on mitochondrial DNA alone. The earliest branching event separated a "pacu" clade (Colossoma, Mylossoma and Piaractus) from the rest of the family in the Late Cretaceous (over 68 Ma). The other two clades, that contain most of the diversity, are formed by the "true piranhas" (Metynnis, Pygopristis, Pygocentrus, Pristobrycon, Catoprion, and Serrasalmus) and the Myleus-like pacus (the Myleus clade). The "true" piranha clade originated during the Eocene (~53 Ma) but the most recent diversification of flesh-eating piranhas within the genera Serrasalmus and Pygocentrus did not start until the Miocene (~17 Ma). A comparison of species tree approaches indicates that most methods tested are consistent with results obtained by concatenation, suggesting that the gene-tree incongruence observed is mild and will not produce misleading results under simple concatenation analysis. Non-monophyly of several genera (Pristobrycon, Tometes, Myloplus, Mylesinus) and putative species (Serrasalmus rhombeus) was obtained, suggesting that further study of this family is necessary.
69
Jockusch EL, Martínez-Solano I, Timpe EK. The Effects of Inference Method, Population Sampling, and Gene Sampling on Species Tree Inferences: An Empirical Study in Slender Salamanders (Plethodontidae: Batrachoseps). Syst Biol 2014; 64:66-83. [DOI: 10.1093/sysbio/syu078] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022] Open
Affiliation(s)
- Elizabeth L. Jockusch
  - Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, U-3043, Storrs, CT 06269-3043, USA; and CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal
- Iñigo Martínez-Solano
  - Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, U-3043, Storrs, CT 06269-3043, USA; and CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal
- Elizabeth K. Timpe
  - Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, U-3043, Storrs, CT 06269-3043, USA; and CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal
70
|
Huang H, Knowles LL. Unforeseen Consequences of Excluding Missing Data from Next-Generation Sequences: Simulation Study of RAD Sequences. Syst Biol 2014; 65:357-65. [DOI: 10.1093/sysbio/syu046] [Citation(s) in RCA: 202] [Impact Index Per Article: 20.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2013] [Accepted: 06/22/2014] [Indexed: 11/13/2022] Open
|
71
|
Sistrom M, Hutchinson M, Bertozzi T, Donnellan S. Evaluating evolutionary history in the face of high gene tree discordance in Australian Gehyra (Reptilia: Gekkonidae). Heredity (Edinb) 2014; 113:52-63. [PMID: 24642886 PMCID: PMC4815653 DOI: 10.1038/hdy.2014.6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2013] [Revised: 10/24/2013] [Accepted: 11/12/2013] [Indexed: 11/09/2022] Open
Abstract
Species tree methods have provided improvements for estimating species relationships and the timing of diversification in recent radiations by allowing for gene tree discordance. Although gene tree discordance is often observed, most discordance is attributed to incomplete lineage sorting rather than other biological phenomena, and the causes of discordance are rarely investigated. We use species trees from multi-locus data to estimate the species relationships, evolutionary history and timing of diversification among Australian Gehyra, a group renowned for taxonomic uncertainty and showing a large degree of gene tree discordance. We find support for a recent Asian origin and two major clades: a tropically adapted clade and an arid adapted clade, with some exceptions, but no support for allopatric speciation driven by chromosomal rearrangement in the group. Bayesian concordance analysis revealed high gene tree discordance and comparisons of Robinson-Foulds distances showed that discordance between gene trees was significantly higher than that generated by topological uncertainty within each gene. Analysis of gene tree discordance and incomplete taxon sampling revealed that gene tree discordance was high whether terminal taxon or gene sampling was maximized, indicating discordance is due to biological processes, which may be important in contributing to gene tree discordance in many recently diversified organisms.
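The Robinson-Foulds comparison mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the nested-tuple tree encoding, the function names, and the five-taxon example trees are all assumptions made for illustration. The distance is taken here as the number of non-trivial bipartitions found in one unrooted tree but not the other.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxon names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the non-trivial splits of the tree, treating it as unrooted."""
    all_taxa = leaves(tree)
    anchor = min(all_taxa)  # each split is stored as the side lacking this taxon
    splits: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> None:
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = all_taxa - clade if anchor in clade else clade
        # Skip trivial splits (empty side, single leaf, or all but one leaf).
        if 1 < len(side) < len(all_taxa) - 1:
            splits.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return splits


def rf_distance(tree_a: Tree, tree_b: Tree) -> int:
    """Robinson-Foulds distance: splits present in exactly one of the two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must share the same taxon set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Two made-up gene trees that disagree on the placement of B and C.
gene_tree_1 = ((("A", "B"), "C"), ("D", "E"))
gene_tree_2 = ((("A", "C"), "B"), ("D", "E"))
distance = rf_distance(gene_tree_1, gene_tree_2)
```

In a study like the one summarised above, distances between trees from different loci would be compared with distances among replicate trees (for example bootstrap or posterior samples) of the same locus; that second step is not shown here.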
Affiliation(s)
- M Sistrom
- South Australian Museum, North Terrace, Adelaide, South Australia, Australia
- Australian Centre for Evolutionary Biology and Biodiversity, The University of Adelaide, Adelaide, South Australia, Australia
- M Hutchinson
- South Australian Museum, North Terrace, Adelaide, South Australia, Australia
- Australian Centre for Evolutionary Biology and Biodiversity, The University of Adelaide, Adelaide, South Australia, Australia
- T Bertozzi
- South Australian Museum, North Terrace, Adelaide, South Australia, Australia
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, South Australia, Australia
- S Donnellan
- South Australian Museum, North Terrace, Adelaide, South Australia, Australia
- Australian Centre for Evolutionary Biology and Biodiversity, The University of Adelaide, Adelaide, South Australia, Australia
|
72
|
Huang H, Tran LAP, Knowles LL. Do estimated and actual species phylogenies match? Evaluation of East African cichlid radiations. Mol Phylogenet Evol 2014; 78:56-65. [PMID: 24837624 DOI: 10.1016/j.ympev.2014.05.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2014] [Revised: 05/02/2014] [Accepted: 05/06/2014] [Indexed: 10/25/2022]
Abstract
A large number of published phylogenetic estimates are based on a single locus or the concatenation of multiple loci, even though genealogies of single or concatenated loci may not accurately reflect the true history of species diversification (i.e., the species tree). The increased availability of genomic data, coupled with new computational methods, improves resolution of species relationships beyond what was possible in the past. Such developments will no doubt benefit future phylogenetic studies. It remains unclear how robust phylogenies that predate these developments (i.e., the bulk of phylogenetic studies) are to departures from the assumption of strict gene tree-species tree concordance. Here, we present a parametric bootstrap (PBST) approach that assesses the reliability of past phylogenetic estimates in which gene tree-species tree discord was ignored. We focus on a universal cause of discord (the random loss of gene lineages from genetic drift) and apply the method in a meta-analysis of East African cichlids, a group encompassing historical scenarios that are particularly challenging for phylogenetic estimation. Although we identify some evolutionary relationships that are robust to gene tree discord, many past phylogenetic estimates of cichlids are not. We discuss the utility of the PBST method for evaluating the robustness of gene tree-based phylogenetic estimations in general as well as for testing the clade-specific performance of species tree estimation methods and designing sampling strategies that increase the accuracy of estimated species relationships.
Affiliation(s)
- Huateng Huang
- Department of Ecology and Evolutionary Biology, Museum of Zoology, University of Michigan, Ann Arbor, MI 48109-1079, USA.
- Lucy A P Tran
- Department of Ecology and Evolutionary Biology, Museum of Zoology, University of Michigan, Ann Arbor, MI 48109-1079, USA.
- L Lacey Knowles
- Department of Ecology and Evolutionary Biology, Museum of Zoology, University of Michigan, Ann Arbor, MI 48109-1079, USA.
|
73
|
Abeysundera M, Kenney T, Field C, Gu H. Combining distance matrices on identical taxon sets for multi-gene analysis with singular value decomposition. PLoS One 2014; 9:e94279. [PMID: 24732341 PMCID: PMC3986248 DOI: 10.1371/journal.pone.0094279] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2013] [Accepted: 03/14/2014] [Indexed: 11/26/2022] Open
Abstract
We present a simple and effective method for combining distance matrices from multiple genes on identical taxon sets to obtain a single representative distance matrix from which to derive a combined-gene phylogenetic tree. The method applies singular value decomposition (SVD) to extract the greatest common signal present in the distances obtained from each gene. The first right eigenvector of the SVD, which corresponds to a weighted average of the distance matrices of all genes, can thus be used to derive a representative tree from multiple genes. We apply our method to three well known data sets and estimate the uncertainty using bootstrap methods. Our results show that this method works well for these three data sets and that the uncertainty in these estimates is small. A simulation study is conducted to compare the performance of our method with several other distance based approaches (namely SDM, SDM* and ACS97), and we find the performances of all these approaches are comparable in the consensus setting. The computational complexity of our method is similar to that of SDM. Besides constructing a representative tree from multiple genes, we also demonstrate how the subsequent eigenvalues and eigenvectors may be used to identify if there are conflicting signals in the data and which genes might be influential or outliers for the estimated combined-gene tree.
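The core idea of this abstract, taking the first right singular vector of the stacked per-gene distances as a combined distance, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, power iteration stands in for a library SVD routine, and the final rescaling to the overall mean distance is an arbitrary choice made so that the output is on a familiar scale.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def upper_triangle(matrix: Matrix) -> List[float]:
    """Flatten the above-diagonal entries of a symmetric distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def first_right_singular_vector(rows: List[List[float]], iterations: int = 200) -> List[float]:
    """Power iteration on A^T A; stands in for a library SVD call (assumption)."""
    n_cols = len(rows[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        u = [sum(row[k] * v[k] for k in range(n_cols)) for row in rows]  # u = A v
        w = [sum(rows[g][k] * u[g] for g in range(len(rows))) for k in range(n_cols)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("all distances are zero")
        v = [x / norm for x in w]
    return v


def combine_distance_matrices(matrices: Sequence[Matrix]) -> List[List[float]]:
    """Combine per-gene distance matrices on identical taxon sets into one matrix."""
    n = len(matrices[0])
    rows = [upper_triangle(m) for m in matrices]  # one row per gene, one column per taxon pair
    v = first_right_singular_vector(rows)
    # Rescale so the combined distances have the same mean as the inputs (illustrative choice).
    overall_mean = sum(sum(r) for r in rows) / (len(rows) * len(rows[0]))
    scale = overall_mean / (sum(v) / len(v))
    combined = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            combined[i][j] = combined[j][i] = v[k] * scale
            k += 1
    return combined


# Made-up example: the second gene evolves twice as fast as the first.
gene_1 = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
gene_2 = [[2.0 * d for d in row] for row in gene_1]
combined = combine_distance_matrices([gene_1, gene_2])
```

Because the starting vector and all distances are non-negative, the iterate stays non-negative, so no sign correction is needed. A tree would then be built from `combined` with any distance method such as neighbor joining, which is outside this sketch.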
Affiliation(s)
- Melanie Abeysundera
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
- Toby Kenney
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
- Chris Field
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
- Hong Gu
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
|
74
|
Hipp AL, Eaton DAR, Cavender-Bares J, Fitzek E, Nipper R, Manos PS. A framework phylogeny of the American oak clade based on sequenced RAD data. PLoS One 2014; 9:e93975. [PMID: 24705617 PMCID: PMC3976371 DOI: 10.1371/journal.pone.0093975] [Citation(s) in RCA: 142] [Impact Index Per Article: 14.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2013] [Accepted: 03/05/2014] [Indexed: 11/19/2022] Open
Abstract
Previous phylogenetic studies in oaks (Quercus, Fagaceae) have failed to resolve the backbone topology of the genus with strong support. Here, we utilize next-generation sequencing of restriction-site associated DNA (RAD-Seq) to resolve a framework phylogeny of a predominantly American clade of oaks whose crown age is estimated at 23-33 million years old. Using a recently developed analytical pipeline for RAD-Seq phylogenetics, we created a concatenated matrix of 1.40 E06 aligned nucleotides, constituting 27,727 sequence clusters. RAD-Seq data were readily combined across runs, with no difference in phylogenetic placement between technical replicates, which overlapped by only 43-64% in locus coverage. 17% (4,715) of the loci we analyzed could be mapped with high confidence to one or more expressed sequence tags in NCBI Genbank. A concatenated matrix of the loci that BLAST to at least one EST sequence provides approximately half as many variable or parsimony-informative characters as equal-sized datasets from the non-EST loci. The EST-associated matrix is more complete (fewer missing loci) and has slightly lower homoplasy than non-EST subsampled matrices of the same size, but there is no difference in phylogenetic support or relative attribution of base substitutions to internal versus terminal branches of the phylogeny. We introduce a partitioned RAD visualization method (implemented in the R package RADami; http://cran.r-project.org/web/packages/RADami) to investigate the possibility that suboptimal topologies supported by large numbers of loci--due, for example, to reticulate evolution or lineage sorting--are masked by the globally optimal tree. We find no evidence for strongly-supported alternative topologies in our study, suggesting that the phylogeny we recover is a robust estimate of large-scale phylogenetic patterns in the American oak clade. 
Our study is one of the first to demonstrate the utility of RAD-Seq data for inferring phylogeny in a 23-33 million year-old clade.
Affiliation(s)
- Andrew L. Hipp
- The Morton Arboretum, Lisle, Illinois, United States of America
- The Field Museum, Department of Botany, Chicago, Illinois, United States of America
- Deren A. R. Eaton
- The Field Museum, Department of Botany, Chicago, Illinois, United States of America
- University of Chicago, Committee on Evolutionary Biology, Chicago, Illinois, United States of America
- Jeannine Cavender-Bares
- University of Minnesota, College of Biological Sciences, Saint Paul, Minnesota, United States of America
- Rick Nipper
- Floragenex, Inc., Eugene, Oregon, United States of America
- Paul S. Manos
- Duke University, Department of Biology, Durham, North Carolina, United States of America
|
75
|
Vences M, Sanchez E, Hauswaldt JS, Eikelmann D, Rodríguez A, Carranza S, Donaire D, Gehara M, Helfer V, Lötters S, Werner P, Schulz S, Steinfartz S. Nuclear and mitochondrial multilocus phylogeny and survey of alkaloid content in true salamanders of the genus Salamandra (Salamandridae). Mol Phylogenet Evol 2014; 73:208-16. [DOI: 10.1016/j.ympev.2013.12.009] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2013] [Revised: 12/27/2013] [Accepted: 12/29/2013] [Indexed: 11/28/2022]
|
76
|
DeGiorgio M, Syring J, Eckert AJ, Liston A, Cronn R, Neale DB, Rosenberg NA. An empirical evaluation of two-stage species tree inference strategies using a multilocus dataset from North American pines. BMC Evol Biol 2014; 14:67. [PMID: 24678701 PMCID: PMC4021425 DOI: 10.1186/1471-2148-14-67] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2013] [Accepted: 02/10/2014] [Indexed: 12/26/2022] Open
Abstract
Background: As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models, whereas others rely on criteria that, although appropriate for many parameter values, have peculiar zones of the parameter space in which they fail to converge on the correct estimate as data sets increase in size.
Results: Here, using North American pines, we empirically evaluate the behavior of 24 strategies for species tree inference using three alternative outgroups (72 strategies total). The data consist of 120 individuals sampled in eight ingroup species from subsection Strobus and three outgroup species from subsection Gerardianae, spanning ∼47 kilobases of sequence at 121 loci. Each “strategy” for inferring species trees consists of three features: a species tree construction method, a gene tree inference method, and a choice of outgroup. We use multivariate analysis techniques such as principal components analysis and hierarchical clustering to identify tree characteristics that are robustly observed across strategies, as well as to identify groups of strategies that produce trees with similar features. We find that strategies that construct species trees using only topological information cluster together and that strategies that use additional non-topological information (e.g., branch lengths) also cluster together. Strategies that utilize more than one individual within a species to infer gene trees tend to produce estimates of species trees that contain clades present in trees estimated by other strategies. Strategies that use the minimize-deep-coalescences criterion to construct species trees tend to produce species tree estimates that contain clades that are not present in trees estimated by the Concatenation, RTC, SMRT, STAR, and STEAC methods, and that in general are more balanced than those inferred by these other strategies.
Conclusions: When constructing a species tree from a multilocus set of sequences, our observations provide a basis for interpreting differences in species tree estimates obtained via different approaches that have a two-stage structure in common, one step for gene tree estimation and a second step for species tree estimation. The methods explored here employ a number of distinct features of the data, and our analysis suggests that recovery of the same results from multiple methods that tend to differ in their patterns of inference can be a valuable tool for obtaining reliable estimates.
Affiliation(s)
- Michael DeGiorgio
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA.
|
77
|
García-Pereira MJ, Carvajal-Rodríguez A, Whelan S, Caballero A, Quesada H. Impact of deep coalescence and recombination on the estimation of phylogenetic relationships among species using AFLP markers. Mol Phylogenet Evol 2014; 76:102-9. [PMID: 24631855 DOI: 10.1016/j.ympev.2014.03.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2013] [Revised: 02/26/2014] [Accepted: 03/04/2014] [Indexed: 10/25/2022]
Abstract
Deep coalescence and the nongenealogical pattern of descent caused by recombination have emerged as a common problem for phylogenetic inference at the species level. Here we use computer simulations to assess whether AFLP-based phylogenies are robust to the uncertainties introduced by these factors. Our results indicate that phylogenetic signal can prevail even in the face of extensive deep coalescence allowing recovering the correct species tree topology. The impact of recombination on tree accuracy was related to total tree depth and species effective population size. The correct tree topology could be recovered upon many simulation settings due to a trade-off between the conflicting signals resulting from intra-locus recombination and the benefits of the joint consideration of unlinked loci that better matched overall the true species tree. Errors in tree topology were not only determined by deep coalescence, but also by the timing of divergence and the tree-building errors arising from an insufficient number of characters. DNA sequences generally outperformed AFLPs upon any simulated scenario, but this difference in performance was nearly negligible when a sufficient number of AFLP characters were sampled. Our simulations suggest that the impact of deep coalescence and intra-locus recombination on the reliability of AFLP trees could be minimal for effective population sizes equal to or lower than 10,000 (typical of many vertebrates and tree plants) given tree depths above 0.02 substitutions per site.
Affiliation(s)
- María Jesús García-Pereira
- Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidad de Vigo, 36310 Vigo, Spain.
- Antonio Carvajal-Rodríguez
- Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidad de Vigo, 36310 Vigo, Spain.
- Simon Whelan
- Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala 75236-SE, Sweden.
- Armando Caballero
- Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidad de Vigo, 36310 Vigo, Spain.
- Humberto Quesada
- Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidad de Vigo, 36310 Vigo, Spain.
|
78
|
Edwards DL, Knowles LL. Species detection and individual assignment in species delimitation: can integrative data increase efficacy? Proc Biol Sci 2014; 281:20132765. [PMID: 24403337 PMCID: PMC3896021 DOI: 10.1098/rspb.2013.2765] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2013] [Accepted: 12/02/2013] [Indexed: 11/12/2022] Open
Abstract
Statistical species delimitation usually relies on singular data, primarily genetic, for detecting putative species and individual assignment to putative species. Given the variety of speciation mechanisms, singular data may not adequately represent the genetic, morphological and ecological diversity relevant to species delimitation. We describe a methodological framework combining multivariate and clustering techniques that uses genetic, morphological and ecological data to detect and assign individuals to putative species. Our approach recovers a similar number of species recognized using traditional, qualitative taxonomic approaches that are not detected when using purely genetic methods. Furthermore, our approach detects groupings that traditional, qualitative taxonomic approaches do not. This empirical test suggests that our approach to detecting and assigning individuals to putative species could be useful in species delimitation despite varying levels of differentiation across genetic, phenotypic and ecological axes. This work highlights a critical, and often overlooked, aspect of the process of statistical species delimitation: species detection and individual assignment. Irrespective of the species delimitation approach used, all downstream processing relies on how individuals are initially assigned, and the practices and statistical issues surrounding individual assignment warrant careful consideration.
Affiliation(s)
- Danielle L. Edwards
- Department of Ecology and Evolutionary Biology, Yale University, 21 Sachem Street, New Haven CT 06520, USA
- L. Lacey Knowles
- Department of Ecology and Evolutionary Biology, The University of Michigan, Ann Arbor, MI 48109, USA
|
79
|
Olave M, Solà E, Knowles LL. Upstream Analyses Create Problems with DNA-Based Species Delimitation. Syst Biol 2014; 63:263-71. [DOI: 10.1093/sysbio/syt106] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Affiliation(s)
- Melisa Olave
- Centro Nacional Patagónico – Consejo Nacional de Investigaciones Científicas y Técnicas (CENPAT-CONICET), Puerto Madryn, Chubut U 9120 ACD, Argentina; Department de Genètica, Facultat de Biologia and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Av. Diagonal, 643, 08028, Barcelona, Catalonia, Spain; and Department of Ecology and Evolutionary Biology, The University of Michigan, Ann Arbor, MI 41809-1029, USA
- Eduard Solà
- Centro Nacional Patagónico – Consejo Nacional de Investigaciones Científicas y Técnicas (CENPAT-CONICET), Puerto Madryn, Chubut U 9120 ACD, Argentina; Department de Genètica, Facultat de Biologia and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Av. Diagonal, 643, 08028, Barcelona, Catalonia, Spain; and Department of Ecology and Evolutionary Biology, The University of Michigan, Ann Arbor, MI 41809-1029, USA
- L. Lacey Knowles
- Centro Nacional Patagónico – Consejo Nacional de Investigaciones Científicas y Técnicas (CENPAT-CONICET), Puerto Madryn, Chubut U 9120 ACD, Argentina; Department de Genètica, Facultat de Biologia and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Av. Diagonal, 643, 08028, Barcelona, Catalonia, Spain; and Department of Ecology and Evolutionary Biology, The University of Michigan, Ann Arbor, MI 41809-1029, USA
|
80
|
Betancur-R. R, Naylor GJ, Ortí G. Conserved Genes, Sampling Error, and Phylogenomic Inference. Syst Biol 2014; 63:257-62. [DOI: 10.1093/sysbio/syt073] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Affiliation(s)
- Ricardo Betancur-R.
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, DC 20052, USA; and College of Charleston, Hollings Marine Lab, 331 Fort Johnson Rd., Charleston, SC 29412, USA
- Gavin J.P. Naylor
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, DC 20052, USA; and College of Charleston, Hollings Marine Lab, 331 Fort Johnson Rd., Charleston, SC 29412, USA
- Guillermo Ortí
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, DC 20052, USA; and College of Charleston, Hollings Marine Lab, 331 Fort Johnson Rd., Charleston, SC 29412, USA
|
81
|
Lanier HC, Huang H, Knowles LL. How low can you go? The effects of mutation rate on the accuracy of species-tree estimation. Mol Phylogenet Evol 2014; 70:112-9. [DOI: 10.1016/j.ympev.2013.09.006] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2013] [Revised: 08/12/2013] [Accepted: 09/06/2013] [Indexed: 11/16/2022]
|
82
|
Multi-locus species tree for the Amazonian peacock basses (Cichlidae: Cichla): Emergent phylogenetic signal despite limited nuclear variation. Mol Phylogenet Evol 2013; 69:479-90. [DOI: 10.1016/j.ympev.2013.07.031] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2013] [Revised: 07/31/2013] [Accepted: 07/31/2013] [Indexed: 11/18/2022]
|
83
|
Nakhleh L. Computational approaches to species phylogeny inference and gene tree reconciliation. Trends Ecol Evol 2013; 28:719-28. [PMID: 24094331 PMCID: PMC3855310 DOI: 10.1016/j.tree.2013.09.004] [Citation(s) in RCA: 111] [Impact Index Per Article: 10.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2013] [Revised: 09/02/2013] [Accepted: 09/03/2013] [Indexed: 01/20/2023]
Abstract
An intricate relation exists between gene trees and species phylogenies, due to evolutionary processes that act on the genes within and across the branches of the species phylogeny. From an analytical perspective, gene trees serve as character states for inferring accurate species phylogenies, and species phylogenies serve as a backdrop against which gene trees are contrasted for elucidating evolutionary processes and parameters. In a 1997 paper, Maddison discussed this relation, reviewed the signatures left by three major evolutionary processes on the gene trees, and surveyed parsimony and likelihood criteria for utilizing these signatures to elucidate computationally this relation. Here, I review progress that has been made in developing computational methods for analyses under these two criteria, and survey remaining challenges.
Affiliation(s)
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, TX 77005, USA; Department of Ecology and Evolutionary Biology, Rice University, Houston, TX 77005, USA.
|
84
|
Harris RB, Carling MD, Lovette IJ. The influence of sampling design on species tree inference: a new relationship for the New World chickadees (Aves: Poecile). Evolution 2013; 68:501-13. [PMID: 24111665 DOI: 10.1111/evo.12280] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2013] [Accepted: 09/19/2013] [Indexed: 11/28/2022]
Abstract
In this study, we explore the long-standing issue of how many loci are needed to infer accurate phylogenetic relationships, and whether loci with particular attributes (e.g., parsimony informativeness, variability, gene tree resolution) outperform others. To do so, we use an empirical data set consisting of the seven species of chickadees (Aves: Paridae), an analytically tractable, recently diverged group, and well-studied ecologically but lacking a nuclear phylogeny. We estimate relationships using 40 nuclear loci and mitochondrial DNA using four coalescent-based species tree inference methods (BEST, *BEAST, STEM, STELLS). Collectively, our analyses contrast with previous studies and support a sister relationship between the Black-capped and Carolina Chickadee, two superficially similar species that hybridize along a long zone of contact. Gene flow is a potential source of conflict between nuclear and mitochondrial gene trees, yet we find a significant, albeit low, signal of gene flow. Our results suggest that relatively few loci with high information content may be sufficient for estimating an accurate species tree, but that substantially more loci are necessary for accurate parameter estimation. We provide an empirical reference point for researchers designing sampling protocols with the purpose of inferring phylogenies and population parameters of closely related taxa.
Affiliation(s)
- Rebecca B Harris
- Fuller Evolutionary Biology Program, Cornell Lab of Ornithology, Cornell University, Ithaca, New York, 14850; Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, New York, 14850; Department of Biology and Burke Museum, University of Washington, Seattle, Washington.
|
85
|
Guevara EE, Steiper ME. Molecular phylogenetic analysis of the Papionina using concatenation and species tree methods. J Hum Evol 2013; 66:18-28. [PMID: 24161610 DOI: 10.1016/j.jhevol.2013.09.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2012] [Revised: 08/30/2013] [Accepted: 09/13/2013] [Indexed: 10/26/2022]
Abstract
The Papionina is a geographically widespread subtribe of African cercopithecid monkeys whose evolutionary history is of particular interest to anthropologists. The phylogenetic relationships among arboreal mangabeys (Lophocebus), baboons (Papio), and geladas (Theropithecus) remain unresolved. Molecular phylogenetic analyses have revealed marked gene tree incongruence for these taxa, and several recent concatenated phylogenetic analyses of multilocus datasets have supported different phylogenetic hypotheses. To address this issue, we investigated the phylogeny of the Lophocebus + Papio + Theropithecus group using concatenation methods, as well as alternative methods that incorporate gene tree heterogeneity to estimate a 'species tree.' Our compiled DNA sequence dataset was ∼56 kb pairs long and included 57 independent partitions. All analyses of concatenated alignments strongly supported a Lophocebus + Papio clade and a basal position for Theropithecus. The Bayesian concordance analysis supported the same phylogeny. A coalescent-based Bayesian method resulted in a very poorly resolved species tree. The topological agreement between concatenation and the Bayesian concordance analysis offers considerable support for a Lophocebus + Papio clade as the dominant relationship across the genome. However, the results of the Bayesian concordance analysis indicate that almost half the genome has an alternative history. As such, our results offer a well-supported phylogenetic hypothesis for the Papio/Lophocebus/Theropithecus trichotomy, while at the same time providing evidence for a complex evolutionary history that likely includes hybridization among lineages.
Affiliation(s)
- Elaine E Guevara
- Department of Anthropology, Hunter College, City University of New York, 695 Park Avenue, New York, NY 10065, USA.
- Michael E Steiper
- Department of Anthropology, Hunter College, City University of New York, 695 Park Avenue, New York, NY 10065, USA; Program in Anthropology, The Graduate Center, City University of New York, 365 5th Avenue, New York, NY 10016, USA; Program in Biology, The Graduate Center, City University of New York, 365 5th Avenue, New York, NY 10016, USA; New York Consortium in Evolutionary Primatology (NYCEP), New York, NY, USA.
|
86
|
Poor Fit to the Multispecies Coalescent is Widely Detectable in Empirical Data. Syst Biol 2013; 63:322-33. [DOI: 10.1093/sysbio/syt057] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
87
|
Species tree reconstruction of a poorly resolved clade of salamanders (Ambystomatidae) using multiple nuclear loci. Mol Phylogenet Evol 2013; 68:671-82. [DOI: 10.1016/j.ympev.2013.04.013] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 06/13/2012] [Revised: 04/14/2013] [Accepted: 04/16/2013] [Indexed: 11/23/2022]
|
88
|
DeGiorgio M, Degnan JH. Robustness to divergence time underestimation when inferring species trees from estimated gene trees. Syst Biol 2013; 63:66-82. [PMID: 23988674 DOI: 10.1093/sysbio/syt059] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Indexed: 11/12/2022] Open
Abstract
To infer species trees from gene trees estimated from phylogenomic data sets, tractable methods are needed that can handle dozens to hundreds of loci. We examine several computationally efficient approaches (MP-EST, STAR, STEAC, STELLS, and STEM) for inferring species trees from gene trees estimated using maximum likelihood (ML) and Bayesian approaches. Among the methods examined, we found that topology-based methods often performed better using ML gene trees and methods employing coalescent times typically performed better using Bayesian gene trees, with MP-EST, STAR, STEAC, and STELLS outperforming STEM under most conditions. We examine why the STEM tree (also called GLASS or Maximum Tree) is less accurate on estimated gene trees by comparing estimated and true coalescence times, performing species tree inference using simulations, and analyzing a great ape data set keeping track of false positive and false negative rates for inferred clades. We find that although true coalescence times are more ancient than speciation times under the multispecies coalescent model, estimated coalescence times are often more recent than speciation times. This underestimation can lead to increased bias and lack of resolution with increased sampling (either alleles or loci) when gene trees are estimated with ML. The problem appears to be less severe using Bayesian gene-tree estimates.
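The sensitivity of the STEM/GLASS tree to underestimated coalescence times can be illustrated with a minimal sketch. It assumes, as GLASS-style methods are generally described, that each pair of species is assigned the minimum coalescence time observed across loci; the function name, the input layout, and the numbers are hypothetical, and the clustering step that turns these minima into a tree is omitted.

```python
def minimum_coalescence_times(per_locus_times):
    """Return, for each species pair, the smallest coalescence time seen in any locus.

    per_locus_times is a list with one dict per locus (hypothetical layout),
    mapping a (species, species) tuple to an estimated coalescence time.
    """
    minimum = {}
    for locus in per_locus_times:
        for pair, time in locus.items():
            key = frozenset(pair)  # (A, B) and (B, A) are the same pair
            if key not in minimum or time < minimum[key]:
                minimum[key] = time
    return minimum


# Hypothetical estimates: the third locus underestimates the A-C time.
loci = [
    {("A", "B"): 1.0, ("A", "C"): 3.0, ("B", "C"): 3.2},
    {("A", "B"): 1.2, ("A", "C"): 2.9, ("B", "C"): 3.0},
    {("A", "B"): 1.1, ("C", "A"): 0.4, ("B", "C"): 3.1},
]
times = minimum_coalescence_times(loci)
```

Because a minimum can only stay the same or decrease as loci are added, a single underestimated locus sets the value for that pair, which is consistent with the abstract's observation that bias can grow with increased sampling.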
Affiliation(s)
- Michael DeGiorgio
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA; Department of Biology, Pennsylvania State University, University Park, PA 16802, USA; and Department of Mathematics and Statistics, University of New Mexico, 1 University of New Mexico, Albuquerque, NM 87131, USA
|
89
|
|
90
|
Hovmöller R, Knowles LL, Kubatko LS. Effects of missing data on species tree estimation under the coalescent. Mol Phylogenet Evol 2013; 69:1057-62. [PMID: 23769751 DOI: 10.1016/j.ympev.2013.06.004] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Received: 03/02/2013] [Accepted: 06/05/2013] [Indexed: 11/20/2022]
Abstract
With recent advances in genomic sequencing, the importance of taking the effects of the processes that can cause discord between the speciation history and the individual gene histories into account has become evident. For multilocus datasets, it is difficult to achieve complete coverage of all sampled loci across all sample specimens, a problem that also arises when combining incompletely overlapping datasets. Here we examine how missing data affects the accuracy of species tree reconstruction. In our study, 10- and 100-locus sequence datasets were simulated under the coalescent model from shallow and deep speciation histories, and species trees were estimated using the maximum likelihood and Bayesian frameworks (with STEM and *BEAST, respectively). The accuracy of the estimated species trees was evaluated using the symmetric difference and the SPR distance. We examine the effects of sampling more than one individual per species, as well as the effects of different patterns of missing data (i.e., different amounts of missing data, which is represented among random taxa as opposed to being concentrated in specific taxa, as is often the case for empirical studies). Our general conclusion is that the species tree estimates are remarkably resilient to the effects of missing data. We find that for datasets with more limited numbers of loci, sampling more than one individual per species has the strongest effect on improving species tree accuracy when there is missing data, especially at higher degrees of missing data. For larger multilocus datasets (e.g., 25-100 loci), the amount of missing data has a negligible effect on species tree reconstruction, even at 50% missing data and a single sampled individual per species.
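The symmetric difference mentioned above can be sketched as follows. This is an illustrative version for rooted trees written as nested tuples of taxon names, counting the clades found in one tree but not the other; the tree encoding and function names are assumptions made for the example and are not taken from the study's software.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaf names."""
    found = set()

    def leaves(node):
        if isinstance(node, str):  # a leaf is a taxon name
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on these taxa
    return found


def symmetric_difference(tree_a, tree_b):
    """Count clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical true and estimated species trees on four taxa.
true_tree = ((("A", "B"), "C"), "D")
estimated_tree = ((("A", "C"), "B"), "D")
distance = symmetric_difference(true_tree, estimated_tree)
```

A distance of zero means the two topologies are identical; each misplaced clade contributes once for the tree that has it and once for the tree that has its replacement.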
Affiliation(s)
- Rasmus Hovmöller
- Department of Statistics, The Ohio State University, 404 Cockins Hall, 1958 Neil Avenue, Columbus, OH 43210, United States
|
91
|
Betancur-R. R, Li C, Munroe TA, Ballesteros JA, Ortí G. Addressing Gene Tree Discordance and Non-Stationarity to Resolve a Multi-Locus Phylogeny of the Flatfishes (Teleostei: Pleuronectiformes). Syst Biol 2013; 62:763-85. [DOI: 10.1093/sysbio/syt039] [Citation(s) in RCA: 104] [Impact Index Per Article: 9.5] [Indexed: 11/14/2022] Open
Affiliation(s)
- Ricardo Betancur-R.
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, D.C. 20052, USA; College of Fisheries and Life Science, Shanghai Ocean University, Shanghai 201306, China; and National Systematics Laboratory NMFS/NOAA, Post Office Box 37012, Smithsonian Institution NHB, WC 60, MRC-153, Washington, D.C. 20013-7012, USA
- Chenhong Li
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, D.C. 20052, USA; College of Fisheries and Life Science, Shanghai Ocean University, Shanghai 201306, China; and National Systematics Laboratory NMFS/NOAA, Post Office Box 37012, Smithsonian Institution NHB, WC 60, MRC-153, Washington, D.C. 20013-7012, USA
- Thomas A. Munroe
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, D.C. 20052, USA; College of Fisheries and Life Science, Shanghai Ocean University, Shanghai 201306, China; and National Systematics Laboratory NMFS/NOAA, Post Office Box 37012, Smithsonian Institution NHB, WC 60, MRC-153, Washington, D.C. 20013-7012, USA
- Jesus A. Ballesteros
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, D.C. 20052, USA; College of Fisheries and Life Science, Shanghai Ocean University, Shanghai 201306, China; and National Systematics Laboratory NMFS/NOAA, Post Office Box 37012, Smithsonian Institution NHB, WC 60, MRC-153, Washington, D.C. 20013-7012, USA
- Guillermo Ortí
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, D.C. 20052, USA; College of Fisheries and Life Science, Shanghai Ocean University, Shanghai 201306, China; and National Systematics Laboratory NMFS/NOAA, Post Office Box 37012, Smithsonian Institution NHB, WC 60, MRC-153, Washington, D.C. 20013-7012, USA
|
92
|
Affiliation(s)
- James H. Degnan
- Department of Mathematics and Statistics, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
|
93
|
Zapata F. A multilocus phylogenetic analysis of Escallonia (Escalloniaceae): diversification in montane South America. Am J Bot 2013; 100:526-545. [PMID: 23400495 DOI: 10.3732/ajb.1200297] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 06/01/2023]
Abstract
PREMISE OF THE STUDY: The mountains of South America are hotspots of plant diversity. How this diversity originated and evolved, and what roles geographic and environmental factors may have played in the diversification of lineages occurring in these regions, is not well understood. Escallonia, a morphologically and ecologically diverse group of shrubs and trees widely distributed in these mountains, provides an ideal opportunity for studying the historical underpinnings that have shaped the extraordinarily distinctive, diverse, and endangered flora of these regions, and for evaluating the role of abiotic factors in the process of lineage divergence. METHODS: I analyzed neutral DNA sequence data from two nuclear loci and one chloroplast locus using maximum parsimony, maximum likelihood, and Bayesian phylogenetic approaches. I used a Bayesian approach to analyze the geographic structure of gene trees, and a phylogenetically controlled decomposition of the variance in bioclimatic variables to analyze the eco-climatic structure of gene trees. KEY RESULTS: I found that Escallonia (1) is monophyletic, (2) has a remarkable level of geographic and climatic phylogenetic structure, (3) likely originated in the tropical Andes, and (4) has a widespread absence of species exclusivity. CONCLUSIONS: Geography played an important role early in the history of Escallonia by separating populations that later diversified, likely in isolation. Although geographic isolation was generally accompanied by changes in climate, it is not clear whether environmental gradients along elevation have influenced more recent diversification events or whether species have evolved broader environmental tolerances.
Affiliation(s)
- Felipe Zapata
- Department of Biology, University of Missouri-St. Louis, One University Blvd., St. Louis, MO 63121, USA.
|
94
|
Oliver JC. Microevolutionary processes generate phylogenomic discordance at ancient divergences. Evolution 2013; 67:1823-30. [PMID: 23730773 DOI: 10.1111/evo.12047] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 06/15/2012] [Accepted: 12/31/2012] [Indexed: 11/27/2022]
Abstract
Stochastic population processes may cause differences between species histories and gene histories. These processes are assumed to only influence the most recent divergences in the tree of life; however, there may be underappreciated potential for microevolutionary processes to impact deep divergences. I used multispecies coalescent models to determine the impact of stochastic processes on deep phylogenomic histories. Here I show phylogenomic discordance between gene histories and species histories is expected at deep divergences for many eukaryotic taxa, and the probability of discordance increases with population size, generation time, and the number of species in the tree. Five eukaryotic clades (angiosperms, birds, harpaline beetles, mammals, and nymphalid butterflies) demonstrate significant discordance potential at divergences over 50 million years old, and this discordance potential is independent of the age of divergence. These findings demonstrate population processes acting over very short timescales will leave a lasting impact on genomic histories, even for divergence events occurring tens to hundreds of millions of years ago.
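The dependence of discordance on population size and generation time can be illustrated with the standard three-taxon result under the multispecies coalescent, in which a gene tree disagrees with the species tree with probability (2/3)·exp(-T), where T is the internal branch length in coalescent units. The sketch below assumes diploid populations with constant effective size, so that T equals the number of generations divided by 2N; the function name and the example values are hypothetical, and this is not necessarily the calculation used in the paper.

```python
import math


def discordance_probability(internode_years, generation_time_years, effective_population_size):
    """Probability that a three-taxon gene tree is discordant with the species tree.

    Assumes a diploid population of constant effective size, so the internal
    branch length in coalescent units is generations / (2 * Ne).
    """
    generations = internode_years / generation_time_years
    coalescent_units = generations / (2.0 * effective_population_size)
    return (2.0 / 3.0) * math.exp(-coalescent_units)


# Hypothetical example: a 1-million-year internode, 5-year generations, Ne = 100,000.
p = discordance_probability(1_000_000, 5.0, 100_000)
```

The expression depends only on the length of the internode, not on how long ago the divergence occurred, which matches the abstract's statement that discordance potential is independent of the age of divergence.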
Affiliation(s)
- Jeffrey C Oliver
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut 06520, USA.
|
95
|
Holleman W, von der Heyden S, Zsilavecz G. Delineating the fishes of the Clinus superciliosus species complex in southern African waters (Blennioidei: Clinidae: Clinini), with the validation of Clinus arborescens Gilchrist & Thompson, 1908 and Clinus ornatus Gilchrist & Thompson, 1908, and with de. Zool J Linn Soc 2012. [DOI: 10.1111/j.1096-3642.2012.00865.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/27/2022]
|
96
|
Full modeling versus summarizing gene-tree uncertainty: Method choice and species-tree accuracy. Mol Phylogenet Evol 2012; 65:501-9. [DOI: 10.1016/j.ympev.2012.07.004] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 12/11/2011] [Revised: 04/06/2012] [Accepted: 07/08/2012] [Indexed: 11/20/2022]
|
97
|
O'Neill EM, Schwartz R, Bullock CT, Williams JS, Shaffer HB, Aguilar-Miguel X, Parra-Olea G, Weisrock DW. Parallel tagged amplicon sequencing reveals major lineages and phylogenetic structure in the North American tiger salamander (Ambystoma tigrinum) species complex. Mol Ecol 2012; 22:111-29. [PMID: 23062080 DOI: 10.1111/mec.12049] [Citation(s) in RCA: 101] [Impact Index Per Article: 8.4] [Received: 05/21/2012] [Revised: 08/10/2012] [Accepted: 08/21/2012] [Indexed: 12/20/2022]
Abstract
Modern analytical methods for population genetics and phylogenetics are expected to provide more accurate results when data from multiple genome-wide loci are analysed. We present the results of an initial application of parallel tagged sequencing (PTS) on a next-generation platform to sequence thousands of barcoded PCR amplicons generated from 95 nuclear loci and 93 individuals sampled across the range of the tiger salamander (Ambystoma tigrinum) species complex. To manage the bioinformatic processing of this large data set (344 330 reads), we developed a pipeline that sorts PTS data by barcode and locus, identifies high-quality variable nucleotides and yields phased haplotype sequences for each individual at each locus. Our sequencing and bioinformatic strategy resulted in a genome-wide data set with relatively low levels of missing data and a wide range of nucleotide variation. STRUCTURE analyses of these data in a genotypic format resulted in strongly supported assignments for the majority of individuals into nine geographically defined genetic clusters. Species tree analyses of the most variable loci using a multi-species coalescent model resulted in strong support for most branches in the species tree; however, analyses including more than 50 loci produced parameter sampling trends that indicated a lack of convergence on the posterior distribution. Overall, these results demonstrate the potential for amplicon-based PTS to rapidly generate large-scale data for population genetic and phylogenetic-based research.
Affiliation(s)
- Eric M O'Neill
- Department of Biology, University of Kentucky, Lexington, KY 40506, USA.
|
98
|
Smith JV, Braun EL, Kimball RT. Ratite nonmonophyly: independent evidence from 40 novel Loci. Syst Biol 2012; 62:35-49. [PMID: 22831877 DOI: 10.1093/sysbio/sys067] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Indexed: 02/04/2023] Open
Abstract
Large-scale multilocus studies have become common in molecular phylogenetics, but the best way to interpret these studies when their results strongly conflict with prior information about phylogeny remains unclear. An example of such a conflict is provided by the ratites (the large flightless birds of southern land masses, including ostriches, emus, and rheas). Ratite monophyly is strongly supported by both morphological data and many earlier molecular studies and is used as a textbook example of vicariance biogeography. However, recent studies have indicated that ratites are not monophyletic; instead, the volant tinamous nest inside the ratites rather than forming their sister group within the avian superorder Palaeognathae. Large-scale studies can exhibit biases that reflect a number of factors, including limitations in the fit of the evolutionary models used for analyses and problems with sequence alignment, so the unexpected conclusion that ratites are not monophyletic needs to be rigorously evaluated. A rigorous approach to testing novel hypotheses generated by large-scale studies is to collect independent evidence (i.e., excluding the loci and/or traits used to generate the hypotheses). We used 40 nuclear loci not used in previous studies that investigated the relationship among ratites and tinamous. Our results strongly support the recent molecular studies, revealing that the deepest branch within Palaeognathae separates the ostrich from other members of the clade, rather than the traditional hypothesis that separates the tinamous from the ratites. To ensure these results reflected evolutionary history, we examined potential biases in types of loci used, heterotachy, alignment biases, and discordance between gene trees and the species tree. All analyses consistently supported nonmonophyly of the ratites and no confounding biases were identified. 
This confirmation that ratites are not monophyletic using independent evidence will hopefully stimulate further comparative research on paleognath development and genetics that might reveal the basis of the morphological convergence in these large, flightless birds.
Affiliation(s)
- Jordan V Smith
- Department of Biology, University of Florida, P.O. Box 118525, Gainesville, FL 32611, USA
|
99
|
Lemmon AR, Emme SA, Lemmon EM. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol 2012; 61:727-44. [PMID: 22605266 DOI: 10.1093/sysbio/sys049] [Citation(s) in RCA: 558] [Impact Index Per Article: 46.5] [Indexed: 02/01/2023] Open
Abstract
The field of phylogenetics is on the cusp of a major revolution, enabled by new methods of data collection that leverage both genomic resources and recent advances in DNA sequencing. Previous phylogenetic work has required labor-intensive marker development coupled with single-locus polymerase chain reaction and DNA sequencing on a clade-by-clade and locus-by-locus basis. Here, we present a new, cost-efficient, and rapid approach to obtaining data from hundreds of loci for potentially hundreds of individuals for deep and shallow phylogenetic studies. Specifically, we designed probes for target enrichment of >500 loci in highly conserved anchor regions of vertebrate genomes (flanked by less conserved regions) from five model species and tested enrichment efficiency in nonmodel species up to 508 million years divergent from the nearest model. We found that hybrid enrichment using conserved probes (anchored enrichment) can recover a large number of unlinked loci that are useful at a diversity of phylogenetic timescales. This new approach has the potential not only to expedite resolution of deep-scale portions of the Tree of Life but also to greatly accelerate resolution of the large number of shallow clades that remain unresolved. The combination of low cost (~1% of the cost of traditional Sanger sequencing and ~3.5% of the cost of high-throughput amplicon sequencing for projects on the scale of 500 loci × 100 individuals) and rapid data collection (~2 weeks of laboratory time) is expected to make this approach tractable even for researchers working on systems with limited or nonexistent genomic resources.
Affiliation(s)
- Alan R Lemmon
- Department of Scientific Computing, Florida State University, Dirac Science Library, Tallahassee, FL 32306-4102, USA.
|
100
|
Stadler T, Degnan JH. A polynomial time algorithm for calculating the probability of a ranked gene tree given a species tree. Algorithms Mol Biol 2012; 7:7. [PMID: 22546066 PMCID: PMC3637458 DOI: 10.1186/1748-7188-7-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/30/2011] [Accepted: 04/02/2012] [Indexed: 11/17/2022] Open
Abstract
Background: The ancestries of genes form gene trees which do not necessarily have the same topology as the species tree due to incomplete lineage sorting. Available algorithms determining the probability of a gene tree given a species tree require exponential computational runtime. Results: In this paper, we provide a polynomial time algorithm to calculate the probability of a ranked gene tree topology for a given species tree, where a ranked tree topology is a tree topology with the internal vertices being ordered. The probability of a gene tree topology can thus be calculated in polynomial time if the number of orderings of the internal vertices is a polynomial number. However, the complexity of calculating the probability of a gene tree topology with an exponential number of rankings for a given species tree remains unknown. Conclusions: Polynomial algorithms for calculating ranked gene tree probabilities may become useful in developing methodology to infer species trees based on a collection of gene trees, leading to a more accurate reconstruction of ancestral species relationships.
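How the number of rankings varies with tree shape can be seen by counting the orderings of internal vertices directly. The sketch below is not the paper's algorithm; it only counts rankings for a rooted binary topology written as nested tuples, using the standard recursion in which the rankings of the two subtrees are interleaved in C(i_L + i_R, i_L) ways, where i_L and i_R are the internal-node counts of the left and right subtrees. The tree encoding and function name are assumptions made for the example.

```python
from math import comb


def count_rankings(tree):
    """Return (number of internal nodes, number of rankings) for a rooted binary tree.

    A leaf is a string; an internal node is a 2-tuple of subtrees.
    """
    if isinstance(tree, str):
        return 0, 1
    left, right = tree
    left_internal, left_rankings = count_rankings(left)
    right_internal, right_rankings = count_rankings(right)
    # The root is always ranked first; the two subtrees' orderings interleave freely.
    interleavings = comb(left_internal + right_internal, left_internal)
    return left_internal + right_internal + 1, interleavings * left_rankings * right_rankings


caterpillar = ((("A", "B"), "C"), "D")
balanced4 = (("A", "B"), ("C", "D"))
balanced8 = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
```

A caterpillar topology has a single ranking regardless of size, whereas the count for balanced topologies grows quickly with the number of leaves, which is the distinction the abstract draws between topologies with polynomially and exponentially many rankings.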
|