1
Simmons MP, Goloboff PA, Stöver BC, Springer MS, Gatesy J. Quantification of congruence among gene trees with polytomies using overall success of resolution for phylogenomic coalescent analyses. Cladistics 2023; 39:418-436. [PMID: 37096985 DOI: 10.1111/cla.12540] [Received: 12/27/2022] [Revised: 02/22/2023] [Accepted: 03/24/2023] [Indexed: 04/26/2023]
Abstract
Gene-tree-inference error can cause species-tree-inference artefacts in summary phylogenomic coalescent analyses. Here we integrate two ways of accommodating these inference errors: collapsing arbitrarily or dubiously resolved gene-tree branches, and subsampling gene trees based on their pairwise congruence. We tested the effect of collapsing gene-tree branches with 0% approximate-likelihood-ratio-test (SH-like aLRT) support in likelihood analyses and strict consensus trees for parsimony, and then subsampled those partially resolved trees based on congruence measures that do not penalize polytomies. For this purpose we developed a new TNT script for congruence sorting (congsort), and used it to calculate topological incongruence for eight phylogenomic datasets using three distance measures: standard Robinson-Foulds (RF) distances; overall success of resolution (OSR), which is based on counting both matching and contradicting clades; and RF contradictions, which only counts contradictory clades. As expected, we found that gene-tree incongruence was often concentrated in clades that are arbitrarily or dubiously resolved and that there was greater congruence between the partially collapsed gene trees and the coalescent and concatenation topologies inferred from those genes. Coalescent branch lengths typically increased as the most incongruent gene trees were excluded, although branch supports typically did not. We investigated two successful and complementary approaches to prioritizing genes for investigation of alignment or homology errors. Coalescent-tree clades that contradicted concatenation-tree clades were generally less robust to gene-tree subsampling than congruent clades. Our preferred approach to collapsing likelihood gene-tree clades (0% SH-like aLRT support) and subsampling those trees (OSR) generally outperformed competing approaches for a large fungal dataset with respect to branch lengths, support and congruence. 
We recommend widespread application of this approach (and strict consensus trees for parsimony-based analyses) for improving quantification of gene-tree congruence/conflict, estimating coalescent branch lengths, testing robustness of coalescent analyses to gene-tree-estimation error, and improving topological robustness of summary coalescent analyses. This approach is quick and easy to implement, even for huge datasets.
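The difference between a standard RF count and a count of contradictory clades only can be illustrated with a minimal Python sketch. It assumes rooted trees encoded as sets of non-trivial clades; the function names and toy trees are hypothetical, and this is not the congsort TNT script. OSR is left out because its formula is not stated in the abstract.

```python
def clades(*groups):
    """Encode a rooted tree as the set of its non-trivial clades (illustrative encoding)."""
    return {frozenset(g) for g in groups}

def conflicts(a, b):
    """Two rooted clades conflict when they overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)

def rf_distance(t1, t2):
    """Standard RF count: clades present in exactly one of the two trees."""
    return len(t1 ^ t2)

def rf_contradictions(t1, t2):
    """Count only clades that are contradicted by at least one clade of the other tree."""
    c1 = sum(any(conflicts(a, b) for b in t2) for a in t1)
    c2 = sum(any(conflicts(b, a) for a in t1) for b in t2)
    return c1 + c2

# ((A,B),(C,(D,E))): fully resolved
resolved = clades("AB", "DE", "CDE")
# ((A,B),C,D,E): same tree with two branches collapsed into a polytomy
collapsed = clades("AB")
# ((A,C),B,(D,E)): a genuinely conflicting tree
conflicting = clades("AC", "DE")

print(rf_distance(resolved, collapsed), rf_contradictions(resolved, collapsed))
print(rf_distance(resolved, conflicting), rf_contradictions(resolved, conflicting))
```

In this toy case the collapsed tree is two RF units away from the resolved tree but contradicts none of its clades, which is the sense in which a contradiction-based measure does not penalize polytomies.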
Affiliation(s)
- Mark P Simmons
  - Department of Biology, Colorado State University, Fort Collins, CO, 80523, USA
- Pablo A Goloboff
  - CONICET, INSUE, Fundación Miguel Lillo, Miguel Lillo 251, 4000, S.M. de Tucumán, Argentina
- Ben C Stöver
  - Institute for Evolution and Biodiversity, WMU Münster, 48149, Münster, Germany
- Mark S Springer
  - Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA, 92521, USA
- John Gatesy
  - Division of Vertebrate Zoology, American Museum of Natural History, New York, NY, 10024, USA
2
Ren CQ, Zhang DQ, Liu XY, Zhang JQ. Genomic data provide a robust phylogeny backbone for Rhodiola L. (Crassulaceae) and reveal extensive reticulate evolution during its rapid radiation. Mol Phylogenet Evol 2023:107863. [PMID: 37329933 DOI: 10.1016/j.ympev.2023.107863] [Received: 12/15/2022] [Revised: 06/12/2023] [Accepted: 06/13/2023] [Indexed: 06/19/2023]
Abstract
The Tibetan Plateau and adjacent mountain regions (TP; including the Tibetan Plateau, Himalaya, Hengduan Mountains and Mountains of Central Asia) harbor great biodiversity, and some lineages there may have undergone rapid radiations. However, only a few studies have investigated the evolutionary pattern of such diversification in depth using genomic data. In this study, we reconstructed a robust phylogeny backbone of Rhodiola, a lineage that may have undergone rapid radiation in the TP, using genotyping-by-sequencing data, and conducted a series of gene flow and diversification analyses. The concatenation and coalescent-based methods yielded similar tree topologies, and five well-supported clades were revealed. Potential gene flow and introgression events were detected, both between species from different major clades and between closely related species, suggesting pervasive hybridization and introgression. An initially rapid diversification rate followed by a later slowdown was revealed, indicating niche filling. Molecular dating and correlation analyses showed that the uplift of the TP and global cooling in the mid-Miocene might have played an important role in promoting the rapid radiation of Rhodiola. Our work demonstrates that gene flow and introgression might be an important contributor to rapid radiation, possibly by quickly reassembling old genetic variation into new combinations.
Affiliation(s)
- Chun-Qian Ren
  - National Engineering Laboratory for Resource Development of Endangered Crude Drugs in Northwest China, College of Life Sciences, Shaanxi Normal University, Xi'an 710119, China; Key Laboratory of Medicinal Plant Resource and Natural Pharmaceutical Chemistry of Ministry of Education, Shaanxi Normal University, Xi'an 710119, China
- Dan-Qing Zhang
  - National Engineering Laboratory for Resource Development of Endangered Crude Drugs in Northwest China, College of Life Sciences, Shaanxi Normal University, Xi'an 710119, China; Key Laboratory of Medicinal Plant Resource and Natural Pharmaceutical Chemistry of Ministry of Education, Shaanxi Normal University, Xi'an 710119, China
- Xiao-Ying Liu
  - National Engineering Laboratory for Resource Development of Endangered Crude Drugs in Northwest China, College of Life Sciences, Shaanxi Normal University, Xi'an 710119, China; Key Laboratory of Medicinal Plant Resource and Natural Pharmaceutical Chemistry of Ministry of Education, Shaanxi Normal University, Xi'an 710119, China
- Jian-Qiang Zhang
  - National Engineering Laboratory for Resource Development of Endangered Crude Drugs in Northwest China, College of Life Sciences, Shaanxi Normal University, Xi'an 710119, China; Key Laboratory of Medicinal Plant Resource and Natural Pharmaceutical Chemistry of Ministry of Education, Shaanxi Normal University, Xi'an 710119, China
3
Mills KK, Everson KM, Hildebrandt KPB, Brandler OV, Steppan SJ, Olson LE. Ultraconserved elements improve resolution of marmot phylogeny and offer insights into biogeographic history. Mol Phylogenet Evol 2023; 184:107785. [PMID: 37085130 DOI: 10.1016/j.ympev.2023.107785] [Received: 10/11/2022] [Revised: 03/01/2023] [Accepted: 04/13/2023] [Indexed: 04/23/2023]
Abstract
Marmots (Marmota spp.) comprise a lineage of large-bodied ground squirrels that diversified rapidly in the Pleistocene, when the planet quickly transitioned to a drier, colder, and highly seasonal climate, particularly at high latitudes. Fossil evidence indicates the genus spread from North America, across Beringia, and into the European Alps over the course of only a few million years, beginning in the late Pliocene. Marmots are highly adapted to survive long and severely cold winters, and this likely favored their expansion and diversification over this time period. Previous phylogenetic studies have identified two major subgenera of marmots, but the timing of important speciation events and some species relationships have been difficult to resolve. Here we use ultraconserved elements and mitogenomes, with samples from all 15 extant species, to more precisely retrace how and when marmots came to inhabit a vast Holarctic range. Our results indicate marmots arose in North America in the mid Miocene (∼16.3 Mya) and dispersed across the Bering Land Bridge in the late Pliocene (∼3-4 Mya); in addition, our fossil-calibrated timeline suggests that the rise and spread of open grasslands was particularly important to marmot diversification. The woodchuck (M. monax) and the Alaska marmot (M. broweri) are found to be more closely related to the Eurasian species than to the other North American species. Paraphyly is evident in the bobak marmot (M. bobak) and the hoary marmot (M. caligata), and in the case of the latter the data are highly suggestive of a second, cryptic species in the Cascade Mountains of Washington.
Affiliation(s)
- Kendall K Mills
  - Department of Biology and Wildlife, University of Alaska Fairbanks, 982 North Koyukuk Drive, Fairbanks, AK 99775, USA; Department of Mammalogy, University of Alaska Museum, 1962 Yukon Drive, Fairbanks, AK 99775, USA
- Kathryn M Everson
  - Department of Mammalogy, University of Alaska Museum, 1962 Yukon Drive, Fairbanks, AK 99775, USA; Department of Integrative Biology, Oregon State University, 2701 SW Campus Way, Corvallis, OR 97331, USA
- Kyndall P B Hildebrandt
  - Department of Mammalogy, University of Alaska Museum, 1962 Yukon Drive, Fairbanks, AK 99775, USA
- Oleg V Brandler
  - Koltzov Institute of Developmental Biology of Russian Academy of Sciences, Vavilova 26, Moscow, Russia
- Scott J Steppan
  - Department of Biological Science, Florida State University, Tallahassee, FL 32306, USA
- Link E Olson
  - Department of Mammalogy, University of Alaska Museum, 1962 Yukon Drive, Fairbanks, AK 99775, USA
4
Li J, Liang D, Zhang P. Simultaneously collecting coding and non-coding phylogenomic data using homemade full-length cDNA probes, tested by resolving the high-level relationships of Colubridae. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.969581] [Indexed: 11/13/2022]
Abstract
Resolving intractable phylogenetic relationships often requires simultaneously analyzing a large number of coding and non-coding orthologous loci. To gather both coding and non-coding data, traditional sequence capture methods require custom-designed commercial probes. Here, we present a cost-effective sequence capture method based on homemade probes that captures thousands of coding and non-coding orthologous loci simultaneously and is suitable for all organisms. This approach, called “FLc-Capture,” synthesizes biotinylated full-length cDNAs from mRNA as capture probes and eliminates the need for costly commercial probe design and synthesis. To demonstrate the utility of FLc-Capture, we prepared full-length cDNA probes from mRNA extracted from a common colubrid snake. We performed capture experiments with these homemade cDNA probes and successfully obtained thousands of coding and non-coding genomic loci from 24 Colubridae species and 12 distantly related snake species of other families. The average capture specificity of FLc-Capture across all tested snake species is 35%, similar to the previously published EecSeq method. We constructed two phylogenomic data sets, one including 1,075 coding loci (∼817,000 bp) and the other including 1,948 non-coding loci (∼1,114,000 bp), to study the phylogeny of Colubridae. Both data sets yielded highly similar and well-resolved trees, with 85% of nodes having >95% bootstrap support. Our experimental tests show that FLc-Capture is a flexible, fast, and cost-effective sequence capture approach for simultaneously gathering coding and non-coding phylogenomic data sets to study intractable phylogenetic questions. We hope that this method will serve as a new data collection tool for evolutionary biologists working in the era of phylogenomics.
5
Out of chaos: Phylogenomics of Asian Sonerileae. Mol Phylogenet Evol 2022; 175:107581. [PMID: 35810973 DOI: 10.1016/j.ympev.2022.107581] [Received: 01/05/2022] [Revised: 05/23/2022] [Accepted: 05/26/2022] [Indexed: 11/22/2022]
Abstract
Sonerileae is a diverse Melastomataceae lineage comprising ca. 1000 species in 44 genera, with >70% of genera and species distributed in Asia. Asian Sonerileae are taxonomically intractable with obscure generic circumscriptions. The backbone phylogeny of this group remains poorly resolved, possibly due to complexity caused by rapid species radiation in the early and middle Miocene, which hampers further systematic study. Here, we used genome resequencing data to reconstruct the phylogeny of Asian Sonerileae. Three parallel datasets, viz. single-copy ortholog (SCO), genomic SNPs, and whole plastome, were assembled from genome resequencing data of 205 species for this purpose. Based on these genome-scale data, we provided the first well-resolved phylogeny of Asian Sonerileae, with 34 major clades identified and 74% of the interclade relationships consistently resolved by both SCO and genomic data. Meanwhile, widespread phylogenetic discordance was detected among SCO gene trees as well as among species trees reconstructed using different tree estimation methods (concatenation/site-based coalescent method/summary method) or different datasets (SCO/genomic/plastome). We explored sources of discordance using multiple approaches and found that the observed discordance in Asian Sonerileae was mainly caused by a combination of biased distribution of missing data, random noise from uninformative genes, incomplete lineage sorting, and hybridization/introgression. Exploration of these sources can enable us to generate hypotheses for future testing, which is the first step towards understanding the evolution of Asian Sonerileae. We also detected high levels of homoplasy for some characters traditionally used in taxonomy, which explains the current chaotic generic delimitations. The backbone phylogeny of Asian Sonerileae revealed in this study offers a solid basis for future taxonomic revision at the generic level.
6
Abreu EF, Pavan SE, Tsuchiya MTN, McLean BS, Wilson DE, Percequillo AR, Maldonado JE. Old specimens for old branches: Assessing effects of sample age in resolving a rapid Neotropical radiation of squirrels. Mol Phylogenet Evol 2022; 175:107576. [PMID: 35809853 DOI: 10.1016/j.ympev.2022.107576] [Received: 02/21/2022] [Revised: 06/10/2022] [Accepted: 07/01/2022] [Indexed: 11/15/2022]
Abstract
Ultraconserved Elements (UCEs) have been useful to resolve challenging phylogenies of non-model clades, unpuzzling long-conflicted relationships in key branches of the Tree of Life at both deep and shallow levels. UCEs are often reliably recovered from historical samples, unlocking a vast number of preserved natural history specimens for analysis. However, the extent to which sample age and preservation method impact UCE recovery as well as downstream inferences remains unclear. Furthermore, there is an ongoing debate on how to curate, filter, and properly analyze UCE data when locus recovery is uneven across sample age and quality. In the present study we address these questions with an empirical dataset composed of over 3800 UCE loci from 219 historical and modern samples of Sciuridae, a globally distributed and ecologically important family of rodents. We provide a genome-scale phylogeny of two squirrel subfamilies (Sciurillinae and Sciurinae: Sciurini) and investigate their placement within Sciuridae. For historical specimens, recovery of UCE loci and mean length per locus were inversely related to sample age; deeper sequencing improved the number of UCE loci recovered but not locus length. Most of our phylogenetic inferences (performed on six datasets with alternative data-filtering strategies, and using three distinct optimality criteria) resulted in distinct topologies. Datasets containing more loci (40% and 50% taxa representativeness matrices) yielded more concordant topologies and higher support values than strictly filtered datasets (60% matrices), particularly with IQ-Tree and SVDquartets, while filtering based on information content provided better topological resolution for inferences with the coalescent gene-tree based approach in ASTRAL-III. 
We resolved deep relationships in Sciuridae (including among the five currently recognized subfamilies) and relationships among the deepest branches of Sciurini, but conflicting relationships remain at both genus- and species-levels for the rapid Neotropical tree squirrel radiation. Our results suggest that phylogenomic consensus can be difficult and heavily influenced by the age of available samples and the filtering steps used to optimize dataset properties.
Affiliation(s)
- Edson F Abreu
  - Laboratório de Mamíferos, Departamento de Ciências Biológicas, Escola Superior de Agricultura Luiz de Queiroz, Universidade de São Paulo, Piracicaba, SP, Brazil; Center for Conservation Genomics, Smithsonian National Zoo and Conservation Biology Institute, Washington, DC, USA
- Silvia E Pavan
  - Center for Conservation Genomics, Smithsonian National Zoo and Conservation Biology Institute, Washington, DC, USA
- Mirian T N Tsuchiya
  - Center for Conservation Genomics, Smithsonian National Zoo and Conservation Biology Institute, Washington, DC, USA; Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
- Bryan S McLean
  - Department of Biology, University of North Carolina Greensboro, Greensboro, NC, USA
- Don E Wilson
  - Division of Mammals, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Alexandre R Percequillo
  - Laboratório de Mamíferos, Departamento de Ciências Biológicas, Escola Superior de Agricultura Luiz de Queiroz, Universidade de São Paulo, Piracicaba, SP, Brazil
- Jesús E Maldonado
  - Center for Conservation Genomics, Smithsonian National Zoo and Conservation Biology Institute, Washington, DC, USA
7
Mai U, Mirarab S. Completing gene trees without species trees in sub-quadratic time. Bioinformatics 2022; 38:1532-1541. [PMID: 34978565 DOI: 10.1093/bioinformatics/btab875] [Received: 06/23/2021] [Revised: 11/27/2021] [Accepted: 12/30/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION: As genome-wide reconstruction of phylogenetic trees becomes more widespread, limitations of available data are being appreciated more than ever before. One issue is that phylogenomic datasets are riddled with missing data, and gene trees, in particular, almost always lack representatives from some species otherwise available in the dataset. Since many downstream applications of gene trees require or can benefit from access to complete gene trees, it will be beneficial to algorithmically complete gene trees. Also, gene trees are often unrooted, and rooting them is useful for downstream applications. While completing and rooting a gene tree with respect to a given species tree has been studied, those problems have not been studied in depth for the case where such a reference species tree is lacking. RESULTS: We study completion of gene trees without a need for a reference species tree. We formulate an optimization problem to complete the gene trees while minimizing their quartet distance to the given set of gene trees. We extend a seminal algorithm by Brodal et al. to solve this problem in quasi-linear time. In simulated studies and on a large empirical dataset, we show that completion of gene trees using other gene trees is relatively accurate and, unlike the case where a species tree is available, is unbiased. AVAILABILITY AND IMPLEMENTATION: Our method, tripVote, is available at https://github.com/uym2/tripVote. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
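The quartet distance that the optimization minimizes can be made concrete with a brute-force Python sketch. This is an illustration only, assuming unrooted trees encoded as one side of each internal bipartition; it enumerates every four-taxon subset, so it is far slower than the quasi-linear algorithm described above, and it is not the tripVote implementation. All names here are hypothetical.

```python
from itertools import combinations

def quartet_topology(splits, quartet):
    """Return the pairing a tree induces on four taxa, or None if the quartet is unresolved.

    `splits` holds one side of each internal bipartition of an unrooted tree.
    """
    for side in splits:
        inside = [t for t in quartet if t in side]
        if len(inside) == 2:
            outside = [t for t in quartet if t not in side]
            return frozenset({frozenset(inside), frozenset(outside)})
    return None

def quartet_distance(splits1, splits2, taxa):
    """Count four-taxon subsets on which the two trees induce different topologies.

    A quartet resolved in one tree but unresolved in the other counts as different.
    """
    return sum(
        quartet_topology(splits1, q) != quartet_topology(splits2, q)
        for q in combinations(sorted(taxa), 4)
    )

taxa = set("ABCDE")
# Unrooted tree ((A,B),C,(D,E))
tree1 = [frozenset("AB"), frozenset("DE")]
# Unrooted tree ((A,C),B,(D,E))
tree2 = [frozenset("AC"), frozenset("DE")]

print(quartet_distance(tree1, tree2, taxa))
```

Of the five quartets on these taxa, only the two that contain A, B and C together are resolved differently, so the toy trees differ on two quartets.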
Affiliation(s)
- Uyen Mai
  - Department of Computer Science and Engineering, University of California San Diego, San Diego, CA 92093, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA
8
McLean BS, Bell KC, Cook JA. SNP-based Phylogenomic Inference in Holarctic Ground Squirrels (Urocitellus). Mol Phylogenet Evol 2022; 169:107396. [PMID: 35031463 DOI: 10.1016/j.ympev.2022.107396] [Received: 08/04/2021] [Revised: 12/02/2021] [Accepted: 12/08/2021] [Indexed: 11/24/2022]
Abstract
Resolution of rapid evolutionary radiations requires harvesting maximal signal from phylogenomic datasets. However, studies of non-model clades often target conserved loci that are characterized by reduced information content, which can negatively affect gene tree precision and species tree accuracy. Single nucleotide polymorphism (SNP)-based methods are an underutilized but potentially valuable tool for estimating phylogeny and divergence times because they do not rely on resolved gene trees, allowing information from many or all variant loci to be leveraged in species tree reconstruction. We evaluated the utility of SNP-based methods in resolving phylogeny of Holarctic ground squirrels (Urocitellus), a radiation that has been difficult to disentangle, even in prior phylogenomic studies. We inferred phylogeny from a dataset of >3,000 ultraconserved element loci (UCEs) using two methods (SNAPP, SVDquartets) and compared our results with a new mitogenome phylogeny. We also systematically evaluated how phasing of UCEs improves per-locus information content, and inference of topology and other parameters within each of these SNP-based methods. Phasing improved topological resolution and branch length estimation at shallow levels (within species complexes), but less so at deeper levels, likely reflecting true uncertainty due to ancestral polymorphisms segregating in these rapidly diverging lineages. We resolved several key clades in Urocitellus and present targeted opportunities for future phylogenomic inquiry. Our results extend the roadmap for use of SNPs to address vertebrate radiations and support comparative analyses at multiple temporal scales.
Affiliation(s)
- Bryan S McLean
  - University of North Carolina Greensboro, Department of Biology, Greensboro, NC 27402 USA
- Kayce C Bell
  - Natural History Museum of Los Angeles County, Department of Mammalogy, Los Angeles, CA 90007 USA
- Joseph A Cook
  - University of New Mexico, Department of Biology and Museum of Southwestern Biology, Albuquerque, NM 87131 USA
9
Lyu R, He J, Luo Y, Lin L, Yao M, Cheng J, Xie L, Pei L, Yan S, Li L. Natural Hybrid Origin of the Controversial "Species" Clematis × pinnata (Ranunculaceae) Based on Multidisciplinary Evidence. Front Plant Sci 2021; 12:745988. [PMID: 34712260 PMCID: PMC8545901 DOI: 10.3389/fpls.2021.745988] [Received: 07/23/2021] [Accepted: 09/22/2021] [Indexed: 05/23/2023]
Abstract
Interspecific hybridization is common and has often been viewed as a driving force of plant diversity. However, it raises taxonomic problems and thus impacts biodiversity estimation and biological conservation. Although previous molecular phylogenetic studies suggested that interspecific hybridization may be rather common in Clematis, and artificial hybridization has been widely applied to produce new Clematis cultivars for nearly two centuries, the issue of natural hybridization of Clematis has never been addressed in detail. In this study, we tested the hybrid origin and species status of a mesophytic and cold-adapted vine, Clematis pinnata, a rare and taxonomically controversial taxon endemic to northern China, using field investigations, flow cytometry (FCM), phylogenomic analysis, morphological statistics, and niche modeling. The FCM results showed that all the tested species were homoploid (2n = 16). Phylonet and HyDe analyses based on transcriptome data showed the hybrid origins of C. × pinnata from either C. brevicaudata × C. heracleifolia or C. brevicaudata × C. tubulosa. The plastome phylogeny indicated that C. × pinnata at different sampling sites originated from different hybridization events. Morphological analysis showed intermediacy of C. × pinnata between its putative parental species in many qualitative and quantitative characters. Niche modeling results suggested that C. × pinnata had not adapted to a novel ecological niche independent of its putative parents. These findings demonstrated that plants of C. × pinnata did not form a self-evolved clade and should not be treated as a species. The present study also suggests that interspecific hybridization is a common mechanism in Clematis to generate diversity and variation, and it may play an important role in the evolution and diversification of this genus. 
Our study implies that morphological diversity caused by natural hybridization may overstate the real species diversity in Clematis.
Affiliation(s)
- Rudan Lyu
  - School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, China
- Jian He
  - School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, China
- Yike Luo
  - School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, China
- Lele Lin
  - School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, China
- Min Yao
  - School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, China
- Jin Cheng
  - College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Lei Xie
  - School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, China
- Linying Pei
  - Beijing Engineering Research Center for Landscape Plant, Beijing Forestry University Forest Science Co. Ltd., Beijing, China
- Shuangxi Yan
  - College of Landscape Architecture and Art, Henan Agricultural University, Zhengzhou, China
- Liangqian Li
  - Institute of Botany, The Chinese Academy of Sciences, Beijing, China
10
Mongiardino Koch N. Phylogenomic Subsampling and the Search for Phylogenetically Reliable Loci. Mol Biol Evol 2021; 38:4025-4038. [PMID: 33983409 DOI: 10.1101/2021.02.13.431075] [Indexed: 05/21/2023]
Abstract
Phylogenomic subsampling is a procedure by which small sets of loci are selected from large genome-scale data sets and used for phylogenetic inference. This step is often motivated by either computational limitations associated with the use of complex inference methods or as a means of testing the robustness of phylogenetic results by discarding loci that are deemed potentially misleading. Although many alternative methods of phylogenomic subsampling have been proposed, little effort has gone into comparing their behavior across different data sets. Here, I calculate multiple gene properties for a range of phylogenomic data sets spanning animal, fungal, and plant clades, uncovering a remarkable predictability in their patterns of covariance. I also show how these patterns provide a means for ordering loci by both their rate of evolution and their relative phylogenetic usefulness. This method of retrieving phylogenetically useful loci is found to be among the top performing when compared with alternative subsampling protocols. Relatively common approaches such as minimizing potential sources of systematic bias or increasing the clock-likeness of the data are found to fare worse than selecting loci at random. Likewise, the general utility of rate-based subsampling is found to be limited: loci evolving at both low and high rates are among the least effective, and even those evolving at optimal rates can still widely differ in usefulness. This study shows that many common subsampling approaches introduce unintended effects in off-target gene properties and proposes an alternative multivariate method that simultaneously optimizes phylogenetic signal while controlling for known sources of bias.
12
Smith BT, Mauck WM, Benz BW, Andersen MJ. Uneven Missing Data Skew Phylogenomic Relationships within the Lories and Lorikeets. Genome Biol Evol 2021; 12:1131-1147. [PMID: 32470111 PMCID: PMC7486955 DOI: 10.1093/gbe/evaa113] [Accepted: 05/26/2020] [Indexed: 01/21/2023]
Abstract
The resolution of the Tree of Life has accelerated with advances in DNA sequencing technology. To achieve dense taxon sampling, it is often necessary to obtain DNA from historical museum specimens to supplement modern genetic samples. However, DNA from historical material is generally degraded, which presents various challenges. In this study, we evaluated how the coverage at variant sites and missing data among historical and modern samples impacts phylogenomic inference. We explored these patterns in the brush-tongued parrots (lories and lorikeets) of Australasia by sampling ultraconserved elements in 105 taxa. Trees estimated with low coverage characters had several clades where relationships appeared to be influenced by whether the sample came from historical or modern specimens, which were not observed when more stringent filtering was applied. To assess if the topologies were affected by missing data, we performed an outlier analysis of sites and loci, and a data reduction approach where we excluded sites based on data completeness. Depending on the outlier test, 0.15% of total sites or 38% of loci were driving the topological differences among trees, and at these sites, historical samples had 10.9× more missing data than modern ones. In contrast, 70% data completeness was necessary to avoid spurious relationships. Predictive modeling found that outlier analysis scores were correlated with parsimony informative sites in the clades whose topologies changed the most by filtering. After accounting for biased loci and understanding the stability of relationships, we inferred a more robust phylogenetic hypothesis for lories and lorikeets.
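A data reduction step based on site completeness can be sketched as follows. This is a hypothetical Python illustration, assuming an alignment held as a dictionary of equal-length strings and treating `N`, `?` and `-` as missing; the function names, the toy sequences and the default threshold are assumptions for the example and do not reproduce the authors' pipeline.

```python
def site_completeness(column, missing="N?-"):
    """Fraction of samples with a called base at one alignment column."""
    present = sum(1 for base in column if base.upper() not in missing)
    return present / len(column)

def filter_sites(alignment, min_completeness=0.7):
    """Keep only the columns whose completeness meets the threshold.

    `alignment` maps sample name -> aligned sequence (all of equal length).
    """
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    keep = [i for i, col in enumerate(columns)
            if site_completeness(col) >= min_completeness]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}

# Toy alignment: two modern and two historical samples (made-up data)
aln = {
    "modern_a":     "ACGTA",
    "modern_b":     "ACGTT",
    "historical_a": "AC-NA",
    "historical_b": "A?-NT",
}

filtered = filter_sites(aln, min_completeness=0.7)
for name, seq in filtered.items():
    print(name, seq)
```

With four samples, a column passes the 0.7 threshold only if at least three samples have a called base, so the two columns missing from both historical samples are dropped.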
Affiliation(s)
- Brian Tilston Smith: Department of Ornithology, American Museum of Natural History, New York, New York
- William M Mauck: Department of Ornithology, American Museum of Natural History, New York, New York; New York Genome Center, New York, New York
- Brett W Benz: Museum of Zoology and Department of Ecology and Evolutionary Biology, University of Michigan
- Michael J Andersen: Department of Biology and Museum of Southwestern Biology, University of New Mexico
13
Thomas AE, Igea J, Meudt HM, Albach DC, Lee WG, Tanentzap AJ. Using target sequence capture to improve the phylogenetic resolution of a rapid radiation in New Zealand Veronica. Am J Bot 2021; 108:1289-1306. [PMID: 34173225 DOI: 10.1002/ajb2.1678] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/08/2020] [Accepted: 03/10/2021] [Indexed: 05/08/2023]
Abstract
PREMISE: Recent, rapid radiations present a challenge for phylogenetic reconstruction. Fast successive speciation events typically lead to low sequence divergence and poorly resolved relationships with standard phylogenetic markers. Target sequence capture of many independent nuclear loci has the potential to improve phylogenetic resolution for rapid radiations. METHODS: Here we applied target sequence capture with 353 protein-coding genes (Angiosperms353 bait kit) to Veronica sect. Hebe (common name hebe) to determine its utility for improving the phylogenetic resolution of rapid radiations. Veronica section Hebe originated 5-10 million years ago in New Zealand, forming a monophyletic radiation of ca. 130 extant species. RESULTS: We obtained approximately 150 kbp of 353 protein-coding exons and an additional 200 kbp of flanking noncoding sequences for each of 77 hebe and two outgroup species. When comparing coding, noncoding, and combined data sets, we found that the latter provided the best overall phylogenetic resolution. While some deep nodes in the radiation remained unresolved, our phylogeny provided broad and often improved support for subclades identified by both morphology and standard markers in previous studies. Gene-tree discordance was nonetheless widespread, indicating that additional methods are needed to disentangle fully the history of the radiation. CONCLUSIONS: Phylogenomic target capture data sets both increase phylogenetic signal and deliver new insights into the complex evolutionary history of rapid radiations as compared with traditional markers. Improving methods to resolve remaining discordance among loci from target sequence capture is now important to facilitate the further study of rapid radiations.
Affiliation(s)
- Anne E Thomas: Ecosystems and Global Change Group, Department of Plant Sciences, University of Cambridge, Cambridge, UK
- Javier Igea: Ecosystems and Global Change Group, Department of Plant Sciences, University of Cambridge, Cambridge, UK
- Heidi M Meudt: Museum of New Zealand Te Papa Tongarewa, Wellington, New Zealand
- Dirk C Albach: Carl von Ossietzky-University, Oldenburg, D-26111, Germany
- William G Lee: Manaaki Whenua - Landcare Research Otago, Dunedin, New Zealand
- Andrew J Tanentzap: Ecosystems and Global Change Group, Department of Plant Sciences, University of Cambridge, Cambridge, UK
14
Parada A, Hanson J, D'Elía G. Ultraconserved Elements Improve the Resolution of Difficult Nodes within the Rapid Radiation of Neotropical Sigmodontine Rodents (Cricetidae: Sigmodontinae). Syst Biol 2021; 70:1090-1100. [PMID: 33787920 DOI: 10.1093/sysbio/syab023] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/07/2019] [Revised: 03/23/2021] [Accepted: 03/29/2021] [Indexed: 11/14/2022] Open
Abstract
Sigmodontine rodents (Cricetidae, Sigmodontinae) represent the second largest muroid subfamily and the most species-rich group of New World mammals, encompassing more than 410 living species and ca. 87 genera. Despite recent advances in clarifying sigmodontine phylogenetic relationships, the relationships among the 12 main groups of genera (i.e., tribes) remain poorly resolved, in particular among those forming the large clade Oryzomyalia. This pattern has been interpreted as a consequence of a rapid radiation upon the group's entrance into South America. Here, we attempted to resolve phylogenetic relationships within Sigmodontinae using target capture and high-throughput sequencing of ultraconserved elements (UCEs). We enriched and sequenced UCEs for 56 individuals and collected data from four already available genomes. Analyses of distinct data sets, based on the capture of 4,634 loci, resulted in a highly resolved phylogeny consistent across different methods. Coalescent species-tree based approaches, concatenated matrices, and Bayesian analyses recovered similar topologies that were congruent at the resolution of difficult nodes. We recovered good support for the intertribal relationships within Oryzomyalia; for instance, the tribe Oryzomyini appears as the sister taxon of the remaining oryzomyalid tribes. The estimates of divergence times agree with results of previous studies. We inferred the crown age of the sigmodontine rodents at the end of the Middle Miocene, while the main lineages of Oryzomyalia appear to have radiated in a short interval during the Late Miocene. Thus, the collection of a genomic-scale data set with wide taxonomic sampling provided resolution for the first time of the relationships among the main lineages of Sigmodontinae. We expect the phylogeny presented here will become the backbone for future systematic and evolutionary studies of the group.
Affiliation(s)
- Andrés Parada: Instituto de Ciencias Ambientales y Evolutivas, Facultad de Ciencias, Universidad Austral de Chile, Valdivia, Chile
- John Hanson: RTLGenomics, Lubbock, TX, USA; Department of Biology, Columbus State University, Columbus, GA, USA
- Guillermo D'Elía: Instituto de Ciencias Ambientales y Evolutivas, Facultad de Ciencias, Universidad Austral de Chile, Valdivia, Chile
15
Joubran SS, Cassin-Sackett L. Genomic resources for an ecologically important rodent, Gunnison’s prairie dogs (Cynomys gunnisoni). Conserv Genet Resour 2021. [DOI: 10.1007/s12686-021-01192-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
16
Collapsing dubiously resolved gene-tree branches in phylogenomic coalescent analyses. Mol Phylogenet Evol 2021; 158:107092. [PMID: 33545272 DOI: 10.1016/j.ympev.2021.107092] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/10/2020] [Revised: 12/30/2020] [Accepted: 01/28/2021] [Indexed: 01/15/2023]
Abstract
In two-step coalescent analyses of phylogenomic data, gene-tree topologies are treated as fixed prior to species-tree inference. Although all gene-tree conflict is assumed to be caused by lineage sorting when applying these methods, in empirical datasets much of the conflict can be caused by estimation error. Weakly supported and even arbitrarily resolved clades are important sources of this estimation error for gene trees inferred from few informative characters relative to the number of sampled terminals, and the resulting extraneous conflict among gene trees can negatively impact species-tree inference. In this study, we quantified the relative severity of alternative methods for collapsing gene-tree branches for seven empirical datasets and quantified their effects on species-tree inference. The branch-collapsing methods that we employed were based on the strict consensus of optimal topologies, various bootstrap thresholds, and 0% approximate likelihood ratio test (SH-like aLRT) support. Up to 86% of internal gene-tree branches are dubiously or arbitrarily resolved in reanalyses of these published phylogenomic datasets, and collapsing these branches increased inferred species-tree coalescent branch lengths by up to 455%. For two datasets, the longer inferred branch lengths sometimes impacted inference of anomaly-zone conditions. Although branch-collapsing methods did not consistently affect the species-tree topology, they often increased branch support. The more severe and clearly justified gene-tree branch-collapsing methods, which we recommend be broadly applied for two-step coalescent analyses, are use of the strict consensus in parsimony analyses and the collapse of clades with 0% SH-like aLRT support in likelihood analyses. Collapsing dubiously or arbitrarily resolved branches in gene trees sometimes improved congruence between coalescent-based results and concatenation trees.
In such cases, we contend that the resolution provided by concatenation should be preferred and that incomplete lineage sorting is a poor explanation for the initial conflict between phylogenetic approaches.
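The idea of collapsing gene-tree branches whose support is at or below a threshold can be sketched on a small in-memory tree. This is an illustration only, not the procedure or software used in the study: the `Node` class, the `collapse_low_support` and `to_newick` helpers, and the toy support values are hypothetical. With the default threshold of 0, an internal branch is dissolved into a polytomy only when its support is exactly 0 (or lower).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical minimal tree node: leaves have a name, clades have children."""
    name: Optional[str] = None
    support: Optional[float] = None  # branch support in percent, None if unknown
    children: List["Node"] = field(default_factory=list)


def collapse_low_support(node: Node, threshold: float = 0.0) -> Node:
    """Return a copy of the tree in which every internal branch with
    support <= threshold is removed, attaching its children to its parent."""
    new_children: List[Node] = []
    for child in node.children:
        child = collapse_low_support(child, threshold)
        is_internal = bool(child.children)
        if is_internal and child.support is not None and child.support <= threshold:
            # Dissolve the clade: its descendants join the parent as a polytomy.
            new_children.extend(child.children)
        else:
            new_children.append(child)
    return Node(node.name, node.support, new_children)


def to_newick(node: Node) -> str:
    """Topology-only Newick string (no supports or branch lengths)."""
    if not node.children:
        return node.name or ""
    return "(" + ",".join(to_newick(c) for c in node.children) + ")"


# Toy gene tree ((A,B)0,(C,D)95): the (A,B) clade has 0% support.
tree = Node(children=[
    Node(support=0.0, children=[Node("A"), Node("B")]),
    Node(support=95.0, children=[Node("C"), Node("D")]),
])
collapsed = collapse_low_support(tree)
newick = to_newick(collapsed)
```

Raising `threshold` mimics the less conservative bootstrap-based cutoffs that the abstract compares against.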
17
Jiang X, Edwards SV, Liu L. The Multispecies Coalescent Model Outperforms Concatenation Across Diverse Phylogenomic Data Sets. Syst Biol 2021; 69:795-812. [PMID: 32011711 PMCID: PMC7302055 DOI: 10.1093/sysbio/syaa008] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 04/01/2019] [Revised: 12/24/2019] [Accepted: 01/02/2020] [Indexed: 11/30/2022] Open
Abstract
A statistical framework of model comparison and model validation is essential to resolving the debates over concatenation and coalescent models in phylogenomic data analysis. A set of statistical tests are here applied and developed to evaluate and compare the adequacy of substitution, concatenation, and multispecies coalescent (MSC) models across 47 phylogenomic data sets collected across the tree of life. Tests for substitution models and the concatenation assumption of topologically congruent gene trees suggest that a poor fit of substitution models, rejected by 44% of loci, and concatenation models, rejected by 38% of loci, is widespread. Logistic regression shows that the proportions of GC content and informative sites are both negatively correlated with the fit of substitution models across loci. Moreover, a substantial violation of the concatenation assumption of congruent gene trees is consistently observed across six major groups (birds, mammals, fish, insects, reptiles, and others, including other invertebrates). In contrast, among those loci adequately described by a given substitution model, the proportion of loci rejecting the MSC model is 11%, significantly lower than those rejecting the substitution and concatenation models. Although conducted on reduced data sets due to computational constraints, Bayesian model validation and comparison both strongly favor the MSC over concatenation across all data sets; the concatenation assumption of congruent gene trees rarely holds for phylogenomic data sets with more than 10 loci. Thus, for large phylogenomic data sets, model comparisons are expected to consistently and more strongly favor the coalescent model over the concatenation model. We also found that loci rejecting the MSC have little effect on species tree estimation.
Our study reveals the value of model validation and comparison in phylogenomic data analysis, as well as the need for further improvements of multilocus models and computational tools for phylogenetic inference. [Bayes factor; Bayesian model validation; coalescent prior; congruent gene trees; independent prior; Metazoa; posterior predictive simulation.]
Affiliation(s)
- Xiaodong Jiang: Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602, USA
- Scott V Edwards: Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard, 26 Oxford Street, Cambridge, MA 02138, USA
- Liang Liu: Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602, USA; Institute of Bioinformatics, University of Georgia, 120 Green Street, Athens, GA 30602, USA
18
Menéndez I, Gómez Cano AR, Cantalapiedra JL, Peláez‐Campomanes P, Álvarez‐Sierra MÁ, Hernández Fernández M. A multi‐layered approach to the diversification of squirrels. Mamm Rev 2020. [DOI: 10.1111/mam.12215] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/17/2023]
Affiliation(s)
- Iris Menéndez: Departamento de Geodinámica, Estratigrafía y Paleontología, Facultad de Ciencias Geológicas, Universidad Complutense de Madrid, C/ José Antonio Novais 12, Madrid 28040, Spain; Departamento de Cambio Medioambiental, Instituto de Geociencias (UCM, CSIC), C/ Severo Ochoa 7, Madrid 28040, Spain
- Juan L. Cantalapiedra: Departamento de Ciencias de la Vida, GloCEE Global Change Ecology and Evolution Research Group, Universidad de Alcalá, Plaza de San Diego s/n, Alcalá de Henares, Madrid 28801, Spain
- Pablo Peláez‐Campomanes: Departamento de Paleobiología, Museo Nacional de Ciencias Naturales, MNCN‐CSIC, C/ José Gutiérrez Abascal, 2, Madrid 28006, Spain
- María Ángeles Álvarez‐Sierra: Departamento de Geodinámica, Estratigrafía y Paleontología, Facultad de Ciencias Geológicas, Universidad Complutense de Madrid, C/ José Antonio Novais 12, Madrid 28040, Spain; Departamento de Cambio Medioambiental, Instituto de Geociencias (UCM, CSIC), C/ Severo Ochoa 7, Madrid 28040, Spain
- Manuel Hernández Fernández: Departamento de Geodinámica, Estratigrafía y Paleontología, Facultad de Ciencias Geológicas, Universidad Complutense de Madrid, C/ José Antonio Novais 12, Madrid 28040, Spain; Departamento de Cambio Medioambiental, Instituto de Geociencias (UCM, CSIC), C/ Severo Ochoa 7, Madrid 28040, Spain
19
Chan KO, Hutter CR, Wood PL, Grismer LL, Das I, Brown RM. Gene flow creates a mirage of cryptic species in a Southeast Asian spotted stream frog complex. Mol Ecol 2020; 29:3970-3987. [PMID: 32808335 DOI: 10.1111/mec.15603] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 11/06/2019] [Revised: 07/29/2020] [Accepted: 08/13/2020] [Indexed: 02/06/2023]
Abstract
Most new cryptic species are described using conventional tree- and distance-based species delimitation methods (SDMs), which rely on phylogenetic arrangements and measures of genetic divergence. However, although numerous factors such as population structure and gene flow are known to confound phylogenetic inference and species delimitation, the influence of these processes is not frequently evaluated. Using large numbers of exons, introns, and ultraconserved elements obtained using the FrogCap sequence-capture protocol, we compared conventional SDMs with more robust genomic analyses that assess population structure and gene flow to characterize species boundaries in a Southeast Asian frog complex (Pulchrana picturata). Our results showed that gene flow and introgression can produce phylogenetic patterns and levels of divergence that resemble distinct species (up to 10% divergence in mitochondrial DNA). Hybrid populations were inferred as independent (singleton) clades that were highly divergent from adjacent populations (7%-10%) and unusually similar (<3%) to allopatric populations. Such anomalous patterns are not uncommon in Southeast Asian amphibians, which brings into question whether the high levels of cryptic diversity observed in other amphibian groups reflect distinct cryptic species or, instead, highly admixed and structured metapopulation lineages. Our results also provide an alternative explanation to the conundrum of divergent (sometimes nonsister) sympatric lineages, a pattern that has been celebrated as indicative of true cryptic speciation. Based on these findings, we recommend that species delimitation of continuously distributed "cryptic" groups should not rely solely on conventional SDMs, but should necessarily examine population structure and gene flow to avoid taxonomic inflation.
Affiliation(s)
- Kin O Chan: Lee Kong Chian Natural History Museum, Faculty of Science, National University of Singapore, Singapore
- Carl R Hutter: Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA; Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Perry L Wood: Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA; Department of Biological Sciences & Museum of Natural History, Auburn University, Auburn, AL, USA
- L L Grismer: Herpetology Laboratory, Department of Biology, La Sierra University, Riverside, CA, USA
- Indraneil Das: Institute of Biodiversity and Environmental Conservation, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
- Rafe M Brown: Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA
20
Chan KO, Hutter CR, Wood PL, Grismer LL, Brown RM. Larger, unfiltered datasets are more effective at resolving phylogenetic conflict: Introns, exons, and UCEs resolve ambiguities in Golden-backed frogs (Anura: Ranidae; genus Hylarana). Mol Phylogenet Evol 2020; 151:106899. [PMID: 32590046 DOI: 10.1016/j.ympev.2020.106899] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/03/2020] [Revised: 05/18/2020] [Accepted: 06/17/2020] [Indexed: 01/01/2023]
Abstract
Using FrogCap, a recently-developed sequence-capture protocol, we obtained >12,000 highly informative exons, introns, and ultraconserved elements (UCEs), which we used to illustrate variation in evolutionary histories of these classes of markers, and to resolve long-standing systematic problems in Southeast Asian Golden-backed frogs of the genus-complex Hylarana. We also performed a comprehensive suite of analyses to assess the relative performance of different genetic markers, data filtering strategies, tree inference methods, and different measures of branch support. To reduce gene tree estimation error, we filtered the data using different thresholds of taxon completeness (missing data) and parsimony informative sites (PIS). We then estimated species trees using concatenated datasets and Maximum Likelihood (IQ-TREE) in addition to summary (ASTRAL-III), distance-based (ASTRID), and site-based (SVDQuartets) multispecies coalescent methods. Topological congruence and branch support were examined using traditional bootstrap, local posterior probabilities, gene concordance factors, quartet frequencies, and quartet scores. Our results did not yield a single concordant topology. Instead, introns, exons, and UCEs clearly possessed different phylogenetic signals, resulting in conflicting, yet strongly-supported phylogenetic estimates. However, a combined analysis comprising the most informative introns, exons, and UCEs converged on a similar topology across all analyses, with the exception of SVDQuartets. Bootstrap values were consistently high despite high levels of incongruence and high proportions of gene trees supporting conflicting topologies. Although low bootstrap values did indicate low heuristic support, high bootstrap support did not necessarily reflect congruence or support for the correct topology. 
This study reiterates findings of some previous studies, which demonstrated that traditional bootstrap values can produce positively misleading measures of support in large phylogenomic datasets. We also showed a remarkably strong positive relationship between branch length and topological congruence across all datasets, implying that very short internodes remain a challenge to resolve, even with orders of magnitude more data than ever before. Overall, our results demonstrate that more data from unfiltered or combined datasets produced superior results. Although data filtering reduced gene tree incongruence, decreased amounts of data also biased phylogenetic estimation. A point of diminishing returns was evident, at which higher congruence (from more stringent filtering) at the expense of amount of data led to topological error as assessed by comparison to more complete datasets across different genomic markers. Additionally, we showed that applying a parameter-rich model to a partitioned analysis of concatenated data produces better results compared to unpartitioned, or even partitioned analysis using model selection. Despite some lingering uncertainties, a combined analysis of our genomic data and sequences supplemented from GenBank (on the basis of a few gene regions) revealed highly supported novel systematic arrangements. Based on these new findings, we transfer Amnirana nicobariensis into the genus Indosylvirana; and I. milleti and Hylarana celebensis to the genus Papurana. We also provisionally place H. attigua in the genus Papurana pending verification from positively identified (voucher substantiated) samples.
Affiliation(s)
- Kin Onn Chan: Lee Kong Chian Natural History Museum, Faculty of Science, National University of Singapore, 2 Conservatory Drive, 117377, Singapore
- Carl R Hutter: Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA; Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Perry L Wood: Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA; Department of Biological Sciences & Museum of Natural History, Auburn University, Auburn, AL 36849, USA
- L Lee Grismer: Herpetology Laboratory, Department of Biology, La Sierra University, 4500 Riverwalk Parkway, Riverside, CA 92505, USA
- Rafe M Brown: Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA
21
Morales-Briones DF, Kadereit G, Tefarikis DT, Moore MJ, Smith SA, Brockington SF, Timoneda A, Yim WC, Cushman JC, Yang Y. Disentangling Sources of Gene Tree Discordance in Phylogenomic Data Sets: Testing Ancient Hybridizations in Amaranthaceae s.l. Syst Biol 2020; 70:219-235. [PMID: 32785686 PMCID: PMC7875436 DOI: 10.1093/sysbio/syaa066] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 10/04/2019] [Revised: 03/01/2020] [Accepted: 09/03/2020] [Indexed: 12/26/2022] Open
Abstract
Gene tree discordance in large genomic data sets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as model violation, and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The data set included seven genomes and 88 transcriptomes, 17 generated for this study. We examined gene-tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be a product of an ancient and rapid lineage diversification, and remains, and probably will remain, unresolved. This work highlights the potential problems of identifiability associated with the sources of gene tree discordance including, in particular, phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations. 
[Amaranthaceae; gene tree discordance; hybridization; incomplete lineage sorting; phylogenomics; species network; species tree; transcriptomics.]
Affiliation(s)
- Diego F Morales-Briones: Department of Plant and Microbial Biology, University of Minnesota-Twin Cities, 1445 Gortner Avenue, St. Paul, MN 55108, USA
- Gudrun Kadereit: Institut für Molekulare Physiologie, Johannes Gutenberg-Universität Mainz, D-55099 Mainz, Germany
- Delphine T Tefarikis: Institut für Molekulare Physiologie, Johannes Gutenberg-Universität Mainz, D-55099 Mainz, Germany
- Michael J Moore: Department of Biology, Oberlin College, Science Center K111, 119 Woodland Street, Oberlin, OH 44074-1097, USA
- Stephen A Smith: Department of Ecology & Evolutionary Biology, University of Michigan, 830 North University Avenue, Ann Arbor, MI 48109-1048, USA
- Samuel F Brockington: Department of Plant Sciences, University of Cambridge, Tennis Court Road, Cambridge CB2 3EA, UK
- Alfonso Timoneda: Department of Plant Sciences, University of Cambridge, Tennis Court Road, Cambridge CB2 3EA, UK
- Won C Yim: Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV, 89577, USA
- John C Cushman: Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV, 89577, USA
- Ya Yang: Department of Plant and Microbial Biology, University of Minnesota-Twin Cities, 1445 Gortner Avenue, St. Paul, MN 55108, USA
22
Simmons MP, Kessenich J. Divergence and support among slightly suboptimal likelihood gene trees. Cladistics 2019; 36:322-340. [DOI: 10.1111/cla.12404] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Accepted: 09/11/2019] [Indexed: 12/18/2022] Open
Affiliation(s)
- Mark P. Simmons: Department of Biology, Colorado State University, Fort Collins, CO 80523‐1878, USA
- John Kessenich: 305 W. Magnolia Street, PMB 134, Fort Collins, CO 80521, USA
23
Gomes-Da-Silva J, Santos-Silva F, Forzza RC. Does nomenclatural stability justify para/polyphyletic taxa? A phylogenetic classification in the xeric clade Pitcairnioideae (Bromeliaceae). Syst Biodivers 2019. [DOI: 10.1080/14772000.2019.1646834] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]
Affiliation(s)
- Janaína Gomes-Da-Silva: Jardim Botânico do Rio de Janeiro, Rua Pacheco Leão, 915, Rio de Janeiro, RJ, 22460-030, Brazil; Programa de Pós-Graduação em Botânica, Universidade Federal do Paraná, Av. Francisco Heráclito dos Santos s.n., Campus do Centro Politécnico, Curitiba, PR, 81531-980, Brazil
- Fernanda Santos-Silva: Departamento de Ciências Exatas e Naturais, Universidade Estadual do Sudoeste da Bahia, Campus Universitário Juvino Oliveira, Rodovia BR 415, km 04, Itapetinga, BA, 45700-000, Brazil
24
Mongiardino Koch N, Coppard SE, Lessios HA, Briggs DEG, Mooi R, Rouse GW. A phylogenomic resolution of the sea urchin tree of life. BMC Evol Biol 2018; 18:189. [PMID: 30545284 PMCID: PMC6293586 DOI: 10.1186/s12862-018-1300-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 06/13/2018] [Accepted: 11/19/2018] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND: Echinoidea is a clade of marine animals including sea urchins, heart urchins, sand dollars and sea biscuits. Found in benthic habitats across all latitudes, echinoids are key components of marine communities such as coral reefs and kelp forests. A little over 1000 species inhabit the oceans today, a diversity that traces its roots back at least to the Permian. Although much effort has been devoted to elucidating the echinoid tree of life using a variety of morphological data, molecular attempts have relied on only a handful of genes. Both of these approaches have had limited success at resolving the deepest nodes of the tree, and their disagreement over the positions of a number of clades remains unresolved. RESULTS: We performed de novo sequencing and assembly of 17 transcriptomes to complement available genomic resources of sea urchins and produce the first phylogenomic analysis of the clade. Multiple methods of probabilistic inference recovered identical topologies, with virtually all nodes showing maximum support. In contrast, the coalescent-based method ASTRAL-II resolved one node differently, a result apparently driven by gene tree error induced by evolutionary rate heterogeneity. Regardless of the method employed, our phylogenetic structure deviates from the currently accepted classification of echinoids, with neither Acroechinoidea (all euechinoids except echinothurioids), nor Clypeasteroida (sand dollars and sea biscuits) being monophyletic as currently defined. We show that phylogenetic signal for novel resolutions of these lineages is strong and distributed throughout the genome, and fail to recover systematic biases as drivers of our results. CONCLUSIONS: Our investigation substantially augments the molecular resources available for sea urchins, providing the first transcriptomes for many of its main lineages.
Using this expanded genomic dataset, we resolve the position of several clades in agreement with early molecular analyses but in disagreement with morphological data. Our efforts settle multiple phylogenetic uncertainties, including the position of the enigmatic deep-sea echinothurioids and the identity of the sister clade to sand dollars. We offer a detailed assessment of evolutionary scenarios that could reconcile our findings with morphological evidence, opening up new lines of research into the development and evolutionary history of this ancient clade.
Affiliation(s)
- Simon E. Coppard: Department of Biology, Hamilton College, Clinton, NY, USA; Smithsonian Tropical Research Institute, Balboa, Panama
- Derek E. G. Briggs: Department of Geology and Geophysics, Yale University, New Haven, CT, USA; Peabody Museum of Natural History, Yale University, New Haven, CT, USA
- Rich Mooi: Department of Invertebrate Zoology and Geology, California Academy of Sciences, San Francisco, CA, USA
- Greg W. Rouse: Scripps Institution of Oceanography, UC San Diego, La Jolla, CA, USA