1
Zhang Z, Liu G, Li M. Incomplete lineage sorting and gene flow within Allium (Amayllidaceae). Mol Phylogenet Evol 2024; 195:108054. [PMID: 38471599 DOI: 10.1016/j.ympev.2024.108054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 02/01/2024] [Accepted: 03/07/2024] [Indexed: 03/14/2024]
Abstract
The phylogeny and systematics of the genus Allium have been studied with a variety of data types, including an increasing amount of molecular data. However, strong phylogenetic discordance and high levels of uncertainty have prevented the identification of a consistent phylogeny. The difficulty in establishing phylogenetic consensus and the evidence for genealogical discordance make Allium a compelling test case for assessing the relative contributions of incomplete lineage sorting (ILS), gene flow and gene tree estimation error to phylogenetic reconstruction. In this study, we obtained 75 transcriptomes of 38 Allium species across 10 subgenera. Whole plastid genomes, single-copy genes and consensus CDS were generated to estimate phylogenetic trees using both coalescence and concatenation methods. Multiple approaches, including coalescence simulation, quartet sampling, reticulate network inference, sequence simulation, theta of ILS and reticulation index, were carried out across the CDS gene trees to investigate the degrees of ILS, gene flow and gene tree estimation error. Afterward, a regression analysis was used to test the relative contributions of each of these forms of uncertainty to the final phylogeny. Despite extensive topological discordance among gene trees, we found a fully supported species tree that agrees with most well-accepted relationships and establishes monophyly of the genus Allium. We present clear evidence for substantial ILS across the phylogeny of Allium. Further, we identified two ancient hybridization events underlying the formation of the second evolutionary line and subg. Butomissa, as well as several introgression events between recently diverged species. Our regression analysis revealed that gene tree inference error and gene flow were the two most dominant factors explaining the overall gene tree variation, although the effects of ILS and gene tree estimation error were difficult to disentangle because of a positive correlation between them. Based on our efforts to mitigate methodological errors in reconstructing trees, we believe ILS and gene flow are the two principal reasons for the oft-reported phylogenetic heterogeneity of Allium. This study presents a strongly supported and well-resolved phylogenetic backbone for the sampled Allium species, and exemplifies how to untangle heterogeneity in phylogenetic signal and reconstruct the true evolutionary history of the target taxa.
Affiliation(s)
- ZengZhu Zhang
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, People's Republic of China
- Gang Liu
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, People's Republic of China
- Minjie Li
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, People's Republic of China.
2
Lähteenaro M, Benda D, Straka J, Nylander JAA, Bergsten J. Phylogenomic analysis of Stylops reveals the evolutionary history of a Holarctic Strepsiptera radiation parasitizing wild bees. Mol Phylogenet Evol 2024; 195:108068. [PMID: 38554985 DOI: 10.1016/j.ympev.2024.108068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 03/07/2024] [Accepted: 03/24/2024] [Indexed: 04/02/2024]
Abstract
Holarctic Stylops is the largest genus of the enigmatic insect order Strepsiptera, twisted winged parasites. Members of Stylops are obligate endoparasites of Andrena mining bees and exhibit extreme sexual dimorphism typical of Strepsiptera. So far, molecular studies on Stylops have focused on questions on species delimitation. Here, we utilize the power of whole genome sequencing to infer the phylogeny of this morphologically challenging genus from thousands of loci. We use a species tree method, concatenated maximum likelihood analysis and Bayesian analysis with a relaxed clock model to reconstruct the phylogeny of 46 Stylops species, estimate divergence times, evaluate topological consistency across methods and infer the root position. Furthermore, the biogeographical history and coevolutionary patterns with host species are assessed. All methods recovered a well resolved topology with close to all nodes maximally supported and only a handful of minor topological variations. Based on the result, we find that included species can be divided into 12 species groups, seven of them including only Palaearctic species, three Nearctic and two were geographically mixed. We find a strongly supported root position between a clade formed by the spreta, thwaitesi and gwynanae species groups and the remaining species and that the sister group of Stylops is Eurystylops or Eurystylops + Kinzelbachus. Our results indicate that Stylops originated in the Western Palaearctic or Western Palaearctic and Nearctic in the early Neogene or late Paleogene, with four independent dispersal events to the Nearctic. Cophylogenetic analyses indicate that the diversification of Stylops has been shaped by both significant coevolution with the mining bee hosts and host-shifting. The well resolved and strongly supported phylogeny will provide a valuable phylogenetic basis for further studies into the fascinating world of Strepsipterans.
Affiliation(s)
- Meri Lähteenaro
  - Department of Zoology, Swedish Museum of Natural History, P. O. Box 50007, SE-104 05 Stockholm, Sweden; Department of Zoology, Faculty of Science, Stockholm University, SE-106 91 Stockholm, Sweden.
- Daniel Benda
  - Department of Zoology, Faculty of Science, Charles University, Vinicna 7, CZ-128 44, Prague 2, Czech Republic; Department of Entomology, National Museum of the Czech Republic, Cirkusová 1740, CZ-19300 Prague 9, Czech Republic.
- Jakub Straka
  - Department of Zoology, Faculty of Science, Charles University, Vinicna 7, CZ-128 44, Prague 2, Czech Republic.
- Johan A A Nylander
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, P.O. Box 50007, SE-106 91 Stockholm, Sweden.
- Johannes Bergsten
  - Department of Zoology, Swedish Museum of Natural History, P. O. Box 50007, SE-104 05 Stockholm, Sweden; Department of Zoology, Faculty of Science, Stockholm University, SE-106 91 Stockholm, Sweden.
3
Sinaiko G, Cao Y, Dietrich CH. Phylogenomics of the leafhopper genus Neoaliturus Distant, 1918 (Hemiptera: Cicadellidae: Deltocephalinae) reveals genetically divergent lineages in the invasive beet leafhopper. Mol Phylogenet Evol 2024; 195:108071. [PMID: 38579933 DOI: 10.1016/j.ympev.2024.108071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 03/20/2024] [Accepted: 04/02/2024] [Indexed: 04/07/2024]
Abstract
Phylogenomic analysis based on nucleotide sequences of 398 nuclear gene loci for 67 representatives of the leafhopper genus Neoaliturus yielded well-resolved estimates of relationships among species of the genus. Subgenus Neoaliturus (Neoaliturus) is consistently paraphyletic with respect to Neoaliturus (Circulifer). The analysis revealed the presence of at least ten genetically divergent clades among specimens consistent with the previous morphology-based definition of the leafhopper genus "Circulifer" which includes three previously recognized "species complexes." Specimens of the American beet leafhopper, N. tenellus (Baker), collected from the southwestern USA consistently group with one of these clades, comprising specimens from the eastern Mediterranean. Some of the remaining lineages are consistent with ecological differences previously observed among eastern Mediterranean populations and suggest that N. tenellus, as previously defined, comprises multiple monophyletic species, distinguishable by slight morphological differences.
Affiliation(s)
- Guy Sinaiko
  - School of Zoology, Tel-Aviv University, Tel-Aviv 6997801, Israel.
- Yanghui Cao
  - Key Laboratory of Plant Protection Resources and Pest Management of the Ministry of Education, Entomological Museum, Northwest A&F University, Yangling, Shaanxi 712100, China
- Christopher H Dietrich
  - Illinois Natural History Survey, Prairie Research Institute, University of Illinois, Champaign, IL 61820, USA
4
Naranjo AA, Edwards CE, Gitzendanner MA, Soltis DE, Soltis PS. Abundant incongruence in a clade endemic to a biodiversity hotspot: Phylogenetics of the scrub mint clade (Lamiaceae). Mol Phylogenet Evol 2024; 192:108014. [PMID: 38199595 DOI: 10.1016/j.ympev.2024.108014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 12/26/2023] [Accepted: 01/06/2024] [Indexed: 01/12/2024]
Abstract
The Scrub Mint clade (Lamiaceae) provides a unique system for investigating the evolutionary processes driving diversification in the North American Coastal Plain from both a systematic and biogeographic context. The clade comprises Dicerandra, Conradina, Piloblephis, Stachydeoma, and four species of the broadly defined genus Clinopodium (Mentheae; Lamiaceae), almost all of which are endemic to the North American Eastern Coastal Plain. Most species of this clade are threatened or endangered and restricted to sandhill or a mosaic of scrub habitats. We analyzed relationships in this clade to understand the evolution of the group and identify evolutionary mechanisms acting on the clade, with important implications for conservation. We used a target-capture method to sequence and analyze 238 nuclear loci across all species of scrub mints, reconstructed the phylogeny, and calculated gene tree concordance, gene tree estimation error, and reticulation indices for every node in the tree using ML methods. Phylogenetic networks were used to determine reticulation events. Our nuclear phylogenetic estimates were consistent with previous results, while greatly increasing the robustness of taxon sampling. The phylogeny resolved the full relationship between Dicerandra and Conradina and the less-studied members of the clade (Piloblephis, Stachydeoma, Clinopodium spp.). We found hotspots of gene tree discordance and reticulation throughout the tree, especially in perennial Dicerandra. Several instances of reticulation events were uncovered between annual and perennial Dicerandra, and within the Conradina + allies clade. Incomplete lineage sorting also likely contributed to phylogenetic discordance. These results clarify phylogenetic relationships in the clade and provide insight on important evolutionary drivers in the clade, such as hybridization. General relationships in the group were confirmed, while the large amount of gene tree discordance is likely due to reticulation across the phylogeny.
Affiliation(s)
- Andre A Naranjo
  - Institute of Environment, Department of Biological Sciences, Florida International University, 11200 SW 8th ST, Miami, FL 33199, USA; Florida Museum of Natural History, University of Florida, 1659 Museum Road, PO Box 117800, Gainesville, FL 32611-7800, USA.
- Matthew A Gitzendanner
  - Department of Biology, University of Florida, PO Box 118526, Gainesville, FL 32611-8526, USA
- Douglas E Soltis
  - Florida Museum of Natural History, University of Florida, 1659 Museum Road, PO Box 117800, Gainesville, FL 32611-7800, USA; Department of Biology, University of Florida, PO Box 118526, Gainesville, FL 32611-8526, USA
- Pamela S Soltis
  - Florida Museum of Natural History, University of Florida, 1659 Museum Road, PO Box 117800, Gainesville, FL 32611-7800, USA
5
Redelings BD, Holder MT. Speeding up iterative applications of the BUILD supertree algorithm. PeerJ 2024; 12:e16624. [PMID: 38188165 PMCID: PMC10768670 DOI: 10.7717/peerj.16624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 11/16/2023] [Indexed: 01/09/2024] Open
Abstract
The Open Tree of Life (OToL) project produces a supertree that summarizes phylogenetic knowledge from tree estimates published in the primary literature. The supertree construction algorithm iteratively calls Aho's Build algorithm thousands of times in order to assess the compatibility of different phylogenetic groupings. We describe an incrementalized version of the Build algorithm that is able to share work between successive calls to Build. We provide details that allow a programmer to implement the incremental algorithm BuildInc, including pseudo-code and a description of data structures. We assess the effect of BuildInc on our supertree algorithm by analyzing simulated data and by analyzing a supertree problem taken from the OpenTree 13.4 synthesis tree. We find that BuildInc provides up to 550-fold speedup for our supertree algorithm.
Affiliation(s)
- Benjamin D. Redelings
  - Biology Department, Duke University, Durham, NC, United States of America
  - Ronin Institute, Durham, NC, United States of America
  - Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, United States of America
- Mark T. Holder
  - Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, United States of America
  - Biodiversity Institute, University of Kansas, Lawrence, KS, United States of America
6
Li J, Han G, Tian X, Liang D, Zhang P. UPrimer: A Clade-Specific Primer Design Program Based on Nested-PCR Strategy and Its Applications in Amplicon Capture Phylogenomics. Mol Biol Evol 2023; 40:msad230. [PMID: 37832226 PMCID: PMC10630340 DOI: 10.1093/molbev/msad230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 09/12/2023] [Accepted: 10/09/2023] [Indexed: 10/15/2023] Open
Abstract
Amplicon capture is a promising target sequence capture approach for phylogenomic analyses, and the design of clade-specific nuclear protein-coding locus (NPCL) amplification primers is crucial for its successful application. In this study, we developed a primer design program called UPrimer that can quickly design clade-specific NPCL amplification primers based on genome data, without requiring manual intervention. Unlike other available primer design programs, UPrimer uses a nested-PCR strategy that greatly improves the amplification success rate of the designed primers. We examined all available metazoan genome data deposited in NCBI and developed NPCL primer sets for 21 metazoan groups with UPrimer, covering a wide range of taxa, including arthropods, mollusks, cnidarians, echinoderms, and vertebrates. On average, each clade-specific NPCL primer set comprises ∼1,000 NPCLs. PCR amplification tests were performed in 6 metazoan groups, and the developed primers showed a PCR success rate exceeding 95%. Furthermore, we demonstrated a phylogenetic case study in Lepidoptera, showing how NPCL primers can be used for phylogenomic analyses with amplicon capture. Our results indicated that using 100 NPCL probes recovered robust high-level phylogenetic relationships among butterflies, highlighting the utility of the newly designed NPCL primer sets for phylogenetic studies. We anticipate that the automated tool UPrimer and the developed NPCL primer sets for 21 metazoan groups will enable researchers to obtain phylogenomic data more efficiently and cost-effectively and accelerate the resolution of various parts of the Tree of Life.
Affiliation(s)
- JiaXuan Li
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, China
- GuangCheng Han
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, China
- Xiao Tian
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, China
- Dan Liang
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, China
- Peng Zhang
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, China
7
Roberts WR, Ruck EC, Downey KM, Pinseel E, Alverson AJ. Resolving Marine-Freshwater Transitions by Diatoms Through a Fog of Gene Tree Discordance. Syst Biol 2023; 72:984-997. [PMID: 37335140 DOI: 10.1093/sysbio/syad038] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/12/2022] [Revised: 06/02/2023] [Accepted: 06/16/2023] [Indexed: 06/21/2023] Open
Abstract
Despite the obstacles facing marine colonists, most lineages of aquatic organisms have colonized and diversified in freshwaters repeatedly. These transitions can trigger rapid morphological or physiological change and, on longer timescales, lead to increased rates of speciation and extinction. Diatoms are a lineage of ancestrally marine microalgae that have diversified throughout freshwater habitats worldwide. We generated a phylogenomic data set of genomes and transcriptomes for 59 diatom taxa to resolve freshwater transitions in one lineage, the Thalassiosirales. Although most parts of the species tree were consistently resolved with strong support, we had difficulties resolving a Paleocene radiation, which affected the placement of one freshwater lineage. This and other parts of the tree were characterized by high levels of gene tree discordance caused by incomplete lineage sorting and low phylogenetic signal. Despite differences in species trees inferred from concatenation versus summary methods and codons versus amino acids, traditional methods of ancestral state reconstruction supported six transitions into freshwaters, two of which led to subsequent species diversification. Evidence from gene trees, protein alignments, and diatom life history together suggest that habitat transitions were largely the product of homoplasy rather than hemiplasy, a condition where transitions occur on branches in gene trees not shared with the species tree. Nevertheless, we identified a set of putatively hemiplasious genes, many of which have been associated with shifts to low salinity, indicating that hemiplasy played a small but potentially important role in freshwater adaptation. Accounting for differences in evolutionary outcomes, in which some taxa became locked into freshwaters while others were able to return to the ocean or become salinity generalists, might help further distinguish different sources of adaptive mutation in freshwater diatoms.
Affiliation(s)
- Wade R Roberts
  - Department of Biological Sciences, University of Arkansas, 1 University of Arkansas, Fayetteville, AR, 72701, USA
- Elizabeth C Ruck
  - Department of Biological Sciences, University of Arkansas, 1 University of Arkansas, Fayetteville, AR, 72701, USA
- Kala M Downey
  - Department of Biological Sciences, University of Arkansas, 1 University of Arkansas, Fayetteville, AR, 72701, USA
- Eveline Pinseel
  - Department of Biological Sciences, University of Arkansas, 1 University of Arkansas, Fayetteville, AR, 72701, USA
- Andrew J Alverson
  - Department of Biological Sciences, University of Arkansas, 1 University of Arkansas, Fayetteville, AR, 72701, USA
8
Liu L, Yu L, Wu S, Arnold J, Whalen C, Davis C, Edwards S. Short branch attraction in phylogenomic inference under the multispecies coalescent. Front Ecol Evol 2023; 11:1134764. [PMID: 39233780 PMCID: PMC11372852 DOI: 10.3389/fevo.2023.1134764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2024] Open
Abstract
Accurate reconstruction of species trees often relies on the quality of input gene trees estimated from molecular sequences. Previous studies suggested that if the sequence length is fixed, the maximum likelihood may produce biased gene trees which subsequently mislead inference of species trees. Two key questions need to be answered in this context: what are the scenarios that may result in consistently biased gene trees? and for those scenarios, are there any remedies that may remove or at least reduce the misleading effects of consistently biased gene trees? In this article, we establish a theoretical framework to address these questions. Considering a scenario where the true gene tree is a 4-taxon star tree $T^* = (S_1, S_2, S_3, S_4)$ with two short branches leading to the species $S_1$ and $S_2$, we demonstrate that maximum likelihood significantly favors the wrong bifurcating tree $((S_1, S_2), S_3, S_4)$ grouping the two species $S_1$ and $S_2$ with short branches. We name this inconsistent behavior short branch attraction, which may occur in real-world data involving a 4-taxon bifurcating gene tree with a short internal branch. If no mutation occurs along the internal branch, which is likely if the internal branch is short, the 4-taxon bifurcating tree is equivalent to the 4-taxon star tree and thus will suffer the same misleading effect of short branch attraction. Theoretical and simulation results further demonstrate that short branch attraction may occur in gene trees and species trees of arbitrary size. Moreover, short branch attraction is primarily caused by a lack of phylogenetic information in sequence data, suggesting that converting short internal branches to polytomies in the estimated gene trees can significantly reduce artifacts induced by short branch attraction.
Affiliation(s)
- Liang Liu
  - Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA, United States
- Lili Yu
  - Department of Biostatistics, Georgia Southern University, Statesboro, GA, United States
- Shaoyuan Wu
  - Jiangsu Key Laboratory of Phylogenomics and Comparative Genomics, Jiangsu International Joint Center of Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China
- Jonathan Arnold
  - Department of Genetics, University of Georgia, Athens, GA, United States
- Christopher Whalen
  - Department of Epidemiology and Biostatistics, College of Public Health, University of Georgia, Athens, GA, United States
- Charles Davis
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, United States
- Scott Edwards
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, United States
9
Karin BR, Arellano S, Wang L, Walzer K, Pomerantz A, Vasquez JM, Chatla K, Sudmant PH, Bach BH, Smith LL, McGuire JA. Highly-multiplexed and efficient long-amplicon PacBio and Nanopore sequencing of hundreds of full mitochondrial genomes. BMC Genomics 2023; 24:229. [PMID: 37131128 PMCID: PMC10155392 DOI: 10.1186/s12864-023-09277-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Accepted: 03/24/2023] [Indexed: 05/04/2023] Open
Abstract
BACKGROUND: Mitochondrial genome sequences have become critical to the study of biodiversity. Genome skimming and other short-read based methods are the most common approaches, but they are not well-suited to scale up to multiplexing hundreds of samples. Here, we report on a new approach to sequence hundreds to thousands of complete mitochondrial genomes in parallel using long-amplicon sequencing. We amplified the mitochondrial genome of 677 specimens in two partially overlapping amplicons and implemented an asymmetric PCR-based indexing approach to multiplex 1,159 long amplicons together on a single PacBio SMRT Sequel II cell. We also tested this method on Oxford Nanopore Technologies (ONT) MinION R9.4 to assess if this method could be applied to other long-read technologies. We implemented several optimizations that make this method significantly more efficient than alternative mitochondrial genome sequencing methods. RESULTS: With the PacBio sequencing data we recovered at least one of the two fragments for 96% of samples (~80-90%) with mean coverage ~1,500x. The ONT data recovered less than 50% of input fragments, likely due to low throughput and the design of the Barcoded Universal Primers, which were optimized for PacBio sequencing. We compared a single mitochondrial gene alignment to half and full mitochondrial genomes and found, as expected, increased tree support with longer alignments, though whole mitochondrial genomes were not significantly better than half mitochondrial genomes. CONCLUSIONS: This method can effectively capture thousands of long amplicons in a single run and be used to build more robust phylogenies quickly and effectively. We provide several recommendations for future users depending on the evolutionary scale of their system. A natural extension of this method is to collect multi-locus datasets consisting of mitochondrial genomes and several long nuclear loci at once.
Affiliation(s)
- Benjamin R Karin
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA.
  - Museum of Vertebrate Zoology, University of California, Berkeley, CA, USA.
- Selene Arellano
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
- Laura Wang
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
- Kayla Walzer
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
- Aaron Pomerantz
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
- Juan Manuel Vasquez
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
- Kamalakar Chatla
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
- Peter H Sudmant
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
  - Center for Computational Biology, University of California, Berkeley, CA, USA
- Bryan H Bach
  - Museum of Vertebrate Zoology, University of California, Berkeley, CA, USA
  - Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Lydia L Smith
  - Museum of Vertebrate Zoology, University of California, Berkeley, CA, USA
- Jimmy A McGuire
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
  - Museum of Vertebrate Zoology, University of California, Berkeley, CA, USA
10
The Structure of Evolutionary Model Space for Proteins across the Tree of Life. BIOLOGY 2023; 12:biology12020282. [PMID: 36829559 PMCID: PMC9952988 DOI: 10.3390/biology12020282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 02/04/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023]
Abstract
The factors that determine the relative rates of amino acid substitution during protein evolution are complex and known to vary among taxa. We estimated relative exchangeabilities for pairs of amino acids from clades spread across the tree of life and assessed the historical signal in the distances among these clade-specific models. We separately trained these models on collections of arbitrarily selected protein alignments and on ribosomal protein alignments. In both cases, we found a clear separation between the models trained using multiple sequence alignments from bacterial clades and the models trained on archaeal and eukaryotic data. We assessed the predictive power of our novel clade-specific models of sequence evolution by asking whether fit to the models could be used to identify the source of multiple sequence alignments. Model fit was generally able to correctly classify protein alignments at the level of domain (bacterial versus archaeal), but the accuracy of classification at finer scales was much lower. The only exceptions to this were the relatively high classification accuracy for two archaeal lineages: Halobacteriaceae and Thermoprotei. Genomic GC content had a modest impact on relative exchangeabilities despite having a large impact on amino acid frequencies. Relative exchangeabilities involving aromatic residues exhibited the largest differences among models. There were a small number of exchangeabilities that exhibited large differences in comparisons among major clades and between generalized models and ribosomal protein models. Taken as a whole, these results reveal that a small number of relative exchangeabilities are responsible for much of the structure of the "model space" for protein sequence evolution. The clade-specific models we generated may be useful tools for protein phylogenetics, and the structure of evolutionary model space that they revealed has implications for phylogenomic inference across the tree of life.
11
Hill M, Legried B, Roch S. Species tree estimation under joint modeling of coalescence and duplication: Sample complexity of quartet methods. ANN APPL PROBAB 2022. [DOI: 10.1214/22-aap1799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Affiliation(s)
- Max Hill
  - Department of Mathematics, University of Wisconsin–Madison
- Sebastien Roch
  - Department of Mathematics, University of Wisconsin–Madison
12
Astudillo-Clavijo V, Stiassny MLJ, Ilves KL, Musilova Z, Salzburger W, López-Fernández H. Exon-based phylogenomics and the relationships of African cichlid fishes: tackling the challenges of reconstructing phylogenies with repeated rapid radiations. Syst Biol 2022; 72:134-149. [PMID: 35880863 DOI: 10.1093/sysbio/syac051] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/12/2021] [Revised: 07/06/2022] [Accepted: 07/19/2022] [Indexed: 11/13/2022] Open
Abstract
African cichlids (subfamily: Pseudocrenilabrinae) are among the most diverse vertebrates, and their propensity for repeated rapid radiation has made them a celebrated model system in evolutionary research. Nonetheless, despite numerous studies, phylogenetic uncertainty persists, and riverine lineages remain comparatively underrepresented in higher-level phylogenetic studies. Heterogeneous gene histories resulting from incomplete lineage sorting (ILS) and hybridization are likely sources of uncertainty, especially during episodes of rapid speciation. We investigate relationships of Pseudocrenilabrinae and its close relatives while accounting for multiple sources of genetic discordance using species tree and hybrid network analyses with hundreds of single-copy exons. We improve sequence recovery for distant relatives, thereby extending the taxonomic reach of our probes, with a hybrid reference guided/de novo assembly approach. Our analyses provide robust hypotheses for most higher-level relationships and reveal widespread gene heterogeneity, including in riverine taxa. ILS and past hybridization are identified as sources of genetic discordance in different lineages. Sampling of various Blenniiformes (formerly Ovalentaria) adds strong phylogenomic support for convict blennies (Pholidichthyidae) as sister to Cichlidae, and points to other potentially useful protein-coding markers across the order. A reliable phylogeny with representatives from diverse environments will support ongoing taxonomic and comparative evolutionary research in the cichlid model system.
Affiliation(s)
- Viviana Astudillo-Clavijo
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, M5S 3B2, Canada; Department of Natural History, Royal Ontario Museum, Toronto, M5S 2C6, Canada; Department of Ecology and Evolutionary Biology and Museum of Zoology, University of Michigan, Ann Arbor, 48109, USA
- Melanie L J Stiassny
- Department of Ichthyology, American Museum of Natural History, New York, 10024-5102, USA
- Katriina L Ilves
- Research & Collections, Zoology, Canadian Museum of Nature, Ottawa, K1P 6P4, Canada
- Zuzana Musilova
- Department of Zoology, Charles University in Prague, Vinicna 7, Prague, CZ-128 44, Czech Republic
- Walter Salzburger
- Zoological Institute, University of Basel, Vesalgasse 1, CH-4051, Basel, Switzerland
- Hernán López-Fernández
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, M5S 3B2, Canada; Department of Natural History, Royal Ontario Museum, Toronto, M5S 2C6, Canada; Department of Ecology and Evolutionary Biology and Museum of Zoology, University of Michigan, Ann Arbor, 48109, USA
|
13
|
Out of chaos: Phylogenomics of Asian Sonerileae. Mol Phylogenet Evol 2022; 175:107581. [PMID: 35810973 DOI: 10.1016/j.ympev.2022.107581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2022] [Revised: 05/23/2022] [Accepted: 05/26/2022] [Indexed: 11/22/2022]
Abstract
Sonerileae is a diverse Melastomataceae lineage comprising ca. 1000 species in 44 genera, with >70% of genera and species distributed in Asia. Asian Sonerileae are taxonomically intractable with obscure generic circumscriptions. The backbone phylogeny of this group remains poorly resolved, possibly due to complexity caused by rapid species radiation in early and middle Miocene, which hampers further systematic study. Here, we used genome resequencing data to reconstruct the phylogeny of Asian Sonerileae. Three parallel datasets, viz. single-copy ortholog (SCO), genomic SNPs, and whole plastome, were assembled from genome resequencing data of 205 species for this purpose. Based on these genome-scale data, we provided the first well resolved phylogeny of Asian Sonerileae, with 34 major clades identified and 74% of the interclade relationships consistently resolved by both SCO and genomic data. Meanwhile, widespread phylogenetic discordance was detected among SCO gene trees as well as species trees reconstructed using different tree estimation methods (concatenation/site-based coalescent method/summary method) or different datasets (SCO/genomic/plastome). We explored sources of discordance using multiple approaches and found that the observed discordance in Asian Sonerileae was mainly caused by a combination of biased distribution of missing data, random noise from uninformative genes, incomplete lineage sorting, and hybridization/introgression. Exploration of these sources can enable us to generate hypotheses for future testing, which is the first step towards understanding the evolution of Asian Sonerileae. We also detected high levels of homoplasy for some characters traditionally used in taxonomy, which explains current chaotic generic delimitations. The backbone phylogeny of Asian Sonerileae revealed in this study offers a solid basis for future taxonomic revision at the generic level.
|
14
|
Sianta SA, Kay KM. Phylogenomic analysis does not support a classic but controversial hypothesis of progenitor-derivative origins for the serpentine endemic Clarkia franciscana. Evolution 2022; 76:1246-1259. [PMID: 35403214 PMCID: PMC9322428 DOI: 10.1111/evo.14484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2021] [Revised: 02/25/2022] [Accepted: 03/04/2022] [Indexed: 01/21/2023]
Abstract
Budding speciation involves isolation of marginal populations at the periphery of a species range and is thought to be a prominent mode of speciation in organisms with low dispersal and/or strong local adaptation among populations. Budding speciation is typically evidenced by abutting, asymmetric ranges of ecologically divergent sister species and low genetic diversity in putative budded species. Yet these indirect patterns may be unreliable, instead caused by postspeciation processes such as range or demographic shifts. Nested phylogenetic relationships provide the most conclusive evidence of budding speciation. A putative case of budding speciation in the serpentine endemic Clarkia franciscana and two closely related widespread congeners was studied by Harlan Lewis, Peter Raven, Leslie Gottlieb, and others over a 20-year period, yet the origin of C. franciscana remains controversial. Here, we reinvestigate this system with phylogenomic analyses to determine whether C. franciscana is a recently derived budded species, phylogenetically nested within one of the other two putative progenitor species. In contrast to the hypothesized pattern of relatedness among the three Clarkia species, we find no evidence for recent budding speciation. Instead, the data suggest the three species diverged simultaneously. We urge caution in using contemporary range patterns to infer geographic modes of speciation.
Affiliation(s)
- Shelley A. Sianta
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, California 95060; Current Address: Department of Plant and Microbial Biology, University of Minnesota, St. Paul, Minnesota 55108
- Kathleen M. Kay
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, California 95060
|
15
|
Wang Y, Ruhsam M, Milne R, Graham SW, Li J, Tao T, Zhang Y, Mao K. Incomplete lineage sorting and local extinction shaped the complex evolutionary history of the Paleogene relict conifer genus, Chamaecyparis (Cupressaceae). Mol Phylogenet Evol 2022; 172:107485. [PMID: 35452840 DOI: 10.1016/j.ympev.2022.107485] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/01/2021] [Revised: 03/26/2022] [Accepted: 04/05/2022] [Indexed: 11/24/2022]
Abstract
Inferring an accurate biogeographic history of plant taxa with an East Asia (EA)-North America (NA) disjunction is usually hindered by conflicting phylogenies and a poor fossil record. The current distribution of Chamaecyparis (false cypress; Cupressaceae), with four species in EA and one each in western and eastern NA, and its relatively rich fossil record make it an excellent model for studying the EA-NA disjunction. Here we reconstruct phylogenomic relationships within Chamaecyparis using > 1400 homologous nuclear and 61 plastid genes. Our phylogenomic analyses using concatenated and coalescent approaches revealed strong cytonuclear discordance and conflicting topologies between nuclear gene trees. Incomplete lineage sorting (ILS) and hybridization are possible explanations of conflict; however, our coalescent analyses and simulations suggest that ILS is the major contributor to the observed phylogenetic discrepancies. Based on a well-resolved species tree and four fossil calibrations, the crown lineage of Chamaecyparis is estimated to have originated in the upper Cretaceous, followed by diversification events in the early and middle Paleogene. Ancestral area reconstructions suggest that Chamaecyparis had an ancestral range spanning both EA and NA. Fossil records further indicate that this genus is a relict of the "boreotropical" flora, and that local extinctions of European species were caused by global cooling. Overall, our results unravel a complex evolutionary history of a Paleogene relict conifer genus, which may have involved ILS, hybridization and the extinction of local species.
Affiliation(s)
- Yi Wang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu 610065, Sichuan, China
- Markus Ruhsam
- Royal Botanic Garden Edinburgh, 20A Inverleith Row, Edinburgh EH3 5LR, UK
- Richard Milne
- Institute of Molecular Plant Science, School of Biological Science, University of Edinburgh, Edinburgh EH9 3BF, UK
- Sean W Graham
- Department of Botany, University of British Columbia, Vancouver, V6T 1Z4, Canada
- Jialiang Li
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu 610065, Sichuan, China
- Tongzhou Tao
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu 610065, Sichuan, China
- Yujiao Zhang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu 610065, Sichuan, China
- Kangshan Mao
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu 610065, Sichuan, China; College of Science, Tibet University, Lhasa 850000, Xizang Autonomous Region, PR China
|
16
|
Willson J, Roddur MS, Liu B, Zaharias P, Warnow T. DISCO: Species Tree Inference using Multicopy Gene Family Tree Decomposition. Syst Biol 2022; 71:610-629. [PMID: 34450658 PMCID: PMC9016570 DOI: 10.1093/sysbio/syab070] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/24/2021] [Revised: 08/18/2021] [Accepted: 08/23/2021] [Indexed: 11/21/2022]
Abstract
Species tree inference from gene family trees is a significant problem in computational biology. However, gene tree heterogeneity, which can be caused by several factors including gene duplication and loss, makes the estimation of species trees very challenging. While several species tree estimation methods have been introduced in recent years to specifically address gene tree heterogeneity due to gene duplication and loss (such as DupTree, FastMulRFS, ASTRAL-Pro, and SpeciesRax), many incur high cost in terms of both running time and memory. We introduce a new approach, DISCO, that decomposes the multi-copy gene family trees into many single-copy trees, which allows methods previously designed for species tree inference in a single-copy gene tree context to be used. We prove that using DISCO with ASTRAL (i.e., ASTRAL-DISCO) is statistically consistent under the GDL model, provided that ASTRAL-Pro correctly roots and tags each gene family tree. We evaluate DISCO paired with different methods for estimating species trees from single-copy genes (e.g., ASTRAL, ASTRID, and IQ-TREE) under a wide range of model conditions, and establish that high accuracy can be obtained even when ASTRAL-Pro is not able to correctly root and tag the gene family trees. We also compare results using MI, an alternative decomposition strategy from Yang Y. and Smith S.A. (2014), and find that DISCO provides better accuracy, most likely as a result of covering more of the gene family tree leafset in the output decomposition. [Concatenation analysis; gene duplication and loss; species tree inference; summary method.]
Affiliation(s)
- James Willson
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Mrinmoy Saha Roddur
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Baqiao Liu
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Paul Zaharias
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
17
|
Dasarathy G, Mossel E, Nowak R, Roch S. A stochastic Farris transform for genetic data under the multispecies coalescent with applications to data requirements. J Math Biol 2022; 84:36. [PMID: 35394192 PMCID: PMC9258723 DOI: 10.1007/s00285-022-01731-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/04/2020] [Revised: 02/15/2022] [Accepted: 02/17/2022] [Indexed: 10/18/2022]
Abstract
Species tree estimation faces many significant hurdles. Chief among them is that the trees describing the ancestral lineages of each individual gene (the gene trees) often differ from the species tree. The multispecies coalescent is commonly used to model this gene tree discordance, at least when it is believed to arise from incomplete lineage sorting, a population-genetic effect. Another significant challenge in this area is that molecular sequences associated with each gene typically provide limited information about the gene trees themselves. While the modeling of sequence evolution by single-site substitutions is well-studied, few species tree reconstruction methods with theoretical guarantees actually address this latter issue. Instead, a standard but unsatisfactory assumption is that gene trees are perfectly reconstructed before being fed into a so-called summary method. Hence much remains to be done in the development of inference methodologies that rigorously account for gene tree estimation error, or completely avoid gene tree estimation in the first place. In previous work, a data requirement trade-off was derived between the number of loci m needed for an accurate reconstruction and the length of the locus sequences k. It was shown that to reconstruct an internal branch of length f, one needs m to be of the order of [Formula: see text]. That previous result was obtained under the restrictive assumption that mutation rates as well as population sizes are constant across the species phylogeny. Here we further generalize this result beyond this assumption. Our main contribution is a novel reduction to the molecular clock case under the multispecies coalescent, which we refer to as a stochastic Farris transform.
As a corollary, we also obtain a new identifiability result of independent interest: for any species tree with [Formula: see text] species, the rooted topology of the species tree can be identified from the distribution of its unrooted weighted gene trees even in the absence of a molecular clock.
Affiliation(s)
- Gautam Dasarathy
- School of Electrical, Computer, and Energy Engineering, Arizona State University, Tempe, USA
- Elchanan Mossel
- Department of Mathematics and IDSS, Massachusetts Institute of Technology, Cambridge, USA
- Robert Nowak
- Department of Electrical and Computer Engineering, University of Wisconsin, Madison, USA
- Sebastien Roch
- Department of Mathematics, University of Wisconsin, Madison, USA
|
18
|
McLean BS, Bell KC, Cook JA. SNP-based Phylogenomic Inference in Holarctic Ground Squirrels (Urocitellus). Mol Phylogenet Evol 2022; 169:107396. [PMID: 35031463 DOI: 10.1016/j.ympev.2022.107396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2021] [Revised: 12/02/2021] [Accepted: 12/08/2021] [Indexed: 11/24/2022]
Abstract
Resolution of rapid evolutionary radiations requires harvesting maximal signal from phylogenomic datasets. However, studies of non-model clades often target conserved loci that are characterized by reduced information content, which can negatively affect gene tree precision and species tree accuracy. Single nucleotide polymorphism (SNP)-based methods are an underutilized but potentially valuable tool for estimating phylogeny and divergence times because they do not rely on resolved gene trees, allowing information from many or all variant loci to be leveraged in species tree reconstruction. We evaluated the utility of SNP-based methods in resolving phylogeny of Holarctic ground squirrels (Urocitellus), a radiation that has been difficult to disentangle, even in prior phylogenomic studies. We inferred phylogeny from a dataset of >3,000 ultraconserved element loci (UCEs) using two methods (SNAPP, SVDquartets) and compared our results with a new mitogenome phylogeny. We also systematically evaluated how phasing of UCEs improves per-locus information content, and inference of topology and other parameters within each of these SNP-based methods. Phasing improved topological resolution and branch length estimation at shallow levels (within species complexes), but less so at deeper levels, likely reflecting true uncertainty due to ancestral polymorphisms segregating in these rapidly diverging lineages. We resolved several key clades in Urocitellus and present targeted opportunities for future phylogenomic inquiry. Our results extend the roadmap for use of SNPs to address vertebrate radiations and support comparative analyses at multiple temporal scales.
Affiliation(s)
- Bryan S McLean
- University of North Carolina Greensboro, Department of Biology, Greensboro, NC 27402, USA
- Kayce C Bell
- Natural History Museum of Los Angeles County, Department of Mammalogy, Los Angeles, CA 90007, USA
- Joseph A Cook
- University of New Mexico, Department of Biology and Museum of Southwestern Biology, Albuquerque, NM 87131, USA
|
19
|
Singhal S, Derryberry GE, Bravo GA, Derryberry EP, Brumfield RT, Harvey MG. The dynamics of introgression across an avian radiation. Evol Lett 2021; 5:568-581. [PMID: 34917397 PMCID: PMC8645201 DOI: 10.1002/evl3.256] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/06/2021] [Revised: 07/11/2021] [Accepted: 08/31/2021] [Indexed: 01/20/2023]
Abstract
Hybridization and resulting introgression can play both a destructive and a creative role in the evolution of diversity. Thus, characterizing when and where introgression is most likely to occur can help us understand the causes of diversification dynamics. Here, we examine the prevalence of and variation in introgression using phylogenomic data from a large (1300+ species), geographically widespread avian group, the suboscine birds. We first examine patterns of gene tree discordance across the geographic distribution of the entire clade. We then evaluate the signal of introgression in a subset of 206 species triads using Patterson's D‐statistic and test for associations between introgression signal and evolutionary, geographic, and environmental variables. We find that gene tree discordance varies across lineages and geographic regions. The signal of introgression is highest in cases where species occur in close geographic proximity and in regions with more dynamic climates since the Pleistocene. Our results highlight the potential of phylogenomic datasets for examining broad patterns of hybridization and suggest that the degree of introgression between diverging lineages might be predictable based on the setting in which they occur.
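For readers unfamiliar with the statistic named in this abstract, Patterson's D compares the counts of two discordant site patterns in a four-taxon arrangement (((P1, P2), P3), outgroup): D = (ABBA - BABA) / (ABBA + BABA). The following is a minimal illustrative sketch in Python, not the authors' pipeline. The function name `patterson_d`, the single-sequence-per-taxon input, and the toy alignment are assumptions made for illustration; real analyses typically use allele frequencies and assess significance with a block jackknife.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Return Patterson's D for four aligned sequences, or None if undefined.

    Hypothetical helper: assumes one haploid sequence per taxon, equal
    lengths, and that the outgroup carries the ancestral allele.
    """
    abba = 0
    baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        # Keep only biallelic sites where P3 carries the derived allele.
        if len({a, b, c, o}) != 2 or c == o:
            continue
        if a == o and b == c:
            abba += 1  # P2 shares the derived allele with P3
        elif b == o and a == c:
            baba += 1  # P1 shares the derived allele with P3
    total = abba + baba
    if total == 0:
        return None  # no informative sites
    return (abba - baba) / total


# Toy alignment: three ABBA sites, one BABA site, one invariant site.
d = patterson_d("ACGTA", "TGACA", "TGATA", "ACGCA")
print(d)  # 0.5
```

Under incomplete lineage sorting alone, ABBA and BABA patterns are expected in equal numbers, so D is near zero; a consistent excess of one pattern is the signal usually interpreted as introgression between P3 and either P1 or P2.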
Affiliation(s)
- Sonal Singhal
- Department of Biology, California State University, Dominguez Hills, Carson, California 90747
- Graham E Derryberry
- Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, Tennessee 37996
- Gustavo A Bravo
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138; Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts 02138
- Elizabeth P Derryberry
- Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, Tennessee 37996
- Robb T Brumfield
- Museum of Natural Science, Louisiana State University, Baton Rouge, Louisiana 70803; Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana 70803
- Michael G Harvey
- Department of Biological Sciences, The University of Texas at El Paso, El Paso, Texas 79968; Biodiversity Collections, The University of Texas at El Paso, El Paso, Texas 79968
|
20
|
How challenging RADseq data turned out to favor coalescent-based species tree inference. A case study in Aichryson (Crassulaceae). Mol Phylogenet Evol 2021; 167:107342. [PMID: 34785384 DOI: 10.1016/j.ympev.2021.107342] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/03/2020] [Revised: 07/05/2021] [Accepted: 10/29/2021] [Indexed: 12/24/2022]
Abstract
Analysing multiple genomic regions while incorporating detection and qualification of discordance among regions has become standard for understanding phylogenetic relationships. In plants, which usually have comparatively large genomes, this is feasible through the combination of reduced-representation library (RRL) methods and high-throughput sequencing, enabling the cost-effective acquisition of genomic data for thousands of loci from hundreds of samples. One popular RRL method is RADseq. A major disadvantage of established RADseq approaches is the rather short fragment and sequencing range, leading to loci of little individual phylogenetic information. This issue hampers the application of coalescent-based species tree inference. The modified RADseq protocol presented here, which targets ca. 5,000 loci of 300-600 nt length sequenced with the latest short-read-sequencing (SRS) technology, has the potential to overcome this drawback. To illustrate the advantages of this approach we use the study group Aichryson Webb & Berthelot (Crassulaceae), a plant genus that diversified on the Canary Islands. The data analysis approach used here aims at a careful quality control of the long loci dataset. It involves an informed selection of thresholds for accurate clustering; a thorough exploration of locus properties, such as locus length, coverage and variability, to identify potentially biased data; and a comparative phylogenetic inference of filtered datasets, accompanied by an evaluation of resulting BS support and gene and site concordance factor values, to improve overall resolution of the resulting phylogenetic trees. The final dataset contains variable loci with an average length of 373 nt and facilitates species tree estimation using a coalescent-based summary approach. Additional improvements brought by the approach are critically discussed.
|
21
|
Protein Structure, Models of Sequence Evolution, and Data Type Effects in Phylogenetic Analyses of Mitochondrial Data: A Case Study in Birds. DIVERSITY 2021. [DOI: 10.3390/d13110555] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
Phylogenomic analyses have revolutionized the study of biodiversity, but they have revealed that estimated tree topologies can depend, at least in part, on the subset of the genome that is analyzed. For example, estimates of trees for avian orders differ if protein-coding or non-coding data are analyzed. The bird tree is a good study system because the historical signal for relationships among orders is very weak, which should permit subtle non-historical signals to be identified, while monophyly of orders is strongly corroborated, allowing identification of strong non-historical signals. Hydrophobic amino acids in mitochondrially-encoded proteins, which are expected to be found in transmembrane helices, have been hypothesized to be associated with non-historical signals. We tested this hypothesis by comparing the evolution of transmembrane helices and extramembrane segments of mitochondrial proteins from 420 bird species, sampled from most avian orders. We estimated amino acid exchangeabilities for both structural environments and assessed the performance of phylogenetic analysis using each data type. We compared those relative exchangeabilities with values calculated using a substitution matrix for transmembrane helices estimated using a variety of nuclear- and mitochondrially-encoded proteins, allowing us to compare the bird-specific mitochondrial models with a general model of transmembrane protein evolution. To complement our amino acid analyses, we examined the impact of protein structure on patterns of nucleotide evolution. Models of transmembrane and extramembrane sequence evolution for amino acids and nucleotides exhibited striking differences, but there was no evidence for strong topological data type effects. However, incorporating protein structure into analyses of mitochondrially-encoded proteins improved model fit. Thus, we believe that considering protein structure will improve analyses of mitogenomic data, both in birds and in other taxa.
|
22
|
Yardeni G, Viruel J, Paris M, Hess J, Groot Crego C, de La Harpe M, Rivera N, Barfuss MHJ, Till W, Guzmán-Jacob V, Krömer T, Lexer C, Paun O, Leroy T. Taxon-specific or universal? Using target capture to study the evolutionary history of rapid radiations. Mol Ecol Resour 2021; 22:927-945. [PMID: 34606683 PMCID: PMC9292372 DOI: 10.1111/1755-0998.13523] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 05/28/2021] [Revised: 09/09/2021] [Accepted: 09/22/2021] [Indexed: 12/20/2022]
Abstract
Target capture has emerged as an important tool for phylogenetics and population genetics in nonmodel taxa. Whereas developing taxon‐specific capture probes requires sustained efforts, available universal kits may have a lower power to reconstruct relationships at shallow phylogenetic scales and within rapidly radiating clades. We present here a newly developed target capture set for Bromeliaceae, a large and ecologically diverse plant family with highly variable diversification rates. The set targets 1776 coding regions, including genes putatively involved in key innovations, with the aim to empower testing of a wide range of evolutionary hypotheses. We compare the relative power of this taxon‐specific set, Bromeliad1776, to the universal Angiosperms353 kit. The taxon‐specific set results in higher enrichment success across the entire family; however, the overall performance of both kits to reconstruct phylogenetic trees is relatively comparable, highlighting the vast potential of universal kits for resolving evolutionary relationships. For more detailed phylogenetic or population genetic analyses, for example the exploration of gene tree concordance, nucleotide diversity or population structure, the taxon‐specific capture set presents clear benefits. We discuss the potential lessons that this comparative study provides for future phylogenetic and population genetic investigations, in particular for the study of evolutionary radiations.
Affiliation(s)
- Gil Yardeni
- Department of Botany and Biodiversity Research, University of Vienna, Vienna, Austria
- Margot Paris
- Unit of Ecology & Evolution, Department of Biology, University of Fribourg, Fribourg, Switzerland
- Jaqueline Hess
- Department of Botany and Biodiversity Research, University of Vienna, Vienna, Austria; Department of Soil Ecology, Helmholtz Centre for Environmental Research, UFZ, Halle (Saale), Germany
- Clara Groot Crego
- Department of Botany and Biodiversity Research, University of Vienna, Vienna, Austria; Vienna Graduate School of Population Genetics, Vienna, Austria
- Marylaure de La Harpe
- Department of Botany and Biodiversity Research, University of Vienna, Vienna, Austria
- Norma Rivera
- Department of Botany and Biodiversity Research, University of Vienna, Vienna, Austria
- Michael H J Barfuss
- Department of Botany and Biodiversity Research, University of Vienna, Vienna, Austria
- Walter Till
- Department of Botany and Biodiversity Research, University of Vienna, Vienna, Austria
- Valeria Guzmán-Jacob
- Biodiversity, Macroecology and Biogeography, University of Goettingen, Göttingen, Germany
- Thorsten Krömer
- Centro de Investigaciones Tropicales, Universidad Veracruzana, Xalapa, Mexico
- Christian Lexer
- Department of Botany and Biodiversity Research, University of Vienna, Vienna, Austria
- Ovidiu Paun
- Department of Botany and Biodiversity Research, University of Vienna, Vienna, Austria
- Thibault Leroy
- Department of Botany and Biodiversity Research, University of Vienna, Vienna, Austria
|
23
|
Adams RH, Castoe TA, DeGiorgio M. PhyloWGA: chromosome-aware phylogenetic interrogation of whole genome alignments. Bioinformatics 2021; 37:1923-1925. [PMID: 33051672 DOI: 10.1093/bioinformatics/btaa884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2019] [Revised: 09/16/2020] [Accepted: 09/29/2020] [Indexed: 11/13/2022]
Abstract
SUMMARY: Here, we present PhyloWGA, an open source R package for conducting phylogenetic analysis and investigation of whole genome data. AVAILABILITY AND IMPLEMENTATION: Available at GitHub (https://github.com/radamsRHA/PhyloWGA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Richard H Adams
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Todd A Castoe
- Department of Biology, University of Texas at Arlington, Arlington, TX 76019, USA
- Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
|
24
|
Unravelling hybridization in Phytophthora using phylogenomics and genome size estimation. IMA Fungus 2021; 12:16. [PMID: 34193315 PMCID: PMC8246709 DOI: 10.1186/s43008-021-00068-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 12/11/2020] [Accepted: 05/23/2021] [Indexed: 02/06/2023]
Abstract
The genus Phytophthora comprises many economically and ecologically important plant pathogens. Hybrid species have previously been identified in at least six of the 12 phylogenetic clades. These hybrids can potentially infect a wider host range and display enhanced vigour compared to their progenitors. Phytophthora hybrids therefore pose a serious threat to agriculture as well as to natural ecosystems. Early and correct identification of hybrids is therefore essential for adequate plant protection but this is hampered by the limitations of morphological and traditional molecular methods. Identification of hybrids is also important in evolutionary studies as the positioning of hybrids in a phylogenetic tree can lead to suboptimal topologies. To improve the identification of hybrids we have combined genotyping-by-sequencing (GBS) and genome size estimation on a genus-wide collection of 614 Phytophthora isolates. Analyses based on locus- and allele counts and especially on the combination of species-specific loci and genome size estimations allowed us to confirm and characterize 27 previously described hybrid species and discover 16 new hybrid species. Our method was also valuable for species identification at an unprecedented resolution and further allowed correct naming of misidentified isolates. We used both a concatenation- and a coalescent-based phylogenomic method to construct a reliable phylogeny using the GBS data of 140 non-hybrid Phytophthora isolates. Hybrid species were subsequently connected to their progenitors in this phylogenetic tree. In this study we demonstrate the application of two validated techniques (GBS and flow cytometry) for relatively low cost but high resolution identification of hybrids and their phylogenetic relations.
25
Doyle JJ. Defining coalescent genes: Theory meets practice in organelle phylogenomics. Syst Biol 2021; 71:476-489. [PMID: 34191012 DOI: 10.1093/sysbio/syab053] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 03/01/2021] [Revised: 06/24/2021] [Accepted: 06/28/2021] [Indexed: 11/13/2022]
Abstract
The species tree paradigm that dominates current molecular systematic practice infers species trees from collections of sequences under assumptions of the multispecies coalescent (MSC), i.e., that there is free recombination between the sequences and no (or very low) recombination within them. These coalescent genes (c-genes) are thus defined in a historical rather than molecular sense, and can in theory be as large as an entire genome or as small as a single nucleotide. A debate about how to define c-genes centers on the contention that nuclear gene sequences used in many coalescent analyses undergo too much recombination, such that their introns comprise multiple c-genes, violating a key assumption of the MSC. Recently a similar argument has been made for the genes of plastid (e.g., chloroplast) and mitochondrial genomes, which for the last 30 or more years have been considered to represent a single c-gene for the purposes of phylogeny reconstruction because they are non-recombining in a historical sense. Consequently, it has been suggested that these genomes should be analyzed using coalescent methods that treat their genes (over 70 protein-coding genes in the case of most plastid genomes, or plastomes) as independent estimates of species phylogeny, in contrast to the usual practice of concatenation, which is appropriate for generating gene trees. However, although recombination certainly occurs in the plastome, as has been recognized since the 1970s, it is unlikely to be phylogenetically relevant. This is because such historically effective recombination can only occur when plastomes with incongruent histories are brought together in the same plastid. However, plastids sort rapidly into different cell lineages and rarely fuse. Thus, because of plastid biology, the plastome is a more canonical c-gene than is the average multi-intron mammalian nuclear gene. 
The plastome should thus continue to be treated as a single estimate of the underlying species phylogeny, as should the mitochondrial genome. The implications of this long-held insight of molecular systematics for studies in the phylogenomic era are explored.
Affiliation(s)
- Jeff J Doyle
- Plant Biology Section, Plant Breeding & Genetics Section, and L. H. Bailey Hortorium, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853 USA
26
Nydam ML, Lemmon AR, Cherry JR, Kortyna ML, Clancy DL, Hernandez C, Cohen CS. Phylogenomic and morphological relationships among the botryllid ascidians (Subphylum Tunicata, Class Ascidiacea, Family Styelidae). Sci Rep 2021; 11:8351. [PMID: 33863944 PMCID: PMC8052435 DOI: 10.1038/s41598-021-87255-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/09/2020] [Accepted: 02/16/2021] [Indexed: 02/02/2023]
Abstract
Ascidians (Phylum Chordata, Class Ascidiacea) are a large group of invertebrates which occupy a central role in the ecology of marine benthic communities. Many ascidian species have become successfully introduced around the world via anthropogenic vectors. The botryllid ascidians (Order Stolidobranchia, Family Styelidae) are a group of 53 colonial species, several of which are widespread throughout temperate or tropical and subtropical waters. However, the systematics and biology of this group of ascidians are not well understood. To provide a systematic framework for this group, we have constructed a well-resolved phylogenomic tree using 200 novel loci and 55 specimens. A Principal Components Analysis of all species described in the literature using 31 taxonomic characteristics revealed that some species occupy a unique morphological space and can be easily identified using characteristics of adult colonies. For other species, additional information such as larval or life history characteristics may be required for taxonomic discrimination. Molecular barcodes are critical for guiding the delineation of morphologically similar species in this group.
Affiliation(s)
- Marie L Nydam
- Math and Science Program, Soka University of America, 1 University Drive, Aliso Viejo, CA, 92656, USA.
- Alan R Lemmon
- Department of Scientific Computing, Florida State University, 400 Dirac Science Library, Tallahassee, FL, 32306, USA
- Jesse R Cherry
- Department of Biological Science, Florida State University, 319 Stadium Drive, Tallahassee, FL, 32306, USA
- Michelle L Kortyna
- Department of Biological Science, Florida State University, 319 Stadium Drive, Tallahassee, FL, 32306, USA
- Darragh L Clancy
- Biology Department and Estuarine and Ocean Science Center, San Francisco State University, 3150 Paradise Drive, Tiburon, CA, 94920, USA
- Cecilia Hernandez
- Biology Department and Estuarine and Ocean Science Center, San Francisco State University, 3150 Paradise Drive, Tiburon, CA, 94920, USA
- C Sarah Cohen
- Biology Department and Estuarine and Ocean Science Center, San Francisco State University, 3150 Paradise Drive, Tiburon, CA, 94920, USA
27
Arcila D, Hughes LC, Meléndez-Vazquez F, Baldwin CC, White W, Carpenter K, Williams JT, Santos MD, Pogonoski J, Miya M, Ortí G, Betancur-R R. Testing the utility of alternative metrics of branch support to address the ancient evolutionary radiation of tunas, stromateoids, and allies (Teleostei: Pelagiaria). Syst Biol 2021; 70:1123-1144. [PMID: 33783539 DOI: 10.1093/sysbio/syab018] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/23/2021] [Accepted: 03/13/2021] [Indexed: 12/19/2022]
Abstract
The use of high-throughput sequencing technologies to produce genome-scale datasets was expected to settle some long-standing controversies across the Tree of Life, particularly in areas where short branches occur at deep timescales. Instead, these datasets have often yielded many well-supported but conflicting topologies, and highly variable gene-tree distributions. A variety of branch-support metrics beyond the nonparametric bootstrap are now available to assess how robust a phylogenetic hypothesis may be, as well as new methods to quantify gene-tree discordance. We applied multiple branch support metrics to an ancient group of marine fishes (Teleostei: Pelagiaria) whose interfamilial relationships have proven difficult to resolve due to a rapid accumulation of lineages very early in its history. We analyzed hundreds of loci including published UCE data and newly generated exonic data along with their flanking regions to represent all 16 extant families for more than 150 out of 284 valid species in the group. Branch support was lower for interfamilial relationships (except the SH-like aLRT and aBayes methods) regardless of the type of marker used. Several nodes that were highly supported with bootstrap had very low site and gene-tree concordance, revealing underlying conflict. Despite this conflict, we were able to identify four consistent interfamilial clades, each comprised of two or three families. Combining exons with their flanking regions also produced increased branch lengths in the deep branches of the pelagiarian tree. Our results demonstrate the limitations of employing current metrics of branch support and species-tree estimation when assessing the confidence of ancient evolutionary radiations and emphasize the necessity to embrace alternative measurements to explore phylogenetic uncertainty and discordance in phylogenomic datasets.
Affiliation(s)
- Dahiana Arcila
- Department of Ichthyology, Sam Noble Oklahoma Museum of Natural History, Norman, Oklahoma, U.S.A.; Department of Biology, University of Oklahoma, Norman, Oklahoma, U.S.A
- Lily C Hughes
- Department of Biological Sciences, The George Washington University, Washington, District of Columbia, U.S.A.; Department of Organismal Biology and Anatomy, The University of Chicago, Illinois, Chicago, U.S.A.; Department of Vertebrate Zoology, Smithsonian Institution National Museum of Natural History, Washington, District of Columbia, U.S.A
- Fernando Meléndez-Vazquez
- Department of Ichthyology, Sam Noble Oklahoma Museum of Natural History, Norman, Oklahoma, U.S.A.; Department of Biology, University of Oklahoma, Norman, Oklahoma, U.S.A
- Carole C Baldwin
- Department of Vertebrate Zoology, Smithsonian Institution National Museum of Natural History, Washington, District of Columbia, U.S.A
- William White
- CSIRO Australian National Fish Collection, National Research Collections Australia, Hobart, Hobart, Tasmania, Australia
- Kent Carpenter
- Department of Biological Sciences, Old Dominion University, Norfolk, Virginia, U.S.A
- Jeffrey T Williams
- Department of Vertebrate Zoology, Smithsonian Institution National Museum of Natural History, Washington, District of Columbia, U.S.A
- John Pogonoski
- CSIRO Australian National Fish Collection, National Research Collections Australia, Hobart, Hobart, Tasmania, Australia
- Masaki Miya
- Natural History Museum and Institute, Chiba, Aoba-cho, Chuo-ku, Chiba, Japan
- Guillermo Ortí
- Department of Biological Sciences, The George Washington University, Washington, District of Columbia, U.S.A.; Department of Vertebrate Zoology, Smithsonian Institution National Museum of Natural History, Washington, District of Columbia, U.S.A
28
Freitas FV, Branstetter MG, Griswold T, Almeida EAB. Partitioned Gene-Tree Analyses and Gene-Based Topology Testing Help Resolve Incongruence in a Phylogenomic Study of Host-Specialist Bees (Apidae: Eucerinae). Mol Biol Evol 2021; 38:1090-1100. [PMID: 33179746 PMCID: PMC7947843 DOI: 10.1093/molbev/msaa277] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 02/05/2023]
Abstract
Incongruence among phylogenetic results has become a common occurrence in analyses of genome-scale data sets. Incongruence originates from uncertainty in underlying evolutionary processes (e.g., incomplete lineage sorting) and from difficulties in determining the best analytical approaches for each situation. To overcome these difficulties, more studies are needed that identify incongruences and demonstrate practical ways to confidently resolve them. Here, we present results of a phylogenomic study based on the analysis of 197 taxa and 2,526 ultraconserved element (UCE) loci. We investigate evolutionary relationships of Eucerinae, a diverse subfamily of apid bees (relatives of honey bees and bumble bees) with >1,200 species. We sampled representatives of all tribes within the group and >80% of genera, including two mysterious South American genera, Chilimalopsis and Teratognatha. Initial analysis of the UCE data revealed two conflicting hypotheses for relationships among tribes. To resolve the incongruence, we tested concatenation and species tree approaches and used a variety of additional strategies including locus filtering, partitioned gene-tree searches, and gene-based topological tests. We show that within-locus partitioning improves gene tree and subsequent species-tree estimation, and that this approach confidently resolves the incongruence observed in our data set. After exploring our proposed analytical strategy on eucerine bees, we validated its efficacy to resolve hard phylogenetic problems by implementing it on a published UCE data set of Adephaga (Insecta: Coleoptera). Our results provide a robust phylogenetic hypothesis for Eucerinae and demonstrate a practical strategy for resolving incongruence in other phylogenomic data sets.
Affiliation(s)
- Felipe V Freitas
- Laboratório de Biologia Comparada e Abelhas (LBCA), Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Pollinating Insects Research Unit, Utah State University, Logan, UT
- Michael G Branstetter
- U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Pollinating Insects Research Unit, Utah State University, Logan, UT
- Terry Griswold
- U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Pollinating Insects Research Unit, Utah State University, Logan, UT
- Eduardo A B Almeida
- Laboratório de Biologia Comparada e Abelhas (LBCA), Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
29
Shen XX, Steenwyk JL, Rokas A. Dissecting incongruence between concatenation- and quartet-based approaches in phylogenomic data. Syst Biol 2021; 70:997-1014. [PMID: 33616672 DOI: 10.1093/sysbio/syab011] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/08/2020] [Revised: 02/10/2021] [Accepted: 02/17/2021] [Indexed: 12/12/2022]
Abstract
Topological conflict or incongruence is widespread in phylogenomic data. Concatenation- and coalescent-based approaches often result in incongruent topologies, but the causes of this conflict can be difficult to characterize. We examined incongruence stemming from conflict between likelihood-based signal (quantified by the difference in gene-wise log likelihood score or ΔGLS) and quartet-based topological signal (quantified by the difference in gene-wise quartet score or ΔGQS) for every gene in three phylogenomic studies in animals, fungi, and plants, which were chosen because their concatenation-based IQ-TREE (T1) and quartet-based ASTRAL (T2) phylogenies are known to produce eight conflicting internal branches (bipartitions). By comparing the types of phylogenetic signal for all genes in these three data matrices, we found that 30% - 36% of genes in each data matrix are inconsistent, that is, each of these genes has higher log likelihood score for T1 versus T2 (i.e., ΔGLS >0) whereas its T1 topology has lower quartet score than its T2 topology (i.e., ΔGQS <0) or vice versa. Comparison of inconsistent and consistent genes using a variety of metrics (e.g., evolutionary rate, gene tree topology, distribution of branch lengths, hidden paralogy, and gene tree discordance) showed that inconsistent genes are more likely to recover neither T1 nor T2 and have higher levels of gene tree discordance than consistent genes. Simulation analyses demonstrate that removal of inconsistent genes from datasets with low levels of incomplete lineage sorting (ILS) and low and medium levels of gene tree estimation error (GTEE) reduced incongruence and increased accuracy. In contrast, removal of inconsistent genes from datasets with medium and high ILS levels and high GTEE levels eliminated or extensively reduced incongruence, but the resulting congruent species phylogenies were not always topologically identical to the true species trees.
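The consistent/inconsistent distinction in this abstract can be illustrated with a minimal Python sketch. It assumes that ΔGLS and ΔGQS have already been computed per gene as the T1 value minus the T2 value. The `GeneSignal` class, the function names, and the choice to label genes with a zero difference as "uninformative" are illustrative assumptions, not part of the published workflow.

```python
from dataclasses import dataclass


@dataclass
class GeneSignal:
    """Per-gene support differences between two topologies (hypothetical container)."""
    name: str
    delta_gls: float  # log-likelihood score for T1 minus that for T2
    delta_gqs: float  # quartet score for T1 minus that for T2


def classify(gene: GeneSignal) -> str:
    """Label a gene by whether its two signals favour the same topology."""
    # Assumption: a zero difference favours neither tree, so the gene is set aside.
    if gene.delta_gls == 0 or gene.delta_gqs == 0:
        return "uninformative"
    same_sign = (gene.delta_gls > 0) == (gene.delta_gqs > 0)
    return "consistent" if same_sign else "inconsistent"


def inconsistent_fraction(genes: list[GeneSignal]) -> float:
    """Fraction of all supplied genes whose two signals disagree in sign."""
    if not genes:
        raise ValueError("no genes supplied")
    n_inconsistent = sum(classify(g) == "inconsistent" for g in genes)
    return n_inconsistent / len(genes)


if __name__ == "__main__":
    # Made-up values purely for demonstration.
    demo = [
        GeneSignal("geneA", 2.5, 3.0),
        GeneSignal("geneB", 1.2, -4.0),
        GeneSignal("geneC", -0.8, -1.0),
    ]
    for g in demo:
        print(g.name, classify(g))
```

A gene is flagged as inconsistent only when the two differences have opposite signs, which mirrors the "ΔGLS >0 whereas ΔGQS <0, or vice versa" wording of the abstract.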
Affiliation(s)
- Xing-Xing Shen
- State Key Laboratory of Rice Biology and Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, China; Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Jacob L Steenwyk
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
30
Sarver BAJ, Herrera ND, Sneddon D, Hunter SS, Settles ML, Kronenberg Z, Demboski JR, Good JM, Sullivan J. Diversification, Introgression, and Rampant Cytonuclear Discordance in Rocky Mountains Chipmunks (Sciuridae: Tamias). Syst Biol 2021; 70:908-921. [PMID: 33410870 DOI: 10.1093/sysbio/syaa085] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 06/14/2020] [Revised: 11/02/2020] [Accepted: 11/03/2020] [Indexed: 12/18/2022]
Abstract
Evidence from natural systems suggests that hybridization between animal species is more common than traditionally thought, but the overall contribution of introgression to standing genetic variation within species remains unclear for most animal systems. Here, we use targeted exon-capture to sequence thousands of nuclear loci and complete mitochondrial genomes from closely related chipmunk species in the Tamias quadrivittatus group that are distributed across the Great Basin and the central and southern Rocky Mountains of North America. This recent radiation includes six overlapping, ecologically distinct species (T. canipes, T. cinereicollis, T. dorsalis, T. quadrivittatus, T. rufus, and T. umbrinus) that show evidence for widespread introgression across species boundaries. Such evidence has historically been derived from a handful of markers, typically focused on mitochondrial loci, to describe patterns of introgression; consequently, the extent of introgression of nuclear genes is less well characterized. We conducted a series of phylogenomic and species-tree analyses to resolve the phylogeny of six species in this group. In addition, we performed several population genomic analyses to characterize nuclear genomes and infer coancestry among individuals. Furthermore, we used emerging quartets-based approaches to simultaneously infer the species tree (SVDquartets) and identify introgression (HyDe). We found that, in spite of rampant introgression of mitochondrial genomes between some species pairs (and sometimes involving up to three species), there appears to be little to no evidence for nuclear introgression. These findings mirror other genomic results where complete mitochondrial capture has occurred between chipmunk species in the absence of appreciable nuclear gene flow. The underlying causes of recurrent massive cytonuclear discordance remain unresolved in this group but mitochondrial DNA appears highly misleading of population histories as a whole. 
Collectively, it appears that chipmunk species boundaries are largely impermeable to nuclear gene flow and that hybridization, while pervasive with respect to mtDNA, has likely played a relatively minor role in the evolutionary history of this group.
Affiliation(s)
- Brice A J Sarver
- Department of Biological Sciences, University of Idaho, Moscow, Idaho; Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, Idaho
- David Sneddon
- Department of Biological Sciences, University of Idaho, Moscow, Idaho
- Samuel S Hunter
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, Idaho; UC-Davis Genome Center, Davis, California
- John R Demboski
- Department of Zoology, Denver Museum of Nature & Sciences, Denver, Colorado
- Jeffrey M Good
- Division of Biological Sciences, University of Montana, Missoula, Montana; Wildlife Biology Program, University of Montana, Missoula, Montana
- Jack Sullivan
- Department of Biological Sciences, University of Idaho, Moscow, Idaho; Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, Idaho
31
Abstract
The phylogeny of Neoaves, the largest clade of extant birds, has remained unclear despite intense study. The difficulty associated with resolving the early branches in Neoaves is likely driven by the rapid radiation of this group. However, conflicts among studies may be exacerbated by the data type analyzed. For example, analyses of coding exons typically yield trees that place Strisores (nightjars and allies) sister to the remaining Neoaves, while analyses of non-coding data typically yield trees where Mirandornites (flamingos and grebes) is the sister of the remaining Neoaves. Our understanding of data type effects is hampered by the fact that previous analyses have used different taxa, loci, and types of non-coding data. Herein, we provide strong corroboration of the data type effects hypothesis for Neoaves by comparing trees based on coding and non-coding data derived from the same taxa and gene regions. A simple analytical method known to minimize biases due to base composition (coding nucleotides as purines and pyrimidines) resulted in coding exon data with increased congruence to the non-coding topology using concatenated analyses. These results improve our understanding of the resolution of neoavian phylogeny and point to a challenge—data type effects—that is likely to be an important factor in phylogenetic analyses of birds (and many other taxonomic groups). Using our results, we provide a summary phylogeny that identifies well-corroborated relationships and highlights specific nodes where future efforts should focus.
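The recoding mentioned in this abstract (treating nucleotides as purines or pyrimidines, often called RY coding) can be sketched in a few lines of Python. The function name `ry_code` is hypothetical, and leaving gaps and ambiguity codes unchanged is an assumption made here for simplicity; the abstract does not say how the authors handled them.

```python
# Purines (A, G) become R; pyrimidines (C, T, and U in RNA) become Y.
RY_MAP = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"}


def ry_code(sequence: str) -> str:
    """Recode a nucleotide sequence into the two-state purine/pyrimidine alphabet.

    Characters outside A/C/G/T/U (gaps, N, other ambiguity codes) are passed
    through unchanged; this is an illustrative choice, not a published rule.
    """
    return "".join(RY_MAP.get(base, base) for base in sequence.upper())


if __name__ == "__main__":
    print(ry_code("ATGCCGTA"))
```

Because transitions (A↔G, C↔T) disappear under this coding, only transversions remain informative, which is why the approach is used to reduce base-composition bias.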
32
Lv X, Hu J, Hu Y, Li Y, Xu D, Ryder OA, Irwin DM, Yu L. Diverse phylogenomic datasets uncover a concordant scenario of laurasiatherian interordinal relationships. Mol Phylogenet Evol 2020; 157:107065. [PMID: 33387649 DOI: 10.1016/j.ympev.2020.107065] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/09/2020] [Revised: 12/22/2020] [Accepted: 12/24/2020] [Indexed: 10/22/2022]
Abstract
Resolving the interordinal relationships in the mammalian superorder Laurasiatheria has been among the most intractable problems in higher-level mammalian systematics, with many conflicting hypotheses having been proposed. The present study collected three different sources of genome-scale data with comprehensive taxon sampling of laurasiatherian species, including two protein-coding datasets (4,186 protein-coding genes for an amino acid dataset comprising 2,761,247 amino acid residues and a nucleotide dataset comprising 5,516,340 nucleotides from 1st and 2nd codon positions), an intronic dataset (1,210 introns comprising 1,162,723 nucleotides) and an ultraconserved elements (UCEs) dataset (1,246 UCEs comprising 1,946,472 nucleotides) from 40 species representing all six laurasiatherian orders and 7 non-laurasiatherian outgroups. Remarkably, phylogenetic trees reconstructed with the four datasets using different tree-building methods (RAxML, FastTree, ASTRAL and MP-EST) all supported the relationship (Eulipotyphla, (Chiroptera, ((Carnivora, Pholidota), (Cetartiodactyla, Perissodactyla)))). We find a resolution of interordinal relationships of Laurasiatheria among all types of markers used in the present study, and the likelihood ratio tests for tree comparisons confirmed that the present tree topology is the optimal hypothesis compared to other examined hypotheses. Jackknifing subsampling analyses demonstrate that the results of laurasiatherian tree reconstruction varied with the number of loci and ordinal representatives used, which are likely the two main contributors to phylogenetic disagreements of Laurasiatheria seen in previous studies. Our study provides significant insight into laurasiatherian evolution, and moreover, an important methodological strategy and reference for resolving phylogenies of adaptive radiation, which have been a long-standing challenge in the field of phylogenetics.
Affiliation(s)
- Xue Lv
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China; School of Life Sciences, Yunnan University, Kunming, China
- Jingyang Hu
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China; School of Life Sciences, Yunnan University, Kunming, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, China
- Yiwen Hu
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China; School of Life Sciences, Yunnan University, Kunming, China
- Yitian Li
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China; School of Life Sciences, Yunnan University, Kunming, China
- Dongming Xu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Kunming, China
- Oliver A Ryder
- Institute for Conservation Research, San Diego Zoo Global, Escondido, CA, USA
- David M Irwin
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada
- Li Yu
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China.
33
Bossert S, Murray EA, Pauly A, Chernyshov K, Brady SG, Danforth BN. Gene Tree Estimation Error with Ultraconserved Elements: An Empirical Study on Pseudapis Bees. Syst Biol 2020; 70:803-821. [PMID: 33367855 DOI: 10.1093/sysbio/syaa097] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/05/2019] [Revised: 11/18/2020] [Accepted: 12/02/2020] [Indexed: 11/12/2022]
Abstract
Summarizing individual gene trees to species phylogenies using two-step coalescent methods is now a standard strategy in the field of phylogenomics. However, practical implementations of summary methods suffer from gene tree estimation error, which is caused by various biological and analytical factors. Greatly understudied is the choice of gene tree inference method and downstream effects on species tree estimation for empirical data sets. To better understand the impact of this method choice on gene and species tree accuracy, we compare gene trees estimated through four widely used programs under different model-selection criteria: PhyloBayes, MrBayes, IQ-Tree, and RAxML. We study their performance in the phylogenomic framework of >800 ultraconserved elements from the bee subfamily Nomiinae (Halictidae). Our taxon sampling focuses on the genus Pseudapis, a distinct lineage with diverse morphological features, but contentious morphology-based taxonomic classifications and no molecular phylogenetic guidance. We approximate topological accuracy of gene trees by assessing their ability to recover two uncontroversial, monophyletic groups, and compare branch lengths of individual trees using the stemminess metric (the relative length of internal branches). We further examine different strategies of removing uninformative loci and the collapsing of weakly supported nodes into polytomies. We then summarize gene trees with ASTRAL and compare resulting species phylogenies, including comparisons to concatenation-based estimates. Gene trees obtained with the reversible jump model search in MrBayes were most concordant on average and all Bayesian methods yielded gene trees with better stemminess values. The only gene tree estimation approach whose ASTRAL summary trees consistently produced the most likely correct topology, however, was IQ-Tree with automated model designation (ModelFinder program). 
We discuss these findings and provide practical advice on gene tree estimation for summary methods. Lastly, we establish the first phylogeny-informed classification for Pseudapis s. l. and map the distribution of distinct morphological features of the group. [ASTRAL; Bees; concordance; gene tree estimation error; IQ-Tree; MrBayes, Nomiinae; PhyloBayes; RAxML; phylogenomics; stemminess].
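The stemminess metric described in this abstract as "the relative length of internal branches" can be illustrated with a short Python sketch. It assumes one simple reading of that phrase: the summed length of internal branches divided by the total tree length. Other formulations of stemminess exist, and the abstract does not specify which one the authors used. The nested-tuple tree encoding `(branch_length, [children])` and both function names are hypothetical.

```python
from typing import List, Tuple

# A node is (length of the branch leading to it, list of child nodes).
# Leaves have an empty child list; the root's own branch length is ignored.
Node = Tuple[float, List["Node"]]


def branch_sums(node: Node, is_root: bool = True) -> Tuple[float, float]:
    """Return (sum of internal branch lengths, sum of all branch lengths)."""
    length, children = node
    internal = 0.0
    total = 0.0
    if not is_root:
        total += length
        if children:  # a branch leading to a non-leaf node is internal
            internal += length
    for child in children:
        child_internal, child_total = branch_sums(child, is_root=False)
        internal += child_internal
        total += child_total
    return internal, total


def stemminess(tree: Node) -> float:
    """Proportion of total tree length carried by internal branches."""
    internal, total = branch_sums(tree)
    if total == 0:
        raise ValueError("tree has zero total branch length")
    return internal / total


if __name__ == "__main__":
    # ((A:0.1,B:0.1):0.2,C:0.3); written in the tuple encoding above.
    demo_tree: Node = (0.0, [(0.2, [(0.1, []), (0.1, [])]), (0.3, [])])
    print(round(stemminess(demo_tree), 4))
```

Under this reading, higher values mean that more of the tree's length sits on internal branches relative to terminal ones.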
Affiliation(s)
- Silas Bossert
- Department of Entomology, Cornell University, Comstock Hall, Ithaca, NY 14853, USA; Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA; Department of Entomology, Washington State University, Pullman, Washington 99164, USA
- Elizabeth A Murray
- Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA; Department of Entomology, Washington State University, Pullman, Washington 99164, USA
- Alain Pauly
- O.D. Taxonomy and Phylogeny, Royal Belgian Institute of Natural Sciences, Rue Vautier 29, 1000 Brussels, Belgium
- Kyrylo Chernyshov
- College of Arts and Sciences, Cornell University, Ithaca, NY 14853, USA
- Seán G Brady
- Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
- Bryan N Danforth
- Department of Entomology, Cornell University, Comstock Hall, Ithaca, NY 14853, USA
34
Cai L, Xi Z, Lemmon EM, Lemmon AR, Mast A, Buddenhagen CE, Liu L, Davis CC. The Perfect Storm: Gene Tree Estimation Error, Incomplete Lineage Sorting, and Ancient Gene Flow Explain the Most Recalcitrant Ancient Angiosperm Clade, Malpighiales. Syst Biol 2020; 70:491-507. [PMID: 33169797 DOI: 10.1093/sysbio/syaa083] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 03/19/2019] [Revised: 10/20/2020] [Accepted: 10/28/2020] [Indexed: 12/20/2022]
Abstract
The genomic revolution offers renewed hope of resolving rapid radiations in the Tree of Life. The development of the multispecies coalescent model and improved gene tree estimation methods can better accommodate gene tree heterogeneity caused by incomplete lineage sorting (ILS) and gene tree estimation error stemming from the short internal branches. However, the relative influence of these factors in species tree inference is not well understood. Using anchored hybrid enrichment, we generated a data set including 423 single-copy loci from 64 taxa representing 39 families to infer the species tree of the flowering plant order Malpighiales. This order includes 9 of the top 10 most unstable nodes in angiosperms, which have been hypothesized to arise from the rapid radiation during the Cretaceous. Here, we show that coalescent-based methods do not resolve the backbone of Malpighiales and concatenation methods yield inconsistent estimations, providing evidence that gene tree heterogeneity is high in this clade. Despite high levels of ILS and gene tree estimation error, our simulations demonstrate that these two factors alone are insufficient to explain the lack of resolution in this order. To explore this further, we examined triplet frequencies among empirical gene trees and discovered some of them deviated significantly from those attributed to ILS and estimation error, suggesting gene flow as an additional and previously unappreciated phenomenon promoting gene tree variation in Malpighiales. Finally, we applied a novel method to quantify the relative contribution of these three primary sources of gene tree heterogeneity and demonstrated that ILS, gene tree estimation error, and gene flow contributed to 10.0%, 34.8%, and 21.4% of the variation, respectively. 
Together, our results suggest that a perfect storm of factors likely influence this lack of resolution, and further indicate that recalcitrant phylogenetic relationships like the backbone of Malpighiales may be better represented as phylogenetic networks. Thus, reducing such groups solely to existing models that adhere strictly to bifurcating trees greatly oversimplifies reality, and obscures our ability to more clearly discern the process of evolution. [Coalescent; concatenation; flanking region; hybrid enrichment, introgression; phylogenomics; rapid radiation, triplet frequency.].
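The triplet-frequency reasoning in this abstract rests on a standard expectation: under ILS alone, the two minority topologies of a rooted triplet should appear at equal frequencies across gene trees, so a strong asymmetry between them points to another process such as gene flow. The Python sketch below illustrates that idea with a one-degree-of-freedom chi-square test on the two minority counts. It is a simplified stand-in, not the authors' method (the abstract indicates that their null distribution also accounted for estimation error), and the function name and example counts are hypothetical.

```python
import math
from typing import Tuple


def minor_triplet_asymmetry(n_minor_a: int, n_minor_b: int) -> Tuple[float, float]:
    """Chi-square test (1 d.f.) that two minority triplet topologies are equally frequent.

    Returns (test statistic, p-value). Equal expected counts for the two
    minority topologies is the ILS-only null hypothesis assumed here.
    """
    total = n_minor_a + n_minor_b
    if total == 0:
        raise ValueError("no minority-topology gene trees to test")
    # With expected counts of total/2 each, the statistic simplifies to this form.
    statistic = (n_minor_a - n_minor_b) ** 2 / total
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Made-up counts of gene trees supporting each minority topology.
    stat, p = minor_triplet_asymmetry(80, 40)
    print(f"chi-square = {stat:.3f}, p = {p:.2e}")
```

In practice, many triplets are tested at once, so some correction for multiple comparisons would be needed before interpreting any single asymmetry.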
Affiliation(s)
- Liming Cai
  - Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, MA 02138, USA
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Zhenxiang Xi
  - Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, MA 02138, USA
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Emily Moriarty Lemmon
  - Department of Biological Sciences, Florida State University, Tallahassee, FL 32306, USA
- Alan R Lemmon
  - Department of Scientific Computing, Florida State University, Tallahassee, FL 32306, USA
- Austin Mast
  - Department of Biological Sciences, Florida State University, Tallahassee, FL 32306, USA
- Christopher E Buddenhagen
  - Department of Biological Sciences, Florida State University, Tallahassee, FL 32306, USA
  - AgResearch, 10 Bisley Road, Hamilton 3214, New Zealand
- Liang Liu
  - Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Charles C Davis
  - Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, MA 02138, USA
|
35
|
Meerow AW, Gardner EM, Nakamura K. Phylogenomics of the Andean Tetraploid Clade of the American Amaryllidaceae (Subfamily Amaryllidoideae): Unlocking a Polyploid Generic Radiation Abetted by Continental Geodynamics. Front Plant Sci 2020; 11:582422. [PMID: 33250911 PMCID: PMC7674842 DOI: 10.3389/fpls.2020.582422] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/11/2020] [Accepted: 10/12/2020] [Indexed: 05/27/2023]
Abstract
One of the two major clades of the endemic American Amaryllidaceae subfam. Amaryllidoideae constitutes the tetraploid-derived (n = 23) Andean-centered tribes, most of which have 46 chromosomes. Despite progress in resolving phylogenetic relationships of the group with plastid and nrDNA, certain subclades were poorly resolved or weakly supported in those previous studies. Sequence capture using anchored hybrid enrichment was employed across 95 species of the clade along with five outgroups and generated sequences of 524 nuclear genes and a partial plastome. Maximum likelihood phylogenetic analyses were conducted on concatenated supermatrices, and coalescent-based species tree analyses were run on the gene trees, followed by hybridization network, age diversification and biogeographic analyses. The four tribes Clinantheae, Eucharideae, Eustephieae, and Hymenocallideae (sister to Clinantheae) are resolved in all analyses with > 90 and mostly 100% support, as are almost all genera within them. Nuclear gene supermatrix and species tree results were largely in concordance; however, some instances of cytonuclear discordance were evident. Hybridization network analysis identified significant reticulation in Clinanthus, Hymenocallis, Stenomesson and the subclade of Eucharideae comprising Eucharis, Caliphruria, and Urceolina. Our data support a previous treatment of the latter as a single genus, Urceolina, with the addition of Eucrosia dodsonii. Biogeographic analysis and penalized likelihood age estimation suggests an origin in the Cauca, Desert and Puna Neotropical bioprovinces for the complex in the mid-Oligocene, with more dispersals than vicariances in its history, but no extinctions. Hymenocallis represents the only instance of long-distance vicariance from the tropical Andean origin of its tribe Hymenocallideae. The absence of extinctions correlates with the lack of diversification rate shifts within the clade. 
The Eucharideae experienced a sudden lineage radiation ca. 10 Mya. We tie much of the divergences in the Andean-centered lineages to the rise of the Andes, and suggest that the Amotape-Huancabamba Zone functioned as both a corridor (dispersal) and a barrier to migration (vicariance). Several taxonomic changes are made. This is the largest DNA sequence data set to be applied within Amaryllidaceae to date.
Affiliation(s)
- Alan W. Meerow
  - USDA-ARS-SHRS, National Clonal Germplasm Repository, Miami, FL, United States
- Elliot M. Gardner
  - Singapore Botanic Gardens, National Parks Board, Singapore, Singapore
  - Institute of Environment, Florida International University, Miami, FL, United States
- Kyoko Nakamura
  - USDA-ARS-SHRS, National Clonal Germplasm Repository, Miami, FL, United States
|
36
|
Chan KO, Hutter CR, Wood PL, Grismer LL, Das I, Brown RM. Gene flow creates a mirage of cryptic species in a Southeast Asian spotted stream frog complex. Mol Ecol 2020; 29:3970-3987. [PMID: 32808335 DOI: 10.1111/mec.15603] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 11/06/2019] [Revised: 07/29/2020] [Accepted: 08/13/2020] [Indexed: 02/06/2023]
Abstract
Most new cryptic species are described using conventional tree- and distance-based species delimitation methods (SDMs), which rely on phylogenetic arrangements and measures of genetic divergence. However, although numerous factors such as population structure and gene flow are known to confound phylogenetic inference and species delimitation, the influence of these processes is not frequently evaluated. Using large numbers of exons, introns, and ultraconserved elements obtained using the FrogCap sequence-capture protocol, we compared conventional SDMs with more robust genomic analyses that assess population structure and gene flow to characterize species boundaries in a Southeast Asian frog complex (Pulchrana picturata). Our results showed that gene flow and introgression can produce phylogenetic patterns and levels of divergence that resemble distinct species (up to 10% divergence in mitochondrial DNA). Hybrid populations were inferred as independent (singleton) clades that were highly divergent from adjacent populations (7%-10%) and unusually similar (<3%) to allopatric populations. Such anomalous patterns are not uncommon in Southeast Asian amphibians, which brings into question whether the high levels of cryptic diversity observed in other amphibian groups reflect distinct cryptic species or, instead, highly admixed and structured metapopulation lineages. Our results also provide an alternative explanation to the conundrum of divergent (sometimes nonsister) sympatric lineages, a pattern that has been celebrated as indicative of true cryptic speciation. Based on these findings, we recommend that species delimitation of continuously distributed "cryptic" groups should not rely solely on conventional SDMs, but should necessarily examine population structure and gene flow to avoid taxonomic inflation.
Affiliation(s)
- Kin O Chan
  - Lee Kong Chian Natural History Museum, Faculty of Science, National University of Singapore, Singapore
- Carl R Hutter
  - Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA
  - Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Perry L Wood
  - Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA
  - Department of Biological Sciences & Museum of Natural History, Auburn University, Auburn, AL, USA
- L L Grismer
  - Herpetology Laboratory, Department of Biology, La Sierra University, Riverside, CA, USA
- Indraneil Das
  - Institute of Biodiversity and Environmental Conservation, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
- Rafe M Brown
  - Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA
|
37
|
Chan KO, Hutter CR, Wood PL, Grismer LL, Brown RM. Larger, unfiltered datasets are more effective at resolving phylogenetic conflict: Introns, exons, and UCEs resolve ambiguities in Golden-backed frogs (Anura: Ranidae; genus Hylarana). Mol Phylogenet Evol 2020; 151:106899. [PMID: 32590046 DOI: 10.1016/j.ympev.2020.106899] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/03/2020] [Revised: 05/18/2020] [Accepted: 06/17/2020] [Indexed: 01/01/2023]
Abstract
Using FrogCap, a recently-developed sequence-capture protocol, we obtained >12,000 highly informative exons, introns, and ultraconserved elements (UCEs), which we used to illustrate variation in evolutionary histories of these classes of markers, and to resolve long-standing systematic problems in Southeast Asian Golden-backed frogs of the genus-complex Hylarana. We also performed a comprehensive suite of analyses to assess the relative performance of different genetic markers, data filtering strategies, tree inference methods, and different measures of branch support. To reduce gene tree estimation error, we filtered the data using different thresholds of taxon completeness (missing data) and parsimony informative sites (PIS). We then estimated species trees using concatenated datasets and Maximum Likelihood (IQ-TREE) in addition to summary (ASTRAL-III), distance-based (ASTRID), and site-based (SVDQuartets) multispecies coalescent methods. Topological congruence and branch support were examined using traditional bootstrap, local posterior probabilities, gene concordance factors, quartet frequencies, and quartet scores. Our results did not yield a single concordant topology. Instead, introns, exons, and UCEs clearly possessed different phylogenetic signals, resulting in conflicting, yet strongly-supported phylogenetic estimates. However, a combined analysis comprising the most informative introns, exons, and UCEs converged on a similar topology across all analyses, with the exception of SVDQuartets. Bootstrap values were consistently high despite high levels of incongruence and high proportions of gene trees supporting conflicting topologies. Although low bootstrap values did indicate low heuristic support, high bootstrap support did not necessarily reflect congruence or support for the correct topology. 
This study reiterates findings of some previous studies, which demonstrated that traditional bootstrap values can produce positively misleading measures of support in large phylogenomic datasets. We also showed a remarkably strong positive relationship between branch length and topological congruence across all datasets, implying that very short internodes remain a challenge to resolve, even with orders of magnitude more data than ever before. Overall, our results demonstrate that more data from unfiltered or combined datasets produced superior results. Although data filtering reduced gene tree incongruence, decreased amounts of data also biased phylogenetic estimation. A point of diminishing returns was evident, at which higher congruence (from more stringent filtering) at the expense of amount of data led to topological error as assessed by comparison to more complete datasets across different genomic markers. Additionally, we showed that applying a parameter-rich model to a partitioned analysis of concatenated data produces better results compared to unpartitioned, or even partitioned analysis using model selection. Despite some lingering uncertainties, a combined analysis of our genomic data and sequences supplemented from GenBank (on the basis of a few gene regions) revealed highly supported novel systematic arrangements. Based on these new findings, we transfer Amnirana nicobariensis into the genus Indosylvirana; and I. milleti and Hylarana celebensis to the genus Papurana. We also provisionally place H. attigua in the genus Papurana pending verification from positively identified (voucher substantiated) samples.
Affiliation(s)
- Kin Onn Chan
  - Lee Kong Chian Natural History Museum, Faculty of Science, National University of Singapore, 2 Conservatory Drive, 117377, Singapore
- Carl R Hutter
  - Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA
  - Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Perry L Wood
  - Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
  - Department of Biological Sciences & Museum of Natural History, Auburn University, Auburn, AL 36849, USA
- L Lee Grismer
  - Herpetology Laboratory, Department of Biology, La Sierra University, 4500 Riverwalk Parkway, Riverside, CA 92505, USA
- Rafe M Brown
  - Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA
|
38
|
Perea S, Sousa‐Santos C, Robalo J, Doadrio I. Multilocus phylogeny and systematics of Iberian endemic Squalius (Actinopterygii, Leuciscidae). Zool Scr 2020. [DOI: 10.1111/zsc.12420] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Affiliation(s)
- Silvia Perea
  - Department of Biodiversity and Evolutionary Biology, Museo Nacional de Ciencias Naturales - CSIC, Madrid, Spain
- Carla Sousa‐Santos
  - MARE – Marine and Environmental Sciences Centre, ISPA‐Instituto Universitário, Lisbon, Portugal
- Joana Robalo
  - MARE – Marine and Environmental Sciences Centre, ISPA‐Instituto Universitário, Lisbon, Portugal
- Ignacio Doadrio
  - Department of Biodiversity and Evolutionary Biology, Museo Nacional de Ciencias Naturales - CSIC, Madrid, Spain
|
39
|
Karin BR, Gamble T, Jackman TR. Optimizing Phylogenomics with Rapidly Evolving Long Exons: Comparison with Anchored Hybrid Enrichment and Ultraconserved Elements. Mol Biol Evol 2020; 37:904-922. [PMID: 31710677 PMCID: PMC7038749 DOI: 10.1093/molbev/msz263] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/17/2022]
Abstract
Marker selection has emerged as an important component of phylogenomic study design due to rising concerns of the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long in length (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC to and discuss important aspects of comparable methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups.
Affiliation(s)
- Benjamin R Karin
  - Department of Biology, Villanova University, Villanova, PA
  - Museum of Vertebrate Zoology and Department of Integrative Biology, University of California, Berkeley, CA
- Tony Gamble
  - Department of Biological Sciences, Marquette University, Milwaukee, WI
  - Milwaukee Public Museum, Milwaukee, WI
  - Bell Museum of Natural History, University of Minnesota, St. Paul, MN
- Todd R Jackman
  - Department of Biology, Villanova University, Villanova, PA
|
40
|
Bagley JC, Uribe-Convers S, Carlsen MM, Muchhala N. Utility of targeted sequence capture for phylogenomics in rapid, recent angiosperm radiations: Neotropical Burmeistera bellflowers as a case study. Mol Phylogenet Evol 2020; 152:106769. [PMID: 32081762 DOI: 10.1016/j.ympev.2020.106769] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 12/09/2019] [Revised: 02/10/2020] [Accepted: 02/12/2020] [Indexed: 02/06/2023]
Abstract
Targeted sequence capture is a promising approach for large-scale phylogenomics. However, rapid evolutionary radiations pose significant challenges for phylogenetic inference (e.g. incomplete lineage sorting (ILS), phylogenetic noise), and the ability of targeted nuclear loci to resolve species trees despite such issues remains poorly studied. We test the utility of targeted sequence capture for inferring phylogenetic relationships in rapid, recent angiosperm radiations, focusing on Burmeistera bellflowers (Campanulaceae), which diversified into ~130 species over less than 3 million years. We compared phylogenies estimated from supercontig (exons plus flanking sequences), exon-only, and flanking-only datasets with 506-546 loci (~4.7 million bases) for 46 Burmeistera species/lineages and 10 outgroup taxa. Nuclear loci resolved backbone nodes and many congruent internal relationships with high support in concatenation and coalescent-based species tree analyses, and inferences were largely robust to effects of missing taxa and base composition biases. Nevertheless, species trees were incongruent between datasets, and gene trees exhibited remarkably high levels of conflict (~4-60% congruence, ~40-99% conflict) not simply driven by poor gene tree resolution. Higher gene tree heterogeneity at shorter branches suggests an important role of ILS, as expected for rapid radiations. Phylogenetic informativeness analyses also suggest this incongruence has resulted from low resolving power at short internal branches, consistent with ILS, and homoplasy at deeper nodes, with exons exhibiting much greater risk of incorrect topologies due to homoplasy than other datasets. Our findings suggest that targeted sequence capture is feasible for resolving rapid, recent angiosperm radiations, and that results based on supercontig alignments containing nuclear exons and flanking sequences have higher phylogenetic utility and accuracy than either alone. 
We use our results to make practical recommendations for future target capture-based studies of Burmeistera and other rapid angiosperm radiations, including that such studies should analyze supercontigs to maximize the phylogenetic information content of loci.
Affiliation(s)
- Justin C Bagley
  - Department of Biology, University of Missouri-St. Louis, St. Louis, MO 63121, USA
  - Department of Biology, Virginia Commonwealth University, Richmond, VA 23284, USA
- Simon Uribe-Convers
  - Department of Biology, University of Missouri-St. Louis, St. Louis, MO 63121, USA
- Mónica M Carlsen
  - Research Department, Science and Conservation Division, Missouri Botanical Garden, St. Louis, MO 63110, USA
- Nathan Muchhala
  - Department of Biology, University of Missouri-St. Louis, St. Louis, MO 63121, USA
|
41
|
Granados Mendoza C, Jost M, Hágsater E, Magallón S, van den Berg C, Lemmon EM, Lemmon AR, Salazar GA, Wanke S. Target Nuclear and Off-Target Plastid Hybrid Enrichment Data Inform a Range of Evolutionary Depths in the Orchid Genus Epidendrum. Front Plant Sci 2020; 10:1761. [PMID: 32063915 PMCID: PMC7000662 DOI: 10.3389/fpls.2019.01761] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 09/02/2019] [Accepted: 12/16/2019] [Indexed: 05/12/2023]
Abstract
Universal angiosperm enrichment probe sets designed to enrich hundreds of putatively orthologous nuclear single-copy loci are increasingly being applied to infer phylogenetic relationships of different lineages of angiosperms at a range of evolutionary depths. Studies applying such probe sets have focused on testing the universality and performance of the target nuclear loci, but they have not taken advantage of off-target data from other genome compartments generated alongside the nuclear loci. Here we do so to infer phylogenetic relationships in the orchid genus Epidendrum and closely related genera of subtribe Laeliinae. Our aims are to: 1) test the technical viability of applying the plant anchored hybrid enrichment (AHE) method (Angiosperm v.1 probe kit) to our focal group, 2) mine plastid protein coding genes from off-target reads; and 3) evaluate the performance of the target nuclear and off-target plastid loci in resolving and supporting phylogenetic relationships along a range of taxonomical depths. Phylogenetic relationships were inferred from the nuclear data set through coalescent summary and site-based methods, whereas plastid loci were analyzed in a concatenated partitioned matrix under maximum likelihood. The usefulness of target and flanking non-target nuclear regions and plastid loci was assessed through the estimation of their phylogenetic informativeness. Our study successfully applied the plant AHE probe kit to Epidendrum, supporting the universality of this kit in angiosperms. Moreover, it demonstrated the feasibility of mining plastome loci from off-target reads generated with the Angiosperm v.1 probe kit to obtain additional, uniparentally inherited sequence data at no extra sequencing cost. Our analyses detected some strongly supported incongruences between nuclear and plastid data sets at shallow divergences, an indication of potential lineage sorting, hybridization, or introgression events in the group. 
Lastly, we found that the per site phylogenetic informativeness of the ycf1 plastid gene surpasses that of all other plastid genes and several nuclear loci, making it an excellent candidate for assessing phylogenetic relationships at medium to low taxonomic levels in orchids.
Affiliation(s)
- Carolina Granados Mendoza
  - Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Matthias Jost
  - Institut für Botanik, Technische Universität Dresden, Dresden, Germany
- Eric Hágsater
  - Herbario AMO, Instituto Chinoin, A.C., Mexico City, Mexico
- Susana Magallón
  - Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Cássio van den Berg
  - Departamento de Ciências Biológicas, Universidade Estadual de Feira de Santana, Feira de Santana, Brazil
- Emily Moriarty Lemmon
  - Department of Biological Science, Florida State University, Tallahassee, FL, United States
- Alan R. Lemmon
  - Department of Scientific Computing, Florida State University, Tallahassee, FL, United States
- Gerardo A. Salazar
  - Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Stefan Wanke
  - Institut für Botanik, Technische Universität Dresden, Dresden, Germany
|
42
|
Prasanna AN, Gerber D, Kijpornyongpan T, Aime MC, Doyle VP, Nagy LG. Model Choice, Missing Data, and Taxon Sampling Impact Phylogenomic Inference of Deep Basidiomycota Relationships. Syst Biol 2020; 69:17-37. [PMID: 31062852 DOI: 10.1093/sysbio/syz029] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 10/19/2017] [Revised: 04/21/2019] [Accepted: 04/26/2019] [Indexed: 11/12/2022]
Abstract
Resolving deep divergences in the tree of life is challenging even for analyses of genome-scale phylogenetic data sets. Relationships between Basidiomycota subphyla, the rusts and allies (Pucciniomycotina), smuts and allies (Ustilaginomycotina), and mushroom-forming fungi and allies (Agaricomycotina) were found particularly recalcitrant both to traditional multigene and genome-scale phylogenetics. Here, we address basal Basidiomycota relationships using concatenated and gene tree-based analyses of various phylogenomic data sets to examine the contribution of several potential sources of bias. We evaluate the contribution of biological causes (hard polytomy, incomplete lineage sorting) versus unmodeled evolutionary processes and factors that exacerbate their effects (e.g., fast-evolving sites and long-branch taxa) to inferences of basal Basidiomycota relationships. Bayesian Markov Chain Monte Carlo and likelihood mapping analyses reject the hard polytomy with confidence. In concatenated analyses, fast-evolving sites and oversimplified models of amino acid substitution favored the grouping of smuts with mushroom-forming fungi, often leading to maximal bootstrap support in both concatenation and coalescent analyses. On the contrary, the most conserved data subsets grouped rusts and allies with mushroom-forming fungi, although this relationship proved labile, sensitive to model choice, to different data subsets and to missing data. Excluding putative long-branch taxa, genes with high proportions of missing data and/or with strong signal failed to reveal a consistent trend toward one or the other topology, suggesting that additional sources of conflict are at play. 
While concatenated analyses yielded strong but conflicting support, individual gene trees mostly provided poor support for any resolution of rusts, smuts, and mushroom-forming fungi, suggesting that the true Basidiomycota tree might be in a part of tree space that is difficult to access using both concatenation and gene tree-based approaches. Inference-based assessments of absolute model fit strongly reject best-fit models for the vast majority of genes, indicating a poor fit of even the most commonly used models. While this is consistent with previous assessments of site-homogenous models of amino acid evolution, this does not appear to be the sole source of confounding signal. Our analyses suggest that topologies uniting smuts with mushroom-forming fungi can arise as a result of inappropriate modeling of amino acid sites that might be prone to systematic bias. We speculate that improved models of sequence evolution could shed more light on basal splits in the Basidiomycota, which, for now, remain unresolved despite the use of whole genome data.
Affiliation(s)
- Arun N Prasanna
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
- Daniel Gerber
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
  - Institute of Archaeology, Research Centre for the Humanities, Hungarian Academy of Sciences, Budapest 1097, Hungary
- M Catherine Aime
  - Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN 47907, USA
- Vinson P Doyle
  - Department of Plant Pathology and Crop Physiology, Louisiana State University AgCenter, Baton Rouge, LA 70803, USA
- Laszlo G Nagy
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
|
43
|
Springer MS, Molloy EK, Sloan DB, Simmons MP, Gatesy J. ILS-Aware Analysis of Low-Homoplasy Retroelement Insertions: Inference of Species Trees and Introgression Using Quartets. J Hered 2019; 111:147-168. [DOI: 10.1093/jhered/esz076] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/13/2019] [Accepted: 12/12/2019] [Indexed: 12/20/2022]
Abstract
DNA sequence alignments have provided the majority of data for inferring phylogenetic relationships with both concatenation and coalescent methods. However, DNA sequences are susceptible to extensive homoplasy, especially for deep divergences in the Tree of Life. Retroelement insertions have emerged as a powerful alternative to sequences for deciphering evolutionary relationships because these data are nearly homoplasy-free. In addition, retroelement insertions satisfy the “no intralocus-recombination” assumption of summary coalescent methods because they are singular events and better approximate neutrality relative to DNA loci commonly sampled in phylogenomic studies. Retroelements have traditionally been analyzed with parsimony, distance, and network methods. Here, we analyze retroelement data sets for vertebrate clades (Placentalia, Laurasiatheria, Balaenopteroidea, Palaeognathae) with 2 ILS-aware methods that operate by extracting, weighting, and then assembling unrooted quartets into a species tree. The first approach constructs a species tree from retroelement bipartitions with ASTRAL, and the second method is based on split-decomposition with parsimony. We also develop a Quartet-Asymmetry test to detect hybridization using retroelements. Both ILS-aware methods recovered the same species-tree topology for each data set. The ASTRAL species trees for Laurasiatheria have consecutive short branch lengths in the anomaly zone whereas Palaeognathae is outside of this zone. For the Balaenopteroidea data set, which includes rorquals (Balaenopteridae) and gray whale (Eschrichtiidae), both ILS-aware methods resolved balaenopterids as paraphyletic. Application of the Quartet-Asymmetry test to this data set detected 19 different quartets of species for which historical introgression may be inferred. Evidence for introgression was not detected in the other data sets.
Affiliation(s)
- Mark S Springer
  - Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA
- Erin K Molloy
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL
- Daniel B Sloan
  - Department of Biology, Colorado State University, Fort Collins, CO
- Mark P Simmons
  - Department of Biology, Colorado State University, Fort Collins, CO
- John Gatesy
  - Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY
|
44
|
Cloutier A, Sackton TB, Grayson P, Clamp M, Baker AJ, Edwards SV. Whole-Genome Analyses Resolve the Phylogeny of Flightless Birds (Palaeognathae) in the Presence of an Empirical Anomaly Zone. Syst Biol 2019; 68:937-955. [PMID: 31135914 PMCID: PMC6857515 DOI: 10.1093/sysbio/syz019] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 02/09/2018] [Revised: 03/06/2019] [Accepted: 04/09/2019] [Indexed: 01/17/2023]
Abstract
Palaeognathae represent one of the two basal lineages in modern birds, and comprise the volant (flighted) tinamous and the flightless ratites. Resolving palaeognath phylogenetic relationships has historically proved difficult, and short internal branches separating major palaeognath lineages in previous molecular phylogenies suggest that extensive incomplete lineage sorting (ILS) might have accompanied a rapid ancient divergence. Here, we investigate palaeognath relationships using genome-wide data sets of three types of noncoding nuclear markers, together totaling 20,850 loci and over 41 million base pairs of aligned sequence data. We recover a fully resolved topology placing rheas as the sister to kiwi and emu + cassowary that is congruent across marker types for two species tree methods (MP-EST and ASTRAL-II). This topology is corroborated by patterns of insertions for 4274 CR1 retroelements identified from multispecies whole-genome screening, and is robustly supported by phylogenomic subsampling analyses, with MP-EST demonstrating particularly consistent performance across subsampling replicates as compared to ASTRAL. In contrast, analyses of concatenated data supermatrices recover rheas as the sister to all other nonostrich palaeognaths, an alternative that lacks retroelement support and shows inconsistent behavior under subsampling approaches. While statistically supporting the species tree topology, conflicting patterns of retroelement insertions also occur and imply high amounts of ILS across short successive internal branches, consistent with observed patterns of gene tree heterogeneity. 
Coalescent simulations and topology tests indicate that the majority of observed topological incongruence among gene trees is consistent with coalescent variation rather than arising from gene tree estimation error alone, and estimated branch lengths for short successive internodes in the inferred species tree fall within the theoretical range encompassing the anomaly zone. Distributions of empirical gene trees confirm that the most common gene tree topology for each marker type differs from the species tree, signifying the existence of an empirical anomaly zone in palaeognaths.
Affiliation(s)
- Alison Cloutier
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Department of Ornithology, Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
| | - Timothy B Sackton
- Informatics Group, Harvard University, 28 Oxford Street, Cambridge, MA 02138, USA
| | - Phil Grayson
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Department of Ornithology, Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
| | - Michele Clamp
- Informatics Group, Harvard University, 28 Oxford Street, Cambridge, MA 02138, USA
| | - Allan J Baker
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcox Street, Toronto, Ontario M5S 3B2, Canada
- Department of Natural History, Royal Ontario Museum, 100 Queen’s Park, Toronto, Ontario M5S 2C6, Canada
| | - Scott V Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Department of Ornithology, Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
|
45
|
Roycroft EJ, Moussalli A, Rowe KC. Phylogenomics Uncovers Confidence and Conflict in the Rapid Radiation of Australo-Papuan Rodents. Syst Biol 2019; 69:431-444. [DOI: 10.1093/sysbio/syz044] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 08/18/2018] [Accepted: 06/12/2019] [Indexed: 11/13/2022]
Abstract
The estimation of robust and accurate measures of branch support has proven challenging in the era of phylogenomics. In data sets of potentially millions of sites, bootstrap support for bifurcating relationships around very short internal branches can be inappropriately inflated. Such overestimation of branch support may be particularly problematic in rapid radiations, where phylogenetic signal is low and incomplete lineage sorting severe. Here, we explore this issue by comparing various branch support estimates under both concatenated and coalescent frameworks, in the recent radiation of Australo-Papuan murine rodents (Muridae: Hydromyini). Using nucleotide sequence data from 1245 independent loci and several phylogenomic inference methods, we unequivocally resolve the majority of genus-level relationships within Hydromyini. However, at four nodes we recover inconsistency in branch support estimates both within and among concatenated and coalescent approaches. In most cases, concatenated likelihood approaches using standard fast bootstrap algorithms did not detect any uncertainty at these four nodes, regardless of partitioning strategy. However, we found this could be overcome with two-stage resampling, that is, across genes and sites within genes (using -bsam GENESITE in IQ-TREE). In addition, low confidence at recalcitrant nodes was recovered using UFBoot2, a recent revision to the bootstrap protocol in IQ-TREE, but this depended on partitioning strategy. Summary coalescent approaches also failed to detect uncertainty under some circumstances. For each of four recalcitrant nodes, an equivalent (or close to equivalent) number of genes were in strong support (> 75% bootstrap) of both the primary and at least one alternative topological hypothesis, suggesting notable phylogenetic conflict among loci not detected using some standard branch support metrics.
Recent debate has focused on the appropriateness of concatenated versus multigenealogical approaches to resolving species relationships, but less so on accurately estimating uncertainty in large data sets. Our results demonstrate the importance of employing multiple approaches when assessing confidence and highlight the need for greater attention to the development of robust measures of uncertainty in the era of phylogenomics.
Affiliation(s)
- Emily J Roycroft
- School of BioSciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Department of Science, Museums Victoria, GPO Box 666, Melbourne, VIC 3001, Australia
| | - Adnan Moussalli
- School of BioSciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Department of Science, Museums Victoria, GPO Box 666, Melbourne, VIC 3001, Australia
| | - Kevin C Rowe
- School of BioSciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Department of Science, Museums Victoria, GPO Box 666, Melbourne, VIC 3001, Australia
|
46
|
Mendes FK, Livera AP, Hahn MW. The perils of intralocus recombination for inferences of molecular convergence. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180244. [PMID: 31154973 DOI: 10.1098/rstb.2018.0244] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 01/10/2023]
Abstract
Accurate inferences of convergence require that the appropriate tree topology be used. If there is a mismatch between the tree a trait has evolved along and the tree used for analysis, then false inferences of convergence ('hemiplasy') can occur. To avoid problems of hemiplasy when there are high levels of gene tree discordance with the species tree, researchers have begun to construct tree topologies from individual loci. However, due to intralocus recombination, even locus-specific trees may contain multiple topologies within them. This implies that the use of individual tree topologies discordant with the species tree can still lead to incorrect inferences about molecular convergence. Here, we examine the frequency with which single exons and single protein-coding genes contain multiple underlying tree topologies, in primates and Drosophila, and quantify the effects of hemiplasy when using trees inferred from individual loci. In both clades, we find that there are most often multiple diagnosable topologies within single exons and whole genes, with 91% of Drosophila protein-coding genes containing multiple topologies. Because of this underlying topological heterogeneity, even using trees inferred from individual protein-coding genes results in 25% and 38% of substitutions falsely labelled as convergent in primates and Drosophila, respectively. While constructing local trees can reduce the problem of hemiplasy, our results suggest that it will be difficult to completely avoid false inferences of convergence. We conclude by suggesting several ways forward in the analysis of convergent evolution, for both molecular and morphological characters. This article is part of the theme issue 'Convergent evolution in the genomics era: new insights and directions'.
Affiliation(s)
- Fábio K Mendes
- Department of Computer Science, The University of Auckland, Auckland 1010, New Zealand
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
| | - Andrew P Livera
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
| | - Matthew W Hahn
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
|
47
|
Statistical binning leads to profound model violation due to gene tree error incurred by trying to avoid gene tree error. Mol Phylogenet Evol 2019; 134:164-171. [DOI: 10.1016/j.ympev.2019.02.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 08/10/2018] [Revised: 11/30/2018] [Accepted: 02/14/2019] [Indexed: 11/19/2022]
|
48
|
Roch S, Nute M, Warnow T. Long-Branch Attraction in Species Tree Estimation: Inconsistency of Partitioned Likelihood and Topology-Based Summary Methods. Syst Biol 2019; 68:281-297. [PMID: 30247732 DOI: 10.1093/sysbio/syy061] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 03/16/2018] [Accepted: 09/12/2018] [Indexed: 11/13/2022]
Abstract
With advances in sequencing technologies, there are now massive amounts of genomic data from across all life, leading to the possibility that a robust Tree of Life can be constructed. However, "gene tree heterogeneity", which is when different genomic regions can evolve differently, is a common phenomenon in multi-locus data sets, and reduces the accuracy of standard methods for species tree estimation that do not take this heterogeneity into account. New methods have been developed for species tree estimation that specifically address gene tree heterogeneity, and that have been proven to converge to the true species tree when the number of loci and number of sites per locus both increase (i.e., the methods are said to be "statistically consistent"). Yet, little is known about the biologically realistic condition where the number of sites per locus is bounded. We show that when the sequence length of each locus is bounded (by any arbitrarily chosen value), the most common approaches to species tree estimation that take heterogeneity into account (i.e., traditional fully partitioned concatenated maximum likelihood and newer approaches, called summary methods, that estimate the species tree by combining estimated gene trees) are not statistically consistent, even when the heterogeneity is extremely constrained. The main challenge is the presence of conditions such as long branch attraction that create biased tree estimation when the number of sites is restricted. Hence, our study uncovers a fundamental challenge to species tree estimation using both traditional and new methods.
Affiliation(s)
- Sebastien Roch
- Department of Mathematics, University of Wisconsin-Madison, 480 Lincoln Dr, Madison, WI 53706, USA
| | - Michael Nute
- Department of Statistics, The University of Illinois at Urbana-Champaign, 725 S Wright St #101, Champaign, IL 61820, USA
| | - Tandy Warnow
- Department of Computer Science, The University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, IL 61801-2302, USA
|
49
|
Stevens L, Félix M, Beltran T, Braendle C, Caurcel C, Fausett S, Fitch D, Frézal L, Gosse C, Kaur T, Kiontke K, Newton MD, Noble LM, Richaud A, Rockman MV, Sudhaus W, Blaxter M. Comparative genomics of 10 new Caenorhabditis species. Evol Lett 2019; 3:217-236. [PMID: 31007946 PMCID: PMC6457397 DOI: 10.1002/evl3.110] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 08/30/2018] [Revised: 02/08/2019] [Accepted: 02/25/2019] [Indexed: 01/29/2023]
Abstract
The nematode Caenorhabditis elegans has been central to the understanding of metazoan biology. However, C. elegans is but one species among millions and the significance of this important model organism will only be fully revealed if it is placed in a rich evolutionary context. Global sampling efforts have led to the discovery of over 50 putative species from the genus Caenorhabditis, many of which await formal species description. Here, we present species descriptions for 10 new Caenorhabditis species. We also present draft genome sequences for nine of these new species, along with a transcriptome assembly for one. We exploit these whole-genome data to reconstruct the Caenorhabditis phylogeny and use this phylogenetic tree to dissect the evolution of morphology in the genus. We reveal extensive variation in genome size and investigate the molecular processes that underlie this variation. We show unexpected complexity in the evolutionary history of key developmental pathway genes. These new species and the associated genomic resources will be essential in our attempts to understand the evolutionary origins of the C. elegans model.
Affiliation(s)
- Lewis Stevens
- Institute of Evolutionary Biology, Ashworth Laboratories, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
| | - Marie‐Anne Félix
- Institut de Biologie de l'Ecole Normale Supérieure, Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, École Normale Supérieure, Paris Sciences et Lettres, 75005 Paris, France
| | - Toni Beltran
- MRC London Institute of Medical Sciences, London W12 0NN, United Kingdom
| | - Christian Braendle
- Université Côte d'Azur, Centre National de la Recherche Scientifique, Inserm, Institute of Biology Valrose, 06108 Nice, France
| | - Carlos Caurcel
- Institute of Evolutionary Biology, Ashworth Laboratories, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
| | - Sarah Fausett
- Université Côte d'Azur, Centre National de la Recherche Scientifique, Inserm, Institute of Biology Valrose, 06108 Nice, France
| | - David Fitch
- Department of Biology, New York University, New York, New York 10003
| | - Lise Frézal
- Institut de Biologie de l'Ecole Normale Supérieure, Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, École Normale Supérieure, Paris Sciences et Lettres, 75005 Paris, France
| | - Charlie Gosse
- Institut de Biologie de l'Ecole Normale Supérieure, Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, École Normale Supérieure, Paris Sciences et Lettres, 75005 Paris, France
| | - Taniya Kaur
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York 10003
| | - Karin Kiontke
- Department of Biology, New York University, New York, New York 10003
| | - Matthew D. Newton
- MRC London Institute of Medical Sciences, London W12 0NN, United Kingdom
- Molecular Virology, Department of Medicine, Imperial College London, Du Cane Road, London W12 0NN, United Kingdom
| | - Luke M. Noble
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York 10003
| | - Aurélien Richaud
- Institut de Biologie de l'Ecole Normale Supérieure, Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, École Normale Supérieure, Paris Sciences et Lettres, 75005 Paris, France
| | - Matthew V. Rockman
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York 10003
| | - Walter Sudhaus
- Institut für Biologie/Zoologie, Freie Universität Berlin, Berlin D‐14195, Germany
| | - Mark Blaxter
- Institute of Evolutionary Biology, Ashworth Laboratories, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
|
50
|
Simmons MP, Sloan DB, Springer MS, Gatesy J. Gene-wise resampling outperforms site-wise resampling in phylogenetic coalescence analyses. Mol Phylogenet Evol 2019; 131:80-92. [DOI: 10.1016/j.ympev.2018.10.001] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 03/08/2018] [Accepted: 10/01/2018] [Indexed: 01/15/2023]
|