1
Cho A, Lax G, Keeling PJ. Phylogenomic analyses of ochrophytes (stramenopiles) with an emphasis on neglected lineages. Mol Phylogenet Evol 2024; 198:108120. [PMID: 38852907 DOI: 10.1016/j.ympev.2024.108120] [Received: 01/04/2024] [Revised: 05/13/2024] [Accepted: 06/04/2024] [Indexed: 06/11/2024]
Abstract
Ochrophyta is a photosynthetic lineage that crowns the phylogenetic tree of stramenopiles, one of the major eukaryotic supergroups. Due to their ecological impact as a major primary producer, ochrophytes are relatively well-studied compared to the rest of the stramenopiles, yet their evolutionary relationships remain poorly understood. This is in part due to a number of missing lineages in large-scale multigene analyses, and an apparently rapid radiation leading to many short internodes between ochrophyte subgroups in the tree. These short internodes are also found across deep-branching lineages of stramenopiles with limited phylogenetic signal, leaving many relationships controversial overall. We have addressed this issue with other deep-branching stramenopiles recently, and now examine whether contentious relationships within the ochrophytes may be resolved with the help of filling in missing lineages in an updated phylogenomic dataset of ochrophytes, along with exploring various gene filtering criteria to identify the most phylogenetically informative genes. We generated ten new transcriptomes from various culture collections and a single-cell isolation from an environmental sample, added these to an existing phylogenomic dataset, and examined the effects of selecting genes with high phylogenetic signal or low phylogenetic noise. For some previously contentious relationships, we find a variety of analyses and gene filtering criteria consistently unite previously unstable groupings with strong statistical support. For example, we recovered a robust grouping of Eustigmatophyceae with Raphidophyceae-Phaeophyceae-Xanthophyceae while Olisthodiscophyceae formed a sister-lineage to Pinguiophyceae. Selecting genes with high phylogenetic signal or data quality recovered more stable topologies. 
Overall, we find that adding under-represented groups across different lineages is still crucial in resolving phylogenetic relationships, and discrete gene properties affect lineages of stramenopiles differently. This is something which may be explored to further our understanding of the molecular evolution of stramenopiles.
Affiliation(s)
- Anna Cho
- Department of Botany, University of British Columbia, Vancouver V6T 1Z4, British Columbia, Canada.
- Gordon Lax
- Department of Botany, University of British Columbia, Vancouver V6T 1Z4, British Columbia, Canada
- Patrick J Keeling
- Department of Botany, University of British Columbia, Vancouver V6T 1Z4, British Columbia, Canada
2
Rahimzadeh F, Mohammad Khanli L, Salehpoor P, Golabi F, PourBahrami S. Unveiling the evolution of policies for enhancing protein structure predictions: A comprehensive analysis. Comput Biol Med 2024; 179:108815. [PMID: 38986287 DOI: 10.1016/j.compbiomed.2024.108815] [Received: 03/04/2024] [Revised: 06/09/2024] [Accepted: 06/24/2024] [Indexed: 07/12/2024]
Abstract
Predicting protein structure is both fascinating and formidable, playing a crucial role in structure-based drug discovery and unraveling diseases with elusive origins. The Critical Assessment of Protein Structure Prediction (CASP) serves as a biennial battleground where global scientists converge to untangle the intricate relationships within amino acid chains. Two primary methods, Template-Based Modeling (TBM) and Template-Free (TF) strategies, dominate protein structure prediction. The trend has shifted towards Template-Free predictions due to their broader sequence coverage with fewer templates. The predictive process can be broadly classified into contact map, binned-distance, and real-valued distance predictions, each with distinctive strengths and limitations manifested through tailored loss functions. We also introduce end-to-end and all-atom diffusion-based techniques that have transformed protein structure prediction. Recent advancements in deep learning techniques have significantly improved prediction accuracy, although the effectiveness is contingent upon the quality of input features derived from natural bio-physicochemical attributes and Multiple Sequence Alignments (MSA). Hence, the generation of high-quality MSA data holds paramount importance in harnessing informative input features for enhanced prediction outcomes. Remarkable successes have been achieved in protein structure prediction accuracy, but not yet enough for the purposes that structural knowledge is meant to serve, which implies a need for development in other aspects of prediction. In this regard, scientists have opened other frontiers for protein structural prediction. The utilization of subsampling in multiple sequence alignment (MSA) and protein language modeling appears to be particularly promising in enhancing the accuracy and efficiency of predictions, ultimately aiding in drug discovery efforts.
The exploration of predicting protein complex structure also opens up exciting opportunities to deepen our knowledge of molecular interactions and design more effective therapeutics. In this article, we discuss the vicissitudes that scientists have gone through to improve prediction accuracy, and examine effective policies for prediction from different aspects, including the construction of high-quality MSA, the provision of informative input features, and progress in deep learning approaches. We also briefly touch upon the transition from predicting single-chain protein structures to predicting protein complex structures. Our findings point towards promoting open research environments to support the objectives of protein structure prediction.
Affiliation(s)
- Faezeh Rahimzadeh
- Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Pedram Salehpoor
- Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Faegheh Golabi
- Department of Biomedical Engineering, Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
- Shahin PourBahrami
- Department of Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran
3
Fleming J, Eriksen PM, Struck TH. Scoutknife: A naïve, whole genome informed phylogenetic robusticity metric. F1000Res 2024; 12:945. [PMID: 38799242 PMCID: PMC11128044 DOI: 10.12688/f1000research.139356.1] [Accepted: 07/05/2024] [Indexed: 05/29/2024] Open
Abstract
Background: The phylogenetic bootstrap, first proposed by Felsenstein in 1985, is a critically important statistical method in assessing the robusticity of phylogenetic datasets. Core to its concept was the use of pseudo sampling - assessing the data by generating new replicates derived from the initial dataset that was used to generate the phylogeny. In this way, phylogenetic support metrics could overcome the lack of perfect, infinite data. With infinite data, however, it is possible to sample smaller replicates directly from the data to obtain both the phylogeny and its statistical robusticity in the same analysis. Due to the growth of whole genome sequencing, the depth and breadth of our datasets have greatly expanded and are set to only expand further. With genome-scale datasets comprising thousands of genes, we can now obtain a proxy for infinite data. Accordingly, we can potentially abandon the notion of pseudo sampling and instead randomly sample small subsets of genes from the thousands of genes in our analyses. Methods: We introduce Scoutknife, a jackknife-style subsampling implementation that generates 100 datasets by randomly sampling a small number of genes from an initial large-gene dataset to jointly establish both a phylogenetic hypothesis and assess its robusticity. We assess its effectiveness by using 18 previously published datasets and 100 simulation studies. Results: We show that Scoutknife is conservative and informative as to conflicts and incongruence across the whole genome, without the need for subsampling based on traditional model selection criteria. Conclusions: Scoutknife reliably achieves comparable results to selecting the best genes on both real and simulation datasets, while being resistant to the potential biases caused by selecting for model fit. As the amount of genome data grows, it becomes an even more exciting option to assess the robusticity of phylogenetic hypotheses.
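The subsampling step described in the Methods above can be illustrated with a minimal sketch. This is not the Scoutknife implementation: the function names, the subset size of 50 genes, and the representation of a tree as a set of clades are all assumptions made for illustration. Only the count of 100 replicate datasets comes from the abstract, and tree inference itself is left out.

```python
import random


def sample_gene_subsets(gene_ids, n_replicates=100, genes_per_replicate=50, seed=0):
    """Draw random gene subsets, sampling without replacement within each replicate.

    n_replicates=100 follows the abstract; genes_per_replicate=50 is a
    hypothetical value chosen only for this sketch.
    """
    if genes_per_replicate > len(gene_ids):
        raise ValueError("cannot sample more genes than the dataset contains")
    rng = random.Random(seed)  # seeded so the replicates are reproducible
    return [
        sorted(rng.sample(gene_ids, genes_per_replicate))
        for _ in range(n_replicates)
    ]


def clade_support(replicate_clades, clade):
    """Fraction of replicate trees that contain a clade.

    Each replicate tree is represented (an assumption of this sketch) as a
    set of clades, and each clade as a frozenset of taxon names.
    """
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return hits / len(replicate_clades)


# Hypothetical dataset of 2000 gene identifiers.
genes = [f"gene{i}" for i in range(2000)]
subsets = sample_gene_subsets(genes)
```

Each subset would then be passed to a tree-inference program of the reader's choice, and `clade_support` summarises how often a grouping recurs across the resulting replicate trees.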
Affiliation(s)
- James Fleming
- Natural History Museum, Universitetet i Oslo, Oslo, Oslo, 0562, Norway
4
Sharma S, Kumar S. Discovering Fragile Clades and Causal Sequences in Phylogenomics by Evolutionary Sparse Learning. Mol Biol Evol 2024; 41:msae131. [PMID: 38916040 PMCID: PMC11247346 DOI: 10.1093/molbev/msae131] [Received: 01/02/2024] [Revised: 05/30/2024] [Accepted: 06/20/2024] [Indexed: 06/26/2024] Open
Abstract
Phylogenomic analyses of long sequences, consisting of many genes and genomic segments, reconstruct organismal relationships with high statistical confidence. But, inferred relationships can be sensitive to excluding just a few sequences. Currently, there is no direct way to identify fragile relationships and the associated individual gene sequences in species. Here, we introduce novel metrics for gene-species sequence concordance and clade probability derived from evolutionary sparse learning models. We validated these metrics using fungi, plant, and animal phylogenomic datasets, highlighting the ability of the new metrics to pinpoint fragile clades and the sequences responsible. The new approach does not necessitate the investigation of alternative phylogenetic hypotheses, substitution models, or repeated data subset analyses. Our methodology offers a streamlined approach to evaluating major inferred clades and identifying sequences that may distort reconstructed phylogenies using large datasets.
Affiliation(s)
- Sudip Sharma
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
5
Gainett G, Klementz BC, Setton EVW, Simian C, Iuri HA, Edgecombe GD, Peretti AV, Sharma PP. A plurality of morphological characters need not equate with phylogenetic accuracy: A rare genomic change refutes the placement of Solifugae and Pseudoscorpiones in Haplocnemata. Evol Dev 2024; 26:e12467. [PMID: 38124251 DOI: 10.1111/ede.12467] [Received: 08/30/2023] [Revised: 11/28/2023] [Accepted: 12/04/2023] [Indexed: 12/23/2023]
Abstract
Recent advances in higher-level invertebrate phylogeny have leveraged shared features of genomic architecture to resolve contentious nodes across the tree of life. Yet, the interordinal relationships within Chelicerata have remained recalcitrant given competing topologies in recent molecular analyses. As such, relationships between topologically unstable orders remain supported primarily by morphological cladistic analyses. Solifugae, one such unstable chelicerate order, has long been thought to be the sister group of Pseudoscorpiones, forming the clade Haplocnemata, on the basis of eight putative morphological synapomorphies. The discovery, however, of a shared whole genome duplication placing Pseudoscorpiones in Arachnopulmonata provides the opportunity for a simple litmus test evaluating the validity of Haplocnemata. Here, we present the first developmental transcriptome of a solifuge (Titanopuga salinarum) and survey copy numbers of the homeobox genes for evidence of systemic duplication. We find that over 70% of the identified homeobox genes in T. salinarum are retained in a single copy, while representatives of the arachnopulmonates retain orthologs of those genes as two or more copies. Our results refute the placement of Solifugae in Haplocnemata. Subsequent reevaluation of putative interordinal morphological synapomorphies among chelicerates reveals a high incidence of homoplasy, reversals, and inaccurate coding within Haplocnemata and other small clades, as well as Arachnida more broadly, suggesting existing morphological character matrices are insufficient to resolve chelicerate phylogeny.
Affiliation(s)
- Guilherme Gainett
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Benjamin C Klementz
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Emily V W Setton
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Catalina Simian
- Departamento de Diversidad Biológica y Ecología, Facultad de Ciencias Exactas, Físicas y Naturales, Universidad Nacional de Córdoba, Córdoba, Argentina
- Laboratorio de Biología Reproductiva y Evolución, Consejo Nacional de Investigaciones Científicas Técnicas (CONICET), Instituto de Diversidad y Ecología Animal (IDEA), Córdoba, Argentina
- Hernán A Iuri
- División de Aracnología, Museo Argentino de Ciencias Naturales "Bernardino Rivadavia", Buenos Aires, Argentina
- Gregory D Edgecombe
- Department of Earth Sciences, Division ES Invertebrates and Plants Palaeobiology, The Natural History Museum, London, UK
- Alfredo V Peretti
- Departamento de Diversidad Biológica y Ecología, Facultad de Ciencias Exactas, Físicas y Naturales, Universidad Nacional de Córdoba, Córdoba, Argentina
- Laboratorio de Biología Reproductiva y Evolución, Consejo Nacional de Investigaciones Científicas Técnicas (CONICET), Instituto de Diversidad y Ecología Animal (IDEA), Córdoba, Argentina
- Prashant P Sharma
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, Wisconsin, USA
6
Jiang Z, Zang W, Ericson PGP, Song G, Wu S, Feng S, Drovetski SV, Liu G, Zhang D, Saitoh T, Alström P, Edwards SV, Lei F, Qu Y. Gene flow and an anomaly zone complicate phylogenomic inference in a rapidly radiated avian family (Prunellidae). BMC Biol 2024; 22:49. [PMID: 38413944 PMCID: PMC10900574 DOI: 10.1186/s12915-024-01848-7] [Received: 11/11/2023] [Accepted: 02/15/2024] [Indexed: 02/29/2024] Open
Abstract
Background: Resolving the phylogeny of rapidly radiating lineages presents a challenge when building the Tree of Life. An Old World avian family Prunellidae (Accentors) comprises twelve species that rapidly diversified at the Pliocene-Pleistocene boundary. Results: Here we investigate the phylogenetic relationships of all species of Prunellidae using a chromosome-level de novo assembly of Prunella strophiata and 36 high-coverage resequenced genomes. We use homologous alignments of thousands of exonic and intronic loci to build the coalescent and concatenated phylogenies and recover four different species trees. Topology tests show a large degree of gene tree-species tree discordance but only 40-54% of intronic gene trees and 36-75% of exonic gene trees can be explained by incomplete lineage sorting and gene tree estimation errors. Estimated branch lengths for three successive internal branches in the inferred species trees suggest the existence of an empirical anomaly zone. The most common topology recovered for species in this anomaly zone was not similar to any coalescent or concatenated inference phylogenies, suggesting the presence of anomalous gene trees. However, this interpretation is complicated by the presence of gene flow because extensive introgression was detected among these species. When exploring tree topology distributions, introgression, and regional variation in recombination rate, we find that many autosomal regions contain signatures of introgression and thus may mislead phylogenetic inference. Conversely, the phylogenetic signal is concentrated in regions with low recombination rate, such as the Z chromosome, which are also more resistant to interspecific introgression. Conclusions: Collectively, our results suggest that phylogenomic inference should consider the underlying genomic architecture to maximize the consistency of phylogenomic signal.
Affiliation(s)
- Zhiyong Jiang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Wenqing Zang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Per G P Ericson
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, PO Box 50007, Stockholm, SE-104 05, Sweden
- Gang Song
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Shaoyuan Wu
- Jiangsu International Joint Center of Genomics, Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou, 221116, Jiangsu, China
- Shaohong Feng
- Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Liangzhu Laboratory, Zhejiang University, 1369 West Wenyi Road, Hangzhou, 311121, China
- Innovation Center of Yangtze River Delta, Zhejiang University, Jiashan, 314102, China
- Sergei V Drovetski
- National Museum of Natural History, Smithsonian Institution, Washington, DC, 20004, USA
- Present address: U.S. Geological Survey, Eastern Ecological Science Center at Patuxent Research Refuge, Laurel, MD, 20708, USA
- Gang Liu
- Chinese Academy of Forestry, Institute of Ecological Conservation and Restoration, Beijing, 100091, China
- Dezhi Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Takema Saitoh
- Yamashina Institute for Ornithology, Abiko, Chiba, Japan
- Per Alström
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Animal Ecology, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18 D, 752 36, Uppsala, Sweden
- Scott V Edwards
- Museum of Comparative Zoology and Department of Organismic & Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
- Fumin Lei
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Yanhua Qu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China.
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, PO Box 50007, Stockholm, SE-104 05, Sweden.
7
Patané JSL, Martins J, Setubal JC. A Guide to Phylogenomic Inference. Methods Mol Biol 2024; 2802:267-345. [PMID: 38819564 DOI: 10.1007/978-1-0716-3838-5_11] [Indexed: 06/01/2024]
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. Phylogenomics has significant applications in fields such as evolutionary biology, systematics, comparative genomics, and conservation genetics, providing valuable insights into the origins and relationships of species and contributing to our understanding of biological diversity and evolution. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
- Laboratório de Genética e Cardiologia Molecular, Instituto do Coração/Heart Institute Hospital das Clínicas - Faculdade de Medicina da Universidade de São Paulo São Paulo, São Paulo, SP, Brazil
- Joaquim Martins
- Integrative Omics group, Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, SP, Brazil
- João Carlos Setubal
- Departmento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil.
8
McLay TGB, Fowler RM, Fahey PS, Murphy DJ, Udovicic F, Cantrill DJ, Bayly MJ. Phylogenomics reveals extreme gene tree discordance in a lineage of dominant trees: hybridization, introgression, and incomplete lineage sorting blur deep evolutionary relationships despite clear species groupings in Eucalyptus subgenus Eudesmia. Mol Phylogenet Evol 2023; 187:107869. [PMID: 37423562 DOI: 10.1016/j.ympev.2023.107869] [Received: 03/31/2023] [Revised: 06/29/2023] [Accepted: 06/30/2023] [Indexed: 07/11/2023]
Abstract
Eucalypts are a large and ecologically important group of plants on the Australian continent, and understanding their evolution is important in understanding evolution of the unique Australian flora. Previous phylogenies using plastome DNA, nuclear-ribosomal DNA, or random genome-wide SNPs, have been confounded by limited genetic sampling or by idiosyncratic biological features of the eucalypts, including widespread plastome introgression. Here we present phylogenetic analyses of Eucalyptus subgenus Eudesmia (22 species from western, northern, central and eastern Australia), in the first study to apply a target-capture sequencing approach using custom, eucalypt-specific baits (of 568 genes) to a lineage of Eucalyptus. Multiple accessions of all species were included, and target-capture data were supplemented by separate analyses of plastome genes (average of 63 genes per sample). Analyses revealed a complex evolutionary history likely shaped by incomplete lineage sorting and hybridization. Gene tree discordance generally increased with phylogenetic depth. Species, or groups of species, toward the tips of the tree are mostly supported, and three major clades are identified, but the branching order of these clades cannot be confirmed with confidence. Multiple approaches to filtering the nuclear dataset, by removing genes or samples, could not reduce gene tree conflict or resolve these relationships. Despite inherent complexities in eucalypt evolution, the custom bait kit devised for this research will be a powerful tool for investigating the evolutionary history of eucalypts more broadly.
Affiliation(s)
- Todd G B McLay
- Royal Botanic Gardens Victoria, Melbourne 3004, Vic, Australia; School of BioSciences, The University of Melbourne, Parkville 3010, Vic, Australia.
- Rachael M Fowler
- School of BioSciences, The University of Melbourne, Parkville 3010, Vic, Australia
- Patrick S Fahey
- Research Centre for Ecosystem Resilience, The Royal Botanic Garden Sydney, Sydney 2000, NSW, Australia; Qld Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia 4072, Qld, Australia
- Daniel J Murphy
- Royal Botanic Gardens Victoria, Melbourne 3004, Vic, Australia; School of BioSciences, The University of Melbourne, Parkville 3010, Vic, Australia
- Frank Udovicic
- Royal Botanic Gardens Victoria, Melbourne 3004, Vic, Australia
- David J Cantrill
- Royal Botanic Gardens Victoria, Melbourne 3004, Vic, Australia; School of BioSciences, The University of Melbourne, Parkville 3010, Vic, Australia
- Michael J Bayly
- School of BioSciences, The University of Melbourne, Parkville 3010, Vic, Australia
9
Gorring PS, Farrell BD. Evaluating species boundaries using coalescent delimitation in pine-killing Monochamus (Coleoptera: Cerambycidae) sawyer beetles. Mol Phylogenet Evol 2023; 184:107777. [PMID: 36990304 DOI: 10.1016/j.ympev.2023.107777] [Received: 03/26/2021] [Revised: 02/18/2023] [Accepted: 03/24/2023] [Indexed: 03/30/2023]
Abstract
Plant-feeding beetle species are diverse and often individually highly variable. Accurate classifications can be difficult to establish yet are essential for study of evolutionary patterns and processes. Molecular data are key to further characterizing morphologically difficult groups and defining genus and species boundaries. Monochamus Dejean species are ecologically and economically significant, and in coniferous forests they vector the nematode that causes Pine Wilt Disease. This study uses nuclear and mitochondrial genes to test the monophyly and relationships of Monochamus and applies coalescent methods to further delimit the conifer-feeding species. Monochamus has also included approximately 120 Old World species associated with diverse angiosperm tree species. We sample from these additional morphologically diverse species to determine their placement in the Lamiini. Through supermatrix and coalescent methods, the higher-level relationships of Monochamus show that conifer-feeders are a monophyletic group that includes the type species and has split into Nearctic and Palearctic clades. Molecular dating indicates a single dispersal of conifer-feeders to North America over the second Bering Land Bridge circa 5.3 Ma. All other Monochamus sampled fall in different parts of the Lamiini tree. Small-bodied angiosperm-feeding Monochamus group with the monotypic genus Microgoes Casey. The African Monochamus subgenera sampled are distantly related to the conifer-feeding clade. The multispecies coalescent delimitation methods BPP and STACEY delimit 17 conifer-feeding Monochamus species for a total of 18 species, and supports the retention of all current species. An interrogation with nuclear gene allele phasing reveals that unphased data can be unreliable for accurate delimitations and divergence times. The delimited species are discussed with integrative evidence, highlighting real-world challenges in recognizing the completion of speciation trajectories.
Affiliation(s)
- Patrick S Gorring
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St. Cambridge, MA, USA.
- Brian D Farrell
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St. Cambridge, MA, USA
10
Mulhair PO, McCarthy CGP, Siu-Ting K, Creevey CJ, O'Connell MJ. Filtering artifactual signal increases support for Xenacoelomorpha and Ambulacraria sister relationship in the animal tree of life. Curr Biol 2022; 32:5180-5188.e3. [PMID: 36356574 DOI: 10.1016/j.cub.2022.10.036] [Received: 12/13/2021] [Revised: 08/09/2022] [Accepted: 10/18/2022] [Indexed: 11/10/2022]
Abstract
Conflicting studies place a group of bilaterian invertebrates containing xenoturbellids and acoelomorphs, the Xenacoelomorpha, as either the primary emerging bilaterian phylum [1,2,3,4,5,6] or within Deuterostomia, sister to Ambulacraria [7,8,9,10,11]. Although their placement as sister to the rest of Bilateria supports relatively simple morphology in the ancestral bilaterian, their alternative placement within Deuterostomia suggests a morphologically complex ancestral bilaterian along with extensive loss of major phenotypic traits in the Xenacoelomorpha. Recent studies have questioned whether Deuterostomia should be considered monophyletic at all [10,12,13]. Hidden paralogy and poor phylogenetic signal present a major challenge for reconstructing species phylogenies [14,15,16,17,18]. Here, we assess whether these issues have contributed to the conflict over the placement of Xenacoelomorpha. We reanalyzed published datasets, enriching for orthogroups whose gene trees support well-resolved clans elsewhere in the animal tree [16]. We find that most genes in previously published datasets violate incontestable clans, suggesting that hidden paralogy and low phylogenetic signal affect the ability to reconstruct branching patterns at deep nodes in the animal tree. We demonstrate that removing orthogroups that cannot recapitulate incontestable relationships alters the final topology that is inferred, while simultaneously improving the fit of the model to the data. We discover increased, but ultimately not conclusive, support for the existence of Xenambulacraria in our set of filtered orthogroups. At a time when we are progressing toward sequencing all life on the planet, we argue that long-standing contentious issues in the tree of life will be resolved using smaller amounts of better quality data that can be modeled adequately [19].
Affiliation(s)
- Peter O Mulhair
- Computational and Molecular Evolutionary Biology Research Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham NG7 2RD, UK; Computational and Molecular Evolutionary Biology Research Group, School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK
| | - Charley G P McCarthy
- Computational and Molecular Evolutionary Biology Research Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham NG7 2RD, UK
| | - Karen Siu-Ting
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, UK
| | - Christopher J Creevey
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, UK
| | - Mary J O'Connell
- Computational and Molecular Evolutionary Biology Research Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham NG7 2RD, UK; Computational and Molecular Evolutionary Biology Research Group, School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK.
| |
|
11
|
Gatesy J, Springer MS. Phylogenomic Coalescent Analyses of Avian Retroelements Infer Zero-Length Branches at the Base of Neoaves, Emergent Support for Controversial Clades, and Ancient Introgressive Hybridization in Afroaves. Genes (Basel) 2022; 13:1167. [PMID: 35885951 PMCID: PMC9324441 DOI: 10.3390/genes13071167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Revised: 06/20/2022] [Accepted: 06/21/2022] [Indexed: 01/25/2023] Open
Abstract
Retroelement insertions (RIs) are low-homoplasy characters that are ideal data for addressing deep evolutionary radiations, where gene tree reconstruction errors can severely hinder phylogenetic inference with DNA and protein sequence data. Phylogenomic studies of Neoaves, a large clade of birds (>9000 species) that first diversified near the Cretaceous–Paleogene boundary, have yielded an array of robustly supported, contradictory relationships among deep lineages. Here, we reanalyzed a large RI matrix for birds using recently proposed quartet-based coalescent methods that enable inference of large species trees including branch lengths in coalescent units, clade support, statistical tests for gene flow, and combined analysis with DNA-sequence-based gene trees. Genome-scale coalescent analyses revealed extremely short branches at the base of Neoaves, meager branch support, and limited congruence with previous work at the most challenging nodes. Despite widespread topological conflicts with DNA-sequence-based trees, combined analyses of RIs with thousands of gene trees show emergent support for multiple higher-level clades (Columbea, Passerea, Columbimorphae, Otidimorphae, Phaethoquornithes). RIs express asymmetrical support for deep relationships within the subclade Afroaves that hints at ancient gene flow involving the owl lineage (Strigiformes). Because DNA-sequence data are challenged by gene tree-reconstruction error, analysis of RIs represents one approach for improving gene tree-based methods when divergences are deep, internodes are short, terminal branches are long, and introgressive hybridization further confounds species-tree inference.
Affiliation(s)
- John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
| | - Mark S. Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521, USA
| |
|
12
|
Steenwyk JL, Buida III TJ, Gonçalves C, Goltz DC, Morales G, Mead ME, LaBella AL, Chavez CM, Schmitz JE, Hadjifrangiskou M, Li Y, Rokas A. BioKIT: a versatile toolkit for processing and analyzing diverse types of sequence data. Genetics 2022; 221:6583183. [PMID: 35536198 PMCID: PMC9252278 DOI: 10.1093/genetics/iyac079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/21/2022] [Accepted: 05/03/2022] [Indexed: 11/14/2022] Open
Abstract
Bioinformatic analysis (such as genome assembly quality assessment, alignment summary statistics, relative synonymous codon usage, file format conversion, and processing and analysis) is integrated into diverse disciplines in the biological sciences. Several command-line pieces of software have been developed to conduct some of these individual analyses, but unified toolkits that conduct all these analyses are lacking. To address this gap, we introduce BioKIT, a versatile command-line toolkit that has, upon publication, 42 functions, several of which were community-sourced, that conduct routine and novel processing and analysis of genome assemblies, multiple sequence alignments, coding sequences, sequencing data, and more. To demonstrate the utility of BioKIT, we conducted a comprehensive examination of relative synonymous codon usage across 171 fungal genomes that use alternative genetic codes, showed that the novel metric of gene-wise relative synonymous codon usage can accurately estimate gene-wise codon optimization, evaluated the quality and characteristics of 901 eukaryotic genome assemblies, and calculated alignment summary statistics for 10 phylogenomic data matrices. BioKIT will be helpful in facilitating and streamlining sequence analysis workflows. BioKIT is freely available under the MIT license from GitHub (https://github.com/JLSteenwyk/BioKIT), PyPI (https://pypi.org/project/jlsteenwyk-biokit/), and the Anaconda Cloud (https://anaconda.org/jlsteenwyk/jlsteenwyk-biokit). Documentation, user tutorials, and instructions for requesting new features are available online (https://jlsteenwyk.com/BioKIT).
Affiliation(s)
- Jacob L Steenwyk
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
| | | | - Carla Gonçalves
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA; Associate Laboratory i4HB-Institute for Health and Bioeconomy, NOVA School of Science and Technology, NOVA University Lisbon, 2819-516 Caparica, Portugal; UCIBIO-Applied Molecular Biosciences Unit, Department of Life Sciences, NOVA School of Science and Technology, NOVA University Lisbon, 2819-516 Caparica, Portugal
| | | | - Grace Morales
- Department of Pathology, Microbiology & Immunology, Center for Personalized Microbiology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Matthew E Mead
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
| | - Abigail L LaBella
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
| | - Christina M Chavez
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
| | - Jonathan E Schmitz
- Department of Pathology, Microbiology & Immunology, Center for Personalized Microbiology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Maria Hadjifrangiskou
- Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA; Department of Pathology, Microbiology & Immunology, Center for Personalized Microbiology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Yuanning Li
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA
| | - Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
| |
|
13
|
Superson A, Battistuzzi F. Exclusion of fast evolving genes or fast evolving sites produces different archaean phylogenies. Mol Phylogenet Evol 2022; 170:107438. [DOI: 10.1016/j.ympev.2022.107438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2019] [Revised: 01/07/2022] [Accepted: 02/03/2022] [Indexed: 11/26/2022]
|
14
|
Simmons MP, Springer MS, Gatesy J. Gene-tree misrooting drives conflicts in phylogenomic coalescent analyses of palaeognath birds. Mol Phylogenet Evol 2021; 167:107344. [PMID: 34748873 DOI: 10.1016/j.ympev.2021.107344] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/13/2021] [Revised: 10/08/2021] [Accepted: 11/02/2021] [Indexed: 10/19/2022]
Abstract
Phylogenomic analyses of ancient rapid radiations can produce conflicting results that are driven by differential sampling of taxa and characters as well as the limitations of alternative analytical methods. We re-examine basal relationships of palaeognath birds (ratites and tinamous) using recently published datasets of nucleotide characters from 20,850 loci as well as 4301 retroelement insertions. The original studies attributed conflicting resolutions of rheas in their inferred coalescent and concatenation trees to concatenation failing in the anomaly zone. By contrast, we find that the coalescent-based resolution of rheas is premised upon extensive gene-tree estimation errors. Furthermore, retroelement insertions contain much more conflict than originally reported and multiple insertion loci support the basal position of rheas found in concatenation trees, while none were reported in the original publication. We demonstrate how even remarkable congruence in phylogenomic studies may be driven by long-branch misplacement of a divergent outgroup, highly incongruent gene trees, differential taxon sampling that can result in gene-tree misrooting errors that bias species-tree inference, and gross homology errors. What was previously interpreted as broad, robustly supported corroboration for a single resolution in coalescent analyses may instead indicate a common bias that taints phylogenomic results across multiple genome-scale datasets. The updated retroelement dataset now supports a species tree with branch lengths that suggest an ancient anomaly zone, and both concatenation and coalescent analyses of the huge nucleotide datasets fail to yield coherent, reliable results in this challenging phylogenetic context.
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.
| | - Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521, USA
| | - John Gatesy
- Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
| |
|
15
|
Abstract
The reconstruction of evolutionary relationships among species is fundamental for our understanding of biodiversity. Today, evolutionary relationships are closely related with the depiction of the tree of life, and research on the topic is underpinned by methods in molecular phylogenetics that have grown in popularity since the 1960s. These methods depend on our understanding of how nucleotide or amino acid sequences evolve through time and in different lineages. Armed with this knowledge, researchers can make inferences about the relationships and amount of genomic divergence among species.
Affiliation(s)
- David A Duchêne
- Centre for Evolutionary Hologenomics, University of Copenhagen, Øster Farimagsgade 5A, 1352 Copenhagen, Denmark.
| |
|
16
|
Mongiardino Koch N. Phylogenomic Subsampling and the Search for Phylogenetically Reliable Loci. Mol Biol Evol 2021; 38:4025-4038. [PMID: 33983409 DOI: 10.1101/2021.02.13.431075] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/21/2023] Open
Abstract
Phylogenomic subsampling is a procedure by which small sets of loci are selected from large genome-scale data sets and used for phylogenetic inference. This step is often motivated by either computational limitations associated with the use of complex inference methods or as a means of testing the robustness of phylogenetic results by discarding loci that are deemed potentially misleading. Although many alternative methods of phylogenomic subsampling have been proposed, little effort has gone into comparing their behavior across different data sets. Here, I calculate multiple gene properties for a range of phylogenomic data sets spanning animal, fungal, and plant clades, uncovering a remarkable predictability in their patterns of covariance. I also show how these patterns provide a means for ordering loci by both their rate of evolution and their relative phylogenetic usefulness. This method of retrieving phylogenetically useful loci is found to be among the top performing when compared with alternative subsampling protocols. Relatively common approaches such as minimizing potential sources of systematic bias or increasing the clock-likeness of the data are found to fare worse than selecting loci at random. Likewise, the general utility of rate-based subsampling is found to be limited: loci evolving at both low and high rates are among the least effective, and even those evolving at optimal rates can still widely differ in usefulness. This study shows that many common subsampling approaches introduce unintended effects in off-target gene properties and proposes an alternative multivariate method that simultaneously optimizes phylogenetic signal while controlling for known sources of bias.
|
17
|
Abstract
Phylogenomic subsampling is a procedure by which small sets of loci are selected from large genome-scale data sets and used for phylogenetic inference. This step is often motivated by either computational limitations associated with the use of complex inference methods or as a means of testing the robustness of phylogenetic results by discarding loci that are deemed potentially misleading. Although many alternative methods of phylogenomic subsampling have been proposed, little effort has gone into comparing their behavior across different data sets. Here, I calculate multiple gene properties for a range of phylogenomic data sets spanning animal, fungal, and plant clades, uncovering a remarkable predictability in their patterns of covariance. I also show how these patterns provide a means for ordering loci by both their rate of evolution and their relative phylogenetic usefulness. This method of retrieving phylogenetically useful loci is found to be among the top performing when compared with alternative subsampling protocols. Relatively common approaches such as minimizing potential sources of systematic bias or increasing the clock-likeness of the data are found to fare worse than selecting loci at random. Likewise, the general utility of rate-based subsampling is found to be limited: loci evolving at both low and high rates are among the least effective, and even those evolving at optimal rates can still widely differ in usefulness. This study shows that many common subsampling approaches introduce unintended effects in off-target gene properties and proposes an alternative multivariate method that simultaneously optimizes phylogenetic signal while controlling for known sources of bias.
|
18
|
Torres A, Goloboff PA, Catalano SA. Parsimony analysis of phylogenomic datasets (I): scripts and guidelines for using TNT (Tree Analysis using New Technology). Cladistics 2021; 38:103-125. [DOI: 10.1111/cla.12477] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 06/01/2021] [Indexed: 12/15/2022] Open
Affiliation(s)
- Ambrosio Torres
- Unidad Ejecutora Lillo, Consejo Nacional de Investigaciones Científicas y Técnicas - Fundación Miguel Lillo, Miguel Lillo 251, S. M. de Tucumán, Tucumán 4000, Argentina
| | - Pablo A. Goloboff
- Unidad Ejecutora Lillo, Consejo Nacional de Investigaciones Científicas y Técnicas - Fundación Miguel Lillo, Miguel Lillo 251, S. M. de Tucumán, Tucumán 4000, Argentina
- American Museum of Natural History, 200 Central Park West, New York, NY 10024, USA
| | - Santiago A. Catalano
- Unidad Ejecutora Lillo, Consejo Nacional de Investigaciones Científicas y Técnicas - Fundación Miguel Lillo, Miguel Lillo 251, S. M. de Tucumán, Tucumán 4000, Argentina
- Facultad de Ciencias Naturales e Instituto Miguel Lillo, Universidad Nacional de Tucumán, Miguel Lillo 205, S. M. de Tucumán, Tucumán 4000, Argentina
| |
|
19
|
Ji Q, Zhu H, Huang X, Zhou K, Liu Z, Sun Y, Wang Z, Ke W. Uncovering phylogenetic relationships and genetic diversity of water dropwort using phenotypic traits and SNP markers. PLoS One 2021; 16:e0249825. [PMID: 34228738 PMCID: PMC8259969 DOI: 10.1371/journal.pone.0249825] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/19/2020] [Accepted: 03/25/2021] [Indexed: 12/02/2022] Open
Abstract
The water dropworts Oenanthe linearis Wall. ex DC. and O. javanica (Blume) DC. are aquatic perennial herbs that have been used in China as vegetables and traditional medicines. However, their phylogenetic relationships and genetic diversity are poorly understood. Here, we present the phenotypic traits and a genome-wide DNA marker-based analysis of 158 water dropwort accessions representing both species. The analysis revealed that Oenanthe linearis was readily segregated into linear-leaf and deep-cleft leaf water dropworts according to their leaf shapes at flowering. Oenanthe javanica was classified by clustering analysis into two clusters based mainly on the morphological characteristics of their ultimate segments (leaflets). A set of 11 493 high-quality single-nucleotide polymorphisms was identified and used to construct a phylogenetic tree. There was strong discrimination between O. linearis and O. javanica, which was consistent with their phenotype diversification. The population structure and phylogenetic tree analyses suggested that the O. linearis accessions formed two major groups, corresponding to the linear-leaf and deep-cleft leaf types. The most obvious phenotypic differences between them were fully expressed at the reproductive growth stage. A single-nucleotide polymorphism-based analysis revealed that the O. javanica accessions could be categorized into groups I and II. However, this finding did not entirely align with the clusters revealed by morphological classification. Landraces were clustered into one group along with the remaining wild accessions. Hence, water dropwort domestication was short in duration. The level of genetic diversity for O. linearis (π = 0.1902) was slightly lower than that estimated for O. javanica (π = 0.2174). There was a low level of genetic differentiation between O. linearis and O. javanica (Fst = 0.0471). The mean genetic diversity among accessions ranged from 0.1818 for the linear-leaf types to 0.2318 for the group II accessions. The phenotypic traits and the single-nucleotide polymorphism markers identified here lay an empirical foundation for future genomic studies on water dropwort.
Affiliation(s)
- Qun Ji
- Institute of Vegetables, Wuhan Academy of Agricultural Sciences, Wuhan, Hubei, China
| | - Honglian Zhu
- Institute of Vegetables, Wuhan Academy of Agricultural Sciences, Wuhan, Hubei, China
| | - Xinfang Huang
- Institute of Vegetables, Wuhan Academy of Agricultural Sciences, Wuhan, Hubei, China
| | - Kai Zhou
- Institute of Vegetables, Wuhan Academy of Agricultural Sciences, Wuhan, Hubei, China
| | - Zhengwei Liu
- Institute of Vegetables, Wuhan Academy of Agricultural Sciences, Wuhan, Hubei, China
| | - Yalin Sun
- Institute of Vegetables, Wuhan Academy of Agricultural Sciences, Wuhan, Hubei, China
| | - Zhixin Wang
- Institute of Vegetables, Wuhan Academy of Agricultural Sciences, Wuhan, Hubei, China
| | - Weidong Ke
- Institute of Vegetables, Wuhan Academy of Agricultural Sciences, Wuhan, Hubei, China
| |
|
20
|
Li X, Hou Z, Xu C, Shi X, Yang L, Lewis LA, Zhong B. Large Phylogenomic Data sets Reveal Deep Relationships and Trait Evolution in Chlorophyte Green Algae. Genome Biol Evol 2021; 13:6265471. [PMID: 33950183 PMCID: PMC8271138 DOI: 10.1093/gbe/evab101] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Accepted: 05/04/2021] [Indexed: 12/01/2022] Open
Abstract
The chlorophyte green algae (Chlorophyta) are species-rich ancient groups ubiquitous in various habitats with high cytological diversity, ranging from microscopic to macroscopic organisms. However, the deep phylogeny within core Chlorophyta remains unresolved, in part due to the relatively sparse taxon and gene sampling in previous studies. Here we contribute new transcriptomic data and reconstruct phylogenetic relationships of core Chlorophyta based on four large data sets of up to 2,698 genes from 70 species, representing 80% of extant orders. The impacts of outgroup choice, missing data, bootstrap-support cutoffs, and model misspecification in phylogenetic inference of core Chlorophyta are examined. The species tree topologies of core Chlorophyta from different analyses are highly congruent, with strong support for many relationships (e.g., the Bryopsidales and the Scotinosphaerales-Dasycladales clade). The monophyly of Chlorophyceae and of Trebouxiophyceae as well as the uncertain placement of Chlorodendrophyceae and Pedinophyceae corroborate results from previous studies. The reconstruction of ancestral scenarios illustrates the evolution of the freshwater-sea and microscopic–macroscopic transition in the Ulvophyceae, and the transformation of unicellular→colonial→multicellular in the chlorophyte green algae. In addition, we provide new evidence that serine is encoded by both canonical codons and the noncanonical TAG codon in Scotinosphaerales, and that stop-to-sense codon reassignment in the Ulvophyceae has originated independently at least three times. Our robust phylogenetic framework of core Chlorophyta unveils the evolutionary history of phycoplast, cyto-morphology, and noncanonical genetic codes in chlorophyte green algae.
Affiliation(s)
- Xi Li
- College of Life Sciences, Nanjing Normal University, China
| | - Zheng Hou
- College of Life Sciences, Nanjing Normal University, China
| | - Chenjie Xu
- College of Life Sciences, Nanjing Normal University, China
| | - Xuan Shi
- College of Life Sciences, Nanjing Normal University, China
| | - Lingxiao Yang
- College of Life Sciences, Nanjing Normal University, China
| | - Louise A Lewis
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut, USA
| | - Bojian Zhong
- College of Life Sciences, Nanjing Normal University, China
| |
|
21
|
Jiang X, Edwards SV, Liu L. The Multispecies Coalescent Model Outperforms Concatenation Across Diverse Phylogenomic Data Sets. Syst Biol 2021; 69:795-812. [PMID: 32011711 PMCID: PMC7302055 DOI: 10.1093/sysbio/syaa008] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 04/01/2019] [Revised: 12/24/2019] [Accepted: 01/02/2020] [Indexed: 11/30/2022] Open
Abstract
A statistical framework of model comparison and model validation is essential to resolving the debates over concatenation and coalescent models in phylogenomic data analysis. A set of statistical tests is here applied and developed to evaluate and compare the adequacy of substitution, concatenation, and multispecies coalescent (MSC) models across 47 phylogenomic data sets collected across the tree of life. Tests for substitution models and the concatenation assumption of topologically congruent gene trees suggest that a poor fit of substitution models, rejected by 44% of loci, and concatenation models, rejected by 38% of loci, is widespread. Logistic regression shows that the proportions of GC content and informative sites are both negatively correlated with the fit of substitution models across loci. Moreover, a substantial violation of the concatenation assumption of congruent gene trees is consistently observed across six major groups (birds, mammals, fish, insects, reptiles, and others, including other invertebrates). In contrast, among those loci adequately described by a given substitution model, the proportion of loci rejecting the MSC model is 11%, significantly lower than those rejecting the substitution and concatenation models. Although conducted on reduced data sets due to computational constraints, Bayesian model validation and comparison both strongly favor the MSC over concatenation across all data sets; the concatenation assumption of congruent gene trees rarely holds for phylogenomic data sets with more than 10 loci. Thus, for large phylogenomic data sets, model comparisons are expected to consistently and more strongly favor the coalescent model over the concatenation model. We also found that loci rejecting the MSC have little effect on species tree estimation. 
Our study reveals the value of model validation and comparison in phylogenomic data analysis, as well as the need for further improvements of multilocus models and computational tools for phylogenetic inference. [Bayes factor; Bayesian model validation; coalescent prior; congruent gene trees; independent prior; Metazoa; posterior predictive simulation.]
Affiliation(s)
- Xiaodong Jiang
- Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602, USA
| | - Scott V Edwards
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard, 26 Oxford Street, Cambridge, MA 02138, USA
| | - Liang Liu
- Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30602, USA; Institute of Bioinformatics, University of Georgia, 120 Green Street, Athens, GA 30602, USA
| |
|
22
|
Data, time and money: evaluating the best compromise for inferring molecular phylogenies of non-model animal taxa. Mol Phylogenet Evol 2020; 142:106660. [DOI: 10.1016/j.ympev.2019.106660] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/26/2019] [Revised: 10/17/2019] [Accepted: 10/17/2019] [Indexed: 12/15/2022]
|
23
|
Cloutier A, Sackton TB, Grayson P, Clamp M, Baker AJ, Edwards SV. Whole-Genome Analyses Resolve the Phylogeny of Flightless Birds (Palaeognathae) in the Presence of an Empirical Anomaly Zone. Syst Biol 2019; 68:937-955. [PMID: 31135914 PMCID: PMC6857515 DOI: 10.1093/sysbio/syz019] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 02/09/2018] [Revised: 03/06/2019] [Accepted: 04/09/2019] [Indexed: 01/17/2023] Open
Abstract
Palaeognathae represent one of the two basal lineages in modern birds, and comprise the volant (flighted) tinamous and the flightless ratites. Resolving palaeognath phylogenetic relationships has historically proved difficult, and short internal branches separating major palaeognath lineages in previous molecular phylogenies suggest that extensive incomplete lineage sorting (ILS) might have accompanied a rapid ancient divergence. Here, we investigate palaeognath relationships using genome-wide data sets of three types of noncoding nuclear markers, together totaling 20,850 loci and over 41 million base pairs of aligned sequence data. We recover a fully resolved topology placing rheas as the sister to kiwi and emu + cassowary that is congruent across marker types for two species tree methods (MP-EST and ASTRAL-II). This topology is corroborated by patterns of insertions for 4274 CR1 retroelements identified from multispecies whole-genome screening, and is robustly supported by phylogenomic subsampling analyses, with MP-EST demonstrating particularly consistent performance across subsampling replicates as compared to ASTRAL. In contrast, analyses of concatenated data supermatrices recover rheas as the sister to all other nonostrich palaeognaths, an alternative that lacks retroelement support and shows inconsistent behavior under subsampling approaches. While statistically supporting the species tree topology, conflicting patterns of retroelement insertions also occur and imply high amounts of ILS across short successive internal branches, consistent with observed patterns of gene tree heterogeneity. 
Coalescent simulations and topology tests indicate that the majority of observed topological incongruence among gene trees is consistent with coalescent variation rather than arising from gene tree estimation error alone, and estimated branch lengths for short successive internodes in the inferred species tree fall within the theoretical range encompassing the anomaly zone. Distributions of empirical gene trees confirm that the most common gene tree topology for each marker type differs from the species tree, signifying the existence of an empirical anomaly zone in palaeognaths.
Affiliation(s)
- Alison Cloutier
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Department of Ornithology, Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
| | - Timothy B Sackton
- Informatics Group, Harvard University, 28 Oxford Street, Cambridge, MA 02138, USA
| | - Phil Grayson
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Department of Ornithology, Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
| | - Michele Clamp
- Informatics Group, Harvard University, 28 Oxford Street, Cambridge, MA 02138, USA
| | - Allan J Baker
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcox Street, Toronto, Ontario M5S 3B2, Canada
- Department of Natural History, Royal Ontario Museum, 100 Queen’s Park, Toronto, Ontario M5S 2C6, Canada
| | - Scott V Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Department of Ornithology, Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
| |
|
24
|
Gatesy J, Sloan DB, Warren JM, Baker RH, Simmons MP, Springer MS. Partitioned coalescence support reveals biases in species-tree methods and detects gene trees that determine phylogenomic conflicts. Mol Phylogenet Evol 2019; 139:106539. [DOI: 10.1016/j.ympev.2019.106539] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/03/2018] [Revised: 06/10/2019] [Accepted: 06/17/2019] [Indexed: 12/26/2022]
|
25
|
Jones KE, Fér T, Schmickl RE, Dikow RB, Funk VA, Herrando‐Moraira S, Johnston PR, Kilian N, Siniscalchi CM, Susanna A, Slovák M, Thapa R, Watson LE, Mandel JR. An empirical assessment of a single family-wide hybrid capture locus set at multiple evolutionary timescales in Asteraceae. Applications in Plant Sciences 2019; 7:e11295. [PMID: 31667023 PMCID: PMC6814182 DOI: 10.1002/aps3.11295] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 02/27/2019] [Accepted: 09/05/2019] [Indexed: 05/23/2023]
Abstract
PREMISE: Hybrid capture with high-throughput sequencing (Hyb-Seq) is a powerful tool for evolutionary studies. The applicability of an Asteraceae family-specific Hyb-Seq probe set and the outcomes of different phylogenetic analyses are investigated here. METHODS: Hyb-Seq data from 112 Asteraceae samples were organized into groups at different taxonomic levels (tribe, genus, and species). For each group, data sets of non-paralogous loci were built and proportions of parsimony informative characters estimated. The impacts of analyzing alternative data sets, removing long branches, and type of analysis on tree resolution and inferred topologies were investigated in tribe Cichorieae. RESULTS: Alignments of the Asteraceae family-wide Hyb-Seq locus set were parsimony informative at all taxonomic levels. Levels of resolution and topologies inferred at shallower nodes differed depending on the locus data set and the type of analysis, and were affected by the presence of long branches. DISCUSSION: The approach used to build a Hyb-Seq locus data set influenced resolution and topologies inferred in phylogenetic analyses. Removal of long branches improved the reliability of topological inferences in maximum likelihood analyses. The Asteraceae Hyb-Seq probe set is applicable at multiple taxonomic depths, which demonstrates that probe sets do not necessarily need to be lineage-specific.
Affiliation(s)
- Katy E. Jones
- Botanischer Garten und Botanisches Museum Berlin, Freie Universität Berlin, Königin‐Luise‐Str. 6–8, 14195 Berlin, Germany
| | - Tomáš Fér
- Department of Botany, Faculty of Science, Charles University, Benátská 2, CZ 12800 Prague, Czech Republic
| | - Roswitha E. Schmickl
- Department of Botany, Faculty of Science, Charles University, Benátská 2, CZ 12800 Prague, Czech Republic
- Institute of Botany, The Czech Academy of Sciences, Zámek 1, CZ 25243 Průhonice, Czech Republic
| | - Rebecca B. Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, D.C. 20013‐7012, USA
| | - Vicki A. Funk
- Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, D.C. 20013‐7012, USA
| | | | - Paul R. Johnston
- Freie Universität Berlin, Evolutionary Biology, Berlin, Germany
- Berlin Center for Genomics in Biodiversity Research, Berlin, Germany
- Leibniz‐Institute of Freshwater Ecology and Inland Fisheries (IGB), Berlin, Germany
| | - Norbert Kilian
- Botanischer Garten und Botanisches Museum Berlin, Freie Universität Berlin, Königin‐Luise‐Str. 6–8, 14195 Berlin, Germany
| | - Carolina M. Siniscalchi
- Department of Biological Sciences, University of Memphis, Memphis, Tennessee 38152, USA
- Center for Biodiversity, University of Memphis, Memphis, Tennessee 38152, USA
| | - Alfonso Susanna
- Botanic Institute of Barcelona (IBB‐CSIC‐ICUB), Pg. del Migdia s.n., ES 08038 Barcelona, Spain
| | - Marek Slovák
- Department of Botany, Faculty of Science, Charles University, Benátská 2, CZ 12800 Prague, Czech Republic
- Plant Science and Biodiversity Centre, Slovak Academy of Sciences, SK‐84523 Bratislava, Slovakia
| | - Ramhari Thapa
- Department of Biological Sciences, University of Memphis, Memphis, Tennessee 38152, USA
- Center for Biodiversity, University of Memphis, Memphis, Tennessee 38152, USA
| | - Linda E. Watson
- Department of Plant Biology, Ecology, and Evolution, Oklahoma State University, Stillwater, Oklahoma 74078, USA
| | - Jennifer R. Mandel
- Department of Biological Sciences, University of Memphis, Memphis, Tennessee 38152, USA
- Center for Biodiversity, University of Memphis, Memphis, Tennessee 38152, USA
|
26
|
Pie MR, Bornschein MR, Ribeiro LF, Faircloth BC, McCormack JE. Phylogenomic species delimitation in microendemic frogs of the Brazilian Atlantic Forest. Mol Phylogenet Evol 2019; 141:106627. [PMID: 31539606 DOI: 10.1016/j.ympev.2019.106627] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/08/2019] [Revised: 08/17/2019] [Accepted: 09/17/2019] [Indexed: 10/26/2022]
Abstract
The advent of next-generation sequencing allows researchers to use large-scale datasets for species delimitation analyses, yet one can envision an inflection point where the added accuracy of including more loci does not offset the increased computational burden. One alternative to including all loci could be to prioritize the analysis of loci for which there is an expectation of high informativeness. Here, we explore the issue of species delimitation and locus selection with montane species from two anuran genera that have been isolated in sky islands across the southern Brazilian Atlantic Forest: Melanophryniscus (Bufonidae) and Brachycephalus (Brachycephalidae). To delimit species, we obtained genetic data using target enrichment of ultraconserved elements from 32 populations (13 for Melanophryniscus and 19 for Brachycephalus), and we were able to create datasets that included over 800 loci with no missing data. We ranked loci according to their number of parsimony-informative sites, and we performed species delimitation analyses using BPP with the most informative 10, 20, 40, 80, 160, 320, and 640 loci. We identified three types of phylogenetic node: nodes with either consistently high or low support regardless of the number of loci or their informativeness and nodes that were initially poorly supported where support became stronger as we included more data. When viewed across all sensitivity analyses, our results suggest that the current species richness in both genera is likely underestimated. In addition, our results show the effects of different sampling strategies on species delimitation using phylogenomic datasets.
Affiliation(s)
- Marcio R Pie
- Departamento de Zoologia, Universidade Federal do Paraná, CEP 81531-980 Curitiba, Paraná, Brazil; Mater Natura - Instituto de Estudos Ambientais, CEP 80250-020 Curitiba, Paraná, Brazil.
| | - Marcos R Bornschein
- Mater Natura - Instituto de Estudos Ambientais, CEP 80250-020 Curitiba, Paraná, Brazil; Instituto de Biociências, Universidade Estadual Paulista, Praça Infante Dom Henrique s/no, Parque Bitaru, CEP 11330-900 São Vicente, São Paulo, Brazil
| | - Luiz F Ribeiro
- Mater Natura - Instituto de Estudos Ambientais, CEP 80250-020 Curitiba, Paraná, Brazil; Escola de Ciências da Vida, Pontifícia Universidade Católica do Paraná, CEP 80215-901 Curitiba, Paraná, Brazil
| | - Brant C Faircloth
- Department of Biological Sciences and Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, USA
| | - John E McCormack
- Moore Laboratory of Zoology, Occidental College, 1600 Campus Road, Los Angeles, CA 90041, USA
|
27
|
Roycroft EJ, Moussalli A, Rowe KC. Phylogenomics Uncovers Confidence and Conflict in the Rapid Radiation of Australo-Papuan Rodents. Syst Biol 2019; 69:431-444. [DOI: 10.1093/sysbio/syz044] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 08/18/2018] [Accepted: 06/12/2019] [Indexed: 11/13/2022]
Abstract
The estimation of robust and accurate measures of branch support has proven challenging in the era of phylogenomics. In data sets of potentially millions of sites, bootstrap support for bifurcating relationships around very short internal branches can be inappropriately inflated. Such overestimation of branch support may be particularly problematic in rapid radiations, where phylogenetic signal is low and incomplete lineage sorting severe. Here, we explore this issue by comparing various branch support estimates under both concatenated and coalescent frameworks, in the recent radiation of Australo-Papuan murine rodents (Muridae: Hydromyini). Using nucleotide sequence data from 1245 independent loci and several phylogenomic inference methods, we unequivocally resolve the majority of genus-level relationships within Hydromyini. However, at four nodes we recover inconsistency in branch support estimates both within and among concatenated and coalescent approaches. In most cases, concatenated likelihood approaches using standard fast bootstrap algorithms did not detect any uncertainty at these four nodes, regardless of partitioning strategy. However, we found this could be overcome with two-stage resampling, that is, across genes and sites within genes (using -bsam GENESITE in IQ-TREE). In addition, low confidence at recalcitrant nodes was recovered using UFBoot2, a recent revision to the bootstrap protocol in IQ-TREE, but this depended on partitioning strategy. Summary coalescent approaches also failed to detect uncertainty under some circumstances. For each of four recalcitrant nodes, an equivalent (or close to equivalent) number of genes were in strong support (>75% bootstrap) of both the primary and at least one alternative topological hypothesis, suggesting notable phylogenetic conflict among loci not detected using some standard branch support metrics.
Recent debate has focused on the appropriateness of concatenated versus multigenealogical approaches to resolving species relationships, but less so on accurately estimating uncertainty in large data sets. Our results demonstrate the importance of employing multiple approaches when assessing confidence and highlight the need for greater attention to the development of robust measures of uncertainty in the era of phylogenomics.
Affiliation(s)
- Emily J Roycroft
- School of BioSciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Department of Science, Museums Victoria, GPO Box 666, Melbourne, VIC 3001, Australia
| | - Adnan Moussalli
- School of BioSciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Department of Science, Museums Victoria, GPO Box 666, Melbourne, VIC 3001, Australia
| | - Kevin C Rowe
- School of BioSciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Department of Science, Museums Victoria, GPO Box 666, Melbourne, VIC 3001, Australia
|
28
|
Simmons MP, Sloan DB, Springer MS, Gatesy J. Gene-wise resampling outperforms site-wise resampling in phylogenetic coalescence analyses. Mol Phylogenet Evol 2019; 131:80-92. [DOI: 10.1016/j.ympev.2018.10.001] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 03/08/2018] [Accepted: 10/01/2018] [Indexed: 01/15/2023]
|
29
|
Aharon S, Ballesteros JA, Crawford AR, Friske K, Gainett G, Langford B, Santibáñez-López CE, Ya'aran S, Gavish-Regev E, Sharma PP. The anatomy of an unstable node: a Levantine relict precipitates phylogenomic dissolution of higher-level relationships of the armoured harvestmen (Arachnida: Opiliones: Laniatores). Invertebr Syst 2019. [DOI: 10.1071/is19002] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/11/2023]
Abstract
After tumultuous revisions to the family-level systematics of Laniatores (the armored harvestmen), the basally branching family Phalangodidae presently bears a disjunct and irregular distribution, attributed to the fragmentation of Pangea. One of the curious lineages assigned to Phalangodidae is the monotypic Israeli genus Haasus, the only Laniatores species that occurs in Israel, and whose presence in the Levant has been inferred to result from biogeographic connectivity with Eurasia. Recent surveys of Israeli caves have also yielded a new troglobitic morphospecies of Haasus. Here, we describe this new species as Haasus naasane sp. nov. So as to test the biogeographic affinity of Haasus, we sequenced DNA from both species and RNA from Haasus naasane sp. nov., to assess their phylogenetic placement. Our results showed that the new species is clearly closely related to Haasus judaeus, but Haasus itself is unambiguously nested within the largely Afrotropical family Pyramidopidae. In addition, the Japanese ‘phalangodid’ Proscotolemon sauteri was recovered as nested within the Southeast Asian family Petrobunidae. Phylogenomic placement of Haasus naasane sp. nov. in a 1550-locus matrix indicates that Pyramidopidae has an unstable position in the tree of Laniatores, with alternative partitioning of the matrix recovering high nodal support for mutually exclusive tree topologies. Exploration of phylogenetic signal showed the cause of this instability to be a considerable conflict between partitions, suggesting that the basal phylogeny of Laniatores may not yet be stable to addition of taxa. We transfer Haasus to Pyramidopidae (new familial assignment). Additionally, we transfer Proscotolemon to the family Petrobunidae (new familial assignment). Future studies on basal Laniatores phylogeny should emphasise the investigation of small-bodied and obscure groups that superficially resemble Phalangodidae.
|
30
|
Liu L, Anderson C, Pearl D, Edwards SV. Modern Phylogenomics: Building Phylogenetic Trees Using the Multispecies Coalescent Model. Methods Mol Biol 2019; 1910:211-239. [PMID: 31278666 DOI: 10.1007/978-1-4939-9074-0_7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 06/09/2023]
Abstract
The multispecies coalescent (MSC) model provides a compelling framework for building phylogenetic trees from multilocus DNA sequence data. The pure MSC is best thought of as a special case of so-called "multispecies network coalescent" models, in which gene flow is allowed among branches of the tree, whereas MSC methods assume there is no gene flow between diverging species. Early implementations of the MSC, such as "parsimony" or "democratic vote" approaches to combining information from multiple gene trees, as well as concatenation, in which DNA sequences from multiple gene trees are combined into a single "supergene," were quickly shown to be inconsistent in some regions of tree space, in so far as they converged on the incorrect species tree as more gene trees and sequence data were accumulated. The anomaly zone, a region of tree space in which the most frequent gene tree is different from the species tree, is one such region where many so-called "coalescent" methods are inconsistent. Second-generation implementations of the MSC employed Bayesian or likelihood models; these are consistent in all regions of gene tree space, but Bayesian methods in particular are incapable of handling the large phylogenomic data sets currently available. Two-step methods, such as MP-EST and ASTRAL, in which gene trees are first estimated and then combined to estimate an overarching species tree, are currently popular in part because they can handle large phylogenomic data sets. These methods are consistent in the anomaly zone but can sometimes provide inappropriate measures of tree support or apportion error and signal in the data inappropriately. MP-EST in particular employs a likelihood model which can be conveniently manipulated to perform statistical tests of competing species trees, incorporating the likelihood of the collected gene trees on each species tree in a likelihood ratio test. 
Such tests provide a useful alternative to the multilocus bootstrap, which only indirectly tests the appropriateness of competing species trees. We illustrate these tests and implementations of the MSC with examples and suggest that MSC methods are a useful class of models effectively using information from multiple loci to build phylogenetic trees.
Affiliation(s)
- Liang Liu
- Department of Statistics, University of Georgia, Athens, GA, USA
| | | | - Dennis Pearl
- Department of Statistics, Pennsylvania State University, University Park, PA, USA
| | - Scott V Edwards
- Department of Organismic and Evolutionary Biology & Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA.
|
31
|
Kuang T, Tornabene L, Li J, Jiang J, Chakrabarty P, Sparks JS, Naylor GJP, Li C. Phylogenomic analysis on the exceptionally diverse fish clade Gobioidei (Actinopterygii: Gobiiformes) and data-filtering based on molecular clocklikeness. Mol Phylogenet Evol 2018; 128:192-202. [PMID: 30036699 DOI: 10.1016/j.ympev.2018.07.018] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 08/23/2017] [Revised: 07/11/2018] [Accepted: 07/17/2018] [Indexed: 11/30/2022]
Abstract
The use of genome-scale data to infer phylogenetic relationships has gained in popularity in recent years due to the progress made in target-gene capture and sequencing techniques. Data filtering, the approach of excluding data inconsistent with the model from analyses, presumably could alleviate problems caused by systematic errors in phylogenetic inference. Different data filtering criteria, such as those based on evolutionary rate and molecular clocklikeness as well as others, have been proposed for selecting useful phylogenetic markers, yet few studies have tested these criteria using phylogenomic data. We developed a novel set of single-copy nuclear coding markers to capture thousands of target genes in gobioid fishes, a species-rich lineage of vertebrates, and tested the effects of data-filtering methods based on substitution rate and molecular clocklikeness while attempting to control for the compounding effects of missing data and variation in locus length. We found that molecular clocklikeness was a better predictor than overall substitution rate for phylogenetic usefulness of molecular markers in our study. In addition, when the 100 best ranked loci for our predictors were concatenated and analyzed using maximum likelihood, or combined in a coalescent-based species-tree analysis, the resulting trees showed a well-resolved topology of Gobioidei that mostly agrees with previous studies. However, trees generated from the 100 least clocklike loci frequently recovered conflicting, and in some cases clearly erroneous, topologies with strong support, thus indicating strong systematic biases in those datasets. Collectively these results suggest that data filtering has the potential to improve the performance of phylogenetic inference when using both a concatenation approach as well as methods that rely on input from individual gene trees (i.e. coalescent species-tree approaches), which may be preferred in scenarios where incomplete lineage sorting is likely to be an issue.
Affiliation(s)
- Ting Kuang
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China
| | - Luke Tornabene
- School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98105, USA
| | - Jingyan Li
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China
| | - Jiamei Jiang
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China
| | - Prosanta Chakrabarty
- Louisiana State University, Museum of Natural Science, Department of Biological Sciences, Baton Rouge, LA 70803, USA
| | - John S Sparks
- American Museum of Natural History, Central Park West at 79th Street, NY, NY 10024, USA
| | | | - Chenhong Li
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China.
|
32
|
Chen MY, Liang D, Zhang P. Phylogenomic Resolution of the Phylogeny of Laurasiatherian Mammals: Exploring Phylogenetic Signals within Coding and Noncoding Sequences. Genome Biol Evol 2018; 9:1998-2012. [PMID: 28830116 PMCID: PMC5737624 DOI: 10.1093/gbe/evx147] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Accepted: 07/30/2017] [Indexed: 12/12/2022]
Abstract
The interordinal relationships of Laurasiatherian mammals are currently one of the most controversial questions in mammalian phylogenetics. Previous studies mainly relied on coding sequences (CDS) and seldom used noncoding sequences. Here, by data mining public genome data, we compiled an intron data set of 3,638 genes (all introns from a protein-coding gene are considered as a gene) (19,055,073 bp) and a CDS data set of 10,259 genes (20,994,285 bp), covering all major lineages of Laurasiatheria (except Pholidota). We found that the intron data contained stronger and more congruent phylogenetic signals than the CDS data. In agreement with this observation, concatenation and species-tree analyses of the intron data set yielded well-resolved and identical phylogenies, whereas the CDS data set produced weakly supported and incongruent results. Further analyses showed that the phylogeny inferred from the intron data is highly robust to data subsampling and change in outgroup, but the CDS data produced unstable results under the same conditions. Interestingly, gene tree statistical results showed that the most frequently observed gene tree topologies for the CDS and intron data are identical, suggesting that the major phylogenetic signal within the CDS data is actually congruent with that within the intron data. Our final result of Laurasiatheria phylogeny is (Eulipotyphla,((Chiroptera, Perissodactyla),(Carnivora, Cetartiodactyla))), favoring a close relationship between Chiroptera and Perissodactyla. Our study 1) provides a well-supported phylogenetic framework for Laurasiatheria, representing a step towards ending the long-standing "hard" polytomy and 2) argues that introns within genome data are a promising data resource for resolving rapid radiation events across the tree of life.
Affiliation(s)
- Meng-Yun Chen
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
| | - Dan Liang
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
| | - Peng Zhang
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
|
33
|
Kallal RJ, Fernández R, Giribet G, Hormiga G. A phylotranscriptomic backbone of the orb-weaving spider family Araneidae (Arachnida, Araneae) supported by multiple methodological approaches. Mol Phylogenet Evol 2018; 126:129-140. [PMID: 29635025 DOI: 10.1016/j.ympev.2018.04.007] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 01/08/2018] [Revised: 03/05/2018] [Accepted: 04/06/2018] [Indexed: 01/01/2023]
Abstract
The orb-weaving spider family Araneidae is extremely diverse (>3100 spp.) and its members can be charismatic terrestrial arthropods, many of them recognizable by their iconic orbicular snare web, such as the common garden spiders. Despite considerable effort to better understand their backbone relationships based on multiple sources of data (morphological, behavioral and molecular), pervasive low support remains in recent studies. In addition, no overarching phylogeny of araneids is available to date, hampering further comparative work. In this study, we analyze the transcriptomes of 33 taxa, including 19 araneids - 12 of them new to this study - representing most of the core family lineages, to examine the relationships within the family using genomic-scale datasets resulting from various methodological treatments, namely ortholog selection and gene occupancy as a measure of matrix completion. Six matrices were constructed to assess these effects by varying orthology inference method and gene occupancy threshold. Orthology methods used are the benchmarking tool BUSCO and the tree-based method UPhO; three gene occupancy thresholds (45%, 65%, 85%) were used to assess the effect of missing data. Gene tree and species tree-based methods (including multi-species coalescent and concatenation approaches, as well as maximum likelihood and Bayesian inference) were used totalling 17 analytical treatments. The monophyly of Araneidae and the placement of core araneid lineages were supported, together with some previously unsound backbone divergences; these include high support for Zygiellinae as the earliest diverging subfamily (followed by Nephilinae), the placement of Gasteracanthinae as sister group to Cyclosa and close relatives, and close relationships between the Araneus + Neoscona clade and Cyrtophorinae + Argiopinae clade. Incongruences were relegated to short branches in the clade comprising Cyclosa and its close relatives. 
We found congruence between most of the completed analyses, with minimal topological effects from occupancy/missing data and orthology assessment. The resulting number of genes by certain combinations of orthology and occupancy thresholds being analyzed had the greatest effect on the resulting trees, with anomalous outcomes recovered from analysis of lower numbers of genes.
Affiliation(s)
- Robert J Kallal
- Department of Biological Sciences, The George Washington University, 2029 G St. NW, Washington, DC 20052, USA.
| | - Rosa Fernández
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St., Cambridge, MA 02138, USA; Bioinformatics and Genomics Unit, Center for Genomic Regulation, Carrer del Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Gonzalo Giribet
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St., Cambridge, MA 02138, USA
| | - Gustavo Hormiga
- Department of Biological Sciences, The George Washington University, 2029 G St. NW, Washington, DC 20052, USA
|
34
|
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. The abundance of genomic data for an enormous variety of organisms has enabled phylogenomic inference of many groups, and this has motivated the development of many computer programs implementing the associated methods. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil
| | - Joaquim Martins
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil
| | - João C Setubal
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil.
|
35
|
Edwards SV, Cloutier A, Baker AJ. Conserved Nonexonic Elements: A Novel Class of Marker for Phylogenomics. Syst Biol 2017; 66:1028-1044. [PMID: 28637293 PMCID: PMC5790140 DOI: 10.1093/sysbio/syx058] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 12/02/2016] [Revised: 06/03/2017] [Accepted: 06/06/2017] [Indexed: 01/12/2023]
Abstract
Noncoding markers have a particular appeal as tools for phylogenomic analysis because, at least in vertebrates, they appear less subject to strong variation in GC content among lineages. Thus far, ultraconserved elements (UCEs) and introns have been the most widely used noncoding markers. Here we analyze and study the evolutionary properties of a new type of noncoding marker, conserved nonexonic elements (CNEEs), which consists of noncoding elements that are estimated to evolve slower than the neutral rate across a set of species. Although they often include UCEs, CNEEs are distinct from UCEs because they are not ultraconserved, and, most importantly, the core region alone is analyzed, rather than both the core and its flanking regions. Using a data set of 16 birds plus an alligator outgroup, and ∼3600-∼3800 loci per marker type, we found that although CNEEs were less variable than bioinformatically derived UCEs or introns and in some cases exhibited a slower approach to branch resolution as determined by phylogenomic subsampling, the quality of CNEE alignments was superior to those of the other markers, with fewer gaps and missing species. Phylogenetic resolution using coalescent approaches was comparable among the three marker types, with most nodes being fully and congruently resolved. Comparison of phylogenetic results across the three marker types indicated that one branch, the sister group to the passerine + falcon clade, was resolved differently and with moderate (>70%) bootstrap support between CNEEs and UCEs or introns. Overall, CNEEs appear to be promising as phylogenomic markers, yielding phylogenetic resolution as high as for UCEs and introns but with fewer gaps, less ambiguity in alignments and with patterns of nucleotide substitution more consistent with the assumptions of commonly used methods of phylogenetic analysis.
Affiliation(s)
- Scott V. Edwards
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, 26 Oxford Street, Harvard University, Cambridge, MA 02138 USA
| | - Alison Cloutier
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, 26 Oxford Street, Harvard University, Cambridge, MA 02138 USA
- Department of Natural History, Royal Ontario Museum, 100 Queen’s Park, Toronto, Ontario, M5S 2C6 Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcox Street, Toronto, Ontario, M5S 3B2 Canada
| | - Allan J. Baker
- Department of Natural History, Royal Ontario Museum, 100 Queen’s Park, Toronto, Ontario, M5S 2C6 Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcox Street, Toronto, Ontario, M5S 3B2 Canada
|
36
|
Springer MS, Gatesy J. Pinniped Diphyly and Bat Triphyly: More Homology Errors Drive Conflicts in the Mammalian Tree. J Hered 2017; 109:297-307. [DOI: 10.1093/jhered/esx089] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 08/21/2017] [Accepted: 10/07/2017] [Indexed: 11/14/2022]
Affiliation(s)
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA
| | - John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY
|
37
|
Genomic evidence reveals a radiation of placental mammals uninterrupted by the KPg boundary. Proc Natl Acad Sci U S A 2017; 114:E7282-E7290. [PMID: 28808022 DOI: 10.1073/pnas.1616744114] [Citation(s) in RCA: 84] [Impact Index Per Article: 12.0] [Indexed: 11/18/2022]
Abstract
The timing of the diversification of placental mammals relative to the Cretaceous-Paleogene (KPg) boundary mass extinction remains highly controversial. In particular, there have been seemingly irreconcilable differences in the dating of the early placental radiation not only between fossil-based and molecular datasets but also among molecular datasets. To help resolve this discrepancy, we performed genome-scale analyses using 4,388 loci from 90 taxa, including representatives of all extant placental orders and transcriptome data from flying lemurs (Dermoptera) and pangolins (Pholidota). Depending on the gene partitioning scheme, molecular clock model, and genic deviation from molecular clock assumptions, extensive sensitivity analyses recovered widely varying diversification scenarios for placental mammals from a given gene set, ranging from a deep Cretaceous origin and diversification to a scenario spanning the KPg boundary, suggesting that the use of suboptimal molecular clock markers and methodologies is a major cause of controversies regarding placental diversification timing. We demonstrate that reconciliation between molecular and paleontological estimates of placental divergence times can be achieved using the appropriate clock model and gene partitioning scheme while accounting for the degree to which individual genes violate molecular clock assumptions. A birth-death-shift analysis suggests that placental mammals underwent a continuous radiation across the KPg boundary without apparent interruption by the mass extinction, paralleling a genus-level radiation of multituberculates and ecomorphological diversification of both multituberculates and therians. These findings suggest that the KPg catastrophe evidently played a limited role in placental diversification, which, instead, was likely a delayed response to the slightly earlier radiation of angiosperms.
|
38
|
Feijoo M, Parada A. Macrosystematics of eutherian mammals combining HTS data to expand taxon coverage. Mol Phylogenet Evol 2017; 113:76-83. [PMID: 28487261 DOI: 10.1016/j.ympev.2017.05.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/24/2016] [Revised: 05/04/2017] [Accepted: 05/04/2017] [Indexed: 02/04/2023]
Abstract
In the last few years, high-throughput sequencing technologies have permitted significant advances in mammalian phylogenetic studies from a genomic perspective. However, these studies have been restricted to a small number of species with available reference genomes. Thus, several issues within eutherian mammal phylogeny remain unresolved. This may be due in part to limited taxon sampling, as taxonomic density is known to affect phylogenetic resolution. In this context, we present a protocol to increase taxon coverage using high-throughput sequencing data (RNA or DNA) generated for other biological studies and available in public databases. Following this procedure, we addressed pending or controversial issues concerning the phylogenetic position of Dermoptera, Pholidota and Chiroptera, considering multiple independent loci. For Chiroptera and Arctoidea, we also evaluated the relationships among their constituent lineages. Although the maximum number of genes used is moderate (95), in some cases taxon coverage doubles that of previous related studies. Overall, all coalescent-based (STAR, MP-EST and ASTRAL) and concatenated (IQ-TREE and BEAST2) methods used for species tree reconstruction were consistent with each other, and most of the interrogated nodes received high statistical support.
Affiliation(s)
- M Feijoo
- Departamento de Ecología y Evolución, Facultad de Ciencias, Universidad de la República, Montevideo CP 11400, Uruguay.
- A Parada
- Instituto de Ciencias Ambientales y Evolutivas, Universidad Austral de Chile, Valdivia CP 5090000, Chile.
|