1. Gupta A, Mirarab S, Turakhia Y. Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES. bioRxiv: The Preprint Server for Biology 2024:2024.05.27.596098. [PMID: 38854139 PMCID: PMC11160643 DOI: 10.1101/2024.05.27.596098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Inference of species trees plays a crucial role in advancing our understanding of evolutionary relationships and has immense significance for diverse biological and medical applications. Extensive genome sequencing efforts are currently in progress across a broad spectrum of life forms, holding the potential to unravel the intricate branching patterns within the tree of life. However, estimating species trees starting from raw genome sequences is quite challenging, and the current cutting-edge methodologies require a series of error-prone steps that are neither entirely automated nor standardized. In this paper, we present ROADIES, a novel pipeline for species tree inference from raw genome assemblies that is fully automated, easy to use, scalable, free from reference bias, and provides flexibility to adjust the tradeoff between accuracy and runtime. The ROADIES pipeline eliminates the need to align whole genomes, choose a single reference species, or pre-select loci such as functional genes found using cumbersome annotation steps. Moreover, it leverages recent advances in phylogenetic inference to allow multi-copy genes, eliminating the need to detect orthology. Using the genomic datasets released from large-scale sequencing consortia across three diverse life forms (placental mammals, pomace flies, and birds), we show that ROADIES infers species trees that are comparable in quality with the state-of-the-art approaches but in a fraction of the time. By incorporating optimal approaches and automating all steps from assembled genomes to species and gene trees, ROADIES is poised to improve the accuracy, scalability, and reproducibility of phylogenomic analyses.
Affiliation(s)
- Anshu Gupta
  - Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Yatish Turakhia
  - Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
2. Nicol DA, Saldivia P, Summerfield TC, Heads M, Lord JM, Khaing EP, Larcombe MJ. Phylogenomics and morphology of Celmisiinae (Asteraceae: Astereae): Taxonomic and evolutionary implications. Mol Phylogenet Evol 2024; 195:108064. [PMID: 38508479 DOI: 10.1016/j.ympev.2024.108064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 03/12/2024] [Accepted: 03/17/2024] [Indexed: 03/22/2024]
Abstract
The tribe Astereae (Asteraceae) includes 36 subtribes and 252 genera, and is distributed worldwide in temperate and tropical regions. One of the subtribes, Celmisiinae Saldivia, has been recently circumscribed to include six genera and ca. 160 species, and is restricted to eastern Australia, New Zealand, and New Guinea. The species show an impressive range of growth habit, from small herbs and ericoid subshrubs to medium-sized trees. They live in a wide range of habitats and are often dominant in subalpine and alpine vegetation. Despite the well-supported circumscription of Celmisiinae, uncertainties have remained about their internal relationships and classification at genus and species levels. This study exploited recent advances in high-throughput sequencing to build a robust multi-gene phylogeny for the subtribe Celmisiinae. The target enrichment Angiosperms353 bait set and the hybpiper-nf and paragone-nf pipelines were used to retrieve, infer, and assemble orthologous loci from 75 taxa representing all the main putative clades within the subtribe. Because of the diploidised ploidy level in Celmisiinae, as well as missing data in the assemblies, uncertainty remains surrounding the inference of orthology detection. However, based on a variety of gene-family sets, coalescent and concatenation-based phylogenetic reconstructions recovered similar topologies. Paralogy and missing data in the gene-families caused some problems, but the estimated phylogenies were well-supported and well-resolved. The phylogenomic evidence supported Celmisiinae and three main clades: the Pleurophyllum clade (Pleurophyllum, Macrolearia and Damnamenia), mostly in the New Zealand Subantarctic Islands, Celmisia of mainland New Zealand and Australia, and Shawia (including 'Olearia pro parte' and Pachystegia) of New Zealand, Australia and New Guinea. The results presented here add to the accumulating support for the Angiosperms353 bait set as an efficient method for documenting plant diversity.
Affiliation(s)
- Duncan A Nicol
  - Department of Botany, University of Otago, PO Box 56, Dunedin, New Zealand
- Patricio Saldivia
  - Biota Ltda. Av. Miguel Claro 1224, Providencia, Santiago, Chile; Museo Regional de Aysén, Km 3 Camino a Coyhaique Alto, Coyhaique, Chile
- Tina C Summerfield
  - Department of Botany, University of Otago, PO Box 56, Dunedin, New Zealand
- Michael Heads
  - Buffalo Museum of Science, Buffalo, NY 14211-1293, USA
- Janice M Lord
  - Department of Botany, University of Otago, PO Box 56, Dunedin, New Zealand
- Ei P Khaing
  - Department of Biochemistry, University of Otago, PO Box 56, Dunedin, New Zealand
- Matthew J Larcombe
  - Department of Botany, University of Otago, PO Box 56, Dunedin, New Zealand
3. Moore‐Pollard ER, Jones DS, Mandel JR. Compositae-ParaLoss-1272: A complementary sunflower-specific probe set reduces paralogs in phylogenomic analyses of complex systems. Applications in Plant Sciences 2024; 12:e11568. [PMID: 38369976 PMCID: PMC10873820 DOI: 10.1002/aps3.11568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 10/30/2023] [Accepted: 11/12/2023] [Indexed: 02/20/2024]
Abstract
Premise: A family-specific probe set for sunflowers, Compositae-1061, enables family-wide phylogenomic studies and investigations at lower taxonomic levels, but may lack resolution at genus to species levels, especially in groups complicated by polyploidy and hybridization.
Methods: We developed a Hyb-Seq probe set, Compositae-ParaLoss-1272, that targets orthologous loci in Asteraceae. We tested its efficiency across the family by simulating target enrichment sequencing in silico. Additionally, we tested its effectiveness at lower taxonomic levels in the historically complex genus Packera. We performed Hyb-Seq with Compositae-ParaLoss-1272 for 19 Packera taxa that were previously studied using Compositae-1061. The resulting sequences from each probe set, plus a combination of both, were used to generate phylogenies, compare topologies, and assess node support.
Results: We report that Compositae-ParaLoss-1272 captured loci across all tested Asteraceae members, had less gene tree discordance, and retained longer loci than Compositae-1061. Most notably, Compositae-ParaLoss-1272 recovered substantially fewer paralogous sequences than Compositae-1061, with only ~5% of the recovered loci reporting as paralogous, compared to ~59% with Compositae-1061.
Discussion: Given the complexity of plant evolutionary histories, assigning orthology for phylogenomic analyses will continue to be challenging. However, we anticipate Compositae-ParaLoss-1272 will provide improved resolution and utility for studies of complex groups and lower taxonomic levels in the sunflower family.
Affiliation(s)
- Erika R. Moore‐Pollard
  - Department of Biological Sciences, University of Memphis, 3700 Walker Ave., Memphis, Tennessee 38152, USA
- Daniel S. Jones
  - Department of Biological Sciences, Auburn University, 101 Rouse Life Sciences, Auburn, Alabama 36849, USA
- Jennifer R. Mandel
  - Department of Biological Sciences, University of Memphis, 3700 Walker Ave., Memphis, Tennessee 38152, USA
4. Yang F, Ge J, Guo Y, Olmstead R, Sun W. Deciphering complex reticulate evolution of Asian Buddleja (Scrophulariaceae): insights into the taxonomy and speciation of polyploid taxa in the Sino-Himalayan region. Annals of Botany 2023; 132:15-28. [PMID: 36722368 PMCID: PMC10550280 DOI: 10.1093/aob/mcad022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 01/31/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND AND AIMS: Species of the genus Buddleja in Asia are mainly distributed in the Sino-Himalayan region and form a challenging taxonomic group, with extensive hybridization and polyploidization. A phylogenetic approach to unravelling the history of reticulation in this lineage will deepen our understanding of the speciation in biodiversity hotspots.
METHODS: For this study, we obtained 80 accessions representing all the species in the Asian Buddleja clade, and the ploidy level of each taxon was determined by flow cytometry analyses. Whole plastid genomes, nuclear ribosomal DNA, single nucleotide polymorphisms and a large number of low-copy nuclear genes assembled from genome skimming data were used to investigate the reticulate evolutionary history of Asian Buddleja. Complex cytonuclear conflicts were detected through a comparison of plastid and species trees. Gene tree incongruence was also analysed to detect any reticulate events in the history of this lineage.
KEY RESULTS: Six hybridization events were detected, which are able to explain the cytonuclear conflict in Asian Buddleja. Furthermore, PhyloNet analysis combining species ploidy data indicated several allopolyploid speciation events. A strongly supported species tree inferred from a large number of low-copy nuclear genes not only corrected some earlier misinterpretations, but also indicated that there are many Asian Buddleja species that have been lumped mistakenly. Divergent time estimation shows two periods of rapid diversification (8-10 and 0-3 Mya) in the Asian Buddleja clade, which might coincide with the final uplift of the Hengduan Mountains and Quaternary climate fluctuations, respectively.
CONCLUSIONS: This study presents a well-supported phylogenetic backbone for the Asian Buddleja species, elucidates their complex and reticulate evolutionary history and suggests that tectonic activity, climate fluctuations, polyploidization and hybridization together promoted the diversification of this lineage.
Affiliation(s)
- Fengmao Yang
  - Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China
  - Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, Kunming Institute of Botany, Chinese Academy of Sciences (CAS), Kunming 650201, Yunnan, China
- Jia Ge
  - Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China
  - Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, Kunming Institute of Botany, Chinese Academy of Sciences (CAS), Kunming 650201, Yunnan, China
- Yongjie Guo
  - Germplasm Bank of Wild Species of China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Richard Olmstead
  - Department of Biology and Burke Museum, University of Washington, Seattle, WA 98195, USA
- Weibang Sun
  - Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China
  - Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, Kunming Institute of Botany, Chinese Academy of Sciences (CAS), Kunming 650201, Yunnan, China
5. Bernot JP, Owen CL, Wolfe JM, Meland K, Olesen J, Crandall KA. Major Revisions in Pancrustacean Phylogeny and Evidence of Sensitivity to Taxon Sampling. Mol Biol Evol 2023; 40:msad175. [PMID: 37552897 PMCID: PMC10414812 DOI: 10.1093/molbev/msad175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2022] [Revised: 06/14/2023] [Accepted: 06/19/2023] [Indexed: 08/10/2023]
Abstract
The clade Pancrustacea, comprising crustaceans and hexapods, is the most diverse group of animals on earth, containing over 80% of animal species and half of animal biomass. It has been the subject of several recent phylogenomic analyses, yet relationships within Pancrustacea show a notable lack of stability. Here, the phylogeny is estimated with expanded taxon sampling, particularly of malacostracans. We show small changes in taxon sampling have large impacts on phylogenetic estimation. By analyzing identical orthologs between two slightly different taxon sets, we show that the differences in the resulting topologies are due primarily to the effects of taxon sampling on the phylogenetic reconstruction method. We compare trees resulting from our phylogenomic analyses with those from the literature to explore the large tree space of pancrustacean phylogenetic hypotheses and find that statistical topology tests reject the previously published trees in favor of the maximum likelihood trees produced here. Our results reject several clades including Caridoida, Eucarida, Multicrustacea, Vericrustacea, and Syncarida. Notably, we find Copepoda nested within Allotriocarida with high support and recover a novel relationship between decapods, euphausiids, and syncarids that we refer to as the Syneucarida. With denser taxon sampling, we find Stomatopoda sister to this latter clade, which we collectively name Stomatocarida, dividing Malacostraca into three clades: Leptostraca, Peracarida, and Stomatocarida. A new Bayesian divergence time estimation is conducted using 13 vetted fossils. We review our results in the context of other pancrustacean phylogenetic hypotheses and highlight 15 key taxa to sample in future studies.
Affiliation(s)
- James P Bernot
  - Department of Invertebrate Zoology, US National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Christopher L Owen
  - Systematic Entomology Laboratory, USDA-ARS, ℅ National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Joanna M Wolfe
  - Museum of Comparative Zoology and Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Kenneth Meland
  - Department of Biology, University of Bergen, Bergen, Norway
- Jørgen Olesen
  - Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Keith A Crandall
  - Department of Invertebrate Zoology, US National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
  - Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
6. Yang Y, Forsythe ES, Ding YM, Zhang DY, Bai WN. Genomic Analysis of Plastid-Nuclear Interactions and Differential Evolution Rates in Coevolved Genes across Juglandaceae Species. Genome Biol Evol 2023; 15:evad145. [PMID: 37515592 PMCID: PMC10410296 DOI: 10.1093/gbe/evad145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 07/07/2023] [Accepted: 07/25/2023] [Indexed: 07/31/2023]
Abstract
The interaction between the nuclear and chloroplast genomes in plants is crucial for preserving essential cellular functions in the face of varying rates of mutation, levels of selection, and modes of transmission. Despite this, identifying nuclear genes that coevolve with chloroplast genomes at a genome-wide level has remained a challenge. In this study, we conducted an evolutionary rate covariation analysis to identify candidate nuclear genes coevolving with chloroplast genomes in Juglandaceae. Our analysis was based on 4,894 orthologous nuclear genes and 76 genes across seven chloroplast partitions in nine Juglandaceae species. Our results indicated that 1,369 (27.97%) of the nuclear genes demonstrated signatures of coevolution, with the Ycf1/2 partition yielding the largest number of hits (765) and the ClpP1 partition yielding the fewest (13). These hits were found to be significantly enriched in biological processes related to leaf development, photoperiodism, and response to abiotic stress. Among the seven partitions, AccD, ClpP1, MatK, and RNA polymerase partitions and their respective hits exhibited a narrow range, characterized by dN/dS values below 1. In contrast, the Ribosomal, Photosynthesis, Ycf1/2 partitions and their corresponding hits, displayed a broader range of dN/dS values, with certain values exceeding 1. Our findings highlight the differences in the number of candidate nuclear genes coevolving with the seven chloroplast partitions in Juglandaceae species and the correlation between the evolution rates of these genes and their corresponding chloroplast partitions.
Affiliation(s)
- Yang Yang
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, and Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- Evan S Forsythe
  - Department of Biology, Oregon State University-Cascades, Bend, Oregon, USA
  - Integrative Biology Department, Oregon State University, Corvallis, Oregon, USA
- Ya-Mei Ding
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, and Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
  - South China Botanical Garden, The Chinese Academy of Sciences, Guangdong, China
- Da-Yong Zhang
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, and Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- Wei-Ning Bai
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, and Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
7. Fleming JF, Valero‐Gracia A, Struck TH. Identifying and addressing methodological incongruence in phylogenomics: A review. Evol Appl 2023; 16:1087-1104. [PMID: 37360032 PMCID: PMC10286231 DOI: 10.1111/eva.13565] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/05/2022] [Revised: 04/07/2023] [Accepted: 05/17/2023] [Indexed: 06/28/2023]
Abstract
The availability of phylogenetic data has greatly expanded in recent years. As a result, a new era in phylogenetic analysis is dawning, one in which the methods we use to analyse and assess our data are the bottleneck to producing valuable phylogenetic hypotheses, rather than the need to acquire more data. This makes the ability to accurately appraise and evaluate new methods of phylogenetic analysis and phylogenetic artefact identification more important than ever. Incongruence in phylogenetic reconstructions based on different datasets may be due to two major sources: biological and methodological. Biological sources comprise processes like horizontal gene transfer, hybridization and incomplete lineage sorting, while methodological ones contain falsely assigned data or violations of the assumptions of the underlying model. While the former provides interesting insights into the evolutionary history of the investigated groups, the latter should be avoided or minimized as best as possible. However, errors introduced by methodology must first be excluded or minimized to be able to conclude that biological sources are the cause. Fortunately, a variety of useful tools exist to help detect such misassignments and model violations and to apply ameliorating measurements. Still, the number of methods and their theoretical underpinning can be overwhelming and opaque. Here, we present a practical and comprehensive review of recent developments in techniques to detect artefacts arising from model violations and poorly assigned data. The advantages and disadvantages of the different methods to detect such misleading signals in phylogenetic reconstructions are also discussed. As there is no one-size-fits-all solution, this review can serve as a guide in choosing the most appropriate detection methods depending on both the actual dataset and the computational power available to the researcher. Ultimately, this informed selection will have a positive impact on the broader field, allowing us to better understand the evolutionary history of the group of interest.
8. Titus-McQuillan JE, Nanni AV, McIntyre LM, Rogers RL. Estimating transcriptome complexities across eukaryotes. BMC Genomics 2023; 24:254. [PMID: 37170194 PMCID: PMC10173493 DOI: 10.1186/s12864-023-09326-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 04/20/2023] [Indexed: 05/13/2023]
Abstract
BACKGROUND: Genomic complexity is a growing field of evolution, with case studies for comparative evolutionary analyses in model and emerging non-model systems. Understanding complexity and the functional components of the genome is an untapped wealth of knowledge ripe for exploration. With the "remarkable lack of correspondence" between genome size and complexity, there needs to be a way to quantify complexity across organisms. In this study, we use a set of complexity metrics that allow for evaluating changes in complexity using TranD.
RESULTS: We ascertain if complexity is increasing or decreasing across transcriptomes and at what structural level, as complexity varies. In this study, we define three metrics (TpG, EpT, and EpG) to quantify the transcriptome's complexity that encapsulates the dynamics of alternative splicing. Here we compare complexity metrics across 1) whole genome annotations, 2) a filtered subset of orthologs, and 3) novel genes to elucidate the impacts of orthologs and novel genes in transcript model analysis. Effective Exon Number (EEN) is used to compare the distribution of exon sizes within transcripts against random expectations of uniform exon placement. EEN accounts for differences in exon size, which is important because novel gene differences in complexity for orthologs and whole-transcriptome analyses are biased towards low-complexity genes with few exons and few alternative transcripts.
CONCLUSIONS: With our metric analyses, we are able to quantify changes in complexity across diverse lineages with greater precision and accuracy than previous cross-species comparisons under ortholog conditioning. These analyses represent a step toward whole-transcriptome analysis in the emerging field of non-model evolutionary genomics, with key insights for evolutionary inference of complexity changes on deep timescales across the tree of life. We suggest a means to quantify biases generated in ortholog calling and correct complexity analysis for lineage-specific effects. With these metrics, we directly assay the quantitative properties of newly formed lineage-specific genes as they lower complexity.
Affiliation(s)
- James E Titus-McQuillan
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Adalena V Nanni
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, 32611, USA
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL, 32611, USA
- Lauren M McIntyre
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, 32611, USA
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL, 32611, USA
- Rebekah L Rogers
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
9. Joyce EM, Appelhans MS, Buerki S, Cheek M, de Vos JM, Pirani JR, Zuntini AR, Bachelier JB, Bayly MJ, Callmander MW, Devecchi MF, Pell SK, Groppo M, Lowry PP, Mitchell J, Siniscalchi CM, Munzinger J, Orel HK, Pannell CM, Nauheimer L, Sauquet H, Weeks A, Muellner-Riehl AN, Leitch IJ, Maurin O, Forest F, Nargar K, Thiele KR, Baker WJ, Crayn DM. Phylogenomic analyses of Sapindales support new family relationships, rapid Mid-Cretaceous Hothouse diversification, and heterogeneous histories of gene duplication. Frontiers in Plant Science 2023; 14:1063174. [PMID: 36959945 PMCID: PMC10028101 DOI: 10.3389/fpls.2023.1063174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Accepted: 01/31/2023] [Indexed: 06/18/2023]
Abstract
Sapindales is an angiosperm order of high economic and ecological value comprising nine families, c. 479 genera, and c. 6570 species. However, family and subfamily relationships in Sapindales remain unclear, making reconstruction of the order's spatio-temporal and morphological evolution difficult. In this study, we used Angiosperms353 target capture data to generate the most densely sampled phylogenetic trees of Sapindales to date, with 448 samples and c. 85% of genera represented. The percentage of paralogous loci and allele divergence was characterized across the phylogeny, which was time-calibrated using 29 rigorously assessed fossil calibrations. All families were supported as monophyletic. Two core family clades subdivide the order, the first comprising Kirkiaceae, Burseraceae, and Anacardiaceae, the second comprising Simaroubaceae, Meliaceae, and Rutaceae. Kirkiaceae is sister to Burseraceae and Anacardiaceae, and, contrary to current understanding, Simaroubaceae is sister to Meliaceae and Rutaceae. Sapindaceae is placed with Nitrariaceae and Biebersteiniaceae as sister to the core Sapindales families, but the relationships between these families remain unclear, likely due to their rapid and ancient diversification. Sapindales families emerged in rapid succession, coincident with the climatic change of the Mid-Cretaceous Hothouse event. Subfamily and tribal relationships within the major families need revision, particularly in Sapindaceae, Rutaceae and Meliaceae. Much of the difficulty in reconstructing relationships at this level may be caused by the prevalence of paralogous loci, particularly in Meliaceae and Rutaceae, that are likely indicative of ancient gene duplication events such as hybridization and polyploidization playing a role in the evolutionary history of these families. This study provides key insights into factors that may affect phylogenetic reconstructions in Sapindales across multiple scales, and provides a state-of-the-art phylogenetic framework for further research.
Collapse
Affiliation(s)
- Elizabeth M. Joyce
- Systematics, Biodiversity and Evolution of Plants, Ludwig-Maximilians-Universität München, Munich, Germany
- College of Science and Engineering, James Cook University, Cairns, QLD, Australia
- Australian Tropical Herbarium, James Cook University, Cairns, QLD, Australia
| | - Marc S. Appelhans
- Department of Systematics, Biodiversity and Evolution of Plants, University of Göttingen, Goettingen, Germany
- Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, DC, United States
| | - Sven Buerki
- Department of Biological Sciences, Boise State University, Boise, ID, United States
| | - Martin Cheek
- Royal Botanic Gardens, Kew, Richmond, United Kingdom
| | - Jurriaan M. de Vos
- Department of Environmental Sciences, University Basel, Basel, Switzerland
| | - José R. Pirani
- Departamento de Botaênica, Universidade de Saão Paulo, Herbário SPF, Saão Paulo, Brazil
| | | | | | - Michael J. Bayly
- School of BioSciences, The University of Melbourne, Parkville, VIC, Australia
| | | | - Marcelo F. Devecchi
- Departamento de Botaênica, Universidade de Saão Paulo, Herbário SPF, Saão Paulo, Brazil
| | - Susan K. Pell
- United States Botanic Garden, Washington, DC, United States
| | - Milton Groppo
- Departamento de Botaênica, Universidade de Saão Paulo, Herbário SPF, Saão Paulo, Brazil
| | - Porter P. Lowry
- Missouri Botanical Garden, St. Louis, MO, United States
- Institut de Systématique, Évolution, et Biodiversité (ISYEB), Muséum National d’Histoire Naturelle, Centre National de la Recherche Scientifique, Sorbonne Université, École Pratique des Hautes Études, Université des Antilles, Paris, France
| | - John Mitchell
- New York Botanical Garden, New York, NY, United States
- Carolina M. Siniscalchi
- Department of Biological Sciences, Harned Hall, Mississippi State University, Mississippi State, MS, United States
- Jérôme Munzinger
- AMAP, Université Montpellier, Institut de Recherche pour le Développement (IRD), Centre de coopération internationale en recherche agronomique pour le développement (CIRAD), Centre National de la Recherche Scientifique (CNRS), Institut national de la recherche agronomique (INRAE), Montpellier, France
- Harvey K. Orel
- School of BioSciences, The University of Melbourne, Parkville, VIC, Australia
- Caroline M. Pannell
- Royal Botanic Gardens, Kew, Richmond, United Kingdom
- Department of Biology, Oxford University, Oxford, United Kingdom
- Marine Laboratory, Queen’s University Belfast, Portaferry, United Kingdom
- Lars Nauheimer
- College of Science and Engineering, James Cook University, Cairns, QLD, Australia
- Australian Tropical Herbarium, James Cook University, Cairns, QLD, Australia
- Hervé Sauquet
- National Herbarium of New South Wales (NSW), Royal Botanic Gardens and Domain Trust, Sydney, NSW, Australia
- Andrea Weeks
- Department of Biology, George Mason University, Fairfax, VA, United States
- Alexandra N. Muellner-Riehl
- Department of Molecular Evolution and Plant Systematics & Herbarium, Faculty of Life Sciences, University of Leipzig, Leipzig, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
- Félix Forest
- Royal Botanic Gardens, Kew, Richmond, United Kingdom
- Katharina Nargar
- Australian Tropical Herbarium, James Cook University, Cairns, QLD, Australia
- National Research Collections Australia, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, ACT, Australia
- Kevin R. Thiele
- School of Biological Sciences, University of Western Australia, Perth, WA, Australia
- Darren M. Crayn
- College of Science and Engineering, James Cook University, Cairns, QLD, Australia
- Australian Tropical Herbarium, James Cook University, Cairns, QLD, Australia
10
McCarthy CGP, Mulhair PO, Siu-Ting K, Creevey CJ, O’Connell MJ. Improving Orthologous Signal and Model Fit in Datasets Addressing the Root of the Animal Phylogeny. Mol Biol Evol 2023; 40:6989790. [PMID: 36649189 PMCID: PMC9848061 DOI: 10.1093/molbev/msac276] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/09/2021] [Revised: 12/19/2022] [Accepted: 12/23/2022] [Indexed: 01/18/2023] Open
Abstract
There is conflicting evidence as to whether Porifera (sponges) or Ctenophora (comb jellies) comprise the root of the animal phylogeny. Support for either a Porifera-sister or Ctenophora-sister tree has been extensively examined in the context of model selection, taxon sampling, and outgroup selection. The influence of dataset construction is comparatively understudied. We re-examine five animal phylogeny datasets that have supported either root hypothesis using an approach designed to enrich orthologous signal in phylogenomic datasets. We find that many component orthogroups in animal datasets fail to recover major lineages as monophyletic with the exception of Ctenophora, regardless of the supported root. Enriching these datasets to retain orthogroups recovering ≥3 major lineages reduces dataset size by up to 50% while retaining underlying phylogenetic information and taxon sampling. Site-heterogeneous phylogenomic analysis of these enriched datasets recovers both Porifera-sister and Ctenophora-sister positions, even with additional constraints on outgroup sampling. Two datasets which previously supported Ctenophora-sister support Porifera-sister upon enrichment. All enriched datasets display improved model fitness under posterior predictive analysis. While not conclusively rooting animals at either Porifera or Ctenophora, we do see an increase in signal for Porifera-sister and a decrease in signal for Ctenophora-sister when data are filtered for orthologous signal. Our results indicate that dataset size and construction as well as model fit influence animal root inference.
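The enrichment step described in this abstract (retain orthogroups whose gene trees recover at least three major lineages) can be illustrated with a minimal Python sketch. This is not the authors' software: the nested-tuple tree encoding, the function names, the toy taxon labels, and the rule that a lineage needs at least two sampled taxa to count are all assumptions made for illustration. A lineage is treated as "recovered" when its sampled taxa form one side of a split in the gene tree, so the check does not depend on where the tree is rooted.

```python
def leaf_sets(tree):
    """Return (all leaves under tree, leaf sets of every subtree incl. tree)."""
    if isinstance(tree, str):
        single = frozenset([tree])
        return single, [single]
    leaves = frozenset()
    collected = []
    for child in tree:
        child_leaves, child_sets = leaf_sets(child)
        leaves |= child_leaves
        collected.extend(child_sets)
    collected.append(leaves)
    return leaves, collected


def recovered_lineages(tree, lineages):
    """Count lineages whose sampled taxa form one side of a split in tree."""
    all_leaves, subtrees = leaf_sets(tree)
    # Each subtree defines a split; store both sides so rooting is irrelevant.
    splits = set(subtrees) | {all_leaves - s for s in subtrees}
    hits = 0
    for members in lineages.values():
        present = frozenset(members) & all_leaves
        # Assumption: a lineage with fewer than two sampled taxa is not counted.
        if len(present) >= 2 and present in splits:
            hits += 1
    return hits


def enrich(gene_trees, lineages, min_lineages=3):
    """Keep only orthogroups whose gene tree recovers >= min_lineages lineages."""
    return {name: tree for name, tree in gene_trees.items()
            if recovered_lineages(tree, lineages) >= min_lineages}


# Hypothetical toy data: two taxa per lineage, two orthogroup gene trees.
lineages = {
    "Ctenophora": {"c1", "c2"},
    "Porifera": {"p1", "p2"},
    "Cnidaria": {"n1", "n2"},
    "Bilateria": {"b1", "b2"},
}
gene_trees = {
    "og1": ((("c1", "c2"), ("p1", "p2")), (("n1", "n2"), ("b1", "b2"))),
    "og2": ((("c1", "p1"), ("c2", "n1")), (("p2", "n2"), ("b1", "b2"))),
}
print(sorted(enrich(gene_trees, lineages)))
```

In this toy input, `og1` recovers all four lineages and is kept, while `og2` recovers only one and is dropped.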
Affiliation(s)
- Karen Siu-Ting
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, United Kingdom
- Christopher J Creevey
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, United Kingdom
11
Mulhair PO, McCarthy CGP, Siu-Ting K, Creevey CJ, O'Connell MJ. Filtering artifactual signal increases support for Xenacoelomorpha and Ambulacraria sister relationship in the animal tree of life. Curr Biol 2022; 32:5180-5188.e3. [PMID: 36356574 DOI: 10.1016/j.cub.2022.10.036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/13/2021] [Revised: 08/09/2022] [Accepted: 10/18/2022] [Indexed: 11/10/2022]
Abstract
Conflicting studies place a group of bilaterian invertebrates containing xenoturbellids and acoelomorphs, the Xenacoelomorpha, as either the primary emerging bilaterian phylum [1-6] or within Deuterostomia, sister to Ambulacraria [7-11]. Although their placement as sister to the rest of Bilateria supports relatively simple morphology in the ancestral bilaterian, their alternative placement within Deuterostomia suggests a morphologically complex ancestral bilaterian along with extensive loss of major phenotypic traits in the Xenacoelomorpha. Recent studies have questioned whether Deuterostomia should be considered monophyletic at all [10,12,13]. Hidden paralogy and poor phylogenetic signal present a major challenge for reconstructing species phylogenies [14-18]. Here, we assess whether these issues have contributed to the conflict over the placement of Xenacoelomorpha. We reanalyzed published datasets, enriching for orthogroups whose gene trees support well-resolved clans elsewhere in the animal tree [16]. We find that most genes in previously published datasets violate incontestable clans, suggesting that hidden paralogy and low phylogenetic signal affect the ability to reconstruct branching patterns at deep nodes in the animal tree. We demonstrate that removing orthogroups that cannot recapitulate incontestable relationships alters the final topology that is inferred, while simultaneously improving the fit of the model to the data. We discover increased, but ultimately not conclusive, support for the existence of Xenambulacraria in our set of filtered orthogroups. At a time when we are progressing toward sequencing all life on the planet, we argue that long-standing contentious issues in the tree of life will be resolved using smaller amounts of better quality data that can be modeled adequately [19].
Affiliation(s)
- Peter O Mulhair
- Computational and Molecular Evolutionary Biology Research Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham NG7 2RD, UK; Computational and Molecular Evolutionary Biology Research Group, School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK
- Charley G P McCarthy
- Computational and Molecular Evolutionary Biology Research Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham NG7 2RD, UK
- Karen Siu-Ting
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, UK
- Christopher J Creevey
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, UK
- Mary J O'Connell
- Computational and Molecular Evolutionary Biology Research Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham NG7 2RD, UK; Computational and Molecular Evolutionary Biology Research Group, School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK
12
Lu WX, Hu XY, Wang ZZ, Rao GY. Hyb-Seq provides new insights into the phylogeny and evolution of the Chrysanthemum zawadskii species complex in China. Cladistics 2022; 38:663-683. [PMID: 35766338 DOI: 10.1111/cla.12514] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/02/2021] [Revised: 06/08/2022] [Accepted: 06/09/2022] [Indexed: 02/06/2023] Open
Abstract
A species complex is an assemblage of closely related species with blurred boundaries, and from which species could arise from different speciation processes and/or a speciation continuum. Such a complex can provide an opportunity to investigate evolutionary mechanisms acting on speciation. The Chrysanthemum zawadskii species complex in China, a monophyletic group of Chrysanthemum, consists of seven species with considerable morphological variation, diverse habitats and different distribution patterns. Here, we used Hyb-Seq data to construct a well-resolved phylogeny of the C. zawadskii complex. Then, we performed comparative analyses of variation patterns in morphology, ecology and distribution to investigate the roles of geography and ecology in this complex's diversification. Lastly, we implemented divergence time estimation, species distribution modelling and ancestral area reconstruction to trace the evolutionary history of this complex. We concluded that the C. zawadskii complex originated in the Qinling-Daba mountains during the early Pliocene and then spread west and northward along the mountain ranges to northern China. During this process, geographical and ecological factors imposing different influences resulted in the current diversification and distribution patterns of this species complex, which is composed of both well-diverged species and diverging lineages on the path of speciation.
Affiliation(s)
- Wen-Xun Lu
- School of Life Sciences, Peking University, Beijing, China
- Xue-Ying Hu
- School of Life Sciences, Peking University, Beijing, China
- Zi-Zhao Wang
- School of Life Sciences, Peking University, Beijing, China
- Guang-Yuan Rao
- School of Life Sciences, Peking University, Beijing, China
13
Steenwyk JL, Goltz DC, Buida TJ, Li Y, Shen XX, Rokas A. OrthoSNAP: A tree splitting and pruning algorithm for retrieving single-copy orthologs from gene family trees. PLoS Biol 2022; 20:e3001827. [PMID: 36228036 PMCID: PMC9595520 DOI: 10.1371/journal.pbio.3001827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Revised: 10/25/2022] [Accepted: 09/13/2022] [Indexed: 11/19/2022] Open
Abstract
Molecular evolution studies, such as phylogenomic studies and genome-wide surveys of selection, often rely on gene families of single-copy orthologs (SC-OGs). Large gene families with multiple homologs in 1 or more species (a phenomenon observed among several important families of genes such as transporters and transcription factors) are often ignored because identifying and retrieving SC-OGs nested within them is challenging. To address this issue and increase the number of markers used in molecular evolution studies, we developed OrthoSNAP, software that uses a phylogenetic framework to simultaneously split gene families into SC-OGs and prune species-specific inparalogs. We term SC-OGs identified by OrthoSNAP as SNAP-OGs because they are identified using a splitting and pruning procedure analogous to snapping branches on a tree. From 415,129 orthologous groups of genes inferred across 7 eukaryotic phylogenomic datasets, we identified 9,821 SC-OGs; using OrthoSNAP on the remaining 405,308 orthologous groups of genes, we identified an additional 10,704 SNAP-OGs. Comparison of SNAP-OGs and SC-OGs revealed that their phylogenetic information content was similar, even in complex datasets that contain a whole-genome duplication, complex patterns of duplication and loss, transcriptome data where each gene typically has multiple transcripts, and contentious branches in the tree of life. OrthoSNAP is useful for increasing the number of markers used in molecular evolution data matrices, a critical step for robustly inferring and exploring the tree of life.
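The pruning half of the procedure described here (collapsing species-specific inparalogs so that a gene family becomes single-copy) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not OrthoSNAP's implementation: the nested-tuple tree encoding, the hypothetical `species|gene` leaf labels, and the choice of keeping the alphabetically first copy as the representative are all placeholders, and the tree-splitting step is not shown.

```python
from collections import Counter


def species(leaf):
    """Extract the species part of a hypothetical 'species|gene' label."""
    return leaf.split("|")[0]


def prune_inparalogs(tree):
    """Collapse every clade whose leaves all come from one species to one leaf."""
    if isinstance(tree, str):
        return tree
    children = [prune_inparalogs(child) for child in tree]
    all_leaves = all(isinstance(child, str) for child in children)
    if all_leaves and len({species(child) for child in children}) == 1:
        # Assumption: keep the alphabetically first copy as the representative.
        return min(children)
    return tuple(children)


def leaves(tree):
    """List the leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def is_single_copy(tree):
    """True if every species appears exactly once among the leaves."""
    counts = Counter(species(leaf) for leaf in leaves(tree))
    return all(n == 1 for n in counts.values())


# Hypothetical gene family: species A and C each carry a recent duplicate.
family = (("A|g1", "A|g2"), ("B|g1", ("C|g1", "C|g2")))
pruned = prune_inparalogs(family)
print(is_single_copy(family), is_single_copy(pruned))
print(pruned)
```

The toy family is multi-copy before pruning and single-copy afterwards, which is the situation in which a nested single-copy group becomes usable as a marker.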
Affiliation(s)
- Jacob L. Steenwyk
- Vanderbilt University, Department of Biological Sciences, Nashville, Tennessee, United States of America
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, Tennessee, United States of America
- * E-mail: (JLS); (AR)
- Dayna C. Goltz
- Independent Researcher, Nashville, Tennessee, United States of America
- Thomas J. Buida
- Independent Researcher, Nashville, Tennessee, United States of America
- Yuanning Li
- Vanderbilt University, Department of Biological Sciences, Nashville, Tennessee, United States of America
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, Tennessee, United States of America
- Institute of Marine Science and Technology, Shandong University, Qingdao, China
- Xing-Xing Shen
- Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Antonis Rokas
- Vanderbilt University, Department of Biological Sciences, Nashville, Tennessee, United States of America
- Vanderbilt Evolutionary Studies Initiative, Vanderbilt University, Nashville, Tennessee, United States of America
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- * E-mail: (JLS); (AR)
14
Zhang C, Mirarab S. Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees. Mol Biol Evol 2022; 39:6750035. [PMID: 36201617 PMCID: PMC9750496 DOI: 10.1093/molbev/msac215] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 02/08/2022] [Revised: 09/20/2022] [Accepted: 10/03/2022] [Indexed: 01/07/2023] Open
Abstract
Phylogenomic analyses routinely estimate species trees using methods that account for gene tree discordance. However, the most scalable species tree inference methods, which summarize independently inferred gene trees to obtain a species tree, are sensitive to hard-to-avoid errors introduced in the gene tree estimation step. This dilemma has created much debate on the merits of concatenation versus summary methods and practical obstacles to using summary methods more widely and to the exclusion of concatenation. The most successful attempt at making summary methods resilient to noisy gene trees has been contracting low support branches from the gene trees. Unfortunately, this approach requires arbitrary thresholds and poses new challenges. Here, we introduce threshold-free weighting schemes for the quartet-based species tree inference, the metric used in the popular method ASTRAL. By reducing the impact of quartets with low support or long terminal branches (or both), weighting provides stronger theoretical guarantees and better empirical performance than the unweighted ASTRAL. Our simulations show that weighting improves accuracy across many conditions and reduces the gap with concatenation in conditions with low gene tree discordance and high noise. On empirical data, weighting improves congruence with concatenation and increases support. Together, our results show that weighting, enabled by a new optimization algorithm we introduce, improves the utility of summary methods and can reduce the incongruence often observed across analytical pipelines.
Affiliation(s)
- Chao Zhang
- Bioinformatics and Systems Biology, UC San Diego, La Jolla, CA, USA
15
Zhang C, Mirarab S. ASTRAL-Pro 2: ultrafast species tree reconstruction from multi-copy gene family trees. Bioinformatics 2022; 38:4949-4950. [PMID: 36094339 DOI: 10.1093/bioinformatics/btac620] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/21/2022] [Revised: 09/03/2022] [Accepted: 09/09/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Species tree inference from multi-copy gene trees has long been a challenge in phylogenomics. The recent method ASTRAL-Pro has made large strides by enabling multi-copy gene family trees as input and has been quickly adopted. Yet, its scalability, especially memory usage, needs to improve to accommodate the ever-growing dataset size. RESULTS: We present ASTRAL-Pro 2, an ultrafast and memory-efficient version of ASTRAL-Pro that adopts a placement-based optimization algorithm for significantly better scalability without sacrificing accuracy. AVAILABILITY: The source code and binary files are publicly available at https://github.com/chaoszhang/ASTER; data are available at https://github.com/chaoszhang/A-Pro2_data. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chao Zhang
- Bioinformatics and Systems Biology, UC San Diego, La Jolla, CA, 92093, USA
- Siavash Mirarab
- Department of Electrical and Computer Engineering, UC San Diego, La Jolla, CA, 92093, USA
16
Lozano-Fernandez J. A Practical Guide to Design and Assess a Phylogenomic Study. Genome Biol Evol 2022; 14:evac129. [PMID: 35946263 PMCID: PMC9452790 DOI: 10.1093/gbe/evac129] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Accepted: 08/03/2022] [Indexed: 11/13/2022] Open
Abstract
Over the last decade, molecular systematics has undergone a change of paradigm as high-throughput sequencing now makes it possible to reconstruct evolutionary relationships using genome-scale datasets. The advent of "big data" molecular phylogenetics provided a battery of new tools for biologists but simultaneously brought new methodological challenges. The increase in analytical complexity comes at the price of highly specific training in computational biology and molecular phylogenetics, resulting very often in a polarized accumulation of knowledge (technical on one side and biological on the other). Interpreting the robustness of genome-scale phylogenetic studies is not straightforward, particularly as new methodological developments have consistently shown that the general belief of "more genes, more robustness" often does not apply, and because there is a range of systematic errors that plague phylogenomic investigations. This is particularly problematic because phylogenomic studies are highly heterogeneous in their methodology, and best practices are often not clearly defined. The main aim of this article is to present what I consider as the ten most important points to take into consideration when planning a well-thought-out phylogenomic study and while evaluating the quality of published papers. The goal is to provide a practical step-by-step guide that can be easily followed by nonexperts and phylogenomic novices in order to assess the technical robustness of phylogenomic studies or improve the experimental design of a project.
Affiliation(s)
- Jesus Lozano-Fernandez
- Department of Genetics, Microbiology and Statistics, Biodiversity Research Institute (IRBio), University of Barcelona, Avd. Diagonal 643, 08028 Barcelona, Spain
- Institute of Evolutionary Biology (CSIC – Universitat Pompeu Fabra), Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona, Spain
17
Ufimov R, Gorospe JM, Fér T, Kandziora M, Salomon L, van Loo M, Schmickl R. Utilizing paralogs for phylogenetic reconstruction has the potential to increase species tree support and reduce gene tree discordance in target enrichment data. Mol Ecol Resour 2022; 22:3018-3034. [PMID: 35796729 DOI: 10.1111/1755-0998.13684] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/13/2022] [Revised: 05/28/2022] [Accepted: 06/22/2022] [Indexed: 11/30/2022]
Abstract
The analysis of target enrichment data in phylogenetics lacks optimization toward using paralogs for phylogenetic reconstruction. We developed a novel approach of detecting paralogs and utilizing them for phylogenetic tree inference, by retrieving both ortho- and paralogous copies and creating orthologous alignments, from which the gene trees are built. We implemented this approach in ParalogWizard and demonstrate its performance in plant groups that underwent a whole genome duplication relatively recently: the subtribe Malinae (family Rosaceae), using Angiosperms353 as well as Malinae481 probes, the genus Oritrophium (family Asteraceae), using Compositae1061 probes, and the genus Amomum (family Zingiberaceae), using Zingiberaceae1180 probes. Discriminating between orthologs and paralogs reduced gene tree discordance and increased the species tree support in the case of the Malinae, but not for Oritrophium and Amomum. This may relate to the difference in the proportion of paralogous loci between the datasets, which was highest for the Malinae. Overall, retrieving paralogs for phylogenetic reconstruction following ParalogWizard has the potential to increase the species tree support and reduce gene tree discordance in target enrichment data, particularly if the proportion of paralogous loci is high.
Affiliation(s)
- Roman Ufimov
- Department of Forest Growth, Silviculture and Genetics, Austrian Research Centre for Forests, Seckendorff-Gudent-Weg 8, 1130, Vienna, Austria; Komarov Botanical Institute, Russian Academy of Sciences, ul. Prof. Popova 2, 197376, St. Petersburg, Russian Federation
- Juan Manuel Gorospe
- Institute of Botany, The Czech Academy of Sciences, Zámek 1, 252 43, Průhonice, Czech Republic; Department of Botany, Faculty of Science, Charles University, Benátská 2, 128 01, Prague, Czech Republic
- Tomáš Fér
- Department of Botany, Faculty of Science, Charles University, Benátská 2, 128 01, Prague, Czech Republic
- Martha Kandziora
- Department of Botany, Faculty of Science, Charles University, Benátská 2, 128 01, Prague, Czech Republic
- Luciana Salomon
- Department of Botany, Faculty of Science, Charles University, Benátská 2, 128 01, Prague, Czech Republic
- Marcela van Loo
- Department of Forest Growth, Silviculture and Genetics, Austrian Research Centre for Forests, Seckendorff-Gudent-Weg 8, 1130, Vienna, Austria
- Roswitha Schmickl
- Institute of Botany, The Czech Academy of Sciences, Zámek 1, 252 43, Průhonice, Czech Republic; Department of Botany, Faculty of Science, Charles University, Benátská 2, 128 01, Prague, Czech Republic
18
Koedooder C, Landou E, Zhang F, Wang S, Basu S, Berman-Frank I, Shaked Y, Rubin-Blum M. Metagenomes of Red Sea Subpopulations Challenge the Use of Marker Genes and Morphology to Assess Trichodesmium Diversity. Front Microbiol 2022; 13:879970. [PMID: 35707175 PMCID: PMC9189399 DOI: 10.3389/fmicb.2022.879970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2022] [Accepted: 04/22/2022] [Indexed: 11/13/2022] Open
Abstract
Trichodesmium are filamentous cyanobacteria of key interest due to their ability to fix carbon and nitrogen within an oligotrophic marine environment. Their blooms consist of a dynamic assemblage of subpopulations and colony morphologies that are hypothesized to occupy unique niches. Here, we assessed the poorly studied diversity of Trichodesmium in the Red Sea, based on metagenome-assembled genomes (MAGs) and hetR gene-based phylotyping. We assembled four non-redundant MAGs from morphologically distinct Trichodesmium colonies (tufts, dense and thin puffs). Trichodesmium thiebautii (puffs) and Trichodesmium erythraeum (tufts) were the dominant species within these morphotypes. While subspecies diversity is present for both T. thiebautii and T. erythraeum, a single T. thiebautii genotype comprised both thin and dense puff morphotypes, and we hypothesize that this phenotypic variation is likely attributable to gene regulation. Additionally, we found the rare non-diazotrophic clade IV and V genotypes, related to Trichodesmium nobis and Trichodesmium miru, respectively, that likely occurred as single filaments. The hetR gene phylogeny further indicated that the genotype in clade IV could represent the species Trichodesmium contortum. Importantly, we show the presence of hetR paralogs in Trichodesmium, where two copies of the hetR gene were present within T. thiebautii genomes. This may lead to the overestimation of Trichodesmium diversity, as one of the copies misidentified T. thiebautii as Trichodesmium aureum. Taken together, our results highlight the importance of re-assessing Trichodesmium taxonomy while showing the ability of genomics to capture the complex diversity and distribution of Trichodesmium populations.
Affiliation(s)
- Coco Koedooder
- The Fredy and Nadine Herrmann Institute of Earth Sciences, Hebrew University of Jerusalem, Jerusalem, Israel
- The Interuniversity Institute for Marine Sciences in Eilat, Eilat, Israel
- Israel Oceanographic and Limnological Research, Haifa, Israel
- *Correspondence: Coco Koedooder
- Etai Landou
- Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel
- Futing Zhang
- The Fredy and Nadine Herrmann Institute of Earth Sciences, Hebrew University of Jerusalem, Jerusalem, Israel
- The Interuniversity Institute for Marine Sciences in Eilat, Eilat, Israel
- Siyuan Wang
- The Fredy and Nadine Herrmann Institute of Earth Sciences, Hebrew University of Jerusalem, Jerusalem, Israel
- The Interuniversity Institute for Marine Sciences in Eilat, Eilat, Israel
- Subhajit Basu
- The Fredy and Nadine Herrmann Institute of Earth Sciences, Hebrew University of Jerusalem, Jerusalem, Israel
- The Interuniversity Institute for Marine Sciences in Eilat, Eilat, Israel
- Microsensor Research Group, Max Planck Institute for Marine Microbiology, Bremen, Germany
- Ilana Berman-Frank
- Department of Marine Biology, Leon H. Charney School of Marine Sciences, University of Haifa, Haifa, Israel
- Yeala Shaked
- The Fredy and Nadine Herrmann Institute of Earth Sciences, Hebrew University of Jerusalem, Jerusalem, Israel
- The Interuniversity Institute for Marine Sciences in Eilat, Eilat, Israel
19
Willson J, Roddur MS, Liu B, Zaharias P, Warnow T. DISCO: Species Tree Inference using Multicopy Gene Family Tree Decomposition. Syst Biol 2022; 71:610-629. [PMID: 34450658 PMCID: PMC9016570 DOI: 10.1093/sysbio/syab070] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/24/2021] [Revised: 08/18/2021] [Accepted: 08/23/2021] [Indexed: 11/21/2022] Open
Abstract
Species tree inference from gene family trees is a significant problem in computational biology. However, gene tree heterogeneity, which can be caused by several factors including gene duplication and loss, makes the estimation of species trees very challenging. While there have been several species tree estimation methods introduced in recent years to specifically address gene tree heterogeneity due to gene duplication and loss (such as DupTree, FastMulRFS, ASTRAL-Pro, and SpeciesRax), many incur high cost in terms of both running time and memory. We introduce a new approach, DISCO, that decomposes the multi-copy gene family trees into many single-copy trees, which allows for methods previously designed for species tree inference in a single-copy gene tree context to be used. We prove that using DISCO with ASTRAL (i.e., ASTRAL-DISCO) is statistically consistent under the GDL model, provided that ASTRAL-Pro correctly roots and tags each gene family tree. We evaluate DISCO paired with different methods for estimating species trees from single-copy genes (e.g., ASTRAL, ASTRID, and IQ-TREE) under a wide range of model conditions, and establish that high accuracy can be obtained even when ASTRAL-Pro is not able to correctly root and tag the gene family trees. We also compare results using MI, an alternative decomposition strategy from Yang Y. and Smith S.A. (2014), and find that DISCO provides better accuracy, most likely as a result of covering more of the gene family tree leafset in the output decomposition. [Concatenation analysis; gene duplication and loss; species tree inference; summary method.]
Affiliation(s)
- James Willson
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Mrinmoy Saha Roddur
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Baqiao Liu
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Paul Zaharias
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
20
Masonick P, Meyer A, Hulsey CD. Phylogenomic analyses show repeated evolution of hypertrophied lips among Lake Malawi cichlid fishes. Genome Biol Evol 2022; 14:6568296. [PMID: 35417557 PMCID: PMC9017819 DOI: 10.1093/gbe/evac051] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Accepted: 04/03/2022] [Indexed: 11/27/2022] Open
Abstract
Cichlid fishes have repeatedly evolved an astounding diversity of trophic morphologies. For example, hypertrophied lips have evolved multiple times in both African and Neotropical cichlids and could have even evolved convergently within single species assemblages such as African Lake Malawi cichlids. However, the extremely high diversification rate in Lake Malawi cichlids and extensive potential for hybridization have cast doubt on whether even genome-level phylogenetic reconstructions could delineate if these types of adaptations have evolved once or multiple times. To examine the evolution of this iconic trait using protein-coding and noncoding single nucleotide polymorphisms (SNPs), we analyzed the genomes of 86 Lake Malawi cichlid species, including 33 de novo resequenced genomes. Surprisingly, genome-wide protein-coding SNPs exhibited enough phylogenetic informativeness to reconstruct interspecific and intraspecific relationships of hypertrophied lip cichlids, although noncoding SNPs provided better support. However, thinning of noncoding SNPs indicated most discrepancies come from the relatively smaller number of protein-coding sites and not from fundamental differences in their phylogenetic informativeness. Both coding and noncoding reconstructions showed that several “sand-dwelling” hypertrophied lip species, sampled intraspecifically, form a clade interspersed with a few other nonhypertrophied lip lineages. We also recovered Abactochromis labrosus within the rock-dwelling “mbuna” lineage, starkly contrasting with the affinities of other hypertrophied lip taxa found in the largely sand-dwelling “nonmbuna” component of this radiation. Comparative analyses coupled with tests for introgression indicate there is no widespread introgression between the hypertrophied lip lineages and, taken together, suggest this trophic phenotype has likely evolved at least twice independently within Lake Malawi.
Affiliation(s)
- Paul Masonick
- Department of Biology, University of Konstanz, Universitätsstraße 10, 78464 Konstanz, Germany
- Axel Meyer
- Department of Biology, University of Konstanz, Universitätsstraße 10, 78464 Konstanz, Germany
- C Darrin Hulsey
- Department of Biology, University of Konstanz, Universitätsstraße 10, 78464 Konstanz, Germany; Current address: School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
21
Gable SM, Byars MI, Literman R, Tollis M. A Genomic Perspective on the Evolutionary Diversification of Turtles. Syst Biol 2022; 71:1331-1347. [DOI: 10.1093/sysbio/syac019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/19/2021] [Revised: 02/28/2022] [Accepted: 03/01/2022] [Indexed: 11/12/2022] Open
Abstract
To examine phylogenetic heterogeneity in turtle evolution, we collected thousands of high-confidence single-copy orthologs from 19 genome assemblies representative of extant turtle diversity and estimated a phylogeny with multispecies coalescent and concatenated partitioned methods. We also collected next-generation sequences from 26 turtle species and assembled millions of biallelic markers to reconstruct phylogenies based on annotated regions from the western painted turtle (Chrysemys picta bellii) genome (coding regions, introns, untranslated regions, intergenic, and others). We then measured gene tree-species tree discordance, as well as gene and site heterogeneity at each node in the inferred trees, and tested for temporal patterns in phylogenomic conflict across turtle evolution. We found strong and consistent support for all bifurcations in the inferred turtle species phylogenies. However, a number of genes, sites, and genomic features supported alternate relationships between turtle taxa. Our results suggest that gene tree-species tree discordance in these datasets is likely driven by population-level processes such as incomplete lineage sorting. We found very little effect of substitutional saturation on species tree topologies, and no clear phylogenetic patterns in codon usage bias and compositional heterogeneity. There was no correlation between gene and site concordance, node age, and DNA substitution rate across most annotated genomic regions. Our study demonstrates that heterogeneity is to be expected even in well resolved clades such as turtles, and that future phylogenomic studies should aim to sample as much of the genome as possible in order to obtain accurate phylogenies for assessing conservation priorities in turtles.
Affiliation(s)
- Simone M Gable
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, PO Box 5693, Flagstaff, AZ 8601, USA
- Michael I Byars
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, PO Box 5693, Flagstaff, AZ 8601, USA
- Robert Literman
  - Department of Biological Sciences, University of Rhode Island, 120 Flagg Road, Kingstown, RI, 0288, USA
- Marc Tollis
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, PO Box 5693, Flagstaff, AZ 8601, USA
|
22
|
Smith ML, Hahn MW. The Frequency and Topology of Pseudoorthologs. Syst Biol 2021; 71:649-659. [PMID: 34951639 DOI: 10.1093/sysbio/syab097] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/10/2021] [Revised: 12/15/2021] [Accepted: 12/17/2021] [Indexed: 11/12/2022]
Abstract
Phylogenetics has long relied on the use of orthologs, or genes related through speciation events, to infer species relationships. However, identifying orthologs is difficult because gene duplication can obscure relationships among genes. Researchers have been particularly concerned with the insidious effects of pseudoorthologs, duplicated genes that are mistaken for orthologs because they are present in a single copy in each sampled species. Because gene tree topologies of pseudoorthologs may differ from the species tree topology, they have often been invoked as the cause of counterintuitive results in phylogenetics. Despite these perceived problems, no previous work has calculated the probabilities of pseudoortholog topologies, or has been able to circumscribe the regions of parameter space in which pseudoorthologs are most likely to occur. Here, we introduce a model for calculating the probabilities and branch lengths of orthologs and pseudoorthologs, including concordant and discordant pseudoortholog topologies, on a rooted three-taxon species tree. We show that the probability of orthologs is high relative to the probability of pseudoorthologs across reasonable regions of parameter space. Furthermore, the probabilities of the two discordant topologies are equal and never exceed that of the concordant topology, generally being much lower. We describe the species tree topologies most prone to generating pseudoorthologs, finding that they are likely to present problems to phylogenetic inference irrespective of the presence of pseudoorthologs. Overall, our results suggest that pseudoorthologs are unlikely to mislead inferences of species relationships under the biological scenarios considered here.
Affiliation(s)
- Megan L Smith
  - Department of Biology and Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Matthew W Hahn
  - Department of Biology and Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
|
23
|
Morales-Briones DF, Gehrke B, Huang CH, Liston A, Ma H, Marx HE, Tank DC, Yang Y. Analysis of Paralogs in Target Enrichment Data Pinpoints Multiple Ancient Polyploidy Events in Alchemilla s.l. (Rosaceae). Syst Biol 2021; 71:190-207. [PMID: 33978764 PMCID: PMC8677558 DOI: 10.1093/sysbio/syab032] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 09/01/2020] [Revised: 04/28/2021] [Accepted: 05/03/2021] [Indexed: 12/16/2022]
Abstract
Target enrichment is becoming increasingly popular for phylogenomic studies. Although baits for enrichment are typically designed to target single-copy genes, paralogs are often recovered with increased sequencing depth, sometimes from a significant proportion of loci, especially in groups experiencing whole-genome duplication (WGD) events. Common approaches for processing paralogs in target enrichment data sets include random selection, manual pruning, and mainly, the removal of entire genes that show any evidence of paralogy. These approaches are prone to errors in orthology inference or removing large numbers of genes. By removing entire genes, valuable information that could be used to detect and place WGD events is discarded. Here, we used an automated approach for orthology inference in a target enrichment data set of 68 species of Alchemilla s.l. (Rosaceae), a widely distributed clade of plants primarily from temperate climate regions. Previous molecular phylogenetic studies and chromosome numbers both suggested ancient WGDs in the group. However, both the phylogenetic location and putative parental lineages of these WGD events remain unknown. By taking paralogs into consideration and inferring orthologs from target enrichment data, we identified four nodes in the backbone of Alchemilla s.l. with an elevated proportion of gene duplication. Furthermore, using a gene-tree reconciliation approach, we established the autopolyploid origin of the entire Alchemilla s.l. and the nested allopolyploid origin of four major clades within the group. Here, we showed the utility of automated tree-based orthology inference methods, previously designed for genomic or transcriptomic data sets, to study complex scenarios of polyploidy and reticulate evolution from target enrichment data sets. [Alchemilla; allopolyploidy; autopolyploidy; gene tree discordance; orthology inference; paralogs; Rosaceae; target enrichment; whole genome duplication.]
Affiliation(s)
- Diego F Morales-Briones
  - Department of Plant and Microbial Biology, University of Minnesota-Twin Cities, 1445 Gortner Avenue, St. Paul, MN 55108, USA
  - Department of Biological Sciences and Institute for Bioinformatics and Evolutionary Studies, University of Idaho, 875 Perimeter Drive MS 3051, Moscow, ID 83844, USA
- Berit Gehrke
  - University Gardens, University Museum, University of Bergen, Mildeveien 240, 5259 Hjellestad, Norway
- Chien-Hsun Huang
  - State Key Laboratory of Genetic Engineering and Collaborative Innovation Center of Genetics and Development, Ministry of Education Key Laboratory of Biodiversity and Ecological Engineering, Institute of Plant Biology, Center of Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
- Aaron Liston
  - Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331, USA
- Hong Ma
  - Department of Biology, the Huck Institute of the Life Sciences, the Pennsylvania State University, 510D Mueller Laboratory, University Park, PA 16802, USA
- Hannah E Marx
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109-1048, USA
  - Museum of Southwestern Biology and Department of Biology, University of New Mexico, Albuquerque, NM 87131, USA
- David C Tank
  - Department of Biological Sciences and Institute for Bioinformatics and Evolutionary Studies, University of Idaho, 875 Perimeter Drive MS 3051, Moscow, ID 83844, USA
- Ya Yang
  - Department of Plant and Microbial Biology, University of Minnesota-Twin Cities, 1445 Gortner Avenue, St. Paul, MN 55108, USA
|
24
|
Mirarab S, Nakhleh L, Warnow T. Multispecies Coalescent: Theory and Applications in Phylogenetics. Annual Review of Ecology, Evolution, and Systematics 2021. [DOI: 10.1146/annurev-ecolsys-012121-095340] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/09/2022]
Abstract
Species tree estimation is a basic part of many biological research projects, ranging from answering basic evolutionary questions (e.g., how did a group of species adapt to their environments?) to addressing questions in functional biology. Yet, species tree estimation is very challenging, due to processes such as incomplete lineage sorting, gene duplication and loss, horizontal gene transfer, and hybridization, which can make gene trees differ from each other and from the overall evolutionary history of the species. Over the last 10–20 years, there has been tremendous growth in methods and mathematical theory for estimating species trees and phylogenetic networks, and some of these methods are now in wide use. In this survey, we provide an overview of the current state of the art, identify the limitations of existing methods and theory, and propose additional research problems and directions.
Affiliation(s)
- Siavash Mirarab
  - Electrical and Computer Engineering Department, University of California, San Diego, La Jolla, California 92093, USA
- Luay Nakhleh
  - Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Tandy Warnow
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
|
25
|
Chafin TK, Regmi B, Douglas MR, Edds DR, Wangchuk K, Dorji S, Norbu P, Norbu S, Changlu C, Khanal GP, Tshering S, Douglas ME. Parallel introgression, not recurrent emergence, explains apparent elevational ecotypes of polyploid Himalayan snowtrout. Royal Society Open Science 2021; 8:210727. [PMID: 34729207 PMCID: PMC8548808 DOI: 10.1098/rsos.210727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2021] [Accepted: 10/01/2021] [Indexed: 06/13/2023]
Abstract
The recurrence of similar evolutionary patterns within different habitats often reflects parallel selective pressures acting upon either standing or independently occurring genetic variation to produce a convergence of phenotypes. This interpretation (i.e. parallel divergences within adjacent streams) has been hypothesized for drainage-specific morphological 'ecotypes' observed in polyploid snowtrout (Cyprinidae: Schizothorax). However, parallel patterns of differential introgression during secondary contact are a viable alternative hypothesis. Here, we used ddRADseq (N = 35 319 de novo and N = 10 884 transcriptome-aligned SNPs), as derived from Nepali/Bhutanese samples (N = 48 each), to test these competing hypotheses. We first employed genome-wide allelic depths to derive appropriate ploidy models, then a Bayesian approach to yield genotypes statistically consistent under the inferred expectations. Elevational 'ecotypes' were consistent in geometric morphometric space, but with phylogenetic relationships at the drainage level, sustaining a hypothesis of independent emergence. However, partitioned analyses of phylogeny and admixture identified subsets of loci under selection that retained genealogical concordance with morphology, suggesting instead that apparent patterns of morphological/phylogenetic discordance are driven by widespread genomic homogenization. Here, admixture occurring in secondary contact effectively 'masks' previous isolation. Our results underscore two salient factors: (i) morphological adaptations are retained despite hybridization and (ii) the degree of admixture varies across tributaries, presumably concomitant with underlying environmental or anthropogenic factors.
Affiliation(s)
- Tyler K. Chafin
  - Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA
  - Department of Ecology and Evolutionary Biology, University of Colorado, Boulder 80309, USA
- Binod Regmi
  - Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA
  - National Institute of Arthritis, Musculoskeletal and Skin Diseases (NIAMS), National Institutes of Health, Bethesda, MD 20892, USA
- Marlis R. Douglas
  - Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA
- David R. Edds
  - Department of Biological Sciences, Emporia State University, Emporia, KS 66801, USA
- Karma Wangchuk
  - Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA
  - National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
- Sonam Dorji
  - National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
- Pema Norbu
  - National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
- Sangay Norbu
  - National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
- Changlu Changlu
  - National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
- Gopal Prasad Khanal
  - National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
- Singye Tshering
  - National Research and Development Centre for Riverine and Lake Fisheries, Ministry of Agriculture and Forests, Royal Government of Bhutan, Haa, Bhutan
- Michael E. Douglas
  - Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA
|
26
|
Guinand B, Oral M, Tougard C. Brown trout phylogenetics: A persistent mirage towards (too) many species. Journal of Fish Biology 2021; 99:298-307. [PMID: 33483952 DOI: 10.1111/jfb.14686] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/20/2020] [Revised: 12/28/2020] [Accepted: 01/19/2021] [Indexed: 06/12/2023]
Affiliation(s)
- Bruno Guinand
  - ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Münevver Oral
  - Faculty of Fisheries and Aquatic Science, Recep Tayyip Erdogan University, Rize, Turkey
|
27
|
Yan Z, Smith ML, Du P, Hahn MW, Nakhleh L. Species Tree Inference Methods Intended to Deal with Incomplete Lineage Sorting Are Robust to the Presence of Paralogs. Syst Biol 2021; 71:367-381. [PMID: 34245291 PMCID: PMC8978208 DOI: 10.1093/sysbio/syab056] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/24/2020] [Revised: 06/23/2021] [Accepted: 06/30/2021] [Indexed: 11/24/2022]
Abstract
Many recent phylogenetic methods have focused on accurately inferring species trees when there is gene tree discordance due to incomplete lineage sorting (ILS). For almost all of these methods, and for phylogenetic methods in general, the data for each locus are assumed to consist of orthologous, single-copy sequences. Loci that are present in more than a single copy in any of the studied genomes are excluded from the data. These steps greatly reduce the number of loci available for analysis. The question we seek to answer in this study is: what happens if one runs such species tree inference methods on data where paralogy is present, in addition to or without ILS being present? Through simulation studies and analyses of two large biological data sets, we show that running such methods on data with paralogs can still provide accurate results. We use multiple different methods, some of which are based directly on the multispecies coalescent model, and some of which have been proven to be statistically consistent under it. We also treat the paralogous loci in multiple ways: from explicitly denoting them as paralogs, to randomly selecting one copy per species. In all cases, the inferred species trees are as accurate as equivalent analyses using single-copy orthologs. Our results have significant implications for the use of ILS-aware phylogenomic analyses, demonstrating that they do not have to be restricted to single-copy loci. This will greatly increase the amount of data that can be used for phylogenetic inference. [Gene duplication and loss; incomplete lineage sorting; multispecies coalescent; orthology; paralogy.]
Affiliation(s)
- Zhi Yan
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Megan L Smith
  - Department of Biology and Department of Computer Science, Indiana University, 1001 East Third Street, Bloomington, IN 47405, USA
- Peng Du
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Matthew W Hahn
  - Department of Biology and Department of Computer Science, Indiana University, 1001 East Third Street, Bloomington, IN 47405, USA
- Luay Nakhleh
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
  - Department of BioSciences, Rice University, 6100 Main Street, Houston, TX 77005, USA
|
28
|
Ufimov R, Zeisek V, Píšová S, Baker WJ, Fér T, van Loo M, Dobeš C, Schmickl R. Relative performance of customized and universal probe sets in target enrichment: A case study in subtribe Malinae. Applications in Plant Sciences 2021; 9:e11442. [PMID: 34336405 PMCID: PMC8312748 DOI: 10.1002/aps3.11442] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/07/2020] [Accepted: 04/09/2021] [Indexed: 05/10/2023]
Abstract
PREMISE: Custom probe design for target enrichment in phylogenetics is tedious and often hinders broader phylogenetic synthesis. The universal angiosperm probe set Angiosperms353 may be the solution. Here, we test the relative performance of Angiosperms353 on the Rosaceae subtribe Malinae in comparison with custom probes that we specifically designed for this clade. We then address the impact of bioinformatically altering the performance of Angiosperms353 by replacing the original probe sequences with orthologs extracted from the Malus domestica genome.
METHODS: To evaluate the relative performance of these probe sets, we compared the enrichment efficiency, locus recovery, alignment length, proportion of parsimony-informative sites, proportion of potential paralogs, the topology and support of the resulting species trees, and the gene tree discordance.
RESULTS: Locus recovery was highest for our custom Malinae probe set, and replacing the original Angiosperms353 sequences with a Malus representative improved the locus recovery relative to Angiosperms353. The proportion of parsimony-informative sites was similar between all probe sets, while the gene tree discordance was lower in the case of the custom probes.
DISCUSSION: A custom probe set benefits from data completeness and can be tailored toward the specificities of the project of choice; however, Angiosperms353 was equally as phylogenetically informative as the custom probes. We therefore recommend using both a custom probe set and Angiosperms353 to facilitate large-scale systematic studies, where financially possible.
Affiliation(s)
- Roman Ufimov
  - Department of Forest Growth, Silviculture and Genetics, Austrian Research Centre for Forests, Seckendorff‐Gudent‐Weg 8, Vienna 1130, Austria
  - Komarov Botanical Institute, Russian Academy of Sciences, ul. Prof. Popova 2, St. Petersburg 197376, Russian Federation
- Vojtěch Zeisek
  - Institute of Botany, The Czech Academy of Sciences, Zámek 1, Průhonice 252 43, Czech Republic
  - Department of Botany, Faculty of Science, Charles University, Benátská 2, Prague 128 01, Czech Republic
- Soňa Píšová
  - Department of Forest Growth, Silviculture and Genetics, Austrian Research Centre for Forests, Seckendorff‐Gudent‐Weg 8, Vienna 1130, Austria
  - Institute of Botany, The Czech Academy of Sciences, Zámek 1, Průhonice 252 43, Czech Republic
- Tomáš Fér
  - Department of Botany, Faculty of Science, Charles University, Benátská 2, Prague 128 01, Czech Republic
- Marcela van Loo
  - Department of Forest Growth, Silviculture and Genetics, Austrian Research Centre for Forests, Seckendorff‐Gudent‐Weg 8, Vienna 1130, Austria
- Christoph Dobeš
  - Department of Forest Growth, Silviculture and Genetics, Austrian Research Centre for Forests, Seckendorff‐Gudent‐Weg 8, Vienna 1130, Austria
- Roswitha Schmickl
  - Institute of Botany, The Czech Academy of Sciences, Zámek 1, Průhonice 252 43, Czech Republic
  - Department of Botany, Faculty of Science, Charles University, Benátská 2, Prague 128 01, Czech Republic
|
29
|
Ottenlips MV, Mansfield DH, Buerki S, Feist MAE, Downie SR, Dodsworth S, Forest F, Plunkett GM, Smith JF. Resolving species boundaries in a recent radiation with the Angiosperms353 probe set: the Lomatium packardiae/L. anomalum clade of the L. triternatum (Apiaceae) complex. American Journal of Botany 2021; 108:1217-1233. [PMID: 34105148 PMCID: PMC8362113 DOI: 10.1002/ajb2.1676] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/01/2020] [Accepted: 02/26/2021] [Indexed: 05/29/2023]
Abstract
PREMISE: Speciation not associated with morphological shifts is challenging to detect unless molecular data are employed. Using Sanger-sequencing approaches, the Lomatium packardiae/L. anomalum subcomplex within the larger Lomatium triternatum complex could not be resolved. Therefore, we attempt to resolve these boundaries here.
METHODS: The Angiosperms353 probe set was employed to resolve the ambiguity within Lomatium triternatum species complex using 48 accessions assigned to L. packardiae, L. anomalum, or L. triternatum. In addition to exon data, 54 nuclear introns were extracted and were complete for all samples. Three approaches were used to estimate evolutionary relationships and define species boundaries: STACEY, a Bayesian coalescent-based species tree analysis that takes incomplete lineage sorting into account; ASTRAL-III, another coalescent-based species tree analysis; and a concatenated approach using MrBayes. Climatic factors, morphological characters, and soil variables were measured and analyzed to provide additional support for recovered groups.
RESULTS: The STACEY analysis recovered three major clades and seven subclades, all of which are geographically structured, and some correspond to previously named taxa. No other analysis had full agreement between recovered clades and other parameters. Climatic niche and leaflet width and length provide some predictive ability for the major clades.
CONCLUSIONS: The results suggest that these groups are in the process of incipient speciation and incomplete lineage sorting has been a major barrier to resolving boundaries within this lineage previously. These results are hypothesized through sequencing of multiple loci and analyzing data using coalescent-based processes.
Affiliation(s)
- Sven Buerki
  - Department of Biological Sciences, Boise State University, Boise, ID 83725, USA
- Stephen R. Downie
  - Department of Plant Biology, University of Illinois at Urbana‐Champaign, Urbana, IL 61801, USA
- Steven Dodsworth
  - Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK
  - School of Life Sciences, University of Bedfordshire, Luton LU1 3JU, UK
- Félix Forest
  - Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK
- Gregory M. Plunkett
  - Cullman Program for Molecular Systematics, New York Botanical Garden, 2900 Southern Boulevard, Bronx, NY 10458, USA
- James F. Smith
  - Department of Biological Sciences, Boise State University, Boise, ID 83725, USA
|
30
|
Singh S, Singh A. A prescient evolutionary model for genesis, duplication and differentiation of MIR160 homologs in Brassicaceae. Mol Genet Genomics 2021; 296:985-1003. [PMID: 34052911 DOI: 10.1007/s00438-021-01797-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/03/2020] [Accepted: 05/21/2021] [Indexed: 12/18/2022]
Abstract
MicroRNA160 is a class of nitrogen-starvation responsive genes which governs establishment of root system architecture by down-regulating AUXIN RESPONSE FACTOR genes (ARF10, ARF16 and ARF17) in plants. The high copy number of MIR160 variants discovered by us from land plants, especially polyploid crop Brassicas, posed questions regarding genesis, duplication, evolution and function. Absence of studies on impact of whole genome and segmental duplication on retention and evolution of MIR160 homologs in descendent plant lineages prompted us to undertake the current study. Herein, we describe ancestry and fate of MIR160 homologs in Brassicaceae in context of polyploidy driven genome re-organization, copy number and differentiation. Paralogy amongst Brassicaceae MIR160a, MIR160b and MIR160c was inferred using phylogenetic analysis of 468 MIR160 homologs from land plants. The evolutionarily distinct MIR160a was found to represent ancestral form and progenitor of MIR160b and MIR160c. Chronology of evolutionary events resulting in origin and diversification of genomic loci containing MIR160 homologs was delineated using derivatives of comparative synteny. A prescient model for causality of segmental duplications in establishment of paralogy in Brassicaceae MIR160, with whole genome duplication accentuating the copy number increase, is being posited in which post-segmental duplication events viz. differential gene fractionation, gene duplications and inversions are shown to drive divergence of chromosome segments. While mutations caused the diversification of MIR160a, MIR160b and MIR160c, duplicated segments containing these diversified genes suffered gene rearrangements via gene loss, duplications and inversions. Yet the topology of phylogenetic and phenetic trees were found congruent suggesting similar evolutionary trajectory. 
Over 80% of Brassicaceae genomes and subgenomes showed a preferential retention of single copy each of MIR160a, MIR160b and MIR160c suggesting functional relevance. Thus, our study provides a blue-print for reconstructing ancestry and phylogeny of MIRNA gene families at genomics level and analyzing the impact of polyploidy on organismal complexity. Such studies are critical for understanding the molecular basis of agronomic traits and deploying appropriate candidates for crop improvement.
Affiliation(s)
- Swati Singh
  - Department of Biotechnology, TERI School of Advanced Studies, 10 Institutional Area, Vasant Kunj, New Delhi, 110070, India
  - Department of Life Sciences, School of Basic Sciences and Research, Sharda University, Plot no. 32-34, Knowledge Park III, Greater Noida, Uttar Pradesh, 201310, India
- Anandita Singh
  - Department of Biotechnology, TERI School of Advanced Studies, 10 Institutional Area, Vasant Kunj, New Delhi, 110070, India
|
31
|
Matsumoto H, Mimori T, Fukunaga T. Novel metric for hyperbolic phylogenetic tree embeddings. Biol Methods Protoc 2021; 6:bpab006. [PMID: 33928190 PMCID: PMC8058397 DOI: 10.1093/biomethods/bpab006] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/11/2020] [Revised: 03/19/2021] [Accepted: 03/23/2021] [Indexed: 01/09/2023]
Abstract
Advances in experimental technologies, such as DNA sequencing, have opened up new avenues for the applications of phylogenetic methods to various fields beyond their traditional application in evolutionary investigations, extending to the fields of development, differentiation, cancer genomics, and immunogenomics. Thus, the importance of phylogenetic methods is increasingly being recognized, and the development of a novel phylogenetic approach can contribute to several areas of research. Recently, the use of hyperbolic geometry has attracted attention in artificial intelligence research. Hyperbolic space can better represent a hierarchical structure compared to Euclidean space, and can therefore be useful for describing and analyzing a phylogenetic tree. In this study, we developed a novel metric that considers the characteristics of a phylogenetic tree for representation in hyperbolic space. We compared the performance of the proposed hyperbolic embeddings, general hyperbolic embeddings, and Euclidean embeddings, and confirmed that our method could be used to more precisely reconstruct evolutionary distance. We also demonstrate that our approach is useful for predicting the nearest-neighbor node in a partial phylogenetic tree with missing nodes. Furthermore, we proposed a novel approach based on our metric to integrate multiple trees for analyzing tree nodes or imputing missing distances. This study highlights the utility of adopting a geometric approach for further advancing the applications of phylogenetic methods.
Affiliation(s)
- Hirotaka Matsumoto
  - School of Information and Data Sciences, Nagasaki University, Nagasaki, Japan
  - Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, Saitama, Japan
- Takahiro Mimori
  - Medical Image Analysis Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Tsukasa Fukunaga
  - Department of Computer Science, Graduate School of Information Science and Engineering, The University of Tokyo, Tokyo, Japan
|