1
Genome Evolution and the Future of Phylogenomics of Non-Avian Reptiles. Animals (Basel) 2023; 13:ani13030471. [PMID: 36766360 PMCID: PMC9913427 DOI: 10.3390/ani13030471] [Received: 12/16/2022] [Revised: 01/13/2023] [Accepted: 01/15/2023]
Abstract
Non-avian reptiles comprise a large proportion of amniote vertebrate diversity, with squamate reptiles (lizards and snakes) recently overtaking birds as the most species-rich tetrapod radiation. Despite their extraordinary diversity of phenotypic and genomic traits, non-avian reptiles have accumulated genomic resources more slowly than mammals and birds, the remaining amniotes. Here we review the remarkable natural history of non-avian reptiles, with a focus on the physical traits, genomic characteristics, and sequence compositional patterns that comprise key axes of variation across amniotes. We argue that the high evolutionary diversity of non-avian reptiles can fuel a new generation of whole-genome phylogenomic analyses. A survey of phylogenetic investigations in non-avian reptiles shows that sequence capture-based approaches are the most commonly used, with studies of markers known as ultraconserved elements (UCEs) especially well represented. However, many other types of markers exist and are increasingly being mined from genome assemblies in silico, including some with greater information potential than UCEs for certain investigations. We discuss the importance of high-quality genomic resources and methods for bioinformatically extracting a range of marker sets from genome assemblies. Finally, we encourage herpetologists working in genomics, genetics, evolutionary biology, and other fields to work collectively towards building genomic resources for non-avian reptiles, especially squamates, that rival those already in place for mammals and birds. Overall, the development of a cross-amniote phylogenomic tree of life will help illuminate interesting dimensions of biodiversity across non-avian reptiles and amniotes more broadly.
2
Schull JK, Turakhia Y, Hemker JA, Dally WJ, Bejerano G. OUP accepted manuscript. Genome Biol Evol 2022; 14:6529394. [PMID: 35171243 PMCID: PMC8920512 DOI: 10.1093/gbe/evac013] [Accepted: 01/10/2022]
Abstract
We present Champagne, a whole-genome method for generating character matrices for phylogenomic analysis using large genomic indel events. By rigorously selecting orthologous genes and locating large insertion and deletion events, Champagne delivers a character matrix with considerably less homoplasy than morphological and nucleotide-based matrices, both on established phylogenies and on difficult-to-resolve nodes in the mammalian tree. Champagne provides ample evidence, in the form of genomic structural variation, for incomplete lineage sorting and possible introgression in Paenungulata and in the human–chimp–gorilla clade, which were previously inferred primarily from matrices of aligned single-nucleotide characters. Champagne also offers further evidence for Myomorpha as sister to Sciuridae and Hystricomorpha in the rodent tree. As an automated method that produces nearly homoplasy-free character matrices on a whole-genome scale, Champagne has distinct theoretical advantages.
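To illustrate the general idea of an indel-based character matrix, the sketch below encodes each indel event as a binary presence/absence character and flags the parsimony-informative ones. It is not Champagne's implementation: it assumes indels have already been called and matched across species as shared event IDs (a hypothetical input format), and it omits the ortholog selection and large-event filtering described above.

```python
from typing import Dict, List, Set, Tuple


def indel_character_matrix(
    calls: Dict[str, Set[str]],
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Encode each indel event as a binary character (1 = present, 0 = absent).

    `calls` maps a species name to the set of indel event IDs observed in it
    (a hypothetical, pre-matched input format used only for illustration).
    """
    events = sorted(set().union(*calls.values()))
    matrix = {
        species: [1 if event in present else 0 for event in events]
        for species, present in sorted(calls.items())
    }
    return events, matrix


def parsimony_informative(events: List[str], matrix: Dict[str, List[int]]) -> List[str]:
    """Keep characters where each state (0 and 1) occurs in at least two taxa."""
    keep = []
    for j, event in enumerate(events):
        column = [row[j] for row in matrix.values()]
        if min(column.count(0), column.count(1)) >= 2:
            keep.append(event)
    return keep


calls = {
    "human": {"del1", "ins2"},
    "chimp": {"del1", "ins2"},
    "gorilla": {"del1"},
    "orangutan": set(),
}
events, matrix = indel_character_matrix(calls)
print(events)                                 # ['del1', 'ins2']
print(parsimony_informative(events, matrix))  # ['ins2']
```

Large indels are attractive as characters because independent, identical insertions or deletions at the same site are rare, which is the intuition behind the low homoplasy claimed for such matrices.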
Affiliation(s)
- James K Schull
- Department of Computer Science, Stanford University, USA
- Yatish Turakhia
- Department of Electrical and Computer Engineering, University of California San Diego, USA
- James A Hemker
- Department of Computer Science, Stanford University, USA
- William J Dally
- Department of Computer Science, Stanford University, USA
- NVIDIA, Santa Clara, California, USA
- Department of Electrical Engineering, Stanford University, USA
- Gill Bejerano
- Department of Computer Science, Stanford University, USA
- Department of Developmental Biology, Stanford University, USA
- Department of Biomedical Data Science, Stanford University, USA
- Department of Pediatrics, Stanford University, USA
- Corresponding author
3
Ribeiro PG, Torres Jiménez MF, Andermann T, Antonelli A, Bacon CD, Matos-Maraví P. A bioinformatic platform to integrate target capture and whole genome sequences of various read depths for phylogenomics. Mol Ecol 2021; 30:6021-6035. [PMID: 34674330 PMCID: PMC9298010 DOI: 10.1111/mec.16240] [Received: 11/30/2020] [Revised: 09/24/2021] [Accepted: 10/16/2021]
Abstract
The increasing availability of short‐read whole genome sequencing (WGS) provides unprecedented opportunities to study ecological and evolutionary processes. Although loci of interest can be extracted from WGS data and combined with target sequence data, this requires suitable bioinformatic workflows. Here, we test different assembly and locus extraction strategies and implement them in secapr, a pipeline that processes short‐read data into multilocus alignments for phylogenetics and molecular ecology analyses. We integrate the processing of data from low‐coverage WGS (<30×) and target sequence capture into a flexible framework, while optimizing de novo contig assembly and locus extraction. Specifically, we test different assembly strategies by contrasting their ability to recover loci from targeted butterfly protein‐coding genes, using four data sets: the same WGS data at three average coverages (10×, 5× and 2×) and a data set in which these loci were enriched prior to sequencing via target sequence capture. Using the resulting de novo contigs, we account for potential errors within contigs and infer phylogenetic trees to evaluate the ability of each assembly strategy to recover species relationships. We demonstrate that using multiple k-mer sizes simultaneously for assembly results in the highest yield of extracted loci from de novo assembled contigs, while data sets with sequencing read depths as low as 5× recover the expected species relationships in phylogenetic trees. By making the tested assembly approaches available in the secapr pipeline, we hope to inspire future studies to incorporate complementary data and make an informed choice about the optimal assembly strategy.
Affiliation(s)
- Pedro G Ribeiro
- Biology Centre of the Czech Academy of Sciences, Institute of Entomology, České Budějovice, Czech Republic
- Faculty of Science, University of South Bohemia, České Budějovice, Czech Republic
- María Fernanda Torres Jiménez
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
- Tobias Andermann
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
- Department of Biology, University of Fribourg, Fribourg, Switzerland
- Swiss Institute of Bioinformatics, Fribourg, Switzerland
- Alexandre Antonelli
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
- Royal Botanical Gardens Kew, Richmond, UK
- Department of Plant Sciences, University of Oxford, Oxford, UK
- Christine D Bacon
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
- Pável Matos-Maraví
- Biology Centre of the Czech Academy of Sciences, Institute of Entomology, České Budějovice, Czech Republic
- Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
4
Forthman M, Braun EL, Kimball RT. Gene tree quality affects empirical coalescent branch length estimation. Zool Scr 2021. [DOI: 10.1111/zsc.12512]
Affiliation(s)
- Michael Forthman
- Department of Entomology & Nematology, University of Florida, Gainesville, FL, USA
- California State Collection of Arthropods, Plant Pest Diagnostics Branch, California Department of Food & Agriculture, Sacramento, CA, USA
- Edward L. Braun
- Department of Biology, University of Florida, Gainesville, FL, USA
5
Adams RH, Castoe TA, DeGiorgio M. PhyloWGA: chromosome-aware phylogenetic interrogation of whole genome alignments. Bioinformatics 2021; 37:1923-1925. [PMID: 33051672 DOI: 10.1093/bioinformatics/btaa884] [Received: 12/11/2019] [Revised: 09/16/2020] [Accepted: 09/29/2020]
Abstract
SUMMARY: Here, we present PhyloWGA, an open-source R package for conducting phylogenetic analysis and investigation of whole genome data. AVAILABILITY AND IMPLEMENTATION: Available on GitHub (https://github.com/radamsRHA/PhyloWGA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Richard H Adams
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Todd A Castoe
- Department of Biology, University of Texas at Arlington, Arlington, TX 76019, USA
- Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
6
Cloutier A, Sackton TB, Grayson P, Clamp M, Baker AJ, Edwards SV. Whole-Genome Analyses Resolve the Phylogeny of Flightless Birds (Palaeognathae) in the Presence of an Empirical Anomaly Zone. Syst Biol 2019; 68:937-955. [PMID: 31135914 PMCID: PMC6857515 DOI: 10.1093/sysbio/syz019] [Received: 02/09/2018] [Revised: 03/06/2019] [Accepted: 04/09/2019]
Abstract
Palaeognathae represent one of the two basal lineages in modern birds, and comprise the volant (flighted) tinamous and the flightless ratites. Resolving palaeognath phylogenetic relationships has historically proved difficult, and short internal branches separating major palaeognath lineages in previous molecular phylogenies suggest that extensive incomplete lineage sorting (ILS) might have accompanied a rapid ancient divergence. Here, we investigate palaeognath relationships using genome-wide data sets of three types of noncoding nuclear markers, together totaling 20,850 loci and over 41 million base pairs of aligned sequence data. We recover a fully resolved topology placing rheas as the sister to kiwi and emu + cassowary that is congruent across marker types for two species tree methods (MP-EST and ASTRAL-II). This topology is corroborated by patterns of insertions for 4274 CR1 retroelements identified from multispecies whole-genome screening, and is robustly supported by phylogenomic subsampling analyses, with MP-EST demonstrating particularly consistent performance across subsampling replicates as compared to ASTRAL. In contrast, analyses of concatenated data supermatrices recover rheas as the sister to all other nonostrich palaeognaths, an alternative that lacks retroelement support and shows inconsistent behavior under subsampling approaches. While statistically supporting the species tree topology, conflicting patterns of retroelement insertions also occur and imply high amounts of ILS across short successive internal branches, consistent with observed patterns of gene tree heterogeneity. 
Coalescent simulations and topology tests indicate that the majority of observed topological incongruence among gene trees is consistent with coalescent variation rather than arising from gene tree estimation error alone, and estimated branch lengths for short successive internodes in the inferred species tree fall within the theoretical range encompassing the anomaly zone. Distributions of empirical gene trees confirm that the most common gene tree topology for each marker type differs from the species tree, signifying the existence of an empirical anomaly zone in palaeognaths.
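Whether a pair of short successive internodes falls in the anomaly zone can be checked numerically. The sketch below assumes the four-taxon boundary of Degnan and Rosenberg (2006): for an ancestral internal branch of length x and its descendant internal branch of length y, both in coalescent units, the pair lies in the anomaly zone when y < a(x), where a(x) = ln[2/3 + (3e^(2x) - 2) / (18(e^(3x) - e^(2x)))]. The function names are illustrative, and this is not necessarily the exact procedure used in the study above.

```python
import math


def anomaly_zone_bound(x: float) -> float:
    """Return a(x), the four-taxon anomaly-zone boundary (Degnan & Rosenberg 2006).

    x is the length, in coalescent units, of the ancestral internal branch.
    """
    if x <= 0:
        raise ValueError("x must be a positive branch length")
    e2, e3 = math.exp(2 * x), math.exp(3 * x)
    return math.log(2 / 3 + (3 * e2 - 2) / (18 * (e3 - e2)))


def in_anomaly_zone(x: float, y: float) -> bool:
    """True if the descendant branch y is shorter than the bound a(x)."""
    return y < anomaly_zone_bound(x)


# Two short successive branches fall inside the zone; a long ancestral branch does not.
print(round(anomaly_zone_bound(0.1), 3))  # 0.327
print(in_anomaly_zone(0.1, 0.05))         # True
print(in_anomaly_zone(1.0, 0.01))         # False
```

Because a(x) becomes negative once x is large enough, no positive y can satisfy the condition on long ancestral branches; the anomaly zone only arises when both internodes are short.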
Affiliation(s)
- Alison Cloutier
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Department of Ornithology, Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Timothy B Sackton
- Informatics Group, Harvard University, 28 Oxford Street, Cambridge, MA 02138, USA
- Phil Grayson
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Department of Ornithology, Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Michele Clamp
- Informatics Group, Harvard University, 28 Oxford Street, Cambridge, MA 02138, USA
- Allan J Baker
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcox Street, Toronto, Ontario M5S 3B2, Canada
- Department of Natural History, Royal Ontario Museum, 100 Queen's Park, Toronto, Ontario M5S 2C6, Canada
- Scott V Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Department of Ornithology, Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
7
Sanderson MJ, Nicolae M, McMahon MM. Homology-Aware Phylogenomics at Gigabase Scales. Syst Biol 2017; 66:590-603. [PMID: 28123115 PMCID: PMC5790135 DOI: 10.1093/sysbio/syw104] [Received: 05/19/2016] [Accepted: 11/25/2016]
Abstract
Obstacles to inferring species trees from whole genome data sets range from algorithmic and data management challenges to the wholesale discordance in evolutionary history found in different parts of a genome. Recent work that builds trees directly from genomes by parsing them into sets of small k-mer strings holds promise for streamlining and simplifying these efforts, but existing approaches do not account well for gene tree discordance. We describe a "seed and extend" protocol that finds nearly exact matching sets of orthologous k-mers and extends them to construct data sets that can properly account for genomic heterogeneity. Using an efficient suffix array data structure, the method rapidly parses sets of whole genomes and converts them into phylogenetic data matrices, concatenating contiguous blocks of k-mers from the same chromosome, gene, or scaffold as needed. Trees constructed from highly curated rice genome data and from a diverse set of six other eukaryotic whole genome, transcriptome, and organellar genome data sets were nearly identical to those of published phylogenomic analyses, yet took a small fraction of the time and required many fewer parameter choices. The method's ability to retain local homology information was demonstrated by using it to characterize gene tree discordance across the rice genome, and by its robustness to the high rate of interchromosomal gene transfer found in several rice species.
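The "seed" half of a seed-and-extend protocol can be sketched with ordinary hashing: keep k-mers that occur exactly once in every genome, treating single-copy status as a rough proxy for orthology, and record where each seed sits in each genome. This is a simplified stand-in, not the authors' method: it uses exact matches and a Python dictionary instead of nearly exact matching and a suffix array, omits the extension step, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def unique_kmers(seq: str, k: int) -> Dict[str, int]:
    """Return the k-mers that occur exactly once in seq, mapped to their start position."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos[0] for kmer, pos in positions.items() if len(pos) == 1}


def shared_single_copy_kmers(genomes: Dict[str, str], k: int) -> Dict[str, Dict[str, int]]:
    """Seeds: k-mers present exactly once in every genome, with their position in each.

    Assumes at least one genome is supplied.
    """
    per_genome = {name: unique_kmers(seq, k) for name, seq in genomes.items()}
    common = set.intersection(*(set(u) for u in per_genome.values()))
    return {
        kmer: {name: u[kmer] for name, u in per_genome.items()}
        for kmer in sorted(common)
    }


seeds = shared_single_copy_kmers({"a": "ACGTACGGT", "b": "TTACGGTAC"}, k=4)
print(sorted(seeds))   # ['ACGG', 'CGGT', 'GTAC', 'TACG']
print(seeds["TACG"])   # {'a': 3, 'b': 1}
```

Keeping the per-genome positions is what preserves local homology information: seeds that are adjacent on the same chromosome or scaffold can later be grouped into blocks rather than pooled genome-wide.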
Affiliation(s)
- M J Sanderson
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Marius Nicolae
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
- M M McMahon
- School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
8
Edwards SV, Cloutier A, Baker AJ. Conserved Nonexonic Elements: A Novel Class of Marker for Phylogenomics. Syst Biol 2017; 66:1028-1044. [PMID: 28637293 PMCID: PMC5790140 DOI: 10.1093/sysbio/syx058] [Received: 12/02/2016] [Revised: 06/03/2017] [Accepted: 06/06/2017]
Abstract
Noncoding markers have a particular appeal as tools for phylogenomic analysis because, at least in vertebrates, they appear less subject to strong variation in GC content among lineages. Thus far, ultraconserved elements (UCEs) and introns have been the most widely used noncoding markers. Here we analyze the evolutionary properties of a new type of noncoding marker, conserved nonexonic elements (CNEEs), which consist of noncoding elements estimated to evolve more slowly than the neutral rate across a set of species. Although they often include UCEs, CNEEs are distinct from UCEs because they are not ultraconserved and, most importantly, only the core region is analyzed, rather than both the core and its flanking regions. Using a data set of 16 birds plus an alligator outgroup, and ∼3600–3800 loci per marker type, we found that although CNEEs were less variable than bioinformatically derived UCEs or introns, and in some cases approached branch resolution more slowly under phylogenomic subsampling, the quality of CNEE alignments was superior to that of the other markers, with fewer gaps and missing species. Phylogenetic resolution using coalescent approaches was comparable among the three marker types, with most nodes being fully and congruently resolved. Comparison of phylogenetic results across the three marker types indicated that one branch, the sister group to the passerine + falcon clade, was resolved differently by CNEEs than by UCEs or introns, with moderate (>70%) bootstrap support. Overall, CNEEs appear promising as phylogenomic markers, yielding phylogenetic resolution as high as that of UCEs and introns but with fewer gaps, less ambiguity in alignments, and patterns of nucleotide substitution more consistent with the assumptions of commonly used methods of phylogenetic analysis.
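Alignment-quality comparisons of the kind reported above (gaps and missing species) come down to simple per-alignment counts. The sketch below assumes an alignment stored as a dict from taxon name to aligned sequence (a hypothetical format) and treats a taxon as missing if it is absent or its sequence contains only gaps, "?" or "N"; the exact metrics used in the study may differ.

```python
from typing import Dict, Iterable, List, Tuple

# Characters that carry no sequence information in this sketch.
MISSING_CHARS = set("-?N")


def alignment_quality(aln: Dict[str, str], taxa: Iterable[str]) -> Tuple[float, List[str]]:
    """Return (gap fraction over present sequences, list of missing taxa)."""
    present = {
        name: seq.upper()
        for name, seq in aln.items()
        if set(seq.upper()) - MISSING_CHARS  # has at least one real base
    }
    missing = [t for t in taxa if t not in present]
    total = sum(len(s) for s in present.values())
    gaps = sum(s.count("-") for s in present.values())
    return (gaps / total if total else 0.0), missing


aln = {"chicken": "ACGT-A", "zebra_finch": "AC--TA", "alligator": "------"}
print(alignment_quality(aln, ["chicken", "zebra_finch", "alligator", "ostrich"]))
# (0.25, ['alligator', 'ostrich'])
```

Averaging these two numbers over all loci of each marker type gives one simple way to compare CNEE, UCE, and intron alignments.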
Affiliation(s)
- Scott V. Edwards
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, 26 Oxford Street, Harvard University, Cambridge, MA 02138 USA
- Alison Cloutier
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, 26 Oxford Street, Harvard University, Cambridge, MA 02138 USA
- Department of Natural History, Royal Ontario Museum, 100 Queen’s Park, Toronto, Ontario, M5S 2C6 Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcox Street, Toronto, Ontario, M5S 3B2 Canada
- Allan J. Baker
- Department of Natural History, Royal Ontario Museum, 100 Queen’s Park, Toronto, Ontario, M5S 2C6 Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcox Street, Toronto, Ontario, M5S 3B2 Canada
9
Jennings WB. On the independent gene trees assumption in phylogenomic studies. Mol Ecol 2017; 26:4862-4871. [PMID: 28752599 DOI: 10.1111/mec.14274] [Received: 09/01/2016] [Revised: 07/13/2017] [Accepted: 07/24/2017]
Abstract
Multilocus coalescent methods for inferring species trees or historical demographic parameters typically require the assumption that gene trees for sampled SNPs or DNA sequence loci are conditionally independent given their species tree. In practice, researchers have used different criteria to delimit "independent loci." One criterion identifies sampled loci as independent of each other if they undergo Mendelian independent assortment (IA criterion). O'Neill et al. (2013, Molecular Ecology, 22, 111-129) used this approach in their phylogeographic study of the North American tiger salamander species complex. In two other studies, researchers developed a pair of related methods that employ an independent genealogies criterion (IG criterion), which considers the effects of population-level recombination on correlations between the gene trees of intrachromosomal loci. Here, I explain these three methods, illustrate their use with example data, and evaluate their efficacy. I show that the IA approach is more conservative, is simpler to use, and requires fewer assumptions than the IG approaches. However, IG approaches can identify much larger numbers of independent loci than the IA method, which in turn allows researchers to obtain more precise and accurate estimates of species trees and historical demographic parameters. A disadvantage of the IG methods is that they require an estimate of the population recombination rate. Despite their drawbacks, IA and IG approaches provide molecular ecologists with promising a priori methods for selecting SNPs or DNA sequence loci that likely meet the independence assumption in coalescent-based phylogenomic studies.
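A deliberately conservative way to apply the IA criterion is to keep at most one locus per chromosome, since loci on different chromosomes assort independently. The sketch below assumes each locus is already labeled with its chromosome (a hypothetical input format); it ignores the fact that widely separated loci on the same chromosome can also assort nearly independently, and it does not model the IG criterion, which would additionally need a population recombination rate.

```python
from typing import Dict, Iterable, List, Tuple


def select_ia_loci(loci: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep the first listed locus on each chromosome.

    loci: (locus_id, chromosome) pairs, in the order they should be considered.
    """
    chosen: Dict[str, str] = {}
    for locus_id, chromosome in loci:
        chosen.setdefault(chromosome, locus_id)  # later loci on the same chromosome are dropped
    return list(chosen.values())


loci = [("L1", "chr1"), ("L2", "chr1"), ("L3", "chr2"), ("L4", "chr3"), ("L5", "chr2")]
print(select_ia_loci(loci))  # ['L1', 'L3', 'L4']
```

The trade-off described in the abstract shows up directly here: the number of retained loci can never exceed the number of chromosomes, which is why IG-style criteria can yield many more loci.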
Affiliation(s)
- W Bryan Jennings
- Departamento de Vertebrados, Museu Nacional, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil