1
Travers SL, Hutter CR, Austin CC, Donnellan SC, Buehler MD, Ellison CE, Ruane S. VenomCap: An exon-capture probe set for the targeted sequencing of snake venom genes. Mol Ecol Resour 2024; 24:e14020. [PMID: 39297212 PMCID: PMC11495845 DOI: 10.1111/1755-0998.14020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2024] [Revised: 08/14/2024] [Accepted: 09/02/2024] [Indexed: 10/03/2024]
Abstract
Snake venoms are complex mixtures of toxic proteins that hold significant medical, pharmacological and evolutionary interest. To better understand the genetic diversity underlying snake venoms, we developed VenomCap, a novel exon-capture probe set targeting toxin-coding genes from a wide range of elapid snakes, with a particular focus on the ecologically diverse and medically important subfamily Hydrophiinae. We tested the capture success of VenomCap across 24 species, representing all major elapid lineages. We included snake phylogenomic probes in the VenomCap capture set, allowing us to compare capture performance between venom and phylogenomic loci and to infer elapid phylogenetic relationships. We demonstrated VenomCap's ability to recover exons from ~1500 target markers, representing a total of 24 known venom gene families, which includes the dominant gene families found in elapid venoms. We find that VenomCap's capture results are robust across all elapids sampled, and especially among hydrophiines, with respect to measures of target capture success (target loci matched, sensitivity, specificity and missing data). As a cost-effective and efficient alternative to full genome sequencing, VenomCap can dramatically accelerate the sequencing and analysis of venom gene families. Overall, our tool offers a model for genomic studies on snake venom gene diversity and evolution that can be expanded for comprehensive comparisons across the other families of venomous snakes.
Affiliation(s)
- Scott L. Travers
  - Department of Genetics, Rutgers University, Piscataway, NJ 08854, USA
- Carl R. Hutter
  - Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Christopher C. Austin
  - Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Stephen C. Donnellan
  - South Australian Museum, North Terrace, Adelaide 5000, Australia
  - Australian Museum Research Institute, Australian Museum, 1 William St, Sydney 2010, Australia
- Matthew D. Buehler
  - Department of Biological Sciences and Museum of Natural History, Auburn University, Auburn, AL 36849, USA
- Sara Ruane
  - Life Sciences Section, Negaunee Integrative Research Center, Field Museum, Chicago, IL 60605, USA
2
Herrig DK, Ridenbaugh RD, Vertacnik KL, Everson KM, Sim SB, Geib SM, Weisrock DW, Linnen CR. Whole Genomes Reveal Evolutionary Relationships and Mechanisms Underlying Gene-Tree Discordance in Neodiprion Sawflies. Syst Biol 2024; 73:839-860. [PMID: 38970484 DOI: 10.1093/sysbio/syae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Revised: 07/04/2024] [Accepted: 07/05/2024] [Indexed: 07/08/2024]
Abstract
Rapidly evolving taxa are excellent models for understanding the mechanisms that give rise to biodiversity. However, developing an accurate historical framework for comparative analysis of such lineages remains a challenge due to ubiquitous incomplete lineage sorting (ILS) and introgression. Here, we use a whole-genome alignment, multiple locus-sampling strategies, and summary-tree and single nucleotide polymorphism-based species-tree methods to infer a species tree for eastern North American Neodiprion species, a clade of pine-feeding sawflies (Order: Hymenoptera; Family: Diprionidae). We recovered a well-supported species tree that, except for three uncertain relationships, was robust to different strategies for analyzing whole-genome data. Nevertheless, underlying gene-tree discordance was high. To understand this genealogical variation, we used multiple linear regression to model site concordance factors estimated in 50-kb windows as a function of several genomic predictor variables. We found that site concordance factors tended to be higher in regions of the genome with more parsimony-informative sites, fewer singletons, less missing data, lower GC content, more genes, lower recombination rates, and lower D-statistics (less introgression). Together, these results suggest that ILS, introgression, and genotyping error all shape the genomic landscape of gene-tree discordance in Neodiprion. More generally, our findings demonstrate how combining phylogenomic analysis with knowledge of local genomic features can reveal mechanisms that produce topological heterogeneity across genomes.
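Illustrative note (not part of the cited abstract): the abstract above describes modelling per-window site concordance factors with multiple linear regression. The sketch below shows the general technique only, as ordinary least squares solved through the normal equations in plain Python. The function names, the choice of two predictors, and all numbers are made-up assumptions for illustration; they are not the authors' data, predictors, or pipeline.

```python
# Minimal ordinary-least-squares sketch (hypothetical helper names, toy data).

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(predictors, response):
    """Return [intercept, b1, b2, ...] minimizing the squared error."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy windows: (GC content, fraction of missing data). Values are invented.
windows = [(0.30, 0.05), (0.35, 0.10), (0.40, 0.02), (0.45, 0.20), (0.50, 0.08)]
# Synthetic response generated from a known linear rule, so the fit can be checked.
scf = [60.0 - 20.0 * gc - 30.0 * miss for gc, miss in windows]

coefficients = fit_ols(windows, scf)
print(coefficients)  # intercept, GC slope, missing-data slope
```

Because the toy response is generated from an exact linear rule, the fitted coefficients should match that rule up to floating-point error. A real analysis would use a statistics library and report standard errors and model diagnostics as well.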
Affiliation(s)
- Danielle K Herrig
  - Department of Biology, University of Kentucky, 195 Huguelet Dr., Lexington, KY 40508, USA
- Ryan D Ridenbaugh
  - Department of Biology, University of Kentucky, 195 Huguelet Dr., Lexington, KY 40508, USA
- Kim L Vertacnik
  - Department of Biology, University of Kentucky, 195 Huguelet Dr., Lexington, KY 40508, USA
- Kathryn M Everson
  - Department of Natural Resources and Environmental Science, University of Nevada, 1664 N. Virginia St., Reno, NV 89557, USA
  - Department of Integrative Biology, Oregon State University, 4575 SW Research Way, Corvallis, OR 97333, USA
- Sheina B Sim
  - USDA-ARS Daniel K. Inouye US Pacific Basin Agricultural Research Center, Tropical Pest Genetics and Molecular Biology Research Unit, 64 Nowelo St., Hilo, HI 96720, USA
- Scott M Geib
  - USDA-ARS Daniel K. Inouye US Pacific Basin Agricultural Research Center, Tropical Pest Genetics and Molecular Biology Research Unit, 64 Nowelo St., Hilo, HI 96720, USA
- David W Weisrock
  - Department of Biology, University of Kentucky, 195 Huguelet Dr., Lexington, KY 40508, USA
- Catherine R Linnen
  - Department of Biology, University of Kentucky, 195 Huguelet Dr., Lexington, KY 40508, USA
3
Roberts JR, Bernstein JM, Austin CC, Hains T, Mata J, Kieras M, Pirro S, Ruane S. Whole snake genomes from eighteen families of snakes (Serpentes: Caenophidia) and their applications to systematics. J Hered 2024; 115:487-497. [PMID: 38722259 DOI: 10.1093/jhered/esae026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 05/08/2024] [Indexed: 08/21/2024]
Abstract
We present genome assemblies for 18 snake species representing 18 families (Serpentes: Caenophidia): Acrochordus granulatus, Aparallactus werneri, Boaedon fuliginosus, Calamaria suluensis, Cerberus rynchops, Grayia smithii, Imantodes cenchoa, Mimophis mahfalensis, Oxyrhabdium leporinum, Pareas carinatus, Psammodynastes pulverulentus, Pseudoxenodon macrops, Pseudoxyrhopus heterurus, Sibynophis collaris, Stegonotus admiraltiensis, Toxicocalamus goodenoughensis, Trimeresurus albolabris, and Tropidonophis doriae. From these new genome assemblies, we extracted thousands of loci commonly used in systematic and phylogenomic studies on snakes, including target-capture datasets composed of ultraconserved elements (UCEs) and anchored hybrid enriched loci (AHEs), as well as traditional Sanger loci. Phylogenies inferred from the two target-capture loci datasets were identical to each other and strongly congruent with previously published snake phylogenies. To show the additional utility of these non-model genomes for investigative evolutionary research, we mined the genome assemblies of two New Guinea island endemics in our dataset (S. admiraltiensis and T. doriae) for the ATP1a3 gene, a thoroughly researched indicator of resistance to toad toxin ingestion by squamates. We find that both these snakes possess the genotype for toad toxin resistance despite their endemism to New Guinea, a region that lacked toads until the human-mediated introduction of Cane Toads in the 1930s. These species possess identical substitutions that suggest the same bufotoxin resistance as their Australian congeners (Stegonotus australis and Tropidonophis mairii), which forage on invasive Cane Toads. Herein, we show the utility of short-read high-coverage genomes and help reduce the deficit of available squamate genomes with associated voucher specimens.
Affiliation(s)
- Jackson R Roberts
  - Division of Zoology, Sternberg Museum of Natural History, Fort Hays State University, Hays, KS 67601, United States
  - Division of Herpetology, Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, United States
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, United States
- Justin M Bernstein
  - Center for Genomics, University of Kansas, Lawrence, KS 66045, United States
  - Department of Biology, University of Texas at Arlington, Arlington, TX 76010, United States
- Christopher C Austin
  - Division of Herpetology, Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, United States
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, United States
- Taylor Hains
  - Committee on Evolutionary Biology, University of Chicago, Chicago, IL 60637, United States
  - Life Sciences Section, Negaunee Integrative Research Center, The Field Museum of Natural History, Chicago, IL 60637, United States
- Joshua Mata
  - Amphibian and Reptile Collection, The Field Museum of Natural History, Chicago, IL 60605, United States
- Michael Kieras
  - Iridian Genomes, Inc., Bethesda, MD 20817, United States
- Stacy Pirro
  - Iridian Genomes, Inc., Bethesda, MD 20817, United States
- Sara Ruane
  - Life Sciences Section, Negaunee Integrative Research Center, The Field Museum of Natural History, Chicago, IL 60637, United States
  - Amphibian and Reptile Collection, The Field Museum of Natural History, Chicago, IL 60605, United States
4
Weinell JL, Burbrink FT, Das S, Brown RM. Novel phylogenomic inference and 'Out of Asia' biogeography of cobras, coral snakes and their allies. R Soc Open Sci 2024; 11:240064. [PMID: 39113776 PMCID: PMC11303032 DOI: 10.1098/rsos.240064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 05/27/2024] [Accepted: 05/31/2024] [Indexed: 08/10/2024]
Abstract
Estimation of evolutionary relationships among lineages that rapidly diversified can be challenging, and, in such instances, inaccurate or unresolved phylogenetic estimates can lead to erroneous conclusions regarding historical geographical ranges of lineages. One example underscoring this issue has been the historical challenge posed by untangling the biogeographic origin of elapoid snakes, which includes numerous dangerously venomous species as well as species not known to be dangerous to humans. The worldwide distribution of this lineage makes it an ideal group for testing hypotheses related to historical faunal exchanges among the many continents and other landmasses occupied by contemporary elapoid species. We developed a novel suite of genomic resources, included worldwide sampling, and inferred a robust estimate of evolutionary relationships, which we leveraged to quantitatively estimate geographical range evolution through the deep-time history of this remarkable radiation. Our phylogenetic and biogeographical estimates of historical ranges definitively reject a lingering former 'Out of Africa' hypothesis and support an 'Out of Asia' scenario involving multiple faunal exchanges between Asia, Africa, Australasia, the Americas and Europe.
Affiliation(s)
- Jeffrey L. Weinell
  - Department of Ecology and Evolutionary Biology and Biodiversity Institute, University of Kansas, 1345 Jayhawk Blvd, Lawrence, KS 66045, USA
  - Department of Herpetology, American Museum of Natural History, 200 Central Park West, New York, NY 10024, USA
- Frank T. Burbrink
  - Department of Herpetology, American Museum of Natural History, 200 Central Park West, New York, NY 10024, USA
- Sunandan Das
  - Ecological Genetics Research Unit, Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki 00014, Finland
- Rafe M. Brown
  - Department of Ecology and Evolutionary Biology and Biodiversity Institute, University of Kansas, 1345 Jayhawk Blvd, Lawrence, KS 66045, USA
5
Bernstein JM, Voris HK, Stuart BL, Karns DR, McGuire JA, Iskandar DT, Riyanto A, Calderón-Acevedo CA, Brown RM, Gehara M, Soto-Centeno JA, Ruane S. Integrative methods reveal multiple drivers of diversification in rice paddy snakes. Sci Rep 2024; 14:4727. [PMID: 38472264 DOI: 10.1038/s41598-024-54744-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 02/15/2024] [Indexed: 03/14/2024]
Abstract
Divergence dating analyses in systematics provide a framework to develop and test biogeographic hypotheses regarding speciation. However, as molecular datasets grow from multilocus to genomic, sample sizes decrease due to computational burdens, and the testing of fine-scale biogeographic hypotheses becomes difficult. In this study, we use coalescent demographic models to investigate the diversification of poorly known rice paddy snakes from Southeast Asia (Homalopsidae: Hypsiscopus), which have conflicting dates of origin based on previous studies. We use coalescent modeling to test the hypothesis that Hypsiscopus diversified 2.5 mya during the Khorat Plateau uplift in Thailand. Additionally, we use ecological niche analyses to identify potential differences in the niche space of the two most widely distributed species in the past and present. Our results suggest Hypsiscopus diversified ~2.4 mya, supporting the hypothesis that the Khorat Plateau uplift may have initiated the diversification of rice paddy snakes. We also find significant niche differentiation and shifts between species of Hypsiscopus, indicating that environmental differences may have sustained differentiation of this genus after the Khorat Plateau uplift. Our study expands on the diversification history of snakes in Southeast Asia, and highlights how results from smaller multilocus datasets can be useful in developing and testing biogeographic hypotheses alongside genomic datasets.
Affiliation(s)
- Justin M Bernstein
  - Center for Genomics, University of Kansas, Dyche Hall, 1345 Jayhawk Blvd, Lawrence, KS, 66045, USA
- Harold K Voris
  - Life Sciences Section, Negaunee Integrative Research Center, Field Museum, 1400 S. Lake Shore Drive, Chicago, IL, 60605, USA
- Bryan L Stuart
  - Section of Research and Collections, North Carolina Museum of Natural Sciences, Raleigh, NC, 27601, USA
- Daryl R Karns
  - Biology Department, Hanover College, Hanover, IN, 47243, USA
- Jimmy A McGuire
  - Museum of Vertebrate Zoology and Department of Integrative Biology, University of California, Berkeley, CA, 94720, USA
- Djoko T Iskandar
  - School of Life Sciences and Technology, Institut Teknologi Bandung, Bandung, Indonesia
- Awal Riyanto
  - Museum Zoologicum Bogoriense, Research Center for Biology, National Research and Innovation Agency of Indonesia (BRIN), Cibinong, 16911, Indonesia
- Camilo A Calderón-Acevedo
  - State University of New York: College of Environmental Science and Forestry, Syracuse, NY, 13210, USA
- Rafe M Brown
  - Department of Ecology and Evolutionary Biology and Biodiversity Institute, University of Kansas, Lawrence, KS, 66045, USA
- Marcelo Gehara
  - Department of Earth and Environmental Science, Rutgers University-Newark, Newark, NJ, 07102, USA
- J Angel Soto-Centeno
  - Department of Earth and Environmental Science, Rutgers University-Newark, Newark, NJ, 07102, USA
  - Department of Mammalogy, American Museum of Natural History, New York, NY, 10024, USA
- Sara Ruane
  - Life Sciences Section, Negaunee Integrative Research Center, Field Museum, 1400 S. Lake Shore Drive, Chicago, IL, 60605, USA
6
Santibáñez-López CE, Ojanguren-Affilastro AA, Graham MR, Sharma PP. Congruence between ultraconserved element-based matrices and phylotranscriptomic datasets in the scorpion Tree of Life. Cladistics 2023; 39:533-547. [PMID: 37401727 DOI: 10.1111/cla.12551] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 06/06/2023] [Indexed: 07/05/2023]
Abstract
Scorpions are ancient and historically renowned for their potent venom. Traditionally, the systematics of this group of arthropods was supported by morphological characters, until recent phylogenomic analyses (using RNAseq data) revealed most of the higher-level taxa to be non-monophyletic. While these phylogenomic hypotheses are stable for almost all lineages, some nodes have been hard to resolve due to minimal taxonomic sampling (e.g. family Chactidae). Likewise, it has been shown that some nodes in the Arachnid Tree of Life show disagreement between hypotheses generated using transcriptomes and other genomic sources such as the ultraconserved elements (UCEs). Here, we compared the phylogenetic signal of transcriptomes vs. UCEs by retrieving UCEs from new and previously published scorpion transcriptomes and genomes, and reconstructed phylogenies using both datasets independently. We reexamined the monophyly and phylogenetic placement of Chactidae, sampling an additional chactid species using both datasets. Our results showed that both genome-scale datasets recovered highly similar topologies, with Chactidae rendered paraphyletic owing to the placement of Nullibrotheas allenii. As a first step toward redressing the systematics of Chactidae, we establish the family Anuroctonidae (new family) to accommodate the genus Anuroctonus.
Affiliation(s)
- Matthew R Graham
  - Department of Biology, Eastern Connecticut State University, Willimantic, CT, 06226, USA
- Prashant P Sharma
  - Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI, 53706, USA
7
Li J, Han G, Tian X, Liang D, Zhang P. UPrimer: A Clade-Specific Primer Design Program Based on Nested-PCR Strategy and Its Applications in Amplicon Capture Phylogenomics. Mol Biol Evol 2023; 40:msad230. [PMID: 37832226 PMCID: PMC10630340 DOI: 10.1093/molbev/msad230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 09/12/2023] [Accepted: 10/09/2023] [Indexed: 10/15/2023]
Abstract
Amplicon capture is a promising target sequence capture approach for phylogenomic analyses, and the design of clade-specific nuclear protein-coding locus (NPCL) amplification primers is crucial for its successful application. In this study, we developed a primer design program called UPrimer that can quickly design clade-specific NPCL amplification primers based on genome data, without requiring manual intervention. Unlike other available primer design programs, UPrimer uses a nested-PCR strategy that greatly improves the amplification success rate of the designed primers. We examined all available metazoan genome data deposited in NCBI and developed NPCL primer sets for 21 metazoan groups with UPrimer, covering a wide range of taxa, including arthropods, mollusks, cnidarians, echinoderms, and vertebrates. On average, each clade-specific NPCL primer set comprises ∼1,000 NPCLs. PCR amplification tests were performed in 6 metazoan groups, and the developed primers showed a PCR success rate exceeding 95%. Furthermore, we demonstrated a phylogenetic case study in Lepidoptera, showing how NPCL primers can be used for phylogenomic analyses with amplicon capture. Our results indicated that using 100 NPCL probes recovered robust high-level phylogenetic relationships among butterflies, highlighting the utility of the newly designed NPCL primer sets for phylogenetic studies. We anticipate that the automated tool UPrimer and the developed NPCL primer sets for 21 metazoan groups will enable researchers to obtain phylogenomic data more efficiently and cost-effectively and accelerate the resolution of various parts of the Tree of Life.
Affiliation(s)
- JiaXuan Li
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, China
- GuangCheng Han
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, China
- Xiao Tian
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, China
- Dan Liang
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, China
- Peng Zhang
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510275, China
8
Wogan GOU, Yuan ML, Mahler DL, Wang IJ. Hybridization and Transgressive Evolution Generate Diversity in an Adaptive Radiation of Anolis Lizards. Syst Biol 2023; 72:874-884. [PMID: 37186031 PMCID: PMC10687355 DOI: 10.1093/sysbio/syad026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 04/01/2023] [Accepted: 04/24/2023] [Indexed: 05/17/2023]
Abstract
Interspecific hybridization may act as a major force contributing to the evolution of biodiversity. Although generally thought to reduce or constrain divergence between 2 species, hybridization can, paradoxically, promote divergence by increasing genetic variation or providing novel combinations of alleles that selection can act upon to move lineages toward new adaptive peaks. Hybridization may, then, play a key role in adaptive radiation by allowing lineages to diversify into new ecological space. Here, we test for signatures of historical hybridization in the Anolis lizards of Puerto Rico and evaluate 2 hypotheses for the role of hybridization in facilitating adaptive radiation: the hybrid swarm origins hypothesis and the syngameon hypothesis. Using whole genome sequences from all 10 species of Puerto Rican anoles, we calculated D and f-statistics (from ABBA-BABA tests) to test for introgression across the radiation and employed multispecies network coalescent methods to reconstruct phylogenetic networks that allow for hybridization. We then analyzed morphological data for these species to test for patterns consistent with transgressive evolution, a phenomenon in which the trait of a hybrid lineage is found outside of the range of its 2 parents. Our analyses uncovered strong evidence for introgression at multiple stages of the radiation, including support for an ancient hybrid origin of a clade comprising half of the extant Puerto Rican anole species. Moreover, we detected significant signals of transgressive evolution for 2 ecologically important traits, head length and toepad width, the latter of which has been described as a key innovation in Anolis. [Adaptive radiation; introgression; multispecies network coalescent; phenotypic evolution; phylogenetic network; reticulation; syngameon; transgressive segregation.]
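Illustrative note (not part of the cited abstract): the D-statistic mentioned above compares counts of two discordant site patterns in a four-taxon arrangement (((P1, P2), P3), Outgroup), using D = (ABBA - BABA) / (ABBA + BABA). The sketch below is a minimal single-sequence version with hypothetical function names and toy sequences. It is not the authors' pipeline; real analyses typically work from allele frequencies across many individuals and assess significance with a block jackknife.

```python
# Minimal ABBA-BABA sketch for one aligned sequence per taxon (toy example).

def count_patterns(p1: str, p2: str, p3: str, outgroup: str) -> tuple[int, int]:
    """Count ABBA and BABA sites; the outgroup allele is treated as ancestral (A)."""
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        # Skip gaps and ambiguous bases.
        if any(base not in "ACGT" for base in (a, b, c, o)):
            continue
        if a == o and b == c and b != o:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c and a != o:
            baba += 1  # P1 and P3 share the derived allele
    return abba, baba


def patterson_d(n_abba: int, n_baba: int) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA); undefined without informative sites."""
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites; D is undefined")
    return (n_abba - n_baba) / total


# Toy alignment (invented): two ABBA sites, one BABA site, one site with N,
# and one invariant site.
abba, baba = count_patterns("ACTAA", "GTCAA", "GCCNA", "ATTAA")
print(abba, baba, patterson_d(abba, baba))
```

Under incomplete lineage sorting alone, the two patterns are expected in roughly equal numbers, so D near zero is the null expectation; a consistent excess of one pattern is the signal usually interpreted as introgression.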
Affiliation(s)
- Guinevere O U Wogan
  - Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA 94720, USA
  - Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720, USA
  - Department of Integrative Biology, Oklahoma State University, Stillwater, OK 74078, USA
- Michael L Yuan
  - Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA 94720, USA
  - Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720, USA
- D Luke Mahler
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON M5S 1A1, Canada
- Ian J Wang
  - Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA 94720, USA
  - Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720, USA
9
Karin BR, Arellano S, Wang L, Walzer K, Pomerantz A, Vasquez JM, Chatla K, Sudmant PH, Bach BH, Smith LL, McGuire JA. Highly-multiplexed and efficient long-amplicon PacBio and Nanopore sequencing of hundreds of full mitochondrial genomes. BMC Genomics 2023; 24:229. [PMID: 37131128 PMCID: PMC10155392 DOI: 10.1186/s12864-023-09277-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Accepted: 03/24/2023] [Indexed: 05/04/2023]
Abstract
BACKGROUND: Mitochondrial genome sequences have become critical to the study of biodiversity. Genome skimming and other short-read based methods are the most common approaches, but they are not well-suited to scale up to multiplexing hundreds of samples. Here, we report on a new approach to sequence hundreds to thousands of complete mitochondrial genomes in parallel using long-amplicon sequencing. We amplified the mitochondrial genome of 677 specimens in two partially overlapping amplicons and implemented an asymmetric PCR-based indexing approach to multiplex 1,159 long amplicons together on a single PacBio SMRT Sequel II cell. We also tested this method on Oxford Nanopore Technologies (ONT) MinION R9.4 to assess if this method could be applied to other long-read technologies. We implemented several optimizations that make this method significantly more efficient than alternative mitochondrial genome sequencing methods. RESULTS: With the PacBio sequencing data we recovered at least one of the two fragments for 96% of samples (~80-90%) with mean coverage ~1,500x. The ONT data recovered less than 50% of input fragments, likely due to low throughput and the design of the Barcoded Universal Primers, which were optimized for PacBio sequencing. We compared a single mitochondrial gene alignment to half and full mitochondrial genomes and found, as expected, increased tree support with longer alignments, though whole mitochondrial genomes were not significantly better than half mitochondrial genomes. CONCLUSIONS: This method can effectively capture thousands of long amplicons in a single run and be used to build more robust phylogenies quickly and effectively. We provide several recommendations for future users depending on the evolutionary scale of their system. A natural extension of this method is to collect multi-locus datasets consisting of mitochondrial genomes and several long nuclear loci at once.
Affiliation(s)
- Benjamin R Karin
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
  - Museum of Vertebrate Zoology, University of California, Berkeley, CA, USA
- Selene Arellano
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
- Laura Wang
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
- Kayla Walzer
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
- Aaron Pomerantz
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
- Juan Manuel Vasquez
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
- Kamalakar Chatla
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
- Peter H Sudmant
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
  - Center for Computational Biology, University of California, Berkeley, CA, USA
- Bryan H Bach
  - Museum of Vertebrate Zoology, University of California, Berkeley, CA, USA
  - Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Lydia L Smith
  - Museum of Vertebrate Zoology, University of California, Berkeley, CA, USA
- Jimmy A McGuire
  - Department of Integrative Biology, Valley Life Sciences Building, University of California, Berkeley, CA, 94708, USA
  - Museum of Vertebrate Zoology, University of California, Berkeley, CA, USA
10
Ortiz-Sepulveda CM, Genete M, Blassiau C, Godé C, Albrecht C, Vekemans X, Van Bocxlaer B. Target enrichment of long open reading frames and ultraconserved elements to link microevolution and macroevolution in non-model organisms. Mol Ecol Resour 2023; 23:659-679. [PMID: 36349833 DOI: 10.1111/1755-0998.13735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2021] [Revised: 10/09/2022] [Accepted: 10/19/2022] [Indexed: 11/10/2022]
Abstract
Despite the increasing accessibility of high-throughput sequencing, obtaining high-quality genomic data on non-model organisms without proximate well-assembled and annotated genomes remains challenging. Here, we describe a workflow that takes advantage of distant genomic resources and ingroup transcriptomes to select and jointly enrich long open reading frames (ORFs) and ultraconserved elements (UCEs) from genomic samples for integrative studies of microevolutionary and macroevolutionary dynamics. This workflow is applied to samples of the African unionid bivalve tribe Coelaturini (Parreysiinae) at basin and continent-wide scales. Our results indicate that ORFs are efficiently captured without prior identification of intron-exon boundaries. The enrichment of UCEs was less successful, but nevertheless produced substantial data sets. Exploratory continent-wide phylogenetic analyses with ORF supercontigs (>515,000 parsimony informative sites) resulted in a fully resolved phylogeny, the backbone of which was also retrieved with UCEs (>11,000 informative sites). Variant calling on ORFs and UCEs of Coelaturini from the Malawi Basin produced ~2000 SNPs per population pair. Estimates of nucleotide diversity and population differentiation were similar for ORFs and UCEs. They were low compared to previous estimates in molluscs, but comparable to those in recently diversifying Malawi cichlids and other taxa at an early stage of speciation. Skimming off-target sequence data from the same enriched libraries of Coelaturini from the Malawi Basin, we reconstructed the maternally-inherited mitogenome, which displays the gene order inferred for the most recent common ancestor of Unionidae. Overall, our workflow and results provide exciting perspectives for integrative genomic studies of microevolutionary and macroevolutionary dynamics in non-model organisms.
Affiliation(s)
- Mathieu Genete
- CNRS, Univ. Lille, UMR 8198 - Evo-Eco-Paleo, F-59000 Lille, France
- Cécile Godé
- CNRS, Univ. Lille, UMR 8198 - Evo-Eco-Paleo, F-59000 Lille, France
- Christian Albrecht
- Department of Animal Ecology and Systematics, Justus Liebig University, D-35392 Giessen, Germany; Department of Biology, Mbarara University of Science and Technology, Mbarara, Uganda
- Xavier Vekemans
- CNRS, Univ. Lille, UMR 8198 - Evo-Eco-Paleo, F-59000 Lille, France
11
Mezzasalma M, Capriglione T, Kupriyanova L, Odierna G, Pallotta MM, Petraccioli A, Picariello O, Guarino FM. Characterization of Two Transposable Elements and an Ultra-Conserved Element Isolated in the Genome of Zootoca vivipara (Squamata, Lacertidae). Life (Basel) 2023; 13:life13030637. [PMID: 36983793 PMCID: PMC10058329 DOI: 10.3390/life13030637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 02/10/2023] [Accepted: 02/22/2023] [Indexed: 03/02/2023]
Abstract
Transposable elements (TEs) constitute a considerable fraction of eukaryote genomes, representing a major source of genetic variability. We describe two DNA sequences isolated in the lizard Zootoca vivipara, here named Zv516 and Zv817. Both sequences are single-copy nuclear sequences, including a truncation of two transposable elements (TEs), SINE Squam1 in Zv516 and a Tc1/Mariner-like DNA transposon in Zv817. FISH analyses with Zv516 showed the occurrence of interspersed signals of the SINE Squam1 sequence on all chromosomes of Z. vivipara and quantitative dot blot indicated that this TE is present with about 4700 copies in the Z. vivipara genome. FISH and dot blot with Zv817 did not produce clear hybridization signals. Bioinformatic analysis showed the presence of active SINE Squam1 copies in the genome of different lacertids, in different mRNAs, and intronic and coding regions of various genes. The Tc1/Mariner-like DNA transposon occurs in all reptiles, excluding Sphenodon and Archosauria. Zv817 includes a tract of 284 bp, representing an amniote ultra-conserved element (UCE). Using amniote UCE homologous sequences from available whole genome sequences of major amniote taxonomic groups, we performed a phylogenetic analysis which retrieved Prototheria as the sister group of Metatheria and Eutheria. Within diapsids, Testudines are the sister group to Aves + Crocodylia (Archosauria), and Sphenodon is the sister group to Squamata. Furthermore, large tract regions flanking the UCE are conserved at family level.
Affiliation(s)
- Marcello Mezzasalma
- Department of Biology, Ecology and Earth Science, University of Calabria, Via P. Bucci 4/B, 87036 Rende, Italy
- Correspondence: (M.M.); (G.O.)
- Teresa Capriglione
- Department of Biology, University of Naples Federico II, Via Cinthia 26, 80126 Naples, Italy
- Larissa Kupriyanova
- Zoological Institute, Russian Academy of Sciences, 190121 St. Petersburg, Russia
- Gaetano Odierna
- Department of Biology, University of Naples Federico II, Via Cinthia 26, 80126 Naples, Italy
- Correspondence: (M.M.); (G.O.)
- Agnese Petraccioli
- Department of Biology, University of Naples Federico II, Via Cinthia 26, 80126 Naples, Italy
- Orfeo Picariello
- Department of Biology, University of Naples Federico II, Via Cinthia 26, 80126 Naples, Italy
- Fabio M. Guarino
- Department of Biology, University of Naples Federico II, Via Cinthia 26, 80126 Naples, Italy
12
Owen CL, Marshall DC, Wade EJ, Meister R, Goemans G, Kunte K, Moulds M, Hill K, Villet M, Pham TH, Kortyna M, Lemmon EM, Lemmon AR, Simon C. Detecting and removing sample contamination in phylogenomic data: an example and its implications for Cicadidae phylogeny (Insecta: Hemiptera). Syst Biol 2022; 71:1504-1523. [PMID: 35708660 DOI: 10.1093/sysbio/syac043] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/08/2021] [Revised: 05/23/2022] [Accepted: 06/07/2022] [Indexed: 11/13/2022]
Abstract
Contamination of a genetic sample with DNA from one or more non-target species is a continuing concern of molecular phylogenetic studies, both Sanger sequencing studies and Next-Generation Sequencing (NGS) studies. We developed an automated pipeline for identifying and excluding likely cross-contaminated loci based on detection of bimodal distributions of patristic distances across gene trees. When the contamination occurs between samples within a dataset, comparisons between a contaminated sample and its contaminant taxon will yield bimodal distributions with one peak close to zero patristic distance. This new method does not rely on a priori knowledge of taxon relatedness nor does it determine the cause(s) of the contamination. Exclusion of putatively contaminated loci from a dataset generated for the insect family Cicadidae showed that these sequences were affecting some topological patterns and branch supports, although the effects were sometimes subtle, with some contamination-influenced relationships exhibiting strong bootstrap support. Long tip branches and outlier values for one anchored phylogenomic pipeline statistic (AvgNHomologs) were correlated with the presence of contamination. While the AHE markers used here, which target hemipteroid taxa, proved effective in resolving deep and shallow level Cicadidae relationships in aggregate, individual markers contained inadequate phylogenetic signal, in part probably due to short length. The cleaned dataset, consisting of 429 loci, from 90 genera representing 44 of 56 current Cicadidae tribes, supported three of the four sampled Cicadidae subfamilies in concatenated-matrix maximum likelihood (ML) and multispecies coalescent-based species tree analyses, with the fourth subfamily weakly supported in the ML trees. No well-supported patterns from previous family-level Sanger sequencing studies of Cicadidae phylogeny were contradicted. One taxon (Aragualna plenalinea) did not fall with its current subfamily in the genetic tree, and this genus and its tribe Aragualnini is reclassified to Tibicininae following morphological re-examination. Only subtle differences were observed in trees after removal of loci for which divergent base frequencies were detected. Greater success may be achieved by increased taxon sampling and developing a probe set targeting a more recent common ancestor and longer loci. Searches for contamination are an essential step in phylogenomic analyses of all kinds and our pipeline is an effective solution.
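The idea in the abstract above (a contaminated sample and its contaminant show a cluster of near-zero patristic distances alongside the normal, larger distances) can be illustrated with a rough sketch. This is not the authors' pipeline: the function name `flag_suspect_loci`, the thresholds `near_zero` and `min_gap_ratio`, and the simple two-cluster rule (used here in place of a formal bimodality test) are all assumptions made for illustration.

```python
from statistics import median


def flag_suspect_loci(distances, near_zero=0.005, min_gap_ratio=10.0):
    """Return loci whose patristic distance for one sample pair is near zero.

    distances: {locus_name: patristic distance between the two samples
                in that locus's gene tree}.
    Loci are flagged only when the remaining distances are much larger,
    i.e. when the pair shows two clearly separated clusters.
    Both thresholds are hypothetical and would need tuning on real data.
    """
    low = {locus: d for locus, d in distances.items() if d <= near_zero}
    high = [d for d in distances.values() if d > near_zero]
    if not low or not high:
        return []  # only one cluster: nothing to separate
    if median(high) < min_gap_ratio * near_zero:
        return []  # the pair is simply closely related at every locus
    return sorted(low)


pair = {"L1": 0.12, "L2": 0.0, "L3": 0.10, "L4": 0.001, "L5": 0.15}
suspects = flag_suspect_loci(pair)
```

In this toy input, two loci sit at or near zero while the others are above 0.1, so only those two are returned; a pair that is uniformly close across loci returns nothing.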
Affiliation(s)
- Christopher L Owen
- Systematic Entomology Laboratory, USDA-ARS, c/o National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- David C Marshall
- Dept. of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Elizabeth J Wade
- Dept. of Natural Science and Mathematics, Curry College, Milton, MA 02186, USA
- Russ Meister
- Dept. of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Geert Goemans
- Dept. of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Krushnamegh Kunte
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bangalore 560 065, India
- Max Moulds
- Australian Museum Research Institute, 1 William Street, Sydney, NSW 2010, Australia
- Kathy Hill
- Dept. of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- M Villet
- Dept. of Biology, Rhodes University, Grahamstown 6140, South Africa
- Thai-Hong Pham
- Mientrung Institute for Scientific Research, Vietnam Academy of Science and Technology, Hue, Vietnam; Vietnam National Museum of Nature and Graduate School of Science and Technology, Vietnam Academy of Science and Technology, Hanoi, Vietnam
- Michelle Kortyna
- Department of Biological Science, Florida State University, 319 Stadium Drive, Tallahassee, USA
- Emily Moriarty Lemmon
- Department of Biological Science, Florida State University, 319 Stadium Drive, Tallahassee, FL 32306, USA
- Alan R Lemmon
- Department of Scientific Computing, Florida State University, 400 Dirac Science Library, Tallahassee, FL 32306, USA
- Chris Simon
- Dept. of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
13
Hutter CR, Cobb KA, Portik DM, Travers SL, Wood PL, Brown RM. FrogCap: A modular sequence capture probe-set for phylogenomics and population genetics for all frogs, assessed across multiple phylogenetic scales. Mol Ecol Resour 2022; 22:1100-1119. [PMID: 34569723 DOI: 10.1111/1755-0998.13517] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/09/2021] [Revised: 09/08/2021] [Accepted: 09/14/2021] [Indexed: 12/01/2022]
Abstract
Despite the prevalence of high-throughput sequencing in phylogenetics, many relationships remain difficult to resolve because of conflicting signal among genomic regions. Selection of different types of molecular markers from different genomic regions is required to overcome these challenges. For evolutionary studies in frogs, we introduce the publicly available FrogCap suite of genomic resources, which is a large collection of ~15,000 markers that unifies previous genetic sequencing efforts. FrogCap is designed to be modular, such that subsets of markers and SNPs can be selected based on the desired phylogenetic scale. FrogCap uses a variety of marker types that include exons and introns, ultraconserved elements, and previously sequenced Sanger markers, which span up to 10,000 bp in alignment lengths; in addition, we demonstrate potential for SNP-based analyses. We tested FrogCap using 121 samples distributed across five phylogenetic scales, comparing probes designed using a consensus- or exemplar genome-based approach. Using the consensus design is more resilient to issues with sensitivity, specificity, and missing data than picking an exemplar genome sequence. We also tested the impact of different bait kit sizes (20,020 vs. 40,040) on depth of coverage and found triple the depth for the 20,020 bait kit. We observed sequence capture success (i.e., missing data, sequenced markers/bases, marker length, and informative sites) across phylogenetic scales. The incorporation of different marker types is effective for deep phylogenetic relationships and shallow population genetics studies. Having demonstrated FrogCap's utility and modularity, we conclude that these new resources are efficacious for high-throughput sequencing projects across variable timescales.
Affiliation(s)
- Carl R Hutter
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas, USA
- Kerry A Cobb
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas, USA
- Daniel M Portik
- California Academy of Sciences, San Francisco, California, USA
- Scott L Travers
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas, USA
- Department of Biological Sciences, Rutgers University-Newark, Newark, New Jersey, USA
- Perry L Wood
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas, USA
- Rafe M Brown
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas, USA
14
|
Zhu T, Flouri T, Yang Z. A simulation study to examine the impact of recombination on phylogenomic inferences under the multispecies coalescent model. Mol Ecol 2022; 31:2814-2829. [PMID: 35313033 PMCID: PMC9321900 DOI: 10.1111/mec.16433] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/17/2021] [Revised: 01/25/2022] [Accepted: 02/28/2022] [Indexed: 11/28/2022]
Affiliation(s)
- Tianqi Zhu
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
15
Gable SM, Byars MI, Literman R, Tollis M. A Genomic Perspective on the Evolutionary Diversification of Turtles. Syst Biol 2022; 71:1331-1347. [DOI: 10.1093/sysbio/syac019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/19/2021] [Revised: 02/28/2022] [Accepted: 03/01/2022] [Indexed: 11/12/2022]
Abstract
To examine phylogenetic heterogeneity in turtle evolution, we collected thousands of high-confidence single-copy orthologs from 19 genome assemblies representative of extant turtle diversity and estimated a phylogeny with multispecies coalescent and concatenated partitioned methods. We also collected next-generation sequences from 26 turtle species and assembled millions of biallelic markers to reconstruct phylogenies based on annotated regions from the western painted turtle (Chrysemys picta bellii) genome (coding regions, introns, untranslated regions, intergenic, and others). We then measured gene tree-species tree discordance, as well as gene and site heterogeneity at each node in the inferred trees, and tested for temporal patterns in phylogenomic conflict across turtle evolution. We found strong and consistent support for all bifurcations in the inferred turtle species phylogenies. However, a number of genes, sites, and genomic features supported alternate relationships between turtle taxa. Our results suggest that gene tree-species tree discordance in these datasets is likely driven by population-level processes such as incomplete lineage sorting. We found very little effect of substitutional saturation on species tree topologies, and no clear phylogenetic patterns in codon usage bias and compositional heterogeneity. There was no correlation between gene and site concordance, node age, and DNA substitution rate across most annotated genomic regions. Our study demonstrates that heterogeneity is to be expected even in well resolved clades such as turtles, and that future phylogenomic studies should aim to sample as much of the genome as possible in order to obtain accurate phylogenies for assessing conservation priorities in turtles.
Affiliation(s)
- Simone M Gable
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, PO Box 5693, Flagstaff, AZ 8601, USA
- Michael I Byars
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, PO Box 5693, Flagstaff, AZ 8601, USA
- Robert Literman
- Department of Biological Sciences, University of Rhode Island, 120 Flagg Road, Kingstown, RI, 0288, USA
- Marc Tollis
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, PO Box 5693, Flagstaff, AZ 8601, USA
16
OUP accepted manuscript. Syst Biol 2022; 71:973-985. [DOI: 10.1093/sysbio/syac014] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/26/2021] [Revised: 02/15/2022] [Accepted: 02/22/2022] [Indexed: 11/12/2022]
17
Bangs MR, Steppan SJ. A rodent anchored hybrid enrichment probe set for a range of phylogenetic utility: From order to species. Mol Ecol Resour 2021; 22:1521-1528. [PMID: 34800355 DOI: 10.1111/1755-0998.13555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2021] [Revised: 09/22/2021] [Accepted: 11/10/2021] [Indexed: 11/29/2022]
Abstract
Rodents are the largest order of mammals and contain several model organisms important to scientific research in a variety of fields, yet no large set of genomic markers have been designed for this group to date, hindering evolutionary studies into relationships of the group as a whole. Here we present a genomic probe set designed and optimized for rodents with a protocol that is easy to replicate with little laboratory investment. This design utilizes an anchored hybrid enrichment approach specifically targeting rodents to generate longer loci with a higher substitution rate than existing vertebrate probes to provide utility at various taxonomic levels. Using a test set of rodents from all five suborders, we successfully obtained alignments for 416 of the 418 target loci with an average of 1379 bp per locus and a total alignment of more than half a million base pairs. This genomic data set performed well in all phylogenetic analyses, especially in recent phylogenetic splits, with ample parsimony-informative sites within genera and even within species, showing more than four times as many single nucleotide polymorphisms per locus than a recent vertebrate ultraconserved elements study. Additional support is provided in resolving deeper clades in Rodentia. By providing this probe design, we hope that more laboratories can easily generate data for answering questions in rodents from species delimitation to understanding relationships among families in rapid radiations.
Affiliation(s)
- Max R Bangs
- Department of Biological Sciences, Florida State University, Tallahassee, Florida, USA
- Scott J Steppan
- Department of Biological Sciences, Florida State University, Tallahassee, Florida, USA
18
Duchêne DA, Mather N, Van Der Wal C, Ho SYW. Excluding loci with substitution saturation improves inferences from phylogenomic data. Syst Biol 2021; 71:676-689. [PMID: 34508605 PMCID: PMC9016599 DOI: 10.1093/sysbio/syab075] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 04/19/2020] [Accepted: 09/07/2021] [Indexed: 11/21/2022]
Abstract
The historical signal in nucleotide sequences becomes eroded over time by substitutions occurring repeatedly at the same sites. This phenomenon, known as substitution saturation, is recognized as one of the primary obstacles to deep-time phylogenetic inference using genome-scale data sets. We present a new test of substitution saturation and demonstrate its performance in simulated and empirical data. For some of the 36 empirical phylogenomic data sets that we examined, we detect substitution saturation in around 50% of loci. We found that saturation tends to be flagged as problematic in loci with highly discordant phylogenetic signals across sites. Within each data set, the loci with smaller numbers of informative sites are more likely to be flagged as containing problematic levels of saturation. The entropy saturation test proposed here is sensitive to high evolutionary rates relative to the evolutionary timeframe, while also being sensitive to several factors known to mislead phylogenetic inference, including short internal branches relative to external branches, short nucleotide sequences, and tree imbalance. Our study demonstrates that excluding loci with substitution saturation can be an effective means of mitigating the negative impact of multiple substitutions on phylogenetic inferences. [Phylogenetic model performance; phylogenomics; substitution model; substitution saturation; test statistics.]
Affiliation(s)
- David A Duchêne
- Centre for Evolutionary Hologenomics, University of Copenhagen, 1352 Copenhagen, Denmark
- Niklas Mather
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Cara Van Der Wal
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Simon Y W Ho
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
19
Alda F, Ludt WB, Elías DJ, McMahan CD, Chakrabarty P. Comparing Ultraconserved Elements and Exons for Phylogenomic Analyses of Middle American Cichlids: When Data Agree to Disagree. Genome Biol Evol 2021; 13:evab161. [PMID: 34272856 PMCID: PMC8369075 DOI: 10.1093/gbe/evab161] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Accepted: 07/05/2021] [Indexed: 12/20/2022]
Abstract
Choosing among types of genomic markers to be used in a phylogenomic study can have a major influence on the cost, design, and results of a study. Yet few attempts have been made to compare categories of next-generation sequence markers limiting our ability to compare the suitability of these different genomic fragment types. Here, we explore properties of different genomic markers to find if they vary in the accuracy of component phylogenetic trees and to clarify the causes of conflict obtained from different data sets or inference methods. As a test case, we explore the causes of discordance between phylogenetic hypotheses obtained using a novel data set of ultraconserved elements (UCEs) and a recently published exon data set of the cichlid tribe Heroini. Resolving relationships among heroine cichlids has historically been difficult, and the processes of colonization and diversification in Middle America and the Greater Antilles are not yet well understood. Despite differences in informativeness and levels of gene tree discordance between UCEs and exons, the resulting phylogenomic hypotheses generally agree on most relationships. The independent data sets disagreed in areas with low phylogenetic signal that were overwhelmed by incomplete lineage sorting and nonphylogenetic signals. For UCEs, high levels of incomplete lineage sorting were found to be the major cause of gene tree discordance, whereas, for exons, nonphylogenetic signal is most likely caused by a reduced number of highly informative loci. This paucity of informative loci in exons might be due to heterogeneous substitution rates that are problematic to model (i.e., computationally restrictive) resulting in systematic errors that UCEs (being less informative individually but more uniform) are less prone to. These results generally demonstrate the robustness of phylogenomic methods to accommodate genomic markers with different biological and phylogenetic properties. 
However, we identify common and unique pitfalls of different categories of genomic fragments when inferring enigmatic phylogenetic relationships.
Affiliation(s)
- Fernando Alda
- Department of Biology, Geology and Environmental Science, University of Tennessee at Chattanooga, Tennessee, USA
- William B Ludt
- Department of Ichthyology, Natural History Museum of Los Angeles County, Los Angeles, California, USA
- Diego J Elías
- Museum of Natural Science, Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, USA
- Prosanta Chakrabarty
- Museum of Natural Science, Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, USA
20
Huang J, Bennett J, Flouri T, Leaché AD, Yang Z. Phase Resolution of Heterozygous Sites in Diploid Genomes is Important to Phylogenomic Analysis under the Multispecies Coalescent Model. Syst Biol 2021; 71:334-352. [PMID: 34143216 PMCID: PMC8977997 DOI: 10.1093/sysbio/syab047] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/28/2020] [Revised: 06/03/2021] [Accepted: 06/21/2021] [Indexed: 01/01/2023]
Abstract
Genome sequencing projects routinely generate haploid consensus sequences from diploid genomes, which are effectively chimeric sequences with the phase at heterozygous sites resolved at random. The impact of phasing errors on phylogenomic analyses under the multispecies coalescent (MSC) model is largely unknown. Here, we conduct a computer simulation to evaluate the performance of four phase-resolution strategies (the true phase resolution, the diploid analytical integration algorithm which averages over all phase resolutions, computational phase resolution using the program PHASE, and random resolution) on estimation of the species tree and evolutionary parameters in analysis of multilocus genomic data under the MSC model. We found that species tree estimation is robust to phasing errors when species divergences were much older than average coalescent times but may be affected by phasing errors when the species tree is shallow. Estimation of parameters under the MSC model with and without introgression is affected by phasing errors. In particular, random phase resolution causes serious overestimation of population sizes for modern species and biased estimation of cross-species introgression probability. In general, the impact of phasing errors is greater when the mutation rate is higher, the data include more samples per species, and the species tree is shallower with recent divergences. Use of phased sequences inferred by the PHASE program produced small biases in parameter estimates. We analyze two real data sets, one of East Asian brown frogs and another of Rocky Mountains chipmunks, to demonstrate that heterozygote phase-resolution strategies have similar impacts on practical data analyses. We suggest that genome sequencing projects should produce unphased diploid genotype sequences if fully phased data are too challenging to generate, and avoid haploid consensus sequences, which have heterozygous sites phased at random. In case the analytical integration algorithm is computationally unfeasible, computational phasing prior to population genomic analyses is an acceptable alternative. [BPP; introgression; multispecies coalescent; phase; species tree.]
Affiliation(s)
- Jun Huang
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK; Department of Mathematics, Beijing Jiaotong University, Beijing, 100044, P.R. China
- Jeremy Bennett
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK; Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, Unit 3043, Storrs, CT 06269-3043, USA
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Adam D Leaché
- Department of Biology & Burke Museum of Natural History and Culture, University of Washington, Seattle, WA 98195-1800, USA
- Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
21
Van Dam MH, Henderson JB, Esposito L, Trautwein M. Genomic Characterization and Curation of UCEs Improves Species Tree Reconstruction. Syst Biol 2020; 70:307-321. [PMID: 32750133 PMCID: PMC7875437 DOI: 10.1093/sysbio/syaa063] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/16/2019] [Revised: 07/26/2020] [Accepted: 07/29/2020] [Indexed: 12/12/2022]
Abstract
Ultraconserved genomic elements (UCEs) are generally treated as independent loci in phylogenetic analyses. The identification pipeline for UCE probes does not require prior knowledge of genetic identity, only selecting loci that are highly conserved, single copy, without repeats, and of a particular length. Here, we characterized UCEs from 11 phylogenomic studies across the animal tree of life, from birds to marine invertebrates. We found that within vertebrate lineages, UCEs are mostly intronic and intergenic, while in invertebrates, the majority are in exons. We then curated four different sets of UCE markers by genomic category from five different studies including: birds, mammals, fish, Hymenoptera (ants, wasps, and bees), and Coleoptera (beetles). Of genes captured by UCEs, we find that many are represented by two or more UCEs, corresponding to nonoverlapping segments of a single gene. We considered these UCEs to be nonindependent, merged all UCEs that belonged to a particular gene, constructed gene and species trees, and then evaluated the subsequent effect of merging cogenic UCEs on gene and species tree reconstruction. Average bootstrap support for merged UCE gene trees was significantly improved across all data sets apparently driven by the increase in loci length. Additionally, we conducted simulations and found that gene trees generated from merged UCEs were more accurate than those generated by unmerged UCEs. As loci length improves gene tree accuracy, this modest degree of UCE characterization and curation impacts downstream analyses and demonstrates the advantages of incorporating basic genomic characterizations into phylogenomic analyses. [Anchored hybrid enrichment; ants; ASTRAL; bait capture; carangimorph; Coleoptera; conserved nonexonic elements; exon capture; gene tree; Hymenoptera; mammal; phylogenomic markers; songbird; species tree; ultraconserved elements; weevils.]
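The merging step described in the abstract above (treating UCEs that fall in the same gene as one locus) can be sketched as a simple concatenation. This is an illustrative sketch only, not the authors' code: the function name `merge_cogenic_uces`, the dictionary-based alignment format, and the choice to pad missing taxa with gap characters are assumptions.

```python
from collections import defaultdict


def merge_cogenic_uces(alignments, uce_to_gene):
    """Concatenate UCE alignments that map to the same gene.

    alignments:  {uce_id: {taxon: aligned_sequence}}; every sequence
                 within one UCE alignment has the same length.
    uce_to_gene: {uce_id: gene_id}; UCEs without an entry (for example
                 intergenic ones) are kept as their own locus.
    Returns {locus_id: {taxon: concatenated_sequence}}.
    """
    groups = defaultdict(list)
    for uce_id in sorted(alignments):  # fixed order keeps output stable
        groups[uce_to_gene.get(uce_id, uce_id)].append(uce_id)

    merged = {}
    for locus, uce_ids in groups.items():
        taxa = sorted({t for u in uce_ids for t in alignments[u]})
        merged[locus] = {t: "" for t in taxa}
        for u in uce_ids:
            aln = alignments[u]
            length = len(next(iter(aln.values())))
            for t in taxa:
                # Taxa missing from this UCE are padded with gaps.
                merged[locus][t] += aln.get(t, "-" * length)
    return merged


alignments = {
    "uce1": {"A": "ACGT", "B": "ACGA"},
    "uce2": {"A": "TT"},
    "uce3": {"A": "GG", "B": "GC"},
}
merged = merge_cogenic_uces(alignments, {"uce1": "geneX", "uce2": "geneX"})
```

Here `uce1` and `uce2` are combined into a single, longer `geneX` locus, while `uce3` stays separate; gene trees would then be estimated per merged locus.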
Affiliation(s)
- Matthew H Van Dam
- Entomology Department, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA; Center for Comparative Genomics, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA
- James B Henderson
- Center for Comparative Genomics, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA
- Lauren Esposito
- Entomology Department, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA; Center for Comparative Genomics, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA
- Michelle Trautwein
- Entomology Department, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA; Center for Comparative Genomics, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA
22
Owen CL, Stern DB, Hilton SK, Crandall KA. Hemiptera phylogenomic resources: Tree‐based orthology prediction and conserved exon identification. Mol Ecol Resour 2020; 20:1346-1360. [DOI: 10.1111/1755-0998.13180] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/09/2018] [Revised: 04/02/2020] [Accepted: 04/27/2020] [Indexed: 12/21/2022]
Affiliation(s)
- Christopher L. Owen
- Computational Biology Institute George Washington University Washington DC USA
- Systematic Entomology Laboratory USDA‐ARS Beltsville MD USA
| | - David B. Stern
- Computational Biology Institute George Washington University Washington DC USA
- Department of Integrative Biology University of Wisconsin ‐ Madison Madison WI USA
| | - Sarah K. Hilton
- Computational Biology Institute George Washington University Washington DC USA
- Department of Genome Sciences University of Washington Washington DC USA
| | - Keith A. Crandall
- Computational Biology Institute George Washington University Washington DC USA
| |
|
23
|
Huang J, Flouri T, Yang Z. A Simulation Study to Examine the Information Content in Phylogenomic Data Sets under the Multispecies Coalescent Model. Mol Biol Evol 2020; 37:3211-3224. [DOI: 10.1093/molbev/msaa166] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/05/2023]
Abstract
We use computer simulation to examine the information content in multilocus data sets for inference under the multispecies coalescent model. Inference problems considered include estimation of evolutionary parameters (such as species divergence times, population sizes, and cross-species introgression probabilities), species tree estimation, and species delimitation based on Bayesian comparison of delimitation models. We found that the number of loci is the most influential factor for almost all inference problems examined. Although the number of sequences per species does not appear to be important to species tree estimation, it is very influential to species delimitation. Increasing the number of sites and the per-site mutation rate both increase the mutation rate for the whole locus and these have the same effect on estimation of parameters, but the sequence length has a greater effect than the per-site mutation rate for species tree estimation. We discuss the computational costs when the data size increases and provide guidelines concerning the subsampling of genomic data to enable the application of full-likelihood methods of inference.
Affiliation(s)
- Jun Huang
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Department of Mathematics, Beijing Jiaotong University, Beijing, P.R. China
| | - Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
| | - Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
| |
|
24
|
Jiao X, Yang Z. Defining Species When There is Gene Flow. Syst Biol 2020; 70:108-119. [PMID: 32617579 DOI: 10.1093/sysbio/syaa052] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 01/27/2020] [Revised: 06/23/2020] [Accepted: 06/23/2020] [Indexed: 12/20/2022]
Abstract
Whatever one's definition of species, it is generally expected that individuals of the same species should be genetically more similar to each other than they are to individuals of another species. Here, we show that in the presence of cross-species gene flow, this expectation may be incorrect. We use the multispecies coalescent model with continuous-time migration or episodic introgression to study the impact of gene flow on genetic differences within and between species and highlight a surprising but plausible scenario in which different population sizes and asymmetrical migration rates cause a genetic sequence to be on average more closely related to a sequence from another species than to a sequence from the same species. Our results highlight the extraordinary impact that even a small amount of gene flow may have on the genetic history of the species. We suggest that contrasting long-term migration rate and short-term hybridization rate, both of which can be estimated using genetic data, may be a powerful approach to detecting the presence of reproductive barriers and to define species boundaries. [Gene flow; introgression; migration; multispecies coalescent; species concept; species delimitation.]
Affiliation(s)
- Xiyun Jiao
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
| | - Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
| |
|
25
|
Wcisel DJ, Howard JT, Yoder JA, Dornburg A. Transcriptome Ortholog Alignment Sequence Tools (TOAST) for phylogenomic dataset assembly. BMC Evol Biol 2020; 20:41. [PMID: 32228442 PMCID: PMC7106827 DOI: 10.1186/s12862-020-01603-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/03/2019] [Accepted: 03/11/2020] [Indexed: 01/05/2023]
Abstract
Background: Advances in next-generation sequencing technologies have reduced the cost of whole transcriptome analyses, allowing characterization of non-model species at unprecedented levels. The rapid pace of transcriptomic sequencing has driven the public accumulation of a wealth of data for phylogenomic analyses; however, lack of tools aimed towards phylogeneticists to efficiently identify orthologous sequences currently hinders effective harnessing of this resource. Results: We introduce TOAST, an open source R software package that can utilize the ortholog searches based on the software Benchmarking Universal Single-Copy Orthologs (BUSCO) to assemble multiple sequence alignments of orthologous loci from transcriptomes for any group of organisms. By streamlining search, query, and alignment, TOAST automates the generation of locus and concatenated alignments, and also presents a series of outputs from which users can not only explore missing data patterns across their alignments, but also reassemble alignments based on user-defined acceptable missing data levels for a given research question. Conclusions: TOAST provides a comprehensive set of tools for assembly of sequence alignments of orthologs for comparative transcriptomic and phylogenomic studies. This software empowers easy assembly of public and novel sequences for any target database of candidate orthologs, and fills a critically needed niche for tools that enable quantification and testing of the impact of missing data. As open-source software, TOAST is fully customizable for integration into existing or novel custom informatic pipelines for phylogenomic inference. Software, a detailed manual, and example data files are available through GitHub at carolinafishes.github.io.
Affiliation(s)
- Dustin J Wcisel
- Department of Molecular Biomedical Sciences, NC State University, Raleigh, NC, USA
| | - J Thomas Howard
- Department of Molecular Biomedical Sciences, NC State University, Raleigh, NC, USA
| | - Jeffrey A Yoder
- Department of Molecular Biomedical Sciences, NC State University, Raleigh, NC, USA
- Comparative Medicine Institute, NC State University, Raleigh, NC, USA
- Center for Human Health and the Environment, NC State University, Raleigh, NC, USA
| | - Alex Dornburg
- Department of Molecular Biomedical Sciences, NC State University, Raleigh, NC, USA
| |
|