1
Dai J, Rubel T, Han Y, Molloy EK. Dollo-CDP: a polynomial-time algorithm for the clade-constrained large Dollo parsimony problem. Algorithms Mol Biol 2024; 19:2. [PMID: 38191515 PMCID: PMC10775561 DOI: 10.1186/s13015-023-00249-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 12/10/2023] [Indexed: 01/10/2024] [Open access]
Abstract
The last decade of phylogenetics has seen the development of many methods that leverage constraints plus dynamic programming. The goal of this algorithmic technique is to produce a phylogeny that is optimal with respect to some objective function and that lies within a constrained version of tree space. The popular species tree estimation method ASTRAL, for example, returns a tree that (1) maximizes the quartet score computed with respect to the input gene trees and that (2) draws its branches (bipartitions) from the input constraint set. This technique has yet to be used for parsimony problems where the input are binary characters, sometimes with missing values. Here, we introduce the clade-constrained character parsimony problem and present an algorithm that solves this problem for the Dollo criterion score in [Formula: see text] time, where n is the number of leaves, k is the number of characters, and [Formula: see text] is the set of clades used as constraints. Dollo parsimony, which requires traits/mutations to be gained at most once but allows them to be lost any number of times, is widely used for tumor phylogenetics as well as species phylogenetics, for example analyses of low-homoplasy retroelement insertions across the vertebrate tree of life. This motivated us to implement our algorithm in a software package, called Dollo-CDP, and evaluate its utility for analyzing retroelement insertion presence / absence patterns for bats, birds, toothed whales as well as simulated data. Our results show that Dollo-CDP can improve upon heuristic search from a single starting tree, often recovering a better scoring tree. Moreover, Dollo-CDP scales to data sets with much larger numbers of taxa than branch-and-bound while still having an optimality guarantee, albeit a more restricted one. Lastly, we show that our algorithm for Dollo parsimony can easily be adapted to Camin-Sokal parsimony but not Fitch parsimony.
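The Dollo criterion described in this abstract (one gain at most, any number of losses) can be illustrated by scoring a single binary character on one fixed rooted tree. The following is only a minimal sketch under stated assumptions, not the Dollo-CDP algorithm, which searches a constrained tree space. The nested-tuple tree encoding, the function names `count_states` and `dollo_losses`, and the choice to ignore missing values (`None`) are all hypothetical conventions introduced here for illustration.

```python
def count_states(node, states):
    """Return (ones, zeros) over the leaves below node; missing states (None) are ignored."""
    if isinstance(node, str):  # a leaf is represented by its taxon name
        s = states.get(node)
        if s == 1:
            return 1, 0
        if s == 0:
            return 0, 1
        return 0, 0
    ones = zeros = 0
    for child in node:
        o, z = count_states(child, states)
        ones += o
        zeros += z
    return ones, zeros


def dollo_losses(tree, states):
    """Count the losses needed for one binary character under the Dollo criterion."""
    total_ones, _ = count_states(tree, states)
    if total_ones == 0:
        return 0  # the trait is never gained, so nothing can be lost

    # The single gain is placed at the lowest node whose clade holds every 1-leaf.
    node = tree
    while not isinstance(node, str):
        holders = [c for c in node if count_states(c, states)[0] == total_ones]
        if not holders:
            break
        node = holders[0]

    # Below the gain, each maximal clade with no 1-leaf and at least one 0-leaf is one loss.
    def losses(n):
        ones, zeros = count_states(n, states)
        if ones == 0:
            return 1 if zeros > 0 else 0
        if isinstance(n, str):
            return 0
        return sum(losses(c) for c in n)

    return losses(node)


# Hypothetical five-taxon example: the trait is present in A and C only.
tree = ((("A", "B"), "C"), ("D", "E"))
example = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 0}
print(dollo_losses(tree, example))
```

In this example the gain falls on the ancestor of A, B, and C, so only B requires a loss; D and E lie outside the gain clade and cost nothing. The sketch recomputes leaf counts at every node for readability, which is quadratic in the worst case; a single post-order pass would avoid that.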
Affiliation(s)
- Junyan Dai
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Tobias Rubel
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Yunheng Han
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Erin K Molloy
  - Department of Computer Science, University of Maryland, College Park, MD, USA
  - University of Maryland Institute for Advanced Computer Studies, College Park, MD, USA
2
Patané JSL, Martins J, Setubal JC. A Guide to Phylogenomic Inference. Methods Mol Biol 2024; 2802:267-345. [PMID: 38819564 DOI: 10.1007/978-1-0716-3838-5_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. Phylogenomics has significant applications in fields such as evolutionary biology, systematics, comparative genomics, and conservation genetics, providing valuable insights into the origins and relationships of species and contributing to our understanding of biological diversity and evolution. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
  - Laboratório de Genética e Cardiologia Molecular, Instituto do Coração/Heart Institute Hospital das Clínicas - Faculdade de Medicina da Universidade de São Paulo São Paulo, São Paulo, SP, Brazil
- Joaquim Martins
  - Integrative Omics group, Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, SP, Brazil
- João Carlos Setubal
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil
3
Kosushkin S, Korchagin V, Vergun A, Ryskov A. Interspecific Comparison of Orthologous Short Interspersed Elements Loci Using Whole-Genome Data. Genes (Basel) 2023; 14:2089. [PMID: 38003031 PMCID: PMC10670947 DOI: 10.3390/genes14112089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 11/08/2023] [Accepted: 11/15/2023] [Indexed: 11/26/2023] [Open access]
Abstract
The polymorphism of SINE-containing loci reflects the evolutionary processes that occurred both during the period before the divergence of the taxa and after it. Orthologous loci containing SINE in two or more genomes indicate the relatedness of the taxa, while different copies may have a specific set of mutations and degree of difference. Polymorphic insertion can be interpreted with a high degree of confidence as a shared derived character in the phylogenetic reconstruction of the history of the taxon. The computational comparison of the entire set of SINE-containing loci between genomes is a challenging task, and we propose to consider it in detail using the genomes of representatives of squamate reptiles (lizards) as an example. Our approach allows us to extract copies of SINE from the genomes, find pairwise orthologous loci by using flanking genomic sequences, and analyze the resulting sets of loci for the presence or absence of SINE, the degree of similarity of the flanks, and the similarity of the SINE themselves. The workflow we propose allows us to efficiently extract and analyze orthologous SINE loci for the downstream analysis, as shown in our comparison of species- and genus-level taxa in lacertid lizards.
Affiliation(s)
- Sergei Kosushkin
  - Laboratory of Genome Organization, Institute of Gene Biology of the Russian Academy of Sciences, Vavilova Str., 34/5, Moscow 119334, Russia
- Vitaly Korchagin
  - Laboratory of Genome Organization, Institute of Gene Biology of the Russian Academy of Sciences, Vavilova Str., 34/5, Moscow 119334, Russia
- Andrey Vergun
  - Laboratory of Genome Organization, Institute of Gene Biology of the Russian Academy of Sciences, Vavilova Str., 34/5, Moscow 119334, Russia
  - Department of Biochemistry, Molecular Biology and Genetics, Moscow Pedagogical State University, 1/1 M. Pirogovskaya Str., Moscow 119991, Russia
- Alexey Ryskov
  - Laboratory of Genome Organization, Institute of Gene Biology of the Russian Academy of Sciences, Vavilova Str., 34/5, Moscow 119334, Russia
4
Han S, Dias GB, Basting PJ, Nelson MG, Patel S, Marzo M, Bergman CM. Ongoing transposition in cell culture reveals the phylogeny of diverse Drosophila S2 sublines. Genetics 2022; 221:iyac077. [PMID: 35536183 PMCID: PMC9252272 DOI: 10.1093/genetics/iyac077] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/10/2022] [Accepted: 04/28/2022] [Indexed: 11/13/2022] [Open access]
Abstract
Cultured cells are widely used in molecular biology despite poor understanding of how cell line genomes change in vitro over time. Previous work has shown that Drosophila cultured cells have a higher transposable element content than whole flies, but whether this increase in transposable element content resulted from an initial burst of transposition during cell line establishment or ongoing transposition in cell culture remains unclear. Here, we sequenced the genomes of 25 sublines of Drosophila S2 cells and show that transposable element insertions provide abundant markers for the phylogenetic reconstruction of diverse sublines in a model animal cell culture system. DNA copy number evolution across S2 sublines revealed dramatically different patterns of genome organization that support the overall evolutionary history reconstructed using transposable element insertions. Analysis of transposable element insertion site occupancy and ancestral states support a model of ongoing transposition dominated by episodic activity of a small number of retrotransposon families. Our work demonstrates that substantial genome evolution occurs during long-term Drosophila cell culture, which may impact the reproducibility of experiments that do not control for subline identity.
Affiliation(s)
- Shunhua Han
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Guilherme B Dias
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
  - Department of Genetics, University of Georgia, Athens, GA 30602, USA
- Preston J Basting
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Michael G Nelson
  - Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK
- Sanjai Patel
  - Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK
- Mar Marzo
  - Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, UK
- Casey M Bergman
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
  - Department of Genetics, University of Georgia, Athens, GA 30602, USA
5
Loh JW, Ha H, Lin T, Sun N, Burns KH, Xing J. Integrated Mobile Element Scanning (ME-Scan) method for identifying multiple types of polymorphic mobile element insertions. Mob DNA 2020; 11:12. [PMID: 32110248 PMCID: PMC7035633 DOI: 10.1186/s13100-020-00207-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/28/2019] [Accepted: 02/14/2020] [Indexed: 01/29/2023] [Open access]
Abstract
Background Mobile elements are ubiquitous components of mammalian genomes and constitute more than half of the human genome. Polymorphic mobile element insertions (pMEIs) are a major source of human genomic variation and are gaining research interest because of their involvement in gene expression regulation, genome integrity, and disease. Results Building on our previous Mobile Element Scanning (ME-Scan) protocols, we developed an integrated ME-Scan protocol to identify three major active families of human mobile elements, AluYb, L1HS, and SVA. This approach selectively amplifies insertion sites of currently active retrotransposons for Illumina sequencing. By pooling the libraries together, we can identify pMEIs from all three mobile element families in one sequencing run. To demonstrate the utility of the new ME-Scan protocol, we sequenced 12 human parent-offspring trios. Our results showed high sensitivity (> 90%) and accuracy (> 95%) of the protocol for identifying pMEIs in the human genome. In addition, we also tested the feasibility of identifying somatic insertions using the protocol. Conclusions The integrated ME-Scan protocol is a cost-effective way to identify novel pMEIs in the human genome. In addition, by developing the protocol to detect three mobile element families, we demonstrate the flexibility of the ME-Scan protocol. We present instructions for the library design, a sequencing protocol, and a computational pipeline for downstream analyses as a complete framework that will allow researchers to easily adapt the ME-Scan protocol to their own projects in other genomes.
Affiliation(s)
- Jui Wan Loh
  - Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ 08854 USA
- Hongseok Ha
  - Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ 08854 USA
  - Human Genetic Institute of New Jersey, Rutgers, the State University of New Jersey, Piscataway, 08854 NJ USA
- Timothy Lin
  - Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ 08854 USA
- Nawei Sun
  - Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ 08854 USA
  - Human Genetic Institute of New Jersey, Rutgers, the State University of New Jersey, Piscataway, 08854 NJ USA
- Kathleen H Burns
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, 21205 MD USA
- Jinchuan Xing
  - Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ 08854 USA
  - Human Genetic Institute of New Jersey, Rutgers, the State University of New Jersey, Piscataway, 08854 NJ USA
6
Platt RN, Faircloth BC, Sullivan KAM, Kieran TJ, Glenn TC, Vandewege MW, Lee TE, Baker RJ, Stevens RD, Ray DA. Conflicting Evolutionary Histories of the Mitochondrial and Nuclear Genomes in New World Myotis Bats. Syst Biol 2018; 67:236-249. [PMID: 28945862 PMCID: PMC5837689 DOI: 10.1093/sysbio/syx070] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 03/17/2017] [Revised: 07/31/2017] [Accepted: 08/15/2017] [Indexed: 01/05/2023] [Open access]
Abstract
The rapid diversification of Myotis bats into more than 100 species is one of the most extensive mammalian radiations available for study. Efforts to understand relationships within Myotis have primarily utilized mitochondrial markers and trees inferred from nuclear markers lacked resolution. Our current understanding of relationships within Myotis is therefore biased towards a set of phylogenetic markers that may not reflect the history of the nuclear genome. To resolve this, we sequenced the full mitochondrial genomes of 37 representative Myotis, primarily from the New World, in conjunction with targeted sequencing of 3648 ultraconserved elements (UCEs). We inferred the phylogeny and explored the effects of concatenation and summary phylogenetic methods, as well as combinations of markers based on informativeness or levels of missing data, on our results. Of the 294 phylogenies generated from the nuclear UCE data, all are significantly different from phylogenies inferred using mitochondrial genomes. Even within the nuclear data, quartet frequencies indicate that around half of all UCE loci conflict with the estimated species tree. Several factors can drive such conflict, including incomplete lineage sorting, introgressive hybridization, or even phylogenetic error. Despite the degree of discordance between nuclear UCE loci and the mitochondrial genome and among UCE loci themselves, the most common nuclear topology is recovered in one quarter of all analyses with strong nodal support. Based on these results, we re-examine the evolutionary history of Myotis to better understand the phenomena driving their unique nuclear, mitochondrial, and biogeographic histories.
Affiliation(s)
- Roy N Platt
  - Department of Biological Sciences, Texas Tech University, 2901 Main St, Lubbock, TX, USA
- Brant C Faircloth
  - Department of Biological Sciences and Museum of Natural Science, Louisiana State University, 202 Life Science Building, Baton Rouge, LA, USA
- Kevin A M Sullivan
  - Department of Biological Sciences, Texas Tech University, 2901 Main St, Lubbock, TX, USA
- Troy J Kieran
  - Department of Environmental Health Science, University of Georgia, 206 Environmental Health Sciences Building, Athens, GA, USA
- Travis C Glenn
  - Department of Environmental Health Science, University of Georgia, 206 Environmental Health Sciences Building, Athens, GA, USA
- Michael W Vandewege
  - Department of Biological Sciences, Texas Tech University, 2901 Main St, Lubbock, TX, USA
- Thomas E Lee
  - Department of Biology, Abilene Christian University, 1600 Campus Ct. Abilene, TX, USA
- Robert J Baker
  - Department of Biological Sciences, Texas Tech University, 2901 Main St, Lubbock, TX, USA
- Richard D Stevens
  - Natural Resource Management, Texas Tech University, 2901 Main St, Lubbock, TX, USA
- David A Ray
  - Department of Biological Sciences, Texas Tech University, 2901 Main St, Lubbock, TX, USA
7
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. The abundance of genomic data for an enormous variety of organisms has enabled phylogenomic inference of many groups, and this has motivated the development of many computer programs implementing the associated methods. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
  - Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil
- Joaquim Martins
  - Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil
- João C Setubal
  - Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil
8
Lammers F, Gallus S, Janke A, Nilsson MA. Phylogenetic Conflict in Bears Identified by Automated Discovery of Transposable Element Insertions in Low-Coverage Genomes. Genome Biol Evol 2017; 9:2862-2878. [PMID: 28985298 PMCID: PMC5737362 DOI: 10.1093/gbe/evx170] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Accepted: 08/28/2017] [Indexed: 12/15/2022] [Open access]
Abstract
Phylogenetic reconstruction from transposable elements (TEs) offers an additional perspective to study evolutionary processes. However, detecting phylogenetically informative TE insertions requires tedious experimental work, limiting the power of phylogenetic inference. Here, we analyzed the genomes of seven bear species using high-throughput sequencing data to detect thousands of TE insertions. The newly developed pipeline for TE detection called TeddyPi (TE detection and discovery for Phylogenetic Inference) identified 150,513 high-quality TE insertions in the genomes of ursine and tremarctine bears. By integrating different TE insertion callers and using a stringent filtering approach, the TeddyPi pipeline produced highly reliable TE insertion calls, which were confirmed by extensive in vitro validation experiments. Analysis of single nucleotide substitutions in the flanking regions of the TEs shows that these substitutions correlate with the phylogenetic signal from the TE insertions. Our phylogenomic analyses show that TEs are a major driver of genomic variation in bears and enabled phylogenetic reconstruction of a well-resolved species tree, despite strong signals for incomplete lineage sorting and introgression. The analyses show that the Asiatic black, sun, and sloth bear form a monophyletic clade, in which phylogenetic incongruence originates from incomplete lineage sorting. TeddyPi is open source and can be adapted to various TE and structural variation callers. The pipeline makes it possible to confidently extract thousands of TE insertions even from low-coverage genomes (∼10×) of nonmodel organisms. This opens new possibilities for biologists to study phylogenies and evolutionary processes as well as rates and patterns of (retro-)transposition and structural variation.
Affiliation(s)
- Fritjof Lammers
  - Senckenberg Biodiversity and Climate Research Centre, Senckenberg Gesellschaft für Naturforschung, Frankfurt am Main, Germany
  - Institute for Ecology, Evolution & Diversity, Biologicum, Goethe University Frankfurt, Frankfurt am Main, Germany
- Susanne Gallus
  - Senckenberg Biodiversity and Climate Research Centre, Senckenberg Gesellschaft für Naturforschung, Frankfurt am Main, Germany
- Axel Janke
  - Senckenberg Biodiversity and Climate Research Centre, Senckenberg Gesellschaft für Naturforschung, Frankfurt am Main, Germany
  - Institute for Ecology, Evolution & Diversity, Biologicum, Goethe University Frankfurt, Frankfurt am Main, Germany
- Maria A. Nilsson
  - Senckenberg Biodiversity and Climate Research Centre, Senckenberg Gesellschaft für Naturforschung, Frankfurt am Main, Germany
9
Sullivan KAM, Platt RN, Bradley RD, Ray DA. Whole mitochondrial genomes provide increased resolution and indicate paraphyly in deer mice. BMC Zool 2017. [DOI: 10.1186/s40850-017-0020-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 01/18/2023] [Open access]
10
Zhang S, Kelleher ES. Targeted identification of TE insertions in a Drosophila genome through hemi-specific PCR. Mob DNA 2017; 8:10. [PMID: 28775768 PMCID: PMC5534036 DOI: 10.1186/s13100-017-0092-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/14/2017] [Accepted: 07/10/2017] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND Transposable elements (TEs) are major components of eukaryotic genomes and drivers of genome evolution, producing intraspecific polymorphism and interspecific differences through mobilization and non-homologous recombination. TE insertion sites are often highly variable within species, creating a need for targeted genome re-sequencing (TGS) methods to identify TE insertion sites. METHODS We present a hemi-specific PCR approach for TGS of P-elements in Drosophila genomes on the Illumina platform. We also present a computational framework for identifying new insertions from TGS reads. Finally, we describe a new method for estimating the frequency of TE insertions from WGS data, which is based on precise insertion sites provided by TGS annotations. RESULTS By comparing our results to TE annotations based on whole genome re-sequencing (WGS) data for the same Drosophila melanogaster strain, we demonstrate that TGS is powerful for identifying true insertions, even in repeat-rich heterochromatic regions. We also demonstrate that TGS offers enhanced annotation of precise insertion sites, which facilitates estimation of TE insertion frequency. CONCLUSIONS TGS by hemi-specific PCR is a powerful approach for identifying TE insertions of particular TE families in species with a high-quality reference genome, at greatly reduced cost as compared to WGS. It may therefore be ideal for population genomic studies of particular TE families. Additionally, TGS and WGS can be used as complementary approaches, with TGS annotations identifying more annotated insertions with greater precision for a target TE family, and WGS data allowing for estimates of TE insertion frequencies, and a broader picture of the location of non-target TEs across the genome.
Affiliation(s)
- Shuo Zhang
  - Department of Biology and Biochemistry, University of Houston, 3455 Cullen Blvd. Suite 342, Houston, TX 77204 USA
- Erin S. Kelleher
  - Department of Biology and Biochemistry, University of Houston, 3455 Cullen Blvd. Suite 342, Houston, TX 77204 USA
11
Feusier J, Witherspoon DJ, Scott Watkins W, Goubert C, Sasani TA, Jorde LB. Discovery of rare, diagnostic AluYb8/9 elements in diverse human populations. Mob DNA 2017; 8:9. [PMID: 28770012 PMCID: PMC5531096 DOI: 10.1186/s13100-017-0093-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/10/2017] [Accepted: 07/17/2017] [Indexed: 01/22/2023] [Open access]
Abstract
BACKGROUND Polymorphic human Alu elements are excellent tools for assessing population structure, and new retrotransposition events can contribute to disease. Next-generation sequencing has greatly increased the potential to discover Alu elements in human populations, and various sequencing and bioinformatics methods have been designed to tackle the problem of detecting these highly repetitive elements. However, current techniques for Alu discovery may miss rare, polymorphic Alu elements. Combining multiple discovery approaches may provide a better profile of the polymorphic Alu mobilome. AluYb8/9 elements have been a focus of our recent studies as they are young subfamilies (~2.3 million years old) that contribute ~30% of recent polymorphic Alu retrotransposition events. Here, we update our ME-Scan methods for detecting Alu elements and apply these methods to discover new insertions in a large set of individuals with diverse ancestral backgrounds. RESULTS We identified 5,288 putative Alu insertion events, including several hundred novel AluYb8/9 elements from 213 individuals from 18 diverse human populations. Hundreds of these loci were specific to continental populations, and 23 non-reference population-specific loci were validated by PCR. We provide high-quality sequence information for 68 rare AluYb8/9 elements, of which 11 have hallmarks of an active source element. Our subfamily distribution of rare AluYb8/9 elements is consistent with previous datasets, and may be representative of rare loci. We also find that while ME-Scan and low-coverage, whole-genome sequencing (WGS) detect different Alu elements in 41 1000 Genomes individuals, the two methods yield similar population structure results. CONCLUSION Current in-silico methods for Alu discovery may miss rare, polymorphic Alu elements. Therefore, using multiple techniques can provide a more accurate profile of Alu elements in individuals and populations. 
We improved our false-negative rate as an indicator of sample quality for future ME-Scan experiments. In conclusion, we demonstrate that ME-Scan is a good supplement for next-generation sequencing methods and is well-suited for population-level analyses.
Affiliation(s)
- Julie Feusier
  - Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT USA
- David J. Witherspoon
  - Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT USA
- W. Scott Watkins
  - Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT USA
- Clément Goubert
  - Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT USA
- Thomas A. Sasani
  - Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT USA
- Lynn B. Jorde
  - Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT USA
12
Gasc C, Peyret P. Revealing large metagenomic regions through long DNA fragment hybridization capture. Microbiome 2017; 5:33. [PMID: 28292322 PMCID: PMC5351058 DOI: 10.1186/s40168-017-0251-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 10/20/2016] [Accepted: 03/05/2017] [Indexed: 05/07/2023]
Abstract
BACKGROUND High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes from single organisms or metagenomic samples. However, due to the limited capacity of short-read sequence data to assemble complex or low coverage regions, genomes are typically fragmented, leading to draft genomes with numerous underexplored large genomic regions. Revealing these missing sequences is a major goal to resolve concerns in numerous biological studies. METHODS To overcome these limitations, we developed an innovative target enrichment method for the reconstruction of large unknown genomic regions. Based on a hybridization capture strategy, this approach enables the enrichment of large genomic regions allowing the reconstruction of tens of kilobase pairs flanking a short, targeted DNA sequence. RESULTS Applied to a metagenomic soil sample targeting the linA gene, the biomarker of hexachlorocyclohexane (HCH) degradation, our method permitted the enrichment of the gene and its flanking regions leading to the reconstruction of several contigs and complete plasmids exceeding tens of kilobase pairs surrounding linA. Thus, through gene association and genome reconstruction, we identified microbial species involved in HCH degradation which constitute targets to improve biostimulation treatments. CONCLUSIONS This new hybridization capture strategy makes surveying and deconvoluting complex genomic regions possible through enrichment of large genomic regions and allows the efficient exploration of metagenomic diversity. Indeed, this approach makes it possible to assign identity and function to microorganisms in natural environments, one of the ultimate goals of microbial ecology.
Affiliation(s)
- Cyrielle Gasc
  - Université Clermont Auvergne, INRA, MEDIS, 63000 Clermont-Ferrand, France
- Pierre Peyret
  - Université Clermont Auvergne, INRA, MEDIS, 63000 Clermont-Ferrand, France
13
Abstract
Genome size in mammals and birds shows remarkably little interspecific variation compared with other taxa. However, genome sequencing has revealed that many mammal and bird lineages have experienced differential rates of transposable element (TE) accumulation, which would be predicted to cause substantial variation in genome size between species. Thus, we hypothesize that there has been covariation between the amount of DNA gained by transposition and lost by deletion during mammal and avian evolution, resulting in genome size equilibrium. To test this model, we develop computational methods to quantify the amount of DNA gained by TE expansion and lost by deletion over the last 100 My in the lineages of 10 species of eutherian mammals and 24 species of birds. The results reveal extensive variation in the amount of DNA gained via lineage-specific transposition, but that DNA loss counteracted this expansion to various extents across lineages. Our analysis of the rate and size spectrum of deletion events implies that DNA removal in both mammals and birds has proceeded mostly through large segmental deletions (>10 kb). These findings support a unified "accordion" model of genome size evolution in eukaryotes whereby DNA loss counteracting TE expansion is a major determinant of genome size. Furthermore, we propose that extensive DNA loss, and not necessarily a dearth of TE activity, has been the primary force maintaining the greater genomic compaction of flying birds and bats relative to their flightless relatives.
14
Suh A. The phylogenomic forest of bird trees contains a hard polytomy at the root of Neoaves. Zool Scr 2016. [DOI: 10.1111/zsc.12213] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Indexed: 12/25/2022]
Affiliation(s)
- Alexander Suh
- Department of Evolutionary Biology, Evolutionary Biology Centre (EBC), Uppsala University, SE-752 36 Uppsala, Sweden
|
15
|
Ha H, Loh JW, Xing J. Identification of polymorphic SVA retrotransposons using a mobile element scanning method for SVA (ME-Scan-SVA). Mob DNA 2016; 7:15. [PMID: 27478512 PMCID: PMC4967303 DOI: 10.1186/s13100-016-0072-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/09/2016] [Accepted: 07/21/2016] [Indexed: 12/28/2022] Open
Abstract
Background: Mobile element insertions are a major source of human genomic variation. SVA (SINE-R/VNTR/Alu) is the youngest retrotransposon family in the human genome, and a number of diseases are known to be caused by SVA insertions. However, inter-individual genomic variations generated by SVA insertions and their impacts have not been studied extensively due to the difficulty in identifying polymorphic SVA insertions. Results: To systematically identify SVA insertions at the population level and assess their genomic impact, we developed a mobile element scanning (ME-Scan) protocol we called ME-Scan-SVA. Using a nested SVA-specific PCR enrichment method, ME-Scan-SVA selectively amplifies the 5′ end of SVA elements and their flanking genomic regions. To demonstrate the utility of the protocol, we constructed and sequenced a ME-Scan-SVA library of 21 individuals and analyzed the data using a new analysis pipeline designed for the protocol. Overall, the method achieved high SVA-specificity and over 90 % of the sequenced reads are from SVA insertions. The method also had high sensitivity (>90 %) for fixed SVA insertions that contain the SVA-specific primer-binding sites in the reference genome. Using candidate locus selection criteria that are expected to have a 90 % sensitivity, we identified 151 and 29 novel polymorphic SVA candidates under relaxed and stringent cutoffs, respectively (average 12 and 2 per individual). For six polymorphic SVAs that we were able to validate by PCR, the average individual genotype accuracy is 92 %, demonstrating a high accuracy of the computational genotype calling pipeline. Conclusions: The new approach allows identifying novel SVA insertions using high-throughput sequencing. It is cost-effective and can be applied in large-scale population studies. It can also be applied for detecting potentially active SVA elements and somatic SVA retrotransposition events in different tissues or developmental stages.
Affiliation(s)
- Hongseok Ha
- Department of Genetics, The State University of New Jersey, Piscataway, 08854 NJ USA ; Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, 08854 NJ USA
- Jui Wan Loh
- Department of Genetics, The State University of New Jersey, Piscataway, 08854 NJ USA
- Jinchuan Xing
- Department of Genetics, The State University of New Jersey, Piscataway, 08854 NJ USA ; Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, 08854 NJ USA
|
16
|
Platt RN, Mangum SF, Ray DA. Pinpointing the vesper bat transposon revolution using the Miniopterus natalensis genome. Mob DNA 2016; 7:12. [PMID: 27489570 PMCID: PMC4971623 DOI: 10.1186/s13100-016-0071-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 04/27/2016] [Accepted: 07/13/2016] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Around 40 million years ago DNA transposons began accumulating in an ancestor of bats in the family Vespertilionidae. Since that time, Class II transposons have been continuously reinvading and accumulating in vespertilionid genomes at a rate that is unprecedented in mammals. Miniopterus (Miniopteridae), a genus of long-fingered bats that was recently elevated from Vespertilionidae, is the sister taxon to the vespertilionids and is often used as an outgroup when studying transposable elements in vesper bats. Previous wet-lab techniques failed to identify Helitrons, TcMariners, or hAT transposons in Miniopterus. Limitations of those methods and ambiguous results regarding the distribution of piggyBac transposons left some questions as to the distribution of Class II elements in this group. The recent release of the Miniopterus natalensis genome allows for transposable element discovery with a higher degree of precision. RESULTS Here we analyze the transposable element content of M. natalensis to pinpoint with greater accuracy the taxonomic distribution of Class II transposable elements in bats. These efforts demonstrate that, compared to the vespertilionids, Class II TEs are highly mutated and comprise only a small portion of the M. natalensis genome. Despite the limited Class II content, M. natalensis possesses a limited number of lineage-specific, low copy number piggyBacs and shares several TcMariner families with vespertilionid bats. Multiple efforts to identify Helitrons, one of the major TE components of vesper bat genomes, using de novo repeat identification and structural based searches failed. CONCLUSIONS These observations combined with previous results inform our understanding of the events leading to the unique Class II element acquisition that characterizes vespertilionids. 
While it appears that a small number of TcMariner and piggyBac elements were deposited in the ancestral Miniopterus + vespertilionid genome, these elements are not present in the M. natalensis genome at high copy number. Instead, this work indicates that the vesper bats alone experienced the expansion of TEs ranging from Helitrons to piggyBacs to hATs.
Affiliation(s)
- Roy N Platt
- Department of Biological Sciences, Texas Tech University, Box 43131, Lubbock, TX 79409-3131 USA
- Sarah F Mangum
- Department of Biological Sciences, Texas Tech University, Box 43131, Lubbock, TX 79409-3131 USA
- David A Ray
- Department of Biological Sciences, Texas Tech University, Box 43131, Lubbock, TX 79409-3131 USA
|
17
|
Gasc C, Peyretaillade E, Peyret P. Sequence capture by hybridization to explore modern and ancient genomic diversity in model and nonmodel organisms. Nucleic Acids Res 2016; 44:4504-18. [PMID: 27105841 PMCID: PMC4889952 DOI: 10.1093/nar/gkw309] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 01/28/2016] [Revised: 04/07/2016] [Accepted: 04/12/2016] [Indexed: 12/25/2022] Open
Abstract
The recent expansion of next-generation sequencing has significantly improved biological research. Nevertheless, deep exploration of genomes or metagenomic samples remains difficult because of the sequencing depth and the associated costs required. Therefore, different partitioning strategies have been developed to sequence informative subsets of studied genomes. Among these strategies, hybridization capture has proven to be an innovative and efficient tool for targeting and enriching specific biomarkers in complex DNA mixtures. It has been successfully applied in numerous areas of biology, such as exome resequencing for the identification of mutations underlying Mendelian or complex diseases and cancers, and its usefulness has been demonstrated in the agronomic field through the linking of genetic variants to agricultural phenotypic traits of interest. Moreover, hybridization capture has provided access to underexplored, but relevant fractions of genomes through its ability to enrich defined targets and their flanking regions. Finally, on the basis of restricted genomic information, this method has also allowed the expansion of knowledge of nonreference species and ancient genomes and provided a better understanding of metagenomic samples. In this review, we present the major advances and discoveries permitted by hybridization capture and highlight the potency of this approach in all areas of biology.
Affiliation(s)
- Cyrielle Gasc
- EA 4678 CIDAM, Université d'Auvergne, Clermont-Ferrand, 63001, France
- Pierre Peyret
- EA 4678 CIDAM, Université d'Auvergne, Clermont-Ferrand, 63001, France
|
18
|
Kuritzin A, Kischka T, Schmitz J, Churakov G. Incomplete Lineage Sorting and Hybridization Statistics for Large-Scale Retroposon Insertion Data. PLoS Comput Biol 2016; 12:e1004812. [PMID: 26967525 PMCID: PMC4788455 DOI: 10.1371/journal.pcbi.1004812] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 08/06/2015] [Accepted: 02/13/2016] [Indexed: 01/25/2023] Open
Abstract
Ancient retroposon insertions can be used as virtually homoplasy-free markers to reconstruct the phylogenetic history of species. Inherited, orthologous insertions in related species offer reliable signals of a common origin of the given species. One prerequisite for such a phylogenetically informative insertion is that the inserted element was fixed in the ancestral population before speciation; if not, polymorphically inserted elements may lead to random distributions of presence/absence states during speciation and possibly to apparently conflicting reconstructions of their ancestry. Fortunately, such misleading fixed cases are relatively rare but nevertheless need to be considered. Here, we present novel, comprehensive statistical models applicable for (1) analyzing any pattern of rare genomic changes, (2) testing and differentiating conflicting phylogenetic reconstructions based on rare genomic changes caused by incomplete lineage sorting and/or ancestral hybridization, and (3) differentiating between search strategies involving genome information from one or several lineages. When the new statistics are applied, in non-conflicting cases a minimum of three elements present in both of two species and absent in a third group are considered significant support (p<0.05) for the branching of the third from the other two, if all three of the given species are screened equally for genome or experimental data. Five elements are necessary for significant support (p<0.05) if a diagnostic locus derived from only one of three species is screened, and no conflicting markers are detected. Most potentially conflicting patterns can be evaluated for their significance and ancestral hybridization can be distinguished from incomplete lineage sorting by considering symmetric or asymmetric distribution of rare genomic changes among possible tree configurations.
Additionally, we provide an R-application to make the new KKSC insertion significance test available for the scientific community at http://retrogenomics.uni-muenster.de:3838/KKSC_significance_test/. The presence/absence patterns of transposed elements, so-called jumping genes, provide invaluable information about evolution. Unfortunately, there is still no clear all-encompassing analysis of the statistical significance of insertion patterns, and the single existing model of insertion data is no longer sufficient for the emerging genomic era. Here, we have provided a comprehensive statistical framework for testing the significance of support for phylogenetic hypotheses derived from genome-level presence/absence data such as retroposon insertions and for evaluating such data for different evolutionary scenarios, including polytomy, incomplete lineage sorting, and ancestral hybridization. This statistical framework is especially important for high-throughput applications of current and upcoming genome projects due to its treatment of unlimited numbers of testable markers, and is embedded in a user-friendly R-application available to the scientific community online. Finally, a reliable, adaptable calculation for the significance of support for phylogenetic trees derived from retroposon presence/absence data is now available.
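The marker thresholds quoted in this abstract (three and five non-conflicting elements at p<0.05) can be illustrated with a deliberately simplified calculation. The sketch below is not the KKSC model: it assumes, purely for illustration, that each marker independently supports one of a fixed number of equally likely outcomes, and asks how many unanimous markers are needed before that chance agreement drops below 0.05. The function name `min_unanimous_markers` and the choice of 3 versus 2 outcomes are hypothetical.

```python
def min_unanimous_markers(n_outcomes: int, alpha: float = 0.05) -> int:
    """Smallest n such that (1 / n_outcomes) ** n < alpha.

    Simplifying assumption (not the KKSC statistics): every marker
    independently supports one of `n_outcomes` equally likely outcomes,
    and all n observed markers agree on a single prespecified outcome.
    """
    if n_outcomes < 2:
        raise ValueError("need at least two possible outcomes")
    n = 1
    while (1.0 / n_outcomes) ** n >= alpha:
        n += 1
    return n


# Three equally likely outcomes: (1/3)**3 is about 0.037, below 0.05.
print(min_unanimous_markers(3))
# Two equally likely outcomes: (1/2)**5 is about 0.031, below 0.05.
print(min_unanimous_markers(2))
```

Under this toy null, three outcomes give a minimum of three markers and two outcomes give a minimum of five, which happens to match the numbers in the abstract. Whether the published test reduces to this form in the non-conflicting case is an assumption here; the paper and its R-application remain the reference for real data.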
Affiliation(s)
- Andrej Kuritzin
- Department of System Analysis, Saint Petersburg State Institute of Technology, St. Petersburg, Russia
- Tabea Kischka
- Institute of Experimental Pathology (ZMBE), University of Münster, Münster, Germany
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
- Jürgen Schmitz
- Institute of Experimental Pathology (ZMBE), University of Münster, Münster, Germany
- Gennady Churakov
- Institute of Experimental Pathology (ZMBE), University of Münster, Münster, Germany
- Institute of Evolution and Biodiversity, University of Münster, Münster, Germany
|
19
|
Kuramoto T, Nishihara H, Watanabe M, Okada N. Determining the Position of Storks on the Phylogenetic Tree of Waterbirds by Retroposon Insertion Analysis. Genome Biol Evol 2015; 7:3180-9. [PMID: 26527652 PMCID: PMC4700946 DOI: 10.1093/gbe/evv213] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/28/2022] Open
Abstract
Despite many studies on avian phylogenetics in recent decades that used morphology, mitochondrial genomes, and/or nuclear genes, the phylogenetic positions of several birds (e.g., storks) remain unsettled. In addition to the aforementioned approaches, analysis of retroposon insertions, which are nearly homoplasy-free phylogenetic markers, has also been used in avian phylogenetics. However, the first step in the analysis of retroposon insertions, that is, isolation of retroposons from genomic libraries, is a costly and time-consuming procedure. Therefore, we developed a high-throughput and cost-effective protocol to collect retroposon insertion information based on next-generation sequencing technology, which we call here the STRONG (Screening of Transposons Obtained by Next Generation Sequencing) method, and applied it to 3 waterbird species, for which we identified 35,470 loci containing chicken repeat 1 retroposons (CR1). Our analysis of the presence/absence of 30 CR1 insertions demonstrated the intra- and interordinal phylogenetic relationships in the waterbird assemblage, namely 1) loons diverged first among the waterbirds, 2) penguins (Sphenisciformes) and petrels (Procellariiformes) diverged next, and 3) among the remaining families of waterbirds traditionally classified in Ciconiiformes/Pelecaniformes, storks (Ciconiidae) diverged first. Furthermore, our genome-scale, in silico retroposon analysis based on published genome data uncovered a complex divergence history among pelican, heron, and ibis lineages, presumably involving ancient interspecies hybridization between the heron and ibis lineages. Thus, our retroposon-based waterbird phylogeny and the established phylogenetic position of storks will help to understand the evolutionary processes of aquatic adaptation and related morphological convergent evolution.
Affiliation(s)
- Tae Kuramoto
- Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan
- Hidenori Nishihara
- Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan
- Maiko Watanabe
- Division of Microbiology, National Institute of Health Sciences, Setagaya, Tokyo, Japan
- Norihiro Okada
- Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan
- Foundation for Advancement of International Science, Tsukuba, Ibaraki, Japan
- Department of Life Sciences, National Cheng Kung University, Tainan, Taiwan
|