1
Forthman M, Gordon ERL, Kimball RT. Low hybridization temperatures improve target capture success of invertebrate loci: a case study of leaf-footed bugs (Hemiptera: Coreoidea). Royal Society Open Science 2023; 10:230307. [PMID: 37388308 PMCID: PMC10300676 DOI: 10.1098/rsos.230307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 06/12/2023] [Indexed: 07/01/2023]
Abstract
Target capture is widely used in phylogenomic, ecological and functional genomic studies. Bait sets that allow capture from a diversity of species can be advantageous, but high sequence divergence from baits can limit yields. Currently, only four experimental comparisons of a critical target capture parameter, hybridization temperature, have been published. These have been in vertebrates, where bait divergences are typically low, and none include invertebrates, where bait-target divergences may be higher. Most invertebrate capture studies use a fixed, high hybridization temperature to maximize the proportion of on-target data, but many report low locus recovery. Using leaf-footed bugs (Hemiptera: Coreoidea), we investigate the effect of hybridization temperature on capture success of ultraconserved elements targeted by (i) baits developed from divergent hemipteran genomes and (ii) baits developed from less divergent coreoid transcriptomes. Lower temperatures generally resulted in more contigs and improved recovery of targets despite a lower proportion of on-target reads, lower read depth and more putative paralogues. Hybridization temperatures had less of an effect when using transcriptome-derived baits, which is probably due to lower bait-target divergences and greater bait tiling density. Thus, accommodating low hybridization temperatures during target capture can provide a cost-effective, widely applicable solution to improve invertebrate locus recovery.
Affiliation(s)
- Michael Forthman
  - California State Collection of Arthropods, Plant Pest Diagnostics Branch, California Department of Food and Agriculture, 3294 Meadowview Road, Sacramento, CA 95832, USA
  - Entomology and Nematology Department, University of Florida, 1881 Natural Area Drive, Gainesville, FL 32611, USA
- Eric R. L. Gordon
  - Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, Unit 3043, Storrs, CT 06269, USA
- Rebecca T. Kimball
  - Department of Biology, University of Florida, 876 Newell Drive, Gainesville, FL 32611, USA
2
O'Loughlin SM, Forster AJ, Fuchs S, Dottorini T, Nolan T, Crisanti A, Burt A. Ultra-conserved sequences in the genomes of highly diverse Anopheles mosquitoes, with implications for malaria vector control. G3: Genes, Genomes, Genetics 2021; 11:6175102. [PMID: 33730159 PMCID: PMC8495744 DOI: 10.1093/g3journal/jkab086] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/16/2020] [Accepted: 03/08/2021] [Indexed: 12/30/2022]
Abstract
DNA sequences that are exactly conserved over long evolutionary time scales have been observed in a variety of taxa. Such sequences are likely under strong functional constraint and they have been useful in the field of comparative genomics for identifying genome regions with regulatory function. A potential new application for these ultra-conserved elements (UCEs) has emerged in the development of gene drives to control mosquito populations. Many gene drives work by recognizing and inserting at a specific target sequence in the genome, often imposing a reproductive load as a consequence. They can therefore select for target sequence variants that provide resistance to the drive. Focusing on highly conserved, highly constrained sequences lowers the probability that variant, gene drive-resistant alleles can be tolerated. Here, we search for conserved sequences of 18 bp and over in an alignment of 21 Anopheles genomes, spanning an evolutionary timescale of 100 million years, and characterize the resulting sequences according to their location and function. Over 8000 UCEs were found across the alignment, with a maximum length of 164 bp. Length-corrected gene ontology analysis revealed that genes containing Anopheles UCEs were over-represented in categories with structural or nucleotide-binding functions. Known insect transcription factor binding sites were found in 48% of intergenic Anopheles UCEs. When we looked at the genome sequences of 1142 wild-caught mosquitoes, we found that 15% of the Anopheles UCEs contained no polymorphisms. Our list of Anopheles UCEs should provide a valuable starting point for the selection and testing of new targets for gene-drive modification in the mosquitoes that transmit malaria.
Affiliation(s)
- Samantha M O'Loughlin
  - Department of Life Sciences, Imperial College London, Silwood Park, Ascot, SL5 7PY, UK
- Annie J Forster
  - Department of Life Sciences, Imperial College London, Silwood Park, Ascot, SL5 7PY, UK
- Silke Fuchs
  - Department of Life Sciences, Imperial College London, Silwood Park, Ascot, SL5 7PY, UK
- Tania Dottorini
  - School of Veterinary Medicine and Science, Sutton Bonington Campus, University of Nottingham, Leicestershire, LE12 5RD, UK
- Tony Nolan
  - Department of Life Sciences, Imperial College London, Silwood Park, Ascot, SL5 7PY, UK
  - Liverpool School of Tropical Medicine, Liverpool, L3 5QA, UK
- Andrea Crisanti
  - Department of Life Sciences, Imperial College London, Silwood Park, Ascot, SL5 7PY, UK
- Austin Burt
  - Department of Life Sciences, Imperial College London, Silwood Park, Ascot, SL5 7PY, UK
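Illustrative note on entry 2: the abstract describes searching a multi-genome alignment for exactly conserved sequences of 18 bp and over. The sketch below shows one simple way such a scan could be written. It is not the authors' pipeline. The function name `conserved_runs`, the toy alignment, and the choice to treat gaps and ambiguous bases as non-conserved are all assumptions made for illustration.

```python
from typing import List, Tuple


def conserved_runs(aligned: List[str], min_len: int = 18) -> List[Tuple[int, str]]:
    """Return (start_column, sequence) for each maximal run of alignment
    columns that hold the same ungapped A/C/G/T base in every sequence.

    Hypothetical helper: assumes all rows are already aligned to equal length.
    """
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("all aligned sequences must have the same length")

    runs: List[Tuple[int, str]] = []
    start = None
    # Iterate one column past the end so a run touching the last column is flushed.
    for col in range(width + 1):
        conserved = False
        if col < width:
            base = aligned[0][col].upper()
            conserved = base in "ACGT" and all(row[col].upper() == base for row in aligned)
        if conserved:
            if start is None:
                start = col
        elif start is not None:
            if col - start >= min_len:
                runs.append((start, aligned[0][start:col].upper()))
            start = None
    return runs


# Toy alignment (made up): a 20 bp identical core flanked by variable columns.
core = "ACGTACGTACGTACGTACGT"
toy_alignment = ["TT" + core + "G-", "GA" + core + "AC", "TC" + core + "AG"]
print(conserved_runs(toy_alignment))
```

Raising `min_len` above the core length returns an empty list, which is a quick way to check the threshold logic on real data before scaling up.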
3
Van der Mude A. Structure encoding in DNA. J Theor Biol 2020; 492:110205. [PMID: 32070719 DOI: 10.1016/j.jtbi.2020.110205] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/16/2019] [Revised: 12/29/2019] [Accepted: 02/14/2020] [Indexed: 12/21/2022]
Abstract
It is proposed that transposons and related long non-coding RNA define the fine structure of body parts. Although morphogens have long been known to direct the formation of many gross structures in early embryonic development, they do not have the necessary precision to define a structure down to the individual cellular level. Using the distinction between procedural and declarative knowledge in information processing as an analogy, it is hypothesized that DNA encodes fine structure in a manner that is different from the genetic code for proteins. The hypothesis states that repeated or near-repeated sequences that are in transposons and non-coding RNA define body part structures. As the cells in a body part go through the epigenetic process of differentiation, the action of methylation serves to inactivate all but the relevant structure definitions and some associated cell type genes. The transposons left active will then physically modify the DNA sequence in the heterochromatin to establish the local context in the three-dimensional body part structure. This brings the encoded definition of the cell type to the histone. The histone code for that cell type starts the regulatory cascade that turns on the genes associated with that particular type of cell, transforming it from a multipotent cell to a fully differentiated cell. This mechanism creates structures in the musculoskeletal system, the organs of the body, the major parts of the brain, and other systems.
4
Manee MM, Jackson J, Bergman CM. Conserved Noncoding Elements Influence the Transposable Element Landscape in Drosophila. Genome Biol Evol 2018; 10:1533-1545. [PMID: 29850787 PMCID: PMC6007792 DOI: 10.1093/gbe/evy104] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Accepted: 05/22/2018] [Indexed: 12/15/2022] [Open Access]
Abstract
Highly conserved noncoding elements (CNEs) constitute a significant proportion of the genomes of multicellular eukaryotes. The function of most CNEs remains elusive, but growing evidence indicates they are under some form of purifying selection. Noncoding regions in many species also harbor large numbers of transposable element (TE) insertions, which are typically lineage specific and depleted in exons because of their deleterious effects on gene function or expression. However, it is currently unknown whether the landscape of TE insertions in noncoding regions is random or influenced by purifying selection on CNEs. Here, we combine comparative and population genomic data in Drosophila melanogaster to show that the abundance of TE insertions in intronic and intergenic CNEs is reduced relative to random expectation, supporting the idea that selective constraints on CNEs eliminate a proportion of TE insertions in noncoding regions. However, we find no evidence for differences in the allele frequency spectra for polymorphic TE insertions in CNEs versus those in unconstrained spacer regions, suggesting that the distribution of fitness effects acting on observable TE insertions is similar across different functional compartments in noncoding DNA. Our results provide evidence that selective constraints on CNEs contribute to shaping the landscape of TE insertion in eukaryotic genomes, and provide further evidence that CNEs are indeed functionally constrained and not simply mutational cold spots.
Affiliation(s)
- Manee M Manee
  - Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
  - National Center for Biotechnology, King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
  - Center of Excellence for Genomics (CEG), King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
- John Jackson
  - Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
- Casey M Bergman
  - Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
  - Department of Genetics, University of Georgia, Athens, GA
  - Institute of Bioinformatics, University of Georgia, Athens, GA
5
Rubanov LI, Seliverstov AV, Zverkov OA, Lyubetsky VA. A method for identification of highly conserved elements and evolutionary analysis of superphylum Alveolata. BMC Bioinformatics 2016; 17:385. [PMID: 27645252 PMCID: PMC5028923 DOI: 10.1186/s12859-016-1257-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/05/2016] [Accepted: 09/13/2016] [Indexed: 01/24/2023] [Open Access]
Abstract
Background: Perfectly or highly conserved DNA elements were found in vertebrates, invertebrates, and plants by various methods. However, little is known about such elements in protists. The evolutionary distance between apicomplexans can be very high, in particular, due to the positive selection pressure on them. This complicates the identification of highly conserved elements in alveolates, which is overcome by the proposed algorithm. Results: A novel algorithm is developed to identify highly conserved DNA elements. It is based on the identification of dense subgraphs in a specially built multipartite graph (whose parts correspond to genomes). Specifically, the algorithm does not rely on genome alignments or on pre-identified perfectly conserved elements; instead, it performs a fast search for pairs of words (in different genomes) of maximum length with the difference below the specified edit distance. Each such pair defines an edge whose weight equals the maximum (or total) length of words assigned to its ends. The graph composed of these edges is then compacted by merging some of its edges and vertices. The dense subgraphs are identified by a cellular automaton-like algorithm; each subgraph defines a cluster composed of similar inextensible words from different genomes. Almost all clusters are considered as predicted highly conserved elements. The algorithm is applied to the nuclear genomes of the superphylum Alveolata, and the corresponding phylogenetic tree is built and discussed. Conclusion: We proposed an algorithm for the identification of highly conserved elements. The multitude of identified elements was used to infer the phylogeny of Alveolata. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1257-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Lev I Rubanov
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoi Karetnyi per. 19, Building 1, Moscow, 127051, Russia
- Alexandr V Seliverstov
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoi Karetnyi per. 19, Building 1, Moscow, 127051, Russia
- Oleg A Zverkov
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoi Karetnyi per. 19, Building 1, Moscow, 127051, Russia
- Vassily A Lyubetsky
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoi Karetnyi per. 19, Building 1, Moscow, 127051, Russia
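Illustrative note on entry 5: the abstract describes linking words from different genomes whose difference falls within a specified edit distance, with each such pair forming a weighted edge of a multipartite graph. The sketch below covers only that edge-building idea, under heavy simplifications that are assumptions here and not the authors' method: words have a fixed length `word_len` instead of maximum length, the search is brute force instead of fast, the threshold is applied as "at most `max_dist`", and the graph compaction and cellular automaton-like clustering steps are omitted. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Node = Tuple[str, int]  # (genome name, start position of the word)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def similar_word_edges(
    genomes: Dict[str, str], word_len: int, max_dist: int
) -> List[Tuple[Node, Node, int]]:
    """Return edges (node_a, node_b, weight) between words of different genomes
    whose edit distance is at most max_dist. The weight is the word length,
    which is constant here because the word length is fixed."""
    edges: List[Tuple[Node, Node, int]] = []
    # Only pairs of distinct genomes are compared, so the graph is multipartite.
    for (name_a, seq_a), (name_b, seq_b) in combinations(genomes.items(), 2):
        for i in range(len(seq_a) - word_len + 1):
            word_a = seq_a[i:i + word_len]
            for j in range(len(seq_b) - word_len + 1):
                if edit_distance(word_a, seq_b[j:j + word_len]) <= max_dist:
                    edges.append(((name_a, i), (name_b, j), word_len))
    return edges


# Toy input (made up): two 8 bp "genomes" that differ by one substitution.
toy_genomes = {"g1": "ACGTACGT", "g2": "ACGAACGT"}
print(similar_word_edges(toy_genomes, word_len=8, max_dist=1))
```

The brute-force loop is quadratic in genome length per genome pair, so it is only suitable for small test inputs; the point is to show what an edge means, not how to find edges efficiently.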
6
Khoroshko VA, Levitsky VG, Zykova TY, Antonenko OV, Belyaeva ES, Zhimulev IF. Chromatin Heterogeneity and Distribution of Regulatory Elements in the Late-Replicating Intercalary Heterochromatin Domains of Drosophila melanogaster Chromosomes. PLoS One 2016; 11:e0157147. [PMID: 27300486 PMCID: PMC4907538 DOI: 10.1371/journal.pone.0157147] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 02/19/2016] [Accepted: 05/25/2016] [Indexed: 12/28/2022] [Open Access]
Abstract
Late-replicating domains (intercalary heterochromatin) in the Drosophila genome display a number of features suggesting their organization is quite unique. Typically, they are quite large and encompass clusters of functionally unrelated tissue-specific genes. They correspond to the topologically associating domains and conserved microsynteny blocks. Our study aims at exploring further details of molecular organization of intercalary heterochromatin and has uncovered surprising heterogeneity of chromatin composition in these regions. Using the 4HMM model developed in our group earlier, intercalary heterochromatin regions were found to host chromatin fragments with a particular epigenetic profile. Aquamarine chromatin fragments (spanning 0.67% of late-replicating regions) are characterized as a class of sequences that appear heterogeneous in terms of their decompactization. These fragments are enriched with enhancer sequences and binding sites for insulator proteins. They likely mark the chromatin state that is related to the binding of cis-regulatory proteins. Malachite chromatin fragments (11% of late-replicating regions) appear to function as universal transitional regions between two contrasting chromatin states. Namely, they invariably delimit intercalary heterochromatin regions from the adjacent active chromatin of interbands. Malachite fragments also flank aquamarine fragments embedded in the repressed chromatin of late-replicating regions. Significant enrichment of insulator proteins CP190, SU(HW), and MOD2.2 was observed in malachite chromatin. Neither aquamarine nor malachite chromatin types appear to correlate with the positions of highly conserved non-coding elements (HCNE) that are typically replete in intercalary heterochromatin. Malachite chromatin found on the flanks of intercalary heterochromatin regions tends to replicate earlier than the malachite chromatin embedded in intercalary heterochromatin. In other words, there exists a gradient of replication progressing from the flanks of intercalary heterochromatin regions center-wise. The peculiar organization and features of replication in large late-replicating regions are discussed as possible factors shaping the evolutionary stability of intercalary heterochromatin.
Affiliation(s)
- Viktor G. Levitsky
  - Novosibirsk State University, Novosibirsk, Russia
  - Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Tatyana Yu. Zykova
  - Institute of Molecular and Cellular Biology, SB RAS, Novosibirsk, Russia
- Elena S. Belyaeva
  - Institute of Molecular and Cellular Biology, SB RAS, Novosibirsk, Russia
- Igor F. Zhimulev
  - Institute of Molecular and Cellular Biology, SB RAS, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
7
Warnefors M, Hartmann B, Thomsen S, Alonso CR. Combinatorial Gene Regulatory Functions Underlie Ultraconserved Elements in Drosophila. Mol Biol Evol 2016; 33:2294-306. [PMID: 27247329 PMCID: PMC4989106 DOI: 10.1093/molbev/msw101] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 12/15/2022] [Open Access]
Abstract
Ultraconserved elements (UCEs) are discrete genomic elements conserved across large evolutionary distances. Although UCEs have been linked to multiple facets of mammalian gene regulation, their extreme evolutionary conservation remains largely unexplained. Here, we apply a computational approach to investigate this question in Drosophila, exploring the molecular functions of more than 1,500 UCEs shared across the genomes of 12 Drosophila species. Our data indicate that Drosophila UCEs are hubs for gene regulatory functions and suggest that UCE sequence invariance originates from their combinatorial roles in gene control. We also note that the gene regulatory roles of intronic and intergenic UCEs (iUCEs) are distinct from those found in exonic UCEs (eUCEs). In iUCEs, transcription factor (TF) and epigenetic factor binding data strongly support iUCE roles in transcriptional and epigenetic regulation. In contrast, analyses of eUCEs indicate that they are two orders of magnitude more likely than expected to simultaneously include protein-coding sequence, TF-binding sites, splice sites, and RNA editing sites but have reduced roles in transcriptional or epigenetic regulation. Furthermore, we use a Drosophila cell culture system and transgenic Drosophila embryos to validate the notion of UCE combinatorial regulatory roles using an eUCE within the Hox gene Ultrabithorax and show that its protein-coding region also contains alternative splicing regulatory information. Taken together, our experiments indicate that UCEs emerge as a result of combinatorial gene regulatory roles and highlight common features in mammalian and insect UCEs, implying that similar processes might underlie ultraconservation in diverse animal taxa.
Affiliation(s)
- Maria Warnefors
  - Sussex Neuroscience, School of Life Sciences, University of Sussex, Brighton, United Kingdom
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Britta Hartmann
  - Institute of Human Genetics, Freiburg, Germany
  - BIOSS Centre for Biological Signaling Studies, University Medical Center Freiburg, Freiburg, Germany
- Stefan Thomsen
  - Sussex Neuroscience, School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Claudio R Alonso
  - Sussex Neuroscience, School of Life Sciences, University of Sussex, Brighton, United Kingdom
8
Blaimer BB, Brady SG, Schultz TR, Lloyd MW, Fisher BL, Ward PS. Phylogenomic methods outperform traditional multi-locus approaches in resolving deep evolutionary history: a case study of formicine ants. BMC Evol Biol 2015; 15:271. [PMID: 26637372 PMCID: PMC4670518 DOI: 10.1186/s12862-015-0552-5] [Citation(s) in RCA: 103] [Impact Index Per Article: 11.4] [Received: 10/08/2015] [Accepted: 11/26/2015] [Indexed: 12/02/2022] [Open Access]
Abstract
Background: Ultraconserved elements (UCEs) have been successfully used in phylogenomics for a variety of taxa, but their power in phylogenetic inference has yet to be extensively compared with that of traditional Sanger sequencing data sets. Moreover, UCE data on invertebrates, including insects, are sparse. We compared the phylogenetic informativeness of 959 UCE loci with a multi-locus data set of ten nuclear markers obtained via Sanger sequencing, testing the ability of these two types of data to resolve and date the evolutionary history of the second most species-rich subfamily of ants in the world, the Formicinae. Results: Phylogenetic analyses show that UCEs are superior in resolving ancient and shallow relationships in formicine ants, demonstrated by increased node support and a more resolved phylogeny. Phylogenetic informativeness metrics indicate a twofold improvement relative to the 10-gene data matrix generated from the identical set of taxa. We were able to significantly improve formicine classification based on our comprehensive UCE phylogeny. Our divergence age estimations, using both UCE and Sanger data, indicate that crown-group Formicinae are older (104-117 Ma) than previously suggested. Biogeographic analyses infer that the diversification of the subfamily has occurred on all continents with no particular hub of cladogenesis. Conclusions: We found UCEs to be far superior to the multi-locus data set in estimating formicine relationships. The early history of the clade remains uncertain due to ancient rapid divergence events that are unresolvable even with our genomic-scale data, although this might be largely an effect of several problematic taxa subtended by long branches. Our comparison of divergence ages from both Sanger and UCE data demonstrates the effectiveness of UCEs for dating analyses. This comparative study highlights both the promise and limitations of UCEs for insect phylogenomics, and will prove useful to the growing number of evolutionary biologists considering the transition from Sanger to next-generation sequencing approaches.
Affiliation(s)
- Bonnie B Blaimer
  - Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC, 20560, USA
- Seán G Brady
  - Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC, 20560, USA
- Ted R Schultz
  - Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC, 20560, USA
- Michael W Lloyd
  - Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC, 20560, USA
- Brian L Fisher
  - Department of Entomology, California Academy of Sciences, San Francisco, CA, 94118, USA
- Philip S Ward
  - Department of Entomology and Nematology, University of California-Davis, Davis, CA, 95616, USA
9
Makunin IV, Kolesnikova TD, Andreyenkova NG. Underreplicated regions in Drosophila melanogaster are enriched with fast-evolving genes and highly conserved noncoding sequences. Genome Biol Evol 2014; 6:2050-60. [PMID: 25062918 PMCID: PMC4159006 DOI: 10.1093/gbe/evu156] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 12/30/2022] [Open Access]
Abstract
Many late replicating regions are underreplicated in polytene chromosomes of Drosophila melanogaster. These regions contain silenced chromatin and overlap long syntenic blocks of conserved gene order in drosophilids. In this report we show that in D. melanogaster the underreplicated regions are enriched with fast-evolving genes lacking homologs in distant species such as mosquito or human, indicating that the phylogenetic conservation of genes correlates with replication timing and chromatin status. Drosophila genes without human homologs located in the underreplicated regions have higher nonsynonymous substitution rate and tend to encode shorter proteins when compared with those in the adjacent regions. At the same time, the underreplicated regions are enriched with ultraconserved elements and highly conserved noncoding sequences, especially in introns of very long genes indicating the presence of an extensive regulatory network that may be responsible for the conservation of gene order in these regions. The regions have a modest preference for long noncoding RNAs but are depleted for small nucleolar RNAs, microRNAs, and transfer RNAs. Our results demonstrate that the underreplicated regions have a specific genic composition and distinct pattern of evolution.
Affiliation(s)
- Igor V Makunin
  - Research Computing Centre, The University of Queensland, St Lucia, Queensland, Australia
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Tatyana D Kolesnikova
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Novosibirsk State University, Russia
- Natalya G Andreyenkova
  - Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia