1
Srivastav SP, Feschotte C, Clark AG. Rapid evolution of piRNA clusters in the Drosophila melanogaster ovary. Genome Res 2024; 34:711-724. [PMID: 38749655 PMCID: PMC11216404 DOI: 10.1101/gr.278062.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Accepted: 05/07/2024] [Indexed: 05/28/2024]
Abstract
The piRNA pathway is a highly conserved mechanism to repress transposable element (TE) activity in the animal germline via a specialized class of small RNAs called piwi-interacting RNAs (piRNAs). piRNAs are produced from discrete genomic regions called piRNA clusters (piCs). Although the molecular processes by which piCs function are relatively well understood in Drosophila melanogaster, much less is known about the origin and evolution of piCs in this or any other species. To investigate piC origin and evolution, we use a population genomic approach to compare piC activity and sequence composition across eight geographically distant strains of D. melanogaster with high-quality long-read genome assemblies. We perform annotations of ovary piCs and genome-wide TE content in each strain. Our analysis uncovers extensive variation in piC activity across strains and signatures of rapid birth and death of piCs. Most TEs inferred to be recently active show an enrichment of insertions into old and large piCs, consistent with the previously proposed "trap" model of piC evolution. In contrast, a small subset of active LTR families is enriched for the formation of new piCs, suggesting that these TEs have higher proclivity to form piCs. Thus, our findings uncover processes leading to the origin of piCs. We propose that piC evolution begins with the emergence of piRNAs from individual insertions of a few select TE families prone to seed new piCs that subsequently expand by accretion of insertions from most other TE families during evolution to form larger "trap" clusters. Our study shows that TEs themselves are the major force driving the rapid evolution of piCs.
Affiliation(s)
- Satyam P Srivastav
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
- Cédric Feschotte
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
- Andrew G Clark
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
2
Scarpa A, Pianezza R, Wierzbicki F, Kofler R. Genomes of historical specimens reveal multiple invasions of LTR retrotransposons in Drosophila melanogaster during the 19th century. Proc Natl Acad Sci U S A 2024; 121:e2313866121. [PMID: 38564639 PMCID: PMC11009621 DOI: 10.1073/pnas.2313866121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 01/05/2024] [Indexed: 04/04/2024]
Abstract
Transposable element invasions have a profound impact on the evolution of genomes and phenotypes. It is thus an important open question how often such TE invasions occur. To address this question, we utilize the genomes of historical specimens, sampled about 200 years ago. We found that the LTR retrotransposons Blood, Opus, and 412 spread in Drosophila melanogaster in the 19th century. These invasions constitute second waves, as degraded fragments were found for all three TEs. The composition of Opus and 412, but not of Blood, shows a pronounced geographic heterogeneity, likely due to founder effects during the invasions. Finally, we identified species from the Drosophila simulans complex as the likely origin of the TEs. We show that in total, seven TE families invaded D. melanogaster during the last 200 years, thereby increasing the genome size by up to 1.2 Mbp. We suggest that this high rate of TE invasions was likely triggered by human activity. Based on the analysis of strains and specimens sampled at different times, we provide a detailed timeline of TE invasions, making D. melanogaster the first organism where the invasion history of TEs during the last two centuries could be inferred.
Affiliation(s)
- Almorò Scarpa
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien 1210, Austria
  - Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna 1210, Austria
- Riccardo Pianezza
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien 1210, Austria
  - Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna 1210, Austria
- Filip Wierzbicki
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien 1210, Austria
  - Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna 1210, Austria
- Robert Kofler
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien 1210, Austria
3
Pianezza R, Scarpa A, Narayanan P, Signor S, Kofler R. Spoink, a LTR retrotransposon, invaded D. melanogaster populations in the 1990s. PLoS Genet 2024; 20:e1011201. [PMID: 38530818 PMCID: PMC10965091 DOI: 10.1371/journal.pgen.1011201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 02/27/2024] [Indexed: 03/28/2024]
Abstract
During the last few centuries, D. melanogaster populations were invaded by several transposable elements, the most recent of which was thought to be the P-element between 1950 and 1980. Here we describe a novel TE, which we named Spoink, that has invaded D. melanogaster. It is a 5216 nt LTR retrotransposon of the Ty3/gypsy superfamily. Relying on strains sampled at different times during the last century, we show that Spoink invaded worldwide D. melanogaster populations after the P-element, between 1983 and 1993. This invasion was likely triggered by a horizontal transfer from the D. willistoni group, much like the P-element. Spoink is probably silenced by the piRNA pathway in natural populations, and about 1/3 of the examined strains have an insertion into a canonical piRNA cluster such as 42AB. Given the degree of genetic investigation of D. melanogaster, it is perhaps surprising that Spoink was able to invade unnoticed.
Affiliation(s)
- Riccardo Pianezza
  - Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
  - Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna, Austria
- Almorò Scarpa
  - Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
  - Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna, Austria
- Prakash Narayanan
  - Biological Sciences, North Dakota State University, Fargo, North Dakota, United States of America
- Sarah Signor
  - Biological Sciences, North Dakota State University, Fargo, North Dakota, United States of America
- Robert Kofler
  - Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
4
Wierzbicki F, Kofler R. The composition of piRNA clusters in Drosophila melanogaster deviates from expectations under the trap model. BMC Biol 2023; 21:224. [PMID: 37858221 PMCID: PMC10588112 DOI: 10.1186/s12915-023-01727-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Accepted: 10/06/2023] [Indexed: 10/21/2023]
Abstract
BACKGROUND: It is widely assumed that the invasion of a transposable element (TE) in mammals and invertebrates is stopped when a copy of the TE jumps into a piRNA cluster (i.e., the trap model). However, recent studies, which showed, for example, that deletion of three major piRNA clusters has no effect on TE activity, cast doubt on the trap model.
RESULTS: Here, we test the trap model from a population genetics perspective. Our simulations show that the composition of regions that act as transposon traps (i.e., potentially piRNA clusters) ought to deviate from regions that have no effect on TE activity. We investigated TEs in five Drosophila melanogaster strains using three complementary approaches to test whether the composition of piRNA clusters matches these expectations. We found that the abundance of TE families inside and outside of piRNA clusters is highly correlated, although this is not expected under the trap model. Furthermore, the distribution of the number of TE insertions in piRNA clusters is also much broader than expected.
CONCLUSIONS: We found that the observed composition of piRNA clusters is not in agreement with expectations under the simple trap model. Dispersed piRNA-producing TE insertions and temporal as well as spatial heterogeneity of piRNA clusters may account for these deviations.
Affiliation(s)
- Filip Wierzbicki
  - Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
  - Vienna Graduate School of Population Genetics, Vienna, Austria
- Robert Kofler
  - Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
5
Srivastav S, Feschotte C, Clark AG. Rapid evolution of piRNA clusters in the Drosophila melanogaster ovary. bioRxiv 2023:2023.05.08.539910. [PMID: 37214865 PMCID: PMC10197564 DOI: 10.1101/2023.05.08.539910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Animal genomes are parasitized by a horde of transposable elements (TEs) whose mutagenic activity can have catastrophic consequences. The piRNA pathway is a conserved mechanism to repress TE activity in the germline via a specialized class of small RNAs associated with effector Piwi proteins called piwi-associated RNAs (piRNAs). piRNAs are produced from discrete genomic regions called piRNA clusters (piCs). While piCs are generally enriched for TE sequences and the molecular processes by which they are transcribed and regulated are relatively well understood in Drosophila melanogaster, much less is known about the origin and evolution of piCs in this or any other species. To investigate piC evolution, we use a population genomics approach to compare piC activity and sequence composition across 8 geographically distant strains of D. melanogaster with high-quality long-read genome assemblies. We perform extensive annotations of ovary piCs and TE content in each strain and test predictions of two proposed models of piC evolution. The 'de novo' model posits that individual TE insertions can spontaneously attain the status of a small piC to generate piRNAs silencing the entire TE family. The 'trap' model envisions large and evolutionarily stable genomic clusters where TEs tend to accumulate, which serve as a long-term "memory" of ancient TE invasions and produce a great variety of piRNAs protecting against related TEs entering the genome. It remains unclear which model best describes the evolution of piCs. Our analysis uncovers extensive variation in piC activity across strains and signatures of rapid birth and death of piCs in natural populations. Most TE families inferred to be recently or currently active show an enrichment of strain-specific insertions into large piCs, consistent with the trap model. By contrast, only a small subset of active LTR retrotransposon families is enriched for the formation of strain-specific piCs, suggesting that these families have an inherent proclivity to form de novo piCs. Thus, our findings support aspects of both 'de novo' and 'trap' models of piC evolution. We propose that these two models represent two extreme stages along an evolutionary continuum, which begins with the emergence of piCs de novo from a few specific LTR retrotransposon insertions that subsequently expand by accretion of other TE insertions during evolution to form larger 'trap' clusters. Our study shows that piCs are evolutionarily labile and that TEs themselves are the major force driving the formation and evolution of piCs.
Affiliation(s)
- Satyam Srivastav
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, USA
- Cédric Feschotte
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, USA
- Andrew G. Clark
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, USA
6
GALA: a computational framework for de novo chromosome-by-chromosome assembly with long reads. Nat Commun 2023; 14:204. [PMID: 36639368 PMCID: PMC9839709 DOI: 10.1038/s41467-022-35670-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Accepted: 12/16/2022] [Indexed: 01/15/2023]
Abstract
High-quality genome assembly has wide applications in genetics and medical studies. However, it is still very challenging to achieve gap-free chromosome-scale assemblies using current workflows for long-read platforms. Here we report on GALA (Gap-free long-read Assembly tool), a computational framework for chromosome-based sequencing data separation and de novo assembly implemented through a multi-layer graph that identifies discordances within preliminary assemblies and partitions the data into chromosome-scale scaffolding groups. The subsequent independent assembly of each scaffolding group generates a gap-free assembly likely free from the mis-assembly errors which usually hamper existing workflows. This flexible framework also allows us to integrate data from various technologies, such as Hi-C, genetic maps, and even motif analyses to generate gap-free chromosome-scale assemblies. As a proof of principle we de novo assemble the C. elegans genome using combined PacBio and Nanopore sequencing data and a rice cultivar genome using Nanopore sequencing data from publicly available datasets. We also demonstrate the proposed method's applicability with a gap-free assembly of the human genome using PacBio high-fidelity (HiFi) long reads. Thus, our method enables straightforward assembly of genomes with multiple data sources and overcomes barriers that at present restrict the application of de novo genome assembly technology.
7
Wang L, Tracy L, Su W, Yang F, Feng Y, Silverman N, Zhang ZZZ. Retrotransposon activation during Drosophila metamorphosis conditions adult antiviral responses. Nat Genet 2022; 54:1933-1945. [PMID: 36396707 PMCID: PMC9795486 DOI: 10.1038/s41588-022-01214-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/12/2022] [Accepted: 09/29/2022] [Indexed: 11/18/2022]
Abstract
Retrotransposons are one type of mobile genetic element that abundantly reside in the genomes of nearly all animals. Their uncontrolled activation is linked to sterility, cancer and other pathologies, thereby being largely considered detrimental. Here we report that, within a specific time window of development, retrotransposon activation can license the host's immune system for future antiviral responses. We found that the mdg4 (also known as Gypsy) retrotransposon selectively becomes active during metamorphosis at the Drosophila pupal stage. At this stage, mdg4 activation educates the host's innate immune system by inducing the systemic antiviral function of the nuclear factor-κB protein Relish in a dSTING-dependent manner. Consequently, adult flies with mdg4, Relish or dSTING silenced at the pupal stage are unable to clear exogenous viruses and succumb to viral infection. Altogether, our data reveal that hosts can establish a protective antiviral response that endows a long-term benefit in pathogen warfare due to the developmental activation of mobile genetic elements.
Affiliation(s)
- Lu Wang
  - Department of Pharmacology and Cancer Biology, Duke University School of Medicine, Durham, NC, USA
  - State Key Laboratory of Molecular Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
- Lauren Tracy
  - Department of Pharmacology and Cancer Biology, Duke University School of Medicine, Durham, NC, USA
- Weijia Su
  - Department of Pharmacology and Cancer Biology, Duke University School of Medicine, Durham, NC, USA
- Fu Yang
  - Department of Pharmacology and Cancer Biology, Duke University School of Medicine, Durham, NC, USA
- Yu Feng
  - State Key Laboratory of Molecular Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
- Neal Silverman
  - Division of Infectious Diseases and Immunology, Department of Medicine, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Z Z Zhao Zhang
  - Department of Pharmacology and Cancer Biology, Duke University School of Medicine, Durham, NC, USA
  - Duke Regeneration Center, Duke University School of Medicine, Durham, NC, USA
8
Bajus M, Macko-Podgórni A, Grzebelus D, Baránek M. A review of strategies used to identify transposition events in plant genomes. Front Plant Sci 2022; 13:1080993. [PMID: 36531345 PMCID: PMC9751208 DOI: 10.3389/fpls.2022.1080993] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/26/2022] [Accepted: 11/17/2022] [Indexed: 06/17/2023]
Abstract
Transposable elements (TEs) were initially considered redundant and dubbed 'junk DNA'. However, more recently they were recognized as an essential element of genome plasticity. In nature, they frequently become active upon exposure of the host to stress conditions. Even though most transposition events are neutral or even deleterious, occasionally they may happen to be beneficial, resulting in genetic novelty providing better fitness to the host. Hence, TE mobilization may promote adaptability and, in the long run, act as a significant evolutionary force. There are many examples of TE insertions resulting in increased tolerance to stresses or in novel features of crops which are appealing to the consumer. Possibly, TE-driven de novo variability could be utilized for crop improvement. However, in order to systematically study the mechanisms of TE/host interactions, it is necessary to have suitable tools to globally monitor any ongoing TE mobilization. With the development of novel potent technologies, new high-throughput strategies for studying TE dynamics are emerging. Here, we present currently available methods applied to monitor the activity of TEs in plants. We divide them on the basis of their operational principles, the position of target molecules in the process of transposition and their ability to capture real cases of actively transposing elements. Their possible theoretical and practical drawbacks are also discussed. Finally, conceivable strategies and combinations of methods resulting in an improved performance are proposed.
Affiliation(s)
- Marko Bajus
  - Mendeleum—Institute of Genetics, Faculty of Horticulture, Mendel University in Brno, Lednice, Czechia
- Alicja Macko-Podgórni
  - Department of Plant Biology and Biotechnology, Faculty of Biotechnology and Horticulture, University of Agriculture in Krakow, Kraków, Poland
- Dariusz Grzebelus
  - Department of Plant Biology and Biotechnology, Faculty of Biotechnology and Horticulture, University of Agriculture in Krakow, Kraków, Poland
- Miroslav Baránek
  - Mendeleum—Institute of Genetics, Faculty of Horticulture, Mendel University in Brno, Lednice, Czechia
9
ONT-Based Alternative Assemblies Impact on the Annotations of Unique versus Repetitive Features in the Genome of a Romanian Strain of Drosophila melanogaster. Int J Mol Sci 2022; 23:ijms232314892. [PMID: 36499217 PMCID: PMC9741293 DOI: 10.3390/ijms232314892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 11/21/2022] [Accepted: 11/24/2022] [Indexed: 11/29/2022]
Abstract
To date, different strategies of whole-genome sequencing (WGS) have been developed in order to understand genome structure and function. However, the analysis of genomic sequences obtained from natural populations is challenging and the biological interpretation of sequencing data remains the main issue. The MinION device developed by Oxford Nanopore Technologies (ONT) is able to generate long reads with minimal costs and time requirements. These valuable assets qualify it as a suitable method for performing WGS, especially in small laboratories. The long reads resulting from this sequencing approach can cover large structural variants and repetitive sequences commonly present in the genomes of eukaryotes. Using MinION, we performed two WGS assessments of a Romanian local strain of Drosophila melanogaster, referred to as Horezu_LaPeri (Horezu). In total, 1,317,857 reads with a size of 8.9 gigabytes (Gb) were generated. Canu and Flye de novo assembly tools were employed to obtain four distinct assemblies with both unfiltered and filtered reads, achieving maximum reference genome coverages of 94.8% (Canu) and 91.4% (Flye). In order to test the quality of these assemblies, we performed a two-step evaluation. Firstly, we considered the BUSCO scores and queried a supplemental set of genes using BLAST. Subsequently, we appraised the total content of natural transposons (NTs) relative to the reference genome (ISO1 strain) and mapped the mdg1 retroelement as a resolution assayer. Our results reveal that filtered data provide only slightly enhanced results for gene identification, but the use of unfiltered data had a consistent positive impact on the global evaluation of the NT content. Our comparative studies also revealed differences between Flye and Canu assemblies regarding the annotation of unique versus repetitive genomic features. In our hands, Flye proved to be moderately better for gene identification, while Canu clearly outperformed Flye for NT analysis. Data concerning the NT content were compared to those obtained with ONT for the D. melanogaster ISO1 strain, revealing that our strategy led to better results. Additionally, the parameters of our ONT reads and assemblies are similar to those reported for ONT experiments performed on various model organisms, revealing that our assembly data are appropriate for a proficient annotation of the Horezu genome.
10
Han S, Dias GB, Basting PJ, Viswanatha R, Perrimon N, Bergman C. Local assembly of long reads enables phylogenomics of transposable elements in a polyploid cell line. Nucleic Acids Res 2022; 50:e124. [PMID: 36156149 PMCID: PMC9757076 DOI: 10.1093/nar/gkac794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 07/21/2022] [Accepted: 09/16/2022] [Indexed: 12/24/2022]
Abstract
Animal cell lines often undergo extreme genome restructuring events, including polyploidy and segmental aneuploidy that can impede de novo whole-genome assembly (WGA). In some species like Drosophila, cell lines also exhibit massive proliferation of transposable elements (TEs). To better understand the role of transposition during animal cell culture, we sequenced the genome of the tetraploid Drosophila S2R+ cell line using long-read and linked-read technologies. WGAs for S2R+ were highly fragmented and generated variable estimates of TE content across sequencing and assembly technologies. We therefore developed a novel WGA-independent bioinformatics method called TELR that identifies, locally assembles, and estimates allele frequency of TEs from long-read sequence data (https://github.com/bergmanlab/telr). Application of TELR to a ∼130x PacBio dataset for S2R+ revealed many haplotype-specific TE insertions that arose by transposition after initial cell line establishment and subsequent tetraploidization. Local assemblies from TELR also allowed phylogenetic analysis of paralogous TEs, which revealed that proliferation of TE families in vitro can be driven by single or multiple source lineages. Our work provides a model for the analysis of TEs in complex heterozygous or polyploid genomes that are recalcitrant to WGA and yields new insights into the mechanisms of genome evolution in animal cell culture.
Affiliation(s)
- Preston J Basting
  - Institute of Bioinformatics, University of Georgia, 120 E. Green St., Athens, GA, USA
- Raghuvir Viswanatha
  - Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA, USA
- Norbert Perrimon
  - Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA, USA
  - Howard Hughes Medical Institute, Boston, MA, USA
- Casey M Bergman
  - To whom correspondence should be addressed. Tel: +1 706 542 1764; Fax: +1 706 542 3910
11
Lee Y, Ha U, Moon S. Ongoing endeavors to detect mobilization of transposable elements. BMB Rep 2022. [PMID: 35725016 PMCID: PMC9340088 DOI: 10.5483/bmbrep.2022.55.7.088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Transposable elements (TEs) are DNA sequences capable of mobilization from one location to another in the genome. Since the discovery of the ‘Dissociation (Ds) locus’ by Barbara McClintock in maize (1), mounting evidence in the era of genomics indicates that a significant fraction of most eukaryotic genomes is composed of TE sequences, which are involved in various biological processes such as development, physiology, disease and evolution. Although technical advances in genomics have revealed numerous functional impacts of TEs across species, our understanding of TEs is still evolving because of the challenges posed by their complexity and abundance in the genome. In this mini-review, we briefly summarize the biology of TEs and their impacts on the host genome, emphasizing the importance of understanding the TE landscape in the genome. Then, we introduce recent endeavors, especially in vivo retrotransposition assays and long-read sequencing technology for identifying de novo insertions and TE polymorphisms, which will broaden our knowledge of the extraordinary relationship between these genomic cohabitants and their host.
Affiliation(s)
- Yujeong Lee
  - Department of Biological Sciences, Kangwon National University, Chuncheon 24341, Korea
- Una Ha
  - Department of Biological Sciences, Kangwon National University, Chuncheon 24341, Korea
- Sungjin Moon
  - Department of Biological Sciences, Kangwon National University, Chuncheon 24341, Korea
12
Stanek TJ, Cao W, Mehra RM, Ellison CE. Sex-specific variation in R-loop formation in Drosophila melanogaster. PLoS Genet 2022; 18:e1010268. [PMID: 35687614 PMCID: PMC9223372 DOI: 10.1371/journal.pgen.1010268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Revised: 06/23/2022] [Accepted: 05/22/2022] [Indexed: 11/18/2022]
Abstract
R-loops are three-stranded nucleotide structures consisting of a DNA:RNA hybrid and a displaced ssDNA non-template strand. Previous work suggests that R-loop formation is primarily determined by the thermodynamics of DNA:RNA binding, which are governed by base composition (e.g., GC skew) and transcription-induced DNA superhelicity. However, R-loops have been described at genomic locations that lack these properties, suggesting that they may serve other context-specific roles. To better understand the genetic determinants of R-loop formation, we have characterized the Drosophila melanogaster R-loop landscape across strains and between sexes using DNA:RNA immunoprecipitation followed by high-throughput sequencing (DRIP-seq). We find that R-loops are associated with sequence motifs that are G-rich or exhibit G/C skew, as well as highly expressed genes, tRNAs, and small nuclear RNAs, consistent with a role for DNA sequence and torsion in R-loop specification. However, we also find motifs associated with R-loops that are A/T-rich and lack G/C skew as well as a subset of R-loops that are enriched in polycomb-repressed chromatin. Differential enrichment analysis reveals a small number of sex-biased R-loops: while non-differentially enriched and male-enriched R-loops form at similar genetic features and chromatin states and contain similar sequence motifs, female-enriched R-loops form at unique genetic features, chromatin states, and sequence motifs and are associated with genes that show ovary-biased expression. Male-enriched R-loops are most abundant on the dosage-compensated X chromosome, where R-loops appear stronger compared to autosomal R-loops. R-loop-containing genes on the X chromosome are dosage-compensated yet show lower MOF binding and reduced H4K16ac compared to R-loop-absent genes, suggesting that H4K16ac or MOF may attenuate R-loop formation. 
Collectively, these results suggest that R-loop formation in vivo is not fully explained by DNA sequence and topology and raise the possibility that a distinct subset of these hybrid structures plays an important role in the establishment and maintenance of epigenetic differences between sexes.
Affiliation(s)
- Timothy J. Stanek
  - Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States of America
  - Department of Pathology, Robert Wood Johnson Medical School, Piscataway, New Jersey, United States of America
- Weihuan Cao
  - Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States of America
- Rohan M Mehra
  - Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States of America
- Christopher E. Ellison
  - Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States of America
13
Rech GE, Radío S, Guirao-Rico S, Aguilera L, Horvath V, Green L, Lindstadt H, Jamilloux V, Quesneville H, González J. Population-scale long-read sequencing uncovers transposable elements associated with gene expression variation and adaptive signatures in Drosophila. Nat Commun 2022; 13:1948. [PMID: 35413957 PMCID: PMC9005704 DOI: 10.1038/s41467-022-29518-8] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 03/22/2021] [Accepted: 03/15/2022] [Indexed: 12/16/2022]
Abstract
High-quality reference genomes are crucial to understanding genome function, structure and evolution. The availability of reference genomes has allowed us to start inferring the role of genetic variation in biology, disease, and biodiversity conservation. However, analyses across organisms demonstrate that a single reference genome is not enough to capture the global genetic diversity present in populations. In this work, we generate 32 high-quality reference genomes for the well-known model species D. melanogaster and focus on the identification and analysis of transposable element variation, as transposable elements are the most common type of structural variant. We show that integrating the genetic variation across natural populations from five climatic regions increases the number of detected insertions by 58%. Moreover, 26% to 57% of the insertions identified using long reads were missed by short-read methods. We also identify hundreds of transposable elements associated with gene expression variation and new TE variants likely to contribute to adaptive evolution in this species. Our results highlight the importance of incorporating the genetic variation present in natural populations into genomic studies, which is essential if we are to understand how genomes function and evolve.
Affiliation(s)
- Gabriel E Rech
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain
- Santiago Radío
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain
- Sara Guirao-Rico
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain
- Laura Aguilera
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain
- Vivien Horvath
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain
- Llewellyn Green
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain
- Hannah Lindstadt
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain
- Josefa González
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003, Barcelona, Spain
14
Lee YCG. Synergistic epistasis of the deleterious effects of transposable elements. Genetics 2022; 220:iyab211. [PMID: 34888644 PMCID: PMC9097265 DOI: 10.1093/genetics/iyab211] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/14/2021] [Accepted: 11/10/2021] [Indexed: 11/12/2022]
Abstract
The replicative nature and generally deleterious effects of transposable elements (TEs) raise an outstanding question about how TE copy number is stably contained in host populations. Classic theoretical analyses predict that, when the decline in fitness due to each additional TE insertion is greater than linear, or when there is synergistic epistasis, selection against TEs can result in a stable equilibrium of TE copy number. While several mechanisms are predicted to yield synergistic deleterious effects of TEs, we lack empirical investigations of the presence of such epistatic interactions. Purifying selection with synergistic epistasis generates repulsion linkage between deleterious alleles. We investigated this population genetic signal in the likely ancestral Drosophila melanogaster population and found evidence supporting the presence of synergistic epistasis among TE insertions, especially TEs expected to exert large fitness impacts. Even though synergistic epistasis of TEs has been predicted to arise through ectopic recombination and TE-mediated epigenetic silencing mechanisms, we only found mixed support for the associated predictions. We observed signals of synergistic epistasis for a large number of TE families, which is consistent with the expectation that such epistatic interaction mainly happens among copies of the same family. Curiously, significant repulsion linkage was also found among TE insertions from different families, suggesting the possibility that synergism of TEs' deleterious fitness effects could arise above the family level and through mechanisms similar to those of simple mutations. Our findings set the stage for investigating the prevalence and importance of epistatic interactions in the evolutionary dynamics of TEs.
Affiliation(s)
- Yuh Chwen G Lee
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA 92697, USA
15
Said I, McGurk MP, Clark AG, Barbash DA. Patterns of piRNA Regulation in Drosophila Revealed through Transposable Element Clade Inference. Mol Biol Evol 2022; 39:msab336. [PMID: 34921315 PMCID: PMC8788220 DOI: 10.1093/molbev/msab336] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
Transposable elements (TEs) are self-replicating "genetic parasites" ubiquitous to eukaryotic genomes. In addition to conflict between TEs and their host genomes, TEs of the same family are in competition with each other. They compete for the same genomic niches while experiencing the same regime of copy-number selection. This suggests that competition among TEs may favor the emergence of new variants that can outcompete their ancestral forms. To investigate the sequence evolution of TEs, we developed a method to infer clades: collections of TEs that share SNP variants and represent distinct TE family lineages. We applied this method to a panel of 85 Drosophila melanogaster genomes and found that the genetic variation of several TE families shows significant population structure that arises from the population-specific expansions of single clades. We used population genetic theory to classify these clades into younger versus older clades and found that younger clades are associated with a greater abundance of sense and antisense piRNAs per copy than older ones. Further, we find that the abundance of younger, but not older clades, is positively correlated with antisense piRNA production, suggesting a general pattern where hosts preferentially produce antisense piRNAs from recently active TE variants. Together these findings suggest a pattern whereby new TE variants arise by mutation and then increase in copy number, followed by the host producing antisense piRNAs that may be used to silence these emerging variants.
Affiliation(s)
- Iskander Said
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
- Michael P McGurk
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
- Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
- Daniel A Barbash
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
16
Wierzbicki F, Schwarz F, Cannalonga O, Kofler R. Novel quality metrics allow identifying and generating high-quality assemblies of piRNA clusters. Mol Ecol Resour 2022; 22:102-121. [PMID: 34181811 DOI: 10.1111/1755-0998.13455] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/15/2020] [Revised: 04/30/2021] [Accepted: 06/14/2021] [Indexed: 12/30/2022]
Abstract
In most animals, it is thought that the proliferation of a transposable element (TE) is stopped when the TE jumps into a piRNA cluster. Despite this central importance, little is known about the composition and the evolutionary dynamics of piRNA clusters. This is largely because piRNA clusters are notoriously difficult to assemble as they are frequently composed of highly repetitive DNA. With long reads, we may finally be able to obtain reliable assemblies of piRNA clusters. Unfortunately, it is unclear how to generate and identify the best assemblies, as many assembly strategies exist and standard quality metrics are ignorant of TEs. To address these problems, we introduce several novel quality metrics that assess: (a) the fraction of completely assembled piRNA clusters, (b) the quality of the assembled clusters and (c) whether an assembly captures the overall TE landscape of an organism (i.e. the abundance, the number of SNPs and internal deletions of all TE families). The requirements for computing these metrics vary, ranging from annotations of piRNA clusters to consensus sequences of TEs and genomic sequencing data. Using these novel metrics, we evaluate the effect of assembly algorithm, polishing, read length, coverage, residual polymorphisms and finally identify strategies that yield reliable assemblies of piRNA clusters. Based on an optimized approach, we provide assemblies for the two Drosophila melanogaster strains Canton-S and Pi2. About 80% of known piRNA clusters were assembled in both strains. Finally, we demonstrate the generality of our approach by extending our metrics to humans and Arabidopsis thaliana.
Affiliation(s)
- Filip Wierzbicki
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria; Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna, Austria
- Florian Schwarz
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria; Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna, Austria
- Robert Kofler
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
17
Zamyatin A, Avdeyev P, Liang J, Sharma A, Chen C, Lukyanchikova V, Alexeev N, Tu Z, Alekseyev MA, Sharakhov IV. Chromosome-level genome assemblies of the malaria vectors Anopheles coluzzii and Anopheles arabiensis. Gigascience 2021; 10:giab017. [PMID: 33718948 PMCID: PMC7957348 DOI: 10.1093/gigascience/giab017] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/28/2020] [Revised: 01/01/2021] [Accepted: 01/23/2021] [Indexed: 12/04/2022]
Abstract
BACKGROUND: Anopheles coluzzii and Anopheles arabiensis belong to the Anopheles gambiae complex and are among the major malaria vectors in sub-Saharan Africa. However, chromosome-level reference genome assemblies are still lacking for these medically important mosquito species. FINDINGS: In this study, we produced de novo chromosome-level genome assemblies for A. coluzzii and A. arabiensis using the long-read Oxford Nanopore sequencing technology and the Hi-C scaffolding approach. We obtained 273.4 and 256.8 Mb of the total assemblies for A. coluzzii and A. arabiensis, respectively. Each assembly consists of 3 chromosome-scale scaffolds (X, 2, 3), complete mitochondrion, and unordered contigs identified as autosomal pericentromeric DNA, X pericentromeric DNA, and Y sequences. Comparison of these assemblies with the existing assemblies for these species demonstrated that we obtained improved reference-quality genomes. The new assemblies allowed us to identify genomic coordinates for the breakpoint regions of fixed and polymorphic chromosomal inversions in A. coluzzii and A. arabiensis. CONCLUSION: The new chromosome-level assemblies will facilitate functional and population genomic studies in A. coluzzii and A. arabiensis. The presented assembly pipeline will accelerate progress toward creating high-quality genome references for other disease vectors.
Affiliation(s)
- Anton Zamyatin
- Computer Technologies Laboratory, ITMO University, Kronverkskiy Prospekt 49-A, Saint Petersburg 197101, Russia
- Pavel Avdeyev
- Department of Mathematics, The George Washington University, 801 22nd Street NW, Washington, DC 20052, USA
- Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, 800 22nd Street NW, Washington, DC 20052, USA
- Jiangtao Liang
- Department of Entomology, Virginia Polytechnic Institute and State University, 170 Drillfield Drive, Blacksburg, VA 24061, USA
- Fralin Life Sciences Institute, Virginia Polytechnic Institute and State University, 360 West Campus Drive, Blacksburg, VA 24061, USA
- Atashi Sharma
- Fralin Life Sciences Institute, Virginia Polytechnic Institute and State University, 360 West Campus Drive, Blacksburg, VA 24061, USA
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- Chujia Chen
- Fralin Life Sciences Institute, Virginia Polytechnic Institute and State University, 360 West Campus Drive, Blacksburg, VA 24061, USA
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- Varvara Lukyanchikova
- Department of Entomology, Virginia Polytechnic Institute and State University, 170 Drillfield Drive, Blacksburg, VA 24061, USA
- Fralin Life Sciences Institute, Virginia Polytechnic Institute and State University, 360 West Campus Drive, Blacksburg, VA 24061, USA
- Institute of Cytology and Genetics, the Siberian Division of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
- Nikita Alexeev
- Computer Technologies Laboratory, ITMO University, Kronverkskiy Prospekt 49-A, Saint Petersburg 197101, Russia
- Zhijian Tu
- Fralin Life Sciences Institute, Virginia Polytechnic Institute and State University, 360 West Campus Drive, Blacksburg, VA 24061, USA
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- Max A Alekseyev
- Department of Mathematics, The George Washington University, 801 22nd Street NW, Washington, DC 20052, USA
- Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, 800 22nd Street NW, Washington, DC 20052, USA
- Igor V Sharakhov
- Department of Entomology, Virginia Polytechnic Institute and State University, 170 Drillfield Drive, Blacksburg, VA 24061, USA
- Fralin Life Sciences Institute, Virginia Polytechnic Institute and State University, 360 West Campus Drive, Blacksburg, VA 24061, USA
18
Ellison CE, Kagda MS, Cao W. Telomeric TART elements target the piRNA machinery in Drosophila. PLoS Biol 2020; 18:e3000689. [PMID: 33347429 PMCID: PMC7785250 DOI: 10.1371/journal.pbio.3000689] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/04/2020] [Revised: 01/05/2021] [Accepted: 12/10/2020] [Indexed: 11/23/2022]
Abstract
Coevolution between transposable elements (TEs) and their hosts can be antagonistic, where TEs evolve to avoid silencing and the host responds by reestablishing TE suppression, or mutualistic, where TEs are co-opted to benefit their host. The TART-A TE functions as an important component of Drosophila telomeres but has also reportedly inserted into the Drosophila melanogaster nuclear export factor gene nxf2. We find that, rather than inserting into nxf2, TART-A has actually captured a portion of nxf2 sequence. We show that TART-A produces abundant Piwi-interacting small RNAs (piRNAs), some of which are antisense to the nxf2 transcript, and that the TART-like region of nxf2 is evolving rapidly. Furthermore, in D. melanogaster, TART-A is present at higher copy numbers, and nxf2 shows reduced expression, compared to the closely related species Drosophila simulans. We propose that capturing nxf2 sequence allowed TART-A to target the nxf2 gene for piRNA-mediated repression and that these 2 elements are engaged in antagonistic coevolution despite the fact that TART-A is serving a critical role for its host genome. This study shows that a specialized Drosophila retrotransposon that functions as a telomere has captured a portion of a host piRNA gene, which may allow it to evade silencing.
Affiliation(s)
- Christopher E. Ellison
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States of America
- Meenakshi S. Kagda
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States of America
- Weihuan Cao
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States of America
19
Torosin NS, Anand A, Golla TR, Cao W, Ellison CE. 3D genome evolution and reorganization in the Drosophila melanogaster species group. PLoS Genet 2020; 16:e1009229. [PMID: 33284803 PMCID: PMC7746282 DOI: 10.1371/journal.pgen.1009229] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/14/2020] [Revised: 12/17/2020] [Accepted: 10/27/2020] [Indexed: 01/17/2023]
Abstract
Topologically associating domains, or TADs, are functional units that organize chromosomes into 3D structures of interacting chromatin. TADs play an important role in regulating gene expression by constraining enhancer-promoter contacts and there is evidence that deletion of TAD boundaries leads to aberrant expression of neighboring genes. While the mechanisms of TAD formation have been well-studied, current knowledge on the patterns of TAD evolution across species is limited. Due to the integral role TADs play in gene regulation, their structure and organization are expected to be conserved during evolution. However, more recent research suggests that TAD structures diverge relatively rapidly. We use Hi-C chromosome conformation capture to measure evolutionary conservation of whole TADs and TAD boundary elements between D. melanogaster and D. triauraria, two early-branching species from the melanogaster species group which diverged ∼15 million years ago. We find that the majority of TADs have been reorganized since the common ancestor of D. melanogaster and D. triauraria, via a combination of chromosomal rearrangements and gain/loss of TAD boundaries. TAD reorganization between these two species is associated with a localized effect on gene expression, near the site of disruption. By separating TADs into subtypes based on their chromatin state, we find that different subtypes are evolving under different evolutionary forces. TADs enriched for broadly expressed, transcriptionally active genes are evolving rapidly, potentially due to positive selection, whereas TADs enriched for developmentally-regulated genes remain conserved, presumably due to their importance in restricting gene-regulatory element interactions. These results provide novel insight into the evolutionary dynamics of TADs and help to reconcile contradictory reports related to the evolutionary conservation of TADs and whether changes in TAD structure affect gene expression.
Affiliation(s)
- Nicole S. Torosin
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States
- Aparna Anand
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States
- Tirupathi Rao Golla
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States
- Weihuan Cao
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States
- Christopher E. Ellison
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, United States
20
Amarasinghe SL, Su S, Dong X, Zappia L, Ritchie ME, Gouil Q. Opportunities and challenges in long-read sequencing data analysis. Genome Biol 2020; 21:30. [PMID: 32033565 PMCID: PMC7006217 DOI: 10.1186/s13059-020-1935-5] [Citation(s) in RCA: 781] [Impact Index Per Article: 195.3] [Received: 08/28/2019] [Accepted: 01/15/2020] [Indexed: 12/11/2022]
Abstract
Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.
Affiliation(s)
- Shanika L. Amarasinghe
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
- Shian Su
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
- Xueyi Dong
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
- Luke Zappia
- Bioinformatics, Murdoch Children’s Research Institute, Parkville, 3052 Australia
- School of Biosciences, Faculty of Science, The University of Melbourne, Parkville, 3010 Australia
- Matthew E. Ritchie
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
- School of Mathematics and Statistics, The University of Melbourne, Parkville, 3010 Australia
- Quentin Gouil
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia