1
Katsanou A, Kostoulas C, Liberopoulos E, Tsatsoulis A, Georgiou I, Tigas S. Retrotransposons and Diabetes Mellitus. Epigenomes 2024; 8:35. [PMID: 39311137 PMCID: PMC11417941 DOI: 10.3390/epigenomes8030035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Revised: 08/01/2024] [Accepted: 09/04/2024] [Indexed: 09/26/2024] [Open Access]
Abstract
Retrotransposons are invasive genetic elements, which replicate by copying and pasting themselves throughout the genome in a process called retrotransposition. The most abundant retrotransposons by number in the human genome are Alu and LINE-1 elements, which comprise approximately 40% of the human genome. The ability of retrotransposons to expand and colonize eukaryotic genomes has rendered them evolutionarily successful and is responsible for creating genetic alterations leading to significant impacts on their hosts. Previous research suggested that hypomethylation of Alu and LINE-1 elements is associated with global hypomethylation and genomic instability in several types of cancer and diseases, such as neurodegenerative diseases, obesity, osteoporosis, and diabetes mellitus (DM). With the advancement of sequencing technologies and computational tools, the study of the retrotransposon's association with physiology and diseases is becoming a hot topic among researchers. Quantifying Alu and LINE-1 methylation is thought to serve as a surrogate measurement of global DNA methylation level. Although Alu and LINE-1 hypomethylation appears to serve as a cellular senescence biomarker promoting genomic instability, there is sparse information available regarding their potential functional and biological significance in DM. This review article summarizes the current knowledge on the involvement of the main epigenetic alterations in the methylation status of Alu and LINE-1 retrotransposons and their potential role as epigenetic markers of global DNA methylation in the pathogenesis of DM.
Affiliation(s)
- Andromachi Katsanou
  - Department of Endocrinology, University of Ioannina, 45110 Ioannina, Greece
  - Department of Internal Medicine, Hatzikosta General Hospital, 45445 Ioannina, Greece
- Charilaos Kostoulas
  - Laboratory of Medical Genetics, Faculty of Medicine, School of Health Sciences, University of Ioannina, 45110 Ioannina, Greece
- Evangelos Liberopoulos
  - First Department of Propaedeutic Internal Medicine, Medical School, National and Kapodistrian University of Athens, Laiko General Hospital, 11527 Athens, Greece
- Agathocles Tsatsoulis
  - Department of Endocrinology, University of Ioannina, 45110 Ioannina, Greece
- Ioannis Georgiou
  - Laboratory of Medical Genetics, Faculty of Medicine, School of Health Sciences, University of Ioannina, 45110 Ioannina, Greece
- Stelios Tigas
  - Department of Endocrinology, University of Ioannina, 45110 Ioannina, Greece

2
Phillips AR. Variant calling in polyploids for population and quantitative genetics. Applications in Plant Sciences 2024; 12:e11607. [PMID: 39184203 PMCID: PMC11342233 DOI: 10.1002/aps3.11607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 03/03/2024] [Accepted: 04/10/2024] [Indexed: 08/27/2024]
Abstract
Advancements in genome assembly and sequencing technology have made whole genome sequence (WGS) data and reference genomes accessible to study polyploid species. Compared to popular reduced-representation sequencing approaches, the genome-wide coverage and greater marker density provided by WGS data can greatly improve our understanding of polyploid species and polyploid biology. However, biological features that make polyploid species interesting also pose challenges in read mapping, variant identification, and genotype estimation. Accounting for characteristics in variant calling like allelic dosage uncertainty, homology between subgenomes, and variance in chromosome inheritance mode can reduce errors. Here, I discuss the challenges of variant calling in polyploid WGS data and discuss where potential solutions can be integrated into a standard variant calling pipeline.
Affiliation(s)
- Alyssa R. Phillips
  - Department of Evolution and Ecology, University of California, Davis, Davis, 95616, California, USA

3
Lee B, Park J, Voshall A, Maury E, Kang Y, Kim YJ, Lee JY, Shim HR, Kim HJ, Lee JW, Jung MH, Kim SC, Chu HBK, Kim DW, Kim M, Choi EJ, Hwang OK, Lee HW, Ha K, Choi JK, Kim Y, Choi Y, Park WY, Lee EA. Pan-cancer analysis reveals multifaceted roles of retrotransposon-fusion RNAs. bioRxiv 2023:2023.10.16.562422. [PMID: 37905014 PMCID: PMC10614793 DOI: 10.1101/2023.10.16.562422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Transposon-derived transcripts are abundant in RNA sequences, yet their landscape and function, especially for fusion transcripts derived from unannotated or somatically acquired transposons, remain underexplored. Here, we developed a new bioinformatic tool to detect transposon-fusion transcripts in RNA-sequencing data and performed a pan-cancer analysis of 10,257 cancer samples across 34 cancer types as well as 3,088 normal tissue samples. We identified 52,277 cancer-specific fusions with ~30 events per cancer and hotspot loci within transposons vulnerable to fusion formation. Exonization of intronic transposons was the most prevalent type of genic fusion, while somatic L1 insertions constituted a small fraction of cancer-specific fusions. Source L1s and HERVs, but not Alus, showed decreased DNA methylation in cancer upon fusion formation. Overall, cancer-specific L1 fusions were enriched in tumor suppressors while Alu fusions were enriched in oncogenes, including recurrent Alu fusions in EZH2 predictive of patient survival. We also demonstrated that transposon-derived peptides triggered CD8+ T-cell activation to an extent comparable to EBV. Our findings reveal distinct epigenetic and tumorigenic mechanisms underlying transposon fusions across different families and highlight transposons as novel therapeutic targets and the source of potent neoantigens.
Affiliation(s)
- Boram Lee
  - Samsung Genome Institute, Samsung Medical Center, Seoul, Republic of Korea
  - Department of Pathology and Translational Genomics, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Junseok Park
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Adam Voshall
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Eduardo Maury
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Bioinformatics and Integrative Genomics Program; Harvard/MIT MD-PhD Program, Harvard Medical School, Boston, MA, USA
- Yeeok Kang
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Yoen Jeong Kim
  - Samsung Genome Institute, Samsung Medical Center, Seoul, Republic of Korea
- Jin-Young Lee
  - Cancer Genome Research Center (CGRC), Yonsei University, Seoul, Republic of Korea
- Hye-Ran Shim
  - Cancer Genome Research Center (CGRC), Yonsei University, Seoul, Republic of Korea
- Hyo-Ju Kim
  - Cancer Genome Research Center (CGRC), Yonsei University, Seoul, Republic of Korea
- Jung-Woo Lee
  - Cancer Genome Research Center (CGRC), Yonsei University, Seoul, Republic of Korea
- Min-Hyeok Jung
  - Cancer Genome Research Center (CGRC), Yonsei University, Seoul, Republic of Korea
- Si-Cho Kim
  - Cancer Genome Research Center (CGRC), Yonsei University, Seoul, Republic of Korea
- Hoang Bao Khanh Chu
  - Cancer Genome Research Center (CGRC), Yonsei University, Seoul, Republic of Korea
- Da-Won Kim
  - Cancer Genome Research Center (CGRC), Yonsei University, Seoul, Republic of Korea
- Minjeong Kim
  - Cancer Genome Research Center (CGRC), Yonsei University, Seoul, Republic of Korea
- Eun-Ji Choi
  - Cancer Genome Research Center (CGRC), Yonsei University, Seoul, Republic of Korea
- Ok Kyung Hwang
  - New Drug Development Center, KBiohealth, Cheongju-Si, Chungbuk, Republic of Korea
- Ho Won Lee
  - New Drug Development Center, KBiohealth, Cheongju-Si, Chungbuk, Republic of Korea
- Kyungsoo Ha
  - New Drug Development Center, KBiohealth, Cheongju-Si, Chungbuk, Republic of Korea
- Jung Kyoon Choi
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Yongjoon Kim
  - Cancer Genome Research Center (CGRC), Yonsei University, Seoul, Republic of Korea
- Yoonjoo Choi
  - Combinatorial Tumor Immunotherapy MRC, Chonnam National University Medical School, Hwasun, Republic of Korea
- Woong-Yang Park
  - Samsung Genome Institute, Samsung Medical Center, Seoul, Republic of Korea
- Eunjung Alice Lee
  - Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA

4
Alkailani MI, Gibbings D. The Regulation and Immune Signature of Retrotransposons in Cancer. Cancers (Basel) 2023; 15:4340. [PMID: 37686616 PMCID: PMC10486412 DOI: 10.3390/cancers15174340] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/25/2023] [Revised: 08/14/2023] [Accepted: 08/18/2023] [Indexed: 09/10/2023] [Open Access]
Abstract
Advances in sequencing technologies and the bioinformatic analysis of big data facilitate the study of jumping genes' activity in the human genome in cancer from a broad perspective. Retrotransposons, which move from one genomic site to another by a copy-and-paste mechanism, are regulated by various molecular pathways that may be disrupted during tumorigenesis. Active retrotransposons can stimulate type I IFN responses. Although accumulated evidence suggests that retrotransposons can induce inflammation, the research investigating the exact mechanism of triggering these responses is ongoing. Understanding these mechanisms could improve the therapeutic management of cancer through the use of retrotransposon-induced inflammation as a tool to instigate immune responses to tumors.
Affiliation(s)
- Maisa I. Alkailani
  - College of Health and Life Sciences, Hamad Bin Khalifa University, Qatar Foundation, Doha P.O. Box 34110, Qatar
- Derrick Gibbings
  - Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada

5
Chen J, Basting PJ, Han S, Garfinkel DJ, Bergman CM. Reproducible evaluation of transposable element detectors with McClintock 2 guides accurate inference of Ty insertion patterns in yeast. Mob DNA 2023; 14:8. [PMID: 37452430 PMCID: PMC10347736 DOI: 10.1186/s13100-023-00296-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/21/2023] [Accepted: 06/09/2023] [Indexed: 07/18/2023] [Open Access]
Abstract
BACKGROUND Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors. RESULTS We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide consistent estimates of ∼50 non-reference TE insertions per strain and that Ty2 has the highest number of non-reference TE insertions in a species-wide panel of ∼1000 yeast genomes. Finally, we show that best-in-class predictors for yeast applied to resequencing data have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences revealed previously for experimentally-induced Ty1 insertions to spontaneous insertions for other copia-superfamily retrotransposons in yeast.
CONCLUSION McClintock ( https://github.com/bergmanlab/mcclintock/ ) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors in other species.
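The abstract above reports recall and precision of TE detectors on simulated data. As a rough illustration of how such scores can be computed, the sketch below compares predicted insertions against a simulated truth set. It is not McClintock's own evaluation code: the function name `score_predictions`, the 5 bp matching window, and the example coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

# (chromosome, position, TE family); a hypothetical record layout
Insertion = Tuple[str, int, str]


def score_predictions(truth: List[Insertion],
                      predicted: List[Insertion],
                      window: int = 5) -> Tuple[float, float]:
    """Return (precision, recall), matching each true insertion at most once.

    A prediction counts as a true positive when it lies on the same
    chromosome, names the same TE family, and falls within `window` bp
    of a simulated insertion that has not been matched yet.
    """
    unmatched = list(truth)
    true_pos = 0
    for chrom, pos, family in predicted:
        for cand in unmatched:
            same_site = cand[0] == chrom and cand[2] == family
            if same_site and abs(cand[1] - pos) <= window:
                unmatched.remove(cand)  # safe: we break right after
                true_pos += 1
                break
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Made-up example data, not taken from the paper.
truth = [("chrI", 1000, "Ty1"), ("chrII", 5000, "Ty2"), ("chrIV", 20000, "Ty4")]
predicted = [("chrI", 1003, "Ty1"), ("chrII", 5100, "Ty2"),
             ("chrIV", 19998, "Ty4"), ("chrV", 42, "Ty1")]
precision, recall = score_predictions(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of four predictions match, so precision is 0.50 and recall is 2/3. Tightening or loosening `window` changes how strictly "precise location" is interpreted.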
Affiliation(s)
- Jingxuan Chen
  - Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Shunhua Han
  - Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- David J. Garfinkel
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Casey M. Bergman
  - Institute of Bioinformatics, University of Georgia, Athens, GA, USA
  - Department of Genetics, University of Georgia, Athens, GA, USA

6
Saito-Adachi M, Hama N, Totoki Y, Nakamura H, Arai Y, Hosoda F, Rokutan H, Yachida S, Kato M, Fukagawa A, Shibata T. Oncogenic structural aberration landscape in gastric cancer genomes. Nat Commun 2023; 14:3688. [PMID: 37349325 PMCID: PMC10287692 DOI: 10.1038/s41467-023-39263-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/09/2022] [Accepted: 06/05/2023] [Indexed: 06/24/2023] [Open Access]
Abstract
Structural variants (SVs) are responsible for driver events in gastric cancer (GC); however, their patterns and processes remain poorly understood. Here, we examine 170 GC whole genomes to unravel the oncogenic structural aberration landscape in GC genomes and identify six rearrangement signatures (RSs). Non-random combinations of RSs elucidate distinctive GC subtypes comprising one or a few dominant RS that are associated with specific driver events (BRCA1/2 defects, mismatch repair deficiency, and TP53 mutation) and epidemiological backgrounds. Twenty-seven SV hotspots are identified as GC driver candidates. SV hotspots frequently constitute complexly clustered SVs involved in driver gene amplification, such as ERBB2, CCNE1, and FGFR2. Further deconstruction of the locally clustered SVs uncovers amplicon-generating profiles characterized by super-large SVs and intensive segmental amplifications, contributing to the extensive amplification of GC oncogenes. Comprehensive analyses using adjusted SV allele frequencies indicate the significant involvement of extra-chromosomal DNA in processes linked to specific RSs.
Affiliation(s)
- Mihoko Saito-Adachi
  - Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
- Natsuko Hama
  - Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
- Yasushi Totoki
  - Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
  - Department of Cancer Genome Informatics, Graduate School of Medicine, Osaka University, Osaka, Japan
- Hiromi Nakamura
  - Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
- Yasuhito Arai
  - Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
- Fumie Hosoda
  - Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
- Hirofumi Rokutan
  - Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
  - Department of Pathology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Shinichi Yachida
  - Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
  - Department of Cancer Genome Informatics, Graduate School of Medicine, Osaka University, Osaka, Japan
- Mamoru Kato
  - Division of Bioinformatics, National Cancer Center Research Institute, Tokyo, Japan
- Akihiko Fukagawa
  - Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
- Tatsuhiro Shibata
  - Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
  - Laboratory of Molecular Medicine, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan

7
Wang Y, McNeil P, Abdulazeez R, Pascual M, Johnston SE, Keightley PD, Obbard DJ. Variation in mutation, recombination, and transposition rates in Drosophila melanogaster and Drosophila simulans. Genome Res 2023; 33:587-598. [PMID: 37037625 PMCID: PMC10234296 DOI: 10.1101/gr.277383.122] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/04/2022] [Accepted: 03/28/2023] [Indexed: 04/12/2023]
Abstract
The rates of mutation, recombination, and transposition are core parameters in models of evolution. They impact genetic diversity, responses to ongoing selection, and levels of genetic load. However, even for key evolutionary model species such as Drosophila melanogaster and Drosophila simulans, few estimates of these parameters are available, and we have little idea of how rates vary between individuals, sexes, or populations. Knowledge of this variation is fundamental for parameterizing models of genome evolution. Here, we provide direct estimates of mutation, recombination, and transposition rates and their variation in a West African and a European population of D. melanogaster and a European population of D. simulans. Across 89 flies, we observe 58 single-nucleotide mutations, 286 crossovers, and 89 transposable element (TE) insertions. Compared to the European D. melanogaster, we find the West African population has a lower mutation rate (1.67 × 10⁻⁹ site⁻¹ gen⁻¹ vs. 4.86 × 10⁻⁹ site⁻¹ gen⁻¹) and a lower transposition rate (8.99 × 10⁻⁵ copy⁻¹ gen⁻¹ vs. 23.36 × 10⁻⁵ copy⁻¹ gen⁻¹), but a higher recombination rate (3.44 cM/Mb vs. 2.06 cM/Mb). The European D. simulans population has a similar mutation rate to European D. melanogaster, but a significantly higher recombination rate and a lower, but not significantly different, transposition rate. Overall, we find paternal-derived mutations are more frequent than maternal ones in both species. Our study quantifies the variation in rates of mutation, recombination, and transposition among different populations and sexes, and our direct estimates of these parameters in D. melanogaster and D. simulans will benefit future studies in population and evolutionary genetics.
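To make the population comparison in this abstract easier to read at a glance, the short sketch below divides each West African D. melanogaster estimate by the corresponding European one. The rates are the values quoted in the abstract; the variable names and the choice to express the contrast as a ratio are illustrative assumptions, not part of the study.

```python
# (West African, European) D. melanogaster estimates quoted in the abstract.
rates = {
    "mutation": (1.67e-9, 4.86e-9),         # per site per generation
    "transposition": (8.99e-5, 23.36e-5),   # per copy per generation
    "recombination": (3.44, 2.06),          # cM/Mb
}

# Ratio below 1 means the West African estimate is lower than the European one.
ratios = {name: west_african / european
          for name, (west_african, european) in rates.items()}

for name, ratio in ratios.items():
    print(f"{name}: West African / European = {ratio:.2f}")
```

The ratios come out near 0.34 for mutation, 0.38 for transposition, and 1.67 for recombination, which matches the direction of each difference stated in the abstract. They are point estimates only and say nothing about statistical significance.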
Affiliation(s)
- Yiguan Wang
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
- Paul McNeil
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
- Marta Pascual
  - Departament de Genètica, Microbiologia i Estadística and IRBio, Universitat de Barcelona, 08028 Barcelona, Spain
- Susan E Johnston
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
- Peter D Keightley
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
- Darren J Obbard
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom

8
Chen J, Basting PJ, Han S, Garfinkel DJ, Bergman CM. Reproducible evaluation of transposable element detectors with McClintock 2 guides accurate inference of Ty insertion patterns in yeast. bioRxiv 2023:2023.02.13.528343. [PMID: 36824955 PMCID: PMC9948991 DOI: 10.1101/2023.02.13.528343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2023]
Abstract
BACKGROUND Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors. RESULTS We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide a consistent and biologically meaningful view of non-reference TE insertions in a species-wide panel of ∼1000 yeast genomes, as evaluated by coverage-based abundance estimates and expected patterns of tRNA promoter targeting. Finally, we show that best-in-class predictors for yeast have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences first revealed experimentally for Ty1 to natural insertions and related copia-superfamily retrotransposons in yeast. 
CONCLUSION McClintock (https://github.com/bergmanlab/mcclintock/) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors for other species.
Affiliation(s)
- Jingxuan Chen
  - Institute of Bioinformatics, University of Georgia, Athens, GA
- Shunhua Han
  - Institute of Bioinformatics, University of Georgia, Athens, GA
- David J. Garfinkel
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA
- Casey M. Bergman
  - Institute of Bioinformatics, University of Georgia, Athens, GA
  - Department of Genetics, University of Georgia, Athens, GA

9
Bowles H, Kabiljo R, Al Khleifat A, Jones A, Quinn JP, Dobson RJB, Swanson CM, Al-Chalabi A, Iacoangeli A. An assessment of bioinformatics tools for the detection of human endogenous retroviral insertions in short-read genome sequencing data. Frontiers in Bioinformatics 2023; 2:1062328. [PMID: 36845320 PMCID: PMC9945273 DOI: 10.3389/fbinf.2022.1062328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Accepted: 12/12/2022] [Indexed: 02/10/2023] [Open Access]
Abstract
There is a growing interest in the study of human endogenous retroviruses (HERVs) given the substantial body of evidence that implicates them in many human diseases. Although their genomic characterization presents numerous technical challenges, next-generation sequencing (NGS) has shown potential to detect HERV insertions and their polymorphisms in humans. Currently, a number of computational tools to detect them in short-read NGS data exist. In order to design optimal analysis pipelines, an independent evaluation of the available tools is required. We evaluated the performance of a set of such tools using a variety of experimental designs and datasets. These included 50 human short-read whole-genome sequencing samples, matching long and short-read sequencing data, and simulated short-read NGS data. Our results highlight a great performance variability of the tools across the datasets and suggest that different tools might be suitable for different study designs. However, specialized tools designed to detect exclusively human endogenous retroviruses consistently outperformed generalist tools that detect a wider range of transposable elements. We suggest that, if sufficient computing resources are available, using multiple HERV detection tools to obtain a consensus set of insertion loci may be ideal. Furthermore, given that the false positive discovery rate of the tools varied between 8% and 55% across tools and datasets, we recommend the wet lab validation of predicted insertions if DNA samples are available.
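The abstract suggests combining several HERV detection tools into a consensus set of insertion loci. One simple way to do that is sketched below: calls from all tools are clustered by position, and a cluster is kept only when enough distinct tools support it. This is an assumption-laden illustration, not the authors' procedure; the function name `consensus_loci`, the 100 bp clustering window, the two-tool threshold, and the tool names and coordinates are all hypothetical.

```python
from collections import defaultdict


def consensus_loci(calls_by_tool, window=100, min_tools=2):
    """Cluster insertion calls across tools and keep well-supported loci.

    calls_by_tool maps a tool name to a list of (chromosome, position) calls.
    Calls on the same chromosome are chained into one cluster while each
    successive call lies within `window` bp of the previous one. A cluster is
    reported as (chromosome, median position, supporting tools) when at least
    `min_tools` distinct tools contributed to it.
    """
    by_chrom = defaultdict(list)
    for tool, calls in calls_by_tool.items():
        for chrom, pos in calls:
            by_chrom[chrom].append((pos, tool))

    consensus = []

    def emit(chrom, cluster):
        tools = sorted({tool for _, tool in cluster})
        if len(tools) >= min_tools:
            positions = sorted(pos for pos, _ in cluster)
            consensus.append((chrom, positions[len(positions) // 2], tools))

    for chrom in sorted(by_chrom):
        cluster = []
        for pos, tool in sorted(by_chrom[chrom]):
            if cluster and pos - cluster[-1][0] > window:
                emit(chrom, cluster)
                cluster = []
            cluster.append((pos, tool))
        if cluster:
            emit(chrom, cluster)
    return consensus


# Made-up calls from three hypothetical tools.
calls = {
    "tool_a": [("chr1", 1000), ("chr7", 500)],
    "tool_b": [("chr1", 1040)],
    "tool_c": [("chr1", 990), ("chr7", 9000)],
}
consensus = consensus_loci(calls)
print(consensus)
```

Here only the chr1 locus survives, because the two chr7 calls are far apart and each is supported by a single tool. Raising `min_tools` trades recall for fewer false positives, which is relevant given the wide range of false positive rates the abstract reports; it does not replace the wet lab validation the authors recommend.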
Affiliation(s)
- Harry Bowles
  - Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- Renata Kabiljo
  - Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
  - Department of Biostatistics and Health Informatics, King’s College London, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- Ahmad Al Khleifat
  - Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- Ashley Jones
  - Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- John P. Quinn
  - Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Richard J. B. Dobson
  - Department of Biostatistics and Health Informatics, King’s College London, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
  - NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London, London, United Kingdom
  - Institute of Health Informatics, University College London, London, United Kingdom
  - NIHR Biomedical Research Centre, University College London Hospitals NHS Foundation Trust, London, United Kingdom
- Chad M. Swanson
  - Department of Infectious Diseases, School of Immunology and Microbial Sciences, King’s College London, London, United Kingdom
- Ammar Al-Chalabi
  - Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
  - Department of Neurology, King’s College Hospital, London, United Kingdom
- Alfredo Iacoangeli
  - Department of Basic and Clinical Neuroscience, King’s College London, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
  - Department of Biostatistics and Health Informatics, King’s College London, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
  - NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London, London, United Kingdom

10
Contribution of Retrotransposons to the Pathogenesis of Type 1 Diabetes and Challenges in Analysis Methods. Int J Mol Sci 2023; 24:ijms24043104. [PMID: 36834511 PMCID: PMC9966460 DOI: 10.3390/ijms24043104] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/30/2022] [Revised: 01/30/2023] [Accepted: 02/02/2023] [Indexed: 02/09/2023] [Open Access]
Abstract
Type 1 diabetes (T1D) is one of the most common chronic diseases of the endocrine system, associated with several life-threatening comorbidities. While the etiopathogenesis of T1D remains elusive, a combination of genetic susceptibility and environmental factors, such as microbial infections, are thought to be involved in the development of the disease. The prime model for studying the genetic component of T1D predisposition encompasses polymorphisms within the HLA (human leukocyte antigen) region responsible for the specificity of antigen presentation to lymphocytes. Apart from polymorphisms, genomic reorganization caused by repeat elements and endogenous viral elements (EVEs) might be involved in T1D predisposition. Such elements are human endogenous retroviruses (HERVs) and non-long terminal repeat (non-LTR) retrotransposons, including long and short interspersed nuclear elements (LINEs and SINEs). In line with their parasitic origin and selfish behaviour, retrotransposon-imposed gene regulation is a major source of genetic variation and instability in the human genome, and may represent the missing link between genetic susceptibility and environmental factors long thought to contribute to T1D onset. Autoreactive immune cell subtypes with differentially expressed retrotransposons can be identified with single-cell transcriptomics, and personalized assembled genomes can be constructed, which can then serve as a reference for predicting retrotransposon integration/restriction sites. Here we review what is known to date about retrotransposons, we discuss the involvement of viruses and retrotransposons in T1D predisposition, and finally we consider challenges in retrotransposons analysis methods.

11
Smits N, Faulkner GJ. Nanopore Sequencing to Identify Transposable Element Insertions and Their Epigenetic Modifications. Methods Mol Biol 2023; 2607:151-171. [PMID: 36449163 DOI: 10.1007/978-1-0716-2883-6_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/17/2023]
Abstract
Over the past 20 years, high-throughput genomic assays have fundamentally changed how transposable elements (TEs) are studied. While short-read DNA sequencing has been at the heart of these efforts, novel technologies that generate longer reads are driving a shift in the field. Long-read sequencing now permits locus-specific approaches to locate individual TE insertions and understand their epigenetic and transcriptional regulation, while still profiling TE activity genome-wide. Here we provide detailed guidelines to implement Oxford Nanopore Technologies (ONT) sequencing to identify polymorphic TE insertions and profile TE epigenetic landscapes. Using human long interspersed element-1 (LINE-1, L1) as an example, we explain the procedures involved, including final visualization, and potential bottlenecks and pitfalls. ONT sequencing will be, in our view, a workhorse technology for the foreseeable future in the TE field.
Affiliation(s)
- Nathan Smits
- Mater Research Institute, University of Queensland, Woolloongabba, QLD, Australia
| | - Geoffrey J Faulkner
- Mater Research Institute, University of Queensland, Woolloongabba, QLD, Australia.
- Queensland Brain Institute, University of Queensland, Brisbane, QLD, Australia.
|
12
|
Gerdes P, Lim SM, Ewing AD, Larcombe MR, Chan D, Sanchez-Luque FJ, Walker L, Carleton AL, James C, Knaupp AS, Carreira PE, Nefzger CM, Lister R, Richardson SR, Polo JM, Faulkner GJ. Retrotransposon instability dominates the acquired mutation landscape of mouse induced pluripotent stem cells. Nat Commun 2022; 13:7470. [PMID: 36463236 PMCID: PMC9719517 DOI: 10.1038/s41467-022-35180-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/09/2022] [Accepted: 11/22/2022] [Indexed: 12/04/2022] Open
Abstract
Induced pluripotent stem cells (iPSCs) can in principle differentiate into any cell of the body, and have revolutionized biomedical research and regenerative medicine. Unlike their human counterparts, mouse iPSCs (miPSCs) are reported to silence transposable elements and prevent transposable element-mediated mutagenesis. Here we apply short-read or Oxford Nanopore Technologies long-read genome sequencing to 38 bulk miPSC lines reprogrammed from 10 parental cell types, and 18 single-cell miPSC clones. While single nucleotide variants and structural variants restricted to miPSCs are rare, we find 83 de novo transposable element insertions, including examples intronic to Brca1 and Dmd. LINE-1 retrotransposons are profoundly hypomethylated in miPSCs, beyond other transposable elements and the genome overall, and harbor alternative protein-coding gene promoters. We show that treatment with the LINE-1 inhibitor lamivudine does not hinder reprogramming and efficiently blocks endogenous retrotransposition, as detected by long-read genome sequencing. These experiments reveal the complete spectrum and potential significance of mutations acquired by miPSCs.
Affiliation(s)
- Patricia Gerdes
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Sue Mei Lim
- Department of Anatomy & Developmental Biology, Monash University, Melbourne, VIC 3800, Australia
- Development and Stem Cells Program, Monash Biomedicine Discovery Institute, Melbourne, VIC 3800, Australia
- Australian Regenerative Medicine Institute, Monash University, Melbourne, VIC 3800, Australia
| | - Adam D. Ewing
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Michael R. Larcombe
- Department of Anatomy & Developmental Biology, Monash University, Melbourne, VIC 3800, Australia
- Development and Stem Cells Program, Monash Biomedicine Discovery Institute, Melbourne, VIC 3800, Australia
- Australian Regenerative Medicine Institute, Monash University, Melbourne, VIC 3800, Australia
| | - Dorothy Chan
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Francisco J. Sanchez-Luque
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
- GENYO. Pfizer-University of Granada-Andalusian Government Centre for Genomics and Oncological Research, PTS, Granada, 18016, Spain
| | - Lucinda Walker
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Alexander L. Carleton
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Cini James
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Anja S. Knaupp
- Department of Anatomy & Developmental Biology, Monash University, Melbourne, VIC 3800, Australia
- Development and Stem Cells Program, Monash Biomedicine Discovery Institute, Melbourne, VIC 3800, Australia
- Australian Regenerative Medicine Institute, Monash University, Melbourne, VIC 3800, Australia
| | - Patricia E. Carreira
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Christian M. Nefzger
- Department of Anatomy & Developmental Biology, Monash University, Melbourne, VIC 3800, Australia
- Development and Stem Cells Program, Monash Biomedicine Discovery Institute, Melbourne, VIC 3800, Australia
- Australian Regenerative Medicine Institute, Monash University, Melbourne, VIC 3800, Australia
| | - Ryan Lister
- Australian Research Council Centre of Excellence in Plant Energy Biology, School of Molecular Sciences, The University of Western Australia, Perth, WA 6009, Australia
- Harry Perkins Institute of Medical Research, Perth, WA 6009, Australia
| | - Sandra R. Richardson
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Jose M. Polo
- Department of Anatomy & Developmental Biology, Monash University, Melbourne, VIC 3800, Australia
- Development and Stem Cells Program, Monash Biomedicine Discovery Institute, Melbourne, VIC 3800, Australia
- Australian Regenerative Medicine Institute, Monash University, Melbourne, VIC 3800, Australia
- Adelaide Centre for Epigenetics and The South Australian Immunogenomics Cancer Institute, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
| | - Geoffrey J. Faulkner
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
- Queensland Brain Institute, University of Queensland, Brisbane, QLD 4072, Australia
|
13
|
Savage AL, Iacoangeli A, Schumann GG, Rubio-Roldan A, Garcia-Perez JL, Al Khleifat A, Koks S, Bubb VJ, Al-Chalabi A, Quinn JP. Characterisation of retrotransposon insertion polymorphisms in whole genome sequencing data from individuals with amyotrophic lateral sclerosis. Gene 2022; 843:146799. [PMID: 35963498 DOI: 10.1016/j.gene.2022.146799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 07/15/2022] [Accepted: 08/05/2022] [Indexed: 11/15/2022]
Abstract
The genetics of an individual is a crucial factor in understanding the risk of developing the neurodegenerative disease amyotrophic lateral sclerosis (ALS). There is still a large proportion of the heritability of ALS, particularly in sporadic cases, to be understood. Among others, active transposable elements drive inter-individual variability, and in humans long interspersed element 1 (LINE1, L1), Alu and SINE-VNTR-Alu (SVA) retrotransposons are a source of polymorphic insertions in the population. We undertook a pilot study to characterise the landscape of non-reference retrotransposon insertion polymorphisms (non-ref RIPs) in 15 control and 15 ALS individuals' whole genomes from Project MinE, an international project to identify potential genetic causes of ALS. The combination of two bioinformatics tools (mobile element locator tool (MELT) and TEBreak) identified on average 1250 Alu, 232 L1 and 77 SVA non-ref RIPs per genome across the 30 analysed. Further PCR validation of individual polymorphic retrotransposon insertions showed a similar level of accuracy for MELT and TEBreak. Our preliminary study did not identify a specific RIP or a significant difference in the total number of non-ref RIPs in ALS compared to control genomes. The use of multiple bioinformatic tools improved the accuracy of non-ref RIP detection and our study highlights the potential importance of studying these elements further in ALS.
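The two-caller strategy described in this abstract can be illustrated with a minimal sketch. It is not the MELT or TEBreak interface and not the authors' pipeline: the `Call` record, the `merge_calls` helper, and the 50 bp matching window are all hypothetical choices made here for illustration. The idea shown is simply to pair insertion calls from two tools when they agree on chromosome and element family and lie within a small distance of each other.

```python
from collections import namedtuple

# Hypothetical record for one non-reference insertion call.
Call = namedtuple("Call", ["chrom", "pos", "family"])


def merge_calls(calls_a, calls_b, window=50):
    """Pair calls from two tools that agree on chromosome and family
    and whose positions differ by at most `window` bp (assumed value).

    Returns (shared, only_a, only_b), where `shared` is a list of
    (call_from_a, call_from_b) pairs.
    """
    shared, only_a = [], []
    unmatched_b = list(calls_b)  # copy so the caller's list is untouched
    for a in calls_a:
        match = next(
            (
                b
                for b in unmatched_b
                if b.chrom == a.chrom
                and b.family == a.family
                and abs(b.pos - a.pos) <= window
            ),
            None,
        )
        if match is None:
            only_a.append(a)
        else:
            shared.append((a, match))
            unmatched_b.remove(match)  # each call is paired at most once
    return shared, only_a, unmatched_b


if __name__ == "__main__":
    tool_a = [Call("chr1", 1000, "Alu"), Call("chr2", 500, "L1")]
    tool_b = [Call("chr1", 1020, "Alu"), Call("chr3", 42, "SVA")]
    shared, only_a, only_b = merge_calls(tool_a, tool_b)
    print(len(shared), len(only_a), len(only_b))
```

Calls supported by both tools could then be prioritised for PCR validation, while tool-specific calls are kept separately; how strictly to match breakpoints is a design choice that depends on the callers' positional precision.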
Affiliation(s)
- Abigail L Savage
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK
| | - Alfredo Iacoangeli
- Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London SE5 9RT, UK; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London SE5 8AF, UK
| | - Gerald G Schumann
- Division of Medical Biotechnology, Paul-Ehrlich-Institut, Langen 63225, Germany
| | - Alejandro Rubio-Roldan
- Department of Genomic Medicine and Department of Oncology, GENYO, Centre for Genomics & Oncology, PTS Granada, 18007, Spain
| | - Jose L Garcia-Perez
- Department of Genomic Medicine and Department of Oncology, GENYO, Centre for Genomics & Oncology, PTS Granada, 18007, Spain; MRC-HGU Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
| | - Ahmad Al Khleifat
- Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London SE5 9RT, UK
| | - Sulev Koks
- Perron Institute for Neurological and Translational Science, Perth, Western Australia 6009, Australia; Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Perth, Western Australia 6150, Australia
| | - Vivien J Bubb
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK
| | - Ammar Al-Chalabi
- Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London SE5 9RT, UK; Department of Neurology, King's College Hospital, London SE5 9RS, UK
| | - John P Quinn
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK.
|
14
|
Samoluk SS, Vaio M, Ortíz AM, Chalup LMI, Robledo G, Bertioli DJ, Seijo G. Comparative repeatome analysis reveals new evidence on genome evolution in wild diploid Arachis (Fabaceae) species. Planta 2022; 256:50. [PMID: 35895167 DOI: 10.1007/s00425-022-03961-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/21/2022] [Accepted: 07/12/2022] [Indexed: 06/15/2023]
Abstract
Opposing changes in the abundance of satellite DNA and long terminal repeat (LTR) retroelements are the main contributors to the variation in genome size and heterochromatin amount in Arachis diploids. The South American genus Arachis (Fabaceae) comprises 83 species organized in nine taxonomic sections. Among them, section Arachis is characterized by species with wide genome and karyotype diversity. Such diversity is determined mainly by the amount and composition of repetitive DNA. Here we performed computational analysis on low-coverage genome sequencing data to infer the dynamics of changes in major repeat families that led to the differentiation of genomes in diploid species (x = 10) of genus Arachis, focusing on section Arachis. Estimated repeat content ranged from 62.50 to 71.68% of the genomes. Species with different genome composition tended to have different landscapes of repeated sequences. Athila family retrotransposons were the most abundant and variable lineage among Arachis repeatomes, with peaks of transpositional activity inferred at different times in the evolution of the species. Satellite DNAs (satDNAs) were less abundant, but differentially represented among species. High rates of evolution of an AT-rich superfamily of satDNAs led to the differential accumulation of heterochromatin in Arachis genomes. The relationship between genome size variation and the repetitive content is complex. However, the largest genomes showed a higher accumulation of LTR elements and a lower content of satDNAs. In contrast, species with the smallest genomes tended to accumulate satDNAs to the detriment of LTR elements. Phylogenetic analysis based on repetitive DNA supported the genome arrangement of section Arachis. Altogether, our results provide the most comprehensive picture of the repeatome dynamics that led to the genome differentiation of Arachis species.
Affiliation(s)
- Sergio S Samoluk
- Instituto de Botánica del Nordeste (UNNE-CONICET), Facultad de Ciencias Agrarias, Corrientes, Argentina.
| | - Magdalena Vaio
- Laboratory of Plant Genome Evolution and Domestication, Department of Plant Biology, Faculty of Agronomy, University of the Republic, Montevideo, Uruguay
| | - Alejandra M Ortíz
- Instituto de Botánica del Nordeste (UNNE-CONICET), Facultad de Ciencias Agrarias, Corrientes, Argentina
| | - Laura M I Chalup
- Instituto de Botánica del Nordeste (UNNE-CONICET), Facultad de Ciencias Agrarias, Corrientes, Argentina
| | - Germán Robledo
- Instituto de Botánica del Nordeste (UNNE-CONICET), Facultad de Ciencias Agrarias, Corrientes, Argentina
- Facultad de Ciencias Exactas y Naturales y Agrimensura, Universidad Nacional del Nordeste, Corrientes, Argentina
| | - David J Bertioli
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA, USA
| | - Guillermo Seijo
- Instituto de Botánica del Nordeste (UNNE-CONICET), Facultad de Ciencias Agrarias, Corrientes, Argentina
- Facultad de Ciencias Exactas y Naturales y Agrimensura, Universidad Nacional del Nordeste, Corrientes, Argentina
|
15
|
Ansaloni F, Gualandi N, Esposito M, Gustincich S, Sanges R. TEspeX: consensus-specific quantification of transposable element expression preventing biases from exonized fragments. Bioinformatics 2022; 38:4430-4433. [PMID: 35876845 PMCID: PMC9477521 DOI: 10.1093/bioinformatics/btac526] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/20/2021] [Revised: 07/08/2022] [Accepted: 07/12/2022] [Indexed: 12/24/2022] Open
Abstract
SUMMARY: Transposable elements (TEs) play key roles in crucial biological pathways. Therefore, several tools enabling the quantification of their expression were recently developed. However, many of the existing tools lack the capability to distinguish between the transcription of autonomously expressed TEs and TE fragments embedded in canonical coding/non-coding non-TE transcripts. Consequently, an apparent change in the expression of a given TE may simply reflect the variation in the expression of the transcripts containing TE-derived sequences. To overcome this issue, we have developed TEspeX, a pipeline for the quantification of TE expression at the consensus level. TEspeX uses Illumina RNA-seq short reads to quantify TE expression while avoiding counting reads that derive from inactive TE fragments embedded in canonical transcripts. AVAILABILITY AND IMPLEMENTATION: The tool is implemented in Python 3, distributed under the GNU General Public License (GPL) and available on GitHub at https://github.com/fansalon/TEspeX (Zenodo URL: https://doi.org/10.5281/zenodo.6800331). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
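The filtering idea in this abstract, counting a read toward a TE only when it cannot also be explained by a canonical transcript, can be sketched as follows. This is an illustration under stated assumptions, not TEspeX's code or command-line interface: the `count_te_specific_reads` function, the input layout (a mapping from read ID to its best hits, each tagged `"TE"` or `"transcript"`), and the decision to skip reads whose hits span more than one TE consensus are all hypothetical simplifications.

```python
from collections import Counter


def count_te_specific_reads(alignments):
    """Count reads per TE consensus, discarding ambiguous reads.

    `alignments` maps read_id -> list of (reference_name, reference_type),
    where reference_type is "TE" or "transcript" (assumed input layout).
    A read is counted only if every hit is on a TE consensus and all
    hits point to the same consensus (a simplification made here).
    """
    counts = Counter()
    for hits in alignments.values():
        # Any hit on a canonical transcript makes the read uninformative
        # about autonomous TE expression, so it is dropped.
        if any(ref_type != "TE" for _, ref_type in hits):
            continue
        te_names = {name for name, _ in hits}
        if len(te_names) == 1:
            counts[te_names.pop()] += 1
    return counts


if __name__ == "__main__":
    example = {
        "read1": [("L1HS", "TE")],
        "read2": [("AluY", "TE"), ("GENE1", "transcript")],  # dropped
        "read3": [("L1HS", "TE")],
    }
    print(count_te_specific_reads(example))
```

Dropping every read that also matches a transcript is conservative: it may undercount genuinely autonomous TE expression, but it avoids attributing changes in host-gene expression to the TE.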
Affiliation(s)
- Federico Ansaloni
- Area of Neuroscience, Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste 34136, Italy
- Central RNA Laboratory, Istituto Italiano di Tecnologia, Genova 16163, Italy
| | - Nicolò Gualandi
- Area of Neuroscience, Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste 34136, Italy
| | - Mauro Esposito
- Area of Neuroscience, Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste 34136, Italy
| | | | - Remo Sanges
- To whom correspondence should be addressed.
|
16
|
Halabian R, Makałowski W. A Map of 3' DNA Transduction Variants Mediated by Non-LTR Retroelements on 3202 Human Genomes. Biology 2022; 11:1032. [PMID: 36101413 PMCID: PMC9311842 DOI: 10.3390/biology11071032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/07/2022] [Revised: 07/05/2022] [Accepted: 07/06/2022] [Indexed: 05/03/2023]
Abstract
As one of the major structural constituents, mobile elements comprise more than half of the human genome, among which Alu, L1, and SVA elements are still active and continue to generate new offspring. One of the major characteristics of L1 and SVA elements is their ability to co-mobilize adjacent downstream sequences to new loci in a process called 3' DNA transduction. Transductions influence the structure and content of the genome in different ways, such as increasing genome variation, exon shuffling, and gene duplication. Moreover, given their mutagenic capability, 3' transductions are often involved in tumorigenesis or in the development of some diseases. In this study, we analyzed 3202 genomes sequenced at high coverage by the New York Genome Center to catalog and characterize putative 3' transduced segments mediated by L1s and SVAs. Here, we present a genome-wide map of inter/intrachromosomal 3' transduction variants, including their genomic and functional location, length, progenitor location, and allelic frequency across 26 populations. In total, we identified 7103 polymorphic L1s and 3040 polymorphic SVAs. Of these, 268 and 162 variants were annotated as high-confidence L1 and SVA 3' transductions, respectively, with lengths that ranged from 7 to 997 nucleotides. We found specific loci within chromosomes X, 6, 7, and 6_GL000253v2_alt as master L1s and SVAs that had yielded more transductions, among others. Together, our results demonstrate the dynamic nature of transduction events within the genome and among individuals and their contribution to the structural variations of the human genome.
Affiliation(s)
| | - Wojciech Makałowski
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, 48149 Münster, Germany
|
17
|
Stefanini I, Di Paola M, Liti G, Marranci A, Sebastiani F, Casalone E, Cavalieri D. Resistance to Arsenite and Arsenate in Saccharomyces cerevisiae Arises through the Subtelomeric Expansion of a Cluster of Yeast Genes. International Journal of Environmental Research and Public Health 2022; 19:8119. [PMID: 35805774 PMCID: PMC9266342 DOI: 10.3390/ijerph19138119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/30/2022] [Revised: 06/25/2022] [Accepted: 06/28/2022] [Indexed: 01/25/2023]
Abstract
Arsenic is one of the most prevalent toxic elements in the environment, and its toxicity affects every organism. Arsenic resistance has mainly been observed in microorganisms, and, in bacteria, it has been associated with the presence of the Ars operon. In Saccharomyces cerevisiae, three genes confer arsenic resistance: ARR1, ARR2, and ARR3. Unlike bacteria, in which the presence of the Ars genes confers per se resistance to arsenic, most of the S. cerevisiae isolates present the three ARR genes, regardless of whether the strain is resistant or sensitive to arsenic. To assess the genetic features that make natural S. cerevisiae strains resistant to arsenic, we used a combination of comparative genomic hybridization, whole-genome sequencing, and transcriptomics profiling with microarray analyses. We observed that both the presence and the genomic location of multiple copies of the whole cluster of ARR genes were central to the escape from subtelomeric silencing and the acquisition of resistance to arsenic. As a result of the repositioning, the ARR genes were expressed even in the absence of arsenic. In addition to their relevance in improving our understanding of the mechanism of arsenic resistance in yeast, these results provide evidence for a new cluster of functionally related genes that are independently duplicated and translocated.
Affiliation(s)
- Irene Stefanini
- Department of Life Sciences and Systems Biology, University of Turin, 10123 Turin, Italy
| | - Monica Di Paola
- Department of Biology, University of Florence, Sesto Fiorentino, 50019 Florence, Italy
| | - Gianni Liti
- National Centre for Scientific Research (CNRS), National Institute of Health and Medical Research (INSERM), Institute for Research on Cancer and Aging (IRCAN), Université Côte d’Azur, 06103 Nice, France
| | - Andrea Marranci
- Core Research Laboratory, Oncogenomics Unit, Istituto di Fisiologia Clinica, Institute for Cancer Research and Prevention (ISPRO), 56124 Pisa, Italy
| | - Federico Sebastiani
- Institute for Sustainable Plant Protection, National Research Council (IPSP-CNR), Sesto Fiorentino, 50019 Florence, Italy
| | - Enrico Casalone
- Department of Biology, University of Florence, Sesto Fiorentino, 50019 Florence, Italy
| | - Duccio Cavalieri
- Department of Biology, University of Florence, Sesto Fiorentino, 50019 Florence, Italy
|
18
|
Riehl K, Riccio C, Miska EA, Hemberg M. TransposonUltimate: software for transposon classification, annotation and detection. Nucleic Acids Res 2022; 50:e64. [PMID: 35234904 PMCID: PMC9226531 DOI: 10.1093/nar/gkac136] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 06/09/2021] [Revised: 02/09/2022] [Accepted: 02/14/2022] [Indexed: 12/17/2022] Open
Abstract
Most genomes harbor a large number of transposons, and they play an important role in evolution and gene regulation. They are also of interest to clinicians as they are involved in several diseases, including cancer and neurodegeneration. Although several methods for transposon identification are available, they are often highly specialised towards specific tasks or classes of transposons, and they lack common standards such as a unified taxonomy scheme and output file format. We present TransposonUltimate, a powerful bundle of three modules for transposon classification, annotation, and detection of transposition events. TransposonUltimate comes as a Conda package under the GPL-3.0 licence, is well documented, and is easy to install through https://github.com/DerKevinRiehl/TransposonUltimate. We benchmark the classification module on the large TransposonDB covering 891,051 sequences to demonstrate that it outperforms the currently best existing solutions. The annotation and detection modules combine sixteen existing software tools, and we illustrate their use by annotating Caenorhabditis elegans, Rhizophagus irregularis and Oryza sativa subsp. japonica genomes. Finally, we use the detection module to discover 29 554 transposition events in the genomes of 20 wild-type strains of C. elegans. Databases, assemblies, annotations and further findings can be downloaded from https://doi.org/10.5281/zenodo.5518085.
Affiliation(s)
- Kevin Riehl
- Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
| | - Cristian Riccio
- Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
| | - Eric A Miska
- Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
| | - Martin Hemberg
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women’s Hospital, 75 Francis Street, Boston, MA 02215, USA
|
19
|
Meta-Analysis Suggests That Intron Retention Can Affect Quantification of Transposable Elements from RNA-Seq Data. Biology 2022; 11:826. [PMID: 35741347 PMCID: PMC9220773 DOI: 10.3390/biology11060826] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/15/2022] [Revised: 05/20/2022] [Accepted: 05/26/2022] [Indexed: 02/08/2023]
Abstract
Simple Summary: Transposable elements (TEs) are repetitive sequences comprising more than one third of the human genome, with the original ability to change their location within the genome. Owing to their repetitive nature, the quantification of TEs is often challenging. RNA-seq is a useful tool for genome-wide TE quantification; nevertheless, it also presents technical issues, including low read mappability and erroneous quantification derived from the transcription of TE fragments embedded in canonical transcripts. Fragments derived from TEs are found within the introns of most genes, which led to the hypothesis that intron retention (IR) can affect the unbiased quantification of TE expression. Performing a meta-analysis of public RNA-seq datasets, here we observe that IR can indeed impact the quantification of TEs by increasing the number of reads mapped on intronic TE copies. Our work highlights a correlation between IR and TE expression measurement by RNA-seq that should be taken into account to achieve reliable TE quantification, especially in samples characterized by extensive IR, because differential IR might be confused with differential TE expression.
Abstract: Transposable elements (TEs), also known as “jumping genes”, are repetitive sequences with the capability of changing their location within the genome. They are key players in many different biological processes in health and disease. Therefore, a reliable quantification of their expression as transcriptional units is crucial to distinguish between their independent expression and the transcription of their sequences as part of canonical transcripts. TE quantification faces difficulties of different types, the most important one being low read mappability: their repetitive nature prevents an unambiguous mapping of reads originating from their sequences. A large fraction of TE fragments localizes within introns, which led to the hypothesis that intron retention (IR) can be an additional source of bias, potentially affecting accurate TE quantification. IR occurs when introns, normally removed from the mature transcript by the splicing machinery, are maintained in mature transcripts. IR is a widespread mechanism affecting many different genes with cell type-specific patterns. We hypothesized that, in an RNA-seq experiment, reads derived from retained introns can introduce a bias in the detection of overlapping, independent TE RNA expression. In this study we performed a meta-analysis using public RNA-seq data from lymphoblastoid cell lines and show that IR can impact TE quantification using established tools with default parameters. Reads mapped on intronic TEs were indeed associated with the expression of TEs and influenced their correct quantification as independent transcriptional units. We confirmed these results using additional independent datasets, demonstrating that this bias does not appear in samples where IR is not present and that differential TE expression does not impact IR quantification. We concluded that IR causes the over-quantification of intronic TEs and that differential IR might be confused with differential TE expression. Our results should be taken into account for a correct quantification of TE expression from RNA-seq data, especially in samples in which IR is abundant.
|
20
|
Storer JM, Hubley R, Rosen J, Smit AFA. Methodologies for the De novo Discovery of Transposable Element Families. Genes (Basel) 2022; 13:709. [PMID: 35456515 PMCID: PMC9025800 DOI: 10.3390/genes13040709] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/23/2022] [Revised: 04/14/2022] [Accepted: 04/15/2022] [Indexed: 02/07/2023] Open
Abstract
The discovery and characterization of transposable element (TE) families are crucial tasks in the process of genome annotation. Careful curation of TE libraries for each organism is necessary as each has been exposed to a unique and often complex set of TE families. De novo methods have been developed; however, a fully automated and accurate approach to the development of complete libraries remains elusive. In this review, we cover established methods and recent developments in de novo TE analysis. We also present various methodologies used to assess these tools and discuss opportunities for further advancement of the field.
Affiliation(s)
| | | | | | - Arian F. A. Smit
- Institute for Systems Biology, Seattle, WA 98109, USA
|
21
|
Marchi E, Jones M, Klenerman P, Frater J, Magiorkinis G, Belshaw R. BreakAlign: a Perl program to align chimaeric (split) genomic NGS reads and allow visual confirmation of novel retroviral integrations. BMC Bioinformatics 2022; 23:134. [PMID: 35428171 PMCID: PMC9013057 DOI: 10.1186/s12859-022-04621-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/04/2021] [Accepted: 02/28/2022] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: Retroviruses replicate by integrating a DNA copy into a host chromosome. Detecting novel retroviral integrations (ones not in the reference genome sequence of the host) from genomic NGS data is bioinformatically challenging and frequently produces many false positives. One common method of confirmation is visual inspection of an alignment of the chimaeric (split) reads that span a putative novel retroviral integration site. We perceived the need for a program that would facilitate this by producing a multiple alignment containing both the viral and host regions that flank an integration. RESULTS: BreakAlign is a Perl program that uses blastn to produce such a multiple alignment. In addition to the NGS dataset and a reference viral sequence, the program requires either (a) the ~500 nt host genome sequence that spans the putative integration or (b) coordinates of this putative integration in an installed copy of the reference human genome (multiple integrations can be processed automatically). BreakAlign is freely available from https://github.com/marchiem/breakalign and is accompanied by example files allowing a test run. CONCLUSION: BreakAlign will confirm and facilitate characterisation of both (a) germline integrations of endogenous retroviruses and (b) somatic integrations of exogenous retroviruses such as HIV and HTLV. Although developed for use with genomic short-read NGS (second generation) data and retroviruses, it should also be useful for long-read (third generation) data and any mobile element with at least one conserved flanking region.
Affiliation(s)
- Emanuele Marchi
- Nuffield Department of Medicine, University of Oxford, Oxford, UK.
- Mathew Jones
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Paul Klenerman
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- John Frater
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Gkikas Magiorkinis
- Department of Hygiene, Epidemiology and Medical Statistics, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- Robert Belshaw
- Department of Biology, College of Science and Technology, Wenzhou-Kean University, Wenzhou, Zhejiang Province, China.
22
Pfaff AL, Singleton LM, Kõks S. Mechanisms of disease-associated SINE-VNTR-Alus. Exp Biol Med (Maywood) 2022; 247:756-764. [PMID: 35387528 DOI: 10.1177/15353702221082612] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/02/2023] Open
Abstract
SINE-VNTR-Alus (SVAs) are the youngest retrotransposon family in the human genome. Their ongoing mobilization has generated genetic variation within the human population. At least 24 insertions to date, detailed in this review, have been associated with disease. The predominant mechanisms through which this occurs are alterations to normal splicing patterns, exonic insertions causing loss-of-function mutations, and large genomic deletions. Dissecting the functional impact of these SVAs and the mechanism through which they cause disease provides insight into the consequences of their presence in the genome and how these elements could influence phenotypes. Many of these disease-associated SVAs have been difficult to characterize and would not have been identified through routine analyses. However, the number identified has increased in recent years as DNA and RNA sequencing data became more widely available. Therefore, as the search for complex structural variation in disease continues, it is likely to yield further disease-causing SVA insertions.
Affiliation(s)
- Abigail L Pfaff
- Perron Institute for Neurological and Translational Science, Perth, WA 6009, Australia; Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Perth, WA 6150, Australia
- Lewis M Singleton
- Perron Institute for Neurological and Translational Science, Perth, WA 6009, Australia
- Sulev Kõks
- Perron Institute for Neurological and Translational Science, Perth, WA 6009, Australia; Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Perth, WA 6150, Australia
23
Niu Y, Teng X, Zhou H, Shi Y, Li Y, Tang Y, Zhang P, Luo H, Kang Q, Xu T, He S. Characterizing mobile element insertions in 5675 genomes. Nucleic Acids Res 2022; 50:2493-2508. [PMID: 35212372 PMCID: PMC8934628 DOI: 10.1093/nar/gkac128] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 07/26/2021] [Revised: 02/07/2022] [Accepted: 02/11/2022] [Indexed: 12/30/2022] Open
Abstract
Mobile element insertions (MEIs) are a major class of structural variants (SVs) and have been linked to many human genetic disorders, including hemophilia, neurofibromatosis, and various cancers. However, human MEI resources from large-scale genome sequencing are still lacking compared to those for SNPs and SVs. Here, we report a comprehensive map of 36 699 non-reference MEIs constructed from 5675 genomes, comprising 2998 Chinese samples (∼26.2×, NyuWa) and 2677 samples from the 1000 Genomes Project (∼7.4×, 1KGP). We discovered that LINE-1 insertions were highly enriched in centromere regions, implying the role of chromosome context in retroelement insertion. After functional annotation, we estimated that MEIs are responsible for about 9.3% of all protein-truncating events per genome. Finally, we built a companion database named HMEID for public use. This resource represents the latest and largest genomewide study on MEIs and will have broad utility for exploration of human MEI findings.
Affiliation(s)
- Yiwei Niu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Xueyi Teng
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Honghong Zhou
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Yirong Shi
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yanyan Li
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yiheng Tang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Peng Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Huaxia Luo
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Quan Kang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Tao Xu
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Shunmin He
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
24
Wang C, Liang C. The insertion and dysregulation of transposable elements in osteosarcoma and their association with patient event-free survival. Sci Rep 2022; 12:377. [PMID: 35013466 PMCID: PMC8748539 DOI: 10.1038/s41598-021-04208-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/02/2021] [Accepted: 11/23/2021] [Indexed: 12/11/2022] Open
Abstract
The dysregulation of transposable elements (TEs) has been explored in a variety of cancers. However, TE activities in osteosarcoma (OS) have not been extensively studied yet. By integrative analysis of RNA-seq, whole-genome sequencing (WGS), and methylation data, we showed aberrant TE activities associated with dysregulations of TEs in OS tumors. Specifically, expression levels of LINE-1 and Alu of different evolutionary ages, as well as subfamilies of SVA and HERV-K, were significantly up-regulated in OS tumors, accompanied by enhanced DNA repair responses. We verified the characteristics of LINE-1 mediated TE insertions, including target site duplication (TSD) length (centered around 15 bp) and preferential insertions into intergenic and AT-rich regions as well as intronic regions of longer genes. By filtering polymorphic TE insertions reported in 1000 genome project (1KGP), besides 148 tumor-specific somatic TE insertions, we found most OS patient-specific TE insertions (3175 out of 3326) are germline insertions, which are associated with genes involved in neuronal processes or with transcription factors important for cancer development. In addition to 68 TE-affected cancer genes, we found recurrent germline TE insertions in 72 non-cancer genes with high frequencies among patients. We also found that +/− 500 bps flanking regions of transcription start sites (TSS) of LINE-1 (young) and Alu showed lower methylation levels in OS tumor samples than controls. Interestingly, by incorporating patient clinical data and focusing on TE activities in OS tumors, our data analysis suggested that higher TE insertions in OS tumors are associated with a longer event-free survival time.
Affiliation(s)
- Chao Wang
- Department of Biology, Miami University, Oxford, Ohio, 45056, USA.
- Chun Liang
- Department of Biology, Miami University, Oxford, Ohio, 45056, USA.
25
Kirov I, Merkulov P, Dudnikov M, Polkhovskaya E, Komakhin RA, Konstantinov Z, Gvaramiya S, Ermolaev A, Kudryavtseva N, Gilyok M, Divashuk MG, Karlov GI, Soloviev A. Transposons Hidden in Arabidopsis thaliana Genome Assembly Gaps and Mobilization of Non-Autonomous LTR Retrotransposons Unravelled by Nanotei Pipeline. Plants (Basel) 2021; 10:2681. [PMID: 34961152 PMCID: PMC8704663 DOI: 10.3390/plants10122681] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/03/2021] [Revised: 11/26/2021] [Accepted: 12/02/2021] [Indexed: 06/12/2023]
Abstract
Long-read data is a great tool to discover new active transposable elements (TEs). However, no ready-to-use tools were available to gather this information from low coverage ONT datasets. Here, we developed a novel pipeline, nanotei, that allows detection of TE-contained structural variants, including individual TE transpositions. We exploited this pipeline to identify TE insertion in the Arabidopsis thaliana genome. Using nanotei, we identified tens of TE copies, including ones for the well-characterized ONSEN retrotransposon family that were hidden in genome assembly gaps. The results demonstrate that some TEs are inaccessible for analysis with the current A. thaliana (TAIR10.1) genome assembly. We further explored the mobilome of the ddm1 mutant with elevated TE activity. Nanotei captured all TEs previously known to be active in ddm1 and also identified transposition of non-autonomous TEs. Of them, one non-autonomous TE derived from (AT5TE33540) belongs to TR-GAG retrotransposons with a single open reading frame (ORF) encoding the GAG protein. These results provide the first direct evidence that TR-GAGs and other non-autonomous LTR retrotransposons can transpose in the plant genome, albeit in the absence of most of the encoded proteins. In summary, nanotei is a useful tool to detect active TEs and their insertions in plant genomes using low-coverage data from Nanopore genome sequencing.
Affiliation(s)
- Ilya Kirov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Kurchatov Genomics Center of ARRIAB, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Pavel Merkulov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Maxim Dudnikov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Kurchatov Genomics Center of ARRIAB, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Ekaterina Polkhovskaya
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Roman A. Komakhin
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Zakhar Konstantinov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Sofya Gvaramiya
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Aleksey Ermolaev
- Center of Molecular Biotechnology, Russian State Agrarian University-Moscow Timiryazev Agricultural Academy, 127550 Moscow, Russia
- Natalya Kudryavtseva
- Center of Molecular Biotechnology, Russian State Agrarian University-Moscow Timiryazev Agricultural Academy, 127550 Moscow, Russia
- Marina Gilyok
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Mikhail G. Divashuk
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Kurchatov Genomics Center of ARRIAB, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Gennady I. Karlov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
- Alexander Soloviev
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
26
Petersen M, Winter S, Coimbra R, J de Jong M, Kapitonov VV, Nilsson MA. Population analysis of retrotransposons in giraffe genomes supports RTE decline and widespread LINE1 activity in Giraffidae. Mob DNA 2021; 12:27. [PMID: 34836553 PMCID: PMC8620236 DOI: 10.1186/s13100-021-00254-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2021] [Accepted: 10/25/2021] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND: The majority of structural variation in genomes is caused by insertions of transposable elements (TEs). In mammalian genomes, the main TE fraction is made up of autonomous and non-autonomous non-LTR retrotransposons commonly known as LINEs and SINEs (Long and Short Interspersed Nuclear Elements). Here we present one of the first population-level analyses of TE insertions in a non-model organism, the giraffe. Giraffes are ruminant artiodactyls, one of the few mammalian groups with genomes that are colonized by putatively active LINEs of two different clades of non-LTR retrotransposons, namely the LINE1 and RTE/BovB LINEs as well as their associated SINEs. We analyzed TE insertions of both types, and their associated SINEs in three giraffe genome assemblies, as well as across a population level sampling of 48 individuals covering all extant giraffe species. RESULTS: The comparative genome screen identified 139,525 recent LINE1 and RTE insertions in the sampled giraffe population. The analysis revealed a drastically reduced RTE activity in giraffes, whereas LINE1 is still actively propagating in the genomes of extant (sub)-species. In concert with the extremely low activity of the giraffe RTE, we also found that RTE-dependent SINEs, namely Bov-tA and Bov-A2, have been virtually immobile in the last 2 million years. Despite the high current activity of the giraffe LINE1, we did not find evidence for the presence of currently active LINE1-dependent SINEs. TE insertion heterozygosity rates differ among the different (sub)-species, likely due to divergent population histories. CONCLUSIONS: The horizontally transferred RTE/BovB and its derived SINEs appear to be close to inactivation and subsequent extinction in the genomes of extant giraffe species. This is the first time that the decline of a TE family has been meticulously analyzed from a population genetics perspective. Our study shows how detailed information about past and present TE activity can be obtained by analyzing large-scale population-level genomic data sets.
Affiliation(s)
- Malte Petersen
- Max Planck Institute of Immunobiology and Epigenetics, Stübeweg 51, 79108, Freiburg, Germany
- Sven Winter
- Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage 25, 60325, Frankfurt am Main, Germany
- Raphael Coimbra
- Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage 25, 60325, Frankfurt am Main, Germany
- Institute for Ecology, Evolution and Diversity, Goethe University, Max-von-Laue-Straße 13, 60438, Frankfurt am Main, Germany
- Menno J de Jong
- Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage 25, 60325, Frankfurt am Main, Germany
- Vladimir V Kapitonov
- Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage 25, 60325, Frankfurt am Main, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Senckenberganlage 25, 60325, Frankfurt am Main, Germany
- Maria A Nilsson
- Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage 25, 60325, Frankfurt am Main, Germany.
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Senckenberganlage 25, 60325, Frankfurt am Main, Germany.
27
Impact of Repetitive DNA Elements on Snake Genome Biology and Evolution. Cells 2021; 10:1707. [PMID: 34359877 PMCID: PMC8303610 DOI: 10.3390/cells10071707] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/18/2021] [Revised: 06/29/2021] [Accepted: 07/01/2021] [Indexed: 12/11/2022] Open
Abstract
The distinctive biology and unique evolutionary features of snakes make them fascinating model systems to elucidate how genomes evolve and how variation at the genomic level is interlinked with phenotypic-level evolution. Similar to other eukaryotic genomes, large proportions of snake genomes contain repetitive DNA, including transposable elements (TEs) and satellite repeats. The importance of repetitive DNA and its structural and functional role in the snake genome, remain unclear. This review highlights the major types of repeats and their proportions in snake genomes, reflecting the high diversity and composition of snake repeats. We present snakes as an emerging and important model system for the study of repetitive DNA under the impact of sex and microchromosome evolution. We assemble evidence to show that certain repetitive elements in snakes are transcriptionally active and demonstrate highly dynamic lineage-specific patterns as repeat sequences. We hypothesize that particular TEs can trigger different genomic mechanisms that might contribute to driving adaptive evolution in snakes. Finally, we review emerging approaches that may be used to study the expression of repetitive elements in complex genomes, such as snakes. The specific aspects presented here will stimulate further discussion on the role of genomic repeats in shaping snake evolution.
28
Kortright KE, Doss-Gollin S, Chan BK, Turner PE. Evolution of Bacterial Cross-Resistance to Lytic Phages and Albicidin Antibiotic. Front Microbiol 2021; 12:658374. [PMID: 34220747 PMCID: PMC8245764 DOI: 10.3389/fmicb.2021.658374] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/25/2021] [Accepted: 05/10/2021] [Indexed: 11/21/2022] Open
Abstract
Due to concerns over the global increase of antibiotic-resistant bacteria, alternative antibacterial strategies, such as phage therapy, are increasingly being considered. However, evolution of bacterial resistance to new therapeutics is almost a certainty; indeed, it is possible that resistance to alternative treatments might result in an evolved trade-up such as enhanced antibiotic resistance. Here, we hypothesize that selection for Escherichia coli bacteria to resist phage T6, phage U115, or albicidin, a DNA gyrase inhibitor, should often result in a pleiotropic trade-up in the form of cross-resistance, because all three antibacterial agents interact with the Tsx porin. Selection imposed by any one of the antibacterials resulted in cross-resistance to all three of them, in each of the 29 spontaneous bacterial mutants examined in this study. Furthermore, cross-resistance did not cause measurable fitness (growth) deficiencies for any of the bacterial mutants, when competed against wild-type E. coli in both low-resource and high-resource environments. A combination of whole-genome and targeted sequencing confirmed that mutants differed from wild-type E. coli via change(s) in the tsx gene. Our results indicate that evolution of cross-resistance occurs frequently in E. coli subjected to independent selection by phage T6, phage U115 or albicidin. This study cautions that deployment of new antibacterial therapies such as phage therapy, should be preceded by a thorough investigation of evolutionary consequences of the treatment, to avoid the potential for evolved trade-ups.
Affiliation(s)
- Simon Doss-Gollin
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, United States
- Benjamin K. Chan
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, United States
- Paul E. Turner
- Program in Microbiology, Yale School of Medicine, New Haven, CT, United States
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, United States
29
Jain D, Chu C, Alver BH, Lee S, Lee EA, Park PJ. HiTea: a computational pipeline to identify non-reference transposable element insertions in Hi-C data. Bioinformatics 2021; 37:1045-1051. [PMID: 33136153 DOI: 10.1093/bioinformatics/btaa923] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/21/2020] [Revised: 09/14/2020] [Accepted: 10/17/2020] [Indexed: 11/13/2022] Open
Abstract
Hi-C is a common technique for assessing 3D chromatin conformation. Recent studies have shown that long-range interaction information in Hi-C data can be used to generate chromosome-length genome assemblies and identify large-scale structural variations. Here, we demonstrate the use of Hi-C data in detecting mobile transposable element (TE) insertions genome-wide. Our pipeline Hi-C-based TE analyzer (HiTea) capitalizes on clipped Hi-C reads and is aided by a high proportion of discordant read pairs in Hi-C data to detect insertions of three major families of active human TEs. Despite the uneven genome coverage in Hi-C data, HiTea is competitive with the existing callers based on whole-genome sequencing (WGS) data and can supplement the WGS-based characterization of the TE-insertion landscape. We employ the pipeline to identify TE-insertions from human cell-line Hi-C samples. AVAILABILITY AND IMPLEMENTATION: HiTea is available at https://github.com/parklab/HiTea and as a Docker image. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dhawal Jain
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Chong Chu
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Burak Han Alver
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Soohyun Lee
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Eunjung Alice Lee
- Division of Genetics and Genomics, Boston Children's Hospital and Harvard Medical School, Boston, MA 02115, USA; Broad Institute of MIT and Harvard University, Cambridge, MA 02142, USA
- Peter J Park
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
30
Fong SL, Capra JA. Modeling the evolutionary architectures of transcribed human enhancer sequences reveals distinct origins, functions, and associations with human-trait variation. Mol Biol Evol 2021; 38:3681-3696. [PMID: 33973014 PMCID: PMC8382917 DOI: 10.1093/molbev/msab138] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 02/06/2023] Open
Abstract
Despite the importance of gene regulatory enhancers in human biology and evolution, we lack a comprehensive model of enhancer evolution and function. This substantially limits our understanding of the genetic basis of species divergence and our ability to interpret the effects of noncoding variants on human traits. To explore enhancer sequence evolution and its relationship to regulatory function, we traced the evolutionary origins of transcribed human enhancer sequences with activity across diverse tissues and cellular contexts from the FANTOM5 consortium. The transcribed enhancers are enriched for sequences of a single evolutionary age (“simple” evolutionary architectures) compared with enhancers that are composites of sequences of multiple evolutionary ages (“complex” evolutionary architectures), likely indicating constraint against genomic rearrangements. Complex enhancers are older, more pleiotropic, and more active across species than simple enhancers. Genetic variants within complex enhancers are also less likely to associate with human traits and biochemical activity. Transposable-element-derived sequences (TEDS) have made diverse contributions to enhancers of both architectures; the majority of TEDS are found in enhancers with simple architectures, while a minority have remodeled older sequences to create complex architectures. Finally, we compare the evolutionary architectures of transcribed enhancers with histone-mark-defined enhancers. Our results reveal that most human transcribed enhancers are ancient sequences of a single age, and thus the evolution of most human enhancers was not driven by increases in evolutionary complexity over time. Our analyses further suggest that considering enhancer evolutionary histories provides context that can aid interpretation of the effects of variants on enhancer function. Based on these results, we propose a framework for analyzing enhancer evolutionary architecture.
Affiliation(s)
- Sarah L Fong
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- John A Capra
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA; Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA; Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco, USA
31
A study of transposable element-associated structural variations (TASVs) using a de novo-assembled Korean genome. Exp Mol Med 2021; 53:615-630. [PMID: 33833373 PMCID: PMC8102501 DOI: 10.1038/s12276-021-00586-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/09/2019] [Revised: 01/26/2021] [Accepted: 01/27/2021] [Indexed: 12/13/2022] Open
Abstract
Advances in next-generation sequencing (NGS) technology have made personal genome sequencing possible, and indeed, many individual human genomes have now been sequenced. Comparisons of these individual genomes have revealed substantial genomic differences between human populations as well as between individuals from closely related ethnic groups. Transposable elements (TEs) are known to be one of the major sources of these variations and act through various mechanisms, including de novo insertion, insertion-mediated deletion, and TE–TE recombination-mediated deletion. In this study, we carried out de novo whole-genome sequencing of one Korean individual (KPGP9) via multiple insert-size libraries. The de novo whole-genome assembly resulted in 31,305 scaffolds with a scaffold N50 size of 13.23 Mb. Furthermore, through computational data analysis and experimental verification, we revealed that 182 TE-associated structural variation (TASV) insertions and 89 TASV deletions contributed 64,232 bp in sequence gain and 82,772 bp in sequence loss, respectively, in the KPGP9 genome relative to the hg19 reference genome. We also verified structural differences associated with TASVs by comparative analysis with TASVs in recent genomes (AK1 and TCGA genomes) and reported their details. Here, we constructed a new Korean de novo whole-genome assembly and provide the first study, to our knowledge, focused on the identification of TASVs in an individual Korean genome. Our findings again highlight the role of TEs as a major driver of structural variations in human individual genomes. A novel strategy for genome analysis offers insights into the distribution and impact on genome variation of transposable elements, DNA sequences that can replicate and relocate themselves at different chromosomal regions. These sequences, also known as ‘jumping genes’, comprise up to 50% of the genome, but it has proven challenging to map them with existing techniques. 
Seyoung Mun of Dankook University, Cheonan, South Korea, and coworkers have developed a sequencing and computational analysis strategy that allowed them to accurately map transposable elements across the genome of a Korean individual. These data revealed hundreds of insertion and deletion events relative to an existing reference map of the genome, showing significant alterations in the chromosomal structure. The authors speculate that such widespread transposition events could potentially contribute to individual differences in gene expression and risk of disease.
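For readers unfamiliar with the scaffold N50 quoted in the abstract above: N50 is a standard assembly-contiguity statistic, defined as the largest length L such that scaffolds of length L or longer together make up at least half of the total assembly. The Python sketch below illustrates that definition only. The function name `n50` and the toy scaffold lengths are hypothetical and are not taken from the KPGP9 assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Illustrative helper, not part of any published pipeline.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    # Walk from the longest scaffold downwards, accumulating length
    # until at least half of the total assembly size is covered.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not real data.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

Because N50 is weighted by length, a handful of long scaffolds can yield a large N50 even when the assembly also contains many short ones, which is consistent with an assembly of 31,305 scaffolds reporting an N50 of 13.23 Mb.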
32
Negm S, Greenberg A, Larracuente A, Sproul J. RepeatProfiler: A pipeline for visualization and comparative analysis of repetitive DNA profiles. Mol Ecol Resour 2021; 21:969-981. [PMID: 33277787 PMCID: PMC7954937 DOI: 10.1111/1755-0998.13305] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 05/26/2020] [Revised: 11/11/2020] [Accepted: 11/30/2020] [Indexed: 12/20/2022]
Abstract
Study of repetitive DNA elements in model organisms highlights the role of repetitive elements (REs) in many processes that drive genome evolution and phenotypic change. Because REs are much more dynamic than single-copy DNA, repetitive sequences can reveal signals of evolutionary history over short time scales that may not be evident in sequences from slower-evolving genomic regions. Many tools for studying REs are directed toward organisms with existing genomic resources, including genome assemblies and repeat libraries. However, signals in repeat variation may prove especially valuable in disentangling evolutionary histories in diverse non-model groups, for which genomic resources are limited. Here, we introduce RepeatProfiler, a tool for generating, visualizing, and comparing repetitive element DNA profiles from low-coverage, short-read sequence data. RepeatProfiler automates the generation and visualization of RE coverage depth profiles (RE profiles) and allows for statistical comparison of profile shape across samples. In addition, RepeatProfiler facilitates comparison of profiles by extracting signal from sequence variants across profiles which can then be analysed as molecular morphological characters using phylogenetic analysis. We validate RepeatProfiler with data sets from ground beetles (Bembidion), flies (Drosophila), and tomatoes (Solanum). We highlight the potential of RE profiles as a high-resolution data source for studies in species delimitation, comparative genomics, and repeat biology.
Affiliation(s)
- S. Negm
  - University of Rochester, Department of Biology, 337 Hutchison Hall, Rochester, NY, 14627
- A. Greenberg
  - University of Rochester, Department of Biology, 337 Hutchison Hall, Rochester, NY, 14627
- A.M. Larracuente
  - University of Rochester, Department of Biology, 337 Hutchison Hall, Rochester, NY, 14627
- J.S. Sproul
  - University of Rochester, Department of Biology, 337 Hutchison Hall, Rochester, NY, 14627
|
33
|
Smukowski Heil C, Patterson K, Hickey ASM, Alcantara E, Dunham MJ. Transposable Element Mobilization in Interspecific Yeast Hybrids. Genome Biol Evol 2021; 13:6141023. [PMID: 33595639 PMCID: PMC7952228 DOI: 10.1093/gbe/evab033] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Accepted: 02/11/2021] [Indexed: 12/13/2022] Open
Abstract
Barbara McClintock first hypothesized that interspecific hybridization could provide a “genomic shock” that leads to the mobilization of transposable elements (TEs). This hypothesis is based on the idea that regulation of TE movement is potentially disrupted in hybrids. However, the handful of studies testing this hypothesis have yielded mixed results. Here, we set out to identify if hybridization can increase transposition rate and facilitate colonization of TEs in Saccharomyces cerevisiae × Saccharomyces uvarum interspecific yeast hybrids. Saccharomyces cerevisiae have a small number of active long terminal repeat retrotransposons (Ty elements), whereas their distant relative S. uvarum have lost the Ty elements active in S. cerevisiae. Although the regulation system of Ty elements is known in S. cerevisiae, it is unclear how Ty elements are regulated in other Saccharomyces species, and what mechanisms contributed to the loss of most classes of Ty elements in S. uvarum. Therefore, we first assessed whether TEs could insert in the S. uvarum sub-genome of a S. cerevisiae × S. uvarum hybrid. We induced transposition to occur in these hybrids and developed a sequencing technique to show that Ty elements insert readily and nonrandomly in the S. uvarum genome. We then used an in vivo reporter construct to directly measure transposition rate in hybrids, demonstrating that hybridization itself does not alter rate of mobilization. However, we surprisingly show that species-specific mitochondrial inheritance can change transposition rate by an order of magnitude. Overall, our results provide evidence that hybridization can potentially facilitate the introduction of TEs across species boundaries and alter transposition via mitochondrial transmission, but that this does not lead to unrestrained proliferation of TEs suggested by the genomic shock theory.
Affiliation(s)
- Caiti Smukowski Heil
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Kira Patterson
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Erica Alcantara
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Maitreya J Dunham
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
|
34
|
Abstract
Drosophila melanogaster, a small dipteran of African origin, represents one of the best-studied model organisms. Early work in this system has uniquely shed light on the basic principles of genetics and resulted in a versatile collection of genetic tools that make it possible to uncover mechanistic links between genotype and phenotype. Moreover, given its worldwide distribution in diverse habitats and its moderate genome size, Drosophila has proven very powerful for population genetics inference and was one of the first eukaryotes whose genome was fully sequenced. In this book chapter, we provide a brief historical overview of research in Drosophila and then focus on recent advances during the genomic era. After describing different types and sources of genomic data, we discuss mechanisms of neutral evolution including the demographic history of Drosophila and the effects of recombination and biased gene conversion. Then, we review recent advances in detecting genome-wide signals of selection, such as soft and hard selective sweeps. We further provide a brief introduction to background selection, selection of noncoding DNA and codon usage and focus on the role of structural variants, such as transposable elements and chromosomal inversions, during the adaptive process. Finally, we discuss how genomic data helps to dissect neutral and adaptive evolutionary mechanisms that shape genetic and phenotypic variation in natural populations along environmental gradients. In summary, this book chapter serves as a starting point for Drosophila population genomics and provides an introduction to the system and an overview of data sources, important population genetic concepts and recent advances in the field.
|
35
|
Lup SD, Wilson-Sánchez D, Andreu-Sánchez S, Micol JL. Easymap: A User-Friendly Software Package for Rapid Mapping-by-Sequencing of Point Mutations and Large Insertions. FRONTIERS IN PLANT SCIENCE 2021; 12:655286. [PMID: 34040621 PMCID: PMC8143052 DOI: 10.3389/fpls.2021.655286] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/18/2021] [Accepted: 03/18/2021] [Indexed: 05/15/2023]
Abstract
Mapping-by-sequencing strategies combine next-generation sequencing (NGS) with classical linkage analysis, allowing rapid identification of the causal mutations of the phenotypes exhibited by mutants isolated in a genetic screen. Computer programs that analyze NGS data obtained from a mapping population of individuals derived from a mutant of interest to identify a causal mutation are available; however, the installation and usage of such programs requires bioinformatic skills, modifying or combining pieces of existing software, or purchasing licenses. To ease this process, we developed Easymap, an open-source program that simplifies the data analysis workflows from raw NGS reads to candidate mutations. Easymap can perform bulked segregant mapping of point mutations induced by ethyl methanesulfonate (EMS) with DNA-seq or RNA-seq datasets, as well as tagged-sequence mapping for large insertions, such as transposons or T-DNAs. The mapping analyses implemented in Easymap have been validated with experimental and simulated datasets from different plant and animal model species. Easymap was designed to be accessible to all users regardless of their bioinformatics skills by implementing a user-friendly graphical interface, a simple universal installation script, and detailed mapping reports, including informative images and complementary data for assessment of the mapping results. Easymap is available at http://genetics.edu.umh.es/resources/easymap; its Quickstart Installation Guide details the recommended procedure for installation.
|
36
|
Ewing AD, Smits N, Sanchez-Luque FJ, Faivre J, Brennan PM, Richardson SR, Cheetham SW, Faulkner GJ. Nanopore Sequencing Enables Comprehensive Transposable Element Epigenomic Profiling. Mol Cell 2020; 80:915-928.e5. [PMID: 33186547 DOI: 10.1016/j.molcel.2020.10.024] [Citation(s) in RCA: 97] [Impact Index Per Article: 24.3] [Received: 06/29/2020] [Revised: 10/14/2020] [Accepted: 10/15/2020] [Indexed: 12/12/2022]
Abstract
Transposable elements (TEs) drive genome evolution and are a notable source of pathogenesis, including cancer. While CpG methylation regulates TE activity, the locus-specific methylation landscape of mobile human TEs has to date proven largely inaccessible. Here, we apply new computational tools and long-read nanopore sequencing to directly infer CpG methylation of novel and extant TE insertions in hippocampus, heart, and liver, as well as paired tumor and non-tumor liver. As opposed to an indiscriminate stochastic process, we find pronounced demethylation of young long interspersed element 1 (LINE-1) retrotransposons in cancer, often distinct to the adjacent genome and other TEs. SINE-VNTR-Alu (SVA) retrotransposons, including their internal tandem repeat-associated CpG island, are near-universally methylated. We encounter allele-specific TE methylation and demethylation of aberrantly expressed young LINE-1s in normal tissues. Finally, we recover the complete sequences of tumor-specific LINE-1 insertions and their retrotransposition hallmarks, demonstrating how long-read sequencing can simultaneously survey the epigenome and detect somatic TE mobilization.
Affiliation(s)
- Adam D Ewing
  - Mater Research Institute, University of Queensland, Woolloongabba, QLD 4102, Australia
- Nathan Smits
  - Mater Research Institute, University of Queensland, Woolloongabba, QLD 4102, Australia
- Francisco J Sanchez-Luque
  - GENYO, Pfizer-University of Granada-Andalusian Government Centre for Genomics and Oncological Research, PTS Granada 18016, Spain
  - MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine (IGMM), University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, UK
- Jamila Faivre
  - INSERM, U1193, Paul-Brousse University Hospital, Hepatobiliary Centre, Villejuif 94800, France
- Paul M Brennan
  - Translational Neurosurgery, Centre for Clinical Brain Sciences, Edinburgh EH16 4SB, UK
- Sandra R Richardson
  - Mater Research Institute, University of Queensland, Woolloongabba, QLD 4102, Australia
- Seth W Cheetham
  - Mater Research Institute, University of Queensland, Woolloongabba, QLD 4102, Australia
- Geoffrey J Faulkner
  - Mater Research Institute, University of Queensland, Woolloongabba, QLD 4102, Australia
  - Queensland Brain Institute, University of Queensland, St. Lucia, QLD 4067, Australia
|
37
|
Pinto L, Torres C, Gil C, Santos HM, Capelo JL, Borges V, Gomes JP, Silva C, Vieira L, Poeta P, Igrejas G. Multiomics Substrates of Resistance to Emerging Pathogens? Transcriptome and Proteome Profile of a Vancomycin-Resistant Enterococcus faecalis Clinical Strain. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2020; 24:81-95. [PMID: 32073998 DOI: 10.1089/omi.2019.0164] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
Antibiotic resistance and hospital acquired infections are on the rise worldwide. Vancomycin-resistant enterococci have been reported in clinical settings in recent decades. In this multiomics study, we provide comprehensive proteomic and transcriptomic analyses of a vancomycin-resistant Enterococcus faecalis clinical isolate from a patient with a urinary tract infection. The previous genotypic profile of the strain C2620 indicated the presence of antibiotic resistance genes characteristic of the vanB cluster. To further investigate the transcriptome of this pathogenic strain, we used whole genome sequencing and RNA-sequencing to detect and quantify the genes expressed. In parallel, we used two-dimensional gel electrophoresis followed by MALDI-TOF/MS (Matrix-assisted laser desorption/ionization-Time-of-flight/Mass spectrometry) to identify the proteins in the proteome. We studied the membrane and cytoplasm subproteomes separately. From a total of 207 analysis spots, we identified 118 proteins. The protein list was compared to the results obtained from the full transcriptome assay. Several genes and proteins related to stress and cellular response were identified, as well as some linked to antibiotic and drug responses, which is consistent with the known state of multiresistance. Even though the correlation between transcriptome and proteome data is not yet fully understood, the use of multiomics approaches has proven to be increasingly relevant to achieve deeper insights into the survival ability of pathogenic bacteria found in health care facilities.
Affiliation(s)
- Luís Pinto
  - Department of Genetics and Biotechnology, University of Trás-os-Montes and Alto Douro, Vila Real, Portugal
  - Functional Genomics and Proteomics Unit, University of Trás-os-Montes and Alto Douro, Vila Real, Portugal
  - Veterinary Science Department, University of Trás-os-Montes and Alto Douro, Vila Real, Portugal
- Carmen Torres
  - Área de Bioquímica y Biología Molecular, Universidad de La Rioja, Logroño, Spain
- Concha Gil
  - Departamento de Microbiologia II, Facultad de Farmacia, Universidad Complutense de Madrid, Madrid, Spain
- Hugo M Santos
  - LAQV-REQUIMTE, Faculty of Science and Technology, Nova University of Lisbon, Lisbon, Portugal
- José Luís Capelo
  - LAQV-REQUIMTE, Faculty of Science and Technology, Nova University of Lisbon, Lisbon, Portugal
- Vítor Borges
  - Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health, Lisbon, Portugal
- João Paulo Gomes
  - Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health, Lisbon, Portugal
- Catarina Silva
  - Innovation and Technology Unit, Department of Human Genetics, National Institute of Health, Lisbon, Portugal
- Luís Vieira
  - Innovation and Technology Unit, Department of Human Genetics, National Institute of Health, Lisbon, Portugal
- Patrícia Poeta
  - Veterinary Science Department, University of Trás-os-Montes and Alto Douro, Vila Real, Portugal
  - LAQV-REQUIMTE, Faculty of Science and Technology, Nova University of Lisbon, Lisbon, Portugal
- Gilberto Igrejas
  - Department of Genetics and Biotechnology, University of Trás-os-Montes and Alto Douro, Vila Real, Portugal
  - Functional Genomics and Proteomics Unit, University of Trás-os-Montes and Alto Douro, Vila Real, Portugal
  - LAQV-REQUIMTE, Faculty of Science and Technology, Nova University of Lisbon, Lisbon, Portugal
|
38
|
Alquezar‐Planas DE, Löber U, Cui P, Quedenau C, Chen W, Greenwood AD. DNA sonication inverse PCR for genome scale analysis of uncharacterized flanking sequences. Methods Ecol Evol 2020. [DOI: 10.1111/2041-210x.13497] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
Affiliation(s)
- David E. Alquezar‐Planas
  - Department of Wildlife Diseases, Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
  - Australian Museum Research Institute, Australian Museum, Sydney, NSW, Australia
- Ulrike Löber
  - Department of Wildlife Diseases, Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
  - The Berlin Center for Genomics in Biodiversity Research, Berlin, Germany
  - Experimental and Clinical Research Center, A Cooperation of Charité – Universitätsmedizin Berlin and Max Delbruck Center for Molecular Medicine, Berlin, Germany
- Pin Cui
  - Department of Wildlife Diseases, Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
- Claudia Quedenau
  - Genomics, Max Delbrück Center for Molecular Medicine, Berlin, Germany
- Wei Chen
  - Berlin Institute for Medical Systems Biology, Max‐Delbrück Center for Molecular Medicine, Berlin, Germany
- Alex D. Greenwood
  - Department of Wildlife Diseases, Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
  - Department of Veterinary Medicine, Freie Universität Berlin, Berlin, Germany
|
39
|
Thielen PM, Pendleton AL, Player RA, Bowden KV, Lawton TJ, Wisecaver JH. Reference Genome for the Highly Transformable Setaria viridis ME034V. G3 (BETHESDA, MD.) 2020; 10:3467-3478. [PMID: 32694197 PMCID: PMC7534418 DOI: 10.1534/g3.120.401345] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 05/05/2020] [Accepted: 07/16/2020] [Indexed: 12/22/2022]
Abstract
Setaria viridis (green foxtail) is an important model system for improving cereal crops due to its diploid genome, ease of cultivation, and use of C4 photosynthesis. The S. viridis accession ME034V is exceptionally transformable, but the lack of a sequenced genome for this accession has limited its utility. We present a 397 Mb highly contiguous de novo assembly of ME034V using ultra-long nanopore sequencing technology (read N50 = 41 kb). We estimate that this genome is largely complete based on our updated k-mer based genome size estimate of 401 Mb for S. viridis. Genome annotation identified 37,908 protein-coding genes and >300k repetitive elements comprising 46% of the genome. We compared the ME034V assembly with two other previously sequenced Setaria genomes as well as to a diversity panel of 235 S. viridis accessions. We found the genome assemblies to be largely syntenic, but numerous unique polymorphic structural variants were discovered. Several ME034V deletions may be associated with recent retrotransposition of copia and gypsy LTR repeat families, as evidenced by their low genotype frequencies in the sampled population. Lastly, we performed a phylogenomic analysis to identify gene families that have expanded in Setaria, including those involved in specialized metabolism and plant defense response. The high continuity of the ME034V genome assembly validates the utility of ultra-long DNA sequencing to improve genetic resources for emerging model organisms. Structural variation present in Setaria illustrates the importance of obtaining the proper genome reference for genetic experiments. Thus, we anticipate that the ME034V genome will be of significant utility for the Setaria research community.
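For readers unfamiliar with the two summary figures quoted in this abstract, the sketch below shows how a read N50 is conventionally computed and how the completeness implied by the 397 Mb assembly and the 401 Mb k-mer estimate works out. The function name and the toy read lengths are illustrative assumptions; only the 397 and 401 Mb values come from the abstract.

```python
def n50(lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 requires at least one length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy read lengths (hypothetical, not data from the study).
toy_n50 = n50([2, 3, 4, 5, 6])

# Assembly size relative to the k-mer based genome size estimate, in percent.
completeness_pct = round(397 / 401 * 100, 1)
```

On these numbers the assembly spans about 99% of the estimated genome size, which is the basis for the "largely complete" statement.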
Affiliation(s)
- Peter M Thielen
  - Johns Hopkins University Applied Physics Laboratory, Laurel, Maryland 20723
- Amanda L Pendleton
  - Department of Biochemistry, Purdue University, West Lafayette, Indiana 47907
  - Purdue Center for Plant Biology, Purdue University, West Lafayette, Indiana 47907
- Robert A Player
  - Johns Hopkins University Applied Physics Laboratory, Laurel, Maryland 20723
- Kenneth V Bowden
  - Johns Hopkins University Applied Physics Laboratory, Laurel, Maryland 20723
- Thomas J Lawton
  - Johns Hopkins University Applied Physics Laboratory, Laurel, Maryland 20723
- Jennifer H Wisecaver
  - Department of Biochemistry, Purdue University, West Lafayette, Indiana 47907
  - Purdue Center for Plant Biology, Purdue University, West Lafayette, Indiana 47907
|
40
|
Orozco-Arias S, Tobon-Orozco N, Piña JS, Jiménez-Varón CF, Tabares-Soto R, Guyot R. TIP_finder: An HPC Software to Detect Transposable Element Insertion Polymorphisms in Large Genomic Datasets. BIOLOGY 2020; 9:E281. [PMID: 32917036 PMCID: PMC7563458 DOI: 10.3390/biology9090281] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/27/2020] [Revised: 09/01/2020] [Accepted: 09/07/2020] [Indexed: 12/12/2022]
Abstract
Transposable elements (TEs) are non-static genomic units capable of moving from one chromosomal location to another. In eukaryotes, their insertion polymorphisms may cause beneficial mutations, such as the creation of new gene functions, or deleterious ones, e.g., those linked to different types of cancer in humans. A particular type of TE called LTR-retrotransposons comprises almost 8% of the human genome. Among LTR retrotransposons, human endogenous retroviruses (HERVs) bear structural and functional similarities to retroviruses. Several tools allow the detection of transposon insertion polymorphisms (TIPs) but fail to efficiently analyze large genomes or large datasets. Here, we developed a computational tool, named TIP_finder, able to detect mobile element insertions in very large genomes, through high-performance computing (HPC) and parallel programming, using discordant read pair analysis. TIP_finder inputs are (i) short paired-end reads such as those obtained by Illumina, (ii) a chromosome-level reference genome sequence, and (iii) a database of consensus TE sequences. The HPC strategy we propose adds scalability and provides a useful tool to analyze huge genomic datasets in a reasonable running time. TIP_finder accelerates the detection of TIPs by up to 55 times in breast cancer datasets and 46 times in cancer-free datasets compared to the fastest available algorithms. TIP_finder applies a validated strategy to find TIPs, accelerates the process through HPC, and addresses the issues of runtime for large-scale analyses in the post-genomic era. TIP_finder version 1.0 is available at https://github.com/simonorozcoarias/TIP_finder.
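The general idea behind discordant read pair analysis can be sketched in a few lines: a pair is informative when one mate aligns to a TE consensus and the other aligns to a chromosome, and genomic windows that collect several such pairs become candidate insertion sites. The code below is a minimal single-process sketch under stated assumptions, not TIP_finder's implementation: the input tuples, the function name, the 10 kb window, and the support threshold are all hypothetical, and the HPC parallelization described in the abstract is not shown.

```python
from collections import Counter


def candidate_windows(pairs, te_names, window=10_000, min_support=3):
    """Return (chromosome, window_start) keys supported by discordant pairs.

    pairs: iterable of ((ref_a, pos_a), (ref_b, pos_b)), one tuple per read pair
           (hypothetical pre-parsed alignment summary).
    te_names: set of reference names that are TE consensus sequences.
    """
    counts = Counter()
    for (ref_a, pos_a), (ref_b, pos_b) in pairs:
        a_is_te = ref_a in te_names
        b_is_te = ref_b in te_names
        if a_is_te == b_is_te:
            continue                      # both TE or both genomic: not informative
        chrom, pos = (ref_b, pos_b) if a_is_te else (ref_a, pos_a)
        counts[(chrom, pos // window * window)] += 1
    return sorted(key for key, n in counts.items() if n >= min_support)


# Toy example with a single hypothetical TE family named "Gypsy1".
toy_pairs = [
    (("chr1", 10_500), ("Gypsy1", 40)),
    (("Gypsy1", 10), ("chr1", 10_900)),
    (("chr1", 19_999), ("Gypsy1", 5)),
    (("chr1", 500), ("chr1", 900)),       # concordant genomic pair, ignored
    (("chr2", 100), ("Gypsy1", 3)),       # only one supporting pair
]
hits = candidate_windows(toy_pairs, {"Gypsy1"})
```

Since each window is counted independently, the per-chromosome or per-sample work splits naturally across processes, which is the kind of structure an HPC strategy can exploit.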
Affiliation(s)
- Simon Orozco-Arias
  - Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170002, Colombia
  - Department of Systems and Informatics, Universidad de Caldas, Manizales 170002, Colombia
- Nicolas Tobon-Orozco
  - Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170002, Colombia
- Johan S. Piña
  - Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170002, Colombia
- Reinel Tabares-Soto
  - Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales 170002, Colombia
- Romain Guyot
  - Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales 170002, Colombia
  - Institut de Recherche pour le Développement (IRD), CIRAD, Université de Montpellier, 34394 Montpellier, France
|
41
|
Liu Z, Fan M, Yue EK, Li Y, Tao RF, Xu HM, Duan MH, Xu JH. Natural variation and evolutionary dynamics of transposable elements in Brassica oleracea based on next-generation sequencing data. HORTICULTURE RESEARCH 2020; 7:145. [PMID: 32922817 PMCID: PMC7459127 DOI: 10.1038/s41438-020-00367-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/06/2020] [Revised: 05/22/2020] [Accepted: 06/19/2020] [Indexed: 06/02/2023]
Abstract
Brassica oleracea comprises various economically important vegetables and presents extremely diverse morphological variations. They provide a rich source of nutrition for human health and have been used as a model system for studying polyploidization. Transposable elements (TEs) account for nearly 40% of the B. oleracea genome and contribute greatly to genetic diversity and genome evolution. Although the proliferation of TEs has led to a large expansion of the B. oleracea genome, little is known about the population dynamics and evolutionary activity of TEs. A comprehensive mobilome profile of 45,737 TE loci was obtained from resequencing data from 121 diverse accessions across nine B. oleracea morphotypes. Approximately 70% (32,195) of the loci showed insertion polymorphisms between or within morphotypes. In particular, up to 1221 loci were differentially fixed among morphotypes. Further analysis revealed that the distribution of the population frequency of TE loci was highly variable across different TE superfamilies and families, implying a diverse expansion history during host genome evolution. These findings provide better insight into the evolutionary dynamics and genetic diversity of B. oleracea genomes and will potentially serve as a valuable resource for molecular markers and association studies between TE-based genomic variations and morphotype-specific phenotypic differentiation.
Affiliation(s)
- Zhen Liu
  - Institute of Crop Science, Zhejiang Key Laboratory of Crop Germplasm, Zhejiang University, 310058 Hangzhou, People’s Republic of China
- Miao Fan
  - Institute of Crop Science, Zhejiang Key Laboratory of Crop Germplasm, Zhejiang University, 310058 Hangzhou, People’s Republic of China
- Er-Kui Yue
  - Institute of Crop Science, Zhejiang Key Laboratory of Crop Germplasm, Zhejiang University, 310058 Hangzhou, People’s Republic of China
- Yu Li
  - Institute of Crop Science, Zhejiang Key Laboratory of Crop Germplasm, Zhejiang University, 310058 Hangzhou, People’s Republic of China
- Ruo-Fu Tao
  - Institute of Crop Science, Zhejiang Key Laboratory of Crop Germplasm, Zhejiang University, 310058 Hangzhou, People’s Republic of China
- Hai-Ming Xu
  - Institute of Crop Science, Zhejiang Key Laboratory of Crop Germplasm, Zhejiang University, 310058 Hangzhou, People’s Republic of China
- Ming-Hua Duan
  - Zhejiang Zhengjingyuan Pharmacy Chain Co., Ltd. & Hangzhou Zhengcaiyuan Pharmaceutical Co., Ltd., 310021 Hangzhou, People’s Republic of China
- Jian-Hong Xu
  - Institute of Crop Science, Zhejiang Key Laboratory of Crop Germplasm, Zhejiang University, 310058 Hangzhou, People’s Republic of China
|
42
|
Stritt C, Wyler M, Gimmi EL, Pippel M, Roulin AC. Diversity, dynamics and effects of long terminal repeat retrotransposons in the model grass Brachypodium distachyon. THE NEW PHYTOLOGIST 2020; 227:1736-1748. [PMID: 31677277 PMCID: PMC7497039 DOI: 10.1111/nph.16308] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 07/26/2019] [Accepted: 10/10/2019] [Indexed: 05/13/2023]
Abstract
Transposable elements (TEs) are the main reason for the high plasticity of plant genomes, where they occur as communities of diverse evolutionary lineages. Because research has typically focused on single abundant families or summarized TEs at a coarse taxonomic level, our knowledge about how these lineages differ in their effects on genome evolution is still rudimentary. Here we investigate the community composition and dynamics of 32 long terminal repeat retrotransposon (LTR-RT) families in the 272-Mb genome of the Mediterranean grass Brachypodium distachyon. We find that much of the recent transpositional activity in the B. distachyon genome is due to centromeric Gypsy families and Copia elements belonging to the Angela lineage. With a half-life as low as 66 kyr, the latter are the most dynamic part of the genome and an important source of within-species polymorphisms. Second, GC-rich Gypsy elements of the Retand lineage are the most abundant TEs in the genome. Their presence explains > 20% of the genome-wide variation in GC content and is associated with higher methylation levels. Our study shows how individual TE lineages change the genetic and epigenetic constitution of the host beyond simple changes in genome size.
Affiliation(s)
- Christoph Stritt
  - Institute for Plant and Microbial Biology, University of Zurich, Zollikerstrasse 107, Zurich 8008, Switzerland
- Michele Wyler
  - Institute for Plant and Microbial Biology, University of Zurich, Zollikerstrasse 107, Zurich 8008, Switzerland
- Elena L. Gimmi
  - Institute for Plant and Microbial Biology, University of Zurich, Zollikerstrasse 107, Zurich 8008, Switzerland
- Martin Pippel
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, Dresden 01307, Germany
- Anne C. Roulin
  - Institute for Plant and Microbial Biology, University of Zurich, Zollikerstrasse 107, Zurich 8008, Switzerland
|
43
|
Zhang Y, Sun X, Wang Q, Xu J, Dong F, Yang S, Yang J, Zhang Z, Qian Y, Chen J, Zhang J, Liu Y, Tao R, Jiang Y, Yang J, Yang S. Multicopy Chromosomal Integration Using CRISPR-Associated Transposases. ACS Synth Biol 2020; 9:1998-2008. [PMID: 32551502 DOI: 10.1021/acssynbio.0c00073] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Indexed: 12/16/2022]
Abstract
Controlling the copy number of gene expression cassettes is an important strategy to engineer bacterial cells into high-efficiency biocatalysts. Current strategies mostly use plasmid vectors, but multicopy plasmids are often genetically unstable, and their copy numbers cannot be precisely controlled. The integration of expression cassettes into a bacterial chromosome has advantages, but iterative integration is laborious, and it is challenging to obtain a library with varied gene doses for phenotype characterization. Here, we demonstrated that multicopy chromosomal integration using CRISPR-associated transposases (MUCICAT) can be achieved by designing a crRNA to target multicopy loci or a crRNA array to target multiple loci in the Escherichia coli genome. Within 5 days without selection pressure, E. coli strains carrying cargos with successively increasing copy numbers (up to 10) were obtained. Recombinant MUCICAT E. coli containing genomic multicopy glucose dehydrogenase expression cassettes showed 2.6-fold increased expression of this important industrial enzyme compared to E. coli harboring the conventional protein-expressing plasmid pET24a. Successful extension of MUCICAT to Tatumella citrea further demonstrated that MUCICAT may be generally applied to many bacterial species.
Collapse
Affiliation(s)
- Yiwen Zhang
- Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xiaoman Sun
- School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, Nanjing 210046, China
| | - Qingzhuo Wang
- Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jiaqi Xu
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
| | - Feng Dong
- Huzhou Center of Industrial Biotechnology, Shanghai Institutes for Biological Sciences, Huzhou 313000, China
| | - Siqi Yang
- Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jiawei Yang
- School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, Nanjing 210046, China
| | - Zixu Zhang
- School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, Nanjing 210046, China
| | - Yuan Qian
- Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China
| | - Jun Chen
- Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China
| | - Jiao Zhang
- Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yingmiao Liu
- Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China
| | - Rongsheng Tao
- Huzhou Center of Industrial Biotechnology, Shanghai Institutes for Biological Sciences, Huzhou 313000, China
| | - Yu Jiang
- Shanghai Taoyusheng Biotechnology Co., Ltd, Shanghai 201203, China
| | - Junjie Yang
- Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China
| | - Sheng Yang
- Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China
- Huzhou Center of Industrial Biotechnology, Shanghai Institutes for Biological Sciences, Huzhou 313000, China
|
44
|
Ferchaud AL, Leitwein M, Laporte M, Boivin-Delisle D, Bougas B, Hernandez C, Normandeau É, Thibault I, Bernatchez L. Adaptive and maladaptive genetic diversity in small populations: Insights from the Brook Charr (Salvelinus fontinalis) case study. Mol Ecol 2020; 29:3429-3445. [PMID: 33463857 DOI: 10.1111/mec.15566] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/04/2019] [Revised: 07/13/2020] [Accepted: 07/16/2020] [Indexed: 12/12/2022]
Abstract
Investigating the relative importance of neutral versus selective processes governing the accumulation of genetic variants is a key goal in both evolutionary and conservation biology. This is particularly true in the context of small populations, where genetic drift can counteract the effect of selection. Using Brook Charr (Salvelinus fontinalis) from Québec, Canada, as a case study, we investigated the importance of demographic versus selective processes governing the accumulation of both adaptive and maladaptive mutations in closed versus open and connected populations to assess gene flow effect. This was achieved by using 14,779 high-quality filtered SNPs genotyped among 1,416 fish representing 50 populations from three life history types: lacustrine (closed populations), riverine and anadromous (connected populations). Using the PROVEAN algorithm, we observed a considerable accumulation of putative deleterious mutations across populations. The absence of correlation between the occurrence of putatively beneficial or deleterious mutations and local recombination rate supports the hypothesis that genetic drift might be the main driver of the accumulation of such variants. However, despite a lower genetic diversity observed in lacustrine than in riverine or anadromous populations, lacustrine populations do not exhibit more deleterious mutations than the two other history types, suggesting that the negative effect of genetic drift in lacustrine populations may be mitigated by that of relaxed purifying selection. Moreover, we also identified genomic regions associated with anadromy, as well as an overrepresentation of transposable elements associated with variation in environmental variables, thus supporting the importance of transposable elements in adaptation.
Affiliation(s)
- Anne-Laure Ferchaud
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
| | - Maeva Leitwein
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
| | - Martin Laporte
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
| | - Damien Boivin-Delisle
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
| | - Bérénice Bougas
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
| | - Cécilia Hernandez
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
| | - Éric Normandeau
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
| | - Isabel Thibault
- Direction de l'expertise Sur la Faune Aquatique, Ministère des Forêts, de la Faune et des Parcs du Québec, Québec, QC, Canada
| | - Louis Bernatchez
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
|
45
|
Co-option of the lineage-specific LAVA retrotransposon in the gibbon genome. Proc Natl Acad Sci U S A 2020; 117:19328-19338. [PMID: 32690705 DOI: 10.1073/pnas.2006038117] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 01/11/2023] Open
Abstract
Co-option of transposable elements (TEs) to become part of existing or new enhancers is an important mechanism for evolution of gene regulation. However, contributions of lineage-specific TE insertions to recent regulatory adaptations remain poorly understood. Gibbons present a suitable model to study these contributions as they have evolved a lineage-specific TE called LAVA (LINE-AluSz-VNTR-Alu LIKE), which is still active in the gibbon genome. The LAVA retrotransposon is thought to have played a role in the emergence of the highly rearranged structure of the gibbon genome by disrupting transcription of cell cycle genes. In this study, we investigated whether LAVA may have also contributed to the evolution of gene regulation by adopting enhancer function. We characterized fixed and polymorphic LAVA insertions across multiple gibbons and found 96 LAVA elements overlapping enhancer chromatin states. Moreover, LAVA was enriched in multiple transcription factor binding motifs, was bound by an important transcription factor (PU.1), and was associated with higher levels of gene expression in cis. We found gibbon-specific signatures of purifying/positive selection at 27 LAVA insertions. Two of these insertions were fixed in the gibbon lineage and overlapped with enhancer chromatin states, representing putative co-opted LAVA enhancers. These putative enhancers were located within genes encoding SETD2 and RAD9A, two proteins that facilitate accurate repair of DNA double-strand breaks and prevent chromosomal rearrangement mutations. Co-option of LAVA in these genes may have influenced regulation of processes that preserve genome integrity. Our findings highlight the importance of considering lineage-specific TEs in studying evolution of gene regulatory elements.
|
46
|
Lanciano S, Cristofari G. Measuring and interpreting transposable element expression. Nat Rev Genet 2020; 21:721-736. [PMID: 32576954 DOI: 10.1038/s41576-020-0251-y] [Citation(s) in RCA: 168] [Impact Index Per Article: 42.0] [Accepted: 05/19/2020] [Indexed: 12/21/2022]
Abstract
Transposable elements (TEs) are insertional mutagens that contribute greatly to the plasticity of eukaryotic genomes, influencing the evolution and adaptation of species as well as physiology or disease in individuals. Measuring TE expression helps to understand not only when and where TE mobilization can occur but also how this process alters gene expression, chromatin accessibility or cellular signalling pathways. Although genome-wide gene expression assays such as RNA sequencing include transposon-derived transcripts, most computational analytical tools discard or misinterpret TE-derived reads. Emerging approaches are improving the identification of expressed TE loci and helping to discriminate TE transcripts that permit TE mobilization from chimeric gene-TE transcripts or pervasive transcription. Here we review the main challenges associated with the detection of TE expression, including mappability, insertional and internal sequence polymorphisms, and the diversity of the TE transcriptional landscape, as well as the different experimental and computational strategies to solve them.
|
47
|
SVXplorer: Three-tier approach to identification of structural variants via sequential recombination of discordant cluster signatures. PLoS Comput Biol 2020; 16:e1007737. [PMID: 32182236 PMCID: PMC7100977 DOI: 10.1371/journal.pcbi.1007737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2019] [Revised: 03/27/2020] [Accepted: 02/18/2020] [Indexed: 11/19/2022] Open
Abstract
The identification of structural variants using short-read data remains challenging. Most approaches that use discordant paired-end sequences ignore non-trivial signatures presented by variants containing 3 breakpoints, such as those generated by various copy-paste and cut-paste mechanisms. This can result in lower precision and sensitivity in the identification of the more common structural variants such as deletions and duplications. We present SVXplorer, which uses a graph-based clustering approach streamlined by the integration of non-trivial signatures from discordant paired-end alignments, split-reads and read depth information to improve upon existing methods. We show that SVXplorer is more sensitive and precise compared to several existing approaches on multiple real and simulated datasets. SVXplorer is available for download at https://github.com/kunalkathuria/SVXplorer.
|
48
|
Storer JM, Walker JA, Jordan VE, Batzer MA. Sensitivity of the polyDetect computational pipeline for phylogenetic analyses. Anal Biochem 2020; 593:113516. [PMID: 31794702 DOI: 10.1016/j.ab.2019.113516] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/15/2019] [Revised: 11/19/2019] [Accepted: 11/25/2019] [Indexed: 01/16/2023]
Abstract
Alu elements are powerful phylogenetic markers. The combination of a recently-developed computational pipeline, polyDetect, with high copy number Alu insertions has previously been utilized to help resolve the Papio baboon phylogeny with high statistical support. Here, the polyDetect method was applied to the highly contentious Cebidae phylogeny within New World monkeys (NWM). The polyDetect method relies on conserved homology/identity of short read sequence data among the species being compared to accurately map predicted shared Alu insertions to each unique flanking sequence. The results of this comprehensive assessment indicate that there were insufficient sequence homology/identity stretches in non-repeated DNA sequences among the four Cebidae genera analyzed in this study to make this strategy phylogenetically viable. The ~20 million years of evolutionary divergence of the Cebidae genera has resulted in random sequence decay within the short read data, obscuring potentially orthologous elements in the species tested. These analyses suggest that the polyDetect pipeline is best suited to resolving phylogenies of more recently diverged lineages when high-quality assembled genomes are not available for the taxa of interest.
Affiliation(s)
- Jessica M Storer
- Department of Biological Sciences, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA, 70803, USA
| | - Jerilyn A Walker
- Department of Biological Sciences, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA, 70803, USA
| | - Vallmer E Jordan
- Department of Biological Sciences, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA, 70803, USA
| | - Mark A Batzer
- Department of Biological Sciences, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA, 70803, USA
|
49
|
Uzunović J, Josephs EB, Stinchcombe JR, Wright SI. Transposable Elements Are Important Contributors to Standing Variation in Gene Expression in Capsella grandiflora. Mol Biol Evol 2020; 36:1734-1745. [PMID: 31028401 DOI: 10.1093/molbev/msz098] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 02/07/2023] Open
Abstract
Transposable elements (TEs) make up a significant portion of eukaryotic genomes and are important drivers of genome evolution. However, the extent to which TEs affect gene expression variation on a genome-wide scale in comparison with other types of variants is still unclear. We characterized TE insertion polymorphisms and their association with gene expression in 124 whole-genome sequences from a single population of Capsella grandiflora, and contrasted this with the effects of single nucleotide polymorphisms (SNPs). Population frequency of insertions was negatively correlated with distance to genes, as well as density of conserved noncoding elements, suggesting that the negative effects of TEs on gene regulation are important in limiting their abundance. Rare TE variants strongly influence gene expression variation, predominantly through downregulation. In contrast, rare SNPs contribute equally to up- and down-regulation, but have a weaker individual effect than TEs. An expression quantitative trait loci (eQTL) analysis shows that a greater proportion of common TEs are eQTLs as opposed to common SNPs, and a third of the genes with TE eQTLs do not have SNP eQTLs. In contrast with rare TE insertions, common insertions are more likely to increase expression, consistent with recent models of cis-regulatory evolution favoring enhancer alleles. Taken together, these results imply that TEs are a significant contributor to gene expression variation and are individually more likely than rare SNPs to cause extreme changes in gene expression.
Affiliation(s)
- Jasmina Uzunović
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
| | - Emily B Josephs
- Department of Plant Biology, Michigan State University, East Lansing, MI
| | - John R Stinchcombe
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada; Koffler Scientific Reserve, University of Toronto, Toronto, Ontario, Canada
| | - Stephen I Wright
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada; Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada
|
50
|
Lou C, Goodier JL, Qiang R. A potential new mechanism for pregnancy loss: considering the role of LINE-1 retrotransposons in early spontaneous miscarriage. Reprod Biol Endocrinol 2020; 18:6. [PMID: 31964400 PMCID: PMC6971995 DOI: 10.1186/s12958-020-0564-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/26/2019] [Accepted: 01/07/2020] [Indexed: 12/14/2022] Open
Abstract
LINE-1 retrotransposons are mobile DNA elements that copy and paste themselves into new sites in the genome. To ensure their evolutionary success, heritable new LINE-1 insertions accumulate in cells that can transmit genetic information to the next generation (i.e., germ cells and embryonic stem cells). It is our hypothesis that LINE-1 retrotransposons, insertional mutagens that affect expression of genes, may be causal agents of early miscarriage in humans. The cell has evolved various defenses restricting retrotransposition-caused mutation, but these are occasionally relaxed in certain somatic cell types, including those of the early embryo. We predict that reduced suppression of L1s in germ cells or early-stage embryos may lead to excessive genome mutation by retrotransposon insertion, or to the induction of an inflammatory response or apoptosis due to increased expression of L1-derived nucleic acids and proteins, and so disrupt gene function important for embryogenesis. If correct, a novel threat to normal human development is revealed, and reverse transcriptase therapy could be one future strategy for controlling this cause of embryonic damage in patients with recurrent miscarriages.
Affiliation(s)
- Chao Lou
- Department of Genetics, Northwest Women’s and Children’s Hospital, 1616 Yanxiang Road, Xi’an, Shaanxi Province, People’s Republic of China
| | - John L. Goodier
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Rong Qiang
- Department of Genetics, Northwest Women’s and Children’s Hospital, 1616 Yanxiang Road, Xi’an, Shaanxi Province, People’s Republic of China
|