1
Choi JD, Del Pinto LA, Sutter NB. SINE retrotransposons import polyadenylation signals to 3'UTRs in dog (Canis familiaris). Mob DNA 2025; 16:1. [PMID: 39755632 DOI: 10.1186/s13100-024-00338-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2024] [Accepted: 12/17/2024] [Indexed: 01/06/2025] [Open Access]
Abstract
BACKGROUND Messenger RNA 3' untranslated regions (3'UTRs) control many aspects of gene expression and determine where the transcript will terminate. The polyadenylation signal (PAS) AAUAAA (AATAAA in DNA) is a key regulator of transcript termination, and this hexamer, or a similar sequence, is very frequently found within 30 bp of 3'UTR ends. Short interspersed element (SINE) retrotransposons are found throughout genomes in high copy numbers. When inserted into genes they can disrupt expression, alter splicing, or cause nuclear retention of mRNAs. The genomes of the domestic dog and other carnivores carry hundreds of thousands of Can-SINEs, a tRNA-related SINE with transcription termination potential. We therefore asked whether Can-SINEs may terminate transcripts in some dog genes. RESULTS The dog's nine Can-SINE consensus sequences carry an average of three AATAAA PASs on their sense strands but none on their antisense strands. Consistent with the idea that Can-SINEs can terminate transcripts, we find that sense-oriented Can-SINEs are approximately ten times more frequent at 3' ends of 3'UTRs compared to further upstream within 3'UTRs. Furthermore, the count of AATAAA PASs on head-to-tail SINE sequences differs significantly between sense- and antisense-oriented retrotransposons in transcripts. Can-SINEs near 3'UTR ends are likely to carry an AATAAA motif on the mRNA sense strand while those further upstream are not. We identified loci where Can-SINE insertion has truncated or altered a 3'UTR of the dog genome (dog 3'UTR) compared to the human ortholog. Dog 3'UTRs have peaks of AATAAA PAS frequency at 28, 32, and 36 bp from the end. The periodicity is partly explained by TAAA(n) repeats within Can-SINE AT-rich tails. We annotated all repeat-masked Can-SINE copies in the Boxer reference genome and found that the young SINEC_Cf type has a mode of 15 bp length for target site duplications (TSDs). All dog Can-SINE types favor integration at TSDs beginning with A(4). CONCLUSION Dog Can-SINE retrotransposition has imported AATAAA PASs into gene transcripts and led to alteration of 3'UTRs. AATAAA sequences are selectively removed from Can-SINEs in introns and upstream 3'UTR regions but are retained at the far downstream end of 3'UTRs, which we infer reflects their role as termination sequences for these transcripts.
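The strand asymmetry reported in this abstract (AATAAA present on the sense strand, absent on the antisense strand) can be illustrated with a simple motif count. The sketch below is not the authors' pipeline: the function names and the toy sequence are hypothetical, chosen only to show how a hexamer can be counted on a DNA string and on its reverse complement, with overlapping matches allowed.

```python
# Hypothetical helper names; the toy sequence is invented, not a real Can-SINE.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str = "AATAAA") -> int:
    """Count occurrences of motif in seq, allowing overlapping matches."""
    seq, motif = seq.upper(), motif.upper()
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        # Advance by one base so overlapping hits are also found.
        start = seq.find(motif, start + 1)
    return count


def pas_counts(seq: str) -> tuple[int, int]:
    """Return (sense count, antisense count) of AATAAA for a sequence."""
    return count_motif(seq), count_motif(reverse_complement(seq))


# Toy sequence ending in a TAAA(n)-style AT-rich tail.
toy = "GGCTCAGTGGTTGAGCATCTGCCTTTAAATAAATAAATAAA"
sense, antisense = pas_counts(toy)
print(sense, antisense)  # prints: 3 0
```

A TAAA(n) tail yields one AATAAA every 4 bp on the sense strand, which is in line with the 4 bp spacing of the frequency peaks (28, 32, and 36 bp) mentioned in the abstract.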
Affiliation(s)
- Jessica D Choi
  - Department of Biology, La Sierra University, Riverside, CA, USA
  - The Jackson Laboratory, Bar Harbor, ME, USA
  - Graduate School of Biomedical Sciences, Tufts University, Boston, MA, USA
- Nathan B Sutter
  - Department of Biology, La Sierra University, Riverside, CA, USA
2
Stévant I, Gonen N, Poulat F. Transposable elements acquire time- and sex-specific transcriptional and epigenetic signatures along mouse fetal gonad development. Front Cell Dev Biol 2024; 11:1327410. [PMID: 38283992 PMCID: PMC10811072 DOI: 10.3389/fcell.2023.1327410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 12/20/2023] [Indexed: 01/30/2024] [Open Access]
Abstract
Gonadal sex determination in mice is a complex and dynamic process, which is crucial for the development of functional reproductive organs. The expression of genes involved in this process is regulated by a variety of genetic and epigenetic mechanisms. Recently, there has been increasing evidence that transposable elements (TEs), which are a class of mobile genetic elements, play a significant role in regulating gene expression during embryogenesis and organ development. In this study, we aimed to investigate the involvement of TEs in the regulation of gene expression during mouse embryonic gonadal development. Through bioinformatics analysis, we aimed to identify and characterize specific TEs that operate as regulatory elements for sex-specific genes, as well as their potential mechanisms of regulation. We identified TE loci expressed in a time- and sex-specific manner along fetal gonad development that correlate positively and negatively with nearby gene expression, suggesting that their expression is integrated into the gonadal regulatory network. Moreover, chromatin accessibility and histone post-translational modification analyses in differentiating supporting cells revealed that TEs are acquiring a sex-specific signature for promoter-, enhancer-, and silencer-like elements, with some of them being proximal to critical sex-determining genes. Altogether, our study introduces TEs as potential new players in the gene regulatory network that controls gonadal development in mammals.
Affiliation(s)
- Isabelle Stévant
  - The Mina and Everard Goodman Faculty of Life Sciences and the Institute of Nanotechnology and Advanced Materials, Bar-Ilan University, Ramat Gan, Israel
  - Institute of Human Genetics, CNRS UMR9002 University of Montpellier, Montpellier, France
- Nitzan Gonen
  - The Mina and Everard Goodman Faculty of Life Sciences and the Institute of Nanotechnology and Advanced Materials, Bar-Ilan University, Ramat Gan, Israel
- Francis Poulat
  - Institute of Human Genetics, CNRS UMR9002 University of Montpellier, Montpellier, France
3
Deng S. The origin of genetic and metabolic systems: Evolutionary structural insights. Heliyon 2023; 9:e14466. [PMID: 36967965 PMCID: PMC10036676 DOI: 10.1016/j.heliyon.2023.e14466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Revised: 02/27/2023] [Accepted: 03/06/2023] [Indexed: 03/16/2023] [Open Access]
Abstract
DNA is derived from reverse transcription and its origin is related to reverse transcriptase, DNA polymerase and integrase. The gene structure originated from the evolution of the first RNA polymerase. Thus, an explanation of the origin of the genetic system must also explain the evolution of these enzymes. This paper proposes a polymer structure model, termed the stable complex evolution model, which explains the evolution of enzymes and functional molecules. Enzymes evolved their functions by forming locally tightly packed complexes with specific substrates. A metabolic reaction can therefore be considered to be the result of adaptive evolution in this way when a certain essential molecule is lacking in a cell. The evolution of the primitive genetic and metabolic systems was thus coordinated and synchronized. According to the stable complex model, almost all functional molecules establish binding affinity and specific recognition through complementary interactions, and functional molecules therefore have the nature of being auto-reactive. This is thermodynamically favorable and leads to functional duplication and self-organization. Therefore, it can be speculated that biological systems have a certain tendency to maintain functional stability or are influenced by an inherent selective power. The evolution of dormant bacteria may support this hypothesis, and inherent selectivity can be unified with natural selection at the molecular level.
Affiliation(s)
- Shaojie Deng
  - Chongqing (Fengjie) Municipal Bureau of Planning and Natural Resources, China
4
Savage AL, Iacoangeli A, Schumann GG, Rubio-Roldan A, Garcia-Perez JL, Al Khleifat A, Koks S, Bubb VJ, Al-Chalabi A, Quinn JP. Characterisation of retrotransposon insertion polymorphisms in whole genome sequencing data from individuals with amyotrophic lateral sclerosis. Gene 2022; 843:146799. [PMID: 35963498 DOI: 10.1016/j.gene.2022.146799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 07/15/2022] [Accepted: 08/05/2022] [Indexed: 11/15/2022]
Abstract
The genetics of an individual is a crucial factor in understanding the risk of developing the neurodegenerative disease amyotrophic lateral sclerosis (ALS). There is still a large proportion of the heritability of ALS, particularly in sporadic cases, to be understood. Among others, active transposable elements drive inter-individual variability, and in humans long interspersed element 1 (LINE1, L1), Alu and SINE-VNTR-Alu (SVA) retrotransposons are a source of polymorphic insertions in the population. We undertook a pilot study to characterise the landscape of non-reference retrotransposon insertion polymorphisms (non-ref RIPs) in 15 control and 15 ALS individuals' whole genomes from Project MinE, an international project to identify potential genetic causes of ALS. The combination of two bioinformatics tools (mobile element locator tool (MELT) and TEBreak) identified on average 1250 Alu, 232 L1 and 77 SVA non-ref RIPs per genome across the 30 analysed. Further PCR validation of individual polymorphic retrotransposon insertions showed a similar level of accuracy for MELT and TEBreak. Our preliminary study did not identify a specific RIP or a significant difference in the total number of non-ref RIPs in ALS compared to control genomes. The use of multiple bioinformatic tools improved the accuracy of non-ref RIP detection and our study highlights the potential importance of studying these elements further in ALS.
Affiliation(s)
- Abigail L Savage
  - Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK
- Alfredo Iacoangeli
  - Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London SE5 9RT, UK; Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London SE5 8AF, UK
- Gerald G Schumann
  - Division of Medical Biotechnology, Paul-Ehrlich-Institut, Langen 63225, Germany
- Alejandro Rubio-Roldan
  - Department of Genomic Medicine and Department of Oncology, GENYO, Centre for Genomics & Oncology, PTS Granada, 18007, Spain
- Jose L Garcia-Perez
  - Department of Genomic Medicine and Department of Oncology, GENYO, Centre for Genomics & Oncology, PTS Granada, 18007, Spain; MRC-HGU Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
- Ahmad Al Khleifat
  - Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London SE5 9RT, UK
- Sulev Koks
  - Perron Institute for Neurological and Translational Science, Perth, Western Australia 6009, Australia; Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Perth, Western Australia 6150, Australia
- Vivien J Bubb
  - Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK
- Ammar Al-Chalabi
  - Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London SE5 9RT, UK; Department of Neurology, King's College Hospital, London SE5 9RS, UK
- John P Quinn
  - Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK
5
Rodríguez-Quiroz R, Valdebenito-Maturana B. SoloTE for improved analysis of transposable elements in single-cell RNA-Seq data using locus-specific expression. Commun Biol 2022; 5:1063. [PMID: 36202992 PMCID: PMC9537157 DOI: 10.1038/s42003-022-04020-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/30/2022] [Accepted: 09/21/2022] [Indexed: 11/08/2022] [Open Access]
Abstract
Transposable elements (TEs) contribute to the repetitive fraction in almost every eukaryotic genome known to date, and their transcriptional activation can influence the expression of neighboring genes in healthy and disease states. Single-cell RNA-Seq (scRNA-Seq) is a technical advance that allows the study of gene expression on a cell-by-cell basis. Although a computational approach is currently available for the single-cell analysis of TE expression, it omits their genomic location. Here we present SoloTE, a pipeline that outperforms the previous approach in terms of computational resources and by allowing the inclusion of locus-specific TE activity in scRNA-Seq expression matrices. We then apply SoloTE to several datasets to reveal the repertoire of TEs that become transcriptionally active in different cell groups, and based on their genomic location, we predict their potential impact on gene expression. As our tool takes as input the resulting files from standard scRNA-Seq processing pipelines, we expect it to be widely adopted in single-cell studies to help researchers discover patterns of cellular diversity associated with TE expression.
Affiliation(s)
- Rocío Rodríguez-Quiroz
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
6
van der Kuyl AC. Analysis of Simian Endogenous Retrovirus (SERV) Full-Length Proviruses in Old World Monkey Genomes. Genes (Basel) 2022; 13:119. [PMID: 35052460 PMCID: PMC8775094 DOI: 10.3390/genes13010119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Revised: 12/20/2021] [Accepted: 01/06/2022] [Indexed: 02/05/2023] [Open Access]
Abstract
Simian endogenous retrovirus, SERV, is a successful germ line invader restricted to Old World monkey (OWM) species. (1) Background: The availability of high-quality primate genomes warrants a study of the characteristics, evolution, and distribution of SERV proviruses. (2) Methods: Cercopithecinae OWM genomes from public databases were queried for the presence of full-length SERV proviruses. A dataset of 81 Cer-SERV genomes was generated and analyzed. (3) Results: Full-length Cer-SERV proviruses were mainly found in terrestrial OWM, and less so in arboreal, forest-dwelling monkeys. Phylogenetic analysis confirmed the existence of two genotypes, Cer-SERV-1 and Cer-SERV-2, with Cer-SERV-1 showing evidence of recent germ-line expansions. Long Terminal Repeat (LTR) variation indicated that most proviruses were of a similar age and were estimated to be between <0.3 and 10 million years old. Integrations shared between species were relatively rare. Sequence analysis further showed extensive CpG methylation-associated mutations, variable Primer Binding Site (PBS) use with Cer-SERV-1 using PBSlys3 and Cer-SERV-2 using PBSlys1,2, and the recent gain of LTR motifs for transcription factors active during embryogenesis in Cer-SERV-1. (4) Conclusions: Sequence analysis of 81 SERV proviruses from Cercopithecinae OWM genomes provides evidence for the adaptation of this retrovirus to germ line reproduction.
Affiliation(s)
- Antoinette C van der Kuyl
  - Laboratory of Experimental Virology, Department of Medical Microbiology and Infection Prevention, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
7
Hagemeijer YP, Guryev V, Horvatovich P. Accurate Prediction of Protein Sequences for Proteogenomics Data Integration. Methods in Molecular Biology (Clifton, N.J.) 2021; 2420:233-260. [PMID: 34905178 DOI: 10.1007/978-1-0716-1936-0_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
This book chapter discusses proteogenomics data integration and provides an overview of the different omics layers involved in defining the proteome of a living organism. Various aspects of genome variability affecting either the sequence or the abundance of proteins are discussed, such as the effect of single-nucleotide variants or larger genomic structural variants on the proteome. Next, various sequencing technologies, such as short- and long-read sequencing, are introduced and discussed from a proteogenomics data integration perspective, with their respective advantages and shortcomings for accurate protein variant prediction from genomic and transcriptomic sequencing data. Finally, the bioinformatics tools used to process and analyze DNA/RNA sequencing data are discussed, with the ultimate goal of obtaining accurately predicted sample-specific protein sequences that can be used as a drop-in replacement in existing approaches for peptide and protein identification using popular database search engines such as MSFragger and SearchGUI/PeptideShaker.
Affiliation(s)
- Yanick Paco Hagemeijer
  - Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, Groningen, The Netherlands; European Research Institute for the Biology of Ageing, University Medical Center Groningen, Groningen, The Netherlands
- Victor Guryev
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, Groningen, The Netherlands
- Peter Horvatovich
  - Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, Groningen, The Netherlands
8
Cao X, Zhang Y, Payer LM, Lords H, Steranka JP, Burns KH, Xing J. Polymorphic mobile element insertions contribute to gene expression and alternative splicing in human tissues. Genome Biol 2020; 21:185. [PMID: 32718348 PMCID: PMC7385971 DOI: 10.1186/s13059-020-02101-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 05/18/2020] [Accepted: 07/14/2020] [Indexed: 02/07/2023] [Open Access]
Abstract
BACKGROUND Mobile elements are a major source of structural variants in the human genome, and some mobile elements can regulate gene expression and transcript splicing. However, the impact of polymorphic mobile element insertions (pMEIs) on gene expression and splicing in diverse human tissues has not been thoroughly studied. The multi-tissue gene expression and whole genome sequencing data generated by the Genotype-Tissue Expression (GTEx) project provide a great opportunity to systematically evaluate the role of pMEIs in regulating gene expression in human tissues. RESULTS Using the GTEx whole genome sequencing data, we identify 20,545 high-quality pMEIs from 639 individuals. Coupling pMEI genotypes with gene expression profiles, we identify pMEI-associated expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) in 48 tissues. Using joint analyses of pMEIs and other genomic variants, pMEIs are predicted to be the potential causal variant for 3522 eQTLs and 3717 sQTLs. The pMEI-associated eQTLs and sQTLs show a high level of tissue specificity, and these pMEIs are enriched in the proximity of affected genes and in regulatory elements. Using reporter assays, we confirm that several pMEIs associated with eQTLs and sQTLs can alter gene expression levels and isoform proportions, respectively. CONCLUSION Overall, our study shows that pMEIs are associated with thousands of gene expression and splicing variations, indicating that pMEIs could have a significant role in regulating tissue-specific gene expression and transcript splicing. Detailed mechanisms for the role of pMEIs in gene regulation in different tissues will be an important direction for future studies.
Affiliation(s)
- Xiaolong Cao
  - Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Yeting Zhang
  - Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
  - Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Lindsay M Payer
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Hannah Lords
  - Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Jared P Steranka
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Kathleen H Burns
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Jinchuan Xing
  - Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
  - Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
9
Clayton EA, Rishishwar L, Huang TC, Gulati S, Ban D, McDonald JF, Jordan IK. An atlas of transposable element-derived alternative splicing in cancer. Philos Trans R Soc Lond B Biol Sci 2020; 375:20190342. [PMID: 32075558 PMCID: PMC7061986 DOI: 10.1098/rstb.2019.0342] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Accepted: 11/06/2019] [Indexed: 12/18/2022] [Open Access]
Abstract
Transposable element (TE)-derived sequences comprise more than half of the human genome, and their presence has been documented to alter gene expression in a number of different ways, including the generation of alternatively spliced transcript isoforms. Alternative splicing has been associated with tumorigenesis for a number of different cancers. The objective of this study was to broadly characterize the role of human TEs in generating alternatively spliced transcript isoforms in cancer. To do so, we screened for the presence of TE-derived sequences co-located with alternative splice sites that are differentially used in normal versus cancer tissues. We analysed a comprehensive set of alternative splice variants characterized for 614 matched normal-tumour tissue pairs across 13 cancer types, resulting in the discovery of 4820 TE-generated alternative splice events distributed among 723 cancer-associated genes. Short interspersed nuclear elements (Alu) and long interspersed nuclear elements (L1) were found to contribute the majority of TE-generated alternative splice sites in cancer genes. A number of cancer-associated genes, including MYH11, WHSC1 and CANT1, were shown to have overexpressed TE-derived isoforms across a range of cancer types. TE-derived isoforms were also linked to cancer-specific fusion transcripts, suggesting a novel mechanism for the generation of transcriptome diversity via trans-splicing mediated by dispersed TE repeats. This article is part of a discussion meeting issue 'Crossroads between transposons and gene regulation'.
Affiliation(s)
- Evan A. Clayton
  - Integrated Cancer Research Center, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Lavanya Rishishwar
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
  - PanAmerican Bioinformatics Institute, Cali, Colombia
  - Applied Bioinformatics Laboratory, Atlanta, GA, USA
- Tzu-Chuan Huang
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Saurabh Gulati
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Dongjo Ban
  - Integrated Cancer Research Center, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- John F. McDonald
  - Integrated Cancer Research Center, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- I. King Jordan
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
  - PanAmerican Bioinformatics Institute, Cali, Colombia
  - Applied Bioinformatics Laboratory, Atlanta, GA, USA
10
Spirito G, Mangoni D, Sanges R, Gustincich S. Impact of polymorphic transposable elements on transcription in lymphoblastoid cell lines from public data. BMC Bioinformatics 2019; 20:495. [PMID: 31757210 PMCID: PMC6873650 DOI: 10.1186/s12859-019-3113-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/06/2019] [Accepted: 09/20/2019] [Indexed: 12/12/2022] [Open Access]
Abstract
BACKGROUND Transposable elements (TEs) are DNA sequences able to mobilize themselves and to increase their copy number in the host genome. In the past, they were considered mainly selfish DNA without evident functions. Currently, however, they are believed to have been extensively involved in the evolution of primate genomes, especially from a regulatory perspective. Due to their recent activity they are also one of the primary sources of structural variants (SVs) in the human genome. By taking advantage of sequencing technologies and bioinformatics tools, recent surveys uncovered specific TE structural variants (TEVs) that gave rise to polymorphisms in human populations. When combined with RNA-seq data this information provides the opportunity to study the potential impact of TEs on gene expression in humans. RESULTS In this work, we assessed the effects of the presence of specific TEs in cis on the expression of flanking genes by producing associations between polymorphic TEs and flanking gene expression levels in human lymphoblastoid cell lines. By using public data from the 1000 Genomes Project and the Geuvadis consortium, we exploited an expression quantitative trait loci (eQTL) approach integrated with additional bioinformatics data mining analyses. We uncovered human loci enriched for common, less common and rare TEVs and identified 323 significant TEV-cis-eQTL associations. SINE-R/VNTR/Alus (SVAs) were the TE class with the strongest effects on gene expression. We also unveiled differential functional enrichments in genes associated with TEVs, genes associated with TEV-cis-eQTLs and genes associated with the genomic regions most enriched in TEV-cis-eQTLs, highlighting, at multiple levels, the impact of TEVs on the host genome. Finally, we also identified polymorphic TEs putatively embedded in transcriptional units, proposing a novel mechanism in which TEVs may mediate individual-specific traits. CONCLUSION We contributed to unveiling the effect of polymorphic TEs on transcription in lymphoblastoid cell lines.
Affiliation(s)
- Giovanni Spirito
  - Area of Neuroscience, Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
- Damiano Mangoni
  - Central RNA Laboratory, Istituto Italiano di Tecnologia (IIT), Genoa, Italy
- Remo Sanges
  - Area of Neuroscience, Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
  - Central RNA Laboratory, Istituto Italiano di Tecnologia (IIT), Genoa, Italy
  - Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Naples, Italy
- Stefano Gustincich
  - Area of Neuroscience, Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
  - Central RNA Laboratory, Istituto Italiano di Tecnologia (IIT), Genoa, Italy
11
Rishishwar L, Wang L, Wang J, Yi SV, Lachance J, Jordan IK. Evidence for positive selection on recent human transposable element insertions. Gene 2018; 675:69-79. [DOI: 10.1016/j.gene.2018.06.077] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 06/20/2018] [Accepted: 06/24/2018] [Indexed: 11/29/2022]
12
Wang L, Jordan IK. Transposable element activity, genome regulation and human health. Curr Opin Genet Dev 2018; 49:25-33. [PMID: 29505964 DOI: 10.1016/j.gde.2018.02.006] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 10/11/2017] [Revised: 01/30/2018] [Accepted: 02/13/2018] [Indexed: 12/21/2022]
Abstract
A convergence of novel genome analysis technologies is enabling population genomic studies of human transposable elements (TEs). Population surveys of human genome sequences have uncovered thousands of individual TE insertions that segregate as common genetic variants, i.e. TE polymorphisms. These recent TE insertions provide an important source of naturally occurring human genetic variation. Investigators are beginning to leverage population genomic data sets to execute genome-scale association studies for assessing the phenotypic impact of human TE polymorphisms. For example, the expression quantitative trait loci (eQTL) analytical paradigm has recently been used to uncover hundreds of associations between human TE insertion variants and gene expression levels. These include population-specific gene regulatory effects as well as coordinated changes to gene regulatory networks. In addition, analyses of linkage disequilibrium patterns with previously characterized genome-wide association study (GWAS) trait variants have uncovered TE insertion polymorphisms that are likely causal variants for a variety of common complex diseases. Gene regulatory mechanisms that underlie specific disease phenotypes have been proposed for a number of these trait associated TE polymorphisms. These new population genomic approaches hold great promise for understanding how ongoing TE activity contributes to functionally relevant genetic variation within and between human populations.
Affiliation(s)
- Lu Wang
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA; PanAmerican Bioinformatics Institute, Cali, Colombia
- I King Jordan
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA; PanAmerican Bioinformatics Institute, Cali, Colombia
13
Wang L, Norris ET, Jordan IK. Human Retrotransposon Insertion Polymorphisms Are Associated with Health and Disease via Gene Regulatory Phenotypes. Front Microbiol 2017; 8:1418. [PMID: 28824558 PMCID: PMC5539088 DOI: 10.3389/fmicb.2017.01418] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 05/12/2017] [Accepted: 07/13/2017] [Indexed: 11/18/2022] [Open Access]
Abstract
The human genome hosts several active families of transposable elements (TEs), including the Alu, LINE-1, and SVA retrotransposons that are mobilized via reverse transcription of RNA intermediates. We evaluated how insertion polymorphisms generated by human retrotransposon activity may be related to common health and disease phenotypes that have been previously interrogated through genome-wide association studies (GWAS). To address this question, we performed a genome-wide screen for retrotransposon polymorphism disease associations that are linked to TE-induced gene regulatory changes. Our screen first identified polymorphic retrotransposon insertions found in linkage disequilibrium (LD) with single nucleotide polymorphisms that were previously associated with common complex diseases by GWAS. We further narrowed this set of candidate disease-associated retrotransposon polymorphisms by identifying insertions that are located within tissue-specific enhancer elements. We then performed expression quantitative trait loci analysis on the remaining set of candidates in order to identify polymorphic retrotransposon insertions that are associated with gene expression changes in B-cells of the human immune system. This progressive and stringent screen yielded a list of six retrotransposon insertions as the strongest candidates for TE polymorphisms that lead to disease via enhancer-mediated changes in gene regulation. For example, we found an SVA insertion within a cell-type specific enhancer located in the second intron of the B4GALT1 gene. B4GALT1 encodes a glycosyltransferase that functions in the glycosylation of the Immunoglobulin G (IgG) antibody in such a way as to convert its activity from pro- to anti-inflammatory. The disruption of the B4GALT1 enhancer by the SVA insertion is associated with down-regulation of the gene in B-cells, which would serve to keep the IgG molecule in a pro-inflammatory state. Consistent with this idea, the B4GALT1 enhancer SVA insertion is linked to a genomic region implicated by GWAS in both inflammatory conditions and autoimmune diseases, such as systemic lupus erythematosus and Crohn’s disease. We explore this example and the other cases uncovered by our genome-wide screen in an effort to illuminate how retrotransposon insertion polymorphisms can impact human health and disease by causing changes in gene expression.
Affiliation(s)
- Lu Wang
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, United States; PanAmerican Bioinformatics Institute, Cali, Colombia; Applied Bioinformatics Laboratory, Atlanta, GA, United States
- Emily T Norris
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, United States; PanAmerican Bioinformatics Institute, Cali, Colombia; Applied Bioinformatics Laboratory, Atlanta, GA, United States
- I K Jordan
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, United States; PanAmerican Bioinformatics Institute, Cali, Colombia; Applied Bioinformatics Laboratory, Atlanta, GA, United States
14
Wang L, Rishishwar L, Mariño-Ramírez L, Jordan IK. Human population-specific gene expression and transcriptional network modification with polymorphic transposable elements. Nucleic Acids Res 2017; 45:2318-2328. [PMID: 27998931 PMCID: PMC5389732 DOI: 10.1093/nar/gkw1286] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 09/26/2016] [Revised: 12/05/2016] [Accepted: 12/12/2016] [Indexed: 02/07/2023]
Abstract
Transposable element (TE) derived sequences are known to contribute to the regulation of the human genome. The majority of known TE-derived regulatory sequences correspond to relatively ancient insertions, which are fixed across human populations. The extent to which human genetic variation caused by recent TE activity leads to regulatory polymorphisms among populations has yet to be thoroughly explored. In this study, we searched for associations between polymorphic TE (polyTE) loci and human gene expression levels using an expression quantitative trait loci (eQTL) approach. We compared locus-specific polyTE insertion genotypes to B cell gene expression levels among 445 individuals from 5 human populations. Numerous human polyTE loci correspond to both cis and trans eQTL, and their regulatory effects are directly related to cell type-specific function in the immune system. PolyTE loci are associated with differences in expression between European and African population groups, and a single polyTE locus is indirectly associated with the expression of numerous genes via the regulation of the B cell-specific transcription factor PAX5. The polyTE-gene expression associations we found indicate that human TE genetic variation can have important phenotypic consequences. Our results reveal that TE-eQTL are involved in population-specific gene regulation as well as transcriptional network modification.
Affiliation(s)
- Lu Wang
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Lavanya Rishishwar
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Applied Bioinformatics Laboratory, Atlanta, GA 30332, USA
- PanAmerican Bioinformatics Institute, Cali, Valle del Cauca, 760043, Colombia
- BIOS Centro de Bioinformática y Biología Computacional, Manizales, Caldas, 170002, Colombia
- Leonardo Mariño-Ramírez
- PanAmerican Bioinformatics Institute, Cali, Valle del Cauca, 760043, Colombia
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- I. King Jordan
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Applied Bioinformatics Laboratory, Atlanta, GA 30332, USA
- PanAmerican Bioinformatics Institute, Cali, Valle del Cauca, 760043, Colombia
- BIOS Centro de Bioinformática y Biología Computacional, Manizales, Caldas, 170002, Colombia
15
Rishishwar L, Wang L, Clayton EA, Mariño-Ramírez L, McDonald JF, Jordan IK. Population and clinical genetics of human transposable elements in the (post) genomic era. Mob Genet Elements 2017; 7:1-20. [PMID: 28228978 PMCID: PMC5305044 DOI: 10.1080/2159256x.2017.1280116] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 10/18/2016] [Revised: 01/03/2017] [Accepted: 01/04/2017] [Indexed: 10/26/2022]
Abstract
Recent technological developments in genomics, bioinformatics and high-throughput experimental techniques are providing opportunities to study ongoing human transposable element (TE) activity at an unprecedented level of detail. It is now possible to characterize genome-wide collections of TE insertion sites for multiple human individuals, within and between populations, and for a variety of tissue types. Comparison of TE insertion site profiles between individuals captures the germline activity of TEs and reveals insertion site variants that segregate as polymorphisms among human populations, whereas comparison among tissue types ascertains somatic TE activity that generates cellular heterogeneity. In this review, we provide an overview of these new technologies and explore their implications for population and clinical genetic studies of human TEs. We cover both recently published results on human TE insertion activity and the prospects for future TE studies related to human evolution and health.
Affiliation(s)
- Lavanya Rishishwar
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA; PanAmerican Bioinformatics Institute, Cali, Colombia; Applied Bioinformatics Laboratory, Atlanta, GA, USA
- Lu Wang
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA; PanAmerican Bioinformatics Institute, Cali, Colombia
- Evan A Clayton
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA; Ovarian Cancer Institute, Atlanta, GA, USA
- Leonardo Mariño-Ramírez
- PanAmerican Bioinformatics Institute, Cali, Colombia; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- John F McDonald
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA; Ovarian Cancer Institute, Atlanta, GA, USA
- I King Jordan
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA; PanAmerican Bioinformatics Institute, Cali, Colombia; Applied Bioinformatics Laboratory, Atlanta, GA, USA
16
Broecker F, Horton R, Heinrich J, Franz A, Schweiger MR, Lehrach H, Moelling K. The intron-enriched HERV-K(HML-10) family suppresses apoptosis, an indicator of malignant transformation. Mob DNA 2016; 7:25. [PMID: 27980690 PMCID: PMC5142424 DOI: 10.1186/s13100-016-0081-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 07/26/2016] [Accepted: 11/19/2016] [Indexed: 02/06/2023]
Abstract
BACKGROUND Human endogenous retroviruses (HERVs) constitute 8% of the human genome and contribute substantially to the transcriptome. HERVs have been shown to generate RNAs that modulate host gene expression. However, experimental evidence for an impact of these regulatory transcripts on the cellular phenotype has been lacking. RESULTS We characterized the previously little-described HERV-K(HML-10) endogenous retrovirus family on a genome-wide scale. HML-10 invaded the ancestral genome of Old World monkeys about 35 million years ago and is enriched within introns of human genes when compared to other HERV families. We show that long terminal repeats (LTRs) of HML-10 exhibit variable promoter activity in human cancer cell lines. One identified HML-10 LTR-primed RNA was in opposite orientation to the pro-apoptotic Death-associated protein 3 (DAP3). In HeLa cells, experimental inactivation of HML-10 LTR-primed transcripts induced DAP3 expression levels, which led to apoptosis. CONCLUSIONS Its enrichment within introns suggests that HML-10 may have been evolutionarily co-opted for gene regulation more than other HERV families. We demonstrated such a regulatory activity for an HML-10 RNA that suppressed DAP3-mediated apoptosis in HeLa cells. Since HML-10 RNA appears to be upregulated in various tumor cell lines and primary tumor samples, it may contribute to evasion of apoptosis in malignant cells. However, the overall weak expression of HML-10 transcripts described here raises the question of whether our result described for HeLa represents a rare event in cancer. A possible function in other cells or tissues requires further investigation.
Affiliation(s)
- Felix Broecker
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany; Institute of Medical Microbiology, University of Zurich, Gloriastr. 32, 8006 Zurich, Switzerland; Current affiliation: Max Planck Institute of Colloids and Interfaces, Am Mühlenberg 1, 14424 Potsdam, Germany
- Roger Horton
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany
- Jochen Heinrich
- Institute of Medical Microbiology, University of Zurich, Gloriastr. 32, 8006 Zurich, Switzerland
- Alexandra Franz
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany; Current affiliation: University of Zurich, Institute of Molecular Life Sciences, Winterthurerstr. 190, 8057 Zurich, Switzerland
- Michal-Ruth Schweiger
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany; Current affiliation: Functional Epigenomics, CCG, Cologne University Hospital, University of Cologne, Weyertal 115b, 50931 Cologne, Germany
- Hans Lehrach
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany; Dahlem Centre for Genome Research and Medical Systems Biology, Fabeckstr. 60-62, 14195 Berlin, Germany
- Karin Moelling
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany; Institute of Medical Microbiology, University of Zurich, Gloriastr. 32, 8006 Zurich, Switzerland
17
Abstract
Over 40% of mammalian genomes comprise the products of reverse transcription. Among such retrotransposed sequences are those characterized by the presence of long terminal repeats (LTRs), including the endogenous retroviruses (ERVs), which are inherited genetic elements closely resembling the proviruses formed following exogenous retrovirus infection. Sequences derived from ERVs make up at least 8 to 10% of the human and mouse genomes and range from ancient sequences that predate mammalian divergence to elements that are currently still active. In this chapter we describe the discovery, classification and origins of ERVs in mammals and consider cellular mechanisms that have evolved to control their expression. We also discuss the negative effects of ERVs as agents of genetic disease and cancer and review examples of ERV protein domestication to serve host functions, as in placental development. Finally, we address growing evidence that the gene regulatory potential of ERV LTRs has been exploited multiple times during evolution to regulate genes and gene networks. Thus, although recently endogenized retroviral elements are often pathogenic, those that survive the forces of negative selection become neutral components of the host genome or can be harnessed to serve beneficial roles.
18
Thung DT, de Ligt J, Vissers LEM, Steehouwer M, Kroon M, de Vries P, Slagboom EP, Ye K, Veltman JA, Hehir-Kwa JY. Mobster: accurate detection of mobile element insertions in next generation sequencing data. Genome Biol 2015; 15:488. [PMID: 25348035 PMCID: PMC4228151 DOI: 10.1186/s13059-014-0488-x] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Received: 01/06/2014] [Indexed: 01/15/2023]
Abstract
Mobile elements are major drivers in changing genomic architecture and can cause disease. The detection of mobile elements is hindered due to the low mappability of their highly repetitive sequences. We have developed an algorithm, called Mobster, to detect non-reference mobile element insertions in next generation sequencing data from both whole genome and whole exome studies. Mobster uses discordant read pairs and clipped reads in combination with consensus sequences of known active mobile elements. Mobster has a low false discovery rate and high recall rate for both L1 and Alu elements. Mobster is available at http://sourceforge.net/projects/mobster.
Affiliation(s)
- Djie Tjwan Thung
- Department of Human Genetics, RadboudUMC, P.O. Box 9101, 6500, HB, Nijmegen, the Netherlands
19
Grau JH, Poustka AJ, Meixner M, Plötner J. LTR retroelements are intrinsic components of transcriptional networks in frogs. BMC Genomics 2014; 15:626. [PMID: 25056159 PMCID: PMC4131045 DOI: 10.1186/1471-2164-15-626] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/18/2014] [Accepted: 07/15/2014] [Indexed: 12/16/2022]
Abstract
BACKGROUND LTR retroelements (LTR REs) constitute a major group of transposable elements widely distributed in eukaryotic genomes. Through their own mechanism of retrotranscription LTR REs enrich the genomic landscape by providing genetic variability, thus contributing to genome structure and organization. Nonetheless, transcriptomic activity of LTR REs still remains an obscure domain within cell, developmental, and organism biology. RESULTS Here we present a first comparative analysis of LTR REs for anuran amphibians based on a full depth coverage transcriptome of the European pool frog, Pelophylax lessonae, the genome of the African clawed frog, Silurana tropicalis (release v7.1), and additional transcriptomes of S. tropicalis and Cyclorana alboguttata. We identified over 1000 copies of LTR REs from all four families (Bel/Pao, Ty1/Copia, Ty3/Gypsy, Retroviridae) in the genome of S. tropicalis and discovered transcripts of several of these elements in all RNA-seq datasets analyzed. Elements of the Ty3/Gypsy family were most active, especially Amn-san elements, which accounted for approximately 0.27% of the genome in Silurana. Some elements exhibited tissue-specific expression patterns, for example Hydra1.1 and MuERV-like elements in Pelophylax. In S. tropicalis considerable transcription of LTR REs was observed during embryogenesis as soon as the embryonic genome became activated, i.e. at midblastula transition. In the course of embryonic development the spectrum of transcribed LTR REs changed; during gastrulation and neurulation MuERV-like and SnRV-like retroviruses were abundantly transcribed, while during organogenesis transcripts of the XEN1 retroviruses became much more abundant. CONCLUSIONS The differential expression of LTR REs during embryogenesis in concert with their tissue-specificity and the protein domains they encode are evidence for the functional roles these elements play as integrative parts of complex regulatory networks. Our results support the now widely accepted concept that retroelements are not simple “junk DNA” or “harmful genomic parasites” but essential components of the transcriptomic machinery in vertebrates.
Affiliation(s)
- José Horacio Grau
- Dahlem Center for Genome Research and Medical Systems Biology, Fabeckstraße 60-62, 14195 Berlin, Germany.