151
Bourgeois Y, Ruggiero RP, Hariyani I, Boissinot S. Disentangling the determinants of transposable elements dynamics in vertebrate genomes using empirical evidences and simulations. PLoS Genet 2020; 16:e1009082. [PMID: 33017388 PMCID: PMC7561263 DOI: 10.1371/journal.pgen.1009082] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/23/2020] [Revised: 10/15/2020] [Accepted: 08/25/2020] [Indexed: 12/14/2022] Open
Abstract
The interactions between transposable elements (TEs) and their hosts constitute one of the most profound co-evolutionary processes found in nature. The population dynamics of TEs depends on factors specific to each TE family, such as the rate of transposition and insertional preference, the demographic history of the host and the genomic landscape. How these factors interact has yet to be investigated holistically. Here we are addressing this question in the green anole (Anolis carolinensis) whose genome contains an extraordinary diversity of TEs (including non-LTR retrotransposons, SINEs, LTR-retrotransposons and DNA transposons). We observed a positive correlation between recombination rate and frequency of TEs and densities for LINEs, SINEs and DNA transposons. For these elements, there was a clear impact of demography on TE frequency and abundance, with a loss of polymorphic elements and skewed frequency spectra in recently expanded populations. On the other hand, some LTR-retrotransposons displayed patterns consistent with a very recent phase of intense amplification. To determine how demography, genomic features and intrinsic properties of TEs interact we ran simulations using SLiM3. We determined that i) short TE insertions are not strongly counter-selected, but long ones are, ii) neutral demographic processes, linked selection and preferential insertion may explain positive correlations between average TE frequency and recombination, iii) TE insertions are unlikely to have been massively recruited in recent adaptation. We demonstrate that deterministic and stochastic processes have different effects on categories of TEs and that a combination of empirical analyses and simulations can disentangle these mechanisms.
Affiliation(s)
- Yann Bourgeois
- School of Biological Sciences, University of Portsmouth, Portsmouth, United Kingdom
- New York University Abu Dhabi, Saadiyat Island Campus, Abu Dhabi, United Arab Emirates
- Robert P. Ruggiero
- New York University Abu Dhabi, Saadiyat Island Campus, Abu Dhabi, United Arab Emirates
- Department of Biology, Southeast Missouri State University, Cape Girardeau, MO, United States of America
- Imtiyaz Hariyani
- New York University Abu Dhabi, Saadiyat Island Campus, Abu Dhabi, United Arab Emirates
- Stéphane Boissinot
- New York University Abu Dhabi, Saadiyat Island Campus, Abu Dhabi, United Arab Emirates
152
Bogaerts-Márquez M, Barrón MG, Fiston-Lavier AS, Vendrell-Mir P, Castanera R, Casacuberta JM, González J. T-lex3: an accurate tool to genotype and estimate population frequencies of transposable elements using the latest short-read whole genome sequencing data. Bioinformatics 2020; 36:1191-1197. [PMID: 31580402 PMCID: PMC7703783 DOI: 10.1093/bioinformatics/btz727] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/28/2019] [Revised: 09/16/2019] [Accepted: 09/25/2019] [Indexed: 12/22/2022] Open
Abstract
MOTIVATION Transposable elements (TEs) constitute a significant proportion of the majority of genomes sequenced to date. TEs are responsible for a considerable fraction of the genetic variation within and among species. Accurate genotyping of TEs in genomes is therefore crucial for a complete identification of the genetic differences among individuals, populations and species. RESULTS In this work, we present a new version of T-lex, a computational pipeline that accurately genotypes and estimates the population frequencies of reference TE insertions using short-read high-throughput sequencing data. In this new version, we have re-designed the T-lex algorithm to integrate the BWA-MEM short-read aligner, which is one of the most accurate short-read mappers and can be launched on longer short-reads (e.g. reads >150 bp). We have added new filtering steps to increase the accuracy of the genotyping, and new parameters that allow the user to control both the minimum and maximum number of reads, and the minimum number of strains to genotype a TE insertion. We also showed for the first time that T-lex3 provides accurate TE calls in a plant genome. AVAILABILITY AND IMPLEMENTATION To test the accuracy of T-lex3, we called 1630 individual TE insertions in Drosophila melanogaster, 1600 individual TE insertions in humans, and 3067 individual TE insertions in the rice genome. We showed that this new version of T-lex is a broadly applicable and accurate tool for genotyping and estimating TE frequencies in organisms with different genome sizes and different TE contents. T-lex3 is available at Github: https://github.com/GonzalezLab/T-lex3. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- María Bogaerts-Márquez
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Paseo Maritimo Barceloneta 37-49, Barcelona, Spain
- Maite G Barrón
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Paseo Maritimo Barceloneta 37-49, Barcelona, Spain
- Anna-Sophie Fiston-Lavier
- Institut des Sciences de l'Evolution de Montpellier (UMR 5554, CNRS-UM-IRD-EPHE), 11 Université de Montpellier, Place Eugène Bataillon, Montpellier, France
- Pol Vendrell-Mir
- Center for Research in Agricultural Genomics, CRAG (CSIC-IRTA-UAB-UB), Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- Raúl Castanera
- Center for Research in Agricultural Genomics, CRAG (CSIC-IRTA-UAB-UB), Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- Josep M Casacuberta
- Center for Research in Agricultural Genomics, CRAG (CSIC-IRTA-UAB-UB), Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- Josefa González
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Paseo Maritimo Barceloneta 37-49, Barcelona, Spain
153
Ferraro NM, Strober BJ, Einson J, Abell NS, Aguet F, Barbeira AN, Brandt M, Bucan M, Castel SE, Davis JR, Greenwald E, Hess GT, Hilliard AT, Kember RL, Kotis B, Park Y, Peloso G, Ramdas S, Scott AJ, Smail C, Tsang EK, Zekavat SM, Ziosi M, Aradhana, Ardlie KG, Assimes TL, Bassik MC, Brown CD, Correa A, Hall I, Im HK, Li X, Natarajan P, Lappalainen T, Mohammadi P, Montgomery SB, Battle A. Transcriptomic signatures across human tissues identify functional rare genetic variation. Science 2020; 369:eaaz5900. [PMID: 32913073 PMCID: PMC7646251 DOI: 10.1126/science.aaz5900] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Received: 09/30/2019] [Accepted: 07/31/2020] [Indexed: 12/18/2022]
Abstract
Rare genetic variants are abundant across the human genome, and identifying their function and phenotypic impact is a major challenge. Measuring aberrant gene expression has aided in identifying functional, large-effect rare variants (RVs). Here, we expanded detection of genetically driven transcriptome abnormalities by analyzing gene expression, allele-specific expression, and alternative splicing from multitissue RNA-sequencing data, and demonstrate that each signal informs unique classes of RVs. We developed Watershed, a probabilistic model that integrates multiple genomic and transcriptomic signals to predict variant function, validated these predictions in additional cohorts and through experimental assays, and used them to assess RVs in the UK Biobank, the Million Veterans Program, and the Jackson Heart Study. Our results link thousands of RVs to diverse molecular effects and provide evidence to associate RVs affecting the transcriptome with human traits.
Affiliation(s)
- Nicole M Ferraro
- Biomedical Informatics Training Program, Stanford University, Stanford, CA, USA
- Benjamin J Strober
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Jonah Einson
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- New York Genome Center, New York, NY, USA
- Nathan S Abell
- Department of Genetics, Stanford University, Stanford, CA, USA
- Alvaro N Barbeira
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
- Margot Brandt
- New York Genome Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
- Maja Bucan
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Stephane E Castel
- New York Genome Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
- Joe R Davis
- Department of Pathology, Stanford University, Stanford, CA, USA
- Emily Greenwald
- Department of Genetics, Stanford University, Stanford, CA, USA
- Gaelen T Hess
- Department of Genetics, Stanford University, Stanford, CA, USA
- Austin T Hilliard
- Palo Alto Veterans Institute for Research, Palo Alto Epidemiology Research and Information Center for Genomics, VA Palo Alto Health Care System, Palo Alto, CA, USA
- Rachel L Kember
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Bence Kotis
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- YoSon Park
- Department of Systems Pharmacology and Translational Medicine, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
- Gina Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Shweta Ramdas
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Alexandra J Scott
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Craig Smail
- Biomedical Informatics Training Program, Stanford University, Stanford, CA, USA
- Emily K Tsang
- Department of Pathology, Stanford University, Stanford, CA, USA
- Seyedeh M Zekavat
- Medical & Population Genomics, Yale School of Medicine and Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Aradhana
- Department of Genetics, Stanford University, Stanford, CA, USA
- Themistocles L Assimes
- Palo Alto Veterans Institute for Research, Palo Alto Epidemiology Research and Information Center for Genomics, VA Palo Alto Health Care System, Palo Alto, CA, USA
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Adolfo Correa
- University of Mississippi Medical Center, Jackson, MS, USA
- Ira Hall
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
- Xin Li
- Department of Pathology, Stanford University, Stanford, CA, USA
- Shanghai Institutes for Biological Sciences, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
- Pradeep Natarajan
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Tuuli Lappalainen
- New York Genome Center, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
- Pejman Mohammadi
- New York Genome Center, New York, NY, USA
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Scripps Translational Science Institute, La Jolla, CA, USA
- Stephen B Montgomery
- Department of Genetics, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford University, Stanford, CA, USA
- Alexis Battle
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
154
Pfaff AL, Bubb VJ, Quinn JP, Koks S. An Increased Burden of Highly Active Retrotransposition Competent L1s Is Associated with Parkinson's Disease Risk and Progression in the PPMI Cohort. Int J Mol Sci 2020; 21:E6562. [PMID: 32911699 PMCID: PMC7554759 DOI: 10.3390/ijms21186562] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 08/11/2020] [Revised: 09/03/2020] [Accepted: 09/03/2020] [Indexed: 02/06/2023] Open
Abstract
Long interspersed element-1 (LINE-1/L1s) contributes 17% of the human genome with more than 1 million elements present; however, fewer than 100 of these have evidence for being retrotransposition competent (RC). In addition to those RC-L1s present in the reference genome, there are a small number of known non-reference L1 insertions that are also retrotransposition competent. L1 activity, whether through the potentially detrimental effects of their mRNA or protein expression or somatic retrotransposition events, has been linked to several neurological conditions. The polymorphic nature of both reference and non-reference RC-L1s in terms of their presence or absence will result in individuals harboring a different combination of these elements and it is currently unknown if this type of germline variation contributes to the risk of neurological disease. Here, we utilized whole-genome sequencing data from 178 healthy controls and 372 Parkinson's disease (PD) subjects from the Parkinson's Progression Markers Initiative (PPMI) to investigate the role of RC-L1s in PD. In the PPMI cohort, we identified 22 reference and 50 non-reference polymorphic RC-L1 loci. Focusing on 16 highly active RC-L1 loci, an increased burden of these elements (≥9) was associated with PD (OR 1.25, 95% CI 1.03-1.51, p = 0.02). In addition, we identified significant associations of progression markers of PD and the burden of highly active RC-L1s. This study has identified a novel type of genetic element associated with PD risk and disease progression.
Affiliation(s)
- Abigail L. Pfaff
- Perron Institute for Neurological and Translational Science, Perth, WA 6009, Australia
- Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Perth, WA 6150, Australia
- Vivien J. Bubb
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK
- John P. Quinn
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK
- Sulev Koks
- Perron Institute for Neurological and Translational Science, Perth, WA 6009, Australia
- Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Perth, WA 6150, Australia
155
Shademan M, Zare K, Zahedi M, Mosannen Mozaffari H, Bagheri Hosseini H, Ghaffarzadegan K, Goshayeshi L, Dehghani H. Promoter methylation, transcription, and retrotransposition of LINE-1 in colorectal adenomas and adenocarcinomas. Cancer Cell Int 2020; 20:426. [PMID: 32905102 PMCID: PMC7466817 DOI: 10.1186/s12935-020-01511-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/19/2020] [Accepted: 08/21/2020] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND The methylation of the CpG islands of the LINE-1 promoter is a tight control mechanism on the function of mobile elements. However, simultaneous quantification of promoter methylation and transcription of LINE-1 has not been performed in progressive stages of colorectal cancer. In addition, the insertion of mobile elements in the genome at the advanced adenoma stage, a precancerous stage before colorectal carcinoma, has not been emphasized. In this study, we quantify promoter methylation and transcripts of LINE-1 in three stages: colorectal non-advanced adenoma, advanced adenoma, and adenocarcinoma. In addition, we analyze the insertion of LINE-1, Alu, and SVA elements in the genome of patient tumors with colorectal advanced adenomas. METHODS LINE-1 hypomethylation status was evaluated by absolute quantitative analysis of methylated alleles (AQAMA) assay. To quantify the level of transcripts for LINE-1, quantitative RT-PCR was performed. To find mobile element insertions, the advanced adenoma tissue samples were subjected to whole genome sequencing and MELT analysis. RESULTS We found that the LINE-1 promoter methylation in advanced adenoma and adenocarcinoma was significantly lower than that in non-advanced adenomas. Accordingly, the copy number of LINE-1 transcripts in advanced adenoma was significantly higher than that in non-advanced adenomas, and in adenocarcinomas was significantly higher than that in the advanced adenomas. Whole-genome sequencing analysis of colorectal advanced adenomas revealed that at this stage polymorphic insertions of LINE-1, Alu, and SVA comprise approximately 16%, 51%, and 74% of total insertions, respectively. CONCLUSIONS Our correlative analysis, showing decreased methylation of the LINE-1 promoter accompanied by a higher level of LINE-1 transcription and polymorphic genomic insertions in advanced adenoma, suggests that the early and advanced polyp stages may host very important pathogenic processes leading to cancer.
Affiliation(s)
- Milad Shademan
- Graduate Program in Physiology, Department of Basic Sciences, Faculty of Veterinary Medicine, Ferdowsi University of Mashhad, Mashhad, Iran
- Khadijeh Zare
- Stem Cell Biology and Regenerative Medicine Research Group, Research Institute of Biotechnology, Ferdowsi University of Mashhad, Azadi Square, Mashhad, 91779-48974 Iran
- Morteza Zahedi
- Graduate Program in Physiology, Department of Basic Sciences, Faculty of Veterinary Medicine, Ferdowsi University of Mashhad, Mashhad, Iran
- Hooman Mosannen Mozaffari
- Department of Gastroenterology and Hepatology, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Gastroenterology and Hepatology Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Hadi Bagheri Hosseini
- Department of Gastroenterology and Hepatology, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Gastroenterology and Hepatology Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Kamran Ghaffarzadegan
- Pathology Department, Education and Research Department, Razavi Hospital, Mashhad, Iran
- Ladan Goshayeshi
- Department of Gastroenterology and Hepatology, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Surgical Oncology Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Hesam Dehghani
- Stem Cell Biology and Regenerative Medicine Research Group, Research Institute of Biotechnology, Ferdowsi University of Mashhad, Azadi Square, Mashhad, 91779-48974 Iran
- Division of Biotechnology, Faculty of Veterinary Medicine, Ferdowsi University of Mashhad, Mashhad, Iran
- Department of Basic Sciences, Faculty of Veterinary Medicine, Ferdowsi University of Mashhad, Mashhad, Iran
156
Cao X, Zhang Y, Payer LM, Lords H, Steranka JP, Burns KH, Xing J. Polymorphic mobile element insertions contribute to gene expression and alternative splicing in human tissues. Genome Biol 2020; 21:185. [PMID: 32718348 PMCID: PMC7385971 DOI: 10.1186/s13059-020-02101-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/18/2020] [Accepted: 07/14/2020] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Mobile elements are a major source of structural variants in the human genome, and some mobile elements can regulate gene expression and transcript splicing. However, the impact of polymorphic mobile element insertions (pMEIs) on gene expression and splicing in diverse human tissues has not been thoroughly studied. The multi-tissue gene expression and whole genome sequencing data generated by the Genotype-Tissue Expression (GTEx) project provide a great opportunity to systematically evaluate the role of pMEIs in regulating gene expression in human tissues. RESULTS Using the GTEx whole genome sequencing data, we identify 20,545 high-quality pMEIs from 639 individuals. Coupling pMEI genotypes with gene expression profiles, we identify pMEI-associated expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) in 48 tissues. Using joint analyses of pMEIs and other genomic variants, pMEIs are predicted to be the potential causal variant for 3522 eQTLs and 3717 sQTLs. The pMEI-associated eQTLs and sQTLs show a high level of tissue specificity, and these pMEIs are enriched in the proximity of affected genes and in regulatory elements. Using reporter assays, we confirm that several pMEIs associated with eQTLs and sQTLs can alter gene expression levels and isoform proportions, respectively. CONCLUSION Overall, our study shows that pMEIs are associated with thousands of gene expression and splicing variations, indicating that pMEIs could have a significant role in regulating tissue-specific gene expression and transcript splicing. Detailed mechanisms for the role of pMEIs in gene regulation in different tissues will be an important direction for future studies.
Affiliation(s)
- Xiaolong Cao
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Yeting Zhang
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Lindsay M Payer
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Hannah Lords
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Jared P Steranka
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Kathleen H Burns
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Jinchuan Xing
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
157
Co-option of the lineage-specific LAVA retrotransposon in the gibbon genome. Proc Natl Acad Sci U S A 2020; 117:19328-19338. [PMID: 32690705 DOI: 10.1073/pnas.2006038117] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 01/11/2023] Open
Abstract
Co-option of transposable elements (TEs) to become part of existing or new enhancers is an important mechanism for evolution of gene regulation. However, contributions of lineage-specific TE insertions to recent regulatory adaptations remain poorly understood. Gibbons present a suitable model to study these contributions as they have evolved a lineage-specific TE called LAVA (LINE-AluSz-VNTR-Alu LIKE), which is still active in the gibbon genome. The LAVA retrotransposon is thought to have played a role in the emergence of the highly rearranged structure of the gibbon genome by disrupting transcription of cell cycle genes. In this study, we investigated whether LAVA may have also contributed to the evolution of gene regulation by adopting enhancer function. We characterized fixed and polymorphic LAVA insertions across multiple gibbons and found 96 LAVA elements overlapping enhancer chromatin states. Moreover, LAVA was enriched in multiple transcription factor binding motifs, was bound by an important transcription factor (PU.1), and was associated with higher levels of gene expression in cis. We found gibbon-specific signatures of purifying/positive selection at 27 LAVA insertions. Two of these insertions were fixed in the gibbon lineage and overlapped with enhancer chromatin states, representing putative co-opted LAVA enhancers. These putative enhancers were located within genes encoding SETD2 and RAD9A, two proteins that facilitate accurate repair of DNA double-strand breaks and prevent chromosomal rearrangement mutations. Co-option of LAVA in these genes may have influenced regulation of processes that preserve genome integrity. Our findings highlight the importance of considering lineage-specific TEs in studying evolution of gene regulatory elements.
158
Chen X, Li D. ERVcaller: identifying polymorphic endogenous retrovirus and other transposable element insertions using whole-genome sequencing data. Bioinformatics 2020; 35:3913-3922. [PMID: 30895294 DOI: 10.1093/bioinformatics/btz205] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 07/11/2018] [Revised: 02/28/2019] [Accepted: 03/19/2019] [Indexed: 12/12/2022] Open
Abstract
MOTIVATION Approximately 8% of the human genome is derived from endogenous retroviruses (ERVs). In recent years, an increasing number of human diseases have been found to be associated with ERVs. However, it remains challenging to accurately detect the full spectrum of polymorphic (unfixed) ERVs using whole-genome sequencing (WGS) data. RESULTS We designed a new tool, ERVcaller, to detect and genotype transposable element (TE) insertions, including ERVs, in the human genome. We evaluated ERVcaller using both simulated and real benchmark WGS datasets. Compared to existing tools, ERVcaller consistently obtained both the highest sensitivity and precision for detecting simulated ERV and other TE insertions derived from real polymorphic TE sequences. For the WGS data from the 1000 Genomes Project, ERVcaller detected the largest number of TE insertions per sample based on consensus TE loci. By analyzing the experimentally verified TE insertions, ERVcaller had 94.0% TE detection sensitivity and 96.6% genotyping accuracy. Polymerase chain reaction and Sanger sequencing in a small sample set verified 86.7% of examined insertion statuses and 100% of examined genotypes. In conclusion, ERVcaller is capable of detecting and genotyping TE insertions using WGS data with both high sensitivity and precision. This tool can be applied broadly to other species. AVAILABILITY AND IMPLEMENTATION http://www.uvm.edu/genomics/software/ERVcaller.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xun Chen
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT, USA
- Dawei Li
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT, USA
- Neuroscience, Behavior, and Health Initiative, University of Vermont, Burlington, VT, USA
- Department of Computer Science, University of Vermont, Burlington, VT, USA
159
Lanciano S, Cristofari G. Measuring and interpreting transposable element expression. Nat Rev Genet 2020; 21:721-736. [PMID: 32576954 DOI: 10.1038/s41576-020-0251-y] [Citation(s) in RCA: 164] [Impact Index Per Article: 41.0] [Accepted: 05/19/2020] [Indexed: 12/21/2022]
Abstract
Transposable elements (TEs) are insertional mutagens that contribute greatly to the plasticity of eukaryotic genomes, influencing the evolution and adaptation of species as well as physiology or disease in individuals. Measuring TE expression helps to understand not only when and where TE mobilization can occur but also how this process alters gene expression, chromatin accessibility or cellular signalling pathways. Although genome-wide gene expression assays such as RNA sequencing include transposon-derived transcripts, most computational analytical tools discard or misinterpret TE-derived reads. Emerging approaches are improving the identification of expressed TE loci and helping to discriminate TE transcripts that permit TE mobilization from chimeric gene-TE transcripts or pervasive transcription. Here we review the main challenges associated with the detection of TE expression, including mappability, insertional and internal sequence polymorphisms, and the diversity of the TE transcriptional landscape, as well as the different experimental and computational strategies to solve them.
160
Han L, Zhao X, Benton ML, Perumal T, Collins RL, Hoffman GE, Johnson JS, Sloofman L, Wang HZ, Stone MR, Brennand KJ, Brand H, Sieberts SK, Marenco S, Peters MA, Lipska BK, Roussos P, Capra JA, Talkowski M, Ruderfer DM. Functional annotation of rare structural variation in the human brain. Nat Commun 2020; 11:2990. [PMID: 32533064 PMCID: PMC7293301 DOI: 10.1038/s41467-020-16736-1] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 08/20/2019] [Accepted: 05/14/2020] [Indexed: 11/09/2022] Open
Abstract
Structural variants (SVs) contribute to many disorders, yet, functionally annotating them remains a major challenge. Here, we integrate SVs with RNA-sequencing from human post-mortem brains to quantify their dosage and regulatory effects. We show that genic and regulatory SVs exist at significantly lower frequencies than intergenic SVs. Functional impact of copy number variants (CNVs) stems from both the proportion of genic and regulatory content altered and loss-of-function intolerance of the gene. We train a linear model to predict expression effects of rare CNVs and use it to annotate regulatory disruption of CNVs from 14,891 independent genome-sequenced individuals. Pathogenic deletions implicated in neurodevelopmental disorders show significantly more extreme regulatory disruption scores and if rank ordered would be prioritized higher than using frequency or length alone. This work shows the deleteriousness of regulatory SVs, particularly those altering CTCF sites and provides a simple approach for functionally annotating the regulatory consequences of CNVs.
Affiliation(s)
- Lide Han
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Xuefang Zhao
  - Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology (M.I.T.), Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Mary Lauren Benton
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Ryan L Collins
  - Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology (M.I.T.), Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
- Gabriel E Hoffman
  - Pamela Sklar Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Icahn Institute for Data Science and Genomic Sciences, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jessica S Johnson
  - Pamela Sklar Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Laura Sloofman
  - Pamela Sklar Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Harold Z Wang
  - Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology (M.I.T.), Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Matthew R Stone
  - Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology (M.I.T.), Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Kristen J Brennand
  - Pamela Sklar Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Harrison Brand
  - Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology (M.I.T.), Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Stefano Marenco
  - Human Brain Collection Core, Intramural Research Program, NIMH, National Institutes of Health, Bethesda, MD, USA
- Barbara K Lipska
  - Human Brain Collection Core, Intramural Research Program, NIMH, National Institutes of Health, Bethesda, MD, USA
- Panos Roussos
  - Pamela Sklar Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Icahn Institute for Data Science and Genomic Sciences, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Psychiatry, JJ Peters VA Medical Center, Bronx, NY, USA
- John A Capra
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Michael Talkowski
  - Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology (M.I.T.), Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
  - Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of Harvard and M.I.T, Cambridge, MA, USA
- Douglas M Ruderfer
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
161
Jiang T, Liu B, Li J, Wang Y. rMETL: sensitive mobile element insertion detection with long read realignment. Bioinformatics 2020; 35:3484-3486. [PMID: 30759188 DOI: 10.1093/bioinformatics/btz106] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/27/2018] [Revised: 01/24/2019] [Accepted: 02/12/2019] [Indexed: 01/22/2023]
Abstract
SUMMARY: Mobile element insertion (MEI) is a major category of structural variation (SV). The rapid development of long-read sequencing technologies provides the opportunity to detect MEIs sensitively. However, the MEI signals in noisy long reads are highly complex, owing to the repetitiveness of mobile elements and the high sequencing error rates. Herein, we propose the Realignment-based Mobile Element insertion detection Tool for Long read (rMETL). Benchmarking results on simulated and real datasets demonstrate that rMETL can handle these complex signals and discover MEIs sensitively. It is suited to producing high-quality MEI callsets in many genomics studies. AVAILABILITY AND IMPLEMENTATION: rMETL is available from https://github.com/hitbc/rMETL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tao Jiang
  - Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Bo Liu
  - Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Junyi Li
  - Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yadong Wang
  - Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
162
Jakubosky D, D'Antonio M, Bonder MJ, Smail C, Donovan MKR, Young Greenwald WW, Matsui H, D'Antonio-Chronowska A, Stegle O, Smith EN, Montgomery SB, DeBoever C, Frazer KA. Properties of structural variants and short tandem repeats associated with gene expression and complex traits. Nat Commun 2020; 11:2927. [PMID: 32522982 PMCID: PMC7286898 DOI: 10.1038/s41467-020-16482-4] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 07/26/2019] [Accepted: 05/05/2020] [Indexed: 12/14/2022]
Abstract
Structural variants (SVs) and short tandem repeats (STRs) comprise a broad group of diverse DNA variants which vastly differ in their sizes and distributions across the genome. Here, we identify genomic features of SV classes and STRs that are associated with gene expression and complex traits, including their locations relative to eGenes, likelihood of being associated with multiple eGenes, associated eGene types (e.g., coding, noncoding, level of evolutionary constraint), effect sizes, linkage disequilibrium with tagging single nucleotide variants used in GWAS, and likelihood of being associated with GWAS traits. We identify a set of high-impact SVs/STRs associated with the expression of three or more eGenes via chromatin loops and show that they are highly enriched for being associated with GWAS traits. Our study provides insights into the genomic properties of structural variant classes and short tandem repeats that are associated with gene expression and human traits.
Affiliation(s)
- David Jakubosky
  - Biomedical Sciences Graduate Program, University of California San Diego, La Jolla, CA, 92093-0419, USA
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093-0419, USA
- Matteo D'Antonio
  - Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
- Marc Jan Bonder
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
  - Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Craig Smail
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, 94305, USA
  - Department of Pathology, Stanford University, Stanford, California, 94305, USA
- Margaret K R Donovan
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093-0419, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA
- William W Young Greenwald
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA
- Hiroko Matsui
  - Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
- Oliver Stegle
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
  - Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center, Heidelberg, Germany
- Erin N Smith
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA
- Stephen B Montgomery
  - Department of Pathology, Stanford University, Stanford, California, 94305, USA
  - Department of Genetics, Stanford University, Stanford, California, 94305, USA
- Christopher DeBoever
  - Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
- Kelly A Frazer
  - Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA
163
Grivainis M, Tang Z, Fenyö D. TranspoScope: interactive visualization of retrotransposon insertions. Bioinformatics 2020; 36:3877-3878. [PMID: 32298413 DOI: 10.1093/bioinformatics/btaa244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2019] [Revised: 01/09/2020] [Accepted: 04/09/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Retrotransposition is an important force in shaping the human genome and is involved in prenatal development, disease and aging. Current genome browsers are not optimized for visualizing the experimental evidence for retrotransposon insertions. RESULTS: We have developed a specialized browser to visualize the evidence for retrotransposon insertions for both targeted and whole-genome sequencing data. AVAILABILITY AND IMPLEMENTATION: TranspoScope's source code, as well as installation instructions, are available at https://github.com/FenyoLab/transposcope.
Affiliation(s)
- Mark Grivainis
  - Institute for Systems Genetics
  - Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine, 550 1st Ave, New York, NY 10016, USA
- Zuojian Tang
  - Institute for Systems Genetics
  - Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine, 550 1st Ave, New York, NY 10016, USA
- David Fenyö
  - Institute for Systems Genetics
  - Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine, 550 1st Ave, New York, NY 10016, USA
164
Goubert C, Thomas J, Payer LM, Kidd JM, Feusier J, Watkins WS, Burns KH, Jorde LB, Feschotte C. TypeTE: a tool to genotype mobile element insertions from whole genome resequencing data. Nucleic Acids Res 2020; 48:e36. [PMID: 32067044 PMCID: PMC7102983 DOI: 10.1093/nar/gkaa074] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/20/2019] [Revised: 01/08/2020] [Accepted: 02/11/2020] [Indexed: 12/12/2022]
Abstract
Alu retrotransposons account for more than 10% of the human genome, and insertions of these elements create structural variants segregating in human populations. Such polymorphic Alus are powerful markers to understand population structure, and they represent variants that can greatly impact genome function, including gene expression. Accurate genotyping of Alus and other mobile elements has been challenging. Indeed, we found that Alu genotypes previously called for the 1000 Genomes Project are sometimes erroneous, which poses significant problems for phasing these insertions with other variants that comprise the haplotype. To ameliorate this issue, we introduce a new pipeline - TypeTE - which genotypes Alu insertions from whole-genome sequencing data. Starting from a list of polymorphic Alus, TypeTE identifies the hallmarks (poly-A tail and target site duplication) and orientation of Alu insertions using local re-assembly to reconstruct presence and absence alleles. Genotype likelihoods are then computed after re-mapping sequencing reads to the reconstructed alleles. Using a high-quality set of PCR-based genotyping of >200 loci, we show that TypeTE improves genotype accuracy from 83% to 92% in the 1000 Genomes dataset. TypeTE can be readily adapted to other retrotransposon families and brings a valuable toolbox addition for population genomics.
Affiliation(s)
- Clément Goubert
  - Department of Molecular Biology and Genetics, 215 Tower Rd, Cornell University, Ithaca, NY 14853, USA
- Jainy Thomas
  - Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
- Lindsay M Payer
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Jeffrey M Kidd
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Julie Feusier
  - Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
- W Scott Watkins
  - Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
- Kathleen H Burns
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Lynn B Jorde
  - Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
- Cédric Feschotte
  - Department of Molecular Biology and Genetics, 215 Tower Rd, Cornell University, Ithaca, NY 14853, USA
165
Jeon S, Bhak Y, Choi Y, Jeon Y, Kim S, Jang J, Jang J, Blazyte A, Kim C, Kim Y, Shim J, Kim N, Kim YJ, Park SG, Kim J, Cho YS, Park Y, Kim HM, Kim BC, Park NH, Shin ES, Kim BC, Bolser D, Manica A, Edwards JS, Church G, Lee S, Bhak J. Korean Genome Project: 1094 Korean personal genomes with clinical information. Sci Adv 2020; 6:eaaz7835. [PMID: 32766443 PMCID: PMC7385432 DOI: 10.1126/sciadv.aaz7835] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 10/08/2019] [Accepted: 03/19/2020] [Indexed: 05/30/2023]
Abstract
We present the initial phase of the Korean Genome Project (Korea1K), including 1094 whole genomes (sequenced at an average depth of 31×), along with data of 79 quantitative clinical traits. We identified 39 million single-nucleotide variants and indels of which half were singleton or doubleton and detected Korean-specific patterns based on several types of genomic variations. A genome-wide association study illustrated the power of whole-genome sequences for analyzing clinical traits, identifying nine more significant candidate alleles than previously reported from the same linkage disequilibrium blocks. Also, Korea1K, as a reference, showed better imputation accuracy for Koreans than the 1KGP panel. As proof of utility, germline variants in cancer samples could be filtered out more effectively when the Korea1K variome was used as a panel of normals compared to non-Korean variome sets. Overall, this study shows that Korea1K can be a useful genotypic and phenotypic resource for clinical and ethnogenetic studies.
Affiliation(s)
- Sungwon Jeon
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Department of Biomedical Engineering, School of Life Sciences, UNIST, Ulsan 44919, Republic of Korea
- Youngjune Bhak
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Department of Biomedical Engineering, School of Life Sciences, UNIST, Ulsan 44919, Republic of Korea
  - Clinomics Inc., Ulsan 44919, Republic of Korea
- Yeonsong Choi
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Department of Biomedical Engineering, School of Life Sciences, UNIST, Ulsan 44919, Republic of Korea
- Yeonsu Jeon
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Department of Biomedical Engineering, School of Life Sciences, UNIST, Ulsan 44919, Republic of Korea
- Seunghoon Kim
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Department of Biomedical Engineering, School of Life Sciences, UNIST, Ulsan 44919, Republic of Korea
- Jaeyoung Jang
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Jinho Jang
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Department of Biomedical Engineering, School of Life Sciences, UNIST, Ulsan 44919, Republic of Korea
- Asta Blazyte
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Changjae Kim
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Clinomics Inc., Ulsan 44919, Republic of Korea
- Yeonkyung Kim
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Jungae Shim
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Nayeong Kim
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Yeo Jin Kim
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Seung Gu Park
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Jungeun Kim
  - Personal Genomics Institute (PGI), Genome Research Foundation (GRF), Osong 28160, Republic of Korea
- Yeshin Park
  - Clinomics Inc., Ulsan 44919, Republic of Korea
- Hak-Min Kim
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Department of Biomedical Engineering, School of Life Sciences, UNIST, Ulsan 44919, Republic of Korea
  - Clinomics Inc., Ulsan 44919, Republic of Korea
- Neung-Hwa Park
  - Department of Internal Medicine, University of Ulsan College of Medicine, Ulsan University Hospital, Ulsan 44033, Republic of Korea
  - Biomedical Research Center, University of Ulsan College of Medicine, Ulsan University Hospital, Ulsan 44033, Republic of Korea
- Eun-Seok Shin
  - Division of Cardiology, Department of Internal Medicine, Ulsan Medical Center, Ulsan 44686, Republic of Korea
- Dan Bolser
  - Clinomics Inc., Ulsan 44919, Republic of Korea
- Andrea Manica
  - Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
- Jeremy S. Edwards
  - Department of Chemistry and Chemical Biology, University of New Mexico and University of New Mexico Comprehensive Cancer Center, Albuquerque, NM 87106, USA
- George Church
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Semin Lee
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Department of Biomedical Engineering, School of Life Sciences, UNIST, Ulsan 44919, Republic of Korea
- Jong Bhak
  - Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Department of Biomedical Engineering, School of Life Sciences, UNIST, Ulsan 44919, Republic of Korea
  - Clinomics Inc., Ulsan 44919, Republic of Korea
  - Personal Genomics Institute (PGI), Genome Research Foundation (GRF), Osong 28160, Republic of Korea
166
Halvorsen M, Huh R, Oskolkov N, Wen J, Netotea S, Giusti-Rodriguez P, Karlsson R, Bryois J, Nystedt B, Ameur A, Kähler AK, Ancalade N, Farrell M, Crowley JJ, Li Y, Magnusson PKE, Gyllensten U, Hultman CM, Sullivan PF, Szatkiewicz JP. Increased burden of ultra-rare structural variants localizing to boundaries of topologically associated domains in schizophrenia. Nat Commun 2020; 11:1842. [PMID: 32296054 PMCID: PMC7160146 DOI: 10.1038/s41467-020-15707-w] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 02/05/2020] [Accepted: 03/24/2020] [Indexed: 01/13/2023]
Abstract
Despite considerable progress in schizophrenia genetics, most findings have been for large rare structural variants and common variants in well-imputed regions with few genes implicated from exome sequencing. Whole genome sequencing (WGS) can potentially provide a more complete enumeration of etiological genetic variation apart from the exome and regions of high linkage disequilibrium. We analyze high-coverage WGS data from 1162 Swedish schizophrenia cases and 936 ancestry-matched population controls. Our main objective is to evaluate the contribution to schizophrenia etiology from a variety of genetic variants accessible to WGS but not by previous technologies. Our results suggest that ultra-rare structural variants that affect the boundaries of topologically associated domains (TADs) increase risk for schizophrenia. Alterations in TAD boundaries may lead to dysregulation of gene expression. Future mechanistic studies will be needed to determine the precise functional effects of these variants on biology. Common variants identified by large-scale genomewide association studies cannot fully account for the heritability of schizophrenia (SCZ). Here, the authors report high-coverage whole-genome sequencing of 1162 SCZ cases and 936 controls and explore the contribution of different types of variants to SCZ.
Affiliation(s)
- Matthew Halvorsen
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
- Ruth Huh
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC, 27599, USA
- Nikolay Oskolkov
  - Department of Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Lund University, 22362, Lund, Sweden
- Jia Wen
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
- Sergiu Netotea
  - Department of Biology and Biological Engineering, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Chalmers University of Technology, 41258, Göteborg, Sweden
- Robert Karlsson
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177, Stockholm, Sweden
- Julien Bryois
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177, Stockholm, Sweden
- Björn Nystedt
  - Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, 75237, Uppsala, Sweden
- Adam Ameur
  - Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, 75185, Uppsala, Sweden
- Anna K Kähler
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177, Stockholm, Sweden
- NaEshia Ancalade
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
- Martilias Farrell
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
- James J Crowley
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
  - Department of Psychiatry, University of North Carolina, Chapel Hill, NC, 27599, USA
  - Department of Clinical Neuroscience, Karolinska Institutet, 17177, Stockholm, Sweden
- Yun Li
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC, 27599, USA
- Patrik K E Magnusson
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177, Stockholm, Sweden
- Ulf Gyllensten
  - Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, 75185, Uppsala, Sweden
- Christina M Hultman
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177, Stockholm, Sweden
- Patrick F Sullivan
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177, Stockholm, Sweden
  - Department of Psychiatry, University of North Carolina, Chapel Hill, NC, 27599, USA
- Jin P Szatkiewicz
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
  - Department of Psychiatry, University of North Carolina, Chapel Hill, NC, 27599, USA
167
Goubert C, Zevallos NA, Feschotte C. Contribution of unfixed transposable element insertions to human regulatory variation. Philos Trans R Soc Lond B Biol Sci 2020; 375:20190331. [PMID: 32075552 PMCID: PMC7061991 DOI: 10.1098/rstb.2019.0331] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Accepted: 12/09/2019] [Indexed: 12/11/2022]
Abstract
Thousands of unfixed transposable element (TE) insertions segregate in the human population, but little is known about their impact on genome function. Recently, a few studies associated unfixed TE insertions to mRNA levels of adjacent genes, but the biological significance of these associations, their replicability across cell types and the mechanisms by which they may regulate genes remain largely unknown. Here, we performed a TE-expression QTL analysis of 444 lymphoblastoid cell lines (LCL) and 289 induced pluripotent stem cells using a newly developed set of genotypes for 2743 polymorphic TE insertions. We identified 211 and 176 TE-eQTL acting in cis in each respective cell type. Approximately 18% were shared across cell types with strongly correlated effects. Furthermore, analysis of chromatin accessibility QTL in a subset of the LCL suggests that unfixed TEs often modulate the activity of enhancers and other distal regulatory DNA elements, which tend to lose accessibility when a TE inserts within them. We also document a case of an unfixed TE likely influencing gene expression at the post-transcriptional level. Our study points to broad and diverse cis-regulatory effects of unfixed TEs in the human population and underscores their plausible contribution to phenotypic variation. This article is part of a discussion meeting issue 'Crossroads between transposons and gene regulation'.
Affiliation(s)
- Cédric Feschotte
  - Department of Molecular Biology and Genetics, Cornell University, 526 Campus Road, Ithaca, NY 14853, USA
168
Zhou W, Emery SB, Flasch DA, Wang Y, Kwan KY, Kidd JM, Moran JV, Mills RE. Identification and characterization of occult human-specific LINE-1 insertions using long-read sequencing technology. Nucleic Acids Res 2020; 48:1146-1163. [PMID: 31853540 PMCID: PMC7026601 DOI: 10.1093/nar/gkz1173] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 08/07/2019] [Revised: 11/14/2019] [Accepted: 12/05/2019] [Indexed: 11/13/2022]
Abstract
Long Interspersed Element-1 (LINE-1) retrotransposition contributes to inter- and intra-individual genetic variation and occasionally can lead to human genetic disorders. Various strategies have been developed to identify human-specific LINE-1 (L1Hs) insertions from short-read whole genome sequencing (WGS) data; however, they have limitations in detecting insertions in complex repetitive genomic regions. Here, we developed a computational tool (PALMER) and used it to identify 203 non-reference L1Hs insertions in the NA12878 benchmark genome. Using PacBio long-read sequencing data, we identified L1Hs insertions that were absent in previous short-read studies (90/203). Approximately 81% (73/90) of the L1Hs insertions reside within endogenous LINE-1 sequences in the reference assembly and the analysis of unique breakpoint junction sequences revealed 63% (57/90) of these L1Hs insertions could be genotyped in 1000 Genomes Project sequences. Moreover, we observed that amplification biases encountered in single-cell WGS experiments led to a wide variation in L1Hs insertion detection rates between four individual NA12878 cells; under-amplification limited detection to 32% (65/203) of insertions, whereas over-amplification increased false positive calls. In sum, these data indicate that L1Hs insertions are often missed using standard short-read sequencing approaches and long-read sequencing approaches can significantly improve the detection of L1Hs insertions present in individual genomes.
Collapse
Affiliation(s)
- Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
| | - Sarah B Emery
- Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
| | - Diane A Flasch
- Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
| | - Yifan Wang
- Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
| | - Kenneth Y Kwan
- Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA.,Molecular and Behavioral Neuroscience Institute, University of Michigan Medical School, 109 Zina Pitcher Place, Ann Arbor, MI 48109, USA
| | - Jeffrey M Kidd
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA.,Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
| | - John V Moran
- Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA.,Department of Internal Medicine, University of Michigan, 1500 East Medical Center Drive, Ann Arbor, MI 48109, USA
| | - Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA.,Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
|
169
|
A putative silencer variant in a spontaneous canine model of retinitis pigmentosa. PLoS Genet 2020; 16:e1008659. [PMID: 32150541 PMCID: PMC7082071 DOI: 10.1371/journal.pgen.1008659] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/11/2019] [Revised: 03/19/2020] [Accepted: 02/06/2020] [Indexed: 01/19/2023] Open
Abstract
Retinitis pigmentosa (RP) is the leading cause of blindness with nearly two million people affected worldwide. Many genes have been implicated in RP, yet in 30–80% of the RP patients the genetic cause remains unknown. A similar phenotype, progressive retinal atrophy (PRA), affects many dog breeds including the Miniature Schnauzer. We performed clinical, genetic and functional experiments to identify the genetic cause of PRA in the breed. The age of onset and pattern of disease progression suggested that at least two forms of PRA, types 1 and 2 respectively, affect the breed, which was confirmed by genome-wide association study that implicated two distinct genomic loci in chromosomes 15 and X, respectively. Whole-genome sequencing revealed a fully segregating recessive regulatory variant in type 1 PRA. The associated variant has a very recent origin based on haplotype analysis and lies within a regulatory site with the predicted binding site of HAND1::TCF3 transcription factor complex. Luciferase assays suggested that mutated regulatory sequence increases expression. Case-control retinal expression comparison of six best HAND1::TCF3 target genes were analyzed with quantitative reverse-transcriptase PCR assay and indicated overexpression of EDN2 and COL9A2 in the affected retina. Defects in both EDN2 and COL9A2 have been previously associated with retinal degeneration. In summary, our study describes two genetically different forms of PRA and identifies a fully penetrant variant in type 1 form with a possible regulatory effect. This would be among the first reports of a regulatory variant in retinal degeneration in any species, and establishes a new spontaneous dog model to improve our understanding of retinal biology and gene regulation while the affected breed will benefit from a reliable genetic testing.
Retinitis pigmentosa (RP) is a blinding eye disease that affects nearly two million people worldwide. Several genes and variants have been associated with the disease, but still 30–80% of the patients lack genetic diagnosis. There is currently no standard treatment for RP, and much is expected from gene therapy. A similar disease, called progressive retinal atrophy (PRA), affects many dog breeds. We performed clinical, genetic and functional analyses to find the genetic cause for PRA in Miniature Schnauzers. We discovered two forms of PRA in the breed, named type 1 and 2, and show that they are genetically distinct as they map to different chromosomes, 15 and X, respectively. Further genetic, bioinformatic and functional analyses discovered a fully penetrant recessive variant in a putative silencer region for type 1 PRA. Silencer regions are important for gene regulation and we found that two of its predicted target genes, EDN2 and COL9A2, were overexpressed in the retina of the affected dog. Defects in both EDN2 and COL9A2 have been associated with retinal degeneration. This study provides new insights to retinal biology while the genetic test guides better breeding choices.
|
170
|
Balachandran P, Beck CR. Structural variant identification and characterization. Chromosome Res 2020; 28:31-47. [PMID: 31907725 PMCID: PMC7131885 DOI: 10.1007/s10577-019-09623-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/15/2019] [Revised: 10/15/2019] [Accepted: 11/24/2019] [Indexed: 01/06/2023]
Abstract
Structural variant (SV) differences between human genomes can cause germline and mosaic disease as well as inter-individual variation. De-regulation of accurate DNA repair and genomic surveillance mechanisms results in a large number of SVs in cancer. Analysis of the DNA sequences at SV breakpoints can help identify pathways of mutagenesis and regions of the genome that are more susceptible to rearrangement. Large-scale SV analyses have been enabled by high-throughput genome-level sequencing on humans in the past decade. These studies have shed light on the mechanisms and prevalence of complex genomic rearrangements. Recent advancements in both sequencing and other mapping technologies as well as calling algorithms for detection of genomic rearrangements have helped propel SV detection into population-scale studies, and have begun to elucidate previously inaccessible regions of the genome. Here, we discuss the genomic organization of simple and complex SVs, the molecular mechanisms of their formation, and various ways to detect them. We also introduce methods for characterizing SVs and their consequences on human genomes.
Affiliation(s)
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA.
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, 06030, USA.
|
171
|
Dillard KJ, Ochs M, Niskanen JE, Arumilli M, Donner J, Kyöstilä K, Hytönen MK, Anttila M, Lohi H. Recessive missense LAMP3 variant associated with defect in lamellar body biogenesis and fatal neonatal interstitial lung disease in dogs. PLoS Genet 2020; 16:e1008651. [PMID: 32150563 PMCID: PMC7082050 DOI: 10.1371/journal.pgen.1008651] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/08/2019] [Revised: 03/19/2020] [Accepted: 02/04/2020] [Indexed: 01/06/2023] Open
Abstract
Neonatal interstitial lung diseases due to abnormal surfactant biogenesis are rare in humans and have never been reported as a spontaneous disorder in animals. We describe here a novel lung disorder in Airedale Terrier (AT) dogs with clinical symptoms and pathology similar to the most severe neonatal forms of human surfactant deficiency. Lethal hypoxic respiratory distress and failure occurred within the first days or weeks of life in the affected puppies. Transmission electron microscopy of the affected lungs revealed maturation arrest in the formation of lamellar bodies (LBs) in the alveolar epithelial type II (AECII) cells. The secretory organelles were small and contained fewer lamellae, often in combination with small vesicles surrounded by an occasionally disrupted common limiting membrane. A combined approach of genome-wide association study and whole exome sequencing identified a recessive variant, c.1159G>A, p.(E387K), in LAMP3, a limiting membrane protein of the cytoplasmic surfactant organelles in AECII cells. The substitution resides in the LAMP domain adjacent to a conserved disulfide bond. In summary, this study describes a novel interstitial lung disease in dogs, identifies a new candidate gene for human surfactant dysfunction and brings important insights into the essential role of LAMP3 in the process of the LB formation.
Affiliation(s)
- Kati J. Dillard
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
- Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
- Folkhälsan Research Center, Helsinki, Finland
- Veterinary Bacteriology and Pathology Research Unit, Finnish Food Authority, Helsinki, Finland
| | - Matthias Ochs
- Institute of Functional and Applied Anatomy, Hannover Medical School, Hannover, Germany
- Institute of Functional Anatomy, Charité - Universitaetsmedizin Berlin, Berlin, Germany
- German Center for Lung Research (DZL), Berlin, Germany
| | - Julia E. Niskanen
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
- Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
- Folkhälsan Research Center, Helsinki, Finland
| | - Meharji Arumilli
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
- Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
- Folkhälsan Research Center, Helsinki, Finland
| | - Jonas Donner
- Genoscoper Laboratories Ltd (Wisdom Health), Helsinki, Finland
| | - Kaisa Kyöstilä
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
- Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
- Folkhälsan Research Center, Helsinki, Finland
| | - Marjo K. Hytönen
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
- Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
- Folkhälsan Research Center, Helsinki, Finland
| | - Marjukka Anttila
- Veterinary Bacteriology and Pathology Research Unit, Finnish Food Authority, Helsinki, Finland
| | - Hannes Lohi
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
- Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
- Folkhälsan Research Center, Helsinki, Finland
|
172
|
Loh JW, Ha H, Lin T, Sun N, Burns KH, Xing J. Integrated Mobile Element Scanning (ME-Scan) method for identifying multiple types of polymorphic mobile element insertions. Mob DNA 2020; 11:12. [PMID: 32110248 PMCID: PMC7035633 DOI: 10.1186/s13100-020-00207-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/28/2019] [Accepted: 02/14/2020] [Indexed: 01/29/2023] Open
Abstract
Background Mobile elements are ubiquitous components of mammalian genomes and constitute more than half of the human genome. Polymorphic mobile element insertions (pMEIs) are a major source of human genomic variation and are gaining research interest because of their involvement in gene expression regulation, genome integrity, and disease. Results Building on our previous Mobile Element Scanning (ME-Scan) protocols, we developed an integrated ME-Scan protocol to identify three major active families of human mobile elements, AluYb, L1HS, and SVA. This approach selectively amplifies insertion sites of currently active retrotransposons for Illumina sequencing. By pooling the libraries together, we can identify pMEIs from all three mobile element families in one sequencing run. To demonstrate the utility of the new ME-Scan protocol, we sequenced 12 human parent-offspring trios. Our results showed high sensitivity (> 90%) and accuracy (> 95%) of the protocol for identifying pMEIs in the human genome. In addition, we also tested the feasibility of identifying somatic insertions using the protocol. Conclusions The integrated ME-Scan protocol is a cost-effective way to identify novel pMEIs in the human genome. In addition, by developing the protocol to detect three mobile element families, we demonstrate the flexibility of the ME-Scan protocol. We present instructions for the library design, a sequencing protocol, and a computational pipeline for downstream analyses as a complete framework that will allow researchers to easily adapt the ME-Scan protocol to their own projects in other genomes.
Affiliation(s)
- Jui Wan Loh
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ 08854 USA
| | - Hongseok Ha
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ 08854 USA.,Human Genetic Institute of New Jersey, Rutgers, the State University of New Jersey, Piscataway, 08854 NJ USA
| | - Timothy Lin
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ 08854 USA
| | - Nawei Sun
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ 08854 USA.,Human Genetic Institute of New Jersey, Rutgers, the State University of New Jersey, Piscataway, 08854 NJ USA
| | - Kathleen H Burns
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, 21205 MD USA
| | - Jinchuan Xing
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ 08854 USA.,Human Genetic Institute of New Jersey, Rutgers, the State University of New Jersey, Piscataway, 08854 NJ USA
|
173
|
Mobile element insertion detection in 89,874 clinical exomes. Genet Med 2020; 22:974-978. [PMID: 31965078 PMCID: PMC7200591 DOI: 10.1038/s41436-020-0749-x] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 10/08/2019] [Accepted: 01/07/2020] [Indexed: 12/20/2022] Open
Abstract
Purpose Exome sequencing (ES) is increasingly used for the diagnosis of rare genetic disease. However, some pathogenic sequence variants within the exome go undetected due to the technical difficulty of identifying them. Mobile element insertions (MEIs) are a known cause of genetic disease in humans but have been historically difficult to detect via ES and similar targeted sequencing methods. Methods We developed and applied a novel MEI detection method prospectively to samples received for clinical ES beginning in November 2017. Positive MEI findings were confirmed by an orthogonal method and reported back to the ordering provider. In this study, we examined 89,874 samples from 38,871 cases. Results Diagnostic MEIs were present in 0.03% (95% binomial test confidence interval: 0.02–0.06%) of all cases and account for 0.15% (95% binomial test confidence interval: 0.08–0.25%) of cases with a molecular diagnosis. One diagnostic MEI was a novel founder event. Most patients with pathogenic MEIs had prior genetic testing, three of whom had previous negative DNA sequencing analysis of the diagnostic gene. Conclusion MEI detection from ES is a valuable diagnostic tool, reveals molecular findings that may be undetected by other sequencing assays, and increases diagnostic yield by 0.15%.
|
174
|
Lou C, Goodier JL, Qiang R. A potential new mechanism for pregnancy loss: considering the role of LINE-1 retrotransposons in early spontaneous miscarriage. Reprod Biol Endocrinol 2020; 18:6. [PMID: 31964400 PMCID: PMC6971995 DOI: 10.1186/s12958-020-0564-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/26/2019] [Accepted: 01/07/2020] [Indexed: 12/14/2022] Open
Abstract
LINE-1 retrotransposons are mobile DNA elements that copy and paste themselves into new sites in the genome. To ensure their evolutionary success, heritable new LINE-1 insertions accumulate in cells that can transmit genetic information to the next generation (i.e., germ cells and embryonic stem cells). It is our hypothesis that LINE-1 retrotransposons, insertional mutagens that affect expression of genes, may be causal agents of early miscarriage in humans. The cell has evolved various defenses restricting retrotransposition-caused mutation, but these are occasionally relaxed in certain somatic cell types, including those of the early embryo. We predict that reduced suppression of L1s in germ cells or early-stage embryos may lead to excessive genome mutation by retrotransposon insertion, or to the induction of an inflammatory response or apoptosis due to increased expression of L1-derived nucleic acids and proteins, and so disrupt gene function important for embryogenesis. If correct, a novel threat to normal human development is revealed, and reverse transcriptase therapy could be one future strategy for controlling this cause of embryonic damage in patients with recurrent miscarriages.
Affiliation(s)
- Chao Lou
- Department of Genetics, Northwest Women’s and Children’s Hospital, 1616 Yanxiang Road, Xi’an, Shaanxi Province People’s Republic of China
| | - John L. Goodier
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD USA
| | - Rong Qiang
- Department of Genetics, Northwest Women’s and Children’s Hospital, 1616 Yanxiang Road, Xi’an, Shaanxi Province People’s Republic of China
|
175
|
A pediatric perspective on genomics and prevention in the twenty-first century. Pediatr Res 2020; 87:338-344. [PMID: 31578042 DOI: 10.1038/s41390-019-0597-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/29/2019] [Accepted: 09/18/2019] [Indexed: 12/19/2022]
Abstract
We present evidence from diverse disciplines and populations to identify the current and emerging role of genomics in prevention from both medical and public health perspectives as well as key challenges and potential untoward consequences of increasing the role of genomics in these endeavors. We begin by comparing screening in healthy populations (newborn screening), with testing in symptomatic populations, which may incidentally identify secondary findings and at-risk relatives. Emerging evidence suggests that variants in genes subject to the reporting of secondary findings are more common than expected in patients who otherwise would not meet the criteria for testing and population testing for variants in these genes may more precisely identify discrete populations to target for various prevention strategies starting in childhood. Conversely, despite its theoretical promise, recent studies attempting to demonstrate benefits of next-generation sequencing for newborn screening have instead demonstrated numerous barriers and pitfalls to this approach. We also examine the special cases of pharmacogenomics and polygenic risk scores as examples of ways genomics can contribute to prevention amongst a broader population than that affected by rare Mendelian disease. We conclude with unresolved questions which will benefit from future investigations of the role of genomics in disease prevention.
|
176
|
Vendrell-Mir P, Barteri F, Merenciano M, González J, Casacuberta JM, Castanera R. A benchmark of transposon insertion detection tools using real data. Mob DNA 2019; 10:53. [PMID: 31892957 PMCID: PMC6937713 DOI: 10.1186/s13100-019-0197-9] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 06/28/2019] [Accepted: 12/17/2019] [Indexed: 02/01/2023] Open
Abstract
Background Transposable elements (TEs) are an important source of genomic variability in eukaryotic genomes. Their activity impacts genome architecture and gene expression and can lead to drastic phenotypic changes. Therefore, identifying TE polymorphisms is key to better understand the link between genotype and phenotype. However, most genotype-to-phenotype analyses have concentrated on single nucleotide polymorphisms as they are easier to reliably detect using short-read data. Many bioinformatic tools have been developed to identify transposon insertions from resequencing data using short reads. Nevertheless, the performance of most of these tools has been tested using simulated insertions, which do not accurately reproduce the complexity of natural insertions. Results We have overcome this limitation by building a dataset of insertions from the comparison of two high-quality rice genomes, followed by extensive manual curation. This dataset contains validated insertions of two very different types of TEs, LTR-retrotransposons and MITEs. Using this dataset, we have benchmarked the sensitivity and precision of 12 commonly used tools, and our results suggest that in general their sensitivity was previously overestimated when using simulated data. Our results also show that increasing coverage leads to a better sensitivity but with a cost in precision. Moreover, we found important differences in tool performance, with some tools performing better on a specific type of TEs. We have also used two sets of experimentally validated insertions in Drosophila and humans and show that this trend is maintained in genomes of different size and complexity. Conclusions We discuss the possible choice of tools depending on the goals of the study and show that the appropriate combination of tools could be an option for most approaches, increasing the sensitivity while maintaining a good precision.
Affiliation(s)
- Pol Vendrell-Mir
- Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Campus UAB, Edifici CRAG, Bellaterra, 08193 Barcelona, Spain
| | - Fabio Barteri
- Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Campus UAB, Edifici CRAG, Bellaterra, 08193 Barcelona, Spain
| | - Miriam Merenciano
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Passeig Maritim Barceloneta 37-49, 08003 Barcelona, Spain
| | - Josefa González
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Passeig Maritim Barceloneta 37-49, 08003 Barcelona, Spain
| | - Josep M Casacuberta
- Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Campus UAB, Edifici CRAG, Bellaterra, 08193 Barcelona, Spain
| | - Raúl Castanera
- Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Campus UAB, Edifici CRAG, Bellaterra, 08193 Barcelona, Spain
|
177
|
Nguyen THM, Carreira PE, Sanchez-Luque FJ, Schauer SN, Fagg AC, Richardson SR, Davies CM, Jesuadian JS, Kempen MJHC, Troskie RL, James C, Beaven EA, Wallis TP, Coward JIG, Chetty NP, Crandon AJ, Venter DJ, Armes JE, Perrin LC, Hooper JD, Ewing AD, Upton KR, Faulkner GJ. L1 Retrotransposon Heterogeneity in Ovarian Tumor Cell Evolution. Cell Rep 2018; 23:3730-3740. [PMID: 29949758 DOI: 10.1016/j.celrep.2018.05.090] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 06/04/2017] [Revised: 01/04/2018] [Accepted: 05/26/2018] [Indexed: 01/07/2023] Open
Abstract
LINE-1 (L1) retrotransposons are a source of insertional mutagenesis in tumor cells. However, the clinical significance of L1 mobilization during tumorigenesis remains unclear. Here, we applied retrotransposon capture sequencing (RC-seq) to multiple single-cell clones isolated from five ovarian cancer cell lines and HeLa cells and detected endogenous L1 retrotransposition in vitro. We then applied RC-seq to ovarian tumor and matched blood samples from 19 patients and identified 88 tumor-specific L1 insertions. In one tumor, an intronic de novo L1 insertion supplied a novel cis-enhancer to the putative chemoresistance gene STC1. Notably, the tumor subclone carrying the STC1 L1 mutation increased in prevalence after chemotherapy, further increasing STC1 expression. We also identified hypomethylated donor L1s responsible for new L1 insertions in tumors and cultivated cancer cells. These congruent in vitro and in vivo results highlight L1 insertional mutagenesis as a common component of ovarian tumorigenesis and cancer genome heterogeneity.
Affiliation(s)
- Thu H M Nguyen
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Patricia E Carreira
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Francisco J Sanchez-Luque
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia; Pfizer-University of Granada-Andalusian Government Centre for Genomics and Oncological Research, PT Ciencias de la Salud, Granada 18016, Spain
| | - Stephanie N Schauer
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Allister C Fagg
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Sandra R Richardson
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | | | - J Samuel Jesuadian
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Marie-Jeanne H C Kempen
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Robin-Lee Troskie
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Cini James
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | | | | | - Jermaine I G Coward
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia; Mater Health Services, South Brisbane, QLD 4101, Australia
| | - Naven P Chetty
- Mater Health Services, South Brisbane, QLD 4101, Australia
| | | | - Deon J Venter
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia; Mater Health Services, South Brisbane, QLD 4101, Australia
| | - Jane E Armes
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia; Mater Health Services, South Brisbane, QLD 4101, Australia
| | - Lewis C Perrin
- Mater Health Services, South Brisbane, QLD 4101, Australia
| | - John D Hooper
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Adam D Ewing
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Kyle R Upton
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD 4072, Australia.
| | - Geoffrey J Faulkner
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia; Queensland Brain Institute, University of Queensland, Brisbane, QLD 4072, Australia.
|
178
|
Spirito G, Mangoni D, Sanges R, Gustincich S. Impact of polymorphic transposable elements on transcription in lymphoblastoid cell lines from public data. BMC Bioinformatics 2019; 20:495. [PMID: 31757210 PMCID: PMC6873650 DOI: 10.1186/s12859-019-3113-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/06/2019] [Accepted: 09/20/2019] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Transposable elements (TEs) are DNA sequences able to mobilize themselves and to increase their copy-number in the host genome. In the past, they have been considered mainly selfish DNA without evident functions. Nevertheless, currently they are believed to have been extensively involved in the evolution of primate genomes, especially from a regulatory perspective. Due to their recent activity they are also one of the primary sources of structural variants (SVs) in the human genome. By taking advantage of sequencing technologies and bioinformatics tools, recent surveys uncovered specific TE structural variants (TEVs) that gave rise to polymorphisms in human populations. When combined with RNA-seq data this information provides the opportunity to study the potential impact of TEs on gene expression in humans. RESULTS In this work, we assessed the effects of the presence of specific TEs in cis on the expression of flanking genes by producing associations between polymorphic TEs and flanking gene expression levels in human lymphoblastoid cell lines. By using public data from the 1000 Genomes Project and the Geuvadis consortium, we exploited an expression quantitative trait loci (eQTL) approach integrated with additional bioinformatics data mining analyses. We uncovered human loci enriched for common, less common and rare TEVs and identified 323 significant TEV-cis-eQTL associations. SINE-R/VNTR/Alus (SVAs) emerged as the TE class with the strongest effects on gene expression. We also unveiled differential functional enrichments on genes associated with TEVs, genes associated with TEV-cis-eQTLs and genes associated with the genomic regions mostly enriched in TEV-cis-eQTLs, highlighting, at multiple levels, the impact of TEVs on the host genome. Finally, we also identified polymorphic TEs putatively embedded in transcriptional units, proposing a novel mechanism in which TEVs may mediate individual-specific traits. CONCLUSION We contributed to unveiling the effect of polymorphic TEs on transcription in lymphoblastoid cell lines.
Affiliation(s)
- Giovanni Spirito
- Area of Neuroscience, Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
| | - Damiano Mangoni
- Central RNA Laboratory, Istituto Italiano di Tecnologia (IIT), Genoa, Italy
| | - Remo Sanges
- Area of Neuroscience, Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy.
- Central RNA Laboratory, Istituto Italiano di Tecnologia (IIT), Genoa, Italy.
- Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Naples, Italy.
| | - Stefano Gustincich
- Area of Neuroscience, Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy.
- Central RNA Laboratory, Istituto Italiano di Tecnologia (IIT), Genoa, Italy.
|
179
|
Gardner EJ, Prigmore E, Gallone G, Danecek P, Samocha KE, Handsaker J, Gerety SS, Ironfield H, Short PJ, Sifrim A, Singh T, Chandler KE, Clement E, Lachlan KL, Prescott K, Rosser E, FitzPatrick DR, Firth HV, Hurles ME. Contribution of retrotransposition to developmental disorders. Nat Commun 2019; 10:4630. [PMID: 31604926 PMCID: PMC6789007 DOI: 10.1038/s41467-019-12520-y] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 11/28/2018] [Accepted: 09/11/2019] [Indexed: 02/08/2023] Open
Abstract
Mobile genetic Elements (MEs) are segments of DNA which can copy themselves and other transcribed sequences through the process of retrotransposition (RT). In humans several disorders have been attributed to RT, but the role of RT in severe developmental disorders (DD) has not yet been explored. Here we identify RT-derived events in 9738 exome sequenced trios with DD-affected probands. We ascertain 9 de novo MEs, 4 of which are likely causative of the patient's symptoms (0.04%), as well as 2 de novo gene retroduplications. Beyond identifying likely diagnostic RT events, we estimate genome-wide germline ME mutation rate and selective constraint and demonstrate that coding RT events have signatures of purifying selection equivalent to those of truncating mutations. Overall, our analysis represents a comprehensive interrogation of the impact of retrotransposition on protein coding genes and a framework for future evolutionary and disease studies.
Affiliation(s)
- Eugene J Gardner
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Elena Prigmore
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Giuseppe Gallone
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Petr Danecek
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Kaitlin E Samocha
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Juliet Handsaker
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Sebastian S Gerety
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Holly Ironfield
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Patrick J Short
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Alejandro Sifrim
- Department of Human Genetics, KU Leuven, Herestraat 49, Box 602, Leuven, B-3000, Belgium
| | - Tarjinder Singh
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Kate E Chandler
- Manchester Centre for Genomic Medicine, Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, Greater Manchester, M13 9WL, UK
| | - Emma Clement
- Department of Clinical Genetics, North East Thames Regional Genetics Service, Great Ormond Street Hospital for Children NHS Trust, Holborn, London, WC1N 3JH, UK
| | - Katherine L Lachlan
- Wessex Clinical Genetics Service, Southampton University Hospitals NHS Foundation Trust, Princess Anne Hospital, Southampton, SO16 5YA, UK.,Faculty of Medicine, Human Development and Health, University of Southampton, Southampton, SO17 1BJ, UK
| | - Katrina Prescott
- Clinical Genetics Department, Yorkshire Regional Genetics Service, Leeds Teaching Hospitals NHS Trust, Chapel Allerton Hospital, Leeds, LS7 4SA, UK
| | - Elisabeth Rosser
- Department of Clinical Genetics, North East Thames Regional Genetics Service, Great Ormond Street Hospital for Children NHS Trust, Holborn, London, WC1N 3JH, UK
| | - David R FitzPatrick
- MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, WGH, Edinburgh, EH4 2SP, UK
| | - Helen V Firth
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK.,East Anglian Medical Genetics Service, Box 134, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
| | - Matthew E Hurles
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK.
| |
|
180
|
Yang WR, Ardeljan D, Pacyna CN, Payer LM, Burns KH. SQuIRE reveals locus-specific regulation of interspersed repeat expression. Nucleic Acids Res 2019; 47:e27. [PMID: 30624635 PMCID: PMC6411935 DOI: 10.1093/nar/gky1301] [Citation(s) in RCA: 93] [Impact Index Per Article: 18.6] [Received: 06/05/2018] [Revised: 12/18/2018] [Accepted: 01/03/2019] [Indexed: 12/13/2022] Open
Abstract
Transposable elements (TEs) are interspersed repeat sequences that make up much of the human genome. Their expression has been implicated in development and disease. However, TE-derived RNA-seq reads are difficult to quantify. Past approaches have excluded these reads or aggregated RNA expression to subfamilies shared by similar TE copies, sacrificing quantitative accuracy or the genomic context necessary to understand the basis of TE transcription. As a result, the effects of TEs on gene expression and associated phenotypes are not well understood. Here, we present Software for Quantifying Interspersed Repeat Expression (SQuIRE), the first RNA-seq analysis pipeline that provides a quantitative and locus-specific picture of TE expression (https://github.com/wyang17/SQuIRE). SQuIRE is an accurate and user-friendly tool that can be used for a variety of species. We applied SQuIRE to RNA-seq from normal mouse tissues and a Drosophila model of amyotrophic lateral sclerosis. In both model organisms, we recapitulated previously reported TE subfamily expression levels and revealed locus-specific TE expression. We also identified differences in TE transcription patterns relating to transcript type, gene expression and RNA splicing that would be lost with other approaches using subfamily-level analyses. Altogether, our findings illustrate the importance of studying TE transcription with locus-level resolution.
Affiliation(s)
- Wan R Yang
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Daniel Ardeljan
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.,McKusick-Nathans Institute of Genetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Clarissa N Pacyna
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.,Thomas C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, MD, USA
| | - Lindsay M Payer
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Kathleen H Burns
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.,McKusick-Nathans Institute of Genetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.,Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| |
|
181
|
McKerrow W, Fenyö D. L1EM: a tool for accurate locus specific LINE-1 RNA quantification. Bioinformatics 2019; 36:1167-1173. [PMID: 31584629 PMCID: PMC8215917 DOI: 10.1093/bioinformatics/btz724] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 03/07/2019] [Revised: 05/31/2019] [Accepted: 09/25/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION: LINE-1 elements are retrotransposons that are capable of copying their sequence to new genomic loci. LINE-1 derepression is associated with a number of disease states, and has the potential to cause significant cellular damage. Because LINE-1 elements are repetitive, it is difficult to quantify LINE-1 RNA at specific loci and to separate transcripts with protein coding capability from other sources of LINE-1 RNA. RESULTS: We provide a tool, L1EM, that uses the expectation maximization algorithm to quantify LINE-1 RNA at each genomic locus, separating transcripts that are capable of generating retrotransposition from those that are not. We show the accuracy of L1EM on simulated data and against long read sequencing from HEK cells. AVAILABILITY AND IMPLEMENTATION: L1EM is written in Python. The source code along with the necessary annotations are available at https://github.com/FenyoLab/L1EM and distributed under GPLv3. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
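The expectation maximization idea mentioned in this abstract can be illustrated with a minimal sketch. This is not L1EM's implementation: the input format (`read_compat`, a list of compatible loci per read), the function name and the uniform starting point are all assumptions made for illustration, and every read is assumed to be compatible with at least one locus.

```python
from collections import defaultdict


def em_locus_abundance(read_compat, n_iter=100, tol=1e-9):
    """Estimate relative abundance per locus from ambiguously mapped reads.

    read_compat[i] lists the loci that read i could have come from
    (hypothetical input format; each list must be non-empty).
    """
    loci = sorted({locus for compat in read_compat for locus in compat})
    theta = {locus: 1.0 / len(loci) for locus in loci}  # uniform start
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: split each read across its compatible loci,
        # in proportion to the current abundance estimates.
        for compat in read_compat:
            total = sum(theta[locus] for locus in compat)
            for locus in compat:
                counts[locus] += theta[locus] / total
        # M-step: new abundance = expected share of all reads.
        new_theta = {locus: counts[locus] / len(read_compat) for locus in loci}
        delta = max(abs(new_theta[locus] - theta[locus]) for locus in loci)
        theta = new_theta
        if delta < tol:
            break
    return theta


# Two reads map uniquely to locus A, one uniquely to B, one to both.
print(em_locus_abundance([["A"], ["A"], ["A", "B"], ["B"]]))
```

In this toy input the shared read is assigned mostly to locus A because A already has more uniquely mapped reads, and the estimates settle near 2/3 for A and 1/3 for B.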
Affiliation(s)
| | - David Fenyö
- To whom correspondence should be addressed.
| |
|
182
|
Feusier J, Watkins WS, Thomas J, Farrell A, Witherspoon DJ, Baird L, Ha H, Xing J, Jorde LB. Pedigree-based estimation of human mobile element retrotransposition rates. Genome Res 2019; 29:1567-1577. [PMID: 31575651 PMCID: PMC6771411 DOI: 10.1101/gr.247965.118] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 12/26/2018] [Accepted: 08/14/2019] [Indexed: 12/26/2022]
Abstract
Germline mutation rates in humans have been estimated for a variety of mutation types, including single-nucleotide and large structural variants. Here, we directly measure the germline retrotransposition rate for the three active retrotransposon elements: L1, Alu, and SVA. We used three tools for calling mobile element insertions (MEIs) (MELT, RUFUS, and TranSurVeyor) on blood-derived whole-genome sequence (WGS) data from 599 CEPH individuals, comprising 33 three-generation pedigrees. We identified 26 de novo MEIs in 437 births. The retrotransposition rate estimate for Alu elements, one in 40 births, is roughly half the rate estimated using phylogenetic analyses, a difference in magnitude similar to that observed for single-nucleotide variants. The L1 retrotransposition rate is one in 63 births and is within range of previous estimates (1:20-1:200 births). The SVA retrotransposition rate, one in 63 births, is much higher than the previous estimate of one in 900 births. Our large, three-generation pedigrees allowed us to assess parent-of-origin effects and the timing of insertion events in either gametogenesis or early embryonic development. We find a statistically significant paternal bias in Alu retrotransposition. Our study represents the first in-depth analysis of the rate and dynamics of human retrotransposition from WGS data in three-generation human pedigrees.
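As a quick arithmetic check on the figures quoted in this abstract, the sketch below converts the raw count (26 de novo MEIs in 437 births) into a "one in N births" figure and compares it with the sum of the three per-family rates. The function names are illustrative, and the abstract does not give per-family counts or describe how the per-family rates were derived, so only the aggregate comparison is attempted.

```python
def births_per_event(events, births):
    """Return N such that the rate is one event in N births."""
    return births / events


def combined_births_per_event(per_family):
    """Combine per-family 'one in N births' rates into a single overall N."""
    total_rate = sum(1.0 / n for n in per_family.values())
    return 1.0 / total_rate


# Figures taken from the abstract.
overall = births_per_event(events=26, births=437)
combined = combined_births_per_event({"Alu": 40, "L1": 63, "SVA": 63})

print(f"raw count: about one MEI in {overall:.1f} births")
print(f"sum of per-family rates: about one MEI in {combined:.1f} births")
```

The two figures come out close to each other (about 16.8 and 17.6 births per insertion); the abstract alone does not say what accounts for the small gap.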
Affiliation(s)
- Julie Feusier
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah 84112, USA
| | - W Scott Watkins
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah 84112, USA
| | - Jainy Thomas
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah 84112, USA
| | - Andrew Farrell
- USTAR Center for Genetic Discovery, Salt Lake City, Utah 84112, USA
| | - David J Witherspoon
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah 84112, USA
| | - Lisa Baird
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah 84112, USA
| | - Hongseok Ha
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, USA
| | - Jinchuan Xing
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, USA
| | - Lynn B Jorde
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah 84112, USA
| |
|
183
|
|
184
|
Zhou Y, Minio A, Massonnet M, Solares E, Lv Y, Beridze T, Cantu D, Gaut BS. The population genetics of structural variants in grapevine domestication. Nat Plants 2019; 5:965-979. [PMID: 31506640 DOI: 10.1038/s41477-019-0507-8] [Citation(s) in RCA: 150] [Impact Index Per Article: 30.0] [Received: 05/13/2019] [Accepted: 07/26/2019] [Indexed: 05/20/2023]
Abstract
Structural variants (SVs) are a largely unexplored feature of plant genomes. Little is known about the type and size of SVs, their distribution among individuals and, especially, their population dynamics. Understanding these dynamics is critical for understanding both the contributions of SVs to phenotypes and the likelihood of identifying them as causal genetic variants in genome-wide associations. Here, we identify SVs and study their evolutionary genomics in clonally propagated grapevine cultivars and their outcrossing wild progenitors. To catalogue SVs, we assembled the highly heterozygous Chardonnay genome, for which one in seven genes is hemizygous based on SVs. Using an integrative comparison between Chardonnay and Cabernet Sauvignon genomes by whole-genome, long-read and short-read alignment, we extended SV detection to population samples. We found that strong purifying selection acts against SVs but particularly against inversion and translocation events. SVs nonetheless accrue as recessive heterozygotes in clonally propagated lineages. They also define outlier regions of genomic divergence between wild and cultivated grapevines, suggesting roles in domestication. Outlier regions include the sex-determination region and the berry colour locus, where independent large, complex inversions have driven convergent phenotypic evolution.
Affiliation(s)
- Yongfeng Zhou
- Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA, USA
| | - Andrea Minio
- Department of Viticulture and Enology, UC Davis, Davis, CA, USA
| | | | - Edwin Solares
- Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA, USA
| | - Yuanda Lv
- Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA, USA
| | - Tengiz Beridze
- Institute of Molecular Genetics, Agricultural University of Georgia, Tbilisi, Georgia
| | - Dario Cantu
- Department of Viticulture and Enology, UC Davis, Davis, CA, USA.
| | - Brandon S Gaut
- Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA, USA.
| |
|
185
|
Rajaby R, Sung WK. TranSurVeyor: an improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data. Nucleic Acids Res 2018; 46:e122. [PMID: 30137425 PMCID: PMC6237741 DOI: 10.1093/nar/gky685] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/09/2018] [Accepted: 07/19/2018] [Indexed: 01/21/2023] Open
Abstract
Transpositions transfer DNA segments between different loci within a genome; in particular, when a transposition is found in a sample but not in a reference genome, it is called a non-reference transposition. They are important structural variations that have clinical impact. Transpositions can be called by analyzing second generation high-throughput sequencing datasets. Current methods follow either a database-based or a database-free approach. Database-based methods require a database of transposable elements. Some of them have good specificity; however this approach cannot detect novel transpositions, and it requires a good database of transposable elements, which is not yet available for many species. Database-free methods perform de novo calling of transpositions, but their accuracy is low. We observe that this is due to the misalignment of the reads; since reads are short and the human genome has many repeats, false alignments create false positive predictions while missing alignments reduce the true positive rate. This paper proposes new techniques to improve database-free non-reference transposition calling: first, we propose a realignment strategy called one-end remapping that corrects the alignments of reads in interspersed repeats; second, we propose a SNV-aware filter that removes some incorrectly aligned reads. By combining these two techniques and other techniques like clustering and positive-to-negative ratio filter, our proposed transposition caller TranSurVeyor shows at least 3.1-fold improvement in terms of F1-score over existing database-free methods. More importantly, even though TranSurVeyor does not use databases of prior information, its performance is at least as good as existing database-based methods such as MELT, Mobster and Retroseq. We also illustrate that TranSurVeyor can discover transpositions that are not known in the current database.
Affiliation(s)
- Ramesh Rajaby
- School of Computing, National University of Singapore, 13 Computing Drive, 117417, Singapore.,NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, 28 Medical Drive, 117456, Singapore
| | - Wing-Kin Sung
- School of Computing, National University of Singapore, 13 Computing Drive, 117417, Singapore.,Genome Institute of Singapore, 60 Biopolis Street, Genome, 138672, Singapore
| |
|
186
|
Abstract
Context: Africa's role in the narrative of human evolution is indisputably emphasised in the emergence of Homo sapiens. However, once humans dispersed beyond Africa, the history of those who stayed remains vastly under-studied, lacking the proper attention the birthplace of both modern and archaic humans deserves. The sequencing of Neanderthal and Denisovan genomes has elucidated evidence of admixture between archaic and modern humans outside of Africa, but has not aided efforts in answering whether archaic admixture happened within Africa. Objectives: This article reviews the state of research for archaic introgression in African populations and discusses recent insights into this topic. Methods: Gathering published sources and recently released preprints, this review reports on the different methods developed for detecting archaic introgression. Particularly it discusses how relevant these are when implemented on African populations and what findings these studies have shown so far. Results: Methods for detecting archaic introgression have been predominantly developed and implemented on non-African populations. Recent preprints present new methods considering African populations. While a number of studies using these methods suggest archaic introgression in Africa, without an African archaic genome to validate these results, such findings remain as putative archaic introgression. Conclusion: In light of the caveats with implementing current archaic introgression detection methods in Africa, we recommend future studies to concentrate on unravelling the complicated demographic history of Africa through means of ancient DNA where possible and through more focused efforts to sequence modern DNA from more representative populations across the African continent.
Affiliation(s)
- Cindy Santander
- Department of Zoology, University of Oxford, Oxford, UK
| | - Francesco Montinaro
- Department of Zoology, University of Oxford, Oxford, UK.,Estonian Biocentre, University of Tartu, Tartu, Estonia
| | | |
|
187
|
Sanchez-Luque FJ, Kempen MJHC, Gerdes P, Vargas-Landin DB, Richardson SR, Troskie RL, Jesuadian JS, Cheetham SW, Carreira PE, Salvador-Palomeque C, García-Cañadas M, Muñoz-Lopez M, Sanchez L, Lundberg M, Macia A, Heras SR, Brennan PM, Lister R, Garcia-Perez JL, Ewing AD, Faulkner GJ. LINE-1 Evasion of Epigenetic Repression in Humans. Mol Cell 2019; 75:590-604.e12. [PMID: 31230816 DOI: 10.1016/j.molcel.2019.05.024] [Citation(s) in RCA: 92] [Impact Index Per Article: 18.4] [Received: 10/11/2018] [Revised: 04/08/2019] [Accepted: 05/15/2019] [Indexed: 02/07/2023]
Abstract
Epigenetic silencing defends against LINE-1 (L1) retrotransposition in mammalian cells. However, the mechanisms that repress young L1 families and how L1 escapes to cause somatic genome mosaicism in the brain remain unclear. Here we report that a conserved Yin Yang 1 (YY1) transcription factor binding site mediates L1 promoter DNA methylation in pluripotent and differentiated cells. By analyzing 24 hippocampal neurons with three distinct single-cell genomic approaches, we characterized and validated a somatic L1 insertion bearing a 3' transduction. The source (donor) L1 for this insertion was slightly 5' truncated, lacked the YY1 binding site, and was highly mobile when tested in vitro. Locus-specific bisulfite sequencing revealed that the donor L1 and other young L1s with mutated YY1 binding sites were hypomethylated in embryonic stem cells, during neurodifferentiation, and in liver and brain tissue. These results explain how L1 can evade repression and retrotranspose in the human body.
Affiliation(s)
- Francisco J Sanchez-Luque
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia; GENYO Centre for Genomics and Oncological Research, Pfizer University of Granada, Andalusian Regional Government, Avda Ilustración, 114, PTS Granada 18016, Spain.
| | - Marie-Jeanne H C Kempen
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia; MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine (IGMM), University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, UK
| | - Patricia Gerdes
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Dulce B Vargas-Landin
- Australian Research Council Centre of Excellence in Plant Energy Biology, School of Molecular Sciences, the University of Western Australia, Perth, WA 6009, Australia; Harry Perkins Institute of Medical Research, Perth, WA 6009, Australia
| | - Sandra R Richardson
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Robin-Lee Troskie
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - J Samuel Jesuadian
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Seth W Cheetham
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Patricia E Carreira
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Carmen Salvador-Palomeque
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Marta García-Cañadas
- GENYO Centre for Genomics and Oncological Research, Pfizer University of Granada, Andalusian Regional Government, Avda Ilustración, 114, PTS Granada 18016, Spain
| | - Martin Muñoz-Lopez
- GENYO Centre for Genomics and Oncological Research, Pfizer University of Granada, Andalusian Regional Government, Avda Ilustración, 114, PTS Granada 18016, Spain
| | - Laura Sanchez
- GENYO Centre for Genomics and Oncological Research, Pfizer University of Granada, Andalusian Regional Government, Avda Ilustración, 114, PTS Granada 18016, Spain
| | - Mischa Lundberg
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Angela Macia
- Department of Pediatrics/Rady Children's Hospital San Diego, School of Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Sara R Heras
- GENYO Centre for Genomics and Oncological Research, Pfizer University of Granada, Andalusian Regional Government, Avda Ilustración, 114, PTS Granada 18016, Spain; Department of Biochemistry and Molecular Biology II, Faculty of Pharmacy, University of Granada, Campus Universitario de Cartuja, 18071 Granada, Spain
| | - Paul M Brennan
- Edinburgh Cancer Research Centre, Western General Hospital, Edinburgh, EH4 2XR, UK
| | - Ryan Lister
- Australian Research Council Centre of Excellence in Plant Energy Biology, School of Molecular Sciences, the University of Western Australia, Perth, WA 6009, Australia; Harry Perkins Institute of Medical Research, Perth, WA 6009, Australia
| | - Jose L Garcia-Perez
- GENYO Centre for Genomics and Oncological Research, Pfizer University of Granada, Andalusian Regional Government, Avda Ilustración, 114, PTS Granada 18016, Spain; MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine (IGMM), University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, UK
| | - Adam D Ewing
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Geoffrey J Faulkner
- Mater Research Institute, University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia; Queensland Brain Institute, University of Queensland, Brisbane, QLD 4072, Australia.
| |
|
188
|
Kosugi S, Momozawa Y, Liu X, Terao C, Kubo M, Kamatani Y. Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing. Genome Biol 2019; 20:117. [PMID: 31159850 PMCID: PMC6547561 DOI: 10.1186/s13059-019-1720-5] [Citation(s) in RCA: 236] [Impact Index Per Article: 47.2] [Received: 10/15/2018] [Accepted: 05/20/2019] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND: Structural variations (SVs) or copy number variations (CNVs) greatly impact the functions of the genes encoded in the genome and are responsible for diverse human diseases. Although a number of existing SV detection algorithms can detect many types of SVs using whole genome sequencing (WGS) data, no single algorithm can call every type of SVs with high precision and high recall. RESULTS: We comprehensively evaluate the performance of 69 existing SV detection algorithms using multiple simulated and real WGS datasets. The results highlight a subset of algorithms that accurately call SVs depending on specific types and size ranges of the SVs and that accurately determine breakpoints, sizes, and genotypes of the SVs. We enumerate potential good algorithms for each SV category, among which GRIDSS, Lumpy, SVseq2, SoftSV, Manta, and Wham are better algorithms in deletion or duplication categories. To improve the accuracy of SV calling, we systematically evaluate the accuracy of overlapping calls between possible combinations of algorithms for every type and size range of SVs. The results demonstrate that both the precision and recall for overlapping calls vary depending on the combinations of specific algorithms rather than the combinations of methods used in the algorithms. CONCLUSION: These results suggest that careful selection of the algorithms for each type and size range of SVs is required for accurate calling of SVs. The selection of specific pairs of algorithms for overlapping calls promises to effectively improve the SV detection accuracy.
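The notion of scoring "overlapping calls" between two algorithms can be sketched as follows. This is a toy illustration, not the evaluation procedure of the study: the 50% reciprocal-overlap matching rule, the `(chrom, start, end)` tuple format, the function names and the example coordinates are all assumptions.

```python
def reciprocal_overlap(a, b):
    """Reciprocal overlap fraction of two (chrom, start, end) intervals."""
    if a[0] != b[0]:
        return 0.0
    inter = min(a[2], b[2]) - max(a[1], b[1])
    if inter <= 0:
        return 0.0
    return min(inter / (a[2] - a[1]), inter / (b[2] - b[1]))


def matches(call, others, min_ro=0.5):
    """True if `call` reciprocally overlaps any interval in `others`."""
    return any(reciprocal_overlap(call, o) >= min_ro for o in others)


def overlapping_calls(calls_a, calls_b, min_ro=0.5):
    """Calls from caller A that are also supported by caller B."""
    return [c for c in calls_a if matches(c, calls_b, min_ro)]


def precision_recall(calls, truth, min_ro=0.5):
    """Precision and recall of a call set against a truth set."""
    true_calls = sum(matches(c, truth, min_ro) for c in calls)
    found_truth = sum(matches(t, calls, min_ro) for t in truth)
    precision = true_calls / len(calls) if calls else 0.0
    recall = found_truth / len(truth) if truth else 0.0
    return precision, recall


# Made-up deletions: a truth set and the output of two hypothetical callers.
truth = [("chr1", 100, 200), ("chr1", 500, 700), ("chr2", 50, 150)]
caller_a = [("chr1", 105, 195), ("chr1", 510, 690), ("chr1", 900, 950)]
caller_b = [("chr1", 100, 210), ("chr1", 520, 700), ("chr2", 60, 150),
            ("chr1", 3000, 3100)]

print("caller A alone:", precision_recall(caller_a, truth))
print("A and B overlap:", precision_recall(overlapping_calls(caller_a, caller_b), truth))
```

With these made-up intervals, requiring support from both callers removes caller A's false positive, so precision rises while recall is unchanged; intersecting call sets can never raise recall, which is why the choice of pair matters.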
Affiliation(s)
- Shunichi Kosugi
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| | - Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| | - Xiaoxi Liu
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| | - Chikashi Terao
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| | - Michiaki Kubo
- RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| | - Yoichiro Kamatani
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
| |
|
189
|
Bourgeois Y, Boissinot S. On the Population Dynamics of Junk: A Review on the Population Genomics of Transposable Elements. Genes (Basel) 2019; 10:419. [PMID: 31151307 PMCID: PMC6627506 DOI: 10.3390/genes10060419] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 04/04/2019] [Revised: 05/05/2019] [Accepted: 05/21/2019] [Indexed: 01/18/2023] Open
Abstract
Transposable elements (TEs) play an important role in shaping genomic organization and structure, and may cause dramatic changes in phenotypes. Despite the genetic load they may impose on their host and their importance in microevolutionary processes such as adaptation and speciation, the number of population genetics studies focused on TEs has been rather limited so far compared to single nucleotide polymorphisms (SNPs). Here, we review the current knowledge about the dynamics of transposable elements at recent evolutionary time scales, and discuss the mechanisms that condition their abundance and frequency. We first discuss non-adaptive mechanisms such as purifying selection and the variable rates of transposition and elimination, and then focus on positive and balancing selection, to finally conclude on the potential role of TEs in causing genomic incompatibilities and eventually speciation. We also suggest possible ways to better model TEs dynamics in a population genomics context by incorporating recent advances in TEs into the rich information provided by SNPs about the demography, selection, and intrinsic properties of genomes.
Affiliation(s)
- Yann Bourgeois
- New York University Abu Dhabi, P.O. 129188, Saadiyat Island, Abu Dhabi, United Arab Emirates.
| | - Stéphane Boissinot
- New York University Abu Dhabi, P.O. 129188, Saadiyat Island, Abu Dhabi, United Arab Emirates.
| |
|
190
|
Fiévet A, Bellanger D, Rieunier G, Dubois d'Enghien C, Sophie J, Calvas P, Carriere JP, Anheim M, Castrioto A, Flabeau O, Degos B, Ewenczyk C, Mahlaoui N, Touzot F, Suarez F, Hully M, Roubertie A, Aladjidi N, Tison F, Antoine-Poirel H, Dahan K, Doummar D, Nougues MC, Ioos C, Rougeot C, Masurel A, Bourjault C, Ginglinger E, Prieur F, Siri A, Bordigoni P, Nguyen K, Philippe N, Bellesme C, Demeocq F, Altuzarra C, Mathieu-Dramard M, Couderc F, Dörk T, Auger N, Parfait B, Abidallah K, Moncoutier V, Collet A, Stoppa-Lyonnet D, Stern MH. Functional classification of ATM variants in ataxia-telangiectasia patients. Hum Mutat 2019; 40:1713-1730. [PMID: 31050087 DOI: 10.1002/humu.23778] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 09/07/2018] [Revised: 04/24/2019] [Accepted: 04/29/2019] [Indexed: 12/11/2022]
Abstract
Ataxia-telangiectasia (A-T) is a recessive disorder caused by biallelic pathogenic variants of ataxia-telangiectasia mutated (ATM). This disease is characterized by progressive ataxia, telangiectasia, immune deficiency, predisposition to malignancies, and radiosensitivity. However, hypomorphic variants may be discovered associated with very atypical phenotypes, raising the importance of evaluating their pathogenic effects. In this study, multiple functional analyses were performed on lymphoblastoid cell lines from 36 patients, comprising 49 ATM variants, 24 being of uncertain significance. Thirteen patients with atypical phenotype and presumably hypomorphic variants were of particular interest to test strength of functional analyses and to highlight discrepancies with typical patients. Western-blot combined with transcript analyses allowed the identification of one missing variant, confirmed suspected splice defects and revealed unsuspected minor transcripts. Subcellular localization analyses confirmed the low level and abnormal cytoplasmic localization of ATM for most A-T cell lines. Interestingly, atypical patients had lower kinase defect and less altered cell-cycle distribution after genotoxic stress than typical patients. In conclusion, this study demonstrated the pathogenic effects of the 49 variants, highlighted the strength of KAP1 phosphorylation test for pathogenicity assessment and allowed the establishment of the Ataxia-TeLangiectasia Atypical Score to predict atypical phenotype. Altogether, we propose strategies for ATM variant detection and classification.
Affiliation(s)
- Alice Fiévet
- Institut Curie, PSL Research University, INSERM U830, Paris, France.,Institut Curie, Hôpital, Service de Génétique, Paris, France
| | - Dorine Bellanger
- Institut Curie, PSL Research University, INSERM U830, Paris, France
| | | | | | - Julia Sophie
- CHU de Toulouse, Service de Génétique Médicale, Toulouse, France
| | - Patrick Calvas
- CHU de Toulouse, Service de Génétique Médicale, Toulouse, France
| | - Jean-Paul Carriere
- Hopital des enfants de Toulouse, Unité de Neuropédiatrie, Toulouse, France
| | - Mathieu Anheim
- CHU de Strasbourg, Service de Neurologie, Strasbourg, France
| | - Anna Castrioto
- CHU de Grenoble, Pole de Psychiatrie et de Neurologie, Grenoble, France
| | - Olivier Flabeau
- CH de la côte Basque, Service de Neurologie, Bayonne, France
| | - Bertrand Degos
- Département des Maladies du Système Nerveux, Hôpitaux Universitaires Pitié Salpêtrière - Charles Foix, Paris, France
| | - Claire Ewenczyk
- Hôpitaux universitaires Pitié Salpêtrière - Charles Foix, Service de Génétique, Paris, France
| | - Nizar Mahlaoui
- Hôpital Necker Enfants Malades, Service d'Immunologie, d'Hématologie et de Rhumatologie Pédiatriques, Paris, France
| | - Fabien Touzot
- Hôpital Necker Enfants Malades, Service d'Immunologie, d'Hématologie et de Rhumatologie Pédiatriques, Paris, France
| | - Felipe Suarez
- Hôpital Necker Enfants Malades, Service d'Hématologie Adulte, Paris, France
| | - Marie Hully
- Hôpital Necker Enfants Malades, Service de Neurologie Pédiatrique, Paris, France
| | - Agathe Roubertie
- CHU de Montpellier, Service de Neuropédiatrie, Montpellier, France
| | | | - François Tison
- CHU de Bordeaux, Département de Neurologie, Bordeaux, France
| | - Hélène Antoine-Poirel
- Centre de Génétique Humaine, Cliniques Universitaires Saint-Luc & Université Catholique de Louvain, Brussels, Belgium
| | - Karine Dahan
- Centre de Génétique Humaine, Cliniques Universitaires Saint-Luc & Université Catholique de Louvain, Brussels, Belgium
| | - Diane Doummar
- Hopital Armand Trousseau, Service de Neurologie Pédiatrique, Paris, France
| | | | - Christine Ioos
- Hôpital Raymond Poincaré, Pôle de Pédiatrie, Garches, France
| | | | - Alice Masurel
- Hopital d'Enfants de Dijon, Service de Génétique, Dijon, France
| | - Caroline Bourjault
- CH de Bretagne sud, Site du Scorff, Service de Pédiatrie, Lorient, France
| | | | - Fabienne Prieur
- CHU de St Etienne, Hôpital Nord, Service de Génétique Médicale, Saint Etienne, France
| | - Aurélie Siri
- CHU de Nancy, Service de Neurologie, Nancy, France
| | - Pierre Bordigoni
- CHU Nancy, Hôpitaux de Brabois, Service de Pédiatrie II, Vandoeuvre, France
| | - Karine Nguyen
- Département de Génétique Médicale, Hopital de la Timone, Marseille, France
| | - Noel Philippe
- Hopital Debrousse, Service d'Hématologie Pédiatrique, Lyon, France
| | - Céline Bellesme
- GH Cochin-saint-Vincent de Paul, Service d'Endocrinologie et de Neurologie Pédiatrique, Paris, France
| | - François Demeocq
- CHU de Clermont-Ferrand, Hôtel Dieu, Service de Pédiatrie B, Clermont-Ferrand, France
| | | | | | - Fanny Couderc
- CH d'Aix en Provence - du Pays d'Aix, Service de Pédiatrie, Aix en Provence, France
| | - Thilo Dörk
- Gynecology Research Unit, Hannover Medical School, Hannover, Germany
| | - Nathalie Auger
- Gustave Roussy, Service Génétique des Tumeurs, Villejuif, France
| | - Béatrice Parfait
- Centre de ressources Biologiques, Hôpital Cochin, Assistance Publique-Hôpitaux de Paris, Paris, France
| | | | | | - Agnès Collet
- Institut Curie, Hôpital, Service de Génétique, Paris, France
| | - Dominique Stoppa-Lyonnet
- Institut Curie, PSL Research University, INSERM U830, Paris, France.,Institut Curie, Hôpital, Service de Génétique, Paris, France.,University Paris Descartes, Sorbonne Paris Cité, Paris, France
| | - Marc-Henri Stern
- Institut Curie, PSL Research University, INSERM U830, Paris, France.,Institut Curie, Hôpital, Service de Génétique, Paris, France
|
191
|
Sultana T, van Essen D, Siol O, Bailly-Bechet M, Philippe C, Zine El Aabidine A, Pioger L, Nigumann P, Saccani S, Andrau JC, Gilbert N, Cristofari G. The Landscape of L1 Retrotransposons in the Human Genome Is Shaped by Pre-insertion Sequence Biases and Post-insertion Selection. Mol Cell 2019; 74:555-570.e7. [PMID: 30956044 DOI: 10.1016/j.molcel.2019.02.036] [Citation(s) in RCA: 88] [Impact Index Per Article: 17.6] [Received: 08/01/2018] [Revised: 01/28/2019] [Accepted: 02/25/2019] [Indexed: 01/10/2023]
Abstract
L1 retrotransposons are transposable elements and major contributors of genetic variation in humans. Where L1 integrates into the genome can directly impact human evolution and disease. Here, we experimentally induced L1 retrotransposition in cells and mapped integration sites at nucleotide resolution. At local scales, L1 integration is mostly restricted by genome sequence biases and the specificity of the L1 machinery. At regional scales, L1 shows a broad capacity for integration into all chromatin states, in contrast to other known mobile genetic elements. However, integration is influenced by the replication timing of target regions, suggesting a link to host DNA replication. The distribution of new L1 integrations differs from that of preexisting L1 copies, which are significantly reshaped by natural selection. Our findings reveal that the L1 machinery has evolved to efficiently target all genomic regions and underline a predominant role for post-integrative processes on the distribution of endogenous L1 elements.
Affiliation(s)
- Tania Sultana
- Université Côte d'Azur, Inserm, CNRS, IRCAN, Nice, France
| | | | - Oliver Siol
- Institut de Génétique Humaine, University of Montpellier, CNRS, Montpellier, France
| | | | | | - Amal Zine El Aabidine
- Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
| | - Léo Pioger
- Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
| | - Pilvi Nigumann
- Université Côte d'Azur, Inserm, CNRS, IRCAN, Nice, France
| | - Simona Saccani
- Université Côte d'Azur, Inserm, CNRS, IRCAN, Nice, France
| | - Jean-Christophe Andrau
- Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
| | - Nicolas Gilbert
- Institut de Génétique Humaine, University of Montpellier, CNRS, Montpellier, France; Institut de Médecine Régénératrice et de Biothérapie, Inserm U1183, CHU Montpellier, Montpellier, France
|
192
|
Lerat E, Casacuberta J, Chaparro C, Vieira C. On the Importance to Acknowledge Transposable Elements in Epigenomic Analyses. Genes (Basel) 2019; 10:genes10040258. [PMID: 30935103 PMCID: PMC6523952 DOI: 10.3390/genes10040258] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 02/04/2019] [Revised: 03/27/2019] [Accepted: 03/27/2019] [Indexed: 12/21/2022] Open
Abstract
Eukaryotic genomes comprise a large proportion of repeated sequences, an important fraction of which are transposable elements (TEs). TEs are mobile elements that have a significant impact on genome evolution and on gene functioning. Although some TE insertions could provide adaptive advantages to species, transposition is a highly mutagenic event that has to be tightly controlled to ensure its viability. Genomes have evolved sophisticated mechanisms to control TE activity, the most important being epigenetic silencing. However, the epigenetic control of TEs can also affect genes located nearby that can become epigenetically regulated. It has been proposed that the combination of TE mobilization and the induced changes in the epigenetic landscape could allow a rapid phenotypic adaptation to global environmental changes. In this review, we argue the crucial need to take into account the repeated part of genomes when studying the global impact of epigenetic modifications on an organism. We emphasize more particularly why it is important to carefully consider TEs and what bioinformatic tools can be used to do so.
Affiliation(s)
- Emmanuelle Lerat
- CNRS, Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, UMR 5558, F-69622 Villeurbanne, France.
| | - Josep Casacuberta
- Center for Research in Agricultural Genomics, CRAG (CSIC-IRTA-UAB-UB), Campus UAB, Cerdanyola del Vallès, 08193 Barcelona, Spain.
| | - Cristian Chaparro
- CNRS, IHPE UMR 5244, University of Perpignan Via Domitia, IFREMER, University Montpellier, F-66860 Perpignan, France.
| | - Cristina Vieira
- CNRS, Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, UMR 5558, F-69622 Villeurbanne, France.
|
193
|
Dynamic Methylation of an L1 Transduction Family during Reprogramming and Neurodifferentiation. Mol Cell Biol 2019; 39:MCB.00499-18. [PMID: 30692270 PMCID: PMC6425141 DOI: 10.1128/mcb.00499-18] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/24/2018] [Accepted: 01/11/2019] [Indexed: 01/28/2023] Open
Abstract
The retrotransposon LINE-1 (L1) is a significant source of endogenous mutagenesis in humans. In each individual genome, a few retrotransposition-competent L1s (RC-L1s) can generate new heritable L1 insertions in the early embryo, primordial germ line, and germ cells. L1 retrotransposition can also occur in the neuronal lineage and cause somatic mosaicism. Although DNA methylation mediates L1 promoter repression, the temporal pattern of methylation applied to individual RC-L1s during neurogenesis is unclear. Here, we identified a de novo L1 insertion in a human induced pluripotent stem cell (hiPSC) line via retrotransposon capture sequencing (RC-seq). The L1 insertion was full-length and carried 5' and 3' transductions. The corresponding donor RC-L1 was part of a large and recently active L1 transduction family and was highly mobile in a cultured-cell L1 retrotransposition reporter assay. Notably, we observed distinct and dynamic DNA methylation profiles for the de novo L1 and members of its extended transduction family during neuronal differentiation. These experiments reveal how a de novo L1 insertion in a pluripotent stem cell is rapidly recognized and repressed, albeit incompletely, by the host genome during neurodifferentiation, while retaining potential for further retrotransposition.
|
194
|
Steranka JP, Tang Z, Grivainis M, Huang CRL, Payer LM, Rego FOR, Miller TLA, Galante PAF, Ramaswami S, Heguy A, Fenyö D, Boeke JD, Burns KH. Transposon insertion profiling by sequencing (TIPseq) for mapping LINE-1 insertions in the human genome. Mob DNA 2019; 10:8. [PMID: 30899333 PMCID: PMC6407172 DOI: 10.1186/s13100-019-0148-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 10/23/2018] [Accepted: 01/14/2019] [Indexed: 12/14/2022] Open
Abstract
Background: Transposable elements make up a significant portion of the human genome. Accurately locating these mobile DNAs is vital to understand their role as a source of structural variation and somatic mutation. To this end, laboratories have developed strategies to selectively amplify or otherwise enrich transposable element insertion sites in genomic DNA. Results: Here we describe a technique, Transposon Insertion Profiling by sequencing (TIPseq), to map Long INterspersed Element 1 (LINE-1, L1) retrotransposon insertions in the human genome. This method uses vectorette PCR to amplify species-specific L1 (L1PA1) insertion sites followed by paired-end Illumina sequencing. In addition to providing a step-by-step molecular biology protocol, we offer users a guide to our pipeline for data analysis, TIPseqHunter. Our recent studies in pancreatic and ovarian cancer demonstrate the ability of TIPseq to identify invariant (fixed), polymorphic (inherited variants), as well as somatically-acquired L1 insertions that distinguish cancer genomes from a patient’s constitutional make-up. Conclusions: TIPseq provides an approach for amplifying evolutionarily young, active transposable element insertion sites from genomic DNA. Our rationale and variations on this protocol may be useful to those mapping L1 and other mobile elements in complex genomes.
Affiliation(s)
- Jared P Steranka
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA.,McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA
| | - Zuojian Tang
- Department for Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY 10016 USA.,Institute for Systems Genetics, NYU Langone Health, New York, NY 10016 USA
| | - Mark Grivainis
- Department for Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY 10016 USA.,Institute for Systems Genetics, NYU Langone Health, New York, NY 10016 USA
| | - Cheng Ran Lisa Huang
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA
| | - Lindsay M Payer
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA
| | - Fernanda O R Rego
- Centro de Oncologia Molecular, Hospital Sírio-Libanês, São Paulo, Brazil
| | - Thiago Luiz Araujo Miller
- Centro de Oncologia Molecular, Hospital Sírio-Libanês, São Paulo, Brazil.,Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brazil
| | - Pedro A F Galante
- Centro de Oncologia Molecular, Hospital Sírio-Libanês, São Paulo, Brazil
| | - Sitharam Ramaswami
- Genome Technology Center, Division of Advanced Research Technologies, NYU Langone Health, New York, NY USA
| | - Adriana Heguy
- Genome Technology Center, Division of Advanced Research Technologies, NYU Langone Health, New York, NY USA
| | - David Fenyö
- Department for Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY 10016 USA.,Institute for Systems Genetics, NYU Langone Health, New York, NY 10016 USA
| | - Jef D Boeke
- Institute for Systems Genetics, NYU Langone Health, New York, NY 10016 USA
| | - Kathleen H Burns
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA.,McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205 USA
|
195
|
Lammers F, Blumer M, Rücklé C, Nilsson MA. Retrophylogenomics in rorquals indicate large ancestral population sizes and a rapid radiation. Mob DNA 2019; 10:5. [PMID: 30679961 PMCID: PMC6340175 DOI: 10.1186/s13100-018-0143-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 10/23/2018] [Accepted: 12/18/2018] [Indexed: 02/03/2023] Open
Abstract
Background: Baleen whales (Mysticeti) are the largest animals on earth and their evolutionary history has been studied in detail, but some relationships still remain contentious. In particular, reconstructing the phylogenetic position of the gray whales (Eschrichtiidae) has been complicated by evolutionary processes such as gene flow and incomplete lineage sorting (ILS). Here, whole-genome sequencing data of the extant baleen whale radiation allowed us to identify transposable element (TE) insertions in order to perform phylogenomic analyses and measure germline insertion rates of TEs. Baleen whales exhibit the slowest nucleotide substitution rate among mammals, hence we additionally examined the evolutionary insertion rates of TE insertions across the genomes. Results: In eleven whole-genome sequences representing the extant radiation of baleen whales, we identified 91,859 CHR-SINE insertions that were used to reconstruct the phylogeny with different approaches as well as perform evolutionary network analyses and a quantification of conflicting phylogenetic signals. Our results indicate that the radiation of rorquals and gray whales might not be bifurcating. The morphologically derived gray whales are placed inside the rorqual group, as the sister-species to humpback and fin whales. Detailed investigation of TE insertion rates confirms that a mutational slowdown in the whale lineage is present but less pronounced for TEs than for nucleotide substitutions. Conclusions: Detection of TE insertions based on whole-genome sequencing showed that the speciation processes in baleen whales represent a rapid radiation. Large genome-scale TE data sets additionally make it possible to understand retrotransposition rates in non-model organisms and show the potential for TE calling methods to study the evolutionary history of species.
Affiliation(s)
- Fritjof Lammers
- Senckenberg Biodiversity and Climate Research Centre, Senckenberg Gesellschaft für Naturforschung, Senckenberganlage 25, 60325 Frankfurt am Main, Germany.,LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Senckenberganlage 25, 60325 Frankfurt am Main, Germany.,Institute for Ecology, Evolution and Diversity, Goethe University Frankfurt, Biologicum, Max-von-Laue-Straße 13, 60439 Frankfurt am Main, Germany
| | - Moritz Blumer
- Senckenberg Biodiversity and Climate Research Centre, Senckenberg Gesellschaft für Naturforschung, Senckenberganlage 25, 60325 Frankfurt am Main, Germany
| | - Cornelia Rücklé
- Senckenberg Biodiversity and Climate Research Centre, Senckenberg Gesellschaft für Naturforschung, Senckenberganlage 25, 60325 Frankfurt am Main, Germany
| | - Maria A Nilsson
- Senckenberg Biodiversity and Climate Research Centre, Senckenberg Gesellschaft für Naturforschung, Senckenberganlage 25, 60325 Frankfurt am Main, Germany.,LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Senckenberganlage 25, 60325 Frankfurt am Main, Germany
|
196
|
Transposable Elements: Classification, Identification, and Their Use As a Tool For Comparative Genomics. Methods Mol Biol 2019; 1910:177-207. [PMID: 31278665 DOI: 10.1007/978-1-4939-9074-0_6] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Indexed: 12/16/2022]
Abstract
Most genomes are populated by hundreds of thousands of sequences originating from mobile elements. On the one hand, these sequences present a real challenge in the process of genome analysis and annotation. On the other hand, they are very interesting biological subjects involved in many cellular processes. Here we present an overview of transposable element biodiversity, and we discuss different approaches to transposable element detection and analysis.
|
197
|
Thomas J, Perron H, Feschotte C. Variation in proviral content among human genomes mediated by LTR recombination. Mob DNA 2018; 9:36. [PMID: 30568734 PMCID: PMC6298018 DOI: 10.1186/s13100-018-0142-3] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 09/17/2018] [Accepted: 11/29/2018] [Indexed: 01/23/2023] Open
Abstract
Background: Human endogenous retroviruses (HERVs) occupy a substantial fraction of the genome and impact cellular function with both beneficial and deleterious consequences. The vast majority of HERV sequences descend from ancient retroviral families no longer capable of infection or genomic propagation. In fact, most are no longer represented by full-length proviruses but by solitary long terminal repeats (solo LTRs) that arose via non-allelic recombination events between the two LTRs of a proviral insertion. Because LTR-LTR recombination events may occur long after proviral insertion but are challenging to detect in resequencing data, we hypothesize that this mechanism is a source of genomic variation in the human population that remains vastly underestimated. Results: We developed a computational pipeline specifically designed to capture dimorphic proviral/solo HERV allelic variants from short-read genome sequencing data. When applied to 279 individuals sequenced as part of the Simons Genome Diversity Project, the pipeline retrieves most of the dimorphic loci previously reported for the HERV-K(HML2) subfamily as well as dozens of additional candidates, including members of the HERV-H and HERV-W families previously involved in human development and disease. We experimentally validate several of these newly discovered dimorphisms, including the first reported instance of an unfixed HERV-W provirus and an HERV-H locus driving a transcript (ESRG) implicated in the maintenance of embryonic stem cell pluripotency. Conclusions: Our findings indicate that human proviral content exhibits more extensive interindividual variation than previously recognized, which has important bearings for deciphering the contribution of HERVs to human physiology and disease. Because LTR retroelements and LTR recombination are ubiquitous in eukaryotes, our computational pipeline should facilitate the mapping of this type of genomic variation for a wide range of organisms.
Affiliation(s)
- Jainy Thomas
- Department of Human Genetics, University of Utah School of Medicine, 15 North 2030 East, Rm 5100, Salt Lake City, UT 84112 USA
| | - Hervé Perron
- GeNeuro, Plan-les-Ouates, Geneva, Switzerland.,Université Claude Bernard, Lyon, France
| | - Cédric Feschotte
- Department of Molecular Biology and Genetics, Cornell University, 107 Biotechnology Building, Ithaca, NY 14853 USA
|
198
|
Bae J, Lee KW, Islam MN, Yim HS, Park H, Rho M. iMGEins: detecting novel mobile genetic elements inserted in individual genomes. BMC Genomics 2018; 19:944. [PMID: 30563451 PMCID: PMC6299635 DOI: 10.1186/s12864-018-5290-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/03/2018] [Accepted: 11/20/2018] [Indexed: 11/10/2022] Open
Abstract
Background: Recent advances in sequencing technology have allowed us to investigate personal genomes to find structural variations, which have been studied extensively to identify their association with the physiology of diseases such as cancer. In particular, mobile genetic elements (MGEs) are one of the major constituents of the human genome, and cause genome instability by insertion, mutation, and rearrangement. Result: We have developed a new program, iMGEins, to identify such novel MGEs by using sequencing reads of individual genomes, and to explore the breakpoints with the supporting reads and MGEs detected. iMGEins is the first MGE detection program that integrates three algorithmic components: discordant read-pair mapping, split-read mapping, and insertion sequence assembly. Our evaluation results showed its outstanding performance in detecting novel MGEs from simulated genomes, as well as real personal genomes. In detail, the average recall and precision rates of iMGEins are 96.67% and 100%, respectively, which are the highest among the programs compared. In the testing with real human genomes of the NA12878 sample, iMGEins shows the highest accuracy in detecting MGEs within 20 bp proximity of the breakpoints annotated. Conclusion: In order to study the dynamics of MGEs in individual genomes, iMGEins was developed to accurately detect breakpoints and report inserted MGEs. Compared with other programs, iMGEins has valuable features of identifying novel MGEs and assembling the MGEs inserted.
Affiliation(s)
- Junwoo Bae
- Department of Electronics and Computer Engineering, Hanyang University, Seoul, Korea
| | - Kyeong Won Lee
- Marine Biotechnology Research Center, Korea Institute of Ocean Science and Technology, Ansan, Korea
| | - Mohammad Nazrul Islam
- Marine Biotechnology Research Center, Korea Institute of Ocean Science and Technology, Ansan, Korea.,Department of Marine Biotechnology, Korea University of Science and Technology, Daejeon, Korea.,Department of Biotechnology, Sher-e-Bangla Agricultural University, Dhaka, 1207, Bangladesh
| | - Hyung-Soon Yim
- Marine Biotechnology Research Center, Korea Institute of Ocean Science and Technology, Ansan, Korea.,Department of Marine Biotechnology, Korea University of Science and Technology, Daejeon, Korea
| | - Heejin Park
- Department of Computer Science and Engineering, Hanyang University, Seoul, Korea.,Department of Biomedical Informatics, Hanyang University, Seoul, Korea
| | - Mina Rho
- Department of Computer Science and Engineering, Hanyang University, Seoul, Korea.,Department of Biomedical Informatics, Hanyang University, Seoul, Korea
|
199
|
Tavares E, Tang CY, Vig A, Li S, Billingsley G, Sung W, Vincent A, Thiruvahindrapuram B, Héon E. Retrotransposon insertion as a novel mutational event in Bardet-Biedl syndrome. Mol Genet Genomic Med 2018; 7:e00521. [PMID: 30484961 PMCID: PMC6393654 DOI: 10.1002/mgg3.521] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/19/2018] [Revised: 08/23/2018] [Accepted: 10/26/2018] [Indexed: 01/12/2023] Open
Abstract
Background: Bardet‐Biedl syndrome (BBS) is an autosomal recessive pleiotropic disorder of the primary cilia that leads to severe visual loss in the teenage years. Approximately 80% of BBS cases are explained by mutations in one of the 21 identified genes. Documented causative mutation types include missense, nonsense, copy number variation (CNV), frameshift deletions or insertions, and splicing variants. Methods: Whole genome sequencing was performed on a patient affected with BBS for whom no mutations were identified using clinically approved genetic testing of the known genes. Analysis of the WGS was done using internal protocols and publicly available algorithms. The phenotype was defined by retrospective chart review. Results: We document a female affected with BBS carrying the most common BBS1 mutation (BBS1: Met390Arg) on the maternal allele and an insertion of a ~1.7‐kb retrotransposon in exon 13 on the paternal allele. This retrotransposon insertion was not automatically annotated by the standard variant calling protocols used. This novel variant was identified by visual inspection of the alignment file followed by specific genome analysis with an available algorithm for transposable elements. Conclusion: This report documents a novel mutation type associated with BBS and highlights the importance of systematically performing transposon detection analysis on WGS data of unsolved cases.
Affiliation(s)
- Erika Tavares
- Genetics and Genome Biology, Hospital for Sick Children, Toronto, Ontario, Canada
| | - Chen Yu Tang
- Genetics and Genome Biology, Hospital for Sick Children, Toronto, Ontario, Canada
| | - Anjali Vig
- Genetics and Genome Biology, Hospital for Sick Children, Toronto, Ontario, Canada.,Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada
| | - Shuning Li
- Genetics and Genome Biology, Hospital for Sick Children, Toronto, Ontario, Canada
| | - Gail Billingsley
- Genetics and Genome Biology, Hospital for Sick Children, Toronto, Ontario, Canada
| | - Wilson Sung
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario, Canada
| | - Ajoy Vincent
- Genetics and Genome Biology, Hospital for Sick Children, Toronto, Ontario, Canada.,Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada.,Ophthalmology and Vision Sciences, Hospital for Sick Children, Toronto, Ontario, Canada
| | | | - Elise Héon
- Genetics and Genome Biology, Hospital for Sick Children, Toronto, Ontario, Canada.,Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada.,Ophthalmology and Vision Sciences, Hospital for Sick Children, Toronto, Ontario, Canada
|
200
|
Manthey JD, Moyle RG, Boissinot S. Multiple and Independent Phases of Transposable Element Amplification in the Genomes of Piciformes (Woodpeckers and Allies). Genome Biol Evol 2018; 10:1445-1456. [PMID: 29850797 PMCID: PMC6007501 DOI: 10.1093/gbe/evy105] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Accepted: 05/22/2018] [Indexed: 12/15/2022] Open
Abstract
The small and conserved genomes of birds are likely a result of flight-related metabolic constraints. Recombination-driven deletions and minimal transposable element (TE) expansions have led to continually shrinking genomes during evolution of many lineages of volant birds. Despite constraints of genome size in birds, we identified multiple waves of amplification of TEs in Piciformes (woodpeckers, honeyguides, toucans, and barbets). Relative to other bird species’ genomic TE abundance (< 10% of genome), we found ∼17–30% TE content in multiple clades within Piciformes. Several families of the retrotransposon superfamily chicken repeat 1 (CR1) expanded in at least three different waves of activity. The most recent CR1 expansions (∼4–7% of genome) preceded bursts of diversification in the woodpecker clade and in the American barbets + toucans clade. Additionally, we identified several thousand polymorphic CR1 insertions (hundreds per individual) in three closely related woodpecker species. Woodpecker CR1 insertion polymorphisms are maintained at lower frequencies than single nucleotide polymorphisms indicating that purifying selection is acting against additional CR1 copies and that these elements impose a fitness cost on their host. These findings provide evidence of large scale and ongoing TE activity in avian genomes despite continual constraint on genome size.
Affiliation(s)
- Joseph D Manthey
- New York University Abu Dhabi, UAE.,Department of Biological Sciences, Texas Tech University
| | - Robert G Moyle
- Department of Ecology and Evolutionary Biology, Biodiversity Institute, University of Kansas
|