1
An Z, Jiang A, Chen J. Toward understanding the role of genomic repeat elements in neurodegenerative diseases. Neural Regen Res 2025; 20:646-659. [PMID: 38886931 DOI: 10.4103/nrr.nrr-d-23-01568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 03/02/2024] [Indexed: 06/20/2024] Open
Abstract
Neurodegenerative diseases cause great medical and economic burdens for both patients and society; however, the complex molecular mechanisms thereof are not yet well understood. With the development of high-coverage sequencing technology, researchers have started to notice that genomic repeat regions, previously neglected in search of disease culprits, are active contributors to multiple neurodegenerative diseases. In this review, we describe the association between repeat element variants and multiple degenerative diseases through genome-wide association studies and targeted sequencing. We discuss the identification of disease-relevant repeat element variants, further powered by the advancement of long-read sequencing technologies and their related tools, and summarize recent findings in the molecular mechanisms of repeat element variants in brain degeneration, such as those causing transcriptional silencing or RNA-mediated gain of toxic function. Furthermore, we describe how in silico predictions using innovative computational models, such as deep learning language models, could enhance and accelerate our understanding of the functional impact of repeat element variants. Finally, we discuss future directions to advance current findings for a better understanding of neurodegenerative diseases and the clinical applications of genomic repeat elements.
Affiliation(s)
- Zhengyu An
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Aidi Jiang
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Jingqi Chen
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
  - MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - Zhangjiang Fudan International Innovation Center, Shanghai, China
2
Wang ZY, Ge LP, Ouyang Y, Jin X, Jiang YZ. Targeting transposable elements in cancer: developments and opportunities. Biochim Biophys Acta Rev Cancer 2024; 1879:189143. [PMID: 38936517 DOI: 10.1016/j.bbcan.2024.189143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 05/23/2024] [Accepted: 06/19/2024] [Indexed: 06/29/2024]
Abstract
Transposable elements (TEs), comprising nearly 50% of the human genome, have transitioned from being perceived as "genomic junk" to key players in cancer progression. Contemporary research links TE regulatory disruptions with cancer development, underscoring their therapeutic potential. Advances in long-read sequencing, computational analytics, single-cell sequencing, proteomics, and CRISPR-Cas9 technologies have enriched our understanding of TEs' clinical implications, notably their impact on genome architecture, gene regulation, and evolutionary processes. In cancer, TEs, including long interspersed element-1 (LINE-1), Alus, and long terminal repeat (LTR) elements, demonstrate altered patterns, influencing both tumorigenic and tumor-suppressive mechanisms. TE-derived nucleic acids and tumor antigens play critical roles in tumor immunity, bridging innate and adaptive responses. Given their central role in oncology, TE-targeted therapies, particularly through reverse transcriptase inhibitors and epigenetic modulators, represent a novel avenue in cancer treatment. Combining these TE-focused strategies with existing chemotherapy or immunotherapy regimens could enhance efficacy and offer a new dimension in cancer treatment. This review delves into recent TE detection advancements, explores their multifaceted roles in tumorigenesis and immune regulation, discusses emerging diagnostic and therapeutic approaches centered on TEs, and anticipates future directions in cancer research.
Affiliation(s)
- Zi-Yu Wang
  - Department of Breast Surgery, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Li-Ping Ge
  - Department of Breast Surgery, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Yang Ouyang
  - Department of Breast Surgery, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Xi Jin
  - Department of Breast Surgery, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Yi-Zhou Jiang
  - Department of Breast Surgery, Fudan University Shanghai Cancer Center; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
3
Zhang X, Celic I, Mitchell H, Stuckert S, Vedula L, Han J. Comprehensive profiling of L1 retrotransposons in mouse. Nucleic Acids Res 2024; 52:5166-5178. [PMID: 38647072 PMCID: PMC11109951 DOI: 10.1093/nar/gkae273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 03/25/2024] [Accepted: 04/06/2024] [Indexed: 04/25/2024] Open
Abstract
L1 elements are retrotransposons currently active in mammals. Although L1s are typically silenced in most normal tissues, elevated L1 expression is associated with a variety of conditions, including cancer, aging, infertility and neurological disease. These associations have raised interest in the mapping of human endogenous de novo L1 insertions, and a variety of methods have been developed for this purpose. Adapting these methods to mouse genomes would allow us to monitor endogenous in vivo L1 activity in controlled, experimental conditions using mouse disease models. Here, we use a modified version of transposon insertion profiling, called nanoTIPseq, to selectively enrich young mouse L1s. By linking this amplification step with nanopore sequencing, we identified >95% annotated L1s from C57BL/6 genomic DNA using only 200 000 sequencing reads. In the process, we discovered 82 unannotated L1 insertions from a single C57BL/6 genome. Most of these unannotated L1s were near repetitive sequence and were not found with short-read TIPseq. We used nanoTIPseq on individual mouse breast cancer cells and were able to identify the annotated and unannotated L1s, as well as new insertions specific to individual cells, providing proof of principle for using nanoTIPseq to interrogate retrotransposition activity at the single-cell level in vivo.
Affiliation(s)
- Xuanming Zhang
  - Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, LA 70112, USA
  - Tulane Cancer Center, Tulane University School of Medicine, New Orleans, LA 70112, USA
- Ivana Celic
  - Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, LA 70112, USA
  - Tulane Cancer Center, Tulane University School of Medicine, New Orleans, LA 70112, USA
- Hannah Mitchell
  - Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, LA 70112, USA
  - Tulane Cancer Center, Tulane University School of Medicine, New Orleans, LA 70112, USA
- Sam Stuckert
  - Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, LA 70112, USA
  - Tulane Cancer Center, Tulane University School of Medicine, New Orleans, LA 70112, USA
- Lalitha Vedula
  - Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, LA 70112, USA
  - Tulane Cancer Center, Tulane University School of Medicine, New Orleans, LA 70112, USA
- Jeffrey S Han
  - Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, LA 70112, USA
  - Tulane Cancer Center, Tulane University School of Medicine, New Orleans, LA 70112, USA
4
Janecki DM, Sen R, Szóstak N, Kajdasz A, Kordyś M, Plawgo K, Pandakov D, Philips A, Warkocki Z. LINE-1 mRNA 3' end dynamics shape its biology and retrotransposition potential. Nucleic Acids Res 2024; 52:3327-3345. [PMID: 38197223 PMCID: PMC11014359 DOI: 10.1093/nar/gkad1251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 12/16/2023] [Accepted: 12/20/2023] [Indexed: 01/11/2024] Open
Abstract
LINE-1 (L1) retrotransposons are mobile genetic elements that create new genomic insertions by a copy-paste mechanism involving L1 RNA/RNP intermediates. L1 encodes two ORFs, of which L1-ORF2p nicks genomic DNA and reverse transcribes L1 mRNA using the nicked DNA as a primer which base-pairs with poly(A) tail of L1 mRNA. To better understand the importance of non-templated L1 3' ends' dynamics and the interplay between L1 3' and 5' ends, we investigated the effects of genomic knock-outs and temporal knock-downs of XRN1, DCP2, and other factors. We hypothesized that in the absence of XRN1, the major 5'→3' exoribonuclease, there would be more L1 mRNA and retrotransposition. Conversely, we observed that loss of XRN1 decreased L1 retrotransposition. This occurred despite slight stabilization of L1 mRNA, but with decreased L1 RNP formation. Similarly, loss of DCP2, the catalytic subunit of the decapping complex, lowered retrotransposition despite increased steady-state levels of L1 proteins. In both XRN1 and DCP2 depletions we observed shortening of L1 3' poly(A) tails and their increased uridylation by TUT4/7. We explain the observed reduction of L1 retrotransposition by the changed qualities of non-templated L1 mRNA 3' ends demonstrating the important role of L1 3' end dynamics in L1 biology.
Affiliation(s)
- Damian M Janecki
  - Department of RNA Metabolism, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Raneet Sen
  - Department of RNA Metabolism, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Natalia Szóstak
  - Laboratory of Bioinformatics, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Arkadiusz Kajdasz
  - Department of RNA Metabolism, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Martyna Kordyś
  - Department of RNA Metabolism, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Kinga Plawgo
  - Department of RNA Metabolism, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Dmytro Pandakov
  - Department of RNA Metabolism, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Anna Philips
  - Laboratory of Bioinformatics, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Zbigniew Warkocki
  - Department of RNA Metabolism, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
5
Lee M, Ahmad SF, Xu J. Regulation and function of transposable elements in cancer genomes. Cell Mol Life Sci 2024; 81:157. [PMID: 38556602 PMCID: PMC10982106 DOI: 10.1007/s00018-024-05195-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 02/28/2024] [Accepted: 03/01/2024] [Indexed: 04/02/2024]
Abstract
Over half of human genomic DNA is composed of repetitive sequences generated throughout evolution by prolific mobile genetic parasites called transposable elements (TEs). Long disregarded as "junk" or "selfish" DNA, TEs are increasingly recognized as formative elements in genome evolution, wired intimately into the structure and function of the human genome. Advances in sequencing technologies and computational methods have ushered in an era of unprecedented insight into how TE activity impacts human biology in health and disease. Here we discuss the current views on how TEs have shaped the regulatory landscape of the human genome, how TE activity is implicated in human cancers, and how recent findings motivate novel strategies to leverage TE activity for improved cancer therapy. Given the crucial role of methodological advances in TE biology, we pair our conceptual discussions with an in-depth review of the inherent technical challenges in studying repeats, specifically related to structural variation, expression analyses, and chromatin regulation. Lastly, we provide a catalog of existing and emerging assays and bioinformatic software that altogether are enabling the most sophisticated and comprehensive investigations yet into the regulation and function of interspersed repeats in cancer genomes.
Affiliation(s)
- Michael Lee
  - Department of Pediatrics, Children's Medical Center Research Institute, University of Texas Southwestern Medical Center, 6000 Harry Hines Blvd., Dallas, TX, 75390, USA
- Syed Farhan Ahmad
  - Department of Pathology, Center of Excellence for Leukemia Studies, St. Jude Children's Research Hospital, 262 Danny Thomas Place - MS 345, Memphis, TN, 38105, USA
- Jian Xu
  - Department of Pathology, Center of Excellence for Leukemia Studies, St. Jude Children's Research Hospital, 262 Danny Thomas Place - MS 345, Memphis, TN, 38105, USA
6
Audano PA, Beck CR. Small polymorphisms are a source of ancestral bias in structural variant breakpoint placement. Genome Res 2024; 34:7-19. [PMID: 38176712 PMCID: PMC10904011 DOI: 10.1101/gr.278203.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 01/02/2024] [Indexed: 01/06/2024]
Abstract
High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥50 bp) has improved to near base pair precision. Despite these advances, many SV breakpoint locations are subject to systematic bias affecting variant representation. To understand why SV breakpoints are inconsistent across samples, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identify 882 SV insertions and 180 SV deletions with variable breakpoints not anchored in tandem repeats (TRs) or segmental duplications (SDs). SVs called from aligned sequencing reads increase breakpoint disagreements by 2×-16×. Sequence accuracy had a minimal impact on breakpoints, but we observe a strong effect of ancestry. We confirm that SNP and indel polymorphisms are enriched at shifted breakpoints and are also absent from variant callsets. Breakpoint homology increases the likelihood of imprecise SV calls and the distance they are shifted, and tandem duplications are the most heavily affected SVs. Because graph genome methods normalize SV calls across samples, we investigated graphs generated by two different methods and find the resulting breakpoints are subject to other technical biases affecting breakpoint accuracy. The breakpoint inconsistencies we characterize affect ∼5% of the SVs called in a human genome and can impact variant interpretation and annotation. These limitations underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoints, and increase the value of callsets for investigating breakpoint features.
Affiliation(s)
- Peter A Audano
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- Christine R Beck
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
  - Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut 06030, USA
7
Yang L, Metzger GA, Padilla Del Valle R, Delgadillo Rubalcaba D, McLaughlin RN. Evolutionary insights from profiling LINE-1 activity at allelic resolution in a single human genome. EMBO J 2024; 43:112-131. [PMID: 38177314 PMCID: PMC10883270 DOI: 10.1038/s44318-023-00007-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 10/18/2023] [Accepted: 11/10/2023] [Indexed: 01/06/2024] Open
Abstract
Transposable elements have created the majority of the sequence in many genomes. In mammals, LINE-1 retrotransposons have been expanding for more than 100 million years as distinct, consecutive lineages; however, the drivers of this recurrent lineage emergence and disappearance are unknown. Most human genome assemblies provide a record of this ancient evolution, but fail to resolve ongoing LINE-1 retrotranspositions. Utilizing the human CHM1 long-read-based haploid assembly, we identified and cloned all full-length, intact LINE-1s, and found 29 LINE-1s with measurable in vitro retrotransposition activity. Among individuals, these LINE-1s varied in their presence, their allelic sequences, and their activity. We found that recently retrotransposed LINE-1s tend to be active in vitro and polymorphic in the population relative to more ancient LINE-1s. However, some rare allelic forms of old LINE-1s retain activity, suggesting older lineages can persist longer than expected. Finally, in LINE-1s with in vitro activity and in vivo fitness, we identified mutations that may have increased replication in ancient genomes and may prove promising candidates for mechanistic investigations of the drivers of LINE-1 evolution and which LINE-1 sequences contribute to human disease.
Affiliation(s)
- Lei Yang
  - Pacific Northwest Research Institute, Seattle, WA, USA
- Ricky Padilla Del Valle
  - Pacific Northwest Research Institute, Seattle, WA, USA
  - Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA, USA
- Richard N McLaughlin
  - Pacific Northwest Research Institute, Seattle, WA, USA
  - Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA, USA
8
Zhang X, Celic I, Mitchell H, Stuckert S, Vedula L, Han JS. Comprehensive profiling of L1 retrotransposons in mouse. bioRxiv 2023:2023.11.13.566638. [PMID: 38014156 PMCID: PMC10680791 DOI: 10.1101/2023.11.13.566638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
L1 elements are retrotransposons currently active in mammals. Although L1s are typically silenced in most normal tissues, elevated L1 expression is associated with a variety of conditions, including cancer, aging, infertility, and neurological disease. These associations have raised interest in the mapping of human endogenous de novo L1 insertions, and a variety of methods have been developed for this purpose. Adapting these methods to mouse genomes would allow us to monitor endogenous in vivo L1 activity in controlled, experimental conditions using mouse disease models. Here we use a modified version of transposon insertion profiling, called nanoTIPseq, to selectively enrich young mouse L1s. By linking this amplification step with nanopore sequencing, we identified >95% annotated L1s from C57BL/6 genomic DNA using only 200,000 sequencing reads. In the process, we discovered 82 unannotated L1 insertions from a single C57BL/6 genome. Most of these unannotated L1s were near repetitive sequence and were not found with short-read TIPseq. We used nanoTIPseq on individual mouse breast cancer cells and were able to identify the annotated and unannotated L1s, as well as new insertions specific to individual cells, providing proof of principle for using nanoTIPseq to interrogate retrotransposition activity at the single cell level in vivo.
9
Li X, Lu K, Chen X, Tu K, Xie D. capTEs enables locus-specific dissection of transcriptional outputs from reference and nonreference transposable elements. Commun Biol 2023; 6:974. [PMID: 37741908 PMCID: PMC10517987 DOI: 10.1038/s42003-023-05349-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Accepted: 09/12/2023] [Indexed: 09/25/2023] Open
Abstract
Transposable elements (TEs) serve as both insertional mutagens and regulatory elements in cells, and their aberrant activity is increasingly being revealed to contribute to diseases and cancers. However, measuring the transcriptional consequences of nonreference and young TEs at individual loci remains challenging with current methods, primarily due to technical limitations, including short read lengths generated and insufficient coverage in target regions. Here, we introduce a long-read targeted RNA sequencing method, Cas9-assisted profiling TE expression sequencing (capTEs), for quantitative analysis of transcriptional outputs for individual TEs, including transcribed nonreference insertions, noncanonical transcripts from various transcription patterns and their correlations with expression changes in related genes. This method selectively identified TE-containing transcripts and outputted data with up to 90% TE reads, maintaining a comparable data yield to whole-transcriptome sequencing. We applied capTEs to human cancer cells and found that internal and inserted Alu elements may employ distinct regulatory mechanisms to upregulate gene expression. We expect that capTEs will be a critical tool for advancing our understanding of the biological functions of individual TEs at the locus level, revealing their roles as both mutagens and regulators in biological and pathogenic processes.
Affiliation(s)
- Xuemei Li
  - Laboratory of Omics Technology and Bioinformatics, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Keying Lu
  - Laboratory of Omics Technology and Bioinformatics, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Xiao Chen
  - Laboratory of Omics Technology and Bioinformatics, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Kailing Tu
  - Laboratory of Omics Technology and Bioinformatics, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Dan Xie
  - Laboratory of Omics Technology and Bioinformatics, Frontiers Science Center for Disease-related Molecular Network, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
10
Hallast P, Ebert P, Loftus M, Yilmaz F, Audano PA, Logsdon GA, Bonder MJ, Zhou W, Höps W, Kim K, Li C, Hoyt SJ, Dishuck PC, Porubsky D, Tsetsos F, Kwon JY, Zhu Q, Munson KM, Hasenfeld P, Harvey WT, Lewis AP, Kordosky J, Hoekzema K, O'Neill RJ, Korbel JO, Tyler-Smith C, Eichler EE, Shi X, Beck CR, Marschall T, Konkel MK, Lee C. Assembly of 43 human Y chromosomes reveals extensive complexity and variation. Nature 2023; 621:355-364. [PMID: 37612510 PMCID: PMC10726138 DOI: 10.1038/s41586-023-06425-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 11/30/2022] [Accepted: 07/11/2023] [Indexed: 08/25/2023]
Abstract
The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date [1] and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes [2]. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established [1] boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.
Affiliation(s)
- Pille Hallast
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Peter Ebert
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
  - Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Mark Loftus
  - Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
  - Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Feyza Yilmaz
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Peter A Audano
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Glennis A Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Marc Jan Bonder
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Weichen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Wolfram Höps
  - Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Kwondo Kim
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Chong Li
  - Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Savannah J Hoyt
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Philip C Dishuck
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Fotios Tsetsos
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Jee Young Kwon
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Qihui Zhu
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Patrick Hasenfeld
  - Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- William T Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Alexandra P Lewis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jennifer Kordosky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Rachel J O'Neill
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
  - The University of Connecticut Health Center, Farmington, CT, USA
- Jan O Korbel
  - Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Xinghua Shi
  - Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Christine R Beck
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
  - The University of Connecticut Health Center, Farmington, CT, USA
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Miriam K Konkel
  - Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
  - Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Charles Lee
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
11
Bilgrav Saether K, Nilsson D, Thonberg H, Tham E, Ameur A, Eisfeldt J, Lindstrand A. Transposable element insertions in 1000 Swedish individuals. PLoS One 2023; 18:e0289346. [PMID: 37506127 PMCID: PMC10381067 DOI: 10.1371/journal.pone.0289346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Accepted: 07/09/2023] [Indexed: 07/30/2023] Open
Abstract
The majority of rare diseases are genetic, and despite advanced high-throughput genomics-based investigations, 60% of patients remain undiagnosed. A major factor limiting our ability to identify disease-causing alterations is a poor understanding of the morbid and normal human genome. Transposable elements (TEs), which constitute 50% of our genome, are a major genomic contributor whose function and distribution remain largely unstudied. Here we aim to resolve this knowledge gap and increase the diagnostic yield of rare disease patients investigated with clinical genome sequencing. To this end we characterized TE insertions in 1000 Swedish individuals from the SweGen dataset and 2504 individuals from the 1000 Genomes Project (1KGP), creating seven population-specific TE insertion databases. Of note, 66% of TE insertions in SweGen were present at >1% in the 1KGP databases, proving that most insertions are common across populations. Focusing on the rare TE insertions, we show that even though ~0.7% of those insertions affect protein-coding genes, they rarely affect known disease-causing genes (<0.1%). Finally, we applied a TE insertion identification workflow to two clinical cases where disease-causing TE insertions were suspected and could verify the presence of pathogenic TE insertions in both. Altogether we demonstrate the importance of TE insertion detection and highlight possible clinical implications in rare disease diagnostics.
Affiliation(s)
- Kristine Bilgrav Saether
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Science for Life Laboratory, Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
| | - Daniel Nilsson
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Science for Life Laboratory, Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| | - Håkan Thonberg
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| | - Emma Tham
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| | - Adam Ameur
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Jesper Eisfeldt
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Science for Life Laboratory, Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| | - Anna Lindstrand
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| |
|
12
|
Zhao P, Peng C, Fang L, Wang Z, Liu GE. Taming transposable elements in livestock and poultry: a review of their roles and applications. Genet Sel Evol 2023; 55:50. [PMID: 37479995 PMCID: PMC10362595 DOI: 10.1186/s12711-023-00821-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/16/2023] [Accepted: 06/30/2023] [Indexed: 07/23/2023]
Abstract
Livestock and poultry play a significant role in human nutrition by converting agricultural by-products into high-quality proteins. To meet the growing demand for safe animal protein, genetic improvement of livestock must be done sustainably while minimizing negative environmental impacts. Transposable elements (TE) are important components of livestock and poultry genomes, contributing to their genetic diversity, chromatin states, gene regulatory networks, and complex traits of economic value. However, compared to other species, research on TE in livestock and poultry is still in its early stages. In this review, we analyze 72 studies published in the past 20 years, summarize the TE composition in livestock and poultry genomes, and focus on their potential roles in functional genomics. We also discuss bioinformatic tools and strategies for integrating multi-omics data with TE, and explore future directions, feasibility, and challenges of TE research in livestock and poultry. In addition, we suggest strategies to apply TE in basic biological research and animal breeding. Our goal is to provide a new perspective on the importance of TE in livestock and poultry genomes.
Affiliation(s)
- Pengju Zhao
- Hainan Institute of Zhejiang University, Hainan Sanya, 572000, China
- College of Animal Sciences, Zhejiang University, Zhejiang, Hangzhou, People's Republic of China
| | - Chen Peng
- Hainan Institute of Zhejiang University, Hainan Sanya, 572000, China
- College of Animal Sciences, Zhejiang University, Zhejiang, Hangzhou, People's Republic of China
| | - Lingzhao Fang
- Center for Quantitative Genetics and Genomics, Aarhus University, 8000, Aarhus, Denmark
| | - Zhengguang Wang
- Hainan Institute of Zhejiang University, Hainan Sanya, 572000, China
- College of Animal Sciences, Zhejiang University, Zhejiang, Hangzhou, People's Republic of China
| | - George E Liu
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA
| |
|
13
|
Audano PA, Beck CR. Small allelic variants are a source of ancestral bias in structural variant breakpoint placement. bioRxiv 2023:2023.06.25.546295. [PMID: 37425850 PMCID: PMC10327140 DOI: 10.1101/2023.06.25.546295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥ 50 bp) has improved to near basepair precision. Despite these advances, many SVs in unique regions of the genome are subject to systematic bias that affects breakpoint location. This ambiguity leads to less accurate variant comparisons across samples, and it obscures true breakpoint features needed for mechanistic inferences. To understand why SVs are not consistently placed, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identified variable breakpoints for 882 SV insertions and 180 SV deletions not anchored in tandem repeats (TRs) or segmental duplications (SDs). While this is unexpectedly high for genome assemblies in unique loci, we find read-based callsets from the same sequencing data yielded 1,566 insertions and 986 deletions with inconsistent breakpoints also not anchored in TRs or SDs. When we investigated causes for breakpoint inaccuracy, we found sequence and assembly errors had minimal impact, but we observed a strong effect of ancestry. We confirmed that polymorphic mismatches and small indels are enriched at shifted breakpoints and that these polymorphisms are generally lost when breakpoints shift. Long tracts of homology, such as SVs mediated by transposable elements, increase the likelihood of imprecise SV calls and the distance they are shifted. Tandem Duplication (TD) breakpoints are the most heavily affected SV class with 14% of TDs placed at different locations across haplotypes. While graph genome methods normalize SV calls across many samples, the resulting breakpoints are sometimes incorrect, highlighting a need to tune graph methods for breakpoint accuracy. 
The breakpoint inconsistencies we characterize collectively affect ~5% of the SVs called in a human genome and underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoint placement, and increase the value of callsets for investigating mutational processes.
Affiliation(s)
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, USA
| |
|
14
|
Du Q, Stow EC, LaCoste D, Freeman B, Baddoo M, Shareef A, Miller KM, Belancio VP. A novel role of TRIM28 B box domain in L1 retrotransposition and ORF2p-mediated cDNA synthesis. Nucleic Acids Res 2023; 51:4429-4450. [PMID: 37070200 PMCID: PMC10201437 DOI: 10.1093/nar/gkad247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 03/22/2023] [Accepted: 03/25/2023] [Indexed: 04/19/2023]
Abstract
The long interspersed element 1 (LINE-1 or L1) integration is affected by many cellular factors through various mechanisms. Some of these factors are required for L1 amplification, while others either suppress or enhance specific steps during L1 propagation. Previously, TRIM28 has been identified to suppress transposable elements, including L1 expression via its canonical role in chromatin remodeling. Here, we report that TRIM28 through its B box domain increases L1 retrotransposition and facilitates shorter cDNA and L1 insert generation in cultured cells. Consistent with the latter, we observe that tumor specific L1 inserts are shorter in endometrial, ovarian, and prostate tumors with higher TRIM28 mRNA expression than in those with lower TRIM28 expression. We determine that three amino acids in the B box domain that are involved in TRIM28 multimerization are critical for its effect on both L1 retrotransposition and cDNA synthesis. We provide evidence that B boxes from the other two members in the Class VI TRIM proteins, TRIM24 and TRIM33, also increase L1 retrotransposition. Our findings could lead to a better understanding of the host/L1 evolutionary arms race in the germline and their interplay during tumorigenesis.
Affiliation(s)
- Qianhui Du
- Tulane Cancer Center, Tulane Health Sciences Center, 1700 Tulane Ave, New Orleans, LA 70112, USA
- Department of Structural and Cellular Biology, Tulane School of Medicine, 1430 Tulane Ave, New Orleans 70112, USA
| | - Emily C Stow
- Tulane Cancer Center, Tulane Health Sciences Center, 1700 Tulane Ave, New Orleans, LA 70112, USA
- Department of Structural and Cellular Biology, Tulane School of Medicine, 1430 Tulane Ave, New Orleans 70112, USA
| | - Dawn LaCoste
- Tulane Cancer Center, Tulane Health Sciences Center, 1700 Tulane Ave, New Orleans, LA 70112, USA
- Department of Structural and Cellular Biology, Tulane School of Medicine, 1430 Tulane Ave, New Orleans 70112, USA
| | - Benjamin Freeman
- Tulane Cancer Center, Tulane Health Sciences Center, 1700 Tulane Ave, New Orleans, LA 70112, USA
- Department of Structural and Cellular Biology, Tulane School of Medicine, 1430 Tulane Ave, New Orleans 70112, USA
| | - Melody Baddoo
- Tulane Cancer Center, Tulane Health Sciences Center, 1700 Tulane Ave, New Orleans, LA 70112, USA
| | - Afzaal M Shareef
- Tulane Cancer Center, Tulane Health Sciences Center, 1700 Tulane Ave, New Orleans, LA 70112, USA
- Department of Structural and Cellular Biology, Tulane School of Medicine, 1430 Tulane Ave, New Orleans 70112, USA
| | - Kyle M Miller
- Department of Molecular Biosciences, Institute for Cellular and Molecular Biology, University of Texas at Austin, 100 E 24th Street, Austin, TX 78712, USA
| | - Victoria P Belancio
- Tulane Cancer Center, Tulane Health Sciences Center, 1700 Tulane Ave, New Orleans, LA 70112, USA
- Department of Structural and Cellular Biology, Tulane School of Medicine, 1430 Tulane Ave, New Orleans 70112, USA
| |
|
15
|
Ferraj A, Audano PA, Balachandran P, Czechanski A, Flores JI, Radecki AA, Mosur V, Gordon DS, Walawalkar IA, Eichler EE, Reinholdt LG, Beck CR. Resolution of structural variation in diverse mouse genomes reveals chromatin remodeling due to transposable elements. Cell Genomics 2023; 3:100291. [PMID: 37228752 PMCID: PMC10203049 DOI: 10.1016/j.xgen.2023.100291] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 09/14/2022] [Revised: 02/03/2023] [Accepted: 03/10/2023] [Indexed: 05/25/2023]
Abstract
Diverse inbred mouse strains are important biomedical research models, yet genome characterization of many strains is fundamentally lacking in comparison with humans. In particular, catalogs of structural variants (SVs) (variants ≥ 50 bp) are incomplete, limiting the discovery of causative alleles for phenotypic variation. Here, we resolve genome-wide SVs in 20 genetically distinct inbred mice with long-read sequencing. We report 413,758 site-specific SVs affecting 13% (356 Mbp) of the mouse reference assembly, including 510 previously unannotated coding variants. We substantially improve the Mus musculus transposable element (TE) callset, and we find that TEs comprise 39% of SVs and account for 75% of altered bases. We further utilize this callset to investigate how TE heterogeneity affects mouse embryonic stem cells and find multiple TE classes that influence chromatin accessibility. Our work provides a comprehensive analysis of SVs found in diverse mouse genomes and illustrates the role of TEs in epigenetic differences.
Affiliation(s)
- Ardian Ferraj
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Peter A. Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | | | | | - Jacob I. Flores
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Alexander A. Radecki
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Varun Mosur
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - David S. Gordon
- Howard Hughes Medical Institute and Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Isha A. Walawalkar
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Evan E. Eichler
- Howard Hughes Medical Institute and Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | | | - Christine R. Beck
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
| |
|
16
|
Warkocki Z. An update on post-transcriptional regulation of retrotransposons. FEBS Lett 2023; 597:380-406. [PMID: 36460901 DOI: 10.1002/1873-3468.14551] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 09/25/2022] [Revised: 11/09/2022] [Accepted: 11/18/2022] [Indexed: 12/04/2022]
Abstract
Retrotransposons, including LINE-1, Alu, SVA, and endogenous retroviruses, are one of the major constituents of human genomic repetitive sequences. Through the process of retrotransposition, some of them occasionally insert into new genomic locations by a copy-paste mechanism involving RNA intermediates. Irrespective of de novo genomic insertions, retrotransposon expression can lead to DNA double-strand breaks and stimulate cellular innate immunity through endogenous patterns. As a result, retrotransposons are tightly regulated by multi-layered regulatory processes to prevent the dangerous effects of their expression. In recent years, significant progress was made in revealing how retrotransposon biology intertwines with general post-transcriptional RNA metabolism. Here, I summarize current knowledge on the involvement of post-transcriptional factors in the biology of retrotransposons, focusing on LINE-1. I emphasize general RNA metabolisms such as methylation of adenine (m6A), RNA 3'-end polyadenylation and uridylation, RNA decay and translation regulation. I discuss the effects of retrotransposon RNP sequestration in cytoplasmic bodies and autophagy. Finally, I summarize how innate immunity restricts retrotransposons and how retrotransposons make use of cellular enzymes, including the DNA repair machinery, to complete their replication cycles.
Affiliation(s)
- Zbigniew Warkocki
- Department of RNA Metabolism, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| |
|
17
|
Smits N, Faulkner GJ. Nanopore Sequencing to Identify Transposable Element Insertions and Their Epigenetic Modifications. Methods Mol Biol 2023; 2607:151-171. [PMID: 36449163 DOI: 10.1007/978-1-0716-2883-6_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/17/2023]
Abstract
Over the past 20 years, high-throughput genomic assays have fundamentally changed how transposable elements (TEs) are studied. While short-read DNA sequencing has been at the heart of these efforts, novel technologies that generate longer reads are driving a shift in the field. Long-read sequencing now permits locus-specific approaches to locate individual TE insertions and understand their epigenetic and transcriptional regulation, while still profiling TE activity genome-wide. Here we provide detailed guidelines to implement Oxford Nanopore Technologies (ONT) sequencing to identify polymorphic TE insertions and profile TE epigenetic landscapes. Using human long interspersed element-1 (LINE-1, L1) as an example, we explain the procedures involved, including final visualization, and potential bottlenecks and pitfalls. ONT sequencing will be, in our view, a workhorse technology for the foreseeable future in the TE field.
Affiliation(s)
- Nathan Smits
- Mater Research Institute, University of Queensland, Woolloongabba, QLD, Australia
| | - Geoffrey J Faulkner
- Mater Research Institute, University of Queensland, Woolloongabba, QLD, Australia
- Queensland Brain Institute, University of Queensland, Brisbane, QLD, Australia
| |
|
18
|
Gerdes P, Lim SM, Ewing AD, Larcombe MR, Chan D, Sanchez-Luque FJ, Walker L, Carleton AL, James C, Knaupp AS, Carreira PE, Nefzger CM, Lister R, Richardson SR, Polo JM, Faulkner GJ. Retrotransposon instability dominates the acquired mutation landscape of mouse induced pluripotent stem cells. Nat Commun 2022; 13:7470. [PMID: 36463236 PMCID: PMC9719517 DOI: 10.1038/s41467-022-35180-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 11/22/2022] [Indexed: 12/04/2022]
Abstract
Induced pluripotent stem cells (iPSCs) can in principle differentiate into any cell of the body, and have revolutionized biomedical research and regenerative medicine. Unlike their human counterparts, mouse iPSCs (miPSCs) are reported to silence transposable elements and prevent transposable element-mediated mutagenesis. Here we apply short-read or Oxford Nanopore Technologies long-read genome sequencing to 38 bulk miPSC lines reprogrammed from 10 parental cell types, and 18 single-cell miPSC clones. While single nucleotide variants and structural variants restricted to miPSCs are rare, we find 83 de novo transposable element insertions, including examples intronic to Brca1 and Dmd. LINE-1 retrotransposons are profoundly hypomethylated in miPSCs, beyond other transposable elements and the genome overall, and harbor alternative protein-coding gene promoters. We show that treatment with the LINE-1 inhibitor lamivudine does not hinder reprogramming and efficiently blocks endogenous retrotransposition, as detected by long-read genome sequencing. These experiments reveal the complete spectrum and potential significance of mutations acquired by miPSCs.
Affiliation(s)
- Patricia Gerdes
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102 Australia
| | - Sue Mei Lim
- Department of Anatomy & Developmental Biology, Monash University, Melbourne, VIC 3800 Australia
- Development and Stem Cells Program, Monash Biomedicine Discovery Institute, Melbourne, VIC 3800 Australia
- Australian Regenerative Medicine Institute, Monash University, Melbourne, VIC 3800 Australia
| | - Adam D. Ewing
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102 Australia
| | - Michael R. Larcombe
- Department of Anatomy & Developmental Biology, Monash University, Melbourne, VIC 3800 Australia
- Development and Stem Cells Program, Monash Biomedicine Discovery Institute, Melbourne, VIC 3800 Australia
- Australian Regenerative Medicine Institute, Monash University, Melbourne, VIC 3800 Australia
| | - Dorothy Chan
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102 Australia
| | - Francisco J. Sanchez-Luque
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102 Australia
- GENYO. Pfizer-University of Granada-Andalusian Government Centre for Genomics and Oncological Research, PTS, Granada, 18016 Spain
| | - Lucinda Walker
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102 Australia
| | - Alexander L. Carleton
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102 Australia
| | - Cini James
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102 Australia
| | - Anja S. Knaupp
- Department of Anatomy & Developmental Biology, Monash University, Melbourne, VIC 3800 Australia
- Development and Stem Cells Program, Monash Biomedicine Discovery Institute, Melbourne, VIC 3800 Australia
- Australian Regenerative Medicine Institute, Monash University, Melbourne, VIC 3800 Australia
| | - Patricia E. Carreira
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102 Australia
| | - Christian M. Nefzger
- Department of Anatomy & Developmental Biology, Monash University, Melbourne, VIC 3800 Australia
- Development and Stem Cells Program, Monash Biomedicine Discovery Institute, Melbourne, VIC 3800 Australia
- Australian Regenerative Medicine Institute, Monash University, Melbourne, VIC 3800 Australia
| | - Ryan Lister
- Australian Research Council Centre of Excellence in Plant Energy Biology, School of Molecular Sciences, The University of Western Australia, Perth, WA 6009 Australia
- Harry Perkins Institute of Medical Research, Perth, WA 6009 Australia
| | - Sandra R. Richardson
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102 Australia
| | - Jose M. Polo
- Department of Anatomy & Developmental Biology, Monash University, Melbourne, VIC 3800 Australia
- Development and Stem Cells Program, Monash Biomedicine Discovery Institute, Melbourne, VIC 3800 Australia
- Australian Regenerative Medicine Institute, Monash University, Melbourne, VIC 3800 Australia
- Adelaide Centre for Epigenetics and The South Australian Immunogenomics Cancer Institute, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005 Australia
| | - Geoffrey J. Faulkner
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102 Australia
- Queensland Brain Institute, University of Queensland, Brisbane, QLD 4072 Australia
| |
|
19
|
Bajus M, Macko-Podgórni A, Grzebelus D, Baránek M. A review of strategies used to identify transposition events in plant genomes. Frontiers in Plant Science 2022; 13:1080993. [PMID: 36531345 PMCID: PMC9751208 DOI: 10.3389/fpls.2022.1080993] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/26/2022] [Accepted: 11/17/2022] [Indexed: 06/17/2023]
Abstract
Transposable elements (TEs) were initially considered redundant and dubbed 'junk DNA'. However, more recently they were recognized as an essential element of genome plasticity. In nature, they frequently become active upon exposure of the host to stress conditions. Even though most transposition events are neutral or even deleterious, occasionally they may happen to be beneficial, resulting in genetic novelty providing better fitness to the host. Hence, TE mobilization may promote adaptability and, in the long run, act as a significant evolutionary force. There are many examples of TE insertions resulting in increased tolerance to stresses or in novel features of crops which are appealing to the consumer. Possibly, TE-driven de novo variability could be utilized for crop improvement. However, in order to systematically study the mechanisms of TE/host interactions, it is necessary to have suitable tools to globally monitor any ongoing TE mobilization. With the development of novel potent technologies, new high-throughput strategies for studying TE dynamics are emerging. Here, we present currently available methods applied to monitor the activity of TEs in plants. We divide them on the basis of their operational principles, the position of target molecules in the process of transposition and their ability to capture real cases of actively transposing elements. Their possible theoretical and practical drawbacks are also discussed. Finally, conceivable strategies and combinations of methods resulting in an improved performance are proposed.
Affiliation(s)
- Marko Bajus
- Mendeleum—Institute of Genetics, Faculty of Horticulture, Mendel University in Brno, Lednice, Czechia
| | - Alicja Macko-Podgórni
- Department of Plant Biology and Biotechnology, Faculty of Biotechnology and Horticulture, University of Agriculture in Krakow, Kraków, Poland
| | - Dariusz Grzebelus
- Department of Plant Biology and Biotechnology, Faculty of Biotechnology and Horticulture, University of Agriculture in Krakow, Kraków, Poland
| | - Miroslav Baránek
- Mendeleum—Institute of Genetics, Faculty of Horticulture, Mendel University in Brno, Lednice, Czechia
| |
|
20
|
Lee H, Min JW, Mun S, Han K. Human Retrotransposons and Effective Computational Detection Methods for Next-Generation Sequencing Data. Life (Basel, Switzerland) 2022; 12:life12101583. [PMID: 36295018 PMCID: PMC9605557 DOI: 10.3390/life12101583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Revised: 10/03/2022] [Accepted: 10/10/2022] [Indexed: 11/16/2022]
Abstract
Transposable elements (TEs) are classified into two classes according to their mobilization mechanism. Compared to DNA transposons that move by the "cut and paste" mechanism, retrotransposons mobilize via the "copy and paste" method. They have been an essential research topic because some of the active elements, such as Long interspersed element 1 (LINE-1), Alu, and SVA elements, have contributed to the genetic diversity of primates beyond humans. In addition, they can cause genetic disorders by altering gene expression and generating structural variations (SVs). The development and rapid technological advances in next-generation sequencing (NGS) have led to new perspectives on detecting retrotransposon-mediated SVs, especially insertions. Moreover, various computational methods have been developed based on NGS data to precisely detect the insertions and deletions in the human genome. Therefore, this review discusses recently studied and utilized NGS technologies and effective computational approaches for discovering retrotransposons with them. The final part covers a diverse range of computational methods for detecting retrotransposon insertions with human NGS data. This review will give researchers insights into understanding TEs, how to investigate them, and how to connect them with their research interests.
Affiliation(s)
- Haeun Lee
- Department of Bioconvergence Engineering, Dankook University, Yongin 16890, Korea
| | - Jun Won Min
- Department of Surgery, Dankook University College of Medicine, Cheonan 31116, Korea
| | - Seyoung Mun
- Department of Microbiology, College of Science & Technology, Dankook University, Cheonan 31116, Korea
- Center for Bio Medical Engineering Core Facility, Dankook University, Cheonan 31116, Korea
- Correspondence: (S.M.); (K.H.)
| | - Kyudong Han
- Department of Bioconvergence Engineering, Dankook University, Yongin 16890, Korea
- Department of Microbiology, College of Science & Technology, Dankook University, Cheonan 31116, Korea
- Center for Bio Medical Engineering Core Facility, Dankook University, Cheonan 31116, Korea
- HuNbiome Co., Ltd., R&D Center, Seoul 08507, Korea
- Correspondence: (S.M.); (K.H.)
| |
|
21
|
Han S, Dias GB, Basting PJ, Viswanatha R, Perrimon N, Bergman C. Local assembly of long reads enables phylogenomics of transposable elements in a polyploid cell line. Nucleic Acids Res 2022; 50:e124. [PMID: 36156149 PMCID: PMC9757076 DOI: 10.1093/nar/gkac794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 07/21/2022] [Accepted: 09/16/2022] [Indexed: 12/24/2022]
Abstract
Animal cell lines often undergo extreme genome restructuring events, including polyploidy and segmental aneuploidy that can impede de novo whole-genome assembly (WGA). In some species like Drosophila, cell lines also exhibit massive proliferation of transposable elements (TEs). To better understand the role of transposition during animal cell culture, we sequenced the genome of the tetraploid Drosophila S2R+ cell line using long-read and linked-read technologies. WGAs for S2R+ were highly fragmented and generated variable estimates of TE content across sequencing and assembly technologies. We therefore developed a novel WGA-independent bioinformatics method called TELR that identifies, locally assembles, and estimates allele frequency of TEs from long-read sequence data (https://github.com/bergmanlab/telr). Application of TELR to a ∼130x PacBio dataset for S2R+ revealed many haplotype-specific TE insertions that arose by transposition after initial cell line establishment and subsequent tetraploidization. Local assemblies from TELR also allowed phylogenetic analysis of paralogous TEs, which revealed that proliferation of TE families in vitro can be driven by single or multiple source lineages. Our work provides a model for the analysis of TEs in complex heterozygous or polyploid genomes that are recalcitrant to WGA and yields new insights into the mechanisms of genome evolution in animal cell culture.
Affiliation(s)
| | | | - Preston J Basting
- Institute of Bioinformatics, University of Georgia, 120 E. Green St., Athens, GA, USA
| | - Raghuvir Viswanatha
- Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA, USA
| | - Norbert Perrimon
- Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA, USA
- Howard Hughes Medical Institute, Boston, MA, USA
| | - Casey M Bergman
- To whom correspondence should be addressed. Tel: +1 706 542 1764; Fax: +1 706 542 3910;
| |
|
22
|
Billon V, Sanchez-Luque FJ, Rasmussen J, Bodea GO, Gerhardt DJ, Gerdes P, Cheetham SW, Schauer SN, Ajjikuttira P, Meyer TJ, Layman CE, Nevonen KA, Jansz N, Garcia-Perez JL, Richardson SR, Ewing AD, Carbone L, Faulkner GJ. Somatic retrotransposition in the developing rhesus macaque brain. Genome Res 2022; 32:1298-1314. [PMID: 35728967 PMCID: PMC9341517 DOI: 10.1101/gr.276451.121] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/02/2022] [Accepted: 06/14/2022] [Indexed: 12/03/2022]
Abstract
The retrotransposon LINE-1 (L1) is central to the recent evolutionary history of the human genome and continues to drive genetic diversity and germline pathogenesis. However, the spatiotemporal extent and biological significance of somatic L1 activity are poorly defined and are virtually unexplored in other primates. From a single L1 lineage active at the divergence of apes and Old World monkeys, successive L1 subfamilies have emerged in each descendant primate germline. As revealed by case studies, the presently active human L1 subfamily can also mobilize during embryonic and brain development in vivo. It is unknown whether nonhuman primate L1s can similarly generate somatic insertions in the brain. Here we applied approximately 40× single-cell whole-genome sequencing (scWGS), as well as retrotransposon capture sequencing (RC-seq), to 20 hippocampal neurons from two rhesus macaques (Macaca mulatta). In one animal, we detected and PCR-validated a somatic L1 insertion that generated target site duplications, carried a short 5′ transduction, and was present in ∼7% of hippocampal neurons but absent from cerebellum and nonbrain tissues. The corresponding donor L1 allele was exceptionally mobile in vitro and was embedded in PRDM4, a gene expressed throughout development and in neural stem cells. Nanopore long-read methylome and RNA-seq transcriptome analyses indicated young retrotransposon subfamily activation in the early embryo, followed by repression in adult tissues. These data highlight endogenous macaque L1 retrotransposition potential, provide prototypical evidence of L1-mediated somatic mosaicism in a nonhuman primate, and allude to L1 mobility in the brain over the past 30 million years of human evolution.
|
23
|
Niu Y, Teng X, Zhou H, Shi Y, Li Y, Tang Y, Zhang P, Luo H, Kang Q, Xu T, He S. Characterizing mobile element insertions in 5675 genomes. Nucleic Acids Res 2022; 50:2493-2508. [PMID: 35212372 PMCID: PMC8934628 DOI: 10.1093/nar/gkac128] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 07/26/2021] [Revised: 02/07/2022] [Accepted: 02/11/2022] [Indexed: 12/30/2022] Open
Abstract
Mobile element insertions (MEIs) are a major class of structural variants (SVs) and have been linked to many human genetic disorders, including hemophilia, neurofibromatosis, and various cancers. However, human MEI resources from large-scale genome sequencing are still lacking compared to those for SNPs and SVs. Here, we report a comprehensive map of 36 699 non-reference MEIs constructed from 5675 genomes, comprising 2998 Chinese samples (∼26.2×, NyuWa) and 2677 samples from the 1000 Genomes Project (∼7.4×, 1KGP). We discovered that LINE-1 insertions were highly enriched in centromere regions, implying the role of chromosome context in retroelement insertion. After functional annotation, we estimated that MEIs are responsible for about 9.3% of all protein-truncating events per genome. Finally, we built a companion database named HMEID for public use. This resource represents the latest and largest genomewide study on MEIs and will have broad utility for exploration of human MEI findings.
Affiliation(s)
- Yiwei Niu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.,College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xueyi Teng
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.,University of Chinese Academy of Sciences, Beijing 100049, China
| | - Honghong Zhou
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Yirong Shi
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.,University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yanyan Li
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.,College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yiheng Tang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.,University of Chinese Academy of Sciences, Beijing 100049, China
| | - Peng Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Huaxia Luo
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Quan Kang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Tao Xu
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.,National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Shunmin He
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.,College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
24
|
A Cross-sectional Study About Nurses' and Physicians' Experience of Disaster Management Preparedness Throughout COVID-19. Disaster Med Public Health Prep 2022; 17:e125. [PMID: 35152935 PMCID: PMC9021579 DOI: 10.1017/dmp.2022.34] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/07/2023]
Abstract
OBJECTIVE: The aim of this study was to assess and compare nurses' and physicians' knowledge of disaster management preparedness. An effective health-care system response to various disasters is paramount, and nurses and physicians must be prepared with appropriate competencies to manage disaster events. METHODS: This was a cross-sectional study. A total of 636 nurses and 274 physicians were recruited from 1 hospital in Saudi Arabia. Of them, 608 (95.6%) nurses and 228 (83.2%) physicians completed self-administered online questionnaires. The questionnaire assessed participants' sociodemographic data and disaster management knowledge. RESULTS: The findings revealed that participants had more knowledge regarding the disaster preparedness stage than the mitigation and recovery stages. They also reported a need for advanced disaster training. A total of 10.1% of nurses' and 15.6% of physicians' overall knowledge was explained by their demographic and work-related characteristics. CONCLUSIONS: Both nurses and physicians had some knowledge of the information and practices required for the disaster management process. It is proposed that hospital managers look for opportunities to adopt national disaster management standards effectively and to include nurses and physicians in relevant learning activities, because experience has suggested a somewhat low overall perceived competence in managing disaster situations.
|
25
|
Song R, Wang Z, Wang H, Zhang H, Wang X, Nguyen H, Holding D, Yu B, Clemente T, Jia S, Zhang C. InMut-finder: a software tool for insertion identification in mutagenesis using Nanopore long reads. BMC Genomics 2021; 22:908. [PMID: 34923956 PMCID: PMC8684674 DOI: 10.1186/s12864-021-08206-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2021] [Accepted: 11/24/2021] [Indexed: 11/24/2022] Open
Abstract
Background: Biological mutagens with inserted sequences, such as transposons, play a crucial role in linking observed phenotypes to genotypes in reverse genetic studies. Accurate and efficient software tools that identify insertion sites from sequencing reads are therefore needed. Results: We developed a bioinformatics tool, InMut-Finder, to identify genome-wide insertions in mutagenesis based on target sequences and flanking sequences from long reads, such as those from Oxford Nanopore sequencing. InMut-Finder identified > 100 insertion sites in Medicago truncatula and soybean mutants from sequencing reads of whole-genome DNA or enriched insertion-site DNA fragments. Insertion sites discovered by InMut-Finder were validated by PCR experiments. Conclusion: InMut-Finder is a comprehensive and powerful tool for automated insertion detection from Nanopore long reads. Its simplicity, efficiency, and flexibility make it a valuable tool for functional genomics and forward and reverse genetics. InMut-Finder was implemented in Perl, R, and shell scripts, which are independent of the OS. The source code and instructions can be accessed at https://github.com/jsg200830/InMut-Finder.
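The idea of locating an insertion from a known target sequence and its flanking sequence in a long read can be illustrated with a toy example. This is only a sketch using exact string matching on made-up sequences. InMut-Finder itself is implemented in Perl, R, and shell scripts, and its actual algorithm, parameters, and handling of sequencing errors are not described in the abstract, so every function name and sequence below is hypothetical.

```python
def find_insertion_flanks(read, target, flank_len=10):
    """Return (left_flank, right_flank) around the first exact match of target, or None."""
    start = read.find(target)
    if start == -1:
        return None
    end = start + len(target)
    left = read[max(0, start - flank_len):start]
    right = read[end:end + flank_len]
    return left, right


def locate_in_reference(reference, left_flank, right_flank):
    """Return the 0-based reference position where the two flanks are adjacent, or None."""
    pos = reference.find(left_flank + right_flank)
    return None if pos == -1 else pos + len(left_flank)


# Made-up reference, inserted sequence, and a simulated error-free read.
reference = "ACGTTGCAAGGCTTAACCGGTTACGATCGA"
target = "TTTTGGGGCCCC"
read = reference[:15] + target + reference[15:]

flanks = find_insertion_flanks(read, target, flank_len=8)
site = locate_in_reference(reference, *flanks)
print(flanks, site)
```

Real Nanopore reads contain errors, and real insertions often create target site duplications, so a practical tool would need approximate alignment rather than the exact matching assumed here.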
Affiliation(s)
- Rui Song
- College of Grassland Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Ziyao Wang
- College of Grassland Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Hui Wang
- College of Grassland Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Han Zhang
- College of Grassland Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Xuemeng Wang
- College of Grassland Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Hanh Nguyen
- Department of Agronomy and Horticulture, Center for Plant Science Innovation, Beadle Center for Biotechnology, University of Nebraska, Lincoln, NE, 68588, USA
| | - David Holding
- Department of Agronomy and Horticulture, Center for Plant Science Innovation, Beadle Center for Biotechnology, University of Nebraska, Lincoln, NE, 68588, USA.,Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
| | - Bin Yu
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA.,School of Biological Sciences, Center for Plant Science Innovation, Beadle Center for Biotechnology, University of Nebraska, Lincoln, NE, 68588, USA
| | - Tom Clemente
- Department of Agronomy and Horticulture, Center for Plant Science Innovation, Beadle Center for Biotechnology, University of Nebraska, Lincoln, NE, 68588, USA
| | - Shangang Jia
- College of Grassland Science and Technology, China Agricultural University, Beijing, 100193, China.
| | - Chi Zhang
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA. .,School of Biological Sciences, Center for Plant Science Innovation, Beadle Center for Biotechnology, University of Nebraska, Lincoln, NE, 68588, USA.
| |
|
26
|
Kirov I, Merkulov P, Dudnikov M, Polkhovskaya E, Komakhin RA, Konstantinov Z, Gvaramiya S, Ermolaev A, Kudryavtseva N, Gilyok M, Divashuk MG, Karlov GI, Soloviev A. Transposons Hidden in Arabidopsis thaliana Genome Assembly Gaps and Mobilization of Non-Autonomous LTR Retrotransposons Unravelled by Nanotei Pipeline. Plants (Basel) 2021; 10:2681. [PMID: 34961152 PMCID: PMC8704663 DOI: 10.3390/plants10122681] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/03/2021] [Revised: 11/26/2021] [Accepted: 12/02/2021] [Indexed: 06/12/2023]
Abstract
Long-read data is a great tool to discover new active transposable elements (TEs). However, no ready-to-use tools have been available to gather this information from low-coverage ONT datasets. Here, we developed a novel pipeline, nanotei, that allows detection of TE-containing structural variants, including individual TE transpositions. We exploited this pipeline to identify TE insertions in the Arabidopsis thaliana genome. Using nanotei, we identified tens of TE copies, including ones for the well-characterized ONSEN retrotransposon family, that were hidden in genome assembly gaps. The results demonstrate that some TEs are inaccessible for analysis with the current A. thaliana (TAIR10.1) genome assembly. We further explored the mobilome of the ddm1 mutant with elevated TE activity. Nanotei captured all TEs previously known to be active in ddm1 and also identified transposition of non-autonomous TEs. Of them, one non-autonomous TE derived from (AT5TE33540) belongs to TR-GAG retrotransposons with a single open reading frame (ORF) encoding the GAG protein. These results provide the first direct evidence that TR-GAGs and other non-autonomous LTR retrotransposons can transpose in the plant genome, albeit in the absence of most of the encoded proteins. In summary, nanotei is a useful tool to detect active TEs and their insertions in plant genomes using low-coverage data from Nanopore genome sequencing.
Affiliation(s)
- Ilya Kirov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
- Kurchatov Genomics Center of ARRIAB, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
| | - Pavel Merkulov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
| | - Maxim Dudnikov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
- Kurchatov Genomics Center of ARRIAB, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
| | - Ekaterina Polkhovskaya
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
| | - Roman A. Komakhin
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
| | - Zakhar Konstantinov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
| | - Sofya Gvaramiya
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
| | - Aleksey Ermolaev
- Center of Molecular Biotechnology, Russian State Agrarian University-Moscow Timiryazev Agricultural Academy, 127550 Moscow, Russia; (A.E.); (N.K.)
| | - Natalya Kudryavtseva
- Center of Molecular Biotechnology, Russian State Agrarian University-Moscow Timiryazev Agricultural Academy, 127550 Moscow, Russia; (A.E.); (N.K.)
| | - Marina Gilyok
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
| | - Mikhail G. Divashuk
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
- Kurchatov Genomics Center of ARRIAB, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia
| | - Gennady I. Karlov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
| | - Alexander Soloviev
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, 127550 Moscow, Russia; (P.M.); (M.D.); (E.P.); (R.A.K.); (Z.K.); (S.G.); (M.G.); (M.G.D.); (G.I.K.); (A.S.)
| |
|
27
|
Borges-Monroy R, Chu C, Dias C, Choi J, Lee S, Gao Y, Shin T, Park PJ, Walsh CA, Lee EA. Whole-genome analysis reveals the contribution of non-coding de novo transposon insertions to autism spectrum disorder. Mob DNA 2021; 12:28. [PMID: 34838103 PMCID: PMC8627061 DOI: 10.1186/s13100-021-00256-w] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/23/2021] [Accepted: 11/02/2021] [Indexed: 12/30/2022] Open
Abstract
Background: Retrotransposons have been implicated as causes of Mendelian disease, but their role in autism spectrum disorder (ASD) has not been systematically defined, because they can only be called with adequate sensitivity from whole-genome sequencing (WGS) data and a large enough cohort for this analysis has only recently become available. Results: We analyzed WGS data from a cohort of 2288 ASD families from the Simons Simplex Collection by establishing a scalable computational pipeline for retrotransposon insertion detection. We report 86,154 polymorphic retrotransposon insertions, including > 60% not previously reported, and 158 de novo retrotransposition events. The overall burden of de novo events was similar between ASD individuals and unaffected siblings, with 1 de novo insertion per 29, 117, and 206 births for Alu, L1, and SVA respectively, and 1 de novo insertion per 21 births total. However, ASD cases showed more de novo L1 insertions than expected in ASD genes. Additionally, we observed exonic insertions in loss-of-function intolerant genes, including a likely pathogenic exonic insertion in CSDE1, only in ASD individuals. Conclusions: These findings suggest a modest, but important, impact of intronic and exonic retrotransposon insertions in ASD, show the importance of WGS for their analysis, and highlight the utility of specific bioinformatic tools for high-throughput detection of retrotransposon insertions.
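The per-family and total de novo rates quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes the three per-family rates are simply additive, which is an illustrative assumption rather than the authors' stated method; the variable and function names are hypothetical.

```python
from fractions import Fraction

# Births per de novo insertion for each family, as quoted in the abstract.
births_per_insertion = {"Alu": 29, "L1": 117, "SVA": 206}


def combined_births_per_insertion(per_family):
    """Sum the per-family rates (insertions per birth) and invert the total."""
    total_rate = sum(Fraction(1, births) for births in per_family.values())
    return float(1 / total_rate)


combined = combined_births_per_insertion(births_per_insertion)
print(f"about 1 de novo insertion per {combined:.1f} births")
```

Under that additivity assumption the combined figure rounds to 21 births per insertion, which agrees with the total reported in the abstract.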
Affiliation(s)
- Rebeca Borges-Monroy
- Division of Genetics and Genomics, Manton Center for Orphan Disease, Boston Children's Hospital, Boston, MA, USA.,Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.,Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Chong Chu
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Caroline Dias
- Division of Genetics and Genomics, Manton Center for Orphan Disease, Boston Children's Hospital, Boston, MA, USA.,Division of Developmental Medicine, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
| | - Jaejoon Choi
- Division of Genetics and Genomics, Manton Center for Orphan Disease, Boston Children's Hospital, Boston, MA, USA.,Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.,Department of Genetics, Harvard Medical School, MA, Boston, USA
| | - Soohyun Lee
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Yue Gao
- Division of Genetics and Genomics, Manton Center for Orphan Disease, Boston Children's Hospital, Boston, MA, USA.,Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.,Department of Pediatrics, Harvard Medical School, MA, Boston, USA
| | - Taehwan Shin
- Division of Genetics and Genomics, Manton Center for Orphan Disease, Boston Children's Hospital, Boston, MA, USA.,Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.,Department of Pediatrics, Harvard Medical School, MA, Boston, USA
| | - Peter J Park
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Christopher A Walsh
- Division of Genetics and Genomics, Manton Center for Orphan Disease, Boston Children's Hospital, Boston, MA, USA. .,Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA. .,Department of Pediatrics, Harvard Medical School, MA, Boston, USA. .,Department of Neurology, Harvard Medical School, Boston, MA, USA. .,Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, USA.
| | - Eunjung Alice Lee
- Division of Genetics and Genomics, Manton Center for Orphan Disease, Boston Children's Hospital, Boston, MA, USA. .,Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA. .,Department of Pediatrics, Harvard Medical School, MA, Boston, USA.
| |
|
28
|
Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol 2021; 39:1348-1365. [PMID: 34750572 PMCID: PMC8988251 DOI: 10.1038/s41587-021-01108-x] [Citation(s) in RCA: 439] [Impact Index Per Article: 146.3] [Received: 12/09/2019] [Accepted: 09/22/2021] [Indexed: 12/13/2022]
Abstract
Rapid advances in nanopore technologies for sequencing single long DNA and RNA molecules have led to substantial improvements in accuracy, read length and throughput. These breakthroughs have required extensive development of experimental and bioinformatics methods to fully exploit nanopore long reads for investigations of genomes, transcriptomes, epigenomes and epitranscriptomes. Nanopore sequencing is being applied in genome assembly, full-length transcript detection and base modification detection and in more specialized areas, such as rapid clinical diagnoses and outbreak surveillance. Many opportunities remain for improving data quality and analytical approaches through the development of new nanopores, base-calling methods and experimental protocols tailored to particular applications.
|
29
|
Bao Y, Wadden J, Erb-Downward JR, Ranjan P, Zhou W, McDonald TL, Mills RE, Boyle AP, Dickson RP, Blaauw D, Welch JD. SquiggleNet: real-time, direct classification of nanopore signals. Genome Biol 2021; 22:298. [PMID: 34706748 PMCID: PMC8548853 DOI: 10.1186/s13059-021-02511-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/05/2021] [Accepted: 10/04/2021] [Indexed: 11/17/2022] Open
Abstract
We present SquiggleNet, the first deep-learning model that can classify nanopore reads directly from their electrical signals. SquiggleNet operates faster than DNA passes through the pore, allowing real-time classification and read ejection. Using 1 s of sequencing data, the classifier achieves significantly higher accuracy than base calling followed by sequence alignment. Our approach is also faster and requires an order of magnitude less memory than alignment-based approaches. SquiggleNet distinguished human from bacterial DNA with over 90% accuracy, generalized to unseen bacterial species in a human respiratory metagenome sample, and accurately classified sequences containing human long interspersed repeat elements.
Affiliation(s)
- Yuwei Bao
- Department of Computer Science and Engineering, University of Michigan, Ann Arbor, 48109, MI, USA
| | - Jack Wadden
- Department of Computer Science and Engineering, University of Michigan, Ann Arbor, 48109, MI, USA
- Department of Electrical and Computer Engineering, University of Michigan, Ann Arbor, 48109, MI, USA
| | - John R Erb-Downward
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, 48109, MI, USA
| | - Piyush Ranjan
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, 48109, MI, USA
| | - Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, 48109, MI, USA
| | - Torrin L McDonald
- Department of Human Genetics, University of Michigan Medical, Ann Arbor, 48109, MI, USA
| | - Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, 48109, MI, USA
- Department of Human Genetics, University of Michigan Medical, Ann Arbor, 48109, MI, USA
| | - Alan P Boyle
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, 48109, MI, USA
- Department of Human Genetics, University of Michigan Medical, Ann Arbor, 48109, MI, USA
| | - Robert P Dickson
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, 48109, MI, USA
- Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, 48109, MI, USA
- Michigan Center for Integrative Research in Critical Care, Ann Arbor, 48109, MI, USA
| | - David Blaauw
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, 48109, MI, USA
| | - Joshua D Welch
- Department of Computer Science and Engineering, University of Michigan, Ann Arbor, 48109, MI, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, 48109, MI, USA.
| |
|
30
|
Wong JS, Jadhav T, Young E, Wang Y, Xiao M. Characterization of full-length LINE-1 insertions in 154 genomes. Genomics 2021; 113:3804-3810. [PMID: 34534648 DOI: 10.1016/j.ygeno.2021.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2021] [Revised: 08/18/2021] [Accepted: 09/11/2021] [Indexed: 10/20/2022]
Abstract
Long interspersed nuclear elements (LINEs) are retrotransposons that contribute to genetic variation in the human genome. LINE-1 elements in larger-scale studies are challenging to identify using sequencing technologies due to cost and scalability. We developed an approach using optical mapping for detection of full-length LINE-1 insertions and 10× sequencing for confirmation. We found 51 true positive full-length LINE-1 insertions, of which 4 are novel insertions, in NA12878. Repeating our analysis on a larger sample set representing 26 populations, we identified 329 full-length LINE-1 elements, of which 123 are novel. 24.8% of these 329 LINE-1 insertions were shared amongst all 5 superpopulations (AFR, AMR, EUR, EAS, SAS). The African superpopulation has a higher percentage of population-specific LINE-1 insertions than any other superpopulation. These data indicate that our approach can provide fast, cost-effective, and more accurate LINE-1 detection. They also provide insight into variation in LINE-1 elements between different populations.
Affiliation(s)
- Jessica S Wong
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, United States of America
| | - Tanaya Jadhav
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, United States of America
| | - Eleanor Young
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, United States of America
| | - Yilin Wang
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, United States of America
| | - Ming Xiao
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, United States of America; Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University, Philadelphia, PA, United States of America.
| |
|
31
|
Cortés-Llanos B, Wang Y, Sims CE, Allbritton NL. A technology of a different sort: microraft arrays. Lab Chip 2021; 21:3204-3218. [PMID: 34346456 PMCID: PMC8387436 DOI: 10.1039/d1lc00506e] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
A common procedure performed throughout biomedical research is the selection and isolation of biological entities such as organelles, cells and organoids from a mixed population. In this review, we describe the development and application of microraft arrays, an analysis and isolation platform which enables a vast range of criteria and strategies to be used when separating biological entities. The microraft arrays are comprised of elastomeric microwells with detachable polymer bases (microrafts) that act as capture and culture sites as well as supporting carriers during cell isolation. The technology is elegant in its simplicity and can be implemented for samples possessing tens to millions of objects yielding a flexible platform for applications such as single-cell RNA sequencing, subcellular organelle capture and assay, high-throughput screening and development of CRISPR gene-edited cell lines, and organoid manipulation and selection. The transparent arrays are compatible with a multitude of imaging modalities enabling selection based on 2D or 3D spatial phenotypes or temporal properties. Each microraft can be individually isolated on demand with retention of high viability due to the near zero hydrodynamic stress imposed upon the cells during microraft release, capture and deposition. The platform has been utilized as a simple manual add-on to a standard microscope or incorporated into fully automated instruments that implement state-of-the-art imaging algorithms and machine learning. The vast array of selection criteria enables separations not possible with conventional sorting methods, thus garnering widespread interest in the biological and pharmaceutical sciences.
|
32
|
Watkins WS, Feusier JE, Thomas J, Goubert C, Mallick S, Jorde LB. The Simons Genome Diversity Project: A Global Analysis of Mobile Element Diversity. Genome Biol Evol 2020; 12:779-794. [PMID: 32359137 PMCID: PMC7290288 DOI: 10.1093/gbe/evaa086] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Accepted: 04/24/2020] [Indexed: 12/30/2022] Open
Abstract
Ongoing retrotransposition of Alu, LINE-1, and SINE–VNTR–Alu elements generates diversity and variation among human populations. Previous analyses investigating the population genetics of mobile element insertions (MEIs) have been limited by population ascertainment bias or by relatively small numbers of populations and low sequencing coverage. Here, we use 296 individuals representing 142 global populations from the Simons Genome Diversity Project (SGDP) to discover and characterize MEI diversity from deeply sequenced whole-genome data. We report 5,742 MEIs not originally reported by the 1000 Genomes Project and show that high sampling diversity leads to a 4- to 7-fold increase in MEI discovery rates over the original 1000 Genomes Project data. As a result of negative selection, nonreference polymorphic MEIs are underrepresented within genes, and MEIs within genes are often found in the transcriptional orientation opposite that of the gene. Globally, 80% of Alu subfamilies predate the expansion of modern humans from Africa. Polymorphic MEIs show heterozygosity gradients that decrease from Africa to Eurasia to the Americas, and the number of MEIs found uniquely in a single individual is also distributed in this general pattern. The maximum fraction of MEI diversity partitioned among the seven major SGDP population groups (FST) is 7.4%, similar to, but slightly lower than, previous estimates and likely attributable to the diverse sampling strategy of the SGDP. Finally, we utilize these MEIs to extrapolate the primary Native American shared ancestry component back to Asia and provide new evidence from genome-wide identical-by-descent genetic markers that adds support for a southeastern Siberian origin for most Native Americans.
Affiliation(s)
- Jainy Thomas
- Department of Human Genetics, University of Utah
| | - Clement Goubert
- Department of Molecular Biology and Genetics, Cornell University
| | - Swapon Mallick
- Department of Genetics, Harvard Medical School, Boston, Massachusetts
| | - Lynn B Jorde
- Department of Human Genetics, University of Utah
|
33
|
Uppuluri L, Jadhav T, Wang Y, Xiao M. Multicolor Whole-Genome Mapping in Nanochannels for Genetic Analysis. Anal Chem 2021; 93:9808-9816. [PMID: 34232611 DOI: 10.1021/acs.analchem.1c01373] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
Abstract
Analysis of structural variations (SVs) is important to understand mutations underlying genetic disorders and pathogenic conditions. However, characterizing SVs using short-read, high-throughput sequencing technology is difficult. Although long-read sequencing technologies are being increasingly employed in characterizing SVs, their low throughput and high costs discourage widespread adoption. Sequence motif-based optical mapping in nanochannels is useful in whole-genome mapping and SV detection, but it is not possible to precisely locate the breakpoints or estimate the copy numbers. We present here a universal multicolor mapping strategy in nanochannels combining a conventional sequence-motif labeling system with Cas9-mediated target-specific labeling of any 20-base sequence (20mer) to create custom labels and detect new features. The sequence motifs are labeled with green fluorophores and the 20mers are labeled with red fluorophores. Using this strategy, it is possible to not only detect the SVs but also utilize custom labels to interrogate the features not accessible to motif-labeling, locate breakpoints, and precisely estimate copy numbers of genomic repeats. We validated our approach by quantifying the D4Z4 copy number, a known biomarker for facioscapulohumeral muscular dystrophy (FSHD), and by estimating telomere length, a clinical biomarker for assessing disease risk factors in aging-related diseases and malignant cancers. We also demonstrate the application of our methodology in discovering long interspersed nuclear element 1 (LINE-1) insertions across the whole genome.
Affiliation(s)
- Lahari Uppuluri
- School of Biomedical Engineering, Science and Health Systems, Drexel University, 3141 Chestnut Street, Philadelphia, Pennsylvania 19104, United States
| | - Tanaya Jadhav
- School of Biomedical Engineering, Science and Health Systems, Drexel University, 3141 Chestnut Street, Philadelphia, Pennsylvania 19104, United States
| | - Yilin Wang
- School of Biomedical Engineering, Science and Health Systems, Drexel University, 3141 Chestnut Street, Philadelphia, Pennsylvania 19104, United States
| | - Ming Xiao
- School of Biomedical Engineering, Science and Health Systems, Drexel University, 3141 Chestnut Street, Philadelphia, Pennsylvania 19104, United States; Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University, 3141 Chestnut Street, Philadelphia, Pennsylvania 19104, United States
|
34
|
Comprehensive identification of transposable element insertions using multiple sequencing technologies. Nat Commun 2021; 12:3836. [PMID: 34158502 PMCID: PMC8219666 DOI: 10.1038/s41467-021-24041-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2020] [Accepted: 05/27/2021] [Indexed: 02/05/2023] Open
Abstract
Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.
|
35
|
McDonald TL, Zhou W, Castro CP, Mumm C, Switzenberg JA, Mills RE, Boyle AP. Cas9 targeted enrichment of mobile elements using nanopore sequencing. Nat Commun 2021; 12:3586. [PMID: 34117247 PMCID: PMC8196195 DOI: 10.1038/s41467-021-23918-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2021] [Accepted: 05/25/2021] [Indexed: 02/05/2023] Open
Abstract
Mobile element insertions (MEIs) are repetitive genomic sequences that contribute to genetic variation and can lead to genetic disorders. Targeted and whole-genome approaches using short-read sequencing have been developed to identify reference and non-reference MEIs; however, the read length hampers detection of these elements in complex genomic regions. Here, we pair Cas9-targeted nanopore sequencing with computational methodologies to capture active MEIs in human genomes. We demonstrate parallel enrichment for distinct classes of MEIs, averaging 44% of reads on-targeted signals and exhibiting a 13.4-54x enrichment over whole-genome approaches. We show an individual flow cell can recover most MEIs (97% L1Hs, 93% AluYb, 51% AluYa, 99% SVA_F, and 65% SVA_E). We identify seventeen non-reference MEIs in GM12878 overlooked by modern, long-read analysis pipelines, primarily in repetitive genomic regions. This work introduces the utility of nanopore sequencing for MEI enrichment and lays the foundation for rapid discovery of elusive, repetitive genetic elements.
Affiliation(s)
- Torrin L McDonald
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Christopher P Castro
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Camille Mumm
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Jessica A Switzenberg
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Ryan E Mills
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
| | - Alan P Boyle
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
|
36
|
Jansz N, Faulkner GJ. Endogenous retroviruses in the origins and treatment of cancer. Genome Biol 2021; 22:147. [PMID: 33971937 PMCID: PMC8108463 DOI: 10.1186/s13059-021-02357-4] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2021] [Accepted: 04/21/2021] [Indexed: 02/07/2023] Open
Abstract
Endogenous retroviruses (ERVs) are emerging as promising therapeutic targets in cancer. As remnants of ancient retroviral infections, ERV-derived regulatory elements coordinate expression from gene networks, including those underpinning embryogenesis and immune cell function. ERV activation can promote an interferon response, a phenomenon termed viral mimicry. Although ERV expression is associated with cancer, and provisionally with autoimmune and neurodegenerative diseases, ERV-mediated inflammation is being explored as a way to sensitize tumors to immunotherapy. Here we review ERV co-option in development and innate immunity, the aberrant contribution of ERVs to tumorigenesis, and the wider biomedical potential of therapies directed at ERVs.
Affiliation(s)
- Natasha Jansz
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD, 4102, Australia.
| | - Geoffrey J Faulkner
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD, 4102, Australia; Queensland Brain Institute, University of Queensland, Brisbane, QLD, 4072, Australia.
|
37
|
Zhao X, Collins RL, Lee WP, Weber AM, Jun Y, Zhu Q, Weisburd B, Huang Y, Audano PA, Wang H, Walker M, Lowther C, Fu J, Gerstein MB, Devine SE, Marschall T, Korbel JO, Eichler EE, Chaisson MJP, Lee C, Mills RE, Brand H, Talkowski ME. Expectations and blind spots for structural variation detection from long-read assemblies and short-read genome sequencing technologies. Am J Hum Genet 2021; 108:919-928. [PMID: 33789087 PMCID: PMC8206509 DOI: 10.1016/j.ajhg.2021.03.014] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2020] [Accepted: 03/12/2021] [Indexed: 12/13/2022] Open
Abstract
Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives are reliant upon short-read whole-genome sequencing (srWGS), which presents challenges for the detection of structural variants (SVs) relative to emerging long-read WGS (lrWGS) technologies. Given this ubiquity of srWGS in large-scale genomics initiatives, we sought to establish expectations for routine SV detection from this data type by comparison with lrWGS assembly, as well as to quantify the genomic properties and added value of SVs uniquely accessible to each technology. Analyses from the Human Genome Structural Variation Consortium (HGSVC) of three families captured ~11,000 SVs per genome from srWGS and ~25,000 SVs per genome from lrWGS assembly. Detection power and precision for SV discovery varied dramatically by genomic context and variant class: 9.7% of the current GRCh38 reference is defined by segmental duplication (SD) and simple repeat (SR), yet 91.4% of deletions that were specifically discovered by lrWGS localized to these regions. Across the remaining 90.3% of reference sequence, we observed extremely high (93.8%) concordance between technologies for deletions in these datasets. In contrast, lrWGS was superior for detection of insertions across all genomic contexts. Given that non-SD/SR sequences encompass 95.9% of currently annotated disease-associated exons, improved sensitivity from lrWGS to discover novel pathogenic deletions in these currently interpretable genomic regions is likely to be incremental. However, these analyses highlight the considerable added value of assembly-based lrWGS to create new catalogs of insertions and transposable elements, as well as disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment.
Affiliation(s)
- Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Ryan L Collins
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Division of Medical Sciences, Harvard Medical School, Boston, MA 02115, USA
| | - Wan-Ping Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Alexandra M Weber
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
| | - Yukyung Jun
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Ben Weisburd
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Yongqing Huang
- Data Sciences Platform, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Harold Wang
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Mark Walker
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Chelsea Lowther
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Jack Fu
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Mark B Gerstein
- Yale University Medical School, Computational Biology and Bioinformatics Program, New Haven, CT 06520, USA
| | - Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, 69117 Heidelberg, Germany; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| | - Mark J P Chaisson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA; Department of Graduate Studies - Life Sciences, Ewha Womans University, 52, Ewhayeodae-gil, Seodaemun-gu, Seoul 03760, South Korea; Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, 277 West Yanta Road, Xi'an 710061, Shaanxi, People's Republic of China
| | - Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
| | - Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics and Stanley Center for Psychiatric Disorders, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Division of Medical Sciences, Harvard Medical School, Boston, MA 02115, USA.
|
38
|
Kojima S, Kamada AJ, Parrish NF. Virus-derived variation in diverse human genomes. PLoS Genet 2021; 17:e1009324. [PMID: 33901175 PMCID: PMC8101998 DOI: 10.1371/journal.pgen.1009324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2021] [Revised: 05/06/2021] [Accepted: 03/25/2021] [Indexed: 11/19/2022] Open
Abstract
Acquisition of genetic material from viruses by their hosts can generate inter-host structural genome variation. We developed computational tools enabling us to study virus-derived structural variants (SVs) in population-scale whole genome sequencing (WGS) datasets and applied them to 3,332 humans. Although SVs had already been cataloged in these subjects, we found previously-overlooked virus-derived SVs. We detected non-germline SVs derived from squirrel monkey retrovirus (SMRV), human immunodeficiency virus 1 (HIV-1), and human T lymphotropic virus (HTLV-1); these variants are attributable to infection of the sequenced lymphoblastoid cell lines (LCLs) or their progenitor cells and may impact gene expression results and the biosafety of experiments using these cells. In addition, we detected new heritable SVs derived from human herpesvirus 6 (HHV-6) and human endogenous retrovirus-K (HERV-K). We report the first solo-direct repeat (DR) HHV-6 likely to reflect DR rearrangement of a known full-length endogenous HHV-6. We used linkage disequilibrium between single nucleotide variants (SNVs) and variants in reads that align to HERV-K, which often cannot be mapped uniquely using conventional short-read sequencing analysis methods, to locate previously-unknown polymorphic HERV-K loci. Some of these loci are tightly linked to trait-associated SNVs, some are in complex genome regions inaccessible by prior methods, and some contain novel HERV-K haplotypes likely derived from gene conversion from an unknown source or introgression. These tools and results broaden our perspective on the coevolution between viruses and humans, including ongoing virus-to-human gene transfer contributing to genetic variation between humans.
Affiliation(s)
- Shohei Kojima
- Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
| | - Anselmo Jiro Kamada
- Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
| | - Nicholas F. Parrish
- Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences and RIKEN Cluster for Pioneering Research, Yokohama, Japan
|
39
|
Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R, Yilmaz F, Zhao X, Hsieh P, Lee J, Kumar S, Lin J, Rausch T, Chen Y, Ren J, Santamarina M, Höps W, Ashraf H, Chuang NT, Yang X, Munson KM, Lewis AP, Fairley S, Tallon LJ, Clarke WE, Basile AO, Byrska-Bishop M, Corvelo A, Evani US, Lu TY, Chaisson MJP, Chen J, Li C, Brand H, Wenger AM, Ghareghani M, Harvey WT, Raeder B, Hasenfeld P, Regier AA, Abel HJ, Hall IM, Flicek P, Stegle O, Gerstein MB, Tubio JMC, Mu Z, Li YI, Shi X, Hastie AR, Ye K, Chong Z, Sanders AD, Zody MC, Talkowski ME, Mills RE, Devine SE, Lee C, Korbel JO, Marschall T, Eichler EE. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 2021; 372:eabf7117. [PMID: 33632895 PMCID: PMC8026704 DOI: 10.1126/science.abf7117] [Citation(s) in RCA: 289] [Impact Index Per Article: 96.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2020] [Accepted: 02/09/2021] [Indexed: 12/14/2022]
Abstract
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
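The contiguity figure quoted in this abstract (the minimum contig length needed to cover 50% of the genome) is the statistic commonly called N50. The following is a minimal sketch of how such a value can be computed from a list of contig lengths. The function name `n50` and the optional `genome_size` parameter are illustrative assumptions and are not taken from the cited paper or its pipeline.

```python
def n50(contig_lengths, genome_size=None):
    """Return the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the target size.

    contig_lengths: iterable of positive integers (hypothetical input).
    genome_size: optional target size; when omitted, the total assembly
    length is used (the conventional N50). Passing an estimated genome
    size instead gives the variant usually called NG50.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
    raise ValueError("contigs do not cover half of the target size")
```

For example, for contigs of lengths 2, 2, 2, 3, 3, 4, 8, and 8 (32 in total), the two longest contigs already sum to 16, so the sketch returns 8.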
Affiliation(s)
- Peter Ebert
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Bernardo Rodriguez-Martin
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Marc Jan Bonder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Jana Ebler
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
| | - Rebecca Serra Mari
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Joyce Lee
- Bionano Genomics, San Diego, CA 92121, USA
| | - Sushant Kumar
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
| | - Tobias Rausch
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Yu Chen
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Jingwen Ren
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Martin Santamarina
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
| | - Wolfram Höps
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Hufsah Ashraf
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Nelson T Chuang
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
| | - Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Luke J Tallon
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
| | - Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Junjie Chen
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
| | - Chong Li
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
| | - Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Aaron M Wenger
- Pacific Biosciences of California, Menlo Park, CA 94025, USA
| | - Maryam Ghareghani
- Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, 66123 Saarbrücken, Germany
- Saarbrücken Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, 66123 Saarbrücken, Germany
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Benjamin Raeder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Allison A Regier
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
| | - Haley J Abel
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
| | - Ira M Hall
- Department of Genetics, Yale School of Medicine, 333 Cedar Street, New Haven, CT 06510, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Oliver Stegle
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Jose M C Tubio
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
| | - Zepeng Mu
- Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL 60637, USA
| | - Yang I Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
| | - Xinghua Shi
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
| | - Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
| | - Zechen Chong
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Ashley D Sanders
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
| | - Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA.
- Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, 277 West Yanta Road, Xi'an, 710061, Shaanxi, China
- Department of Graduate Studies-Life Sciences, Ewha Womans University, Ewhayeodae-gil, Seodaemun-gu, Seoul 120-750, South Korea
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Tobias Marschall
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany.
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| |
|
40
|
Wang Y, Bae T, Thorpe J, Sherman MA, Jones AG, Cho S, Daily K, Dou Y, Ganz J, Galor A, Lobon I, Pattni R, Rosenbluh C, Tomasi S, Tomasini L, Yang X, Zhou B, Akbarian S, Ball LL, Bizzotto S, Emery SB, Doan R, Fasching L, Jang Y, Juan D, Lizano E, Luquette LJ, Moldovan JB, Narurkar R, Oetjens MT, Rodin RE, Sekar S, Shin JH, Soriano E, Straub RE, Zhou W, Chess A, Gleeson JG, Marquès-Bonet T, Park PJ, Peters MA, Pevsner J, Walsh CA, Weinberger DR, Vaccarino FM, Moran JV, Urban AE, Kidd JM, Mills RE, Abyzov A. Comprehensive identification of somatic nucleotide variants in human brain tissue. Genome Biol 2021; 22:92. [PMID: 33781308 PMCID: PMC8006362 DOI: 10.1186/s13059-021-02285-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 09/22/2020] [Accepted: 02/01/2021] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND: Post-zygotic mutations incurred during DNA replication, DNA repair, and other cellular processes lead to somatic mosaicism. Somatic mosaicism is an established cause of various diseases, including cancers. However, detecting mosaic variants in DNA from non-cancerous somatic tissues poses significant challenges, particularly if the variants are present only in a small fraction of cells. RESULTS: Here, the Brain Somatic Mosaicism Network conducts a coordinated, multi-institutional study to examine the ability of existing methods to detect simulated somatic single-nucleotide variants (SNVs) in DNA mixing experiments, generate multiple replicates of whole-genome sequencing data from the dorsolateral prefrontal cortex, other brain regions, dura mater, and dural fibroblasts of a single neurotypical individual, devise strategies to discover somatic SNVs, and apply various approaches to validate somatic SNVs. These efforts lead to the identification of 43 bona fide somatic SNVs that range in variant allele fractions from ~0.005 to ~0.28. Guided by these results, we devise best practices for calling mosaic SNVs from 250× whole-genome sequencing data in the accessible portion of the human genome that achieve 90% specificity and sensitivity. Finally, we demonstrate that analysis of multiple bulk DNA samples from a single individual allows the reconstruction of early developmental cell lineage trees. CONCLUSIONS: This study provides a unified set of best practices to detect somatic SNVs in non-cancerous tissues. The data and methods are freely available to the scientific community and should serve as a guide to assess the contributions of somatic SNVs to neuropsychiatric diseases.
Affiliation(s)
- Yifan Wang
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI, 48109, USA
| | - Taejeong Bae
- Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, 55905, USA
| | - Jeremy Thorpe
- Program in Biochemistry, Cellular and Molecular Biology, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
| | - Maxwell A Sherman
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- MIT Department of Electrical Engineering and Computer Science, Cambridge, MA, USA
| | - Attila G Jones
- Department of Cell, Developmental and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Sean Cho
- Department of Neurology, Kennedy Krieger Institute, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Present Address: Arcus Biosciences, Hayward, CA, 94545, USA
| | | | - Yanmei Dou
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Javier Ganz
- Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, 02115, USA
- Departments of Neurology and Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Alon Galor
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Irene Lobon
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), PRBB, 08003, Barcelona, Catalonia, Spain
- Department of Cell Biology, Physiology and Immunology, and Institute of Neurosciences, University of Barcelona, 08028, Barcelona, Spain
| | - Reenal Pattni
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Chaggai Rosenbluh
- Department of Cell, Developmental and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Simone Tomasi
- Child Study Center, Yale University, New Haven, CT, 06520, USA
| | - Livia Tomasini
- Child Study Center, Yale University, New Haven, CT, 06520, USA
| | - Xiaoxu Yang
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
| | - Bo Zhou
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Schahram Akbarian
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Laurel L Ball
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
| | - Sara Bizzotto
- Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, 02115, USA
- Departments of Neurology and Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Sarah B Emery
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
| | - Ryan Doan
- Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, 02115, USA
- Departments of Neurology and Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Liana Fasching
- Child Study Center, Yale University, New Haven, CT, 06520, USA
| | - Yeongjun Jang
- Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, 55905, USA
| | - David Juan
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), PRBB, 08003, Barcelona, Catalonia, Spain
| | - Esther Lizano
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), PRBB, 08003, Barcelona, Catalonia, Spain
| | - Lovelace J Luquette
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - John B Moldovan
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
| | - Rujuta Narurkar
- Lieber Institute for Brain Development, Baltimore, MD, 21205, USA
| | - Matthew T Oetjens
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
| | - Rachel E Rodin
- Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, 02115, USA
- Departments of Neurology and Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Shobana Sekar
- Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, 55905, USA
| | - Joo Heon Shin
- Lieber Institute for Brain Development, Baltimore, MD, 21205, USA
- Department of Neurology, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Eduardo Soriano
- Department of Cell Biology, Physiology and Immunology, and Institute of Neurosciences, University of Barcelona, 08028, Barcelona, Spain
- Vall d'Hebron Institut de Recerca, 08035, Barcelona, Spain
- Centro de Investigación en Red sobre Enfermedades Neurodegenerativas (CIBERNED), 28031, Madrid, Spain
- ICREA Academia, 08010 Barcelona, Spain
| | - Richard E Straub
- Lieber Institute for Brain Development, Baltimore, MD, 21205, USA
| | - Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI, 48109, USA
| | - Andrew Chess
- Department of Cell, Developmental and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technologies, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Joseph G Gleeson
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
| | - Tomas Marquès-Bonet
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), PRBB, 08003, Barcelona, Catalonia, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), 08010, Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08036, Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, 08193, Cerdanyola del Vallès, Barcelona, Spain
| | - Peter J Park
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | | | - Jonathan Pevsner
- Department of Neurology, Kennedy Krieger Institute, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
| | - Christopher A Walsh
- Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, 02115, USA
- Departments of Neurology and Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Daniel R Weinberger
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Lieber Institute for Brain Development, Baltimore, MD, 21205, USA
- Department of Neurology, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Flora M Vaccarino
- Child Study Center, Yale University, New Haven, CT, 06520, USA
- Department of Neuroscience, Yale University, New Haven, CT, 06520, USA
| | - John V Moran
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
| | - Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Tashia and John Morgridge Faculty Scholar, Stanford Child Health Research Institute, Stanford, CA, 94305, USA
| | - Jeffrey M Kidd
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI, 48109, USA
| | - Ryan E Mills
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI, 48109, USA
| | - Alexej Abyzov
- Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, 55905, USA.
| |
|
41
|
Halo JV, Pendleton AL, Shen F, Doucet AJ, Derrien T, Hitte C, Kirby LE, Myers B, Sliwerska E, Emery S, Moran JV, Boyko AR, Kidd JM. Long-read assembly of a Great Dane genome highlights the contribution of GC-rich sequence and mobile elements to canine genomes. Proc Natl Acad Sci U S A 2021; 118:e2016274118. [PMID: 33836575 PMCID: PMC7980453 DOI: 10.1073/pnas.2016274118] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/30/2022] Open
Abstract
Technological advances have allowed improvements in genome reference sequence assemblies. Here, we combined long- and short-read sequence resources to assemble the genome of a female Great Dane dog. This assembly has improved continuity compared to the existing Boxer-derived (CanFam3.1) reference genome. Annotation of the Great Dane assembly identified 22,182 protein-coding gene models and 7,049 long noncoding RNAs, including 49 protein-coding genes not present in the CanFam3.1 reference. The Great Dane assembly spans the majority of sequence gaps in the CanFam3.1 reference and illustrates that 2,151 gaps overlap the transcription start site of a predicted protein-coding gene. Moreover, a subset of the resolved gaps, which have an 80.95% median GC content, localize to transcription start sites and recombination hotspots more often than expected by chance, suggesting the stable canine recombinational landscape has shaped genome architecture. Alignment of the Great Dane and CanFam3.1 assemblies identified 16,834 deletions and 15,621 insertions, as well as 2,665 deletions and 3,493 insertions located on secondary contigs. These structural variants are dominated by retrotransposon insertion/deletion polymorphisms and include 16,221 dimorphic canine short interspersed elements (SINECs) and 1,121 dimorphic long interspersed element-1 sequences (LINE-1_Cfs). Analysis of sequences flanking the 3' end of LINE-1_Cfs (i.e., LINE-1_Cf 3'-transductions) suggests multiple retrotransposition-competent LINE-1_Cfs segregate among dog populations. Consistent with this conclusion, we demonstrate that a canine LINE-1_Cf element with intact open reading frames can retrotranspose its own RNA and that of a SINEC_Cf consensus sequence in cultured human cells, implicating ongoing retrotransposon activity as a driver of canine genetic variation.
Affiliation(s)
- Julia V Halo
- Department of Biological Sciences, Bowling Green State University, Bowling Green, OH 43403
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
| | - Amanda L Pendleton
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
| | - Feichen Shen
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
| | - Aurélien J Doucet
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
- Université Côte d'Azur, CNRS, INSERM, Institut de Recherche sur le Cancer et le Vieillissement de Nice, F-06100 Nice, France
| | - Thomas Derrien
- Université de Rennes 1, CNRS, Institut de Génétique et Développement de Rennes-UMR 6290, F-35000 Rennes, France
| | - Christophe Hitte
- Université de Rennes 1, CNRS, Institut de Génétique et Développement de Rennes-UMR 6290, F-35000 Rennes, France
| | - Laura E Kirby
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
| | - Bridget Myers
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
| | - Elzbieta Sliwerska
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
| | - Sarah Emery
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
| | - John V Moran
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109
| | - Adam R Boyko
- Department of Biomedical Sciences, Cornell University, Ithaca, NY 14850
| | - Jeffrey M Kidd
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
| |
|
42
|
Liu S, Gao G, Layer RM, Thorgaard GH, Wiens GD, Leeds TD, Martin KE, Palti Y. Identification of High-Confidence Structural Variants in Domesticated Rainbow Trout Using Whole-Genome Sequencing. Front Genet 2021; 12:639355. [PMID: 33732289 PMCID: PMC7959816 DOI: 10.3389/fgene.2021.639355] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/08/2020] [Accepted: 02/08/2021] [Indexed: 12/14/2022] Open
Abstract
Genomic structural variants (SVs) are a major source of genetic and phenotypic variation but have not been investigated systematically in rainbow trout (Oncorhynchus mykiss), an important aquaculture species of cold freshwater. The objectives of this study were 1) to identify and validate high-confidence SVs in rainbow trout using whole-genome re-sequencing; and 2) to examine the contribution of transposable elements (TEs) to SVs in rainbow trout. A total of 96 rainbow trout, including 11 homozygous lines and 85 outbred fish from three breeding populations, were whole-genome sequenced with an average genome coverage of 17.2×. Putative SVs were identified using the program Smoove, which integrates LUMPY and other associated tools into one package. After rigorous filtering, 13,863 high-confidence SVs were identified. Pacific Biosciences long reads of Arlee, one of the homozygous lines used for SV detection, validated 98% (3,948 of 4,030) of the high-confidence SVs identified in the Arlee homozygous line. Based on principal component analysis, the 85 outbred fish clustered into three groups consistent with their populations of origin, further indicating that the high-confidence SVs identified in this study are robust. The repetitive DNA content of the high-confidence SV sequences was 86.5%, which is much higher than the 57.1% repetitive DNA content of the reference genome, and is also higher than the repetitive DNA content of Atlantic salmon SVs reported previously. TEs thus contribute substantially to SVs in rainbow trout, as TEs make up the majority of repetitive sequences. Hundreds of the high-confidence SVs were annotated as exon-loss or gene-fusion variants, and may have phenotypic effects. The high-confidence SVs reported in this study provide a foundation for further rainbow trout SV studies.
Affiliation(s)
- Sixin Liu
- National Center for Cool and Cold Water Aquaculture, Agricultural Research Service, United States Department of Agriculture, Kearneysville, WV, United States
| | - Guangtu Gao
- National Center for Cool and Cold Water Aquaculture, Agricultural Research Service, United States Department of Agriculture, Kearneysville, WV, United States
| | - Ryan M Layer
- BioFrontiers Institute, University of Colorado Boulder, Boulder, CO, United States
- Department of Computer Science, University of Colorado Boulder, Boulder, CO, United States
| | - Gary H Thorgaard
- Center for Reproductive Biology, School of Biological Sciences, Washington State University, Pullman, WA, United States
| | - Gregory D Wiens
- National Center for Cool and Cold Water Aquaculture, Agricultural Research Service, United States Department of Agriculture, Kearneysville, WV, United States
| | - Timothy D Leeds
- National Center for Cool and Cold Water Aquaculture, Agricultural Research Service, United States Department of Agriculture, Kearneysville, WV, United States
| | | | - Yniv Palti
- National Center for Cool and Cold Water Aquaculture, Agricultural Research Service, United States Department of Agriculture, Kearneysville, WV, United States
| |
|
43
|
Rusetska N, Kober P, Król SK, Boresowicz J, Maksymowicz M, Kunicki J, Bonicki W, Bujko M. Invasive and Noninvasive Nonfunctioning Gonadotroph Pituitary Tumors Differ in DNA Methylation Level of LINE-1 Repetitive Elements. J Clin Med 2021; 10:560. [PMID: 33546126 PMCID: PMC7913198 DOI: 10.3390/jcm10040560] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/24/2020] [Revised: 01/22/2021] [Accepted: 01/26/2021] [Indexed: 12/11/2022] Open
Abstract
PURPOSE: Epigenetic dysregulation plays a role in pituitary tumor pathogenesis. Some differences in DNA methylation were observed between invasive and noninvasive nonfunctioning gonadotroph tumors. This study sought to determine the role of DNA methylation changes in repetitive LINE-1 elements in nonfunctioning gonadotroph pituitary tumors. METHODS: We investigated LINE-1 methylation levels in 80 tumors and normal pituitary glands with bisulfite-pyrosequencing. Expression of two LINE-1 open reading frames (L1-ORF1 and L1-ORF2) was analyzed with qRT-PCR in tumor samples and mouse gonadotroph pituitary cells treated with DNA methyltransferase inhibitor. Immunohistochemical staining against L1-ORF1p was also performed in normal pituitary glands and tumors. RESULTS: Hypomethylation of LINE-1 was observed in pituitary tumors. Tumors characterized by invasive growth revealed lower LINE-1 methylation level than noninvasive ones. LINE-1 methylation correlated with overall DNA methylation assessed with HM450K arrays and negatively correlated with L1-ORF1 and L1-ORF2 expression. Treatment of αT3-1 gonadotroph cells with 5-Azacytidine clearly increased the level of L1-ORF1 and L1-ORF2 mRNA; however, its effect on LβT2 cells was less pronounced. Immunoreactivity against L1-ORF1p was higher in tumors than normal tissue. No difference in L1-ORF1p expression was observed in invasive and noninvasive tumors. CONCLUSION: Hypomethylation of LINE-1 is related to invasive growth and influences transcriptional activity of transposable elements.
Affiliation(s)
- Natalia Rusetska
- Department of Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
| | - Paulina Kober
- Department of Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
| | - Sylwia Katarzyna Król
- Department of Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
| | - Joanna Boresowicz
- Department of Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
| | - Maria Maksymowicz
- Department of Pathology and Laboratory Diagnostics, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
| | - Jacek Kunicki
- Department of Neurosurgery, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
| | - Wiesław Bonicki
- Department of Neurosurgery, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
| | - Mateusz Bujko
- Department of Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
| |
|
44
|
Zhu X, Zhou B, Pattni R, Gleason K, Tan C, Kalinowski A, Sloan S, Fiston-Lavier AS, Mariani J, Petrov D, Barres BA, Duncan L, Abyzov A, Vogel H, Moran JV, Vaccarino FM, Tamminga CA, Levinson DF, Urban AE. Machine learning reveals bilateral distribution of somatic L1 insertions in human neurons and glia. Nat Neurosci 2021; 24:186-196. [PMID: 33432196 PMCID: PMC8806165 DOI: 10.1038/s41593-020-00767-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/15/2019] [Accepted: 11/21/2020] [Indexed: 02/06/2023]
Abstract
Retrotransposons can cause somatic genome variation in the human nervous system, which is hypothesized to have relevance to brain development and neuropsychiatric disease. However, the detection of individual somatic mobile element insertions presents a difficult signal-to-noise problem. Using a machine-learning method (RetroSom) and deep whole-genome sequencing, we analyzed L1 and Alu retrotransposition in sorted neurons and glia from human brains. We characterized two brain-specific L1 insertions in neurons and glia from a donor with schizophrenia. There was anatomical distribution of the L1 insertions in neurons and glia across both hemispheres, indicating retrotransposition occurred during early embryogenesis. Both insertions were within the introns of genes (CNNM2 and FRMD4A) inside genomic loci associated with neuropsychiatric disorders. Proof-of-principle experiments revealed these L1 insertions significantly reduced gene expression. These results demonstrate that RetroSom has broad applications for studies of brain development and may provide insight into the possible pathological effects of somatic retrotransposition.
Affiliation(s)
- Xiaowei Zhu
- Department of Psychiatry and Behavioral Sciences, Stanford University, Palo Alto, CA, USA
- Department of Genetics, Stanford University, Palo Alto, CA, USA
| | - Bo Zhou
- Department of Psychiatry and Behavioral Sciences, Stanford University, Palo Alto, CA, USA
- Department of Genetics, Stanford University, Palo Alto, CA, USA
| | - Reenal Pattni
- Department of Psychiatry and Behavioral Sciences, Stanford University, Palo Alto, CA, USA
- Department of Genetics, Stanford University, Palo Alto, CA, USA
| | - Kelly Gleason
- Division of Translational Research in Schizophrenia, Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Chunfeng Tan
- Division of Translational Research in Schizophrenia, Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Agnieszka Kalinowski
- Department of Psychiatry and Behavioral Sciences, Stanford University, Palo Alto, CA, USA
| | - Steven Sloan
- Department of Human Genetics, Emory University, Atlanta, GA, USA
| | - Anna-Sophie Fiston-Lavier
- Institut des Sciences de l'Evolution de Montpellier (UMR 5554, CNRS-UM-IRD-EPHE), Université de Montpellier, Montpellier, France
| | | | - Dmitri Petrov
- Department of Biology, Stanford University, Palo Alto, CA, USA
| | - Ben A Barres
- Department of Neurobiology, Stanford University, Palo Alto, CA, USA
| | - Laramie Duncan
- Department of Psychiatry and Behavioral Sciences, Stanford University, Palo Alto, CA, USA
| | - Alexej Abyzov
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Hannes Vogel
- Department of Pathology, Stanford University, Palo Alto, CA, USA
| | - John V Moran
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, USA
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
| | - Flora M Vaccarino
- Child Study Center, Yale University, New Haven, CT, USA
- Department of Neuroscience, Yale School of Medicine, New Haven, CT, USA
| | - Carol A Tamminga
- Division of Translational Research in Schizophrenia, Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Douglas F Levinson
- Department of Psychiatry and Behavioral Sciences, Stanford University, Palo Alto, CA, USA
| | - Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University, Palo Alto, CA, USA
- Department of Genetics, Stanford University, Palo Alto, CA, USA
| |
|
45
|
Dayama G, Zhou W, Prado-Martinez J, Marques-Bonet T, Mills RE. Characterization of nuclear mitochondrial insertions in the whole genomes of primates. NAR Genom Bioinform 2020; 2:lqaa089. [PMID: 33575633 PMCID: PMC7671390 DOI: 10.1093/nargab/lqaa089] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/21/2020] [Revised: 05/04/2020] [Accepted: 10/15/2020] [Indexed: 12/30/2022] Open
Abstract
The transfer and integration of whole and partial mitochondrial genomes into the nuclear genomes of eukaryotes is an ongoing process that has facilitated the transfer of genes and contributed to the evolution of various cellular pathways. Many previous studies have explored the impact of these insertions, referred to as NumtS, but have focused primarily on older events that have become fixed and are therefore present in all individual genomes for a given species. We previously developed an approach to identify novel Numt polymorphisms from next-generation sequence data and applied it to thousands of human genomes. Here, we extend this analysis to 79 individuals of other great ape species including chimpanzee, bonobo, gorilla, orang-utan and also an old world monkey, macaque. We show that recent Numt insertions are prevalent in each species though at different apparent rates, with chimpanzees exhibiting a significant increase in both polymorphic and fixed Numt sequences as compared to other great apes. We further assessed positional effects in each species in terms of evolutionary time and rate of insertion and identified putative hotspots on chromosome 5 for Numt integration, providing insight into both recent polymorphic and older fixed reference NumtS in great apes in comparison to human events.
Affiliation(s)
- Gargi Dayama
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Weichen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Tomas Marques-Bonet
  - Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010, Barcelona, Spain
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/ Columnes s/n, 08193 Cerdanyola del Vallès, Barcelona, Spain
- Ryan E Mills
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
46
Evans TA, Erwin JA. Retroelement-derived RNA and its role in the brain. Semin Cell Dev Biol 2020; 114:68-80. [PMID: 33229216 DOI: 10.1016/j.semcdb.2020.11.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/13/2020] [Revised: 10/20/2020] [Accepted: 11/04/2020] [Indexed: 12/17/2022]
Abstract
Comprising ~40% of the human genome, retroelements are mobile genetic elements which are transcribed into RNA, then reverse-transcribed into DNA and inserted into a new site in the genome. Retroelements are referred to as "genetic parasites", residing among host genes and relying on host machinery for transcription and evolutionary propagation. The healthy brain has the highest expression of retroelement-derived sequences compared to other somatic tissue, which leads to the question: how does retroelement-derived RNA influence human traits and cellular states? While the functional importance of upregulating retroelement expression in the brain is an active area of research, RNA species derived from retroelements influence both self- and host gene expression by contributing to chromatin remodeling, alternative splicing, somatic mosaicism and translational repression. Here, we review the emerging evidence that the functional importance of RNA derived from retroelements is multifaceted. Retroelements can influence organismal states through the seeding of epigenetic states in chromatin, the production of structured RNA and even catalytically active ribozymes, the generation of cytoplasmic ssDNA and RNA/DNA hybrids, the production of viral-like proteins, and the generation of somatic mutations. Comparative sequencing suggests that retroelements can contribute to intraspecies variation through these mechanisms to alter transcript identity and abundance. In humans, an increasing number of neurodevelopmental and neurodegenerative conditions are associated with dysregulated retroelements, including Aicardi-Goutieres syndrome (AGS), Rett syndrome (RTT), Amyotrophic Lateral Sclerosis (ALS), Alzheimer's disease (AD), multiple sclerosis (MS), schizophrenia (SZ), and aging. Taken together, these concepts suggest a larger functional role for RNA derived from retroelements. 
This review aims to define retroelement-derived RNA, discuss how it impacts the mammalian genome, as well as summarize data supporting phenotypic consequences of this unique RNA subset in the brain.
Affiliation(s)
- Taylor A Evans
  - Lieber Institute for Brain Development, Baltimore, MD, USA; Department of Neurology, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Jennifer Ann Erwin
  - Lieber Institute for Brain Development, Baltimore, MD, USA; Department of Neurology, Johns Hopkins School of Medicine, Baltimore, MD, USA; Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
47
Ewing AD, Smits N, Sanchez-Luque FJ, Faivre J, Brennan PM, Richardson SR, Cheetham SW, Faulkner GJ. Nanopore Sequencing Enables Comprehensive Transposable Element Epigenomic Profiling. Mol Cell 2020; 80:915-928.e5. [PMID: 33186547 DOI: 10.1016/j.molcel.2020.10.024] [Citation(s) in RCA: 93] [Impact Index Per Article: 23.3] [Received: 06/29/2020] [Revised: 10/14/2020] [Accepted: 10/15/2020] [Indexed: 12/12/2022]
Abstract
Transposable elements (TEs) drive genome evolution and are a notable source of pathogenesis, including cancer. While CpG methylation regulates TE activity, the locus-specific methylation landscape of mobile human TEs has to date proven largely inaccessible. Here, we apply new computational tools and long-read nanopore sequencing to directly infer CpG methylation of novel and extant TE insertions in hippocampus, heart, and liver, as well as paired tumor and non-tumor liver. As opposed to an indiscriminate stochastic process, we find pronounced demethylation of young long interspersed element 1 (LINE-1) retrotransposons in cancer, often distinct to the adjacent genome and other TEs. SINE-VNTR-Alu (SVA) retrotransposons, including their internal tandem repeat-associated CpG island, are near-universally methylated. We encounter allele-specific TE methylation and demethylation of aberrantly expressed young LINE-1s in normal tissues. Finally, we recover the complete sequences of tumor-specific LINE-1 insertions and their retrotransposition hallmarks, demonstrating how long-read sequencing can simultaneously survey the epigenome and detect somatic TE mobilization.
Affiliation(s)
- Adam D Ewing
  - Mater Research Institute, University of Queensland, Woolloongabba, QLD 4102, Australia
- Nathan Smits
  - Mater Research Institute, University of Queensland, Woolloongabba, QLD 4102, Australia
- Francisco J Sanchez-Luque
  - GENYO, Pfizer-University of Granada-Andalusian Government Centre for Genomics and Oncological Research, PTS Granada 18016, Spain; MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine (IGMM), University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, UK
- Jamila Faivre
  - INSERM, U1193, Paul-Brousse University Hospital, Hepatobiliary Centre, Villejuif 94800, France
- Paul M Brennan
  - Translational Neurosurgery, Centre for Clinical Brain Sciences, Edinburgh EH16 4SB, UK
- Sandra R Richardson
  - Mater Research Institute, University of Queensland, Woolloongabba, QLD 4102, Australia
- Seth W Cheetham
  - Mater Research Institute, University of Queensland, Woolloongabba, QLD 4102, Australia
- Geoffrey J Faulkner
  - Mater Research Institute, University of Queensland, Woolloongabba, QLD 4102, Australia; Queensland Brain Institute, University of Queensland, St. Lucia, QLD 4067, Australia
48
Sekar S, Tomasini L, Proukakis C, Bae T, Manlove L, Jang Y, Scuderi S, Zhou B, Kalyva M, Amiri A, Mariani J, Sedlazeck FJ, Urban AE, Vaccarino FM, Abyzov A. Complex mosaic structural variations in human fetal brains. Genome Res 2020; 30:1695-1704. [PMID: 33122304 PMCID: PMC7706730 DOI: 10.1101/gr.262667.120] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/23/2020] [Accepted: 09/12/2020] [Indexed: 11/24/2022]
Abstract
Somatic mosaicism, manifesting as single nucleotide variants (SNVs), mobile element insertions, and structural changes in the DNA, is a common phenomenon in human brain cells, with potential functional consequences. Using a clonal approach, we previously detected 200-400 mosaic SNVs per cell in three human fetal brains (15-21 wk postconception). However, structural variation in the human fetal brain has not yet been investigated. Here, we discover and validate four mosaic structural variants (SVs) in the same brains and resolve their precise breakpoints. The SVs were of kilobase scale and complex, consisting of deletion(s) and rearranged genomic fragments, which sometimes originated from different chromosomes. Sequences at the breakpoints of these rearrangements had microhomologies, suggesting their origin from replication errors. One SV was found in two clones, and we timed its origin to ∼14 wk postconception. No large scale mosaic copy number variants (CNVs) were detectable in normal fetal human brains, suggesting that previously reported megabase-scale CNVs in neurons arise at later stages of development. By reanalysis of public single nuclei data from adult brain neurons, we detected an extrachromosomal circular DNA event. Our study reveals the existence of mosaic SVs in the developing human brain, likely arising from cell proliferation during mid-neurogenesis. Although relatively rare compared to SNVs and present in ∼10% of neurons, SVs in developing human brain affect a comparable number of bases in the genome (∼6200 vs. ∼4000 bp), implying that they may have similar functional consequences.
Affiliation(s)
- Shobana Sekar
  - Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
- Livia Tomasini
  - Child Study Center and Department of Neuroscience, Yale University, New Haven, Connecticut 06520, USA
- Christos Proukakis
  - Department of Clinical and Movement Neurosciences, Queen Square Institute of Neurology, University College London, London NW3 2PF, United Kingdom
- Taejeong Bae
  - Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
- Logan Manlove
  - Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
- Yeongjun Jang
  - Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
- Soraya Scuderi
  - Child Study Center and Department of Neuroscience, Yale University, New Haven, Connecticut 06520, USA
- Bo Zhou
  - Departments of Psychiatry and Genetics, Stanford University, Palo Alto, California 94305, USA
- Maria Kalyva
  - Department of Clinical and Movement Neurosciences, Queen Square Institute of Neurology, University College London, London NW3 2PF, United Kingdom
- Anahita Amiri
  - Child Study Center and Department of Neuroscience, Yale University, New Haven, Connecticut 06520, USA
- Jessica Mariani
  - Child Study Center and Department of Neuroscience, Yale University, New Haven, Connecticut 06520, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Alexander E Urban
  - Departments of Psychiatry and Genetics, Stanford University, Palo Alto, California 94305, USA
- Flora M Vaccarino
  - Child Study Center and Department of Neuroscience, Yale University, New Haven, Connecticut 06520, USA
- Alexej Abyzov
  - Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
49
Walsh T, Casadei S, Munson KM, Eng M, Mandell JB, Gulsuner S, King MC. CRISPR-Cas9/long-read sequencing approach to identify cryptic mutations in BRCA1 and other tumour suppressor genes. J Med Genet 2020; 58:850-852. [PMID: 33060287 PMCID: PMC8046837 DOI: 10.1136/jmedgenet-2020-107320] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 07/04/2020] [Revised: 08/24/2020] [Accepted: 08/27/2020] [Indexed: 01/07/2023]
Abstract
Current clinical approaches for mutation discovery are based on short sequence reads (100-300 bp) of exons and flanking splice sites targeted by multigene panels or whole exomes. Short-read sequencing is highly accurate for detection of single nucleotide variants, small indels and simple copy number differences but is of limited use for identifying complex insertions and deletions and other structural rearrangements. We used CRISPR-Cas9 to excise complete BRCA1 and BRCA2 genomic regions from lymphoblast cells of patients with breast cancer, then sequenced these regions with long reads (>10 000 bp) to fully characterise all non-coding regions for structural variation. In a family severely affected with early-onset bilateral breast cancer and with negative (normal) results by gene panel and exome sequencing, we identified an intronic SINE-VNTR-Alu retrotransposon insertion that led to the creation of a pseudoexon in the BRCA1 message and introduced a premature truncation. This combination of CRISPR-Cas9 excision and long-read sequencing reveals a class of complex, damaging and otherwise cryptic mutations that may be particularly frequent in tumour suppressor genes replete with intronic repeats.
Affiliation(s)
- Tom Walsh
  - Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington, Seattle, Washington, USA
- Silvia Casadei
  - Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington, Seattle, Washington, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Mary Eng
  - Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington, Seattle, Washington, USA
- Jessica B Mandell
  - Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington, Seattle, Washington, USA
- Suleyman Gulsuner
  - Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington, Seattle, Washington, USA
- Mary-Claire King
  - Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington, Seattle, Washington, USA
50
Cao X, Zhang Y, Payer LM, Lords H, Steranka JP, Burns KH, Xing J. Polymorphic mobile element insertions contribute to gene expression and alternative splicing in human tissues. Genome Biol 2020; 21:185. [PMID: 32718348 PMCID: PMC7385971 DOI: 10.1186/s13059-020-02101-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/18/2020] [Accepted: 07/14/2020] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND: Mobile elements are a major source of structural variants in the human genome, and some mobile elements can regulate gene expression and transcript splicing. However, the impact of polymorphic mobile element insertions (pMEIs) on gene expression and splicing in diverse human tissues has not been thoroughly studied. The multi-tissue gene expression and whole genome sequencing data generated by the Genotype-Tissue Expression (GTEx) project provide a great opportunity to systematically evaluate the role of pMEIs in regulating gene expression in human tissues.
RESULTS: Using the GTEx whole genome sequencing data, we identify 20,545 high-quality pMEIs from 639 individuals. Coupling pMEI genotypes with gene expression profiles, we identify pMEI-associated expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) in 48 tissues. Using joint analyses of pMEIs and other genomic variants, pMEIs are predicted to be the potential causal variant for 3522 eQTLs and 3717 sQTLs. The pMEI-associated eQTLs and sQTLs show a high level of tissue specificity, and these pMEIs are enriched in the proximity of affected genes and in regulatory elements. Using reporter assays, we confirm that several pMEIs associated with eQTLs and sQTLs can alter gene expression levels and isoform proportions, respectively.
CONCLUSION: Overall, our study shows that pMEIs are associated with thousands of gene expression and splicing variations, indicating that pMEIs could have a significant role in regulating tissue-specific gene expression and transcript splicing. Detailed mechanisms for the role of pMEIs in gene regulation in different tissues will be an important direction for future studies.
Affiliation(s)
- Xiaolong Cao
  - Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Yeting Zhang
  - Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
  - Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Lindsay M Payer
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Hannah Lords
  - Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Jared P Steranka
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Kathleen H Burns
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Jinchuan Xing
  - Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
  - Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA