1. Chu C, Ljungström V, Tran A, Jin H, Park PJ. Contribution of de novo retroelements to birth defects and childhood cancers. medRxiv 2024:2024.04.15.24305733. [PMID: 38699361 PMCID: PMC11065029 DOI: 10.1101/2024.04.15.24305733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
Insertion of active retroelements (L1s, Alus, and SVAs) can disrupt proper genome function and lead to various disorders including cancer. However, the role of de novo retroelements (DNRTs) in birth defects and childhood cancers has not been well characterized due to the lack of adequate data and efficient computational tools. Here, we examine whole-genome sequencing data of 3,244 trios from 12 birth defect and childhood cancer cohorts in the Gabriella Miller Kids First Pediatric Research Program. Using an improved version of our tool xTea (x-Transposable element analyzer) that incorporates a deep-learning module, we identified 162 DNRTs, as well as 2 pseudogene insertions. Several variants are likely to be causal, such as a de novo Alu insertion that led to the ablation of a whole exon in the NF1 gene in a proband with a brain tumor. We observe a high de novo SVA insertion burden in both high-intolerance loss-of-function genes and exons as well as more frequent de novo Alu insertions of paternal origin. We also identify potential mosaic DNRTs from embryonic stages. Our study reveals the important roles of DNRTs in causing birth defects and predisposition to childhood cancers.
Affiliation(s)
- Chong Chu
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Viktor Ljungström
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Antuan Tran
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Hu Jin
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Peter J. Park
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
2. Yan Y, Tian Y, Wu Z, Zhang K, Yang R. Interchromosomal Colocalization with Parental Genes Is Linked to the Function and Evolution of Mammalian Retrocopies. Mol Biol Evol 2023; 40:msad265. [PMID: 38060983 PMCID: PMC10733166 DOI: 10.1093/molbev/msad265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 10/25/2023] [Accepted: 11/29/2023] [Indexed: 12/22/2023] Open
Abstract
Retrocopies are gene duplicates arising from reverse transcription of mature mRNA transcripts and their insertion back into the genome. Although retrocopies were long regarded as processed pseudogenes, a growing number of functional retrocopies have been discovered. How the stripped-down retrocopies recover expression capability and become functional paralogs continually intrigues evolutionary biologists. Here, we investigated the function and evolution of retrocopies in the context of 3D genome organization. By mapping retrocopy-parent pairs onto sequencing-based and imaging-based chromatin contact maps in human and mouse cell lines and onto Hi-C interaction maps in 5 other mammals, we found that retrocopies and their parental genes show a higher-than-expected interchromosomal colocalization frequency. The spatial interactions between retrocopies and parental genes occur frequently at loci in active subcompartments and near nuclear speckles. Accordingly, colocalized retrocopies are more actively transcribed and translated and are more evolutionarily conserved than noncolocalized ones. The active transcription of colocalized retrocopies may result from their permissive epigenetic environment and shared regulatory elements with parental genes. Population genetic analysis of retroposed gene copy number variants in human populations revealed that retrocopy insertions are not entirely random in regard to interchromosomal interactions and that colocalized retroposed gene copy number variants are more likely to reach high frequencies, suggesting that both insertion bias and natural selection contribute to the colocalization of retrocopy-parent pairs. Further dissection implies that reduced selection efficacy, rather than positive selection, contributes to the elevated allele frequency of colocalized retroposed gene copy number variants. Overall, our results hint at a role of interchromosomal colocalization in the "resurrection" of initially neutral retrocopies.
Affiliation(s)
- Yubin Yan
  - College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
- Yuhan Tian
  - College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
- Zefeng Wu
  - College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
- Kunling Zhang
  - College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
- Ruolin Yang
  - College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
3. Zhou B, He Y, Chen Y, Su B. Comparative Genomic Analysis Identifies Great-Ape-Specific Structural Variants and Their Evolutionary Relevance. Mol Biol Evol 2023; 40:msad184. [PMID: 37565562 PMCID: PMC10461412 DOI: 10.1093/molbev/msad184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 08/01/2023] [Accepted: 08/10/2023] [Indexed: 08/12/2023] Open
Abstract
During the origin of great apes about 14 million years ago, a series of phenotypic innovations emerged, such as increased body size, enlarged brain volume, improved cognitive skills, and a diversified diet. Yet, the genomic basis of these evolutionary changes remains unclear. Utilizing the high-quality genome assemblies of great apes (including human), gibbon, and macaque, we conducted comparative genome analyses and identified 15,885 great ape-specific structural variants (GSSVs), including eight coding GSSVs resulting in the creation of novel proteins (e.g., ACAN and CMYA5). Functional annotations of the GSSV-related genes revealed the enrichment of genes involved in development and morphogenesis, especially neurogenesis and neural network formation, suggesting the potential role of GSSVs in shaping the great ape-shared traits. Further dissection of the brain-related GSSVs shows great ape-specific changes in enhancer activities and gene expression in the brain, involving a group of GSSV-regulated genes (such as NOL3) that potentially contribute to the altered brain development and function in great apes. The presented data highlight the evolutionary role of structural variants in the phenotypic innovations during the origin of the great ape lineage.
Affiliation(s)
- Bin Zhou
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
  - National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
  - Kunming College of Life Science, University of Chinese Academy of Sciences, Beijing, China
- Yaoxi He
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
  - National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Yongjie Chen
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
  - National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
  - Kunming College of Life Science, University of Chinese Academy of Sciences, Beijing, China
- Bing Su
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
  - National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
4. Batcher K, Varney S, Raudsepp T, Jevit M, Dickinson P, Jagannathan V, Leeb T, Bannasch D. Ancient segmentally duplicated LCORL retrocopies in equids. PLoS One 2023; 18:e0286861. [PMID: 37289743 PMCID: PMC10249811 DOI: 10.1371/journal.pone.0286861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 05/25/2023] [Indexed: 06/10/2023] Open
Abstract
LINE-1 is an active transposable element encoding proteins capable of inserting host gene retrocopies, resulting in retro-copy number variants (retroCNVs) between individuals. Here, we performed retroCNV discovery using 86 equids and identified 437 retrocopy insertions. Only 5 retroCNVs were shared between horses and other equids, indicating that the majority of retroCNVs inserted after the species diverged. A large number (17-35 copies) of segmentally duplicated Ligand Dependent Nuclear Receptor Corepressor Like (LCORL) retrocopies were present in all equids but absent from other extant perissodactyls. The majority of LCORL transcripts in horses and donkeys originate from the retrocopies. The initial LCORL retrotransposition occurred 18 million years ago (95% CI: 17-19), which is coincident with the increase in body size, reduction in digit number, and changes in dentition that characterized equid evolution. Evolutionary conservation of the LCORL retrocopy segmental amplification in the Equidae family, high expression levels, and the ancient timeline for LCORL retrotransposition support a functional role for this structural variant.
Affiliation(s)
- Kevin Batcher
  - Department of Population Health and Reproduction, University of California Davis, Davis, CA, United States of America
- Scarlett Varney
  - Department of Population Health and Reproduction, University of California Davis, Davis, CA, United States of America
- Terje Raudsepp
  - Veterinary Integrative Biosciences, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas, United States of America
- Matthew Jevit
  - Veterinary Integrative Biosciences, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas, United States of America
- Peter Dickinson
  - Department of Surgical and Radiological Sciences, University of California Davis, Davis, CA, United States of America
- Vidhya Jagannathan
  - Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland
- Tosso Leeb
  - Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland
- Danika Bannasch
  - Department of Population Health and Reproduction, University of California Davis, Davis, CA, United States of America
5. Batcher K, Varney S, York D, Blacksmith M, Kidd JM, Rebhun R, Dickinson P, Bannasch D. Recent, full-length gene retrocopies are common in canids. Genome Res 2022; 32:gr.276828.122. [PMID: 35961775 PMCID: PMC9435743 DOI: 10.1101/gr.276828.122] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/08/2022] [Accepted: 07/19/2022] [Indexed: 02/03/2023]
Abstract
Gene retrocopies arise from the reverse transcription and insertion into the genome of processed mRNA transcripts. Although many retrocopies have acquired mutations that render them functionally inactive, most mammals retain active LINE-1 sequences capable of producing new retrocopies. New retrocopies, referred to as retro copy number variants (retroCNVs), may not be identified by standard variant calling techniques in high-throughput sequencing data. Although multiple functional FGF4 retroCNVs have been associated with skeletal dysplasias in dogs, the full landscape of canid retroCNVs has not been characterized. Here, retroCNV discovery was performed on a whole-genome sequencing data set of 293 canids from 76 breeds. We identified retroCNV parent genes via the presence of mRNA-specific 30-mers, and then identified retroCNV insertion sites through discordant read analysis. In total, we resolved insertion sites for 1911 retroCNVs from 1179 parent genes, 1236 of which appeared identical to their parent genes. Dogs had on average 54.1 total retroCNVs and 1.4 private retroCNVs. We found evidence of expression in testes for 12% (14/113) of the retroCNVs identified in six Golden Retrievers, including four chimeric transcripts, and 97 retroCNVs also had significantly elevated F_ST across dog breeds, possibly indicating selection. We applied our approach to a subset of human genomes and detected an average of 4.2 retroCNVs per sample, highlighting a 13-fold relative increase of retroCNV frequency in dogs. Particularly in canids, retroCNVs are a largely unexplored source of genetic variation that can contribute to genome plasticity and that should be considered when investigating traits and diseases.
Affiliation(s)
- Kevin Batcher
  - Department of Population Health and Reproduction, University of California, Davis, Davis, California 95616, USA
- Scarlett Varney
  - Department of Population Health and Reproduction, University of California, Davis, Davis, California 95616, USA
- Daniel York
  - Department of Surgical and Radiological Sciences, University of California, Davis, Davis, California 95616, USA
- Matthew Blacksmith
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, Michigan 48109, USA
- Jeffrey M Kidd
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, Michigan 48109, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan 48109, USA
- Robert Rebhun
  - Department of Surgical and Radiological Sciences, University of California, Davis, Davis, California 95616, USA
- Peter Dickinson
  - Department of Surgical and Radiological Sciences, University of California, Davis, Davis, California 95616, USA
- Danika Bannasch
  - Department of Population Health and Reproduction, University of California, Davis, Davis, California 95616, USA
6. Nsengimana B, Khan FA, Awan UA, Wang D, Fang N, Wei W, Zhang W, Ji S. Pseudogenes and Liquid Phase Separation in Epigenetic Expression. Front Oncol 2022; 12:912282. [PMID: 35875144 PMCID: PMC9305658 DOI: 10.3389/fonc.2022.912282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 06/13/2022] [Indexed: 11/24/2022] Open
Abstract
Pseudogenes have long been considered non-functional genes. However, peptides and long non-coding RNAs produced by pseudogenes are expressed in different tumors. Moreover, the dysregulation of pseudogenes is associated with cancer, and their expression is higher in tumors than in normal tissues. Recent studies show that pseudogenes can influence the formation of liquid-phase condensates. Liquid phase separation is involved in regulating different epigenetic stages, including transcription, chromatin organization, 3D DNA structure, splicing, and post-transcriptional modifications such as m6A. Several membrane-less organelles, formed through liquid phase separation, are also involved in epigenetic regulation, and their defects are associated with cancer development. However, the association between pseudogenes and liquid phase separation remains unclear. The current study sought to investigate the relationship between pseudogenes and liquid phase separation in cancer development, as well as their therapeutic implications.
Affiliation(s)
- Bernard Nsengimana
  - Laboratory of Cell Signal Transduction, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Henan University, Kaifeng, China
- Faiz Ali Khan
  - Laboratory of Cell Signal Transduction, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Henan University, Kaifeng, China
  - School of Life Sciences, Henan University, Kaifeng, China
  - Department of Basic Sciences Research, Shaukat Khanum Memorial Cancer Hospital and Research Centre (SKMCH&RC), Lahore, Pakistan
- Usman Ayub Awan
  - Department of Medical Laboratory Technology, The University of Haripur, Haripur, Pakistan
- Dandan Wang
  - Laboratory of Cell Signal Transduction, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Henan University, Kaifeng, China
- Na Fang
  - Laboratory of Cell Signal Transduction, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Henan University, Kaifeng, China
- Wenqiang Wei
  - Laboratory of Cell Signal Transduction, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Henan University, Kaifeng, China
- Weijuan Zhang
  - Laboratory of Cell Signal Transduction, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Henan University, Kaifeng, China
- Shaoping Ji
  - Laboratory of Cell Signal Transduction, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Henan University, Kaifeng, China
*Correspondence: Wenqiang Wei; Weijuan Zhang; Shaoping Ji
7. Leonard AS, Crysnanto D, Fang ZH, Heaton MP, Vander Ley BL, Herrera C, Bollwein H, Bickhart DM, Kuhn KL, Smith TPL, Rosen BD, Pausch H. Structural variant-based pangenome construction has low sensitivity to variability of haplotype-resolved bovine assemblies. Nat Commun 2022; 13:3012. [PMID: 35641504 PMCID: PMC9156671 DOI: 10.1038/s41467-022-30680-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/06/2021] [Accepted: 05/10/2022] [Indexed: 12/12/2022] Open
Abstract
Advantages of pangenomes over linear reference assemblies for genome research have recently been established. However, potential effects of sequence platform and assembly approach, or of combining assemblies created by different approaches, on pangenome construction have not been investigated. Here we generate haplotype-resolved assemblies from the offspring of three bovine trios representing increasing levels of heterozygosity that each demonstrate a substantial improvement in contiguity, completeness, and accuracy over the current Bos taurus reference genome. Diploid coverage as low as 20x for HiFi or 60x for ONT is sufficient to produce two haplotype-resolved assemblies meeting standards set by the Vertebrate Genomes Project. Structural variant-based pangenomes created from the haplotype-resolved assemblies demonstrate significant consensus regardless of sequence platform, assembler algorithm, or coverage. Inspecting pangenome topologies identifies 90 thousand structural variants including 931 overlapping with coding sequences; this approach reveals variants affecting QRICH2, PRDM9, HSPA1A, TAS2R46, and GC that have potential to affect phenotype.
Affiliation(s)
- Alexander S Leonard
  - Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8006, Zurich, Switzerland
- Danang Crysnanto
  - Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8006, Zurich, Switzerland
- Zih-Hua Fang
  - Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8006, Zurich, Switzerland
- Michael P Heaton
  - U.S. Meat Animal Research Center, USDA-ARS, 844 Road 313, Clay Center, NE, 68933, USA
- Brian L Vander Ley
  - Great Plains Veterinary Educational Center, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Carolina Herrera
  - Clinic of Reproductive Medicine, Department for Farm Animals, University of Zurich, 8057, Zurich, Switzerland
- Heinrich Bollwein
  - Clinic of Reproductive Medicine, Department for Farm Animals, University of Zurich, 8057, Zurich, Switzerland
- Derek M Bickhart
  - Dairy Forage Research Center, USDA-ARS, 1925 Linden Drive, Madison, WI, 53706, USA
- Kristen L Kuhn
  - U.S. Meat Animal Research Center, USDA-ARS, 844 Road 313, Clay Center, NE, 68933, USA
- Timothy P L Smith
  - U.S. Meat Animal Research Center, USDA-ARS, 844 Road 313, Clay Center, NE, 68933, USA
- Benjamin D Rosen
  - Animal Genomics and Improvement Laboratory, USDA-ARS, 10300 Baltimore Ave, Beltsville, MD, 20705, USA
- Hubert Pausch
  - Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8006, Zurich, Switzerland
8. Fawcett KA, Demidov G, Shrine N, Paynton ML, Ossowski S, Sayers I, Wain LV, Hollox EJ. Exome-wide analysis of copy number variation shows association of the human leukocyte antigen region with asthma in UK Biobank. BMC Med Genomics 2022; 15:119. [PMID: 35597955 PMCID: PMC9124406 DOI: 10.1186/s12920-022-01268-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/27/2022] [Accepted: 05/10/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The role of copy number variants (CNVs) in susceptibility to asthma is not well understood. This is, in part, due to the difficulty of accurately measuring CNVs in large enough sample sizes to detect associations. The recent availability of whole-exome sequencing (WES) in large biobank studies provides an unprecedented opportunity to study the role of CNVs in asthma.
METHODS: We called common CNVs in 49,953 individuals in the first release of UK Biobank WES using ClinCNV software. CNVs were tested for association with asthma in a stage 1 analysis comprising 7,098 asthma cases and 36,578 controls from the first release of sequencing data. Nominally-associated CNVs were then meta-analysed in stage 2 with an additional 17,280 asthma cases and 115,562 controls from the second release of UK Biobank exome sequencing, followed by validation and fine-mapping.
RESULTS: Five of 189 CNVs were associated with asthma in stage 2, including a deletion overlapping the HLA-DQA1 and HLA-DQB1 genes, a duplication of CHROMR/PRKRA, deletions within MUC22 and TAP2, and a duplication in FBRSL1. The HLA-DQA1, HLA-DQB1, MUC22 and TAP2 genes all reside within the human leukocyte antigen (HLA) region on chromosome 6. In silico analyses demonstrated that the deletion overlapping HLA-DQA1 and HLA-DQB1 is likely to be an artefact arising from under-mapping of reads from non-reference HLA haplotypes, and that the CHROMR/PRKRA and FBRSL1 duplications represent presence/absence of pseudogenes within the HLA region. Bayesian fine-mapping of the HLA region suggested that there are two independent asthma association signals. The variants with the largest posterior inclusion probability in the two credible sets were an amino acid change in HLA-DQB1 (glutamine to histidine at residue 253) and a multi-allelic amino acid change in HLA-DRB1 (presence/absence of serine, glycine or leucine at residue 11).
CONCLUSIONS: At least two independent loci characterised by amino acid changes in the HLA-DQA1, HLA-DQB1 and HLA-DRB1 genes are likely to account for association of SNPs and CNVs in this region with asthma. The high divergence of haplotypes in the HLA can give rise to spurious CNVs, providing an important, cautionary tale for future large-scale analyses of sequencing data.
Affiliation(s)
- Katherine A Fawcett
  - Department of Health Sciences, University of Leicester, Leicester, LE1 7RH, UK
- German Demidov
  - Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Nick Shrine
  - Department of Health Sciences, University of Leicester, Leicester, LE1 7RH, UK
- Megan L Paynton
  - Department of Health Sciences, University of Leicester, Leicester, LE1 7RH, UK
- Stephan Ossowski
  - Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Ian Sayers
  - Translational Medical Sciences, NIHR Respiratory Biomedical Research Centre, School of Medicine, Biodiscovery Institute, University of Nottingham, University Park, Nottingham, UK
- Louise V Wain
  - Department of Health Sciences, University of Leicester, Leicester, LE1 7RH, UK
  - Leicester Respiratory Biomedical Research Centre, National Institute for Health Research, Glenfield Hospital, Leicester, LE3 9QP, UK
- Edward J Hollox
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
9. Comprehensive identification of transposable element insertions using multiple sequencing technologies. Nat Commun 2021; 12:3836. [PMID: 34158502 PMCID: PMC8219666 DOI: 10.1038/s41467-021-24041-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 07/10/2020] [Accepted: 05/27/2021] [Indexed: 02/05/2023] Open
Abstract
Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.