1
Podvalnyi A, Kopernik A, Sayganova M, Woroncow M, Zobkova G, Smirnova A, Esibov A, Deviatkin A, Volchkov P, Albert E. Quantitative Analysis of Pseudogene-Associated Errors During Germline Variant Calling. Int J Mol Sci 2025; 26:363. [PMID: 39796219 PMCID: PMC11719938 DOI: 10.3390/ijms26010363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2024] [Revised: 12/30/2024] [Accepted: 01/02/2025] [Indexed: 01/13/2025] Open
Abstract
A pseudogene is a non-functional copy of a protein-coding gene. Processed pseudogenes, a major pseudogene class created by reverse transcription of mRNA and subsequent integration of the resulting cDNA into the genome, represent a significant challenge in genome analysis because of their high sequence similarity to the parent genes and their frequent absence from the reference genome. This homology can lead to errors in variant identification, as sequences derived from processed pseudogenes can be incorrectly assigned to parental genes, complicating correct variant calling. In this study, we quantified the occurrence of pseudogene-associated variant calling errors generated by the most popular germline variant callers, namely GATK-HC, DRAGEN, and DeepVariant, when analysing 30x human whole-genome sequencing data (n = 13,307). The results show that the presence of pseudogenes can interfere with variant calling, leading to false positive identifications of potentially clinically relevant variants. Compared to other approaches, DeepVariant was the most effective in correcting these errors.
Affiliation(s)
- Artem Podvalnyi
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia
  - Faculty of Computer Science, HSE University, 101000 Moscow, Russia
- Arina Kopernik
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia
- Mariia Sayganova
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia
- Mary Woroncow
  - Faculty of Fundamental Medicine, Lomonosov Moscow State University, 119991 Moscow, Russia
- Anton Esibov
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia
- Andrey Deviatkin
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia
- Pavel Volchkov
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia
  - Faculty of Fundamental Medicine, Lomonosov Moscow State University, 119991 Moscow, Russia
- Eugene Albert
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia
  - Faculty of Fundamental Medicine, Lomonosov Moscow State University, 119991 Moscow, Russia

2
Yan Y, Tian Y, Wu Z, Zhang K, Yang R. Interchromosomal Colocalization with Parental Genes Is Linked to the Function and Evolution of Mammalian Retrocopies. Mol Biol Evol 2023; 40:msad265. [PMID: 38060983 PMCID: PMC10733166 DOI: 10.1093/molbev/msad265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 10/25/2023] [Accepted: 11/29/2023] [Indexed: 12/22/2023] Open
Abstract
Retrocopies are gene duplicates arising from reverse transcription of mature mRNA transcripts and their insertion back into the genome. Although retrocopies were long regarded as processed pseudogenes, more and more functional retrocopies have been discovered. How the stripped-down retrocopies recover expression capability and become functional paralogs continually intrigues evolutionary biologists. Here, we investigated the function and evolution of retrocopies in the context of 3D genome organization. By mapping retrocopy-parent pairs onto sequencing-based and imaging-based chromatin contact maps in human and mouse cell lines and onto Hi-C interaction maps in 5 other mammals, we found that retrocopies and their parental genes show a higher-than-expected interchromosomal colocalization frequency. The spatial interactions between retrocopies and parental genes occur frequently at loci in active subcompartments and near nuclear speckles. Accordingly, colocalized retrocopies are more actively transcribed and translated and are more evolutionarily conserved than noncolocalized ones. The active transcription of colocalized retrocopies may result from their permissive epigenetic environment and shared regulatory elements with parental genes. Population genetic analysis of retroposed gene copy number variants in human populations revealed that retrocopy insertions are not entirely random in regard to interchromosomal interactions and that colocalized retroposed gene copy number variants are more likely to reach high frequencies, suggesting that both insertion bias and natural selection contribute to the colocalization of retrocopy-parent pairs. Further dissection implies that reduced selection efficacy, rather than positive selection, contributes to the elevated allele frequency of colocalized retroposed gene copy number variants. Overall, our results hint at a role of interchromosomal colocalization in the "resurrection" of initially neutral retrocopies.
Affiliation(s)
- Yubin Yan
  - College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
- Yuhan Tian
  - College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
- Zefeng Wu
  - College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
- Kunling Zhang
  - College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
- Ruolin Yang
  - College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China

3
Batcher K, Varney S, Raudsepp T, Jevit M, Dickinson P, Jagannathan V, Leeb T, Bannasch D. Ancient segmentally duplicated LCORL retrocopies in equids. PLoS One 2023; 18:e0286861. [PMID: 37289743 PMCID: PMC10249811 DOI: 10.1371/journal.pone.0286861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 05/25/2023] [Indexed: 06/10/2023] Open
Abstract
LINE-1 is an active transposable element encoding proteins capable of inserting host gene retrocopies, resulting in retro-copy number variants (retroCNVs) between individuals. Here, we performed retroCNV discovery using 86 equids and identified 437 retrocopy insertions. Only 5 retroCNVs were shared between horses and other equids, indicating that the majority of retroCNVs inserted after the species diverged. A large number (17-35 copies) of segmentally duplicated Ligand Dependent Nuclear Receptor Corepressor Like (LCORL) retrocopies were present in all equids but absent from other extant perissodactyls. The majority of LCORL transcripts in horses and donkeys originate from the retrocopies. The initial LCORL retrotransposition occurred 18 million years ago (95% CI: 17-19), which is coincident with the increase in body size, reduction in digit number, and changes in dentition that characterized equid evolution. Evolutionary conservation of the LCORL retrocopy segmental amplification in the Equidae family, high expression levels, and the ancient timeline for LCORL retrotransposition support a functional role for this structural variant.
Affiliation(s)
- Kevin Batcher
  - Department of Population Health and Reproduction, University of California Davis, Davis, CA, United States of America
- Scarlett Varney
  - Department of Population Health and Reproduction, University of California Davis, Davis, CA, United States of America
- Terje Raudsepp
  - Veterinary Integrative Biosciences, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas, United States of America
- Matthew Jevit
  - Veterinary Integrative Biosciences, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas, United States of America
- Peter Dickinson
  - Department of Surgical and Radiological Sciences, University of California Davis, Davis, CA, United States of America
- Vidhya Jagannathan
  - Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland
- Tosso Leeb
  - Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland
- Danika Bannasch
  - Department of Population Health and Reproduction, University of California Davis, Davis, CA, United States of America

4
Ten Berk de Boer E, Bilgrav Saether K, Eisfeldt J. Discovery of non-reference processed pseudogenes in the Swedish population. Front Genet 2023; 14:1176626. [PMID: 37323659 PMCID: PMC10267823 DOI: 10.3389/fgene.2023.1176626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 05/19/2023] [Indexed: 06/17/2023] Open
Abstract
The vast majority of the human genome is non-coding. There is a diversity of non-coding features, some of which have functional importance. Although the non-coding regions constitute the majority of the genome, they remain understudied, and for a long time, these regions have been referred to as junk DNA. Pseudogenes are one of these features. A pseudogene is a non-functional copy of a protein-coding gene. Pseudogenes may arise through a variety of genetic mechanisms. Processed pseudogenes are formed through reverse transcription of mRNA by LINE elements, after which the cDNA is integrated into the genome. Processed pseudogenes are known to be variable across populations; however, their variability and distribution remain unknown. Herein, we apply a custom-designed processed pseudogene pipeline to the whole genome sequencing data of 3,500 individuals: 2,500 individuals from the 1000 Genomes dataset, as well as 1,000 Swedish individuals. Through these analyses, we discover over 3,000 pseudogenes missing from the GRCh38 reference. Utilising our pipeline, we position 74% of the detected processed pseudogenes, allowing for analyses of formation. Notably, we find that common structural variant callers, such as Delly, classify the processed pseudogenes as deletion events, which are later predicted to be truncating variants. By compiling lists of non-reference processed pseudogenes and their frequencies, we find a great variability of pseudogenes, indicating that non-reference processed pseudogenes may be useful for DNA testing and as population-specific markers. In summary, our findings highlight a great diversity of processed pseudogenes, show that processed pseudogenes are actively formed in the human genome, and indicate that our pipeline may be used to reduce false positive structural variation caused by the misalignment and subsequent misclassification of non-reference processed pseudogenes.
Affiliation(s)
- Esmee Ten Berk de Boer
  - Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Kristine Bilgrav Saether
  - Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Science for Life Laboratory, Karolinska Institutet Science Park, Solna, Sweden
- Jesper Eisfeldt
  - Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
  - Science for Life Laboratory, Karolinska Institutet Science Park, Solna, Sweden
  - Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden

5
Whole Genome Analysis of Dizygotic Twins With Autism Reveals Prevalent Transposon Insertion Within Neuronal Regulatory Elements: Potential Implications for Disease Etiology and Clinical Assessment. J Autism Dev Disord 2023; 53:1091-1106. [PMID: 35759154 DOI: 10.1007/s10803-022-05636-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/03/2022] [Indexed: 10/17/2022]
Abstract
Transposable elements (TEs) have been implicated in autism spectrum disorder (ASD). However, our understanding of their roles is far from complete. Herein, we explored de novo TE insertions (dnTEIs) and de novo variants (DNVs) across the genomes of dizygotic twins with ASD and their parents. The neuronal regulatory elements had a tendency to harbor dnTEIs that were shared between twins, but ASD-risk genes had dnTEIs that were unique to each twin. The dnTEIs were 4.6-fold enriched in enhancers that are active in embryonic stem cell (ESC)-neurons (p < 0.001), but DNVs were 1.5-fold enriched in active enhancers of astrocytes (p = 0.0051). Our findings suggest that dnTEIs and DNVs play a role in ASD etiology by disrupting enhancers of neurons and astrocytes.

6
Batcher K, Varney S, York D, Blacksmith M, Kidd JM, Rebhun R, Dickinson P, Bannasch D. Recent, full-length gene retrocopies are common in canids. Genome Res 2022; 32:1602-1611. [PMID: 35961775 PMCID: PMC9435743 DOI: 10.1101/gr.276828.122] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/08/2022] [Accepted: 07/19/2022] [Indexed: 02/03/2023]
Abstract
Gene retrocopies arise from the reverse transcription and insertion into the genome of processed mRNA transcripts. Although many retrocopies have acquired mutations that render them functionally inactive, most mammals retain active LINE-1 sequences capable of producing new retrocopies. New retrocopies, referred to as retro copy number variants (retroCNVs), may not be identified by standard variant calling techniques in high-throughput sequencing data. Although multiple functional FGF4 retroCNVs have been associated with skeletal dysplasias in dogs, the full landscape of canid retroCNVs has not been characterized. Here, retroCNV discovery was performed on a whole-genome sequencing data set of 293 canids from 76 breeds. We identified retroCNV parent genes via the presence of mRNA-specific 30-mers, and then identified retroCNV insertion sites through discordant read analysis. In total, we resolved insertion sites for 1911 retroCNVs from 1179 parent genes, 1236 of which appeared identical to their parent genes. Dogs had on average 54.1 total retroCNVs and 1.4 private retroCNVs. We found evidence of expression in testes for 12% (14/113) of the retroCNVs identified in six Golden Retrievers, including four chimeric transcripts, and 97 retroCNVs also had significantly elevated F_ST across dog breeds, possibly indicating selection. We applied our approach to a subset of human genomes and detected an average of 4.2 retroCNVs per sample, highlighting a 13-fold relative increase of retroCNV frequency in dogs. Particularly in canids, retroCNVs are a largely unexplored source of genetic variation that can contribute to genome plasticity and that should be considered when investigating traits and diseases.
Affiliation(s)
- Kevin Batcher
  - Department of Population Health and Reproduction, University of California, Davis, Davis, California 95616, USA
- Scarlett Varney
  - Department of Population Health and Reproduction, University of California, Davis, Davis, California 95616, USA
- Daniel York
  - Department of Surgical and Radiological Sciences, University of California, Davis, Davis, California 95616, USA
- Matthew Blacksmith
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, Michigan 48109, USA
- Jeffrey M Kidd
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, Michigan 48109, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan 48109, USA
- Robert Rebhun
  - Department of Surgical and Radiological Sciences, University of California, Davis, Davis, California 95616, USA
- Peter Dickinson
  - Department of Surgical and Radiological Sciences, University of California, Davis, Davis, California 95616, USA
- Danika Bannasch
  - Department of Population Health and Reproduction, University of California, Davis, Davis, California 95616, USA

7
Domazet-Lošo T. mRNA Vaccines: Why Is the Biology of Retroposition Ignored? Genes (Basel) 2022; 13:719. [PMID: 35627104 PMCID: PMC9141755 DOI: 10.3390/genes13050719] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/25/2022] [Revised: 04/14/2022] [Accepted: 04/15/2022] [Indexed: 02/07/2023] Open
Abstract
The major advantage of mRNA vaccines over more conventional approaches is their potential for rapid development and large-scale deployment in pandemic situations. In the current COVID-19 crisis, two mRNA COVID-19 vaccines have been conditionally approved and broadly applied, while others are still in clinical trials. However, there is no previous experience with the use of mRNA vaccines on a large scale in the general population. This warrants a careful evaluation of mRNA vaccine safety properties by considering all available knowledge about mRNA molecular biology and evolution. Here, I discuss the pervasive claim that mRNA-based vaccines cannot alter genomes. Surprisingly, this notion is widely stated in the mRNA vaccine literature but never supported by referencing any primary scientific papers that would specifically address this question. This discrepancy becomes even more puzzling if one considers previous work on the molecular and evolutionary aspects of retroposition in murine and human populations that clearly documents the frequent integration of mRNA molecules into genomes, including clinical contexts. By performing basic comparisons, I show that the sequence features of mRNA vaccines meet all known requirements for retroposition using L1 elements, the most abundant autonomously active retrotransposons in the human genome. In fact, many factors associated with mRNA vaccines increase the possibility of their L1-mediated retroposition. I conclude that it is unfounded to a priori assume that mRNA-based therapeutics do not impact genomes and that the route to genome integration of vaccine mRNAs via endogenous L1 retroelements is easily conceivable. This implies that we urgently need experimental studies that would rigorously test for the potential retroposition of vaccine mRNAs. At present, the insertional mutagenesis safety of mRNA-based vaccines should be considered unresolved.
Affiliation(s)
- Tomislav Domazet-Lošo
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Bijenička Cesta 54, HR-10000 Zagreb, Croatia
  - School of Medicine, Catholic University of Croatia, Ilica 242, HR-10000 Zagreb, Croatia

8
Zhang W, Tautz D. Tracing the origin and evolutionary fate of recent gene retrocopies in natural populations of the house mouse. Mol Biol Evol 2021; 39:6481550. [PMID: 34940842 PMCID: PMC8826619 DOI: 10.1093/molbev/msab360] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/03/2022] Open
Abstract
Although the contribution of retrogenes to the evolution of genes and genomes has long been recognized, the evolutionary patterns of very recently derived retrocopies that are still polymorphic within natural populations have not been much studied so far. We use here a set of 2,025 such retrocopies in nine house mouse populations from three subspecies (Mus musculus domesticus, M. m. musculus, and M. m. castaneus) to trace their origin and evolutionary fate. We find that ancient house-keeping genes are significantly more likely to generate retrocopies than younger genes and that the propensity to generate a retrocopy depends on its level of expression in the germline. Although most retrocopies are detrimental and quickly purged, we focus here on the subset that appears to be neutral or even adaptive. We show that retrocopies from X-chromosomal parental genes have a higher likelihood to reach elevated frequencies in the populations, confirming the notion of adaptive effects for “out-of-X” retrogenes. Also, retrocopies in intergenic regions are more likely to reach higher population frequencies than those in introns of genes, implying a more detrimental effect when they land within transcribed regions. For a small subset of retrocopies, we find signatures of positive selection, indicating they were involved in a recent adaptation process. We show that the population-specific distribution pattern of retrocopies is phylogenetically informative and can be used to infer population history with a better resolution than with SNP markers.
Affiliation(s)
- Wenyu Zhang
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, Plön, D-24306, Germany
- Diethard Tautz
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, Plön, D-24306, Germany

9
Feliciello I, Procino A. mRNA vaccines: Why and how they should be modified. Journal of Biological Research - Bollettino della Società Italiana di Biologia Sperimentale 2021. [DOI: 10.4081/jbr.2021.10072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/15/2023]
Abstract
The COVID-19 pandemic has stimulated the production of different therapeutic approaches for the resolution of coronavirus infections. On the one hand, nanobiomolecules have been proposed as bait material for viruses; on the other hand, unconventional messenger RNA vaccines have been produced, such as the SARS-CoV-2 mRNA vaccines (BioNTech/Pfizer BNT162b2 and Moderna mRNA-1273). [...]

10
Troskie RL, Faulkner GJ, Cheetham SW. Processed pseudogenes: A substrate for evolutionary innovation: Retrotransposition contributes to genome evolution by propagating pseudogene sequences with rich regulatory potential throughout the genome. Bioessays 2021; 43:e2100186. [PMID: 34569081 DOI: 10.1002/bies.202100186] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 08/02/2021] [Revised: 09/09/2021] [Accepted: 09/13/2021] [Indexed: 11/08/2022]
Abstract
Processed pseudogenes may serve as a genetic reservoir for evolutionary innovation. Here, we argue that through the activity of long interspersed element-1 retrotransposons, processed pseudogenes disperse coding and noncoding sequences rich with regulatory potential throughout the human genome. While these sequences may appear to be non-functional, a lack of contemporary function does not prohibit future development of biological activity. Here, we discuss the dynamic evolution of certain processed pseudogenes into coding and noncoding genes and regulatory elements, and their implication in wide-ranging biological and pathological processes. Also see the video abstract here: https://youtu.be/iUY_mteVoPI.
Affiliation(s)
- Robin-Lee Troskie
  - Mater Research Institute, University of Queensland, Woolloongabba, Australia
- Geoffrey J Faulkner
  - Mater Research Institute, University of Queensland, Woolloongabba, Australia
  - Queensland Brain Institute, University of Queensland, Brisbane, Australia
- Seth W Cheetham
  - Mater Research Institute, University of Queensland, Woolloongabba, Australia

11
Pattan V, Kashyap R, Bansal V, Candula N, Koritala T, Surani S. Genomics in medicine: A new era in medicine. World J Methodol 2021; 11:231-242. [PMID: 34631481 PMCID: PMC8472545 DOI: 10.5662/wjm.v11.i5.231] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/12/2021] [Revised: 06/18/2021] [Accepted: 07/19/2021] [Indexed: 02/06/2023] Open
Abstract
The sequencing of the complete human genome revolutionized genomic medicine. However, the complex interplay of genes, environment, and lifestyle and the influence of non-coding genomic regions on human health remain largely unexplored. Genomic medicine has great potential for diagnosis or disease prediction, disease prevention, and targeted treatment. However, many of the promising tools of genomic medicine are still in their infancy, and limited knowledge currently precludes their use in many clinical settings. In this review article, we review the evolution of genomic methodologies and tools, their limitations, and their scope for current and future clinical application.
Affiliation(s)
- Vishwanath Pattan
  - Division of Endocrinology, Wyoming Medical Center, Casper, WY 82601, United States
- Rahul Kashyap
  - Department of Anesthesiology and Peri-operative Medicine, Mayo Clinic, Rochester, MN 55905, United States
- Vikas Bansal
  - Department of Anesthesiology and Peri-operative Medicine, Mayo Clinic, Rochester, MN 55905, United States
- Narsimha Candula
  - Hospital Medicine, University Florida Health, Jacksonville, FL 32209, United States
- Thoyaja Koritala
  - Hospital Medicine, Mayo Clinic Health System, Mankato, MN 56001, United States
- Salim Surani
  - Department of Internal Medicine, Texas A&M University, Corpus Christi, TX 78405, United States

12
Miller TLA, Orpinelli Rego F, Buzzo JLL, Galante PAF. sideRETRO: a pipeline for identifying somatic and polymorphic insertions of processed pseudogenes or retrocopies. Bioinformatics 2021; 37:419-421. [PMID: 32717039 DOI: 10.1093/bioinformatics/btaa689] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/22/2020] [Revised: 06/29/2020] [Accepted: 07/23/2020] [Indexed: 11/14/2022] Open
Abstract
Motivation: Retrocopies or processed pseudogenes are gene copies resulting from mRNA retrotransposition. These gene duplicates can be fixed, somatically inserted or polymorphic in the genome. However, knowledge regarding unfixed retrocopies (retroCNVs) is still limited, and the development of computational tools for effectively identifying and genotyping them is an urgent need. Results: Here, we present sideRETRO, a pipeline dedicated not only to detecting retroCNVs in whole-genome or whole-exome sequencing data but also to revealing their insertion sites, zygosity and genomic context and classifying them as somatic or polymorphic events. We show that sideRETRO can identify novel retroCNVs and genotype them, in addition to finding polymorphic retroCNVs in whole-genome and whole-exome data. Therefore, sideRETRO fills a gap in the literature and presents an efficient and straightforward algorithm to accelerate the study of bona fide retroCNVs. Availability and implementation: sideRETRO is available at https://github.com/galantelab/sideRETRO. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Thiago L A Miller
  - Centro de Oncologia Molecular, Hospital Sírio-Libanês, São Paulo 01308-060, Brazil
  - Departmento de Bioquímica, Universidade de São Paulo, São Paulo 05508-000, Brazil
- José Leonel L Buzzo
  - Centro de Oncologia Molecular, Hospital Sírio-Libanês, São Paulo 01308-060, Brazil
  - Departmento de Bioquímica, Universidade de São Paulo, São Paulo 05508-000, Brazil
- Pedro A F Galante
  - Centro de Oncologia Molecular, Hospital Sírio-Libanês, São Paulo 01308-060, Brazil

13
The mutational load in natural populations is significantly affected by high primary rates of retroposition. Proc Natl Acad Sci U S A 2021; 118:2013043118. [PMID: 33526666 PMCID: PMC8017666 DOI: 10.1073/pnas.2013043118] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/24/2022] Open
Abstract
The phenomenon of retroposition (the reintegration of reverse-transcribed RNA into the genome) has been well studied in comparisons between species and has been identified as a source of evolutionary innovation. However, less attention has been paid to possible negative effects of retroposition. To trace the evolutionary dynamics of these negative effects, our study uses a unique genomic dataset of house mouse populations. It reveals that the initial retroposition rate is very high and that most of these newly transposed retrocopies have a deleterious impact, apparently through modifying the expression of their parental genes. In humans, this effect is expected to cause disease alleles, and we propose that genetic screening should include the search for newly transposed retrocopies.

Gene retroposition is known to contribute to patterns of gene evolution and adaptations. However, possible negative effects of gene retroposition remain largely unexplored since most previous studies have focused on between-species comparisons where negatively selected copies are mostly not observed, as they are quickly lost from populations. Here, we show for natural house mouse populations that the primary rate of retroposition is orders of magnitude higher than the long-term rate. Comparisons with single-nucleotide polymorphism distribution patterns in the same populations show that most retroposition events are deleterious. Transcriptomic profiling analysis shows that new retroposed copies become easily subject to transcription and have an influence on the expression levels of their parental genes, especially when transcribed in the antisense direction. Our results imply that the impact of retroposition on the mutational load has been highly underestimated in natural populations. This has additional implications for strategies of disease allele detection in humans.

14
Qu L, Wang L, He F, Han Y, Yang L, Wang MD, Zhu H. The Landscape of Micro-Inversions Provide Clues for Population Genetic Analysis of Humans. Interdiscip Sci 2020; 12:499-514. [PMID: 32929667 PMCID: PMC7658078 DOI: 10.1007/s12539-020-00392-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/25/2020] [Revised: 09/02/2020] [Accepted: 09/03/2020] [Indexed: 11/04/2022]
Abstract
Background: Variations in the human genome have been studied extensively. However, little is known about the role of micro-inversions (MIs), generally defined as small (< 100 bp) inversions, in human evolution, diversity, and health. Depicting the pattern of MIs among diverse populations is critical for interpreting human evolutionary history and obtaining insight into genetic diseases. Results: In this paper, we explored the distribution of MIs in genomes from 26 human populations and 7 nonhuman primate genomes and analyzed the phylogenetic structure of the 26 human populations based on the MIs. We further investigated the functions of the MIs located within genes associated with human health. With hg19 as the reference genome, we detected 6968 MIs among the 1937 human samples and 24,476 MIs among the 7 nonhuman primate genomes. The analyses of MIs in human genomes showed that the MIs were rarely located in exonic regions. Nonhuman primates and human populations shared only 82 inverted alleles, and Africans had the most inverted alleles in common with nonhuman primates, which was consistent with the “Out of Africa” hypothesis. The clustering of MIs among the human populations also coincided with human migration history and ancestral lineages. Conclusions: We propose that MIs are potential evolutionary markers for investigating population dynamics. Our results revealed the diversity of MIs in human populations and showed that they are essential to construct human population relationships and have a potential effect on human health.
Affiliation(s)
- Li Qu
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China; Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech and Emory University, Atlanta, GA, 30332, USA
| | - Luotong Wang
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China; Center for Quantitative Biology, Peking University, Beijing, 100871, China
| | - Feifei He
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China; Center for Quantitative Biology, Peking University, Beijing, 100871, China
| | - Yilun Han
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China; Center for Quantitative Biology, Peking University, Beijing, 100871, China
| | - Longshu Yang
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China; Center for Quantitative Biology, Peking University, Beijing, 100871, China
| | - May D Wang
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech and Emory University, Atlanta, GA, 30332, USA
| | - Huaiqiu Zhu
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China; Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech and Emory University, Atlanta, GA, 30332, USA; Center for Quantitative Biology, Peking University, Beijing, 100871, China
| |
|
15
|
Batcher K, Dickinson P, Maciejczyk K, Brzeski K, Rasouliha SH, Letko A, Drögemüller C, Leeb T, Bannasch D. Multiple FGF4 Retrocopies Recently Derived within Canids. Genes (Basel) 2020; 11:genes11080839. [PMID: 32717834 PMCID: PMC7465015 DOI: 10.3390/genes11080839] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/30/2020] [Revised: 07/21/2020] [Accepted: 07/21/2020] [Indexed: 12/17/2022] Open
Abstract
Two transcribed retrocopies of the fibroblast growth factor 4 (FGF4) gene have previously been described in the domestic dog. An FGF4 retrocopy on chr18 is associated with disproportionate dwarfism, while an FGF4 retrocopy on chr12 is associated with both disproportionate dwarfism and intervertebral disc disease (IVDD). In this study, whole-genome sequencing data were queried to identify other FGF4 retrocopies that could be contributing to phenotypic diversity in canids. Additionally, dogs with surgically confirmed IVDD were assayed for novel FGF4 retrocopies. Five additional and distinct FGF4 retrocopies were identified in canids including a copy unique to red wolves (Canis rufus). The FGF4 retrocopies identified in domestic dogs were identical to domestic dog FGF4 haplotypes, which are distinct from modern wolf FGF4 haplotypes, indicating that these retrotransposition events likely occurred after domestication. The identification of multiple, full length FGF4 retrocopies with open reading frames in canids indicates that gene retrotransposition events occur much more frequently than previously thought and provide a mechanism for continued genetic and phenotypic diversity in canids.
Affiliation(s)
- Kevin Batcher
- Department of Population Health and Reproduction, University of California-Davis, Davis, CA 95616, USA
| | - Peter Dickinson
- Department of Surgical and Radiological Sciences, University of California-Davis, Davis, CA 95616, USA
| | - Kimberly Maciejczyk
- Department of Population Health and Reproduction, University of California-Davis, Davis, CA 95616, USA
| | - Kristin Brzeski
- College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
| | - Sheida Hadji Rasouliha
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3012 Bern, Switzerland
| | - Anna Letko
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3012 Bern, Switzerland
| | - Cord Drögemüller
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3012 Bern, Switzerland
| | - Tosso Leeb
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3012 Bern, Switzerland
| | - Danika Bannasch
- Department of Population Health and Reproduction, University of California-Davis, Davis, CA 95616, USA
| |
|
16
|
Johnson TS, Li S, Franz E, Huang Z, Dan Li S, Campbell MJ, Huang K, Zhang Y. PseudoFuN: Deriving functional potentials of pseudogenes from integrative relationships with genes and microRNAs across 32 cancers. Gigascience 2019; 8:5480571. [PMID: 31029062 PMCID: PMC6486473 DOI: 10.1093/gigascience/giz046] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 09/22/2018] [Revised: 12/13/2018] [Accepted: 03/29/2019] [Indexed: 12/14/2022] Open
Abstract
Background: Long thought “relics” of evolution, not until recently have pseudogenes been of medical interest regarding regulation in cancer. Often, these regulatory roles are a direct by-product of their close sequence homology to protein-coding genes. Novel pseudogene-gene (PGG) functional associations can be identified through the integration of biomedical data, such as sequence homology, functional pathways, gene expression, pseudogene expression, and microRNA expression. However, not all of the information has been integrated, and almost all previous pseudogene studies relied on 1:1 pseudogene–parent gene relationships without leveraging other homologous genes/pseudogenes.
Results: We produce PGG families that expand beyond the current 1:1 paradigm. First, we construct expansive PGG databases by (i) CUDAlign graphics processing unit (GPU) accelerated local alignment of all pseudogenes to gene families (totaling 1.6 billion individual local alignments and >40,000 GPU hours) and (ii) BLAST-based assignment of pseudogenes to gene families. Second, we create an open-source web application (PseudoFuN [Pseudogene Functional Networks]) to search for integrative functional relationships of sequence homology, microRNA expression, gene expression, pseudogene expression, and gene ontology. We produce four “flavors” of CUDAlign-based databases (>462,000,000 PGG pairwise alignments and 133,770 PGG families) that can be queried and downloaded using PseudoFuN. These databases are consistent with previous 1:1 PGG annotation and also are much more powerful including millions of de novo PGG associations. For example, we find multiple known (e.g., miR-20a-PTEN-PTENP1) and novel (e.g., miR-375-SOX15-PPP4R1L) microRNA-gene-pseudogene associations in prostate cancer. PseudoFuN provides a “one stop shop” for identifying and visualizing thousands of potential regulatory relationships related to pseudogenes in The Cancer Genome Atlas cancers.
Conclusions: Thousands of new PGG associations can be explored in the context of microRNA-gene-pseudogene co-expression and differential expression with a simple-to-use online tool by bioinformaticians and oncologists alike.
Affiliation(s)
- Travis S Johnson
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, USA; Department of Medicine, Indiana University School of Medicine, 545 Barnhill Drive, Indianapolis, IN 46202, USA
| | - Sihong Li
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, USA
| | - Eric Franz
- Ohio Supercomputer Center, 1224 Kinnear Road, Columbus, OH 43212, USA
| | - Zhi Huang
- School of Electrical and Computer Engineering, Purdue University, 465 Northwestern Avenue, West Lafayette, IN 47907, USA; Department of Medicine, Indiana University School of Medicine, 545 Barnhill Drive, Indianapolis, IN 46202, USA
| | - Shuyu Dan Li
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
| | - Moray J Campbell
- Division of Pharmaceutics and Pharmaceutical Chemistry, College of Pharmacy, The Ohio State University, 500 West 12th Avenue, Columbus, OH 43210, USA
| | - Kun Huang
- Department of Medicine, Indiana University School of Medicine, 545 Barnhill Drive, Indianapolis, IN 46202, USA; Regenstrief Institute, Indiana University, 1101 West 10th Street, Indianapolis, IN 46262, USA
| | - Yan Zhang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, USA; The Ohio State University Comprehensive Cancer Center (OSUCCC - James), 460 West 10th Avenue, Columbus, OH 43210, USA
| |
|
17
|
McCole RB, Erceg J, Saylor W, Wu CT. Ultraconserved Elements Occupy Specific Arenas of Three-Dimensional Mammalian Genome Organization. Cell Rep 2019; 24:479-488. [PMID: 29996107 DOI: 10.1016/j.celrep.2018.06.031] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/11/2017] [Revised: 05/09/2018] [Accepted: 06/07/2018] [Indexed: 12/23/2022] Open
Abstract
This study explores the relationship between three-dimensional genome organization and ultraconserved elements (UCEs), an enigmatic set of DNA elements that are perfectly conserved between the reference genomes of distantly related species. Examining both human and mouse genomes, we interrogate the relationship of UCEs to three features of chromosome organization derived from Hi-C studies. We find that UCEs are enriched within contact domains and, further, that the subset of UCEs within domains shared across diverse cell types are linked to kidney-related and neuronal processes. In boundaries, UCEs are generally depleted, with those that do overlap boundaries being overrepresented in exonic UCEs. Regarding loop anchors, UCEs are neither overrepresented nor underrepresented, but those present in loop anchors are enriched for splice sites. Finally, as the relationships between UCEs and human Hi-C features are conserved in mouse, our findings suggest that UCEs contribute to interspecies conservation of genome organization and, thus, genome stability.
Affiliation(s)
- Ruth B McCole
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Jelena Erceg
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Wren Saylor
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Chao-Ting Wu
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.
| |
|
18
|
piRNA-Guided CRISPR-like Immunity in Eukaryotes. Trends Immunol 2019; 40:998-1010. [DOI: 10.1016/j.it.2019.09.003] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 04/22/2019] [Revised: 09/17/2019] [Accepted: 09/17/2019] [Indexed: 02/07/2023]
|
19
|
Gardner EJ, Prigmore E, Gallone G, Danecek P, Samocha KE, Handsaker J, Gerety SS, Ironfield H, Short PJ, Sifrim A, Singh T, Chandler KE, Clement E, Lachlan KL, Prescott K, Rosser E, FitzPatrick DR, Firth HV, Hurles ME. Contribution of retrotransposition to developmental disorders. Nat Commun 2019; 10:4630. [PMID: 31604926 PMCID: PMC6789007 DOI: 10.1038/s41467-019-12520-y] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 11/28/2018] [Accepted: 09/11/2019] [Indexed: 02/08/2023] Open
Abstract
Mobile genetic elements (MEs) are segments of DNA which can copy themselves and other transcribed sequences through the process of retrotransposition (RT). In humans, several disorders have been attributed to RT, but the role of RT in severe developmental disorders (DD) has not yet been explored. Here we identify RT-derived events in 9738 exome-sequenced trios with DD-affected probands. We ascertain 9 de novo MEs, 4 of which are likely causative of the patient's symptoms (0.04%), as well as 2 de novo gene retroduplications. Beyond identifying likely diagnostic RT events, we estimate genome-wide germline ME mutation rate and selective constraint and demonstrate that coding RT events have signatures of purifying selection equivalent to those of truncating mutations. Overall, our analysis represents a comprehensive interrogation of the impact of retrotransposition on protein-coding genes and a framework for future evolutionary and disease studies.
Affiliation(s)
- Eugene J Gardner
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Elena Prigmore
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Giuseppe Gallone
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Petr Danecek
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Kaitlin E Samocha
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Juliet Handsaker
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Sebastian S Gerety
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Holly Ironfield
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Patrick J Short
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Alejandro Sifrim
- Department of Human Genetics, KU Leuven, Herestraat 49, Box 602, Leuven, B-3000, Belgium
| | - Tarjinder Singh
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK
| | - Kate E Chandler
- Manchester Centre for Genomic Medicine, Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, Greater Manchester, M13 9WL, UK
| | - Emma Clement
- Department of Clinical Genetics, North East Thames Regional Genetics Service, Great Ormond Street Hospital for Children NHS Trust, Holborn, London, WC1N 3JH, UK
| | - Katherine L Lachlan
- Wessex Clinical Genetics Service, Southampton University Hospitals NHS Foundation Trust, Princess Anne Hospital, Southampton, SO16 5YA, UK; Faculty of Medicine, Human Development and Health, University of Southampton, Southampton, SO17 1BJ, UK
| | - Katrina Prescott
- Clinical Genetics Department, Yorkshire Regional Genetics Service, Leeds Teaching Hospitals NHS Trust, Chapel Allerton Hospital, Leeds, LS7 4SA, UK
| | - Elisabeth Rosser
- Department of Clinical Genetics, North East Thames Regional Genetics Service, Great Ormond Street Hospital for Children NHS Trust, Holborn, London, WC1N 3JH, UK
| | - David R FitzPatrick
- MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, WGH, Edinburgh, EH4 2SP, UK
| | - Helen V Firth
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK; East Anglian Medical Genetics Service, Box 134, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
| | - Matthew E Hurles
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SA, UK.
| |
|
20
|
|
21
|
Klein SJ, O'Neill RJ. Transposable elements: genome innovation, chromosome diversity, and centromere conflict. Chromosome Res 2018; 26:5-23. [PMID: 29332159 PMCID: PMC5857280 DOI: 10.1007/s10577-017-9569-5] [Citation(s) in RCA: 115] [Impact Index Per Article: 16.4] [Received: 10/10/2017] [Revised: 12/05/2017] [Accepted: 12/12/2017] [Indexed: 12/21/2022]
Abstract
Although it was nearly 70 years ago when transposable elements (TEs) were first discovered “jumping” from one genomic location to another, TEs are now recognized as contributors to genomic innovations as well as genome instability across a wide variety of species. In this review, we illustrate the ways in which active TEs, specifically retroelements, can create novel chromosome rearrangements and impact gene expression, leading to disease in some cases and species-specific diversity in others. We explore the ways in which eukaryotic genomes have evolved defense mechanisms to temper TE activity and the ways in which TEs continue to influence genome structure despite being rendered transpositionally inactive. Finally, we focus on the role of TEs in the establishment, maintenance, and stabilization of critical, yet rapidly evolving, chromosome features: eukaryotic centromeres. Across centromeres, specific types of TEs participate in genomic conflict, a balancing act wherein they are actively inserting into centromeric domains yet are harnessed for the recruitment of centromeric histones and potentially new centromere formation.
Affiliation(s)
- Savannah J Klein
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, 06269, USA
| | - Rachel J O'Neill
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, 06269, USA.
| |
|
22
|
Johnson TS, Li S, Kho JR, Huang K, Zhang Y. Network analysis of pseudogene-gene relationships: from pseudogene evolution to their functional potentials. Pacific Symposium on Biocomputing 2018; 23:536-547. [PMID: 29218912 PMCID: PMC5744670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Pseudogenes are fossil relatives of genes. Pseudogenes have long been thought of as "junk DNAs", since they do not code proteins in normal tissues. Although most of the human pseudogenes do not have noticeable functions, ∼20% of them exhibit transcriptional activity. There has been evidence showing that some pseudogenes adopted functions as lncRNAs and work as regulators of gene expression. Furthermore, pseudogenes can even be "reactivated" in some conditions, such as cancer initiation. Some pseudogenes are transcribed in specific cancer types, and some are even translated into proteins as observed in several cancer cell lines. All the above have shown that pseudogenes could have functional roles or potentials in the genome. Evaluating the relationships between pseudogenes and their gene counterparts could help us reveal the evolutionary path of pseudogenes and associate pseudogenes with functional potentials. It also provides an insight into the regulatory networks involving pseudogenes with transcriptional and even translational activities. In this study, we develop a novel approach integrating graph analysis, sequence alignment and functional analysis to evaluate pseudogene-gene relationships, and apply it to human gene homologs and pseudogenes. We generated a comprehensive set of 445 pseudogene-gene (PGG) families from the original 3,281 gene families (13.56%). Of these, 438 (98.4% PGG, 13.3% total) were non-trivial (containing more than one pseudogene). Each PGG family contains multiple genes and pseudogenes with high sequence similarity. For each family, we generate a sequence alignment network and phylogenetic trees recapitulating the evolutionary paths. We find evidence supporting the evolution history of olfactory family (both genes and pseudogenes) in human, which also supports the validity of our analysis method.
Next, we evaluate these networks in respect to the gene ontology from which we identify functions enriched in these pseudogene-gene families and infer functional impact of pseudogenes involved in the networks. This demonstrates the application of our PGG network database in the study of pseudogene function in disease context.
Affiliation(s)
- Travis S Johnson
- Dept. Biomedical Informatics, Ohio State University, 5000 HITS, 410 W. 10th St. Indianapolis, Indiana, 46202, USA
| | | | | | | | | |
|