1
Kogan V, Molodtsov I, Fleyshman DI, Leontieva OV, Koman IE, Gudkov AV. The reconstruction of evolutionary dynamics of processed pseudogenes indicates deep silencing of "retrobiome" in naked mole rat. Proc Natl Acad Sci U S A 2024; 121:e2313581121. [PMID: 39467133 DOI: 10.1073/pnas.2313581121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Accepted: 09/02/2024] [Indexed: 10/30/2024]
Abstract
Approximately half of mammalian genomes are occupied by retrotransposons, highly repetitive interspersed genetic elements expanded through the mechanism of reverse transcription. The evolution of this "retrobiome" involved a series of explosive amplifications, presumably associated with high mutation rates, interspersed with periods of silencing. A by-product of retrotransposon activity is the formation of processed pseudogenes (PPGs): intron-less, promoter-less DNA copies of messenger RNA (mRNA). We examined the proportion of PPGs with varying degrees of deviation from their ancestor mRNAs as an indicator of the intensity of retrotranspositions at different times in the past. Our analysis revealed a high proportion of "young" (recently acquired) PPGs in the DNA of mice and rats, indicating significant retrobiome activity during the recent evolution of these species. The ongoing process of new PPG entries in mouse germ line DNA was confirmed by identifying diversity in PPG content within the single strain of mice, C57BL/6. In contrast, the highly abundant PPGs of the naked mole rat (NMR) exhibited substantial deviation from their mRNAs, with a near-complete lack of PPGs without mutations, indicative of the silencing of the retrobiome in the most recent evolutionary past, preceded by a period of high activity. This distinctive feature of the NMR genome was confirmed through the analysis of a broad range of mammalian species. The peculiar evolutionary dynamics of PPGs in the NMR, an organism with exceptional longevity and resistance to cancer, may reflect the role played by the retrobiome in aging and cancer.
Affiliation(s)
- Valeria Kogan
  - Institute for Personalized and Translational Medicine, Adelson School of Medicine, Ariel University, Ariel 4070000, Israel
- Ivan Molodtsov
  - Department of Cell Stress Biology, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263
- Daria I Fleyshman
  - Department of Cell Stress Biology, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263
- Olga V Leontieva
  - Department of Cell Stress Biology, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263
- Igor E Koman
  - Institute for Personalized and Translational Medicine, Adelson School of Medicine, Ariel University, Ariel 4070000, Israel
- Andrei V Gudkov
  - Department of Cell Stress Biology, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263
2
Zhang H, Ni S, Frith MC. An immune-suppressing protein in human endogenous retroviruses. BIOINFORMATICS ADVANCES 2023; 3:vbad013. [PMID: 36818731 PMCID: PMC9927554 DOI: 10.1093/bioadv/vbad013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2022] [Revised: 01/25/2023] [Accepted: 02/01/2023] [Indexed: 02/05/2023]
Abstract
Motivation: Retroviruses are important contributors to disease and evolution in vertebrates. Sometimes, retrovirus DNA is heritably inserted in a vertebrate genome: an endogenous retrovirus (ERV). Vertebrate genomes have many such virus-derived fragments, usually with mutations disabling their original functions. Results: Some primate ERVs appear to encode an overlooked protein. This protein is homologous to protein MC132 from Molluscum contagiosum virus, which is a human poxvirus, not a retrovirus. MC132 suppresses the immune system by targeting NF-κB, and it had no known homologs until now. The ERV homologs of MC132 in the human genome are mostly disrupted by mutations, but there is an intact copy on chromosome 4. We found homologs of MC132 in ERVs of apes, monkeys and bushbaby, but not tarsiers, lemurs or non-primates. This suggests that some primate retroviruses had, or have, an extra immune-suppressing protein, which underwent horizontal genetic transfer between unrelated viruses. Contact: mcfrith@edu.k.u-tokyo.ac.jp.
3
Duly AMP, Kao FCL, Teo WS, Kavallaris M. βIII-Tubulin Gene Regulation in Health and Disease. Front Cell Dev Biol 2022; 10:851542. [PMID: 35573698 PMCID: PMC9096907 DOI: 10.3389/fcell.2022.851542] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 01/10/2022] [Accepted: 04/07/2022] [Indexed: 11/24/2022]
Abstract
Microtubule proteins form a dynamic component of the cytoskeleton, and play key roles in cellular processes, such as vesicular transport, cell motility and mitosis. Expression of microtubule proteins is often dysregulated in cancer. In particular, the microtubule protein βIII-tubulin, encoded by the TUBB3 gene, is aberrantly expressed in a range of epithelial tumours and is associated with drug resistance and aggressive disease. In normal cells, TUBB3 expression is tightly restricted, and is found almost exclusively in neuronal and testicular tissues. Understanding the mechanisms that control TUBB3 expression in cancer, mature and developing tissues will help to unravel the basic biology of the protein and its role in cancer, and may ultimately lead to the development of new therapeutic approaches to target this protein. This review is devoted to the transcriptional and posttranscriptional regulation of TUBB3 in normal and cancerous tissue.
Affiliation(s)
- Alastair M. P. Duly
  - Children’s Cancer Institute, Lowy Cancer Research Center, UNSW Sydney, Randwick, NSW, Australia
- Felicity C. L. Kao
  - Children’s Cancer Institute, Lowy Cancer Research Center, UNSW Sydney, Randwick, NSW, Australia
  - Australian Center for NanoMedicine, UNSW Sydney, Sydney, NSW, Australia
  - School of Women and Children’s Health, Faculty of Medicine and Health, UNSW Sydney, Sydney, NSW, Australia
- Wee Siang Teo
  - Children’s Cancer Institute, Lowy Cancer Research Center, UNSW Sydney, Randwick, NSW, Australia
  - Australian Center for NanoMedicine, UNSW Sydney, Sydney, NSW, Australia
- Maria Kavallaris
  - Children’s Cancer Institute, Lowy Cancer Research Center, UNSW Sydney, Randwick, NSW, Australia
  - Australian Center for NanoMedicine, UNSW Sydney, Sydney, NSW, Australia
  - School of Women and Children’s Health, Faculty of Medicine and Health, UNSW Sydney, Sydney, NSW, Australia
  - UNSW RNA Institute, UNSW Sydney, Sydney, NSW, Australia
4
Frith MC. Paleozoic Protein Fossils Illuminate the Evolution of Vertebrate Genomes and Transposable Elements. Mol Biol Evol 2022; 39:6555113. [PMID: 35348724 PMCID: PMC9004415 DOI: 10.1093/molbev/msac068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Genomes hold a treasure trove of protein fossils: fragments of formerly protein-coding DNA, which mainly come from transposable elements (TEs) or host genes. These fossils reveal ancient evolution of TEs and genomes, and many fossils have been exapted to perform diverse functions important for the host's fitness. However, old and highly-degraded fossils are hard to identify, standard methods (e.g. BLAST) are not optimized for this task, and few Paleozoic protein fossils have been found. Here, a recently optimized method is used to find protein fossils in vertebrate genomes. It finds Paleozoic fossils predating the amphibian/amniote divergence from most major TE categories, including virus-related Polinton and Gypsy elements. It finds 10 fossils in the human genome (8 from TEs and 2 from host genes) that predate the last common ancestor of all jawed vertebrates, probably from the Ordovician period. It also finds types of transposon and retrotransposon not found in human before. These fossils have extreme sequence conservation, indicating exaptation: some have evidence of gene-regulatory function, and they tend to lie nearest to developmental genes. Some ancient fossils suggest "genome tectonics", where two fragments of one TE have drifted apart by up to megabases, possibly explaining gene deserts and large introns. This paints a picture of great TE diversity in our aquatic ancestors, with patchy TE inheritance by later vertebrates, producing new genes and regulatory elements on the way. Host-gene fossils too have contributed anciently-conserved DNA segments. This paves the way to further studies of ancient protein fossils.
Affiliation(s)
- Martin C Frith
  - Artificial Intelligence Research Center, AIST, Tokyo, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan
  - Computational Bio Big-Data Open Innovation Laboratory, AIST, Tokyo, Japan
5
Khor JM, Ettensohn CA. Architecture and evolution of the cis-regulatory system of the echinoderm kirrelL gene. eLife 2022; 11:72834. [PMID: 35212624 PMCID: PMC8903837 DOI: 10.7554/elife.72834] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/05/2021] [Accepted: 02/22/2022] [Indexed: 11/17/2022]
Abstract
The gene regulatory network (GRN) that underlies echinoderm skeletogenesis is a prominent model of GRN architecture and evolution. KirrelL is an essential downstream effector gene in this network and encodes an Ig-superfamily protein required for the fusion of skeletogenic cells and the formation of the skeleton. In this study, we dissected the transcriptional control region of the kirrelL gene of the purple sea urchin, Strongylocentrotus purpuratus. Using plasmid- and bacterial artificial chromosome-based transgenic reporter assays, we identified key cis-regulatory elements (CREs) and transcription factor inputs that regulate Sp-kirrelL, including direct, positive inputs from two key transcription factors in the skeletogenic GRN, Alx1 and Ets1. We next identified kirrelL cis-regulatory regions from seven other echinoderm species that together represent all classes within the phylum. By introducing these heterologous regulatory regions into developing sea urchin embryos we provide evidence of their remarkable conservation across ~500 million years of evolution. We dissected in detail the kirrelL regulatory region of the sea star, Patiria miniata, and demonstrated that it also receives direct inputs from Alx1 and Ets1. Our findings identify kirrelL as a component of the ancestral echinoderm skeletogenic GRN. They support the view that GRN subcircuits, including specific transcription factor–CRE interactions, can remain stable over vast periods of evolutionary history. Lastly, our analysis of kirrelL establishes direct linkages between a developmental GRN and an effector gene that controls a key morphogenetic cell behavior, cell–cell fusion, providing a paradigm for extending the explanatory power of GRNs.
Affiliation(s)
- Jian Ming Khor
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, United States
- Charles A Ettensohn
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, United States
6
Abrahamsson S, Eiengård F, Rohlin A, Dávila López M. PΨFinder: a practical tool for the identification and visualization of novel pseudogenes in DNA sequencing data. BMC Bioinformatics 2022; 23:59. [PMID: 35114952 PMCID: PMC8812246 DOI: 10.1186/s12859-022-04583-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2021] [Accepted: 01/24/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Processed pseudogenes (PΨgs) are disabled gene copies that are transcribed and may affect expression of paralogous genes. Moreover, their insertion in the genome can disrupt the structure or the regulatory region of a gene, affecting its expression level. These events have been identified as occurring mutations during cancer development, thus being able to identify PΨgs and their location will improve their impact on diagnostic testing, not only in cancer but also in inherited disorders. RESULTS: We have implemented PΨFinder (P-psy-finder), a tool that identifies PΨgs, annotates known ones and predicts their insertion site(s) in the genome. The tool screens alignment files and provides user-friendly summary reports and visualizations. To demonstrate its applicability, we scanned 218 DNA samples from patients screened for hereditary colorectal cancer. We detected 423 PΨgs distributed in 96% of the samples, comprising 7 different parent genes. Among these, we confirmed the well-known insertion site of the SMAD4-PΨg within the last intron of the SCAI gene in one sample. While for the ubiquitous CBX3-PΨg, present in 82.6% of the samples, we found it reversed inserted in the second intron of the C15ORF57 gene. CONCLUSIONS: PΨFinder is a tool that can automatically identify novel PΨgs from DNA sequencing data and determine their location in the genome with high sensitivity (95.92%). It generates high quality figures and tables that facilitate the interpretation of the results and can guide the experimental validation. PΨFinder is a complementary analysis to any mutational screening in the identification of disease-causing mutations within cancer and other diseases.
Affiliation(s)
- Sanna Abrahamsson
  - Bioinformatics Core Facility, Sahlgrenska Academy, University of Gothenburg, Box 115, 405 30, Gothenburg, Sweden
- Frida Eiengård
  - Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Anna Rohlin
  - Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
  - Unit of Genetic Analysis and Bioinformatics, Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, Gothenburg, Sweden
- Marcela Dávila López
  - Bioinformatics Core Facility, Sahlgrenska Academy, University of Gothenburg, Box 115, 405 30, Gothenburg, Sweden
7
Epigenomic signatures on paralogous genes reveal underappreciated universality of active histone codes adopted across animals. Comput Struct Biotechnol J 2022; 20:353-367. [PMID: 35035788 PMCID: PMC8741409 DOI: 10.1016/j.csbj.2021.12.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2021] [Revised: 12/15/2021] [Accepted: 12/18/2021] [Indexed: 11/21/2022]
8
Yanovsky-Dagan S, Frumkin A, Lupski JR, Harel T. CRISPR/Cas9-induced gene conversion between ATAD3 paralogs. HGG ADVANCES 2022; 3:100092. [PMID: 35199044 PMCID: PMC8844715 DOI: 10.1016/j.xhgg.2022.100092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Accepted: 01/19/2022] [Indexed: 11/23/2022]
Abstract
Paralogs and pseudogenes are abundant within the human genome, and can mediate non-allelic homologous recombination (NAHR) or gene conversion events. The ATAD3 locus contains three paralogs situated in tandem, and is therefore prone to NAHR-mediated deletions and duplications associated with severe neurological phenotypes. To study this locus further, we aimed to generate biallelic loss-of-function variants in ATAD3A by CRISPR/Cas9 genome editing. Unexpectedly, two of the generated clones underwent gene conversion, as evidenced by replacement of the targeted sequence of ATAD3A by a donor sequence from its paralog ATAD3B. We highlight the complexity of CRISPR/Cas9 design, end-product formation, and recombination repair mechanisms for CRISPR/Cas9 delivery as a nucleic acid molecular therapy when targeting genes that have paralogs or pseudogenes, and advocate meticulous evaluation of resultant clones in model organisms. In addition, we suggest that endogenous gene conversion may be used to repair missense variants in genes with paralogs or pseudogenes.
Affiliation(s)
- Ayala Frumkin
  - Department of Genetics, Hadassah Medical Organization, Jerusalem, Israel
- James R. Lupski
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Texas Children's Hospital, Houston, TX, USA
- Tamar Harel
  - Department of Genetics, Hadassah Medical Organization, Jerusalem, Israel
  - Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
  - Corresponding author
9
Zhang W, Tautz D. Tracing the origin and evolutionary fate of recent gene retrocopies in natural populations of the house mouse. Mol Biol Evol 2021; 39:6481550. [PMID: 34940842 PMCID: PMC8826619 DOI: 10.1093/molbev/msab360] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/03/2022]
Abstract
Although the contribution of retrogenes to the evolution of genes and genomes has long been recognized, the evolutionary patterns of very recently derived retrocopies that are still polymorphic within natural populations have not been much studied so far. We use here a set of 2,025 such retrocopies in nine house mouse populations from three subspecies (Mus musculus domesticus, M. m. musculus, and M. m. castaneus) to trace their origin and evolutionary fate. We find that ancient house-keeping genes are significantly more likely to generate retrocopies than younger genes and that the propensity to generate a retrocopy depends on its level of expression in the germline. Although most retrocopies are detrimental and quickly purged, we focus here on the subset that appears to be neutral or even adaptive. We show that retrocopies from X-chromosomal parental genes have a higher likelihood to reach elevated frequencies in the populations, confirming the notion of adaptive effects for “out-of-X” retrogenes. Also, retrocopies in intergenic regions are more likely to reach higher population frequencies than those in introns of genes, implying a more detrimental effect when they land within transcribed regions. For a small subset of retrocopies, we find signatures of positive selection, indicating they were involved in a recent adaptation process. We show that the population-specific distribution pattern of retrocopies is phylogenetically informative and can be used to infer population history with a better resolution than with SNP markers.
Affiliation(s)
- Wenyu Zhang
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, Plön, D-24306, Germany
- Diethard Tautz
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, Plön, D-24306, Germany
10
Kazachenka A, Kassiotis G. SARS-CoV-2-Host Chimeric RNA-Sequencing Reads Do Not Necessarily Arise From Virus Integration Into the Host DNA. Front Microbiol 2021; 12:676693. [PMID: 34149667 PMCID: PMC8206523 DOI: 10.3389/fmicb.2021.676693] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/05/2021] [Accepted: 05/05/2021] [Indexed: 12/11/2022]
Abstract
The human genome bears evidence of extensive invasion by retroviruses and other retroelements, as well as by diverse RNA and DNA viruses. High frequency of somatic integration of the RNA virus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) into the DNA of infected cells was recently suggested, based on a number of observations. One key observation was the presence of chimeric RNA-sequencing (RNA-seq) reads between SARS-CoV-2 RNA and RNA transcribed from human host DNA. Here, we examined the possible origin specifically of human-SARS-CoV-2 chimeric reads in RNA-seq libraries and provide alternative explanations for their origin. Chimeric reads were frequently detected also between SARS-CoV-2 RNA and RNA transcribed from mitochondrial DNA or episomal adenoviral DNA present in transfected cell lines, which was unlikely the result of SARS-CoV-2 integration. Furthermore, chimeric reads between SARS-CoV-2 RNA and RNA transcribed from nuclear DNA were highly enriched for host exonic, rather than intronic or intergenic sequences and often involved the same, highly expressed host genes. Although these findings do not rule out SARS-CoV-2 somatic integration, they nevertheless suggest that human-SARS-CoV-2 chimeric reads found in RNA-seq data may arise during library preparation and do not necessarily signify SARS-CoV-2 reverse transcription, integration into host DNA and further transcription.
Affiliation(s)
- George Kassiotis
  - Retroviral Immunology, The Francis Crick Institute, London, United Kingdom
  - Department of Infectious Disease, St Mary’s Hospital, Imperial College London, London, United Kingdom
11
Troskie RL, Jafrani Y, Mercer TR, Ewing AD, Faulkner GJ, Cheetham SW. Long-read cDNA sequencing identifies functional pseudogenes in the human transcriptome. Genome Biol 2021; 22:146. [PMID: 33971925 PMCID: PMC8108447 DOI: 10.1186/s13059-021-02369-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 10/30/2020] [Accepted: 04/28/2021] [Indexed: 01/05/2023]
Abstract
Pseudogenes are gene copies presumed to mainly be functionless relics of evolution due to acquired deleterious mutations or transcriptional silencing. Using deep full-length PacBio cDNA sequencing of normal human tissues and cancer cell lines, we identify here hundreds of novel transcribed pseudogenes expressed in tissue-specific patterns. Some pseudogene transcripts have intact open reading frames and are translated in cultured cells, representing unannotated protein-coding genes. To assess the biological impact of noncoding pseudogenes, we CRISPR-Cas9 delete the nucleus-enriched pseudogene PDCL3P4 and observe hundreds of perturbed genes. This study highlights pseudogenes as a complex and dynamic component of the human transcriptional landscape.
Affiliation(s)
- Robin-Lee Troskie
  - Mater Research Institute-University of Queensland, TRI Building, QLD 4102 Woolloongabba, Australia
- Yohaann Jafrani
  - Mater Research Institute-University of Queensland, TRI Building, QLD 4102 Woolloongabba, Australia
- Tim R. Mercer
  - Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, QLD 4072 Australia
- Adam D. Ewing
  - Mater Research Institute-University of Queensland, TRI Building, QLD 4102 Woolloongabba, Australia
- Geoffrey J. Faulkner
  - Mater Research Institute-University of Queensland, TRI Building, QLD 4102 Woolloongabba, Australia
  - Queensland Brain Institute, University of Queensland, Brisbane, QLD 4072 Australia
- Seth W. Cheetham
  - Mater Research Institute-University of Queensland, TRI Building, QLD 4102 Woolloongabba, Australia
12
Cancer, Retrogenes, and Evolution. Life (Basel) 2021; 11:life11010072. [PMID: 33478113 PMCID: PMC7835786 DOI: 10.3390/life11010072] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/19/2020] [Revised: 01/14/2021] [Accepted: 01/15/2021] [Indexed: 12/18/2022]
Abstract
This review summarizes the knowledge about retrogenes in the context of cancer and evolution. The retroposition, in which the processed mRNA from parental genes undergoes reverse transcription and the resulting cDNA is integrated back into the genome, results in additional copies of existing genes. Despite the initial misconception, retroposition-derived copies can become functional, and due to their role in the molecular evolution of genomes, they have been named the “seeds of evolution”. It is convincing that retrogenes, as important elements involved in the evolution of species, also take part in the evolution of neoplastic tumors at the cell and species levels. The occurrence of specific “resistance mechanisms” to neoplastic transformation in some species has been noted. This phenomenon has been related to additional gene copies, including retrogenes. In addition, the role of retrogenes in the evolution of tumors has been described. Retrogene expression correlates with the occurrence of specific cancer subtypes, their stages, and their response to therapy. Phylogenetic insights into retrogenes show that most cancer-related retrocopies arose in the lineage of primates, and the number of identified cancer-related retrogenes demonstrates that these duplicates are quite important players in human carcinogenesis.
13
Zeng H, Chen X, Li H, Zhang J, Wei Z, Wang Y. Interpopulation differences of retroduplication variations (RDVs) in rice retrogenes and their phenotypic correlations. Comput Struct Biotechnol J 2021; 19:600-611. [PMID: 33510865 PMCID: PMC7811064 DOI: 10.1016/j.csbj.2020.12.046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2020] [Revised: 12/29/2020] [Accepted: 12/31/2020] [Indexed: 11/21/2022]
Abstract
Retroduplication variation (RDV), a type of retrocopy polymorphism, is considered to have essential biological significance, but its effect on gene function and species phenotype is still poorly understood. To this end, we analyzed the retrocopies and RDVs in 3,010 rice genomes. We calculated the RDV frequencies in the genome of each rice population; detected the mutated, ancestral and expressed retrogenes in rice genomes; and analyzed their RDV influence on rice phenotypic traits. Collectively, 73 RDVs were identified, and 14 RDVs in ancestral retrogenes can significantly affect rice phenotypes. Our research reveals that RDV plays an important role in rice migration, domestication and evolution. We think that RDV is a good molecular breeding marker candidate. To our knowledge, this is the first study on the relationship between retrogene function, expression, RDV and species phenotype.
Affiliation(s)
- Haiyue Zeng
  - State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing 400715, China
  - Biological Science Research Center, Southwest University, Chongqing 400715, China
  - Shennong Class, Southwest University, Chongqing 400715, China
- Xingyu Chen
  - Shennong Class, Southwest University, Chongqing 400715, China
- Hongbo Li
  - College of Electronic and Information Engineering, Southwest University, Chongqing 400715
- Jun Zhang
  - College of Computer & Information Science, Southwest University, Chongqing 400715, China
- Zhaoyuan Wei
  - State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing 400715, China
  - Biological Science Research Center, Southwest University, Chongqing 400715, China
- Yi Wang
  - State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing 400715, China
  - Biological Science Research Center, Southwest University, Chongqing 400715, China
14
Ehrlich KC, Baribault C, Ehrlich M. Epigenetics of Muscle- and Brain-Specific Expression of KLHL Family Genes. Int J Mol Sci 2020; 21:E8394. [PMID: 33182325 PMCID: PMC7672584 DOI: 10.3390/ijms21218394] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/16/2020] [Revised: 11/02/2020] [Accepted: 11/06/2020] [Indexed: 02/07/2023]
Abstract
KLHL and the related KBTBD genes encode components of the Cullin-E3 ubiquitin ligase complex and typically target tissue-specific proteins for degradation, thereby affecting differentiation, homeostasis, metabolism, cell signaling, and the oxidative stress response. Despite their importance in cell function and disease (especially, KLHL40, KLHL41, KBTBD13, KEAP1, and ENC1), previous studies of epigenetic factors that affect transcription were predominantly limited to promoter DNA methylation. Using diverse tissue and cell culture whole-genome profiles, we examined 17 KLHL or KBTBD genes preferentially expressed in skeletal muscle or brain to identify tissue-specific enhancer and promoter chromatin, open chromatin (DNaseI hypersensitivity), and DNA hypomethylation. Sixteen of the 17 genes displayed muscle- or brain-specific enhancer chromatin in their gene bodies, and most exhibited specific intergenic enhancer chromatin as well. Seven genes were embedded in super-enhancers (particularly strong, tissue-specific clusters of enhancers). The enhancer chromatin regions typically displayed foci of DNA hypomethylation at peaks of open chromatin. In addition, we found evidence for an intragenic enhancer in one gene upregulating expression of its neighboring gene, specifically for KLHL40/HHATL and KLHL38/FBXO32 gene pairs. Many KLHL/KBTBD genes had tissue-specific promoter chromatin at their 5' ends, but surprisingly, two (KBTBD11 and KLHL31) had constitutively unmethylated promoter chromatin in their 3' exons that overlaps a retrotransposed KLHL gene. Our findings demonstrate the importance of expanding epigenetic analyses beyond the 5' ends of genes in studies of normal and abnormal gene regulation.
Affiliation(s)
- Kenneth C. Ehrlich
  - Center for Biomedical Informatics and Genomics, Tulane University Health Sciences Center, New Orleans, LA 70112, USA
- Carl Baribault
  - Center for Research and Scientific Computing (CRSC), Tulane University Information Technology, Tulane University, New Orleans, LA 70112, USA
- Melanie Ehrlich
  - Center for Biomedical Informatics and Genomics, Tulane Cancer Center, Hayward Genetics Program, Tulane University Health Sciences Center, New Orleans, LA 70112, USA
15
Transcriptional activity and strain-specific history of mouse pseudogenes. Nat Commun 2020; 11:3695. [PMID: 32728065 PMCID: PMC7392758 DOI: 10.1038/s41467-020-17157-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/02/2018] [Accepted: 06/08/2020] [Indexed: 01/07/2023]
Abstract
Pseudogenes are ideal markers of genome remodelling. In turn, the mouse is an ideal platform for studying them, particularly with the recent availability of strain-sequencing and transcriptional data. Here, combining both manual curation and automatic pipelines, we present a genome-wide annotation of the pseudogenes in the mouse reference genome and 18 inbred mouse strains (available via the mouse.pseudogene.org resource). We also annotate 165 unitary pseudogenes in mouse, and 303, in human. The overall pseudogene repertoire in mouse is similar to that in human in terms of size, biotype distribution, and family composition (e.g. with GAPDH and ribosomal proteins being the largest families). Notable differences arise in the pseudogene age distribution, with multiple retro-transpositional bursts in mouse evolutionary history and only one in human. Furthermore, in each strain about a fifth of all pseudogenes are unique, reflecting strain-specific evolution. Finally, we find that ~15% of the mouse pseudogenes are transcribed, and that highly transcribed parent genes tend to give rise to many processed pseudogenes.
16. Alves LQ, Ruivo R, Fonseca MM, Lopes-Marques M, Ribeiro P, Castro L. PseudoChecker: an integrated online platform for gene inactivation inference. Nucleic Acids Res 2020; 48:W321-W331. [PMID: 32449938 PMCID: PMC7319564 DOI: 10.1093/nar/gkaa408] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/10/2020] [Revised: 04/22/2020] [Accepted: 05/06/2020] [Indexed: 01/21/2023]
Abstract
The rapid expansion of high-quality genome assemblies, exemplified by ongoing initiatives such as the Genome-10K and i5k, demands novel automated methods to approach comparative genomics. Of these, the study of inactivating mutations in the coding region of genes, or pseudogenization, as a source of evolutionary novelty is mostly overlooked. Thus, to address such evolutionary/genomic events, a systematic, accurate and computationally automated approach is required. Here, we present PseudoChecker, the first integrated online platform for gene inactivation inference. Unlike the few existing methods, our comparative genomics-based approach displays full automation, a built-in graphical user interface and a novel index, PseudoIndex, for an empirical evaluation of the gene coding status. As a multi-platform online service, PseudoChecker simplifies access and usability, allowing a fast identification of disruptive mutations. An analysis of 30 genes previously reported to be eroded in mammals, and 30 viable genes from the same lineages, demonstrated that PseudoChecker was able to correctly infer 97% of loss events and 95% of functional genes, confirming its reliability. PseudoChecker is freely available, without login required, at http://pseudochecker.ciimar.up.pt.
Affiliation(s)
- Luís Q Alves
  - CIIMAR-Interdisciplinary Centre of Marine and Environmental Research, U. Porto-University of Porto, Matosinhos, 4450-208, Portugal
- Raquel Ruivo
  - CIIMAR-Interdisciplinary Centre of Marine and Environmental Research, U. Porto-University of Porto, Matosinhos, 4450-208, Portugal
- Miguel M Fonseca
  - CIIMAR-Interdisciplinary Centre of Marine and Environmental Research, U. Porto-University of Porto, Matosinhos, 4450-208, Portugal
- Mónica Lopes-Marques
  - CIIMAR-Interdisciplinary Centre of Marine and Environmental Research, U. Porto-University of Porto, Matosinhos, 4450-208, Portugal
- Pedro Ribeiro
  - CRACS & INESC-TEC Department of Computer Science, FCUP, Porto, 4169-007, Portugal
- L Filipe C Castro
  - CIIMAR-Interdisciplinary Centre of Marine and Environmental Research, U. Porto-University of Porto, Matosinhos, 4450-208, Portugal
  - Department of Biology, FCUP, Porto, 4169-007, Portugal
17. Complex Analysis of Retroposed Genes' Contribution to Human Genome, Proteome and Transcriptome. Genes (Basel) 2020; 11:genes11050542. [PMID: 32408516 PMCID: PMC7290577 DOI: 10.3390/genes11050542] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/07/2020] [Revised: 05/06/2020] [Accepted: 05/08/2020] [Indexed: 02/07/2023]
Abstract
Gene duplication is a major driver of organismal evolution. One of the main mechanisms of gene duplication is retroposition, a process in which mRNA is first reverse-transcribed into DNA and then reintegrated into the genome. Most gene retrocopies are depleted of regulatory regions. Nevertheless, examples of functional retrogenes are rapidly increasing. These functions come from the gain of new spatio-temporal expression patterns, imposed by the content of the genomic sequence surrounding the inserted cDNA and/or by selectively advantageous mutations, which may lead to a switch from protein coding to regulatory RNA. As recent studies have shown, these genes may lead to new protein domain formation through fusion with other genes, new regulatory RNAs or other regulatory elements. We utilized existing data from high-throughput technologies to create a complex description of retrogene functionality. Our analysis led to the identification of human retroposed genes that substantially contributed to the transcriptome and proteome. These retrocopies demonstrated the potential to encode proteins or short peptides, act as cis- and trans-Natural Antisense Transcripts (NATs), regulate their progenitors’ expression by competing for the same microRNAs, and provide a sequence to lncRNA and novel exons to existing protein-coding genes. Our study also revealed that retrocopies, similarly to retrotransposons, may act as recombination hot spots. To the best of our knowledge, this is the first complex analysis of these functions of retrocopies.
18. Ueberham U, Arendt T. Genomic Indexing by Somatic Gene Recombination of mRNA/ncRNA - Does It Play a Role in Genomic Mosaicism, Memory Formation, and Alzheimer's Disease? Front Genet 2020; 11:370. [PMID: 32411177 PMCID: PMC7200996 DOI: 10.3389/fgene.2020.00370] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/06/2020] [Accepted: 03/25/2020] [Indexed: 12/26/2022]
Abstract
Recent evidence indicates that genomic individuality of neurons, characterized by DNA-content variation, is a common if not universal phenomenon in the human brain that occurs naturally but can also show aberrancies that have been linked to the pathomechanism of Alzheimer’s disease and related neurodegenerative disorders. Etiologically, this genomic mosaic has been suggested to arise from defects of cell cycle regulation that may occur either during brain development or in the mature brain after terminal differentiation of neurons. Here, we aim to draw attention towards another mechanism that can give rise to genomic individuality of neurons, with far-reaching consequences. This mechanism has its origin in the transcriptome rather than in replication defects of the genome, i.e., somatic gene recombination of RNA. We continue to develop the concept that somatic gene recombination of RNA provides a physiological process that, through integration of intronless mRNA/ncRNA into the genome, allows a particular functional state at the level of the individual neuron to be indexed. By insertion of defined RNAs in a somatic recombination process, the presence of specific mRNA transcripts within a definite temporal context can be “frozen” and can serve as an index that can be recalled at any later point in time. This allows information related to a specific neuronal state of differentiation and/or activity relevant to a memory trace to be fixed. We suggest that this process is used throughout the lifetime of each neuron and might have both advantageous and deleterious consequences.
Affiliation(s)
- Uwe Ueberham
  - Paul Flechsig Institute for Brain Research, University of Leipzig, Leipzig, Germany
- Thomas Arendt
  - Paul Flechsig Institute for Brain Research, University of Leipzig, Leipzig, Germany
19. Johnson TS, Li S, Franz E, Huang Z, Dan Li S, Campbell MJ, Huang K, Zhang Y. PseudoFuN: Deriving functional potentials of pseudogenes from integrative relationships with genes and microRNAs across 32 cancers. Gigascience 2019; 8:5480571. [PMID: 31029062 PMCID: PMC6486473 DOI: 10.1093/gigascience/giz046] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/22/2018] [Revised: 12/13/2018] [Accepted: 03/29/2019] [Indexed: 12/14/2022]
Abstract
Background: Long thought “relics” of evolution, not until recently have pseudogenes been of medical interest regarding regulation in cancer. Often, these regulatory roles are a direct by-product of their close sequence homology to protein-coding genes. Novel pseudogene-gene (PGG) functional associations can be identified through the integration of biomedical data, such as sequence homology, functional pathways, gene expression, pseudogene expression, and microRNA expression. However, not all of the information has been integrated, and almost all previous pseudogene studies relied on 1:1 pseudogene–parent gene relationships without leveraging other homologous genes/pseudogenes.
Results: We produce PGG families that expand beyond the current 1:1 paradigm. First, we construct expansive PGG databases by (i) CUDAlign graphics processing unit (GPU) accelerated local alignment of all pseudogenes to gene families (totaling 1.6 billion individual local alignments and >40,000 GPU hours) and (ii) BLAST-based assignment of pseudogenes to gene families. Second, we create an open-source web application (PseudoFuN [Pseudogene Functional Networks]) to search for integrative functional relationships of sequence homology, microRNA expression, gene expression, pseudogene expression, and gene ontology. We produce four “flavors” of CUDAlign-based databases (>462,000,000 PGG pairwise alignments and 133,770 PGG families) that can be queried and downloaded using PseudoFuN. These databases are consistent with previous 1:1 PGG annotation and also are much more powerful including millions of de novo PGG associations. For example, we find multiple known (e.g., miR-20a-PTEN-PTENP1) and novel (e.g., miR-375-SOX15-PPP4R1L) microRNA-gene-pseudogene associations in prostate cancer. PseudoFuN provides a “one stop shop” for identifying and visualizing thousands of potential regulatory relationships related to pseudogenes in The Cancer Genome Atlas cancers.
Conclusions: Thousands of new PGG associations can be explored in the context of microRNA-gene-pseudogene co-expression and differential expression with a simple-to-use online tool by bioinformaticians and oncologists alike.
Affiliation(s)
- Travis S Johnson
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, USA
  - Department of Medicine, Indiana University School of Medicine, 545 Barnhill Drive, Indianapolis, IN 46202, USA
- Sihong Li
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, USA
- Eric Franz
  - Ohio Supercomputer Center, 1224 Kinnear Road, Columbus, OH 43212, USA
- Zhi Huang
  - School of Electrical and Computer Engineering, Purdue University, 465 Northwestern Avenue, West Lafayette, IN 47907, USA
  - Department of Medicine, Indiana University School of Medicine, 545 Barnhill Drive, Indianapolis, IN 46202, USA
- Shuyu Dan Li
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Moray J Campbell
  - Division of Pharmaceutics and Pharmaceutical Chemistry, College of Pharmacy, The Ohio State University, 500 West 12th Avenue, Columbus, OH 43210, USA
- Kun Huang
  - Department of Medicine, Indiana University School of Medicine, 545 Barnhill Drive, Indianapolis, IN 46202, USA
  - Regenstrief Institute, Indiana University, 1101 West 10th Street, Indianapolis, IN 46262, USA
- Yan Zhang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1800 Cannon Drive, Columbus, OH 43210, USA
  - The Ohio State University Comprehensive Cancer Center (OSUCCC - James), 460 West 10th Avenue, Columbus, OH 43210, USA
20. Overcoming challenges and dogmas to understand the functions of pseudogenes. Nat Rev Genet 2019; 21:191-201. [DOI: 10.1038/s41576-019-0196-1] [Citation(s) in RCA: 92] [Impact Index Per Article: 18.4] [Accepted: 11/05/2019] [Indexed: 01/08/2023]
21. Frith MC, Khan S. A survey of localized sequence rearrangements in human DNA. Nucleic Acids Res 2019; 46:1661-1673. [PMID: 29272440 PMCID: PMC5829575 DOI: 10.1093/nar/gkx1266] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/30/2017] [Accepted: 12/07/2017] [Indexed: 01/29/2023]
Abstract
Genomes mutate and evolve in ways simple (substitution or deletion of bases) and complex (e.g. chromosome shattering). We do not fully understand what types of complex mutation occur, and we cannot routinely characterize arbitrarily-complex mutations in a high-throughput, genome-wide manner. Long-read DNA sequencing methods (e.g. PacBio, nanopore) are promising for this task, because one read may encompass a whole complex mutation. We describe an analysis pipeline to characterize arbitrarily-complex 'local' mutations, i.e. intrachromosomal mutations encompassed by one DNA read. We apply it to nanopore and PacBio reads from one human cell line (NA12878), and survey sequence rearrangements, both real and artifactual. Almost all the real rearrangements belong to recurring patterns or motifs: the most common is tandem multiplication (e.g. heptuplication), but there are also complex patterns such as localized shattering, which resembles DNA damage by radiation. Gene conversions are identified, including one between hemoglobin gamma genes. This study demonstrates a way to find intricate rearrangements with any number of duplications, deletions, and repositionings. It demonstrates a probability-based method to resolve ambiguous rearrangements involving highly similar sequences, as occurs in gene conversion. We present a catalog of local rearrangements in one human cell line, and show which rearrangement patterns occur.
Affiliation(s)
- Martin C Frith
  - Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8562, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo 169-8555, Japan
- Sofia Khan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo 169-8555, Japan
22. Frankish A, Diekhans M, Ferreira AM, Johnson R, Jungreis I, Loveland J, Mudge JM, Sisu C, Wright J, Armstrong J, Barnes I, Berry A, Bignell A, Carbonell Sala S, Chrast J, Cunningham F, Di Domenico T, Donaldson S, Fiddes IT, García Girón C, Gonzalez JM, Grego T, Hardy M, Hourlier T, Hunt T, Izuogu OG, Lagarde J, Martin FJ, Martínez L, Mohanan S, Muir P, Navarro FC, Parker A, Pei B, Pozo F, Ruffier M, Schmitt BM, Stapleton E, Suner MM, Sycheva I, Uszczynska-Ratajczak B, Xu J, Yates A, Zerbino D, Zhang Y, Aken B, Choudhary JS, Gerstein M, Guigó R, Hubbard TJ, Kellis M, Paten B, Reymond A, Tress ML, Flicek P. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res 2019; 47:D766-D773. [PMID: 30357393 PMCID: PMC6323946 DOI: 10.1093/nar/gky955] [Citation(s) in RCA: 1848] [Impact Index Per Article: 369.6] [Received: 08/15/2018] [Revised: 09/20/2018] [Accepted: 10/08/2018] [Indexed: 02/06/2023]
Abstract
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.
Affiliation(s)
- Adam Frankish
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Anne-Maud Ferreira
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Rory Johnson
  - Department of Medical Oncology, Inselspital, University Hospital, University of Bern, Bern, Switzerland
  - Department of Biomedical Research (DBMR), University of Bern, Bern, Switzerland
- Irwin Jungreis
  - MIT Computer Science and Artificial Intelligence Laboratory, 32 Vasser St, Cambridge, MA 02139, USA
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Jane Loveland
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jonathan M Mudge
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cristina Sisu
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Department of Bioscience, Brunel University London, Uxbridge UB8 3PH, UK
- James Wright
  - Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 123 Old Brompton Road, London SW7 3RP, UK
- Joel Armstrong
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- If Barnes
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrew Berry
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alexandra Bignell
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Silvia Carbonell Sala
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
- Jacqueline Chrast
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Fiona Cunningham
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tomás Di Domenico
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Sarah Donaldson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ian T Fiddes
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Carlos García Girón
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jose Manuel Gonzalez
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tiago Grego
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew Hardy
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thibaut Hourlier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Toby Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Osagie G Izuogu
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Julien Lagarde
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
- Fergal J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Laura Martínez
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Shamika Mohanan
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Muir
  - Department of Molecular, Cellular & Developmental Biology, Yale University, New Haven, CT 06520, USA
  - Systems Biology Institute, Yale University, West Haven, CT 06516, USA
- Fabio C P Navarro
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Anne Parker
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Baikang Pei
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Fernando Pozo
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Magali Ruffier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Bianca M Schmitt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Eloise Stapleton
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Marie-Marthe Suner
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Irina Sycheva
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jinuri Xu
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Andrew Yates
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniel Zerbino
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Yan Zhang
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Bronwen Aken
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jyoti S Choudhary
  - Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 123 Old Brompton Road, London SW7 3RP, UK
- Mark Gerstein
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Program in Computational Biology & Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
  - Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Roderic Guigó
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, E-08003 Catalonia, Spain
- Tim J P Hubbard
  - Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK
- Manolis Kellis
  - MIT Computer Science and Artificial Intelligence Laboratory, 32 Vasser St, Cambridge, MA 02139, USA
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Alexandre Reymond
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Michael L Tress
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
23. Exaptation at the molecular genetic level. SCIENCE CHINA-LIFE SCIENCES 2018; 62:437-452. [PMID: 30798493 DOI: 10.1007/s11427-018-9447-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/10/2018] [Accepted: 12/01/2018] [Indexed: 12/22/2022]
Abstract
The realization that body parts of animals and plants can be recruited or coopted for novel functions dates back to, or even predates, the observations of Darwin. S.J. Gould and E.S. Vrba recognized a mode of evolution of characters that differs from adaptation. The umbrella term aptation was supplemented with the concept of exaptation. Unlike adaptations, which are restricted to features built by selection for their current role, exaptations are features that currently enhance fitness, even though their present role was not a result of natural selection. Exaptations can also arise from nonaptations; these are characters which had previously been evolving neutrally. All nonaptations are potential exaptations. The concept of exaptation was expanded to the molecular genetic level, which aided greatly in understanding the enormous potential of neutrally evolving repetitive DNA (including transposed elements, formerly considered junk DNA) for the evolution of genes and genomes. The distinction between adaptations and exaptations is outlined in this review and examples are given. Also elaborated on is the fact that such distinctions are sometimes more difficult to determine; this is a widespread phenomenon in biology, where continua abound and clear borders between states and definitions are rare.
24. Schrader L, Schmitz J. The impact of transposable elements in adaptive evolution. Mol Ecol 2018; 28:1537-1549. [PMID: 30003608 DOI: 10.1111/mec.14794] [Citation(s) in RCA: 151] [Impact Index Per Article: 25.2] [Received: 05/17/2018] [Accepted: 07/06/2018] [Indexed: 12/16/2022]
Abstract
The growing knowledge about the influence of transposable elements (TEs) on (a) long-term genome and transcriptome evolution; (b) genomic, transcriptomic and epigenetic variation within populations; and (c) patterns of somatic genetic differences in individuals continues to spur the interest of evolutionary biologists in the role of TEs in adaptive evolution. As TEs can trigger a broad range of molecular variation in a population with potentially severe fitness and phenotypic consequences for individuals, different mechanisms evolved to keep TE activity in check, allowing for a dynamic interplay between the host, its TEs and the environment in evolution. Here, we review evidence for adaptive phenotypic changes associated with TEs and the basic molecular mechanisms by which the underlying genetic changes arise: (a) domestication, (b) exaptation, (c) host gene regulation, (d) TE-mediated formation of intronless gene copies (so-called retrogenes) and (e) overall increased genome plasticity. Furthermore, we review and discuss how the stress-dependent incapacitation of defence mechanisms against the activity of TEs might facilitate adaptive responses to environmental challenges and how such mechanisms might be particularly relevant in species frequently facing novel environments, such as invasive, pathogenic or parasitic species.
Affiliation(s)
- Lukas Schrader
  - Institute for Evolution and Biodiversity (IEB), University of Münster, Münster, Germany
- Jürgen Schmitz
  - Institute of Experimental Pathology, University of Münster, Münster, Germany
25. Willett CS, Wilson EM. Evolution of Melanoma Antigen-A11 (MAGEA11) During Primate Phylogeny. J Mol Evol 2018; 86:240-253. [PMID: 29574604 DOI: 10.1007/s00239-018-9838-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/22/2017] [Accepted: 03/21/2018] [Indexed: 12/19/2022]
Abstract
Melanoma antigen-A11 (MAGE-A11) is an X-linked and primate-specific steroid hormone receptor transcriptional coregulator and proto-oncogenic protein whose increased expression promotes the growth of prostate cancer. The MAGEA11 gene is expressed at low levels in normal human testis, ovary, and endometrium, and at highest levels in castration-resistant prostate cancer. Annotated genome predictions throughout the surviving primate lineage show that MAGEA11 acquired three 5' coding exons unique within the MAGEA subfamily during the evolution of New World monkeys (NWM), Old World monkeys (OWM), and apes. MAGE-A11 in all primates has a conserved FXXIF coactivator-binding motif that suggests interaction with p160 coactivators contributed to its early evolution as a transcriptional coregulator. An ancestral form of MAGE-A11 in the more distantly related lemur has significant amino acid sequence identity with human MAGE-A11, but lacks coregulator activity based on the absence of the three 5' coding exons that include a nuclear localization signal (NLS). NWM MAGE-A11 has greater amino acid sequence identity than lemur to human MAGE-A11, but in-frame premature stop codons suggest that MAGEA11 is a pseudogene in NWM. MAGE-A11 in OWM and apes has nearly identical 5' coding exon amino acid sequence and conserved interaction sites for p300 acetyltransferase and cyclin A. We conclude that the evolution of MAGEA11 within the lineage leading to OWM and apes resulted in steroid hormone receptor transcriptional coregulator activity through the acquisition of three 5' coding exons that include an NLS and nonsynonymous substitutions required to interact with cell cycle regulatory proteins and transcription factors.
Affiliation(s)
- Christopher S Willett
  - Department of Biology, University of North Carolina, Chapel Hill, NC, 27599-7500, USA
- Elizabeth M Wilson
  - Laboratories for Reproductive Biology, Department of Pediatrics, Lineberger Comprehensive Cancer Center, and Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC, 27599-7500, USA
26. Shapiro JA. Living Organisms Author Their Read-Write Genomes in Evolution. BIOLOGY 2017; 6:E42. [PMID: 29211049 PMCID: PMC5745447 DOI: 10.3390/biology6040042] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 08/23/2017] [Revised: 11/17/2017] [Accepted: 11/28/2017] [Indexed: 12/18/2022]
Abstract
Evolutionary variations generating phenotypic adaptations and novel taxa resulted from complex cellular activities altering genome content and expression: (i) Symbiogenetic cell mergers producing the mitochondrion-bearing ancestor of eukaryotes and chloroplast-bearing ancestors of photosynthetic eukaryotes; (ii) interspecific hybridizations and genome doublings generating new species and adaptive radiations of higher plants and animals; and, (iii) interspecific horizontal DNA transfer encoding virtually all of the cellular functions between organisms and their viruses in all domains of life. Consequently, assuming that evolutionary processes occur in isolated genomes of individual species has become an unrealistic abstraction. Adaptive variations also involved natural genetic engineering of mobile DNA elements to rewire regulatory networks. In the most highly evolved organisms, biological complexity scales with "non-coding" DNA content more closely than with protein-coding capacity. Coincidentally, we have learned how so-called "non-coding" RNAs that are rich in repetitive mobile DNA sequences are key regulators of complex phenotypes. Both biotic and abiotic ecological challenges serve as triggers for episodes of elevated genome change. The intersections of cell activities, biosphere interactions, horizontal DNA transfers, and non-random Read-Write genome modifications by natural genetic engineering provide a rich molecular and biological foundation for understanding how ecological disruptions can stimulate productive, often abrupt, evolutionary transformations.
Affiliation(s)
- James A Shapiro
- Department of Biochemistry and Molecular Biology, University of Chicago GCIS W123B, 979 E. 57th Street, Chicago, IL 60637, USA.
|
27
|
Wilkes MC, Repellin CE, Sakamoto KM. Beyond mRNA: The role of non-coding RNAs in normal and aberrant hematopoiesis. Mol Genet Metab 2017; 122:28-38. [PMID: 28757239 PMCID: PMC5722683 DOI: 10.1016/j.ymgme.2017.07.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 06/20/2017] [Revised: 07/21/2017] [Accepted: 07/21/2017] [Indexed: 02/02/2023]
Abstract
The role of non-coding ribonucleic acids (ncRNAs) in biology is currently an area of intense focus. Hematopoiesis requires rapidly changing regulatory molecules to guide appropriate differentiation, and ncRNAs are well suited for this. It is not surprising that virtually all aspects of hematopoiesis have roles for ncRNAs assigned to them, and doubtless many more await characterization. Stem cell maintenance and lymphoid, myeloid and erythroid differentiation are all regulated by various ncRNAs, including microRNAs (miRNAs), long non-coding RNAs (lncRNAs) and various transposable elements within the genome. As our understanding of the many and complex ncRNA roles continues to grow, new discoveries are challenging the existing classification schemes. In this review we briefly overview the broad categories of ncRNAs and discuss a few examples regulating normal and aberrant hematopoiesis.
Affiliation(s)
- Mark C Wilkes
- Division of Hematology/Oncology, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Kathleen M Sakamoto
- Division of Hematology/Oncology, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA.
|
28
|
Casola C, Betrán E. The Genomic Impact of Gene Retrocopies: What Have We Learned from Comparative Genomics, Population Genomics, and Transcriptomic Analyses? Genome Biol Evol 2017; 9:1351-1373. [PMID: 28605529 PMCID: PMC5470649 DOI: 10.1093/gbe/evx081] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Accepted: 05/18/2017] [Indexed: 02/07/2023] Open
Abstract
Gene duplication is a major driver of organismal evolution. Gene retroposition is a mechanism of gene duplication whereby a gene's transcript is used as a template to generate retroposed gene copies, or retrocopies. Intriguingly, the formation of retrocopies depends upon the enzymatic machinery encoded by retrotransposable elements, genomic parasites occurring in the majority of eukaryotes. Most retrocopies are depleted of the regulatory regions found upstream of their parental genes; therefore, they were initially considered transcriptionally incompetent gene copies, or retropseudogenes. However, examples of functional retrocopies, or retrogenes, have accumulated since the 1980s. Here, we review what we have learned about retrocopies in animals, plants and other eukaryotic organisms, with a particular emphasis on comparative and population genomic analyses complemented with transcriptomic datasets. In addition, these data have provided information about the dynamics of the different "life cycle" stages of retrocopies (i.e., polymorphic retrocopy number variants, fixed retropseudogenes and retrogenes) and have provided key insights into the retroduplication mechanisms, the patterns and evolutionary forces at work during the fixation process and the biological function of retrogenes. Functional genomic and transcriptomic data have also revealed that many retropseudogenes are transcriptionally active and a biological role has been experimentally determined for many. Finally, we have learned that not only non-long terminal repeat retroelements but also long terminal repeat retroelements play a role in the emergence of retrocopies across eukaryotes. This body of work has shown that mRNA-mediated duplication represents a widespread phenomenon that produces an array of new genes that contribute to organismal diversity and adaptation.
Affiliation(s)
- Claudio Casola
- Department of Ecosystem Science and Management, Texas A&M University, TX
- Esther Betrán
- Department of Biology, University of Texas at Arlington, Arlington, TX
|
29
|
Protein-Coding Genes' Retrocopies and Their Functions. Viruses 2017; 9:v9040080. [PMID: 28406439 PMCID: PMC5408686 DOI: 10.3390/v9040080] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 02/25/2017] [Revised: 04/07/2017] [Accepted: 04/11/2017] [Indexed: 12/11/2022] Open
Abstract
Transposable elements, often considered to be not important for survival, significantly contribute to the evolution of transcriptomes, promoters, and proteomes. Reverse transcriptase, encoded by some transposable elements, can be used in trans to produce a DNA copy of any RNA molecule in the cell. The retrotransposition of protein-coding genes requires the presence of reverse transcriptase, which could be delivered by either non-long terminal repeat (non-LTR) or LTR transposons. The majority of these copies are in a state of “relaxed” selection and remain “dormant” because they are lacking regulatory regions; however, many become functional. In the course of evolution, they may undergo subfunctionalization, neofunctionalization, or replace their progenitors. Functional retrocopies (retrogenes) can encode proteins, novel or similar to those encoded by their progenitors, can be used as alternative exons or create chimeric transcripts, and can also be involved in transcriptional interference and participate in the epigenetic regulation of parental gene expression. They can also act in trans as natural antisense transcripts, microRNA (miRNA) sponges, or a source of various small RNAs. Moreover, many retrocopies of protein-coding genes are linked to human diseases, especially various types of cancer.
|
30
|
Wang Y. PlantRGDB: A Database of Plant Retrocopied Genes. PLANT & CELL PHYSIOLOGY 2017; 58:e2. [PMID: 28111365 DOI: 10.1093/pcp/pcw210] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/31/2016] [Accepted: 11/17/2016] [Indexed: 05/26/2023]
Abstract
RNA-based gene duplication, known as retrocopy, plays important roles in gene origination and genome evolution. The genomes of many plants have been sequenced, offering an opportunity to annotate and mine the retrocopies in plant genomes. However, comprehensive and unified annotation of retrocopies in these plants is still lacking. In this study I constructed the PlantRGDB (Plant Retrocopied Gene DataBase), the first database of plant retrocopies, to provide a putatively complete centralized list of retrocopies in plant genomes. The database is freely accessible at http://probes.pw.usda.gov/plantrgdb or http://aegilops.wheat.ucdavis.edu/plantrgdb. It currently integrates 49 plant species and 38,997 retrocopies along with characterization information. PlantRGDB provides a user-friendly web interface for searching, browsing and downloading the retrocopies in the database. PlantRGDB also offers graphical viewer-integrated sequence information for displaying the structure of each retrocopy. The attributes of the retrocopies of each species are reported using a browse function. In addition, useful tools, such as an advanced search and BLAST, are available to search the database more conveniently. In conclusion, the database will provide a web platform for obtaining valuable insight into the generation of retrocopies and will supplement research on gene duplication and genome evolution in plants.
Affiliation(s)
- Yi Wang
- USDA-ARS, Western Regional Research Center, Crop Improvement and Genetics Research Unit, Albany, CA, USA
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA
- USDA-ARS, Plant Gene Expression Center, Albany, CA, USA
|
31
|
Xiao J, Sekhwal MK, Li P, Ragupathy R, Cloutier S, Wang X, You FM. Pseudogenes and Their Genome-Wide Prediction in Plants. Int J Mol Sci 2016; 17:E1991. [PMID: 27916797 PMCID: PMC5187791 DOI: 10.3390/ijms17121991] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 10/15/2016] [Revised: 11/20/2016] [Accepted: 11/22/2016] [Indexed: 11/17/2022] Open
Abstract
Pseudogenes are paralogs generated from ancestral functional genes (parents) during genome evolution, which contain critical defects in their sequences, such as lacking a promoter, having a premature stop codon or frameshift mutations. Generally, pseudogenes are functionless, but recent evidence demonstrates that some of them have potential roles in regulation. The majority of pseudogenes are generated from functional progenitor genes either by gene duplication (duplicated pseudogenes) or retro-transposition (processed pseudogenes). Pseudogenes are primarily identified by comparison to their parent genes. Bioinformatics tools for pseudogene prediction have been developed, among which PseudoPipe, PSF and Shiu's pipeline are publicly available. We compared these three tools using the well-annotated Arabidopsis thaliana genome and its known 924 pseudogenes as a test data set. PseudoPipe and Shiu's pipeline identified ~80% of A. thaliana pseudogenes, of which 94% were shared, while PSF failed to generate adequate results. A need for improvement of the bioinformatics tools for pseudogene prediction accuracy in plant genomes was thus identified, with the ultimate goal of improving the quality of genome annotation in plants.
Affiliation(s)
- Jin Xiao
- Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada.
- Department of Agronomy, Nanjing Agricultural University, Nanjing 210095, China.
- Manoj Kumar Sekhwal
- Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada.
- Department of Soil Science, University of Saskatchewan, Saskatoon, SK S7N 5A8, Canada.
- Pingchuan Li
- Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada.
- Raja Ragupathy
- Department of Plant Science, University of Saskatchewan, Saskatoon, SK S7N 5A2, Canada.
- Sylvie Cloutier
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0C6, Canada.
- Xiue Wang
- Department of Agronomy, Nanjing Agricultural University, Nanjing 210095, China.
- Frank M You
- Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, MB R6M 1Y5, Canada.
|
32
|
Mori S, Hayashi M, Inagaki S, Oshima T, Tateishi K, Fujii H, Suzuki S. Identification of Multiple Forms of RNA Transcripts Associated with Human-Specific Retrotransposed Gene Copies. Genome Biol Evol 2016; 8:2288-96. [PMID: 27389689 PMCID: PMC5010893 DOI: 10.1093/gbe/evw156] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/31/2022] Open
Abstract
The human genome contains thousands of retrocopies, mostly as processed pseudogenes, which were recently shown to be prevalently transcribed. In particular, those specifically acquired in the human lineage are able to modulate gene expression in a manner that contributed to the evolution of human-specific traits. Therefore, knowledge of the human-specific retrocopies that are transcribed or their full-length transcript structure contributes to a better understanding of human genome evolution. In this study, we identified 16 human-specific retrocopies that harbor 5' CpG islands by in silico analysis and showed that 12 were transcribed in normal tissues and cancer cell lines with a variety of expression patterns, including cancer-specific expression. Determination of the structure of the transcripts associated with the retrocopies revealed that none were transcribed from their 5' CpG islands, but rather, from inside the 3' UTR and the nearby 5' flanking region of the retrocopies as well as the promoter of neighboring genes. The multiple forms of the transcripts, such as chimeric and individual transcripts in both the sense and antisense orientation, might have introduced novel post-transcriptional regulation into the genome during human evolution. These results shed light on the potential role of human-specific retrocopies in the evolution of gene regulation and genomic disorders.
Affiliation(s)
- Saori Mori
- Epigenomics Division, Frontier Agriscience and Technology Center, Faculty of Agriculture, Shinshu University, Kami-Ina, Nagano, Japan
- Masaaki Hayashi
- Epigenomics Division, Frontier Agriscience and Technology Center, Faculty of Agriculture, Shinshu University, Kami-Ina, Nagano, Japan
- Shun Inagaki
- Epigenomics Division, Frontier Agriscience and Technology Center, Faculty of Agriculture, Shinshu University, Kami-Ina, Nagano, Japan
- Takuji Oshima
- Epigenomics Division, Frontier Agriscience and Technology Center, Faculty of Agriculture, Shinshu University, Kami-Ina, Nagano, Japan
- Ken Tateishi
- Epigenomics Division, Frontier Agriscience and Technology Center, Faculty of Agriculture, Shinshu University, Kami-Ina, Nagano, Japan
- Hiroshi Fujii
- Department of Interdisciplinary Genome Sciences and Cell Metabolism, Institute for Biomedical Sciences, Interdisciplinary Cluster for Cutting Edge Research, Shinshu University, Kami-Ina, Nagano, Japan
- Shunsuke Suzuki
- Epigenomics Division, Frontier Agriscience and Technology Center, Faculty of Agriculture, Shinshu University, Kami-Ina, Nagano, Japan
- Department of Interdisciplinary Genome Sciences and Cell Metabolism, Institute for Biomedical Sciences, Interdisciplinary Cluster for Cutting Edge Research, Shinshu University, Kami-Ina, Nagano, Japan
|
33
|
Annibalini G, Bielli P, De Santi M, Agostini D, Guescini M, Sisti D, Contarelli S, Brandi G, Villarini A, Stocchi V, Sette C, Barbieri E. MIR retroposon exonization promotes evolutionary variability and generates species-specific expression of IGF-1 splice variants. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2016; 1859:757-68. [DOI: 10.1016/j.bbagrm.2016.03.014] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 12/03/2015] [Revised: 03/07/2016] [Accepted: 03/23/2016] [Indexed: 12/18/2022]
|
34
|
Vianello A, Passamonti S. Biochemistry and physiology within the framework of the extended synthesis of evolutionary biology. Biol Direct 2016; 11:7. [PMID: 26861860 PMCID: PMC4748562 DOI: 10.1186/s13062-016-0109-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/19/2015] [Accepted: 02/01/2016] [Indexed: 11/10/2022] Open
Abstract
Functional biologists, like Claude Bernard, ask "How?", meaning that they investigate the mechanisms underlying the emergence of biological functions (proximal causes), while evolutionary biologists, like Charles Darwin, ask "Why?", meaning that they search for the causes of adaptation, survival and evolution (remote causes). Are these divergent views on what life is? The epistemological role of functional biology (molecular biology, but also biochemistry, physiology, cell biology and so forth) appears essential, for its capacity to identify several mechanisms of natural selection of new characters, individuals and populations. Nevertheless, several issues remain unsolved, such as orphan metabolic activities, i.e., adaptive functions still missing the identification of the underlying genes and proteins, and orphan genes, i.e., genes that bear no signature of evolutionary history, yet provide an organism with improved adaptation to environmental changes. In the framework of the Extended Synthesis, we suggest that the adaptive roles of any known function/structure are reappraised in terms of their capacity to warrant constancy of the internal environment (homeostasis), a concept that encompasses both proximal and remote causes.
Affiliation(s)
- Angelo Vianello
- Dipartimento di Scienze Agrarie e Ambientali, Università degli Studi di Udine, 33100, Udine, Italy.
- Sabina Passamonti
- Dipartimento di Scienze della Vita, Università degli Studi di Trieste, 34100, Trieste, Italy.
|
35
|
Du K, He S. Evolutionary fate and implications of retrocopies in the African coelacanth genome. BMC Genomics 2015; 16:915. [PMID: 26555943 PMCID: PMC4641402 DOI: 10.1186/s12864-015-2178-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/30/2015] [Accepted: 10/31/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The coelacanth is known as a "living fossil" because of its morphological resemblance to its fossil ancestors. Thus, it serves as a useful model that provides insight into the fish that first walked on land. Retrocopies are a type of novel genetic element that are likely to contribute to genome or phenotype innovations. Thus, investigating retrocopies in the coelacanth genome can determine the role of retrocopies in coelacanth genome innovations and perhaps even water-to-land adaptations.
RESULTS: We determined the dS values, dN/dS ratios, expression patterns, and enrichment of functional categories for 472 retrocopies in the African coelacanth genome. Of the retrocopies, 85-355 were shown to be potentially functional (i.e., retrogenes). The distribution of retrocopies based on their dS values revealed a burst pattern of young retrocopies in the genome. The retrocopy birth pattern was shown to be more similar to that in tetrapods than ray-finned fish, which indicates a genomic transformation that accompanied vertebrate evolution from water to land. Among these retrocopies, retrogenes were more prevalent in old than young retrocopies, which indicates that most retrocopies may have been eliminated during evolution, even though some retrocopies survived, attained biological function as retrogenes, and became old. Transcriptome data revealed that many retrocopies showed a biased expression pattern in the testis, although the expression was not specifically associated with a particular retrocopy age range. We identified 225 Ensembl genes that overlapped with the coelacanth genome retrocopies. GO enrichment analysis revealed different overrepresented GO (gene ontology) terms between these "retrocopy-overlapped genes" and the retrocopy parent genes, which indicates potential genomic functional organization produced by retrotranspositions. Among the 225 retrocopy-overlapped genes, we also identified 46 that were coelacanth-specific, which could represent a potential molecular basis for coelacanth evolution.
CONCLUSIONS: Our study identified 472 retrocopies in the coelacanth genome. Sequence analysis of these retrocopies and their parent genes, transcriptome data, and GO annotation information revealed novel insight about the potential role of genomic retrocopies in coelacanth evolution and vertebrate adaptations during the evolutionary transition from water to land.
Affiliation(s)
- Kang Du
- Key Laboratory of Aquatic Biodiversity and Conservation of the Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei, 430072, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
- Shunping He
- Key Laboratory of Aquatic Biodiversity and Conservation of the Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei, 430072, China.
|
36
|
Abstract
Gene duplication is a key factor contributing to phenotype diversity across and within species. Although the availability of complete genomes has led to the extensive study of genomic duplications, the dynamics and variability of gene duplications mediated by retrotransposition are not well understood. Here, we predict mRNA retrotransposition and use comparative genomics to investigate their origin and variability across primates. Analyzing seven anthropoid primate genomes, we found a similar number of mRNA retrotranspositions (∼7,500 retrocopies) in Catarrhini (Old World monkeys, including humans), but a surprisingly large number of retrocopies (∼10,000) in Platyrrhini (New World monkeys), which may be a by-product of higher long interspersed nuclear element 1 activity in these genomes. By inferring retrocopy orthology, we dated most of the primate retrocopy origins, and estimated a decrease in the fixation rate in recent primate history, implying a smaller number of species-specific retrocopies. Moreover, using RNA-Seq data, we identified approximately 3,600 expressed retrocopies. As expected, most of these retrocopies are located near or within known genes, show tissue-specific and even species-specific expression patterns, and show no expression correlation with their parental genes. Taken together, our results provide further evidence that mRNA retrotransposition is an active mechanism in primate evolution and suggest that retrocopies may not only introduce great genetic variability between lineages but also create a large reservoir of potentially functional new genomic loci in primate genomes.
Affiliation(s)
- Fábio C P Navarro
- Centro de Oncologia Molecular, Hospital Sírio-Libanês, São Paulo, Brazil
- Dep. de Bioquímica, Universidade de São Paulo, Brazil
- Pedro A F Galante
- Centro de Oncologia Molecular, Hospital Sírio-Libanês, São Paulo, Brazil
|
37
|
Raabe CA, Brosius J. Does every transcript originate from a gene? Ann N Y Acad Sci 2015; 1341:136-48. [PMID: 25847549 DOI: 10.1111/nyas.12741] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 12/15/2014] [Revised: 02/05/2015] [Accepted: 02/11/2015] [Indexed: 12/20/2022]
Abstract
Outdated gene definitions favored regions corresponding to mature messenger RNAs, in particular, the open reading frame. In eukaryotes, the intergenic space was widely regarded nonfunctional and devoid of RNA transcription. Original concepts were based on the assumption that RNA expression was restricted to known protein-coding genes and a few so-called structural RNA genes, such as ribosomal RNAs or transfer RNAs. With the discovery of introns and, more recently, sensitive techniques for monitoring genome-wide transcription, this view had to be substantially modified. Tiling microarrays and RNA deep sequencing revealed myriads of transcripts, which cover almost entire genomes. The tremendous complexity of non-protein-coding RNA transcription has to be integrated into novel gene definitions. Despite an ever-growing list of functional RNAs, questions concerning the mass of identified transcripts are under dispute. Here, we examined genome-wide transcription from various angles, including evolutionary considerations, and suggest, in analogy to novel alternative splice variants that do not persist, that the vast majority of transcripts represent raw material for potential, albeit rare, exaptation events.
Affiliation(s)
- Carsten A Raabe
- Institute of Experimental Pathology, ZMBE, University of Münster, Münster, Germany
|
38
|
Pouwels SD, Heijink IH, Brouwer U, Gras R, den Boef LE, Boezen HM, Korstanje R, van Oosterhout AJM, Nawijn MC. Genetic variation associates with susceptibility for cigarette smoke-induced neutrophilia in mice. Am J Physiol Lung Cell Mol Physiol 2015; 308:L693-709. [PMID: 25637605 DOI: 10.1152/ajplung.00118.2014] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/02/2014] [Accepted: 01/16/2015] [Indexed: 11/22/2022] Open
Abstract
Neutrophilic airway inflammation is one of the major hallmarks of chronic obstructive pulmonary disease and is also seen in steroid resistant asthma. Neutrophilic airway inflammation can be induced by different stimuli including cigarette smoke (CS). Short-term exposure to CS induces neutrophilic airway inflammation in both mice and humans. Since not all individuals develop extensive neutrophilic airway inflammation upon smoking, we hypothesized that this CS-induced innate inflammation has a genetic component. This hypothesis was addressed by exposing 30 different inbred mouse strains to CS or control air for 5 consecutive days, followed by analysis of neutrophilic lung inflammation. By genomewide haplotype association mapping, we identified four susceptibility genes with a significant association to lung tissue levels of the neutrophil marker myeloperoxidase under basal conditions and an additional five genes specifically associated with CS-induced tissue MPO levels. Analysis of the expression levels of the susceptibility genes by quantitative RT-PCR revealed that three of the four genes associated with CS-induced tissue MPO levels had CS-induced changes in gene expression levels that correlate with CS-induced airway inflammation. Most notably, CS exposure induces an increased expression of the coiled-coil domain containing gene, Ccdc93, in mouse strains susceptible for CS-induced airway inflammation whereas Ccdc93 expression was decreased upon CS exposure in nonsusceptible mouse strains. In conclusion, this study shows that CS-induced neutrophilic airway inflammation has a genetic component and that several genes contribute to the susceptibility for this response.
Affiliation(s)
- Simon D Pouwels
- University of Groningen, University Medical Center Groningen, Department of Pathology and Medical Biology, Experimental Pulmonology and Inflammation Research, Groningen, The Netherlands; GRIAC Research Institute, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Irene H Heijink
- University of Groningen, University Medical Center Groningen, Department of Pathology and Medical Biology, Experimental Pulmonology and Inflammation Research, Groningen, The Netherlands; GRIAC Research Institute, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands; University of Groningen, University Medical Center Groningen, Department of Pulmonology, Groningen, The Netherlands
- Uilke Brouwer
- University of Groningen, University Medical Center Groningen, Department of Pathology and Medical Biology, Experimental Pulmonology and Inflammation Research, Groningen, The Netherlands; GRIAC Research Institute, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Renee Gras
- University of Groningen, University Medical Center Groningen, Department of Pathology and Medical Biology, Experimental Pulmonology and Inflammation Research, Groningen, The Netherlands; GRIAC Research Institute, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Lisette E den Boef
- University of Groningen, University Medical Center Groningen, Department of Pathology and Medical Biology, Experimental Pulmonology and Inflammation Research, Groningen, The Netherlands; GRIAC Research Institute, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- H Marike Boezen
- GRIAC Research Institute, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands; University of Groningen, University Medical Center Groningen, Department of Epidemiology, Groningen, The Netherlands
- Antoon J M van Oosterhout
- University of Groningen, University Medical Center Groningen, Department of Pathology and Medical Biology, Experimental Pulmonology and Inflammation Research, Groningen, The Netherlands; GRIAC Research Institute, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Martijn C Nawijn
- University of Groningen, University Medical Center Groningen, Department of Pathology and Medical Biology, Experimental Pulmonology and Inflammation Research, Groningen, The Netherlands; GRIAC Research Institute, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
|
39
|
Radenbaugh AJ, Ma S, Ewing A, Stuart JM, Collisson EA, Zhu J, Haussler D. RADIA: RNA and DNA integrated analysis for somatic mutation detection. PLoS One 2014; 9:e111516. [PMID: 25405470 PMCID: PMC4236012 DOI: 10.1371/journal.pone.0111516] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Received: 06/05/2014] [Accepted: 09/30/2014] [Indexed: 01/30/2023] Open
Abstract
The detection of somatic single nucleotide variants is a crucial component to the characterization of the cancer genome. Mutation calling algorithms thus far have focused on comparing the normal and tumor genomes from the same individual. In recent years, it has become routine for projects like The Cancer Genome Atlas (TCGA) to also sequence the tumor RNA. Here we present RADIA (RNA and DNA Integrated Analysis), a novel computational method combining the patient-matched normal and tumor DNA with the tumor RNA to detect somatic mutations. The inclusion of the RNA increases the power to detect somatic mutations, especially at low DNA allelic frequencies. By integrating an individual's DNA and RNA, we are able to detect mutations that would otherwise be missed by traditional algorithms that examine only the DNA. We demonstrate high sensitivity (84%) and very high precision (98% and 99%) for RADIA in patient data from endometrial carcinoma and lung adenocarcinoma from TCGA. Mutations with both high DNA and RNA read support have the highest validation rate of over 99%. We also introduce a simulation package that spikes in artificial mutations to patient data, rather than simulating sequencing data from a reference genome. We evaluate sensitivity on the simulation data and demonstrate our ability to rescue back mutations at low DNA allelic frequencies by including the RNA. Finally, we highlight mutations in important cancer genes that were rescued due to the incorporation of the RNA.
Affiliation(s)
- Amie J. Radenbaugh
- University of California Santa Cruz Genomics Institute, Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, United States of America
- Singer Ma
- University of California Santa Cruz Genomics Institute, Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, United States of America
- Adam Ewing
- University of California Santa Cruz Genomics Institute, Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, United States of America
- Joshua M. Stuart
- University of California Santa Cruz Genomics Institute, Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, United States of America
| | - Eric A. Collisson
- Division of Hematology/Oncology, University of California San Francisco, San Francisco, California, United States of America
| | - Jingchun Zhu
- University of California Santa Cruz Genomics Institute, Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, United States of America
| | - David Haussler
- University of California Santa Cruz Genomics Institute, Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, United States of America
- Howard Hughes Medical Institute, Chevy Chase, Maryland, United States of America
|
40
|
Signaling through retinoic acid receptors in cardiac development: Doing the right things at the right times. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2014; 1849:94-111. [PMID: 25134739 DOI: 10.1016/j.bbagrm.2014.08.003] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 04/25/2014] [Revised: 06/19/2014] [Accepted: 08/04/2014] [Indexed: 11/23/2022]
Abstract
Retinoic acid (RA) is a terpenoid that is synthesized from vitamin A/retinol (ROL) and binds to the nuclear receptors retinoic acid receptor (RAR)/retinoid X receptor (RXR) to control multiple developmental processes in vertebrates. The available clinical and experimental data provide uncontested evidence for the pleiotropic roles of RA signaling in development of multiple embryonic structures and organs such as the eyes, central nervous system, gonads, lungs and heart. The development of any of these above-mentioned embryonic organ systems can be effectively utilized to showcase the many strategies utilized by RA signaling. However, it is very likely that the strategies employed to transfer RA signals during cardiac development comprise the majority of the relevant and sophisticated ways through which retinoid signals can be conveyed in a complex biological system. Here, we provide the reader with arguments indicating that RA signaling is exquisitely regulated according to specific phases of cardiac development and that RA signaling itself is one of the major regulators of the timing of cardiac morphogenesis and differentiation. We will focus on the role of signaling by RA receptors (RARs) in early phases of heart development. This article is part of a Special Issue entitled: Nuclear receptors in animal development.
|
41
|
Brosius J. The persistent contributions of RNA to eukaryotic gen(om)e architecture and cellular function. Cold Spring Harb Perspect Biol 2014; 6:a016089. [PMID: 25081515 DOI: 10.1101/cshperspect.a016089] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 02/01/2023]
Abstract
Currently, the best scenario for earliest forms of life is based on RNA molecules as they have the proven ability to catalyze enzymatic reactions and harbor genetic information. Evolutionary principles valid today become apparent in such models already. Furthermore, many features of eukaryotic genome architecture might have their origins in an RNA or RNA/protein (RNP) world, including the onset of a further transition, when DNA replaced RNA as the genetic bookkeeper of the cell. Chromosome maintenance, splicing, and regulatory function via RNA may be deeply rooted in the RNA/RNP worlds. Mostly in eukaryotes, conversion from RNA to DNA is still ongoing, which greatly impacts the plasticity of extant genomes. Raw material for novel genes encoding protein or RNA, or parts of genes including regulatory elements that selection can act on, continues to enter the evolutionary lottery.
Affiliation(s)
- Jürgen Brosius
- Institute of Experimental Pathology (ZMBE), University of Münster, D-48149 Münster, Germany
|
42
|
Abstract
Discoveries in cytogenetics, molecular biology, and genomics have revealed that genome change is an active cell-mediated physiological process. This is distinctly at variance with the pre-DNA assumption that genetic changes arise accidentally and sporadically. The discovery that DNA changes arise as the result of regulated cell biochemistry means that the genome is best modelled as a read-write (RW) data storage system rather than a read-only memory (ROM). The evidence behind this change in thinking and a consideration of some of its implications are the subjects of this article. Specific points include the following: cells protect themselves from accidental genome change with proofreading and DNA damage repair systems; localized point mutations result from the action of specialized trans-lesion mutator DNA polymerases; cells can join broken chromosomes and generate genome rearrangements by non-homologous end-joining (NHEJ) processes in specialized subnuclear repair centres; cells have a broad variety of natural genetic engineering (NGE) functions for transporting, diversifying and reorganizing DNA sequences in ways that generate many classes of genomic novelties; natural genetic engineering functions are regulated and subject to activation by a range of challenging life history events; cells can target the action of natural genetic engineering functions to particular genome locations by a range of well-established molecular interactions, including protein binding with regulatory factors and linkage to transcription; and genome changes in cancer can usefully be considered as consequences of the loss of homeostatic control over natural genetic engineering functions.
Affiliation(s)
- James A Shapiro
- Department of Biochemistry and Molecular Biology, University of Chicago, GCISW123B, 979 E. 57th Street, Chicago, IL 60637, USA
|
43
|
Vlaikou AM, Manolakos E, Noutsopoulos D, Markopoulos G, Liehr T, Vetro A, Ziegler M, Weise A, Kreskowski K, Papoulidis I, Thomaidis L, Syrrou M. An Interstitial 4q31.21q31.22 Microdeletion Associated with Developmental Delay: Case Report and Literature Review. Cytogenet Genome Res 2014; 142:227-38. [DOI: 10.1159/000361001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Accepted: 02/13/2014] [Indexed: 11/19/2022] Open
|
44
|
Cooke SL, Shlien A, Marshall J, Pipinikas CP, Martincorena I, Tubio JM, Li Y, Menzies A, Mudie L, Ramakrishna M, Yates L, Davies H, Bolli N, Bignell GR, Tarpey PS, Behjati S, Nik-Zainal S, Papaemmanuil E, Teixeira VH, Raine K, O’Meara S, Dodoran MS, Teague JW, Butler AP, Iacobuzio-Donahue C, Santarius T, Grundy RG, Malkin D, Greaves M, Munshi N, Flanagan AM, Bowtell D, Martin S, Larsimont D, Reis-Filho JS, Boussioutas A, Taylor JA, Hayes ND, Janes SM, Futreal PA, Stratton MR, McDermott U, Campbell PJ. Processed pseudogenes acquired somatically during cancer development. Nat Commun 2014; 5:3644. [PMID: 24714652 PMCID: PMC3996531 DOI: 10.1038/ncomms4644] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Received: 02/06/2014] [Accepted: 03/13/2014] [Indexed: 12/14/2022] Open
Abstract
Cancer evolves by mutation, with somatic reactivation of retrotransposons being one such mutational process. Germline retrotransposition can cause processed pseudogenes, but whether this occurs somatically has not been evaluated. Here we screen sequencing data from 660 cancer samples for somatically acquired pseudogenes. We find 42 events in 17 samples, especially non-small cell lung cancer (5/27) and colorectal cancer (2/11). Genomic features mirror those of germline LINE element retrotranspositions, with frequent target-site duplications (67%), consensus TTTTAA sites at insertion points, inverted rearrangements (21%), 5' truncation (74%) and polyA tails (88%). Transcriptional consequences include expression of pseudogenes from UTRs or introns of target genes. In addition, a somatic pseudogene that integrated into the promoter and first exon of the tumour suppressor gene, MGA, abrogated expression from that allele. Thus, formation of processed pseudogenes represents a new class of mutation occurring during cancer development, with potentially diverse functional consequences depending on genomic context.
Affiliation(s)
- Susanna L. Cooke
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Adam Shlien
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- John Marshall
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Inigo Martincorena
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Jose M.C. Tubio
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Yilong Li
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Andrew Menzies
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Laura Mudie
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Manasa Ramakrishna
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Lucy Yates
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Helen Davies
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Niccolo Bolli
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- University of Cambridge, Cambridge CB2 0XY, UK
- Graham R. Bignell
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Patrick S. Tarpey
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Sam Behjati
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- University of Cambridge, Cambridge CB2 0XY, UK
- Serena Nik-Zainal
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Elli Papaemmanuil
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Vitor H. Teixeira
- Lungs for Living Research Centre, Rayne Institute, University College London, London WC1E 6JF, UK
- Keiran Raine
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Sarah O’Meara
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Maryam S. Dodoran
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Jon W. Teague
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Adam P. Butler
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Richard G. Grundy
- Children’s Brain Tumour Research Centre, University of Nottingham, Nottingham NG7 2UH, UK
- David Malkin
- Department of Pediatrics, Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada M5G 1X8
- Mel Greaves
- Institute for Cancer Research, Sutton, London SM2 5NG, UK
- Nikhil Munshi
- Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
- Adrienne M. Flanagan
- Lungs for Living Research Centre, Rayne Institute, University College London, London WC1E 6JF, UK
- Royal National Orthopaedic Hospital, Middlesex HA7 4LP, UK
- David Bowtell
- Peter MacCallum Cancer Centre, Melbourne, Victoria 3002, Australia
- Sancha Martin
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Denis Larsimont
- Department of Pathology, Jules Bordet Institute, 1000 Brussels, Belgium
- Jorge S. Reis-Filho
- Department of Pathology, Memorial-Sloan-Kettering Cancer Center, New York, New York 10065, USA
- Alex Boussioutas
- Peter MacCallum Cancer Centre, Melbourne, Victoria 3002, Australia
- Department of Gastroenterology, Royal Melbourne Hospital, University of Melbourne, Parkville, Victoria 3050, Australia
- Jack A. Taylor
- National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27713, USA
- Neil D. Hayes
- UNC Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Sam M. Janes
- Lungs for Living Research Centre, Rayne Institute, University College London, London WC1E 6JF, UK
- P. Andrew Futreal
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Michael R. Stratton
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Ultan McDermott
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Addenbrooke’s NHS Foundation Trust, Cambridge CB2 0QQ, UK
- Peter J. Campbell
- Cancer Genome Project, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- University of Cambridge, Cambridge CB2 0XY, UK
- Addenbrooke’s NHS Foundation Trust, Cambridge CB2 0QQ, UK
|
45
|
de Boer M, van Leeuwen K, Geissler J, Weemaes CM, van den Berg TK, Kuijpers TW, Warris A, Roos D. Primary immunodeficiency caused by an exonized retroposed gene copy inserted in the CYBB gene. Hum Mutat 2014; 35:486-96. [PMID: 24478191 DOI: 10.1002/humu.22519] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 08/01/2013] [Accepted: 01/10/2014] [Indexed: 01/12/2023]
Abstract
Retrotransposon-mediated insertion of a long interspersed nuclear element (LINE)-1 or an Alu element into a human gene is a well-known pathogenic mechanism. We report a novel LINE-1-mediated insertion of a transcript from the TMF1 gene on chromosome 3 into the CYBB gene on the X-chromosome. In a Dutch male patient with chronic granulomatous disease, a 5.8-kb, incomplete and partly exonized TMF1 transcript was identified in intron 1 of CYBB, in opposite orientation to the host gene. The sequence of the insertion showed the hallmarks of a retrotransposition event, with an antisense poly(A) tail, target site duplication, and a consensus LINE-1 endonuclease cleavage site. This insertion induced aberrant CYBB mRNA splicing, with inclusion of an extra 117-bp exon between exons 1 and 2 of CYBB. This extra exon contained a premature stop codon. The retrotransposition took place in an early stage of fetal development in the mother of the patient, because she showed a somatic mosaicism for the mutation that was not present in the DNA of her parents. However, the mutated allele was not expressed in the patient's mother because the insertion was found only in the methylated fraction of her DNA.
Affiliation(s)
- Martin de Boer
- Sanquin Research and Landsteiner Laboratory, Academic Medical Centre, University of Amsterdam, Amsterdam, 1066 CX, The Netherlands
|
46
|
Abyzov A, Iskow R, Gokcumen O, Radke DW, Balasubramanian S, Pei B, Habegger L, Lee C, Gerstein M. Analysis of variable retroduplications in human populations suggests coupling of retrotransposition to cell division. Genome Res 2013; 23:2042-52. [PMID: 24026178 PMCID: PMC3847774 DOI: 10.1101/gr.154625.113] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 11/24/2022]
Abstract
In primates and other animals, reverse transcription of mRNA followed by genomic integration creates retroduplications. Expressed retroduplications are either “retrogenes” coding for functioning proteins, or expressed “processed pseudogenes,” which can function as noncoding RNAs. To date, little is known about the variation in retroduplications in terms of their presence or absence across individuals in the human population. We have developed new methodologies that allow us to identify “novel” retroduplications (i.e., those not present in the reference genome), to find their insertion points, and to genotype them. Using these methods, we catalogued and analyzed 174 retroduplication variants in almost one thousand humans, which were sequenced as part of Phase 1 of The 1000 Genomes Project Consortium. The accuracy of our data set was corroborated by (1) multiple lines of sequencing evidence for retroduplication (e.g., depth of coverage in exons vs. introns), (2) experimental validation, and (3) the fact that we can reconstruct a correct phylogenetic tree of human subpopulations based solely on retroduplications. We also show that parent genes of retroduplication variants tend to be expressed at the M-to-G1 transition in the cell cycle and that M-to-G1 expressed genes have more copies of fixed retroduplications than genes expressed at other times. These findings suggest that cell division is coupled to retrotransposition and, perhaps, is even a requirement for it.
|
47
|
Zhang Q. The role of mRNA-based duplication in the evolution of the primate genome. FEBS Lett 2013; 587:3500-7. [DOI: 10.1016/j.febslet.2013.08.042] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/07/2013] [Revised: 08/24/2013] [Accepted: 08/30/2013] [Indexed: 12/28/2022]
|
48
|
RNA-Mediated Gene Duplication and Retroposons: Retrogenes, LINEs, SINEs, and Sequence Specificity. INTERNATIONAL JOURNAL OF EVOLUTIONARY BIOLOGY 2013; 2013:424726. [PMID: 23984183 PMCID: PMC3747384 DOI: 10.1155/2013/424726] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 05/14/2013] [Accepted: 07/01/2013] [Indexed: 11/18/2022]
Abstract
A substantial number of “retrogenes” that are derived from the mRNA of various intron-containing genes have been reported. A class of mammalian retroposons, long interspersed element-1 (LINE1, L1), has been shown to be involved in the reverse transcription of retrogenes (or processed pseudogenes) and non-autonomous short interspersed elements (SINEs). The 3′-end sequences of various SINEs originated from a corresponding LINE. As the 3′-untranslated regions of several LINEs are essential for retroposition, these LINEs presumably require “stringent” recognition of the 3′-end sequence of the RNA template. However, the 3′-ends of mammalian L1s do not exhibit any similarity to SINEs, except for the presence of 3′-poly(A) repeats. Since the 3′-poly(A) repeats of L1 and Alu SINE are critical for their retroposition, L1 probably recognizes the poly(A) repeats, thereby mobilizing not only Alu SINE but also cytosolic mRNA. Many flowering plants only harbor L1-clade LINEs and a significant number of SINEs with poly(A) repeats, but no homology to the LINEs. Moreover, processed pseudogenes have also been found in flowering plants. I propose that the ancestral L1-clade LINE in the common ancestor of green plants may have recognized a specific RNA template, with stringent recognition then becoming relaxed during the course of plant evolution.
|
49
|
Hahn Y. Evidence for the dissemination of cryptic non-coding RNAs transcribed from intronic and intergenic segments by retroposition. Bioinformatics 2013; 29:1593-9. [PMID: 23652427 DOI: 10.1093/bioinformatics/btt258] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Insertion of DNA segments is one mechanism by which genomes evolve. The bulk of genomic segments are now known to be transcribed into long and short non-coding RNAs (ncRNAs), promoter-associated transcripts and enhancer-templated transcripts. These various cryptic ncRNAs are thought to be dispersed in the human and other genomes by retroposition. RESULTS: In this study, I report clear evidence for dissemination of cryptic ncRNAs transcribed from intronic and intergenic segments by retroposition. I used highly stringent conditions to find recently retroposed ncRNAs that had a poly(A) tract and were flanked by target site duplication. I identified 73 instances of retroposition in the human, mouse, and rat genomes (12, 36 and 25 instances, respectively). The inserted segments, in some cases, served as a novel exon or promoter for the associated gene, resulting in novel transcript variants. Some disseminated sequences showed sequence conservation across animals, implying a possible regulatory role. My results indicate that retroposition is one of the mechanisms for dispersion of ncRNAs. I propose that these newly inserted segments may play a role in genome evolution by potentially functioning as novel exons, promoters or enhancers. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yoonsoo Hahn
- Department of Life Science, Research Center for Biomolecules and Biosystems, Chung-Ang University, Seoul 156-756, Korea.
|
50
|
|