1
|
Frith MC. Paleozoic Protein Fossils Illuminate the Evolution of Vertebrate Genomes and Transposable Elements. Mol Biol Evol 2022; 39:6555113. [PMID: 35348724 PMCID: PMC9004415 DOI: 10.1093/molbev/msac068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Genomes hold a treasure trove of protein fossils: fragments of formerly protein-coding DNA, which mainly come from transposable elements (TEs) or host genes. These fossils reveal ancient evolution of TEs and genomes, and many fossils have been exapted to perform diverse functions important for the host's fitness. However, old and highly-degraded fossils are hard to identify, standard methods (e.g. BLAST) are not optimized for this task, and few Paleozoic protein fossils have been found. Here, a recently optimized method is used to find protein fossils in vertebrate genomes. It finds Paleozoic fossils predating the amphibian/amniote divergence from most major TE categories, including virus-related Polinton and Gypsy elements. It finds 10 fossils in the human genome (8 from TEs and 2 from host genes) that predate the last common ancestor of all jawed vertebrates, probably from the Ordovician period. It also finds types of transposon and retrotransposon not found in human before. These fossils have extreme sequence conservation, indicating exaptation: some have evidence of gene-regulatory function, and they tend to lie nearest to developmental genes. Some ancient fossils suggest "genome tectonics", where two fragments of one TE have drifted apart by up to megabases, possibly explaining gene deserts and large introns. This paints a picture of great TE diversity in our aquatic ancestors, with patchy TE inheritance by later vertebrates, producing new genes and regulatory elements on the way. Host-gene fossils too have contributed anciently-conserved DNA segments. This paves the way to further studies of ancient protein fossils.
Affiliation(s)
- Martin C Frith
- Artificial Intelligence Research Center, AIST, Tokyo, Japan; Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan; Computational Bio Big-Data Open Innovation Laboratory, AIST, Tokyo, Japan
|
2
|
Song B, Buckler ES, Wang H, Wu Y, Rees E, Kellogg EA, Gates DJ, Khaipho-Burch M, Bradbury PJ, Ross-Ibarra J, Hufford MB, Romay MC. Conserved noncoding sequences provide insights into regulatory sequence and loss of gene expression in maize. Genome Res 2021; 31:1245-1257. [PMID: 34045362 PMCID: PMC8256870 DOI: 10.1101/gr.266528.120] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 05/27/2020] [Accepted: 05/21/2021] [Indexed: 01/16/2023]
Abstract
Thousands of species will be sequenced in the next few years; however, understanding how their genomes work, without an unlimited budget, requires both molecular and novel evolutionary approaches. We developed a sensitive sequence alignment pipeline to identify conserved noncoding sequences (CNSs) in the Andropogoneae tribe (multiple crop species descended from a common ancestor ∼18 million years ago). The Andropogoneae share similar physiology while being tremendously genomically diverse, harboring a broad range of ploidy levels, structural variation, and transposons. These contribute to the potential of Andropogoneae as a powerful system for studying CNSs and are factors we leverage to understand the function of maize CNSs. We found that 86% of CNSs were comprised of annotated features, including introns, UTRs, putative cis-regulatory elements, chromatin loop anchors, noncoding RNA (ncRNA) genes, and several transposable element superfamilies. CNSs were enriched in active regions of DNA replication in the early S phase of the mitotic cell cycle and showed different DNA methylation ratios compared to the genome-wide background. More than half of putative cis-regulatory sequences (identified via other methods) overlapped with CNSs detected in this study. Variants in CNSs were associated with gene expression levels, and CNS absence contributed to loss of gene expression. Furthermore, the evolution of CNSs was associated with the functional diversification of duplicated genes in the context of maize subgenomes. Our results provide a quantitative understanding of the molecular processes governing the evolution of CNSs in maize.
Affiliation(s)
- Baoxing Song
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853, USA
- Edward S Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853, USA
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, New York 14853, USA
- Agricultural Research Service, United States Department of Agriculture, Ithaca, New York 14853, USA
- Hai Wang
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853, USA
- National Maize Improvement Center, Key Laboratory of Crop Heterosis and Utilization, Joint Laboratory for International Cooperation in Crop Molecular Breeding, China Agricultural University, Beijing 100193, China
- Yaoyao Wu
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853, USA
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Evan Rees
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, New York 14853, USA
- Daniel J Gates
- Department of Evolution and Ecology, University of California Davis, Davis, California 95616, USA
- Merritt Khaipho-Burch
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, New York 14853, USA
- Peter J Bradbury
- Agricultural Research Service, United States Department of Agriculture, Ithaca, New York 14853, USA
- Jeffrey Ross-Ibarra
- Department of Evolution and Ecology, University of California Davis, Davis, California 95616, USA
- Center for Population Biology and Genome Center, University of California Davis, Davis, California 95616, USA
- Matthew B Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa 50011, USA
- M Cinta Romay
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853, USA
|
3
|
Sun X, Wang Z, Hall JM, Perez-Cervantes C, Ruthenburg AJ, Moskowitz IP, Gribskov M, Yang XH. Chromatin-enriched RNAs mark active and repressive cis-regulation: An analysis of nuclear RNA-seq. PLoS Comput Biol 2020; 16:e1007119. [PMID: 32040509 PMCID: PMC7034927 DOI: 10.1371/journal.pcbi.1007119] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/17/2019] [Revised: 02/21/2020] [Accepted: 01/14/2020] [Indexed: 01/22/2023] Open
Abstract
Long noncoding RNAs (lncRNAs) localize in the cell nucleus and influence gene expression through a variety of molecular mechanisms. Chromatin-enriched RNAs (cheRNAs) are a unique class of lncRNAs that are tightly bound to chromatin and putatively function to locally cis-activate gene transcription. CheRNAs can be identified by biochemical fractionation of nuclear RNA followed by RNA sequencing, but until now, a rigorous analytic pipeline for nuclear RNA-seq has been lacking. In this study, we survey four computational strategies for nuclear RNA-seq data analysis and develop a new pipeline, Tuxedo-ch, which outperforms other approaches. Tuxedo-ch assembles a more complete transcriptome and identifies cheRNA with higher accuracy than other approaches. We used Tuxedo-ch to analyze benchmark datasets of K562 cells and further characterize the genomic features of intergenic cheRNA (icheRNA) and their similarity to enhancer RNAs (eRNAs). We quantify the transcriptional correlation of icheRNA and adjacent genes and show that icheRNA is more positively associated with neighboring gene expression than eRNA or cap analysis of gene expression (CAGE) signals. We also explore two novel genomic associations of cheRNA, which indicate that cheRNAs may function to promote or repress gene expression in a context-dependent manner. IcheRNA loci with significant levels of H3K9me3 modifications are associated with active enhancers, consistent with the hypothesis that enhancers are derived from ancient mobile elements. In contrast, antisense cheRNA (as-cheRNA) may play a role in local gene repression, possibly through local RNA:DNA:DNA triple-helix formation. Nuclear RNA-seq provides a powerful way to profile the transcriptional landscape, especially the noncoding transcriptome. Through analyzing nuclear RNA-seq, the chromatin-enriched RNA (cheRNA) class of gene regulatory non-coding RNAs was identified. 
The computational framework presented here provides a reliable approach to identifying cheRNAs from nuclear RNA-seq, and for studying cell-type specific gene regulation. We find that intergenic cheRNA, including transcripts mapped to regions with high levels of classically repressive H3K9me3-marks, may act as a transcriptional activator. In contrast, antisense cheRNA, which originates from the DNA strand complementary to the candidate target protein-coding gene may interact with diverse chromatin modulators to repress local transcription. Our new pipeline allows the identification of a more complete set of cheRNAs than other approaches. A future challenge will be to refine the functional mechanisms of cheRNAs by exploring their regulatory roles, which are involved in diverse molecular and cellular processes in humans and other organisms.
Affiliation(s)
- Xiangying Sun
- Department of Pediatrics, The University of Chicago, Chicago, Illinois, United States of America
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Zhezhen Wang
- Department of Pediatrics, The University of Chicago, Chicago, Illinois, United States of America
- Johnathon M Hall
- Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, United States of America
- Carlos Perez-Cervantes
- Department of Pediatrics, The University of Chicago, Chicago, Illinois, United States of America
- Alexander J Ruthenburg
- Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, United States of America
- Ivan P Moskowitz
- Department of Pediatrics, The University of Chicago, Chicago, Illinois, United States of America
- Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, United States of America
- Michael Gribskov
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Department of Computer Science, Purdue University, West Lafayette, Indiana, United States of America
- Xinan H Yang
- Department of Pediatrics, The University of Chicago, Chicago, Illinois, United States of America
|
4
|
Hormozdiari F, van de Geijn B, Nasser J, Weissbrod O, Gazal S, Ju CJT, O'Connor L, Hujoel MLA, Engreitz J, Hormozdiari F, Price AL. Functional disease architectures reveal unique biological role of transposable elements. Nat Commun 2019; 10:4054. [PMID: 31492842 PMCID: PMC6731302 DOI: 10.1038/s41467-019-11957-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/04/2018] [Accepted: 08/08/2019] [Indexed: 12/19/2022] Open
Abstract
Transposable elements (TE) comprise roughly half of the human genome. Though initially derided as junk DNA, they have been widely hypothesized to contribute to the evolution of gene regulation. However, the contribution of TE to the genetic architecture of diseases remains unknown. Here, we analyze data from 41 independent diseases and complex traits to draw three conclusions. First, TE are uniquely informative for disease heritability. Despite overall depletion for heritability (54% of SNPs, 39 ± 2% of heritability), TE explain substantially more heritability than expected based on their depletion for known functional annotations. This implies that TE acquire function in ways that differ from known functional annotations. Second, older TE contribute more to disease heritability, consistent with acquiring biological function. Third, Short Interspersed Nuclear Elements (SINE) are far more enriched for blood traits than for other traits. Our results can help elucidate the biological roles that TE play in the genetic architecture of diseases.
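The depletion figures quoted in this abstract (54% of SNPs, 39 ± 2% of heritability) can be reduced to a single enrichment ratio. The sketch below is only an illustration: it assumes the common convention that enrichment is the proportion of heritability divided by the proportion of SNPs, and it assumes the quoted ± 2% is a one-standard-error margin. The function names are hypothetical and are not taken from the paper.

```python
from typing import Tuple


def heritability_enrichment(prop_h2: float, prop_snps: float) -> float:
    """Enrichment ratio: share of heritability over share of SNPs (assumed convention)."""
    if not 0.0 < prop_snps <= 1.0:
        raise ValueError("prop_snps must be in (0, 1]")
    return prop_h2 / prop_snps


def enrichment_range(prop_h2: float, margin: float, prop_snps: float) -> Tuple[float, float]:
    """Enrichment at prop_h2 - margin and prop_h2 + margin (margin assumed to be one SE)."""
    low = heritability_enrichment(prop_h2 - margin, prop_snps)
    high = heritability_enrichment(prop_h2 + margin, prop_snps)
    return low, high


if __name__ == "__main__":
    # Figures quoted in the abstract: TE cover 54% of SNPs and 39 +/- 2% of heritability.
    ratio = heritability_enrichment(0.39, 0.54)
    low, high = enrichment_range(0.39, 0.02, 0.54)
    print(f"enrichment ~ {ratio:.2f} (range {low:.2f} to {high:.2f})")
```

A ratio below 1 corresponds to the "overall depletion" wording used in the abstract.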
Affiliation(s)
- Farhad Hormozdiari
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Bryce van de Geijn
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Joseph Nasser
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Omer Weissbrod
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Chelsea J-T Ju
- Department of Computer Science, University of California, Los Angeles, CA, 90095, USA
- Luke O'Connor
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA; Program in Bioinformatics and Integrative Genomics, Harvard Graduate School of Arts and Sciences, Boston, MA, USA
- Margaux L A Hujoel
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Jesse Engreitz
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Fereydoun Hormozdiari
- Department of Biochemistry and Molecular Medicine, University of California, Davis, CA, 95616, USA; MIND Institute and UC-Davis Genome Center, Davis, CA, 95616, USA
- Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
|
5
|
Casanova EL, Switala AE, Dandamudi S, Hickman AR, Vandenbrink J, Sharp JL, Feltus FA, Casanova MF. Autism risk genes are evolutionarily ancient and maintain a unique feature landscape that echoes their function. Autism Res 2019; 12:860-869. [PMID: 31025836 PMCID: PMC6613973 DOI: 10.1002/aur.2112] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/01/2018] [Revised: 03/22/2019] [Accepted: 04/06/2019] [Indexed: 11/09/2022]
Abstract
Previous research on autism risk (ASD), developmental regulatory (DevReg), and central nervous system (CNS) genes suggests they tend to be large in size, enriched in nested repeats, and mutation intolerant. The relevance of these genomic features is intriguing yet poorly understood. In this study, we investigated the feature landscape of these gene groups to discover structural themes useful in interpreting their function, developmental patterns, and evolutionary history. ASD, DevReg, CNS, housekeeping, and whole genome control (WGC) groups were compiled using various resources. Multiple gene features of interest were extracted from NCBI/UCSC Bioinformatics. Residual variation intolerance scores, Exome Aggregation Consortium pLI scores, and copy number variation data from Decipher were used to estimate variation intolerance. Gene age and protein-protein interactions (PPI) were estimated using Ensembl and EBI Intact databases, respectively. Compared to WGC: ASD, DevReg, and CNS genes are longer, produce larger proteins, maintain greater numbers/density of conserved noncoding elements and transposable elements, produce more transcript variants, and are comparatively variation intolerant. After controlling for gene size, mutation tolerance, and clinical association, ASD genes still retain many of these same features. In addition, we also found that ASD genes that are extremely mutation intolerant have larger PPI networks. These data support many of the recent findings within the field of autism genetics but also expand our understanding of the evolution of these broad gene groups, their potential regulatory complexity, and the extent to which they interact with the cellular network. Autism Res 2019, 12: 860-869. © 2019 International Society for Autism Research, Wiley Periodicals, Inc. LAY SUMMARY: Autism risk genes are more ancient compared to other genes in the genome. 
As such, they exhibit physical features related to their age, including long gene and protein size and regulatory sequences that help to control gene expression. They share many of these same features with other genes that are expressed in the brain and/or are associated with prenatal development.
Affiliation(s)
- Emily L. Casanova
- Department of Biomedical Sciences, University of South Carolina, South Carolina, USA
- Department of Pediatrics, Greenville Health System, Greenville, USA
- Andrew E. Switala
- Department of Bioengineering, University of Louisville, Louisville, Kentucky, USA
- Srini Dandamudi
- Department of Statistics, Colorado State University, Fort Collins, Colorado, USA
- Allison R. Hickman
- Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina, USA
- Julia L. Sharp
- Department of Statistics, Colorado State University, Fort Collins, Colorado, USA
- F. Alex Feltus
- Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina, USA
- Manuel F. Casanova
- Department of Biomedical Sciences, University of South Carolina, South Carolina, USA
- Department of Pediatrics, Greenville Health System, Greenville, USA
|
6
|
Kojima KK. LINEs Contribute to the Origins of Middle Bodies of SINEs besides 3' Tails. Genome Biol Evol 2018; 10:370-379. [PMID: 29325122 PMCID: PMC5786205 DOI: 10.1093/gbe/evy008] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Accepted: 01/08/2018] [Indexed: 01/06/2023] Open
Abstract
Short interspersed elements (SINEs), which are nonautonomous transposable elements, require the transposition machinery of long interspersed elements (LINEs) to mobilize. SINEs are composed of two or more independently originating parts. The 5′ region is called the “head” and is derived mainly from small RNAs, and the 3′ region (“tail”) originates from the 3′ region of LINEs and is responsible for being recognized by counterpart LINE proteins. The origin of the middle “body” of SINEs is enigmatic, although significant sequence similarities among SINEs from very diverse species have been observed. Here, a systematic analysis of the similarities among SINEs and LINEs deposited on Repbase, a comprehensive database of eukaryotic repeat sequences, was performed. Three primary findings are described: 1) The 5′ regions of only two clades of LINEs, RTE and Vingi, were revealed to have contributed to the middle parts of SINEs; 2) The linkage of the 5′ and 3′ parts of LINEs can be lost due to occasional tail exchange of SINEs; and 3) The previously proposed Ceph-domain was revealed to be a fusion of a CORE-domain and a 5′ part of RTE clade of LINE. Based on these findings, a hypothesis is proposed that the 5′ parts of bipartite nonautonomous LINEs, which possess only the 5′ and 3′ regions of the original LINEs, can contribute to the undefined middle part of SINEs.
Affiliation(s)
- Kenji K Kojima
- Department of Life Sciences, National Cheng Kung University, Tainan, Taiwan; Genetic Information Research Institute, Mountain View, California
|
7
|
Craig RJ, Suh A, Wang M, Ellegren H. Natural selection beyond genes: Identification and analyses of evolutionarily conserved elements in the genome of the collared flycatcher (Ficedula albicollis). Mol Ecol 2018; 27:476-492. [DOI: 10.1111/mec.14462] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 10/10/2017] [Revised: 11/28/2017] [Accepted: 11/28/2017] [Indexed: 12/13/2022]
Affiliation(s)
- Rory J. Craig
- Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
- Institute of Evolutionary Biology; School of Biological Sciences; University of Edinburgh; Edinburgh UK
- Alexander Suh
- Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
- Mi Wang
- Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
- Hans Ellegren
- Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
|
8
|
Kojima KK. Human transposable elements in Repbase: genomic footprints from fish to humans. Mob DNA 2018; 9:2. [PMID: 29308093 PMCID: PMC5753468 DOI: 10.1186/s13100-017-0107-y] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 09/01/2017] [Accepted: 12/20/2017] [Indexed: 01/21/2023] Open
Abstract
Repbase is a comprehensive database of eukaryotic transposable elements (TEs) and repeat sequences, containing over 1300 human repeat sequences. Recent analyses of these repeat sequences have accumulated evidence for their contribution to human evolution through becoming functional elements, such as protein-coding regions or binding sites of transcriptional regulators. However, resolving the origins of repeat sequences is a challenge, due to their age, divergence, and degradation. Ancient repeats have been continuously classified as TEs by finding similar TEs from other organisms. Here, the most comprehensive picture of human repeat sequences is presented. The human genome contains traces of 10 clades (L1, CR1, L2, Crack, RTE, RTEX, R4, Vingi, Tx1 and Penelope) of non-long terminal repeat (non-LTR) retrotransposons (long interspersed elements, LINEs), 3 types (SINE1/7SL, SINE2/tRNA, and SINE3/5S) of short interspersed elements (SINEs), 1 composite retrotransposon (SVA) family, 5 classes (ERV1, ERV2, ERV3, Gypsy and DIRS) of LTR retrotransposons, and 12 superfamilies (Crypton, Ginger1, Harbinger, hAT, Helitron, Kolobok, Mariner, Merlin, MuDR, P, piggyBac and Transib) of DNA transposons. These TE footprints demonstrate an evolutionary continuum of the human genome.
Affiliation(s)
- Kenji K Kojima
- Genetic Information Research Institute, 465 Fairchild Drive, Suite 201, Mountain View, CA 94043 USA; Department of Life Sciences, National Cheng Kung University, No. 1, Daxue Rd, East District, Tainan, 701 Taiwan
|
9
|
Biscotti MA, Canapa A, Forconi M, Gerdol M, Pallavicini A, Schartl M, Barucca M. The small non-coding RNA processing machinery of two living fossil species, lungfish and coelacanth, gives new insights into the evolution of the Argonaute protein family. Genome Biol Evol 2017; 9:438-453. [PMID: 28206606 PMCID: PMC5381642 DOI: 10.1093/gbe/evx017] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 06/06/2016] [Revised: 12/21/2016] [Accepted: 02/04/2017] [Indexed: 12/20/2022] Open
Affiliation(s)
- Maria Assunta Biscotti
- Dipartimento di Scienze della Vita e dell'Ambiente, Università Politecnica delle Marche, Ancona (Italy)
- Adriana Canapa
- Dipartimento di Scienze della Vita e dell'Ambiente, Università Politecnica delle Marche, Ancona (Italy)
- Mariko Forconi
- Dipartimento di Scienze della Vita e dell'Ambiente, Università Politecnica delle Marche, Ancona (Italy)
- Marco Gerdol
- Dipartimento di Scienze della Vita, Università di Trieste (Italy)
- Manfred Schartl
- Physiological Chemistry, Biocenter, University of Wuerzburg and Comprehensive Cancer Center Mainfranken, University Clinic Wuerzburg, Wuerzburg, Germany; and Texas Institute for Advanced Study and Department of Biology, Texas A&M University, College Station, USA
- Marco Barucca
- Dipartimento di Scienze della Vita e dell'Ambiente, Università Politecnica delle Marche, Ancona (Italy)
|
10
|
Polychronopoulos D, Athanasopoulou L, Almirantis Y. Fractality and entropic scaling in the chromosomal distribution of conserved noncoding elements in the human genome. Gene 2016; 584:148-60. [DOI: 10.1016/j.gene.2016.02.022] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/07/2015] [Revised: 01/22/2016] [Accepted: 02/14/2016] [Indexed: 11/15/2022]
|
11
|
A novel satellite DNA isolated in Pecten jacobaeus shows high sequence similarity among molluscs. Mol Genet Genomics 2015; 290:1717-25. [PMID: 25832354 DOI: 10.1007/s00438-015-1036-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 10/21/2014] [Accepted: 03/24/2015] [Indexed: 12/25/2022]
Abstract
The aim of this work is to investigate the sequence conservation and the evolution of repeated DNA in related species. Satellite DNA is a component of eukaryotic genomes and is made up of tandemly repeated sequences. These sequences are affected by high rates of mutation that lead to the occurrence of species-specific satellite DNAs, which are different in terms of both quantity and quality. In this work, a novel repetitive DNA family, named PjHhaI sat, is described in Pecten jacobaeus. The quantitative analyses revealed a different abundance of this element in the molluscan species investigated, in agreement with the "library hypothesis" even if, in this case, at a high taxonomic level. In addition, the qualitative analysis demonstrated an astonishing sequence conservation not only among scallops but also in six other molluscan species belonging to three classes. These findings suggest that the PjHhaI sat may be considered the most ancient sat DNA described so far, which remained "frozen" during molluscan evolution. The widespread distribution of this sat DNA in molluscs as well as its long evolutionary preservation open up questions on the functional role of this element. A future challenge might be the identification of proteins or molecules which interact with the PjHhaI sat.
|
12
|
Joly-Lopez Z, Bureau TE. Diversity and evolution of transposable elements in Arabidopsis. Chromosome Res 2015; 22:203-16. [PMID: 24801342 DOI: 10.1007/s10577-014-9418-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 10/25/2022]
Abstract
Transposable elements are mobile genetic elements that have successfully populated eukaryotic genomes and show diversity in their structure and transposition mechanisms. Although first viewed solely as selfish, transposable elements are now known as important vectors to drive the adaptation and evolution of their host genome. Transposable elements can affect host gene structures, gene copy number, and gene expression, and can even serve as a source of novel genes. For example, a number of transposable element sequences have been co-opted to contribute to evolutionary innovation, such as the mammalian placenta and the vertebrate immune system. In plants, the need to adapt rapidly to changing environmental conditions is essential and is reflected, as will be discussed, by genome plasticity and an abundance of diverse, active transposon families. This review focuses on transposable elements in plants, particularly those that have beneficial effects on the host. We also emphasize the importance of having proper tools to annotate and classify transposons to better understand their biology.
Affiliation(s)
- Zoé Joly-Lopez
- Department of Biology, McGill University, Montreal, QC, Canada
|
13
|
Polychronopoulos D, Sellis D, Almirantis Y. Conserved noncoding elements follow power-law-like distributions in several genomes as a result of genome dynamics. PLoS One 2014; 9:e95437. [PMID: 24787386 PMCID: PMC4008492 DOI: 10.1371/journal.pone.0095437] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 11/20/2013] [Accepted: 03/26/2014] [Indexed: 12/31/2022] Open
Abstract
Conserved, ultraconserved and other classes of constrained elements (collectively referred to as CNEs here), identified by comparative genomics in a wide variety of genomes, are non-randomly distributed across chromosomes. These elements are defined using various degrees of conservation between organisms and several thresholds of minimal length. We here investigate the chromosomal distribution of CNEs by studying the statistical properties of distances between consecutive CNEs. We find widespread power-law-like distributions, i.e. linearity in double logarithmic scale, in the inter-CNE distances, a feature which is connected with fractality and self-similarity. Given that CNEs are often found to be spatially associated with genes, especially with those that regulate developmental processes, we verify by appropriate gene masking that a power-law-like pattern emerges irrespective of whether elements found close to or inside genes are excluded or not. An evolutionary model is put forward for the understanding of these findings that includes segmental or whole genome duplication events and eliminations (loss) of most of the duplicated CNEs. Simulations reproduce the main features of the observed size distributions. Power-law-like patterns in the genomic distributions of CNEs are in accordance with current knowledge about their evolutionary history in several genomes.
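The "linearity in double logarithmic scale" mentioned in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: it assumes elements are given as hypothetical (start, end) coordinates on one chromosome, builds a cumulative count of inter-element distances, and estimates the slope by ordinary least squares on log10-transformed values. All function names are made up for the example.

```python
import math
from typing import List, Sequence, Tuple


def inter_element_distances(elements: Sequence[Tuple[int, int]]) -> List[int]:
    """Gaps between consecutive (start, end) elements on one chromosome."""
    ordered = sorted(elements)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end
        if gap > 0:  # skip overlapping or abutting elements
            gaps.append(gap)
    return gaps


def cumulative_counts(distances: Sequence[int]) -> List[Tuple[int, int]]:
    """For each distinct distance d (ascending), the number of distances >= d."""
    ordered = sorted(distances)
    total = len(ordered)
    points = []
    for i, d in enumerate(ordered):
        if i == 0 or d != ordered[i - 1]:
            points.append((d, total - i))
    return points


def loglog_slope(points: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of log10(y) against log10(x); needs two distinct x values."""
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Toy coordinates, purely illustrative.
    toy = [(10, 20), (50, 60), (100, 130), (400, 420), (2000, 2050)]
    gaps = inter_element_distances(toy)
    print(gaps, loglog_slope(cumulative_counts(gaps)))
```

A roughly constant slope over a wide range of distances is what a power-law-like pattern would look like in this representation. A straight-line fit alone does not establish a power law, so this should be read as a descriptive check only.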
Affiliation(s)
- Dimitris Polychronopoulos
- Institute of Biosciences and Applications, National Center for Scientific Research “Demokritos”, Athens, Greece
- Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Athens, Greece
- Diamantis Sellis
- Department of Biology, Stanford University, Stanford, California, United States of America
- Yannis Almirantis
- Institute of Biosciences and Applications, National Center for Scientific Research “Demokritos”, Athens, Greece
|
14
|
Forconi M, Chalopin D, Barucca M, Biscotti MA, De Moro G, Galiana D, Gerdol M, Pallavicini A, Canapa A, Olmo E, Volff JN. Transcriptional activity of transposable elements in coelacanth. J Exp Zool B Mol Dev Evol 2013; 322:379-89. [PMID: 24038780 DOI: 10.1002/jez.b.22527] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 03/29/2013] [Revised: 06/04/2013] [Accepted: 07/14/2013] [Indexed: 01/22/2023]
Abstract
The morphological stasis of coelacanths has long suggested a slow evolutionary rate. General genomic stasis might also imply a decrease in transposable element activity. To evaluate the potential activity of transposable elements (TEs) in "living fossil" species, transcriptomic data of Latimeria chalumnae and its Indonesian congener Latimeria menadoensis were compared through RNA-sequencing mapping procedures in three different organs (liver, testis, and muscle). The analysis of coelacanth transcriptomes highlights a significant percentage of transcribed TEs in both species. Major contributors are LINE retrotransposons, especially from the CR1 family. Furthermore, some particular elements, such as an LF-SINE and a LINE2 sequence, seem to be more highly expressed than other elements. The amount of TEs expressed in testis suggests a possible transposition burst in coming generations. Moreover, a significant amount of TEs was also observed in liver and muscle transcriptomes. Analyses of elements displaying marked organ-specific expression gave us the opportunity to highlight exaptation cases, that is, the recruitment of TEs as new cellular genes, but also to identify a new Latimeria-specific family of Short Interspersed Nuclear Elements called CoeG-SINEs. Overall, transcriptome results do not seem to be in line with a slow-evolving genome with poor TE activity.
Affiliation(s)
- Mariko Forconi
- Dipartimento di Scienze della Vita e dell'Ambiente, Università Politecnica delle Marche, Ancona, Italy; Institut de Génomique Fonctionnelle de Lyon, ENS Lyon, France
|
15
|
Pallavicini A, Canapa A, Barucca M, Alföldi J, Biscotti MA, Buonocore F, De Moro G, Di Palma F, Fausto AM, Forconi M, Gerdol M, Makapedua DM, Turner-Meier J, Olmo E, Scapigliati G. Analysis of the transcriptome of the Indonesian coelacanth Latimeria menadoensis. BMC Genomics 2013; 14:538. [PMID: 23927401 PMCID: PMC3750513 DOI: 10.1186/1471-2164-14-538] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 11/16/2012] [Accepted: 06/26/2013] [Indexed: 02/01/2023] Open
Abstract
Background: Latimeria menadoensis is a coelacanth species first identified in 1997 in Indonesia, 10,000 km from its African congener. To date, only six specimens have been caught and only very limited molecular data are available. In the present work we describe the de novo transcriptome assembly obtained from liver and testis samples collected from the fifth specimen ever caught of this species. Results: The deep RNA sequencing performed with Illumina technologies generated 145,435,156 paired-end reads, accounting for ~14 GB of sequence data, which were de novo assembled using a Trinity/CLC combined strategy. The assembly output was processed and filtered producing a set of 66,308 contigs, whose quality was thoroughly assessed. The comparison with the recently sequenced genome of the African congener Latimeria chalumnae and with the available genomic resources of other vertebrates revealed a good reconstruction of full length transcripts and a high coverage of the predicted full coelacanth transcriptome. The RNA-seq analysis revealed remarkable differences in the expression profiles between the two tissues, allowing the identification of liver- and testis-specific transcripts which may play a fundamental role in important biological processes carried out by these two organs. Conclusion: Given the high genomic affinity between the two coelacanth species, the de novo transcriptome assembly described here can be considered a valuable support tool for the improvement of gene prediction within the genome of L. chalumnae and a valuable resource for investigation of many aspects of tetrapod evolution.
|
16
|
de Souza FS, Franchini LF, Rubinstein M. Exaptation of transposable elements into novel cis-regulatory elements: is the evidence always strong? Mol Biol Evol 2013; 30:1239-51. [PMID: 23486611 PMCID: PMC3649676 DOI: 10.1093/molbev/mst045] [Citation(s) in RCA: 117] [Impact Index Per Article: 10.6] [Indexed: 12/25/2022] Open
Abstract
Transposable elements (TEs) are mobile genetic sequences that can jump around the genome from one location to another, behaving as genomic parasites. TEs have been particularly effective in colonizing mammalian genomes, and such heavy TE load is expected to have conditioned genome evolution. Indeed, studies conducted both at the gene and genome levels have uncovered TE insertions that seem to have been co-opted--or exapted--by providing transcription factor binding sites (TFBSs) that serve as promoters and enhancers, leading to the hypothesis that TE exaptation is a major factor in the evolution of gene regulation. Here, we critically review the evidence for exaptation of TE-derived sequences as TFBSs, promoters, enhancers, and silencers/insulators both at the gene and genome levels. We classify the functional impact attributed to TE insertions into four categories of increasing complexity and argue that so far very few studies have conclusively demonstrated exaptation of TEs as transcriptional regulatory regions. We also contend that many genome-wide studies dealing with TE exaptation in recent lineages of mammals are still inconclusive and that the hypothesis of rapid transcriptional regulatory rewiring mediated by TE mobilization must be taken with caution. Finally, we suggest experimental approaches that may help attributing higher-order functions to candidate exapted TEs.
Affiliation(s)
- Flávio S.J. de Souza
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Departamento de Fisiología, Biología Molecular y Celular, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
- Lucía F. Franchini
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Marcelo Rubinstein
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Departamento de Fisiología, Biología Molecular y Celular, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
|
17
|
Distinct groups of repetitive families preserved in mammals correspond to different periods of regulatory innovations in vertebrates. Biol Direct 2012; 7:36. [PMID: 23098210 PMCID: PMC3500645 DOI: 10.1186/1745-6150-7-36] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 07/09/2012] [Accepted: 10/23/2012] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Mammalian genomes are repositories of repetitive DNA sequences derived from transposable elements (TEs). Typically, TEs generate multiple, mostly inactive copies of themselves, commonly known as repetitive families or families of repeats. Recently, we proposed that families of TEs originate in small populations by genetic drift and that the origin of small subpopulations from larger populations can be fueled by biological innovations. RESULTS We report three distinct groups of repetitive families preserved in the human genome that expanded and declined during the three previously described periods of regulatory innovations in vertebrate genomes. The first group originated prior to the evolutionary separation of the mammalian and bird lineages and the second one during subsequent diversification of the mammalian lineages prior to the origin of eutherian lineages. The third group of families is primate-specific. CONCLUSIONS The observed correlation implies a relationship between regulatory innovations and the origin of repetitive families. Consistent with our previous hypothesis, it is proposed that regulatory innovations fueled the origin of new subpopulations in which new repetitive families became fixed by genetic drift.
|
18
|
Klimopoulos A, Sellis D, Almirantis Y. Widespread occurrence of power-law distributions in inter-repeat distances shaped by genome dynamics. Gene 2012; 499:88-98. [PMID: 22370293 DOI: 10.1016/j.gene.2012.02.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 11/02/2011] [Revised: 02/05/2012] [Accepted: 02/06/2012] [Indexed: 11/25/2022]
Abstract
Repetitive DNA sequences derived from transposable elements (TE) are distributed in a non-random way, co-clustering with other classes of repeat elements, genes and other genomic components. In a previous work we reported power-law-like size distributions (linearity in log-log scale) in the spatial arrangement of Alu and LINE1 elements in the human genome. Here we investigate the large-scale features of the spatial arrangement of all principal classes of TEs in 14 genomes from phylogenetically distant organisms by studying the size distribution of inter-repeat distances. Power-law-like size distributions are found to be widespread, extending up to several orders of magnitude. In order to understand the emergence of this distributional pattern, we introduce an evolutionary scenario, which includes (i) Insertions of DNA segments (e.g., more recent repeats) into the considered sequence and (ii) Eliminations of members of the studied TE family. In the proposed model we also incorporate the potential for transposition events (characteristic of the DNA transposons' life-cycle) and segmental duplications. Simulations reproduce the main features of the observed size distributions. Furthermore, we investigate the effects of various genomic features on the presence and extent of power-law size distributions including TE class and age, mode of parental TE transmission, GC content, deletion and recombination rates in the studied genomic region, etc. Our observations corroborate the hypothesis that insertions of genomic material and eliminations of repeats are at the basis of power-laws in inter-repeat distances. The existence of these power-laws could facilitate the formation of the recently proposed "fractal globule" for the confined chromatin organization.
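The "linearity in log-log scale" criterion mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `inter_element_distances` and `loglog_slope`, the use of a complementary cumulative distribution, and the plain least-squares fit are all assumptions made for illustration.

```python
import math

def inter_element_distances(positions):
    """Gaps between consecutive element start positions on one chromosome.

    Hypothetical helper: duplicate positions (zero-length gaps) are dropped
    so that every returned distance has a finite logarithm.
    """
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:]) if b > a]

def loglog_slope(distances):
    """Least-squares slope of log10 P(X >= d) against log10 d.

    A roughly constant slope over several orders of magnitude is what
    "power-law-like" means here; this sketch does no goodness-of-fit test.
    Needs at least two distinct positive distances.
    """
    ordered = sorted(distances)
    n = len(ordered)
    xs, ys = [], []
    for i, d in enumerate(ordered):
        if i > 0 and d == ordered[i - 1]:
            continue  # keep one point per distinct distance
        xs.append(math.log10(d))
        ys.append(math.log10((n - i) / n))  # fraction of gaps >= d
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy usage with made-up coordinates (not real annotation data).
toy_positions = [120, 450, 470, 2_300, 2_350, 19_000, 19_040, 250_000]
toy_slope = loglog_slope(inter_element_distances(toy_positions))
```

The cumulative form is used here because it avoids choosing histogram bins; a binned density plot would be an equally reasonable choice for this kind of check.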
Affiliation(s)
- Alexandros Klimopoulos
- National Center for Scientific Research "Demokritos," Institute of Biology, 153 10 Athens, Greece.
|
19
|
Hellen EHB, Brookfield JFY. Investigation of the origin and spread of a Mammalian transposable element based on current sequence diversity. J Mol Evol 2012; 73:287-96. [PMID: 22222953 PMCID: PMC3268980 DOI: 10.1007/s00239-011-9475-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/29/2011] [Accepted: 11/28/2011] [Indexed: 01/07/2023]
Abstract
Almost half the human genome consists of mobile DNA elements, and their analysis is a vital part of understanding the human genome as a whole. Many of these elements are ancient and have persisted in the genome for tens or hundreds of millions of years, providing a window into the evolution of modern mammals. The Golem family have been used as model transposons to highlight computational analyses which can be used to investigate these elements, particularly the use of molecular dating with large transposon families. Whole-genome searches found Golem sequences in 20 mammalian species. Golem A and B subsequences were only found in primates and squirrel. Interestingly, the full-length Golem, found as a few copies in many mammalian genomes, was found abundantly in horse. A phylogenetic profile suggested that Golem originated after the eutherian–metatherian divergence and that the A and B subfamilies originated at a much later date. Molecular dating based on sequence diversity suggests an early age, of 175 Mya, for the origin of the family and that the A and B lineages originated much earlier than expected from their current taxonomic distribution and have subsequently been lost in some lineages. Using publicly available data, it is possible to investigate the evolutionary history of transposon families. Determining in which organisms a transposon can be found is often used to date the origin and expansion of the families. However, in this analysis, molecular dating, commonly used for determining the age of gene sequences, has been used, reducing the likelihood of errors from deleted lineages.
Affiliation(s)
- Elizabeth H B Hellen
- Centre for Genetics and Genomics, School of Biology, University of Nottingham, University Park, Nottingham, UK
|
20
|
Large-scale DNA editing of retrotransposons accelerates mammalian genome evolution. Nat Commun 2011; 2:519. [PMID: 22044998 DOI: 10.1038/ncomms1525] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 07/26/2011] [Accepted: 10/03/2011] [Indexed: 11/08/2022] Open
Abstract
Retrotransposons had an important role in genome evolution, including the formation of new genes and promoters and the rewiring of gene networks. However, it is unclear how such a repertoire of functions emerged from a relatively limited number of source sequences. Here we show that DNA editing, an antiviral mechanism, accelerated the evolution of mammalian genomes by large-scale modification of their retrotransposon sequences. We find numerous pairs of retrotransposons containing long clusters of G-to-A mutations that cannot be attributed to random mutagenesis. These clusters, which we find across different mammalian genomes and retrotransposon families, are the hallmark of APOBEC3 activity, a potent antiretroviral protein family with cytidine deamination function. As DNA editing simultaneously generates a large number of mutations, each affected element begins its evolutionary trajectory from a unique starting point, thereby increasing the probability of developing a novel function. Our findings thus suggest a potential mechanism for retrotransposon domestication.
|
21
|
Smith JJ, Sumiyama K, Amemiya CT. A living fossil in the genome of a living fossil: Harbinger transposons in the coelacanth genome. Mol Biol Evol 2011; 29:985-93. [PMID: 22045999 DOI: 10.1093/molbev/msr267] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 12/29/2022] Open
Abstract
Emerging data from the coelacanth genome are beginning to shed light on the origin and evolution of tetrapod genes and noncoding elements. Of particular relevance is the realization that coelacanth retains active copies of transposable elements that once served as raw material for the evolution of new functional sequences in the vertebrate lineage. Recognizing the evolutionary significance of the coelacanth genome in this regard, we employed an ab initio search strategy to further classify its repetitive complement. This analysis uncovered a class of interspersed elements (Latimeria Harbinger 1-LatiHarb1) that is a major contributor to coelacanth genome structure and gene content (∼1% to 4% of the genome). Sequence analyses indicate that 1) each ∼8.7 kb LatiHarb1 element contains two coding regions, a transposase gene and a gene whose function is as yet unknown (MYB-like) and 2) copies of LatiHarb1 retain biological activity in the coelacanth genome. Functional analyses verify transcriptional and enhancer activities of LatiHarb1 in vivo and reveal transcriptional decoupling that could permit MYB-like genes to play functional roles not directly linked to transposition. Thus, LatiHarb1 represents the first known instance of a harbinger-superfamily transposon with contemporary activity in a vertebrate genome. Analyses of LatiHarb1 further corroborate the notion that exaptation of anciently active harbinger elements gave rise to at least two vertebrate genes (harbi1 and naif1) and indicate that the vertebrate gene tsnare1 also traces its ancestry to this transposon superfamily. Based on our analyses of LatiHarb1, we speculate that several functional features of harbinger elements may predispose the transposon superfamily toward recurrent exaptive evolution of cellular coding genes. In addition, these analyses further reinforce the broad utility of the coelacanth genome and other "outgroup" genomes in understanding the ancestry and evolution of vertebrate genes and genomes.
Affiliation(s)
- Jeramiah J Smith
- Benaroya Research Institute at Virginia Mason Medical Center, Seattle, WA, USA.
|
22
|
Chatterjee S, Lufkin T. Fishing for function: zebrafish BAC transgenics for functional genomics. MOLECULAR BIOSYSTEMS 2011; 7:2345-51. [PMID: 21647532 DOI: 10.1039/c1mb05116d] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
Transgenics using bacterial artificial chromosomes (BACs) offers a great opportunity to look at gene regulation in a developing embryo. The modified BAC containing a reporter inserted just before the translational start site of the gene of interest allows for the visualization of spatio-temporal gene expression. Though this method has been used in the mouse model extensively, its utility in zebrafish studies is relatively new. This review aims to look at the utility of making BAC transgenics in zebrafish and its applications in functional genomics. We look at the various methods to modify the BAC, some limitations and what the future holds.
Affiliation(s)
- Sumantra Chatterjee
- Stem Cell and Developmental Biology, Genome Institute of Singapore, Singapore
|
23
|
Zhang Y, Romanish MT, Mager DL. Distributions of transposable elements reveal hazardous zones in mammalian introns. PLoS Comput Biol 2011; 7:e1002046. [PMID: 21573203 PMCID: PMC3088655 DOI: 10.1371/journal.pcbi.1002046] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.3] [Received: 12/16/2010] [Accepted: 03/25/2011] [Indexed: 11/20/2022] Open
Abstract
Comprising nearly half of the human and mouse genomes, transposable elements (TEs) are found within most genes. Although the vast majority of TEs in introns are fixed in the species and presumably exert no significant effects on the enclosing gene, some markedly perturb transcription and result in disease or a mutated phenotype. Factors determining the likelihood that an intronic TE will affect transcription are not clear. In this study, we examined intronic TE distributions in both human and mouse and found several factors that likely contribute to whether a particular TE can influence gene transcription. Specifically, we observed that TEs near exons are greatly underrepresented compared to random distributions, but the size of these “underrepresentation zones” differs between TE classes. Compared to elsewhere in introns, TEs within these zones are shorter on average and show stronger orientation biases. Moreover, TEs in extremely close proximity (<20 bp) to exons show a strong bias to be near splice-donor sites. Interestingly, disease-causing intronic TE insertions show the opposite distributional trends, and by examining expressed sequence tag (EST) databases, we found that the proportion of TEs contributing to chimeric TE-gene transcripts is significantly higher within their underrepresentation zones. In addition, an analysis of predicted splice sites within human long terminal repeat (LTR) elements showed a significantly lower total number and weaker strength for intronic LTRs near exons. Based on these factors, we selectively examined a list of polymorphic mouse LTR elements in introns and showed clear evidence of transcriptional disruption by LTR element insertions in the Trpc6 and Kcnh6 genes. Taken together, these studies lend insight into the potential selective forces that have shaped intronic TE distributions and enable identification of TEs most likely to exert transcriptional effects on genes. 
Sequences derived from transposable elements (TEs) are major constituents of mammalian genomes and are found within introns of most genes. While nearly all TEs within introns appear harmless, some de novo intronic TE insertions do disrupt gene transcription and splicing and cause disease. It is unclear why some intronic TEs perturb gene transcription whereas most do not. Here, we examined intronic TE distributions in both human and mouse genes to gain insight into which TEs may be more likely to affect transcription. We found evidence that TEs near exons are likely subject to strong negative selection but the size of the region under selection or “underrepresentation zone” differs for different TE classes. Strikingly, all reported human disease-causing intronic TE insertions fall within these underrepresentation zones, and the proportion of TEs contributing to chimeric TE-gene transcripts is significantly higher when TEs are located in these zones. We also examined insertionally polymorphic mouse TEs located within underrepresentation zones and found evidence of transcriptional disruption in two genes. Given the growing appreciation for ongoing activity of TEs in human, our results should be of value in prioritizing insertionally polymorphic TEs for study of their potential contributions to gene expression differences and phenotypic variability.
Affiliation(s)
- Ying Zhang
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Mark T. Romanish
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Dixie L. Mager
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
|
24
|
Abstract
Drosophila melanogaster is one of the most well studied genetic model organisms, nonetheless its genome still contains unannotated coding and non-coding genes, transcripts, exons, and RNA editing sites. Full discovery and annotation are prerequisites for understanding how the regulation of transcription, splicing, and RNA editing directs development of this complex organism. We used RNA-Seq, tiling microarrays, and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. Together, these data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.
|
25
|
When needles look like hay: how to find tissue-specific enhancers in model organism genomes. Dev Biol 2010; 350:239-54. [PMID: 21130761 DOI: 10.1016/j.ydbio.2010.11.026] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 04/14/2010] [Revised: 11/11/2010] [Accepted: 11/22/2010] [Indexed: 01/22/2023]
Abstract
A major prerequisite for the investigation of tissue-specific processes is the identification of cis-regulatory elements. No generally applicable technique is available to distinguish them from any other type of genomic non-coding sequence. Therefore, researchers often have to identify these elements by elaborate in vivo screens, testing individual regions until the right one is found. Here, based on many examples from the literature, we summarize how functional enhancers have been isolated from other elements in the genome and how they have been characterized in transgenic animals. Covering computational and experimental studies, we provide an overview of the global properties of cis-regulatory elements, like their specific interactions with promoters and target gene distances. We describe conserved non-coding elements (CNEs) and their internal structure, nucleotide composition, binding site clustering and overlap, with a special focus on developmental enhancers. Conflicting data and unresolved questions on the nature of these elements are highlighted. Our comprehensive overview of the experimental shortcuts that have been found in the different model organism communities and the new field of high-throughput assays should help during the preparation phase of a screen for enhancers. The review is accompanied by a list of general guidelines for such a project.
|
26
|
Jung CH, Makunin IV, Mattick JS. Identification of conserved Drosophila-specific euchromatin-restricted non-coding sequence motifs. Genomics 2010; 96:154-66. [PMID: 20595017 DOI: 10.1016/j.ygeno.2010.05.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/23/2010] [Revised: 05/25/2010] [Accepted: 05/26/2010] [Indexed: 01/19/2023]
Abstract
Non-protein-coding DNA comprises the majority of animal genomes but its functions are largely unknown. We identified over 17,000 different tetranucleotide pairs in the Drosophila melanogaster genome that are over-represented at distances up to 100nt in conserved non-exonic sequences. Those exhibiting the highest information content in surrounding nucleotides were classified into five groups: tRNAs, motifs associated with histone genes, Suppressor-of-Hairy-wing binding sites, and two sets of previously unrecognized motifs (DLM3 and DLM4). There are hundreds to thousands of copies of DLM3 and DLM4, respectively, in the genome, located almost exclusively in non-coding regions. They have similar copy numbers among drosophilids, but are largely absent in other insects. DLM3 is likely a cis-regulatory element, whereas DLM4 sequences are capable of forming a short hairpin structure and are expressed as approximately 80nt RNAs. This work reports the existence of Drosophila genus-specific sequence motifs, and suggests that many more novel functional elements may be discovered in genomes using the general approach outlined herein.
Affiliation(s)
- Chol-Hee Jung
- Institute for Molecular Bioscience, The University of Queensland, St Lucia QLD, Australia
|
27
|
Complete HOX cluster characterization of the coelacanth provides further evidence for slow evolution of its genome. Proc Natl Acad Sci U S A 2010; 107:3622-7. [PMID: 20139301 DOI: 10.1073/pnas.0914312107] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Indexed: 01/12/2023] Open
Abstract
The living coelacanth is a lobe-finned fish that represents an early evolutionary departure from the lineage that led to land vertebrates, and is of extreme interest scientifically. It has changed very little in appearance from fossilized coelacanths of the Cretaceous (150 to 65 million years ago), and is often referred to as a "living fossil." An important general question is whether long-term stasis in morphological evolution is associated with stasis in genome evolution. To this end we have used targeted genome sequencing for acquiring 1,612,752 bp of high quality finished sequence encompassing the four HOX clusters of the Indonesian coelacanth Latimeria menadoensis. Detailed analyses were carried out on genomic structure, gene and repeat contents, conserved noncoding regions, and relative rates of sequence evolution in both coding and noncoding tracts. Our results demonstrate conclusively that the coelacanth HOX clusters are evolving comparatively slowly and that this taxon should serve as a viable outgroup for interpretation of the genomes of tetrapod species.
|
28
|
Stanke F, Becker T, Hedtfeld S, Tamm S, Wienker TF, Tümmler B. Hierarchical fine mapping of the cystic fibrosis modifier locus on 19q13 identifies an association with two elements near the genes CEACAM3 and CEACAM6. Hum Genet 2010; 127:383-94. [PMID: 20047061 DOI: 10.1007/s00439-009-0779-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 11/10/2009] [Accepted: 12/18/2009] [Indexed: 12/23/2022]
Abstract
On 19q13, TGFB1 and the cystic fibrosis modifier 1 locus (CFM1) have been identified as modifiers of the course of the monogenic disease cystic fibrosis (CF). Recently, we have described a transmission disequilibrium at the microsatellite D19S197, localized between TGFB1 and CFM1. To map the corresponding molecular variants, we have selected informative SNP markers within a 600-kb area and compared two-marker-haplotype-distributions between phenotypically contrasting sib pair groups, intending to type only phylogenetically old markers by aiming for close-to-maximal polymorphism information content of the SNPs. Starting with a seed set of five SNPs that cover intermarker distances of up to 50 kb, we have iteratively added more SNPs to the map, until we could identify two genomic fragments of 3,289 and 2,052 bp for which pairs with contrasting phenotypes showed different haplotype distributions on the final 17-SNP-map (P(raw) = 0.0002, P(corr17SNPs) = 0.0106 and P(raw) = 0.0008, P(corr17SNPs) = 0.0469, respectively). Resequencing of these fragments of four unrelated individuals for each element showed that the mildly and severely affected pairs differ in seven SNPs and concordant pairs differ from discordant pairs in five SNPs. Annotation of these variants indicates that CEACAM6 and a regulatory element near the 3' end of CEACAM3 are associated with CF disease severity and intrapair discordance, respectively. While our approach was only guided by the markers' position, the involvement of genes from the CEACAM family in host defense and innate immunity designates these proteins as likely modifiers of the multi-organ disease cystic fibrosis which is known for its cytokine imbalance and pro-inflammatory phenotype.
Affiliation(s)
- Frauke Stanke
- Department of Pediatrics, Hannover Medical School, Hannover, Germany.
|
29
|
Dong X, Navratilova P, Fredman D, Drivenes Ø, Becker TS, Lenhard B. Exonic remnants of whole-genome duplication reveal cis-regulatory function of coding exons. Nucleic Acids Res 2009; 38:1071-85. [PMID: 19969543 PMCID: PMC2831330 DOI: 10.1093/nar/gkp1124] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Indexed: 11/30/2022] Open
Abstract
Using a comparative genomics approach to reconstruct the fate of genomic regulatory blocks (GRBs) and identify exonic remnants that have survived the disappearance of their host genes after whole-genome duplication (WGD) in teleosts, we discover a set of 38 candidate cis-regulatory coding exons (RCEs) with predicted target genes. These elements demonstrate evolutionary separation of overlapping protein-coding and regulatory information after WGD in teleosts. We present evidence that the corresponding mammalian exons are still under both coding and non-coding selection pressure, are more conserved than other protein coding exons in the host gene and several control sets, and share key characteristics with highly conserved non-coding elements in the same regions. Their dual function is corroborated by existing experimental data. Additionally, we show examples of human exon remnants stemming from the vertebrate 2R WGD. Our findings suggest that long-range cis-regulatory inputs for developmental genes are not limited to non-coding regions, but can also overlap the coding sequence of unrelated genes. Thus, exonic regulatory elements in GRBs might be functionally equivalent to those in non-coding regions, calling for a re-evaluation of the sequence space in which to look for long-range regulatory elements and experimentally test their activity.
Affiliation(s)
- Xianjun Dong
- Computational Biology Unit, Bergen Center for Computational Science, University of Bergen, Thormøhlensgate 55, N-5008 Bergen, Norway
|
30
|
Tian J, Zhao ZH, Chen HP. [Conserved non-coding elements in human genome]. Yi Chuan = Hereditas 2009; 31:1067-1076. [PMID: 19933086 DOI: 10.3724/sp.j.1005.2009.01067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
Comparative genomics has revealed that about 5% of the human genome is under purifying selection, of which 3.5% are conserved non-coding elements (CNEs), whereas coding regions make up only a small part. In humans, CNEs are functionally important and may be involved in establishing and maintaining chromatin architecture, in transcriptional regulation, and in pre-mRNA processing. They are also related to mammalian ontogeny and to human diseases. This review outlines the identification, functional significance, and evolutionary origin of CNEs, and their effects on human genetic defects.
Affiliation(s)
- Jing Tian
- Institute of Biotechnology, Academy of Military Medical Science, Beijing 100071, China.
|
31
|
Garber M, Guttman M, Clamp M, Zody MC, Friedman N, Xie X. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 2009; 25:i54-62. [PMID: 19478016 PMCID: PMC2687944 DOI: 10.1093/bioinformatics/btp190] [Citation(s) in RCA: 248] [Impact Index Per Article: 16.5] [Indexed: 02/01/2023]
Abstract
Motivation: Comparing the genomes from closely related species provides a powerful tool to identify functional elements in a reference genome. Many methods have been developed to identify conserved sequences across species; however, existing methods only model conservation as a decrease in the rate of mutation and have ignored selection acting on the pattern of mutations. Results: We present a new approach that takes advantage of deeply sequenced clades to identify evolutionary selection by uncovering not only signatures of rate-based conservation but also substitution patterns characteristic of sequence undergoing natural selection. We describe a new statistical method for modeling biased nucleotide substitutions, a learning algorithm for inferring site-specific substitution biases directly from sequence alignments and a hidden Markov model for detecting constrained elements characterized by biased substitutions. We show that the new approach can identify significantly more degenerate constrained sequences than rate-based methods. Applying it to the ENCODE regions, we find that as much as 10.2% of these regions are under selection. Availability: The algorithms are implemented in a Java software package, called SiPhy, freely available at http://www.broadinstitute.org/science/software/. Contact: xhx@ics.uci.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Manuel Garber
- Department of Biology, Broad Institute of MIT and Harvard, 7 Cambridge Center, MIT, Cambridge, MA 02142, USA
|
32
|
Wang J, Bowen NJ, Mariño-Ramírez L, Jordan IK. A c-Myc regulatory subnetwork from human transposable element sequences. Molecular BioSystems 2009; 5:1831-9. [PMID: 19763338 DOI: 10.1039/b908494k] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 01/29/2023]
Abstract
Transposable elements (TEs) can donate regulatory sequences that help to control the expression of human genes. The oncogene c-Myc is a promiscuous transcription factor that is thought to regulate the expression of hundreds of genes. We evaluated the contribution of TEs to the c-Myc regulatory network by searching for c-Myc binding sites derived from TEs and by analyzing the expression and function of target genes with nearby TE-derived c-Myc binding sites. There are thousands of TE sequences in the human genome that are bound by c-Myc. A conservative analysis indicated that 816-4564 of these TEs contain canonical c-Myc binding site motifs. c-Myc binding sites are over-represented among sequences derived from the ancient TE families L2 and MIR, consistent with their preservation by purifying selection. Genes associated with TE-derived c-Myc binding sites are co-expressed with each other and with c-Myc. A number of these putative TE-derived c-Myc target genes are differentially expressed between Burkitt's lymphoma cells and normal B cells and encode proteins with cancer-related functions. Despite several lines of evidence pointing to their regulation by c-Myc and relevance to cancer, the set of genes identified as TE-derived c-Myc targets does not significantly overlap with two previously characterized c-Myc target gene sets. These data point to a substantial contribution of TEs to the regulation of human genes by c-Myc. Genes that are regulated by TE-derived c-Myc binding sites appear to form a distinct c-Myc regulatory subnetwork.
Affiliation(s)
- Jianrong Wang
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.
|
33
|
Berman BP, Frenkel B, Coetzee GA. Location, location, (ChIP-)location! Mapping chromatin landscapes one immunoprecipitation at a time. J Cell Biochem 2009; 107:1-5. [PMID: 19308935 DOI: 10.1002/jcb.22133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
A small fraction of the typical animal genome (<5% in humans) codes for the organism's collection of proteins, yet the study of protein coding sequences dominated the early years of genomics research. In the decade since the sequencing of complete eukaryotic genomes, however, genomic techniques have shed a great deal of light on the non-coding DNA making up the remainder. A single molecular technique, Chromatin Immuno-Precipitation (ChIP) location analysis, has had a profound impact and has made possible the study of an incredible range of biology. This issue of The Journal of Cellular Biochemistry aims to put into context advancements made possible by the ChIP-location revolution, while at the same time highlighting some of the most important technical aspects and challenges along with some of the work yet to come.
Affiliation(s)
- Benjamin P Berman
- USC Epigenome Center, University of Southern California, Los Angeles, California 90033, USA.
|
34
|
Pereira V, Enard D, Eyre-Walker A. The effect of transposable element insertions on gene expression evolution in rodents. PLoS One 2009; 4:e4321. [PMID: 19183808 PMCID: PMC2629548 DOI: 10.1371/journal.pone.0004321] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 10/10/2008] [Accepted: 11/24/2008] [Indexed: 01/04/2023] Open
Abstract
Background: Many genomes contain a substantial number of transposable elements (TEs), a few of which are known to be involved in regulating gene expression. However, recent observations suggest that TEs may have played a very important role in the evolution of gene expression because many conserved non-genic sequences, some of which are known to be involved in gene regulation, resemble TEs. Results: Here we investigate whether new TE insertions affect gene expression profiles by testing whether gene expression divergence between mouse and rat is correlated to the numbers of new transposable elements inserted near genes. We show that expression divergence is significantly correlated to the number of new LTR and SINE elements, but not to the numbers of LINEs. We also show that expression divergence is not significantly correlated to the numbers of ancestral TEs in most cases, which suggests that the correlations between expression divergence and the numbers of new TEs are causal in nature. We quantify the effect and estimate that TE insertion has accounted for ∼20% (95% confidence interval: 12% to 26%) of all expression profile divergence in rodents. Conclusions: We conclude that TE insertions may have had a major impact on the evolution of gene expression levels in rodents.
Affiliation(s)
- Vini Pereira
- Centre for the Study of Evolution, School of Life Sciences, University of Sussex, Brighton, United Kingdom
- David Enard
- Centre for the Study of Evolution, School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Adam Eyre-Walker
- Centre for the Study of Evolution, School of Life Sciences, University of Sussex, Brighton, United Kingdom
|
35
|
Hirakawa M, Nishihara H, Kanehisa M, Okada N. Characterization and evolutionary landscape of AmnSINE1 in Amniota genomes. Gene 2008; 441:100-10. [PMID: 19166919 DOI: 10.1016/j.gene.2008.12.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 08/21/2008] [Revised: 11/29/2008] [Accepted: 12/04/2008] [Indexed: 11/18/2022]
Abstract
Discovery of a large number of conserved non-coding elements (CNEs) in vertebrate genomes provides a cornerstone to elucidate molecular mechanisms of macroevolution. Extensive comparative genomics has proven that transposons such as short interspersed elements (SINEs) were an important source of CNEs. We recently characterized AmnSINE1, a SINE family in Amniota genomes, some of which are present in CNEs, and demonstrated that two AmnSINE1 loci play an important role in mammalian-specific brain development by functioning as enhancers (Sasaki et al. Proc. Natl. Acad. Sci. USA 2008). To get more information about AmnSINE1s, we here performed a multi-species search for AmnSINE1, and revealed the distribution and evolutionary history of these SINEs in amniote genomes. The number of AmnSINE1 regions in amniotes ranged from 160 to 1200; the number in the eutherians was under 500 and the largest was that in chicken. Phylogenetic analysis established that each AmnSINE1 locus has evolved uniquely, primarily since the divergence of mammals from reptiles. These results support the notion that AmnSINE1s were amplified as an ancient retroposon in a common ancestor of Amniota and subsequently have survived for 300 Myr because of functions acquired by mutation-coupled exaptation prior to the mammalian radiation. On the basis of sequence homology and conserved synteny, we detected the orthologs of AmnSINE1 as candidates for further enhancer analysis, which are more conserved than the two loci that were shown to have been involved in mammalian brain development. The present work provides a comprehensive data set to test the role of AmnSINE1s, many of which were exapted and contributed to mammalian macroevolution.
Affiliation(s)
- Mika Hirakawa
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan
|
36
|
Navratilova P, Fredman D, Hawkins TA, Turner K, Lenhard B, Becker TS. Systematic human/zebrafish comparative identification of cis-regulatory activity around vertebrate developmental transcription factor genes. Dev Biol 2008; 327:526-40. [PMID: 19073165 DOI: 10.1016/j.ydbio.2008.10.044] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.4] [Received: 07/31/2008] [Revised: 10/02/2008] [Accepted: 10/28/2008] [Indexed: 01/01/2023]
Abstract
Pan-vertebrate developmental cis-regulatory elements are discernible as highly conserved noncoding elements (HCNEs) and are often dispersed over large areas around the pleiotropic genes whose expression they control. On the loci of two developmental transcription factor genes, SOX3 and PAX6, we demonstrate that HCNEs conserved between human and zebrafish can be systematically and reliably tested for their regulatory function in multiple stable transgenes in zebrafish, and their genomic reach estimated with confidence using synteny conservation and HCNE density along these loci. HCNEs of both human and zebrafish function as specific developmental enhancers in zebrafish. We show that human HCNEs result in expression patterns in zebrafish equivalent to those in mouse, establishing zebrafish as a suitable model for large-scale testing of human developmental enhancers. Orthologous human and zebrafish enhancers underwent functional evolution within their sequence and often directed related but non-identical expression patterns. Despite an evolutionary distance of 450 million years, one pax6 HCNE drove expression in identical areas when comparing zebrafish vs. human HCNEs. HCNEs from the same area often drive overlapping patterns, suggesting that multiple regulatory inputs are required to achieve robust and precise complex expression patterns exhibited by developmental genes.
Affiliation(s)
- Pavla Navratilova
- Sars Centre for Marine Molecular Biology, University of Bergen, 5008 Bergen, Norway
|
37
|
Shakes LA, Malcolm TL, Allen KL, De S, Harewood KR, Chatterjee PK. Context dependent function of APPb enhancer identified using enhancer trap-containing BACs as transgenes in zebrafish. Nucleic Acids Res 2008; 36:6237-48. [PMID: 18832376 PMCID: PMC2577333 DOI: 10.1093/nar/gkn628] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022] Open
Abstract
An enhancer within intron 1 of the amyloid precursor protein gene (APPb) of zebrafish is identified functionally using a novel approach. Bacterial artificial chromosomes (BACs) were retrofitted with enhancer traps, and expressed as transgenes in zebrafish. Expression from both transient assays and stable lines was used for analysis. Although the enhancer was active in specific nonneural cells of the notochord when placed with APPb gene promoter proximal elements, its function was restricted to, and absolutely required for, specific expression in neurons when juxtaposed with additional far-upstream promoter elements of the gene. We demonstrate that expression of green fluorescent protein fluorescence resembling the tissue distribution of APPb mRNA requires both the intron 1 enhancer and approximately 28 kb of DNA upstream of the gene. The results indicate that tissue-specificity of an isolated enhancer may be quite different from that in the context of its own gene. Using this enhancer and upstream sequence, polymorphic variants of APPb can now more closely recapitulate the endogenous pattern and regulation of APPb expression in animal models for Alzheimer's disease. The methodology should help functionally map multiple noncontiguous regulatory elements in BACs with or without gene-coding sequences.
Affiliation(s)
- Leighcraft A Shakes
- Julius L. Chambers Biomedical/Biotechnology Research Institute, Department of Chemistry, North Carolina Central University, Durham, NC 27707, USA
|
38
|
Smith AM, Sanchez MJ, Follows GA, Kinston S, Donaldson IJ, Green AR, Göttgens B. A novel mode of enhancer evolution: the Tal1 stem cell enhancer recruited a MIR element to specifically boost its activity. Genome Res 2008; 18:1422-32. [PMID: 18687876 PMCID: PMC2527711 DOI: 10.1101/gr.077008.108] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Indexed: 12/25/2022]
Abstract
Altered cis-regulation is thought to underpin much of metazoan evolution, yet the underlying mechanisms remain largely obscure. The stem cell leukemia TAL1 (also known as SCL) transcription factor is essential for the normal development of blood stem cells and we have previously shown that the Tal1 +19 enhancer directs expression to hematopoietic stem cells, hematopoietic progenitors, and to endothelium. Here we demonstrate that an adjacent region 1 kb upstream (+18 element) is in an open chromatin configuration and carries active histone marks but does not function as an enhancer in transgenic mice. Instead, it boosts activity of the +19 enhancer both in stable transfection assays and during differentiation of embryonic stem (ES) cells carrying single-copy reporter constructs targeted to the Hprt locus. The +18 element contains a mammalian interspersed repeat (MIR) which is essential for the +18 function and which was transposed to the Tal1 locus approximately 160 million years ago at the time of the mammalian/marsupial branchpoint. Our data demonstrate a previously unrecognized mechanism whereby enhancer activity is modulated by a transposon exerting a "booster" function which would go undetected by conventional transgenic approaches.
Affiliation(s)
- Aileen M Smith
- University of Cambridge Department of Haematology, Cambridge Institute for Medical Research, Cambridge CB2 2XY, United Kingdom
|
39
|
Giordano J, Ge Y, Gelfand Y, Abrusán G, Benson G, Warburton PE. Evolutionary history of mammalian transposons determined by genome-wide defragmentation. PLoS Comput Biol 2008; 3:e137. [PMID: 17630829 PMCID: PMC1914374 DOI: 10.1371/journal.pcbi.0030137] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.4] [Received: 02/09/2007] [Accepted: 05/31/2007] [Indexed: 01/30/2023] Open
Abstract
The constant bombardment of mammalian genomes by transposable elements (TEs) has resulted in TEs comprising at least 45% of the human genome. Because of their great age and abundance, TEs are important in comparative phylogenomics. However, estimates of TE age were previously based on divergence from derived consensus sequences or phylogenetic analysis, which can be unreliable, especially for older more diverged elements. Therefore, a novel genome-wide analysis of TE organization and fragmentation was performed to estimate TE age independently of sequence composition and divergence or the assumption of a constant molecular clock. Analysis of TEs in the human genome revealed ∼600,000 examples where TEs have transposed into and fragmented other TEs, covering >40% of all TEs or ∼542 Mbp of genomic sequence. The relative age of these TEs over evolutionary time is implicit in their organization, because newer TEs have necessarily transposed into older TEs that were already present. A matrix of the number of times that each TE has transposed into every other TE was constructed, and a novel objective function was developed that derived the chronological order and relative ages of human TEs spanning >100 million years. This method has been used to infer the relative ages across all four major TE classes, including the oldest, most diverged elements. Analysis of DNA transposons over the history of the human genome has revealed the early activity of some MER2 transposons, and the relatively recent activity of MER1 transposons during primate lineages. The TEs from six additional mammalian genomes were defragmented and analyzed. Pairwise comparison of the independent chronological orders of TEs in these mammalian genomes revealed species phylogeny, the fact that transposons shared between genomes are older than species-specific transposons, and a subset of TEs that were potentially active during periods of speciation. 
Transposable elements (TEs) are interspersed repetitive DNA families that are capable of copying themselves from place to place; they have literally infested our genome over evolutionary time, and now comprise as much as 45% of our total DNA. Because of their great age and abundance, TEs are important in evolutionary genomics. However, estimates of their age based on DNA sequence composition have been unreliable, especially for older more diverged elements. Therefore, a novel method to estimate the age of TEs was developed based on the fact that as TEs spread throughout the genome, they inserted into and fragmented older TEs that were already present. Therefore, the age of TEs can be revealed by how often they have been fragmented over evolutionary time. We performed a genome-wide defragmentation of TEs, and developed a novel objective function to derive the chronological order of TEs spanning >100 million years. This method has been used to infer the relative ages of TEs from seven sequenced mammalian genomes across all four major TE classes, including the oldest, most diverged elements. This age estimate is independent of TE sequence composition or divergence and does not rely on the assumption of a constant molecular clock. This study provides a novel analysis of the evolutionary history of some of the most abundant and ancient repetitive DNA elements in mammalian genomes, which is important for understanding the dynamic forces that shape our genomes during evolution.
Affiliation(s)
- Joti Giordano
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, New York, United States of America
- Yongchao Ge
- Department of Neurology, Mount Sinai School of Medicine, New York, New York, United States of America
- Center for Translational Systems Biology, Mount Sinai School of Medicine, New York, New York, United States of America
- Yevgeniy Gelfand
- Laboratory for Biocomputing and Informatics, Boston University, Boston, Massachusetts, United States of America
- György Abrusán
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, New York, United States of America
- Gary Benson
- Departments of Computer Science and Biology, Boston University, Boston, Massachusetts, United States of America
- Peter E Warburton
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, New York, United States of America
|
40
|
Rose D, Hertel J, Reiche K, Stadler PF, Hackermüller J. NcDNAlign: plausible multiple alignments of non-protein-coding genomic sequences. Genomics 2008; 92:65-74. [PMID: 18511233 DOI: 10.1016/j.ygeno.2008.04.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 06/21/2007] [Revised: 04/09/2008] [Accepted: 04/09/2008] [Indexed: 10/22/2022]
Abstract
Genome-wide multiple sequence alignments (MSAs) are a necessary prerequisite for an increasingly diverse collection of comparative genomic approaches. Here we present a versatile method that generates high-quality MSAs for non-protein-coding sequences. The NcDNAlign pipeline combines pairwise BLAST alignments to create initial MSAs, which are then locally improved and trimmed. The program is optimized for speed and hence is particularly well-suited to pilot studies. We demonstrate the practical use of NcDNAlign in three case studies: the search for ncRNAs in gammaproteobacteria and the analysis of conserved noncoding DNA in nematodes and teleost fish, in the latter case focusing on the fate of duplicated ultra-conserved regions. Compared to the currently widely used genome-wide alignment program TBA, our program results in a 20- to 30-fold reduction of CPU time necessary to generate gammaproteobacterial alignments. A showcase application of bacterial ncRNA prediction based on alignments of both algorithms results in similar sensitivity, false discovery rates, and up to 100 putatively novel ncRNA structures. Similar findings hold for our application of NcDNAlign to the identification of ultra-conserved regions in nematodes and teleosts. Both approaches yield conserved sequences of unknown function, result in novel evolutionary insights into conservation patterns among these genomes, and manifest the benefits of an efficient and reliable genome-wide alignment package. The software is available under the GNU Public License at http://www.bioinf.uni-leipzig.de/Software/NcDNAlign/.
Affiliation(s)
- Dominic Rose
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany
|
41
|
Evolutionary rates and patterns for human transcription factor binding sites derived from repetitive DNA. BMC Genomics 2008; 9:226. [PMID: 18485226 PMCID: PMC2397414 DOI: 10.1186/1471-2164-9-226] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Received: 01/11/2008] [Accepted: 05/17/2008] [Indexed: 12/14/2022] Open
Abstract
Background: The majority of human non-protein-coding DNA is made up of repetitive sequences, mainly transposable elements (TEs). It is becoming increasingly apparent that many of these repetitive DNA sequence elements encode gene regulatory functions. This fact has important evolutionary implications, since repetitive DNA is the most dynamic part of the genome. We set out to assess the evolutionary rate and pattern of experimentally characterized human transcription factor binding sites (TFBS) that are derived from repetitive versus non-repetitive DNA to test whether repeat-derived TFBS are in fact rapidly evolving. We also evaluated the position-specific patterns of variation among TFBS to look for signs of functional constraint on TFBS derived from repetitive and non-repetitive DNA. Results: We found numerous experimentally characterized TFBS in the human genome, 7–10% of all mapped sites, which are derived from repetitive DNA sequences including simple sequence repeats (SSRs) and TEs. TE-derived TFBS sequences are far less conserved between species than TFBS derived from SSRs and non-repetitive DNA. Despite their rapid evolution, several lines of evidence indicate that TE-derived TFBS are functionally constrained. First of all, ancient TE families, such as MIR and L2, are enriched for TFBS relative to younger families like Alu and L1. Secondly, functionally important positions in TE-derived TFBS, specifically those residues thought to physically interact with their cognate protein binding factors (TF), are more evolutionarily conserved than adjacent TFBS positions. Finally, TE-derived TFBS show position-specific patterns of sequence variation that are highly distinct from random patterns and similar to the variation seen for non-repeat derived sequences of the same TFBS.
Conclusion: The abundance of experimentally characterized human TFBS that are derived from repetitive DNA speaks to the substantial regulatory effects that this class of sequence has on the human genome. The unique evolutionary properties of repeat-derived TFBS are perhaps even more intriguing. TE-derived TFBS in particular, while clearly functionally constrained, evolve extremely rapidly relative to non-repeat derived sites. Such rapidly evolving TFBS are likely to confer species-specific regulatory phenotypes, i.e. divergent expression patterns, on the human evolutionary lineage. This result has practical implications with respect to the widespread use of evolutionary conservation as a surrogate for functionally relevant non-coding DNA. Most TE-derived TFBS would be missed using the kinds of sequence conservation-based screens, such as phylogenetic footprinting, that are used to help characterize non-coding DNA. Thus, the very TFBS that are most likely to yield human-specific characteristics will be neglected by the comparative genomic techniques that are currently de rigueur for the identification of novel regulatory sites.
|
42
|
Abstract
The control and coordination of eukaryotic gene expression rely on transcriptional and post-transcriptional regulatory networks. Although progress has been made in mapping the components and deciphering the function of these networks, the mechanisms by which such intricate circuits originate and evolve remain poorly understood. Here I revisit and expand earlier models and propose that genomic repeats, and in particular transposable elements, have been a rich source of material for the assembly and tinkering of eukaryotic gene regulatory systems.
Affiliation(s)
- Cédric Feschotte
- Department of Biology, Life Science Building, BOX 19498, University of Texas, Arlington, Texas 76019, USA.
|
43
|
Cooper GM, Brown CD. Qualifying the relationship between sequence conservation and molecular function. Genome Res 2008; 18:201-5. [PMID: 18245453 DOI: 10.1101/gr.7205808] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.7] [Indexed: 11/25/2022]
Abstract
Quantification of evolutionary constraints via sequence conservation can be leveraged to annotate genomic functional sequences. Recent efforts addressing the converse of this relationship have identified many sites in metazoan genomes with molecular function but without detectable conservation between related species. Here, we discuss explanations and implications for these results considering both practical and theoretical issues. In particular, phylogenetic scope influences the relationship between sequence conservation and function. Comparisons of distantly related species can detect constraint with high specificity due to the loss of conserved neutral sequence, but sensitivity is sacrificed as a result of functional changes related to lineage-specific biology. The strength of natural selection operating on functional sequence is also important. Mutations to functional sequences that result in small fitness effects are subject to weaker constraints. Therefore, particularly when comparing highly divergent species, functional sequences that are degenerate or biologically redundant will be prone to turnover, wherein functional sequences are replaced by effectively equivalent, but nonorthologous counterparts. Finally, considering the size and complexity of metazoan genomes and the fact that many nonconserved sequences are associated with sequence-degenerate, low-level molecular functions, we find it likely that there exist many biochemically functional sequences that are not under constraint. This hypothesis does not lead to the conclusion that huge amounts of vertebrate genomes are functionally important, but rather that such "functionality" represents molecular noise that has weak or no effect on organismal phenotypes.
Affiliation(s)
- Gregory M Cooper
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
|
44
|
Abstract
Retroposons, such as short interspersed elements (SINEs) and long interspersed elements (LINEs), are the major constituents of higher vertebrate genomes. Although there are many examples of retroposons' acquiring function, none has been implicated in the morphological innovations specific to a certain taxonomic group. We previously characterized a SINE family, AmnSINE1, members of which constitute a part of conserved noncoding elements (CNEs) in mammalian genomes. We proposed that this family acquired genomic functionality or was exapted after retropositioning in a mammalian ancestor. Here we identified 53 new AmnSINE1 loci and refined 124 total loci, two of which were further analyzed. Using a mouse enhancer assay, we demonstrate that one SINE locus, AS071, 178 kbp from the gene FGF8 (fibroblast growth factor 8), is an enhancer that recapitulates FGF8 expression in two regions of the developing forebrain, namely the diencephalon and the hypothalamus. Our gain-of-function analysis revealed that FGF8 expression in the diencephalon controls patterning of thalamic nuclei, which act as a relay center of the neocortex, suggesting a role for FGF8 in mammalian-specific forebrain patterning. Furthermore, we demonstrated that the locus, AS021, 392 kbp from the gene SATB2, controls gene expression in the lateral telencephalon, which is thought to be a signaling center during development. These results suggest important roles for SINEs in the development of the mammalian neuronal network, a part of which was initiated with the exaptation of AmnSINE1 in a common mammalian ancestor.
|
45
|
Woolfe A, Elgar G. Organization of conserved elements near key developmental regulators in vertebrate genomes. Adv Genet 2008; 61:307-38. [PMID: 18282512 DOI: 10.1016/s0065-2660(07)00012-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 01/01/2023]
Abstract
Sequence conservation has traditionally been used as a means to target functional regions of complex genomes. In addition to its use in identifying coding regions of genes, the recent availability of whole genome data for a number of vertebrates has permitted high-resolution analyses of the noncoding "dark matter" of the genome. This has resulted in the identification of a large number of highly conserved sequence elements that appear to be preserved in all bony vertebrates. Further positional analysis of these conserved noncoding elements (CNEs) in the genome demonstrates that they cluster around genes involved in developmental regulation. This chapter describes the identification and characterization of these elements, with particular reference to their composition and organization.
Affiliation(s)
- Adam Woolfe
- School of Biological and Chemical Sciences, Queen Mary, University of London, London E1 4NS, United Kingdom
|
46
|
Jurka J, Kapitonov VV, Kohany O, Jurka MV. Repetitive sequences in complex genomes: structure and evolution. Annu Rev Genomics Hum Genet 2007; 8:241-59. [PMID: 17506661 DOI: 10.1146/annurev.genom.8.080706.092416] [Citation(s) in RCA: 238] [Impact Index Per Article: 14.0] [Indexed: 11/09/2022]
Abstract
Eukaryotic genomes contain vast amounts of repetitive DNA derived from transposable elements (TEs). Large-scale sequencing of these genomes has produced an unprecedented wealth of information about the origin, diversity, and genomic impact of what was once thought to be "junk DNA." This has also led to the identification of two new classes of DNA transposons, Helitrons and Polintons, as well as several new superfamilies and thousands of new families. TEs are evolutionary precursors of many genes, including RAG1, which plays a role in the vertebrate immune system. They are also the driving force in the evolution of epigenetic regulation and have a long-term impact on genomic stability and evolution. Remnants of TEs appear to be overrepresented in transcription regulatory modules and other regions conserved among distantly related species, which may have implications for our understanding of their impact on speciation.
Affiliation(s)
- Jerzy Jurka
- Genetic Information Research Institute, Mountain View, California 94043, USA.
|
47
|
Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc Natl Acad Sci U S A 2007; 104:18613-8. [PMID: 18003932 DOI: 10.1073/pnas.0703637104] [Citation(s) in RCA: 288] [Impact Index Per Article: 16.9] [Indexed: 02/05/2023] Open
Abstract
The evolutionary forces that establish and hone target gene networks of transcription factors are largely unknown. Transposition of retroelements may play a role, but its global importance, beyond a few well described examples for isolated genes, is not clear. We report that LTR class I endogenous retrovirus (ERV) retroelements impact considerably the transcriptional network of human tumor suppressor protein p53. A total of 1,509 of approximately 319,000 human ERV LTR regions have a near-perfect p53 DNA binding site. The LTR10 and MER61 families are particularly enriched for copies with a p53 site. These ERV families are primate-specific and transposed actively near the time when the New World and Old World monkey lineages split. Other mammalian species lack these p53 response elements. Analysis of published genomewide ChIP data for p53 indicates that more than one-third of identified p53 binding sites are accounted for by ERV copies with a p53 site. ChIP and expression studies for individual genes indicate that human ERV p53 sites are likely part of the p53 transcriptional program and direct regulation of p53 target genes. These results demonstrate how retroelements can significantly shape the regulatory network of a transcription factor in a species-specific manner.
|
48
|
Santangelo AM, de Souza FSJ, Franchini LF, Bumaschny VF, Low MJ, Rubinstein M. Ancient exaptation of a CORE-SINE retroposon into a highly conserved mammalian neuronal enhancer of the proopiomelanocortin gene. PLoS Genet 2007; 3:1813-26. [PMID: 17922573 PMCID: PMC2000970 DOI: 10.1371/journal.pgen.0030166] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.1] [Received: 05/29/2007] [Accepted: 08/15/2007] [Indexed: 02/01/2023] Open
Abstract
The proopiomelanocortin gene (POMC) is expressed in the pituitary gland and the ventral hypothalamus of all jawed vertebrates, producing several bioactive peptides that function as peripheral hormones or central neuropeptides, respectively. We have recently determined that mouse and human POMC expression in the hypothalamus is conferred by the action of two 5′ distal and unrelated enhancers, nPE1 and nPE2. To investigate the evolutionary origin of the neuronal enhancer nPE2, we searched available vertebrate genome databases and determined that nPE2 is a highly conserved element in placentals, marsupials, and monotremes, whereas it is absent in nonmammalian vertebrates. Following an in silico paleogenomic strategy based on genome-wide searches for paralog sequences, we discovered that opossum and wallaby nPE2 sequences are highly similar to members of the superfamily of CORE-short interspersed nucleotide element (SINE) retroposons, in particular to MAR1 retroposons that are widely present in marsupial genomes. Thus, the neuronal enhancer nPE2 originated from the exaptation of a CORE-SINE retroposon in the lineage leading to mammals and remained under purifying selection in all mammalian orders for the last 170 million years. Expression studies performed in transgenic mice showed that two nonadjacent nPE2 subregions are essential to drive reporter gene expression into POMC hypothalamic neurons, providing the first functional example of an exapted enhancer derived from an ancient CORE-SINE retroposon. In addition, we found that this CORE-SINE family of retroposons is likely to still be active in American and Australian marsupial genomes and that several highly conserved exonic, intronic and intergenic sequences in the human genome originated from the exaptation of CORE-SINE retroposons. Together, our results provide clear evidence of the functional novelties that transposed elements contributed to their host genomes throughout evolution. 
One of the most striking observations derived from the genomic era is the overwhelming contribution of transposed elements to mammalian genomes. For example, 45% of the human genome is derived from mobile element fragments. Although historically viewed as “junk DNA,” transposed elements could also contribute to novel advantageous functional elements in their host genomes, a process called exaptation. Functionally proven examples of exaptation derived from ancient retroposition events are rare. Using an in silico paleogenomic strategy, we unraveled the evolutionary origin of nPE2, a neuronal enhancer of the proopiomelanocortin gene that participates in the production of hypothalamic peptides involved in feeding behavior and stress-induced analgesia. We demonstrate that nPE2 originated from the exaptation of a SINE retroposon in the lineage leading to mammals and remained under purifying selection for the last 170 million years. The difficulty in detecting nPE2 origin as an exapted retroposon illustrates the underestimation of this phenomenon and encourages the finding of the many thousands of retroposon-derived functional elements still hidden within the genomes. Their discovery will contribute to a better understanding of the dynamics of gene evolution and, at a larger scale, the origin of macroevolutionary novelties that lead to the appearance of new species, orders, or classes.
Affiliation(s)
- Andrea M Santangelo
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Flávio S. J de Souza
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Lucía F Franchini
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Viviana F Bumaschny
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Malcolm J Low
- Center for the Study of Weight Regulation and Associated Disorders, Portland, Oregon, United States of America
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, Oregon, United States of America
- Marcelo Rubinstein
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Center for the Study of Weight Regulation and Associated Disorders, Portland, Oregon, United States of America
- Departamento de Fisiología, Biología Molecular y Celular, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
- Centro de Estudios Científicos, Valdivia, Chile
- * To whom correspondence should be addressed. E-mail:
|
49
|
Abstract
It is usually thought that the development of complex organisms is controlled by protein regulatory factors and morphogenetic signals exchanged between cells and differentiating tissues during ontogeny. However, it is now evident that the majority of all animal genomes is transcribed, apparently in a developmentally regulated manner, suggesting that these genomes largely encode RNA machines and that there may be a vast hidden layer of RNA regulatory transactions in the background. I propose that the epigenetic trajectories of differentiation and development are primarily programmed by feed-forward RNA regulatory networks and that most of the information required for multicellular development is embedded in these networks, with cell–cell signalling required to provide important positional information and to correct stochastic errors in the endogenous RNA-directed program.
Affiliation(s)
- John S Mattick
- ARC Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia QLD 4072, Australia.
|
50
|
Mikkelsen TS, Wakefield MJ, Aken B, Amemiya CT, Chang JL, Duke S, Garber M, Gentles AJ, Goodstadt L, Heger A, Jurka J, Kamal M, Mauceli E, Searle SMJ, Sharpe T, Baker ML, Batzer MA, Benos PV, Belov K, Clamp M, Cook A, Cuff J, Das R, Davidow L, Deakin JE, Fazzari MJ, Glass JL, Grabherr M, Greally JM, Gu W, Hore TA, Huttley GA, Kleber M, Jirtle RL, Koina E, Lee JT, Mahony S, Marra MA, Miller RD, Nicholls RD, Oda M, Papenfuss AT, Parra ZE, Pollock DD, Ray DA, Schein JE, Speed TP, Thompson K, VandeBerg JL, Wade CM, Walker JA, Waters PD, Webber C, Weidman JR, Xie X, Zody MC, Graves JAM, Ponting CP, Breen M, Samollow PB, Lander ES, Lindblad-Toh K. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 2007; 447:167-77. [PMID: 17495919 DOI: 10.1038/nature05805] [Citation(s) in RCA: 508] [Impact Index Per Article: 29.9] [Received: 12/05/2006] [Accepted: 04/03/2007] [Indexed: 12/15/2022]
Abstract
We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.
Affiliation(s)
- Tarjei S Mikkelsen
- Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA.
|