1
Frith MC. Paleozoic Protein Fossils Illuminate the Evolution of Vertebrate Genomes and Transposable Elements. Mol Biol Evol 2022; 39:6555113. [PMID: 35348724 PMCID: PMC9004415 DOI: 10.1093/molbev/msac068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Genomes hold a treasure trove of protein fossils: fragments of formerly protein-coding DNA, which mainly come from transposable elements (TEs) or host genes. These fossils reveal ancient evolution of TEs and genomes, and many fossils have been exapted to perform diverse functions important for the host's fitness. However, old and highly-degraded fossils are hard to identify, standard methods (e.g. BLAST) are not optimized for this task, and few Paleozoic protein fossils have been found. Here, a recently optimized method is used to find protein fossils in vertebrate genomes. It finds Paleozoic fossils predating the amphibian/amniote divergence from most major TE categories, including virus-related Polinton and Gypsy elements. It finds 10 fossils in the human genome (8 from TEs and 2 from host genes) that predate the last common ancestor of all jawed vertebrates, probably from the Ordovician period. It also finds types of transposon and retrotransposon not found in human before. These fossils have extreme sequence conservation, indicating exaptation: some have evidence of gene-regulatory function, and they tend to lie nearest to developmental genes. Some ancient fossils suggest "genome tectonics", where two fragments of one TE have drifted apart by up to megabases, possibly explaining gene deserts and large introns. This paints a picture of great TE diversity in our aquatic ancestors, with patchy TE inheritance by later vertebrates, producing new genes and regulatory elements on the way. Host-gene fossils too have contributed anciently-conserved DNA segments. This paves the way to further studies of ancient protein fossils.
Affiliation(s)
- Martin C Frith
- Artificial Intelligence Research Center, AIST, Tokyo, Japan.,Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan.,Computational Bio Big-Data Open Innovation Laboratory, AIST, Tokyo, Japan
2
Berrio A, Haygood R, Wray GA. Identifying branch-specific positive selection throughout the regulatory genome using an appropriate proxy neutral. BMC Genomics 2020; 21:359. [PMID: 32404186 PMCID: PMC7222330 DOI: 10.1186/s12864-020-6752-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/22/2019] [Accepted: 04/21/2020] [Indexed: 01/09/2023]
Abstract
BACKGROUND Adaptive changes in cis-regulatory elements are an essential component of evolution by natural selection. Identifying adaptive and functional noncoding DNA elements throughout the genome is therefore crucial for understanding the relationship between phenotype and genotype. RESULTS We used ENCODE annotations to identify appropriate proxy neutral sequences and demonstrate that the conservativeness of the test can be modulated during the filtration of reference alignments. We applied the method to noncoding Human Accelerated Elements as well as open chromatin elements previously identified in 125 human tissues and cell lines to demonstrate its utility. Then, we evaluated the impact of query region length, proxy neutral sequence length, and branch count on test sensitivity and specificity. We found that the length of the query alignment can vary between 150 bp and 1 kb without affecting the estimation of selection, while for the reference alignment, we found that a length of 3 kb is adequate for proper testing. We also simulated sequence alignments under different classes of evolution and validated our ability to distinguish positive selection from relaxation of constraint and neutral evolution. Finally, we re-confirmed that a quarter of all non-coding Human Accelerated Elements are evolving by positive selection. CONCLUSION Here, we introduce a method we called adaptiPhy, which adds significant improvements to our earlier method that tests for branch-specific directional selection in noncoding sequences. The motivation for these improvements is to provide a more sensitive and better targeted characterization of directional selection and neutral evolution across the genome.
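The branch-specific test described above asks whether a query region evolved faster on one lineage than a proxy-neutral reference would predict. The abstract does not spell out the statistic, so the following is only a generic likelihood-ratio sketch. It assumes the two log-likelihoods (null: the query's foreground-branch rate is tied to the neutral reference; alternative: that rate is estimated freely) have already been obtained from a phylogenetic fitting tool, and it assumes a chi-square null distribution with one degree of freedom. The helper name `lrt_one_df` and the example values are hypothetical and are not adaptiPhy's interface.

```python
import math


def lrt_one_df(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Generic likelihood-ratio test with one extra free parameter.

    Hypothetical helper: lnl_null is the log-likelihood with the query's
    foreground-branch rate constrained to the proxy-neutral rate, and
    lnl_alt is the log-likelihood with that rate estimated freely.
    Returns (LR statistic, p-value) under a chi-square(1) approximation.
    """
    # LR = 2 * (lnL_alt - lnL_null); clamp small negatives from optimizer noise.
    lr = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Chi-square(1) survival function: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# Hypothetical log-likelihoods, for illustration only.
lr, p = lrt_one_df(lnl_null=-1000.0, lnl_alt=-996.0)
print(f"LR = {lr:.1f}, p = {p:.4f}")
```

With these made-up values the statistic is 8.0. If the alternative only lets the rate increase (a one-sided, boundary-constrained test), the plain chi-square(1) p-value is conservative; a 50:50 mixture of 0 and chi-square(1) is the usual refinement and halves it.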
Affiliation(s)
- Alejandro Berrio
- Department of Biology, Duke University, Biological Sciences Building, 124 Science Drive, Durham, NC, 27708, USA.
- Ralph Haygood
- Ronin Institute for Independent Scholarship, 127 Haddon Pl., Montclair, NJ, 07043, USA
- Gregory A Wray
- Department of Biology, Duke University, Biological Sciences Building, 124 Science Drive, Durham, NC, 27708, USA
3
Zeng Y, Cao Y, Halevy RS, Nguyen P, Liu D, Zhang X, Ahituv N, Han JDJ. Characterization of functional transposable element enhancers in acute myeloid leukemia. Sci China Life Sci 2020; 63:675-687. [PMID: 32170627 DOI: 10.1007/s11427-019-1574-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/10/2019] [Accepted: 10/24/2019] [Indexed: 12/15/2022]
Abstract
Transposable elements (TEs) have been shown to have important gene regulatory functions and their alteration could lead to disease phenotypes. Acute myeloid leukemia (AML) develops as a consequence of a series of genetic changes in hematopoietic precursor cells, including mutations in epigenetic factors. Here, we set out to study the gene regulatory role of TEs in AML. We first explored the epigenetic landscape of TEs in AML patients using ATAC-seq data. We show that a large number of TEs in general, and more specifically mammalian-wide interspersed repeats (MIRs), are more enriched in AML cells than in normal blood cells. We obtained a similar finding when analyzing histone modification data in AML patients. Gene Ontology enrichment analysis showed that genes near MIRs in open chromatin regions are involved in leukemogenesis. To functionally validate their regulatory role, we selected 19 MIR regions in AML cells, and tested them for enhancer activity in an AML cell line (Kasumi-1) and a chronic myeloid leukemia (CML) cell line (K562); the results revealed several MIRs to be functional enhancers. Taken together, our results suggest that TEs are potentially involved in myeloid leukemogenesis and highlight these sequences as potential candidates harboring AML-associated variation.
Affiliation(s)
- Yingying Zeng
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences Center for Excellence in Molecular Cell Science, Collaborative Innovation Center for Genetics and Developmental Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Yaqiang Cao
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences Center for Excellence in Molecular Cell Science, Collaborative Innovation Center for Genetics and Developmental Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Rivka Sukenik Halevy
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, 94158, USA.,Institute for Human Genetics, University of California San Francisco, San Francisco, 94143, USA.,Sackler School of Medicine, Tel-Aviv University, Tel Aviv, 6997801, Israel
- Picard Nguyen
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, 94158, USA.,Institute for Human Genetics, University of California San Francisco, San Francisco, 94143, USA
- Denghui Liu
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences Center for Excellence in Molecular Cell Science, Collaborative Innovation Center for Genetics and Developmental Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Xiaoli Zhang
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences Center for Excellence in Molecular Cell Science, Collaborative Innovation Center for Genetics and Developmental Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, 94158, USA.,Institute for Human Genetics, University of California San Francisco, San Francisco, 94143, USA.
- Jing-Dong J Han
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences Center for Excellence in Molecular Cell Science, Collaborative Innovation Center for Genetics and Developmental Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.,Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology, Peking University, Beijing, 100871, China.
4
5
Buckley RM, Kortschak RD, Adelson DL. Divergent genome evolution caused by regional variation in DNA gain and loss between human and mouse. PLoS Comput Biol 2018; 14:e1006091. [PMID: 29677183 PMCID: PMC5931693 DOI: 10.1371/journal.pcbi.1006091] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/06/2017] [Revised: 05/02/2018] [Accepted: 03/15/2018] [Indexed: 12/31/2022]
Abstract
The forces driving the accumulation and removal of non-coding DNA and ultimately the evolution of genome size in complex organisms are intimately linked to genome structure and organisation. To further understand this connection we used comparative genomics to identify genome-wide individual DNA gain and loss events in the human and mouse genomes. Focusing on the distribution of DNA gains and losses, relationships to important structural features and potential impact on biological processes, we found that in autosomes, DNA gains and losses both followed separate lineage-specific accumulation patterns. However, in both species chromosome X was particularly enriched for DNA gain, consistent with its high L1 retrotransposon content required for X inactivation. We found that DNA loss was associated with gene-rich open chromatin regions and DNA gain events with gene-poor closed chromatin regions. Additionally, we found that DNA loss events tended to be smaller than DNA gain events, suggesting that they were able to accumulate in gene-rich open chromatin regions due to their reduced capacity to interrupt gene regulatory architecture. GO term enrichment showed that mouse loss hotspots were strongly enriched for terms related to developmental processes. However, these genes were also located in regions with a high density of conserved elements, suggesting that despite high levels of DNA loss, gene regulatory architecture remained conserved. This is consistent with a model in which DNA gain and loss results in turnover or "churning" in regulatory element dense regions of open chromatin, where interruption of regulatory elements is selected against. Our analysis provides a novel method for capturing the regional variation of lineage-specific DNA gain and loss events in their respective genomic contexts.
Affiliation(s)
- Reuben M. Buckley
- Department of Genetics and Evolution, The University of Adelaide, North Tce, Adelaide, Australia
- R. Daniel Kortschak
- Department of Genetics and Evolution, The University of Adelaide, North Tce, Adelaide, Australia
- David L. Adelson
- Department of Genetics and Evolution, The University of Adelaide, North Tce, Adelaide, Australia
6
Venuto D, Bourque G. Identifying co-opted transposable elements using comparative epigenomics. Dev Growth Differ 2018; 60:53-62. [PMID: 29363107 DOI: 10.1111/dgd.12423] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/15/2017] [Accepted: 12/08/2017] [Indexed: 12/19/2022]
Abstract
The human genome gives rise to different epigenomic landscapes that define each cell type and can be deregulated in disease. Recent efforts by ENCODE, the NIH Roadmap and the International Human Epigenome Consortium (IHEC) have made significant advances towards assembling reference epigenomic maps of various tissues. Notably, these projects have found that approximately 80% of human DNA was biochemically active in at least one epigenomic assay, while only approximately 10% of the sequence displayed signs of purifying selection. Given that transposable elements (TEs) make up at least 50% of the human genome and can be actively transcribed or act as regulatory elements, either for their own purposes or after being co-opted for the benefit of their host, we are interested in exploring their overall contribution to the "functional" genome. Traditional methods used to identify functional DNA have relied on comparative genomics, conservation analysis and low-throughput validation assays. To discover co-opted TEs and distinguish them from noisy genomic elements, we argue that comparative epigenomic methods will also be important.
Affiliation(s)
- David Venuto
- Department of Human Genetics, McGill University, Montréal, H3A 1B1, Québec, Canada
- Guillaume Bourque
- Department of Human Genetics, McGill University, Montréal, H3A 1B1, Québec, Canada.,Canadian Center for Computational Genomics, Montréal, H3A 0G1, Québec, Canada.,McGill University and Génome Québec Innovation Center, Montréal, H3A 0G1, Québec, Canada
7
Polychronopoulos D, King JWD, Nash AJ, Tan G, Lenhard B. Conserved non-coding elements: developmental gene regulation meets genome organization. Nucleic Acids Res 2018; 45:12611-12624. [PMID: 29121339 PMCID: PMC5728398 DOI: 10.1093/nar/gkx1074] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 07/26/2017] [Accepted: 10/24/2017] [Indexed: 12/20/2022]
Abstract
Comparative genomics has revealed a class of non-protein-coding genomic sequences that display an extraordinary degree of conservation between two or more organisms, regularly exceeding that found within protein-coding exons. These elements, collectively referred to as conserved non-coding elements (CNEs), are non-randomly distributed across chromosomes and tend to cluster in the vicinity of genes with regulatory roles in multicellular development and differentiation. CNEs are organized into functional ensembles called genomic regulatory blocks–dense clusters of elements that collectively coordinate the expression of shared target genes, and whose span in many cases coincides with topologically associated domains. CNEs display sequence properties that set them apart from other sequences under constraint, and have recently been proposed as useful markers for the reconstruction of the evolutionary history of organisms. Disruption of several of these elements is known to contribute to diseases linked with development, and cancer. The emergence, evolutionary dynamics and functions of CNEs still remain poorly understood, and new approaches are required to enable comprehensive CNE identification and characterization. Here, we review current knowledge and identify challenges that need to be tackled to resolve the impasse in understanding extreme non-coding conservation.
Affiliation(s)
- Dimitris Polychronopoulos
- Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK.,Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
- James W D King
- Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK.,Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
- Alexander J Nash
- Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK.,Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
- Ge Tan
- Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK.,Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
- Boris Lenhard
- Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK.,Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK.,Sars International Centre for Marine Molecular Biology, University of Bergen, Thormøhlensgate 55, N-5008 Bergen, Norway
8
Harmston N, Ing-Simmons E, Tan G, Perry M, Merkenschlager M, Lenhard B. Topologically associating domains are ancient features that coincide with Metazoan clusters of extreme noncoding conservation. Nat Commun 2017; 8:441. [PMID: 28874668 PMCID: PMC5585340 DOI: 10.1038/s41467-017-00524-5] [Citation(s) in RCA: 108] [Impact Index Per Article: 15.4] [Received: 10/23/2016] [Accepted: 07/05/2017] [Indexed: 02/08/2023]
Abstract
Developmental genes in metazoan genomes are surrounded by dense clusters of conserved noncoding elements (CNEs). CNEs exhibit unexplained extreme levels of sequence conservation, with many acting as developmental long-range enhancers. Clusters of CNEs define the span of regulatory inputs for many important developmental regulators and have been described previously as genomic regulatory blocks (GRBs). Their function and distribution around important regulatory genes raises the question of how they relate to 3D conformation of these loci. Here, we show that clusters of CNEs strongly coincide with topological organisation, predicting the boundaries of hundreds of topologically associating domains (TADs) in human and Drosophila. The set of TADs that are associated with high levels of noncoding conservation exhibit distinct properties compared to TADs devoid of extreme noncoding conservation. The close correspondence between extreme noncoding conservation and TADs suggests that these TADs are ancient, revealing a regulatory architecture conserved over hundreds of millions of years. Metazoan genomes contain many clusters of conserved noncoding elements. Here, the authors provide evidence that these clusters coincide with distinct topologically associating domains in humans and Drosophila, revealing a conserved regulatory genomic architecture.
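The notion of grouping CNEs into dense clusters (genomic regulatory blocks) can be illustrated with simple gap-based merging of CNE positions along one chromosome. This is a sketch under stated assumptions and not the procedure used in the paper: the function name `cluster_cnes`, the `max_gap` and `min_count` thresholds, and the example positions are all hypothetical.

```python
def cluster_cnes(positions: list[int], max_gap: int,
                 min_count: int) -> list[tuple[int, int, int]]:
    """Merge CNE positions on one chromosome into dense clusters.

    Hypothetical illustration: consecutive CNEs no more than max_gap bp apart
    join the same cluster; clusters with fewer than min_count CNEs are dropped.
    Returns (first_position, last_position, cne_count) for each cluster.
    """
    clusters: list[tuple[int, int, int]] = []
    if not positions:
        return clusters
    pos = sorted(positions)
    start = prev = pos[0]
    count = 1
    for x in pos[1:]:
        if x - prev <= max_gap:
            count += 1              # still inside the current cluster
        else:
            if count >= min_count:  # close the current cluster if dense enough
                clusters.append((start, prev, count))
            start, count = x, 1     # open a new cluster at x
        prev = x
    if count >= min_count:          # flush the final cluster
        clusters.append((start, prev, count))
    return clusters


# Hypothetical CNE positions (bp) and thresholds, for illustration only.
print(cluster_cnes([100, 500, 900, 50_000, 51_000, 200_000],
                   max_gap=1_000, min_count=2))
```

Here the isolated CNE at 200,000 bp is dropped because it does not reach `min_count`. Comparing the spans of such clusters with TAD boundaries would be a separate step not shown here.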
Affiliation(s)
- Nathan Harmston
- Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London, W12 0NN, UK.,Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, W12 0NN, UK.,Program in Cardiovascular and Metabolic Disease, Duke-NUS Graduate Medical School, 8 College Road, Singapore, 169857, Singapore.
- Elizabeth Ing-Simmons
- Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London, W12 0NN, UK.,Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, W12 0NN, UK.,Lymphocyte Development, MRC London Institute of Medical Sciences, London, W12 0NN, UK
- Ge Tan
- Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London, W12 0NN, UK.,Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, W12 0NN, UK
- Malcolm Perry
- Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London, W12 0NN, UK.,Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, W12 0NN, UK
- Matthias Merkenschlager
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, W12 0NN, UK.,Lymphocyte Development, MRC London Institute of Medical Sciences, London, W12 0NN, UK
- Boris Lenhard
- Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London, W12 0NN, UK.,Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, W12 0NN, UK.,Sars International Centre for Marine Molecular Biology, University of Bergen, N-5008, Bergen, Norway.
9
Rayan NA, Del Rosario RCH, Prabhakar S. Massive contribution of transposable elements to mammalian regulatory sequences. Semin Cell Dev Biol 2016; 57:51-56. [PMID: 27174439 DOI: 10.1016/j.semcdb.2016.05.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/20/2016] [Revised: 05/06/2016] [Accepted: 05/06/2016] [Indexed: 12/17/2022]
Abstract
Barbara McClintock discovered the existence of transposable elements (TEs) in the late 1940s and initially proposed that they contributed to the gene regulatory program of higher organisms. This controversial idea gained acceptance only much later in the 1990s, when the first examples of TE-derived promoter sequences were uncovered. It is now known that half of the human genome is recognizably derived from TEs. It is thus important to understand the scope and nature of their contribution to gene regulation. Here, we provide a timeline of major discoveries in this area and discuss how transposons have revolutionized our understanding of mammalian genomes, with a special emphasis on the massive contribution of TEs to primate evolution. Our analysis of primate-specific functional elements supports a simple model for the rate at which new functional elements arise in unique and TE-derived DNA. Finally, we discuss some of the challenges and unresolved questions in the field, which need to be addressed in order to fully characterize the impact of TEs on gene regulation, evolution and disease processes.
Affiliation(s)
- Nirmala Arul Rayan
- Computational and Systems Biology, Genome Institute of Singapore, #02-01 Genome, 60 Biopolis Street, 138672, Singapore
- Ricardo C H Del Rosario
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, 75 Ames St., Cambridge, MA 02142, USA
- Shyam Prabhakar
- Computational and Systems Biology, Genome Institute of Singapore, #02-01 Genome, 60 Biopolis Street, 138672, Singapore.
10
Chandrashekar DS, Dey P, Acharya KK. GREAM: A Web Server to Short-List Potentially Important Genomic Repeat Elements Based on Over-/Under-Representation in Specific Chromosomal Locations, Such as the Gene Neighborhoods, within or across 17 Mammalian Species. PLoS One 2015. [PMID: 26208093 PMCID: PMC4514817 DOI: 10.1371/journal.pone.0133647] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/26/2022]
Abstract
Background Genome-wide repeat sequences, such as LINEs, SINEs and LTRs, make up a considerable part of mammalian nuclear genomes. These repeat elements seem to be important for multiple functions, including the regulation of transcription initiation, alternative splicing and DNA methylation. Because it is not feasible to study all repeats, it helps to short-list candidates before exploring their potential functional significance via experimental studies and/or detailed in silico analyses. Result We developed the ‘Genomic Repeat Element Analyzer for Mammals’ (GREAM) for analysis, screening and selection of potentially important mammalian genomic repeats. This web server offers many novel utilities. For example, this is the only tool that can reveal a categorized list of specific types of transposons, retrotransposons and other genome-wide repetitive elements that are statistically over-/under-represented in regions around a set of genes, such as those expressed differentially in a disease condition. The output displays the position and frequency of identified elements within the specified regions. In addition, GREAM offers two other types of analyses of genomic repeat sequences: a) enrichment within chromosomal region(s) of interest, and b) comparative distribution across the neighborhood of orthologous genes. GREAM successfully short-listed a repeat element (MER20) known to contain functional motifs. In other case studies, we used GREAM to short-list repetitive elements in the azoospermia factor a (AZFa) region of the human Y chromosome and those around the genes associated with rat liver injury. GREAM could also identify five over-represented repeats around some of the human and mouse transcription factor coding genes that had conserved expression patterns across the two species.
Conclusion GREAM has been developed to provide an impetus to research on the role of repetitive sequences in mammalian genomes by offering easy selection of more interesting repeats in various contexts/regions. GREAM is freely available at http://resource.ibab.ac.in/GREAM/.
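At its core, testing whether a repeat family is over- or under-represented around a gene set compares an observed count with a background expectation. The abstract does not state which statistic GREAM uses, so this sketch assumes a one-sided exact binomial test in each direction, with the family's genome-wide frequency as the background. The function names and all counts below are hypothetical.

```python
from math import comb


def binom_pmf(i: int, n: int, p: float) -> float:
    """Probability of exactly i successes in n Binomial(n, p) trials."""
    return comb(n, i) * p**i * (1.0 - p) ** (n - i)


def repeat_enrichment(k_region: int, n_region: int,
                      k_genome: int, n_genome: int) -> dict:
    """Compare a repeat family's count in target regions with a genome-wide background.

    k_region: copies of the family inside the target regions (e.g. gene neighborhoods)
    n_region: all repeat copies inside the target regions
    k_genome, n_genome: the same two counts for the whole genome (the background)
    """
    p = k_genome / n_genome      # background frequency of the family
    expected = n_region * p      # expected copies in the target regions
    # One-sided exact tails: over-representation P(X >= k), under-representation P(X <= k).
    p_over = sum(binom_pmf(i, n_region, p) for i in range(k_region, n_region + 1))
    p_under = sum(binom_pmf(i, n_region, p) for i in range(0, k_region + 1))
    return {
        "observed": k_region,
        "expected": expected,
        "fold": k_region / expected if expected > 0 else float("inf"),
        "p_over": p_over,
        "p_under": p_under,
    }


# Hypothetical counts: 30 of 200 repeats near a gene set belong to the family,
# versus 1,000 of 20,000 repeats genome-wide.
print(repeat_enrichment(30, 200, 1_000, 20_000))
```

The direct summation is fine for a few hundred repeats. For genome-scale counts it can overflow, so a log-space computation or a library routine such as SciPy's `binomtest` would be the safer choice. Testing many repeat families at once would also call for a multiple-testing correction.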
Affiliation(s)
- Darshan Shimoga Chandrashekar
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Biotech Park, Electronic City, Bengaluru (Bangalore), 560100, Karnataka state, India
- Manipal University, Manipal, 576104, Karnataka state, India
- Poulami Dey
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Biotech Park, Electronic City, Bengaluru (Bangalore), 560100, Karnataka state, India
- Manipal University, Manipal, 576104, Karnataka state, India
- Kshitish K. Acharya
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Biotech Park, Electronic City, Bengaluru (Bangalore), 560100, Karnataka state, India
- Shodhaka Life Sciences Pvt. Ltd., IBAB, Biotech Park, Bengaluru (Bangalore), 560100, Karnataka state, India
11
Lynch VJ, Nnamani MC, Kapusta A, Brayer K, Plaza SL, Mazur EC, Emera D, Sheikh SZ, Grützner F, Bauersachs S, Graf A, Young SL, Lieb JD, DeMayo FJ, Feschotte C, Wagner GP. Ancient transposable elements transformed the uterine regulatory landscape and transcriptome during the evolution of mammalian pregnancy. Cell Rep 2015; 10:551-61. [PMID: 25640180 PMCID: PMC4447085 DOI: 10.1016/j.celrep.2014.12.052] [Citation(s) in RCA: 181] [Impact Index Per Article: 20.1] [Received: 02/21/2014] [Revised: 11/14/2014] [Accepted: 12/22/2014] [Indexed: 11/24/2022]
Abstract
A major challenge in biology is determining how evolutionarily novel characters originate; however, mechanistic explanations for the origin of new characters are almost completely unknown. The evolution of pregnancy is an excellent system in which to study the origin of novelties because mammals preserve stages in the transition from egg laying to live birth. To determine the molecular bases of this transition, we characterized the pregnant/gravid uterine transcriptome from tetrapods to trace the evolutionary history of uterine gene expression. We show that thousands of genes evolved endometrial expression during the origins of mammalian pregnancy, including genes that mediate maternal-fetal communication and immunotolerance. Furthermore, thousands of cis-regulatory elements that mediate decidualization and cell-type identity in decidualized stromal cells are derived from ancient mammalian transposable elements (TEs). Our results indicate that one of the defining mammalian novelties evolved from DNA sequences derived from ancient mammalian TEs coopted into hormone-responsive regulatory elements distributed throughout the genome.
Affiliation(s)
- Vincent J Lynch
- Department of Human Genetics, The University of Chicago, 920 East 58th Street, CLSC 319C, Chicago, IL 60637, USA.
- Mauris C Nnamani
- Yale Systems Biology Institute and Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511, USA
- Aurélie Kapusta
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
- Kathryn Brayer
- Yale Systems Biology Institute and Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511, USA
- Silvia L Plaza
- Yale Systems Biology Institute and Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511, USA
- Erik C Mazur
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Deena Emera
- Yale Systems Biology Institute and Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511, USA
- Shehzad Z Sheikh
- Division of Gastroenterology and Hepatology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Frank Grützner
- The Robinson Institute, School of Molecular and Biomedical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
- Stefan Bauersachs
- Laboratory for Functional Genome Analysis (LAFUGA), Gene Center, LMU Munich, Feodor Lynen Strasse 25, 81377 Munich, Germany
- Alexander Graf
- Laboratory for Functional Genome Analysis (LAFUGA), Gene Center, LMU Munich, Feodor Lynen Strasse 25, 81377 Munich, Germany
- Steven L Young
- Department of Obstetrics and Gynecology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27705, USA
- Jason D Lieb
- Department of Human Genetics, The University of Chicago, 920 East 58th Street, CLSC 319C, Chicago, IL 60637, USA
- Francesco J DeMayo
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA; Department of Obstetrics and Gynecology, Baylor College of Medicine, Houston, TX 77030, USA
- Cédric Feschotte
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
- Günter P Wagner
- Yale Systems Biology Institute and Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511, USA
12
del Rosario RCH, Rayan NA, Prabhakar S. Noncoding origins of anthropoid traits and a new null model of transposon functionalization. Genome Res 2014; 24:1469-84. [PMID: 25043600 PMCID: PMC4158753 DOI: 10.1101/gr.168963.113] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/24/2022]
Abstract
Little is known about novel genetic elements that drove the emergence of anthropoid primates. We exploited the sequencing of the marmoset genome to identify 23,849 anthropoid-specific constrained (ASC) regions and confirmed their robust functional signatures. Of the ASC base pairs, 99.7% were noncoding, suggesting that novel anthropoid functional elements were overwhelmingly cis-regulatory. ASCs were highly enriched in loci associated with fetal brain development, motor coordination, neurotransmission, and vision, thus providing a large set of candidate elements for exploring the molecular basis of hallmark primate traits. We validated ASC192 as a primate-specific enhancer in proliferative zones of the developing brain. Unexpectedly, transposable elements (TEs) contributed to >56% of ASCs, and almost all TE families showed functional potential similar to that of nonrepetitive DNA. Three L1PA repeat-derived ASCs displayed coherent eye-enhancer function, thus demonstrating that the "gene-battery" model of TE functionalization applies to enhancers in vivo. Our study provides fundamental insights into genome evolution and the origins of anthropoid phenotypes and supports an elegantly simple new null model of TE exaptation.
Affiliation(s)
- Ricardo C H del Rosario
- Computational and Systems Biology, Genome Institute of Singapore, #02-01 Genome, Singapore 138672
- Nirmala Arul Rayan
- Computational and Systems Biology, Genome Institute of Singapore, #02-01 Genome, Singapore 138672
- Shyam Prabhakar
- Computational and Systems Biology, Genome Institute of Singapore, #02-01 Genome, Singapore 138672
13
Wilkins AS, Wrangham RW, Fitch WT. The "domestication syndrome" in mammals: a unified explanation based on neural crest cell behavior and genetics. Genetics 2014; 197:795-808. [PMID: 25024034 PMCID: PMC4096361 DOI: 10.1534/genetics.114.165423]
Abstract
Charles Darwin, while trying to devise a general theory of heredity from the observations of animal and plant breeders, discovered that domesticated mammals possess a distinctive and unusual suite of heritable traits not seen in their wild progenitors. Some of these traits also appear in domesticated birds and fish. The origin of Darwin's "domestication syndrome" has remained a conundrum for more than 140 years. Most explanations focus on particular traits, while neglecting others, or on the possible selective factors involved in domestication rather than the underlying developmental and genetic causes of these traits. Here, we propose that the domestication syndrome results predominantly from mild neural crest cell deficits during embryonic development. Most of the modified traits, both morphological and physiological, can be readily explained as direct consequences of such deficiencies, while other traits are explicable as indirect consequences. We first show how the hypothesis can account for the multiple, apparently unrelated traits of the syndrome and then explore its genetic dimensions and predictions, reviewing the available genetic evidence. The article concludes with a brief discussion of some genetic and developmental questions raised by the idea, along with specific predictions and experimental tests.
Affiliation(s)
- Adam S Wilkins
- Stellenbosch Institute of Advanced Study, Stellenbosch 7600, South Africa
- Institute of Theoretical Biology, Humboldt University zu Berlin, Berlin 10115, Germany
- Richard W Wrangham
- Stellenbosch Institute of Advanced Study, Stellenbosch 7600, South Africa
- Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138
- W Tecumseh Fitch
- Department of Cognitive Biology, University of Vienna, A-1090 Vienna, Austria
14
Abstract
With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.
15
He S, Gu W, Li Y, Zhu H. ANRIL/CDKN2B-AS shows two-stage clade-specific evolution and becomes conserved after transposon insertions in simians. BMC Evol Biol 2013; 13:247. [PMID: 24225082 PMCID: PMC3831594 DOI: 10.1186/1471-2148-13-247] [Received: 08/06/2013] [Accepted: 11/08/2013]
Abstract
BACKGROUND: Many long non-coding RNA (lncRNA) genes identified in mammals have multiple exons and functional domains, allowing them to bind to polycomb proteins, DNA methyltransferases, and specific DNA sequences to regulate genome methylation. Little is known about the origin and evolution of lncRNAs. ANRIL/CDKN2B-AS consists of 19 exons on human chromosome 9p21 and regulates the expression of three cyclin-dependent kinase inhibitors (CDKN2A/ARF/CDKN2B).
RESULTS: ANRIL/CDKN2B-AS originated in placental mammals, obtained additional exons during mammalian evolution but gradually lost them during rodent evolution, and reached 19 exons only in simians. ANRIL lacks splicing signals in mammals. In simians, multiple transposons were inserted and transformed into exons of the ANRIL gene, after which ANRIL became highly conserved. A further survey reveals that multiple transposons exist in many lncRNAs.
CONCLUSIONS: ANRIL shows a two-stage, clade-specific evolutionary process and is fully developed only in simians. The domestication of multiple transposons indicates an impressive pattern of "evolutionary tinkering" and is likely to be important for ANRIL's structure and function. The evolution of lncRNAs and that of transposons may be highly co-opted in primates. Many lncRNAs may be functional only in simians.
Affiliation(s)
- Hao Zhu
- Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Shatai Road, Guangzhou 510515, China
16
Harmston N, Baresic A, Lenhard B. The mystery of extreme non-coding conservation. Philos Trans R Soc Lond B Biol Sci 2013; 368:20130021. [PMID: 24218634 PMCID: PMC3826495 DOI: 10.1098/rstb.2013.0021]
Abstract
Regions of several dozen to several hundred base pairs of extreme conservation have been found in non-coding regions in all metazoan genomes. The distribution of these elements within and across genomes has suggested that many have roles as transcriptional regulatory elements in multi-cellular organization, differentiation and development. Currently, there is no known mechanism or function that would account for this level of conservation at the observed evolutionary distances. Previous studies have found that, while these regions are under strong purifying selection, and not mutational coldspots, deletion of entire regions in mice does not necessarily lead to identifiable changes in phenotype during development. These opposing findings lead to several questions regarding their functional importance and why they are under strong selection in the first place. In this perspective, we discuss the methods and techniques used in identifying and dissecting these regions, their observed patterns of conservation, and review the current hypotheses on their functional significance.
Affiliation(s)
- Nathan Harmston
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London and MRC Clinical Sciences Centre, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
17
Wenger AM, Clarke SL, Notwell JH, Chung T, Tuteja G, Guturu H, Schaar BT, Bejerano G. The enhancer landscape during early neocortical development reveals patterns of dense regulation and co-option. PLoS Genet 2013; 9:e1003728. [PMID: 24009522 PMCID: PMC3757057 DOI: 10.1371/journal.pgen.1003728] [Received: 01/10/2013] [Accepted: 07/03/2013]
Abstract
Genetic studies have identified a core set of transcription factors and target genes that control the development of the neocortex, the region of the human brain responsible for higher cognition. The specific regulatory interactions between these factors, many key upstream and downstream genes, and the enhancers that mediate all these interactions remain mostly uncharacterized. We perform p300 ChIP-seq to identify over 6,600 candidate enhancers active in the dorsal cerebral wall of embryonic day 14.5 (E14.5) mice. Over 95% of the peaks we measure are conserved to human. Eight of ten (80%) candidates tested using mouse transgenesis drive activity in restricted laminar patterns within the neocortex. GREAT-based computational analysis reveals highly significant correlation with genes expressed at E14.5 in key areas for neocortex development, and allows the grouping of enhancers by known biological functions and pathways for further studies. We find that multiple genes are flanked by dozens of candidate enhancers each, including well-known key neocortical genes as well as suspected and novel genes. Nearly a quarter of our candidate enhancers are conserved well beyond mammals. Human and zebrafish regions orthologous to our candidate enhancers are shown to most often function in other aspects of central nervous system development. Finally, we find strong evidence that specific interspersed repeat families have contributed potentially key developmental enhancers via co-option. Our analysis expands the methodologies available for extracting the richness of information found in genome-wide functional maps.
Affiliation(s)
- Aaron M. Wenger
- Department of Computer Science, Stanford University, Stanford, California, United States of America
- Shoa L. Clarke
- Department of Genetics, Stanford University, Stanford, California, United States of America
- James H. Notwell
- Department of Computer Science, Stanford University, Stanford, California, United States of America
- Tisha Chung
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
- Geetu Tuteja
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
- Harendra Guturu
- Department of Electrical Engineering, Stanford University, Stanford, California, United States of America
- Bruce T. Schaar
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
- Gill Bejerano
- Department of Computer Science, Stanford University, Stanford, California, United States of America
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
18
Jacques PÉ, Jeyakani J, Bourque G. The majority of primate-specific regulatory sequences are derived from transposable elements. PLoS Genet 2013; 9:e1003504. [PMID: 23675311 PMCID: PMC3649963 DOI: 10.1371/journal.pgen.1003504] [Received: 11/21/2012] [Accepted: 03/25/2013]
Abstract
Although emerging evidence suggests that transposable elements (TEs) have contributed novel regulatory elements to the human genome, their global impact on transcriptional networks remains largely uncharacterized. Here we show that TEs have contributed to the human genome nearly half of its active elements. Using DNase I hypersensitivity data sets from ENCODE in normal, embryonic, and cancer cells, we found that 44% of open chromatin regions were in TEs and that this proportion reached 63% for primate-specific regions. We also showed that distinct subfamilies of endogenous retroviruses (ERVs) contributed significantly more accessible regions than expected by chance, with up to 80% of their instances in open chromatin. Based on these results, we further characterized 2,150 TE subfamily-transcription factor pairs that were bound in vivo or enriched for specific binding motifs, and observed that TEs contributing to open chromatin had higher levels of sequence conservation. We also showed that thousands of ERV-derived sequences were activated in a cell type-specific manner, especially in embryonic and cancer cells, and we demonstrated that this activity was associated with cell type-specific expression of neighboring genes. Taken together, these results demonstrate that TEs, and in particular ERVs, have contributed hundreds of thousands of novel regulatory elements to the primate lineage and reshaped the human transcriptional landscape.
Affiliation(s)
- Pierre-Étienne Jacques
- Computational and Systems Biology, Genome Institute of Singapore, Singapore, Singapore
- Département de Biologie, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Justin Jeyakani
- Computational and Systems Biology, Genome Institute of Singapore, Singapore, Singapore
- Guillaume Bourque
- Department of Human Genetics, McGill University, Montréal, Québec, Canada
- McGill University and Génome Québec Innovation Center, Montréal, Québec, Canada
19
Matvienko M, Kozik A, Froenicke L, Lavelle D, Martineau B, Perroud B, Michelmore R. Consequences of normalizing transcriptomic and genomic libraries of plant genomes using a duplex-specific nuclease and tetramethylammonium chloride. PLoS One 2013; 8:e55913. [PMID: 23409088 PMCID: PMC3568094 DOI: 10.1371/journal.pone.0055913] [Received: 08/28/2012] [Accepted: 01/04/2013]
Abstract
Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce.
Affiliation(s)
- Marta Matvienko
- Genome Center, University of California Davis, Davis, California, United States of America
- Alexander Kozik
- Genome Center, University of California Davis, Davis, California, United States of America
- Lutz Froenicke
- Genome Center, University of California Davis, Davis, California, United States of America
- Dean Lavelle
- Genome Center, University of California Davis, Davis, California, United States of America
- Belinda Martineau
- Genome Center, University of California Davis, Davis, California, United States of America
- Bertrand Perroud
- Genome Center, University of California Davis, Davis, California, United States of America
- Richard Michelmore
- Genome Center, University of California Davis, Davis, California, United States of America
- Departments of Plant Sciences, Molecular and Cellular Biology, and Medical Microbiology and Immunology, University of California Davis, Davis, California, United States of America
20
Abstract
A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify “motifs” that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery–searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA “background” sequence model to represent the null hypothesis and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are “too null,” resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed by held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where “ground truth” is known, discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced “over-fitting” in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not active biologically. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary. 
An implementation of the LR and ALR algorithms is available at http://code.google.com/p/likelihood-ratio-motifs/.
Affiliation(s)
- David Simcha
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
21
Distinct groups of repetitive families preserved in mammals correspond to different periods of regulatory innovations in vertebrates. Biol Direct 2012; 7:36. [PMID: 23098210 PMCID: PMC3500645 DOI: 10.1186/1745-6150-7-36] [Received: 07/09/2012] [Accepted: 10/23/2012]
Abstract
BACKGROUND: Mammalian genomes are repositories of repetitive DNA sequences derived from transposable elements (TEs). Typically, TEs generate multiple, mostly inactive copies of themselves, commonly known as repetitive families or families of repeats. Recently, we proposed that families of TEs originate in small populations by genetic drift and that the origin of small subpopulations from larger populations can be fueled by biological innovations.
RESULTS: We report three distinct groups of repetitive families preserved in the human genome that expanded and declined during the three previously described periods of regulatory innovations in vertebrate genomes. The first group originated prior to the evolutionary separation of the mammalian and bird lineages and the second one during subsequent diversification of the mammalian lineages prior to the origin of eutherian lineages. The third group of families is primate-specific.
CONCLUSIONS: The observed correlation implies a relationship between regulatory innovations and the origin of repetitive families. Consistent with our previous hypothesis, it is proposed that regulatory innovations fueled the origin of new subpopulations in which new repetitive families became fixed by genetic drift.
22
Testori A, Caizzi L, Cutrupi S, Friard O, De Bortoli M, Cora' D, Caselle M. The role of Transposable Elements in shaping the combinatorial interaction of Transcription Factors. BMC Genomics 2012; 13:400. [PMID: 22897927 PMCID: PMC3478180 DOI: 10.1186/1471-2164-13-400] [Received: 04/12/2012] [Accepted: 06/28/2012]
Abstract
Background: In the last few years several studies have shown that Transposable Elements (TEs) in the human genome are significantly associated with Transcription Factor Binding Sites (TFBSs) and that in several cases their expansion within the genome led to a substantial rewiring of the regulatory network. Another important feature of the regulatory network which has been thoroughly studied is the combinatorial organization of transcriptional regulation. In this paper we combine these two observations and suggest that TEs, besides rewiring the network, also played a central role in the evolution of particular patterns of combinatorial gene regulation.
Results: To address this issue we searched for TEs overlapping Estrogen Receptor α (ERα) binding peaks in two publicly available ChIP-seq datasets from the MCF7 cell line corresponding to different modalities of exposure to estrogen. We found a remarkable enrichment of a few specific classes of transposons. Among these, a prominent role was played by MIR (Mammalian Interspersed Repeats) transposons. These TEs underwent a dramatic expansion at the beginning of the mammalian radiation and then stabilized. We conjecture that the special affinity of ERα for the MIR class of TEs could be at the origin of the important role assumed by ERα in mammals. We then searched for TFBSs within the TEs overlapping ChIP-seq peaks. We found a strong enrichment of a few precise combinations of TFBSs. In several cases the corresponding Transcription Factors (TFs) were known cofactors of ERα, thus supporting the idea of a co-regulatory role of TFBSs within the same TE. Moreover, most of these correlations turned out to be strictly associated with specific classes of TEs, thus suggesting the presence of a well-defined "transposon code" within the regulatory network.
Conclusions: In this work we tried to shed light on the role of Transposable Elements (TEs) in shaping the regulatory network of higher eukaryotes. To test this idea we focused on a particular transcription factor, the Estrogen Receptor α (ERα), and we found that ERα preferentially targets a well-defined set of TEs and that these TEs host combinations of transcriptional regulators involving several known co-regulators of ERα. Moreover, a significant number of these TEs turned out to be conserved between human and mouse and located in the vicinity (and thus candidate regulators) of important estrogen-related genes.
Affiliation(s)
- Alessandro Testori
- Center for Molecular Systems Biology, University of Turin, Turin, Candiolo I-10060, Italy
23
Tashiro K, Teissier A, Kobayashi N, Nakanishi A, Sasaki T, Yan K, Tarabykin V, Vigier L, Sumiyama K, Hirakawa M, Nishihara H, Pierani A, Okada N. A mammalian conserved element derived from SINE displays enhancer properties recapitulating Satb2 expression in early-born callosal projection neurons. PLoS One 2011; 6:e28497. [PMID: 22174821 PMCID: PMC3234267 DOI: 10.1371/journal.pone.0028497] [Received: 07/26/2011] [Accepted: 11/09/2011]
Abstract
Short interspersed repetitive elements (SINEs) are highly repeated sequences that account for a significant proportion of many eukaryotic genomes and are usually considered "junk DNA". However, we previously discovered that many AmnSINE1 loci are evolutionarily conserved across mammalian genomes, suggesting that they may have acquired significant functions involved in controlling mammalian-specific traits. Notably, we identified the AS021 SINE locus, located 390 kbp upstream of Satb2. Using transgenic mice, we showed that this SINE displays specific enhancer activity in the developing cerebral cortex. The transcription factor Satb2 is expressed by cortical neurons extending axons through the corpus callosum and is a determinant of callosal versus subcortical projection. Mouse mutants reveal a crucial function for Satb2 in corpus callosum formation. In this study, we compared the enhancer activity of the AS021 locus with Satb2 expression during telencephalic development in the mouse. First, we showed that the AS021 enhancer is specifically activated in early-born Satb2(+) neurons. Second, we demonstrated that the activity of the AS021 enhancer recapitulates the expression of Satb2 at later embryonic and postnatal stages in deep-layer but not superficial-layer neurons, suggesting the possibility that the expression of Satb2 in these two subpopulations of cortical neurons is under genetically distinct transcriptional control. Third, we showed that the AS021 enhancer is activated in neurons projecting through the corpus callosum, as described for Satb2(+) neurons. Notably, AS021 drives specific expression in axons crossing through the ventral (TAG1(-)/NPY(+)) portion of the corpus callosum, confirming that it is active in a subpopulation of callosal neurons.
These data suggest that exaptation of the AS021 SINE locus might be involved in enhancement of Satb2 expression, leading to the establishment of interhemispheric communication via the corpus callosum, a eutherian-specific brain structure.
Affiliation(s)
- Kensuke Tashiro
- Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Midori-ku, Yokohama, Kanagawa, Japan
- Anne Teissier
- Centre National de la Recherche Scientifique–Unité Mixte de Recherche 7592, Institut Jacques Monod, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
- Naoki Kobayashi
- Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Midori-ku, Yokohama, Kanagawa, Japan
- Akiko Nakanishi
- Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Midori-ku, Yokohama, Kanagawa, Japan
- Takeshi Sasaki
- Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Midori-ku, Yokohama, Kanagawa, Japan
- Kuo Yan
- Department of Molecular Biology of Neuronal Signals, Max-Planck-Institute for Experimental Medicine, Göttingen, Germany
- Victor Tarabykin
- Department of Molecular Biology of Neuronal Signals, Max-Planck-Institute for Experimental Medicine, Göttingen, Germany
- Lisa Vigier
- Centre National de la Recherche Scientifique–Unité Mixte de Recherche 7592, Institut Jacques Monod, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
- Kenta Sumiyama
- National Institute of Genetics, Mishima, Shizuoka, Japan
- Mika Hirakawa
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, Japan
- Hidenori Nishihara
- Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Midori-ku, Yokohama, Kanagawa, Japan
- Alessandra Pierani
- Centre National de la Recherche Scientifique–Unité Mixte de Recherche 7592, Institut Jacques Monod, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
- Norihiro Okada
- Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Midori-ku, Yokohama, Kanagawa, Japan
24
Franchini LF, López-Leal R, Nasif S, Beati P, Gelman DM, Low MJ, de Souza FJS, Rubinstein M. Convergent evolution of two mammalian neuronal enhancers by sequential exaptation of unrelated retroposons. Proc Natl Acad Sci U S A 2011; 108:15270-5. [PMID: 21876128 PMCID: PMC3174587 DOI: 10.1073/pnas.1104997108]
Abstract
The proopiomelanocortin gene (POMC) is expressed in a group of neurons present in the arcuate nucleus of the hypothalamus. Neuron-specific POMC expression in mammals is conveyed by two distal enhancers, named nPE1 and nPE2. Previous transgenic mouse studies showed that nPE1 and nPE2 independently drive reporter gene expression to POMC neurons. Here, we investigated the evolutionary mechanisms that shaped not one but two neuron-specific POMC enhancers and tested whether nPE1 and nPE2 drive identical or complementary spatiotemporal expression patterns. Sequence comparison among representative genomes of most vertebrate classes and mammalian orders showed that nPE1 is a placental novelty. Using in silico paleogenomics we found that nPE1 originated from the exaptation of a mammalian-apparent LTR retrotransposon sometime between the metatherian/eutherian split (147 Mya) and the placental mammal radiation (≈ 90 Mya). Thus, the evolutionary origin of nPE1 differs, in kind and time, from that previously demonstrated for nPE2, which was exapted from a CORE-short interspersed nucleotide element (SINE) retroposon before the origin of prototherians, 166 Mya. Transgenic mice expressing the fluorescent markers tomato and EGFP driven by nPE1 or nPE2, respectively, demonstrated coexpression of both reporter genes along the entire arcuate nucleus. The onset of reporter gene expression guided by nPE1 and nPE2 was also identical and coincidental with the onset of Pomc expression in the presumptive mouse diencephalon. Thus, the independent exaptation of two unrelated retroposons into functional analogs regulating neuronal POMC expression constitutes an authentic example of convergent molecular evolution of cell-specific enhancers.
Collapse
Affiliation(s)
- Lucía F. Franchini
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, C1428ADN Buenos Aires, Argentina
- Rodrigo López-Leal
- Centro de Estudios Científicos and Universidad Austral de Chile, Valdivia 5110466, Chile
- Sofía Nasif
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, C1428ADN Buenos Aires, Argentina
- Paula Beati
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, C1428ADN Buenos Aires, Argentina
- Diego M. Gelman
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, C1428ADN Buenos Aires, Argentina
- Malcolm J. Low
- Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, MI 48105
- Flávio J. S. de Souza
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, C1428ADN Buenos Aires, Argentina
- Departamento de Fisiología, Biología Molecular y Celular, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, C1428EGA Buenos Aires, Argentina
- Marcelo Rubinstein
- Instituto de Investigaciones en Ingeniería Genética y Biología Molecular, Consejo Nacional de Investigaciones Científicas y Técnicas, C1428ADN Buenos Aires, Argentina
- Departamento de Fisiología, Biología Molecular y Celular, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, C1428EGA Buenos Aires, Argentina
25
Shin C, Nam JW, Farh KKH, Chiang HR, Shkumatava A, Bartel DP. Expanding the microRNA targeting code: functional sites with centered pairing. Mol Cell 2010; 38:789-802. [PMID: 20620952 DOI: 10.1016/j.molcel.2010.06.005] [Citation(s) in RCA: 450] [Impact Index Per Article: 32.1] [Received: 12/22/2009] [Revised: 04/27/2010] [Accepted: 06/03/2010] [Indexed: 12/30/2022]
Abstract
Most metazoan microRNA (miRNA) target sites have perfect pairing to the seed region, located near the miRNA 5' end. Although pairing to the 3' region sometimes supplements seed matches or compensates for mismatches, pairing to the central region has been known to function only at rare sites that impart Argonaute-catalyzed mRNA cleavage. Here, we present "centered sites," a class of miRNA target sites that lack both perfect seed pairing and 3'-compensatory pairing and instead have 11-12 contiguous Watson-Crick pairs to the center of the miRNA. Although centered sites can impart mRNA cleavage in vitro (in elevated Mg(2+)), in cells they repress protein output without consequential Argonaute-catalyzed cleavage. Our study also identified extensively paired sites that are cleavage substrates in cultured cells and human brain. This expanded repertoire of cleavage targets and the identification of the centered site type help explain why central regions of many miRNAs are evolutionarily conserved.
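To make "contiguous Watson-Crick pairs" concrete, the minimal Python sketch below finds the longest run of Watson-Crick pairs between a miRNA and a candidate target site. It is not the authors' method: it assumes an ungapped, equal-length antiparallel alignment, and the helper name `longest_wc_run` is hypothetical. G:U wobble pairs are excluded because the abstract specifies Watson-Crick pairing. Under these assumptions, a centered site would show up as a run of 11-12 pairs toward the middle of the miRNA rather than at its 5' end.

```python
# Watson-Crick pairs only; G:U wobble pairs are deliberately excluded.
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def longest_wc_run(mirna: str, site: str) -> tuple[int, int, int]:
    """Return (length, start, end) of the longest contiguous Watson-Crick run.

    Hypothetical helper. `site` is the mRNA target written 5'->3' and is
    assumed to be aligned without gaps, so miRNA position i (1-based, counted
    from the miRNA 5' end) pairs with site[-i]. start/end are 1-based miRNA
    positions; (0, 0, 0) means no pair at all.
    """
    if len(mirna) != len(site):
        raise ValueError("this sketch assumes an ungapped, equal-length alignment")
    # Accept DNA-style input by converting T to U.
    mirna = mirna.upper().replace("T", "U")
    site = site.upper().replace("T", "U")
    best = (0, 0, 0)
    run_start = None
    for i, base in enumerate(mirna, start=1):
        if (base, site[-i]) in WC_PAIRS:
            if run_start is None:
                run_start = i
            length = i - run_start + 1
            if length > best[0]:
                best = (length, run_start, i)
        else:
            run_start = None  # a mismatch or wobble pair breaks the run
    return best


print(longest_wc_run("AAAAAA", "UUUUUC"))  # (5, 2, 6): position 1 is unpaired
```

Real target sites are found in longer 3'UTR sequences and may contain bulges, so a practical search would need gapped alignment; the ungapped case keeps the pairing logic visible.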
Affiliation(s)
- Chanseok Shin
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
26
Paquet Y, Anderson A. Sequence composition similarities with the 7SL RNA are highly predictive of functional genomic features. Nucleic Acids Res 2010; 38:4907-16. [PMID: 20392819 PMCID: PMC2926601 DOI: 10.1093/nar/gkq234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
Transposable elements derived from the 7SL RNA gene, such as Alu elements in primates, have been remarkably successful in several mammalian lineages. The results presented here show a broad spectrum of functions for genomic segments whose sequence composition resembles that of the 7SL RNA gene. Using thoroughly documented loci, we report that DNaseI-hypersensitive sites can be singled out in large genomic sequences by assessing their sequence composition similarity to the 7SL RNA gene. We apply a root word frequency approach to illustrate a distinctive relationship between the sequence of the 7SL RNA gene and several classes of functional genomic features that are not presumed to be of transposable origin. Transposable elements that show noticeable similarity to the 7SL sequence include Alu sequences, as expected, but also long terminal repeats and the 5′-untranslated regions of long interspersed repetitive elements. In sequences masked for repeated elements, using the 7SL RNA gene as the query sequence reveals distinctive similarities with promoters, exons and distal gene regulatory regions. Because distal regulatory regions are the most notoriously difficult of these to detect, this approach may be useful for finding genomic segments that have regulatory functions and that may have escaped detection by existing methods.
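The abstract does not spell out how the root word frequency approach works, so the Python sketch below substitutes a generic stand-in: k-mer (word) frequency profiles compared by cosine similarity and scored in sliding windows along a longer sequence. The function names, the default k = 3, and the window size and step are illustrative assumptions, and the 7SL RNA gene sequence itself is not reproduced here. Under this stand-in, windows with the highest scores would be the candidates for follow-up.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 3) -> list[float]:
    """Relative frequency of every DNA k-mer, in a fixed ACGT order (4**k entries).

    Words containing characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words) or 1
    return [counts[w] / total for w in words]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two profiles (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def window_scores(query: str, genome: str, window: int = 300, step: int = 50,
                  k: int = 3) -> list[tuple[int, float]]:
    """Slide a window along `genome`; score each window's composition against `query`.

    Hypothetical helper: returns (0-based window start, similarity) pairs.
    """
    q = kmer_profile(query, k)
    return [
        (start, cosine_similarity(q, kmer_profile(genome[start:start + window], k)))
        for start in range(0, max(len(genome) - window, 0) + 1, step)
    ]
```

Cosine similarity is used here because it compares the shape of two composition profiles independently of window length; any other profile distance could be swapped in.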
Affiliation(s)
- Yanick Paquet
- Centre de recherche en cancérologie de l’Université Laval, L’Hôtel-Dieu de Québec, Centre hospitalier universitaire de Québec, Québec G1R 2J6 and Département de biologie, Université Laval, Québec G1K 7P4, Canada
- Alan Anderson
- Centre de recherche en cancérologie de l’Université Laval, L’Hôtel-Dieu de Québec, Centre hospitalier universitaire de Québec, Québec G1R 2J6 and Département de biologie, Université Laval, Québec G1K 7P4, Canada
- *To whom correspondence should be addressed. Tel: +418 691 5281; Fax: +418 691 5439
27
Jung CH, Makunin IV, Mattick JS. Identification of conserved Drosophila-specific euchromatin-restricted non-coding sequence motifs. Genomics 2010; 96:154-66. [PMID: 20595017 DOI: 10.1016/j.ygeno.2010.05.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/23/2010] [Revised: 05/25/2010] [Accepted: 05/26/2010] [Indexed: 01/19/2023]
Abstract
Non-protein-coding DNA comprises the majority of animal genomes but its functions are largely unknown. We identified over 17,000 different tetranucleotide pairs in the Drosophila melanogaster genome that are over-represented at distances up to 100 nt in conserved non-exonic sequences. Those exhibiting the highest information content in surrounding nucleotides were classified into five groups: tRNAs, motifs associated with histone genes, Suppressor-of-Hairy-wing binding sites, and two sets of previously unrecognized motifs (DLM3 and DLM4). There are hundreds to thousands of copies of DLM3 and DLM4, respectively, in the genome, located almost exclusively in non-coding regions. They have similar copy numbers among drosophilids, but are largely absent in other insects. DLM3 is likely a cis-regulatory element, whereas DLM4 sequences are capable of forming a short hairpin structure and are expressed as approximately 80 nt RNAs. This work reports the existence of Drosophila genus-specific sequence motifs, and suggests that many more novel functional elements may be discovered in genomes using the general approach outlined herein.
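The counting step behind over-represented tetranucleotide pairs can be sketched as follows: count every pair of 4-mers at a given spacing, then divide by the count expected if the two words occurred independently. This is a simplified Python illustration, not the authors' pipeline. Restricting spacings to non-overlapping words (d from k to `max_dist`), deriving the expectation from single-word frequencies, and the function names are all assumptions; the restriction to conserved non-exonic sequence is not modeled.

```python
from collections import Counter


def word_pair_counts(seq: str, k: int = 4, max_dist: int = 100) -> Counter:
    """Count (word1, word2, d): word1 starts at i and word2 starts at i + d.

    Assumption: d runs from k (non-overlapping words) up to max_dist.
    """
    seq = seq.upper()
    n_words = len(seq) - k + 1
    counts: Counter = Counter()
    for i in range(n_words):
        w1 = seq[i:i + k]
        for d in range(k, max_dist + 1):
            j = i + d
            if j >= n_words:  # word2 would run off the end of the sequence
                break
            counts[(w1, seq[j:j + k], d)] += 1
    return counts


def enrichment(seq: str, k: int = 4, max_dist: int = 100) -> dict:
    """Observed/expected ratio per (word1, word2, d).

    Expected = (number of position pairs at spacing d)
               * freq(word1) * freq(word2), using single-word frequencies.
    """
    seq = seq.upper()
    n_words = len(seq) - k + 1
    single = Counter(seq[i:i + k] for i in range(n_words))
    ratios = {}
    for (w1, w2, d), observed in word_pair_counts(seq, k, max_dist).items():
        n_pairs = n_words - d
        expected = n_pairs * (single[w1] / n_words) * (single[w2] / n_words)
        ratios[(w1, w2, d)] = observed / expected
    return ratios
```

The double loop costs O(length × max_dist), which is fine for a sketch; a genome-scale run would also need a significance test rather than a raw ratio.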
Affiliation(s)
- Chol-Hee Jung
- Institute for Molecular Bioscience, The University of Queensland, St Lucia QLD, Australia
28
Warnefors M, Pereira V, Eyre-Walker A. Transposable Elements: Insertion Pattern and Impact on Gene Expression Evolution in Hominids. Mol Biol Evol 2010; 27:1955-62. [DOI: 10.1093/molbev/msq084] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 12/28/2022]
29
Eory L, Halligan DL, Keightley PD. Distributions of selectively constrained sites and deleterious mutation rates in the hominid and murid genomes. Mol Biol Evol 2010; 27:177-92. [PMID: 19759235 DOI: 10.1093/molbev/msp219] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.8] [Indexed: 12/30/2022]
Abstract
Protein-coding sequences make up only about 1% of the mammalian genome. Much of the remaining 99% has been long assumed to be junk DNA, with little or no functional significance. Here, we show that in hominids, a group with historically low effective population sizes, all classes of noncoding DNA evolve more slowly than ancestral transposable elements and so appear to be subject to significant evolutionary constraints. Under the nearly neutral theory, we expected to see lower levels of selective constraints on most sequence types in hominids than murids, a group that is thought to have a higher effective population size. We found that this is the case for many sequence types examined, the most extreme example being 5'UTRs, for which constraint in hominids is only about one-third that of murids. Surprisingly, however, we observed higher constraints for some sequence types in hominids, notably 4-fold sites, where constraint is more than twice as high as in murids. This implies that more than about one-fifth of mutations at 4-fold sites are effectively selected against in hominids. The higher constraint at 4-fold sites in hominids suggests a more complex protein-coding gene structure than murids and indicates that methods for detecting selection on protein-coding sequences (e.g., using the d(N)/d(S) ratio), with 4-fold sites as a neutral standard, may lead to biased estimates, particularly in hominids. Our constraint estimates imply that 5.4% of nucleotide sites in the human genome are subject to effective negative selection and that there are three times as many constrained sites within noncoding sequences as within protein-coding sequences. Including coding and noncoding sites, we estimate that the genomic deleterious mutation rate U = 4.2. The mutational load predicted under a multiplicative model is therefore about 99% in hominids.
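The last step of this abstract, from U = 4.2 to a load of about 99%, follows from the standard multiplicative-fitness result: at mutation-selection balance, mean fitness relative to a mutation-free genotype is e^(-U), so the load is L = 1 - e^(-U). The short Python check below is only arithmetic on the reported U; the function name is illustrative.

```python
import math


def multiplicative_load(U: float) -> float:
    """Mutational load L = 1 - exp(-U) under multiplicative fitness effects.

    U is the genomic deleterious mutation rate per generation.
    """
    return 1.0 - math.exp(-U)


print(f"{multiplicative_load(4.2):.3f}")  # 0.985
```

A load of 0.985 (roughly 98.5%) is what the abstract rounds to "about 99%".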
Affiliation(s)
- Lél Eory
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom.
30
Ponicsan SL, Kugel JF, Goodrich JA. Genomic gems: SINE RNAs regulate mRNA production. Curr Opin Genet Dev 2010; 20:149-55. [PMID: 20176473 DOI: 10.1016/j.gde.2010.01.004] [Citation(s) in RCA: 95] [Impact Index Per Article: 6.8] [Received: 11/25/2009] [Revised: 01/15/2010] [Accepted: 01/24/2010] [Indexed: 01/22/2023]
Abstract
Mammalian short interspersed elements (SINEs) are abundant retrotransposons that have long been considered junk DNA; however, RNAs transcribed from mouse B2 and human Alu SINEs have recently been found to control mRNA production at multiple levels. Upon cell stress, B2 and Alu RNAs bind RNA polymerase II (Pol II) and repress transcription of some protein-encoding genes. Bi-directional transcription of a B2 SINE establishes a boundary that places the growth hormone locus in a permissive chromatin state during mouse development. Alu RNAs embedded in Pol II transcripts can promote evolution and proteome diversity through exonization via alternative splicing. Given the diverse means by which SINE-encoded RNAs impact production of mRNAs, this genomic junk is proving to contain hidden gems.
Affiliation(s)
- Steven L Ponicsan
- Department of Chemistry and Biochemistry, University of Colorado, Boulder, 80309-0215, USA
31
Transposable elements in gene regulation and in the evolution of vertebrate genomes. Curr Opin Genet Dev 2009; 19:607-12. [PMID: 19914058 DOI: 10.1016/j.gde.2009.10.013] [Citation(s) in RCA: 112] [Impact Index Per Article: 7.5] [Received: 09/16/2009] [Revised: 10/20/2009] [Accepted: 10/26/2009] [Indexed: 01/30/2023]
Abstract
Repetitive DNA and in particular transposable elements have been intimately linked to eukaryotic genomes for millions of years. Once overlooked for being only a collection of selfish debris and a nuisance for sequence assembly, genomic repeats are now being recognized as a key driving force in genome evolution. Indeed, by changing the DNA landscape of genomes, transposable elements have been a rich source of innovation in genes, regulatory elements and genome structures. In this review, I will focus on recent advances that demonstrate that genomic repeats have had a global impact on vertebrate gene regulatory networks. I will also summarize results that show how transposable elements have been a major catalyst of structural rearrangements throughout evolution.
32
Cordaux R, Batzer MA. The impact of retrotransposons on human genome evolution. Nat Rev Genet 2009; 10:691-703. [PMID: 19763152 DOI: 10.1038/nrg2640] [Citation(s) in RCA: 1127] [Impact Index Per Article: 75.1] [Indexed: 02/07/2023]
Abstract
Their ability to move within genomes gives transposable elements an intrinsic propensity to affect genome evolution. Non-long terminal repeat (LTR) retrotransposons--including LINE-1, Alu and SVA elements--have proliferated over the past 80 million years of primate evolution and now account for approximately one-third of the human genome. In this Review, we focus on this major class of elements and discuss the many ways that they affect the human genome: from generating insertion mutations and genomic instability to altering gene expression and contributing to genetic innovation. Increasingly detailed analyses of human and other primate genomes are revealing the scale and complexity of the past and current contributions of non-LTR retrotransposons to genomic change in the human lineage.
Affiliation(s)
- Richard Cordaux
- CNRS UMR 6556 Ecologie, Evolution, Symbiose, Université de Poitiers, 40 Avenue du Recteur Pineau, Poitiers, France
33
The plasticity of the mammalian transcriptome. Genomics 2009; 95:1-6. [PMID: 19716875 DOI: 10.1016/j.ygeno.2009.08.010] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Received: 03/26/2009] [Revised: 08/05/2009] [Accepted: 08/22/2009] [Indexed: 11/28/2022]
Abstract
The dogmatic view of RNA as a mere necessity in the transfer of information between DNA and proteins has come into question in recent years. Novel approaches and new technology have revealed an unprecedented level of inherent complexity in the mammalian transcriptome, in which the majority of nucleotides are expressed, in sharp contrast to the approximately 1.2% of the human genome harboring protein-coding information. Moreover, >50% of genomic loci show antisense and interleaved transcription; this is a conservative estimate, because non-coding RNA is highly regulated across tissues and developmental stages, which have been investigated only to a limited extent. Subsequent focus on RNA with no coding potential has revealed numerous species with novel functions, and deep sequencing studies imply that many remain to be discovered. This review gives an overview of the plasticity and dynamics of the mammalian transcriptome and the prevailing interpretation of its effect on the complexity of species.
34
Garber M, Guttman M, Clamp M, Zody MC, Friedman N, Xie X. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 2009; 25:i54-62. [PMID: 19478016 PMCID: PMC2687944 DOI: 10.1093/bioinformatics/btp190] [Citation(s) in RCA: 248] [Impact Index Per Article: 16.5] [Indexed: 02/01/2023]
Abstract
Motivation: Comparing the genomes from closely related species provides a powerful tool to identify functional elements in a reference genome. Many methods have been developed to identify conserved sequences across species; however, existing methods only model conservation as a decrease in the rate of mutation and have ignored selection acting on the pattern of mutations. Results: We present a new approach that takes advantage of deeply sequenced clades to identify evolutionary selection by uncovering not only signatures of rate-based conservation but also substitution patterns characteristic of sequence undergoing natural selection. We describe a new statistical method for modeling biased nucleotide substitutions, a learning algorithm for inferring site-specific substitution biases directly from sequence alignments and a hidden Markov model for detecting constrained elements characterized by biased substitutions. We show that the new approach can identify significantly more degenerate constrained sequences than rate-based methods. Applying it to the ENCODE regions, we find that as much as 10.2% of these regions is under selection. Availability: The algorithms are implemented in a Java software package, called SiPhy, freely available at http://www.broadinstitute.org/science/software/. Contact: xhx@ics.uci.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Manuel Garber
- Department of Biology, Broad Institute of MIT and Harvard, 7 Cambridge Center, MIT, Cambridge, MA 02142, USA
35
Wang J, Bowen NJ, Mariño-Ramírez L, Jordan IK. A c-Myc regulatory subnetwork from human transposable element sequences. Mol Biosyst 2009; 5:1831-9. [PMID: 19763338 DOI: 10.1039/b908494k] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 01/29/2023]
Abstract
Transposable elements (TEs) can donate regulatory sequences that help to control the expression of human genes. The oncogene c-Myc is a promiscuous transcription factor that is thought to regulate the expression of hundreds of genes. We evaluated the contribution of TEs to the c-Myc regulatory network by searching for c-Myc binding sites derived from TEs and by analyzing the expression and function of target genes with nearby TE-derived c-Myc binding sites. There are thousands of TE sequences in the human genome that are bound by c-Myc. A conservative analysis indicated that 816-4564 of these TEs contain canonical c-Myc binding site motifs. c-Myc binding sites are over-represented among sequences derived from the ancient TE families L2 and MIR, consistent with their preservation by purifying selection. Genes associated with TE-derived c-Myc binding sites are co-expressed with each other and with c-Myc. A number of these putative TE-derived c-Myc target genes are differentially expressed between Burkitt's lymphoma cells versus normal B cells and encode proteins with cancer-related functions. Despite several lines of evidence pointing to their regulation by c-Myc and relevance to cancer, the set of genes identified as TE-derived c-Myc targets does not significantly overlap with two previously characterized c-Myc target gene sets. These data point to a substantial contribution of TEs to the regulation of human genes by c-Myc. Genes that are regulated by TE-derived c-Myc binding sites appear to form a distinct c-Myc regulatory subnetwork.
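The canonical c-Myc binding site is the E-box CACGTG. A minimal scan for that motif, like the Python sketch below, is the simplest way to ask which TE-derived sequences carry it. The function name and the example sequences are hypothetical, and the study's conservative analysis may have applied further criteria (such as experimental binding evidence) that this sketch does not model. Because CACGTG is its own reverse complement, scanning one strand also covers the other.

```python
import re

# Canonical E-box bound by c-Myc/Max. It is palindromic (its own reverse
# complement), so a single-strand scan finds sites on both strands.
EBOX = "CACGTG"


def ebox_positions(seq: str) -> list[int]:
    """Return 0-based start positions of every canonical E-box in `seq`."""
    # The zero-width lookahead lets overlapping matches be reported too.
    return [m.start() for m in re.finditer(f"(?={EBOX})", seq.upper())]


# Hypothetical TE-derived sequences; real input would come from a genome annotation.
elements = {"te_1": "ttgCACGTGaa", "te_2": "ACGTACGTAC"}
print({name: len(ebox_positions(s)) for name, s in elements.items()})
# {'te_1': 1, 'te_2': 0}
```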
Affiliation(s)
- Jianrong Wang
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.
36
Abstract
MOTIVATION Identifying transcription factor binding sites (TFBSs) encoding complex regulatory signals in metazoan genomes remains a challenging problem in computational genomics. Due to degeneracy of nucleotide content among binding site instances or motifs, and intricate 'grammatical organization' of motifs within cis-regulatory modules (CRMs), extant pattern matching-based in silico motif search methods often suffer from impractically high false positive rates, especially in the context of analyzing large genomic datasets, and noisy position weight matrices which characterize binding sites. Here, we try to address this problem by using a framework to maximally utilize the information content of the genomic DNA in the region of query, taking cues from values of various biologically meaningful genetic and epigenetic factors in the query region such as clade-specific evolutionary parameters, presence/absence of nearby coding regions, etc. We present a new method for TFBS prediction in metazoan genomes that utilizes both the CRM architecture of sequences and a variety of features of individual motifs. Our proposed approach is based on a discriminative probabilistic model known as conditional random fields that explicitly optimizes the predictive probability of motif presence in large sequences, based on the joint effect of all such features. RESULTS This model overcomes weaknesses in earlier methods based on less effective statistical formalisms that are sensitive to spurious signals in the data. We evaluate our method on both simulated CRMs and real Drosophila sequences in comparison with a wide spectrum of existing models, and outperform the state of the art by 22% in F1 score. AVAILABILITY AND IMPLEMENTATION The code is publicly available at http://www.sailing.cs.cmu.edu/discover.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wenjie Fu
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
37
Imamura H, Karro JE, Chuang JH. Weak preservation of local neutral substitution rates across mammalian genomes. BMC Evol Biol 2009; 9:89. [PMID: 19416516 PMCID: PMC2689173 DOI: 10.1186/1471-2148-9-89] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 11/20/2008] [Accepted: 05/05/2009] [Indexed: 01/06/2023]
Abstract
Background The rate at which neutral (non-functional) bases undergo substitution is highly dependent on their location within a genome. However, it is not clear how fast these location-dependent rates change, or to what extent the substitution rate patterns are conserved between lineages. To address this question, which is critical not only for understanding the substitution process but also for evaluating phylogenetic footprinting algorithms, we examine ancestral repeats: a predominantly neutral dataset with a significantly higher genomic density than other datasets commonly used to study substitution rate variation. Using this repeat data, we measure the extent to which orthologous ancestral repeat sequences exhibit similar substitution patterns in separate mammalian lineages, allowing us to ascertain how well local substitution rates have been preserved across species. Results We calculated substitution rates for each ancestral repeat in each of three independent mammalian lineages (primate – from human/macaque alignments, rodent – from mouse/rat alignments, and laurasiatheria – from dog/cow alignments). We then measured the correlation of local substitution rates among these lineages. Overall we found the correlations between lineages to be statistically significant, but too weak to have much predictive power (r² < 5%). These correlations were found to be primarily driven by regional effects at the scale of several hundred kb or larger. A few repeat classes (e.g. 7SK, Charlie8, and MER121) also exhibited stronger conservation of rate patterns, likely due to the effect of repeat-specific purifying selection. These classes should be excluded when estimating local neutral substitution rates. Conclusion Although local neutral substitution rates have some correlations among mammalian species, these correlations have little predictive power on the scale of individual repeats. 
This indicates that local substitution rates have changed significantly among the lineages we have studied, and are likely to have changed even more for more diverged lineages. The correlations that do persist are too weak to be responsible for many of the highly conserved elements found by phylogenetic footprinting algorithms, leading us to conclude that such elements must be conserved due to selective forces.
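The statistic behind "too weak to have much predictive power" is the squared Pearson correlation between per-repeat substitution rates estimated independently in two lineages. A minimal Python sketch follows; the rate lists are illustrative placeholders, not data from the study, and the function name is hypothetical. An r² below 0.05 means that less than 5% of the variance in one lineage's local rates is explained by the other lineage's rates.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return sxy * sxy / (sxx * syy)


# Placeholder per-repeat rates, one value per orthologous ancestral repeat.
primate_rates = [0.10, 0.12, 0.08, 0.11, 0.09]
rodent_rates = [0.30, 0.28, 0.33, 0.35, 0.29]
print(round(r_squared(primate_rates, rodent_rates), 3))
```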
Affiliation(s)
- Hideo Imamura
- Boston College, Department of Biology, Chestnut Hill, MA 02467, USA.
38
Xu L, Guo L, Shen Z, Loss G, Gish R, Wasilenko S, Mason AL. Duplication of MER115 on chromosome 4 in patients with primary biliary cirrhosis. Liver Int 2009; 29:375-83. [PMID: 19018986 DOI: 10.1111/j.1478-3231.2008.01888.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/12/2023]
Abstract
BACKGROUND Primary biliary cirrhosis (PBC) is a complex disease with genetic and environmental influences. The disease is more prevalent in families with PBC and candidate gene case-control studies have linked PBC with DRB1(*)08 human leucocyte antigen class II alleles. AIMS The goal of this study was to characterize a MER115 intergenic region on chromosome 4 as a putative genetic variant associated with PBC. METHODS/RESULTS This region was incidentally identified during investigations to discover candidate microbial agents using representational difference analysis (RDA) with liver samples from patients with PBC and primary sclerosing cholangitis (PSC). BLAST search analysis of all the RDA products from the PBC liver revealed genomic sequences, whereas Escherichia coli, mycoplasma and hepatitis B virus DNA were found in the PSC liver. We identified one of the PBC RDA products as an ancestral repeat, referred to as MER115. Southern blot analysis with the PBC product uncovered a restriction fragment length polymorphism in PBC patients' liver. Southern blot hybridization showed increased signal intensity in PBC vs. control patients' DNA (P<0.005) and slot blot hybridization studies confirmed a copy number variation of the MER115 in hepatic DNA of PBC vs. control patients (P=0.02). CONCLUSIONS Further comparative genetic studies will be required to determine the extent of genomic duplication associated with MER115 and provide data on the possible copy number variants of genes close to this intergenic region in patients with PBC.
Affiliation(s)
- Lizhe Xu
- PVSS, FADDL, APHIS, USDA, Greenport, NY, USA
39
Pereira V, Enard D, Eyre-Walker A. The effect of transposable element insertions on gene expression evolution in rodents. PLoS One 2009; 4:e4321. [PMID: 19183808 PMCID: PMC2629548 DOI: 10.1371/journal.pone.0004321] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 10/10/2008] [Accepted: 11/24/2008] [Indexed: 01/04/2023]
Abstract
Background Many genomes contain a substantial number of transposable elements (TEs), a few of which are known to be involved in regulating gene expression. However, recent observations suggest that TEs may have played a very important role in the evolution of gene expression because many conserved non-genic sequences, some of which are known to be involved in gene regulation, resemble TEs. Results Here we investigate whether new TE insertions affect gene expression profiles by testing whether gene expression divergence between mouse and rat is correlated with the numbers of new transposable elements inserted near genes. We show that expression divergence is significantly correlated with the number of new LTR and SINE elements, but not with the numbers of LINEs. We also show that expression divergence is not significantly correlated with the numbers of ancestral TEs in most cases, which suggests that the correlations between expression divergence and the numbers of new TEs are causal in nature. We quantify the effect and estimate that TE insertion has accounted for ∼20% (95% confidence interval: 12% to 26%) of all expression profile divergence in rodents. Conclusions We conclude that TE insertions may have had a major impact on the evolution of gene expression levels in rodents.
Affiliation(s)
- Vini Pereira
- Centre for the Study of Evolution, School of Life Sciences, University of Sussex, Brighton, United Kingdom
- David Enard
- Centre for the Study of Evolution, School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Adam Eyre-Walker
- Centre for the Study of Evolution, School of Life Sciences, University of Sussex, Brighton, United Kingdom
40
Hirakawa M, Nishihara H, Kanehisa M, Okada N. Characterization and evolutionary landscape of AmnSINE1 in Amniota genomes. Gene 2008; 441:100-10. [PMID: 19166919 DOI: 10.1016/j.gene.2008.12.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 08/21/2008] [Revised: 11/29/2008] [Accepted: 12/04/2008] [Indexed: 11/18/2022]
Abstract
Discovery of a large number of conserved non-coding elements (CNEs) in vertebrate genomes provides a cornerstone to elucidate molecular mechanisms of macroevolution. Extensive comparative genomics has proven that transposons such as short interspersed elements (SINEs) were an important source of CNEs. We recently characterized AmnSINE1, a SINE family in Amniota genomes, some of which are present in CNEs, and demonstrated that two AmnSINE1 loci play an important role in mammalian-specific brain development by functioning as an enhancer (Sasaki et al. Proc. Natl. Acad. Sci. USA 2008). To get more information about AmnSINE1s, we here performed a multi-species search for AmnSINE1, and revealed the distribution and evolutionary history of these SINEs in amniote genomes. The number of AmnSINE1 regions in amniotes ranged from 160 to 1200; the number in eutherians was under 500 and the largest was that in chicken. Phylogenetic analysis established that each AmnSINE1 locus has evolved uniquely, primarily since the divergence of mammals from reptiles. These results support the notion that AmnSINE1s were amplified as an ancient retroposon in a common ancestor of Amniota and subsequently have survived for 300 Myr because of functions acquired by mutation-coupled exaptation prior to the mammalian radiation. On the basis of sequence homology and conserved synteny, we detected the orthologs of AmnSINE1 as candidates for further enhancer analysis, which are more conserved than the two loci that were shown to have been involved in mammalian brain development. The present work provides a comprehensive data set to test the role of AmnSINE1s, many of which were exapted and contributed to mammalian macroevolution.
Affiliation(s)
- Mika Hirakawa
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan
41
Baele G, Van de Peer Y, Vansteelandt S. A model-based approach to study nearest-neighbor influences reveals complex substitution patterns in non-coding sequences. Syst Biol 2008; 57:675-92. [PMID: 18853356 DOI: 10.1080/10635150802422324] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 01/22/2023]
Abstract
In this article, we present a likelihood-based framework for modeling site dependencies. Our approach builds upon standard evolutionary models but incorporates site dependencies across the entire tree by letting the evolutionary parameters in these models depend upon the ancestral states at the neighboring sites. It thus avoids the need for introducing new and high-dimensional evolutionary models for site-dependent evolution. We propose a Markov chain Monte Carlo approach with data augmentation to infer the evolutionary parameters under our model. Although our approach allows for wide-ranging site dependencies, we illustrate its use, in two non-coding datasets, in the case of nearest-neighbor dependencies (i.e., evolution directly depending only upon the immediate flanking sites). The results reveal that the general time-reversible model with nearest-neighbor dependencies substantially improves the fit to the data as compared to the corresponding model with site independence. Using the parameter estimates from our model, we elaborate on the importance of the 5-methylcytosine deamination process (i.e., the CpG effect) and show that this process also depends upon the 5' neighboring base identity. We hint at the possibility of a so-called TpA effect and show that the observed substitution behavior is very complex in the light of dinucleotide estimates. We also discuss the presence of CpG effects in a nuclear small subunit dataset and find significant evidence that evolutionary models incorporating context-dependent effects perform substantially better than independent-site models and in some cases even outperform models that incorporate varying rates across sites.
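The paper's framework is a likelihood model fitted by MCMC with ancestral-state data augmentation. The Python sketch below swaps in a much simpler technique, direct counting on an alignment with a known ancestor, only to illustrate what the CpG effect means: ancestral C sites followed by G change to T more often than other C sites. The function name and the assumption of a known, ungapped ancestor are hypothetical simplifications. The sketch ignores the complementary G-to-A change on the other strand and multiple substitutions at one site.

```python
def cpg_transition_rates(ancestor: str, descendant: str) -> dict[str, float]:
    """Fraction of ancestral C sites that became T, split by CpG context.

    Counting sketch on an ungapped pairwise alignment with a known ancestor.
    A C is in CpG context when the next ancestral base is G.
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned and of equal length")
    anc, desc = ancestor.upper(), descendant.upper()
    totals = {"CpG": 0, "non-CpG": 0}
    changes = {"CpG": 0, "non-CpG": 0}
    for i, base in enumerate(anc):
        if base != "C":
            continue
        context = "CpG" if i + 1 < len(anc) and anc[i + 1] == "G" else "non-CpG"
        totals[context] += 1
        if desc[i] == "T":
            changes[context] += 1
    # Contexts with no ancestral C sites get 0.0 rather than a division error.
    return {c: changes[c] / totals[c] if totals[c] else 0.0 for c in totals}


print(cpg_transition_rates("ACGACA", "ATGACA"))  # {'CpG': 1.0, 'non-CpG': 0.0}
```

Splitting the CpG counts further by the ancestral base at position i - 1 would give a crude view of the 5' neighbor dependence reported in the abstract.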
Affiliation(s)
- Guy Baele
- Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium
42
Xie HB, Irwin DM, Zhang YP. Evolution of conserved secondary structures and their function in transcriptional regulation networks. BMC Genomics 2008; 9:520. [PMID: 18976501 PMCID: PMC2584662 DOI: 10.1186/1471-2164-9-520] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 06/05/2008] [Accepted: 11/02/2008] [Indexed: 12/12/2022]
Abstract
Background Many conserved secondary structures have been identified within conserved elements in the human genome, but only a small fraction of them are known to be functional RNAs. The evolutionary variations of these conserved secondary structures in human populations and their biological functions have not been fully studied. Results We searched for polymorphisms within conserved secondary structures and identified a number of SNPs within these elements even though they are highly conserved among species. The density of SNPs in conserved secondary structures is about 65% of that of their flanking, non-conserved, sequences. Classification of sites as stems or as loops/bulges revealed that the density of SNPs in stems is about 62% of that found in loops/bulges. Analysis of derived allele frequency data indicates that sites in stems are under stronger evolutionary constraint than sites in loops/bulges. Intergenic conserved secondary structures tend to associate with transcription factor-encoding genes with genetic distance being the measure of regulator-gene associations. A substantial fraction of intergenic conserved secondary structures overlap characterized binding sites for multiple transcription factors. Conclusion Strong purifying selection implies that secondary structures are probably important carriers of biological functions for conserved sequences. The overlap between intergenic conserved secondary structures and transcription factor binding sites further suggests that intergenic conserved secondary structures have essential roles in directing gene expression in transcriptional regulation networks.
Affiliation(s)
- Hai-Bing Xie
- State Key Laboratory of Genetic Resource and Evolution, Kunming Institute of Zoology, Kunming 650223, PR China.
43
Abstract
The strategic importance of the genome sequence of the gray, short-tailed opossum, Monodelphis domestica, accrues from both the unique phylogenetic position of metatherian (marsupial) mammals and the fundamental biologic characteristics of metatherians that distinguish them from other mammalian species. Metatherian and eutherian (placental) mammals are more closely related to one another than to other vertebrate groups, and owing to this close relationship they share fundamentally similar genetic structures and molecular processes. However, during their long evolutionary separation these alternative mammals have developed distinctive anatomical, physiologic, and genetic features that hold tremendous potential for examining relationships between the molecular structures of mammalian genomes and the functional attributes of their components. Comparative analyses using the opossum genome have already provided a wealth of new evidence regarding the importance of noncoding elements in the evolution of mammalian genomes, the role of transposable elements in driving genomic innovation, and the relationships between recombination rate, nucleotide composition, and the genomic distributions of repetitive elements. The genome sequence is also beginning to enlarge our understanding of the evolution and function of the vertebrate immune system, and it provides an alternative model for investigating mechanisms of genomic imprinting. Equally important, availability of the genome sequence is fostering the development of new research tools for physical and functional genomic analyses of M. domestica that are expanding its versatility as an experimental system for a broad range of research applications in basic biology and biomedically oriented research.
44
McCarroll SA, Huett A, Kuballa P, Chilewski SD, Landry A, Goyette P, Zody MC, Hall JL, Brant SR, Cho JH, Duerr RH, Silverberg MS, Taylor KD, Rioux JD, Altshuler D, Daly MJ, Xavier RJ. Deletion polymorphism upstream of IRGM associated with altered IRGM expression and Crohn's disease. Nat Genet 2008; 40:1107-12. [PMID: 19165925 PMCID: PMC2731799 DOI: 10.1038/ng.215] [Citation(s) in RCA: 518] [Impact Index Per Article: 32.4] [Indexed: 02/07/2023]
Abstract
Following recent success in genome-wide association studies, a critical focus of human genetics is to understand how genetic variation at implicated loci influences cellular and disease processes. Crohn's disease (CD) is associated with SNPs around IRGM, but coding-sequence variation has been excluded as a source of this association. We identified a common, 20-kb deletion polymorphism, immediately upstream of IRGM and in perfect linkage disequilibrium (r² = 1.0) with the most strongly CD-associated SNP, that causes IRGM to segregate in the population with two distinct upstream sequences. The deletion (CD risk) and reference (CD protective) haplotypes of IRGM showed distinct expression patterns. Manipulation of IRGM expression levels modulated cellular autophagy of internalized bacteria, a process implicated in CD. These results suggest that the CD association at IRGM arises from an alteration in IRGM regulation that affects the efficacy of autophagy and identify a common deletion polymorphism as a likely causal variant.
Affiliation(s)
- Steven A McCarroll
- Center for Human Genetic Research, Massachusetts General Hospital, Harvard Medical School, 185 Cambridge Street, Boston, Massachusetts 02114, USA
45
Bourque G, Leong B, Vega VB, Chen X, Lee YL, Srinivasan KG, Chew JL, Ruan Y, Wei CL, Ng HH, Liu ET. Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res 2008; 18:1752-62. [PMID: 18682548 DOI: 10.1101/gr.080663.108] [Citation(s) in RCA: 416] [Impact Index Per Article: 26.0] [Indexed: 12/22/2022]
Abstract
Identification of lineage-specific innovations in genomic control elements is critical for understanding transcriptional regulatory networks and phenotypic heterogeneity. We analyzed, from an evolutionary perspective, the binding regions of seven mammalian transcription factors (ESR1, TP53, MYC, RELA, POU5F1, SOX2, and CTCF) identified on a genome-wide scale by different chromatin immunoprecipitation approaches and found that only a minority of sites appear to be conserved at the sequence level. Instead, we uncovered a pervasive association with genomic repeats by showing that a large fraction of the bona fide binding sites for five of the seven transcription factors (ESR1, TP53, POU5F1, SOX2, and CTCF) are embedded in distinctive families of transposable elements. Using the age of the repeats, we established that these repeat-associated binding sites (RABS) have been associated with significant regulatory expansions throughout the mammalian phylogeny. We validated the functional significance of these RABS by showing that they are over-represented in proximity of regulated genes and that the binding motifs within these repeats have undergone evolutionary selection. Our results demonstrate that transcriptional regulatory networks are highly dynamic in eukaryotic genomes and that transposable elements play an important role in expanding the repertoire of binding sites.
Affiliation(s)
- Guillaume Bourque
- Computational and Mathematical Biology, Genome Institute of Singapore, Singapore 138672, Singapore.
46
Giordano J, Ge Y, Gelfand Y, Abrusán G, Benson G, Warburton PE. Evolutionary history of mammalian transposons determined by genome-wide defragmentation. PLoS Comput Biol 2007; 3:e137. [PMID: 17630829 PMCID: PMC1914374 DOI: 10.1371/journal.pcbi.0030137] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.4] [Received: 02/09/2007] [Accepted: 05/31/2007] [Indexed: 01/30/2023]
Abstract
The constant bombardment of mammalian genomes by transposable elements (TEs) has resulted in TEs comprising at least 45% of the human genome. Because of their great age and abundance, TEs are important in comparative phylogenomics. However, estimates of TE age were previously based on divergence from derived consensus sequences or phylogenetic analysis, which can be unreliable, especially for older more diverged elements. Therefore, a novel genome-wide analysis of TE organization and fragmentation was performed to estimate TE age independently of sequence composition and divergence or the assumption of a constant molecular clock. Analysis of TEs in the human genome revealed ∼600,000 examples where TEs have transposed into and fragmented other TEs, covering >40% of all TEs or ∼542 Mbp of genomic sequence. The relative age of these TEs over evolutionary time is implicit in their organization, because newer TEs have necessarily transposed into older TEs that were already present. A matrix of the number of times that each TE has transposed into every other TE was constructed, and a novel objective function was developed that derived the chronological order and relative ages of human TEs spanning >100 million years. This method has been used to infer the relative ages across all four major TE classes, including the oldest, most diverged elements. Analysis of DNA transposons over the history of the human genome has revealed the early activity of some MER2 transposons, and the relatively recent activity of MER1 transposons during primate lineages. The TEs from six additional mammalian genomes were defragmented and analyzed. Pairwise comparison of the independent chronological orders of TEs in these mammalian genomes revealed species phylogeny, the fact that transposons shared between genomes are older than species-specific transposons, and a subset of TEs that were potentially active during periods of speciation. 
Transposable elements (TEs) are interspersed repetitive DNA families that are capable of copying themselves from place to place; they have literally infested our genome over evolutionary time, and now comprise as much as 45% of our total DNA. Because of their great age and abundance, TEs are important in evolutionary genomics. However, estimates of their age based on DNA sequence composition have been unreliable, especially for older, more diverged elements. Therefore, a novel method to estimate the age of TEs was developed based on the fact that as TEs spread throughout the genome, they inserted into and fragmented older TEs that were already present. Therefore, the age of TEs can be revealed by how often they have been fragmented over evolutionary time. We performed a genome-wide defragmentation of TEs, and developed a novel objective function to derive the chronological order of TEs spanning >100 million years. This method has been used to infer the relative ages of TEs from seven sequenced mammalian genomes across all four major TE classes, including the oldest, most diverged elements. This age estimate is independent of TE sequence composition or divergence and does not rely on the assumption of a constant molecular clock. This study provides a novel analysis of the evolutionary history of some of the most abundant and ancient repetitive DNA elements in mammalian genomes, which is important for understanding the dynamic forces that shape our genomes during evolution.
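The ordering logic above rests on one rule: a TE found inserted into another must be the younger of the two. The abstract does not spell out the authors' objective function. The snippet below therefore substitutes a simple, plainly named alternative, a brute-force "maximum agreement" linear ordering. It is a minimal sketch that assumes a small matrix where `inserts[a][b]` counts how often TE `a` was found inside TE `b`. The function name, TE labels and counts are all hypothetical.

```python
from itertools import permutations


def chronological_order(names, inserts):
    """Order TEs oldest -> youngest from a who-inserted-into-whom count matrix.

    inserts[a][b] = number of times TE a was found inserted into TE b.
    Each such insertion is evidence that b is older than a, so an ordering
    scores the insertions it agrees with (host placed before the inserted TE).
    Exhaustive search over all orderings: only practical for a handful of TEs.
    """
    n = len(names)
    best_order, best_score = None, -1
    for perm in permutations(range(n)):
        position = {te: rank for rank, te in enumerate(perm)}
        score = sum(
            inserts[a][b]
            for a in range(n)
            for b in range(n)
            if a != b and position[b] < position[a]
        )
        if score > best_score:
            best_order, best_score = perm, score
    return [names[i] for i in best_order], best_score


# Hypothetical counts: TE3 sits inside TE1 and TE2, TE2 sits inside TE1,
# plus one conflicting observation of TE1 inside TE2.
names = ["TE1", "TE2", "TE3"]
inserts = [
    [0, 1, 0],
    [4, 0, 0],
    [5, 3, 0],
]
print(chronological_order(names, inserts))
```

Exhaustive search keeps the sketch obviously correct, but it grows factorially. A genome-scale analysis with hundreds of TE families would need a heuristic or approximate optimizer instead.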
Affiliation(s)
- Joti Giordano
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, New York, United States of America
- Yongchao Ge
- Department of Neurology, Mount Sinai School of Medicine, New York, New York, United States of America
- Center for Translational Systems Biology, Mount Sinai School of Medicine, New York, New York, United States of America
- Yevgeniy Gelfand
- Laboratory for Biocomputing and Informatics, Boston University, Boston, Massachusetts, United States of America
- György Abrusán
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, New York, United States of America
- Gary Benson
- Departments of Computer Science and Biology, Boston University, Boston, Massachusetts, United States of America
- Peter E Warburton
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, New York, United States of America
47
Evolutionary rates and patterns for human transcription factor binding sites derived from repetitive DNA. BMC Genomics 2008; 9:226. [PMID: 18485226 PMCID: PMC2397414 DOI: 10.1186/1471-2164-9-226] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Received: 01/11/2008] [Accepted: 05/17/2008] [Indexed: 12/14/2022]
Abstract
Background The majority of human non-protein-coding DNA is made up of repetitive sequences, mainly transposable elements (TEs). It is becoming increasingly apparent that many of these repetitive DNA sequence elements encode gene regulatory functions. This fact has important evolutionary implications, since repetitive DNA is the most dynamic part of the genome. We set out to assess the evolutionary rate and pattern of experimentally characterized human transcription factor binding sites (TFBS) that are derived from repetitive versus non-repetitive DNA to test whether repeat-derived TFBS are in fact rapidly evolving. We also evaluated the position-specific patterns of variation among TFBS to look for signs of functional constraint on TFBS derived from repetitive and non-repetitive DNA. Results We found numerous experimentally characterized TFBS in the human genome, 7–10% of all mapped sites, which are derived from repetitive DNA sequences including simple sequence repeats (SSRs) and TEs. TE-derived TFBS sequences are far less conserved between species than TFBS derived from SSRs and non-repetitive DNA. Despite their rapid evolution, several lines of evidence indicate that TE-derived TFBS are functionally constrained. First of all, ancient TE families, such as MIR and L2, are enriched for TFBS relative to younger families like Alu and L1. Secondly, functionally important positions in TE-derived TFBS, specifically those residues thought to physically interact with their cognate protein binding factors (TF), are more evolutionarily conserved than adjacent TFBS positions. Finally, TE-derived TFBS show position-specific patterns of sequence variation that are highly distinct from random patterns and similar to the variation seen for non-repeat derived sequences of the same TFBS. 
Conclusion The abundance of experimentally characterized human TFBS that are derived from repetitive DNA speaks to the substantial regulatory effects that this class of sequence has on the human genome. The unique evolutionary properties of repeat-derived TFBS are perhaps even more intriguing. TE-derived TFBS in particular, while clearly functionally constrained, evolve extremely rapidly relative to non-repeat derived sites. Such rapidly evolving TFBS are likely to confer species-specific regulatory phenotypes, i.e. divergent expression patterns, on the human evolutionary lineage. This result has practical implications with respect to the widespread use of evolutionary conservation as a surrogate for functionally relevant non-coding DNA. Most TE-derived TFBS would be missed using the kinds of sequence conservation-based screens, such as phylogenetic footprinting, that are used to help characterize non-coding DNA. Thus, the very TFBS that are most likely to yield human-specific characteristics will be neglected by the comparative genomic techniques that are currently de rigueur for the identification of novel regulatory sites.
48
Abstract
The control and coordination of eukaryotic gene expression rely on transcriptional and post-transcriptional regulatory networks. Although progress has been made in mapping the components and deciphering the function of these networks, the mechanisms by which such intricate circuits originate and evolve remain poorly understood. Here I revisit and expand earlier models and propose that genomic repeats, and in particular transposable elements, have been a rich source of material for the assembly and tinkering of eukaryotic gene regulatory systems.
Affiliation(s)
- Cédric Feschotte
- Department of Biology, Life Science Building, BOX 19498, University of Texas, Arlington, Texas 76019, USA.
49
Dingel J, Hanus P, Leonardi N, Hagenauer J, Zech J, Mueller JC. Local conservation scores without a priori assumptions on neutral substitution rates. BMC Bioinformatics 2008; 9:190. [PMID: 18405366 PMCID: PMC2375903 DOI: 10.1186/1471-2105-9-190] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 10/22/2007] [Accepted: 04/11/2008] [Indexed: 12/05/2022]
Abstract
Background Comparative genomics aims to detect signals of evolutionary conservation as an indicator of functional constraint. Surprisingly, results of the ENCODE project revealed that about half of the experimentally verified functional elements found in non-coding DNA were classified as unconstrained by computational predictions. Following this observation, it has been hypothesized that this may be partly explained by biased estimates of neutral evolutionary rates used by existing sequence conservation metrics. All methods we are aware of rely on a comparison with the neutral rate, and conservation is estimated by measuring the deviation of a particular genomic region from this rate. Consequently, it is a reasonable assumption that inaccurate neutral rate estimates may lead to biased conservation and constraint estimates. Results We propose a conservation signal that is produced by local Maximum Likelihood estimation of evolutionary parameters using an optimized sliding window and present a Kullback-Leibler projection that allows multiple different estimated parameters to be transformed into a conservation measure. This conservation measure does not rely on assumptions about neutral evolutionary substitution rates, and few a priori assumptions about the properties of the conserved regions are imposed. We show the accuracy of our approach (KuLCons) on synthetic data and compare it to the scores generated by state-of-the-art methods (phastCons, GERP, SCONE) in an ENCODE region. We find that KuLCons is most often in agreement with the conservation/constraint signatures detected by GERP and SCONE, while qualitatively very different patterns from phastCons are observed. As opposed to standard methods, KuLCons can be extended to more complex evolutionary models, e.g. taking insertion and deletion events into account, and corresponding results show that scores obtained under this model can diverge significantly from scores using the simpler model.
Conclusion Our results suggest that discriminating among the different degrees of conservation is possible without making assumptions about neutral rates. We find, however, that it cannot be expected to discover considerably different constraint regions than GERP and SCONE. Consequently, we conclude that the reported discrepancies between experimentally verified functional and computationally identified constraint elements are unlikely to be explained by biased neutral rate estimates.
Affiliation(s)
- Janis Dingel
- Institute for Communications Engineering, Technische Universität München, Munich, Germany.
50
Approaches to comparative sequence analysis: towards a functional view of vertebrate genomes. Nat Rev Genet 2008; 9:303-13. [PMID: 18347593 DOI: 10.1038/nrg2185] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 12/23/2022]
Abstract
The comparison of genomic sequences is now a common approach to identifying and characterizing functional regions in vertebrate genomes. However, for theoretical reasons and because of practical issues, the generation of these data sets is non-trivial and can have many pitfalls. We are currently seeing an explosion of comparative sequence data, the benefits and limitations of which need to be disseminated to the scientific community. This Review provides a critical overview of the different types of sequence data that are available for analysis and of contemporary comparative sequence analysis methods, highlighting both their strengths and limitations. Approaches to determining the biological significance of constrained sequence are also explored.