1
Clayton EA, Rishishwar L, Huang TC, Gulati S, Ban D, McDonald JF, Jordan IK. An atlas of transposable element-derived alternative splicing in cancer. Philos Trans R Soc Lond B Biol Sci 2020; 375:20190342. [PMID: 32075558 PMCID: PMC7061986 DOI: 10.1098/rstb.2019.0342] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Accepted: 11/06/2019] [Indexed: 12/18/2022]
Abstract
Transposable element (TE)-derived sequences comprise more than half of the human genome, and their presence has been documented to alter gene expression in a number of different ways, including the generation of alternatively spliced transcript isoforms. Alternative splicing has been associated with tumorigenesis for a number of different cancers. The objective of this study was to broadly characterize the role of human TEs in generating alternatively spliced transcript isoforms in cancer. To do so, we screened for the presence of TE-derived sequences co-located with alternative splice sites that are differentially used in normal versus cancer tissues. We analysed a comprehensive set of alternative splice variants characterized for 614 matched normal-tumour tissue pairs across 13 cancer types, resulting in the discovery of 4820 TE-generated alternative splice events distributed among 723 cancer-associated genes. Short interspersed nuclear elements (Alu) and long interspersed nuclear elements (L1) were found to contribute the majority of TE-generated alternative splice sites in cancer genes. A number of cancer-associated genes, including MYH11, WHSC1 and CANT1, were shown to have overexpressed TE-derived isoforms across a range of cancer types. TE-derived isoforms were also linked to cancer-specific fusion transcripts, suggesting a novel mechanism for the generation of transcriptome diversity via trans-splicing mediated by dispersed TE repeats. This article is part of a discussion meeting issue 'Crossroads between transposons and gene regulation'.
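The co-location screen described in this abstract can be pictured as an interval lookup: given annotated TE intervals and splice-site coordinates, report the sites that fall inside a TE. The Python sketch below is illustrative only. The function name `sites_in_elements`, the coordinates, and the assumption of non-overlapping, half-open intervals on a single chromosome are hypothetical and are not taken from the study's pipeline.

```python
from bisect import bisect_right


def sites_in_elements(elements, sites):
    """Map each splice-site position to the TE interval that contains it.

    elements: (start, end, name) tuples, half-open [start, end), assumed
              non-overlapping and on one chromosome (illustrative assumption).
    sites:    integer positions on the same chromosome.
    Returns a dict {position: element name} for sites inside an element.
    """
    elements = sorted(elements)
    starts = [start for start, _end, _name in elements]
    hits = {}
    for pos in sites:
        # Index of the last element whose start is <= pos, or -1 if none.
        k = bisect_right(starts, pos) - 1
        if k >= 0 and pos < elements[k][1]:
            hits[pos] = elements[k][2]
    return hits


# Hypothetical annotation and splice-site coordinates.
te_annotation = [(100, 400, "AluSx"), (900, 1500, "L1PA4")]
splice_sites = [50, 150, 400, 1200]
overlaps = sites_in_elements(te_annotation, splice_sites)
```

With these made-up inputs, positions 150 and 1200 are reported as TE-derived, while 50 and 400 are not (400 is the exclusive end of the first interval).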
Affiliation(s)
- Evan A. Clayton
  - Integrated Cancer Research Center, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Lavanya Rishishwar
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
  - PanAmerican Bioinformatics Institute, Cali, Colombia
  - Applied Bioinformatics Laboratory, Atlanta, GA, USA
- Tzu-Chuan Huang
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Saurabh Gulati
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Dongjo Ban
  - Integrated Cancer Research Center, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- John F. McDonald
  - Integrated Cancer Research Center, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- I. King Jordan
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
  - PanAmerican Bioinformatics Institute, Cali, Colombia
  - Applied Bioinformatics Laboratory, Atlanta, GA, USA
2
Abstract
Transposable elements (TEs) are low-complexity elements (e.g., LINEs, SINEs, SVAs, and HERVs) that make up as much as two-thirds of the human genome. There is mounting evidence that TEs play an essential role in molecular functions that influence genomic plasticity and gene expression regulation. With the advent of next-generation sequencing approaches, our understanding of the relationship between TEs and psychiatric disorders will greatly improve. In this chapter, the authors comprehensively summarize the state of the art of TE research in animal models and humans, supporting a framework in which TEs play a functional role in mechanisms affecting a variety of behaviors, including neurodevelopmental, neuropsychiatric, and neurodegenerative disorders. Finally, the authors discuss recent therapeutic applications arising from the increasing experimental evidence on TE functional mechanisms.
Affiliation(s)
- G Guffanti
  - McLean Hospital - Harvard Medical School, Belmont, MA, USA
- A Bartlett
  - Department of Psychology, University of Massachusetts, Boston, Boston, MA, USA
- P DeCrescenzo
  - McLean Hospital - Harvard Medical School, Belmont, MA, USA
- F Macciardi
  - Department of Psychiatry and Human Behavior, University of California, Irvine, Irvine, CA, USA
- R Hunter
  - Department of Psychology, University of Massachusetts, Boston, Boston, MA, USA
3
Shapiro JA. Living Organisms Author Their Read-Write Genomes in Evolution. Biology 2017; 6:E42. [PMID: 29211049 PMCID: PMC5745447 DOI: 10.3390/biology6040042] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 08/23/2017] [Revised: 11/17/2017] [Accepted: 11/28/2017] [Indexed: 12/18/2022]
Abstract
Evolutionary variations generating phenotypic adaptations and novel taxa resulted from complex cellular activities altering genome content and expression: (i) Symbiogenetic cell mergers producing the mitochondrion-bearing ancestor of eukaryotes and chloroplast-bearing ancestors of photosynthetic eukaryotes; (ii) interspecific hybridizations and genome doublings generating new species and adaptive radiations of higher plants and animals; and, (iii) interspecific horizontal DNA transfer encoding virtually all of the cellular functions between organisms and their viruses in all domains of life. Consequently, assuming that evolutionary processes occur in isolated genomes of individual species has become an unrealistic abstraction. Adaptive variations also involved natural genetic engineering of mobile DNA elements to rewire regulatory networks. In the most highly evolved organisms, biological complexity scales with "non-coding" DNA content more closely than with protein-coding capacity. Coincidentally, we have learned how so-called "non-coding" RNAs that are rich in repetitive mobile DNA sequences are key regulators of complex phenotypes. Both biotic and abiotic ecological challenges serve as triggers for episodes of elevated genome change. The intersections of cell activities, biosphere interactions, horizontal DNA transfers, and non-random Read-Write genome modifications by natural genetic engineering provide a rich molecular and biological foundation for understanding how ecological disruptions can stimulate productive, often abrupt, evolutionary transformations.
Affiliation(s)
- James A Shapiro
  - Department of Biochemistry and Molecular Biology, University of Chicago, GCIS W123B, 979 E. 57th Street, Chicago, IL 60637, USA
4
Zheng Y, Joyce BT, Liu L, Zhang Z, Kibbe WA, Zhang W, Hou L. Prediction of genome-wide DNA methylation in repetitive elements. Nucleic Acids Res 2017; 45:8697-8711. [PMID: 28911103 PMCID: PMC5587781 DOI: 10.1093/nar/gkx587] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Received: 06/01/2016] [Accepted: 06/28/2017] [Indexed: 12/16/2022]
Abstract
DNA methylation in repetitive elements (RE) suppresses their mobility and maintains genomic stability, and decreases in RE methylation are frequently observed in tumor and/or surrogate tissues. Averaging methylation across RE in the genome is widely used to quantify global methylation. However, methylation may vary in specific RE and play diverse roles in disease development, thus averaging methylation across RE may lose significant biological information. The ambiguous mapping of short reads by current bisulfite sequencing platforms, together with their high cost, makes them impractical for quantifying locus-specific RE methylation. Although microarray-based approaches (particularly Illumina's Infinium methylation arrays) provide cost-effective and robust genome-wide methylation quantification, the number of interrogated CpGs in RE remains limited. We report a random forest-based algorithm (and corresponding R package, REMP) that can accurately predict genome-wide locus-specific RE methylation based on Infinium array profiling data. We validated its prediction performance using alternative sequencing and microarray data. Testing its clinical utility with The Cancer Genome Atlas data demonstrated that our algorithm offers more comprehensively extended locus-specific RE methylation information that can be readily applied to large human studies in a cost-effective manner. Our work has the potential to improve our understanding of the role of global methylation in human diseases, especially cancer.
Affiliation(s)
- Yinan Zheng
  - Center for Population Epigenetics, Robert H. Lurie Comprehensive Cancer Center and Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
  - Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Brian T Joyce
  - Center for Population Epigenetics, Robert H. Lurie Comprehensive Cancer Center and Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Lei Liu
  - Center for Population Epigenetics, Robert H. Lurie Comprehensive Cancer Center and Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Zhou Zhang
  - Center for Population Epigenetics, Robert H. Lurie Comprehensive Cancer Center and Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
  - Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Warren A Kibbe
  - Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD 20850, USA
- Wei Zhang
  - Center for Population Epigenetics, Robert H. Lurie Comprehensive Cancer Center and Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Lifang Hou
  - Center for Population Epigenetics, Robert H. Lurie Comprehensive Cancer Center and Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
5
Göke J, Ng HH. CTRL+INSERT: retrotransposons and their contribution to regulation and innovation of the transcriptome. EMBO Rep 2016; 17:1131-44. [PMID: 27402545 DOI: 10.15252/embr.201642743] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 05/19/2016] [Accepted: 06/20/2016] [Indexed: 12/25/2022]
Abstract
The human genome contains millions of fragments from retrotransposons, highly repetitive DNA sequences that were once able to "copy and paste" themselves to other regions in the genome. However, the majority of retrotransposons have lost this capacity through acquisition of mutations or through endogenous silencing mechanisms. Without this imminent threat of transposition, retrotransposons have the potential to act as a major source of genomic innovation. Indeed, large numbers of retrotransposons have been found to be active in specific contexts: as gene regulatory elements and promoters for protein-coding genes or long noncoding RNAs, among others. In this review, we summarise recent findings about retrotransposons, with implications in gene expression regulation, the expansion of gene isoform diversity and the generation of long noncoding RNAs. We highlight key examples that demonstrate their role in cellular identity and their versatility as markers of cell states, and we discuss how their dysregulation may contribute to the formation of and possibly therapeutic response in human cancers.
Affiliation(s)
- Jonathan Göke
  - Computational and Systems Biology, Genome Institute of Singapore, Singapore
- Huck Hui Ng
  - Gene Regulation Laboratory, Genome Institute of Singapore, Singapore
  - Department of Biochemistry, National University of Singapore, Singapore
  - Department of Biological Sciences, National University of Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore
6
Du J, Leung A, Trac C, Lee M, Parks BW, Lusis AJ, Natarajan R, Schones DE. Chromatin variation associated with liver metabolism is mediated by transposable elements. Epigenetics Chromatin 2016; 9:28. [PMID: 27398095 PMCID: PMC4939004 DOI: 10.1186/s13072-016-0078-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 04/18/2016] [Accepted: 06/29/2016] [Indexed: 01/23/2023]
Abstract
Background: Functional regulatory regions in eukaryotic genomes are characterized by the disruption of nucleosomes leading to accessible chromatin. The modulation of chromatin accessibility is one of the key mediators of transcriptional regulation, and variation in chromatin accessibility across individuals has been linked to complex traits and disease susceptibility. While mechanisms responsible for chromatin variation across individuals have been investigated, the overwhelming majority of chromatin variation remains unexplained. Furthermore, the processes through which the variation of chromatin accessibility contributes to phenotypic diversity remain poorly understood.
Results: We profiled chromatin accessibility in liver from seven strains of mice with phenotypic diversity in response to a high-fat/high-sucrose (HF/HS) diet and identified reproducible chromatin variation across the individuals. We found that sites of variable chromatin accessibility were more likely to coincide with particular classes of transposable elements (TEs) than sites with common chromatin signatures. Evolutionarily younger long interspersed nuclear elements (LINEs) are particularly likely to harbor variable chromatin sites. These younger LINEs are enriched for binding sites of immune-associated transcription factors, whereas older LINEs are enriched for liver-specific transcription factors. Genomic region enrichment analysis indicates that variable chromatin sites at TEs may function to regulate liver metabolic pathways. CRISPR-Cas9 deletion of a number of variable chromatin sites at TEs altered expression of nearby metabolic genes. Finally, we show that polymorphism of TEs and differential DNA methylation at TEs can both influence chromatin variation.
Conclusions: Our results demonstrate that specific classes of TEs show variable chromatin accessibility across strains of mice that display phenotypic diversity in response to a HF/HS diet. These results indicate that chromatin variation at TEs is an important contributor to phenotypic variation among populations.
Electronic supplementary material: The online version of this article (doi:10.1186/s13072-016-0078-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Juan Du
  - Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA
  - Irell & Manella Graduate School of Biological Sciences, City of Hope, Duarte, CA, USA
- Amy Leung
  - Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA
- Candi Trac
  - Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA
- Michael Lee
  - Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA
  - Irell & Manella Graduate School of Biological Sciences, City of Hope, Duarte, CA, USA
- Brian W Parks
  - Department of Nutritional Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Aldons J Lusis
  - Department of Medicine, University of California, Los Angeles, CA, USA
- Rama Natarajan
  - Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA
  - Irell & Manella Graduate School of Biological Sciences, City of Hope, Duarte, CA, USA
- Dustin E Schones
  - Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA
  - Irell & Manella Graduate School of Biological Sciences, City of Hope, Duarte, CA, USA
7
Robert C, Kapetanovic R, Beraldi D, Watson M, Archibald AL, Hume DA. Identification and annotation of conserved promoters and macrophage-expressed genes in the pig genome. BMC Genomics 2015; 16:970. [PMID: 26582032 PMCID: PMC4652390 DOI: 10.1186/s12864-015-2111-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/05/2015] [Accepted: 10/19/2015] [Indexed: 01/09/2023]
Abstract
Background: The FANTOM5 consortium used Cap Analysis of Gene Expression (CAGE) tag sequencing to produce a comprehensive atlas of promoters and enhancers within the human and mouse genomes. We reasoned that the mapping of these regulatory elements to the pig genome could provide useful annotation and evidence to support assignment of orthology.
Results: For human transcription start sites (TSS) associated with annotated human-mouse orthologs, 17% mapped to the pig genome but not to the mouse, 10% mapped only to the mouse, and 55% mapped to both pig and mouse. Around 17% did not map to either species. The mapping percentages were lower where there was no clear orthology relationship, but in every case, mapping to pig was greater than to mouse, and the degree of homology was also greater. Combined mapping of mouse and human CAGE-defined promoters identified at least one putative conserved TSS for >16,000 protein-coding genes. About 54% of the predicted locations of regulatory elements in the pig genome were supported by CAGE and/or RNA-Seq analysis from pig macrophages.
Conclusions: Comparative mapping of promoters and enhancers from humans and mice can provide useful preliminary annotation of other animal genomes. The data also confirm extensive gain and loss of regulatory elements between species, and the likelihood that pigs provide a better model than mice for human gene regulation and function.
Affiliation(s)
- Christelle Robert
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, EH25 9RG, Edinburgh, UK
- Ronan Kapetanovic
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
- Dario Beraldi
  - Cancer Research UK, Cambridge Research Institute, Li Ka Shing Center, Robinson Way, Cambridge, CB2 0RE, UK
- Mick Watson
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, EH25 9RG, Edinburgh, UK
  - Edinburgh Genomics, University of Edinburgh, Easter Bush, Edinburgh, EH25 9RG, UK
- Alan L Archibald
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, EH25 9RG, Edinburgh, UK
- David A Hume
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, EH25 9RG, Edinburgh, UK
8
Abstract
Insulators are regulatory elements that help to organize eukaryotic chromatin via enhancer-blocking and chromatin barrier activity. Although there are several examples of transposable element (TE)-derived insulators, the contribution of TEs to human insulators has not been systematically explored. Mammalian-wide interspersed repeats (MIRs) are a conserved family of TEs that have substantial regulatory capacity and share sequence characteristics with tRNA-related insulators. We sought to evaluate whether MIRs can serve as insulators in the human genome. We applied a bioinformatic screen using genome sequence and functional genomic data from CD4(+) T cells to identify a set of 1,178 predicted MIR insulators genome-wide. These predicted MIR insulators were computationally tested to serve as chromatin barriers and regulators of gene expression in CD4(+) T cells. The activity of predicted MIR insulators was experimentally validated using in vitro and in vivo enhancer-blocking assays. MIR insulators are enriched around genes of the T-cell receptor pathway and reside at T-cell-specific boundaries of repressive and active chromatin. A total of 58% of the MIR insulators predicted here show evidence of T-cell-specific chromatin barrier and gene regulatory activity. MIR insulators appear to be CCCTC-binding factor (CTCF) independent and show a distinct local chromatin environment with marked peaks for RNA Pol III and a number of histone modifications, suggesting that MIR insulators recruit transcriptional complexes and chromatin modifying enzymes in situ to help establish chromatin and regulatory domains in the human genome. The provisioning of insulators by MIRs across the human genome suggests a specific mechanism by which TE sequences can be used to modulate gene regulatory networks.
9
del Rosario RCH, Rayan NA, Prabhakar S. Noncoding origins of anthropoid traits and a new null model of transposon functionalization. Genome Res 2014; 24:1469-84. [PMID: 25043600 PMCID: PMC4158753 DOI: 10.1101/gr.168963.113] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/24/2022]
Abstract
Little is known about novel genetic elements that drove the emergence of anthropoid primates. We exploited the sequencing of the marmoset genome to identify 23,849 anthropoid-specific constrained (ASC) regions and confirmed their robust functional signatures. Of the ASC base pairs, 99.7% were noncoding, suggesting that novel anthropoid functional elements were overwhelmingly cis-regulatory. ASCs were highly enriched in loci associated with fetal brain development, motor coordination, neurotransmission, and vision, thus providing a large set of candidate elements for exploring the molecular basis of hallmark primate traits. We validated ASC192 as a primate-specific enhancer in proliferative zones of the developing brain. Unexpectedly, transposable elements (TEs) contributed to >56% of ASCs, and almost all TE families showed functional potential similar to that of nonrepetitive DNA. Three L1PA repeat-derived ASCs displayed coherent eye-enhancer function, thus demonstrating that the "gene-battery" model of TE functionalization applies to enhancers in vivo. Our study provides fundamental insights into genome evolution and the origins of anthropoid phenotypes and supports an elegantly simple new null model of TE exaptation.
Affiliation(s)
- Ricardo C H del Rosario
  - Computational and Systems Biology, Genome Institute of Singapore, #02-01 Genome, Singapore 138672
- Nirmala Arul Rayan
  - Computational and Systems Biology, Genome Institute of Singapore, #02-01 Genome, Singapore 138672
- Shyam Prabhakar
  - Computational and Systems Biology, Genome Institute of Singapore, #02-01 Genome, Singapore 138672
10
Spouge JL, Mariño-Ramírez L, Sheetlin SL. Searching for repeats, as an example of using the generalised Ruzzo-Tompa algorithm to find optimal subsequences with gaps. International Journal of Bioinformatics Research and Applications 2014; 10:384-408. [PMID: 24989859 DOI: 10.1504/ijbra.2014.062991] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022]
Abstract
Some biological sequences contain subsequences of unusual composition; e.g. some proteins contain DNA binding domains, transmembrane regions and charged regions, and some DNA sequences contain repeats. The linear-time Ruzzo-Tompa (RT) algorithm finds subsequences of unusual composition, using a sequence of scores as input and the corresponding 'maximal segments' as output. In principle, permitting gaps in the output subsequences could improve sensitivity. Here, the input of the RT algorithm is generalised to a finite, totally ordered, weighted graph, so the algorithm locates paths of maximal weight through increasing but not necessarily adjacent vertices. By permitting the penalised deletion of unfavourable letters, the generalisation therefore includes gaps. The program RepWords, which finds inexact simple repeats in DNA, exemplifies the general concepts by out-performing a similar extant, ad hoc tool. With minimal programming effort, the generalised Ruzzo-Tompa algorithm could improve the performance of many programs for finding biological subsequences of unusual composition.
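To make the starting point of this abstract concrete, the sketch below implements the original (ungapped) Ruzzo-Tompa procedure: it takes a sequence of scores and returns the maximal scoring segments. It is not the generalised, gap-permitting version the paper introduces, and it is not the RepWords program. The function name `maximal_segments` and the example scores are illustrative assumptions, and this simple list-scanning variant omits the extra bookkeeping needed to guarantee linear running time.

```python
def maximal_segments(scores):
    """Return the maximal scoring segments of a score sequence.

    Each result is (start, end, score) with `end` exclusive.
    Simple variant of the Ruzzo-Tompa procedure: correct output, but the
    right-to-left scan below is not optimised for the linear-time bound.
    """
    # Each stored segment is [start, end, L, R], where L is the cumulative
    # score just before the segment and R the cumulative score at its end.
    segments = []
    total = 0
    for i, s in enumerate(scores):
        before = total
        total += s
        if s <= 0:
            continue  # non-positive scores never start a segment
        cand = [i, i + 1, before, total]
        while True:
            # Find the rightmost stored segment j with L_j < L_cand.
            j = len(segments) - 1
            while j >= 0 and segments[j][2] >= cand[2]:
                j -= 1
            # No such segment, or it already ends at least as high:
            # the candidate is kept as a separate segment.
            if j < 0 or segments[j][3] >= cand[3]:
                segments.append(cand)
                break
            # Otherwise merge segment j through the candidate and retry.
            cand = [segments[j][0], cand[1], segments[j][2], cand[3]]
            del segments[j:]
    return [(a, b, r - l) for a, b, l, r in segments]


example = maximal_segments([4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5])
```

For the illustrative scores above, `example` is `[(0, 1, 4), (2, 3, 3), (4, 11, 7)]`: two single-score segments followed by one long segment whose internal negative scores are outweighed by the positives around them.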
Affiliation(s)
- John L Spouge
  - Computational Biology Branch, National Center for Biotechnology Information, Bethesda, MD 20894, USA
- Leonardo Mariño-Ramírez
  - Computational Biology Branch, National Center for Biotechnology Information, Bethesda, MD 20894, USA
- Sergey L Sheetlin
  - Computational Biology Branch, National Center for Biotechnology Information, Bethesda, MD 20894, USA
11
Jjingo D, Conley AB, Wang J, Mariño-Ramírez L, Lunyak VV, Jordan IK. Mammalian-wide interspersed repeat (MIR)-derived enhancers and the regulation of human gene expression. Mob DNA 2014; 5:14. [PMID: 25018785 PMCID: PMC4090950 DOI: 10.1186/1759-8753-5-14] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Received: 08/30/2013] [Accepted: 04/10/2014] [Indexed: 11/26/2022]
Abstract
Background: Mammalian-wide interspersed repeats (MIRs) are the most ancient family of transposable elements (TEs) in the human genome. The deep conservation of MIRs initially suggested the possibility that they had been exapted to play functional roles for their host genomes. MIRs also happen to be the only TEs whose presence in and around human genes is positively correlated with tissue-specific gene expression. Similar associations of enhancer prevalence within genes and tissue-specific expression, along with MIRs’ previous implication as providing regulatory sequences, suggested a possible link between MIRs and enhancers.
Results: To test the possibility that MIRs contribute functional enhancers to the human genome, we evaluated the relationship between MIRs and human tissue-specific enhancers in terms of genomic location, chromatin environment, regulatory function, and mechanistic attributes. This analysis revealed MIRs to be highly concentrated in enhancers of the K562 and HeLa human cell-types. Significantly more enhancers were found to be linked to MIRs than would be expected by chance, and putative MIR-derived enhancers are characterized by a chromatin environment highly similar to that of canonical enhancers. MIR-derived enhancers show strong associations with gene expression levels, tissue-specific gene expression and tissue-specific cellular functions, including a number of biological processes related to erythropoiesis. MIR-derived enhancers were found to be a rich source of transcription factor binding sites, underscoring one possible mechanistic route for the element sequences’ co-option as enhancers. There is also tentative evidence to suggest that MIR-enhancer function is related to the transcriptional activity of non-coding RNAs.
Conclusions: Taken together, these data reveal enhancers to be an important cis-regulatory platform from which MIRs can exercise a regulatory function in the human genome and help to resolve a long-standing conundrum as to the reason for MIRs’ deep evolutionary conservation.
Affiliation(s)
- Daudi Jjingo
  - School of Biology, Georgia Institute of Technology, Atlanta, GA, USA
- Andrew B Conley
  - School of Biology, Georgia Institute of Technology, Atlanta, GA, USA
- Jianrong Wang
  - School of Biology, Georgia Institute of Technology, Atlanta, GA, USA
- Leonardo Mariño-Ramírez
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
  - PanAmerican Bioinformatics Institute, Santa Marta, Magdalena, Colombia
- Victoria V Lunyak
  - PanAmerican Bioinformatics Institute, Santa Marta, Magdalena, Colombia
  - Buck Institute for Research on Aging, Novato, CA, USA
- I King Jordan
  - School of Biology, Georgia Institute of Technology, Atlanta, GA, USA
  - PanAmerican Bioinformatics Institute, Santa Marta, Magdalena, Colombia
12
Kumar CS, Qureshi SF, Ali A, Satyanarayana M, Rangaraju A, Venkateshwari A, Nallari P. Hidden magicians of genome evolution. Indian J Med Res 2013; 137:1052-60. [PMID: 23852286 PMCID: PMC3734710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
Transposable elements (TEs) represent the genome's dynamic component, causing mutations and genetic variation. Transposable elements can invade eukaryotic genomes in a short span of time; they are silenced by homology-dependent gene silencing, and some functional parts of the silenced elements are utilized to perform novel cellular functions. However, during the past two decades, major interest has been focused on the positive contribution of these elements to the evolution of genomes. The interactions between mobile DNAs and their host genomes are quite diverse, ranging from modifications of gene structure to alterations in general genome architecture, and TEs can be regarded as hidden magicians in shaping the evolution of genomes. Some of the prominent examples that impressively demonstrate the beneficial impact of TEs on host biology over evolutionary time include their role in the structure and function of eukaryotic genomes.
Affiliation(s)
- Altaf Ali
  - Department of Genetics, Osmania University, Hyderabad, India
- A. Venkateshwari
  - Department of Genetics, Institute of Genetics & Hospital for Genetic Diseases, Hyderabad, India
- Pratibha Nallari
  - Department of Genetics, Osmania University, Hyderabad, India
  - Reprint requests: Dr Pratibha Nallari, Professor, Department of Genetics, Osmania University, Hyderabad 500 007, India
13
Jacques PÉ, Jeyakani J, Bourque G. The majority of primate-specific regulatory sequences are derived from transposable elements. PLoS Genet 2013; 9:e1003504. [PMID: 23675311 PMCID: PMC3649963 DOI: 10.1371/journal.pgen.1003504] [Citation(s) in RCA: 222] [Impact Index Per Article: 20.2] [Received: 11/21/2012] [Accepted: 03/25/2013] [Indexed: 11/18/2022]
Abstract
Although emerging evidence suggests that transposable elements (TEs) have contributed novel regulatory elements to the human genome, their global impact on transcriptional networks remains largely uncharacterized. Here we show that TEs have contributed to the human genome nearly half of its active elements. Using DNase I hypersensitivity data sets from ENCODE in normal, embryonic, and cancer cells, we found that 44% of open chromatin regions were in TEs and that this proportion reached 63% for primate-specific regions. We also showed that distinct subfamilies of endogenous retroviruses (ERVs) contributed significantly more accessible regions than expected by chance, with up to 80% of their instances in open chromatin. Based on these results, we further characterized 2,150 TE subfamily-transcription factor pairs that were bound in vivo or enriched for specific binding motifs, and observed that TEs contributing to open chromatin had higher levels of sequence conservation. We also showed that thousands of ERV-derived sequences were activated in a cell type-specific manner, especially in embryonic and cancer cells, and we demonstrated that this activity was associated with cell type-specific expression of neighboring genes. Taken together, these results demonstrate that TEs, and in particular ERVs, have contributed hundreds of thousands of novel regulatory elements to the primate lineage and reshaped the human transcriptional landscape.
Affiliation(s)
- Pierre-Étienne Jacques
- Computational and Systems Biology, Genome Institute of Singapore, Singapore, Singapore
- Département de Biologie, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Justin Jeyakani
- Computational and Systems Biology, Genome Institute of Singapore, Singapore, Singapore
- Guillaume Bourque
- Department of Human Genetics, McGill University, Montréal, Québec, Canada
- McGill University and Génome Québec Innovation Center, Montréal, Québec, Canada
- * E-mail:
14
Huda A, Tyagi E, Mariño-Ramírez L, Bowen NJ, Jjingo D, Jordan IK. Prediction of transposable element derived enhancers using chromatin modification profiles. PLoS One 2011; 6:e27513. [PMID: 22087331 PMCID: PMC3210180 DOI: 10.1371/journal.pone.0027513] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 06/28/2011] [Accepted: 10/18/2011] [Indexed: 11/19/2022] Open
Abstract
Experimentally characterized enhancer regions have previously been shown to display specific patterns of enrichment for several different histone modifications. We modelled these enhancer chromatin profiles in the human genome and used them to guide the search for novel enhancers derived from transposable element (TE) sequences. To do this, a computational approach was taken to analyze the genome-wide histone modification landscape characterized by the ENCODE project in two human hematopoietic cell types, GM12878 and K562. We predicted the locations of 2,107 and 1,448 TE-derived enhancers in the GM12878 and K562 cell lines respectively. A vast majority of these putative enhancers are unique to each cell line; only 3.5% of the TE-derived enhancers are shared between the two. We evaluated the functional effect of TE-derived enhancers by associating them with the cell-type specific expression of nearby genes, and found that the number of TE-derived enhancers is strongly positively correlated with the expression of nearby genes in each cell line. Furthermore, genes that are differentially expressed between the two cell lines also possess a divergent number of TE-derived enhancers in their vicinity. As such, genes that are up-regulated in the GM12878 cell line and down-regulated in K562 have significantly more TE-derived enhancers in their vicinity in the GM12878 cell line and vice versa. These data indicate that human TE-derived sequences are likely to be involved in regulating cell-type specific gene expression on a broad scale and suggest that the enhancer activity of TE-derived sequences is mediated by epigenetic regulatory mechanisms.
Affiliation(s)
- Ahsan Huda
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Eishita Tyagi
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Leonardo Mariño-Ramírez
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- PanAmerican Bioinformatics Institute, Santa Marta, Magdalena, Colombia
- Nathan J. Bowen
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Ovarian Cancer Institute, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Daudi Jjingo
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- I. King Jordan
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- PanAmerican Bioinformatics Institute, Santa Marta, Magdalena, Colombia
- * E-mail:
15
Jjingo D, Huda A, Gundapuneni M, Mariño-Ramírez L, Jordan IK. Effect of the transposable element environment of human genes on gene length and expression. Genome Biol Evol 2011; 3:259-71. [PMID: 21362639 PMCID: PMC3070429 DOI: 10.1093/gbe/evr015] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 11/17/2022] Open
Abstract
Independent lines of investigation have documented effects of both transposable elements (TEs) and gene length (GL) on gene expression. However, TE gene fractions are highly correlated with GL, suggesting that they cannot be considered independently. We evaluated the TE environment of human genes and GL jointly in an attempt to tease apart their relative effects. TE gene fractions and GL were compared with the overall level of gene expression and the breadth of expression across tissues. GL is strongly correlated with overall expression level but weakly correlated with the breadth of expression, confirming the selection hypothesis that attributes the compactness of highly expressed genes to selection for economy of transcription. However, TE gene fractions overall, and for the L1 family in particular, show stronger anticorrelations with expression level than GL, indicating that GL may not be the most important target of selection for transcriptional economy. These results suggest a specific mechanism, removal of TEs, by which highly expressed genes are selectively tuned for efficiency. MIR elements are the only family of TEs with gene fractions that show a positive correlation with tissue-specific expression, suggesting that they may provide regulatory sequences that help to control human gene expression. Consistent with this notion, MIR fractions are relatively enriched close to transcription start sites and associated with coexpression in specific sets of related tissues. Our results confirm the overall relevance of the TE environment to gene expression and point to distinct mechanisms by which different TE families may contribute to gene regulation.
Affiliation(s)
- Daudi Jjingo
- School of Biology, Georgia Institute of Technology, GA, USA
16
Huda A, Bowen NJ, Conley AB, Jordan IK. Epigenetic regulation of transposable element derived human gene promoters. Gene 2011; 475:39-48. [PMID: 21215797 DOI: 10.1016/j.gene.2010.12.010] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 12/20/2010] [Accepted: 12/22/2010] [Indexed: 02/08/2023]
Abstract
It was previously thought that epigenetic histone modifications of mammalian transposable elements (TEs) serve primarily to defend the genome against deleterious effects associated with their activity. However, we recently showed that, genome-wide, human TEs can also be epigenetically modified in a manner consistent with their ability to regulate host genes. Here, we explore the ability of TE sequences to epigenetically regulate individual human genes by focusing on the histone modifications of promoter sequences derived from TEs. We found 1520 human genes that initiate transcription from within TE-derived promoter sequences. We evaluated the distributions of eight histone modifications across these TE-promoters, within and between the GM12878 and K562 cell lines, and related their modification status with the cell-type specific expression patterns of the genes that they regulate. TE-derived promoters are significantly enriched for active histone modifications, and depleted for repressive modifications, relative to the genomic background. Active histone modifications of TE-promoters peak at transcription start sites and are positively correlated with increasing expression within cell lines. Furthermore, differential modification of TE-derived promoters between cell lines is significantly correlated with differential gene expression. LTR-retrotransposon derived promoters in particular play a prominent role in mediating cell-type specific gene regulation, and a number of these LTR-promoter genes are implicated in lineage-specific cellular functions. The regulation of human genes mediated by histone modifications targeted to TE-derived promoters is consistent with the ability of TEs to contribute to the epigenomic landscape in a way that provides functional utility to the host genome.
Affiliation(s)
- Ahsan Huda
- School of Biology, Georgia Institute of Technology, 310 Ferst Drive, Atlanta, GA 30332, USA.
17
Paquet Y, Anderson A. Sequence composition similarities with the 7SL RNA are highly predictive of functional genomic features. Nucleic Acids Res 2010; 38:4907-16. [PMID: 20392819 PMCID: PMC2926601 DOI: 10.1093/nar/gkq234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022] Open
Abstract
Transposable elements derived from the 7SL RNA gene, such as Alu elements in primates, have had remarkable success in several mammalian lineages. The results presented here show a broad spectrum of functions for genomic segments that display sequence composition similarities with the 7SL RNA gene. Using thoroughly documented loci, we report that DNaseI-hypersensitive sites can be singled out in large genomic sequences by an assessment of sequence composition similarities with the 7SL RNA gene. We apply a root word frequency approach to illustrate a distinctive relationship between the sequence of the 7SL RNA gene and several classes of functional genomic features that are not presumed to be of transposable origin. Transposable elements that show noticeable similarities with the 7SL sequence include Alu sequences, as expected, but also long terminal repeats and the 5′-untranslated regions of long interspersed repetitive elements. In sequences masked for repeated elements, we find, when using the 7SL RNA gene as query sequence, distinctive similarities with promoters, exons and distal gene regulatory regions. The latter being the most notoriously difficult to detect, this approach may be useful for finding genomic segments that have regulatory functions and that may have escaped detection by existing methods.
Affiliation(s)
- Yanick Paquet
- Centre de recherche en cancérologie de l’Université Laval, L’Hôtel-Dieu de Québec, Centre hospitalier universitaire de Québec, Québec G1R 2J6 and Département de biologie, Université Laval, Québec G1K 7P4, Canada
- Alan Anderson
- Centre de recherche en cancérologie de l’Université Laval, L’Hôtel-Dieu de Québec, Centre hospitalier universitaire de Québec, Québec G1R 2J6 and Département de biologie, Université Laval, Québec G1K 7P4, Canada
- *To whom correspondence should be addressed. Tel: + 418 691 5281; Fax: +418 691 5439;
18
Herpin A, Braasch I, Kraeussling M, Schmidt C, Thoma EC, Nakamura S, Tanaka M, Schartl M. Transcriptional rewiring of the sex determining dmrt1 gene duplicate by transposable elements. PLoS Genet 2010; 6:e1000844. [PMID: 20169179 PMCID: PMC2820524 DOI: 10.1371/journal.pgen.1000844] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.6] [Received: 09/03/2009] [Accepted: 01/12/2010] [Indexed: 02/01/2023] Open
Abstract
Control and coordination of eukaryotic gene expression rely on transcriptional and posttranscriptional regulatory networks. Evolutionary innovations and adaptations often require rapid changes of such networks. It has long been hypothesized that transposable elements (TE) might contribute to the rewiring of regulatory interactions. More recently it emerged that TEs might bring in ready-to-use transcription factor binding sites to create alterations to the promoters by which they were captured. A process where the gene regulatory architecture is of remarkable plasticity is sex determination. While the more downstream components of the sex determination cascades are evolutionary conserved, the master regulators can switch between groups of organisms even on the interspecies level or between populations. In the medaka fish (Oryzias latipes) a duplicated copy of dmrt1, designated dmrt1bY or DMY, on the Y chromosome was shown to be the master regulator of male development, similar to Sry in mammals. We found that the dmrt1bY gene has acquired a new feedback downregulation of its expression. Additionally, the autosomal dmrt1a gene is also able to regulate transcription of its duplicated paralog by binding to a unique target Dmrt1 site nested within the dmrt1bY proximal promoter region. We could trace back this novel regulatory element to a highly conserved sequence within a new type of TE that inserted into the upstream region of dmrt1bY shortly after the duplication event. Our data provide functional evidence for a role of TEs in transcriptional network rewiring for sub- and/or neo-functionalization of duplicated genes. In the particular case of dmrt1bY, this contributed to create new hierarchies of sex-determining genes.
Author Summary
Evolutionary innovations and adaptations often require rapid changes in gene regulation. Transposable elements constitute the most dynamic part of eukaryotic genomes. Insertions of transposable elements can influence the expression of surrounding genes by donating new regulatory elements. A longstanding hypothesis postulates that the dispersal of transposable elements may rewire regulatory links between genes, thereby changing regulatory networks and shuffling regulatory cascades. A regulatory hierarchy of remarkable plasticity is the sex determination cascade. In the course of animal evolution, new master regulators frequently replace the sex determination gene on top of the hierarchy. In the medaka fish, a duplicate of the dmrt1 transcription factor gene, dmrt1bY, has become the sex master regulator. Its ancestor dmrt1a, in contrast, has a downstream position in the sex determination cascade. We show that after the duplication of the dmrt1 gene, the new hierarchy has been established by the insertion of a transposable element into the regulatory region of the dmrt1bY gene on the sex chromosome. This transposable element, harboring a Dmrt1 binding site, enables the self- and cross-regulation of dmrt1bY expression by Dmrt1 proteins. Our study therefore provides strong evidence for the important role of transposable elements in the rewiring of gene regulatory networks.
Affiliation(s)
- Amaury Herpin
- University of Würzburg, Physiological Chemistry I, Biozentrum, Würzburg, Germany.
19
Identification of transcription factor binding sites derived from transposable element sequences using ChIP-seq. Methods Mol Biol 2010; 674:225-40. [PMID: 20827595 DOI: 10.1007/978-1-60761-854-6_14] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/27/2023]
Abstract
Transposable elements (TEs) form a substantial fraction of the non-coding DNA of many eukaryotic genomes. There are numerous examples of TEs being exapted for regulatory function by the host, many of which were identified through their high conservation. However, given that TEs are often the youngest part of a genome and typically exhibit a high turnover, conservation-based methods will fail to identify lineage- or species-specific exaptations. ChIP-seq has become a very popular and effective method for identifying in vivo DNA-protein interactions, such as those seen at transcription factor binding sites (TFBS), and has been used to show that there are a large number of TE-derived TFBS. Many of these TE-derived TFBS show poor conservation and would go unnoticed using conservation screens. Here, we describe a simple pipeline method for using data generated through ChIP-seq to identify TE-derived TFBS.
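As a rough illustration of the kind of overlap step such a pipeline relies on, the sketch below intersects ChIP-seq peak intervals with TE annotations. It is not the chapter's pipeline: the function name `te_derived_peaks`, the tuple layouts, and the half-open coordinate convention are assumptions made for this example, and the nested scan is chosen for clarity rather than speed.

```python
from collections import defaultdict


def te_derived_peaks(peaks, tes):
    """Pair each peak with every TE annotation it overlaps.

    peaks: iterable of (chrom, start, end) tuples (hypothetical layout).
    tes:   iterable of (chrom, start, end, family) tuples (hypothetical layout).
    Coordinates are assumed to be half-open: start inclusive, end exclusive.
    """
    # Group TE annotations by chromosome so each peak is only compared
    # against annotations on its own chromosome.
    tes_by_chrom = defaultdict(list)
    for chrom, start, end, family in tes:
        tes_by_chrom[chrom].append((start, end, family))

    hits = []
    for chrom, p_start, p_end in peaks:
        for t_start, t_end, family in tes_by_chrom[chrom]:
            # Two half-open intervals overlap when each starts before
            # the other ends.
            if p_start < t_end and t_start < p_end:
                hits.append(((chrom, p_start, p_end), family))
    return hits


if __name__ == "__main__":
    example_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    example_tes = [("chr1", 150, 300, "MIR"), ("chr2", 300, 400, "L2")]
    for peak, family in te_derived_peaks(example_peaks, example_tes):
        print(peak, family)
```

For genome-scale inputs, a sorted sweep or an interval tree would replace the inner loop; the overlap condition itself stays the same.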
20
Wang J, Bowen NJ, Mariño-Ramírez L, Jordan IK. A c-Myc regulatory subnetwork from human transposable element sequences. Mol Biosyst 2009; 5:1831-9. [PMID: 19763338 DOI: 10.1039/b908494k] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 01/29/2023]
Abstract
Transposable elements (TEs) can donate regulatory sequences that help to control the expression of human genes. The oncogene c-Myc is a promiscuous transcription factor that is thought to regulate the expression of hundreds of genes. We evaluated the contribution of TEs to the c-Myc regulatory network by searching for c-Myc binding sites derived from TEs and by analyzing the expression and function of target genes with nearby TE-derived c-Myc binding sites. There are thousands of TE sequences in the human genome that are bound by c-Myc. A conservative analysis indicated that 816-4564 of these TEs contain canonical c-Myc binding site motifs. c-Myc binding sites are over-represented among sequences derived from the ancient TE families L2 and MIR, consistent with their preservation by purifying selection. Genes associated with TE-derived c-Myc binding sites are co-expressed with each other and with c-Myc. A number of these putative TE-derived c-Myc target genes are differentially expressed between Burkitt's lymphoma cells versus normal B cells and encode proteins with cancer-related functions. Despite several lines of evidence pointing to their regulation by c-Myc and relevance to cancer, the set of genes identified as TE-derived c-Myc targets does not significantly overlap with two previously characterized c-Myc target gene sets. These data point to a substantial contribution of TEs to the regulation of human genes by c-Myc. Genes that are regulated by TE-derived c-Myc binding sites appear to form a distinct c-Myc regulatory subnetwork.
Affiliation(s)
- Jianrong Wang
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.
21
Ma X, Li-Ling J, Huang Q, Chen X, Hou L, Ma F. Systematic analysis of alternative promoters correlated with alternative splicing in human genes. Genomics 2009; 93:420-5. [PMID: 19442634 DOI: 10.1016/j.ygeno.2009.01.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 08/25/2008] [Revised: 01/22/2009] [Accepted: 01/28/2009] [Indexed: 11/17/2022]
Abstract
Interactions between various events are essential for complex and delicate transcriptional regulation. To delineate the features and potential roles of alternative promoters (APs) correlated with alternative splicing (AS), we have systematically analyzed 9908 putative alternative promoters (PAPs) from 3797 human genes. Our results showed that approximately 65% of AS events are associated with PAPs. Intriguingly, PAPs per human AS gene only averaged 2.6 for our dataset, which was significantly lower than previously reported. This seems to imply that the human genome contains a small pool of appropriable PAPs for AS genes. Exploration of the characteristics of PAPs such as CpG islands, TATA boxes, GC-content, transcription factor binding sites (TFBSs) and repetitive elements suggested that, respectively, 87% and 90% of PAPs of human AS genes are CpG- and TATA box-poor. The GC-content is significantly higher in the downstream of transcription start sites (TSSs) than upstream (58% vs. 53%), and there is a strong negative correlation between the GC-content and the number of PAPs. These suggested that GC-content around the TSSs plays an important role in the regulation of AS. Moreover, different APs contain distinct densities of repetitive elements and TFBSs, indicating that such sequences have an intrinsic role in the divergent regulation of PAPs and AS. Substantial difference was also found between human AS genes in terms of PAP numbers. A close connection between PAPs and AS may play a critical role in the choice of APs and regulation of AS genes. Furthermore, the distribution of AS genes on different human chromosomes also influences the numbers of PAPs and isoforms of AS genes. Our results may provide important clues for further studies on regulatory network of transcription-related events.
Affiliation(s)
- Xiaojuan Ma
- College of Life Science, Liaoning Normal University, Dalian 116029, China
22
Evolutionary rates and patterns for human transcription factor binding sites derived from repetitive DNA. BMC Genomics 2008; 9:226. [PMID: 18485226 PMCID: PMC2397414 DOI: 10.1186/1471-2164-9-226] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Received: 01/11/2008] [Accepted: 05/17/2008] [Indexed: 12/14/2022] Open
Abstract
Background: The majority of human non-protein-coding DNA is made up of repetitive sequences, mainly transposable elements (TEs). It is becoming increasingly apparent that many of these repetitive DNA sequence elements encode gene regulatory functions. This fact has important evolutionary implications, since repetitive DNA is the most dynamic part of the genome. We set out to assess the evolutionary rate and pattern of experimentally characterized human transcription factor binding sites (TFBS) that are derived from repetitive versus non-repetitive DNA to test whether repeat-derived TFBS are in fact rapidly evolving. We also evaluated the position-specific patterns of variation among TFBS to look for signs of functional constraint on TFBS derived from repetitive and non-repetitive DNA.
Results: We found numerous experimentally characterized TFBS in the human genome, 7–10% of all mapped sites, which are derived from repetitive DNA sequences including simple sequence repeats (SSRs) and TEs. TE-derived TFBS sequences are far less conserved between species than TFBS derived from SSRs and non-repetitive DNA. Despite their rapid evolution, several lines of evidence indicate that TE-derived TFBS are functionally constrained. First of all, ancient TE families, such as MIR and L2, are enriched for TFBS relative to younger families like Alu and L1. Secondly, functionally important positions in TE-derived TFBS, specifically those residues thought to physically interact with their cognate protein binding factors (TF), are more evolutionarily conserved than adjacent TFBS positions. Finally, TE-derived TFBS show position-specific patterns of sequence variation that are highly distinct from random patterns and similar to the variation seen for non-repeat derived sequences of the same TFBS.
Conclusion: The abundance of experimentally characterized human TFBS that are derived from repetitive DNA speaks to the substantial regulatory effects that this class of sequence has on the human genome. The unique evolutionary properties of repeat-derived TFBS are perhaps even more intriguing. TE-derived TFBS in particular, while clearly functionally constrained, evolve extremely rapidly relative to non-repeat derived sites. Such rapidly evolving TFBS are likely to confer species-specific regulatory phenotypes, i.e. divergent expression patterns, on the human evolutionary lineage. This result has practical implications with respect to the widespread use of evolutionary conservation as a surrogate for functionally relevant non-coding DNA. Most TE-derived TFBS would be missed using the kinds of sequence conservation-based screens, such as phylogenetic footprinting, that are used to help characterize non-coding DNA. Thus, the very TFBS that are most likely to yield human-specific characteristics will be neglected by the comparative genomic techniques that are currently de rigueur for the identification of novel regulatory sites.
23
Abstract
The control and coordination of eukaryotic gene expression rely on transcriptional and post-transcriptional regulatory networks. Although progress has been made in mapping the components and deciphering the function of these networks, the mechanisms by which such intricate circuits originate and evolve remain poorly understood. Here I revisit and expand earlier models and propose that genomic repeats, and in particular transposable elements, have been a rich source of material for the assembly and tinkering of eukaryotic gene regulatory systems.
Affiliation(s)
- Cédric Feschotte
- Department of Biology, Life Science Building, BOX 19498, University of Texas, Arlington, Texas 76019, USA.
24
Tharakaraman K, Bodenreider O, Landsman D, Spouge JL, Mariño-Ramírez L. The biological function of some human transcription factor binding motifs varies with position relative to the transcription start site. Nucleic Acids Res 2008; 36:2777-86. [PMID: 18367472 PMCID: PMC2377430 DOI: 10.1093/nar/gkn137] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 12/11/2022] Open
Abstract
A number of previous studies have predicted transcription factor binding sites (TFBSs) by exploiting the position of genomic landmarks like the transcriptional start site (TSS). The studies’ methods are generally too computationally intensive for genome-scale investigation, so the full potential of ‘positional regulomics’ to discover TFBSs and determine their function remains unknown. Because databases often annotate the genomic landmarks in DNA sequences, the methodical exploitation of positional regulomics has become increasingly urgent. Accordingly, we examined a set of 7914 human putative promoter regions (PPRs) with a known TSS. Our methods identified 1226 eight-letter DNA words with significant positional preferences with respect to the TSS, of which only 608 of the 1226 words matched known TFBSs. Many groups of genes whose PPRs contained a common word displayed similar expression profiles and related biological functions, however. Most interestingly, our results included 78 words, each of which clustered significantly in two or three different positions relative to the TSS. Often, the gene groups corresponding to different positional clusters of the same word corresponded to diverse functions, e.g. activation or repression in different tissues. Thus, different clusters of the same word likely reflect the phenomenon of ‘positional regulation’, i.e. a word's regulatory function can vary with its position relative to a genomic landmark, a conclusion inaccessible to methods based purely on sequence. Further integrative analysis of words co-occurring in PPRs also yielded 24 different groups of genes, likely identifying cis-regulatory modules de novo. Whereas comparative genomics requires precise sequence alignments, positional regulomics exploits genomic landmarks to provide a ‘poor man's alignment’. By exploiting the phenomenon of positional regulation, it uses position to differentiate the biological functions of subsets of TFBSs sharing a common sequence motif.
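To make the idea of a positional word count concrete, the sketch below tallies every eight-letter DNA word by its offset from the TSS across a set of promoter sequences. It only illustrates the counting step: the function name `word_offsets`, the `(sequence, tss_index)` input layout, and the choice to skip words containing non-ACGT characters are assumptions of this example, and the statistical test the authors used to call a positional preference significant is not reproduced here.

```python
from collections import Counter, defaultdict

DNA_LETTERS = set("ACGT")


def word_offsets(promoters, k=8):
    """Count k-letter words by start offset relative to the TSS.

    promoters: iterable of (sequence, tss_index) pairs, where tss_index is
    the 0-based position of the TSS within the sequence (hypothetical layout).
    Returns a mapping: word -> Counter of {offset: occurrences}.
    """
    counts = defaultdict(Counter)
    for sequence, tss_index in promoters:
        sequence = sequence.upper()
        for i in range(len(sequence) - k + 1):
            word = sequence[i:i + k]
            # Skip words with ambiguous bases such as N.
            if set(word) <= DNA_LETTERS:
                # Negative offsets lie upstream of the TSS.
                counts[word][i - tss_index] += 1
    return counts


if __name__ == "__main__":
    example = [("ACGTACGTAA", 4), ("TTACGTACGT", 6)]
    table = word_offsets(example)
    print(dict(table["ACGTACGT"]))
```

A word whose counts pile up at one or a few offsets, rather than spreading evenly, is the kind of candidate the abstract describes; deciding whether that clustering is significant needs a separate test.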
Affiliation(s)
- Kannan Tharakaraman
- Computational Biology Branch, National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, MSC 6075 Bethesda, MD 20894-6075, USA
25
Human cis natural antisense transcripts initiated by transposable elements. Trends Genet 2008; 24:53-6. [DOI: 10.1016/j.tig.2007.11.008] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.3] [Received: 08/06/2007] [Revised: 11/02/2007] [Accepted: 11/21/2007] [Indexed: 11/21/2022]
26
Piriyapongsa J, Mariño-Ramírez L, Jordan IK. Origin and evolution of human microRNAs from transposable elements. Genetics 2007; 176:1323-37. [PMID: 17435244 PMCID: PMC1894593 DOI: 10.1534/genetics.107.072553] [Citation(s) in RCA: 254] [Impact Index Per Article: 14.9] [Indexed: 12/19/2022] Open
Abstract
We sought to evaluate the extent of the contribution of transposable elements (TEs) to human microRNA (miRNA) genes along with the evolutionary dynamics of TE-derived human miRNAs. We found 55 experimentally characterized human miRNA genes that are derived from TEs, and these TE-derived miRNAs have the potential to regulate thousands of human genes. Sequence comparisons revealed that TE-derived human miRNAs are less conserved, on average, than non-TE-derived miRNAs. However, there are 18 TE-derived miRNAs that are relatively conserved, and 14 of these are related to the ancient L2 and MIR families. Comparison of miRNA vs. mRNA expression patterns for TE-derived miRNAs and their putative target genes showed numerous cases of anti-correlated expression that are consistent with regulation via mRNA degradation. In addition to the known human miRNAs that we show to be derived from TE sequences, we predict an additional 85 novel TE-derived miRNA genes. TE sequences are typically disregarded in genomic surveys for miRNA genes and target sites; this is a mistake. Our results indicate that TEs provide a natural mechanism for the origination of miRNAs that can contribute to regulatory divergence between species as well as a rich source for the discovery of as yet unknown miRNA genes.
Affiliation(s)
- Jittima Piriyapongsa
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30332 and National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20894
- Leonardo Mariño-Ramírez
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30332 and National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20894
- I. King Jordan
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30332 and National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20894
- Corresponding author: School of Biology, Georgia Institute of Technology, 310 Ferst Dr., Atlanta, GA 30332-0230. E-mail:
27
Parris GE. Mechanism and history of evolution of symbiotic HIV strains into lethal pandemic strains: the key event may have been a 1927 trial of pamaquine in Leopoldville (Kinshasa), Congo. Med Hypotheses 2007; 69:838-48. [PMID: 17368749 DOI: 10.1016/j.mehy.2007.01.073] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 01/22/2007] [Accepted: 01/24/2007] [Indexed: 02/04/2023]
Abstract
In previous papers, I have rejected both the zoonosis and the serial transfer hypotheses of the origin and evolution of the current lethal pandemic strains of HIV. The hypothesis that fits the critical observations is that all the human and nonhuman primate species in central Africa (an area of hyper-endemic malaria) have shared (through inter-species transfers) a "primate T-cell retrovirus" (PTRV), which has adapted to each host species. This retrovirus is believed to assist primate T-cells attack the liver stage of the malaria infection. Each geographic region has a dominant primate host and a characteristic virus. Starting in 1955 and continuing into the late 1970s, chloroquine was provided by the WHO and used for prophylaxis against malaria. Chloroquine has a number of biochemical activities but two of the most important are blocking transcription of cellular genes and proviruses activated by NF-kappaB and blocking the glycosylation of surface proteins on viruses and cells. Concurrent with the development of resistance of the malaria parasite to chloroquine, HIV strains were quickly selected, which have enhanced transcription rates (by inclusion of multiple kappaB binding sites in their long terminal repeats by recombination) and enhanced infectivity (fusogenicity) (most likely by mutations in multiple viral genes that regulate glycosylation of Env). There also may have been mutations that enhanced activation of NF-kappaB in the host cell. These changes in the retrovirus genome were not manifest in effects of the HIV strains as long as the hosts were under the influence of chloroquine. But, when the virus infects people who are not protected by chloroquine, the virus multiplies more rapidly and is more communicable. 
Fortunately, most of these strains (i.e., HIV-2 groups, and HIV-1 O and HIV-1 N) self-regulate (i.e., infected cells kill infected cells) well enough that viral loads remain subdued and bystander cells of the immune system are not excessively attrited. In the case of HIV-1 group M, however, there is more going on. Following the work of Korber et al. on the phylogenetics of HIV-1 group M, I reach the conclusion that the major subgroups giving rise to the worldwide pandemic were founded in a 1927 clinical trial of pamaquine (plasmoquine) in Leopoldville (Kinshasa). This drug is much more toxic than chloroquine and appears to have strongly selected for resistance to apoptosis in infected cells, which allows these subgroups to attrite bystander cells, leading to AIDS.