Reference Citation Analysis

For: Kamal M, Xie X, Lander ES. A large family of ancient repeat elements in the human genome is under strong selection. Proc Natl Acad Sci U S A 2006;103:2740-5. [PMID: 16477033 PMCID: PMC1413850 DOI: 10.1073/pnas.0511238103] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.3] [Indexed: 01/03/2023] Open

Cited by Other Article(s)

Frith MC. Paleozoic Protein Fossils Illuminate the Evolution of Vertebrate Genomes and Transposable Elements. Mol Biol Evol 2022;39:6555113. [PMID: 35348724 PMCID: PMC9004415 DOI: 10.1093/molbev/msac068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open

Berrio A, Haygood R, Wray GA. Identifying branch-specific positive selection throughout the regulatory genome using an appropriate proxy neutral. BMC Genomics 2020;21:359. [PMID: 32404186 PMCID: PMC7222330 DOI: 10.1186/s12864-020-6752-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/22/2019] [Accepted: 04/21/2020] [Indexed: 01/09/2023] Open

Zeng Y, Cao Y, Halevy RS, Nguyen P, Liu D, Zhang X, Ahituv N, Han JDJ. Characterization of functional transposable element enhancers in acute myeloid leukemia. Sci China Life Sci 2020;63:675-687. [PMID: 32170627 DOI: 10.1007/s11427-019-1574-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/10/2019] [Accepted: 10/24/2019] [Indexed: 12/15/2022]

Affiliation(s)

Yingying Zeng CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences Center for Excellence in Molecular Cell Science, Collaborative Innovation Center for Genetics and Developmental Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
Yaqiang Cao CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences Center for Excellence in Molecular Cell Science, Collaborative Innovation Center for Genetics and Developmental Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
Rivka Sukenik Halevy Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, 94158, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, 94143, USA; Sackler School of Medicine, Tel-Aviv University, Tel Aviv, 6997801, Israel
Picard Nguyen Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, 94158, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, 94143, USA
Denghui Liu CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences Center for Excellence in Molecular Cell Science, Collaborative Innovation Center for Genetics and Developmental Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
Xiaoli Zhang CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences Center for Excellence in Molecular Cell Science, Collaborative Innovation Center for Genetics and Developmental Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
Nadav Ahituv Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, 94158, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, 94143, USA
Jing-Dong J Han CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences Center for Excellence in Molecular Cell Science, Collaborative Innovation Center for Genetics and Developmental Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology, Peking University, Beijing, 100871, China

Goerner-Potvin P, Bourque G. Computational tools to unmask transposable elements. Nat Rev Genet 2018;19:688-704. [DOI: 10.1038/s41576-018-0050-x] [Citation(s) in RCA: 126] [Impact Index Per Article: 21.0] [Indexed: 12/18/2022]

Buckley RM, Kortschak RD, Adelson DL. Divergent genome evolution caused by regional variation in DNA gain and loss between human and mouse. PLoS Comput Biol 2018;14:e1006091. [PMID: 29677183 PMCID: PMC5931693 DOI: 10.1371/journal.pcbi.1006091] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/06/2017] [Revised: 05/02/2018] [Accepted: 03/15/2018] [Indexed: 12/31/2022] Open

Venuto D, Bourque G. Identifying co-opted transposable elements using comparative epigenomics. Dev Growth Differ 2018;60:53-62. [PMID: 29363107 DOI: 10.1111/dgd.12423] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/15/2017] [Accepted: 12/08/2017] [Indexed: 12/19/2022]

Polychronopoulos D, King JWD, Nash AJ, Tan G, Lenhard B. Conserved non-coding elements: developmental gene regulation meets genome organization. Nucleic Acids Res 2018;45:12611-12624. [PMID: 29121339 PMCID: PMC5728398 DOI: 10.1093/nar/gkx1074] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 07/26/2017] [Accepted: 10/24/2017] [Indexed: 12/20/2022] Open

Harmston N, Ing-Simmons E, Tan G, Perry M, Merkenschlager M, Lenhard B. Topologically associating domains are ancient features that coincide with Metazoan clusters of extreme noncoding conservation. Nat Commun 2017;8:441. [PMID: 28874668 PMCID: PMC5585340 DOI: 10.1038/s41467-017-00524-5] [Citation(s) in RCA: 108] [Impact Index Per Article: 15.4] [Received: 10/23/2016] [Accepted: 07/05/2017] [Indexed: 02/08/2023] Open

Rayan NA, Del Rosario RCH, Prabhakar S. Massive contribution of transposable elements to mammalian regulatory sequences. Semin Cell Dev Biol 2016;57:51-56. [PMID: 27174439 DOI: 10.1016/j.semcdb.2016.05.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/20/2016] [Revised: 05/06/2016] [Accepted: 05/06/2016] [Indexed: 12/17/2022]

Chandrashekar DS, Dey P, Acharya KK. GREAM: A Web Server to Short-List Potentially Important Genomic Repeat Elements Based on Over-/Under-Representation in Specific Chromosomal Locations, Such as the Gene Neighborhoods, within or across 17 Mammalian Species. PLoS One 2015. [PMID: 26208093 PMCID: PMC4514817 DOI: 10.1371/journal.pone.0133647] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/26/2022] Open

Abstract

Background

Genome-wide repeat sequences, such as LINEs, SINEs and LTRs, make up a considerable part of mammalian nuclear genomes. These repeat elements seem to be important for multiple functions, including the regulation of transcription initiation, alternative splicing and DNA methylation. However, it is not possible to study all repeats, so it would help to short-list them before exploring their potential functional significance via experimental studies and/or detailed in silico analyses.

Result

We developed the ‘Genomic Repeat Element Analyzer for Mammals’ (GREAM) for analysis, screening and selection of potentially important mammalian genomic repeats. This web-server offers many novel utilities. For example, this is the only tool that can reveal a categorized list of specific types of transposons, retro-transposons and other genome-wide repetitive elements that are statistically over-/under-represented in regions around a set of genes, such as those expressed differentially in a disease condition. The output displays the position and frequency of identified elements within the specified regions. In addition, GREAM offers two other types of analyses of genomic repeat sequences: a) enrichment within chromosomal region(s) of interest, and b) comparative distribution across the neighborhood of orthologous genes. GREAM successfully short-listed a repeat element (MER20) known to contain functional motifs. In other case studies, we could use GREAM to short-list repetitive elements in the azoospermia factor a (AZFa) region of the human Y chromosome and those around the genes associated with rat liver injury. GREAM could also identify five over-represented repeats around some of the human and mouse transcription factor coding genes that had conserved expression patterns across the two species.

Conclusion

GREAM has been developed to provide an impetus to research on the role of repetitive sequences in mammalian genomes by offering easy selection of more interesting repeats in various contexts/regions. GREAM is freely available at http://resource.ibab.ac.in/GREAM/.
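As an illustration of what "statistically over-/under-represented" can mean in practice, the following is a minimal sketch of a one-sided binomial test applied in both directions. The abstract does not state which statistical test GREAM uses, so the choice of a binomial model, the function names, the significance threshold and the example numbers are all assumptions made for illustration only.

```python
from math import comb


def binomial_tail(k, n, p, upper=True):
    """Return P(X >= k) if upper, else P(X <= k), for X ~ Binomial(n, p)."""
    terms = range(k, n + 1) if upper else range(0, k + 1)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in terms)


def representation_test(observed, n_regions, background_rate, alpha=0.05):
    """Classify a repeat as over-/under-represented (hypothetical helper).

    observed        -- number of query regions containing the repeat
    n_regions       -- total number of query regions (e.g. gene neighborhoods)
    background_rate -- assumed genome-wide probability that a region of the
                       same size contains the repeat
    Each direction is tested one-sidedly at level alpha.
    """
    p_over = binomial_tail(observed, n_regions, background_rate, upper=True)
    p_under = binomial_tail(observed, n_regions, background_rate, upper=False)
    if p_over < alpha:
        label = "over-represented"
    elif p_under < alpha:
        label = "under-represented"
    else:
        label = "not significant"
    return label, p_over, p_under


# Made-up numbers: the repeat occurs near 12 of 40 genes of interest,
# against an assumed background rate of 10% per neighborhood.
label, p_over, p_under = representation_test(12, 40, 0.10)
print(label, p_over, p_under)
```

When many repeat families are screened at once, a multiple-testing correction would normally be applied to these p-values; it is omitted here to keep the sketch short.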

Lynch VJ, Nnamani MC, Kapusta A, Brayer K, Plaza SL, Mazur EC, Emera D, Sheikh SZ, Grützner F, Bauersachs S, Graf A, Young SL, Lieb JD, DeMayo FJ, Feschotte C, Wagner GP. Ancient transposable elements transformed the uterine regulatory landscape and transcriptome during the evolution of mammalian pregnancy. Cell Rep 2015;10:551-61. [PMID: 25640180 PMCID: PMC4447085 DOI: 10.1016/j.celrep.2014.12.052] [Citation(s) in RCA: 181] [Impact Index Per Article: 20.1] [Received: 02/21/2014] [Revised: 11/14/2014] [Accepted: 12/22/2014] [Indexed: 11/24/2022] Open

Affiliation(s)

Vincent J Lynch Department of Human Genetics, The University of Chicago, 920 East 58th Street, CLSC 319C, Chicago, IL 60637, USA
Mauris C Nnamani Yale Systems Biology Institute and Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511, USA
Aurélie Kapusta Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
Kathryn Brayer Yale Systems Biology Institute and Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511, USA
Silvia L Plaza Yale Systems Biology Institute and Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511, USA
Erik C Mazur Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
Deena Emera Yale Systems Biology Institute and Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511, USA
Shehzad Z Sheikh Division of Gastroenterology and Hepatology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Frank Grützner The Robinson Institute, School of Molecular and Biomedical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
Stefan Bauersachs Laboratory for Functional Genome Analysis (LAFUGA), Gene Center, LMU Munich, Feodor Lynen Strasse 25, 81377 Munich, Germany
Alexander Graf Laboratory for Functional Genome Analysis (LAFUGA), Gene Center, LMU Munich, Feodor Lynen Strasse 25, 81377 Munich, Germany
Steven L Young Department of Obstetrics and Gynecology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27705, USA
Jason D Lieb Department of Human Genetics, The University of Chicago, 920 East 58th Street, CLSC 319C, Chicago, IL 60637, USA
Francesco J DeMayo Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA; Department of Obstetrics and Gynecology, Baylor College of Medicine, Houston, TX 77030, USA
Cédric Feschotte Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
Günter P Wagner Yale Systems Biology Institute and Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511, USA

del Rosario RCH, Rayan NA, Prabhakar S. Noncoding origins of anthropoid traits and a new null model of transposon functionalization. Genome Res 2014;24:1469-84. [PMID: 25043600 PMCID: PMC4158753 DOI: 10.1101/gr.168963.113] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/24/2022]

Wilkins AS, Wrangham RW, Fitch WT. The "domestication syndrome" in mammals: a unified explanation based on neural crest cell behavior and genetics. Genetics 2014;197:795-808. [PMID: 25024034 PMCID: PMC4096361 DOI: 10.1534/genetics.114.165423] [Citation(s) in RCA: 344] [Impact Index Per Article: 34.4] [Indexed: 11/24/2022] Open

Defining functional DNA elements in the human genome. Proc Natl Acad Sci U S A 2014;111:6131-8. [PMID: 24753594 DOI: 10.1073/pnas.1318948111] [Citation(s) in RCA: 454] [Impact Index Per Article: 45.4] [Indexed: 12/12/2022] Open

He S, Gu W, Li Y, Zhu H. ANRIL/CDKN2B-AS shows two-stage clade-specific evolution and becomes conserved after transposon insertions in simians. BMC Evol Biol 2013;13:247. [PMID: 24225082 PMCID: PMC3831594 DOI: 10.1186/1471-2148-13-247] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 08/06/2013] [Accepted: 11/08/2013] [Indexed: 02/07/2023] Open

Harmston N, Baresic A, Lenhard B. The mystery of extreme non-coding conservation. Philos Trans R Soc Lond B Biol Sci 2013;368:20130021. [PMID: 24218634 PMCID: PMC3826495 DOI: 10.1098/rstb.2013.0021] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Indexed: 12/12/2022] Open

Wenger AM, Clarke SL, Notwell JH, Chung T, Tuteja G, Guturu H, Schaar BT, Bejerano G. The enhancer landscape during early neocortical development reveals patterns of dense regulation and co-option. PLoS Genet 2013;9:e1003728. [PMID: 24009522 PMCID: PMC3757057 DOI: 10.1371/journal.pgen.1003728] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 01/10/2013] [Accepted: 07/03/2013] [Indexed: 11/18/2022] Open

Jacques PÉ, Jeyakani J, Bourque G. The majority of primate-specific regulatory sequences are derived from transposable elements. PLoS Genet 2013;9:e1003504. [PMID: 23675311 PMCID: PMC3649963 DOI: 10.1371/journal.pgen.1003504] [Citation(s) in RCA: 222] [Impact Index Per Article: 20.2] [Received: 11/21/2012] [Accepted: 03/25/2013] [Indexed: 11/18/2022] Open

Matvienko M, Kozik A, Froenicke L, Lavelle D, Martineau B, Perroud B, Michelmore R. Consequences of normalizing transcriptomic and genomic libraries of plant genomes using a duplex-specific nuclease and tetramethylammonium chloride. PLoS One 2013;8:e55913. [PMID: 23409088 PMCID: PMC3568094 DOI: 10.1371/journal.pone.0055913] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 08/28/2012] [Accepted: 01/04/2013] [Indexed: 12/22/2022] Open

Simcha D, Price ND, Geman D. The limits of de novo DNA motif discovery. PLoS One 2012;7:e47836. [PMID: 23144830 PMCID: PMC3492406 DOI: 10.1371/journal.pone.0047836] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 08/09/2012] [Accepted: 09/21/2012] [Indexed: 12/02/2022] Open

Abstract

A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify “motifs” that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery, that is, searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA “background” sequence model to represent the null hypothesis and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are “too null,” resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed by held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where “ground truth” is known, discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced “over-fitting” in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not active biologically. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary. An implementation of the LR and ALR algorithms is available at http://code.google.com/p/likelihood-ratio-motifs/.

Distinct groups of repetitive families preserved in mammals correspond to different periods of regulatory innovations in vertebrates. Biol Direct 2012;7:36. [PMID: 23098210 PMCID: PMC3500645 DOI: 10.1186/1745-6150-7-36] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 07/09/2012] [Accepted: 10/23/2012] [Indexed: 12/16/2022] Open

Testori A, Caizzi L, Cutrupi S, Friard O, De Bortoli M, Corà D, Caselle M. The role of Transposable Elements in shaping the combinatorial interaction of Transcription Factors. BMC Genomics 2012;13:400. [PMID: 22897927 PMCID: PMC3478180 DOI: 10.1186/1471-2164-13-400] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 04/12/2012] [Accepted: 06/28/2012] [Indexed: 12/22/2022] Open

Abstract

Background

In the last few years several studies have shown that Transposable Elements (TEs) in the human genome are significantly associated with Transcription Factor Binding Sites (TFBSs) and that in several cases their expansion within the genome led to a substantial rewiring of the regulatory network. Another important feature of the regulatory network which has been thoroughly studied is the combinatorial organization of transcriptional regulation. In this paper we combine these two observations and suggest that TEs, besides rewiring the network, also played a central role in the evolution of particular patterns of combinatorial gene regulation.

Results

To address this issue we searched for TEs overlapping Estrogen Receptor α (ERα) binding peaks in two publicly available ChIP-seq datasets from the MCF7 cell line corresponding to different modalities of exposure to estrogen. We found a remarkable enrichment of a few specific classes of transposons. Among these, a prominent role was played by MIR (Mammalian Interspersed Repeats) transposons. These TEs underwent a dramatic expansion at the beginning of the mammalian radiation and then stabilized. We conjecture that the special affinity of ERα for the MIR class of TEs could be at the origin of the important role assumed by ERα in mammals. We then searched for TFBSs within the TEs overlapping ChIP-seq peaks. We found a strong enrichment of a few precise combinations of TFBSs. In several cases the corresponding Transcription Factors (TFs) were known cofactors of ERα, thus supporting the idea of a co-regulatory role of TFBSs within the same TE. Moreover, most of these correlations turned out to be strictly associated with specific classes of TEs, thus suggesting the presence of a well-defined "transposon code" within the regulatory network.

Conclusions

In this work we tried to shed light on the role of Transposable Elements (TEs) in shaping the regulatory network of higher eukaryotes. To test this idea we focused on a particular transcription factor, the Estrogen Receptor α (ERα), and we found that ERα preferentially targets a well-defined set of TEs and that these TEs host combinations of transcriptional regulators involving several known co-regulators of ERα. Moreover, a significant number of these TEs turned out to be conserved between human and mouse and located in the vicinity of important estrogen-related genes, making them candidate regulators of those genes.

Tashiro K, Teissier A, Kobayashi N, Nakanishi A, Sasaki T, Yan K, Tarabykin V, Vigier L, Sumiyama K, Hirakawa M, Nishihara H, Pierani A, Okada N. A mammalian conserved element derived from SINE displays enhancer properties recapitulating Satb2 expression in early-born callosal projection neurons. PLoS One 2011;6:e28497. [PMID: 22174821 PMCID: PMC3234267 DOI: 10.1371/journal.pone.0028497] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 07/26/2011] [Accepted: 11/09/2011] [Indexed: 02/04/2023] Open

Abstract

Short interspersed repetitive elements (SINEs) are highly repeated sequences that account for a significant proportion of many eukaryotic genomes and are usually considered "junk DNA". However, we previously discovered that many AmnSINE1 loci are evolutionarily conserved across mammalian genomes, suggesting that they may have acquired significant functions involved in controlling mammalian-specific traits. Notably, we identified the AS021 SINE locus, located 390 kbp upstream of Satb2. Using transgenic mice, we showed that this SINE displays specific enhancer activity in the developing cerebral cortex. The transcription factor Satb2 is expressed by cortical neurons extending axons through the corpus callosum and is a determinant of callosal versus subcortical projection. Mouse mutants reveal a crucial function for Satb2 in corpus callosum formation. In this study, we compared the enhancer activity of the AS021 locus with Satb2 expression during telencephalic development in the mouse. First, we showed that the AS021 enhancer is specifically activated in early-born Satb2(+) neurons. Second, we demonstrated that the activity of the AS021 enhancer recapitulates the expression of Satb2 at later embryonic and postnatal stages in deep-layer but not superficial-layer neurons, suggesting the possibility that the expression of Satb2 in these two subpopulations of cortical neurons is under genetically distinct transcriptional control. Third, we showed that the AS021 enhancer is activated in neurons projecting through the corpus callosum, as described for Satb2(+) neurons. Notably, AS021 drives specific expression in axons crossing through the ventral (TAG1(-)/NPY(+)) portion of the corpus callosum, confirming that it is active in a subpopulation of callosal neurons. These data suggest that exaptation of the AS021 SINE locus might be involved in enhancement of Satb2 expression, leading to the establishment of interhemispheric communication via the corpus callosum, a eutherian-specific brain structure.

Franchini LF, López-Leal R, Nasif S, Beati P, Gelman DM, Low MJ, de Souza FJS, Rubinstein M. Convergent evolution of two mammalian neuronal enhancers by sequential exaptation of unrelated retroposons. Proc Natl Acad Sci U S A 2011;108:15270-5. [PMID: 21876128 PMCID: PMC3174587 DOI: 10.1073/pnas.1104997108] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Indexed: 02/04/2023] Open

Shin C, Nam JW, Farh KKH, Chiang HR, Shkumatava A, Bartel DP. Expanding the microRNA targeting code: functional sites with centered pairing. Mol Cell 2010;38:789-802. [PMID: 20620952 DOI: 10.1016/j.molcel.2010.06.005] [Citation(s) in RCA: 450] [Impact Index Per Article: 32.1] [Received: 12/22/2009] [Revised: 04/27/2010] [Accepted: 06/03/2010] [Indexed: 12/30/2022]

Paquet Y, Anderson A. Sequence composition similarities with the 7SL RNA are highly predictive of functional genomic features. Nucleic Acids Res 2010;38:4907-16. [PMID: 20392819 PMCID: PMC2926601 DOI: 10.1093/nar/gkq234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022] Open

Jung CH, Makunin IV, Mattick JS. Identification of conserved Drosophila-specific euchromatin-restricted non-coding sequence motifs. Genomics 2010;96:154-66. [PMID: 20595017 DOI: 10.1016/j.ygeno.2010.05.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/23/2010] [Revised: 05/25/2010] [Accepted: 05/26/2010] [Indexed: 01/19/2023]

Warnefors M, Pereira V, Eyre-Walker A. Transposable Elements: Insertion Pattern and Impact on Gene Expression Evolution in Hominids. Mol Biol Evol 2010;27:1955-62. [DOI: 10.1093/molbev/msq084] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 12/28/2022] Open

Eory L, Halligan DL, Keightley PD. Distributions of selectively constrained sites and deleterious mutation rates in the hominid and murid genomes. Mol Biol Evol 2010;27:177-92. [PMID: 19759235 DOI: 10.1093/molbev/msp219] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.8] [Indexed: 12/30/2022] Open

Abstract

Protein-coding sequences make up only about 1% of the mammalian genome. Much of the remaining 99% has been long assumed to be junk DNA, with little or no functional significance. Here, we show that in hominids, a group with historically low effective population sizes, all classes of noncoding DNA evolve more slowly than ancestral transposable elements and so appear to be subject to significant evolutionary constraints. Under the nearly neutral theory, we expected to see lower levels of selective constraints on most sequence types in hominids than murids, a group that is thought to have a higher effective population size. We found that this is the case for many sequence types examined, the most extreme example being 5'UTRs, for which constraint in hominids is only about one-third that of murids. Surprisingly, however, we observed higher constraints for some sequence types in hominids, notably 4-fold sites, where constraint is more than twice as high as in murids. This implies that more than about one-fifth of mutations at 4-fold sites are effectively selected against in hominids. The higher constraint at 4-fold sites in hominids suggests a more complex protein-coding gene structure than murids and indicates that methods for detecting selection on protein-coding sequences (e.g., using the d(N)/d(S) ratio), with 4-fold sites as a neutral standard, may lead to biased estimates, particularly in hominids. Our constraint estimates imply that 5.4% of nucleotide sites in the human genome are subject to effective negative selection and that there are three times as many constrained sites within noncoding sequences as within protein-coding sequences. Including coding and noncoding sites, we estimate that the genomic deleterious mutation rate U = 4.2. The mutational load predicted under a multiplicative model is therefore about 99% in hominids.
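The closing figure can be checked with a short calculation. This is a sketch under the standard assumption (not spelled out in the abstract) that, with multiplicative fitness effects and a Poisson-distributed number of new deleterious mutations per genome, equilibrium mean fitness relative to a mutation-free genotype is e^{-U}; the symbol L for the load is introduced here for illustration.

```latex
% Mutational load under a multiplicative model, with U = 4.2 from the abstract
\[
  L \;=\; 1 - \bar{w} \;=\; 1 - e^{-U}
    \;=\; 1 - e^{-4.2}
    \;\approx\; 1 - 0.015
    \;=\; 0.985
\]
```

A load of roughly 0.985 is consistent with the "about 99%" quoted in the abstract.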

Ponicsan SL, Kugel JF, Goodrich JA. Genomic gems: SINE RNAs regulate mRNA production. Curr Opin Genet Dev 2010;20:149-55. [PMID: 20176473 DOI: 10.1016/j.gde.2010.01.004] [Citation(s) in RCA: 95] [Impact Index Per Article: 6.8] [Received: 11/25/2009] [Revised: 01/15/2010] [Accepted: 01/24/2010] [Indexed: 01/22/2023]

Transposable elements in gene regulation and in the evolution of vertebrate genomes. Curr Opin Genet Dev 2009;19:607-12. [PMID: 19914058 DOI: 10.1016/j.gde.2009.10.013] [Citation(s) in RCA: 112] [Impact Index Per Article: 7.5] [Received: 09/16/2009] [Revised: 10/20/2009] [Accepted: 10/26/2009] [Indexed: 01/30/2023]

Cordaux R, Batzer MA. The impact of retrotransposons on human genome evolution. Nat Rev Genet 2009;10:691-703. [PMID: 19763152 DOI: 10.1038/nrg2640] [Citation(s) in RCA: 1127] [Impact Index Per Article: 75.1] [Indexed: 02/07/2023]

The plasticity of the mammalian transcriptome. Genomics 2009;95:1-6. [PMID: 19716875 DOI: 10.1016/j.ygeno.2009.08.010] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Received: 03/26/2009] [Revised: 08/05/2009] [Accepted: 08/22/2009] [Indexed: 11/28/2022]

Garber M, Guttman M, Clamp M, Zody MC, Friedman N, Xie X. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 2009;25:i54-62. [PMID: 19478016 PMCID: PMC2687944 DOI: 10.1093/bioinformatics/btp190] [Citation(s) in RCA: 248] [Impact Index Per Article: 16.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]

Wang J, Bowen NJ, Mariño-Ramírez L, Jordan IK. A c-Myc regulatory subnetwork from human transposable element sequences. Mol Biosyst 2009;5:1831-9. [PMID: 19763338 DOI: 10.1039/b908494k] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/29/2023]

Fu W, Ray P, Xing EP. DISCOVER: a feature-based discriminative method for motif search in complex genomes. Bioinformatics 2009;25:i321-9. [PMID: 19478006 PMCID: PMC2687984 DOI: 10.1093/bioinformatics/btp230] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Abstract

MOTIVATION

Identifying transcription factor binding sites (TFBSs) encoding complex regulatory signals in metazoan genomes remains a challenging problem in computational genomics. Due to degeneracy of nucleotide content among binding site instances or motifs, and intricate 'grammatical organization' of motifs within cis-regulatory modules (CRMs), extant pattern matching-based in silico motif search methods often suffer from impractically high false positive rates, especially in the context of analyzing large genomic datasets, and noisy position weight matrices which characterize binding sites. Here, we try to address this problem by using a framework to maximally utilize the information content of the genomic DNA in the region of query, taking cues from values of various biologically meaningful genetic and epigenetic factors in the query region such as clade-specific evolutionary parameters, presence/absence of nearby coding regions, etc. We present a new method for TFBS prediction in metazoan genomes that utilizes both the CRM architecture of sequences and a variety of features of individual motifs. Our proposed approach is based on a discriminative probabilistic model known as conditional random fields that explicitly optimizes the predictive probability of motif presence in large sequences, based on the joint effect of all such features.

RESULTS

This model overcomes weaknesses in earlier methods based on less effective statistical formalisms that are sensitive to spurious signals in the data. We evaluate our method on both simulated CRMs and real Drosophila sequences in comparison with a wide spectrum of existing models, and outperform the state of the art by 22% in F1 score.

AVAILABILITY AND IMPLEMENTATION

The code is publicly available at http://www.sailing.cs.cmu.edu/discover.html.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


Imamura H, Karro JE, Chuang JH. Weak preservation of local neutral substitution rates across mammalian genomes. BMC Evol Biol 2009;9:89. [PMID: 19416516 PMCID: PMC2689173 DOI: 10.1186/1471-2148-9-89] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2008] [Accepted: 05/05/2009] [Indexed: 01/06/2023] Open

Abstract

Background

The rate at which neutral (non-functional) bases undergo substitution is highly dependent on their location within a genome. However, it is not clear how fast these location-dependent rates change, or to what extent the substitution rate patterns are conserved between lineages. To address this question, which is critical not only for understanding the substitution process but also for evaluating phylogenetic footprinting algorithms, we examine ancestral repeats: a predominantly neutral dataset with a significantly higher genomic density than other datasets commonly used to study substitution rate variation. Using this repeat data, we measure the extent to which orthologous ancestral repeat sequences exhibit similar substitution patterns in separate mammalian lineages, allowing us to ascertain how well local substitution rates have been preserved across species.

Results

We calculated substitution rates for each ancestral repeat in each of three independent mammalian lineages (primate – from human/macaque alignments, rodent – from mouse/rat alignments, and laurasiatheria – from dog/cow alignments). We then measured the correlation of local substitution rates among these lineages. Overall we found the correlations between lineages to be statistically significant, but too weak to have much predictive power (r²<5%). These correlations were found to be primarily driven by regional effects at the scale of several hundred kb or larger. A few repeat classes (e.g. 7SK, Charlie8, and MER121) also exhibited stronger conservation of rate patterns, likely due to the effect of repeat-specific purifying selection. These classes should be excluded when estimating local neutral substitution rates.
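The between-lineage comparison described here comes down to a squared Pearson correlation over per-repeat substitution rates. The sketch below shows that calculation only. The function name `r_squared` and the toy rate values are illustrative assumptions, not data or code from the study.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length rate vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Toy per-repeat rates for two lineages (hypothetical values, one entry per
# orthologous ancestral repeat).
primate_rates = [0.061, 0.058, 0.070, 0.064, 0.059]
rodent_rates = [0.160, 0.171, 0.158, 0.166, 0.169]

print(f"r^2 = {r_squared(primate_rates, rodent_rates):.3f}")
```

An r² below 0.05, as reported above, means that knowing a repeat's rate in one lineage explains less than 5% of the variance of its rate in another.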

Conclusion

Although local neutral substitution rates have some correlations among mammalian species, these correlations have little predictive power on the scale of individual repeats. This indicates that local substitution rates have changed significantly among the lineages we have studied, and are likely to have changed even more for more diverged lineages. The correlations that do persist are too weak to be responsible for many of the highly conserved elements found by phylogenetic footprinting algorithms, leading us to conclude that such elements must be conserved due to selective forces.


Xu L, Guo L, Shen Z, Loss G, Gish R, Wasilenko S, Mason AL. Duplication of MER115 on chromosome 4 in patients with primary biliary cirrhosis. Liver Int 2009;29:375-83. [PMID: 19018986 DOI: 10.1111/j.1478-3231.2008.01888.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]

Pereira V, Enard D, Eyre-Walker A. The effect of transposable element insertions on gene expression evolution in rodents. PLoS One 2009;4:e4321. [PMID: 19183808 PMCID: PMC2629548 DOI: 10.1371/journal.pone.0004321] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2008] [Accepted: 11/24/2008] [Indexed: 01/04/2023] Open

Hirakawa M, Nishihara H, Kanehisa M, Okada N. Characterization and evolutionary landscape of AmnSINE1 in Amniota genomes. Gene 2008;441:100-10. [PMID: 19166919 DOI: 10.1016/j.gene.2008.12.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2008] [Revised: 11/29/2008] [Accepted: 12/04/2008] [Indexed: 11/18/2022]

Baele G, Van de Peer Y, Vansteelandt S. A model-based approach to study nearest-neighbor influences reveals complex substitution patterns in non-coding sequences. Syst Biol 2008;57:675-92. [PMID: 18853356 DOI: 10.1080/10635150802422324] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023] Open

Xie HB, Irwin DM, Zhang YP. Evolution of conserved secondary structures and their function in transcriptional regulation networks. BMC Genomics 2008;9:520. [PMID: 18976501 PMCID: PMC2584662 DOI: 10.1186/1471-2164-9-520] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2008] [Accepted: 11/02/2008] [Indexed: 12/12/2022] Open

The opossum genome: insights and opportunities from an alternative mammal. Genome Res 2008;18:1199-215. [PMID: 18676819 DOI: 10.1101/gr.065326.107] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]

McCarroll SA, Huett A, Kuballa P, Chilewski SD, Landry A, Goyette P, Zody MC, Hall JL, Brant SR, Cho JH, Duerr RH, Silverberg MS, Taylor KD, Rioux JD, Altshuler D, Daly MJ, Xavier RJ. Deletion polymorphism upstream of IRGM associated with altered IRGM expression and Crohn's disease. Nat Genet 2008;40:1107-12. [PMID: 19165925 PMCID: PMC2731799 DOI: 10.1038/ng.215] [Citation(s) in RCA: 518] [Impact Index Per Article: 32.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

Bourque G, Leong B, Vega VB, Chen X, Lee YL, Srinivasan KG, Chew JL, Ruan Y, Wei CL, Ng HH, Liu ET. Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res 2008;18:1752-62. [PMID: 18682548 DOI: 10.1101/gr.080663.108] [Citation(s) in RCA: 416] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]

Giordano J, Ge Y, Gelfand Y, Abrusán G, Benson G, Warburton PE. Evolutionary history of mammalian transposons determined by genome-wide defragmentation. PLoS Comput Biol 2008;3:e137. [PMID: 17630829 PMCID: PMC1914374 DOI: 10.1371/journal.pcbi.0030137] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2007] [Accepted: 05/31/2007] [Indexed: 01/30/2023] Open

Abstract

The constant bombardment of mammalian genomes by transposable elements (TEs) has resulted in TEs comprising at least 45% of the human genome. Because of their great age and abundance, TEs are important in comparative phylogenomics. However, estimates of TE age were previously based on divergence from derived consensus sequences or phylogenetic analysis, which can be unreliable, especially for older more diverged elements. Therefore, a novel genome-wide analysis of TE organization and fragmentation was performed to estimate TE age independently of sequence composition and divergence or the assumption of a constant molecular clock. Analysis of TEs in the human genome revealed ∼600,000 examples where TEs have transposed into and fragmented other TEs, covering >40% of all TEs or ∼542 Mbp of genomic sequence. The relative age of these TEs over evolutionary time is implicit in their organization, because newer TEs have necessarily transposed into older TEs that were already present. A matrix of the number of times that each TE has transposed into every other TE was constructed, and a novel objective function was developed that derived the chronological order and relative ages of human TEs spanning >100 million years. This method has been used to infer the relative ages across all four major TE classes, including the oldest, most diverged elements. Analysis of DNA transposons over the history of the human genome has revealed the early activity of some MER2 transposons, and the relatively recent activity of MER1 transposons during primate lineages. The TEs from six additional mammalian genomes were defragmented and analyzed. Pairwise comparison of the independent chronological orders of TEs in these mammalian genomes revealed species phylogeny, the fact that transposons shared between genomes are older than species-specific transposons, and a subset of TEs that were potentially active during periods of speciation.

Transposable elements (TEs) are interspersed repetitive DNA families that are capable of copying themselves from place to place; they have literally infested our genome over evolutionary time, and now comprise as much as 45% of our total DNA. Because of their great age and abundance, TEs are important in evolutionary genomics. However, estimates of their age based on DNA sequence composition have been unreliable, especially for older, more diverged elements. Therefore, a novel method to estimate the age of TEs was developed based on the fact that as TEs spread throughout the genome, they inserted into and fragmented older TEs that were already present. Therefore, the age of TEs can be revealed by how often they have been fragmented over evolutionary time. We performed a genome-wide defragmentation of TEs, and developed a novel objective function to derive the chronological order of TEs spanning >100 million years. This method has been used to infer the relative ages of TEs from seven sequenced mammalian genomes across all four major TE classes, including the oldest, most diverged elements. This age estimate is independent of TE sequence composition or divergence and does not rely on the assumption of a constant molecular clock. This study provides a novel analysis of the evolutionary history of some of the most abundant and ancient repetitive DNA elements in mammalian genomes, which is important for understanding the dynamic forces that shape our genomes during evolution.
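The idea of reading relative age off an insertion matrix can be illustrated with a toy sketch. It is not the objective function from the paper, which the abstract does not specify. It assumes a simple stand-in: choose the ordering (oldest first) that minimizes the number of observed insertions in which the inserted family would have to be older than its host. The family names `A`, `B`, `C`, the counts, and the function names are hypothetical.

```python
from itertools import permutations


def violations(order, counts):
    """Count insertions that contradict `order`, which lists families oldest first."""
    rank = {family: i for i, family in enumerate(order)}
    total = 0
    for inserted, hosts in counts.items():
        for host, n in hosts.items():
            # An inserted family should be younger (higher rank) than its host.
            if rank[inserted] < rank[host]:
                total += n
    return total


def best_order(counts):
    """Brute-force search over all orderings; only feasible for a few families."""
    families = sorted(set(counts) | {h for hosts in counts.values() for h in hosts})
    return min(permutations(families), key=lambda order: violations(order, counts))


# counts[x][y] = number of times family x was found inserted into family y.
counts = {
    "C": {"A": 5, "B": 3},
    "B": {"A": 4, "C": 1},
    "A": {"B": 1},
}

print(best_order(counts))  # prints ('A', 'B', 'C')
```

Brute force is used only to keep the sketch short. With hundreds of TE families the number of permutations is intractable, so a real analysis would need a heuristic or an optimization method.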


Evolutionary rates and patterns for human transcription factor binding sites derived from repetitive DNA. BMC Genomics 2008;9:226. [PMID: 18485226 PMCID: PMC2397414 DOI: 10.1186/1471-2164-9-226] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2008] [Accepted: 05/17/2008] [Indexed: 12/14/2022] Open

Abstract

Background

The majority of human non-protein-coding DNA is made up of repetitive sequences, mainly transposable elements (TEs). It is becoming increasingly apparent that many of these repetitive DNA sequence elements encode gene regulatory functions. This fact has important evolutionary implications, since repetitive DNA is the most dynamic part of the genome. We set out to assess the evolutionary rate and pattern of experimentally characterized human transcription factor binding sites (TFBS) that are derived from repetitive versus non-repetitive DNA to test whether repeat-derived TFBS are in fact rapidly evolving. We also evaluated the position-specific patterns of variation among TFBS to look for signs of functional constraint on TFBS derived from repetitive and non-repetitive DNA.

Results

We found numerous experimentally characterized TFBS in the human genome, 7–10% of all mapped sites, which are derived from repetitive DNA sequences including simple sequence repeats (SSRs) and TEs. TE-derived TFBS sequences are far less conserved between species than TFBS derived from SSRs and non-repetitive DNA. Despite their rapid evolution, several lines of evidence indicate that TE-derived TFBS are functionally constrained. First of all, ancient TE families, such as MIR and L2, are enriched for TFBS relative to younger families like Alu and L1. Secondly, functionally important positions in TE-derived TFBS, specifically those residues thought to physically interact with their cognate protein binding factors (TF), are more evolutionarily conserved than adjacent TFBS positions. Finally, TE-derived TFBS show position-specific patterns of sequence variation that are highly distinct from random patterns and similar to the variation seen for non-repeat derived sequences of the same TFBS.

Conclusion

The abundance of experimentally characterized human TFBS that are derived from repetitive DNA speaks to the substantial regulatory effects that this class of sequence has on the human genome. The unique evolutionary properties of repeat-derived TFBS are perhaps even more intriguing. TE-derived TFBS in particular, while clearly functionally constrained, evolve extremely rapidly relative to non-repeat derived sites. Such rapidly evolving TFBS are likely to confer species-specific regulatory phenotypes, i.e. divergent expression patterns, on the human evolutionary lineage. This result has practical implications with respect to the widespread use of evolutionary conservation as a surrogate for functionally relevant non-coding DNA. Most TE-derived TFBS would be missed using the kinds of sequence conservation-based screens, such as phylogenetic footprinting, that are used to help characterize non-coding DNA. Thus, the very TFBS that are most likely to yield human-specific characteristics will be neglected by the comparative genomic techniques that are currently de rigueur for the identification of novel regulatory sites.


Feschotte C. Transposable elements and the evolution of regulatory networks. Nat Rev Genet 2008;9:397-405. [PMID: 18368054 DOI: 10.1038/nrg2337] [Citation(s) in RCA: 885] [Impact Index Per Article: 55.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]

Dingel J, Hanus P, Leonardi N, Hagenauer J, Zech J, Mueller JC. Local conservation scores without a priori assumptions on neutral substitution rates. BMC Bioinformatics 2008;9:190. [PMID: 18405366 PMCID: PMC2375903 DOI: 10.1186/1471-2105-9-190] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2007] [Accepted: 04/11/2008] [Indexed: 12/05/2022] Open

Abstract

Background

Comparative genomics aims to detect signals of evolutionary conservation as an indicator of functional constraint. Surprisingly, results of the ENCODE project revealed that about half of the experimentally verified functional elements found in non-coding DNA were classified as unconstrained by computational predictions. Following this observation, it has been hypothesized that this may be partly explained by biased estimates on neutral evolutionary rates used by existing sequence conservation metrics. All methods we are aware of rely on a comparison with the neutral rate and conservation is estimated by measuring the deviation of a particular genomic region from this rate. Consequently, it is a reasonable assumption that inaccurate neutral rate estimates may lead to biased conservation and constraint estimates.

Results

We propose a conservation signal that is produced by local Maximum Likelihood estimation of evolutionary parameters using an optimized sliding window and present a Kullback-Leibler projection that allows multiple different estimated parameters to be transformed into a conservation measure. This conservation measure does not rely on assumptions about neutral evolutionary substitution rates, and few a priori assumptions on the properties of the conserved regions are imposed. We show the accuracy of our approach (KuLCons) on synthetic data and compare it to the scores generated by state-of-the-art methods (phastCons, GERP, SCONE) in an ENCODE region. We find that KuLCons is most often in agreement with the conservation/constraint signatures detected by GERP and SCONE, while qualitatively very different patterns from phastCons are observed. Unlike standard methods, KuLCons can be extended to more complex evolutionary models, e.g. taking insertion and deletion events into account, and corresponding results show that scores obtained under this model can diverge significantly from scores using the simpler model.

Conclusion

Our results suggest that discriminating among the different degrees of conservation is possible without making assumptions about neutral rates. We find, however, that it cannot be expected to discover considerably different constraint regions than GERP and SCONE. Consequently, we conclude that the reported discrepancies between experimentally verified functional and computationally identified constraint elements are likely not to be explained by biased neutral rate estimates.


Approaches to comparative sequence analysis: towards a functional view of vertebrate genomes. Nat Rev Genet 2008;9:303-13. [PMID: 18347593 DOI: 10.1038/nrg2185] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]