1
Gonzalez P, Hauck QC, Baxevanis AD. Conserved Noncoding Elements Evolve Around the Same Genes Throughout Metazoan Evolution. Genome Biol Evol 2024; 16:evae052. [PMID: 38502060 PMCID: PMC10988421 DOI: 10.1093/gbe/evae052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 03/07/2024] [Accepted: 03/13/2024] [Indexed: 03/20/2024] Open
Abstract
Conserved noncoding elements (CNEs) are DNA sequences located outside of protein-coding genes that can remain under purifying selection for up to hundreds of millions of years. Studies in vertebrate genomes have revealed that most CNEs carry out regulatory functions. Notably, many of them are enhancers that control the expression of homeodomain transcription factors and other genes that play crucial roles in embryonic development. To further our knowledge of CNEs in other parts of the animal tree, we conducted a large-scale characterization of CNEs in more than 50 genomes from three of the main branches of the metazoan tree: Cnidaria, Mollusca, and Arthropoda. We identified hundreds of thousands of CNEs and reconstructed the temporal dynamics of their appearance in each lineage, as well as determining their spatial distribution across genomes. We show that CNEs evolve repeatedly around the same genes across the Metazoa, including around homeodomain genes and other transcription factors; they also evolve repeatedly around genes involved in neural development. We also show that transposons are a major source of CNEs, confirming previous observations from vertebrates and suggesting that they have played a major role in wiring developmental gene regulatory mechanisms since the dawn of animal evolution.
Affiliation(s)
- Paul Gonzalez
- Center for Genomics and Data Science Research, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Quinn C Hauck
- Center for Genomics and Data Science Research, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Andreas D Baxevanis
- Center for Genomics and Data Science Research, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
2
Bhutta MS, Awais M, Raouf A, Anjum A, Azam S, Shahid N, Malik K, Shahid AA, Rao AQ. Biosafety and toxicity assessment of transgenic cotton-harboring insecticide and herbicide tolerant genes on albino mice. Toxicol Res (Camb) 2024; 13:tfae043. [PMID: 38525247 PMCID: PMC10960071 DOI: 10.1093/toxres/tfae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 03/06/2024] [Accepted: 03/07/2024] [Indexed: 03/26/2024] Open
Abstract
Introduction: Genetic engineering has revolutionized agriculture by introducing biotic and abiotic stress-resistance genes into plants. The biosafety of GM crops is a major concern for consumers and regulatory authorities. Methodology: A 14-week biosafety and toxicity analysis of transgenic cotton containing five transgenes (Cry1Ac, Cry2A, CP4 EPSPS, VIP3Aa, and ASAL) was conducted on albino mice. Thirty mice were divided into three groups according to the feed given (conventional; non-transgenic, without Bt; and transgenic, containing the targeted crop), with 10 mice in each group (5 male and 5 female). Results: During the study, no biologically significant changes were observed in the non-transgenic and transgenic groups compared to the control group in any of the study's parameters (i.e., weight gain and physiological, pathological, and molecular analyses), irrespective of the sex of the mice. However, a statistically significant change was observed in the hematological parameters of the male mice, while no such change was observed in the female mice. Expression analysis, however, showed that TNF gene expression increased many-fold in the transgenic group compared to the non-transgenic and conventional groups. Conclusion: Overall, no physiological, pathological, or molecular toxicity was observed in the mice fed with transgenic feed. Therefore, it can be speculated that the targeted transgenic crop is biologically safe. However, further study using expression profiling is required to confirm the biosafety of the product in animals.
Affiliation(s)
- Muhammad Saad Bhutta
- Centre of Excellence in Molecular Biology, University of the Punjab, 87 West Canal Rd, Thokar Niaz Baig Sector 1, Lahore, Punjab 53700 Lahore, Pakistan
- Muhammad Awais
- Centre of Excellence in Molecular Biology, University of the Punjab, 87 West Canal Rd, Thokar Niaz Baig Sector 1, Lahore, Punjab 53700 Lahore, Pakistan
- Abdul Raouf
- Centre of Excellence in Molecular Biology, University of the Punjab, 87 West Canal Rd, Thokar Niaz Baig Sector 1, Lahore, Punjab 53700 Lahore, Pakistan
- Aqsa Anjum
- Department of Zoology, Government College Women University, Sialkot, 51310 Punjab, Pakistan
- Saira Azam
- Centre of Excellence in Molecular Biology, University of the Punjab, 87 West Canal Rd, Thokar Niaz Baig Sector 1, Lahore, Punjab 53700 Lahore, Pakistan
- Naila Shahid
- Centre of Excellence in Molecular Biology, University of the Punjab, 87 West Canal Rd, Thokar Niaz Baig Sector 1, Lahore, Punjab 53700 Lahore, Pakistan
- Kausar Malik
- Centre of Excellence in Molecular Biology, University of the Punjab, 87 West Canal Rd, Thokar Niaz Baig Sector 1, Lahore, Punjab 53700 Lahore, Pakistan
- Ahmed Ali Shahid
- Centre of Excellence in Molecular Biology, University of the Punjab, 87 West Canal Rd, Thokar Niaz Baig Sector 1, Lahore, Punjab 53700 Lahore, Pakistan
- Abdul Qayyum Rao
- Centre of Excellence in Molecular Biology, University of the Punjab, 87 West Canal Rd, Thokar Niaz Baig Sector 1, Lahore, Punjab 53700 Lahore, Pakistan
3
Zhu X, Ma S, Wong WH. Genetic effects of sequence-conserved enhancer-like elements on human complex traits. Genome Biol 2024; 25:1. [PMID: 38167462 PMCID: PMC10759394 DOI: 10.1186/s13059-023-03142-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/02/2022] [Accepted: 12/08/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND: The vast majority of findings from human genome-wide association studies (GWAS) map to non-coding sequences, complicating their mechanistic interpretations and clinical translations. Non-coding sequences that are evolutionarily conserved and biochemically active could offer clues to the mechanisms underpinning GWAS discoveries. However, genetic effects of such sequences have not been systematically examined across a wide range of human tissues and traits, hampering progress to fully understand regulatory causes of human complex traits. RESULTS: Here we develop a simple yet effective strategy to identify functional elements exhibiting high levels of human-mouse sequence conservation and enhancer-like biochemical activity, which scales well to 313 epigenomic datasets across 106 human tissues and cell types. Combined with 468 GWAS of European (EUR) and East Asian (EAS) ancestries, these elements show tissue-specific enrichments of heritability and causal variants for many traits, which are significantly stronger than enrichments based on enhancers without sequence conservation. These elements also help prioritize candidate genes that are functionally relevant to body mass index (BMI) and schizophrenia but were not reported in previous GWAS with large sample sizes. CONCLUSIONS: Our findings provide a comprehensive assessment of how sequence-conserved enhancer-like elements affect complex traits in diverse tissues and demonstrate a generalizable strategy of integrating evolutionary and biochemical data to elucidate human disease genetics.
Affiliation(s)
- Xiang Zhu
- Department of Statistics, The Pennsylvania State University, 326 Thomas Building, University Park, 16802, PA, USA.
- Huck Institutes of the Life Sciences, The Pennsylvania State University, 201 Huck Life Sciences Building, University Park, 16802, PA, USA.
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, 94305, CA, USA.
- Shining Ma
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, 94305, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, 1265 Welch Road MC5464, Stanford, 94305, CA, USA
- Wing Hung Wong
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, 94305, CA, USA.
- Department of Biomedical Data Science, Stanford University School of Medicine, 1265 Welch Road MC5464, Stanford, 94305, CA, USA.
4
Badia-I-Mompel P, Wessels L, Müller-Dott S, Trimbour R, Ramirez Flores RO, Argelaguet R, Saez-Rodriguez J. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023; 24:739-754. [PMID: 37365273 DOI: 10.1038/s41576-023-00618-5] [Citation(s) in RCA: 86] [Impact Index Per Article: 43.0] [Accepted: 05/12/2023] [Indexed: 06/28/2023]
Abstract
The interplay between chromatin, transcription factors and genes generates complex regulatory circuits that can be represented as gene regulatory networks (GRNs). The study of GRNs is useful to understand how cellular identity is established, maintained and disrupted in disease. GRNs can be inferred from experimental data - historically, bulk omics data - and/or from the literature. The advent of single-cell multi-omics technologies has led to the development of novel computational methods that leverage genomic, transcriptomic and chromatin accessibility information to infer GRNs at an unprecedented resolution. Here, we review the key principles of inferring GRNs that encompass transcription factor-gene interactions from transcriptomics and chromatin accessibility data. We focus on the comparison and classification of methods that use single-cell multimodal data. We highlight challenges in GRN inference, in particular with respect to benchmarking, and potential further developments using additional data modalities.
Affiliation(s)
- Pau Badia-I-Mompel
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Lorna Wessels
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Department of Vascular Biology and Tumor Angiogenesis, European Center for Angioscience, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Sophia Müller-Dott
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Rémi Trimbour
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, Paris, France
- Ricardo O Ramirez Flores
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany.
5
Bae S, Kim K, Kang K, Kim H, Lee M, Oh B, Kaneko K, Ma S, Choi JH, Kwak H, Lee EY, Park SH, Park-Min KH. RANKL-responsive epigenetic mechanism reprograms macrophages into bone-resorbing osteoclasts. Cell Mol Immunol 2023; 20:94-109. [PMID: 36513810 PMCID: PMC9794822 DOI: 10.1038/s41423-022-00959-x] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 08/09/2022] [Accepted: 11/03/2022] [Indexed: 12/15/2022] Open
Abstract
Monocyte/macrophage lineage cells are highly plastic and can differentiate into various cells under different environmental stimuli. Bone-resorbing osteoclasts are derived from the monocyte/macrophage lineage in response to receptor activator of NF-κB ligand (RANKL). However, the epigenetic signature contributing to the fate commitment of monocyte/macrophage lineage differentiation into human osteoclasts is largely unknown. In this study, we identified RANKL-responsive human osteoclast-specific superenhancers (SEs) and SE-associated enhancer RNAs (SE-eRNAs) by integrating data obtained from ChIP-seq, ATAC-seq, nuclear RNA-seq and PRO-seq analyses. RANKL induced the formation of 200 SEs, which are large clusters of enhancers, while suppressing 148 SEs in macrophages. RANKL-responsive SEs were strongly correlated with genes in the osteoclastogenic program and were selectively increased in human osteoclasts but marginally presented in osteoblasts, CD4+ T cells, and CD34+ cells. In addition to the major transcription factors identified in osteoclasts, we found that BATF binding motifs were highly enriched in RANKL-responsive SEs. The depletion of BATF1/3 inhibited RANKL-induced osteoclast differentiation. Furthermore, we found increased chromatin accessibility in SE regions, where RNA polymerase II was significantly recruited to induce the extragenic transcription of SE-eRNAs, in human osteoclasts. Knocking down SE-eRNAs in the vicinity of the NFATc1 gene diminished the expression of NFATc1, a major regulator of osteoclasts, and osteoclast differentiation. Inhibiting BET proteins suppressed the formation of some RANKL-responsive SEs and NFATc1-associated SEs, and the expression of SE-eRNA:NFATc1. Moreover, SE-eRNA:NFATc1 was highly expressed in the synovial macrophages of rheumatoid arthritis patients exhibiting high-osteoclastogenic potential. 
Our genome-wide analysis revealed RANKL-inducible SEs and SE-eRNAs as osteoclast-specific signatures, which may contribute to the development of osteoclast-specific therapeutic interventions.
Affiliation(s)
- Seyeon Bae
- Arthritis and Tissue Degeneration Program, David Z. Rosensweig Genomics Research Center, Hospital for Special Surgery, New York, NY, 10021, USA
- Department of Medicine, Weill Cornell Medical College, New York, NY, 10065, USA
- Kibyeong Kim
- Department of Biological Science, Ulsan National Institute of Science & Technology (UNIST), Ulsan, 44919, Republic of Korea
- Department of Life Science, College of Natural Sciences, Research Institute for Natural Sciences, Hanyang University, Seoul, Korea
- Keunsoo Kang
- Department of Microbiology, Dankook University, Cheonan, 3116, Republic of Korea
- Haemin Kim
- Arthritis and Tissue Degeneration Program, David Z. Rosensweig Genomics Research Center, Hospital for Special Surgery, New York, NY, 10021, USA
- Department of Medicine, Weill Cornell Medical College, New York, NY, 10065, USA
- Minjoon Lee
- Arthritis and Tissue Degeneration Program, David Z. Rosensweig Genomics Research Center, Hospital for Special Surgery, New York, NY, 10021, USA
- Brian Oh
- Arthritis and Tissue Degeneration Program, David Z. Rosensweig Genomics Research Center, Hospital for Special Surgery, New York, NY, 10021, USA
- Kaichi Kaneko
- Arthritis and Tissue Degeneration Program, David Z. Rosensweig Genomics Research Center, Hospital for Special Surgery, New York, NY, 10021, USA
- Sungkook Ma
- Department of Biological Science, Ulsan National Institute of Science & Technology (UNIST), Ulsan, 44919, Republic of Korea
- Jae Hoon Choi
- Department of Life Science, College of Natural Sciences, Research Institute for Natural Sciences, Hanyang University, Seoul, Korea
- Hojoong Kwak
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, USA
- Eun Young Lee
- Division of Rheumatology, Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Korea.
- Sung Ho Park
- Department of Biological Science, Ulsan National Institute of Science & Technology (UNIST), Ulsan, 44919, Republic of Korea.
- Kyung-Hyun Park-Min
- Arthritis and Tissue Degeneration Program, David Z. Rosensweig Genomics Research Center, Hospital for Special Surgery, New York, NY, 10021, USA.
- Department of Medicine, Weill Cornell Medical College, New York, NY, 10065, USA.
- BCMB Allied Program, Weill Cornell Graduate School of Medical Sciences, New York, NY, 10021, USA.
6
Gasperini M, Tome JM, Shendure J. Towards a comprehensive catalogue of validated and target-linked human enhancers. Nat Rev Genet 2020; 21:292-310. [PMID: 31988385 PMCID: PMC7845138 DOI: 10.1038/s41576-019-0209-0] [Citation(s) in RCA: 196] [Impact Index Per Article: 39.2] [Accepted: 12/13/2019] [Indexed: 12/14/2022]
Abstract
The human gene catalogue is essentially complete, but we lack an equivalently vetted inventory of bona fide human enhancers. Hundreds of thousands of candidate enhancers have been nominated via biochemical annotations; however, only a handful of these have been validated and confidently linked to their target genes. Here we review emerging technologies for discovering, characterizing and validating human enhancers at scale. We furthermore propose a new framework for operationally defining enhancers that accommodates the heterogeneous and complementary results that are emerging from reporter assays, biochemical measurements and CRISPR screens.
Affiliation(s)
- Molly Gasperini
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Jacob M Tome
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
- Allen Discovery Center for Cell Lineage, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
7
Zemelman BV. Targeting Subsets of Mammalian Neurons. Neurosci Insights 2020; 15:2633105520908537. [PMID: 32783027 PMCID: PMC7384116 DOI: 10.1177/2633105520908537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2020] [Accepted: 01/23/2020] [Indexed: 11/17/2022] Open
Abstract
Functional dissection of mammalian neuronal circuits depends on accurate targeting of constituent cell classes. Transgenic mice offer precise and predictable access to genetically defined cell populations, but there is the pressing need to target neuronal assemblies in species less amenable to genomic manipulations, such as the primate, which is an important animal model for human perception, cognition, and action. We have developed several virus-based methods for accessing all forebrain inhibitory interneurons as well as the major excitatory and inhibitory neuron subclasses. These methods rely on the wealth of emerging single-cell transcriptome data and harness gene expression variations to refine neuron targeting. Our approach enables nuanced functional studies, including in vivo imaging and manipulation, of the diverse cell populations of the mammalian neocortex, and it represents a timely blueprint for transgenics-independent interrogation of functionally significant cell classes.
Affiliation(s)
- Boris V Zemelman
- Center for Learning and Memory, Department of Neuroscience, The University of Texas at Austin, Austin, TX, USA
8
Fuertes MA, Rodrigo JR, Alonso C. Conserved Critical Evolutionary Gene Structures in Orthologs. J Mol Evol 2019; 87:93-105. [PMID: 30815710 DOI: 10.1007/s00239-019-09889-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/19/2018] [Accepted: 02/13/2019] [Indexed: 12/18/2022]
Abstract
Unravelling gene structure requires the identification and understanding of the constraints that are often associated with the evolutionary history and functional domains of genes. In this manuscript, we speculate on the possible existence in orthologs of an emergent, highly conserved gene structure that might explain their coordinated evolution during speciation events and their parental function. Here, we address the following issues: (1) is there any conserved hypothetical structure along ortholog gene sequences? (2) If so, are such conserved structures maintained during speciation events? The data presented provide evidence supporting this hypothesis. We have found that (1) most orthologs studied share highly conserved compositional structures not observed previously; (2) while the percent identity of nucleotide sequences of orthologs correlates with the percent identity of composon sequences, the number of emergent compositional structures conserved during speciation does not correlate with the percent identity; and (3) a broad range of species conserves the emergent compositional stretches. We also discuss the concept of critical gene structure.
Affiliation(s)
- Miguel A Fuertes
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Universidad Autónoma de Madrid, c/Nicolás Cabrera 1, 28049, Madrid, Spain.
- Carlos Alonso
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Universidad Autónoma de Madrid, c/Nicolás Cabrera 1, 28049, Madrid, Spain
9
Li L, Barth NKH, Hirth E, Taher L. Pairs of Adjacent Conserved Noncoding Elements Separated by Conserved Genomic Distances Act as Cis-Regulatory Units. Genome Biol Evol 2018; 10:2535-2550. [PMID: 30184074 PMCID: PMC6161761 DOI: 10.1093/gbe/evy196] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Accepted: 09/01/2018] [Indexed: 01/02/2023] Open
Abstract
Comparative genomic studies have identified thousands of conserved noncoding elements (CNEs) in the mammalian genome, many of which have been reported to exert cis-regulatory activity. We analyzed ∼5,500 pairs of adjacent CNEs in the human genome and found that despite divergence at the nucleotide sequence level, the inter-CNE distances of the pairs are under strong evolutionary constraint, with inter-CNE sequences featuring significantly lower transposon densities than expected. Further, we show that different degrees of conservation of the inter-CNE distance are associated with distinct cis-regulatory functions at the CNEs. Specifically, the CNEs in pairs with conserved and mildly contracted inter-CNE sequences are the most likely to represent active or poised enhancers. In contrast, CNEs in pairs with extremely contracted or expanded inter-CNE sequences are associated with no cis-regulatory activity. Furthermore, we observed that functional CNEs in a pair have very similar epigenetic profiles, hinting at a functional relationship between them. Taken together, our results support the existence of epistatic interactions between adjacent CNEs that are distance-sensitive and disrupted by transposon insertions and deletions, and contribute to our understanding of the selective forces acting on cis-regulatory elements, which are crucial for elucidating the molecular mechanisms underlying adaptive evolution and human genetic diseases.
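The quantity at the center of this abstract, the distance between adjacent CNEs, can be illustrated with a minimal sketch. It assumes CNEs are given as hypothetical `(chromosome, start, end)` tuples with 0-based, half-open coordinates; the function name and the example coordinates are illustrative and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), 0-based half-open.
Interval = Tuple[str, int, int]


def inter_cne_distances(cnes: List[Interval]) -> List[Tuple[Interval, Interval, int]]:
    """Return adjacent CNE pairs on the same chromosome with the gap between them."""
    ordered = sorted(cnes)  # sorts by chromosome, then start, then end
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left[0] != right[0]:
            continue  # neighbours on different chromosomes are not a pair
        gap = right[1] - left[2]  # bases between end of left and start of right
        if gap >= 0:  # skip overlapping elements
            pairs.append((left, right, gap))
    return pairs


# Illustrative coordinates only (not real CNEs).
example = [("chr1", 500, 650), ("chr1", 100, 200), ("chr2", 10, 50)]
for left, right, gap in inter_cne_distances(example):
    print(left, right, gap)
```

Comparing such gaps between orthologous pairs in two genomes would be one simple way to ask whether an inter-CNE distance is conserved, contracted, or expanded; the study's actual procedure may differ.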
Affiliation(s)
- Lifei Li
- Division of Bioinformatics, Department of Biology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Nicolai K H Barth
- Division of Bioinformatics, Department of Biology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Eva Hirth
- Division of Bioinformatics, Department of Biology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Leila Taher
- Division of Bioinformatics, Department of Biology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
10
Sequence and functional characterization of MIRNA164 promoters from Brassica shows copy number dependent regulatory diversification among homeologs. Funct Integr Genomics 2018. [DOI: 10.1007/s10142-018-0598-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/18/2023]
11
Polychronopoulos D, King JWD, Nash AJ, Tan G, Lenhard B. Conserved non-coding elements: developmental gene regulation meets genome organization. Nucleic Acids Res 2018; 45:12611-12624. [PMID: 29121339 PMCID: PMC5728398 DOI: 10.1093/nar/gkx1074] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 07/26/2017] [Accepted: 10/24/2017] [Indexed: 12/20/2022] Open
Abstract
Comparative genomics has revealed a class of non-protein-coding genomic sequences that display an extraordinary degree of conservation between two or more organisms, regularly exceeding that found within protein-coding exons. These elements, collectively referred to as conserved non-coding elements (CNEs), are non-randomly distributed across chromosomes and tend to cluster in the vicinity of genes with regulatory roles in multicellular development and differentiation. CNEs are organized into functional ensembles called genomic regulatory blocks–dense clusters of elements that collectively coordinate the expression of shared target genes, and whose span in many cases coincides with topologically associated domains. CNEs display sequence properties that set them apart from other sequences under constraint, and have recently been proposed as useful markers for the reconstruction of the evolutionary history of organisms. Disruption of several of these elements is known to contribute to diseases linked with development, and cancer. The emergence, evolutionary dynamics and functions of CNEs still remain poorly understood, and new approaches are required to enable comprehensive CNE identification and characterization. Here, we review current knowledge and identify challenges that need to be tackled to resolve the impasse in understanding extreme non-coding conservation.
Affiliation(s)
- Dimitris Polychronopoulos
- Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK; Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
- James W D King
- Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK; Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
- Alexander J Nash
- Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK; Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
- Ge Tan
- Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK; Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
- Boris Lenhard
- Computational Regulatory Genomics Group, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK; Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK; Sars International Centre for Marine Molecular Biology, University of Bergen, Thormøhlensgate 55, N-5008 Bergen, Norway
12
Lai X, Behera S, Liang Z, Lu Y, Deogun JS, Schnable JC. STAG-CNS: An Order-Aware Conserved Noncoding Sequences Discovery Tool for Arbitrary Numbers of Species. Mol Plant 2017; 10:990-999. [PMID: 28602693 DOI: 10.1016/j.molp.2017.05.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/28/2017] [Revised: 05/24/2017] [Accepted: 05/30/2017] [Indexed: 06/07/2023]
Abstract
One method for identifying noncoding regulatory regions of a genome is to quantify rates of divergence between related species, as functional sequence will generally diverge more slowly. Most approaches to identifying these conserved noncoding sequences (CNSs) based on alignment have had relatively large minimum sequence lengths (≥15 bp) compared with the average length of known transcription factor binding sites. To circumvent this constraint, STAG-CNS that can simultaneously integrate the data from the promoters of conserved orthologous genes in three or more species was developed. Using the data from up to six grass species made it possible to identify conserved sequences as short as 9 bp with false discovery rate ≤0.05. These CNSs exhibit greater overlap with open chromatin regions identified using DNase I hypersensitivity assays, and are enriched in the promoters of genes involved in transcriptional regulation. STAG-CNS was further employed to characterize loss of conserved noncoding sequences associated with retained duplicate genes from the ancient maize polyploidy. Genes with fewer retained CNSs show lower overall expression, although this bias is more apparent in samples of complex organ systems containing many cell types, suggesting that CNS loss may correspond to a reduced number of expression contexts rather than lower expression levels across the entire ancestral expression domain.
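To make concrete the idea of short sequences conserved across three or more species, the sketch below finds 9 bp substrings present in every promoter of a set of orthologs. This is a naive exact k-mer intersection written for illustration only: it is not the STAG-CNS algorithm, it ignores the order of conserved blocks, and it does no false discovery rate control. The function names and toy sequences are hypothetical.

```python
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(promoters: Dict[str, str], k: int = 9) -> Set[str]:
    """Return k-mers found in every promoter sequence (exact matches only)."""
    sets = [kmers(seq, k) for seq in promoters.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Toy sequences keyed by hypothetical species labels.
toy = {
    "species_a": "TTACGTACGTAGG",
    "species_b": "CCACGTACGTACC",
    "species_c": "GACGTACGTAT",
}
print(shared_kmers(toy, k=9))
```

Exact intersection is the simplest possible choice here; a practical tool would also need to tolerate mismatches and assess significance against a background model.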
Affiliation(s)
- Xianjun Lai
- Department of Agronomy and Horticulture, Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA; Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Sairam Behera
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Zhikai Liang
- Department of Agronomy and Horticulture, Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Yanli Lu
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Jitender S Deogun
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA.
- James C Schnable
- Department of Agronomy and Horticulture, Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA.
13
Yokoyama KD, Zhang Y, Ma J. Tracing the evolution of lineage-specific transcription factor binding sites in a birth-death framework. PLoS Comput Biol 2014; 10:e1003771. [PMID: 25144359 PMCID: PMC4140645 DOI: 10.1371/journal.pcbi.1003771] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 04/23/2014] [Accepted: 06/27/2014] [Indexed: 11/24/2022] Open
Abstract
Changes in cis-regulatory element composition that result in novel patterns of gene expression are thought to be a major contributor to the evolution of lineage-specific traits. Although transcription factor binding events show substantial variation across species, most computational approaches to study regulatory elements focus primarily upon highly conserved sites, and rely heavily upon multiple sequence alignments. However, sequence conservation based approaches have limited ability to detect lineage-specific elements that could contribute to species-specific traits. In this paper, we describe a novel framework that utilizes a birth-death model to trace the evolution of lineage-specific binding sites without relying on detailed base-by-base cross-species alignments. Our model was applied to analyze the evolution of binding sites based on the ChIP-seq data for six transcription factors (GATA1, SOX2, CTCF, MYC, MAX, ETS1) along the lineage toward human after human-mouse common ancestor. We estimate that a substantial fraction of binding sites (∼58–79% for each factor) in humans have origins since the divergence with mouse. Over 15% of all binding sites are unique to hominids. Such elements are often enriched near genes associated with specific pathways, and harbor more common SNPs than older binding sites in the human genome. These results support the ability of our method to identify lineage-specific regulatory elements and help understand their roles in shaping variation in gene regulation across species. Recent experimental studies showed that the evolution of transcription factor binding sites (TFBS) is highly dynamic, with sites differing a great deal even between closely related mammalian species. 
Despite the substantial experimental evidence for rapid divergence of regulatory protein-binding events across species, computational methods designed to analyze regulatory element evolution have focused primarily on phylogenetic footprinting approaches, in which putative functional regulatory elements are identified according to strong sequence conservation. Cross-species comparisons of non-coding sequences are limited in their ability to capture the evolution of regulatory sequences, particularly in cases where the elements are selected for novelty or are species-specific. We have developed a novel framework to reconstruct the history of lineage-specific TFBS and showed that a large number of TFBS in human were born after the human-mouse divergence. These elements also have distinct biological implications as compared to more ancient ones. This method can help understand the roles of lineage-specific TFBS in shaping gene regulation across different species.
Affiliation(s)
- Ken Daigoro Yokoyama
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Yang Zhang
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Jian Ma
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
|
14
|
CluGene: A Bioinformatics Framework for the Identification of Co-Localized, Co-Expressed and Co-Regulated Genes Aimed at the Investigation of Transcriptional Regulatory Networks from High-Throughput Expression Data. PLoS One 2013; 8:e66196. [PMID: 23823315 PMCID: PMC3688840 DOI: 10.1371/journal.pone.0066196] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 11/12/2012] [Accepted: 05/05/2013] [Indexed: 01/03/2023] Open
Abstract
A full understanding of the mechanisms underlying transcriptional regulatory networks requires the unravelling of complex causal relationships. High-throughput genome technologies produce a huge amount of information pertaining to gene expression and regulation; however, the complexity of the available data is often overwhelming and tools are needed to extract and organize the relevant information. This work starts from the assumption that the observation of co-occurrent events (in particular co-localization, co-expression and co-regulation) may provide a powerful starting point to begin unravelling transcriptional regulatory networks. Co-expressed genes often imply shared functional pathways; co-expressed and functionally related genes are often co-localized, too; moreover, co-expressed and co-localized genes are also potential targets for co-regulation; finally, co-regulation seems more frequent for genes mapped to proximal chromosome regions. Despite the recognized importance of analysing co-occurrent events, no bioinformatics solution allowing the simultaneous analysis of co-expression, co-localization and co-regulation is currently available. Our work resulted in the development and evaluation of CluGene, software that provides tools to analyze multiple types of co-occurrences within a single interactive environment, allowing the combined investigation of co-expression, co-localization and co-regulation of genes. The use of CluGene will enhance the power of hypothesis testing and of experimental approaches aimed at unravelling transcriptional regulatory networks. The software is freely available at http://bioinfolab.unipg.it/.
|
15
|
Simonatto M, Barozzi I, Natoli G. Non-coding transcription at cis-regulatory elements: computational and experimental approaches. Methods 2013; 63:66-75. [PMID: 23542771 DOI: 10.1016/j.ymeth.2013.03.021] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 02/04/2013] [Revised: 03/18/2013] [Accepted: 03/20/2013] [Indexed: 12/17/2022] Open
Abstract
Mammalian genomes are pervasively transcribed, generating mostly RNAs with no coding potential that differ in size, structure and interspecies sequence conservation. A prominent contribution to the ncRNA pool comes from the transcription of cis-regulatory elements, namely promoters, enhancers and locus control regions. While this phenomenon has been extensively documented, possible roles of such ncRNAs in gene regulation are still unclear. Addressing this issue will require experimental strategies dealing with the low abundance of enhancer-templated ncRNAs and aimed at specifically dissecting the relative role of transcription per se vs. RNA products. In this review, we first focus on the identification and characterization of cis-regulatory elements, highlighting the differences between emerging classes of ncRNAs associated with specific chromatin signatures. We then discuss current experimental strategies to dissect the function of non-coding transcription and computational approaches to the analysis and classification of regulatory sequences identified in next-generation sequencing experiments.
Affiliation(s)
- Marta Simonatto
- Department of Experimental Oncology, European Institute of Oncology (IEO), Via Adamello 16, 20139 Milan, Italy.
|
16
|
Abstract
Insights into the evolution of hemoglobins and their genes are an abundant source of ideas regarding hemoglobin function and regulation of globin gene expression. This article presents the multiple genes and gene families encoding human globins, summarizes major events in the evolution of the hemoglobin gene clusters, and discusses how these studies provide insights into regulation of globin genes. Although the genes in and around the α-like globin gene complex are relatively stable, the β-like globin gene clusters are more dynamic, showing evidence of transposition to a new locus and frequent lineage-specific expansions and deletions. The cis-regulatory modules controlling levels and timing of gene expression are a mix of conserved and lineage-specific DNA, perhaps reflecting evolutionary constraint on core regulatory functions shared broadly in mammals and adaptive fine-tuning in different orders of mammals.
Affiliation(s)
- Ross C Hardison
- Center for Comparative Genomics and Bioinformatics, Huck Institute of Genome Sciences, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA.
|
17
|
Abstract
Differential gene expression is the fundamental mechanism underlying animal development and cell differentiation. However, it is a challenge to identify comprehensively and accurately the DNA sequences that are required to regulate gene expression: namely, cis-regulatory modules (CRMs). Three major features, either singly or in combination, are used to predict CRMs: clusters of transcription factor binding site motifs, non-coding DNA that is under evolutionary constraint and biochemical marks associated with CRMs, such as histone modifications and protein occupancy. The validation rates for predictions indicate that identifying diagnostic biochemical marks is the most reliable method, and understanding is enhanced by the analysis of motifs and conservation patterns within those predicted CRMs.
|
18
|
The hypersensitive sites of the murine β-globin locus control region act independently to affect nuclear localization and transcriptional elongation. Blood 2012; 119:3820-7. [PMID: 22378846 DOI: 10.1182/blood-2011-09-380485] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 11/20/2022] Open
Abstract
The β-globin locus control region (LCR) is necessary for high-level β-globin gene transcription and differentiation-dependent relocation of the β-globin locus from the nuclear periphery to the central nucleoplasm and to foci of hyperphosphorylated Pol II "transcription factories" (TFys). To determine the contribution of individual LCR DNaseI hypersensitive sites (HSs) to transcription and nuclear location, in the present study, we compared β-globin gene activity and location in erythroid cells derived from mice with deletions of individual HSs, deletions of 2 HSs, and deletion of the whole LCR and found all of the HSs had a similar spectrum of activities, albeit to different degrees. Each HS acts as an independent module to activate expression in an additive manner, and this is correlated with relocation away from the nuclear periphery. In contrast, HSs have redundant activities with respect to association with TFys and the probability that an allele is actively transcribed, as measured by primary RNA transcript FISH. The limiting effect on RNA levels occurs after β-globin genes associate with TFys, at which time HSs contribute to the amount of RNA arising from each burst of transcription by stimulating transcriptional elongation.
|
19
|
Schanze D, Ekici AB, Pfuhlmann B, Reis A, Stöber G. Evaluation of conserved and ultra-conserved non-genic sequences in chromosome 15q15-linked periodic catatonia. Am J Med Genet B Neuropsychiatr Genet 2012; 159B:77-86. [PMID: 22162401 DOI: 10.1002/ajmg.b.32004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/17/2011] [Accepted: 11/03/2011] [Indexed: 01/14/2023]
Abstract
Conserved and ultra-conserved non-genic sequence elements (CNGs, UCEs) between human and other mammalian genomes seem to constitute a heterogeneous group of functional sequences which likely have important biological function. To determine whether variation in CNGs and UCEs contributes to risk for the schizophrenic subphenotype of periodic catatonia (according to K. Leonhard; OMIM 605419), we evaluated non-coding elements at a critical 7.35 Mb interval on chromosome 15q15 in 8 unrelated cases with periodic catatonia (derived from pedigrees compatible with linkage to chromosome 15q15) and 8 controls, followed by association studies in a cohort of 510 cases and controls. Among 65 CNGs (≥100 bp, 100% identity; human-mouse comparison), 7 CNGs matched criteria for UCE (≥200 bp, 100% identity). A hot spot of 62/65 CNGs (95%) appeared at the MEIS2 locus, which implicates functional importance of associated (ultra-)conserved elements to this early developmental gene, which is present in the human fetal neocortex and associated with metabolic side effects to antipsychotic drugs. Further CNGs were identified at the PLCB2 and DLL4 locus or located intergenic between TYRO3 and MAPKBP1. Automated sequencing revealed genetic variation in 12.3% of CNGs, but frequencies were low (MAF: 0.06-0.4) in cases. Three variants located inside CNGs/UCEs were found in cases only. In a case-control association study we could not confirm a significant association of these three CNG-variants with periodic catatonia. Our results suggest genetic variation in (ultra-)conserved non-genic sequence elements which might alter functional properties. The identified variants are genetically not associated with the phenotype of periodic catatonia.
Affiliation(s)
- Denny Schanze
- Institute of Human Genetics, University of Erlangen-Nuremberg, Erlangen, Germany
|
20
|
Fakiola M, Miller EN, Fadl M, Mohamed HS, Jamieson SE, Francis RW, Cordell HJ, Peacock CS, Raju M, Khalil EA, Elhassan A, Musa AM, Silveira F, Shaw JJ, Sundar S, Jeronimo SMB, Ibrahim ME, Blackwell JM. Genetic and functional evidence implicating DLL1 as the gene that influences susceptibility to visceral leishmaniasis at chromosome 6q27. J Infect Dis 2011; 204:467-77. [PMID: 21742847 DOI: 10.1093/infdis/jir284] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Visceral leishmaniasis (VL) is caused by Leishmania donovani and Leishmania infantum chagasi. Genome-wide linkage studies from Sudan and Brazil identified a putative susceptibility locus on chromosome 6q27. METHODS Twenty-two single-nucleotide polymorphisms (SNPs) at genes PHF10, C6orf70, DLL1, FAM120B, PSMB1, and TBP were genotyped in 193 VL cases from 85 Sudanese families, and 8 SNPs at genes PHF10, C6orf70, DLL1, PSMB1, and TBP were genotyped in 194 VL cases from 80 Brazilian families. Family-based association, haplotype, and linkage disequilibrium analyses were performed. Multispecies comparative sequence analysis was used to identify conserved noncoding sequences carrying putative regulatory elements. Quantitative reverse-transcription polymerase chain reaction measured expression of candidate genes in splenic aspirates from Indian patients with VL compared with that in the control spleen sample. RESULTS Positive associations were observed at PHF10, C6orf70, DLL1, PSMB1, and TBP in Sudan, but only at DLL1 in Brazil (combined P = 3 × 10⁻⁴ at DLL1 across Sudan and Brazil). No functional coding region variants were observed in resequencing of 22 Sudanese VL cases. DLL1 expression was significantly (P = 2 × 10⁻⁷) reduced (mean fold change, 3.5 [SEM, 0.7]) in splenic aspirates from patients with VL, whereas other 6q27 genes showed higher levels (1.27 × 10⁻⁶ < P < .01) than did the control spleen sample. A cluster of conserved noncoding sequences with putative regulatory variants was identified in the distal promoter of DLL1. CONCLUSIONS DLL1, which encodes Delta-like 1, the ligand for Notch3, is strongly implicated as the chromosome 6q27 VL susceptibility gene.
Affiliation(s)
- Michaela Fakiola
- Cambridge Institute for Medical Research and Department of Medicine, University of Cambridge School of Clinical Medicine, UK
|
21
|
When needles look like hay: how to find tissue-specific enhancers in model organism genomes. Dev Biol 2010; 350:239-54. [PMID: 21130761 DOI: 10.1016/j.ydbio.2010.11.026] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 04/14/2010] [Revised: 11/11/2010] [Accepted: 11/22/2010] [Indexed: 01/22/2023]
Abstract
A major prerequisite for the investigation of tissue-specific processes is the identification of cis-regulatory elements. No generally applicable technique is available to distinguish them from any other type of genomic non-coding sequence. Therefore, researchers often have to identify these elements by elaborate in vivo screens, testing individual regions until the right one is found. Here, based on many examples from the literature, we summarize how functional enhancers have been isolated from other elements in the genome and how they have been characterized in transgenic animals. Covering computational and experimental studies, we provide an overview of the global properties of cis-regulatory elements, like their specific interactions with promoters and target gene distances. We describe conserved non-coding elements (CNEs) and their internal structure, nucleotide composition, binding site clustering and overlap, with a special focus on developmental enhancers. Conflicting data and unresolved questions on the nature of these elements are highlighted. Our comprehensive overview of the experimental shortcuts that have been found in the different model organism communities and the new field of high-throughput assays should help during the preparation phase of a screen for enhancers. The review is accompanied by a list of general guidelines for such a project.
|
22
|
Ovacik MA, Androulakis IP. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison. Toxicol Appl Pharmacol 2010; 271:363-71. [PMID: 20851138 DOI: 10.1016/j.taap.2010.09.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 06/16/2010] [Revised: 08/24/2010] [Accepted: 09/10/2010] [Indexed: 11/30/2022]
Abstract
Pathway-based information has become an important source of information for both establishing evolutionary relationships and understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships and to extrapolate species differences used in a number of different applications including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method by accounting for sequence similarity in addition to reaction alignment method. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing glycolysis and citrate cycle pathways conservation. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism as well as a more complete study involving 36 pathways common in all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway specific cross-species similarities and differences based on NCBI taxonomy.
Affiliation(s)
- Meric A Ovacik
- Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ 08854, USA
|
23
|
Tran DA, Wong TC, Schep AN, Drewell RA. Characterization of an Ultra-Conserved Putative cis-Regulatory Module at the Mammalian Telomerase Reverse Transcriptase Gene. DNA Cell Biol 2010; 29:499-508. [DOI: 10.1089/dna.2009.0994] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/26/2022] Open
Affiliation(s)
- Diana A. Tran
- Department of Biology, Harvey Mudd College, Claremont, California
- Terence C. Wong
- Department of Biology, Harvey Mudd College, Claremont, California
- Alicia N. Schep
- Department of Biology, Harvey Mudd College, Claremont, California
|
24
|
Merhej V, Raoult D. Rickettsial evolution in the light of comparative genomics. Biol Rev Camb Philos Soc 2010; 86:379-405. [DOI: 10.1111/j.1469-185x.2010.00151.x] [Citation(s) in RCA: 183] [Impact Index Per Article: 12.2] [Indexed: 11/28/2022]
|
25
|
Solovieff N, Milton JN, Hartley SW, Sherva R, Sebastiani P, Dworkis DA, Klings ES, Farrer LA, Garrett ME, Ashley-Koch A, Telen MJ, Fucharoen S, Ha SY, Li CK, Chui DHK, Baldwin CT, Steinberg MH. Fetal hemoglobin in sickle cell anemia: genome-wide association studies suggest a regulatory region in the 5' olfactory receptor gene cluster. Blood 2010; 115:1815-22. [PMID: 20018918 PMCID: PMC2832816 DOI: 10.1182/blood-2009-08-239517] [Citation(s) in RCA: 115] [Impact Index Per Article: 7.7] [Received: 08/20/2009] [Accepted: 11/18/2009] [Indexed: 11/20/2022] Open
Abstract
In a genome-wide association study of 848 blacks with sickle cell anemia, we identified single nucleotide polymorphisms (SNPs) associated with fetal hemoglobin concentration. The most significant SNPs in a discovery sample were tested in a replication set of 305 blacks with sickle cell anemia and in subjects with hemoglobin E or beta thalassemia trait from Thailand and Hong Kong. A novel region on chromosome 11 containing olfactory receptor genes OR51B5 and OR51B6 was identified by 6 SNPs (lowest P = 4.7E-08) and validated in the replication set. An additional olfactory receptor gene, OR51B2, was identified by a novel SNP set enrichment analysis. Genome-wide association studies also validated a previously identified SNP (rs766432) in BCL11A, a gene known to affect fetal hemoglobin levels (P = 2.6E-21) and in Thailand and Hong Kong subjects. Elements within the olfactory receptor gene cluster might play a regulatory role in gamma-globin gene expression.
MESH Headings
- Adolescent
- Adult
- Black or African American/genetics
- Anemia, Sickle Cell/blood
- Anemia, Sickle Cell/genetics
- Carrier Proteins/genetics
- Child
- Child, Preschool
- Chromosomes, Human, Pair 11/genetics
- Chromosomes, Human, X/genetics
- Female
- Fetal Hemoglobin/genetics
- Fetal Hemoglobin/metabolism
- Genome-Wide Association Study
- Hemoglobin E/genetics
- Hong Kong
- Humans
- Male
- Multigene Family
- Nuclear Proteins/genetics
- Polymorphism, Single Nucleotide
- Receptors, Odorant/genetics
- Regulatory Sequences, Nucleic Acid
- Repressor Proteins
- Thailand
- Young Adult
- beta-Thalassemia/genetics
Affiliation(s)
- Nadia Solovieff
- Department of Biostatistics, Boston University School of Public Health, MA, USA
|
26
|
Shen X, Walsh B, Li JJ, Pang HX, Wang WJ, Tao SH. The correlations of the function and positional distribution of the cis-elements CArG around the TSS in the genes of Mus musculus. Genome 2009; 52:217-21. [PMID: 19234549 DOI: 10.1139/g08-117] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
While many studies of cis-elements CArG bound by serum response factor (SRF) are in progress, little is known about the positional distribution of the functional CArG elements around the transcription start site (TSS) of genes that they influence. We use a validated CArG data set to calculate the distance distribution of functional CArG elements around the TSS. Distances between adjacent CArGs were also analyzed. We compare these distributions with those derived using a control set of randomly selected CArGs (that were not experimentally validated for function). Our results show that most functional CArG elements (108 of 152, 71%) exist upstream of the annotated TSS, with copy number increasing as one moves closer to the TSS. Moreover, the average number of the CArG elements in the CArG-containing genes is significantly more than that in the control genes. Our study extends earlier bioinformatic analyses of functional CArG elements and provides an application of comparative sequence data to the identification of transcription factor binding sites.
Affiliation(s)
- Xia Shen
- Bioinformatics Center, Northwest A&F University, 712100 Yangling, Shaanxi, China
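The positional analysis summarized in the Shen et al. abstract above (the share of elements upstream of the TSS, their binned distance distribution, and the distances between adjacent elements) can be sketched as follows. The offsets are invented for illustration and the function names are hypothetical; only the final check of 108 of 152 elements as 71% comes from the abstract.

```python
from collections import Counter


def summarize_positions(offsets, bin_size=100):
    """Summarize element offsets relative to the TSS.

    offsets: signed distances in bp (negative = upstream of the TSS).
    Returns (upstream count, upstream fraction, counts per distance bin).
    """
    upstream = sum(1 for p in offsets if p < 0)
    fraction = upstream / len(offsets)
    # Floor each offset to the lower edge of its bin, e.g. -60 -> -100.
    bins = Counter((p // bin_size) * bin_size for p in offsets)
    return upstream, fraction, dict(sorted(bins.items()))


def adjacent_gaps(offsets):
    """Distances between neighbouring elements after sorting by position."""
    ordered = sorted(offsets)
    return [b - a for a, b in zip(ordered, ordered[1:])]


# Hypothetical offsets (bp) for elements around one gene's TSS.
example_offsets = [-950, -420, -310, -120, -60, -35, 40, 210]
n_upstream, frac_upstream, binned = summarize_positions(example_offsets)
gaps = adjacent_gaps(example_offsets)
print(n_upstream, frac_upstream, binned, gaps)

# The upstream share reported in the abstract: 108 of 152 elements.
reported_percent = round(108 / 152 * 100)
print(reported_percent)
```

Running the same summary on a control set of randomly selected elements would give the comparison the abstract describes.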
|
27
|
Allele-specific expression and gene methylation in the control of CYP1A2 mRNA level in human livers. The Pharmacogenomics Journal 2009; 9:208-17. [DOI: 10.1038/tpj.2009.4] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 12/14/2022]
|
28
|
Hestand MS, van Galen M, Villerius MP, van Ommen GJB, den Dunnen JT, 't Hoen PAC. CORE_TF: a user-friendly interface to identify evolutionary conserved transcription factor binding sites in sets of co-regulated genes. BMC Bioinformatics 2008; 9:495. [PMID: 19036135 PMCID: PMC2613159 DOI: 10.1186/1471-2105-9-495] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 03/13/2008] [Accepted: 11/26/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The identification of transcription factor binding sites is difficult since they are only a small number of nucleotides in size, resulting in large numbers of false positives and false negatives in current approaches. Computational methods to reduce false positives include looking for over-representation of transcription factor binding sites in a set of similarly regulated promoters and looking for conservation in orthologous promoter alignments. RESULTS We have developed a novel tool, "CORE_TF" (Conserved and Over-REpresented Transcription Factor binding sites) that identifies common transcription factor binding sites in promoters of co-regulated genes. To improve upon existing binding site predictions, the tool searches for position weight matrices from the TRANSFAC® database that are over-represented in an experimental set compared to a random set of promoters and identifies cross-species conservation of the predicted transcription factor binding sites. The algorithm has been evaluated with expression and chromatin-immunoprecipitation on microarray data. We also implement and demonstrate the importance of matching the random set of promoters to the experimental promoters by GC content, which is a unique feature of our tool. CONCLUSION The program CORE_TF is accessible in a user-friendly web interface at http://www.LGTC.nl/CORE_TF. It provides a table of over-represented transcription factor binding sites in the promoters of the user's input genes and a graphical view of evolutionary conserved transcription factor binding sites. In our test data sets it successfully predicts target transcription factors and their binding sites.
Affiliation(s)
- Matthew S Hestand
- The Center for Human and Clinical Genetics, Leiden University Medical Center, Postzone S4-0P, PO Box 9600, 2300 RC Leiden, The Netherlands.
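The CORE_TF abstract above stresses matching the random promoter set to the experimental promoters by GC content. A minimal sketch of one way to do such matching is shown below; it is not the CORE_TF implementation. The greedy strategy, the 0.05 tolerance, and the function names are all assumptions chosen for illustration.

```python
import random


def gc_content(seq):
    """Fraction of G or C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_matched_background(experimental, pool, tolerance=0.05, seed=0):
    """Pick one background promoter per experimental promoter.

    Each pick must have GC content within `tolerance` of its experimental
    counterpart and is used at most once. Experimental promoters with no
    suitable candidate are skipped.
    """
    rng = random.Random(seed)
    available = list(pool)
    rng.shuffle(available)  # avoid always favouring the start of the pool
    matched = []
    for seq in experimental:
        target = gc_content(seq)
        for i, candidate in enumerate(available):
            if abs(gc_content(candidate) - target) <= tolerance:
                matched.append(available.pop(i))
                break
    return matched


# Hypothetical toy sequences standing in for promoters.
experimental_set = ["GGCC", "ATAT"]
background_pool = ["ATTA", "GCGC", "ATGC"]
background = gc_matched_background(experimental_set, background_pool)
print(background)
```

Without this step, a motif that is simply GC-rich can look over-represented in a GC-rich experimental set for reasons unrelated to regulation.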
|
29
|
Genomic promoter analysis predicts functional transcription factor binding. Adv Bioinformatics 2008; 2008:369830. [PMID: 19865592 PMCID: PMC2768302 DOI: 10.1155/2008/369830] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/31/2008] [Revised: 05/15/2008] [Accepted: 07/17/2008] [Indexed: 02/02/2023] Open
Abstract
Background. The computational identification of functional transcription factor binding sites (TFBSs) remains a major challenge of computational biology.
Results. We have analyzed the conserved promoter sequences for the complete set of human RefSeq genes using our conserved transcription factor binding site (CONFAC) software. CONFAC identified 16296 human-mouse ortholog gene pairs, and of those pairs, 9107 genes contained conserved TFBS in the 3 kb proximal promoter and first intron. To attempt to predict in vivo occupancy of transcription factor binding sites, we developed a novel marginal effect isolator algorithm that builds upon Bayesian methods for multigroup TFBS filtering and predicted the in vivo occupancy of two transcription factors with an overall accuracy of 84%.
Conclusion. Our analyses show that integration of chromatin immunoprecipitation data with conserved TFBS analysis can be used to generate accurate predictions of functional TFBS. They also show that TFBS cooccurrence can be used to predict transcription factor binding to promoters in vivo.
|
30
|
Abstract
BACKGROUND Computational gene prediction tools routinely generate large volumes of predicted coding exons (putative exons). One common limitation of these tools is the relatively low specificity due to the large amount of non-coding regions. METHODS A statistical approach is developed that substantially improves gene prediction specificity. The key idea is to utilize the evolutionary conservation principle relative to the coding exons. By first exploiting the homology between genomes of two related species, a probability model for the evolutionary conservation pattern of codons across different genomes is developed. A probability model for the dependency between adjacent codons/triplets is added to differentiate coding exons and random sequences. Finally, the log odds ratio is developed to classify putative exons into the group of coding exons and the group of non-coding regions. RESULTS The method was tested on pre-aligned human-mouse sequences where the putative exons are predicted by GENSCAN and TWINSCAN. The proposed method is able to improve the exon specificity by 73% and 32% respectively, while the loss of sensitivity is ≤1%. The method also keeps 98% of RefSeq gene structures that are correctly predicted by TWINSCAN when removing 26% of predicted genes that are in non-coding regions. The estimated number of true exons in TWINSCAN's predictions is 157,070. The results and the executable codes can be downloaded from http://www.stat.purdue.edu/~jingwu/codon/. CONCLUSION The proposed method demonstrates an application of the evolutionary conservation principle to coding exons. It is a complementary method which can be used as an additional criterion to refine many existing gene predictions.
Affiliation(s)
- Jing Wu
- Department of Statistics, Purdue University, 150 N, University Street, West Lafayette, IN 47906, USA.
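The final classification step described in the abstract above, a log odds ratio separating coding exons from non-coding regions, can be illustrated with a deliberately simplified sketch. It treats triplets as independent and uses made-up probabilities, whereas the abstract describes models of cross-species conservation patterns and of dependency between adjacent codons; the function names, probability tables, floor value, and threshold are all hypothetical.

```python
import math


def log_odds(triplets, coding_probs, noncoding_probs, floor=1e-6):
    """Sum of log(P(triplet | coding) / P(triplet | non-coding)).

    Triplets missing from a table get the small `floor` probability so
    the logarithm stays defined.
    """
    score = 0.0
    for triplet in triplets:
        p_coding = coding_probs.get(triplet, floor)
        p_noncoding = noncoding_probs.get(triplet, floor)
        score += math.log(p_coding / p_noncoding)
    return score


def classify(triplets, coding_probs, noncoding_probs, threshold=0.0):
    """Label a putative exon by comparing its log odds to a threshold."""
    score = log_odds(triplets, coding_probs, noncoding_probs)
    return "coding" if score > threshold else "non-coding"


# Hypothetical probabilities under each model (not taken from the paper).
coding_model = {"ATG": 0.6, "TAA": 0.1}
noncoding_model = {"ATG": 0.2, "TAA": 0.4}

label_a = classify(["ATG", "ATG"], coding_model, noncoding_model)
label_b = classify(["TAA", "TAA"], coding_model, noncoding_model)
print(label_a, label_b)
```

Raising the threshold trades sensitivity for specificity, which is the balance the reported results (large specificity gains at ≤1% sensitivity loss) refer to.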
|
31
|
Eisermann K, Tandon S, Bazarov A, Brett A, Fraizer G, Piontkivska H. Evolutionary conservation of zinc finger transcription factor binding sites in promoters of genes co-expressed with WT1 in prostate cancer. BMC Genomics 2008; 9:337. [PMID: 18631392 PMCID: PMC2515153 DOI: 10.1186/1471-2164-9-337] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 01/22/2008] [Accepted: 07/16/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Gene expression analyses have led to a better understanding of growth control of prostate cancer cells. We and others have identified the presence of several zinc finger transcription factors in the neoplastic prostate, suggesting a potential role for these genes in the regulation of the prostate cancer transcriptome. One of the transcription factors (TFs) identified in the prostate cancer epithelial cells was the Wilms tumor gene (WT1). To rapidly identify coordinately expressed prostate cancer growth control genes that may be regulated by WT1, we used an in silico approach. RESULTS Evolutionarily conserved transcription factor binding sites (TFBS) recognized by WT1, EGR1, SP1, SP2, AP2 and GATA1 were identified in the promoters of 24 differentially expressed prostate cancer genes from eight mammalian species. To test the relationship between sequence conservation and function, chromatin from LNCaP prostate cancer cells and 293 kidney cells was tested for TF binding using chromatin immunoprecipitation (ChIP). Multiple putative TFBS in gene promoters of placental mammals were found to be shared with those in human gene promoters, and some were conserved between genomes that diverged about 170 million years ago (i.e., primates and marsupials), implicating these sites as candidate binding sites. Among the genes coordinately expressed with WT1 was the kallikrein-related peptidase 3 (KLK3) gene, commonly known as the prostate specific antigen (PSA) gene. This analysis located several potential WT1 TFBS in the PSA gene promoter and led to the rapid identification of a novel putative binding site confirmed in vivo by ChIP. Conversely, for two prostate growth control genes known to be transcriptionally regulated by WT1, androgen receptor (AR) and vascular endothelial growth factor (VEGF), regulatory sequence conservation was observed and TF binding in vivo was confirmed by ChIP.
CONCLUSION Overall, this targeted approach rapidly identified important candidate WT1-binding elements in genes coordinately expressed with WT1 in prostate cancer cells, thus enabling a more focused functional analysis of the most likely target genes in prostate cancer progression. Identifying these genes will help to better understand how gene regulation is altered in these tumor cells.
Affiliation(s)
- Kurtis Eisermann
- School of Biomedical Sciences, Kent State University, Kent, Ohio, USA.
|
32
|
Louro R, El-Jundi T, Nakaya HI, Reis EM, Verjovski-Almeida S. Conserved tissue expression signatures of intronic noncoding RNAs transcribed from human and mouse loci. Genomics 2008; 92:18-25. [DOI: 10.1016/j.ygeno.2008.03.013] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Received: 01/04/2008] [Revised: 03/25/2008] [Accepted: 03/28/2008] [Indexed: 12/15/2022]
|
33
|
Elgar G, Vavouri T. Tuning in to the signals: noncoding sequence conservation in vertebrate genomes. Trends Genet 2008; 24:344-52. [PMID: 18514361 DOI: 10.1016/j.tig.2008.04.005] [Citation(s) in RCA: 129] [Impact Index Per Article: 7.6] [Received: 02/07/2008] [Revised: 04/14/2008] [Accepted: 04/14/2008] [Indexed: 01/25/2023]
|
34
|
Ohtomo T, Miyatake S, Kajiyama Y, Umezu-Goto M, Kobayashi N, Kaminuma O, Mori A. Airway eosinophilic inflammation is attenuated in conserved noncoding sequence-1-deficient mice. Int Arch Allergy Immunol 2008; 146 Suppl 1:2-6. [PMID: 18504398 DOI: 10.1159/000126052] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/13/2023] Open
Abstract
BACKGROUND Conserved noncoding sequence-1 (CNS-1) is an important regulatory element for T helper 2 cytokine expression. IL-4, IL-5 and IL-13 expression as well as serum IgE level were attenuated in CNS-1-/- mice. METHOD CNS-1-/- and CNS-1+/+ mice were sensitized with ovalbumin (OVA) followed by antigen challenge. The number of eosinophils and T helper 2 cytokine concentration in the bronchoalveolar lavage fluid, OVA-specific IgE antibody (Ab) in the serum and bronchial responsiveness to methacholine were examined. RESULTS Bronchoalveolar lavage fluid eosinophilia was significantly attenuated in CNS-1-/- mice compared to CNS-1+/+ mice, which were sensitized with OVA/aluminum once. OVA-specific IgE Ab was also attenuated. When mice were sensitized with OVA/aluminum twice, induction of eosinophilia and OVA-specific IgE Ab was not significantly different between CNS-1-/- and CNS-1+/+ mice. CONCLUSION CNS-1 locus regulates eosinophilic inflammation in vivo.
Affiliation(s)
- Takayuki Ohtomo
- National Hospital Organization, Sagamihara National Hospital, Clinical Research Center for Allergy and Rheumatology, Sagamihara, Japan
|
35
|
Identification of SOX4 target genes using phylogenetic footprinting-based prediction from expression microarrays suggests that overexpression of SOX4 potentiates metastasis in hepatocellular carcinoma. Oncogene 2008; 27:5578-89. [PMID: 18504433 DOI: 10.1038/onc.2008.168] [Citation(s) in RCA: 167] [Impact Index Per Article: 9.8] [Indexed: 12/15/2022]
Abstract
A comprehensive microarray analysis of hepatocellular carcinoma (HCC) revealed distinct synexpression patterns during intrahepatic metastasis. Recent evidence has demonstrated that synexpression group member genes are likely to be regulated by master control gene(s). Here we investigate the functions and gene regulation of the transcription factor SOX4 in intrahepatic metastatic HCC. SOX4 is important in tumor metastasis as RNAi knockdown reduces tumor cell migration, invasion, in vivo tumorigenesis and metastasis. A multifaceted approach integrating gene profiling, binding site computation and empirical verification by chromatin immunoprecipitation and gene ablation refined the consensus SOX4 binding motif and identified 32 binding loci in 31 genes with high confidence. RNAi knockdown of two SOX4 target genes, neuropilin 1 and semaphorin 3C, drastically reduced cell migration activity in HCC cell lines suggesting that SOX4 exerts some of its action via regulation of these two downstream targets. The discovery of 31 previously unidentified targets expands our knowledge of how SOX4 modulates HCC progression and implies a range of novel SOX4 functions. This integrated approach sets a paradigm whereby a subset of member genes from a synexpression group can be regulated by one master control gene and this is exemplified by SOX4 and advanced HCC.
|
36
|
Kim BC, Kim WY, Park D, Chung WH, Shin KS, Bhak J. SNP@Promoter: a database of human SNPs (single nucleotide polymorphisms) within the putative promoter regions. BMC Bioinformatics 2008; 9 Suppl 1:S2. [PMID: 18315851 PMCID: PMC2259403 DOI: 10.1186/1471-2105-9-s1-s2] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Analysis of single nucleotide polymorphisms (SNPs) is becoming a key research area in genomics. Many functional analyses of SNPs have been carried out for coding regions and splicing sites, where they can alter proteins and mRNA splicing. However, SNPs in non-coding regulatory regions can also influence important biological regulation. Presently, there are few databases for SNPs in non-coding regulatory regions. DESCRIPTION We identified 488,452 human SNPs in the putative promoter regions that extended from the +5000 bp to -500 bp region of the transcription start sites. Some SNPs occurring in transcription factor (TF) binding sites were also predicted (47,832 SNPs; 9.8%). The result is stored in a database: SNP@Promoter. Users can search the SNP@Promoter database using three entries: 1) by SNP identifier (rs number from dbSNP), 2) by gene (gene name, gene symbol, RefSeq ID), and 3) by disease term. The SNP@Promoter database provides extensive genetic information and graphical views of queried terms. CONCLUSION We present the SNP@Promoter database. It was created in order to predict functional SNPs in putative promoter regions and predicted transcription factor binding sites. SNP@Promoter will help researchers to identify functional SNPs in non-coding regions.
Affiliation(s)
- Byoung-Chul Kim
- Korean BioInformation Center (KOBIC), KRIBB, Daejeon 305-806, Korea.
|
37
|
Hu Y, Papagerakis P, Ye L, Feng JQ, Simmer JP, Hu JCC. Distal cis-regulatory elements are required for tissue-specific expression of enamelin (Enam). Eur J Oral Sci 2008; 116:113-23. [PMID: 18353004 DOI: 10.1111/j.1600-0722.2007.00519.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
Enamel formation is orchestrated by the sequential expression of genes encoding enamel matrix proteins; however, the mechanisms sustaining the spatio-temporal order of gene transcription during amelogenesis are poorly understood. The aim of this study was to characterize the cis-regulatory sequences necessary for normal expression of enamelin (Enam). Several enamelin transcription regulatory regions, showing high sequence homology among species, were identified. DNA constructs containing 5.2 or 3.9 kb regions upstream of the enamelin translation initiation site were linked to a LacZ reporter and used to generate transgenic mice. Only the 5.2-Enam-LacZ construct was sufficient to recapitulate the endogenous pattern of enamelin tooth-specific expression. The 3.9-Enam-LacZ transgenic lines showed no expression in dental cells, but ectopic beta-galactosidase activity was detected in osteoblasts. Potential transcription factor-binding sites were identified that may be important in controlling enamelin basal promoter activity and in conferring enamelin tissue-specific expression. Our study provides new insights into regulatory mechanisms governing enamelin expression.
Affiliation(s)
- Yuanyuan Hu
- Department of Orthodontics and Pediatric Dentistry, University of Michigan School of Dentistry, Ann Arbor, MI 48108, USA
|
38
|
Visel A, Prabhakar S, Akiyama JA, Shoukry M, Lewis KD, Holt A, Plajzer-Frick I, Afzal V, Rubin EM, Pennacchio LA. Ultraconservation identifies a small subset of extremely constrained developmental enhancers. Nat Genet 2008; 40:158-60. [PMID: 18176564 DOI: 10.1038/ng.2007.55] [Citation(s) in RCA: 251] [Impact Index Per Article: 14.8] [Received: 08/27/2007] [Accepted: 10/16/2007] [Indexed: 01/29/2023]
Abstract
Extended perfect human-rodent sequence identity of at least 200 base pairs (ultraconservation) is potentially indicative of evolutionary or functional uniqueness. We used a transgenic mouse assay to compare the embryonic enhancer activity of 231 noncoding ultraconserved human genome regions with that of 206 extremely conserved regions lacking ultraconservation. Developmental enhancers were equally prevalent in both populations, suggesting instead that ultraconservation identifies a small, functionally indistinct subset of similarly constrained cis-regulatory elements.
Affiliation(s)
- Axel Visel
- Genomics Division, MS 84-171, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
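The ultraconservation criterion stated in the abstract above (perfect sequence identity over at least 200 base pairs) can be illustrated with a short Python sketch. It is a simplified illustration, not the pipeline used in the study: it assumes a hypothetical pairwise alignment supplied as two equal-length strings with `-` for gaps, and it compares characters case-sensitively.

```python
def identical_runs(aln_a, aln_b, min_len=200):
    """Return (start, end) alignment columns, end exclusive, of gap-free
    runs where the two aligned sequences are identical for >= min_len columns.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    runs = []
    start = None  # column where the current identical run began, if any
    for i, (a, b) in enumerate(zip(aln_a, aln_b)):
        if a == b and a != "-":
            if start is None:
                start = i
        else:
            # A mismatch or gap closes the current run.
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(aln_a) - start >= min_len:
        runs.append((start, len(aln_a)))
    return runs


if __name__ == "__main__":
    seq = "ACGT" * 60                      # 240 columns of toy sequence
    mutated = seq[:100] + "C" + seq[101:]  # single mismatch at column 100
    print(identical_runs(seq, seq))                   # one 240-column run
    print(identical_runs(seq, mutated))               # no run reaches 200
    print(identical_runs(seq, mutated, min_len=100))  # two shorter runs
```

Lowering `min_len` gives a crude picture of how the number of qualifying elements depends on the chosen length threshold.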
|
39
|
Woolfe A, Elgar G. Organization of conserved elements near key developmental regulators in vertebrate genomes. Advances in Genetics 2008; 61:307-38. [PMID: 18282512 DOI: 10.1016/s0065-2660(07)00012-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 01/01/2023]
Abstract
Sequence conservation has traditionally been used as a means to target functional regions of complex genomes. In addition to its use in identifying coding regions of genes, the recent availability of whole genome data for a number of vertebrates has permitted high-resolution analyses of the noncoding "dark matter" of the genome. This has resulted in the identification of a large number of highly conserved sequence elements that appear to be preserved in all bony vertebrates. Further positional analysis of these conserved noncoding elements (CNEs) in the genome demonstrates that they cluster around genes involved in developmental regulation. This chapter describes the identification and characterization of these elements, with particular reference to their composition and organization.
Affiliation(s)
- Adam Woolfe
- School of Biological and Chemical Sciences, Queen Mary, University of London, London E1 4NS, United Kingdom
|
40
|
Fan X, Zhu J, Schadt EE, Liu JS. Statistical power of phylo-HMM for evolutionarily conserved element detection. BMC Bioinformatics 2007; 8:374. [PMID: 17919331 PMCID: PMC2194792 DOI: 10.1186/1471-2105-8-374] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 05/10/2007] [Accepted: 10/05/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND An important goal of comparative genomics is the identification of functional elements through conservation analysis. Phylo-HMM was recently introduced to detect conserved elements based on multiple genome alignments, but the method has not been rigorously evaluated. RESULTS We report here a simulation study to investigate the power of phylo-HMM. We show that the power of the phylo-HMM approach depends on many factors, the most important being the number of species-specific genomes used and the evolutionary distances between pairs of species. This finding is consistent with results reported by other groups for simpler comparative genomics models. In addition, the conservation ratio of conserved elements and the expected length of the conserved elements are also major factors. In contrast, the tree topology and the nucleotide substitution model have relatively minor influence. CONCLUSION Our results provide general guidelines on how to select the number of genomes and their evolutionary distance in comparative genomics studies, as well as the level of power we can expect under different parameter settings.
Affiliation(s)
- Xiaodan Fan
- Department of Statistics, Harvard University, Boston, MA, USA.
|
41
|
Abstract
The elucidation of a growing number of species' genomes heralds an unprecedented opportunity to ascertain functional attributes of non-coding sequences. In particular, cis regulatory modules (CRMs) controlling gene expression constitute a rich treasure trove of data to be defined and experimentally validated. Such information will provide insight into cell lineage determination and differentiation and the genetic basis of heritable diseases as well as the development of novel tools for restricting the inactivation of genes to specific cell types or conditions. Historically, the study of CRMs and their individual transcription factor binding sites has been limited to proximal regions around gene loci. Two important by-products of the genomics revolution, artificial chromosome vectors and comparative genomics, have fueled efforts to define an increasing number of CRMs acting remotely to control gene expression. Such regulation from a distance has challenged our perspectives of gene expression control and perhaps the very definition of a gene. This review summarizes current approaches to characterize remote control of gene expression in transgenic mice and inherent limitations for accurately interpreting the essential nature of CRM activity.
Affiliation(s)
- Xiaochun Long
- Cardiovascular Research Institute, University of Rochester School of Medicine, Rochester, New York 14642, USA
|
42
|
Kovaleva GY, Bazykin GA, Brudno M, Gelfand MS. Comparative genomics of transcriptional regulation in yeasts and its application to identification of a candidate alpha-isopropylmalate transporter. J Bioinform Comput Biol 2007; 4:981-98. [PMID: 17099937 DOI: 10.1142/s0219720006002284] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 02/07/2006] [Revised: 05/17/2006] [Accepted: 06/21/2006] [Indexed: 01/14/2023]
Abstract
Conservation rates in non-protein-coding regions of five yeast genomes of the genus Saccharomyces were analyzed using multiple whole-genome alignments. This analysis confirmed the previously reported decrease in conservation rates immediately upstream of the translation start point and downstream of the stop codon. Further, there was a sharp conservation peak in the upstream regions, likely related to the core promoter (-35 bp to +35 bp around the TSS), and a conservation peak downstream of the stop codon whose function is not yet clear. Regulation of leucine and methionine biosynthesis controlled by the global regulator Gcn4p and pathway-specific regulators was analyzed in detail. A candidate alpha-isopropylmalate carrier, YOR271cp, was identified based on conservation of Leu3p binding sites, analysis of ChIP-chip data, protein localization and sequence similarity.
Affiliation(s)
- Galina Yu Kovaleva
- Department of Bioengineering and Bioinformatics, Moscow State University, Moscow, Russia.
|
43
|
Abstract
With the availability of genomic sequence from numerous vertebrates, a paradigm shift has occurred in the identification of distant-acting gene regulatory elements. In contrast to traditional gene-centric studies in which investigators randomly scanned genomic fragments that flank genes of interest in functional assays, the modern approach begins electronically with publicly available comparative sequence datasets that provide investigators with prioritized lists of putative functional sequences based on their evolutionary conservation. However, although a large number of tools and resources are now available, application of comparative genomic approaches remains far from trivial. In particular, it requires users to dynamically consider the species and methods for comparison depending on the specific biological question under investigation. While there is currently no single general rule to this end, it is clear that when applied appropriately, comparative genomic approaches exponentially increase our power in generating biological hypotheses for subsequent experimental testing. It is anticipated that cardiac-related genes and the identification of their distant-acting transcriptional enhancers are particularly poised to benefit from these modern capabilities.
Affiliation(s)
- Axel Visel
- Genomics Division, MS 84-171, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
|
44
|
Abstract
Computational biology is a rapidly evolving area where methodologies from computer science, mathematics, and statistics are applied to address fundamental problems in biology. The study of gene regulatory information is a central problem in current computational biology. This article reviews recent developments in statistical methods related to this field. Starting from microarray gene selection, we examine methods for finding transcription factor binding motifs and cis-regulatory modules in coregulated genes, and methods for utilizing information from cross-species comparisons and ChIP-chip experiments. The ultimate understanding of cis-regulatory logic in mammalian genomes may require the integration of information collected from all these steps.
Affiliation(s)
- Hongkai Ji
- Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, Massachusetts 02138, USA.
|
45
|
Wang H, Zhang Y, Cheng Y, Zhou Y, King DC, Taylor J, Chiaromonte F, Kasturi J, Petrykowska H, Gibb B, Dorman C, Miller W, Dore LC, Welch J, Weiss MJ, Hardison RC. Experimental validation of predicted mammalian erythroid cis-regulatory modules. Genome Res 2006; 16:1480-92. [PMID: 17038566 PMCID: PMC1665632 DOI: 10.1101/gr.5353806] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.8] [Received: 04/06/2006] [Accepted: 06/07/2006] [Indexed: 11/25/2022]
Abstract
Multiple alignments of genome sequences are helpful guides to functional analysis, but predicting cis-regulatory modules (CRMs) accurately from such alignments remains an elusive goal. We predict CRMs for mammalian genes expressed in red blood cells by combining two properties gleaned from aligned, noncoding genome sequences: a positive regulatory potential (RP) score, which detects similarity to patterns in alignments distinctive for regulatory regions, and conservation of a binding site motif for the essential erythroid transcription factor GATA-1. Within eight target loci, we tested 75 noncoding segments by reporter gene assays in transiently transfected human K562 cells and/or after site-directed integration into murine erythroleukemia cells. Segments with a high RP score and a conserved exact match to the binding site consensus are validated at a good rate (50%-100%, with rates increasing at higher RP), whereas segments with lower RP scores or nonconsensus binding motifs tend to be inactive. Active DNA segments were shown to be occupied by GATA-1 protein by chromatin immunoprecipitation, whereas sites predicted to be inactive were not occupied. We verify four previously known erythroid CRMs and identify 28 novel ones. Thus, high RP in combination with another feature of a CRM, such as a conserved transcription factor binding site, is a good predictor of functional CRMs. Genome-wide predictions based on RP and a large set of well-defined transcription factor binding sites are available through servers at http://www.bx.psu.edu/.
Affiliation(s)
- Hao Wang
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Biochemistry and Molecular Biology
- Ying Zhang
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Intercollege Graduate Degree Program in Genetics
- Yong Cheng
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Biochemistry and Molecular Biology
- Yuepin Zhou
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Biochemistry and Molecular Biology
- David C. King
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Intercollege Graduate Degree Program in Integrative Biosciences
- James Taylor
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Computer Science and Engineering
- Francesca Chiaromonte
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Statistics
- Jyotsna Kasturi
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Computer Science and Engineering
- Hanna Petrykowska
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Biochemistry and Molecular Biology
- Brian Gibb
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Biochemistry and Molecular Biology
- Christine Dorman
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Biochemistry and Molecular Biology
- Webb Miller
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Computer Science and Engineering
- Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Louis C. Dore
- Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- John Welch
- Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Mitchell J. Weiss
- Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Ross C. Hardison
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Biochemistry and Molecular Biology
|
46
|
Abstract
We introduce a new system, called shortHMM, for predicting exons, which predicts individual exons using two related genomes. In this system, we build a hidden semi-Markov model to identify exons. In the hidden Markov model, we propose joint probability models of nucleotides in introns, splice sites, 5'UTR, 3'UTR, and intergenic regions by exploiting the homology between related genomes. In order to reduce the false positive rate of the hidden Markov model, we develop a screening process which is able to identify intergenic regions. We then build a classifier by combining the statistics from the hidden Markov model and the screening process. We implement shortHMM on human-mouse sequence alignments. The source codes are available at < www.stat.purdue.edu/ jingwu/hmm >. Compared to TWINSCAN and SLAM, shortHMM is substantially more powerful in identifying AT-rich RefSeq exons (8% more AT-rich RefSeq exons were predicted), as well as slightly more powerful in identifying RefSeq exons (3-10% more RefSeq exons were predicted), at a similar or lower false positive rate, with less computing time and with less memory usage. Last, shortHMM is also capable of finding new potential exons.
Affiliation(s)
- Jing Wu
- Department of Statistics, Purdue University, West Lafayette, Indiana 47906, USA.
|
47
|
Abstract
Global gene expression profiling of hepatocellular carcinoma (HCC) is a promising new technology that has already refined the diagnosis and prognostic predictions of HCC patients. This has been accomplished by identifying genes whose expression pattern is associated with clinicopathological features of HCC tumors. Molecular characterization of HCC from gene expression profiling studies will undoubtedly improve the prediction of treatment responses, selection of treatments for specific molecular subtypes of HCC and ultimately the clinical outcome of HCC patients. The research focus is now shifting toward the identification of genetic determinants that are components of the specific regulatory pathways altered in cancers, and that may constitute novel therapeutic targets. Here we review the recent advances in gene expression profiling of HCC and discuss the future strategies for analysing large and complicated data sets from microarray studies and how to integrate these with diverse genomic data.
Affiliation(s)
- J-S Lee
- Laboratory of Experimental Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892-4262, USA
|
48
|
Abstract
The regulation of gene expression plays an important role in complex phenotypes, including disease in humans. For some genes, the genetic mechanisms influencing gene expression are well elucidated; however, it is unclear how applicable these results are to gene expression on a genome-wide level. Studies in model organisms and humans have clearly documented gene expression variation among individuals and shown that a significant proportion of this variation has a genetic basis. Recent studies combine microarray surveys of gene expression for thousands of genes with dense marker maps, and are beginning to identify regions in the human genome that have functional effects on gene expression. This paper reviews recent developments and methodologies in this field, and discusses implications and future directions of this research in the context of understanding the influence of human genomic variation on the regulation of gene expression.
Affiliation(s)
- Barbara E Stranger
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Emmanouil T Dermitzakis
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
|
49
|
Walker SR, Nelson EA, Frank DA. STAT5 represses BCL6 expression by binding to a regulatory region frequently mutated in lymphomas. Oncogene 2006; 26:224-33. [PMID: 16819511 DOI: 10.1038/sj.onc.1209775] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.1] [Indexed: 11/09/2022]
Abstract
Deregulated expression of BCL6 is a pathogenic event in many lymphomas. BCL6 blocks cellular differentiation by repressing transcription of its target genes, and this may promote tumorigenesis. Conversely, the transcription factor signal transducers and activators of transcription (STAT)5 promotes differentiation in many systems. STAT5 upregulates a number of genes repressed by BCL6, raising the possibility that STAT5 and BCL6 have opposing roles in transcriptional regulation. Therefore, we sought to determine the effects of STAT5 activation on BCL6 expression and function. We found that activation of STAT5 downregulates BCL6 expression in B-lymphoma cells and other hematopoietic cell lines. We identified two potential STAT-binding regions in the first exon and first intron of BCL6 that fell within regions of high inter-species homology, suggesting conservation of regulatory function. STAT5 can bind inducibly and regulate transcription at one of these regions, identifying BCL6 as a STAT5 target gene. Additionally, STAT5-mediated downregulation of BCL6 results in loss of BCL6 repression of its target genes, confirming that STAT5 is a negative regulator of BCL6 function. The STAT5 responsive region of the BCL6 gene is mutated frequently in B-cell lymphomas, suggesting that loss of the repressive effects of STAT5 on BCL6 might contribute to the pathogenesis of these cancers.
Affiliation(s)
- S R Walker
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School and Brigham and Women's Hospital, Boston, MA 02115, USA
|
50
|
Gerlach F, Avivi A, Joel A, Burmester T, Nevo E, Hankeln T. Genomic Organization and Molecular Evolution of the Genes for Neuroglobin and Cytoglobin in the Hypoxiatolerant Israeli Mole Rat, Spalax Carmeli. Isr J Ecol Evol 2006. [DOI: 10.1560/ijee_52_3-4_389] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
The genes for the two respiratory proteins neuroglobin (Ngb) and cytoglobin (Cygb) in the subterranean Israeli mole rat Spalax carmeli have been sequenced and compared to other mammals including human. Coding regions of both Spalax genes are highly conserved on the nucleotide and amino acid level. The ratios of non-synonymous to synonymous nucleotide substitutions suggest strong purifying selection acting on Ngb and Cygb in all mammals. Thus, there appears to be no special sequence level adaptation in the two respiratory proteins within the hypoxia-tolerant mole rat. On the genomic level, Spalax Ngb and Cygb gene regions revealed the conserved 4-exon-3-intron structure and conserved CpG-rich islands in the 5' region. The Spalax Cygb gene promoter contains a conserved hypoxia-responsive transcription factor binding site, indicating a possible up-regulation of Cygb under oxygen deprivation. In Cygb intron 1, we observed a stretch of highly conserved putatively non-coding sequence of yet unknown (regulatory?) importance. In the Spalax Ngb gene, we note the presence of candidate hypoxia-responsive elements, which are not conserved in Ngb of hypoxia-sensitive mammals. Both globin gene regions harbor Spalax-specific simple sequence regions, which might be of adaptive value. We conclude that adaptations for hypoxia in mole rats are most likely to be found in regulatory functions rather than in protein structure.
Affiliation(s)
- Frank Gerlach
- Institute of Molecular Genetics, Johannes Gutenberg-University Mainz, J.-J. Becherweg 32
- Biocenter Grindel, University of Hamburg, Martin-Luther-King-Platz 3
- Alma Joel
- Institute of Evolution, University of Haifa
- Thomas Hankeln
- Institute of Molecular Genetics, Johannes Gutenberg-University Mainz, J.-J. Becherweg 32
|