1
Yu W, Chakravarthi VP, Borosha S, Dilower I, Lee EB, Ratri A, Starks RR, Fields PE, Wolfe MW, Faruque MO, Tuteja G, Rumi MAK. Transcriptional regulation of Satb1 in mouse trophoblast stem cells. Front Cell Dev Biol 2022; 10:918235. [PMID: 36589740 PMCID: PMC9795202 DOI: 10.3389/fcell.2022.918235] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/12/2022] [Accepted: 11/18/2022] [Indexed: 12/15/2022] Open
Abstract
SATB homeobox proteins are important regulators of developmental gene expression. Among the stem cell lineages that emerge during early embryonic development, trophoblast stem (TS) cells exhibit robust SATB expression. Both SATB1 and SATB2 act to maintain the trophoblast stem-state. However, the molecular mechanisms that regulate TS-specific Satb expression are not yet known. We identified Satb1 variant 2 as the predominant transcript in trophoblasts. Histone marks and RNA polymerase II occupancy in TS cells indicated an active state of the promoter. A novel cis-regulatory region with active histone marks was identified ∼21 kbp upstream of the variant 2 promoter. CRISPR/Cas9-mediated disruption of this sequence decreased Satb1 expression in TS cells, and chromosome conformation capture analysis confirmed looping of this distant regulatory region to the proximal promoter. Scanning position weight matrices across the enhancer predicted two ELF5 binding sites in close proximity to SATB1 sites, which were confirmed by chromatin immunoprecipitation. Knockdown of ELF5 downregulated Satb1 expression in TS cells, and overexpression of ELF5 increased the enhancer-reporter activity. Interestingly, ELF5 interacts with SATB1 in TS cells, and the enhancer activity was upregulated following SATB overexpression. Our findings indicate that trophoblast-specific Satb1 expression is regulated by long-range chromatin looping of an enhancer that interacts with ELF5 and SATB proteins.
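The abstract mentions scanning position weight matrices (PWMs) across an enhancer to predict binding sites. A minimal sketch of that general idea follows; it is not the authors' pipeline. The count matrix, background frequencies, pseudocount, threshold, and test sequence are all hypothetical values chosen for illustration, and only the forward strand is scanned.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# Illustrative only: this is NOT a real ELF5 or SATB1 motif.
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 9, "C": 0, "G": 0, "T": 1},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform
PSEUDOCOUNT = 0.5  # assumed; avoids log(0) for unobserved bases


def log_odds_matrix(counts, background, pseudocount):
    """Convert per-position base counts into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Slide the matrix along the forward strand; return (start, window, score) hits."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


PWM = log_odds_matrix(COUNTS, BACKGROUND, PSEUDOCOUNT)
hits = scan("TTAGGATCCAGGAAGT", PWM, threshold=5.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

In practice a scan of this kind would also cover the reverse complement and use a curated motif library with a calibrated score threshold; those choices are outside what the abstract specifies.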
Affiliation(s)
- Wei Yu
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- V. Praveen Chakravarthi
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- Shaon Borosha
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- Iman Dilower
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- Eun Bee Lee
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- Anamika Ratri
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- Rebekah R. Starks
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, United States
- Patrick E. Fields
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- Michael W. Wolfe
  - Department of Cell Biology and Physiology, University of Kansas Medical Center, Kansas City, KS, United States
- M. Omar Faruque
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- Geetu Tuteja
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, United States
- M. A. Karim Rumi
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
  - Correspondence: M. A. Karim Rumi
2
WhichTF is functionally important in your open chromatin data? PLoS Comput Biol 2022; 18:e1010378. [PMID: 36040971 PMCID: PMC9426921 DOI: 10.1371/journal.pcbi.1010378] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 03/11/2022] [Accepted: 07/11/2022] [Indexed: 11/19/2022] Open
Abstract
We present WhichTF, a computational method to identify functionally important transcription factors (TFs) from chromatin accessibility measurements. To rank TFs, WhichTF applies an ontology-guided functional approach to compute novel enrichment by integrating accessibility measurements, high-confidence pre-computed conservation-aware TF binding sites, and putative gene-regulatory models. Comparison with prior sheer abundance-based methods reveals the unique ability of WhichTF to identify context-specific TFs with functional relevance, including NF-κB family members in lymphocytes and GATA factors in cardiac cells. To distinguish the transcriptional regulatory landscape in closely related samples, we apply differential analysis and demonstrate its utility in lymphocyte, mesoderm developmental, and disease cells. We find suggestive, under-characterized TFs, such as RUNX3 in mesoderm development and GLI1 in systemic lupus erythematosus. We also find TFs known for stress response, suggesting routine experimental caveats that warrant careful consideration. WhichTF yields biological insight into known and novel molecular mechanisms of TF-mediated transcriptional regulation in diverse contexts, including human and mouse cell types, cell fate trajectories, and disease-associated cells.
Transcription factors (TFs), a class of DNA binding proteins, regulate tissue- and cell-type-specific expression of genes. Identifying the critical TFs in a given cellular context leads to investigating molecular regulatory mechanisms in development, differentiation, and disease. Because there are more than 1,500 human TFs, experimental measurements of genome-wide occupancy across all TFs have been challenging. While computational approaches play pivotal roles, most existing methods rely on statistical enrichment, focusing either on sequence motif similarity recognized by TFs or the similarity of the genomic region of interest with the previously characterized TF occupancy profile. Here we propose WhichTF as an alternative, incorporating curated biomedical knowledge from ontology and integrating it with the high-confidence prediction of conserved TF binding sites in user-provided genomic regions of interest. We develop a new WhichTF score to rank TFs and demonstrate its applicability across human and mouse cell types, cellular differentiation trajectories, and disease-associated cells.
3
Heller IS, Guenther CA, Meireles AM, Talbot WS, Kingsley DM. Characterization of mouse Bmp5 regulatory injury element in zebrafish wound models. Bone 2022; 155:116263. [PMID: 34826632 PMCID: PMC9007314 DOI: 10.1016/j.bone.2021.116263] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/10/2021] [Revised: 11/17/2021] [Accepted: 11/18/2021] [Indexed: 11/21/2022]
Abstract
Many key signaling molecules used to build tissues during embryonic development are re-activated at injury sites to stimulate tissue regeneration and repair. Bone morphogenetic proteins provide a classic example, but the mechanisms that lead to reactivation of BMPs following injury are still unknown. Previous studies have mapped a large "injury response element" (IRE) in the mouse Bmp5 gene that drives gene expression following bone fractures and other types of injury. Here we show that the large mouse IRE region is also activated in both zebrafish tail resection and mechanosensory hair cell injury models. Using the ability to test multiple constructs and image temporal and spatial dynamics following injury responses, we have narrowed the original size of the mouse IRE region by over 100 fold and identified a small 142 bp minimal enhancer that is rapidly induced in both mesenchymal and epithelial tissues after injury. These studies identify a small sequence that responds to evolutionarily conserved local signals in wounded tissues and suggest candidate pathways that contribute to BMP reactivation after injury.
Affiliation(s)
- Ian S Heller
  - Department of Developmental Biology, Stanford University School of Medicine, United States of America
- Catherine A Guenther
  - Department of Developmental Biology, Stanford University School of Medicine, United States of America
  - Howard Hughes Medical Institute, Stanford University School of Medicine, United States of America
- Ana M Meireles
  - Department of Developmental Biology, Stanford University School of Medicine, United States of America
- William S Talbot
  - Department of Developmental Biology, Stanford University School of Medicine, United States of America
- David M Kingsley
  - Department of Developmental Biology, Stanford University School of Medicine, United States of America
  - Howard Hughes Medical Institute, Stanford University School of Medicine, United States of America
4
Starks RR, Abu Alhasan R, Kaur H, Pennington KA, Schulz LC, Tuteja G. Transcription Factor PLAGL1 Is Associated with Angiogenic Gene Expression in the Placenta. Int J Mol Sci 2020; 21:ijms21218317. [PMID: 33171905 PMCID: PMC7664191 DOI: 10.3390/ijms21218317] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/07/2020] [Revised: 10/29/2020] [Accepted: 11/02/2020] [Indexed: 02/07/2023] Open
Abstract
During pregnancy, the placenta is important for transporting nutrients and waste between the maternal and fetal blood supply, secreting hormones, and serving as a protective barrier. To better understand placental development, we must understand how placental gene expression is regulated. We used RNA-seq data and ChIP-seq data for the enhancer-associated mark, H3K27ac, to study gene regulation in the mouse placenta at embryonic day (e) 9.5, when the placenta is developing a complex network of blood vessels. We identified several upregulated transcription factors with enriched binding sites in e9.5-specific enhancers. The most enriched transcription factor, PLAGL1, had a predicted motif in 233 regions that were significantly associated with vasculature development and response to insulin stimulus genes. We then performed several experiments using mouse placenta and a human trophoblast cell line to understand the role of PLAGL1 in placental development. In the mouse placenta, Plagl1 is expressed in endothelial cells of the labyrinth layer and is differentially expressed in placentas from mice with gestational diabetes compared to placentas from control mice in a sex-specific manner. In human trophoblast cells, siRNA knockdown significantly decreased expression of genes associated with placental vasculature development terms. In a tube assay, decreased PLAGL1 expression led to reduced cord formation. These results suggest that Plagl1 regulates overlapping gene networks in placental trophoblast and endothelial cells, and may play a critical role in placental development in normal and complicated pregnancies.
Affiliation(s)
- Rebekah R. Starks
  - Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011, USA
  - Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
- Rabab Abu Alhasan
  - Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Haninder Kaur
  - Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Laura C. Schulz
  - Obstetrics, Gynecology and Women’s Health, University of Missouri, Columbia, MO 65212, USA
- Geetu Tuteja
  - Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011, USA
  - Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
  - Corresponding author
5
Marcovitz A, Turakhia Y, Chen HI, Gloudemans M, Braun BA, Wang H, Bejerano G. A functional enrichment test for molecular convergent evolution finds a clear protein-coding signal in echolocating bats and whales. Proc Natl Acad Sci U S A 2019; 116:21094-21103. [PMID: 31570615 PMCID: PMC6800341 DOI: 10.1073/pnas.1818532116] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/23/2022] Open
Abstract
Distantly related species entering similar biological niches often adapt by evolving similar morphological and physiological characters. How much genomic molecular convergence (particularly of highly constrained coding sequence) contributes to convergent phenotypic evolution, such as echolocation in bats and whales, is a long-standing fundamental question. Like others, we find that convergent amino acid substitutions are not more abundant in echolocating mammals compared to their outgroups. However, we also ask a more informative question about the genomic distribution of convergent substitutions by devising a test to determine which, if any, of more than 4,000 tissue-affecting gene sets is most statistically enriched with convergent substitutions. We find that the gene set most overrepresented (q-value = 2.2e-3) with convergent substitutions in echolocators, affecting 18 genes, regulates development of the cochlear ganglion, a structure with empirically supported relevance to echolocation. Conversely, when comparing to nonecholocating outgroups, no significant gene set enrichment exists. For aquatic and high-altitude mammals, our analysis highlights 15 and 16 genes from the gene sets most affected by molecular convergence which regulate skin and lung physiology, respectively. Importantly, our test requires that the most convergence-enriched set cannot also be enriched for divergent substitutions, such as in the pattern produced by inactivated vision genes in subterranean mammals. Showing a clear role for adaptive protein-coding molecular convergence, we discover nearly 2,600 convergent positions, highlight 77 of them in 3 organs, and provide code to investigate other clades across the tree of life.
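The abstract describes asking which gene set is most enriched for convergent substitutions. As a rough illustration of gene-set enrichment in general, the sketch below ranks toy gene sets by a one-sided hypergeometric tail probability. This is a generic stand-in, not the test used in the paper: it omits multiple-testing correction and the control against divergent substitutions that the abstract mentions, and every gene name and set here is hypothetical.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` genes from `population`, of which `successes` are hits."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


# Hypothetical universe of 20 genes, 5 of which carry a convergent substitution.
universe = [f"gene{i}" for i in range(20)]
convergent = {"gene0", "gene1", "gene2", "gene3", "gene4"}

# Hypothetical gene sets (names are placeholders, not real ontology terms).
gene_sets = {
    "set_a": {"gene0", "gene1", "gene2", "gene10"},   # 3 of 4 genes are hits
    "set_b": {"gene4", "gene11", "gene12", "gene13"}, # 1 of 4 genes is a hit
}

p_values = {
    name: hypergeom_tail(len(members & convergent), len(universe), len(convergent), len(members))
    for name, members in gene_sets.items()
}
top_set = min(p_values, key=p_values.get)
print(top_set, p_values)
```

With thousands of gene sets, as in the abstract, the raw tail probabilities would need correction (for example to q-values) before any set is called significant.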
Affiliation(s)
- Amir Marcovitz
  - Department of Developmental Biology, Stanford University, Stanford, CA 94305
- Yatish Turakhia
  - Department of Electrical Engineering, Stanford University, Stanford, CA 94305
- Heidi I Chen
  - Department of Developmental Biology, Stanford University, Stanford, CA 94305
- Benjamin A Braun
  - Department of Computer Science, Stanford University, Stanford, CA 94305
- Haoqing Wang
  - Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305
- Gill Bejerano
  - Department of Developmental Biology, Stanford University, Stanford, CA 94305
  - Department of Computer Science, Stanford University, Stanford, CA 94305
  - Department of Pediatrics, Stanford University, Stanford, CA 94305
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305
6
Madelaine R, Notwell JH, Skariah G, Halluin C, Chen CC, Bejerano G, Mourrain P. A screen for deeply conserved non-coding GWAS SNPs uncovers a MIR-9-2 functional mutation associated to retinal vasculature defects in human. Nucleic Acids Res 2019. [PMID: 29518216 PMCID: PMC5909433 DOI: 10.1093/nar/gky166] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/22/2022] Open
Abstract
Thousands of human disease-associated single nucleotide polymorphisms (SNPs) lie in the non-coding genome, but only a handful have been demonstrated to affect gene expression and human biology. We computationally identified risk-associated SNPs in deeply conserved non-exonic elements (CNEs) potentially contributing to 45 human diseases. We further demonstrated that human CNE1/rs17421627 associated with retinal vasculature defects showed transcriptional activity in the zebrafish retina, while introducing the risk-associated allele completely abolished CNE1 enhancer activity. Furthermore, deletion of CNE1 led to retinal vasculature defects and to a specific downregulation of microRNA-9, rather than MEF2C as predicted by the original genome-wide association studies. Consistent with these results, miR-9 depletion affects retinal vasculature formation, demonstrating MIR-9-2 as a critical gene underpinning the associated trait. Importantly, we validated that other CNEs act as transcriptional enhancers that can be disrupted by conserved non-coding SNPs. This study uncovers disease-associated non-coding mutations that are deeply conserved, providing a path for in vivo testing to reveal their cis-regulated genes and biological roles.
Affiliation(s)
- Romain Madelaine
  - Department of Psychiatry and Behavioral Sciences, Stanford Center for Sleep Sciences and Medicine, Stanford, CA 94305, USA
- Gemini Skariah
  - Department of Psychiatry and Behavioral Sciences, Stanford Center for Sleep Sciences and Medicine, Stanford, CA 94305, USA
- Caroline Halluin
  - Department of Psychiatry and Behavioral Sciences, Stanford Center for Sleep Sciences and Medicine, Stanford, CA 94305, USA
- Gill Bejerano
  - Department of Computer Science, Stanford, CA 94305, USA
  - Department of Developmental Biology, Stanford, CA 94305, USA
  - Division of Medical Genetics, Department of Pediatrics, Stanford, CA 94305, USA
- Philippe Mourrain
  - Department of Psychiatry and Behavioral Sciences, Stanford Center for Sleep Sciences and Medicine, Stanford, CA 94305, USA
  - INSERM 1024, Ecole Normale Supérieure Paris, 75005, France
7
Berger MJ, Wenger AM, Guturu H, Bejerano G. Independent erosion of conserved transcription factor binding sites points to shared hindlimb, vision and external testes loss in different mammals. Nucleic Acids Res 2019; 46:9299-9308. [PMID: 30137416 PMCID: PMC6182171 DOI: 10.1093/nar/gky741] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/13/2017] [Accepted: 08/21/2018] [Indexed: 02/05/2023] Open
Abstract
Genetic variation in cis-regulatory elements is thought to be a major driving force in morphological and physiological changes. However, identifying transcription factor binding events that code for complex traits remains a challenge, motivating novel means of detecting putatively important binding events. Using a curated set of 1154 high-quality transcription factor motifs, we demonstrate that independently eroded binding sites are enriched for independently lost traits in three distinct pairs of placental mammals. We show that these independently eroded events pinpoint the loss of hindlimbs in dolphin and manatee, degradation of vision in naked mole-rat and star-nosed mole, and the loss of external testes in white rhinoceros and Weddell seal. We additionally show that our method may also be utilized with more than two species. Our study exhibits a novel methodology to detect cis-regulatory mutations which help explain a portion of the molecular mechanism underlying complex trait formation and loss.
Affiliation(s)
- Mark J Berger
  - Department of Computer Science, Stanford University, Stanford, CA 94305-5329, USA
- Aaron M Wenger
  - Department of Computer Science, Stanford University, Stanford, CA 94305-5329, USA
- Harendra Guturu
  - Department of Electrical Engineering, Stanford University, Stanford, CA 94305-5008, USA
- Gill Bejerano
  - Department of Computer Science, Stanford University, Stanford, CA 94305-5329, USA
  - Department of Developmental Biology, Stanford University, Stanford, CA 94305-5329, USA
  - Department of Pediatrics, Stanford University, Stanford, CA 94305-5208, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305-5464, USA
8
Starks RR, Biswas A, Jain A, Tuteja G. Combined analysis of dissimilar promoter accessibility and gene expression profiles identifies tissue-specific genes and actively repressed networks. Epigenetics Chromatin 2019; 12:16. [PMID: 30795793 PMCID: PMC6385419 DOI: 10.1186/s13072-019-0260-2] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 12/10/2018] [Accepted: 02/12/2019] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: The assay for transposase-accessible chromatin (ATAC-seq) is a powerful method to examine chromatin accessibility. While many studies have reported a positive correlation between gene expression and promoter accessibility, few have investigated the genes that deviate from this trend. In this study, we aimed to understand the relationship between gene expression and promoter accessibility in multiple cell types while also identifying gene regulatory networks in the placenta, an understudied organ that is critical for a successful pregnancy.
RESULTS: We started by assaying the open chromatin landscape in the mid-gestation placenta, when the fetal vasculature has started developing. After incorporating transcriptomic data generated in the placenta at the same time point, we grouped genes based on their expression levels and ATAC-seq promoter coverage. We found that the genes with the strongest correlation (high expression and high coverage) are likely involved in housekeeping functions, whereas tissue-specific genes were highly expressed and had only medium-low coverage. We also predicted that genes with medium-low expression and high promoter coverage were actively repressed. Within this group, we extracted a protein-protein interaction network enriched for neuronal functions, likely preventing the cells from adopting a neuronal fate. We further confirmed that a repressive histone mark is bound to the promoters of genes in this network. Finally, we ran our pipeline using ATAC-seq and RNA-seq data generated in ten additional cell types. We again found that genes with the strongest correlation are enriched for housekeeping functions and that genes with medium-low promoter coverage and high expression are more likely to be tissue-specific. These results demonstrate that only two data types, both of which require relatively low starting material to generate and are becoming more commonly available, can be integrated to understand multiple aspects of gene regulation.
CONCLUSIONS: Within the placenta, we identified an active placenta-specific gene network as well as a repressed neuronal network. Beyond the placenta, we demonstrate that ATAC-seq data and RNA-seq data can be integrated to identify tissue-specific genes and actively repressed gene networks in multiple cell types.
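The grouping of genes by expression level and promoter coverage described in the abstract can be pictured with a small sketch. The numeric cutoffs, gene names, and values below are hypothetical, since the abstract gives no thresholds, and the abstract's "medium-low" tier is simplified here to "below the high cutoff".

```python
# Hypothetical cutoffs; the abstract does not state numeric thresholds.
EXPRESSION_HIGH = 10.0  # assumed normalized expression units
COVERAGE_HIGH = 50.0    # assumed normalized ATAC-seq promoter coverage


def classify_gene(expression, coverage,
                  expr_high=EXPRESSION_HIGH, cov_high=COVERAGE_HIGH):
    """Assign a gene to one of the groups named in the abstract."""
    high_expr = expression >= expr_high
    high_cov = coverage >= cov_high
    if high_expr and high_cov:
        return "housekeeping-like"
    if high_expr:
        return "tissue-specific candidate"
    if high_cov:
        return "candidate actively repressed"
    return "low expression, low coverage"


# Toy (expression, coverage) pairs; the gene names are placeholders.
genes = {
    "gene_a": (120.0, 90.0),
    "gene_b": (80.0, 12.0),
    "gene_c": (1.5, 75.0),
    "gene_d": (0.2, 3.0),
}
groups = {name: classify_gene(expr, cov) for name, (expr, cov) in genes.items()}
for name, group in groups.items():
    print(name, group)
```

Choosing the cutoffs from the data (for example by quantiles within each sample) would be a natural refinement, but that is a design choice beyond what the abstract states.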
Affiliation(s)
- Rebekah R. Starks
  - Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011 USA
  - Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011 USA
- Anilisa Biswas
  - Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011 USA
  - Molecular, Cellular, and Developmental Biology, Iowa State University, Ames, IA 50011 USA
- Ashish Jain
  - Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011 USA
  - Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011 USA
- Geetu Tuteja
  - Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011 USA
  - Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011 USA
  - Molecular, Cellular, and Developmental Biology, Iowa State University, Ames, IA 50011 USA
9
Chen HI, Jagadeesh KA, Birgmeier J, Wenger AM, Guturu H, Schelley S, Bernstein JA, Bejerano G. An MTF1 binding site disrupted by a homozygous variant in the promoter of ATP7B likely causes Wilson Disease. Eur J Hum Genet 2018; 26:1810-1818. [PMID: 30087448 PMCID: PMC6244090 DOI: 10.1038/s41431-018-0221-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 12/06/2017] [Revised: 05/09/2018] [Accepted: 06/26/2018] [Indexed: 12/16/2022] Open
Abstract
Approximately 2% of the human genome consists of protein-coding sequence, yet most known Mendelian disease-causing variants lie in exons or splice sites. Individuals who symptomatically present with monogenic disorders but do not possess function-altering variants in the protein-coding regions of causative genes may harbor variants in the surrounding gene regulatory domains. We present such a case: a male of Afghani descent was clinically diagnosed with Wilson Disease, a disorder of systemic copper buildup, but was found to have no function-altering coding variants in ATP7B (ENST00000242839.4), the typically causative gene. Our analysis revealed the homozygous variant chr13:g.52,586,149T>C (NC_000013.10, hg19) 676 bp into the ATP7B promoter, which disrupts a metal regulatory transcription factor 1 (MTF1) binding site and diminishes expression of ATP7B in response to copper intake, likely resulting in Wilson Disease. Our approach to identify the causative variant can be generalized to systematically discover function-altering non-coding variants underlying disease and motivates evaluation of gene regulatory variants.
Affiliation(s)
- Heidi I Chen
  - Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA
- Karthik A Jagadeesh
  - Department of Computer Science, Stanford University School of Engineering, Stanford, CA, USA
- Johannes Birgmeier
  - Department of Computer Science, Stanford University School of Engineering, Stanford, CA, USA
- Aaron M Wenger
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Harendra Guturu
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Susan Schelley
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Jonathan A Bernstein
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Gill Bejerano
  - Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Computer Science, Stanford University School of Engineering, Stanford, CA, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
10
Erosion of Conserved Binding Sites in Personal Genomes Points to Medical Histories. PLoS Comput Biol 2016; 12:e1004711. [PMID: 26845687 PMCID: PMC4742230 DOI: 10.1371/journal.pcbi.1004711] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/09/2015] [Accepted: 12/16/2015] [Indexed: 01/02/2023] Open
Abstract
Although many human diseases have a genetic component involving many loci, the majority of studies are statistically underpowered to isolate the many contributing variants, raising the question of the existence of alternate processes to identify disease mutations. To address this question, we collect ancestral transcription factor binding sites disrupted by an individual's variants and then look for their most significant congregation next to a group of functionally related genes. Strikingly, when the method is applied to five different full human genomes, the top enriched function for each is invariably reflective of their very different medical histories. For example, our method implicates "abnormal cardiac output" for a patient with a longstanding family history of heart disease, "decreased circulating sodium level" for an individual with hypertension, and other biologically appealing links for medical histories spanning narcolepsy to axonal neuropathy. Our results suggest that erosion of gene regulation by mutation load significantly contributes to observed heritable phenotypes that manifest in the medical history. The test we developed exposes a hitherto hidden layer of personal variants that promise to shed new light on human disease penetrance, expressivity and the sensitivity with which we can detect them.
11
Chen J, Shishkin AA, Zhu X, Kadri S, Maza I, Guttman M, Hanna JH, Regev A, Garber M. Evolutionary analysis across mammals reveals distinct classes of long non-coding RNAs. Genome Biol 2016; 17:19. [PMID: 26838501 PMCID: PMC4739325 DOI: 10.1186/s13059-016-0880-9] [Citation(s) in RCA: 112] [Impact Index Per Article: 14.0] [Received: 10/26/2015] [Accepted: 01/14/2016] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND: Recent advances in transcriptome sequencing have enabled the discovery of thousands of long non-coding RNAs (lncRNAs) across many species. Though several lncRNAs have been shown to play important roles in diverse biological processes, the functions and mechanisms of most lncRNAs remain unknown. Two significant obstacles lie between transcriptome sequencing and functional characterization of lncRNAs: identifying truly non-coding genes from de novo reconstructed transcriptomes, and prioritizing the hundreds of resulting putative lncRNAs for downstream experimental interrogation.
RESULTS: We present slncky, a lncRNA discovery tool that produces a high-quality set of lncRNAs from RNA-sequencing data and further uses evolutionary constraint to prioritize lncRNAs that are likely to be functionally important. Our automated filtering pipeline is comparable to manual curation efforts and more sensitive than previously published computational approaches. Furthermore, we developed a sensitive alignment pipeline for aligning lncRNA loci and propose new evolutionary metrics relevant for analyzing sequence and transcript evolution. Our analysis reveals that evolutionary selection acts in several distinct patterns, and uncovers two notable classes of intergenic lncRNAs: one showing strong purifying selection on RNA sequence and another where constraint is restricted to the regulation but not the sequence of the transcript.
CONCLUSION: Our results highlight that lncRNAs are not a homogenous class of molecules but rather a mixture of multiple functional classes with distinct biological mechanisms and/or roles. Our novel comparative methods for lncRNAs reveal 233 constrained lncRNAs out of tens of thousands of currently annotated transcripts, which we make available through the slncky Evolution Browser.
Affiliation(s)
- Jenny Chen
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA; Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA, 02140, USA
- Alexander A Shishkin
- Division of Biology and Biological Engineering, California Institute of Technology, Cambridge, MA, 02140, USA
- Xiaopeng Zhu
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, 01655, USA
- Sabah Kadri
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Itay Maza
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
- Mitchell Guttman
- Division of Biology and Biological Engineering, California Institute of Technology, Cambridge, MA, 02140, USA
- Jacob H Hanna
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
- Aviv Regev
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA; Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02140, USA
- Manuel Garber
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, 01655, USA; Program in Molecular Biology, University of Massachusetts Medical School, Worcester, MA, 01655, USA
12
Tuteja G, Chung T, Bejerano G. Changes in the enhancer landscape during early placental development uncover a trophoblast invasion gene-enhancer network. Placenta 2015; 37:45-55. [PMID: 26604129 DOI: 10.1016/j.placenta.2015.11.001] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 07/09/2015] [Revised: 10/21/2015] [Accepted: 11/02/2015] [Indexed: 01/17/2023]
Abstract
INTRODUCTION: Trophoblast invasion establishes adequate blood flow between mother and fetus in early placental development. However, little is known about the cis-regulatory mechanisms underlying this important process. We aimed to identify enhancer elements that are active during trophoblast invasion, and build a trophoblast invasion gene-enhancer network. METHODS: We carried out ChIP-Seq for an enhancer-associated mark (H3K27ac) at two time points during early placental development in mouse: one when invasion is at its peak (e7.5) and another shortly afterwards (e9.5). We used computational analysis to identify putative enhancers, as well as the transcription factor binding sites within them, that are specific to the time point of trophoblast invasion. RESULTS: We compared read profiles at e7.5 and e9.5 to identify 1,977 e7.5-specific enhancers. Within a subset of e7.5-specific enhancers, we discovered a cell migration associated regulatory code, consisting of three transcription factor motifs: AP1, Ets, and Tcfap2. To validate differential expression of the transcription factors that bind these motifs, we performed RNA-Seq in the same context. Finally, we integrated these data with publicly available protein-protein interaction data and constructed a trophoblast invasion gene-enhancer network. DISCUSSION: The data we generated and the analysis we carried out improve our understanding of the regulatory mechanisms of trophoblast invasion by suggesting that a transcriptional code exists in the enhancers of cell migration genes. Furthermore, the network we constructed highlights novel candidate genes that may be critical for trophoblast invasion.
Affiliation(s)
- Geetu Tuteja
- Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA
- Tisha Chung
- Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA
- Gill Bejerano
- Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA; Department of Computer Science, Stanford University, Stanford, CA 94305, USA; Division of Medical Genetics, Department of Pediatrics, Stanford University, Stanford, CA 94305, USA.
13
Notwell JH, Chung T, Heavner W, Bejerano G. A family of transposable elements co-opted into developmental enhancers in the mouse neocortex. Nat Commun 2015; 6:6644. [PMID: 25806706 PMCID: PMC4438107 DOI: 10.1038/ncomms7644] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Received: 07/16/2014] [Accepted: 02/13/2015] [Indexed: 12/27/2022] Open
Abstract
The neocortex is a mammalian-specific structure that is responsible for higher functions such as cognition, emotion and perception. To gain insight into its evolution and the gene regulatory codes that pattern it, we studied the overlap of its active developmental enhancers with transposable element (TE) families and compared this overlap to uniformly shuffled enhancers. Here we show a striking enrichment of the MER130 repeat family among active enhancers in the mouse dorsal cerebral wall, which gives rise to the neocortex, at embryonic day 14.5. We show that MER130 instances preserve a common code of transcriptional regulatory logic, function as enhancers and are adjacent to critical neocortical genes. MER130, a nonautonomous interspersed TE, originates in the tetrapod or possibly Sarcopterygii ancestor, which far predates the appearance of the neocortex. Our results show that MER130 elements were recruited, likely through their common regulatory logic, as neocortical enhancers.
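The comparison of observed enhancer/TE overlap against uniformly shuffled enhancers can be illustrated with a small permutation test. This is only a sketch under simplifying assumptions, not the authors' pipeline: it uses a single linear "genome" of hypothetical length, half-open `(start, end)` intervals, and invented function names (`count_overlaps`, `shuffle_enhancers`, `empirical_p`).

```python
import random


def count_overlaps(enhancers, elements):
    """Count enhancers (start, end) that overlap at least one element interval."""
    return sum(
        any(s < el_end and el_start < e for el_start, el_end in elements)
        for s, e in enhancers
    )


def shuffle_enhancers(enhancers, genome_length, rng):
    """Re-place each enhancer uniformly at random, preserving its length."""
    shuffled = []
    for s, e in enhancers:
        length = e - s
        start = rng.randrange(0, genome_length - length + 1)
        shuffled.append((start, start + length))
    return shuffled


def empirical_p(enhancers, elements, genome_length, n_shuffles=1000, seed=0):
    """Return the observed overlap count and a one-sided empirical p-value."""
    rng = random.Random(seed)
    observed = count_overlaps(enhancers, elements)
    as_extreme = sum(
        count_overlaps(shuffle_enhancers(enhancers, genome_length, rng), elements)
        >= observed
        for _ in range(n_shuffles)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (as_extreme + 1) / (n_shuffles + 1)


# Toy, made-up coordinates purely for illustration.
enhancers = [(100, 200), (300, 400), (900, 1000)]
te_instances = [(150, 160), (350, 360)]
observed, p_value = empirical_p(enhancers, te_instances, genome_length=100_000)
```

A real analysis would shuffle within chromosomes and respect assembly gaps; the sketch only shows the shape of the null comparison.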
Affiliation(s)
- James H Notwell
- Department of Computer Science, Stanford University, 279 Campus Drive West (MC 5329), Beckman Center B-300, Stanford, California 94305-5329, USA
- Tisha Chung
- Department of Developmental Biology, Stanford University, 279 Campus Drive West (MC 5329), Beckman Center B-300, Stanford, California 94305-5329, USA
- Whitney Heavner
- [1] Department of Developmental Biology, Stanford University, 279 Campus Drive West (MC 5329), Beckman Center B-300, Stanford, California 94305-5329, USA [2] Department of Biology, Stanford University, 279 Campus Drive West (MC 5329), Beckman Center B-300, Stanford, California 94305-5329, USA
- Gill Bejerano
- [1] Department of Computer Science, Stanford University, 279 Campus Drive West (MC 5329), Beckman Center B-300, Stanford, California 94305-5329, USA [2] Department of Developmental Biology, Stanford University, 279 Campus Drive West (MC 5329), Beckman Center B-300, Stanford, California 94305-5329, USA [3] Department of Pediatrics, Division of Medical Genetics, Stanford University, 279 Campus Drive West (MC 5329), Beckman Center B-300, Stanford, California 94305-5329, USA
14
Blatti C, Kazemian M, Wolfe S, Brodsky M, Sinha S. Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism. Nucleic Acids Res 2015; 43:3998-4012. [PMID: 25791631 PMCID: PMC4417154 DOI: 10.1093/nar/gkv195] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 02/20/2015] [Accepted: 02/24/2015] [Indexed: 11/17/2022] Open
Abstract
Characterization of cell type specific regulatory networks and elements is a major challenge in genomics, and emerging strategies frequently employ high-throughput genome-wide assays of transcription factor (TF) to DNA binding, histone modifications or chromatin state. However, these experiments remain too difficult or expensive for many laboratories to apply comprehensively to their system of interest. Here, we explore the potential of elucidating regulatory systems in varied cell types using computational techniques that rely only on gene expression data, low-resolution chromatin accessibility data, and TF–DNA binding specificities (‘motifs’). We show that static computational motif scans overlaid with chromatin accessibility data reasonably approximate experimentally measured TF–DNA binding. We demonstrate that predicted binding profiles and expression patterns of hundreds of TFs are sufficient to identify major regulators of ∼200 spatiotemporal expression domains in the Drosophila embryo. We are then able to learn reliable statistical models of enhancer activity for over 70 expression domains and apply those models to annotate domain specific enhancers genome-wide. Throughout this work, we apply our motif and accessibility based approach to comprehensively characterize the regulatory network of fruitfly embryonic development and show that the accuracy of our computational method compares favorably to approaches that rely on data from many experimental assays.
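The idea of overlaying a static motif scan with chromatin accessibility can be sketched as follows. This is an illustrative toy, not the published method: the four-position matrix, the uniform background, the score threshold, and the function names (`log_odds`, `scan`, `accessible_hits`) are all assumptions made for the example.

```python
import math

# Hypothetical 4-position motif: per-position base probabilities (illustrative only).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # uniform base frequency, an assumption


def log_odds(window):
    """Log2 odds of a window under the motif versus the background."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PWM, window))


def scan(seq, threshold):
    """Return (position, score) for every window scoring at or above threshold."""
    k = len(PWM)
    hits = []
    for i in range(len(seq) - k + 1):
        score = log_odds(seq[i:i + k])
        if score >= threshold:
            hits.append((i, score))
    return hits


def accessible_hits(hits, open_intervals, k):
    """Keep hits lying fully inside a half-open accessible interval [start, end)."""
    return [
        (pos, score)
        for pos, score in hits
        if any(a <= pos and pos + k <= b for a, b in open_intervals)
    ]


hits = scan("TTAGGATTTTAGGA", threshold=4.0)
open_hits = accessible_hits(hits, open_intervals=[(0, 8)], k=len(PWM))
```

Here the sequence contains two exact motif matches, but only the one inside the (made-up) accessible interval is retained as a predicted binding site.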
Affiliation(s)
- Charles Blatti
- Department of Computer Science, University of Illinois, Urbana, IL 61801, USA
- Majid Kazemian
- National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Scot Wolfe
- Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01655, USA; Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Michael Brodsky
- Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01655, USA; Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Saurabh Sinha
- Department of Computer Science, University of Illinois, Urbana, IL 61801, USA; Institute of Genomic Biology, University of Illinois, Urbana, IL 61801, USA
15
Aevermann BD, Pickett BE, Kumar S, Klem EB, Agnihothram S, Askovich PS, Bankhead A, Bolles M, Carter V, Chang J, Clauss TRW, Dash P, Diercks AH, Eisfeld AJ, Ellis A, Fan S, Ferris MT, Gralinski LE, Green RR, Gritsenko MA, Hatta M, Heegel RA, Jacobs JM, Jeng S, Josset L, Kaiser SM, Kelly S, Law GL, Li C, Li J, Long C, Luna ML, Matzke M, McDermott J, Menachery V, Metz TO, Mitchell H, Monroe ME, Navarro G, Neumann G, Podyminogin RL, Purvine SO, Rosenberger CM, Sanders CJ, Schepmoes AA, Shukla AK, Sims A, Sova P, Tam VC, Tchitchek N, Thomas PG, Tilton SC, Totura A, Wang J, Webb-Robertson BJ, Wen J, Weiss JM, Yang F, Yount B, Zhang Q, McWeeney S, Smith RD, Waters KM, Kawaoka Y, Baric R, Aderem A, Katze MG, Scheuermann RH. A comprehensive collection of systems biology data characterizing the host response to viral infection. Sci Data 2014; 1:140033. [PMID: 25977790 PMCID: PMC4410982 DOI: 10.1038/sdata.2014.33] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 03/04/2014] [Accepted: 08/15/2014] [Indexed: 12/13/2022] Open
Abstract
The Systems Biology for Infectious Diseases Research program was established by the U.S. National Institute of Allergy and Infectious Diseases to investigate host-pathogen interactions at a systems level. This program generated 47 transcriptomic and proteomic datasets from 30 studies that investigate in vivo and in vitro host responses to viral infections. Human pathogens in the Orthomyxoviridae and Coronaviridae families, especially pandemic H1N1 and avian H5N1 influenza A viruses and severe acute respiratory syndrome coronavirus (SARS-CoV), were investigated. Study validation was demonstrated via experimental quality control measures and meta-analysis of independent experiments performed under similar conditions. Primary assay results are archived at the GEO and PeptideAtlas public repositories, while processed statistical results together with standardized metadata are publicly available at the Influenza Research Database (www.fludb.org) and the Virus Pathogen Resource (www.viprbrc.org). By comparing data from mutant versus wild-type virus and host strains, RNA versus protein differential expression, and infection with genetically similar strains, these data can be used to further investigate genetic and physiological determinants of host responses to viral infection.
Affiliation(s)
- Sanjeev Kumar
- Northrop Grumman Information Systems, Health IT, Rockville, MD 20850, USA
- Edward B Klem
- Northrop Grumman Information Systems, Health IT, Rockville, MD 20850, USA
- Sudhakar Agnihothram
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7400, USA
- Armand Bankhead
- Oregon Clinical & Translational Research Institute, Portland, Oregon 97239-3098, USA; Division of Bioinformatics and Computational Biology, Department of Medical Informatics and Clinical Epidemiology, Oregon Health Sciences University, Portland, Oregon 97239-3098, USA
- Meagen Bolles
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7290, USA
- Victoria Carter
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
- Jean Chang
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
- Therese R W Clauss
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Pradyot Dash
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN 38105-3678, USA
- Alan H Diercks
- Seattle Biomedical Research Institute, Seattle, WA 98109, USA
- Amie J Eisfeld
- School of Veterinary Medicine, Department of Pathobiological Sciences, Influenza Research Institute, University of Wisconsin-Madison, Madison, WI 53706, USA
- Amy Ellis
- School of Veterinary Medicine, Department of Pathobiological Sciences, Influenza Research Institute, University of Wisconsin-Madison, Madison, WI 53706, USA
- Shufang Fan
- School of Veterinary Medicine, Department of Pathobiological Sciences, Influenza Research Institute, University of Wisconsin-Madison, Madison, WI 53706, USA
- Martin T Ferris
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7264, USA
- Lisa E Gralinski
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7290, USA
- Richard R Green
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
- Marina A Gritsenko
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Masato Hatta
- School of Veterinary Medicine, Department of Pathobiological Sciences, Influenza Research Institute, University of Wisconsin-Madison, Madison, WI 53706, USA
- Robert A Heegel
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Jon M Jacobs
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Sophia Jeng
- Oregon Clinical & Translational Research Institute, Portland, Oregon 97239-3098, USA
- Laurence Josset
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
- Shari M Kaiser
- Seattle Biomedical Research Institute, Seattle, WA 98109, USA
- Sara Kelly
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
- G Lynn Law
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
- Chengjun Li
- Division of Animal influenza, Harbin Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Harbin, Heilongjiang Province 150001, China
- Jiangning Li
- Seattle Biomedical Research Institute, Seattle, WA 98109, USA
- Casey Long
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7400, USA
- Maria L Luna
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Melissa Matzke
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Jason McDermott
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Vineet Menachery
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7400, USA
- Thomas O Metz
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Hugh Mitchell
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Matthew E Monroe
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Garnet Navarro
- Seattle Biomedical Research Institute, Seattle, WA 98109, USA
- Gabriele Neumann
- School of Veterinary Medicine, Department of Pathobiological Sciences, Influenza Research Institute, University of Wisconsin-Madison, Madison, WI 53706, USA
- Samuel O Purvine
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Catherine J Sanders
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN 38105-3678, USA
- Athena A Schepmoes
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Anil K Shukla
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Amy Sims
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7400, USA
- Pavel Sova
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
- Vincent C Tam
- Seattle Biomedical Research Institute, Seattle, WA 98109, USA
- Nicolas Tchitchek
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
- Paul G Thomas
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN 38105-3678, USA
- Susan C Tilton
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Allison Totura
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7290, USA
- Jing Wang
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Ji Wen
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Jeffrey M Weiss
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
- Feng Yang
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Boyd Yount
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7400, USA
- Qibin Zhang
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Shannon McWeeney
- Oregon Clinical & Translational Research Institute, Portland, Oregon 97239-3098, USA; Division of Bioinformatics and Computational Biology, Department of Medical Informatics and Clinical Epidemiology, Oregon Health Sciences University, Portland, Oregon 97239-3098, USA
- Richard D Smith
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Katrina M Waters
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Yoshihiro Kawaoka
- School of Veterinary Medicine, Department of Pathobiological Sciences, Influenza Research Institute, University of Wisconsin-Madison, Madison, WI 53706, USA
- Ralph Baric
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7400, USA; Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7290, USA
- Alan Aderem
- Seattle Biomedical Research Institute, Seattle, WA 98109, USA
- Michael G Katze
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA; Washington National Primate Research Center, University of Washington, Seattle, WA 98195, USA
- Richard H Scheuermann
- J. Craig Venter Institute, La Jolla, CA 92037, USA; Department of Pathology, University of California, San Diego, CA 92093, USA
16
Abstract
The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. Address: http://veda.cs.uiuc.edu/MET/.
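As an illustration of what "statistically overrepresented" can mean for a gene set, the sketch below computes a one-sided hypergeometric tail probability. The abstract does not name the statistic MET uses, so the choice of test, the function name `hypergeom_enrichment_p`, and the toy numbers are assumptions made for this example only.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, genes_with_motif, set_size, set_with_motif):
    """P(X >= set_with_motif) when set_size genes are drawn without replacement
    from total_genes, of which genes_with_motif carry the motif."""
    upper = min(genes_with_motif, set_size)
    numerator = sum(
        comb(genes_with_motif, i) * comb(total_genes - genes_with_motif, set_size - i)
        for i in range(set_with_motif, upper + 1)
    )
    return numerator / comb(total_genes, set_size)


# Toy numbers: 10 genes in the genome, 5 with the motif; a 5-gene set in which all 5 carry it.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
```

With many motifs tested per gene set, the resulting p-values would normally need a multiple-testing correction, which this sketch omits.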
Affiliation(s)
- Charles Blatti
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
17
Tuteja G, Moreira KB, Chung T, Chen J, Wenger AM, Bejerano G. Automated discovery of tissue-targeting enhancers and transcription factors from binding motif and gene function data. PLoS Comput Biol 2014; 10:e1003449. [PMID: 24499934 PMCID: PMC3907286 DOI: 10.1371/journal.pcbi.1003449] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 09/23/2013] [Accepted: 12/09/2013] [Indexed: 12/01/2022] Open
Abstract
Identifying enhancers regulating gene expression remains an important and challenging task. While recent sequencing-based methods provide epigenomic characteristics that correlate well with enhancer activity, it remains onerous to comprehensively identify all enhancers across development. Here we introduce a computational framework to identify tissue-specific enhancers evolving under purifying selection. First, we incorporate high-confidence binding site predictions with target gene functional enrichment analysis to identify transcription factors (TFs) likely functioning in a particular context. We then search the genome for clusters of binding sites for these TFs, overcoming previous constraints associated with biased manual curation of TFs or enhancers. Applying our method to the placenta, we find 33 known and implicate 17 novel TFs in placental function, and discover 2,216 putative placenta enhancers. Using luciferase reporter assays, 31/36 (86%) tested candidates drive activity in placental cells. Our predictions agree well with recent epigenomic data in human and mouse, yet over half our loci, including 7/8 (87%) tested regions, are novel. Finally, we establish that our method is generalizable by applying it to 5 additional tissues: heart, pancreas, blood vessel, bone marrow, and liver.
Enhancers are distal gene regulatory elements that can activate tissue- and time-point specific gene expression. Identification of active enhancers is challenging, and is the subject of intense investigation. We developed an automated computational framework to predict transcription factors (TFs) and enhancers that target a tissue of interest by combining two growing resources: TF binding motifs and target gene function annotations. We applied our framework to the placenta, and confirmed our enhancer predictions are more active in placental cell types than others. To demonstrate generalizability, we applied our approach to 5 additional tissues. The combination of experimental sampling with computational prediction approaches will aid in the identification of those enhancers that are most likely active in a particular tissue, as well as the characterization of groups of TFs associated with these enhancers.
Affiliation(s)
- Geetu Tuteja
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
- Karen Betancourt Moreira
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
- Tisha Chung
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
- Jenny Chen
- Biomedical Informatics Program, Stanford University, Stanford, California, United States of America
- Aaron M. Wenger
- Department of Computer Science, Stanford University, Stanford, California, United States of America
- Gill Bejerano
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America; Department of Computer Science, Stanford University, Stanford, California, United States of America
18
Guturu H, Doxey AC, Wenger AM, Bejerano G. Structure-aided prediction of mammalian transcription factor complexes in conserved non-coding elements. Philos Trans R Soc Lond B Biol Sci 2013; 368:20130029. [PMID: 24218641 DOI: 10.1098/rstb.2013.0029] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 01/08/2023] Open
Abstract
Mapping the DNA-binding preferences of transcription factor (TF) complexes is critical for deciphering the functions of cis-regulatory elements. Here, we developed a computational method that compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid TF complexes. Structural data were used to estimate TF complex physical plausibility, explore overlapping motif arrangements seldom tackled by non-structure-aware methods, and generate and analyse three-dimensional models of the predicted complexes bound to DNA. Using this approach, we predicted 422 physically realistic TF complex motifs at 18% false discovery rate, the majority of which (326, 77%) contain some sequence overlap between binding sites. The set of mostly novel complexes is enriched in known composite motifs, predictive of binding site configurations in TF-TF-DNA crystal structures, and supported by ChIP-seq datasets. Structural modelling revealed three cooperativity mechanisms: direct protein-protein interactions, potentially indirect interactions and 'through-DNA' interactions. Indeed, 38% of the predicted complexes were found to contain four or more bases in which TF pairs appear to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. Our TF complex and associated binding site predictions are available as a web resource at http://bejerano.stanford.edu/complex.
Affiliation(s)
- Harendra Guturu
- Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
19
Abstract
When the human genome project started, the major challenge was how to sequence a 3 billion letter code in an organized and cost-effective manner. When completed, the project had laid the foundation for a huge variety of biomedical fields through the production of a complete human genome sequence, but also had driven the development of laboratory and analytical methods that could produce large amounts of sequencing data cheaply. These technological developments made possible the sequencing of many more vertebrate genomes, which have been necessary for the interpretation of the human genome. They have also enabled large-scale studies of vertebrate genome evolution, as well as comparative and human medicine. In this review, we give examples of evolutionary analysis using a wide variety of time frames—from the comparison of populations within a species to the comparison of species separated by at least 300 million years. Furthermore, we anticipate discoveries related to evolutionary mechanisms, adaptation, and disease to quickly accelerate in the coming years.
Affiliation(s)
- Jessica Alföldi
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
20
Wenger AM, Clarke SL, Notwell JH, Chung T, Tuteja G, Guturu H, Schaar BT, Bejerano G. The enhancer landscape during early neocortical development reveals patterns of dense regulation and co-option. PLoS Genet 2013; 9:e1003728. [PMID: 24009522 PMCID: PMC3757057 DOI: 10.1371/journal.pgen.1003728] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 01/10/2013] [Accepted: 07/03/2013] [Indexed: 11/18/2022] Open
Abstract
Genetic studies have identified a core set of transcription factors and target genes that control the development of the neocortex, the region of the human brain responsible for higher cognition. The specific regulatory interactions between these factors, many key upstream and downstream genes, and the enhancers that mediate all these interactions remain mostly uncharacterized. We perform p300 ChIP-seq to identify over 6,600 candidate enhancers active in the dorsal cerebral wall of embryonic day 14.5 (E14.5) mice. Over 95% of the peaks we measure are conserved to human. Eight of ten (80%) candidates tested using mouse transgenesis drive activity in restricted laminar patterns within the neocortex. GREAT based computational analysis reveals highly significant correlation with genes expressed at E14.5 in key areas for neocortex development, and allows the grouping of enhancers by known biological functions and pathways for further studies. We find that multiple genes are flanked by dozens of candidate enhancers each, including well-known key neocortical genes as well as suspected and novel genes. Nearly a quarter of our candidate enhancers are conserved well beyond mammals. Human and zebrafish regions orthologous to our candidate enhancers are shown to most often function in other aspects of central nervous system development. Finally, we find strong evidence that specific interspersed repeat families have contributed potentially key developmental enhancers via co-option. Our analysis expands the methodologies available for extracting the richness of information found in genome-wide functional maps.
Affiliation(s)
- Aaron M. Wenger
- Department of Computer Science, Stanford University, Stanford, California, United States of America
- Shoa L. Clarke
- Department of Genetics, Stanford University, Stanford, California, United States of America
- James H. Notwell
- Department of Computer Science, Stanford University, Stanford, California, United States of America
- Tisha Chung
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
- Geetu Tuteja
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
- Harendra Guturu
- Department of Electrical Engineering, Stanford University, Stanford, California, United States of America
- Bruce T. Schaar
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
- Gill Bejerano
- Department of Computer Science, Stanford University, Stanford, California, United States of America; Department of Developmental Biology, Stanford University, Stanford, California, United States of America
21
Hiller M, Agarwal S, Notwell JH, Parikh R, Guturu H, Wenger AM, Bejerano G. Computational methods to detect conserved non-genic elements in phylogenetically isolated genomes: application to zebrafish. Nucleic Acids Res 2013; 41:e151. [PMID: 23814184 PMCID: PMC3753653 DOI: 10.1093/nar/gkt557] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.2] [Indexed: 12/21/2022] Open
Abstract
Many important model organisms for biomedical and evolutionary research have sequenced genomes, but occupy a phylogenetically isolated position, evolutionarily distant from other sequenced genomes. This phylogenetic isolation is exemplified for zebrafish, a vertebrate model for cis-regulation, development and human disease, whose evolutionary distance to all other currently sequenced fish exceeds the distance between human and chicken. Such large distances make it difficult to align genomes and use them for comparative analysis beyond gene-focused questions. In particular, detecting conserved non-genic elements (CNEs) as promising cis-regulatory elements with biological importance is challenging. Here, we develop a general comparative genomics framework to align isolated genomes and to comprehensively detect CNEs. Our approach integrates highly sensitive and quality-controlled local alignments and uses alignment transitivity and ancestral reconstruction to bridge large evolutionary distances. We apply our framework to zebrafish and demonstrate substantially improved CNE detection and quality compared with previous sets. Our zebrafish CNE set comprises 54 533 CNEs, of which 11 792 (22%) are conserved to human or mouse. Our zebrafish CNEs (http://zebrafish.stanford.edu) are highly enriched in known enhancers and extend existing experimental (ChIP-Seq) sets. The same framework can now be applied to the isolated genomes of frog, amphioxus, Caenorhabditis elegans and many others.
Affiliation(s)
- Michael Hiller
- Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA, Department of Computer Science, Stanford University, Stanford, CA 94305, USA and Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
22
Ishibashi M, Mechaly AS, Becker TS, Rinkwitz S. Using zebrafish transgenesis to test human genomic sequences for specific enhancer activity. Methods 2013; 62:216-25. [PMID: 23542551 DOI: 10.1016/j.ymeth.2013.03.018] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 06/27/2012] [Revised: 03/15/2013] [Accepted: 03/19/2013] [Indexed: 01/09/2023] Open
Abstract
We detail an approach for the identification of human tissue-specific transcriptional enhancers involving three steps: delineation of search space around a locus or target gene, in silico identification and size definition of putative candidate sequences, and testing through several independent genomic insertions in a transgenic zebrafish reporter assay. Candidate sequences are defined through evolutionary conservation, transcription factor binding and chromatin marks (e.g. ENCODE data) and are amplified from genomic DNA, cloned into basal promoter:fluorescent protein reporter vectors based on the Tol2 transposon system and are microinjected into fertilized zebrafish eggs. After raising injected founders to sexual maturity, fluorescent screening identifies positive founder fish whose offspring undergo a detailed expression analysis to determine tissue specificity and reproducibility of specific enhancers.
Affiliation(s)
- Minaka Ishibashi
- Brain and Mind Research Institute, Sydney Medical School, University of Sydney, 100 Mallet Street, Camperdown 2050, Australia