1
Benner P, Vingron M. Quantifying the tissue-specific regulatory information within enhancer DNA sequences. NAR Genom Bioinform 2021; 3:lqab095. [PMID: 34729474 PMCID: PMC8557370 DOI: 10.1093/nargab/lqab095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Revised: 09/23/2021] [Accepted: 09/28/2021] [Indexed: 12/04/2022]
Abstract
Recent efforts to measure epigenetic marks across a wide variety of different cell types and tissues provide insights into the cell type-specific regulatory landscape. We use these data to study whether there exists a correlate of epigenetic signals in the DNA sequence of enhancers and explore with computational methods to what degree such sequence patterns can be used to predict cell type-specific regulatory activity. By constructing classifiers that predict in which tissues enhancers are active, we are able to identify sequence features that might be recognized by the cell in order to regulate gene expression. While classification performances vary greatly between tissues, we show examples where our classifiers correctly predict tissue-specific regulation from sequence alone. We also show that many of the informative patterns indeed harbor transcription factor footprints.
Affiliation(s)
- Philipp Benner
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 73, 14195 Berlin, Germany
- Martin Vingron
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 73, 14195 Berlin, Germany
2
NF-Y Subunits Overexpression in HNSCC. Cancers (Basel) 2021; 13:cancers13123019. [PMID: 34208636 PMCID: PMC8234210 DOI: 10.3390/cancers13123019] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/06/2021] [Revised: 05/31/2021] [Accepted: 06/06/2021] [Indexed: 12/14/2022]
Simple Summary
Cancer cells have altered gene expression profiles. This is ultimately elicited by altered structure, expression or binding of transcription factors to regulatory regions of genomes. The CCAAT-binding trimer is a pioneer transcription factor involved in the activation of “cancer” genes. We and others have shown that the regulatory NF-YA subunit is overexpressed in epithelial cancers. Here, we examined large datasets of bulk gene expression profiles, as well as single-cell data, in head and neck squamous cell carcinomas by bioinformatic methods. We partitioned tumors according to molecular subtypes, mutations and positivity for HPV. We came to the conclusion that high levels of the histone-like subunits and the “short” NF-YAs isoform are protective in HPV-positive tumors. On the other hand, high levels of the “long” NF-YAl were found in the recently identified aggressive and metastasis-prone cell population undergoing partial epithelial to mesenchymal transition, p-EMT.
Abstract
NF-Y is the CCAAT-binding trimer formed by the histone fold domain (HFD), NF-YB/NF-YC and NF-YA. The CCAAT box is generally prevalent in promoters of “cancer” genes. We reported the overexpression of NF-YA in BRCA, LUAD and LUSC, and of all subunits in HCC. Altered splicing of NF-YA was found in breast and lung cancer. We analyzed RNA-seq datasets of TCGA and cell lines of head and neck squamous cell carcinomas (HNSCC). We partitioned all TCGA data into four subtypes, deconvoluted single-cell RNA-seq of tumors and derived survival curves. The CCAAT box was enriched in the promoters of overexpressed genes. The “short” NF-YAs was overexpressed in all subtypes and the “long” NF-YAl in Mesenchymal. The HFD subunits are overexpressed, except Basal (NF-YB) and Atypical (NF-YC); NF-YAl is increased in p53 mutated tumors. In HPV-positive tumors, high levels of NF-YAs, p16 and ΔNp63 correlate with better prognosis. Deconvolution of single cell RNA-seq (scRNA-seq) found a correlation of NF-YAl with Cancer Associated Fibroblasts (CAFs) and p-EMT cells, a population endowed with metastatic potential. We conclude that overexpression of HFD subunits and NF-YAs is protective in HPV-positive tumors; expression of NF-YAl is largely confined to mutp53 tumors and malignant p-EMT cells.
3
Bezzecchi E, Ronzio M, Semeghini V, Andrioletti V, Mantovani R, Dolfini D. NF-YA Overexpression in Lung Cancer: LUAD. Genes (Basel) 2020; 11:genes11020198. [PMID: 32075093 PMCID: PMC7074112 DOI: 10.3390/genes11020198] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 02/04/2020] [Accepted: 02/10/2020] [Indexed: 12/14/2022]
Abstract
The trimeric transcription factor (TF) NF-Y regulates the CCAAT box, a DNA element enriched in promoters of genes overexpressed in many types of cancer. The regulatory NF-YA is present in two major isoforms, NF-YAl ("long") and NF-YAs ("short"). There are growing indications that NF-YA levels are increased in tumors. Here, we report interrogation of RNA-Seq datasets of lung adenocarcinoma (LUAD) from TCGA (The Cancer Genome Atlas; all 576 samples) and GEO (Gene Expression Omnibus). NF-YAs is overexpressed in the three subtypes, proliferative, inflammatory, and TRU (terminal respiratory unit). CCAAT is enriched in promoters of tumor differentially expressed genes (DEGs) and in the proliferative/inflammatory intersection, matching with KEGG (Kyoto Encyclopedia of Genes and Genomes) terms cell-cycle and signaling. Increasing levels of NF-YAs are observed from low to high CpG island methylator phenotypes (CIMP). We identified 166 genes overexpressed in LUAD cell lines with low NF-YAs/NF-YAl ratios: applying this centroid to TCGA samples faithfully predicted tumors' isoform ratio. This signature lacks CCAAT in promoters. Finally, progression-free intervals and hazard ratios agreed in indicating the worst prognosis for patients with either a low or high NF-YAs/NF-YAl ratio. In conclusion, global overexpression of NF-YAs is documented in LUAD and is associated with aggressive tumor behavior; however, a similar prognosis is recorded in tumors with high levels of NF-YAl and overexpressed CCAAT-less genes.
Affiliation(s)
- Eugenia Bezzecchi
- Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy
- Mirko Ronzio
- Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy
- Valentina Semeghini
- Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy
- Valentina Andrioletti
- Internal Medicine VIII, University Hospital Tübingen, Otfried-Müller-Str. 14, 72076 Tübingen, Germany
- Roberto Mantovani
- Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy
- Diletta Dolfini
- Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italy
- Correspondence: Tel.: +39-02-50315005
4
Bezzecchi E, Ronzio M, Dolfini D, Mantovani R. NF-YA Overexpression in Lung Cancer: LUSC. Genes (Basel) 2019; 10:genes10110937. [PMID: 31744190 PMCID: PMC6895822 DOI: 10.3390/genes10110937] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 09/12/2019] [Revised: 11/04/2019] [Accepted: 11/13/2019] [Indexed: 12/12/2022]
Abstract
The CCAAT box is recognized by the trimeric transcription factor NF-Y, whose NF-YA subunit is present in two major splicing isoforms, NF-YAl (“long”) and NF-YAs (“short”). Little is known about the expression levels of NF-Y subunits in tumors, and nothing in lung cancer. By interrogating RNA-seq TCGA and GEO datasets, we found that, unlike NF-YB/NF-YC, NF-YAs is overexpressed in lung squamous cell carcinomas (LUSC). The ratio of the two isoforms changes from normal to cancer cells, with NF-YAs becoming predominant in the latter. Increased NF-YA expression correlates with common proliferation markers. We partitioned all 501 TCGA LUSC tumors into the four molecular cohorts and verified that NF-YAs is similarly overexpressed. We analyzed global and subtype-specific RNA-seq data and found that CCAAT is the most abundant DNA matrix in promoters of genes overexpressed in all subtypes. Enriched Gene Ontology terms are cell-cycle and signaling. Survival curves indicate a worse clinical outcome for patients with increasing global amounts of NF-YA; hazard ratios indicate the same for very high and, surprisingly, very low NF-YAs/NF-YAl ratios. We then analyzed gene expression in this latter cohort and identified a different, pro-migration signature devoid of CCAAT. We conclude that overexpression of the NF-Y regulatory subunit in LUSC serves to increase CCAAT-dependent, proliferative (NF-YAs-high) or CCAAT-less, pro-migration (NF-YAl-high) genes. The data further reaffirm the importance of analyzing single isoforms of TFs involved in tumor development.
5
Vishnevsky OV, Bocharnikov AV, Kolchanov NA. Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets. J Bioinform Comput Biol 2017; 16:1740012. [PMID: 29281953 DOI: 10.1142/s0219720017400121] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to the accumulation of information about a huge number of DNA sequences. Many web services are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. The enormous diversity of motifs makes finding them challenging. To avoid these difficulties, researchers use different stochastic approaches. Unfortunately, the efficiency of motif discovery programs declines dramatically as the query set size increases. As a result, only a fraction of the top "peak" ChIP-Seq segments can be analyzed, or the area of analysis must be narrowed. Thus, motif discovery in massive datasets remains a challenging issue. The Argo_CUDA (Compute Unified Device Architecture) web service is designed to process massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in the 15-letter IUPAC code. Argo_CUDA is a fully exhaustive approach based on high-performance GPU technologies. Compared with existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed motifs that correspond to known transcription factor binding sites.
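To illustrate what matching a degenerate motif written in the 15-letter IUPAC code involves, here is a minimal Python sketch. It is not Argo_CUDA's GPU implementation, which the abstract does not detail; the names `IUPAC`, `matches`, and `find_motif` are hypothetical, and the scan is a plain CPU loop over one sequence.

```python
# 15-letter IUPAC nucleotide code: each symbol stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(motif, window):
    """Return True if every base in `window` is allowed by the IUPAC
    symbol at the same position of `motif` (hypothetical helper)."""
    return len(motif) == len(window) and all(
        base in IUPAC[symbol] for symbol, base in zip(motif, window)
    )


def find_motif(motif, sequence):
    """Return the 0-based start positions where `motif` matches `sequence`."""
    k = len(motif)
    return [
        i for i in range(len(sequence) - k + 1)
        if matches(motif, sequence[i:i + k])
    ]
```

For example, `find_motif("YCAAT", "ACCAATTTCAATG")` reports both the `CCAAT` and the `TCAAT` occurrence, because `Y` stands for either pyrimidine.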
Affiliation(s)
- Oleg V Vishnevsky
- * Institute of Cytology and Genetics SB RAS, Lavrentieva Ave., 10, Novosibirsk 630090, Russia; † Novosibirsk State University, Pirogova, 10, Novosibirsk 630090, Russia
- Nikolay A Kolchanov
- * Institute of Cytology and Genetics SB RAS, Lavrentieva Ave., 10, Novosibirsk 630090, Russia; † Novosibirsk State University, Pirogova, 10, Novosibirsk 630090, Russia
6
Triska M, Ivliev A, Nikolsky Y, Tatarinova TV. Analysis of cis-Regulatory Elements in Gene Co-expression Networks in Cancer. Methods Mol Biol 2017; 1613:291-310. [PMID: 28849565 DOI: 10.1007/978-1-4939-7027-8_11] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/22/2023]
Abstract
Analysis of gene co-expression networks is a powerful "data-driven" tool, invaluable for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise "meta-analysis" framework. Here we describe an integrated approach to cancer expression meta-analysis, which combines generation of "data-driven" co-expression networks with detailed statistical detection of promoter sequence motifs within the co-expression clusters. First, we applied the Weighted Gene Co-Expression Network Analysis (WGCNA) workflow and Pearson's correlation to generate a comprehensive set of over 3000 co-expression clusters in 82 normalized microarray datasets from nine cancers of different origin. Next, we designed a genome-wide statistical approach to the detection of specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. The approach, implemented as the cisExpress software module, was specifically designed for analysis of very large data sets such as those generated by publicly accessible whole genome and transcriptome projects. cisExpress uses a task farming algorithm to exploit all available computational cores within a shared memory node. We discovered that although co-expression modules are populated with different sets of genes, they share distinct stable patterns of co-regulation based on promoter sequence analysis. The number of motifs per co-expression cluster varies widely in accordance with cancer tissue of origin, with the largest number in colon (68 motifs) and the lowest in ovary (18 motifs). The top scored motifs are typically shared between several tissues; they define sets of target genes responsible for certain functions in carcinogenesis. Both the co-expression modules and a database of precalculated motifs are publicly available and accessible for further studies.
Affiliation(s)
- Martin Triska
- Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
- Yuri Nikolsky
- Prosapia Genetics, Solana Beach, CA, USA; School of Systems Biology, George Mason University, Fairfax, VA, USA
- Tatiana V Tatarinova
- Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA; Center for Personalized Medicine, Children's Hospital Los Angeles, 4640 Hollywood Blvd, Los Angeles, CA, 90027, USA; A.A. Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
7
Gurtner A, Manni I, Piaggio G. NF-Y in cancer: Impact on cell transformation of a gene essential for proliferation. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2016; 1860:604-616. [PMID: 27939755 DOI: 10.1016/j.bbagrm.2016.12.005] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 08/26/2016] [Revised: 11/30/2016] [Accepted: 12/05/2016] [Indexed: 12/17/2022]
Abstract
NF-Y is a ubiquitous heterotrimeric transcription factor with a binding affinity for the CCAAT consensus motif, one of the most common cis-acting elements in the promoter and enhancer regions of eukaryote genes in direct (CCAAT) or reverse (ATTGG) orientation. NF-Y consists of three subunits, NF-YA, the regulatory subunit of the trimer, NF-YB, and NF-YC, all required for CCAAT binding. Growing evidence in cells and animal models supports the notion that NF-Y, driving transcription of a plethora of cell cycle regulatory genes, is a key player in the regulation of proliferation. Proper control of cellular growth is critical for cancer prevention and uncontrolled proliferation is a hallmark of cancer cells. Indeed, during cell transformation aberrant molecular pathways disrupt mechanisms controlling proliferation and many growth regulatory genes are altered in tumors. Here, we review bioinformatics, molecular and functional evidence indicating the involvement of the cell cycle regulator NF-Y in cancer-associated pathways. This article is part of a Special Issue entitled: Nuclear Factor Y in Development and Disease, edited by Prof. Roberto Mantovani.
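The direct (CCAAT) and reverse (ATTGG) orientations mentioned above are reverse complements of each other, so a single pass over one strand can report both. The following Python sketch shows this under simple assumptions: exact string matching only, and hypothetical function names `reverse_complement` and `ccaat_sites` that do not come from the reviewed work.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def ccaat_sites(seq, motif="CCAAT"):
    """Scan `seq` once and return (position, strand) pairs.

    "+" marks the motif in direct orientation, "-" marks its reverse
    complement (ATTGG for CCAAT). Positions are 0-based on the given strand.
    """
    rc = reverse_complement(motif)
    k = len(motif)
    sites = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            sites.append((i, "+"))
        elif window == rc:
            sites.append((i, "-"))
    return sites
```

A call such as `ccaat_sites("GCCAATCGATTGGA")` returns one site per orientation.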
Affiliation(s)
- Aymone Gurtner
- Department of Research, Advanced Diagnostics and Technological Innovation, UOSD SAFU, Regina Elena National Cancer Institute, Via Elio Chianesi 53, 00144, Rome, Italy
- Isabella Manni
- Department of Research, Advanced Diagnostics and Technological Innovation, UOSD SAFU, Regina Elena National Cancer Institute, Via Elio Chianesi 53, 00144, Rome, Italy
- Giulia Piaggio
- Department of Research, Advanced Diagnostics and Technological Innovation, UOSD SAFU, Regina Elena National Cancer Institute, Via Elio Chianesi 53, 00144, Rome, Italy
8
Zambelli F, Pavesi G. Genome wide features, distribution and correlations of NF-Y binding sites. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2016; 1860:581-589. [PMID: 27769808 DOI: 10.1016/j.bbagrm.2016.10.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/23/2016] [Revised: 10/10/2016] [Accepted: 10/17/2016] [Indexed: 12/12/2022]
Abstract
NF-Y is a trimeric transcription factor that binds the CCAAT-box motif on DNA. In this article we reviewed existing data on genome-wide NF-Y binding characterization in human and complemented them with additional bioinformatic analysis, reaching the following main conclusions: (1) about half of NF-Y binding sites are located at promoters, about 60-80 base pairs from transcription start sites; NF-Y binding to distal genomic regions takes place at inactive chromatin loci and/or DNA repetitive elements more often than at active enhancers; (2) on almost half of its binding sites, regardless of their genomic localization (promoters or distal regions), NF-Y finds more than one CCAAT box on DNA, and most of those multiple CCAAT binding loci present precise spacing and organization of the elements composing them; (3) there exists a well defined class of transcription factors that show genome-wide co-localization with NF-Y. Some of them lack their canonical binding site in binding regions overlapping with NF-Y, hence hinting at NF-Y mediated recruitment, while others show a precise positioning on DNA of their binding sites with respect to the CCAAT box bound by NF-Y. This article is part of a Special Issue entitled: Nuclear Factor Y in Development and Disease, edited by Prof. Roberto Mantovani.
Affiliation(s)
- Federico Zambelli
- Dipartimento di Bioscienze, Università degli Studi di Milano, Milano, Via Celoria 26, 20133, Italy; Istituto di Biomembrane e Bioenergetica, Consiglio Nazionale delle Ricerche, Bari, Via Amendola 165/A, 70126, Italy
- Giulio Pavesi
- Dipartimento di Bioscienze, Università degli Studi di Milano, Milano, Via Celoria 26, 20133, Italy
9
Boeva V. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells. Front Genet 2016; 7:24. [PMID: 26941778 PMCID: PMC4763482 DOI: 10.3389/fgene.2016.00024] [Citation(s) in RCA: 87] [Impact Index Per Article: 10.9] [Received: 10/30/2015] [Accepted: 02/05/2016] [Indexed: 12/27/2022]
Abstract
Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation.
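As a concrete reading of "position weight matrix", the Python sketch below builds a log-odds matrix from a handful of aligned binding sites and scores sequence windows against it. The pseudocount of 1, the uniform background of 0.25, and the names `build_pwm`, `score`, and `best_hit` are illustrative assumptions, not the review's prescribed method.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length sites.

    Each column maps a base to log2(observed frequency / background),
    with a pseudocount added so unseen bases get a finite score.
    """
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2((counts[base] / total) / background)
             for base in "ACGT"}
        )
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, position) of the highest-scoring window."""
    k = len(pwm)
    return max(
        (score(pwm, sequence[i:i + k]), i)
        for i in range(len(sequence) - k + 1)
    )
```

With sites such as `["CCAAT", "CCAAT", "CCAAT", "CCAAC"]`, a window equal to `CCAAT` scores higher than an unrelated window, and `best_hit` locates it in a longer sequence.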
Affiliation(s)
- Valentina Boeva
- Centre de Recherche, Institut Curie, Paris, France; INSERM, U900, Paris, France; Mines ParisTech, Fontainebleau, France; PSL Research University, Paris, France; Department of Development, Reproduction and Cancer, Institut Cochin, Paris, France; INSERM, U1016, Paris, France; Centre National de la Recherche Scientifique UMR 8104, Paris, France; Université Paris Descartes UMR-S1016, Paris, France
10
Reiss DJ, Plaisier CL, Wu WJ, Baliga NS. cMonkey2: Automated, systematic, integrated detection of co-regulated gene modules for any organism. Nucleic Acids Res 2015; 43:e87. [PMID: 25873626 PMCID: PMC4513845 DOI: 10.1093/nar/gkv300] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 10/17/2014] [Revised: 03/05/2015] [Accepted: 03/26/2015] [Indexed: 12/25/2022]
Abstract
The cMonkey integrated biclustering algorithm identifies conditionally co-regulated modules of genes (biclusters). cMonkey integrates various orthogonal pieces of information which support evidence of gene co-regulation, and optimizes biclusters to be supported simultaneously by one or more of these prior constraints. The algorithm served as the cornerstone for constructing the first global, predictive Environmental Gene Regulatory Influence Network (EGRIN) model for a free-living cell, and has now been applied to many more organisms. However, due to its computational inefficiencies, long run-time and complexity of various input data types, cMonkey was not readily usable by the wider community. To address these primary concerns, we have significantly updated the cMonkey algorithm and refactored its implementation, improving its usability and extendibility. These improvements provide a fully functioning and user-friendly platform for building co-regulated gene modules and the tools necessary for their exploration and interpretation. We show, via three separate analyses of data for E. coli, M. tuberculosis and H. sapiens, that the updated algorithm and inclusion of novel scoring functions for new data types (e.g. ChIP-seq and transcription factor over-expression [TFOE]) improve discovery of biologically informative co-regulated modules. The complete cMonkey2 software package, including source code, is available at https://github.com/baliga-lab/cmonkey2.
Affiliation(s)
- David J Reiss
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Wei-Ju Wu
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Nitin S Baliga
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA; Department of Microbiology, University of Washington, Seattle, WA 98103, USA
11
Bobbs A, Gellerman K, Hallas WM, Joseph S, Yang C, Kurkewich J, Cowden Dahl KD. ARID3B Directly Regulates Ovarian Cancer Promoting Genes. PLoS One 2015; 10:e0131961. [PMID: 26121572 PMCID: PMC4486168 DOI: 10.1371/journal.pone.0131961] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 02/06/2015] [Accepted: 06/08/2015] [Indexed: 01/22/2023]
Abstract
The DNA-binding protein AT-Rich Interactive Domain 3B (ARID3B) is elevated in ovarian cancer and increases tumor growth in a xenograft model of ovarian cancer. However, relatively little is known about ARID3B's function. In this study we perform the first genome wide screen for ARID3B direct target genes and ARID3B regulated pathways. We identified and confirmed numerous ARID3B target genes by chromatin immunoprecipitation (ChIP) followed by microarray and quantitative RT-PCR. Using motif-finding algorithms, we characterized a binding site for ARID3B, which is similar to the previously known site for the ARID3B paralogue ARID3A. Functionality of this predicted site was demonstrated by ChIP analysis. We next demonstrated that ARID3B induces expression of its targets in ovarian cancer cell lines. We validated that ARID3B binds to an epidermal growth factor receptor (EGFR) enhancer and increases mRNA expression. ARID3B also binds to the promoter of Wnt5A and its receptor FZD5. FZD5 is highly expressed in ovarian cancer cell lines, and is upregulated by exogenous ARID3B. Both ARID3B and FZD5 expression increase adhesion to extracellular matrix (ECM) components including collagen IV, fibronectin and vitronectin. ARID3B-increased adhesion to collagens II and IV requires FZD5. This study directly demonstrates that ARID3B binds target genes in a sequence-specific manner, resulting in increased gene expression. Furthermore, our data indicate that ARID3B regulation of direct target genes in the Wnt pathway promotes adhesion of ovarian cancer cells.
Affiliation(s)
- Alexander Bobbs
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine-South Bend, South Bend, Indiana, United States of America
- Harper Cancer Research Institute, South Bend, Indiana, United States of America
- Katrina Gellerman
- Harper Cancer Research Institute, South Bend, Indiana, United States of America
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana, United States of America
- William Morgan Hallas
- Harper Cancer Research Institute, South Bend, Indiana, United States of America
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana, United States of America
- Stancy Joseph
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine-South Bend, South Bend, Indiana, United States of America
- Harper Cancer Research Institute, South Bend, Indiana, United States of America
- Chao Yang
- Harper Cancer Research Institute, South Bend, Indiana, United States of America
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana, United States of America
- Jeffrey Kurkewich
- Harper Cancer Research Institute, South Bend, Indiana, United States of America
- Department of Biological Sciences, University of Notre Dame, Notre Dame, Indiana, United States of America
- Karen D. Cowden Dahl
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine-South Bend, South Bend, Indiana, United States of America
- Harper Cancer Research Institute, South Bend, Indiana, United States of America
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana, United States of America
- Indiana University Melvin and Bren Simon Cancer Center, Indianapolis, Indiana, United States of America
12
iRegulon: from a gene list to a gene regulatory network using large motif and track collections. PLoS Comput Biol 2014; 10:e1003731. [PMID: 25058159 PMCID: PMC4109854 DOI: 10.1371/journal.pcbi.1003731] [Citation(s) in RCA: 606] [Impact Index Per Article: 60.6] [Received: 02/13/2014] [Accepted: 05/27/2014] [Indexed: 01/17/2023]
Abstract
Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and could identify an extensive up-regulated network controlled directly by p53. Similarly we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org.
Gene regulatory networks control developmental, homeostatic, and disease processes by governing precise levels and spatio-temporal patterns of gene expression. Determining their topology can provide mechanistic insight into these processes. Gene regulatory networks consist of interactions between transcription factors and their direct target genes. Each regulatory interaction represents the binding of the transcription factor to a specific DNA binding site near its target gene. Here we present a computational method, called iRegulon, to identify master regulators and direct target genes in a human gene signature, i.e. a set of co-expressed genes. iRegulon relies on the analysis of the regulatory sequences around each gene in the gene set to detect enriched TF motifs or ChIP-seq peaks, using databases of nearly 10,000 TF motifs and 1000 ChIP-seq data sets or “tracks”. Next, it associates enriched motifs and tracks with candidate transcription factors and determines the optimal subset of direct target genes. We validate iRegulon on ENCODE data, and use it in combination with RNA-seq and ChIP-seq data to map a p53 downstream network with new predicted co-factors and targets. iRegulon is available as a Cytoscape plugin, supporting human, mouse, and Drosophila genes, and provides access to hundreds of cancer-related TF-target subnetworks or “regulons”.
13
Wenger AM, Clarke SL, Notwell JH, Chung T, Tuteja G, Guturu H, Schaar BT, Bejerano G. The enhancer landscape during early neocortical development reveals patterns of dense regulation and co-option. PLoS Genet 2013; 9:e1003728. [PMID: 24009522 PMCID: PMC3757057 DOI: 10.1371/journal.pgen.1003728] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 01/10/2013] [Accepted: 07/03/2013] [Indexed: 11/18/2022]
Abstract
Genetic studies have identified a core set of transcription factors and target genes that control the development of the neocortex, the region of the human brain responsible for higher cognition. The specific regulatory interactions between these factors, many key upstream and downstream genes, and the enhancers that mediate all these interactions remain mostly uncharacterized. We perform p300 ChIP-seq to identify over 6,600 candidate enhancers active in the dorsal cerebral wall of embryonic day 14.5 (E14.5) mice. Over 95% of the peaks we measure are conserved to human. Eight of ten (80%) candidates tested using mouse transgenesis drive activity in restricted laminar patterns within the neocortex. GREAT based computational analysis reveals highly significant correlation with genes expressed at E14.5 in key areas for neocortex development, and allows the grouping of enhancers by known biological functions and pathways for further studies. We find that multiple genes are flanked by dozens of candidate enhancers each, including well-known key neocortical genes as well as suspected and novel genes. Nearly a quarter of our candidate enhancers are conserved well beyond mammals. Human and zebrafish regions orthologous to our candidate enhancers are shown to most often function in other aspects of central nervous system development. Finally, we find strong evidence that specific interspersed repeat families have contributed potentially key developmental enhancers via co-option. Our analysis expands the methodologies available for extracting the richness of information found in genome-wide functional maps.
Affiliation(s)
- Aaron M. Wenger
- Department of Computer Science, Stanford University, Stanford, California, United States of America
| | - Shoa L. Clarke
- Department of Genetics, Stanford University, Stanford, California, United States of America
| | - James H. Notwell
- Department of Computer Science, Stanford University, Stanford, California, United States of America
| | - Tisha Chung
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
| | - Geetu Tuteja
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
| | - Harendra Guturu
- Department of Electrical Engineering, Stanford University, Stanford, California, United States of America
| | - Bruce T. Schaar
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
| | - Gill Bejerano
- Department of Computer Science, Stanford University, Stanford, California, United States of America
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
|
14
|
Maurya MR, Gupta S, Li X, Fahy E, Dinasarapu AR, Sud M, Brown HA, Glass CK, Murphy RC, Russell DW, Dennis EA, Subramaniam S. Analysis of inflammatory and lipid metabolic networks across RAW264.7 and thioglycolate-elicited macrophages. J Lipid Res 2013; 54:2525-42. [PMID: 23776196 DOI: 10.1194/jlr.m040212] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 11/20/2022]
Abstract
Studies of macrophage biology have been significantly advanced by the availability of cell lines such as RAW264.7 cells. However, it is unclear how these cell lines differ from primary macrophages such as thioglycolate-elicited peritoneal macrophages (TGEMs). We used the inflammatory stimulus Kdo2-lipid A (KLA) to stimulate RAW264.7 and TGEM cells. Temporal changes of lipid and gene expression levels were concomitantly measured and a systems-level analysis was performed on the fold-change data. Here we present a comprehensive comparison between the two cell types. Upon KLA treatment, both RAW264.7 and TGEM cells show a strong inflammatory response. TGEM (primary) cells show a more rapid and intense inflammatory response relative to RAW264.7 cells. DNA levels (fold-change relative to control) are reduced in RAW264.7 cells, correlating with greater downregulation of cell cycle genes. The transcriptional response suggests that the cholesterol de novo synthesis increases considerably in RAW264.7 cells, but 25-hydroxycholesterol increases considerably in TGEM cells. Overall, while RAW264.7 cells behave similarly to TGEM cells in some ways and can be used as a good model for inflammation- and immune function-related kinetic studies, they behave differently than TGEM cells in other aspects of lipid metabolism and phenotypes used as models for various disorders such as atherosclerosis.
Affiliation(s)
- Mano R Maurya
- Department of Bioengineering, University of California at San Diego, La Jolla, CA 92093, USA
|
15
|
Dolfini D, Mantovani R. Targeting the Y/CCAAT box in cancer: YB-1 (YBX1) or NF-Y? Cell Death Differ 2013; 20:676-85. [PMID: 23449390 PMCID: PMC3619239 DOI: 10.1038/cdd.2013.13] [Citation(s) in RCA: 95] [Impact Index Per Article: 8.6] [Received: 10/15/2012] [Revised: 01/11/2013] [Accepted: 01/18/2013] [Indexed: 01/14/2023]
Abstract
The Y box is an important sequence motif found in promoters and enhancers containing a CCAAT box - one of the few elements enriched in promoters of large sets of genes overexpressed in cancer. The search for the transcription factor(s) acting on it led to the biochemical purification of the nuclear factor Y (NF-Y) heterotrimer, and to the cloning - through the screening of expression libraries - of Y box-binding protein 1 (YB-1), an oncogene, overexpressed in aggressive tumors and associated with drug resistance. These two factors have been associated with Y/CCAAT-dependent activation of numerous growth-related genes, notably multidrug resistance protein 1. We review two decades of data indicating that NF-Y ultimately acts on Y/CCAAT in cancer cells, a notion recently confirmed by genome-wide data. Other features of YB-1, such as post-transcriptional control of mRNA biology, render it important in cancer biology.
Affiliation(s)
- D Dolfini
- Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, Milan 20133, Italy
| | - R Mantovani
- Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, Milan 20133, Italy
|
16
|
Abstract
In this chapter, different methods and applications of biclustering algorithms to DNA microarray data analysis that have been developed in recent years are discussed and compared. Identification of biologically significant clusters of genes from microarray experimental data is a very daunting task that emerged, especially with the development of high-throughput technologies. Various computational and evaluation methods based on diverse principles were introduced to identify new similarities among genes. Mathematical aspects of the models are highlighted, and applications to solve biological problems are discussed.
|
17
|
Amar D, Safer H, Shamir R. Dissection of regulatory networks that are altered in disease via differential co-expression. PLoS Comput Biol 2013; 9:e1002955. [PMID: 23505361 PMCID: PMC3591264 DOI: 10.1371/journal.pcbi.1002955] [Citation(s) in RCA: 108] [Impact Index Per Article: 9.8] [Received: 08/26/2012] [Accepted: 01/14/2013] [Indexed: 12/26/2022]
Abstract
Comparing the gene-expression profiles of sick and healthy individuals can help in understanding disease. Such differential expression analysis is a well-established way to find gene sets whose expression is altered in the disease. Recent approaches to gene-expression analysis go a step further and seek differential co-expression patterns, wherein the level of co-expression of a set of genes differs markedly between disease and control samples. Such patterns can arise from a disease-related change in the regulatory mechanism governing that set of genes, and pinpoint dysfunctional regulatory networks. Here we present DICER, a new method for detecting differentially co-expressed gene sets using a novel probabilistic score for differential correlation. DICER goes beyond standard differential co-expression and detects pairs of modules showing differential co-expression. The expression profiles of genes within each module of the pair are correlated across all samples. The correlation between the two modules, however, differs markedly between the disease and normal samples. We show that DICER outperforms the state of the art in terms of significance and interpretability of the detected gene sets. Moreover, the gene sets discovered by DICER manifest regulation by disease-specific microRNA families. In a case study on Alzheimer's disease, DICER dissected biological processes and protein complexes into functional subunits that are differentially co-expressed, thereby revealing inner structures in disease regulatory networks. The most fundamental and popular gene-expression experiments measure genome-wide transcription levels in two populations: perturbed and wild type, or cases and controls. The genes that show significantly different expression between the two populations (the differentially expressed genes) are useful for understanding the biology underlying the phenotype difference, and can sometimes also serve as biomarkers for classification. 
In contrast, genes that have similar expression to each other across all profiles (co-expressed genes) can yield clues about the functional commonality of the two populations. Differential co-expression has recently been proposed as a way to combine the benefits of these two approaches: it seeks gene groups that are co-expressed in one phenotype much more than in the other. Here we develop a new method for detecting differential co-expression and test it on case-control expression profiles of several diseases. Our algorithm improves upon the state of the art in the strength of the detected patterns and in agreement with current biological knowledge. We show that our method can predict gene regulators that are associated with the disease of interest and demonstrate that it can dissect known biological pathways into subcomponents that are not detected using standard analyses.
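The idea of differential co-expression described in this abstract can be illustrated with a minimal sketch. This is not the DICER score (the abstract describes that as a probabilistic score and gives no formula); it only takes the plain difference between the Pearson correlation of two genes in case samples and in control samples. The function names and the toy expression values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def differential_correlation(a_case, b_case, a_ctrl, b_ctrl):
    """Correlation of genes A and B in cases minus their correlation in controls.

    Illustrative stand-in only: a simple difference, not the published score.
    """
    return pearson(a_case, b_case) - pearson(a_ctrl, b_ctrl)

# Toy data (hypothetical): the two genes move together in controls
# and in opposite directions in cases.
a_ctrl, b_ctrl = [1, 2, 3, 4], [2, 4, 6, 8]
a_case, b_case = [1, 2, 3, 4], [8, 6, 4, 2]
delta = differential_correlation(a_case, b_case, a_ctrl, b_ctrl)
```

A value of `delta` far from zero marks a gene pair whose co-expression differs between the two groups; a real analysis would also need a significance estimate, which this sketch omits.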
Affiliation(s)
- David Amar
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
| | - Hershel Safer
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
| | - Ron Shamir
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
|
18
|
New meta-analysis tools reveal common transcriptional regulatory basis for multiple determinants of behavior. Proc Natl Acad Sci U S A 2012; 109:E1801-10. [PMID: 22691501 DOI: 10.1073/pnas.1205283109] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Indexed: 01/22/2023]
Abstract
A fundamental problem in meta-analysis is how to systematically combine information from multiple statistical tests to rigorously evaluate a single overarching hypothesis. This problem occurs in systems biology when attempting to map genomic attributes to complex phenotypes such as behavior. Behavior and other complex phenotypes are influenced by intrinsic and environmental determinants that act on the transcriptome, but little is known about how these determinants interact at the molecular level. We developed an informatic technique that identifies statistically significant meta-associations between gene expression patterns and transcription factor combinations. Deploying this technique for brain transcriptome profiles from ca. 400 individual bees, we show that diverse determinants of behavior rely on shared combinations of transcription factors. These relationships were revealed only when we considered complex and variable regulatory rules, suggesting that these shared transcription factors are used in distinct ways by different determinants. This regulatory code would have been missed by traditional gene coexpression or cis-regulatory analytic methods. We expect that our meta-analysis tools will be useful for a broad array of problems in systems biology and other fields.
|
19
|
The NF-Y/p53 liaison: well beyond repression. Biochim Biophys Acta Rev Cancer 2011; 1825:131-9. [PMID: 22138487 DOI: 10.1016/j.bbcan.2011.11.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 09/30/2011] [Revised: 11/09/2011] [Accepted: 11/12/2011] [Indexed: 12/15/2022]
Abstract
NF-Y is a sequence-specific transcription factor - TF - targeting the common CCAAT promoter element. p53 is a master TF controlling the response to stress signals endangering genome integrity, often mutated in human cancers. The NF-Y/p53 - and p63, p73 - interaction results in transcriptional repression of a subset of genes within the vast NF-Y regulome under DNA-damage conditions. Recent data show that NF-Y is also involved in pro-apoptotic activities, either directly, by mediating p53 transcriptional activation, or indirectly, by being targeted by a noncoding RNA, PANDA. The picture is subverted in cells carrying gain-of-function mutant p53, through interactions with TopBP1, a protein also involved in DNA repair and replication. In summary, the connection between p53 and NF-Y is crucial in determining cell survival or death.
|
20
|
Dolfini D, Gatta R, Mantovani R. NF-Y and the transcriptional activation of CCAAT promoters. Crit Rev Biochem Mol Biol 2011; 47:29-49. [PMID: 22050321 DOI: 10.3109/10409238.2011.628970] [Citation(s) in RCA: 171] [Impact Index Per Article: 13.2] [Indexed: 12/14/2022]
Abstract
The CCAAT box promoter element and NF-Y, the transcription factor (TF) that binds to it, were among the first cis-elements and trans-acting factors identified; their interplay is required for transcriptional activation of a sizeable number of eukaryotic genes. NF-Y consists of three evolutionarily conserved subunits: a dimer of NF-YB and NF-YC which closely resembles a histone, and the "innovative" NF-YA. In this review, we will provide an update on the functional and biological features that make NF-Y a fundamental link between chromatin and transcription. The last 25 years have witnessed a spectacular increase in our knowledge of how genes are regulated: from the identification of cis-acting sequences in promoters and enhancers, and the biochemical characterization of the corresponding TFs, to the merging of chromatin studies with the investigation of enzymatic machines that regulate epigenetic states. Originally identified and studied in yeast and mammals, NF-Y - also termed CBF and CP1 - is composed of three subunits, NF-YA, NF-YB and NF-YC. The complex recognizes the CCAAT pentanucleotide and specific flanking nucleotides with high specificity (Dorn et al., 1997; Hatamochi et al., 1988; Hooft van Huijsduijnen et al., 1987; Kim & Sheffery, 1990). A compelling set of bioinformatics studies clarified that the NF-Y preferred binding site is one of the most frequent promoter elements (Suzuki et al., 2001, 2004; Elkon et al., 2003; Mariño-Ramírez et al., 2004; FitzGerald et al., 2004; Linhart et al., 2005; Zhu et al., 2005; Lee et al., 2007; Abnizova et al., 2007; Grskovic et al., 2007; Halperin et al., 2009; Häkkinen et al., 2011). The same consensus, as determined by mutagenesis and SELEX studies (Bi et al., 1997), was also retrieved in ChIP-on-chip analysis (Testa et al., 2005; Ceribelli et al., 2006; Ceribelli et al., 2008; Reed et al., 2008). 
Additional structural features of the CCAAT box - position, orientation, presence of multiple Transcriptional Start Sites - were previously reviewed (Dolfini et al., 2009) and will not be considered in detail here.
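As a small illustration of what locating CCAAT boxes in a promoter sequence involves, the sketch below scans both strands for the bare pentanucleotide. It is an assumption-laden simplification: the abstract states that NF-Y also reads specific flanking nucleotides, which this scan ignores, and the function name and example sequence are hypothetical.

```python
def find_ccaat(seq):
    """Return sorted (position, strand) hits for the CCAAT pentanucleotide.

    A '+' hit is CCAAT on the given strand; a '-' hit is ATTGG, which is
    CCAAT read on the reverse-complement strand. Positions are 0-based and
    refer to the given strand. Flanking nucleotides are not evaluated.
    """
    seq = seq.upper()
    hits = []
    for motif, strand in (("CCAAT", "+"), ("ATTGG", "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)

# Hypothetical promoter fragment with one box on each strand.
hits = find_ccaat("GGCCAATCTATTGGC")
```

A position weight matrix would be the more usual choice when flanking preferences matter; the exact-match scan is used here only because it needs no parameters beyond the motif named in the text.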
Affiliation(s)
- Diletta Dolfini
- Dipartimento di Scienze Biomolecolari e Biotecnologie, Università degli Studi di Milano, Milan, Italy
|
21
|
Sivriver J, Habib N, Friedman N. An integrative clustering and modeling algorithm for dynamical gene expression data. Bioinformatics 2011; 27:i392-400. [PMID: 21685097 PMCID: PMC3117368 DOI: 10.1093/bioinformatics/btr250] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The precise dynamics of gene expression is often crucial for proper response to stimuli. Time-course gene-expression profiles can provide insights about the dynamics of many cellular responses, but are often noisy and measured at arbitrary intervals, posing a major analysis challenge. RESULTS: We developed an algorithm that interleaves clustering time-course gene-expression data with estimation of dynamic models of their response by biologically meaningful parameters. In combining these two tasks we overcome obstacles posed in each one. Moreover, our approach provides an easy way to compare responses to different stimuli at the dynamical level. We use our approach to analyze the dynamical transcriptional responses to inflammation and anti-viral stimuli in mouse primary dendritic cells, and extract a concise representation of the different dynamical response types. We analyze the similarities and differences between the two stimuli and identify potential regulators of this complex transcriptional response. AVAILABILITY: The code to our method is freely available at http://www.compbio.cs.huji.ac.il/DynaMiteC. CONTACT: nir@cs.huji.ac.il.
Affiliation(s)
- Julia Sivriver
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
|
22
|
Subramaniam S, Fahy E, Gupta S, Sud M, Byrnes RW, Cotter D, Dinasarapu AR, Maurya MR. Bioinformatics and systems biology of the lipidome. Chem Rev 2011; 111:6452-90. [PMID: 21939287 PMCID: PMC3383319 DOI: 10.1021/cr200295k] [Citation(s) in RCA: 123] [Impact Index Per Article: 9.5] [Indexed: 12/14/2022]
Affiliation(s)
- Shankar Subramaniam
- Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
- San Diego Supercomputer Center, 9500 Gilman Drive, La Jolla, California, 92093, USA
- Departments of Chemistry and Biochemistry, and Department of Cellular and Molecular Medicine, University of California at San Diego, La Jolla, California 92093, USA
| | - Eoin Fahy
- Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
| | - Shakti Gupta
- Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
| | - Manish Sud
- San Diego Supercomputer Center, 9500 Gilman Drive, La Jolla, California, 92093, USA
| | - Robert W. Byrnes
- San Diego Supercomputer Center, 9500 Gilman Drive, La Jolla, California, 92093, USA
| | - Dawn Cotter
- San Diego Supercomputer Center, 9500 Gilman Drive, La Jolla, California, 92093, USA
| | - Ashok Reddy Dinasarapu
- Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
| | - Mano Ram Maurya
- Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
|
23
|
Rational design of therapeutic siRNAs: minimizing off-targeting potential to improve the safety of RNAi therapy for Huntington's disease. Mol Ther 2011; 19:2169-77. [PMID: 21952166 DOI: 10.1038/mt.2011.185] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.8] [Indexed: 01/11/2023]
Abstract
RNA interference (RNAi) provides an approach for the treatment of many human diseases. However, the safety of RNAi-based therapies can be hampered by the ability of small inhibitory RNAs (siRNAs) to bind to unintended mRNAs and reduce their expression, an effect known as off-target gene silencing. Off-targeting primarily occurs when the seed region (nucleotides 2-8 of the small RNA) pairs with sequences in 3'-UTRs of unintended mRNAs and directs translational repression and destabilization of those transcripts. To date, most therapeutic RNAi sequences are selected primarily for gene silencing efficacy, and later evaluated for safety. Here, in designing siRNAs to treat Huntington's disease (HD), a dominant neurodegenerative disorder, we prioritized selection of sequences with minimal off-targeting potentials (i.e., those with a scarcity of seed complements within all known human 3'-UTRs). We identified new promising therapeutic candidate sequences which show potent silencing in cell culture and mouse brain. Furthermore, we present microarray data demonstrating that off-targeting is significantly minimized by using siRNAs that contain "safe" seeds, an important strategy to consider during preclinical development of RNAi-based therapeutics.
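The seed-based screening described in this abstract can be sketched as follows: take nucleotides 2-8 of the guide strand, derive the complementary site that would appear in an mRNA, and count how often that site occurs in a set of 3'-UTR sequences. This is a minimal illustration under assumptions, not the authors' pipeline; the function names, the toy guide sequence, and the toy UTRs are hypothetical, and only perfect 7-nt matches are counted.

```python
def reverse_complement(rna):
    """Reverse complement of an RNA string (A, C, G, U only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna.upper()))

def seed_site(guide):
    """mRNA site complementary to guide nucleotides 2-8 (1-based)."""
    seed = guide.upper().replace("T", "U")[1:8]
    return reverse_complement(seed)

def count_seed_sites(guide, utrs):
    """Count perfect seed-complement matches in each 3'-UTR.

    `utrs` maps a transcript name to its 3'-UTR sequence (DNA or RNA letters).
    """
    site = seed_site(guide)
    counts = {}
    for name, seq in utrs.items():
        seq = seq.upper().replace("T", "U")
        counts[name] = sum(
            1 for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site
        )
    return counts

# Toy inputs (hypothetical): one UTR carries the seed complement, one does not.
guide = "UGAGGUAGUAGGUUGUAUAGUU"
utrs = {"utr_a": "AAACTACCTCAAA", "utr_b": "GGGGGGGGGG"}
counts = count_seed_sites(guide, utrs)
```

Under this scheme, a guide whose seed complement is rare across all UTRs would be ranked as having lower off-targeting potential; how the counts are turned into a ranking is left open here.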
|
24
|
Linhart C, Halperin Y, Darom A, Kidron S, Broday L, Shamir R. A novel candidate cis-regulatory motif pair in the promoters of germline and oogenesis genes in C. elegans. Genome Res 2011; 22:76-83. [PMID: 21930893 DOI: 10.1101/gr.115626.110] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 02/03/2023]
Abstract
In this study we report on a novel pair of cis-regulatory motifs in promoter sequences of the nematode Caenorhabditis elegans. The motif pair exhibits extraordinary genomic traits: The order and the orientation of the two motifs are highly specific, and the distance between them is almost always one of two frequent distances. In contrast, the sequence between the motifs is variable across occurrences. Thus, the motif pair constitutes a nearly combinatorial sequence configuration. We further show that this module is conserved among, and unique to, the entire Caenorhabditis genus. By analyzing several gene expression data sets, our data suggest that this motif pair may function in germline development, oogenesis, and early embryogenesis. Finally, we verify that the motifs are indeed functional cis-regulatory elements using reporter constructs in transgenic C. elegans.
Affiliation(s)
- Chaim Linhart
- School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
|
25
|
Gruel J, LeBorgne M, LeMeur N, Théret N. Simple Shared Motifs (SSM) in conserved region of promoters: a new approach to identify co-regulation patterns. BMC Bioinformatics 2011; 12:365. [PMID: 21910886 PMCID: PMC3215511 DOI: 10.1186/1471-2105-12-365] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/30/2010] [Accepted: 09/12/2011] [Indexed: 01/07/2023]
Abstract
Background: Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Results: Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Conclusions: Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks.
Affiliation(s)
- Jérémy Gruel
- EA 4427 SeRAIC IFR140, Université de Rennes 1, 2 avenue du Pr. Léon Bernard, Rennes 35043, France
|
26
|
Ulitsky I, Laurent LC, Shamir R. Towards computational prediction of microRNA function and activity. Nucleic Acids Res 2010; 38:e160. [PMID: 20576699 PMCID: PMC2926627 DOI: 10.1093/nar/gkq570] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.6] [Indexed: 12/12/2022]
Abstract
While it has been established that microRNAs (miRNAs) play key roles throughout development and are dysregulated in many human pathologies, the specific processes and pathways regulated by individual miRNAs are mostly unknown. Here, we use computational target predictions in order to automatically infer the processes affected by human miRNAs. Our approach improves upon standard statistical tools by addressing specific characteristics of miRNA regulation. Our analysis is based on a novel compendium of experimentally verified miRNA-pathway and miRNA-process associations that we constructed, which can be a useful resource by itself. Our method also predicts novel miRNA-regulated pathways, refines the annotation of miRNAs for which only crude functions are known, and assigns differential functions to miRNAs with closely related sequences. Applying our approach to groups of co-expressed genes allows us to identify miRNAs and genomic miRNA clusters with functional importance in specific stages of early human development. A full list of the predicted mRNA functions is available at http://acgt.cs.tau.ac.il/fame/.
Affiliation(s)
- Igor Ulitsky
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel.
|
27
|
Ulitsky I, Maron-Katz A, Shavit S, Sagir D, Linhart C, Elkon R, Tanay A, Sharan R, Shiloh Y, Shamir R. Expander: from expression microarrays to networks and functions. Nat Protoc 2010; 5:303-22. [PMID: 20134430 DOI: 10.1038/nprot.2009.230] [Citation(s) in RCA: 166] [Impact Index Per Article: 11.9] [Indexed: 01/13/2023]
Abstract
A major challenge in the analysis of gene expression microarray data is to extract meaningful biological knowledge out of the huge volume of raw data. Expander (EXPression ANalyzer and DisplayER) is an integrated software platform for the analysis of gene expression data, which is freely available for academic use. It is designed to support all the stages of microarray data analysis, from raw data normalization to inference of transcriptional regulatory networks. The microarray analysis described in this protocol starts with importing the data into Expander 5.0 and is followed by normalization and filtering. Then, clustering and network-based analyses are performed. The gene groups identified are tested for enrichment in function (based on Gene Ontology), co-regulation (using transcription factor and microRNA target predictions) or co-location. The results of each analysis step can be visualized in a number of ways. The complete protocol can be executed in approximately 1 h.
Affiliation(s)
- Igor Ulitsky
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
|
28
|
Rajkovic M, Iwen KAH, Hofmann PJ, Harneit A, Weitzel JM. Functional cooperation between CREM and GCNF directs gene expression in haploid male germ cells. Nucleic Acids Res 2010; 38:2268-78. [PMID: 20071744 PMCID: PMC2853129 DOI: 10.1093/nar/gkp1220] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 12/14/2022]
Abstract
Cellular differentiation and development of germ cells critically depend on a coordinated activation and repression of specific genes. The underlying regulatory mechanisms, however, remain poorly understood. Here, we show that the testis-specific transcriptional activator CREMτ (cAMP response element modulator tau) and the repressor GCNF (germ cell nuclear factor) share an overlapping binding site which alone is sufficient to direct cell type-specific expression in vivo in a heterologous promoter context. Expression of the transgene driven by the CREM/GCNF site is detectable in spermatids, but not in any somatic tissue or at any other stages during germ cell differentiation. CREMτ acts as an activator of gene transcription whereas GCNF suppresses this activity. Both factors compete for binding to the same DNA response element. Effective binding of CREM and GCNF highly depends on composition and epigenetic modification of the binding site. We also discovered that CREM and GCNF bind to each other via their DNA binding domains, indicating a complex interaction between the two factors. There are several testis-specific target genes that are regulated by CREM and GCNF in a reciprocal manner, showing a similar activation pattern as during spermatogenesis. Our data indicate that a single common binding site for CREM and GCNF is sufficient to specifically direct gene transcription in a tissue-, cell type- and differentiation-specific manner.
Affiliation(s)
- Mirjana Rajkovic
- Institut für Immunologie und Transfusionsmedizin, Ernst-Moritz-Arndt Universität, Greifswald, Germany
|
29
|
Huttenhower C, Mutungu KT, Indik N, Yang W, Schroeder M, Forman JJ, Troyanskaya OG, Coller HA. Detailing regulatory networks through large scale data integration. Bioinformatics 2009; 25:3267-74. [PMID: 19825796 DOI: 10.1093/bioinformatics/btp588] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Much of a cell's regulatory response to changing environments occurs at the transcriptional level. Particularly in higher organisms, transcription factors (TFs), microRNAs and epigenetic modifications can combine to form a complex regulatory network. Part of this system can be modeled as a collection of regulatory modules: co-regulated genes, the conditions under which they are co-regulated and sequence-level regulatory motifs. RESULTS: We present the Combinatorial Algorithm for Expression and Sequence-based Cluster Extraction (COALESCE) system for regulatory module prediction. The algorithm is efficient enough to discover expression biclusters and putative regulatory motifs in metazoan genomes (>20,000 genes) and very large microarray compendia (>10,000 conditions). Using Bayesian data integration, it can also include diverse supporting data types such as evolutionary conservation or nucleosome placement. We validate its performance using a functional evaluation of co-clustered genes, known yeast and Escherichia coli TF targets, synthetic data and various metazoan data compendia. In all cases, COALESCE performs as well as or better than current biclustering and motif prediction tools, with high accuracy in functional and TF/target assignments and zero false positives on synthetic data. COALESCE provides an efficient and flexible platform within which large, diverse data collections can be integrated to predict metazoan regulatory networks. AVAILABILITY: Source code (C++) is available at http://function.princeton.edu/sleipnir, and supporting data and a web interface are provided at http://function.princeton.edu/coalesce. CONTACT: ogt@cs.princeton.edu; hcoller@princeton.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Curtis Huttenhower
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA
|
30
|
Roider HG, Lenhard B, Kanhere A, Haas SA, Vingron M. CpG-depleted promoters harbor tissue-specific transcription factor binding signals--implications for motif overrepresentation analyses. Nucleic Acids Res 2009; 37:6305-15. [PMID: 19736212 PMCID: PMC2770660 DOI: 10.1093/nar/gkp682] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 12/26/2022]
Abstract
Motif overrepresentation analysis of proximal promoters is a common approach to characterize the regulatory properties of co-expressed sets of genes. Here we show that these approaches perform well on mammalian CpG-depleted promoter sets that regulate expression in terminally differentiated tissues such as liver and heart. In contrast, CpG-rich promoters show very little overrepresentation signal, even when associated with genes that display highly constrained spatiotemporal expression. For instance, while ∼50% of heart specific genes possess CpG-rich promoters we find that the frequently observed enrichment of MEF2-binding sites upstream of heart-specific genes is solely due to contributions from CpG-depleted promoters. Similar results are obtained for all sets of tissue-specific genes indicating that CpG-rich and CpG-depleted promoters differ fundamentally in their distribution of regulatory inputs around the transcription start site. In order not to dilute the respective transcription factor binding signals, the two promoter types should thus be treated as separate sets in any motif overrepresentation analysis.
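Treating CpG-rich and CpG-depleted promoters as separate sets first requires a way to tell them apart. The abstract does not say how the promoters were classified, so the sketch below assumes the widely used observed/expected CpG ratio and a cutoff that is purely a placeholder; the function names, the `threshold` default, and the toy sequences are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)

def split_promoters(promoters, threshold=0.5):
    """Split a name -> sequence mapping into CpG-rich and CpG-depleted names.

    The threshold is an assumed placeholder, not a value taken from the paper.
    """
    rich, depleted = [], []
    for name, seq in promoters.items():
        (rich if cpg_obs_exp(seq) >= threshold else depleted).append(name)
    return rich, depleted

# Toy sequences (hypothetical), far shorter than real promoters.
promoters = {"p1": "CGCGCGCG", "p2": "CATGCATG"}
rich, depleted = split_promoters(promoters)
```

Motif overrepresentation would then be run on `rich` and `depleted` separately, each against a background drawn from the same promoter class.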
Affiliation(s)
- Helge G Roider
- Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin.
|