1. Visualization and assessment of model selection uncertainty. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
2. Gui W, Xue L, Yue J, Kuang Z, Jin Y, Niu L. Crystal structure of the complex of DNA with the C-terminal domain of TYE7 from Saccharomyces cerevisiae. Acta Crystallogr F Struct Biol Commun 2021; 77:341-347. [PMID: 34605438 PMCID: PMC8488859 DOI: 10.1107/s2053230x21009250] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/19/2021] [Accepted: 09/06/2021] [Indexed: 11/10/2022] Open
Abstract
TYE7, a bHLH (basic helix-loop-helix) transcription factor from Saccharomyces cerevisiae, is involved in the regulation of many genes, including glycolytic genes. Meanwhile, accumulating evidence indicates that TYE7 also functions as a cyclin and is linked to sulfur metabolism. Here, the structure of TYE7 (residues 165-291) complexed with its specific DNA was determined by X-ray crystallography. Structural analysis and comparison revealed that His185 and Glu189 are conserved in base recognition. However, Arg193 is also involved in base recognition in the structures that were compared. In the structure in this study, Arg193 in chain A has two conformations and makes a salt bridge with the phosphate backbone structure. In addition, a series of corresponding electrophoretic mobility shift assays were performed to better understand the DNA-binding mechanism of the bHLH domain of TYE7.
Affiliation(s)
- Wei Gui
  - Hefei National Laboratory for Physical Sciences at the Microscale, Division of Molecular and Cellular Biophysics, University of Science and Technology of China, Hefei, Anhui 230026, People’s Republic of China
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230026, People’s Republic of China
- Lu Xue
  - Hefei National Laboratory for Physical Sciences at the Microscale, Division of Molecular and Cellular Biophysics, University of Science and Technology of China, Hefei, Anhui 230026, People’s Republic of China
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230026, People’s Republic of China
- Jian Yue
  - Hefei National Laboratory for Physical Sciences at the Microscale, Division of Molecular and Cellular Biophysics, University of Science and Technology of China, Hefei, Anhui 230026, People’s Republic of China
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230026, People’s Republic of China
- Zhiling Kuang
  - Hefei National Laboratory for Physical Sciences at the Microscale, Division of Molecular and Cellular Biophysics, University of Science and Technology of China, Hefei, Anhui 230026, People’s Republic of China
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230026, People’s Republic of China
- Yuping Jin
  - Hefei National Laboratory for Physical Sciences at the Microscale, Division of Molecular and Cellular Biophysics, University of Science and Technology of China, Hefei, Anhui 230026, People’s Republic of China
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230026, People’s Republic of China
- Liwen Niu
  - Hefei National Laboratory for Physical Sciences at the Microscale, Division of Molecular and Cellular Biophysics, University of Science and Technology of China, Hefei, Anhui 230026, People’s Republic of China
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230026, People’s Republic of China
3. Green B, Lian H, Yu Y, Zu T. Ultra high-dimensional semiparametric longitudinal data analysis. Biometrics 2020; 77:903-913. [PMID: 32750150 DOI: 10.1111/biom.13348] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/20/2019] [Revised: 06/08/2020] [Accepted: 07/21/2020] [Indexed: 11/30/2022]
Abstract
As ultra high-dimensional longitudinal data are becoming ever more apparent in fields such as public health and bioinformatics, developing flexible methods with a sparse model is of high interest. In this setting, the dimension of the covariates can potentially grow exponentially as exp(n^{1/2}) with respect to the number of clusters n. We consider a flexible semiparametric approach, namely, partially linear single-index models, for ultra high-dimensional longitudinal data. Most importantly, we allow not only the partially linear covariates but also the single-index covariates within the unknown flexible function estimated nonparametrically to be ultra high dimensional. Using penalized generalized estimating equations, this approach can capture correlation within subjects, can perform simultaneous variable selection and estimation with a smoothly clipped absolute deviation penalty, and can capture nonlinearity and potentially some interactions among predictors. We establish asymptotic theory for the estimators including the oracle property in ultra high dimension for both the partially linear and nonparametric components, and we present an efficient algorithm to handle the computational challenges. We show the effectiveness of our method and algorithm via a simulation study and a yeast cell cycle gene expression data.
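The smoothly clipped absolute deviation (SCAD) penalty named in this abstract can be illustrated with a minimal Python sketch. It uses the standard piecewise definition of the penalty, and the default `a = 3.7` is the conventional choice rather than a value stated in the abstract. The function name `scad_penalty` is hypothetical, and the sketch does not reproduce the authors' penalized estimating-equation algorithm.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """Standard SCAD penalty for a single coefficient.

    Assumptions (not from the abstract): lam > 0 is the tuning
    parameter and a > 2 is the shape parameter, defaulting to 3.7.
    """
    if lam <= 0 or a <= 2:
        raise ValueError("requires lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Small coefficients: behaves like the lasso (linear in |theta|).
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic transition zone.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no extra shrinkage.
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    for value in (0.5, 2.0, 10.0):
        print(value, scad_penalty(value, lam=1.0))
```

The constant tail is what distinguishes SCAD from the lasso: large coefficients are not penalized further as they grow, which is the property usually associated with oracle-type results.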
Affiliation(s)
- Brittany Green
  - Department of Computer Information Systems, University of Louisville, Louisville, Kentucky
- Heng Lian
  - Department of Mathematics, City University of Hong Kong, Kowloon Tong, Hong Kong, China
- Yan Yu
  - Department of Operations, Business Analytics, & Information Systems, University of Cincinnati, Cincinnati, Ohio
- Tianhai Zu
  - Department of Operations, Business Analytics, & Information Systems, University of Cincinnati, Cincinnati, Ohio
4. Sikdar S, Datta S. A novel statistical approach for identification of the master regulator transcription factor. BMC Bioinformatics 2017; 18:79. [PMID: 28148240 PMCID: PMC5288875 DOI: 10.1186/s12859-017-1499-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 08/31/2016] [Accepted: 01/27/2017] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Transcription factors are known to play key roles in carcinogenesis and therefore, are gaining popularity as potential therapeutic targets in drug development. A 'master regulator' transcription factor often appears to control most of the regulatory activities of the other transcription factors and the associated genes. This 'master regulator' transcription factor is at the top of the hierarchy of the transcriptomic regulation. Therefore, it is important to identify and target the master regulator transcription factor for proper understanding of the associated disease process and identifying the best therapeutic option. METHODS We present a novel two-step computational approach for identification of master regulator transcription factor in a genome. At the first step of our method we test whether there exists any master regulator transcription factor in the system. We evaluate the concordance of two ranked lists of transcription factors using a statistical measure. In case the concordance measure is statistically significant, we conclude that there is a master regulator. At the second step, our method identifies the master regulator transcription factor, if there exists one. RESULTS In the simulation scenario, our method performs reasonably well in validating the existence of a master regulator when the number of subjects in each treatment group is reasonably large. In application to two real datasets, our method ensures the existence of master regulators and identifies biologically meaningful master regulators. An R code for implementing our method in a sample test data can be found in http://www.somnathdatta.org/software . CONCLUSION We have developed a screening method of identifying the 'master regulator' transcription factor just using only the gene expression data. Understanding the regulatory structure and finding the master regulator help narrowing the search space for identifying biomarkers for complex diseases such as cancer. 
In addition to identifying the master regulator our method provides an overview of the regulatory structure of the transcription factors which control the global gene expression profiles and consequently the cell functioning.
Affiliation(s)
- Sinjini Sikdar
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32611, USA
- Susmita Datta
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32611, USA
5. Wu WS, Hsieh YC, Lai FJ. YCRD: Yeast Combinatorial Regulation Database. PLoS One 2016; 11:e0159213. [PMID: 27392072 PMCID: PMC4938206 DOI: 10.1371/journal.pone.0159213] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/12/2016] [Accepted: 06/28/2016] [Indexed: 12/21/2022] Open
Abstract
In eukaryotes, the precise transcriptional control of gene expression is typically achieved through combinatorial regulation using cooperative transcription factors (TFs). Therefore, a database which provides regulatory associations between cooperative TFs and their target genes is helpful for biologists to study the molecular mechanisms of transcriptional regulation of gene expression. Because there is no such kind of databases in the public domain, this prompts us to construct a database, called Yeast Combinatorial Regulation Database (YCRD), which deposits 434,197 regulatory associations between 2535 cooperative TF pairs and 6243 genes. The comprehensive collection of more than 2500 cooperative TF pairs was retrieved from 17 existing algorithms in the literature. The target genes of a cooperative TF pair (e.g. TF1-TF2) are defined as the common target genes of TF1 and TF2, where a TF’s experimentally validated target genes were downloaded from YEASTRACT database. In YCRD, users can (i) search the target genes of a cooperative TF pair of interest, (ii) search the cooperative TF pairs which regulate a gene of interest and (iii) identify important cooperative TF pairs which regulate a given set of genes. We believe that YCRD will be a valuable resource for yeast biologists to study combinatorial regulation of gene expression. YCRD is available at http://cosbi.ee.ncku.edu.tw/YCRD/ or http://cosbi2.ee.ncku.edu.tw/YCRD/.
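The definition used in this abstract (the target genes of a cooperative pair TF1-TF2 are the common target genes of TF1 and TF2) amounts to a set intersection. The following Python sketch illustrates it along with query type (ii). All TF names, gene names, and function names are hypothetical toy values, not YCRD data or its actual interface.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: each TF mapped to its validated target genes.
TF_TARGETS: Dict[str, Set[str]] = {
    "TF_A": {"g1", "g2", "g3"},
    "TF_B": {"g2", "g3", "g4"},
    "TF_C": {"g3", "g5"},
}

# Hypothetical list of cooperative TF pairs.
COOPERATIVE_PAIRS: List[Tuple[str, str]] = [("TF_A", "TF_B"), ("TF_A", "TF_C")]


def pair_targets(tf_targets: Dict[str, Set[str]], tf1: str, tf2: str) -> Set[str]:
    """Targets of a cooperative pair: genes targeted by both TFs."""
    return tf_targets[tf1] & tf_targets[tf2]


def pairs_regulating(
    tf_targets: Dict[str, Set[str]],
    pairs: List[Tuple[str, str]],
    gene: str,
) -> List[Tuple[str, str]]:
    """Cooperative pairs whose common targets include the given gene."""
    return [p for p in pairs if gene in pair_targets(tf_targets, p[0], p[1])]


if __name__ == "__main__":
    print(sorted(pair_targets(TF_TARGETS, "TF_A", "TF_B")))
    print(pairs_regulating(TF_TARGETS, COOPERATIVE_PAIRS, "g3"))
```

Query type (iii), ranking pairs by importance for a gene set, would need a scoring rule that the abstract does not specify, so it is left out of the sketch.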
Affiliation(s)
- Wei-Sheng Wu
  - Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan
  - * E-mail:
- Yen-Chen Hsieh
  - Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan
- Fu-Jou Lai
  - Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan
6. Khaleel SS, Andrews EH, Ung M, DiRenzo J, Cheng C. E2F4 regulatory program predicts patient survival prognosis in breast cancer. Breast Cancer Res 2014; 16:486. [PMID: 25440089 PMCID: PMC4303196 DOI: 10.1186/s13058-014-0486-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 12/13/2013] [Accepted: 11/18/2014] [Indexed: 11/10/2022] Open
Abstract
INTRODUCTION Genetic and molecular signatures have been incorporated into cancer prognosis prediction and treatment decisions with good success over the past decade. Clinically, these signatures are usually used in early-stage cancers to evaluate whether they require adjuvant therapy following surgical resection. A molecular signature that is prognostic across more clinical contexts would be a useful addition to current signatures. METHODS We defined a signature for the ubiquitous tissue factor, E2F4, based on its shared target genes in multiple tissues. These target genes were identified by chromatin immunoprecipitation sequencing (ChIP-seq) experiments using a probabilistic method. We then computationally calculated the regulatory activity score (RAS) of E2F4 in cancer tissues, and examined how E2F4 RAS correlates with patient survival. RESULTS Genes in our E2F4 signature were 21-fold more likely to be correlated with breast cancer patient survival time compared to randomly selected genes. Using eight independent breast cancer datasets containing over 1,900 unique samples, we stratified patients into low and high E2F4 RAS groups. E2F4 activity stratification was highly predictive of patient outcome, and our results remained robust even when controlling for many factors including patient age, tumor size, grade, estrogen receptor (ER) status, lymph node (LN) status, whether the patient received adjuvant therapy, and the patient's other prognostic indices such as Adjuvant! and the Nottingham Prognostic Index scores. Furthermore, the fractions of samples with positive E2F4 RAS vary in different intrinsic breast cancer subtypes, consistent with the different survival profiles of these subtypes. CONCLUSIONS We defined a prognostic signature, the E2F4 regulatory activity score, and showed it to be significantly predictive of patient outcome in breast cancer regardless of treatment status and the states of many other clinicopathological variables. 
It can be used in conjunction with other breast cancer classification methods such as Oncotype DX to improve clinical outcome prediction.
Affiliation(s)
- Sari S Khaleel
  - Department of Genetics, Geisel School of Medicine at Dartmouth, 1 Rope Ferry Road, Hanover, NH, 03755, USA.
- Erik H Andrews
  - Department of Genetics, Geisel School of Medicine at Dartmouth, 1 Rope Ferry Road, Hanover, NH, 03755, USA.
- Matthew Ung
  - Department of Genetics, Geisel School of Medicine at Dartmouth, 1 Rope Ferry Road, Hanover, NH, 03755, USA.
- James DiRenzo
  - Department of Pharmacology & Toxicology, Geisel School of Medicine at Dartmouth, 1 Rope Ferry Road, Hanover, NH, 03755, USA.
- Chao Cheng
  - Department of Genetics, Geisel School of Medicine at Dartmouth, 1 Rope Ferry Road, Hanover, NH, 03755, USA.
  - Institute for Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, One Medical Center Drive, Lebanon, NH, 03766, USA.
  - Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, One Medical Center Drive, Lebanon, NH, 03766, USA.
7. Eser P, Demel C, Maier KC, Schwalb B, Pirkl N, Martin DE, Cramer P, Tresch A. Periodic mRNA synthesis and degradation co-operate during cell cycle gene expression. Mol Syst Biol 2014; 10:717. [PMID: 24489117 PMCID: PMC4023403 DOI: 10.1002/msb.134886] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Indexed: 11/12/2022] Open
Abstract
During the cell cycle, the levels of hundreds of mRNAs change in a periodic manner, but how this is achieved by alterations in the rates of mRNA synthesis and degradation has not been studied systematically. Here, we used metabolic RNA labeling and comparative dynamic transcriptome analysis (cDTA) to derive mRNA synthesis and degradation rates every 5 min during three cell cycle periods of the yeast Saccharomyces cerevisiae. A novel statistical model identified 479 genes that show periodic changes in mRNA synthesis and generally also periodic changes in their mRNA degradation rates. Peaks of mRNA degradation generally follow peaks of mRNA synthesis, resulting in sharp and high peaks of mRNA levels at defined times during the cell cycle. Whereas the timing of mRNA synthesis is set by upstream DNA motifs and their associated transcription factors (TFs), the synthesis rate of a periodically expressed gene is apparently set by its core promoter.
Affiliation(s)
- Philipp Eser
  - Gene Center and Department of Biochemistry, Center for Integrated Protein Science CIPSM Ludwig-Maximilians-Universität München, Munich, Germany
8. Dehghan Nayeri F. Identification of transcription factors linked to cell cycle regulation in Arabidopsis. Plant Signal Behav 2014; 9:e972864. [PMID: 25482767 PMCID: PMC4622563 DOI: 10.4161/15592316.2014.972864] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/03/2014] [Revised: 07/24/2014] [Accepted: 07/25/2014] [Indexed: 06/04/2023]
Abstract
The cell cycle is an essential process in the growth and development of living organisms and consists of the replication and mitotic phases separated by two gap phases, G1 and G2. It is tightly controlled at the molecular level and especially at the level of transcription. Precise regulation of the cell cycle is of central significance for plant growth and development, and transcription factors (TFs) are global regulators of gene expression that play essential roles in cell cycle regulation. This study has uncovered TFs that are involved in the control of cell cycle progression. With the aid of multi-parallel quantitative RT-PCR, the expression changes of 1880 TFs represented in the Arabidopsis TF platform were monitored in synchronous Arabidopsis MM2d cells during a 19 h period, at time points corresponding to the four cell cycle phases after treatment of the MM2d cells with aphidicolin. Comparative TF expression analyses performed on synchronous cells resulted in the identification of 239 TFs differentially expressed during the cell cycle, while about one third of TFs were constitutively expressed through all time points. Phase-specific TFs were also identified.
Affiliation(s)
- Fatemeh Dehghan Nayeri
  - Max-Planck Institute of Molecular Plant Physiology; Am Mühlenberg 1; Potsdam-Golm, Germany
  - Department of Agricultural Biotechnology; Faculty of Engineering and Technology; Imam Khomeini International University; Qazvin, Iran
9. Cheng C, Ung M, Grant GD, Whitfield ML. Transcription factor binding profiles reveal cyclic expression of human protein-coding genes and non-coding RNAs. PLoS Comput Biol 2013; 9:e1003132. [PMID: 23874175 PMCID: PMC3708869 DOI: 10.1371/journal.pcbi.1003132] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/26/2013] [Accepted: 05/24/2013] [Indexed: 12/02/2022] Open
Abstract
Cell cycle is a complex and highly supervised process that must proceed with regulatory precision to achieve successful cellular division. Despite the wide application, microarray time course experiments have several limitations in identifying cell cycle genes. We thus propose a computational model to predict human cell cycle genes based on transcription factor (TF) binding and regulatory motif information in their promoters. We utilize ENCODE ChIP-seq data and motif information as predictors to discriminate cell cycle against non-cell cycle genes. Our results show that both the trans- TF features and the cis- motif features are predictive of cell cycle genes, and a combination of the two types of features can further improve prediction accuracy. We apply our model to a complete list of GENCODE promoters to predict novel cell cycle driving promoters for both protein-coding genes and non-coding RNAs such as lincRNAs. We find that a similar percentage of lincRNAs are cell cycle regulated as protein-coding genes, suggesting the importance of non-coding RNAs in cell cycle division. The model we propose here provides not only a practical tool for identifying novel cell cycle genes with high accuracy, but also new insights on cell cycle regulation by TFs and cis-regulatory elements. Cell cycle is a complex and highly supervised process that must proceed with regulatory precision to achieve successful cellular division. Microarray time course experiments have been successfully used to identify cell cycle regulated genes but with several limitations, e.g. less effective in identifying genes with low expression. We propose a computational approach to predict cell cycle genes based on TF binding data and motif information in their promoters. Specifically, we take advantage of ChIP-seq TF binding data generated by the ENCODE project and the TF binding motif information available from public databases. 
These data were processed and used as predictors of cell cycle genes with the Random Forest method. Our results show that both the trans- TF features and the cis- motif features are predictive of cell cycle genes, and a combination of the two types of features can further improve prediction accuracy. We apply our model to a complete list of GENCODE promoters to predict novel cell cycle driving promoters for both protein-coding genes and non-coding RNAs such as lincRNAs. We find that a similar percentage of lincRNAs are cell cycle regulated as protein-coding genes, suggesting the importance of non-coding RNAs in cell cycle division.
Affiliation(s)
- Chao Cheng
  - Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA.
10. State of the art in silico tools for the study of signaling pathways in cancer. Int J Mol Sci 2012; 13:6561-6581. [PMID: 22837650 PMCID: PMC3397482 DOI: 10.3390/ijms13066561] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/27/2012] [Revised: 05/03/2012] [Accepted: 05/10/2012] [Indexed: 12/18/2022] Open
Abstract
In the last several years, researchers have exhibited an intense interest in the evolutionarily conserved signaling pathways that have crucial roles during embryonic development. Interestingly, the malfunctioning of these signaling pathways leads to several human diseases, including cancer. The chemical and biophysical events that occur during cellular signaling, as well as the number of interactions within a signaling pathway, make these systems complex to study. In silico resources are tools used to aid the understanding of cellular signaling pathways. Systems approaches have provided a deeper knowledge of diverse biochemical processes, including individual metabolic pathways, signaling networks and genome-scale metabolic networks. In the future, these tools will be enormously valuable, if they continue to be developed in parallel with growing biological knowledge. In this study, an overview of the bioinformatics resources that are currently available for the analysis of biological networks is provided.
11. Wang H, Wang YH, Wu WS. Yeast cell cycle transcription factors identification by variable selection criteria. Gene 2011; 485:172-6. [DOI: 10.1016/j.gene.2011.06.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 01/01/2011] [Revised: 05/12/2011] [Accepted: 06/03/2011] [Indexed: 01/12/2023]
12. Weiss MS, Peñalver Bernabé B, Bellis AD, Broadbelt LJ, Jeruss JS, Shea LD. Dynamic, large-scale profiling of transcription factor activity from live cells in 3D culture. PLoS One 2010; 5:e14026. [PMID: 21103341 PMCID: PMC2984444 DOI: 10.1371/journal.pone.0014026] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Received: 05/18/2010] [Accepted: 10/21/2010] [Indexed: 12/03/2022] Open
Abstract
Background Extracellular activation of signal transduction pathways and their downstream target transcription factors (TFs) are critical regulators of cellular processes and tissue development. The intracellular signaling network is complex, and techniques that quantify the activities of numerous pathways and connect their activities to the resulting phenotype would identify the signals and mechanisms regulating tissue development. The ability to investigate tissue development should capture the dynamic pathway activity and requires an environment that supports cellular organization into structures that mimic in vivo phenotypes. Taken together, our objective was to develop cellular arrays for dynamic, large-scale quantification of TF activity as cells organized into spherical structures within 3D culture. Methodology/Principal Findings TF-specific and normalization reporter constructs were delivered in parallel to a cellular array containing a well-established breast cancer cell line cultured in Matrigel. Bioluminescence imaging provided a rapid, non-invasive, and sensitive method to quantify luciferase levels, and was applied repeatedly on each sample to monitor dynamic activity. Arrays measuring 28 TFs identified up to 19 active, with 13 factors changing significantly over time. Stimulation of cells with β-estradiol or activin A resulted in differential TF activity profiles evolving from initial stimulation of the ligand. Many TFs changed as expected based on previous reports, yet arrays were able to replicate these results in a single experiment. Additionally, arrays identified TFs that had not previously been linked with activin A. Conclusions/Significance This system provides a method for large-scale, non-invasive, and dynamic quantification of signaling pathway activity as cells organize into structures. 
The arrays may find utility for investigating mechanisms regulating normal and abnormal tissue growth, biomaterial design, or as a platform for screening therapeutics.
Affiliation(s)
- Michael S. Weiss
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, United States of America
- Beatriz Peñalver Bernabé
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, United States of America
- Abigail D. Bellis
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, United States of America
- Linda J. Broadbelt
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, United States of America
- Jacqueline S. Jeruss
  - Department of Surgery, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, United States of America
  - Robert H. Lurie Comprehensive Cancer Center, Northwestern University, Chicago, Illinois, United States of America
  - * E-mail: (LDS); (JSJ)
- Lonnie D. Shea
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, United States of America
  - Robert H. Lurie Comprehensive Cancer Center, Northwestern University, Chicago, Illinois, United States of America
  - Institute for Bionanotechnology in Medicine (IBNAM), Northwestern University, Chicago, Illinois, United States of America
  - * E-mail: (LDS); (JSJ)
13. Emmert-Streib F, Dehmer M. Predicting cell cycle regulated genes by causal interactions. PLoS One 2009; 4:e6633. [PMID: 19688096 PMCID: PMC2723924 DOI: 10.1371/journal.pone.0006633] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/15/2009] [Accepted: 04/23/2009] [Indexed: 11/28/2022] Open
Abstract
The fundamental difference between classic and modern biology is that technological innovations allow to generate high-throughput data to get insights into molecular interactions on a genomic scale. These high-throughput data can be used to infer gene networks, e.g., the transcriptional regulatory or signaling network, representing a blue print of the current dynamical state of the cellular system. However, gene networks do not provide direct answers to biological questions, instead, they need to be analyzed to reveal functional information of molecular working mechanisms. In this paper we propose a new approach to analyze the transcriptional regulatory network of yeast to predict cell cycle regulated genes. The novelty of our approach is that, in contrast to all other approaches aiming to predict cell cycle regulated genes, we do not use time series data but base our analysis on the prior information of causal interactions among genes. The major purpose of the present paper is to predict cell cycle regulated genes in S. cerevisiae. Our analysis is based on the transcriptional regulatory network, representing causal interactions between genes, and a list of known periodic genes. No further data are used. Our approach utilizes the causal membership of genes and the hierarchical organization of the transcriptional regulatory network leading to two groups of periodic genes with a well defined direction of information flow. We predict genes as periodic if they appear on unique shortest paths connecting two periodic genes from different hierarchy levels. Our results demonstrate that a classical problem as the prediction of cell cycle regulated genes can be seen in a new light if the concept of a causal membership of a gene is applied consequently. This also shows that there is a wealth of information buried in the transcriptional regulatory network whose unraveling may require more elaborate concepts than it might seem at first.
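The prediction rule stated in this abstract (a gene is predicted as periodic if it lies on a unique shortest path connecting two periodic genes from different hierarchy levels) can be sketched in Python as follows. This is only an illustration under stated assumptions: the network is a directed adjacency mapping without duplicate edges, the two hierarchy levels are supplied as precomputed sets `upper` and `lower`, and paths are searched from `upper` to `lower`. The abstract does not describe how the hierarchy is derived, and every name and the toy graph below are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set


def unique_shortest_path(
    graph: Dict[str, Iterable[str]], source: str, target: str
) -> Optional[List[str]]:
    """Return the shortest path if exactly one exists, otherwise None."""
    dist = {source: 0}
    count = {source: 1}   # number of distinct shortest paths to each node
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                # First visit: v inherits the path count of u.
                dist[v] = dist[u] + 1
                count[v] = count[u]
                parent[v] = u
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                # Another equally short route into v.
                count[v] += count[u]
    if count.get(target) != 1:
        return None
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def predict_periodic(
    graph: Dict[str, Iterable[str]], upper: Set[str], lower: Set[str]
) -> Set[str]:
    """Collect intermediate genes on unique shortest paths upper -> lower."""
    predicted: Set[str] = set()
    for s in upper:
        for t in lower:
            path = unique_shortest_path(graph, s, t)
            if path:
                predicted.update(path[1:-1])
    # Genes already known to be periodic are not new predictions.
    return predicted - upper - lower


if __name__ == "__main__":
    toy = {"A": ["X", "Y", "Z"], "X": ["B"], "Y": ["C"], "Z": ["C"]}
    print(predict_periodic(toy, upper={"A"}, lower={"B", "C"}))
```

In the toy graph, gene X lies on the only shortest path from A to B and is predicted, whereas Y and Z lie on two competing shortest paths from A to C and are not.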
Affiliation(s)
- Frank Emmert-Streib
  - Computational Biology and Machine Learning, Center for Cancer Research and Cell Biology, School of Biomedical Sciences, Queen's University Belfast, Belfast, United Kingdom.
14. Marco A, Konikoff C, Karr TL, Kumar S. Relationship between gene co-expression and sharing of transcription factor binding sites in Drosophila melanogaster. Bioinformatics 2009; 25:2473-7. [PMID: 19633094 DOI: 10.1093/bioinformatics/btp462] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 01/16/2023] Open
Abstract
MOTIVATION In functional genomics, it is frequently useful to correlate expression levels of genes to identify transcription factor binding sites (TFBS) via the presence of common sequence motifs. The underlying assumption is that co-expressed genes are more likely to contain shared TFBS and, thus, TFBS can be identified computationally. Indeed, gene pairs with a very high expression correlation show a significant excess of shared binding sites in yeast. We have tested this assumption in a more complex organism, Drosophila melanogaster, by using experimentally determined TFBS and microarray expression data. We have also examined the reverse relationship between the expression correlation and the extent of TFBS sharing. RESULTS Pairs of genes with shared TFBS show, on average, a higher degree of co-expression than those with no common TFBS in Drosophila. However, the reverse does not hold true: gene pairs with high expression correlations do not share significantly larger numbers of TFBS. Exception to this observation exists when comparing expression of genes from the earliest stages of embryonic development. Interestingly, semantic similarity between gene annotations (Biological Process) is much better associated with TFBS sharing, as compared to the expression correlation. We discuss these results in light of reverse engineering approaches to computationally predict regulatory sequences by using comparative genomics.
Affiliation(s)
- Antonio Marco
- Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-5301, USA.
|
15
|
Emmert-Streib F, Dehmer M. Hierarchical coordination of periodic genes in the cell cycle of Saccharomyces cerevisiae. BMC Systems Biology 2009; 3:76. [PMID: 19619302 PMCID: PMC2721836 DOI: 10.1186/1752-0509-3-76] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/12/2009] [Accepted: 07/20/2009] [Indexed: 11/25/2022]
Abstract
Background: Gene networks represent molecular interactions among genes or their products and, hence, form causal networks. Despite intense study in recent years, most investigations have so far focused on inferential methods to reconstruct gene networks from experimental data or on their structural properties, e.g., degree distributions. Structural analysis of these networks to gain functional insights into the organizational principles of, e.g., pathways remains underappreciated. Results: In the present paper we analyze cell cycle regulated genes in S. cerevisiae. Our analysis is based on the transcriptional regulatory network, representing causal interactions and not just associations or correlations between genes, and a list of known periodic genes. No further data are used. Partitioning the transcriptional regulatory network according to a graph theoretical property leads to a hierarchy in the network and, hence, in the information flow, which makes it possible to identify two groups of periodic genes. This reveals a novel conceptual interpretation of the working mechanism of the cell cycle and the genes regulated by this pathway. Conclusion: Aside from the results obtained for the cell cycle of yeast, our approach could be exemplary for the analysis of general pathways by exploiting the rich causal structure of inferred and/or curated gene networks, including protein or signaling networks.
Affiliation(s)
- Frank Emmert-Streib
- Center for Cancer Research and Cell Biology, Queen's University Belfast, UK.
|
16
|
Cheng C, Li LM, Alves P, Gerstein M. Systematic identification of transcription factors associated with patient survival in cancers. BMC Genomics 2009; 10:225. [PMID: 19442316 PMCID: PMC2686740 DOI: 10.1186/1471-2164-10-225] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 12/05/2008] [Accepted: 05/15/2009] [Indexed: 12/23/2022] Open
Abstract
Background: Aberrant activation or expression of transcription factors has been implicated in the tumorigenesis of various types of cancer. In spite of the prevalent application of microarray experiments for profiling gene expression in cancer samples, they provide limited information regarding the activities of transcription factors. However, the association between transcription factors and cancers is largely dependent on the transcription regulatory activities rather than mRNA expression levels. Results: In this paper, we propose a computational approach that integrates microarray expression data with transcription factor binding site information to systematically identify transcription factors associated with patient survival for a given cancer type. This approach was applied to two gene expression data sets, one for breast cancer and one for acute myeloid leukemia. We found that two transcription factor families, the steroid nuclear receptor family and the ATF/CREB family, are significantly correlated with the survival of patients with breast cancer, and that a transcription factor named T-cell acute lymphocytic leukemia 1 is significantly correlated with acute myeloid leukemia patient survival. Conclusion: Our analysis identifies transcription factors associated with patient survival and provides insight into the regulatory mechanisms underlying breast cancer and leukemia. The transcription factors identified by our method are biologically meaningful and consistent with prior knowledge. This approach can also be applied to other microarray cancer data sets to help researchers better understand the intricate relationship between transcription factors and diseases.
Affiliation(s)
- Chao Cheng
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA.
|
17
|
Abstract
One of the early success stories of computational systems biology was the work done on cell-cycle regulation. The earliest mathematical descriptions of cell-cycle control evolved into very complex, detailed computational models that describe the regulation of cell division in many different cell types. Along the way, these models predicted several dynamical properties and unknown components of the system that were later experimentally verified or identified. Still, research in this field is far from over. We need to understand how the core cell-cycle machinery is controlled by internal and external signals, both in yeast cells and in the more complex regulatory networks of higher eukaryotes. Furthermore, we face many computational challenges as new types of data appear thanks to continuing advances in experimental techniques. We have to deal with cell-to-cell variations, revealed by single cell measurements, as well as the tremendous amount of data flowing from high throughput machines. We need new computational concepts and tools to handle these data and develop more detailed, more precise models of cell-cycle regulation in various organisms. Here we review the past and present of computational modeling of cell-cycle regulation, and discuss possible future directions of the field.
Affiliation(s)
- Attila Csikász-Nagy
- The Microsoft Research - University of Trento Centre for Computational and Systems Biology, Piazza Manci 17, Povo-Trento I-38100, Italy.
|
18
|
Wu WS, Li WH. Systematic identification of yeast cell cycle transcription factors using multiple data sources. BMC Bioinformatics 2008; 9:522. [PMID: 19061501 PMCID: PMC2613934 DOI: 10.1186/1471-2105-9-522] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Received: 08/04/2008] [Accepted: 12/05/2008] [Indexed: 12/16/2022] Open
Abstract
Background: The eukaryotic cell cycle is a complex process and is precisely regulated at many levels. Many genes specific to the cell cycle are regulated transcriptionally and are expressed just before they are needed. To understand the cell cycle process, it is important to identify the cell cycle transcription factors (TFs) that regulate the expression of cell cycle-regulated genes. Results: We developed a method to identify cell cycle TFs in yeast by integrating current ChIP-chip, mutant, transcription factor binding site (TFBS), and cell cycle gene expression data. We identified 17 cell cycle TFs, 12 of which are known cell cycle TFs, while the remaining five (Ash1, Rlm1, Ste12, Stp1, Tec1) are putative novel cell cycle TFs. For each cell cycle TF, we assigned specific cell cycle phases in which the TF functions and identified the time lag for the TF to exert regulatory effects on its target genes. We also identified 178 novel cell cycle-regulated genes, among which 59 have unknown functions, but they may now be annotated as cell cycle-regulated genes. Most of our predictions are supported by previous experimental or computational studies. Furthermore, a high confidence TF-gene regulatory matrix is derived as a byproduct of our method. Each TF-gene regulatory relationship in this matrix is supported by at least three data sources: gene expression, TFBS, and ChIP-chip and/or mutant data. We show that our method performs better than four existing methods for identifying yeast cell cycle TFs. Finally, an application of our method to different cell cycle gene expression datasets suggests that our method is robust. Conclusion: Our method is effective for identifying yeast cell cycle TFs and cell cycle-regulated genes. Many of our predictions are validated by the literature. Our study shows that integrating multiple data sources is a powerful approach to studying complex biological systems.
Affiliation(s)
- Wei-Sheng Wu
- Department of Evolution and Ecology, University of Chicago, Chicago, IL 60637, USA.
|
19
|
Li H, Zhan M. Unraveling transcriptional regulatory programs by integrative analysis of microarray and transcription factor binding data. Bioinformatics 2008; 24:1874-80. [PMID: 18586698 PMCID: PMC2519161 DOI: 10.1093/bioinformatics/btn332] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 01/13/2023]
Abstract
Motivation: Unraveling the transcriptional regulatory program mediated by transcription factors (TFs) is a fundamental objective of computational biology, yet it remains a challenge. Method: Here, we present a new methodology that integrates microarray and TF binding data for unraveling transcriptional regulatory networks. The algorithm is based on a two-stage constrained matrix decomposition model. The model takes into account the non-linear structure in gene expression data, particularly in the TF-target gene interactions and the combinatorial nature of gene regulation by TFs. The gene expression profile is modeled as a linear weighted combination of the activity profiles of a set of TFs. The TF activity profiles are deduced from the expression levels of TF target genes, instead of directly from the TFs themselves. The TF-target gene relationships are derived from ChIP-chip and other TF binding data. The proposed algorithm can not only identify transcriptional modules, but also reveal regulatory programs, that is, which TFs control which target genes and in which specific ways (either activating or inhibiting). Results: In comparison with other methods, our algorithm identifies biologically more meaningful transcriptional modules relating to specific TFs. We applied the new algorithm to yeast cell cycle and stress response data. While known transcriptional regulations were confirmed, novel TF-gene interactions were predicted and provide new insights into the regulatory mechanisms of the cell. Contact: zhanmi@mail.nih.gov Supplementary information: Supplementary data are available at Bioinformatics online.
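The linear model mentioned in this abstract can be written out as a short worked equation. The symbols below are assumed notation chosen for illustration, not taken from the paper: $E_{g}(t)$ is the expression of gene $g$ in condition $t$, $A_{f}(t)$ is the activity profile of TF $f$, and $W_{gf}$ is the regulatory weight. The abstract does not state the exact constraints or what the two stages are, so the zero-pattern constraint shown here is only one plausible reading of "constrained" given that TF-target relationships come from binding data.

```latex
E_{g}(t) \;\approx\; \sum_{f=1}^{F} W_{gf}\, A_{f}(t),
\qquad
W_{gf} = 0 \ \text{ whenever binding data give no evidence that TF } f \text{ targets gene } g
```

In matrix form this reads $E \approx W A$, with $E$ of size genes by conditions, $W$ of size genes by TFs, and $A$ of size TFs by conditions. Under this reading, the sign of a nonzero $W_{gf}$ would correspond to the "activating or inhibiting" distinction the abstract draws.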
Affiliation(s)
- Huai Li
- Bioinformatics Unit, Branch of Research Resources, National Institute on Aging, NIH, Baltimore, MD 21224, USA
|
20
|
Cheng C, Li LM. Inferring microRNA activities by combining gene expression with microRNA target prediction. PLoS One 2008; 3:e1989. [PMID: 18431476 PMCID: PMC2291556 DOI: 10.1371/journal.pone.0001989] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.8] [Received: 11/28/2007] [Accepted: 03/01/2008] [Indexed: 11/30/2022] Open
Abstract
Background: MicroRNAs (miRNAs) play crucial roles in a variety of biological processes by regulating expression of their target genes at the mRNA level. A number of computational approaches regarding miRNAs have been proposed, but most of them focus on miRNA gene finding or target prediction. Little computational work has been done to investigate the effective regulation of miRNAs. Methodology/Principal Findings: We propose a method to infer the effective regulatory activities of miRNAs by integrating microarray expression data with miRNA target predictions. The method is based on the idea that changes in the regulatory activity of miRNAs could be reflected in the expression changes of their target transcripts as measured by microarray. To validate this method, we apply it to microarray data sets that measure gene expression changes in cell lines after transfection or inhibition of several specific miRNAs. The results indicate that our method can detect activity enhancement of the transfected miRNAs as well as activity reduction of the inhibited miRNAs with high sensitivity and specificity. Furthermore, we show that our inference is robust with respect to false positives in target prediction. Conclusions/Significance: A large number of gene expression data sets are available in the literature, but the miRNA regulation underlying these data sets is largely unknown. The method is easy to implement and can be used to investigate the effective miRNA regulation underlying the expression change profiles obtained from microarray experiments.
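The core idea stated in this abstract (a change in miRNA activity should show up as a shift in the expression of its predicted targets) can be sketched as follows. This is a deliberately simplified stand-in, a difference of mean log fold changes between predicted targets and all other genes, and not the authors' scoring statistic, which the abstract does not describe. The function name `mirna_activity_change` and the toy values are hypothetical.

```python
def mirna_activity_change(log_fold_change, predicted_targets):
    """Crude activity-change score for one miRNA.

    log_fold_change: dict mapping gene -> log expression change (treated vs control).
    predicted_targets: set of genes predicted to be targets of the miRNA.

    Returns mean(non-targets) - mean(targets). Because miRNAs repress their
    targets, a positive score (targets shifted down relative to background)
    is read here as increased miRNA activity, a negative score as decreased.
    """
    targets = [v for g, v in log_fold_change.items() if g in predicted_targets]
    others = [v for g, v in log_fold_change.items() if g not in predicted_targets]
    if not targets or not others:
        raise ValueError("need at least one target and one non-target gene")
    return sum(others) / len(others) - sum(targets) / len(targets)


# Toy, made-up data: genes "a" and "b" are predicted targets and go down
# after a hypothetical miRNA transfection.
toy_lfc = {"a": -1.0, "b": -0.5, "c": 0.2, "d": 0.1, "e": 0.0}
toy_targets = {"a", "b"}

if __name__ == "__main__":
    print(mirna_activity_change(toy_lfc, toy_targets))
```

A mean difference is sensitive to outliers and ignores the confidence of each target prediction; a rank-based comparison would be a natural alternative under the same assumptions.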
Affiliation(s)
- Chao Cheng
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
- Lei M. Li
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
- Department of Mathematics, University of Southern California, Los Angeles, California, United States of America
- * To whom correspondence should be addressed. E-mail:
|