1
Chen Y, Mao R, Xu J, Huang Y, Xu J, Cui S, Zhu Z, Ji X, Huang S, Huang Y, Huang HY, Yen SC, Lin YCD, Huang HD. A Causal Regulation Modeling Algorithm for Temporal Events with Application to Escherichia coli's Aerobic to Anaerobic Transition. Int J Mol Sci 2024; 25:5654. [PMID: 38891842 PMCID: PMC11171773 DOI: 10.3390/ijms25115654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2024] [Revised: 05/10/2024] [Accepted: 05/21/2024] [Indexed: 06/21/2024]
Abstract
Time-series experiments are crucial for understanding the transient and dynamic nature of biological phenomena. These experiments, leveraging advanced classification and clustering algorithms, allow for a deep dive into the cellular processes. However, while these approaches effectively identify patterns and trends within data, they often need to improve in elucidating the causal mechanisms behind these changes. Building on this foundation, our study introduces a novel algorithm for temporal causal signaling modeling, integrating established knowledge networks with sequential gene expression data to elucidate signal transduction pathways over time. Focusing on Escherichia coli's (E. coli) aerobic to anaerobic transition (AAT), this research marks a significant leap in understanding the organism's metabolic shifts. By applying our algorithm to a comprehensive E. coli regulatory network and a time-series microarray dataset, we constructed the cross-time point core signaling and regulatory processes of E. coli's AAT. Through gene expression analysis, we validated the primary regulatory interactions governing this process. We identified a novel regulatory scheme wherein environmentally responsive genes, soxR and oxyR, activate fur, modulating the nitrogen metabolism regulators fnr and nac. This regulatory cascade controls the stress regulators ompR and lrhA, ultimately affecting the cell motility gene flhD, unveiling a novel regulatory axis that elucidates the complex regulatory dynamics during the AAT process. Our approach, merging empirical data with prior knowledge, represents a significant advance in modeling cellular signaling processes, offering a deeper understanding of microbial physiology and its applications in biotechnology.
Affiliation(s)
- Yigang Chen
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.C.); (R.M.); (J.X.); (Y.H.); (J.X.); (S.C.); (Z.Z.); (X.J.); (S.H.); (Y.H.); (H.-Y.H.); (S.-C.Y.)
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Runbo Mao
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.C.); (R.M.); (J.X.); (Y.H.); (J.X.); (S.C.); (Z.Z.); (X.J.); (S.H.); (Y.H.); (H.-Y.H.); (S.-C.Y.)
- Jiatong Xu
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.C.); (R.M.); (J.X.); (Y.H.); (J.X.); (S.C.); (Z.Z.); (X.J.); (S.H.); (Y.H.); (H.-Y.H.); (S.-C.Y.)
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yixian Huang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.C.); (R.M.); (J.X.); (Y.H.); (J.X.); (S.C.); (Z.Z.); (X.J.); (S.H.); (Y.H.); (H.-Y.H.); (S.-C.Y.)
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Jingyi Xu
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.C.); (R.M.); (J.X.); (Y.H.); (J.X.); (S.C.); (Z.Z.); (X.J.); (S.H.); (Y.H.); (H.-Y.H.); (S.-C.Y.)
- Shidong Cui
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.C.); (R.M.); (J.X.); (Y.H.); (J.X.); (S.C.); (Z.Z.); (X.J.); (S.H.); (Y.H.); (H.-Y.H.); (S.-C.Y.)
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Zihao Zhu
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.C.); (R.M.); (J.X.); (Y.H.); (J.X.); (S.C.); (Z.Z.); (X.J.); (S.H.); (Y.H.); (H.-Y.H.); (S.-C.Y.)
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Xiang Ji
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.C.); (R.M.); (J.X.); (Y.H.); (J.X.); (S.C.); (Z.Z.); (X.J.); (S.H.); (Y.H.); (H.-Y.H.); (S.-C.Y.)
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Shenghan Huang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.C.); (R.M.); (J.X.); (Y.H.); (J.X.); (S.C.); (Z.Z.); (X.J.); (S.H.); (Y.H.); (H.-Y.H.); (S.-C.Y.)
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yanzhe Huang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.C.); (R.M.); (J.X.); (Y.H.); (J.X.); (S.C.); (Z.Z.); (X.J.); (S.H.); (Y.H.); (H.-Y.H.); (S.-C.Y.)
- Hsi-Yuan Huang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.C.); (R.M.); (J.X.); (Y.H.); (J.X.); (S.C.); (Z.Z.); (X.J.); (S.H.); (Y.H.); (H.-Y.H.); (S.-C.Y.)
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Shih-Chung Yen
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.C.); (R.M.); (J.X.); (Y.H.); (J.X.); (S.C.); (Z.Z.); (X.J.); (S.H.); (Y.H.); (H.-Y.H.); (S.-C.Y.)
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yang-Chi-Duang Lin
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.C.); (R.M.); (J.X.); (Y.H.); (J.X.); (S.C.); (Z.Z.); (X.J.); (S.H.); (Y.H.); (H.-Y.H.); (S.-C.Y.)
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Hsien-Da Huang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.C.); (R.M.); (J.X.); (Y.H.); (J.X.); (S.C.); (Z.Z.); (X.J.); (S.H.); (Y.H.); (H.-Y.H.); (S.-C.Y.)
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
2
Ai D, Chen L, Xie J, Cheng L, Zhang F, Luan Y, Li Y, Hou S, Sun F, Xia LC. Identifying local associations in biological time series: algorithms, statistical significance, and applications. Brief Bioinform 2023; 24:bbad390. [PMID: 37930023 DOI: 10.1093/bib/bbad390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 08/21/2023] [Accepted: 09/14/2023] [Indexed: 11/07/2023]
Abstract
Local associations refer to spatial-temporal correlations that emerge from the biological realm, such as time-dependent gene co-expression or seasonal interactions between microbes. One can reveal the intricate dynamics and inherent interactions of biological systems by examining the biological time series data for these associations. To accomplish this goal, local similarity analysis algorithms and statistical methods that facilitate the local alignment of time series and assess the significance of the resulting alignments have been developed. Although these algorithms were initially devised for gene expression analysis from microarrays, they have been adapted and accelerated for multi-omics next generation sequencing datasets, achieving high scientific impact. In this review, we present an overview of the historical developments and recent advances for local similarity analysis algorithms, their statistical properties, and real applications in analyzing biological time series data. The benchmark data and analysis scripts used in this review are freely available at http://github.com/labxscut/lsareview.
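The local alignment idea summarized in this abstract can be illustrated with a small dynamic-programming sketch. This is one common textbook-style formulation of a local similarity score, not the code from the review's repository: the function name `local_similarity_score`, the `max_delay` parameter, and the use of plain z-scoring for normalization are all assumptions made for illustration.

```python
import numpy as np


def local_similarity_score(x, y, max_delay=2):
    """Signed local similarity score of two equal-length series (illustrative sketch).

    Assumes both series are non-constant; normalization here is plain
    z-scoring, which is an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    n = len(x)
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()

    # pos[i, j] / neg[i, j]: best positively / negatively associated local
    # alignment ending at x[i-1], y[j-1], restricted to |i - j| <= max_delay.
    pos = np.zeros((n + 1, n + 1))
    neg = np.zeros((n + 1, n + 1))
    for i in range(1, n + 1):
        for j in range(max(1, i - max_delay), min(n, i + max_delay) + 1):
            prod = x[i - 1] * y[j - 1]
            pos[i, j] = max(0.0, pos[i - 1, j - 1] + prod)
            neg[i, j] = max(0.0, neg[i - 1, j - 1] - prod)

    best_pos, best_neg = pos.max(), neg.max()
    sign = 1.0 if best_pos >= best_neg else -1.0
    return sign * max(best_pos, best_neg) / n
```

With this normalization the score lies in [-1, 1]: a series compared with itself gives 1, and a series compared with its negation gives -1. Assessing significance of the score is a separate step that this sketch does not cover.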
Affiliation(s)
- Dongmei Ai
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Lulu Chen
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Jiemin Xie
- Department of Statistics and Financial Mathematics, School of Mathematics, South China University of Technology, Guangzhou 510641, China
- Longwei Cheng
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Fang Zhang
- Shenwan Hongyuan Securities Co. Ltd., Shanghai 200031, China
- Yihui Luan
- School of Mathematics, Shandong University, Jinan 250100, China
- Yang Li
- Department of Statistics and Financial Mathematics, School of Mathematics, South China University of Technology, Guangzhou 510641, China
- Shengwei Hou
- Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Fengzhu Sun
- Department of Quantitative and Computational Biology, University of Southern California, California, 90007, USA
- Li Charlie Xia
- Department of Statistics and Financial Mathematics, School of Mathematics, South China University of Technology, Guangzhou 510641, China
3
Reagor CC, Velez-Angel N, Hudspeth AJ. Depicting pseudotime-lagged causality across single-cell trajectories for accurate gene-regulatory inference. PNAS NEXUS 2023; 2:pgad113. [PMID: 37113980 PMCID: PMC10129065 DOI: 10.1093/pnasnexus/pgad113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 03/21/2023] [Accepted: 03/23/2023] [Indexed: 04/29/2023]
Abstract
Identifying the causal interactions in gene-regulatory networks requires an accurate understanding of the time-lagged relationships between transcription factors and their target genes. Here we describe DELAY (short for Depicting Lagged Causality), a convolutional neural network for the inference of gene-regulatory relationships across pseudotime-ordered single-cell trajectories. We show that combining supervised deep learning with joint probability matrices of pseudotime-lagged trajectories allows the network to overcome important limitations of ordinary Granger causality-based methods, for example, the inability to infer cyclic relationships such as feedback loops. Our network outperforms several common methods for inferring gene regulation and, when given partial ground-truth labels, predicts novel regulatory networks from single-cell RNA sequencing (scRNA-seq) and single-cell ATAC sequencing (scATAC-seq) data sets. To validate this approach, we used DELAY to identify important genes and modules in the regulatory network of auditory hair cells, as well as likely DNA-binding partners for two hair cell cofactors (Hist1h1c and Ccnd1) and a novel binding sequence for the hair cell-specific transcription factor Fiz1. We provide an easy-to-use implementation of DELAY under an open-source license at https://github.com/calebclayreagor/DELAY.
Affiliation(s)
- Nicolas Velez-Angel
- Howard Hughes Medical Institute and Laboratory of Sensory Neuroscience, The Rockefeller University, New York, NY 10065, USA
4
Lin Z, Ou-Yang L. Inferring gene regulatory networks from single-cell gene expression data via deep multi-view contrastive learning. Brief Bioinform 2023; 24:6965907. [PMID: 36585783 DOI: 10.1093/bib/bbac586] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/10/2022] [Revised: 11/28/2022] [Accepted: 11/29/2022] [Indexed: 01/01/2023]
Abstract
The inference of gene regulatory networks (GRNs) is of great importance for understanding the complex regulatory mechanisms within cells. The emergence of single-cell RNA-sequencing (scRNA-seq) technologies enables the measure of gene expression levels for individual cells, which promotes the reconstruction of GRNs at single-cell resolution. However, existing network inference methods are mainly designed for data collected from a single data source, which ignores the information provided by multiple related data sources. In this paper, we propose a multi-view contrastive learning (DeepMCL) model to infer GRNs from scRNA-seq data collected from multiple data sources or time points. We first represent each gene pair as a set of histogram images, and then introduce a deep Siamese convolutional neural network with contrastive loss to learn the low-dimensional embedding for each gene pair. Moreover, an attention mechanism is introduced to integrate the embeddings extracted from different data sources and different neighbor gene pairs. Experimental results on synthetic and real-world datasets validate the effectiveness of our contrastive learning and attention mechanisms, demonstrating the effectiveness of our model in integrating multiple data sources for GRN inference.
Affiliation(s)
- Zerun Lin
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Le Ou-Yang
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
5
Shan A, Zhang F, Luan Y. Efficient Approximation of Statistical Significance in Local Trend Analysis of Dependent Time Series. Front Genet 2022; 13:729011. [PMID: 35559007 PMCID: PMC9086404 DOI: 10.3389/fgene.2022.729011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2021] [Accepted: 03/01/2022] [Indexed: 11/13/2022]
Abstract
Biological time series data plays an important role in exploring the dynamic changes of biological systems, while the determinate patterns of association between various biological factors can further deepen the understanding of biological system functions and the interactions between them. At present, local trend analysis (LTA) has been commonly conducted in many biological fields, where the biological time series data can be the sequence at either the level of gene expression or OTU abundance, etc. A local trend score can be obtained by taking the similarity degree of the upward, constant or downward trend of time series data as an indicator of the correlation between different biological factors. However, a major limitation facing local trend analysis is that the permutation test conducted to calculate its statistical significance requires a time-consuming process. Therefore, the problem attracting much attention from bioinformatics scientists is to develop a method of evaluating the statistical significance of local trend scores quickly and effectively. In this paper, a new approach is proposed to evaluate the efficient approximation of statistical significance in the local trend analysis of dependent time series, and the effectiveness of the new method is demonstrated through simulation and real data set analysis.
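The trend-based scoring described in this abstract can be sketched in two steps: discretize each series into upward (+1), constant (0), or downward (-1) moves, then score the best local run of agreeing or opposing moves. The code below is a hedged illustration only. The names `trend_series` and `local_trend_score`, the absolute-difference `threshold`, and the `max_delay` band are assumptions of this sketch and are not taken from the paper, which may define trend changes differently.

```python
import numpy as np


def trend_series(x, threshold=0.0):
    """Map a series to -1 / 0 / +1 moves between consecutive points."""
    diffs = np.diff(np.asarray(x, dtype=float))
    trends = np.zeros(diffs.shape, dtype=int)
    trends[diffs > threshold] = 1
    trends[diffs < -threshold] = -1
    return trends


def local_trend_score(x, y, max_delay=1, threshold=0.0):
    """Signed local trend score of two equal-length series (illustrative)."""
    dx = trend_series(x, threshold)
    dy = trend_series(y, threshold)
    if dx.shape != dy.shape or len(dx) == 0:
        raise ValueError("x and y must have equal length of at least 2")
    m = len(dx)

    # Same banded local-alignment recursion as in local similarity analysis,
    # applied to the discretized trend series instead of the raw values.
    pos = np.zeros((m + 1, m + 1))
    neg = np.zeros((m + 1, m + 1))
    for i in range(1, m + 1):
        for j in range(max(1, i - max_delay), min(m, i + max_delay) + 1):
            prod = dx[i - 1] * dy[j - 1]
            pos[i, j] = max(0.0, pos[i - 1, j - 1] + prod)
            neg[i, j] = max(0.0, neg[i - 1, j - 1] - prod)

    best_pos, best_neg = pos.max(), neg.max()
    sign = 1.0 if best_pos >= best_neg else -1.0
    return sign * max(best_pos, best_neg) / m
```

The score only measures trend agreement. The statistical significance of such a score, which is the subject of the cited paper, would need a separate permutation or approximation step.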
Affiliation(s)
- Ang Shan
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
- Postdoctoral Programme of Zhongtai Securities Co. Ltd, Jinan, China
- Fang Zhang
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
- Yihui Luan
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
6
OUP accepted manuscript. Brief Funct Genomics 2022; 21:243-269. [DOI: 10.1093/bfgp/elac007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Revised: 03/17/2022] [Accepted: 03/18/2022] [Indexed: 11/14/2022]
7
Yuan Y, Bar-Joseph Z. Deep learning of gene relationships from single cell time-course expression data. Brief Bioinform 2021; 22:6238595. [PMID: 33876191 DOI: 10.1093/bib/bbab142] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/05/2021] [Revised: 03/22/2021] [Accepted: 03/25/2021] [Indexed: 11/12/2022]
Abstract
Time-course gene-expression data have been widely used to infer regulatory and signaling relationships between genes. Most of the widely used methods for such analysis were developed for bulk expression data. Single cell RNA-Seq (scRNA-Seq) data offer several advantages including the large number of expression profiles available and the ability to focus on individual cells rather than averages. However, the data also raise new computational challenges. Using a novel encoding for scRNA-Seq expression data, we develop deep learning methods for interaction prediction from time-course data. Our methods use a supervised framework which represents the data as 3D tensor and train convolutional and recurrent neural networks for predicting interactions. We tested our time-course deep learning (TDL) models on five different time-series scRNA-Seq datasets. As we show, TDL can accurately identify causal and regulatory gene-gene interactions and can also be used to assign new function to genes. TDL improves on prior methods for the above tasks and can be generally applied to new time-series scRNA-Seq data.
Affiliation(s)
- Ye Yuan
- Department of Automation, Shanghai Jiao Tong University, USA
- Ziv Bar-Joseph
- FORE Systems Professor of Computational Biology and Machine Learning at CMU, USA
8
9
Zhang F, Sun F, Luan Y. Statistical significance approximation for local similarity analysis of dependent time series data. BMC Bioinformatics 2019; 20:53. [PMID: 30691412 PMCID: PMC6348690 DOI: 10.1186/s12859-019-2595-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/12/2017] [Accepted: 01/03/2019] [Indexed: 11/29/2022]
Abstract
Background: Local similarity analysis (LSA) of time series data has been extensively used to investigate the dynamics of biological systems in a wide range of environments. Recently, a theoretical method was proposed to approximately calculate the statistical significance of local similarity (LS) scores. However, the method assumes that the time series data are independent identically distributed, which can be violated in many problems. Results: In this paper, we develop a novel approach to accurately approximate statistical significance of LSA for dependent time series data using nonparametric kernel estimated long-run variance. We also investigate an alternative method for LSA statistical significance approximation by computing the local similarity score of the residuals based on a predefined statistical model. We show by simulations that both methods have controllable type I errors for dependent time series, while other approaches for statistical significance can be grossly oversized. We apply both methods to human and marine microbial datasets, where most of possible significant associations are captured and false positives are efficiently controlled. Conclusions: Our methods provide fast and effective approaches for evaluating statistical significance of dependent time series data with controllable type I error. They can be applied to a variety of time series data to reveal inherent relationships among the different factors.
Affiliation(s)
- Fang Zhang
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Fengzhu Sun
- Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, 90089, CA, USA; Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, China
- Yihui Luan
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
10
Wang YXR, Liu K, Theusch E, Rotter JI, Medina MW, Waterman MS, Huang H, Stegle O. Generalized correlation measure using count statistics for gene expression data with ordered samples. Bioinformatics 2018; 34:617-624. [PMID: 29040382 DOI: 10.1093/bioinformatics/btx641] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/13/2017] [Accepted: 10/11/2017] [Indexed: 12/22/2022]
Abstract
Motivation: Capturing association patterns in gene expression levels under different conditions or time points is important for inferring gene regulatory interactions. In practice, temporal changes in gene expression may result in complex association patterns that require more sophisticated detection methods than simple correlation measures. For instance, the effect of regulation may lead to time-lagged associations and interactions local to a subset of samples. Furthermore, expression profiles of interest may not be aligned or directly comparable (e.g. gene expression profiles from two species). Results: We propose a count statistic for measuring association between pairs of gene expression profiles consisting of ordered samples (e.g. time-course), where correlation may only exist locally in subsequences separated by a position shift. The statistic is simple and fast to compute, and we illustrate its use in two applications. In a cross-species comparison of developmental gene expression levels, we show our method not only measures association of gene expressions between the two species, but also provides alignment between different developmental stages. In the second application, we applied our statistic to expression profiles from two distinct phenotypic conditions, where the samples in each profile are ordered by the associated phenotypic values. The detected associations can be useful in building correspondence between gene association networks under different phenotypes. On the theoretical side, we provide asymptotic distributions of the statistic for different regions of the parameter space and test its power on simulated data. Availability and implementation: The code used to perform the analysis is available as part of the Supplementary Material. Contact: msw@usc.edu or hhuang@stat.berkeley.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Y X Rachel Wang
- School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia
- Ke Liu
- Department of Statistics, University of California, Berkeley, CA 94720, USA
- Elizabeth Theusch
- Children's Hospital Oakland Research Institute, Oakland, CA 94609, USA
- Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Departments of Pediatrics and Medicine, LABioMed at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Marisa W Medina
- Children's Hospital Oakland Research Institute, Oakland, CA 94609, USA
- Michael S Waterman
- Molecular and Computational Biology, University of Southern California, CA 90089, USA
- Haiyan Huang
- Department of Statistics, University of California, Berkeley, CA 94720, USA
11
Zhang F, Shan A, Luan Y. A novel method to accurately calculate statistical significance of local similarity analysis for high-throughput time series. Stat Appl Genet Mol Biol 2018; 17:/j/sagmb.ahead-of-print/sagmb-2018-0019/sagmb-2018-0019.xml. [PMID: 30447151 DOI: 10.1515/sagmb-2018-0019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]
Abstract
In recent years, a large number of time series microbial community data has been produced in molecular biological studies, especially in metagenomics. Among the statistical methods for time series, local similarity analysis is used in a wide range of environments to capture potential local and time-shifted associations that cannot be distinguished by traditional correlation analysis. Initially, the permutation test is popularly applied to obtain the statistical significance of local similarity analysis. More recently, a theoretical method has also been developed to achieve this aim. However, all these methods require the assumption that the time series are independent and identically distributed. In this paper, we propose a new approach based on moving block bootstrap to approximate the statistical significance of local similarity scores for dependent time series. Simulations show that our method can control the type I error rate reasonably, while theoretical approximation and the permutation test perform less well. Finally, our method is applied to human and marine microbial community datasets, indicating that it can identify potential relationship among operational taxonomic units (OTUs) and significantly decrease the rate of false positives.
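The moving block bootstrap named in this abstract resamples a series in contiguous blocks so that short-range dependence inside each block is preserved. The sketch below shows the generic resampling step and one simple way to turn it into an approximate p-value. It is not the authors' procedure: the function names, the choice to resample only the first series, the fixed `block_length`, and the absolute Pearson correlation used as a stand-in statistic in the usage note are all assumptions made for illustration.

```python
import numpy as np


def moving_block_bootstrap(series, block_length, rng):
    """Return one resample built from randomly chosen overlapping blocks."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(series)")
    n_blocks = -(-n // block_length)  # ceiling division
    # Valid block starts are 0 .. n - block_length (upper bound is exclusive).
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    blocks = [series[s:s + block_length] for s in starts]
    return np.concatenate(blocks)[:n]


def bootstrap_p_value(x, y, statistic, block_length, n_resamples=999, seed=0):
    """Approximate p-value for |statistic(x, y)| under block resampling of x."""
    rng = np.random.default_rng(seed)
    observed = abs(statistic(x, y))
    exceed = 0
    for _ in range(n_resamples):
        x_star = moving_block_bootstrap(x, block_length, rng)
        if abs(statistic(x_star, y)) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_resamples + 1)
```

A call such as `bootstrap_p_value(x, y, lambda a, b: np.corrcoef(a, b)[0, 1], block_length=5)` would use Pearson correlation as the statistic; a local similarity score could be passed in the same way. The block length controls how much serial dependence is retained and would need to be chosen for the data at hand.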
Affiliation(s)
- Fang Zhang
- School of Mathematics, Shandong University, Jinan, 250100, P.R. China
- Ang Shan
- School of Mathematics, Shandong University, Jinan, 250100, P.R. China
- Yihui Luan
- School of Mathematics, Shandong University, Jinan, 250100, P.R. China
12
Morimoto S, Yahara K. Identification of stress responsive genes by studying specific relationships between mRNA and protein abundance. Heliyon 2018; 4:e00558. [PMID: 29560469 PMCID: PMC5857721 DOI: 10.1016/j.heliyon.2018.e00558] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/22/2017] [Revised: 02/03/2018] [Accepted: 02/23/2018] [Indexed: 11/26/2022]
Abstract
Protein expression is regulated by the production and degradation of mRNAs and proteins but the specifics of their relationship are controversial. Although technological advances have enabled genome-wide and time-series surveys of mRNA and protein abundance, recent studies have shown paradoxical results, with most statistical analyses being limited to linear correlation, or analysis of variance applied separately to mRNA and protein datasets. Here, using recently analyzed genome-wide time-series data, we have developed a statistical analysis framework for identifying which types of genes or biological gene groups have significant correlation between mRNA and protein abundance after accounting for potential time delays. Our framework stratifies all genes in terms of the extent of time delay, conducts gene clustering in each stratum, and performs a non-parametric statistical test of the correlation between mRNA and protein abundance in a gene cluster. Consequently, we revealed stronger correlations than previously reported between mRNA and protein abundance in two metabolic pathways. Moreover, we identified a pair of stress responsive genes (ADC17 and KIN1) that showed a highly similar time series of mRNA and protein abundance. Furthermore, we confirmed robustness of the analysis framework by applying it to another genome-wide time-series data and identifying a cytoskeleton-related gene cluster (keratin 18, keratin 17, and mitotic spindle positioning) that shows similar correlation. The significant correlation and highly similar changes of mRNA and protein abundance suggests a concerted role of these genes in cellular stress response, which we consider provides an answer to the question of the specific relationships between mRNA and protein in a cell. 
In addition, our framework for studying the relationship between mRNAs and proteins in a cell will provide a basis for studying specific relationships between mRNA and protein abundance after accounting for potential time delays.
Affiliation(s)
- Shimpei Morimoto
- Division of Biostatistics, Kurume University School of Medicine, Fukuoka, Japan
- Koji Yahara
- Antimicrobial Resistance Research Center, National Institute of Infectious Diseases, Tokyo, Japan
13
Kamisoglu K, Acevedo A, Almon RR, Coyle S, Corbett S, Dubois DC, Nguyen TT, Jusko WJ, Androulakis IP. Understanding Physiology in the Continuum: Integration of Information from Multiple - Omics Levels. Front Pharmacol 2017; 8:91. [PMID: 28289389 PMCID: PMC5327699 DOI: 10.3389/fphar.2017.00091] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 09/09/2016] [Accepted: 02/13/2017] [Indexed: 01/18/2023]
Abstract
In this paper, we discuss approaches for integrating biological information reflecting diverse physiologic levels. In particular, we explore statistical and model-based methods for integrating transcriptomic, proteomic and metabolomics data. Our case studies reflect responses to a systemic inflammatory stimulus and in response to an anti-inflammatory treatment. Our paper serves partly as a review of existing methods and partly as a means to demonstrate, using case studies related to human endotoxemia and response to methylprednisolone (MPL) treatment, how specific questions may require specific methods, thus emphasizing the non-uniqueness of the approaches. Finally, we explore novel ways for integrating -omics information with PKPD models, toward the development of more integrated pharmacology models.
Affiliation(s)
- Kubra Kamisoglu
- Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, University at Buffalo, Buffalo NY, USA
- Alison Acevedo
- Department of Biomedical Engineering, Rutgers University, Piscataway NJ, USA
- Richard R Almon
- Department of Biological Sciences, University at Buffalo, Buffalo NY, USA
- Susette Coyle
- Department of Surgery, Rutgers Robert Wood Johnson Medical School, New Brunswick NJ, USA
- Siobhan Corbett
- Department of Surgery, Rutgers Robert Wood Johnson Medical School, New Brunswick NJ, USA
- Debra C Dubois
- Department of Biological Sciences, University at Buffalo, Buffalo NY, USA
- Tung T Nguyen
- BioMaPS Institute for Quantitative Biology, Rutgers University, Piscataway NJ, USA
- William J Jusko
- Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, University at Buffalo, Buffalo NY, USA
- Ioannis P Androulakis
- Department of Biomedical Engineering, Rutgers University, Piscataway NJ, USA; Department of Chemical Engineering, Rutgers University, Piscataway NJ, USA
14
|
Straube J, Huang BE, Cao KAL. DynOmics to identify delays and co-expression patterns across time course experiments. Sci Rep 2017; 7:40131. [PMID: 28065937 PMCID: PMC5220332 DOI: 10.1038/srep40131] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/19/2016] [Accepted: 12/02/2016] [Indexed: 12/16/2022] Open
Abstract
Dynamic changes in biological systems can be captured by measuring molecular expression from different levels (e.g., genes and proteins) across time. Integration of such data aims to identify molecules that show similar expression changes over time; such molecules may be co-regulated and thus involved in similar biological processes. Combining data sources presents a systematic approach to study molecular behaviour. It can compensate for missing data in one source, and can reduce false positives when multiple sources highlight the same pathways. However, integrative approaches must accommodate the challenges inherent in ‘omics’ data, including high-dimensionality, noise, and timing differences in expression. As current methods for identification of co-expression cannot cope with this level of complexity, we developed a novel algorithm called DynOmics. DynOmics is based on the fast Fourier transform, from which the difference in expression initiation between trajectories can be estimated. This delay can then be used to realign the trajectories and identify those which show a high degree of correlation. Through extensive simulations, we demonstrate that DynOmics is efficient and accurate compared to existing approaches. We consider two case studies highlighting its application, identifying regulatory relationships across ‘omics’ data within an organism and for comparative gene expression analysis across organisms.
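The idea of estimating a delay between two trajectories with the fast Fourier transform and then realigning them can be illustrated with a generic sketch. This is not the DynOmics implementation; it is a plain circular cross-correlation computed through the FFT, and the function name `estimate_delay` and the assumption of equally spaced time points are illustrative only.

```python
import numpy as np


def estimate_delay(reference, query):
    """Return the circular shift (in time points) by which `query` lags `reference`.

    Hypothetical helper for illustration: both series must have the same
    length and be sampled at equally spaced time points.
    """
    ref = np.asarray(reference, dtype=float)
    qry = np.asarray(query, dtype=float)
    # Centre both series so the mean level does not dominate the correlation.
    ref = ref - ref.mean()
    qry = qry - qry.mean()
    # Cross-correlation theorem: corr[k] = sum_n ref[n] * qry[n + k] (circular).
    cross = np.fft.ifft(np.conj(np.fft.fft(ref)) * np.fft.fft(qry)).real
    lag = int(np.argmax(cross))
    n = len(ref)
    # Lags beyond half the series length are reported as negative delays.
    return lag if lag <= n // 2 else lag - n


def realign(query, delay):
    """Shift `query` back by the estimated delay (circular shift)."""
    return np.roll(np.asarray(query, dtype=float), -delay)
```

After realignment, an ordinary correlation coefficient between the reference and the shifted query can be used to judge whether the two trajectories are associated. Real time courses are short and not periodic, so a circular shift is only a rough approximation at the series boundaries.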
Affiliation(s)
- Jasmin Straube
- QFAB@QCIF Bioinformatics, Institute for Molecular Biosciences, The University of Queensland, Queensland Bioscience Precinct, St Lucia, QLD, Australia; The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
- Bevan Emma Huang
- Janssen Research & Development, LLC, Discovery Sciences, Menlo Park, USA
- Kim-Anh Lê Cao
- The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
15
Ray SS, Misra S. A supervised weighted similarity measure for gene expressions using biological knowledge. Gene 2016; 595:150-160. [PMID: 27688070 DOI: 10.1016/j.gene.2016.09.033] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/11/2016] [Revised: 08/18/2016] [Accepted: 09/22/2016] [Indexed: 11/17/2022]
Abstract
A supervised similarity measure for Saccharomyces cerevisiae gene expressions is developed which can capture gene similarity when multiple types of experimental conditions, such as cell cycle and heat shock, are available for all the genes. The measure is called Weighted Pearson correlation (WPC), where the weights are systematically determined for each type of experiment by maximizing the positive predictive value for gene pairs having Pearson correlation greater than 0.80. The positive predictive value is computed by using the annotation information available from yeast GO-Slim process annotations in Saccharomyces Genome Database (SGD). Genes are then clustered by the k-medoid algorithm using the newly computed WPC, and functions of 135 unclassified genes are predicted with a p-value cutoff of 10^-5 using Munich Information for Protein Sequences (MIPS) annotations. Out of these genes, functional categories of 55 genes are predicted with p-value cutoff greater than 10^-10 and reported in this investigation. The superiority of WPC as compared to some existing similarity measures like Pearson correlation and Euclidean distance is demonstrated using positive predictive values (PPV) of gene pairs for different Saccharomyces cerevisiae data sets. The related code is available at http://www.sampa.droppages.com/WPC.html.
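The abstract does not give the WPC formula, so the following is only one plausible reading: a weighted average of the Pearson correlations computed separately within each experiment type. The function name `weighted_pearson`, the dictionary layout and the example weights are assumptions for illustration, and the supervised step that learns the weights by maximizing positive predictive value is not shown.

```python
import numpy as np


def weighted_pearson(gene_a, gene_b, weights):
    """Weighted average of per-experiment-type Pearson correlations.

    gene_a, gene_b: dicts mapping an experiment type (e.g. "cell_cycle")
        to a 1-D sequence of expression values for that gene.
    weights: dict mapping the same experiment types to non-negative weights.
    All three arguments are hypothetical structures chosen for this sketch.
    """
    total_weight = float(sum(weights.values()))
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    score = 0.0
    for exp_type, weight in weights.items():
        # np.corrcoef returns the 2x2 correlation matrix; [0, 1] is r(a, b).
        r = np.corrcoef(gene_a[exp_type], gene_b[exp_type])[0, 1]
        score += weight * r
    return score / total_weight
```

With all weights equal this reduces to the plain mean of the per-type correlations, so the weights only matter when some experiment types are more informative about shared function than others.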
Affiliation(s)
- Shubhra Sankar Ray
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India; Center for Soft Computing Research, Indian Statistical Institute, Kolkata 700108, India.
- Sampa Misra
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India.
16
He FQ, Ollert M. Network-Guided Key Gene Discovery for a Given Cellular Process. Advances in Biochemical Engineering/Biotechnology 2016. [PMID: 27783134 DOI: 10.1007/10_2016_39] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/10/2022]
Abstract
Identification of key genes for a given physiological or pathological process is an essential but still very challenging task for the entire biomedical research community. Statistics-based approaches, such as genome-wide association study (GWAS)- or quantitative trait locus (QTL)-related analysis, have already made enormous contributions to identifying key genes associated with a given disease or phenotype, but their success depends heavily on a very large number of samples. Recent advances in network biology, especially network inference directly from genome-scale data and the follow-up network analysis, open up new avenues to predict key genes driving a given biological process or cellular function. Here we review and compare the current approaches for predicting key genes that have no chance of standing out in classic differential expression analysis, using gene-regulatory, protein-protein interaction, or gene expression correlation networks. We elaborate on these network-based approaches mainly in the context of immunology and infection, and urge more use of correlation network-based predictions. Such network-based key gene discovery approaches driven by information-enriched 'omics' data should be very useful for systematic key gene discoveries for any given biochemical process or cellular function, and also valuable for novel drug target discovery and novel diagnostic, prognostic and therapeutic-efficiency marker prediction for a specific disease or disorder.
Affiliation(s)
- Feng Q He
- Department of Infection and Immunity, Group of Immune Systems Biology, Luxembourg Institute of Health, 29, rue Henri Koch, 4354, Esch-sur-Alzette, Luxembourg.
- Markus Ollert
- Department of Infection and Immunity, Group of Allergy and Clinical Immunology, Luxembourg Institute of Health, 29, rue Henri Koch, 4354, Esch-sur-Alzette, Luxembourg
- Odense Research Center for Anaphylaxis, Department of Dermatology and Allergy Center, Odense University Hospital, University of Southern Denmark, 5000, Odense C, Denmark
17
Post-transcriptional and translational regulation modulates gene co-expression behavior in more synchronized pace to carry out molecular function in the cell. Gene 2016; 587:163-8. [PMID: 27150569 DOI: 10.1016/j.gene.2016.04.055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/03/2016] [Revised: 04/16/2016] [Accepted: 04/28/2016] [Indexed: 11/23/2022]
Abstract
MOTIVATION Biological processes involve complex interplay between cellular molecules at different molecular levels, and this interplay may exhibit various co-expression patterns that explicitly represent the cell's inner regulation mechanisms. However, co-expression patterns are not necessarily conserved across molecular levels, because complex regulatory processes remain involved even after transcripts are produced. Investigating how co-expression propagates from the transcript level to the protein level will reflect the effects of this inner regulation on the functional states of cells. RESULTS In this study, we perform a comparative analysis of gene co-expression patterns in Plasmodium falciparum. We investigate the propagation of co-expression patterns from the transcript level to the protein level to reveal the underlying biological meaning of post-transcriptional and translational mechanisms. Our systems-level approach shows that, after post-transcriptional and translational regulation, the pace of gene co-expression at the protein level is mechanistically adjusted to higher synchronicity. Moreover, co-expression patterns at the protein level are more closely linked to function categories: co-expression at the same time point is more related to binding categories, and co-expression delayed by several time points is more related to activity categories. Therefore, post-transcriptional and translational regulation modulates co-expression relationships between molecules to meet functional demands.
18
19
Xia LC, Ai D, Cram JA, Liang X, Fuhrman JA, Sun F. Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains. BMC Bioinformatics 2015; 16:301. [PMID: 26390921 PMCID: PMC4578688 DOI: 10.1186/s12859-015-0732-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 03/17/2015] [Accepted: 09/05/2015] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited its applications to high-throughput time series data analysis, e.g., data from the next generation sequencing technology based studies. RESULTS By extending the theories for the tail probability of the range of sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with delay of at most three time steps) in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations, making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large scale all-versus-all comparisons possible. We also propose a hybrid approach by integrating the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Plymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and find interesting organism co-occurrence dynamic patterns. AVAILABILITY The software tool is integrated into the eLSA software package that now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.
Affiliation(s)
- Li C Xia
- Department of Medicine, Division of Oncology, Stanford University School of Medicine, Stanford, 94305-5151, CA, USA; Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, 19104, PA, USA
- Dongmei Ai
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, 100083, China
- Jacob A Cram
- Marine and Environmental Biology, Department of Biological Sciences, University of Southern California, Los Angeles, 90089-0371, CA, USA
- Xiaoyi Liang
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, 100083, China
- Jed A Fuhrman
- Marine and Environmental Biology, Department of Biological Sciences, University of Southern California, Los Angeles, 90089-0371, CA, USA
- Fengzhu Sun
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, 90089-2910, CA, USA; Centre for Computational Systems Biology, Fudan University, Shanghai, 200433, China.
20
Gao Q, Ostendorf E, Cruz JA, Jin R, Kramer DM, Chen J. Inter-functional analysis of high-throughput phenotype data by non-parametric clustering and its application to photosynthesis. Bioinformatics 2015; 32:67-76. [PMID: 26342101 DOI: 10.1093/bioinformatics/btv515] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/12/2015] [Accepted: 08/25/2015] [Indexed: 01/20/2023] Open
Abstract
MOTIVATION Phenomics is the study of the properties and behaviors of organisms (i.e. their phenotypes) on a high-throughput scale. New computational tools are needed to analyze complex phenomics data, which consists of multiple traits/behaviors that interact with each other and are dependent on external factors, such as genotype and environmental conditions, in a way that has not been well studied. RESULTS We deployed an efficient framework for partitioning complex and high dimensional phenotype data into distinct functional groups. To achieve this, we represented measured phenotype data from each genotype as a cloud-of-points, and developed a novel non-parametric clustering algorithm to cluster all the genotypes. When compared with conventional clustering approaches, the new method is advantageous in that it makes no assumption about the parametric form of the underlying data distribution and is thus particularly suitable for phenotype data analysis. We demonstrated the utility of the new clustering technique by distinguishing novel phenotypic patterns in both synthetic data and a high-throughput plant photosynthetic phenotype dataset. We biologically verified the clustering results using four Arabidopsis chloroplast mutant lines. AVAILABILITY AND IMPLEMENTATION Software is available at www.msu.edu/~jinchen/NPM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online. CONTACT jinchen@msu.edu, kramerd8@cns.msu.edu or rongjin@cse.msu.edu.
Affiliation(s)
- Qiaozi Gao
- Department of Computer Science and Engineering
- Rong Jin
- Department of Computer Science and Engineering
- David M Kramer
- Department of Energy Plant Research Laboratory and Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Jin Chen
- Department of Computer Science and Engineering, Department of Energy Plant Research Laboratory and
21
Wu WS. A Computational Method for Identifying Yeast Cell Cycle Transcription Factors. Methods Mol Biol 2015; 1342:209-19. [PMID: 26254926 DOI: 10.1007/978-1-4939-2957-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
Abstract
The eukaryotic cell cycle is a complex process and is precisely regulated at many levels. Many genes specific to the cell cycle are regulated transcriptionally and are expressed just before they are needed. To understand the cell cycle process, it is important to identify the cell cycle transcription factors (TFs) that regulate the expression of cell cycle-regulated genes. Here, we describe a computational method to identify cell cycle TFs in yeast by integrating current ChIP-chip, mutant, transcription factor-binding site (TFBS), and cell cycle gene expression data. For each identified cell cycle TF, our method also assigned specific cell cycle phases in which the TF functions and identified the time lag for the TF to exert regulatory effects on its target genes. Moreover, our method can identify novel cell cycle-regulated genes as a by-product.
Affiliation(s)
- Wei-Sheng Wu
- Department of Electrical Engineering, National Cheng Kung University, No. 1 Daxue Road, East District, Tainan City, 701, Taiwan
22
Yu X, Gao H, Zheng X, Li C, Wang J. A computational method of predicting regulatory interactions in Arabidopsis based on gene expression data and sequence information. Comput Biol Chem 2014; 51:36-41. [DOI: 10.1016/j.compbiolchem.2014.04.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 12/18/2013] [Revised: 04/16/2014] [Accepted: 04/27/2014] [Indexed: 10/25/2022]
23
Bean C, Verma NK, Yamamoto DL, Chemello F, Cenni V, Filomena MC, Chen J, Bang ML, Lanfranchi G. Ankrd2 is a modulator of NF-κB-mediated inflammatory responses during muscle differentiation. Cell Death Dis 2014; 5:e1002. [PMID: 24434510 PMCID: PMC4040671 DOI: 10.1038/cddis.2013.525] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 09/03/2013] [Revised: 11/23/2013] [Accepted: 11/25/2013] [Indexed: 12/29/2022]
Abstract
Adaptive responses of skeletal muscle regulate the nuclear shuttling of the sarcomeric protein Ankrd2 that can transduce different stimuli into specific adaptations by interacting with both structural and regulatory proteins. In a genome-wide expression study on Ankrd2-knockout or -overexpressing primary proliferating or differentiating myoblasts, we found an inverse correlation between Ankrd2 levels and the expression of proinflammatory genes and identified Ankrd2 as a potent repressor of inflammatory responses through direct interaction with the NF-κB repressor subunit p50. In particular, we identified Gsk3β as a novel direct target of the p50/Ankrd2 repressosome dimer and found that the recruitment of p50 by Ankrd2 is dependent on Akt2-mediated phosphorylation of Ankrd2 upon oxidative stress during myogenic differentiation. Surprisingly, the absence of Ankrd2 in slow muscle negatively affected the expression of cytokines and key calcineurin-dependent genes associated with the slow-twitch muscle program. Thus, our findings support a model in which alterations in Ankrd2 protein and phosphorylation levels modulate the balance between physiological and pathological inflammatory responses in muscle.
Affiliation(s)
- C Bean
- Department of Biology, Innovative Biotechnologies Interdepartmental Research Center, University of Padova, Padova, Italy
- N K Verma
- Department of Biology, Innovative Biotechnologies Interdepartmental Research Center, University of Padova, Padova, Italy
- D L Yamamoto
- Institute of Biomedical Technologies, National Research Council, Milan, Italy
- F Chemello
- Department of Biology, Innovative Biotechnologies Interdepartmental Research Center, University of Padova, Padova, Italy
- V Cenni
- Institute of Molecular Genetics, National Research Council, Bologna Unit/IOR, Bologna, Italy
- M C Filomena
- [1] Department of Translational Medicine, University of Milan, Milan, Italy [2] Humanitas Clinical and Research Center, Rozzano, Milan, Italy
- J Chen
- University of California, San Diego School of Medicine, La Jolla, CA, USA
- M L Bang
- [1] Humanitas Clinical and Research Center, Rozzano, Milan, Italy [2] Milan Unit, Institute of Genetic and Biomedical Research, National Research Council, Milan, Italy
- G Lanfranchi
- Department of Biology, Innovative Biotechnologies Interdepartmental Research Center, University of Padova, Padova, Italy
24
Abstract
As more and more systems biology approaches are used to investigate the different types of biological macromolecules, increasing numbers of whole genomic studies are now available for a large array of organisms. Whether it is genomics, transcriptomics, proteomics, interactomics or metabolomics, the full complement of genomic information on all different levels can be juxtaposed between different organisms to reveal similarities or differences, and even to provide consensus models. At the intersection of comparative genomics and systems biology lies great possibility for discovery, analysis and prediction. This paper explores this nexus and the relationship from four general levels: DNA, RNA, protein and extragenomic. For each level, we provide an overview of the methods, discuss the potential challenges and survey the current research. Finally, we suggest some organizing principles and make proposals for new areas that will be important for future research.
Affiliation(s)
- Jimmy Lin
- Wilmer Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
25
Maetschke SR, Madhamshettiwar PB, Davis MJ, Ragan MA. Supervised, semi-supervised and unsupervised inference of gene regulatory networks. Brief Bioinform 2013; 15:195-211. [PMID: 23698722 PMCID: PMC3956069 DOI: 10.1093/bib/bbt034] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.0] [Indexed: 11/22/2022] Open
Abstract
Inference of gene regulatory networks from expression data is a challenging task. Many methods have been developed for this purpose, but a comprehensive evaluation that covers unsupervised, semi-supervised and supervised methods, and provides guidelines for their practical application, is lacking. We performed an extensive evaluation of inference methods on simulated and experimental expression data. The results reveal low prediction accuracies for unsupervised techniques with the notable exception of the Z-SCORE method on knockout data. In all other cases, the supervised approach achieved the highest accuracies and, even in a semi-supervised setting with small numbers of only positive samples, outperformed the unsupervised techniques.
Affiliation(s)
- Stefan R Maetschke
- Institute for Molecular Bioscience and ARC Centre of Excellence in Bioinformatics, Brisbane, QLD 4072, Australia, Tel.: 61 7 3346 2616; Fax: 61 7 3346 2101
26
Das J, Vo TV, Wei X, Mellor JC, Tong V, Degatano AG, Wang X, Wang L, Cordero NA, Kruer-Zerhusen N, Matsuyama A, Pleiss JA, Lipkin SM, Yoshida M, Roth FP, Yu H. Cross-species protein interactome mapping reveals species-specific wiring of stress response pathways. Sci Signal 2013; 6:ra38. [PMID: 23695164 DOI: 10.1126/scisignal.2003350] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 12/13/2022]
Abstract
The fission yeast Schizosaccharomyces pombe has more metazoan-like features than the budding yeast Saccharomyces cerevisiae, yet it has similarly facile genetics. We present a large-scale verified binary protein-protein interactome network, "StressNet," based on high-throughput yeast two-hybrid screens of interacting proteins classified as part of stress response and signal transduction pathways in S. pombe. We performed systematic, cross-species interactome mapping using StressNet and a protein interactome network of orthologous proteins in S. cerevisiae. With cross-species comparative network studies, we detected a previously unidentified component (Snr1) of the S. pombe mitogen-activated protein kinase Sty1 pathway. Coimmunoprecipitation experiments showed that Snr1 interacted with Sty1 and that deletion of snr1 increased the sensitivity of S. pombe cells to stress. Comparison of StressNet with the interactome network of orthologous proteins in S. cerevisiae showed that most of the interactions among these stress response and signaling proteins are not conserved between species but are "rewired"; orthologous proteins have different binding partners in both species. In particular, transient interactions connecting proteins in different functional modules were more likely to be rewired than conserved. By directly testing interactions between proteins in one yeast species and their corresponding binding partners in the other yeast species with yeast two-hybrid assays, we found that about half of the interactions that are traditionally considered "conserved" form modified interaction interfaces that may potentially accommodate novel functions.
Affiliation(s)
- Jishnu Das
- Department of Biological Statistics and Computational Biology Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology Cornell University, Ithaca, NY 14853, USA
- Tommy V Vo
- Weill Institute for Cell and Molecular Biology Cornell University, Ithaca, NY 14853, USA; Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Xiaomu Wei
- Weill Institute for Cell and Molecular Biology Cornell University, Ithaca, NY 14853, USA; Department of Medicine, Weill Cornell College of Medicine, New York, NY 10021, USA
- Joseph C Mellor
- Donnelly Centre, University of Toronto, Toronto, ON M5S-3E1, Canada
- Virginia Tong
- Weill Institute for Cell and Molecular Biology Cornell University, Ithaca, NY 14853, USA
- Andrew G Degatano
- Weill Institute for Cell and Molecular Biology Cornell University, Ithaca, NY 14853, USA
- Xiujuan Wang
- Department of Biological Statistics and Computational Biology Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology Cornell University, Ithaca, NY 14853, USA
- Lihua Wang
- Weill Institute for Cell and Molecular Biology Cornell University, Ithaca, NY 14853, USA
- Nicolas A Cordero
- Weill Institute for Cell and Molecular Biology Cornell University, Ithaca, NY 14853, USA
- Nathan Kruer-Zerhusen
- Weill Institute for Cell and Molecular Biology Cornell University, Ithaca, NY 14853, USA; Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Akihisa Matsuyama
- Chemical Genetics Laboratory, RIKEN Advanced Science Institute, Wako, Saitama 351-0198, Japan; CREST Research Project, JST, Kawaguchi, Saitama 332-0012, Japan
- Jeffrey A Pleiss
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Steven M Lipkin
- Department of Medicine, Weill Cornell College of Medicine, New York, NY 10021, USA
- Minoru Yoshida
- Chemical Genetics Laboratory, RIKEN Advanced Science Institute, Wako, Saitama 351-0198, Japan; CREST Research Project, JST, Kawaguchi, Saitama 332-0012, Japan; Department of Biotechnology, Graduate School of Agriculture and Life Sciences, University of Tokyo, Bunkyo-ku, Tokyo 113-8657, Japan
- Frederick P Roth
- Donnelly Centre, University of Toronto, Toronto, ON M5S-3E1, Canada; Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON M5S-3E1, Canada; Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02115; Harvard Medical School, Boston, MA 02115; Samuel Lunenfeld Research Institute, Mt. Sinai Hospital, Toronto, ON M5G-1X5, Canada; Genetic Networks Program, Canadian Institute for Advanced Research, Toronto, ON M5G-1Z8, Canada
- Haiyuan Yu
- Department of Biological Statistics and Computational Biology Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology Cornell University, Ithaca, NY 14853, USA
27
Lehmann R, Machné R, Georg J, Benary M, Axmann I, Steuer R. How cyanobacteria pose new problems to old methods: challenges in microarray time series analysis. BMC Bioinformatics 2013; 14:133. [PMID: 23601192 PMCID: PMC3679775 DOI: 10.1186/1471-2105-14-133] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 06/13/2012] [Accepted: 03/18/2013] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND The transcriptomes of several cyanobacterial strains have been shown to exhibit diurnal oscillation patterns, reflecting the diurnal phototrophic lifestyle of the organisms. The analysis of such genome-wide transcriptional oscillations is often facilitated by the use of clustering algorithms in conjunction with a number of pre-processing steps. Biological interpretation is usually focussed on the time and phase of expression of the resulting groups of genes. However, the use of microarray technology in such studies requires the normalization of pre-processing data, with unclear impact on the qualitative and quantitative features of the derived information on the number of oscillating transcripts and their respective phases. RESULTS A microarray based evaluation of diurnal expression in the cyanobacterium Synechocystis sp. PCC 6803 is presented. As expected, the temporal expression patterns reveal strong oscillations in transcript abundance. We compare the Fourier transformation-based expression phase before and after the application of quantile normalization, median polishing, cyclical LOESS, and least oscillating set (LOS) normalization. Whereas LOS normalization mostly preserves the phases of the raw data, the remaining methods introduce systematic biases. In particular, quantile-normalization is found to introduce a phase-shift of 180°, effectively changing night-expressed genes into day-expressed ones. Comparison of a large number of clustering results of differently normalized data shows that the normalization method determines the result. Subsequent steps, such as the choice of data transformation, similarity measure, and clustering algorithm, only play minor roles. We find that the standardization and the DFT transformation are favorable for the clustering of time series in contrast to the 12 m transformation. We use the cluster-wise functional enrichment of a clustering derived by LOS normalization, clustering using flowClust, and DFT transformation to derive the diurnal biological program of Synechocystis sp. CONCLUSION Application of quantile normalization, median polishing, and also cyclic LOESS normalization of the presented cyanobacterial dataset lead to increased numbers of oscillating genes and the systematic shift of the expression phase. The LOS normalization minimizes the observed detrimental effects. As previous analyses employed a variety of different normalization methods, a direct comparison of results must be treated with caution.
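Quantile normalization, one of the methods compared above, forces every array to share the same empirical distribution, which helps explain how it can distort genuine genome-wide oscillations. The sketch below is the textbook procedure, not the authors' pipeline; the function name `quantile_normalize` and the genes-by-arrays layout are assumptions, and tied values are not averaged as some implementations do.

```python
import numpy as np


def quantile_normalize(matrix):
    """Quantile-normalize the columns of a genes x arrays matrix.

    Each column is replaced by the mean of the sorted columns, assigned
    back according to the original ranks within that column.
    Ties are broken by position rather than averaged (a simplification).
    """
    data = np.asarray(matrix, dtype=float)
    # Rank order of every value within its own column (stable for ties).
    order = np.argsort(data, axis=0, kind="stable")
    # Reference distribution: mean across arrays at each rank.
    mean_quantiles = np.sort(data, axis=0).mean(axis=1)
    normalized = np.empty_like(data)
    for col in range(data.shape[1]):
        normalized[order[:, col], col] = mean_quantiles
    return normalized
```

Because every column ends up with identical sorted values, any time point at which most transcripts rise or fall together is flattened, which is the kind of systematic bias the abstract reports for strongly oscillating transcriptomes.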
Affiliation(s)
- Robert Lehmann
- Institute for Theoretical Biology, Humboldt University Berlin, Invalidenstraße 43, D-10115 Berlin, Germany.
28
He F, Chen H, Probst-Kepper M, Geffers R, Eifes S, Del Sol A, Schughart K, Zeng AP, Balling R. PLAU inferred from a correlation network is critical for suppressor function of regulatory T cells. Mol Syst Biol 2013; 8:624. [PMID: 23169000 PMCID: PMC3531908 DOI: 10.1038/msb.2012.56] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 07/19/2012] [Accepted: 10/05/2012] [Indexed: 02/07/2023] Open
Abstract
Human FOXP3(+)CD25(+)CD4(+) regulatory T cells (Tregs) are essential to the maintenance of immune homeostasis. Several genes are known to be important for murine Tregs, but for human Tregs the genes and underlying molecular networks controlling the suppressor function still largely remain unclear. Here, we describe a strategy to identify the key genes directly from an undirected correlation network which we reconstruct from a very high time-resolution (HTR) transcriptome during the activation of human Tregs/CD4(+) T-effector cells. We show that a predicted top-ranked new key gene PLAU (the plasminogen activator urokinase) is important for the suppressor function of both human and murine Tregs. Further analysis unveils that PLAU is particularly important for memory Tregs and that PLAU mediates Treg suppressor function via STAT5 and ERK signaling pathways. Our study demonstrates the potential for identifying novel key genes for complex dynamic biological processes using a network strategy based on HTR data, and reveals a critical role for PLAU in Treg suppressor function.
Affiliation(s)
- Feng He
- Department of Infection Genetics, Helmholtz Centre for Infection Research (HZI), University of Veterinary Medicine Hannover, Braunschweig, Germany
|
29
|
Xia LC, Ai D, Cram J, Fuhrman JA, Sun F. Efficient statistical significance approximation for local similarity analysis of high-throughput time series data. Bioinformatics 2012. [PMID: 23178636 DOI: 10.1093/bioinformatics/bts668] [Citation(s) in RCA: 88] [Impact Index Per Article: 7.3] [Indexed: 11/13/2022]
Abstract
MOTIVATION Local similarity analysis of biological time series data helps elucidate the varying dynamics of biological systems. However, its applications to large scale high-throughput data are limited by slow permutation procedures for statistical significance evaluation. RESULTS We developed a theoretical approach to approximate the statistical significance of local similarity analysis based on the approximate tail distribution of the maximum partial sum of independent identically distributed (i.i.d.) random variables. Simulations show that the derived formula approximates the tail distribution reasonably well (starting at time points > 10 with no delay and > 20 with delay) and provides P-values comparable with those from permutations. The new approach enables efficient calculation of statistical significance for pairwise local similarity analysis, making possible all-to-all local association studies otherwise prohibitive. As a demonstration, local similarity analysis of human microbiome time series shows that core operational taxonomic units (OTUs) are highly synergetic and some of the associations are body-site specific across samples. AVAILABILITY The new approach is implemented in our eLSA package, which now provides pipelines for faster local similarity analysis of time series data. The tool is freely available from eLSA's website: http://meta.usc.edu/softs/lsa. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online. CONTACT fsun@usc.edu.
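The statistic whose significance this abstract approximates is a maximum partial sum over aligned products of two series. The sketch below is a simplified dynamic-programming version of such a local similarity score, written under stated assumptions: NumPy, equally long series, z-score standardization, and a hypothetical function name `local_similarity`. It is not the eLSA implementation and does not reproduce the paper's P-value formula.

```python
import numpy as np


def local_similarity(x, y, max_delay=0):
    """Signed local similarity score between two equally long series.

    The score is the largest partial sum of products x[i] * y[j] along any
    diagonal with |i - j| <= max_delay, divided by the series length.
    A negative sign marks an anti-correlated local match.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = (x - x.mean()) / x.std()  # assumes neither series is constant
    y = (y - y.mean()) / y.std()
    n = len(x)
    pos = np.zeros((n + 1, n + 1))  # best positive partial sum ending at (i, j)
    neg = np.zeros((n + 1, n + 1))  # best negative partial sum ending at (i, j)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if abs(i - j) > max_delay:
                continue
            prod = x[i - 1] * y[j - 1]
            pos[i, j] = max(0.0, pos[i - 1, j - 1] + prod)
            neg[i, j] = max(0.0, neg[i - 1, j - 1] - prod)
    best_pos, best_neg = pos.max(), neg.max()
    score = best_pos if best_pos >= best_neg else -best_neg
    return float(score / n)
```

A permutation test would recompute this score on shuffled copies of one series many times, which is the cost the theoretical approximation described above is meant to avoid.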
Affiliation(s)
- Li C Xia
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089-2910, USA
|
30
|
An L, Doerge RW. Dynamic clustering of gene expression. ISRN Bioinformatics 2012; 2012:537217. [PMID: 25969750 PMCID: PMC4393063 DOI: 10.5402/2012/537217] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/11/2012] [Accepted: 08/05/2012] [Indexed: 11/23/2022]
Abstract
It is well accepted that genes are simultaneously involved in multiple biological processes and that genes are coordinated over the duration of such events. Unfortunately, clustering methodologies that group genes for the purpose of novel gene discovery fail to acknowledge the dynamic nature of biological processes and provide static clusters, even when the expression of genes is assessed across time or developmental stages. By taking advantage of techniques and theories from time frequency analysis, periodic gene expression profiles are dynamically clustered based on the assumption that different spectral frequencies characterize different biological processes. A two-step cluster validation approach is proposed both to statistically estimate the optimal number of clusters and to distinguish significant clusters from noise. The resulting clusters reveal coordinated coexpressed genes. This novel dynamic clustering approach has broad applicability to a vast range of sequential data scenarios where the order of the series is of interest.
Affiliation(s)
- Lingling An
- Department of Agricultural and Biosystems Engineering, University of Arizona, Tucson, AZ 85721, USA
- R. W. Doerge
- Department of Statistics, Purdue University, West Lafayette, IN 47907, USA
|
31
|
Diversity in genetic in vivo methods for protein-protein interaction studies: from the yeast two-hybrid system to the mammalian split-luciferase system. Microbiol Mol Biol Rev 2012; 76:331-82. [PMID: 22688816 DOI: 10.1128/mmbr.05021-11] [Citation(s) in RCA: 135] [Impact Index Per Article: 11.3] [Indexed: 12/14/2022] Open
Abstract
The yeast two-hybrid system pioneered the field of in vivo protein-protein interaction methods and undisputedly gave rise to a palette of ingenious techniques that are constantly pushing further the limits of the original method. Sensitivity and selectivity have improved because of various technical tricks and experimental designs. Here we present an exhaustive overview of the genetic approaches available to study in vivo binary protein interactions, based on two-hybrid and protein fragment complementation assays. These methods have been engineered and employed successfully in microorganisms such as Saccharomyces cerevisiae and Escherichia coli, but also in higher eukaryotes. From single binary pairwise interactions to whole-genome interactome mapping, the self-reassembly concept has been employed widely. Innovative studies report the use of proteins such as ubiquitin, dihydrofolate reductase, and adenylate cyclase as reconstituted reporters. Protein fragment complementation assays have extended the possibilities in protein-protein interaction studies, with technologies that enable spatial and temporal analyses of protein complexes. In addition, one-hybrid and three-hybrid systems have broadened the types of interactions that can be studied and the findings that can be obtained. Applications of these technologies are discussed, together with the advantages and limitations of the available assays.
|
32
|
Ma C, Wang X. Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiology 2012; 160:192-203. [PMID: 22797655 PMCID: PMC3440197 DOI: 10.1104/pp.112.201962] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 06/15/2012] [Accepted: 07/09/2012] [Indexed: 05/18/2023]
Abstract
One of the computational challenges in plant systems biology is to accurately infer transcriptional regulation relationships based on correlation analyses of gene expression patterns. Despite several correlation methods that are applied in biology to analyze microarray data, concerns regarding the compatibility of these methods with the gene expression data profiled by high-throughput RNA transcriptome sequencing (RNA-Seq) technology have been raised. These concerns are mainly due to the fact that the distribution of read counts in RNA-Seq experiments is different from that of fluorescence intensities in microarray experiments. Therefore, a comprehensive evaluation of the existing correlation methods and, if necessary, introduction of novel methods into biology is appropriate. In this study, we compared four existing correlation methods used in microarray analysis and one novel method called the Gini correlation coefficient on previously published microarray-based and sequencing-based gene expression data in Arabidopsis (Arabidopsis thaliana) and maize (Zea mays). The comparisons were performed on more than 11,000 regulatory relationships in Arabidopsis, including 8,929 pairs of transcription factors and target genes. Our analyses pinpointed the strengths and weaknesses of each method and indicated that the Gini correlation can compensate for the shortcomings of the Pearson correlation, the Spearman correlation, the Kendall correlation, and the Tukey's biweight correlation. The Gini correlation method, with the other four evaluated methods in this study, was implemented as an R package named rsgcc that can be utilized as an alternative option for biologists to perform clustering analyses of gene expression patterns or transcriptional network analyses.
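For readers unfamiliar with the Gini correlation named in this abstract, the sketch below uses a standard rank-based estimator: the values of one variable are weighted by the ranks of the other, then scaled by the same sum weighted by the variable's own ranks. It assumes NumPy and no ties; the function name `gini_correlation` is hypothetical, and the code has not been checked against the rsgcc package.

```python
import numpy as np


def _ranks(values):
    """1-based ranks; ties are broken arbitrarily in this sketch."""
    return np.argsort(np.argsort(values)) + 1


def gini_correlation(x, y):
    """Gini correlation of x with respect to the rank order of y.

    The measure is asymmetric: gini_correlation(x, y) and
    gini_correlation(y, x) generally differ. Undefined if x is constant.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    weights_y = 2 * _ranks(y) - n - 1  # centred rank weights taken from y
    weights_x = 2 * _ranks(x) - n - 1  # centred rank weights taken from x
    return float(np.dot(weights_y, x) / np.dot(weights_x, x))
```

Because only the ranks of the second variable enter the numerator, any monotonically increasing transformation of `y` leaves the value unchanged, while the actual magnitudes of `x` still contribute.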
|
33
|
Bar-Joseph Z, Gitter A, Simon I. Studying and modelling dynamic biological processes using time-series gene expression data. Nat Rev Genet 2012; 13:552-64. [PMID: 22805708 DOI: 10.1038/nrg3244] [Citation(s) in RCA: 291] [Impact Index Per Article: 24.3] [Indexed: 12/19/2022]
Abstract
Biological processes are often dynamic, thus researchers must monitor their activity at multiple time points. The most abundant source of information regarding such dynamic activity is time-series gene expression data. These data are used to identify the complete set of activated genes in a biological process, to infer their rates of change, their order and their causal effects and to model dynamic systems in the cell. In this Review we discuss the basic patterns that have been observed in time-series experiments, how these patterns are combined to form expression programs, and the computational analysis, visualization and integration of these data to infer models of dynamic biological systems.
Affiliation(s)
- Ziv Bar-Joseph
- Lane Center for Computational Biology and Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
|
34
|
Das J, Mohammed J, Yu H. Genome-scale analysis of interaction dynamics reveals organization of biological networks. Bioinformatics 2012; 28:1873-8. [PMID: 22576179 DOI: 10.1093/bioinformatics/bts283] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Indexed: 12/27/2022]
Abstract
Analyzing large-scale interaction networks has generated numerous insights in systems biology. However, such studies have primarily been focused on highly co-expressed, stable interactions. Most transient interactions that carry out equally important functions, especially in signal transduction pathways, are yet to be elucidated and are often wrongly discarded as false positives. Here, we revisit a previously described Smith-Waterman-like dynamic programming algorithm and use it to distinguish stable and transient interactions on a genomic scale in human and yeast. We find that in biological networks, transient interactions are key links topologically connecting tightly regulated functional modules formed by stable interactions and are essential to maintaining the integrity of cellular networks. We also perform a systematic analysis of interaction dynamics across different technologies and find that high-throughput yeast two-hybrid is the only available technology for detecting transient interactions on a large scale.
Affiliation(s)
- Jishnu Das
- Department of Biological Statistics and Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
|
35
|
Freeman K, Staehle MM, Gümüş ZH, Vadigepalli R, Gonye GE, Nichols CN, Ogunnaike BA, Hoek JB, Schwaber JS. Rapid temporal changes in the expression of a set of neuromodulatory genes during alcohol withdrawal in the dorsal vagal complex: molecular evidence of homeostatic disturbance. Alcohol Clin Exp Res 2012; 36:1688-700. [PMID: 22486438 DOI: 10.1111/j.1530-0277.2012.01791.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 09/06/2011] [Accepted: 01/31/2012] [Indexed: 11/28/2022]
Abstract
BACKGROUND Chronic alcohol exposure produces neuroadaptation, which increases the risk of cellular excitotoxicity and autonomic dysfunction during withdrawal. The temporal progression and regulation of the gene expression that contributes to this physiologic and behavioral phenotype is poorly understood early in the withdrawal period. Further, it is unexplored in the dorsal vagal complex (DVC), a brainstem autonomic regulatory structure. METHODS We use a quantitative polymerase chain reaction platform to precisely and simultaneously measure the expression of 145 neuromodulatory genes in more than 100 rat DVC samples from control, chronically alcohol-exposed, and withdrawn rats. To gain insight into the dynamic progression and regulation of withdrawal, we focus on the expression of a subset of functionally relevant genes during the first 48 hours, when behavioral symptoms are most severe. RESULTS In the DVC, expression of this gene subset is essentially normal in chronically alcohol-exposed rats. However, withdrawal results in rapid, large-magnitude expression changes in this group. We observed differential regulation in 86 of the 145 genes measured (59%), some as early as 4 hours into withdrawal. Time series measurements (4, 8, 18, 32, and 48 hours after alcohol removal) revealed dynamic expression responses in immediate early genes, γ-aminobutyric acid type A, ionotropic glutamate, and G-protein coupled receptors and the Ras/Raf signaling pathway. Together, these changes elucidate a complex, temporally coordinated response that involves correlated expression of many functionally related groups. In particular, the expression patterns of Gabra1, Grin2a, Grin3a, and Grik3 were tightly correlated. 
These receptor subunits share overrepresented transcription factor binding sites for Pax-8 and other transcription factors, suggesting a common regulatory mechanism and a role for these transcription factors in the regulation of neurotransmission within the first 48 hours of alcohol withdrawal. CONCLUSIONS Expression in this gene set is essentially normal in the alcohol-adapted DVC, but withdrawal results in immediate, large-magnitude, and dynamic changes. These data both support increased research focus on the biological ramifications of alcohol withdrawal and enable novel insights into the dynamic withdrawal expression response in this understudied homeostatic control center.
Affiliation(s)
- Kate Freeman
- Department of Pathology, Anatomy and Cell Biology, Daniel Baugh Institute for Functional Genomics and Computational Biology, Thomas Jefferson University Philadelphia, Philadelphia, PA 19107, USA
|
36
|
Yang Q, Mattick JSA, Orman MA, Nguyen TT, Ierapetritou MG, Berthiaume F, Androulakis IP. Dynamics of hepatic gene expression profile in a rat cecal ligation and puncture model. J Surg Res 2011; 176:583-600. [PMID: 22381171 DOI: 10.1016/j.jss.2011.11.1031] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 10/11/2011] [Revised: 10/10/2011] [Accepted: 11/23/2011] [Indexed: 12/24/2022]
Abstract
BACKGROUND Sepsis remains a major clinical challenge in intensive care units. The difficulty in developing new and more effective treatments for sepsis exemplifies our incomplete understanding of its underlying pathophysiology. One of the more widely used rodent models for studying polymicrobial sepsis is cecal ligation and puncture (CLP). While a number of CLP studies investigated the ensuing systemic inflammatory response, they usually focus on a single time point post-CLP and therefore fail to describe the dynamics of the response. Furthermore, previous studies mostly use surgery without infection (herein referred to as sham CLP, SCLP) as a control for the CLP model; however, SCLP represents an aseptic injurious event that also stimulates a systemic inflammatory response. Thus, there is a need to better understand the dynamics and expression patterns of both injury- and sepsis-induced gene expression alterations to identify potential regulatory targets. In this direction, we characterized the response of the liver within the first 24 h in a rat model of SCLP and CLP using a time series of microarray gene expression data. METHODS Rats were randomly divided into three groups: sham, SCLP, and CLP. Rats in the CLP group were subjected to laparotomy, cecal ligation, and puncture, while those in the SCLP group were subjected to similar procedures without cecal ligation and puncture. Animals were saline resuscitated and sacrificed at defined time points (0, 2, 4, 8, 16, and 24 h). Liver tissues were explanted and analyzed for their gene expression profiles using microarray technology. Unoperated animals (Sham) served as negative controls. After identifying differentially expressed probesets between sham and SCLP or CLP conditions over time, the concatenated data sets corresponding to these differentially expressed probesets in sham and SCLP or CLP groups were combined and analyzed using a "consensus clustering" approach.
Promoters of genes that share common characteristics were extracted and compared with gene batteries comprised of co-expressed genes to identify putatative transcription factors, which could be responsible for the co-regulation of those genes. RESULTS The SCLP/CLP genes whose expression patterns significantly changed compared with sham over time were identified, clustered, and finally analyzed for pathway enrichment. Our results indicate that both CLP and SCLP triggered the activation of a proinflammatory response, enhanced synthesis of acute-phase proteins, increased metabolism, and tissue damage markers. Genes triggered by CLP, which can be directly linked to bacteria removal functions, were absent in SCLP injury. In addition, genes relevant to oxidative stress induced damage were unique to CLP injury, which may be due to the increased severity of CLP injury versus SCLP injury. Pathway enrichment identified pathways with similar functionality but different dynamics in the two injury models, indicating that the functions controlled by those pathways are under the influence of different transcription factors and regulatory mechanisms. Putatively identified transcription factors, notably including cAMP response element-binding (CREB), nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB), and signal transducer and activator of transcription (STAT), were obtained through analysis of the promoter regions in the SCLP/CLP genes. Our results show that while transcription factors such as NF-κB, homeodomain transcription factor (HOMF), and GATA transcription factor (GATA) were common in both injuries for the IL-6 signaling pathway, there were many other transcription factors associated with that pathway which were unique to CLP, including forkhead (FKHD), hairy/enhancer of split family (HESF), and interferon regulatory factor family (IRFF). 
There were 17 transcription factors that were identified as important in at least two pathways in the CLP injury, but only seven transcription factors with that property in the SCLP injury. This also supports the hypothesis of unique regulatory modules that govern the pathways present in both the CLP and SCLP response. CONCLUSIONS By using microarrays to assess multiple genes in a high throughput manner, we demonstrate that an inflammatory response involving different dynamics and different genes is triggered by SCLP and CLP. From our analysis of the CLP data, the key characteristics of sepsis are a proinflammatory response, which drives hypermetabolism, immune cell activation, and damage from oxidative stress. This contrasts with SCLP, which triggers a modified inflammatory response leading to no immune cell activation, decreased detoxification potential, and hypermetabolism. Many of the identified transcription factors that drive the CLP-induced response are not found in the SCLP group, suggesting that SCLP and CLP induce different types of inflammatory responses via different regulatory pathways.
Affiliation(s)
- Qian Yang
- Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, New Jersey 08854, USA
|
37
|
Hallinan JS, James K, Wipat A. Network approaches to the functional analysis of microbial proteins. Adv Microb Physiol 2011; 59:101-33. [PMID: 22114841 DOI: 10.1016/b978-0-12-387661-4.00005-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022]
Abstract
Large amounts of detailed biological data have been generated over the past few decades. Many of these data are freely available in over 1000 online databases: an enticing but frustrating resource for microbiologists interested in a systems-level view of the structure and function of microbial cells. The frustration engendered by the need to trawl manually through hundreds of databases in order to accumulate information about a gene, protein, pathway, or organism of interest can be alleviated by the use of computational data integration to generate network views of the system of interest. Biological networks can be constructed from a single type of data, such as protein-protein binding information, or from data generated by multiple experimental approaches. In an integrated network, nodes usually represent genes or gene products, while edges represent some form of interaction between the nodes. Edges between nodes may be weighted to represent the probability that the edge exists in vivo. Networks may also be enriched with ontological annotations, facilitating both visual browsing and computational analysis via web service interfaces. In this review, we describe the construction and analysis of both single-data-source and integrated networks, and their application to the inference of protein function in microbes.
Affiliation(s)
- J S Hallinan
- School of Computing Science, Newcastle University, Newcastle, UK
|
38
|
Yang Q, Orman MA, Berthiaume F, Ierapetritou MG, Androulakis IP. Dynamics of short-term gene expression profiling in liver following thermal injury. J Surg Res 2011; 176:549-58. [PMID: 22099593 DOI: 10.1016/j.jss.2011.09.052] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 06/24/2011] [Revised: 09/23/2011] [Accepted: 09/27/2011] [Indexed: 02/01/2023]
Abstract
BACKGROUND Severe trauma, including burns, triggers a systemic response that significantly impacts on the liver, which plays a key role in the metabolic and immune responses aimed at restoring homeostasis. While many of these changes are likely regulated at the gene expression level, there is a need to better understand the dynamics and expression patterns of burn injury-induced genes in order to identify potential regulatory targets in the liver. Herein we characterized the response within the first 24 h in a standard animal model of burn injury using a time series of microarray gene expression data. METHODS Rats were subjected to a full thickness dorsal scald burn injury covering 20% of their total body surface area while under general anesthesia. Animals were saline resuscitated and sacrificed at defined time points (0, 2, 4, 8, 16, and 24 h). Liver tissues were explanted and analyzed for their gene expression profiles using microarray technology. Sham controls consisted of animals handled similarly but not burned. After identifying differentially expressed probe sets between sham and burn conditions over time, the concatenated data sets corresponding to these differentially expressed probe sets in burn and sham groups were combined and analyzed using a "consensus clustering" approach. RESULTS The clustering method of expression data identified 621 burn-responsive probe sets in four different co-expressed clusters. Functional characterization revealed that these four clusters are mainly associated with pro-inflammatory response, anti-inflammatory response, lipid biosynthesis, and insulin-regulated metabolism. Cluster 1 pro-inflammatory response is rapidly up-regulated (within the first 2 h) following burn injury, while Cluster 2 anti-inflammatory response is activated later on (around 8 h post-burn). 
Cluster 3 lipid biosynthesis is down-regulated rapidly following burn, possibly indicating a shift in the utilization of energy sources to produce acute phase proteins, which serve the anti-inflammatory response. Cluster 4 insulin-regulated metabolism was down-regulated late in the observation window (around 16 h post-burn), which suggests a potential mechanism to explain the onset of hypermetabolism, a delayed but well-known response that is characteristic of severe burns and trauma with potential adverse outcome. CONCLUSIONS Simultaneous analysis and comparison of gene expression profiles for both burn and sham control groups provided a more accurate estimation of the activation time, expression patterns, and characteristics of a certain burn-induced response based on which the cause-effect relationships among responses were revealed.
Affiliation(s)
- Qian Yang
- Chemical and Biochemical Engineering Department, Rutgers, the State University of New Jersey, Piscataway, New Jersey 08854, USA
|
39
|
Minas C, Waddell SJ, Montana G. Distance-based differential analysis of gene curves. Bioinformatics 2011; 27:3135-41. [PMID: 21984759 DOI: 10.1093/bioinformatics/btr528] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Time course gene expression experiments are performed to study time-varying changes in mRNA levels of thousands of genes. Statistical methods from functional data analysis (FDA) have recently gained popularity for modelling and exploring such time courses. Each temporal profile is treated as the realization of a smooth function of time, or curve, and the inferred curve becomes the basic unit of statistical analysis. The task of identifying genes with differential temporal profiles then consists of detecting statistically significant differences between curves, where such differences are commonly quantified by computing the area between the curves or the l₂ distance. RESULTS We propose a general test statistic for detecting differences between gene curves, which only depends on a suitably chosen distance measure between them. The test makes use of a distance-based variance decomposition and generalizes traditional MANOVA tests commonly used for vectorial observations. We also introduce the visual l₂ distance, which is shown to capture shape-related differences in gene curves and is robust against time shifts, which would otherwise inflate the traditional l₂ distance. Other shape-related distances, such as the curvature, may carry biological significance. We have assessed the comparative performance of the test on realistically simulated datasets and applied it to human immune cell responses to bacterial infection over time.
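The two conventional curve differences this abstract mentions, the area between curves and the l₂ distance, can be sketched numerically as below. This assumes NumPy, curves sampled on a shared time grid, and a simple trapezoidal rule; the helper names are hypothetical. The paper's visual l₂ distance and its distance-based test statistic are not defined in the abstract and are not reproduced here.

```python
import numpy as np


def _trapezoid(values, t):
    """Trapezoidal integral of sampled values over the time grid t."""
    values = np.asarray(values, dtype=float)
    t = np.asarray(t, dtype=float)
    return float(np.sum((values[1:] + values[:-1]) * np.diff(t)) / 2.0)


def l2_distance(curve_a, curve_b, t):
    """Square root of the integrated squared difference between two curves."""
    diff = np.asarray(curve_a, dtype=float) - np.asarray(curve_b, dtype=float)
    return _trapezoid(diff ** 2, t) ** 0.5


def area_between(curve_a, curve_b, t):
    """Integrated absolute difference between two curves."""
    diff = np.asarray(curve_a, dtype=float) - np.asarray(curve_b, dtype=float)
    return _trapezoid(np.abs(diff), t)
```

Both quantities grow when one curve is merely a time-shifted copy of the other, which is the sensitivity the abstract says a shape-based distance is meant to reduce.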
Affiliation(s)
- Christopher Minas
- Statistics Section, Department of Mathematics, Imperial College London, London, UK
|
40
|
Sequence-Based Fungal Identification and Classification. Mol Microbiol 2011. [DOI: 10.1128/9781555816834.ch43] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
41
|
Berg A, Li N, Tong C, Wang Z, Berceli SA, Wu R. Functional mapping of expression quantitative trait loci that regulate oscillatory gene expression. Methods Mol Biol 2011; 734:241-255. [PMID: 21468993 DOI: 10.1007/978-1-61779-086-7_12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
Genetic networks underlying many biological processes, such as vertebrate somitogenesis, cell cycle, hormonal signaling, and circadian rhythms, are characterized by oscillations in gene expression. It has been recognized that the frequency and amplitude of gene expression oscillations vary among individuals and can be controlled by specific expression quantitative trait loci (eQTLs). In this chapter, we develop a dynamic model for mapping and identifying such eQTLs by integrating mathematical aspects of oscillatory dynamics into the functional mapping framework. The model can determine whether and how eQTLs regulate individual genes' activation kinetics and expression dynamics by estimating and testing Fourier series parameters for different eQTL genotypes. We incorporate a general autoregressive moving-average process of order (r,s), the so-called ARMA(r,s), to model the covariance structure for gene expression profiles measured in time course, broadening the applicability of the new dynamic model to mapping eQTLs in practice. The expectation-maximization algorithm (EM algorithm) was derived to estimate all parameters modeling the mean-covariance structures within a mixture model setting. Simulation studies were performed to investigate the statistical behavior of the model. The model will provide a powerful statistical tool for mapping eQTLs and their epistatic interactions that regulate oscillations in gene expression, helping to construct a regulatory genetic network for those periodic biological phenomena.
Affiliation(s)
- Arthur Berg
- Center for Statistical Genetics, Pennsylvania State University, Hershey, PA, USA
|
42
|
Ye C, Liu Y, Zhang X. Observations on shifted cumulative regulation. Genome Biol 2011; 12:404. [PMID: 21554749 PMCID: PMC3218858 DOI: 10.1186/gb-2011-12-4-404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
|
43
|
Gao S, Hartman JL, Carter JL, Hessner MJ, Wang X. Global analysis of phase locking in gene expression during cell cycle: the potential in network modeling. BMC Systems Biology 2010; 4:167. [PMID: 21129191 PMCID: PMC3017040 DOI: 10.1186/1752-0509-4-167] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/18/2010] [Accepted: 12/03/2010] [Indexed: 11/10/2022]
Abstract
Background In nonlinear dynamic systems, synchrony through oscillation and frequency modulation is a general control strategy to coordinate multiple modules in response to external signals. Conversely, the synchrony information can be utilized to infer interaction. Increasing evidence suggests that frequency modulation is also common in transcription regulation. Results In this study, we investigate the potential of phase locking analysis, a technique to study the synchrony patterns, in the transcription network modeling of time course gene expression data. Using the yeast cell cycle data, we show that significant phase locking exists between transcription factors and their targets, between gene pairs with prior evidence of physical or genetic interactions, and among cell cycle genes. When compared with simple correlation we found that the phase locking metric can identify gene pairs that interact with each other more efficiently. In addition, it can automatically address issues of arbitrary time lags or different dynamic time scales in different genes, without the need for alignment. Interestingly, many of the phase locked gene pairs exhibit higher order than 1:1 locking, and significant phase lags with respect to each other. Based on these findings we propose a new phase locking metric for network reconstruction using time course gene expression data. We show that it is efficient at identifying network modules of focused biological themes that are important to cell cycle regulation. Conclusions Our result demonstrates the potential of phase locking analysis in transcription network modeling. It also suggests the importance of understanding the dynamics underlying the gene expression patterns.
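The abstract does not define its phase locking metric, so the sketch below shows the generic n:m phase-locking value from signal analysis instead: instantaneous phases are taken from an FFT-based analytic signal, and the index measures how constant the combination n·φx − m·φy stays over time. It assumes NumPy and evenly sampled series; the function names are hypothetical, and this is not claimed to be the authors' metric.

```python
import numpy as np


def instantaneous_phase(series):
    """Phase of the analytic signal, built by zeroing negative frequencies."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.angle(np.fft.ifft(spectrum * gain))


def phase_locking_value(x, y, n=1, m=1):
    """n:m phase-locking value in [0, 1].

    1 means n * phase(x) - m * phase(y) is constant over time;
    values near 0 mean the phase relation drifts.
    """
    delta = n * instantaneous_phase(x) - m * instantaneous_phase(y)
    return float(np.abs(np.mean(np.exp(1j * delta))))
```

Because only the stability of the phase difference matters, a constant lag between two profiles does not lower the index, and setting `n` and `m` to different integers captures locking of higher order than 1:1.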
Affiliation(s)
- Shouguo Gao
- Department of Physics, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
|
44
|
Song JZ, Duan KM, Ware T, Surette M. The wavelet-based cluster analysis for temporal gene expression data. EURASIP Journal on Bioinformatics & Systems Biology 2010:39382. [PMID: 17713589 PMCID: PMC3171337 DOI: 10.1155/2007/39382] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 12/04/2005] [Revised: 10/01/2006] [Accepted: 03/04/2007] [Indexed: 11/17/2022]
Abstract
A variety of high-throughput methods have made it possible to generate detailed temporal expression data for a single gene or large numbers of genes. Common methods for analysis of these large data sets can be problematic. One challenge is the comparison of temporal expression data obtained from different growth conditions where the patterns of expression may be shifted in time. We propose the use of wavelet analysis to transform the data obtained under different growth conditions to permit comparison of expression patterns from experiments that have time shifts or delays. We demonstrate this approach using detailed temporal data for a single bacterial gene obtained under 72 different growth conditions. This general strategy can be applied in the analysis of data sets of thousands of genes under different conditions.
Affiliation(s)
- JZ Song
- Department of Animal and Avian Science, 2413 Animal Science Center, University of Maryland, College Park, MD 20742, USA
- KM Duan
- Department of Microbiology and Infectious Diseases, and Department of Biochemistry and Molecular Biology, Health Sciences Centre, University of Calgary, Calgary, AB T2N 4N1, Canada
- T Ware
- Department of Mathematics, University of Calgary, Calgary, AB T2N 4N1, Canada
- M Surette
- Department of Microbiology and Infectious Diseases, and Department of Biochemistry and Molecular Biology, Health Sciences Centre, University of Calgary, Calgary, AB T2N 4N1, Canada
|
45
|
Orlando DA, Brady SM, Fink TMA, Benfey PN, Ahnert SE. Detecting separate time scales in genetic expression data. BMC Genomics 2010; 11:381. [PMID: 20565716 PMCID: PMC3017766 DOI: 10.1186/1471-2164-11-381] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 10/02/2009] [Accepted: 06/16/2010] [Indexed: 01/11/2023] Open
Abstract
Background Biological processes occur on a vast range of time scales, and many of them occur concurrently. As a result, system-wide measurements of gene expression have the potential to capture many of these processes simultaneously. The challenge, however, is to separate these processes and time scales in the data. In many cases the number of processes and their time scales are unknown. This issue is particularly relevant to developmental biologists, who are interested in processes such as growth, segmentation and differentiation, which can all take place simultaneously, but on different time scales. Results We introduce a flexible and statistically rigorous method for detecting different time scales in time-series gene expression data, by identifying expression patterns that are temporally shifted between replicate datasets. We apply our approach to a Saccharomyces cerevisiae cell-cycle dataset and an Arabidopsis thaliana root developmental dataset. In both datasets our method successfully detects processes operating on several different time scales. Furthermore, we show that many of these time scales can be associated with particular biological functions. Conclusions The spatiotemporal modules identified by our method suggest the presence of multiple biological processes, acting at distinct time scales in both the Arabidopsis root and yeast. Using similar large-scale expression datasets, the identification of biological processes acting at multiple time scales in many organisms is now possible.
Affiliation(s)
- David A Orlando
- Department of Biology and IGSP Center for Systems Biology, Duke University, Durham, NC, USA
|
46
|
Global screening of potential Candida albicans biofilm-related transcription factors via network comparison. BMC Bioinformatics 2010; 11:53. [PMID: 20102611 PMCID: PMC2842261 DOI: 10.1186/1471-2105-11-53] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 08/21/2009] [Accepted: 01/26/2010] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Candida albicans is a commonly encountered fungal pathogen in humans. The formation of biofilm is a major virulence factor in C. albicans pathogenesis and is related to the drug resistance of this organism. Although many factors affecting biofilm have been analyzed, the molecular mechanisms that regulate biofilm formation remain to be elucidated. RESULTS In this study, from the gene regulatory network perspective, we developed an efficient computational framework, which integrates different kinds of data from genome-scale analysis, for global screening of potential transcription factors (TFs) controlling C. albicans biofilm formation. S. cerevisiae information and ortholog data were used to infer the possible TF-gene regulatory associations in C. albicans. Based on TF-gene regulatory associations and gene expression profiles, a stochastic dynamic model was employed to reconstruct the gene regulatory networks of C. albicans biofilm and planktonic cells. The two networks were then compared, and a relevance value (RV) score was proposed to quantify the correlation of each potential TF with biofilm formation. A total of twenty-three TFs were identified as related to biofilm formation; ten of them have been previously reported in the literature. CONCLUSIONS The results indicate that the proposed screening method can successfully identify most known biofilm-related TFs and also identify many others that have not been previously reported. Together, this method can be employed as a pre-experiment screening approach that reveals new target genes for further characterization to understand the regulatory mechanisms in biofilm formation, which can serve as the starting point for therapeutic intervention of C. albicans infections.
|
47
|
Unraveling complex temporal associations in cellular systems across multiple time-series microarray datasets. J Biomed Inform 2010; 43:550-9. [PMID: 20083231 DOI: 10.1016/j.jbi.2009.12.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 06/07/2009] [Revised: 11/26/2009] [Accepted: 12/30/2009] [Indexed: 10/20/2022]
Abstract
Unraveling the temporal complexity of cellular systems is a challenging task, as the subtle coordination of molecular activities cannot be adequately captured by simple mathematical concepts such as correlation. This paper addresses the challenge with a data-mining approach. We introduce the novel concept of a "frequent temporal association pattern" (FTAP): a set of genes that simultaneously exhibit complex temporal expression patterns recurrently across multiple microarray datasets. Such temporal signals are hard to identify in individual microarray datasets, but become significant through their frequent occurrence across multiple datasets. We designed an efficient two-stage algorithm to identify FTAPs. First, for each gene we identify expression trends that occur frequently across multiple datasets. Second, we look for a set of genes that simultaneously exhibit their respective trends recurrently in multiple datasets. We applied this algorithm to 18 yeast time-series microarray datasets. The majority of FTAPs identified by the algorithm are associated with specific biological functions. Moreover, a significant number of patterns include genes that are functionally related but do not exhibit co-expression; such gene groups cannot be captured by clustering algorithms. Our approach offers three advantages: (1) it can identify complex associations of temporal trends in gene expression, an important step towards understanding the complex mechanisms governing cellular systems; (2) it is capable of integrating time-series data with different time scales and intervals; and (3) it yields results that are robust against outliers.
|
48
|
Ladunga I. An overview of the computational analyses and discovery of transcription factor binding sites. Methods Mol Biol 2010; 674:1-22. [PMID: 20827582 DOI: 10.1007/978-1-60761-854-6_1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/29/2023]
Abstract
Here we provide a pragmatic, high-level overview of the computational approaches and tools for the discovery of transcription factor binding sites. Unraveling transcription regulatory networks and their malfunctions, such as cancer, became feasible due to recent stellar progress in experimental techniques and computational analyses. While predictions of isolated sites still pose notorious challenges, cis-regulatory modules (clusters) of binding sites can now be identified with high accuracy. Further support comes from conserved DNA segments, co-regulation, transposable elements, nucleosomes, and three-dimensional chromosomal structures. We introduce computational tools for the analysis and interpretation of chromatin immunoprecipitation, next-generation sequencing, SELEX, and protein-binding microarray results. Because immunoprecipitation produces overly large DNA segments and well over half of the sequencing reads constitute background noise, methods are presented for background correction, sequence read mapping, peak calling, false discovery rate estimation, and co-localization analyses. To discover short binding site motifs from extensive immunoprecipitation segments, we recommend algorithms and software based on expectation maximization and Gibbs sampling. Data integration using several databases further improves performance. Binding sites can be visualized in genomic and chromatin context using genome browsers. Binding site information, integrated with co-expression in large compendia of gene expression experiments, allows us to reveal complex transcriptional regulatory networks.
Affiliation(s)
- Istvan Ladunga
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE, USA.
|
49
|
Abstract
Regulatory and other networks in the cell change in a highly dynamic way over time and in response to internal and external stimuli. While several different types of high-throughput experimental procedures are available to study systems in the cell, most only measure static properties of such networks. Information derived from sequence data is inherently static, and most interaction data sets are measured in a static way as well. In this chapter we discuss one of the few abundant sources for temporal information, time series expression data. We provide an overview of the methods suggested for clustering this type of data to identify functionally related genes. We also discuss methods for inferring causality and interactions using lagged correlations and regression analysis. Finally, we present methods for combining time series expression data with static data to reconstruct dynamic regulatory networks. We point to software tools implementing the methods discussed in this chapter. As more temporal measurements become available, the importance of analyzing such data and of combining it with other types of data will greatly increase.
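As an illustration of the lagged-correlation idea mentioned in this abstract, here is a minimal sketch that scans a range of non-negative lags and reports the one with the strongest Pearson correlation between a candidate regulator and a target. The helper names (`pearson`, `lagged_correlation`, `best_lag`) and the choice of Pearson correlation are assumptions for illustration; the chapter itself may describe different estimators.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def lagged_correlation(regulator, target, lag):
    """Correlate regulator[t] with target[t + lag] for a non-negative lag."""
    if len(regulator) != len(target):
        raise ValueError("series must have the same length")
    if lag < 0 or lag > len(regulator) - 2:
        raise ValueError("lag must leave at least two overlapping time points")
    return pearson(regulator[: len(regulator) - lag], target[lag:])


def best_lag(regulator, target, max_lag):
    """Return (lag, correlation) with the largest absolute correlation."""
    scores = {
        lag: lagged_correlation(regulator, target, lag)
        for lag in range(max_lag + 1)
    }
    lag = max(scores, key=lambda k: abs(scores[k]))
    return lag, scores[lag]
```

A strong correlation at a positive lag is only suggestive of a regulator-to-target direction; with few time points, longer lags leave fewer overlapping samples, so `max_lag` should stay small relative to the series length.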
Affiliation(s)
- Anthony Gitter
- Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA.
|
50
|
Veiga DFT, Dutta B, Balázsi G. Network inference and network response identification: moving genome-scale data to the next level of biological discovery. Molecular BioSystems 2009; 6:469-80. [PMID: 20174676 DOI: 10.1039/b916989j] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 11/21/2022]
Abstract
The escalating amount of genome-scale data demands a pragmatic stance from the research community. How can we utilize this deluge of information to better understand biology, cure diseases, or engage cells in bioremediation or biomaterial production for various purposes? A research pipeline moving new sequence, expression and binding data towards practical end goals seems to be necessary. While most individual researchers are not motivated by such well-articulated pragmatic end goals, the scientific community has already self-organized itself to successfully convert genomic data into fundamentally new biological knowledge and practical applications. Here we review two important steps in this workflow: network inference and network response identification, applied to transcriptional regulatory networks. Among network inference methods, we concentrate on relevance networks due to their conceptual simplicity. We classify and discuss network response identification approaches as either data-centric or network-centric. Finally, we conclude with an outlook on what is still missing from these approaches and what may be ahead on the road to biological discovery.
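To make the "relevance network" idea concrete, the sketch below links two genes whenever the absolute Pearson correlation of their expression profiles reaches a threshold. This is a simplified illustration, not the review's procedure: the function name `relevance_network`, the 0.8 default threshold, and the gene labels in the usage note are hypothetical, and relevance networks are also commonly built on mutual information rather than correlation.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def relevance_network(profiles, threshold=0.8):
    """Return edges (gene1, gene2, r) for gene pairs with |r| >= threshold.

    `profiles` maps a gene name to its expression values across samples.
    The threshold is an illustrative default, not a recommended value.
    """
    edges = []
    for gene1, gene2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene1], profiles[gene2])
        if abs(r) >= threshold:
            edges.append((gene1, gene2, r))
    return edges
```

For example, calling `relevance_network({"geneA": [1, 2, 3, 4, 5], "geneB": [2, 4, 6, 8, 10.5], "geneC": [3, 1, 4, 1, 5]})` with these made-up profiles keeps only the geneA-geneB pair. In practice the threshold is usually chosen against a permutation-based null distribution rather than fixed in advance.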
Affiliation(s)
- Diogo F T Veiga
- Department of Systems Biology-Unit 950, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA.
|