1
Multiple Linear Regression for Reconstruction of Gene Regulatory Networks in Solving Cascade Error Problems. Adv Bioinformatics 2017; 2017:4827171. [PMID: 28250767 PMCID: PMC5303608 DOI: 10.1155/2017/4827171] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/29/2016] [Revised: 10/10/2016] [Accepted: 10/19/2016] [Indexed: 11/17/2022] Open
Abstract
Gene regulatory network (GRN) reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods has been the inaccurate prediction of cascade motifs. A cascade error is the wrong prediction of a cascade motif, in which an indirect interaction is misinterpreted as a direct interaction. Despite active research on various GRN prediction methods, discussion of specific methods for solving cascade error problems is still lacking. In fact, the experiments conducted in past studies were not specifically geared towards proving the ability of GRN prediction methods to avoid cascade errors. Hence, this research proposes Multiple Linear Regression (MLR) to infer GRNs from gene expression data and to avoid wrongly inferring an indirect interaction (A → B → C) as a direct interaction (A → C). Since the number of observations in the real experimental datasets was far smaller than the number of predictors, some predictors were eliminated by extracting random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dealing with cascade errors by using a novel experimental procedure proposed in this work. The experiment revealed that the number of cascade errors was minimal. Apart from that, the Belsley collinearity test showed that multicollinearity did affect the datasets used in this experiment greatly. All the tested subnetworks obtained satisfactory results, with AUROC values above 0.5.
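The cascade-error idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the simulated data, the edge strengths, and the variable names are all assumptions made for illustration. It simulates a cascade A → B → C, shows that the pairwise correlation between A and C is high, and then fits a multiple linear regression of C on both A and B, where the coefficient on A should shrink towards zero once B is accounted for.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # number of simulated expression samples (hypothetical)

# Simulate a cascade A -> B -> C with no direct A -> C edge.
a = rng.normal(size=n)
b = 0.9 * a + 0.3 * rng.normal(size=n)
c = 0.8 * b + 0.3 * rng.normal(size=n)

# Pairwise correlation alone suggests a (spurious) direct A -> C link.
corr_ac = np.corrcoef(a, c)[0, 1]

# Multiple linear regression of C on A and B together, with an intercept.
X = np.column_stack([np.ones(n), a, b])
coef, *_ = np.linalg.lstsq(X, c, rcond=None)
_, beta_a, beta_b = coef

print(f"corr(A, C) = {corr_ac:.2f}")
print(f"beta_A = {beta_a:.2f}, beta_B = {beta_b:.2f}")
```

In this toy setting the regression attributes the effect on C to B rather than to A. Note that A and B are strongly correlated here, which is the kind of multicollinearity the abstract's Belsley test is concerned with.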
2
Ogundijo OE, Elmas A, Wang X. Reverse engineering gene regulatory networks from measurement with missing values. EURASIP Journal on Bioinformatics & Systems Biology 2017; 2017:2. [PMID: 28127303 PMCID: PMC5225239 DOI: 10.1186/s13637-016-0055-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 08/11/2016] [Accepted: 12/15/2016] [Indexed: 12/31/2022]
Abstract
Background: Gene expression time series data are usually in the form of high-dimensional arrays. Unfortunately, the data may sometimes contain missing values: either the expression values of some genes at some time points, or the entire expression values of a single time point or of some sets of consecutive time points. This significantly affects the performance of many algorithms for gene expression analysis that take as input the complete matrix of gene expression measurements. For instance, previous works have shown that gene regulatory interactions can be estimated from the complete matrix of gene expression measurements. Yet, to date, few algorithms have been proposed for the inference of gene regulatory networks from gene expression data with missing values.
Results: We describe a nonlinear dynamic stochastic model for the evolution of gene expression. The model captures the structural, dynamical, and nonlinear natures of the underlying biomolecular systems. We present point-based Gaussian approximation (PBGA) filters for joint state and parameter estimation of the system with one-step or two-step missing measurements. The PBGA filters use Gaussian approximation and various quadrature rules, such as the unscented transform (UT), the third-degree cubature rule and the central difference rule, for computing the related posteriors. The proposed algorithm is evaluated with satisfying results for synthetic networks, in silico networks released as a part of the DREAM project, and a real biological network, the in vivo reverse engineering and modeling assessment (IRMA) network of the yeast Saccharomyces cerevisiae.
Conclusion: PBGA filters are proposed to elucidate the underlying gene regulatory network (GRN) from time series gene expression data that contain missing values. In our state-space model, we proposed a measurement model that incorporates the effect of the missing data points into the sequential algorithm. This approach produces a better inference of the model parameters and hence a more accurate prediction of the underlying GRN than conventional Gaussian approximation (GA) filters that ignore the missing data points.
Electronic supplementary material: The online version of this article (doi:10.1186/s13637-016-0055-8) contains supplementary material, which is available to authorized users.
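One of the quadrature rules named in the abstract above, the unscented transform, can be sketched in a few lines. This is a generic textbook form, not the authors' PBGA filter: the function name, the `kappa` scaling parameter, and the example inputs are assumptions, and the handling of missing measurements is not shown. The transform propagates a Gaussian mean and covariance through a function by evaluating it at 2n+1 sigma points.

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=1.0):
    """Propagate a Gaussian (mean, cov) through f using 2n+1 sigma points."""
    n = mean.size
    # Columns of the scaled Cholesky factor give the sigma-point offsets.
    L = np.linalg.cholesky((n + kappa) * cov)
    sigma = np.vstack([mean, mean + L.T, mean - L.T])  # shape (2n+1, n)
    # Weights: kappa/(n+kappa) for the centre, 1/(2(n+kappa)) for the rest.
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    y = np.array([f(s) for s in sigma])
    y_mean = w @ y
    d = y - y_mean
    y_cov = (w[:, None] * d).T @ d
    return y_mean, y_cov

# Sanity check on a linear map, where the transform is exact.
mean = np.array([0.5, -1.0])
cov = np.array([[0.2, 0.05], [0.05, 0.1]])
A = np.array([[1.0, 2.0], [0.0, -1.0]])
lin_mean, lin_cov = unscented_transform(mean, cov, lambda x: A @ x)
```

For a nonlinear `f`, such as a sigmoid-type regulation function, the same call returns an approximate mean and covariance that a Gaussian filter can use in its prediction and update steps.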
Affiliation(s)
- Oyetunji E Ogundijo: Department of Electrical Engineering, Columbia University, 500 W 120th Street, New York, NY 10027, USA
- Abdulkadir Elmas: Department of Electrical Engineering, Columbia University, 500 W 120th Street, New York, NY 10027, USA
- Xiaodong Wang: Department of Electrical Engineering, Columbia University, 500 W 120th Street, New York, NY 10027, USA
3
CLIP-GENE: a web service of the condition specific context-laid integrative analysis for gene prioritization in mouse TF knockout experiments. Biol Direct 2016; 11:57. [PMID: 27776539 PMCID: PMC5078909 DOI: 10.1186/s13062-016-0158-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/02/2016] [Accepted: 10/10/2016] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION: Transcriptome data from gene knockout experiments in mouse are widely used to investigate the functions of genes and their relationship to phenotypes. When a gene is knocked out, it is important to identify which genes are affected by the knockout. Existing methods, including differentially expressed gene (DEG) methods, can be used for the analysis. However, existing methods require cutoff values to select candidate genes, which can produce either too many false positives or too many false negatives. This hurdle can be addressed by improving the accuracy of gene selection, by providing a method to rank candidate genes effectively, or both. Prioritization of candidate genes should consider the goals or context of the knockout experiment. As of now, there are no tools designed for both selecting and prioritizing genes from mouse knockout data. Hence, a new tool is needed.
RESULTS: In this study, we present CLIP-GENE, a web service that selects gene markers by utilizing differentially expressed genes, the mouse transcription factor (TF) network, and single nucleotide variant information. Then, the protein-protein interaction network and literature information are utilized to find genes that are relevant to the phenotypic differences. One of the novel features is that researchers can specify their contexts or hypotheses as a set of keywords, and genes are ranked according to the contexts that the user specifies. We believe that CLIP-GENE will be useful in characterizing functions of TFs in mouse experiments.
AVAILABILITY: http://epigenomics.snu.ac.kr/CLIP-GENE
REVIEWERS: This article was reviewed by Dr. Lee and Dr. Pongor.
4
Quivey RG, Grayhack EJ, Faustoferri RC, Hubbard CJ, Baldeck JD, Wolf AS, MacGilvray ME, Rosalen PL, Scott-Anne K, Santiago B, Gopal S, Payne J, Marquis RE. Functional profiling in Streptococcus mutans: construction and examination of a genomic collection of gene deletion mutants. Mol Oral Microbiol 2015; 30:474-95. [PMID: 25973955 DOI: 10.1111/omi.12107] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Accepted: 05/08/2015] [Indexed: 12/17/2022]
Abstract
A collection of tagged deletion mutant strains was created in Streptococcus mutans UA159 to facilitate investigation of the aciduric capability of this oral pathogen. Gene-specific barcoded deletions were attempted in 1432 open reading frames (representing 73% of the genome), and resulted in the isolation of 1112 strains (56% coverage) carrying deletions in distinct non-essential genes. As S. mutans virulence is predicated upon the ability of the organism to survive an acidic pH environment, form biofilms on tooth surfaces, and out-compete other oral microflora, we assayed individual mutant strains for the relative fitness of the deletion strain, compared with the parent strain, under acidic and oxidative stress conditions, as well as for their ability to form biofilms in glucose- or sucrose-containing medium. Our studies revealed a total of 51 deletion strains with defects in both aciduricity and biofilm formation. We have also identified 49 strains whose gene deletion confers sensitivity to oxidative damage and deficiencies in biofilm formation. We demonstrate the ability to examine competitive fitness of mutant organisms using the barcode tags incorporated into each deletion strain to examine the representation of a particular strain in a population. Co-cultures of deletion strains were grown either in vitro in a chemostat to steady-state values of pH 7 and pH 5 or in vivo in an animal model for oral infection. Taken together, these data represent a mechanism for assessing the virulence capacity of this pathogenic microorganism and a resource for identifying future targets for drug intervention to promote healthy oral microflora.
Affiliation(s)
- R G Quivey: Department of Microbiology and Immunology, University of Rochester, Rochester, NY, USA; Center for Oral Biology, University of Rochester, Rochester, NY, USA
- E J Grayhack: Department of Biochemistry and Biophysics, University of Rochester, Rochester, NY, USA
- R C Faustoferri: Center for Oral Biology, University of Rochester, Rochester, NY, USA
- C J Hubbard: Center for Oral Biology, University of Rochester, Rochester, NY, USA
- J D Baldeck: Department of Microbiology and Immunology, University of Rochester, Rochester, NY, USA
- A S Wolf: Department of Biochemistry and Biophysics, University of Rochester, Rochester, NY, USA
- M E MacGilvray: Center for Oral Biology, University of Rochester, Rochester, NY, USA
- P L Rosalen: Center for Oral Biology, University of Rochester, Rochester, NY, USA
- K Scott-Anne: Center for Oral Biology, University of Rochester, Rochester, NY, USA
- B Santiago: Center for Oral Biology, University of Rochester, Rochester, NY, USA
- S Gopal: Department of Biological Sciences, Rochester Institute of Technology, Rochester, NY, USA
- J Payne: Department of Biochemistry and Biophysics, University of Rochester, Rochester, NY, USA
- R E Marquis: Department of Microbiology and Immunology, University of Rochester, Rochester, NY, USA
5
Abstract
Large-scale genetic perturbation screens are a classical approach in biology and have been crucial for many discoveries. New technologies can now provide unbiased quantification of multiple molecular and phenotypic changes across tens of thousands of individual cells from large numbers of perturbed cell populations simultaneously. In this Review, we describe how these developments have enabled the discovery of new principles of intracellular and intercellular organization, novel interpretations of genetic perturbation effects and the inference of novel functional genetic interactions. These advances now allow more accurate and comprehensive analyses of gene function in cells using genetic perturbation screens.
6
Genome-scale metabolic network validation of Shewanella oneidensis using transposon insertion frequency analysis. PLoS Comput Biol 2014; 10:e1003848. [PMID: 25233219 PMCID: PMC4168976 DOI: 10.1371/journal.pcbi.1003848] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 04/04/2014] [Accepted: 08/07/2014] [Indexed: 01/08/2023] Open
Abstract
Transposon mutagenesis, in combination with parallel sequencing, is becoming a powerful tool for en-masse mutant analysis. A probability generating function was used to explain observed miniHimar transposon insertion patterns, and gene essentiality calls were made by transposon insertion frequency analysis (TIFA). TIFA incorporated the observed genome and sequence motif bias of the miniHimar transposon. The gene essentiality calls were compared to: 1) previous genome-wide direct gene-essentiality assignments; and, 2) flux balance analysis (FBA) predictions from an existing genome-scale metabolic model of Shewanella oneidensis MR-1. A three-way comparison between FBA, TIFA, and the direct essentiality calls was made to validate the TIFA approach. The refinement in the interpretation of observed transposon insertions demonstrated that genes without insertions are not necessarily essential, and that genes that contain insertions are not always nonessential. The TIFA calls were in reasonable agreement with direct essentiality calls for S. oneidensis, but agreed more closely with E. coli essentiality calls for orthologs. The TIFA gene essentiality calls were in good agreement with the MR-1 FBA essentiality predictions, and the agreement between TIFA and FBA predictions was substantially better than between the FBA and the direct gene essentiality predictions.
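The abstract's point that genes without insertions are not necessarily essential can be made concrete with a deliberately simplified calculation. This is not the TIFA probability generating function: it assumes, purely for illustration, that each candidate insertion site in a nonessential gene is hit independently with the same probability, and the function name and numbers are hypothetical.

```python
def prob_no_insertions(num_sites: int, hit_prob: float) -> float:
    """Chance that a nonessential gene shows zero insertions by luck alone.

    Assumes num_sites independent candidate sites, each hit with hit_prob.
    """
    return (1.0 - hit_prob) ** num_sites

# A short gene with few candidate sites versus a long gene with many,
# at the same (hypothetical) per-site hit probability.
short_gene = prob_no_insertions(num_sites=5, hit_prob=0.1)
long_gene = prob_no_insertions(num_sites=50, hit_prob=0.1)

print(f"short gene: {short_gene:.3f}, long gene: {long_gene:.3f}")
```

Under these assumptions a short gene is often insertion-free by chance, so the absence of insertions is weak evidence of essentiality for it, while the same observation for a long gene is much stronger evidence.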
7
Wei W, Ye YN, Luo S, Deng YY, Lin D, Guo FB. IFIM: a database of integrated fitness information for microbial genes. Database: The Journal of Biological Databases and Curation 2014; 2014:bau052. [PMID: 24923821 PMCID: PMC4207227 DOI: 10.1093/database/bau052] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 01/08/2023]
Abstract
Knowledge of an organism’s fitness for survival is important for a complete understanding of microbial genetics and effective drug design. Current essential gene databases provide only binary essentiality data from genome-wide experiments. We therefore developed a new database that Integrates quantitative Fitness Information for Microbial genes (IFIM). The IFIM database currently contains data from 16 experiments and 2186 theoretical predictions. The highly significant correlation between the experiment-derived fitness data and our computational simulations demonstrated that the computer-generated predictions were often as reliable as the experimental data. The data in IFIM can be accessed easily, and the interface allows users to browse through the gene fitness information that it contains. IFIM is the first resource that allows easy access to fitness data of microbial genes. We believe this database will contribute to a better understanding of microbial genetics and will be useful in designing drugs to resist microbial pathogens, especially when experimental data are unavailable.
Database URL: http://cefg.uestc.edu.cn/ifim/ or http://cefg.cn/ifim/
Affiliation(s)
- Wen Wei: Center of Bioinformatics and Key Laboratory for NeuroInformation of the Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yuan-Nong Ye: Center of Bioinformatics and Key Laboratory for NeuroInformation of the Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Sen Luo: Center of Bioinformatics and Key Laboratory for NeuroInformation of the Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yan-Yan Deng: Center of Bioinformatics and Key Laboratory for NeuroInformation of the Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Dan Lin: Center of Bioinformatics and Key Laboratory for NeuroInformation of the Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Feng-Biao Guo: Center of Bioinformatics and Key Laboratory for NeuroInformation of the Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
8
Villaverde AF, Ross J, Morán F, Banga JR. MIDER: network inference with mutual information distance and entropy reduction. PLoS One 2014; 9:e96732. [PMID: 24806471 PMCID: PMC4013075 DOI: 10.1371/journal.pone.0096732] [Citation(s) in RCA: 91] [Impact Index Per Article: 9.1] [Received: 11/05/2013] [Accepted: 04/09/2014] [Indexed: 01/14/2023] Open
Abstract
The prediction of links among variables from a given dataset is a task referred to as network inference or reverse engineering. It is an open problem in bioinformatics and systems biology, as well as in other areas of science. Information theory, which uses concepts such as mutual information, provides a rigorous framework for addressing it. While a number of information-theoretic methods are already available, most of them focus on a particular type of problem, introducing assumptions that limit their generality. Furthermore, many of these methods lack a publicly available implementation. Here we present MIDER, a method for inferring network structures with information-theoretic concepts. It consists of two steps: first, it provides a representation of the network in which the distance among nodes indicates their statistical closeness. Second, it refines the prediction of the existing links to distinguish between direct and indirect interactions and to assign directionality. The method accepts as input time-series data related to some quantitative features of the network nodes (such as concentrations, if the nodes are chemical species). It takes into account time delays between variables, and allows choosing among several definitions and normalizations of mutual information. It is general purpose: it may be applied to any type of network, cellular or otherwise. A Matlab implementation including source code and data is freely available (http://www.iim.csic.es/~gingproc/mider.html). The performance of MIDER has been evaluated on seven different benchmark problems that cover the main types of cellular networks, including metabolic, gene regulatory, and signaling. Comparisons with state-of-the-art information-theoretic methods have demonstrated the competitive performance of MIDER, as well as its versatility. Its use does not demand any a priori knowledge from the user; the default settings and the adaptive nature of the method provide good results for a wide range of problems without requiring tuning.
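The two ingredients the abstract names, mutual information and time delays between variables, can be sketched as follows. This is not the MIDER implementation (which is in Matlab): the histogram estimator, the bin count, the function names, and the simulated series are all assumptions chosen for illustration. The sketch estimates mutual information between two series at several lags and picks the lag with the highest score.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of I(X;Y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0  # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def lagged_mi(x, y, lag, bins=8):
    """MI between x(t) and y(t + lag) for a non-negative lag."""
    if lag == 0:
        return mutual_information(x, y, bins)
    return mutual_information(x[:-lag], y[lag:], bins)

# Hypothetical data: y follows x with a delay of 3 time steps plus noise.
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y = np.roll(x, 3) + 0.1 * rng.normal(size=2000)

scores = [lagged_mi(x, y, lag) for lag in range(6)]
best_lag = int(np.argmax(scores))
print("best lag:", best_lag)
```

A plug-in histogram estimate is biased upwards for small samples, so the scores at unrelated lags are small but not exactly zero. Distinguishing direct from indirect links, as the abstract describes, needs a further step that this sketch does not attempt.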
Affiliation(s)
- John Ross: Department of Chemistry, Stanford University, Stanford, California, United States of America
- Federico Morán: Department of Biochemistry and Molecular Biology, Complutense University, Madrid, Spain
9
Jia B, Wang X. Regularized EM algorithm for sparse parameter estimation in nonlinear dynamic systems with application to gene regulatory network inference. EURASIP Journal on Bioinformatics & Systems Biology 2014; 2014:5. [PMID: 24708632 PMCID: PMC3998071 DOI: 10.1186/1687-4153-2014-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/26/2013] [Accepted: 02/26/2014] [Indexed: 11/10/2022]
Abstract
Parameter estimation in dynamic systems finds applications in various disciplines, including systems biology. The well-known expectation-maximization (EM) algorithm is a popular method and has been widely used to solve system identification and parameter estimation problems. However, the conventional EM algorithm cannot exploit sparsity. On the other hand, in gene regulatory network inference problems, the parameters to be estimated often exhibit sparse structure. In this paper, a regularized expectation-maximization (rEM) algorithm for sparse parameter estimation in nonlinear dynamic systems is proposed that is based on maximum a posteriori (MAP) estimation and can incorporate a sparse prior. The expectation step involves forward Gaussian approximation filtering and backward Gaussian approximation smoothing. The maximization step employs a re-weighted iterative thresholding method. The proposed algorithm is then applied to gene regulatory network inference. Results based on both synthetic and real data show the effectiveness of the proposed algorithm.
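The re-weighted iterative thresholding idea mentioned for the maximization step can be sketched on a plain sparse linear regression. This is not the authors' rEM algorithm: the least-squares objective, the reweighting rule 1/(|theta| + eps), the function names, and all parameter values are assumptions chosen for illustration. Each inner iteration takes a gradient step and then applies soft thresholding; each outer iteration raises the penalty on coefficients that are currently small.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v towards zero by t, clipping at zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def reweighted_ista(X, y, lam=0.1, eps=1e-2, outer=5, inner=200):
    """Minimise 0.5*||y - X theta||^2 + lam * sum(w_i * |theta_i|)."""
    n_features = X.shape[1]
    theta = np.zeros(n_features)
    weights = np.ones(n_features)
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant
    for _ in range(outer):
        for _ in range(inner):
            grad = X.T @ (X @ theta - y)
            theta = soft_threshold(theta - step * grad, step * lam * weights)
        # Small coefficients get a larger penalty in the next round.
        weights = 1.0 / (np.abs(theta) + eps)
    return theta

# Hypothetical sparse problem: 3 true regulators out of 20 candidates.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
theta_true = np.zeros(20)
theta_true[[2, 7, 11]] = [2.0, -1.5, 1.0]
y = X @ theta_true + 0.05 * rng.normal(size=100)

theta_hat = reweighted_ista(X, y, lam=1.0)
print("nonzero indices:", np.flatnonzero(theta_hat))
```

In a network inference setting each row of the regulatory weight matrix would be a sparse vector like `theta_true`, with nonzero entries marking putative regulators.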
Affiliation(s)
- Bin Jia: Intelligent Fusion Technology, Inc., Germantown, MD 20876, USA
- Xiaodong Wang: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA