1
|
Lu J, Dumitrascu B, McDowell IC, Jo B, Barrera A, Hong LK, Leichter SM, Reddy TE, Engelhardt BE. Causal network inference from gene transcriptional time-series response to glucocorticoids. PLoS Comput Biol 2021; 17:e1008223. [PMID: 33513136 PMCID: PMC7875426 DOI: 10.1371/journal.pcbi.1008223] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2019] [Revised: 02/10/2021] [Accepted: 08/07/2020] [Indexed: 11/19/2022] Open
Abstract
Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately enabling regulatory network re-engineering. Network inference from transcriptional time-series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time-series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance and additionally infers whether causal effects are activating or inhibitory. We apply BETS to transcriptional time-series data of differentially-expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: Overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is available as an open source software package at https://github.com/lujonathanh/BETS.
Collapse
Affiliation(s)
- Jonathan Lu
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
| | - Bianca Dumitrascu
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
| | - Ian C. McDowell
- Element Genomics, A UCB Company, Durham, North Carolina, United States of America
| | - Brian Jo
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
| | - Alejandro Barrera
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America
- Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, North Carolina, United States of America
| | - Linda K. Hong
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America
| | - Sarah M. Leichter
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America
| | - Timothy E. Reddy
- Department of Genome Sciences, Duke University, Durham, North Carolina, United States of America
| | - Barbara E. Engelhardt
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
- Center for Statistics and Machine Learning, Princeton University, Princeton, New Jersey, United States of America
| |
Collapse
|
2
|
Hoel E, Levin M. Emergence of informative higher scales in biological systems: a computational toolkit for optimal prediction and control. Commun Integr Biol 2020; 13:108-118. [PMID: 33014263 PMCID: PMC7518458 DOI: 10.1080/19420889.2020.1802914] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2020] [Revised: 07/22/2020] [Accepted: 07/26/2020] [Indexed: 02/07/2023] Open
Abstract
The biological sciences span many spatial and temporal scales in attempts to understand the function and evolution of complex systems-level processes, such as embryogenesis. It is generally assumed that the most effective description of these processes is in terms of molecular interactions. However, recent developments in information theory and causal analysis now allow for the quantitative resolution of this question. In some cases, macro-scale models can minimize noise and increase the amount of information an experimenter or modeler has about "what does what." This result has numerous implications for evolution, pattern regulation, and biomedical strategies. Here, we provide an introduction to these quantitative techniques, and use them to show how informative macro-scales are common across biology. Our goal is to give biologists the tools to identify the maximally-informative scale at which to model, experiment on, predict, control, and understand complex biological systems.
Collapse
Affiliation(s)
- Erik Hoel
- Allen Discovery Center, Tufts University, Medford, MA, USA
| | - Michael Levin
- Allen Discovery Center, Tufts University, Medford, MA, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
| |
Collapse
|
3
|
Ness RO, Sachs K, Mallick P, Vitek O. A Bayesian Active Learning Experimental Design for Inferring Signaling Networks. J Comput Biol 2018; 25:709-725. [PMID: 29927613 DOI: 10.1089/cmb.2017.0247] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Machine learning methods for learning network structure are applied to quantitative proteomics experiments and reverse-engineer intracellular signal transduction networks. They provide insight into the rewiring of signaling within the context of a disease or a phenotype. To learn the causal patterns of influence between proteins in the network, the methods require experiments that include targeted interventions that fix the activity of specific proteins. However, the interventions are costly and add experimental complexity. We describe an active learning strategy for selecting optimal interventions. Our approach takes as inputs pathway databases and historic data sets, expresses them in form of prior probability distributions on network structures, and selects interventions that maximize their expected contribution to structure learning. Evaluations on simulated and real data show that the strategy reduces the detection error of validated edges as compared with an unguided choice of interventions and avoids redundant interventions, thereby increasing the effectiveness of the experiment.
Collapse
Affiliation(s)
- Robert O Ness
- 1 Department of Statistics, Purdue University , West Lafayette, Indiana
| | - Karen Sachs
- 2 Department of Immunology, School of Medicine, Stanford University , Palo Alto, California
| | - Parag Mallick
- 3 Canary Center for Cancer Early Detection, School of Medicine, Stanford University , Palo Alto, California
| | - Olga Vitek
- 4 College of Science, College of Computer and Information Science, Northeastern University , Boston, Massachusetts
| |
Collapse
|
4
|
Bertrán MA, Martínez NL, Wang Y, Dunson D, Sapiro G, Ringach D. Active learning of cortical connectivity from two-photon imaging data. PLoS One 2018; 13:e0196527. [PMID: 29718955 PMCID: PMC5931643 DOI: 10.1371/journal.pone.0196527] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2018] [Accepted: 04/13/2018] [Indexed: 11/19/2022] Open
Abstract
Understanding how groups of neurons interact within a network is a fundamental question in system neuroscience. Instead of passively observing the ongoing activity of a network, we can typically perturb its activity, either by external sensory stimulation or directly via techniques such as two-photon optogenetics. A natural question is how to use such perturbations to identify the connectivity of the network efficiently. Here we introduce a method to infer sparse connectivity graphs from in-vivo, two-photon imaging of population activity in response to external stimuli. A novel aspect of the work is the introduction of a recommended distribution, incrementally learned from the data, to optimally refine the inferred network. Unlike existing system identification techniques, this “active learning” method automatically focuses its attention on key undiscovered areas of the network, instead of targeting global uncertainty indicators like parameter variance. We show how active learning leads to faster inference while, at the same time, provides confidence intervals for the network parameters. We present simulations on artificial small-world networks to validate the methods and apply the method to real data. Analysis of frequency of motifs recovered show that cortical networks are consistent with a small-world topology model.
Collapse
Affiliation(s)
- Martín A. Bertrán
- Electrical and Computer Engineering, Duke University, Durham, North Carolina, United States of America
- * E-mail:
| | - Natalia L. Martínez
- Electrical and Computer Engineering, Duke University, Durham, North Carolina, United States of America
| | - Ye Wang
- Statistical Science Program, Duke University, Durham, North Carolina, United States of America
| | - David Dunson
- Statistical Science Program, Duke University, Durham, North Carolina, United States of America
| | - Guillermo Sapiro
- Electrical and Computer Engineering, Duke University, Durham, North Carolina, United States of America
- BME, CS, and Math, Duke University, Durham, North Carolina, United States of America
| | - Dario Ringach
- Neurobiology and Psychology, Jules Stein Eye Institute, Biomedical Engineering Program, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
| |
Collapse
|
5
|
Agrahari R, Foroushani A, Docking TR, Chang L, Duns G, Hudoba M, Karsan A, Zare H. Applications of Bayesian network models in predicting types of hematological malignancies. Sci Rep 2018; 8:6951. [PMID: 29725024 PMCID: PMC5934387 DOI: 10.1038/s41598-018-24758-5] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2017] [Accepted: 04/05/2018] [Indexed: 12/17/2022] Open
Abstract
Network analysis is the preferred approach for the detection of subtle but coordinated changes in expression of an interacting and related set of genes. We introduce a novel method based on the analyses of coexpression networks and Bayesian networks, and we use this new method to classify two types of hematological malignancies; namely, acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS). Our classifier has an accuracy of 93%, a precision of 98%, and a recall of 90% on the training dataset (n = 366); which outperforms the results reported by other scholars on the same dataset. Although our training dataset consists of microarray data, our model has a remarkable performance on the RNA-Seq test dataset (n = 74, accuracy = 89%, precision = 88%, recall = 98%), which confirms that eigengenes are robust with respect to expression profiling technology. These signatures are useful in classification and correctly predicting the diagnosis. They might also provide valuable information about the underlying biology of diseases. Our network analysis approach is generalizable and can be useful for classifying other diseases based on gene expression profiles. Our previously published Pigengene package is publicly available through Bioconductor, which can be used to conveniently fit a Bayesian network to gene expression data.
Collapse
Affiliation(s)
- Rupesh Agrahari
- Department of Computer Science, Texas State University, San Marcos, Texas, 78666, USA
| | - Amir Foroushani
- Department of Computer Science, Texas State University, San Marcos, Texas, 78666, USA
| | - T Roderick Docking
- Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 1L3, Canada
| | - Linda Chang
- Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 1L3, Canada
| | - Gerben Duns
- Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 1L3, Canada
| | - Monika Hudoba
- Department of Pathology and Laboratory Medicine, Vancouver General Hospital, Vancouver, British Columbia, V5Z 1M9, Canada
| | - Aly Karsan
- Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 1L3, Canada
| | - Habil Zare
- Department of Computer Science, Texas State University, San Marcos, Texas, 78666, USA. .,Department of Cell Systems & Anatomy, The University of Texas Health Science Center, San Antonio, Texas, 78229, USA.
| |
Collapse
|
6
|
Abstract
Understanding epigenetic processes holds immense promise for medical applications. Advances in Machine Learning (ML) are critical to realize this promise. Previous studies used epigenetic data sets associated with the germline transmission of epigenetic transgenerational inheritance of disease and novel ML approaches to predict genome-wide locations of critical epimutations. A combination of Active Learning (ACL) and Imbalanced Class Learning (ICL) was used to address past problems with ML to develop a more efficient feature selection process and address the imbalance problem in all genomic data sets. The power of this novel ML approach and our ability to predict epigenetic phenomena and associated disease is suggested. The current approach requires extensive computation of features over the genome. A promising new approach is to introduce Deep Learning (DL) for the generation and simultaneous computation of novel genomic features tuned to the classification task. This approach can be used with any genomic or biological data set applied to medicine. The application of molecular epigenetic data in advanced machine learning analysis to medicine is the focus of this review.
Collapse
Affiliation(s)
- Lawrence B Holder
- a School of Electrical Engineering and Computer Science , Washington State University , Pullman , WA , USA
| | - M Muksitul Haque
- a School of Electrical Engineering and Computer Science , Washington State University , Pullman , WA , USA.,b Center for Reproductive Biology, School of Biological Sciences , Washington State University , Pullman , WA , USA
| | - Michael K Skinner
- b Center for Reproductive Biology, School of Biological Sciences , Washington State University , Pullman , WA , USA
| |
Collapse
|
7
|
Nalluri JJ, Barh D, Azevedo V, Ghosh P. miRsig: a consensus-based network inference methodology to identify pan-cancer miRNA-miRNA interaction signatures. Sci Rep 2017; 7:39684. [PMID: 28045122 PMCID: PMC5206712 DOI: 10.1038/srep39684] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2016] [Accepted: 11/25/2016] [Indexed: 01/17/2023] Open
Abstract
Decoding the patterns of miRNA regulation in diseases are important to properly realize its potential in diagnostic, prog- nostic, and therapeutic applications. Only a handful of studies computationally predict possible miRNA-miRNA interactions; hence, such interactions require a thorough investigation to understand their role in disease progression. In this paper, we design a novel computational pipeline to predict the common signature/core sets of miRNA-miRNA interactions for different diseases using network inference algorithms on the miRNA-disease expression profiles; the individual predictions of these algorithms were then merged using a consensus-based approach to predict miRNA-miRNA associations. We next selected the miRNA-miRNA associations across particular diseases to generate the corresponding disease-specific miRNA-interaction networks. Next, graph intersection analysis was performed on these networks for multiple diseases to identify the common signature/core sets of miRNA interactions. We applied this pipeline to identify the common signature of miRNA-miRNA inter- actions for cancers. The identified signatures when validated using a manual literature search from PubMed Central and the PhenomiR database, show strong relevance with the respective cancers, providing an indirect proof of the high accuracy of our methodology. We developed miRsig, an online tool for analysis and visualization of the disease-specific signature/core miRNA-miRNA interactions, available at: http://bnet.egr.vcu.edu/miRsig.
Collapse
Affiliation(s)
- Joseph J Nalluri
- Department of Computer Science, School of Engineering, Virginia Commonwealth University, Richmond, Virginia,USA
| | - Debmalya Barh
- Center for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology, Purba Medinipur, West Bengal, India.,Laboratório de Genética Celular e Molecular, Departamento de Biologia Geral, Instituto de Ciências Biológicas (ICB), Universidade Federal de Minas Gerais, Pampulha, Belo Horizonte, Minas Gerais, Brazil.,Xcode Life Sciences, 3D Eldorado, 112 Nungambakkam High Road, Nungambakkam, Chennai, Tamil Nadu-600034, India
| | - Vasco Azevedo
- Laboratório de Genética Celular e Molecular, Departamento de Biologia Geral, Instituto de Ciências Biológicas (ICB), Universidade Federal de Minas Gerais, Pampulha, Belo Horizonte, Minas Gerais, Brazil
| | - Preetam Ghosh
- Department of Computer Science, School of Engineering, Virginia Commonwealth University, Richmond, Virginia,USA
| |
Collapse
|
8
|
Gogoshin G, Boerwinkle E, Rodin AS. New Algorithm and Software (BNOmics) for Inferring and Visualizing Bayesian Networks from Heterogeneous Big Biological and Genetic Data. J Comput Biol 2016; 24:340-356. [PMID: 27681505 PMCID: PMC5372779 DOI: 10.1089/cmb.2016.0100] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
Bayesian network (BN) reconstruction is a prototypical systems biology data analysis approach that has been successfully used to reverse engineer and model networks reflecting different layers of biological organization (ranging from genetic to epigenetic to cellular pathway to metabolomic). It is especially relevant in the context of modern (ongoing and prospective) studies that generate heterogeneous high-throughput omics datasets. However, there are both theoretical and practical obstacles to the seamless application of BN modeling to such big data, including computational inefficiency of optimal BN structure search algorithms, ambiguity in data discretization, mixing data types, imputation and validation, and, in general, limited scalability in both reconstruction and visualization of BNs. To overcome these and other obstacles, we present BNOmics, an improved algorithm and software toolkit for inferring and analyzing BNs from omics datasets. BNOmics aims at comprehensive systems biology—type data exploration, including both generating new biological hypothesis and testing and validating the existing ones. Novel aspects of the algorithm center around increasing scalability and applicability to varying data types (with different explicit and implicit distributional assumptions) within the same analysis framework. An output and visualization interface to widely available graph-rendering software is also included. Three diverse applications are detailed. BNOmics was originally developed in the context of genetic epidemiology data and is being continuously optimized to keep pace with the ever-increasing inflow of available large-scale omics datasets. As such, the software scalability and usability on the less than exotic computer hardware are a priority, as well as the applicability of the algorithm and software to the heterogeneous datasets containing many data types—single-nucleotide polymorphisms and other genetic/epigenetic/transcriptome variables, metabolite levels, epidemiological variables, endpoints, and phenotypes, etc.
Collapse
Affiliation(s)
- Grigoriy Gogoshin
- 1 Diabetes and Metabolism Research Institute , City of Hope, Duarte, California
| | - Eric Boerwinkle
- 2 Human Genetics Center, School of Public Health, University of Texas Health Science Center , Houston, Texas.,3 Institute of Molecular Medicine, University of Texas Health Science Center , Houston, Texas
| | - Andrei S Rodin
- 1 Diabetes and Metabolism Research Institute , City of Hope, Duarte, California
| |
Collapse
|