1
|
Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. BIOINFORMATICS ADVANCES 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation Not applicable.
Affiliation(s)
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Aydin Wells
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
| | - Deisy Morselli Gysi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
- Department of Physics, Northeastern University, Boston, MA 02115, United States
| | - Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
| | - T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
| | - Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Wisconsin Institute for Discovery, Madison, WI 53715, United States
| | - Anaïs Baudot
- Aix Marseille Université, INSERM, MMG, Marseille, France
| | - Serdar Bozdag
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Department of Mathematics, University of North Texas, Denton, TX 76203, United States
| | - Danny Z Chen
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Kapil Devkota
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Morgridge Institute for Research, Madison, WI 53715, United States
| | - Sara J C Gosline
- Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
| | - Pengfei Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Pietro H Guzzi
- Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
| | - Heng Huang
- Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
| | - Meng Jiang
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Ziynet Nesibe Kesimoglu
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
| | - Mehmet Koyuturk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
| | - Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
| | - Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
| | - Nataša Pržulj
- Department of Computer Science, University College London, London, WC1E 6BT, England
- ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
| | - Teresa M Przytycka
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
| | - Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
| | - Anna Ritz
- Department of Biology, Reed College, Portland, OR 97202, United States
| | - Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
| | - Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
| | - Donna K Slonim
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Hanghang Tong
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
| | - Xinan Holly Yang
- Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
| | - Haiyuan Yu
- Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
|
2
|
Bouvier M, Zreika S, Vallin E, Fourneaux C, Gonin-Giraud S, Bonnaffoux A, Gandrillon O. TopoDoE: a design of experiment strategy for selection and refinement in ensembles of executable gene regulatory networks. BMC Bioinformatics 2024; 25:245. [PMID: 39030497 PMCID: PMC11264509 DOI: 10.1186/s12859-024-05855-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 07/03/2024] [Indexed: 07/21/2024]
Abstract
BACKGROUND Inference of Gene Regulatory Networks (GRNs) is a difficult and long-standing question in Systems Biology. Numerous approaches have been proposed with the latest methods exploring the richness of single-cell data. One of the current difficulties lies in the fact that many methods of GRN inference do not result in one proposed GRN but in a collection of plausible networks that need to be further refined. In this work, we present a Design of Experiment strategy to use as a second stage after the inference process. It is specifically fitted for identifying the next most informative experiment to perform for deciding between multiple network topologies, in the case where proposed GRNs are executable models. This strategy first performs a topological analysis to reduce the number of perturbations that need to be tested, then predicts the outcome of the retained perturbations by simulation of the GRNs and finally compares predictions with novel experimental data. RESULTS We apply this method to the results of our divide-and-conquer algorithm called WASABI, adapt its gene expression model to produce perturbations and compare our predictions with experimental results. We show that our networks were able to produce in silico predictions on the outcome of a gene knock-out, which were qualitatively validated for 48 out of 49 genes. Finally, we eliminate as many as two thirds of the candidate networks for which we could identify an incorrect topology, thus greatly improving the accuracy of our predictions. CONCLUSION These results both confirm the inference accuracy of WASABI and show how executable gene expression models can be leveraged to further refine the topology of inferred GRNs. We hope this strategy will help systems biologists further explore their data and encourage the development of more executable GRN models.
Affiliation(s)
- Matteo Bouvier
- Laboratoire de Biologie Moléculaire de la Cellule, Lyon, France
- Vidium Solutions, Lyon, France
- Inria Grenoble, Rhône-Alpes Research Center, Lyon, France
| | - Souad Zreika
- Laboratoire de Biologie Moléculaire de la Cellule, Lyon, France
| | - Elodie Vallin
- Laboratoire de Biologie Moléculaire de la Cellule, Lyon, France
| | | | | | | | - Olivier Gandrillon
- Laboratoire de Biologie Moléculaire de la Cellule, Lyon, France
- Inria Grenoble, Rhône-Alpes Research Center, Lyon, France
|
3
|
Isenberg NM, Mertins SD, Yoon BJ, Reyes KG, Urban NM. Identifying Bayesian optimal experiments for uncertain biochemical pathway models. Sci Rep 2024; 14:15237. [PMID: 38956095 PMCID: PMC11219779 DOI: 10.1038/s41598-024-65196-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 06/18/2024] [Indexed: 07/04/2024]
Abstract
Pharmacodynamic (PD) models are mathematical models of cellular reaction networks that include drug mechanisms of action. These models are useful for studying predictive therapeutic outcomes of novel drug therapies in silico. However, PD models are known to possess significant uncertainty with respect to constituent parameter data, leading to uncertainty in the model predictions. Furthermore, experimental data to calibrate these models is often limited or unavailable for novel pathways. In this study, we present a Bayesian optimal experimental design approach for improving PD model prediction accuracy. We then apply our method using simulated experimental data to account for uncertainty in hypothetical laboratory measurements. This leads to a probabilistic prediction of drug performance and a quantitative measure of which prospective laboratory experiment will optimally reduce prediction uncertainty in the PD model. The methods proposed here provide a way forward for uncertainty quantification and guided experimental design for models of novel biological pathways.
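The selection step described in this abstract can be illustrated with a toy discrete example. This is only a minimal sketch and not the authors' method: it assumes a finite set of candidate parameter values, a finite set of outcomes, and known likelihood tables, and it scores each experiment by its expected reduction in posterior entropy. The experiment names (`assay_A`, `assay_B`), the numbers, and the function names are all hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


def expected_information_gain(prior, likelihood):
    """Expected entropy reduction from running one experiment.

    prior[i]         : prior probability of candidate parameter value i
    likelihood[i][y] : P(outcome y | parameter value i) for this experiment
    """
    n_outcomes = len(likelihood[0])
    gain = entropy(prior)
    for y in range(n_outcomes):
        # Marginal probability of observing outcome y under the prior.
        p_y = sum(prior[i] * likelihood[i][y] for i in range(len(prior)))
        if p_y == 0:
            continue
        # Bayes update for this hypothetical outcome.
        posterior = [prior[i] * likelihood[i][y] / p_y for i in range(len(prior))]
        gain -= p_y * entropy(posterior)
    return gain


# Hypothetical setup: two candidate parameter values, two candidate experiments.
prior = [0.5, 0.5]
experiments = {
    "assay_A": [[0.9, 0.1], [0.1, 0.9]],  # outcomes depend strongly on the parameter
    "assay_B": [[0.5, 0.5], [0.5, 0.5]],  # outcomes carry no information
}
best = max(experiments, key=lambda e: expected_information_gain(prior, experiments[e]))
```

In this toy case the informative assay is selected. With continuous parameters, the sums would typically be replaced by Monte Carlo estimates.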
Affiliation(s)
| | - Susan D Mertins
- Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
| | - Byung-Jun Yoon
- Texas A&M University, College Station, TX, 77843, USA
- Brookhaven National Laboratory, Upton, NY, 11973, USA
| | - Kristofer G Reyes
- University at Buffalo, Buffalo, NY, 14260, USA
- Brookhaven National Laboratory, Upton, NY, 11973, USA
| | | |
|
4
|
Qian X, Yoon BJ, Arróyave R, Qian X, Dougherty ER. Knowledge-driven learning, optimization, and experimental design under uncertainty for materials discovery. PATTERNS (NEW YORK, N.Y.) 2023; 4:100863. [PMID: 38035192 PMCID: PMC10682757 DOI: 10.1016/j.patter.2023.100863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2023]
Abstract
Significant acceleration of the future discovery of novel functional materials requires a fundamental shift from the current materials discovery practice, which is heavily dependent on trial-and-error campaigns and high-throughput screening, to one that builds on knowledge-driven advanced informatics techniques enabled by the latest advances in signal processing and machine learning. In this review, we discuss the major research issues that need to be addressed to expedite this transformation along with the salient challenges involved. We especially focus on Bayesian signal processing and machine learning schemes that are uncertainty aware and physics informed for knowledge-driven learning, robust optimization, and efficient objective-driven experimental design.
Affiliation(s)
- Xiaoning Qian
- Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
| | - Byung-Jun Yoon
- Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
| | - Raymundo Arróyave
- Department of Materials Science & Engineering, Texas A&M University, College Station, TX 77843, USA
| | - Xiaofeng Qian
- Department of Materials Science & Engineering, Texas A&M University, College Station, TX 77843, USA
| | - Edward R. Dougherty
- Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX 77843, USA
|
5
|
Imani M, Ghoreishi SF. Graph-Based Bayesian Optimization for Large-Scale Objective-Based Experimental Design. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:5913-5925. [PMID: 33877989 DOI: 10.1109/tnnls.2021.3071958] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
Design is an inseparable part of most scientific and engineering tasks, including real and simulation-based experimental design processes and parameter/hyperparameter tuning/optimization. Several model-based experimental design techniques have been developed for design in domains with partially available knowledge about the underlying process. This article focuses on a powerful class of model-based experimental design called the mean objective cost of uncertainty (MOCU). The MOCU-based techniques are objective-based, meaning that they take the main objective of the process into account during the experimental design process. However, the lack of scalability of MOCU-based techniques prevents their application to most practical problems, including large discrete or combinatorial spaces. To achieve a scalable objective-based experimental design, this article proposes a graph-based MOCU-based Bayesian optimization framework. The correlations among samples in the large design space are accounted for using a graph-based Gaussian process, and an efficient closed-form sequential selection is achieved through the well-known expected improvement policy. The proposed framework's performance is assessed through structural intervention in gene regulatory networks, aiming to drive the network away from the states associated with cancer.
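To make the "objective-based" idea concrete, the following is a minimal sketch of MOCU for a tiny discrete uncertainty class. It does not reproduce the graph-based Gaussian process or the expected improvement policy proposed in the article. It assumes a finite set of candidate models, a finite set of actions, and experiments that reveal which block of a partition contains the true model; the cost table, the experiment names `X` and `Y`, and all function names are hypothetical.

```python
def robust_action(prior, cost):
    """Action with the lowest prior-expected cost.

    cost[k][a] is the cost of taking action a when model k is the true one.
    """
    n_actions = len(cost[0])
    expected = [sum(p * cost[k][a] for k, p in enumerate(prior)) for a in range(n_actions)]
    return min(range(n_actions), key=expected.__getitem__)


def mocu(prior, cost):
    """Expected extra cost of the robust action relative to each model's own best action."""
    a_robust = robust_action(prior, cost)
    return sum(p * (cost[k][a_robust] - min(cost[k])) for k, p in enumerate(prior))


def expected_remaining_mocu(prior, cost, partition):
    """Expected MOCU after an experiment that reveals which block holds the true model."""
    total = 0.0
    for block in partition:
        mass = sum(prior[k] for k in block)
        if mass == 0:
            continue
        # Posterior restricted to the models in this block.
        posterior = [prior[k] / mass if k in block else 0.0 for k in range(len(prior))]
        total += mass * mocu(posterior, cost)
    return total


# Hypothetical uncertainty class: three models, two actions.
prior = [0.4, 0.3, 0.3]
cost = [[0, 1], [1, 0], [1, 0]]
experiments = {
    "X": [[0], [1, 2]],  # separates model 0 from the rest
    "Y": [[0, 1], [2]],  # separates model 2 from the rest
}
current = mocu(prior, cost)
best = min(experiments, key=lambda e: expected_remaining_mocu(prior, cost, experiments[e]))
```

Experiment `X` is preferred here because it separates the models that disagree about the best action, whereas `Y` resolves a distinction that matters less for the objective.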
|
6
|
Maddouri O, Qian X, Alexander FJ, Dougherty ER, Yoon BJ. Robust importance sampling for error estimation in the context of optimal Bayesian transfer learning. PATTERNS (NEW YORK, N.Y.) 2022; 3:100428. [PMID: 35510184 PMCID: PMC9058919 DOI: 10.1016/j.patter.2021.100428] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/17/2021] [Revised: 10/13/2021] [Accepted: 12/16/2021] [Indexed: 11/21/2022]
Abstract
Classification has been a major task for building intelligent systems because it enables decision-making under uncertainty. Classifier design aims at building models from training data for representing feature-label distributions, either explicitly or implicitly. In many scientific or clinical settings, training data are typically limited, which impedes the design and evaluation of accurate classifiers. Although transfer learning can improve the learning in target domains by incorporating data from relevant source domains, it has received little attention for performance assessment, notably in error estimation. Here, we investigate knowledge transferability in the context of classification error estimation within a Bayesian paradigm. We introduce a class of Bayesian minimum mean-square error estimators for optimal Bayesian transfer learning, which enables rigorous evaluation of classification error under uncertainty in small-sample settings. Using Monte Carlo importance sampling, we illustrate the outstanding performance of the proposed estimator for a broad family of classifiers that span diverse learning capabilities.
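For readers unfamiliar with the sampling technique named above, here is a generic sketch of Monte Carlo importance sampling. It is not the estimator introduced in the article: it simply estimates a small tail probability of a standard normal variable by drawing from a shifted proposal and reweighting each draw by the density ratio. The function names, the proposal choice, and the sample size are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean=0.0, std=1.0):
    """Density of a normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_tail(threshold, n_samples, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1).

    Samples come from the proposal N(threshold, 1), which places about half of
    its mass in the region of interest; each hit is weighted by target/proposal.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += normal_pdf(x) / normal_pdf(x, mean=threshold)
    return total / n_samples


estimate = importance_sampling_tail(3.0, 20000)
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # closed-form value for comparison
```

Plain Monte Carlo with the same number of draws would see only a handful of samples beyond the threshold, which is why a proposal concentrated on the rare region gives a much lower-variance estimate.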
Affiliation(s)
- Omar Maddouri
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
| | - Xiaoning Qian
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
| | - Francis J. Alexander
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
| | - Edward R. Dougherty
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
|
7
|
Coveney PV, Highfield RR. From digital hype to analogue reality: Universal simulation beyond the quantum and exascale eras. JOURNAL OF COMPUTATIONAL SCIENCE 2020; 46:101093. [PMID: 33312270 PMCID: PMC7709487 DOI: 10.1016/j.jocs.2020.101093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/10/2020] [Accepted: 03/03/2020] [Indexed: 05/23/2023]
Abstract
Many believe that the future of innovation lies in simulation. However, as computers are becoming ever more powerful, so does the hyperbole used to discuss their potential in modelling across a vast range of domains, from subatomic physics to chemistry, climate science, epidemiology, economics and cosmology. As we are about to enter the era of quantum and exascale computing, machine learning and artificial intelligence have entered the field in a significant way. In this article we give a brief history of simulation, discuss how machine learning can be more powerful if underpinned by deeper mechanistic understanding, outline the potential of exascale and quantum computing, highlight the limits of digital computing - classical and quantum - and distinguish rhetoric from reality in assessing the future of modelling and simulation, when we believe analogue computing will play an increasingly important role.
Affiliation(s)
- Peter V. Coveney
- Centre for Computational Science, University College London, Gordon Street, London, WC1H 0AJ, UK
- Institute for Informatics, Science Park 904, University of Amsterdam, 1098 XH, Amsterdam, Netherlands
| | | |
|
8
|
Elreedy D, Atiya AF, Shaheen SI. A Novel Active Learning Regression Framework for Balancing the Exploration-Exploitation Trade-Off. ENTROPY 2019; 21:e21070651. [PMID: 33267365 PMCID: PMC7515147 DOI: 10.3390/e21070651] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/31/2019] [Revised: 06/28/2019] [Accepted: 06/28/2019] [Indexed: 11/16/2022]
Abstract
Active learning has recently been considered a promising approach for data acquisition due to the significant cost of the data labeling process in many real-world applications, such as natural language processing and image processing. Most active learning methods are designed merely to enhance the accuracy of the learning model. However, model accuracy may not be the primary goal, and there could be other domain-specific objectives to be optimized. In this work, we develop a novel active learning framework that aims to solve a general class of optimization problems. The proposed framework mainly targets optimization problems exposed to the exploration-exploitation trade-off. The active learning framework is comprehensive: it includes exploration-based, exploitation-based, and balancing strategies that seek to achieve a balance between exploration and exploitation. The paper mainly considers regression tasks, as they are under-researched in the active learning field compared to classification tasks. Furthermore, we investigate the different active querying approaches (pool-based and query synthesis) and compare them. We apply the proposed framework to the problem of learning the price-demand function, an application that is important in optimal product pricing and dynamic (or time-varying) pricing. In our experiments, we provide a comparative study including the proposed framework strategies and some other baselines. The results demonstrate significant performance for the proposed methods.
|
9
|
Dougherty ER. A Nonmathematical Review of Optimal Operator and Experimental Design for Uncertain Scientific Models with Application to Genomics. Curr Genomics 2019; 20:16-23. [PMID: 31015788 PMCID: PMC6446484 DOI: 10.2174/1389202919666181213095743] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/26/2018] [Revised: 12/05/2018] [Accepted: 12/10/2018] [Indexed: 11/22/2022]
Abstract
Introduction: The most basic aspect of modern engineering is the design of operators to act on physical systems in an optimal manner relative to a desired objective – for instance, designing a control policy to autonomously direct a system or designing a classifier to make decisions regarding the system. These kinds of problems appear in biomedical science, where physical models are created with the intention of using them to design tools for diagnosis, prognosis, and therapy. Methods: In the classical paradigm, our knowledge regarding the model is certain; however, in practice, especially with complex systems, our knowledge is uncertain and operators must be designed while taking this uncertainty into account. The related concepts of intrinsically Bayesian robust operators and optimal Bayesian operators treat operator design under uncertainty. An objective-based experimental design procedure is naturally related to operator design: We would like to perform an experiment that maximally reduces our uncertainty as it pertains to our objective. Results & Discussion: This paper provides a nonmathematical review of optimal Bayesian operators directed at biomedical scientists. It considers two applications important to genomics, structural intervention in gene regulatory networks and classification. Conclusion: The salient point regarding intrinsically Bayesian operators is that uncertainty is quantified relative to the scientific model, and the prior distribution is on the parameters of this model. Optimization has direct physical (biological) meaning. This is opposed to the common method of placing prior distributions on the parameters of the operator, in which case there is a scientific gap between operator design and the phenomena.
Affiliation(s)
- Edward R Dougherty
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
|
10
|
Zhang W, Li W, Zhang J, Wang N. Optimal parameter identification of synthetic gene networks using harmony search algorithm. PLoS One 2019; 14:e0213977. [PMID: 30925150 PMCID: PMC6440652 DOI: 10.1371/journal.pone.0213977] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/08/2018] [Accepted: 02/09/2019] [Indexed: 12/03/2022]
Abstract
Computational modeling of engineered gene circuits is an important yet challenging task in systems biology. In order to describe and predict the response behaviors of genetic circuits using reliable model parameters, this paper applies an optimal experimental design (OED) method to obtain input signals. In order to obtain informative observations, this study focuses on maximizing Fisher information matrix (FIM)-based optimality criteria to provide optimal inputs. Furthermore, this paper designs a two-stage optimization with the modified E-optimal criteria and applies a harmony search (HS)-based OED algorithm to minimize estimation errors. The proposed optimal identification methodology involves estimation errors and the sample size to pursue a trade-off between estimation accuracy and measurement cost in modeling gene networks. The designed cost function takes two major factors into account, in which experimental costs are proportional to the number of time points. The experiments use two types of synthetic genetic networks to validate the effectiveness of the proposed HS-OED approach. Identification outcomes and analysis indicate that the proposed HS-OED method outperforms two candidate OED approaches, with reduced computational effort.
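The FIM-based criterion mentioned above can be sketched on a toy model. This is an illustrative assumption-laden example, not the paper's HS-OED algorithm: it uses the standard E-optimality criterion (maximize the smallest eigenvalue of the FIM) rather than the modified criterion, it compares two fixed sampling schedules instead of running a harmony search, and the response model `y(t) = a * (1 - exp(-b * t))`, the parameter values, and the design names are hypothetical.

```python
import math


def sensitivities(t, a, b):
    """Partial derivatives of y(t) = a * (1 - exp(-b * t)) with respect to a and b."""
    return (1.0 - math.exp(-b * t), a * t * math.exp(-b * t))


def fisher_information(times, a, b, noise_std=1.0):
    """2x2 FIM for independent Gaussian measurement noise, returned as (p, q, r).

    The matrix is [[p, q], [q, r]], the sum over time points of s s^T / noise_std^2.
    """
    p = q = r = 0.0
    for t in times:
        s_a, s_b = sensitivities(t, a, b)
        p += s_a * s_a
        q += s_a * s_b
        r += s_b * s_b
    scale = 1.0 / noise_std ** 2
    return p * scale, q * scale, r * scale


def smallest_eigenvalue(p, q, r):
    """Smallest eigenvalue of the symmetric matrix [[p, q], [q, r]]."""
    half_trace = 0.5 * (p + r)
    return half_trace - math.sqrt((0.5 * (p - r)) ** 2 + q * q)


# Hypothetical nominal parameters and candidate sampling schedules.
a_nominal, b_nominal = 2.0, 1.0
designs = {
    "early": [0.1, 0.2, 0.3],   # all samples on the initial rise
    "spread": [0.5, 2.0, 8.0],  # samples on the rise and at saturation
}
scores = {
    name: smallest_eigenvalue(*fisher_information(times, a_nominal, b_nominal))
    for name, times in designs.items()
}
best = max(scores, key=scores.get)
```

The early-only schedule yields nearly collinear sensitivity vectors, so its FIM is close to singular and the two parameters are hard to separate; spreading the samples improves the smallest eigenvalue considerably.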
Affiliation(s)
- Wei Zhang
- Institute of Cyber-Systems and Control, Department of Control and Engineering, Zhejiang University, Hangzhou, China
| | - Wenchao Li
- Institute of Cyber-Systems and Control, Department of Control and Engineering, Zhejiang University, Hangzhou, China
| | - Jianming Zhang
- Institute of Cyber-Systems and Control, Department of Control and Engineering, Zhejiang University, Hangzhou, China
| | - Ning Wang
- Institute of Cyber-Systems and Control, Department of Control and Engineering, Zhejiang University, Hangzhou, China
|
11
|
Trinh HC, Kwon YK. RMut: R package for a Boolean sensitivity analysis against various types of mutations. PLoS One 2019; 14:e0213736. [PMID: 30889216 PMCID: PMC6424452 DOI: 10.1371/journal.pone.0213736] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/18/2018] [Accepted: 02/27/2019] [Indexed: 12/13/2022]
Abstract
There have been many in silico studies based on a Boolean network model to investigate network sensitivity against gene or interaction mutations. However, there are no proper tools to examine the network sensitivity against many different types of mutations, including user-defined ones. To address this issue, we developed RMut, which is an R package to analyze the Boolean network-based sensitivity by efficiently employing not only many well-known node-based and edgetic mutations but also novel user-defined mutations. In addition, RMut can specify the mutation area and the duration time for more precise analysis. RMut can be used to analyze large-scale networks because it is implemented in a parallel algorithm using the OpenCL library. In the first case study, we observed that the real biological networks were most sensitive to overexpression/state-flip and edge-addition/-reverse mutations among node-based and edgetic mutations, respectively. In the second case study, we showed that edgetic mutations can predict drug-targets better than node-based mutations. Finally, we examined the network sensitivity to double edge-removal mutations and found an interesting synergistic effect. Taken together, these findings indicate that RMut is a flexible R package to efficiently analyze network sensitivity to various types of mutations. RMut is available at https://github.com/csclab/RMut.
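The notion of Boolean network sensitivity to a mutation can be sketched as follows. This does not use RMut or its API (RMut is an R package, and none of its functions are shown here). It is a hypothetical Python illustration that assumes synchronous updates, a knockout mutation that pins one node to 0, and a sensitivity score defined as the fraction of initial states whose attractor changes under the mutation. The three-node network and all function names are invented for the example.

```python
from itertools import product


def step(state, rules, knocked_out=None):
    """One synchronous update; a knocked-out node is forced to 0."""
    nxt = tuple(rule(state) for rule in rules)
    if knocked_out is not None:
        nxt = nxt[:knocked_out] + (0,) + nxt[knocked_out + 1:]
    return nxt


def attractor(state, rules, knocked_out=None):
    """Set of states on the cycle eventually reached from `state`."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state, rules, knocked_out)
    # Everything from the first repeated state onward lies on the cycle.
    return frozenset(seen[seen.index(state):])


def knockout_sensitivity(rules, node):
    """Fraction of initial states whose attractor differs after knocking out `node`."""
    n = len(rules)
    changed = sum(
        attractor(init, rules) != attractor(init, rules, knocked_out=node)
        for init in product((0, 1), repeat=n)
    )
    return changed / 2 ** n


# Hypothetical network: A copies B, B copies A, C is (A AND B).
rules = [
    lambda s: s[1],
    lambda s: s[0],
    lambda s: s[0] & s[1],
]
sensitivity_a = knockout_sensitivity(rules, 0)
sensitivity_c = knockout_sensitivity(rules, 2)
```

Knocking out the upstream node A disrupts more attractors than knocking out the downstream node C, which matches the intuition that mutations in regulators propagate further. Exhaustive enumeration is only feasible for small networks, so larger ones would need sampled initial states.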
Affiliation(s)
- Hung-Cuong Trinh
- Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
| | - Yung-Keun Kwon
- Department of Electrical/Electronic and Computer Engineering, University of Ulsan, Nam-gu, Ulsan, Korea
|
12
|
Haque S, Ahmad JS, Clark NM, Williams CM, Sozzani R. Computational prediction of gene regulatory networks in plant growth and development. CURRENT OPINION IN PLANT BIOLOGY 2019; 47:96-105. [PMID: 30445315 DOI: 10.1016/j.pbi.2018.10.005] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 08/15/2018] [Revised: 10/05/2018] [Accepted: 10/18/2018] [Indexed: 05/22/2023]
Abstract
Plants integrate a wide range of cellular, developmental, and environmental signals to regulate complex patterns of gene expression. Recent advances in genomic technologies enable differential gene expression analysis at a systems level, allowing for improved inference of the network of regulatory interactions between genes. These gene regulatory networks, or GRNs, are used to visualize the causal regulatory relationships between regulators and their downstream target genes. Accordingly, these GRNs can represent spatial, temporal, and/or environmental regulations and can identify functional genes. This review summarizes recent computational approaches applied to different types of gene expression data to infer GRNs in the context of plant growth and development. Three stages of GRN inference are described: first, data collection and analysis based on the dataset type; second, network inference application based on data availability and proposed hypotheses; and third, validation based on in silico, in vivo, and in planta methods. In addition, this review relates data collection strategies to biological questions, organizes inference algorithms based on statistical methods and data types, discusses experimental design considerations, and provides guidelines for GRN inference with an emphasis on the benefits of integrative approaches, especially when a priori information is limited. Finally, this review concludes that computational frameworks integrating large-scale heterogeneous datasets are needed for a more accurate (e.g. fewer false interactions), detailed (e.g. discrimination between direct versus indirect interactions), and comprehensive (e.g. genetic regulation under various conditions and spatial locations) inference of GRNs.
Affiliation(s)
- Samiul Haque: Electrical and Computer Engineering, North Carolina State University, Raleigh, USA
- Jabeen S Ahmad: Plant and Microbial Biology, North Carolina State University, Raleigh, USA
- Natalie M Clark: Plant and Microbial Biology, North Carolina State University, Raleigh, USA
- Cranos M Williams: Electrical and Computer Engineering, North Carolina State University, Raleigh, USA
- Rosangela Sozzani: Plant and Microbial Biology, North Carolina State University, Raleigh, USA
13
Dehghannasiri R, Shahrokh Esfahani M, Dougherty ER. An experimental design framework for Markovian gene regulatory networks under stationary control policy. BMC Systems Biology 2018; 12:137. [PMID: 30577732 PMCID: PMC6302376 DOI: 10.1186/s12918-018-0649-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/03/2022]
Abstract
BACKGROUND A fundamental problem for translational genomics is to find optimal therapies based on gene regulatory intervention. Dynamic intervention involves a control policy that optimally reduces a cost function based on phenotype by externally altering the state of the network over time. When a gene regulatory network (GRN) model is fully known, the problem is addressed using classical dynamic programming based on the Markov chain associated with the network. When the network is uncertain, a Bayesian framework can be applied, where policy optimality is with respect to both the dynamical objective and the uncertainty, as characterized by a prior distribution. In the presence of uncertainty, it is of great practical interest to develop an experimental design strategy and thereby select experiments that optimally reduce a measure of uncertainty. RESULTS In this paper, we employ mean objective cost of uncertainty (MOCU), which quantifies uncertainty based on the degree to which uncertainty degrades the operational objective, that being the cost owing to undesirable phenotypes. We assume that a number of conditional probabilities characterizing regulatory relationships among genes are unknown in the Markovian GRN. In sum, there is a prior distribution which can be updated to a posterior distribution by observing a regulatory trajectory, and an optimal control policy, known as an "intrinsically Bayesian robust" (IBR) policy. To obtain a better IBR policy, we select an experiment that minimizes the MOCU remaining after applying its output to the network. At this point, we can either stop and find the resulting IBR policy or proceed to determine more unknown conditional probabilities via regulatory observation and find the IBR policy from the resulting posterior distribution. For sequential experimental design this entire process is iterated. 
Owing to the computational complexity of experimental design, which requires computation of many potential IBR policies, we implement an approximate method utilizing mean first passage times (MFPTs) - but only in experimental design, the final policy being an IBR policy. CONCLUSIONS Comprehensive performance analysis based on extensive simulations on synthetic and real GRNs demonstrates the efficacy of the proposed method, including the accuracy and computational advantage of the approximate MFPT-based design.
Affiliation(s)
- Edward R. Dougherty: Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843 TX USA; Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, 77845 TX USA
14
Imani M, Dehghannasiri R, Braga-Neto UM, Dougherty ER. Sequential Experimental Design for Optimal Structural Intervention in Gene Regulatory Networks Based on the Mean Objective Cost of Uncertainty. Cancer Inform 2018; 17:1176935118790247. [PMID: 30093796 PMCID: PMC6080085 DOI: 10.1177/1176935118790247] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 04/03/2018] [Accepted: 06/25/2018] [Indexed: 11/16/2022] Open
Abstract
Scientists are attempting to use models of ever-increasing complexity, especially in medicine, where gene-based diseases such as cancer require better modeling of cell regulation. Complex models suffer from uncertainty and experiments are needed to reduce this uncertainty. Because experiments can be costly and time-consuming, it is desirable to determine experiments providing the most useful information. If a sequence of experiments is to be performed, experimental design is needed to determine the order. A classical approach is to maximally reduce the overall uncertainty in the model, meaning maximal entropy reduction. A recently proposed method takes into account both model uncertainty and the translational objective, for instance, optimal structural intervention in gene regulatory networks, where the aim is to alter the regulatory logic to maximally reduce the long-run likelihood of being in a cancerous state. The mean objective cost of uncertainty (MOCU) quantifies uncertainty based on the degree to which model uncertainty affects the objective. Experimental design involves choosing the experiment that yields the greatest reduction in MOCU. This article introduces finite-horizon dynamic programming for MOCU-based sequential experimental design and compares it with the greedy approach, which selects one experiment at a time without consideration of the full horizon of experiments. A salient aspect of the article is that it demonstrates the advantage of MOCU-based design over the widely used entropy-based design for both greedy and dynamic programming strategies and investigates the effect of model conditions on the comparative performances.
Affiliation(s)
- Mahdi Imani: Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX, USA
- Ulisses M Braga-Neto: Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX, USA; Center for Bioinformatics and Genomic Systems Engineering, Texas A&M Engineering Experiment Station (TEES), College Station, TX, USA
- Edward R Dougherty: Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX, USA; Center for Bioinformatics and Genomic Systems Engineering, Texas A&M Engineering Experiment Station (TEES), College Station, TX, USA
15
Mohsenizadeh DN, Dehghannasiri R, Dougherty ER. Optimal Objective-Based Experimental Design for Uncertain Dynamical Gene Networks with Experimental Error. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:218-230. [PMID: 27576263 PMCID: PMC5845823 DOI: 10.1109/tcbb.2016.2602873] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 05/18/2023]
Abstract
In systems biology, network models are often used to study interactions among cellular components, a salient aim being to develop drugs and therapeutic mechanisms to change the dynamical behavior of the network to avoid undesirable phenotypes. Owing to limited knowledge, model uncertainty is commonplace and network dynamics can be updated in different ways, thereby giving multiple dynamic trajectories, that is, dynamics uncertainty. In this manuscript, we propose an experimental design method that can effectively reduce the dynamics uncertainty and improve performance in an interaction-based network. Both dynamics uncertainty and experimental error are quantified with respect to the modeling objective, herein, therapeutic intervention. The aim of experimental design is to select among a set of candidate experiments the experiment whose outcome, when applied to the network model, maximally reduces the dynamics uncertainty pertinent to the intervention objective.
16
Sverchkov Y, Craven M. A review of active learning approaches to experimental design for uncovering biological networks. PLoS Comput Biol 2017; 13:e1005466. [PMID: 28570593 PMCID: PMC5453429 DOI: 10.1371/journal.pcbi.1005466] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 11/18/2022] Open
Abstract
Various types of biological knowledge describe networks of interactions among elementary entities. For example, transcriptional regulatory networks consist of interactions among proteins and genes. Current knowledge about the exact structure of such networks is highly incomplete, and laboratory experiments that manipulate the entities involved are conducted to test hypotheses about these networks. In recent years, various automated approaches to experiment selection have been proposed. Many of these approaches can be characterized as active machine learning algorithms. Active learning is an iterative process in which a model is learned from data, hypotheses are generated from the model to propose informative experiments, and the experiments yield new data that is used to update the model. This review describes the various models, experiment selection strategies, validation techniques, and successful applications described in the literature; highlights common themes and notable distinctions among methods; and identifies likely directions of future research and open problems in the area.
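The iterative loop described in this abstract (learn a model, propose an informative experiment, observe the outcome, update the model) can be sketched in a few lines. This is only an illustration under strong assumptions, not a method taken from the review: the model is a table of independent edge probabilities, the selection rule is plain uncertainty sampling (pick the edge with the highest binary entropy), and experiments are assumed to be noiseless. The names `active_learning`, `run_experiment`, and the toy edges are hypothetical.

```python
import math


def entropy(p):
    """Binary entropy (in bits) of an edge believed present with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def active_learning(beliefs, run_experiment, budget):
    """Repeatedly test the most uncertain edge and update the belief table.

    beliefs: dict mapping an edge to the current probability that it exists.
    run_experiment: callable(edge) -> bool, assumed noiseless in this sketch.
    budget: maximum number of experiments to run.
    """
    beliefs = dict(beliefs)  # work on a copy of the caller's table
    history = []
    for _ in range(budget):
        # Experiment selection: the edge whose status is least certain.
        edge = max(beliefs, key=lambda e: entropy(beliefs[e]))
        if entropy(beliefs[edge]) == 0.0:
            break  # nothing left to learn
        outcome = run_experiment(edge)
        # Model update: a noiseless outcome resolves the edge completely.
        beliefs[edge] = 1.0 if outcome else 0.0
        history.append(edge)
    return beliefs, history


# Toy network: three candidate regulatory edges and a hidden ground truth.
true_edges = {("A", "B")}
initial = {("A", "B"): 0.5, ("B", "C"): 0.9, ("A", "C"): 0.2}
final_beliefs, tested = active_learning(initial, lambda e: e in true_edges, budget=2)
```

With a budget of two experiments, the loop tests `("A", "B")` first (probability 0.5, maximal entropy) and then `("A", "C")`, leaving the fairly confident `("B", "C")` belief untouched. Real approaches surveyed in the review use richer models and selection criteria; the entropy rule here is just the simplest stand-in.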
Affiliation(s)
- Yuriy Sverchkov: Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Mark Craven: Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
17
Coveney PV, Dougherty ER, Highfield RR. Big data need big theory too. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2016; 374:20160153. [PMID: 27698035 PMCID: PMC5052735 DOI: 10.1098/rsta.2016.0153] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Accepted: 06/17/2016] [Indexed: 05/07/2023]
Abstract
The current interest in big data, machine learning and data analytics has generated the widespread impression that such methods are capable of solving most problems without the need for conventional scientific methods of inquiry. Interest in these methods is intensifying, accelerated by the ease with which digitized data can be acquired in virtually all fields of endeavour, from science, healthcare and cybersecurity to economics, social sciences and the humanities. In multiscale modelling, machine learning appears to provide a shortcut to reveal correlations of arbitrary complexity between processes at the atomic, molecular, meso- and macroscales. Here, we point out the weaknesses of pure big data approaches with particular focus on biology and medicine, which fail to provide conceptual accounts for the processes to which they are applied. No matter their 'depth' and the sophistication of data-driven methods, such as artificial neural nets, in the end they merely fit curves to existing data. Not only do these methods invariably require far larger quantities of data than anticipated by big data aficionados in order to produce statistically reliable results, but they can also fail in circumstances beyond the range of the data used to train them because they are not designed to model the structural characteristics of the underlying system. We argue that it is vital to use theory as a guide to experimental design for maximal efficiency of data collection and to produce reliable predictive models and conceptual knowledge. Rather than continuing to fund, pursue and promote 'blind' big data projects with massive budgets, we call for more funding to be allocated to the elucidation of the multiscale and stochastic processes controlling the behaviour of complex systems, including those of life, medicine and healthcare. This article is part of the themed issue 'Multiscale modelling at the physics-chemistry-biology interface'.
Affiliation(s)
- Peter V Coveney: Centre for Computational Science, University College London, Gordon Street, London WC1H 0AJ, UK
- Edward R Dougherty: Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77843-31283, USA
18
Dalton LA, Yousefi MR. Data Requirements for Model-Based Cancer Prognosis Prediction. Cancer Inform 2016; 14:123-38. [PMID: 27127404 PMCID: PMC4844301 DOI: 10.4137/cin.s30801] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/09/2015] [Revised: 02/02/2016] [Accepted: 02/07/2016] [Indexed: 11/20/2022] Open
Abstract
Cancer prognosis prediction is typically carried out without integrating scientific knowledge available on genomic pathways, the effect of drugs on cell dynamics, or modeling mutations in the population. Recent work addresses some of these problems by formulating an uncertainty class of Boolean regulatory models for abnormal gene regulation, assigning prognosis scores to each network based on intervention outcomes, and partitioning networks in the uncertainty class into prognosis classes based on these scores. For a new patient, the probability distribution of the prognosis class was evaluated using optimal Bayesian classification, given patient data. It was assumed that (1) disease is the result of several mutations of a known healthy network and that these mutations and their probability distribution in the population are known and (2) only a single snapshot of the patient's gene activity profile is observed. It was shown that, even in ideal settings where cancer in the population and the effect of a drug are fully modeled, a single static measurement is typically not sufficient. Here, we study what measurements are sufficient to predict prognosis. In particular, we relax assumption (1) by addressing how population data may be used to estimate network probabilities, and extend assumption (2) to include static and time-series measurements of both population and patient data. Furthermore, we extend the prediction of prognosis classes to optimal Bayesian regression of prognosis metrics. Even when time-series data is preferable to infer a stochastic dynamical network, we show that static data can be superior for prognosis prediction when constrained to small samples. Furthermore, although population data is helpful, performance is not sensitive to inaccuracies in the estimated network probabilities.
Affiliation(s)
- Lori A. Dalton: Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA; Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
19
Mohsenizadeh DN, Hua J, Bittner M, Dougherty ER. Dynamical modeling of uncertain interaction-based genomic networks. BMC Bioinformatics 2015; 16 Suppl 13:S3. [PMID: 26423606 PMCID: PMC4596957 DOI: 10.1186/1471-2105-16-s13-s3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Most dynamical models for genomic networks are built upon two current methodologies, one process-based and the other based on Boolean-type networks. Both are problematic when it comes to experimental design purposes in the laboratory. The first approach requires a comprehensive knowledge of the parameters involved in all biological processes a priori, whereas the results from the second method may not have a biological correspondence and thus cannot be tested in the laboratory. Moreover, the current methods cannot readily utilize existing curated knowledge databases and do not consider uncertainty in the knowledge. Therefore, a new methodology is needed that can generate a dynamical model based on available biological data, assuming uncertainty, while the results from experimental design can be examined in the laboratory. RESULTS We propose a new methodology for dynamical modeling of genomic networks that can utilize the interaction knowledge provided in public databases. The model assigns discrete states for physical entities, sets priorities among interactions based on information provided in the database, and updates each interaction based on associated node states. Whenever uncertainty in dynamics arises, it explores all possible outcomes. By using the proposed model, biologists can study regulation networks that are too complex for manual analysis. CONCLUSIONS The proposed approach can be effectively used for constructing dynamical models of interaction-based genomic networks without requiring a complete knowledge of all parameters affecting the network dynamics, and thus based on a small set of available data.
20
Dehghannasiri R, Yoon BJ, Dougherty ER. Efficient experimental design for uncertainty reduction in gene regulatory networks. BMC Bioinformatics 2015; 16 Suppl 13:S2. [PMID: 26423515 PMCID: PMC4597030 DOI: 10.1186/1471-2105-16-s13-s2] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 12/22/2022] Open
Abstract
Background An accurate understanding of interactions among genes plays a major role in developing therapeutic intervention methods. Gene regulatory networks often contain a significant amount of uncertainty. The process of prioritizing biological experiments to reduce the uncertainty of gene regulatory networks is called experimental design. Under such a strategy, the experiments with high priority are suggested to be conducted first. Results The authors have already proposed an optimal experimental design method based upon the objective for modeling gene regulatory networks, such as deriving therapeutic interventions. The experimental design method utilizes the concept of mean objective cost of uncertainty (MOCU). MOCU quantifies the expected increase of cost resulting from uncertainty. The optimal experiment to be conducted first is the one which leads to the minimum expected remaining MOCU subsequent to the experiment. In the process, one must find the optimal intervention for every gene regulatory network compatible with the prior knowledge, which can be prohibitively expensive when the size of the network is large. In this paper, we propose a computationally efficient experimental design method. This method incorporates a network reduction scheme by introducing a novel cost function that takes into account the disruption in the ranking of potential experiments. We then estimate the approximate expected remaining MOCU at a lower computational cost using the reduced networks. Conclusions Simulation results based on synthetic and real gene regulatory networks show that the proposed approximate method has close performance to that of the optimal method but at lower computational cost. The proposed approximate method also outperforms the random selection policy significantly. A MATLAB software implementing the proposed experimental design method is available at http://gsp.tamu.edu/Publications/supplementary/roozbeh15a/.
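The two quantities named in this abstract, MOCU as the expected increase in cost caused by uncertainty and the expected remaining MOCU used to rank experiments, can be made concrete on a toy problem. The sketch below is not the authors' MATLAB implementation and does not include their network reduction scheme. It assumes a small finite uncertainty class of models indexed by binary parameters, a hypothetical cost table standing in for the intervention cost, and experiments that each reveal one parameter exactly (no experimental error). All function and variable names are illustrative.

```python
def mocu(prior, cost):
    """Mean objective cost of uncertainty for a finite uncertainty class.

    prior: dict mapping a model (tuple of parameter values) to its probability.
    cost:  dict mapping a model to a list of costs, one per candidate action.
    """
    n_actions = len(next(iter(cost.values())))
    # Expected cost of each action over the uncertainty class.
    expected = [sum(p * cost[m][a] for m, p in prior.items())
                for a in range(n_actions)]
    robust_cost = min(expected)  # best single action under uncertainty
    # Expected cost if every model could receive its own optimal action.
    ideal_cost = sum(p * min(cost[m]) for m, p in prior.items())
    return robust_cost - ideal_cost


def expected_remaining_mocu(prior, cost, param_index):
    """Average MOCU left after an experiment reveals one parameter exactly."""
    total = 0.0
    for value in {m[param_index] for m in prior}:
        mass = sum(p for m, p in prior.items() if m[param_index] == value)
        if mass == 0.0:
            continue
        # Posterior: condition the prior on the observed parameter value.
        posterior = {m: p / mass for m, p in prior.items()
                     if m[param_index] == value}
        total += mass * mocu(posterior, cost)
    return total


def best_experiment(prior, cost, n_params):
    """Index of the experiment with minimum expected remaining MOCU."""
    return min(range(n_params),
               key=lambda i: expected_remaining_mocu(prior, cost, i))


# Toy uncertainty class: two unknown binary parameters (a, b), uniform prior.
# Only parameter a affects the cost of the two candidate actions (assumption).
models = [(a, b) for a in (0, 1) for b in (0, 1)]
prior = {m: 0.25 for m in models}
cost = {(a, b): [float(a), float(1 - a)] for (a, b) in models}

current_mocu = mocu(prior, cost)
choice = best_experiment(prior, cost, n_params=2)
```

In this toy setting both parameters are equally uncertain, yet only the first one matters for the objective: measuring it drives the remaining MOCU to zero, while measuring the second leaves it at its starting value of 0.5, so the first experiment is selected. The exhaustive loop over models is what becomes prohibitively expensive for large networks, which is the cost the abstract's approximate method aims to reduce.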