1. Wu H, Shi W, Wang MD. Developing a novel causal inference algorithm for personalized biomedical causal graph learning using meta machine learning. BMC Med Inform Decis Mak 2024; 24:137. [PMID: 38802809 PMCID: PMC11129385 DOI: 10.1186/s12911-024-02510-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 04/15/2024] [Indexed: 05/29/2024] Open
Abstract
BACKGROUND Modeling causality through graphs, referred to as causal graph learning, offers an appropriate description of the dynamics of causality. The majority of current machine learning models in clinical decision support systems only predict associations between variables, whereas causal graph learning models causality dynamics through graphs. However, building personalized causal graphs for each individual is challenging due to the limited amount of data available for each patient.
METHOD In this study, we present a new algorithmic framework using meta-learning for learning personalized causal graphs in biomedicine. Our framework extracts common patterns from multiple patient graphs and applies this information to develop individualized graphs. In multi-task causal graph learning, the proposed optimized initial guess of shared commonality enables the rapid adoption of knowledge to new tasks for efficient causal graph learning.
RESULTS Experiments on one real-world biomedical causal graph learning benchmark dataset and four synthetic benchmarks show that our algorithm outperformed the baseline methods. Our algorithm can better understand the underlying patterns in the data, leading to more accurate predictions of the causal graph. Specifically, we reduce the structural Hamming distance by 50-75%, indicating an improvement in graph prediction accuracy. Additionally, the false discovery rate is decreased by 20-30%, demonstrating that our algorithm made fewer incorrect predictions compared to the baseline algorithms.
CONCLUSION To the best of our knowledge, this is the first study to demonstrate the effectiveness of meta-learning in personalized causal graph learning and causal inference modeling for biomedicine. In addition, the proposed algorithm can also be generalized to transnational research areas where integrated analysis is necessary for various distributions of datasets, including different clinical institutions.
Affiliation(s)
- Hang Wu
  - Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, USA
- Wenqi Shi
  - Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA
- May D Wang
  - Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, USA
2. Yang L, Lin W, Leng S. Conditional cross-map-based technique: From pairwise dynamical causality to causal network reconstruction. Chaos (Woodbury, N.Y.) 2023; 33:2894465. [PMID: 37276551 DOI: 10.1063/5.0144310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2023] [Accepted: 05/08/2023] [Indexed: 06/07/2023]
Abstract
Causality detection methods based on mutual cross mapping have been fruitfully developed and applied to data originating from nonlinear dynamical systems, where the causes and effects are non-separable. However, when extended to causal network reconstruction, these pairwise methods still struggle to discriminate typical network structures, including common drivers and indirect dependencies, and they face the curse of dimensionality. A few efforts have been devoted to overcoming these shortcomings. Here, we propose a novel method that can be regarded as one of these efforts. Our method, named the conditional cross-map-based technique, can eliminate third-party information and successfully detect direct dynamical causality, where the detection results can be categorized exactly into four standard normal forms by the designed criterion. To demonstrate the practical usefulness of our model-free, data-driven method, we investigate data generated from different representative models covering all kinds of network motifs as well as data measured from real-world systems. Because correct identification of the direct causal links is essential to successfully modeling, predicting, and controlling the underlying complex systems, our method sheds light on the inner working mechanisms of real-world systems using only experimentally obtained data from a variety of disciplines.
Affiliation(s)
- Liufei Yang
  - Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Wei Lin
  - Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
  - School of Mathematical Sciences and Shanghai Centre for Mathematical Sciences, Fudan University, Shanghai 200433, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
- Siyang Leng
  - Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
  - Institute of AI and Robotics, Academy for Engineering and Technology, Fudan University, Shanghai 200433, China
3. Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion on existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
  - School of Mathematics and Statistics, University of Sydney, Australia
- Lexin Li
  - Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
- Haiyan Huang
  - Department of Statistics, University of California, Berkeley
4. Lu J, Dumitrascu B, McDowell IC, Jo B, Barrera A, Hong LK, Leichter SM, Reddy TE, Engelhardt BE. Causal network inference from gene transcriptional time-series response to glucocorticoids. PLoS Comput Biol 2021; 17:e1008223. [PMID: 33513136 PMCID: PMC7875426 DOI: 10.1371/journal.pcbi.1008223] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/04/2019] [Revised: 02/10/2021] [Accepted: 08/07/2020] [Indexed: 11/19/2022] Open
Abstract
Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately enabling regulatory network re-engineering. Network inference from transcriptional time-series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time-series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance and additionally infers whether causal effects are activating or inhibitory. We apply BETS to transcriptional time-series data of differentially-expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: Overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is available as an open source software package at https://github.com/lujonathanh/BETS.
Affiliation(s)
- Jonathan Lu
  - Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
- Bianca Dumitrascu
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Ian C. McDowell
  - Element Genomics, A UCB Company, Durham, North Carolina, United States of America
- Brian Jo
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Alejandro Barrera
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America
  - Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, North Carolina, United States of America
- Linda K. Hong
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America
- Sarah M. Leichter
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America
- Timothy E. Reddy
  - Department of Genome Sciences, Duke University, Durham, North Carolina, United States of America
- Barbara E. Engelhardt
  - Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
  - Center for Statistics and Machine Learning, Princeton University, Princeton, New Jersey, United States of America
5. Djordjilović V, Chiogna M, Romualdi C. Simulating gene silencing through intervention analysis. J R Stat Soc Ser C Appl Stat 2020. [DOI: 10.1111/rssc.12412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
6. Almeida RJ, Adriaans G, Shapovalova Y. Graphical Causal Models and Imputing Missing Data: A Preliminary Study. Information Processing and Management of Uncertainty in Knowledge-Based Systems 2020. [PMCID: PMC7274349 DOI: 10.1007/978-3-030-50146-4_36] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Real-world datasets often contain many missing values due to several reasons. This is usually an issue since many learning algorithms require complete datasets. In certain cases, there are constraints in the real world problem that create difficulties in continuously observing all data. In this paper, we investigate if graphical causal models can be used to impute missing values and derive additional information on the uncertainty of the imputed values. Our goal is to use the information from a complete dataset in the form of graphical causal models to impute missing values in an incomplete dataset. This assumes that the datasets have the same data generating process. Furthermore, we calculate the probability of each missing data value belonging to a specified percentile. We present a preliminary study on the proposed method using synthetic data, where we can control the causal relations and missing values.
7. Causal Queries from Observational Data in Biological Systems via Bayesian Networks: An Empirical Study in Small Networks. Methods Mol Biol 2018. [PMID: 30547398 DOI: 10.1007/978-1-4939-8882-2_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Biological networks are a very convenient modeling and visualization tool to discover knowledge from modern high-throughput genomics and post-genomics data sets. Indeed, biological entities are not isolated but are components of complex multilevel systems. We go one step further and advocate for the consideration of causal representations of the interactions in living systems. We present the causal formalism and bring it out in the context of biological networks, when the data is observational. We also discuss its ability to decipher the causal information flow as observed in gene expression. We also illustrate our exploration by experiments on small simulated networks as well as on a real biological data set.
8. Sethi T, Maheshwari S, Nagori A, Lodha R. Stewarding antibiotic stewardship in intensive care units with Bayesian artificial intelligence. Wellcome Open Res 2018. [DOI: 10.12688/wellcomeopenres.14629.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022] Open
Abstract
Emerging antimicrobial resistance (AMR) is a global threat to life. Injudicious use of antibiotics is the biggest driver of resistance evolution, creating selection pressures on micro-organisms. Intensive care units (ICUs) are the strongest contributors to this pressure, owing to high infection and antibiotic usage rates. Antimicrobial stewardship programs aim to control antibiotic use; however, these are mostly limited to descriptive statistics. Genomic analyses lie at the other extreme of the value-spectrum, and together these factors predispose to siloing of knowledge arising from AMR stewardship. In this study, we bridged the value-gap at a Pediatric ICU by creating Bayesian network (BN) artificial intelligence models with potential impacts on antibiotic stewardship. Methods, actionable insights and an interactive dashboard for BN analysis upon data observed over 3 years at the PICU are described. BNs have several desirable properties for reasoning from data, including interpretability, expert knowledge injection and quantitative inference. Our pipeline leverages best practices of enforcing statistical rigor through bootstrapping, ensemble averaging and Monte Carlo simulations. Competing, shared and independent drug resistances were discovered through the presence of network motifs in BNs. Inferences guided by these visual models are also discussed, such as increasing the sensitivity testing for chloramphenicol as a potential mechanism of avoiding ertapenem overuse in the PICU. Organism, tissue and temporal influences on drug co-resistances are also discussed. While the model represents inferences that are tailored to the site, BNs are excellent tools for building upon pre-learnt structures, hence the model and inferences were wrapped into an interactive dashboard not only deployed at the site, but also made openly available to the community via GitHub. 
Shared repositories of such models could be a viable alternative to raw-data sharing and could promote partnering, learning across sites and charting a joint course for antimicrobial stewardship programs in the race against AMR.
9. Djordjilović V, Chiogna M, Vomlel J. An empirical comparison of popular structure learning algorithms with a view to gene network inference. Int J Approx Reason 2017. [DOI: 10.1016/j.ijar.2016.12.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
10. Sverchkov Y, Craven M. A review of active learning approaches to experimental design for uncovering biological networks. PLoS Comput Biol 2017; 13:e1005466. [PMID: 28570593 PMCID: PMC5453429 DOI: 10.1371/journal.pcbi.1005466] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 11/18/2022] Open
Abstract
Various types of biological knowledge describe networks of interactions among elementary entities. For example, transcriptional regulatory networks consist of interactions among proteins and genes. Current knowledge about the exact structure of such networks is highly incomplete, and laboratory experiments that manipulate the entities involved are conducted to test hypotheses about these networks. In recent years, various automated approaches to experiment selection have been proposed. Many of these approaches can be characterized as active machine learning algorithms. Active learning is an iterative process in which a model is learned from data, hypotheses are generated from the model to propose informative experiments, and the experiments yield new data that is used to update the model. This review describes the various models, experiment selection strategies, validation techniques, and successful applications described in the literature; highlights common themes and notable distinctions among methods; and identifies likely directions of future research and open problems in the area.
Affiliation(s)
- Yuriy Sverchkov
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Mark Craven
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
11. Monneret G, Jaffrézic F, Rau A, Zerjal T, Nuel G. Identification of marginal causal relationships in gene networks from observational and interventional expression data. PLoS One 2017; 12:e0171142. [PMID: 28301504 PMCID: PMC5354375 DOI: 10.1371/journal.pone.0171142] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/17/2016] [Accepted: 01/01/2017] [Indexed: 11/29/2022] Open
Abstract
Causal network inference is an important methodological challenge in biology as well as other areas of application. Although several causal network inference methods have been proposed in recent years, they are typically applicable for only a small number of genes, due to the large number of parameters to be estimated and the limited number of biological replicates available. In this work, we consider the specific case of transcriptomic studies made up of both observational and interventional data in which a single gene of biological interest is knocked out. We focus on a marginal causal estimation approach, based on the framework of Gaussian directed acyclic graphs, to infer causal relationships between the knocked-out gene and a large set of other genes. In a simulation study, we found that our proposed method accurately differentiates between downstream causal relationships and those that are upstream or simply associative. It also enables an estimation of the total causal effects between the gene of interest and the remaining genes. Our method performed very similarly to a classical differential analysis for experiments with a relatively large number of biological replicates, but has the advantage of providing a formal causal interpretation. Our proposed marginal causal approach is computationally efficient and may be applied to several thousands of genes simultaneously. In addition, it may help highlight subsets of genes of interest for a more thorough subsequent causal network inference. The method is implemented in an R package called MarginalCausality (available on GitHub).
Affiliation(s)
- Gilles Monneret
  - UMR GABI, AgroParisTech, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
  - LPMA, UMR CNRS 7599, UPMC, Sorbonne Universités, 4 place Jussieu, 75005 Paris, France
- Florence Jaffrézic
  - UMR GABI, AgroParisTech, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- Andrea Rau
  - UMR GABI, AgroParisTech, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- Tatiana Zerjal
  - UMR GABI, AgroParisTech, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- Grégory Nuel
  - LPMA, UMR CNRS 7599, UPMC, Sorbonne Universités, 4 place Jussieu, 75005 Paris, France
12. Hartmann AK, Nuel G. Using Triplet Ordering Preferences for Estimating Causal Effects in the Analysis of Gene Expression Data. PLoS One 2017; 12:e0170514. [PMID: 28141825 PMCID: PMC5283676 DOI: 10.1371/journal.pone.0170514] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/22/2016] [Accepted: 01/05/2017] [Indexed: 12/04/2022] Open
Abstract
Triplet ordering preferences are used to perform Monte Carlo sampling of the posterior causal orderings originating from the analysis of gene-expression experiments involving observations as well as (usually few) interventions, such as knock-outs. The performance of this sampling approach is compared to a previously used sampling via pairwise ordering preferences as well as to the sampling of the full posterior distribution. For a fair comparison, the latter approach is restricted to twice the numerical effort of the triplet-based approach. This is done for artificially generated causal graphs, i.e., directed acyclic graphs (DAGs), and for actual experimental data taken from the ROSETTA challenge. The sampling using the triplet ordering turns out to be superior to both other approaches.
Affiliation(s)
- Grégory Nuel
  - LPMA, CNRS 7599, Université Pierre et Marie Curie, Paris, France
13. Cho H, Berger B, Peng J. Reconstructing Causal Biological Networks through Active Learning. PLoS One 2016; 11:e0150611. [PMID: 26930205 PMCID: PMC4773135 DOI: 10.1371/journal.pone.0150611] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 11/23/2015] [Accepted: 02/16/2016] [Indexed: 11/28/2022] Open
Abstract
Reverse-engineering of biological networks is a central problem in systems biology. Intervention data, such as gene knockouts or knockdowns, are typically used for teasing apart causal relationships among genes. Under time or resource constraints, one needs to carefully choose which intervention experiments to carry out. Previous approaches for selecting the most informative interventions have largely focused on discrete Bayesian networks. However, continuous Bayesian networks are of great practical interest, especially in the study of complex biological systems and their quantitative properties. In this work, we present an efficient, information-theoretic active learning algorithm for Gaussian Bayesian networks (GBNs), which serve as important models for gene regulatory networks. In addition to providing linear-algebraic insights unique to GBNs, leading to significant runtime improvements, we demonstrate the effectiveness of our method on data simulated with GBNs and the DREAM4 network inference challenge data sets. Compared to random selection of intervention experiments, our method generally leads to faster recovery of the underlying network structure and faster convergence to the final distribution of confidence scores over candidate graph structures obtained using the full data.
Affiliation(s)
- Hyunghoon Cho
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, United States of America
- Bonnie Berger
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, United States of America
  - Department of Mathematics, MIT, Cambridge, MA, United States of America
- Jian Peng
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, United States of America
  - Department of Mathematics, MIT, Cambridge, MA, United States of America
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, United States of America