1. Zhang M, Zhang X, Dai M, Wu L, Liu K, Wang H, Chen W, Liu M, Hu Y. Development and validation of a Multi-Causal investigation and discovery framework for knowledge harmonization (MINDMerge): A case study with acute kidney injury risk factor discovery using electronic medical records. Int J Med Inform 2024; 191:105588. [PMID: 39128399 DOI: 10.1016/j.ijmedinf.2024.105588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2024] [Revised: 07/28/2024] [Accepted: 08/04/2024] [Indexed: 08/13/2024]
Abstract
OBJECTIVE Accurate diagnoses and personalized treatments in medicine rely on identifying causality. However, existing causal discovery algorithms often yield inconsistent results because their learning mechanisms differ. To address this challenge, we introduce MINDMerge, a multi-causal investigation and discovery framework designed to synthesize causal graphs from various algorithms.
METHODS MINDMerge integrates five causal models to reconcile inconsistencies arising from different algorithms. Employing credibility weighting and a novel cycle-breaking mechanism in causal networks, we initially developed and tested MINDMerge using three synthetic networks. Subsequently, we validated its effectiveness in discovering risk factors and predicting acute kidney injury (AKI) using two electronic medical records (EMR) datasets, the eICU Collaborative Research Database and the MIMIC-III Database. Causal reasoning was employed to analyze the relationships between risk factors and AKI. The identified causal risk factors of AKI were used to build a prediction model, which was evaluated using the area under the receiver operating characteristic curve (AUC) and recall.
RESULTS Synthetic data experiments demonstrated that our model significantly outperformed other causal models in capturing the ground-truth network structure. Application of MINDMerge to real-world data revealed direct connections of pulmonary disease, hypertension, diabetes, x-ray assessment, and BUN with AKI. With the identified variables, AKI risk can be inferred at the individual level based on the established Bayesian networks (BNs) and prior information. Compared against existing benchmark models, MINDMerge maintained a higher AUC for AKI prediction in both internal (AUC: 0.832) and external network validations (AUC: 0.861).
CONCLUSION MINDMerge can identify causal risk factors of AKI, serving as a valuable diagnostic tool for clinical decision-making and facilitating effective intervention.
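The abstract does not spell out how credibility weighting and cycle breaking work, so the following is only a minimal sketch of one plausible reading: each algorithm's edge list votes with a weight, edges with enough support are kept, and any remaining directed cycle is broken by dropping its weakest edge. The function names, the weights, and the 0.5 threshold are all assumptions made for illustration and are not taken from MINDMerge.

```python
from collections import defaultdict


def merge_graphs(graphs, weights, threshold=0.5):
    """Weighted vote over directed edges (hypothetical scheme, not MINDMerge's).

    graphs  : list of edge lists, one per causal discovery algorithm
    weights : one credibility weight per algorithm
    Returns {edge: normalized support} for edges with support >= threshold.
    """
    total = sum(weights)
    raw = defaultdict(float)
    for edges, w in zip(graphs, weights):
        for edge in set(edges):          # count each edge once per algorithm
            raw[edge] += w
    return {e: s / total for e, s in raw.items() if s / total >= threshold}


def find_cycle(edges):
    """Return the edges of one directed cycle, or None if the graph is acyclic."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    state = {}                           # 1 = on the DFS stack, 2 = finished
    path = []

    def dfs(u):
        state[u] = 1
        path.append(u)
        for v in adj[u]:
            if state.get(v, 0) == 1:     # back edge closes a cycle
                nodes = path[path.index(v):] + [v]
                return list(zip(nodes, nodes[1:]))
            if state.get(v, 0) == 0:
                found = dfs(v)
                if found:
                    return found
        path.pop()
        state[u] = 2
        return None

    for node in list(adj):
        if state.get(node, 0) == 0:
            found = dfs(node)
            if found:
                return found
    return None


def break_cycles(support):
    """Repeatedly drop the least-supported edge of any cycle until none remain."""
    kept = dict(support)
    while True:
        cycle = find_cycle(kept)
        if cycle is None:
            return kept
        weakest = min(cycle, key=lambda e: kept[e])
        del kept[weakest]


# Toy input: three algorithms that disagree about the direction of C and A.
graphs = [
    [("A", "B"), ("B", "C")],
    [("A", "B"), ("C", "A")],
    [("A", "B"), ("B", "C"), ("C", "A")],
]
weights = [5, 2, 3]                      # hypothetical credibility weights

dag = break_cycles(merge_graphs(graphs, weights))
print(sorted(dag))
```

In this toy input all three edges pass the vote, which leaves the cycle A → B → C → A; the edge C → A has the lowest support and is the one removed.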
Affiliation(s)
- Mingyang Zhang
  - Big Data Decision Institute, Jinan University, Guangzhou 510632, PR China; School of Management, Jinan University, Guangzhou 510632, PR China
- Xiangzhou Zhang
  - Big Data Decision Institute, Jinan University, Guangzhou 510632, PR China; School of Medicine, Jinan University, Guangzhou 510632, PR China
- Mingyang Dai
  - Big Data Decision Institute, Jinan University, Guangzhou 510632, PR China; College of Information Science and Technology, Jinan University, Guangzhou 510632, PR China
- Lijuan Wu
  - Institute of Sciences in Emergency Medicine, Department of Emergency Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou 519041, PR China; Medical Research Institute, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou 519041, PR China
- Kang Liu
  - Big Data Decision Institute, Jinan University, Guangzhou 510632, PR China; School of Management, Jinan University, Guangzhou 510632, PR China
- Hongnian Wang
  - Big Data Decision Institute, Jinan University, Guangzhou 510632, PR China; School of Management, Jinan University, Guangzhou 510632, PR China
- Weiqi Chen
  - School of Computer Science, Guangdong Polytechnic Normal University, 510632, PR China
- Mei Liu
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, 32610, USA
- Yong Hu
  - Big Data Decision Institute, Jinan University, Guangzhou 510632, PR China; School of Medicine, Jinan University, Guangzhou 510632, PR China
2. Ji R, Geng Y, Quan X. Inferring gene regulatory networks with graph convolutional network based on causal feature reconstruction. Sci Rep 2024; 14:21342. [PMID: 39266676 PMCID: PMC11393083 DOI: 10.1038/s41598-024-71864-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Accepted: 09/02/2024] [Indexed: 09/14/2024] Open
Abstract
Inferring gene regulatory networks through deep learning and causal inference methods is a crucial task in the field of computational biology and bioinformatics. This study presents a novel approach that uses a Graph Convolutional Network (GCN) guided by causal information to infer Gene Regulatory Networks (GRNs). Transfer entropy and a reconstruction layer are used to achieve causal feature reconstruction, mitigating the information loss caused by multiple rounds of neighbor aggregation in the GCN and yielding a causal and integrated representation of node features. Separable features are extracted from gene expression data by a Gaussian-kernel Autoencoder to improve computational efficiency. Experimental results on the DREAM5 and mDC datasets demonstrate that our method outperforms existing algorithms, as indicated by higher AUPRC values. Furthermore, the incorporation of causal feature reconstruction enhances the inferred GRNs, rendering them more reasonable, accurate, and reliable.
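Transfer entropy, the causal measure named in the abstract above, quantifies how much the past of one series improves prediction of another beyond that series' own past. As a rough illustration only, the sketch below computes a plug-in estimate for discrete symbol sequences with a history length of one step. The paper works with gene expression data and does not describe its estimator here, so the discretized input, the one-step history, and the function name are assumptions.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    TE = sum p(y_next, y_prev, x_prev)
             * log2( p(y_next | y_prev, x_prev) / p(y_next | y_prev) )
    Both inputs are equal-length sequences of discrete symbols.
    """
    if len(source) != len(target):
        raise ValueError("series must have the same length")
    n = len(target) - 1                                   # number of transitions
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    pairs_prev = Counter(zip(target[:-1], source[:-1]))   # (y_prev, x_prev)
    pairs_self = Counter(zip(target[1:], target[:-1]))    # (y_next, y_prev)
    singles = Counter(target[:-1])                        # y_prev

    te = 0.0
    for (y_next, y_prev, x_prev), count in triples.items():
        p_joint = count / n
        p_full = count / pairs_prev[(y_prev, x_prev)]
        p_self = pairs_self[(y_next, y_prev)] / singles[y_prev]
        te += p_joint * log2(p_full / p_self)
    return te


# Toy series: y copies x with a one-step delay, so x should "drive" y.
x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
y = [0] + x[:-1]

te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
print(te_xy > te_yx)
```

On short series the plug-in estimate is biased, so the reverse direction is not exactly zero; the sketch is only meant to show the asymmetry that makes transfer entropy usable as a directed edge feature.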
Affiliation(s)
- Ruirui Ji
  - School of Automation and Information Engineering, Xi'an University of Technology, No.5, Jinhua South Road, Xi'an, 710048, Shaanxi, China
  - Key Laboratory of Shaanxi Province for Complex System Control and Intelligent Information Processing, Xi'an, 710048, Shaanxi, China
- Yi Geng
  - School of Automation and Information Engineering, Xi'an University of Technology, No.5, Jinhua South Road, Xi'an, 710048, Shaanxi, China
- Xin Quan
  - School of Automation and Information Engineering, Xi'an University of Technology, No.5, Jinhua South Road, Xi'an, 710048, Shaanxi, China
3. Xin J, Wang M, Qu L, Chen Q, Wang W, Wang Z. BIC-LP: A Hybrid Higher-Order Dynamic Bayesian Network Score Function for Gene Regulatory Network Reconstruction. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:188-199. [PMID: 38127613 DOI: 10.1109/tcbb.2023.3345317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Reconstructing gene regulatory networks (GRNs) is an increasingly active topic in bioinformatics. The dynamic Bayesian network (DBN) is a stochastic graph model commonly used for GRN reconstruction. However, the probabilistic characteristics of biological networks and the presence of data noise pose great challenges to GRN reconstruction and often lead to many false positive and false negative edges. ScoreLasso is a hybrid DBN score function that combines DBN and linear regression and performs well. Its performance is, however, limited by its first-order assumption and its neglect of the initial network of the DBN. In this article, an integrated model based on a higher-order DBN model, a higher-order Lasso linear regression model and a Pearson correlation model is proposed. Based on this, a hybrid higher-order DBN score function for GRN reconstruction, namely BIC-LP, is proposed. The BIC-LP score function is constructed by adding terms based on Lasso linear regression coefficients and Pearson correlation coefficients to the classical BIC score function. It can therefore capture more information from the dataset and curb information loss, compared with both many existing Bayesian family score functions and many state-of-the-art methods for GRN reconstruction. Experimental results show that BIC-LP can reasonably eliminate some false positive edges while retaining most true positive edges, thereby achieving better GRN reconstruction performance.
4. Gao Z, Tang J, Xia J, Zheng CH, Wei PJ. CNNGRN: A Convolutional Neural Network-Based Method for Gene Regulatory Network Inference From Bulk Time-Series Expression Data. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2853-2861. [PMID: 37267145 DOI: 10.1109/tcbb.2023.3282212] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/04/2023]
Abstract
Gene regulatory networks (GRNs) participate in many biological processes, and reconstructing them plays an important role in systems biology. Although many advanced methods have been proposed for GRN reconstruction, their predictive performance is far from ideal, so a more effective method for reconstructing GRNs is urgently needed. Moreover, most methods consider only the gene expression data, ignoring the network structure information contained in GRNs. In this study, we propose a supervised model named CNNGRN, which infers GRNs from bulk time-series expression data via a convolutional neural network (CNN) model with more informative features. Bulk time-series gene expression data imply the intricate regulatory associations between genes, and the network structure feature of the ground-truth GRN contains rich neighbor information. Hence, CNNGRN integrates these two features as model inputs. In addition, the CNN is adopted to extract intricate features of genes and infer the potential associations between regulators and target genes. Moreover, feature importance visualization experiments are implemented to identify the key features. Experimental results show that CNNGRN achieved competitive performance on benchmark datasets compared to state-of-the-art computational methods. Finally, hub genes identified based on CNNGRN have been confirmed through the literature to be involved in biological processes.
5. Suter P, Kuipers J, Beerenwinkel N. Discovering gene regulatory networks of multiple phenotypic groups using dynamic Bayesian networks. Brief Bioinform 2022; 23:bbac219. [PMID: 35679575 PMCID: PMC9294428 DOI: 10.1093/bib/bbac219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Revised: 04/29/2022] [Accepted: 05/10/2022] [Indexed: 11/13/2022] Open
Abstract
Dynamic Bayesian networks (DBNs) can be used for the discovery of gene regulatory networks (GRNs) from time series gene expression data. Here, we suggest a strategy for learning DBNs from gene expression data by employing a Bayesian approach that is scalable to large networks and is targeted at learning models with high predictive accuracy. Our framework can be used to learn DBNs for multiple groups of samples and highlight differences and similarities in their GRNs. We learn these DBN models based on different structural and parametric assumptions and select the optimal model based on the cross-validated predictive accuracy. We show in simulation studies that our approach is better equipped to prevent overfitting than techniques used in previous studies. We applied the proposed DBN-based approach to two time series transcriptomic datasets from the Gene Expression Omnibus database, each comprising data from distinct phenotypic groups of the same tissue type. In the first case, we used DBNs to characterize responders and non-responders to anti-cancer therapy. In the second case, we compared normal to tumor cells of colorectal tissue. The classification accuracy reached by the DBN-based classifier for both datasets was higher than reported previously. For the colorectal cancer dataset, our analysis suggested that GRNs for cancer and normal tissues have a lot of differences, which are most pronounced in the neighborhoods of oncogenes and known cancer tissue markers. The identified differences in gene networks of cancer and normal cells may be used for the discovery of targeted therapies.
Affiliation(s)
- Polina Suter
  - Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Switzerland
- Jack Kuipers
  - Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Switzerland
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Switzerland
6. Selected Artificial Intelligence Methods in the Risk Analysis of Damage to Masonry Buildings Subject to Long-Term Underground Mining Exploitation. Minerals 2021. [DOI: 10.3390/min11090958] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/30/2023]
Abstract
This paper presents an advanced computational approach to assess the risk of damage to masonry buildings subjected to negative kinematic impacts of underground mining exploitation. The research goals were achieved using selected tools from the area of artificial intelligence (AI) methods. Ultimately, two models of damage risk assessment were built using the Naive Bayes classifier (NBC) and Bayesian Networks (BN). The first model was used to compare results obtained using the more computationally advanced Bayesian network methodology. In the case of the Bayesian network, the unknown Directed Acyclic Graph (DAG) structure was extracted using Chow-Liu’s Tree Augmented Naive Bayes (TAN-CL) algorithm. Thus, one of the methods involving Bayesian Network Structure Learning from data (BNSL) was implemented. The application of this approach represents a novel scientific contribution in the interdisciplinary field of mining and civil engineering. The models created were verified with respect to quality of fit to observed data and generalization properties. The connections in the Bayesian network structure obtained were also verified with respect to the observed relations occurring in engineering practice concerning the assessment of the damage intensity to masonry buildings in mining areas. This allowed evaluation of the model and justified the utility of the conducted research in the field of protection of mining areas. The possibility of universal application of the Bayesian network, both in the case of damage prediction and diagnosis of its potential causes, was also pointed out.
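The structure-learning step mentioned above can be illustrated with the standard Tree Augmented Naive Bayes construction: score every feature pair by conditional mutual information given the class, grow a maximum spanning tree over the features, direct it away from a root, and make the class a parent of every feature. The sketch below follows that textbook recipe under stated assumptions; the column names and the toy records are invented for illustration and do not come from the paper's data.

```python
from collections import Counter
from itertools import combinations
from math import log


def conditional_mutual_information(rows, i, j, c):
    """I(X_i; X_j | C) in nats, estimated from counts over a list of dict rows."""
    n = len(rows)
    n_ijc = Counter((r[i], r[j], r[c]) for r in rows)
    n_ic = Counter((r[i], r[c]) for r in rows)
    n_jc = Counter((r[j], r[c]) for r in rows)
    n_c = Counter(r[c] for r in rows)
    cmi = 0.0
    for (a, b, k), count in n_ijc.items():
        # p(a,b,k) * log( p(a,b|k) / (p(a|k) * p(b|k)) ), written with raw counts
        cmi += (count / n) * log(count * n_c[k] / (n_ic[(a, k)] * n_jc[(b, k)]))
    return cmi


def tan_structure(rows, features, class_col):
    """Return directed edges of a TAN graph: feature tree plus class -> feature."""
    weight = {
        frozenset((i, j)): conditional_mutual_information(rows, i, j, class_col)
        for i, j in combinations(features, 2)
    }
    in_tree = [features[0]]              # arbitrary root
    edges = []
    # Prim's algorithm for a maximum spanning tree over the features.
    while len(in_tree) < len(features):
        parent, child = max(
            ((p, ch) for p in in_tree for ch in features if ch not in in_tree),
            key=lambda e: weight[frozenset(e)],
        )
        edges.append((parent, child))    # directed away from the root
        in_tree.append(child)
    # As in naive Bayes, the class node is a parent of every feature.
    edges += [(class_col, f) for f in features]
    return edges


# Hypothetical toy records: "strain" mirrors "age", "length" varies freely.
rows = [
    {"damage": d, "age": a, "strain": s, "length": l}
    for d in (0, 1)
    for a, s in (("old", "high"), ("new", "low"))
    for l in ("short", "long")
]
features = ["age", "strain", "length"]

edges = tan_structure(rows, features, "damage")
print(edges)
```

Because "strain" is fully determined by "age" within each class, that pair carries the largest conditional mutual information and is linked first; the remaining feature is attached by a tie-break, which is why real data with distinct scores is needed for a meaningful tree.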