1
Xi X, Li J, Jia J, Meng Q, Li C, Wang X, Wei L, Zhang X. A mechanism-informed deep neural network enables prioritization of regulators that drive cell state transitions. Nat Commun 2025; 16:1284. [PMID: 39900922 PMCID: PMC11790924 DOI: 10.1038/s41467-025-56475-9] [Received: 06/02/2024] [Accepted: 01/15/2025] [Indexed: 02/05/2025]
Abstract
Cells are regulated at multiple levels, from regulations of individual genes to interactions across multiple genes. Some recent neural network models can connect molecular changes to cellular phenotypes, but their design lacks modeling of regulatory mechanisms, limiting the decoding of regulations behind key cellular events, such as cell state transitions. Here, we present regX, a deep neural network incorporating both gene-level regulation and gene-gene interaction mechanisms, which enables prioritizing potential driver regulators of cell state transitions and providing mechanistic interpretations. Applied to single-cell multi-omics data on type 2 diabetes and hair follicle development, regX reliably prioritizes key transcription factors and candidate cis-regulatory elements that drive cell state transitions. Some regulators reveal potential new therapeutic targets, drug repurposing possibilities, and putative causal single nucleotide polymorphisms. This method to analyze single-cell multi-omics data demonstrates how the interpretable design of neural networks can better decode biological systems.
Affiliation(s)
- Xi Xi
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China
- Jiaqi Li
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China
- Jinmeng Jia
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China
- Qiuchen Meng
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China
- Chen Li
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China
- Xiaowo Wang
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China
- Lei Wei
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China
- Xuegong Zhang
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST / Department of Automation, Tsinghua University, Beijing, China
  - School of Life Sciences, Tsinghua University, Beijing, China
2
Qi H, Zhao H, Li E, Lu X, Yu N, Liu J, Han J. DeepQA: A Unified Transcriptome-Based Aging Clock Using Deep Neural Networks. Aging Cell 2025:e14471. [PMID: 39757434 DOI: 10.1111/acel.14471] [Received: 08/11/2024] [Revised: 11/21/2024] [Accepted: 12/17/2024] [Indexed: 01/07/2025]
Abstract
Understanding the complex biological process of aging is of great value, especially as it can help develop therapeutics to prolong healthy life. Predicting biological age from gene expression data has been shown to be an effective means to quantify the aging of a subject and to identify molecular and cellular biomarkers of aging. A typical approach for estimating biological age, adopted by almost all existing aging clocks, is to train machine learning models only on healthy subjects but to run inference on both healthy and unhealthy subjects. However, the inherent bias in this approach results in inaccurate biological age estimates, as shown in this study. Moreover, almost all existing transcriptome-based aging clocks were built around an inefficient procedure of gene selection followed by conventional machine learning models such as elastic nets, linear discriminant analysis, etc. To address these limitations, we proposed DeepQA, a unified aging clock based on a mixture of experts. Unlike existing methods, DeepQA is equipped with a specially designed Hinge-Mean-Absolute-Error (Hinge-MAE) loss so that it can train on both healthy and unhealthy subjects of multiple cohorts to reduce the bias of inferring the biological age of unhealthy subjects. Our experiments showed that DeepQA significantly outperformed existing methods for biological age estimation on both healthy and unhealthy subjects. In addition, our method avoids the inefficient exhaustive search of genes and provides a novel means to identify genes activated in aging prediction, as an alternative to approaches such as differential gene expression analysis.
Affiliation(s)
- Hongqian Qi
  - State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, China
  - College of Pharmacy, Nankai University, Tianjin, China
- Hongchen Zhao
  - College of Artificial Intelligence, Nankai University, Tianjin, China
- Enyi Li
  - College of Artificial Intelligence, Nankai University, Tianjin, China
- Xinyi Lu
  - State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, China
- Ningbo Yu
  - College of Artificial Intelligence, Nankai University, Tianjin, China
  - Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, China
- Jinchao Liu
  - College of Artificial Intelligence, Nankai University, Tianjin, China
  - Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, China
- Jianda Han
  - College of Artificial Intelligence, Nankai University, Tianjin, China
  - Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, China
3
Zeng R, Li Z, Li J, Zhang Q. DNA promoter task-oriented dictionary mining and prediction model based on natural language technology. Sci Rep 2025; 15:153. [PMID: 39747934 PMCID: PMC11697570 DOI: 10.1038/s41598-024-84105-9] [Received: 09/07/2024] [Accepted: 12/19/2024] [Indexed: 01/04/2025]
Abstract
Promoters are essential DNA sequences that initiate transcription and regulate gene expression. Precisely identifying promoter sites is crucial for deciphering gene expression patterns and the roles of gene regulatory networks. Recent advancements in bioinformatics have leveraged deep learning and natural language processing (NLP) to enhance promoter prediction accuracy. Techniques such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and BERT models have been particularly impactful. However, current approaches often rely on arbitrary DNA sequence segmentation during BERT pre-training, which may not yield optimal results. To overcome this limitation, this article introduces a novel DNA sequence segmentation method. This approach develops a more refined dictionary for DNA sequences, utilizes it for BERT pre-training, and employs an Inception neural network as the foundational model. This BERT-Inception architecture captures information across multiple granularities. Experimental results show that the model improves the performance of several downstream tasks and introduces deep learning interpretability, providing new perspectives for interpreting and understanding DNA sequence information. The detailed source code is available at https://github.com/katouMegumiH/Promoter_BERT.
Affiliation(s)
- Ruolei Zeng
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, 55455, USA
- Zihan Li
  - National Engineering Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, No.11 Fucheng Road, Beijing, 100048, China
- Jialu Li
  - National Engineering Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, No.11 Fucheng Road, Beijing, 100048, China
- Qingchuan Zhang
  - National Engineering Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, No.11 Fucheng Road, Beijing, 100048, China
4
Cao G, Chen D. Unveiling Long Non-coding RNA Networks from Single-Cell Omics Data Through Artificial Intelligence. Methods Mol Biol 2025; 2883:257-279. [PMID: 39702712 DOI: 10.1007/978-1-0716-4290-0_11] [Indexed: 12/21/2024]
Abstract
Single-cell omics technologies have revolutionized the study of long non-coding RNAs (lncRNAs), offering unprecedented resolution in elucidating their expression dynamics, cell-type specificity, and associated gene regulatory networks (GRNs). Concurrently, the integration of artificial intelligence (AI) methodologies has significantly advanced our understanding of lncRNA functions and their implications in disease pathogenesis. This chapter discusses the progress in single-cell omics data analysis, emphasizing its pivotal role in unraveling the molecular mechanisms underlying cellular heterogeneity and the associated regulatory networks involving lncRNAs. Additionally, we provide a summary of single-cell omics resources and AI models for constructing single-cell gene regulatory networks (scGRNs). Finally, we discuss the challenges and prospects of studying scGRNs in the context of lncRNA biology.
Affiliation(s)
- Guangshuo Cao
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
- Dijun Chen
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
5
Crawford J, Chikina M, Greene CS. Best holdout assessment is sufficient for cancer transcriptomic model selection. Patterns (New York, N.Y.) 2024; 5:101115. [PMID: 39776849 PMCID: PMC11701843 DOI: 10.1016/j.patter.2024.101115] [Received: 01/02/2024] [Revised: 08/01/2024] [Accepted: 11/13/2024] [Indexed: 01/11/2025]
Abstract
Guidelines in statistical modeling for genomics hold that simpler models have advantages over more complex ones. Potential advantages include cost, interpretability, and improved generalization across datasets or biological contexts. We directly tested the assumption that small gene signatures generalize better by examining the generalization of mutation status prediction models across datasets (from cell lines to human tumors and vice versa) and biological contexts (holding out entire cancer types from pan-cancer data). We compared model selection between solely cross-validation performance and combining cross-validation performance with regularization strength. We did not observe that more regularized signatures generalized better. This result held across both generalization problems and for both linear models (LASSO logistic regression) and non-linear ones (neural networks). When the goal of an analysis is to produce generalizable predictive models, we recommend choosing the ones that perform best on held-out data or in cross-validation instead of those that are smaller or more regularized.
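The two model selection strategies compared in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact rule used to combine cross-validation performance with regularization strength, so the tolerance-based rule below is an assumption, and the candidate scores are hypothetical values chosen only for illustration.

```python
def select_best(results):
    """Pick the candidate with the highest mean cross-validation score."""
    return max(results, key=lambda r: r["cv_score"])


def select_most_regularized(results, tolerance=0.01):
    """Pick the most strongly regularized candidate whose CV score is
    within `tolerance` of the best score (assumed combination rule)."""
    best_score = select_best(results)["cv_score"]
    near_best = [r for r in results if r["cv_score"] >= best_score - tolerance]
    return max(near_best, key=lambda r: r["reg_strength"])


# Hypothetical candidates: larger reg_strength means a smaller, sparser signature.
candidates = [
    {"reg_strength": 0.001, "cv_score": 0.80},
    {"reg_strength": 0.01, "cv_score": 0.82},
    {"reg_strength": 0.1, "cv_score": 0.815},
    {"reg_strength": 1.0, "cv_score": 0.70},
]

by_performance = select_best(candidates)
by_parsimony = select_most_regularized(candidates)
print(by_performance["reg_strength"], by_parsimony["reg_strength"])
```

In this toy setting the two rules disagree: the first returns the best-scoring candidate, while the second trades a small amount of cross-validation performance for stronger regularization. The abstract's recommendation corresponds to the first rule.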
Affiliation(s)
- Jake Crawford
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Maria Chikina
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Casey S. Greene
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
  - Center for Health AI, University of Colorado School of Medicine, Aurora, CO, USA
6
Wu Y, Xu P, Wang L, Liu S, Hou Y, Lu H, Hu P, Li X, Yu X. scGO: interpretable deep neural network for cell status annotation and disease diagnosis. Brief Bioinform 2024; 26:bbaf018. [PMID: 39820437 PMCID: PMC11737892 DOI: 10.1093/bib/bbaf018] [Received: 09/02/2024] [Revised: 12/16/2024] [Accepted: 01/10/2025] [Indexed: 01/19/2025]
Abstract
Machine learning has emerged as a transformative tool for elucidating cellular heterogeneity in single-cell RNA sequencing. However, a significant challenge lies in the "black box" nature of deep learning models, which obscures the decision-making process and limits interpretability in cell status annotation. In this study, we introduced scGO, a Gene Ontology (GO)-inspired deep learning framework designed to provide interpretable cell status annotation for scRNA-seq data. scGO employs sparse neural networks to leverage the intrinsic biological relationships among genes, transcription factors, and GO terms, significantly augmenting interpretability and reducing computational cost. scGO outperforms state-of-the-art methods in the precise characterization of cell subtypes across diverse datasets. Our extensive experimentation across a spectrum of scRNA-seq datasets underscored the remarkable efficacy of scGO in disease diagnosis, prediction of developmental stages, and evaluation of disease severity and cellular senescence status. Furthermore, we incorporated in silico individual gene manipulations into the scGO model, introducing an additional layer for discovering therapeutic targets. Our results provide an interpretable model for accurately annotating cell status, capturing latent biological knowledge, and informing clinical practice.
Affiliation(s)
- You Wu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
- Pengfei Xu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
- Liyuan Wang
  - School of Agriculture and Biology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
- Shuai Liu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
- Yingnan Hou
  - School of Agriculture and Biology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
- Hui Lu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
- Peng Hu
  - Ministry of Education, Shanghai Ocean University, No. 999, Huchenghuan Road, Shanghai 201306, China
- Xiaofei Li
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
  - Shanghai Pudong New Area People’s Hospital, No. 490, Chuanhuan South Road, Shanghai 201299, China
- Xiang Yu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, No. 800 Dong Chuan Road, Shanghai 200240, China
7
Wang J, Wen Y, Zhang Y, Wang Z, Jiang Y, Dai C, Wu L, Leng D, He S, Bo X. An interpretable artificial intelligence framework for designing synthetic lethality-based anti-cancer combination therapies. J Adv Res 2024; 65:329-343. [PMID: 38043609 PMCID: PMC11519055 DOI: 10.1016/j.jare.2023.11.035] [Received: 09/05/2023] [Revised: 11/27/2023] [Accepted: 11/29/2023] [Indexed: 12/05/2023]
Abstract
INTRODUCTION: Synthetic lethality (SL) provides an opportunity to leverage different genetic interactions when designing synergistic combination therapies. To further explore SL-based combination therapies for cancer treatment, it is important to identify and mechanistically characterize more SL interactions. Artificial intelligence (AI) methods have recently been proposed for SL prediction, but the results of these models are often not interpretable such that deriving the underlying mechanism can be challenging.
OBJECTIVES: This study aims to develop an interpretable AI framework for SL prediction and subsequently utilize it to design SL-based synergistic combination therapies.
METHODS: We propose a knowledge and data dual-driven AI framework for SL prediction (KDDSL). Specifically, we use gene knowledge related to the SL mechanism to guide the construction of the model and develop a method to identify the most relevant gene knowledge for the predicted results.
RESULTS: Experimental and literature-based validation confirmed a good balance between predictive and interpretable ability when using KDDSL. Moreover, we demonstrated that KDDSL could help to discover promising drug combinations and clarify associated biological processes, such as the combination of MDM2 and CDK9 inhibitors, which exhibited significant anti-cancer effects in vitro and in vivo.
CONCLUSION: These data underscore the potential of KDDSL to guide SL-based combination therapy design. There is a need for biomedicine-focused AI strategies to combine rational biological knowledge with developed models.
Affiliation(s)
- Jing Wang
  - School of Medicine, Tsinghua University, Beijing, 100084, China
- Yuqi Wen
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Yixin Zhang
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Zhongming Wang
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China
- Yuyang Jiang
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China
- Chong Dai
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
- Lianlian Wu
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China
- Dongjin Leng
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Song He
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Xiaochen Bo
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
8
Cao P, Dun Y, Xiang X, Wang D, Cheng W, Yan L, Li H. Machine learning-based individualized survival prediction model for prognosis in osteosarcoma: Data from the SEER database. Medicine (Baltimore) 2024; 103:e39582. [PMID: 39331900 PMCID: PMC11441932 DOI: 10.1097/md.0000000000039582] [Received: 01/29/2024] [Accepted: 08/15/2024] [Indexed: 09/29/2024]
Abstract
Patient outcomes in osteosarcoma vary because of tumor heterogeneity and treatment strategies. This study aimed to compare the performance of multiple machine learning (ML) models with the traditional Cox proportional hazards (CoxPH) model in predicting prognosis and to explore the potential of ML models in clinical decision-making. From 2000 to 2018, data on 1243 patients with osteosarcoma were collected from the Surveillance, Epidemiology, and End Results (SEER) database. Three ML methods were chosen for model development (DeepSurv, neural multi-task logistic regression [NMTLR], and random survival forest [RSF]) and compared with the traditional CoxPH model and TNM staging systems. A total of 871 samples were used for model training, and the rest were used for model validation. The models' overall performance and predictive accuracy for 3- and 5-year survival were assessed by several metrics, including the concordance index (C-index), the Integrated Brier Score (IBS), receiver operating characteristic (ROC) curves, area under the ROC curve (AUC), calibration curves, and decision curve analysis. The efficacy of personalized recommendations by ML models was evaluated using survival curves. Performance was highest for the DeepSurv model (C-index, 0.77; IBS, 0.14; 3-year AUC, 0.80; 5-year AUC, 0.78) compared with the other methods (C-index, 0.73-0.74; IBS, 0.16-0.17; 3-year AUC, 0.73-0.78; 5-year AUC, 0.72-0.78). There were also significant differences in survival outcomes between patients whose treatment aligned with the option recommended by the DeepSurv model and those whose treatment did not (hazard ratio, 1.88; P < .05). The DeepSurv model is available in an approachable web app format at https://survivalofosteosarcoma.streamlit.app/. We developed ML models capable of accurately predicting survival in osteosarcoma, which can provide useful information for decision-making regarding the appropriate treatment.
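The concordance index reported above can be illustrated with a minimal sketch of Harrell's C-index for right-censored data. The function name and the toy data are hypothetical, pairs with tied follow-up times are simply skipped as a simplification, and the abstract does not say which implementation the authors used.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: the fraction of comparable patient pairs in which
    the patient with the shorter survival time has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Skip tied times (simplification) and pairs whose earlier time is
        # censored, since the true ordering is then unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up time, event flag (1 = death observed,
# 0 = censored), and a model's predicted risk score per patient.
times = [5, 10, 12, 20]
events = [1, 0, 1, 1]
risk = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(times, events, risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported range of 0.73 to 0.77.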
Affiliation(s)
- Ping Cao
  - Department of Orthopedic, The First Affiliated Hospital of Dalian Medical University, Dalian, China
- Yixin Dun
  - Department of Orthopedic, Tianyou Hospital, Wuhan University of Science and Technology, Wuhan, China
- Xi Xiang
  - Department of Orthopedic, The First Affiliated Hospital of Dalian Medical University, Dalian, China
- Daqing Wang
  - Department of Orthopedic, The First Affiliated Hospital of Dalian Medical University, Dalian, China
- Weiyi Cheng
  - Department of Emergency General Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Lizhao Yan
  - Department of Hand Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hongjing Li
  - Department of Orthopedic, The First Affiliated Hospital of Dalian Medical University, Dalian, China
9
Venafra V, Sacco F, Perfetto L. SignalingProfiler 2.0 a network-based approach to bridge multi-omics data to phenotypic hallmarks. NPJ Syst Biol Appl 2024; 10:95. [PMID: 39179556 PMCID: PMC11343843 DOI: 10.1038/s41540-024-00417-6] [Received: 01/30/2024] [Accepted: 07/31/2024] [Indexed: 08/26/2024]
Abstract
Unraveling how cellular signaling is remodeled upon perturbation is crucial for understanding disease mechanisms and identifying potential drug targets. In this pursuit, computational tools generating mechanistic hypotheses from multi-omics data have invaluable potential. Here, we present a newly implemented version (2.0) of SignalingProfiler, a multi-step pipeline to draw mechanistic hypotheses on the signaling events impacting cellular phenotypes. SignalingProfiler 2.0 derives context-specific signaling networks by integrating proteogenomic data with a prior-knowledge causal network. This is a freely accessible and flexible tool that incorporates statistical, footprint-based, and graph algorithms to accelerate the integration and interpretation of multi-omics data. Through a benchmarking process on three proof-of-concept studies, we demonstrate the tool's ability to generate hierarchical mechanistic networks recapitulating novel and known perturbed signaling and phenotypic outcomes, in both human and mouse contexts. In summary, SignalingProfiler 2.0 addresses the emergent need to derive biologically relevant information from complex multi-omics data by extracting interpretable networks.
Affiliation(s)
- Veronica Venafra
  - Ph.D. Program in Cellular and Molecular Biology, Department of Biology, University of Rome 'Tor Vergata', Rome, Italy
  - Department of Biology, University of Rome 'Tor Vergata', Rome, Italy
- Francesca Sacco
  - Department of Biology, University of Rome 'Tor Vergata', Rome, Italy
- Livia Perfetto
  - Department of Biology and Biotechnologies 'C.Darwin', University of Rome 'La Sapienza', Rome, Italy
10
van Hilten A, van Rooij J, Ikram MA, Niessen WJ, van Meurs JBJ, Roshchupkin GV. Phenotype prediction using biologically interpretable neural networks on multi-cohort multi-omics data. NPJ Syst Biol Appl 2024; 10:81. [PMID: 39095438 PMCID: PMC11297229 DOI: 10.1038/s41540-024-00405-w] [Received: 12/14/2023] [Accepted: 07/12/2024] [Indexed: 08/04/2024]
Abstract
Integrating multi-omics data into predictive models has the potential to enhance accuracy, which is essential for precision medicine. In this study, we developed interpretable predictive models for multi-omics data by employing neural networks informed by prior biological knowledge, referred to as visible networks. These neural networks offer insights into the decision-making process and can unveil novel perspectives on the underlying biological mechanisms associated with traits and complex diseases. We tested the performance, interpretability and generalizability for inferring smoking status, subject age and LDL levels using genome-wide RNA expression and CpG methylation data from the blood of the BIOS consortium (four population cohorts, total N = 2940). In a cohort-wise cross-validation setting, the consistency of the diagnostic performance and interpretation was assessed. Performance was consistently high for predicting smoking status with an overall mean AUC of 0.95 (95% CI: 0.90-1.00) and interpretation revealed the involvement of well-replicated genes such as AHRR, GPR15 and LRRN3. LDL-level predictions were only generalized in a single cohort with an R² of 0.07 (95% CI: 0.05-0.08). Age was inferred with a mean error of 5.16 (95% CI: 3.97-6.35) years with the genes COL11A2, AFAP1, OTUD7A, PTPRN2, ADARB2 and CD34 consistently predictive. For both regression tasks, we found that using multi-omics networks improved performance, stability and generalizability compared to interpretable single-omic networks. We believe that visible neural networks have great potential for multi-omics analysis; they combine multi-omic data elegantly, are interpretable, and generalize well to data from different cohorts.
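The cohort-wise cross-validation setting mentioned above can be sketched as follows, assuming it means holding out one whole cohort per fold (the abstract does not spell out the protocol). The function name and the cohort labels "A" to "D" are hypothetical placeholders, not the actual BIOS cohort names.

```python
def cohort_wise_folds(cohort_labels):
    """Yield (held_out_cohort, train_indices, test_indices), holding out
    every sample of one cohort per fold."""
    for held_out in sorted(set(cohort_labels)):
        train = [i for i, c in enumerate(cohort_labels) if c != held_out]
        test = [i for i, c in enumerate(cohort_labels) if c == held_out]
        yield held_out, train, test


# Hypothetical cohort membership for six samples from four cohorts.
labels = ["A", "A", "B", "C", "C", "D"]
folds = list(cohort_wise_folds(labels))
for held_out, train, test in folds:
    print(held_out, train, test)
```

Because no sample from the held-out cohort is seen during training, each fold estimates how well a model transfers to a population it was not fitted on, which is the sense of generalizability the abstract reports.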
Affiliation(s)
- Arno van Hilten
  - Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
- Jeroen van Rooij
  - Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands
- M Arfan Ikram
  - Department of Imaging Physics, Delft University of Technology, Delft, The Netherlands
- Wiro J Niessen
  - Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
  - Department of Imaging Physics, Delft University of Technology, Delft, The Netherlands
- Joyce B J van Meurs
  - Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands
  - Department of Orthopaedics and Sports Medicine, Erasmus MC, Rotterdam, The Netherlands
- Gennady V Roshchupkin
  - Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, The Netherlands
11
Lobentanzer S, Rodriguez-Mier P, Bauer S, Saez-Rodriguez J. Molecular causality in the advent of foundation models. Mol Syst Biol 2024; 20:848-858. [PMID: 38890548 PMCID: PMC11297329 DOI: 10.1038/s44320-024-00041-w] [Received: 01/17/2024] [Revised: 03/18/2024] [Accepted: 03/21/2024] [Indexed: 06/20/2024]
Abstract
Correlation is not causation: this simple and uncontroversial statement has far-reaching implications. Defining and applying causality in biomedical research has posed significant challenges to the scientific community. In this perspective, we attempt to connect the partly disparate fields of systems biology, causal reasoning, and machine learning to inform future approaches in the field of systems biology and molecular medicine.
Affiliation(s)
- Sebastian Lobentanzer
  - Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Pablo Rodriguez-Mier
  - Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Julio Saez-Rodriguez
  - Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
12
Chen V, Yang M, Cui W, Kim JS, Talwalkar A, Ma J. Applying interpretable machine learning in computational biology-pitfalls, recommendations and opportunities for new developments. Nat Methods 2024; 21:1454-1461. [PMID: 39122941 PMCID: PMC11348280 DOI: 10.1038/s41592-024-02359-7] [Received: 10/27/2022] [Accepted: 06/24/2024] [Indexed: 08/12/2024]
Abstract
Recent advances in machine learning have enabled the development of next-generation predictive models for complex computational biology problems, thereby spurring the use of interpretable machine learning (IML) to unveil biological insights. However, guidelines for using IML in computational biology are generally underdeveloped. We provide an overview of IML methods and evaluation techniques and discuss common pitfalls encountered when applying IML methods to computational biology problems. We also highlight open questions, especially in the era of large language models, and call for collaboration between IML and computational biology researchers.
Affiliation(s)
- Valerie Chen
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Muyu Yang
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Wenbo Cui
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Joon Sik Kim
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Ameet Talwalkar
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Jian Ma
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
13
Wagle MM, Long S, Chen C, Liu C, Yang P. Interpretable deep learning in single-cell omics. Bioinformatics 2024; 40:btae374. [PMID: 38889275 PMCID: PMC11211213 DOI: 10.1093/bioinformatics/btae374] [Received: 02/25/2024] [Revised: 05/11/2024] [Accepted: 06/12/2024] [Indexed: 06/20/2024]
Abstract
MOTIVATION: Single-cell omics technologies have enabled the quantification of molecular profiles in individual cells at an unparalleled resolution. Deep learning, a rapidly evolving sub-field of machine learning, has instilled a significant interest in single-cell omics research due to its remarkable success in analysing heterogeneous high-dimensional single-cell omics data. Nevertheless, the inherent multi-layer nonlinear architecture of deep learning models often makes them 'black boxes', as the reasoning behind predictions is often unknown and not transparent to the user. This has stimulated an increasing body of research for addressing the lack of interpretability in deep learning models, especially in single-cell omics data analyses, where the identification and understanding of molecular regulators are crucial for interpreting model predictions and directing downstream experimental validations. RESULTS: In this work, we introduce the basics of single-cell omics technologies and the concept of interpretable deep learning. This is followed by a review of the recent interpretable deep learning models applied to various single-cell omics research. Lastly, we highlight the current limitations and discuss potential future directions.
Affiliation(s)
- Manoj M Wagle
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Siqu Long
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Carissa Chen
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Chunlei Liu
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Pengyi Yang
- Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Camperdown, NSW 2006, Australia
|
14
|
Kim Y, Han Y, Hopper C, Lee J, Joo JI, Gong JR, Lee CK, Jang SH, Kang J, Kim T, Cho KH. A gray box framework that optimizes a white box logical model using a black box optimizer for simulating cellular responses to perturbations. Cell Rep Methods 2024; 4:100773. [PMID: 38744288 PMCID: PMC11133856 DOI: 10.1016/j.crmeth.2024.100773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Revised: 03/19/2024] [Accepted: 04/19/2024] [Indexed: 05/16/2024]
Abstract
Predicting cellular responses to perturbations requires interpretable insights into molecular regulatory dynamics to perform reliable cell fate control, despite the confounding non-linearity of the underlying interactions. There is a growing interest in developing machine learning-based perturbation response prediction models to handle the non-linearity of perturbation data, but their interpretation in terms of molecular regulatory dynamics remains a challenge. Alternatively, for meaningful biological interpretation, logical network models such as Boolean networks are widely used in systems biology to represent intracellular molecular regulation. However, determining the appropriate regulatory logic of large-scale networks remains an obstacle due to the high-dimensional and discontinuous search space. To tackle these challenges, we present a scalable derivative-free optimizer trained by meta-reinforcement learning for Boolean network models. The logical network model optimized by the trained optimizer successfully predicts anti-cancer drug responses of cancer cell lines, while simultaneously providing insight into their underlying molecular regulatory mechanisms.
Affiliation(s)
- Yunseong Kim
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Younghyun Han
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Corbin Hopper
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Jonghoon Lee
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Jae Il Joo
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Jeong-Ryeol Gong
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Chun-Kyung Lee
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Seong-Hoon Jang
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Junsoo Kang
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Taeyoung Kim
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
| | - Kwang-Hyun Cho
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea.
|
15
|
Meimetis N, Lauffenburger DA, Nilsson A. Inference of drug off-target effects on cellular signaling using interactome-based deep learning. iScience 2024; 27:109509. [PMID: 38591003 PMCID: PMC11000001 DOI: 10.1016/j.isci.2024.109509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 02/04/2024] [Accepted: 03/13/2024] [Indexed: 04/10/2024] Open
Abstract
Many diseases emerge from dysregulated cellular signaling, and drugs are often designed to target specific signaling proteins. Off-target effects are, however, common and may ultimately result in failed clinical trials. Here we develop a computer model of the cell's transcriptional response to drugs for improved understanding of their mechanisms of action. The model is based on ensembles of artificial neural networks and simultaneously infers drug-target interactions and their downstream effects on intracellular signaling. With this, it predicts transcription factors' activities, while recovering known drug-target interactions and inferring many new ones, which we validate with an independent dataset. As a case study, we analyze the effects of the drug Lestaurtinib on downstream signaling. Alongside its intended target, FLT3, the model predicts an inhibition of CDK2 that enhances the downregulation of the cell cycle-critical transcription factor FOXM1. Our approach can therefore enhance our understanding of drug signaling for therapeutic design.
Affiliation(s)
- Nikolaos Meimetis
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Douglas A. Lauffenburger
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Avlant Nilsson
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Cell and Molecular Biology, SciLifeLab, Karolinska Institutet, Stockholm, Sweden
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, SE 41296, Sweden
|
16
|
Girard C. The tri-flow adaptiveness of codes in major evolutionary transitions. Biosystems 2024; 237:105133. [PMID: 38336225 DOI: 10.1016/j.biosystems.2024.105133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 01/26/2024] [Accepted: 01/27/2024] [Indexed: 02/12/2024]
Abstract
Life codes increase in both number and variety with biological complexity. Although our knowledge of codes is constantly expanding, the evolutionary progression of organic, neural, and cultural codes in response to selection pressure remains poorly understood. Greater clarification of the selective mechanisms is achieved by investigating how major evolutionary transitions reduce spatiotemporal and energetic constraints on transmitting heritable code to offspring. Evolution toward less constrained flows is integral to enduring flow architecture everywhere, in both engineered and natural flow systems. Beginning approximately 4 billion years ago, the most basic level for transmitting genetic material to offspring was initiated by protocell division. Evidence from ribosomes suggests that protocells transmitted comma-free or circular codes, preceding the evolution of standard genetic code. This rudimentary information flow within protocells is likely to have first emerged within the geo-energetic and geospatial constraints of hydrothermal vents. A broad-gauged hypothesis is that major evolutionary transitions overcame such constraints with tri-flow adaptations. The interconnected triple flows incorporated energy-converting, spatiotemporal, and code-based informational dynamics. Such tri-flow adaptations stacked sequence splicing code on top of protein-DNA recognition code in eukaryotes, prefiguring the transition to sexual reproduction. Sex overcame the spatiotemporal-energetic constraints of binary fission with further code stacking. Examples are tubulin code and transcription initiation code in vertebrates. In a later evolutionary transition, language reduced metabolic-spatiotemporal constraints on inheritance by stacking phonetic, phonological, and orthographic codes. In organisms that reproduce sexually, each major evolutionary transition is shown to be a tri-flow adaptation that adds new levels of code-based informational exchange. Evolving biological complexity is also shown to increase the nongenetic transmissibility of code.
Affiliation(s)
- Chris Girard
- Department of Global and Sociocultural Studies, Florida International University, Miami, FL 33199, United States.
|
17
|
Xiong G, Xie N, Nie M, Ling R, Yun B, Xie J, Ren L, Huang Y, Wang W, Yi C, Zhang M, Xu X, Zhang C, Zou B, Zhang L, Liu X, Huang H, Chen D, Cao W, Wang C. Single-cell transcriptomics reveals cell atlas and identifies cycling tumor cells responsible for recurrence in ameloblastoma. Int J Oral Sci 2024; 16:21. [PMID: 38424060 PMCID: PMC10904398 DOI: 10.1038/s41368-024-00281-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 01/04/2024] [Accepted: 01/05/2024] [Indexed: 03/02/2024] Open
Abstract
Ameloblastoma is a benign tumor characterized by locally invasive phenotypes, leading to facial bone destruction and a high recurrence rate. However, the mechanisms governing tumor initiation and recurrence are poorly understood. Here, we uncovered cellular landscapes and mechanisms that underlie tumor recurrence in ameloblastoma at single-cell resolution. Our results revealed that ameloblastoma exhibits five tumor subpopulations varying with respect to immune response (IR), bone remodeling (BR), tooth development (TD), epithelial development (ED), and cell cycle (CC) signatures. Of note, we found that CC ameloblastoma cells were endowed with stemness and contributed to tumor recurrence, which was dominated by the EZH2-mediated program. Targeting EZH2 effectively eliminated CC ameloblastoma cells and inhibited tumor growth in ameloblastoma patient-derived organoids. These data described the tumor subpopulation and clarified the identity, function, and regulatory mechanism of CC ameloblastoma cells, providing a potential therapeutic target for ameloblastoma.
Affiliation(s)
- Gan Xiong
- Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Nan Xie
- Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Min Nie
- Department of Periodontics, Affiliated Stomatology Hospital of Guangzhou Medical University, Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, China
| | - Rongsong Ling
- Institute for Advanced Study, Shenzhen University, Shenzhen, China
| | - Bokai Yun
- Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Jiaxiang Xie
- Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Linlin Ren
- Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Yaqi Huang
- Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Wenjin Wang
- Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Chen Yi
- Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Ming Zhang
- Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Xiuyun Xu
- Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Caihua Zhang
- Center for Translational Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
| | - Bin Zou
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Leitao Zhang
- Department of Oral and Maxillofacial Surgery, Nanfang Hospital, Southern Medical University, Guangzhou, China
| | - Xiqiang Liu
- Department of Oral and Maxillofacial Surgery, Nanfang Hospital, Southern Medical University, Guangzhou, China
| | - Hongzhang Huang
- Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
- Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
| | - Demeng Chen
- Center for Translational Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
| | - Wei Cao
- Department of Oral and Maxillofacial & Head and Neck Oncology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
- National Center for Stomatology, National Clinical Research Center for Oral diseases, Shanghai Key Laboratory of Stomatology, Shanghai, China.
| | - Cheng Wang
- Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China.
- Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China.
- Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China.
|
18
|
Holton E, Muskovic W, Powell JE. Deciphering cancer cell state plasticity with single-cell genomics and artificial intelligence. Genome Med 2024; 16:36. [PMID: 38409176 PMCID: PMC10897991 DOI: 10.1186/s13073-024-01309-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 02/21/2024] [Indexed: 02/28/2024] Open
Abstract
Cancer stem cell plasticity refers to the ability of tumour cells to dynamically switch between states, for example, from cancer stem cells to non-cancer stem cell states. Governed by regulatory processes, cells transition through a continuum, with this transition space often referred to as a cell state landscape. Plasticity in cancer cell states leads to divergent biological behaviours, with certain cell states, or state transitions, responsible for tumour progression and therapeutic response. The advent of single-cell assays means these features can now be measured for individual cancer cells and at scale. However, the high dimensionality of these data, complex relationships between genomic features, and a lack of precise knowledge of the genomic profiles defining cancer cell states have opened the door for artificial intelligence methods for depicting cancer cell state landscapes. The contribution of cell state plasticity to cancer phenotypes such as treatment resistance, metastasis, and dormancy has been masked by analysis of 'bulk' genomic data, constituted of the average signal from millions of cells. Single-cell technologies solve this problem by producing a high-dimensional cellular landscape of the tumour ecosystem, quantifying the genomic profiles of individual cells, and creating a more detailed model to investigate cancer plasticity (Genome Res 31:1719, 2021; Semin Cancer Biol 53: 48-58, 2018; Signal Transduct Target Ther 5:1-36, 2020). In conjunction, rapid development in artificial intelligence methods has led to numerous tools that can be employed to study cancer cell plasticity.
Affiliation(s)
- Emily Holton
- Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, NSW, 2010, Australia
- School of Biomedical Science, Faculty of Medicine UNSW Sydney, Kensington, NSW, 2010, Australia
- UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, NSW, 2052, Australia
| | - Walter Muskovic
- Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, NSW, 2010, Australia
- UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, NSW, 2052, Australia
| | - Joseph E Powell
- Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, NSW, 2010, Australia.
- School of Biomedical Science, Faculty of Medicine UNSW Sydney, Kensington, NSW, 2010, Australia.
- UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, NSW, 2052, Australia.
|
19
|
Brunnsåker D, Kronström F, Tiukova IA, King RD. Interpreting protein abundance in Saccharomyces cerevisiae through relational learning. Bioinformatics 2024; 40:btae050. [PMID: 38273672 PMCID: PMC10868306 DOI: 10.1093/bioinformatics/btae050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 01/16/2024] [Accepted: 01/23/2024] [Indexed: 01/27/2024] Open
Abstract
MOTIVATION: Proteomic profiles reflect the functional readout of the physiological state of an organism. An increased understanding of what controls and defines protein abundances is of high scientific interest. Saccharomyces cerevisiae is a well-studied model organism, and there is a large amount of structured knowledge on yeast systems biology in databases such as the Saccharomyces Genome Database, and highly curated genome-scale metabolic models like Yeast8. These datasets, the result of decades of experiments, are abundant in information, and adhere to semantically meaningful ontologies. RESULTS: By representing this knowledge in an expressive Datalog database, we generated data descriptors using relational learning that, when combined with supervised machine learning, enable us to predict protein abundances in an explainable manner. We learnt predictive relationships between protein abundances, function and phenotype, such as α-amino acid accumulations and deviations in chronological lifespan. We further demonstrate the power of this methodology on the proteins His4 and Ilv2, connecting qualitative biological concepts to quantified abundances. AVAILABILITY AND IMPLEMENTATION: All data and processing scripts are available at the following GitHub repository: https://github.com/DanielBrunnsaker/ProtPredict.
Affiliation(s)
- Daniel Brunnsåker
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden
| | - Filip Kronström
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden
| | - Ievgeniia A Tiukova
- Department of Life Sciences, Chalmers University of Technology, Gothenburg 412 96, Sweden
- Department of Industrial Biotechnology, KTH Royal Institute of Technology, Stockholm 106 91, Sweden
| | - Ross D King
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
- The Alan Turing Institute, London NW1 2DB, United Kingdom
|
20
|
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure-primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. Genome Biol 2024; 25:24. [PMID: 38238840 PMCID: PMC10797903 DOI: 10.1186/s13059-023-03134-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 11/30/2023] [Indexed: 01/22/2024] Open
Abstract
BACKGROUND: Modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of genome-wide transcription factor activity (TFA), making it difficult to separate covariance and regulatory interactions. Inference of regulatory interactions and TFA requires aggregation of complementary evidence. Estimating TFA explicitly is problematic as it disconnects GRN inference and TFA estimation and is unable to account for, for example, contextual transcription factor-transcription factor interactions and other higher-order features. Deep learning offers a potential solution, as it can model complex interactions and higher-order latent features, although it does not provide interpretable models and latent features. RESULTS: We propose a novel autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), for modeling, and a metric, explained relative variance (ERV), for interpretation of GRNs. We evaluate SupirFactor with ERV in a wide set of contexts. Compared to current state-of-the-art GRN inference methods, SupirFactor performs favorably. We evaluate latent feature activity as an estimate of TFA and biological function in S. cerevisiae as well as in peripheral blood mononuclear cells (PBMC). CONCLUSION: Here we present a framework for structure-primed inference and interpretation of GRNs, SupirFactor, demonstrating interpretability using ERV in multiple biological and experimental settings. SupirFactor enables TFA estimation and pathway analysis using latent factor activity, demonstrated here on two large-scale single-cell datasets, modeling S. cerevisiae and PBMC. We find that the SupirFactor model facilitates biological analysis, acquiring novel functional and regulatory insight.
Affiliation(s)
- Andreas Tjärnberg
- Center for Developmental Genetics, New York University, New York, NY, 10003, USA.
- Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA.
- Department of Biology, NYU, New York, NY, 10008, USA.
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10010, USA.
- Department of Neuro-Science, University of Wisconsin-Madison - Waisman Center, Madison, USA.
| | - Maggie Beheler-Amass
- Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
- Department of Biology, NYU, New York, NY, 10008, USA
| | - Christopher A Jackson
- Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
- Department of Biology, NYU, New York, NY, 10008, USA
| | - Lionel A Christiaen
- Center for Developmental Genetics, New York University, New York, NY, 10003, USA
- Department of Biology, NYU, New York, NY, 10008, USA
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
- Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
| | - David Gresham
- Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
- Department of Biology, NYU, New York, NY, 10008, USA
| | - Richard Bonneau
- Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA.
- Department of Biology, NYU, New York, NY, 10008, USA.
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, 10010, USA.
- Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY, 10003, USA.
- Center For Data Science, NYU, New York, NY, 10008, USA.
- Prescient Design, a Genentech accelerator, New York, NY, 10010, USA.
|
21
|
Xie L, Raj Y, Varathan P, He B, Yu M, Nho K, Salama P, Saykin AJ, Yan J. Deep Trans-Omic Network Fusion for Molecular Mechanism of Alzheimer's Disease. J Alzheimers Dis 2024; 99:715-727. [PMID: 38728189 DOI: 10.3233/jad-240098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2024]
Abstract
Background: There are various molecular hypotheses regarding Alzheimer's disease (AD), such as amyloid deposition, tau propagation, neuroinflammation, and synaptic dysfunction. However, the detailed molecular mechanism underlying AD remains elusive. In addition, the genetic contribution of these molecular hypotheses is not yet established despite the high heritability of AD. Objective: The study aims to enable the discovery of functionally connected multi-omic features through novel integration of multi-omic data and prior functional interactions. Methods: We propose a new deep learning model, MoFNet, with improved interpretability to investigate the AD molecular mechanism and its upstream genetic contributors. MoFNet integrates multi-omic data with prior functional interactions between SNPs, genes, and proteins, and for the first time models the dynamic information flow from DNA to RNA and proteins. Results: When evaluated using the ROS/MAP cohort, MoFNet outperformed other competing methods in prediction performance. It identified SNPs, genes, and proteins with significantly more prior functional interactions, resulting in three multi-omic subnetworks. SNP-gene pairs identified by MoFNet were mostly eQTLs specific to frontal cortex tissue, where gene/protein data was collected. These molecular subnetworks are enriched in the innate immune system, clearance of misfolded proteins, and neurotransmitter release, respectively. We validated most findings in an independent dataset. One multi-omic subnetwork consists exclusively of core members of the SNARE complex, a key mediator of synaptic vesicle fusion and neurotransmitter transportation. Conclusions: Our results suggest that MoFNet is effective in improving classification accuracy and in identifying multi-omic markers for AD with improved interpretability. Multi-omic subnetworks identified by MoFNet provided more detailed insights into the AD molecular mechanism.
Affiliation(s)
- Linhui Xie
- Department of Electrical and Computer Engineering, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
- Indiana Alzheimer's Disease Research Center, Indianapolis, IN, USA
| | - Yash Raj
- Department of BioHealth Informatics, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
| | - Pradeep Varathan
- Department of BioHealth Informatics, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
- Indiana Alzheimer's Disease Research Center, Indianapolis, IN, USA
| | - Bing He
- Department of BioHealth Informatics, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
- Indiana Alzheimer's Disease Research Center, Indianapolis, IN, USA
| | - Meichen Yu
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Indiana Alzheimer's Disease Research Center, Indianapolis, IN, USA
| | - Kwangsik Nho
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Indiana Alzheimer's Disease Research Center, Indianapolis, IN, USA
| | - Paul Salama
- Department of Electrical and Computer Engineering, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
| | - Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Indiana Alzheimer's Disease Research Center, Indianapolis, IN, USA
| | - Jingwen Yan
- Department of BioHealth Informatics, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
- Indiana Alzheimer's Disease Research Center, Indianapolis, IN, USA
| |
|
22
|
Monshizadeh M, Ye Y. Incorporating metabolic activity, taxonomy and community structure to improve microbiome-based predictive models for host phenotype prediction. Gut Microbes 2024; 16:2302076. [PMID: 38214657 PMCID: PMC10793686 DOI: 10.1080/19490976.2024.2302076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 01/02/2024] [Indexed: 01/13/2024] Open
Abstract
We developed MicroKPNN, a prior-knowledge guided interpretable neural network for microbiome-based human host phenotype prediction. The prior knowledge used in MicroKPNN includes the metabolic activities of different bacterial species, phylogenetic relationships, and bacterial community structure, all in a shallow neural network. Application of MicroKPNN to seven gut microbiome datasets (involving five different human diseases including inflammatory bowel disease, type 2 diabetes, liver cirrhosis, colorectal cancer, and obesity) shows that incorporation of the prior knowledge helped improve the microbiome-based host phenotype prediction. MicroKPNN outperformed fully connected neural network-based approaches in all seven cases, with the most improvement of accuracy in the prediction of type 2 diabetes. MicroKPNN outperformed a recently developed deep-learning based approach DeepMicro, which selects the best combination of autoencoder and machine learning approach to make predictions, in all of the seven cases. Importantly, we showed that MicroKPNN provides a way for interpretation of the predictive models. Using importance scores estimated for the hidden nodes, MicroKPNN could provide explanations for prior research findings by highlighting the roles of specific microbiome components in phenotype predictions. In addition, it may suggest potential future research directions for studying the impacts of microbiome on host health and diseases. MicroKPNN is publicly available at https://github.com/mgtools/MicroKPNN.
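The idea of a prior-knowledge guided layer can be illustrated with a minimal sketch. It is not the MicroKPNN implementation: the function name, the toy weights, and the mask below are hypothetical, and the only assumption carried over from the abstract is that connections between inputs and hidden nodes are restricted to those supported by prior knowledge.

```python
import math

def masked_layer(x, weights, mask, bias):
    """Forward pass of one knowledge-masked dense layer with a sigmoid activation.

    mask[j][i] is 1 if prior knowledge links input i to hidden node j, else 0,
    so a disallowed connection contributes nothing regardless of its weight.
    """
    out = []
    for j in range(len(weights)):
        z = bias[j] + sum(w * m * xi for w, m, xi in zip(weights[j], mask[j], x))
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out

# Hypothetical toy data: 3 species abundances feeding 2 hidden nodes.
abundances = [0.2, 0.5, 0.3]
weights = [[1.0, -2.0, 0.5], [0.7, 0.3, -1.2]]
mask = [[1, 1, 0], [0, 0, 1]]  # node 0 sees species 0 and 1; node 1 sees species 2 only
bias = [0.0, 0.0]
hidden = masked_layer(abundances, weights, mask, bias)
```

Because each hidden node only receives inputs that the mask allows, an importance score assigned to that node can be read in terms of the prior-knowledge group it represents.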
Affiliation(s)
- Mahsa Monshizadeh
- Computer Science Department, Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
| | - Yuzhen Ye
- Computer Science Department, Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
| |
|
23
|
YOUSEF M, ALLMER J. Deep learning in bioinformatics. Turk J Biol 2023; 47:366-382. [PMID: 38681776 PMCID: PMC11045206 DOI: 10.55730/1300-0152.2671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 12/28/2023] [Accepted: 12/18/2023] [Indexed: 05/01/2024] Open
Abstract
Deep learning is a powerful machine learning technique that can learn from large amounts of data using multiple layers of artificial neural networks. This paper reviews some applications of deep learning in bioinformatics, a field that deals with analyzing and interpreting biological data. We first introduce the basic concepts of deep learning and then survey the recent advances and challenges of applying deep learning to various bioinformatics problems, such as genome sequencing, gene expression analysis, protein structure prediction, drug discovery, and disease diagnosis. We also discuss future directions and opportunities for deep learning in bioinformatics. We aim to provide an overview of deep learning so that bioinformaticians applying deep learning models can consider all critical technical and ethical aspects. Thus, our target audience is biomedical informatics researchers who use deep learning models for inference. This review will inspire more bioinformatics researchers to adopt deep-learning methods for their research questions while considering fairness, potential biases, explainability, and accountability.
Affiliation(s)
- Malik YOUSEF
- Department of Information Systems, Zefat Academic College, Zefat, Israel
| | - Jens ALLMER
- Medical Informatics and Bioinformatics, Institute for Measurement Engineering and Sensor Technology, Hochschule Ruhr West, University of Applied Sciences, Mülheim an der Ruhr, Germany
| |
|
24
|
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
| | - Pavel Kohout
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Faraneh Haddadi
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Anton Bushuiev
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
| | - Raman Samusevich
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
| | - Jiri Sedlar
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
| | - Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Tomas Pluskal
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
| | - Josef Sivic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
| | - Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| |
|
25
|
Tarquino J, Arabyarmohammadi S, Tejada RE, Madabhushi A, Romero E. Intra-nucleus mosaic pattern (InMop) and whole-cell Haralick combined-descriptor for identifying and characterizing acute leukemia blasts on single cell peripheral blood images. Cytometry A 2023; 103:857-867. [PMID: 37565838 PMCID: PMC10841385 DOI: 10.1002/cyto.a.24785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2022] [Revised: 07/14/2023] [Accepted: 08/08/2023] [Indexed: 08/12/2023]
Abstract
Acute leukemia is usually diagnosed when a test of peripheral blood shows at least 20% of abnormal immature cells (blasts), a figure even lower in case of recurrent cytogenetic abnormalities. Blast identification is crucial for white blood cell (WBC) counting, which depends on both identifying the cell type and characterizing the cellular morphology, processes susceptible to inter- and intraobserver variability. The present work introduces an image combined-descriptor to detect blasts and determine their probable lineage. This strategy uses an intra-nucleus mosaic pattern (InMop) descriptor that captures subtle nuclei differences within WBCs, and Haralick's statistics, which quantify the local structure of both nucleus and cytoplasm. The InMop captures WBC inner-nucleus structure by applying a multiscale Shearlet decomposition over a repetitive pattern (mosaic) of automatically-segmented nuclei. As a complement, Haralick's statistics characterize the local structure of the whole cell from an intensity co-occurrence matrix representation. Both InMop and Haralick-based descriptors are calculated using the b-channel from Lab color-space. The combined-descriptor is assessed by differentiating blasts from nonleukemic cells with support vector machine (SVM) classifiers and different transformation kernels, in two public and independent databases. The first database, D1 (n = 260), is composed of healthy and acute lymphoid leukemia (ALL) single cell images, and the second database, D2, contains acute myeloid leukemia (AML) blast (n = 3294) and nonblast (n = 15,071) cell images. In a first experiment, blast versus nonblast differentiation is performed by training with a subset of D2 (n = 6588) and testing in D1 (n = 260), obtaining a training AUC of 0.991 ± 0.002 and AUC = 0.782 for the independent validation. A second experiment automatically differentiates AML blasts (260 images from D2) from ALL blasts (260 images from D1), with an AUC of 0.93.
In a third experiment, state-of-the-art strategies, VGG16 and RESNEXT convolutional neural networks (CNN), separate blast from nonblast cells in both databases. The VGG16 showed an AUC of 0.673 and the RESNEXT of 0.75. Reported metrics for all the experiments are area under the ROC curve (AUC), accuracy and F1-score.
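As a rough illustration of the co-occurrence matrix representation behind Haralick's statistics, the sketch below builds a normalized, symmetric gray-level co-occurrence matrix for a single horizontal offset and derives two classic features from it. The 4x4 patch, the offset, and the choice of contrast and energy are assumptions made for the example; the abstract does not state which offsets, gray-level quantization, or subset of Haralick features the authors used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count each neighbor pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def contrast(p):
    """Haralick contrast: co-occurrence probability weighted by squared gray-level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

def energy(p):
    """Haralick angular second moment: sum of squared co-occurrence probabilities."""
    return sum(v * v for row in p for v in row)

# Hypothetical 4x4 patch quantized to 4 gray levels.
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(patch, levels=4)
features = [contrast(p), energy(p)]
```

In a pipeline like the one described, a feature vector of this kind would be computed per cell image and passed to a classifier such as an SVM.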
Affiliation(s)
- Jonathan Tarquino
- Computer Imaging and Medical Application Laboratory, Universidad Nacional de Colombia, Bogotá, Colombia
| | - Sara Arabyarmohammadi
- Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, GA, USA
| | - Rafael Enrique Tejada
- Department of internal medicine, Hemato-oncology unit, Medicine Faculty, Universidad Nacional de Colombia, Bogotá, Colombia
| | - Anant Madabhushi
- Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, GA, USA
- Atlanta Veterans Medical Center, Atlanta, GA, USA
| | - Eduardo Romero
- Computer Imaging and Medical Application Laboratory, Universidad Nacional de Colombia, Bogotá, Colombia
| |
|
26
|
Wang K, Theeke LA, Liao C, Wang N, Lu Y, Xiao D, Xu C. Deep learning analysis of UPLC-MS/MS-based metabolomics data to predict Alzheimer's disease. J Neurol Sci 2023; 453:120812. [PMID: 37776718 DOI: 10.1016/j.jns.2023.120812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 08/22/2023] [Accepted: 09/14/2023] [Indexed: 10/02/2023]
Abstract
OBJECTIVE Metabolic biomarkers can potentially inform disease progression in Alzheimer's disease (AD). The purpose of this study is to identify and describe a new set of diagnostic biomarkers for developing deep learning (DL) tools to predict AD using Ultra Performance Liquid Chromatography Mass Spectrometry (UPLC-MS/MS)-based metabolomics data. METHODS A total of 177 individuals, including 78 with AD and 99 who were cognitively normal (CN), were selected from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort along with 150 metabolomic biomarkers. We performed feature selection using the Least Absolute Shrinkage and Selection Operator (LASSO). The H2O DL function was used to build multilayer feedforward neural networks to predict AD. RESULTS The LASSO selected 21 metabolic biomarkers. To develop DL models, the 21 biomarkers identified by LASSO were imported into the H2O package. The data were split into 70% for training and 30% for validation. The best DL model, with two layers and 18 neurons, achieved an accuracy of 0.881, F1-score of 0.892, and AUC of 0.873. Several metabolomic biomarkers involved in glucose and lipid metabolism, in particular bile acid metabolites, were associated with the APOE-ε4 allele and clinical biomarkers (Aβ42, tTau, pTau), cognitive assessments [the Alzheimer's Disease Assessment Scale-cognitive subscale 13 (ADAS13), the Mini-Mental State Examination (MMSE)], and hippocampus volume. CONCLUSIONS This study identified a new set of diagnostic metabolomic biomarkers for developing DL tools to predict AD. These biomarkers may help with early diagnosis, prognostic risk stratification, and/or early treatment interventions for patients at risk for AD.
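The evaluation setup described above (a 70%/30% split, then accuracy and F1-score on the held-out part) can be sketched as follows. This is only an illustration: the helper names, the random seed, and the rounding of the split point are assumptions, and the abstract does not say how the split was drawn or how H2O was configured.

```python
import random

def split_70_30(samples, seed=0):
    """Shuffle a copy of the samples and split it into 70% training and 30% validation."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels, with 1 as the positive (AD) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

# 177 placeholder subject indices, matching the cohort size reported in the abstract.
train_ids, valid_ids = split_70_30(list(range(177)))

# Hypothetical labels and predictions, used only to exercise the metric function.
acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

With 177 subjects, a split of this kind leaves roughly 53 individuals for validation, which is worth keeping in mind when reading the reported metrics.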
Affiliation(s)
- Kesheng Wang
- School of Nursing, Health Sciences Center, West Virginia University, Morgantown, WV 26506, USA.
| | - Laurie A Theeke
- School of Nursing, The George Washington University, Ashburn, VA 20147, USA
| | - Christopher Liao
- Department of Electrical and Computer Engineering, Boston University, MA 02215, USA
| | - Nianyang Wang
- Department of Health Policy and Management, School of Public Health, University of Maryland, College Park, MD 20742, USA
| | - Yongke Lu
- Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV 25755, USA
| | - Danqing Xiao
- Department of STEM, School of Arts and Sciences, Regis College, Weston, MA 02493, USA
| | - Chun Xu
- Department of Health and Biomedical Sciences, College of Health Professions, University of Texas Rio Grande Valley, Brownsville, TX 78520, USA.
| |
|
27
|
Esser-Skala W, Fortelny N. Reliable interpretability of biology-inspired deep neural networks. NPJ Syst Biol Appl 2023; 9:50. [PMID: 37816807 PMCID: PMC10564878 DOI: 10.1038/s41540-023-00310-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 09/15/2023] [Indexed: 10/12/2023] Open
Abstract
Deep neural networks display impressive performance but suffer from limited interpretability. Biology-inspired deep learning, where the architecture of the computational graph is based on biological knowledge, enables unique interpretability where real-world concepts are encoded in hidden nodes, which can be ranked by importance and thereby interpreted. In such models trained on single-cell transcriptomes, we previously demonstrated that node-level interpretations lack robustness upon repeated training and are influenced by biases in biological knowledge. Similar studies are missing for related models. Here, we test and extend our methodology for reliable interpretability in P-NET, a biology-inspired model trained on patient mutation data. We observe variability of interpretations and susceptibility to knowledge biases, and identify the network properties that drive interpretation biases. We further present an approach to control the robustness and biases of interpretations, which leads to more specific interpretations. In summary, our study reveals the broad importance of methods to ensure robust and bias-aware interpretability in biology-inspired deep learning.
Affiliation(s)
- Wolfgang Esser-Skala
- Computational Systems Biology Group, Department of Biosciences and Medical Biology, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria
| | - Nikolaus Fortelny
- Computational Systems Biology Group, Department of Biosciences and Medical Biology, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria.
| |
|
28
|
Halawani R, Buchert M, Chen YPP. Deep learning exploration of single-cell and spatially resolved cancer transcriptomics to unravel tumour heterogeneity. Comput Biol Med 2023; 164:107274. [PMID: 37506451 DOI: 10.1016/j.compbiomed.2023.107274] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/02/2023] [Revised: 07/03/2023] [Accepted: 07/16/2023] [Indexed: 07/30/2023]
Abstract
Tumour heterogeneity is one of the critical confounding aspects in decoding tumour growth. Malignant cells display variations in their gene transcription profiles and mutation spectra even when originating from a single progenitor cell. Single-cell and spatial transcriptomics sequencing have recently emerged as key technologies for unravelling tumour heterogeneity. Single-cell sequencing promotes individual cell-type identification through transcriptome-wide gene expression measurements of each cell. Spatial transcriptomics facilitates identification of cell-cell interactions and the structural organization of heterogeneous cells within a tumour tissue through associating spatial RNA abundance of cells at distinct spots in the tissue section. However, extracting features and analyzing single-cell and spatial transcriptomics data poses challenges. Single-cell transcriptome data is extremely noisy and its sparse nature and dropouts can lead to misinterpretation of gene expression and the misclassification of cell types. Deep learning predictive power can overcome data challenges, provide high-resolution analysis and enhance precision oncology applications that involve early cancer prognosis, diagnosis, patient survival estimation and anti-cancer therapy planning. In this paper, we provide a background to and review of the recent progress of deep learning frameworks to investigate tumour heterogeneity using both single-cell and spatial transcriptomics data types.
Affiliation(s)
- Raid Halawani
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
| | - Michael Buchert
- School of Cancer Medicine, La Trobe University, Melbourne, Victoria, Australia; Olivia Newton-John Cancer Research Institute, Melbourne, Victoria, Australia
| | - Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia.
| |
|
29
|
Faure L, Mollet B, Liebermeister W, Faulon JL. A neural-mechanistic hybrid approach improving the predictive power of genome-scale metabolic models. Nat Commun 2023; 14:4669. [PMID: 37537192 PMCID: PMC10400647 DOI: 10.1038/s41467-023-40380-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/01/2022] [Accepted: 07/19/2023] [Indexed: 08/05/2023] Open
Abstract
Constraint-based metabolic models have been used for decades to predict the phenotype of microorganisms in different environments. However, quantitative predictions are limited unless labor-intensive measurements of media uptake fluxes are performed. We show how hybrid neural-mechanistic models can serve as an architecture for machine learning, providing a way to improve phenotype predictions. We illustrate our hybrid models with growth rate predictions of Escherichia coli and Pseudomonas putida grown in different media and with phenotype predictions of gene knock-out Escherichia coli mutants. Our neural-mechanistic models systematically outperform constraint-based models and require training set sizes orders of magnitude smaller than classical machine learning methods. Our hybrid approach opens a doorway to enhancing constraint-based modeling: instead of constraining mechanistic models with additional experimental measurements, our hybrid models grasp the power of machine learning while fulfilling mechanistic constraints, thus saving time and resources in typical systems biology or biological engineering projects.
Affiliation(s)
- Léon Faure
- MICALIS Institute, INRAE, AgroParisTech, University of Paris-Saclay, 78350, Jouy-en-Josas, France
| | - Bastien Mollet
- Ecole Normale Supérieure of Lyon, 69342, Lyon, France
- UMR MIA, INRAE, AgroParisTech, University of Paris-Saclay, 91120, Palaiseau, France
| | | | - Jean-Loup Faulon
- MICALIS Institute, INRAE, AgroParisTech, University of Paris-Saclay, 78350, Jouy-en-Josas, France.
- Manchester Institute of Biotechnology, University of Manchester, Manchester, M1 7DN, UK.
| |
|
30
|
Wysocka M, Wysocki O, Zufferey M, Landers D, Freitas A. A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinformatics 2023; 24:198. [PMID: 37189058 PMCID: PMC10186658 DOI: 10.1186/s12859-023-05262-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/17/2022] [Accepted: 03/30/2023] [Indexed: 05/17/2023] Open
Abstract
BACKGROUND There is an increasing interest in the use of Deep Learning (DL) based methods as a supporting analytical framework in oncology. However, most direct applications of DL will deliver models with limited transparency and explainability, which constrain their deployment in biomedical settings. METHODS This systematic review discusses DL models used to support inference in cancer biology with a particular emphasis on multi-omics analysis. It focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability, fundamental properties in the biomedical domain. For this, we retrieved and analyzed 42 studies focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods. RESULTS We discuss the recent evolutionary arc of DL models in the direction of integrating prior biological relational and network knowledge (e.g. pathways or Protein-Protein-Interaction networks) to support better generalisation and interpretability. This represents a fundamental functional shift towards models which can integrate mechanistic and statistical inference aspects. We introduce a concept of bio-centric interpretability and, according to its taxonomy, we discuss representational methodologies for the integration of domain prior knowledge in such models. CONCLUSIONS The paper provides a critical outlook into contemporary methods for explainability and interpretability used in DL for cancer. The analysis points in the direction of a convergence between encoding prior knowledge and improved interpretability. We introduce bio-centric interpretability, which is an important step towards formalisation of biological interpretability of DL models and developing methods that are less problem- or application-specific.
Affiliation(s)
- Magdalena Wysocka
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
| | - Oskar Wysocki
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
| | - Marie Zufferey
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
| | - Dónal Landers
- DeLondra Oncology Ltd, 38 Carlton Avenue, Wilmslow, SK9 4EP UK
| | - André Freitas
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
| |
|
31
|
Hosseini-Gerami L, Higgins IA, Collier DA, Laing E, Evans D, Broughton H, Bender A. Benchmarking causal reasoning algorithms for gene expression-based compound mechanism of action analysis. BMC Bioinformatics 2023; 24:154. [PMID: 37072707 PMCID: PMC10111792 DOI: 10.1186/s12859-023-05277-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Accepted: 04/06/2023] [Indexed: 04/20/2023] Open
Abstract
BACKGROUND Elucidating compound mechanism of action (MoA) is beneficial to drug discovery, but in practice often represents a significant challenge. Causal Reasoning approaches aim to address this situation by inferring dysregulated signalling proteins using transcriptomics data and biological networks; however, a comprehensive benchmarking of such approaches has not yet been reported. Here we benchmarked four causal reasoning algorithms (SigNet, CausalR, CausalR ScanR and CARNIVAL) with four networks (the smaller Omnipath network vs. 3 larger MetaBase™ networks), using LINCS L1000 and CMap microarray data, and assessed to what extent each factor dictated the successful recovery of direct targets and compound-associated signalling pathways in a benchmark dataset comprising 269 compounds. We additionally examined the impact on performance of the functions and roles of protein targets and their connectivity bias in the prior knowledge networks. RESULTS According to statistical analysis (negative binomial model), the combination of algorithm and network most significantly dictated the performance of causal reasoning algorithms, with SigNet recovering the greatest number of direct targets. With respect to the recovery of signalling pathways, CARNIVAL with the Omnipath network was able to recover the most informative pathways containing compound targets, based on the Reactome pathway hierarchy. Additionally, CARNIVAL, SigNet and CausalR ScanR all outperformed baseline gene expression pathway enrichment results. We found no significant difference in performance between L1000 data and microarray data, even when limited to just 978 'landmark' genes. Notably, all causal reasoning algorithms also outperformed pathway recovery based on input DEGs, despite these often being used for pathway enrichment. The performance of causal reasoning methods was somewhat correlated with the connectivity and biological role of the targets.
CONCLUSIONS Overall, we conclude that causal reasoning performs well at recovering signalling proteins related to compound MoA upstream from gene expression changes by leveraging prior knowledge networks, and that the choice of network and algorithm has a profound impact on the performance of causal reasoning algorithms. Based on the analyses presented here this is true for both microarray-based gene expression data as well as those based on the L1000 platform.
Affiliation(s)
- Layla Hosseini-Gerami
- Department of Chemistry, Centre for Molecular Informatics, Cambridge, UK
- Ignota Labs, London, UK
| | | | - David A Collier
- Eli Lilly and Company, Bracknell, UK
- Social, Genetic and Developmental Psychiatry Centre, IoPPN, King's College London, London, UK
- Genetic and Genomic Consulting Ltd, Farnham, UK
| | - Emma Laing
- Eli Lilly and Company, Bracknell, UK
- GSK, Stevenage, UK
| | - David Evans
- Eli Lilly and Company, Bracknell, UK
- DeepMind, London, UK
| | - Howard Broughton
- Centre de Investigación, Eli Lilly and Company, Alcobendas, Spain
| | - Andreas Bender
- Department of Chemistry, Centre for Molecular Informatics, Cambridge, UK.
| |
|
32
|
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. bioRxiv 2023:2023.02.02.526909. [PMID: 36778259 PMCID: PMC9915715 DOI: 10.1101/2023.02.02.526909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]
Abstract
The modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of regulatory features in genome-wide screens. Most GRN inference methods are therefore forced to model relationships between regulatory genes and their targets with expression as a proxy for the upstream independent features, complicating validation and predictions produced by modeling frameworks. Separating covariance and regulatory influence requires aggregation of independent and complementary sets of evidence, such as transcription factor (TF) binding and target gene expression. However, the complete regulatory state of the system, e.g. TF activity (TFA) is unknown due to a lack of experimental feasibility, making regulatory relations difficult to infer. Some methods attempt to account for this by modeling TFA as a latent feature, but these models often use linear frameworks that are unable to account for non-linearities such as saturation, TF-TF interactions, and other higher order features. Deep learning frameworks may offer a solution, as they are capable of modeling complex interactions and capturing higher-order latent features. However, these methods often discard central concepts in biological systems modeling, such as sparsity and latent feature interpretability, in favor of increased model complexity. We propose a novel deep learning autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), that scales to single cell genomic data and maintains interpretability to perform GRN inference and estimate TFA as a latent feature. We demonstrate that SupirFactor outperforms current leading GRN inference methods, predicts biologically relevant TFA and elucidates functional regulatory pathways through aggregation of TFs.
Affiliation(s)
- Andreas Tjärnberg
- Center for Developmental Genetics, New York University, New York 10003 NY, USA
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10010, USA
| | - Maggie Beheler-Amass
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
| | - Christopher A Jackson
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
| | - Lionel A Christiaen
- Center for Developmental Genetics, New York University, New York 10003 NY, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
- Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
| | - David Gresham
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
| | - Richard Bonneau
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY 10003, USA
- Center For Data Science, NYU, New York, NY 10008, USA
- Prescient Design, a Genentech accelerator, New York, NY, 10010, USA
|
33
|
Novakovsky G, Dexter N, Libbrecht MW, Wasserman WW, Mostafavi S. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat Rev Genet 2023; 24:125-137. [PMID: 36192604 DOI: 10.1038/s41576-022-00532-2] [Citation(s) in RCA: 96] [Impact Index Per Article: 48.0] [Accepted: 08/31/2022] [Indexed: 01/24/2023]
Abstract
Artificial intelligence (AI) models based on deep learning now represent the state of the art for making functional predictions in genomics research. However, the underlying basis on which predictive models make such predictions is often unknown. For genomics researchers, this missing explanatory information would frequently be of greater value than the predictions themselves, as it can enable new insights into genetic processes. We review progress in the emerging area of explainable AI (xAI), a field with the potential to empower life science researchers to gain mechanistic insights into complex deep learning models. We discuss and categorize approaches for model interpretation, including an intuitive understanding of how each approach works and their underlying assumptions and limitations in the context of typical high-throughput biological datasets.
Affiliation(s)
- Gherman Novakovsky
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, British Columbia, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
| | - Nick Dexter
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada; School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Maxwell W Libbrecht
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada.
| | - Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, British Columbia, Canada.
| | - Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA; Canadian Institute for Advanced Research, Toronto, Ontario, Canada.
|
34
|
Lotfollahi M, Rybakov S, Hrovatin K, Hediyeh-Zadeh S, Talavera-López C, Misharin AV, Theis FJ. Biologically informed deep learning to query gene programs in single-cell atlases. Nat Cell Biol 2023; 25:337-350. [PMID: 36732632 PMCID: PMC9928587 DOI: 10.1038/s41556-022-01072-x] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 03/02/2022] [Accepted: 12/08/2022] [Indexed: 02/04/2023]
Abstract
The increasing availability of large-scale single-cell atlases has enabled the detailed description of cell states. In parallel, advances in deep learning allow rapid analysis of newly generated query datasets by mapping them into reference atlases. However, existing data transformations learned to map query data are not easily explainable using biologically known concepts such as genes or pathways. Here we propose expiMap, a biologically informed deep-learning architecture that enables single-cell reference mapping. ExpiMap learns to map cells into biologically understandable components representing known 'gene programs'. The activity of each cell for a gene program is learned while simultaneously refining them and learning de novo programs. We show that expiMap compares favourably to existing methods while bringing an additional layer of interpretability to integrative single-cell analysis. Furthermore, we demonstrate its applicability to analyse single-cell perturbation responses in different tissues and species and resolve responses of patients who have coronavirus disease 2019 to different treatments across cell types.
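One common way to tie latent dimensions to known gene programs, as the abstract above describes, is a linear decoder whose weights are masked by program membership. The following is a minimal sketch of that idea only, not the authors' implementation; the gene names, program names, weights, and the functions `build_mask` and `masked_decode` are all hypothetical.

```python
# Hypothetical gene programs; names and memberships are made up for illustration.
GENES = ["G1", "G2", "G3", "G4"]
PROGRAMS = {"program_A": {"G1", "G2"}, "program_B": {"G3", "G4"}}


def build_mask(genes, programs):
    """Binary mask: mask[p][g] is 1 if gene g belongs to program p, else 0."""
    return [[1 if g in members else 0 for g in genes]
            for members in programs.values()]


def masked_decode(latent, weights, mask):
    """Linear decoder whose weights are zeroed outside each program's gene set."""
    n_genes = len(mask[0])
    out = [0.0] * n_genes
    for p, z in enumerate(latent):
        for g in range(n_genes):
            # A latent program can only contribute to its annotated genes.
            out[g] += z * weights[p][g] * mask[p][g]
    return out


mask = build_mask(GENES, PROGRAMS)
# Arbitrary illustrative weights (programs x genes); in practice these are learned.
weights = [[0.5, 1.0, 2.0, 3.0], [4.0, 5.0, 0.25, 0.75]]
# Activating only program_A reconstructs only G1 and G2.
expression = masked_decode([2.0, 0.0], weights, mask)
```

Because each latent coordinate can only influence the genes in its program, its value can be read as that program's activity in a cell, which is the kind of interpretability the abstract refers to.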
Affiliation(s)
- Mohammad Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Wellcome Sanger Institute, Cambridge, UK
| | - Sergei Rybakov
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
| | - Karin Hrovatin
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Soroor Hediyeh-Zadeh
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Bioinformatics Division, WEHI, Melbourne, Victoria, Australia
| | - Carlos Talavera-López
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Division of Infectious Diseases and Tropical Medicine, Ludwig-Maximilian-Universität Klinikum, Munich, Germany
| | - Alexander V Misharin
- Division of Pulmonary and Critical Care Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany.
- Wellcome Sanger Institute, Cambridge, UK.
- Department of Mathematics, Technical University of Munich, Munich, Germany.
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany.
|
35
|
Li MM, Huang K, Zitnik M. Graph representation learning in biomedicine and healthcare. Nat Biomed Eng 2022; 6:1353-1369. [PMID: 36316368 PMCID: PMC10699434 DOI: 10.1038/s41551-022-00942-x] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Received: 10/30/2021] [Accepted: 08/09/2022] [Indexed: 11/11/2022]
Abstract
Networks, or graphs, are universal descriptors of systems of interacting elements. In biomedicine and healthcare, they can represent, for example, molecular interactions, signalling pathways, disease co-morbidities or healthcare systems. In this Perspective, we posit that representation learning can realize principles of network medicine, discuss successes and current limitations of the use of representation learning on graphs in biomedicine and healthcare, and outline algorithmic strategies that leverage the topology of graphs to embed them into compact vectorial spaces. We argue that graph representation learning will keep pushing forward machine learning for biomedicine and healthcare applications, including the identification of genetic variants underlying complex traits, the disentanglement of single-cell behaviours and their effects on health, the assistance of patients in diagnosis and treatment, and the development of safe and effective medicines.
Affiliation(s)
- Michelle M Li
- Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Kexin Huang
- Health Data Science Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Harvard Data Science Initiative, Cambridge, MA, USA.
|
36
|
Ghosh Roy G, Geard N, Verspoor K, He S. MPVNN: Mutated Pathway Visible Neural Network architecture for interpretable prediction of cancer-specific survival risk. Bioinformatics 2022; 38:5026-5032. [PMID: 36124954 DOI: 10.1093/bioinformatics/btac636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Revised: 08/04/2022] [Accepted: 09/16/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Survival risk prediction using gene expression data is important in making treatment decisions in cancer. Standard neural network (NN) survival analysis models are black boxes with a lack of interpretability. More interpretable visible neural network architectures are designed using biological pathway knowledge. But they do not model how pathway structures can change for particular cancer types. RESULTS: We propose a novel Mutated Pathway Visible Neural Network (MPVNN) architecture, designed using prior signaling pathway knowledge and random replacement of known pathway edges using gene mutation data simulating signal flow disruption. As a case study, we use the PI3K-Akt pathway and demonstrate overall improved cancer-specific survival risk prediction of MPVNN over other similar-sized NN and standard survival analysis methods. We show that trained MPVNN architecture interpretation, which points to smaller sets of genes connected by signal flow within the PI3K-Akt pathway that is important in risk prediction for particular cancer types, is reliable. AVAILABILITY AND IMPLEMENTATION: The data and code are available at https://github.com/gourabghoshroy/MPVNN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gourab Ghosh Roy
- School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK; School of Computing and Information Systems, University of Melbourne, Melbourne 3052, Australia
| | - Nicholas Geard
- School of Computing and Information Systems, University of Melbourne, Melbourne 3052, Australia
| | - Karin Verspoor
- School of Computing and Information Systems, University of Melbourne, Melbourne 3052, Australia; School of Computing Technologies, RMIT University, Melbourne 3000, Australia
| | - Shan He
- School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK
|
37
|
Kolmar L, Autour A, Ma X, Vergier B, Eduati F, Merten CA. Technological and computational advances driving high-throughput oncology. Trends Cell Biol 2022; 32:947-961. [PMID: 35577671 DOI: 10.1016/j.tcb.2022.04.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/08/2022] [Revised: 04/11/2022] [Accepted: 04/20/2022] [Indexed: 01/21/2023]
Abstract
Engineering and computational advances have opened many new avenues in cancer research, particularly when being exploited in interdisciplinary approaches. For example, the combination of microfluidics, novel sequencing technologies, and computational analyses has been crucial to enable single-cell assays, giving a detailed picture of tumor heterogeneity for the very first time. In a similar way, these 'tech' disciplines have been elementary for generating large data sets in multidimensional cancer 'omics' approaches, cell-cell interaction screens, 3D tumor models, and tissue level analyses. In this review we summarize the most important technology and computational developments that have been or will be instrumental for transitioning classical cancer research to a large data-driven, high-throughput, high-content discipline across all biological scales.
Affiliation(s)
- Leonie Kolmar
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Alexis Autour
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | - Xiaoli Ma
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Blandine Vergier
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Federica Eduati
- Department of Biomedical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands; Institute for Complex Molecular Systems, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands.
| | - Christoph A Merten
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.
|
38
|
Brendel M, Su C, Bai Z, Zhang H, Elemento O, Wang F. Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review. Genomics Proteomics Bioinformatics 2022; 20:814-835. [PMID: 36528240 PMCID: PMC10025684 DOI: 10.1016/j.gpb.2022.11.011] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/23/2022] [Revised: 08/17/2022] [Accepted: 11/24/2022] [Indexed: 12/23/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously. Analysis of scRNA-seq data plays an important role in the study of cell states and phenotypes, and has helped elucidate biological processes, such as those occurring during the development of complex organisms, and improved our understanding of disease states, such as cancer, diabetes, and coronavirus disease 2019 (COVID-19). Deep learning, a recent advance of artificial intelligence that has been used to address many problems involving large datasets, has also emerged as a promising tool for scRNA-seq data analysis, as it has a capacity to extract informative and compact features from noisy, heterogeneous, and high-dimensional scRNA-seq data to improve downstream analysis. The present review aims at surveying recently developed deep learning techniques in scRNA-seq data analysis, identifying key steps within the scRNA-seq data analysis pipeline that have been advanced by deep learning, and explaining the benefits of deep learning over more conventional analytic tools. Finally, we summarize the challenges in current deep learning approaches faced within scRNA-seq data and discuss potential directions for improvements in deep learning algorithms for scRNA-seq data analysis.
Affiliation(s)
- Matthew Brendel
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA; Institute for Computational Biomedicine, Caryl and Israel Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Chang Su
- Department of Health Service Administration and Policy, Temple University, Philadelphia, PA 19122, USA.
| | - Zilong Bai
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Hao Zhang
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Olivier Elemento
- Institute for Computational Biomedicine, Caryl and Israel Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA.
|
39
|
Nussinov R, Zhang M, Liu Y, Jang H. AlphaFold, Artificial Intelligence (AI), and Allostery. J Phys Chem B 2022; 126:6372-6383. [PMID: 35976160 PMCID: PMC9442638 DOI: 10.1021/acs.jpcb.2c04346] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 06/22/2022] [Revised: 08/03/2022] [Indexed: 02/08/2023]
Abstract
AlphaFold has burst into our lives. A powerful algorithm that underscores the strength of biological sequence data and artificial intelligence (AI). AlphaFold has appended projects and research directions. The database it has been creating promises an untold number of applications with vast potential impacts that are still difficult to surmise. AI approaches can revolutionize personalized treatments and usher in better-informed clinical trials. They promise to make giant leaps toward reshaping and revamping drug discovery strategies, selecting and prioritizing combinations of drug targets. Here, we briefly overview AI in structural biology, including in molecular dynamics simulations and prediction of microbiota-human protein-protein interactions. We highlight the advancements accomplished by the deep-learning-powered AlphaFold in protein structure prediction and their powerful impact on the life sciences. At the same time, AlphaFold does not resolve the decades-long protein folding challenge, nor does it identify the folding pathways. The models that AlphaFold provides do not capture conformational mechanisms like frustration and allostery, which are rooted in ensembles, and controlled by their dynamic distributions. Allostery and signaling are properties of populations. AlphaFold also does not generate ensembles of intrinsically disordered proteins and regions, instead describing them by their low structural probabilities. Since AlphaFold generates single ranked structures, rather than conformational ensembles, it cannot elucidate the mechanisms of allosteric activating driver hotspot mutations nor of allosteric drug resistance. However, by capturing key features, deep learning techniques can use the single predicted conformation as the basis for generating a diverse ensemble.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
| | - Mingzhen Zhang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
| | - Yonglan Liu
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, Maryland 21702, United States
| | - Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
|
40
|
Zhao X, Lan Y, Chen D. Exploring long non-coding RNA networks from single cell omics data. Comput Struct Biotechnol J 2022; 20:4381-4389. [PMID: 36051880 PMCID: PMC9403499 DOI: 10.1016/j.csbj.2022.08.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/14/2022] [Revised: 08/01/2022] [Accepted: 08/01/2022] [Indexed: 11/03/2022] Open
|
41
|
Garrido‐Rodriguez M, Zirngibl K, Ivanova O, Lobentanzer S, Saez‐Rodriguez J. Integrating knowledge and omics to decipher mechanisms via large-scale models of signaling networks. Mol Syst Biol 2022; 18:e11036. [PMID: 35880747 PMCID: PMC9316933 DOI: 10.15252/msb.202211036] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 03/17/2022] [Revised: 05/12/2022] [Accepted: 05/31/2022] [Indexed: 11/10/2022] Open
Abstract
Signal transduction governs cellular behavior, and its dysregulation often leads to human disease. To understand this process, we can use network models based on prior knowledge, where nodes represent biomolecules, usually proteins, and edges indicate interactions between them. Several computational methods combine untargeted omics data with prior knowledge to estimate the state of signaling networks in specific biological scenarios. Here, we review, compare, and classify recent network approaches according to their characteristics in terms of input omics data, prior knowledge and underlying methodologies. We highlight existing challenges in the field, such as the general lack of ground truth and the limitations of prior knowledge. We also point out new omics developments that may have a profound impact, such as single-cell proteomics or large-scale profiling of protein conformational changes. We provide both an introduction for interested users seeking strategies to study cell signaling on a large scale and an update for seasoned modelers.
Affiliation(s)
- Martin Garrido‐Rodriguez
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Katharina Zirngibl
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Olga Ivanova
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Sebastian Lobentanzer
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Julio Saez‐Rodriguez
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
|
42
|
Wei Z, Han D, Zhang C, Wang S, Liu J, Chao F, Song Z, Chen G. Deep Learning-Based Multi-Omics Integration Robustly Predicts Relapse in Prostate Cancer. Front Oncol 2022; 12:893424. [PMID: 35814412 PMCID: PMC9259796 DOI: 10.3389/fonc.2022.893424] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/10/2022] [Accepted: 05/13/2022] [Indexed: 11/13/2022] Open
Abstract
Objective: Post-operative biochemical relapse (BCR) continues to occur in a significant percentage of patients with localized prostate cancer (PCa). Current stratification methods are not adequate to identify high-risk patients. The present study exploits the ability of deep learning (DL) algorithms using the H2O package to combine multi-omics data to resolve this problem. Methods: Five-omics data from 417 PCa patients from The Cancer Genome Atlas (TCGA) were used to construct the DL-based, relapse-sensitive model. Among them, 265 (63.5%) individuals experienced BCR. Five additional independent validation sets were applied to assess its predictive robustness. Bioinformatics analyses of two relapse-associated subgroups were then performed for identification of differentially expressed genes (DEGs), enriched pathway analysis, copy number analysis and immune cell infiltration analysis. Results: The DL-based model, with a significant difference (P = 6e-9) between two subgroups and good concordance index (C-index = 0.767), was proven to be robust by external validation. 1530 DEGs including 678 up- and 852 down-regulated genes were identified in the high-risk subgroup S2 compared with the low-risk subgroup S1. Enrichment analyses found five hallmark gene sets were up-regulated while 13 were down-regulated. Then, we found that DNA damage repair pathways were significantly enriched in the S2 subgroup. CNV analysis showed that 30.18% of genes were significantly up-regulated and gene amplification on chromosomes 7 and 8 was significantly elevated in the S2 subgroup. Moreover, enrichment analysis revealed that some DEGs and pathways were associated with immunity. Three tumor-infiltrating immune cell (TIIC) groups with a higher proportion in the S2 subgroup (p = 1e-05, p = 8.7e-06, p = 0.00014) and one TIIC group with a higher proportion in the S1 subgroup (P = 1.3e-06) were identified. Conclusion: We developed a novel, robust classification for understanding PCa relapse. This study validated the effectiveness of deep learning technique in prognosis prediction, and the method may benefit patients and prevent relapse by improving early detection and advancing early intervention.
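The concordance index reported in the abstract above measures how often a model ranks patients' risks in the same order as their observed relapse times. As a rough illustration of the metric itself (not the study's pipeline), here is a minimal sketch of Harrell's C-index; the function name `concordance_index` and the toy follow-up data are hypothetical, and tied event times are simply not compared.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    times  -- follow-up time per subject
    events -- 1 if the event (e.g. relapse) was observed, 0 if censored
    risks  -- predicted risk score; higher should mean earlier event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up): four subjects, the third one is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risks)  # 4 of 5 comparable pairs agree
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported 0.767.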
Affiliation(s)
- Ziwei Wei
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
| | - Dunsheng Han
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
| | - Cong Zhang
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
| | - Shiyu Wang
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
| | - Jinke Liu
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
| | - Fan Chao
- Department of Urology, Zhongshan Hospital, Fudan University (Xiamen Branch), Xiamen, China
| | - Zhenyu Song
- Ovarian Cancer Program, Department of Gynecologic Oncology, Zhongshan Hospital, Fudan University, Shanghai, China
- *Correspondence: Gang Chen; Zhenyu Song
| | - Gang Chen
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
- *Correspondence: Gang Chen; Zhenyu Song
|
43
|
Nilsson A, Peters JM, Meimetis N, Bryson B, Lauffenburger DA. Artificial neural networks enable genome-scale simulations of intracellular signaling. Nat Commun 2022; 13:3069. [PMID: 35654811 PMCID: PMC9163072 DOI: 10.1038/s41467-022-30684-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/18/2021] [Accepted: 05/11/2022] [Indexed: 12/14/2022] Open
Abstract
Mammalian cells adapt their functional state in response to external signals in form of ligands that bind receptors on the cell-surface. Mechanistically, this involves signal-processing through a complex network of molecular interactions that govern transcription factor activity patterns. Computer simulations of the information flow through this network could help predict cellular responses in health and disease. Here we develop a recurrent neural network framework constrained by prior knowledge of the signaling network with ligand-concentrations as input and transcription factor-activity as output. Applied to synthetic data, it predicts unseen test-data (Pearson correlation r = 0.98) and the effects of gene knockouts (r = 0.8). We stimulate macrophages with 59 different ligands, with and without the addition of lipopolysaccharide, and collect transcriptomics data. The framework predicts this data under cross-validation (r = 0.8) and knockout simulations suggest a role for RIPK1 in modulating the lipopolysaccharide response. This work demonstrates the feasibility of genome-scale simulations of intracellular signaling.
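The idea of a recurrent network constrained by a prior-knowledge signaling network, with ligands as input and transcription factor activity as output, can be illustrated with a toy update rule in which only annotated edges carry weight. This is a sketch under stated assumptions rather than the published framework: the three-node chain, the weights, the `tanh` activation, and the functions `step` and `simulate` are all hypothetical stand-ins.

```python
import math

# Hypothetical 3-node signaling chain: ligand -> kinase -> tf.
NODES = ["ligand", "kinase", "tf"]
# ALLOWED[target][source] is 1 only where prior knowledge lists an interaction.
ALLOWED = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
]
# Illustrative weights; the 5.0 sits on a tf -> ligand edge that the prior
# network does not contain, so the mask removes its effect.
WEIGHTS = [
    [0.0, 0.0, 5.0],
    [2.0, 0.0, 0.0],
    [0.0, -1.5, 0.0],
]


def step(state, weights, ligand_input):
    """One recurrent update: each node sums its allowed inputs, then saturates."""
    new_state = []
    for target in range(len(state)):
        total = ligand_input[target]
        for source in range(len(state)):
            total += ALLOWED[target][source] * weights[target][source] * state[source]
        new_state.append(math.tanh(total))  # tanh is an assumed activation
    return new_state


def simulate(weights, ligand_input, n_steps=20):
    """Iterate the update from a resting state so signals propagate downstream."""
    state = [0.0] * len(NODES)
    for _ in range(n_steps):
        state = step(state, weights, ligand_input)
    return state


# Stimulate with the ligand only; the last entry is the simulated TF activity.
steady = simulate(WEIGHTS, [1.0, 0.0, 0.0])
```

In this sketch, a knockout of the kind the abstract mentions could be imitated by zeroing the corresponding row and column of `ALLOWED` and re-running `simulate`.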
Affiliation(s)
- Avlant Nilsson
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, SE 41296, Sweden
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, 02139, USA
| | - Joshua M Peters
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, 02139, USA
| | - Nikolaos Meimetis
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Bryan Bryson
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, 02139, USA
| | - Douglas A Lauffenburger
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, 02139, USA.
|
44
|
Lee D, Kim S. Knowledge-guided artificial intelligence technologies for decoding complex multiomics interactions in cells. Clin Exp Pediatr 2022; 65:239-249. [PMID: 34844399 PMCID: PMC9082244 DOI: 10.3345/cep.2021.01438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2021] [Revised: 10/19/2021] [Accepted: 10/21/2021] [Indexed: 11/27/2022] Open
Abstract
Cells survive and proliferate through complex interactions among diverse molecules across multiomics layers. Conventional experimental approaches for identifying these interactions have built a firm foundation for molecular biology, but their scalability is gradually becoming inadequate compared to the rapid accumulation of multiomics data measured by high-throughput technologies. Therefore, the need for data-driven computational modeling of interactions within cells has been highlighted in recent years. The complexity of multiomics interactions is primarily due to their nonlinearity. That is, their accurate modeling requires intricate conditional dependencies, synergies, or antagonisms between considered genes or proteins, which retard experimental validations. Artificial intelligence (AI) technologies, including deep learning models, are optimal choices for handling complex nonlinear relationships between features that are scalable and produce large amounts of data. Thus, they have great potential for modeling multiomics interactions. Although there exist many AI-driven models for computational biology applications, relatively few explicitly incorporate the prior knowledge within model architectures or training procedures. Such guidance of models by domain knowledge will greatly reduce the amount of data needed to train models and constrain their vast expressive powers to focus on the biologically relevant space. Therefore, it can enhance a model's interpretability, reduce spurious interactions, and prove its validity and utility. Thus, to facilitate further development of knowledge-guided AI technologies for the modeling of multiomics interactions, here we review representative bioinformatics applications of deep learning models for multiomics interactions developed to date by categorizing them by guidance mode.
Affiliation(s)
- Dohoon Lee
  - Bioinformatics Institute, Seoul National University, Seoul, Korea
- Sun Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
  - Department of Computer Science and Engineering, Seoul National University, Seoul, Korea
  - Institute of Engineering Research, Seoul National University, Seoul, Korea
  - AIGENDRUG Co., Ltd., Seoul, Korea

45. Trapotsi MA, Hosseini-Gerami L, Bender A. Computational analyses of mechanism of action (MoA): data, methods and integration. RSC Chem Biol 2022; 3:170-200. [PMID: 35360890 PMCID: PMC8827085 DOI: 10.1039/d1cb00069a] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 03/30/2021] [Accepted: 12/09/2021] [Indexed: 12/15/2022]
Abstract
The elucidation of a compound's Mechanism of Action (MoA) is a challenging task in the drug discovery process, but it is important in order to rationalise phenotypic findings and to anticipate potential side-effects. Bioinformatic approaches, advances in machine learning techniques and the increasing deposition of high-throughput data in public databases have significantly contributed to recent advances in the field, but it is not straightforward to decide which data and methods are most suitable to use in a given case. In this review, we focus on these methods and data and their applications in generating MoA hypotheses for subsequent experimental validation. We discuss compound-specific data such as -omics, cell morphology and bioactivity data, as well as commonly used supplementary prior knowledge such as network and pathway data, and provide information on databases where this data can be accessed. In terms of methodologies, we discuss both well-established methods (connectivity mapping, pathway enrichment) as well as more developing methods (neural networks and multi-omics integration). Finally, we review case studies where the MoA of a compound was successfully suggested from computational analysis by incorporating multiple data modalities and/or methodologies. Our aim for this review is to provide researchers with insights into the benefits and drawbacks of both the data and methods in terms of level of understanding, biases and interpretation - and to highlight future avenues of investigation which we foresee will improve the field of MoA elucidation, including greater public access to -omics data and methodologies which are capable of data integration.
Affiliation(s)
- Maria-Anna Trapotsi
  - Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK
- Layla Hosseini-Gerami
  - Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK
- Andreas Bender
  - Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK

46. Gundogdu P, Loucera C, Alamo-Alvarez I, Dopazo J, Nepomuceno I. Integrating pathway knowledge with deep neural networks to reduce the dimensionality in single-cell RNA-seq data. BioData Min 2022; 15:1. [PMID: 34980200 PMCID: PMC8722116 DOI: 10.1186/s13040-021-00285-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/30/2021] [Accepted: 12/04/2021] [Indexed: 11/13/2022]
Abstract
Background: Single-cell RNA sequencing (scRNA-seq) data provide valuable insights into cellular heterogeneity, which is significantly improving current knowledge of biology and human disease. One of the main applications of scRNA-seq data analysis is the identification of new cell types and cell states. Deep neural networks (DNNs) are among the best methods to address this problem. However, this performance comes at the cost of a lack of interpretability in the results. In this work we propose an intelligible pathway-driven neural network to correctly solve cell-type related problems at single-cell resolution while providing a biologically meaningful representation of the data. Results: In this study, we explored deep neural networks constrained by several types of prior biological information, e.g. signaling pathway information, as a way to reduce the dimensionality of the scRNA-seq data. We tested the proposed biologically-based architectures on thousands of cells of human and mouse origin across a collection of public datasets in order to check the performance of the model. Specifically, we tested the architecture across different validation scenarios that try to mimic how unknown cell types are clustered by the DNN and how it correctly annotates cell types by querying a database in a retrieval problem. Moreover, our approach proved comparable to other, less interpretable DNN approaches constrained by protein-protein interaction and gene regulation data. Finally, we show how the latent structure learned by the network could be used to visualize and interpret the composition of human single-cell datasets. Conclusions: Here we demonstrate how the integration of pathways, which convey fundamental information on functional relationships between genes, with DNNs, which provide an excellent classification framework, results in an excellent alternative for learning a biologically meaningful representation of scRNA-seq data. In addition, the introduction of prior biological knowledge in the DNN reduces the size of the network architecture. Comparative results demonstrate a superior performance of this approach with respect to other similar approaches. As an additional advantage, the use of pathways within the DNN structure enables easy interpretability of the results by connecting features to cell functionalities by means of the pathway nodes, as demonstrated with an example with human melanoma tumor cells. Supplementary Information: The online version contains supplementary material available at 10.1186/s13040-021-00285-4.
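The pathway-constrained layer described in this abstract can be illustrated with a masked dense layer. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the helper names `pathway_mask` and `masked_layer`, the four-gene toy panel, the two pathway groupings, and the tanh activation are all hypothetical choices made for illustration.

```python
import numpy as np

def pathway_mask(genes, pathways):
    """Binary (n_genes, n_pathways) matrix: 1 where a gene belongs to a pathway."""
    index = {gene: i for i, gene in enumerate(genes)}
    mask = np.zeros((len(genes), len(pathways)))
    for j, members in enumerate(pathways.values()):
        for gene in members:
            if gene in index:
                mask[index[gene], j] = 1.0
    return mask

def masked_layer(x, weights, mask, bias):
    """Dense layer whose weights are zeroed wherever the mask is zero,
    so each output node only sees the genes of its own pathway."""
    return np.tanh(x @ (weights * mask) + bias)

# Toy gene panel and pathway membership (illustrative assumption, not curated data)
genes = ["TP53", "MDM2", "EGFR", "KRAS"]
pathways = {"p53_signaling": ["TP53", "MDM2"], "MAPK_signaling": ["EGFR", "KRAS"]}

mask = pathway_mask(genes, pathways)
rng = np.random.default_rng(0)
weights = rng.normal(size=mask.shape)
bias = np.zeros(len(pathways))

x = rng.normal(size=(3, len(genes)))      # 3 cells x 4 genes
z = masked_layer(x, weights, mask, bias)  # 3 cells x 2 pathway nodes
```

Because the mask removes every gene-to-node connection that is not backed by pathway membership, each column of `z` can be read as an activity score for one named pathway, which is the sense in which such a layer is both lower-dimensional and interpretable.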
Affiliation(s)
- Pelin Gundogdu
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013, Sevilla, Spain
- Carlos Loucera
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013, Sevilla, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013, Sevilla, Spain
- Inmaculada Alamo-Alvarez
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013, Sevilla, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013, Sevilla, Spain
- Joaquin Dopazo
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013, Sevilla, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013, Sevilla, Spain
  - Bioinformatics in Rare Diseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, 41013, Sevilla, Spain
  - FPS/ELIXIR-es, Hospital Virgen del Rocío, 42013, Sevilla, Spain
- Isabel Nepomuceno
  - Department of Computer Languages and Systems, Universidad de Sevilla, Sevilla, Spain

47. Huminiecki Ł. Virtual Gene Concept and a Corresponding Pragmatic Research Program in Genetical Data Science. Entropy (Basel) 2021; 24:17. [PMID: 35052043 PMCID: PMC8774939 DOI: 10.3390/e24010017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 12/02/2021] [Accepted: 12/14/2021] [Indexed: 06/14/2023]
Abstract
Mendel proposed an experimentally verifiable paradigm of particle-based heredity that has been influential for over 150 years. The historical arguments have been reflected in the near past as Mendel's concept has been diversified by new types of omics data. As an effect of the accumulation of omics data, a virtual gene concept forms, giving rise to genetical data science. The concept integrates genetical, functional, and molecular features of the Mendelian paradigm. I argue that the virtual gene concept should be deployed pragmatically. Indeed, the concept has already inspired a practical research program related to systems genetics. The program includes questions about functionality of structural and categorical gene variants, about regulation of gene expression, and about roles of epigenetic modifications. The methodology of the program includes bioinformatics, machine learning, and deep learning. Education, funding, careers, standards, benchmarks, and tools to monitor research progress should be provided to support the research program.
Affiliation(s)
- Łukasz Huminiecki
  - Evolutionary, Computational, and Statistical Genetics, Department of Molecular Biology, Institute of Genetics and Animal Biotechnology, Polish Academy of Sciences, Postępu 36A, Jastrzębiec, 05-552 Warsaw, Poland

48. Shao D, Dai Y, Li N, Cao X, Zhao W, Cheng L, Rong Z, Huang L, Wang Y, Zhao J. Artificial intelligence in clinical research of cancers. Brief Bioinform 2021; 23:6470966. [PMID: 34929741 PMCID: PMC8769909 DOI: 10.1093/bib/bbab523] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/11/2021] [Revised: 11/06/2021] [Accepted: 11/13/2021] [Indexed: 12/16/2022]
Abstract
Several factors, including advances in computational algorithms, the availability of high-performance computing hardware, and the assembly of large community-based databases, have led to the extensive application of Artificial Intelligence (AI) in the biomedical domain for nearly 20 years. AI algorithms have attained expert-level performance in cancer research. However, only a few AI-based applications have been approved for use in the real world. Whether AI will eventually be capable of replacing medical experts has been a hot topic. In this article, we first summarize the status of AI-based cancer research over the past two decades, including the consensus on the procedure of AI based on an ideal paradigm and current efforts to incorporate expertise and domain knowledge. Next, the data available for AI in the biomedical domain are surveyed. Then, we review the methods and applications of AI in cancer clinical research, categorized by data type, including radiographic imaging, cancer genome, medical records, drug information and biomedical literature. Finally, we discuss challenges in moving AI from theoretical research to real-world cancer research applications and the perspectives toward the future realization of AI participating in cancer treatment.
Affiliation(s)
- Dan Shao
  - College of Computer Science and Technology, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Yinfei Dai
  - College of Computer Science and Technology, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Nianfeng Li
  - College of Computer Science and Technology, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Xuqing Cao
  - Department of Neurology, People's Hospital of Ningxia Hui Autonomous Region (The Affiliated People's Hospital of Ningxia Medical University and The First Affiliated Hospital of Northwest Minzu University), Yinchuan 750002, China
- Wei Zhao
  - Department of Biochemistry and Molecular Biology, Ningxia Medical University, Yinchuan 750002, China
- Li Cheng
  - Department of Electrical Diagnosis, Affiliated Hospital of Changchun University of Traditional Chinese Medicine, Changchun 130021, China
- Zhuqing Rong
  - School of Science, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Lan Huang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Jing Zhao
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, 43210, USA

49. Scherer P, Trębacz M, Simidjievski N, Viñas R, Shams Z, Terre HA, Jamnik M, Liò P. Unsupervised construction of computational graphs for gene expression data with explicit structural inductive biases. Bioinformatics 2021; 38:1320-1327. [PMID: 34888618 PMCID: PMC8826027 DOI: 10.1093/bioinformatics/btab830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2021] [Revised: 09/29/2021] [Accepted: 12/03/2021] [Indexed: 01/05/2023]
Abstract
Motivation: Gene expression data are commonly used at the intersection of cancer research and machine learning for better understanding of the molecular status of tumour tissue. Deep learning predictive models have been employed for gene expression data due to their ability to scale and remove the need for manual feature engineering. However, gene expression data are often very high dimensional, noisy and presented with a low number of samples. This poses significant problems for learning algorithms: models often overfit, learn noise and struggle to capture biologically relevant information. In this article, we utilize external biological knowledge embedded within structures of gene interaction graphs such as protein-protein interaction (PPI) networks to guide the construction of predictive models. Results: We present Gene Interaction Network Constrained Construction (GINCCo), an unsupervised method for automated construction of computational graph models for gene expression data that are structurally constrained by prior knowledge of gene interaction networks. We employ this methodology in a case study on incorporating a PPI network in cancer phenotype prediction tasks. Our computational graphs are structurally constructed using topological clustering algorithms on the PPI networks which incorporate inductive biases stemming from network biology research on protein complex discovery. Each of the entities in the GINCCo computational graph represents biological entities such as genes, candidate protein complexes and phenotypes instead of arbitrary hidden nodes of a neural network. This provides a biologically relevant mechanism for model regularization yielding strong predictive performance while drastically reducing the number of model parameters and enabling guided post-hoc enrichment analyses of influential gene sets with respect to target phenotypes. Our experiments analysing a variety of cancer phenotypes show that GINCCo often outperforms support vector machine, Fully Connected Multi-layer Perceptrons (MLP) and Randomly Connected MLPs despite greatly reduced model complexity. Availability and implementation: https://github.com/paulmorio/gincco contains the source code for our approach. We also release a library with algorithms for protein complex discovery within PPI networks at https://github.com/paulmorio/protclus. This repository contains implementations of the clustering algorithms used in this article. Supplementary information: Supplementary data are available at Bioinformatics online.
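The idea of deriving a sparse computational graph from clusters of a gene interaction network can be sketched as follows. This is a loose illustration under stated assumptions, not the GINCCo code from the repositories above: plain connected components stand in for the topological clustering algorithms mentioned in the abstract, and the gene labels `A` to `E`, the edge list, and the helper `connected_components` are hypothetical.

```python
import numpy as np

def connected_components(nodes, edges):
    """Group nodes into connected components with a simple depth-first search.
    Used here only as a stand-in for a real protein-complex clustering method."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components

# Toy interaction network (hypothetical genes and edges)
genes = ["A", "B", "C", "D", "E"]
ppi_edges = [("A", "B"), ("B", "C"), ("D", "E")]

# Each cluster becomes one hidden node standing for a candidate complex
clusters = connected_components(genes, ppi_edges)

# Gene-to-cluster connectivity: a gene feeds only the node of its own cluster
mask = np.zeros((len(genes), len(clusters)))
for j, members in enumerate(clusters):
    for gene in members:
        mask[genes.index(gene), j] = 1.0

dense_params = len(genes) * len(clusters)  # weights in a fully connected layer
sparse_params = int(mask.sum())            # weights kept by the constrained layer
```

In this toy case the constrained layer keeps 5 of the 10 weights a fully connected layer would need, which shows in miniature how structural constraints reduce the parameter count while giving each hidden node a biological label.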
Affiliation(s)
- Paul Scherer
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK. To whom correspondence should be addressed.
- Maja Trębacz
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Nikola Simidjievski
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Ramon Viñas
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Zohreh Shams
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Helena Andres Terre
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Mateja Jamnik
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Pietro Liò
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK

50. Bao S, Li K, Yan C, Zhang Z, Qu J, Zhou M. Deep learning-based advances and applications for single-cell RNA-sequencing data analysis. Brief Bioinform 2021; 23:6444320. [PMID: 34849562 DOI: 10.1093/bib/bbab473] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/06/2021] [Revised: 09/24/2021] [Accepted: 10/15/2021] [Indexed: 11/14/2022]
Abstract
The rapid development of single-cell RNA-sequencing (scRNA-seq) technology has raised significant computational and analytical challenges. The application of deep learning to scRNA-seq data analysis is rapidly evolving and can overcome the unique challenges in upstream (quality control and normalization) and downstream (cell-, gene- and pathway-level) analysis of scRNA-seq data. In the present study, recent advances and applications of deep learning-based methods, together with specific tools for scRNA-seq data analysis, were summarized. Moreover, the future perspectives and challenges of deep-learning techniques regarding the appropriate analysis and interpretation of scRNA-seq data were investigated. The present study aimed to provide evidence supporting the biomedical application of deep learning-based tools and may aid biologists and bioinformaticians in navigating this exciting and fast-moving area.
Affiliation(s)
- Siqi Bao
  - School of Information and Communication Engineering, Hainan University, Haikou 570228, P. R. China
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
  - Hainan Institute of Real World Data, Haikou 570228, P. R. China
- Ke Li
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Congcong Yan
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Zicheng Zhang
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Jia Qu
  - School of Information and Communication Engineering, Hainan University, Haikou 570228, P. R. China
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
  - Hainan Institute of Real World Data, Haikou 570228, P. R. China
- Meng Zhou
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
|