1
Lobentanzer S, Rodriguez-Mier P, Bauer S, Saez-Rodriguez J. Molecular causality in the advent of foundation models. Mol Syst Biol 2024:10.1038/s44320-024-00041-w. [PMID: 38890548 DOI: 10.1038/s44320-024-00041-w] [Received: 01/17/2024] [Revised: 03/18/2024] [Accepted: 03/21/2024] [Indexed: 06/20/2024]
Abstract
Correlation is not causation: this simple and uncontroversial statement has far-reaching implications. Defining and applying causality in biomedical research has posed significant challenges to the scientific community. In this perspective, we attempt to connect the partly disparate fields of systems biology, causal reasoning, and machine learning to inform future approaches in the field of systems biology and molecular medicine.
Affiliation(s)
- Sebastian Lobentanzer
  - Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Pablo Rodriguez-Mier
  - Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Julio Saez-Rodriguez
  - Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany

2
Kim Y, Han Y, Hopper C, Lee J, Joo JI, Gong JR, Lee CK, Jang SH, Kang J, Kim T, Cho KH. A gray box framework that optimizes a white box logical model using a black box optimizer for simulating cellular responses to perturbations. Cell Reports Methods 2024; 4:100773. [PMID: 38744288 PMCID: PMC11133856 DOI: 10.1016/j.crmeth.2024.100773] [Received: 05/13/2023] [Revised: 03/19/2024] [Accepted: 04/19/2024] [Indexed: 05/16/2024]
Abstract
Predicting cellular responses to perturbations requires interpretable insights into molecular regulatory dynamics to perform reliable cell fate control, despite the confounding non-linearity of the underlying interactions. There is a growing interest in developing machine learning-based perturbation response prediction models to handle the non-linearity of perturbation data, but their interpretation in terms of molecular regulatory dynamics remains a challenge. Alternatively, for meaningful biological interpretation, logical network models such as Boolean networks are widely used in systems biology to represent intracellular molecular regulation. However, determining the appropriate regulatory logic of large-scale networks remains an obstacle due to the high-dimensional and discontinuous search space. To tackle these challenges, we present a scalable derivative-free optimizer trained by meta-reinforcement learning for Boolean network models. The logical network model optimized by the trained optimizer successfully predicts anti-cancer drug responses of cancer cell lines, while simultaneously providing insight into their underlying molecular regulatory mechanisms.
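As a rough illustration of the Boolean network formalism this abstract refers to, the sketch below simulates a toy network under a perturbation and shows why fitting regulatory logic is a discrete search. It is not the authors' model or optimizer: the node names, the rules, and the exhaustive enumeration (used here in place of the trained derivative-free optimizer) are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy network: node names and rules are illustrative assumptions.
RULES = {
    "growth_signal": lambda s: s["growth_signal"],  # input node keeps its value
    "inhibitor": lambda s: s["inhibitor"],          # input node keeps its value
    "kinase": lambda s: s["growth_signal"] and not s["inhibitor"],
    "proliferation": lambda s: s["kinase"],
}


def step(state, rules, clamped=None):
    """Apply every rule once (synchronous update), then re-impose clamped nodes."""
    new_state = {node: bool(rule(state)) for node, rule in rules.items()}
    new_state.update(clamped or {})
    return new_state


def run_to_fixed_point(state, rules, clamped=None, max_steps=50):
    """Iterate until the state stops changing or max_steps is reached."""
    state = {**state, **(clamped or {})}
    for _ in range(max_steps):
        nxt = step(state, rules, clamped)
        if nxt == state:
            break
        state = nxt
    return state


def consistent_kinase_tables(observations):
    """Enumerate all 16 two-input truth tables and keep those matching the data."""
    inputs = list(product([False, True], repeat=2))  # (growth_signal, inhibitor)
    tables = []
    for outputs in product([False, True], repeat=4):
        table = dict(zip(inputs, outputs))
        if all(table[inp] == out for inp, out in observations):
            tables.append(table)
    return tables


start = {"growth_signal": True, "inhibitor": False,
         "kinase": False, "proliferation": False}
untreated = run_to_fixed_point(start, RULES)
# A drug perturbation is modelled here by clamping the inhibitor node to True.
treated = run_to_fixed_point(start, RULES, clamped={"inhibitor": True})
```

Even for a single two-input node there are 16 candidate truth tables, and two observations leave several of them equally consistent. The number of candidates grows exponentially with network size, which is the kind of high-dimensional, discontinuous search space the abstract describes.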
Affiliation(s)
- Yunseong Kim
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Younghyun Han
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Corbin Hopper
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Jonghoon Lee
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Jae Il Joo
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Jeong-Ryeol Gong
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Chun-Kyung Lee
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Seong-Hoon Jang
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Junsoo Kang
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Taeyoung Kim
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Kwang-Hyun Cho
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea

3
Meimetis N, Lauffenburger DA, Nilsson A. Inference of drug off-target effects on cellular signaling using interactome-based deep learning. iScience 2024; 27:109509. [PMID: 38591003 PMCID: PMC11000001 DOI: 10.1016/j.isci.2024.109509] [Received: 11/10/2023] [Revised: 02/04/2024] [Accepted: 03/13/2024] [Indexed: 04/10/2024]
Abstract
Many diseases emerge from dysregulated cellular signaling, and drugs are often designed to target specific signaling proteins. Off-target effects are, however, common and may ultimately result in failed clinical trials. Here we develop a computer model of the cell's transcriptional response to drugs for improved understanding of their mechanisms of action. The model is based on ensembles of artificial neural networks and simultaneously infers drug-target interactions and their downstream effects on intracellular signaling. With this, it predicts transcription factors' activities, while recovering known drug-target interactions and inferring many new ones, which we validate with an independent dataset. As a case study, we analyze the effects of the drug Lestaurtinib on downstream signaling. Alongside its intended target, FLT3, the model predicts an inhibition of CDK2 that enhances the downregulation of the cell cycle-critical transcription factor FOXM1. Our approach can therefore enhance our understanding of drug signaling for therapeutic design.
Affiliation(s)
- Nikolaos Meimetis
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Douglas A. Lauffenburger
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Avlant Nilsson
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Cell and Molecular Biology, SciLifeLab, Karolinska Institutet, Stockholm, Sweden
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, SE 41296, Sweden

4
Girard C. The tri-flow adaptiveness of codes in major evolutionary transitions. Biosystems 2024; 237:105133. [PMID: 38336225 DOI: 10.1016/j.biosystems.2024.105133] [Received: 11/17/2023] [Revised: 01/26/2024] [Accepted: 01/27/2024] [Indexed: 02/12/2024]
Abstract
Life codes increase in both number and variety with biological complexity. Although our knowledge of codes is constantly expanding, the evolutionary progression of organic, neural, and cultural codes in response to selection pressure remains poorly understood. Greater clarification of the selective mechanisms is achieved by investigating how major evolutionary transitions reduce spatiotemporal and energetic constraints on transmitting heritable code to offspring. Evolution toward less constrained flows is integral to enduring flow architecture everywhere, in both engineered and natural flow systems. Beginning approximately 4 billion years ago, the most basic level for transmitting genetic material to offspring was initiated by protocell division. Evidence from ribosomes suggests that protocells transmitted comma-free or circular codes, preceding the evolution of standard genetic code. This rudimentary information flow within protocells is likely to have first emerged within the geo-energetic and geospatial constraints of hydrothermal vents. A broad-gauged hypothesis is that major evolutionary transitions overcame such constraints with tri-flow adaptations. The interconnected triple flows incorporated energy-converting, spatiotemporal, and code-based informational dynamics. Such tri-flow adaptations stacked sequence splicing code on top of protein-DNA recognition code in eukaryotes, prefiguring the transition to sexual reproduction. Sex overcame the spatiotemporal-energetic constraints of binary fission with further code stacking. Examples are tubulin code and transcription initiation code in vertebrates. In a later evolutionary transition, language reduced metabolic-spatiotemporal constraints on inheritance by stacking phonetic, phonological, and orthographic codes. In organisms that reproduce sexually, each major evolutionary transition is shown to be a tri-flow adaptation that adds new levels of code-based informational exchange. 
Evolving biological complexity is also shown to increase the nongenetic transmissibility of code.
Affiliation(s)
- Chris Girard
  - Department of Global and Sociocultural Studies, Florida International University, Miami, FL 33199, United States

5
Xiong G, Xie N, Nie M, Ling R, Yun B, Xie J, Ren L, Huang Y, Wang W, Yi C, Zhang M, Xu X, Zhang C, Zou B, Zhang L, Liu X, Huang H, Chen D, Cao W, Wang C. Single-cell transcriptomics reveals cell atlas and identifies cycling tumor cells responsible for recurrence in ameloblastoma. Int J Oral Sci 2024; 16:21. [PMID: 38424060 PMCID: PMC10904398 DOI: 10.1038/s41368-024-00281-4] [Received: 10/21/2023] [Revised: 01/04/2024] [Accepted: 01/05/2024] [Indexed: 03/02/2024]
Abstract
Ameloblastoma is a benign tumor characterized by locally invasive phenotypes, leading to facial bone destruction and a high recurrence rate. However, the mechanisms governing tumor initiation and recurrence are poorly understood. Here, we uncovered cellular landscapes and mechanisms that underlie tumor recurrence in ameloblastoma at single-cell resolution. Our results revealed that ameloblastoma exhibits five tumor subpopulations varying with respect to immune response (IR), bone remodeling (BR), tooth development (TD), epithelial development (ED), and cell cycle (CC) signatures. Of note, we found that CC ameloblastoma cells were endowed with stemness and contributed to tumor recurrence, which was dominated by the EZH2-mediated program. Targeting EZH2 effectively eliminated CC ameloblastoma cells and inhibited tumor growth in ameloblastoma patient-derived organoids. These data described the tumor subpopulation and clarified the identity, function, and regulatory mechanism of CC ameloblastoma cells, providing a potential therapeutic target for ameloblastoma.
Affiliation(s)
- Gan Xiong
  - Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
- Nan Xie
  - Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
- Min Nie
  - Department of Periodontics, Affiliated Stomatology Hospital of Guangzhou Medical University, Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, China
- Rongsong Ling
  - Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Bokai Yun
  - Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
- Jiaxiang Xie
  - Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
- Linlin Ren
  - Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
- Yaqi Huang
  - Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
- Wenjin Wang
  - Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
- Chen Yi
  - Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
- Ming Zhang
  - Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
- Xiuyun Xu
  - Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
- Caihua Zhang
  - Center for Translational Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Bin Zou
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Leitao Zhang
  - Department of Oral and Maxillofacial Surgery, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Xiqiang Liu
  - Department of Oral and Maxillofacial Surgery, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Hongzhang Huang
  - Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China
- Demeng Chen
  - Center for Translational Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Wei Cao
  - Department of Oral and Maxillofacial & Head and Neck Oncology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - National Center for Stomatology, National Clinical Research Center for Oral diseases, Shanghai Key Laboratory of Stomatology, Shanghai, China
- Cheng Wang
  - Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Stomatology, Sun Yat-sen University, Guangzhou, China
  - Guanghua School of Stomatology, Sun Yat-sen University, Guangzhou, China

6
Holton E, Muskovic W, Powell JE. Deciphering cancer cell state plasticity with single-cell genomics and artificial intelligence. Genome Med 2024; 16:36. [PMID: 38409176 PMCID: PMC10897991 DOI: 10.1186/s13073-024-01309-4] [Received: 02/28/2023] [Accepted: 02/21/2024] [Indexed: 02/28/2024]
Abstract
Cancer stem cell plasticity refers to the ability of tumour cells to dynamically switch between states, for example, from cancer stem cells to non-cancer stem cell states. Governed by regulatory processes, cells transition through a continuum, with this transition space often referred to as a cell state landscape. Plasticity in cancer cell states leads to divergent biological behaviours, with certain cell states, or state transitions, responsible for tumour progression and therapeutic response. The advent of single-cell assays means these features can now be measured for individual cancer cells and at scale. However, the high dimensionality of these data, complex relationships between genomic features, and a lack of precise knowledge of the genomic profiles defining cancer cell states have opened the door for artificial intelligence methods for depicting cancer cell state landscapes. The contribution of cell state plasticity to cancer phenotypes such as treatment resistance, metastasis, and dormancy has been masked by analysis of 'bulk' genomic data, which consist of the average signal from millions of cells. Single-cell technologies solve this problem by producing a high-dimensional cellular landscape of the tumour ecosystem, quantifying the genomic profiles of individual cells, and creating a more detailed model to investigate cancer plasticity (Genome Res 31:1719, 2021; Semin Cancer Biol 53: 48-58, 2018; Signal Transduct Target Ther 5:1-36, 2020). In conjunction, rapid development in artificial intelligence methods has led to numerous tools that can be employed to study cancer cell plasticity.
Affiliation(s)
- Emily Holton
  - Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, NSW, 2010, Australia
  - School of Biomedical Science, Faculty of Medicine UNSW Sydney, Kensington, NSW, 2010, Australia
  - UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, NSW, 2052, Australia
- Walter Muskovic
  - Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, NSW, 2010, Australia
  - UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, NSW, 2052, Australia
- Joseph E Powell
  - Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, NSW, 2010, Australia
  - School of Biomedical Science, Faculty of Medicine UNSW Sydney, Kensington, NSW, 2010, Australia
  - UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, NSW, 2052, Australia

7
Brunnsåker D, Kronström F, Tiukova IA, King RD. Interpreting protein abundance in Saccharomyces cerevisiae through relational learning. Bioinformatics 2024; 40:btae050. [PMID: 38273672 PMCID: PMC10868306 DOI: 10.1093/bioinformatics/btae050] [Received: 05/24/2023] [Revised: 01/16/2024] [Accepted: 01/23/2024] [Indexed: 01/27/2024]
Abstract
MOTIVATION: Proteomic profiles reflect the functional readout of the physiological state of an organism. An increased understanding of what controls and defines protein abundances is of high scientific interest. Saccharomyces cerevisiae is a well-studied model organism, and there is a large amount of structured knowledge on yeast systems biology in databases such as the Saccharomyces Genome Database, and highly curated genome-scale metabolic models like Yeast8. These datasets, the result of decades of experiments, are abundant in information, and adhere to semantically meaningful ontologies. RESULTS: By representing this knowledge in an expressive Datalog database, we generated data descriptors using relational learning that, when combined with supervised machine learning, enable us to predict protein abundances in an explainable manner. We learnt predictive relationships between protein abundances, function, and phenotype, such as α-amino acid accumulations and deviations in chronological lifespan. We further demonstrate the power of this methodology on the proteins His4 and Ilv2, connecting qualitative biological concepts to quantified abundances. AVAILABILITY AND IMPLEMENTATION: All data and processing scripts are available at the following GitHub repository: https://github.com/DanielBrunnsaker/ProtPredict.
Affiliation(s)
- Daniel Brunnsåker
  - Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden
- Filip Kronström
  - Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden
- Ievgeniia A Tiukova
  - Department of Life Sciences, Chalmers University of Technology, Gothenburg 412 96, Sweden
  - Department of Industrial Biotechnology, KTH Royal Institute of Technology, Stockholm 106 91, Sweden
- Ross D King
  - Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
  - The Alan Turing Institute, London NW1 2DB, United Kingdom

8
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure-primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. Genome Biol 2024; 25:24. [PMID: 38238840 PMCID: PMC10797903 DOI: 10.1186/s13059-023-03134-1] [Received: 02/28/2023] [Accepted: 11/30/2023] [Indexed: 01/22/2024]
Abstract
BACKGROUND: Modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of genome-wide transcription factor activity (TFA), making it difficult to separate covariance and regulatory interactions. Inference of regulatory interactions and TFA requires aggregation of complementary evidence. Estimating TFA explicitly is problematic as it disconnects GRN inference and TFA estimation and is unable to account for, for example, contextual transcription factor-transcription factor interactions and other higher-order features. Deep learning offers a potential solution, as it can model complex interactions and higher-order latent features, although it does not provide interpretable models and latent features. RESULTS: We propose a novel autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), for modeling, and a metric, explained relative variance (ERV), for interpretation of GRNs. We evaluate SupirFactor with ERV in a wide set of contexts. Compared to current state-of-the-art GRN inference methods, SupirFactor performs favorably. We evaluate latent feature activity as an estimate of TFA and biological function in S. cerevisiae as well as in peripheral blood mononuclear cells (PBMC). CONCLUSION: Here we present a framework for structure-primed inference and interpretation of GRNs, SupirFactor, demonstrating interpretability using ERV in multiple biological and experimental settings. SupirFactor enables TFA estimation and pathway analysis using latent factor activity, demonstrated here on two large-scale single-cell datasets, modeling S. cerevisiae and PBMC. We find that the SupirFactor model facilitates biological analysis, yielding novel functional and regulatory insight.
Affiliation(s)
- Andreas Tjärnberg
  - Center for Developmental Genetics, New York University, New York, NY, 10003, USA
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10010, USA
  - Department of Neuro-Science, University of Wisconsin-Madison - Waisman Center, Madison, USA
- Maggie Beheler-Amass
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Christopher A Jackson
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Lionel A Christiaen
  - Center for Developmental Genetics, New York University, New York, NY, 10003, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
  - Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
- David Gresham
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Richard Bonneau
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, 10010, USA
  - Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY, 10003, USA
  - Center For Data Science, NYU, New York, NY, 10008, USA
  - Prescient Design, a Genentech accelerator, New York, NY, 10010, USA

9
Xie L, Raj Y, Varathan P, He B, Yu M, Nho K, Salama P, Saykin AJ, Yan J. Deep Trans-Omic Network Fusion for Molecular Mechanism of Alzheimer's Disease. J Alzheimers Dis 2024; 99:715-727. [PMID: 38728189 DOI: 10.3233/jad-240098] [Indexed: 05/12/2024]
Abstract
Background: There are various molecular hypotheses regarding Alzheimer's disease (AD), such as amyloid deposition, tau propagation, neuroinflammation, and synaptic dysfunction. However, the detailed molecular mechanism underlying AD remains elusive. In addition, the genetic contribution of these molecular hypotheses is not yet established despite the high heritability of AD. Objective: The study aims to enable the discovery of functionally connected multi-omic features through novel integration of multi-omic data and prior functional interactions. Methods: We propose a new deep learning model, MoFNet, with improved interpretability to investigate the AD molecular mechanism and its upstream genetic contributors. MoFNet integrates multi-omic data with prior functional interactions between SNPs, genes, and proteins, and for the first time models the dynamic information flow from DNA to RNA and proteins. Results: When evaluated using the ROS/MAP cohort, MoFNet outperformed other competing methods in prediction performance. It identified SNPs, genes, and proteins with significantly more prior functional interactions, resulting in three multi-omic subnetworks. SNP-gene pairs identified by MoFNet were mostly eQTLs specific to frontal cortex tissue, where gene/protein data was collected. These molecular subnetworks are enriched in innate immune system, clearance of misfolded proteins, and neurotransmitter release, respectively. We validated most findings in an independent dataset. One multi-omic subnetwork consists exclusively of core members of the SNARE complex, a key mediator of synaptic vesicle fusion and neurotransmitter transportation. Conclusions: Our results suggest that MoFNet is effective in improving classification accuracy and in identifying multi-omic markers for AD with improved interpretability. Multi-omic subnetworks identified by MoFNet provided more detailed insight into the AD molecular mechanism.
Affiliation(s)
- Linhui Xie
  - Department of Electrical and Computer Engineering, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
  - Indiana Alzheimer's Disease Research Center, Indianapolis, IN, USA
- Yash Raj
  - Department of BioHealth Informatics, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
- Pradeep Varathan
  - Department of BioHealth Informatics, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
  - Indiana Alzheimer's Disease Research Center, Indianapolis, IN, USA
- Bing He
  - Department of BioHealth Informatics, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
  - Indiana Alzheimer's Disease Research Center, Indianapolis, IN, USA
- Meichen Yu
  - Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
  - Indiana Alzheimer's Disease Research Center, Indianapolis, IN, USA
- Kwangsik Nho
  - Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
  - Indiana Alzheimer's Disease Research Center, Indianapolis, IN, USA
- Paul Salama
  - Department of Electrical and Computer Engineering, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
- Andrew J Saykin
  - Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
  - Indiana Alzheimer's Disease Research Center, Indianapolis, IN, USA
- Jingwen Yan
  - Department of BioHealth Informatics, Indiana University Purdue University Indianapolis, Indianapolis, IN, USA
  - Indiana Alzheimer's Disease Research Center, Indianapolis, IN, USA

10
Monshizadeh M, Ye Y. Incorporating metabolic activity, taxonomy and community structure to improve microbiome-based predictive models for host phenotype prediction. Gut Microbes 2024; 16:2302076. [PMID: 38214657 PMCID: PMC10793686 DOI: 10.1080/19490976.2024.2302076] [Received: 05/17/2023] [Accepted: 01/02/2024] [Indexed: 01/13/2024]
Abstract
We developed MicroKPNN, a prior-knowledge guided interpretable neural network for microbiome-based human host phenotype prediction. The prior knowledge used in MicroKPNN includes the metabolic activities of different bacterial species, phylogenetic relationships, and bacterial community structure, all in a shallow neural network. Application of MicroKPNN to seven gut microbiome datasets (involving five different human diseases including inflammatory bowel disease, type 2 diabetes, liver cirrhosis, colorectal cancer, and obesity) shows that incorporation of the prior knowledge helped improve the microbiome-based host phenotype prediction. MicroKPNN outperformed fully connected neural network-based approaches in all seven cases, with the most improvement of accuracy in the prediction of type 2 diabetes. MicroKPNN outperformed a recently developed deep-learning based approach DeepMicro, which selects the best combination of autoencoder and machine learning approach to make predictions, in all of the seven cases. Importantly, we showed that MicroKPNN provides a way for interpretation of the predictive models. Using importance scores estimated for the hidden nodes, MicroKPNN could provide explanations for prior research findings by highlighting the roles of specific microbiome components in phenotype predictions. In addition, it may suggest potential future research directions for studying the impacts of microbiome on host health and diseases. MicroKPNN is publicly available at https://github.com/mgtools/MicroKPNN.
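The general idea of a prior-knowledge guided shallow network can be sketched as a hidden layer whose connections are masked by a knowledge table, so each hidden node only sees the species linked to it. This is a minimal sketch under stated assumptions, not the MicroKPNN implementation: the species and hidden-node names, the tanh activation, and the ablation-based importance score are all hypothetical choices made for illustration.

```python
import math

# Hypothetical prior knowledge: which species connect to which hidden node.
SPECIES = ["species_a", "species_b", "species_c"]
HIDDEN = ["butyrate_producers", "lactate_producers"]
PRIOR_EDGES = {
    ("species_a", "butyrate_producers"),
    ("species_b", "butyrate_producers"),
    ("species_c", "lactate_producers"),
}

# MASK[i][j] is 1.0 only where the prior allows species i -> hidden node j.
MASK = [[1.0 if (s, h) in PRIOR_EDGES else 0.0 for h in HIDDEN] for s in SPECIES]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def masked_forward(abundances, weights, out_weights):
    """Forward pass where weights outside the prior are multiplied by zero."""
    hidden = []
    for j in range(len(HIDDEN)):
        z = sum(abundances[i] * weights[i][j] * MASK[i][j]
                for i in range(len(SPECIES)))
        hidden.append(math.tanh(z))  # activation choice is an assumption
    prob = sigmoid(sum(h * w for h, w in zip(hidden, out_weights)))
    return hidden, prob


def ablation_importance(abundances, weights, out_weights):
    """Stand-in importance score: prediction change when a hidden node is zeroed."""
    hidden, prob = masked_forward(abundances, weights, out_weights)
    scores = {}
    for j, name in enumerate(HIDDEN):
        ablated = list(hidden)
        ablated[j] = 0.0
        ablated_prob = sigmoid(sum(h * w for h, w in zip(ablated, out_weights)))
        scores[name] = abs(prob - ablated_prob)
    return scores
```

Because every hidden node corresponds to a named biological grouping, a per-node score of this kind can be read directly as a statement about that grouping, which is the sense in which such a network is interpretable.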
Affiliation(s)
- Mahsa Monshizadeh
  - Computer Science Department, Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
- Yuzhen Ye
  - Computer Science Department, Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
11
YOUSEF M, ALLMER J. Deep learning in bioinformatics. Turk J Biol 2023; 47:366-382. [PMID: 38681776 PMCID: PMC11045206 DOI: 10.55730/1300-0152.2671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 12/28/2023] [Accepted: 12/18/2023] [Indexed: 05/01/2024] Open
Abstract
Deep learning is a powerful machine learning technique that can learn from large amounts of data using multiple layers of artificial neural networks. This paper reviews some applications of deep learning in bioinformatics, a field that deals with analyzing and interpreting biological data. We first introduce the basic concepts of deep learning and then survey the recent advances and challenges of applying deep learning to various bioinformatics problems, such as genome sequencing, gene expression analysis, protein structure prediction, drug discovery, and disease diagnosis. We also discuss future directions and opportunities for deep learning in bioinformatics. We aim to provide an overview of deep learning so that bioinformaticians applying deep learning models can consider all critical technical and ethical aspects. Thus, our target audience is biomedical informatics researchers who use deep learning models for inference. This review will inspire more bioinformatics researchers to adopt deep-learning methods for their research questions while considering fairness, potential biases, explainability, and accountability.
Affiliation(s)
- Malik YOUSEF
  - Department of Information Systems, Zefat Academic College, Zefat, Israel
- Jens ALLMER
  - Medical Informatics and Bioinformatics, Institute for Measurement Engineering and Sensor Technology, Hochschule Ruhr West, University of Applied Sciences, Mülheim an der Ruhr, Germany
12
Wang J, Wen Y, Zhang Y, Wang Z, Jiang Y, Dai C, Wu L, Leng D, He S, Bo X. An interpretable artificial intelligence framework for designing synthetic lethality-based anti-cancer combination therapies. J Adv Res 2023:S2090-1232(23)00374-0. [PMID: 38043609 DOI: 10.1016/j.jare.2023.11.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 11/27/2023] [Accepted: 11/29/2023] [Indexed: 12/05/2023] Open
Abstract
INTRODUCTION Synthetic lethality (SL) provides an opportunity to leverage different genetic interactions when designing synergistic combination therapies. To further explore SL-based combination therapies for cancer treatment, it is important to identify and mechanistically characterize more SL interactions. Artificial intelligence (AI) methods have recently been proposed for SL prediction, but the results of these models are often not interpretable such that deriving the underlying mechanism can be challenging. OBJECTIVES This study aims to develop an interpretable AI framework for SL prediction and subsequently utilize it to design SL-based synergistic combination therapies. METHODS We propose a knowledge and data dual-driven AI framework for SL prediction (KDDSL). Specifically, we use gene knowledge related to the SL mechanism to guide the construction of the model and develop a method to identify the most relevant gene knowledge for the predicted results. RESULTS Experimental and literature-based validation confirmed a good balance between predictive and interpretable ability when using KDDSL. Moreover, we demonstrated that KDDSL could help to discover promising drug combinations and clarify associated biological processes, such as the combination of MDM2 and CDK9 inhibitors, which exhibited significant anti-cancer effects in vitro and in vivo. CONCLUSION These data underscore the potential of KDDSL to guide SL-based combination therapy design. There is a need for biomedicine-focused AI strategies to combine rational biological knowledge with developed models.
Affiliation(s)
- Jing Wang
  - School of Medicine, Tsinghua University, Beijing, 100084, China
- Yuqi Wen
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Yixin Zhang
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Zhongming Wang
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China
- Yuyang Jiang
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China
- Chong Dai
  - College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
- Lianlian Wu
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China
- Dongjin Leng
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Song He
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Xiaochen Bo
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
13
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
- Pavel Kohout
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Faraneh Haddadi
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Anton Bushuiev
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Raman Samusevich
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Jiri Sedlar
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Tomas Pluskal
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Josef Sivic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
14
Tarquino J, Arabyarmohammadi S, Tejada RE, Madabhushi A, Romero E. Intra-nucleus mosaic pattern (InMop) and whole-cell Haralick combined-descriptor for identifying and characterizing acute leukemia blasts on single cell peripheral blood images. Cytometry A 2023; 103:857-867. [PMID: 37565838 PMCID: PMC10841385 DOI: 10.1002/cyto.a.24785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2022] [Revised: 07/14/2023] [Accepted: 08/08/2023] [Indexed: 08/12/2023]
Abstract
Acute leukemia is usually diagnosed when a test of peripheral blood shows at least 20% of abnormal immature cells (blasts), a figure even lower in cases of recurrent cytogenetic abnormalities. Blast identification is crucial for white blood cell (WBC) counting, which depends on both identifying the cell type and characterizing the cellular morphology, processes susceptible to inter- and intraobserver variability. The present work introduces an image combined-descriptor to detect blasts and determine their probable lineage. This strategy uses an intra-nucleus mosaic pattern (InMop) descriptor that captures subtle nuclei differences within WBCs, and Haralick's statistics which quantify the local structure of both nucleus and cytoplasm. The InMop captures WBC inner-nucleus structure by applying a multiscale Shearlet decomposition over a repetitive pattern (mosaic) of automatically-segmented nuclei. As a complement, Haralick's statistics characterize the local structure of the whole cell from an intensity co-occurrence matrix representation. Both InMop and Haralick-based descriptors are calculated using the b-channel from Lab color-space. The combined-descriptor is assessed by differentiating blasts from nonleukemic cells with support vector machine (SVM) classifiers and different transformation kernels, in two public and independent databases. The first database-D1 (n = 260) is composed of healthy and acute lymphoid leukemia (ALL) single cell images, and second database-D2 contains acute myeloid leukemia (AML) blasts (n = 3294) and nonblast (n = 15,071) cell images. In a first experiment, blasts versus nonblast differentiation is performed by training with a subset of D2 (n = 6588) and testing in D1 (n = 260), obtaining a training AUC of 0.991 ± 0.002 and AUC = 0.782 for the independent validation. A second experiment automatically differentiates AML blasts (260 images from D2) from ALL blasts (260 images from D1), with an AUC of 0.93. In a third experiment, state-of-the-art strategies, VGG16 and RESNEXT convolutional neural networks (CNN), separate blast from nonblast cells in both databases. The VGG16 showed an AUC of 0.673 and the RESNEXT of 0.75. Reported metrics for all the experiments are area under the ROC curve (AUC), accuracy and F1-score.
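The Haralick statistics mentioned in the abstract are computed from an intensity co-occurrence matrix. The sketch below shows the general technique on a tiny quantized image; it is not the authors' pipeline. The single horizontal offset, the non-symmetric matrix, the choice of two features (contrast and energy), and the toy image are all assumptions made for illustration.

```python
def cooccurrence_matrix(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one pixel offset.

    image:  2D list of integer intensities in the range [0, levels)
    dx, dy: offset from each pixel to its neighbour (default: right neighbour)
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    # Convert pair counts to joint probabilities.
    return [[n / total for n in row] for row in counts]


def haralick_contrast(p):
    """Contrast: intensity difference between neighbours, weighted by frequency."""
    size = len(p)
    return sum(((i - j) ** 2) * p[i][j] for i in range(size) for j in range(size))


def haralick_energy(p):
    """Energy (angular second moment): sum of squared joint probabilities."""
    return sum(v * v for row in p for v in row)


# Hypothetical 3x3 image already quantized to three gray levels.
image = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 2],
]
glcm = cooccurrence_matrix(image, levels=3)
contrast = haralick_contrast(glcm)
energy = haralick_energy(glcm)
```

In practice such features are usually averaged over several offsets and directions before being passed to a classifier such as the SVM named in the abstract.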
Affiliation(s)
- Jonathan Tarquino
  - Computer Imaging and Medical Application Laboratory, Universidad Nacional de Colombia, Bogotá, Colombia
- Sara Arabyarmohammadi
  - Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, GA, USA
- Rafael Enrique Tejada
  - Department of internal medicine, Hemato-oncology unit, Medicine Faculty, Universidad Nacional de Colombia, Bogotá, Colombia
- Anant Madabhushi
  - Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, GA, USA
  - Atlanta Veterans Medical Center, Atlanta, GA, USA
- Eduardo Romero
  - Computer Imaging and Medical Application Laboratory, Universidad Nacional de Colombia, Bogotá, Colombia
15
Wang K, Theeke LA, Liao C, Wang N, Lu Y, Xiao D, Xu C. Deep learning analysis of UPLC-MS/MS-based metabolomics data to predict Alzheimer's disease. J Neurol Sci 2023; 453:120812. [PMID: 37776718 DOI: 10.1016/j.jns.2023.120812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 08/22/2023] [Accepted: 09/14/2023] [Indexed: 10/02/2023]
Abstract
OBJECTIVE Metabolic biomarkers can potentially inform disease progression in Alzheimer's disease (AD). The purpose of this study is to identify and describe a new set of diagnostic biomarkers for developing deep learning (DL) tools to predict AD using Ultra Performance Liquid Chromatography Mass Spectrometry (UPLC-MS/MS)-based metabolomics data. METHODS A total of 177 individuals, including 78 with AD and 99 who were cognitively normal (CN), were selected from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort along with 150 metabolomic biomarkers. We performed feature selection using the Least Absolute Shrinkage and Selection Operator (LASSO). The H2O DL function was used to build multilayer feedforward neural networks to predict AD. RESULTS The LASSO selected 21 metabolic biomarkers. To develop DL models, the 21 biomarkers identified by LASSO were imported into the H2O package. The data were split into 70% for training and 30% for validation. The best DL model with two layers and 18 neurons achieved an accuracy of 0.881, F1-score of 0.892, and AUC of 0.873. Several metabolomic biomarkers involved in glucose and lipid metabolism, in particular bile acid metabolites, were associated with APOE-ε4 allele and clinical biomarkers (Aβ42, tTau, pTau), cognitive assessments [the Alzheimer's Disease Assessment Scale-cognitive subscale 13 (ADAS13), the Mini-Mental State Examination (MMSE)], and hippocampus volume. CONCLUSIONS This study identified a new set of diagnostic metabolomic biomarkers for developing DL tools to predict AD. These biomarkers may help with early diagnosis, prognostic risk stratification, and/or early treatment interventions for patients at risk for AD.
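The evaluation protocol in the abstract (a 70/30 split, then accuracy, F1-score and AUC) can be sketched in plain Python. This is an illustrative sketch, not the study's H2O workflow: the helper names, the random seed, the 0.5 decision threshold, and the six made-up labels and scores are assumptions; only the cohort size of 177 comes from the abstract.

```python
import random


def train_validation_split(items, train_fraction=0.7, seed=0):
    """Shuffle the items and split them into training and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win; this equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Split 177 subject indices (the cohort size from the abstract) 70/30.
train_ids, valid_ids = train_validation_split(range(177))

# Made-up validation labels and model scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
area = auc(y_true, scores)
```

Accuracy and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of scores alone, which is one reason the three metrics are often reported together.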
Affiliation(s)
- Kesheng Wang
  - School of Nursing, Health Sciences Center, West Virginia University, Morgantown, WV 26506, USA
- Laurie A Theeke
  - School of Nursing, The George Washington University, Ashburn, VA 20147, USA
- Christopher Liao
  - Department of Electrical and Computer Engineering, Boston University, MA 02215, USA
- Nianyang Wang
  - Department of Health Policy and Management, School of Public Health, University of Maryland, College Park, MD 20742, USA
- Yongke Lu
  - Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV 25755, USA
- Danqing Xiao
  - Department of STEM, School of Arts and Sciences, Regis College, Weston, MA 02493, USA
- Chun Xu
  - Department of Health and Biomedical Sciences, College of Health Professions, University of Texas Rio Grande Valley, Brownsville, TX 78520, USA
16
Esser-Skala W, Fortelny N. Reliable interpretability of biology-inspired deep neural networks. NPJ Syst Biol Appl 2023; 9:50. [PMID: 37816807 PMCID: PMC10564878 DOI: 10.1038/s41540-023-00310-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 09/15/2023] [Indexed: 10/12/2023] Open
Abstract
Deep neural networks display impressive performance but suffer from limited interpretability. Biology-inspired deep learning, where the architecture of the computational graph is based on biological knowledge, enables unique interpretability where real-world concepts are encoded in hidden nodes, which can be ranked by importance and thereby interpreted. In such models trained on single-cell transcriptomes, we previously demonstrated that node-level interpretations lack robustness upon repeated training and are influenced by biases in biological knowledge. Similar studies are missing for related models. Here, we test and extend our methodology for reliable interpretability in P-NET, a biology-inspired model trained on patient mutation data. We observe variability of interpretations and susceptibility to knowledge biases, and identify the network properties that drive interpretation biases. We further present an approach to control the robustness and biases of interpretations, which leads to more specific interpretations. In summary, our study reveals the broad importance of methods to ensure robust and bias-aware interpretability in biology-inspired deep learning.
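One simple way to quantify whether node-level interpretations are robust upon repeated training is to compare the importance rankings of the same hidden nodes across runs. The sketch below uses Spearman rank correlation for this; that choice of measure, the function names, and the toy importance scores are assumptions for illustration and are not taken from the paper.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(a), ranks(b))


# Hypothetical importance scores for the same four hidden nodes
# obtained from three independent training runs.
run_a = [0.9, 0.5, 0.3, 0.1]
run_b = [0.8, 0.6, 0.2, 0.4]
run_c = [0.1, 0.3, 0.5, 0.9]

stability_ab = spearman(run_a, run_b)  # two nodes swap places
stability_ac = spearman(run_a, run_c)  # ranking fully reversed
```

Values near 1 indicate that repeated training yields the same node ranking, while values near 0 or below suggest that an interpretation drawn from a single run should not be trusted.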
Affiliation(s)
- Wolfgang Esser-Skala
  - Computational Systems Biology Group, Department of Biosciences and Medical Biology, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria
- Nikolaus Fortelny
  - Computational Systems Biology Group, Department of Biosciences and Medical Biology, University of Salzburg, Hellbrunner Straße 34, 5020, Salzburg, Austria
17
Halawani R, Buchert M, Chen YPP. Deep learning exploration of single-cell and spatially resolved cancer transcriptomics to unravel tumour heterogeneity. Comput Biol Med 2023; 164:107274. [PMID: 37506451 DOI: 10.1016/j.compbiomed.2023.107274] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/02/2023] [Revised: 07/03/2023] [Accepted: 07/16/2023] [Indexed: 07/30/2023]
Abstract
Tumour heterogeneity is one of the critical confounding aspects in decoding tumour growth. Malignant cells display variations in their gene transcription profiles and mutation spectra even when originating from a single progenitor cell. Single-cell and spatial transcriptomics sequencing have recently emerged as key technologies for unravelling tumour heterogeneity. Single-cell sequencing promotes individual cell-type identification through transcriptome-wide gene expression measurements of each cell. Spatial transcriptomics facilitates identification of cell-cell interactions and the structural organization of heterogeneous cells within a tumour tissue through associating spatial RNA abundance of cells at distinct spots in the tissue section. However, extracting features and analyzing single-cell and spatial transcriptomics data poses challenges. Single-cell transcriptome data is extremely noisy and its sparse nature and dropouts can lead to misinterpretation of gene expression and the misclassification of cell types. Deep learning predictive power can overcome data challenges, provide high-resolution analysis and enhance precision oncology applications that involve early cancer prognosis, diagnosis, patient survival estimation and anti-cancer therapy planning. In this paper, we provide a background to and review of the recent progress of deep learning frameworks to investigate tumour heterogeneity using both single-cell and spatial transcriptomics data types.
Affiliation(s)
- Raid Halawani
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
- Michael Buchert
  - School of Cancer Medicine, La Trobe University, Melbourne, Victoria, Australia; Olivia Newton-John Cancer Research Institute, Melbourne, Victoria, Australia
- Yi-Ping Phoebe Chen
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
18
Faure L, Mollet B, Liebermeister W, Faulon JL. A neural-mechanistic hybrid approach improving the predictive power of genome-scale metabolic models. Nat Commun 2023; 14:4669. [PMID: 37537192 PMCID: PMC10400647 DOI: 10.1038/s41467-023-40380-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 07/19/2023] [Indexed: 08/05/2023] Open
Abstract
Constraint-based metabolic models have been used for decades to predict the phenotype of microorganisms in different environments. However, quantitative predictions are limited unless labor-intensive measurements of media uptake fluxes are performed. We show how hybrid neural-mechanistic models can serve as an architecture for machine learning providing a way to improve phenotype predictions. We illustrate our hybrid models with growth rate predictions of Escherichia coli and Pseudomonas putida grown in different media and with phenotype predictions of gene knocked-out Escherichia coli mutants. Our neural-mechanistic models systematically outperform constraint-based models and require training set sizes orders of magnitude smaller than classical machine learning methods. Our hybrid approach opens a doorway to enhancing constraint-based modeling: instead of constraining mechanistic models with additional experimental measurements, our hybrid models grasp the power of machine learning while fulfilling mechanistic constraints, thus saving time and resources in typical systems biology or biological engineering projects.
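For orientation, constraint-based models of the kind named in the abstract are commonly solved by flux balance analysis. The abstract does not state the exact formulation used, so what follows is the textbook linear program, with the symbols $S$, $v$, $c$, $v^{\mathrm{lb}}$ and $v^{\mathrm{ub}}$ introduced here for illustration:

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}
\end{aligned}
```

Here $S$ is the stoichiometric matrix (metabolites by reactions), $v$ is the vector of reaction fluxes, and $c$ selects the objective, typically the biomass reaction when predicting growth rate. The bounds on the uptake reactions are the quantities that, according to the abstract, otherwise require labor-intensive measurements.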
Affiliation(s)
- Léon Faure
  - MICALIS Institute, INRAE, AgroParisTech, University of Paris-Saclay, 78350, Jouy-en-Josas, France
- Bastien Mollet
  - Ecole Normale Supérieure of Lyon, 69342, Lyon, France
  - UMR MIA, INRAE, AgroParisTech, University of Paris-Saclay, 91120, Palaiseau, France
- Jean-Loup Faulon
  - MICALIS Institute, INRAE, AgroParisTech, University of Paris-Saclay, 78350, Jouy-en-Josas, France
  - Manchester Institute of Biotechnology, University of Manchester, Manchester, M1 7DN, UK
19
Wysocka M, Wysocki O, Zufferey M, Landers D, Freitas A. A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinformatics 2023; 24:198. [PMID: 37189058 DOI: 10.1186/s12859-023-05262-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/17/2022] [Accepted: 03/30/2023] [Indexed: 05/17/2023] Open
Abstract
BACKGROUND There is an increasing interest in the use of Deep Learning (DL) based methods as a supporting analytical framework in oncology. However, most direct applications of DL will deliver models with limited transparency and explainability, which constrain their deployment in biomedical settings. METHODS This systematic review discusses DL models used to support inference in cancer biology with a particular emphasis on multi-omics analysis. It focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability, fundamental properties in the biomedical domain. For this, we retrieved and analyzed 42 studies focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods. RESULTS We discuss the recent evolutionary arc of DL models in the direction of integrating prior biological relational and network knowledge to support better generalisation (e.g. pathways or Protein-Protein-Interaction networks) and interpretability. This represents a fundamental functional shift towards models which can integrate mechanistic and statistical inference aspects. We introduce a concept of bio-centric interpretability and, according to its taxonomy, we discuss representational methodologies for the integration of domain prior knowledge in such models. CONCLUSIONS The paper provides a critical outlook into contemporary methods for explainability and interpretability used in DL for cancer. The analysis points in the direction of a convergence between encoding prior knowledge and improved interpretability. We introduce bio-centric interpretability which is an important step towards formalisation of biological interpretability of DL models and developing methods that are less problem- or application-specific.
Affiliation(s)
- Magdalena Wysocka
  - Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Oskar Wysocki
  - Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920, Martigny, Switzerland
- Marie Zufferey
  - Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920, Martigny, Switzerland
- Dónal Landers
  - DeLondra Oncology Ltd, 38 Carlton Avenue, Wilmslow, SK9 4EP, UK
- André Freitas
  - Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
  - Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920, Martigny, Switzerland
20
Hosseini-Gerami L, Higgins IA, Collier DA, Laing E, Evans D, Broughton H, Bender A. Benchmarking causal reasoning algorithms for gene expression-based compound mechanism of action analysis. BMC Bioinformatics 2023; 24:154. [PMID: 37072707 PMCID: PMC10111792 DOI: 10.1186/s12859-023-05277-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Accepted: 04/06/2023] [Indexed: 04/20/2023] Open
Abstract
BACKGROUND Elucidating compound mechanism of action (MoA) is beneficial to drug discovery, but in practice often represents a significant challenge. Causal Reasoning approaches aim to address this situation by inferring dysregulated signalling proteins using transcriptomics data and biological networks; however, a comprehensive benchmarking of such approaches has not yet been reported. Here we benchmarked four causal reasoning algorithms (SigNet, CausalR, CausalR ScanR and CARNIVAL) with four networks (the smaller Omnipath network vs. 3 larger MetaBase™ networks), using LINCS L1000 and CMap microarray data, and assessed to what extent each factor dictated the successful recovery of direct targets and compound-associated signalling pathways in a benchmark dataset comprising 269 compounds. We additionally examined impact on performance in terms of the functions and roles of protein targets and their connectivity bias in the prior knowledge networks. RESULTS According to statistical analysis (negative binomial model), the combination of algorithm and network most significantly dictated the performance of causal reasoning algorithms, with SigNet recovering the greatest number of direct targets. With respect to the recovery of signalling pathways, CARNIVAL with the Omnipath network was able to recover the most informative pathways containing compound targets, based on the Reactome pathway hierarchy. Additionally, CARNIVAL, SigNet and CausalR ScanR all outperformed baseline gene expression pathway enrichment results. We found no significant difference in performance between L1000 data and microarray data, even when limited to just 978 'landmark' genes. Notably, all causal reasoning algorithms also outperformed pathway recovery based on input DEGs, despite these often being used for pathway enrichment. The performance of causal reasoning methods was somewhat correlated with connectivity and biological role of the targets. CONCLUSIONS Overall, we conclude that causal reasoning performs well at recovering signalling proteins related to compound MoA upstream from gene expression changes by leveraging prior knowledge networks, and that the choice of network and algorithm has a profound impact on the performance of causal reasoning algorithms. Based on the analyses presented here this is true for both microarray-based gene expression data as well as those based on the L1000 platform.
Affiliation(s)
- Layla Hosseini-Gerami
  - Department of Chemistry, Centre for Molecular Informatics, Cambridge, UK
  - Ignota Labs, London, UK
- David A Collier
  - Eli Lilly and Company, Bracknell, UK
  - Social, Genetic and Developmental Psychiatry Centre, IoPPN, King's College London, London, UK
  - Genetic and Genomic Consulting Ltd, Farnham, UK
- Emma Laing
  - Eli Lilly and Company, Bracknell, UK
  - GSK, Stevenage, UK
- David Evans
  - Eli Lilly and Company, Bracknell, UK
  - DeepMind, London, UK
- Howard Broughton
  - Centre de Investigación, Eli Lilly and Company, Alcobendas, Spain
- Andreas Bender
  - Department of Chemistry, Centre for Molecular Informatics, Cambridge, UK
21
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.02.526909. [PMID: 36778259 PMCID: PMC9915715 DOI: 10.1101/2023.02.02.526909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]
Abstract
The modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of regulatory features in genome-wide screens. Most GRN inference methods are therefore forced to model relationships between regulatory genes and their targets with expression as a proxy for the upstream independent features, complicating validation and predictions produced by modeling frameworks. Separating covariance and regulatory influence requires aggregation of independent and complementary sets of evidence, such as transcription factor (TF) binding and target gene expression. However, the complete regulatory state of the system, e.g. TF activity (TFA), is unknown due to a lack of experimental feasibility, making regulatory relations difficult to infer. Some methods attempt to account for this by modeling TFA as a latent feature, but these models often use linear frameworks that are unable to account for non-linearities such as saturation, TF-TF interactions, and other higher order features. Deep learning frameworks may offer a solution, as they are capable of modeling complex interactions and capturing higher-order latent features. However, these methods often discard central concepts in biological systems modeling, such as sparsity and latent feature interpretability, in favor of increased model complexity. We propose a novel deep learning autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), that scales to single cell genomic data and maintains interpretability to perform GRN inference and estimate TFA as a latent feature. We demonstrate that SupirFactor outperforms current leading GRN inference methods, predicts biologically relevant TFA and elucidates functional regulatory pathways through aggregation of TFs.
Affiliation(s)
- Andreas Tjärnberg
- Center for Developmental Genetics, New York University, New York, NY 10003, USA
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10010, USA
| | - Maggie Beheler-Amass
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
| | - Christopher A Jackson
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
| | - Lionel A Christiaen
- Center for Developmental Genetics, New York University, New York, NY 10003, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
- Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
| | - David Gresham
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
| | - Richard Bonneau
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY 10003, USA
- Center For Data Science, NYU, New York, NY 10008, USA
- Prescient Design, a Genentech accelerator, New York, NY, 10010, USA
| |
|
22
|
Novakovsky G, Dexter N, Libbrecht MW, Wasserman WW, Mostafavi S. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat Rev Genet 2023; 24:125-137. [PMID: 36192604 DOI: 10.1038/s41576-022-00532-2] [Citation(s) in RCA: 63] [Impact Index Per Article: 63.0] [Accepted: 08/31/2022] [Indexed: 01/24/2023]
Abstract
Artificial intelligence (AI) models based on deep learning now represent the state of the art for making functional predictions in genomics research. However, the underlying basis on which predictive models make such predictions is often unknown. For genomics researchers, this missing explanatory information would frequently be of greater value than the predictions themselves, as it can enable new insights into genetic processes. We review progress in the emerging area of explainable AI (xAI), a field with the potential to empower life science researchers to gain mechanistic insights into complex deep learning models. We discuss and categorize approaches for model interpretation, including an intuitive understanding of how each approach works and their underlying assumptions and limitations in the context of typical high-throughput biological datasets.
Affiliation(s)
- Gherman Novakovsky
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, British Columbia, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
| | - Nick Dexter
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada; School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Maxwell W Libbrecht
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada.
| | - Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, British Columbia, Canada.
| | - Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA; Canadian Institute for Advanced Research, Toronto, Ontario, Canada.
| |
|
23
|
Lotfollahi M, Rybakov S, Hrovatin K, Hediyeh-Zadeh S, Talavera-López C, Misharin AV, Theis FJ. Biologically informed deep learning to query gene programs in single-cell atlases. Nat Cell Biol 2023; 25:337-350. [PMID: 36732632 PMCID: PMC9928587 DOI: 10.1038/s41556-022-01072-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 03/02/2022] [Accepted: 12/08/2022] [Indexed: 02/04/2023]
Abstract
The increasing availability of large-scale single-cell atlases has enabled the detailed description of cell states. In parallel, advances in deep learning allow rapid analysis of newly generated query datasets by mapping them into reference atlases. However, existing data transformations learned to map query data are not easily explainable using biologically known concepts such as genes or pathways. Here we propose expiMap, a biologically informed deep-learning architecture that enables single-cell reference mapping. ExpiMap learns to map cells into biologically understandable components representing known 'gene programs'. The activity of each cell for a gene program is learned while simultaneously refining them and learning de novo programs. We show that expiMap compares favourably to existing methods while bringing an additional layer of interpretability to integrative single-cell analysis. Furthermore, we demonstrate its applicability to analyse single-cell perturbation responses in different tissues and species and resolve responses of patients who have coronavirus disease 2019 to different treatments across cell types.
Affiliation(s)
- Mohammad Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Wellcome Sanger Institute, Cambridge, UK
| | - Sergei Rybakov
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
| | - Karin Hrovatin
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Soroor Hediyeh-Zadeh
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Bioinformatics Division, WEHI, Melbourne, Victoria, Australia
| | - Carlos Talavera-López
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Division of Infectious Diseases and Tropical Medicine, Ludwig-Maximilian-Universität Klinikum, Munich, Germany
| | - Alexander V Misharin
- Division of Pulmonary and Critical Care Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany.
- Wellcome Sanger Institute, Cambridge, UK.
- Department of Mathematics, Technical University of Munich, Munich, Germany.
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany.
| |
|
24
|
Li MM, Huang K, Zitnik M. Graph representation learning in biomedicine and healthcare. Nat Biomed Eng 2022; 6:1353-1369. [PMID: 36316368 PMCID: PMC10699434 DOI: 10.1038/s41551-022-00942-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 10/30/2021] [Accepted: 08/09/2022] [Indexed: 11/11/2022]
Abstract
Networks, or graphs, are universal descriptors of systems of interacting elements. In biomedicine and healthcare, they can represent, for example, molecular interactions, signalling pathways, disease co-morbidities or healthcare systems. In this Perspective, we posit that representation learning can realize principles of network medicine, discuss successes and current limitations of the use of representation learning on graphs in biomedicine and healthcare, and outline algorithmic strategies that leverage the topology of graphs to embed them into compact vectorial spaces. We argue that graph representation learning will keep pushing forward machine learning for biomedicine and healthcare applications, including the identification of genetic variants underlying complex traits, the disentanglement of single-cell behaviours and their effects on health, the assistance of patients in diagnosis and treatment, and the development of safe and effective medicines.
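One common way in which graph topology shapes an embedding is neighbourhood aggregation, where each node's vector is updated from the vectors of its neighbours. The sketch below shows a single round of mean aggregation on a hypothetical three-node graph; the node names, features, and the `aggregate` helper are assumptions for illustration and are not taken from the Perspective.

```python
def aggregate(features, edges):
    """One round of mean aggregation over each node's neighbours and itself."""
    neighbours = {node: {node} for node in features}
    for a, b in edges:  # edges are treated as undirected
        neighbours[a].add(b)
        neighbours[b].add(a)
    dim = len(next(iter(features.values())))
    updated = {}
    for node, nbrs in neighbours.items():
        # Average each feature dimension over the node's neighbourhood.
        updated[node] = [sum(features[n][d] for n in nbrs) / len(nbrs)
                         for d in range(dim)]
    return updated

# Hypothetical path graph A - B - C with two-dimensional node features.
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.0, 0.0]}
edges = [("A", "B"), ("B", "C")]
embeddings = aggregate(features, edges)
```

After one round, node C, which started with an all-zero vector, carries information from B; repeating the step spreads information over longer paths in the graph.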
Affiliation(s)
- Michelle M Li
- Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Kexin Huang
- Health Data Science Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Harvard Data Science Initiative, Cambridge, MA, USA.
| |
|
25
|
Ghosh Roy G, Geard N, Verspoor K, He S. MPVNN: Mutated Pathway Visible Neural Network architecture for interpretable prediction of cancer-specific survival risk. Bioinformatics 2022; 38:5026-5032. [PMID: 36124954 DOI: 10.1093/bioinformatics/btac636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Revised: 08/04/2022] [Accepted: 09/16/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Survival risk prediction using gene expression data is important in making treatment decisions in cancer. Standard neural network (NN) survival analysis models are black boxes with a lack of interpretability. More interpretable visible neural network architectures are designed using biological pathway knowledge. But they do not model how pathway structures can change for particular cancer types. RESULTS: We propose a novel Mutated Pathway Visible Neural Network (MPVNN) architecture, designed using prior signaling pathway knowledge and random replacement of known pathway edges using gene mutation data simulating signal flow disruption. As a case study, we use the PI3K-Akt pathway and demonstrate overall improved cancer-specific survival risk prediction of MPVNN over other similar-sized NN and standard survival analysis methods. We show that trained MPVNN architecture interpretation, which points to smaller sets of genes connected by signal flow within the PI3K-Akt pathway that is important in risk prediction for particular cancer types, is reliable. AVAILABILITY AND IMPLEMENTATION: The data and code are available at https://github.com/gourabghoshroy/MPVNN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gourab Ghosh Roy
- School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK; School of Computing and Information Systems, University of Melbourne, Melbourne 3052, Australia
| | - Nicholas Geard
- School of Computing and Information Systems, University of Melbourne, Melbourne 3052, Australia
| | - Karin Verspoor
- School of Computing and Information Systems, University of Melbourne, Melbourne 3052, Australia; School of Computing Technologies, RMIT University, Melbourne 3000, Australia
| | - Shan He
- School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK
| |
|
26
|
Kolmar L, Autour A, Ma X, Vergier B, Eduati F, Merten CA. Technological and computational advances driving high-throughput oncology. Trends Cell Biol 2022; 32:947-961. [PMID: 35577671 DOI: 10.1016/j.tcb.2022.04.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/08/2022] [Revised: 04/11/2022] [Accepted: 04/20/2022] [Indexed: 01/21/2023]
Abstract
Engineering and computational advances have opened many new avenues in cancer research, particularly when being exploited in interdisciplinary approaches. For example, the combination of microfluidics, novel sequencing technologies, and computational analyses has been crucial to enable single-cell assays, giving a detailed picture of tumor heterogeneity for the very first time. In a similar way, these 'tech' disciplines have been elementary for generating large data sets in multidimensional cancer 'omics' approaches, cell-cell interaction screens, 3D tumor models, and tissue level analyses. In this review we summarize the most important technology and computational developments that have been or will be instrumental for transitioning classical cancer research to a large data-driven, high-throughput, high-content discipline across all biological scales.
Affiliation(s)
- Leonie Kolmar
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Alexis Autour
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | - Xiaoli Ma
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Blandine Vergier
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Federica Eduati
- Department of Biomedical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands; Institute for Complex Molecular Systems, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands.
| | - Christoph A Merten
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.
| |
|
27
|
Brendel M, Su C, Bai Z, Zhang H, Elemento O, Wang F. Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review. Genomics Proteomics Bioinformatics 2022; 20:814-835. [PMID: 36528240 PMCID: PMC10025684 DOI: 10.1016/j.gpb.2022.11.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/23/2022] [Revised: 08/17/2022] [Accepted: 11/24/2022] [Indexed: 12/23/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously. Analysis of scRNA-seq data plays an important role in the study of cell states and phenotypes, and has helped elucidate biological processes, such as those occurring during the development of complex organisms, and improved our understanding of disease states, such as cancer, diabetes, and coronavirus disease 2019 (COVID-19). Deep learning, a recent advance of artificial intelligence that has been used to address many problems involving large datasets, has also emerged as a promising tool for scRNA-seq data analysis, as it has a capacity to extract informative and compact features from noisy, heterogeneous, and high-dimensional scRNA-seq data to improve downstream analysis. The present review aims at surveying recently developed deep learning techniques in scRNA-seq data analysis, identifying key steps within the scRNA-seq data analysis pipeline that have been advanced by deep learning, and explaining the benefits of deep learning over more conventional analytic tools. Finally, we summarize the challenges in current deep learning approaches faced within scRNA-seq data and discuss potential directions for improvements in deep learning algorithms for scRNA-seq data analysis.
Affiliation(s)
- Matthew Brendel
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA; Institute for Computational Biomedicine, Caryl and Israel Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Chang Su
- Department of Health Service Administration and Policy, Temple University, Philadelphia, PA 19122, USA.
| | - Zilong Bai
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Hao Zhang
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Olivier Elemento
- Institute for Computational Biomedicine, Caryl and Israel Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA.
| |
|
28
|
Abstract
AlphaFold has burst into our lives. A powerful algorithm that underscores the strength of biological sequence data and artificial intelligence (AI). AlphaFold has appended projects and research directions. The database it has been creating promises an untold number of applications with vast potential impacts that are still difficult to surmise. AI approaches can revolutionize personalized treatments and usher in better-informed clinical trials. They promise to make giant leaps toward reshaping and revamping drug discovery strategies, selecting and prioritizing combinations of drug targets. Here, we briefly overview AI in structural biology, including in molecular dynamics simulations and prediction of microbiota–human protein–protein interactions. We highlight the advancements accomplished by the deep-learning-powered AlphaFold in protein structure prediction and their powerful impact on the life sciences. At the same time, AlphaFold does not resolve the decades-long protein folding challenge, nor does it identify the folding pathways. The models that AlphaFold provides do not capture conformational mechanisms like frustration and allostery, which are rooted in ensembles, and controlled by their dynamic distributions. Allostery and signaling are properties of populations. AlphaFold also does not generate ensembles of intrinsically disordered proteins and regions, instead describing them by their low structural probabilities. Since AlphaFold generates single ranked structures, rather than conformational ensembles, it cannot elucidate the mechanisms of allosteric activating driver hotspot mutations nor of allosteric drug resistance. However, by capturing key features, deep learning techniques can use the single predicted conformation as the basis for generating a diverse ensemble.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
| | - Mingzhen Zhang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
| | - Yonglan Liu
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, Maryland 21702, United States
| | - Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
| |
|
29
|
Zhao X, Lan Y, Chen D. Exploring long non-coding RNA networks from single cell omics data. Comput Struct Biotechnol J 2022; 20:4381-4389. [PMID: 36051880 PMCID: PMC9403499 DOI: 10.1016/j.csbj.2022.08.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/14/2022] [Revised: 08/01/2022] [Accepted: 08/01/2022] [Indexed: 11/03/2022] Open
|
30
|
Garrido-Rodriguez M, Zirngibl K, Ivanova O, Lobentanzer S, Saez-Rodriguez J. Integrating knowledge and omics to decipher mechanisms via large-scale models of signaling networks. Mol Syst Biol 2022; 18:e11036. [PMID: 35880747 PMCID: PMC9316933 DOI: 10.15252/msb.202211036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Revised: 05/12/2022] [Accepted: 05/31/2022] [Indexed: 11/10/2022] Open
Abstract
Signal transduction governs cellular behavior, and its dysregulation often leads to human disease. To understand this process, we can use network models based on prior knowledge, where nodes represent biomolecules, usually proteins, and edges indicate interactions between them. Several computational methods combine untargeted omics data with prior knowledge to estimate the state of signaling networks in specific biological scenarios. Here, we review, compare, and classify recent network approaches according to their characteristics in terms of input omics data, prior knowledge and underlying methodologies. We highlight existing challenges in the field, such as the general lack of ground truth and the limitations of prior knowledge. We also point out new omics developments that may have a profound impact, such as single‐cell proteomics or large‐scale profiling of protein conformational changes. We provide both an introduction for interested users seeking strategies to study cell signaling on a large scale and an update for seasoned modelers.
Affiliation(s)
- Martin Garrido-Rodriguez
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Katharina Zirngibl
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Olga Ivanova
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Sebastian Lobentanzer
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| |
|
31
|
Wei Z, Han D, Zhang C, Wang S, Liu J, Chao F, Song Z, Chen G. Deep Learning-Based Multi-Omics Integration Robustly Predicts Relapse in Prostate Cancer. Front Oncol 2022; 12:893424. [PMID: 35814412 PMCID: PMC9259796 DOI: 10.3389/fonc.2022.893424] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/10/2022] [Accepted: 05/13/2022] [Indexed: 11/13/2022] Open
Abstract
Objective: Post-operative biochemical relapse (BCR) continues to occur in a significant percentage of patients with localized prostate cancer (PCa). Current stratification methods are not adequate to identify high-risk patients. The present study exploits the ability of deep learning (DL) algorithms using the H2O package to combine multi-omics data to resolve this problem. Methods: Five-omics data from 417 PCa patients from The Cancer Genome Atlas (TCGA) were used to construct the DL-based, relapse-sensitive model. Among them, 265 (63.5%) individuals experienced BCR. Five additional independent validation sets were applied to assess its predictive robustness. Bioinformatics analyses of two relapse-associated subgroups were then performed for identification of differentially expressed genes (DEGs), enriched pathway analysis, copy number analysis and immune cell infiltration analysis. Results: The DL-based model, with a significant difference (P = 6e-9) between two subgroups and good concordance index (C-index = 0.767), was proven to be robust by external validation. 1530 DEGs including 678 up- and 852 down-regulated genes were identified in the high-risk subgroup S2 compared with the low-risk subgroup S1. Enrichment analyses found five hallmark gene sets were up-regulated while 13 were down-regulated. Then, we found that DNA damage repair pathways were significantly enriched in the S2 subgroup. CNV analysis showed that 30.18% of genes were significantly up-regulated and gene amplification on chromosomes 7 and 8 was significantly elevated in the S2 subgroup. Moreover, enrichment analysis revealed that some DEGs and pathways were associated with immunity. Three tumor-infiltrating immune cell (TIIC) groups with a higher proportion in the S2 subgroup (p = 1e-05, p = 8.7e-06, p = 0.00014) and one TIIC group with a higher proportion in the S1 subgroup (P = 1.3e-06) were identified. Conclusion: We developed a novel, robust classification for understanding PCa relapse. This study validated the effectiveness of deep learning technique in prognosis prediction, and the method may benefit patients and prevent relapse by improving early detection and advancing early intervention.
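The concordance index reported in this abstract measures how often a model ranks patients' risks in the same order as their observed relapse times. The sketch below is a plain implementation of Harrell's C-index on made-up data; the function name, the four hypothetical patients, and their risk scores are assumptions for illustration and have no connection to the study's cohort or to the H2O package.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs whose predicted risks are ordered correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i relapsed strictly before time j,
            # so censored patients (events[i] == 0) never act as the earlier member.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier relapse got the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # ties in predicted risk count as half
    return concordant / comparable

# Hypothetical follow-up times (months), relapse flags (1 = relapse, 0 = censored),
# and model risk scores for four patients.
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
risks = [0.9, 0.4, 0.1, 0.2]

c_index = concordance_index(times, events, risks)
```

In this toy data, three of the four comparable pairs are ordered correctly, so the C-index is 0.75; a value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.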
Affiliation(s)
- Ziwei Wei
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
| | - Dunsheng Han
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
| | - Cong Zhang
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
| | - Shiyu Wang
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
| | - Jinke Liu
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
| | - Fan Chao
- Department of Urology, Zhongshan Hospital, Fudan University (Xiamen Branch), Xiamen, China
| | - Zhenyu Song
- Ovarian Cancer Program, Department of Gynecologic Oncology, Zhongshan Hospital, Fudan University, Shanghai, China
- *Correspondence: Gang Chen; Zhenyu Song
| | - Gang Chen
- Department of Urology, Jinshan Hospital, Fudan University, Shanghai, China
- *Correspondence: Gang Chen; Zhenyu Song
| |
|
32
|
Nilsson A, Peters JM, Meimetis N, Bryson B, Lauffenburger DA. Artificial neural networks enable genome-scale simulations of intracellular signaling. Nat Commun 2022; 13:3069. [PMID: 35654811 PMCID: PMC9163072 DOI: 10.1038/s41467-022-30684-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/18/2021] [Accepted: 05/11/2022] [Indexed: 12/14/2022] Open
Abstract
Mammalian cells adapt their functional state in response to external signals in the form of ligands that bind receptors on the cell surface. Mechanistically, this involves signal processing through a complex network of molecular interactions that govern transcription factor activity patterns. Computer simulations of the information flow through this network could help predict cellular responses in health and disease. Here we develop a recurrent neural network framework constrained by prior knowledge of the signaling network with ligand concentrations as input and transcription factor activity as output. Applied to synthetic data, it predicts unseen test data (Pearson correlation r = 0.98) and the effects of gene knockouts (r = 0.8). We stimulate macrophages with 59 different ligands, with and without the addition of lipopolysaccharide, and collect transcriptomics data. The framework predicts these data under cross-validation (r = 0.8) and knockout simulations suggest a role for RIPK1 in modulating the lipopolysaccharide response. This work demonstrates the feasibility of genome-scale simulations of intracellular signaling.
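A recurrent network constrained by a prior signaling network can be pictured as repeatedly propagating activity along known edges until the state stops changing, with a knockout simulated by clamping one node to zero. The sketch below is a toy three-node cascade and not the authors' framework; the weights, the bounded activation, the synchronous update, and the `simulate` helper are all assumptions chosen for illustration.

```python
import math

# Hypothetical 3-node cascade: receptor (0) -> kinase (1) -> transcription factor (2).
# WEIGHTS[target][source] is nonzero only where the prior network has an edge.
WEIGHTS = [
    [0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
]

def simulate(ligand_input, knockout=None, steps=50):
    """Iterate the masked recurrent update; optionally clamp one node to zero."""
    state = [0.0, 0.0, 0.0]
    for _ in range(steps):
        new_state = []
        for target in range(3):
            drive = ligand_input[target] + sum(
                WEIGHTS[target][source] * state[source] for source in range(3)
            )
            # Bounded, non-negative activation (an illustrative choice).
            new_state.append(math.tanh(max(0.0, drive)))
        if knockout is not None:
            new_state[knockout] = 0.0  # simulated gene knockout
        state = new_state
    return state

ligand = [1.0, 0.0, 0.0]                      # the ligand stimulates the receptor only
wild_type = simulate(ligand)
kinase_knockout = simulate(ligand, knockout=1)
```

In the wild-type run, the signal reaches the transcription factor node, whereas clamping the kinase to zero leaves the transcription factor inactive, which mirrors how knockout effects can be read off a network-constrained simulation.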
Affiliation(s)
- Avlant Nilsson
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, SE 41296, Sweden
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, 02139, USA
| | - Joshua M Peters
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, 02139, USA
| | - Nikolaos Meimetis
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Bryan Bryson
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, 02139, USA
| | - Douglas A Lauffenburger
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, 02139, USA.
| |
|
33
|
Lee D, Kim S. Knowledge-guided artificial intelligence technologies for decoding complex multiomics interactions in cells. Clin Exp Pediatr 2022; 65:239-249. [PMID: 34844399 PMCID: PMC9082244 DOI: 10.3345/cep.2021.01438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2021] [Revised: 10/19/2021] [Accepted: 10/21/2021] [Indexed: 11/27/2022] Open
Abstract
Cells survive and proliferate through complex interactions among diverse molecules across multiomics layers. Conventional experimental approaches for identifying these interactions have built a firm foundation for molecular biology, but their scalability is gradually becoming inadequate compared to the rapid accumulation of multiomics data measured by high-throughput technologies. Therefore, the need for data-driven computational modeling of interactions within cells has been highlighted in recent years. The complexity of multiomics interactions is primarily due to their nonlinearity. That is, their accurate modeling requires intricate conditional dependencies, synergies, or antagonisms between considered genes or proteins, which retard experimental validations. Artificial intelligence (AI) technologies, including deep learning models, are optimal choices for handling complex nonlinear relationships between features that are scalable and produce large amounts of data. Thus, they have great potential for modeling multiomics interactions. Although there exist many AI-driven models for computational biology applications, relatively few explicitly incorporate the prior knowledge within model architectures or training procedures. Such guidance of models by domain knowledge will greatly reduce the amount of data needed to train models and constrain their vast expressive powers to focus on the biologically relevant space. Therefore, it can enhance a model's interpretability, reduce spurious interactions, and prove its validity and utility. Thus, to facilitate further development of knowledge-guided AI technologies for the modeling of multiomics interactions, here we review representative bioinformatics applications of deep learning models for multiomics interactions developed to date by categorizing them by guidance mode.
Affiliation(s)
- Dohoon Lee
- Bioinformatics Institute, Seoul National University, Seoul, Korea
| | - Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
- Department of Computer Science and Engineering, Seoul National University, Seoul, Korea
- Institute of Engineering Research, Seoul National University, Seoul, Korea
- AIGENDRUG Co., Ltd., Seoul, Korea
| |
|
34
|
Trapotsi MA, Hosseini-Gerami L, Bender A. Computational analyses of mechanism of action (MoA): data, methods and integration. RSC Chem Biol 2022; 3:170-200. [PMID: 35360890 PMCID: PMC8827085 DOI: 10.1039/d1cb00069a] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/30/2021] [Accepted: 12/09/2021] [Indexed: 12/15/2022] Open
Abstract
The elucidation of a compound's Mechanism of Action (MoA) is a challenging task in the drug discovery process, but it is important in order to rationalise phenotypic findings and to anticipate potential side-effects. Bioinformatic approaches, advances in machine learning techniques and the increasing deposition of high-throughput data in public databases have significantly contributed to recent advances in the field, but it is not straightforward to decide which data and methods are most suitable to use in a given case. In this review, we focus on these methods and data and their applications in generating MoA hypotheses for subsequent experimental validation. We discuss compound-specific data such as -omics, cell morphology and bioactivity data, as well as commonly used supplementary prior knowledge such as network and pathway data, and provide information on databases where this data can be accessed. In terms of methodologies, we discuss both well-established methods (connectivity mapping, pathway enrichment) as well as more developing methods (neural networks and multi-omics integration). Finally, we review case studies where the MoA of a compound was successfully suggested from computational analysis by incorporating multiple data modalities and/or methodologies. Our aim for this review is to provide researchers with insights into the benefits and drawbacks of both the data and methods in terms of level of understanding, biases and interpretation - and to highlight future avenues of investigation which we foresee will improve the field of MoA elucidation, including greater public access to -omics data and methodologies which are capable of data integration.
Affiliation(s)
- Maria-Anna Trapotsi
  - Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK
- Layla Hosseini-Gerami
  - Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK
- Andreas Bender
  - Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, UK
|
35
|
Gundogdu P, Loucera C, Alamo-Alvarez I, Dopazo J, Nepomuceno I. Integrating pathway knowledge with deep neural networks to reduce the dimensionality in single-cell RNA-seq data. BioData Min 2022; 15:1. [PMID: 34980200 PMCID: PMC8722116 DOI: 10.1186/s13040-021-00285-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/30/2021] [Accepted: 12/04/2021] [Indexed: 11/13/2022]
Abstract
Background: Single-cell RNA sequencing (scRNA-seq) data provide valuable insights into cellular heterogeneity, which is significantly improving the current knowledge of biology and human disease. One of the main applications of scRNA-seq data analysis is the identification of new cell types and cell states. Deep neural networks (DNNs) are among the best methods to address this problem. However, this performance comes at the cost of a lack of interpretability in the results. In this work we propose an intelligible pathway-driven neural network to correctly solve cell-type related problems at single-cell resolution while providing a biologically meaningful representation of the data. Results: In this study, we explored deep neural networks constrained by several types of prior biological information, e.g. signaling pathway information, as a way to reduce the dimensionality of the scRNA-seq data. We have tested the proposed biologically-based architectures on thousands of cells of human and mouse origin across a collection of public datasets in order to check the performance of the model. Specifically, we tested the architecture across different validation scenarios that try to mimic how unknown cell types are clustered by the DNN and how it correctly annotates cell types by querying a database in a retrieval problem. Moreover, our approach proved to be comparable to other, less interpretable DNN approaches constrained by protein-protein interaction or gene regulation data. Finally, we show how the latent structure learned by the network could be used to visualize and to interpret the composition of human single cell datasets. Conclusions: Here we demonstrate how the integration of pathways, which convey fundamental information on functional relationships between genes, with DNNs, which provide an excellent classification framework, results in an excellent alternative to learn a biologically meaningful representation of scRNA-seq data. In addition, the introduction of prior biological knowledge in the DNN reduces the size of the network architecture. Comparative results demonstrate a superior performance of this approach with respect to other similar approaches. As an additional advantage, the use of pathways within the DNN structure enables easy interpretability of the results by connecting features to cell functionalities by means of the pathway nodes, as demonstrated with an example with human melanoma tumor cells. Supplementary Information: The online version contains supplementary material available at 10.1186/s13040-021-00285-4.
Affiliation(s)
- Pelin Gundogdu
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013, Sevilla, Spain
- Carlos Loucera
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013, Sevilla, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013, Sevilla, Spain
- Inmaculada Alamo-Alvarez
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013, Sevilla, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013, Sevilla, Spain
- Joaquin Dopazo
  - Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013, Sevilla, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013, Sevilla, Spain
  - Bioinformatics in Rare Diseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, 41013, Sevilla, Spain
  - FPS/ELIXIR-es, Hospital Virgen del Rocío, 42013, Sevilla, Spain
- Isabel Nepomuceno
  - Department of Computer Languages and Systems, Universidad de Sevilla, Sevilla, Spain
|
36
|
Huminiecki Ł. Virtual Gene Concept and a Corresponding Pragmatic Research Program in Genetical Data Science. Entropy (Basel) 2021; 24:17. [PMID: 35052043 PMCID: PMC8774939 DOI: 10.3390/e24010017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 12/02/2021] [Accepted: 12/14/2021] [Indexed: 06/14/2023]
Abstract
Mendel proposed an experimentally verifiable paradigm of particle-based heredity that has been influential for over 150 years. The historical arguments have been reflected in the near past as Mendel's concept has been diversified by new types of omics data. As an effect of the accumulation of omics data, a virtual gene concept forms, giving rise to genetical data science. The concept integrates genetical, functional, and molecular features of the Mendelian paradigm. I argue that the virtual gene concept should be deployed pragmatically. Indeed, the concept has already inspired a practical research program related to systems genetics. The program includes questions about functionality of structural and categorical gene variants, about regulation of gene expression, and about roles of epigenetic modifications. The methodology of the program includes bioinformatics, machine learning, and deep learning. Education, funding, careers, standards, benchmarks, and tools to monitor research progress should be provided to support the research program.
Affiliation(s)
- Łukasz Huminiecki
  - Evolutionary, Computational, and Statistical Genetics, Department of Molecular Biology, Institute of Genetics and Animal Biotechnology, Polish Academy of Sciences, Postępu 36A, Jastrzębiec, 05-552 Warsaw, Poland
|
37
|
Shao D, Dai Y, Li N, Cao X, Zhao W, Cheng L, Rong Z, Huang L, Wang Y, Zhao J. Artificial intelligence in clinical research of cancers. Brief Bioinform 2021; 23:6470966. [PMID: 34929741 PMCID: PMC8769909 DOI: 10.1093/bib/bbab523] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/11/2021] [Revised: 11/06/2021] [Accepted: 11/13/2021] [Indexed: 12/16/2022]
Abstract
Several factors, including advances in computational algorithms, the availability of high-performance computing hardware, and the assembly of large community-based databases, have led to the extensive application of Artificial Intelligence (AI) in the biomedical domain for nearly 20 years. AI algorithms have attained expert-level performance in cancer research. However, only a few AI-based applications have been approved for use in the real world. Whether AI will eventually be capable of replacing medical experts has been a hot topic. In this article, we first summarize the cancer research status using AI in the past two decades, including the consensus on the procedure of AI based on an ideal paradigm and current efforts of the expertise and domain knowledge. Next, the available data of AI process in the biomedical domain are surveyed. Then, we review the methods and applications of AI in cancer clinical research categorized by the data types including radiographic imaging, cancer genome, medical records, drug information and biomedical literatures. At last, we discuss challenges in moving AI from theoretical research to real-world cancer research applications and the perspectives toward the future realization of AI participating cancer treatment.
Affiliation(s)
- Dan Shao
  - College of Computer Science and Technology, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Yinfei Dai
  - College of Computer Science and Technology, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Nianfeng Li
  - College of Computer Science and Technology, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Xuqing Cao
  - Department of Neurology, People's Hospital of Ningxia Hui Autonomous Region (The Affiliated People's Hospital of Ningxia Medical University and The First Affiliated Hospital of Northwest Minzu University), Yinchuan 750002, China
- Wei Zhao
  - Department of Biochemistry and Molecular Biology, Ningxia Medical University, Yinchuan 750002, China
- Li Cheng
  - Department of Electrical Diagnosis, Affiliated Hospital of Changchun University of Traditional Chinese Medicine, Changchun, 130021, China
- Zhuqing Rong
  - School of Science, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Lan Huang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Jing Zhao
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, 43210, USA
|
38
|
Scherer P, Trębacz M, Simidjievski N, Viñas R, Shams Z, Terre HA, Jamnik M, Liò P. Unsupervised construction of computational graphs for gene expression data with explicit structural inductive biases. Bioinformatics 2021; 38:1320-1327. [PMID: 34888618 PMCID: PMC8826027 DOI: 10.1093/bioinformatics/btab830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2021] [Revised: 09/29/2021] [Accepted: 12/03/2021] [Indexed: 01/05/2023]
Abstract
MOTIVATION: Gene expression data are commonly used at the intersection of cancer research and machine learning for better understanding of the molecular status of tumour tissue. Deep learning predictive models have been employed for gene expression data due to their ability to scale and remove the need for manual feature engineering. However, gene expression data are often very high dimensional, noisy and presented with a low number of samples. This poses significant problems for learning algorithms: models often overfit, learn noise and struggle to capture biologically relevant information. In this article, we utilize external biological knowledge embedded within structures of gene interaction graphs such as protein-protein interaction (PPI) networks to guide the construction of predictive models. RESULTS: We present Gene Interaction Network Constrained Construction (GINCCo), an unsupervised method for automated construction of computational graph models for gene expression data that are structurally constrained by prior knowledge of gene interaction networks. We employ this methodology in a case study on incorporating a PPI network in cancer phenotype prediction tasks. Our computational graphs are structurally constructed using topological clustering algorithms on the PPI networks which incorporate inductive biases stemming from network biology research on protein complex discovery. Each of the entities in the GINCCo computational graph represents biological entities such as genes, candidate protein complexes and phenotypes instead of arbitrary hidden nodes of a neural network. This provides a biologically relevant mechanism for model regularization yielding strong predictive performance while drastically reducing the number of model parameters and enabling guided post-hoc enrichment analyses of influential gene sets with respect to target phenotypes. Our experiments analysing a variety of cancer phenotypes show that GINCCo often outperforms support vector machine, Fully Connected Multi-layer Perceptrons (MLP) and Randomly Connected MLPs despite greatly reduced model complexity. AVAILABILITY AND IMPLEMENTATION: https://github.com/paulmorio/gincco contains the source code for our approach. We also release a library with algorithms for protein complex discovery within PPI networks at https://github.com/paulmorio/protclus. This repository contains implementations of the clustering algorithms used in this article. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Paul Scherer
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK. To whom correspondence should be addressed.
- Maja Trębacz
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Nikola Simidjievski
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Ramon Viñas
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Zohreh Shams
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Helena Andres Terre
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Mateja Jamnik
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Pietro Liò
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
|
39
|
Bao S, Li K, Yan C, Zhang Z, Qu J, Zhou M. Deep learning-based advances and applications for single-cell RNA-sequencing data analysis. Brief Bioinform 2021; 23:6444320. [PMID: 34849562 DOI: 10.1093/bib/bbab473] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/06/2021] [Revised: 09/24/2021] [Accepted: 10/15/2021] [Indexed: 11/14/2022]
Abstract
The rapid development of single-cell RNA-sequencing (scRNA-seq) technology has raised significant computational and analytical challenges. The application of deep learning to scRNA-seq data analysis is rapidly evolving and can overcome the unique challenges in upstream (quality control and normalization) and downstream (cell-, gene- and pathway-level) analysis of scRNA-seq data. In the present study, recent advances and applications of deep learning-based methods, together with specific tools for scRNA-seq data analysis, were summarized. Moreover, the future perspectives and challenges of deep-learning techniques regarding the appropriate analysis and interpretation of scRNA-seq data were investigated. The present study aimed to provide evidence supporting the biomedical application of deep learning-based tools and may aid biologists and bioinformaticians in navigating this exciting and fast-moving area.
Affiliation(s)
- Siqi Bao
  - School of Information and Communication Engineering, Hainan University, Haikou 570228, P. R. China
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
  - Hainan Institute of Real World Data, Haikou 570228, P. R. China
- Ke Li
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Congcong Yan
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Zicheng Zhang
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Jia Qu
  - School of Information and Communication Engineering, Hainan University, Haikou 570228, P. R. China
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
  - Hainan Institute of Real World Data, Haikou 570228, P. R. China
- Meng Zhou
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
|
40
|
Caudai C, Galizia A, Geraci F, Le Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J 2021; 19:5762-5790. [PMID: 34765093 PMCID: PMC8566780 DOI: 10.1016/j.csbj.2021.10.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/16/2021] [Revised: 10/05/2021] [Accepted: 10/05/2021] [Indexed: 12/13/2022]
Abstract
We review the current applications of artificial intelligence (AI) in functional genomics. The recent explosion of AI follows the remarkable achievements made possible by "deep learning", along with a burst of "big data" that can meet its hunger. Biology is about to overthrow astronomy as the paradigmatic representative of big data producer. This has been made possible by huge advancements in the field of high throughput technologies, applied to determine how the individual components of a biological system work together to accomplish different processes. The disciplines contributing to this bulk of data are collectively known as functional genomics. They consist in studies of: i) the information contained in the DNA (genomics); ii) the modifications that DNA can reversibly undergo (epigenomics); iii) the RNA transcripts originated by a genome (transcriptomics); iv) the ensemble of chemical modifications decorating different types of RNA transcripts (epitranscriptomics); v) the products of protein-coding transcripts (proteomics); and vi) the small molecules produced from cell metabolism (metabolomics) present in an organism or system at a given time, in physiological or pathological conditions. After reviewing main applications of AI in functional genomics, we discuss important accompanying issues, including ethical, legal and economic issues and the importance of explainability.
Affiliation(s)
- Claudia Caudai
  - CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Antonella Galizia
  - CNR, Institute of Applied Mathematics and Information Technologies (IMATI), Genoa, Italy
- Filippo Geraci
  - CNR, Institute for Informatics and Telematics (IIT), Pisa, Italy
- Loredana Le Pera
  - CNR, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Bari, Italy
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Veronica Morea
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Emanuele Salerno
  - CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Allegra Via
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Teresa Colombo
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
|
41
|
Shadbad MA, Asadzadeh Z, Hosseinkhani N, Derakhshani A, Alizadeh N, Brunetti O, Silvestris N, Baradaran B. A Systematic Review of the Tumor-Infiltrating CD8+ T-Cells/PD-L1 Axis in High-Grade Glial Tumors: Toward Personalized Immuno-Oncology. Front Immunol 2021; 12:734956. [PMID: 34603316 PMCID: PMC8486082 DOI: 10.3389/fimmu.2021.734956] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/01/2021] [Accepted: 09/02/2021] [Indexed: 12/11/2022]
Abstract
Based on preclinical findings, programmed death-ligand 1 (PD-L1) can substantially attenuate CD8+ T-cell-mediated anti-tumoral immune responses. However, clinical studies have reported controversial results regarding the significance of the tumor-infiltrating CD8+ T-cells/PD-L1 axis on the clinical picture and the response rate of patients with high-grade glial tumors to anti-cancer therapies. Herein, we conducted a systematic review according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) statements to clarify the clinical significance of the tumor-infiltrating CD8+ T-cells/PD-L1 axis and elucidate the impact of this axis on the response rate of affected patients to anti-cancer therapies. Indeed, a better understanding of the impact of this axis on the response rate of affected patients to anti-cancer therapies can provide valuable insights to address the futile response rate of immune checkpoint inhibitors in patients with high-grade glial tumors. For this purpose, we systematically searched Scopus, Web of Science, Embase, and PubMed to obtain peer-reviewed studies published before 1 January 2021. We have observed that PD-L1 overexpression can be associated with the inferior prognosis of glioblastoma patients who have not been exposed to chemo-radiotherapy. Besides, exposure to anti-cancer therapies, e.g., chemo-radiotherapy, can up-regulate inhibitory immune checkpoint molecules in tumor-infiltrating CD8+ T-cells. Therefore, unlike unexposed patients, increased tumor-infiltrating CD8+ T-cells in anti-cancer therapy-exposed tumoral tissues can be associated with the inferior prognosis of affected patients. Because various inhibitory immune checkpoints can regulate anti-tumoral immune responses, the single-cell sequencing of the cells residing in the tumor microenvironment can provide valuable insights into the expression patterns of inhibitory immune checkpoints in the tumor microenvironment. Thus, administering immune checkpoint inhibitors based on the data from the single-cell sequencing of these cells can increase patients’ response rates, decrease the risk of immune-related adverse events development, prevent immune-resistance development, and reduce the risk of tumor recurrence.
Affiliation(s)
- Mahdi Abdoli Shadbad
  - Research Center for Evidence-Based Medicine, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
  - Immunology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
  - Student Research Committee, Tabriz University of Medical Sciences, Tabriz, Iran
- Zahra Asadzadeh
  - Immunology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Negar Hosseinkhani
  - Immunology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Afshin Derakhshani
  - Laboratory of Experimental Pharmacology, Istituto Di Ricovero e Cura a Carattere Scientifico (IRCCS) Istituto Tumori Giovanni Paolo II, Bari, Italy
- Nazila Alizadeh
  - Immunology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Oronzo Brunetti
  - Medical Oncology Unit, IRCCS Istituto Tumori Giovanni Paolo II, Bari, Italy
- Nicola Silvestris
  - Medical Oncology Unit, IRCCS Istituto Tumori Giovanni Paolo II, Bari, Italy
  - Department of Biomedical Sciences and Human Oncology, University of Bari "Aldo Moro", Bari, Italy
- Behzad Baradaran
  - Immunology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
  - Department of Immunology, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
|
42
|
Weidemüller P, Kholmatov M, Petsalaki E, Zaugg JB. Transcription factors: Bridge between cell signaling and gene regulation. Proteomics 2021; 21:e2000034. [PMID: 34314098 DOI: 10.1002/pmic.202000034] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 03/19/2021] [Revised: 07/05/2021] [Accepted: 07/16/2021] [Indexed: 01/17/2023]
Abstract
Transcription factors (TFs) are key regulators of intrinsic cellular processes, such as differentiation and development, and of the cellular response to external perturbation through signaling pathways. In this review we focus on the role of TFs as a link between signaling pathways and gene regulation. Cell signaling tends to result in the modulation of a set of TFs that then lead to changes in the cell's transcriptional program. We highlight the molecular layers at which TF activity can be measured and the associated technical and conceptual challenges. These layers include post-translational modifications (PTMs) of the TF, regulation of TF binding to DNA through chromatin accessibility and epigenetics, and expression of target genes. We highlight that a large number of TFs are understudied in both signaling and gene regulation studies, and that our knowledge about known TF targets has a strong literature bias. We argue that TFs serve as a perfect bridge between the fields of gene regulation and signaling, and that separating these fields hinders our understanding of cell functions. Multi-omics approaches that measure multiple dimensions of TF activity are ideally suited to study the interplay of cell signaling and gene regulation using TFs as the anchor to link the two fields.
Affiliation(s)
- Paula Weidemüller
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Maksim Kholmatov
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstraße 1, Heidelberg, 69117, Germany
- Evangelia Petsalaki
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Judith B Zaugg
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstraße 1, Heidelberg, 69117, Germany
|
43
|
The Trifecta of Single-Cell, Systems-Biology, and Machine-Learning Approaches. Genes (Basel) 2021; 12:1098. [PMID: 34356114 PMCID: PMC8306972 DOI: 10.3390/genes12071098] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/21/2021] [Revised: 07/13/2021] [Accepted: 07/18/2021] [Indexed: 12/18/2022]
Abstract
Together, single-cell technologies and systems biology have been used to investigate previously unanswerable questions in biomedicine with unparalleled detail. Despite these advances, gaps in analytical capacity remain. Machine learning, which has revolutionized biomedical imaging analysis, drug discovery, and systems biology, is an ideal strategy to fill these gaps in single-cell studies. Machine learning additionally has proven to be remarkably synergistic with single-cell data because it remedies unique challenges while capitalizing on the positive aspects of single-cell data. In this review, we describe how systems-biology algorithms have layered machine learning with biological components to provide systems level analyses of single-cell omics data, thus elucidating complex biological mechanisms. Accordingly, we highlight the trifecta of single-cell, systems-biology, and machine-learning approaches and illustrate how this trifecta can significantly contribute to five key areas of scientific research: cell trajectory and identity, individualized medicine, pharmacology, spatial omics, and multi-omics. Given its success to date, the systems-biology, single-cell omics, and machine-learning trifecta has proven to be a potent combination that will further advance biomedical research.
|
44
|
Wang X, Dong Y, Zheng Y, Chen Y. Multiomics metabolic and epigenetics regulatory network in cancer: A systems biology perspective. J Genet Genomics 2021; 48:520-530. [PMID: 34362682 DOI: 10.1016/j.jgg.2021.05.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/18/2021] [Revised: 05/07/2021] [Accepted: 05/11/2021] [Indexed: 12/21/2022]
Abstract
Genetic, epigenetic, and metabolic alterations are all hallmarks of cancer. However, the epigenome and metabolome are both highly complex and dynamic biological networks in vivo. The interplay between the epigenome and metabolome contributes to a biological system that is responsive to the tumor microenvironment and possesses a wealth of unknown biomarkers and targets of cancer therapy. From this perspective, we first review the state of high-throughput biological data acquisition (i.e. multiomics data) and analysis (i.e. computational tools) and then propose a conceptual in silico metabolic and epigenetic regulatory network (MER-Net) that is based on these current high-throughput methods. The conceptual MER-Net is aimed at linking metabolomic and epigenomic networks through observation of biological processes, omics data acquisition, analysis of network information, and integration with validated database knowledge. Thus, MER-Net could be used to reveal new potential biomarkers and therapeutic targets using deep learning models to integrate and analyze large multiomics networks. We propose that MER-Net can serve as a tool to guide integrated metabolomics and epigenomics research or can be modified to answer other complex biological and clinical questions using multiomics data.
Affiliation(s)
- Xuezhu Wang
  - The State Key Laboratory of Medical Molecular Biology, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
- Yucheng Dong
  - The State Key Laboratory of Medical Molecular Biology, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
- Yongchang Zheng
  - Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
- Yang Chen
  - The State Key Laboratory of Medical Molecular Biology, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
|
45
|
Picard M, Scott-Boyer MP, Bodein A, Périn O, Droit A. Integration strategies of multi-omics data for machine learning analysis. Comput Struct Biotechnol J 2021; 19:3735-3746. [PMID: 34285775 PMCID: PMC8258788 DOI: 10.1016/j.csbj.2021.06.030] [Citation(s) in RCA: 147] [Impact Index Per Article: 49.0] [Received: 03/30/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 12/25/2022] Open
Abstract
Increased availability of high-throughput technologies has generated an ever-growing number of omics data that seek to portray many different but complementary biological layers including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insights from these data have been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers obtained to date, however, only include one omic measurement at a time and thus do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics data integration strategies are needed to combine the complementary knowledge brought by each omics layer. We have summarized the most recent data integration methods/frameworks into five different integration strategies: early, mixed, intermediate, late and hierarchical. In this mini-review, we focus on challenges and existing multi-omics integration strategies by paying special attention to machine learning applications.
Affiliation(s)
- Milan Picard
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Périn
  - Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Arnaud Droit
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
  - Corresponding author.
|
46
|
Schaffer LV, Ideker T. Mapping the multiscale structure of biological systems. Cell Syst 2021; 12:622-635. [PMID: 34139169 PMCID: PMC8245186 DOI: 10.1016/j.cels.2021.05.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/05/2021] [Revised: 05/04/2021] [Accepted: 05/14/2021] [Indexed: 01/14/2023]
Abstract
Biological systems are by nature multiscale, consisting of subsystems that factor into progressively smaller units in a deeply hierarchical structure. At any level of the hierarchy, an ever-increasing diversity of technologies can be applied to characterize the corresponding biological units and their relations, resulting in large networks of physical or functional proximities (e.g., proximities of amino acids within a protein, of proteins within a complex, or of cell types within a tissue). Here, we review general concepts and progress in using network proximity measures as a basis for creation of multiscale hierarchical maps of biological systems. We discuss the functionalization of these maps to create predictive models, including those useful in translation of genotype to phenotype, along with strategies for model visualization and challenges faced by multiscale modeling in the near future. Collectively, these approaches enable a unified hierarchical approach to biological data, with application from the molecular to the macroscopic.
Affiliation(s)
- Leah V Schaffer
  - Division of Genetics, Department of Medicine, University of California San Diego, San Diego, La Jolla, CA 92093, USA
- Trey Ideker
  - Division of Genetics, Department of Medicine, University of California San Diego, San Diego, La Jolla, CA 92093, USA
|
47
|
MUW researcher of the month. Wien Klin Wochenschr 2021; 133:630-631. [PMID: 34115228 DOI: 10.1007/s00508-021-01905-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
48
|
Ji Y, Lotfollahi M, Wolf FA, Theis FJ. Machine learning for perturbational single-cell omics. Cell Syst 2021; 12:522-537. [PMID: 34139164 DOI: 10.1016/j.cels.2021.05.016] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 01/13/2021] [Revised: 05/04/2021] [Accepted: 05/19/2021] [Indexed: 12/18/2022]
Abstract
Cell biology is fundamentally limited in its ability to collect complete data on cellular phenotypes and the wide range of responses to perturbation. Areas such as computer vision and speech recognition have addressed this problem of characterizing unseen or unlabeled conditions with the combined advances of big data, deep learning, and computing resources in the past 5 years. Similarly, recent advances in machine learning approaches enabled by single-cell data start to address prediction tasks in perturbation response modeling. We first define objectives in learning perturbation response in single-cell omics; survey existing approaches, resources, and datasets (https://github.com/theislab/sc-pert); and discuss how a perturbation atlas can enable deep learning models to construct an informative perturbation latent space. We then examine future avenues toward more powerful and explainable modeling using deep neural networks, which enable the integration of disparate information sources and an understanding of heterogeneous, complex, and unseen systems.
Affiliation(s)
- Yuge Ji
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany; Department of Mathematics, Technical University of Munich, Munich, Germany
- Mohammad Lotfollahi
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany; TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- F Alexander Wolf
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany; Cellarity, Cambridge, MA, USA
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany; Department of Mathematics, Technical University of Munich, Munich, Germany; Cellarity, Cambridge, MA, USA
|
49
|
Matthews ML, Marshall-Colón A. Multiscale plant modeling: from genome to phenome and beyond. Emerg Top Life Sci 2021; 5:231-237. [PMID: 33543231 PMCID: PMC8166335 DOI: 10.1042/etls20200276] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/25/2020] [Revised: 01/18/2021] [Accepted: 01/20/2021] [Indexed: 01/08/2023]
Abstract
Plants are complex organisms that adapt to changes in their environment using an array of regulatory mechanisms that span multiple levels of biological organization. Due to this complexity, it is difficult to predict emergent properties using conventional approaches that focus on single levels of biology such as the genome, transcriptome, or metabolome. Mathematical models of biological systems have emerged as useful tools for exploring pathways and identifying gaps in our current knowledge of biological processes. Identification of emergent properties, however, requires their vertical integration across biological scales through multiscale modeling. Multiscale models that capture and predict these emergent properties will allow us to predict how plants will respond to a changing climate and explore strategies for plant engineering. In this review, we (1) summarize the recent developments in plant multiscale modeling; (2) examine multiscale models of microbial systems that offer insight into potential future directions for the modeling of plant systems; (3) discuss computational tools and resources for developing multiscale models; and (4) examine future directions of the field.
Affiliation(s)
- Megan L Matthews
  - Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA
  - Institute for Sustainability, Energy, and Environment, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA
- Amy Marshall-Colón
  - Institute for Sustainability, Energy, and Environment, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA
  - Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA
|