1
Li H, Han Z, Sun Y, Wang F, Hu P, Gao Y, Bai X, Peng S, Ren C, Xu X, Liu Z, Chen H, Yang Y, Bo X. CGMega: explainable graph neural network framework with attention mechanisms for cancer gene module dissection. Nat Commun 2024; 15:5997. [PMID: 39013885 PMCID: PMC11252405 DOI: 10.1038/s41467-024-50426-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 07/09/2024] [Indexed: 07/18/2024]
Abstract
Cancer is rarely the straightforward consequence of an abnormality in a single gene, but rather reflects a complex interplay of many genes, represented as gene modules. Here, we leverage recent advances in model-agnostic interpretation approaches and develop CGMega, an explainable, graph attention-based deep learning framework to perform cancer gene module dissection. CGMega outperforms current approaches in cancer gene prediction, and it provides a promising approach to integrate multi-omics information. We apply CGMega to a breast cancer cell line and to acute myeloid leukemia (AML) patients, and we uncover the high-order gene module formed by the ErbB family and tumor factors NRG1, PPM1A and DLG2. We identify 396 candidate AML genes, and observe the enrichment of either known AML genes or candidate AML genes in a single gene module. We also identify patient-specific AML genes and associated gene modules. Together, these results indicate that CGMega can be used to dissect cancer gene modules, and provide high-order mechanistic insights into cancer development and heterogeneity.
Affiliation(s)
- Hao Li
  - Academy of Military Medical Sciences, Beijing, China
- Zebei Han
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Yu Sun
  - Academy of Military Medical Sciences, Beijing, China
- Fu Wang
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Pengzhen Hu
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Yuang Gao
  - Department of Hematology, PLA General Hospital, the Fifth Medical Center, Beijing, China
- Xuemei Bai
  - Academy of Military Medical Sciences, Beijing, China
- Shiyu Peng
  - Academy of Military Medical Sciences, Beijing, China
- Chao Ren
  - Academy of Military Medical Sciences, Beijing, China
- Xiang Xu
  - Academy of Military Medical Sciences, Beijing, China
- Zeyu Liu
  - Academy of Military Medical Sciences, Beijing, China
- Hebing Chen
  - Academy of Military Medical Sciences, Beijing, China
- Yang Yang
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Xiaochen Bo
  - Academy of Military Medical Sciences, Beijing, China
2
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure-primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. Genome Biol 2024; 25:24. [PMID: 38238840 PMCID: PMC10797903 DOI: 10.1186/s13059-023-03134-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 11/30/2023] [Indexed: 01/22/2024]
Abstract
BACKGROUND: Modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of genome-wide transcription factor activity (TFA), making it difficult to separate covariance and regulatory interactions. Inference of regulatory interactions and TFA requires aggregation of complementary evidence. Estimating TFA explicitly is problematic as it disconnects GRN inference and TFA estimation and is unable to account for, for example, contextual transcription factor-transcription factor interactions and other higher-order features. Deep learning offers a potential solution, as it can model complex interactions and higher-order latent features, although it does not provide interpretable models and latent features. RESULTS: We propose a novel autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), for modeling, and a metric, explained relative variance (ERV), for interpretation of GRNs. We evaluate SupirFactor with ERV in a wide set of contexts. Compared to current state-of-the-art GRN inference methods, SupirFactor performs favorably. We evaluate latent feature activity as an estimate of TFA and biological function in S. cerevisiae as well as in peripheral blood mononuclear cells (PBMC). CONCLUSION: Here we present a framework for structure-primed inference and interpretation of GRNs, SupirFactor, demonstrating interpretability using ERV in multiple biological and experimental settings. SupirFactor enables TFA estimation and pathway analysis using latent factor activity, demonstrated here on two large-scale single-cell datasets, modeling S. cerevisiae and PBMC. We find that the SupirFactor model facilitates biological analysis, yielding novel functional and regulatory insight.
Affiliation(s)
- Andreas Tjärnberg
  - Center for Developmental Genetics, New York University, New York, NY, 10003, USA
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10010, USA
  - Department of Neuro-Science, University of Wisconsin-Madison - Waisman Center, Madison, USA
- Maggie Beheler-Amass
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Christopher A Jackson
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Lionel A Christiaen
  - Center for Developmental Genetics, New York University, New York, NY, 10003, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
  - Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
- David Gresham
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Richard Bonneau
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, 10010, USA
  - Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY, 10003, USA
  - Center For Data Science, NYU, New York, NY, 10008, USA
  - Prescient Design, a Genentech accelerator, New York, NY, 10010, USA
3
Li Q, Yu Y, Kossinna P, Lun T, Liao W, Zhang Q. XA4C: eXplainable representation learning via Autoencoders revealing Critical genes. PLoS Comput Biol 2023; 19:e1011476. [PMID: 37782668 PMCID: PMC10569512 DOI: 10.1371/journal.pcbi.1011476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 10/12/2023] [Accepted: 08/29/2023] [Indexed: 10/04/2023]
Abstract
Machine Learning models have been frequently used in transcriptome analyses. In particular, Representation Learning (RL) models, e.g., autoencoders, are effective in learning critical representations in noisy data. However, learned representations, e.g., the "latent variables" in an autoencoder, are difficult to interpret, not to mention prioritizing essential genes for functional follow-up. In contrast, in traditional analyses, one may identify important genes such as Differentially Expressed (DiffEx), Differentially Co-Expressed (DiffCoEx), and Hub genes. Intuitively, the complex gene-gene interactions may be beyond the capture of marginal effects (DiffEx) or correlations (DiffCoEx and Hub), indicating the need for powerful RL models. However, the lack of interpretability and of individual target genes is an obstacle to RL's broad use in practice. To facilitate interpretable analysis and gene identification using RL, we propose "Critical genes", defined as genes that contribute highly to learned representations (e.g., latent variables in an autoencoder). As a proof-of-concept, supported by eXplainable Artificial Intelligence (XAI), we implemented eXplainable Autoencoder for Critical genes (XA4C), which quantifies each gene's contribution to latent variables, based on which Critical genes are prioritized. Applying XA4C to gene expression data in six cancers showed that Critical genes capture essential pathways underlying cancers. Remarkably, Critical genes have little overlap with Hub or DiffEx genes but show a higher enrichment in a comprehensive disease gene database (DisGeNET) and a cancer-specific database (COSMIC), evidencing their potential to disclose massive unknown biology. As an example, we discovered five Critical genes sitting in the center of the Lysine degradation (hsa00310) pathway, displaying distinct interaction patterns in tumor and normal tissues. In conclusion, XA4C facilitates explainable analysis using RL, and Critical genes discovered by explainable RL empower the study of complex interactions.
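The idea of scoring each gene by its contribution to latent variables can be illustrated with a minimal sketch. The abstract does not say which attribution method XA4C uses, so the code below substitutes a simple weight-times-input attribution on a hypothetical linear encoder; the function names, the toy weights, and the gene labels are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List


def gene_contributions(
    expression: List[List[float]],       # samples x genes (hypothetical data)
    encoder_weights: List[List[float]],  # latent variables x genes (hypothetical)
) -> List[List[float]]:
    """Mean absolute weight-times-input attribution for each (latent, gene) pair.

    This is a stand-in attribution rule, not the method used by XA4C.
    """
    n_samples = len(expression)
    scores = []
    for weights in encoder_weights:
        row = []
        for g, w in enumerate(weights):
            total = sum(abs(w * sample[g]) for sample in expression)
            row.append(total / n_samples)
        scores.append(row)
    return scores


def rank_critical_genes(
    scores: List[List[float]], gene_names: List[str], top_k: int
) -> List[str]:
    """Rank genes by attribution summed over all latent variables."""
    totals = {
        name: sum(row[g] for row in scores) for g, name in enumerate(gene_names)
    }
    return sorted(totals, key=totals.get, reverse=True)[:top_k]


# Toy example: two samples, three genes, two latent variables.
toy_expression = [[1.0, 2.0, 0.5], [3.0, 0.0, 0.5]]
toy_weights = [[0.5, -1.0, 0.0], [0.0, 2.0, 0.1]]
toy_genes = ["GENE_A", "GENE_B", "GENE_C"]
toy_scores = gene_contributions(toy_expression, toy_weights)
top_genes = rank_critical_genes(toy_scores, toy_genes, top_k=2)
```

In this toy setting the highest-ranked genes are those whose expression, scaled by the encoder weights, moves the latent variables most; a nonlinear autoencoder would need a model-aware attribution method instead.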
Affiliation(s)
- Qing Li
  - Department of Biochemistry & Molecular Biology, University of Calgary, Calgary, Canada
- Yang Yu
  - Department of Mathematics and Statistics, University of Calgary, Calgary, Canada
- Pathum Kossinna
  - Department of Biochemistry & Molecular Biology, University of Calgary, Calgary, Canada
- Theodore Lun
  - Department of Biochemistry & Molecular Biology, University of Calgary, Calgary, Canada
- Wenyuan Liao
  - Department of Mathematics and Statistics, University of Calgary, Calgary, Canada
- Qingrun Zhang
  - Department of Biochemistry & Molecular Biology, University of Calgary, Calgary, Canada
  - Department of Mathematics and Statistics, University of Calgary, Calgary, Canada
  - Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, Canada
  - Arnie Charbonneau Cancer Institute, University of Calgary, Calgary, Canada
4
Martínez-Enguita D, Dwivedi SK, Jörnsten R, Gustafsson M. NCAE: data-driven representations using a deep network-coherent DNA methylation autoencoder identify robust disease and risk factor signatures. Brief Bioinform 2023; 24:bbad293. [PMID: 37587790 PMCID: PMC10516364 DOI: 10.1093/bib/bbad293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 07/25/2023] [Accepted: 07/29/2023] [Indexed: 08/18/2023]
Abstract
Precision medicine relies on the identification of robust disease and risk factor signatures from omics data. However, current knowledge-driven approaches may overlook novel or unexpected phenomena due to the inherent biases in biological knowledge. In this study, we present a data-driven signature discovery workflow for DNA methylation analysis utilizing network-coherent autoencoders (NCAEs) with biologically relevant latent embeddings. First, we explored the architecture space of autoencoders trained on a large-scale pan-tissue compendium (n = 75 272) of human epigenome-wide association studies. We observed the emergence of co-localized patterns in the deep autoencoder latent space representations that corresponded to biological network modules. We determined the NCAE configuration with the strongest co-localization and centrality signals in the human protein interactome. Leveraging the NCAE embeddings, we then trained interpretable deep neural networks for risk factor (aging, smoking) and disease (systemic lupus erythematosus) prediction and classification tasks. Remarkably, our NCAE embedding-based models outperformed existing predictors, revealing novel DNA methylation signatures enriched in gene sets and pathways associated with the studied condition in each case. Our data-driven biomarker discovery workflow provides a generally applicable pipeline to capture relevant risk factor and disease information. By surpassing the limitations of knowledge-driven methods, our approach enhances the understanding of complex epigenetic processes, facilitating the development of more effective diagnostic and therapeutic strategies.
Affiliation(s)
- David Martínez-Enguita
  - Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Sweden
- Sanjiv K Dwivedi
  - Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Sweden
- Rebecka Jörnsten
  - Department of Mathematical Sciences, Chalmers University of Technology, Sweden
- Mika Gustafsson
  - Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Sweden
5
Abstract
Following the widespread use of deep learning for genomics, deep generative modeling is also becoming a viable methodology for the broad field. Deep generative models (DGMs) can learn the complex structure of genomic data and allow researchers to generate novel genomic instances that retain the real characteristics of the original dataset. Aside from data generation, DGMs can also be used for dimensionality reduction by mapping the data space to a latent space, as well as for prediction tasks via exploitation of this learned mapping or supervised/semi-supervised DGM designs. In this review, we briefly introduce generative modeling and two currently prevailing architectures, we present conceptual applications along with notable examples in functional and evolutionary genomics, and we provide our perspective on potential challenges and future directions.
Affiliation(s)
- Burak Yelmen
  - Laboratoire Interdisciplinaire des Sciences du Numérique, CNRS UMR 9015, INRIA, Université Paris-Saclay, Orsay, France
  - Institute of Genomics, University of Tartu, Tartu, Estonia
- Flora Jay
  - Laboratoire Interdisciplinaire des Sciences du Numérique, CNRS UMR 9015, INRIA, Université Paris-Saclay, Orsay, France
6
Simonovsky E, Sharon M, Ziv M, Mauer O, Hekselman I, Jubran J, Vinogradov E, Argov CM, Basha O, Kerber L, Yogev Y, Segrè AV, Im HK, Birk O, Rokach L, Yeger‐Lotem E. Predicting molecular mechanisms of hereditary diseases by using their tissue-selective manifestation. Mol Syst Biol 2023; 19:e11407. [PMID: 37232043 PMCID: PMC10407743 DOI: 10.15252/msb.202211407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 04/30/2023] [Accepted: 05/10/2023] [Indexed: 05/27/2023]
Abstract
How do aberrations in widely expressed genes lead to tissue-selective hereditary diseases? Previous attempts to answer this question were limited to testing a few candidate mechanisms. To answer this question at a larger scale, we developed "Tissue Risk Assessment of Causality by Expression" (TRACE), a machine learning approach to predict genes that underlie tissue-selective diseases and selectivity-related features. TRACE utilized 4,744 biologically interpretable tissue-specific gene features that were inferred from heterogeneous omics datasets. Application of TRACE to 1,031 disease genes uncovered known and novel selectivity-related features, the most common of which was previously overlooked. Next, we created a catalog of tissue-associated risks for 18,927 protein-coding genes (https://netbio.bgu.ac.il/trace/). As proof-of-concept, we prioritized candidate disease genes identified in 48 rare-disease patients. TRACE ranked the verified disease gene among the patient's candidate genes significantly better than gene prioritization methods that rank by gene constraint or tissue expression. Thus, tissue selectivity combined with machine learning enhances genetic and clinical understanding of hereditary diseases.
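The evaluation described above, where the verified disease gene is ranked among each patient's candidate genes, can be sketched as follows. This is only an illustration under stated assumptions: the score dictionaries, gene labels, and function names are hypothetical, and the scores are not TRACE outputs.

```python
from statistics import median
from typing import Dict, List, Tuple


def rank_of_verified_gene(candidate_scores: Dict[str, float], verified_gene: str) -> int:
    """1-based rank of the verified gene, with higher scores ranked first.

    The rank is one plus the number of candidates scoring strictly higher,
    so tied genes share the best rank among them.
    """
    target = candidate_scores[verified_gene]
    return 1 + sum(1 for score in candidate_scores.values() if score > target)


def median_rank(patients: List[Tuple[Dict[str, float], str]]) -> float:
    """Median rank of the verified gene across patients."""
    return median(rank_of_verified_gene(scores, gene) for scores, gene in patients)


# Hypothetical patients: (candidate gene scores, verified disease gene).
toy_patients = [
    ({"GENE_A": 0.2, "GENE_B": 0.9, "GENE_C": 0.5}, "GENE_C"),
    ({"GENE_D": 0.7, "GENE_E": 0.1}, "GENE_D"),
    ({"GENE_F": 0.3, "GENE_G": 0.4, "GENE_H": 0.8, "GENE_I": 0.6}, "GENE_F"),
]
toy_median = median_rank(toy_patients)
```

Comparing prioritization methods then amounts to computing these per-patient ranks under each method's scores and testing whether one method places the verified gene consistently higher.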
Affiliation(s)
- Eyal Simonovsky
  - Department of Clinical Biochemistry and Pharmacology, Ben‐Gurion University of the Negev, Beer Sheva, Israel
- Moran Sharon
  - Department of Clinical Biochemistry and Pharmacology, Ben‐Gurion University of the Negev, Beer Sheva, Israel
- Maya Ziv
  - Department of Clinical Biochemistry and Pharmacology, Ben‐Gurion University of the Negev, Beer Sheva, Israel
- Omry Mauer
  - Department of Clinical Biochemistry and Pharmacology, Ben‐Gurion University of the Negev, Beer Sheva, Israel
- Idan Hekselman
  - Department of Clinical Biochemistry and Pharmacology, Ben‐Gurion University of the Negev, Beer Sheva, Israel
- Juman Jubran
  - Department of Clinical Biochemistry and Pharmacology, Ben‐Gurion University of the Negev, Beer Sheva, Israel
- Ekaterina Vinogradov
  - Department of Clinical Biochemistry and Pharmacology, Ben‐Gurion University of the Negev, Beer Sheva, Israel
- Chanan M Argov
  - Department of Clinical Biochemistry and Pharmacology, Ben‐Gurion University of the Negev, Beer Sheva, Israel
- Omer Basha
  - Department of Clinical Biochemistry and Pharmacology, Ben‐Gurion University of the Negev, Beer Sheva, Israel
- Lior Kerber
  - Department of Clinical Biochemistry and Pharmacology, Ben‐Gurion University of the Negev, Beer Sheva, Israel
- Yuval Yogev
  - Morris Kahn Laboratory of Human Genetics and the Genetics Institute at Soroka Medical Center, Faculty of Health Sciences, Ben Gurion University of the Negev, Beer Sheva, Israel
- Ayellet V Segrè
  - Ocular Genomics Institute, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA, USA
  - The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Hae Kyung Im
  - Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
- Ohad Birk
  - Morris Kahn Laboratory of Human Genetics and the Genetics Institute at Soroka Medical Center, Faculty of Health Sciences, Ben Gurion University of the Negev, Beer Sheva, Israel
  - The National Institute for Biotechnology in the Negev, Ben‐Gurion University of the Negev, Beer Sheva, Israel
- Lior Rokach
  - Department of Software & Information Systems Engineering, Ben‐Gurion University of the Negev, Beer Sheva, Israel
- Esti Yeger‐Lotem
  - Department of Clinical Biochemistry and Pharmacology, Ben‐Gurion University of the Negev, Beer Sheva, Israel
  - The National Institute for Biotechnology in the Negev, Ben‐Gurion University of the Negev, Beer Sheva, Israel
7
Banerjee J, Taroni JN, Allaway RJ, Prasad DV, Guinney J, Greene C. Machine learning in rare disease. Nat Methods 2023:10.1038/s41592-023-01886-z. [PMID: 37248386 DOI: 10.1038/s41592-023-01886-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/16/2021] [Accepted: 04/22/2023] [Indexed: 05/31/2023]
Abstract
High-throughput profiling methods (such as genomics or imaging) have accelerated basic research and made deep molecular characterization of patient samples routine. These approaches provide a rich portrait of genes, molecular pathways and cell types involved in disease phenotypes. Machine learning (ML) can be a useful tool for extracting disease-relevant patterns from high-dimensional datasets. However, depending upon the complexity of the biological question, machine learning often requires many samples to identify recurrent and biologically meaningful patterns. Rare diseases are inherently limited in clinical cases, leading to few samples to study. In this Perspective, we outline the challenges and emerging solutions for using ML for small sample sets, specifically in rare diseases. Advances in ML methods for rare diseases are likely to be informative for applications beyond rare diseases for which few samples exist with high-dimensional data. We propose that the method community prioritize the development of ML techniques for rare disease research.
Affiliation(s)
- Jaclyn N Taroni
  - Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Philadelphia, PA, USA
- Casey Greene
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
8
Choi Y, Li R, Quon G. siVAE: interpretable deep generative models for single-cell transcriptomes. Genome Biol 2023; 24:29. [PMID: 36803416 PMCID: PMC9940350 DOI: 10.1186/s13059-023-02850-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/08/2022] [Accepted: 01/06/2023] [Indexed: 02/22/2023]
Abstract
Neural networks such as variational autoencoders (VAE) perform dimensionality reduction for the visualization and analysis of genomic data, but are limited in their interpretability: it is unknown which data features are represented by each embedding dimension. We present siVAE, a VAE that is interpretable by design, thereby enhancing downstream analysis tasks. Through interpretation, siVAE also identifies gene modules and hubs without explicit gene network inference. We use siVAE to identify gene modules whose connectivity is associated with diverse phenotypes such as iPSC neuronal differentiation efficiency and dementia, showcasing the wide applicability of interpretable generative models for genomic data analysis.
Affiliation(s)
- Yongin Choi
  - Graduate Group in Biomedical Engineering, University of California, Davis, Davis, CA, USA
  - Genome Center, University of California, Davis, Davis, CA, USA
- Ruoxin Li
  - Genome Center, University of California, Davis, Davis, CA, USA
  - Graduate Group in Biostatistics, University of California, Davis, Davis, CA, USA
- Gerald Quon
  - Graduate Group in Biomedical Engineering, University of California, Davis, Davis, CA, USA
  - Genome Center, University of California, Davis, Davis, CA, USA
  - Department of Molecular and Cellular Biology, University of California, Davis, Davis, CA, USA
9
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. bioRxiv 2023:2023.02.02.526909. [PMID: 36778259 PMCID: PMC9915715 DOI: 10.1101/2023.02.02.526909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]
Abstract
The modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of regulatory features in genome-wide screens. Most GRN inference methods are therefore forced to model relationships between regulatory genes and their targets with expression as a proxy for the upstream independent features, complicating validation and predictions produced by modeling frameworks. Separating covariance and regulatory influence requires aggregation of independent and complementary sets of evidence, such as transcription factor (TF) binding and target gene expression. However, the complete regulatory state of the system, e.g. TF activity (TFA) is unknown due to a lack of experimental feasibility, making regulatory relations difficult to infer. Some methods attempt to account for this by modeling TFA as a latent feature, but these models often use linear frameworks that are unable to account for non-linearities such as saturation, TF-TF interactions, and other higher order features. Deep learning frameworks may offer a solution, as they are capable of modeling complex interactions and capturing higher-order latent features. However, these methods often discard central concepts in biological systems modeling, such as sparsity and latent feature interpretability, in favor of increased model complexity. We propose a novel deep learning autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), that scales to single cell genomic data and maintains interpretability to perform GRN inference and estimate TFA as a latent feature. We demonstrate that SupirFactor outperforms current leading GRN inference methods, predicts biologically relevant TFA and elucidates functional regulatory pathways through aggregation of TFs.
Affiliation(s)
- Andreas Tjärnberg
  - Center for Developmental Genetics, New York University, New York, NY 10003, USA
  - Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
  - Department of Biology, NYU, New York, NY 10008, USA
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10010, USA
- Maggie Beheler-Amass
  - Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
  - Department of Biology, NYU, New York, NY 10008, USA
- Christopher A Jackson
  - Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
  - Department of Biology, NYU, New York, NY 10008, USA
- Lionel A Christiaen
  - Center for Developmental Genetics, New York University, New York, NY 10003, USA
  - Department of Biology, NYU, New York, NY 10008, USA
  - Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
  - Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
- David Gresham
  - Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
  - Department of Biology, NYU, New York, NY 10008, USA
- Richard Bonneau
  - Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
  - Department of Biology, NYU, New York, NY 10008, USA
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
  - Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY 10003, USA
  - Center For Data Science, NYU, New York, NY 10008, USA
  - Prescient Design, a Genentech accelerator, New York, NY 10010, USA
10
van der Velden J, Asselbergs FW, Bakkers J, Batkai S, Bertrand L, Bezzina CR, Bot I, Brundel BJJM, Carrier L, Chamuleau S, Ciccarelli M, Dawson D, Davidson SM, Dendorfer A, Duncker DJ, Eschenhagen T, Fabritz L, Falcão-Pires I, Ferdinandy P, Giacca M, Girao H, Gollmann-Tepeköylü C, Gyongyosi M, Guzik TJ, Hamdani N, Heymans S, Hilfiker A, Hilfiker-Kleiner D, Hoekstra AG, Hulot JS, Kuster DWD, van Laake LW, Lecour S, Leiner T, Linke WA, Lumens J, Lutgens E, Madonna R, Maegdefessel L, Mayr M, van der Meer P, Passier R, Perbellini F, Perrino C, Pesce M, Priori S, Remme CA, Rosenhahn B, Schotten U, Schulz R, Sipido KR, Sluijter JPG, van Steenbeek F, Steffens S, Terracciano CM, Tocchetti CG, Vlasman P, Yeung KK, Zacchigna S, Zwaagman D, Thum T. Animal models and animal-free innovations for cardiovascular research: current status and routes to be explored. Consensus document of the ESC Working Group on Myocardial Function and the ESC Working Group on Cellular Biology of the Heart. Cardiovasc Res 2022; 118:3016-3051. [PMID: 34999816 PMCID: PMC9732557 DOI: 10.1093/cvr/cvab370] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 02/07/2021] [Accepted: 01/05/2022] [Indexed: 01/09/2023]
Abstract
Cardiovascular diseases represent a major cause of morbidity and mortality, necessitating research to improve diagnostics, and to discover and test novel preventive and curative therapies, all of which warrant experimental models that recapitulate human disease. The translation of basic science results to clinical practice is a challenging task, in particular for complex conditions such as cardiovascular diseases, which often result from multiple risk factors and comorbidities. This difficulty might lead some individuals to question the value of animal research, citing the translational 'valley of death', which largely reflects the fact that studies in rodents are difficult to translate to humans. This is also influenced by the fact that new, human-derived in vitro models can recapitulate aspects of disease processes. However, it would be a mistake to think that animal models do not represent a vital step in the translational pathway as they do provide important pathophysiological insights into disease mechanisms particularly on an organ and systemic level. While stem cell-derived human models have the potential to become key in testing toxicity and effectiveness of new drugs, we need to be realistic, and carefully validate all new human-like disease models. In this position paper, we highlight recent advances in trying to reduce the number of animals for cardiovascular research ranging from stem cell-derived models to in situ modelling of heart properties, bioinformatic models based on large datasets, and state-of-the-art animal models, which show clinically relevant characteristics observed in patients with a cardiovascular disease. We aim to provide a guide to help researchers in their experimental design to translate bench findings to clinical routine taking the replacement, reduction, and refinement (3R) as a guiding concept.
Grants
- R01 HL150359 NHLBI NIH HHS
- RG/16/14/32397 British Heart Foundation
- FS/18/37/33642 British Heart Foundation
- PG/17/64/33205 British Heart Foundation
- PG/15/88/31780 British Heart Foundation
- FS/RTF/20/30009, NH/19/1/34595, PG/18/35/33786, CS/17/4/32960, PG/15/88/31780, and PG/17/64/33205 British Heart Foundation
- NC/T001488/1 National Centre for the Replacement, Refinement and Reduction of Animals in Research
- PG/18/44/33790 British Heart Foundation
- CH/16/3/32406 British Heart Foundation
- FS/RTF/20/30009 British Heart Foundation
- NWO-ZonMW
- ZonMW and Heart Foundation for the translational research program
- Dutch Cardiovascular Alliance (DCVA)
- Leducq Foundation
- Dutch Research Council
- Association of Collaborating Health Foundations (SGF)
- UCL Hospitals NIHR Biomedical Research Centre, and the DCVA
- Netherlands CardioVascular Research Initiative CVON
- Stichting Hartekind and the Dutch Research Council (NWO) (OCENW.GROOT.2019.029)
- National Fund for Scientific Research, Belgium and Action de Recherche Concertée de la Communauté Wallonie-Bruxelles, Belgium
- Netherlands CardioVascular Research Initiative CVON (PREDICT2 and CONCOR-genes projects), the Leducq Foundation
- ERA PerMed (PROCEED study)
- Netherlands Cardiovascular Research Initiative
- Dutch Heart Foundation
- German Centre of Cardiovascular Research (DZHH)
- Chest Heart and Stroke Scotland
- Tenovus Scotland
- Friends of Anchor and Grampian NHS-Endowments
- National Institute for Health Research University College London Hospitals Biomedical Research Centre
- German Centre for Cardiovascular Research
- European Research Council (ERC-AG IndivuHeart), the Deutsche Forschungsgemeinschaft
- European Union Horizon 2020 (REANIMA and TRAINHEART)
- German Ministry of Education and Research (BMBF)
- Centre for Cardiovascular Research (DZHK)
- European Union Horizon 2020
- DFG
- National Research, Development and Innovation Office of Hungary
- Research Excellence Program—TKP; National Heart Program
- Austrian Science Fund
- European Union Commission’s Seventh Framework programme
- CVON2016-Early HFPEF
- CVON She-PREDICTS
- CVON Arena-PRIME
- European Union’s Horizon 2020 research and innovation programme
- Deutsche Forschungsgemeinschaft
- Volkswagenstiftung
- French National Research Agency
- ERA-Net-CVD
- Fédération Française de Cardiologie, the Fondation pour la Recherche Médicale
- French PIA Project
- University Research Federation against heart failure
- Netherlands Heart Foundation
- Dekker Senior Clinical Scientist
- Health Holland TKI-LSH
- TUe/UMCU/UU Alliance Fund
- South African National Foundation
- Cancer Association of South Africa and Winetech
- Netherlands Heart Foundation/Applied & Engineering Sciences
- Dutch Technology Foundation
- Pie Medical Imaging
- Netherlands Organisation for Scientific Research
- Dr. Dekker Program
- Netherlands CardioVascular Research Initiative: the Dutch Heart Foundation
- Dutch Federation of University Medical Centres
- Netherlands Organization for Health Research and Development and the Royal Netherlands Academy of Sciences for the GENIUS-II project
- Netherlands Organization for Scientific Research (NWO) (VICI grant); the European Research Council
- Incyte s.r.l. and from Ministero dell’Istruzione, Università e Ricerca Scientifica
- German Center for Cardiovascular Research (Junior Research Group & Translational Research Project), the European Research Council (ERC Starting Grant NORVAS)
- Swedish Heart-Lung-Foundation
- Swedish Research Council
- National Institutes of Health
- Bavarian State Ministry of Health and Care through the research project DigiMed Bayern
- ERC
- ERA-CVD
- Dutch Heart Foundation, ZonMw
- the NWO Gravitation project
- Ministero dell'Istruzione, Università e Ricerca Scientifica
- Regione Lombardia
- Netherlands Organisation for Health Research and Development
- ITN Network Personalize AF: Personalized Therapies for Atrial Fibrillation: a translational network
- MAESTRIA: Machine Learning Artificial Intelligence Early Detection Stroke Atrial Fibrillation
- REPAIR: Restoring cardiac mechanical function by polymeric artificial muscular tissue
- Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)
- European Union H2020 program to the project TECHNOBEAT
- EVICARE
- BRAV3
- ZonMw
- German Centre for Cardiovascular Research (DZHK)
- British Heart Foundation Centre for Cardiac Regeneration
- British Heart Foundation studentship
- NC3Rs
- Interreg ITA-AUS project InCARDIO
- Italian Association for Cancer Research
Affiliation(s)
- Jolanda van der Velden
- Amsterdam UMC, Vrije Universiteit, Physiology, Amsterdam Cardiovascular Science, Amsterdam, The Netherlands
- Netherlands Heart Institute, Utrecht, The Netherlands
| | - Folkert W Asselbergs
- Division Heart & Lungs, Department of Cardiology, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Faculty of Population Health Sciences, Institute of Cardiovascular Science and Institute of Health Informatics, University College London, London, UK
| | - Jeroen Bakkers
- Hubrecht Institute-KNAW and University Medical Centre Utrecht, Utrecht, The Netherlands
| | - Sandor Batkai
- Hannover Medical School, Institute of Molecular and Translational Therapeutic Strategies, Hannover, Germany
| | - Luc Bertrand
- Hannover Medical School, Institute of Molecular and Translational Therapeutic Strategies, Hannover, Germany
| | - Connie R Bezzina
- Université catholique de Louvain, Institut de Recherche Expérimentale et Clinique, Pole of Cardiovascular Research, Brussels, Belgium
| | - Ilze Bot
- Heart Center, Department of Experimental Cardiology, Amsterdam UMC, Location Academic Medical Center, Amsterdam Cardiovascular Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Division of BioTherapeutics, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
| | - Bianca J J M Brundel
- Amsterdam UMC, Vrije Universiteit, Physiology, Amsterdam Cardiovascular Science, Amsterdam, The Netherlands
| | - Lucie Carrier
- Institute of Experimental Pharmacology and Toxicology, University Medical Center Hamburg Eppendorf, Hamburg, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck, Hamburg, Germany
| | - Steven Chamuleau
- Amsterdam UMC, Heart Center, Cardiology, Amsterdam Cardiovascular Science, Amsterdam, The Netherlands
| | - Michele Ciccarelli
- Department of Medicine, Surgery and Odontology, University of Salerno, Fisciano (SA), Italy
| | - Dana Dawson
- Department of Cardiology, Aberdeen Cardiovascular and Diabetes Centre, Aberdeen Royal Infirmary and University of Aberdeen, Aberdeen, UK
| | - Sean M Davidson
- The Hatter Cardiovascular Institute, University College London, 67 Chenies Mews, London WC1E 6HX, UK
| | - Andreas Dendorfer
- Walter-Brendel-Centre of Experimental Medicine, University Hospital, Ludwig-Maximilians-University, Munich, Germany
| | - Dirk J Duncker
- Division of Experimental Cardiology, Department of Cardiology, Thoraxcenter, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
| | - Thomas Eschenhagen
- Institute of Experimental Pharmacology and Toxicology, University Medical Center Hamburg Eppendorf, Hamburg, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck, Hamburg, Germany
| | - Larissa Fabritz
- DZHK (German Centre for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck, Hamburg, Germany
- University Center of Cardiovascular Sciences and Department of Cardiology, University Heart Center Hamburg, Germany and Institute of Cardiovascular Sciences, University of Birmingham, UK
| | - Ines Falcão-Pires
- UnIC - Cardiovascular Research and Development Centre, Department of Surgery and Physiology, Faculty of Medicine, University of Porto, Portugal
| | - Péter Ferdinandy
- Cardiometabolic Research Group and MTA-SE System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Pharmahungary Group, Szeged, Hungary
| | - Mauro Giacca
- Department of Medicine, Surgery and Health Sciences and Cardiovascular Department, Centre for Translational Cardiology, Azienda Sanitaria Universitaria Integrata Trieste, Trieste, Italy
- International Center for Genetic Engineering and Biotechnology (ICGEB), Trieste, Italy
- King’s British Heart Foundation Centre, King’s College London, London, UK
| | - Henrique Girao
- Univ Coimbra, Center for Innovative Biomedicine and Biotechnology, Faculty of Medicine, Coimbra, Portugal
- Clinical Academic Centre of Coimbra, Coimbra, Portugal
| | | | - Mariann Gyongyosi
- Division of Cardiology, Department of Internal Medicine II, Medical University of Vienna, Vienna, Austria
| | - Tomasz J Guzik
- Institute of Cardiovascular and Medical Sciences, University of Glasgow, Glasgow, UK
- Jagiellonian University, Collegium Medicum, Kraków, Poland
| | - Nazha Hamdani
- Division Cardiology, Molecular and Experimental Cardiology, Ruhr University Bochum, Bochum, Germany
- Institute of Physiology, Ruhr University Bochum, Bochum, Germany
| | - Stephane Heymans
- Department of Cardiology, Cardiovascular Research Institute Maastricht (CARIM), Maastricht University Medical Centre, Maastricht University, Maastricht, The Netherlands
- Department of Cardiovascular Sciences, University of Leuven, Leuven, Belgium
| | - Andres Hilfiker
- Department for Cardiothoracic, Transplant, and Vascular Surgery, Hannover Medical School, Hannover, Germany
| | - Denise Hilfiker-Kleiner
- Department for Cardiology and Angiology, Hannover Medical School, Hannover, Germany
- Department of Cardiovascular Complications in Pregnancy and in Oncologic Therapies, Comprehensive Cancer Centre, Philipps-Universität Marburg, Germany
| | - Alfons G Hoekstra
- Computational Science Lab, Informatics Institute, Faculty of Science, University of Amsterdam, Amsterdam, the Netherlands
| | - Jean-Sébastien Hulot
- Université de Paris, INSERM, PARCC, F-75015 Paris, France
- CIC1418 and DMU CARTE, AP-HP, Hôpital Européen Georges-Pompidou, F-75015 Paris, France
| | - Diederik W D Kuster
- Amsterdam UMC, Vrije Universiteit, Physiology, Amsterdam Cardiovascular Science, Amsterdam, The Netherlands
| | - Linda W van Laake
- Division Heart & Lungs, Department of Cardiology, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
| | - Sandrine Lecour
- Department of Medicine, Hatter Institute for Cardiovascular Research in Africa and Cape Heart Institute, University of Cape Town, Cape Town, South Africa
| | - Tim Leiner
- Department of Radiology, Utrecht University Medical Center, Utrecht, the Netherlands
| | - Wolfgang A Linke
- Institute of Physiology II, University of Muenster, Robert-Koch-Str. 27B, 48149 Muenster, Germany
| | - Joost Lumens
- Department of Biomedical Engineering, Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, the Netherlands
| | - Esther Lutgens
- Experimental Vascular Biology Division, Department of Medical Biochemistry, University of Amsterdam, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Institute for Cardiovascular Prevention, Ludwig-Maximilians-Universität München (LMU), Munich, Germany
- DZHK, Partner Site Munich Heart Alliance, Munich, Germany
| | - Rosalinda Madonna
- Department of Pathology, Cardiology Division, University of Pisa, 56124 Pisa, Italy
- Department of Internal Medicine, Cardiology Division, University of Texas Medical School in Houston, Houston, TX, USA
| | - Lars Maegdefessel
- DZHK, Partner Site Munich Heart Alliance, Munich, Germany
- Department for Vascular and Endovascular Surgery, Klinikum rechts der Isar, Technical University Munich, Munich, Germany
- Department of Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Manuel Mayr
- King’s British Heart Foundation Centre, King’s College London, London, UK
| | - Peter van der Meer
- Department of Cardiology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
| | - Robert Passier
- Department of Applied Stem Cell Technologies, TechMed Centre, University of Twente, 7500AE Enschede, The Netherlands
- Department of Anatomy and Embryology, Leiden University Medical Centre, 2300 RC Leiden, The Netherlands
| | - Filippo Perbellini
- Hannover Medical School, Institute of Molecular and Translational Therapeutic Strategies, Hannover, Germany
| | - Cinzia Perrino
- Department of Advanced Biomedical Sciences, Federico II University, Naples, Italy
| | - Maurizio Pesce
- Unità di Ingegneria Tissutale Cardiovascolare, Centro cardiologico Monzino, IRCCS, Milan, Italy
| | - Silvia Priori
- Molecular Cardiology, Istituti Clinici Scientifici Maugeri, Pavia, Italy
- University of Pavia, Pavia, Italy
| | - Carol Ann Remme
- Université catholique de Louvain, Institut de Recherche Expérimentale et Clinique, Pole of Cardiovascular Research, Brussels, Belgium
| | - Bodo Rosenhahn
- Institute for Information Processing, Leibniz University of Hanover, 30167 Hannover, Germany
| | - Ulrich Schotten
- Department of Physiology, Cardiovascular Research Institute Maastricht, Maastricht University, Maastricht, the Netherlands
| | - Rainer Schulz
- Institute of Physiology, Justus Liebig University Giessen, Giessen, Germany
| | - Karin R Sipido
- Department of Cardiovascular Sciences, KU Leuven, 3000 Leuven, Belgium
| | - Joost P G Sluijter
- Experimental Cardiology Laboratory, Department of Cardiology, Regenerative Medicine Center Utrecht, Circulatory Health Laboratory, Utrecht University, University Medical Center Utrecht, Utrecht, The Netherlands
| | - Frank van Steenbeek
- Division Heart & Lungs, Department of Cardiology, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Department of Clinical Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
| | - Sabine Steffens
- Institute for Cardiovascular Prevention, Ludwig-Maximilians-Universität München (LMU), Munich, Germany
- DZHK, Partner Site Munich Heart Alliance, Munich, Germany
| | | | - Carlo Gabriele Tocchetti
- Cardio-Oncology Unit, Department of Translational Medical Sciences, Center for Basic and Clinical Immunology Research (CISI), Interdepartmental Center for Clinical and Translational Research (CIRCET), Interdepartmental Hypertension Research Center (CIRIAPA), Federico II University, Naples, Italy
| | - Patricia Vlasman
- Amsterdam UMC, Vrije Universiteit, Physiology, Amsterdam Cardiovascular Science, Amsterdam, The Netherlands
| | - Kak Khee Yeung
- Amsterdam UMC, Vrije Universiteit, Surgery, Amsterdam Cardiovascular Science, Amsterdam, The Netherlands
| | - Serena Zacchigna
- Department of Medicine, Surgery and Health Sciences and Cardiovascular Department, Centre for Translational Cardiology, Azienda Sanitaria Universitaria Integrata Trieste, Trieste, Italy
- International Center for Genetic Engineering and Biotechnology (ICGEB), Trieste, Italy
| | - Dayenne Zwaagman
- Amsterdam UMC, Heart Center, Cardiology, Amsterdam Cardiovascular Science, Amsterdam, The Netherlands
| | - Thomas Thum
- Hannover Medical School, Institute of Molecular and Translational Therapeutic Strategies, Hannover, Germany
- Fraunhofer Institute for Toxicology and Experimental Medicine, Hannover, Germany
| |
|
11
|
Li H, Li H, Zhou J, Gao X. SD2: spatially resolved transcriptomics deconvolution through integration of dropout and spatial information. Bioinformatics 2022; 38:4878-4884. [PMID: 36063455 PMCID: PMC9789790 DOI: 10.1093/bioinformatics/btac605] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/27/2022] [Revised: 08/28/2022] [Accepted: 09/04/2022] [Indexed: 01/01/2023]
Abstract
MOTIVATION Unveiling the heterogeneity in tissues is crucial for exploring cell-cell interactions and cellular targets of human diseases. Spatial transcriptomics (ST) supplies spatial gene expression profiles, which have revolutionized our biological understanding, but variations in the cell-type proportions of each spot, which contains dozens of cells, confound downstream analysis. Therefore, deconvolution of ST data has become an indispensable step and a technical challenge on the way to a higher-resolution panorama of tissues. RESULTS Here, we propose a novel ST deconvolution method called SD2 that integrates the spatial information of ST data and embraces an important characteristic, dropout, which is traditionally considered an obstruction in single-cell RNA sequencing (scRNA-seq) data analysis. First, we extract dropout-based genes as informative features from ST and scRNA-seq data by fitting a Michaelis-Menten function. After synthesizing pseudo-ST spots by randomly composing cells from scRNA-seq data, an auto-encoder is applied to discover a low-dimensional, non-linear representation of the real and pseudo-ST spots. Next, we create a graph containing the embedded profiles as nodes, with edges determined by transcriptional similarity and spatial relationship. Given the graph, a graph convolutional neural network is used to predict the cell-type compositions of the real ST spots. We benchmark SD2 on the simulated seqFISH+ dataset at different resolutions and with different measurements, where it shows superior performance compared with state-of-the-art methods. SD2 is further validated on three real-world datasets from different ST technologies and demonstrates the capability to localize cell-type composition accurately, with quantitative evidence. Finally, an ablation study is conducted to verify the contribution of the different modules proposed in SD2.
AVAILABILITY AND IMPLEMENTATION SD2 is freely available on GitHub (https://github.com/leihouyeung/SD2) and Zenodo (https://doi.org/10.5281/zenodo.7024684). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
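The pseudo-spot step mentioned in the abstract can be illustrated with a minimal sketch. This is not the SD2 implementation: the function name `make_pseudo_spot`, the toy cells, sampling with replacement, and summing raw counts are all assumptions made here for illustration. The idea shown is only that a synthetic spot built from labelled single cells comes with known cell-type proportions, which can then serve as a training target.

```python
import random
from collections import Counter


def make_pseudo_spot(cells, n_cells, rng):
    """Build one synthetic spot from labelled single cells.

    cells   : list of (cell_type, expression_vector) pairs (hypothetical toy format)
    n_cells : number of cells drawn for the spot (drawn with replacement, an assumption)
    rng     : random.Random instance, passed in for reproducibility
    Returns the summed expression vector and the true cell-type proportions.
    """
    chosen = [rng.choice(cells) for _ in range(n_cells)]
    n_genes = len(cells[0][1])
    # Spot expression is the gene-wise sum over the sampled cells.
    expression = [sum(cell[1][g] for cell in chosen) for g in range(n_genes)]
    # Ground-truth label: fraction of sampled cells belonging to each type.
    counts = Counter(cell[0] for cell in chosen)
    proportions = {cell_type: c / n_cells for cell_type, c in counts.items()}
    return expression, proportions


# Toy data: four cells, two cell types, three genes.
cells = [("T", [5, 0, 1]), ("B", [0, 4, 1]), ("T", [6, 1, 0]), ("B", [1, 3, 2])]
expression, proportions = make_pseudo_spot(cells, n_cells=10, rng=random.Random(0))
```

In a pipeline of the kind the abstract describes, many such spots would be generated and embedded alongside the real spots; the proportions returned here play the role of supervised labels.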
Affiliation(s)
- Haoyang Li
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia; Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| | - Hanmin Li
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia; Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| | - Juexiao Zhou
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia; Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| | - Xin Gao
- To whom correspondence should be addressed.
| |
|
12
|
Moon S, Lee H. MOMA: a multi-task attention learning algorithm for multi-omics data interpretation and classification. Bioinformatics 2022; 38:2287-2296. [PMID: 35157023 PMCID: PMC10060719 DOI: 10.1093/bioinformatics/btac080] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 06/29/2021] [Revised: 01/01/2022] [Accepted: 02/08/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION Accurate diagnostic classification and biological interpretation are important in biology and medicine, which are data-rich sciences. Thus, integration of different data types is necessary for the high predictive accuracy of clinical phenotypes, and more comprehensive analyses for predicting the prognosis of complex diseases are required. RESULTS Here, we propose a novel multi-task attention learning algorithm for multi-omics data, termed MOMA, which captures important biological processes for high diagnostic performance and interpretability. MOMA vectorizes features and modules using a geometric approach and focuses on important modules in multi-omics data via an attention mechanism. Experiments using public data on Alzheimer's disease and cancer with various classification tasks demonstrated the superior performance of this approach. The utility of MOMA was also verified using a comparison experiment with an attention mechanism that was turned on or off and biological analysis. AVAILABILITY AND IMPLEMENTATION The source codes are available at https://github.com/dmcb-gist/MOMA. SUPPLEMENTARY INFORMATION Supplementary materials are available at Bioinformatics online.
Affiliation(s)
- Sehwan Moon
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, South Korea
| | - Hyunju Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, South Korea
| |
|
13
|
Xiang J, Meng X, Zhao Y, Wu FX, Li M. HyMM: hybrid method for disease-gene prediction by integrating multiscale module structure. Brief Bioinform 2022; 23:6547263. [PMID: 35275996 DOI: 10.1093/bib/bbac072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 01/18/2022] [Accepted: 02/13/2022] [Indexed: 11/12/2022]
Abstract
MOTIVATION Identifying disease-related genes is an important issue in computational biology. Module structure is widespread in biomolecular networks, and complex diseases are usually thought to be caused by perturbations of local neighborhoods in these networks, which can provide useful insights for the study of disease-related genes. However, mining and effectively utilizing the module structure remains challenging in problems such as disease-gene prediction. RESULTS We propose a hybrid disease-gene prediction method integrating multiscale module structure (HyMM), which can utilize multiscale information from local to global structure to predict disease-related genes more effectively. HyMM extracts module partitions from local to global scales by multiscale modularity optimization with exponential sampling, and estimates the disease relatedness of genes in partitions by the abundance of disease-related genes within modules. Then, a probabilistic model for the integration of gene rankings is designed to integrate multiple predictions derived from multiscale module partitions and network propagation, and a parameter estimation strategy based on functional information is proposed to further enhance HyMM's predictive power. Through a series of experiments, we reveal the importance of module partitions at different scales, and verify the stable and good performance of HyMM compared with eight other state-of-the-art methods, as well as its further performance improvement derived from the parameter estimation. CONCLUSIONS The results confirm that HyMM is an effective framework for integrating multiscale module structure to enhance the ability to predict disease-related genes, which may provide useful insights for the study of multiscale module structure and its application in problems such as disease-gene prediction.
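The idea of scoring genes by the abundance of known disease genes in their module can be sketched as follows. This is a simplified illustration, not HyMM itself: the function names, the toy partitions, and the plain averaging across scales are assumptions introduced here, and the averaging merely stands in for the probabilistic rank-integration model the abstract mentions.

```python
def module_scores(partition, known_disease_genes):
    """Score each gene by the fraction of known disease genes in its module.

    partition           : dict mapping gene -> module id at one scale
    known_disease_genes : set of genes already associated with the disease
    """
    members = {}
    for gene, module in partition.items():
        members.setdefault(module, []).append(gene)
    scores = {}
    for genes in members.values():
        fraction = sum(g in known_disease_genes for g in genes) / len(genes)
        for g in genes:
            scores[g] = fraction
    return scores


def combine_scales(partitions, known_disease_genes):
    """Average the per-scale scores (a stand-in for a real integration model)."""
    per_scale = [module_scores(p, known_disease_genes) for p in partitions]
    genes = per_scale[0].keys()
    return {g: sum(s[g] for s in per_scale) / len(per_scale) for g in genes}


# Toy example: a fine partition (two modules) and a coarse one (one module).
fine = {"A": 0, "B": 0, "C": 1, "D": 1}
coarse = {"A": 0, "B": 0, "C": 0, "D": 0}
combined = combine_scales([fine, coarse], known_disease_genes={"A"})
ranking = sorted(combined, key=combined.get, reverse=True)
```

Gene B shares a fine-scale module with the known gene A, so it ranks above C and D even though all four sit in the same coarse module; this is the kind of signal that partitions at several scales are meant to capture.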
Affiliation(s)
- Ju Xiang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China; Department of Basic Medical Sciences & Academician Workstation, Changsha Medical University, Changsha, Hunan 410219, China
| | - Xiangmao Meng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yichao Zhao
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
| | - Fang-Xiang Wu
- Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
| | - Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
|
14
|
Wang X, Wang H, Liu D, Wang N, He D, Wu Z, Zhu X, Wen X, Li X, Li J, Wang Z. Deep learning using bulk RNA-seq data expands cell landscape identification in tumor microenvironment. Oncoimmunology 2022; 11:2043662. [PMID: 35251771 PMCID: PMC8890395 DOI: 10.1080/2162402x.2022.2043662] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/07/2023]
Affiliation(s)
- Xin Wang
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
| | - Hongjiu Wang
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Dan Liu
- The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
| | - Na Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Danni He
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Zheyu Wu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Xu Zhu
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
| | - Xiaoling Wen
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
| | - Xuhua Li
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
| | - Jin Li
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
| | - Zhenzhen Wang
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, College of Biomedical Information and Engineering, Hainan Medical University, Haikou, China
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| |
|
15
|
Deep neural network prediction of genome-wide transcriptome signatures - beyond the Black-box. NPJ Syst Biol Appl 2022; 8:9. [PMID: 35197482 PMCID: PMC8866467 DOI: 10.1038/s41540-022-00218-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/24/2021] [Accepted: 01/24/2022] [Indexed: 11/28/2022]
Abstract
Prediction algorithms for protein or gene structures, including transcription factor binding from sequence information, have been transformative in understanding gene regulation. Here we ask whether human transcriptomic profiles can be predicted solely from the expression of transcription factors (TFs). We find that the expression of 1600 TFs can explain >95% of the variance in 25,000 genes. Using the light-up technique to inspect the trained NN, we find an over-representation of known TF-gene regulations. Furthermore, the learned prediction network has a hierarchical organization. A smaller set of around 125 core TFs could explain close to 80% of the variance. Interestingly, reducing the number of TFs below 500 induces a rapid decline in prediction performance. Next, we evaluated the prediction model using transcriptional data from 22 human diseases. The TFs were sufficient to predict the dysregulation of the target genes (rho = 0.61, P < 10^-216). By inspecting the model, key causative TFs could be extracted for subsequent validation using disease-associated genetic variants. We demonstrate a methodology for constructing an interpretable neural network predictor, where analyses of the predictors identified key TFs that were inducing transcriptional changes during disease.
|
16
|
Zhang Y, Xiang J, Tang L, Li J, Lu Q, Tian G, He BS, Yang J. Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity. Front Genet 2021; 12:596794. [PMID: 34484285 PMCID: PMC8415302 DOI: 10.3389/fgene.2021.596794] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 08/24/2020] [Accepted: 05/05/2021] [Indexed: 01/04/2023]
Abstract
Complex diseases, such as breast cancer, are often caused by mutations of multiple functional genes. Identifying disease-related genes is a critical and challenging task for unveiling the biological mechanisms behind these diseases. In this study, we develop a novel computational framework to analyze the network properties of the known breast cancer–associated genes, based on which we develop a random-walk-with-restart (RCRWR) algorithm to predict novel disease genes. Specifically, we first curated a set of breast cancer–associated genes from the Genome-Wide Association Studies catalog and the Online Mendelian Inheritance in Man database and then studied the distribution of these genes on an integrated protein–protein interaction (PPI) network. We found that the breast cancer–associated genes are significantly closer to each other than expected at random, which confirms the modularity property of disease genes in a PPI network as revealed by previous studies. We then retrieved PPI subnetworks spanning top breast cancer–associated KEGG pathways and found that the distribution of these genes on the subnetworks is non-random, suggesting that these KEGG pathways are activated non-uniformly. Taking advantage of the non-random distribution of breast cancer–associated genes, we developed an improved RCRWR algorithm to predict novel cancer genes, which integrates network reconstruction based on local random walk dynamics and subnetworks spanning KEGG pathways. Compared with disease gene prediction that does not use the information from the KEGG pathways, this method has better prediction performance in inferring breast cancer–associated genes, and the top predicted genes are better enriched in known breast cancer–associated gene ontologies. Finally, we performed a literature search on the top predicted novel genes and found that most of them are supported by at least wet-lab experiments on cell lines.
In summary, we propose a robust computational framework to prioritize novel breast cancer–associated genes, which could be used for further in vitro and in vivo experimental validation.
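For readers unfamiliar with the underlying technique, the following is a minimal sketch of a plain random walk with restart on a small undirected graph. It is not the RCRWR variant described in the abstract, which additionally uses network reconstruction and KEGG pathway subnetworks; the function name, the restart probability of 0.3, and the toy path graph are assumptions chosen for illustration. The sketch also assumes every node has at least one neighbor.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    adjacency : dict mapping node -> list of neighbor nodes (undirected graph)
    seeds     : set of known disease genes; the walker restarts uniformly on them
    W is the transition matrix in which each node splits its probability
    equally among its neighbors.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbors = adjacency[u]
            share = (1.0 - restart) * p[u] / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path graph A - B - C - D with a single seed gene A.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(graph, seeds={"A"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

Candidate genes are then prioritized by their steady-state score, so genes that are close to the seed set in the network rank highest.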
Affiliation(s)
- Yan Zhang
- School of Computer Science and Engineering, Central South University, Changsha, China; School of Information Science and Engineering, Changsha Medical University, Changsha, China; Academician Workstation, Changsha Medical University, Changsha, China
| | - Ju Xiang
- School of Computer Science and Engineering, Central South University, Changsha, China; Academician Workstation, Changsha Medical University, Changsha, China; Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
| | - Liang Tang
- Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
| | - Jianming Li
- Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
| | - Qingqing Lu
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China; Geneis Beijing Co., Ltd., Beijing, China
| | - Geng Tian
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China; Geneis Beijing Co., Ltd., Beijing, China
| | - Bin-Sheng He
- Academician Workstation, Changsha Medical University, Changsha, China; Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
| | - Jialiang Yang
- Academician Workstation, Changsha Medical University, Changsha, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China; Geneis Beijing Co., Ltd., Beijing, China
| |
|
17
|
Zhang Y, Xiang J, Tang L, Li J, Lu Q, Tian G, He BS, Yang J. Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity. Front Genet 2021; 12:596794. [PMID: 34484285 DOI: 10.3389/fgene.2021.596794/full] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2020] [Accepted: 05/05/2021] [Indexed: 05/28/2023] Open
Abstract
Complex diseases, such as breast cancer, are often caused by mutations of multiple functional genes. Identifying disease-related genes is a critical and challenging task for unveiling the biological mechanisms behind these diseases. In this study, we develop a novel computational framework to analyze the network properties of the known breast cancer-associated genes, based on which we develop a random-walk-with-restart (RCRWR) algorithm to predict novel disease genes. Specifically, we first curated a set of breast cancer-associated genes from the Genome-Wide Association Studies catalog and Online Mendelian Inheritance in Man database and then studied the distribution of these genes on an integrated protein-protein interaction (PPI) network. We found that the breast cancer-associated genes are significantly closer to each other than random, which confirms the modularity property of disease genes in a PPI network as revealed by previous studies. We then retrieved PPI subnetworks spanning top breast cancer-associated KEGG pathways and found that the distribution of these genes on the subnetworks are non-random, suggesting that these KEGG pathways are activated non-uniformly. Taking advantage of the non-random distribution of breast cancer-associated genes, we developed an improved RCRWR algorithm to predict novel cancer genes, which integrates network reconstruction based on local random walk dynamics and subnetworks spanning KEGG pathways. Compared with the disease gene prediction without using the information from the KEGG pathways, this method has a better prediction performance on inferring breast cancer-associated genes, and the top predicted genes are better enriched on known breast cancer-associated gene ontologies. Finally, we performed a literature search on top predicted novel genes and found that most of them are supported by at least wet-lab experiments on cell lines. 
In summary, we propose a robust computational framework to prioritize novel breast cancer-associated genes, which could be used for further in vitro and in vivo experimental validation.
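As a rough illustration of the random-walk-with-restart idea that the abstract builds on, the sketch below implements the generic, textbook form of the algorithm in plain Python. It is not the authors' RCRWR method: the network reconstruction step and the KEGG pathway subnetworks are omitted, and the function name, the restart probability of 0.5, and the toy gene labels `G1` to `G4` are all assumptions made for this example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic random walk with restart on an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours.
    seeds: set of known disease genes where the walker restarts.
    Returns a dict of steady-state visiting probabilities per node.
    """
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed genes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: the walking mass stays where it is.
                nxt[u] += (1.0 - restart) * p[u]
                continue
            # Otherwise the walker moves to a uniformly chosen neighbour.
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a chain G1 - G2 - G3 - G4 with G1 as the only seed.
toy_graph = {
    "G1": {"G2"},
    "G2": {"G1", "G3"},
    "G3": {"G2", "G4"},
    "G4": {"G3"},
}
scores = random_walk_with_restart(toy_graph, seeds={"G1"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy case the scores decay with network distance from the seed, which is the property that lets such a walk prioritize candidate genes lying close to known disease genes.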
Affiliation(s)
- Yan Zhang
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - School of Information Science and Engineering, Changsha Medical University, Changsha, China
  - Academician Workstation, Changsha Medical University, Changsha, China
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - Academician Workstation, Changsha Medical University, Changsha, China
  - Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
- Liang Tang
  - Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
- Jianming Li
  - Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
- Qingqing Lu
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
  - Geneis Beijing Co., Ltd., Beijing, China
- Geng Tian
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
  - Geneis Beijing Co., Ltd., Beijing, China
- Bin-Sheng He
  - Academician Workstation, Changsha Medical University, Changsha, China
  - Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
- Jialiang Yang
  - Academician Workstation, Changsha Medical University, Changsha, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
  - Geneis Beijing Co., Ltd., Beijing, China
18
Naito T, Suzuki K, Hirata J, Kamatani Y, Matsuda K, Toda T, Okada Y. A deep learning method for HLA imputation and trans-ethnic MHC fine-mapping of type 1 diabetes. Nat Commun 2021; 12:1639. [PMID: 33712626 PMCID: PMC7955122 DOI: 10.1038/s41467-021-21975-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 11/24/2020] [Accepted: 02/19/2021] [Indexed: 01/31/2023]
Abstract
Conventional human leukocyte antigen (HLA) imputation methods perform worse for infrequent alleles, which is one of the factors that reduce the reliability of trans-ethnic major histocompatibility complex (MHC) fine-mapping, given the inter-ethnic heterogeneity in allele frequency spectra. We develop DEEP*HLA, a deep learning method for imputing HLA genotypes. Through validation using the Japanese and European HLA reference panels (n = 1,118 and 5,122), DEEP*HLA achieves the highest accuracies with significant superiority for low-frequency and rare alleles. DEEP*HLA is less dependent on distance-dependent linkage disequilibrium decay of the target alleles and might capture the complicated region-wide information. We apply DEEP*HLA to type 1 diabetes GWAS data from BioBank Japan (n = 62,387) and UK Biobank (n = 354,459), and successfully disentangle independently associated class I and II HLA variants with shared risk among diverse populations (the top signal at amino acid position 71 of HLA-DRβ1; P = 7.5 × 10⁻¹²⁰). Our study illustrates the value of deep learning in genotype imputation and trans-ethnic MHC fine-mapping.
Affiliation(s)
- Tatsuhiko Naito
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
  - Department of Neurology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Ken Suzuki
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Jun Hirata
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
  - Pharmaceutical Discovery Research Laboratories, Teijin Pharma Limited, Hino, Japan
- Yoichiro Kamatani
  - Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Koichi Matsuda
  - Laboratory of Clinical Genome Sequencing, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Tatsushi Toda
  - Department of Neurology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Yukinori Okada
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
  - Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
  - Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita, Japan
19
Domain randomization-enhanced deep learning models for bird detection. Sci Rep 2021; 11:639. [PMID: 33436851 PMCID: PMC7803967 DOI: 10.1038/s41598-020-80101-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/27/2020] [Accepted: 12/11/2020] [Indexed: 11/20/2022]
Abstract
Automatic bird detection in ornithological analyses is limited by the accuracy of existing models, due to the lack of training data and the difficulties in extracting the fine-grained features required to distinguish bird species. Here we apply the domain randomization strategy to enhance the accuracy of the deep learning models in bird detection. Trained with virtual birds of sufficient variations in different environments, the model tends to focus on the fine-grained features of birds and achieves higher accuracies. Based on the 100 terabytes of 2-month continuous monitoring data of egrets, our results cover the findings using conventional manual observations, e.g., vertical stratification of egrets according to body size, and also open up opportunities of long-term bird surveys requiring intensive monitoring that is impractical using conventional methods, e.g., the weather influences on egrets, and the relationship of the migration schedules between the great egrets and little egrets.
20
Abstract
Deep neural networks often achieve high predictive accuracy on biological problems, but it can be hard to contextualize how and explain why predictions are made. In this issue, Kuenzi et al. model the sensitivity of cancers to drugs using deep neural networks with a hierarchical structure derived from the Gene Ontology.
Affiliation(s)
- Casey S Greene
  - Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Philadelphia, PA 19102, USA
- James C Costello
  - Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
|