1
Labory J, Njomgue-Fotso E, Bottini S. Benchmarking feature selection and feature extraction methods to improve the performances of machine-learning algorithms for patient classification using metabolomics biomedical data. Comput Struct Biotechnol J 2024; 23:1274-1287. [PMID: 38560281 PMCID: PMC10979063 DOI: 10.1016/j.csbj.2024.03.016] [Received: 12/21/2023] [Revised: 03/12/2024] [Accepted: 03/18/2024] [Indexed: 04/04/2024] [Open Access]
Abstract
Objective: Classification tasks are an open challenge in the field of biomedicine. While several machine-learning techniques exist to accomplish this objective, several peculiarities associated with biomedical data, especially omics measurements, prevent their use or limit their performance. Omics approaches aim to understand a complex biological system through systematic analysis of its content at the molecular level. However, omics data are heterogeneous, sparse and affected by the classical "curse of dimensionality" problem, i.e. having far fewer observations or samples (n) than omics features (p). Furthermore, a major problem with multi-omics data is imbalance at either the class or the feature level. The objective of this work is to study whether feature extraction and/or feature selection techniques can improve the performance of classification machine-learning algorithms on omics measurements.
Methods: Among all omics, metabolomics has emerged as a powerful tool in cancer research, facilitating a deeper understanding of the complex metabolic landscape associated with tumorigenesis and tumor progression. Thus, we selected three publicly available metabolomics datasets, applied several feature extraction techniques, both linear and non-linear, coupled or not with feature selection methods, and evaluated patient classification performance in the different configurations for the three datasets.
Results: We provide a general workflow and guidelines on when to use those techniques depending on the characteristics of the data available. To further test the extension of our approach to other omics data, we have included a transcriptomics and a proteomics dataset. Overall, for all datasets, we showed that applying supervised feature selection improves the performance of feature extraction methods for classification purposes.
Scripts used to perform all analyses are available at: https://github.com/Plant-Net/Metabolomic_project/.
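To make the idea of supervised feature selection on "n much smaller than p" data concrete, here is a minimal Python sketch. It is not taken from the authors' scripts: the scoring rule (absolute difference in class means divided by a pooled standard deviation), the function names, and the toy data are all assumptions chosen for illustration.

```python
import random
import statistics


def class_separation_scores(X, y):
    """Score each feature by |difference in class means| / pooled standard deviation."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        pooled_sd = ((statistics.pvariance(a) + statistics.pvariance(b)) / 2) ** 0.5
        # The small constant avoids division by zero for constant features.
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / (pooled_sd + 1e-12))
    return scores


def select_top_k(X, y, k):
    """Return the sorted indices of the k highest-scoring features."""
    scores = class_separation_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data with n << p: 24 samples, 60 features,
# of which only features 0 and 1 differ between the two classes.
rng = random.Random(0)
y = [0] * 12 + [1] * 12
X = [[rng.gauss(6.0 * label if j < 2 else 0.0, 1.0) for j in range(60)] for label in y]

selected = select_top_k(X, y, k=2)
print(selected)
```

Because the scores use the class labels, a selection step like this should be fitted on the training samples only when it is combined with cross-validation; otherwise the held-out performance estimate is optimistically biased.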
Affiliation(s)
- Justine Labory
- Université Côte d'Azur, Center of Modeling Simulation and Interactions, Nice, France
- INRAE, Université Côte d'Azur, CNRS, Institut Sophia Agrobiotech, Sophia-Antipolis, France
- Université Côte d'Azur, Inserm U1081, CNRS UMR 7284, Institute for Research on Cancer and Aging, Nice (IRCAN), Nice, France
- Silvia Bottini
- Université Côte d'Azur, Center of Modeling Simulation and Interactions, Nice, France
- INRAE, Université Côte d'Azur, CNRS, Institut Sophia Agrobiotech, Sophia-Antipolis, France
2
Acharya D, Mukhopadhyay A. A comprehensive review of machine learning techniques for multi-omics data integration: challenges and applications in precision oncology. Brief Funct Genomics 2024; 23:549-560. [PMID: 38600757 DOI: 10.1093/bfgp/elae013] [Received: 10/13/2023] [Revised: 03/12/2024] [Accepted: 03/22/2024] [Indexed: 04/12/2024] [Open Access]
Abstract
Multi-omics data play a crucial role in precision medicine, mainly to understand the diverse biological interaction between different omics. Machine learning approaches have been extensively employed in this context over the years. This review aims to comprehensively summarize and categorize these advancements, focusing on the integration of multi-omics data, which includes genomics, transcriptomics, proteomics and metabolomics, alongside clinical data. We discuss various machine learning techniques and computational methodologies used for integrating distinct omics datasets and provide valuable insights into their application. The review emphasizes both the challenges and opportunities present in multi-omics data integration, precision medicine and patient stratification, offering practical recommendations for method selection in various scenarios. Recent advances in deep learning and network-based approaches are also explored, highlighting their potential to harmonize diverse biological information layers. Additionally, we present a roadmap for the integration of multi-omics data in precision oncology, outlining the advantages, challenges and implementation difficulties. Hence this review offers a thorough overview of current literature, providing researchers with insights into machine learning techniques for patient stratification, particularly in precision oncology. Contact: anirban@klyuniv.ac.in.
Affiliation(s)
- Debabrata Acharya
- Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
- Anirban Mukhopadhyay
- Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
3
Pfeifer B, Sirocchi C, Bloice MD, Kreuzthaler M, Urschler M. Federated unsupervised random forest for privacy-preserving patient stratification. Bioinformatics 2024; 40:ii198-ii207. [PMID: 39230698 PMCID: PMC11373406 DOI: 10.1093/bioinformatics/btae382] [Indexed: 09/05/2024] [Open Access]
Abstract
MOTIVATION: In the realm of precision medicine, effective patient stratification and disease subtyping demand innovative methodologies tailored for multi-omics data. Clustering techniques applied to multi-omics data have become instrumental in identifying distinct subgroups of patients, enabling a finer-grained understanding of disease variability. Meanwhile, clinical datasets are often small and must be aggregated from multiple hospitals. Online data sharing, however, is seen as a significant challenge due to privacy concerns, potentially impeding big data's role in medical advancements using machine learning. This work establishes a powerful framework for advancing precision medicine through unsupervised random forest-based clustering in combination with federated computing.
RESULTS: We introduce a novel multi-omics clustering approach utilizing unsupervised random forests. The unsupervised nature of the random forest enables the determination of cluster-specific feature importance, unraveling key molecular contributors to distinct patient groups. Our methodology is designed for federated execution, a crucial aspect in the medical domain where privacy concerns are paramount. We have validated our approach on machine learning benchmark datasets as well as on cancer data from The Cancer Genome Atlas. Our method is competitive with the state-of-the-art in terms of disease subtyping, but at the same time substantially improves the cluster interpretability. Experiments indicate that local clustering performance can be improved through federated computing.
AVAILABILITY AND IMPLEMENTATION: The proposed methods are available as an R-package (https://github.com/pievos101/uRF).
Affiliation(s)
- Bastian Pfeifer
- Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, 8010, Austria
- Christel Sirocchi
- Department of Pure and Applied Sciences, University of Urbino, Urbino, 61029, Italy
- Marcus D Bloice
- Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, 8010, Austria
- Markus Kreuzthaler
- Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, 8010, Austria
- Martin Urschler
- Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, 8010, Austria
4
Waqas A, Tripathi A, Ramachandran RP, Stewart PA, Rasool G. Multimodal data integration for oncology in the era of deep neural networks: a review. Front Artif Intell 2024; 7:1408843. [PMID: 39118787 PMCID: PMC11308435 DOI: 10.3389/frai.2024.1408843] [Received: 03/28/2024] [Accepted: 07/09/2024] [Indexed: 08/10/2024] [Open Access]
Abstract
Cancer research encompasses data across various scales, modalities, and resolutions, from screening and diagnostic imaging to digitized histopathology slides to various types of molecular data and clinical records. The integration of these diverse data types for personalized cancer care and predictive modeling holds the promise of enhancing the accuracy and reliability of cancer screening, diagnosis, and treatment. Traditional analytical methods, which often focus on isolated or unimodal information, fall short of capturing the complex and heterogeneous nature of cancer data. The advent of deep neural networks has spurred the development of sophisticated multimodal data fusion techniques capable of extracting and synthesizing information from disparate sources. Among these, Graph Neural Networks (GNNs) and Transformers have emerged as powerful tools for multimodal learning, demonstrating significant success. This review presents the foundational principles of multimodal learning including oncology data modalities, taxonomy of multimodal learning, and fusion strategies. We delve into the recent advancements in GNNs and Transformers for the fusion of multimodal data in oncology, spotlighting key studies and their pivotal findings. We discuss the unique challenges of multimodal learning, such as data heterogeneity and integration complexities, alongside the opportunities it presents for a more nuanced and comprehensive understanding of cancer. Finally, we present some of the latest comprehensive multimodal pan-cancer data sources. By surveying the landscape of multimodal data integration in oncology, our goal is to underline the transformative potential of multimodal GNNs and Transformers. Through technological advancements and the methodological innovations presented in this review, we aim to chart a course for future research in this promising field. 
This review may be the first that highlights the current state of multimodal modeling applications in cancer using GNNs and transformers, presents comprehensive multimodal oncology data sources, and sets the stage for multimodal evolution, encouraging further exploration and development in personalized cancer care.
Affiliation(s)
- Asim Waqas
- Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, United States
- Department of Cancer Epidemiology, Moffitt Cancer Center, Tampa, FL, United States
- Aakash Tripathi
- Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, United States
- Ravi P. Ramachandran
- Department of Electrical and Computer Engineering, Rowan University, Glassboro, NJ, United States
- Paul A. Stewart
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, United States
- Ghulam Rasool
- Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, United States
5
Singh SP, Yadav DK, Chamran MK, Perera DG. Intelligent mutation based evolutionary optimization algorithm for genomics and precision medicine. Funct Integr Genomics 2024; 24:128. [PMID: 39037544 DOI: 10.1007/s10142-024-01401-3] [Received: 09/26/2023] [Revised: 12/02/2023] [Accepted: 07/02/2024] [Indexed: 07/23/2024]
Abstract
Genomics and precision medicine have witnessed remarkable progress with the advent of high-throughput sequencing technologies and advances in data analytics. However, because of the high dimensionality and complexity of the data, the processing and interpretation of large-scale genomic data present major challenges. To overcome these difficulties, this research proposes a novel Intelligent Mutation-Based Evolutionary Optimization Algorithm (IMBOA) created specifically for applications in genomics and precision medicine. In the proposed IMBOA, the mutation operator is guided by genome-based information, allowing for the introduction of variants in candidate solutions that are consistent with known biological processes. The algorithm's combination of Differential Evolution with this intelligent mutation mechanism enables effective exploration and exploitation of the solution space. Applying a domain-specific fitness function, the system evaluates potential solutions for each generation based on genomic correctness and fitness. The fitness function directs the search toward ideal solutions that achieve the problem's objectives, while the genome accuracy measure ensures that the solutions have physiologically relevant genomic properties. This work reports extensive tests on diverse genomics datasets, including genotype-phenotype association studies and predictive modeling tasks in precision medicine, to verify the accuracy of the proposed approach. The results demonstrate that, in terms of precision, convergence rate, mean error, standard deviation, prediction, and fitness cost of physiologically important genomic biomarkers, the IMBOA consistently outperforms other cutting-edge optimization methods.
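For readers unfamiliar with Differential Evolution, the baseline that the abstract says IMBOA builds on, the following Python sketch shows the classic DE/rand/1/bin loop on a toy objective. It is plain Differential Evolution only: the genome-guided mutation operator and the genomic correctness measure described above are not reproduced here, and the function names, parameter values, and the sphere objective are assumptions made for illustration.

```python
import random


def differential_evolution(fitness, bounds, pop_size=20, generations=200,
                           f_weight=0.5, crossover_rate=0.9, seed=0):
    """Minimise `fitness` inside `bounds` with the classic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, none of them the target i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            forced = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover_rate:
                    value = pop[a][d] + f_weight * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_cost = fitness(trial)
            # Greedy selection: a trial replaces its target only if it is no worse.
            if trial_cost <= costs[i]:
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=lambda k: costs[k])
    return pop[best], costs[best]


def sphere(x):
    """Toy objective whose minimum value 0 is reached at the origin."""
    return sum(v * v for v in x)


best_x, best_cost = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
print(best_cost)
```

A domain-informed variant would replace the line that builds `value` with a mutation rule that consults prior biological knowledge; how IMBOA does this is not specified in the abstract.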
Affiliation(s)
- Darshika G Perera
- Department of Electrical & Computer Engineering, University of Colorado Colorado Springs, Colorado Springs, CO, 80918, USA
6
Vaparanta K, Merilahti JAM, Ojala VK, Elenius K. De Novo Multi-Omics Pathway Analysis Designed for Prior Data Independent Inference of Cell Signaling Pathways. Mol Cell Proteomics 2024; 23:100780. [PMID: 38703893 PMCID: PMC11259815 DOI: 10.1016/j.mcpro.2024.100780] [Received: 07/07/2023] [Revised: 04/07/2024] [Accepted: 04/30/2024] [Indexed: 05/06/2024] [Open Access]
Abstract
New tools for cell signaling pathway inference from multi-omics data that are independent of previous knowledge are needed. Here, we propose a new de novo method, the de novo multi-omics pathway analysis (DMPA), to model and combine omics data into network modules and pathways. DMPA was validated with published omics data and was found accurate in discovering reported molecular associations in transcriptome, interactome, phosphoproteome, methylome, and metabolomics data, and signaling pathways in multi-omics data. DMPA was benchmarked against module discovery and multi-omics integration methods and outperformed previous methods in module and pathway discovery especially when applied to datasets of relatively low sample sizes. Transcription factor, kinase, subcellular location, and function prediction algorithms were devised for transcriptome, phosphoproteome, and interactome modules and pathways, respectively. To apply DMPA in a biologically relevant context, interactome, phosphoproteome, transcriptome, and proteome data were collected from analyses carried out using melanoma cells to address gamma-secretase cleavage-dependent signaling characteristics of the receptor tyrosine kinase TYRO3. The pathways modeled with DMPA reflected the predicted function and its direction in validation experiments.
Affiliation(s)
- Katri Vaparanta
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland; Medicity Research Laboratories, University of Turku, Turku, Finland; Institute of Biomedicine, University of Turku, Turku, Finland
- Johannes A M Merilahti
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland; Medicity Research Laboratories, University of Turku, Turku, Finland; Institute of Biomedicine, University of Turku, Turku, Finland
- Veera K Ojala
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland; Medicity Research Laboratories, University of Turku, Turku, Finland; Institute of Biomedicine, University of Turku, Turku, Finland
- Klaus Elenius
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland; Medicity Research Laboratories, University of Turku, Turku, Finland; Institute of Biomedicine, University of Turku, Turku, Finland; Department of Oncology, Turku University Hospital, Turku, Finland
7
Liu P, Page D, Ahlquist P, Ong IM, Gitter A. MPAC: a computational framework for inferring cancer pathway activities from multi-omic data. bioRxiv 2024:2024.06.15.599113. [PMID: 38948762 PMCID: PMC11212914 DOI: 10.1101/2024.06.15.599113] [Indexed: 07/02/2024]
Abstract
Fully capturing cellular state requires examining genomic, epigenomic, transcriptomic, proteomic, and other assays for a biological sample and comprehensive computational modeling to reason with the complex and sometimes conflicting measurements. Modeling these so-called multi-omic data is especially beneficial in disease analysis, where observations across omic data types may reveal unexpected patient groupings and inform clinical outcomes and treatments. We present Multi-omic Pathway Analysis of Cancer (MPAC), a computational framework that interprets multi-omic data through prior knowledge from biological pathways. MPAC uses network relationships encoded in pathways using a factor graph to infer consensus activity levels for proteins and associated pathway entities from multi-omic data, runs permutation testing to eliminate spurious activity predictions, and groups biological samples by pathway activities to prioritize proteins with potential clinical relevance. Using DNA copy number alteration and RNA-seq data from head and neck squamous cell carcinoma patients from The Cancer Genome Atlas as an example, we demonstrate that MPAC predicts a patient subgroup related to immune responses not identified by analysis with either input omic data type alone. Key proteins identified via this subgroup have pathway activities related to clinical outcome as well as immune cell compositions. Our MPAC R package, available at https://bioconductor.org/packages/MPAC, enables similar multi-omic analyses on new datasets.
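The abstract mentions permutation testing as the step that filters out spurious activity predictions. As a generic illustration of that statistical idea, and not of MPAC's actual procedure or R API, the Python sketch below estimates an empirical two-sided p-value for a difference in group means by repeatedly shuffling group membership. The function name, the test statistic, and the toy values are assumptions.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Empirical two-sided p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of values to the two groups
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical activity scores: clearly separated groups versus identical groups.
shifted = permutation_p_value([5.1, 4.9, 5.3, 5.0, 5.2], [1.0, 1.2, 0.8, 1.1, 0.9])
identical = permutation_p_value([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(shifted, identical)
```

A prediction whose statistic is rarely matched under shuffling gets a small p-value and is retained; one that shuffled data reproduce easily is treated as spurious.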
Affiliation(s)
- Peng Liu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Carbone Cancer Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- David Page
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Carbone Cancer Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Paul Ahlquist
- John and Jeanne Rowe Center for Research in Virology, Morgridge Institute for Research, Madison, Wisconsin, United States of America
- McArdle Laboratory for Cancer Research, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Institute for Molecular Virology, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Irene M Ong
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Carbone Cancer Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Department of Obstetrics and Gynecology, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Center for Human Genomics and Precision Medicine, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- John and Jeanne Rowe Center for Research in Virology, Morgridge Institute for Research, Madison, Wisconsin, United States of America
8
Feyaerts D, Marić I, Arck PC, Prins JR, Gomez-Lopez N, Gaudillière B, Stelzer IA. Predicting Spontaneous Preterm Birth Using the Immunome. Clin Perinatol 2024; 51:441-459. [PMID: 38705651 DOI: 10.1016/j.clp.2024.02.013] [Indexed: 05/07/2024]
Abstract
Throughout pregnancy, the maternal peripheral circulation contains valuable information reflecting pregnancy progression, detectable as tightly regulated immune dynamics. Local immune processes at the maternal-fetal interface and other reproductive and non-reproductive tissues are likely to be the pacemakers for this peripheral immune "clock." This cellular immune status of pregnancy can be leveraged for the early risk assessment and prediction of spontaneous preterm birth (sPTB). Systems immunology approaches to sPTB subtypes and cross-tissue (local and peripheral) interactions, as well as integration of multiple biological data modalities promise to improve our understanding of preterm birth pathobiology and identify potential clinically actionable biomarkers.
Affiliation(s)
- Dorien Feyaerts
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Stanford, CA 94305, USA
- Ivana Marić
- Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University School of Medicine, 453 Quarry Road, Palo Alto, CA 94304, USA
- Petra C Arck
- Department of Obstetrics and Fetal Medicine and Hamburg Center for Translational Immunology, University Medical Center Hamburg-Eppendorf, Martinistrasse 52, 20251 Hamburg, Germany
- Jelmer R Prins
- Department of Obstetrics and Gynecology, University of Groningen, University Medical Center Groningen, Postbus 30.001, 9700RB, Groningen, The Netherlands
- Nardhy Gomez-Lopez
- Department of Obstetrics and Gynecology, Washington University School of Medicine, 425 S. Euclid Avenue, St. Louis, MO 63110, USA; Department of Pathology and Immunology, Washington University School of Medicine, 425 S. Euclid Avenue, St. Louis, MO 63110, USA
- Brice Gaudillière
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University, Stanford, CA 94305, USA; Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Palo Alto, CA 94304, USA
- Ina A Stelzer
- Department of Pathology, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
9
Marić I, Stevenson DK, Aghaeepour N, Gaudillière B, Wong RJ, Angst MS. Predicting Preterm Birth Using Proteomics. Clin Perinatol 2024; 51:391-409. [PMID: 38705648 PMCID: PMC11186213 DOI: 10.1016/j.clp.2024.02.011] [Indexed: 05/07/2024]
Abstract
The complexity of preterm birth (PTB), both spontaneous and medically indicated, and its various etiologies and associated risk factors pose a significant challenge for developing tools to accurately predict risk. This review focuses on the discovery of proteomics signatures that might be useful for predicting spontaneous PTB or preeclampsia, which often results in PTB. We describe methods for proteomics analyses, proteomics biomarker candidates that have so far been identified, obstacles for discovering biomarkers that are sufficiently accurate for clinical use, and the derivation of composite signatures including clinical parameters to increase predictive power.
Affiliation(s)
- Ivana Marić
- Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University School of Medicine, 453 Quarry Road, Palo Alto, CA 94304, USA
- David K Stevenson
- Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University School of Medicine, 453 Quarry Road, Palo Alto, CA 94304, USA
- Nima Aghaeepour
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Grant Building, Office 276A, 300 Pasteur Drive, Stanford, CA 94305-5117, USA; Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Grant S280, Stanford, CA 94305, USA
- Brice Gaudillière
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Grant Building, Office 276A, 300 Pasteur Drive, Stanford, CA 94305-5117, USA; Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Grant S280, Stanford, CA 94305, USA
- Ronald J Wong
- Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University School of Medicine, 453 Quarry Road, Palo Alto, CA 94304, USA
- Martin S Angst
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Grant Building, Office 276A, 300 Pasteur Drive, Stanford, CA 94305-5117, USA
10
Novoloaca A, Broc C, Beloeil L, Yu WH, Becker J. Comparative analysis of integrative classification methods for multi-omics data. Brief Bioinform 2024; 25:bbae331. [PMID: 38985929 PMCID: PMC11234228 DOI: 10.1093/bib/bbae331] [Received: 12/19/2023] [Revised: 05/31/2024] [Indexed: 07/12/2024] [Open Access]
Abstract
Recent advances in sequencing, mass spectrometry, and cytometry technologies have enabled researchers to collect multiple 'omics data types from a single sample. These large datasets have led to a growing consensus that a holistic approach is needed to identify new candidate biomarkers and unveil mechanisms underlying disease etiology, a key to precision medicine. While many reviews and benchmarks have been conducted on unsupervised approaches, their supervised counterparts have received less attention in the literature and no gold standard has emerged yet. In this work, we present a thorough comparison of a selection of six methods, representative of the main families of intermediate integrative approaches (matrix factorization, multiple kernel methods, ensemble learning, and graph-based methods). As non-integrative control, random forest was performed on concatenated and separated data types. Methods were evaluated for classification performance on both simulated and real-world datasets, the latter being carefully selected to cover different medical applications (infectious diseases, oncology, and vaccines) and data modalities. A total of 15 simulation scenarios were designed from the real-world datasets to explore a large and realistic parameter space (e.g. sample size, dimensionality, class imbalance, effect size). On real data, the method comparison showed that integrative approaches performed better than or as well as their non-integrative counterpart. By contrast, DIABLO and the four random forest alternatives outperformed the others across the majority of simulation scenarios. The strengths and limitations of these methods are discussed in detail as well as guidelines for future applications.
Affiliation(s)
- Alexei Novoloaca
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Camilo Broc
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Laurent Beloeil
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Wen-Han Yu
- Bill & Melinda Gates Medical Research Institute, Cambridge, Massachusetts, MA 02139, United States
- Jérémie Becker
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
11
Brooks TG, Lahens NF, Mrčela A, Grant GR. Challenges and best practices in omics benchmarking. Nat Rev Genet 2024; 25:326-339. [PMID: 38216661 DOI: 10.1038/s41576-023-00679-6] [Accepted: 11/14/2023] [Indexed: 01/14/2024]
Abstract
Technological advances enabling massively parallel measurement of biological features - such as microarrays, high-throughput sequencing and mass spectrometry - have ushered in the omics era, now in its third decade. The resulting complex landscape of analytical methods has naturally fostered the growth of an omics benchmarking industry. Benchmarking refers to the process of objectively comparing and evaluating the performance of different computational or analytical techniques when processing and analysing large-scale biological data sets, such as transcriptomics, proteomics and metabolomics. With thousands of omics benchmarking studies published over the past 25 years, the field has matured to the point where the foundations of benchmarking have been established and well described. However, generating meaningful benchmarking data and properly evaluating performance in this complex domain remains challenging. In this Review, we highlight some common oversights and pitfalls in omics benchmarking. We also establish a methodology to bring the issues that can be addressed into focus and to be transparent about those that cannot: this takes the form of a spreadsheet template of guidelines for comprehensive reporting, intended to accompany publications. In addition, a survey of recent developments in benchmarking is provided as well as specific guidance for commonly encountered difficulties.
Affiliation(s)
- Thomas G Brooks
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Nicholas F Lahens
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Antonijo Mrčela
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Gregory R Grant
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
12
Choi JM, Park C, Chae H. moSCminer: a cell subtype classification framework based on the attention neural network integrating the single-cell multi-omics dataset on the cloud. PeerJ 2024; 12:e17006. [PMID: 38426141 PMCID: PMC10903350 DOI: 10.7717/peerj.17006] [Received: 11/22/2023] [Accepted: 02/05/2024] [Indexed: 03/02/2024] [Open Access]
Abstract
Single-cell omics sequencing has rapidly advanced, enabling the quantification of diverse omics profiles at a single-cell resolution. To facilitate comprehensive biological insights, such as cellular differentiation trajectories, precise annotation of cell subtypes is essential. Conventional methods involve clustering cells and manually assigning subtypes based on canonical markers, a labor-intensive and expert-dependent process. Hence, an automated computational prediction framework is crucial. While several classification frameworks for predicting cell subtypes from single-cell RNA sequencing datasets exist, these methods rely solely on single-omics data, offering insights at a single molecular level. They often miss inter-omic correlations and a holistic understanding of cellular processes. To address this, the integration of multi-omics datasets from individual cells is essential for accurate subtype annotation. This article introduces moSCminer, a novel framework for classifying cell subtypes that harnesses the power of single-cell multi-omics sequencing datasets through an attention-based neural network operating at the omics level. By integrating three distinct omics datasets (gene expression, DNA methylation, and DNA accessibility) while accounting for their biological relationships, moSCminer learns the relative significance of each omics feature. It then transforms this knowledge into a novel representation for cell subtype classification. Comparative evaluations against standard machine learning-based classifiers demonstrate moSCminer's superior performance, consistently achieving the highest average performance on real datasets. The efficacy of multi-omics integration is further corroborated through an in-depth analysis of the omics-level attention module, which identifies potential markers for cell subtype annotation.
To enhance accessibility and scalability, moSCminer is available as a user-friendly web-based platform connected to a cloud system, publicly accessible at http://203.252.206.118:5568. Notably, this study marks the pioneering integration of three single-cell multi-omics datasets for cell subtype identification.
Affiliation(s)
- Joung Min Choi
- Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, Virginia, United States
- Chaelin Park
- Division of Computer Science, Sookmyung Women’s University, Seoul, South Korea
- Heejoon Chae
- Division of Computer Science, Sookmyung Women’s University, Seoul, South Korea
13
|
Kazwini NE, Sanguinetti G. SHARE-Topic: Bayesian interpretable modeling of single-cell multi-omic data. Genome Biol 2024; 25:55. [PMID: 38395871 PMCID: PMC10885556 DOI: 10.1186/s13059-024-03180-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 01/31/2024] [Indexed: 02/25/2024] Open
Abstract
Multi-omic single-cell technologies, which simultaneously measure the transcriptional and epigenomic state of the same cell, enable understanding of the epigenetic mechanisms of gene regulation. However, noisy and sparse data pose fundamental statistical challenges for extracting biological knowledge from complex datasets. SHARE-Topic, a Bayesian generative model of multi-omic single-cell data using topic models, aims to address these challenges. SHARE-Topic identifies common patterns of co-variation between different omic layers, providing interpretable explanations for the data complexity. Tested on data from different technological platforms, SHARE-Topic provides low-dimensional representations recapitulating known biology and defines associations between genes and distal regulators in individual cells.
Affiliation(s)
- Nour El Kazwini
- Theoretical and Scientific Data Science, Scuola Internazionale Superiore di Studi Avanzati, Trieste, Italy
- Guido Sanguinetti
- Theoretical and Scientific Data Science, Scuola Internazionale Superiore di Studi Avanzati, Trieste, Italy
14
|
Ren Y, Gao Y, Du W, Qiao W, Li W, Yang Q, Liang Y, Li G. Classifying breast cancer using multi-view graph neural network based on multi-omics data. Front Genet 2024; 15:1363896. [PMID: 38444760 PMCID: PMC10912483 DOI: 10.3389/fgene.2024.1363896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Accepted: 02/02/2024] [Indexed: 03/07/2024] Open
Abstract
Introduction: Cancer grading and subtyping, used as evaluation indices, have diverse clinical, pathological, and molecular characteristics with prognostic and therapeutic implications. Although researchers have begun to study cancer differentiation and subtype prediction, most of the relevant methods are based on traditional machine learning and rely on single-omics data. It is necessary to explore a deep learning algorithm that integrates multi-omics data to achieve classification prediction of cancer differentiation and subtypes. Methods: This paper proposes a multi-omics data fusion algorithm based on a multi-view graph neural network (MVGNN) for predicting cancer differentiation and subtype classification. The model framework consists of a graph convolutional network (GCN) module for learning features from different omics data and an attention module for integrating multi-omics data. Three different types of omics data are used. For each type of omics data, feature selection is performed using methods such as the chi-square test and minimum redundancy maximum relevance (mRMR). Weighted patient similarity networks are constructed based on the selected omics features, and the GCN is trained using the omics features and the corresponding similarity networks. Finally, an attention module integrates the different types of omics features and performs the final cancer classification prediction. Results: To validate the cancer classification predictive performance of the MVGNN model, we conducted experimental comparisons with traditional machine learning models and currently popular methods based on integrating multi-omics data, using 5-fold cross-validation. Additionally, we performed comparative experiments on cancer differentiation and its subtypes based on single-omics data, two-omics data, and three-omics data. Discussion: The proposed MVGNN model performed well in cancer classification prediction based on multiple omics data.
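As a rough illustration of the "weighted patient similarity network" step mentioned in the abstract above, the sketch below builds a symmetric weighted adjacency matrix from per-patient feature vectors. The abstract does not state which similarity measure or sparsification rule the authors use, so cosine similarity, the `threshold` parameter, and the function names are assumptions made for this example only.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero profile as similar to nothing
    return dot / (norm_u * norm_v)


def build_similarity_network(features, threshold=0.5):
    """Return a symmetric weighted adjacency matrix over patients.

    features: one vector of already-selected omics features per patient.
    threshold: hypothetical cutoff; weaker edges are dropped (weight 0.0).
    The diagonal is left at 0.0 (no self-loops).
    """
    n = len(features)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weight = cosine_similarity(features[i], features[j])
            if weight >= threshold:
                adjacency[i][j] = weight
                adjacency[j][i] = weight
    return adjacency


# Toy data: patients 0 and 1 have proportional profiles, patient 2 differs.
patients = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [0.0, 3.0, 0.0]]
network = build_similarity_network(patients, threshold=0.5)
```

In a pipeline like the one described, a matrix of this kind would be built once per omics type and passed to the graph convolutional module together with the feature matrix.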
Affiliation(s)
- Yanjiao Ren
- College of Information Technology, Smart Agriculture Research Institute, Jilin Agricultural University, Changchun, Jilin, China
- Yimeng Gao
- College of Information Technology, Smart Agriculture Research Institute, Jilin Agricultural University, Changchun, Jilin, China
- Wei Du
- College of Computer Science and Technology, Jilin University, Changchun, China
- Weibo Qiao
- College of Computer Science and Technology, Jilin University, Changchun, China
- Wei Li
- College of Information Technology, Smart Agriculture Research Institute, Jilin Agricultural University, Changchun, Jilin, China
- Qianqian Yang
- College of Information Technology, Smart Agriculture Research Institute, Jilin Agricultural University, Changchun, Jilin, China
- Yanchun Liang
- College of Computer Science and Technology, Jilin University, Changchun, China
- School of Computer Science, Zhuhai College of Science and Technology, Zhuhai, China
- Gaoyang Li
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
15
|
Abbasi EY, Deng Z, Ali Q, Khan A, Shaikh A, Reshan MSA, Sulaiman A, Alshahrani H. A machine learning and deep learning-based integrated multi-omics technique for leukemia prediction. Heliyon 2024; 10:e25369. [PMID: 38352790 PMCID: PMC10862685 DOI: 10.1016/j.heliyon.2024.e25369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 12/13/2023] [Accepted: 01/25/2024] [Indexed: 02/16/2024] Open
Abstract
In recent years, scientific data on cancer has expanded, providing potential for a better understanding of malignancies and improved tailored care. Advances in Artificial Intelligence (AI) processing power and algorithmic development position Machine Learning (ML) and Deep Learning (DL) as crucial players in predicting Leukemia, a blood cancer, using integrated multi-omics technology. However, realizing these goals demands novel approaches to harness this data deluge. This study introduces a novel Leukemia diagnosis approach, analyzing multi-omics data for accuracy using ML and DL algorithms. ML techniques, including Random Forest (RF), Naive Bayes (NB), Decision Tree (DT), Logistic Regression (LR), and Gradient Boosting (GB), and DL methods such as Recurrent Neural Networks (RNN) and Feedforward Neural Networks (FNN) are compared. Among the ML methods, GB achieved 97% accuracy, while among the DL methods, RNN achieved 98% accuracy. This approach filters unclassified data effectively, demonstrating the significance of DL for leukemia prediction. Validation was based on 17 features, such as patient age, sex, mutation type, treatment methods, chromosomes, and others. Our study compares ML and DL techniques and chooses the technique that gives the best results. The study emphasizes the implications of high-throughput technology in healthcare, offering improved patient care.
Affiliation(s)
- Erum Yousef Abbasi
- State Key Laboratory of Wireless Network Positioning and Communication Engineering Integration Research, School of Electronics Engineering, Beijing University of Posts and Telecommunications, Beijing, China
- Zhongliang Deng
- State Key Laboratory of Wireless Network Positioning and Communication Engineering Integration Research, School of Electronics Engineering, Beijing University of Posts and Telecommunications, Beijing, China
- Qasim Ali
- Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan
- Adil Khan
- State Key Laboratory of Wireless Network Positioning and Communication Engineering Integration Research, School of Electronics Engineering, Beijing University of Posts and Telecommunications, Beijing, China
- Asadullah Shaikh
- Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran, 61441, Saudi Arabia
- Mana Saleh Al Reshan
- Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran, 61441, Saudi Arabia
- Scientific and Engineering Research Centre, Najran University, Najran, 61441, Saudi Arabia
- Adel Sulaiman
- Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran, 61441, Saudi Arabia
- Hani Alshahrani
- Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran, 61441, Saudi Arabia
16
|
Boubnovski Martell M, Linton-Reid K, Hindocha S, Chen M, Moreno P, Álvarez-Benito M, Salvatierra Á, Lee R, Posma JM, Calzado MA, Aboagye EO. Deep representation learning of tissue metabolome and computed tomography annotates NSCLC classification and prognosis. NPJ Precis Oncol 2024; 8:28. [PMID: 38310164 PMCID: PMC10838282 DOI: 10.1038/s41698-024-00502-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 01/04/2024] [Indexed: 02/05/2024] Open
Abstract
The rich chemical information from tissue metabolomics provides a powerful means to elaborate tissue physiology or tumor characteristics at cellular and tumor microenvironment levels. However, the process of obtaining such information requires invasive biopsies, is costly, and can delay clinical patient management. Conversely, computed tomography (CT) is a clinical standard of care but does not intuitively harbor histological or prognostic information. Furthermore, the ability to embed metabolome information into CT to subsequently use the learned representation for classification or prognosis has yet to be described. This study develops a deep learning-based framework, tissue-metabolomic-radiomic-CT (TMR-CT), by combining 48 paired CT images and tumor/normal tissue metabolite intensities to generate ten image embeddings to infer metabolite-derived representation from CT alone. In clinical NSCLC settings, we ascertain whether TMR-CT results in an enhanced feature generation model solving histology classification/prognosis tasks in an unseen international CT dataset of 742 patients. TMR-CT non-invasively determines histological classes (adenocarcinoma/squamous cell carcinoma) with an F1-score = 0.78 and further asserts patients' prognosis with a c-index = 0.72, surpassing the performance of radiomics models and deep learning on single-modality CT feature extraction. Additionally, our work shows the potential to generate informative biology-inspired CT-led features to explore connections between hard-to-obtain tissue metabolic profiles and routine lesion-derived image data.
Affiliation(s)
- Sumeet Hindocha
- Early Diagnosis and Detection Centre, National Institute for Health and Care Research Biomedical Research Centre at the Royal Marsden and Institute of Cancer Research, London, SW3 6JJ, UK
- Mitchell Chen
- Imperial College London Hammersmith Campus, London, SW7 2AZ, UK
- Paula Moreno
- Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, 14004, Spain
- Departamento de Cirugía Toráxica y Trasplante de Pulmón, Hospital Universitario Reina Sofía, Córdoba, 14014, Spain
- Marina Álvarez-Benito
- Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, 14004, Spain
- Unidad de Radiodiagnóstico y Cáncer de Mama, Hospital Universitario Reina Sofía, Córdoba, 14004, Spain
- Ángel Salvatierra
- Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, 14004, Spain
- Unidad de Radiodiagnóstico y Cáncer de Mama, Hospital Universitario Reina Sofía, Córdoba, 14004, Spain
- Richard Lee
- Early Diagnosis and Detection Centre, National Institute for Health and Care Research Biomedical Research Centre at the Royal Marsden and Institute of Cancer Research, London, SW3 6JJ, UK
- National Heart and Lung Institute, Imperial College London, Guy Scadding Building, Dovehouse Street, London, SW3 6LY, UK
- Joram M Posma
- Imperial College London Hammersmith Campus, London, SW7 2AZ, UK
- Marco A Calzado
- Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, 14004, Spain
- Departamento de Biología Celular, Fisiología e Inmunología, Universidad de Córdoba, Córdoba, 14014, Spain
- Eric O Aboagye
- Imperial College London Hammersmith Campus, London, SW7 2AZ, UK
17
|
Reggiani F, El Rashed Z, Petito M, Pfeffer M, Morabito A, Tanda ET, Spagnolo F, Croce M, Pfeffer U, Amaro A. Machine Learning Methods for Gene Selection in Uveal Melanoma. Int J Mol Sci 2024; 25:1796. [PMID: 38339073 PMCID: PMC10855534 DOI: 10.3390/ijms25031796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Revised: 01/25/2024] [Accepted: 01/30/2024] [Indexed: 02/12/2024] Open
Abstract
Uveal melanoma (UM) is the most common primary intraocular malignancy, with a limited five-year survival for metastatic patients. Limited therapeutic treatments are currently available for metastatic disease, even though the genomics of this tumor has been deeply studied using next-generation sequencing (NGS) and functional experiments. The profound knowledge of the molecular features that characterize this tumor has not led to the development of efficacious therapies, and the survival of metastatic patients has not changed for decades. Several bioinformatics methods have been applied to mine NGS tumor data in order to unveil tumor biology and detect possible molecular targets for new therapies. Some applications are single-domain based, while others focus on data integration from multiple genomics domains (such as gene expression and methylation data). Examples of single-domain approaches include differentially expressed gene (DEG) analysis on gene expression data with statistical methods such as SAM (significance analysis of microarray) or gene prioritization with complex algorithms such as deep learning. Data fusion or integration methods merge multiple domains of information to define new clusters of patients or to detect relevant genes, according to multiple NGS data. In this work, we compare different strategies to detect relevant genes for metastatic disease prediction in the TCGA uveal melanoma (UVM) dataset. Detected targets are validated with multi-gene score analysis on a larger UM microarray dataset.
Affiliation(s)
- Francesco Reggiani
- Laboratory of Gene Expression Regulation, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
- Zeinab El Rashed
- Laboratory of Gene Expression Regulation, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
- Mariangela Petito
- Laboratory of Gene Expression Regulation, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
- Department of Experimental Medicine (DIMES), University of Genova, Via Leon Battista Alberti, 16132 Genova, Italy
- Max Pfeffer
- Institute of Numerical and Applied Mathematics, University of Göttingen, 37083 Göttingen, Germany
- Anna Morabito
- Laboratory of Gene Expression Regulation, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
- Enrica Teresa Tanda
- Skin Cancer Unit, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
- Department of Internal Medicine and Medical Specialties, University of Genova, Viale Benedetto XV, 16132 Genova, Italy
- Francesco Spagnolo
- Skin Cancer Unit, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genova, 16132 Genova, Italy
- Michela Croce
- Biotherapies, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
- Ulrich Pfeffer
- Laboratory of Gene Expression Regulation, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
- Adriana Amaro
- Laboratory of Gene Expression Regulation, IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy
18
|
Wang J, Wen Y, Zhang Y, Wang Z, Jiang Y, Dai C, Wu L, Leng D, He S, Bo X. An interpretable artificial intelligence framework for designing synthetic lethality-based anti-cancer combination therapies. J Adv Res 2023:S2090-1232(23)00374-0. [PMID: 38043609 DOI: 10.1016/j.jare.2023.11.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 11/27/2023] [Accepted: 11/29/2023] [Indexed: 12/05/2023] Open
Abstract
INTRODUCTION Synthetic lethality (SL) provides an opportunity to leverage different genetic interactions when designing synergistic combination therapies. To further explore SL-based combination therapies for cancer treatment, it is important to identify and mechanistically characterize more SL interactions. Artificial intelligence (AI) methods have recently been proposed for SL prediction, but the results of these models are often not interpretable such that deriving the underlying mechanism can be challenging. OBJECTIVES This study aims to develop an interpretable AI framework for SL prediction and subsequently utilize it to design SL-based synergistic combination therapies. METHODS We propose a knowledge and data dual-driven AI framework for SL prediction (KDDSL). Specifically, we use gene knowledge related to the SL mechanism to guide the construction of the model and develop a method to identify the most relevant gene knowledge for the predicted results. RESULTS Experimental and literature-based validation confirmed a good balance between predictive and interpretable ability when using KDDSL. Moreover, we demonstrated that KDDSL could help to discover promising drug combinations and clarify associated biological processes, such as the combination of MDM2 and CDK9 inhibitors, which exhibited significant anti-cancer effects in vitro and in vivo. CONCLUSION These data underscore the potential of KDDSL to guide SL-based combination therapy design. There is a need for biomedicine-focused AI strategies to combine rational biological knowledge with developed models.
Affiliation(s)
- Jing Wang
- School of Medicine, Tsinghua University, Beijing, 100084, China
- Yuqi Wen
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Yixin Zhang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Zhongming Wang
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China
- Yuyang Jiang
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China
- Chong Dai
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
- Lianlian Wu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China
- Dongjin Leng
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Song He
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Xiaochen Bo
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
19
|
Li W, Huang Q, Peng Y, Pan S, Hu M, Wang P, He Y. A deep learning approach based on multi-omics data integration to construct a risk stratification prediction model for skin cutaneous melanoma. J Cancer Res Clin Oncol 2023; 149:15923-15938. [PMID: 37673824 DOI: 10.1007/s00432-023-05358-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 08/26/2023] [Indexed: 09/08/2023]
Abstract
PURPOSE Skin cutaneous melanoma (SKCM) is a highly aggressive melanocytic carcinoma whose high heterogeneity and complex etiology make its prognosis difficult to predict. This study aimed to construct a risk subtyping model for SKCM. METHODS The study proposes a deep learning framework combining an early fusion feature autoencoder (AE) and a late fusion feature AE for risk subtype prediction of SKCM. The deep learning framework integrates mRNA, miRNA, and DNA methylation data of SKCM patients from The Cancer Genome Atlas (TCGA), and clusters the screened multi-omics features associated with survival prognosis to identify risk subtypes. Differential expression analysis and functional enrichment analysis were performed between risk subtypes. SVM classifiers were then constructed from the differentially expressed genes (DEGs) screened by Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression and the risk subtype labels inferred from multi-omics data. The predictive robustness of the risk subtype classification model was validated using two independent datasets. RESULTS The deep learning framework that combined the early fusion feature AE with the late fusion feature AE separated two risk subtypes more clearly than multi-omics integration with a single-strategy AE or with PCA. A promising C-index (C-index = 0.748) and a significant difference in survival (log-rank P value = 4.61 × 10⁻⁹) were found between the identified risk subtypes. The DEGs with the top significance values, together with differentially expressed miRNAs, provided the biological interpretation of risk subtypes in SKCM. Finally, the framework was applied to predict risk subtypes in two independent test datasets of SKCM patients, all of which showed good predictive power (C-index > 0.680) and significant survival differences (log-rank P value < 0.01).
CONCLUSION The SKCM risk subtypes identified by integrating multi-omics data based on deep learning can not only improve the understanding of the molecular mechanisms of SKCM, but also provide clinicians with assistance in treatment decisions.
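The C-index reported in the abstract above measures how often predicted risk orders patients consistently with their observed survival. The abstract does not say which estimator was used; the sketch below assumes Harrell's concordance index, and the function name and toy data are hypothetical, chosen only to make the definition concrete.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index (assumed estimator, see the note above).

    times:  observed follow-up times.
    events: 1 if the event was observed, 0 if the patient was censored.
    risks:  predicted risk scores (higher means shorter expected survival).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: the third patient is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.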
Affiliation(s)
- Weijia Li
- Department of Epidemiology and Medical Statistics, Institute of Medical Systems Biology, Guangdong Medical University, Dongguan, Guangdong, China
- Qiao Huang
- Department of Epidemiology and Medical Statistics, Institute of Medical Systems Biology, Guangdong Medical University, Dongguan, Guangdong, China
- Yi Peng
- Department of Epidemiology and Medical Statistics, Institute of Medical Systems Biology, Guangdong Medical University, Dongguan, Guangdong, China
- Suyue Pan
- Department of Epidemiology and Medical Statistics, Institute of Medical Systems Biology, Guangdong Medical University, Dongguan, Guangdong, China
- Min Hu
- Department of Epidemiology and Medical Statistics, Institute of Medical Systems Biology, Guangdong Medical University, Dongguan, Guangdong, China
- Pu Wang
- Department of Epidemiology and Medical Statistics, Institute of Medical Systems Biology, Guangdong Medical University, Dongguan, Guangdong, China
- Yuqing He
- Department of Epidemiology and Medical Statistics, Institute of Medical Systems Biology, Guangdong Medical University, Dongguan, Guangdong, China
- Dongguan Liaobu Hospital, Dongguan, Guangdong, China
20
|
Han X, Wang B, Situ C, Qi Y, Zhu H, Li Y, Guo X. scapGNN: A graph neural network-based framework for active pathway and gene module inference from single-cell multi-omics data. PLoS Biol 2023; 21:e3002369. [PMID: 37956172 PMCID: PMC10681325 DOI: 10.1371/journal.pbio.3002369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 11/27/2023] [Accepted: 10/07/2023] [Indexed: 11/15/2023] Open
Abstract
Although advances in single-cell technologies have enabled the characterization of multiple omics profiles in individual cells, extracting functional and mechanistic insights from such information remains a major challenge. Here, we present scapGNN, a graph neural network (GNN)-based framework that creatively transforms sparse single-cell profile data into the stable gene-cell association network for inferring single-cell pathway activity scores and identifying cell phenotype-associated gene modules from single-cell multi-omics data. Systematic benchmarking demonstrated that scapGNN was more accurate, robust, and scalable than state-of-the-art methods in various downstream single-cell analyses such as cell denoising, batch effect removal, cell clustering, cell trajectory inference, and pathway or gene module identification. scapGNN was developed as a systematic R package that can be flexibly extended and enhanced for existing analysis processes. It provides a new analytical platform for studying single cells at the pathway and network levels.
Affiliation(s)
- Xudong Han
- State Key Laboratory of Reproductive Medicine and Offspring Health, School of Medicine, Southeast University, Nanjing, China
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
- Bing Wang
- State Key Laboratory of Reproductive Medicine and Offspring Health, School of Medicine, Southeast University, Nanjing, China
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
- Chenghao Situ
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
- Yaling Qi
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
- Hui Zhu
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
- Yan Li
- Department of Clinical Laboratory, Sir Run Run Hospital, Nanjing Medical University, Nanjing, China
- Xuejiang Guo
- State Key Laboratory of Reproductive Medicine and Offspring Health, School of Medicine, Southeast University, Nanjing, China
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
21
|
Chen Y, Wen Y, Xie C, Chen X, He S, Bo X, Zhang Z. MOCSS: Multi-omics data clustering and cancer subtyping via shared and specific representation learning. iScience 2023; 26:107378. [PMID: 37559907 PMCID: PMC10407241 DOI: 10.1016/j.isci.2023.107378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 05/23/2023] [Accepted: 07/07/2023] [Indexed: 08/11/2023] Open
Abstract
Cancer is an extremely complex disease, and each type of cancer usually has several different subtypes. Multi-omics data can provide more comprehensive biological information for identifying and discovering cancer subtypes. However, existing unsupervised cancer subtyping methods cannot effectively learn comprehensive shared and specific information from multi-omics data. Therefore, a novel method is proposed based on shared and specific representation learning. For each omics data type, two autoencoders are applied to extract shared and specific information, respectively. To reduce redundancy and mutual interference, an orthogonality constraint is introduced to separate shared and specific information. In addition, contrastive learning is applied to align the shared information and strengthen its consistency. Finally, the obtained shared and specific information for all samples is used for clustering tasks to achieve cancer subtyping. Experimental results demonstrate that the proposed method can effectively capture shared and specific information of multi-omics data and outperform other state-of-the-art methods on cancer subtyping.
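To make the orthogonality constraint mentioned in the abstract above more concrete, the sketch below computes one common form of such a penalty: the squared Frobenius norm of the cross-product between the shared and specific embedding matrices, which is zero when every shared dimension is orthogonal to every specific dimension across samples. The abstract does not give the actual MOCSS loss, so this formulation, the function name, and the toy embeddings are assumptions for illustration.

```python
def orthogonality_penalty(shared, specific):
    """Squared Frobenius norm of shared^T @ specific.

    shared:   per-sample shared embeddings, shape (n_samples, d_shared).
    specific: per-sample specific embeddings, shape (n_samples, d_specific).
    Returns 0.0 when the two representations are orthogonal column-wise.
    """
    n_samples = len(shared)
    d_shared = len(shared[0])
    d_specific = len(specific[0])
    penalty = 0.0
    for a in range(d_shared):
        for b in range(d_specific):
            # Inner product of shared column a with specific column b.
            cross = sum(shared[k][a] * specific[k][b] for k in range(n_samples))
            penalty += cross * cross
    return penalty


# Toy embeddings for three samples.
shared_emb = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
specific_orthogonal = [[0.0], [0.0], [2.0]]  # no overlap with shared columns
specific_overlapping = [[1.0], [0.0], [0.0]]  # duplicates shared column 0

penalty_orthogonal = orthogonality_penalty(shared_emb, specific_orthogonal)
penalty_overlapping = orthogonality_penalty(shared_emb, specific_overlapping)
```

In training, a term like this would typically be added to the reconstruction loss with a weighting coefficient, so that the two autoencoders per omics type are discouraged from encoding the same information.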
Collapse
Affiliation(s)
- Yuxin Chen
- School of Informatics, Xiamen University, Xiamen 361005, China
| | - Yuqi Wen
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Chenyang Xie
- School of Informatics, Xiamen University, Xiamen 361005, China
| | - Xinjian Chen
- School of Informatics, Xiamen University, Xiamen 361005, China
| | - Song He
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Xiaochen Bo
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Zhongnan Zhang
- School of Informatics, Xiamen University, Xiamen 361005, China
|
22
|
Yang L, Wang J, Altreuter J, Jhaveri A, Wong CJ, Song L, Fu J, Taing L, Bodapati S, Sahu A, Tokheim C, Zhang Y, Zeng Z, Bai G, Tang M, Qiu X, Long HW, Michor F, Liu Y, Liu XS. Tutorial: integrative computational analysis of bulk RNA-sequencing data to characterize tumor immunity using RIMA. Nat Protoc 2023; 18:2404-2414. [PMID: 37391666 DOI: 10.1038/s41596-023-00841-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/07/2022] [Accepted: 02/22/2023] [Indexed: 07/02/2023]
Abstract
RNA-sequencing (RNA-seq) has become an increasingly cost-effective technique for molecular profiling and immune characterization of tumors. In the past decade, many computational tools have been developed to characterize tumor immunity from gene expression data. However, the analysis of large-scale RNA-seq data requires bioinformatics proficiency, large computational resources and cancer genomics and immunology knowledge. In this tutorial, we provide an overview of computational analysis of bulk RNA-seq data for immune characterization of tumors and introduce commonly used computational tools with relevance to cancer immunology and immunotherapy. These tools have diverse functions such as evaluation of expression signatures, estimation of immune infiltration, inference of the immune repertoire, prediction of immunotherapy response, neoantigen detection and microbiome quantification. We describe the RNA-seq IMmune Analysis (RIMA) pipeline integrating many of these tools to streamline RNA-seq analysis. We also developed a comprehensive and user-friendly guide in the form of a GitBook with text and video demos to assist users in analyzing bulk RNA-seq data for immune characterization at both individual sample and cohort levels by using RIMA.
Affiliation(s)
- Lin Yang
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Jin Wang
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- School of Life Science and Technology, Tongji University, Shanghai, China
| | - Jennifer Altreuter
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Aashna Jhaveri
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Cheryl J Wong
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Li Song
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Jingxin Fu
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- School of Life Science and Technology, Tongji University, Shanghai, China
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Len Taing
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Sudheshna Bodapati
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Avinash Sahu
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Collin Tokheim
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Yi Zhang
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Zexian Zeng
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Gali Bai
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Ming Tang
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Xintao Qiu
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Henry W Long
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Franziska Michor
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, MA, USA
- The Ludwig Center at Harvard, Boston, MA, USA
| | - Yang Liu
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA.
| | - X Shirley Liu
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA.
|
23
|
Lee M. Deep Learning Techniques with Genomic Data in Cancer Prognosis: A Comprehensive Review of the 2021-2023 Literature. Biology 2023; 12:893. [PMID: 37508326 PMCID: PMC10376033 DOI: 10.3390/biology12070893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 06/16/2023] [Accepted: 06/20/2023] [Indexed: 07/30/2023]
Abstract
Deep learning has brought about a significant transformation in machine learning, leading to an array of novel methodologies and consequently broadening its influence. The application of deep learning in various sectors, especially biomedical data analysis, has initiated a period filled with noteworthy scientific developments. This trend has strongly influenced cancer prognosis, where the interpretation of genomic data for survival analysis has become a central research focus. The capacity of deep learning to decode intricate patterns embedded within high-dimensional genomic data has provoked a paradigm shift in our understanding of cancer survival. Given the swift progression in this field, there is an urgent need for a comprehensive review that focuses on the most influential studies from 2021 to 2023. This review, through its careful selection and thorough exploration of dominant trends and methodologies, strives to fulfill this need. The paper aims to enhance our existing understanding of applications of deep learning in cancer survival analysis, while also highlighting promising directions for future research in this vibrant and rapidly proliferating field.
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
|
24
|
Wu X, Yan H, Qiu M, Qu X, Wang J, Xu S, Zheng Y, Ge M, Yan L, Liang L. Comprehensive characterization of tumor microenvironment in colorectal cancer via molecular analysis. eLife 2023; 12:e86032. [PMID: 37267125 PMCID: PMC10238095 DOI: 10.7554/elife.86032] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/08/2023] [Accepted: 05/10/2023] [Indexed: 06/04/2023] Open
Abstract
Colorectal cancer (CRC) remains a challenging and deadly disease with high tumor microenvironment (TME) heterogeneity. Using an integrative multi-omics analysis and artificial intelligence-enabled spatial analysis of whole-slide images, we performed a comprehensive characterization of TME in colorectal cancer (CCCRC). CRC samples were classified into four CCCRC subtypes with distinct TME features, namely, C1 as the proliferative subtype with low immunogenicity; C2 as the immunosuppressed subtype with the terminally exhausted immune characteristics; C3 as the immune-excluded subtype with the distinct upregulation of stromal components and a lack of T cell infiltration in the tumor core; and C4 as the immunomodulatory subtype with the remarkable upregulation of anti-tumor immune components. The four CCCRC subtypes had distinct histopathologic and molecular characteristics, therapeutic efficacy, and prognosis. We found that the C1 subtype may be suitable for chemotherapy and cetuximab, the C2 subtype may benefit from a combination of chemotherapy and bevacizumab, the C3 subtype has increased sensitivity to the WNT pathway inhibitor WIKI4, and the C4 subtype is a potential candidate for immune checkpoint blockade treatment. Importantly, we established a simple gene classifier for accurate identification of each CCCRC subtype. Collectively our integrative analysis ultimately established a holistic framework to thoroughly dissect the TME of CRC, and the CCCRC classification system with high biological interpretability may contribute to biomarker discovery and future clinical trial design.
Affiliation(s)
- Xiangkun Wu
- Department of Pathology, Nanfang Hospital/School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Department of Pathology and Guangdong Province Key Laboratory of Molecular Tumor Pathology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
| | - Hong Yan
- Department of Pathology, Nanfang Hospital/School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Department of Pathology and Guangdong Province Key Laboratory of Molecular Tumor Pathology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Department of Pathology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
| | - Mingxing Qiu
- Department of Pathology, Nanfang Hospital/School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Department of Pathology and Guangdong Province Key Laboratory of Molecular Tumor Pathology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
| | - Xiaoping Qu
- Nanjing Simcere Medical Laboratory Science Co., Ltd, Nanjing, China
- State Key Laboratory of Translational Medicine and Innovative Drug Development, Jiangsu Simcere Diagnostics Co., Ltd, Nanjing, China
| | - Jing Wang
- Department of Pathology, Nanfang Hospital/School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Department of Pathology and Guangdong Province Key Laboratory of Molecular Tumor Pathology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
| | - Shaowan Xu
- Department of Pathology, Nanfang Hospital/School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Department of Pathology and Guangdong Province Key Laboratory of Molecular Tumor Pathology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
| | - Yiran Zheng
- Nanjing Simcere Medical Laboratory Science Co., Ltd, Nanjing, China
- State Key Laboratory of Translational Medicine and Innovative Drug Development, Jiangsu Simcere Diagnostics Co., Ltd, Nanjing, China
| | - Minghui Ge
- Nanjing Simcere Medical Laboratory Science Co., Ltd, Nanjing, China
- State Key Laboratory of Translational Medicine and Innovative Drug Development, Jiangsu Simcere Diagnostics Co., Ltd, Nanjing, China
| | - Linlin Yan
- Nanjing Simcere Medical Laboratory Science Co., Ltd, Nanjing, China
- State Key Laboratory of Translational Medicine and Innovative Drug Development, Jiangsu Simcere Diagnostics Co., Ltd, Nanjing, China
| | - Li Liang
- Department of Pathology, Nanfang Hospital/School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Department of Pathology and Guangdong Province Key Laboratory of Molecular Tumor Pathology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Jinfeng Laboratory, Chongqing, China
|
25
|
Li X, Yang L, Jiao X. Deep learning-based multiomics integration model for predicting axillary lymph node metastasis in breast cancer. Future Oncol 2023; 19:1429-1438. [PMID: 37489287 DOI: 10.2217/fon-2023-0070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/26/2023] Open
Abstract
Aim: To develop a deep learning-based multiomics integration model. Materials & methods: Five types of omics data (mRNA, DNA methylation, miRNA, copy number variation and protein expression) were used to build a deep learning-based multiomics integration model via a deep neural network, incorporating an attention mechanism that adaptively considers the weights of multiomics features. Results: Compared with other methods, the deep learning-based multiomics integration model achieved remarkable results, with an area under the curve of 0.89 (95% CI: 0.863-0.910). Conclusion: The deep learning-based multiomics integration model achieved promising results and is an effective method for predicting axillary lymph node metastasis in breast cancer.
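The abstract describes an attention mechanism that adaptively weights multiomics features but gives no formula. A minimal sketch, assuming the simplest variant (one scalar score per omics layer, softmax-normalized, then a weighted sum of per-omics embeddings), is shown below. In a trained network the scores would be learned; here they are fixed, hypothetical numbers, and the names `softmax` and `attention_fuse` are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(embeddings, scores):
    """Fuse per-omics embeddings with softmax attention weights.

    embeddings: one equal-length vector per omics layer
    scores:     one unnormalized relevance score per omics layer
    Returns (weights, fused_vector).
    """
    if len(embeddings) != len(scores):
        raise ValueError("need exactly one score per omics layer")
    weights = softmax(scores)
    dim = len(embeddings[0])
    fused = [
        sum(w * emb[k] for w, emb in zip(weights, embeddings))
        for k in range(dim)
    ]
    return weights, fused


# Hypothetical 2-dimensional embeddings for the five omics layers named in
# the abstract: mRNA, DNA methylation, miRNA, copy number variation, protein.
embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, 1.0], [1.0, 1.0]]

# With equal scores, every layer receives the same weight (0.2).
weights, fused = attention_fuse(embeddings, [0.0, 0.0, 0.0, 0.0, 0.0])
print(weights, fused)
```

Raising one layer's score relative to the others shifts the fused vector toward that layer's embedding, which is the sense in which the weighting is adaptive.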
Affiliation(s)
- Xue Li
- College of Biomedical Engineering, Taiyuan University of Technology, Jinzhong, Shanxi, 030600, People's Republic of China
| | - Lifeng Yang
- College of Computer Science & Technology, Taiyuan University of Technology, Jinzhong, Shanxi, 030600, People's Republic of China
| | - Xiong Jiao
- College of Biomedical Engineering, Taiyuan University of Technology, Jinzhong, Shanxi, 030600, People's Republic of China
|
26
|
Qi G, Zou H, Peng X, He S, Zhang Q, Ye W, Jiang Y, Wang W, Ren G, Qu X. Metabolic Footprinting-Based DNA-AuNP Encoders for Extracellular Metabolic Response Profiling. Anal Chem 2023; 95:8088-8096. [PMID: 37155931 DOI: 10.1021/acs.analchem.3c01109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023]
Abstract
Metabolic footprinting is a convenient, non-invasive cell metabolomics strategy that relies on monitoring the whole extracellular metabolic process. It covers nutrient consumption and metabolite secretion in in vitro cell culture, but its universality is limited by the need for pre-treatment of the cell medium and special equipment. Here, we report the design and broad applicability of fluorescently labeled single-stranded DNA (ssDNA)-AuNP encoders for quantifying extracellular metabolism; their multi-modal signal response is triggered by extracellular metabolites. We constructed metabolic response profiles of cells by detecting extracellular metabolites in different tumor cells and drug-induced extracellular metabolites. We further assessed the extracellular metabolism differences using a machine learning algorithm. This metabolic response profiling based on the DNA-AuNP encoder strategy is a powerful complement to metabolic footprinting, with significant potential for non-invasive identification of tumor cell heterogeneity.
Affiliation(s)
- Guangpei Qi
- Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province and School of Biomedical Engineering, Sun Yat-Sen University, Shenzhen 518107, China
| | - Haixia Zou
- Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province and School of Biomedical Engineering, Sun Yat-Sen University, Shenzhen 518107, China
| | | | - Shiliang He
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen 518118, China
| | - Qiqi Zhang
- Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province and School of Biomedical Engineering, Sun Yat-Sen University, Shenzhen 518107, China
| | - Wei Ye
- Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province and School of Biomedical Engineering, Sun Yat-Sen University, Shenzhen 518107, China
| | - Yizhou Jiang
- Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province and School of Biomedical Engineering, Sun Yat-Sen University, Shenzhen 518107, China
| | - Wentao Wang
- Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province and School of Biomedical Engineering, Sun Yat-Sen University, Shenzhen 518107, China
| | - Guangli Ren
- Department of Pediatrics, General Hospital of Southern Theater Command of PLA, Guangzhou 510010, China
| | - Xiangmeng Qu
- Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province and School of Biomedical Engineering, Sun Yat-Sen University, Shenzhen 518107, China
|
27
|
Wu Z, Lohmöller J, Kuhl C, Wehrle K, Jankowski J. Use of Computation Ecosystems to Analyze the Kidney-Heart Crosstalk. Circ Res 2023; 132:1084-1100. [PMID: 37053282 DOI: 10.1161/circresaha.123.321765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/15/2023]
Abstract
The identification of mediators for physiologic processes, correlation of molecular processes, or even pathophysiological processes within a single organ such as the kidney or heart has been extensively studied to answer specific research questions using organ-centered approaches in the past 50 years. However, it has become evident that these approaches do not adequately complement each other and display a distorted single-disease progression, lacking holistic multilevel/multidimensional correlations. Holistic approaches have become increasingly significant in understanding and uncovering high dimensional interactions and molecular overlaps between different organ systems in the pathophysiology of multimorbid and systemic diseases like cardiorenal syndrome because of pathological heart-kidney crosstalk. Holistic approaches to unraveling multimorbid diseases are based on the integration, merging, and correlation of extensive, heterogeneous, and multidimensional data from different data sources, both -omics and nonomics databases. These approaches aimed at generating viable and translatable disease models using mathematical, statistical, and computational tools, thereby creating first computational ecosystems. As part of these computational ecosystems, systems medicine solutions focus on the analysis of -omics data in single-organ diseases. However, the data-scientific requirements to address the complexity of multimodality and multimorbidity reach far beyond what is currently available and require multiphased and cross-sectional approaches. These approaches break down complexity into small and comprehensible challenges. Such holistic computational ecosystems encompass data, methods, processes, and interdisciplinary knowledge to manage the complexity of multiorgan crosstalk. 
Therefore, this review summarizes the current knowledge of kidney-heart crosstalk, along with methods and opportunities that arise from the novel application of computational ecosystems providing a holistic analysis on the example of kidney-heart crosstalk.
Affiliation(s)
- Zhuojun Wu
- Institute of Molecular Cardiovascular Research (Z.W., J.J.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Department of Radiology (C.K.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
| | - Johannes Lohmöller
- Medical Faculty, and Department of Computer Science, Communication and Distributed Systems (COMSYS) (J.L., K.W.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
| | - Christiane Kuhl
- Department of Radiology (C.K.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
| | - Klaus Wehrle
- Institute of Molecular Cardiovascular Research (Z.W., J.J.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Medical Faculty, and Department of Computer Science, Communication and Distributed Systems (COMSYS) (J.L., K.W.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
| | - Joachim Jankowski
- Institute of Molecular Cardiovascular Research (Z.W., J.J.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Department of Pathology, Cardiovascular Research Institute Maastricht (CARIM), University of Maastricht, The Netherlands (J.J.)
- Aachen-Maastricht Institute for Cardiorenal Disease (AMICARE), University Hospital Rheinisch-Westfälische Technische Hochschule Aachen, Germany (J.J.)
|
28
|
Guttà C, Morhard C, Rehm M. Applying a GAN-based classifier to improve transcriptome-based prognostication in breast cancer. PLoS Comput Biol 2023; 19:e1011035. [PMID: 37011102 PMCID: PMC10101642 DOI: 10.1371/journal.pcbi.1011035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2022] [Revised: 04/13/2023] [Accepted: 03/17/2023] [Indexed: 04/05/2023] Open
Abstract
Established prognostic tests based on limited numbers of transcripts can identify high-risk breast cancer patients, yet are approved only for individuals presenting with specific clinical features or disease characteristics. Deep learning algorithms could hold potential for stratifying patient cohorts based on full transcriptome data, yet the development of robust classifiers is hampered by the number of variables in omics datasets typically far exceeding the number of patients. To overcome this hurdle, we propose a classifier based on a data augmentation pipeline consisting of a Wasserstein generative adversarial network (GAN) with gradient penalty and an embedded auxiliary classifier to obtain a trained GAN discriminator (T-GAN-D). Applied to 1244 patients of the METABRIC breast cancer cohort, this classifier outperformed established breast cancer biomarkers in separating low- from high-risk patients (disease specific death, progression or relapse within 10 years from initial diagnosis). Importantly, the T-GAN-D also performed across independent, merged transcriptome datasets (METABRIC and TCGA-BRCA cohorts), and merging data improved overall patient stratification. In conclusion, the reiterative GAN-based training process allowed generating a robust classifier capable of stratifying low- vs high-risk patients based on full transcriptome data and across independent and heterogeneous breast cancer cohorts.
Affiliation(s)
- Cristiano Guttà
- Institute of Cell Biology and Immunology, University of Stuttgart, Stuttgart, Germany
| | | | - Markus Rehm
- Institute of Cell Biology and Immunology, University of Stuttgart, Stuttgart, Germany
- Stuttgart Research Center Systems Biology, University of Stuttgart, Stuttgart, Germany
|
29
|
Zhao J, Zhao B, Song X, Lyu C, Chen W, Xiong Y, Wei DQ. Subtype-DCC: decoupled contrastive clustering method for cancer subtype identification based on multi-omics data. Brief Bioinform 2023; 24:7005165. [PMID: 36702755 DOI: 10.1093/bib/bbad025] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 10/21/2022] [Revised: 12/21/2022] [Accepted: 01/08/2023] [Indexed: 01/28/2023] Open
Abstract
Due to the high heterogeneity and complexity of cancers, patients with different cancer subtypes often have distinct groups of genomic and clinical characteristics. Therefore, the discovery and identification of cancer subtypes are crucial to cancer diagnosis, prognosis and treatment. Recent technological advances have accelerated the increasing availability of multi-omics data for cancer subtyping. To take advantage of the complementary information from multi-omics data, it is necessary to develop computational models that can represent and integrate different layers of data into a single framework. Here, we propose a decoupled contrastive clustering method (Subtype-DCC) based on multi-omics data integration for clustering to identify cancer subtypes. The idea of contrastive learning is introduced into deep clustering based on deep neural networks to learn clustering-friendly representations. Experimental results demonstrate the superior performance of the proposed Subtype-DCC model in identifying cancer subtypes over the currently available state-of-the-art clustering methods. The strength of Subtype-DCC is also supported by the survival and clinical analysis.
Affiliation(s)
- Jing Zhao
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Bowen Zhao
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Xiaotong Song
- School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Chujun Lyu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Weizhi Chen
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Peng Cheng Laboratory, Vanke Cloud City Phase I Building 8, Xili Street, Nanshan District, Shenzhen, Guangdong, 518055, China
- Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nayang, Henan, 473006, China
|
30
|
Carrion J, Nandakumar R, Shi X, Gu H, Kim Y, Raskind WH, Peter B, Dinu V. A data-fusion approach to identifying developmental dyslexia from multi-omics datasets. bioRxiv 2023:2023.02.27.530280. [PMID: 36909570 PMCID: PMC10002702 DOI: 10.1101/2023.02.27.530280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
This exploratory study tested and validated the use of data fusion and machine learning techniques to probe high-throughput omics and clinical data with a goal of exploring the etiology of developmental dyslexia (DD). Developmental dyslexia is the leading learning disability in school-aged children, affecting roughly 5-10% of the US population. The complex biological and neurological phenotype of this life-altering disability complicates its diagnosis. Phenome, exome, and metabolome data were collected, allowing us to fully explore this system from a behavioral, cellular, and molecular point of view. This study provides a proof of concept showing that data fusion and ensemble learning techniques can outperform traditional machine learning techniques when provided small and complex multi-omics and clinical datasets. Heterogeneous stacking classifiers consisting of single-omic experts/models achieved an accuracy of 86%, an F1 score of 0.89, and an AUC value of 0.83. Ensemble methods also provided a ranked list of important features, which suggests that exome single nucleotide polymorphisms found in the thalamus and cerebellum could be potential biomarkers for developmental dyslexia and heavily influenced the classification of DD within our machine learning models.
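For readers comparing the reported accuracy and F1 score, the sketch below shows how both metrics are derived from confusion-matrix counts. The counts used here are hypothetical and chosen only to illustrate the arithmetic; they are not the study's data, and `accuracy_and_f1` is an illustrative helper name.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Compute accuracy and F1 for the positive class from confusion counts.

    tp/fp: true/false positives, fn/tn: false/true negatives.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        f1 = 0.0
    else:
        # Harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical counts for a 20-sample test set (not from the paper).
accuracy, f1 = accuracy_and_f1(tp=6, fp=2, fn=1, tn=11)
print(accuracy, f1)
```

Because F1 ignores true negatives while accuracy counts them, the two metrics can rank the same classifier differently, particularly on small or imbalanced cohorts.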
Affiliation(s)
- Jackson Carrion
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004
| | - Rohit Nandakumar
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004
| | - Xiaojian Shi
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004
- Cellular and Molecular Physiology Department, Yale School of Medicine, New Haven, CT 06510
| | - Haiwei Gu
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004
- Center for Translational Science, Florida International University, Port St. Lucie, FL 34987
| | - Yookyung Kim
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004
| | - Wendy H Raskind
- Department of Medicine/Medical Genetics, University of Washington, Seattle, WA 98105
| | - Beate Peter
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004
| | - Valentin Dinu
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004
|
31
|
Jardim SR, de Souza LMP, de Souza HSP. The Rise of Gastrointestinal Cancers as a Global Phenomenon: Unhealthy Behavior or Progress? Int J Environ Res Public Health 2023; 20:3640. [PMID: 36834334 PMCID: PMC9962127 DOI: 10.3390/ijerph20043640] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 12/21/2022] [Revised: 02/09/2023] [Accepted: 02/13/2023] [Indexed: 06/18/2023]
Abstract
The overall burden of cancer is rapidly increasing worldwide, reflecting not only population growth and aging, but also the prevalence and spread of risk factors. Gastrointestinal (GI) cancers, including stomach, liver, esophageal, pancreatic, and colorectal cancers, represent more than a quarter of all cancers. While smoking and alcohol use are the risk factors most commonly associated with cancer development, a growing consensus also includes dietary habits as relevant risk factors for GI cancers. Current evidence suggests that socioeconomic development results in several lifestyle modifications, including shifts in dietary habits from local traditional diets to less-healthy Western diets. Moreover, recent data indicate that increased production and consumption of processed foods underlies the current pandemics of obesity and related metabolic disorders, which are directly or indirectly associated with the emergence of various chronic noncommunicable conditions and GI cancers. However, environmental changes are not restricted to dietary patterns, and unhealthy behavioral features should be analyzed with a holistic view of lifestyle. In this review, we discussed the epidemiological aspects, gut dysbiosis, and cellular and molecular characteristics of GI cancers and explored the impact of unhealthy behaviors, diet, and physical activity on developing GI cancers in the context of progressive societal changes.
Affiliation(s)
- Silvia Rodrigues Jardim
  - Division of Worker’s Health, Universidade Federal do Rio de Janeiro, Rio de Janeiro 22290-140, RJ, Brazil
- Lucila Marieta Perrotta de Souza
  - Departamento de Clínica Médica, Hospital Universitário, Universidade Federal do Rio de Janeiro, Rua Prof. Rodolpho Paulo Rocco 255, Ilha do Fundão, Rio de Janeiro 21941-913, RJ, Brazil
- Heitor Siffert Pereira de Souza
  - Departamento de Clínica Médica, Hospital Universitário, Universidade Federal do Rio de Janeiro, Rua Prof. Rodolpho Paulo Rocco 255, Ilha do Fundão, Rio de Janeiro 21941-913, RJ, Brazil
  - D’Or Institute for Research and Education (IDOR), Rua Diniz Cordeiro 30, Botafogo, Rio de Janeiro 22281-100, RJ, Brazil
|
32
|
Wang S, Wang S, Wang Z. A survey on multi-omics-based cancer diagnosis using machine learning with the potential application in gastrointestinal cancer. Front Med (Lausanne) 2023; 9:1109365. [PMID: 36703893 PMCID: PMC9871466 DOI: 10.3389/fmed.2022.1109365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Accepted: 12/28/2022] [Indexed: 01/12/2023] Open
Abstract
Gastrointestinal cancer is becoming increasingly common and causes over 3 million deaths every year. No typical symptoms appear in the early stage of gastrointestinal cancer, posing a significant challenge in the diagnosis and treatment of patients. Many patients are already in the middle or late stages of the disease by the time they feel unwell, and unfortunately most of them will die of it. Recently, various artificial intelligence techniques, such as machine learning based on multi-omics, have been presented for cancer diagnosis and treatment in the era of precision medicine. This paper provides a survey on multi-omics-based cancer diagnosis using machine learning with potential application in gastrointestinal cancer. In particular, we provide a comprehensive summary and analysis from the perspective of multi-omics datasets, task types, and multi-omics-based integration methods. Furthermore, this paper points out the remaining challenges of multi-omics-based cancer diagnosis using machine learning and discusses future topics.
Affiliation(s)
- Suixue Wang
  - School of Information and Communication Engineering, Hainan University, Haikou, China
- Shuling Wang
  - Department of Neurology, Affiliated Haikou Hospital of Xiangya School of Medicine, Central South University, Haikou, China
- Zhengxia Wang
  - School of Computer Science and Technology, Hainan University, Haikou, China
|
33
|
Chen G, Yu R, Chen X. Editorial: Integrative analysis of single-cell and/or bulk multi-omics sequencing data. Front Genet 2023; 13:1121999. [PMID: 36685891 PMCID: PMC9845394 DOI: 10.3389/fgene.2022.1121999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 12/13/2022] [Indexed: 01/05/2023] Open
Affiliation(s)
- Geng Chen (corresponding author)
  - Stemirna Therapeutics Co., Ltd., Shanghai, China
- Rongshan Yu
  - Department of Computer Science, School of Informatics, Xiamen University, Xiamen, China
- Xingdong Chen
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences, Fudan University, Shanghai, China
|
34
|
Liao J, Li X, Gan Y, Han S, Rong P, Wang W, Li W, Zhou L. Artificial intelligence assists precision medicine in cancer treatment. Front Oncol 2023; 12:998222. [PMID: 36686757 PMCID: PMC9846804 DOI: 10.3389/fonc.2022.998222] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/19/2022] [Accepted: 11/22/2022] [Indexed: 01/06/2023] Open
Abstract
Cancer is a major medical problem worldwide. Because of its high heterogeneity, the same drugs or surgical methods may have different curative effects in patients with the same tumor, creating a need for more accurate treatment methods and personalized treatments. Precise treatment of tumors is essential, which makes it urgent to obtain an in-depth understanding of the changes that tumors undergo, including changes in their genes, proteins and cancer cell phenotypes, in order to develop targeted treatment strategies for patients. Artificial intelligence (AI) based on big data can extract the hidden patterns, important information, and corresponding knowledge behind enormous amounts of data. For example, machine learning (ML) and deep learning, which are subsets of AI, can be used to mine deep-level information in genomics, transcriptomics, proteomics, radiomics, digital pathological images, and other data, helping clinicians understand tumors comprehensively. In addition, AI can find new biomarkers from data to assist tumor screening, detection, diagnosis, treatment and prognosis prediction, so as to provide the best treatment for individual patients and improve their clinical outcomes.
Affiliation(s)
- Jinzhuang Liao
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Xiaoying Li
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Yu Gan
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Shuangze Han
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Pengfei Rong
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
  - Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
- Wei Wang
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
  - Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
- Wei Li
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
  - Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
- Li Zhou
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
  - Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
  - Department of Pathology, The Xiangya Hospital of Central South University, Changsha, Hunan, China
|
35
|
Ye Q, Guo NL. Inferencing Bulk Tumor and Single-Cell Multi-Omics Regulatory Networks for Discovery of Biomarkers and Therapeutic Targets. Cells 2022; 12:101. [PMID: 36611894 PMCID: PMC9818242 DOI: 10.3390/cells12010101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 12/22/2022] [Accepted: 12/24/2022] [Indexed: 12/28/2022] Open
Abstract
Current cancer treatment lacks sufficiently accurate biomarkers and effective therapeutic targets. Multi-omics regulatory networks in patient bulk tumors and single cells can shed light on molecular disease mechanisms. Integration of multi-omics data with large-scale patient electronic medical records (EMRs) can lead to the discovery of biomarkers and therapeutic targets. In this review, multi-omics data harmonization methods are introduced, and common approaches to molecular network inference are summarized. Our Prediction Logic Boolean Implication Networks (PLBINs) have advantages over other methods in constructing genome-scale multi-omics networks in bulk tumors and single cells in terms of computational efficiency, scalability, and accuracy. Based on the constructed multi-modal regulatory networks, graph theory network centrality metrics can be used to prioritize candidate biomarkers and therapeutic targets. Our approach of integrating multi-omics profiles in a patient cohort with large-scale patient EMRs, such as the SEER-Medicare cancer registry, combined with extensive external validation, can identify potential biomarkers applicable in large patient populations. These methodologies form a conceptually innovative framework to analyze various available information from research laboratories and healthcare systems, accelerating the discovery of biomarkers and therapeutic targets to ultimately improve cancer patient survival outcomes.
Affiliation(s)
- Qing Ye
  - West Virginia University Cancer Institute, Morgantown, WV 26506, USA
  - Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA
- Nancy Lan Guo
  - West Virginia University Cancer Institute, Morgantown, WV 26506, USA
  - Department of Occupational and Environmental Health Sciences, School of Public Health, West Virginia University, Morgantown, WV 26506, USA
|