1
Chen F, Peng W, Dai W, Wei S, Fu X, Liu L, Liu L. Supervised graph contrastive learning for cancer subtype identification through multi-omics data integration. Health Inf Sci Syst 2024; 12:12. [PMID: 38404715 PMCID: PMC10891026 DOI: 10.1007/s13755-024-00274-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 01/09/2024] [Indexed: 02/27/2024] Open
Abstract
Cancer is one of the most deadly diseases in the world. Accurate cancer subtype classification is critical for patient diagnosis, treatment, and prognosis. Ever-increasing multi-omics data describes the characteristics of the patients from different views and serves as complementary information to promote cancer subtype identification. However, omics data generally have different distributions and high dimensions. How to effectively integrate multiple omics data to classify cancer subtypes accurately is a challenge for researchers. This work proposes a method integrating multi-omics data based on supervised graph contrast learning (MCRGCN) to classify cancer subtypes. The method considers the unique feature distribution of each omics data and the interaction of different omics data features to improve the accuracy of cancer subtype classification. To achieve this, MCRGCN first constructs different sample networks based on the multi-omics data of the samples. Then, it puts the omics data and adjacency matrix of the sample into different residual graph convolution models to get multi-omics features of the samples, which are trained with a supervised comparison loss to maintain that the sample features of each omics should be as consistent as possible. Finally, we input the sample features combining multi-omics features into a classifier to obtain the cancer subtypes. We applied MCRGCN to the invasive breast carcinoma (BRCA) and glioblastoma multiforme (GBM) datasets, integrating gene expression, miRNA expression, and DNA methylation data. The results demonstrate that our model is superior to other methods in integrating multi-omics data. Moreover, the results of survival analysis experiments demonstrate that the cancer subtypes identified by our model have significant clinical features. Furthermore, our model can help to identify potential biomarkers and pathways associated with cancer subtypes.
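The supervised contrastive objective mentioned in this abstract can be illustrated with a minimal NumPy sketch. It follows the widely used generic supervised contrastive formulation and is not claimed to be the exact loss used in MCRGCN; the function name, the `temperature` default, and the toy embeddings are assumptions made for illustration only.

```python
import numpy as np

def supervised_contrastive_loss(z, labels, temperature=0.5):
    """Generic supervised contrastive loss (illustrative, not the paper's code).

    z:      (n, d) array of sample embeddings (hypothetical inputs)
    labels: length-n sequence of class labels (e.g. cancer subtypes)
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    # Cosine similarity between all pairs, scaled by the temperature
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    self_mask = np.eye(len(labels), dtype=bool)
    # Exclude each sample's similarity to itself from the softmax
    sim = np.where(self_mask, -np.inf, sim)
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    # Positives: other samples that share the anchor's label
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos.sum(axis=1)
    valid = pos_count > 0  # anchors without any positive are skipped
    mean_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)[valid] / pos_count[valid]
    return float(-mean_log_prob.mean())

# Toy embeddings (made up): two tight groups in 2-D
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(supervised_contrastive_loss(toy, [0, 0, 1, 1]))
```

The loss is lower when samples with the same label sit close together in embedding space, which is the behaviour such an objective is meant to encourage. If no sample has a same-label partner, the mean is taken over an empty set and the result is not meaningful.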
Affiliation(s)
- Fangxu Chen
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
- Shoulin Wei
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
- Xiaodong Fu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
- Li Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
- Lijun Liu
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
  - Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050 China
2
Acharya D, Mukhopadhyay A. A comprehensive review of machine learning techniques for multi-omics data integration: challenges and applications in precision oncology. Brief Funct Genomics 2024; 23:549-560. [PMID: 38600757 DOI: 10.1093/bfgp/elae013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 03/12/2024] [Accepted: 03/22/2024] [Indexed: 04/12/2024] Open
Abstract
Multi-omics data play a crucial role in precision medicine, mainly to understand the diverse biological interaction between different omics. Machine learning approaches have been extensively employed in this context over the years. This review aims to comprehensively summarize and categorize these advancements, focusing on the integration of multi-omics data, which includes genomics, transcriptomics, proteomics and metabolomics, alongside clinical data. We discuss various machine learning techniques and computational methodologies used for integrating distinct omics datasets and provide valuable insights into their application. The review emphasizes both the challenges and opportunities present in multi-omics data integration, precision medicine and patient stratification, offering practical recommendations for method selection in various scenarios. Recent advances in deep learning and network-based approaches are also explored, highlighting their potential to harmonize diverse biological information layers. Additionally, we present a roadmap for the integration of multi-omics data in precision oncology, outlining the advantages, challenges and implementation difficulties. Hence this review offers a thorough overview of current literature, providing researchers with insights into machine learning techniques for patient stratification, particularly in precision oncology. Contact: anirban@klyuniv.ac.in.
Affiliation(s)
- Debabrata Acharya
  - Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
- Anirban Mukhopadhyay
  - Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
3
Rintala TJ, Fortino V. COPS: A novel platform for multi-omic disease subtype discovery via robust multi-objective evaluation of clustering algorithms. PLoS Comput Biol 2024; 20:e1012275. [PMID: 39102448 PMCID: PMC11326705 DOI: 10.1371/journal.pcbi.1012275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 08/15/2024] [Accepted: 06/25/2024] [Indexed: 08/07/2024] Open
Abstract
Recent research on multi-view clustering algorithms for complex disease subtyping often overlooks aspects like clustering stability and critical assessment of prognostic relevance. Furthermore, current frameworks do not allow for a comparison between data-driven and pathway-driven clustering, highlighting a significant gap in the methodology. We present the COPS R-package, tailored for robust evaluation of single and multi-omics clustering results. COPS features advanced methods, including similarity networks, kernel-based approaches, dimensionality reduction, and pathway knowledge integration. Some of these methods are not accessible through R, and some correspond to new approaches proposed with COPS. Our framework was rigorously applied to multi-omics data across seven cancer types, including breast, prostate, and lung, utilizing mRNA, CNV, miRNA, and DNA methylation data. Unlike previous studies, our approach contrasts data- and knowledge-driven multi-view clustering methods and incorporates cross-fold validation for robustness. Clustering outcomes were assessed using the ARI score, survival analysis via Cox regression models including relevant covariates, and the stability of the results. While survival analysis and gold-standard agreement are standard metrics, they vary considerably across methods and datasets. Therefore, it is essential to assess multi-view clustering methods using multiple criteria, from cluster stability to prognostic relevance, and to provide ways of comparing these metrics simultaneously to select the optimal approach for disease subtype discovery in novel datasets. Emphasizing multi-objective evaluation, we applied the Pareto efficiency concept to gauge the equilibrium of evaluation metrics in each cancer case-study. Affinity Network Fusion, Integrative Non-negative Matrix Factorization, and Multiple Kernel K-Means with linear or Pathway Induced Kernels were the most stable and effective in discerning groups with significantly different survival outcomes in several case studies.
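The Pareto efficiency idea referred to in this abstract can be sketched in a few lines of Python. This is a generic illustration of selecting non-dominated methods across several metrics, not the COPS implementation (COPS is an R package); the function name, the method names, and the scores below are hypothetical, and every metric is assumed to be "higher is better".

```python
from typing import Dict, List, Sequence

def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of methods that no other method dominates."""
    def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
        # a dominates b if it is at least as good everywhere and better somewhere
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return sorted(
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for o, other in scores.items() if o != name)
    )

# Made-up (agreement, stability, prognostic relevance) scores for three methods
example = {
    "method_a": (0.60, 0.90, 0.70),
    "method_b": (0.55, 0.95, 0.65),
    "method_c": (0.50, 0.85, 0.60),
}
print(pareto_front(example))
```

In this toy input, `method_c` is worse than `method_a` on every metric and drops out, while `method_a` and `method_b` each win on a different metric and both remain on the front.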
Affiliation(s)
- Teemu J. Rintala
  - Institute of Biomedicine, School of Medicine, University of Eastern Finland, Kuopio, Finland
- Vittorio Fortino
  - Institute of Biomedicine, School of Medicine, University of Eastern Finland, Kuopio, Finland
4
Xie M, Kuang Y, Song M, Bao E. Subtype-MGTP: a cancer subtype identification framework based on multi-omics translation. Bioinformatics 2024; 40:btae360. [PMID: 38857453 PMCID: PMC11194476 DOI: 10.1093/bioinformatics/btae360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 04/30/2024] [Accepted: 06/07/2024] [Indexed: 06/12/2024] Open
Abstract
Motivation: The identification of cancer subtypes plays a crucial role in cancer research and treatment. With the rapid development of high-throughput sequencing technologies, there has been an exponential accumulation of cancer multi-omics data. Integrating multi-omics data has emerged as a cost-effective and efficient strategy for cancer subtyping. While current methods primarily rely on genomics data, protein expression data offers a closer representation of phenotype. Therefore, integrating protein expression data holds promise for enhancing subtyping accuracy. However, the scarcity of protein expression data compared to genomics data presents a challenge in its direct incorporation into existing methods. Moreover, striking a balance between omics-specific learning and cross-omics learning remains a prevalent challenge in current multi-omics integration methods. Results: We introduce Subtype-MGTP, a novel cancer subtyping framework based on the translation of Multiple Genomics To Proteomics. Subtype-MGTP comprises two modules: a translation module, which leverages available protein data to translate multi-type genomics data into predicted protein expression data, and an improved deep subspace clustering module, which integrates contrastive learning to cluster the predicted protein data, yielding refined subtyping results. Extensive experiments conducted on benchmark datasets demonstrate that Subtype-MGTP outperforms nine state-of-the-art cancer subtyping methods. The interpretability of clustering results is further supported by the clinical and survival analysis. Subtype-MGTP also exhibits strong robustness against varying rates of missing protein data and demonstrates distinct advantages in integrating multi-omics data with imbalanced multi-omics data. Availability and implementation: The code and results are available at https://github.com/kybinn/Subtype-MGTP.
Affiliation(s)
- Minzhu Xie
  - College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
  - Key Laboratory of Computing and Stochastic Mathematics (Ministry of Education), Changsha 410081, China
  - College of Mathematics and Statistics, Hunan Normal University, Changsha 410081, China
- Yabin Kuang
  - College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
- Mengyun Song
  - College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
- Ergude Bao
  - School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China
5
Novoloaca A, Broc C, Beloeil L, Yu WH, Becker J. Comparative analysis of integrative classification methods for multi-omics data. Brief Bioinform 2024; 25:bbae331. [PMID: 38985929 PMCID: PMC11234228 DOI: 10.1093/bib/bbae331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 05/31/2024] [Indexed: 07/12/2024] Open
Abstract
Recent advances in sequencing, mass spectrometry, and cytometry technologies have enabled researchers to collect multiple 'omics data types from a single sample. These large datasets have led to a growing consensus that a holistic approach is needed to identify new candidate biomarkers and unveil mechanisms underlying disease etiology, a key to precision medicine. While many reviews and benchmarks have been conducted on unsupervised approaches, their supervised counterparts have received less attention in the literature and no gold standard has emerged yet. In this work, we present a thorough comparison of a selection of six methods, representative of the main families of intermediate integrative approaches (matrix factorization, multiple kernel methods, ensemble learning, and graph-based methods). As non-integrative control, random forest was performed on concatenated and separated data types. Methods were evaluated for classification performance on both simulated and real-world datasets, the latter being carefully selected to cover different medical applications (infectious diseases, oncology, and vaccines) and data modalities. A total of 15 simulation scenarios were designed from the real-world datasets to explore a large and realistic parameter space (e.g. sample size, dimensionality, class imbalance, effect size). On real data, the method comparison showed that integrative approaches performed better or equally well than their non-integrative counterpart. By contrast, DIABLO and the four random forest alternatives outperform the others across the majority of simulation scenarios. The strengths and limitations of these methods are discussed in detail as well as guidelines for future applications.
Affiliation(s)
- Alexei Novoloaca
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Camilo Broc
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Laurent Beloeil
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Wen-Han Yu
  - Bill & Melinda Gates Medical Research Institute, Cambridge, Massachusetts, MA 02139, United States
- Jérémie Becker
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
6
Shannon CP, Lee AH, Tebbutt SJ, Singh A. A Commentary on Multi-omics Data Integration in Systems Vaccinology. J Mol Biol 2024; 436:168522. [PMID: 38458605 DOI: 10.1016/j.jmb.2024.168522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 03/04/2024] [Accepted: 03/04/2024] [Indexed: 03/10/2024]
Affiliation(s)
- Amy Hy Lee
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, Canada
- Scott J Tebbutt
  - PROOF Centre of Excellence, Vancouver, Canada; Department of Medicine, The University of British Columbia, Vancouver, Canada; Centre for Heart Lung Innovation, Vancouver, Canada
- Amrit Singh
  - Centre for Heart Lung Innovation, Vancouver, Canada; Department of Anesthesiology, Pharmacology and Therapeutics, The University of British Columbia, Vancouver, Canada
7
Moingeon P, Garbay C, Dahan M, Fermont I, Benmakhlouf A, Gouyette A, Poitou P, Saint-Pierre A. [The revolution of AI in drug development]. Med Sci (Paris) 2024; 40:369-376. [PMID: 38651962 DOI: 10.1051/medsci/2024028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2024] Open
Abstract
Artificial intelligence and machine learning enable the construction of predictive models, which are currently used to assist in decision-making throughout the process of drug discovery and development. These computational models can be used to represent the heterogeneity of a disease, identify therapeutic targets, design and optimize drug candidates, and evaluate the efficacy of these drugs on virtual patients or digital twins. By combining detailed patient characteristics with the prediction of potential drug-candidate properties, artificial intelligence promotes the emergence of a "computational" precision medicine, allowing for more personalized treatments, better tailored to patient specificities with the aid of such predictive models. Based on such new capabilities, a mixed reality approach to the development of new drugs is being adopted by the pharmaceutical industry, which integrates the outputs of predictive virtual models with real-world empirical studies.
8
Costes V, Sellem E, Marthey S, Hoze C, Bonnet A, Schibler L, Kiefer H, Jaffrezic F. Multi-omics data integration for the identification of biomarkers for bull fertility. PLoS One 2024; 19:e0298623. [PMID: 38394258 PMCID: PMC10890740 DOI: 10.1371/journal.pone.0298623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 01/26/2024] [Indexed: 02/25/2024] Open
Abstract
Bull fertility is an important economic trait, and the use of subfertile semen for artificial insemination decreases the global efficiency of the breeding sector. Although the analysis of semen functional parameters can help to identify infertile bulls, no tools are currently available to enable precise predictions and prevent the commercialization of subfertile semen. Because male fertility is a multifactorial phenotype that is dependent on genetic, epigenetic, physiological and environmental factors, we hypothesized that an integrative analysis might help to refine our knowledge and understanding of bull fertility. We combined -omics data (genotypes, sperm DNA methylation at CpGs and sperm small non-coding RNAs) and semen parameters measured on a large cohort of 98 Montbéliarde bulls with contrasting fertility levels. Multiple Factor Analysis (MFA) was conducted to study the links between the datasets and fertility. Four methodologies were then considered to identify the features linked to bull fertility variation: Logistic Lasso, Random Forest, Gradient Boosting and Neural Networks. Finally, the features selected by these methods were annotated in terms of genes, to conduct functional enrichment analyses. The less relevant features in -omics data were filtered out, and MFA was run on the remaining 12,006 features, including the 11 semen parameters and a balanced proportion of each type of -omics data. The results showed that, unlike the semen parameters studied, the -omics datasets were related to fertility. Biomarkers related to bull fertility were selected using the four methodologies mentioned above. The most contributory CpGs, SNPs and miRNAs targeted genes were all found to be involved in development. Interestingly, fragments derived from ribosomal RNAs were overrepresented among the selected features, suggesting roles in male fertility. These markers could be used in the future to identify subfertile bulls in order to increase the global efficiency of the breeding sector.
Affiliation(s)
- Valentin Costes
  - Université Paris-Saclay, UVSQ, INRAE, BREED, Jouy-en-Josas, France
  - Ecole Nationale Vétérinaire d’Alfort, BREED, Maisons-Alfort, France
  - R&D Department, ELIANCE, 149 rue de Bercy, Paris, France
  - Université Paris-Saclay, AgroParisTech, INRAE, GABI, Jouy-en-Josas, France
- Eli Sellem
  - Université Paris-Saclay, UVSQ, INRAE, BREED, Jouy-en-Josas, France
  - Ecole Nationale Vétérinaire d’Alfort, BREED, Maisons-Alfort, France
  - R&D Department, ELIANCE, 149 rue de Bercy, Paris, France
- Sylvain Marthey
  - Université Paris-Saclay, AgroParisTech, INRAE, GABI, Jouy-en-Josas, France
  - INRAE, MaIAGE, Université Paris-Saclay, Jouy-en-Josas, France
- Chris Hoze
  - R&D Department, ELIANCE, 149 rue de Bercy, Paris, France
  - Université Paris-Saclay, AgroParisTech, INRAE, GABI, Jouy-en-Josas, France
- Aurélie Bonnet
  - Université Paris-Saclay, UVSQ, INRAE, BREED, Jouy-en-Josas, France
  - Ecole Nationale Vétérinaire d’Alfort, BREED, Maisons-Alfort, France
  - R&D Department, ELIANCE, 149 rue de Bercy, Paris, France
- Hélène Kiefer
  - Université Paris-Saclay, UVSQ, INRAE, BREED, Jouy-en-Josas, France
  - Ecole Nationale Vétérinaire d’Alfort, BREED, Maisons-Alfort, France
- Florence Jaffrezic
  - Université Paris-Saclay, AgroParisTech, INRAE, GABI, Jouy-en-Josas, France
9
Rashid MM, Hamano M, Iida M, Iwata M, Ko T, Nomura S, Komuro I, Yamanishi Y. Network-based identification of diagnosis-specific trans-omic biomarkers via integration of multiple omics data. Biosystems 2024; 236:105122. [PMID: 38199520 DOI: 10.1016/j.biosystems.2024.105122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 01/01/2024] [Accepted: 01/07/2024] [Indexed: 01/12/2024]
Abstract
The integration of multiple omics data promises to reveal new insights into the pathogenic mechanisms of complex human diseases, with the potential to identify avenues for the development of targeted therapies for disease subtypes. However, the extraction of diagnostic/disease-specific biomarkers from multiple omics data with biological pathway knowledge is a challenging issue in precision medicine. In this paper, we present a novel computational method to identify diagnosis-specific trans-omic biomarkers from multiple omics data. In the algorithm, we integrated multi-class sparse canonical correlation analysis (MSCCA) and molecular pathway analysis in order to derive discriminative molecular features that are correlated across different omics layers. We applied our proposed method to analyzing proteome and metabolome data of heart failure (HF), and extracted trans-omic biomarkers for HF subtypes; specifically, ischemic cardiomyopathy (ICM) and dilated cardiomyopathy (DCM). We were able to detect not only individual proteins that were previously reported from single-omics studies but also correlated protein-metabolite pairs characteristic of HF disease subtypes. For example, we identified hexokinase1(HK1)-d-fructose-6-phosphate as a paired trans-omic biomarker for DCM, which could significantly perturb amino-sugar metabolism. Our proposed method is expected to be useful for various applications in precision medicine.
Affiliation(s)
- Md Mamunur Rashid
  - Department of Bioscience and Bioinformatics, School of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan; Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore 138671, Singapore
- Momoko Hamano
  - Department of Bioscience and Bioinformatics, School of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Midori Iida
  - Department of Bioscience and Bioinformatics, School of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan; Department of Physics and Information Technology, School of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Michio Iwata
  - Department of Bioscience and Bioinformatics, School of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Toshiyuki Ko
  - Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Seitaro Nomura
  - Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Issei Komuro
  - Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan; International University of Health and Welfare, 4-1-26 Akasaka, Minato, Tokyo, 107-8402, Japan
- Yoshihiro Yamanishi
  - Department of Bioscience and Bioinformatics, School of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan; Graduate School of Informatics, Nagoya University, Chikusa, Nagoya 464-8601, Japan
10
Chamoso-Sanchez D, Rabadán Pérez F, Argente J, Barbas C, Martos-Moreno GA, Rupérez FJ. Identifying subgroups of childhood obesity by using multiplatform metabotyping. Front Mol Biosci 2023; 10:1301996. [PMID: 38174068 PMCID: PMC10761426 DOI: 10.3389/fmolb.2023.1301996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024] Open
Abstract
Introduction: Obesity results from an interplay between genetic predisposition and environmental factors such as diet, physical activity, culture, and socioeconomic status. Personalized treatments for obesity would be optimal, thus necessitating the identification of individual characteristics to improve the effectiveness of therapies. For example, genetic impairment of the leptin-melanocortin pathway can result in rare cases of severe early-onset obesity. Metabolomics has the potential to distinguish between a healthy and obese status; however, differentiating subsets of individuals within the obesity spectrum remains challenging. Factor analysis can integrate patient features from diverse sources, allowing an accurate subclassification of individuals. Methods: This study presents a workflow to identify metabotypes, particularly when routine clinical studies fail in patient categorization. 110 children with obesity (BMI > +2 SDS) genotyped for nine genes involved in the leptin-melanocortin pathway (CPE, MC3R, MC4R, MRAP2, NCOA1, PCSK1, POMC, SH2B1, and SIM1) and two glutamate receptor genes (GRM7 and GRIK1) were studied; 55 harboring heterozygous rare sequence variants and 55 with no variants. Anthropometric and routine clinical laboratory data were collected, and serum samples processed for untargeted metabolomic analysis using GC-q-MS and CE-TOF-MS and reversed-phase U(H)PLC-QTOF-MS/MS in positive and negative ionization modes. Following signal processing and multialignment, multivariate and univariate statistical analyses were applied to evaluate the genetic trait association with metabolomics data and clinical and routine laboratory features. Results and Discussion: Neither the presence of a heterozygous rare sequence variant nor clinical/routine laboratory features determined subgroups in the metabolomics data. To identify metabolomic subtypes, we applied Factor Analysis, by constructing a composite matrix from the five analytical platforms. Six factors and three different metabotypes were discovered. Subtle but neat differences in the circulating lipids, as well as in insulin sensitivity could be established, which opens the possibility to personalize the treatment according to the patients' categorization into such obesity subtypes. Metabotyping in clinical contexts poses challenges due to the influence of various uncontrolled variables on metabolic phenotypes. However, this strategy reveals the potential to identify subsets of patients with similar clinical diagnoses but different metabolic conditions. This approach underscores the broader applicability of Factor Analysis in metabotyping across diverse clinical scenarios.
Affiliation(s)
- David Chamoso-Sanchez
  - Centro de Metabolómica y Bioanálisis (CEMBIO), Facultad de Farmacia, Universidad San Pablo-CEU, CEU Universities, Boadilla del Monte, Spain
- Jesús Argente
  - Department of Pediatrics and Pediatric Endocrinology, Hospital Infantil Universitario Niño Jesús, Instituto de Investigación Sanitaria La Princesa, Universidad Autónoma de Madrid, Madrid, Spain
  - CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
  - IMDEA Food Institute, Madrid, Spain
- Coral Barbas
  - Centro de Metabolómica y Bioanálisis (CEMBIO), Facultad de Farmacia, Universidad San Pablo-CEU, CEU Universities, Boadilla del Monte, Spain
- Gabriel A. Martos-Moreno
  - Department of Pediatrics and Pediatric Endocrinology, Hospital Infantil Universitario Niño Jesús, Instituto de Investigación Sanitaria La Princesa, Universidad Autónoma de Madrid, Madrid, Spain
  - CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
- Francisco J. Rupérez
  - Centro de Metabolómica y Bioanálisis (CEMBIO), Facultad de Farmacia, Universidad San Pablo-CEU, CEU Universities, Boadilla del Monte, Spain
11
Downing T, Angelopoulos N. A primer on correlation-based dimension reduction methods for multi-omics analysis. J R Soc Interface 2023; 20:20230344. [PMID: 37817584 PMCID: PMC10565429 DOI: 10.1098/rsif.2023.0344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 09/19/2023] [Indexed: 10/12/2023] Open
Abstract
The continuing advances of omic technologies mean that it is now more tangible to measure the numerous features collectively reflecting the molecular properties of a sample. When multiple omic methods are used, statistical and computational approaches can exploit these large, connected profiles. Multi-omics is the integration of different omic data sources from the same biological sample. In this review, we focus on correlation-based dimension reduction approaches for single omic datasets, followed by methods for pairs of omics datasets, before detailing further techniques for three or more omic datasets. We also briefly detail network methods when three or more omic datasets are available and which complement correlation-oriented tools. To aid readers new to this area, these are all linked to relevant R packages that can implement these procedures. Finally, we discuss scenarios of experimental design and present road maps that simplify the selection of appropriate analysis methods. This review will help researchers navigate emerging methods for multi-omics and integrating diverse omic datasets appropriately. This raises the opportunity of implementing population multi-omics with large sample sizes as omics technologies and our understanding improve.
Affiliation(s)
- Tim Downing
- Pirbright Institute, Pirbright, Surrey, UK
- Department of Biotechnology, Dublin City University, Dublin, Ireland
12
Zhao Z, Jin T, Chen B, Dong Q, Liu M, Guo J, Song X, Li Y, Chen T, Han H, Liang H, Gu Y. Multi-omics integration analysis unveils heterogeneity in breast cancer at the individual level. Cell Cycle 2023; 22:2229-2244. [PMID: 37974462 PMCID: PMC10730166 DOI: 10.1080/15384101.2023.2281816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023] Open
Abstract
Identifying robust breast cancer subtypes will help to reveal cancer heterogeneity. However, previous breast cancer subtypes were based on population-level quantitative gene expression, which is affected by batch effects and cannot be applied to individuals. We detected differential gene expression, genomic, and epigenomic alterations to identify driver differential expression at the individual level. The individual driver differential expression reflected the breast cancer patients' heterogeneity and revealed four subtypes. The mesenchymal subtype, the most aggressive of the four, harbored deletion and downregulated expression of genes in the chromosome 11q23 region. Specifically, silencing of the SDHD gene in 11q23 promoted the invasion and migration of breast cancer cells in vitro through epithelial-mesenchymal transition. The immunologically hot subtype displayed an immune-hot microenvironment, including high T-cell infiltration and upregulated PD-1 and CTLA4. Luminal and genomic-unstable subtypes showed opposite macrophage polarization, which may be regulated by the ligand-receptor pairs of CD99. The integration of multi-omics data at the individual level provides a powerful framework for elucidating the heterogeneity of breast cancer.
Affiliation(s)
- Zhangxiang Zhao
- The Sino-Russian Medical Research Center of Jinan University, The Institute of Chronic Disease of Jinan University, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Tongzhu Jin
- Department of Pharmacology (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China, Key Laboratory of Cardiovascular Research, Ministry of Education), College of Pharmacy, Harbin Medical University, Harbin, China
- Bo Chen
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Qi Dong
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Mingyue Liu
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jiayu Guo
- Department of Pharmacology (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China, Key Laboratory of Cardiovascular Research, Ministry of Education), College of Pharmacy, Harbin Medical University, Harbin, China
- Xiaoying Song
- Department of Pharmacology (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China, Key Laboratory of Cardiovascular Research, Ministry of Education), College of Pharmacy, Harbin Medical University, Harbin, China
- Yawei Li
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Tingting Chen
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Huiming Han
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Haihai Liang
- The Sino-Russian Medical Research Center of Jinan University, The Institute of Chronic Disease of Jinan University, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Department of Pharmacology (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China, Key Laboratory of Cardiovascular Research, Ministry of Education), College of Pharmacy, Harbin Medical University, Harbin, China
- Yunyan Gu
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
13
Batten DJ, Crofts JJ, Chuzhanova N. Towards In Silico Identification of Genes Contributing to Similarity of Patients' Multi-Omics Profiles: A Case Study of Acute Myeloid Leukemia. Genes (Basel) 2023; 14:1795. [PMID: 37761935 PMCID: PMC10531350 DOI: 10.3390/genes14091795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 09/09/2023] [Accepted: 09/11/2023] [Indexed: 09/29/2023] Open
Abstract
We propose a computational framework for selecting biologically plausible genes identified by clustering of multi-omics data that reveal patients' similarity, thus giving researchers a more comprehensive view on any given disease. We employ spectral clustering of a similarity network created by fusion of three similarity networks, based on mRNA expression of immune genes, miRNA expression and DNA methylation data, using SNF_v2.1 software. For each cluster, we rank multi-omics features, ensuring the best separation between clusters, and select the top-ranked features that preserve clustering. To find genes targeted by DNA methylation and miRNAs found in the top-ranked features, we use chromosome-conformation capture data and miRNet2.0 software, respectively. To identify informative genes, these combined sets of target genes are analyzed in terms of their enrichment in somatic/germline mutations, GO biological processes/pathways terms and known sets of genes considered to be important in relation to a given disease, as recorded in the Molecular Signature Database from GSEA. The protein-protein interaction (PPI) networks were analyzed to identify genes that are hubs of PPI networks. We used data recorded in The Cancer Genome Atlas for patients with acute myeloid leukemia to demonstrate our approach, and discuss our findings in the context of results in the literature.
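As a rough illustration of the first step described in this abstract, the sketch below builds one patient-similarity matrix per omics layer and combines them. It is not the SNF algorithm and does not use the SNF_v2.1 software: SNF fuses networks by iterative cross-network diffusion, whereas this example simply averages Gaussian-kernel affinities. The function names, the toy data, and the `sigma` bandwidth are all assumptions made for the example.

```python
import math

def affinity_matrix(rows, sigma=1.0):
    """Gaussian-kernel similarity between every pair of patients (rows)."""
    n = len(rows)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
            w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w

def average_networks(networks):
    """Element-wise mean of several same-sized similarity matrices.

    This is a naive stand-in for network fusion, not SNF itself.
    """
    n = len(networks[0])
    return [[sum(net[i][j] for net in networks) / len(networks)
             for j in range(n)] for i in range(n)]

# Hypothetical toy data: 4 patients measured on three omics layers.
mrna = [[0.1, 0.2], [0.0, 0.3], [2.0, 2.1], [2.2, 1.9]]
mirna = [[1.0], [1.1], [3.0], [2.9]]
methylation = [[0.5, 0.5], [0.6, 0.4], [0.1, 0.9], [0.0, 1.0]]

fused = average_networks(
    [affinity_matrix(layer) for layer in (mrna, mirna, methylation)]
)
```

A clustering step, such as the spectral clustering named in the abstract, would then operate on `fused`; that step is left out here to keep the sketch free of third-party dependencies.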
Affiliation(s)
- Nadia Chuzhanova
- School of Science and Technology, Nottingham Trent University, Clifton Lane, Nottingham NG11 8NS, UK; (D.J.B.); (J.J.C.)
14
Ye X, Shang Y, Shi T, Zhang W, Sakurai T. Multi-omics clustering for cancer subtyping based on latent subspace learning. Comput Biol Med 2023; 164:107223. [PMID: 37490833 DOI: 10.1016/j.compbiomed.2023.107223] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/30/2023] [Revised: 06/07/2023] [Accepted: 06/30/2023] [Indexed: 07/27/2023]
Abstract
The increased availability of high-throughput technologies has enabled biomedical researchers to learn about disease etiology across multiple omics layers, which shows promise for improving cancer subtype identification. Many computational methods have been developed to perform clustering on multi-omics data; however, only a few of them are applicable to partial multi-omics, in which some samples lack data in some types of omics. In this study, we propose a novel multi-omics clustering method based on latent subspace learning (MCLS), which can handle missing omics data during clustering. We utilize the data with complete omics to construct a latent subspace using PCA-based feature extraction and singular value decomposition (SVD). The data with incomplete multi-omics are then projected to the latent subspace, and spectral clustering is performed to find the clusters. The proposed MCLS method is evaluated on seven different cancer datasets on three levels of omics in both full and partial cases compared to several state-of-the-art methods. The experimental results show that the proposed MCLS method is more efficient and effective than the compared methods for cancer subtype identification in multi-omics data analysis, which provides important references to a comprehensive understanding of cancer and biological mechanisms. Availability: The proposed method is freely accessible at https://github.com/ShangCS/MCLS.
Affiliation(s)
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan; Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
- Yifan Shang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Tianyi Shi
- Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
- Weihang Zhang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan; Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
15
Ouyang D, Liang Y, Li L, Ai N, Lu S, Yu M, Liu X, Xie S. Integration of multi-omics data using adaptive graph learning and attention mechanism for patient classification and biomarker identification. Comput Biol Med 2023; 164:107303. [PMID: 37586201 DOI: 10.1016/j.compbiomed.2023.107303] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/16/2023] [Revised: 07/08/2023] [Accepted: 07/28/2023] [Indexed: 08/18/2023]
Abstract
With the rapid development of high-throughput sequencing technology and the accumulation of omics data, many studies have pursued a more comprehensive understanding of human diseases from a multi-omics perspective. Meanwhile, graph-based methods have been widely used to process multi-omics data due to their powerful expressive ability. However, most existing graph-based methods utilize fixed graphs to learn sample embedding representations, which often leads to sub-optimal results. Furthermore, treating embedding representations of different omics equally usually cannot obtain more reasonable integrated information. In addition, the complex correlation between omics is not fully taken into account. To this end, we propose an end-to-end interpretable multi-omics integration method, named MOGLAM, for disease classification prediction. A dynamic graph convolutional network with feature selection is first utilized to obtain higher quality omic-specific embedding information by adaptively learning the graph structure and to discover important biomarkers. Then, a multi-omics attention mechanism is applied to adaptively weight the embedding representations of different omics, thereby obtaining more reasonable integrated information. Finally, we propose omic-integrated representation learning to capture complex common and complementary information between omics while performing multi-omics integration. Experimental results on three datasets show that MOGLAM outperforms other state-of-the-art multi-omics integration methods. Moreover, MOGLAM can identify important biomarkers from different omics data types in an end-to-end manner.
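The idea of adaptively weighting omic-specific embeddings can be illustrated with a generic attention-style fusion: per-omic scores are normalized with a softmax and used as weights in a sum of embeddings. This is a minimal sketch of that general technique, not MOGLAM's actual architecture. In a trained model the scores would be learned; here they are fixed, invented numbers, and `attention_fuse` is a hypothetical helper name.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(embeddings, scores):
    """Weight each omic embedding by its softmax-normalized score and sum."""
    weights = softmax(scores)
    dim = len(embeddings[0])
    fused = [sum(w * emb[k] for w, emb in zip(weights, embeddings))
             for k in range(dim)]
    return weights, fused

# Hypothetical embeddings of one sample from three omics, with made-up scores.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.5, 0.5]
weights, fused = attention_fuse(embeddings, scores)
```

Because the first omic has the highest score, it receives the largest weight and dominates the fused vector, in contrast to treating all omics equally.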
Affiliation(s)
- Dong Ouyang
- Peng Cheng Laboratory, Shenzhen, 518055, China; School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, 999078, Macao Special Administrative Region of China
- Yong Liang
- Peng Cheng Laboratory, Shenzhen, 518055, China
- Le Li
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, 999078, Macao Special Administrative Region of China
- Ning Ai
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, 999078, Macao Special Administrative Region of China
- Shanghui Lu
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, 999078, Macao Special Administrative Region of China
- Mingkun Yu
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, 999078, Macao Special Administrative Region of China
- Xiaoying Liu
- Computer Engineering Technical College, Guangdong Polytechnic of Science and Technology, Zhuhai, 519090, China
- Shengli Xie
- Guangdong-HongKong-Macao Joint Laboratory for Smart Discrete Manufacturing, Guangzhou, 510000, China
16
Chong D, Jones NC, Schittenhelm RB, Anderson A, Casillas-Espinosa PM. Multi-omics Integration and Epilepsy: Towards a Better Understanding of Biological Mechanisms. Prog Neurobiol 2023:102480. [PMID: 37286031 DOI: 10.1016/j.pneurobio.2023.102480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 05/09/2023] [Accepted: 06/03/2023] [Indexed: 06/09/2023]
Abstract
The epilepsies are a group of complex neurological disorders characterised by recurrent seizures. Approximately 30% of patients fail to respond to anti-seizure medications, despite the recent introduction of many new drugs. The molecular processes underlying epilepsy development are not well understood and this knowledge gap impedes efforts to identify effective targets and develop novel therapies against epilepsy. Omics studies allow a comprehensive characterisation of a class of molecules. Omics-based biomarkers have led to clinically validated diagnostic and prognostic tests for personalised oncology, and more recently for non-cancer diseases. We believe that, in epilepsy, the full potential of multi-omics research is yet to be realised and we envisage that this review will serve as a guide to researchers planning to undertake omics-based mechanistic studies.
Affiliation(s)
- Debbie Chong
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia
- Nigel C Jones
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia; Department of Medicine (The Royal Melbourne Hospital), The University of Melbourne, 3000, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, 3004, Victoria, Australia
- Ralf B Schittenhelm
- Monash Proteomics & Metabolomics Facility and Monash Biomedicine Discovery Institute, Monash University, Clayton, Victoria, 3800, Australia
- Alison Anderson
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia; Department of Medicine (The Royal Melbourne Hospital), The University of Melbourne, 3000, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, 3004, Victoria, Australia
- Pablo M Casillas-Espinosa
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia; Department of Medicine (The Royal Melbourne Hospital), The University of Melbourne, 3000, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, 3004, Victoria, Australia
17
Jiang MZ, Aguet F, Ardlie K, Chen J, Cornell E, Cruz D, Durda P, Gabriel SB, Gerszten RE, Guo X, Johnson CW, Kasela S, Lange LA, Lappalainen T, Liu Y, Reiner AP, Smith J, Sofer T, Taylor KD, Tracy RP, VanDenBerg DJ, Wilson JG, Rich SS, Rotter JI, Love MI, Raffield LM, Li Y. Canonical correlation analysis for multi-omics: Application to cross-cohort analysis. PLoS Genet 2023; 19:e1010517. [PMID: 37216410 PMCID: PMC10237647 DOI: 10.1371/journal.pgen.1010517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 06/02/2023] [Accepted: 05/01/2023] [Indexed: 05/24/2023] Open
Abstract
Integrative approaches that simultaneously model multi-omics data have gained increasing popularity because they provide holistic system biology views of multiple or all components in a biological system of interest. Canonical correlation analysis (CCA) is a correlation-based integrative method designed to extract latent features shared between multiple assays by finding the linear combinations of features, referred to as canonical variables (CVs), within each assay that achieve maximal across-assay correlation. Although widely acknowledged as a powerful approach for multi-omics data, CCA has not been systematically applied to multi-omics data in large cohort studies, which have only recently become available. Here, we adapted sparse multiple CCA (SMCCA), a widely-used derivative of CCA, to proteomics and methylomics data from the Multi-Ethnic Study of Atherosclerosis (MESA) and Jackson Heart Study (JHS). To tackle challenges encountered when applying SMCCA to MESA and JHS, our adaptations include the incorporation of the Gram-Schmidt (GS) algorithm with SMCCA to improve orthogonality among CVs, and the development of Sparse Supervised Multiple CCA (SSMCCA) to allow supervised integration analysis for more than two assays. Effective application of SMCCA to the two real datasets reveals important findings. Applying our SMCCA-GS to MESA and JHS, we identified strong associations between blood cell counts and protein abundance, suggesting that adjustment of blood cell composition should be considered in protein-based association studies. Importantly, CVs obtained from two independent cohorts also demonstrate transferability across the cohorts. For example, proteomic CVs learned from JHS, when transferred to MESA, explain similar amounts of blood cell count phenotypic variance in MESA, explaining 39.0%-50.0% of the variation in JHS and 38.9%-49.1% in MESA. Similar transferability was observed for other omics-CV-trait pairs.
This suggests that biologically meaningful and cohort-agnostic variation is captured by CVs. We anticipate that applying our SMCCA-GS and SSMCCA on various cohorts would help identify cohort-agnostic biologically meaningful relationships between multi-omics data and phenotypic traits.
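The Gram-Schmidt step mentioned in this abstract can be illustrated in isolation. The sketch below orthonormalizes a set of vectors with the modified Gram-Schmidt procedure. It shows the generic algorithm only; how the authors combine it with SMCCA (for instance, whether it is applied to canonical variables or to weight vectors, and at which iteration) is not stated here, so the toy "canonical variables" and the `tol` threshold are assumptions.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Return orthonormal vectors spanning the same space as `vectors`.

    Uses the modified Gram-Schmidt update: each new vector has its
    projections onto the already-accepted vectors removed one at a time.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = dot(w, q)
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors that are numerically dependent
            basis.append([wi / norm for wi in w])
    return basis

# Hypothetical canonical variables: one score per sample, 4 samples each.
cvs = [[1.0, 2.0, 0.0, 1.0],
       [1.0, 1.0, 1.0, 0.0],
       [0.0, 1.0, 1.0, 1.0]]
ortho = gram_schmidt(cvs)
```

After this step every pair of returned vectors has an inner product of approximately zero, which is the orthogonality property the abstract refers to.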
Affiliation(s)
- Min-Zhi Jiang
- Department of Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- François Aguet
- Illumina Artificial Intelligence Laboratory, Illumina, Inc., San Diego, California, United States of America
- Kristin Ardlie
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Jiawen Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Elaine Cornell
- Laboratory for Clinical Biochemistry Research, University of Vermont, Burlington, Vermont, United States of America
- Dan Cruz
- Department of Medicine, Cardiology, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
- Peter Durda
- Department of Pathology & Laboratory Medicine, University of Vermont, Colchester, Vermont, United States of America
- Stacey B. Gabriel
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Robert E. Gerszten
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
- Xiuqing Guo
- Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, University of California at Los Angeles, Torrance, California, United States of America
- Craig W. Johnson
- Department of Biostatistics, University of Washington at Seattle, Seattle, Washington, United States of America
- Silva Kasela
- New York Genome Center, New York, New York, United States of America
- Leslie A. Lange
- Department of Epidemiology, Department of Medicine, Division of Biomedical Informatics and Personalized Medicine, Lifecourse Epidemiology of Adiposity & Diabetes Center, Aurora, Colorado, United States of America
- Tuuli Lappalainen
- New York Genome Center, New York, New York, United States of America
- Yongmei Liu
- Department of Medicine, Cardiology and Neurology, Duke University Medical Center, Durham, North Carolina, United States of America
- Alex P. Reiner
- Department of Epidemiology, University of Washington, Seattle, Washington, United States of America
- Josh Smith
- Northwest Genomic Center, University of Washington, Seattle, Washington, United States of America
- Tamar Sofer
- Department of Biostatistics, Harvard Medical School, Medicine-Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Kent D. Taylor
- Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, University of California at Los Angeles, Torrance, California, United States of America
- Russell P. Tracy
- Department of Pathology & Laboratory Medicine, University of Vermont, Colchester, Vermont, United States of America
- David J. VanDenBerg
- Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America
- James G. Wilson
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
- Stephen S. Rich
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia, United States of America
- Jerome I. Rotter
- Department of Pediatrics, Genomic Outcomes, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, University of California at Los Angeles, Torrance, California, United States of America
- Michael I. Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Laura M. Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
18
Wu Z, Lohmöller J, Kuhl C, Wehrle K, Jankowski J. Use of Computation Ecosystems to Analyze the Kidney-Heart Crosstalk. Circ Res 2023; 132:1084-1100. [PMID: 37053282 DOI: 10.1161/circresaha.123.321765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/15/2023]
Abstract
The identification of mediators for physiologic processes, correlation of molecular processes, or even pathophysiological processes within a single organ such as the kidney or heart has been extensively studied to answer specific research questions using organ-centered approaches in the past 50 years. However, it has become evident that these approaches do not adequately complement each other and display a distorted single-disease progression, lacking holistic multilevel/multidimensional correlations. Holistic approaches have become increasingly significant in understanding and uncovering high dimensional interactions and molecular overlaps between different organ systems in the pathophysiology of multimorbid and systemic diseases like cardiorenal syndrome because of pathological heart-kidney crosstalk. Holistic approaches to unraveling multimorbid diseases are based on the integration, merging, and correlation of extensive, heterogeneous, and multidimensional data from different data sources, both -omics and nonomics databases. These approaches aimed at generating viable and translatable disease models using mathematical, statistical, and computational tools, thereby creating first computational ecosystems. As part of these computational ecosystems, systems medicine solutions focus on the analysis of -omics data in single-organ diseases. However, the data-scientific requirements to address the complexity of multimodality and multimorbidity reach far beyond what is currently available and require multiphased and cross-sectional approaches. These approaches break down complexity into small and comprehensible challenges. Such holistic computational ecosystems encompass data, methods, processes, and interdisciplinary knowledge to manage the complexity of multiorgan crosstalk. 
Therefore, this review summarizes the current knowledge of kidney-heart crosstalk, along with methods and opportunities that arise from the novel application of computational ecosystems providing a holistic analysis on the example of kidney-heart crosstalk.
Affiliation(s)
- Zhuojun Wu
- Institute of Molecular Cardiovascular Research (Z.W., J.J.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Department of Radiology (C.K.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Johannes Lohmöller
- Medical Faculty, and Department of Computer Science, Communication and Distributed Systems (COMSYS) (J.L., K.W.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Christiane Kuhl
- Department of Radiology (C.K.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Klaus Wehrle
- Institute of Molecular Cardiovascular Research (Z.W., J.J.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Medical Faculty, and Department of Computer Science, Communication and Distributed Systems (COMSYS) (J.L., K.W.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Joachim Jankowski
- Institute of Molecular Cardiovascular Research (Z.W., J.J.), Rheinisch-Westfälische Technische Hochschule Aachen University, Germany
- Department of Pathology, Cardiovascular Research Institute Maastricht (CARIM), University of Maastricht, The Netherlands (J.J.)
- Aachen-Maastricht Institute for Cardiorenal Disease (AMICARE), University Hospital Rheinisch-Westfälische Technische Hochschule Aachen, Germany (J.J.)
19
Price BA, Marron JS, Mose LE, Perou CM, Parker JS. Translating transcriptomic findings from cancer model systems to humans through joint dimension reduction. Commun Biol 2023; 6:179. [PMID: 36797360 PMCID: PMC9935626 DOI: 10.1038/s42003-023-04529-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Accepted: 01/25/2023] [Indexed: 02/18/2023] Open
Abstract
Model systems are an essential resource in cancer research. They simulate effects that we can infer into humans, but come at a risk of inaccurately representing human biology. This inaccuracy can lead to inconclusive experiments or misleading results, urging the need for an improved process for translating model system findings into human-relevant data. We present a process for applying joint dimension reduction (jDR) to horizontally integrate gene expression data across model systems and human tumor cohorts. We then use this approach to combine human TCGA gene expression data with data from human cancer cell lines and mouse model tumors. By identifying the aspects of genomic variation joint-acting across cohorts, we demonstrate how predictive modeling and clinical biomarkers from model systems can be improved.
Affiliation(s)
- Brandon A. Price
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- J. S. Marron
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Lisle E. Mose
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Charles M. Perou
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Joel S. Parker
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
20
Harbig TA, Fratte J, Krone M, Nieselt K. OmicsTIDE: interactive exploration of trends in multi-omics data. Bioinformatics Advances 2023; 3:vbac093. [PMID: 36698763 PMCID: PMC9869718 DOI: 10.1093/bioadv/vbac093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Revised: 10/18/2022] [Accepted: 12/06/2022] [Indexed: 01/22/2023]
Abstract
Motivation: The increasing amount of data produced by omics technologies has enabled researchers to study phenomena across multiple omics layers. Besides data-driven analysis strategies, interactive visualization tools have been developed for a more transparent analysis. However, most state-of-the-art tools do not reconstruct the impact of a single omics layer on the integration result. Results: We developed a data classification scheme focusing on different aspects of multi-omics datasets for a systemic understanding. Based on this classification, we developed the Omics Trend-comparing Interactive Data Explorer (OmicsTIDE), an interactive visualization tool for the comparison of gene-based quantitative omics data. The tool consists of a computational part that clusters omics datasets to determine trends and an interactive visualization. The trends are visualized as profile plots and are connected by a Sankey diagram that allows for an interactive pairwise trend comparison to discover concordant and discordant trends. Moreover, large-scale omics datasets are broken down into small subsets that can be analyzed functionally using Gene Ontology enrichment within few analysis steps. We demonstrate the interactive analysis using OmicsTIDE with two case studies focusing on different experimental designs. Availability and implementation: OmicsTIDE is a web tool available via http://omicstide-tuevis.cs.uni-tuebingen.de/. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
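The pairwise trend comparison described here can be pictured as counting how many genes flow from each trend cluster in one dataset to each trend cluster in the other; those counts are what the links of a Sankey diagram would encode. The sketch below is a simplified illustration and not OmicsTIDE's code. It assumes that trend labels are directly comparable across the two datasets and treats equal labels as concordant; the gene names, labels, and the `trend_links` helper are invented for the example.

```python
from collections import Counter

def trend_links(trends_a, trends_b):
    """Count genes shared between each pair of trend clusters.

    trends_a / trends_b map gene -> trend label in dataset A / B.
    Only genes present in both datasets contribute to a link.
    """
    shared = trends_a.keys() & trends_b.keys()
    return Counter((trends_a[g], trends_b[g]) for g in shared)

# Hypothetical trend assignments from two omics layers.
transcriptome = {"g1": "up", "g2": "up", "g3": "down", "g4": "down"}
proteome = {"g1": "up", "g2": "down", "g3": "down", "g5": "up"}

links = trend_links(transcriptome, proteome)

# Simplifying assumption: same label in both layers means concordant.
concordant = sum(n for (a, b), n in links.items() if a == b)
discordant = sum(n for (a, b), n in links.items() if a != b)
```

Genes that appear in only one dataset (`g4` and `g5` above) are left out of the links in this sketch; a real tool would need an explicit policy for them.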
Affiliation(s)
- Theresa A Harbig
- Institute for Bioinformatics and Medical Informatics, University of Tuebingen, Tuebingen 72076, Germany
- Julian Fratte
- Institute for Bioinformatics and Medical Informatics, University of Tuebingen, Tuebingen 72076, Germany
- Michael Krone
- Institute for Bioinformatics and Medical Informatics, University of Tuebingen, Tuebingen 72076, Germany
- Kay Nieselt
- Institute for Bioinformatics and Medical Informatics, University of Tuebingen, Tuebingen 72076, Germany
| |
Collapse
|
21
|
Niranjan V, Uttarkar A, Kaul A, Varghese M. A Machine Learning-Based Approach Using Multi-omics Data to Predict Metabolic Pathways. Methods Mol Biol 2023; 2553:441-452. [PMID: 36227554 DOI: 10.1007/978-1-0716-2617-7_19] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
Integrative analysis methods are continuously evolving to extract accurate insights from experimental data on biological systems. Multi-omics data can be integrated with predictive machine learning algorithms to produce highly accurate results. This protocol chapter describes the steps required to apply ML-based multi-omics integration methods to biological datasets, covering both the analysis and the visual interpretation of the results.
Affiliation(s)
- Vidya Niranjan
  - Department of Biotechnology, R V College of Engineering, Mysuru Road, Kengeri, Bengaluru, India
- Akshay Uttarkar
  - Department of Biotechnology, R V College of Engineering, Mysuru Road, Kengeri, Bengaluru, India
- Aakaanksha Kaul
  - Department of Biotechnology, R V College of Engineering, Mysuru Road, Kengeri, Bengaluru, India
- Maryanne Varghese
  - Department of Biotechnology, R V College of Engineering, Mysuru Road, Kengeri, Bengaluru, India
|
22
|
Jihad M, Yet İ. Multiomics Integration at Single-Cell Resolution Using Bayesian Networks: A Case Study in Hepatocellular Carcinoma. OMICS : A JOURNAL OF INTEGRATIVE BIOLOGY 2023; 27:24-33. [PMID: 36602810 DOI: 10.1089/omi.2022.0170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
Multiomics data integration is one of the leading frontiers of complex disease research and integrative biology. The advances in single-cell sequencing technologies offer yet another crucial dimension in multiomics research. The single-cell studies enable the study and integration of multiomics data simultaneously in the same cell. We report in this study multiomics data integration in single-cell resolution using Bayesian networks (BNs) in a case study of hepatocellular carcinoma (HCC). A BN encodes the conditional dependencies/independencies of variables using a graphical model with an accompanying joint probability. RNA-seq and Reduced Representation Bisulfite Sequencing data were analyzed separately, and copy number variations were estimated by the hidden Markov model method. Several BN models were constructed to reveal omics' causal and associational relationships. These methods were subjected to a validation study using an independent data set. We show the heterogeneity of the multiple cellular layers of HCC at single-cell omics resolution by identifying best-fitted BN models of 295 genes. We also provide novel insights into the multiomics mechanistic relationships in the human lymphocyte antigen class I genes in HCC. To the best of our knowledge, this is the first study to focus on integrating omics data using a machine learning algorithm, BNs, at the single-cell resolution using a case study of HCC.
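The statement that a BN encodes conditional dependencies together with a joint probability can be made concrete with a toy example: the joint distribution factorizes into one conditional table per node given its parents. The sketch below is a minimal Python illustration under stated assumptions. The two-node network (methylation pointing to expression), the state names, and every probability value are hypothetical and are not taken from the study.

```python
from itertools import product

# Hypothetical two-node network: methylation -> expression.
p_methylation = {"high": 0.3, "low": 0.7}
p_expression_given_methylation = {
    "high": {"on": 0.2, "off": 0.8},
    "low": {"on": 0.9, "off": 0.1},
}

def joint(m, e):
    """P(M=m, E=e) = P(M=m) * P(E=e | M=m), the BN factorization."""
    return p_methylation[m] * p_expression_given_methylation[m][e]

def posterior_methylation(e):
    """P(M | E=e) by Bayes' rule, enumerating the joint distribution."""
    evidence = sum(joint(m, e) for m in p_methylation)
    return {m: joint(m, e) / evidence for m in p_methylation}

# The factorized joint must sum to one over all state combinations.
total = sum(joint(m, e) for m, e in product(p_methylation, ["on", "off"]))
posterior = posterior_methylation("on")
```

With more nodes the same product simply gains one factor per variable, which is what keeps a BN tractable compared with storing the full joint table.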
Affiliation(s)
- Muntadher Jihad
  - Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
- İdil Yet
  - Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
|
23
|
Hao X, Cheng S, Jiang B, Xin S. Applying multi-omics techniques to the discovery of biomarkers for acute aortic dissection. Front Cardiovasc Med 2022; 9:961991. [PMID: 36588568 PMCID: PMC9797526 DOI: 10.3389/fcvm.2022.961991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2022] [Accepted: 11/28/2022] [Indexed: 12/23/2022] Open
Abstract
Acute aortic dissection (AAD) is a cardiovascular disease that manifests suddenly and fatally. Due to the lack of specific early symptoms, many patients with AAD are overlooked or misdiagnosed, which is undoubtedly catastrophic for patients. The particular pathogenic mechanism of AAD is still unknown, which makes clinical pharmacological therapy extremely difficult. Therefore, it is necessary and crucial to find and employ unique biomarkers for AAD as soon as possible in clinical practice and research. This will aid in the early detection of AAD and give clear guidelines for the creation of focused treatment agents. This goal has been made attainable over the past 20 years by the quick advancement of omics technologies and the development of high-throughput tissue specimen biomarker screening. The primary histology data support and add to one another to create a more thorough and three-dimensional picture of the disease. Based on the introduction of the main histology technologies, in this review, we summarize the current situation and most recent developments in the application of multi-omics technologies to AAD biomarker discovery and emphasize the significance of focusing on concepts for integrating multi-omics data. In this context, we seek to offer fresh concepts and recommendations for fundamental investigation, perspective innovation, and therapeutic development in AAD.
Affiliation(s)
- Xinyu Hao
  - Department of Vascular Surgery, The First Affiliated Hospital of China Medical University, China Medical University, Shenyang, China
  - Key Laboratory of Pathogenesis, Prevention and Therapeutics of Aortic Aneurysm, Shenyang, Liaoning, China
- Shuai Cheng
  - Department of Vascular Surgery, The First Affiliated Hospital of China Medical University, China Medical University, Shenyang, China
  - Key Laboratory of Pathogenesis, Prevention and Therapeutics of Aortic Aneurysm, Shenyang, Liaoning, China
- Bo Jiang
  - Department of Vascular Surgery, The First Affiliated Hospital of China Medical University, China Medical University, Shenyang, China
  - Key Laboratory of Pathogenesis, Prevention and Therapeutics of Aortic Aneurysm, Shenyang, Liaoning, China
- Shijie Xin
  - Department of Vascular Surgery, The First Affiliated Hospital of China Medical University, China Medical University, Shenyang, China
  - Key Laboratory of Pathogenesis, Prevention and Therapeutics of Aortic Aneurysm, Shenyang, Liaoning, China
  - *Correspondence: Shijie Xin,
|
24
|
Zhang R, Zhang C, Yu C, Dong J, Hu J. Integration of multi-omics technologies for crop improvement: Status and prospects. FRONTIERS IN BIOINFORMATICS 2022; 2:1027457. [PMID: 36438626 PMCID: PMC9689701 DOI: 10.3389/fbinf.2022.1027457] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/25/2022] [Accepted: 09/28/2022] [Indexed: 08/03/2023] Open
Abstract
With the rapid development of next-generation sequencing (NGS), multi-omics techniques have been emerging as effective approaches for crop improvement. Here, we focus mainly on addressing the current status and future perspectives toward omics-related technologies and bioinformatic resources with potential applications in crop breeding. Using a large amount of omics-level data from the functional genome, transcriptome, proteome, epigenome, metabolome, and microbiome, clarifying the interaction between gene and phenotype formation will become possible. The integration of multi-omics datasets with pan-omics platforms and systems biology could predict the complex traits of crops and elucidate the regulatory networks for genetic improvement. Different scales of trait predictions and decision-making models will make crop breeding more intelligent. Potential challenges in integrating multi-omics data with studies of gene function and their networks to efficiently select desirable agronomic traits are discussed by proposing some cutting-edge breeding strategies for crop improvement. Multi-omics-integrated approaches together with other artificial intelligence techniques will contribute to broadening and deepening our knowledge of crop precision breeding, resulting in speeding up the breeding process.
|
25
|
Suter P, Dazert E, Kuipers J, Ng CKY, Boldanova T, Hall MN, Heim MH, Beerenwinkel N. Multi-omics subtyping of hepatocellular carcinoma patients using a Bayesian network mixture model. PLoS Comput Biol 2022; 18:e1009767. [PMID: 36067230 PMCID: PMC9481159 DOI: 10.1371/journal.pcbi.1009767] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/21/2021] [Revised: 09/16/2022] [Accepted: 07/18/2022] [Indexed: 11/18/2022] Open
Abstract
Comprehensive molecular characterization of cancer subtypes is essential for predicting clinical outcomes and searching for personalized treatments. We present bnClustOmics, a statistical model and computational tool for multi-omics unsupervised clustering, which serves a dual purpose: Clustering patient samples based on a Bayesian network mixture model and learning the networks of omics variables representing these clusters. The discovered networks encode interactions among all omics variables and provide a molecular characterization of each patient subgroup. We conducted simulation studies that demonstrated the advantages of our approach compared to other clustering methods in the case where the generative model is a mixture of Bayesian networks. We applied bnClustOmics to a hepatocellular carcinoma (HCC) dataset comprising genome (mutation and copy number), transcriptome, proteome, and phosphoproteome data. We identified three main HCC subtypes together with molecular characteristics, some of which are associated with survival even when adjusting for the clinical stage. Cluster-specific networks shed light on the links between genotypes and molecular phenotypes of samples within their respective clusters and suggest targets for personalized treatments.
Affiliation(s)
- Polina Suter
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Eva Dazert
  - Biozentrum, University of Basel, Basel, Switzerland
- Jack Kuipers
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Charlotte K. Y. Ng
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Department for BioMedical Research (DBMR), University of Bern, Bern, Switzerland
  - Department of Biomedicine, University Hospital Basel, University of Basel, Basel, Switzerland
  - Institute of Medical Genetics and Pathology, University Hospital Basel, University of Basel, Basel, Switzerland
- Tuyana Boldanova
  - Department of Biomedicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Markus H. Heim
  - Department of Biomedicine, University Hospital Basel, University of Basel, Basel, Switzerland
  - Department of Gastroenterology and Hepatology, Clarunis, University Center for Gastrointestinal and Liver Diseases, Basel, Switzerland
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
26
|
Leng D, Zheng L, Wen Y, Zhang Y, Wu L, Wang J, Wang M, Zhang Z, He S, Bo X. A benchmark study of deep learning-based multi-omics data fusion methods for cancer. Genome Biol 2022; 23:171. [PMID: 35945544 PMCID: PMC9361561 DOI: 10.1186/s13059-022-02739-2] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 02/22/2022] [Accepted: 07/26/2022] [Indexed: 11/10/2022] Open
Abstract
Background: A fused method using a combination of multi-omics data enables a comprehensive study of complex biological processes and highlights the interrelationship of relevant biomolecules and their functions. Driven by high-throughput sequencing technologies, several promising deep learning methods have been proposed for fusing multi-omics data generated from a large number of samples. Results: In this study, 16 representative deep learning methods are comprehensively evaluated on simulated, single-cell, and cancer multi-omics datasets. For each of the datasets, two tasks are designed: classification and clustering. The classification performance is evaluated by using three benchmarking metrics including accuracy, F1 macro, and F1 weighted. Meanwhile, the clustering performance is evaluated by using four benchmarking metrics including the Jaccard index (JI), C-index, silhouette score, and Davies Bouldin score. For the cancer multi-omics datasets, the methods' strength in capturing the association of multi-omics dimensionality reduction results with survival and clinical annotations is further evaluated. The benchmarking results indicate that moGAT achieves the best classification performance. Meanwhile, efmmdVAE, efVAE, and lfmmdVAE show the most promising performance across all complementary contexts in clustering tasks. Conclusions: Our benchmarking results not only provide a reference for biomedical researchers to choose appropriate deep learning-based multi-omics data fusion methods, but also suggest the future directions for the development of more effective multi-omics data fusion methods. The deep learning frameworks are available at https://github.com/zhenglinyi/DL-mo.
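The three classification metrics named in this abstract differ only in how per-class scores are averaged. The sketch below is a minimal pure-Python illustration using the standard definitions, not the benchmark's own evaluation code: the function name `classification_metrics` and the toy subtype labels are assumptions.

```python
from collections import Counter

def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro F1, weighted F1) for two equal-length label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro F1: every class counts equally, regardless of its size.
    macro = sum(per_class.values()) / len(labels)
    # Weighted F1: each class is weighted by its share of true samples.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return accuracy, macro, weighted

# Toy example with made-up subtype labels.
accuracy, macro_f1, weighted_f1 = classification_metrics(
    ["A", "A", "A", "B"], ["A", "A", "B", "B"]
)
```

On imbalanced subtype data the macro and weighted averages can diverge noticeably, which is one reason for reporting both.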
Affiliation(s)
- Dongjin Leng
  - Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
- Linyi Zheng
  - School of Informatics, Xiamen University, Xiamen, People’s Republic of China
- Yuqi Wen
  - Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
- Yunhao Zhang
  - School of Informatics, Xiamen University, Xiamen, People’s Republic of China
- Lianlian Wu
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, People’s Republic of China
- Jing Wang
  - School of Medicine, Tsinghua University, Beijing, People’s Republic of China
- Meihong Wang
  - School of Informatics, Xiamen University, Xiamen, People’s Republic of China
- Zhongnan Zhang
  - School of Informatics, Xiamen University, Xiamen, People’s Republic of China
- Song He
  - Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
- Xiaochen Bo
  - Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
|
27
|
Li L, Wei Y, Shi G, Yang H, Li Z, Fang R, Cao H, Cui Y. Multi-omics data integration for subtype identification of Chinese lower-grade gliomas: a joint similarity network fusion approach. Comput Struct Biotechnol J 2022; 20:3482-3492. [PMID: 35860412 PMCID: PMC9284445 DOI: 10.1016/j.csbj.2022.06.065] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/19/2022] [Revised: 06/30/2022] [Accepted: 06/30/2022] [Indexed: 12/28/2022] Open
Abstract
Lower-grade gliomas (LGG), characterized by heterogeneity and invasiveness, originate from the central nervous system. Although studies focusing on molecular subtyping and molecular characteristics have provided novel insights into improving the diagnosis and therapy of LGG, there is an urgent need to identify new molecular subtypes and biomarkers that are promising to improve patient survival outcomes. Here, we proposed a joint similarity network fusion (Joint-SNF) method to integrate different omics data types to construct a fused network using the Joint and Individual Variation Explained (JIVE) technique under the SNF framework. Focusing on the joint network structure, a spectral clustering method was employed to obtain subtypes of patients. Simulation studies show that the proposed Joint-SNF method outperforms the original SNF approach under various simulation scenarios. We further applied the method to a Chinese LGG data set including mRNA expression, DNA methylation and microRNA (miRNA). Three molecular subtypes were identified and showed statistically significant differences in patient survival outcomes. The five-year mortality rates of the three subtypes are 80.8%, 32.1%, and 34.4%, respectively. After adjusting for clinically relevant covariates, the death risk of patients in Cluster 1 was 5.06 times higher than that of patients in other clusters. The fused network attained by the proposed Joint-SNF method enhances strong similarities and thus greatly improves subtyping performance compared to the original SNF method. The findings in the real application may provide important clues for improving patient survival outcomes and for precision treatment for Chinese LGG patients. An R package to implement the method can be accessed on GitHub at https://github.com/Sameerer/Joint-SNF.
Affiliation(s)
- Lingmei Li
  - Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Yifang Wei
  - Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Guojing Shi
  - Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Haitao Yang
  - Division of Health Statistics, School of Public Health, Hebei Medical University, Shijiazhuang, Hebei 050017, PR China
- Zhi Li
  - Department of Hematology, Taiyuan Central Hospital of Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Ruiling Fang
  - Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
- Hongyan Cao
  - Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, PR China
  - Shanxi Medical University-Yidu Cloud Institute of Medical Data Science, Taiyuan, Shanxi 030001, PR China
  - Corresponding authors at: Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, PR China
- Yuehua Cui
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
  - Corresponding authors at: Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, PR China
|
28
|
Gliozzo J, Mesiti M, Notaro M, Petrini A, Patak A, Puertas-Gallardo A, Paccanaro A, Valentini G, Casiraghi E. Heterogeneous data integration methods for patient similarity networks. Brief Bioinform 2022; 23:6604996. [PMID: 35679533 PMCID: PMC9294435 DOI: 10.1093/bib/bbac207] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/07/2021] [Revised: 04/14/2022] [Accepted: 05/04/2022] [Indexed: 12/29/2022] Open
Abstract
Patient similarity networks (PSNs), where patients are represented as nodes and their similarities as weighted edges, are being increasingly used in clinical research. These networks provide an insightful summary of the relationships among patients and can be exploited by inductive or transductive learning algorithms for the prediction of patient outcome, phenotype and disease risk. PSNs can also be easily visualized, thus offering a natural way to inspect complex heterogeneous patient data and providing some level of explainability of the predictions obtained by machine learning algorithms. The advent of high-throughput technologies, enabling us to acquire high-dimensional views of the same patients (e.g. omics data, laboratory data, imaging data), calls for the development of data fusion techniques for PSNs in order to leverage this rich heterogeneous information. In this article, we review existing methods for integrating multiple biomedical data views to construct PSNs, together with the different patient similarity measures that have been proposed. We also review methods that have appeared in the machine learning literature but have not yet been applied to PSNs, thus providing a resource to navigate the vast machine learning literature existing on this topic. In particular, we focus on methods that could be used to integrate very heterogeneous datasets, including multi-omics data as well as data derived from clinical information and medical imaging.
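The PSN idea above, patients as nodes and similarities as weighted edges, can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not a method from the review: the Gaussian kernel, the `sigma` bandwidth, the plain averaging used to combine two views, and the toy feature values are all choices made for the example. Averaging is the simplest possible fusion and is not similarity network fusion (SNF).

```python
import math

def similarity_matrix(features, sigma=1.0):
    """Gaussian-kernel similarity between every pair of patients.

    features: one feature vector (list of floats) per patient.
    """
    n = len(features)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            sim[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return sim

def fuse_by_average(matrices):
    """Combine per-view similarity matrices by element-wise averaging."""
    n = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
        for i in range(n)
    ]

# Two made-up data views of the same three patients.
view_omics = [[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]]
view_clinical = [[1.0], [1.2], [5.0]]
fused = fuse_by_average(
    [similarity_matrix(view_omics), similarity_matrix(view_clinical)]
)
```

In `fused`, patients 0 and 1 are joined by a heavy edge while patient 2 is nearly disconnected from both, which is the structure a downstream clustering or label-propagation step would exploit.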
Affiliation(s)
- Jessica Gliozzo
  - AnacletoLab - Computer Science Department, Universitá degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - European Commission, Joint Research Centre (JRC), Ispra (VA), Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Marco Mesiti
  - AnacletoLab - Computer Science Department, Universitá degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Marco Notaro
  - AnacletoLab - Computer Science Department, Universitá degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Alessandro Petrini
  - AnacletoLab - Computer Science Department, Universitá degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
- Alex Patak
  - European Commission, Joint Research Centre (JRC), Ispra (VA), Italy
- Alberto Paccanaro
  - Department of Computer Science, Royal Holloway, University of London, Egham, TW20 0EX UK
  - School of Applied Mathematics (EMAp), Fundação Getúlio Vargas, Rio de Janeiro Brazil
- Giorgio Valentini
  - AnacletoLab - Computer Science Department, Universitá degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
  - DSRC UNIMI, Data Science Research Center, Milano, 20135, Italy
  - ELLIS, European Laboratory for Learning and Intelligent Systems, Berlin, Germany
- Elena Casiraghi
  - AnacletoLab - Computer Science Department, Universitá degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy
  - CINI, Infolife National Laboratory, Roma, Italy
|
29
|
Mokhtari A, Porte B, Belzeaux R, Etain B, Ibrahim EC, Marie-Claire C, Lutz PE, Delahaye-Duriez A. The molecular pathophysiology of mood disorders: From the analysis of single molecular layers to multi-omic integration. Prog Neuropsychopharmacol Biol Psychiatry 2022; 116:110520. [PMID: 35104608 DOI: 10.1016/j.pnpbp.2022.110520] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/07/2021] [Revised: 01/22/2022] [Accepted: 01/22/2022] [Indexed: 12/14/2022]
Abstract
Next-generation sequencing now enables the rapid and affordable production of reliable biological data at multiple molecular levels, collectively referred to as "omics". To maximize the potential for discovery, computational biologists have created and adapted integrative multi-omic analytical methods. When applied to diseases with traceable pathophysiology such as cancer, these new algorithms and statistical approaches have enabled the discovery of clinically relevant molecular mechanisms and biomarkers. In contrast, these methods have been much less applied to the field of molecular psychiatry, although diagnostic and prognostic biomarkers are similarly needed. In the present review, we first briefly summarize main findings from two decades of studies that investigated single molecular processes in relation to mood disorders. Then, we conduct a systematic review of multi-omic strategies that have been proposed and used more recently. We also list databases and types of data available to researchers for future work. Finally, we present the newest methodologies that have been employed for multi-omics integration in other medical fields, and discuss their potential for molecular psychiatry studies.
Affiliation(s)
- Amazigh Mokhtari
  - NeuroDiderot, Inserm U1141, Université de Paris, F-75019 Paris, France
- Baptiste Porte
  - NeuroDiderot, Inserm U1141, Université de Paris, F-75019 Paris, France
- Raoul Belzeaux
  - Aix Marseille Université CNRS, Institut de Neurosciences de la Timone, F-13005 Marseille, France
  - Fondation FondaMental, F-94000 Créteil, France
  - Assistance Publique Hôpitaux de Marseille, Pôle de psychiatrie, pédopsychiatrie et addictologie, F-13005 Marseille, France
- Bruno Etain
  - Assistance Publique des Hôpitaux de Paris, GHU Lariboisière-Saint Louis-Fernand Widal, DMU Neurosciences, Département de psychiatrie et de Médecine Addictologique, F-75010 Paris, France
  - Université de Paris, INSERM UMR-S 1144, Optimisation thérapeutique en neuropsychopharmacologie, OTeN, F-75006 Paris, France
- El Cherif Ibrahim
  - Aix Marseille Université CNRS, Institut de Neurosciences de la Timone, F-13005 Marseille, France
- Cynthia Marie-Claire
  - Université de Paris, INSERM UMR-S 1144, Optimisation thérapeutique en neuropsychopharmacologie, OTeN, F-75006 Paris, France
- Pierre-Eric Lutz
  - Centre National de la Recherche Scientifique, Université de Strasbourg, Fédération de Médecine Translationnelle de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR3212, F-67000 Strasbourg, France
  - Douglas Mental Health University Institute, McGill University, QC H4H 1R3 Montréal, Canada
- Andrée Delahaye-Duriez
  - NeuroDiderot, Inserm U1141, Université de Paris, F-75019 Paris, France
  - Assistance Publique des Hôpitaux de Paris, Unité de médecine génomique, Département BioPhaReS, Hôpital Jean Verdier, Hôpitaux Universitaires de Paris Seine Saint Denis, F-93140 Bondy, France
  - Université Sorbonne Paris Nord, F-93000 Bobigny, France
|
30
|
From single-omics to interactomics: How can ligand-induced perturbations modulate single-cell phenotypes? ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2022; 131:45-83. [PMID: 35871896 DOI: 10.1016/bs.apcsb.2022.05.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Abstract
Cells suffer from perturbations by different stimuli, which consequently give rise to individual alterations in their profile and function that may end up affecting the tissue as a whole. This is no different if we consider the effect of a therapeutic agent on a biological system. As cells are exposed to external ligands, their profile can change at different single-omics levels. Detecting how these changes take place through different sequencing technologies is key to a better understanding of the effects of therapeutic agents. Single-cell RNA-sequencing stands out as one of the most common approaches for cell profiling and perturbation analysis. As a result, single-cell transcriptomics data can be integrated with other omics data sources, such as proteomics and epigenomics data, to clarify the perturbation effects and mechanism at the cell level. Appropriate computational tools are key to processing and integrating the available information. This chapter focuses on the recent advances in ligand-induced perturbation and single-cell omics computational tools and algorithms, their current limitations, and how the deluge of data can be used to improve the current process of drug research and development.
|
31
|
Gonzalez-Reymundez A, Grueneberg A, Lu G, Alves FC, Rincon G, Vazquez AI. MOSS: multi-omic integration with sparse value decomposition. Bioinformatics 2022; 38:2956-2958. [PMID: 35561193 PMCID: PMC9113319 DOI: 10.1093/bioinformatics/btac179] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/18/2021] [Revised: 03/07/2022] [Accepted: 03/23/2022] [Indexed: 02/03/2023] Open
Abstract
Summary: This article presents multi-omic integration with sparse value decomposition (MOSS), a free and open-source R package for integration and feature selection in multiple large omics datasets. This package is computationally efficient and offers biological insight through capabilities such as cluster analysis and identification of informative omic features. Availability and implementation: https://CRAN.R-project.org/package=MOSS. Supplementary information: Supplementary information can be found at https://github.com/agugonrey/GonzalezReymundez2021.
Affiliation(s)
- Alexander Grueneberg
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
- Guanqi Lu
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
- Filipe Couto Alves
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
- Gonzalo Rincon
  - Genus PLC Inc., Genome Sciences R&D, De Forest, WI 53532, USA
- Ana I Vazquez
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
|
32
|
Zhang X, Zhou Z, Xu H, Liu CT. Integrative clustering methods for multi-omics data. WILEY INTERDISCIPLINARY REVIEWS. COMPUTATIONAL STATISTICS 2022; 14. [PMID: 35573155 PMCID: PMC9097984 DOI: 10.1002/wics.1553] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/06/2022]
Abstract
Integrative analysis of multi-omics data has drawn much attention from the scientific community due to the technological advancements which have generated various omics data. Leveraging these multi-omics data potentially provides a more comprehensive view of the disease mechanism or biological processes. Integrative multi-omics clustering is an unsupervised integrative method specifically used to find coherent groups of samples or features by utilizing information across multi-omics data. It aims to better stratify diseases and to suggest biological mechanisms and potential targeted therapies for the diseases. However, applying integrative multi-omics clustering is both statistically and computationally challenging due to various reasons such as high dimensionality and heterogeneity. In this review, we summarized integrative multi-omics clustering methods into three general categories: concatenated clustering, clustering of clusters, and interactive clustering based on when and how the multi-omics data are processed for clustering. We further classified the methods into different approaches under each category based on the main statistical strategy used during clustering. In addition, we have provided recommended practices tailored to four real-life scenarios to help researchers to strategize their selection in integrative multi-omics clustering methods for their future studies.
Affiliation(s)
- Xiaoyu Zhang
  - Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
- Zhenwei Zhou
  - Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
- Hanfei Xu
  - Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
- Ching-Ti Liu
  - Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
|
33
|
Pierre-Jean M, Mauger F, Deleuze JF, Le Floch E. PIntMF: Penalized Integrative Matrix Factorization method for multi-omics data. Bioinformatics 2021; 38:900-907. [PMID: 34849583 PMCID: PMC8796362 DOI: 10.1093/bioinformatics/btab786] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/11/2021] [Revised: 09/30/2021] [Accepted: 11/11/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: It is more and more common to perform multi-omics analyses to explore the genome at diverse levels and not only at a single level. Through integrative statistical methods, multi-omics data have the power to reveal new biological processes, potential biomarkers and subgroups in a cohort. Matrix factorization (MF) is an unsupervised statistical method that allows a clustering of individuals, but also reveals relevant omics variables from the various blocks.
RESULTS: Here, we present PIntMF (Penalized Integrative Matrix Factorization), an MF model with sparsity, positivity and equality constraints. To induce sparsity in the model, we used a classical Lasso penalization on variable and individual matrices. For the matrix of samples, sparsity helps in the clustering, while normalization (matching an equality constraint) of inferred coefficients is added to improve interpretation. Moreover, we added an automatic tuning of the sparsity parameters using the famous glmnet package. We also proposed three criteria to help the user to choose the number of latent variables. PIntMF was compared with other state-of-the-art integrative methods including feature selection techniques in both synthetic and real data. PIntMF succeeds in finding relevant clusters as well as variables in two types of simulated data (correlated and uncorrelated). Next, PIntMF was applied to two real datasets (Diet and cancer), and it revealed interpretable clusters linked to available clinical data. Our method outperforms the existing ones on two criteria (clustering and variable selection). We show that PIntMF is an easy, fast and powerful tool to extract patterns and cluster samples from multi-omics data.
AVAILABILITY AND IMPLEMENTATION: An R package is available at https://github.com/mpierrejean/pintmf.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Florence Mauger
  - Centre National de Recherche en Génomique Humaine, CEA, Université de Paris-Saclay, Evry, France
- Jean-François Deleuze
  - Centre National de Recherche en Génomique Humaine, CEA, Université de Paris-Saclay, Evry, France
- Edith Le Floch
  - Centre National de Recherche en Génomique Humaine, CEA, Université de Paris-Saclay, Evry, France
|
34
|
Miao Z, Humphreys BD, McMahon AP, Kim J. Multi-omics integration in the age of million single-cell data. Nat Rev Nephrol 2021; 17:710-724. [PMID: 34417589 PMCID: PMC9191639 DOI: 10.1038/s41581-021-00463-x] [Citation(s) in RCA: 79] [Impact Index Per Article: 26.3] [Accepted: 06/25/2021] [Indexed: 02/06/2023]
Abstract
An explosion in single-cell technologies has revealed a previously underappreciated heterogeneity of cell types and novel cell-state associations with sex, disease, development and other processes. Starting with transcriptome analyses, single-cell techniques have extended to multi-omics approaches and now enable the simultaneous measurement of data modalities and spatial cellular context. Data are now available for millions of cells, for whole-genome measurements and for multiple modalities. Although analyses of such multimodal datasets have the potential to provide new insights into biological processes that cannot be inferred with a single mode of assay, the integration of very large, complex, multimodal data into biological models and mechanisms represents a considerable challenge. An understanding of the principles of data integration and visualization methods is required to determine what methods are best applied to a particular single-cell dataset. Each class of method has advantages and pitfalls in terms of its ability to achieve various biological goals, including cell-type classification, regulatory network modelling and biological process inference. In choosing a data integration strategy, consideration must be given to whether the multi-omics data are matched (that is, measured on the same cell) or unmatched (that is, measured on different cells) and, more importantly, the overall modelling and visualization goals of the integrated analysis.
Affiliation(s)
- Zhen Miao
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Benjamin D Humphreys
  - Division of Nephrology, Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Andrew P McMahon
  - Department of Stem Cell Biology and Regenerative Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Junhyong Kim
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
35
|
Moingeon P, Kuenemann M, Guedj M. Artificial intelligence-enhanced drug design and development: Toward a computational precision medicine. Drug Discov Today 2021; 27:215-222. [PMID: 34555509 DOI: 10.1016/j.drudis.2021.09.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/28/2021] [Revised: 07/13/2021] [Accepted: 09/14/2021] [Indexed: 12/29/2022]
Abstract
Artificial Intelligence (AI) relies upon a convergence of technologies with further synergies with life science technologies to capture the value of massive multi-modal data in the form of predictive models supporting decision-making. AI and machine learning (ML) enhance drug design and development by improving our understanding of disease heterogeneity, identifying dysregulated molecular pathways and therapeutic targets, designing and optimizing drug candidates, as well as evaluating in silico clinical efficacy. By providing an unprecedented level of knowledge on both patient specificities and drug candidate properties, AI is fostering the emergence of a computational precision medicine allowing the design of therapies or preventive measures tailored to the singularities of individual patients in terms of their physiology, disease features, and exposure to environmental risks.
Affiliation(s)
- Philippe Moingeon
  - Servier, Research and Development, 50 rue Carnot, 92284 Suresnes Cedex, France
- Mélaine Kuenemann
  - Servier, Research and Development, 50 rue Carnot, 92284 Suresnes Cedex, France
- Mickaël Guedj
  - Servier, Research and Development, 50 rue Carnot, 92284 Suresnes Cedex, France
|
36
|
Dong X, Liu C, Dozmorov M. Review of multi-omics data resources and integrative analysis for human brain disorders. Brief Funct Genomics 2021; 20:223-234. [PMID: 33969380 PMCID: PMC8287916 DOI: 10.1093/bfgp/elab024] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 01/05/2021] [Revised: 03/05/2021] [Accepted: 04/12/2021] [Indexed: 12/20/2022] Open
Abstract
In the last decade, massive omics datasets have been generated for human brain research. It is evolving so fast that a timely update is urgently needed. In this review, we summarize the main multi-omics data resources for the human brains of both healthy controls and neuropsychiatric disorders, including schizophrenia, autism, bipolar disorder, Alzheimer's disease, Parkinson's disease, progressive supranuclear palsy, etc. We also review the recent development of single-cell omics in brain research, such as single-nucleus RNA-seq, single-cell ATAC-seq and spatial transcriptomics. We further investigate the integrative multi-omics analysis methods for both tissue and single-cell data. Finally, we discuss the limitations and future directions of the multi-omics study of human brain disorders.
Affiliation(s)
- Xianjun Dong
  - Harvard Medical School, head of the Genomics and Bioinformatics Hub at Brigham and Women’s Hospital
|
37
|
Fiorentino G, Visintainer R, Domenici E, Lauria M, Marchetti L. MOUSSE: Multi-Omics Using Subject-Specific SignaturEs. Cancers (Basel) 2021; 13:3423. [PMID: 34298641 PMCID: PMC8304726 DOI: 10.3390/cancers13143423] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/11/2021] [Revised: 06/29/2021] [Accepted: 06/30/2021] [Indexed: 01/06/2023] Open
Abstract
Simple Summary: Modern profiling technologies have led to relevant progress toward precision medicine and disease management. A new trend in patient classification is to integrate multiple data types for the same subjects to increase the chance of identifying meaningful phenotype groups. However, these methodologies are still in their infancy, with their performance varying widely depending on the biological conditions analyzed. We developed MOUSSE, a new unsupervised and normalization-free tool for multi-omics integration able to maintain good clustering performance across a wide range of omics data. We verified its efficiency in clustering patients based on survival for ten different cancer types. The results we obtained show a higher average score in classification performance than ten other state-of-the-art algorithms. We have further validated the method by identifying a list of biological features potentially involved in patient survival, finding a high degree of concordance with the literature.
Abstract: High-throughput technologies make it possible to produce a large amount of data representing different biological layers, examples of which are genomics, proteomics, metabolomics and transcriptomics. Omics data have been individually investigated to understand the molecular bases of various diseases, but this may not be sufficient to fully capture the molecular mechanisms and the multilayer regulatory processes underlying complex diseases, especially cancer. To overcome this problem, several multi-omics integration methods have been introduced but a commonly agreed standard of analysis is still lacking. In this paper, we present MOUSSE, a novel normalization-free pipeline for unsupervised multi-omics integration. The main innovations are the use of rank-based subject-specific signatures and the use of such signatures to derive subject similarity networks. A separate similarity network was derived for each omics, and the resulting networks were then carefully merged in a way that considered their informative content. We applied it to analyze survival in ten different types of cancer. We produced a meaningful clusterization of the subjects and obtained a higher average classification score than ten state-of-the-art algorithms tested on the same data. As further validation, we extracted from the subject-specific signatures a list of relevant features used for the clusterization and investigated their biological role in survival. We were able to verify that, according to the literature, these features are highly involved in cancer progression and differential survival.
Affiliation(s)
- Giuseppe Fiorentino
  - Fondazione The Microsoft Research, University of Trento Centre for Computational and Systems Biology (COSBI), 38068 Rovereto, Italy
  - Department of Cellular, Computational, and Integrative Biology (CiBio), University of Trento, 38123 Povo, Italy
- Roberto Visintainer
  - Fondazione The Microsoft Research, University of Trento Centre for Computational and Systems Biology (COSBI), 38068 Rovereto, Italy
- Enrico Domenici
  - Fondazione The Microsoft Research, University of Trento Centre for Computational and Systems Biology (COSBI), 38068 Rovereto, Italy
  - Department of Cellular, Computational, and Integrative Biology (CiBio), University of Trento, 38123 Povo, Italy
- Mario Lauria
  - Fondazione The Microsoft Research, University of Trento Centre for Computational and Systems Biology (COSBI), 38068 Rovereto, Italy
  - Department of Mathematics, University of Trento, 38123 Povo, Italy
- Luca Marchetti (corresponding author)
  - Fondazione The Microsoft Research, University of Trento Centre for Computational and Systems Biology (COSBI), 38068 Rovereto, Italy
|
38
|
Brière G, Darbo É, Thébault P, Uricaru R. Consensus clustering applied to multi-omics disease subtyping. BMC Bioinformatics 2021; 22:361. [PMID: 34229612 PMCID: PMC8259015 DOI: 10.1186/s12859-021-04279-1] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 10/20/2020] [Accepted: 06/28/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Facing the diversity of omics data and the difficulty of selecting one result over all those produced by several methods, consensus strategies have the potential to reconcile multiple inputs and to produce robust results.
RESULTS: Here, we introduce ClustOmics, a generic consensus clustering tool that we use in the context of cancer subtyping. ClustOmics relies on a non-relational graph database, which allows for the simultaneous integration of both multiple omics data and results from various clustering methods. This new tool conciliates input clusterings, regardless of their origin, their number, their size or their shape. ClustOmics implements an intuitive and flexible strategy, based upon the idea of evidence accumulation clustering. ClustOmics computes co-occurrences of pairs of samples in input clusters and uses this score as a similarity measure to reorganize data into consensus clusters.
CONCLUSION: We applied ClustOmics to multi-omics disease subtyping on real TCGA cancer data from ten different cancer types. We showed that ClustOmics is robust to heterogeneous qualities of input partitions, smoothing and reconciling preliminary predictions into high-quality consensus clusters, both from a computational and a biological point of view. The comparison to a state-of-the-art consensus-based integration tool, COCA, further corroborated this statement. However, the main interest of ClustOmics is not to compete with other tools, but rather to make profit from their various predictions when no gold-standard metric is available to assess their significance.
AVAILABILITY: The ClustOmics source code, released under MIT license, and the results obtained on TCGA cancer data are available on GitHub: https://github.com/galadrielbriere/ClustOmics.
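The co-occurrence idea in this abstract can be illustrated with a minimal sketch of evidence accumulation clustering. This is not the ClustOmics implementation (which the abstract says is built on a graph database); the function names, the toy partitions, and the final step of grouping samples by connected components above a support threshold are all assumptions made for illustration.

```python
from itertools import combinations


def co_association(partitions):
    """Count, for every pair of samples, how many input clusterings
    place the two samples in the same cluster."""
    samples = sorted(set().union(*[p.keys() for p in partitions]))
    counts = {pair: 0 for pair in combinations(samples, 2)}
    for labels in partitions:
        for a, b in counts:
            if a in labels and b in labels and labels[a] == labels[b]:
                counts[(a, b)] += 1
    return samples, counts


def consensus_clusters(samples, counts, min_support):
    """Assumed consensus step: link pairs co-clustered at least
    `min_support` times and return the connected components."""
    parent = {s: s for s in samples}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), n in counts.items():
        if n >= min_support:
            parent[find(a)] = find(b)

    groups = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical input: three clusterings of four samples, e.g. one per omics
# layer or per clustering method. Label values only matter within a clustering.
partitions = [
    {"s1": 0, "s2": 0, "s3": 1, "s4": 1},
    {"s1": "a", "s2": "a", "s3": "a", "s4": "b"},
    {"s1": 1, "s2": 1, "s3": 2, "s4": 2},
]
samples, counts = co_association(partitions)
clusters = consensus_clusters(samples, counts, min_support=2)
print(clusters)
```

Because the score only counts agreement between pairs, the input clusterings may differ in number of clusters and in label names, which is the property the abstract highlights.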
Affiliation(s)
- Galadriel Brière
  - CNRS, Bordeaux INP, LaBRI, UMR 5800, Univ. Bordeaux, 33400, Talence, France
  - INRA, Bordeaux INP, NutriNeuro, UMR 1286, Univ. Bordeaux, 33000, Bordeaux, France
- Élodie Darbo
  - CNRS, Bordeaux INP, LaBRI, UMR 5800, Univ. Bordeaux, 33400, Talence, France
  - INSERM U1218, Institut Bergonié, Univ. Bordeaux, 33076, Bordeaux, France
- Patricia Thébault
  - CNRS, Bordeaux INP, LaBRI, UMR 5800, Univ. Bordeaux, 33400, Talence, France
- Raluca Uricaru
  - CNRS, Bordeaux INP, LaBRI, UMR 5800, Univ. Bordeaux, 33400, Talence, France
|
39
|
Reel PS, Reel S, Pearson E, Trucco E, Jefferson E. Using machine learning approaches for multi-omics data analysis: A review. Biotechnol Adv 2021; 49:107739. [PMID: 33794304 DOI: 10.1016/j.biotechadv.2021.107739] [Citation(s) in RCA: 265] [Impact Index Per Article: 88.3] [Received: 12/15/2020] [Revised: 03/01/2021] [Accepted: 03/25/2021] [Indexed: 02/06/2023]
Abstract
With the development of modern high-throughput omic measurement platforms, it has become essential for biomedical studies to undertake an integrative (combined) approach to fully utilise these data to gain insights into biological systems. Data from various omics sources such as genetics, proteomics, and metabolomics can be integrated to unravel the intricate working of systems biology using machine learning-based predictive algorithms. Machine learning methods offer novel techniques to integrate and analyse the various omics data enabling the discovery of new biomarkers. These biomarkers have the potential to help in accurate disease prediction, patient stratification and delivery of precision medicine. This review paper explores different integrative machine learning methods which have been used to provide an in-depth understanding of biological systems during normal physiological functioning and in the presence of a disease. It provides insight and recommendations for interdisciplinary professionals who envisage employing machine learning skills in multi-omics studies.
Affiliation(s)
- Parminder S Reel
  - Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Smarti Reel
  - Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Ewan Pearson
  - Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Emanuele Trucco
  - VAMPIRE project, Computing, School of Science and Engineering, University of Dundee, Dundee, United Kingdom
- Emily Jefferson
  - Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
|
40
|
Picard M, Scott-Boyer MP, Bodein A, Périn O, Droit A. Integration strategies of multi-omics data for machine learning analysis. Comput Struct Biotechnol J 2021; 19:3735-3746. [PMID: 34285775 PMCID: PMC8258788 DOI: 10.1016/j.csbj.2021.06.030] [Citation(s) in RCA: 166] [Impact Index Per Article: 55.3] [Received: 03/30/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 12/25/2022] Open
Abstract
Increased availability of high-throughput technologies has generated an ever-growing number of omics data that seek to portray many different but complementary biological layers including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insights from these data have been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers obtained to date, however, only include one omic measurement at a time and thus do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics data integration strategies are needed to combine the complementary knowledge brought by each omics layer. We have summarized the most recent data integration methods/frameworks into five different integration strategies: early, mixed, intermediate, late and hierarchical. In this mini-review, we focus on challenges and existing multi-omics integration strategies by paying special attention to machine learning applications.
Affiliation(s)
- Milan Picard
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Périn
  - Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Arnaud Droit (corresponding author)
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
|
41
|
Zimmer A, Korem Y, Rappaport N, Wilmanski T, Baloni P, Jade K, Robinson M, Magis AT, Lovejoy J, Gibbons SM, Hood L, Price ND. The geometry of clinical labs and wellness states from deeply phenotyped humans. Nat Commun 2021; 12:3578. [PMID: 34117230 PMCID: PMC8196202 DOI: 10.1038/s41467-021-23849-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/17/2020] [Accepted: 05/17/2021] [Indexed: 02/05/2023] Open
Abstract
Longitudinal multi-omics measurements are highly valuable in studying heterogeneity in health and disease phenotypes. For thousands of people, we have collected longitudinal multi-omics data. To analyze, interpret and visualize this extremely high-dimensional data, we use the Pareto Task Inference (ParTI) method. We find that the clinical labs data fall within a tetrahedron. We then use all other data types to characterize the four archetypes. We find that the tetrahedron comprises three wellness states, defining a wellness triangular plane, and one aberrant health state that captures aspects of commonality in movement away from wellness. We reveal the tradeoffs that shape the data and their hierarchy, and use longitudinal data to observe individual trajectories. We then demonstrate how the movement on the tetrahedron can be used for detecting unexpected trajectories, which might indicate transitions from health to disease and reveal abnormal conditions, even when all individual blood measurements are in the norm.
Affiliation(s)
- Anat Zimmer
  - Institute for Systems Biology, Seattle, WA, USA (GRID: grid.64212.33; ISNI: 0000 0004 0463 2320)
- Yael Korem
  - Weizmann Institute, Rehovot, Israel (GRID: grid.13992.30; ISNI: 0000 0004 0604 7563)
- Noa Rappaport
  - Institute for Systems Biology, Seattle, WA, USA (GRID: grid.64212.33; ISNI: 0000 0004 0463 2320)
- Tomasz Wilmanski
  - Institute for Systems Biology, Seattle, WA, USA (GRID: grid.64212.33; ISNI: 0000 0004 0463 2320)
- Priyanka Baloni
  - Institute for Systems Biology, Seattle, WA, USA (GRID: grid.64212.33; ISNI: 0000 0004 0463 2320)
- Kathleen Jade
  - Institute for Systems Biology, Seattle, WA, USA (GRID: grid.64212.33; ISNI: 0000 0004 0463 2320)
- Max Robinson
  - Institute for Systems Biology, Seattle, WA, USA (GRID: grid.64212.33; ISNI: 0000 0004 0463 2320)
- Andrew T. Magis
  - Institute for Systems Biology, Seattle, WA, USA (GRID: grid.64212.33; ISNI: 0000 0004 0463 2320)
- Jennifer Lovejoy
  - Institute for Systems Biology, Seattle, WA, USA (GRID: grid.64212.33; ISNI: 0000 0004 0463 2320)
- Sean M. Gibbons
  - Institute for Systems Biology, Seattle, WA, USA (GRID: grid.64212.33; ISNI: 0000 0004 0463 2320)
- Leroy Hood
  - Institute for Systems Biology, Seattle, WA, USA (GRID: grid.64212.33; ISNI: 0000 0004 0463 2320)
  - Providence St Joseph Health, Seattle, WA, USA
- Nathan D. Price
  - Institute for Systems Biology, Seattle, WA, USA (GRID: grid.64212.33; ISNI: 0000 0004 0463 2320)
|
42
|
Wang T, Shao W, Huang Z, Tang H, Zhang J, Ding Z, Huang K. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat Commun 2021; 12:3445. [PMID: 34103512 PMCID: PMC8187432 DOI: 10.1038/s41467-021-23774-w] [Citation(s) in RCA: 125] [Impact Index Per Article: 41.7] [Received: 10/13/2020] [Accepted: 05/04/2021] [Indexed: 12/18/2022] Open
Abstract
To fully utilize the advances in omics technologies and achieve a more comprehensive understanding of human diseases, novel computational methods are required for integrative analysis of multiple types of omics data. Here, we present a novel multi-omics integrative method named Multi-Omics Graph cOnvolutional NETworks (MOGONET) for biomedical classification. MOGONET jointly explores omics-specific learning and cross-omics correlation learning for effective multi-omics data classification. We demonstrate that MOGONET outperforms other state-of-the-art supervised multi-omics integrative analysis approaches from different biomedical classification applications using mRNA expression data, DNA methylation data, and microRNA expression data. Furthermore, MOGONET can identify important biomarkers from different omics data types related to the investigated biomedical problems.
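For readers unfamiliar with graph convolutional networks, the following is a minimal sketch of one standard graph convolution layer, ReLU(D^{-1/2} (A + I) D^{-1/2} H W), applied to a sample similarity graph. The abstract does not describe MOGONET's architecture in detail, so this shows only the generic layer; the toy adjacency matrix, feature matrix, and weights are hypothetical.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix
    of A + I. `adj` is a square matrix given as nested lists."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def gcn_layer(adj, features, weights):
    """One graph convolution: aggregate neighbor features over the
    normalized graph, apply a linear map, then ReLU."""
    a_hat = normalize_adjacency(adj)
    z = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in z]


# Hypothetical toy input: two samples joined by one edge, two features each,
# and a weight matrix projecting to a single hidden unit.
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0]]
weights = [[2.0], [-1.0]]
hidden = gcn_layer(adj, features, weights)
print(hidden)
```

In a multi-omics setting, one such graph and feature matrix would typically be built per omics type, with the per-omics outputs combined afterwards; how that combination is done is specific to each method.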
Affiliation(s)
- Tongxin Wang
  - Department of Computer Science, Indiana University Bloomington, Bloomington, IN, USA
- Wei Shao
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
- Zhi Huang
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
  - School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA
- Haixu Tang
  - Department of Computer Science, Indiana University Bloomington, Bloomington, IN, USA
- Jie Zhang
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Zhengming Ding
  - Department of Computer Science, Tulane University, New Orleans, LA, USA
- Kun Huang
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
  - Regenstrief Institute, Indianapolis, IN, USA
43
|
Odenkirk MT, Reif DM, Baker ES. Multiomic Big Data Analysis Challenges: Increasing Confidence in the Interpretation of Artificial Intelligence Assessments. Anal Chem 2021; 93:7763-7773. [PMID: 34029068 PMCID: PMC8465926 DOI: 10.1021/acs.analchem.0c04850] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 02/08/2023]
Abstract
The need for holistic molecular measurements to better understand disease initiation, development, diagnosis, and therapy has led to an increasing number of multiomic analyses. The wealth of information available from multiomic assessments, however, requires both the evaluation and interpretation of extremely large data sets, limiting analysis throughput and ease of adoption. Computational methods utilizing artificial intelligence (AI) provide the most promising way to address these challenges, yet despite the conceptual benefits of AI and its successful application in singular omic studies, the widespread use of AI in multiomic studies remains limited. Here, we discuss present and future capabilities of AI techniques in multiomic studies while introducing analytical checks and balances to validate the computational conclusions.
Affiliation(s)
- Melanie T Odenkirk
  - Department of Chemistry, North Carolina State University, Raleigh, North Carolina 27606, United States
- David M Reif
  - Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27606, United States
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27606, United States
- Erin S Baker
  - Department of Chemistry, North Carolina State University, Raleigh, North Carolina 27606, United States
44
|
Wu M, Yi H, Ma S. Vertical integration methods for gene expression data analysis. Brief Bioinform 2021; 22:bbaa169. [PMID: 32793970 PMCID: PMC8138889 DOI: 10.1093/bib/bbaa169] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/24/2020] [Revised: 06/18/2020] [Accepted: 07/04/2020] [Indexed: 12/12/2022] Open
Abstract
Gene expression data have played an essential role in many biomedical studies. When the number of genes is large and sample size is limited, there is a 'lack of information' problem, leading to low-quality findings. To tackle this problem, both horizontal and vertical data integrations have been developed, where vertical integration methods collectively analyze data on gene expressions as well as their regulators (such as mutations, DNA methylation and miRNAs). In this article, we conduct a selective review of vertical data integration methods for gene expression data. The reviewed methods cover both marginal and joint analysis and supervised and unsupervised analysis. The main goal is to provide a sketch of the vertical data integration paradigm without digging into too many technical details. We also briefly discuss potential pitfalls, directions for future developments and application notes.
Affiliation(s)
- Mengyun Wu
  - School of Statistics and Management, Shanghai University of Finance and Economics
- Huangdi Yi
  - Department of Biostatistics at Yale University
- Shuangge Ma
  - Department of Biostatistics at Yale University
45
|
Coates JTT, Pirovano G, El Naqa I. Radiomic and radiogenomic modeling for radiotherapy: strategies, pitfalls, and challenges. J Med Imaging (Bellingham) 2021; 8:031902. [PMID: 33768134 PMCID: PMC7985651 DOI: 10.1117/1.jmi.8.3.031902] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/01/2020] [Accepted: 01/12/2021] [Indexed: 12/14/2022] Open
Abstract
The power of predictive modeling for radiotherapy outcomes has historically been limited by an inability to adequately capture patient-specific variabilities; however, next-generation platforms together with imaging technologies and powerful bioinformatic tools have facilitated strategies and provided optimism. Integrating clinical, biological, imaging, and treatment-specific data for more accurate prediction of tumor control probabilities or risk of radiation-induced side effects is a high-dimensional problem whose solution could have widespread benefits to a diverse patient population; we discuss technical approaches toward this objective. Increasing interest in the above is specifically reflected by the emergence of two nascent fields, which are distinct but complementary: radiogenomics, which broadly seeks to integrate biological risk factors together with treatment and diagnostic information to generate individualized patient risk profiles, and radiomics, which further leverages large-scale imaging correlates and extracted features for the same purpose. We review classical analytical and data-driven approaches for outcomes prediction that serve as antecedents to both radiomic and radiogenomic strategies. Discussion then focuses on uses of conventional and deep machine learning in radiomics. We further consider promising strategies for the harmonization of high-dimensional, heterogeneous multiomics datasets (panomics) and techniques for nonparametric validation of best-fit models. Strategies to overcome common pitfalls that are unique to data-intensive radiomics are also discussed.
Collapse
Affiliation(s)
- James T. T. Coates: Massachusetts General Hospital & Harvard Medical School, Center for Cancer Research, Boston, Massachusetts, United States
- Giacomo Pirovano: Memorial Sloan Kettering Cancer Center, Department of Radiology, New York, New York, United States
- Issam El Naqa: Moffitt Cancer Center and Research Institute, Department of Machine Learning, Tampa, Florida, United States
|
46
|
A New Era of Neuro-Oncology Research Pioneered by Multi-Omics Analysis and Machine Learning. Biomolecules 2021; 11:biom11040565. [PMID: 33921457 PMCID: PMC8070530 DOI: 10.3390/biom11040565] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/11/2021] [Revised: 04/02/2021] [Accepted: 04/07/2021] [Indexed: 02/06/2023] Open
Abstract
Although the incidence of central nervous system (CNS) cancers is not high, these cancers significantly reduce a patient’s quality of life and result in high mortality rates. A low incidence also means a low number of cases, which in turn means a small amount of information. To compensate, researchers have tried to increase the amount of information available from a single test using high-throughput technologies. This approach, referred to as single-omics analysis, has only been partially successful, as one type of data may not be able to appropriately describe all the characteristics of a tumor. It is presently unclear what type of data can describe a particular clinical situation. One way to solve this problem is to use multi-omics data. When using many types of data, a selected data type or a combination of them may effectively resolve a clinical question. Hence, we conducted a comprehensive survey of papers in the field of neuro-oncology that used multi-omics data for analysis and found that most of the papers utilized machine learning techniques. This fact shows that it is useful to utilize machine learning techniques in multi-omics analysis. In this review, we discuss the current status of multi-omics analysis in the field of neuro-oncology and the importance of using machine learning techniques.
|
47
|
Kaur H, Kumar R, Lathwal A, Raghava GPS. Computational resources for identification of cancer biomarkers from omics data. Brief Funct Genomics 2021; 20:213-222. [PMID: 33788922 DOI: 10.1093/bfgp/elab021] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/09/2020] [Revised: 02/11/2021] [Accepted: 03/08/2021] [Indexed: 12/18/2022] Open
Abstract
Cancer is one of the most prevalent, deadly and challenging diseases worldwide. The advancement in technology led to the generation of different types of omics data at each genome level that may potentially improve the current status of cancer patients. These data have tremendous applications in managing cancer effectively with improved outcome in patients. This review summarizes the various computational resources and tools housing several types of omics data related to cancer. The major categories of resources include cancer-associated multiomics data repositories, visualization/analysis tools for omics data, machine learning-based diagnostic, prognostic, and predictive biomarker tools, and data analysis algorithms employing the multiomics data. The review primarily focuses on providing comprehensive information on the open-source multiomics tools and data repositories, owing to their broader applicability, economic benefit and usability. Sections including the comparative analysis, tool applicability and possible future directions have also been discussed in detail. We hope that this information will significantly benefit the researchers and clinicians, especially those with no sound background in bioinformatics and who lack sufficient data analysis skills to interpret something from the plethora of cancer-specific data generated nowadays.
|
48
|
Vlachavas EI, Bohn J, Ückert F, Nürnberg S. A Detailed Catalogue of Multi-Omics Methodologies for Identification of Putative Biomarkers and Causal Molecular Networks in Translational Cancer Research. Int J Mol Sci 2021; 22:2822. [PMID: 33802234 PMCID: PMC8000236 DOI: 10.3390/ijms22062822] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/31/2021] [Revised: 03/05/2021] [Accepted: 03/05/2021] [Indexed: 02/06/2023] Open
Abstract
Recent advances in sequencing and biotechnological methodologies have led to the generation of large volumes of molecular data of different omics layers, such as genomics, transcriptomics, proteomics and metabolomics. Integration of these data with clinical information provides new opportunities to discover how perturbations in biological processes lead to disease. Using data-driven approaches for the integration and interpretation of multi-omics data could stably identify links between structural and functional information and propose causal molecular networks with potential impact on cancer pathophysiology. This knowledge can then be used to improve disease diagnosis, prognosis, prevention, and therapy. This review will summarize and categorize the most current computational methodologies and tools for integration of distinct molecular layers in the context of translational cancer research and personalized therapy. Additionally, the bioinformatics tools Multi-Omics Factor Analysis (MOFA) and netDX will be tested using omics data from public cancer resources, to assess their overall robustness, provide reproducible workflows for gaining biological knowledge from multi-omics data, and to comprehensively understand the significantly perturbed biological entities in distinct cancer types. We show that the performed supervised and unsupervised analyses result in meaningful and novel findings.
Affiliation(s)
- Efstathios Iason Vlachavas: Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Jonas Bohn: Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Frank Ückert: Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany; Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
- Sylvia Nürnberg: Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany; Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
|
49
|
Veenstra TD. Omics in Systems Biology: Current Progress and Future Outlook. Proteomics 2021; 21:e2000235. [PMID: 33320441 DOI: 10.1002/pmic.202000235] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 09/11/2020] [Revised: 11/25/2020] [Indexed: 12/16/2022]
Abstract
Biological research has undergone tremendous changes over the past three decades. Research used to almost exclusively focus on a single aspect of a single molecule per experiment. Modern technologies have enabled thousands of molecules to be simultaneously analyzed and the way that these molecules influence each other to be discerned. The change is so dramatic that it has given rise to a whole new descriptive suffix (i.e., omics) to describe these fields of study. While genomics was arguably the initial driver of this new trend, it quickly spread to other biological entities resulting in the creation of transcriptomics, proteomics, metabolomics, etc. The development of these "big four omics" created a wave of other omic fields, such as epigenomics, glycomics, lipidomics, microbiomics, and even foodomics; all with the purpose of comprehensively studying all the molecular entities or processes within their respective domain. The large number of omic fields that are invented even led to the term "panomics" as a way to classify them all under one category. Ultimately, all of these omic fields are setting the foundation for developing systems biology; in which the focus will be on determining the complex interactions that occur within biological systems.
|
50
|
Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer. Nat Commun 2021; 12:124. [PMID: 33402734 PMCID: PMC7785750 DOI: 10.1038/s41467-020-20430-7] [Citation(s) in RCA: 73] [Impact Index Per Article: 24.3] [Received: 03/25/2020] [Accepted: 12/02/2020] [Indexed: 01/08/2023] Open
Abstract
High-dimensional multi-omics data are now standard in biology. They can greatly enhance our understanding of biological systems when effectively integrated. To achieve proper integration, joint Dimensionality Reduction (jDR) methods are among the most efficient approaches. However, several jDR methods are available, urging the need for a comprehensive benchmark with practical guidelines. We perform a systematic evaluation of nine representative jDR methods using three complementary benchmarks. First, we evaluate their performances in retrieving ground-truth sample clustering from simulated multi-omics datasets. Second, we use TCGA cancer data to assess their strengths in predicting survival, clinical annotations and known pathways/biological processes. Finally, we assess their classification of multi-omics single-cell data. From these in-depth comparisons, we observe that intNMF performs best in clustering, while MCIA offers an effective behavior across many contexts. The code developed for this benchmark study is implemented in a Jupyter notebook—multi-omics mix (momix)—to foster reproducibility, and support users and future developers. Advances in omics technology have resulted in the generation of multi-view data for cancer samples. Here, the authors compare dimensionality reduction techniques using simulated and TCGA data and identify the features of the methods with superior performance.
|