1
Itai Y, Rappoport N, Shamir R. Integration of gene expression and DNA methylation data across different experiments. Nucleic Acids Res 2023; 51:7762-7776. [PMID: 37395437 PMCID: PMC10450176 DOI: 10.1093/nar/gkad566] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/16/2022] [Revised: 06/04/2023] [Accepted: 06/21/2023] [Indexed: 07/04/2023]
Abstract
Integrative analysis of multi-omic datasets has proven to be extremely valuable in cancer research and precision medicine. However, obtaining multimodal data from the same samples is often difficult. Integrating multiple datasets of different omics remains a challenge, with only a few available algorithms developed to solve it. Here, we present INTEND (IntegratioN of Transcriptomic and EpigeNomic Data), a novel algorithm for integrating gene expression and DNA methylation datasets covering disjoint sets of samples. To enable integration, INTEND learns a predictive model between the two omics by training on multi-omic data measured on the same set of samples. In comprehensive testing on 11 TCGA (The Cancer Genome Atlas) cancer datasets spanning 4329 patients, INTEND achieves significantly superior results compared with four state-of-the-art integration algorithms. We also demonstrate INTEND's ability to uncover connections between DNA methylation and the regulation of gene expression in the joint analysis of two lung adenocarcinoma single-omic datasets from different sources. INTEND's data-driven approach makes it a valuable multi-omic data integration tool. The code for INTEND is available at https://github.com/Shamir-Lab/INTEND.
Affiliation(s)
- Yonatan Itai
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Nimrod Rappoport
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Ron Shamir
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel

2
Flores JE, Claborne DM, Weller ZD, Webb-Robertson BJM, Waters KM, Bramer LM. Missing data in multi-omics integration: Recent advances through artificial intelligence. Front Artif Intell 2023; 6:1098308. [PMID: 36844425 PMCID: PMC9949722 DOI: 10.3389/frai.2023.1098308] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 11/14/2022] [Accepted: 01/23/2023] [Indexed: 02/11/2023]
Abstract
Biological systems function through complex interactions between various 'omics (biomolecules), and a more complete understanding of these systems is only possible through an integrated, multi-omic perspective. This has created a need for integration approaches that can capture the complex, often non-linear, interactions that define these biological systems and that are adapted to the challenges of combining heterogeneous data across 'omic views. A principal challenge to multi-omic integration is missing data, because not all biomolecules are measured in all samples. Due to cost, instrument sensitivity, or other experimental factors, data for a biological sample may be missing for one or more 'omic technologies. Recent methodological developments in artificial intelligence and statistical learning have greatly facilitated the analysis of multi-omics data; however, many of these techniques assume access to completely observed data. A subset of these methods incorporate mechanisms for handling partially observed samples, and these methods are the focus of this review. We describe recently developed approaches, noting their primary use cases and highlighting each method's approach to handling missing data. We additionally provide an overview of the more traditional missing data workflows and their limitations, and we discuss potential avenues for further developments as well as how the missing data issue and its current solutions may generalize beyond the multi-omics context.
Affiliation(s)
- Javier E. Flores
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Daniel M. Claborne
  - Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Zachary D. Weller
  - Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Bobbie-Jo M. Webb-Robertson
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Katrina M. Waters
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Lisa M. Bramer
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
  - Correspondence: Lisa M. Bramer

4
Ward B, Yombi JC, Balligand JL, Cani PD, Collet JF, de Greef J, Dewulf JP, Gatto L, Haufroid V, Jodogne S, Kabamba B, Pyr dit Ruys S, Vertommen D, Elens L, Belkhir L. HYGIEIA: HYpothesizing the Genesis of Infectious Diseases and Epidemics through an Integrated Systems Biology Approach. Viruses 2022; 14:v14071373. [PMID: 35891354 PMCID: PMC9318602 DOI: 10.3390/v14071373] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/10/2022] [Revised: 06/13/2022] [Accepted: 06/21/2022] [Indexed: 12/13/2022]
Abstract
More than two years on, the COVID-19 pandemic continues to wreak havoc around the world and has battle-tested the pandemic-situation responses of all major global governments. Two key questions that remain unclear are the molecular mechanisms that lead to heterogeneous patient outcomes and the causes of Post COVID condition (also known as Long-COVID). In this paper, we introduce the HYGIEIA project, designed to respond to the enormous challenges of the COVID-19 pandemic through a multi-omic approach supported by network medicine. It is hoped that, in addition to investigating COVID-19, the logistics deployed within this project will be applicable to other infectious agents, pandemic-type situations, and other complex, non-infectious diseases. Here, we first look at previous research into COVID-19 in the context of the proteome, metabolome, transcriptome, microbiome, host genome, and viral genome. We then discuss a proposed methodology for a large-scale multi-omic longitudinal study to investigate the aforementioned biological strata through high-throughput sequencing (HTS) and mass-spectrometry (MS) technologies. Lastly, we discuss how a network medicine approach can be used to analyze the data and make meaningful discoveries, with the final aim being the translation of these discoveries into the clinic to improve patient care.
Affiliation(s)
- Bradley Ward
  - Integrated Pharmacometrics, Pharmacogenomics and Pharmacokinetics Group (PMGK), Louvain Drug Research Institute (LDRI), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Louvain Center for Toxicology and Applied Pharmacology (LTAP), Institut de Recherche Expérimentale et Clinique (IREC), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Jean Cyr Yombi
  - Department of Internal Medicine, Cliniques Universitaires Saint-Luc, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Jean-Luc Balligand
  - WELBIO (Walloon Excellence in Life Sciences and Biotechnology), Pole of Pharmacology and Therapeutics (FATH), Institut de Recherche Expérimentale et Clinique (IREC), Cliniques Universitaires Saint-Luc, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Patrice D. Cani
  - WELBIO (Walloon Excellence in Life Sciences and Biotechnology), Metabolism and Nutrition Research Group, Louvain Drug Research Institute (LDRI), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Jean-François Collet
  - WELBIO (Walloon Excellence in Life Sciences and Biotechnology), de Duve Institute, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Julien de Greef
  - Louvain Center for Toxicology and Applied Pharmacology (LTAP), Institut de Recherche Expérimentale et Clinique (IREC), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Department of Internal Medicine, Cliniques Universitaires Saint-Luc, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Joseph P. Dewulf
  - Louvain Center for Toxicology and Applied Pharmacology (LTAP), Institut de Recherche Expérimentale et Clinique (IREC), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Department of Laboratory Medicine, Cliniques Universitaires Saint-Luc, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Department of Biochemistry, de Duve Institute, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Laurent Gatto
  - Computational Biology and Bioinformatics Unit (CBIO), de Duve Institute, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Vincent Haufroid
  - Louvain Center for Toxicology and Applied Pharmacology (LTAP), Institut de Recherche Expérimentale et Clinique (IREC), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Department of Laboratory Medicine, Cliniques Universitaires Saint-Luc, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Sébastien Jodogne
  - Computer Science and Engineering Department (INGI), Institute of Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM), UCLouvain, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
- Benoît Kabamba
  - Department of Laboratory Medicine, Cliniques Universitaires Saint-Luc, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Pôle de Microbiologie, Institut de Recherche Expérimentale et Clinique, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Sébastien Pyr dit Ruys
  - Integrated Pharmacometrics, Pharmacogenomics and Pharmacokinetics Group (PMGK), Louvain Drug Research Institute (LDRI), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Didier Vertommen
  - De Duve Institute, and MASSPROT Platform, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
- Laure Elens
  - Integrated Pharmacometrics, Pharmacogenomics and Pharmacokinetics Group (PMGK), Louvain Drug Research Institute (LDRI), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Louvain Center for Toxicology and Applied Pharmacology (LTAP), Institut de Recherche Expérimentale et Clinique (IREC), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Corresponding author.
- Leïla Belkhir
  - Louvain Center for Toxicology and Applied Pharmacology (LTAP), Institut de Recherche Expérimentale et Clinique (IREC), UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Department of Internal Medicine, Cliniques Universitaires Saint-Luc, UCLouvain, Université Catholique de Louvain, 1200 Brussels, Belgium
  - Corresponding author.

5
Heo YJ, Hwa C, Lee GH, Park JM, An JY. Integrative Multi-Omics Approaches in Cancer Research: From Biological Networks to Clinical Subtypes. Mol Cells 2021; 44:433-443. [PMID: 34238766 PMCID: PMC8334347 DOI: 10.14348/molcells.2021.0042] [Citation(s) in RCA: 56] [Impact Index Per Article: 18.7] [Received: 02/20/2021] [Revised: 04/09/2021] [Accepted: 05/12/2021] [Indexed: 11/27/2022]
Abstract
Multi-omics approaches are novel frameworks that integrate multiple omics datasets generated from the same patients to better understand the molecular and clinical features of cancers. A wide range of emerging omics and multi-view clustering algorithms now provide unprecedented opportunities to further classify cancers into subtypes, improve the survival prediction and therapeutic outcome of these subtypes, and understand key pathophysiological processes through different molecular layers. In this review, we give an overview of the concept and rationale of multi-omics approaches in cancer research. We also introduce recent advances in the development of multi-omics algorithms and integration methods for multiple-layered datasets from cancer patients. Finally, we summarize the latest findings from large-scale multi-omics studies of various cancers and their implications for patient subtyping and drug development.
Affiliation(s)
- Yong Jin Heo
  - School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul 02841, Korea
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul 02841, Korea
- Chanwoong Hwa
  - School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul 02841, Korea
- Gang-Hee Lee
  - School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul 02841, Korea
- Jae-Min Park
  - School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul 02841, Korea
- Joon-Yong An
  - School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul 02841, Korea
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul 02841, Korea

6
Picard M, Scott-Boyer MP, Bodein A, Périn O, Droit A. Integration strategies of multi-omics data for machine learning analysis. Comput Struct Biotechnol J 2021; 19:3735-3746. [PMID: 34285775 PMCID: PMC8258788 DOI: 10.1016/j.csbj.2021.06.030] [Citation(s) in RCA: 154] [Impact Index Per Article: 51.3] [Received: 03/30/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 12/25/2022]
Abstract
Increased availability of high-throughput technologies has generated an ever-growing number of omics data that seek to portray many different but complementary biological layers including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insights from these data have been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers obtained to date, however, include only one omic measurement at a time and thus do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics data integration strategies are needed to combine the complementary knowledge brought by each omics layer. We have summarized the most recent data integration methods/frameworks into five different integration strategies: early, mixed, intermediate, late and hierarchical. In this mini-review, we focus on challenges and existing multi-omics integration strategies by paying special attention to machine learning applications.
Affiliation(s)
- Milan Picard
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Périn
  - Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Arnaud Droit
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
  - Corresponding author.