1
Acharya D, Mukhopadhyay A. A comprehensive review of machine learning techniques for multi-omics data integration: challenges and applications in precision oncology. Brief Funct Genomics 2024:elae013. [PMID: 38600757 DOI: 10.1093/bfgp/elae013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 03/12/2024] [Accepted: 03/22/2024] [Indexed: 04/12/2024]
Abstract
Multi-omics data play a crucial role in precision medicine, mainly to understand the diverse biological interaction between different omics. Machine learning approaches have been extensively employed in this context over the years. This review aims to comprehensively summarize and categorize these advancements, focusing on the integration of multi-omics data, which includes genomics, transcriptomics, proteomics and metabolomics, alongside clinical data. We discuss various machine learning techniques and computational methodologies used for integrating distinct omics datasets and provide valuable insights into their application. The review emphasizes both the challenges and opportunities present in multi-omics data integration, precision medicine and patient stratification, offering practical recommendations for method selection in various scenarios. Recent advances in deep learning and network-based approaches are also explored, highlighting their potential to harmonize diverse biological information layers. Additionally, we present a roadmap for the integration of multi-omics data in precision oncology, outlining the advantages, challenges and implementation difficulties. Hence this review offers a thorough overview of current literature, providing researchers with insights into machine learning techniques for patient stratification, particularly in precision oncology. Contact: anirban@klyuniv.ac.in.
Affiliation(s)
- Debabrata Acharya
  - Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
- Anirban Mukhopadhyay
  - Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
2
Wang P, Yu Y, Liu J, Li B, Zhang Y, Li D, Xu W, Liu Q, Wang Z. IMCC: A Novel Quantitative Approach Revealing Variation of Global Modular Map and Local Inter-Module Coordination Among Differential Drug's Targeted Cerebral Ischemic Networks. Front Pharmacol 2021; 12:637253. [PMID: 33935725 PMCID: PMC8087074 DOI: 10.3389/fphar.2021.637253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2020] [Accepted: 02/23/2021] [Indexed: 02/01/2023]
Abstract
Stroke is a common disease characterized by multiple genetic dysfunctions. In this complex disease, detecting the strength of inter-module coordination (genetic community interaction) and subsequent modular rewiring is essential to characterize the reactive biosystematic variation (biosystematic perturbation) brought by multiple-target drugs, whose effects are achieved by hitting on a series of targets (target profile) jointly. Here, a quantitative approach for inter-module coordination and its transition, named as IMCC, was developed. Applying IMCC to mouse cerebral ischemia–related gene microarray, we investigated a holistic view of modular map and its rewiring from ischemic stroke to drugs (baicalin, BA; ursodeoxycholic acid, UA; and jasminoidin, JA) perturbation states and locally identified the cooperative pathological module pair and its dissection. Our result suggested the global modular map in cerebral ischemia exhibited a characteristic “core–periphery” architecture, and this architecture was rewired by the effective drugs heterogeneously: BA and UA converged modules into an intensively connected integrity, whereas JA diverged partial modules and widened the remaining inter-module paths. Locally, the PMP dissociation brought by drugs contributed to the reversion of the pathological condition: the focus of the cellular function shift from survival after nervous system injury into development and repair, including neurotrophin regulation, hormone releasing, and chemokine signaling activation. The core targets and mechanisms were validated by in vivo experiments. Overall, our result highlights the holistic inter-module coordination rearrangement rather than a target or a single module that brings phenotype alteration. This strategy may lead to systematically explore detailed variation of inter-module pharmacological action mode of multiple-target drugs, which is the principal problem of module pharmacology for network-based drug discovery.
Affiliation(s)
- Pengqian Wang
  - Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Yanan Yu
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Jun Liu
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Bing Li
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
  - Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Yingying Zhang
  - Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
- Dongfeng Li
  - Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
- Wenjuan Xu
  - School of Mathematical Sciences, Peking University, Beijing, China
- Qiong Liu
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Zhong Wang
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
3
Vlachavas EI, Bohn J, Ückert F, Nürnberg S. A Detailed Catalogue of Multi-Omics Methodologies for Identification of Putative Biomarkers and Causal Molecular Networks in Translational Cancer Research. Int J Mol Sci 2021; 22:2822. [PMID: 33802234 PMCID: PMC8000236 DOI: 10.3390/ijms22062822] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/31/2021] [Revised: 03/05/2021] [Accepted: 03/05/2021] [Indexed: 02/06/2023]
Abstract
Recent advances in sequencing and biotechnological methodologies have led to the generation of large volumes of molecular data of different omics layers, such as genomics, transcriptomics, proteomics and metabolomics. Integration of these data with clinical information provides new opportunities to discover how perturbations in biological processes lead to disease. Using data-driven approaches for the integration and interpretation of multi-omics data could stably identify links between structural and functional information and propose causal molecular networks with potential impact on cancer pathophysiology. This knowledge can then be used to improve disease diagnosis, prognosis, prevention, and therapy. This review will summarize and categorize the most current computational methodologies and tools for integration of distinct molecular layers in the context of translational cancer research and personalized therapy. Additionally, the bioinformatics tools Multi-Omics Factor Analysis (MOFA) and netDX will be tested using omics data from public cancer resources, to assess their overall robustness, provide reproducible workflows for gaining biological knowledge from multi-omics data, and to comprehensively understand the significantly perturbed biological entities in distinct cancer types. We show that the performed supervised and unsupervised analyses result in meaningful and novel findings.
Affiliation(s)
- Efstathios Iason Vlachavas
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Jonas Bohn
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Frank Ückert
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
- Sylvia Nürnberg
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
4
Degree Adjusted Large-Scale Network Analysis Reveals Novel Putative Metabolic Disease Genes. BIOLOGY 2021; 10:biology10020107. [PMID: 33546175 PMCID: PMC7913176 DOI: 10.3390/biology10020107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2020] [Revised: 01/24/2021] [Accepted: 01/30/2021] [Indexed: 11/16/2022]
Abstract
Simple Summary: To explore some of the low-degree but topologically important nodes in the Metabolic disease (MD) network, we propose a background-corrected betweenness centrality (BC) and identify 16 novel candidates likely to play a role in MD. MD specific protein–protein interaction networks (PPINs) were constructed using two known databases, Human Protein Reference Database (HPRD) and BioGRID. The identified candidates have been found to play a role in diverse conditions including co-morbidities of MD, neurological and immune system-related conditions.
Abstract: A large percentage of the global population is currently afflicted by metabolic diseases (MD), and the incidence is likely to double in the next decades. MD associated co-morbidities such as non-alcoholic fatty liver disease (NAFLD) and cardiomyopathy contribute significantly to impaired health. MD are complex, polygenic, with many genes involved in its aetiology. A popular approach to investigate genetic contributions to disease aetiology is biological network analysis. However, data dependence introduces a bias (noise, false positives, over-publication) in the outcome. While several approaches have been proposed to overcome these biases, many of them have constraints, including data integration issues, dependence on arbitrary parameters, database dependent outcomes, and computational complexity. Network topology is also a critical factor affecting the outcomes. Here, we propose a simple, parameter-free method, that takes into account database dependence and network topology, to identify central genes in the MD network. Among them, we infer novel candidates that have not yet been annotated as MD genes and show their relevance by highlighting their differential expression in public datasets and carefully examining the literature. The method contributes to uncovering connections in the MD mechanisms and highlights several candidates for in-depth study of their contribution to MD and its co-morbidities.
5
Savino A, Provero P, Poli V. Differential Co-Expression Analyses Allow the Identification of Critical Signalling Pathways Altered during Tumour Transformation and Progression. Int J Mol Sci 2020; 21:E9461. [PMID: 33322692 PMCID: PMC7764314 DOI: 10.3390/ijms21249461] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 10/30/2020] [Revised: 12/02/2020] [Accepted: 12/09/2020] [Indexed: 02/02/2023]
Abstract
Biological systems respond to perturbations through the rewiring of molecular interactions, organised in gene regulatory networks (GRNs). Among these, the increasingly high availability of transcriptomic data makes gene co-expression networks the most exploited ones. Differential co-expression networks are useful tools to identify changes in response to an external perturbation, such as mutations predisposing to cancer development, and leading to changes in the activity of gene expression regulators or signalling. They can help explain the robustness of cancer cells to perturbations and identify promising candidates for targeted therapy, moreover providing higher specificity with respect to standard co-expression methods. Here, we comprehensively review the literature about the methods developed to assess differential co-expression and their applications to cancer biology. Via the comparison of normal and diseased conditions and of different tumour stages, studies based on these methods led to the definition of pathways involved in gene network reorganisation upon oncogenes' mutations and tumour progression, often converging on immune system signalling. A relevant implementation still lagging behind is the integration of different data types, which would greatly improve network interpretability. Most importantly, performance and predictivity evaluation of the large variety of mathematical models proposed would urgently require experimental validations and systematic comparisons. We believe that future work on differential gene co-expression networks, complemented with additional omics data and experimentally tested, will considerably improve our insights into the biology of tumours.
Affiliation(s)
- Aurora Savino
  - Molecular Biotechnology Center, Department of Molecular Biotechnology and Health Sciences, University of Turin, Via Nizza 52, 10126 Turin, Italy
- Paolo Provero
  - Department of Neurosciences “Rita Levi Montalcini”, University of Turin, Corso Massimo D’Ázeglio 52, 10126 Turin, Italy
  - Center for Omics Sciences, Ospedale San Raffaele IRCCS, Via Olgettina 60, 20132 Milan, Italy
- Valeria Poli
  - Molecular Biotechnology Center, Department of Molecular Biotechnology and Health Sciences, University of Turin, Via Nizza 52, 10126 Turin, Italy
6
Morselli Gysi D, de Miranda Fragoso T, Zebardast F, Bertoli W, Busskamp V, Almaas E, Nowick K. Whole transcriptomic network analysis using Co-expression Differential Network Analysis (CoDiNA). PLoS One 2020; 15:e0240523. [PMID: 33057419 PMCID: PMC7561188 DOI: 10.1371/journal.pone.0240523] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/31/2020] [Accepted: 09/29/2020] [Indexed: 01/05/2023]
Abstract
Biological and medical sciences are increasingly acknowledging the significance of gene co-expression-networks for investigating complex-systems, phenotypes or diseases. Typically, complex phenotypes are investigated under varying conditions. While approaches for comparing nodes and links in two networks exist, almost no methods for the comparison of multiple networks are available and—to best of our knowledge—no comparative method allows for whole transcriptomic network analysis. However, it is the aim of many studies to compare networks of different conditions, for example, tissues, diseases, treatments, time points, or species. Here we present a method for the systematic comparison of an unlimited number of networks, with unlimited number of transcripts: Co-expression Differential Network Analysis (CoDiNA). In particular, CoDiNA detects links and nodes that are common, specific or different among the networks. We developed a statistical framework to normalize between these different categories of common or changed network links and nodes, resulting in a comprehensive network analysis method, more sophisticated than simply comparing the presence or absence of network nodes. Applying CoDiNA to a neurogenesis study we identified candidate genes involved in neuronal differentiation. We experimentally validated one candidate, demonstrating that its overexpression resulted in a significant disturbance in the underlying gene regulatory network of neurogenesis. Using clinical studies, we compared whole transcriptome co-expression networks from individuals with or without HIV and active tuberculosis (TB) and detected signature genes specific to HIV. Furthermore, analyzing multiple cancer transcription factor (TF) networks, we identified common and distinct features for particular cancer types. These CoDiNA applications demonstrate the successful detection of genes associated with specific phenotypes. 
Moreover, CoDiNA can also be used for comparing other types of undirected networks, for example, metabolic, protein-protein interaction, ecological and psychometric networks. CoDiNA is publicly available as an R package in CRAN (https://CRAN.R-project.org/package=CoDiNA).
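The abstract above says that CoDiNA sorts links into those that are common, specific or different among the networks. As a rough illustration of that idea only, here is a minimal Python sketch. It is not the CoDiNA R package or its API: the function name `classify_links`, the dict-of-edges input format, the 0.3 presence threshold and the example weights are all assumptions made for this example, and the statistical normalization mentioned in the abstract is left out.

```python
def classify_links(networks, threshold=0.3):
    """Label each link as 'common', 'different' or 'specific'.

    networks: list of dicts mapping an undirected edge (gene_a, gene_b)
    to a signed weight such as a correlation. The threshold below which
    a link counts as absent is a hypothetical choice for this sketch.
    """
    # Sort each edge's endpoints so (A, B) and (B, A) are the same link.
    normalized = [
        {tuple(sorted(edge)): weight for edge, weight in net.items()}
        for net in networks
    ]
    all_edges = set().union(*normalized)

    labels = {}
    for edge in sorted(all_edges):
        signs = []
        for net in normalized:
            weight = net.get(edge, 0.0)
            if abs(weight) >= threshold:
                signs.append(1 if weight > 0 else -1)
            else:
                signs.append(0)  # link treated as absent in this network
        if all(sign == 0 for sign in signs):
            continue  # too weak everywhere, so not reported
        if 0 in signs:
            labels[edge] = "specific"   # present in only some networks
        elif len(set(signs)) == 1:
            labels[edge] = "common"     # present everywhere, same sign
        else:
            labels[edge] = "different"  # present everywhere, sign changes
    return labels


# Two made-up co-expression networks for illustration.
control = {("A", "B"): 0.8, ("B", "C"): 0.6, ("A", "C"): 0.1}
treated = {("B", "A"): 0.7, ("B", "C"): -0.5, ("C", "D"): 0.9}
labels = classify_links([control, treated])
```

Because each link is judged against every network in the list, the same function accepts any number of networks, which loosely mirrors the multi-network comparison the abstract describes.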
Affiliation(s)
- Deisy Morselli Gysi
  - Department of Computer Science, Leipzig University, Leipzig, Germany
  - * E-mail: (KN); (DMG)
- Fatemeh Zebardast
  - Department of Biology, Chemistry, Pharmacy, Freie Universitaet Berlin, Berlin, Germany
- Wesley Bertoli
  - Department of Statistics, Federal University of Technology - Paraná, Curitiba, Brazil
- Volker Busskamp
  - Center for Regenerative Therapies (CRTD), Technical University Dresden, Dresden, Germany
  - Dept. of Ophthalmology, Universitäts-Augenklinik Bonn, University of Bonn, Bonn, Germany
- Eivind Almaas
  - Department of Biotechnology, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
  - K.G. Jebsen Centre for Genetic Epidemiology, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
- Katja Nowick
  - Department of Biology, Chemistry, Pharmacy, Freie Universitaet Berlin, Berlin, Germany
  - * E-mail: (KN); (DMG)
7
Nicora G, Vitali F, Dagliati A, Geifman N, Bellazzi R. Integrated Multi-Omics Analyses in Oncology: A Review of Machine Learning Methods and Tools. Front Oncol 2020; 10:1030. [PMID: 32695678 PMCID: PMC7338582 DOI: 10.3389/fonc.2020.01030] [Citation(s) in RCA: 110] [Impact Index Per Article: 27.5] [Received: 01/30/2020] [Accepted: 05/26/2020] [Indexed: 12/16/2022]
Abstract
In recent years, high-throughput sequencing technologies provide unprecedented opportunity to depict cancer samples at multiple molecular levels. The integration and analysis of these multi-omics datasets is a crucial and critical step to gain actionable knowledge in a precision medicine framework. This paper explores recent data-driven methodologies that have been developed and applied to respond major challenges of stratified medicine in oncology, including patients' phenotyping, biomarker discovery, and drug repurposing. We systematically retrieved peer-reviewed journals published from 2014 to 2019, select and thoroughly describe the tools presenting the most promising innovations regarding the integration of heterogeneous data, the machine learning methodologies that successfully tackled the complexity of multi-omics data, and the frameworks to deliver actionable results for clinical practice. The review is organized according to the applied methods: Deep learning, Network-based methods, Clustering, Features Extraction, and Transformation, Factorization. We provide an overview of the tools available in each methodological group and underline the relationship among the different categories. Our analysis revealed how multi-omics datasets could be exploited to drive precision oncology, but also current limitations in the development of multi-omics data integration.
Affiliation(s)
- Giovanna Nicora
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Francesca Vitali
  - Center for Innovation in Brain Science, University of Arizona, Tucson, AZ, United States
  - Department of Neurology, College of Medicine, University of Arizona, Tucson, AZ, United States
  - Center for Biomedical Informatics and Biostatistics, University of Arizona, Tucson, AZ, United States
- Arianna Dagliati
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
  - Centre for Health Informatics, The University of Manchester, Manchester, United Kingdom
  - The Manchester Molecular Pathology Innovation Centre, The University of Manchester, Manchester, United Kingdom
- Nophar Geifman
  - Centre for Health Informatics, The University of Manchester, Manchester, United Kingdom
  - The Manchester Molecular Pathology Innovation Centre, The University of Manchester, Manchester, United Kingdom
- Riccardo Bellazzi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
8
Sanford JA, Nogiec CD, Lindholm ME, Adkins JN, Amar D, Dasari S, Drugan JK, Fernández FM, Radom-Aizik S, Schenk S, Snyder MP, Tracy RP, Vanderboom P, Trappe S, Walsh MJ, Adkins JN, Amar D, Dasari S, Drugan JK, Evans CR, Fernandez FM, Li Y, Lindholm ME, Nogiec CD, Radom-Aizik S, Sanford JA, Schenk S, Snyder MP, Tomlinson L, Tracy RP, Trappe S, Vanderboom P, Walsh MJ, Lee Alekel D, Bekirov I, Boyce AT, Boyington J, Fleg JL, Joseph LJ, Laughlin MR, Maruvada P, Morris SA, McGowan JA, Nierras C, Pai V, Peterson C, Ramos E, Roary MC, Williams JP, Xia A, Cornell E, Rooney J, Miller ME, Ambrosius WT, Rushing S, Stowe CL, Jack Rejeski W, Nicklas BJ, Pahor M, Lu CJ, Trappe T, Chambers T, Raue U, Lester B, Bergman BC, Bessesen DH, Jankowski CM, Kohrt WM, Melanson EL, Moreau KL, Schauer IE, Schwartz RS, Kraus WE, Slentz CA, Huffman KM, Johnson JL, Willis LH, Kelly L, Houmard JA, Dubis G, Broskey N, Goodpaster BH, Sparks LM, Coen PM, Cooper DM, Haddad F, Rankinen T, Ravussin E, Johannsen N, Harris M, Jakicic JM, Newman AB, Forman DD, Kershaw E, Rogers RJ, Nindl BC, Page LC, Stefanovic-Racic M, Barr SL, Rasmussen BB, Moro T, Paddon-Jones D, Volpi E, Spratt H, Musi N, Espinoza S, Patel D, Serra M, Gelfond J, Burns A, Bamman MM, Buford TW, Cutter GR, Bodine SC, Esser K, Farrar RP, Goodyear LJ, Hirshman MF, Albertson BG, Qian WJ, Piehowski P, Gritsenko MA, Monore ME, Petyuk VA, McDermott JE, Hansen JN, Hutchison C, Moore S, Gaul DA, Clish CB, Avila-Pacheco J, Dennis C, Kellis M, Carr S, Jean-Beltran PM, Keshishian H, Mani D, Clauser K, Krug K, Mundorff C, Pearce C, Ivanova AA, Ortlund EA, Maner-Smith K, Uppal K, Zhang T, Sealfon SC, Zaslavsky E, Nair V, Li S, Jain N, Ge Y, Sun Y, Nudelman G, Ruf-zamojski F, Smith G, Pincas N, Rubenstein A, Anne Amper M, Seenarine N, Lappalainen T, Lanza IR, Sreekumaran Nair K, Klaus K, Montgomery SB, Smith KS, Gay NR, Zhao B, Hung CJ, Zebarjadi N, Balliu B, Fresard L, Burant CF, Li JZ, Kachman M, Soni T, Raskind AB, Gerszten R, Robbins J, Ilkayeva 
O, Muehlbauer MJ, Newgard CB, Ashley EA, Wheeler MT, Jimenez-Morales D, Raja A, Dalton KP, Zhen J, Suk Kim Y, Christle JW, Marwaha S, Chin ET, Hershman SG, Hastie T, Tibshirani R, Rivas MA. Molecular Transducers of Physical Activity Consortium (MoTrPAC): Mapping the Dynamic Responses to Exercise. Cell 2020; 181:1464-1474. [DOI: 10.1016/j.cell.2020.06.004] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 03/05/2020] [Revised: 05/19/2020] [Accepted: 06/01/2020] [Indexed: 12/31/2022]
9
Al-Harazi O, El Allali A, Colak D. Biomolecular Databases and Subnetwork Identification Approaches of Interest to Big Data Community: An Expert Review. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2020; 23:138-151. [PMID: 30883301 DOI: 10.1089/omi.2018.0205] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022]
Abstract
Next-generation sequencing approaches and genome-wide studies have become essential for characterizing the mechanisms of human diseases. Consequently, many researchers have applied these approaches to discover the genetic/genomic causes of common complex and rare human diseases, generating multiomics big data that span the continuum of genomics, proteomics, metabolomics, and many other system science fields. Therefore, there is a significant and unmet need for biological databases and tools that enable and empower the researchers to analyze, integrate, and make sense of big data. There are currently large number of databases that offer different types of biological information. In particular, the integration of gene expression profiles and protein-protein interaction networks provides a deeper understanding of the complex multilayered molecular architecture of human diseases. Therefore, there has been a growing interest in developing methodologies that integrate and contextualize big data from molecular interaction networks to identify biomarkers of human diseases at a subnetwork resolution as well. In this expert review, we provide a comprehensive summary of most popular biomolecular databases for molecular interactions (e.g., Biological General Repository for Interaction Datasets, Kyoto Encyclopedia of Genes and Genomes and Search Tool for The Retrieval of Interacting Genes/Proteins), gene-disease associations (e.g., Online Mendelian Inheritance in Man, Disease-Gene Network, MalaCards), and population-specific databases (e.g., Human Genetic Variation Database), and describe some examples of their usage and potential applications. We also present the most recent subnetwork identification approaches and discuss their main advantages and limitations. As the field of data science continues to emerge, the present analysis offers a deeper and contextualized understanding of the available databases in molecular biomedicine.
Affiliation(s)
- Olfat Al-Harazi
  - Department of Biostatistics, Epidemiology, and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
  - Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Achraf El Allali
  - Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Dilek Colak
  - Department of Biostatistics, Epidemiology, and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
10
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. AN INTERNATIONAL JOURNAL ON INFORMATION FUSION 2019; 50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 222] [Impact Index Per Article: 44.4] [Indexed: 05/10/2023]
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Affiliation(s)
- Marinka Zitnik
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Francis Nguyen
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
- Bo Wang
  - Hikvision Research Institute, Santa Clara, CA, USA
- Jure Leskovec
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
- Anna Goldenberg
  - Genetics & Genome Biology, SickKids Research Institute, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Vector Institute, Toronto, ON, Canada
- Michael M. Hoffman
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
  - Princess Margaret Cancer Centre, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Vector Institute, Toronto, ON, Canada
11
Ramšak Ž, Coll A, Stare T, Tzfadia O, Baebler Š, Van de Peer Y, Gruden K. Network Modeling Unravels Mechanisms of Crosstalk between Ethylene and Salicylate Signaling in Potato. PLANT PHYSIOLOGY 2018; 178:488-499. [PMID: 29934298 PMCID: PMC6130022 DOI: 10.1104/pp.18.00450] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 05/01/2018] [Accepted: 06/09/2018] [Indexed: 05/25/2023]
Abstract
To develop novel crop breeding strategies, it is crucial to understand the mechanisms underlying the interaction between plants and their pathogens. Network modeling represents a powerful tool that can unravel properties of complex biological systems. In this study, we aimed to use network modeling to better understand immune signaling in potato (Solanum tuberosum). For this, we first built on a reliable Arabidopsis (Arabidopsis thaliana) immune signaling model, extending it with the information from diverse publicly available resources. Next, we translated the resulting prior knowledge network (20,012 nodes and 70,091 connections) to potato and superimposed it with an ensemble network inferred from time-resolved transcriptomics data for potato. We used different network modeling approaches to generate specific hypotheses of potato immune signaling mechanisms. An interesting finding was the identification of a string of molecular events illuminating the ethylene pathway modulation of the salicylic acid pathway through Nonexpressor of PR Genes1 gene expression. Functional validations confirmed this modulation, thus supporting the potential of our integrative network modeling approach for unraveling molecular mechanisms in complex systems. In addition, this approach can ultimately result in improved breeding strategies for potato and other sensitive crops.
Collapse
Affiliation(s)
- Živa Ramšak
- National Institute of Biology, Department of Biotechnology and Systems Biology, 1000 Ljubljana, Slovenia
| | - Anna Coll
- National Institute of Biology, Department of Biotechnology and Systems Biology, 1000 Ljubljana, Slovenia
| | - Tjaša Stare
- National Institute of Biology, Department of Biotechnology and Systems Biology, 1000 Ljubljana, Slovenia
| | - Oren Tzfadia
- Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
| | - Špela Baebler
- National Institute of Biology, Department of Biotechnology and Systems Biology, 1000 Ljubljana, Slovenia
| | - Yves Van de Peer
- Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- Genomics Research Institute, University of Pretoria, Pretoria 0028, South Africa
| | - Kristina Gruden
- National Institute of Biology, Department of Biotechnology and Systems Biology, 1000 Ljubljana, Slovenia
| |
|
12
|
Amar D, Izraeli S, Shamir R. Utilizing somatic mutation data from numerous studies for cancer research: proof of concept and applications. Oncogene 2017; 36:3375-3383. [PMID: 28092680 PMCID: PMC5485176 DOI: 10.1038/onc.2016.489] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 08/12/2016] [Revised: 11/20/2016] [Accepted: 11/22/2016] [Indexed: 02/07/2023]
Abstract
Large cancer projects measure somatic mutations in thousands of samples, gradually assembling a catalog of recurring mutations in cancer. Many methods analyze these data jointly with auxiliary information with the aim of identifying subtype-specific results. Here, we show that somatic gene mutations alone can reliably and specifically predict cancer subtypes. Interpretation of the classifiers provides useful insights for several biomedical applications. We analyze the COSMIC database, which collects somatic mutations from The Cancer Genome Atlas (TCGA) as well as from many smaller scale studies. We use multi-label classification techniques and the Disease Ontology hierarchy in order to identify cancer subtype-specific biomarkers. Cancer subtype classifiers based on TCGA and the smaller studies have comparable performance, and the smaller studies add substantial value in terms of validation, coverage of additional subtypes, and improved classification. The gene sets of the classifiers contribute in three ways. First, we refine the associations of genes to cancer subtypes and identify novel compelling candidate driver genes. Second, using our classifiers we successfully predict the primary site of metastatic samples. Third, we provide novel hypotheses regarding detection of subtype-specific synthetic lethality interactions. From the cancer research community perspective, our results suggest that curation efforts, such as COSMIC, have great added and complementary value even in the era of large international cancer projects.
Affiliation(s)
- D Amar
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
| | - S Izraeli
- Department of Pediatric Hematology-Oncology, Safra Children’s Hospital, Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Sackler School of Medicine, Tel Aviv University, Tel-Aviv, Israel
| | - R Shamir
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
| |
|
13
|
Raghow R. An 'Omics' Perspective on Cardiomyopathies and Heart Failure. Trends Mol Med 2016; 22:813-827. [PMID: 27499035 DOI: 10.1016/j.molmed.2016.07.007] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 05/06/2016] [Revised: 07/15/2016] [Accepted: 07/15/2016] [Indexed: 12/27/2022]
Abstract
Pathological enlargement of the heart, represented by hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM), occurs in response to many genetic and non-genetic factors. The clinical course of cardiac hypertrophy is remarkably variable, ranging from lifelong absence of symptoms to rapidly declining heart function and sudden cardiac death (SCD). Unbiased omics studies have begun to provide a glimpse into the molecular framework underpinning altered mechanotransduction, mitochondrial energetics, oxidative stress, and extracellular matrix in the heart undergoing physiological and pathological hypertrophy. Omics analyses indicate that post-transcriptional regulation of gene expression plays an overriding role in the normal and diseased heart. Studies to date highlight a need for more effective bioinformatics to better integrate patient omics data with their comprehensive clinical histories.
Affiliation(s)
- Rajendra Raghow
- Department of Pharmacology, College of Medicine, The University of Tennessee Health Science Center and the VA Medical Center, Memphis, TN 38104, USA.
| |
|
14
|
Peng J, Wang T, Hu J, Wang Y, Chen J. Constructing Networks of Organelle Functional Modules in Arabidopsis. Curr Genomics 2016; 17:427-438. [PMID: 28479871 PMCID: PMC5320545 DOI: 10.2174/1389202917666160726151048] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 03/01/2015] [Revised: 05/30/2015] [Accepted: 06/05/2015] [Indexed: 11/22/2022] Open
Abstract
With the rapid accumulation of gene expression data, gene functional module identification has become a widely used approach in functional analysis. However, tools to identify organelle functional modules and analyze their relationships are still missing. We present a soft thresholding approach to construct networks of functional modules using gene expression datasets, in which nodes are strongly co-expressed genes that encode proteins residing in the same subcellular localization, and links represent strong inter-module connections. Our algorithm has three steps. First, we identify functional modules by analyzing gene expression data. Next, we use a self-adaptive approach to construct a mixed network of functional modules and genes. Finally, we link functional modules that are tightly connected in the mixed network. Analysis of experimental data from Arabidopsis demonstrates that our approach is effective in improving the interpretability of high-throughput transcriptomic data and inferring function of unknown genes.
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, P.R. China; Department of Energy Plant Research Lab, Michigan State University, East Lansing, USA
| | - Tao Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, P.R. China
| | - Jianping Hu
- Department of Energy Plant Research Lab, Michigan State University, East Lansing, USA; Department of Plant Biology, Michigan State University, East Lansing, USA
| | - Yadong Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, P.R. China
| | - Jin Chen
- Department of Energy Plant Research Lab, Michigan State University, East Lansing, USA; Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
| |
|
15
|
Aguiar-Pulido V, Huang W, Suarez-Ulloa V, Cickovski T, Mathee K, Narasimhan G. Metagenomics, Metatranscriptomics, and Metabolomics Approaches for Microbiome Analysis. Evol Bioinform Online 2016; 12:5-16. [PMID: 27199545 PMCID: PMC4869604 DOI: 10.4137/ebo.s36436] [Citation(s) in RCA: 148] [Impact Index Per Article: 18.5] [Received: 12/01/2015] [Revised: 01/26/2016] [Accepted: 01/31/2016] [Indexed: 01/21/2023] Open
Abstract
Microbiomes are ubiquitous and are found in the ocean, the soil, and in/on other living organisms. Changes in the microbiome can impact the health of the environmental niche in which they reside. In order to learn more about these communities, different approaches based on data from multiple omics have been pursued. Metagenomics produces a taxonomical profile of the sample, metatranscriptomics helps us to obtain a functional profile, and metabolomics completes the picture by determining which byproducts are being released into the environment. Although each approach provides valuable information separately, we show that, when combined, they paint a more comprehensive picture. We conclude with a review of network-based approaches as applied to integrative studies, which we believe hold the key to an in-depth understanding of microbiomes.
Affiliation(s)
- Vanessa Aguiar-Pulido
- Bioinformatics Research Group (BioRG), School of Computing and Information Sciences, Florida International University, Miami, FL, USA
| | - Wenrui Huang
- Bioinformatics Research Group (BioRG), School of Computing and Information Sciences, Florida International University, Miami, FL, USA
| | - Victoria Suarez-Ulloa
- Chromatin Structure and Evolution Group (Chromevol), Department of Biological Sciences, Florida International University, Miami, FL, USA
| | - Trevor Cickovski
- Bioinformatics Research Group (BioRG), School of Computing and Information Sciences, Florida International University, Miami, FL, USA; Department of Computer Science, Eckerd College, St. Petersburg, FL, USA
| | - Kalai Mathee
- Biomolecular Sciences Institute, Florida International University, Miami, FL, USA; Herbert Wertheim College of Medicine, Florida International University, Miami, FL, USA; Global Health Consortium, Florida International University, Miami, FL, USA
| | - Giri Narasimhan
- Bioinformatics Research Group (BioRG), School of Computing and Information Sciences, Florida International University, Miami, FL, USA; Biomolecular Sciences Institute, Florida International University, Miami, FL, USA
| |
|
16
|
Van Landeghem S, Van Parys T, Dubois M, Inzé D, Van de Peer Y. Diffany: an ontology-driven framework to infer, visualise and analyse differential molecular networks. BMC Bioinformatics 2016; 17:18. [PMID: 26729218 PMCID: PMC4700732 DOI: 10.1186/s12859-015-0863-y] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 09/15/2015] [Accepted: 12/17/2015] [Indexed: 11/30/2022] Open
Abstract
Background: Differential networks have recently been introduced as a powerful way to study the dynamic rewiring capabilities of an interactome in response to changing environmental conditions or stimuli. Currently, such differential networks are generated and visualised using ad hoc methods, and are often limited to the analysis of only one condition-specific response or one interaction type at a time. Results: In this work, we present a generic, ontology-driven framework to infer, visualise and analyse an arbitrary set of condition-specific responses against one reference network. To this end, we have implemented novel ontology-based algorithms that can process highly heterogeneous networks, accounting for both physical interactions and regulatory associations, symmetric and directed edges, edge weights and negation. We propose this integrative framework as a standardised methodology that allows a unified view on differential networks and promotes comparability between differential network studies. As an illustrative application, we demonstrate its usefulness on a plant abiotic stress study and we experimentally confirmed a predicted regulator. Availability: Diffany is freely available as an open-source Java library and Cytoscape plugin from http://bioinformatics.psb.ugent.be/supplementary_data/solan/diffany/. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0863-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sofie Van Landeghem
- Department of Plant Systems Biology, VIB, Technologiepark 927, Ghent, 9052, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, Ghent, 9052, Belgium.
| | - Thomas Van Parys
- Department of Plant Systems Biology, VIB, Technologiepark 927, Ghent, 9052, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, Ghent, 9052, Belgium.
| | - Marieke Dubois
- Department of Plant Systems Biology, VIB, Technologiepark 927, Ghent, 9052, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, Ghent, 9052, Belgium.
| | - Dirk Inzé
- Department of Plant Systems Biology, VIB, Technologiepark 927, Ghent, 9052, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, Ghent, 9052, Belgium.
| | - Yves Van de Peer
- Department of Plant Systems Biology, VIB, Technologiepark 927, Ghent, 9052, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, Ghent, 9052, Belgium; Genomics Research Institute, University of Pretoria, Private bag X200028, Pretoria, South Africa.
| |
|
17
|
Al-Harazi O, Al Insaif S, Al-Ajlan MA, Kaya N, Dzimiri N, Colak D. Integrated Genomic and Network-Based Analyses of Complex Diseases and Human Disease Network. J Genet Genomics 2015; 43:349-67. [PMID: 27318646 DOI: 10.1016/j.jgg.2015.11.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 06/11/2015] [Revised: 10/22/2015] [Accepted: 11/20/2015] [Indexed: 12/16/2022]
Abstract
A disease phenotype generally reflects various pathobiological processes that interact in a complex network. The highly interconnected nature of the human protein interaction network (interactome) indicates that, at the molecular level, it is difficult to consider diseases as being independent of one another. Recently, genome-wide molecular measurements, data mining and bioinformatics approaches have provided the means to explore human diseases from a molecular basis. The exploration of diseases and a system of disease relationships based on the integration of genome-wide molecular data with the human interactome could offer a powerful perspective for understanding the molecular architecture of diseases. Recently, subnetwork markers have proven to be more robust and reliable than individual biomarker genes selected based on gene expression profiles alone, and achieve higher accuracy in disease classification. We have applied one of these methodologies to idiopathic dilated cardiomyopathy (IDCM) data that we have generated using a microarray and identified significant subnetworks associated with the disease. In this paper, we review the recent endeavours in this direction, and summarize the existing methodologies and computational tools for network-based analysis of complex diseases, of molecular relationships among apparently different disorders, and of the human disease network. We also discuss the future research trends and topics of this promising field.
Affiliation(s)
- Olfat Al-Harazi
- Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
| | - Sadiq Al Insaif
- Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
| | - Monirah A Al-Ajlan
- Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia; College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia
| | - Namik Kaya
- Department of Genetics, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
| | - Nduna Dzimiri
- Department of Genetics, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
| | - Dilek Colak
- Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia.
| |
|
18
|
Ou-Yang L, Dai DQ, Zhang XF. Detecting Protein Complexes from Signed Protein-Protein Interaction Networks. IEEE/ACM Trans Comput Biol Bioinform 2015; 12:1333-1344. [PMID: 26671805 DOI: 10.1109/tcbb.2015.2401014] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 06/05/2023]
Abstract
Identification of protein complexes is fundamental for understanding the cellular functional organization. With the accumulation of physical protein-protein interaction (PPI) data, computational detection of protein complexes from available PPI networks has drawn considerable attention. While most of the existing protein complex detection algorithms focus on analyzing the physical protein-protein interaction network, none of them take into account the "signs" (i.e., activation-inhibition relationships) of physical interactions. As the "signs" of interactions reflect the way proteins communicate, considering the "signs" of interactions can not only increase the accuracy of protein complex identification, but also deepen our understanding of the mechanisms of cell functions. In this study, we proposed a novel Signed Graph regularized Nonnegative Matrix Factorization (SGNMF) model to identify protein complexes from signed PPI networks. In our experiments, we compared the results collected by our model on signed PPI networks with those predicted by the state-of-the-art complex detection techniques on the original unsigned PPI networks. We observed that considering the "signs" of interactions significantly benefits the detection of protein complexes. Furthermore, based on the predicted complexes, we predicted a set of signed complex-complex interactions for each dataset, which provides novel insight into the higher-level organization of the cell. All the experimental results and code can be downloaded from http://mail.sysu.edu.cn/home/stsddq@mail.sysu.edu.cn/dai/others/SGNMF.zip.
|
19
|
Singh H, Khan AA, Dinner AR. Gene regulatory networks in the immune system. Trends Immunol 2014; 35:211-8. [DOI: 10.1016/j.it.2014.03.006] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Received: 03/25/2014] [Revised: 03/28/2014] [Accepted: 03/28/2014] [Indexed: 01/09/2023]
|