1. Vaparanta K, Merilahti JAM, Ojala VK, Elenius K. De Novo Multi-Omics Pathway Analysis Designed for Prior Data Independent Inference of Cell Signaling Pathways. Mol Cell Proteomics 2024; 23:100780. [PMID: 38703893 PMCID: PMC11259815 DOI: 10.1016/j.mcpro.2024.100780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 04/07/2024] [Accepted: 04/30/2024] [Indexed: 05/06/2024]
Abstract
New tools are needed for inferring cell signaling pathways from multi-omics data independently of prior knowledge. Here, we propose a new de novo method, de novo multi-omics pathway analysis (DMPA), to model and combine omics data into network modules and pathways. DMPA was validated with published omics data and was found to accurately recover reported molecular associations in transcriptome, interactome, phosphoproteome, methylome, and metabolome data, as well as signaling pathways in multi-omics data. In benchmarks against module discovery and multi-omics integration methods, DMPA outperformed previous methods in module and pathway discovery, especially on datasets with relatively small sample sizes. Algorithms were devised to predict transcription factors for transcriptome modules, kinases for phosphoproteome modules, and subcellular location and function for interactome modules and pathways. To apply DMPA in a biologically relevant context, interactome, phosphoproteome, transcriptome, and proteome data were collected from melanoma cells to address the gamma-secretase cleavage-dependent signaling characteristics of the receptor tyrosine kinase TYRO3. The function predicted from the pathways modeled with DMPA, including its direction, was reflected in validation experiments.
Affiliation(s)
- Katri Vaparanta
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland; Medicity Research Laboratories, University of Turku, Turku, Finland; Institute of Biomedicine, University of Turku, Turku, Finland
- Johannes A M Merilahti
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland; Medicity Research Laboratories, University of Turku, Turku, Finland; Institute of Biomedicine, University of Turku, Turku, Finland
- Veera K Ojala
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland; Medicity Research Laboratories, University of Turku, Turku, Finland; Institute of Biomedicine, University of Turku, Turku, Finland
- Klaus Elenius
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland; Medicity Research Laboratories, University of Turku, Turku, Finland; Institute of Biomedicine, University of Turku, Turku, Finland; Department of Oncology, Turku University Hospital, Turku, Finland
2. Acharya D, Mukhopadhyay A. A comprehensive review of machine learning techniques for multi-omics data integration: challenges and applications in precision oncology. Brief Funct Genomics 2024:elae013. [PMID: 38600757 DOI: 10.1093/bfgp/elae013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 03/12/2024] [Accepted: 03/22/2024] [Indexed: 04/12/2024]
Abstract
Multi-omics data play a crucial role in precision medicine, particularly for understanding the diverse biological interactions between omics layers. Machine learning approaches have been used extensively in this context over the years. This review summarizes and categorizes these advances, focusing on the integration of multi-omics data, including genomics, transcriptomics, proteomics, and metabolomics, together with clinical data. We discuss machine learning techniques and computational methods used to integrate distinct omics datasets and provide insights into their application. The review emphasizes both the challenges and the opportunities in multi-omics data integration, precision medicine, and patient stratification, and offers practical recommendations for method selection in various scenarios. Recent advances in deep learning and network-based approaches are also explored, highlighting their potential to harmonize diverse layers of biological information. Additionally, we present a roadmap for integrating multi-omics data in precision oncology, outlining its advantages, challenges, and implementation difficulties. This review thus offers a thorough overview of the current literature and provides researchers with insights into machine learning techniques for patient stratification, particularly in precision oncology. Contact: anirban@klyuniv.ac.in.
Affiliation(s)
- Debabrata Acharya
- Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
- Anirban Mukhopadhyay
- Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
3. Goddard TR, Brookes KJ, Sharma R, Moemeni A, Rajkumar AP. Dementia with Lewy Bodies: Genomics, Transcriptomics, and Its Future with Data Science. Cells 2024; 13:223. [PMID: 38334615 PMCID: PMC10854541 DOI: 10.3390/cells13030223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 01/17/2024] [Accepted: 01/23/2024] [Indexed: 02/10/2024]
Abstract
Dementia with Lewy bodies (DLB) is a significant public health issue. It is the second most common neurodegenerative dementia and presents with severe neuropsychiatric symptoms. Genomic and transcriptomic analyses have provided some insight into its pathology. Variants within SNCA, GBA, APOE, SNCB, and MAPT have been associated with DLB in repeated genomic studies. Transcriptomic analysis, conducted predominantly on candidate genes, has identified signatures of synuclein aggregation, protein degradation, amyloid deposition, neuroinflammation, mitochondrial dysfunction, and upregulation of heat-shock proteins in DLB. Yet understanding of DLB molecular pathology remains incomplete. This contributes to the current clinical situation, in which no disease-modifying treatments or blood-based diagnostic biomarkers are available. Data science methods have the potential to improve disease understanding and to optimise therapeutic intervention and drug development, thereby reducing disease burden. Genomic prediction will facilitate early identification of cases and timely application of future disease-modifying treatments. Transcript-level analyses across the entire transcriptome and machine learning analysis of multi-omic data will uncover novel signatures that may provide clues to DLB pathology and improve drug development. This review discusses the current genomic and transcriptomic understanding of DLB, highlights gaps in the literature, and describes data science methods that may advance the field.
Affiliation(s)
- Thomas R. Goddard
- Mental Health and Clinical Neurosciences Academic Unit, Institute of Mental Health, School of Medicine, University of Nottingham, Nottingham NG7 2TU, UK
- Keeley J. Brookes
- Department of Biosciences, School of Science & Technology, Nottingham Trent University, Nottingham NG11 8NS, UK
- Riddhi Sharma
- Biodiscovery Institute, School of Medicine, University of Nottingham, Nottingham NG7 2RD, UK
- UK Health Security Agency, Radiation Effects Department, Radiation Protection Science Division, Harwell Science Campus, Didcot, Oxfordshire OX11 0RQ, UK
- Armaghan Moemeni
- School of Computer Science, University of Nottingham, Nottingham NG8 1BB, UK
- Anto P. Rajkumar
- Mental Health and Clinical Neurosciences Academic Unit, Institute of Mental Health, School of Medicine, University of Nottingham, Nottingham NG7 2TU, UK
4. Lingjærde C, Richardson S. StabJGL: a stability approach to sparsity and similarity selection in multiple-network reconstruction. Bioinform Adv 2023; 3:vbad185. [PMID: 38152341 PMCID: PMC10751232 DOI: 10.1093/bioadv/vbad185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 11/23/2023] [Accepted: 12/18/2023] [Indexed: 12/29/2023]
Abstract
Motivation: In recent years, network models have gained prominence for their ability to capture complex associations. In statistical omics, networks can be used to model and study the functional relationships between genes, proteins, and other types of omics data. If a Gaussian graphical model is assumed, a gene association network can be determined from the non-zero entries of the inverse covariance matrix of the data. Because such problems are high-dimensional, integrative methods that leverage similarities between multiple graphical structures have become increasingly popular. The joint graphical lasso is a powerful tool for this purpose; however, the AIC-based selection criterion currently used to tune the network sparsities and similarities performs poorly in high-dimensional settings. Results: We propose stabJGL, which equips the joint graphical lasso with a stable, well-performing penalty parameter selection approach that combines the notion of model stability with likelihood-based similarity selection. The resulting method makes the joint graphical lasso usable in omics settings and outperforms the standard joint graphical lasso, as well as state-of-the-art joint methods, on all performance measures considered. Applying stabJGL to proteomic data from a pan-cancer study, we demonstrate its potential for novel discoveries. Availability and implementation: A user-friendly R package for stabJGL, with tutorials, is available on GitHub at https://github.com/Camiling/stabJGL.
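The Gaussian graphical model idea in the abstract above (network edges correspond to non-zero entries of the inverse covariance matrix) can be illustrated with a minimal Python sketch. It assumes many more samples than features, so the sample covariance can simply be inverted; it is not the joint graphical lasso or stabJGL, which add sparsity and similarity penalties and stability-based tuning. The helper names, the 0.1 threshold, and the simulated four-gene chain network are illustrative assumptions.

```python
import numpy as np


def partial_correlations(data: np.ndarray) -> np.ndarray:
    """Partial correlation matrix from the inverse sample covariance.

    data: array of shape (n_samples, n_features). Assumes n_samples is much
    larger than n_features, so no sparsity penalty is needed to invert.
    """
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.inv(cov)
    d = np.sqrt(np.diag(precision))
    # Partial correlation: -Theta_ij / sqrt(Theta_ii * Theta_jj).
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def network_edges(pcor: np.ndarray, threshold: float) -> set[tuple[int, int]]:
    """Edges (i, j), i < j, whose absolute partial correlation exceeds threshold."""
    p = pcor.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(pcor[i, j]) > threshold}


# Simulated chain network 0-1-2-3: tridiagonal precision matrix.
rng = np.random.default_rng(0)
p = 4
theta = 2.0 * np.eye(p)
for i in range(p - 1):
    theta[i, i + 1] = theta[i + 1, i] = -0.8
data = rng.multivariate_normal(np.zeros(p), np.linalg.inv(theta), size=5000)

edges = network_edges(partial_correlations(data), threshold=0.1)
print(sorted(edges))
```

With 5,000 samples the recovered edges should match the three true links of the chain. In high-dimensional omics settings the sample covariance is singular, which is why penalized estimators such as the (joint) graphical lasso are used instead of direct inversion.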
Affiliation(s)
- Camilla Lingjærde
- MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, United Kingdom
- Sylvia Richardson
- MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, United Kingdom
5. Vandemoortele B, Vermeirssen V. Molecular systems biology approaches to investigate mechanisms of gut-brain communication in neurological diseases. Eur J Neurol 2023; 30:3622-3632. [PMID: 37038632 DOI: 10.1111/ene.15819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 04/03/2023] [Accepted: 04/05/2023] [Indexed: 04/12/2023]
Abstract
BACKGROUND: Although the incidence of neurological diseases is increasing worldwide, treatment remains mostly limited to symptom management. The gut-brain axis, which encompasses the communication routes between the microbiota, gut, and brain, has emerged as a crucial area of investigation for identifying new preventive and therapeutic targets in neurological disease. METHODS: Because of the inter-organ, systemic nature of the gut-brain axis and the multitude of biomolecules and microbial species involved, molecular systems biology approaches are required to investigate the mechanisms of gut-brain communication accurately. High-throughput omics profiling, combined with computational methods such as dimensionality reduction or clustering, machine learning, network inference, and genome-scale metabolic models, enables the discovery of novel biomarkers and provides mechanistic insight. RESULTS: In this review, the general concepts of experimental and computational methodologies for gut-brain axis research are introduced and their applications are discussed, mainly in human cohorts. Further highlighted are rational study design, sampling procedures, and data modalities relevant to gut-brain communication, the strengths and limitations of methodological approaches, and some future perspectives. CONCLUSION: Multi-omics analyses, together with advanced data mining, are essential to functionally characterize the gut-brain axis and put forward novel preventive or therapeutic strategies in neurological disease.
Affiliation(s)
- Boris Vandemoortele
- Laboratory for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Vanessa Vermeirssen
- Laboratory for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
6. Can H, Chanumolu SK, Nielsen BD, Alvarez S, Naldrett MJ, Ünlü G, Otu HH. Integration of Meta-Multi-Omics Data Using Probabilistic Graphs and External Knowledge. Cells 2023; 12:1998. [PMID: 37566077 PMCID: PMC10417344 DOI: 10.3390/cells12151998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 07/11/2023] [Accepted: 08/02/2023] [Indexed: 08/12/2023]
Abstract
Multi-omics has the promise to provide a detailed molecular picture of biological systems. Although obtaining multi-omics data is relatively easy, methods that analyze such data have been lagging. In this paper, we present an algorithm that uses probabilistic graph representations and external knowledge to perform optimal structure learning and deduce a multifarious interaction network for multi-omics data from a bacterial community. Kefir grain, a microbial community that ferments milk and creates kefir, represents a self-renewing, stable, natural microbial community. Kefir has been shown to have a wide range of health benefits. We obtained a controlled bacterial community using the two most abundant and well-studied species in kefir grains: Lentilactobacillus kefiri and Lactobacillus kefiranofaciens. We applied growth temperatures of 30 °C and 37 °C and obtained transcriptomic, metabolomic, and proteomic data for the same 20 samples (10 samples per temperature). We obtained a multi-omics interaction network, which generated insights that would not have been possible with single-omics analysis. We identified interactions among transcripts, proteins, and metabolites, suggesting active toxin/antitoxin systems. We also observed multifarious interactions that involved the shikimate pathway. These observations helped explain bacterial adaptation to different stress conditions, co-aggregation, and increased activation of L. kefiranofaciens at 37 °C.
Affiliation(s)
- Handan Can
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Sree K. Chanumolu
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Barbara D. Nielsen
- Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID 83844, USA
- Sophie Alvarez
- Proteomics and Metabolomics Facility, Nebraska Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Michael J. Naldrett
- Proteomics and Metabolomics Facility, Nebraska Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Gülhan Ünlü
- Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID 83844, USA
- Department of Chemical and Biological Engineering, University of Idaho, Moscow, ID 83844, USA
- School of Food Science, Washington State University, Pullman, WA 99164, USA
- Hasan H. Otu
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
7. O'Connor LM, O'Connor BA, Lim SB, Zeng J, Lo CH. Integrative multi-omics and systems bioinformatics in translational neuroscience: A data mining perspective. J Pharm Anal 2023; 13:836-850. [PMID: 37719197 PMCID: PMC10499660 DOI: 10.1016/j.jpha.2023.06.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/29/2022] [Revised: 06/20/2023] [Accepted: 06/25/2023] [Indexed: 09/19/2023]
Abstract
Bioinformatic analysis of large and complex omics datasets has become increasingly useful in modern biology by providing great depth of information; its application to neuroscience is termed neuroinformatics. Data mining of omics datasets has enabled the generation of new hypotheses based on differentially regulated biological molecules associated with disease mechanisms, which can be tested experimentally to improve diagnostic and therapeutic targeting of neurodegenerative diseases. Importantly, integrating multi-omics data using a systems bioinformatics approach will advance understanding of the layered, interactive network of biological regulation that exchanges systemic information, facilitating the development of a comprehensive human brain profile. In this review, we first summarize data mining studies that use datasets from individual types of omics analysis, including epigenetics/epigenomics, transcriptomics, proteomics, metabolomics, lipidomics, and spatial omics, pertaining to Alzheimer's disease, Parkinson's disease, and multiple sclerosis. We then discuss multi-omics integration approaches, including independent biological integration and unsupervised integration methods, for more intuitive and informative interpretation of biological data obtained across different omics layers. We further assess studies that integrate multi-omics data in data mining, which provide complex biological insights and offer proof of concept for systems bioinformatics in the reconstruction of brain networks. Finally, we recommend combining high-dimensional bioinformatics analysis with experimental validation to achieve translational neuroscience applications, including biomarker discovery, therapeutic development, and elucidation of disease mechanisms.
We conclude by providing future perspectives and opportunities for applying integrative multi-omics and systems bioinformatics to achieve precision phenotyping of neurodegenerative diseases and to move towards personalized medicine.
Affiliation(s)
- Lance M. O'Connor
- College of Biological Sciences, University of Minnesota, Minneapolis, MN, 55455, USA
- Blake A. O'Connor
- School of Pharmacy, University of Wisconsin, Madison, WI, 53705, USA
- Su Bin Lim
- Department of Biochemistry and Molecular Biology, Ajou University School of Medicine, Suwon, 16499, South Korea
- Jialiu Zeng
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
- Chih Hung Lo
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
8. Benkirane H, Pradat Y, Michiels S, Cournède PH. CustOmics: A versatile deep-learning based strategy for multi-omics integration. PLoS Comput Biol 2023; 19:e1010921. [PMID: 36877736 PMCID: PMC10019780 DOI: 10.1371/journal.pcbi.1010921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 03/16/2023] [Accepted: 02/04/2023] [Indexed: 03/07/2023]
Abstract
The availability of patient cohorts with several types of omics data opens new perspectives for exploring the biological processes underlying disease and for developing predictive models. It also brings new challenges in computational biology: integrating high-dimensional, heterogeneous data in a way that captures the interrelationships between multiple genes and their functions. Deep learning methods offer promising perspectives for integrating multi-omics data. In this paper, we review existing integration strategies based on autoencoders and propose a new customizable strategy based on a two-phase approach. In the first phase, training is adapted to each data source independently; cross-modality interactions are then learned in the second phase. By taking each source's singularity into account, we show that this approach exploits all the sources more efficiently than other strategies. Moreover, by adapting our architecture to the computation of Shapley additive explanations (SHAP), our model can provide interpretable results in a multi-source setting. Using multiple omics sources from different TCGA cohorts, we demonstrate the performance of the proposed method for cancer on several tasks, such as classification of tumor types and breast cancer subtypes, and survival outcome prediction. Our experiments show strong performance of the architecture on seven datasets of various sizes, and we provide some interpretations of the results obtained. Our code is available at https://github.com/HakimBenkirane/CustOmics.
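The two-phase idea described above (encode each source separately, then learn cross-source structure) can be sketched with linear autoencoders. Under squared reconstruction error an optimal linear autoencoder spans the same subspace as principal component analysis, so an SVD stands in for gradient training here. This is a simplified illustration under that assumption, not the CustOmics architecture, which uses nonlinear autoencoders and SHAP-based interpretation; the function names, latent sizes, and random data are hypothetical.

```python
import numpy as np


def fit_linear_encoder(x: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Fit a k-dimensional linear encoder; returns (feature means, components).

    The top-k right singular vectors of the centered data span the subspace
    that an optimal linear autoencoder (squared error) would learn.
    """
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:k]


def encode(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    return (x - mean) @ components.T


def two_phase_embedding(sources: dict[str, np.ndarray],
                        k_source: int, k_joint: int) -> np.ndarray:
    # Phase 1: fit one encoder per omics source, independently of the others.
    latents = []
    for x in sources.values():
        mean, comps = fit_linear_encoder(x, k_source)
        z = encode(x, mean, comps)
        # Rescale each source's code so no single source dominates phase 2.
        latents.append(z / z.std())
    # Phase 2: learn cross-source structure from the concatenated codes.
    joint = np.hstack(latents)
    mean, comps = fit_linear_encoder(joint, k_joint)
    return encode(joint, mean, comps)


rng = np.random.default_rng(0)
n_patients = 50
sources = {
    "rna": rng.normal(size=(n_patients, 200)),
    "protein": rng.normal(size=(n_patients, 80)),
    "methylation": rng.normal(size=(n_patients, 120)),
}
embedding = two_phase_embedding(sources, k_source=10, k_joint=5)
print(embedding.shape)  # (50, 5)
```

Rescaling each source's code before concatenation is one simple way to keep a wide source (here, 200 RNA features) from dominating the joint phase; the paper's own balancing strategy may differ.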
Affiliation(s)
- Hakim Benkirane
- Université Paris-Saclay, CentraleSupélec, Lab of Mathematics and Informatics (MICS), Gif-sur-Yvette, France
- Oncostat U1018, Inserm, Université Paris-Saclay, Équipe Labellisée Ligue Contre le Cancer, CESP, Villejuif, France
- Yoann Pradat
- Université Paris-Saclay, CentraleSupélec, Lab of Mathematics and Informatics (MICS), Gif-sur-Yvette, France
- Stefan Michiels
- Oncostat U1018, Inserm, Université Paris-Saclay, Équipe Labellisée Ligue Contre le Cancer, CESP, Villejuif, France
- Bureau de Biostatistique et d’Épidémiologie, Gustave Roussy, Université Paris-Saclay, Villejuif, France
- Paul-Henry Cournède
- Université Paris-Saclay, CentraleSupélec, Lab of Mathematics and Informatics (MICS), Gif-sur-Yvette, France
9. Chen X, Han M, Li Y, Li X, Zhang J, Zhu Y. Identification of functional gene modules by integrating multi-omics data and known molecular interactions. Front Genet 2023; 14:1082032. [PMID: 36760999 PMCID: PMC9902936 DOI: 10.3389/fgene.2023.1082032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 01/11/2023] [Indexed: 01/25/2023]
Abstract
Multi-omics data integration has emerged as a promising approach to identifying patient subgroups. However, when grouping genes (or gene products) into co-expression modules, data integration methods suffer from two main drawbacks. First, most existing methods consider only genes or samples measured in all datasets. Second, they cannot use known molecular interactions (e.g., transcriptional regulatory interactions, protein-protein interactions, and biological pathways) to assist in module detection. Herein, we present a novel data integration framework, Correlation-based Local Approximation of Membership (CLAM), which provides two methodological innovations to address these limitations: 1) constructing a trans-omics neighborhood matrix by integrating multi-omics datasets and known molecular interactions, and 2) using a local approximation procedure to define gene modules from the matrix. Applying CLAM to human colorectal cancer (CRC) and mouse B-cell differentiation multi-omics data obtained from The Cancer Genome Atlas (TCGA), the Clinical Proteomic Tumor Analysis Consortium (CPTAC), Gene Expression Omnibus (GEO), and the ProteomeXchange database, we demonstrated its superior ability to recover biologically relevant modules and gene ontology (GO) terms. Further investigation of the CRC modules revealed numerous transcription factors and KEGG pathways that play crucial roles in CRC progression. Module-based survival analysis constructed four survival-related networks in which pairwise gene correlations were significantly associated with CRC patient survival. Overall, these evaluations demonstrate the great potential of CLAM for identifying modular biomarkers for complex diseases.
We implemented CLAM as a user-friendly application available at https://github.com/free1234hm/CLAM.
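A rough Python sketch of a trans-omics neighborhood matrix in the spirit of the CLAM abstract: pairwise absolute correlations are averaged only over the datasets in which both genes were measured, so genes need not appear in every dataset, and known interactions add a prior weight. The averaging and additive-prior scheme, the parameter values, and the use of connected components in place of CLAM's local approximation procedure are all assumptions for illustration, not the published algorithm.

```python
from itertools import combinations

import numpy as np


def neighborhood_matrix(datasets, genes, prior_edges, prior_weight=0.3):
    """Hypothetical trans-omics neighborhood matrix (not CLAM's exact definition).

    datasets: list of dicts, one per omics dataset, mapping gene -> 1-D profile.
              Genes need not be measured in every dataset, and sample counts
              may differ between datasets.
    prior_edges: set of frozenset({gene_a, gene_b}) known interactions.
    """
    idx = {g: i for i, g in enumerate(genes)}
    total = np.zeros((len(genes), len(genes)))
    count = np.zeros_like(total)
    for data in datasets:
        measured = [g for g in genes if g in data]
        for a, b in combinations(measured, 2):
            r = abs(np.corrcoef(data[a], data[b])[0, 1])
            i, j = idx[a], idx[b]
            total[i, j] += r
            total[j, i] += r
            count[i, j] += 1
            count[j, i] += 1
    # Average only over the datasets in which both genes were measured.
    nbr = np.divide(total, count, out=np.zeros_like(total), where=count > 0)
    # Known interactions raise the neighborhood score (capped at 1).
    for edge in prior_edges:
        a, b = sorted(edge)
        i, j = idx[a], idx[b]
        nbr[i, j] = nbr[j, i] = min(1.0, nbr[i, j] + prior_weight)
    return nbr


def modules(nbr, genes, threshold):
    """Connected components of the thresholded matrix (a stand-in step)."""
    seen, result = set(), []
    for start in range(len(genes)):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(genes[i])
            for j in np.flatnonzero(nbr[i] > threshold):
                j = int(j)
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        result.append(sorted(members))
    return result


# Toy data: RNA (30 samples) and protein (20 samples) measure different genes.
rng = np.random.default_rng(1)
s1, s2 = rng.normal(size=(2, 30))
rna = {g: base + 0.1 * rng.normal(size=30)
       for g, base in [("A", s1), ("B", s1), ("C", s2), ("D", s2)]}
p1 = rng.normal(size=20)
protein = {"A": rng.normal(size=20),
           "D": p1 + 0.1 * rng.normal(size=20),
           "E": p1 + 0.1 * rng.normal(size=20)}
genes = ["A", "B", "C", "D", "E", "F"]  # F is only linked via prior knowledge
nbr = neighborhood_matrix([rna, protein], genes,
                          prior_edges={frozenset({"A", "F"})}, prior_weight=0.9)
print(modules(nbr, genes, threshold=0.8))
```

In this toy example, C and E are never measured together, yet they should land in one module through D, and the unmeasured gene F should join A's module only because of the known interaction. This mirrors the two limitations the abstract says CLAM addresses.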
Affiliation(s)
- Xiaoqing Chen
- Basic Medical School, Anhui Medical University, Hefei, China; National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing, China
- Mingfei Han
- National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing, China
- Yingxing Li
- Central Research Laboratory, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xiao Li
- National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing, China
- Jiaqi Zhang
- National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing, China
- Yunping Zhu
- Basic Medical School, Anhui Medical University, Hefei, China; National Center for Protein Sciences (Beijing), Beijing Proteome Research Center, Beijing Institute of Lifeomics, Beijing, China
- Correspondence: Yunping Zhu
10. Niranjan V, Uttarkar A, Kaul A, Varghese M. A Machine Learning-Based Approach Using Multi-omics Data to Predict Metabolic Pathways. Methods Mol Biol 2023; 2553:441-452. [PMID: 36227554 DOI: 10.1007/978-1-0716-2617-7_19] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
Integrative approaches are continuously evolving to provide accurate insights from data obtained through experiments on various biological systems. Multi-omics data can be integrated with predictive machine learning algorithms to provide highly accurate results. This protocol chapter describes the steps required to apply ML-based multi-omics integration methods to biological datasets for analysis, and to interpret the results visually.
Affiliation(s)
- Vidya Niranjan
- Department of Biotechnology, R V College of Engineering, Mysuru Road, Kengeri, Bengaluru, India
- Akshay Uttarkar
- Department of Biotechnology, R V College of Engineering, Mysuru Road, Kengeri, Bengaluru, India
- Aakaanksha Kaul
- Department of Biotechnology, R V College of Engineering, Mysuru Road, Kengeri, Bengaluru, India
- Maryanne Varghese
- Department of Biotechnology, R V College of Engineering, Mysuru Road, Kengeri, Bengaluru, India
11. Galindez G, Sadegh S, Baumbach J, Kacprowski T, List M. Network-based approaches for modeling disease regulation and progression. Comput Struct Biotechnol J 2022; 21:780-795. [PMID: 36698974 PMCID: PMC9841310 DOI: 10.1016/j.csbj.2022.12.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/15/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022]
Abstract
Molecular interaction networks lay the foundation for studying how biological functions are controlled by the complex interplay of genes and proteins. Investigating perturbed processes using biological networks has been instrumental in uncovering mechanisms that underlie complex disease phenotypes. Rapid advances in omics technologies have prompted the generation of high-throughput datasets, enabling large-scale, network-based analyses. Consequently, various modeling techniques, including network enrichment, differential network extraction, and network inference, have proven to be useful for gaining new mechanistic insights. We provide an overview of recent network-based methods and their core ideas to facilitate the discovery of disease modules or candidate mechanisms. Knowledge generated from these computational efforts will benefit biomedical research, especially drug development and precision medicine. We further discuss current challenges and provide perspectives in the field, highlighting the need for more integrative and dynamic network approaches to model disease development and progression.
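Differential network extraction, one of the modeling techniques mentioned above, can be illustrated with a simple correlation-based variant: compute gene-gene correlations in two conditions and flag pairs whose Fisher z-transformed correlations differ significantly. This is a minimal sketch of one common approach, not a method taken from the review; the function name, the |z| > 4 cutoff, and the simulated data are assumptions.

```python
import numpy as np


def differential_edges(x1: np.ndarray, x2: np.ndarray, z_cutoff: float = 4.0):
    """Return (i, j, z) for gene pairs whose correlation differs between conditions.

    x1, x2: arrays of shape (n_samples, n_genes) sharing the same gene columns.
    """
    n1, n2 = x1.shape[0], x2.shape[0]
    r1 = np.corrcoef(x1, rowvar=False)
    r2 = np.corrcoef(x2, rowvar=False)
    # Fisher z-transform; clipping avoids infinities on the diagonal (r = 1).
    z1 = np.arctanh(np.clip(r1, -0.999999, 0.999999))
    z2 = np.arctanh(np.clip(r2, -0.999999, 0.999999))
    # Standard error of the difference of two independent Fisher z values.
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    n_genes = z.shape[0]
    return [(i, j, float(z[i, j]))
            for i in range(n_genes) for j in range(i + 1, n_genes)
            if abs(z[i, j]) > z_cutoff]


rng = np.random.default_rng(42)
n = 200
# Condition 1: genes 0 and 1 are co-regulated; gene 2 is independent.
x1 = rng.normal(size=(n, 3))
x1[:, 1] = x1[:, 0] + 0.3 * rng.normal(size=n)
# Condition 2: all three genes are independent (the 0-1 link is lost).
x2 = rng.normal(size=(n, 3))

for i, j, z in differential_edges(x1, x2):
    print(f"genes {i}-{j}: z = {z:.1f}")
```

Only the 0-1 pair should be reported, since it is the only link that changes between conditions. Published differential network methods often use partial correlations or multiple-testing correction instead of a fixed z cutoff.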
Affiliation(s)
- Gihanna Galindez
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Sepideh Sadegh
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany; Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany; Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
12. Agamah FE, Bayjanov JR, Niehues A, Njoku KF, Skelton M, Mazandu GK, Ederveen THA, Mulder N, Chimusa ER, 't Hoen PAC. Computational approaches for network-based integrative multi-omics analysis. Front Mol Biosci 2022; 9:967205. [PMID: 36452456 PMCID: PMC9703081 DOI: 10.3389/fmolb.2022.967205] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 06/12/2022] [Accepted: 10/20/2022] [Indexed: 08/27/2023]
Abstract
Advances in omics technologies allow for holistic studies of biological systems. These studies rely on integrative data analysis techniques to obtain a comprehensive view of the dynamics of cellular processes and molecular mechanisms. Network-based integrative approaches have revolutionized multi-omics analysis by providing a framework to represent interactions between multiple omics layers in a graph, which may faithfully reflect the molecular wiring in a cell. Here we review network-based multi-omics/multi-modal integrative analytical approaches. We classify these approaches according to the type of omics data supported, the methods and/or algorithms implemented, their node and/or edge weighting components, and their ability to identify key nodes and subnetworks. We show how these approaches can be used to identify biomarkers, disease subtypes, crosstalk, causality, and molecular drivers of physiological and pathological mechanisms. Using the aetiology and treatment of COVID-19 as a showcase, we provide insight into the most appropriate methods and tools for research questions that can be informed by multi-omics data integration. We conclude with an overview of challenges associated with network-based multi-omics analysis, such as reproducibility, heterogeneity, and (biological) interpretability of the results, and we highlight some future directions for network-based integration.
Affiliation(s)
- Francis E. Agamah
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Jumamurat R. Bayjanov
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Anna Niehues
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Kelechi F. Njoku
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Michelle Skelton
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Gaston K. Mazandu
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- African Institute for Mathematical Sciences, Cape Town, South Africa
- Thomas H. A. Ederveen
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Nicola Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Emile R. Chimusa
- Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Newcastle, United Kingdom
- Peter A. C. 't Hoen
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
13
Raufaste-Cazavieille V, Santiago R, Droit A. Multi-omics analysis: Paving the path toward achieving precision medicine in cancer treatment and immuno-oncology. Front Mol Biosci 2022; 9:962743. [PMID: 36304921 PMCID: PMC9595279 DOI: 10.3389/fmolb.2022.962743] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 06/06/2022] [Accepted: 09/21/2022] [Indexed: 11/13/2022]
Abstract
The acceleration of large-scale sequencing and the progress in high-throughput computational analyses, defined as omics, was a hallmark for the comprehension of the biological processes in human health and diseases. In cancerology, the omics approach, initiated by genomics and transcriptomics studies, has revealed an incredible complexity with unsuspected molecular diversity within the same tumor type as well as spatial and temporal heterogeneity of tumors. The integration of multiple biological layers of omics studies brought oncology to a new paradigm, from tumor site classification to pan-cancer molecular classification, offering new therapeutic opportunities for precision medicine. In this review, we will provide a comprehensive overview of the latest innovations for multi-omics integration in oncology and summarize the largest multi-omics datasets available for adult and pediatric cancers. We will present multi-omics techniques for characterizing cancer biology and show how multi-omics data can be combined with clinical data for the identification of prognostic and treatment-specific biomarkers, opening the way to personalized therapy. To conclude, we will detail the newest strategies for dissecting the tumor immune environment and host–tumor interaction. We will explore the advances in immunomics and microbiomics for biomarker identification to guide therapeutic decisions in immuno-oncology.
Affiliation(s)
- Raoul Santiago
- CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Division of Pediatric Hematology-Oncology, Centre Hospitalier Universitaire de L’Université Laval, Charles Bruneau Cancer Center, Québec, QC, Canada
- *Correspondence: Raoul Santiago; Arnaud Droit
- Arnaud Droit
- CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- *Correspondence: Raoul Santiago; Arnaud Droit
14
Zhao N, Quicksall Z, Asmann YW, Ren Y. Network approaches for omics studies of neurodegenerative diseases. Front Genet 2022; 13:984338. [PMID: 36186441 PMCID: PMC9523597 DOI: 10.3389/fgene.2022.984338] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/01/2022] [Accepted: 08/31/2022] [Indexed: 11/13/2022]
Abstract
The recent methodological advances in multi-omics approaches, including genomic, transcriptomic, metabolomic, lipidomic, and proteomic, have revolutionized the research field by generating “big data” which greatly enhanced our understanding of the molecular complexity of the brain and disease states. Network approaches have been routinely applied to single-omics data to provide critical insight into disease biology. Furthermore, multi-omics integration has emerged as both a vital need and a new direction to connect the different layers of information underlying disease mechanisms. In this review article, we summarize popular network analytic approaches for single-omics data and multi-omics integration and discuss how these approaches have been utilized in studying neurodegenerative diseases.
Affiliation(s)
- Na Zhao
- Department of Neuroscience, Mayo Clinic, Jacksonville, FL, United States
- Zachary Quicksall
- Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville, FL, United States
- Yan W. Asmann
- Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville, FL, United States
- Yingxue Ren
- Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville, FL, United States
- *Correspondence: Yingxue Ren
15
Loers JU, Vermeirssen V. SUBATOMIC: a SUbgraph BAsed mulTi-OMIcs clustering framework to analyze integrated multi-edge networks. BMC Bioinformatics 2022; 23:363. [PMID: 36064320 PMCID: PMC9442970 DOI: 10.1186/s12859-022-04908-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/17/2022] [Accepted: 08/24/2022] [Indexed: 11/02/2022]
Abstract
BACKGROUND Representing the complex interplay between different types of biomolecules across different omics layers in multi-omics networks bears great potential to gain a deep mechanistic understanding of gene regulation and disease. However, multi-omics networks easily grow into giant hairball structures that hamper biological interpretation. Module detection methods can decompose these networks into smaller interpretable modules. However, these methods are not adapted to multi-omics data, nor do they consider topological features. When deriving very large modules or ignoring the broader network context, interpretability remains limited. To address these issues, we developed a SUbgraph BAsed mulTi-OMIcs Clustering framework (SUBATOMIC), which infers small and interpretable modules with a specific topology while keeping track of connections to other modules and regulators. RESULTS SUBATOMIC groups specific molecular interactions in composite network subgraphs of two and three nodes and clusters them into topological modules. These are functionally annotated, visualized and overlaid with expression profiles to go from static to dynamic modules. To preserve the larger network context, SUBATOMIC statistically investigates the connections between modules as well as between modules and regulators such as miRNAs and transcription factors. We applied SUBATOMIC to analyze a composite Homo sapiens network containing transcription factor-target gene, miRNA-target gene, protein-protein, homologous and co-functional interactions from different databases. We derived and annotated 5586 modules with diverse topological, functional and regulatory properties. We created novel functional hypotheses for unannotated genes. Furthermore, we integrated modules with condition-specific expression data to study the influence of hypoxia in three cancer cell lines.
We developed two prioritization strategies to identify the most relevant modules in specific biological contexts: one considering GO term enrichments and one calculating an activity score reflecting the degree of differential expression. Both strategies yielded modules specifically reacting to low oxygen levels. CONCLUSIONS We developed the SUBATOMIC framework that generates interpretable modules from integrated multi-omics networks and applied it to hypoxia in cancer. SUBATOMIC can infer and contextualize modules, explore condition or disease specific modules, identify regulators and functionally related modules, and derive novel gene functions for uncharacterized genes. The software is available at https://github.com/CBIGR/SUBATOMIC .
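To make the subgraph step in the SUBATOMIC abstract concrete, the Python sketch below enumerates connected three-node subgraphs in a small multi-edge network and tallies them by edge-type composition. It is a minimal illustration under stated assumptions, not SUBATOMIC's implementation: the names `build_network` and `connected_triads` and the toy interaction list are hypothetical, two-node subgraphs are simply the edges themselves, and the subsequent clustering into topological modules is not shown.

```python
# Hypothetical helpers: a sketch of subgraph enumeration, not SUBATOMIC's code.
from collections import Counter, defaultdict
from itertools import combinations


def build_network(interactions):
    """Map each unordered node pair to the set of interaction types linking it."""
    edges = defaultdict(set)
    for u, v, kind in interactions:
        edges[frozenset((u, v))].add(kind)
    return edges


def connected_triads(edges):
    """Yield (nodes, signature) for every connected three-node subgraph.

    A node triple is connected when at least two of its three pairs are
    linked; the signature is the sorted tuple of edge-type tuples on the
    linked pairs, so triads with the same composition share a signature.
    Brute force over all triples, so only suitable for small networks.
    """
    nodes = sorted({n for pair in edges for n in pair})
    for trio in combinations(nodes, 3):
        linked = [
            tuple(sorted(edges[frozenset(pair)]))
            for pair in combinations(trio, 2)
            if frozenset(pair) in edges
        ]
        if len(linked) >= 2:
            yield trio, tuple(sorted(linked))


# Toy composite network: protein-protein (PPI), TF-target and miRNA-target edges.
interactions = [("A", "B", "PPI"), ("A", "C", "PPI"),
                ("B", "C", "TF"), ("C", "D", "miRNA")]
edges = build_network(interactions)
triads = list(connected_triads(edges))
census = Counter(signature for _, signature in triads)
for signature, count in census.items():
    print(signature, count)
```

Grouping triads by signature is only a first, crude way of collecting subgraphs with the same topology and interaction types; a real framework would go on to merge overlapping subgraphs into modules.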
Affiliation(s)
- Jens Uwe Loers
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Vanessa Vermeirssen
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
16
Wang X, Wen Y. A penalized linear mixed model with generalized method of moments for prediction analysis on high-dimensional multi-omics data. Brief Bioinform 2022; 23:6596990. [PMID: 35649346 PMCID: PMC9310531 DOI: 10.1093/bib/bbac193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Revised: 03/18/2022] [Accepted: 04/27/2022] [Indexed: 11/13/2022]
Abstract
With the advances in high-throughput biotechnologies, high-dimensional multi-layer omics data have become increasingly available. They can provide both confirmatory and complementary information on disease risk and thus have offered unprecedented opportunities for risk prediction studies. However, the high-dimensionality and complex inter/intra-relationships among multi-omics data have brought tremendous analytical challenges. Here we present a computationally efficient penalized linear mixed model with generalized method of moments estimator (MpLMMGMM) for the prediction analysis on multi-omics data. Our method extends the widely used linear mixed model proposed for genomic risk predictions to model multi-omics data, where kernel functions are used to capture various types of predictive effects from different layers of omics data and penalty terms are introduced to reduce the impact of noise. Compared with existing penalized linear mixed models, the proposed method adopts the generalized method of moments estimator and it is much more computationally efficient. Through extensive simulation studies and the analysis of positron emission tomography imaging outcomes, we have demonstrated that MpLMMGMM can simultaneously consider a large number of variables and efficiently select those that are predictive from the corresponding omics layers. It can capture both linear and nonlinear predictive effects and achieves better prediction performance than competing methods.
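The moment-based estimation behind MpLMMGMM can be illustrated in its simplest form. The sketch below assumes a centred outcome y with Cov(y) = σ_g²K + σ_e²I for a single linear kernel K = XXᵀ/p, and matches the observed quadratic forms yᵀKy and yᵀy to their expectations, σ_g²tr(K²) + σ_e²tr(K) and σ_g²tr(K) + σ_e²n. Multiple omics layers, nonlinear kernels, moment weighting, and the penalty terms of the published method are all omitted, and the helper names are hypothetical.

```python
# Hypothetical helpers: unpenalized, single-kernel method of moments,
# a simplification and not the MpLMMGMM estimator itself.


def linear_kernel(X):
    """Linear kernel K = X X^T / p for an n x p feature matrix given as rows."""
    p = len(X[0])
    return [[sum(a * b for a, b in zip(xi, xj)) / p for xj in X] for xi in X]


def quad_form(y, A):
    """Return y^T A y."""
    n = len(y)
    return sum(y[i] * A[i][j] * y[j] for i in range(n) for j in range(n))


def mom_variance_components(y, K):
    """Method-of-moments estimates of (sigma_g^2, sigma_e^2).

    Solves E[y'Ky] = sigma_g^2 tr(KK) + sigma_e^2 tr(K) together with
    E[y'y] = sigma_g^2 tr(K) + sigma_e^2 n, replacing expectations by the
    observed quadratic forms. Estimates are unconstrained and can fall
    below zero on small samples.
    """
    n = len(y)
    tr_k = sum(K[i][i] for i in range(n))
    tr_kk = sum(K[i][j] * K[j][i] for i in range(n) for j in range(n))
    q_k = quad_form(y, K)
    q_i = sum(v * v for v in y)
    det = tr_kk * n - tr_k * tr_k  # zero only when K is proportional to I
    if abs(det) < 1e-12:
        raise ValueError("K is proportional to I; components are not identifiable")
    sigma_g2 = (q_k * n - tr_k * q_i) / det
    sigma_e2 = (tr_kk * q_i - tr_k * q_k) / det
    return sigma_g2, sigma_e2


# Toy data: three samples, two omics features, centred outcome.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, -2.0, 1.0]
K = linear_kernel(X)
sigma_g2, sigma_e2 = mom_variance_components(y, K)
print(sigma_g2, sigma_e2)
```

On this three-sample toy the genetic component comes out negative, which is exactly the small-sample behaviour the docstring warns about; with one kernel per omics layer the 2x2 system simply grows by one row and column per kernel.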
Affiliation(s)
- Xiaqiong Wang
- Department of Statistics, University of Auckland, 38 Princes Street, 1010, Auckland, New Zealand
- Yalu Wen
- Department of Statistics, University of Auckland, 38 Princes Street, 1010, Auckland, New Zealand
17
Hesami M, Alizadeh M, Jones AMP, Torkamaneh D. Machine learning: its challenges and opportunities in plant system biology. Appl Microbiol Biotechnol 2022; 106:3507-3530. [PMID: 35575915 DOI: 10.1007/s00253-022-11963-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 03/14/2022] [Revised: 03/14/2022] [Accepted: 05/07/2022] [Indexed: 12/25/2022]
Abstract
Sequencing technologies are evolving at a rapid pace, enabling the generation of massive amounts of data in multiple dimensions (e.g., genomics, epigenomics, transcriptomics, metabolomics, proteomics, and single-cell omics) in plants. To provide comprehensive insights into the complexity of plant biological systems, it is important to integrate different omics datasets. Although recent advances in computational analytical pipelines have enabled efficient and high-quality exploration and exploitation of single omics data, the integration of multidimensional, heterogeneous, and large datasets (i.e., multi-omics) remains a challenge. In this regard, machine learning (ML) offers promising approaches to integrate large datasets and to recognize fine-grained patterns and relationships. Nevertheless, they require rigorous optimizations to process multi-omics-derived datasets. In this review, we discuss the main concepts of machine learning as well as the key challenges and solutions related to the big data derived from plant system biology. We also provide in-depth insight into the principles of data integration using ML, as well as challenges and opportunities in different contexts including multi-omics, single-cell omics, protein function, and protein-protein interaction. KEY POINTS: • The key challenges and solutions related to the big data derived from plant system biology have been highlighted. • Different methods of data integration have been discussed. • Challenges and opportunities of the application of machine learning in plant system biology have been highlighted and discussed.
Affiliation(s)
- Mohsen Hesami
- Department of Plant Agriculture, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Milad Alizadeh
- Department of Botany, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Davoud Torkamaneh
- Département de Phytologie, Université Laval, Québec City, QC, G1V 0A6, Canada
- Institut de Biologie Intégrative Et Des Systèmes (IBIS), Université Laval, Québec City, QC, G1V 0A6, Canada
18
Pouryahya M, Oh JH, Javanmard P, Mathews JC, Belkhatir Z, Deasy JO, Tannenbaum AR. aWCluster: A Novel Integrative Network-Based Clustering of Multiomics for Subtype Analysis of Cancer Data. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1472-1483. [PMID: 33226952 PMCID: PMC9518829 DOI: 10.1109/tcbb.2020.3039511] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
The remarkable growth of multi-platform genomic profiles has led to the challenge of multiomics data integration. In this study, we present a novel network-based multiomics clustering founded on the Wasserstein distance from optimal mass transport. This distance has many important geometric properties making it a suitable choice for application in machine learning and clustering. Our proposed method of aggregating multiomics and Wasserstein distance clustering (aWCluster) is applied to breast carcinoma as well as bladder carcinoma, colorectal adenocarcinoma, renal carcinoma, lung non-small cell adenocarcinoma, and endometrial carcinoma from The Cancer Genome Atlas project. Subtypes were characterized by the concordant effect of mRNA expression, DNA copy number alteration, and DNA methylation of genes and their neighbors in the interaction network. aWCluster successfully clusters all cancer types into classes with significantly different survival rates. Also, a gene ontology enrichment analysis of significant genes in the low survival subgroup of breast cancer leads to the well-known phenomenon of tumor hypoxia and the transcription factor ETS1 whose expression is induced by hypoxia. We believe aWCluster has the potential to discover novel subtypes and biomarkers by accentuating the genes that have concordant multiomics measurements in their interaction network, which are challenging to find without the network inference or with single omics analysis.
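As a rough illustration of the distance aWCluster is built on, the sketch below computes the 1-Wasserstein (earth mover's) distance between two histograms on the same ordered, unit-spaced bins, using the fact that in one dimension it equals the summed absolute difference of the two cumulative distributions. This is an assumption-laden simplification: aWCluster compares distributions defined over an interaction network rather than a line, and the function name `wasserstein_1d` is hypothetical.

```python
# Hypothetical helper: 1-D earth mover's distance, a simplification of the
# network-based Wasserstein distance used by aWCluster.


def wasserstein_1d(p, q):
    """1-Wasserstein distance between two histograms on shared unit-spaced bins.

    In one dimension W1 equals the sum over bins of |F(i) - G(i)|, where F and
    G are the cumulative distributions of the normalised histograms.
    """
    if len(p) != len(q):
        raise ValueError("histograms must use the same bins")
    total_p, total_q = sum(p), sum(q)
    cdf_gap = 0.0
    distance = 0.0
    for a, b in zip(p, q):
        cdf_gap += a / total_p - b / total_q  # running F(i) - G(i)
        distance += abs(cdf_gap)
    return distance


# Pairwise distances between three toy sample profiles.
profiles = {"s1": [1, 0, 0], "s2": [0, 0, 1], "s3": [2, 2, 0]}
names = sorted(profiles)
for i, u in enumerate(names):
    for v in names[i + 1:]:
        print(u, v, wasserstein_1d(profiles[u], profiles[v]))
```

A matrix of such pairwise distances between samples is the kind of input a standard hierarchical clustering routine would take to propose subtypes.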
19
Vahabi N, Michailidis G. Unsupervised Multi-Omics Data Integration Methods: A Comprehensive Review. Front Genet 2022; 13:854752. [PMID: 35391796 PMCID: PMC8981526 DOI: 10.3389/fgene.2022.854752] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 01/14/2022] [Accepted: 02/28/2022] [Indexed: 12/26/2022]
Abstract
Through the developments of Omics technologies and dissemination of large-scale datasets, such as those from The Cancer Genome Atlas, Alzheimer’s Disease Neuroimaging Initiative, and Genotype-Tissue Expression, it is becoming increasingly possible to study complex biological processes and disease mechanisms more holistically. However, to obtain a comprehensive view of these complex systems, it is crucial to integrate data across various Omics modalities, and also leverage external knowledge available in biological databases. This review aims to provide an overview of multi-Omics data integration methods with different statistical approaches, focusing on unsupervised learning tasks, including disease onset prediction, biomarker discovery, disease subtyping, module discovery, and network/pathway analysis. We also briefly review feature selection methods, multi-Omics data sets, and resources/tools that constitute critical components for carrying out the integration.
Affiliation(s)
- Nasim Vahabi
- Informatics Institute, University of Florida, Gainesville, FL, United States
- George Michailidis
- Informatics Institute, University of Florida, Gainesville, FL, United States
20
Abstract
DNA microarrays are widely used to investigate gene expression. Even though the classical analysis of microarray data is based on the study of differentially expressed genes, it is well known that genes do not act individually. Network analysis can be applied to study association patterns of the genes in a biological system. Moreover, it finds wide application in differential coexpression analysis between different systems. Network-based coexpression studies have, for example, been used in (complex) disease gene prioritization, disease subtyping, and patient stratification. In this chapter we provide an overview of the methods and tools used to create networks from microarray data and describe multiple methods for analyzing a single network or a group of networks. The described methods range from topological metrics and functional group identification to data integration strategies, topological pathway analysis, and graphical models.
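A coexpression network of the kind described in this chapter can be sketched in a few lines: correlate every pair of gene profiles across samples, keep pairs whose absolute Pearson correlation passes a threshold, then read off a simple topological metric such as node degree. The threshold of 0.9, the toy expression values, and the function names are illustrative assumptions; constant profiles would make the correlation undefined and are not handled here.

```python
# Hypothetical helpers: hard-threshold coexpression network, a minimal sketch.
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_network(expr, threshold=0.9):
    """Keep gene pairs whose absolute correlation across samples reaches threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


def degrees(edges, genes):
    """Node degree, the simplest topological metric of the network."""
    deg = {g: 0 for g in genes}
    for g1, g2 in edges:
        deg[g1] += 1
        deg[g2] += 1
    return deg


# Toy expression matrix: genes x four microarray samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
}
edges = coexpression_network(expr)
print(edges)
print(degrees(edges, expr))
```

Here g4 correlates with the other genes at only |r| = 0.8, so it stays isolated; keeping the signed correlation on each edge preserves the distinction between positive and negative coexpression.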
Affiliation(s)
- Alisa Pavel
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Luca Cattelani
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
21
Tripp BA, Otu HH. Integration of Multi-Omics Data Using Probabilistic Graph Models and External Knowledge. Curr Bioinform 2022. [DOI: 10.2174/1574893616666210906141545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background: High-throughput sequencing technologies have revolutionized the ability to perform systems-level biology and elucidate molecular mechanisms of disease through the comprehensive characterization of different layers of biological information. Integration of these heterogeneous layers can provide insight into the underlying biology but is challenged by modeling complex interactions.
Objective: We introduce OBaNK: omics integration using Bayesian networks and external knowledge, an algorithm to model interactions between heterogeneous high-dimensional biological data to elucidate complex functional clusters and emergent relationships associated with an observed phenotype.
Method: Using Bayesian network learning, we modeled the statistical dependencies and interactions between lipidomics, proteomics, and metabolomics data. The strength of a learned interaction between molecules was altered based on external knowledge.
Results: Networks learned from synthetic datasets based on real pathways achieved an average area under the curve score of ~0.85, an improvement of ~0.23 from baseline methods. When applied to real multi-omics data collected during pregnancy, five distinct functional networks of heterogeneous biological data were identified, and the results were compared to other multi-omics integration approaches.
Conclusion: OBaNK successfully improved the accuracy of learning interaction networks from data integrating external knowledge, identified heterogeneous functional networks from real data, and suggested potential novel interactions associated with the phenotype. These findings can guide future hypothesis generation. OBaNK source code is available at: https://github.com/bridgettripp/OBaNK.git, and a graphical user interface is available at: http://otulab.unl.edu/OBaNK.
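The area-under-the-curve figure in the OBaNK results refers to how well learned edges recover a known network. A minimal sketch of that kind of evaluation, assuming each candidate edge has a score and a reference edge set is available, computes ROC AUC as the probability that a randomly chosen true edge outscores a randomly chosen non-edge, with ties counted as one half. The edge scores below are made up for illustration and `edge_recovery_auc` is a hypothetical name; how OBaNK itself scores edges and folds in external knowledge is not reproduced here.

```python
# Hypothetical helper: ROC AUC for network recovery (Mann-Whitney form).


def edge_recovery_auc(scores, true_edges):
    """ROC AUC of edge scores against a reference edge set.

    Equals the probability that a randomly chosen true edge scores higher
    than a randomly chosen non-edge, counting ties as one half.
    """
    pos = [s for edge, s in scores.items() if edge in true_edges]
    neg = [s for edge, s in scores.items() if edge not in true_edges]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores for directed candidate edges (parent, child).
scores = {("A", "B"): 0.9, ("B", "C"): 0.7, ("A", "C"): 0.4, ("C", "D"): 0.7}
reference = {("A", "B"), ("C", "D")}
print(edge_recovery_auc(scores, reference))  # 0.875
```

The pairwise loop is quadratic in the number of edges, which is fine for small synthetic networks; for large candidate sets a rank-based formula gives the same value faster.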
Affiliation(s)
- Bridget A. Tripp
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- PhD Program of Complex Biosystems, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Hasan H. Otu
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
22
Decaesteker B, Durinck K, Van Roy N, De Wilde B, Van Neste C, Van Haver S, Roberts S, De Preter K, Vermeirssen V, Speleman F. From DNA Copy Number Gains and Tumor Dependencies to Novel Therapeutic Targets for High-Risk Neuroblastoma. J Pers Med 2021; 11:1286. [PMID: 34945759 PMCID: PMC8707517 DOI: 10.3390/jpm11121286] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/11/2021] [Revised: 11/19/2021] [Accepted: 11/20/2021] [Indexed: 12/15/2022]
Abstract
Neuroblastoma is a pediatric tumor arising from the sympatho-adrenal lineage and a worldwide leading cause of childhood cancer-related deaths. About half of high-risk patients die from the disease while survivors suffer from multiple therapy-related side-effects. While neuroblastomas present with a low mutational burden, focal and large segmental DNA copy number aberrations are highly recurrent and associated with poor survival. It can be assumed that the affected chromosomal regions contain critical genes implicated in neuroblastoma biology and behavior. More specifically, evidence has emerged that several of these genes are implicated in tumor dependencies thus potentially providing novel therapeutic entry points. In this review, we briefly review the current status of recurrent DNA copy number aberrations in neuroblastoma and provide an overview of the genes affected by these genomic variants for which a direct role in neuroblastoma has been established. Several of these genes are implicated in networks that positively regulate MYCN expression or stability as well as cell cycle control and apoptosis. Finally, we summarize alternative approaches to identify and prioritize candidate copy-number driven dependency genes for neuroblastoma offering novel therapeutic opportunities.
Grants
- P30 CA008748 NCI NIH HHS
- G087221N, G.0507.12, G049720N, 12U4718N, 11C3921N, 11J8313N, 12B5313N, 1514215N, 1197617N, 1238420N, 12Q8322N, 3F018519, 12N6917N Fund for Scientific Research Flanders
- 2018-087, 2018-125, 2020-112 Belgian Foundation against Cancer
Affiliation(s)
- Bieke Decaesteker
- Department for Biomolecular Medicine, Ghent University, Medical Research Building (MRB1), Corneel Heymanslaan 10, B-9000 Ghent, Belgium
- Kaat Durinck
- Department for Biomolecular Medicine, Ghent University, Medical Research Building (MRB1), Corneel Heymanslaan 10, B-9000 Ghent, Belgium
- Nadine Van Roy
- Department for Biomolecular Medicine, Ghent University, Medical Research Building (MRB1), Corneel Heymanslaan 10, B-9000 Ghent, Belgium
- Bram De Wilde
- Department for Biomolecular Medicine, Ghent University, Medical Research Building (MRB1), Corneel Heymanslaan 10, B-9000 Ghent, Belgium
- Department of Internal Medicine and Pediatrics, Ghent University Hospital, Corneel Heymanslaan 10, B-9000 Ghent, Belgium
- Christophe Van Neste
- Department for Biomolecular Medicine, Ghent University, Medical Research Building (MRB1), Corneel Heymanslaan 10, B-9000 Ghent, Belgium
- Stéphane Van Haver
- Department for Biomolecular Medicine, Ghent University, Medical Research Building (MRB1), Corneel Heymanslaan 10, B-9000 Ghent, Belgium
- Stephen Roberts
- Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Katleen De Preter
- Department for Biomolecular Medicine, Ghent University, Medical Research Building (MRB1), Corneel Heymanslaan 10, B-9000 Ghent, Belgium
- Vanessa Vermeirssen
- Department for Biomolecular Medicine, Ghent University, Medical Research Building (MRB1), Corneel Heymanslaan 10, B-9000 Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Technologiepark 71, B-9052 Zwijnaarde, Belgium
- Frank Speleman
- Department for Biomolecular Medicine, Ghent University, Medical Research Building (MRB1), Corneel Heymanslaan 10, B-9000 Ghent, Belgium
23
Demirel HC, Arici MK, Tuncbag N. Computational approaches leveraging integrated connections of multi-omic data toward clinical applications. Mol Omics 2021; 18:7-18. [PMID: 34734935 DOI: 10.1039/d1mo00158b] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/18/2023]
Abstract
In line with the advances in high-throughput technologies, multiple omic datasets have accumulated to study biological systems and diseases coherently. No single omics data type is capable of fully representing cellular activity. The complexity of the biological processes arises from the interactions between omic entities such as genes, proteins, and metabolites. Therefore, multi-omic data integration is crucial but challenging. The impact of the molecular alterations in multi-omic data is not local in the neighborhood of the altered gene or protein; rather, the impact diffuses in the network and changes the functionality of multiple signaling pathways and regulation of the gene expression. Additionally, multi-omic data is high-dimensional and has background noise. Several integrative approaches have been developed to accurately interpret the multi-omic datasets, including machine learning, network-based methods, and their combination. In this review, we overview the most recent integrative approaches and tools with a focus on network-based methods. We then discuss these approaches according to their specific applications, from disease-network and biomarker identification to patient stratification, drug discovery, and repurposing.
Affiliation(s)
- Habibe Cansu Demirel
- Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
- Muslum Kaan Arici
- Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
- Foot and Mouth Diseases Institute, Ministry of Agriculture and Forestry, Ankara, 06044, Turkey
- Nurcan Tuncbag
- Chemical and Biological Engineering, College of Engineering, Koc University, Istanbul, 34450, Turkey
- School of Medicine, Koc University, Istanbul, 34450, Turkey
- Koc University Research Center for Translational Medicine (KUTTAM), Istanbul, Turkey
24
Temporal Transcriptomics of Gut Escherichia coli in Caenorhabditis elegans Models of Aging. Microbiol Spectr 2021; 9:e0049821. [PMID: 34523995 PMCID: PMC8557943 DOI: 10.1128/spectrum.00498-21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
Host-bacterial interactions over the course of aging are understudied due to complexities of the human microbiome and challenges of collecting samples that span a lifetime. To investigate the role of host-microbial interactions in aging, we performed transcriptomics using wild-type Caenorhabditis elegans (N2) and three long-lived mutants (daf-2, eat-2, and asm-3) fed Escherichia coli OP50 and sampled at days 5, 7.5, and 10 of adulthood. We found host age is a better predictor of the E. coli expression profiles than host genotype. Specifically, host age was associated with clustering (permutational multivariate analysis of variance [PERMANOVA], P = 0.001) and variation (Adonis, P = 0.001, R² = 11.5%) among E. coli expression profiles, whereas host genotype was not (PERMANOVA, P > 0.05; Adonis, P > 0.05, R² = 5.9%). Differential analysis of the E. coli transcriptome yielded 22 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and 100 KEGG genes enriched when samples were grouped by time point [LDA, linear discriminant analysis; log(LDA), ≥2; P ≤ 0.05], including several involved in biofilm formation. Coexpression analysis of host and bacterial genes yielded six modules of C. elegans genes that were coexpressed with one bacterial regulator gene over time. The three most significant bacterial regulators included genes relating to biofilm formation, lipopolysaccharide production, and thiamine biosynthesis. Age was significantly associated with clustering and variation among transcriptomic samples, supporting the idea that microbes are active and plastic within C. elegans throughout life. Coexpression analysis further revealed interactions between E. coli and C. elegans that occurred over time, building on a growing literature of host-microbial interactions. IMPORTANCE Previous research has reported effects of the microbiome on health span and life span of Caenorhabditis elegans, including interactions with evolutionarily conserved pathways in humans.
We build on this literature by reporting the gene expression of Escherichia coli OP50 in wild-type (N2) and three long-lived mutants of C. elegans. The manuscript represents the first study, to our knowledge, to perform temporal host-microbial transcriptomics in the model organism C. elegans. Understanding changes to the microbial transcriptome over time is an important step toward elucidating host-microbial interactions and their potential relationship to aging. We found that age was significantly associated with clustering and variation among transcriptomic samples, supporting the idea that microbes are active and plastic within C. elegans throughout life. Coexpression analysis further revealed interactions between E. coli and C. elegans that occurred over time, which contributes to our growing knowledge about host-microbial interactions.
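The abstract reports PERMANOVA and Adonis results (P and R²) but does not describe how they are computed. The sketch below is a generic one-way PERMANOVA on a precomputed distance matrix, written in Python with NumPy, and is meant only as an illustration. The function name `permanova`, the default of 999 permutations, and the toy input are assumptions and are not taken from the study. "Adonis" is the name of a PERMANOVA function in the R package vegan, and this NumPy version does not reproduce that function. R² is computed here as the among-group share of the total sum of squares.

```python
import numpy as np


def permanova(dist, groups, n_perm=999, seed=0):
    """One-way PERMANOVA on a square distance matrix (hypothetical helper).

    dist   : (n, n) symmetric distance matrix with a zero diagonal
    groups : length-n sequence of labels, e.g. host age or host genotype
    Returns (pseudo_F, p_value, R2).
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    labels = np.asarray(groups)
    n = d2.shape[0]
    levels = np.unique(labels)
    a = len(levels)
    iu = np.triu_indices(n, k=1)

    # Total sum of squares: squared pairwise distances summed over pairs, divided by n
    ss_total = d2[iu].sum() / n

    def within_ss(lab):
        # Within-group sum of squares, computed the same way inside each group
        ss = 0.0
        for g in levels:
            idx = np.flatnonzero(lab == g)
            sub = d2[np.ix_(idx, idx)]
            ss += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
        return ss

    def pseudo_f(lab):
        ss_w = within_ss(lab)
        ss_a = ss_total - ss_w
        return (ss_a / (a - 1)) / (ss_w / (n - a)), ss_a

    f_obs, ss_among = pseudo_f(labels)

    # Null distribution: shuffle labels (group sizes are preserved) and recompute F
    rng = np.random.default_rng(seed)
    n_extreme = sum(
        pseudo_f(rng.permutation(labels))[0] >= f_obs for _ in range(n_perm)
    )
    p_value = (n_extreme + 1) / (n_perm + 1)
    return f_obs, p_value, ss_among / ss_total
```

With 999 permutations, the smallest p-value this procedure can return is 1/(999 + 1) = 0.001. The abstract does not state how many permutations were run, so the text alone cannot tell us whether the reported P = 0.001 sits at such a floor.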
25
Ding J, Blencowe M, Nghiem T, Ha SM, Chen YW, Li G, Yang X. Mergeomics 2.0: a web server for multi-omics data integration to elucidate disease networks and predict therapeutics. Nucleic Acids Res 2021; 49:W375-W387. [PMID: 34048577 PMCID: PMC8262738 DOI: 10.1093/nar/gkab405] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 02/28/2021] [Revised: 04/28/2021] [Accepted: 05/02/2021] [Indexed: 12/13/2022]
Abstract
The Mergeomics web server is a flexible online tool for multi-omics data integration to derive biological pathways, networks, and key drivers important to disease pathogenesis and is based on the open source Mergeomics R package. The web server takes summary statistics of multi-omics disease association studies (GWAS, EWAS, TWAS, PWAS, etc.) as input and features four functions: Marker Dependency Filtering (MDF) to correct for known dependency between omics markers, Marker Set Enrichment Analysis (MSEA) to detect disease relevant biological processes, Meta-MSEA to examine the consistency of biological processes informed by various omics datasets, and Key Driver Analysis (KDA) to identify essential regulators of disease-associated pathways and networks. The web server has been extensively updated and streamlined in version 2.0 including an overhauled user interface, improved tutorials and results interpretation for each analytical step, inclusion of numerous disease GWAS, functional genomics datasets, and molecular networks to allow for comprehensive omics integrations, increased functionality to decrease user workload, and increased flexibility to cater to user-specific needs. Finally, we have incorporated our newly developed drug repositioning pipeline PharmOmics for prediction of potential drugs targeting disease processes that were identified by Mergeomics. Mergeomics is freely accessible at http://mergeomics.research.idre.ucla.edu and does not require login.
Affiliation(s)
- Jessica Ding
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Interdepartmental Program of Molecular, Cellular and Integrative Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Montgomery Blencowe
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Interdepartmental Program of Molecular, Cellular and Integrative Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Thien Nghiem
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Sung-min Ha
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Yen-Wei Chen
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Interdepartmental Program of Molecular Toxicology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Gaoyan Li
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Interdepartmental Program of Molecular, Cellular and Integrative Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Xia Yang
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Interdepartmental Program of Molecular, Cellular and Integrative Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Interdepartmental Program of Molecular Toxicology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Interdepartmental Program of Bioinformatics, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
26
Reel PS, Reel S, Pearson E, Trucco E, Jefferson E. Using machine learning approaches for multi-omics data analysis: A review. Biotechnol Adv 2021; 49:107739. [PMID: 33794304 DOI: 10.1016/j.biotechadv.2021.107739] [Citation(s) in RCA: 243] [Impact Index Per Article: 81.0] [Received: 12/15/2020] [Revised: 03/01/2021] [Accepted: 03/25/2021] [Indexed: 02/06/2023]
Abstract
With the development of modern high-throughput omic measurement platforms, it has become essential for biomedical studies to undertake an integrative (combined) approach to fully utilise these data to gain insights into biological systems. Data from various omics sources such as genetics, proteomics, and metabolomics can be integrated to unravel the intricate working of systems biology using machine learning-based predictive algorithms. Machine learning methods offer novel techniques to integrate and analyse the various omics data enabling the discovery of new biomarkers. These biomarkers have the potential to help in accurate disease prediction, patient stratification and delivery of precision medicine. This review paper explores different integrative machine learning methods which have been used to provide an in-depth understanding of biological systems during normal physiological functioning and in the presence of a disease. It provides insight and recommendations for interdisciplinary professionals who envisage employing machine learning skills in multi-omics studies.
Affiliation(s)
- Parminder S Reel
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Smarti Reel
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Ewan Pearson
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Emanuele Trucco
- VAMPIRE project, Computing, School of Science and Engineering, University of Dundee, Dundee, United Kingdom
- Emily Jefferson
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
27
Lu X, Liu F, Miao Q, Liu P, Gao Y, He K. A novel method to identify gene interaction patterns. BMC Genomics 2021; 22:436. [PMID: 34112093 PMCID: PMC8194229 DOI: 10.1186/s12864-021-07628-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/15/2020] [Accepted: 04/17/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Gene interaction patterns, including modules and motifs, can be used to identify cancer-specific biomarkers and to reveal the mechanism of tumorigenesis. Most existing module network inference methods focus on independent functional patterns of genes, while studies of the overlapping characteristics between modules are lacking. The objective of this study was to reveal the functional overlapping patterns in gene modules, to help elucidate the regulatory relationship between overlapping genes and communities, and to explore cancer formation and progression.
RESULTS We analyzed six cancer datasets from The Cancer Genome Atlas and obtained three kinds of gene functional modules for each cancer: Independent-Community, Dependent-Community and Merged-Community. Across the six cancers, 59 (3.5%) Independent-Communities and 1631 (96.5%) Dependent-Communities were identified. Compared with Lemon-Tree and K-Means, the gene communities identified by our method were enriched in more known GO categories with lower p-values. Meanwhile, the identified distinguishing communities significantly separated patients by survival prognosis in Kaplan-Meier analysis. Furthermore, driver genes identified in the gene communities can be considered as biomarkers that accurately distinguish tumour from normal samples for each cancer type.
CONCLUSIONS Dependent-Communities make up the majority of all identified communities. Our method is more effective than the other two methods, which do not consider the overlapping characteristics of modules. This indicates that overlapping genes are located in different specific functional groups, and a communication bridge is established between the communities to construct a comprehensive carcinogenesis.
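The abstract states that the identified communities separate patients by survival in Kaplan-Meier analysis. For readers who have not met the estimator, here is a minimal pure-Python sketch of the product-limit formula S(t) = ∏ (1 − d_i / n_i), taken over the observed event times. In that formula, d_i is the number of events at time t_i and n_i is the number of patients still at risk. The function name and its inputs are hypothetical. The sketch does not show how two groups' curves are compared (for example, with a log-rank test), and it is not the paper's own code.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate (hypothetical helper).

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every patient whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # S(t) = S(t-) * (1 - d_t / n_t)
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Both events and censored patients leave the risk set after t
        at_risk -= removed
    return curve
```

Censored patients do not lower the curve at their own time. They do shrink the risk set, so any later event causes a larger drop.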
Affiliation(s)
- Xinguo Lu
- College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Road, Changsha, 410082, China
- Fang Liu
- College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Road, Changsha, 410082, China
- Qiumai Miao
- College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Road, Changsha, 410082, China
- Ping Liu
- Hunan Want Want Hospital, Renmin Zhong Road, Changsha, 410006, China
- Yan Gao
- College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Road, Changsha, 410082, China
- Keren He
- College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Road, Changsha, 410082, China
28
multiSLIDE is a web server for exploring connected elements of biological pathways in multi-omics data. Nat Commun 2021; 12:2279. [PMID: 33863886 PMCID: PMC8052434 DOI: 10.1038/s41467-021-22650-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/06/2020] [Accepted: 03/24/2021] [Indexed: 12/12/2022]
Abstract
Quantitative multi-omics data are difficult to interpret and visualize due to large volume of data, complexity among data features, and heterogeneity of information represented by different omics platforms. Here, we present multiSLIDE, a web-based interactive tool for the simultaneous visualization of interconnected molecular features in heatmaps of multi-omics data sets. multiSLIDE visualizes biologically connected molecular features by keyword search of pathways or genes, offering convenient functionalities to query, rearrange, filter, and cluster data on a web browser in real time. Various querying mechanisms make it adaptable to diverse omics types, and visualizations are customizable. We demonstrate the versatility of multiSLIDE through three examples, showcasing its applicability to a wide range of multi-omics data sets, by allowing users to visualize established links between molecules from different omics data, as well as incorporate custom inter-molecular relationship information into the visualization. Online and stand-alone versions of multiSLIDE are available at https://github.com/soumitag/multiSLIDE. The integration and interpretation of different omics data types is an ongoing challenge for biologists. Here, the authors present a web-based, interactive tool called multiSLIDE for the visualization of protein, phosphoprotein, and RNA data presented as interlinked heatmaps.
29
Kogelman LJA, Falkenberg K, Buil A, Erola P, Courraud J, Laursen SS, Michoel T, Olesen J, Hansen TF. Changes in the gene expression profile during spontaneous migraine attacks. Sci Rep 2021; 11:8294. [PMID: 33859262 PMCID: PMC8050061 DOI: 10.1038/s41598-021-87503-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/31/2020] [Accepted: 03/23/2021] [Indexed: 12/15/2022]
Abstract
Migraine attacks are delimited, allowing investigation of changes during and outside attack. Gene expression fluctuates according to environmental and endogenous events and therefore, we hypothesized that changes in RNA expression during and outside a spontaneous migraine attack exist which are specific to migraine. Twenty-seven migraine patients were assessed during a spontaneous migraine attack, including headache characteristics and treatment effect. Blood samples were taken during attack, two hours after treatment, on a headache-free day and after a cold pressor test. RNA-Sequencing, genotyping, and steroid profiling were performed. RNA-Sequences were analyzed at gene level (differential expression analysis) and at network level, and genomic and transcriptomic data were integrated. We found 29 differentially expressed genes between 'attack' and 'after treatment', after subtracting non-migraine specific genes, that were functioning in fatty acid oxidation, signaling pathways and immune-related pathways. Network analysis revealed mechanisms affected by changes in gene interactions, e.g. 'ion transmembrane transport'. Integration of genomic and transcriptomic data revealed pathways related to sumatriptan treatment, i.e. '5HT1 type receptor mediated signaling pathway'. In conclusion, we uniquely investigated intra-individual changes in gene expression during a migraine attack. We revealed both genes and pathways potentially involved in the pathophysiology of migraine and/or migraine treatment.
Affiliation(s)
- Lisette J A Kogelman
- Danish Headache Center, Department of Neurology, Rigshospitalet Glostrup, Glostrup, Denmark
- Katrine Falkenberg
- Danish Headache Center, Department of Neurology, Rigshospitalet Glostrup, Glostrup, Denmark
- Alfonso Buil
- Institute for Biological Psychiatry, Mental Health Center Sct. Hans, Roskilde, Denmark
- Pau Erola
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Julie Courraud
- Department of Clinical Biochemistry and Immunology, Statens Serum Institute Copenhagen, Copenhagen, Denmark
- Susan Svane Laursen
- Department of Clinical Biochemistry and Immunology, Statens Serum Institute Copenhagen, Copenhagen, Denmark
- Tom Michoel
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Jes Olesen
- Danish Headache Center, Department of Neurology, Rigshospitalet Glostrup, Glostrup, Denmark
- Thomas F Hansen
- Danish Headache Center, Department of Neurology, Rigshospitalet Glostrup, Glostrup, Denmark
- Institute for Biological Psychiatry, Mental Health Center Sct. Hans, Roskilde, Denmark
- Novo Nordisk Foundation Centre for Protein Research, Copenhagen University, Copenhagen, Denmark
30
A New Era of Neuro-Oncology Research Pioneered by Multi-Omics Analysis and Machine Learning. Biomolecules 2021; 11:biom11040565. [PMID: 33921457 PMCID: PMC8070530 DOI: 10.3390/biom11040565] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/11/2021] [Revised: 04/02/2021] [Accepted: 04/07/2021] [Indexed: 02/06/2023]
Abstract
Although the incidence of central nervous system (CNS) cancers is not high, it significantly reduces a patient’s quality of life and results in high mortality rates. A low incidence also means a low number of cases, which in turn means a low amount of information. To compensate, researchers have tried to increase the amount of information available from a single test using high-throughput technologies. This approach, referred to as single-omics analysis, has only been partially successful as one type of data may not be able to appropriately describe all the characteristics of a tumor. It is presently unclear what type of data can describe a particular clinical situation. One way to solve this problem is to use multi-omics data. When using many types of data, a selected data type or a combination of them may effectively resolve a clinical question. Hence, we conducted a comprehensive survey of papers in the field of neuro-oncology that used multi-omics data for analysis and found that most of the papers utilized machine learning techniques. This fact shows that it is useful to utilize machine learning techniques in multi-omics analysis. In this review, we discuss the current status of multi-omics analysis in the field of neuro-oncology and the importance of using machine learning techniques.
31
Li Y, Ma L, Wu D, Chen G. Advances in bulk and single-cell multi-omics approaches for systems biology and precision medicine. Brief Bioinform 2021; 22:6189773. [PMID: 33778867 DOI: 10.1093/bib/bbab024] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/11/2020] [Revised: 12/31/2020] [Accepted: 01/20/2021] [Indexed: 12/13/2022]
Abstract
Multi-omics allows the systematic understanding of the information flow across different omics layers, while single omics can mainly reflect one aspect of the biological system. Advances in bulk and single-cell sequencing technologies and related computational methods for multi-omics have greatly facilitated the development of systems biology and precision medicine. Single-cell approaches have the advantage of dissecting cellular dynamics and heterogeneity, whereas traditional bulk technologies are limited to individual/population-level investigation. In this review, we first summarize the technologies for producing bulk and single-cell multi-omics data. Then, we survey the computational approaches for integrative analysis of bulk and single-cell multimodal data, respectively. Moreover, the databases and data storage for multi-omics, as well as the tools for visualizing multimodal data, are summarized. We also outline the integration between bulk and single-cell data, and discuss the applications of multi-omics in precision medicine. Finally, we present the challenges and perspectives for multi-omics development.
Affiliation(s)
- Lu Ma
- China Normal University, China
32
Vlachavas EI, Bohn J, Ückert F, Nürnberg S. A Detailed Catalogue of Multi-Omics Methodologies for Identification of Putative Biomarkers and Causal Molecular Networks in Translational Cancer Research. Int J Mol Sci 2021; 22:2822. [PMID: 33802234 PMCID: PMC8000236 DOI: 10.3390/ijms22062822] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/31/2021] [Revised: 03/05/2021] [Accepted: 03/05/2021] [Indexed: 02/06/2023]
Abstract
Recent advances in sequencing and biotechnological methodologies have led to the generation of large volumes of molecular data of different omics layers, such as genomics, transcriptomics, proteomics and metabolomics. Integration of these data with clinical information provides new opportunities to discover how perturbations in biological processes lead to disease. Using data-driven approaches for the integration and interpretation of multi-omics data could stably identify links between structural and functional information and propose causal molecular networks with potential impact on cancer pathophysiology. This knowledge can then be used to improve disease diagnosis, prognosis, prevention, and therapy. This review will summarize and categorize the most current computational methodologies and tools for integration of distinct molecular layers in the context of translational cancer research and personalized therapy. Additionally, the bioinformatics tools Multi-Omics Factor Analysis (MOFA) and netDX will be tested using omics data from public cancer resources, to assess their overall robustness, provide reproducible workflows for gaining biological knowledge from multi-omics data, and to comprehensively understand the significantly perturbed biological entities in distinct cancer types. We show that the performed supervised and unsupervised analyses result in meaningful and novel findings.
Affiliation(s)
- Efstathios Iason Vlachavas
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Jonas Bohn
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Frank Ückert
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
- Sylvia Nürnberg
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
33
Nazarov PV, Kreis S. Integrative approaches for analysis of mRNA and microRNA high-throughput data. Comput Struct Biotechnol J 2021; 19:1154-1162. [PMID: 33680358 PMCID: PMC7895676 DOI: 10.1016/j.csbj.2021.01.029] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/20/2020] [Revised: 01/19/2021] [Accepted: 01/20/2021] [Indexed: 12/11/2022]
Abstract
Review on tools and databases linking miRNA and its mRNA targetome. Databases show little overlap in miRNA targetome predictions suggesting strong contextual effects. Deconvolution and deep learning approaches are promising new approaches to improve miRNA targetome predictions.
Advanced sequencing technologies such as RNASeq provide the means to produce massive amounts of data, including transcriptome-wide expression levels of coding RNAs (mRNAs) and non-coding RNAs such as miRNAs, lncRNAs, piRNAs and many other RNA species. In silico analysis of datasets representing only one RNA species is well established, and a variety of tools and pipelines are available. However, attaining a more systematic view of how different players come together to regulate the expression of a gene or a group of genes requires a more intricate approach to data analysis. To fully understand complex transcriptional networks, datasets representing different RNA species need to be integrated. In this review, we will focus on miRNAs as key post-transcriptional regulators, summarizing current computational approaches for miRNA:target gene prediction as well as new data-driven methods to tackle the problem of comprehensively and accurately dissecting miRNome-targetome interactions.
Key Words
- CCA, canonical correlation analysis
- CDS, coding sequence
- CLASH, cross-linking, ligation and sequencing of hybrids
- CLIP, cross-linking immunoprecipitation
- CNN, convolutional neural network
- Data integration
- GO, gene ontology
- ICA, independent component analysis
- Matrix factorization
- NGS, next-generation sequencing
- NMF, non-negative matrix factorization
- PCA, principal component analysis
- RNASeq, high-throughput RNA sequencing
- TDMD, target RNA-directed miRNA degradation
- TF, transcription factors
- Target prediction
- Transcriptomics
- circRNA, circular RNA
- lncRNA, long non-coding RNA
- mRNA, messenger RNA
- miRNA, microRNA
- microRNA
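Non-negative matrix factorization (NMF) appears in the key words among the integration approaches the review covers. The following is a generic illustration and not the review's own code. It factorizes a non-negative matrix V (for example, genes × samples expression values) into W·H using the standard multiplicative updates of Lee and Seung for squared Frobenius error, implemented with NumPy. The rank `k`, the iteration count, and the random initialisation are arbitrary choices for the sketch.

```python
import numpy as np


def nmf(V, k, n_iter=1000, seed=0, eps=1e-10):
    """Factorize non-negative V (n x m) as W (n x k) @ H (k x m).

    Multiplicative updates keep W and H non-negative; without eps they do
    not increase the squared Frobenius error ||V - WH||^2.
    """
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # eps in the denominators only guards against division by zero
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each column of W can be read as a non-negative "metagene" and each column of H as that sample's loadings on those metagenes. This additive, parts-based reading is one reason NMF is used for deconvolution-style problems.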
Affiliation(s)
- Petr V Nazarov
- Multiomics Data Science Research Group, Department of Oncology & Quantitative Biology Unit, Luxembourg Institute of Health (LIH), Strassen L-1445, Luxembourg
- Stephanie Kreis
- Signal Transduction Group, Department of Life Sciences and Medicine, University of Luxembourg, Belvaux L-4367, Luxembourg
34
Qin G, Liu Z, Xie L. Multiple Omics Data Integration. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11508-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
35
Yang L, Zhang Z, Sun Y, Pang S, Yao Q, Lin P, Cheng J, Li J, Ding G, Hui L, Li Y, Li H. Integrative analysis reveals novel driver genes and molecular subclasses of hepatocellular carcinoma. Aging (Albany NY) 2020; 12:23849-23871. [PMID: 33221766 PMCID: PMC7762459 DOI: 10.18632/aging.104047] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/17/2020] [Accepted: 08/25/2020] [Indexed: 01/06/2023]
Abstract
Hepatocellular carcinoma (HCC) is a heterogeneous disease with various genetic and epigenetic abnormalities. Previous studies of HCC driver genes were primarily based on frequency of mutations and copy number alterations. Here, we performed an integrative analysis of genomic and epigenomic data from 377 HCC patients to identify driver genes that regulate gene expression in HCC. This integrative approach has significant advantages over single-platform analyses for identifying cancer drivers. Using this approach, HCC tissues were divided into four subgroups, based on expression of the transcription factor E2F and the mutation status of TP53. HCC tissues with E2F overexpression and TP53 mutation had the highest cell cycle activity, indicating a synergistic effect of E2F and TP53. We found that overexpression of the identified driver genes, stratifin (SFN) and SPP1, correlates with tumor grade and poor survival in HCC and promotes HCC cell proliferation. These findings indicate SFN and SPP1 function as oncogenes in HCC and highlight the important role of enhancers in the regulation of gene expression in HCC.
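The abstract divides HCC tissues into four subgroups by E2F expression and TP53 mutation status, but it does not say how "E2F overexpression" was defined. Purely for illustration, the sketch below assumes a cohort-median cutoff on a per-sample E2F score. The function name, the input dictionaries, and the cutoff rule are all hypothetical.

```python
from statistics import median


def stratify_e2f_tp53(e2f_score, tp53_mutated):
    """Assign each sample to one of four E2F x TP53 subgroups (hypothetical helper).

    e2f_score    : dict mapping sample ID -> E2F expression score
    tp53_mutated : dict mapping sample ID -> True if TP53 is mutated
    'E2F-high' means strictly above the cohort median (an assumed cutoff).
    """
    cutoff = median(e2f_score.values())
    labels = {}
    for sample, score in e2f_score.items():
        e2f = "E2F-high" if score > cutoff else "E2F-low"
        tp53 = "TP53-mut" if tp53_mutated[sample] else "TP53-wt"
        labels[sample] = f"{e2f}/{tp53}"
    return labels
```

Under this rule, the "E2F-high/TP53-mut" label marks the combination that the abstract reports as having the highest cell cycle activity. Any real re-analysis would need the study's actual definition of overexpression.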
Affiliation(s)
- Liguang Yang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhengtao Zhang
- University of Chinese Academy of Sciences, Beijing 100049, China
- State Key Laboratory of Cell Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai 200031, China
- Yidi Sun
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Shichao Pang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Qianlan Yao
- Department of Pathology, Fudan University Shanghai Cancer Center, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Ping Lin
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Jinming Cheng
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Jia Li
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Guohui Ding
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Anhui Engineering Laboratory for Big Data of Precision Medicine, Anhui 234000, China
- Lijian Hui
- State Key Laboratory of Cell Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai 200031, China
- Yixue Li
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Collaborative Innovation Center of Genetics and Development, Fudan University, Shanghai 200433, China
- Shanghai Center for Bioinformation Technology, Shanghai Academy of Science and Technology, Shanghai 201203, China
- Hong Li
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
36
Li P, Ning Y, Wang W, Guo X, Poulet B, Wang X, Wen Y, Han J, Hao J, Liang X, Liu L, Du Y, Cheng B, Cheng S, Zhang L, Ma M, Qi X, Liang C, Wu C, Wang S, Zhao H, Zhao G, Goldring MB, Zhang F, Xu P. The integrative analysis of DNA methylation and mRNA expression profiles confirmed the role of selenocompound metabolism pathway in Kashin-Beck disease. Cell Cycle 2020; 19:2351-2366. [PMID: 32816579 DOI: 10.1080/15384101.2020.1807665] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
Abstract
Kashin-Beck disease (KBD) is an endemic chronic osteochondropathy. The etiology of KBD remains unknown. In this study, we conducted an integrative analysis of genome-wide DNA methylation and mRNA expression profiles between KBD patients and normal controls to identify novel candidate genes and pathways for KBD. Articular cartilage samples from 17 grade III KBD patients and 17 healthy controls were used in this study. DNA methylation profiling of knee cartilage and mRNA expression profile data were obtained from our previous studies. InCroMAP was used to perform an integrative analysis of the genome-wide DNA methylation and mRNA expression profiles. Gene ontology (GO) enrichment analysis was conducted with the online tool DAVID 6.7. Quantitative real-time polymerase chain reaction (qPCR), Western blot, immunohistochemistry (IHC), and lentiviral vector transfection were used to validate one of the identified pathways. We identified 298 common genes (such as COL4A1, HOXA13, TNFAIP6 and TGFBI), 36 GO terms (including collagen function, skeletal system development, and growth factor), and 32 KEGG pathways associated with KBD (including the selenocompound metabolism pathway, PI3K-Akt signaling pathway, and TGF-beta signaling pathway). Our results suggest that dysfunction of many genes and pathways is implicated in the pathogenesis of KBD. Most importantly, both the integrative analysis and the in vitro study in KBD cartilage highlighted, for the first time, the importance of the selenocompound metabolism pathway in the pathogenesis of KBD.
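The abstract reports GO terms from DAVID 6.7 but does not give the test statistic. DAVID's documented score is the EASE score, a more conservative modification of Fisher's exact test. The sketch below shows only the plain one-sided hypergeometric (Fisher) enrichment p-value that the EASE score builds on, and it does not reproduce the EASE adjustment. It is written in pure Python, and the function and argument names are illustrative assumptions.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k) (hypothetical helper).

    N : genes in the background (e.g. all genes measured)
    K : background genes annotated to the GO term
    n : genes in the study list (e.g. the common genes)
    k : study-list genes annotated to the GO term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of drawing k or more annotated genes by chance
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total
```

As an example, suppose both genes in a 2-gene list carry a term that annotates 3 of 10 background genes. Then P = C(3,2)/C(10,2) = 3/45 ≈ 0.067. In practice, the p-values for many terms would also need a multiple-testing correction.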
Affiliation(s)
- Ping Li
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Yujie Ning
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Weizhuo Wang
- Department of Orthopedics, the Second Affiliated Hospital, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Xiong Guo
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Blandine Poulet
- Institute of Ageing and Chronic Diseases, University of Liverpool, Liverpool, UK
- Xi Wang
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Yan Wen
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Jing Han
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Jingcan Hao
- Cancer Center, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Xiao Liang
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Li Liu
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Yanan Du
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Bolun Cheng
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Shiqiang Cheng
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Lu Zhang
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Mei Ma
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Xin Qi
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Chujun Liang
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Cuiyan Wu
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Sen Wang
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Hongmou Zhao
- Department of Joint Surgery, The Red Cross Hospital of Xi'an Jiaotong University, Xi'an, China
- Guanghui Zhao
- Department of Joint Surgery, The Red Cross Hospital of Xi'an Jiaotong University, Xi'an, China
- Mary B Goldring
- Hospital for Special Surgery, Weill College of Medicine of Cornell University, New York, NY, USA
- Feng Zhang
- Key Laboratory of Trace Elements and Endemic Diseases, National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
- Peng Xu
- Department of Joint Surgery, The Red Cross Hospital of Xi'an Jiaotong University, Xi'an, China
37
Zandi E, Ayatollahi Mehrgardi A, Esmailizadeh A. Mammary tissue transcriptomic analysis for construction of integrated regulatory networks involved in lactogenesis of Ovis aries. Genomics 2020; 112:4277-4287. [PMID: 32693106 DOI: 10.1016/j.ygeno.2020.07.025] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/15/2020] [Revised: 06/19/2020] [Accepted: 07/13/2020] [Indexed: 10/23/2022]
Abstract
The mammary gland undergoes vast changes between pregnancy and the onset of lactation. This remodeling involves different functions, such as lactation, that are controlled by innumerable regulators and various gene networks that are still not completely understood. MicroRNAs (miRNAs) are important non-coding gene regulators that control an extensive range of biological processes. Exploring miRNA functions is therefore important for unraveling the complexity of gene regulation. The main purpose of the present study is to identify the integrated gene regulatory networks involved in lactation progress in the mammary gland. We analyzed ovine mammary tissue datasets that included mRNA (gene) and miRNA expression profiles from six ewes on different days of lactation and under different nutritional treatments. We combined two types of information: network module inference from the expression matrix of mRNAs (RNA-seq data), miRNAs and transcription factors (TFs), and computational prediction of targets. To discover the regulatory functions of miRNAs, 134 modules were predicted from gene expression data, and 14 TFs and 20 miRNAs were assigned to these predicted modules. By applying this integrated computational method, 38 miRNA-module and 35 TF-module interactions were identified from ovine mammary tissue data during lactogenesis. Many of these modules were involved in lipid and protein metabolism, as well as steroid and vitamin biosynthesis, which likely play key roles in mammary tissue and lactation development. These results provide new information about regulatory processes at the miRNA and TF levels throughout lactation.
Affiliation(s)
- Elmira Zandi
- Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, PB 76169-133, Iran; Young Researchers Society, Shahid Bahonar University of Kerman, PB 76169-133, Kerman, Iran
- Ahmad Ayatollahi Mehrgardi
- Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, PB 76169-133, Iran
- Ali Esmailizadeh
- Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, PB 76169-133, Iran
38
Shi WJ, Zhuang Y, Russell PH, Hobbs BD, Parker MM, Castaldi PJ, Rudra P, Vestal B, Hersh CP, Saba LM, Kechris K. Unsupervised discovery of phenotype-specific multi-omics networks. Bioinformatics 2020; 35:4336-4343. [PMID: 30957844 DOI: 10.1093/bioinformatics/btz226] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 06/12/2018] [Revised: 02/01/2019] [Accepted: 04/05/2019] [Indexed: 12/15/2022]
Abstract
MOTIVATION Complex diseases often involve a wide spectrum of phenotypic traits. Better understanding of the biological mechanisms relevant to each trait promotes understanding of the etiology of the disease and the potential for targeted and effective treatment plans. There have been many efforts towards omics data integration and network reconstruction, but limited work has examined the incorporation of relevant (quantitative) phenotypic traits. RESULTS We propose a novel technique, sparse multiple canonical correlation network analysis (SmCCNet), for integrating multiple omics data types along with a quantitative phenotype of interest, and for constructing multi-omics networks that are specific to the phenotype. As a case study, we focus on miRNA-mRNA networks. Through simulations, we demonstrate that SmCCNet has better overall prediction performance than popular gene expression network construction and integration approaches under realistic settings. Applying SmCCNet to studies on chronic obstructive pulmonary disease (COPD) and breast cancer, we found enrichment of known relevant pathways (e.g. the Cadherin pathway for COPD and the interferon-gamma signaling pathway for breast cancer) as well as less known omics features that may be important to the diseases. Although those applications focus on miRNA-mRNA co-expression networks, SmCCNet is applicable to a variety of omics and other data types. It can also be easily generalized to incorporate multiple quantitative phenotypes simultaneously. The versatility of SmCCNet suggests great potential of the approach in many areas. AVAILABILITY AND IMPLEMENTATION The SmCCNet algorithm is written in R, and is freely available on the web at https://cran.r-project.org/web/packages/SmCCNet/index.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- W Jenny Shi
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Yonghua Zhuang
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Pamela H Russell
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Brian D Hobbs
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Margaret M Parker
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Peter J Castaldi
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Pratyaydipta Rudra
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Department of Statistics, Oklahoma State University, Stillwater, OK
- Brian Vestal
- Center for Genes, Environment & Health, National Jewish Health, Denver, CO, USA
- Craig P Hersh
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Laura M Saba
- Department of Pharmaceutical Sciences, University of Colorado, Aurora, CO, USA
- Katerina Kechris
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
39
Nicora G, Vitali F, Dagliati A, Geifman N, Bellazzi R. Integrated Multi-Omics Analyses in Oncology: A Review of Machine Learning Methods and Tools. Front Oncol 2020; 10:1030. [PMID: 32695678 PMCID: PMC7338582 DOI: 10.3389/fonc.2020.01030] [Citation(s) in RCA: 109] [Impact Index Per Article: 27.3] [Received: 01/30/2020] [Accepted: 05/26/2020] [Indexed: 12/16/2022]
Abstract
In recent years, high-throughput sequencing technologies have provided an unprecedented opportunity to characterize cancer samples at multiple molecular levels. The integration and analysis of these multi-omics datasets is a critical step toward gaining actionable knowledge in a precision medicine framework. This paper explores recent data-driven methodologies that have been developed and applied to address major challenges of stratified medicine in oncology, including patient phenotyping, biomarker discovery, and drug repurposing. We systematically retrieved peer-reviewed journal articles published from 2014 to 2019, and selected and thoroughly described the tools presenting the most promising innovations regarding the integration of heterogeneous data, the machine learning methodologies that successfully tackled the complexity of multi-omics data, and the frameworks to deliver actionable results for clinical practice. The review is organized according to the applied methods: Deep Learning, Network-based methods, Clustering, Feature Extraction and Transformation, and Factorization. We provide an overview of the tools available in each methodological group and underline the relationships among the different categories. Our analysis revealed how multi-omics datasets could be exploited to drive precision oncology, but also the current limitations in the development of multi-omics data integration.
Affiliation(s)
- Giovanna Nicora
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Francesca Vitali
- Center for Innovation in Brain Science, University of Arizona, Tucson, AZ, United States; Department of Neurology, College of Medicine, University of Arizona, Tucson, AZ, United States; Center for Biomedical Informatics and Biostatistics, University of Arizona, Tucson, AZ, United States
- Arianna Dagliati
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy; Centre for Health Informatics, The University of Manchester, Manchester, United Kingdom; The Manchester Molecular Pathology Innovation Centre, The University of Manchester, Manchester, United Kingdom
- Nophar Geifman
- Centre for Health Informatics, The University of Manchester, Manchester, United Kingdom; The Manchester Molecular Pathology Innovation Centre, The University of Manchester, Manchester, United Kingdom
- Riccardo Bellazzi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
40
Chen T, Tyagi S. Integrative computational epigenomics to build data-driven gene regulation hypotheses. Gigascience 2020; 9:giaa064. [PMID: 32543653 PMCID: PMC7297091 DOI: 10.1093/gigascience/giaa064] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/25/2020] [Revised: 05/25/2020] [Accepted: 05/26/2020] [Indexed: 12/20/2022]
Abstract
BACKGROUND Diseases are complex phenotypes, often arising as an emergent property of a non-linear network of genetic and epigenetic interactions. To translate this resulting state into a causal relationship with a subset of regulatory features, many experiments deploy an array of laboratory assays from multiple modalities. Often, each of the resulting datasets is large, heterogeneous, and noisy, so it is non-trivial to unify these complex datasets into an interpretable phenotype. Although recent methods address this problem with varying degrees of success, they are constrained by their scope or other limitations. Therefore, an important gap in the field is the lack of a universal data harmonizer capable of integrating arbitrary multi-modal datasets. RESULTS In this review, we perform a critical analysis of methods with the explicit aim of harmonizing data, as opposed to case-specific integration. This revealed that matrix factorization, latent variable analysis, and deep learning are potent strategies. Finally, we describe the properties of an ideal universal data harmonization framework. CONCLUSIONS A sufficiently advanced universal harmonizer has major medical implications: (1) identifying dysregulated biological pathways responsible for a disease provides a powerful diagnostic tool; (2) investigating these pathways further allows the biological community to better understand a disease's mechanisms; and (3) precision medicine also benefits from developments in this area, particularly in the context of the growing field of selective epigenome editing, which can suppress or induce a desired phenotype.
Affiliation(s)
- Tyrone Chen
- 25 Rainforest Walk, School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia
- Sonika Tyagi
- 25 Rainforest Walk, School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia
41
John A, Qin B, Kalari KR, Wang L, Yu J. Patient-specific multi-omics models and the application in personalized combination therapy. Future Oncol 2020; 16:1737-1750. [PMID: 32462937 DOI: 10.2217/fon-2020-0119] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 02/07/2023]
Abstract
The rapid advancement of high-throughput technologies and the sharp decrease in their cost have made it possible to generate large amounts of multi-omics data on an individual basis. The development of high-throughput -omics, including genomics, epigenomics, transcriptomics, proteomics, metabolomics and microbiomics, enables the application of multi-omics technologies in clinical settings. Combination therapy, defined as disease treatment with two or more drugs to achieve efficacy with lower doses or lower drug toxicity, is the basis for the care of diseases such as cancer. Patient-specific multi-omics data integration can help identify and develop combination therapies. In this review, we provide an overview of different -omics platforms and discuss methods for high-throughput multi-omics data integration and personalized combination therapy.
Affiliation(s)
- August John
- Mayo Clinic Graduate School of Biomedical Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Bo Qin
- Department of Molecular Pharmacology & Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA; Gastroenterology Research Unit, Mayo Clinic, Rochester, MN 55905, USA; Department of Oncology, Mayo Clinic, Rochester, MN 55905, USA
- Krishna R Kalari
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
- Liewei Wang
- Department of Molecular Pharmacology & Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
- Jia Yu
- Department of Molecular Pharmacology & Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
42
Montagud A, Traynard P, Martignetti L, Bonnet E, Barillot E, Zinovyev A, Calzone L. Conceptual and computational framework for logical modelling of biological networks deregulated in diseases. Brief Bioinform 2020; 20:1238-1249. [PMID: 29237040 DOI: 10.1093/bib/bbx163] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/14/2017] [Revised: 10/24/2017] [Indexed: 01/02/2023]
Abstract
Mathematical models can serve as a tool to formalize biological knowledge from diverse sources, to investigate biological questions in a formal way, to test experimental hypotheses, to predict the effect of perturbations and to identify underlying mechanisms. We present a pipeline of computational tools that performs a series of analyses to explore a logical model's properties. A logical model of the initiation of the metastatic process in cancer is used as a transversal example. We start by analysing the structure of the interaction network constructed from the literature or existing databases. Next, we show how to translate this network into a mathematical object, specifically a logical model, and how robustness analyses can be applied to it. We explore the visualization of the stable states, defined as specific attractors of the model, and match them to cellular fates or biological read-outs. With the different tools we present here, we explain how to assign a probability to each solution of the model and how to identify genetic interactions using mutant phenotype probabilities. Finally, we connect the model to relevant experimental data: we present how some data analyses can direct the construction of the network, and how the solutions of a mathematical model can also be compared with experimental data, with a particular focus on high-throughput data in cancer biology. A step-by-step tutorial is provided as a Supplementary Material and all models, tools and scripts are provided on an accompanying website: https://github.com/sysbio-curie/Logical_modelling_pipeline.
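The stable states mentioned in this abstract are fixed points of a logical model: states in which every node's update rule returns the node's current value. The sketch below finds them by exhaustive enumeration and compares a wild type with a knockout mutant. Its three-node model, the node names `A`, `B`, `C`, and the helpers `stable_states` and `knockout` are hypothetical. This is not the authors' pipeline or its tools, and brute-force enumeration only scales to small networks.

```python
from itertools import product
from typing import Callable, Dict, List

State = Dict[str, bool]
Rule = Callable[[State], bool]


def stable_states(rules: Dict[str, Rule]) -> List[State]:
    """Return every state x with f(x) == x, i.e. the fixed points
    (stable states) of the logical update rules."""
    nodes = list(rules)
    fixed = []
    # Enumerate all 2**n Boolean states (only feasible for small n).
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if all(rules[node](state) == state[node] for node in nodes):
            fixed.append(state)
    return fixed


def knockout(rules: Dict[str, Rule], node: str) -> Dict[str, Rule]:
    """Model a loss-of-function mutant by forcing one node to OFF."""
    mutated = dict(rules)
    mutated[node] = lambda state: False
    return mutated


# Hypothetical toy model: A and B inhibit each other; A activates C.
rules: Dict[str, Rule] = {
    "A": lambda s: not s["B"],
    "B": lambda s: not s["A"],
    "C": lambda s: s["A"],
}

print(stable_states(rules))                 # wild type: two stable states
print(stable_states(knockout(rules, "B")))  # B knockout: only the A-on state
```

The mutual inhibition between `A` and `B` makes the toy model bistable. Knocking out `B` removes one attractor, which is the kind of mutant-versus-wild-type comparison the abstract describes, here without probabilities.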
43
Mallik S, Zhao Z. Graph- and rule-based learning algorithms: a comprehensive review of their applications for cancer type classification and prognosis using genomic data. Brief Bioinform 2020; 21:368-394. [PMID: 30649169 PMCID: PMC7373185 DOI: 10.1093/bib/bby120] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/27/2018] [Revised: 10/26/2018] [Accepted: 11/21/2018] [Indexed: 12/20/2022]
Abstract
Cancer is well recognized as a complex disease with dysregulated molecular networks or modules. Over the past decade, graph- and rule-based analytics have been applied extensively to cancer classification and prognosis using large genomic and other datasets. This article provides a comprehensive review of graph- and rule-based machine learning algorithms that have been applied to numerous genomic datasets to determine cancer-specific gene modules, identify gene signature-based classifiers and carry out other related objectives of potential therapeutic value. This review focuses mainly on the methodological design and features of these algorithms to facilitate the application of graph- and rule-based analytical approaches to cancer classification and prognosis. Based on the type of data integration, we divided all the algorithms into three categories: model-based integration, pre-processing integration and post-processing integration. Each category is further divided into four sub-categories (supervised, unsupervised, semi-supervised and survival-driven learning analyses) based on learning style. In total, 11 categories of methods are summarized with their inputs, objectives, descriptions, advantages and potential limitations. Next, we briefly present well-known and the most recently developed algorithms for each sub-category along with salient information, such as data profiles, statistical or feature selection methods and outputs. Finally, we summarize the appropriate use and efficiency of each category of graph- and rule mining-based learning methods given the input data and a specific objective. This review aims to help readers select and use the appropriate algorithms for cancer classification and prognosis studies.
Affiliation(s)
- Saurav Mallik
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center, Houston
- Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center, Houston
44
Pournoor E, Mousavian Z, Dalini AN, Masoudi-Nejad A. Identification of Key Components in Colon Adenocarcinoma Using Transcriptome to Interactome Multilayer Framework. Sci Rep 2020; 10:4991. [PMID: 32193399 PMCID: PMC7081269 DOI: 10.1038/s41598-020-59605-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/06/2019] [Accepted: 01/31/2020] [Indexed: 12/21/2022]
Abstract
The complexity of cascading interrelations between molecular cell components at different levels, from genome to metabolome, makes biological events very difficult to comprehend. However, accounting for these complications in systems modeling yields more realistic and reliable outputs. The multilayer network approach is a relatively new concept that can be applied to multiple omics datasets as an integrative methodology to overcome heterogeneity. Herein, we employed the multilayer framework to reconstruct the colon adenocarcinoma network from co-expression correlations, regulatory relations, and physical binding interactions. Hub nodes in this three-layer network were selected using a heterogeneous random walk with random jump procedure. We extracted local composite modules around the hub nodes that overlapped highly with cancer-specific pathways, and investigated their genes that showed different expression patterns during tumor progression. These genes were examined for their effects on patient survival, and those with significant impacts were selected as potential candidate biomarkers. The results suggest that the identified genes are important in colon carcinogenesis.
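For illustration, a random walk with random jump on a multilayer network can be approximated by a PageRank-style walk on a supra-adjacency matrix. In that matrix each gene appears once per layer, and its copies are linked across layers. The abstract does not give the authors' exact transition rules or parameters, so this is not their procedure. The sketch assumes NumPy, an undirected two-layer toy network with hypothetical genes `A` to `D`, uniform inter-layer coupling of 1, and a jump probability of 0.15. The function names are also illustrative.

```python
import numpy as np


def supra_adjacency(layers, coupling=1.0):
    """Place per-layer adjacency matrices (same node order in every layer)
    on the block diagonal and link each node to its copies in other layers."""
    n, n_layers = layers[0].shape[0], len(layers)
    supra = np.zeros((n * n_layers, n * n_layers))
    for i, adj in enumerate(layers):
        supra[i * n:(i + 1) * n, i * n:(i + 1) * n] = adj
        for j in range(n_layers):
            if j != i:
                supra[i * n:(i + 1) * n, j * n:(j + 1) * n] = coupling * np.eye(n)
    return supra


def random_walk_with_jump(adj, jump_prob=0.15, tol=1e-10, max_iter=1000):
    """Stationary distribution of a walker that follows an edge with
    probability 1 - jump_prob and jumps to a uniformly random node otherwise."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    # Row-normalise; nodes without edges jump uniformly.
    trans = np.where(out_deg > 0, adj / np.where(out_deg == 0, 1.0, out_deg), 1.0 / n)
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        p_next = (1 - jump_prob) * (p @ trans) + jump_prob / n
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


genes = ["A", "B", "C", "D"]
# Layer 1 (e.g. co-expression): A is linked to B, C and D.
coexpression = np.array([[0, 1, 1, 1],
                         [1, 0, 0, 0],
                         [1, 0, 0, 0],
                         [1, 0, 0, 0]], dtype=float)
# Layer 2 (e.g. physical binding): triangle A-B-C.
binding = np.array([[0, 1, 1, 0],
                    [1, 0, 1, 0],
                    [1, 1, 0, 0],
                    [0, 0, 0, 0]], dtype=float)

p = random_walk_with_jump(supra_adjacency([coexpression, binding]))
# Sum each gene's visiting probability over its copies in all layers.
gene_scores = p.reshape(2, len(genes)).sum(axis=0)
hub = genes[int(np.argmax(gene_scores))]
print({g: round(float(s), 3) for g, s in zip(genes, gene_scores)}, "hub:", hub)
```

The random jump keeps the walk from getting trapped and makes the stationary distribution unique. Aggregating over layers lets a gene that is well connected in several layers rank as a hub even if it is not the top node in any single layer.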
Affiliation(s)
- Ehsan Pournoor
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Zaynab Mousavian
- School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran
- Abbas Nowzari Dalini
- School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
45
Linder H, Zhang Y. A pan-cancer integrative pathway analysis of multi-omics data. Quantitative Biology 2020. [DOI: 10.1007/s40484-019-0185-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
46
Erola P, Björkegren JLM, Michoel T. Model-based clustering of multi-tissue gene expression data. Bioinformatics 2020; 36:1807-1813. [PMID: 31688915 PMCID: PMC7162352 DOI: 10.1093/bioinformatics/btz805] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/17/2018] [Revised: 09/05/2019] [Accepted: 10/31/2019] [Indexed: 02/06/2023]
Abstract
MOTIVATION Recently, it has become feasible to generate large-scale, multi-tissue gene expression data, where expression profiles are obtained from multiple tissues or organs sampled from dozens to hundreds of individuals. When traditional clustering methods are applied to this type of data, important information is lost. They either require all tissues to be analyzed independently, ignoring dependencies and similarities between tissues, or require tissues to be merged into a single, monolithic dataset, ignoring the individual characteristics of tissues. RESULTS We developed a Bayesian model-based multi-tissue clustering algorithm, revamp, which can incorporate prior information on physiological tissue similarity, and which results in a set of clusters, each consisting of a core set of genes conserved across tissues as well as differential sets of genes specific to one or more subsets of tissues. Using data from seven vascular and metabolic tissues from over 100 individuals in the STockholm Atherosclerosis Gene Expression (STAGE) study, we demonstrate that multi-tissue clusters inferred by revamp are more enriched for tissue-dependent protein-protein interactions than those from alternative approaches. We further demonstrate that revamp results in easily interpretable multi-tissue gene expression associations to key coronary artery disease processes and clinical phenotypes in the STAGE individuals. AVAILABILITY AND IMPLEMENTATION Revamp is implemented in the Lemon-Tree software, available at https://github.com/eb00/lemon-tree. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pau Erola
- Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Midlothian EH25 9RG, UK
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol BS8 2BN, UK
- Johan L M Björkegren
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Integrated Cardio Metabolic Centre (ICMC), Karolinska Institutet, Huddinge 141 57, Sweden
- Tom Michoel
- Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Midlothian EH25 9RG, UK
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen N-5020, Norway
47
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. Information Fusion 2019; 50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 215] [Impact Index Per Article: 43.0] [Indexed: 05/10/2023]
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Affiliation(s)
- Marinka Zitnik
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Francis Nguyen
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
- Bo Wang
- Hikvision Research Institute, Santa Clara, CA, USA
- Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Anna Goldenberg
- Genetics & Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
- Michael M. Hoffman
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
| |
48
Koh HWL, Fermin D, Vogel C, Choi KP, Ewing RM, Choi H. iOmicsPASS: network-based integration of multiomics data for predictive subnetwork discovery. NPJ Syst Biol Appl 2019; 5:22. [PMID: 31312515 PMCID: PMC6616462 DOI: 10.1038/s41540-019-0099-y] [Received: 11/01/2018] [Accepted: 06/14/2019]
Abstract
Computational tools for multiomics data integration have usually been designed for unsupervised detection of multiomics features explaining large phenotypic variations. To achieve this, some approaches extract latent signals in heterogeneous data sets from a joint statistical error model, while others use biological networks to propagate differential expression signals and find consensus signatures. However, few approaches directly treat molecular interactions, the essential linkers between different omics data sets, as data features. The increasing availability of genome-scale interactome data connecting different molecular levels motivates a new class of methods to extract interactive signals from multiomics data. Here we developed iOmicsPASS, a tool to search for predictive subnetworks consisting of molecular interactions within and between related omics data types in a supervised analysis setting. Based on user-provided network data and relevant omics data sets, iOmicsPASS computes a score for each molecular interaction and applies a modified nearest shrunken centroid algorithm to the scores to select densely connected subnetworks that can accurately predict each phenotypic group. iOmicsPASS detects a sparse set of predictive molecular interactions without loss of prediction accuracy compared to alternative methods, and the selected network signature immediately provides a mechanistic interpretation of the multiomics profile representing each sample group. Extensive simulation studies demonstrate a clear benefit of interaction-level modeling. iOmicsPASS analysis of TCGA/CPTAC breast cancer data also highlights a new transcriptional regulatory network underlying the basal-like subtype as positive protein markers, a result not seen through analysis of individual omics data.
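The abstract describes two steps: score every molecular interaction, then run a modified nearest shrunken centroid classifier on those scores so that uninformative interactions drop out. The sketch below illustrates only the standard (unmodified) nearest shrunken centroid procedure applied to per-edge scores; it is not iOmicsPASS's implementation. Everything specific here is an assumption for illustration: the edge score (mean of the two linked, already-standardized feature values), the helper names `interaction_scores`, `fit_shrunken_centroids` and `predict`, the shrinkage threshold `delta`, and the toy "basal"/"luminal" samples.

```python
import math
from statistics import median


def interaction_scores(sample, edges):
    """Hypothetical edge score: mean of the two linked feature values.

    Assumes `sample` maps feature names to already-standardized values.
    This is an illustrative choice, not iOmicsPASS's published formula.
    """
    return [(sample[a] + sample[b]) / 2.0 for a, b in edges]


def fit_shrunken_centroids(X, y, delta):
    """Standard nearest shrunken centroid fit on the rows of X (one row per sample)."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    members = {k: [i for i in range(n) if y[i] == k] for k in classes}
    overall = [sum(row[j] for row in X) / n for j in range(p)]
    centroid = {
        k: [sum(X[i][j] for i in members[k]) / len(members[k]) for j in range(p)]
        for k in classes
    }
    # Pooled within-class standard deviation per feature, plus the median offset s0.
    s = []
    for j in range(p):
        ss = sum((X[i][j] - centroid[k][j]) ** 2 for k in classes for i in members[k])
        s.append(math.sqrt(ss / (n - len(classes))))
    s0 = median(s)
    shrunk, selected = {}, set()
    for k in classes:
        m_k = math.sqrt(1.0 / len(members[k]) - 1.0 / n)
        row = []
        for j in range(p):
            scale = m_k * (s[j] + s0)
            d = (centroid[k][j] - overall[j]) / scale
            # Soft-threshold the standardized class-vs-overall difference by delta.
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)
            if d_shrunk != 0.0:
                selected.add(j)  # this edge still separates at least one class
            row.append(overall[j] + scale * d_shrunk)
        shrunk[k] = row
    return {"classes": classes, "centroids": shrunk, "s": s, "s0": s0,
            "selected": sorted(selected)}


def predict(model, x):
    """Assign x to the class whose shrunken centroid is nearest (standardized distance)."""
    def dist(k):
        return sum(
            (x[j] - c) ** 2 / (model["s"][j] + model["s0"]) ** 2
            for j, c in enumerate(model["centroids"][k])
        )
    return min(model["classes"], key=dist)


# Toy usage: edge ("A", "B") separates the two classes; edge ("C", "D") is noise.
edges = [("A", "B"), ("C", "D")]
samples = [
    ({"A": 2.0, "B": 2.2, "C": 0.1, "D": -0.1}, "basal"),
    ({"A": 1.8, "B": 2.0, "C": -0.2, "D": 0.2}, "basal"),
    ({"A": 2.1, "B": 1.9, "C": 0.3, "D": 0.1}, "basal"),
    ({"A": -2.0, "B": -1.8, "C": 0.0, "D": 0.2}, "luminal"),
    ({"A": -2.2, "B": -2.0, "C": 0.1, "D": -0.3}, "luminal"),
    ({"A": -1.9, "B": -2.1, "C": -0.1, "D": 0.1}, "luminal"),
]
X = [interaction_scores(values, edges) for values, _ in samples]
y = [label for _, label in samples]
model = fit_shrunken_centroids(X, y, delta=1.0)
print(model["selected"])  # -> [0]: only the informative edge survives shrinkage
print(predict(model, interaction_scores({"A": 1.9, "B": 2.1, "C": 0.0, "D": 0.0}, edges)))  # -> basal
```

Soft thresholding is what produces sparsity: an edge whose standardized class differences all fall below `delta` is shrunk to the overall mean for every class and no longer influences classification, which mirrors, in simplified form, the "sparse set of predictive molecular interactions" the abstract reports.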
Affiliation(s)
- Hiromi W. L. Koh
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Damian Fermin
- University of Michigan Medical School, Ann Arbor, MI, USA
- Christine Vogel
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
- Kwok Pui Choi
- Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore
- Rob M. Ewing
- School of Biological Sciences, University of Southampton, Southampton, UK
- Hyungwon Choi
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Institute of Molecular and Cell Biology, Agency for Science, Technology and Research, Singapore, Singapore
49
Yan J, Risacher SL, Shen L, Saykin AJ. Network approaches to systems biology analysis of complex disease: integrative methods for multi-omics data. Brief Bioinform 2019; 19:1370-1381. [PMID: 28679163 DOI: 10.1093/bib/bbx066] [Received: 02/23/2017]
Abstract
In the past decade, significant progress has been made in complex disease research across multiple omics layers, from genome, transcriptome and proteome to metabolome. There is an increasing awareness of the importance of biological interconnections, and much success has been achieved using systems biology approaches. However, because of the typical focus on a single omics layer at a time, existing systems biology findings explain only a modest portion of complex disease. Recent advances in multi-omics data collection and sharing present new opportunities for studying complex diseases more comprehensively, yet they also create new challenges given the unprecedented data dimensionality and diversity. Here, our goal is to review extant and emerging network approaches that can be applied across multiple biological layers to facilitate a more comprehensive and integrative multilayered omics analysis of complex diseases.
Affiliation(s)
- Jingwen Yan
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis, USA
- Shannon L Risacher
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
- Li Shen
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
- Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
50
Wang Q, Peng WX, Wang L, Ye L. Toward multiomics-based next-generation diagnostics for precision medicine. Per Med 2019; 16:157-170. [PMID: 30816060 DOI: 10.2217/pme-2018-0085]
Abstract
Our healthcare system is experiencing a paradigm shift to precision medicine, which aims at early prediction of individual disease risks and targeted interventions. Whole-genome sequencing is currently gaining momentum because it has the potential to capture all classes of genetic variation, providing a more complete picture of an individual's genetic makeup that could be used in genetic testing. However, this will also make test results harder to interpret, necessitating careful integration of genomic data with other layers of information: molecular multiomics measurements of the epigenome, transcriptome, proteome, metabolome and even microbiome, as well as comprehensive information on diet, lifestyle and environment. Overall, translating patient-specific data into actionable diagnostic tools will be a challenging task, requiring expertise from multiple disciplines, secure data sharing in large reference databases and a strong computational infrastructure.
Affiliation(s)
- Qi Wang
- Department of Emergency Medicine, Hangzhou Hospital of Traditional Chinese Medicine, Hangzhou 310007, Zhejiang Province, China
- Wei-Xian Peng
- Department of Emergency Medicine, Hangzhou Hospital of Traditional Chinese Medicine, Hangzhou 310007, Zhejiang Province, China
- Lu Wang
- Department of Emergency Medicine, Hangzhou Hospital of Traditional Chinese Medicine, Hangzhou 310007, Zhejiang Province, China
- Li Ye
- Department of Nursing, Tongde Hospital of Zhejiang Province, Hangzhou 310012, Zhejiang Province, China