1
Li M, Cai Y, Zhang M, Deng S, Wang L. NNBGWO-BRCA marker: Neural Network and binary grey wolf optimization based Breast cancer biomarker discovery framework using multi-omics dataset. Computer Methods and Programs in Biomedicine 2024; 254:108291. [PMID: 38909399 DOI: 10.1016/j.cmpb.2024.108291] [Received: 11/18/2023] [Revised: 05/09/2024] [Accepted: 06/16/2024] [Indexed: 06/25/2024]
Abstract
BACKGROUND AND OBJECTIVE: Breast cancer is a heterogeneous disease with a high mortality rate, which makes timely detection and intervention essential. Multi-omics data have gained significant traction in recent years for identifying biomarkers and classifying breast cancer subtypes, and this part-to-whole research approach is expected to be an inevitable trend in future life science research. Deep learning can integrate and analyze multi-omics data to predict cancer subtypes, which can further drive targeted therapies. However, few studies exploit deep learning itself for feature selection. This paper therefore proposes a Neural Network and Binary grey Wolf Optimization based BReast CAncer bioMarker (NNBGWO-BRCAMarker) discovery framework that uses multi-omics data to obtain a set of biomarkers for precise classification of breast cancer subtypes.
METHODS: NNBGWO-BRCAMarker consists of two phases. In the first phase, relevant genes are selected using the weights obtained from a trained feedforward neural network. In the second phase, the binary grey wolf optimization algorithm further screens the selected genes, resulting in a set of potential breast cancer biomarkers.
RESULTS: An SVM classifier with an RBF kernel achieved a classification accuracy of 0.9242 ± 0.03 when trained on the 80 biomarkers identified by NNBGWO-BRCAMarker. Comprehensive gene set, prognostic, and druggability analyses revealed 25 druggable genes, 16 enriched pathways strongly linked to specific breast cancer subtypes, and 8 genes linked to prognostic outcomes.
CONCLUSIONS: The proposed framework identified 80 biomarkers from the multi-omics data, enabling accurate classification of breast cancer subtypes. This discovery may offer novel insights for clinicians to pursue in further studies.
Affiliation(s)
- Min Li
- School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang Jiangxi, PR China.
- Yuheng Cai
- School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang Jiangxi, PR China
- Mingzhuang Zhang
- School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang Jiangxi, PR China
- Shaobo Deng
- School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang Jiangxi, PR China
- Lei Wang
- School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang Jiangxi, PR China
2
Zhang H, Huang D, Chen E, Cao D, Xu T, Dizdar B, Li G, Chen Y, Payne P, Province M, Li F. mosGraphGPT: a foundation model for multi-omic signaling graphs using generative AI. bioRxiv 2024:2024.08.01.606222. [PMID: 39149314 PMCID: PMC11326168 DOI: 10.1101/2024.08.01.606222] [Indexed: 08/17/2024]
Abstract
Generative pretrained models represent a significant advancement in natural language processing and computer vision: they can generate coherent and contextually relevant content after pre-training on large general datasets and fine-tuning for specific tasks. Building foundation models from large-scale omic data is a promising way to decode and understand the complex signaling language patterns within cells. Unlike existing foundation models of omic data, we build a foundation model, mosGraphGPT, for multi-omic signaling (mos) graphs, in which the multi-omic data are integrated and interpreted using a multi-level signaling graph. The model was pretrained on multi-omic data of cancers from The Cancer Genome Atlas (TCGA) and fine-tuned on multi-omic data of Alzheimer's Disease (AD). The experimental evaluation showed that the model not only improves disease classification accuracy but is also interpretable, uncovering disease targets and signaling interactions. The model code is available on GitHub: https://github.com/mosGraph/mosGraphGPT.
Affiliation(s)
- Heming Zhang
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
- Di Huang
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
- Emily Chen
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
- Department of Pediatrics, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- School of Arts and Sciences, University of Rochester, Rochester, NY, 14627, USA
- Dekang Cao
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
- Department of Computer Science and Engineering
- Tim Xu
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
- Department of Computer Science and Engineering
- Ben Dizdar
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
- Department of Computer Science and Engineering
- Guangfu Li
- Department of Surgery, School of Medicine, University of Connecticut, CT, 06032, USA
- Yixin Chen
- Department of Computer Science and Engineering
- Philip Payne
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
- Fuhai Li
- Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
- Department of Pediatrics, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
3
Shi C, Cheng L, Yu Y, Chen S, Dai Y, Yang J, Zhang H, Chen J, Geng N. Multi-omics integration analysis: Tools and applications in environmental toxicology. Environmental Pollution 2024; 360:124675. [PMID: 39103035 DOI: 10.1016/j.envpol.2024.124675] [Received: 05/16/2024] [Revised: 07/08/2024] [Accepted: 08/03/2024] [Indexed: 08/07/2024]
Abstract
Traditional single-omics studies are no longer sufficient to explain the causality between molecular alterations and toxicity endpoints for environmental pollutants. With the development of high-throughput sequencing and high-resolution mass spectrometry, the integrative analysis of multi-omics data has become an efficient strategy for understanding holistic biological mechanisms and uncovering the regulatory networks in specific biological processes. This review summarizes sample preparation methods, integration analysis tools, and applications of multi-omics integration analysis in environmental toxicology. Omics methods are now widely applied because of their sensitivity to early biological responses, especially for low-dose and long-term exposure to environmental pollutants. Integrative omics can reveal the overall changes in genes, proteins, and/or metabolites in cells, tissues, or organisms, providing new insights into the overall toxic effects of pollutants, the screening of toxic targets, and the underlying molecular mechanisms.
Affiliation(s)
- Chengcheng Shi
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China; College of Environmental Science and Engineering, Dalian Maritime University, Dalian, 116026, China
- Lin Cheng
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Ying Yu
- College of Environmental Science and Engineering, Dalian Maritime University, Dalian, 116026, China
- Shuangshuang Chen
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China; College of Environmental Science and Engineering, Dalian Maritime University, Dalian, 116026, China
- Yubing Dai
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Jiajia Yang
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China; College of Materials Science and Engineering, Hebei University of Engineering, Handan, 056038, China
- Haijun Zhang
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Jiping Chen
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Ningbo Geng
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China.
4
Jeyananthan P. Performance comparison between multi-level gene expression data in cancer subgroup classification. Pathol Res Pract 2024; 260:155419. [PMID: 38955118 DOI: 10.1016/j.prp.2024.155419] [Received: 02/06/2024] [Revised: 06/06/2024] [Accepted: 06/19/2024] [Indexed: 07/04/2024]
Abstract
Cancer is a serious disease that can affect various parts of the body, such as the breast, colon, lung, or stomach. Each of these cancers has its own treatment-dependent historical subgroups. Hence, correct identification of the cancer subgroup is almost as important as timely diagnosis. This is still a challenging task, and a system with the highest possible accuracy is essential. Current research is moving towards analyzing gene expression data of cancer patients for various purposes, including biomarker identification and the study of differentially expressed genes, using gene expression data measured at a single level (selected from different gene levels including genome, transcriptome or translation). However, previous studies showed that the information carried by one level of gene expression differs from that carried by another. This shows the importance of integrating multi-level omics data in these studies. Hence, this study uses tumor gene expression data measured at various levels, along with the integration of those data, for the subgroup classification of nine different cancers. This is a comprehensive analysis in which four gene expression data types (transcriptome, miRNA, methylation, and proteome) are used for subgrouping, and the performance of the resulting models is compared to reveal the best model.
5
Lu Z, Xiao X, Zheng Q, Wang X, Xu L. Assessing next-generation sequencing-based computational methods for predicting transcriptional regulators with query gene sets. Brief Bioinform 2024; 25:bbae366. [PMID: 39082650 PMCID: PMC11289684 DOI: 10.1093/bib/bbae366] [Received: 03/19/2024] [Revised: 06/21/2024] [Accepted: 07/18/2024] [Indexed: 08/03/2024] [Open access]
Abstract
This article provides an in-depth review of computational methods for predicting transcriptional regulators (TRs) with query gene sets. Identification of TRs is of utmost importance in many biological applications, including but not limited to elucidating biological development mechanisms, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of NGS-based methods has been offered. We classified these methods into two categories based on shared characteristics, namely library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods with molecular experimental datasets. Results show that BART, ChIP-Atlas, and Lisa have relatively better performance. Besides, we point out the limitations of NGS-based methods and explore potential directions for further improvement.
Affiliation(s)
- Zeyu Lu
- Department of Statistics and Data Science, Moody School of Graduate and Advanced Studies, Southern Methodist University, 3225 Daniel Ave., P.O. Box 750332, Dallas, TX, United States
- Xue Xiao
- Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX, United States
- Qiang Zheng
- Division of Data Science, College of Science, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, United States
- Xinlei Wang
- Division of Data Science, College of Science, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, United States
- Department of Mathematics, University of Texas at Arlington, 411 S. Nedderman Dr., Arlington, TX 76019, United States
- Lin Xu
- Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX, United States
- Department of Pediatrics, Division of Hematology/Oncology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, United States
6
Yang H, Zhao L, Li D, An C, Fang X, Chen Y, Liu J, Xiao T, Wang Z. Subtype-WGME enables whole-genome-wide multi-omics cancer subtyping. Cell Reports Methods 2024; 4:100781. [PMID: 38761803 PMCID: PMC11228280 DOI: 10.1016/j.crmeth.2024.100781] [Received: 06/10/2023] [Revised: 01/05/2024] [Accepted: 04/26/2024] [Indexed: 05/20/2024]
Abstract
We present an innovative strategy for integrating whole-genome-wide multi-omics data, which facilitates adaptive amalgamation by leveraging hidden layer features derived from high-dimensional omics data through a multi-task encoder. Empirical evaluations on eight benchmark cancer datasets substantiated that our proposed framework outstripped the comparative algorithms in cancer subtyping, delivering superior subtyping outcomes. Building upon these subtyping results, we establish a robust pipeline for identifying whole-genome-wide biomarkers, unearthing 195 significant biomarkers. Furthermore, we conduct an exhaustive analysis to assess the importance of each omic and non-coding region features at the whole-genome-wide level during cancer subtyping. Our investigation shows that both omics and non-coding region features substantially impact cancer development and survival prognosis. This study emphasizes the potential and practical implications of integrating genome-wide data in cancer research, demonstrating the potency of comprehensive genomic characterization. Additionally, our findings offer insightful perspectives for multi-omics analysis employing deep learning methodologies.
Affiliation(s)
- Hai Yang
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Liang Zhao
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Dongdong Li
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Congcong An
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Xiaoyang Fang
- Cornell Tech, Cornell University, New York, NY 14853, USA
- Yiwen Chen
- Center for Continuing and Lifelong Education, National University of Singapore, Singapore 119077, Singapore
- Jingping Liu
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Ting Xiao
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Zhe Wang
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.
7
Esquivel Gaytan A, Bomer N, Grote Beverborg N, van der Meer P. 404-error "Disease not found": Unleashing the translational potential of -omics approaches beyond traditional disease classification in heart failure research. Eur J Heart Fail 2024; 26:1313-1323. [PMID: 38741225 DOI: 10.1002/ejhf.3268] [Received: 11/23/2023] [Revised: 03/15/2024] [Accepted: 04/14/2024] [Indexed: 05/16/2024] [Open access]
Abstract
The emergence of personalized medicine, facilitated by the progress in -omics technologies, has initiated a new era in medical diagnostics and treatment. This review examines the potential of -omics approaches in heart failure, a condition that has not yet fully capitalized on personalized strategies compared to other medical fields like cancer therapy. Here, we argue that integrating multi-omics technology with systems medicine approaches could fundamentally transform heart failure management, moving away from the traditional paradigm of 'one size fits all'. Our review examines how omics can enhance understanding of heart failure's molecular foundations and contribute to a more comprehensive disease classification. We draw attention to the current state of medical practice that only relies on clinical evidence and a number of standard laboratory tests. At the same time, we propose a shift towards a universal approach that uses quantitative data from multi-omics to unravel complex molecular interactions. The discussion centres around the potential of the transition as a means to enhance individual risk assessment and emphasizes management within clinical settings. While the use of omics in cardiovascular research is not recent, many past studies have focused only on a single omics approach. In order to achieve a better understanding of disease mechanisms, we explore more holistic approaches using genomics, transcriptomics, epigenomics, and proteomics. This review concludes with a call to action to adopt multi-omics in clinical trials and practice to pave the way for more personalized disease management and more effective heart failure interventions.
Affiliation(s)
- Antonio Esquivel Gaytan
- Department of Cardiology, University Medical Centre Groningen, University of Groningen, Groningen, The Netherlands
- Nils Bomer
- Department of Cardiology, University Medical Centre Groningen, University of Groningen, Groningen, The Netherlands
- Niels Grote Beverborg
- Department of Cardiology, University Medical Centre Groningen, University of Groningen, Groningen, The Netherlands
- Peter van der Meer
- Department of Cardiology, University Medical Centre Groningen, University of Groningen, Groningen, The Netherlands
8
Chakraborty S, Sharma G, Karmakar S, Banerjee S. Multi-OMICS approaches in cancer biology: New era in cancer therapy. Biochim Biophys Acta Mol Basis Dis 2024; 1870:167120. [PMID: 38484941 DOI: 10.1016/j.bbadis.2024.167120] [Received: 01/16/2024] [Revised: 03/06/2024] [Accepted: 03/06/2024] [Indexed: 04/01/2024]
Abstract
Innovative multi-omics frameworks integrate diverse datasets from the same patients to enhance our understanding of the molecular and clinical aspects of cancers. Advanced omics and multi-view clustering algorithms present unprecedented opportunities for classifying cancers into subtypes, refining survival predictions and treatment outcomes, and unravelling key pathophysiological processes across various molecular layers. However, with the increasing availability of cost-effective high-throughput technologies (HTT) that generate vast amounts of data, analyzing single layers often falls short of establishing causal relations. Integrating multi-omics data spanning genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes offers unique prospects to comprehend the underlying biology of complex diseases like cancer. This discussion explores algorithmic frameworks designed to uncover cancer subtypes, disease mechanisms, and methods for identifying pivotal genomic alterations. It also underscores the significance of multi-omics in tumor classifications, diagnostics, and prognostications. Despite its unparalleled advantages, the integration of multi-omics data has been slow to find its way into everyday clinics. A major hurdle is the uneven maturity of different omics approaches and the widening gap between the generation of large datasets and the capacity to process this data. Initiatives promoting the standardization of sample processing and analytical pipelines, as well as multidisciplinary training for experts in data analysis and interpretation, are crucial for translating theoretical findings into practical applications.
Affiliation(s)
- Sohini Chakraborty
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Gaurav Sharma
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Sricheta Karmakar
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Satarupa Banerjee
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.
9
Sibilio P, Conte F, Huang Y, Castaldi PJ, Hersh CP, DeMeo DL, Silverman EK, Paci P. Correlation-based network integration of lung RNA sequencing and DNA methylation data in chronic obstructive pulmonary disease. Heliyon 2024; 10:e31301. [PMID: 38807864 PMCID: PMC11130701 DOI: 10.1016/j.heliyon.2024.e31301] [Received: 11/02/2023] [Revised: 05/08/2024] [Accepted: 05/14/2024] [Indexed: 05/30/2024] [Open access]
Abstract
Chronic Obstructive Pulmonary Disease (COPD) is a heterogeneous, chronic inflammatory process of the lungs and, like other complex diseases, is caused by both genetic and environmental factors. Detailed understanding of the molecular mechanisms of complex diseases requires the study of the interplay among different biomolecular layers, and thus the integration of different omics data types. In this study, we investigated COPD-associated molecular mechanisms through a correlation-based network integration of lung tissue RNA-seq and DNA methylation data of COPD cases (n = 446) and controls (n = 346) derived from the Lung Tissue Research Consortium. First, we performed a SWIM-network based analysis to build separate correlation networks for RNA-seq and DNA methylation data for our case-control study population. Then, we developed a method to integrate the results into a coupled network of differentially expressed and differentially methylated genes to investigate their relationships across both molecular layers. The functional enrichment analysis of the nodes of the coupled network revealed a strikingly significant enrichment in Immune System components, both innate and adaptive, as well as immune-system component communication (interleukin and cytokine-cytokine signaling). Our analysis allowed us to reveal novel putative COPD-associated genes and to analyze their relationships, both at the transcriptomics and epigenomics levels, thus contributing to an improved understanding of COPD pathogenesis.
Affiliation(s)
- Pasquale Sibilio
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
- Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council, Rome, Italy
- Federica Conte
- Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council, Rome, Italy
- Yichen Huang
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Peter J Castaldi
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Craig P Hersh
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Dawn L DeMeo
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Edwin K Silverman
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Paola Paci
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
- Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council, Rome, Italy
- Karolinska Institutet, 17177, Stockholm, Sweden
10
Taunk K, Jajula S, Bhavsar PP, Choudhari M, Bhanuse S, Tamhankar A, Naiya T, Kalita B, Rapole S. The prowess of metabolomics in cancer research: current trends, challenges and future perspectives. Mol Cell Biochem 2024:10.1007/s11010-024-05041-w. [PMID: 38814423 DOI: 10.1007/s11010-024-05041-w] [Received: 12/21/2023] [Accepted: 05/18/2024] [Indexed: 05/31/2024]
Abstract
Cancer, owing to its heterogeneous nature and high prevalence, has tremendous socioeconomic impacts on populations across the world. It is therefore crucial to discover effective panels of biomarkers for diagnosing cancer at an early stage. Cancer leads to alterations in cell growth and differentiation at the molecular level, some of which are unique. Comprehending these alterations can aid in a better understanding of the disease pathology and in identifying biomolecules that can serve as effective biomarkers for cancer diagnosis. Metabolites, among other biomolecules of interest, play a key role in the pathophysiology of cancer: their levels are significantly altered during 'reprogramming of energy metabolism', a cellular condition favored in cancer cells and one of the hallmarks of cancer. Metabolomics, an emerging omics technology, has tremendous potential for investigating cancer metabolites and the metabolic alterations that occur during the development of cancer. Diverse metabolites can be screened in a variety of biofluids and tumor tissues sampled from cancer patients and compared against healthy controls to capture the altered metabolism. In this review, we provide an overview of different metabolomics approaches employed in cancer research and the potential of metabolites as biomarkers for cancer diagnosis. In addition, we discuss the challenges associated with metabolomics-driven cancer research and look at the prospects of this emerging field.
Affiliation(s)
- Khushman Taunk
- Proteomics Lab, National Centre for Cell Science, Ganeshkhind, Pune, Maharashtra, 411007, India
- Department of Biotechnology, Maulana Abul Kalam Azad University of Technology, West Bengal, NH12 Simhat, Haringhata, Nadia, West Bengal, 741249, India
- Saikiran Jajula
- Proteomics Lab, National Centre for Cell Science, Ganeshkhind, Pune, Maharashtra, 411007, India
- Praneeta Pradip Bhavsar
- Proteomics Lab, National Centre for Cell Science, Ganeshkhind, Pune, Maharashtra, 411007, India
- Mahima Choudhari
- Proteomics Lab, National Centre for Cell Science, Ganeshkhind, Pune, Maharashtra, 411007, India
- Sadanand Bhanuse
- Proteomics Lab, National Centre for Cell Science, Ganeshkhind, Pune, Maharashtra, 411007, India
- Anup Tamhankar
- Department of Surgical Oncology, Deenanath Mangeshkar Hospital and Research Centre, Erandawne, Pune, Maharashtra, 411004, India
- Tufan Naiya
- Department of Biotechnology, Maulana Abul Kalam Azad University of Technology, West Bengal, NH12 Simhat, Haringhata, Nadia, West Bengal, 741249, India
- Bhargab Kalita
- Proteomics Lab, National Centre for Cell Science, Ganeshkhind, Pune, Maharashtra, 411007, India.
- Amrita School of Nanosciences and Molecular Medicine, Amrita Institute of Medical Sciences and Research Centre, Amrita Vishwa Vidyapeetham, Ponekkara, Kochi, Kerala, 682041, India.
- Srikanth Rapole
- Proteomics Lab, National Centre for Cell Science, Ganeshkhind, Pune, Maharashtra, 411007, India.
11
Novoloaca A, Broc C, Beloeil L, Yu WH, Becker J. Comparative analysis of integrative classification methods for multi-omics data. Brief Bioinform 2024; 25:bbae331. [PMID: 38985929 PMCID: PMC11234228 DOI: 10.1093/bib/bbae331] [Received: 12/19/2023] [Revised: 05/31/2024] [Indexed: 07/12/2024] [Open access]
Abstract
Recent advances in sequencing, mass spectrometry, and cytometry technologies have enabled researchers to collect multiple 'omics data types from a single sample. These large datasets have led to a growing consensus that a holistic approach is needed to identify new candidate biomarkers and unveil mechanisms underlying disease etiology, a key to precision medicine. While many reviews and benchmarks have been conducted on unsupervised approaches, their supervised counterparts have received less attention in the literature and no gold standard has emerged yet. In this work, we present a thorough comparison of a selection of six methods, representative of the main families of intermediate integrative approaches (matrix factorization, multiple kernel methods, ensemble learning, and graph-based methods). As a non-integrative control, random forest was performed on concatenated and separated data types. Methods were evaluated for classification performance on both simulated and real-world datasets, the latter being carefully selected to cover different medical applications (infectious diseases, oncology, and vaccines) and data modalities. A total of 15 simulation scenarios were designed from the real-world datasets to explore a large and realistic parameter space (e.g. sample size, dimensionality, class imbalance, effect size). On real data, the method comparison showed that integrative approaches performed as well as or better than their non-integrative counterpart. By contrast, DIABLO and the four random forest alternatives outperformed the others across the majority of simulation scenarios. The strengths and limitations of these methods are discussed in detail, along with guidelines for future applications.
Affiliation(s)
- Alexei Novoloaca
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
| | - Camilo Broc
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
| | - Laurent Beloeil
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
| | - Wen-Han Yu
- Bill & Melinda Gates Medical Research Institute, Cambridge, Massachusetts, MA 02139, United States
| | - Jérémie Becker
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
12
Banerjee J, Tiwari AK, Banerjee S. Drug repurposing for cancer. Progress in Molecular Biology and Translational Science 2024; 207:123-150. [PMID: 38942535 DOI: 10.1016/bs.pmbts.2024.03.032] [Indexed: 06/30/2024]
Abstract
In the dynamic landscape of cancer therapeutics, the innovative strategy of drug repurposing emerges as a transformative paradigm, heralding a new era in the fight against malignancies. This book chapter aims to embark on the comprehension of the strategic deployment of approved drugs for repurposing and the meticulous journey of drug repurposing from earlier times to the current era. Moreover, the chapter underscores the multifaceted and complex nature of cancer biology, and the evolving field of cancer drug therapeutics while emphasizing the mandate of drug repurposing to advance cancer therapeutics. Importantly, the narrative explores the latest tools, technologies, and cutting-edge methodologies including high-throughput screening, omics technologies, and artificial intelligence-driven approaches, for shaping and accelerating the pace of drug repurposing to uncover novel cancer therapeutic avenues. The chapter critically assesses the breakthroughs, expanding the repertoire of repurposing drug candidates in cancer, and their major categories. Another focal point of this book chapter is that it addresses the emergence of combination therapies involving repurposed drugs, reflecting a shift towards personalized and synergistic treatment approaches. The expert analysis delves into the intricacies of combinatorial regimens, elucidating their potential to target heterogeneous cancer populations and overcome resistance mechanisms, thereby enhancing treatment efficacy. Therefore, this chapter provides in-depth insights into the potential of repurposing towards bringing the much-needed big leap in the field of cancer therapeutics.
Affiliation(s)
- Juni Banerjee
- Department of Biotechnology and Bioengineering, Institute of Advanced Research (IAR), Gandhinagar, Gujarat, India
| | - Anand Krishna Tiwari
- Department of Biotechnology and Bioengineering, Institute of Advanced Research (IAR), Gandhinagar, Gujarat, India
| | - Shuvomoy Banerjee
- Department of Biotechnology and Bioengineering, Institute of Advanced Research (IAR), Gandhinagar, Gujarat, India.
13
Drouard G, Mykkänen J, Heiskanen J, Pohjonen J, Ruohonen S, Pahkala K, Lehtimäki T, Wang X, Ollikainen M, Ripatti S, Pirinen M, Raitakari O, Kaprio J. Exploring machine learning strategies for predicting cardiovascular disease risk factors from multi-omic data. BMC Med Inform Decis Mak 2024; 24:116. [PMID: 38698395 PMCID: PMC11064347 DOI: 10.1186/s12911-024-02521-3] [Received: 11/04/2022] [Accepted: 04/29/2024] [Indexed: 05/05/2024] Open
Abstract
BACKGROUND Machine learning (ML) classifiers are increasingly used for predicting cardiovascular disease (CVD) and related risk factors using omics data, although these outcomes are often categorical and class-imbalanced. However, little is known about which ML classifier, omics data, or upstream dimension reduction strategy has the strongest influence on prediction quality in such settings. Our study aimed to illustrate and compare different machine learning strategies to predict CVD risk factors under different scenarios. METHODS We compared the use of six ML classifiers in predicting CVD risk factors using blood-derived metabolomics, epigenetics and transcriptomics data. Upstream omic dimension reduction was performed using either unsupervised or semi-supervised autoencoders, and we compared the resulting downstream ML classifier performance. CVD risk factors included systolic and diastolic blood pressure measurements and ultrasound-based biomarkers of left ventricular diastolic dysfunction (LVDD; E/e' ratio, E/A ratio, LAVI) collected from 1,249 Finnish participants, of whom 80% were used for model fitting. We predicted individuals with low, high or average levels of CVD risk factors, the latter class being the most common. We constructed multi-omic predictions using a meta-learner that weighted single-omic predictions. Model performance comparisons were based on the F1 score. Finally, we investigated whether learned omic representations from pre-trained semi-supervised autoencoders could improve outcome prediction in an external cohort using transfer learning. RESULTS Depending on the ML classifier or omic used, the quality of single-omic predictions varied. Multi-omics predictions outperformed single-omics predictions in most cases, particularly in the prediction of individuals with high or low CVD risk factor levels. Semi-supervised autoencoders improved downstream predictions compared to the use of unsupervised autoencoders. In addition, median gains in Area Under the Curve by transfer learning compared to modelling from scratch ranged from 0.09 to 0.14 and 0.07 to 0.11 units for transcriptomic and metabolomic data, respectively. CONCLUSIONS By illustrating the use of different machine learning strategies in different scenarios, our study provides a platform for researchers to evaluate how the choice of omics, ML classifiers, and dimension reduction can influence the quality of CVD risk factor predictions.
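The two evaluation ingredients named in this abstract, a meta-learner that weights single-omic predictions and a per-class F1 score over the low/average/high classes, can be illustrated with a minimal sketch. The abstract does not specify how the meta-learner is fitted, so the fixed-weight averaging below is only a stand-in, and every name in it (`combine_predictions`, `f1_per_class`, the weights and probabilities) is a hypothetical chosen for illustration rather than taken from the study.

```python
# Illustrative sketch only: fixed-weight averaging stands in for the
# study's meta-learner, whose actual form is not given in the abstract.
CLASSES = ("low", "average", "high")


def combine_predictions(per_omic_probs, weights):
    """Weighted average of class-probability dicts, one dict per omic layer."""
    total = sum(weights[omic] for omic in per_omic_probs)
    combined = {c: 0.0 for c in CLASSES}
    for omic, probs in per_omic_probs.items():
        for c in CLASSES:
            combined[c] += weights[omic] * probs[c] / total
    return combined


def predict_label(probs):
    """Return the class with the highest combined probability."""
    return max(CLASSES, key=lambda c: probs[c])


def f1_per_class(y_true, y_pred):
    """One-vs-rest F1 for each class; 0.0 when a class never appears."""
    scores = {}
    for c in CLASSES:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Hypothetical single-omic class probabilities for one participant.
    probs = {
        "metabolomics": {"low": 0.6, "average": 0.3, "high": 0.1},
        "transcriptomics": {"low": 0.2, "average": 0.5, "high": 0.3},
    }
    weights = {"metabolomics": 3.0, "transcriptomics": 1.0}  # assumed weights
    print(predict_label(combine_predictions(probs, weights)))
```

Reporting F1 per class rather than overall accuracy matters here because the "average" class is the most common; a classifier that always predicts "average" would score well on accuracy while having an F1 of zero for the low and high classes.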
Affiliation(s)
- Gabin Drouard
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
| | - Juha Mykkänen
- Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
| | - Jarkko Heiskanen
- Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
| | - Joona Pohjonen
- Research Program in Systems Oncology, University of Helsinki, Helsinki, Finland
| | - Saku Ruohonen
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
| | - Katja Pahkala
- Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
- Paavo Nurmi Centre & Unit for Health and Physical Activity, University of Turku, Turku, Finland
| | - Terho Lehtimäki
- Department of Clinical Chemistry, Fimlab Laboratories, and Finnish Cardiovascular Research Center - Tampere, Faculty of Medicine and Health Technology, Tampere University, 33520, Tampere, Finland
| | - Xiaoling Wang
- Georgia Prevention Institute, Medical College of Georgia, Augusta University, Augusta, GA, USA
| | - Miina Ollikainen
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Minerva Foundation Institute for Medical Research, Helsinki, Finland
| | - Samuli Ripatti
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Public Health, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Matti Pirinen
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Public Health, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
| | - Olli Raitakari
- Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
- Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, Turku, Finland
| | - Jaakko Kaprio
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
14
Wang H, Liu Z, Ma X. Learning Consistency and Specificity of Cells From Single-Cell Multi-Omic Data. IEEE J Biomed Health Inform 2024; 28:3134-3145. [PMID: 38709615 DOI: 10.1109/jbhi.2024.3370868] [Indexed: 05/08/2024]
Abstract
Advances in single-cell technologies make it possible to profile the epigenome and transcriptome concomitantly at the level of individual cells, providing opportunities to explore the underlying biological mechanisms. Despite significant effort, the integrative analysis of single-cell multi-omic data remains challenging because of the heterogeneity, complicated coupling, and interpretability of the data. To handle these issues, we propose a novel self-representation Learning-based Multi-omics data Integrative Clustering algorithm (sLMIC) for the integration of single-cell epigenomic (DNA methylation or scATAC-seq) and transcriptomic (scRNA-seq) profiles, in which the consistent and specific features of cells are explicitly extracted to facilitate cell clustering. Specifically, sLMIC constructs a graph for each type of single-cell data, thereby transforming omics data into multi-layer networks, which effectively removes the heterogeneity of omic data. Then, sLMIC employs low-rank and exclusivity constraints to separate the self-representation of cells into two parts, i.e., the shared and specific features, which explicitly characterize the consistency and diversity of omic data, providing an effective strategy to model the structure of cell types. Feature extraction and cell clustering are jointly formulated as an overall objective function, where latent features of data are obtained under the guidance of cell clustering. Extensive experiments on 13 single-cell multi-omics datasets from diverse organisms and tissues indicate that sLMIC outperforms state-of-the-art algorithms on various measures.
15
Ewald JD, Zhou G, Lu Y, Kolic J, Ellis C, Johnson JD, Macdonald PE, Xia J. Web-based multi-omics integration using the Analyst software suite. Nat Protoc 2024; 19:1467-1497. [PMID: 38355833 DOI: 10.1038/s41596-023-00950-4] [Received: 04/17/2023] [Accepted: 11/21/2023] [Indexed: 02/16/2024]
Abstract
The growing number of multi-omics studies demands clear conceptual workflows coupled with easy-to-use software tools to facilitate data analysis and interpretation. This protocol covers three key components involved in multi-omics analysis, including single-omics data analysis, knowledge-driven integration using biological networks and data-driven integration through joint dimensionality reduction. Using the dataset from a recent multi-omics study of human pancreatic islet tissue and plasma samples, the first section introduces how to perform transcriptomics/proteomics data analysis using ExpressAnalyst and lipidomics data analysis using MetaboAnalyst. On the basis of significant features detected in these workflows, the second section demonstrates how to perform knowledge-driven integration using OmicsNet. The last section illustrates how to perform data-driven integration from the normalized omics data and metadata using OmicsAnalyst. The complete protocol can be executed in ~2 h. Compared with other available options for multi-omics integration, the Analyst software suite described in this protocol enables researchers to perform a wide range of omics data analysis tasks via a user-friendly web interface.
Affiliation(s)
- Jessica D Ewald
- Institute of Parasitology, McGill University, Montreal, Quebec, Canada
| | - Guangyan Zhou
- Institute of Parasitology, McGill University, Montreal, Quebec, Canada
| | - Yao Lu
- Department of Microbiology and Immunology, McGill University, Montreal, Quebec, Canada
| | - Jelena Kolic
- Life Sciences Institute, Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, British Columbia, Canada
| | - Cara Ellis
- Department of Pharmacology, University of Alberta, Edmonton, Alberta, Canada
| | - James D Johnson
- Life Sciences Institute, Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, British Columbia, Canada
| | - Patrick E Macdonald
- Department of Pharmacology, University of Alberta, Edmonton, Alberta, Canada
| | - Jianguo Xia
- Institute of Parasitology, McGill University, Montreal, Quebec, Canada.
- Department of Microbiology and Immunology, McGill University, Montreal, Quebec, Canada.
16
Williams A. Multiomics data integration, limitations, and prospects to reveal the metabolic activity of the coral holobiont. FEMS Microbiol Ecol 2024; 100:fiae058. [PMID: 38653719 PMCID: PMC11067971 DOI: 10.1093/femsec/fiae058] [Received: 09/26/2023] [Revised: 03/25/2024] [Accepted: 04/22/2024] [Indexed: 04/25/2024] Open
Abstract
Since their radiation in the Middle Triassic period ∼240 million years ago, stony corals have survived past climate fluctuations and five mass extinctions. Their long-term survival underscores the inherent resilience of corals, particularly when considering the nutrient-poor marine environments in which they have thrived. However, coral bleaching has emerged as a global threat to coral survival, requiring rapid advancements in coral research to understand holobiont stress responses and allow for interventions before extensive bleaching occurs. This review encompasses the potential, as well as the limits, of multiomics data applications when applied to the coral holobiont. Synopses for how different omics tools have been applied to date and their current restrictions are discussed, in addition to ways these restrictions may be overcome, such as recruiting new technology to studies, utilizing novel bioinformatics approaches, and generally integrating omics data. Lastly, this review presents considerations for the design of holobiont multiomics studies to support lab-to-field advancements of coral stress marker monitoring systems. Although much of the bleaching mechanism has eluded investigation to date, multiomic studies have already produced key findings regarding the holobiont's stress response, and have the potential to advance the field further.
Affiliation(s)
- Amanda Williams
- Microbial Biology Graduate Program, Rutgers University, 76 Lipman Drive, New Brunswick, NJ 08901, United States
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Drive, New Brunswick, NJ 08901, United States
17
Mukherjee A, Abraham S, Singh A, Balaji S, Mukunthan KS. From Data to Cure: A Comprehensive Exploration of Multi-omics Data Analysis for Targeted Therapies. Mol Biotechnol 2024:10.1007/s12033-024-01133-6. [PMID: 38565775 DOI: 10.1007/s12033-024-01133-6] [Received: 12/27/2023] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
In the dynamic landscape of targeted therapeutics, drug discovery has pivoted towards understanding underlying disease mechanisms, placing a strong emphasis on molecular perturbations and target identification. This paradigm shift, crucial for drug discovery, is underpinned by big data, a transformative force in the current era. Omics data, characterized by its heterogeneity and enormity, has ushered biological and biomedical research into the big data domain. Acknowledging the significance of integrating diverse omics data strata, known as multi-omics studies, researchers delve into the intricate interrelationships among various omics layers. This review navigates the expansive omics landscape, showcasing tailored assays for each molecular layer through genomes to metabolomes. The sheer volume of data generated necessitates sophisticated informatics techniques, with machine-learning (ML) algorithms emerging as robust tools. These datasets not only refine disease classification but also enhance diagnostics and foster the development of targeted therapeutic strategies. Through the integration of high-throughput data, the review focuses on targeting and modeling multiple disease-regulated networks, validating interactions with multiple targets, and enhancing therapeutic potential using network pharmacology approaches. Ultimately, this exploration aims to illuminate the transformative impact of multi-omics in the big data era, shaping the future of biological research.
Affiliation(s)
- Arnab Mukherjee
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
| | - Suzanna Abraham
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
| | - Akshita Singh
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
| | - S Balaji
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
| | - K S Mukunthan
- Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India.
18
Lundy DJ, Szomolay B, Liao CT. Systems Approaches to Cell Culture-Derived Extracellular Vesicles for Acute Kidney Injury Therapy: Prospects and Challenges. Function 2024; 5:zqae012. [PMID: 38706963 PMCID: PMC11065115 DOI: 10.1093/function/zqae012] [Received: 01/22/2024] [Revised: 03/02/2024] [Accepted: 03/05/2024] [Indexed: 05/07/2024] Open
Abstract
Acute kidney injury (AKI) is a heterogeneous syndrome, comprising diverse etiologies of kidney insults that result in high mortality and morbidity if not well managed. Although great efforts have been made to investigate underlying pathogenic mechanisms of AKI, there are limited therapeutic strategies available. Extracellular vesicles (EV) are membrane-bound vesicles secreted by various cell types, which can serve as cell-free therapy through transfer of bioactive molecules. In this review, we first overview the AKI syndrome and EV biology, with a particular focus on the technical aspects and therapeutic application of cell culture-derived EVs. Second, we illustrate how multi-omic approaches to EV miRNA, protein, and genomic cargo analysis can yield new insights into their mechanisms of action and address unresolved questions in the field. We then summarize major experimental evidence regarding the therapeutic potential of EVs in AKI, which we subdivide into stem cell and non-stem cell-derived EVs. Finally, we highlight the challenges and opportunities related to the clinical translation of animal studies into human patients.
Affiliation(s)
- David J Lundy
- Graduate Institute of Biomedical Materials & Tissue Engineering, Taipei Medical University, Taipei 235603, Taiwan
- International PhD Program in Biomedical Engineering, Taipei Medical University, Taipei 235603, Taiwan
- Center for Cell Therapy, Taipei Medical University Hospital, Taipei 110301, Taiwan
| | - Barbara Szomolay
- Systems Immunity Research Institute, Cardiff University School of Medicine, Cardiff CF14 4XN, UK
- Division of Infection and Immunity, Cardiff University School of Medicine, Cardiff CF14 4XN, UK
| | - Chia-Te Liao
- Division of Nephrology, Department of Internal Medicine, Shuang Ho Hospital, Taipei Medical University, New Taipei City 23561, Taiwan
- Division of Nephrology, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
- Research Center of Urology and Kidney, Taipei Medical University, Taipei 110, Taiwan
19
Lu Z, Xiao X, Zheng Q, Wang X, Xu L. Assessing NGS-based computational methods for predicting transcriptional regulators with query gene sets. bioRxiv 2024:2024.02.01.578316. [PMID: 38562775 PMCID: PMC10983863 DOI: 10.1101/2024.02.01.578316] [Indexed: 04/04/2024]
Abstract
This article provides an in-depth review of computational methods for predicting transcriptional regulators with query gene sets. Identification of transcriptional regulators is of utmost importance in many biological applications, including but not limited to elucidating biological development mechanisms, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of NGS-based methods has been offered. We classified these methods into two categories based on shared characteristics, namely library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods with molecular experimental datasets. Results show that BART, ChIP-Atlas, and Lisa have relatively better performance. Besides, we point out the limitations of NGS-based methods and explore potential directions for further improvement.
Key points
- An introduction to available computational methods for predicting functional TRs from a query gene set.
- A detailed walk-through along with practical concerns and limitations.
- A systematic benchmark of NGS-based methods in terms of accuracy, sensitivity, coverage, and usability, using 570 TR perturbation-derived gene sets.
- NGS-based methods outperform motif-based methods. Among NGS methods, those utilizing larger databases and adopting region-centric approaches demonstrate favorable performance. BART, ChIP-Atlas, and Lisa are recommended as these methods have overall better performance in evaluated scenarios.
20
Wieder C, Cooke J, Frainay C, Poupin N, Bowler R, Jourdan F, Kechris KJ, Lai RPJ, Ebbels T. PathIntegrate: Multivariate modelling approaches for pathway-based multi-omics data integration. PLoS Comput Biol 2024; 20:e1011814. [PMID: 38527092 PMCID: PMC10994553 DOI: 10.1371/journal.pcbi.1011814] [Received: 01/10/2024] [Revised: 04/04/2024] [Accepted: 03/11/2024] [Indexed: 03/27/2024] Open
Abstract
As terabytes of multi-omics data are being generated, there is an ever-increasing need for methods facilitating the integration and interpretation of such data. Current multi-omics integration methods typically output lists, clusters, or subnetworks of molecules related to an outcome. Even with expert domain knowledge, discerning the biological processes involved is a time-consuming activity. Here we propose PathIntegrate, a method for integrating multi-omics datasets based on pathways, designed to exploit knowledge of biological systems and thus provide interpretable models for such studies. PathIntegrate employs single-sample pathway analysis to transform multi-omics datasets from the molecular to the pathway-level, and applies a predictive single-view or multi-view model to integrate the data. Model outputs include multi-omics pathways ranked by their contribution to the outcome prediction, the contribution of each omics layer, and the importance of each molecule in a pathway. Using semi-synthetic data we demonstrate the benefit of grouping molecules into pathways to detect signals in low signal-to-noise scenarios, as well as the ability of PathIntegrate to precisely identify important pathways at low effect sizes. Finally, using COPD and COVID-19 data we showcase how PathIntegrate enables convenient integration and interpretation of complex high-dimensional multi-omics datasets. PathIntegrate is available as an open-source Python package.
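The molecular-to-pathway transformation described in this abstract can be pictured with a minimal sketch. The abstract does not say which single-sample pathway scoring method is used, so the mean z-score per pathway below is an assumed stand-in, and the function names (`zscore_molecules`, `pathway_scores`) and the toy identifiers are hypothetical; none of this is the PathIntegrate package's API.

```python
# Illustrative sketch only: mean z-score per pathway is an assumed scoring
# rule, not necessarily the one used by PathIntegrate.
from statistics import mean, pstdev


def zscore_molecules(data):
    """Standardise each molecule across samples.

    data maps molecule name -> list of abundance values, one per sample.
    Molecules with zero variance are mapped to all zeros.
    """
    standardised = {}
    for molecule, values in data.items():
        mu, sd = mean(values), pstdev(values)
        standardised[molecule] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return standardised


def pathway_scores(data, pathways):
    """Turn a molecule-level table into a pathway-level table.

    pathways maps pathway name -> list of member molecules, which may come
    from different omics layers. Pathways with no measured member are skipped.
    """
    z = zscore_molecules(data)
    n_samples = len(next(iter(data.values())))
    scores = {}
    for name, members in pathways.items():
        measured = [m for m in members if m in z]
        if not measured:
            continue
        scores[name] = [mean(z[m][i] for m in measured) for i in range(n_samples)]
    return scores


if __name__ == "__main__":
    # Toy data: two transcripts and one metabolite measured in three samples.
    data = {"geneA": [1.0, 2.0, 3.0], "metB": [2.0, 4.0, 6.0], "geneC": [5.0, 5.0, 5.0]}
    pathways = {"P1": ["geneA", "metB"], "P2": ["geneC", "unmeasured"]}
    print(pathway_scores(data, pathways))
```

The resulting pathway-by-sample table is what a downstream single-view or multi-view predictive model would consume. Averaging over pathway members is one simple way to see why grouping can help in low signal-to-noise settings: independent noise in individual molecules tends to cancel while a shared shift is retained.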
Affiliation(s)
- Cecilia Wieder
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
| | - Juliette Cooke
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
| | - Clement Frainay
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
| | - Nathalie Poupin
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
| | - Russell Bowler
- National Jewish Health, Denver, Colorado, United States of America
| | - Fabien Jourdan
- MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, France
| | - Katerina J. Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
| | - Rachel PJ Lai
- Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, United Kingdom
| | - Timothy Ebbels
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
21
Quinn TP, Hess JL, Marshe VS, Barnett MM, Hauschild AC, Maciukiewicz M, Elsheikh SSM, Men X, Schwarz E, Trakadis YJ, Breen MS, Barnett EJ, Zhang-James Y, Ahsen ME, Cao H, Chen J, Hou J, Salekin A, Lin PI, Nicodemus KK, Meyer-Lindenberg A, Bichindaritz I, Faraone SV, Cairns MJ, Pandey G, Müller DJ, Glatt SJ. A primer on the use of machine learning to distil knowledge from data in biological psychiatry. Mol Psychiatry 2024; 29:387-401. [PMID: 38177352 PMCID: PMC11228968 DOI: 10.1038/s41380-023-02334-2] [Received: 07/12/2022] [Revised: 09/21/2023] [Accepted: 11/17/2023] [Indexed: 01/06/2024]
Abstract
Applications of machine learning in the biomedical sciences are growing rapidly. This growth has been spurred by diverse cross-institutional and interdisciplinary collaborations, public availability of large datasets, an increase in the accessibility of analytic routines, and the availability of powerful computing resources. With this increased access and exposure to machine learning comes a responsibility for education and a deeper understanding of its bases and bounds, borne equally by data scientists seeking to ply their analytic wares in medical research and by biomedical scientists seeking to harness such methods to glean knowledge from data. This article provides an accessible and critical review of machine learning for a biomedically informed audience, as well as its applications in psychiatry. The review covers definitions and expositions of commonly used machine learning methods, and historical trends of their use in psychiatry. We also provide a set of standards, namely Guidelines for REporting Machine Learning Investigations in Neuropsychiatry (GREMLIN), for designing and reporting studies that use machine learning as a primary data-analysis approach. Lastly, we propose the establishment of the Machine Learning in Psychiatry (MLPsych) Consortium, enumerate its objectives, and identify areas of opportunity for future applications of machine learning in biological psychiatry. This review serves as a cautiously optimistic primer on machine learning for those on the precipice as they prepare to dive into the field, either as methodological practitioners or well-informed consumers.
Affiliation(s)
- Thomas P Quinn
- Applied Artificial Intelligence Institute (A2I2), Burwood, VIC, 3125, Australia
| | - Jonathan L Hess
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
| | - Victoria S Marshe
- Institute of Medical Science, University of Toronto, Toronto, ON, M5S 1A1, Canada
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
| | - Michelle M Barnett
- School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, 2308, Australia
- Precision Medicine Research Program, Hunter Medical Research Institute, Newcastle, NSW, 2308, Australia
| | - Anne-Christin Hauschild
- Department of Medical Informatics, Medical University Center Göttingen, Göttingen, Lower Saxony, 37075, Germany
| | - Malgorzata Maciukiewicz
- Hospital Zurich, University of Zurich, Zurich, 8091, Switzerland
- Department of Rheumatology and Immunology, University Hospital Bern, Bern, 3010, Switzerland
- Department for Biomedical Research (DBMR), University of Bern, Bern, 3010, Switzerland
| | - Samar S M Elsheikh
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
| | - Xiaoyu Men
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
- Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, M5S 1A1, Canada
| | - Emanuel Schwarz
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
| | - Yannis J Trakadis
- Department Human Genetics, McGill University Health Centre, Montreal, QC, H4A 3J1, Canada
| | - Michael S Breen
- Psychiatry, Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Eric J Barnett
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
| | - Yanli Zhang-James
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
| | - Mehmet Eren Ahsen
- Department of Business Administration, Gies College of Business, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
- Department of Biomedical and Translational Sciences, Carle-Illinois School of Medicine, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
| | - Han Cao
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
| | - Junfang Chen
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
| | - Jiahui Hou
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
| | - Asif Salekin
- Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY, 13244, USA
| | - Ping-I Lin
- Discipline of Psychiatry and Mental Health, University of New South Wales, Sydney, NSW, 2052, Australia
- Mental Health Research Unit, South Western Sydney Local Health District, Liverpool, NSW, 2170, Australia
| | - Andreas Meyer-Lindenberg
- Clinical Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
| | - Isabelle Bichindaritz
- Biomedical and Health Informatics/Computer Science Department, State University of New York at Oswego, Oswego, NY, 13126, USA
- Intelligent Bio Systems Lab, State University of New York at Oswego, Oswego, NY, 13126, USA
| | - Stephen V Faraone
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
| | - Murray J Cairns
- School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, 2308, Australia
- Precision Medicine Research Program, Hunter Medical Research Institute, Newcastle, NSW, 2308, Australia
| | - Gaurav Pandey
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Daniel J Müller
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
- Department of Psychiatry, University of Toronto, Toronto, ON, M5S 1A1, Canada
- Department of Psychiatry, Psychosomatics and Psychotherapy, Center of Mental Health, University Hospital of Würzburg, Würzburg, 97080, Germany
| | - Stephen J Glatt
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA.
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA.
- Department of Public Health and Preventive Medicine, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA.
22
Cai Y, Wang S. Deeply integrating latent consistent representations in high-noise multi-omics data for cancer subtyping. Brief Bioinform 2024; 25:bbae061. [PMID: 38426322 PMCID: PMC10939425 DOI: 10.1093/bib/bbae061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 01/13/2024] [Accepted: 01/29/2024] [Indexed: 03/02/2024] Open
Abstract
Cancer is a complex and high-mortality disease regulated by multiple factors. Accurate cancer subtyping is crucial for formulating personalized treatment plans and improving patient survival rates. The underlying mechanisms that drive cancer progression can be comprehensively understood by analyzing multi-omics data. However, the high noise levels in omics data often pose challenges in capturing consistent representations and adequately integrating their information. This paper proposes a novel variational autoencoder-based deep learning model, named Deeply Integrating Latent Consistent Representations (DILCR). First, multiple independent variational autoencoders and contrastive loss functions were designed to separate noise from omics data and capture latent consistent representations. Subsequently, an Attention Deep Integration Network was proposed to integrate consistent representations across different omics levels effectively. Additionally, we introduced the Improved Deep Embedded Clustering algorithm to make the integrated variables clustering-friendly. The effectiveness of DILCR was evaluated using 10 typical cancer datasets from The Cancer Genome Atlas and compared with 14 state-of-the-art integration methods. The results demonstrated that DILCR effectively captures the consistent representations in omics data and outperforms other integration methods in cancer subtyping. In the Kidney Renal Clear Cell Carcinoma case study, the cancer subtypes identified by DILCR showed clear biological significance and interpretability.
Affiliation(s)
- Yueyi Cai
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
| | - Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
23
Mardoc E, Sow MD, Déjean S, Salse J. Genomic data integration tutorial, a plant case study. BMC Genomics 2024; 25:66. [PMID: 38233804 PMCID: PMC10792847 DOI: 10.1186/s12864-023-09833-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 11/22/2023] [Indexed: 01/19/2024] Open
Abstract
BACKGROUND The ongoing evolution of Next Generation Sequencing (NGS) technologies has led to the production of genomic data on a massive scale. While tools for genomic data integration and analysis are becoming increasingly available, the conceptual and analytical complexities still represent a great challenge in many biological contexts. RESULTS To address this issue, we describe a six-step tutorial on best practices in genomic data integration, consisting of (1) designing a data matrix; (2) formulating a specific biological question toward data description, selection and prediction; (3) selecting a tool adapted to the targeted questions; (4) preprocessing the data; (5) conducting preliminary analysis; and finally (6) executing genomic data integration. CONCLUSION The tutorial has been tested and demonstrated on publicly available genomic data generated from poplar (Populus L.), a woody plant model. We also developed a new graphical output for the unsupervised multi-block analysis, cimDiablo_v2, available at https://forgemia.inra.fr/umr-gdec/omics-integration-on-poplar, which allows the selection of master drivers in genomic data variation and interplay.
Affiliation(s)
- Emile Mardoc
- UCA-INRAE UMR 1095 Genetics, Diversity and Ecophysiology of Cereals (GDEC), 5 Chemin de Beaulieu, 63000, Clermont-Ferrand, France
| | - Mamadou Dia Sow
- UCA-INRAE UMR 1095 Genetics, Diversity and Ecophysiology of Cereals (GDEC), 5 Chemin de Beaulieu, 63000, Clermont-Ferrand, France
| | - Sébastien Déjean
- Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS, Université Paul Sabatier, Toulouse, France
| | - Jérôme Salse
- UCA-INRAE UMR 1095 Genetics, Diversity and Ecophysiology of Cereals (GDEC), 5 Chemin de Beaulieu, 63000, Clermont-Ferrand, France.
24
Wieder C, Cooke J, Frainay C, Poupin N, Bowler R, Jourdan F, Kechris KJ, Lai RP, Ebbels T. PathIntegrate: Multivariate modelling approaches for pathway-based multi-omics data integration. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.09.574780. [PMID: 38260498 PMCID: PMC10802464 DOI: 10.1101/2024.01.09.574780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
As terabytes of multi-omics data are being generated, there is an ever-increasing need for methods facilitating the integration and interpretation of such data. Current multi-omics integration methods typically output lists, clusters, or subnetworks of molecules related to an outcome. Even with expert domain knowledge, discerning the biological processes involved is a time-consuming activity. Here we propose PathIntegrate, a method for integrating multi-omics datasets based on pathways, designed to exploit knowledge of biological systems and thus provide interpretable models for such studies. PathIntegrate employs single-sample pathway analysis to transform multi-omics datasets from the molecular to the pathway-level, and applies a predictive single-view or multi-view model to integrate the data. Model outputs include multi-omics pathways ranked by their contribution to the outcome prediction, the contribution of each omics layer, and the importance of each molecule in a pathway. Using semi-synthetic data we demonstrate the benefit of grouping molecules into pathways to detect signals in low signal-to-noise scenarios, as well as the ability of PathIntegrate to precisely identify important pathways at low effect sizes. Finally, using COPD and COVID-19 data we showcase how PathIntegrate enables convenient integration and interpretation of complex high-dimensional multi-omics datasets. The PathIntegrate Python package is available at https://github.com/cwieder/PathIntegrate.
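The molecular-to-pathway transformation described in this abstract can be illustrated with a minimal sketch. It is not the PathIntegrate package API: the function name `pathway_scores`, the toy data, and the choice of a mean z-score as the single-sample pathway score are all assumptions made for illustration.

```python
import numpy as np

def pathway_scores(X, molecule_names, pathways):
    """Collapse a samples x molecules matrix into per-sample pathway scores.

    Hypothetical helper: scores each pathway as the mean z-score of its
    member molecules, a simple stand-in for single-sample pathway analysis.
    """
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                      # avoid dividing by zero for constant molecules
    Z = (X - X.mean(axis=0)) / sd          # z-score each molecule across samples
    index = {name: j for j, name in enumerate(molecule_names)}
    scores = {}
    for pathway, members in pathways.items():
        cols = [index[m] for m in members if m in index]
        if cols:                           # skip pathways with no measured members
            scores[pathway] = Z[:, cols].mean(axis=1)
    return scores

# Toy data (hypothetical): 2 samples, 3 molecules, 3 pathways.
X = [[1.0, 2.0, 10.0],
     [3.0, 4.0, 10.0]]
names = ["a", "b", "c"]
pathways = {"P1": ["a", "b"], "P2": ["c"], "P3": ["not_measured"]}
scores = pathway_scores(X, names, pathways)
```

The resulting pathway-level matrix (one column per pathway) could then be passed to any single-view or multi-view predictive model; which model to use is outside the scope of this sketch.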
Affiliation(s)
- Cecilia Wieder
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
| | - Juliette Cooke
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
| | - Clement Frainay
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
| | - Nathalie Poupin
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
| | - Russell Bowler
- National Jewish Health, 1400 Jackson Street, Denver, CO, 80206, USA
| | - Fabien Jourdan
- MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, France
| | - Katerina J Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
| | - Rachel Pj Lai
- Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, United Kingdom
| | - Timothy Ebbels
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
25
Amente LD, Mills NT, Le TD, Hyppönen E, Lee SH. Unraveling phenotypic variance in metabolic syndrome through multi-omics. Hum Genet 2024; 143:35-47. [PMID: 38095720 DOI: 10.1007/s00439-023-02619-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 11/18/2023] [Indexed: 01/19/2024]
Abstract
Complex multi-omics effects drive the clustering of cardiometabolic risk factors, underscoring the imperative to comprehend how individual and combined omics shape phenotypic variation. Our study partitions phenotypic variance in metabolic syndrome (MetS), blood glucose (GLU), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), and blood pressure through genome, transcriptome, metabolome, and exposome (i.e., lifestyle exposome) analyses. Our analysis included a cohort of 62,822 unrelated individuals with white British ancestry, sourced from the UK Biobank. We employed linear mixed models to partition phenotypic variance using the restricted maximum likelihood (REML) method, implemented in MTG2 (v2.22). We initiated the analysis by individually modeling omics, followed by subsequent integration of pairwise omics in a joint model that also accounted for the covariance and interaction between omics layers. Finally, we estimated the correlations of various omics effects between the phenotypes using bivariate REML. Significant proportions of the MetS variance were attributed to distinct data sources: genome (9.47%), transcriptome (4.24%), metabolome (14.34%), and exposome (3.77%). The phenotypic variances explained by the genome, transcriptome, metabolome, and exposome ranged from 3.28% for GLU to 25.35% for HDL-C, 0% for GLU to 19.34% for HDL-C, 4.29% for systolic blood pressure (SBP) to 35.75% for TG, and 0.89% for GLU to 10.17% for HDL-C, respectively. Significant correlations were found between genomic and transcriptomic effects for TG and HDL-C. Furthermore, significant interaction effects between omics data were detected for both MetS and its components. Interestingly, significant correlations of omics effects between the phenotypes were found. This study underscores omics' roles, interaction effects, and random-effects covariance in unveiling phenotypic variation in multi-omics domains.
Affiliation(s)
- Lamessa Dube Amente
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia.
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia.
- South Australian Health and Medical Research Institute, Adelaide, SA, 5000, Australia.
| | - Natalie T Mills
- Discipline of Psychiatry, University of Adelaide, Adelaide, SA, 5000, Australia
| | - Thuc Duy Le
- UniSA STEM, University of South Australia, Mawson Lakes, SA, 5095, Australia
| | - Elina Hyppönen
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
- South Australian Health and Medical Research Institute, Adelaide, SA, 5000, Australia
- UniSA Clinical and Health Sciences, University of South Australia, Adelaide, SA, 5000, Australia
| | - S Hong Lee
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia.
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia.
- South Australian Health and Medical Research Institute, Adelaide, SA, 5000, Australia.
26
Sharma V, Singh A, Chauhan S, Sharma PK, Chaudhary S, Sharma A, Porwal O, Fuloria NK. Role of Artificial Intelligence in Drug Discovery and Target Identification in Cancer. Curr Drug Deliv 2024; 21:870-886. [PMID: 37670704 DOI: 10.2174/1567201821666230905090621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 03/08/2023] [Accepted: 03/24/2023] [Indexed: 09/07/2023]
Abstract
Drug discovery and development (DDD) is a highly complex process that necessitates precise monitoring and extensive data analysis at each stage. Furthermore, the DDD process is both time-consuming and costly. To tackle these concerns, artificial intelligence (AI) technology can be used, which facilitates rapid and precise analysis of extensive datasets within a limited timeframe. The pathophysiology of cancer is complicated and requires extensive research for novel drug discovery and development. The first stage in the process of drug discovery and development involves identifying targets. Cell structure and molecular functioning are complex due to the vast number of molecules that function constantly, performing various roles. Furthermore, scientists are continually discovering novel cellular mechanisms and molecules, expanding the range of potential targets. Accurately identifying the correct target is a crucial step in the preparation of a treatment strategy. Various forms of AI, such as machine learning, neural-based learning, deep learning, and network-based learning, are currently being utilised in applications, online services, and databases. These technologies facilitate the identification and validation of targets, ultimately contributing to the success of projects. This review focuses on the different types and subcategories of AI databases utilised in the field of drug discovery and target identification for cancer.
Affiliation(s)
- Vishal Sharma
- Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
| | - Amit Singh
- Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
| | - Sanjana Chauhan
- Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
| | - Pramod Kumar Sharma
- Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
| | - Shubham Chaudhary
- Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
| | - Astha Sharma
- Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
| | - Omji Porwal
- Department of Pharmacognosy, Faculty of Pharmacy, Tishk International University, Erbil 44001, Iraq
27
Na AY, Lee H, Min EK, Paudel S, Choi SY, Sim H, Liu KH, Kim KT, Bae JS, Lee S. Novel Time-dependent Multi-omics Integration in Sepsis-associated Liver Dysfunction. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:1101-1116. [PMID: 37084954 PMCID: PMC11082264 DOI: 10.1016/j.gpb.2023.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2022] [Revised: 02/03/2023] [Accepted: 04/11/2023] [Indexed: 04/23/2023]
Abstract
Recently developed technologies that allow the analysis of each single omics layer have provided an unbiased insight into ongoing disease processes. However, it remains challenging to specify the study design for the subsequent integration strategies that can associate sepsis pathophysiology and clinical outcomes. Here, we conducted a time-dependent multi-omics integration (TDMI) in a sepsis-associated liver dysfunction (SALD) model. We successfully deduced the relation of the Toll-like receptor 4 (TLR4) pathway with SALD. Although TLR4 is a critical factor in sepsis progression, it was identified only in the TDMI analysis and not in the single-omics analyses. This finding indicates that the TDMI-based approach is more advantageous than single-omics analyses in terms of exploring the underlying pathophysiological mechanism of SALD. Furthermore, the TDMI-based approach can be an ideal paradigm for insightful biological interpretations of multi-omics datasets that will potentially reveal novel insights into basic biology, health, and diseases, thus allowing the identification of promising candidates for therapeutic strategies.
Affiliation(s)
- Ann-Yae Na
- Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu 41566, Republic of Korea
| | - Hyojin Lee
- Department of Environmental Engineering, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea
| | - Eun Ki Min
- Department of Environmental Engineering, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea
| | - Sanjita Paudel
- Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu 41566, Republic of Korea; BK21 FOUR Community-Based Intelligent Novel Drug Discovery Education Unit, College of Pharmacy, Kyungpook National University, Daegu 41566, Republic of Korea
| | - So Young Choi
- Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu 41566, Republic of Korea; BK21 FOUR Community-Based Intelligent Novel Drug Discovery Education Unit, College of Pharmacy, Kyungpook National University, Daegu 41566, Republic of Korea
| | - HyunChae Sim
- Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu 41566, Republic of Korea; BK21 FOUR Community-Based Intelligent Novel Drug Discovery Education Unit, College of Pharmacy, Kyungpook National University, Daegu 41566, Republic of Korea
| | - Kwang-Hyeon Liu
- Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu 41566, Republic of Korea; BK21 FOUR Community-Based Intelligent Novel Drug Discovery Education Unit, College of Pharmacy, Kyungpook National University, Daegu 41566, Republic of Korea
| | - Ki-Tae Kim
- Department of Environmental Engineering, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea
| | - Jong-Sup Bae
- Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu 41566, Republic of Korea; BK21 FOUR Community-Based Intelligent Novel Drug Discovery Education Unit, College of Pharmacy, Kyungpook National University, Daegu 41566, Republic of Korea
| | - Sangkyu Lee
- Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu 41566, Republic of Korea; BK21 FOUR Community-Based Intelligent Novel Drug Discovery Education Unit, College of Pharmacy, Kyungpook National University, Daegu 41566, Republic of Korea; School of Pharmacy, Sungkyunkwan University, Suwon 16419, Republic of Korea.
28
Fernandez ME, Martinez-Romero J, Aon MA, Bernier M, Price NL, de Cabo R. How is Big Data reshaping preclinical aging research? Lab Anim (NY) 2023; 52:289-314. [PMID: 38017182 DOI: 10.1038/s41684-023-01286-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 10/10/2023] [Indexed: 11/30/2023]
Abstract
The exponential scientific and technological progress during the past 30 years has favored the comprehensive characterization of aging processes with their multivariate nature, leading to the advent of Big Data in preclinical aging research. Spanning from molecular omics to organism-level deep phenotyping, Big Data demands large computational resources for storage and analysis, as well as new analytical tools and conceptual frameworks to gain novel insights leading to discovery. Systems biology has emerged as a paradigm that utilizes Big Data to gain insightful information enabling a better understanding of living organisms, visualized as multilayered networks of interacting molecules, cells, tissues and organs at different spatiotemporal scales. In this framework, where aging, health and disease represent emergent states from an evolving dynamic complex system, context given by, for example, strain, sex and feeding times, becomes paramount for defining the biological trajectory of an organism. Using bioinformatics and artificial intelligence, the systems biology approach is leading to remarkable advances in our understanding of the underlying mechanism of aging biology and assisting in creative experimental study designs in animal models. Future in-depth knowledge acquisition will depend on the ability to fully integrate information from different spatiotemporal scales in organisms, which will probably require the adoption of theories and methods from the field of complex systems. Here we review state-of-the-art approaches in preclinical research, with a focus on rodent models, that are leading to conceptual and/or technical advances in leveraging Big Data to understand basic aging biology and its full translational potential.
Affiliation(s)
- Maria Emilia Fernandez
- Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
| | - Jorge Martinez-Romero
- Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Laboratory of Epidemiology and Population Science, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
| | - Miguel A Aon
- Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Laboratory of Cardiovascular Science, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
| | - Michel Bernier
- Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
| | - Nathan L Price
- Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
| | - Rafael de Cabo
- Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA.
29
Li CX, Chen H, Zounemat-Kermani N, Adcock IM, Sköld CM, Zhou M, Wheelock ÅM. Consensus clustering with missing labels (ccml): a consensus clustering tool for multi-omics integrative prediction in cohorts with unequal sample coverage. Brief Bioinform 2023; 25:bbad501. [PMID: 38205966 PMCID: PMC10782800 DOI: 10.1093/bib/bbad501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 11/14/2023] [Accepted: 12/01/2023] [Indexed: 01/12/2024] Open
Abstract
Multi-omics data integration is a complex and challenging task in biomedical research. Consensus clustering, also known as meta-clustering or cluster ensembles, has become an increasingly popular downstream tool for phenotyping and endotyping using multiple omics and clinical data. However, current consensus clustering methods typically rely on ensembling clustering outputs with similar sample coverages (mathematical replicates), which may not reflect real-world data with varying sample coverages (biological replicates). To address this issue, we propose consensus clustering with missing labels (ccml), an R protocol for two-step consensus clustering that can handle unequal missing labels (i.e. multiple predictive labels with different sample coverages). First, the regular consensus weights are adjusted (normalized) by sample coverage; then a regular consensus clustering is performed to predict the optimal final cluster. We applied the ccml method to predict molecularly distinct groups based on 9-omics integration in the Karolinska COSMIC cohort, which investigates chronic obstructive pulmonary disease, and 24-omics handprint integrative subgrouping of adult asthma patients of the U-BIOPRED cohort. We propose ccml as a downstream toolkit for multi-omics integration analysis algorithms such as Similarity Network Fusion and robust clustering of clinical data to overcome the limitations posed by missing data, which is inevitable in human cohorts consisting of multiple data modalities. The ccml tool is available in the R language (https://CRAN.R-project.org/package=ccml, https://github.com/pulmonomics-lab/ccml, or https://github.com/ZhoulabCPH/ccml).
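The first step described in this abstract, normalizing consensus weights by sample coverage, can be sketched as follows. This is an illustrative Python sketch, not the ccml R package: the function name `coverage_normalized_consensus`, the use of `-1` to mark a missing label, and the toy labelings are assumptions. Each pair's co-clustering count is divided by the number of labelings in which both samples actually received a label.

```python
import numpy as np

def coverage_normalized_consensus(labelings):
    """Build a consensus matrix from label vectors that may have gaps.

    labelings: list of equal-length integer sequences, -1 = missing label
    (an assumed convention). Entry (i, j) is the fraction of labelings that
    put i and j in the same cluster, among labelings covering both samples.
    """
    L = np.asarray(labelings)
    n = L.shape[1]
    together = np.zeros((n, n))            # times i and j shared a cluster
    covered = np.zeros((n, n))             # times both i and j were labelled
    for labels in L:
        present = labels >= 0
        both = np.outer(present, present)
        same = (labels[:, None] == labels[None, :]) & both
        together += same
        covered += both
    # Pairs never observed together keep a consensus weight of 0.
    return np.divide(together, covered,
                     out=np.zeros_like(together), where=covered > 0)

# Toy labelings (hypothetical): 3 clusterings of 4 samples, with gaps.
labelings = [[0, 0, 1, 1],
             [0, 0, -1, 1],
             [-1, 1, 1, 0]]
C = coverage_normalized_consensus(labelings)
```

In this toy case samples 0 and 1 co-cluster in both labelings that cover them, so their weight is 1.0, while samples 2 and 3 co-cluster in one of the two labelings that cover both, giving 0.5. A regular clustering step (for example hierarchical clustering on `1 - C`) would then follow as the second step.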
Affiliation(s)
- Chuan-Xing Li
- Respiratory Medicine Unit, Department of Medicine Solna & Centre for Molecular Medicine, Karolinska Institutet
| | - Hongyan Chen
- School of Biomedical Engineering, Wenzhou Medical University, Wenzhou, China
| | - Nazanin Zounemat-Kermani
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, London, United Kingdom
- Data Science Institute, Imperial College London, London, United Kingdom
| | - Ian M Adcock
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, London, United Kingdom
- Data Science Institute, Imperial College London, London, United Kingdom
| | - C Magnus Sköld
- Respiratory Medicine Unit, Department of Medicine Solna & Centre for Molecular Medicine, Karolinska Institutet
- Department of Respiratory Medicine and Allergy, Karolinska University Hospital Solna, Stockholm, Sweden
| | - Meng Zhou
- School of Biomedical Engineering, Wenzhou Medical University, Wenzhou, China
| | - Åsa M Wheelock
- Respiratory Medicine Unit, Department of Medicine Solna & Centre for Molecular Medicine, Karolinska Institutet
- Department of Respiratory Medicine and Allergy, Karolinska University Hospital Solna, Stockholm, Sweden
30
Guo W, Lv C, Guo M, Zhao Q, Yin X, Zhang L. Innovative applications of artificial intelligence in zoonotic disease management. SCIENCE IN ONE HEALTH 2023; 2:100045. [PMID: 39077042 PMCID: PMC11262289 DOI: 10.1016/j.soh.2023.100045] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/27/2023] [Accepted: 10/22/2023] [Indexed: 07/31/2024]
Abstract
Zoonotic diseases, transmitted between humans and animals, pose a substantial threat to global public health. In recent years, artificial intelligence (AI) has emerged as a transformative tool in the fight against diseases. This comprehensive review discusses the innovative applications of AI in the management of zoonotic diseases, including disease prediction, early diagnosis, drug development, and future prospects. AI-driven predictive models leverage extensive datasets to predict disease outbreaks and transmission patterns, thereby facilitating proactive public health responses. Early diagnosis benefits from AI-powered diagnostic tools that expedite pathogen identification and containment. Furthermore, AI technologies have accelerated drug discovery by identifying potential drug targets and optimizing candidate drugs. This review addresses these advancements, while also examining the promising future of AI in zoonotic disease control. We emphasize the pivotal role of AI in revolutionizing our approach to managing zoonotic diseases and highlight its potential to safeguard the health of both humans and animals on a global scale.
Affiliation(s)
- Wenqiang Guo
- Department of Animal Nutrition and Feed Science, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Chenrui Lv
- Department of Animal Nutrition and Feed Science, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Meng Guo
- College of Veterinary Medicine, Henan Agricultural University, Zhengzhou 450046, China
| | - Qiwei Zhao
- Department of Animal Nutrition and Feed Science, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Xinyi Yin
- Department of Animal Nutrition and Feed Science, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Li Zhang
- Department of Animal Nutrition and Feed Science, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
31
Ranjbari S, Arslanturk S. Integration of incomplete multi-omics data using Knowledge Distillation and Supervised Variational Autoencoders for disease progression prediction. J Biomed Inform 2023; 147:104512. [PMID: 37813325 DOI: 10.1016/j.jbi.2023.104512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 08/31/2023] [Accepted: 10/03/2023] [Indexed: 10/11/2023]
Abstract
OBJECTIVE The rapid advancement of high-throughput technologies in the biomedical field has resulted in the accumulation of diverse omics data types, such as mRNA expression, DNA methylation, and microRNA expression, for studying various diseases. Integrating these multi-omics datasets enables a comprehensive understanding of the molecular basis of cancer and facilitates accurate prediction of disease progression. METHODS However, conventional approaches face challenges due to the dimensionality curse problem. This paper introduces a novel framework called Knowledge Distillation and Supervised Variational AutoEncoders utilizing View Correlation Discovery Network (KD-SVAE-VCDN) to address the integration of high-dimensional multi-omics data with limited common samples. Through our experimental evaluation, we demonstrate that the proposed KD-SVAE-VCDN architecture accurately predicts the progression of breast and kidney carcinoma by effectively classifying patients as long- or short-term survivors. Furthermore, our approach outperforms other state-of-the-art multi-omics integration models. RESULTS Our findings highlight the efficacy of the KD-SVAE-VCDN architecture in predicting the disease progression of breast and kidney carcinoma. By enabling the classification of patients based on survival outcomes, our model contributes to personalized and targeted treatments. The favorable performance of our approach in comparison to several existing models suggests its potential to contribute to the advancement of cancer understanding and management. CONCLUSION The development of a robust predictive model capable of accurately forecasting disease progression at the time of diagnosis holds immense promise for advancing personalized medicine. By leveraging multi-omics data integration, our proposed KD-SVAE-VCDN framework offers an effective solution to this challenge, paving the way for more precise and tailored treatment strategies for patients with different types of cancer.
Affiliation(s)
- Sima Ranjbari
- Department of Computer Science, Wayne State University, Detroit, 48202, MI, USA.
| | - Suzan Arslanturk
- Department of Computer Science, Wayne State University, Detroit, 48202, MI, USA.
|
32
|
Hai Y, Ma J, Yang K, Wen Y. Bayesian linear mixed model with multiple random effects for prediction analysis on high-dimensional multi-omics data. Bioinformatics 2023; 39:btad647. [PMID: 37882747 PMCID: PMC10627352 DOI: 10.1093/bioinformatics/btad647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 09/24/2023] [Accepted: 10/24/2023] [Indexed: 10/27/2023] Open
Abstract
MOTIVATION Accurate disease risk prediction is an essential step in the modern quest for precision medicine. While high-dimensional multi-omics data have provided unprecedented data resources for prediction studies, their high dimensionality and complex inter/intra-relationships have posed significant analytical challenges. RESULTS We proposed a two-step Bayesian linear mixed model framework (TBLMM) for risk prediction analysis on multi-omics data. TBLMM models the predictive effects from multi-omics data using a hybrid of sparsity regression and a linear mixed model with multiple random effects. It can resemble the shape of the true effect size distributions and accounts for non-linear effects, including interaction effects, among multi-omics data via kernel fusion. It infers its parameters via a computationally efficient variational Bayes algorithm. Through extensive simulation studies and prediction analyses of positron emission tomography imaging outcomes using data obtained from the Alzheimer's Disease Neuroimaging Initiative, we have demonstrated that TBLMM can consistently outperform the existing method in predicting the risk of complex traits. AVAILABILITY AND IMPLEMENTATION The corresponding R package is available on GitHub (https://github.com/YaluWen/TBLMM).
Affiliation(s)
- Yang Hai
- Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
- Department of Statistics, University of Auckland, Auckland 1010, New Zealand
| | - Jixiang Ma
- Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
| | - Kaixin Yang
- Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
| | - Yalu Wen
- Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
- Department of Statistics, University of Auckland, Auckland 1010, New Zealand
|
33
|
Fiocchi C. Omics and Multi-Omics in IBD: No Integration, No Breakthroughs. Int J Mol Sci 2023; 24:14912. [PMID: 37834360 PMCID: PMC10573814 DOI: 10.3390/ijms241914912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 09/27/2023] [Accepted: 10/02/2023] [Indexed: 10/15/2023] Open
Abstract
The recent advent of sophisticated technologies like sequencing and mass spectroscopy platforms combined with artificial intelligence-powered analytic tools has initiated a new era of "big data" research in various complex diseases of still-undetermined cause and mechanisms. The investigation of these diseases was, until recently, limited to traditional in vitro and in vivo biological experimentation, but a clear switch to in silico methodologies is now under way. This review tries to provide a comprehensive assessment of state-of-the-art knowledge on omes, omics and multi-omics in inflammatory bowel disease (IBD). The notion and importance of omes, omics and multi-omics in both health and complex diseases like IBD is introduced, followed by a discussion of the various omics believed to be relevant to IBD pathogenesis, and how multi-omics "big data" can generate new insights translatable into useful clinical tools in IBD such as biomarker identification, prediction of remission and relapse, response to therapy, and precision medicine. The pitfalls and limitations of current IBD multi-omics studies are critically analyzed, revealing that, regardless of the types of omes being analyzed, the majority of current reports are still based on simple associations of descriptive retrospective data from cross-sectional patient cohorts rather than more powerful longitudinally collected prospective datasets. Given this limitation, some suggestions are provided on how IBD multi-omics data may be optimized for greater clinical and therapeutic benefit. The review concludes by forecasting the upcoming incorporation of multi-omics analyses in the routine management of IBD.
Affiliation(s)
- Claudio Fiocchi
- Department of Inflammation & Immunity, Lerner Research Institute, Cleveland, OH 44195, USA;
- Department of Gastroenterology, Hepatology and Nutrition, Digestive Disease and Surgery Institute, Cleveland Clinic, Cleveland, OH 44195, USA
|
34
|
Way GP, Sailem H, Shave S, Kasprowicz R, Carragher NO. Evolution and impact of high content imaging. SLAS DISCOVERY : ADVANCING LIFE SCIENCES R & D 2023; 28:292-305. [PMID: 37666456 DOI: 10.1016/j.slasd.2023.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 08/09/2023] [Accepted: 08/29/2023] [Indexed: 09/06/2023]
Abstract
The field of high content imaging has steadily evolved and expanded substantially across many industry and academic research institutions since it was first described in the early 1990s. High content imaging refers to the automated acquisition and analysis of microscopic images from a variety of biological sample types. Integration of high content imaging microscopes with multiwell plate handling robotics enables high content imaging to be performed at scale and to support medium- to high-throughput screening of pharmacological, genetic and diverse environmental perturbations upon complex biological systems ranging from 2D cell cultures to 3D tissue organoids to small model organisms. In this perspective article the authors provide a collective view on the following key discussion points relevant to the evolution of high content imaging:
- Evolution and impact of high content imaging: An academic perspective
- Evolution and impact of high content imaging: An industry perspective
- Evolution of high content image analysis
- Evolution of high content data analysis pipelines towards multiparametric and phenotypic profiling applications
- The role of data integration and multiomics
- The role and evolution of image data repositories and sharing standards
- Future perspective of high content imaging hardware and software.
Affiliation(s)
- Gregory P Way
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Heba Sailem
- School of Cancer and Pharmaceutical Sciences, King's College London, UK
| | - Steven Shave
- GlaxoSmithKline Medicines Research Centre, Gunnels Wood Rd, Stevenage SG1 2NY, UK; Edinburgh Cancer Research, Cancer Research UK Scotland Centre, Institute of Genetics and Cancer, University of Edinburgh, UK
| | - Richard Kasprowicz
- GlaxoSmithKline Medicines Research Centre, Gunnels Wood Rd, Stevenage SG1 2NY, UK
| | - Neil O Carragher
- Edinburgh Cancer Research, Cancer Research UK Scotland Centre, Institute of Genetics and Cancer, University of Edinburgh, UK.
|
35
|
Ye X, Shang Y, Shi T, Zhang W, Sakurai T. Multi-omics clustering for cancer subtyping based on latent subspace learning. Comput Biol Med 2023; 164:107223. [PMID: 37490833 DOI: 10.1016/j.compbiomed.2023.107223] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/30/2023] [Revised: 06/07/2023] [Accepted: 06/30/2023] [Indexed: 07/27/2023]
Abstract
The increased availability of high-throughput technologies has enabled biomedical researchers to learn about disease etiology across multiple omics layers, which shows promise for improving cancer subtype identification. Many computational methods have been developed to perform clustering on multi-omics data; however, only a few of them are applicable to partial multi-omics data, in which some samples lack data for some omics types. In this study, we propose a novel multi-omics clustering method based on latent subspace learning (MCLS), which can handle missing omics data during clustering. We utilize the data with complete omics to construct a latent subspace using PCA-based feature extraction and singular value decomposition (SVD). The data with incomplete multi-omics are then projected onto the latent subspace, and spectral clustering is performed to find the clusters. The proposed MCLS method is evaluated on seven different cancer datasets with three levels of omics, in both full and partial cases, against several state-of-the-art methods. The experimental results show that the proposed MCLS method is more efficient and effective than the compared methods for cancer subtype identification in multi-omics data analysis, which provides an important reference for a comprehensive understanding of cancer and biological mechanisms. AVAILABILITY: The proposed method is freely accessible at https://github.com/ShangCS/MCLS.
Affiliation(s)
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan; Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan.
| | - Yifan Shang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
| | - Tianyi Shi
- Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
| | - Weihang Zhang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
| | - Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan; Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
|
36
|
Ouyang D, Liang Y, Li L, Ai N, Lu S, Yu M, Liu X, Xie S. Integration of multi-omics data using adaptive graph learning and attention mechanism for patient classification and biomarker identification. Comput Biol Med 2023; 164:107303. [PMID: 37586201 DOI: 10.1016/j.compbiomed.2023.107303] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/16/2023] [Revised: 07/08/2023] [Accepted: 07/28/2023] [Indexed: 08/18/2023]
Abstract
With the rapid development of high-throughput sequencing technology and the accumulation of omics data, many studies have sought a more comprehensive understanding of human diseases from a multi-omics perspective. Meanwhile, graph-based methods have been widely used to process multi-omics data due to their powerful expressive ability. However, most existing graph-based methods utilize fixed graphs to learn sample embedding representations, which often leads to sub-optimal results. Furthermore, treating the embedding representations of different omics equally usually fails to yield well-balanced integrated information. In addition, the complex correlation between omics is not fully taken into account. To this end, we propose an end-to-end interpretable multi-omics integration method, named MOGLAM, for disease classification prediction. First, a dynamic graph convolutional network with feature selection is utilized to obtain higher quality omic-specific embedding information by adaptively learning the graph structure, and to discover important biomarkers. Then, a multi-omics attention mechanism is applied to adaptively weight the embedding representations of different omics, thereby obtaining more reasonable integrated information. Finally, we propose omic-integrated representation learning to capture complex common and complementary information between omics while performing multi-omics integration. Experimental results on three datasets show that MOGLAM outperforms other state-of-the-art multi-omics integration methods. Moreover, MOGLAM can identify important biomarkers from different omics data types in an end-to-end manner.
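The attention step described in this abstract, weighting each omics embedding before combining them, can be illustrated with a generic softmax-weighted sum. This is a minimal sketch and not the MOGLAM implementation: the function names, the toy embeddings, and the hand-set scores are all hypothetical, and in a trained model the scores would be learned rather than supplied.

```python
import math


def softmax(scores):
    """Turn arbitrary real-valued scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(embeddings, scores):
    """Attention-style fusion: a softmax-weighted sum of per-omics embeddings."""
    weights = softmax(scores)
    dim = len(embeddings[0])
    return [sum(w * emb[k] for w, emb in zip(weights, embeddings)) for k in range(dim)]


# Hypothetical 2-dimensional embeddings of one patient from three omics types.
mrna = [1.0, 0.0]
methylation = [0.0, 1.0]
mirna = [1.0, 1.0]

# Equal scores reduce to a plain average of the three embeddings.
fused_equal = fuse([mrna, methylation, mirna], [0.0, 0.0, 0.0])

# A much larger score for mRNA makes the fused vector follow the mRNA embedding.
fused_mrna = fuse([mrna, methylation, mirna], [10.0, 0.0, 0.0])
```

With equal scores every omics type contributes one third; raising one score shifts the fused representation towards that omics type, which is the behavior the abstract contrasts with treating all omics equally.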
Affiliation(s)
- Dong Ouyang
- Peng Cheng Laboratory, Shenzhen, 518055, China; School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, 999078, Macao Special Administrative Region of China
| | - Yong Liang
- Peng Cheng Laboratory, Shenzhen, 518055, China.
| | - Le Li
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, 999078, Macao Special Administrative Region of China
| | - Ning Ai
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, 999078, Macao Special Administrative Region of China
| | - Shanghui Lu
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, 999078, Macao Special Administrative Region of China
| | - Mingkun Yu
- School of Computer Science and Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, 999078, Macao Special Administrative Region of China
| | - Xiaoying Liu
- Computer Engineering Technical College, Guangdong Polytechnic of Science and Technology, Zhuhai, 519090, China
| | - Shengli Xie
- Guangdong-HongKong-Macao Joint Laboratory for Smart Discrete Manufacturing, Guangzhou, 510000, China
|
37
|
Chen Y, Wen Y, Xie C, Chen X, He S, Bo X, Zhang Z. MOCSS: Multi-omics data clustering and cancer subtyping via shared and specific representation learning. iScience 2023; 26:107378. [PMID: 37559907 PMCID: PMC10407241 DOI: 10.1016/j.isci.2023.107378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 05/23/2023] [Accepted: 07/07/2023] [Indexed: 08/11/2023] Open
Abstract
Cancer is an extremely complex disease and each type of cancer usually has several different subtypes. Multi-omics data can provide more comprehensive biological information for identifying and discovering cancer subtypes. However, existing unsupervised cancer subtyping methods cannot effectively learn comprehensive shared and specific information of multi-omics data. Therefore, a novel method is proposed based on shared and specific representation learning. For each omics data, two autoencoders are applied to extract shared and specific information, respectively. To reduce redundancy and mutual interference, orthogonality constraint is introduced to separate shared and specific information. In addition, contrastive learning is applied to align the shared information and strengthen their consistency. Finally, the obtained shared and specific information for all samples are used for clustering tasks to achieve cancer subtyping. Experimental results demonstrate that the proposed method can effectively capture shared and specific information of multi-omics data and outperform other state-of-the-art methods on cancer subtyping.
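The abstract does not give the form of its orthogonality constraint. One common way to express such a constraint, shown here purely as an assumption, is to penalize the squared Frobenius norm of the cross-product between the shared and specific representation matrices. The function name and toy matrices below are hypothetical.

```python
def orthogonality_penalty(shared, specific):
    """Squared Frobenius norm of shared^T @ specific.

    Both arguments are lists of rows, one row per sample. The penalty is
    zero when every shared dimension is orthogonal to every specific one.
    """
    n_shared = len(shared[0])
    n_specific = len(specific[0])
    penalty = 0.0
    for a in range(n_shared):
        for b in range(n_specific):
            # Inner product across samples of shared column a and specific column b.
            dot = sum(s[a] * p[b] for s, p in zip(shared, specific))
            penalty += dot * dot
    return penalty


# Two samples, one latent dimension per representation (toy values).
disjoint = orthogonality_penalty([[1.0], [0.0]], [[0.0], [1.0]])     # no overlap
overlapping = orthogonality_penalty([[1.0], [0.0]], [[1.0], [0.0]])  # identical
```

Adding a term like this to a training loss pushes the two encoders to capture different information, which matches the stated goal of reducing redundancy and mutual interference.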
Affiliation(s)
- Yuxin Chen
- School of Informatics, Xiamen University, Xiamen 361005, China
| | - Yuqi Wen
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Chenyang Xie
- School of Informatics, Xiamen University, Xiamen 361005, China
| | - Xinjian Chen
- School of Informatics, Xiamen University, Xiamen 361005, China
| | - Song He
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Xiaochen Bo
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Zhongnan Zhang
- School of Informatics, Xiamen University, Xiamen 361005, China
|
38
|
Gygi JP, Kleinstein SH, Guan L. Predictive overfitting in immunological applications: Pitfalls and solutions. Hum Vaccin Immunother 2023; 19:2251830. [PMID: 37697867 PMCID: PMC10498807 DOI: 10.1080/21645515.2023.2251830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Revised: 07/27/2023] [Accepted: 08/21/2023] [Indexed: 09/13/2023] Open
Abstract
Overfitting describes the phenomenon where a highly predictive model on the training data generalizes poorly to future observations. It is a common concern when applying machine learning techniques to contemporary medical applications, such as predicting vaccination response and disease status in infectious disease or cancer studies. This review examines the causes of overfitting and offers strategies to counteract it, focusing on model complexity reduction, reliable model evaluation, and harnessing data diversity. Through discussion of the underlying mathematical models and illustrative examples using both synthetic data and published real datasets, our objective is to equip analysts and bioinformaticians with the knowledge and tools necessary to detect and mitigate overfitting in their research.
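The definition of overfitting given above can be made concrete with a small synthetic example, which is not taken from the review itself. Here a 1-nearest-neighbour classifier memorizes labels that are pure coin flips: it scores perfectly on its own training data yet does no better than chance on held-out data. All names and sizes are illustrative assumptions.

```python
import random


def nearest_label(train_x, train_y, x):
    """Predict the label of the closest training point (1-nearest neighbour)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def accuracy(train_x, train_y, xs, ys):
    """Fraction of points in (xs, ys) that the memorizing classifier gets right."""
    hits = sum(nearest_label(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


rng = random.Random(0)

# Labels are coin flips, so the single feature carries no real signal.
train_x = [rng.random() for _ in range(100)]
train_y = [rng.randint(0, 1) for _ in range(100)]
test_x = [rng.random() for _ in range(200)]
test_y = [rng.randint(0, 1) for _ in range(200)]

train_acc = accuracy(train_x, train_y, train_x, train_y)  # evaluated on seen data
test_acc = accuracy(train_x, train_y, test_x, test_y)     # evaluated on unseen data
```

The gap between `train_acc` and `test_acc` is the quantity that reliable model evaluation is meant to expose; reporting only the training figure would suggest a strong predictor where none exists.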
Affiliation(s)
- Jeremy P. Gygi
- Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
| | - Steven H. Kleinstein
- Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
- Department of Pathology, Yale School of Medicine, New Haven, CT, USA
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
| | - Leying Guan
- Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
|
39
|
Erdem C, Gross SM, Heiser LM, Birtwistle MR. MOBILE pipeline enables identification of context-specific networks and regulatory mechanisms. Nat Commun 2023; 14:3991. [PMID: 37414767 PMCID: PMC10326020 DOI: 10.1038/s41467-023-39729-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Accepted: 06/27/2023] [Indexed: 07/08/2023] Open
Abstract
Robust identification of context-specific network features that control cellular phenotypes remains a challenge. We here introduce MOBILE (Multi-Omics Binary Integration via Lasso Ensembles) to nominate molecular features associated with cellular phenotypes and pathways. First, we use MOBILE to nominate mechanisms of interferon-γ (IFNγ) regulated PD-L1 expression. Our analyses suggest that IFNγ-controlled PD-L1 expression involves BST2, CLIC2, FAM83D, ACSL5, and HIST2H2AA3 genes, which were supported by prior literature. We also compare networks activated by related family members transforming growth factor-beta 1 (TGFβ1) and bone morphogenetic protein 2 (BMP2) and find that differences in ligand-induced changes in cell size and clustering properties are related to differences in laminin/collagen pathway activity. Finally, we demonstrate the broad applicability and adaptability of MOBILE by analyzing publicly available molecular datasets to investigate breast cancer subtype specific networks. Given the ever-growing availability of multi-omics datasets, we envision that MOBILE will be broadly useful for identification of context-specific molecular features and pathways.
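The abstract does not describe how MOBILE builds its Lasso ensembles, so the following is only a generic sketch of the underlying idea: fit many Lasso models on resampled data and keep the features that receive a non-zero coefficient in most of them. The coordinate-descent solver, the bootstrap resampling, the penalty value, and the synthetic data are all assumptions made for illustration.

```python
import random


def soft_threshold(value, lam):
    """Shrink value towards zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Lasso regression (no intercept) fitted by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(c * (r + beta[j] * c) for c, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                beta[j] = new_beta
    return beta


def selection_frequency(X, y, lam, n_models=20, seed=0):
    """Fraction of bootstrap Lasso fits in which each feature is non-zero."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_models for c in counts]


# Synthetic data: only feature 0 drives the response; features 1-4 are noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [3.0 * row[0] + data_rng.gauss(0, 0.5) for row in X]

freq = selection_frequency(X, y, lam=0.5)
```

Features whose selection frequency stays high across the ensemble are the stable candidates; a single Lasso fit can include or drop a feature because of sampling noise, which is the usual motivation for aggregating over many fits.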
Affiliation(s)
- Cemal Erdem
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA
| | - Sean M Gross
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
| | - Laura M Heiser
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA.
| | - Marc R Birtwistle
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA.
- Department of Bioengineering, Clemson University, Clemson, SC, USA.
|
40
|
Mahdi-Esferizi R, Haji Molla Hoseyni B, Mehrpanah A, Golzade Y, Najafi A, Elahian F, Zadeh Shirazi A, Gomez GA, Tahmasebian S. DeeP4med: deep learning for P4 medicine to predict normal and cancer transcriptome in multiple human tissues. BMC Bioinformatics 2023; 24:275. [PMID: 37403016 DOI: 10.1186/s12859-023-05400-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2023] [Accepted: 06/25/2023] [Indexed: 07/06/2023] Open
Abstract
BACKGROUND P4 medicine (predict, prevent, personalize, and participate) is a new approach to diagnosing and predicting diseases on a patient-by-patient basis. For the prevention and treatment of diseases, prediction plays a fundamental role. One of the intelligent strategies is the design of deep learning models that can predict the state of the disease using gene expression data. RESULTS We create an autoencoder deep learning model called DeeP4med, including a Classifier and a Transferor, that predicts a cancer's gene expression (mRNA) matrix from its matched normal sample and vice versa. Depending on tissue type, the F1 score of the model ranges from 0.935 to 0.999 in the Classifier and from 0.944 to 0.999 in the Transferor. The accuracy of DeeP4med for tissue and disease classification was 0.986 and 0.992, respectively, which was better than that of seven classic machine learning models (Support Vector Classifier, Logistic Regression, Linear Discriminant Analysis, Naive Bayes, Decision Tree, Random Forest, K Nearest Neighbors). CONCLUSIONS Based on the idea of DeeP4med, by having the gene expression matrix of a normal tissue, we can predict its tumor gene expression matrix and, in this way, find effective genes in transforming a normal tissue into a tumor tissue. Results of Differentially Expressed Genes (DEGs) and enrichment analysis on the predicted matrices for 13 types of cancer showed a good correlation with the literature and biological databases. This suggests that, once trained on the gene expression features of each person in the normal and cancer states, the model could predict diagnosis from the gene expression data of healthy tissue and be used to identify possible therapeutic interventions for those patients.
Affiliation(s)
- Roohallah Mahdi-Esferizi
- Department of Medical Biotechnology, School of Advanced Technologies, Shahrekord University of Medical Sciences, Shahrekord, Iran
| | | | - Amir Mehrpanah
- Faculty of Mathematics, Shahid Beheshti University, Tehran, Iran
| | - Yazdan Golzade
- Department of Mathematics, Faculty of Basic Sciences, Iran University of Science and Technology,(IUST), Tehran, Iran
| | - Ali Najafi
- Molecular Biology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - Fatemeh Elahian
- Department of Medical Biotechnology, School of Advanced Technologies, Shahrekord University of Medical Sciences, Shahrekord, Iran
| | - Amin Zadeh Shirazi
- Centre for Cancer Biology, SA Pathology and University of South Australia, Adelaide, SA, 5000, Australia
| | - Guillermo A Gomez
- Centre for Cancer Biology, SA Pathology and University of South Australia, Adelaide, SA, 5000, Australia
| | - Shahram Tahmasebian
- Cellular and Molecular Research Center, Basic Health Sciences Institute, Shahrekord University of Medical Sciences, Shahrekord, Iran.
|
41
|
Chicco D, Cumbo F, Angione C. Ten quick tips for avoiding pitfalls in multi-omics data integration analyses. PLoS Comput Biol 2023; 19:e1011224. [PMID: 37410704 DOI: 10.1371/journal.pcbi.1011224] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/08/2023] Open
Abstract
Data are the most important elements of bioinformatics: Computational analysis of bioinformatics data, in fact, can help researchers infer new knowledge about biology, chemistry, biophysics, and sometimes even medicine, influencing treatments and therapies for patients. Bioinformatics and high-throughput biological data coming from different sources can even be more helpful, because each of these different data chunks can provide alternative, complementary information about a specific biological phenomenon, similar to multiple photos of the same subject taken from different angles. In this context, the integration of bioinformatics and high-throughput biological data gets a pivotal role in running a successful bioinformatics study. In the last decades, data originating from proteomics, metabolomics, metagenomics, phenomics, transcriptomics, and epigenomics have been labelled -omics data, as a unique name to refer to them, and the integration of these omics data has gained importance in all biological areas. Even if this omics data integration is useful and relevant, due to its heterogeneity, it is not uncommon to make mistakes during the integration phases. We therefore decided to present these ten quick tips to perform an omics data integration correctly, avoiding common mistakes we experienced or noticed in published studies in the past. Even if we designed our ten guidelines for beginners, by using a simple language that (we hope) can be understood by anyone, we believe our ten recommendations should be taken into account by all the bioinformaticians performing omics data integration, including experts.
Affiliation(s)
- Davide Chicco
- Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
| | - Fabio Cumbo
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
| | - Claudio Angione
- School of Computing Engineering and Digital Technologies, Teesside University, Middlesbrough, United Kingdom
|
42
|
Salimy S, Lanjanian H, Abbasi K, Salimi M, Najafi A, Tapak L, Masoudi-Nejad A. A deep learning-based framework for predicting survival-associated groups in colon cancer by integrating multi-omics and clinical data. Heliyon 2023; 9:e17653. [PMID: 37455955 PMCID: PMC10344710 DOI: 10.1016/j.heliyon.2023.e17653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 05/30/2023] [Accepted: 06/25/2023] [Indexed: 07/18/2023] Open
Abstract
Precise prognostic classification of patients and identification of survival subgroups and their associated genes can be important clinical references when designing treatment strategies for cancer patients. Multi-omics and data integration techniques are powerful tools to achieve this goal. This study aimed to introduce a machine learning method to integrate three types of biological data, and to investigate the performance of two other methods, in identifying the survival dependency of patients. The data included TCGA RNA-seq gene expression, DNA methylation, and clinical data from 368 patients with colon cancer; an independent external validation data set containing 232 samples was also used. Three methods, namely hyper-parameter optimized autoencoders (HPOAE), a normal autoencoder, and penalized principal component analysis (PPCA), were used for simultaneous data integration and estimation under a Cox hazards model. The HPOAE was thought to outperform other methods. The HPOAE had a Log Rank Mantel-Cox value of 14.27 ± 2 and a Breslow-Generalized Wilcoxon value of 13.13 ± 1. Ten miRNAs, 11 methylated genes, and 28 mRNAs were identified, all with an importance of marginal cutoff > 0.95. The study demonstrated that hsa-miR-485-5p targets both ZMYM1 and TP53, the latter of which has been previously associated with cancer in numerous studies. Furthermore, compared to other methods, the HPOAE exhibited a greater capacity for identifying survival subgroups and the genes associated with them in patients with colon cancer. However, all of the results were obtained by computational methods, and clinical and experimental studies are needed to validate these results.
Affiliation(s)
- Siamak Salimy
- Laboratory of System Biology and Bioinformatics (LBB), Department of Bioinformatics, University of Tehran, Kish International Campus, Kish, Iran
| | - Hossein Lanjanian
- Cellular and Molecular Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Karim Abbasi
- Laboratory of System Biology, Bioinformatics & Artificial Intelligent in Medicine (LBBai), Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, Iran
| | - Mahdieh Salimi
- Department of Medical Genetics, Institute of Medical Biotechnology, National Institute of Genetic Engineering and Biotechnology (NIGEB), Tehran, Iran
| | - Ali Najafi
- Molecular Biology Research Center, Systems Biology and Poisonings Institute, Tehran, Iran
| | - Leili Tapak
- Department of Biostatistics, School of Public Health and Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Ali Masoudi-Nejad
- Laboratory of System Biology and Bioinformatics (LBB), Department of Bioinformatics, University of Tehran, Kish International Campus, Kish, Iran
|
43
|
Chatterjee B, Thakur SS. Proteins and metabolites fingerprints of gestational diabetes mellitus forming protein-metabolite interactomes are its potential biomarkers. Proteomics 2023; 23:e2200257. [PMID: 36919629 DOI: 10.1002/pmic.202200257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2022] [Revised: 03/04/2023] [Accepted: 03/06/2023] [Indexed: 03/16/2023]
Abstract
Gestational diabetes mellitus (GDM) is a consequence of glucose intolerance with inadequate production of insulin that occurs during pregnancy and leads to adverse health consequences for both mother and fetus. GDM patients are at higher risk of preeclampsia and of developing type 2 diabetes mellitus in later life, while children born to GDM mothers are more prone to macrosomia and hypoglycemia. Universally accepted diagnostic criteria for GDM are lacking; therefore, there is a need for a diagnostic approach that can identify GDM at an early stage (first trimester). We have reviewed the literature on protein and metabolite fingerprints of GDM. Further, we have performed protein-protein, metabolite-metabolite, and protein-metabolite interaction network studies on GDM protein and metabolite fingerprints. Notably, some protein and metabolite fingerprints form strong interaction networks at high confidence scores. Therefore, we suggest that those proteins and metabolites that form protein-metabolite interactomes are potential biomarkers of GDM. The protein-metabolite biomarker interactome may help in a deeper understanding of the prognosis and pathogenesis of GDM, and also in its detection. The protein-metabolite interactome may be further applied in planning future therapeutic strategies to promote long-term health benefits in GDM mothers and their children.
Affiliation(s)
- Bhaswati Chatterjee
- National Institute of Pharmaceutical Education and Research, Hyderabad, India
- National Institute of Animal Biotechnology (NIAB), Hyderabad, India
- Suman S Thakur
- Centre for Cellular and Molecular Biology, Hyderabad, India
|
44
|
Demir Karaman E, Işık Z. Multi-Omics Data Analysis Identifies Prognostic Biomarkers across Cancers. Med Sci (Basel) 2023; 11:44. [PMID: 37489460 PMCID: PMC10366886 DOI: 10.3390/medsci11030044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 06/18/2023] [Accepted: 06/20/2023] [Indexed: 07/26/2023] Open
Abstract
Combining omics data from different layers using integrative methods provides a better understanding of the biology of a complex disease such as cancer. The discovery of biomarkers related to cancer development or prognosis helps to find more effective treatment options. This study integrates multi-omics data of different cancer types with a network-based approach to explore common gene modules among different tumors by running community detection methods on the integrated network. The common modules were evaluated by several biological metrics adapted to cancer. Then, a new prognostic scoring method was developed by weighting the mRNA expression, methylation, and mutation status of genes. The survival analysis yielded statistically significant results for the GNG11, CBX2, CDKN3, ARHGEF10, CLN8, SEC61G and PTDSS1 genes. The literature search reveals that the identified biomarkers are associated with the same or different types of cancers. Our method not only identifies known cancer-specific biomarker genes but also proposes new potential biomarkers. Thus, this study provides a rationale for identifying new gene targets and expanding treatment options across cancer types.
Affiliation(s)
- Ezgi Demir Karaman
- Department of Computer Engineering, Institute of Natural and Applied Sciences, Dokuz Eylul University, Izmir 35390, Turkey
- Zerrin Işık
- Department of Computer Engineering, Faculty of Engineering, Dokuz Eylul University, Izmir 35390, Turkey
|
45
|
Kwoji ID, Aiyegoro OA, Okpeku M, Adeleke MA. 'Multi-omics' data integration: applications in probiotics studies. NPJ Sci Food 2023; 7:25. [PMID: 37277356 DOI: 10.1038/s41538-023-00199-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/12/2022] [Accepted: 05/22/2023] [Indexed: 06/07/2023] Open
Abstract
The concept of probiotics is receiving increasing attention because of the benefits of probiotics in influencing the host microbiome and modulating host immunity through the strengthening of the gut barrier and stimulation of antibodies. These benefits, combined with the need for improved nutraceuticals, have resulted in the extensive characterization of probiotics, leading to an outburst of data generated using several 'omics' technologies. Recent developments in systems biology approaches to microbial science are paving the way for integrating data generated from different omics techniques to understand the flow of molecular information from one 'omics' level to another, with clear information on regulatory features and phenotypes. The limitations of a 'single omics' application, and its tendency to ignore the influence of other molecular processes, justify the need for 'multi-omics' applications in selecting probiotics and understanding their action on the host. Different omics techniques, including genomics, transcriptomics, proteomics, metabolomics and lipidomics, used for studying probiotics and their influence on the host and the microbiome are discussed in this review. Furthermore, the rationale for 'multi-omics' and the multi-omics data integration platforms supporting probiotics and microbiome analyses are also elucidated. This review shows that multi-omics application is useful in selecting probiotics and understanding their functions in the host microbiome. Hence, we recommend a multi-omics approach for holistically understanding probiotics and the microbiome.
Affiliation(s)
- Iliya Dauda Kwoji
- Discipline of Genetics, School of Life Sciences, College of Agriculture, Engineering and Sciences, University of KwaZulu-Natal, 4090, Durban, South Africa
- Olayinka Ayobami Aiyegoro
- Unit for Environmental Sciences and Management, North-West University, Potchefstroom, Northwest, South Africa
- Moses Okpeku
- Discipline of Genetics, School of Life Sciences, College of Agriculture, Engineering and Sciences, University of KwaZulu-Natal, 4090, Durban, South Africa
- Matthew Adekunle Adeleke
- Discipline of Genetics, School of Life Sciences, College of Agriculture, Engineering and Sciences, University of KwaZulu-Natal, 4090, Durban, South Africa.
|
46
|
Devonshire A, Gautam Y, Johansson E, Mersha TB. Multi-omics profiling approach in food allergy. World Allergy Organ J 2023; 16:100777. [PMID: 37214173 PMCID: PMC10199264 DOI: 10.1016/j.waojou.2023.100777] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/30/2022] [Revised: 04/05/2023] [Accepted: 04/05/2023] [Indexed: 05/24/2023] Open
Abstract
The prevalence of food allergy (FA) among children is increasing, affecting nearly 8% of children, and FA is the most common cause of anaphylaxis and anaphylaxis-related emergency department visits in children. Importantly, FA is a complex, multi-system, multifactorial disease mediated by food-specific immunoglobulin E (IgE) and type 2 immune responses and involving environmental and genetic factors and gene-environment interactions. Early exposure to external and internal environmental factors largely influences the development of immune responses to allergens. Genetic factors and gene-environment interactions have established roles in the FA pathophysiology. To improve diagnosis and identification of FA therapeutic targets, high-throughput omics approaches have emerged and been applied over the past decades to screen for potential FA biomarkers, such as genes, transcripts, proteins, and metabolites. In this article, we provide an overview of the current status of FA omics studies, namely genomic, transcriptomic, epigenomic, proteomic, exposomic, and metabolomic. The current development of multi-omics integration of FA studies is also briefly discussed. As individual omics technologies only provide limited information on the multi-system biological processes of FA, integration of population-based multi-omics data and clinical data may lead to robust biomarker discovery that could translate into advances in disease management and clinical care and ultimately lead to precision medicine approaches.
Affiliation(s)
- Ashley Devonshire
- Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Yadu Gautam
- Division of Asthma Research, Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Elisabet Johansson
- Division of Asthma Research, Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Tesfaye B. Mersha
- Division of Asthma Research, Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
|
47
|
Choi JM, Chae H. moBRCA-net: a breast cancer subtype classification framework based on multi-omics attention neural networks. BMC Bioinformatics 2023; 24:169. [PMID: 37101124 PMCID: PMC10131354 DOI: 10.1186/s12859-023-05273-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 04/05/2023] [Indexed: 04/28/2023] Open
Abstract
BACKGROUND Breast cancer is a highly heterogeneous disease that comprises multiple biological components. Owing to its diversity, patients have different prognostic outcomes; hence, early diagnosis and accurate subtype prediction are critical for treatment. Standardized breast cancer subtyping systems, mainly based on single-omics datasets, have been developed to ensure proper treatment in a systematic manner. Recently, multi-omics data integration has attracted attention as a way to provide a comprehensive view of patients, but it poses a challenge due to the high dimensionality. In recent years, deep learning-based approaches have been proposed, but they still present several limitations. RESULTS In this study, we describe moBRCA-net, an interpretable deep learning-based breast cancer subtype classification framework that uses multi-omics datasets. Three omics datasets comprising gene expression, DNA methylation and microRNA expression data were integrated while considering the biological relationships among them, and a self-attention module was applied to each omics dataset to capture the relative importance of each feature. The features were then transformed to new representations considering the respective learned importance, allowing moBRCA-net to predict the subtype. CONCLUSIONS Experimental results confirmed that moBRCA-net has a significantly enhanced performance compared with other methods, and the effectiveness of multi-omics integration and omics-level attention was identified. moBRCA-net is publicly available at https://github.com/cbi-bioinfo/moBRCA-net.
Affiliation(s)
- Joung Min Choi
- Department of Computer Science, Virginia Tech, Blacksburg, USA
- Heejoon Chae
- Division of Computer Science, Sookmyung Women's University, Seoul, Republic of Korea.
|
48
|
Yang D, Wu Y, Wan Z, Xu Z, Li W, Yuan P, Shang Q, Peng J, Tao L, Chen Q, Dan H, Xu H. HISMD: A Novel Immune Subtyping System for HNSCC. J Dent Res 2023; 102:270-279. [PMID: 36333876 DOI: 10.1177/00220345221134605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022] Open
Abstract
Immune subtyping is an important way to reveal immune heterogeneity, which may contribute to the diversity of progression and treatment response in head and neck squamous cell carcinoma (HNSCC). However, reported immune subtypes mainly focus on levels of immune infiltration and are mostly based on a mono-omics profile. This study aimed to identify a comprehensive immune subtype for HNSCC via multi-omics clustering and build a novel subtype prediction system for clinical application. Data were obtained from The Cancer Genome Atlas database and our independent multicenter cohort. Multi-omics clustering was performed to identify 3 clusters of 499 patients in The Cancer Genome Atlas based on immune-related gene expression and somatic mutations. The immune characteristics and biological features of the obtained clusters were revealed by bioinformatics, and 3 immune subtypes were identified: 1) adaptive immune activation subtype predominantly enriched in T cells, 2) innate immune activation subtype predominantly enriched in macrophages, and 3) immune desert subtype. Subsequently, the clinical implications of each subtype were analyzed per clinical epidemiology. We found that the adaptive immune activation subtype showed better survival outcomes and had a response to chemotherapy similar to that of the innate immune activation subtype, whereas the immune desert subtype might be relatively resistant to chemotherapy. Moreover, a subtype prediction system was developed by deep learning with whole slide images and named HISMD: HNSCC Immune Subtypes via Multi-omics and Deep Learning. We endowed HISMD with interpretability through image-based key feature extraction. The clinical implications, biological significance, and predictive stability of HISMD were successfully verified by using our independent multicenter cohort data set. In summary, this study revealed the immune heterogeneity of HNSCC and obtained a novel, highly accurate, and interpretable immune subtyping prediction system. For clinical implementation in the future, additional validation and utility studies are warranted.
Affiliation(s)
- D Yang
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Y Wu
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Z Wan
- Department of Pathology, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Z Xu
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- W Li
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- P Yuan
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Q Shang
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- J Peng
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- L Tao
- College of Mathematics, Sichuan University, Chengdu, China
- Q Chen
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China.,Key Laboratory of Oral Biomedical Research of Zhejiang Province, Affiliated Stomatology Hospital, Zhejiang University School of Stomatology, Hangzhou, China
- H Dan
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- H Xu
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
|
49
|
Echegaray N, Yilmaz B, Sharma H, Kumar M, Pateiro M, Ozogul F, Lorenzo JM. A novel approach to Lactiplantibacillus plantarum: From probiotic properties to the omics insights. Microbiol Res 2023; 268:127289. [PMID: 36571922 DOI: 10.1016/j.micres.2022.127289] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 08/04/2022] [Revised: 10/24/2022] [Accepted: 12/15/2022] [Indexed: 12/24/2022]
Abstract
Lactiplantibacillus plantarum (previously known as Lactobacillus plantarum) is one of the lactic acid bacteria (LAB) commonly used in fermentation, and the probiotic and functional properties of its strains, along with their health-promoting roles, have come to the fore. Food-derived L. plantarum strains have shown good resistance and adhesion in the gastrointestinal (GI) tract and excellent antioxidant and antimicrobial properties. Furthermore, many strains of L. plantarum can produce bacteriocins with interesting antimicrobial activity. These probiotic properties of L. plantarum, together with its presence in different niches, give it great potential to have beneficial effects on health. It has also been shown that L. plantarum can favorably regulate the composition of the intestinal microbiota. Recently, omics approaches such as metabolomics, secretomics, proteomics, transcriptomics and genomics have been used to understand the roles and mechanisms of L. plantarum that are related to its functional characteristics. This review provides an overview of the probiotic properties, including the specific interactions between microbiota and host, and omics insights of L. plantarum.
Affiliation(s)
- Noemí Echegaray
- Centro Tecnológico de la Carne de Galicia, Avda. Galicia nº 4, Parque Tecnológico de Galicia, San Cibrao das Viñas, 32900 Ourense, Spain
- Birsen Yilmaz
- Department of Nutrition and Dietetics, Cukurova University, Sarıcam, 01330 Adana, Turkey
- Heena Sharma
- Dairy Technology Division, ICAR-National Dairy Research Institute, Karnāl, Haryana, 132001, India
- Manoj Kumar
- Chemical and Biochemical Processing Division, Central Institute for Research on Cotton Technology, Mumbai 400019, India
- Mirian Pateiro
- Centro Tecnológico de la Carne de Galicia, Avda. Galicia nº 4, Parque Tecnológico de Galicia, San Cibrao das Viñas, 32900 Ourense, Spain
- Fatih Ozogul
- Department of Seafood Processing Technology, Faculty of Fisheries, Cukurova University, 01330, Adana, Turkey
- Jose Manuel Lorenzo
- Centro Tecnológico de la Carne de Galicia, Avda. Galicia nº 4, Parque Tecnológico de Galicia, San Cibrao das Viñas, 32900 Ourense, Spain; Universidade de Vigo, Área de Tecnoloxía dos Alimentos, Facultade de Ciencias de Ourense, 32004 Ourense, Spain.
|
50
|
Zafari N, Bathaei P, Velayati M, Khojasteh-Leylakoohi F, Khazaei M, Fiuji H, Nassiri M, Hassanian SM, Ferns GA, Nazari E, Avan A. Integrated analysis of multi-omics data for the discovery of biomarkers and therapeutic targets for colorectal cancer. Comput Biol Med 2023; 155:106639. [PMID: 36805214 DOI: 10.1016/j.compbiomed.2023.106639] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 11/30/2022] [Revised: 01/14/2023] [Accepted: 02/05/2023] [Indexed: 02/12/2023]
Abstract
The considerable burden of colorectal cancer and its rising incidence in young adults emphasize the necessity of understanding its underlying mechanisms, providing new diagnostic and prognostic markers, and improving therapeutic approaches. Precision medicine is a growing trend worldwide, and the identification of novel biomarkers and therapeutic targets is a step towards it. In this context, multi-omics data and integrated analysis are being investigated to develop personalized medicine in the management of colorectal cancer. Given the large amount of data produced by multi-omics approaches, data integration and analysis are a great challenge. In this Review, we summarize how statistical and machine learning techniques are applied to analyze multi-omics data and how they contribute to the discovery of useful diagnostic and prognostic biomarkers and therapeutic targets. Moreover, we discuss the importance of these biomarkers and therapeutic targets in the future clinical management of colorectal cancer. Taken together, integrated analysis of multi-omics data has great potential for finding novel diagnostic and prognostic biomarkers and therapeutic targets; however, there are still challenges to overcome in future studies.
Affiliation(s)
- Nima Zafari
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Parsa Bathaei
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Mahla Velayati
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Fatemeh Khojasteh-Leylakoohi
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Basic Sciences Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran; Medical Genetics Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Majid Khazaei
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Basic Sciences Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran
- Hamid Fiuji
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Mohammadreza Nassiri
- Recombinant Proteins Research Group, The Research Institute of Biotechnology, Ferdowsi University of Mashhad, Mashhad, Iran
- Seyed Mahdi Hassanian
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Basic Sciences Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran
- Gordon A Ferns
- Brighton & Sussex Medical School, Division of Medical Education, Falmer, Brighton, Sussex, BN1 9PH, UK
- Elham Nazari
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Basic Sciences Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran.
- Amir Avan
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran; Basic Sciences Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran; Medical Genetics Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.
|