1
Fountzilas E, Pearce T, Baysal MA, Chakraborty A, Tsimberidou AM. Convergence of evolving artificial intelligence and machine learning techniques in precision oncology. NPJ Digit Med 2025; 8:75. [PMID: 39890986 PMCID: PMC11785769 DOI: 10.1038/s41746-025-01471-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Accepted: 01/19/2025] [Indexed: 02/03/2025]
Abstract
The confluence of new technologies with artificial intelligence (AI) and machine learning (ML) analytical techniques is rapidly advancing the field of precision oncology, promising to improve diagnostic approaches and therapeutic strategies for patients with cancer. By analyzing multi-dimensional, multiomic, spatial pathology, and radiomic data, these technologies enable a deeper understanding of the intricate molecular pathways, aiding in the identification of critical nodes within the tumor's biology to optimize treatment selection. The applications of AI/ML in precision oncology are extensive and include the generation of synthetic data, e.g., digital twins, in order to provide the necessary information to design or expedite the conduct of clinical trials. Currently, many operational and technical challenges exist related to data technology, engineering, and storage; algorithm development and structures; quality and quantity of the data and the analytical pipeline; data sharing and generalizability; and the incorporation of these technologies into the current clinical workflow and reimbursement models.
Affiliation(s)
- Elena Fountzilas
  - Department of Medical Oncology, St Luke's Clinic, Panorama, Thessaloniki, Greece
- Mehmet A Baysal
  - Department of Investigational Cancer Therapeutics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX, USA
- Abhijit Chakraborty
  - Department of Investigational Cancer Therapeutics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX, USA
- Apostolia M Tsimberidou
  - Department of Investigational Cancer Therapeutics, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX, USA
2
Guo Y, Yu L, Guo L, Xu L, Li Q. A regularized Bayesian Dirichlet-multinomial regression model for integrating single-cell-level omics and patient-level clinical study data. Biometrics 2025; 81:ujaf005. [PMID: 39887052 PMCID: PMC11783250 DOI: 10.1093/biomtc/ujaf005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 11/02/2024] [Accepted: 01/21/2025] [Indexed: 02/01/2025]
Abstract
The abundance of various cell types can vary significantly among patients with varying phenotypes and even those with the same phenotype. Recent scientific advancements provide mounting evidence that other clinical variables, such as age, gender, and lifestyle habits, can also influence the abundance of certain cell types. However, current methods for integrating single-cell-level omics data with clinical variables are inadequate. In this study, we propose a regularized Bayesian Dirichlet-multinomial regression framework to investigate the relationship between single-cell RNA sequencing data and patient-level clinical data. Additionally, the model employs a novel hierarchical tree structure to identify such relationships at different cell-type levels. Our model successfully uncovers significant associations between specific cell types and clinical variables across three distinct diseases: pulmonary fibrosis, COVID-19, and non-small cell lung cancer. This integrative analysis provides biological insights and could potentially inform clinical interventions for various diseases.
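As a rough illustration of the kind of likelihood a Dirichlet-multinomial regression builds on, the sketch below evaluates the log-likelihood of one patient's cell-type counts and links the concentration parameters to clinical covariates through a log-linear function. It is a minimal sketch only: the function names, the log-linear link, and the toy inputs are assumptions made for illustration, and the regularization and hierarchical tree structure described in the abstract are not represented.

```python
import math

def dm_log_likelihood(counts, alpha):
    """Log-likelihood of cell-type counts under a Dirichlet-multinomial
    with concentration parameters `alpha` (one per cell type)."""
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: n! / prod(c_j!)
    ll = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    # Ratio of Dirichlet normalizing constants
    ll += math.lgamma(a0) - math.lgamma(n + a0)
    ll += sum(math.lgamma(c + a) - math.lgamma(a) for c, a in zip(counts, alpha))
    return ll

def concentration(covariates, intercepts, coefs):
    """Assumed log-linear link: alpha_j = exp(b0_j + sum_k x_k * b_jk).
    `coefs[j]` holds the coefficients of cell type j (hypothetical layout)."""
    return [
        math.exp(b0 + sum(x * b for x, b in zip(covariates, bj)))
        for b0, bj in zip(intercepts, coefs)
    ]

# Toy usage: one patient, two cell types, one standardized covariate.
alpha = concentration([0.5], [0.0, 1.0], [[0.2], [-0.4]])
print(dm_log_likelihood([30, 70], alpha))
```

A nonzero coefficient for a covariate shifts the expected proportion of the corresponding cell type, which is the kind of association such a model is designed to detect.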
Affiliation(s)
- Yanghong Guo
  - Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, TX 75080, United States
- Lei Yu
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr School of Public Health, The University of Texas Southwestern Medical Center, Dallas, TX 75390, United States
- Lei Guo
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr School of Public Health, The University of Texas Southwestern Medical Center, Dallas, TX 75390, United States
- Lin Xu
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr School of Public Health, The University of Texas Southwestern Medical Center, Dallas, TX 75390, United States
- Qiwei Li
  - Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, TX 75080, United States
3
Narro-Serrano J, Marhuenda-Egea FC. Diagnosis, Severity, and Prognosis from Potential Biomarkers of COVID-19 in Urine: A Review of Clinical and Omics Results. Metabolites 2024; 14:724. [PMID: 39728505 DOI: 10.3390/metabo14120724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2024] [Revised: 12/19/2024] [Accepted: 12/20/2024] [Indexed: 12/28/2024]
Abstract
The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has spurred an extraordinary scientific effort to better understand the disease's pathophysiology and develop diagnostic and prognostic tools to guide more precise and effective clinical management. Among the biological samples analyzed for biomarker identification, urine stands out due to its low risk of infection, non-invasive collection, and suitability for frequent, large-volume sampling. Integrating data from omics studies with standard biochemical analyses offers a deeper and more comprehensive understanding of COVID-19. This review aims to provide a detailed summary of studies published to date that have applied omics and clinical analyses on urine samples to identify potential biomarkers for COVID-19. In July 2024, an advanced search was conducted in Web of Science using the query: "covid* (Topic) AND urine (Topic) AND metabol* (Topic)". The search included results published up to 14 October 2024. The studies retrieved from this digital search were evaluated through a two-step screening process: first by reviewing titles and abstracts for eligibility, and then by retrieving and assessing the full texts of articles that met the specific criteria. The initial search retrieved 913 studies, of which 45 articles were ultimately included in this review. The most robust biomarkers identified include kynurenine, neopterin, total proteins, red blood cells, ACE2, citric acid, ketone bodies, hypoxanthine, amino acids, and glucose. The biological causes underlying these alterations reflect the multisystemic impact of COVID-19, highlighting key processes such as systemic inflammation, renal dysfunction, critical hypoxia, and metabolic stress.
Affiliation(s)
- Frutos Carlos Marhuenda-Egea
  - Department of Biochemistry and Molecular Biology and Soil Science and Agricultural Chemistry, University of Alicante, 03690 Alicante, Spain
4
Feng X, Wu W, Liu F. AH-6809 mediated regulation of lung adenocarcinoma metastasis through NLRP7 and prognostic analysis of key metastasis-related genes. Front Pharmacol 2024; 15:1486265. [PMID: 39697539 PMCID: PMC11652142 DOI: 10.3389/fphar.2024.1486265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2024] [Accepted: 09/30/2024] [Indexed: 12/20/2024]
Abstract
Introduction: Lung adenocarcinoma (LUAD) has become one of the leading causes of cancer-related deaths globally, with metastasis representing the most lethal stage of the disease. Despite significant advances in diagnostic and therapeutic strategies for LUAD, the mechanisms enabling cancer cells to breach the blood-brain barrier remain poorly understood. While genomic profiling has shed light on the nature of primary tumors, the genetic drivers and clinical relevance of LUAD metastasis are still largely unexplored. Objectives: This study aims to investigate the genomic differences between brain-metastatic and non-brain-metastatic LUAD, identify potential prognostic biomarkers, and evaluate the efficacy of AH-6809 in modulating key molecular pathways involved in LUAD metastasis, with a focus on post-translational modifications (PTMs). Methods: Genomic analyses were performed using data from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). Differentially expressed genes (DEGs) between brain-metastatic and non-metastatic LUAD samples were identified. Key gene modules were determined using Weighted Gene Co-expression Network Analysis (WGCNA), and their prognostic significance was assessed through Kaplan-Meier analysis. Cellular experiments, including CCK8 and qRT-PCR assays, were conducted to evaluate the anti-cancer effects of AH-6809 in LUAD cells. Apoptosis and inflammatory marker expression were assessed using immunofluorescence. Results: Genomic analysis differentiated brain-metastatic from non-brain-metastatic LUAD and identified NLRP7, FIBCD1, and ELF5 as prognostic markers. AH-6809 significantly suppressed LUAD cell proliferation, promoted apoptosis, and modulated epithelial-mesenchymal transition (EMT) markers. These effects were reversed upon NLRP7 knockdown, highlighting its role in metastasis. Literature analysis further supported AH-6809's tumor-suppressive activity, particularly in NLRP7 knockdown cells, where it inhibited cell growth and facilitated apoptosis. AH-6809 was also found to affect SUMO1-mediated PTMs and downregulate EMT markers, including VIM and CDH2. NLRP7 knockdown partially reversed these effects. Immunofluorescence revealed enhanced apoptosis and inflammation in lung cancer cells, especially in NLRP7 knockdown cells treated with AH-6809. The regulatory mechanisms involve SUMO1-mediated post-translational modifications and NQO1. Further studies are required to elucidate the molecular mechanisms and assess the clinical potential of these findings. Conclusion: These findings demonstrate the critical role of NLRP7 and associated genes in LUAD metastasis and suggest that AH-6809 holds promise as a potential therapeutic agent for brain-metastatic LUAD.
Affiliation(s)
- Xu Feng
  - Department of Neurointerventional, The First Affiliated Hospital of Jinzhou Medical University, Jinzhou, China
- Wei Wu
  - Department of Acupuncture, Jin Zhou Hospital of Traditional Chinese Medicine, Jinzhou, China
- Feifei Liu
  - Department of Anesthesiology, The First Affiliated Hospital of Jinzhou Medical University, Jinzhou, China
5
Acharya D, Mukhopadhyay A. A comprehensive review of machine learning techniques for multi-omics data integration: challenges and applications in precision oncology. Brief Funct Genomics 2024; 23:549-560. [PMID: 38600757 DOI: 10.1093/bfgp/elae013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 03/12/2024] [Accepted: 03/22/2024] [Indexed: 04/12/2024]
Abstract
Multi-omics data play a crucial role in precision medicine, mainly to understand the diverse biological interactions between different omics. Machine learning approaches have been extensively employed in this context over the years. This review aims to comprehensively summarize and categorize these advancements, focusing on the integration of multi-omics data, which includes genomics, transcriptomics, proteomics and metabolomics, alongside clinical data. We discuss various machine learning techniques and computational methodologies used for integrating distinct omics datasets and provide valuable insights into their application. The review emphasizes both the challenges and opportunities present in multi-omics data integration, precision medicine and patient stratification, offering practical recommendations for method selection in various scenarios. Recent advances in deep learning and network-based approaches are also explored, highlighting their potential to harmonize diverse biological information layers. Additionally, we present a roadmap for the integration of multi-omics data in precision oncology, outlining the advantages, challenges and implementation difficulties. Hence, this review offers a thorough overview of current literature, providing researchers with insights into machine learning techniques for patient stratification, particularly in precision oncology. Contact: anirban@klyuniv.ac.in.
Affiliation(s)
- Debabrata Acharya
  - Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
- Anirban Mukhopadhyay
  - Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
6
Guo Y, Yu L, Guo L, Xu L, Li Q. A Regularized Bayesian Dirichlet-multinomial Regression Model for Integrating Single-cell-level Omics and Patient-level Clinical Study Data. bioRxiv 2024:2024.06.04.597391. [PMID: 38895417 PMCID: PMC11185671 DOI: 10.1101/2024.06.04.597391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
The abundance of various cell types can vary significantly among patients with varying phenotypes and even those with the same phenotype. Recent scientific advancements provide mounting evidence that other clinical variables, such as age, gender, and lifestyle habits, can also influence the abundance of certain cell types. However, current methods for integrating single-cell-level omics data with clinical variables are inadequate. In this study, we propose a regularized Bayesian Dirichlet-multinomial regression framework to investigate the relationship between single-cell RNA sequencing data and patient-level clinical data. Additionally, the model employs a novel hierarchical tree structure to identify such relationships at different cell-type levels. Our model successfully uncovers significant associations between specific cell types and clinical variables across three distinct diseases: pulmonary fibrosis, COVID-19, and non-small cell lung cancer. This integrative analysis provides biological insights and could potentially inform clinical interventions for various diseases.
Affiliation(s)
- Yanghong Guo
  - Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, Texas, U.S.A
- Lei Yu
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, The University of Texas Southwestern Medical Center, Dallas, Texas, U.S.A
- Lei Guo
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, The University of Texas Southwestern Medical Center, Dallas, Texas, U.S.A
- Lin Xu
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, The University of Texas Southwestern Medical Center, Dallas, Texas, U.S.A
- Qiwei Li
  - Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, Texas, U.S.A
7
Chakraborty S, Sharma G, Karmakar S, Banerjee S. Multi-OMICS approaches in cancer biology: New era in cancer therapy. Biochim Biophys Acta Mol Basis Dis 2024; 1870:167120. [PMID: 38484941 DOI: 10.1016/j.bbadis.2024.167120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 03/06/2024] [Accepted: 03/06/2024] [Indexed: 04/01/2024]
Abstract
Innovative multi-omics frameworks integrate diverse datasets from the same patients to enhance our understanding of the molecular and clinical aspects of cancers. Advanced omics and multi-view clustering algorithms present unprecedented opportunities for classifying cancers into subtypes, refining survival predictions and treatment outcomes, and unravelling key pathophysiological processes across various molecular layers. However, with the increasing availability of cost-effective high-throughput technologies (HTT) that generate vast amounts of data, analyzing single layers often falls short of establishing causal relations. Integrating multi-omics data spanning genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes offers unique prospects to comprehend the underlying biology of complex diseases like cancer. This discussion explores algorithmic frameworks designed to uncover cancer subtypes, disease mechanisms, and methods for identifying pivotal genomic alterations. It also underscores the significance of multi-omics in tumor classifications, diagnostics, and prognostications. Despite its unparalleled advantages, the integration of multi-omics data has been slow to find its way into everyday clinics. A major hurdle is the uneven maturity of different omics approaches and the widening gap between the generation of large datasets and the capacity to process this data. Initiatives promoting the standardization of sample processing and analytical pipelines, as well as multidisciplinary training for experts in data analysis and interpretation, are crucial for translating theoretical findings into practical applications.
Affiliation(s)
- Sohini Chakraborty
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Gaurav Sharma
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Sricheta Karmakar
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Satarupa Banerjee
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
8
Rashid MM, Selvarajoo K. Advancing drug-response prediction using multi-modal and -omics machine learning integration (MOMLIN): a case study on breast cancer clinical data. Brief Bioinform 2024; 25:bbae300. [PMID: 38904542 PMCID: PMC11190965 DOI: 10.1093/bib/bbae300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 05/30/2024] [Accepted: 06/11/2024] [Indexed: 06/22/2024]
Abstract
The inherent heterogeneity of cancer contributes to highly variable responses to anticancer treatments. This underscores the need to first identify precise biomarkers through the complex multi-omics datasets that are now available. Although much research has focused on this aspect, identifying biomarkers associated with distinct drug responders remains a major challenge. Here, we develop MOMLIN, a multi-modal and -omics machine learning integration framework, to enhance drug-response prediction. MOMLIN jointly utilizes sparse correlation algorithms and class-specific feature selection algorithms to identify multi-modal and -omics-associated interpretable components. MOMLIN was applied to 147 patients' breast cancer datasets (clinical, mutation, gene expression, tumor microenvironment cells and molecular pathways) to analyze drug-response class predictions for non-responders and variable responders. Notably, MOMLIN achieves an average AUC of 0.989, which is at least 10% greater than current state-of-the-art methods (data integration analysis for biomarker discovery using latent components, multi-omics factor analysis, sparse canonical correlation analysis). Moreover, MOMLIN not only detects known individual biomarkers, such as genes at the mutation/expression level, but, most importantly, also correlates multi-modal and -omics network biomarkers for each response class. For example, an interaction between ER-negative-HMCN1-COL5A1 mutations-FBXO2-CSF3R expression-CD8 emerges as a multimodal biomarker for responders, potentially affecting antimicrobial peptides and FLT3 signaling pathways. In contrast, for resistance cases, a distinct combination of lymph node-TP53 mutation-PON3-ENSG00000261116 lncRNA expression-HLA-E-T-cell exclusions emerged as multimodal biomarkers, possibly impacting the neurotransmitter release cycle pathway. MOMLIN, therefore, is expected to advance precision medicine, for example by detecting context-specific multi-omics network biomarkers and better predicting drug-response classifications.
Affiliation(s)
- Md Mamunur Rashid
  - Biomolecular Sequence to Function Division, BII, (ASTAR), Singapore 138671, Republic of Singapore
- Kumar Selvarajoo
  - Biomolecular Sequence to Function Division, BII, (ASTAR), Singapore 138671, Republic of Singapore
  - Synthetic Biology Translational Research Program, Yong Loo Lin School of Medicine, NUS, Singapore 117456, Republic of Singapore
  - School of Biological Sciences, Nanyang Technological University (NTU), Singapore 639798, Republic of Singapore
9
Fawaz A, Ferraresi A, Isidoro C. Systems Biology in Cancer Diagnosis Integrating Omics Technologies and Artificial Intelligence to Support Physician Decision Making. J Pers Med 2023; 13:1590. [PMID: 38003905 PMCID: PMC10672164 DOI: 10.3390/jpm13111590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 11/07/2023] [Accepted: 11/08/2023] [Indexed: 11/26/2023]
Abstract
Cancer is the second major cause of disease-related death worldwide, and its accurate early diagnosis and therapeutic intervention are fundamental for saving the patient's life. Cancer, as a complex and heterogeneous disorder, results from the disruption and alteration of a wide variety of biological entities, including genes, proteins, mRNAs, miRNAs, and metabolites, that eventually emerge as clinical symptoms. Traditionally, diagnosis is based on clinical examination, blood tests for biomarkers, the histopathology of a biopsy, and imaging (MRI, CT, PET, and US). Additionally, omics biotechnologies help to further characterize the genome, metabolome, and microbiome traits of the patient that could have an impact on the prognosis and the patient's response to therapy. The integration of all these data relies on gathering several experts and may require considerable time, and, unfortunately, it is not without the risk of error in the interpretation and therefore in the decision. Systems biology algorithms exploit Artificial Intelligence (AI) combined with omics technologies to perform a rapid and accurate analysis and integration of the patient's big data, and to support the physician in making a diagnosis and tailoring the most appropriate therapeutic intervention. However, AI is not free from possible diagnostic and prognostic errors in the interpretation of images or biochemical-clinical data. Here, we first describe the methods used by systems biology for combining AI with omics and then discuss the potential, challenges, limitations, and critical issues in using AI in cancer research.
Affiliation(s)
- Ciro Isidoro
  - Laboratory of Molecular Pathology, Department of Health Sciences, Università del Piemonte Orientale, 28100 Novara, Italy; (A.F.); (A.F.)
10
Viana JN, Pilbeam C, Howard M, Scholz B, Ge Z, Fisser C, Mitchell I, Raman S, Leach J. Maintaining High-Touch in High-Tech Digital Health Monitoring and Multi-Omics Prognostication: Ethical, Equity, and Societal Considerations in Precision Health for Palliative Care. OMICS: A Journal of Integrative Biology 2023; 27:461-473. [PMID: 37861713 DOI: 10.1089/omi.2023.0120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/21/2023]
Abstract
Advances in digital health, systems biology, environmental monitoring, and artificial intelligence (AI) continue to revolutionize health care, ushering in a precision health future. More than disease treatment and prevention, precision health aims at maintaining good health throughout the lifespan. However, how can precision health impact care for people with a terminal or life-limiting condition? We examine here the ethical, equity, and societal/relational implications of two precision health modalities: (1) integrated systems biology/multi-omics analysis for disease prognostication and (2) digital health technologies for health status monitoring and communication. We focus on three main ethical and societal considerations: benefits and risks associated with integration of these modalities into the palliative care system; inclusion of underrepresented and marginalized groups in technology development and deployment; and the impact of high-tech modalities on palliative care's highly personalized and "high-touch" practice. We conclude with 10 recommendations for ensuring that precision health technologies for palliative care, such as multi-omics prognostication and digital health monitoring, are developed, tested, and implemented ethically, inclusively, and equitably.
Affiliation(s)
- John Noel Viana
  - Australian National Centre for the Public Awareness of Science, College of Science, The Australian National University, Canberra, Australia
  - Responsible Innovation Future Science Platform, Commonwealth Scientific and Industrial Research Organisation, Brisbane, Australia
- Caitlin Pilbeam
  - School of Medicine and Psychology, College of Health and Medicine, The Australian National University, Canberra, Australia
- Mark Howard
  - Monash Data Futures Institute, Monash University, Clayton, Australia
  - Department of Philosophy, School of Philosophical, Historical and International Studies, Monash University, Clayton, Australia
- Brett Scholz
  - School of Medicine and Psychology, College of Health and Medicine, The Australian National University, Canberra, Australia
- Zongyuan Ge
  - Monash Data Futures Institute, Monash University, Clayton, Australia
  - Department of Data Science & AI, Monash University, Clayton, Australia
- Carys Fisser
  - Australian National Centre for the Public Awareness of Science, College of Science, The Australian National University, Canberra, Australia
  - School of Medicine and Psychology, College of Health and Medicine, The Australian National University, Canberra, Australia
- Imogen Mitchell
  - School of Medicine and Psychology, College of Health and Medicine, The Australian National University, Canberra, Australia
  - Intensive Care Unit, Canberra Hospital, Canberra, Australia
- Sujatha Raman
  - Australian National Centre for the Public Awareness of Science, College of Science, The Australian National University, Canberra, Australia
- Joan Leach
  - Australian National Centre for the Public Awareness of Science, College of Science, The Australian National University, Canberra, Australia
11
Manganaro L, Bianco S, Bironzo P, Cipollini F, Colombi D, Corà D, Corti G, Doronzo G, Errico L, Falco P, Gandolfi L, Guerrera F, Monica V, Novello S, Papotti M, Parab S, Pittaro A, Primo L, Righi L, Sabbatini G, Sandri A, Vattakunnel S, Bussolino F, Scagliotti GV. Consensus clustering methodology to improve molecular stratification of non-small cell lung cancer. Sci Rep 2023; 13:7759. [PMID: 37173325 PMCID: PMC10182023 DOI: 10.1038/s41598-023-33954-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/14/2022] [Accepted: 04/21/2023] [Indexed: 05/15/2023]
Abstract
Recent advances in machine learning research, combined with the reduced sequencing costs enabled by modern next-generation sequencing, have paved the way to the implementation of precision medicine through routine multi-omics molecular profiling of tumours. Thus, there is an emerging need for reliable models exploiting such data to retrieve clinically useful information. Here, we introduce an original consensus clustering approach, overcoming the intrinsic instability of common clustering methods based on molecular data. This approach is applied to the case of non-small cell lung cancer (NSCLC), integrating data of an ongoing clinical study (PROMOLE) with those made available by The Cancer Genome Atlas, to define a molecular-based stratification of the patients beyond, but still preserving, histological subtyping. The resulting subgroups are biologically characterized by well-defined mutational and gene-expression profiles and are significantly related to disease-free survival (DFS). Interestingly, it was observed that (1) cluster B, characterized by a short DFS, is enriched in KEAP1 and SKP2 mutations, which makes it an ideal candidate for further studies with inhibitors, and (2) over- and under-representation of inflammation and immune system pathways in squamous-cell carcinoma subgroups could be potentially exploited to stratify patients treated with immunotherapy.
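The general idea behind consensus clustering can be sketched with a co-association matrix: several unstable clustering runs are combined by counting how often each pair of samples lands in the same cluster. The code below is a generic, minimal sketch of that idea and is not the methodology of the paper; the function names, the 0.5 threshold, and the toy label vectors are assumptions for illustration.

```python
def consensus_matrix(runs):
    """Fraction of runs in which samples i and j share a cluster.
    `runs` is a list of label vectors, one per clustering run."""
    n = len(runs[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0
    return [[value / len(runs) for value in row] for row in matrix]

def consensus_groups(matrix, threshold=0.5):
    """Group samples linked by co-association above `threshold`
    (connected components; the threshold is an assumed choice)."""
    n = len(matrix)
    group = [-1] * n
    next_id = 0
    for start in range(n):
        if group[start] != -1:
            continue
        group[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if group[j] == -1 and matrix[i][j] > threshold:
                    group[j] = next_id
                    stack.append(j)
        next_id += 1
    return group

# Three runs over four samples; label names differ between runs,
# but only co-membership matters.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
print(consensus_groups(consensus_matrix(runs)))
```

Because only pairwise co-membership is counted, the arbitrary cluster labels assigned by each individual run do not need to be aligned before combining them.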
Collapse
Affiliation(s)
- L Manganaro
- aizoOn Technology Consulting S.R.L, Torino, Italy
| | - S Bianco
- aizoOn Technology Consulting S.R.L, Torino, Italy
| | - P Bironzo
- Medical Oncology Division at San Luigi Hospital, Department of Oncology, University of Torino, Orbassano (TO), Italy
| | - F Cipollini
- aizoOn Technology Consulting S.R.L, Torino, Italy
| | - D Colombi
- aizoOn Technology Consulting S.R.L, Torino, Italy
| | - D Corà
- Department of Translational Medicine, Piemonte Orientale University, Novara, Italy
- Center for Translational Research on Autoimmune and Allergic Diseases-CAAD, Novara, Italy
| | - G Corti
- Department of Oncology, University of Torino, 10060, Candiolo, Italy
- Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
| | - G Doronzo
- Department of Oncology, University of Torino, 10060, Candiolo, Italy
- Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
| | - L Errico
- Division of Thoracic Surgery at AOU San Luigi, Department of Oncology, University of Torino, Orbassano (TO), Italy
| | - P Falco
- aizoOn Technology Consulting S.R.L, Torino, Italy
| | - L Gandolfi
- Department of Oncology, University of Torino, 10060, Candiolo, Italy
- Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
| | - F Guerrera
- Division of Thoracic Surgery at AOU Città della Salute e della Scienza, Department of Surgical Sciences, University of Torino, Torino, Italy
| | - V Monica
- Department of Oncology, University of Torino, 10060, Candiolo, Italy
- Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
| | - S Novello
- Medical Oncology Division at San Luigi Hospital, Department of Oncology, University of Torino, Orbassano (TO), Italy
| | - M Papotti
- Pathology Division at AOU Città della Salute e della Scienza, Department of Oncology, University of Torino, Torino, Italy
| | - S Parab
- Department of Oncology, University of Torino, 10060, Candiolo, Italy
- Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
| | - A Pittaro
- Pathology Division at AOU Città della Salute e della Scienza, Department of Oncology, University of Torino, Torino, Italy
| | - L Primo
- Department of Oncology, University of Torino, 10060, Candiolo, Italy
- Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
| | - L Righi
- Pathology Division at AOU San Luigi, Department of Oncology, University of Torino, Orbassano (TO), Italy
| | - G Sabbatini
- aizoOn Technology Consulting S.R.L, Torino, Italy
| | - A Sandri
- Division of Thoracic Surgery at AOU San Luigi, Department of Oncology, University of Torino, Orbassano (TO), Italy
| | | | - F Bussolino
- Department of Oncology, University of Torino, 10060, Candiolo, Italy
- Candiolo Cancer Institute-IRCCS-FPO, 10060, Candiolo, Italy
| | - G V Scagliotti
- Medical Oncology Division at San Luigi Hospital, Department of Oncology, University of Torino, Orbassano (TO), Italy.
| |
|
12
|
Duan R, Tong J, Sutton AJ, Asch DA, Chu H, Schmid CH, Chen Y. Origami plot: a novel multivariate data visualization tool that improves radar chart. J Clin Epidemiol 2023; 156:85-94. [PMID: 36822444 PMCID: PMC10599795 DOI: 10.1016/j.jclinepi.2023.02.020] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/20/2022] [Revised: 02/07/2023] [Accepted: 02/15/2023] [Indexed: 02/23/2023]
Abstract
OBJECTIVES We propose the origami plot, which maintains the original functionality of a radar chart and avoids potential misuse of its connected regions, with newly added features to better assist multicriteria decision-making. STUDY DESIGN AND SETTING Built upon a radar chart, the origami plot adds auxiliary axes and points such that the area of the connected region of all dots is invariant to the ordering of axes. As such, it enables ranking different individuals by overall performance for multicriteria decision-making while maintaining the intuitive visual appeal of the radar chart. We develop extensions of the origami plot, including the weighted origami plot, which allows reweighting of each attribute to define the overall performance, and the pairwise origami plot, which highlights comparisons between two individuals. RESULTS We illustrate the different versions of origami plots using the Hospital Compare database developed by the Centers for Medicare & Medicaid Services (CMS). The plot shows each hospital's performance on mortality, readmission, complication, infection, patient experience, and timely and effective care, as well as its overall performance across these metrics. The weighted origami plot allows the attributes to be weighted differently when some are more important than others. We illustrate the potential use of the pairwise origami plot in an electronic health records (EHR) system to monitor five clinical measures (body mass index [BMI], fasting glucose level, blood pressure, triglycerides, and low-density lipoprotein [LDL] cholesterol) of a patient across multiple hospital visits. CONCLUSION The origami plot is a useful visualization tool to assist multicriteria decision-making. It improves radar charts by avoiding potential misuse of the connected regions. It has several new features and allows flexible customization.
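The order-invariance claim in this abstract can be checked numerically. The sketch below is illustrative only and is not the authors' implementation: it assumes that one auxiliary axis is placed midway between each pair of adjacent main axes and that every auxiliary point sits at the same fixed radius (`aux`, a hypothetical parameter). Under that assumption the enclosed area equals `aux * sin(pi / k) * sum(scores)` for `k` attributes, so it depends only on the sum of the scores, whereas the plain radar-chart area changes when the axes are reordered.

```python
import math
from itertools import permutations


def polygon_area(radii):
    """Shoelace area of a polygon whose vertices lie at equally spaced angles."""
    n = len(radii)
    pts = [
        (r * math.cos(2 * math.pi * i / n), r * math.sin(2 * math.pi * i / n))
        for i, r in enumerate(radii)
    ]
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def radar_area(scores):
    """Area of the ordinary radar-chart polygon (depends on axis order)."""
    return polygon_area(list(scores))


def origami_area(scores, aux=0.1):
    """Area after interleaving a fixed-radius auxiliary point between main axes.

    The fixed radius `aux` is an assumption made for this sketch.
    """
    radii = []
    for s in scores:
        radii.extend([s, aux])
    return polygon_area(radii)


scores = [0.9, 0.2, 0.7, 0.4, 0.6]  # toy attribute scores, not real data

# Collect the distinct areas obtained over every ordering of the axes.
radar = {round(radar_area(p), 6) for p in permutations(scores)}
origami = {round(origami_area(p), 6) for p in permutations(scores)}

print("distinct radar areas:", len(radar))
print("distinct origami areas:", len(origami))
```

Because each main point is joined only to two fixed-radius neighbours, every triangle in the polygon contributes a term proportional to a single score, which is why reordering cannot change the total.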
Affiliation(s)
- Rui Duan
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
| | - Jiayi Tong
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
| | - Alex J Sutton
- Department of Health Sciences, University of Leicester, Leicester, UK
| | - David A Asch
- Division of General Internal Medicine, University of Pennsylvania, Philadelphia, PA, USA; Leonard Davis Institute of Health Economics, Philadelphia, PA, USA
| | - Haitao Chu
- Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA; Statistical Research and Innovation, Global Biometrics and Data Management, Pfizer Inc., New York, NY, USA
| | | | - Yong Chen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA; Leonard Davis Institute of Health Economics, Philadelphia, PA, USA.
| |
|
13
|
Doan LMT, Angione C, Occhipinti A. Machine Learning Methods for Survival Analysis with Clinical and Transcriptomics Data of Breast Cancer. Methods Mol Biol 2023; 2553:325-393. [PMID: 36227551 DOI: 10.1007/978-1-0716-2617-7_16] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/16/2023]
Abstract
Breast cancer is one of the most common cancers in women worldwide and causes an enormous number of deaths annually. However, early diagnosis of breast cancer can improve survival outcomes, enabling simpler and more cost-effective treatments. The recent increase in data availability provides unprecedented opportunities to apply data-driven and machine learning methods to identify early-detection prognostic factors capable of predicting the expected survival and potential sensitivity to treatment of patients, with the final aim of enhancing clinical outcomes. This tutorial presents a protocol for applying machine learning models in survival analysis for both clinical and transcriptomic data. We show that integrating clinical and mRNA expression data is essential to explain the multiple biological processes driving cancer progression. Our results reveal that machine-learning-based models such as random survival forests, gradient-boosted survival models, and survival support vector machines can outperform traditional statistical methods, i.e., the Cox proportional hazards model. The highest C-index among the machine learning models was recorded when using the survival support vector machine, with a value of 0.688, whereas the C-index recorded using the Cox model was 0.677. Shapley Additive Explanation (SHAP) values were also applied to identify the feature importance of the models and their impact on the prediction outcomes.
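For readers unfamiliar with the C-index values quoted above (0.688 versus 0.677), the following is a minimal sketch of Harrell's concordance index in plain Python. It is a simplified illustration, not the tutorial's code: the function name, the toy data, and the handling of ties (tied risk scores count as half-concordant, tied event times are skipped) are assumptions made here.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event
    strictly before subject j's follow-up time. The pair is concordant
    when the model assigns subject i the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up times, event flags (1 = event), risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]

c_toy = concordance_index(times, events, risks)
print(c_toy)
```

In this toy set there are five comparable pairs and four of them are ranked correctly, so the index is 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported gap of about 0.01 between the two models in context.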
Affiliation(s)
- Le Minh Thao Doan
- School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK
| | - Claudio Angione
- School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK
- Centre for Digital Innovation, Teesside University, Middlesbrough, UK
- Healthcare Innovation Centre, Teesside University, Middlesbrough, UK
- National Horizons Centre, Teesside University, Darlington, UK
| | - Annalisa Occhipinti
- School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK.
- Centre for Digital Innovation, Teesside University, Middlesbrough, UK.
- National Horizons Centre, Teesside University, Darlington, UK.
| |
|
14
|
Maghsoudi Z, Nguyen H, Tavakkoli A, Nguyen T. A comprehensive survey of the approaches for pathway analysis using multi-omics data integration. Brief Bioinform 2022; 23:6761962. [PMID: 36252928 PMCID: PMC9677478 DOI: 10.1093/bib/bbac435] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 04/28/2022] [Revised: 08/26/2022] [Accepted: 09/08/2022] [Indexed: 02/07/2023] Open
Abstract
Pathway analysis has been widely used to detect pathways and functions associated with complex disease phenotypes. The proliferation of this approach is due to better interpretability of its results and its higher statistical power compared with the gene-level statistics. A plethora of pathway analysis methods that utilize multi-omics setup, rather than just transcriptomics or proteomics, have recently been developed to discover novel pathways and biomarkers. Since multi-omics gives multiple views into the same problem, different approaches are employed in aggregating these views into a comprehensive biological context. As a result, a variety of novel hypotheses regarding disease ideation and treatment targets can be formulated. In this article, we review 32 such pathway analysis methods developed for multi-omics and multi-cohort data. We discuss their availability and implementation, assumptions, supported omics types and databases, pathway analysis techniques and integration strategies. A comprehensive assessment of each method's practicality, and a thorough discussion of the strengths and drawbacks of each technique will be provided. The main objective of this survey is to provide a thorough examination of existing methods to assist potential users and researchers in selecting suitable tools for their data and analysis purposes, while highlighting outstanding challenges in the field that remain to be addressed for future development.
Affiliation(s)
- Zeynab Maghsoudi
- Department of Computer Science and Engineering, University of Nevada, Reno, 89557, Nevada, USA
| | - Ha Nguyen
- Department of Computer Science and Engineering, University of Nevada, Reno, 89557, Nevada, USA
| | - Alireza Tavakkoli
- Department of Computer Science and Engineering, University of Nevada, Reno, 89557, Nevada, USA
| | - Tin Nguyen
- Corresponding author: Tin Nguyen, Department of Computer Science and Engineering, University of Nevada, Reno, NV, USA. Tel.: +1-775-784-6619;
| |
|
15
|
Jardillier R, Koca D, Chatelain F, Guyon L. Prognosis of lasso-like penalized Cox models with tumor profiling improves prediction over clinical data alone and benefits from bi-dimensional pre-screening. BMC Cancer 2022; 22:1045. [PMID: 36199072 PMCID: PMC9533541 DOI: 10.1186/s12885-022-10117-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Accepted: 09/14/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Prediction of patient survival from tumor molecular '-omics' data is a key step toward personalized medicine. Cox models fitted on RNA profiling datasets are popular for clinical outcome prediction. However, these models are applied in a "high-dimension" setting, as the number p of covariates (gene expressions) greatly exceeds the number n of patients and e of events. Thus, pre-screening together with penalization methods is widely used for dimension reduction. METHODS In the present paper, (i) we benchmark the performance of the lasso penalization and three variants (i.e., ridge, elastic net, adaptive elastic net) on 16 cancers from TCGA after pre-screening, (ii) we propose a bi-dimensional pre-screening procedure based on both gene variability and p-values from single-variable Cox models to predict survival, and (iii) we compare our results with iterative sure independence screening (ISIS). RESULTS First, we show that integration of mRNA-seq data with clinical data improves predictions over clinical data alone. Second, our bi-dimensional pre-screening procedure improves the C-index and/or the integrated Brier score only moderately, while excluding genes that are irrelevant for prediction. We demonstrate that the different penalization methods reached comparable prediction performances, with slight differences among datasets. Finally, we provide advice in the case of multi-omics data integration. CONCLUSIONS Tumor profiles convey more prognostic information than clinical variables such as stage for many cancer subtypes. Lasso and ridge penalizations perform similarly to elastic net penalization for Cox models in high dimension. Pre-screening of the top 200 genes in terms of single-variable Cox model p-values is a practical way to reduce dimension, which may be particularly useful when integrating multi-omics.
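As a reminder of how the penalties benchmarked in this abstract relate to one another, the standard elastic-net-penalized Cox objective can be written as below. The notation is not taken from the paper and is an assumption of this note: \delta_i is the event indicator, R(t_i) is the set of patients still at risk at time t_i, \lambda \ge 0 is the penalty strength, and \alpha \in [0, 1] is the mixing parameter.

```latex
\hat{\beta} = \arg\max_{\beta}
  \Bigl\{ \ell(\beta)
    - \lambda \Bigl[ \alpha \lVert \beta \rVert_1
    + \tfrac{1-\alpha}{2} \lVert \beta \rVert_2^2 \Bigr] \Bigr\},
\qquad
\ell(\beta) = \sum_{i:\,\delta_i = 1}
  \Bigl[ x_i^{\top}\beta
    - \log \sum_{j \in R(t_i)} \exp\bigl( x_j^{\top}\beta \bigr) \Bigr]
```

Setting \alpha = 1 gives the lasso, \alpha = 0 gives ridge, and intermediate values give the elastic net. The adaptive variant replaces the L1 term with a weighted sum that uses a separate weight for each coefficient.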
Affiliation(s)
- Rémy Jardillier
- IRIG, Biosanté U1292, Univ. Grenoble Alpes, Inserm, CEA, Grenoble, France
- GIPSA-lab, Institute of Engineering University Grenoble Alpes, Univ. Grenoble Alpes, CNRS, Grenoble INP, Grenoble, France
| | - Dzenis Koca
- IRIG, Biosanté U1292, Univ. Grenoble Alpes, Inserm, CEA, Grenoble, France
| | - Florent Chatelain
- GIPSA-lab, Institute of Engineering University Grenoble Alpes, Univ. Grenoble Alpes, CNRS, Grenoble INP, Grenoble, France
| | - Laurent Guyon
- IRIG, Biosanté U1292, Univ. Grenoble Alpes, Inserm, CEA, Grenoble, France
| |
|
16
|
Combining Molecular, Imaging, and Clinical Data Analysis for Predicting Cancer Prognosis. Cancers (Basel) 2022; 14:cancers14133215. [PMID: 35804988 PMCID: PMC9265023 DOI: 10.3390/cancers14133215] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/27/2022] [Indexed: 02/04/2023] Open
Abstract
Simple Summary The rise of Big Data, the widespread use of Machine Learning, and the cheapening of omics techniques have allowed for the creation of more sophisticated and accurate models in biomedical research. This article presents the state-of-the-art predictive models of cancer prognosis that use multimodal data, considering clinical, molecular (omics and non-omics), and image data. The subject of study, the data modalities used, the data processing and modelling methods applied, the validation strategies involved, the integration strategies encompassed, and the evolution of prognostic predictive models are discussed. Finally, we discuss challenges and opportunities in this field of cancer research, with great potential impact on the clinical management of patients and, by extension, on the implementation of personalised and precision medicine.
Abstract Cancer is one of the most detrimental diseases globally. Accordingly, the prognosis prediction of cancer patients has become a field of interest. In this review, we have gathered 43 state-of-the-art scientific papers published in the last 6 years that built cancer prognosis predictive models using multimodal data. We have defined the multimodality of data as four main types: clinical, anatomopathological, molecular, and medical imaging; and we have expanded on the information that each modality provides. The 43 studies were divided into three categories based on the modelling approach taken, and their characteristics were further discussed together with current issues and future trends. Research in this area has evolved from survival analysis through statistical modelling using mainly clinical and anatomopathological data to the prediction of cancer prognosis through a multi-faceted data-driven approach by the integration of complex, multimodal, and high-dimensional data containing multi-omics and medical imaging information and by applying Machine Learning and, more recently, Deep Learning techniques. This review concludes that cancer prognosis predictive multimodal models are capable of better stratifying patients, which can improve clinical management and contribute to the implementation of personalised medicine as well as provide new and valuable knowledge on cancer biology and its progression.
|
17
|
Gliozzo J, Mesiti M, Notaro M, Petrini A, Patak A, Puertas-Gallardo A, Paccanaro A, Valentini G, Casiraghi E. Heterogeneous data integration methods for patient similarity networks. Brief Bioinform 2022; 23:6604996. [PMID: 35679533 PMCID: PMC9294435 DOI: 10.1093/bib/bbac207] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/07/2021] [Revised: 04/14/2022] [Accepted: 05/04/2022] [Indexed: 12/29/2022] Open
Abstract
Patient similarity networks (PSNs), where patients are represented as nodes and their similarities as weighted edges, are being increasingly used in clinical research. These networks provide an insightful summary of the relationships among patients and can be exploited by inductive or transductive learning algorithms for the prediction of patient outcome, phenotype and disease risk. PSNs can also be easily visualized, thus offering a natural way to inspect complex heterogeneous patient data and providing some level of explainability of the predictions obtained by machine learning algorithms. The advent of high-throughput technologies, enabling us to acquire high-dimensional views of the same patients (e.g. omics data, laboratory data, imaging data), calls for the development of data fusion techniques for PSNs in order to leverage this rich heterogeneous information. In this article, we review existing methods for integrating multiple biomedical data views to construct PSNs, together with the different patient similarity measures that have been proposed. We also review methods that have appeared in the machine learning literature but have not yet been applied to PSNs, thus providing a resource to navigate the vast machine learning literature existing on this topic. In particular, we focus on methods that could be used to integrate very heterogeneous datasets, including multi-omics data as well as data derived from clinical information and medical imaging.
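The network structure described in this abstract (patients as nodes, similarities as weighted edges, several data views fused into one network) can be sketched in a few lines. This is an illustration under stated assumptions, not a method taken from the review: cosine similarity is used as the patient similarity measure, the views are fused by a simple element-wise average, and the toy feature vectors, function names, and edge threshold are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(view):
    """Pairwise patient similarities for one data view (rows are patients)."""
    n = len(view)
    return [[cosine(view[i], view[j]) for j in range(n)] for i in range(n)]


def fuse(matrices):
    """Naive fusion: element-wise mean of the per-view similarity matrices."""
    n = len(matrices[0])
    k = len(matrices)
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)] for i in range(n)]


def weighted_edges(sim, threshold):
    """Keep an undirected weighted edge for each pair at or above the threshold."""
    n = len(sim)
    return [
        (i, j, sim[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if sim[i][j] >= threshold
    ]


# Two toy views of the same three patients (hypothetical values).
omics_view = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
clinical_view = [[1.0, 1.0], [1.0, 0.9], [0.1, 1.0]]

fused = fuse([similarity_matrix(omics_view), similarity_matrix(clinical_view)])
psn_edges = weighted_edges(fused, threshold=0.5)
print(psn_edges)
```

With these toy values only patients 0 and 1 remain connected after fusion. Averaging treats every view as equally informative; the fusion techniques surveyed in the review address settings where that assumption does not hold.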
Affiliation(s)
- Jessica Gliozzo
- AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy; European Commission, Joint Research Centre (JRC), Ispra (VA), Italy; CINI, Infolife National Laboratory, Roma, Italy
| | - Marco Mesiti
- AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy; CINI, Infolife National Laboratory, Roma, Italy
| | - Marco Notaro
- AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy; CINI, Infolife National Laboratory, Roma, Italy
| | - Alessandro Petrini
- AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy; CINI, Infolife National Laboratory, Roma, Italy
| | - Alex Patak
- European Commission, Joint Research Centre (JRC), Ispra (VA), Italy
| | | | - Alberto Paccanaro
- Department of Computer Science, Royal Holloway, University of London, Egham, TW20 0EX, UK; School of Applied Mathematics (EMAp), Fundação Getúlio Vargas, Rio de Janeiro, Brazil
| | - Giorgio Valentini
- AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy; CINI, Infolife National Laboratory, Roma, Italy; DSRC UNIMI, Data Science Research Center, Milano, 20135, Italy; ELLIS, European Laboratory for Learning and Intelligent Systems, Berlin, Germany
| | - Elena Casiraghi
- AnacletoLab - Computer Science Department, Università degli Studi di Milano, Via Celoria 18, 20135, Milan, Italy; CINI, Infolife National Laboratory, Roma, Italy
| |
|
18
|
Yousefi PD, Suderman M, Langdon R, Whitehurst O, Davey Smith G, Relton CL. DNA methylation-based predictors of health: applications and statistical considerations. Nat Rev Genet 2022; 23:369-383. [PMID: 35304597 DOI: 10.1038/s41576-022-00465-w] [Citation(s) in RCA: 98] [Impact Index Per Article: 32.7] [Accepted: 02/18/2022] [Indexed: 12/12/2022]
Abstract
DNA methylation data have become a valuable source of information for biomarker development, because, unlike static genetic risk estimates, DNA methylation varies dynamically in relation to diverse exogenous and endogenous factors, including environmental risk factors and complex disease pathology. Reliable methods for genome-wide measurement at scale have led to the proliferation of epigenome-wide association studies and subsequently to the development of DNA methylation-based predictors across a wide range of health-related applications, from the identification of risk factors or exposures, such as age and smoking, to early detection of disease or progression in cancer, cardiovascular and neurological disease. This Review evaluates the progress of existing DNA methylation-based predictors, including the contribution of machine learning techniques, and assesses the uptake of key statistical best practices needed to ensure their reliable performance, such as data-driven feature selection, elimination of data leakage in performance estimates and use of generalizable, adequately powered training samples.
Affiliation(s)
- Paul D Yousefi
- Medical Research Council Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Bristol, UK
| | - Matthew Suderman
- Medical Research Council Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Bristol, UK
| | - Ryan Langdon
- Medical Research Council Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Bristol, UK
| | - Oliver Whitehurst
- Medical Research Council Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Bristol, UK
| | - George Davey Smith
- Medical Research Council Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Bristol, UK
| | - Caroline L Relton
- Medical Research Council Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Bristol, UK.
| |
|
19
|
Goetze S, Schüffler P, Athanasiou A, Koetemann A, Poyet C, Fankhauser CD, Wild PJ, Schiess R, Wollscheid B. Use of MS-GUIDE for identification of protein biomarkers for risk stratification of patients with prostate cancer. Clin Proteomics 2022; 19:9. [PMID: 35477343 PMCID: PMC9044739 DOI: 10.1186/s12014-022-09349-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/30/2021] [Accepted: 04/05/2022] [Indexed: 11/25/2022] Open
Abstract
Background Non-invasive liquid biopsies could complement current pathological nomograms for risk stratification of prostate cancer patients. Development and testing of potential liquid biopsy markers is time, resource, and cost-intensive. For most protein targets, no antibodies or ELISAs for efficient clinical cohort pre-evaluation are currently available. We reasoned that mass spectrometry-based prescreening would enable the cost-effective and rational preselection of candidates for subsequent clinical-grade ELISA development. Methods Using Mass Spectrometry-GUided Immunoassay DEvelopment (MS-GUIDE), we screened 48 literature-derived biomarker candidates for their potential utility in risk stratification scoring of prostate cancer patients. Parallel reaction monitoring was used to evaluate these 48 potential protein markers in a highly multiplexed fashion in a medium-sized patient cohort of 78 patients with ground-truth prostatectomy and clinical follow-up information. Clinical-grade ELISAs were then developed for two of these candidate proteins and used for significance testing in a larger, independent patient cohort of 263 patients. Results Machine learning-based analysis of the parallel reaction monitoring data of the liquid biopsies prequalified fibronectin and vitronectin as candidate biomarkers. We evaluated their predictive value for prostate cancer biochemical recurrence scoring in an independent validation cohort of 263 prostate cancer patients using clinical-grade ELISAs. The results of our prostate cancer risk stratification test were statistically significantly 10% better than results of the current gold standards PSA alone, PSA plus prostatectomy biopsy Gleason score, or the National Comprehensive Cancer Network score in prediction of recurrence. Conclusion Using MS-GUIDE we identified fibronectin and vitronectin as candidate biomarkers for prostate cancer risk stratification. 
Supplementary Information The online version contains supplementary material available at 10.1186/s12014-022-09349-x.
Affiliation(s)
- Sandra Goetze
- Department of Health Sciences and Technology, Institute of Translational Medicine, Swiss Federal Institute of Technology, ETH Zurich, 8093, Zurich, Switzerland; Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland; ETH PHRT Swiss Multi-Omics Center (SMOC), 8093, Zurich, Switzerland
| | - Peter Schüffler
- Institute of General and Surgical Pathology, Technical University of Munich, 81675, Munich, Germany
| | | | - Anika Koetemann
- Department of Health Sciences and Technology, Institute of Translational Medicine, Swiss Federal Institute of Technology, ETH Zurich, 8093, Zurich, Switzerland
| | - Cedric Poyet
- Clinic of Urology, University Hospital Zurich, University of Zurich, 8091, Zurich, Switzerland
| | | | - Peter J Wild
- Department of Pathology and Molecular Pathology, University Hospital Zurich, University of Zurich, 8091, Zurich, Switzerland; Dr. Senckenberg Institute of Pathology, University Hospital Frankfurt, 60590, Frankfurt, Germany; Frankfurt Institute for Advanced Studies (FIAS), 60438, Frankfurt, Germany; WILDLAB, University Hospital Frankfurt MVZ GmbH, 60590, Frankfurt, Germany.
| | | | - Bernd Wollscheid
- Department of Health Sciences and Technology, Institute of Translational Medicine, Swiss Federal Institute of Technology, ETH Zurich, 8093, Zurich, Switzerland; Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland; ETH PHRT Swiss Multi-Omics Center (SMOC), 8093, Zurich, Switzerland.
| |
|
20
|
Integration of Omics and Phenotypic Data for Precision Medicine. Methods Mol Biol 2022; 2486:19-35. [PMID: 35437716 DOI: 10.1007/978-1-0716-2265-0_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
Over the past two decades, biomedical research has been moving toward a big-data-driven approach. The underlying causes of this transition include the ability to gather genetic or molecular profiles of humans faster, the increasing adoption of electronic health record (EHR) systems, and the growing interest in linking omics and phenotypic data for analysis. The integration of individuals' biological data (e.g., genomics, proteomics, metabolomics) with health-care data has created unprecedented opportunities for precision medicine, that is, a medical model that uses a patient's unique information, mainly genetic, to prevent, diagnose, or treat disease. This chapter reviews the research opportunities and applications of integrating omics and phenotypic data for precision medicine, such as understanding the relationship between genotype and phenotype, disease subtyping, and diagnosis or prediction of adverse outcomes. We review recent advanced methods, particularly the machine learning and deep learning-based approaches used for harnessing and harmonizing multiomics and phenotypic data to address these applications. Finally, we discuss the challenges and future directions.
|
21
|
Balsano C, Alisi A, Brunetto MR, Invernizzi P, Burra P, Piscaglia F. The application of artificial intelligence in hepatology: A systematic review. Dig Liver Dis 2022; 54:299-308. [PMID: 34266794 DOI: 10.1016/j.dld.2021.06.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/02/2021] [Revised: 06/04/2021] [Accepted: 06/07/2021] [Indexed: 02/06/2023]
Abstract
The integration of human and artificial intelligence (AI) in medicine has only recently begun but it has already become obvious that intelligent systems can dramatically improve the management of liver diseases. Big data made it possible to envisage transformative developments of the use of AI for diagnosing, predicting prognosis and treating liver diseases, but there is still a lot of work to do. If we want to achieve the 21st century digital revolution, there is an urgent need for specific national and international rules, and to adhere to bioethical parameters when collecting data. Avoiding misleading results is essential for the effective use of AI. A crucial question is whether it is possible to sustain, technically and morally, the process of integration between man and machine. We present a systematic review on the applications of AI to hepatology, highlighting the current challenges and crucial issues related to the use of such technologies.
Affiliation(s)
- Clara Balsano
- Dept. of Life, Health and Environmental Sciences MESVA, University of L'Aquila, Piazza S. Salvatore Tommasi 1, 67100, Coppito, L'Aquila, Italy; Francesco Balsano Foundation, Via Giovanni Battista Martini 6, 00198, Rome, Italy.
| | - Anna Alisi
- Research Unit of Molecular Genetics of Complex Phenotypes, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
| | - Maurizia R Brunetto
- Hepatology Unit and Laboratory of Molecular Genetics and Pathology of Hepatitis Viruses, University Hospital of Pisa, Pisa, Italy
| | - Pietro Invernizzi
- Division of Gastroenterology and Center of Autoimmune Liver Diseases, Department of Medicine and Surgery, San Gerardo Hospital, University of Milano, Bicocca, Italy
| | - Patrizia Burra
- Multivisceral Transplant Unit, Department of Surgery, Oncology, Gastroenterology, Padua University Hospital, Padua, Italy
| | - Fabio Piscaglia
- Division of Internal Medicine, IRCCS Azienda Ospedaliero Universitaria di Bologna, Bologna, Italy
| | | |
|
22
|
Ju J, Wismans LV, Mustafa DAM, Reinders MJT, van Eijck CHJ, Stubbs AP, Li Y. Robust deep learning model for prognostic stratification of pancreatic ductal adenocarcinoma patients. iScience 2021; 24:103415. [PMID: 34901786 PMCID: PMC8637475 DOI: 10.1016/j.isci.2021.103415] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/07/2021] [Revised: 09/27/2021] [Accepted: 11/05/2021] [Indexed: 02/07/2023] Open
Abstract
A major challenge in treating patients with pancreatic ductal adenocarcinoma (PDAC) is the unpredictability of their prognoses due to high heterogeneity. We present Multi-Omics DEep Learning for Prognosis-correlated subtyping (MODEL-P) to identify PDAC subtypes and to predict prognoses of new patients. MODEL-P was trained on autoencoder-integrated multi-omics data from 146 patients with PDAC together with their survival outcomes. Using MODEL-P, we identified two PDAC subtypes with distinct survival outcomes (median survival 10.1 and 22.7 months, respectively; log-rank p = 1 × 10⁻⁶), which correspond to DNA damage repair and immune response. We rigorously validated MODEL-P by stratifying patients in five independent datasets into these two survival groups and achieved a significant survival difference, which is superior to current practice and other subtyping schemas. We believe the subtype-specific signatures would facilitate PDAC pathogenesis discovery, and MODEL-P can provide clinicians with prognostic information during treatment decision-making to better gauge the benefits versus the risks. Highlights: We developed DL-based MODEL-P to identify prognosis-correlated PDAC subtypes; the identified subtypes are related to DNA damage repair and immune response processes; MODEL-P stratified patients from independent datasets into distinct survival groups; MODEL-P could be used in clinics to aid treatment decision-making.
Affiliation(s)
- Jie Ju
- Department of Pathology & Clinical Bioinformatics, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, the Netherlands
| | - Leonoor V Wismans
- Department of Surgery, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, the Netherlands
| | - Dana A M Mustafa
- Department of Pathology & Clinical Bioinformatics, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, the Netherlands
| | - Marcel J T Reinders
- The Delft Bioinformatics Lab, Delft University of Technology, Rotterdam, the Netherlands
| | - Casper H J van Eijck
- Department of Surgery, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, the Netherlands
| | - Andrew P Stubbs
- Department of Pathology & Clinical Bioinformatics, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, the Netherlands
| | - Yunlei Li
- Department of Pathology & Clinical Bioinformatics, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, the Netherlands
| |
|
23
|
Funingana IG, Reinius MAV, Petrillo A, Ang JE, Brenton JD. Can integrative biomarker approaches improve prediction of platinum and PARP inhibitor response in ovarian cancer? Semin Cancer Biol 2021; 77:67-82. [PMID: 33607245 DOI: 10.1016/j.semcancer.2021.02.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/19/2020] [Revised: 02/06/2021] [Accepted: 02/10/2021] [Indexed: 12/28/2022]
Abstract
Epithelial ovarian carcinoma (EOC) encompasses distinct histological, molecular and genomic entities that determine intrinsic sensitivity to platinum-based chemotherapy. Current management of each subtype is determined by factors including tumour grade and stage, but only a small number of biomarkers can predict treatment response. The recent incorporation of PARP inhibitors into routine clinical practice has underscored the need to personalise ovarian cancer treatment based on tumour biology. In this article, we review the strengths and limitations of predictive biomarkers in current clinical practice and highlight integrative strategies that may inform the development of future personalised medicine programs and composite biomarkers.
Affiliation(s)
- Ionut-Gabriel Funingana
- Department of Oncology, University of Cambridge, Cambridge, UK; Cancer Research UK Cambridge Centre, University of Cambridge, Cambridge, UK; Department of Oncology, Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Marika A V Reinius
- Department of Oncology, University of Cambridge, Cambridge, UK; Cancer Research UK Cambridge Centre, University of Cambridge, Cambridge, UK; Department of Oncology, Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Angelica Petrillo
- Medical Oncology Unit, Ospedale del Mare, Naples, Italy; University of Study of Campania "L.Vanvitelli", Naples, Italy.
| | - Joo Ern Ang
- Cancer Research UK Cambridge Centre, University of Cambridge, Cambridge, UK; Department of Oncology, Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - James D Brenton
- Department of Oncology, University of Cambridge, Cambridge, UK; Cancer Research UK Cambridge Centre, University of Cambridge, Cambridge, UK; Department of Oncology, Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| |
|
24
|
Tan MS, Cheah PL, Chin AV, Looi LM, Chang SW. A review on omics-based biomarkers discovery for Alzheimer's disease from the bioinformatics perspectives: Statistical approach vs machine learning approach. Comput Biol Med 2021; 139:104947. [PMID: 34678481 DOI: 10.1016/j.compbiomed.2021.104947] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 06/21/2021] [Revised: 10/12/2021] [Accepted: 10/12/2021] [Indexed: 12/26/2022]
Abstract
Alzheimer's Disease (AD) is a neurodegenerative disease that affects cognition and is the most common cause of dementia in the elderly. As the number of elderly individuals increases globally, the incidence and prevalence of AD are expected to increase. At present, AD is diagnosed clinically, according to accepted criteria. The essential elements in the diagnosis of AD include a patient's history, a physical examination and neuropsychological testing, in addition to appropriate investigations such as neuroimaging. The omics-based approach is an emerging field of study that may not only aid in the diagnosis of AD but also facilitate the exploration of factors that influence the development of the disease. Omics techniques, including genomics, transcriptomics, proteomics and metabolomics, may reveal the pathways that lead to neuronal death and identify biomolecular markers associated with AD. This will further facilitate an understanding of AD neuropathology. In this review, omics-based approaches that were implemented in studies on AD were assessed from a bioinformatics perspective. Current state-of-the-art statistical and machine learning approaches used in the single omics analysis of AD were compared based on correlations of variants, differential expression, functional analysis and network analysis. This was followed by a review of the approaches used in the integration and analysis of multi-omics of AD. The strengths and limitations of multi-omics analysis methods were explored and the issues and challenges associated with omics studies of AD were highlighted. Lastly, future studies in this area of research were justified.
Affiliation(s)
- Mei Sze Tan
- Bioinformatics Programme, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
| | - Phaik-Leng Cheah
- Department of Pathology, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
| | - Ai-Vyrn Chin
- Division of Geriatric Medicine, Department of Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
| | - Lai-Meng Looi
- Department of Pathology, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
| | - Siow-Wee Chang
- Bioinformatics Programme, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia.
| |
|
25
|
Jamshidi A, Pelletier JP, Labbe A, Abram F, Martel-Pelletier J, Droit A. Machine Learning-Based Individualized Survival Prediction Model for Total Knee Replacement in Osteoarthritis: Data From the Osteoarthritis Initiative. Arthritis Care Res (Hoboken) 2021; 73:1518-1527. [PMID: 33749148 DOI: 10.1002/acr.24601] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 08/20/2020] [Accepted: 03/18/2021] [Indexed: 12/28/2022]
Abstract
OBJECTIVE By using machine learning, our study aimed to build a model to predict risk and time to total knee replacement (TKR) of an osteoarthritic knee. METHODS Features were from the Osteoarthritis Initiative (OAI) cohort at baseline. Using the lasso method for variable selection in the Cox regression model, we identified the 10 most important characteristics among 1,107 features. The prognostic power of the selected features was assessed by the Kaplan-Meier method and applied to 7 machine learning methods: Cox, DeepSurv, random forests algorithm, linear/kernel support vector machine (SVM), and linear/neural multi-task logistic regression models. As some of the 10 first-found features included similar radiographic measurements, we further looked at using the least number of features without compromising the accuracy of the model. Prediction performance was assessed by the concordance index, Brier score, and time-dependent area under the curve (AUC). RESULTS Ten features were identified and included radiographs, bone marrow lesions of the medial condyle on magnetic resonance imaging, hyaluronic acid injection, performance measure, medical history, and knee-related symptoms. The methodologies Cox, DeepSurv, and linear SVM demonstrated the highest accuracy (concordance index scores of 0.85, Brier score of 0.02, and an AUC of 0.87). DeepSurv was chosen to build the prediction model to estimate the time to TKR for a given knee. Moreover, we were able to decrease the features to only 3 and maintain the high accuracy (concordance index of 0.85, Brier score of 0.02, and AUC of 0.86), which included bone marrow lesions, Kellgren/Lawrence grade, and knee-related symptoms, to predict risk and time of a TKR event. CONCLUSION For the first time, we developed a model using the OAI cohort to predict with high accuracy if a given osteoarthritic knee would require TKR, when a TKR would be required, and who would likely progress fast toward this event.
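The concordance index reported in this abstract can be illustrated with a minimal pure-Python sketch of Harrell's C-index. This is not the authors' code: the function name `concordance_index` and its argument layout are assumptions made for illustration, and tied event times are simply skipped, which is a simplification.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times       -- observed follow-up time per subject
    events      -- 1 if the event (e.g. TKR) was observed, 0 if censored
    risk_scores -- model output; a higher score should mean an earlier event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only when subject i had an observed event
            # strictly before subject j's follow-up time (tied times are skipped).
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # model ranks the earlier event as riskier
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.85 means that in roughly 85% of comparable pairs the knee with the earlier replacement received the higher predicted risk.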
Affiliation(s)
- Afshin Jamshidi
- University of Montreal Hospital Research Centre, Montreal, Quebec, Canada, and Laval University Hospital Research Centre, Montreal, Quebec, Canada
| | - Arnaud Droit
- Laval University Hospital Research Centre, Quebec, Canada
| |
|
26
|
Picard M, Scott-Boyer MP, Bodein A, Périn O, Droit A. Integration strategies of multi-omics data for machine learning analysis. Comput Struct Biotechnol J 2021; 19:3735-3746. [PMID: 34285775 PMCID: PMC8258788 DOI: 10.1016/j.csbj.2021.06.030] [Citation(s) in RCA: 205] [Impact Index Per Article: 51.3] [Received: 03/30/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 12/25/2022] Open
Abstract
Increased availability of high-throughput technologies has generated an ever-growing volume of omics data that seek to portray many different but complementary biological layers including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insights from these data have been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers obtained to date, however, include only one omic measurement at a time and thus do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics data integration strategies are needed to combine the complementary knowledge brought by each omics layer. We have summarized the most recent data integration methods/frameworks into five different integration strategies: early, mixed, intermediate, late and hierarchical. In this mini-review, we focus on challenges and existing multi-omics integration strategies by paying special attention to machine learning applications.
Affiliation(s)
- Milan Picard
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Marie-Pier Scott-Boyer
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Antoine Bodein
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Olivier Périn
- Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
| | - Arnaud Droit
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Corresponding author.
| |
|
27
|
Zhang J, Lu H, Zhang S, Wang T, Zhao H, Guan F, Zeng P. Leveraging Methylation Alterations to Discover Potential Causal Genes Associated With the Survival Risk of Cervical Cancer in TCGA Through a Two-Stage Inference Approach. Front Genet 2021; 12:667877. [PMID: 34149809 PMCID: PMC8206792 DOI: 10.3389/fgene.2021.667877] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/15/2021] [Accepted: 04/19/2021] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Multiple genes were previously identified to be associated with cervical cancer; however, the genetic architecture of cervical cancer remains unknown and many potential causal genes are yet to be discovered. METHODS To explore potential causal genes related to cervical cancer, a two-stage causal inference approach was proposed within the framework of Mendelian randomization, where the gene expression was treated as exposure, with methylations located within the promoter regions of genes serving as instrumental variables. Five prediction models were first utilized to characterize the relationship between the expression and methylations for each gene; then, the methylation-regulated gene expression (MReX) was obtained and the association was evaluated via Cox mixed-effect model based on MReX. We further implemented the aggregated Cauchy association test (ACAT) combination to take advantage of respective strengths of these prediction models while accounting for dependency among the p-values. RESULTS A total of 14 potential causal genes were discovered to be associated with the survival risk of cervical cancer in TCGA when the five prediction models were separately employed. The total number of potential causal genes was brought to 23 when conducting ACAT. Some of the newly discovered genes may be novel (e.g., YJEFN3, SPATA5L1, IMMP1L, C5orf55, PPIP5K2, ZNF330, CRYZL1, PPM1A, ESCO2, ZNF605, ZNF225, ZNF266, FICD, and OSTC). Functional analyses showed that these genes were enriched in tumor-associated pathways. Additionally, four genes (i.e., COL6A1, SYDE1, ESCO2, and GIPC1) were differentially expressed between tumor and normal tissues. CONCLUSION Our study discovered promising candidate genes that were causally associated with the survival risk of cervical cancer and thus provided new insights into the genetic etiology of cervical cancer.
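The ACAT combination mentioned in this abstract can be sketched as follows. The abstract states only that ACAT was used to combine the p-values from the five prediction models; the code below is the standard Cauchy combination rule, with equal weights assumed by default, and the function name `acat` is hypothetical rather than taken from the authors' software.

```python
import math


def acat(p_values, weights=None):
    """Aggregated Cauchy association test (illustrative sketch).

    Each p-value is mapped to a standard Cauchy variate, the variates are
    averaged with non-negative weights, and the average is mapped back to
    a p-value. Inputs are expected to lie strictly between 0 and 1.
    """
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = [1.0] * len(p_values)
    total = sum(weights)
    # Weighted mean of tan((0.5 - p) * pi); small p gives a large positive term.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi
```

Because the tail of the Cauchy distribution is insensitive to correlation among the inputs, the combined value stays approximately valid when the per-model p-values are dependent, which is the property the abstract refers to. For a single gene one would pass its five model-specific p-values, for example `acat([0.01, 0.5])`, which is dominated by the smaller input.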
Affiliation(s)
- Jinhui Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Haojie Lu
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Shuo Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ting Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Huashuo Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Fengjun Guan
- Department of Pediatrics, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
| | - Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| |
|
28
|
Asada K, Kaneko S, Takasawa K, Machino H, Takahashi S, Shinkai N, Shimoyama R, Komatsu M, Hamamoto R. Integrated Analysis of Whole Genome and Epigenome Data Using Machine Learning Technology: Toward the Establishment of Precision Oncology. Front Oncol 2021; 11:666937. [PMID: 34055633 PMCID: PMC8149908 DOI: 10.3389/fonc.2021.666937] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 02/11/2021] [Accepted: 04/26/2021] [Indexed: 12/17/2022] Open
Abstract
With the completion of the International Human Genome Project, we have entered what is known as the post-genome era, and efforts to apply genomic information to medicine have become more active. In particular, with the announcement of the Precision Medicine Initiative by U.S. President Barack Obama in his State of the Union address at the beginning of 2015, "precision medicine," which aims to divide patients and potential patients into subgroups with respect to disease susceptibility, has become the focus of worldwide attention. The field of oncology is also actively adopting the precision oncology approach, which is based on molecular profiling, such as genomic information, to select the appropriate treatment. However, the current precision oncology is dominated by a method called targeted-gene panel (TGP), which uses next-generation sequencing (NGS) to analyze a limited number of specific cancer-related genes and suggest optimal treatments, but this method causes the problem that the number of patients who benefit from it is limited. In order to steadily develop precision oncology, it is necessary to integrate and analyze more detailed omics data, such as whole genome data and epigenome data. On the other hand, with the advancement of analysis technologies such as NGS, the amount of data obtained by omics analysis has become enormous, and artificial intelligence (AI) technologies, mainly machine learning (ML) technologies, are being actively used to make more efficient and accurate predictions. In this review, we will focus on whole genome sequencing (WGS) analysis and epigenome analysis, introduce the latest results of omics analysis using ML technologies for the development of precision oncology, and discuss the future prospects.
Affiliation(s)
- Ken Asada
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Syuzo Kaneko
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Ken Takasawa
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Hidenori Machino
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Satoshi Takahashi
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Norio Shinkai
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
- Department of NCC Cancer Science, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Ryo Shimoyama
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Masaaki Komatsu
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
| | - Ryuji Hamamoto
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
- Department of NCC Cancer Science, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| |
|
29
|
Molecular Landscape of the Epithelial-Mesenchymal Transition in Endometrioid Endometrial Cancer. J Clin Med 2021; 10:jcm10071520. [PMID: 33917330 PMCID: PMC8038735 DOI: 10.3390/jcm10071520] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/22/2021] [Accepted: 04/02/2021] [Indexed: 12/25/2022] Open
Abstract
Modern diagnostics are based on molecular analysis and have been focused on searching for new molecular markers to use in diagnostics. Included in this has been the search for the correlation between gene expression in tissue samples and liquid biological materials. The aim of this study was to evaluate the differences in the expression profile of messenger RNA (mRNA) and micro-RNA (miRNA) related to the epithelial-mesenchymal transition (EMT) in different grades of endometrial cancer (G1-G3), in order to select the most promising molecular markers. The study material consisted of tissue samples and whole blood collected from 30 patients with endometrial cancer (study group; G1 = 15; G2 = 8; G3 = 7) and 30 without neoplastic changes (control group). The molecular analysis included the use of the microarray technique and RT-qPCR. Microarray analysis indicated the following numbers of mRNAs differentiating the endometrial cancer samples from the control (tissue/blood): G1 vs. C = 21/18 mRNAs, G2 vs. C = 19/14 mRNAs, and G3 vs. C = 10/9 mRNAs. The common genes for the tissue and blood samples (Fold Change; FC > 3.0) were G1 vs. C: TGFB1, WNT5A, TGFB2, and NOTCH1; G2 vs. C: BCL2L, SOX9, BAMBI, and SMAD4; G3 vs. C: STAT1 and TGFB1. In addition, mRNA TGFB1, NOTCH1, and BCL2L are common for all grades of endometrial cancer. The analysis showed that miR-144, miR-106a, and miR-30d are most strongly associated with EMT, making them potential diagnostic markers.
|
30
|
Vlachavas EI, Bohn J, Ückert F, Nürnberg S. A Detailed Catalogue of Multi-Omics Methodologies for Identification of Putative Biomarkers and Causal Molecular Networks in Translational Cancer Research. Int J Mol Sci 2021; 22:2822. [PMID: 33802234 PMCID: PMC8000236 DOI: 10.3390/ijms22062822] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/31/2021] [Revised: 03/05/2021] [Accepted: 03/05/2021] [Indexed: 02/06/2023] Open
Abstract
Recent advances in sequencing and biotechnological methodologies have led to the generation of large volumes of molecular data of different omics layers, such as genomics, transcriptomics, proteomics and metabolomics. Integration of these data with clinical information provides new opportunities to discover how perturbations in biological processes lead to disease. Using data-driven approaches for the integration and interpretation of multi-omics data could stably identify links between structural and functional information and propose causal molecular networks with potential impact on cancer pathophysiology. This knowledge can then be used to improve disease diagnosis, prognosis, prevention, and therapy. This review will summarize and categorize the most current computational methodologies and tools for integration of distinct molecular layers in the context of translational cancer research and personalized therapy. Additionally, the bioinformatics tools Multi-Omics Factor Analysis (MOFA) and netDX will be tested using omics data from public cancer resources, to assess their overall robustness, provide reproducible workflows for gaining biological knowledge from multi-omics data, and to comprehensively understand the significantly perturbed biological entities in distinct cancer types. We show that the performed supervised and unsupervised analyses result in meaningful and novel findings.
Affiliation(s)
- Efstathios Iason Vlachavas
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Jonas Bohn
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Frank Ückert
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
| | - Sylvia Nürnberg
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
| |
|
31
|
|
32
|
Polewko-Klim A, Mnich K, Rudnicki WR. Robust Data Integration Method for Classification of Biomedical Data. J Med Syst 2021; 45:45. [PMID: 33624190 PMCID: PMC7902598 DOI: 10.1007/s10916-021-01718-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2020] [Accepted: 01/26/2021] [Indexed: 10/26/2022]
Abstract
We present a protocol for integrating two types of biological data - clinical and molecular - for more effective classification of patients with cancer. The proposed approach is a hybrid between early and late data integration strategies. In this hybrid protocol, the set of informative clinical features is extended by the classification results based on molecular data sets. The results are then treated as new synthetic variables. The hybrid protocol was applied to METABRIC breast cancer samples and TCGA urothelial bladder carcinoma samples. Various data types were used for clinical endpoint prediction: clinical data, gene expression, somatic copy number aberrations, RNA-Seq, methylation, and reverse phase protein array. The performance of the hybrid data integration was evaluated with a repeated cross-validation procedure and compared with other methods of data integration: early integration and late integration via super learning. The hybrid method gave similar results to those obtained by the best of the tested variants of super learning. Moreover, the hybrid method allowed for further sensitivity analysis and recursive feature elimination, which led to compact predictive models for cancer clinical endpoints. For breast cancer, the final model consists of eight clinical variables and two synthetic features obtained from molecular data. For urothelial bladder carcinoma, only two clinical features and one synthetic variable were necessary to build the best predictive model. We have shown that the inclusion of the synthetic variables based on the RNA expression levels and copy number alterations can lead to improved quality of prognostic tests. Thus, it should be considered for inclusion in wider medical practice.
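The hybrid integration idea described in this abstract, where classification results from each molecular data set become synthetic variables appended to the clinical features, can be sketched in pure Python. Several choices here are assumptions made for illustration and are not taken from the paper: a nearest-centroid scorer stands in for whatever classifiers the authors used, scores are computed out-of-fold so that no sample is scored by a model trained on it, and all names and toy values are hypothetical.

```python
def centroid_score(train_X, train_y, x):
    """Stand-in classifier: positive score means x is closer to the class-1 centroid."""
    def centroid(label):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

    return dist(x, centroid(0)) - dist(x, centroid(1))


def out_of_fold_scores(X, y, n_folds=3):
    """One synthetic variable: each sample is scored by a model that never saw it."""
    scores = [0.0] * len(X)
    for fold in range(n_folds):
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in range(fold, len(X), n_folds):  # held-out samples of this fold
            scores[i] = centroid_score(train_X, train_y, X[i])
    return scores


def hybrid_features(clinical, molecular_blocks, y, n_folds=3):
    """Append one synthetic column per molecular block to the clinical features."""
    synthetic = [out_of_fold_scores(block, y, n_folds) for block in molecular_blocks]
    return [list(row) + [s[i] for s in synthetic] for i, row in enumerate(clinical)]


# Hypothetical toy data: 6 patients, 2 clinical features, 2 molecular blocks.
y = [0, 1, 0, 1, 0, 1]
clinical = [[60, 1], [70, 0], [55, 1], [65, 0], [58, 1], [72, 0]]
expression = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9], [0.15, 0.15], [0.85, 0.95]]
copy_number = [[0.0], [1.0], [0.1], [0.9], [0.05], [1.1]]

hybrid = hybrid_features(clinical, [expression, copy_number], y)
```

Each row of `hybrid` holds the two clinical values followed by one score per molecular block, so a final model can be fitted on a compact table. This differs from early integration, which would concatenate all raw molecular features, and from late integration, which would combine only the per-block predictions.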
Affiliation(s)
- Aneta Polewko-Klim
- Institute of Computer Science, University of Bialystok, Bialystok, Poland
| | - Krzysztof Mnich
- Computational Center, University of Bialystok, Bialystok, Poland
| | - Witold R. Rudnicki
- Institute of Computer Science, University of Bialystok, Bialystok, Poland
- Computational Center, University of Bialystok, Bialystok, Poland
| |
|
33
|
Qin G, Liu Z, Xie L. Multiple Omics Data Integration. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11508-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022] Open
|
34
|
Santos HP, Bhattacharya A, Joseph RM, Smeester L, Kuban KCK, Marsit CJ, O'Shea TM, Fry RC. Evidence for the placenta-brain axis: multi-omic kernel aggregation predicts intellectual and social impairment in children born extremely preterm. Mol Autism 2020; 11:97. [PMID: 33308293 PMCID: PMC7730750 DOI: 10.1186/s13229-020-00402-w] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 08/31/2020] [Accepted: 11/30/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Children born extremely preterm are at heightened risk for intellectual and social impairment, including Autism Spectrum Disorder (ASD). There is increasing evidence for a key role of the placenta in prenatal developmental programming, suggesting that the placenta may, in part, contribute to origins of neurodevelopmental outcomes. METHODS We examined associations between placental transcriptomic and epigenomic profiles and assessed their ability to predict intellectual and social impairment at age 10 years in 379 children from the Extremely Low Gestational Age Newborn (ELGAN) cohort. Assessment of intellectual ability (IQ) and social function was completed with the Differential Ability Scales-II and Social Responsiveness Scale (SRS), respectively. Examining IQ and SRS allows for studying ASD risk beyond the diagnostic criteria, as IQ and SRS are continuous measures strongly correlated with ASD. Genome-wide mRNA, miRNA, and CpG methylation were assayed with the Illumina HiSeq 2500, HTG EdgeSeq miRNA Whole Transcriptome Assay, and Illumina EPIC/850K array, respectively. We conducted genome-wide differential analyses of placental mRNA, miRNA, and CpG methylation data. These molecular features were then integrated for a predictive analysis of IQ and SRS outcomes using kernel aggregation regression. We lastly examined associations between ASD and the multi-omic-predicted component of IQ and SRS. RESULTS Genes with important roles in neurodevelopment and placental tissue organization were associated with intellectual and social impairment. Kernel aggregations of placental multi-omics strongly predicted intellectual and social function, explaining approximately 8% and 12% of variance in SRS and IQ scores via cross-validation, respectively. Predicted in-sample SRS and IQ showed significant positive and negative associations with ASD case-control status.
LIMITATIONS The ELGAN cohort comprises children born pre-term, and generalization may be affected by unmeasured confounders associated with low gestational age. We conducted external validation of predictive models, though the sample size (N = 49) and the scope of the available out-sample placental dataset are limited. Further validation of the models is merited. CONCLUSIONS Aggregating information from biomarkers within and among molecular data types improves prediction of complex traits like social and intellectual ability in children born extremely preterm, suggesting that traits within the placenta-brain axis may be omnigenic.
Affiliation(s)
- Hudson P Santos
- Biobehavioral Laboratory, School of Nursing, University of North Carolina, 544 Carrington Hall, Campus Box 7460, Chapel Hill, NC, 27599-7460, USA.
- Institute for Environmental Health Solutions, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA.
| | - Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA, USA
| | - Robert M Joseph
- Department of Anatomy and Neurobiology, Boston University School of Medicine, Boston, MA, USA
| | - Lisa Smeester
- Institute for Environmental Health Solutions, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Curriculum in Toxicology and Environmental Medicine, University of North Carolina, Chapel Hill, NC, USA
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
| | - Karl C K Kuban
- Department of Pediatrics, Division of Pediatric Neurology, Boston University Medical Center, Boston, MA, USA
| | - Carmen J Marsit
- Department of Environmental Health, Emory University, Atlanta, GA, 30322, USA
| | - T Michael O'Shea
- Department of Pediatrics, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
| | - Rebecca C Fry
- Institute for Environmental Health Solutions, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Curriculum in Toxicology and Environmental Medicine, University of North Carolina, Chapel Hill, NC, USA
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
|
35
|
Whetton AD, Preston GW, Abubeker S, Geifman N. Proteomics and Informatics for Understanding Phases and Identifying Biomarkers in COVID-19 Disease. J Proteome Res 2020; 19:4219-4232. [PMID: 32657586 PMCID: PMC7384384 DOI: 10.1021/acs.jproteome.0c00326] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 05/14/2020] [Indexed: 02/07/2023]
Abstract
The emergence of novel coronavirus disease 2019 (COVID-19), caused by the SARS-CoV-2 coronavirus, has necessitated the urgent development of new diagnostic and therapeutic strategies. Rapid research and development, on an international scale, has already generated assays for detecting SARS-CoV-2 RNA and host immunoglobulins. However, the complexities of COVID-19 are such that fuller definitions of patient status, trajectory, sequelae, and responses to therapy are now required. There is accumulating evidence, from studies of both COVID-19 and the related disease SARS, that protein biomarkers could help to provide this definition. Proteins associated with blood coagulation (D-dimer), cell damage (lactate dehydrogenase), and the inflammatory response (e.g., C-reactive protein) have already been identified as possible predictors of COVID-19 severity or mortality. Proteomics technologies, with their ability to detect many proteins per analysis, have begun to extend these early findings. To be effective, proteomics strategies must include not only methods for comprehensive data acquisition (e.g., using mass spectrometry) but also informatics approaches via which to derive actionable information from large data sets. Here we review applications of proteomics to COVID-19 and SARS and outline how pipelines involving technologies such as artificial intelligence could be of value for research on these diseases.
Affiliation(s)
- Anthony D. Whetton
- Stoller Biomarker Discovery Centre, Faculty of Biology Medicine and Health (FBMH), University of Manchester, Manchester M20 4GJ, United Kingdom
- Stem Cell and Leukaemia Proteomics Laboratory, Manchester Cancer Research Centre, University of Manchester, Manchester M13 9PL, United Kingdom
- Manchester National Institute for Health Biomedical Research Centre, Manchester M13 9WL, United Kingdom
| | - George W. Preston
- Stoller Biomarker Discovery Centre, Faculty of Biology Medicine and Health (FBMH), University of Manchester, Manchester M20 4GJ, United Kingdom
- Stem Cell and Leukaemia Proteomics Laboratory, Manchester Cancer Research Centre, University of Manchester, Manchester M13 9PL, United Kingdom
| | - Semira Abubeker
- Stoller Biomarker Discovery Centre, Faculty of Biology Medicine and Health (FBMH), University of Manchester, Manchester M20 4GJ, United Kingdom
- Stem Cell and Leukaemia Proteomics Laboratory, Manchester Cancer Research Centre, University of Manchester, Manchester M13 9PL, United Kingdom
| | - Nophar Geifman
- Centre for Health Informatics, FBMH, University of Manchester, Manchester M13 9PL, United Kingdom
|
36
|
Biswas N, Chakrabarti S. Artificial Intelligence (AI)-Based Systems Biology Approaches in Multi-Omics Data Analysis of Cancer. Front Oncol 2020; 10:588221. [PMID: 33154949 PMCID: PMC7591760 DOI: 10.3389/fonc.2020.588221] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 07/28/2020] [Accepted: 09/21/2020] [Indexed: 12/13/2022] Open
Abstract
Cancer is the manifestation of abnormalities of different physiological processes involving genes, DNAs, RNAs, proteins, and other biomolecules whose profiles are reflected in different omics data types. As these bio-entities are highly correlated, integrative analysis of different types of omics data (multi-omics data) is required to understand the disease from tumorigenesis to disease progression. Artificial intelligence (AI), specifically machine learning algorithms, has the ability to make decisive interpretations of "big"-sized complex data and, hence, appears as the most effective tool for the analysis and understanding of multi-omics data for patient-specific observations. In this review, we discuss the recent outcomes of employing AI in multi-omics data analysis of different types of cancer. Based on the research trends and significance in patient treatment, we have primarily focused on AI-based analysis for determining cancer subtypes, disease prognosis, and therapeutic targets. We also discuss AI analysis of some non-canonical types of omics data, as they can play a determining role in cancer patient care. Additionally, we briefly discuss data repositories because of their pivotal role in multi-omics data storage, processing, and analysis.
Affiliation(s)
- Nupur Biswas
- Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, IICB TRUE Campus, Kolkata, India
| | - Saikat Chakrabarti
- Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, IICB TRUE Campus, Kolkata, India
|
37
|
Yu X, Wang T, Huang S, Zeng P. How Can Gene-Expression Information Improve Prognostic Prediction in TCGA Cancers: An Empirical Comparison Study on Regularization and Mixed Cox Models. Front Genet 2020; 11:920. [PMID: 32973875 PMCID: PMC7472843 DOI: 10.3389/fgene.2020.00920] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/03/2020] [Accepted: 07/23/2020] [Indexed: 12/30/2022] Open
Abstract
Background Previous cancer prognostic prediction models often consider only the most important transcriptomic expressions, and their power is limited. It is unknown whether prediction power can be further improved when additional transcriptomic information is incorporated. Methods To integrate transcriptomes, four models are compared based on 32 types of cancer in the Cancer Genome Atlas, including the general Cox model with only clinical covariates, the Cox model with a lasso penalty (coxlasso), the Cox model with an elastic net penalty (coxenet), and the mixed-effects Cox model (coxlmm). Furthermore, we partition the survival variance into the relative contribution of clinical and transcriptomic components within the framework of coxlmm. Finally, the influence of different numbers of genes was evaluated in the context of coxlmm. Results Compared with the clinical covariates–only Cox model, the average prediction gain was 2.4% for coxlasso, 4.2% for coxenet, and 7.2% for coxlmm across 16 low-censored cancers; a significant elevation of prediction power was observed for SARC, SKCM, LGG, PAAD, and HNSC. Similar findings were observed for all 32 cancers with the average prediction gain of 2.7, 3.8, and 5.8% for coxlasso, coxenet, and coxlmm. Coxlmm always had comparable or better prediction performance relative to coxlasso and coxenet with an average of 2.8% prediction improvement across the 16 low-censored cancers. In addition, it is shown that the predictive accuracy of coxlmm generally increases with the number of genes included. The survival variance partition analysis demonstrates that the transcriptomic contribution was higher for some cancers (e.g., LGG, CESC, PAAD, SKCM, and SARC) and lower for others (e.g., BRCA, COAD, KIRC, and STAD). Conclusion This study demonstrates that the integration of transcriptomic information can substantially improve prognostic prediction accuracy, but the prediction performance is cancer-specific and varies across cancer types. 
It further reveals that gene expression exhibits distinct contributions to survival variation across cancers.
Affiliation(s)
- Xinghao Yu
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ting Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Shuiping Huang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
|
38
|
Weichenhan D, Lipka DB, Lutsik P, Goyal A, Plass C. Epigenomic technologies for precision oncology. Semin Cancer Biol 2020; 84:60-68. [PMID: 32822861 DOI: 10.1016/j.semcancer.2020.08.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/06/2020] [Revised: 07/28/2020] [Accepted: 08/03/2020] [Indexed: 12/15/2022]
Abstract
Epigenetic patterns in a cell control the expression of genes and consequently determine the phenotype of a cell. Cancer cells possess altered epigenomes, which include aberrant patterns of DNA methylation, histone tail modifications, nucleosome positioning, and three-dimensional chromatin organization within a nucleus. These altered epigenetic patterns are potentially useful biomarkers to detect cancer cells and to classify tumor types. In addition, the cancer epigenome dictates the response of a cancer cell to therapeutic intervention and, therefore, knowledge of it will allow prediction of response to different therapeutic approaches. Here we review the current state-of-the-art technologies that have been developed to decipher epigenetic patterns on the genomic level and discuss how these methods are potentially useful for precision oncology.
Affiliation(s)
- Dieter Weichenhan
- German Cancer Research Center Heidelberg, Cancer Epigenomics (B370), Im Neuenheimer Feld 280, D-69120, Heidelberg, Germany.
| | - Daniel B Lipka
- Section of Translational Cancer Epigenomics, Division of Translational Medical Oncology, National Center for Tumor Diseases Heidelberg & German Cancer Research Center, Im Neuenheimer Feld 581, D-69120, Heidelberg, Germany; Faculty of Medicine, Medical Center, Otto-von-Guericke-University, Leipziger Straße 44, D-39120, Magdeburg, Germany.
| | - Pavlo Lutsik
- German Cancer Research Center Heidelberg, Cancer Epigenomics (B370), Im Neuenheimer Feld 280, D-69120, Heidelberg, Germany.
| | - Ashish Goyal
- German Cancer Research Center Heidelberg, Cancer Epigenomics (B370), Im Neuenheimer Feld 280, D-69120, Heidelberg, Germany.
| | - Christoph Plass
- German Cancer Research Center Heidelberg, Cancer Epigenomics (B370), Im Neuenheimer Feld 280, D-69120, Heidelberg, Germany.
|
39
|
Shanthikumar S, Neeland MR, Maksimovic J, Ranganathan SC, Saffery R. DNA methylation biomarkers of future health outcomes in children. Mol Cell Pediatr 2020; 7:7. [PMID: 32642955 PMCID: PMC7343681 DOI: 10.1186/s40348-020-00099-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/15/2020] [Accepted: 06/25/2020] [Indexed: 11/10/2022] Open
Abstract
Biomarkers which predict future health outcomes are key to the goals of precision health. Such biomarkers do not have to be involved in the causal pathway of a disease, and their performance is best assessed using statistical tests of clinical performance and evaluation of net health impact. DNA methylation is the most commonly studied epigenetic process and represents a potential biomarker of future health outcomes. We review 25 studies in non-oncological paediatric conditions where DNA methylation biomarkers of future health outcomes are assessed. Whilst a number of positive findings have been described, the body of evidence is severely limited by issues with outcome measures, tissue-specific samples, accounting for sample cell type heterogeneity, lack of appropriate statistical testing, small effect sizes, limited validation, and no assessment of net health impact. Future studies should concentrate on careful study design to overcome these issues, and integration of DNA methylation data with other 'omic', clinical, and environmental data to generate the most clinically useful biomarkers of paediatric disease.
Affiliation(s)
- Shivanthan Shanthikumar
- Respiratory and Sleep Medicine, Royal Children's Hospital, Flemington Road, Parkville, Melbourne, Victoria, 3052, Australia
- Respiratory Diseases, Murdoch Children's Research Institute, Melbourne, Australia
- Department of Paediatrics, The University of Melbourne, Melbourne, Australia
| | - Melanie R Neeland
- Department of Paediatrics, The University of Melbourne, Melbourne, Australia
- Epigenetics, Murdoch Children's Research Institute, Melbourne, Australia
| | - Jovana Maksimovic
- Respiratory Diseases, Murdoch Children's Research Institute, Melbourne, Australia
- Department of Paediatrics, The University of Melbourne, Melbourne, Australia
- Computational Biology, Peter MacCallum Cancer Centre, Melbourne, Australia
| | - Sarath C Ranganathan
- Respiratory and Sleep Medicine, Royal Children's Hospital, Flemington Road, Parkville, Melbourne, Victoria, 3052, Australia
- Respiratory Diseases, Murdoch Children's Research Institute, Melbourne, Australia
- Department of Paediatrics, The University of Melbourne, Melbourne, Australia
| | - Richard Saffery
- Department of Paediatrics, The University of Melbourne, Melbourne, Australia
- Epigenetics, Murdoch Children's Research Institute, Melbourne, Australia
|
40
|
Nicora G, Vitali F, Dagliati A, Geifman N, Bellazzi R. Integrated Multi-Omics Analyses in Oncology: A Review of Machine Learning Methods and Tools. Front Oncol 2020; 10:1030. [PMID: 32695678 PMCID: PMC7338582 DOI: 10.3389/fonc.2020.01030] [Citation(s) in RCA: 126] [Impact Index Per Article: 25.2] [Received: 01/30/2020] [Accepted: 05/26/2020] [Indexed: 12/16/2022] Open
Abstract
In recent years, high-throughput sequencing technologies have provided an unprecedented opportunity to depict cancer samples at multiple molecular levels. The integration and analysis of these multi-omics datasets is a crucial step to gain actionable knowledge in a precision medicine framework. This paper explores recent data-driven methodologies that have been developed and applied to respond to major challenges of stratified medicine in oncology, including patients' phenotyping, biomarker discovery, and drug repurposing. We systematically retrieved peer-reviewed journal articles published from 2014 to 2019, then selected and thoroughly described the tools presenting the most promising innovations regarding the integration of heterogeneous data, the machine learning methodologies that successfully tackled the complexity of multi-omics data, and the frameworks to deliver actionable results for clinical practice. The review is organized according to the applied methods: Deep Learning, Network-based Methods, Clustering, Feature Extraction and Transformation, and Factorization. We provide an overview of the tools available in each methodological group and underline the relationship among the different categories. Our analysis revealed how multi-omics datasets could be exploited to drive precision oncology, but also current limitations in the development of multi-omics data integration.
Affiliation(s)
- Giovanna Nicora
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | - Francesca Vitali
- Center for Innovation in Brain Science, University of Arizona, Tucson, AZ, United States
- Department of Neurology, College of Medicine, University of Arizona, Tucson, AZ, United States
- Center for Biomedical Informatics and Biostatistics, University of Arizona, Tucson, AZ, United States
| | - Arianna Dagliati
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Centre for Health Informatics, The University of Manchester, Manchester, United Kingdom
- The Manchester Molecular Pathology Innovation Centre, The University of Manchester, Manchester, United Kingdom
| | - Nophar Geifman
- Centre for Health Informatics, The University of Manchester, Manchester, United Kingdom
- The Manchester Molecular Pathology Innovation Centre, The University of Manchester, Manchester, United Kingdom
| | - Riccardo Bellazzi
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
|
41
|
Zhao N, Guo M, Wang K, Zhang C, Liu X. Identification of Pan-Cancer Prognostic Biomarkers Through Integration of Multi-Omics Data. Front Bioeng Biotechnol 2020; 8:268. [PMID: 32300588 PMCID: PMC7142216 DOI: 10.3389/fbioe.2020.00268] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/02/2020] [Accepted: 03/13/2020] [Indexed: 01/09/2023] Open
Abstract
Prognostic biomarkers that can guide cancer treatment are very difficult to identify. Although high-throughput sequencing technology allows us to mine prognostic biomarkers much more deeply by analyzing omics data, there is a lack of effective methods to comprehensively utilize multi-omics data. In this work, we integrated multi-omics data [DNA methylation (DM), gene expression (GE), somatic copy number alteration, and microRNA expression (ME)] and proposed a method to rank genes by designing a “Score.” Applying the method, cancer-specific prognostic biomarkers for 13 cancers were obtained. The prognostic powers of the biomarkers were further assessed by C-indexes (ranging from 0.76 to 0.96). Moreover, by comparing the 13 survival-related gene lists, seven genes (SLK, API5, BTBD2, PTAR1, VPS37A, EIF2B1, and ZRANB1) were found to be associated with prognosis in a variety of cancers. In particular, SLK was more likely to be cancer-related due to its high missense mutation rate and its association with cell adhesion. Furthermore, after network analysis, EPRS, HNRNPA2B1, BPTF, LRRK1, and PUM1 were demonstrated to have a broad correlation with cancers. In summary, our method achieves a better integration of multi-omics data and can be extended to research on other diseases. The prognostic biomarkers also had better prognostic power than those from previous methods. Our results could provide a reference for translational medicine researchers and clinicians.
Affiliation(s)
- Ning Zhao
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| | - Maozu Guo
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
- Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing University of Civil Engineering and Architecture, Beijing, China
| | - Kuanquan Wang
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Chunlong Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Xiaoyan Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
42
|
Singh NP, Vinod PK. Integrative analysis of DNA methylation and gene expression in papillary renal cell carcinoma. Mol Genet Genomics 2020; 295:807-824. [PMID: 32185457 DOI: 10.1007/s00438-020-01664-y] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 05/01/2019] [Accepted: 03/03/2020] [Indexed: 12/18/2022]
Abstract
Patterns of DNA methylation are significantly altered in cancers. Interpreting the functional consequences of DNA methylation requires the integration of multiple forms of data. The recent advancement in the next-generation sequencing can help to decode this relationship and in biomarker discovery. In this study, we investigated the methylation patterns of papillary renal cell carcinoma (PRCC) and its relationship with the gene expression using The Cancer Genome Atlas (TCGA) multi-omics data. We found that the promoter and body of tumor suppressor genes, microRNAs and gene clusters and families, including cadherins, protocadherins, claudins and collagens, are hypermethylated in PRCC. Hypomethylated genes in PRCC are associated with the immune function. The gene expression of several novel candidate genes, including interleukin receptor IL17RE and immune checkpoint genes HHLA2, SIRPA and HAVCR2, shows a significant correlation with DNA methylation. We also developed machine learning models using features extracted from single and multi-omics data to distinguish early and late stages of PRCC. A comparative study of different feature selection algorithms, predictive models, data integration techniques and representations of methylation data was performed. Integration of both gene expression and DNA methylation features improved the performance of models in distinguishing tumor stages. In summary, our study identifies PRCC driver genes and proposes predictive models based on both DNA methylation and gene expression. These results on PRCC will aid in targeted experiments and provide a strategy to improve the classification accuracy of tumor stages.
Affiliation(s)
- Noor Pratap Singh
- Center for Computational Natural Sciences and Bioinformatics, IIIT Hyderabad, Hyderabad, 500032, India
| | - P K Vinod
- Center for Computational Natural Sciences and Bioinformatics, IIIT Hyderabad, Hyderabad, 500032, India.
|
43
|
Guven DC, Aktas BY, Simsek C, Aksoy S. Gut microbiota and cancer immunotherapy: prognostic and therapeutic implications. Future Oncol 2020; 16:497-506. [PMID: 32100550 DOI: 10.2217/fon-2019-0783] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 02/08/2023] Open
Abstract
Immune checkpoint inhibitors have opened new horizons in oncology. Although the indications for the use of immune checkpoint inhibitors in cancer patients are expanding, there is still a need for markers that can aid in patient selection. The gastrointestinal microbiota may be among these markers. Recently, the gastrointestinal microbiota has been reported to have a bidirectional relationship with cancer immunotherapy, with both prognostic and therapeutic roles. Preclinical data suggest that modulation of the microbiota could become a novel strategy for improving the efficacy of immunotherapy. However, its labile structure is prone to being affected by many factors. Further research that delineates the mechanisms of the relationship between the microbiota and immunotherapy could have clinical implications.
Affiliation(s)
- Deniz Can Guven
- Department of Medical Oncology, Hacettepe University Cancer Institute, Ankara 06100, Turkey
| | - Burak Yasin Aktas
- Department of Medical Oncology, Hacettepe University Cancer Institute, Ankara 06100, Turkey
| | - Cem Simsek
- Department of Gastroenterology, Hacettepe University Faculty of Medicine, Ankara 06100, Turkey
| | - Sercan Aksoy
- Department of Medical Oncology, Hacettepe University Cancer Institute, Ankara 06100, Turkey
|
44
|
Hamamoto R, Komatsu M, Takasawa K, Asada K, Kaneko S. Epigenetics Analysis and Integrated Analysis of Multiomics Data, Including Epigenetic Data, Using Artificial Intelligence in the Era of Precision Medicine. Biomolecules 2019; 10:biom10010062. [PMID: 31905969 PMCID: PMC7023005 DOI: 10.3390/biom10010062] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 12/01/2019] [Revised: 12/20/2019] [Accepted: 12/27/2019] [Indexed: 12/14/2022] Open
Abstract
To clarify the mechanisms of diseases, such as cancer, studies analyzing genetic mutations have been actively conducted for a long time, and a large number of achievements have already been reported. Indeed, genomic medicine is considered the core discipline of precision medicine, and currently, the clinical application of cutting-edge genomic medicine aimed at improving the prevention, diagnosis and treatment of a wide range of diseases is promoted. However, although the Human Genome Project was completed in 2003 and large-scale genetic analyses have since been accomplished worldwide with the development of next-generation sequencing (NGS), explaining the mechanism of disease onset only using genetic variation has been recognized as difficult. Meanwhile, the importance of epigenetics, which describes inheritance by mechanisms other than the genomic DNA sequence, has recently attracted attention, and, in particular, many studies have reported the involvement of epigenetic deregulation in human cancer. So far, given that genetic and epigenetic studies tend to be accomplished independently, physiological relationships between genetics and epigenetics in diseases remain almost unknown. Since this situation may be a disadvantage to developing precision medicine, the integrated understanding of genetic variation and epigenetic deregulation appears to be now critical. Importantly, the current progress of artificial intelligence (AI) technologies, such as machine learning and deep learning, is remarkable and enables multimodal analyses of big omics data. In this regard, it is important to develop a platform that can conduct multimodal analysis of medical big data using AI as this may accelerate the realization of precision medicine. In this review, we discuss the importance of genome-wide epigenetic and multiomics analyses using AI in the era of precision medicine.
Affiliation(s)
- Ryuji Hamamoto
- Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Correspondence: Tel.: +81-3-3547-5271
| | - Masaaki Komatsu
- Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Ken Takasawa
- Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Ken Asada
- Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Syuzo Kaneko
- Division of Molecular Modification and Cancer Biology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
|
45
|
Hao J, Kim Y, Mallavarapu T, Oh JH, Kang M. Interpretable deep neural network for cancer survival analysis by integrating genomic and clinical data. BMC Med Genomics 2019; 12:189. [PMID: 31865908 PMCID: PMC6927105 DOI: 10.1186/s12920-019-0624-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 12/29/2022] Open
Abstract
Background Understanding the complex biological mechanisms of cancer patient survival using genomic and clinical data is vital, not only to develop new treatments for patients, but also to improve survival prediction. However, highly nonlinear and high-dimension, low-sample size (HDLSS) data pose computational challenges for applying conventional survival analysis. Results We propose a novel biologically interpretable pathway-based sparse deep neural network, named Cox-PASNet, which integrates high-dimensional gene expression data and clinical data on a simple neural network architecture for survival analysis. Cox-PASNet is biologically interpretable in that nodes in the neural network correspond to biological genes and pathways, while capturing the nonlinear and hierarchical effects of biological pathways associated with cancer patient survival. We also propose a heuristic optimization solution to train Cox-PASNet with HDLSS data. Cox-PASNet was intensively evaluated by comparing its predictive performance with that of current state-of-the-art methods on glioblastoma multiforme (GBM) and ovarian serous cystadenocarcinoma (OV) cancer. In the experiments, Cox-PASNet outperformed the benchmarking methods. Moreover, the neural network architecture of Cox-PASNet was biologically interpreted, and several significant prognostic factors of genes and biological pathways were identified. Conclusions Cox-PASNet models biological mechanisms in the neural network by incorporating biological pathway databases and sparse coding. The neural network of Cox-PASNet can identify nonlinear and hierarchical associations of genomic and clinical data to cancer patient survival. The open-source code of Cox-PASNet in PyTorch implemented for training, evaluation, and model interpretation is available at: https://github.com/DataX-JieHao/Cox-PASNet.
Affiliation(s)
- Jie Hao
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
| | - Youngsoon Kim
- Department of Computer Science, Kennesaw State University, Marietta, GA, USA
| | | | - Jung Hun Oh
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Mingon Kang
- Department of Computer Science, University of Nevada, Las Vegas, Las Vegas, NV, USA.
|
46
|
Scala G, Federico A, Fortino V, Greco D, Majello B. Knowledge Generation with Rule Induction in Cancer Omics. Int J Mol Sci 2019; 21:E18. [PMID: 31861438 PMCID: PMC6981587 DOI: 10.3390/ijms21010018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/17/2019] [Revised: 11/26/2019] [Accepted: 12/13/2019] [Indexed: 12/21/2022] Open
Abstract
The explosion of omics data availability in cancer research has boosted the knowledge of the molecular basis of cancer, although the strategies for its definitive resolution are still not well established. The complexity of cancer biology, given by the high heterogeneity of cancer cells, leads to the development of pharmacoresistance in many patients, hampering the efficacy of therapeutic approaches. Machine learning techniques have been implemented to extract knowledge from cancer omics data in order to address fundamental issues in cancer research, as well as to classify clinically relevant subgroups of patients and to identify biomarkers for disease risk and prognosis. Rule induction algorithms are a group of pattern discovery approaches that represent discovered relationships in the form of human-readable associative rules. The application of such techniques to the modern plethora of collected cancer omics data can effectively boost our understanding of cancer-related mechanisms. In fact, the capability of these methods to extract a huge amount of human-readable knowledge will eventually help to uncover unknown relationships between molecular attributes and the malignant phenotype. In this review, we describe applications and strategies for the usage of rule induction approaches in cancer omics data analysis. In particular, we explore the canonical applications and the future challenges and opportunities posed by multi-omics integration problems.
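To make "human-readable associative rules" concrete, the sketch below scores a single IF-THEN rule on a toy table using support and confidence, two measures commonly used in association rule mining. The gene names, expression labels, and samples are hypothetical placeholders, not data from the review, and the review may discuss other scoring criteria.

```python
def rule_metrics(samples, antecedent, consequent):
    """Return (support, confidence) of the rule IF antecedent THEN consequent.

    samples    : list of dicts mapping attribute name -> discretized value
    antecedent : dict of attribute values that must all hold (the IF part)
    consequent : dict of attribute values predicted by the rule (the THEN part)
    """
    def matches(sample, conditions):
        return all(sample.get(key) == value for key, value in conditions.items())

    n_antecedent = sum(matches(s, antecedent) for s in samples)
    n_both = sum(
        matches(s, antecedent) and matches(s, consequent) for s in samples
    )
    support = n_both / len(samples)  # fraction of samples where the whole rule holds
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical discretized expression table (illustrative values only)
samples = [
    {"GENE_A": "high", "GENE_B": "low", "phenotype": "tumor"},
    {"GENE_A": "high", "GENE_B": "low", "phenotype": "tumor"},
    {"GENE_A": "high", "GENE_B": "low", "phenotype": "normal"},
    {"GENE_A": "low", "GENE_B": "low", "phenotype": "normal"},
    {"GENE_A": "high", "GENE_B": "high", "phenotype": "tumor"},
]

# Rule: IF GENE_A is high AND GENE_B is low THEN phenotype is tumor
support, confidence = rule_metrics(
    samples,
    antecedent={"GENE_A": "high", "GENE_B": "low"},
    consequent={"phenotype": "tumor"},
)
```

Here the rule holds in 2 of 5 samples (support 0.4) and in 2 of the 3 samples that satisfy its IF part (confidence about 0.67). A rule induction algorithm would search over many candidate rules and keep those that score well on measures of this kind.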
Affiliation(s)
- Giovanni Scala
- Department of Biology, University of Naples Federico II, 80126 Naples, Italy
- Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, 33014 Tampere, Finland
- Vittorio Fortino
- Institute of Biomedicine, University of Eastern Finland, 70210 Kuopio, Finland
- Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, 33014 Tampere, Finland
- Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
- Barbara Majello
- Department of Biology, University of Naples Federico II, 80126 Naples, Italy
47
Ulfenborg B. Vertical and horizontal integration of multi-omics data with miodin. BMC Bioinformatics 2019; 20:649. [PMID: 31823712 PMCID: PMC6902525 DOI: 10.1186/s12859-019-3224-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 03/14/2019] [Accepted: 11/14/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Studies on multiple modalities of omics data such as transcriptomics, genomics and proteomics are growing in popularity, since they allow us to investigate complex mechanisms across molecular layers. It is widely recognized that integrative omics analysis holds the promise to unlock novel and actionable biological insights into health and disease. Integration of multi-omics data remains challenging, however, and requires combination of several software tools and extensive technical expertise to account for the properties of heterogeneous data. RESULTS This paper presents the miodin R package, which provides a streamlined workflow-based syntax for multi-omics data analysis. The package allows users to perform analysis of omics data either across experiments on the same samples (vertical integration), or across studies on the same variables (horizontal integration). Workflows have been designed to promote transparent data analysis and reduce the technical expertise required to perform low-level data import and processing. CONCLUSIONS The miodin package is implemented in R and is freely available for use and extension under the GPL-3 license. Package source, reference documentation and user manual are available at https://gitlab.com/algoromics/miodin.
48
Pierre-Jean M, Deleuze JF, Le Floch E, Mauger F. Clustering and variable selection evaluation of 13 unsupervised methods for multi-omics data integration. Brief Bioinform 2019; 21:2011-2030. [PMID: 31792509 DOI: 10.1093/bib/bbz138] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 06/18/2019] [Revised: 10/08/2019] [Accepted: 10/09/2019] [Indexed: 12/22/2022]
Abstract
Recent advances in next-generation sequencing (NGS), microarrays and mass spectrometry for omics data production have enabled the generation and collection of different modalities of high-dimensional molecular data. The integration of multiple omics datasets is a statistical challenge, due to the limited number of individuals, the high number of variables and the heterogeneity of the datasets to integrate. Recently, many tools have been developed to solve the problem of integrating omics data, including canonical correlation analysis, matrix factorization and SM. These commonly used techniques aim to analyze two or more types of omics simultaneously. In this article, we compare a panel of 13 unsupervised methods based on these different approaches to integrate various types of multi-omics datasets: iClusterPlus, regularized generalized canonical correlation analysis, sparse generalized canonical correlation analysis, multiple co-inertia analysis (MCIA), integrative-NMF (intNMF), SNF, MoCluster, mixKernel, CIMLR, LRAcluster, ConsensusClustering, PINSPlus and multi-omics factor analysis (MOFA). We evaluate the ability of the methods to recover the subgroups and the variables that drive the clustering on eight simulated benchmarks. MOFA does not provide any results on these benchmarks. For clustering, SNF, MoCluster, CIMLR, LRAcluster, ConsensusClustering and intNMF provide the best results. For variable selection, MoCluster outperforms the others. However, the performance of the methods seems to depend on the heterogeneity of the datasets (especially for MCIA, intNMF and iClusterPlus). Finally, we apply the methods to three real studies with heterogeneous data and various phenotypes. We conclude that MoCluster is the best method to analyze these omics data. Availability: An R package named CrIMMix is available on GitHub at https://github.com/CNRGH/crimmix to reproduce all the results of this article.
49
Martini P, Chiogna M, Calura E, Romualdi C. MOSClip: multi-omic and survival pathway analysis for the identification of survival associated gene and modules. Nucleic Acids Res 2019; 47:e80. [PMID: 31049575 PMCID: PMC6698707 DOI: 10.1093/nar/gkz324] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/22/2018] [Revised: 03/29/2019] [Accepted: 04/29/2019] [Indexed: 01/09/2023]
Abstract
Survival analysis of gene expression data has been a useful and widely used approach in clinical applications. However, in complex diseases such as cancer, the identification of survival-associated cell processes, rather than single genes, provides more informative results because the efficacy of survival prediction increases when multiple prognostic features are combined to enlarge the possibility of having druggable targets. Moreover, genome-wide screening in molecular medicine has rapidly grown, providing not only gene expression but also multi-omic measurements such as DNA mutations, methylation, expression, and copy number data. In cancer, virtually all these aberrations can contribute in synergy to pathological processes, and their measurements can improve a patient’s outcome and help in diagnosis and treatment decisions. Here, we present MOSClip, an R package implementing a new topological pathway analysis tool able to integrate multi-omic data and look for survival-associated gene modules. MOSClip tests the survival association of dimensionality-reduced multi-omic data using multivariate models, providing graphical devices for management, browsing and interpretation of results. Using simulated data, we evaluated MOSClip performance in terms of false positives and false negatives in different settings, while the TCGA ovarian cancer dataset is used as a case study to highlight MOSClip’s potential.
Affiliation(s)
- Paolo Martini
- Department of Biology, University of Padova, Via U. Bassi 58B, 35121 Padova, Italy
- Monica Chiogna
- Department of Statistical Sciences 'Paolo Fortunati', University of Bologna, via delle Belle Arti 41, 40126 Bologna, Italy
- Enrica Calura
- Department of Biology, University of Padova, Via U. Bassi 58B, 35121 Padova, Italy
- Chiara Romualdi
- Department of Biology, University of Padova, Via U. Bassi 58B, 35121 Padova, Italy
50
De Bin R, Boulesteix AL, Benner A, Becker N, Sauerbrei W. Combining clinical and molecular data in regression prediction models: insights from a simulation study. Brief Bioinform 2019; 21:1904-1919. [PMID: 31750518 DOI: 10.1093/bib/bbz136] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/29/2019] [Revised: 09/20/2019] [Accepted: 10/07/2019] [Indexed: 12/15/2022]
Abstract
Data integration, i.e. the use of different sources of information for data analysis, is becoming one of the most important topics in modern statistics. Especially in, but not limited to, biomedical applications, a relevant issue is the combination of low-dimensional (e.g. clinical data) and high-dimensional (e.g. molecular data such as gene expressions) data sources in a prediction model. Not only the different characteristics of the data, but also the complex correlation structure within and between the two data sources, pose challenging issues. In this paper, we investigate these issues via simulations, providing some useful insight into strategies to combine low- and high-dimensional data in a regression prediction model. In particular, we focus on the effect of the correlation structure on the results, while accounting for the influence of our specific choices in the design of the simulation study.
Affiliation(s)
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, University of Munich, Germany
- Axel Benner
- Division of Biostatistics, German Cancer Research Centre of Heidelberg, Germany
- Natalia Becker
- Division of Biostatistics, German Cancer Research Centre of Heidelberg, Germany
- Willi Sauerbrei
- Institute of Medical Biometry and Statistics, University of Freiburg, Germany