1
Singh Y, Hathaway QA, Erickson BJ. Generative AI in oncological imaging: Revolutionizing cancer detection and diagnosis. Oncotarget 2024; 15:607-608. [PMID: 39236061 PMCID: PMC11376594 DOI: 10.18632/oncotarget.28640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2024] Open
Abstract
Generative AI is revolutionizing oncological imaging, enhancing cancer detection and diagnosis. This editorial explores its impact on expanding datasets, improving image quality, and enabling predictive oncology. We discuss ethical considerations and introduce a unique perspective on personalized cancer screening using AI-generated digital twins. This approach could optimize screening protocols, improve early detection, and tailor treatment plans. While challenges remain, generative AI in oncological imaging offers unprecedented opportunities to advance cancer care and improve patient outcomes.
2
Valous NA, Popp F, Zörnig I, Jäger D, Charoentong P. Graph machine learning for integrated multi-omics analysis. Br J Cancer 2024; 131:205-211. [PMID: 38729996 PMCID: PMC11263675 DOI: 10.1038/s41416-024-02706-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 04/25/2024] [Accepted: 04/26/2024] [Indexed: 05/12/2024] Open
Abstract
Multi-omics experiments at bulk or single-cell resolution facilitate the discovery of hypothesis-generating biomarkers for predicting response to therapy, as well as aid in uncovering mechanistic insights into cellular and microenvironmental processes. Many methods for data integration have been developed for the identification of key elements that explain or predict disease risk or other biological outcomes. The heterogeneous graph representation of multi-omics data provides an advantage for discerning patterns suitable for predictive/exploratory analysis, thus permitting the modeling of complex relationships. Graph-based approaches, including graph neural networks, potentially offer a reliable methodological toolset that can provide a tangible alternative to scientists and clinicians who seek ideas and implementation strategies in the integrated analysis of their omics sets for biomedical research. Graph-based workflows continue to push the limits of the technological envelope, and this perspective provides a focused literature review of research articles in which graph machine learning is utilized for integrated multi-omics data analyses, with several examples that demonstrate the effectiveness of graph-based approaches.
Affiliation(s)
- Nektarios A Valous
  - Applied Tumor Immunity Clinical Cooperation Unit, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany.
  - Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany.
- Ferdinand Popp
  - Applied Tumor Immunity Clinical Cooperation Unit, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
  - Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
- Inka Zörnig
  - Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany
  - Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital (UKHD), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
- Dirk Jäger
  - Applied Tumor Immunity Clinical Cooperation Unit, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
  - Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany
  - Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital (UKHD), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
- Pornpimol Charoentong
  - Center for Quantitative Analysis of Molecular and Cellular Biosystems (Bioquant), Heidelberg University, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany
  - Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital (UKHD), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
3
Chakraborty S, Sharma G, Karmakar S, Banerjee S. Multi-OMICS approaches in cancer biology: New era in cancer therapy. Biochim Biophys Acta Mol Basis Dis 2024; 1870:167120. [PMID: 38484941 DOI: 10.1016/j.bbadis.2024.167120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 03/06/2024] [Accepted: 03/06/2024] [Indexed: 04/01/2024]
Abstract
Innovative multi-omics frameworks integrate diverse datasets from the same patients to enhance our understanding of the molecular and clinical aspects of cancers. Advanced omics and multi-view clustering algorithms present unprecedented opportunities for classifying cancers into subtypes, refining survival predictions and treatment outcomes, and unravelling key pathophysiological processes across various molecular layers. However, with the increasing availability of cost-effective high-throughput technologies (HTT) that generate vast amounts of data, analyzing single layers often falls short of establishing causal relations. Integrating multi-omics data spanning genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes offers unique prospects to comprehend the underlying biology of complex diseases like cancer. This discussion explores algorithmic frameworks designed to uncover cancer subtypes, disease mechanisms, and methods for identifying pivotal genomic alterations. It also underscores the significance of multi-omics in tumor classifications, diagnostics, and prognostications. Despite its unparalleled advantages, the integration of multi-omics data has been slow to find its way into everyday clinics. A major hurdle is the uneven maturity of different omics approaches and the widening gap between the generation of large datasets and the capacity to process this data. Initiatives promoting the standardization of sample processing and analytical pipelines, as well as multidisciplinary training for experts in data analysis and interpretation, are crucial for translating theoretical findings into practical applications.
Affiliation(s)
- Sohini Chakraborty
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Gaurav Sharma
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Sricheta Karmakar
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Satarupa Banerjee
  - Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.
4
Hussein R, Abou-Shanab AM, Badr E. A multi-omics approach for biomarker discovery in neuroblastoma: a network-based framework. NPJ Syst Biol Appl 2024; 10:52. [PMID: 38760476 PMCID: PMC11101461 DOI: 10.1038/s41540-024-00371-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 04/16/2024] [Indexed: 05/19/2024] Open
Abstract
Neuroblastoma (NB) is one of the leading causes of cancer-associated death in children. MYCN amplification is a prominent genetic marker for NB, and its targeting to halt NB progression is difficult to achieve. Therefore, an in-depth understanding of the molecular interactome of NB is needed to improve treatment outcomes. Analysis of NB multi-omics unravels valuable insight into the interplay between MYCN transcriptional and miRNA post-transcriptional modulation. Moreover, it aids in the identification of various miRNAs that participate in NB development and progression. This study proposes an integrated computational framework with three levels of high-throughput NB data (mRNA-seq, miRNA-seq, and methylation array). Similarity Network Fusion (SNF) and ranked SNF methods were utilized to identify essential genes and miRNAs. The specified genes included both miRNA-target genes and transcription factors (TFs). The interactions between TFs and miRNAs and between miRNAs and their target genes were retrieved, and a regulatory network was developed from them. Finally, an interaction network-based analysis was performed to identify candidate biomarkers. The candidate biomarkers were further analyzed for their potential use in prognosis and diagnosis. The candidate biomarkers included three TFs and seven miRNAs. Four biomarkers have been previously studied and tested in NB, while the remaining identified biomarkers have known roles in other types of cancer. Although the specific molecular role is yet to be addressed, most identified biomarkers possess evidence of involvement in NB tumorigenesis. Analyzing cellular interactome to identify potential biomarkers is a promising approach that can contribute to optimizing efficient therapeutic regimens to target NB vulnerabilities.
Affiliation(s)
- Rahma Hussein
  - Biomedical Sciences Program, University of Science and Technology, Zewail City of Science and Technology, Giza, 12578, Egypt
- Ahmed M Abou-Shanab
  - Biomedical Sciences Program, University of Science and Technology, Zewail City of Science and Technology, Giza, 12578, Egypt
- Eman Badr
  - Biomedical Sciences Program, University of Science and Technology, Zewail City of Science and Technology, Giza, 12578, Egypt.
  - Faculty of Computers and Artificial Intelligence, Cairo University, Giza, 12613, Egypt.
5
Tang X, Prodduturi N, Thompson KJ, Weinshilboum RM, O'Sullivan CC, Boughey JC, Tizhoosh H, Klee EW, Wang L, Goetz MP, Suman V, Kalari KR. OmicsFootPrint: a framework to integrate and interpret multi-omics data using circular images and deep neural networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.21.586001. [PMID: 38585820 PMCID: PMC10996492 DOI: 10.1101/2024.03.21.586001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
The OmicsFootPrint framework addresses the need for advanced multi-omics data analysis methodologies by transforming data into intuitive two-dimensional circular images and facilitating the interpretation of complex diseases. Utilizing Deep Neural Networks and incorporating the SHapley Additive exPlanations (SHAP) algorithm, the framework enhances model interpretability. Tested with The Cancer Genome Atlas (TCGA) data, OmicsFootPrint effectively classified lung and breast cancer subtypes, achieving high Area Under Curve (AUC) scores (0.98±0.02 for lung cancer subtype differentiation and 0.83±0.07 for breast cancer PAM50 subtypes), and successfully distinguished between invasive lobular and ductal carcinomas in breast cancer, showcasing its robustness. It also demonstrated notable performance in predicting drug responses in cancer cell lines, with a median AUC of 0.74, surpassing existing algorithms. Furthermore, its effectiveness persists even with reduced training sample sizes. OmicsFootPrint marks an enhancement in multi-omics research, offering a novel, efficient, and interpretable approach that contributes to a deeper understanding of disease mechanisms.
6
Chafai N, Bonizzi L, Botti S, Badaoui B. Emerging applications of machine learning in genomic medicine and healthcare. Crit Rev Clin Lab Sci 2024; 61:140-163. [PMID: 37815417 DOI: 10.1080/10408363.2023.2259466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 09/12/2023] [Indexed: 10/11/2023]
Abstract
The integration of artificial intelligence technologies has propelled the progress of clinical and genomic medicine in recent years. The significant increase in computing power has facilitated the ability of artificial intelligence models to analyze and extract features from extensive medical data and images, thereby contributing to the advancement of intelligent diagnostic tools. Artificial intelligence (AI) models have been utilized in the field of personalized medicine to integrate clinical data and genomic information of patients. This integration allows for the identification of customized treatment recommendations, ultimately leading to enhanced patient outcomes. Notwithstanding the notable advancements, the application of artificial intelligence (AI) in the field of medicine is impeded by various obstacles such as the limited availability of clinical and genomic data, the diversity of datasets, ethical implications, and the inconclusive interpretation of AI models' results. In this review, a comprehensive evaluation of multiple machine learning algorithms utilized in the fields of clinical and genomic medicine is conducted. Furthermore, we present an overview of the implementation of artificial intelligence (AI) in the fields of clinical medicine, drug discovery, and genomic medicine. Finally, a number of constraints pertaining to the implementation of artificial intelligence within the healthcare industry are examined.
Affiliation(s)
- Narjice Chafai
  - Laboratory of Biodiversity, Ecology, and Genome, Faculty of Sciences, Department of Biology, Mohammed V University in Rabat, Rabat, Morocco
- Luigi Bonizzi
  - Department of Biomedical, Surgical and Dental Science, University of Milan, Milan, Italy
- Sara Botti
  - PTP Science Park, Via Einstein - Loc. Cascina Codazza, Lodi, Italy
- Bouabid Badaoui
  - Laboratory of Biodiversity, Ecology, and Genome, Faculty of Sciences, Department of Biology, Mohammed V University in Rabat, Rabat, Morocco
  - African Sustainable Agriculture Research Institute (ASARI), Mohammed VI Polytechnic University (UM6P), Laâyoune, Morocco
7
Luo H, Liang H, Liu H, Fan Z, Wei Y, Yao X, Cong S. TEMINET: A Co-Informative and Trustworthy Multi-Omics Integration Network for Diagnostic Prediction. Int J Mol Sci 2024; 25:1655. [PMID: 38338932 PMCID: PMC10855161 DOI: 10.3390/ijms25031655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 01/20/2024] [Accepted: 01/26/2024] [Indexed: 02/12/2024] Open
Abstract
Advancing the domain of biomedical investigation, integrated multi-omics data have shown exceptional performance in elucidating complex human diseases. However, as the variety of omics information expands, precisely perceiving the informativeness of intra- and inter-omics becomes challenging due to the intricate interrelations, thus presenting significant challenges in the integration of multi-omics data. To address this, we introduce a novel multi-omics integration approach, referred to as TEMINET. This approach enhances diagnostic prediction by leveraging an intra-omics co-informative representation module and a trustworthy learning strategy used to address inter-omics fusion. Considering the multifactorial nature of complex diseases, TEMINET utilizes intra-omics features to construct disease-specific networks; then, it applies graph attention networks and a multi-level framework to capture more collective informativeness than pairwise relations. To perceive the contribution of co-informative representations within intra-omics, we designed a trustworthy learning strategy to identify the reliability of each omics in integration. To integrate inter-omics information, a combined-beliefs fusion approach is deployed to harmonize the trustworthy representations of different omics types effectively. Our experiments across four different diseases using mRNA, methylation, and miRNA data demonstrate that TEMINET achieves advanced performance and robustness in classification tasks.
Affiliation(s)
- Haoran Luo
  - Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China; (H.L.); (Z.F.)
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; (H.L.); (H.L.); (Y.W.)
- Hong Liang
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; (H.L.); (H.L.); (Y.W.)
- Hongwei Liu
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; (H.L.); (H.L.); (Y.W.)
- Zhoujie Fan
  - Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China; (H.L.); (Z.F.)
- Yanhui Wei
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; (H.L.); (H.L.); (Y.W.)
- Xiaohui Yao
  - Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China; (H.L.); (Z.F.)
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; (H.L.); (H.L.); (Y.W.)
- Shan Cong
  - Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China; (H.L.); (Z.F.)
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China; (H.L.); (H.L.); (Y.W.)
8
Maiorino E, De Marzio M, Xu Z, Yun JH, Chase RP, Hersh CP, Weiss ST, Silverman EK, Castaldi PJ, Glass K. Joint clinical and molecular subtyping of COPD with variational autoencoders. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2023.08.19.23294298. [PMID: 38260473 PMCID: PMC10802661 DOI: 10.1101/2023.08.19.23294298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Chronic Obstructive Pulmonary Disease (COPD) is a complex, heterogeneous disease. Traditional subtyping methods generally focus on either the clinical manifestations or the molecular endotypes of the disease, resulting in classifications that do not fully capture the disease's complexity. Here, we bridge this gap by introducing a subtyping pipeline that integrates clinical and gene expression data with variational autoencoders. We apply this methodology to the COPDGene study, a large study of current and former smoking individuals with and without COPD. Our approach generates a set of vector embeddings, called Personalized Integrated Profiles (PIPs), that recapitulate the joint clinical and molecular state of the subjects in the study. Prediction experiments show that the PIPs have a predictive accuracy comparable to or better than other embedding approaches. Using trajectory learning approaches, we analyze the main trajectories of variation in the PIP space and identify five well-separated subtypes with distinct clinical phenotypes, expression signatures, and disease outcomes. Notably, these subtypes are more robust to data resampling compared to those identified using traditional clustering approaches. Overall, our findings provide new avenues to establish fine-grained associations between the clinical characteristics, molecular processes, and disease outcomes of COPD.
Affiliation(s)
- Enrico Maiorino
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School
- Margherita De Marzio
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School
- Zhonghui Xu
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School
- Jeong H. Yun
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School
- Robert P. Chase
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School
- Craig P. Hersh
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School
- Scott T. Weiss
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School
- Edwin K. Silverman
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School
9
de Kok JWTM, van Rosmalen F, Koeze J, Keus F, van Kuijk SMJ, Castela Forte J, Schnabel RM, Driessen RGH, van Herpt TTW, Sels JWEM, Bergmans DCJJ, Lexis CPH, van Doorn WPTM, Meex SJR, Xu M, Borrat X, Cavill R, van der Horst ICC, van Bussel BCT. Deep embedded clustering generalisability and adaptation for integrating mixed datatypes: two critical care cohorts. Sci Rep 2024; 14:1045. [PMID: 38200252 PMCID: PMC10781731 DOI: 10.1038/s41598-024-51699-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 01/08/2024] [Indexed: 01/12/2024] Open
Abstract
We validated a Deep Embedded Clustering (DEC) model and its adaptation for integrating mixed datatypes (in this study, numerical and categorical variables). Deep Embedded Clustering (DEC) is a promising technique capable of managing extensive sets of variables and non-linear relationships. Nevertheless, DEC cannot adequately handle mixed datatypes. Therefore, we adapted DEC by replacing the autoencoder with an X-shaped variational autoencoder (XVAE) and optimising hyperparameters for cluster stability. We call this model "X-DEC". We compared DEC and X-DEC by reproducing a previous study that used DEC to identify clusters in a population of intensive care patients. We assessed internal validity based on cluster stability on the development dataset. Since generalisability of clustering models has insufficiently been validated on external populations, we assessed external validity by investigating cluster generalisability onto an external validation dataset. We concluded that both DEC and X-DEC resulted in clinically recognisable and generalisable clusters, but X-DEC produced much more stable clusters.
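As an illustration of what "cluster stability" can mean in practice, the sketch below scores the agreement between two cluster assignments of the same patients (for example, from two resampled runs) with the Rand index. The abstract does not state which stability measure the authors used, so the choice of the Rand index and the function name `rand_index` are assumptions made here for illustration only.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two
    samples in the same cluster, or both put them in different clusters.
    Hypothetical helper for illustration; not taken from the cited study.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two samples")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Two runs that find the same grouping under different label names
# agree on every pair, because only co-membership matters.
run_1 = [0, 0, 1, 1]
run_2 = [1, 1, 0, 0]
print(rand_index(run_1, run_2))
```

A value near 1 across many resampled runs would indicate stable clusters; in practice a chance-corrected variant such as the adjusted Rand index is often preferred.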
Affiliation(s)
- Jip W T M de Kok
  - Department of Intensive Care Medicine, Maastricht University Medical Centre+, P. Debyelaan, 25, 6229 HX, Maastricht, The Netherlands.
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, The Netherlands.
- Frank van Rosmalen
  - Department of Intensive Care Medicine, Maastricht University Medical Centre+, P. Debyelaan, 25, 6229 HX, Maastricht, The Netherlands
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, The Netherlands
- Jacqueline Koeze
  - Department of Critical Care, University Medical Centre Groningen, University of Groningen, Groningen, The Netherlands
- Frederik Keus
  - Department of Critical Care, University Medical Centre Groningen, University of Groningen, Groningen, The Netherlands
- Sander M J van Kuijk
  - Department of Clinical Epidemiology and Medical Technical Assessment, Maastricht University Medical Centre+, Maastricht, The Netherlands
- José Castela Forte
  - Department of Clinical Pharmacy and Pharmacology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Groningen, The Netherlands
- Ronny M Schnabel
  - Department of Intensive Care Medicine, Maastricht University Medical Centre+, P. Debyelaan, 25, 6229 HX, Maastricht, The Netherlands
- Rob G H Driessen
  - Department of Intensive Care Medicine, Maastricht University Medical Centre+, P. Debyelaan, 25, 6229 HX, Maastricht, The Netherlands
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, The Netherlands
  - Department of Cardiology, Maastricht University Medical Centre+, Maastricht, The Netherlands
- Thijs T W van Herpt
  - Department of Intensive Care Medicine, Maastricht University Medical Centre+, P. Debyelaan, 25, 6229 HX, Maastricht, The Netherlands
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, The Netherlands
- Jan-Willem E M Sels
  - Department of Intensive Care Medicine, Maastricht University Medical Centre+, P. Debyelaan, 25, 6229 HX, Maastricht, The Netherlands
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, The Netherlands
  - Department of Cardiology, Maastricht University Medical Centre+, Maastricht, The Netherlands
- Dennis C J J Bergmans
  - Department of Intensive Care Medicine, Maastricht University Medical Centre+, P. Debyelaan, 25, 6229 HX, Maastricht, The Netherlands
  - School of Nutrition and Translational Research in Metabolism (NUTRIM), Maastricht University, Maastricht, The Netherlands
- Chris P H Lexis
  - Department of Intensive Care Medicine, Maastricht University Medical Centre+, P. Debyelaan, 25, 6229 HX, Maastricht, The Netherlands
- William P T M van Doorn
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, The Netherlands
  - Department of Clinical Chemistry, Central Diagnostic Laboratory, Maastricht University Medical Center, Maastricht, The Netherlands
- Steven J R Meex
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, The Netherlands
  - Department of Clinical Chemistry, Central Diagnostic Laboratory, Maastricht University Medical Center, Maastricht, The Netherlands
- Minnan Xu
  - Takeda Pharmaceuticals, Deerfield, IL, USA
- Xavier Borrat
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Anaesthesiology and Critical Care Department, Hospital Clinic de Barcelona, Barcelona, Spain
  - Medical Informatics Department, Hospital Clinic de Barcelona, Barcelona, Spain
- Rachel Cavill
  - Department of Advanced Computing Sciences, Maastricht University, Maastricht, The Netherlands
- Iwan C C van der Horst
  - Department of Intensive Care Medicine, Maastricht University Medical Centre+, P. Debyelaan, 25, 6229 HX, Maastricht, The Netherlands
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, The Netherlands
- Bas C T van Bussel
  - Department of Intensive Care Medicine, Maastricht University Medical Centre+, P. Debyelaan, 25, 6229 HX, Maastricht, The Netherlands
  - Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, The Netherlands
  - Care and Public Health Research Institute (CAPHRI), Maastricht University, Maastricht, The Netherlands
10
Jurenaite N, León-Periñán D, Donath V, Torge S, Jäkel R. SetQuence & SetOmic: Deep set transformers for whole genome and exome tumour analysis. Biosystems 2024; 235:105095. [PMID: 38065399 DOI: 10.1016/j.biosystems.2023.105095] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/15/2023] [Revised: 10/17/2023] [Accepted: 11/28/2023] [Indexed: 12/21/2023]
Abstract
In oncology, Deep Learning has shown great potential to personalise tasks such as tumour type classification, based on per-patient omics data-sets. Being high dimensional, incorporation of such data in one model is a challenge, often leading to one-dimensional studies and, therefore, information loss. Instead, we first propose relying on non-fixed sets of whole genome or whole exome variant-associated sequences, which can be used for supervised learning of oncology-relevant tasks by our Set Transformer based Deep Neural Network, SetQuence. We optimise this architecture to improve its efficiency. This allows for exploration of not just coding but also non-coding variants, from large datasets. Second, we extend the model to incorporate these representations together with multiple other sources of omics data in a flexible way with SetOmic. Evaluation, using these representations, shows improved robustness and reduced information loss compared to previous approaches, while still being computationally tractable. By means of Explainable Artificial Intelligence methods, our models are able to recapitulate the biological contribution of highly attributed features in the tumours studied. This validation opens the door to novel directions in multi-faceted genome and exome wide biomarker discovery and personalised treatment among other presently clinically relevant tasks.
Affiliation(s)
- Neringa Jurenaite
  - Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), TU Dresden, Chemnitzer Str 46b, Dresden, 01187, Saxony, Germany.
- Daniel León-Periñán
  - Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), TU Dresden, Chemnitzer Str 46b, Dresden, 01187, Saxony, Germany; Max-Delbrück-Centrum für Molekulare Medizin, Hannoversche Str. 28, Berlin, 10115, Germany.
- Veronika Donath
  - Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), TU Dresden, Chemnitzer Str 46b, Dresden, 01187, Saxony, Germany.
- Sunna Torge
  - Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), TU Dresden, Chemnitzer Str 46b, Dresden, 01187, Saxony, Germany.
- René Jäkel
  - Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), TU Dresden, Chemnitzer Str 46b, Dresden, 01187, Saxony, Germany.
11
el Bouhaddani S, Höllerhage M, Uh HW, Moebius C, Bickle M, Höglinger G, Houwing-Duistermaat J. Statistical integration of multi-omics and drug screening data from cell lines. PLoS Comput Biol 2024; 20:e1011809. [PMID: 38295113 PMCID: PMC10878536 DOI: 10.1371/journal.pcbi.1011809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 02/20/2024] [Accepted: 01/08/2024] [Indexed: 02/02/2024] Open
Abstract
Data integration methods are used to obtain a unified summary of multiple datasets. For multi-modal data, we propose a computational workflow to jointly analyze datasets from cell lines. The workflow comprises a novel probabilistic data integration method, named POPLS-DA, for multi-omics data. The workflow is motivated by a study on synucleinopathies where transcriptomics, proteomics, and drug screening data are measured in affected LUHMES cell lines and controls. The aim is to highlight potentially druggable pathways and genes involved in synucleinopathies. First, POPLS-DA is used to prioritize genes and proteins that best distinguish cases and controls. For these genes, an integrated interaction network is constructed where the drug screen data is incorporated to highlight druggable genes and pathways in the network. Finally, functional enrichment analyses are performed to identify clusters of synaptic and lysosome-related genes and proteins targeted by the protective drugs. POPLS-DA is compared to other single- and multi-omics approaches. We found that HSPA5, a member of the heat shock protein 70 family, was one of the most targeted genes by the validated drugs, in particular by AT1-blockers. HSPA5 and AT1-blockers have been previously linked to α-synuclein pathology and Parkinson's disease, showing the relevance of our findings. Our computational workflow identified new directions for therapeutic targets for synucleinopathies. POPLS-DA provided a larger interpretable gene set than other single- and multi-omic approaches. An implementation based on R and markdown is freely available online.
Affiliation(s)
| | | | - Hae-Won Uh
- Dept. Data science & Biostatistics, UMC Utrecht, Utrecht, Netherlands
| | - Claudia Moebius
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
| | - Marc Bickle
- Roche Institute for Translational Bioengineering, Basel, Switzerland
| | - Günter Höglinger
- Department of Neurology, Hannover Medical School, Hannover, Germany
- Department of Neurology, Ludwig-Maximilians-Universität, Munich, Germany
- German Center for Neurodegenerative Diseases, Munich, Germany
- Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
| | - Jeanine Houwing-Duistermaat
- Dept. Data science & Biostatistics, UMC Utrecht, Utrecht, Netherlands
- Dept. of Mathematics, Radboud University, Nijmegen, Netherlands
|
12
|
Kim G, Chun H. Similarity-assisted variational autoencoder for nonlinear dimension reduction with application to single-cell RNA sequencing data. BMC Bioinformatics 2023; 24:432. [PMID: 37964243 PMCID: PMC10647110 DOI: 10.1186/s12859-023-05552-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 10/30/2023] [Indexed: 11/16/2023] Open
Abstract
BACKGROUND Deep generative models naturally serve as nonlinear dimension reduction tools for visualizing large-scale datasets, such as single-cell RNA sequencing datasets, to reveal latent grouping patterns or identify outliers. The variational autoencoder (VAE) is a popular deep generative method equipped with encoder/decoder structures. The encoder is useful when a new sample is mapped to the latent space, and the decoder when a data point is generated from a point in the latent space. However, the VAE tends not to show grouping patterns clearly without additional annotation information. On the other hand, similarity-based dimension reduction methods such as t-SNE or UMAP present clear grouping patterns even though these methods do not have encoder/decoder structures. RESULTS To bridge this gap, we propose a new approach that adopts similarity information in the VAE framework. In addition, for biological applications, we extend our approach to a conditional VAE to account for covariate effects in the dimension reduction step. In the simulation study and real single-cell RNA sequencing data analyses, our method performs well compared to existing state-of-the-art methods, producing clear grouping structures using an inferred encoder and decoder. Our method also successfully adjusts for covariate effects, resulting in more useful dimension reduction. CONCLUSIONS Our method is able to produce clearer grouping patterns than those of other regularized VAE methods by utilizing similarity information encoded in the data via the UMAP loss function.
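As a point of reference for the VAE framework mentioned in this abstract, the sketch below computes the closed-form KL divergence between a diagonal Gaussian encoder output and a standard normal prior, which is the regularization term of a standard VAE objective. It is not the authors' similarity-assisted loss; the function name and the use of a log-variance parameterization are assumptions made for illustration.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a single latent vector.

    Hypothetical helper: `mu` and `log_var` are equal-length sequences of
    floats, as an encoder network would typically output them.
    """
    if len(mu) != len(log_var):
        raise ValueError("mu and log_var must have the same length")
    # Closed form per dimension: 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2)
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


if __name__ == "__main__":
    # A posterior identical to the prior carries no KL penalty.
    print(gaussian_kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
    # Shifting the mean away from zero increases the penalty.
    print(gaussian_kl_to_standard_normal([1.0, 0.0], [0.0, 0.0]))
```

In a full VAE this term would be added to a reconstruction loss; a similarity-based term such as the one the abstract describes would be a further addition on top of both.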
Affiliation(s)
- Gwangwoo Kim
- Graduate School of Data Science, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Hyonho Chun
- Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea.
|
13
|
Ranjbari S, Arslanturk S. Integration of incomplete multi-omics data using Knowledge Distillation and Supervised Variational Autoencoders for disease progression prediction. J Biomed Inform 2023; 147:104512. [PMID: 37813325 DOI: 10.1016/j.jbi.2023.104512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 08/31/2023] [Accepted: 10/03/2023] [Indexed: 10/11/2023]
Abstract
OBJECTIVE The rapid advancement of high-throughput technologies in the biomedical field has resulted in the accumulation of diverse omics data types, such as mRNA expression, DNA methylation, and microRNA expression, for studying various diseases. Integrating these multi-omics datasets enables a comprehensive understanding of the molecular basis of cancer and facilitates accurate prediction of disease progression. METHODS However, conventional approaches face challenges due to the dimensionality curse problem. This paper introduces a novel framework called Knowledge Distillation and Supervised Variational AutoEncoders utilizing View Correlation Discovery Network (KD-SVAE-VCDN) to address the integration of high-dimensional multi-omics data with limited common samples. Through our experimental evaluation, we demonstrate that the proposed KD-SVAE-VCDN architecture accurately predicts the progression of breast and kidney carcinoma by effectively classifying patients as long- or short-term survivors. Furthermore, our approach outperforms other state-of-the-art multi-omics integration models. RESULTS Our findings highlight the efficacy of the KD-SVAE-VCDN architecture in predicting the disease progression of breast and kidney carcinoma. By enabling the classification of patients based on survival outcomes, our model contributes to personalized and targeted treatments. The favorable performance of our approach in comparison to several existing models suggests its potential to contribute to the advancement of cancer understanding and management. CONCLUSION The development of a robust predictive model capable of accurately forecasting disease progression at the time of diagnosis holds immense promise for advancing personalized medicine. By leveraging multi-omics data integration, our proposed KD-SVAE-VCDN framework offers an effective solution to this challenge, paving the way for more precise and tailored treatment strategies for patients with different types of cancer.
Affiliation(s)
- Sima Ranjbari
- Department of Computer Science, Wayne State University, Detroit, 48202, MI, USA.
| | - Suzan Arslanturk
- Department of Computer Science, Wayne State University, Detroit, 48202, MI, USA.
|
14
|
Gao CX, Dwyer D, Zhu Y, Smith CL, Du L, Filia KM, Bayer J, Menssink JM, Wang T, Bergmeir C, Wood S, Cotton SM. An overview of clustering methods with guidelines for application in mental health research. Psychiatry Res 2023; 327:115265. [PMID: 37348404 DOI: 10.1016/j.psychres.2023.115265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 05/20/2023] [Accepted: 05/21/2023] [Indexed: 06/24/2023]
Abstract
Cluster analyses have been widely used in mental health research to decompose inter-individual heterogeneity by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and increasing popularity, there is little guidance on model choice, analytical framework and reporting requirements. In this paper, we aimed to address this gap by introducing the philosophy, design, advantages/disadvantages and implementation of major algorithms that are particularly relevant in mental health research. Extensions of basic models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles, are subsequently introduced. How to choose algorithms to address common issues, as well as methods for pre-clustering data processing, clustering evaluation and validation, are then discussed. Importantly, we also provide general guidance on clustering workflow and reporting requirements. To facilitate the implementation of different algorithms, we provide information on R functions and libraries.
Affiliation(s)
- Caroline X Gao
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia; Department of Epidemiology and Preventative Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia.
| | - Dominic Dwyer
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Ye Zhu
- School of Information Technology, Deakin University, Geelong, VIC, Australia
| | - Catherine L Smith
- Department of Epidemiology and Preventative Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
| | - Lan Du
- Faculty of Information Technology, Monash University, Clayton, VIC, Australia
| | - Kate M Filia
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Johanna Bayer
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Jana M Menssink
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Teresa Wang
- Faculty of Information Technology, Monash University, Clayton, VIC, Australia
| | - Christoph Bergmeir
- Faculty of Information Technology, Monash University, Clayton, VIC, Australia; Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
| | - Stephen Wood
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Sue M Cotton
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
|
15
|
Abstract
Following the widespread use of deep learning for genomics, deep generative modeling is also becoming a viable methodology for the broad field. Deep generative models (DGMs) can learn the complex structure of genomic data and allow researchers to generate novel genomic instances that retain the real characteristics of the original dataset. Aside from data generation, DGMs can also be used for dimensionality reduction by mapping the data space to a latent space, as well as for prediction tasks via exploitation of this learned mapping or supervised/semi-supervised DGM designs. In this review, we briefly introduce generative modeling and two currently prevailing architectures, we present conceptual applications along with notable examples in functional and evolutionary genomics, and we provide our perspective on potential challenges and future directions.
Affiliation(s)
- Burak Yelmen
- Laboratoire Interdisciplinaire des Sciences du Numérique, CNRS UMR 9015, INRIA, Université Paris-Saclay, Orsay, France;
- Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Flora Jay
- Laboratoire Interdisciplinaire des Sciences du Numérique, CNRS UMR 9015, INRIA, Université Paris-Saclay, Orsay, France;
|
16
|
Ji Y, Dutta P, Davuluri R. Deep multi-omics integration by learning correlation-maximizing representation identifies prognostically stratified cancer subtypes. BIOINFORMATICS ADVANCES 2023; 3:vbad075. [PMID: 37424943 PMCID: PMC10328436 DOI: 10.1093/bioadv/vbad075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 04/08/2023] [Indexed: 07/11/2023]
Abstract
Motivation Molecular subtyping by integrative modeling of multi-omics and clinical data can help identify robust and clinically actionable disease subgroups, an essential step in developing precision medicine approaches. Results We developed a novel outcome-guided molecular subgrouping framework, called Deep Multi-Omics Integrative Subtyping by Maximizing Correlation (DeepMOIS-MC), for integrative learning from multi-omics data by maximizing correlation between all input -omics views. DeepMOIS-MC consists of two parts: clustering and classification. In the clustering part, the preprocessed high-dimensional multi-omics views are input into two-layer fully connected neural networks. The outputs of the individual networks are subjected to a Generalized Canonical Correlation Analysis loss to learn the shared representation. Next, the learned representation is filtered by a regression model to select features that are related to a clinical covariate, for example, survival or another outcome. The filtered features are used for clustering to determine the optimal cluster assignments. In the classification part, the original feature matrix of one of the -omics views is scaled and discretized by equal-frequency binning, and then subjected to feature selection using RandomForest. Using these selected features, classification models (for example, an XGBoost model) are built to predict the molecular subgroups that were identified at the clustering stage. We applied DeepMOIS-MC to lung and liver cancers, using TCGA datasets. In comparative analysis, we found that DeepMOIS-MC outperformed traditional approaches in patient stratification. Finally, we validated the robustness and generalizability of the classification models on independent datasets. We anticipate that DeepMOIS-MC can be adopted for many multi-omics integrative analysis tasks.
Availability and implementation Source codes for PyTorch implementation of DGCCA and other DeepMOIS-MC modules are available at GitHub (https://github.com/duttaprat/DeepMOIS-MC). Supplementary information Supplementary data are available at Bioinformatics Advances online.
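The discretization step named in this abstract, equal-frequency binning, can be illustrated with a minimal rank-based sketch. This is not taken from the DeepMOIS-MC repository; the function name, the rank-based bin assignment, and the way ties are split by input order are all assumptions made for illustration.

```python
def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 with near-equal bin sizes.

    Hypothetical helper: values are ranked, and the rank (not the value)
    decides the bin, so every bin receives roughly len(values) / n_bins items.
    Tied values may land in different bins because ties are broken by
    input order.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    n = len(values)
    # Indices of the values in ascending order (stable for ties).
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // n
    return bins


if __name__ == "__main__":
    expression = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0]  # toy feature column
    print(equal_frequency_bins(expression, 3))
```

Unlike equal-width binning, this assignment is insensitive to outliers in the feature values, which is one common reason to prefer it for skewed omics measurements.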
Affiliation(s)
- Yanrong Ji
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Pratik Dutta
- Department of Biomedical Informatics, Stony Brook Cancer Center, Stony Brook Medicine, Stony Brook University, Stony Brook, NY 11794, USA
| | - Ramana Davuluri
- Department of Biomedical Informatics, Stony Brook Cancer Center, Stony Brook Medicine, Stony Brook University, Stony Brook, NY 11794, USA
|
17
|
Wysocka M, Wysocki O, Zufferey M, Landers D, Freitas A. A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinformatics 2023; 24:198. [PMID: 37189058 DOI: 10.1186/s12859-023-05262-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/17/2022] [Accepted: 03/30/2023] [Indexed: 05/17/2023] Open
Abstract
BACKGROUND There is an increasing interest in the use of Deep Learning (DL) based methods as a supporting analytical framework in oncology. However, most direct applications of DL will deliver models with limited transparency and explainability, which constrain their deployment in biomedical settings. METHODS This systematic review discusses DL models used to support inference in cancer biology with a particular emphasis on multi-omics analysis. It focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability, fundamental properties in the biomedical domain. For this, we retrieved and analyzed 42 studies focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods. RESULTS We discuss the recent evolutionary arc of DL models in the direction of integrating prior biological relational and network knowledge (e.g. pathways or Protein-Protein-Interaction networks) to support better generalisation and interpretability. This represents a fundamental functional shift towards models which can integrate mechanistic and statistical inference aspects. We introduce a concept of bio-centric interpretability and, according to its taxonomy, we discuss representational methodologies for the integration of domain prior knowledge in such models. CONCLUSIONS The paper provides a critical outlook into contemporary methods for explainability and interpretability used in DL for cancer. The analysis points in the direction of a convergence between encoding prior knowledge and improved interpretability. We introduce bio-centric interpretability, which is an important step towards formalisation of biological interpretability of DL models and developing methods that are less problem- or application-specific.
Affiliation(s)
- Magdalena Wysocka
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK.
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK.
| | - Oskar Wysocki
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK.
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK.
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920, Martigny, Switzerland.
| | - Marie Zufferey
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920, Martigny, Switzerland
| | - Dónal Landers
- DeLondra Oncology Ltd, 38 Carlton Avenue, Wilmslow, SK9 4EP, UK
| | - André Freitas
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920, Martigny, Switzerland
|
18
|
Dixit S, Kumar A, Srinivasan K. A Current Review of Machine Learning and Deep Learning Models in Oral Cancer Diagnosis: Recent Technologies, Open Challenges, and Future Research Directions. Diagnostics (Basel) 2023; 13:diagnostics13071353. [PMID: 37046571 PMCID: PMC10093759 DOI: 10.3390/diagnostics13071353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Revised: 03/25/2023] [Accepted: 04/03/2023] [Indexed: 04/08/2023] Open
Abstract
Cancer is a problematic global health issue with an extremely high fatality rate throughout the world. The application of various machine learning techniques that have appeared in the field of cancer diagnosis in recent years has provided meaningful insights into efficient and precise treatment decision-making. Due to rapid advancements in sequencing technologies, the detection of cancer based on gene expression data has improved over the years. Different types of cancer affect different parts of the body in different ways. Cancer that affects the mouth, lip, and upper throat is known as oral cancer, which is the sixth most prevalent form of cancer worldwide. India, Bangladesh, China, the United States, and Pakistan are the top five countries with the highest rates of oral cavity disease and lip cancer. The major causes of oral cancer are excessive use of tobacco and cigarette smoking. Many people’s lives can be saved if oral cancer (OC) can be detected early. Early identification and diagnosis could assist doctors in providing better patient care and effective treatment. OC screening may advance with the implementation of artificial intelligence (AI) techniques. AI can provide assistance to the oncology sector by accurately analyzing a large dataset from several imaging modalities. This review deals with the implementation of AI during the early stages of cancer for the proper detection and treatment of OC. Furthermore, performance evaluations of several DL and ML models have been carried out to show that the DL model can overcome the difficult challenges associated with early cancerous lesions in the mouth. For this review, we have followed the rules recommended for the extension of scoping reviews and meta-analyses (PRISMA-ScR). Examining the reference lists for the chosen articles helped us gather more details on the subject. Additionally, we discussed AI’s drawbacks and its potential use in research on oral cancer. 
There are methods for reducing risk factors, such as reducing the use of tobacco and alcohol, as well as immunization against HPV infection, to avoid oral cancer or to lessen the burden of the disease. Additionally, efficacious methods for preventing oral diseases include training programs for doctors and patients as well as facilitating early diagnosis by screening high-risk populations for the disease.
Affiliation(s)
- Shriniket Dixit
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, India
| | - Anant Kumar
- School of Bioscience and Technology, Vellore Institute of Technology, Vellore 632014, India
| | - Kathiravan Srinivasan
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, India
|
19
|
Cho YS, Kim E, Stafford PL, Oh MH, Kwon Y. Identifying Disease of Interest With Deep Learning Using Diagnosis Code. J Korean Med Sci 2023; 38:e77. [PMID: 36942391 PMCID: PMC10027541 DOI: 10.3346/jkms.2023.38.e77] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Accepted: 12/18/2022] [Indexed: 03/08/2023] Open
Abstract
BACKGROUND The autoencoder (AE) is a deep learning technique that uses an artificial neural network to reconstruct its input data in the output layer. We constructed a novel supervised AE model and tested its performance in predicting the co-existence of a disease of interest using only diagnostic codes. METHODS Diagnostic codes of one million randomly sampled patients listed in the Korean National Health Information Database in 2019 were used to train, validate, and test the prediction model. The first model used the AE solely as a feature engineering tool that provides the input of a classifier: a supervised Multi-Layer Perceptron (sMLP) was added and trained to predict a binary label with the latent representation as its input (AE + sMLP). The second model simultaneously updated the parameters in the AE and the connected MLP classifier during the learning process (End-to-End Supervised AE [EEsAE]). We tested the performance of these two models against baseline models, eXtreme Gradient Boosting (XGB) and naïve Bayes, in the prediction of a co-existing gastric cancer diagnosis. RESULTS The proposed EEsAE model yielded the highest F1-score and the highest area under the curve (0.86). The EEsAE and AE + sMLP gave the highest recalls. XGB yielded the highest precision. An ablation study revealed that iron deficiency anemia, gastroesophageal reflux disease, essential hypertension, gastric ulcers, benign prostate hyperplasia, and shoulder lesion were the six diagnoses with the greatest influence on performance. CONCLUSION A novel EEsAE model showed promising performance in the prediction of a disease of interest.
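The results in this abstract are reported as precision, recall, and F1-score for a binary prediction. As a reminder of how these three metrics relate, the following minimal sketch computes them from true and predicted labels. The function name and the toy labels are illustrative assumptions and do not reproduce any data from the study.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels coded as 0/1.

    Hypothetical helper: when a denominator is zero, the corresponding
    metric is reported as 0.0 rather than raising an error.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0]  # toy ground truth
    y_pred = [1, 1, 0, 1, 0]  # toy predictions
    print(precision_recall_f1(y_true, y_pred))
```

Because F1 balances the two, a model can have the highest precision (as reported for XGB) without having the highest F1 if its recall is lower.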
Affiliation(s)
- Yoon-Sik Cho
- Department of Artificial Intelligence, Chung-Ang University, Seoul, Korea.
| | - Eunsun Kim
- Department of Data Science, Sejong University, Seoul, Korea
| | - Patrick L Stafford
- Department of Medicine, University of Virginia, Charlottesville, VA, USA
| | - Min-Hwan Oh
- Graduate School of Data Science, Seoul National University, Seoul, Korea
| | - Younghoon Kwon
- Department of Medicine, University of Washington, Seattle, WA, USA
|
20
|
Benkirane H, Pradat Y, Michiels S, Cournède PH. CustOmics: A versatile deep-learning based strategy for multi-omics integration. PLoS Comput Biol 2023; 19:e1010921. [PMID: 36877736 PMCID: PMC10019780 DOI: 10.1371/journal.pcbi.1010921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 03/16/2023] [Accepted: 02/04/2023] [Indexed: 03/07/2023] Open
Abstract
The availability of patient cohorts with several types of omics data opens new perspectives for exploring the disease's underlying biological processes and developing predictive models. It also comes with new challenges in computational biology in terms of integrating high-dimensional and heterogeneous data in a fashion that captures the interrelationships between multiple genes and their functions. Deep learning methods offer promising perspectives for integrating multi-omics data. In this paper, we review the existing integration strategies based on autoencoders and propose a new customizable one whose principle relies on a two-phase approach. In the first phase, we adapt the training to each data source independently before learning cross-modality interactions in the second phase. By taking into account each source's singularity, we show that this approach succeeds at taking advantage of all the sources more efficiently than other strategies. Moreover, by adapting our architecture to the computation of Shapley additive explanations, our model can provide interpretable results in a multi-source setting. Using multiple omics sources from different TCGA cohorts, we demonstrate the performance of the proposed method for cancer on test cases for several tasks, such as the classification of tumor types and breast cancer subtypes, as well as survival outcome prediction. We show through our experiments the great performances of our architecture on seven different datasets with various sizes and provide some interpretations of the results obtained. Our code is available on (https://github.com/HakimBenkirane/CustOmics).
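The interpretability claim in this abstract rests on Shapley additive explanations. The sketch below shows the underlying Shapley value computed exactly by enumerating every ordering of a handful of features. It is a toy illustration, not the CustOmics implementation and not the approximation used by SHAP libraries; the function names and the toy value function are assumptions.

```python
from itertools import permutations


def shapley_values(value_fn, n_players):
    """Exact Shapley values by averaging marginal contributions.

    Hypothetical helper: `value_fn` maps a frozenset of player indices to a
    float. Cost grows factorially, so this is only usable for tiny examples.
    """
    totals = [0.0] * n_players
    n_orders = 0
    for order in permutations(range(n_players)):
        coalition = set()
        previous = value_fn(frozenset(coalition))
        for player in order:
            coalition.add(player)
            current = value_fn(frozenset(coalition))
            totals[player] += current - previous  # marginal contribution
            previous = current
        n_orders += 1
    return [total / n_orders for total in totals]


def toy_model(features):
    """Toy value function: additive weights plus a bonus when 0 and 1 co-occur."""
    weights = [2.0, 1.0, 0.5]
    score = sum(weights[i] for i in features)
    if 0 in features and 1 in features:
        score += 3.0  # interaction term shared between features 0 and 1
    return score


if __name__ == "__main__":
    print(shapley_values(toy_model, 3))
```

The interaction bonus is split equally between the two features that create it, and the values sum to the model output on the full feature set, which is the efficiency property that makes these attributions additive.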
Affiliation(s)
- Hakim Benkirane
- Université Paris-Saclay, CentraleSupélec, Lab of Mathematics and Informatics (MICS), Gif-sur-Yvette, France
- Oncostat U1018, Inserm, Université Paris-Saclay, Équipe Labellisée Ligue Contre le Cancer, CESP, Villejuif, France
| | - Yoann Pradat
- Université Paris-Saclay, CentraleSupélec, Lab of Mathematics and Informatics (MICS), Gif-sur-Yvette, France
| | - Stefan Michiels
- Oncostat U1018, Inserm, Université Paris-Saclay, Équipe Labellisée Ligue Contre le Cancer, CESP, Villejuif, France
- Bureau de Biostatistique et d’Épidémiologie, Gustave Roussy, Université Paris-Saclay, Villejuif, France
| | - Paul-Henry Cournède
- Université Paris-Saclay, CentraleSupélec, Lab of Mathematics and Informatics (MICS), Gif-sur-Yvette, France
|
21
|
Badwan BA, Liaropoulos G, Kyrodimos E, Skaltsas D, Tsirigos A, Gorgoulis VG. Machine learning approaches to predict drug efficacy and toxicity in oncology. CELL REPORTS METHODS 2023; 3:100413. [PMID: 36936080 PMCID: PMC10014302 DOI: 10.1016/j.crmeth.2023.100413] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 02/25/2023]
Abstract
In recent years, there has been a surge of interest in using machine learning algorithms (MLAs) in oncology, particularly for biomedical applications such as drug discovery, drug repurposing, diagnostics, clinical trial design, and pharmaceutical production. MLAs have the potential to provide valuable insights and predictions in these areas by representing both the disease state and the therapeutic agents used to treat it. To fully utilize the capabilities of MLAs in oncology, it is important to understand the fundamental concepts underlying these algorithms and how they can be applied to assess the efficacy and toxicity of therapeutics. In this perspective, we lay out approaches to represent both the disease state and the therapeutic agents used by MLAs to derive novel insights and make relevant predictions.
Affiliation(s)
| | | | - Efthymios Kyrodimos
- First ENT Department, Hippocration Hospital, National Kapodistrian University of Athens, Athens, GR 11527, Greece
| | | | - Aristotelis Tsirigos
- Department of Medicine, New York University School of Medicine, New York, NY 10016, USA
- Department of Pathology, New York University School of Medicine, New York, NY 10016, USA
| | - Vassilis G. Gorgoulis
- Intelligencia Inc, New York, NY 10014, USA
- Department of Histology and Embryology, Faculty of Medicine, School of Health Sciences, National Kapodistrian University of Athens, Athens 11527, Greece
- Ninewells Hospital and Medical School, University of Dundee, Dundee DD1 9SY, UK
- Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
- Molecular and Clinical Cancer Sciences, Manchester Cancer Research Centre, Manchester Academic Health Sciences Centre, University of Manchester, Manchester M20 4GJ, UK
|
22
|
Gravholt CH, Viuff M, Just J, Sandahl K, Brun S, van der Velden J, Andersen NH, Skakkebaek A. The Changing Face of Turner Syndrome. Endocr Rev 2023; 44:33-69. [PMID: 35695701 DOI: 10.1210/endrev/bnac016] [Citation(s) in RCA: 35] [Impact Index Per Article: 35.0] [Received: 07/02/2021] [Indexed: 01/20/2023]
Abstract
Turner syndrome (TS) is a condition in females missing the second sex chromosome (45,X) or parts thereof. It is considered a rare genetic condition and is associated with a wide range of clinical stigmata, such as short stature, ovarian dysgenesis, delayed puberty and infertility, congenital malformations, endocrine disorders, including a range of autoimmune conditions and type 2 diabetes, and neurocognitive deficits. Morbidity and mortality are clearly increased compared with the general population and the average age at diagnosis is quite delayed. During recent years it has become clear that a multidisciplinary approach is necessary toward the patient with TS. A number of clinical advances have been implemented, and these are reviewed. Our understanding of the genomic architecture of TS is advancing rapidly, and these latest developments are reviewed and discussed. Several candidate genes, genomic pathways and mechanisms, including an altered transcriptome and epigenome, are also presented.
Affiliation(s)
- Claus H Gravholt
- Department of Endocrinology and Internal Medicine, Aarhus University Hospital, Aarhus 8200 N, Denmark; Department of Molecular Medicine, Aarhus University Hospital, Aarhus 8200 N, Denmark
| | - Mette Viuff
- Department of Endocrinology and Internal Medicine, Aarhus University Hospital, Aarhus 8200 N, Denmark; Department of Molecular Medicine, Aarhus University Hospital, Aarhus 8200 N, Denmark
| | - Jesper Just
- Department of Molecular Medicine, Aarhus University Hospital, Aarhus 8200 N, Denmark
| | - Kristian Sandahl
- Department of Endocrinology and Internal Medicine, Aarhus University Hospital, Aarhus 8200 N, Denmark
| | - Sara Brun
- Department of Endocrinology and Internal Medicine, Aarhus University Hospital, Aarhus 8200 N, Denmark
| | - Janielle van der Velden
- Department of Pediatrics, Radboud University Medical Centre, Amalia Children's Hospital, 6525 Nijmegen, the Netherlands
| | - Niels H Andersen
- Department of Cardiology, Aalborg University Hospital, Aalborg 9000, Denmark
| | - Anne Skakkebaek
- Department of Molecular Medicine, Aarhus University Hospital, Aarhus 8200 N, Denmark; Department of Clinical Genetics, Aarhus University Hospital, Aarhus 8200 N, Denmark
|
23
|
Sun Q, Cheng L, Meng A, Ge S, Chen J, Zhang L, Gong P. SADLN: Self-attention based deep learning network of integrating multi-omics data for cancer subtype recognition. Front Genet 2023; 13:1032768. [PMID: 36685873 PMCID: PMC9846505 DOI: 10.3389/fgene.2022.1032768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 12/15/2022] [Indexed: 01/05/2023] Open
Abstract
Integrating multi-omics data for cancer subtype recognition is an important task in bioinformatics. Recently, deep learning has been applied to recognize cancer subtypes. However, most existing studies integrate multi-omics data by simply concatenating them into a single dataset and then learning a latent low-dimensional representation with a deep learning model, which does not account for the differing distributions of the omics data. Moreover, these methods ignore the relationships among samples. To tackle these problems, we proposed SADLN: a self-attention based deep learning network for integrating multi-omics data for cancer subtype recognition. SADLN combines an encoder, self-attention, a decoder, and a discriminator into a unified framework, which can not only integrate multi-omics data but also adaptively model the relationships among samples to learn an accurate latent low-dimensional representation. With the integrated representation learned by the network, SADLN uses a Gaussian mixture model to identify cancer subtypes. Experiments on ten TCGA cancer datasets demonstrated the advantages of SADLN over ten competing methods. SADLN is an effective method for integrating multi-omics data for cancer subtype recognition.
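The abstract does not specify SADLN's layer design, so the following is only a minimal sketch of generic scaled dot-product self-attention applied across samples, which is one way a model can "adaptively model the relationships among samples". The function name, the projection matrices `w_q`, `w_k`, `w_v`, and the toy dimensions are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sample_self_attention(z, w_q, w_k, w_v):
    """Scaled dot-product self-attention in which every sample attends to all samples.

    z: (n_samples, d) latent matrix, e.g. an encoder output (hypothetical here).
    w_q, w_k, w_v: (d, d_k) projection matrices (hypothetical, randomly initialised below).
    Returns the attended representation and the (n_samples, n_samples) weight matrix.
    """
    q, k, v = z @ w_q, z @ w_k, z @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

# Toy data: 6 samples with a 4-dimensional latent code, projected to 3 dimensions.
rng = np.random.default_rng(0)
z = rng.normal(size=(6, 4))
w_q, w_k, w_v = (rng.normal(size=(4, 3)) for _ in range(3))
attended, weights = sample_self_attention(z, w_q, w_k, w_v)
```

Each row of `weights` sums to 1, so each sample's new representation is a convex combination of the projected representations of all samples; a clustering model such as a Gaussian mixture could then be fitted on `attended`.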
Affiliation(s)
- Qiuwen Sun
  - School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- Lei Cheng
  - School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- Ao Meng
  - School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- Shuguang Ge
  - School of Information and Control Engineering, University of Mining and Technology, Xuzhou, China
- Jie Chen
  - Department of Radiation Oncology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Longzhen Zhang
  - Department of Radiation Oncology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Ping Gong
  - School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
|
24
|
Nasser M, Yusof UK. Deep Learning Based Methods for Breast Cancer Diagnosis: A Systematic Review and Future Direction. Diagnostics (Basel) 2023; 13:161. [PMID: 36611453 PMCID: PMC9818155 DOI: 10.3390/diagnostics13010161] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 11/26/2022] [Revised: 12/19/2022] [Accepted: 12/19/2022] [Indexed: 01/06/2023] Open
Abstract
Breast cancer is one of the precarious conditions that affect women, and a substantive cure has not yet been discovered for it. With the advent of Artificial intelligence (AI), recently, deep learning techniques have been used effectively in breast cancer detection, facilitating early diagnosis and therefore increasing the chances of patients' survival. Compared to classical machine learning techniques, deep learning requires less human intervention for similar feature extraction. This study presents a systematic literature review on the deep learning-based methods for breast cancer detection that can guide practitioners and researchers in understanding the challenges and new trends in the field. Particularly, different deep learning-based methods for breast cancer detection are investigated, focusing on the genomics and histopathological imaging data. The study specifically adopts the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), which offer a detailed analysis and synthesis of the published articles. Several studies were searched and gathered, and after the eligibility screening and quality evaluation, 98 articles were identified. The results of the review indicated that the Convolutional Neural Network (CNN) is the most accurate and extensively used model for breast cancer detection, and the accuracy metrics are the most popular method used for performance evaluation. Moreover, datasets utilized for breast cancer detection and the evaluation metrics are also studied. Finally, the challenges and future research direction in breast cancer detection based on deep learning models are also investigated to help researchers and practitioners acquire in-depth knowledge of and insight into the area.
|
25
|
Lin X, Tian T, Wei Z, Hakonarson H. Clustering of single-cell multi-omics data with a multimodal deep learning method. Nat Commun 2022; 13:7705. [PMID: 36513636 PMCID: PMC9748135 DOI: 10.1038/s41467-022-35031-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/28/2021] [Accepted: 11/16/2022] [Indexed: 12/15/2022] Open
Abstract
Single-cell multimodal sequencing technologies are developed to simultaneously profile different modalities of data in the same cell. It provides a unique opportunity to jointly analyze multimodal data at the single-cell level for the identification of distinct cell types. A correct clustering result is essential for the downstream complex biological functional studies. However, combining different data sources for clustering analysis of single-cell multimodal data remains a statistical and computational challenge. Here, we develop a novel multimodal deep learning method, scMDC, for single-cell multi-omics data clustering analysis. scMDC is an end-to-end deep model that explicitly characterizes different data sources and jointly learns latent features of deep embedding for clustering analysis. Extensive simulation and real-data experiments reveal that scMDC outperforms existing single-cell single-modal and multimodal clustering methods on different single-cell multimodal datasets. The linear scalability of running time makes scMDC a promising method for analyzing large multimodal datasets.
Affiliation(s)
- Xiang Lin
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Tian Tian
  - Center of Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Zhi Wei
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Hakon Hakonarson
  - Center of Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Division of Human Genetics, Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
|
26
|
Tsimenidis S, Vrochidou E, Papakostas GA. Omics Data and Data Representations for Deep Learning-Based Predictive Modeling. Int J Mol Sci 2022; 23:12272. [PMID: 36293133 PMCID: PMC9603455 DOI: 10.3390/ijms232012272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 10/03/2022] [Accepted: 10/12/2022] [Indexed: 11/25/2022] Open
Abstract
Medical discoveries mainly depend on the capability to process and analyze biological datasets, which inundate the scientific community and are still expanding as the cost of next-generation sequencing technologies decreases. Deep learning (DL) is a viable method to exploit this massive data stream, since it has advanced quickly through successive innovations. However, an obstacle to scientific progress emerges: DL is difficult to apply to biology because both fields are evolving at a breakneck pace, making it hard for an individual to stay at the forefront of both. This paper aims to bridge the gap and help computer scientists bring their valuable expertise into the life sciences. This work provides an overview of the most common types of biological data and data representations that are used to train DL models, with additional information on the models themselves and the various tasks that are being tackled. This is the essential information a DL expert with no background in biology needs in order to participate in DL-based research projects in biomedicine, biotechnology, and drug discovery. Alternatively, this study could also be useful to researchers in biology who want to understand and utilize the power of DL to gain better insights into, and extract important information from, omics data.
Affiliation(s)
- George A. Papakostas
  - MLV Research Group, Department of Computer Science, International Hellenic University, 65404 Kavala, Greece
|
27
|
Preto AJ, Matos-Filipe P, Mourão J, Moreira IS. SYNPRED: prediction of drug combination effects in cancer using different synergy metrics and ensemble learning. Gigascience 2022; 11:giac087. [PMID: 36155782 PMCID: PMC9511701 DOI: 10.1093/gigascience/giac087] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/18/2021] [Revised: 06/14/2022] [Accepted: 08/18/2022] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND In cancer research, high-throughput screening technologies produce large amounts of multiomics data from different populations and cell types. However, analysis of such data encounters difficulties due to disease heterogeneity, further exacerbated by human biological complexity and genomic variability. The specific profile of cancer as a disease (or, more realistically, a set of diseases) urges the development of approaches that maximize the effect while minimizing the dosage of drugs. Now is the time to redefine the approach to drug discovery, bringing an artificial intelligence (AI)-powered informational view that integrates the relevant scientific fields and explores new territories. RESULTS Here, we show SYNPRED, an interdisciplinary approach that leverages specifically designed ensembles of AI algorithms, as well as links omics and biophysical traits to predict anticancer drug synergy. It uses 5 reference models (Bliss, Highest Single Agent, Loewe, Zero Interaction Potency, and Combination Sensitivity Score), which, coupled with AI algorithms, allowed us to attain the ones with the best predictive performance and pinpoint the most appropriate reference model for synergy prediction, often overlooked in similar studies. By using an independent test set, SYNPRED exhibits state-of-the-art performance metrics either in the classification (accuracy, 0.85; precision, 0.91; recall, 0.90; area under the receiver operating characteristic, 0.80; and F1-score, 0.91) or in the regression models, mainly when using the Combination Sensitivity Score synergy reference model (root mean square error, 11.07; mean squared error, 122.61; Pearson, 0.86; mean absolute error, 7.43; Spearman, 0.87). Moreover, data interpretability was achieved by deploying the most current and robust feature importance approaches. A simple web-based application was constructed, allowing easy access by nonexpert researchers. 
CONCLUSIONS The performance of SYNPRED rivals that of the existing methods that tackle the same problem, yielding unbiased results trained with one of the most comprehensive datasets available (NCI ALMANAC). The leveraging of different reference models allowed deeper insights into which of them can be more appropriately used for synergy prediction. The Combination Sensitivity Score clearly stood out with improved performance among the full scope of surveyed approaches and synergy reference models. Furthermore, SYNPRED takes a particular focus on data interpretability, which has been in the spotlight lately when using the most advanced AI techniques.
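To make the idea of a synergy reference model concrete, the sketch below computes the expected combination effect under two of the five models named in the abstract, Bliss independence and Highest Single Agent (HSA), and reports synergy as the observed effect minus the expected one. It is a generic illustration under stated assumptions, not SYNPRED's code: effects are assumed to be fractional inhibitions in [0, 1], and the function names and example values are hypothetical.

```python
def bliss_expected(effect_a, effect_b):
    """Expected fractional effect if the two drugs act independently (Bliss)."""
    return effect_a + effect_b - effect_a * effect_b

def hsa_expected(effect_a, effect_b):
    """Expected effect under Highest Single Agent: the better of the two drugs."""
    return max(effect_a, effect_b)

def synergy_excess(observed, expected):
    """Positive values suggest synergy, negative values antagonism."""
    return observed - expected

# Hypothetical fractional inhibitions (0 = no effect, 1 = complete inhibition).
effect_a, effect_b, observed = 0.30, 0.40, 0.70
bliss_score = synergy_excess(observed, bliss_expected(effect_a, effect_b))
hsa_score = synergy_excess(observed, hsa_expected(effect_a, effect_b))
```

The same observation scores differently under each model (here the Bliss expectation is 0.58 and the HSA expectation is 0.40), which is why the choice of reference model matters when training a predictor.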
Affiliation(s)
- António J Preto
  - Center for Neuroscience and Cell Biology, University of Coimbra, 3004-504 Coimbra, Portugal
  - PhD Programme in Experimental Biology and Biomedicine, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, Casa Costa Alemão, 3030-789 Coimbra, Portugal
- Pedro Matos-Filipe
  - Center for Neuroscience and Cell Biology, University of Coimbra, 3004-504 Coimbra, Portugal
- Joana Mourão
  - CNC—Center for Neuroscience and Cell Biology, CIBB—Center for Innovative Biomedicine and Biotechnology, 3004-504 Coimbra, Portugal
- Irina S Moreira
  - Department of Life Sciences, University of Coimbra, Calçada Martim de Freitas, 3000-456 Coimbra, Portugal
  - CNC—Center for Neuroscience and Cell Biology, CIBB—Center for Innovative Biomedicine and Biotechnology, 3004-504 Coimbra, Portugal
|
28
|
Hahn W, Schütte K, Schultz K, Wolkenhauer O, Sedlmayr M, Schuler U, Eichler M, Bej S, Wolfien M. Contribution of Synthetic Data Generation towards an Improved Patient Stratification in Palliative Care. J Pers Med 2022; 12:1278. [PMID: 36013227 PMCID: PMC9409663 DOI: 10.3390/jpm12081278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Revised: 07/29/2022] [Accepted: 08/01/2022] [Indexed: 11/23/2022] Open
Abstract
AI model development for synthetic data generation to improve Machine Learning (ML) methodologies is an integral part of research in Computer Science and is currently being transferred to related medical fields, such as Systems Medicine and Medical Informatics. In general, the idea of personalized decision-making support based on patient data has driven the motivation of researchers in the medical domain for more than a decade, but the overall sparsity and scarcity of data are still major limitations. This is in contrast to currently applied technology that allows us to generate and analyze patient data in diverse forms, such as tabular data on health records, medical images, genomics data, or even audio and video. One solution arising to overcome these data limitations in relation to medical records is the synthetic generation of tabular data based on real world data. Consequently, ML-assisted decision-support can be interpreted more conveniently, using more relevant patient data at hand. At a methodological level, several state-of-the-art ML algorithms generate and derive decisions from such data. However, there remain key issues that hinder a broad practical implementation in real-life clinical settings. In this review, we will give for the first time insights towards current perspectives and potential impacts of using synthetic data generation in palliative care screening because it is a challenging prime example of highly individualized, sparsely available patient information. Taken together, the reader will obtain initial starting points and suitable solutions relevant for generating and using synthetic data for ML-based screenings in palliative care and beyond.
Affiliation(s)
- Waldemar Hahn
  - Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
- Katharina Schütte
  - University Palliative Center, University Hospital Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
- Kristian Schultz
  - Department of Systems Biology and Bioinformatics, University of Rostock, Universitätsplatz 1, 18051 Rostock, Germany
- Olaf Wolkenhauer
  - Department of Systems Biology and Bioinformatics, University of Rostock, Universitätsplatz 1, 18051 Rostock, Germany
  - Leibniz-Institute for Food Systems Biology, Technical University Munich, 85354 Freising, Germany
  - Stellenbosch Institute of Advanced Study, Wallenberg Research Centre, Stellenbosch University, Stellenbosch 7602, South Africa
- Martin Sedlmayr
  - Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
- Ulrich Schuler
  - University Palliative Center, University Hospital Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
- Martin Eichler
  - National Center for Tumor Diseases Dresden (NCT/UCC), Fetscherstraße 74, 01307 Dresden, Germany
  - German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
  - Faculty of Medicine, University Hospital Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
  - Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Bautzner Landstraße 400, 01328 Dresden, Germany
- Saptarshi Bej
  - Department of Systems Biology and Bioinformatics, University of Rostock, Universitätsplatz 1, 18051 Rostock, Germany
  - Leibniz-Institute for Food Systems Biology, Technical University Munich, 85354 Freising, Germany
- Markus Wolfien
  - Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
|
29
|
Akinyelu AA, Zaccagna F, Grist JT, Castelli M, Rundo L. Brain Tumor Diagnosis Using Machine Learning, Convolutional Neural Networks, Capsule Neural Networks and Vision Transformers, Applied to MRI: A Survey. J Imaging 2022; 8:205. [PMID: 35893083 PMCID: PMC9331677 DOI: 10.3390/jimaging8080205] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 04/10/2022] [Revised: 06/20/2022] [Accepted: 07/12/2022] [Indexed: 02/01/2023] Open
Abstract
Management of brain tumors is based on clinical and radiological information with presumed grade dictating treatment. Hence, a non-invasive assessment of tumor grade is of paramount importance to choose the best treatment plan. Convolutional Neural Networks (CNNs) represent one of the effective Deep Learning (DL)-based techniques that have been used for brain tumor diagnosis. However, they are unable to handle input modifications effectively. Capsule neural networks (CapsNets) are a novel type of machine learning (ML) architecture that was recently developed to address the drawbacks of CNNs. CapsNets are resistant to rotations and affine translations, which is beneficial when processing medical imaging datasets. Moreover, Vision Transformers (ViT)-based solutions have been very recently proposed to address the issue of long-range dependency in CNNs. This survey provides a comprehensive overview of brain tumor classification and segmentation techniques, with a focus on ML-based, CNN-based, CapsNet-based, and ViT-based techniques. The survey highlights the fundamental contributions of recent studies and the performance of state-of-the-art techniques. Moreover, we present an in-depth discussion of crucial issues and open challenges. We also identify some key limitations and promising future research directions. We envisage that this survey shall serve as a good springboard for further study.
Affiliation(s)
- Andronicus A. Akinyelu
  - NOVA Information Management School (NOVA IMS), Universidade NOVA de Lisboa, Campus de Campolide, 1070-312 Lisboa, Portugal
  - Department of Computer Science and Informatics, University of the Free State, Phuthaditjhaba 9866, South Africa
- Fulvio Zaccagna
  - Department of Biomedical and Neuromotor Sciences, Alma Mater Studiorum-University of Bologna, 40138 Bologna, Italy
  - IRCCS Istituto delle Scienze Neurologiche di Bologna, Functional and Molecular Neuroimaging Unit, 40139 Bologna, Italy
- James T. Grist
  - Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3PT, UK
  - Department of Radiology, Oxford University Hospitals NHS Foundation Trust, Oxford OX3 9DU, UK
  - Oxford Centre for Clinical Magnetic Research Imaging, University of Oxford, Oxford OX3 9DU, UK
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham B15 2SY, UK
- Mauro Castelli
  - NOVA Information Management School (NOVA IMS), Universidade NOVA de Lisboa, Campus de Campolide, 1070-312 Lisboa, Portugal
- Leonardo Rundo
  - Department of Information and Electrical Engineering and Applied Mathematics, University of Salerno, 84084 Fisciano, Italy
|
30
|
Madhumita, Paul S. Capturing the latent space of an Autoencoder for multi-omics integration and cancer subtyping. Comput Biol Med 2022; 148:105832. [PMID: 35834966 DOI: 10.1016/j.compbiomed.2022.105832] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/16/2022] [Revised: 06/15/2022] [Accepted: 07/03/2022] [Indexed: 11/29/2022]
Abstract
BACKGROUND AND OBJECTIVE The motivation behind cancer subtyping is to identify subgroups of cancer patients with distinguishable phenotypes of clinical importance. It can assist in advancement of subtype-targeted based treatments. Subtype identification is a complicated task, therefore requires multi-omics data integration to identify the precise patients' subgroup. Over the years, several computational attempts have been made to identify the cancer subtypes accurately using integrative multi-omics analysis. Some studies have used Autoencoders (AE) to capture multi-omics feature integration in lower dimensions for identifying subtypes in specific types of cancer. However, capturing the highly informative latent space by learning the deep architectures of AE to attain a satisfactory generalized performance is required. Therefore, in this study, a novel AE-assisted cancer subtyping framework is presented that utilizes the compressed latent space of a Sparse AE neural network for multi-omics clustering. METHODS The proposed framework first performs a supervised feature selection based on the survival status of the patients. The selected features from each of the omic data are passed to the AE. The information embedded in the latent space of the trained AE neural networks are then used for cancer subtyping using Spectral clustering. The AE architecture designed in this study exhaustively searches the best compression for multi-omics data by varying the number of neurons in the hidden layers and penalizing activations within the layers. RESULTS AND CONCLUSION The proposed framework is applied to five different multi-omics cancer datasets taken from The Cancer Genome Atlas. It is observed that for getting a robust information bottleneck, a compression of 10-20% of the input features along with an L1 regularization penalty of 0.01 or 0.001 performs well for most of the cancer datasets. 
Clustering performed on this latent representation generates clusters with better silhouette scores and significantly varying survival patterns. For further biological assessment, differential expression analysis is performed between the identified subtypes of Glioblastoma multiforme (GBM), followed by enrichment analysis of the differentially expressed biomarkers. Several pathways and disease ontology terms coherent to GBM are found to be significantly associated. Varying responses of the identified GBM subtypes towards the drug Temozolomide is also tested to demonstrate its clinical importance. Hence, the study shows that AE-assisted multi-omics integration can be used for the prediction of clinically significant cancer subtypes.
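The reported settings (a bottleneck of 10-20% of the input features and an L1 penalty of 0.01 or 0.001 on activations) can be illustrated with a small sketch. The abstract gives no implementation details, so this is an assumption-laden example: the helper names, the choice of mean squared error for reconstruction, and the way the L1 term is averaged over samples are all hypothetical.

```python
import numpy as np

def bottleneck_size(n_features, compression=0.15):
    """Number of latent units for a given compression ratio (e.g. 0.10 to 0.20)."""
    return max(1, round(n_features * compression))

def sparse_ae_loss(x, x_hat, latent, l1=0.01):
    """Reconstruction error plus an L1 penalty on the latent activations.

    x, x_hat: (n_samples, n_features) input and reconstruction.
    latent:   (n_samples, n_latent) bottleneck activations.
    l1:       penalty weight, e.g. 0.01 or 0.001 as reported in the abstract.
    """
    reconstruction = np.mean((x - x_hat) ** 2)
    sparsity = l1 * np.mean(np.sum(np.abs(latent), axis=1))
    return reconstruction + sparsity

# Toy check: 200 selected features compressed to 10% gives a 20-unit bottleneck.
n_latent = bottleneck_size(200, compression=0.10)
x = np.ones((2, 4))
x_hat = np.zeros((2, 4))
latent = np.array([[1.0, -1.0], [0.0, 2.0]])
loss = sparse_ae_loss(x, x_hat, latent, l1=0.01)
```

In the framework described above, the trained bottleneck activations (the `latent` array here) would be the input to spectral clustering.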
Affiliation(s)
- Madhumita
  - Department of Bioscience and Bioengineering, Indian Institute of Technology, Jodhpur, 342037, Rajasthan, India
- Sushmita Paul
  - Department of Bioscience and Bioengineering, Indian Institute of Technology, Jodhpur, 342037, Rajasthan, India
  - School of Artificial Intelligence and Data Science, Indian Institute of Technology, Jodhpur, 342037, Rajasthan, India
|
31
|
Mo H, Breitling R, Francavilla C, Schwartz JM. Data integration and mechanistic modelling for breast cancer biology: Current state and future directions. Current Opinion in Endocrine and Metabolic Research 2022; 24:100350. [PMID: 36034741 PMCID: PMC9402443 DOI: 10.1016/j.coemr.2022.100350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Breast cancer is one of the most common cancers threatening women worldwide. A limited number of available treatment options, frequent recurrence, and drug resistance exacerbate the prognosis of breast cancer patients. Thus, there is an urgent need for methods to investigate novel treatment options, while taking into account the vast molecular heterogeneity of breast cancer. Recent advances in molecular profiling technologies, including genomics, epigenomics, transcriptomics, proteomics and metabolomics data, enable approaching breast cancer biology at multiple levels of omics interaction networks. Systems biology approaches, including computational inference of ‘big data’ and mechanistic modelling of specific pathways, are emerging to identify potential novel combinations of breast cancer subtype signatures and more diverse targeted therapies.
|
32
|
Suomi T, Elo LL. Statistical and machine learning methods to study human CD4+ T cell proteome profiles. Immunol Lett 2022; 245:8-17. [DOI: 10.1016/j.imlet.2022.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 03/11/2022] [Accepted: 03/15/2022] [Indexed: 11/05/2022]
|
33
|
Stahlschmidt SR, Ulfenborg B, Synnergren J. Multimodal deep learning for biomedical data fusion: a review. Brief Bioinform 2022; 23:bbab569. [PMID: 35089332 PMCID: PMC8921642 DOI: 10.1093/bib/bbab569] [Citation(s) in RCA: 76] [Impact Index Per Article: 38.0] [Received: 10/21/2021] [Revised: 12/06/2021] [Accepted: 12/11/2021] [Indexed: 02/06/2023] Open
Abstract
Biomedical data are becoming increasingly multimodal and thereby capture the underlying complex relationships among biological processes. Deep learning (DL)-based data fusion strategies are a popular approach for modeling these nonlinear relationships. Therefore, we review the current state-of-the-art of such methods and propose a detailed taxonomy that facilitates more informed choices of fusion strategies for biomedical applications, as well as research on novel methods. By doing so, we find that deep fusion strategies often outperform unimodal and shallow approaches. Additionally, the proposed subcategories of fusion strategies show different advantages and drawbacks. The review of current methods has shown that, especially for intermediate fusion strategies, joint representation learning is the preferred approach as it effectively models the complex interactions of different levels of biological organization. Finally, we note that gradual fusion, based on prior biological knowledge or on search strategies, is a promising future research path. Similarly, utilizing transfer learning might overcome sample size limitations of multimodal data sets. As these data sets become increasingly available, multimodal DL approaches present the opportunity to train holistic models that can learn the complex regulatory dynamics behind health and disease.
Affiliation(s)
- Jane Synnergren
  - Systems Biology Research Center, University of Skövde, Sweden
|
34
|
Nassif AB, Talib MA, Nasir Q, Afadar Y, Elgendy O. Breast cancer detection using artificial intelligence techniques: A systematic literature review. Artif Intell Med 2022; 127:102276. [DOI: 10.1016/j.artmed.2022.102276] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/16/2021] [Revised: 10/18/2021] [Accepted: 03/04/2022] [Indexed: 02/07/2023]
|
35
|
Mora A, Rakar J, Cobeta IM, Salmani BY, Starkenberg A, Thor S, Bodén M. Variational autoencoding of gene landscapes during mouse CNS development uncovers layered roles of Polycomb Repressor Complex 2. Nucleic Acids Res 2022; 50:1280-1296. [PMID: 35048973 PMCID: PMC8860581 DOI: 10.1093/nar/gkac006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/15/2021] [Revised: 12/22/2021] [Accepted: 01/05/2022] [Indexed: 12/13/2022] Open
Abstract
A prominent aspect of most, if not all, central nervous systems (CNSs) is that anterior regions (brain) are larger than posterior ones (spinal cord). Studies in Drosophila and mouse have revealed that Polycomb Repressor Complex 2 (PRC2), a protein complex responsible for applying key repressive histone modifications, acts by several mechanisms to promote anterior CNS expansion. However, it is unclear what the full spectrum of PRC2 action is during embryonic CNS development and how PRC2 intersects with the epigenetic landscape. We removed PRC2 function from the developing mouse CNS, by mutating the key gene Eed, and generated spatio-temporal transcriptomic data. To decode the role of PRC2, we developed a method that incorporates standard statistical analyses with probabilistic deep learning to integrate the transcriptomic response to PRC2 inactivation with epigenetic data. This multi-variate analysis corroborates the central involvement of PRC2 in anterior CNS expansion, and also identifies several unanticipated cohorts of genes, such as proliferation and immune response genes. Furthermore, the analysis reveals specific profiles of regulation via PRC2 upon these gene cohorts. These findings uncover a differential logic for the role of PRC2 upon functionally distinct gene cohorts that drive CNS anterior expansion. To support the analysis of emerging multi-modal datasets, we provide a novel bioinformatics package that integrates transcriptomic and epigenetic datasets to identify regulatory underpinnings of heterogeneous biological processes.
Affiliation(s)
- Ariane Mora
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia
- Jonathan Rakar
  - Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
- Ignacio Monedero Cobeta
  - Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
  - Department of Physiology, Universidad Autonoma de Madrid, Madrid, Spain
- Behzad Yaghmaeian Salmani
  - Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
  - Department of Cell and Molecular Biology, Karolinska Institute, SE-171 65 Stockholm, Sweden
- Annika Starkenberg
  - Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
- Stefan Thor
  - Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
  - School of Biomedical Sciences, University of Queensland, St Lucia, QLD 4072, Australia
- Mikael Bodén
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia
|
36
|
A Review of Deep Learning Algorithms and Their Applications in Healthcare. Algorithms 2022. [DOI: 10.3390/a15020071] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/11/2023]
Abstract
Deep learning uses artificial neural networks to recognize patterns and learn from them to make decisions. Deep learning is a type of machine learning that uses artificial neural networks to mimic the human brain. It uses machine learning methods such as supervised, semi-supervised, or unsupervised learning strategies to learn automatically in deep architectures and has gained much popularity due to its superior ability to learn from huge amounts of data. It was found that deep learning approaches can be used for big data analysis successfully. Applications include virtual assistants such as Alexa and Siri, facial recognition, personalization, natural language processing, autonomous cars, automatic handwriting generation, news aggregation, the colorization of black and white images, the addition of sound to silent films, pixel restoration, and deep dreaming. As a review, this paper aims to categorically cover several widely used deep learning algorithms along with their architectures and their practical applications: backpropagation, autoencoders, variational autoencoders, restricted Boltzmann machines, deep belief networks, convolutional neural networks, recurrent neural networks, generative adversarial networks, capsnets, transformer, embeddings from language models, bidirectional encoder representations from transformers, and attention in natural language processing. In addition, challenges of deep learning are also presented in this paper, such as AutoML-Zero, neural architecture search, evolutionary deep learning, and others. The pros and cons of these algorithms and their applications in healthcare are explored, alongside the future direction of this domain. This paper presents a review and a checkpoint to systemize the popular algorithms and to encourage further innovation regarding their applications. 
For new researchers in the field of deep learning, this review can help them to obtain many details about the advantages, disadvantages, applications, and working mechanisms of a number of deep learning algorithms. In addition, we introduce detailed information on how to apply several deep learning algorithms in healthcare, such as in relation to the COVID-19 pandemic. By presenting many challenges of deep learning in one section, we hope to increase awareness of these challenges, and how they can be dealt with. This could also motivate researchers to find solutions for these challenges.
|
37
|
Watson ER, Taherian Fard A, Mar JC. Computational Methods for Single-Cell Imaging and Omics Data Integration. Front Mol Biosci 2022; 8:768106. [PMID: 35111809 PMCID: PMC8801747 DOI: 10.3389/fmolb.2021.768106] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/14/2021] [Accepted: 11/29/2021] [Indexed: 12/12/2022]
Abstract
Integrating single cell omics and single cell imaging allows for a more effective characterisation of the underlying mechanisms that drive a phenotype at the tissue level, creating a comprehensive profile at the cellular level. Although the use of imaging data is well established in biomedical research, its primary application has been to observe phenotypes at the tissue or organ level, often using medical imaging techniques such as MRI, CT, and PET. These imaging technologies complement omics-based data in biomedical research because they are helpful for identifying associations between genotype and phenotype, along with functional changes occurring at the tissue level. Single cell imaging can act as an intermediary between these levels. Meanwhile new technologies continue to arrive that can be used to interrogate the genome of single cells and its related omics datasets. As these two areas, single cell imaging and single cell omics, each advance independently with the development of novel techniques, the opportunity to integrate these data types becomes more and more attractive. This review outlines some of the technologies and methods currently available for generating, processing, and analysing single-cell omics- and imaging data, and how they could be integrated to further our understanding of complex biological phenomena like ageing. We include an emphasis on machine learning algorithms because of their ability to identify complex patterns in large multidimensional data.
Affiliation(s)
- Atefeh Taherian Fard
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
- Jessica Cara Mar
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
|
38
|
Viaud G, Mayilvahanan P, Cournede PH. Representation Learning for the Clustering of Multi-Omics Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:135-145. [PMID: 33600320 DOI: 10.1109/tcbb.2021.3060340] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/12/2023]
Abstract
The integration of several sources of data for the identification of subtypes of diseases has gained attention over the past few years. The heterogeneity and the high dimensions of the data sets call for an adequate representation of the data. We summarize the field of representation learning for the multi-omics clustering problem and we investigate several techniques to learn relevant combined representations, using methods from group factor analysis (PCA, MFA and extensions) and from machine learning with autoencoders. We highlight the importance of appropriately designing and training the latter, notably with a novel combination of a disjointed deep autoencoder (DDAE) architecture and a layer-wise reconstruction loss. These different representations can then be clustered to identify biologically meaningful clusters of patients. We provide a unifying framework for model comparison between statistical and deep learning approaches with the introduction of a new weighted internal clustering index that evaluates how well the clustering information is retained from each source, favoring contributions from all data sets. We apply our methodology to two case studies for which previous works of integrative clustering exist, TCGA Breast Cancer and TARGET Neuroblastoma, and show how our method can yield good and well-balanced clusters across the different data sources.
|
39
|
Vijayakumar S, Magazzù G, Moon P, Occhipinti A, Angione C. A Practical Guide to Integrating Multimodal Machine Learning and Metabolic Modeling. Methods Mol Biol 2022; 2399:87-122. [PMID: 35604554 DOI: 10.1007/978-1-0716-1831-8_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Complex, distributed, and dynamic sets of clinical biomedical data are collectively referred to as multimodal clinical data. In order to accommodate the volume and heterogeneity of such diverse data types and aid in their interpretation when they are combined with a multi-scale predictive model, machine learning is a useful tool that can be wielded to deconstruct biological complexity and extract relevant outputs. Additionally, genome-scale metabolic models (GSMMs) are one of the main frameworks striving to bridge the gap between genotype and phenotype by incorporating prior biological knowledge into mechanistic models. Consequently, the utilization of GSMMs as a foundation for the integration of multi-omic data originating from different domains is a valuable pursuit towards refining predictions. In this chapter, we show how cancer multi-omic data can be analyzed via multimodal machine learning and metabolic modeling. Firstly, we focus on the merits of adopting an integrative, systems biology-led approach to biomedical data mining. Following this, we propose how constraint-based metabolic models can provide a stable yet adaptable foundation for the integration of multimodal data with machine learning. Finally, we provide a step-by-step tutorial for the combination of machine learning and GSMMs, which includes: (i) tissue-specific constraint-based modeling; (ii) survival analysis using time-to-event prediction for cancer; and (iii) classification and regression approaches for multimodal machine learning. The code associated with the tutorial can be found at https://github.com/Angione-Lab/Tutorials_Combining_ML_and_GSMM.
Affiliation(s)
- Supreeta Vijayakumar
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
- Giuseppe Magazzù
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
- Pradip Moon
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
- Annalisa Occhipinti
- Computational Systems Biology and Data Analytics Research Group, Middlesbrough, UK
- Centre for Digital Innovation, Teesside University, Middlesbrough, UK
- Claudio Angione
- Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
- Centre for Digital Innovation, Teesside University, Middlesbrough, UK
- Healthcare Innovation Centre, Teesside University, Middlesbrough, UK
|
40
|
Scherer P, Trębacz M, Simidjievski N, Viñas R, Shams Z, Terre HA, Jamnik M, Liò P. Unsupervised construction of computational graphs for gene expression data with explicit structural inductive biases. Bioinformatics 2021; 38:1320-1327. [PMID: 34888618 PMCID: PMC8826027 DOI: 10.1093/bioinformatics/btab830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2021] [Revised: 09/29/2021] [Accepted: 12/03/2021] [Indexed: 01/05/2023]
Abstract
MOTIVATION Gene expression data are commonly used at the intersection of cancer research and machine learning for better understanding of the molecular status of tumour tissue. Deep learning predictive models have been employed for gene expression data due to their ability to scale and remove the need for manual feature engineering. However, gene expression data are often very high dimensional, noisy and presented with a low number of samples. This poses significant problems for learning algorithms: models often overfit, learn noise and struggle to capture biologically relevant information. In this article, we utilize external biological knowledge embedded within structures of gene interaction graphs such as protein-protein interaction (PPI) networks to guide the construction of predictive models. RESULTS We present Gene Interaction Network Constrained Construction (GINCCo), an unsupervised method for automated construction of computational graph models for gene expression data that are structurally constrained by prior knowledge of gene interaction networks. We employ this methodology in a case study on incorporating a PPI network in cancer phenotype prediction tasks. Our computational graphs are structurally constructed using topological clustering algorithms on the PPI networks which incorporate inductive biases stemming from network biology research on protein complex discovery. Each of the entities in the GINCCo computational graph represents biological entities such as genes, candidate protein complexes and phenotypes instead of arbitrary hidden nodes of a neural network. This provides a biologically relevant mechanism for model regularization yielding strong predictive performance while drastically reducing the number of model parameters and enabling guided post-hoc enrichment analyses of influential gene sets with respect to target phenotypes. 
Our experiments analysing a variety of cancer phenotypes show that GINCCo often outperforms support vector machine, Fully Connected Multi-layer Perceptrons (MLP) and Randomly Connected MLPs despite greatly reduced model complexity. AVAILABILITY AND IMPLEMENTATION https://github.com/paulmorio/gincco contains the source code for our approach. We also release a library with algorithms for protein complex discovery within PPI networks at https://github.com/paulmorio/protclus. This repository contains implementations of the clustering algorithms used in this article. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
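As a rough illustration of what "structurally constrained by prior knowledge" can mean in practice, the sketch below masks a dense layer so that each hidden unit only receives input from the genes of one candidate protein complex. This is not the GINCCo implementation (the authors' repository is the reference for that); the function names `build_mask` and `constrained_forward`, the toy complexes, and the ReLU activation are all assumptions made for illustration.

```python
import numpy as np


def build_mask(n_genes, complexes):
    """Binary (n_genes, n_complexes) matrix: 1 where a gene belongs to a complex."""
    mask = np.zeros((n_genes, len(complexes)))
    for j, members in enumerate(complexes):
        mask[sorted(members), j] = 1.0
    return mask


def constrained_forward(x, weights, mask):
    """ReLU layer whose weights are zeroed wherever the mask forbids a connection."""
    return np.maximum(0.0, x @ (weights * mask))


# Hypothetical toy setup: 6 genes grouped into 3 candidate complexes.
rng = np.random.default_rng(0)
complexes = [{0, 1, 2}, {2, 3}, {4, 5}]
mask = build_mask(6, complexes)
weights = rng.normal(size=mask.shape)   # dense parameters, most are masked out
x = rng.normal(size=(4, 6))             # 4 samples of expression values
hidden = constrained_forward(x, weights, mask)

# Only 7 of the 18 possible gene-to-complex connections are active.
n_active = int(mask.sum())
```

Because each hidden unit corresponds to a named complex rather than an arbitrary node, its activation can be traced back to a specific gene set, which is the property the abstract highlights for post-hoc enrichment analysis.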
Affiliation(s)
- Paul Scherer
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK. To whom correspondence should be addressed.
- Maja Trębacz
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Nikola Simidjievski
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Ramon Viñas
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Zohreh Shams
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Helena Andres Terre
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Mateja Jamnik
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Pietro Liò
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
|
41
|
Harefa E, Zhou W. Performing sequential forward selection and variational autoencoder techniques in soil classification based on laser-induced breakdown spectroscopy. Analytical Methods: Advancing Methods and Applications 2021; 13:4926-4933. [PMID: 34610059 DOI: 10.1039/d1ay01257f] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
The feasibility and accuracy of several combination classification models, i.e., quadratic discriminant analysis (QDA), random forest (RF), Bernoulli naïve Bayes (BNB), and support vector machine (SVM) classification models combined with either sequential feature selection (SFS) or dimensionality reduction methods, for classifying soil with laser-induced breakdown spectroscopy (LIBS) were explored in this study. The algorithm combinations were compared to assess their classification performance. After eliminating the irrelevant features of the data using SFS, the performance of all four classification models improved, and the best accuracy, 97.88%, was reached by SFS-SVM. The dimensions of the data were then reduced using variational autoencoder (VAE), truncated singular value decomposition (TSVD), and isometric mapping (Isomap), respectively. The classification accuracy improved for all combination models with dimensionality reduction, and impressive accuracies of 98.12% from TSVD-SVM and 98.24% from VAE-SVM were obtained. These results demonstrate an effective way to reduce uncorrelated features, high dimensionality, and redundant information in the LIBS dataset. In addition, coupling classification models with feature selection and dimensionality reduction techniques could significantly optimize the classification performance of LIBS.
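The two strategies compared in this abstract, feature selection before an SVM and dimensionality reduction before an SVM, can be sketched with scikit-learn. This is only an illustration under stated assumptions: the bundled wine dataset stands in for LIBS spectra, and the number of retained features or components (5), the RBF kernel, and the cross-validation settings are arbitrary choices, not the authors' configuration.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): 178 samples, 13 features, 3 classes.
X, y = load_wine(return_X_y=True)

models = {
    # Forward sequential feature selection wrapped around an SVM, then an SVM.
    "SFS-SVM": make_pipeline(
        StandardScaler(),
        SequentialFeatureSelector(
            SVC(kernel="rbf"), n_features_to_select=5, direction="forward", cv=3
        ),
        SVC(kernel="rbf"),
    ),
    # Truncated SVD projection to 5 components, then an SVM.
    "TSVD-SVM": make_pipeline(
        StandardScaler(),
        TruncatedSVD(n_components=5, random_state=0),
        SVC(kernel="rbf"),
    ),
}

# Selection and projection are refit inside each fold, so the held-out fold
# never influences which features or components are kept.
results = {
    name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()
}
for name, accuracy in results.items():
    print(f"{name}: mean CV accuracy = {accuracy:.3f}")
```

Keeping the selector inside the pipeline is a deliberate design choice: selecting features on the full dataset before cross-validation would leak information and inflate the reported accuracy.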
Affiliation(s)
- Edward Harefa
- Key Laboratory of Optical Information Detection and Display Technology of Zhejiang, Zhejiang Normal University, Jinhua, 321004, China
- Weidong Zhou
- Key Laboratory of Optical Information Detection and Display Technology of Zhejiang, Zhejiang Normal University, Jinhua, 321004, China
|
42
|
Pratella D, Ait-El-Mkadem Saadi S, Bannwarth S, Paquis-Fluckinger V, Bottini S. A Survey of Autoencoder Algorithms to Pave the Diagnosis of Rare Diseases. Int J Mol Sci 2021; 22:10891. [PMID: 34639231 PMCID: PMC8509321 DOI: 10.3390/ijms221910891] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/14/2021] [Revised: 10/04/2021] [Accepted: 10/07/2021] [Indexed: 12/28/2022]
Abstract
Rare diseases (RDs) concern a broad range of disorders and can result from various origins. For a long time, the scientific community was unaware of RDs. Impressive progress has already been made for certain RDs; however, due to the lack of sufficient knowledge, many patients are not diagnosed. Nowadays, the advances in high-throughput sequencing technologies such as whole genome sequencing, single-cell and others, have boosted the understanding of RDs. To extract biological meaning using the data generated by these methods, different analysis techniques have been proposed, including machine learning algorithms. These methods have recently proven to be valuable in the medical field. Among such approaches, unsupervised learning methods via neural networks including autoencoders (AEs) or variational autoencoders (VAEs) have shown promising performances with applications on various types of data and in different contexts, from cancer to healthy patient tissues. In this review, we discuss how AEs and VAEs have been used in biomedical settings. Specifically, we discuss their current applications and the improvements achieved in the diagnosis and survival of patients. We focus on the applications in the field of RDs, and we discuss how the employment of AEs and VAEs would enhance RD understanding and diagnosis.
Affiliation(s)
- David Pratella
- Center of Modeling, Simulation and Interactions, Université Côte d’Azur, 06200 Nice, France
- Samira Ait-El-Mkadem Saadi
- Centre Hospitalier Universitaire (CHU) de Nice, Institute for Research on Cancer and Aging, Nice (IRCAN), Université Côte d’Azur, Inserm U1081, CNRS UMR 7284, 06200 Nice, France
- Sylvie Bannwarth
- Centre Hospitalier Universitaire (CHU) de Nice, Institute for Research on Cancer and Aging, Nice (IRCAN), Université Côte d’Azur, Inserm U1081, CNRS UMR 7284, 06200 Nice, France
- Véronique Paquis-Fluckinger
- Centre Hospitalier Universitaire (CHU) de Nice, Institute for Research on Cancer and Aging, Nice (IRCAN), Université Côte d’Azur, Inserm U1081, CNRS UMR 7284, 06200 Nice, France
- Silvia Bottini
- Center of Modeling, Simulation and Interactions, Université Côte d’Azur, 06200 Nice, France
|
43
|
Carrillo-Perez F, Morales JC, Castillo-Secilla D, Molina-Castro Y, Guillén A, Rojas I, Herrera LJ. Non-small-cell lung cancer classification via RNA-Seq and histology imaging probability fusion. BMC Bioinformatics 2021; 22:454. [PMID: 34551733 PMCID: PMC8456075 DOI: 10.1186/s12859-021-04376-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/23/2021] [Accepted: 09/11/2021] [Indexed: 12/14/2022]
Abstract
BACKGROUND Adenocarcinoma and squamous cell carcinoma are the two most prevalent lung cancer types, and their distinction requires different screenings, such as the visual inspection of histology slides by an expert pathologist, the analysis of gene expression or computed tomography scans, among others. In recent years, there has been an increasing gathering of biological data for decision support systems in the diagnosis (e.g. histology imaging, next-generation sequencing technologies data, clinical information, etc.). Using all these sources to design integrative classification approaches may improve the final diagnosis of a patient, in the same way that doctors can use multiple types of screenings to reach a final decision on the diagnosis. In this work, we present a late fusion classification model using histology and RNA-Seq data for adenocarcinoma, squamous-cell carcinoma and healthy lung tissue. RESULTS The classification model improves results over using each source of information separately, reducing the diagnosis error rate by up to 64% relative to the histology-only classifier and 24% relative to the gene-expression-only classifier, and reaching a mean F1-Score of 95.19% and a mean AUC of 0.991. CONCLUSIONS These findings suggest that a classification model using a late fusion methodology can considerably help clinicians in the diagnosis between the aforementioned lung cancer subtypes compared with using each source of information separately. This approach can also be applied to any cancer type or disease with heterogeneous sources of information.
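Late fusion combines the outputs of separately trained classifiers rather than their input features. A minimal sketch of one common variant, a weighted average of per-class probabilities, is shown below. The abstract does not state the exact fusion rule, so the averaging scheme, the function name `late_fusion`, the weight, and the example probabilities are all assumptions for illustration.

```python
import numpy as np


def late_fusion(prob_a, prob_b, weight_a=0.5):
    """Fuse two (n_samples, n_classes) probability matrices by weighted averaging."""
    prob_a = np.asarray(prob_a, dtype=float)
    prob_b = np.asarray(prob_b, dtype=float)
    if prob_a.shape != prob_b.shape:
        raise ValueError("both inputs must have shape (n_samples, n_classes)")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    fused = weight_a * prob_a + (1.0 - weight_a) * prob_b
    return fused, fused.argmax(axis=1)


# Hypothetical outputs of two classifiers for two samples and three classes
# (columns: adenocarcinoma, squamous-cell carcinoma, healthy).
histology = np.array([[0.50, 0.40, 0.10],
                      [0.20, 0.30, 0.50]])
rna_seq = np.array([[0.10, 0.80, 0.10],
                    [0.10, 0.20, 0.70]])

fused, labels = late_fusion(histology, rna_seq, weight_a=0.5)
print(fused)   # each row still sums to 1
print(labels)  # index of the most probable class per sample
```

In this toy case the histology classifier alone would assign the first sample to class 0, while the fused probabilities favour class 1, which shows how a confident second modality can overturn a marginal call. The weight would normally be tuned on validation data.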
Affiliation(s)
- Francisco Carrillo-Perez
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
- Juan Carlos Morales
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
- Daniel Castillo-Secilla
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
- Yésica Molina-Castro
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
- Alberto Guillén
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
- Ignacio Rojas
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
- Luis Javier Herrera
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
|
44
|
Dong X, Liu C, Dozmorov M. Review of multi-omics data resources and integrative analysis for human brain disorders. Brief Funct Genomics 2021; 20:223-234. [PMID: 33969380 PMCID: PMC8287916 DOI: 10.1093/bfgp/elab024] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 01/05/2021] [Revised: 03/05/2021] [Accepted: 04/12/2021] [Indexed: 12/20/2022]
Abstract
In the last decade, massive omics datasets have been generated for human brain research. The field is evolving so fast that a timely update is urgently needed. In this review, we summarize the main multi-omics data resources for the human brain, covering both healthy controls and neuropsychiatric disorders, including schizophrenia, autism, bipolar disorder, Alzheimer's disease, Parkinson's disease, progressive supranuclear palsy, etc. We also review the recent development of single-cell omics in brain research, such as single-nucleus RNA-seq, single-cell ATAC-seq and spatial transcriptomics. We further investigate the integrative multi-omics analysis methods for both tissue and single-cell data. Finally, we discuss the limitations and future directions of the multi-omics study of human brain disorders.
Affiliation(s)
- Xianjun Dong
- Harvard Medical School, head of the Genomics and Bioinformatics Hub at Brigham and Women’s Hospital
|
45
|
Tangherloni A, Ricciuti F, Besozzi D, Liò P, Cvejic A. Analysis of single-cell RNA sequencing data based on autoencoders. BMC Bioinformatics 2021; 22:309. [PMID: 34103004 PMCID: PMC8186186 DOI: 10.1186/s12859-021-04150-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/15/2021] [Accepted: 04/19/2021] [Indexed: 01/17/2023]
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-Seq) experiments are gaining ground to study the molecular processes that drive normal development as well as the onset of different pathologies. Finding an effective and efficient low-dimensional representation of the data is one of the most important steps in the downstream analysis of scRNA-Seq data, as it could provide a better identification of known or putatively novel cell-types. Another step that still poses a challenge is the integration of different scRNA-Seq datasets. Though standard computational pipelines to gain knowledge from scRNA-Seq data exist, a further improvement could be achieved by means of machine learning approaches. RESULTS Autoencoders (AEs) have been effectively used to capture the non-linearities among gene interactions of scRNA-Seq data, so that the deployment of AE-based tools might represent the way forward in this context. We introduce here scAEspy, a unifying tool that embodies: (1) four of the most advanced AEs, (2) two novel AEs that we developed on purpose, (3) different loss functions. We show that scAEspy can be coupled with various batch-effect removal tools to integrate data by different scRNA-Seq platforms, in order to better identify the cell-types. We benchmarked scAEspy against the most used batch-effect removal tools, showing that our AE-based strategies outperform the existing solutions. CONCLUSIONS scAEspy is a user-friendly tool that enables using the most recent and promising AEs to analyse scRNA-Seq data by only setting up two user-defined parameters. Thanks to its modularity, scAEspy can be easily extended to accommodate new AEs to further improve the downstream analysis of scRNA-Seq data. Considering the relevant results we achieved, scAEspy can be considered as a starting point to build a more comprehensive toolkit designed to integrate multi single-cell omics.
Affiliation(s)
- Andrea Tangherloni
- Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Cambridge, CB2 0AW UK
- Department of Haematology, University of Cambridge, Cambridge, CB2 0AW UK
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA UK
- Present Address: Department of Human and Social Sciences, University of Bergamo, 24129 Bergamo, Italy
- Federico Ricciuti
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Daniela Besozzi
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Bicocca Bioinformatics, Biostatistics and Bioimaging Centre (B4), Milan, Italy
- Pietro Liò
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD UK
- Ana Cvejic
- Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Cambridge, CB2 0AW UK
- Department of Haematology, University of Cambridge, Cambridge, CB2 0AW UK
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA UK
|
46
|
Performance Comparison of Deep Learning Autoencoders for Cancer Subtype Detection Using Multi-Omics Data. Cancers (Basel) 2021; 13:2013. [PMID: 33921978 PMCID: PMC8122584 DOI: 10.3390/cancers13092013] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 02/15/2021] [Revised: 03/29/2021] [Accepted: 04/06/2021] [Indexed: 12/14/2022]
Abstract
A heterogeneous disease such as cancer is activated through multiple pathways and different perturbations. Depending upon the activated pathway(s), the survival of the patients varies significantly and shows different efficacy to various drugs. Therefore, cancer subtype detection using genomics level data is a significant research problem. Subtype detection is often a complex problem, and in most cases, needs multi-omics data fusion to achieve accurate subtyping. Different data fusion and subtyping approaches have been proposed over the years, such as kernel-based fusion, matrix factorization, and deep learning autoencoders. In this paper, we compared the performance of different deep learning autoencoders for cancer subtype detection. We performed cancer subtype detection on four different cancer types from The Cancer Genome Atlas (TCGA) datasets using four autoencoder implementations. We also predicted the optimal number of subtypes in a cancer type using the silhouette score and found that the detected subtypes exhibit significant differences in survival profiles. Furthermore, we compared the effect of feature selection and similarity measures for subtype detection. For further evaluation, we used the Glioblastoma multiforme (GBM) dataset and identified the differentially expressed genes in each of the subtypes. The results obtained are consistent with other genomic studies and can be corroborated with the involved pathways and biological functions. Thus, it shows that the results from the autoencoders, obtained through the interaction of different datatypes of cancer, can be used for the prediction and characterization of patient subgroups and survival profiles.
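Choosing the number of subtypes with the silhouette score, as mentioned in this abstract, can be sketched as follows. The example is illustrative only: synthetic blobs stand in for an autoencoder's latent representation, and k-means, the candidate range, and the helper name `pick_n_clusters` are assumptions rather than the paper's pipeline.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def pick_n_clusters(features, candidates=range(2, 7), seed=0):
    """Return the candidate k with the highest mean silhouette score, plus all scores."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in (assumption) for latent features: three well-separated groups.
latent, _ = make_blobs(
    n_samples=150,
    centers=[[-10, -10], [0, 10], [10, -10]],
    cluster_std=1.0,
    random_state=0,
)

best_k, scores = pick_n_clusters(latent)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("selected number of clusters:", best_k)
```

The silhouette score only measures geometric separation, so in a real study the selected clustering would still need a check against outcomes such as survival differences, as the abstract describes.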
|
47
|
Yee NS. Machine intelligence for precision oncology. World J Transl Med 2021; 9:1-10. [DOI: 10.5528/wjtm.v9.i1.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2020] [Revised: 12/22/2020] [Accepted: 03/02/2021] [Indexed: 02/06/2023]
Abstract
Despite various advances in cancer research, the incidence and mortality rates of malignant diseases have remained high. Accurate risk assessment, prevention, detection, and treatment of cancer tailored to the individual are major challenges in clinical oncology. Artificial intelligence (AI), a field of applied computer science, has shown promising potential of accelerating evolution of healthcare towards precision oncology. This article focuses on highlights of the application of data-driven machine learning (ML) and deep learning (DL) in translational research for cancer diagnosis, prognosis, treatment, and clinical outcomes. ML-based algorithms in radiological and histological images have been demonstrated to improve detection and diagnosis of cancer. DL-based prediction models in molecular or multi-omics datasets of cancer for biomarkers and targets enable drug discovery and treatment. ML approaches combining radiomics with genomics and other omics data enhance the power of AI in improving diagnosis, prognostication, and treatment of cancer. Ethical and regulatory issues involving patient confidentiality and data security impose certain limitations on practical implementation of ML in clinical oncology. However, the ultimate goal of application of AI in cancer research is to develop and implement multi-modal machine intelligence for improving clinical decision on individualized management of patients.
Affiliation(s)
- Nelson S Yee
- Department of Medicine, The Pennsylvania State University College of Medicine, Penn State Cancer Institute, Penn State Health Milton S. Hershey Medical Center, Hershey, PA 17033-0850, United States
|
48
|
Vlachavas EI, Bohn J, Ückert F, Nürnberg S. A Detailed Catalogue of Multi-Omics Methodologies for Identification of Putative Biomarkers and Causal Molecular Networks in Translational Cancer Research. Int J Mol Sci 2021; 22:2822. [PMID: 33802234 PMCID: PMC8000236 DOI: 10.3390/ijms22062822] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/31/2021] [Revised: 03/05/2021] [Accepted: 03/05/2021] [Indexed: 02/06/2023]
Abstract
Recent advances in sequencing and biotechnological methodologies have led to the generation of large volumes of molecular data of different omics layers, such as genomics, transcriptomics, proteomics and metabolomics. Integration of these data with clinical information provides new opportunities to discover how perturbations in biological processes lead to disease. Using data-driven approaches for the integration and interpretation of multi-omics data could stably identify links between structural and functional information and propose causal molecular networks with potential impact on cancer pathophysiology. This knowledge can then be used to improve disease diagnosis, prognosis, prevention, and therapy. This review will summarize and categorize the most current computational methodologies and tools for integration of distinct molecular layers in the context of translational cancer research and personalized therapy. Additionally, the bioinformatics tools Multi-Omics Factor Analysis (MOFA) and netDX will be tested using omics data from public cancer resources, to assess their overall robustness, provide reproducible workflows for gaining biological knowledge from multi-omics data, and to comprehensively understand the significantly perturbed biological entities in distinct cancer types. We show that the performed supervised and unsupervised analyses result in meaningful and novel findings.
Collapse
Affiliation(s)
- Efstathios Iason Vlachavas
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Jonas Bohn
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Frank Ückert
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
- Sylvia Nürnberg
- Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
|
49
|
Kondylakis H, Axenie C, Kiran Bastola D, Katehakis DG, Kouroubali A, Kurz D, Larburu N, Macía I, Maguire R, Maramis C, Marias K, Morrow P, Muro N, Núñez-Benjumea FJ, Rampun A, Rivera-Romero O, Scotney B, Signorelli G, Wang H, Tsiknakis M, Zwiggelaar R. Status and Recommendations of Technological and Data-Driven Innovations in Cancer Care: Focus Group Study. J Med Internet Res 2020; 22:e22034. [PMID: 33320099 PMCID: PMC7772066 DOI: 10.2196/22034] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/01/2020] [Revised: 10/02/2020] [Accepted: 10/26/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND: The status of the data-driven management of cancer care as well as the challenges, opportunities, and recommendations aimed at accelerating the rate of progress in this field are topics of great interest. Two international workshops, one conducted in June 2019 in Cordoba, Spain, and one in October 2019 in Athens, Greece, were organized by four Horizon 2020 (H2020) European Union (EU)-funded projects: BOUNCE, CATCH ITN, DESIREE, and MyPal. The issues covered included patient engagement, knowledge and data-driven decision support systems, patient journey, rehabilitation, personalized diagnosis, trust, assessment of guidelines, and interoperability of information and communication technology (ICT) platforms. A series of recommendations was provided as the complex landscape of data-driven technical innovation in cancer care was portrayed.
OBJECTIVE: This study aims to provide information on the current state of the art of technology and data-driven innovations for the management of cancer care through the work of four EU H2020-funded projects.
METHODS: Two international workshops on ICT in the management of cancer care were held, and several topics were identified through discussion among the participants. A focus group was formulated after the second workshop, in which the status of technological and data-driven cancer management as well as the challenges, opportunities, and recommendations in this area were collected and analyzed.
RESULTS: Technical and data-driven innovations provide promising tools for the management of cancer care. However, several challenges must be successfully addressed, such as patient engagement, interoperability of ICT-based systems, knowledge management, and trust. This paper analyzes these challenges, which can be opportunities for further research and practical implementation and can provide practical recommendations for future work.
CONCLUSIONS: Technology and data-driven innovations are becoming an integral part of cancer care management. In this process, specific challenges need to be addressed, such as increasing trust and engaging the whole stakeholder ecosystem, to fully benefit from these innovations.
Affiliation(s)
- Cristian Axenie
- Audi Konfuzius-Institut Ingolstadt Lab, Technische Hochschule Ingolstadt, Ingolstadt, Germany
- Dhundy Kiran Bastola
- School of Interdisciplinary Informatics, University of Nebraska, Omaha, NE, United States
- Daria Kurz
- Interdisziplinäres Brustzentrum, Helios Klinikum München West, Munich, Germany
- Nekane Larburu
- Vicomtech, Health Research Institute, San Sebastian, Spain
- Iván Macía
- Vicomtech, Health Research Institute, San Sebastian, Spain
- Roma Maguire
- University of Strathclyde, Glasgow, United Kingdom
- Christos Maramis
- eHealth Lab, Institute of Applied Biosciences - Centre for Research & Technology Hellas, Thessaloniki, Greece
- Philip Morrow
- School of Computing, Ulster University, Newtownabbey, United Kingdom
- Naiara Muro
- Vicomtech, Health Research Institute, San Sebastian, Spain
- Andrik Rampun
- Academic Unit of Radiology, Department of Infection, Immunity and Cardiovascular Disease, University of Sheffield, Sheffield, United Kingdom
- Bryan Scotney
- School of Computing, Ulster University, Newtownabbey, United Kingdom
- Hui Wang
- School of Computing and Engineering, University of West London, London, United Kingdom
- Reyer Zwiggelaar
- Department of Computer Science, Aberystwyth University, Aberystwyth, United Kingdom
|
50
|
Biswas N, Chakrabarti S. Artificial Intelligence (AI)-Based Systems Biology Approaches in Multi-Omics Data Analysis of Cancer. Front Oncol 2020; 10:588221. [PMID: 33154949 PMCID: PMC7591760 DOI: 10.3389/fonc.2020.588221] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 07/28/2020] [Accepted: 09/21/2020] [Indexed: 12/13/2022]
Abstract
Cancer is the manifestation of abnormalities of different physiological processes involving genes, DNAs, RNAs, proteins, and other biomolecules whose profiles are reflected in different omics data types. As these bio-entities are highly correlated, integrative analysis of different types of omics data, multi-omics data, is required to understand the disease from tumorigenesis to disease progression. Artificial intelligence (AI), specifically machine learning algorithms, has the ability to make decisive interpretation of "big"-sized complex data and, hence, appears as the most effective tool for the analysis and understanding of multi-omics data for patient-specific observations. In this review, we have discussed the recent outcomes of employing AI in multi-omics data analysis of different types of cancer. Based on the research trends and significance in patient treatment, we have primarily focused on the AI-based analysis for determining cancer subtypes, disease prognosis, and therapeutic targets. We have also discussed AI analysis of some non-canonical types of omics data, as they have the capability of playing a determining role in cancer patient care. Additionally, we have briefly discussed the data repositories because of their pivotal role in multi-omics data storage, processing, and analysis.
Affiliation(s)
- Nupur Biswas
- Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, IICB TRUE Campus, Kolkata, India
- Saikat Chakrabarti
- Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, IICB TRUE Campus, Kolkata, India
|