1
Chen H, Lu D, Xiao Z, Li S, Zhang W, Luan X, Zhang W, Zheng G. Comprehensive applications of the artificial intelligence technology in new drug research and development. Health Inf Sci Syst 2024; 12:41. [PMID: 39130617 PMCID: PMC11310389 DOI: 10.1007/s13755-024-00300-y] [Received: 08/31/2023] [Accepted: 07/27/2024] [Indexed: 08/13/2024]
Abstract
Purpose: The target-based strategy is a prevalent approach to drug research and development (R&D), since targets provide the effector molecules of drug action and the foundation for pharmacological investigation. Recently, artificial intelligence (AI) technology has been used at various stages of drug R&D, where AI-assisted experimental methods show higher efficiency than purely experimental ones. A comprehensive review of AI applications in drug R&D is therefore needed for the biopharmaceutical field. Methods: Relevant literature on AI-assisted drug R&D was collected from public databases (including Google Scholar, Web of Science, PubMed, IEEE Xplore Digital Library, Springer, and ScienceDirect) through a keyword search strategy with the following terms: [("Artificial Intelligence" OR "Knowledge Graph" OR "Machine Learning") AND ("Drug Target Identification" OR "New Drug Development")]. Results: In this review, we first introduce common strategies and novel trends in drug R&D, followed by a description of the characteristics of AI algorithms widely used in drug R&D. Subsequently, we describe detailed applications of AI algorithms in target identification, lead compound identification and optimization, drug repurposing, and drug analytical platform construction. Finally, we discuss the challenges and prospects of AI-assisted methods for drug discovery. Conclusion: Collectively, this review provides a comprehensive overview of AI applications in drug R&D and presents future perspectives for the biopharmaceutical field, which may promote the development of the drug industry.
Affiliation(s)
- Hongyu Chen
  - Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Dong Lu
  - Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Ziyi Xiao
  - Johns Hopkins Bloomberg School of Public Health, Baltimore, MD USA
- Shensuo Li
  - Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Wen Zhang
  - Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Xin Luan
  - Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Weidong Zhang
  - Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Guangyong Zheng
  - Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
2
Tan J, Xie J, Huang J, Deng W, Chai H, Yang Y. An interpretable survival model for diffuse large B-cell lymphoma patients using a biologically informed visible neural network. Comput Struct Biotechnol J 2024; 24:523-532. [PMID: 39211335 PMCID: PMC11357880 DOI: 10.1016/j.csbj.2024.07.019] [Received: 01/19/2024] [Revised: 07/06/2024] [Accepted: 07/23/2024] [Indexed: 09/04/2024]
Abstract
Diffuse large B-cell lymphoma (DLBCL) is the most common subtype of non-Hodgkin lymphoma (NHL) and is characterized by high heterogeneity. Assessment of its prognosis and genetic subtyping hold significant clinical implications. However, existing DLBCL prognostic models are mainly based on transcriptomic profiles, while genetic variation detection is more commonly used in clinical practice. In addition, current clustering-based subtyping methods mostly focus on genes with high mutation frequencies, providing insufficient explanations for the heterogeneity of DLBCL. Here, we proposed VNNSurv (https://bio-web1.nscc-gz.cn/app/VNNSurv), a survival model for DLBCL patients based on a biologically informed visible neural network (VNN). VNNSurv achieved an average C-index of 0.72 on the cross-validation set (HMRN cohort, n = 928), outperforming the baseline methods. The remarkable interpretability of VNNSurv facilitated the identification of the most impactful genes and the underlying pathways through which they act on patient outcomes. When only the 30 highest-impact genes were used as genetic input, the overall performance of VNNSurv improved, and a C-index of 0.70 was achieved on the external TCGA cohort (n = 48). Leveraging these high-impact genes, including 16 genes with low (<5 %) alteration frequencies, we devised a genetic-based prognostic index (GPI) for risk stratification and a subtype identification method. We stratified the patient group according to the International Prognostic Index (IPI) into three risk grades with significant prognostic differences. Furthermore, the defined subtypes exhibited greater prognostic consistency than clustering-based methods. Broadly, VNNSurv is a valuable DLBCL survival model. Its high interpretability has significant value for precision medicine, and its framework is scalable to other diseases.
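The C-index values quoted in this abstract measure how well predicted risks rank patients by survival time. The following is a minimal sketch of Harrell's concordance index in plain Python, included only to illustrate the metric; it is not the VNNSurv evaluation code, and the function name and toy data are assumptions made for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : observed follow-up times
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # correctly ranked pair
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, one of them censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [0.9, 0.7, 0.2, 0.4]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.72 and 0.70 should be read.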
Affiliation(s)
- Jie Tan
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
  - Guangzhou KingMed Center for Clinical Laboratory Co. Ltd., Guangzhou, China
- Jiancong Xie
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Jiarong Huang
  - School of Mathematics and Big Data, Foshan University, Foshan, China
- Weizhen Deng
  - School of Mathematics and Big Data, Foshan University, Foshan, China
- Hua Chai
  - School of Mathematics and Big Data, Foshan University, Foshan, China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
  - Key Laboratory of Machine Intelligence and Advanced Computing of MOE, Sun Yat-sen University, Guangzhou, China
3
Acharya D, Mukhopadhyay A. A comprehensive review of machine learning techniques for multi-omics data integration: challenges and applications in precision oncology. Brief Funct Genomics 2024; 23:549-560. [PMID: 38600757 DOI: 10.1093/bfgp/elae013] [Received: 10/13/2023] [Revised: 03/12/2024] [Accepted: 03/22/2024] [Indexed: 04/12/2024]
Abstract
Multi-omics data play a crucial role in precision medicine, mainly in understanding the diverse biological interactions between different omics layers. Machine learning approaches have been extensively employed in this context over the years. This review aims to comprehensively summarize and categorize these advancements, focusing on the integration of multi-omics data, which includes genomics, transcriptomics, proteomics and metabolomics, alongside clinical data. We discuss various machine learning techniques and computational methodologies used for integrating distinct omics datasets and provide valuable insights into their application. The review emphasizes both the challenges and opportunities present in multi-omics data integration, precision medicine and patient stratification, offering practical recommendations for method selection in various scenarios. Recent advances in deep learning and network-based approaches are also explored, highlighting their potential to harmonize diverse biological information layers. Additionally, we present a roadmap for the integration of multi-omics data in precision oncology, outlining the advantages, challenges and implementation difficulties. Hence, this review offers a thorough overview of the current literature, providing researchers with insights into machine learning techniques for patient stratification, particularly in precision oncology. Contact: anirban@klyuniv.ac.in.
Affiliation(s)
- Debabrata Acharya
  - Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
- Anirban Mukhopadhyay
  - Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
4
Chai H, Deng W, Wei J, Guan T, He M, Liang Y, Li L. A Contrastive-Learning-Based Deep Neural Network for Cancer Subtyping by Integrating Multi-Omics Data. Interdiscip Sci 2024:10.1007/s12539-024-00641-y. [PMID: 39230797 DOI: 10.1007/s12539-024-00641-y] [Received: 01/03/2024] [Revised: 06/07/2024] [Accepted: 06/11/2024] [Indexed: 09/05/2024]
Abstract
BACKGROUND: Accurate identification of cancer subtypes is crucial for disease prognosis evaluation and personalized patient management. Recent advances in computational methods have demonstrated that multi-omics data provide valuable insights into tumor molecular subtyping. However, the high dimensionality and small sample size of the data may result in ambiguous and overlapping cancer subtypes during clustering. In this study, we propose a novel contrastive-learning-based approach to address this issue. The proposed end-to-end deep learning method can extract crucial information from the multi-omics features by self-supervised learning for patient clustering. RESULTS: By applying our method to nine public cancer datasets, we demonstrated superior performance compared to existing methods in separating patients with different survival outcomes (p < 0.05). To further evaluate the impact of various omics data on cancer survival, we developed an XGBoost classification model and found that mRNA had the highest importance score, followed by DNA methylation and miRNA. In the presented case study, our method successfully clustered subtypes and identified 14 cancer-related genes, of which 12 (85.7%) were validated through literature review. CONCLUSIONS: Our findings demonstrate that our method is capable of identifying cancer subtypes that are both statistically and biologically significant. The code for COLCS is available at: https://github.com/Mercuriiio/COLCS.
Affiliation(s)
- Hua Chai
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Weizhen Deng
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Junyu Wei
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Ting Guan
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Minfan He
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Yong Liang
  - Peng Cheng Laboratory, Shenzhen, 518055, China
- Le Li
  - Faculty of Innovation Engineering, Macau University of Science and Technology, Macao, 999078, China
  - Peng Cheng Laboratory, Shenzhen, 518055, China
5
Abbasi AF, Asim MN, Ahmed S, Vollmer S, Dengel A. Survival prediction landscape: an in-depth systematic literature review on activities, methods, tools, diseases, and databases. Front Artif Intell 2024; 7:1428501. [PMID: 39021434 PMCID: PMC11252047 DOI: 10.3389/frai.2024.1428501] [Received: 05/07/2024] [Accepted: 06/12/2024] [Indexed: 07/20/2024]
Abstract
Survival prediction integrates patient-specific molecular information and clinical signatures to forecast the anticipated time of an event, such as recurrence, death, or disease progression. Survival prediction proves valuable in guiding treatment decisions, optimizing resource allocation, and informing precision-medicine interventions. The wide range of diseases, the existence of various variants within the same disease, and the reliance on available data necessitate disease-specific computational survival predictors. The widespread adoption of artificial intelligence (AI) methods in crafting survival predictors has undoubtedly revolutionized this field. However, the ever-increasing demand for more sophisticated and effective prediction models necessitates continued innovation. To catalyze these advancements, it is crucial to bring the knowledge and insights from existing survival predictors into a centralized platform. This paper thoroughly examines 23 existing review studies and provides a concise overview of their scope and limitations. Focusing on a comprehensive set of the 90 most recent survival predictors across 44 diverse diseases, it examines the diverse types of methods used in the development of disease-specific predictors. This exhaustive analysis encompasses the data modalities used, along with a detailed analysis of subsets of clinical features, feature engineering methods, and the specific statistical, machine learning, or deep learning approaches that have been employed. It also provides insights about survival prediction data sources, open-source predictors, and survival prediction frameworks.
Affiliation(s)
- Ahtisham Fazeel Abbasi
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
  - Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Muhammad Nabeel Asim
  - Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Sheraz Ahmed
  - Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Sebastian Vollmer
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
  - Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
- Andreas Dengel
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, Germany
  - Smart Data & Knowledge Services, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany
6
Zhang ZW, Zhang KX, Liao X, Quan Y, Zhang HY. Evolutionary screening of precision oncology biomarkers and its applications in prognostic model construction. iScience 2024; 27:109859. [PMID: 38799582 PMCID: PMC11126775 DOI: 10.1016/j.isci.2024.109859] [Received: 10/10/2023] [Revised: 03/15/2024] [Accepted: 04/27/2024] [Indexed: 05/29/2024]
Abstract
Biomarker screening is critical for precision oncology. However, one of the main challenges in precision oncology is that the screened biomarkers often fail to achieve the expected clinical effects and are rarely approved by regulatory authorities. Considering the close association between cancer pathogenesis and the evolutionary events of organisms, we first explored the evolutionary features underlying clinically approved biomarkers and identified two of them (Ohnologs and specific evolutionary stages of genes). Subsequently, we utilized these evolutionary features to screen potential prognostic biomarkers in four common cancers: head and neck squamous cell carcinoma, liver hepatocellular carcinoma, lung adenocarcinoma, and lung squamous cell carcinoma. Finally, we constructed an evolution-strengthened prognostic model (ESPM) for cancers. These models can effectively predict cancer patients' survival time across different cancer cohorts and perform better than conventional models. In summary, our study highlights the application potential of evolutionary information in precision oncology biomarker screening.
Affiliation(s)
- Zhi-Wen Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Ke-Xin Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Xuan Liao
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Yuan Quan
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Hong-Yu Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
7
Chai H, Huang Y, Xu L, Song X, He M, Wang Q. A decentralized federated learning-based cancer survival prediction method with privacy protection. Heliyon 2024; 10:e31873. [PMID: 38845954 PMCID: PMC11153246 DOI: 10.1016/j.heliyon.2024.e31873] [Received: 07/18/2023] [Revised: 05/18/2024] [Accepted: 05/23/2024] [Indexed: 06/09/2024]
Abstract
Background: Survival prediction is one of the crucial goals in precision medicine, as accurate survival assessment can aid physicians in selecting appropriate treatment for individual patients. To achieve this aim, extensive data must be utilized to train the prediction model and prevent overfitting. However, the collection of patient data for disease prediction is challenging due to potential variations in data sources across institutions and concerns regarding privacy and ownership in data sharing. To facilitate the integration of cancer data from different institutions without violating privacy laws, we developed a federated learning-based data integration framework called AdFed, which can be used to evaluate patients' survival while addressing privacy protection through decentralized federated learning and a regularization method. Results: AdFed was tested on different cancer datasets that contain patient information from different institutions. The experimental results show that AdFed using distributed data can achieve better performance in cancer survival prediction (AUC = 0.605) than the compared federated-learning-based methods (average AUC = 0.554). Additionally, to assess the biological interpretability of our method, in the case study we list 10 genes related to liver cancer selected by AdFed, among which 5 are supported by the literature. Conclusions: The results indicate that AdFed outperforms other federated-learning-based methods, and that the interpretable algorithm can select biologically significant genes and pathways while ensuring the confidentiality and integrity of data.
Affiliation(s)
- Hua Chai
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Yiqian Huang
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Lekai Xu
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Xinpeng Song
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Minfan He
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Qingyong Wang
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
  - Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Hefei, 230036, China
8
Liu Z, Petinrin OO, Toseef M, Chen N, Wong KC. Construction of Immune Infiltration-Related LncRNA Signatures Based on Machine Learning for the Prognosis in Colon Cancer. Biochem Genet 2024; 62:1925-1952. [PMID: 37792224 DOI: 10.1007/s10528-023-10516-4] [Received: 07/10/2023] [Accepted: 09/05/2023] [Indexed: 10/05/2023]
Abstract
Colon cancer is a malignant tumor with high morbidity, lethality, and prevalence worldwide. Molecular biomarkers play key roles in its prognosis. In particular, immune-related lncRNAs (IRL) have attracted enormous interest in diagnosis and treatment, but less is known about their potential functions. We aimed to investigate dysfunctional IRL and construct a risk model for improving patient outcomes. Nineteen immune cell types were collected for identifying house-keeping lncRNAs (HKLncRNA). GSE39582 and TCGA-COAD were treated as the discovery and validation datasets, respectively. Four machine learning algorithms (LASSO, Random Forest, Boruta, and XGBoost) and a Gaussian mixture model were utilized to mine the optimal combination of lncRNAs. Univariate and multivariate Cox regression were utilized to construct the risk score model. We characterized the functional differences, from an immune perspective, between the low- and high-risk cohorts defined by this scoring system. Finally, we provided a nomogram. By leveraging the microarray, sequencing, and clinical data for immune cells and colon cancer patients, we identified 221 HKLncRNAs with a low cell type-specificity index. Eighty-seven lncRNAs were up-regulated in immune cells compared with cancer cells. Twelve lncRNAs were beneficial in improving performance. A risk score model with three lncRNAs (CYB561D2, LINC00638, and DANCR) was proposed, with robust ROC performance on an independent dataset. According to immune-related analysis, the risk score is strongly associated with the tumor immune microenvironment. Our results emphasize that IRL have the potential to be powerful and effective in improving the prognosis of colon cancer.
Affiliation(s)
- Zhe Liu
  - Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Muhammad Toseef
  - Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Nanjun Chen
  - Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Hong Kong, China
9
Yang P, Qiu H, Yang X, Wang L, Wang X. SAGL: A self-attention-based graph learning framework for predicting survival of colorectal cancer patients. Computer Methods and Programs in Biomedicine 2024; 249:108159. [PMID: 38583291 DOI: 10.1016/j.cmpb.2024.108159] [Received: 09/25/2023] [Revised: 02/28/2024] [Accepted: 03/29/2024] [Indexed: 04/09/2024]
Abstract
BACKGROUND AND OBJECTIVE: Colorectal cancer (CRC) is one of the most commonly diagnosed cancers worldwide. Accurate survival prediction for CRC patients plays a significant role in the formulation of treatment strategies. Recently, machine learning and deep learning approaches have been increasingly applied in cancer survival prediction. However, most existing methods inadequately represent and leverage the dependencies among features and fail to sufficiently mine and utilize the comorbidity patterns of CRC. To address these issues, we propose a self-attention-based graph learning (SAGL) framework to improve postoperative cancer-specific survival prediction for CRC patients. METHODS: We present a novel method for constructing a dependency graph (DG) that reflects two types of dependencies: comorbidity-comorbidity dependencies and the dependencies between features related to patient characteristics and cancer treatments. This graph is subsequently refined by a disease comorbidity network, which offers a holistic view of the comorbidity patterns of CRC. A DG-guided self-attention mechanism is proposed to unearth novel dependencies beyond what the DG offers, thus augmenting CRC survival prediction. Finally, a representation is learned for each patient, and these representations are used for survival prediction. RESULTS: The experimental results show that SAGL outperforms state-of-the-art methods on a real-world dataset, with the area under the receiver operating characteristic curve for 3- and 5-year survival prediction reaching 0.849±0.002 and 0.895±0.005, respectively. In addition, comparisons with different graph neural network-based variants demonstrate the advantages of our DG-guided self-attention graph learning framework. CONCLUSIONS: Our study reveals the potential of DG-guided self-attention for optimizing feature graph learning, which can improve the performance of CRC survival prediction.
Affiliation(s)
- Ping Yang
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Hang Qiu
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
  - Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Xulin Yang
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Liya Wang
  - Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Xiaodong Wang
  - Department of Gastrointestinal Surgery, West China Hospital, Sichuan University, Chengdu, 610041, PR China
10
Pitchika V, Büttner M, Schwendicke F. Artificial intelligence and personalized diagnostics in periodontology: A narrative review. Periodontol 2000 2024; 95:220-231. [PMID: 38927004 DOI: 10.1111/prd.12586] [Received: 02/06/2024] [Revised: 04/29/2024] [Accepted: 06/07/2024] [Indexed: 06/28/2024]
Abstract
Periodontal diseases pose a significant global health burden, requiring early detection and personalized treatment approaches. Traditional diagnostic approaches in periodontology often rely on a "one size fits all" approach, which may overlook the unique variations in disease progression and response to treatment among individuals. This narrative review explores the role of artificial intelligence (AI) and personalized diagnostics in periodontology, emphasizing the potential for tailored diagnostic strategies to enhance precision medicine in periodontal care. The review begins by elucidating the limitations of conventional diagnostic techniques. Subsequently, it delves into the application of AI models in analyzing diverse data sets, such as clinical records, imaging, and molecular information, and its role in periodontal training. Furthermore, the review also discusses the role of research community and policymakers in integrating personalized diagnostics in periodontal care. Challenges and ethical considerations associated with adopting AI-based personalized diagnostic tools are also explored, emphasizing the need for transparent algorithms, data safety and privacy, ongoing multidisciplinary collaboration, and patient involvement. In conclusion, this narrative review underscores the transformative potential of AI in advancing periodontal diagnostics toward a personalized paradigm, and their integration into clinical practice holds the promise of ushering in a new era of precision medicine for periodontal care.
Affiliation(s)
- Vinay Pitchika
  - Department of Conservative Dentistry and Periodontology, LMU University Hospital, LMU Munich, Munich, Germany
- Martha Büttner
  - Department of Oral Diagnostics, Digital Health and Health Services Research, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Falk Schwendicke
  - Department of Conservative Dentistry and Periodontology, LMU University Hospital, LMU Munich, Munich, Germany
11
Tang X, Prodduturi N, Thompson KJ, Weinshilboum RM, O'Sullivan CC, Boughey JC, Tizhoosh H, Klee EW, Wang L, Goetz MP, Suman V, Kalari KR. OmicsFootPrint: a framework to integrate and interpret multi-omics data using circular images and deep neural networks. bioRxiv 2024:2024.03.21.586001. [PMID: 38585820 PMCID: PMC10996492 DOI: 10.1101/2024.03.21.586001] [Indexed: 04/09/2024]
Abstract
The OmicsFootPrint framework addresses the need for advanced multi-omics data analysis methodologies by transforming data into intuitive two-dimensional circular images and facilitating the interpretation of complex diseases. Utilizing Deep Neural Networks and incorporating the SHapley Additive exPlanations (SHAP) algorithm, the framework enhances model interpretability. Tested with The Cancer Genome Atlas (TCGA) data, OmicsFootPrint effectively classified lung and breast cancer subtypes, achieving high Area Under Curve (AUC) scores: 0.98±0.02 for lung cancer subtype differentiation and 0.83±0.07 for breast cancer PAM50 subtypes. It also successfully distinguished between invasive lobular and ductal carcinomas in breast cancer, showcasing its robustness. It further demonstrated notable performance in predicting drug responses in cancer cell lines, with a median AUC of 0.74, surpassing existing algorithms. Furthermore, its effectiveness persists even with reduced training sample sizes. OmicsFootPrint marks an enhancement in multi-omics research, offering a novel, efficient, and interpretable approach that contributes to a deeper understanding of disease mechanisms.
12
Lan W, Liao H, Chen Q, Zhu L, Pan Y, Chen YPP. DeepKEGG: a multi-omics data integration framework with biological insights for cancer recurrence prediction and biomarker discovery. Brief Bioinform 2024; 25:bbae185. [PMID: 38678587 PMCID: PMC11056029 DOI: 10.1093/bib/bbae185] [Received: 01/13/2024] [Revised: 03/07/2024] [Accepted: 04/09/2024] [Indexed: 05/01/2024]
Abstract
Deep learning-based multi-omics data integration methods have the capability to reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore the potential correlations between samples in integrating multi-omics data. In addition, providing accurate biological explanations still poses significant challenges due to the complexity of deep learning models. Therefore, there is an urgent need for a deep learning-based multi-omics integration method to explore the potential correlations between samples and provide model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module is designed for local connections of neuron nodes and model interpretability based on the biological relationship between genes/miRNAs and pathways. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and generate the potential pathway feature representation for enhancing the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is utilized to discover biomarkers related to cancer recurrence and provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross validation. Furthermore, case studies also indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.
Affiliation(s)
- Wei Lan
  - Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
- Haibo Liao
  - Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
- Qingfeng Chen
  - Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
- Lingzhi Zhu
  - School of Computer and Information Science, Hunan Institute of Technology, No. 18 Henghua Road, Zhuhui District, Hengyang 421002, China
- Yi Pan
  - School of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Avenue, Shenzhen University Town, Nanshan District, Shenzhen 518055, China
- Yi-Ping Phoebe Chen
  - Department of Computer Science and Information Technology, La Trobe University, Plenty Rd, Bundoora, Melbourne, Victoria 3086, Australia
13
Chai H, Lin S, Lin J, He M, Yang Y, OuYang Y, Zhao H. An uncertainty-based interpretable deep learning framework for predicting breast cancer outcome. BMC Bioinformatics 2024; 25:88. [PMID: 38418940 PMCID: PMC10902951 DOI: 10.1186/s12859-024-05716-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 02/21/2024] [Indexed: 03/02/2024] Open
Abstract
BACKGROUND Predicting outcome of breast cancer is important for selecting appropriate treatments and prolonging the survival periods of patients. Recently, different deep learning-based methods have been carefully designed for cancer outcome prediction. However, the application of these methods is still challenged by interpretability. In this study, we proposed a novel multitask deep neural network called UISNet to predict the outcome of breast cancer. The UISNet is able to interpret the importance of features for the prediction model via an uncertainty-based integrated gradients algorithm. UISNet improved the prediction by introducing prior biological pathway knowledge and utilizing patient heterogeneity information. RESULTS The model was tested in seven public datasets of breast cancer, and showed better performance (average C-index = 0.691) than the state-of-the-art methods (average C-index = 0.650, ranged from 0.619 to 0.677). Importantly, the UISNet identified 20 genes as associated with breast cancer, among which 11 have been proven to be associated with breast cancer by previous studies, and others are novel findings of this study. CONCLUSIONS Our proposed method is accurate and robust in predicting breast cancer outcomes, and it is an effective way to identify breast cancer-associated genes. The method codes are available at: https://github.com/chh171/UISNet .
Affiliation(s)
- Hua Chai
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Siyin Lin
  - School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, 510000, China
- Junqi Lin
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Minfan He
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Yuedong Yang
  - School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, 510000, China
- Yongzhong OuYang
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Huiying Zhao
  - Department of Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, 510000, China
14
Díaz-Campos MÁ, Vasquez-Arriaga J, Ochoa S, Hernández-Lemus E. Functional impact of multi-omic interactions in lung cancer. Front Genet 2024; 15:1282241. [PMID: 38389572 PMCID: PMC10881857 DOI: 10.3389/fgene.2024.1282241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 01/23/2024] [Indexed: 02/24/2024] Open
Abstract
Lung tumors are a leading cause of cancer-related death worldwide. Lung cancers are highly heterogeneous in their phenotypes, at both the cellular and molecular levels. Efforts to better understand the biological origins and outcomes of lung cancer in terms of this enormous variability often require high-throughput experimental techniques paired with advanced data analytics. Anticipated advancements in multi-omic methodologies hold potential to reveal a broader molecular perspective of these tumors. This study introduces a theoretical and computational framework for generating network models depicting regulatory constraints on biological functions in a semi-automated way. The approach successfully identifies enriched functions in analyzed omics data, focusing on Adenocarcinoma (LUAD) and Squamous cell carcinoma (LUSC, a type of NSCLC) in the lung. Valuable information about novel regulatory characteristics, supported by robust biological reasoning, is illustrated, for instance by considering the role of genes, miRNAs and CpG sites associated with NSCLC, both novel and previously reported. Utilizing multi-omic regulatory networks, we constructed robust models elucidating omics data interconnectedness, enabling systematic generation of mechanistic hypotheses. These findings offer insights into complex regulatory mechanisms underlying these cancer types, paving the way for further exploring their molecular complexity.
Affiliation(s)
- Jorge Vasquez-Arriaga
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Soledad Ochoa
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Department of Obstetrics and Gynecology, Cedars-Sinai Medical Center, Los Angeles, CA, United States
- Enrique Hernández-Lemus
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
15
Mohammed Zaidh S, Aher KB, Bhavar GB, Irfan N, Ahmed HN, Ismail Y. Genes adaptability and NOL6 protein inhibition studies of fabricated flavan-3-ols lead skeleton intended to treat breast carcinoma. Int J Biol Macromol 2024; 258:127661. [PMID: 37898257 DOI: 10.1016/j.ijbiomac.2023.127661] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/10/2023] [Revised: 08/10/2023] [Accepted: 10/23/2023] [Indexed: 10/30/2023]
Abstract
Breast cancer affects 2.3 million women worldwide and is the second leading cause of cancer-related mortality. Finding a new site-specific and safe small molecule is a current need in this field. With the aid of deep learning algorithms, we analyzed the published large-scale database from the cancer cBioPortal to find the best target protein. Further, multi-omics analyses such as enrichment analysis, molecular scores, RNA biological function at the cellular level, and protein domains were obtained and matched to find better hit molecules. The gene analysis output shows that nucleolar protein 6 (NOL6) plays a significant role in breast carcinoma, and 354 natural and synthetic lead molecules were docked inside its active site. Docking yielded the hit molecule flavan-3-ols with a binding score of -5.325 kcal/mol, and interaction analysis shows 13 active amino acids favoring the binding interaction with functional groups of the hit molecule, compared to the standard molecule abemaciclib (-2.857 kcal/mol). The best docked complex of flavan-3-ols and NOL6 protein was subjected to a 100 ns dynamics simulation to study its stability. The results proved that π-π stacked, carbon-hydrogen and electrostatic interactions are stable throughout the 100 ns simulation. The overall results lead to the conclusion that the hit molecule flavan-3-ol will be a safe and potent lead molecule for treating breast carcinoma patients.
Affiliation(s)
- S Mohammed Zaidh
  - Crescent School of Pharmacy, BS Abdur Rahman Crescent Institute of Science and Technology, Chennai 600048, India
- Kiran Balasaheb Aher
  - Department of Pharmaceutical Quality Assurance, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, Maharashtra 424001, India
- Girija Balasaheb Bhavar
  - Department of Pharmaceutical Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, Maharashtra 424001, India
- N Irfan
  - Crescent School of Pharmacy, BS Abdur Rahman Crescent Institute of Science and Technology, Chennai 600048, India
- Haja Nazeer Ahmed
  - Crescent School of Pharmacy, BS Abdur Rahman Crescent Institute of Science and Technology, Chennai 600048, India
- Y Ismail
  - Crescent School of Pharmacy, BS Abdur Rahman Crescent Institute of Science and Technology, Chennai 600048, India
16
Cai Y, Wang S. Deeply integrating latent consistent representations in high-noise multi-omics data for cancer subtyping. Brief Bioinform 2024; 25:bbae061. [PMID: 38426322 PMCID: PMC10939425 DOI: 10.1093/bib/bbae061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 01/13/2024] [Accepted: 01/29/2024] [Indexed: 03/02/2024] Open
Abstract
Cancer is a complex and high-mortality disease regulated by multiple factors. Accurate cancer subtyping is crucial for formulating personalized treatment plans and improving patient survival rates. The underlying mechanisms that drive cancer progression can be comprehensively understood by analyzing multi-omics data. However, the high noise levels in omics data often pose challenges in capturing consistent representations and adequately integrating their information. This paper proposed a novel variational autoencoder-based deep learning model, named Deeply Integrating Latent Consistent Representations (DILCR). Firstly, multiple independent variational autoencoders and contrastive loss functions were designed to separate noise from omics data and capture latent consistent representations. Subsequently, an Attention Deep Integration Network was proposed to integrate consistent representations across different omics levels effectively. Additionally, we introduced the Improved Deep Embedded Clustering algorithm to make the integrated variables clustering-friendly. The effectiveness of DILCR was evaluated using 10 typical cancer datasets from The Cancer Genome Atlas and compared with 14 state-of-the-art integration methods. The results demonstrated that DILCR effectively captures the consistent representations in omics data and outperforms other integration methods in cancer subtyping. In the Kidney Renal Clear Cell Carcinoma case study, DILCR identified cancer subtypes with marked biological significance and interpretability.
Affiliation(s)
- Yueyi Cai
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Shunfang Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
17
Rakhshaninejad M, Fathian M, Shirkoohi R, Barzinpour F, Gandomi AH. Refining breast cancer biomarker discovery and drug targeting through an advanced data-driven approach. BMC Bioinformatics 2024; 25:33. [PMID: 38253993 PMCID: PMC10810249 DOI: 10.1186/s12859-024-05657-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 01/15/2024] [Indexed: 01/24/2024] Open
Abstract
Breast cancer remains a major public health challenge worldwide. The identification of accurate biomarkers is critical for the early detection and effective treatment of breast cancer. This study utilizes an integrative machine learning approach to analyze breast cancer gene expression data for superior biomarker and drug target discovery. Gene expression datasets, obtained from the GEO database, were merged post-preprocessing. From the merged dataset, differential expression analysis between breast cancer and normal samples revealed 164 differentially expressed genes. Meanwhile, a separate gene expression dataset revealed 350 differentially expressed genes. Additionally, the BGWO_SA_Ens algorithm, integrating binary grey wolf optimization and simulated annealing with an ensemble classifier, was employed on gene expression datasets to identify predictive genes including TOP2A, AKR1C3, EZH2, MMP1, EDNRB, S100B, and SPP1. From over 10,000 genes, BGWO_SA_Ens identified 1404 in the merged dataset (F1 score: 0.981, PR-AUC: 0.998, ROC-AUC: 0.995) and 1710 in the GSE45827 dataset (F1 score: 0.965, PR-AUC: 0.986, ROC-AUC: 0.972). The intersection of DEGs and BGWO_SA_Ens selected genes revealed 35 superior genes that were consistently significant across methods. Enrichment analyses uncovered the involvement of these superior genes in key pathways such as AMPK, Adipocytokine, and PPAR signaling. Protein-protein interaction network analysis highlighted subnetworks and central nodes. Finally, a drug-gene interaction investigation revealed connections between superior genes and anticancer drugs. Collectively, the machine learning workflow identified a robust gene signature for breast cancer, illuminated their biological roles, interactions and therapeutic associations, and underscored the potential of computational approaches in biomarker discovery and precision oncology.
Affiliation(s)
- Morteza Rakhshaninejad
  - Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Mohammad Fathian
  - Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Reza Shirkoohi
  - Cancer Biology Research Center, Cancer Institute, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Keshavarz Boulevard, Tehran, 1419733141, Tehran, Iran
- Farnaz Barzinpour
  - Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Amir H Gandomi
  - Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, 2007, NSW, Australia
  - University Research and Innovation Center (EKIK), Óbuda University, Budapest, 1034, Hungary
18
Li B, Nabavi S. A multimodal graph neural network framework for cancer molecular subtype classification. BMC Bioinformatics 2024; 25:27. [PMID: 38225583 PMCID: PMC10789042 DOI: 10.1186/s12859-023-05622-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Accepted: 12/15/2023] [Indexed: 01/17/2024] Open
Abstract
BACKGROUND The recent development of high-throughput sequencing has created a large collection of multi-omics data, which enables researchers to better investigate cancer molecular profiles and cancer taxonomy based on molecular subtypes. Integrating multi-omics data has been proven to be effective for building more precise classification models. Most current multi-omics integrative models use either an early fusion in the form of concatenation or late fusion with a separate feature extractor for each omic, which are mainly based on deep neural networks. Due to the nature of biological systems, graphs are a better structural representation of bio-medical data. Although a few graph neural network (GNN) based multi-omics integrative methods have been proposed, they suffer from three common disadvantages. First, most of them use only one type of connection, either inter-omics or intra-omic connections; second, they only consider one kind of GNN layer, either graph convolution network (GCN) or graph attention network (GAT); and third, most of these methods have not been tested on a more complex classification task, such as cancer molecular subtypes. RESULTS In this study, we propose a novel end-to-end multi-omics GNN framework for accurate and robust cancer subtype classification. The proposed model utilizes multi-omics data in the form of heterogeneous multi-layer graphs, which combine both inter-omics and intra-omic connections from established biological knowledge. The proposed model incorporates learned graph features and global genome features for accurate classification. We tested the proposed model on the Cancer Genome Atlas (TCGA) Pan-cancer dataset and TCGA breast invasive carcinoma (BRCA) dataset for molecular subtype and cancer subtype classification, respectively. The proposed model shows superior performance compared to four current state-of-the-art baseline models in terms of accuracy, F1 score, precision, and recall. The comparative analysis of GAT-based models and GCN-based models reveals that GAT-based models are preferred for smaller graphs with less information and GCN-based models are preferred for larger graphs with extra information.
Affiliation(s)
- Bingjun Li
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, USA
- Sheida Nabavi
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, USA
19
Tong L, Shi W, Isgut M, Zhong Y, Lais P, Gloster L, Sun J, Swain A, Giuste F, Wang MD. Integrating Multi-Omics Data With EHR for Precision Medicine Using Advanced Artificial Intelligence. IEEE Rev Biomed Eng 2024; 17:80-97. [PMID: 37824325 DOI: 10.1109/rbme.2023.3324264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2023]
Abstract
With the recent advancement of novel biomedical technologies such as high-throughput sequencing and wearable devices, multi-modal biomedical data ranging from multi-omics molecular data to real-time continuous bio-signals are generated at an unprecedented speed and scale every day. For the first time, these multi-modal biomedical data are able to make precision medicine close to a reality. However, due to their volume and complexity, making good use of these multi-modal biomedical data requires major effort. Researchers and clinicians are actively developing artificial intelligence (AI) approaches for data-driven knowledge discovery and causal inference using a variety of biomedical data modalities. These AI-based approaches have demonstrated promising results in various biomedical and healthcare applications. In this review paper, we summarize the state-of-the-art AI models for integrating multi-omics data and electronic health records (EHRs) for precision medicine. We discuss the challenges and opportunities in integrating multi-omics data with EHRs and future directions. We hope this review can inspire future research and development in integrating multi-omics data with EHRs for precision medicine.
20
Wang H, Han X, Ren J, Cheng H, Li H, Li Y, Li X. A prognostic prediction model for ovarian cancer using a cross-modal view correlation discovery network. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2024; 21:736-764. [PMID: 38303441 DOI: 10.3934/mbe.2024031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Ovarian cancer is a tumor with different clinicopathological and molecular features, and the vast majority of patients have local or extensive spread at the time of diagnosis. Early diagnosis and prognostic prediction of patients can contribute to the understanding of the underlying pathogenesis of ovarian cancer and the improvement of therapeutic outcomes. The occurrence of ovarian cancer is influenced by multiple complex mechanisms, including the genome, transcriptome and proteome. Different types of omics analysis help predict the survival rate of ovarian cancer patients. Multi-omics data of ovarian cancer exhibit high-dimensional heterogeneity, and existing methods for integrating multi-omics data have not taken into account the variability and inter-correlation between different omics data. In this paper, we propose a deep learning model, MDCADON, which utilizes multi-omics data and a cross-modal view correlation discovery network. We introduce random forest into LASSO regression for feature selection on mRNA expression, DNA methylation, miRNA expression and copy number variation (CNV), aiming to select important features highly correlated with ovarian cancer prognosis. A multi-modal deep neural network is used to comprehensively learn feature representations of each omics data type and the clinical data, and the cross-modal view correlation discovery network is employed to construct the multi-omics discovery tensor, exploring the inter-relationships between different omics data. The experimental results demonstrate that MDCADON is superior to the existing methods in predicting ovarian cancer prognosis, which enables survival analysis for patients and facilitates the determination of follow-up treatment plans. Finally, we perform Gene Ontology (GO) term analysis and biological pathway analysis on the genes identified by MDCADON, revealing the underlying mechanisms of ovarian cancer and providing certain support for guiding ovarian cancer treatments.
Affiliation(s)
- Huiqing Wang
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Xiao Han
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Jianxue Ren
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Hao Cheng
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Haolin Li
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Ying Li
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
- Xue Li
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
21
Liu H, Shi Y, Li A, Wang M. Multi-modal fusion network with intra- and inter-modality attention for prognosis prediction in breast cancer. Comput Biol Med 2024; 168:107796. [PMID: 38064843 DOI: 10.1016/j.compbiomed.2023.107796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 11/20/2023] [Accepted: 11/29/2023] [Indexed: 01/10/2024]
Abstract
Accurate breast cancer prognosis prediction can help clinicians to develop appropriate treatment plans and improve quality of life for patients. Recent prognostic prediction studies suggest that fusing multi-modal data, e.g., genomic data and pathological images, plays a crucial role in improving predictive performance. Despite promising results of existing approaches, there remain challenges in effective multi-modal fusion. First, albeit a powerful fusion technique, Kronecker product produces high-dimensional quadratic expansion of features that may result in high computational cost and overfitting risk, thereby limiting its performance and applicability in cancer prognosis prediction. Second, most existing methods pay more attention to learning cross-modality relations between different modalities, ignoring modality-specific relations that are complementary to cross-modality relations and beneficial for cancer prognosis prediction. To address these challenges, in this study we propose a novel attention-based multi-modal network to accurately predict breast cancer prognosis, which efficiently models both modality-specific and cross-modality relations without bringing in high-dimensional features. Specifically, two intra-modality self-attentional modules and an inter-modality cross-attentional module, accompanied by latent space transformation of channel affinity matrix, are developed to successfully capture modality-specific and cross-modality relations for efficient integration of genomic data and pathological images, respectively. Moreover, we design an adaptive fusion block to take full advantage of both modality-specific and cross-modality relations. Comprehensive experiments demonstrate that our method can effectively boost prognosis prediction performance of breast cancer and compares favorably with the state-of-the-art methods.
Affiliation(s)
- Honglei Liu
  - School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
- Yi Shi
  - School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
- Ao Li
  - School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
- Minghui Wang
  - School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
22
Lin YT, Zhou Q, Tan J, Tao Y. Multimodal and multi-omics-based deep learning model for screening of optic neuropathy. Heliyon 2023; 9:e22244. [PMID: 38046141 PMCID: PMC10686864 DOI: 10.1016/j.heliyon.2023.e22244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 12/05/2023] Open
Abstract
Purpose To examine the use of multimodal data and multi-omics strategies for optic nerve disease screening. Methods This was a single-center retrospective study. A deep learning model was created from fundus photography and infrared reflectance (IR) images of patients with diabetic optic neuropathy, glaucomatous optic neuropathy, and optic neuritis. Patients who were seen at the Ophthalmology Department of First Affiliated Hospital of Nanchang University in Jiangxi Province from November 2019 to April 2023 were included in this study. The data were analyzed in single and multimodal modes following the traditional omics, Resnet101, and fusion models. The accuracy and area-under-the-curve (AUC) of each model were compared. Results A total of 312 fundus and infrared fundus photographs were collected from 156 patients. When multi-modal data was used, the accuracy of the traditional omics, Resnet101, and fusion models with the training set was 0.97, 0.98, and 0.99, respectively. The accuracy of the same models with the test sets was 0.72, 0.87, and 0.88, respectively. We compared single- and multi-mode states by applying the data to the different groups in the learning model. In the traditional omics model, the macro-average AUCs of the features extracted from fundus photography, IR images, and multimodal data were 0.94, 0.90, and 0.96, respectively. When the same data were processed in the Resnet101 model, the scores were all 0.97. However, when multimodal data was utilized, the macro-average AUCs in the traditional omics, Resnet101, and fusion models were 0.96, 0.97, and 0.99, respectively. Conclusion The deep learning model based on multimodal data and multi-omics strategies can improve the accuracy of screening and diagnosing diabetic optic neuropathy, glaucomatous optic neuropathy, and optic neuritis.
Affiliation(s)
- Ye-ting Lin
  - Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, China
- Qiong Zhou
  - Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, China
- Jian Tan
  - Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, China
- Yulin Tao
  - Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, China
23
Zhu S, Wang W, Fang W, Cui M. Autoencoder-assisted latent representation learning for survival prediction and multi-view clustering on multi-omics cancer subtyping. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:21098-21119. [PMID: 38124589 DOI: 10.3934/mbe.2023933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Cancer subtyping (or cancer subtypes identification) based on multi-omics data has played an important role in advancing diagnosis, prognosis and treatment, which triggers the development of advanced multi-view clustering algorithms. However, the high dimensionality and heterogeneity of multi-omics data greatly affect the performance of these methods. In this paper, we propose to learn the informative latent representation based on autoencoder (AE) to naturally capture nonlinear omic features in lower dimensions, which is helpful for identifying the similarity of patients. Moreover, to take advantage of survival information or clinical information, a multi-omic survival analysis approach is embedded when integrating the similarity graph of heterogeneous data at the multi-omics level. Then, the clustering method is performed on the integrated similarity to generate subtype groups. In the experimental part, the effectiveness of the proposed framework is confirmed by evaluating five different multi-omics datasets, taken from The Cancer Genome Atlas. The results show that the AE-assisted multi-omics clustering method can identify clinically significant cancer subtypes.
Affiliation(s)
- Shuwei Zhu
  - School of Artificial Intelligence and Computer Science, Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China
- Wenping Wang
  - School of Artificial Intelligence and Computer Science, Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China
- Wei Fang
  - School of Artificial Intelligence and Computer Science, Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China
- Meiji Cui
  - School of Intelligent Manufacturing, Nanjing University of Science and Technology, Nanjing 210094, China
24
Ranjbari S, Arslanturk S. Integration of incomplete multi-omics data using Knowledge Distillation and Supervised Variational Autoencoders for disease progression prediction. J Biomed Inform 2023; 147:104512. [PMID: 37813325 DOI: 10.1016/j.jbi.2023.104512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 08/31/2023] [Accepted: 10/03/2023] [Indexed: 10/11/2023]
Abstract
OBJECTIVE The rapid advancement of high-throughput technologies in the biomedical field has resulted in the accumulation of diverse omics data types, such as mRNA expression, DNA methylation, and microRNA expression, for studying various diseases. Integrating these multi-omics datasets enables a comprehensive understanding of the molecular basis of cancer and facilitates accurate prediction of disease progression. METHODS However, conventional approaches face challenges due to the dimensionality curse problem. This paper introduces a novel framework called Knowledge Distillation and Supervised Variational AutoEncoders utilizing View Correlation Discovery Network (KD-SVAE-VCDN) to address the integration of high-dimensional multi-omics data with limited common samples. Through our experimental evaluation, we demonstrate that the proposed KD-SVAE-VCDN architecture accurately predicts the progression of breast and kidney carcinoma by effectively classifying patients as long- or short-term survivors. Furthermore, our approach outperforms other state-of-the-art multi-omics integration models. RESULTS Our findings highlight the efficacy of the KD-SVAE-VCDN architecture in predicting the disease progression of breast and kidney carcinoma. By enabling the classification of patients based on survival outcomes, our model contributes to personalized and targeted treatments. The favorable performance of our approach in comparison to several existing models suggests its potential to contribute to the advancement of cancer understanding and management. CONCLUSION The development of a robust predictive model capable of accurately forecasting disease progression at the time of diagnosis holds immense promise for advancing personalized medicine. By leveraging multi-omics data integration, our proposed KD-SVAE-VCDN framework offers an effective solution to this challenge, paving the way for more precise and tailored treatment strategies for patients with different types of cancer.
Affiliation(s)
- Sima Ranjbari
- Department of Computer Science, Wayne State University, Detroit, 48202, MI, USA.
- Suzan Arslanturk
- Department of Computer Science, Wayne State University, Detroit, 48202, MI, USA.
|
25
|
Ellen JG, Jacob E, Nikolaou N, Markuzon N. Autoencoder-based multimodal prediction of non-small cell lung cancer survival. Sci Rep 2023; 13:15761. [PMID: 37737469 PMCID: PMC10517020 DOI: 10.1038/s41598-023-42365-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2023] [Accepted: 09/09/2023] [Indexed: 09/23/2023] Open
Abstract
The ability to accurately predict non-small cell lung cancer (NSCLC) patient survival is crucial for informing physician decision-making, and the increasing availability of multi-omics data offers the promise of enhancing prognosis predictions. We present a multimodal integration approach that leverages microRNA, mRNA, DNA methylation, long non-coding RNA (lncRNA) and clinical data to predict NSCLC survival and identify patient subtypes, utilizing denoising autoencoders for data compression and integration. Survival performance for patients with lung adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) was compared across modality combinations and data integration methods. Using The Cancer Genome Atlas data, our results demonstrate that survival prediction models combining multiple modalities outperform single modality models. The highest performance was achieved with a combination of only two modalities, lncRNA and clinical, at concordance indices (C-indices) of 0.69 ± 0.03 for LUAD and 0.62 ± 0.03 for LUSC. Models utilizing all five modalities achieved mean C-indices of 0.67 ± 0.04 and 0.63 ± 0.02 for LUAD and LUSC, respectively, while the best individual modality performance reached C-indices of 0.64 ± 0.03 for LUAD and 0.59 ± 0.03 for LUSC. Analysis of biological differences revealed two distinct survival subtypes with over 900 differentially expressed transcripts.
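The concordance index (C-index) quoted in this abstract measures how often a model ranks a pair of patients in the right order of survival. The following is a minimal sketch of Harrell's C-index in plain Python, included only to illustrate the metric; it is not the authors' evaluation code, and the function name and toy data are assumptions made for the example.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    times  -- observed follow-up times
    events -- 1 if the event (death) was observed, 0 if censored
    risks  -- predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A pair is comparable only when the earlier time is an observed event.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # higher risk died earlier: correct order
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported values of 0.62 to 0.69 in context.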
Affiliation(s)
- Jacob G Ellen
- Institute of Health Informatics, University College London, London, UK.
- Etai Jacob
- AstraZeneca, Oncology Data Science, Waltham, MA, USA
|
26
|
Wang Q, He M, Guo L, Chai H. AFEI: adaptive optimized vertical federated learning for heterogeneous multi-omics data integration. Brief Bioinform 2023; 24:bbad269. [PMID: 37497720 DOI: 10.1093/bib/bbad269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2023] [Revised: 06/26/2023] [Accepted: 07/04/2023] [Indexed: 07/28/2023] Open
Abstract
Vertical federated learning has gained popularity as a means of enabling collaboration and information sharing between different entities while maintaining data privacy and security. This approach has potential applications in healthcare, cancer prognosis prediction, and other fields where data privacy is a major concern. Although using multi-omics data for cancer prognosis prediction provides more information for treatment selection, collecting different types of omics data can be challenging because they are produced in various medical institutions. Data owners must comply with strict data protection regulations such as the European Union (EU) General Data Protection Regulation. To share patient data across multiple institutions, privacy and security issues must be addressed. Therefore, we propose AFEI, an adaptive optimized vertical federated learning framework for heterogeneous multi-omics data integration, to integrate multi-omics data collected from multiple institutions for cancer prognosis prediction. AFEI enables participating parties to build an accurate joint evaluation model for learning more information related to cancer patients from different perspectives, based on the distributed and encrypted multi-omics features shared by multiple institutions. The experimental results demonstrate that AFEI achieves higher prediction accuracy (6.5% on average) than using single omics data by utilizing the encrypted multi-omics data from different institutions, and it performs almost as well as prognosis prediction by directly integrating multi-omics data. Overall, AFEI can be seen as an efficient solution for breaking down barriers to multi-institutional collaboration and promoting the development of cancer prognosis prediction.
Affiliation(s)
- Qingyong Wang
- School of Information and Computer, Anhui Agricultural University, Hefei 230000, China
- Minfan He
- School of Mathematics and Big Data, Foshan University, Foshan 528000, China
- Longyi Guo
- Guangdong Provincial Hospital of Traditional Chinese Medical, Guangzhou 510000, China
- Hua Chai
- School of Mathematics and Big Data, Foshan University, Foshan 528000, China
|
27
|
Zhu J, Oh JH, Simhal AK, Elkin R, Norton L, Deasy JO, Tannenbaum A. Geometric graph neural networks on multi-omics data to predict cancer survival outcomes. Comput Biol Med 2023; 163:107117. [PMID: 37329617 PMCID: PMC10638676 DOI: 10.1016/j.compbiomed.2023.107117] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2023] [Revised: 05/25/2023] [Accepted: 05/30/2023] [Indexed: 06/19/2023]
Abstract
The advance of sequencing technologies has enabled a thorough molecular characterization of the genome in human cancers. To improve patient prognosis predictions and subsequent treatment strategies, it is imperative to develop advanced computational methods to analyze large-scale, high-dimensional genomic data. However, traditional machine learning methods face a challenge in handling the high-dimensional, low-sample size problem that is shown in most genomic data sets. To address this, our group has developed geometric network analysis techniques on multi-omics data in connection with prior biological knowledge derived from protein-protein interactions (PPIs) or pathways. Geometric features obtained from the genomic network, such as Ollivier-Ricci curvature and the invariant measure of the associated Markov chain, have been shown to be predictive of survival outcomes in various cancers. In this study, we propose a novel supervised deep learning method called geometric graph neural network (GGNN) that incorporates such geometric features into deep learning for enhanced predictive power and interpretability. More specifically, we utilize a state-of-the-art graph neural network with sparse connections between the hidden layers based on known biology of the PPI network and pathway information. Geometric features along with multi-omics data are then incorporated into the corresponding layers. The proposed approach utilizes a local-global principle in such a manner that highly predictive features are selected at the front layers and fed directly to the last layer for multivariable Cox proportional-hazards regression modeling. The method was applied to multi-omics data from the CoMMpass study of multiple myeloma and ten major cancers in The Cancer Genome Atlas (TCGA). In most experiments, our method showed superior predictive performance compared to other alternative methods.
Affiliation(s)
- Jiening Zhu
- Department of Applied Mathematics & Statistics, Stony Brook University, NY, USA.
- Jung Hun Oh
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA.
- Anish K Simhal
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA.
- Rena Elkin
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA.
- Larry Norton
- Department of Medicine, Memorial Sloan Kettering Cancer Center, NY, USA.
- Joseph O Deasy
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, NY, USA.
- Allen Tannenbaum
- Department of Applied Mathematics & Statistics, Stony Brook University, NY, USA; Department of Computer Science, Stony Brook University, NY, USA.
|
28
|
Shi Y, Zhang Q, Mei J, Liu J. Editorial: Multi-omics analysis in tumor microenvironment and tumor heterogeneity. Front Genet 2023; 14:1271295. [PMID: 37680200 PMCID: PMC10482244 DOI: 10.3389/fgene.2023.1271295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2023] [Accepted: 08/17/2023] [Indexed: 09/09/2023] Open
Affiliation(s)
- Yuxin Shi
- Department of Oncology, The Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, Wuxi, Jiangsu, China
- Qinglin Zhang
- Department of Gastroenterology, The Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, Wuxi, Jiangsu, China
- Jie Mei
- Department of Oncology, The Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, Wuxi, Jiangsu, China
- Jinhui Liu
- Department of Gynecology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu, China
|
29
|
Wekesa JS, Kimwele M. A review of multi-omics data integration through deep learning approaches for disease diagnosis, prognosis, and treatment. Front Genet 2023; 14:1199087. [PMID: 37547471 PMCID: PMC10398577 DOI: 10.3389/fgene.2023.1199087] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2023] [Accepted: 07/11/2023] [Indexed: 08/08/2023] Open
Abstract
Accurate diagnosis is the key to providing prompt and explicit treatment and disease management. The recognized biological method for the molecular diagnosis of infectious pathogens is polymerase chain reaction (PCR). Recently, deep learning approaches have come to play a vital role in accurately identifying disease-related genes for diagnosis, prognosis, and treatment. The models reduce the time and cost required by wet-lab experimental procedures. Consequently, sophisticated computational approaches have been developed to facilitate the detection of cancer, a leading cause of death globally, and other complex diseases. In this review, we systematically evaluate the recent trends in multi-omics data analysis based on deep learning techniques and their application in disease prediction. We highlight the current challenges in the field and discuss how advances in deep learning methods and their optimization for application are vital in overcoming them. Ultimately, this review promotes the development of novel deep-learning methodologies for data integration, which is essential for disease detection and treatment.
|
30
|
Lee M. Deep Learning Techniques with Genomic Data in Cancer Prognosis: A Comprehensive Review of the 2021-2023 Literature. BIOLOGY 2023; 12:893. [PMID: 37508326 PMCID: PMC10376033 DOI: 10.3390/biology12070893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/17/2023] [Revised: 06/16/2023] [Accepted: 06/20/2023] [Indexed: 07/30/2023]
Abstract
Deep learning has brought about a significant transformation in machine learning, leading to an array of novel methodologies and consequently broadening its influence. The application of deep learning in various sectors, especially biomedical data analysis, has initiated a period filled with noteworthy scientific developments. This trend has majorly influenced cancer prognosis, where the interpretation of genomic data for survival analysis has become a central research focus. The capacity of deep learning to decode intricate patterns embedded within high-dimensional genomic data has provoked a paradigm shift in our understanding of cancer survival. Given the swift progression in this field, there is an urgent need for a comprehensive review that focuses on the most influential studies from 2021 to 2023. This review, through its careful selection and thorough exploration of dominant trends and methodologies, strives to fulfill this need. The paper aims to enhance our existing understanding of applications of deep learning in cancer survival analysis, while also highlighting promising directions for future research in this vibrant and rapidly proliferating field.
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
|
31
|
Wang L, Bao Y, Yu F, Zhu W, Wang JL, Yang J, Xie H, Huang D. Development of gene model combined with machine learning technology to predict for advanced atherosclerotic plaques. Clin Neurol Neurosurg 2023; 231:107819. [PMID: 37315377 DOI: 10.1016/j.clineuro.2023.107819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2023] [Revised: 05/03/2023] [Accepted: 06/04/2023] [Indexed: 06/16/2023]
Abstract
BACKGROUND Atherosclerosis, as a major cause of stroke, is responsible for a quarter of deaths worldwide. In particular, rupture of late-stage plaques in large vessels such as the carotid artery can lead to serious cardiovascular disease. The aim of our study was to establish a genetic model combined with machine learning techniques to screen out gene signatures and predict advanced atherosclerotic plaques. METHODS The microarray datasets GSE28829 and GSE43292, publicly obtained from the Gene Expression Omnibus database, were utilized to screen for potential predictive genes. Differentially expressed genes (DEGs) were identified by using the "limma" R package. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses of these DEGs were performed by Metascape. Later, the Random Forest (RF) algorithm was applied to further screen out the top 30 genes that contribute the most. The expression data of the top 30 DEGs were converted into a "Gene Score". Finally, we developed a model based on an artificial neural network (ANN) to predict advanced atherosclerotic plaques. The model was later validated in an independent test dataset, GSE104140. RESULTS A total of 176 DEGs were identified in the training datasets. GO and KEGG enrichment analysis revealed that these genes were enriched in leukocyte-mediated immune response, cytokine-cytokine interactions, and immunoinflammatory signaling. Further, the top 30 genes (including 25 upregulated and 5 downregulated DEGs) were screened as predictors by the RF algorithm. The predictive model was developed with a significant predictive value (AUC = 0.913) in the training datasets, and was validated with the independent dataset GSE104140 (AUC = 0.827). CONCLUSION In the present study, our prediction model was established and showed satisfactory predictive power in both training and test datasets. In addition, this is the first study to adopt bioinformatics methods combined with machine learning techniques (RF and ANN) to explore and predict advanced atherosclerotic plaques. However, further investigations are needed to verify the screened DEGs and the predictive effectiveness of this model.
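The AUC values reported here (0.913 in training, 0.827 in validation) can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes AUC through that pairwise definition in plain Python. It is illustrative only: the function name and toy scores are assumptions, and the abstract does not say how the authors computed their values.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels -- 1 for positive samples (e.g. advanced plaque), 0 for negative
    scores -- model outputs, higher meaning more likely positive
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical toy predictions for four samples.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The quadratic loop is fine for small cohorts such as microarray datasets; a rank-based implementation would be preferable for large ones.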
Affiliation(s)
- Lufeng Wang
- Department of Neurology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Yiwen Bao
- Department of Neurology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Fei Yu
- Department of Neurology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Wenxia Zhu
- Department of Neurology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Jun Lang Wang
- Department of Imaging, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Jie Yang
- Department of Neurology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
- Hongrong Xie
- Department of Neurology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China.
- Dongya Huang
- Department of Neurology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China.
|
32
|
Choi JM, Chae H. moBRCA-net: a breast cancer subtype classification framework based on multi-omics attention neural networks. BMC Bioinformatics 2023; 24:169. [PMID: 37101124 PMCID: PMC10131354 DOI: 10.1186/s12859-023-05273-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2022] [Accepted: 04/05/2023] [Indexed: 04/28/2023] Open
Abstract
BACKGROUND Breast cancer is a highly heterogeneous disease that comprises multiple biological components. Owing to its diversity, patients have different prognostic outcomes; hence, early diagnosis and accurate subtype prediction are critical for treatment. Standardized breast cancer subtyping systems, mainly based on single-omics datasets, have been developed to ensure proper treatment in a systematic manner. Recently, multi-omics data integration has attracted attention to provide a comprehensive view of patients but poses a challenge due to the high dimensionality. In recent years, deep learning-based approaches have been proposed, but they still present several limitations. RESULTS In this study, we describe moBRCA-net, an interpretable deep learning-based breast cancer subtype classification framework that uses multi-omics datasets. Three omics datasets comprising gene expression, DNA methylation and microRNA expression data were integrated while considering the biological relationships among them, and a self-attention module was applied to each omics dataset to capture the relative importance of each feature. The features were then transformed to new representations considering the respective learned importance, allowing moBRCA-net to predict the subtype. CONCLUSIONS Experimental results confirmed that moBRCA-net has a significantly enhanced performance compared with other methods, and the effectiveness of multi-omics integration and omics-level attention was confirmed. moBRCA-net is publicly available at https://github.com/cbi-bioinfo/moBRCA-net.
Affiliation(s)
- Joung Min Choi
- Department of Computer Science, Virginia Tech, Blacksburg, USA
- Heejoon Chae
- Division of Computer Science, Sookmyung Women's University, Seoul, Republic of Korea.
|
33
|
Wissel D, Rowson D, Boeva V. Systematic comparison of multi-omics survival models reveals a widespread lack of noise resistance. CELL REPORTS METHODS 2023; 3:100461. [PMID: 37159669 PMCID: PMC10162996 DOI: 10.1016/j.crmeth.2023.100461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/14/2022] [Revised: 02/01/2023] [Accepted: 03/30/2023] [Indexed: 05/11/2023]
Abstract
As observed in several previous studies, integrating more molecular modalities in multi-omics cancer survival models may not always improve model accuracy. In this study, we compared eight deep learning and four statistical integration techniques for survival prediction on 17 multi-omics datasets, examining model performance in terms of overall accuracy and noise resistance. We found that one deep learning method, mean late fusion, and two statistical methods, PriorityLasso and BlockForest, performed best in terms of both noise resistance and overall discriminative and calibration performance. Nevertheless, all methods struggled to adequately handle noise when too many modalities were added. In summary, we confirmed that current multi-omics survival methods are not sufficiently noise resistant. We recommend relying on only modalities for which there is known predictive value for a particular cancer type until models that have stronger noise-resistance properties are developed.
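As a rough illustration of the "mean late fusion" idea named in this abstract, the sketch below averages risk scores that were produced separately for each modality. This is one plausible reading of the term and not the authors' implementation; the function name, the dictionary layout, and the modality names are assumptions made for the example.

```python
def mean_late_fusion(per_modality_scores):
    """Fuse per-modality predictions by taking their unweighted mean.

    per_modality_scores -- dict mapping a modality name to a list of
                           predicted risk scores, one per patient, with
                           patients in the same order for every modality
    """
    modalities = list(per_modality_scores)
    if not modalities:
        raise ValueError("at least one modality is required")
    n_patients = len(per_modality_scores[modalities[0]])
    if any(len(per_modality_scores[m]) != n_patients for m in modalities):
        raise ValueError("all modalities must cover the same patients")
    # Each modality model is trained on its own; only the outputs are combined.
    return [
        sum(per_modality_scores[m][i] for m in modalities) / len(modalities)
        for i in range(n_patients)
    ]


# Hypothetical scores for two patients from two modality-specific models.
toy_fused = mean_late_fusion({
    "mrna": [1.0, 0.0],
    "clinical": [3.0, 2.0],
})
```

Because each modality contributes only one averaged term, a noisy modality can shift the fused score by at most its share of the mean, which may be part of why this simple scheme is comparatively noise resistant.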
Affiliation(s)
- David Wissel
- ETH Zurich, Department of Computer Science, Zurich, Switzerland
- University of Zurich, Department of Molecular Life Sciences, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Daniel Rowson
- ETH Zurich, Department of Computer Science, Zurich, Switzerland
- Valentina Boeva
- ETH Zurich, Department of Computer Science, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Université de Paris UMR-S1016, Institut Cochin, Inserm U1016, Paris, France
- Corresponding author
|
34
|
Steyaert S, Pizurica M, Nagaraj D, Khandelwal P, Hernandez-Boussard T, Gentles AJ, Gevaert O. Multimodal data fusion for cancer biomarker discovery with deep learning. NAT MACH INTELL 2023; 5:351-362. [PMID: 37693852 PMCID: PMC10484010 DOI: 10.1038/s42256-023-00633-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2022] [Accepted: 02/17/2023] [Indexed: 09/12/2023]
Abstract
Technological advances now make it possible to study a patient from multiple angles with high-dimensional, high-throughput multi-scale biomedical data. In oncology, massive amounts of data are being generated ranging from molecular, histopathology, radiology to clinical records. The introduction of deep learning has significantly advanced the analysis of biomedical data. However, most approaches focus on single data modalities leading to slow progress in methods to integrate complementary data types. Development of effective multimodal fusion approaches is becoming increasingly important as a single modality might not be consistent and sufficient to capture the heterogeneity of complex diseases to tailor medical care and improve personalised medicine. Many initiatives now focus on integrating these disparate modalities to unravel the biological processes involved in multifactorial diseases such as cancer. However, many obstacles remain, including lack of usable data as well as methods for clinical validation and interpretation. Here, we cover these current challenges and reflect on opportunities through deep learning to tackle data sparsity and scarcity, multimodal interpretability, and standardisation of datasets.
Affiliation(s)
- Sandra Steyaert
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Marija Pizurica
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Tina Hernandez-Boussard
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University
- Andrew J Gentles
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University
- Olivier Gevaert
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University
|
35
|
Local augmented graph neural network for multi-omics cancer prognosis prediction and analysis. Methods 2023; 213:1-9. [PMID: 36933628 DOI: 10.1016/j.ymeth.2023.02.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2022] [Revised: 12/30/2022] [Accepted: 02/25/2023] [Indexed: 03/17/2023] Open
Abstract
Cancer prognosis prediction and analysis can help patients understand their life expectancy and help clinicians provide correct therapeutic guidance. Thanks to the development of sequencing technology, multi-omics data and biological networks have been used for cancer prognosis prediction. Moreover, graph neural networks can simultaneously consider multi-omics features and molecular interactions in biological networks, and have become mainstream in cancer prognosis prediction and analysis. However, the limited number of neighboring genes in biological networks restricts the accuracy of graph neural networks. To solve this problem, a local augmented graph convolutional network named LAGProg is proposed in this paper for cancer prognosis prediction and analysis. The process is as follows: first, given a patient's multi-omics data features and biological network, the corresponding augmented conditional variational autoencoder generates features. Then, the generated augmented features and the original features are fed into a cancer prognosis prediction model to complete the cancer prognosis prediction task. The conditional variational autoencoder consists of two parts: an encoder and a decoder. In the encoding phase, the encoder learns the conditional distribution of the multi-omics data. As a generative model, the decoder takes the conditional distribution and the original feature as inputs to generate the enhanced features. The cancer prognosis prediction model consists of a two-layer graph convolutional neural network and a Cox proportional risk network. The Cox proportional risk network consists of fully connected layers. Extensive experiments on 15 real-world datasets from TCGA demonstrated the effectiveness and efficiency of the proposed method in predicting cancer prognosis. LAGProg improved the C-index values by an average of 8.5% over the state-of-the-art graph neural network method. Moreover, we confirmed that the local augmentation technique could enhance the model's ability to represent multi-omics features, improve the model's robustness to missing multi-omics features, and prevent the model's over-smoothing during training. Finally, based on genes identified through differential expression analysis, we discovered 13 prognostic markers highly associated with breast cancer, among which ten genes are supported by the literature.
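Networks that end in a Cox-style output layer are commonly trained by minimizing the negative log partial likelihood of the Cox proportional-hazards model. The sketch below shows that loss in plain Python under the assumption of no tied event times; whether LAGProg uses exactly this form is not stated in the abstract, and the function name and toy data are hypothetical.

```python
import math


def cox_neg_log_partial_likelihood(times, events, scores):
    """Average negative log partial likelihood of a Cox model.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if censored
    scores -- predicted log-risk scores (the network output), one per patient
    """
    loss = 0.0
    n_events = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at patient i's event time.
        risk_set = [scores[j] for j in range(n) if times[j] >= times[i]]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(risk_set)
        log_sum = top + math.log(sum(math.exp(s - top) for s in risk_set))
        loss -= scores[i] - log_sum
        n_events += 1
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    return loss / n_events


# Hypothetical toy cohort with three observed events.
toy_times = [1, 2, 3]
toy_events = [1, 1, 1]
loss_good = cox_neg_log_partial_likelihood(toy_times, toy_events, [2.0, 1.0, 0.0])
loss_bad = cox_neg_log_partial_likelihood(toy_times, toy_events, [0.0, 1.0, 2.0])
```

In this toy cohort, scores that rank early deaths as high risk (`loss_good`) give a lower loss than the reversed ranking (`loss_bad`), which is the behavior a training loop would exploit.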
|
36
|
Du X, Zhao Y. Multimodal adversarial representation learning for breast cancer prognosis prediction. Comput Biol Med 2023; 157:106765. [PMID: 36963355 DOI: 10.1016/j.compbiomed.2023.106765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2022] [Revised: 02/27/2023] [Accepted: 03/07/2023] [Indexed: 03/17/2023]
Abstract
With the increasing incidence of breast cancer, accurate prognosis prediction of breast cancer patients is a key issue in current cancer research, and it is also of great significance for patients' psychological rehabilitation and for assisting clinical decision-making. Many studies that integrate data from different heterogeneous modalities, such as gene expression profile, clinical data, and copy number alteration, have achieved greater success in prognostic prediction than those with only one modality. However, many existing approaches fail to substantially reduce the modality gap by aligning multimodal distributions. Therefore, it is crucial to develop a method that fully considers a modality-invariant embedding space to effectively integrate multimodal data. In this study, we propose a multimodal data adversarial representation framework (MDAR) that reduces the modal heterogeneity by translating source modalities into distributions for the target modality. Additionally, we apply reconstruction and classification losses to the embedding space to further constrain it. Then, we design a multi-scale bilinear convolutional neural network (MS-B-CNN) for each single modality to improve the feature expression ability. In addition, the embedding space generates predictions as stacked feature inputs to the extremely randomized trees classifier. With 10-fold cross-validation, our results show that the proposed adversarial representation learning improves prognostic performance. A comparative study of this method and other existing methods on the METABRIC (1980 patients) dataset showed that the Matthews correlation coefficient (MCC) was significantly enhanced by 7.4% in the prognosis prediction of breast cancer patients.
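The Matthews correlation coefficient reported above summarizes a binary confusion matrix in a single value between -1 and 1 and is less sensitive to class imbalance than accuracy. The following is a minimal plain-Python sketch of the standard formula, added for illustration only; the function name and toy labels are assumptions, not part of the cited study.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: return 0 when any row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical toy labels: one of the two positives is missed.
toy_true = [1, 1, 0, 0]
toy_pred = [1, 0, 0, 0]
toy_mcc = matthews_corrcoef(toy_true, toy_pred)
```

A value of 1 indicates perfect prediction, 0 is no better than chance, and -1 indicates total disagreement.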
Affiliation(s)
- Xiuquan Du
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, China; School of Computer Science and Technology, Anhui University, Hefei, China.
- Yuefan Zhao
- School of Computer Science and Technology, Anhui University, Hefei, China
|
37
|
Sheehy J, Rutledge H, Acharya UR, Loh HW, Gururajan R, Tao X, Zhou X, Li Y, Gurney T, Kondalsamy-Chennakesavan S. Gynecological cancer prognosis using machine learning techniques: A systematic review of last three decades (1990–2022). Artif Intell Med 2023; 139:102536. [PMID: 37100507 DOI: 10.1016/j.artmed.2023.102536] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2021] [Revised: 03/19/2023] [Accepted: 03/23/2023] [Indexed: 03/30/2023]
Abstract
OBJECTIVE Many Computer Aided Prognostic (CAP) systems based on machine learning techniques have been proposed in the field of oncology. The objective of this systematic review was to assess and critically appraise the methodologies and approaches used in predicting the prognosis of gynecological cancers using CAPs. METHODS Electronic databases were used to systematically search for studies utilizing machine learning methods in gynecological cancers. Study risk of bias (ROB) and applicability were assessed using the PROBAST tool. 139 studies met the inclusion criteria, of which 71 predicted outcomes for ovarian cancer patients, 41 predicted outcomes for cervical cancer patients, 28 predicted outcomes for uterine cancer patients, and 2 predicted outcomes for gynecological malignancies broadly. RESULTS Random forest (22.30 %) and support vector machine (21.58 %) classifiers were used most commonly. Use of clinicopathological, genomic and radiomic data as predictors was observed in 48.20 %, 51.08 % and 17.27 % of studies, respectively, with some studies using multiple modalities. 21.58 % of studies were externally validated. Twenty-three individual studies compared ML and non-ML methods. Study quality was highly variable and methodologies, statistical reporting and outcome measures were inconsistent, preventing generalized commentary or meta-analysis of performance outcomes. CONCLUSION There is significant variability in model development when prognosticating gynecological malignancies with respect to variable selection, machine learning (ML) methods and endpoint selection. This heterogeneity prevents meta-analysis and conclusions regarding the superiority of ML methods. Furthermore, PROBAST-mediated ROB and applicability analysis demonstrates concern for the translatability of existing models. This review identifies ways that this can be improved upon in future works to develop robust, clinically translatable models within this promising field.
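The percentages in the abstract above can be converted back into study counts. The short check below is a sketch of that arithmetic; the dictionary keys are informal labels chosen here, and the only inputs are figures quoted in the abstract.

```python
TOTAL_STUDIES = 139  # studies meeting the inclusion criteria, per the abstract

# Percentages exactly as reported in the abstract.
reported_percent = {
    "random forest": 22.30,
    "support vector machine": 21.58,
    "clinicopathological predictors": 48.20,
    "genomic predictors": 51.08,
    "radiomic predictors": 17.27,
    "externally validated": 21.58,
}

# Recover the nearest whole number of studies behind each percentage.
counts = {
    name: round(pct * TOTAL_STUDIES / 100)
    for name, pct in reported_percent.items()
}

for name, n in counts.items():
    # Round-trip check: the count reproduces the reported two-decimal percentage.
    assert abs(100 * n / TOTAL_STUDIES - reported_percent[name]) < 0.005
    print(f"{name}: {n} of {TOTAL_STUDIES}")

# Per-cancer-type counts as reported in the abstract.
per_cancer = {"ovarian": 71, "cervical": 41, "uterine": 28, "gynecological (broad)": 2}
extra_assignments = sum(per_cancer.values()) - TOTAL_STUDIES
print("category assignments beyond one per study:", extra_assignments)
```

The per-cancer counts sum to more than 139, presumably because some studies reported on more than one cancer type; the abstract does not state this explicitly.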
|
38
|
Mohammed MA, Abdulkareem KH, Dinar AM, Zapirain BG. Rise of Deep Learning Clinical Applications and Challenges in Omics Data: A Systematic Review. Diagnostics (Basel) 2023; 13:664. [PMID: 36832152 PMCID: PMC9955380 DOI: 10.3390/diagnostics13040664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2022] [Revised: 02/05/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023] Open
Abstract
This research aims to review and evaluate the most relevant scientific studies about deep learning (DL) models in the omics field. It also aims to fully realize the potential of DL techniques in omics data analysis by demonstrating this potential and identifying the key challenges that must be addressed. Surveying the existing literature reveals several elements that are essential for understanding these studies, such as the clinical applications and the datasets they report. The published literature also highlights the difficulties encountered by other researchers. In addition to looking for other studies, such as guidelines, comparative studies, and review papers, a systematic approach is used to search all relevant publications on omics and DL using different keyword variants. From 2018 to 2022, the search procedure was conducted on four Internet search engines: IEEE Xplore, Web of Science, ScienceDirect, and PubMed. These indexes were chosen because they offer enough coverage and linkages to numerous papers in the biological field. A total of 65 articles were added to the final list. The inclusion and exclusion criteria were specified. Of the 65 publications, 42 are clinical applications of DL in omics data. Furthermore, 16 of the 65 articles are review publications based on single- and multi-omics data from the proposed taxonomy. Finally, only a small number of articles (7/65) focus on comparative analysis and guidelines. The use of DL in studying omics data presented several obstacles related to DL itself, preprocessing procedures, datasets, model validation, and testbed applications. Numerous relevant investigations were performed to address these issues. Unlike other review papers, our study distinctly reflects different observations on omics with DL model areas. We believe that the result of this study can be a useful guideline for practitioners who look for a comprehensive view of the role of DL in omics data analysis.
Affiliation(s)
- Mazin Abed Mohammed
  - College of Computer Science and Information Technology, University of Anbar, Anbar 31001, Iraq
  - eVIDA Lab, University of Deusto, 48007 Bilbao, Spain
  - Correspondence: (M.A.M.); (B.G.Z.)
- Karrar Hameed Abdulkareem
  - College of Agriculture, Al-Muthanna University, Samawah 66001, Iraq
  - College of Engineering, University of Warith Al-Anbiyaa, Karbala 56001, Iraq
- Ahmed M. Dinar
  - Computer Engineering Department, University of Technology- Iraq, Baghdad 19006, Iraq
|
39
|
Sun Q, Cheng L, Meng A, Ge S, Chen J, Zhang L, Gong P. SADLN: Self-attention based deep learning network of integrating multi-omics data for cancer subtype recognition. Front Genet 2023; 13:1032768. [PMID: 36685873 PMCID: PMC9846505 DOI: 10.3389/fgene.2022.1032768] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/31/2022] [Accepted: 12/15/2022] [Indexed: 01/05/2023] Open
Abstract
Integrating multi-omics data for cancer subtype recognition is an important task in bioinformatics. Recently, deep learning has been applied to recognize cancer subtypes. However, most existing studies integrate multi-omics data by simply concatenating them into a single dataset and then learning a latent low-dimensional representation through a deep learning model, which does not account for the differing distributions of the omics data. Moreover, these methods ignore the relationships among samples. To tackle these problems, we proposed SADLN: a self-attention based deep learning network for integrating multi-omics data for cancer subtype recognition. SADLN combines an encoder, self-attention, a decoder, and a discriminator into a unified framework, which can not only integrate multi-omics data but also adaptively model the relationships among samples to learn an accurate latent low-dimensional representation. With the integrated representation learned by the network, SADLN uses a Gaussian Mixture Model to identify cancer subtypes. Experiments on ten TCGA cancer datasets demonstrated the advantages of SADLN compared with ten other methods. The Self-Attention Based Deep Learning Network (SADLN) is an effective method for integrating multi-omics data for cancer subtype recognition.
Affiliation(s)
- Qiuwen Sun
  - School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- Lei Cheng
  - School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- Ao Meng
  - School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
- Shuguang Ge
  - School of Information and Control Engineering, University of Mining and Technology, Xuzhou, China
- Jie Chen
  - Department of Radiation Oncology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Longzhen Zhang
  - Department of Radiation Oncology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Ping Gong
  - School of Medical Imaging, Xuzhou Medical University, Xuzhou, China
|
40
|
Liao J, Li X, Gan Y, Han S, Rong P, Wang W, Li W, Zhou L. Artificial intelligence assists precision medicine in cancer treatment. Front Oncol 2023; 12:998222. [PMID: 36686757 PMCID: PMC9846804 DOI: 10.3389/fonc.2022.998222] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/19/2022] [Accepted: 11/22/2022] [Indexed: 01/06/2023] Open
Abstract
Cancer is a major medical problem worldwide. Because of its high heterogeneity, the same drugs or surgical methods may have different curative effects in patients with the same tumor, which creates a need for more accurate tumor treatment and personalized treatment for patients. Precise treatment of tumors is essential, and it makes an in-depth understanding of the changes that tumors undergo an urgent need, including changes in their genes, proteins and cancer cell phenotypes, in order to develop targeted treatment strategies for patients. Artificial intelligence (AI) based on big data can extract the hidden patterns, important information, and corresponding knowledge behind enormous amounts of data. For example, machine learning (ML) and deep learning, which are subsets of AI, can be used to mine deep-level information in genomics, transcriptomics, proteomics, radiomics, digital pathological images, and other data, helping clinicians understand tumors comprehensively. In addition, AI can find new biomarkers in data to assist tumor screening, detection, diagnosis, treatment and prognosis prediction, so as to provide the best treatment for individual patients and improve their clinical outcomes.
Affiliation(s)
- Jinzhuang Liao
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Xiaoying Li
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Yu Gan
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Shuangze Han
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
- Pengfei Rong
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
  - Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
- Wei Wang
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
  - Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
- Wei Li
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
  - Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
- Li Zhou
  - Department of Radiology, The Third Xiangya Hospital of Central South University, Changsha, Hunan, China
  - Cell Transplantation and Gene Therapy Institute, The Third Xiangya Hospital, Central South University, Changsha, Hunan, China
  - Department of Pathology, The Xiangya Hospital of Central South University, Changsha, Hunan, China
|
41
|
Khairuddin MZF, Hasikin K, Razak NAA, Mohshim SA, Ibrahim SS. Harnessing the Multimodal Data Integration and Deep Learning for Occupational Injury Severity Prediction. IEEE Access 2023; 11:85284-85302. [DOI: 10.1109/access.2023.3304328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Affiliation(s)
- Khairunnisa Hasikin
  - Department of Biomedical Engineering, Faculty of Engineering, University Malaya, Kuala Lumpur, Malaysia
- Nasrul Anuar Abd Razak
  - Department of Biomedical Engineering, Faculty of Engineering, University Malaya, Kuala Lumpur, Malaysia
- Siti Afifah Mohshim
  - Medical Engineering Technology Section, British Malaysian Institute, Universiti Kuala Lumpur, Kuala Lumpur, Selangor, Malaysia
- Siti Salwa Ibrahim
  - Negeri Sembilan State Health Department, Ministry of Health, Seremban, Negeri Sembilan, Malaysia
|
42
|
Han Y, Pan F, Song H, Luo R, Li C, Pi H, Wang J, Li T. Intelligent injury prediction for traumatic airway obstruction. Med Biol Eng Comput 2023; 61:139-153. [PMID: 36331757 DOI: 10.1007/s11517-022-02706-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Accepted: 10/22/2022] [Indexed: 11/07/2022]
Abstract
Airway obstruction is one of the major causes of death in trauma patients during first aid. It is extremely challenging for hospitals to accurately treat large numbers of casualties with airway obstruction. The diagnosis of airway obstruction in an emergency relies mostly on the medical experience of physicians. In this paper, we propose a feature selection approach, genetic algorithm-mean decrease impurity (GA-MDI), to effectively minimize the number of features while preserving prediction accuracy. Furthermore, we design a multi-modal neural network, called fully convolutional network with squeeze-and-excitation and multilayer perceptron (FCN-SE + MLP), to help physicians predict the severity of airway obstruction. We validate the effectiveness of the proposed feature selection approach and multi-modal model on the emergency medical database from the Chinese General Hospital of the PLA. The experimental results show that GA-MDI outperforms existing feature selection algorithms and that the FCN-SE + MLP model can effectively and accurately predict the severity of airway obstruction, which can assist clinicians in making treatment decisions for airway obstruction casualties.
Affiliation(s)
- Youfang Han
  - School of Software, Tsinghua University, Beijing, China
- Fei Pan
  - Emergency Department, The First Medical Center of PLA General Hospital, Beijing, China
- Hainan Song
  - Emergency Department, The First Medical Center of PLA General Hospital, Beijing, China
- Ruihong Luo
  - School of Software, Tsinghua University, Beijing, China
- Chunping Li
  - School of Software, Tsinghua University, Beijing, China
- Hongying Pi
  - Nursing Department, PLA General Hospital, Beijing, China
- Jianrong Wang
  - Nursing Department, PLA General Hospital, Beijing, China
- Tanshi Li
  - Emergency Department, The First Medical Center of PLA General Hospital, Beijing, China
|
43
|
Rong Z, Liu Z, Song J, Cao L, Yu Y, Qiu M, Hou Y. MCluster-VAEs: An end-to-end variational deep learning-based clustering method for subtype discovery using multi-omics data. Comput Biol Med 2022; 150:106085. [PMID: 36162197 DOI: 10.1016/j.compbiomed.2022.106085] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/01/2022] [Revised: 07/30/2022] [Accepted: 09/03/2022] [Indexed: 11/03/2022]
Abstract
The discovery of cancer subtypes based on unsupervised clustering helps to provide a precise diagnosis, guide treatment, and improve patients' prognoses. Compared with single-omics data, multi-omics data can improve clustering performance because they provide a comprehensive landscape for understanding biological systems and mechanisms. However, heterogeneous data from multiple sources introduce high complexity and different kinds of noise, which are detrimental to the extraction of clustering information. We propose an end-to-end deep learning based method, called Multi-omics Clustering Variational Autoencoders (MCluster-VAEs), that can extract cluster-friendly representations from multi-omics data. First, a unified network architecture with an attention mechanism was developed for accurately modeling multi-omics data. Then, using a novel objective function built from the Variational Bayes technique, the model was trained to effectively obtain the posterior estimation of the clustering assignments. Compared with 12 other state-of-the-art multi-omics clustering methods, MCluster-VAEs achieved an outstanding performance on benchmark datasets from the TCGA database. On the Pan Cancer dataset, MCluster-VAEs achieved an adjusted Rand index of approximately 0.78 for cancer category recognition, an increase of more than 18% compared with other methods. Furthermore, a survival analysis and clinical parameter enrichment tests conducted on 10 cancer datasets demonstrated that MCluster-VAEs provides comparable and even better results than many common integrative approaches. These results demonstrate that MCluster-VAEs is a powerful new tool for dissecting complex multi-omics relationships and providing new insights for cancer subtype discovery.
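The adjusted Rand index quoted above measures agreement between a clustering and reference labels, corrected for chance. The following is a minimal pure-Python sketch of the standard pair-counting formula; the function name and the toy labels are illustrative assumptions, not data from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred) -> float:
    """Chance-corrected agreement between two partitions of the same samples."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treated as trivial agreement
    # Number of same-cluster pairs in the contingency table and its margins.
    sum_pairs = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_pairs - expected) / (max_index - expected)


# Toy labels for illustration: the same partition under different cluster names.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that splits every reference cluster in half.
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(ari_same, ari_crossed)
```

Because the index depends only on which samples share a cluster, renaming clusters does not change the score, which suits unsupervised subtype discovery where cluster names are arbitrary.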
Affiliation(s)
- Zhiwei Rong
  - Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
- Zhilin Liu
  - Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
- Jiali Song
  - Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
- Lei Cao
  - Department of Epidemiology and Biostatistics Harbin, Harbin Medical University School of Public Health, Harbin, 150000, Heilongjiang, China
- Yipe Yu
  - Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
- Mantang Qiu
  - Department of Thoracic Surgery Beijing, Peking University People's Hospital, Beijing, 100000, China
- Yan Hou
  - Department of Biostatistics Beijing, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China; Peking University Clinical Research Center, No. 38 Xueyuan Road, Haidian District, Beijing, 100000, China
|
44
|
Dickinson Q, Aufschnaiter A, Ott M, Meyer JG. Multi-omic integration by machine learning (MIMaL). Bioinformatics 2022; 38:4908-4918. [PMID: 36106996 PMCID: PMC9801967 DOI: 10.1093/bioinformatics/btac631] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/08/2022] [Revised: 08/17/2022] [Accepted: 09/14/2022] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION Cells respond to environments by regulating gene expression to exploit resources optimally. Recent advances in technologies allow for measuring the abundances of RNA, proteins, lipids and metabolites. These highly complex datasets reflect the states of the different layers in a biological system. Multi-omics is the integration of these disparate methods and data to gain a clearer picture of the biological state. Multi-omic studies of the proteome and metabolome are becoming more common as mass spectrometry technology continues to be democratized. However, knowledge extraction through the integration of these data remains challenging. RESULTS Connections between molecules in different omic layers were discovered through a combination of machine learning and model interpretation. Discovered connections reflected protein control (ProC) over metabolites. Proteins discovered to control citrate were mapped onto known genetic and metabolic networks, revealing that these protein regulators are novel. Further, clustering the magnitudes of ProC over all metabolites enabled the prediction of five gene functions, each of which was validated experimentally. Two uncharacterized genes, YJR120W and YDL157C, were accurately predicted to modulate mitochondrial translation. Functions for three incompletely characterized genes were also predicted and validated, including SDH9, ISC1 and FMP52. A website enables results exploration and also MIMaL analysis of user-supplied multi-omic data. AVAILABILITY AND IMPLEMENTATION The website for MIMaL is at https://mimal.app. Code for the website is at https://github.com/qdickinson/mimal-website. Code to implement MIMaL is at https://github.com/jessegmeyerlab/MIMaL. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Quinn Dickinson
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI 53226, USA
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
- Andreas Aufschnaiter
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Martin Ott
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
  - Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden
- Jesse G Meyer
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI 53226, USA
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
|
45
|
Chen S, Zang Y, Xu B, Lu B, Ma R, Miao P, Chen B. An Unsupervised Deep Learning-Based Model Using Multiomics Data to Predict Prognosis of Patients with Stomach Adenocarcinoma. Computational and Mathematical Methods in Medicine 2022; 2022:5844846. [PMID: 36339684 PMCID: PMC9633210 DOI: 10.1155/2022/5844846] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/19/2022] [Revised: 09/25/2022] [Accepted: 10/08/2022] [Indexed: 09/08/2023]
Abstract
METHODS Patients (363 in total) with stomach adenocarcinoma from The Cancer Genome Atlas (TCGA) cohort were included. An autoencoder was constructed to integrate the RNA sequencing, miRNA sequencing, and methylation data. The features of the bottleneck layer were used to perform the k-means clustering algorithm to obtain different subgroups for evaluating the prognosis-related risk of stomach adenocarcinoma. The model's robustness was verified using a 10-fold cross-validation (CV). Survival was analyzed by the Kaplan-Meier method. Univariate and multivariate Cox regression was used to estimate hazard risk. The model was validated in three independent cohorts with different endpoints. RESULTS The patients were divided into low-risk and high-risk groups according to the k-means clustering algorithm. The high-risk group had a significantly higher risk of poor survival (log-rank P value = 2.80e-06; adjusted hazard ratio = 2.386, 95% confidence interval: 1.607 to 3.543), a concordance index (C-index) of 0.714, and a Brier score of 0.184. The model performed well both in the 10-fold CV procedure and three independent cohorts from the Gene Expression Omnibus (GEO) repository. CONCLUSIONS A robust and generalizable model based on the autoencoder was proposed to integrate multiomics data and predict the prognosis of patients with stomach adenocarcinoma. The model demonstrates better performance than two alternative approaches on prognosis prediction. The results might provide the grounds for further exploring the potential biomarkers to predict the prognosis of patients with stomach adenocarcinoma.
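To make the concordance index (C-index) reported above concrete, here is a simplified sketch of Harrell's C for right-censored data. It ignores tied follow-up times, and the function name and toy inputs are assumptions for illustration rather than the authors' implementation or data.

```python
def concordance_index(times, events, risk_scores) -> float:
    """Fraction of comparable pairs in which the higher-risk subject fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy follow-up data (hypothetical): 1 = event observed, 0 = censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
c_good = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
c_bad = concordance_index(times, events, [0.1, 0.4, 0.7, 0.9])
print(c_good, c_bad)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of risk against survival time.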
Affiliation(s)
- Sizhen Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
- Yiteng Zang
  - Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
- Biyun Xu
  - Department of Biostatistics, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing 210008, China
- Beier Lu
  - Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
- Rongji Ma
  - Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
- Pengcheng Miao
  - Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
- Bingwei Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
|
46
|
P D, C G. A systematic review on machine learning and deep learning techniques in cancer survival prediction. Progress in Biophysics and Molecular Biology 2022; 174:62-71. [PMID: 35933043 DOI: 10.1016/j.pbiomolbio.2022.07.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/07/2022] [Revised: 07/13/2022] [Accepted: 07/19/2022] [Indexed: 06/15/2023]
Abstract
Cancer is a disease characterised by the unusual and uncontrollable growth of body cells. It often develops asymptomatically and spreads to other parts of the body. A major problem in treating cancer is that its progress is not monitored once it is diagnosed. The progress, or prognosis, can be assessed through survival analysis, the branch of statistics that deals with predicting the time until an event occurs. In cancer prognosis, the event can be the survival time of the patient from the onset of the disease, or the recurrence of the disease after treatment. This study aims to survey the machine learning and deep learning models used to provide a prognosis for cancer patients.
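The abstract above does not name a specific estimator. As one standard illustration of time-to-event analysis with censoring, a Kaplan-Meier survival curve can be sketched as follows; the function name and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        at_risk -= removed
    return curve


# Toy data: follow-up time in months, 1 = event observed, 0 = censored.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
for t, s in curve:
    print(t, round(s, 4))
```

Censored subjects contribute to the risk set up to their last follow-up time without counting as events, which is what distinguishes survival analysis from ordinary regression on observed times.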
Affiliation(s)
- Deepa P
  - School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
- Gunavathi C
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
|
47
|
Mammographic Classification of Breast Cancer Microcalcifications through Extreme Gradient Boosting. Electronics 2022. [DOI: 10.3390/electronics11152435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
In this paper, we proposed an effective and efficient approach to the classification of breast cancer microcalcifications and evaluated the mathematical model for calcification on mammography with a large medical dataset. We employed several semi-automatic segmentation algorithms to extract 51 calcification features from mammograms, including morphologic and textural features. We adopted extreme gradient boosting (XGBoost) to classify microcalcifications. Then, we compared other machine learning techniques, including k-nearest neighbor (kNN), adaboostM1, decision tree, random decision forest (RDF), and gradient boosting decision tree (GBDT), with XGBoost. XGBoost showed the highest accuracy (90.24%) for classifying microcalcifications, and kNN demonstrated the lowest accuracy. This result demonstrates that it is essential for the classification of microcalcification to use the feature engineering method for the selection of the best composition of features. One of the contributions of this study is to present the best composition of features for efficient classification of breast cancers. This paper finds a way to select the best discriminative features as a collection to improve the accuracy. This study showed the highest accuracy (90.24%) for classifying microcalcifications with AUC = 0.89. Moreover, we highlighted the performance of various features from the dataset and found ideal parameters for classifying microcalcifications. Furthermore, we found that the XGBoost model is suitable both in theory and practice for the classification of calcifications on mammography.
|
48
|
Combining Molecular, Imaging, and Clinical Data Analysis for Predicting Cancer Prognosis. Cancers (Basel) 2022; 14:3215. [PMID: 35804988 PMCID: PMC9265023 DOI: 10.3390/cancers14133215] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/27/2022] [Indexed: 02/04/2023] Open
Abstract
Simple Summary: The rise of Big Data, the widespread use of Machine Learning, and the cheapening of omics techniques have allowed for the creation of more sophisticated and accurate models in biomedical research. This article presents the state-of-the-art predictive models of cancer prognosis that use multimodal data, considering clinical, molecular (omics and non-omics), and image data. The subject of study, the data modalities used, the data processing and modelling methods applied, the validation strategies involved, the integration strategies encompassed, and the evolution of prognostic predictive models are discussed. Finally, we discuss challenges and opportunities in this field of cancer research, with great potential impact on the clinical management of patients and, by extension, on the implementation of personalised and precision medicine.
Abstract: Cancer is one of the most detrimental diseases globally. Accordingly, the prognosis prediction of cancer patients has become a field of interest. In this review, we have gathered 43 state-of-the-art scientific papers published in the last 6 years that built cancer prognosis predictive models using multimodal data. We have defined the multimodality of data as four main types: clinical, anatomopathological, molecular, and medical imaging; and we have expanded on the information that each modality provides. The 43 studies were divided into three categories based on the modelling approach taken, and their characteristics were further discussed together with current issues and future trends. Research in this area has evolved from survival analysis through statistical modelling using mainly clinical and anatomopathological data to the prediction of cancer prognosis through a multi-faceted data-driven approach by the integration of complex, multimodal, and high-dimensional data containing multi-omics and medical imaging information and by applying Machine Learning and, more recently, Deep Learning techniques. This review concludes that cancer prognosis predictive multimodal models are capable of better stratifying patients, which can improve clinical management and contribute to the implementation of personalised medicine as well as provide new and valuable knowledge on cancer biology and its progression.
|
49
|
Mo L, Su Y, Yuan J, Xiao Z, Zhang Z, Lan X, Huang D. Comparisons of Forecasting for Survival Outcome for Head and Neck Squamous Cell Carcinoma by using Machine Learning Models based on Multi-omics. Curr Genomics 2022; 23:94-108. [PMID: 36778975 PMCID: PMC9878835 DOI: 10.2174/1389202923666220204153744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Revised: 01/13/2022] [Accepted: 01/19/2022] [Indexed: 11/22/2022] Open
Abstract
Background: Machine learning methods have shown excellent predictive ability in a wide range of fields. Multi-omics factors have a crucial influence on survival in head and neck squamous cell carcinoma (HNSC). This study attempts to establish a variety of machine learning multi-omics models to predict the survival of HNSC patients and to find the most suitable machine learning prediction method. Methods: The HNSC clinical data and multi-omics data were downloaded from the TCGA database. The important variables were screened by the LASSO algorithm. We used a total of 12 supervised machine learning models to predict HNSC survival outcomes and compared the results. In vitro qPCR was performed to verify core genes predicted by the random forest algorithm. Results: Across the twelve models, multi-omics data performed better than any single omic alone. The Bayesian network (BN) model (area under the curve [AUC] = 0.8250, F1 score = 0.7917) and the random forest (RF) model (AUC = 0.8002, F1 score = 0.7839) showed good prediction performance on HNSC multi-omics data. The results of in vitro qPCR were consistent with the RF algorithm. Conclusion: Machine learning methods could better forecast the survival outcome of HNSC. This study found that the BN model and the RF model performed best. Moreover, multi-omics predictions were better than those from a single omic alone in HNSC.
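The AUC and F1 score used above to compare models can be computed as in the sketch below. The function names, labels, scores, and the 0.5 decision threshold are illustrative assumptions and are unrelated to the study's data.

```python
def roc_auc(y_true, scores) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: 1 = event of interest, scores are predicted probabilities.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, y_pred)
print(auc, f1)
```

AUC is threshold-free while F1 depends on the chosen cut-off, so reporting both, as the abstract does, gives complementary views of a classifier.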
Affiliation(s)
- Liying Mo
- School of Basic Medical Sciences, Guangxi Medical University, Nanning, Guangxi, China. These authors contributed equally to this work
- Yuangang Su
- School of Basic Medical Sciences, Guangxi Medical University, Nanning, Guangxi, China; Research Centre for Regenerative Medicine, Guangxi Key Laboratory of Regenerative Medicine, Guangxi Medical University, Nanning, Guangxi, China. These authors contributed equally to this work
- Jianhui Yuan
- School of Basic Medical Sciences, Guangxi Medical University, Nanning, Guangxi, China; The Laboratory of Biomedical Photonics and Engineering, Guangxi Medical University, Nanning, China
- Zhiwei Xiao
- School of Information and Management, Guangxi Medical University, Nanning, Guangxi, China
- Ziyan Zhang
- Life Sciences Institute, Guangxi Medical University, Nanning, Guangxi, China
- Xiuwan Lan
- School of Basic Medical Sciences, Guangxi Medical University, Nanning, Guangxi, China. These authors contributed equally to this work
- Daizheng Huang
- School of Basic Medical Sciences, Guangxi Medical University, Nanning, Guangxi, China; The Laboratory of Biomedical Photonics and Engineering, Guangxi Medical University, Nanning, China. Address correspondence to this author at the School of Basic Medical Sciences, Guangxi Medical University, Nanning, Guangxi, China; The Laboratory of Biomedical Photonics and Engineering, Guangxi Medical University, Nanning, China; Tel: +867715358270; E-mail:
|
50
|
Mo H, Breitling R, Francavilla C, Schwartz JM. Data integration and mechanistic modelling for breast cancer biology: Current state and future directions. Curr Opin Endocr Metab Res 2022; 24:100350. [PMID: 36034741 PMCID: PMC9402443 DOI: 10.1016/j.coemr.2022.100350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Breast cancer is one of the most common cancers threatening women worldwide. The limited number of available treatment options, frequent recurrence, and drug resistance worsen the prognosis of breast cancer patients. Thus, there is an urgent need for methods to investigate novel treatment options while taking into account the vast molecular heterogeneity of breast cancer. Recent advances in molecular profiling technologies, which yield genomics, epigenomics, transcriptomics, proteomics and metabolomics data, make it possible to approach breast cancer biology at multiple levels of omics interaction networks. Systems biology approaches, including computational inference from ‘big data’ and mechanistic modelling of specific pathways, are emerging to identify potential novel combinations of breast cancer subtype signatures and more diverse targeted therapies.
|