1
Du W, Jia M, Li J, Gao M, Zhang W, Yu Y, Wang H, Peng X. Prognostic prediction model for salivary gland carcinoma based on machine learning. Int J Oral Maxillofac Surg 2024:S0901-5027(24)00216-9. [PMID: 38981745 DOI: 10.1016/j.ijom.2024.07.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 06/23/2024] [Accepted: 07/01/2024] [Indexed: 07/11/2024]
Abstract
Although rare overall, salivary gland carcinomas (SGCs) are among the most common oral and maxillofacial malignancies. The aim of this study was to develop a machine learning-based model to predict the survival of patients with SGC. Patients in whom SGC was confirmed by histological testing and who underwent primary extirpation at the authors' institution between 1963 and 2014 were identified. Demographic and clinicopathological data with complete follow-up information were collected for analysis. Feature selection methods were used to determine the correlation between prognosis-related factors and survival in the collected patient data. The collected clinicopathological data and multiple machine learning algorithms were used to develop a survival prediction model. Three machine learning algorithms were applied to construct the prediction models. The area under the receiver operating characteristic curve (AUC) and accuracy were used to measure model performance. The best classification performance was achieved with a LightGBM algorithm (AUC = 0.83, accuracy = 0.91). This model enabled prognostic prediction of patient survival. The model may be useful in developing personalized diagnostic and treatment strategies and formulating individualized follow-up plans, as well as assisting in the communication between doctors and patients, facilitating a better understanding of and compliance with treatment.
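The two metrics reported in this abstract, AUC and accuracy, can be sketched in plain Python. This is only an illustration of how such metrics are commonly computed, not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the label.

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Hypothetical toy data, not taken from the study.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

print(round(roc_auc(y_true, scores), 3))   # 0.889
print(round(accuracy(y_true, scores), 3))  # 0.667
```

AUC is threshold-free while accuracy depends on the chosen cut-off, which is one reason the two reported values (0.83 and 0.91) need not move together.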
Affiliation(s)
- W Du
  - Department of Oral and Maxillofacial Surgery, Peking University School and Hospital of Stomatology, Beijing, China
- M Jia
  - Department of Oral and Maxillofacial Surgery, Peking University School and Hospital of Stomatology, Beijing, China; Zhongguancun Hospital, Beijing, China
- J Li
  - Department of Oral and Maxillofacial Surgery, Peking University School and Hospital of Stomatology, Beijing, China
- M Gao
  - Department of Oral and Maxillofacial Surgery, Peking University School and Hospital of Stomatology, Beijing, China
- W Zhang
  - Department of Oral and Maxillofacial Surgery, Peking University School and Hospital of Stomatology, Beijing, China
- Y Yu
  - Department of Oral and Maxillofacial Surgery, Peking University School and Hospital of Stomatology, Beijing, China
- H Wang
  - School of Mathematical Sciences, Beihang University, Beijing, China
- X Peng
  - Department of Oral and Maxillofacial Surgery, Peking University School and Hospital of Stomatology, Beijing, China.
2
Pan L, Peng Y, Li Y, Wang X, Liu W, Xu L, Liang Q, Peng S. SELECTOR: Heterogeneous graph network with convolutional masked autoencoder for multimodal robust prediction of cancer survival. Comput Biol Med 2024; 172:108301. [PMID: 38492453 DOI: 10.1016/j.compbiomed.2024.108301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 02/03/2024] [Accepted: 03/12/2024] [Indexed: 03/18/2024]
Abstract
Accurately predicting the survival rate of cancer patients is crucial for aiding clinicians in planning appropriate treatment, reducing cancer-related medical expenses, and significantly enhancing patients' quality of life. Multimodal prediction of cancer patient survival offers a more comprehensive and precise approach. However, existing methods still grapple with challenges related to missing multimodal data and information interaction within modalities. This paper introduces SELECTOR, a heterogeneous graph-aware network based on convolutional mask encoders for robust multimodal prediction of cancer patient survival. SELECTOR comprises feature edge reconstruction, convolutional mask encoder, feature cross-fusion, and multimodal survival prediction modules. Initially, we construct a multimodal heterogeneous graph and employ the meta-path method for feature edge reconstruction, ensuring comprehensive incorporation of feature information from graph edges and effective embedding of nodes. To mitigate the impact of missing features within the modality on prediction accuracy, we devised a convolutional masked autoencoder (CMAE) to process the heterogeneous graph post-feature reconstruction. Subsequently, the feature cross-fusion module facilitates communication between modalities, ensuring that output features encompass all features of the modality and relevant information from other modalities. Extensive experiments and analysis on six cancer datasets from TCGA demonstrate that our method significantly outperforms state-of-the-art methods in both modality-missing and intra-modality information-confirmed cases. Our codes are made available at https://github.com/panliangrui/Selector.
Affiliation(s)
- Liangrui Pan
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.
- Yijun Peng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.
- Yan Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.
- Xiang Wang
  - Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha, 410011, Hunan, China.
- Wenjuan Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.
- Liwen Xu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.
- Qingchun Liang
  - Department of Pathology, The Second Xiangya Hospital, Central South University, Changsha, 410011, Hunan, China.
- Shaoliang Peng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410083, Hunan, China.
3
Wu Y, Xiao Q, Wang S, Xu H, Fang Y. Establishment and Analysis of an Artificial Neural Network Model for Early Detection of Polycystic Ovary Syndrome Using Machine Learning Techniques. J Inflamm Res 2023; 16:5667-5676. [PMID: 38050562 PMCID: PMC10693771 DOI: 10.2147/jir.s438838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 11/10/2023] [Indexed: 12/06/2023] Open
Abstract
Background: This study aimed to identify novel gene combinations and to develop an early diagnostic model for polycystic ovary syndrome (PCOS) by integrating artificial neural network (ANN) and random forest (RF) methods. Methods: We retrieved and processed gene expression datasets for PCOS from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) in the training set were identified using the "limma" R package. Enrichment analyses of the DEGs were performed using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), together with an immune cell infiltration analysis. Critical genes were then identified from the DEGs using random forests, and new diagnostic models for PCOS were developed using artificial neural networks. Results: We identified 130 up-regulated and 132 down-regulated genes in PCOS compared with normal samples. GO analysis revealed significant enrichment in myofibrils and highlighted crucial biological functions related to myofilament sliding, myofibril, and actin binding. The immune cell types present in PCOS samples differed from those in normal tissues. A random forest algorithm identified 10 significant genes proposed as potential PCOS-specific biomarkers. Using these genes, an artificial neural network diagnostic model accurately distinguished PCOS from normal samples. The diagnostic model was validated on the independent validation set, and the resulting area under the receiver operating characteristic curve (AUC) values were consistent with the anticipated outcomes. Conclusion: Using unique gene combinations, this research created a diagnostic model by combining random forest techniques with artificial neural networks. The AUC indicated notably strong performance of the diagnostic model.
Affiliation(s)
- Yumi Wu
  - Institute of Acupuncture and Moxibustion of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- QiWei Xiao
  - Institute of Acupuncture and Moxibustion of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- ShouDong Wang
  - The Out-Patient Department of TCM of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- Huanfang Xu
  - Institute of Acupuncture and Moxibustion of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
  - Acupuncture and Moxibustion Hospital of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
- YiGong Fang
  - Institute of Acupuncture and Moxibustion of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
  - Acupuncture and Moxibustion Hospital of China Academy of Chinese Medical Sciences, Beijing, People’s Republic of China
4
Timilsina M, Fey D, Buosi S, Janik A, Costabello L, Carcereny E, Abreu DR, Cobo M, Castro RL, Bernabé R, Minervini P, Torrente M, Provencio M, Nováček V. Synergy between imputed genetic pathway and clinical information for predicting recurrence in early stage non-small cell lung cancer. J Biomed Inform 2023; 144:104424. [PMID: 37352900 DOI: 10.1016/j.jbi.2023.104424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 06/06/2023] [Accepted: 06/11/2023] [Indexed: 06/25/2023]
Abstract
OBJECTIVE Lung cancer exhibits unpredictable recurrence in low-stage tumors and variable responses to different therapeutic interventions. Predicting relapse in early-stage lung cancer can facilitate precision medicine and improve patient survivability. While existing machine learning models rely on clinical data, incorporating genomic information could enhance their efficiency. This study aims to impute and integrate specific types of genomic data with clinical data to improve the accuracy of machine learning models for predicting relapse in early-stage, non-small cell lung cancer patients. METHODS The study utilized a publicly available TCGA lung cancer cohort and imputed genetic pathway scores into the Spanish Lung Cancer Group (SLCG) data, specifically in 1348 early-stage patients. Initially, tumor recurrence was predicted without imputed pathway scores. Subsequently, the SLCG data were augmented with pathway scores imputed from TCGA. The integrative approach aimed to enhance relapse risk prediction performance. RESULTS The integrative approach achieved improved relapse risk prediction with the following evaluation metrics: an area under the precision-recall curve (PR-AUC) score of 0.75, an area under the ROC (ROC-AUC) score of 0.80, an F1 score of 0.61, and a Precision of 0.80. The prediction explanation model SHAP (SHapley Additive exPlanations) was employed to explain the machine learning model's predictions. CONCLUSION We conclude that our explainable predictive model is a promising tool for oncologists that addresses an unmet clinical need of post-treatment patient stratification based on the relapse risk while also improving the predictive power by incorporating proxy genomic data not available for specific patients.
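The relationship between the precision and F1 values quoted in this abstract can be made concrete with a short sketch. Since F1 is the harmonic mean of precision and recall, a precision of 0.80 and an F1 of 0.61 would imply a recall of roughly 0.49, assuming both were computed on the same predictions at the same threshold (the abstract does not say so). The function names and the toy labels below are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = relapse)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    return f1 * precision / (2 * precision - f1)


# Recall implied by the reported precision (0.80) and F1 (0.61),
# under the assumption that both refer to the same predictions.
print(round(implied_recall(0.80, 0.61), 3))  # 0.493

# Hypothetical toy labels, not taken from the study.
p, r, f1 = precision_recall_f1([1, 1, 1, 0, 0, 1], [1, 0, 1, 1, 0, 0])
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.5 0.571
```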
Affiliation(s)
- Mohan Timilsina
  - Data Science Institute, Insight Centre for Data Analytics, University of Galway, Ireland.
- Dirk Fey
  - Systems Biology Ireland, University College Dublin, Ireland.
- Samuele Buosi
  - Data Science Institute, Insight Centre for Data Analytics, University of Galway, Ireland.
- Enric Carcereny
  - Catalan Institute of Oncology, Hospital Universitari Germans Trias i Pujol, B-ARGO, IGTP, Badalona, Spain.
- Manuel Cobo
  - Medical Oncology Intercenter Unit. Regional and Virgen de la Victoria University Hospitals. IBIMA. Málaga, Spain.
- Reyes Bernabé
  - Hospital Universitario Virgen del Rocio, Sevilla, Spain.
- Maria Torrente
  - Medical Oncology Department, Hospital Universitario Puerta de Hierro Majadahonda, Madrid, Spain.
- Mariano Provencio
  - Medical Oncology Department, Hospital Universitario Puerta de Hierro Majadahonda, Madrid, Spain.
- Vít Nováček
  - Data Science Institute, Insight Centre for Data Analytics, University of Galway, Ireland; Faculty of Informatics, Masaryk University Brno, Czech Republic; Masaryk Memorial Cancer Institute, Brno, Czech Republic.
5
Huang S, Arpaci I, Al-Emran M, Kılıçarslan S, Al-Sharafi MA. A comparative analysis of classical machine learning and deep learning techniques for predicting lung cancer survivability. MULTIMEDIA TOOLS AND APPLICATIONS 2023. [DOI: 10.1007/s11042-023-16349-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/15/2022] [Revised: 05/22/2023] [Accepted: 07/17/2023] [Indexed: 09/01/2023]
6
Shakir H, Aijaz B, Khan TMR, Hussain M. A deep learning-based cancer survival time classifier for small datasets. Comput Biol Med 2023; 160:106896. [PMID: 37150085 DOI: 10.1016/j.compbiomed.2023.106896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 03/07/2023] [Accepted: 04/09/2023] [Indexed: 05/09/2023]
Abstract
Cancer survival time prediction using Deep Learning (DL) has been an emerging area of research. However, non-availability of large-sized annotated medical imaging databases affects the training performance of DL models leading to their arguable usage in many clinical applications. In this research work, a neural network model is customized for small sample space to avoid data over-fitting for DL training. A set of prognostic radiomic features is selected through an iterative process using average of multiple dropouts which results in back-propagated gradients with low variance, thus increasing the network learning capability, reliable feature selection and better training over a small database. The proposed classifier is further compared with erasing feature selection method proposed in the literature for improved network training and with other well-known classifiers on small sample size. Achieved results which were statistically validated show efficient and improved classification of cancer survival time into three intervals of 6 months, between 6 months up to 2 years, and above 2 years; and has the potential to aid health care professionals in lung tumor evaluation for timely treatment and patient care.
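The three survival-time classes named in this abstract can be illustrated with a small labeling helper. This is a sketch only: the function name, the integer class codes, and in particular the handling of the exact boundaries (6 and 24 months assigned to the lower class) are assumptions, since the abstract does not specify them.

```python
def survival_class(months):
    """Map a survival time in months to one of three interval classes.

    0: up to 6 months
    1: more than 6 months, up to 2 years
    2: more than 2 years
    Boundary handling is an assumption made for this example.
    """
    if months < 0:
        raise ValueError("survival time cannot be negative")
    if months <= 6:
        return 0
    if months <= 24:
        return 1
    return 2


# Hypothetical survival times in months.
print([survival_class(m) for m in [3, 6, 7, 24, 30]])  # [0, 0, 1, 1, 2]
```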
Affiliation(s)
- Hina Shakir
  - Department of Software Engineering, Bahria University, 13-National Stadium Road Karachi, 75620, Pakistan.
- Bushra Aijaz
  - Department of Electrical Engineering, Bahria University, 13-National Stadium Road Karachi, 75620, Pakistan.
- Tariq Mairaj Rasool Khan
  - Department of Electrical and Power Engineering, Pakistan Navy Engineering College, National University of Science and Technology, Karachi, Pakistan.
- Muhammad Hussain
  - Department of Electrical Engineering, Bahria University, 13-National Stadium Road Karachi, 75620, Pakistan.
7
Luo Y, Zhou LQ, Yang F, Chen JC, Chen JJ, Wang YJ. Construction and analysis of a conjunctive diagnostic model of HNSCC with random forest and artificial neural network. Sci Rep 2023; 13:6736. [PMID: 37185487 PMCID: PMC10130066 DOI: 10.1038/s41598-023-32620-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/25/2022] [Accepted: 03/30/2023] [Indexed: 05/17/2023] Open
Abstract
Head and neck squamous cell carcinoma (HNSCC) is a heterogeneous, highly aggressive tumor that ranks fifth among the most common cancers worldwide. However, few studies have attempted to construct a diagnostic model for HNSCC. Currently, the gold standard for diagnosing head and neck tumors is pathology, but this requires a traumatic biopsy, and a noninvasive test for such a high-incidence tumor is still lacking. To screen genetic markers and construct a diagnostic model, random forest (RF) and artificial neural network (ANN) methods were used. HNSCC gene expression data were obtained from the Gene Expression Omnibus (GEO) database; three datasets were selected, two of which (GSE6631 and GSE55547) were combined to screen differentially expressed genes (DEGs), with the third (GSE13399) used for validation. First, six DEGs (CRISP3, SPINK5, KRT4, MMP1, MAL, SPP1) were screened by RF. Subsequently, an ANN was applied to calculate the weights of the six genes. We then created a diagnostic model, named neuralHNSCC, and verified its performance by area under the curve (AUC) using the validation dataset. The model achieved an AUC of 0.998 in the training cohort and 0.734 in the validation cohort. Furthermore, we used the Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts (CIBERSORT) algorithm to make a preliminary investigation of the difference in immune cell infiltration between HNSCC and normal tissues. The six selected DEGs and the novel diagnostic model may contribute to the diagnosis of HNSCC.
Affiliation(s)
- Yao Luo
  - Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Liu-Qing Zhou
  - Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Fan Yang
  - Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Jing-Cai Chen
  - Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Jian-Jun Chen
  - Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China.
- Yan-Jun Wang
  - Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China.
8
Machine Learning Model Based on Insulin Resistance Metagenes Underpins Genetic Basis of Type 2 Diabetes. Biomolecules 2023; 13:biom13030432. [PMID: 36979367 PMCID: PMC10046262 DOI: 10.3390/biom13030432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 02/12/2023] [Accepted: 02/14/2023] [Indexed: 03/03/2023] Open
Abstract
Insulin resistance (IR) is considered the precursor and the key pathophysiological mechanism of type 2 diabetes (T2D) and metabolic syndrome (MetS). However, the pathways that IR shares with T2D are not clearly understood. Meta-analysis of multiple DNA microarray datasets could provide a robust set of metagenes identified across multiple studies. These metagenes would likely include a subset of genes (key metagenes) shared by both IR and T2D, and possibly responsible for the transition between them. In this study, we attempted to find these key metagenes using a feature selection method, LASSO, and then used the expression profiles of these genes to train five machine learning models: LASSO, SVM, XGBoost, Random Forest, and ANN. Among them, ANN performed well, with an area under the curve (AUC) > 95%. It also demonstrated fairly good performance in differentiating diabetics from normal glucose tolerant (NGT) persons in the test dataset, with 73% accuracy across 64 human adipose tissue samples. Furthermore, these core metagenes were also enriched in diabetes-associated terms and were found in previous genome-wide association studies of T2D and its associated glycemic traits HOMA-IR and HOMA-B. Therefore, this metagenome deserves further investigation with regard to the cardinal molecular pathological defects/pathways underlying both IR and T2D.
9
Kumar K, Chaudhury K, Tripathi SL. Future of Machine Learning (ML) and Deep Learning (DL) in Healthcare Monitoring System. MACHINE LEARNING ALGORITHMS FOR SIGNAL AND IMAGE PROCESSING 2022:293-313. [DOI: 10.1002/9781119861850.ch17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/19/2023]
10
Kim T, Lee SJ, Jang T. Application of several machine learning algorithms for the prediction of afatinib treatment outcome in advanced-stage EGFR-mutated non-small-cell lung cancer. Thorac Cancer 2022; 13:3353-3361. [PMID: 36278315 PMCID: PMC9715822 DOI: 10.1111/1759-7714.14694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2022] [Revised: 09/28/2022] [Accepted: 09/30/2022] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND The present study aimed to evaluate the performance of several machine learning (ML) algorithms in predicting 1-year afatinib continuation and 2-year survival after afatinib initiation and to identify the differences in survival outcomes between ML-classified strata. METHODS Data that were also used in the RESET study were retrospectively collected from 16 hospitals in South Korea. A stratified random sampling method was applied to split the data into training and test sets (70:30 split ratio). Clinical information, such as age, sex, tumor stage, smoking, performance status, metastasis, type of metastasis, dose adjustment, and pathologic information on EGFR mutations were inputted. Training was performed using eight ML algorithms: logistic regression, decision tree, deep neural network, random forest, support vector machine, boosting, bagging, and the naïve Bayes classifier. The model performance was assessed based on sensitivity, specificity, and accuracy. Area under the receiver operator characteristic curve (AUC) was calculated and compared between the ML models using DeLong's test. A Kaplan-Meier (KM) curve was used to visualize the identified strata obtained from the ML models. RESULTS No significant differences in the input variables were observed between the training and test datasets. The best-performing models were support vector machine in predicting 1-year afatinib continuation (AUC 0.626) and decision tree in 2-year survival after afatinib start (AUC 0.644), although the performances of the ML models were comparable and did not display any predictive roles. KM analysis and log-rank test revealed significant differences between the strata identified from the ML model (p < 0.001) in terms of both time-on-treatment (TOT) and overall survival (OS). 
CONCLUSION The performances of ML models in our study found no discernible roles in predicting afatinib-related outcomes, although the identified strata revealed different TOT and OS in the KM analysis. This implies the strength of ML in predicting the survival outcome, as well as the limitation of electronic medical record-based variables in ML algorithms. Careful consideration of variable inclusion is likely to improve the general model performance.
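The Kaplan-Meier curves used in this study to compare ML-identified strata rest on the product-limit estimator, which can be sketched in a few lines. This is a minimal illustration, not the study's analysis code: the function name and the toy follow-up data are hypothetical, and in practice one curve would be computed per stratum and the curves compared with a log-rank test.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Both events and censored patients leave the risk set.
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up data (months), not taken from the study.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
print([(t, round(s, 3)) for t, s in kaplan_meier(times, events)])
# [(2, 0.8), (3, 0.6), (5, 0.3)]
```

Censored patients reduce the number at risk without lowering the curve, which is why the estimate drops only at observed event times.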
Affiliation(s)
- Taeyun Kim
  - Division of Pulmonology, Department of Internal Medicine, The Armed Forces Goyang Hospital, Goyang, Republic of Korea
- Sang Jin Lee
  - Department of Statistics, Pusan National University, Busan, Republic of Korea
- Tae‐Won Jang
  - Division of Pulmonology, Department of Internal Medicine, Kosin University College of Medicine, Kosin University Gospel Hospital, Busan, Republic of Korea
11
Ghosh Roy G, Geard N, Verspoor K, He S. MPVNN: Mutated Pathway Visible Neural Network architecture for interpretable prediction of cancer-specific survival risk. Bioinformatics 2022; 38:5026-5032. [PMID: 36124954 DOI: 10.1093/bioinformatics/btac636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Revised: 08/04/2022] [Accepted: 09/16/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Survival risk prediction using gene expression data is important in making treatment decisions in cancer. Standard neural network (NN) survival analysis models are black boxes with a lack of interpretability. More interpretable visible neural network architectures are designed using biological pathway knowledge. But they do not model how pathway structures can change for particular cancer types. RESULTS We propose a novel Mutated Pathway Visible Neural Network (MPVNN) architecture, designed using prior signaling pathway knowledge and random replacement of known pathway edges using gene mutation data simulating signal flow disruption. As a case study, we use the PI3K-Akt pathway and demonstrate overall improved cancer-specific survival risk prediction of MPVNN over other similar-sized NN and standard survival analysis methods. We show that trained MPVNN architecture interpretation, which points to smaller sets of genes connected by signal flow within the PI3K-Akt pathway that is important in risk prediction for particular cancer types, is reliable. AVAILABILITY AND IMPLEMENTATION The data and code are available at https://github.com/gourabghoshroy/MPVNN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gourab Ghosh Roy
  - School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK; School of Computing and Information Systems, University of Melbourne, Melbourne 3052, Australia
- Nicholas Geard
  - School of Computing and Information Systems, University of Melbourne, Melbourne 3052, Australia
- Karin Verspoor
  - School of Computing and Information Systems, University of Melbourne, Melbourne 3052, Australia; School of Computing Technologies, RMIT University, Melbourne 3000, Australia
- Shan He
  - School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK
12
Sandeep Ganesh G, Kolusu AS, Prasad K, Samudrala PK, Nemmani KV. Advancing health care via artificial intelligence: From concept to clinic. Eur J Pharmacol 2022; 934:175320. [DOI: 10.1016/j.ejphar.2022.175320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 09/30/2022] [Accepted: 10/04/2022] [Indexed: 11/26/2022]
13
Wang S, Liu W, Ye Z, Xia X, Guo M. Development of a joint diagnostic model of thyroid papillary carcinoma with artificial neural network and random forest. Front Genet 2022; 13:957718. [PMCID: PMC9585230 DOI: 10.3389/fgene.2022.957718] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/31/2022] [Accepted: 09/21/2022] [Indexed: 11/13/2022] Open
Abstract
Objective: Papillary thyroid carcinoma (PTC) accounts for 80% of thyroid malignancies, and the incidence of PTC is increasing rapidly. The present study was conducted to identify novel and important gene panels and to develop an early diagnostic model for PTC by combining artificial neural network (ANN) and random forest (RF). Methods and results: Samples were searched from the Gene Expression Omnibus (GEO) database, and gene expression datasets (GSE27155, GSE60542, and GSE33630) were collected and processed. GSE27155 and GSE60542 were merged into the training set, and GSE33630 was defined as the validation set. Differentially expressed genes (DEGs) in the training set were obtained with the “limma” package in R. Then, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses as well as immune cell infiltration analysis were conducted based on the DEGs. Important genes were identified from the DEGs by random forest. Finally, an artificial neural network was used to develop a diagnostic model. The diagnostic model was validated on the validation set, and the area under the receiver operating characteristic curve (AUC) value was satisfactory. Conclusion: A diagnostic model was established by combining random forest and artificial neural network based on a novel gene panel. The AUC showed that the diagnostic model had excellent performance.
14
Li Y, Wu X, Yang P, Jiang G, Luo Y. Machine Learning for Lung Cancer Diagnosis, Treatment, and Prognosis. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022; 20:850-866. [PMID: 36462630 PMCID: PMC10025752 DOI: 10.1016/j.gpb.2022.11.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 03/04/2022] [Revised: 10/03/2022] [Accepted: 11/17/2022] [Indexed: 12/03/2022]
Abstract
The recent development of imaging and sequencing technologies enables systematic advances in the clinical study of lung cancer. Meanwhile, the human mind is limited in effectively handling and fully utilizing the accumulation of such enormous amounts of data. Machine learning-based approaches play a critical role in integrating and analyzing these large and complex datasets, which have extensively characterized lung cancer through the use of different perspectives from these accrued data. In this review, we provide an overview of machine learning-based approaches that strengthen the varying aspects of lung cancer diagnosis and therapy, including early detection, auxiliary diagnosis, prognosis prediction, and immunotherapy practice. Moreover, we highlight the challenges and opportunities for future applications of machine learning in lung cancer.
Affiliation(s)
- Yawei Li
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Xin Wu
  - Department of Medicine, University of Illinois at Chicago, Chicago, IL 60612, USA
- Ping Yang
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905 / Scottsdale, AZ 85259, USA
- Guoqian Jiang
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN 55905, USA
- Yuan Luo
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA.
15
Gokhale M, Mohanty SK, Ojha A. A stacked autoencoder based gene selection and cancer classification framework. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
16
|
Zhang M, Ke B, Zhuo H, Guo B. Diagnostic model based on bioinformatics and machine learning to distinguish Kawasaki disease using multiple datasets. BMC Pediatr 2022; 22:512. [PMID: 36042431 PMCID: PMC9425821 DOI: 10.1186/s12887-022-03557-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Accepted: 08/17/2022] [Indexed: 12/03/2022] Open
Abstract
Background: Kawasaki disease (KD), characterized by systemic vasculitis, is the leading cause of acquired heart disease in children. Herein, we developed a diagnostic model, with some prognosis ability, to help distinguish children with KD. Methods: Gene expression datasets were downloaded from Gene Expression Omnibus (GEO), and gene sets with a potential pathogenic mechanism in KD were identified using differentially expressed gene (DEG) screening, pathway enrichment analysis, random forest (RF) screening, and artificial neural network (ANN) construction. Results: We extracted 2,017 DEGs (1,130 with upregulated and 887 with downregulated expression) from GEO. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses showed that the DEGs were significantly enriched in innate/adaptive immune response-related processes. Subsequently, the results of weighted gene co-expression network analysis and DEG screening were combined and, using RF and ANN, a model with eight genes (VPS9D1, CACNA1E, SH3GLB1, RAB32, ADM, GYG1, PGS1, and HIST2H2AC) was constructed. Classification results of the new model for KD diagnosis showed excellent performance for different datasets, including those of patients with KD, convalescents, and healthy individuals, with area under the curve values of 1, 0.945, and 0.95, respectively. Conclusions: We used machine learning methods to construct and validate a diagnostic model using multiple bioinformatic datasets, and identified molecules expected to serve as new biomarkers for or therapeutic targets in KD.
Affiliation(s)
- Mengyi Zhang
  - Department of Laboratory Medicine, West China Second University Hospital, Sichuan University, No. 20, Section 3, Renmin South Road, Chengdu, 610041, PR, Sichuan Province, China
  - Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, Chengdu, China
- Bocuo Ke
  - Department of Laboratory Medicine, West China Second University Hospital, Sichuan University, No. 20, Section 3, Renmin South Road, Chengdu, 610041, PR, Sichuan Province, China
  - Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, Chengdu, China
- Huichuan Zhuo
  - Department of Laboratory Medicine, West China Second University Hospital, Sichuan University, No. 20, Section 3, Renmin South Road, Chengdu, 610041, PR, Sichuan Province, China
  - Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, Chengdu, China
- Binhan Guo
  - Department of Laboratory Medicine, West China Second University Hospital, Sichuan University, No. 20, Section 3, Renmin South Road, Chengdu, 610041, PR, Sichuan Province, China
  - Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, Chengdu, China
|
17
|
Development and Verification of a Combined Diagnostic Model for Sarcopenia with Random Forest and Artificial Neural Network. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:2957731. [PMID: 36050999 PMCID: PMC9427323 DOI: 10.1155/2022/2957731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 07/16/2022] [Accepted: 07/26/2022] [Indexed: 11/18/2022]
Abstract
Background: Sarcopenia is a chronic disease characterized by an age-related decline in skeletal muscle mass and function, and diagnosis is challenging owing to the lack of a clear “gold standard” assessment method. Objective: This study is aimed at combining random forest (RF) and artificial neural network (ANN) methods to screen key potential biomarkers and establish an early sarcopenia diagnostic model. Methods: Three gene expression datasets were downloaded and merged by searching the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) in the merged dataset were identified by R software and subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. Afterward, the STRING database was employed for interaction analysis of the differentially encoded proteins. Then, RF was used to identify key genes from the DEGs, and a sarcopenia diagnostic model was constructed by ANN. Finally, the diagnostic model was assessed using a validation dataset, while its diagnostic performance was evaluated by the area under curve (AUC) value. Results: 107 sarcopenia-related DEGs were identified, and they were mainly enriched in the FoxO and AMPK signaling pathways involved in the molecular pathogenesis of sarcopenia. Thereafter, seven key genes (MT1X, FAM171A1, ZNF415, ARHGAP36, CISD1, ETNPPL, and WISP2) were identified by the RF classifier. The proteins encoded by three of these genes (CISD1, ETNPPL, and WISP2) may be potential biomarkers for sarcopenia. Finally, a diagnostic model for sarcopenia was successfully designed by ANN, achieving an AUC of 0.999 and 0.85 in the training and testing datasets, respectively. Conclusion: We identified several potential genetic biomarkers and successfully developed an early predictive model with high diagnostic performance for sarcopenia. Moreover, our results provide a valuable reference for the early diagnosis and screening of sarcopenia in the future.
|
18
|
Theophanous S, Lønne PI, Choudhury A, Berbee M, Dekker A, Dennis K, Dewdney A, Gambacorta MA, Gilbert A, Guren MG, Holloway L, Jadon R, Kochhar R, Mohamed AA, Muirhead R, Parés O, Raszewski L, Roy R, Scarsbrook A, Sebag-Montefiore D, Spezi E, Spindler KLG, van Triest B, Vassiliou V, Malinen E, Wee L, Appelt AL. Development and validation of prognostic models for anal cancer outcomes using distributed learning: protocol for the international multi-centre atomCAT2 study. Diagn Progn Res 2022; 6:14. [PMID: 35922837 PMCID: PMC9351222 DOI: 10.1186/s41512-022-00128-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/11/2022] [Accepted: 06/09/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Anal cancer is a rare cancer with rising incidence. Despite the relatively good outcomes conferred by state-of-the-art chemoradiotherapy, further improving disease control and reducing toxicity has proven challenging. Developing and validating prognostic models using routinely collected data may provide new insights for treatment development and selection. However, due to the rarity of the cancer, it can be difficult to obtain sufficient data, especially from single centres, to develop and validate robust models. Moreover, multi-centre model development is hampered by ethical barriers and data protection regulations that often limit accessibility to patient data. Distributed (or federated) learning allows models to be developed using data from multiple centres without any individual-level patient data leaving the originating centre, therefore preserving patient data privacy. This work builds on the proof-of-concept three-centre atomCAT1 study and describes the protocol for the multi-centre atomCAT2 study, which aims to develop and validate robust prognostic models for three clinically important outcomes in anal cancer following chemoradiotherapy.
METHODS: This is a retrospective multi-centre cohort study, investigating overall survival, locoregional control and freedom from distant metastasis after primary chemoradiotherapy for anal squamous cell carcinoma. Patient data will be extracted and organised at each participating radiotherapy centre (n = 18). Candidate prognostic factors have been identified through literature review and expert opinion. Summary statistics will be calculated and exchanged between centres prior to modelling. The primary analysis will involve developing and validating Cox proportional hazards models across centres for each outcome through distributed learning. Outcomes at specific timepoints of interest and factor effect estimates will be reported, allowing for outcome prediction for future patients.
DISCUSSION: The atomCAT2 study will analyse one of the largest available cross-institutional cohorts of patients with anal cancer treated with chemoradiotherapy. The analysis aims to provide information on current international clinical practice outcomes and may aid the personalisation and design of future anal cancer clinical trials through contributing to a better understanding of patient risk stratification.
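The exchange of summary statistics between centres mentioned in the methods can be illustrated with a minimal sketch. It is not the atomCAT2 implementation and it does not fit a Cox model; it only shows the general idea that each centre can share aggregates (count, sum, sum of squares) from which a coordinator recovers the pooled mean and variance without seeing patient-level values. The names `CentreSummary`, `summarise`, and `pool` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class CentreSummary:
    """Aggregates a centre is willing to share (hypothetical structure)."""
    n: int           # number of patients at the centre
    total: float     # sum of the variable (e.g. age)
    total_sq: float  # sum of squared values


def summarise(values: Iterable[float]) -> CentreSummary:
    """Runs locally at a centre; only the returned aggregates leave the site."""
    vals = list(values)
    return CentreSummary(len(vals), sum(vals), sum(v * v for v in vals))


def pool(summaries: List[CentreSummary]) -> Tuple[float, float]:
    """Runs at the coordinator; returns the pooled mean and sample variance."""
    n = sum(s.n for s in summaries)
    if n < 2:
        raise ValueError("need at least two observations in total")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


if __name__ == "__main__":
    centre_a = summarise([1, 2, 3])  # toy values, not real patient data
    centre_b = summarise([4, 5])
    print(pool([centre_a, centre_b]))
```

The pooled result equals what would be computed on the combined data, which is why aggregates of this kind are a common starting point before distributed model fitting.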
Affiliation(s)
- Stelios Theophanous
  - Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, UK
- Per-Ivar Lønne
  - Department of Medical Physics, Oslo University Hospital, Oslo, Norway
- Ananya Choudhury
  - MAASTRO (Dept of Radiotherapy), GROW School of Oncology and Developmental Biology, Maastricht University and Maastricht University Medical Centre+, P. Debyelaan 25, 6229, Maastricht, Netherlands
- Maaike Berbee
  - MAASTRO (Dept of Radiotherapy), GROW School of Oncology and Developmental Biology, Maastricht University and Maastricht University Medical Centre+, P. Debyelaan 25, 6229, Maastricht, Netherlands
- Andre Dekker
  - MAASTRO (Dept of Radiotherapy), GROW School of Oncology and Developmental Biology, Maastricht University and Maastricht University Medical Centre+, P. Debyelaan 25, 6229, Maastricht, Netherlands
- Alexandra Gilbert
  - Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, UK
- Marianne Grønlie Guren
  - Department of Oncology, Oslo University Hospital, and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Lois Holloway
  - Ingham Research Institute and Liverpool Hospital, Liverpool, New South Wales, Australia
- Rajarshi Roy
  - Hull University Teaching Hospitals NHS Trust, Hull, UK
- Andrew Scarsbrook
  - Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, UK
  - Leeds Teaching Hospitals NHS Trust, Leeds, UK
- Baukelien van Triest
  - The Netherlands Cancer Institute-Antoni van Leeuwenhoek (NKI-AVL), Amsterdam, The Netherlands
- Eirik Malinen
  - Department of Medical Physics, Oslo University Hospital, Oslo, Norway
- Leonard Wee
  - MAASTRO (Dept of Radiotherapy), GROW School of Oncology and Developmental Biology, Maastricht University and Maastricht University Medical Centre+, P. Debyelaan 25, 6229, Maastricht, Netherlands
- Ane L Appelt
  - Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, UK
  - Leeds Teaching Hospitals NHS Trust, Leeds, UK
|
19
|
Predicting the Prognostic Value of POLI Expression in Different Cancers via a Machine Learning Approach. Int J Mol Sci 2022; 23:ijms23158571. [PMID: 35955705 PMCID: PMC9369001 DOI: 10.3390/ijms23158571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 07/22/2022] [Accepted: 07/25/2022] [Indexed: 11/17/2022] Open
Abstract
Translesion synthesis (TLS) is a cell signaling pathway that facilitates the tolerance of replication stress. Increased TLS activity, particularly the elevated expression of TLS polymerases, has been linked to resistance to cancer chemotherapeutics and significantly altered patient outcomes. Building upon current knowledge, we found that the expression of one of these TLS polymerases (POLI) is associated with significant differences in cervical and pancreatic cancer survival. These data led us to hypothesize that POLI expression is associated with cancer survival more broadly. However, when cancers were grouped by cancer type, POLI expression did not have a significant prognostic value. We presented a binary cancer random forest classifier using 396 genes that influence the prognostic characteristics of POLI in cervical and pancreatic cancer, selected via graphical least absolute shrinkage and selection operator. The classifier was then used to cluster patients with bladder, breast, colorectal, head and neck, liver, lung, ovary, melanoma, stomach, and uterus cancer according to whether high POLI expression was associated with worsened survival (Group I) or with improved survival (Group II). This approach allowed us to identify cancers where POLI expression is a significant prognostic factor for survival (p = 0.028 in Group I and p = 0.0059 in Group II). Multiple independent validation approaches, including the gene ontology enrichment analysis and visualization tool and network visualization, support the classification scheme. The functions of the selected genes, which involve mitochondrial translational elongation, the Wnt signaling pathway, and the tumor necrosis factor-mediated signaling pathway, support their association with TLS and replication stress. Our multidisciplinary approach provides a novel way of identifying tumors where increased TLS polymerase expression is associated with significant differences in cancer survival.
|
20
|
Pei Q, Luo Y, Chen Y, Li J, Xie D, Ye T. Artificial intelligence in clinical applications for lung cancer: diagnosis, treatment and prognosis. Clin Chem Lab Med 2022; 60:1974-1983. [PMID: 35771735 DOI: 10.1515/cclm-2022-0291] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 03/28/2022] [Accepted: 06/17/2022] [Indexed: 12/12/2022]
Abstract
Artificial Intelligence (AI) is a branch of computer science that includes research in robotics, language recognition, image recognition, natural language processing, and expert systems. AI is poised to change medical practice, and oncology is not an exception to this trend. As a matter of fact, lung cancer has the highest morbidity and mortality worldwide. The leading causes are the complexity of associating early pulmonary nodules with neoplastic changes and the numerous factors that lead to difficult treatment choices and poor prognosis. AI can effectively enhance the diagnostic efficiency of lung cancer while providing optimal treatment and evaluating prognosis, thereby reducing mortality. This review seeks to provide an overview of AI relevant to all the fields of lung cancer. We define the core concepts of AI and cover the basics of the functioning of natural language processing, image recognition, human-computer interaction and machine learning. We also discuss the most recent breakthroughs in AI technologies and their clinical application regarding diagnosis, treatment, and prognosis in lung cancer. Finally, we highlight the future challenges of AI in lung cancer and its impact on medical practice.
Affiliation(s)
- Qin Pei
  - Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan, P.R. China
- Yanan Luo
  - Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan, P.R. China
- Yiyu Chen
  - Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan, P.R. China
- Jingyuan Li
  - Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan, P.R. China
- Dan Xie
  - Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan, P.R. China
- Ting Ye
  - Department of Laboratory Medicine, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan, P.R. China
|
21
|
Yu C, Wang J. Data mining and mathematical models in cancer prognosis and prediction. MEDICAL REVIEW (BERLIN, GERMANY) 2022; 2:285-307. [PMID: 37724193 PMCID: PMC10388766 DOI: 10.1515/mr-2021-0026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Accepted: 12/29/2021] [Indexed: 09/20/2023]
Abstract
Cancer is a fatal and complex disease. Individual differences within the same cancer type, or in the same patient at different stages of cancer development, may require distinct treatments. Pathological differences are reflected at the tissue, cell and gene levels. The interactions between the cancer cells and nearby microenvironments can also influence cancer progression and metastasis. It is a huge challenge to understand all of these mechanistically and quantitatively. Researchers have applied pattern recognition algorithms such as machine learning or data mining to predict cancer types or classifications. With rapidly growing computing power, researchers have begun to integrate huge data sets, multi-dimensional data types and information. Cells are controlled by gene expression, which is determined by promoter sequences and transcription regulators. For example, changes in gene expression through these underlying mechanisms can modify cell progression through the cell cycle. Such molecular activities can be governed by gene regulation through the underlying gene regulatory networks, which are essential for cancer study when the information and gene regulations are clear and available. In this review, we briefly introduce several machine learning methods for cancer prediction and classification, which include Artificial Neural Networks (ANNs), Decision Trees (DTs), Support Vector Machine (SVM) and naive Bayes. Then we describe a few typical models for building gene regulatory networks, such as Correlation, Regression and Bayes methods, based on available data. These methods can help with aspects of cancer diagnosis such as susceptibility, recurrence and survival. Finally, we summarize and compare the modeling methods used to analyze the development and progression of cancer through gene regulatory networks. These models can provide possible physical strategies to analyze cancer progression in a systematic and quantitative way.
Affiliation(s)
- Chong Yu
  - State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin, China
  - Department of Statistics, JiLin University of Finance and Economics, Changchun, Jilin Province, China
- Jin Wang
  - Department of Chemistry and of Physics and Astronomy, State University of New York, Stony Brook, NY, USA
|
22
|
Yang Y, Xu L, Sun L, Zhang P, Farid SS. Machine learning application in personalised lung cancer recurrence and survivability prediction. Comput Struct Biotechnol J 2022; 20:1811-1820. [PMID: 35521553 PMCID: PMC9043969 DOI: 10.1016/j.csbj.2022.03.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Revised: 03/30/2022] [Accepted: 03/30/2022] [Indexed: 12/24/2022] Open
Abstract
Machine learning is an important artificial intelligence technique that is widely applied in cancer diagnosis and detection. More recently, with the rise of personalised and precision medicine, there is a growing trend towards machine learning applications for prognosis prediction. However, to date, building reliable prediction models of cancer outcomes in everyday clinical practice is still a hurdle. In this work, we integrate genomic, clinical and demographic data of lung adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) patients from The Cancer Genome Atlas (TCGA) and introduce copy number variation (CNV) and mutation information of 15 selected genes to generate predictive models for recurrence and survivability. We compare the accuracy and benefits of three well-established machine learning algorithms: decision tree methods, neural networks and support vector machines. Although the accuracy of predictive models using the decision tree method has no significant advantage, the tree models reveal the most important predictors among genomic information (e.g. KRAS, EGFR, TP53), clinical status (e.g. TNM stage and radiotherapy) and demographics (e.g. age and gender) and how they influence the prediction of recurrence and survivability for both early stage LUAD and LUSC. The machine learning models have the potential to help clinicians to make personalised decisions on aspects such as follow-up timeline and to assist with personalised planning of future social care needs.
|
23
|
Prognosis Model of Advanced Non-Small-Cell Lung Cancer Based on Max-Min Hill-Climbing Algorithm. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:9173913. [PMID: 35371284 PMCID: PMC8975666 DOI: 10.1155/2022/9173913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2021] [Revised: 12/12/2021] [Accepted: 03/07/2022] [Indexed: 11/17/2022]
Abstract
A safer and more effective treatment is needed for the comprehensive treatment based on chemotherapy in patients with advanced non-small-cell lung cancer (NSCLC). The max-min hill-climbing (MMHC) is a common algorithm for disease prediction. This study is aimed at analyzing the efficacy of the MMHC algorithm in prognosis evaluation of advanced NSCLC. In this study, the prognosis model of lung cancer was first established by the MMHC algorithm. Then, according to the MMHC algorithm results, 40 patients with advanced NSCLC were divided into the research group and control group before anlotinib hydrochloride capsule combined with pemetrexed disodium chemotherapy. The diameter of solid tumor lesions, objective response rate (ORR), disease control rate (DCR), and progression-free survival (PFS) were compared between the two groups. The results showed that the MMHC model has a higher prediction accuracy of survival status of lung cancer patients. Under the guidance of the model, the research group had a smaller diameter of primary foci and metastatic foci, a higher ORR and DCR, and a longer PFS than the control group (P < 0.05). We can conclude that the MMHC algorithm can guide the maintenance treatment of advanced NSCLC, which is conducive to the prognosis judgment and treatment cost control.
|
24
|
Thomas LB, Mastorides SM, Viswanadhan NA, Jakey CE, Borkowski AA. Artificial Intelligence: Review of Current and Future Applications in Medicine. Fed Pract 2022; 38:527-538. [PMID: 35136337 DOI: 10.12788/fp.0174] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/20/2022]
Abstract
Background: The role of artificial intelligence (AI) in health care is expanding rapidly. Currently, there are at least 29 US Food and Drug Administration-approved AI health care devices that apply to numerous medical specialties and many more are in development. Observations: With increasing expectations for all health care sectors to deliver timely, fiscally-responsible, high-quality health care, AI has potential utility in numerous areas, such as image analysis, improved workflow and efficiency, public health, and epidemiology, to aid in processing large volumes of patient and medical data. In this review, we describe basic terminology, principles, and general AI applications relating to health care. We then discuss current and future applications for a variety of medical specialties. Finally, we discuss the future potential of AI along with the potential risks and limitations of current AI technology. Conclusions: AI can improve diagnostic accuracy, increase patient safety, assist with patient triage, monitor disease progression, and assist with treatment decisions.
Affiliation(s)
- L Brannon Thomas
  - James A. Haley Veterans' Hospital, Tampa, Florida
  - University of South Florida, Morsani College of Medicine, Tampa
- Stephen M Mastorides
  - James A. Haley Veterans' Hospital, Tampa, Florida
  - University of South Florida, Morsani College of Medicine, Tampa
- Colleen E Jakey
  - James A. Haley Veterans' Hospital, Tampa, Florida
  - University of South Florida, Morsani College of Medicine, Tampa
- Andrew A Borkowski
  - James A. Haley Veterans' Hospital, Tampa, Florida
  - University of South Florida, Morsani College of Medicine, Tampa
|
25
|
Kaur I, Doja M, Ahmad T. Data Mining and Machine Learning in Cancer Survival Research: An Overview and Future Recommendations. J Biomed Inform 2022; 128:104026. [DOI: 10.1016/j.jbi.2022.104026] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/13/2021] [Revised: 02/07/2022] [Accepted: 02/09/2022] [Indexed: 12/29/2022]
|
26
|
Ma C, Wu M, Ma S. Analysis of cancer omics data: a selective review of statistical techniques. Brief Bioinform 2022; 23:6510158. [PMID: 35039832 DOI: 10.1093/bib/bbab585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2021] [Revised: 12/19/2021] [Accepted: 12/20/2021] [Indexed: 11/13/2022] Open
Abstract
Cancer is an omics disease. The development in high-throughput profiling has fundamentally changed cancer research and clinical practice. Compared with clinical, demographic and environmental data, the analysis of omics data, which has higher dimensionality, weaker signals and more complex distributional properties, is much more challenging. Developments in the literature are often 'scattered', with individual studies focused on one or a few closely related methods. The goal of this review is to assist cancer researchers with limited statistical expertise in establishing the 'overall framework' of cancer omics data analysis. To facilitate understanding, we mainly focus on intuition, concepts and key steps, and refer readers to the original publications for mathematical details. This review broadly covers unsupervised and supervised analysis, as well as individual-gene-based, gene-set-based and gene-network-based analysis. We also briefly discuss 'special topics' including interaction analysis, multi-datasets analysis and multi-omics analysis.
Affiliation(s)
- Chenjin Ma
  - College of Statistics and Data Science, Faculty of Science, Beijing University of Technology, Beijing, China
- Mengyun Wu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Shuangge Ma
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
|
27
|
Hayashi H, Uemura N, Matsumura K, Zhao L, Sato H, Shiraishi Y, Yamashita YI, Baba H. Recent advances in artificial intelligence for pancreatic ductal adenocarcinoma. World J Gastroenterol 2021; 27:7480-7496. [PMID: 34887644 PMCID: PMC8613738 DOI: 10.3748/wjg.v27.i43.7480] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/03/2021] [Revised: 08/02/2021] [Accepted: 11/15/2021] [Indexed: 02/06/2023] Open
Abstract
Pancreatic ductal adenocarcinoma (PDAC) remains the most lethal type of cancer. The 5-year survival rate for patients with early-stage diagnosis can be as high as 20%, suggesting that early diagnosis plays a pivotal role in the prognostic improvement of PDAC cases. In the medical field, the broad availability of biomedical data has led to the advent of the "big data" era. To overcome this deadly disease, how to fully exploit big data is a new challenge in the era of precision medicine. Artificial intelligence (AI) is the ability of a machine to learn and display intelligence to solve problems. AI can help to transform big data into clinically actionable insights more efficiently, reduce inevitable errors to improve diagnostic accuracy, and make real-time predictions. AI-based omics analyses will become the next alternative approach to overcome this poor-prognostic disease by discovering biomarkers for early detection, providing molecular/genomic subtyping, offering treatment guidance, and predicting recurrence and survival. Advances in AI may therefore improve PDAC survival outcomes in the near future. The present review mainly focuses on recent advances of AI in PDAC for clinicians. We believe that breakthroughs will soon emerge to fight this deadly disease using AI-navigated precision medicine.
Affiliation(s)
- Hiromitsu Hayashi
  - Department of Gastroenterological Surgery, Graduate School of Life Sciences, Kumamoto University, Kumamoto 860-8556, Japan
- Norio Uemura
  - Department of Gastroenterological Surgery, Graduate School of Life Sciences, Kumamoto University, Kumamoto 860-8556, Japan
- Kazuki Matsumura
  - Department of Gastroenterological Surgery, Graduate School of Life Sciences, Kumamoto University, Kumamoto 860-8556, Japan
- Liu Zhao
  - Department of Gastroenterological Surgery, Graduate School of Life Sciences, Kumamoto University, Kumamoto 860-8556, Japan
- Hiroki Sato
  - Department of Gastroenterological Surgery, Graduate School of Life Sciences, Kumamoto University, Kumamoto 860-8556, Japan
- Yuta Shiraishi
  - Department of Gastroenterological Surgery, Graduate School of Life Sciences, Kumamoto University, Kumamoto 860-8556, Japan
- Yo-ichi Yamashita
  - Department of Gastroenterological Surgery, Graduate School of Life Sciences, Kumamoto University, Kumamoto 860-8556, Japan
- Hideo Baba
  - Department of Gastroenterological Surgery, Graduate School of Life Sciences, Kumamoto University, Kumamoto 860-8556, Japan
|
28
|
Tumor Nonimmune-Microenvironment-Related Gene Expression Signature Predicts Brain Metastasis in Lung Adenocarcinoma Patients after Surgery: A Machine Learning Approach Using Gene Expression Profiling. Cancers (Basel) 2021; 13:cancers13174468. [PMID: 34503278 PMCID: PMC8430997 DOI: 10.3390/cancers13174468] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/01/2021] [Revised: 08/30/2021] [Accepted: 09/02/2021] [Indexed: 12/26/2022] Open
Abstract
Simple Summary It is important to be able to predict brain metastasis in lung adenocarcinoma patients; however, research in this area is still lacking. Much of the previous work on tumor microenvironments in lung adenocarcinoma with brain metastasis concerns the tumor immune microenvironment. The importance of the tumor nonimmune microenvironment (extracellular matrix (ECM), epithelial–mesenchymal transition (EMT) feature, and angiogenesis) has been overlooked with regard to brain metastasis. We evaluated tumor nonimmune-microenvironment-related gene expression signatures that could predict brain metastasis after the surgical resection of lung adenocarcinoma using a machine learning approach. We identified a tumor nonimmune-microenvironment-related 17-gene expression signature, and this signature showed high brain metastasis predictive power in four machine learning classifiers. The immunohistochemical expression of the top three genes of the 17-gene expression signature yielded similar results to NanoString tests. Our tumor nonimmune-microenvironment-related gene expression signatures are important biological markers that can predict brain metastasis and provide patient-specific treatment options.
Abstract Using a machine learning approach with a gene expression profile, we discovered a tumor nonimmune-microenvironment-related gene expression signature, including extracellular matrix (ECM) remodeling, epithelial–mesenchymal transition (EMT), and angiogenesis, that could predict brain metastasis (BM) after the surgical resection of 64 lung adenocarcinomas (LUAD). Gene expression profiling identified a tumor nonimmune-microenvironment-related 17-gene expression signature that significantly correlated with BM. Of the 17 genes, 11 were ECM-remodeling-related genes. The 17-gene expression signature showed high BM predictive power in four machine learning classifiers (areas under the receiver operating characteristic curve = 0.845 for naïve Bayes, 0.849 for support vector machine, 0.858 for random forest, and 0.839 for neural network). Subgroup analysis revealed that the BM predictive power of the 17-gene signature was higher in the early-stage LUAD than in the late-stage LUAD. Pathway enrichment analysis showed that the upregulated differentially expressed genes were mainly enriched in the ECM–receptor interaction pathway. The immunohistochemical expression of the top three genes of the 17-gene expression signature yielded similar results to NanoString tests. The tumor nonimmune-microenvironment-related gene expression signatures found in this study are important biological markers that can predict BM and provide patient-specific treatment options.
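The areas under the receiver operating characteristic curve quoted in this abstract can be read as a ranking statistic: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that computation in plain Python. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not taken from the paper, and this is not the authors' code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = event, e.g. brain metastasis).
    scores: iterable of classifier scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up numbers): three of the four positive/negative pairs
# are ordered correctly, so the AUC is 0.75.
example_auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of this size (64 tumors); larger datasets would normally use a sort-based implementation from an established library.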
|
29
|
Liu X, Luo Y, He T, Ren M, Xu Y. Predicting essential genes of 37 prokaryotes by combining information-theoretic features. J Microbiol Methods 2021; 188:106297. [PMID: 34343487 DOI: 10.1016/j.mimet.2021.106297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2021] [Revised: 07/30/2021] [Accepted: 07/30/2021] [Indexed: 10/20/2022]
Abstract
Essential genes are required for the reproduction and survival of an organism. Rapid identification of essential genes has practical application value in biomedicine. Information theory is a discipline that studies information transmission. Based on the similarity between heredity and information transmission, measures derived from information theory can be applied to genetic sequence analysis on different scales. In this study, we employed 114 features extracted by information theory methods to construct an essential gene prediction model. We applied a backpropagation neural network to construct a classifier and employed it to predict essential genes of 37 prokaryotes. The performance of the classifier was evaluated by applying intra-organism prediction and leave-one-species-out prediction. Among 37 prokaryotes, intra-organism prediction and leave-one-species-out prediction yielded average AUC scores of 0.791 and 0.717, respectively. Considering the potential redundancy in the feature set, we performed feature selection and constructed a key feature subset. In the above two prediction methods, the average AUC scores of 37 organisms obtained by using key features were 0.786 and 0.714, respectively. The results show the potential and universality of information-theoretic features in the study of prokaryotic essential gene prediction.
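The leave-one-species-out evaluation mentioned in this abstract holds out all genes of one organism for testing and trains on the genes of every other organism, repeating this once per species. A minimal sketch of how such splits could be generated is shown below. The helper name `leave_one_group_out` and the species labels are hypothetical, chosen only for illustration; the paper's actual pipeline is not described at this level of detail.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx) once per distinct group label.

    groups: list giving the group (e.g. species) of each sample, by position.
    All samples of the held-out group go to the test set; the rest are training.
    """
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical labels: four genes drawn from three organisms.
species_of_gene = ["E. coli", "E. coli", "B. subtilis", "M. genitalium"]
splits = list(leave_one_group_out(species_of_gene))
```

Each split would then be used to fit a classifier on `train_idx` and score it on `test_idx`, and the per-species scores averaged, which is how an average across 37 organisms could be obtained.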
Affiliation(s)
- Xiao Liu
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China.
| | - Yachuan Luo
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
| | - Ting He
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
| | - Meixiang Ren
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
| | - Yuqiao Xu
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
|
30
|
Feng C, Xiang T, Yi Z, Meng X, Chu X, Huang G, Zhao X, Chen F, Xiong B, Feng J. A Deep-Learning Model With the Attention Mechanism Could Rigorously Predict Survivals in Neuroblastoma. Front Oncol 2021; 11:653863. [PMID: 34336652 PMCID: PMC8317851 DOI: 10.3389/fonc.2021.653863] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/15/2021] [Accepted: 06/24/2021] [Indexed: 12/28/2022]
Abstract
BACKGROUND Neuroblastoma is one of the most devastating forms of childhood cancer. Despite many attempts at precise survival prediction in neuroblastoma, prediction efficacy still needs improvement. METHODS Here, we applied a deep-learning (DL) model with the attention mechanism to predict survival in neuroblastoma. We utilized 2 groups of features separated from 172 genes to train 2 deep neural networks and combined them by the attention mechanism. RESULTS This classifier could accurately predict survival, with areas under the receiver operating characteristic (ROC) and time-dependent ROC curves reaching 0.968 and 0.974 in the training set, respectively. The accuracy of the model was further confirmed in a validation cohort. Importantly, the two feature groups were mapped to two groups of patients, which were prognostic in Kaplan-Meier curves. Biological analyses showed that they exhibited diverse molecular backgrounds which could be linked to the prognosis of the patients. CONCLUSIONS In this study, we applied artificial intelligence methods to improve the accuracy of neuroblastoma survival prediction based on gene expression and provide explanations for better understanding of the molecular mechanisms underlying neuroblastoma.
Affiliation(s)
- Chenzhao Feng
- Department of Pediatric Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Tianyu Xiang
- Department of Control Science and Engineering, College of Electronics and Information Engineering, Tongji University, Shanghai, China
- State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
| | - Zixuan Yi
- School of Mathematics and Statistics, College of Arts and Sciences, Wuhan University, Wuhan, China
| | - Xinyao Meng
- Department of Pediatric Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Xufeng Chu
- Department of Forensic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Guiyang Huang
- Department of Forensic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Xiang Zhao
- Department of Pediatric Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Feng Chen
- Department of Pediatric Surgery, Fujian Medical University Union Hospital, Fuzhou, China
| | - Bo Xiong
- Department of Forensic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Jiexiong Feng
- Department of Pediatric Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
|
31
|
Banegas-Luna AJ, Peña-García J, Iftene A, Guadagni F, Ferroni P, Scarpato N, Zanzotto FM, Bueno-Crespo A, Pérez-Sánchez H. Towards the Interpretability of Machine Learning Predictions for Medical Applications Targeting Personalised Therapies: A Cancer Case Survey. Int J Mol Sci 2021; 22:4394. [PMID: 33922356 PMCID: PMC8122817 DOI: 10.3390/ijms22094394] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/30/2021] [Revised: 04/16/2021] [Accepted: 04/20/2021] [Indexed: 12/18/2022]
Abstract
Artificial Intelligence is providing astonishing results, with medicine being one of its favourite playgrounds. Machine Learning and, in particular, Deep Neural Networks are behind this revolution. Among the most challenging targets of interest in medicine are cancer diagnosis and therapies but, to start this revolution, software tools need to be adapted to cover the new requirements. In this sense, learning tools are becoming a commodity but, to be able to assist doctors on a daily basis, it is essential to fully understand how models can be interpreted. In this survey, we analyse current machine learning models and other in-silico tools as applied to medicine (specifically, to cancer research) and we discuss their interpretability, performance and the input data they are fed. Artificial neural networks (ANN), logistic regression (LR) and support vector machines (SVM) have been observed to be the preferred models. In addition, convolutional neural networks (CNNs), supported by the rapid development of graphic processing units (GPUs) and high-performance computing (HPC) infrastructures, are gaining importance when image processing is feasible. However, the interpretability of machine learning predictions so that doctors can understand them, trust them and gain useful insights for the clinical practice is still rarely considered, which is a factor that needs to be improved to enhance doctors' predictive capacity and achieve individualised therapies in the near future.
Affiliation(s)
- Antonio Jesús Banegas-Luna
- Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain; (J.P.-G.); (A.B.-C.)
| | - Jorge Peña-García
- Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain; (J.P.-G.); (A.B.-C.)
| | - Adrian Iftene
- Faculty of Computer Science, Universitatea Alexandru Ioan Cuza (UAIC), 700505 Jashi, Romania;
| | - Fiorella Guadagni
- Interinstitutional Multidisciplinary Biobank (BioBIM), IRCCS San Raffaele Roma, 00166 Rome, Italy; (F.G.); (P.F.)
- Department of Human Sciences and Promotion of the Quality of Life, San Raffaele Roma Open University, 00166 Rome, Italy;
| | - Patrizia Ferroni
- Interinstitutional Multidisciplinary Biobank (BioBIM), IRCCS San Raffaele Roma, 00166 Rome, Italy; (F.G.); (P.F.)
- Department of Human Sciences and Promotion of the Quality of Life, San Raffaele Roma Open University, 00166 Rome, Italy;
| | - Noemi Scarpato
- Department of Human Sciences and Promotion of the Quality of Life, San Raffaele Roma Open University, 00166 Rome, Italy;
| | - Fabio Massimo Zanzotto
- Dipartimento di Ingegneria dell’Impresa “Mario Lucertini”, University of Rome Tor Vergata, 00133 Rome, Italy;
| | - Andrés Bueno-Crespo
- Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain; (J.P.-G.); (A.B.-C.)
| | - Horacio Pérez-Sánchez
- Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain; (J.P.-G.); (A.B.-C.)
|
32
|
Kong Y, Yu T. forgeNet: a graph deep neural network model using tree-based ensemble classifiers for feature graph construction. Bioinformatics 2020; 36:3507-3515. [PMID: 32163118 DOI: 10.1093/bioinformatics/btaa164] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/27/2019] [Revised: 02/07/2020] [Accepted: 03/08/2020] [Indexed: 12/31/2022]
Abstract
MOTIVATION A unique challenge in predictive model building for omics data has been the small number of samples (n) versus the large number of features (p). This 'n≪p' property brings difficulties for disease outcome classification using deep learning techniques. Sparse learning by incorporating known functional relationships between the biological units, such as the graph-embedded deep feedforward network (GEDFN) model, has been a solution to this issue. However, such methods require an existing feature graph, and potential mis-specification of the feature graph can be harmful to classification and feature selection. RESULTS To address this limitation and develop a robust classification model without relying on external knowledge, we propose a forest graph-embedded deep feedforward network (forgeNet) model, to integrate the GEDFN architecture with a forest feature graph extractor, so that the feature graph can be learned in a supervised manner and specifically constructed for a given prediction task. To validate the method's capability, we tested the forgeNet model on both synthetic and real datasets. The resulting high classification accuracy suggests that the method is a valuable addition to sparse deep learning models for omics data. AVAILABILITY AND IMPLEMENTATION The method is available at https://github.com/yunchuankong/forgeNet. CONTACT tianwei.yu@emory.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yunchuan Kong
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
| | - Tianwei Yu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
|
33
|
Ayyappan V, Chang A, Zhang C, Paidi SK, Bordett R, Liang T, Barman I, Pandey R. Identification and Staging of B-Cell Acute Lymphoblastic Leukemia Using Quantitative Phase Imaging and Machine Learning. ACS Sens 2020; 5:3281-3289. [PMID: 33092347 DOI: 10.1021/acssensors.0c01811] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022]
Abstract
Identification and classification of leukemia cells in a rapid and label-free fashion is clinically challenging and thus presents a prime arena for implementing new diagnostic tools. Quantitative phase imaging, which maps optical path length delays introduced by the specimen, has been demonstrated to discern cellular phenotypes based on differential morphological attributes. Rapid acquisition capability and the availability of label-free images with high information content have enabled researchers to use machine learning (ML) to reveal latent features. We developed a set of ML classifiers, including convolutional neural networks, to discern healthy B cells from lymphoblasts and classify stages of B cell acute lymphoblastic leukemia. Here, we show that the average dry mass and volume of normal B cells are lower than those of cancerous cells and that these morphologic parameters increase further alongside disease progression. We find that the relaxed training requirements of a ML approach are conducive to the classification of cell type, with minimal space, training time, and memory requirements. Our findings pave the way for a larger study on clinical samples of acute lymphoblastic leukemia, with the overarching goal of its broader use in hematopathology, where the prospect of objective diagnoses with minimal sample preparation remains highly desirable.
Affiliation(s)
- Vinay Ayyappan
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
| | - Alex Chang
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, United States
| | - Chi Zhang
- Department of Mechanical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
| | - Santosh Kumar Paidi
- Department of Mechanical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
| | - Rosalie Bordett
- Connecticut Children’s Innovation Center, University of Connecticut School of Medicine, Farmington, Connecticut 06032, United States
| | - Tiffany Liang
- Connecticut Children’s Innovation Center, University of Connecticut School of Medicine, Farmington, Connecticut 06032, United States
| | - Ishan Barman
- Department of Mechanical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21287, United States
- The Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, Baltimore, Maryland 21287, United States
| | - Rishikesh Pandey
- Connecticut Children’s Innovation Center, University of Connecticut School of Medicine, Farmington, Connecticut 06032, United States
- Department of Biomedical Engineering, University of Connecticut, Storrs, Connecticut 06269, United States
|
34
|
Establishment and Analysis of a Combined Diagnostic Model of Polycystic Ovary Syndrome with Random Forest and Artificial Neural Network. BIOMED RESEARCH INTERNATIONAL 2020; 2020:2613091. [PMID: 32884937 PMCID: PMC7455828 DOI: 10.1155/2020/2613091] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/30/2020] [Revised: 07/27/2020] [Accepted: 08/03/2020] [Indexed: 12/14/2022]
Abstract
Polycystic ovary syndrome (PCOS) is one of the most common metabolic and reproductive endocrinopathies. However, few studies have tried to develop a diagnostic model based on gene biomarkers. In this study, we applied a computational method combining two machine learning algorithms, random forest (RF) and artificial neural network (ANN), to identify gene biomarkers and construct a diagnostic model. We collected gene expression data from the Gene Expression Omnibus (GEO) database containing 76 PCOS samples and 57 normal samples; five datasets were utilized, including one dataset for screening differentially expressed genes (DEGs), two training datasets, and two validation datasets. Firstly, based on RF, 12 key genes in 264 DEGs were identified to be vital for classification of PCOS and normal samples. Moreover, the weights of these key genes were calculated using ANN with microarray and RNA-seq training datasets, respectively. Furthermore, the diagnostic models for the two types of datasets were developed and named neuralPCOS. Finally, two validation datasets were used to test and compare the performance of neuralPCOS with two other sets of marker genes by area under the curve (AUC). Our model achieved an AUC of 0.7273 in the microarray dataset and 0.6488 in the RNA-seq dataset. To conclude, we uncovered gene biomarkers and developed a novel diagnostic model of PCOS, which would be helpful for diagnosis.
|
35
|
Sanchez-Ibarra HE, Jiang X, Gallegos-Gonzalez EY, Cavazos-González AC, Chen Y, Morcos F, Barrera-Saldaña HA. KRAS, NRAS, and BRAF mutation prevalence, clinicopathological association, and their application in a predictive model in Mexican patients with metastatic colorectal cancer: A retrospective cohort study. PLoS One 2020; 15:e0235490. [PMID: 32628708 PMCID: PMC7337295 DOI: 10.1371/journal.pone.0235490] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/06/2020] [Accepted: 06/16/2020] [Indexed: 01/10/2023]
Abstract
Mutations in KRAS, NRAS, and BRAF (RAS/BRAF) genes are the main predictive biomarkers for the response to anti-EGFR monoclonal antibodies (MAbs) targeted therapy in metastatic colorectal cancer (mCRC). This retrospective study aimed to report the mutational status prevalence of these genes, explore their possible associations with clinicopathological features, and build and validate a predictive model. To achieve these objectives, 500 Mexican patients with mCRC were screened for clinically relevant mutations in RAS/BRAF genes. Fifty-two percent of these specimens harbored clinically relevant mutations in at least one screened gene. Among these, 86% had a mutation in KRAS, 7% in NRAS, 6% in BRAF, and 2% in both NRAS and BRAF. Only tumor location in the proximal colon exhibited a significant correlation with KRAS and BRAF mutational status (p-value = 0.0414 and 0.0065, respectively). Further t-SNE analyses were performed on 191 specimens to reveal patterns among patients with clinical parameters and KRAS mutational status. Then, directed by the results from classical statistical tests and t-SNE analysis, neural network models utilized entity embeddings to learn patterns and build predictive models using a minimal number of trainable parameters. This study could be the first step in the prediction of RAS/BRAF mutational status from tumoral features and could lead the way to a more detailed and more diverse dataset that could benefit from machine learning methods.
Affiliation(s)
| | - Xianli Jiang
- Evolutionary Information Laboratory, Department of Biological Sciences, the University of Texas at Dallas, Richardson, Texas, United States of America
| | - Yenho Chen
- Evolutionary Information Laboratory, Department of Biological Sciences, the University of Texas at Dallas, Richardson, Texas, United States of America
| | - Faruck Morcos
- Evolutionary Information Laboratory, Department of Biological Sciences, the University of Texas at Dallas, Richardson, Texas, United States of America
|
36
|
Wilentzik Müller R, Gat-Viks I. Exploring Neural Networks and Related Visualization Techniques in Gene Expression Data. Front Genet 2020; 11:402. [PMID: 32499810 PMCID: PMC7243731 DOI: 10.3389/fgene.2020.00402] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/13/2019] [Accepted: 03/30/2020] [Indexed: 12/04/2022]
Abstract
Over the past decade, neural networks have become one of the cutting-edge methods in various research fields, excelling particularly in complex classification problems. In this paper, we propose two main contributions: first, we conduct a methodological study of neural network modeling for classifying biological traits based on structured gene expression data. Then, we suggest an innovative approach for utilizing deep learning visualization techniques in order to reveal the specific genes important for the correct classification of each trait within the trained models. Our data suggest that this approach has great potential for becoming a standard feature importance tool used in complex medical research problems, and that it can further be generalized to various structured data classification problems outside the biological domain.
Affiliation(s)
- Roni Wilentzik Müller
- School of Molecular Cell Biology & Biotechnology, Tel Aviv University, Tel Aviv, Israel
| | - Irit Gat-Viks
- School of Molecular Cell Biology & Biotechnology, Tel Aviv University, Tel Aviv, Israel
|
37
|
Menaga D, Revathi S. AN EMPIRICAL STUDY OF CANCER CLASSIFICATION TECHNIQUES BASED ON THE NEURAL NETWORKS. BIOMEDICAL ENGINEERING: APPLICATIONS, BASIS AND COMMUNICATIONS 2020. [DOI: 10.4015/s1016237220500131] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/09/2022]
Abstract
Cancer is one of the most common and dreaded diseases worldwide, and patients are saved only when the cancer is detected at a very early stage. Early detection matters because by the fourth stage the chance of survival is limited. The symptoms of cancer are severe, and therefore all the symptoms should be studied properly before diagnosis. Thus, an automatic prediction system is necessary for classifying a tumor as malignant or benign. Over the past few years, work on cancer classification has increased rapidly, but there is no general technique to find novel cancer classes (class discovery) or to assign tumors to known classes. Accordingly, this survey analyzes distinct cancer classification techniques. This review article provides a detailed review of 50 research papers presenting cancer classification techniques, such as deep learning-based techniques, neural network-based techniques, and hybrid techniques. Moreover, an elaborate analysis and discussion are made based on the year of publication, utilized datasets, accuracy range, evaluation metrics, implementation tool, and adopted classification methods. Finally, the research gaps and issues of various cancer classification schemes are presented to guide researchers toward future work.
Affiliation(s)
- D. Menaga
- B.S. Abdur Rahman Crescent Institute of Science and Technology, Seethakathi Estate G.S.T Main Road Vandalur, Chennai, Tamil Nadu 600048, India
| | - S. Revathi
- B.S. Abdur Rahman Crescent Institute of Science and Technology, Seethakathi Estate G.S.T Main Road Vandalur, Chennai, Tamil Nadu 600048, India
|
38
|
Network modeling of patients' biomolecular profiles for clinical phenotype/outcome prediction. Sci Rep 2020; 10:3612. [PMID: 32107391 PMCID: PMC7046773 DOI: 10.1038/s41598-020-60235-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/23/2019] [Accepted: 11/05/2019] [Indexed: 12/15/2022]
Abstract
Methods for phenotype and outcome prediction are largely based on inductive supervised models that use selected biomarkers to make predictions, without explicitly considering the functional relationships between individuals. We introduce a novel network-based approach named Patient-Net (P-Net) in which biomolecular profiles of patients are modeled in a graph-structured space that represents gene expression relationships between patients. Then a kernel-based semi-supervised transductive algorithm is applied to the graph to explore the overall topology of the graph and to predict the phenotype/clinical outcome of patients. Experimental tests involving several publicly available datasets of patients afflicted with pancreatic, breast, colon and colorectal cancer show that our proposed method is competitive with state-of-the-art supervised and semi-supervised predictive systems. Importantly, P-Net also provides interpretable models that can be easily visualized to gain clues about the relationships between patients, and to formulate hypotheses about their stratification.
|
39
|
Yang J, Li Y, Liu Q, Li L, Feng A, Wang T, Zheng S, Xu A, Lyu J. Brief introduction of medical database and data mining technology in big data era. J Evid Based Med 2020; 13:57-69. [PMID: 32086994 PMCID: PMC7065247 DOI: 10.1111/jebm.12373] [Citation(s) in RCA: 257] [Impact Index Per Article: 64.3] [Received: 08/27/2019] [Accepted: 01/23/2020] [Indexed: 01/14/2023]
Abstract
Data mining technology can search for potentially valuable knowledge in a large amount of data; the process is mainly divided into data preparation, data mining, and the expression and analysis of results. It is a mature information processing technology and applies database technology. Database technology is a software science that researches, manages, and applies databases. The data in the database are processed and analyzed by studying the underlying theory and implementation methods of the structure, storage, design, management, and application of the database. We have introduced several databases and data mining techniques to help a wide range of clinical researchers better understand and apply database technology.
Affiliation(s)
- Jin Yang
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong, China
- School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi, China
| | - Yuanjie Li
- Department of Human Anatomy, Histology and Embryology, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi, China
| | - Qingqing Liu
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong, China
- School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi, China
| | - Li Li
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong, China
| | - Aozi Feng
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong, China
| | - Tianyi Wang
- School of Public Health, Shaanxi University of Chinese Medicine, Xianyang, Shaanxi, China
- Xianyang Central Hospital, Xianyang, Shaanxi, China
| | - Shuai Zheng
- School of Public Health, Shaanxi University of Chinese Medicine, Xianyang, Shaanxi, China
| | - Anding Xu
- Department of Neurology, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong, China
| | - Jun Lyu
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong, China
- School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi, China
|
40
|
Predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes using deep neural network. EVOLUTIONARY INTELLIGENCE 2020. [DOI: 10.1007/s12065-019-00346-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 10/25/2022]
|
41
|
Xu L, Guo Z, Liu X. Prediction of essential genes in prokaryote based on artificial neural network. Genes Genomics 2019; 42:97-106. [DOI: 10.1007/s13258-019-00884-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/25/2019] [Accepted: 10/30/2019] [Indexed: 12/12/2022]
|
42
|
Bertsimas D, Dunn J, Pawlowski C, Silberholz J, Weinstein A, Zhuo YD, Chen E, Elfiky AA. Applied Informatics Decision Support Tool for Mortality Predictions in Patients With Cancer. JCO Clin Cancer Inform 2019; 2:1-11. [PMID: 30652575 DOI: 10.1200/cci.18.00003] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 11/20/2022]
Abstract
PURPOSE With rapidly evolving treatment options in cancer, the complexity in the clinical decision-making process for oncologists represents a growing challenge magnified by oncologists' disposition of intuition-based assessment of treatment risks and overall mortality. Given the unmet need for accurate prognostication with meaningful clinical rationale, we developed a highly interpretable prediction tool to identify patients with high mortality risk before the start of treatment regimens. METHODS We obtained electronic health record data between 2004 and 2014 from a large national cancer center and extracted 401 predictors, including demographics, diagnosis, gene mutations, treatment history, comorbidities, resource utilization, vital signs, and laboratory test results. We built an actionable tool using novel developments in modern machine learning to predict 60-, 90- and 180-day mortality from the start of an anticancer regimen. The model was validated in unseen data against benchmark models. RESULTS We identified 23,983 patients who initiated 46,646 anticancer treatment lines, with a median survival of 514 days. Our proposed prediction models achieved significantly higher estimation quality in unseen data (area under the curve, 0.83 to 0.86) compared with benchmark models. We identified key predictors of mortality, such as change in weight and albumin levels. The results are presented in an interactive and interpretable tool ( www.oncomortality.com ). CONCLUSION Our fully transparent prediction model was able to distinguish with high precision between highest- and lowest-risk patients. Given the rich data available in electronic health records and advances in machine learning methods, this tool can have significant implications for value-based shared decision making at the point of care and personalized goals-of-care management to catalyze practice reforms.
Affiliation(s)
- Dimitris Bertsimas, Jack Dunn, Colin Pawlowski, John Silberholz, Alexander Weinstein, and Ying Daisy Zhuo, Massachusetts Institute of Technology, Cambridge; Eddy Chen, Massachusetts General Hospital Cancer Center; Harvard Medical School; Aymen A. Elfiky, Dana-Farber Cancer Institute; Brigham and Women's Hospital; Harvard Medical School, Boston, MA
|
43
|
Alanni R, Hou J, Azzawi H, Xiang Y. Cancer adjuvant chemotherapy prediction model for non‐small cell lung cancer. IET Syst Biol 2019; 13:129-135. [DOI: 10.1049/iet-syb.2018.5060] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/21/2023]
Affiliation(s)
- Russul Alanni
- School of Information Technology, Deakin University, Burwood, Australia
- Jingyu Hou
- School of Information Technology, Deakin University, Burwood, Australia
- Hasseeb Azzawi
- School of Information Technology, Deakin University, Burwood, Australia
- Yong Xiang
- School of Information Technology, Deakin University, Burwood, Australia
|
44
|
Chung D, Zhang K, Yang J. Method for Identifying Cancer-Related Genes Using Gene Similarity-Based Collaborative Filtering. J Comput Biol 2019; 26:875-881. [PMID: 31120387 DOI: 10.1089/cmb.2018.0115] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
The aim of this study is to diagnose the stage of renal cell carcinoma and to predict the prognosis of breast cancer by using RNA sequencing and microarray data, which are representative gene expression data. To identify biomarkers for prediction, the top-N genes of each class of cancer or noncancer are recommended by a collaborative filtering method based on three gene similarity coefficients. We then construct a machine learning model for classification using the union of the recommended genes as the final feature set. The optimal genetic markers were used to identify the set with the highest classification performance in the model. Experiments conducted by the proposed method showed higher performance than those conducted by the machine learning model using all the gene features without performing feature selection. In addition, it showed better performance than other studies based on existing correlation-based feature selection.
Affiliation(s)
- Dahye Chung
- Department of Computer Science and Engineering, Sogang University, Seoul, Korea
- Kaiyuan Zhang
- Department of Computer Science and Engineering, Sogang University, Seoul, Korea
- Jihoon Yang
- Department of Computer Science and Engineering, Sogang University, Seoul, Korea
|
45
|
Bartholomai JA, Frieboes HB. Lung Cancer Survival Prediction via Machine Learning Regression, Classification, and Statistical Techniques. PROCEEDINGS OF THE ... IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY. IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY 2019; 2018:632-637. [PMID: 31312809 DOI: 10.1109/isspit.2018.8642753] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 11/10/2022]
Abstract
A regression model is developed to predict survival time in months for lung cancer patients. It was previously shown that predictive models perform accurately for short survival times of less than 6 months; however, model accuracy is reduced when attempting to predict longer survival times. This study employs an approach for which regression models are used in combination with a classification model to predict survival time. A set of de-identified lung cancer patient data was obtained from the Surveillance, Epidemiology, and End Results (SEER) database. The models use a subset of factors selected by ANOVA. Model accuracy is measured by a confusion matrix for classification and by Root Mean Square Error (RMSE) for regression. Random Forests are used for classification, while general Linear Regression, Gradient Boosted Machines (GBM), and Random Forests are used for regression. The regression results show that RF had the best performance for survival times ≤6 and >24 months (RMSE 10.52 and 20.51, respectively), while GBM performed best for 7-24 months (RMSE 15.65). Comparison plots of the results further indicate that the regression models perform better for shorter survival times than the RMSE values are able to reflect.
|
46
|
A survey of neural network-based cancer prediction models from microarray data. Artif Intell Med 2019; 97:204-214. [PMID: 30797633 DOI: 10.1016/j.artmed.2019.01.006] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 10/01/2017] [Revised: 10/22/2018] [Accepted: 01/27/2019] [Indexed: 12/17/2022]
Abstract
Neural networks are powerful tools used widely for building cancer prediction models from microarray data. We review the most recently proposed models to highlight the roles of neural networks in predicting cancer from gene expression data. We identified articles published between 2013 and 2018 in scientific databases using keywords such as cancer classification, cancer analysis, cancer prediction, cancer clustering and microarray data. Analyzing the studies reveals that neural network methods have been used either for filtering (data engineering) the gene expressions in a prior step to prediction; for predicting the existence of cancer, cancer type or the survivability risk; or for clustering unlabeled samples. This paper also discusses some practical issues that can be considered when building a neural network-based cancer prediction model. Results indicate that the functionality of the neural network determines its general architecture. However, the decision on the number of hidden layers, neurons, hyperparameters and learning algorithm is made using trial-and-error techniques.
|
47
|
Boon IS, Au Yong TPT, Boon CS. Assessing the Role of Artificial Intelligence (AI) in Clinical Oncology: Utility of Machine Learning in Radiotherapy Target Volume Delineation. MEDICINES (BASEL, SWITZERLAND) 2018; 5:E131. [PMID: 30544901 PMCID: PMC6313566 DOI: 10.3390/medicines5040131] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 11/30/2018] [Revised: 12/04/2018] [Accepted: 12/07/2018] [Indexed: 12/16/2022]
Abstract
The fields of radiotherapy and clinical oncology have been rapidly changed by the advances of technology. Improvement in computer processing power and imaging quality heralded precision radiotherapy, allowing radiotherapy to be delivered efficiently, safely and effectively for patient benefit. Artificial intelligence (AI) is an emerging field of computer science which uses computer models and algorithms to replicate human-like intelligence and perform specific tasks, and which offers huge potential to healthcare. We reviewed and presented the history, evolution and advancement in the fields of radiotherapy, clinical oncology and machine learning. Radiotherapy target delineation is a complex task of outlining tumour and organ-at-risk volumes to allow accurate delivery of radiotherapy. We discussed radiotherapy planning and treatment delivery and reviewed how technology can help with this challenging process. We explored the evidence and clinical application of machine learning to radiotherapy. We concluded on the challenges, possible future directions and potential collaborations to achieve better outcomes for cancer patients.
Affiliation(s)
- Ian S Boon
- Department of Clinical Oncology, Leeds Cancer Centre, St James's Institute of Oncology, Leeds Teaching Hospitals NHS Trust, Leeds LS9 7TF, UK
- Tracy P T Au Yong
- Department of Radiology, Worcestershire Acute Hospitals NHS Trust, Worcester WR5 1DD, UK
- Cheng S Boon
- Worcestershire Oncology Centre, Worcestershire Acute Hospitals NHS Trust, Worcester WR5 1DD, UK
|
48
|
Kong Y, Yu T. A Deep Neural Network Model using Random Forest to Extract Feature Representation for Gene Expression Data Classification. Sci Rep 2018; 8:16477. [PMID: 30405137 PMCID: PMC6220289 DOI: 10.1038/s41598-018-34833-6] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 04/06/2018] [Accepted: 10/06/2018] [Indexed: 01/10/2023]
Abstract
In predictive model development, gene expression data is associated with the unique challenge that the number of samples (n) is much smaller than the number of features (p). This "n ≪ p" property has prevented classification of gene expression data from benefiting from deep learning techniques, which have proved powerful under "n > p" scenarios in other application fields, such as image classification. Further, the sparsity of effective features with unknown correlation structures in gene expression profiles brings more challenges for classification tasks. To tackle these problems, we propose a newly developed classifier named Forest Deep Neural Network (fDNN), to integrate the deep neural network architecture with a supervised forest feature detector. Using this built-in feature detector, the method is able to learn sparse feature representations and feed the representations into a neural network to mitigate the overfitting problem. Simulation experiments and real data analyses using two RNA-seq expression datasets are conducted to evaluate fDNN's capability. The method is demonstrated to be a useful addition to current predictive models, with better classification performance and more meaningful selected features compared to ordinary random forests and deep neural networks.
Affiliation(s)
- Yunchuan Kong
- Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Rd, Atlanta, GA, 30322, USA
- Tianwei Yu
- Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Rd, Atlanta, GA, 30322, USA
|
49
|
Kong Y, Yu T. A graph-embedded deep feedforward network for disease outcome classification and feature selection using gene expression data. Bioinformatics 2018; 34:3727-3737. [PMID: 29850911 PMCID: PMC6198851 DOI: 10.1093/bioinformatics/bty429] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Received: 01/24/2018] [Revised: 04/30/2018] [Accepted: 05/23/2018] [Indexed: 12/16/2022]
Abstract
Motivation: Gene expression data represents a unique challenge in predictive model building, because of the small number of samples (n) compared with the huge number of features (p). This 'n≪p' property has hampered application of deep learning techniques for disease outcome classification. Sparse learning by incorporating external gene network information could be a potential solution to this issue. Still, the problem is very challenging because (i) there are tens of thousands of features and only hundreds of training samples, and (ii) the scale-free structure of the gene network is unfriendly to the setup of convolutional neural networks. Results: To address these issues and build a robust classification model, we propose the Graph-Embedded Deep Feedforward Networks (GEDFN), to integrate external relational information of features into the deep neural network architecture. The method is able to achieve sparse connection between network layers to prevent overfitting. To validate the method's capability, we conducted both simulation experiments and real data analysis using a breast invasive carcinoma RNA-seq dataset and a kidney renal clear cell carcinoma RNA-seq dataset from The Cancer Genome Atlas. The resulting high classification accuracy and easily interpretable feature selection results suggest the method is a useful addition to the current graph-guided classification models and feature selection procedures. Availability and implementation: The method is available at https://github.com/yunchuankong/GEDFN. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yunchuan Kong
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, USA
- Tianwei Yu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, USA
|
50
|
Identifying a miRNA signature for predicting the stage of breast cancer. Sci Rep 2018; 8:16138. [PMID: 30382159 PMCID: PMC6208346 DOI: 10.1038/s41598-018-34604-3] [Citation(s) in RCA: 79] [Impact Index Per Article: 13.2] [Received: 06/15/2018] [Accepted: 10/12/2018] [Indexed: 12/13/2022]
Abstract
Breast cancer is a heterogeneous disease and one of the most common cancers among women. Recently, microRNAs (miRNAs) have been used as biomarkers due to their effective role in cancer diagnosis. This study proposes a support vector machine (SVM)-based classifier SVM-BRC to categorize patients with breast cancer into early and advanced stages. SVM-BRC uses an optimal feature selection method, inheritable bi-objective combinatorial genetic algorithm, to identify a miRNA signature which is a small set of informative miRNAs while maximizing prediction accuracy. MiRNA expression profiles of a 386-patient cohort of breast cancer were retrieved from The Cancer Genome Atlas. SVM-BRC identified 34 of 503 miRNAs as a signature and achieved a 10-fold cross-validation mean accuracy, sensitivity, specificity, and Matthews correlation coefficient of 80.38%, 0.79, 0.81, and 0.60, respectively. Functional enrichment of the 10 highest ranked miRNAs was analysed in terms of Kyoto Encyclopedia of Genes and Genomes and Gene Ontology annotations. Kaplan-Meier survival analysis of the highest ranked miRNAs revealed that four miRNAs, hsa-miR-503, hsa-miR-1307, hsa-miR-212 and hsa-miR-592, were significantly associated with the prognosis of patients with breast cancer.
|