1. Hachache R, Yahyaouy A, Riffi J, Tairi H, Abibou S, Adoui ME, Benjelloun M. Advancing personalized oncology: a systematic review on the integration of artificial intelligence in monitoring neoadjuvant treatment for breast cancer patients. BMC Cancer 2024; 24:1300. [PMID: 39434042 PMCID: PMC11495077 DOI: 10.1186/s12885-024-13049-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Accepted: 10/08/2024] [Indexed: 10/23/2024]
Abstract
PURPOSE Despite suffering from the same disease, each patient exhibits a distinct microbiological profile and variable reactivity to prescribed treatments. Most doctors typically use a standardized treatment approach for all patients suffering from a specific disease. Consequently, the challenge lies in the effectiveness of this standardized treatment and in adapting it to each individual patient. Personalized medicine is an emerging field in which doctors use diagnostic tests to identify the most effective medical treatments for each patient. Prognosis, disease monitoring, and treatment planning rely on manual, error-prone methods. Artificial intelligence (AI) uses predictive techniques capable of automating prognostic and monitoring processes, thus reducing the error rate associated with conventional methods. METHODS This paper conducts an analysis of current literature, encompassing the period from January 2015 to 2023, based on Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). RESULTS In assessing 25 pertinent studies concerning predicting neoadjuvant treatment (NAT) response in breast cancer (BC) patients, the studies explored various imaging modalities (Magnetic Resonance Imaging, Ultrasound, etc.), evaluating results based on accuracy, sensitivity, and area under the curve. Additionally, the technologies employed, such as machine learning (ML), deep learning (DL), statistics, and hybrid models, were scrutinized. The presentation of datasets used for predicting complete pathological response (PCR) was also considered. CONCLUSION This paper seeks to unveil crucial insights into the application of AI techniques in personalized oncology, particularly in the monitoring and prediction of responses to NAT for BC patients. Finally, the authors suggest avenues for future research into AI-based monitoring systems.
Affiliation(s)
- Rachida Hachache
  - Department of Computer Sciences, LISAC Laboratory, Sidi Mohammed Ben Abdellah University, Fez, Morocco
- Ali Yahyaouy
  - Department of Computer Sciences, LISAC Laboratory, Sidi Mohammed Ben Abdellah University, Fez, Morocco
  - USPN, La Maison Des Sciences Numériques, Paris, France
- Jamal Riffi
  - Department of Computer Sciences, LISAC Laboratory, Sidi Mohammed Ben Abdellah University, Fez, Morocco
- Hamid Tairi
  - Department of Computer Sciences, LISAC Laboratory, Sidi Mohammed Ben Abdellah University, Fez, Morocco
- Soukayna Abibou
  - Department of Computer Sciences, LISAC Laboratory, Sidi Mohammed Ben Abdellah University, Fez, Morocco
- Mohammed El Adoui
  - Computer Science Unit, Faculty of Engineering, University of Mons, Place du Parc, 20, Mons, 7000, Belgium
- Mohammed Benjelloun
  - Computer Science Unit, Faculty of Engineering, University of Mons, Place du Parc, 20, Mons, 7000, Belgium
2. Alelyani T, Alshammari MM, Almuhanna A, Asan O. Explainable Artificial Intelligence in Quantifying Breast Cancer Factors: Saudi Arabia Context. Healthcare (Basel) 2024; 12:1025. [PMID: 38786433 PMCID: PMC11120946 DOI: 10.3390/healthcare12101025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 05/11/2024] [Accepted: 05/13/2024] [Indexed: 05/25/2024]
Abstract
Breast cancer represents a significant health concern, particularly in Saudi Arabia, where it ranks as the most prevalent cancer type among women. This study focuses on leveraging eXplainable Artificial Intelligence (XAI) techniques to predict benign and malignant breast cancer cases using various clinical and pathological features specific to Saudi Arabian patients. Six distinct models were trained and evaluated based on common performance metrics such as accuracy, precision, recall, F1 score, and AUC-ROC score. To enhance interpretability, Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) were applied. The analysis identified the Random Forest model as the top performer, achieving an accuracy of 0.72, along with robust precision, recall, F1 score, and AUC-ROC score values. Conversely, the Support Vector Machine model exhibited the poorest performance metrics, indicating its limited predictive capability. Notably, the XAI approaches unveiled variations in the feature importance rankings across models, underscoring the need for further investigation. These findings offer valuable insights into breast cancer diagnosis and machine learning interpretation, aiding healthcare providers in understanding and potentially integrating such technologies into clinical practices.
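The threshold-based metrics named in this abstract (accuracy, precision, recall, F1 score) all derive from a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the study's code; the function name `binary_metrics` and the example labels are hypothetical, and AUC-ROC is left out because it needs predicted scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical labels (1 = malignant, 0 = benign), for illustration only.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    preds = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(truth, preds))
```

Because F1 is a harmonic mean, it drops sharply when either precision or recall is low, which is why it is often reported alongside plain accuracy on imbalanced data.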
Affiliation(s)
- Turki Alelyani
  - Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran 1988, Saudi Arabia
- Maha M. Alshammari
  - Department of Environmental Health, Institute for Research and Medical Consultations, Imam Abdulrahman Bin Faisal University, Dammam 31441, Saudi Arabia
- Afnan Almuhanna
  - Department of Radiology, College of Medicine, Imam Abdulrahman Bin Faisal University, Dammam 31441, Saudi Arabia
- Onur Asan
  - School of Systems and Enterprises, Stevens Institute of Technology, Hoboken, NJ 07030, USA
3. Castro GA, Almeida JM, Machado-Neto JA, Almeida TA. A decision support system to recommend appropriate therapy protocol for AML patients. Front Artif Intell 2024; 7:1343447. [PMID: 38510471 PMCID: PMC10950921 DOI: 10.3389/frai.2024.1343447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 02/19/2024] [Indexed: 03/22/2024]
Abstract
Introduction Acute Myeloid Leukemia (AML) is one of the most aggressive hematological neoplasms, emphasizing the critical need for early detection and strategic treatment planning. The association between prompt intervention and enhanced patient survival rates underscores the pivotal role of therapy decisions. To determine the treatment protocol, specialists heavily rely on prognostic predictions that consider the response to treatment and clinical outcomes. The existing risk classification system categorizes patients into favorable, intermediate, and adverse groups, forming the basis for personalized therapeutic choices. However, accurately assessing the intermediate-risk group poses significant challenges, potentially resulting in treatment delays and deterioration of patient conditions. Methods This study introduces a decision support system leveraging cutting-edge machine learning techniques to address these issues. The system automatically recommends tailored oncology therapy protocols based on outcome predictions. Results The proposed approach achieved a high performance close to 0.9 in F1-Score and AUC. The model generated with gene expression data exhibited superior performance. Discussion Our system can effectively support specialists in making well-informed decisions regarding the most suitable and safe therapy for individual patients. The proposed decision support system has the potential to not only streamline treatment initiation but also contribute to prolonged survival and improved quality of life for individuals diagnosed with AML. This marks a significant stride toward optimizing therapeutic interventions and patient outcomes.
Affiliation(s)
- Giovanna A. Castro
  - Department of Computer Science, Federal University of São Carlos (UFSCar) Sorocaba, São Paulo, Brazil
- Jade M. Almeida
  - Department of Computer Science, Federal University of São Carlos (UFSCar) Sorocaba, São Paulo, Brazil
- João A. Machado-Neto
  - Institute of Biomedical Sciences, The University of São Paulo (USP), São Paulo, Brazil
- Tiago A. Almeida
  - Department of Computer Science, Federal University of São Carlos (UFSCar) Sorocaba, São Paulo, Brazil
4. Li S, Yi H, Leng Q, Wu Y, Mao Y. New perspectives on cancer clinical research in the era of big data and machine learning. Surg Oncol 2024; 52:102009. [PMID: 38215544 DOI: 10.1016/j.suronc.2023.102009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 10/16/2023] [Indexed: 01/14/2024]
Abstract
In the 21st century, the development of medical science has entered the era of big data, and machine learning has become an essential tool for mining medical big data. The establishment of the SEER database has provided a wealth of epidemiological data for cancer clinical research, and the number of studies based on SEER and machine learning has been growing in recent years. This article reviews recent research based on SEER and machine learning and finds that the current focus of such studies is primarily on the development and validation of models using machine learning algorithms, with the main directions being lymph node metastasis prediction, distant metastasis prediction, and prognosis-related research. Compared to traditional models, machine learning algorithms have the advantage of stronger adaptability, but also suffer from disadvantages such as overfitting and poor interpretability, which need to be weighed in practical applications. At present, machine learning algorithms, as the foundation of artificial intelligence, have just begun to emerge in the field of cancer clinical research. The future development of oncology will enter a more precise era of cancer research, characterized by larger data, higher dimensions, and more frequent information exchange. Machine learning is bound to shine brightly in this field.
Affiliation(s)
- Shujun Li
  - Department of Hematology, Xiangya Hospital, Central South University, Changsha, 410008, China; National Clinical Research Center for Geriatric Diseases (Xiangya Hospital), China; Hunan Hematology Oncology Clinical Medical Research Center, China
- Hang Yi
  - Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Qihao Leng
  - Xiangya School of Medicine, Central South University, Changsha, 410013, Hunan Province, China
- You Wu
  - Institute for Hospital Management, School of Medicine, Tsinghua University, 30 Shuangqing Rd, Haidian District, Beijing, China; Department of Health Policy and Management, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, 21205, USA
- Yousheng Mao
  - Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
5. Zarean Shahraki S, Azizmohammad Looha M, Mohammadi kazaj P, Aria M, Akbari A, Emami H, Asadi F, Akbari ME. Time-related survival prediction in molecular subtypes of breast cancer using time-to-event deep-learning-based models. Front Oncol 2023; 13:1147604. [PMID: 37342184 PMCID: PMC10277681 DOI: 10.3389/fonc.2023.1147604] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/18/2023] [Accepted: 05/19/2023] [Indexed: 06/22/2023]
Abstract
Background Breast cancer (BC) survival prediction can be a helpful tool for identifying important factors, selecting effective treatment, and reducing mortality rates. This study aims to predict the time-related survival probability of BC patients in different molecular subtypes over 30 years of follow-up. Materials and methods This study retrospectively analyzed 3580 patients diagnosed with invasive breast cancer (BC) from 1991 to 2021 in the Cancer Research Center of Shahid Beheshti University of Medical Science. The dataset contained 18 predictor variables and two dependent variables, which referred to the survival status of patients and the time patients survived from diagnosis. Feature importance analysis was performed using the random forest algorithm to identify significant prognostic factors. Time-to-event deep-learning-based models, including Nnet-survival, DeepHit, DeepSurv, N-MTLR and Cox-time, were developed using a grid search approach with all variables initially and then with only the most important variables selected from feature importance. The performance metrics used to determine the best-performing model were C-index and IBS. Additionally, the dataset was clustered based on molecular receptor status (i.e., luminal A, luminal B, HER2-enriched, and triple-negative), and the best-performing prediction model was used to estimate survival probability for each molecular subtype. Results The random forest method identified tumor state, age at diagnosis, and lymph node status as the best subset of variables for predicting breast cancer (BC) survival probabilities. All models yielded very close performance, with Nnet-survival (C-index=0.77, IBS=0.13) slightly higher using all 18 variables or the three most important variables. The results showed that the luminal A subtype had the highest predicted BC survival probabilities, while triple-negative and HER2-enriched had the lowest predicted survival probabilities over time. Additionally, the luminal B subtype followed a similar trend to luminal A for the first five years, after which the predicted survival probability decreased steadily in 10- and 15-year intervals. Conclusion This study provides valuable insight into the survival probability of patients based on their molecular receptor status, particularly for HER2-positive patients. This information can be used by healthcare providers to make informed decisions regarding the appropriateness of medical interventions for high-risk patients. Future clinical trials should further explore the response of different molecular subtypes to treatment in order to optimize the efficacy of breast cancer treatments.
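The C-index used above to rank the survival models measures how often a model assigns the higher risk to the patient who fails earlier. A minimal sketch of Harrell's concordance index follows, assuming right-censored data and a scalar risk score per patient; the function name and the toy inputs are hypothetical and are not taken from the study, which may have used a time-dependent variant.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times       -- observed follow-up time per patient
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            # Pair (i, j) is comparable when i had the event strictly before
            # j's observed time, so we know i failed first.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data for illustration only.
    t = [2, 4, 6, 8]
    e = [1, 1, 0, 1]
    print(concordance_index(t, e, [0.9, 0.1, 0.5, 0.7]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported C-index of 0.77 should be read.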
Affiliation(s)
- Saba Zarean Shahraki
  - Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mehdi Azizmohammad Looha
  - Basic and Molecular Epidemiology of Gastrointestinal Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Pooya Mohammadi kazaj
  - Geographic Information Systems Department, Faculty of Geodesy and Geomatics Engineering, K. N. Toosi University of Technology, Tehran, Iran
- Mehrad Aria
  - Faculty of Information Technology and Computer Engineering, Azarbaijan Shahid Madani University, Tehran, Iran
- Atieh Akbari
  - Cancer Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hassan Emami
  - Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Farkhondeh Asadi
  - Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
6. Huang Y, Zhang R, Li H, Xia Y, Yu X, Liu S, Yang Y. A multi-label learning prediction model for heart failure in patients with atrial fibrillation based on expert knowledge of disease duration. APPL INTELL 2023. [DOI: 10.1007/s10489-023-04487-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
7. Wu R, Luo J, Wan H, Zhang H, Yuan Y, Hu H, Feng J, Wen J, Wang Y, Li J, Liang Q, Gan F, Zhang G. Evaluation of machine learning algorithms for the prognosis of breast cancer from the Surveillance, Epidemiology, and End Results database. PLoS One 2023; 18:e0280340. [PMID: 36701415 PMCID: PMC9879508 DOI: 10.1371/journal.pone.0280340] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2022] [Accepted: 12/26/2022] [Indexed: 01/27/2023]
Abstract
INTRODUCTION Many researchers have used machine learning (ML) to predict the prognosis of breast cancer (BC) patients and noticed that the ML model had good individualized prediction performance. OBJECTIVE The cohort study was intended to establish a reliable data analysis model by comparing the performance of 10 common ML algorithms and the traditional American Joint Committee on Cancer (AJCC) stage, and to use this model in Web application development to provide a good individualized prediction for others. METHODS This study included 63145 BC patients from the Surveillance, Epidemiology, and End Results database. RESULTS Through the performance of the 10 ML algorithms and 7th AJCC stage in the optimal test set, we found that in terms of 5-year overall survival, multivariate adaptive regression splines (MARS) had the highest area under the curve (AUC) value (0.831) and F1-score (0.608), and both sensitivity (0.737) and specificity (0.772) were relatively high. Besides, MARS showed the highest AUC value (0.831, 95% confidence interval: 0.820-0.842) in comparison to the other ML algorithms and 7th AJCC stage (all P < 0.05). MARS, the best performing model, was selected for web application development (https://w12251393.shinyapps.io/app2/). CONCLUSIONS The comparative study of multiple forecasting models utilizing a large dataset noted that the MARS-based model achieved a much better performance compared to other ML algorithms and 7th AJCC stage in individualized estimation of survival of BC patients, which was very likely to be the next step towards precision medicine.
Affiliation(s)
- Ruiyang Wu
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Jing Luo
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Hangyu Wan
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Haiyan Zhang
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Yewei Yuan
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Huihua Hu
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Jinyan Feng
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Jing Wen
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Yan Wang
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Junyan Li
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Qi Liang
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Fengjiao Gan
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
- Gang Zhang
  - Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children’s Hospital of Chengdu Medical College), Chengdu, China
8. Ji W, Xue M, Zhang Y, Yao H, Wang Y. A Machine Learning Based Framework to Identify and Classify Non-alcoholic Fatty Liver Disease in a Large-Scale Population. Front Public Health 2022; 10:846118. [PMID: 35444985 PMCID: PMC9013842 DOI: 10.3389/fpubh.2022.846118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/30/2021] [Accepted: 02/23/2022] [Indexed: 12/12/2022]
Abstract
Non-alcoholic fatty liver disease (NAFLD) is a common and serious health problem worldwide that lacks efficient medical treatment. We aimed to develop and validate machine learning (ML) models that could be used for accurate screening of large numbers of people. This study included 304,145 adults who took part in the national physical examination and used their questionnaire and physical measurement parameters as the models' candidate covariates. The least absolute shrinkage and selection operator (LASSO) was used for feature selection from the candidate covariates, then four ML algorithms were used to build the screening model for NAFLD, and the classifier with the best performance was used to output the importance score of each covariate in NAFLD. Among the four ML algorithms, XGBoost achieved the best performance (accuracy = 0.880, precision = 0.801, recall = 0.894, F-1 = 0.882, and AUC = 0.951), and the importance ranking of covariates was, in order, BMI, age, waist circumference, gender, type 2 diabetes, gallbladder disease, smoking, hypertension, dietary status, physical activity, oil-loving and salt-loving. ML classifiers could help medical agencies achieve the early identification and classification of NAFLD, which is particularly useful for economically disadvantaged areas, and the covariates' importance ranking will be helpful for the prevention and treatment of NAFLD.
Affiliation(s)
- Weidong Ji
  - Department of Medical Information, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Mingyue Xue
  - Hospital of Traditional Chinese Medicine Affiliated to the Fourth Clinical Medical College of Xinjiang Medical University, Urumqi, China
- Yushan Zhang
  - Department of Maternal and Child Health, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Hua Yao
  - Center of Health Management, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
- Yushan Wang
  - Center of Health Management, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
  - Correspondence: Yushan Wang
9. Chen X, Zhang C, Guo D, Wang Y, Hu J, Hu J, Wang S, Liu X. Distant metastasis and prognostic factors in patients with invasive ductal carcinoma of the breast. Eur J Clin Invest 2022; 52:e13704. [PMID: 34725819 DOI: 10.1111/eci.13704] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/16/2021] [Revised: 10/28/2021] [Accepted: 10/31/2021] [Indexed: 11/30/2022]
Abstract
OBJECTIVE To explore the risk factors and prognostic factors of invasive ductal carcinoma (IDC) and to predict the survival of IDC patients with metastasis. METHOD We used multivariate logistic regression to identify independent risk factors affecting metastasis in IDC patients and used Cox regression to identify independent prognostic factors affecting the overall survival of patients with metastasis. A nomogram was used to predict survival, while the C-index and calibration curves were used to measure the performance of the nomogram. The Kaplan-Meier method was used to calculate the survival curves of patients with different independent prognostic factors and different metastatic sites, and the differences were compared by log-rank test. The data of our study were obtained from the Surveillance, Epidemiology and End Results cancer registry. RESULT Our study included 226,094 patients with IDC. In multivariate analysis, independent risk factors of metastasis included age, race, marital status, income, geographic region, grade, T stage, N stage, subtype, surgery and radiotherapy. Independent prognostic factors included age, race, marital status, income, geographic region, grade, T stage, N stage, subtype, surgery and chemotherapy. We established a nomogram, of which the C-index was 0.701 (0.693, 0.709), with the calibration curves showing good consistency between observed and predicted disease-specific survival. The survival curves of different metastatic patterns were significantly different (log-rank test: χ2 = 18784, p < 0.001; χ2 = 47.1, p < 0.001; χ2 = 20, p < 0.001). CONCLUSION The nomogram we established may provide risk assessment and survival prediction for IDC patients with metastasis, which can be used for clinical decision-making and reference.
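The Kaplan-Meier method mentioned in this abstract estimates a survival curve as a running product of conditional survival fractions at each event time. A minimal sketch of the product-limit estimator is given below, under the assumption of right-censored data; the function name `kaplan_meier` and the toy inputs are hypothetical, and the study itself presumably relied on a statistics package rather than code like this.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- observed follow-up time per patient
    events -- 1 if the event was observed at that time, 0 if censored
    Returns a list of (event_time, survival_probability) steps.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the fraction of at-risk patients surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Toy follow-up data for illustration only.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a step, which is what lets the estimator use incomplete follow-up without bias under non-informative censoring.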
Affiliation(s)
- Xiaofei Chen
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin, China
- Chenyang Zhang
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin, China
- Dingjie Guo
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin, China
- Yashan Wang
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin, China
- Junjun Hu
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin, China
- Jiayi Hu
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin, China
- Song Wang
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin, China
- Xin Liu
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin, China
10. Application of Deep Learning to Construct Breast Cancer Diagnosis Model. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12041957] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/04/2022]
Abstract
(1) Background: According to Taiwan’s ministry of health statistics, the rate of breast cancer in women is increasing annually. Each year, more than 10,000 women suffer from breast cancer, and over 2000 die of the disease. The mortality rate is increasing annually, but if breast cancer tumors are detected earlier and appropriate treatment is provided immediately, the survival rate of patients will increase enormously. (2) Methods: This research aimed to develop a stepwise breast cancer model architecture to improve diagnostic accuracy and reduce the misdiagnosis rate of breast cancer. In the first stage, a breast cancer risk factor dataset was utilized. After pre-processing, an Artificial Neural Network (ANN) and a support vector machine (SVM) were applied to the dataset to classify breast cancer tumors and compare their performances. The ANN achieved 76.6% classification accuracy, and the SVM using radial functions achieved the best classification accuracy of 91.6%. Therefore, SVM was utilized in the determination of results concerning the relevant breast cancer risk factors. In the second stage, we trained AlexNet, ResNet101, and InceptionV3 networks using transfer learning. The networks were studied using Adaptive Moment Estimation (ADAM) and Stochastic Gradient Descent with Momentum (SGDM) based optimization algorithms to diagnose benign and malignant tumors, and the results were evaluated. (3) Results: According to the results, AlexNet obtained 81.16%, ResNet101 85.51%, and InceptionV3 achieved a remarkable accuracy of 91.3%. The results of the three models were utilized in establishing a voting combination, and the soft-voting method was applied to average the prediction results, for which a test accuracy of 94.20% was obtained. (4) Conclusions: Despite the small number of images in this study, the accuracy is higher compared to other literature. The proposed method has demonstrated the need for an additional productive tool in clinical settings when radiologists are evaluating mammography images of patients.
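The soft-voting step described above averages the class probabilities produced by several networks and then picks the class with the highest mean. A minimal sketch follows, assuming each model outputs a probability vector over the same ordered classes and that all models are weighted equally; the function name and the probabilities are hypothetical and do not come from the study.

```python
def soft_vote(prob_vectors):
    """Average per-class probabilities across models.

    prob_vectors -- one probability vector per model, all of equal length
    Returns (mean_probabilities, index_of_winning_class).
    """
    if not prob_vectors:
        raise ValueError("at least one model output is required")
    n_classes = len(prob_vectors[0])
    if any(len(p) != n_classes for p in prob_vectors):
        raise ValueError("all models must score the same classes")
    n_models = len(prob_vectors)
    mean = [sum(p[c] for p in prob_vectors) / n_models for c in range(n_classes)]
    winner = max(range(n_classes), key=mean.__getitem__)
    return mean, winner


if __name__ == "__main__":
    # Hypothetical outputs for one image, ordered as [benign, malignant].
    outputs = [
        [0.90, 0.10],  # stand-in for the first network
        [0.40, 0.60],  # stand-in for the second network
        [0.45, 0.55],  # stand-in for the third network
    ]
    mean, winner = soft_vote(outputs)
    print(mean, winner)
```

In this toy case two of the three models lean towards class 1, so a majority (hard) vote would pick class 1, while soft voting picks class 0 because the first model is far more confident. That sensitivity to confidence is the main design difference between the two schemes.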
11. Kaur I, Doja M, Ahmad T. Data Mining and Machine Learning in Cancer Survival Research: An Overview and Future Recommendations. J Biomed Inform 2022; 128:104026. [DOI: 10.1016/j.jbi.2022.104026] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/13/2021] [Revised: 02/07/2022] [Accepted: 02/09/2022] [Indexed: 12/29/2022]
12. Dag AZ, Akcam Z, Kibis E, Simsek S, Delen D. A probabilistic data analytics methodology based on Bayesian belief network for predicting and understanding breast cancer survival. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
13. An Integrated Approach for Cancer Survival Prediction Using Data Mining Techniques. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2021:6342226. [PMID: 34992648 PMCID: PMC8727098 DOI: 10.1155/2021/6342226] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/04/2021] [Accepted: 11/27/2021] [Indexed: 12/31/2022]
Abstract
Ovarian cancer is the third most common gynecologic cancer worldwide. Advanced ovarian cancer patients bear a significant mortality rate. Survival estimation is essential for clinicians and patients to better understand and tolerate future outcomes. The present study intends to investigate different survival predictors available for cancer prognosis using data mining techniques. A dataset of 140 advanced ovarian cancer patients containing data from different data profiles (clinical, treatment, and overall life quality) was collected and used to predict cancer patients' survival. Attributes from each data profile were processed accordingly. Clinical data were prepared by handling missing values and outliers. Treatment data covering varying time periods were created using sequence mining techniques to identify the treatments given to the patients. Lastly, different comorbidities were combined into a single factor by computing the Charlson Comorbidity Index for each patient. After appropriate preprocessing, the integrated dataset was classified using appropriate machine learning algorithms. The proposed integrated model approach gave the highest accuracy of 76.4% using an ensemble technique with sequential pattern mining including time intervals of 2 months between treatments. Thus, the treatment sequences and, most importantly, life quality attributes significantly contribute to the survival prediction of cancer patients.
|
14
|
Zhang T, Li X, Qu Z. Lesion attentive thoracic disease diagnosis with large decision margin loss. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2021.103202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
15
|
Min N, Wei Y, Zheng Y, Li X. Advancement of prognostic models in breast cancer: a narrative review. Gland Surg 2021; 10:2815-2831. [PMID: 34733730 DOI: 10.21037/gs-21-441] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/01/2021] [Accepted: 08/13/2021] [Indexed: 11/06/2022]
Abstract
Objective To provide a reference for clinical work and guide the decision-making of healthcare providers and end-users, we systematically reviewed the development, validation and classification of classical prognostic models for breast cancer. Background Patients suffering from breast cancer have different prognosis for its high heterogeneity. Accurate prognosis prediction and risk stratification for breast cancer are crucial for individualized treatment. There is a lack of systematic summary of breast cancer prognostic models. Methods We conducted a PubMed search with keywords "breast neoplasm", "prognostic model", "recurrence" and "metastasis", and screened the retrieved publications at three levels: title, abstract and full text. We identified the articles presented the development and/or validation of models based on clinicopathological factors, genomics, and machine learning (ML) methods to predict survival and/or benefits of adjuvant therapy in female breast cancer patients. Conclusions Combining prognostic-related variables with long-term clinical outcomes, researchers have developed a series of prognostic models based on clinicopathological parameters, genomic assays, and medical figures. The discrimination, calibration, overall performance, and clinical usefulness were validated by internal and/or external verifications. Clinicopathological models integrated the clinical parameters, including tumor size, histological grade, lymph node status, hormone receptor status to provide prognostic information for patients and doctors. Gene-expression assays deeply revealed the molecular heterogeneity of breast cancer, some of which have been cited by AJCC and National Comprehensive Cancer Network (NCCN) guidelines. In addition, the models based on the ML methods provided more detailed information for prognosis prediction by increasing the data dimension. 
Combined models incorporating clinical variables and genomics information are still required to be developed as the focus of further researches.
Affiliation(s)
- Ningning Min
- School of Medicine, Nankai University, Tianjin, China; Department of General Surgery, Chinese People's Liberation Army General Hospital, Beijing, China
- Yufan Wei
- School of Medicine, Nankai University, Tianjin, China; Department of General Surgery, Chinese People's Liberation Army General Hospital, Beijing, China
- Yiqiong Zheng
- Department of General Surgery, Chinese People's Liberation Army General Hospital, Beijing, China
- Xiru Li
- Department of General Surgery, Chinese People's Liberation Army General Hospital, Beijing, China
|
16
|
Parimbelli E, Wilk S, Cornet R, Sniatala P, Sniatala K, Glaser SLC, Fraterman I, Boekhout AH, Ottaviano M, Peleg M. A review of AI and Data Science support for cancer management. Artif Intell Med 2021; 117:102111. [PMID: 34127240 DOI: 10.1016/j.artmed.2021.102111] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/07/2020] [Revised: 12/23/2020] [Accepted: 05/11/2021] [Indexed: 02/09/2023]
Abstract
INTRODUCTION Thanks to improvement of care, cancer has become a chronic condition. But due to the toxicity of treatment, the importance of supporting the quality of life (QoL) of cancer patients increases. Monitoring and managing QoL relies on data collected by the patient in his/her home environment, its integration, and its analysis, which supports personalization of cancer management recommendations. We review the state-of-the-art of computerized systems that employ AI and Data Science methods to monitor the health status and provide support to cancer patients managed at home. OBJECTIVE Our main objective is to analyze the literature to identify open research challenges that a novel decision support system for cancer patients and clinicians will need to address, point to potential solutions, and provide a list of established best-practices to adopt. METHODS We designed a review study, in compliance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, analyzing studies retrieved from PubMed related to monitoring cancer patients in their home environments via sensors and self-reporting: what data is collected, what are the techniques used to collect data, semantically integrate it, infer the patient's state from it and deliver coaching/behavior change interventions. RESULTS Starting from an initial corpus of 819 unique articles, a total of 180 papers were considered in the full-text analysis and 109 were finally included in the review. Our findings are organized and presented in four main sub-topics consisting of data collection, data integration, predictive modeling and patient coaching. 
CONCLUSION Development of modern decision support systems for cancer needs to utilize best practices like the use of validated electronic questionnaires for quality-of-life assessment, adoption of appropriate information modeling standards supplemented by terminologies/ontologies, adherence to FAIR data principles, external validation, stratification of patients in subgroups for better predictive modeling, and adoption of formal behavior change theories. Open research challenges include supporting emotional and social dimensions of well-being, including PROs in predictive modeling, and providing better customization of behavioral interventions for the specific population of cancer patients.
Affiliation(s)
- S Wilk
- Poznan University of Technology, Poland
- R Cornet
- Amsterdam University Medical Centre, the Netherlands
- S L C Glaser
- Amsterdam University Medical Centre, the Netherlands
- I Fraterman
- Netherlands Cancer Institute, Amsterdam, the Netherlands
- A H Boekhout
- Netherlands Cancer Institute, Amsterdam, the Netherlands
|
17
|
A two-stage modeling approach for breast cancer survivability prediction. Int J Med Inform 2021; 149:104438. [PMID: 33730681 DOI: 10.1016/j.ijmedinf.2021.104438] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/21/2020] [Revised: 01/24/2021] [Accepted: 03/08/2021] [Indexed: 01/14/2023]
Abstract
BACKGROUND Despite the increasing number of studies in breast cancer survival prediction, there is little attention put toward deceased patients and their survival lengths. Moreover, developing a model that is both accurate and interpretable remains a challenge. OBJECTIVE This paper proposes a two-stage data analytic framework, where Stage I classifies the survival and deceased statuses and Stage II predicts the number of survival months for deceased females with cancer. Since medical data are not entirely clean nor prepared for model development, we aim to show that data preparation can strengthen a simple Generalized Linear Model (GLM)1 to predict as accurate as the complex models like Extreme Gradient Boosting (XGB)2 and Multilayer Perceptron based on Artificial Neural Networks (MLP-ANNs)3 in both stages. METHODS In Stage I, we use recent Surveillance, Epidemiology, and End Results (SEER)4 data from 2004 to 2016 to predict short term survival statuses from 6-months to 3-years with 6 month increments. Synthetic Minority Over-sampling Technique (SMOTE),5 Relocating Safe-Level SMOTE (RSLS)6, Adaptive Synthetic (ADASYN)7 re-sampling techniques, Least Absolute Shrinkage and Selection Operator (LASSO)8 and Random Forest (RF)9 feature selection methods along with integer and one-hot encoding are combined with the three popular data mining methods: GLM, XGB, and MLP. In Stage II, we predict the number of survival months for patients who are correctly predicted as deceased within 3-years. Again, we employ GLM, XGB, and MLP for regression along with LASSO and RF for feature selection and one-hot encoding to encode the categorical features. RESULTS We obtain Area Under the Receiver Operating Characteristic Curve (AUC)10 values of 0.900, 0.898, 0.877, 0.852, 0.852, and 0.858 for 6-month, 1-, 1.5-, 2-, 2.5, and 3-year survival time-points, respectively, using OneHotEncoding-GLM-LASSO-ADASYN. 
We use the change in the Odds Ratio values in GLM to manifest the impact of individual categorical levels and numerical features on the odds of death. In Stage II, we obtain Mean Absolute Error (MAE)11 of 7.960 months using OneHotEncoding-GLM-LASSO when predicting the number of survival months for deceased patients. We present the top contributing features and their coefficient values to illustrate how the presence of each feature alters the predicted number of survival months. CONCLUSION To the best of our knowledge, this is the first study that implements both breast cancer survival classification and regression in a two-stage approach. All data-driven findings are presented in order to assist clinicians make better care decisions using GLM, an interpretable and computationally efficient method that predicts survival status and survival lengths for deceased patients, to help foster human and machine interactions.
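The two-stage flow described in this abstract (classify survival status, then regress survival months only for cases predicted deceased) can be sketched as follows. This is a rough illustration under stated assumptions: the coefficients are invented, the models are plain linear/logistic forms standing in for the fitted GLMs, and `two_stage_predict`, `odds_ratio` and `mean_absolute_error` are hypothetical helper names, not the authors' code.

```python
import math


def dot(weights, features):
    return sum(w * x for w, x in zip(weights, features))


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic-regression coefficient."""
    return math.exp(coefficient)


def two_stage_predict(features, clf, reg, threshold=0.5):
    """Stage I: logistic classifier; Stage II: linear regressor.

    clf and reg are (weights, bias) pairs. Survival months are only
    estimated when Stage I predicts death.
    """
    clf_w, clf_b = clf
    p_death = 1.0 / (1.0 + math.exp(-(dot(clf_w, features) + clf_b)))
    if p_death < threshold:
        return "survived", None
    reg_w, reg_b = reg
    return "deceased", dot(reg_w, features) + reg_b


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    clf = ([2.0, -1.0], -1.0)   # made-up Stage I coefficients
    reg = ([-5.0, 3.0], 20.0)   # made-up Stage II coefficients
    print(two_stage_predict([1.0, 0.0], clf, reg))
    print(two_stage_predict([0.0, 1.0], clf, reg))
```

Exponentiating a Stage I coefficient gives the odds ratio for that feature, which is the quantity the abstract uses to explain each feature's effect on the odds of death.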
|
18
|
Lotfnezhad Afshar H, Jabbari N, Khalkhali HR, Esnaashari O. Prediction of Breast Cancer Survival by Machine Learning Methods: An Application of Multiple Imputation. IRANIAN JOURNAL OF PUBLIC HEALTH 2021; 50:598-605. [PMID: 34178808 PMCID: PMC8214598 DOI: 10.18502/ijph.v50i3.5606] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
Abstract
Background: Breast cancer survival rates in less developed countries are critically low. Machine learning techniques can predict cancer survival with high accuracy, but missing data are the main obstacle to using their full potential. Multiple imputation (MI) was implemented and analyzed in detail to impute the missing data of a breast cancer dataset. Methods: The dataset came from the Omid Treatment and Research Center, Urmia, Iran, covered Jan 2006 to Dec 2012, and had information from 856 women. Algorithms such as C5 and repeated incremental pruning to produce error reduction were applied to the imputed versions of the original dataset and to the non-imputed dataset to predict survival and extract clinical rules, respectively. Results: The performance of C5 improved after imputation on all evaluation criteria, including accuracy (84.42%), sensitivity (92.21%), specificity (64%), Kappa statistic (59.06%), and the area under the receiver operating characteristic (ROC) curve (0.84). Conclusion: The dataset of the present study met the requirements for using the multiple imputation method. The rules extracted after the application of MI were more comprehensive and contained more clinical knowledge. However, the clinical value of the extracted rules after filling in the missing data did not noticeably increase.
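The evaluation criteria listed in the results (accuracy, sensitivity, specificity, Kappa) all follow from a binary confusion matrix. A minimal sketch of those formulas is given below; the counts in the example are invented for illustration and are not the study's data, and `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and Cohen's kappa from counts."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Invented counts, for illustration only.
    for name, value in classification_metrics(40, 10, 5, 45).items():
        print(f"{name}: {value:.2f}")
```

Kappa corrects accuracy for chance agreement, which is why it can sit well below accuracy when one class dominates.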
Affiliation(s)
- Hadi Lotfnezhad Afshar
- Department of Health Information Technology, School of Paramedical, Urmia University of Medical Sciences, Urmia, Iran
- Nasrollah Jabbari
- Department of Medical Physics, Solid Tumor Research Center, School of Paramedical, Urmia University of Medical Sciences, Urmia, Iran
- Hamid Reza Khalkhali
- Department of Biostatistics and Epidemiology, Patient Safety Research Center, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
|
19
|
Chugh G, Kumar S, Singh N. Survey on Machine Learning and Deep Learning Applications in Breast Cancer Diagnosis. Cognit Comput 2021. [DOI: 10.1007/s12559-020-09813-6] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/23/2022]
|
20
|
Abstract
COVID-19 is a disease currently ravaging the world, bringing unprecedented health and economic challenges to several nations. There are presently close to five million reported cases in over 200 countries with fatalities numbering over 300,000 persons. This study presents machine-learning models for the prediction and visualization of the significant factors that determine the survivability of COVID-19 patients. This study develops prediction models using a decision tree, logistic regression (LR), gradient boosting, and LR algorithms to identify the significant factors and predict the survivability of COVID-19 patients. The results of the simulation showed that the LR model had the lowest prediction accuracy. The other three showed over 95% correct accuracy and indicated that the essential factors in determining patients' survivability were underlying health conditions and age. The findings of this study agreed with the medical claims that patients with underlying health challenges and those advanced in age are liable to have complications; hence, providing a research-based credence to this belief. This proposed model thus serves as a decision support system for the management of COVID-19 patients, as well as predicts a patient’s chances of survival at the first presentation at the hospitals.
|
21
|
Lu J, Wilfred P, Korbie D, Trau M. Regulation of Canonical Oncogenic Signaling Pathways in Cancer via DNA Methylation. Cancers (Basel) 2020; 12:E3199. [PMID: 33143142 PMCID: PMC7692324 DOI: 10.3390/cancers12113199] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/31/2020] [Revised: 10/24/2020] [Accepted: 10/28/2020] [Indexed: 02/07/2023] Open
Abstract
Disruption of signaling pathways that play a role in normal development and cellular homeostasis may lead to the dysregulation of cellular signaling and bring about the onset of different diseases, including cancer. In addition to genetic aberrations, DNA methylation also acts as an epigenetic modifier to drive the onset and progression of cancer by mediating the reversible transcription of related genes. Although the role of DNA methylation as an alternative driver of carcinogenesis has been well established, the global effects of DNA methylation on oncogenic signaling pathways and the presentation of cancer are only emerging. In this article, we introduce a differential methylation parsing pipeline (MethylMine) that mines for epigenetic biomarkers based on feature selection. This pipeline was used to mine for biomarkers that presented a substantial difference in methylation between the tumor and the matching normal tissue samples. Combined with the Data Integration Analysis for Biomarker discovery (DIABLO) framework for machine learning and multi-omic analysis, we revisited the TCGA DNA methylation and RNA-Seq datasets for breast, colorectal, lung, and prostate cancer, and identified differentially methylated genes within the NRF2-KEAP1/PI3K oncogenic pathway, which regulates the expression of cytoprotective genes, that serve as potential therapeutic targets to treat different cancers.
Affiliation(s)
- Jennifer Lu
- Centre for Personalised Nanomedicine, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia; (J.L.); (P.W.)
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia
- Premila Wilfred
- Centre for Personalised Nanomedicine, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia; (J.L.); (P.W.)
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia
- Darren Korbie
- Centre for Personalised Nanomedicine, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia; (J.L.); (P.W.)
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia
- Matt Trau
- Centre for Personalised Nanomedicine, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia; (J.L.); (P.W.)
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD 4072, Australia
- School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, QLD 4072, Australia
|
22
|
Kaur I, Doja MN, Ahmad T. Time-range based sequential mining for survival prediction in prostate cancer. J Biomed Inform 2020; 110:103550. [PMID: 32882394 DOI: 10.1016/j.jbi.2020.103550] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/26/2020] [Revised: 07/30/2020] [Accepted: 08/27/2020] [Indexed: 11/19/2022]
Abstract
BACKGROUND AND OBJECTIVE Metastatic prostate cancer has a higher mortality rate than localized cancers. There is a need to investigate the survival outcome of metastatic prostate cancers separately. Also, the treatments undertaken by the patients affect their overall survival. The present study tries to analyze the sequence of treatments given to the patients, along with the time intervals between each set of treatments. The time when medication needs to be changed may provide some useful insights into the survival outcome of the patients. MATERIALS AND METHODS A total of 407 metastatic prostate cancer patients' data was collected and analyzed from an Indian tertiary care center. Popular sequence mining algorithms with exact order constraint have been applied to the treatment data. Appropriate time intervals were added in the resulted frequent sequences and fed to machine learning techniques along with other clinical data. RESULTS The study suggests that the proposed methodology of the time range based sequence mining approach gave better results than the existing methods with 84.5% accuracy and 0.89 AUC. The time intervals in the existing sequence mining algorithms can give the clinicians some useful insights into the survival analysis and in determining the best lines of treatments for a particular patient.
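The idea of attaching time ranges to ordered treatment sequences can be illustrated with a deliberately simplified sketch. It only counts adjacent treatment pairs labelled with a bucketed gap, which is far short of a full sequence miner; the 60-day bucket width, the treatment names and all function names are assumptions made for illustration, not details taken from the study.

```python
from collections import Counter


def gap_bucket(days, width=60):
    """Map a gap in days to a coarse range label, e.g. 'T0' for under 60 days."""
    return f"T{days // width}"


def timed_transitions(events, width=60):
    """Turn [(treatment, day), ...] into a set of (from, gap, to) triples."""
    ordered = sorted(events, key=lambda event: event[1])
    return {
        (a, gap_bucket(day_b - day_a, width), b)
        for (a, day_a), (b, day_b) in zip(ordered, ordered[1:])
    }


def frequent_transitions(patients, min_support=2, width=60):
    """Keep transitions seen in at least min_support patients."""
    counts = Counter()
    for events in patients:
        counts.update(timed_transitions(events, width))
    return {t: c for t, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    patients = [
        [("ADT", 0), ("Chemo", 30)],
        [("ADT", 10), ("Chemo", 50)],
        [("ADT", 0), ("Chemo", 200)],
    ]
    print(frequent_transitions(patients))
```

The surviving patterns could then be encoded as binary features and passed to a classifier together with the clinical variables, which is the general shape of the pipeline the abstract describes.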
Affiliation(s)
- M N Doja
- Indian Institute of Information Technology, Sonepat, India
|
23
|
An Y, Wang J, Zhang L, Zhao H, Gao Z, Huang H, Du Z, Jiao Z, Yan J, Wei X, Jin B. PASCAL: a pseudo cascade learning framework for breast cancer treatment entity normalization in Chinese clinical text. BMC Med Inform Decis Mak 2020; 20:204. [PMID: 32859189 PMCID: PMC7456389 DOI: 10.1186/s12911-020-01216-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/20/2020] [Accepted: 08/12/2020] [Indexed: 12/04/2022] Open
Abstract
Backgrounds Knowledge discovery from breast cancer treatment records has promoted downstream clinical studies such as careflow mining and therapy analysis. However, the clinical treatment text from electronic health data might be recorded by different doctors under their hospital guidelines, making the final data rich in author- and domain-specific idiosyncrasies. Therefore, breast cancer treatment entity normalization becomes an essential task for the above downstream clinical studies. The latest studies have demonstrated the superiority of deep learning methods in named entity normalization tasks. Fundamentally, most existing approaches adopt pipeline implementations that treat it as an independent process after named entity recognition, which can propagate errors to later tasks. In addition, despite its importance in clinical and translational research, few studies directly deal with the normalization task in Chinese clinical text due to the complexity of composition forms. Methods To address these issues, we propose PASCAL, an end-to-end and accurate framework for breast cancer treatment entity normalization (TEN). PASCAL leverages a gated convolutional neural network to obtain a representation vector that can capture contextual features and long-term dependencies. Additionally, it treats treatment entity recognition (TER) as an auxiliary task that can provide meaningful information to the primary TEN task and as a particular regularization to further optimize the shared parameters. Finally, by concatenating the context-aware vector and probabilistic distribution vector from TEN, we utilize the conditional random field layer (CRF) to model the normalization sequence and predict the TEN sequential results. Results To evaluate the effectiveness of the proposed framework, we employ the three latest sequential models as baselines and build the model in single- and multitask on a real-world database. 
Experimental results show that our method achieves better accuracy and efficiency than state-of-the-art approaches. Conclusions The effectiveness and efficiency of the presented pseudo cascade learning framework were validated for breast cancer treatment normalization in clinical text. We believe the predominant performance lies in its ability to extract valuable information from unstructured text data, which will significantly contribute to downstream tasks, such as treatment recommendations, breast cancer staging and careflow mining.
Affiliation(s)
- Yang An
- School of Computer Science and Technology, Dalian University of Technology, No.2 Linggong Road, Ganjingzi District, Dalian, Liaoning, 116024, China
- Jianlin Wang
- First Hospital of Lanzhou University, 1 Donggang W Rd, Chengguan District, Lanzhou, Gansu, 730000, China
- Liang Zhang
- International Bussiness College, Dongbei University of Finance and Economics, No.20 Jianshan Street, Shahekou District, Dalian, Liaoning, 116025, China.
- Hanyu Zhao
- Dalian University, No.10 Xuefu Street, Economic and Technological Development Zone, Dalian, Liaoning, 116622, China
- Zhan Gao
- BeiJing Haoyisheng Cloud Hospital Management Technology Ltd., No.10 Dewai Street, Xicheng District, Beijing, 100088, China
- Haitao Huang
- The People's Hospital of Liaoning Province, No.33 Shenhe District, Shenyang, Liaoning, 110016, China
- Zhenguang Du
- The People's Hospital of Liaoning Province, No.33 Shenhe District, Shenyang, Liaoning, 110016, China
- Zengtao Jiao
- AI Lab, Yidu Cloud, No.35 of Huayuan North Road, Haidian District, Beijing, 100191, China
- Jun Yan
- AI Lab, Yidu Cloud, No.35 of Huayuan North Road, Haidian District, Beijing, 100191, China
- Xiaopeng Wei
- School of Computer Science and Technology, Dalian University of Technology, No.2 Linggong Road, Ganjingzi District, Dalian, Liaoning, 116024, China
- Bo Jin
- School of Innovation and Entrepreneurship, Dalian University of Technology, No.2 Linggong Road, Ganjingzi District, Dalian, Liaoning, 116024, China
|
24
|
Hu D, Li S, Huang Z, Wu N, Lu X. Predicting postoperative non-small cell lung cancer prognosis via long short-term relational regularization. Artif Intell Med 2020; 107:101921. [PMID: 32828458 DOI: 10.1016/j.artmed.2020.101921] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/21/2020] [Revised: 05/19/2020] [Accepted: 06/29/2020] [Indexed: 12/20/2022]
Abstract
OBJECTIVES Lung cancer is the leading cause of cancer death worldwide. Prognosis of lung cancer plays a crucial role in the clinical decision-making process to optimize the treatment for patients. Most of the existing data-driven prognostic prediction models explore the relations between patient's characteristics and outcomes at a specific time interval. Although valuable, they neglect the relations between long-term and short-term prognoses and thus may limit the prediction performance. METHODS In this study, we present a novel prognostic prediction approach for postoperative NSCLC patients. Specifically, we formulate the learning objective function by exploiting the relations between long-term and short-term prognoses via a long short-term relational regularization. The regularization term is composed of two parts, i.e., the similarities between prognoses measured by patients' outcomes and the L2 -norms between the corresponding prognoses' weight vectors. Based on this regularization, the proposed method can extract critical risk factors that comprehensively consider the long-term and short-term prognoses to facilitate the estimation of clinical risks. RESULTS We evaluate the proposed model on a clinical dataset containing 693 consecutive postoperative NSCLC patients with more than 5-year follow-up from 2006 to 2015. Our best models achieve 0.743, 0.709, and 0.746 AUCs for 1-year, 3-year, and 5-year survival prediction, 0.696, 0.724, and 0.736 AUCs for 1-year, 3-year, and 5-year recurrence prediction, respectively. The experimental results show the efficiency of our proposed model in improving the performances on 1-year prognostic prediction in comparison with benchmark models. By comparing with the model without the long short-term relational regularization, the proposed model extracts more consistent critical risk factors for both long-term and short-term prognoses and contains fewer unreasonable risk factors under the clinician's review. 
CONCLUSIONS We conclude that the proposed model can effectively exploit the relations between long-term and short-term prognoses. The risk factors recognized by the proposed model have potential for further prognostic prediction of postoperative non-small cell lung cancer patients.
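One plausible way to write an objective of the kind this abstract describes is shown below. It is only a reading of the abstract, not the authors' published formula: the symbols $w_t$, $s_{tt'}$ and $\lambda$, and the choice of a squared L2 distance, are all assumptions.

```latex
\min_{W}\; \sum_{t \in \mathcal{T}} \mathcal{L}_t(w_t)
  \;+\; \lambda \sum_{t < t'} s_{tt'} \,\lVert w_t - w_{t'} \rVert_2^2
```

Here $\mathcal{T}$ would be the set of prediction horizons (for example 1, 3 and 5 years), $\mathcal{L}_t$ the prediction loss for horizon $t$ with weight vector $w_t$, and $s_{tt'}$ a similarity between two horizons computed from patients' outcomes. Under this form, horizons with similar outcomes are pushed toward similar weight vectors, which would explain the more consistent risk factors reported across short-term and long-term models.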
Affiliation(s)
- Danqing Hu
- College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou 310027, China; Key Laboratory for Biomedical Engineering, Ministry of Education, China
- Shaolei Li
- Department of Thoracic Surgery II, Peking University Cancer Hospital & Institute, Beijing 100142, China
- Zhengxing Huang
- College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou 310027, China; Key Laboratory for Biomedical Engineering, Ministry of Education, China
- Nan Wu
- Department of Thoracic Surgery II, Peking University Cancer Hospital & Institute, Beijing 100142, China.
- Xudong Lu
- College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou 310027, China; Key Laboratory for Biomedical Engineering, Ministry of Education, China.
|
25
|
Shin D, Park J, Han D, Moon JH, Ryu HS, Kim Y. Identification of TUBB2A by quantitative proteomic analysis as a novel biomarker for the prediction of distant metastatic breast cancer. Clin Proteomics 2020; 17:16. [PMID: 32489334 PMCID: PMC7247212 DOI: 10.1186/s12014-020-09280-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/11/2019] [Accepted: 05/15/2020] [Indexed: 12/20/2022] Open
Abstract
Background Metastasis of breast cancer to distal organs is fatal. However, few studies have identified biomarkers that are associated with distant metastatic breast cancer. Furthermore, the inability of current biomarkers, such as HER2, ER, and PR, to differentiate between distant and nondistant metastatic breast cancers accurately has necessitated the development of novel biomarker candidates. Methods An integrated proteomics approach that combined filter-aided sample preparation, tandem mass tag labeling (TMT), high pH fractionation, and high-resolution MS was applied to acquire in-depth proteomic data from FFPE distant metastatic breast cancer tissues. A bioinformatics analysis was performed with regard to gene ontology and signaling pathways using differentially expressed proteins (DEPs) to examine the molecular characteristics of distant metastatic breast cancer. In addition, real-time polymerase chain reaction (RT-PCR) and invasion/migration assays were performed to validate the differential regulation and function of our protein targets. Results A total of 9441 and 8746 proteins were identified from the pooled and individual sample sets, respectively. Based on our criteria, TUBB2A was selected as a novel biomarker candidate. The metastatic activities of TUBB2A were subsequently validated. In our bioinformatics analysis using DEPs, we characterized the overall molecular features of distant metastasis and measured differences in the molecular functions of distant metastatic breast cancer between breast cancer subtypes. Conclusions Our report is the first study to examine the distant metastatic breast cancer proteome using FFPE tissues. The depth of our dataset allowed us to discover a novel biomarker candidate and a proteomic characteristics of distant metastatic breast cancer. Distinct molecular features of various breast cancer subtypes were also established. Our proteomic data constitute a valuable resource for research on distant metastatic breast cancer.
Affiliation(s)
- Dongyoon Shin
- Department of Biomedical Sciences, Seoul National University College of Medicine, 103 Daehakro, Seoul, 30380 Korea
- Joonho Park
- Interdisciplinary Program for Bioengineering, Seoul National University College of Engineering, Seoul, Korea
- Dohyun Han
- Biomedical Research Institute, Seoul National University Hospital, 101 Daehakro, Seoul, Korea
- Ji Hye Moon
- Department of Pathology, Seoul National University Hospital, 101 Daehakro, Seoul, 03080 Korea
- Han Suk Ryu
- Department of Pathology, Seoul National University Hospital, 101 Daehakro, Seoul, 03080 Korea
- Youngsoo Kim
- Department of Biomedical Sciences, Seoul National University College of Medicine, 103 Daehakro, Seoul, 30380 Korea; Interdisciplinary Program for Bioengineering, Seoul National University College of Engineering, Seoul, Korea
|
26
|
Du X, Min J, Shah CP, Bishnoi R, Hogan WR, Lemas DJ. Predicting in-hospital mortality of patients with febrile neutropenia using machine learning models. Int J Med Inform 2020; 139:104140. [PMID: 32325370 DOI: 10.1016/j.ijmedinf.2020.104140] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 12/18/2019] [Revised: 03/12/2020] [Accepted: 04/03/2020] [Indexed: 11/30/2022]
Abstract
BACKGROUND Febrile neutropenia (FN) has been associated with high mortality among adults with cancer. Current systems for early detection of inpatient FN mortality are based on scoring indexes that require intensive physicians' subjective evaluation. OBJECTIVE In this study, we leveraged machine learning techniques to build a FN mortality risk evaluation tool focused on FN admissions without physicians' subjective evaluation. METHODS We used the National Inpatient Sample and Nationwide Inpatient Sample (NIS) that included mortality data among adult inpatients who were diagnosed with FN during a hospital admission. Machine learning techniques that we compared included linear models (ridge logistic regression and linear support vector machine) and non-linear models (gradient boosting tree and neural network). The primary outcome for this study was death among individuals with a recorded FN admission. Model comparison was evaluated based on areas under the receiver operating characteristic curve (AUROC) and model performance was estimated using 30 % test set created via stratified split. RESULTS Our analysis detected 126,013 adult admissions within the NIS data that were diagnosed with FN, among which 5,856 were declared as deceased (4.6 %). Our machine learning results demonstrate linear models and non-linear models achieved areas under the receiver operating characteristic (AUROC) around 92 % in survival prediction. CONCLUSIONS We developed machine learning models that do not require physicians' subjective evaluation for FN mortality risk prediction.
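The evaluation protocol mentioned here, a stratified 30 % test split scored by AUROC, can be sketched in plain Python. This is an illustrative implementation with hypothetical function names, not the authors' pipeline; the pairwise AUROC below is quadratic in sample size, so it suits small examples rather than a dataset of the size reported.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), preserving the class ratio in each part."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def auroc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 90          # imbalanced toy labels
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Stratifying matters with a rare outcome such as the 4.6 % mortality reported above, because an unstratified split can leave the test set with too few deaths for a stable AUROC estimate.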
Affiliation(s)
- Xinsong Du
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Jae Min
  - Department of Epidemiology, College of Medicine, University of Florida, Gainesville, FL, United States
- Chintan P Shah
  - Division of Hematology and Oncology, Department of Medicine, University of Florida, Gainesville, FL, United States
- Rohit Bishnoi
  - Division of Hematology and Oncology, Department of Medicine, University of Florida, Gainesville, FL, United States
- William R Hogan
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Dominick J Lemas
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
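The evaluation protocol named in the Du et al. abstract (a stratified 30% test split, with models compared by AUROC) can be illustrated with a minimal, standard-library sketch. This is not the authors' code: the helper names `stratified_split` and `auroc`, the synthetic labels, and the synthetic risk scores are all assumptions made for illustration, and the real study fitted models to NIS data rather than drawing scores at random.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), drawing test_fraction of each class."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def auroc(y_true, scores):
    """AUROC as the probability that a random positive case is scored
    above a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic, illustrative data only: 5% positives (hypothetical deaths),
# with a hypothetical model score that is higher on average for positives.
data_rng = random.Random(42)
labels = [1] * 50 + [0] * 950
scores = [data_rng.gauss(1.5 if y else 0.0, 1.0) for y in labels]

train_idx, test_idx = stratified_split(labels, test_fraction=0.3)
test_labels = [labels[i] for i in test_idx]
test_scores = [scores[i] for i in test_idx]

print("test size:", len(test_idx), "positives in test:", sum(test_labels))
print("test AUROC:", round(auroc(test_labels, test_scores), 3))
```

Stratifying keeps the rare outcome at the same proportion in both partitions, which matters when, as in the abstract, only a few percent of admissions end in death. The pairwise AUROC here is quadratic in the number of cases and is meant for clarity, not for datasets of NIS scale.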
|
27
|
Wang H, Wang X, Xu L, Zhang J, Cao H. Integrated analysis of the E2F transcription factors across cancer types. Oncol Rep 2020; 43:1133-1146. [PMID: 32323836 PMCID: PMC7058048 DOI: 10.3892/or.2020.7504] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/23/2019] [Accepted: 01/17/2020] [Indexed: 12/11/2022] Open
Abstract
E2F transcription factors are associated with the development of cancer. However, the E2F family genes have not yet been studied in a comprehensive manner. Using The Cancer Genome Atlas, the present study analyzed the functions of the E2F family genes across different types of tumor. It was revealed that, compared with normal tissues, the E2F family genes are highly expressed in several types of tumor tissue. Furthermore, E2F transcription factors were significantly enriched in tumor samples across different types of tumor. The high expression levels of E2F family genes were associated with an unfavorable prognosis in liver hepatocellular carcinoma (LIHC) and lung adenocarcinoma (LUAD). Furthermore, patients with pathological T1 stage and iCluster2 molecular subtype of LIHC expressed particularly low levels of E2F family genes. The present study demonstrated that DNA hypomethylation, DNA amplification and TP53 mutation contributed to the high expression levels of E2F family genes in cancer cells. Finally, the present study revealed that, compared with other types of tumor, the E2F family genes were specifically downregulated in patients with LIHC. The expression levels and prognostic effects of the E2F family genes were validated using the Gene Expression Omnibus database. The results of the present study revealed the biological functions of E2F family genes in the development of cancer and provided potential biomarkers for further therapeutic studies, particularly for patients with LIHC and LUAD.
Affiliation(s)
- Haiwei Wang
  - Fujian Provincial Prenatal Diagnosis Center, Fujian Provincial Maternity and Children's Hospital, Affiliated Hospital of Fujian Medical University, Fuzhou, Fujian 350001, P.R. China
- Xinrui Wang
  - Fujian Provincial Prenatal Diagnosis Center, Fujian Provincial Maternity and Children's Hospital, Affiliated Hospital of Fujian Medical University, Fuzhou, Fujian 350001, P.R. China
- Liangpu Xu
  - Fujian Provincial Prenatal Diagnosis Center, Fujian Provincial Maternity and Children's Hospital, Affiliated Hospital of Fujian Medical University, Fuzhou, Fujian 350001, P.R. China
- Ji Zhang
  - State Key Laboratory for Medical Genomics, Shanghai Institute of Hematology, Rui-Jin Hospital Affiliated to School of Medicine, Shanghai Jiao Tong University, Shanghai 200025, P.R. China
- Hua Cao
  - Fujian Provincial Prenatal Diagnosis Center, Fujian Provincial Maternity and Children's Hospital, Affiliated Hospital of Fujian Medical University, Fuzhou, Fujian 350001, P.R. China
|
28
|
Moreau JT, Hankinson TC, Baillet S, Dudley RWR. Individual-patient prediction of meningioma malignancy and survival using the Surveillance, Epidemiology, and End Results database. NPJ Digit Med 2020; 3:12. [PMID: 32025573 PMCID: PMC6992687 DOI: 10.1038/s41746-020-0219-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 11/07/2018] [Accepted: 01/10/2020] [Indexed: 01/17/2023] Open
Abstract
Meningiomas are known to have relatively lower aggressiveness and better outcomes than other central nervous system (CNS) tumors. However, there is considerable overlap between clinical and radiological features characterizing benign, atypical, and malignant tumors. In this study, we developed methods and a practical app designed to assist with the diagnosis and prognosis of meningiomas. Statistical learning models were trained and validated on 62,844 patients from the Surveillance, Epidemiology, and End Results database. We used balanced logistic regression-random forest ensemble classifiers and proportional hazards models to learn multivariate patterns of association between malignancy, survival, and a series of basic clinical variables, such as tumor size, location, and surgical procedure. We demonstrate that our models are capable of predicting meaningful individual-specific clinical outcome variables and show good generalizability across 16 SEER registries. A free smartphone and web application is provided for readers to access and test the predictive models (www.meningioma.app). Future model improvements and prospective replication will be necessary to demonstrate true clinical utility. Rather than being used in isolation, we expect that the proposed models will be integrated into larger and more comprehensive models that integrate imaging and molecular biomarkers. Whether for meningiomas or other tumors of the CNS, the power of these methods to make individual-patient predictions could lead to improved diagnosis, patient counseling, and outcomes.
Affiliation(s)
- Jeremy T. Moreau
  - McConnell Brain Imaging Centre, Department of Neurology and Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, QC Canada
  - Department of Pediatric Surgery, Division of Neurosurgery, Montreal Children’s Hospital, Montreal, QC Canada
- Todd C. Hankinson
  - Department of Pediatric Neurosurgery, Children’s Hospital Colorado, University of Colorado Anschutz Medical Campus, Aurora, CO USA
  - Morgan Adams Foundation Pediatric Brain Tumor Research Program, Aurora, CO USA
- Sylvain Baillet
  - McConnell Brain Imaging Centre, Department of Neurology and Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, QC Canada
- Roy W. R. Dudley
  - Department of Pediatric Surgery, Division of Neurosurgery, Montreal Children’s Hospital, Montreal, QC Canada
|
29
|
Huang S, Yang J, Fong S, Zhao Q. Artificial intelligence in cancer diagnosis and prognosis: Opportunities and challenges. Cancer Lett 2019; 471:61-71. [PMID: 31830558 DOI: 10.1016/j.canlet.2019.12.007] [Citation(s) in RCA: 230] [Impact Index Per Article: 46.0] [Received: 09/23/2019] [Revised: 12/04/2019] [Accepted: 12/06/2019] [Indexed: 02/06/2023]
Abstract
Cancer is an aggressive disease with a low median survival rate. Ironically, the treatment process is long and very costly due to its high recurrence and mortality rates. Accurate early diagnosis and prognosis prediction of cancer are essential to enhance the patient's survival rate. Developments in statistics and computer engineering over the years have encouraged many scientists to apply computational methods such as multivariate statistical analysis to analyze the prognosis of the disease, and the accuracy of such analyses is significantly higher than that of empirical predictions. Furthermore, as artificial intelligence (AI), especially machine learning and deep learning, has found popular applications in clinical cancer research in recent years, cancer prediction performance has reached new heights. This article reviews the literature on the application of AI to cancer diagnosis and prognosis, and summarizes its advantages. We explore how AI assists cancer diagnosis and prognosis, specifically with regard to its unprecedented accuracy, which is even higher than that of general statistical applications in oncology. We also demonstrate ways in which these methods are advancing the field. Finally, opportunities and challenges in the clinical implementation of AI are discussed. Hence, this article provides a new perspective on how AI technology can help improve cancer diagnosis and prognosis, and continue improving human health in the future.
Affiliation(s)
- Shigao Huang
  - Cancer Center, Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macao, China
- Jie Yang
  - Department of Computer and Information Science, University of Macau, Taipa, Macau, China; Chongqing Industry & Trade Polytechnic, Chongqing, China
- Simon Fong
  - Department of Computer and Information Science, University of Macau, Taipa, Macau, China; Zhuhai Institute of Advanced Technology Chinese Academy of Sciences, Zhuhai, China
- Qi Zhao
  - Cancer Center, Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macao, China
|
30
|
Persistence of data-driven knowledge to predict breast cancer survival. Int J Med Inform 2019; 129:303-311. [DOI: 10.1016/j.ijmedinf.2019.06.018] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/18/2019] [Revised: 06/05/2019] [Accepted: 06/20/2019] [Indexed: 11/23/2022]
|
31
|
Pei D, Gong Y, Kang H, Zhang C, Guo Q. Accurate and rapid screening model for potential diabetes mellitus. BMC Med Inform Decis Mak 2019; 19:41. [PMID: 30866905 PMCID: PMC6416888 DOI: 10.1186/s12911-019-0790-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/28/2018] [Accepted: 03/03/2019] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Prediction or early diagnosis of diabetes is crucial for populations with high risk of diabetes. METHODS In this study, we assessed the ability of five popular classifiers (J48, AdaboostM1, SMO, Bayes Net, and Naïve Bayes) to identify individuals with diabetes based on nine non-invasive and easily obtained clinical features, including age, gender, body mass index (BMI), hypertension, history of cardiovascular disease or stroke, family history of diabetes, physical activity, work stress, and salty food preference. A total of 4205 data entries were obtained from annual physical examination reports for adults in the Shengjing Hospital of China Medical University during January–April 2017. Weka data mining software was used to identify the best algorithm for diabetes classification. RESULTS The results indicate that decision tree classifier J48 has the best performance (accuracy = 0.9503, precision = 0.950, recall = 0.950, F-measure = 0.948, and AUC = 0.964). The decision tree structure shows that age is the most significant feature, followed by family history of diabetes, work stress, BMI, salty food preference, physical activity, hypertension, gender, and history of cardiovascular disease or stroke. CONCLUSIONS Our study shows that decision tree analyses can be applied to screen individuals for early diabetes risk without the need for invasive tests. This procedure will be particularly useful in developing regions with high epidemiological risk and poor socioeconomic status, and enable clinical practitioners to rapidly screen patients for increased risk of diabetes. The key features in the tree structure could further facilitate diabetes prevention through targeted community interventions, which can potentially improve early diabetes diagnosis and reduce burdens on the healthcare system.
Affiliation(s)
- Dongmei Pei
  - Department of Family Medicine, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
- Yang Gong
  - University of Texas Health Science Center at Houston, Houston, Texas, USA
- Hong Kang
  - University of Texas Health Science Center at Houston, Houston, Texas, USA
- Chengpu Zhang
  - Department of Family Medicine, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
- Qiyong Guo
  - Department of Radiology, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
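The accuracy, precision, recall, and F-measure figures quoted in the Pei et al. abstract all derive from a confusion matrix. The sketch below shows those definitions for a single positive class. It is an illustration under stated assumptions, not the study's procedure: the function name `binary_metrics` and the toy labels are hypothetical, and the abstract's values came from Weka, which may report class-weighted averages rather than the single-class values computed here.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy labels, made up for illustration (1 = diabetes, 0 = no diabetes).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On this toy input there are three true positives, one false positive, one false negative, and three true negatives, so all four metrics equal 0.75.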
|
32
|
A tree ensemble-based two-stage model for advanced-stage colorectal cancer survival prediction. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2018.09.046] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 01/03/2023]
|
33
|
Said AA, Abd-Elmegid LA, Kholeif S, Gaber AA. Stage – Specific predictive models for main prognosis measures of breast cancer. Future Computing and Informatics Journal 2018; 3:391-397. [DOI: 10.1016/j.fcij.2018.11.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 09/02/2023]
|