1. Oh H, Cho S, Lee JA, Ryu S, Chang Y. Risk prediction model for gastric cancer within 5 years in healthy Korean adults. Gastric Cancer 2024; 27:675-683. [PMID: 38561527 DOI: 10.1007/s10120-024-01488-4] [Received: 06/30/2023] [Accepted: 03/08/2024] [Indexed: 04/04/2024]
Abstract
BACKGROUND Although endoscopy is commonly used for gastric cancer screening in South Korea, predictive models that integrate endoscopy results are scarce. We aimed to develop a 5-year gastric cancer risk prediction model using endoscopy results as a predictor. METHODS We developed a predictive model using the cohort data of the Kangbuk Samsung Health Study from 2011 to 2019. Among the 260,407 participants aged ≥20 years who did not have any previous history of cancer, 435 cases of gastric cancer were observed. A Cox proportional hazards regression model was used to evaluate the predictors and calculate the 5-year risk of gastric cancer. Harrell's C-statistic and the Nam-D'Agostino χ2 test were used to assess discrimination and calibration, respectively. RESULTS We included age, sex, smoking status, alcohol consumption, family history of cancer, and previous endoscopy results in the risk prediction model. This model showed sufficient discrimination [development cohort: C-statistic 0.800, 95% confidence interval (CI) 0.770-0.829; validation cohort: C-statistic 0.799, 95% CI 0.743-0.856]. It was also well calibrated (development cohort: χ2 = 13.65, P = 0.135; validation cohort: χ2 = 15.57, P = 0.056). CONCLUSION Our prediction model, which included young adults, showed good discrimination and calibration. Furthermore, it predicts the risk of developing gastric cancer over a fixed 5-year interval while taking endoscopic results into account. Thus, it could be clinically useful, especially for adults who have endoscopic results.
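Harrell's C-statistic, used above to report discrimination (0.800 in the development cohort), is the fraction of comparable pairs in which the person who developed cancer earlier was assigned the higher predicted risk. Below is a minimal Python sketch under simplifying assumptions. Pairs with tied follow-up times are skipped, and ties in predicted risk count as one half. The toy data are hypothetical and not drawn from the Kangbuk Samsung Health Study. It is an illustration, not the authors' implementation.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up times; events: 1 if the event (e.g. gastric cancer) was
    observed, 0 if censored; risk_scores: higher value = higher predicted risk.
    Simplification: pairs with tied follow-up times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that i has the shorter follow-up time.
        if times[j] < times[i]:
            i, j = j, i
        # Comparable only if the shorter follow-up ended in an observed event.
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0  # earlier case was ranked as higher risk
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in years, event indicator, predicted 5-year risk.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
risks = [0.9, 0.6, 0.4, 0.2]
print(harrell_c(times, events, risks))  # 1.0: every comparable pair is ordered correctly
```

A pair is only comparable when the shorter follow-up ends in an observed event. A censored participant with the shorter time says nothing about who developed cancer first, which is why such pairs are excluded.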
Affiliation(s)
- Hyungseok Oh
  - Workplace Health Institute, Total Healthcare Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Sunwoo Cho
  - Workplace Health Institute, Total Healthcare Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, South Korea
  - Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA
- Jung Ah Lee
  - Department of Family Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-Ro 43-Gil, Songpa-Gu, Seoul, 05505, South Korea
- Seungho Ryu
  - Center for Cohort Studies, Total Healthcare Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
  - Department of Occupational and Environmental Medicine, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
  - Department of Clinical Research Design & Evaluation, Samsung Advanced Institute for Health Sciences & Technology, Sungkyunkwan University, Seoul, Republic of Korea
- Yoosoo Chang
  - Center for Cohort Studies, Total Healthcare Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
  - Department of Occupational and Environmental Medicine, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
  - Department of Clinical Research Design & Evaluation, Samsung Advanced Institute for Health Sciences & Technology, Sungkyunkwan University, Seoul, Republic of Korea
2. Ding Y. Machine Learning Model Construction and Testing: Anticipating Cancer Incidence and Mortality. Diseases 2024; 12:139. [PMID: 39057110 PMCID: PMC11275333 DOI: 10.3390/diseases12070139] [Received: 06/01/2024] [Revised: 06/24/2024] [Accepted: 06/29/2024] [Indexed: 07/28/2024]
Abstract
In recent years, escalating environmental challenges have contributed to a rising incidence of cancer. Accurately anticipating cancer incidence and mortality rates has emerged as a pivotal focus of scientific inquiry, with a profound impact on the formulation of public health policies. This investigation adopts a pioneering machine learning framework to address this critical issue, using a dataset of 72,591 records that include essential variables such as age, case count, population size, race, gender, site, and year of diagnosis. Several machine learning algorithms were employed: decision trees, random forests, logistic regression, support vector machines, and neural networks. The analysis yielded testing accuracies of 62.17%, 61.92%, 54.53%, 55.72%, and 62.30%, respectively. This state-of-the-art model not only enhances our understanding of cancer dynamics but also equips researchers and policymakers to make meticulous projections of future cancer incidence and mortality rates. From a sustainability perspective, applying this machine learning framework emphasizes the importance of judiciously using extensive and intricate databases. In doing so, it facilitates a more sustainable approach to healthcare planning, allowing for informed decision-making that takes into account the long-term ecological and societal impacts of cancer-related policies. This integrative perspective underscores the broader commitment to sustainable practices in both health research and public policy formulation.
Affiliation(s)
- Yuanzhao Ding
  - School of Geography and the Environment, University of Oxford, South Parks Road, Oxford OX1 3QY, UK
3. Pattilachan TM, Christodoulou M, Ross S. Diagnosis to dissection: AI's role in early detection and surgical intervention for gastric cancer. J Robot Surg 2024; 18:259. [PMID: 38900376 DOI: 10.1007/s11701-024-02005-6] [Received: 05/09/2024] [Accepted: 06/01/2024] [Indexed: 06/21/2024]
Abstract
Gastric cancer remains a formidable health challenge worldwide; early detection and effective surgical intervention are critical for improving patient outcomes. This comprehensive review explores the evolving landscape of gastric cancer management, emphasizing the significant contributions of artificial intelligence (AI) to both diagnostic and therapeutic approaches. Despite advances in the medical field, the subtle nature of early gastric cancer symptoms often leads to late-stage diagnoses, when survival rates are markedly lower. Historically, the treatment of gastric cancer has moved from palliative care to surgical resection, evolving further with the introduction of minimally invasive surgical (MIS) techniques. In the current era, AI has emerged as a transformative force, enhancing the precision of early gastric cancer detection through sophisticated image analysis and supporting surgical decision-making with predictive modeling and real-time preoperative, intraoperative, and postoperative guidance. However, the deployment of AI in healthcare raises significant ethical, legal, and practical challenges, including the need for ongoing professional education and for standardized protocols that ensure patient safety and the effective use of AI technologies. Future directions point toward a synergistic integration of AI with clinical best practices, promising a new era of personalized, efficient, and safer gastric cancer management.
Affiliation(s)
- Tara Menon Pattilachan
  - AdventHealth Tampa, Surgery College of Medicine, Digestive Health Institute, University of Central Florida (UCF), 3000 Medical Park Drive, Suite #500, Tampa, FL, 33613, USA
- Maria Christodoulou
  - AdventHealth Tampa, Surgery College of Medicine, Digestive Health Institute, University of Central Florida (UCF), 3000 Medical Park Drive, Suite #500, Tampa, FL, 33613, USA
- Sharona Ross
  - AdventHealth Tampa, Surgery College of Medicine, Digestive Health Institute, University of Central Florida (UCF), 3000 Medical Park Drive, Suite #500, Tampa, FL, 33613, USA
4. Yazici H, Ugurlu O, Aygul Y, Ugur MA, Sen YK, Yildirim M. Predicting severity of acute appendicitis with machine learning methods: a simple and promising approach for clinicians. BMC Emerg Med 2024; 24:101. [PMID: 38886641 PMCID: PMC11184860 DOI: 10.1186/s12873-024-01023-9] [Received: 12/07/2023] [Accepted: 06/12/2024] [Indexed: 06/20/2024]
Abstract
BACKGROUND Acute appendicitis (AA) is one of the most common surgical emergencies worldwide. This study aimed to investigate the predictive performance of six different machine learning (ML) algorithms for simple and complicated AA. METHODS Data on patients who underwent surgery for AA between 2012 and 2022 were analyzed retrospectively. Based on operative findings, patients were divided into two groups: perforated AA and non-perforated AA. Features that showed statistical significance (p < 0.05) in both univariate and multivariate analyses were included in the prediction models as input features. Five different error metrics and the area under the receiver operating characteristic curve (AUC) were used for model comparison. RESULTS A total of 1132 patients were included in the study. Patients were divided into training (932 samples), testing (100 samples), and validation (100 samples) sets. Age, gender, neutrophil count, lymphocyte count, neutrophil-to-lymphocyte ratio, total bilirubin, C-reactive protein (CRP), appendix diameter, and periappendicular liquid collection (PALC) differed significantly between the two groups. In the multivariate analysis, age, CRP, and PALC remained significantly different in the perforated AA group. Based on the univariate and multivariate analyses, two data sets were used in the prediction models. The K-nearest neighbors and logistic regression algorithms achieved the best prediction performance in the validation set, with an accuracy of 96%. CONCLUSION The results showed that the severity of AA can be predicted with high accuracy using only three input features (age, CRP, and PALC). The developed prediction model can be useful in clinical practice.
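The best-performing models used just three inputs: age, CRP, and PALC. The Python sketch below shows how a k-nearest-neighbors classifier could use such inputs. Several details are assumptions for illustration and were not reported by the study: the training rows, the choice of k = 3, the CRP unit (mg/L), and the min-max scaling step. Scaling matters because age (years), CRP, and PALC (0/1) sit on very different numeric ranges.

```python
import math
from collections import Counter


def fit_min_max(rows):
    """Per-feature minimum and range, so distances are not dominated by one unit."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid dividing by zero
    return lows, spans


def scale(row, lows, spans):
    """Map one row into [0, 1] per feature using the fitted minima and ranges."""
    return [(v - lo) / s for v, lo, s in zip(row, lows, spans)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows nearest to `query` (Euclidean)."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical rows: [age (years), CRP (mg/L), PALC (1 = present)]; label 1 = perforated.
raw_x = [[25, 10, 0], [30, 15, 0], [35, 20, 0], [60, 150, 1], [70, 200, 1], [65, 180, 1]]
y = [0, 0, 0, 1, 1, 1]
lows, spans = fit_min_max(raw_x)
train_x = [scale(r, lows, spans) for r in raw_x]

print(knn_predict(train_x, y, scale([68, 170, 1], lows, spans)))  # 1
print(knn_predict(train_x, y, scale([28, 12, 0], lows, spans)))   # 0
```

The scaling parameters are fitted on the training rows only and then reused for any new row, as done here with `lows` and `spans`. This keeps validation and test data from leaking into preprocessing.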
Affiliation(s)
- Hilmi Yazici
  - General Surgery Department, Marmara University Pendik Research and Training Hospital, Istanbul, Turkey
- Onur Ugurlu
  - Faculty of Engineering and Architecture, Izmir Bakircay University, Izmir, Turkey
- Yesim Aygul
  - Department of Mathematics, Ege University, Izmir, Turkey
- Mehmet Alperen Ugur
  - General Surgery Department, University of Health Sciences Izmir Bozyaka Research and Training Hospital, Izmir, Turkey
- Yigit Kaan Sen
  - General Surgery Department, University of Health Sciences Izmir Bozyaka Research and Training Hospital, Izmir, Turkey
- Mehmet Yildirim
  - General Surgery Department, University of Health Sciences Izmir Bozyaka Research and Training Hospital, Izmir, Turkey
5. Lin J, Zhu F, Dong X, Li R, Liu J, Xia J. Enhancing gastric cancer early detection: A multi-verse optimized feature selection model with crossover-information feedback. Comput Biol Med 2024; 175:108535. [PMID: 38714049 DOI: 10.1016/j.compbiomed.2024.108535] [Received: 01/30/2024] [Revised: 04/05/2024] [Accepted: 04/28/2024] [Indexed: 05/09/2024]
Abstract
Gastric cancer (GC), an acknowledged malignant neoplasm, threatens life and digestive system function if it is not detected and addressed promptly in its early stages. Because early detection is indispensable for improving treatment efficacy and survival prospects in GC, it forms the crux of this investigation. Our study introduces an innovative wrapper-based feature selection method, referred to as bCIFMVO-FKNN-FS, which integrates a crossover-information feedback multi-verse optimizer (CIFMVO) with the fuzzy k-nearest neighbors (FKNN) classifier. The primary goal is to develop an advanced screening model that accelerates the identification of patients with early-stage GC. First, the capability of CIFMVO is validated on the IEEE CEC benchmark functions, where its optimization efficiency is compared against eleven cutting-edge algorithms across dimensionalities of 10, 30, 50, and 100. The bCIFMVO-FKNN-FS model is then applied to clinical data from 1632 individuals at Wenzhou Central Hospital, diagnosed with either early-stage GC or chronic gastritis, and demonstrates strong predictive accuracy (83.395%) and sensitivity (87.538%). This investigation also identifies age, gender, serum gastrin-17, serum pepsinogen I, and the serum pepsinogen I to serum pepsinogen II ratio as parameters significantly associated with early-stage GC. These insights not only validate the efficacy of the proposed model in early GC screening but also contribute substantively to the body of knowledge supporting early diagnosis.
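The bCIFMVO-FKNN-FS model wraps a multi-verse optimizer around a fuzzy k-nearest-neighbors (FKNN) classifier. The sketch below shows only the FKNN step, in the common Keller-style form with crisp training labels. Each of the k nearest neighbors adds a weight of 1/d^(2/(m-1)) to its own class, and the normalized totals are the query's class memberships. Several details are assumptions: the fuzzifier m = 2, the value of k, the class names, and the toy two-feature data. The optimizer and the feature-selection wrapper are not shown.

```python
import math


def fuzzy_knn_memberships(train_x, train_y, query, k=3, m=2.0):
    """Class memberships of `query` under fuzzy k-NN with crisp training labels.

    Each of the k nearest neighbours adds 1 / d**(2 / (m - 1)) to its own class;
    memberships are these totals normalised to sum to 1. Classes with no
    neighbour among the k nearest are absent (membership 0).
    """
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    eps = 1e-12  # guards against d = 0 when the query equals a training point
    totals = {}
    for d, label in nearest:
        totals[label] = totals.get(label, 0.0) + 1.0 / (d + eps) ** (2.0 / (m - 1.0))
    norm = sum(totals.values())
    return {label: w / norm for label, w in totals.items()}


# Hypothetical two-feature points (e.g. scaled serum markers) with crisp labels.
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_y = ["gastritis", "gastritis", "early_gc", "early_gc"]
memberships = fuzzy_knn_memberships(train_x, train_y, [0.0, 0.5])
predicted = max(memberships, key=memberships.get)  # class with highest membership
print(predicted)  # gastritis
```

Unlike a plain majority vote, the graded memberships show how confident the classifier is. Closer neighbors dominate because the weight falls off with distance.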
Affiliation(s)
- Jiejun Lin
  - Department of Gastroenterology, The Dingli Clinical College of Wenzhou Medical University (Wenzhou Central Hospital), Wenzhou, Zhejiang, 325000, China
- Fangchao Zhu
  - Department of Gastroenterology, The Dingli Clinical College of Wenzhou Medical University (Wenzhou Central Hospital), Wenzhou, Zhejiang, 325000, China
- Xiaoyu Dong
  - Department of Critical Care Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325000, China
- Rizeng Li
  - Department of General Surgery, The Dingli Clinical College of Wenzhou Medical University (Wenzhou Central Hospital), Wenzhou, Zhejiang, 325000, China
- Jisheng Liu
  - Department of General Surgery, The Dingli Clinical College of Wenzhou Medical University (Wenzhou Central Hospital), Wenzhou, Zhejiang, 325000, China
- Jianfu Xia
  - Department of General Surgery, The Dingli Clinical College of Wenzhou Medical University (Wenzhou Central Hospital), Wenzhou, Zhejiang, 325000, China
6. Wang DQ, Xu WH, Cheng XW, Hua L, Ge XS, Liu L, Gao X. Interpretable machine learning for predicting the response duration to Sintilimab plus chemotherapy in patients with advanced gastric or gastroesophageal junction cancer. Front Immunol 2024; 15:1407632. [PMID: 38840913 PMCID: PMC11150638 DOI: 10.3389/fimmu.2024.1407632] [Received: 03/27/2024] [Accepted: 05/08/2024] [Indexed: 06/07/2024]
Abstract
Background Sintilimab plus chemotherapy has proven effective as a combination immunotherapy for patients with advanced gastric and gastroesophageal junction adenocarcinoma (GC/GEJC). A multi-center study conducted in China reported a median progression-free survival (PFS) of 7.1 months. However, the prediction of response duration to this immunotherapy has not been thoroughly investigated, and the potential of baseline laboratory features to predict PFS remains largely unexplored. We therefore developed an interpretable machine learning (ML) framework, iPFS-SC, to predict PFS from baseline (pre-treatment) laboratory features and to provide interpretations of the predictions. Materials and methods A cohort of 146 patients with advanced GC/GEJC, along with their baseline laboratory features, was included in the iPFS-SC framework. Predictive baseline features were identified through a forward feature selection process, and four ML algorithms were developed to classify PFS duration using a threshold of 7.1 months. We also employed explainable artificial intelligence (XAI) methods to elucidate the relationship between features and model predictions. Results LightGBM achieved an accuracy of 0.70 in predicting PFS for patients with advanced GC/GEJC. Furthermore, an F1-score of 0.77 was attained for identifying patients with PFS shorter than 7.1 months. The feature selection process identified 11 predictive features. In addition, the framework facilitated the discovery of relationships between laboratory features and PFS. Conclusion An ML-based framework was developed to predict the response duration to Sintilimab plus chemotherapy with high accuracy. The suggested predictive features are easily accessible through routine laboratory tests. Furthermore, XAI techniques offer comprehensive explanations of PFS predictions at both the global and individual levels. This framework enables patients to better understand their treatment plans, while clinicians can customize therapeutic approaches based on the explanations provided by the model.
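The F1-score of 0.77 refers to the class of patients whose PFS was shorter than the 7.1-month threshold. A minimal Python sketch of that labeling and metric follows. The PFS values and predictions are hypothetical. Treating a PFS of exactly 7.1 months as not shorter is an assumption.

```python
def label_short_pfs(pfs_months, threshold=7.1):
    """1 = PFS shorter than the threshold (the class scored by F1), 0 otherwise."""
    return [1 if t < threshold else 0 for t in pfs_months]


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical observed PFS (months) and model predictions (1 = predicted short PFS).
y_true = label_short_pfs([3.0, 5.5, 6.9, 8.0, 12.0, 7.1])  # [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]
print(round(f1_score(y_true, y_pred), 3))  # 0.667
```

F1 can differ from overall accuracy because it ignores true negatives. That is why the abstract reports it separately for the short-PFS class.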
Affiliation(s)
- Dan-qi Wang
  - Big Data Center, Affiliated Hospital of Jiangnan University, Wuxi, China
- Wen-huan Xu
  - Department of Oncology, Affiliated Hospital of Jiangnan University, Wuxi, China
- Xiao-wei Cheng
  - Department of Oncology, Affiliated Hospital of Jiangnan University, Wuxi, China
- Lei Hua
  - Big Data Center, Affiliated Hospital of Jiangnan University, Wuxi, China
- Xiao-song Ge
  - Department of Oncology, Affiliated Hospital of Jiangnan University, Wuxi, China
- Li Liu
  - Big Data Center, Affiliated Hospital of Jiangnan University, Wuxi, China
- Xiang Gao
  - Department of Oncology, Affiliated Hospital of Jiangnan University, Wuxi, China
7. Li M, Gao N, Wang SL, Guo YF, Liu Z. Hotspots and trends of risk factors in gastric cancer: A visualization and bibliometric analysis. World J Gastrointest Oncol 2024; 16:2200-2218. [PMID: 38764808 PMCID: PMC11099465 DOI: 10.4251/wjgo.v16.i5.2200] [Received: 01/19/2024] [Revised: 02/08/2024] [Accepted: 03/11/2024] [Indexed: 05/09/2024]
Abstract
BACKGROUND The lack of specific symptoms of gastric cancer (GC) poses great challenges for its early diagnosis. Thus, it is essential to identify risk factors to enable early diagnosis and treatment of GC and to improve survival rates. AIM To help physicians identify changes in publication output and research hotspots related to risk factors for GC, to construct a list of key risk factors, and to provide a reference for the early identification of patients at high risk of GC. METHODS Research articles on risk factors for GC were searched in the Web of Science Core Collection, and relevant information was extracted after screening. The literature was analyzed using Microsoft Excel 2019, CiteSpace V, and VOSviewer 1.6.18. RESULTS A total of 2514 papers from 72 countries and 2507 research institutions were retrieved. China (n = 1061), the National Cancer Center (n = 138), and Shoichiro Tsugane (n = 36) were the most productive country, institution, and author, respectively. Research hotspots in the study of risk factors for GC fall into four areas: Helicobacter pylori (H. pylori) infection, single nucleotide polymorphisms, bio-diagnostic markers, and GC risk prediction models. CONCLUSION In this study, we found that H. pylori infection is the most significant risk factor for GC, single nucleotide polymorphisms (SNPs) are the most dominant genetic factor for GC, and bio-diagnostic markers are the most promising diagnostic modality for GC. GC risk prediction models are the most recent research hotspot. We conclude that the most important risk factors for the development of GC are H. pylori infection, SNPs, smoking, diet, and alcohol.
Affiliation(s)
- Meng Li
  - Department of Gastroenterology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053, China
- Ning Gao
  - Department of Acupuncture and Moxibustion, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053, China
- Shao-Li Wang
  - Department of Gastroenterology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053, China
- Yu-Feng Guo
  - Department of Acupuncture and Moxibustion, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053, China
- Zhen Liu
  - Department of Gastroenterology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053, China
8. Chen Z, Wang Y, Ying MTC, Su Z. Interpretable machine learning model integrating clinical and elastosonographic features to detect renal fibrosis in Asian patients with chronic kidney disease. J Nephrol 2024; 37:1027-1039. [PMID: 38315278 PMCID: PMC11239734 DOI: 10.1007/s40620-023-01878-4] [Received: 06/09/2023] [Accepted: 12/26/2023] [Indexed: 02/07/2024]
Abstract
BACKGROUND Non-invasive assessment of renal fibrosis is critical for personalized decision-making and follow-up management in patients with chronic kidney disease (CKD). We aimed to use machine learning algorithms based on clinical and elastosonographic features to distinguish moderate-severe fibrosis from mild fibrosis among patients with CKD. METHODS A total of 162 patients with CKD who underwent shear wave elastography examinations and renal biopsies at our institution were prospectively enrolled. Four classifiers based on machine learning algorithms, namely eXtreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Light Gradient Boosting Machine (LightGBM), and K-Nearest Neighbor (KNN), which integrated elastosonographic features and clinical characteristics, were established to differentiate moderate-severe renal fibrosis from mild forms. The area under the receiver operating characteristic curve (AUC) and average precision were used to compare the performance of the models, and the SHapley Additive exPlanations (SHAP) strategy was used to visualize and interpret the model output. RESULTS The XGBoost model outperformed the other machine learning models, demonstrating optimal diagnostic performance in both the primary (AUC = 0.97, 95% confidence interval (CI) 0.94-0.99; average precision = 0.97, 95% CI 0.97-0.98) and five-fold cross-validation (AUC = 0.85, 95% CI 0.73-0.98; average precision = 0.90, 95% CI 0.86-0.93) datasets. The SHAP approach provided a visual interpretation of XGBoost, highlighting each feature's impact on the diagnostic process: the estimated glomerular filtration rate made the largest contribution to the model output, followed by the elastic modulus, renal length, renal resistive index, and hypertension.
CONCLUSION This study proposed an XGBoost model for distinguishing moderate-severe renal fibrosis from mild forms in CKD patients, which could be used to assist clinicians in decision-making and follow-up strategies. Moreover, the SHAP algorithm makes it feasible to visualize and interpret the feature processing and diagnostic processes of the model output.
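Average precision, reported alongside AUC above, summarizes the precision-recall curve. A common definition, the one documented for scikit-learn's `average_precision_score`, is AP = sum over n of (R_n - R_(n-1)) * P_n. Here P_n and R_n are the precision and recall at the n-th threshold. The dependency-free Python sketch below implements that sum under the simplifying assumption that no two scores are tied. The labels and scores are hypothetical.

```python
def average_precision(y_true, scores):
    """AP = sum over thresholds of (recall_n - recall_{n-1}) * precision_n.

    Thresholds are swept from the highest score down; assumes no tied scores.
    y_true: 1 = moderate-severe fibrosis (positive), 0 = mild.
    """
    n_pos = sum(y_true)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    for i in order:
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision  # only recall steps add area
        prev_recall = recall
    return ap


# Hypothetical labels and predicted probabilities.
print(round(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]), 4))  # 0.8333
```

Unlike AUC, average precision does not reward correctly ranking negatives against each other. This makes it more sensitive to how well the positive class is recovered.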
Affiliation(s)
- Ziman Chen
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Kowloon, Hong Kong
- Yingli Wang
  - Ultrasound Department, EDAN Instruments, Inc., Shenzhen, China
- Michael Tin Cheung Ying
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Kowloon, Hong Kong
- Zhongzhen Su
  - Department of Ultrasound, Fifth Affiliated Hospital of Sun Yat-Sen University, Zhuhai, China
9. Islam W, Abdoli N, Alam TE, Jones M, Mutembei BM, Yan F, Tang Q. A Neoteric Feature Extraction Technique to Predict the Survival of Gastric Cancer Patients. Diagnostics (Basel) 2024; 14:954. [PMID: 38732368 PMCID: PMC11083029 DOI: 10.3390/diagnostics14090954] [Received: 01/10/2024] [Revised: 04/26/2024] [Accepted: 04/28/2024] [Indexed: 05/13/2024]
Abstract
BACKGROUND At the time of cancer diagnosis, it is crucial to accurately classify malignant gastric tumors and to estimate the likelihood that patients will survive. OBJECTIVE This study aims to investigate the feasibility of identifying and applying a new feature extraction technique to predict the survival of gastric cancer patients. METHODS A retrospective dataset including the computed tomography (CT) images of 135 patients was assembled. Among them, 68 patients survived longer than three years. Several sets of radiomics features were extracted and incorporated into a machine learning model, and their classification performance was characterized. To improve classification performance, we further extracted another 27 texture and roughness parameters with 2484 superficial and spatial features to build a new feature pool. This new feature set was added to the machine learning model and its performance was analyzed. To determine the best model for our experiment, four of the most popular machine learning models were used: Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Naïve Bayes (NB). The models were trained and tested using five-fold cross-validation. RESULTS Using the area under the ROC curve (AUC) as the evaluation index, the model generated using the new feature pool yielded AUC = 0.98 ± 0.01, which was significantly higher than that of the models created using the traditional radiomics feature set (p < 0.04). The RF classifier performed better than the other machine learning models. CONCLUSIONS This study demonstrated that although radiomics features produced good classification performance, creating new feature sets significantly improved model performance.
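The models were trained and tested with five-fold cross-validation. Below is a minimal Python sketch of how 135 patients could be split into five shuffled folds. The shuffling seed and the unstratified split are assumptions; the abstract does not say whether folds were stratified by survival status. The function name is illustrative.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k shuffled, near-equal folds.

    Every sample appears in exactly one test fold; not stratified by class.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(135, k=5))
print([len(test) for _, test in folds])  # [27, 27, 27, 27, 27]
```

For each fold, a model is fitted on `train` and scored on `test`, and the fold scores are then summarized. The abstract does not state how the ± 0.01 around the AUC was computed.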
Affiliation(s)
- Warid Islam
  - School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA
- Neman Abdoli
  - School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA
- Tasfiq E. Alam
  - School of Industrial and Systems Engineering, University of Oklahoma, Norman, OK 73019, USA
- Meredith Jones
  - Stephenson School of Biomedical Engineering, University of Oklahoma, Norman, OK 73019, USA
- Bornface M. Mutembei
  - Stephenson School of Biomedical Engineering, University of Oklahoma, Norman, OK 73019, USA
- Feng Yan
  - Stephenson School of Biomedical Engineering, University of Oklahoma, Norman, OK 73019, USA
- Qinggong Tang
  - Stephenson School of Biomedical Engineering, University of Oklahoma, Norman, OK 73019, USA
10. Huang RJ, Huang ES, Mudiganti S, Chen T, Martinez MC, Ramrakhiani S, Han SS, Hwang JH, Palaniappan LP, Liang SY. Risk of Gastric Adenocarcinoma in a Multiethnic Population Undergoing Routine Care: An Electronic Health Records Cohort Study. Cancer Epidemiol Biomarkers Prev 2024; 33:547-556. [PMID: 38231023 PMCID: PMC10990787 DOI: 10.1158/1055-9965.epi-23-1200] [Received: 09/29/2023] [Revised: 12/05/2023] [Accepted: 01/12/2024] [Indexed: 01/18/2024]
Abstract
BACKGROUND Gastric adenocarcinoma (GAC) is often diagnosed at advanced stages and portends a poor prognosis. We hypothesized that electronic health records (EHR) could be leveraged to identify individuals at highest risk for GAC from the population seeking routine care. METHODS This was a retrospective cohort study with GAC incidence as the endpoint, ascertained through linkage to an institutional tumor registry. We used 2010 to 2020 data from the Palo Alto Medical Foundation, a large multispecialty practice serving Northern California. The analytic cohort comprised individuals aged 40-75 receiving regular ambulatory care. Variables collected included demographic, medical, pharmaceutical, social, and familial data. Electronic phenotyping was based on rule-based methods. RESULTS The cohort comprised 316,044 individuals and approximately 2 million person-years (p-y) of observation. A total of 157 incident GACs occurred (incidence 7.9 per 100,000 p-y), of which 102 were non-cardia GACs (incidence 5.1 per 100,000 p-y). In multivariable analysis, male sex [HR: 2.2, 95% confidence interval (CI): 1.6-3.1], older age, Asian race (HR: 2.5, 95% CI: 1.7-3.7), Hispanic ethnicity (HR: 1.9, 95% CI: 1.1-3.3), atrophic gastritis (HR: 4.6, 95% CI: 2.2-9.3), and anemia (HR: 1.9, 95% CI: 1.3-2.6) were associated with GAC risk; NSAID use was inversely associated (HR: 0.3, 95% CI: 0.2-0.5). Older age, Asian race, Hispanic ethnicity, atrophic gastritis, and anemia were associated with non-cardia GAC. CONCLUSIONS Routine EHR data can stratify the general population for GAC risk. IMPACT Such methods may help triage populations for targeted screening efforts, such as upper endoscopy.
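The crude incidence rates above come from dividing case counts by person-time. Below is a quick arithmetic check in Python. It assumes exactly 2,000,000 person-years; the abstract says approximately 2 million, so the true denominator differs slightly.

```python
def incidence_per_100k(cases, person_years):
    """Crude incidence rate per 100,000 person-years."""
    return cases / person_years * 100_000


PERSON_YEARS = 2_000_000  # assumption: the abstract reports "approximately 2 million"
all_gac = incidence_per_100k(157, PERSON_YEARS)
non_cardia = incidence_per_100k(102, PERSON_YEARS)
print(f"{all_gac:.2f} {non_cardia:.2f}")  # 7.85 5.10
```

The value 7.85 rounds to the reported 7.9, and 5.10 matches the reported 5.1. The published figures are therefore consistent with a denominator close to 2 million person-years.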
Affiliation(s)
- Robert J Huang
  - Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Stanford, California
- Edward S Huang
  - Department of Gastroenterology, Palo Alto Medical Foundation, San Jose, California
- Satish Mudiganti
  - Palo Alto Medical Foundation Research Institute, Palo Alto Medical Foundation, Palo Alto, California
- Tony Chen
  - Palo Alto Medical Foundation Research Institute, Palo Alto Medical Foundation, Palo Alto, California
- Meghan C Martinez
  - Palo Alto Medical Foundation Research Institute, Palo Alto Medical Foundation, Palo Alto, California
- Sanjay Ramrakhiani
  - Department of Gastroenterology, Palo Alto Medical Foundation, San Jose, California
- Summer S Han
  - Department of Neurosurgery, Stanford University School of Medicine, Stanford, California
- Joo Ha Hwang
  - Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Stanford, California
- Latha P Palaniappan
  - Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California
- Su-Ying Liang
  - Palo Alto Medical Foundation Research Institute, Palo Alto Medical Foundation, Palo Alto, California
11. Srivastava S, Jain P. Computational Approaches: A New Frontier in Cancer Research. Comb Chem High Throughput Screen 2024; 27:1861-1876. [PMID: 38031782 DOI: 10.2174/0113862073265604231106112203] [Received: 06/30/2023] [Revised: 09/08/2023] [Accepted: 09/21/2023] [Indexed: 12/01/2023]
Abstract
Cancer is a broad category of disease that can start in virtually any organ or tissue of the body when abnormal cells proliferate uncontrollably and invade surrounding organs. According to recent statistics, cancer caused 10 million deaths worldwide in 2020, accounting for one in every six deaths. The conventional approach to anticancer research is highly time-consuming and expensive, and its outcomes have not been particularly encouraging. Computational techniques have been employed in anticancer research to advance our understanding. In recent years, the rapid development of computational tools for novel drug discovery, drug design, genetic studies, genome characterization, cancer imaging and detection, radiotherapy, cancer metabolomics, and novel therapeutic approaches has had a significant impact on anticancer research. In this paper, we examine the various subfields of contemporary computational techniques, including molecular docking, artificial intelligence, bioinformatics, virtual screening, and QSAR, and their applications in the study of cancer.
Affiliation(s)
- Shubham Srivastava
- Department of Pharmacy, IIMT College of Pharmacy, Uttar Pradesh, 201310, India
- Pushpendra Jain
- Department of Pharmacy, IIMT College of Pharmacy, Uttar Pradesh, 201310, India
12
Ke X, Cai X, Bian B, Shen Y, Zhou Y, Liu W, Wang X, Shen L, Yang J. Predicting early gastric cancer risk using machine learning: A population-based retrospective study. Digit Health 2024; 10:20552076241240905. [PMID: 38559579 PMCID: PMC10979538 DOI: 10.1177/20552076241240905] [Received: 12/14/2023] [Accepted: 03/05/2024] [Indexed: 04/04/2024]
Abstract
Background Early detection and treatment are crucial for reducing gastrointestinal tumour-related mortality. The diagnostic efficiency of the most commonly used markers for gastric cancer (GC) is limited, and no single laboratory test meets the requirements of early screening; machine learning methods that combine multiple indicators are therefore needed to aid the early diagnosis of GC. Methods Using the XGBoost algorithm and multiple laboratory tests, we developed a new model to distinguish GC from precancerous lesions in patients newly admitted between 2018 and 2023. We evaluated the ability of the prediction score derived from this model to predict early GC. In addition, we investigated how well the model screened for GC in patients with negative protein tumour marker results. Results The XHGC20 model constructed using the XGBoost algorithm distinguished GC from precancerous disease well (area under the receiver operating characteristic curve [AUC] = 0.901), with a sensitivity of 0.830 and a specificity of 0.806 at a cut-off value of 0.265. The prediction score was very effective in diagnosing early GC: the AUC was 0.888, and at a cut-off value of 0.27 the sensitivity and specificity were 0.797 and 0.807, respectively. The model was also effective at evaluating GC in patients with negative conventional markers (AUC = 0.970), with a sensitivity of 0.941 and a specificity of 0.906, which helped to reduce the rate of missed diagnoses. Conclusions The XHGC20 model established by the XGBoost algorithm integrates information from 20 clinical laboratory tests and can aid in the early screening of GC, providing a useful new method for auxiliary laboratory diagnosis.
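The abstract reports an AUC for each task together with a sensitivity/specificity pair at a chosen score cut-off. The Python sketch below shows how those figures are obtained from any continuous prediction score. It is not the XHGC20 model, whose 20 inputs and weights are not given here. It computes sensitivity and specificity at a cut-off, and computes the AUC in its Mann-Whitney form: the probability that a randomly chosen case scores higher than a randomly chosen control. The function names and toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[int], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a score >= cutoff is called positive.

    labels: 1 = gastric cancer (case), 0 = precancerous lesion (control).
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(case score > control score), counting ties as 1/2 (Mann-Whitney form)."""
    cases: List[float] = [s for s, y in zip(scores, labels) if y == 1]
    controls: List[float] = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical toy scores, not data from the study.
    scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    print(sens_spec(scores, labels, cutoff=0.265))  # (1.0, 0.6)
    print(round(auc(scores, labels), 3))  # 0.867
```

The cut-off changes only the sensitivity/specificity pair. The AUC summarizes performance over all possible cut-offs. The pairwise loop costs O(cases * controls), which is fine for illustration, but a rank-based formula would be preferable on a large cohort.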
Affiliation(s)
- Xing Ke
- Department of Clinical Laboratory, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Faculty of Medical Laboratory Science, College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Institute of Artificial Intelligence Medicine, Shanghai Academy of Experimental Medicine, Shanghai, China
- Department of Pathology, Ruijin Hospital and College of Basic Medical Sciences, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Xinyu Cai
- Department of Clinical Laboratory, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Bingxian Bian
- Department of Clinical Laboratory, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yuanheng Shen
- Department of Clinical Laboratory, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yunlan Zhou
- Department of Clinical Laboratory, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Wei Liu
- Department of Research Collaboration, R&D Center, Beijing Deepwise & League of PHD Technology Co., Ltd, Beijing, China
- Xu Wang
- Department of Pathology, Ruijin Hospital and College of Basic Medical Sciences, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Lisong Shen
- Department of Clinical Laboratory, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Faculty of Medical Laboratory Science, College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Institute of Artificial Intelligence Medicine, Shanghai Academy of Experimental Medicine, Shanghai, China
- Junyao Yang
- Department of Clinical Laboratory, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Faculty of Medical Laboratory Science, College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Institute of Artificial Intelligence Medicine, Shanghai Academy of Experimental Medicine, Shanghai, China
13
Kumar V, Gaddam M, Moustafa A, Iqbal R, Gala D, Shah M, Gayam VR, Bandaru P, Reddy M, Gadaputi V. The Utility of Artificial Intelligence in the Diagnosis and Management of Pancreatic Cancer. Cureus 2023; 15:e49560. [PMID: 38156176 PMCID: PMC10754023 DOI: 10.7759/cureus.49560] [Accepted: 11/28/2023] [Indexed: 12/30/2023]
Abstract
Artificial intelligence (AI) has made significant advances in the medical domain in recent years. AI, an expansive field comprising machine learning (ML) and, within it, deep learning (DL), seeks to emulate the intricate operations of the human brain. It examines vast amounts of data and plays a crucial role in decision-making, overcoming limitations of human evaluation. DL uses complex algorithms to analyze data. ML and DL are subsets of AI that use formal statistical techniques to help machines consistently improve at tasks with experience. Pancreatic cancer is more common in developed countries and is one of the leading causes of cancer-related mortality worldwide. Managing pancreatic cancer remains a challenge despite significant advances in diagnosis and treatment. AI has secured an almost ubiquitous presence in oncological workup and management, especially for gastrointestinal malignancies. AI is particularly useful for investigating pancreatic carcinoma because the tumor has specific radiological features that allow diagnosis without histological examination. However, interpreting and evaluating the resulting images is not always simple, because the images change as the disease progresses. In addition, a number of factors may affect prognosis and response to treatment. AI models have now been developed for diagnosis, grading, staging, and prediction of prognosis and treatment response. This review presents the most up-to-date knowledge on the use of AI in the diagnosis and treatment of pancreatic carcinoma.
Affiliation(s)
- Vikash Kumar
- Internal Medicine, The Brooklyn Hospital Center, Brooklyn, USA
- Amr Moustafa
- Internal Medicine, The Brooklyn Hospital Center, Brooklyn, USA
- Rabia Iqbal
- Internal Medicine, The Brooklyn Hospital Center, Brooklyn, USA
- Dhir Gala
- Internal Medicine, American University of the Caribbean School of Medicine, Sint Maarten, SXM
- Mili Shah
- Internal Medicine, American University of the Caribbean School of Medicine, Sint Maarten, SXM
- Vijay Reddy Gayam
- Gastroenterology and Hepatology, The Brooklyn Hospital Center, Brooklyn, USA
- Praneeth Bandaru
- Gastroenterology and Hepatology, The Brooklyn Hospital Center, Brooklyn, USA
- Madhavi Reddy
- Gastroenterology and Hepatology, The Brooklyn Hospital Center, Brooklyn, USA
- Vinaya Gadaputi
- Gastroenterology and Hepatology, Blanchard Valley Health System, Findlay, USA
14
Wong MCS, Leung EY, Yau STY, Chan SC, Xie S, Xu W, Huang J. Prediction algorithm for gastric cancer in a general population: A validation study. Cancer Med 2023; 12:20544-20553. [PMID: 37855240 PMCID: PMC10660462 DOI: 10.1002/cam4.6629] [Received: 02/23/2023] [Revised: 09/04/2023] [Accepted: 09/30/2023] [Indexed: 10/20/2023]
Abstract
BACKGROUND Worldwide, gastric cancer is a leading cause of cancer incidence and mortality. This study aims to devise and validate a scoring system based on readily available clinical data to predict the risk of gastric cancer in a large Chinese population. METHODS We included a total of 6,209,697 subjects aged between 18 and 70 years who had received upper digestive endoscopy in Hong Kong from 1997 to 2018. A binary logistic regression model was constructed to examine the predictors of gastric cancer in a derivation cohort (n = 4,347,224), followed by model evaluation in a validation cohort (n = 1,862,473). The algorithm's discriminatory ability was evaluated as the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. RESULTS Age, male gender, history of Helicobacter pylori infection, use of proton pump inhibitors, and non-use of aspirin, non-steroidal anti-inflammatory drugs (NSAIDs), and statins were significantly associated with gastric cancer. A score of ≤8 was designated as "average risk (AR)", and scores of 9 or above as "high risk (HR)". The prevalence of gastric cancer was 1.81% and 0.096% for the HR and AR groups, respectively. The AUC for the risk score in the validation cohort was 0.834, implying excellent discrimination. CONCLUSIONS This study has validated a simple, accurate, and easy-to-use scoring algorithm with high discriminatory capability for predicting gastric cancer. The score could be adopted to risk-stratify subjects suspected of having gastric cancer, thus allowing prioritized upper digestive tract investigation.
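The stratification rule in this abstract (a score of ≤8 is average risk and ≥9 is high risk) and the per-group prevalence take only a few lines to express. The sketch below assumes that each subject's integer score has already been computed. The abstract does not give the per-predictor point values, so they are not reconstructed here. The function names are hypothetical, and the records in the usage example are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

HIGH_RISK_MIN_SCORE = 9  # per the abstract: <=8 "average risk", >=9 "high risk"


def risk_group(score: int) -> str:
    """Map an integer risk score to the abstract's two strata."""
    return "HR" if score >= HIGH_RISK_MIN_SCORE else "AR"


def prevalence_by_group(records: Iterable[Tuple[int, bool]]) -> Dict[str, float]:
    """Proportion of subjects with gastric cancer within each risk stratum."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # group -> [cases, total]
    for score, has_cancer in records:
        group = risk_group(score)
        counts[group][0] += int(has_cancer)
        counts[group][1] += 1
    return {group: cases / total for group, (cases, total) in counts.items()}


if __name__ == "__main__":
    # Invented (score, has_cancer) pairs, not study data.
    toy = [(10, True), (12, True), (9, False), (11, False),
           (3, False), (8, False), (5, False), (7, True), (2, False)]
    print(prevalence_by_group(toy))  # {'HR': 0.5, 'AR': 0.2}
```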
Affiliation(s)
- Martin C. S. Wong
- The Jockey Club School of Public Health and Primary Care, Faculty of Medicine, Chinese University of Hong Kong, Hong Kong SAR, China
- Centre for Health Education and Health Promotion, Faculty of Medicine, Chinese University of Hong Kong, Hong Kong SAR, China
- School of Public Health, The Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- School of Public Health, Peking University, Beijing, China
- School of Public Health, Fudan University, Shanghai, China
- Eman Yee‐man Leung
- The Jockey Club School of Public Health and Primary Care, Faculty of Medicine, Chinese University of Hong Kong, Hong Kong SAR, China
- Sarah T. Y. Yau
- The Jockey Club School of Public Health and Primary Care, Faculty of Medicine, Chinese University of Hong Kong, Hong Kong SAR, China
- Sze Chai Chan
- The Jockey Club School of Public Health and Primary Care, Faculty of Medicine, Chinese University of Hong Kong, Hong Kong SAR, China
- Shaohua Xie
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Sweden
- Wanghong Xu
- School of Public Health, Fudan University, Shanghai, China
- Junjie Huang
- The Jockey Club School of Public Health and Primary Care, Faculty of Medicine, Chinese University of Hong Kong, Hong Kong SAR, China
- Centre for Health Education and Health Promotion, Faculty of Medicine, Chinese University of Hong Kong, Hong Kong SAR, China
15
Ma X, Pierce E, Anand H, Aviles N, Kunk P, Alemazkoor N. Early prediction of response to palliative chemotherapy in patients with stage-IV gastric and esophageal cancer. BMC Cancer 2023; 23:910. [PMID: 37759332 PMCID: PMC10536729 DOI: 10.1186/s12885-023-11422-z] [Received: 04/17/2023] [Accepted: 09/20/2023] [Indexed: 09/29/2023]
Abstract
BACKGROUND The goal of therapy for many patients with advanced stage malignancies, including those with metastatic gastric and esophageal cancers, is to extend overall survival while also maintaining quality of life. After weighing the risks and benefits of treatment with palliative chemotherapy (PC) with non-curative intent, many patients decide to pursue treatment. It is known that a subset of patients who are treated with PC experience significant side effects without clinically significant survival benefits from PC. METHODS We use data from 150 patients with stage-IV gastric and esophageal cancers to train machine learning models that predict whether a patient with stage-IV gastric or esophageal cancers would benefit from PC, in terms of increased survival duration, at very early stages of the treatment. RESULTS Our findings show that machine learning can predict with high accuracy whether a patient will benefit from PC at the time of diagnosis. More accurate predictions can be obtained after only two cycles of PC (i.e., about 4 weeks after diagnosis). The results from this study are promising with regard to potential improvements in quality of life for patients near the end of life and a potential overall survival benefit by optimizing systemic therapy earlier in the treatment course of patients.
Affiliation(s)
- Xiaoyuan Ma
- Department of Statistics, University of Virginia, Charlottesville, USA
- Eric Pierce
- School of Medicine, University of Virginia, Charlottesville, USA
- Harsh Anand
- System and Information Engineering, University of Virginia, Charlottesville, USA
- Natalie Aviles
- Department of Sociology, University of Virginia, Charlottesville, USA
- Paul Kunk
- School of Medicine, University of Virginia, Charlottesville, USA
- Negin Alemazkoor
- System and Information Engineering, University of Virginia, Charlottesville, USA
16
Xing L, Zhang X, Guo Y, Bai D, Xu H. XGBoost-aided prediction of lip prominence based on hard-tissue measurements and demographic characteristics in an Asian population. Am J Orthod Dentofacial Orthop 2023; 164:357-367. [PMID: 36959014 DOI: 10.1016/j.ajodo.2023.01.017] [Received: 08/01/2022] [Revised: 01/01/2023] [Accepted: 01/01/2023] [Indexed: 03/25/2023]
Abstract
INTRODUCTION Predicting lip prominence from hard-tissue measurements could be helpful in orthodontic treatment planning but has so far proven challenging. METHODS A machine learning-based cross-sectional study was conducted on 1549 patients. Hard-tissue measurements and demographic information were used as input features. Seven popular machine learning algorithms were applied to the datasets to predict upper and lower lip prominence, and the best-performing algorithm was selected to construct the prediction model. Feature importance was evaluated using 3 classical methods. RESULTS Among the 7 algorithms, the XGBoost model performed best in predicting the distances from labrale superius and labrale inferius to the esthetic plane (UL-EP and LL-EP distances), with root mean square error (RMSE) values of 1.25 and 1.49 and r² values of 0.755 and 0.683, respectively. Among the 14 input features, the L1-NB distance contributed the most to the prominence of the upper and lower lips. A lip prominence predictor was developed to facilitate clinical application by deploying the prediction model in a downloadable tool kit. CONCLUSIONS The XGBoost model predicted upper and lower lip prominence with high accuracy and practicability. The artificial intelligence-aided predictor could serve as a reference for orthodontic treatment planning.
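The two regression metrics quoted above can be computed as follows. This minimal sketch assumes that r² means the coefficient of determination, that is, 1 minus the ratio of the residual to the total sum of squares. The abstract does not rule out the squared Pearson correlation instead, and the two can differ for biased predictions. The function names are hypothetical, and the toy values are invented.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the same units as the measured distances."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Invented lip-to-E-plane distances, not study data.
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 2.0, 3.0, 5.0]
    print(rmse(measured, predicted))  # 0.5
    print(round(r_squared(measured, predicted), 3))  # 0.8
```

RMSE is expressed in the outcome's units and lower is better. r² is unitless and closer to 1 is better. This is why an abstract can usefully report both.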
Affiliation(s)
- Lu Xing
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases, Sichuan University, Chengdu, China
- Xiaoqi Zhang
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases, Sichuan University, Chengdu, China
- Yongwen Guo
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases, Sichuan University, Chengdu, China; Department of Orthodontics, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Ding Bai
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases, Sichuan University, Chengdu, China; Department of Orthodontics, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Hui Xu
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases, Sichuan University, Chengdu, China; Department of Orthodontics, West China Hospital of Stomatology, Sichuan University, Chengdu, China
17
Nguyen NLT, Dang NDT, Vu QVAN, Dang AK, Ta TVAN. A Model for Gastric Cancer Risk Prediction Based on MUC1 Polymorphisms and Health-risk Behaviors in a Vietnamese Population. In Vivo 2023; 37:2347-2356. [PMID: 37652501 PMCID: PMC10500499 DOI: 10.21873/invivo.13339] [Received: 06/03/2023] [Revised: 07/02/2023] [Accepted: 07/05/2023] [Indexed: 09/02/2023]
Abstract
BACKGROUND/AIM Although the expression of the mucin 1 (MUC1) and prostate stem cell antigen (PSCA) genes is correlated with gastric cancer development and progression, the utility of these two genes as biomarkers of gastric cancer prognosis still needs to be confirmed in clinical practice. This study aimed to develop a model predictive of gastric cancer that integrates several significant single nucleotide polymorphisms (SNPs) of the MUC1 and PSCA genes and some health-risk behavior factors in a Vietnamese population. PATIENTS AND METHODS A total of 302 patients with primary gastric carcinoma and 304 healthy persons were included in a case-control study. A generalized linear model was fitted with age, sex, history of smoking and alcohol use, personal and family medical history of stomach diseases, and the SNPs of MUC1 and PSCA. The prognostic value of the model was assessed by the area under the receiver operating characteristic curve (AUC) and Akaike Information Criterion (AIC) values. RESULTS In male participants, the final model, consisting of age, sex, history of smoking and alcohol use, personal and family medical history of stomach diseases, and SNP MUC1 rs4072037, provided acceptable discrimination, with an AUC of 0.6374 and the lowest AIC value (539.53). In female participants, the predictive model including age, sex, history of smoking and alcohol use, personal and family medical history of stomach diseases, and SNPs MUC1 rs4072037 and rs2070803 had an AUC of 0.6937 and an AIC of 266.80. The calibration plots of the male model approximately fitted the ideal calibration line. CONCLUSION The predictive model based on age, sex, medical history, and genetic and health-risk behavior factors has high potential for predicting gastric cancer. Further studies of other genetic variants should be carried out to define high-risk gastric cancer groups and propose appropriate personalized prevention.
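Model selection by AIC, as used above, balances fit against the number of estimated parameters. The formula is AIC = 2k - 2 ln L, where L is the maximized likelihood and k is the parameter count, and the candidate with the lowest AIC is preferred. The minimal sketch below applies this to a binary (logistic-type) generalized linear model and assumes that the fitted case probabilities are already available. The function names, candidate names, and AIC values in the usage example are hypothetical and are not the study's.

```python
import math
from typing import Dict, Sequence


def bernoulli_log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Log-likelihood of 0/1 outcomes y under fitted probabilities p (0 < p < 1)."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def lowest_aic(candidates: Dict[str, float]) -> str:
    """Name of the candidate model with the smallest AIC."""
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    # Intercept-only toy model: two subjects, each given probability 0.5.
    ll = bernoulli_log_likelihood([1, 0], [0.5, 0.5])
    print(round(aic(ll, n_params=1), 4))  # 4.7726
    # Hypothetical AIC values for two candidate covariate sets.
    print(lowest_aic({"model_a": 545.2, "model_b": 539.5}))  # model_b
```

Under the usual convention, k counts the intercept plus one coefficient per dummy-coded level. A SNP entered with three genotype categories would then cost two parameters. The abstract does not describe its coding, so this convention is an assumption.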
Affiliation(s)
- Ngoc Dung Thi Dang
- Hanoi Medical University Hospital, Hanoi Medical University, Hanoi, Vietnam
- Quy Van Vu
- Hanoi Medical University Hospital, Hanoi Medical University, Hanoi, Vietnam
- Anh Kim Dang
- School of Preventive Medicine and Public Health, Hanoi Medical University, Hanoi, Vietnam
- Thanh-Van Ta
- Hanoi Medical University Hospital, Hanoi Medical University, Hanoi, Vietnam
18
Dhiman P, Ma J, Andaur Navarro CL, Speich B, Bullock G, Damen JAA, Hooft L, Kirtley S, Riley RD, Van Calster B, Moons KGM, Collins GS. Overinterpretation of findings in machine learning prediction model studies in oncology: a systematic review. J Clin Epidemiol 2023; 157:120-133. [PMID: 36935090 DOI: 10.1016/j.jclinepi.2023.03.012] [Received: 09/22/2022] [Revised: 03/03/2023] [Accepted: 03/14/2023] [Indexed: 03/19/2023]
Abstract
OBJECTIVES In biomedical research, spin is the overinterpretation of findings, and it is a growing concern. To date, the presence of spin has not been evaluated in prognostic model research in oncology, including studies developing and validating models for individualized risk prediction. STUDY DESIGN AND SETTING We conducted a systematic review, searching MEDLINE and EMBASE for oncology-related studies that developed and validated a prognostic model using machine learning published between 1st January, 2019, and 5th September, 2019. We used existing spin frameworks and described areas of highly suggestive spin practices. RESULTS We included 62 publications (including 152 developed models; 37 validated models). Reporting was inconsistent between methods and the results in 27% of studies due to additional analysis and selective reporting. Thirty-two studies (out of 36 applicable studies) reported comparisons between developed models in their discussion and predominantly used discrimination measures to support their claims (78%). Thirty-five studies (56%) used an overly strong or leading word in their title, abstract, results, discussion, or conclusion. CONCLUSION The potential for spin needs to be considered when reading, interpreting, and using studies that developed and validated prognostic models in oncology. Researchers should carefully report their prognostic model research using words that reflect their actual results and strength of evidence.
Affiliation(s)
- Paula Dhiman
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Jie Ma
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Constanza L Andaur Navarro
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Benjamin Speich
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; Meta-Research Centre, Department of Clinical Research, University Hospital Basel, University of Basel, Basel, Switzerland
- Garrett Bullock
- Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Johanna A A Damen
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Lotty Hooft
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Shona Kirtley
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Richard D Riley
- Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire, UK, ST5 5BG
- Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands; EPI-centre, KU Leuven, Leuven, Belgium
- Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
19
He S, Sun D, Li H, Cao M, Yu X, Lei L, Peng J, Li J, Li N, Chen W. Real-World Practice of Gastric Cancer Prevention and Screening Calls for Practical Prediction Models. Clin Transl Gastroenterol 2023; 14:e00546. [PMID: 36413795 PMCID: PMC9944379 DOI: 10.14309/ctg.0000000000000546] [Received: 07/05/2022] [Accepted: 10/11/2022] [Indexed: 11/23/2022]
Abstract
INTRODUCTION Several gastric cancer prediction models have been published, but their value for application in real-world practice remains unclear. We aim to summarize and appraise modeling studies for gastric cancer risk prediction and to identify potential barriers to real-world use. METHODS This systematic review included studies that developed or validated gastric cancer prediction models in the general population. RESULTS A total of 4,223 studies were screened. We included 18 development studies for diagnostic models, 10 for prognostic models, and 1 external validation study. Diagnostic models commonly included biomarkers such as Helicobacter pylori infection indicators, pepsinogen, hormones, and microRNA. Age, sex, smoking, body mass index, and family history of gastric cancer were frequently used in prognostic models. Most of the models were not validated, and only 25% of models evaluated calibration. All studies had a high risk of bias, but over half had acceptable applicability. In addition, most studies failed to clearly report the application scenarios of their prediction models. DISCUSSION Most gastric cancer prediction models shared common shortcomings in methods, validation, and reporting. Model developers should further minimize the risk of bias, improve the models' applicability, and report the target application scenarios to promote real-world use.
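The review found that only a quarter of models evaluated calibration, so it helps to show what a basic calibration check involves. The sketch below assumes that each subject has a predicted risk between 0 and 1 and an observed 0/1 outcome. It sorts subjects by predicted risk and splits them into roughly equal groups. For each group it compares the mean predicted risk with the observed event rate, and it also reports an overall observed-to-expected (O/E) ratio. This is a generic grouped calibration table, not the method of any reviewed study. The function names are hypothetical and the toy data are invented.

```python
from typing import List, Sequence, Tuple


def calibration_table(pred: Sequence[float], outcome: Sequence[int],
                      n_groups: int = 10) -> List[Tuple[float, float, int]]:
    """Return (mean predicted risk, observed event rate, group size) per risk group."""
    pairs = sorted(zip(pred, outcome))  # order subjects by predicted risk
    n = len(pairs)
    table = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # more groups than subjects
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_pred, observed, len(chunk)))
    return table


def observed_to_expected(pred: Sequence[float], outcome: Sequence[int]) -> float:
    """Calibration-in-the-large: total observed events / total predicted events."""
    return sum(outcome) / sum(pred)


if __name__ == "__main__":
    # Invented predictions and outcomes, not study data.
    pred = [0.1, 0.1, 0.2, 0.2, 0.6, 0.8]
    outcome = [0, 0, 0, 1, 1, 1]
    for mean_pred, observed, size in calibration_table(pred, outcome, n_groups=3):
        print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={size}")
    print(f"O/E = {observed_to_expected(pred, outcome):.2f}")
```

A well-calibrated model keeps predicted and observed values close in every group and has an O/E ratio near 1. The toy model here underestimates risk overall (O/E above 1). Formal tests such as Hosmer-Lemeshow build on the same grouped table.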
Affiliation(s)
- Siyi He
- Office of Cancer Screening, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College/Chinese Academy of Medical Sciences Key Laboratory for National Cancer Big Data Analysis and Implement, Beijing, China
- Dianqin Sun
- Office of Cancer Screening, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College/Chinese Academy of Medical Sciences Key Laboratory for National Cancer Big Data Analysis and Implement, Beijing, China
- He Li
- Office of Cancer Screening, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College/Chinese Academy of Medical Sciences Key Laboratory for National Cancer Big Data Analysis and Implement, Beijing, China
- Maomao Cao
- Office of Cancer Screening, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College/Chinese Academy of Medical Sciences Key Laboratory for National Cancer Big Data Analysis and Implement, Beijing, China
- Xinyang Yu
- Office of Cancer Screening, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College/Chinese Academy of Medical Sciences Key Laboratory for National Cancer Big Data Analysis and Implement, Beijing, China
- Lin Lei
- Department of Cancer Prevention and Control, Shenzhen Center for Chronic Disease Control, Shenzhen, Guangdong Province, China
- Ji Peng
- Department of Cancer Prevention and Control, Shenzhen Center for Chronic Disease Control, Shenzhen, Guangdong Province, China
- Jiang Li
- Office of Cancer Screening, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College/Chinese Academy of Medical Sciences Key Laboratory for National Cancer Big Data Analysis and Implement, Beijing, China
- Ni Li
- Office of Cancer Screening, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College/Chinese Academy of Medical Sciences Key Laboratory for National Cancer Big Data Analysis and Implement, Beijing, China
- Wanqing Chen
- Office of Cancer Screening, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College/Chinese Academy of Medical Sciences Key Laboratory for National Cancer Big Data Analysis and Implement, Beijing, China
20
Afrash MR, Shafiee M, Kazemi-Arpanahi H. Establishing machine learning models to predict the early risk of gastric cancer based on lifestyle factors. BMC Gastroenterol 2023; 23:6. [PMID: 36627564 PMCID: PMC9832798 DOI: 10.1186/s12876-022-02626-x] [Received: 05/14/2022] [Accepted: 12/19/2022] [Indexed: 01/12/2023]
Abstract
BACKGROUND Gastric cancer is one of the leading causes of death worldwide. Screening for gastric cancer relies heavily on endoscopy and pathological biopsy, which are invasive and impose financial burdens. Thus, preventing the disease by modifying lifestyle-related behaviors and dietary habits, or even preventing the formation of risk factors, is of great importance. This study aimed to construct an inexpensive, non-invasive, fast, and high-precision diagnostic model using six machine learning (ML) algorithms to classify patients at high or low risk of developing gastric cancer by analyzing individual lifestyle factors. METHODS This retrospective study used the data of 2029 individuals from the gastric cancer database of Ayatollah Taleghani Hospital in Abadan City, Iran. The data were randomly split into training and test sets (ratio 0.7:0.3). Six ML methods, namely multilayer perceptron (MLP), support vector machine (SVM) (linear kernel), SVM (RBF kernel), k-nearest neighbors (KNN) (K = 1, 3, 7, 9), random forest (RF), and eXtreme Gradient Boosting (XGBoost), were trained to construct prognostic models before and after applying the Relief feature selection method. Finally, to evaluate the models' performance, metrics derived from the confusion matrix were calculated on the test split and with cross-validation. RESULTS This study identified 11 important factors influencing the risk of gastric cancer, including Helicobacter pylori infection, high salt intake, and chronic atrophic gastritis. Comparisons indicated that XGBoost had the best performance for the risk prediction of gastric cancer. CONCLUSIONS The results suggest that, based on simple baseline patient data, ML techniques have the potential to support prescreening for gastric cancer and to identify high-risk individuals who should proceed to invasive examinations. Our model could also considerably reduce the number of cases that need endoscopic surveillance. Future studies are required to validate the efficacy of the models in a larger, multicenter population.
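The evaluation design described above, a random 70/30 train/test split plus cross-validation, can be sketched with the standard library alone. The function names, the fixed seed, and the choice of 10 folds are assumptions for illustration. The abstract does not state the seed, the number of folds, or whether the split was stratified by outcome.

```python
import random
from typing import List, Tuple


def train_test_split_indices(n: int, test_fraction: float = 0.3,
                             seed: int = 42) -> Tuple[List[int], List[int]]:
    """Randomly partition row indices 0..n-1 into (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(n: int, k: int = 10, seed: int = 42) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, validation) index pairs; each index is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, folds[i]))
    return splits


if __name__ == "__main__":
    train, test = train_test_split_indices(2029)  # 2029 individuals, as in the study
    print(len(train), len(test))  # 1420 609
    # Cross-validate within the training portion; fold positions index into `train`,
    # e.g. [train[p] for p in val_pos] recovers the original row ids.
    folds = k_fold_indices(len(train), k=10)
    print(len(folds), len(folds[0][1]))  # 10 142
```

When one class is rare, a stratified split is often preferred because it keeps the case/control ratio equal in each part. The abstract does not say whether the study used one.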
Affiliation(s)
- Mohammad Reza Afrash
- Department of Artificial Intelligence, Smart University of Medical Sciences, Tehran, Iran
- Mohsen Shafiee
- Department of Nursing, Abadan University of Medical Sciences, Abadan, Iran
- Hadi Kazemi-Arpanahi
- Department of Health Information Technology, Abadan University of Medical Sciences, Abadan, Iran
21
Mokhria RK, Singh J. Role of artificial intelligence in the diagnosis and treatment of hepatocellular carcinoma. Artif Intell Gastroenterol 2022; 3:96-104. [DOI: 10.35712/aig.v3.i4.96] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Revised: 07/30/2022] [Accepted: 09/14/2022] [Indexed: 02/07/2023]
Abstract
Artificial intelligence (AI) emerged many years ago, but its use in the medical domain has advanced considerably in recent years. AI and its subfields, namely deep learning and machine learning, can examine large amounts of data, play an essential part in decision-making, and help overcome the limitations of human evaluation. Deep learning attempts to imitate the functioning of the human brain and uses far more data and more intricate algorithms. Machine learning is AI based on automated learning: it uses previously supplied data and algorithms to organize the data and identify patterns. Globally, hepatocellular carcinoma is a major cause of illness and death. Despite substantial progress in its overall treatment strategy, managing hepatocellular carcinoma remains a major challenge. AI in gastroenterology, and especially in hepatology, is particularly useful for investigating hepatocellular carcinoma because it is a common tumor with specific radiological features that allow diagnosis without histological study. However, interpreting and analyzing the resulting images is not always easy because the images change over the course of the disease. Furthermore, prognosis and response to treatment can be influenced by numerous factors. Currently, AI is used for diagnostic, therapeutic, and prognostic purposes. Future investigations are essential to prevent likely bias, which might otherwise affect image analysis and thereby restrict the acceptance and use of such models in medical practice. Moreover, experts are required to establish the real utility of such approaches, along with their strengths and limitations.
Affiliation(s)
- Rajesh Kumar Mokhria
- Government Model Sanskriti Senior Secondary School, Chulkana, 132101, Panipat, Haryana, India
- Jasbir Singh
- Department of Biochemistry, Kurukshetra University, Kurukshetra, 136119, Haryana, India
22
Simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data. Sci Rep 2022; 12:17917. [PMID: 36289292 PMCID: PMC9606301 DOI: 10.1038/s41598-022-23011-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2022] [Accepted: 10/21/2022] [Indexed: 01/20/2023]
Abstract
When enabled by machine learning (ML), Learning Health Systems (LHS) hold promise for improving the effectiveness of healthcare delivery. One major barrier to LHS research and development is the lack of access to electronic health record (EHR) patient data. To overcome this challenge, this study demonstrated the feasibility of developing a simulated ML-enabled LHS using synthetic patient data. The ML-enabled LHS was initialized with a dataset of 30,000 synthetic Synthea patients and an XGBoost base model for lung cancer risk prediction. Four additional datasets of 30,000 patients each were generated and added sequentially to the previously updated dataset to simulate the addition of new patients, resulting in datasets of 60,000, 90,000, 120,000, and 150,000 patients. A new XGBoost model was built at each step, and performance improved as the dataset grew, reaching a recall of 0.936 and an AUC (area under the curve) of 0.962 on the 150,000-patient dataset. The effectiveness of the new ML-enabled LHS process was verified by implementing XGBoost models for stroke risk prediction on the same Synthea patient populations. By making the ML code and synthetic patient data publicly available for testing and training, this first synthetic LHS process paves the way for more researchers to start developing LHS with real patient data.
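The simulated LHS above follows a simple loop: append a batch of new patients, retrain on the pooled data, and record performance. The sketch below makes two assumptions. First, `fit_and_score` is a hypothetical caller-supplied function that splits the pooled data, fits a model such as XGBoost, and returns a held-out score. Second, the names are illustrative. The study's published code may be organized differently.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Patient = TypeVar("Patient")


def simulate_lhs(
    initial: Sequence[Patient],
    new_batches: Sequence[Sequence[Patient]],
    fit_and_score: Callable[[List[Patient]], float],
) -> List[Tuple[int, float]]:
    """Retrain after each batch of new patients.

    Returns (pooled dataset size, score) for the base round and every update.
    """
    pooled: List[Patient] = list(initial)
    history: List[Tuple[int, float]] = [(len(pooled), fit_and_score(pooled))]
    for batch in new_batches:
        pooled.extend(batch)  # new patients join all earlier ones
        # A fresh model is fit on the whole pooled dataset at every round.
        history.append((len(pooled), fit_and_score(pooled)))
    return history
```

With an initial batch of 30,000 records and four further batches of 30,000, the recorded sizes follow the 30,000 to 150,000 progression described in the abstract.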
23
Artificial intelligence for distinguishment of hammering sound in total hip arthroplasty. Sci Rep 2022; 12:9826. [PMID: 35701656 PMCID: PMC9198079 DOI: 10.1038/s41598-022-14006-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2021] [Accepted: 05/31/2022] [Indexed: 11/30/2022]
Abstract
Recent studies have focused on analysing the hammering sound during insertion of the cementless stem to reduce complications in total hip arthroplasty (THA). However, the hammering sound is difficult to analyse and varies widely owing to numerous possible variables. We therefore performed a preliminary feasibility study to determine the accuracy of a machine learning prediction model in identifying the final rasping hammering sound recorded during surgery. Hammering sound data from 29 primary THAs without complications were assessed. The following definitions were adopted. Undersized rasping: all undersized stem rasping before rasping of the final stem size; final size rasping: rasping of the final stem size; positive example: hammering sound during final size rasping; negative example A: hammering sound during minimum size stem rasping; negative example B: hammering sound during all undersized rasping. Three datasets for binary classification were constructed, and binary classification was analysed with six models (A–F) on each dataset. The median ROC-AUC values of models A–F were 0.79, 0.76, 0.83, 0.90, 0.91, and 0.90 for dataset A; 0.61, 0.53, 0.67, 0.69, 0.71, and 0.72 for dataset B; and 0.60, 0.48, 0.57, 0.63, 0.67, and 0.63 for dataset C. Our study demonstrated that artificial intelligence using machine learning was able to distinguish the final rasping hammering sound from the previous hammering sound with a relatively high degree of accuracy. Future studies are warranted to establish a prediction model using hammering sound analysis with machine learning to prevent complications in THA.
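The hammering-sound results are reported as median ROC-AUC values. The sketch below computes ROC-AUC in its Mann–Whitney form: the probability that a randomly chosen positive (final-size rasping) scores higher than a randomly chosen negative, with ties counted as half. It then takes the median over several evaluation runs. That the medians were taken over repeated splits or folds is an assumption, and the function names are illustrative.

```python
from statistics import median
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:  # O(P * N) pairwise comparison; fine for small recording sets
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def median_auc(runs: Sequence[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Median ROC-AUC over several (labels, scores) evaluation runs."""
    return median(roc_auc(y, s) for y, s in runs)
```

An AUC of 0.5 means the model ranks final-size rasping sounds no better than chance, which is why values near 0.5 for datasets B and C indicate weak discrimination.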
24
Abineza C, Balas VE, Nsengiyumva P. A machine-learning-based prediction method for easy COPD classification based on pulse oximetry clinical use. J Intell Fuzzy Syst 2022. [DOI: 10.3233/jifs-219270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Chronic obstructive pulmonary disease (COPD) is a progressive, obstructive lung disease that restricts airflow from the lungs. COPD patients are at risk of sudden, acute worsening of symptoms, called exacerbations. Early identification and classification of COPD exacerbations can reduce COPD risks and improve patient care and management. Pulse oximetry is a non-invasive technique used to assess patients with acutely worsening symptoms. In manual diagnosis based on pulse oximetry, clinicians examine three warning signs to classify COPD patients. This approach may lack sensitivity and specificity, so a blood test is required; however, laboratory tests take time, delay treatment, and add costs. This research proposes a method for classifying COPD patients based on the three manual pulse-oximetry warning signs and a few key features derived from them, all of which can be obtained in a short time. The model was developed on a robust, physician-labeled dataset with clinically diverse patient cases. Five classification algorithms were applied to this dataset, and the best was XGBoost, with an accuracy of 91.04%, precision of 99.86%, recall of 82.19%, F1 score of 90.05%, and AUC of 95.8%. Age, current and baseline heart rate, and current and baseline pulse oximetry saturation (SpO2) were the most important predictors. These findings suggest that the XGBoost model, together with the availability and simplicity of its input variables, is well suited to classifying COPD in daily life using a (wearable) pulse oximeter.
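All of the reported classification metrics derive from confusion-matrix counts. The sketch below shows how accuracy, precision, recall, and the F1 score (the harmonic mean of precision and recall) relate. The `Confusion` class and the counts used with it are illustrative, not the study's data.

```python
from dataclasses import dataclass


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


@dataclass
class Confusion:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:  # also called sensitivity
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        return f1_from(self.precision(), self.recall())
```

For example, `f1_from(0.9986, 0.8219)` gives about 0.902, close to the reported F1 of 90.05%. A very high precision paired with a lower recall pulls F1 towards the smaller of the two.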
Affiliation(s)
- Claudia Abineza
- African Center of Excellence in Internet of Things, University of Rwanda, Kigali, Rwanda
- Valentina E. Balas
- Department of Automatics and Applied Software, “Aurel Vlaicu” University, Arad, Romania
- Philibert Nsengiyumva
- African Center of Excellence in Internet of Things, University of Rwanda, Kigali, Rwanda
25
Huang RJ, Kwon NSE, Tomizawa Y, Choi AY, Hernandez-Boussard T, Hwang JH. A Comparison of Logistic Regression Against Machine Learning Algorithms for Gastric Cancer Risk Prediction Within Real-World Clinical Data Streams. JCO Clin Cancer Inform 2022; 6:e2200039. [PMID: 35763703 DOI: 10.1200/cci.22.00039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Abstract
PURPOSE Noncardia gastric cancer (NCGC) is a leading cause of global cancer mortality and is often diagnosed at advanced stages. Development of NCGC risk models within electronic health records (EHR) may allow for improved cancer prevention. There has been much recent interest in the use of machine learning (ML) for cancer prediction, but few studies have compared ML with classical statistical models for NCGC risk prediction. METHODS We trained models using logistic regression (LR) and four commonly used ML algorithms to distinguish NCGC cases from age-/sex-matched controls in two EHR systems: Stanford University and the University of Washington (UW). The LR model contained well-established NCGC risk factors (intestinal metaplasia histology, prior Helicobacter pylori infection, race, ethnicity, nativity status, smoking history, anemia), whereas ML models agnostically selected variables from the EHR. Models were developed and internally validated in the Stanford data, and externally validated in the UW data. Hyperparameters were tuned using cross-validation. Model performance was compared by accuracy, sensitivity, and specificity. RESULTS In internal validation, LR performed with accuracy (0.732; 95% CI, 0.698 to 0.764), sensitivity (0.697; 95% CI, 0.647 to 0.744), and specificity (0.767; 95% CI, 0.720 to 0.809) comparable to penalized lasso, support vector machine, K-nearest neighbor, and random forest models. In external validation, LR continued to demonstrate high accuracy, sensitivity, and specificity. Although K-nearest neighbor demonstrated higher accuracy and specificity, this was offset by significantly lower sensitivity. No ML model consistently outperformed LR across evaluation criteria. CONCLUSION Drawing data from two independent EHRs, we find that LR based on established risk factors demonstrated comparable performance to optimized ML algorithms.
This study demonstrates that classical models built on robust, hand-chosen predictor variables may not be inferior to data-driven models for NCGC risk prediction.
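The NCGC comparison reports accuracy, sensitivity, and specificity with 95% confidence intervals but does not state the interval method. The sketch below assumes a Wilson score interval, one common choice for a binomial proportion. `wilson_ci` and `classification_summary` are illustrative names, and any counts passed in are hypothetical.

```python
import math
from typing import Dict, Tuple

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def wilson_ci(successes: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Wilson score interval for the binomial proportion successes / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def classification_summary(tp: int, fp: int, fn: int, tn: int
                           ) -> Dict[str, Tuple[float, float, float]]:
    """(point estimate, lower, upper) for accuracy, sensitivity, specificity."""
    parts = {
        "accuracy": (tp + tn, tp + fp + fn + tn),
        "sensitivity": (tp, tp + fn),  # share of cases correctly flagged
        "specificity": (tn, tn + fp),  # share of controls correctly cleared
    }
    out: Dict[str, Tuple[float, float, float]] = {}
    for name, (k, n) in parts.items():
        lo, hi = wilson_ci(k, n)
        out[name] = (k / n, lo, hi)
    return out
```

Sensitivity and specificity use different denominators (cases versus controls), so a model such as K-nearest neighbor can gain specificity while losing sensitivity, as the abstract describes.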
Affiliation(s)
- Robert J Huang
- Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Stanford, CA
- Nicole Sung-Eun Kwon
- Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Stanford, CA
- Yutaka Tomizawa
- Division of Gastroenterology, University of Washington, Seattle, WA
- Alyssa Y Choi
- Division of Gastroenterology and Hepatology, University of California Irvine, Irvine, CA
- Joo Ha Hwang
- Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Stanford, CA
26
Xu X, Fairley CK, Chow EPF, Lee D, Aung ET, Zhang L, Ong JJ. Using machine learning approaches to predict timely clinic attendance and the uptake of HIV/STI testing post clinic reminder messages. Sci Rep 2022; 12:8757. [PMID: 35610227 PMCID: PMC9128330 DOI: 10.1038/s41598-022-12033-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 04/07/2022] [Indexed: 11/09/2022]
Abstract
Timely and regular testing for HIV and sexually transmitted infections (STI) is important for controlling HIV and STI (HIV/STI) among men who have sex with men (MSM). We established multiple machine learning models (e.g., logistic regression, lasso regression, ridge regression, elastic net regression, support vector machine, k-nearest neighbour, naïve Bayes, random forest, gradient boosting machine, XGBoost, and multi-layer perceptron) to predict timely (i.e., within 30 days) clinic attendance and HIV/STI testing uptake after receiving a reminder message via short message service (SMS) or email. Our study used 3044 clinic consultations among MSM within 12 months after receiving an email or SMS reminder at the Melbourne Sexual Health Centre between April 11, 2019, and April 30, 2020. About 29.5% (899/3044) of consultations were timely clinic attendances after a reminder message, and 84.6% (761/899) of these included HIV/STI testing. The XGBoost model performed best in predicting timely clinic attendance (mean [SD] AUC 62.8% [3.2%]; F1 score 70.8% [1.2%]). The elastic net regression model performed best in predicting HIV/STI testing within 30 days (AUC 82.7% [6.3%]; F1 score 85.3% [1.8%]). The machine learning approach is helpful in predicting timely clinic attendance and HIV/STI re-testing. Our predictive models could be incorporated into clinic websites to inform sexual health care or follow-up services.
Affiliation(s)
- Xianglong Xu
- Central Clinical School, Monash University, Melbourne, Australia
- Melbourne Sexual Health Centre, The Alfred, Melbourne, 3053, Australia
- Christopher K Fairley
- Central Clinical School, Monash University, Melbourne, Australia
- Melbourne Sexual Health Centre, The Alfred, Melbourne, 3053, Australia
- Eric P F Chow
- Central Clinical School, Monash University, Melbourne, Australia
- Melbourne Sexual Health Centre, The Alfred, Melbourne, 3053, Australia
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC, Australia
- David Lee
- Melbourne Sexual Health Centre, The Alfred, Melbourne, 3053, Australia
- Ei T Aung
- Central Clinical School, Monash University, Melbourne, Australia
- Melbourne Sexual Health Centre, The Alfred, Melbourne, 3053, Australia
- Lei Zhang
- Central Clinical School, Monash University, Melbourne, Australia
- Melbourne Sexual Health Centre, The Alfred, Melbourne, 3053, Australia
- China Australia Joint Research Center for Infectious Diseases, School of Public Health, Xi'an Jiaotong University Health Science Centre, Xi'an, Shaanxi, China
- Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, China
- Jason J Ong
- Central Clinical School, Monash University, Melbourne, Australia
- Melbourne Sexual Health Centre, The Alfred, Melbourne, 3053, Australia
- Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
27
Dhiman P, Ma J, Andaur Navarro CL, Speich B, Bullock G, Damen JAA, Hooft L, Kirtley S, Riley RD, Van Calster B, Moons KGM, Collins GS. Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review. BMC Med Res Methodol 2022; 22:101. [PMID: 35395724 PMCID: PMC8991704 DOI: 10.1186/s12874-022-01577-x] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 11/21/2021] [Accepted: 03/18/2022] [Indexed: 12/13/2022]
Abstract
BACKGROUND To describe and evaluate the methodological conduct of prognostic prediction models developed using machine learning methods in oncology. METHODS We conducted a systematic review in MEDLINE and Embase between 01/01/2019 and 05/09/2019 for studies developing a prognostic prediction model using machine learning methods in oncology. We used the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, the Prediction model Risk Of Bias ASsessment Tool (PROBAST), and the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) to assess the methodological conduct of included publications. Results were summarised by modelling type: regression-based, non-regression-based, and ensemble machine learning models. RESULTS Sixty-two publications met the inclusion criteria, developing 152 models in total. Forty-two models were regression-based, 71 were non-regression-based, and 39 were ensemble models. A median of 647 individuals (IQR: 203 to 4059) and 195 events (IQR: 38 to 1269) were used for model development, and 553 individuals (IQR: 69 to 3069) and 50 events (IQR: 17.5 to 326.5) for model validation. A higher number of events per predictor was used for developing regression-based models (median: 8, IQR: 7.1 to 23.5) than for alternative machine learning models (median: 3.4, IQR: 1.1 to 19.1) and ensemble models (median: 1.7, IQR: 1.1 to 6). Sample size was rarely justified (n = 5/62; 8%). Some or all continuous predictors were categorised before modelling in 24 studies (39%). 46% (n = 24/62) of models reporting predictor selection before modelling used univariable analyses, a common method across all modelling types. Ten out of 24 models for time-to-event outcomes accounted for censoring (42%). A split-sample approach was the most popular method for internal validation (n = 25/62, 40%). Calibration was reported in 11 studies. Fewer than half of the models were reported or made available. CONCLUSIONS The methodological conduct of machine learning based clinical prediction models is poor. Guidance is urgently needed, with increased awareness and education of minimum prediction modelling standards. Particular focus is needed on sample size estimation, development and validation analysis methods, and ensuring the model is available for independent validation, to improve the quality of machine learning based clinical prediction models.
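Events per predictor (EPP) is the number of outcome events divided by the number of candidate predictors. Fewer events per predictor raises the risk of overfitting. The sketch below assumes predictors are counted as model parameters, so a dummy-coded categorical predictor with k levels counts as k − 1. The review's exact counting rule is not given in the abstract, and the function names are illustrative.

```python
from typing import Optional, Sequence


def candidate_parameters(levels: Sequence[Optional[int]]) -> int:
    """Count parameters contributed by candidate predictors.

    None marks a continuous predictor kept as one linear term (1 parameter);
    an integer k marks a categorical predictor dummy-coded into k - 1 terms.
    """
    total = 0
    for k in levels:
        if k is None:
            total += 1
        elif k >= 2:
            total += k - 1
        else:
            raise ValueError("a categorical predictor needs at least 2 levels")
    return total


def events_per_predictor(n_events: int, n_parameters: int) -> float:
    """Outcome events divided by candidate predictor parameters."""
    if n_parameters <= 0:
        raise ValueError("need at least one candidate parameter")
    return n_events / n_parameters
```

For instance, a hypothetical model with 40 events, two continuous predictors, one three-level categorical predictor, and one binary predictor has 5 parameters and an EPP of 8.0.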
Affiliation(s)
- Paula Dhiman
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Jie Ma
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Constanza L Andaur Navarro
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Benjamin Speich
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Basel Institute for Clinical Epidemiology and Biostatistics, Department of Clinical Research, University Hospital Basel, University of Basel, Basel, Switzerland
- Garrett Bullock
- Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Johanna A A Damen
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Lotty Hooft
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Shona Kirtley
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Richard D Riley
- Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire, ST5 5BG, UK
- Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands
- EPI-centre, KU Leuven, Leuven, Belgium
- Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
28
Open and Crowd-Based Platforms: Impact on Organizational and Market Performance. Sustainability 2022. [DOI: 10.3390/su14042223] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/05/2023]
Abstract
The aim of the research was to present the state of the art on the use of open and crowd-based platforms and the advantages in terms of business performance that emerging practices employing such technologies are able to provide. The analysis was performed by extracting information on emerging practices from the repository Business Process Framework for Emerging Technologies developed by the Department of Industrial Engineering of the University of Salerno (Italy). Contingency tables allowed analysis of the association of such practices with industry, business function, business process, and impact on performance. From the analysis of the results, many implementation opportunities emerge, mainly in manufacturing, healthcare, and transportation industries, providing benefits not only in terms of efficiency and productivity, cost reduction, and information management but also in product/service differentiation. Therefore, the research provides an overview of opportunities for organizations employing open and crowd-based platforms in order to improve market and organizational performance. Moreover, the article highlights in what specific business contexts these technologies can be mainly useful.
29
Gu J, Chen R, Wang SM, Li M, Fan Z, Li X, Zhou J, Sun K, Wei W. Prediction models for gastric cancer risk in the general population: a systematic review. Cancer Prev Res (Phila) 2022; 15:309-318. [PMID: 35017181 DOI: 10.1158/1940-6207.capr-21-0426] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/21/2021] [Revised: 11/15/2021] [Accepted: 01/07/2022] [Indexed: 11/16/2022]
Abstract
Risk prediction models for gastric cancer (GC) could identify high-risk individuals in the general population. The objective of this study was to systematically review the available evidence on the construction and validation of GC prediction models. We searched PubMed, Embase, and Cochrane Library databases for articles that developed or validated GC risk prediction models up to November 2021. Extracted data included study characteristics, predictor selection, missing data, and evaluation metrics. Risk of bias (ROB) was assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). We identified a total of 12 original risk prediction models that fulfilled the criteria for analysis. The area under the receiver operating characteristic curve ranged from 0.73 to 0.93 in derivation sets (n = 6), 0.68 to 0.90 in internal validation sets (n = 5), and 0.71 to 0.92 in external validation sets (n = 7). The higher-performing models usually included age, salt preference, Helicobacter pylori, smoking, BMI, family history, pepsinogen, and sex. According to PROBAST, at least one domain with a high ROB was present in all studies, mainly due to methodological limitations in the analysis domain. In conclusion, although some risk prediction models including similar predictors have displayed sufficient discriminative ability, many have a high ROB due to methodological limitations and have not been adequately externally validated. Future prediction models should adhere to well-established standards and guidelines to benefit GC screening.
Affiliation(s)
- Jianhua Gu
- National Central Cancer Registry, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College
- Ru Chen
- National Central Cancer Registry, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College
- Shao-Ming Wang
- National Central Cancer Registry Office, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College
- Minjuan Li
- National Central Cancer Registry, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College
- Zhiyuan Fan
- National Cancer Registry Office, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College
- Xinqing Li
- Office of National Central Cancer Registry, Cancer Institute/Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Jiachen Zhou
- Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University Health Science Center
- Kexin Sun
- National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College
- Wenqiang Wei
- National Central Cancer Registry, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College
30
Network Biology and Artificial Intelligence Drive the Understanding of the Multidrug Resistance Phenotype in Cancer. Drug Resist Updat 2022; 60:100811. [DOI: 10.1016/j.drup.2022.100811] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/29/2021] [Revised: 01/22/2022] [Accepted: 01/24/2022] [Indexed: 02/07/2023]
31
Christou CD, Tsoulfas G. Challenges and opportunities in the application of artificial intelligence in gastroenterology and hepatology. World J Gastroenterol 2021; 27:6191-6223. [PMID: 34712027 PMCID: PMC8515803 DOI: 10.3748/wjg.v27.i37.6191] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 01/31/2021] [Revised: 05/06/2021] [Accepted: 08/31/2021] [Indexed: 02/06/2023]
Abstract
Artificial intelligence (AI) is an umbrella term used to describe a cluster of interrelated fields. Machine learning (ML) refers to a model that learns from past data to predict future data. Medicine and particularly gastroenterology and hepatology, are data-rich fields with extensive data repositories, and therefore fruitful ground for AI/ML-based software applications. In this study, we comprehensively review the current applications of AI/ML-based models in these fields and the opportunities that arise from their application. Specifically, we refer to the applications of AI/ML-based models in prevention, diagnosis, management, and prognosis of gastrointestinal bleeding, inflammatory bowel diseases, gastrointestinal premalignant and malignant lesions, other nonmalignant gastrointestinal lesions and diseases, hepatitis B and C infection, chronic liver diseases, hepatocellular carcinoma, cholangiocarcinoma, and primary sclerosing cholangitis. At the same time, we identify the major challenges that restrain the widespread use of these models in healthcare in an effort to explore ways to overcome them. Notably, we elaborate on the concerns regarding intrinsic biases, data protection, cybersecurity, intellectual property, liability, ethical challenges, and transparency. Even at a slower pace than anticipated, AI is infiltrating the healthcare industry. AI in healthcare will become a reality, and every physician will have to engage with it by necessity.
Affiliation(s)
- Chrysanthos D Christou
- Organ Transplant Unit, Hippokration General Hospital, Aristotle University of Thessaloniki, Thessaloniki 54622, Greece
- Georgios Tsoulfas
- Organ Transplant Unit, Hippokration General Hospital, Aristotle University of Thessaloniki, Thessaloniki 54622, Greece
32
Li ZM, Zhuang X. Application of artificial intelligence in microbiome study promotes precision medicine for gastric cancer. Artif Intell Gastroenterol 2021; 2:105-110. [DOI: 10.35712/aig.v2.i4.105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2021] [Revised: 04/22/2021] [Accepted: 07/09/2021] [Indexed: 02/06/2023]
Abstract
The microbiome has been identified as a causal factor in many cancers. Helicobacter pylori contributes to the development of gastric cancer (GC) and affects its treatment. The rapid development of sequencing technology is producing increasingly large and complex data. However, analyzing these data manually poses many obstacles, which prevent clinicians from making rapid decisions. Recently, artificial intelligence (AI), including machine learning and deep learning, has greatly assisted clinicians in processing and interpreting large microbiome datasets. This paper reviews the application of AI in the study of the microbiome and discusses its potential in the diagnosis and therapy of GC. We also exemplify strategies for implementing microbiome-based precision medicine for patients with GC.
Affiliation(s)
- Zhi-Ming Li
- Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, Hubei Province, China
- Department of Urology, The First Affiliated Hospital of Xiamen University, Xiamen 361003, Fujian Province, China
- Xuan Zhuang
- Department of Urology, The First Affiliated Hospital of Xiamen University, Xiamen 361003, Fujian Province, China
- Department of Clinical Medicine, Fujian Medical University, Fuzhou 350122, Fujian Province, China
33
A Deep Recurrent Neural Network-Based Explainable Prediction Model for Progression from Atrophic Gastritis to Gastric Cancer. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11136194] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 02/08/2023]
Abstract
Gastric cancer is the fifth most common cancer type worldwide and one of the most frequently diagnosed cancers in South Korea. In this study, we propose DeepPrevention to identify patients with atrophic gastritis who are at high risk of gastric cancer. It comprises a prediction module, which estimates the possibility of progression from atrophic gastritis to gastric cancer, and an explanation module, which identifies risk factors for this progression. The data set used in this study was South Korea National Health Insurance Service (NHIS) medical checkup data for patients with atrophic gastritis from 2002 to 2013. Our experimental results showed that the most influential predictors of gastric cancer development were sex, smoking duration, and current smoking status. In addition, we found that the average age at gastric cancer diagnosis in a group of high-risk patients was 57, and that income, BMI, regular exercise, and the number of endoscopic screenings did not differ significantly between groups. At the individual level, we identified relatively strong associations between gastric cancer and both smoking duration and smoking status.
34
Iivanainen S, Ekstrom J, Virtanen H, Kataja VV, Koivunen JP. Electronic patient-reported outcomes and machine learning in predicting immune-related adverse events of immune checkpoint inhibitor therapies. BMC Med Inform Decis Mak 2021; 21:205. [PMID: 34193140 PMCID: PMC8243435 DOI: 10.1186/s12911-021-01564-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/15/2020] [Accepted: 06/22/2021] [Indexed: 12/26/2022]
Abstract
BACKGROUND Immune-checkpoint inhibitors (ICIs) have introduced novel immune-related adverse events (irAEs), arising from various organ systems without a strong temporal dependency on therapy dosing. Early detection of irAEs could result in an improved toxicity profile and quality of life. Symptom data collected by electronic (e) patient-reported outcomes (PRO) could be used as an input for machine learning (ML) based prediction models for the early detection of irAEs. METHODS The dataset used consisted of two data sources. The first consisted of 820 completed symptom questionnaires from 34 ICI-treated advanced cancer patients, covering 18 monitored symptoms collected using the Kaiku Health digital platform. The second included prospectively collected irAE data, Common Terminology Criteria for Adverse Events (CTCAE) class, and the severity of 26 irAEs. The ML models were built using extreme gradient boosting algorithms. The first model was trained to detect the presence of irAEs and the second their onset. RESULTS The model trained to predict the presence of irAEs had excellent performance on four metrics: accuracy score 0.97, Area Under the Curve (AUC) value 0.99, F1-score 0.94 and Matthews correlation coefficient (MCC) 0.92. Predicting irAE onset was more difficult, with accuracy score 0.96, AUC value 0.93, F1-score 0.66 and MCC 0.64, but model performance was still at a good level. CONCLUSION The current study suggests that ML based prediction models, using ePRO data as an input, can predict the presence and onset of irAEs with high accuracy, indicating that ePRO follow-up with ML algorithms could facilitate the detection of irAEs in ICI-treated cancer patients. The results should be validated with a larger dataset. TRIAL REGISTRATION Clinical Trials Register (NCT3928938), registered 26 April 2019.
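The onset model above pairs a high accuracy (0.96) with a much lower F1-score (0.66) and MCC (0.64). That pattern is typical when positive cases are rare. The sketch below computes all three metrics from confusion-matrix counts to show how they diverge. The function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the study.

```python
import math

def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, F1 and Matthews correlation coefficient
    computed from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC uses all four cells, so it is not inflated by a large true-negative count
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}

# Hypothetical imbalanced example: 15 positives among 100 cases
m = binary_metrics(tp=10, fp=5, fn=5, tn=80)
print(round(m["accuracy"], 3), round(m["f1"], 3), round(m["mcc"], 3))  # 0.9 0.667 0.608
```

In this illustration, 80 true negatives lift accuracy to 0.9, while F1 and MCC reflect the weaker performance on the minority class.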
Affiliation(s)
- Sanna Iivanainen
- Department of Oncology and Radiotherapy, Oulu University Hospital and MRC Oulu, OYS, P.B. 22, 90029, Oulu, Finland.
- Jussi P Koivunen
- Department of Oncology and Radiotherapy, Oulu University Hospital and MRC Oulu, OYS, P.B. 22, 90029, Oulu, Finland
35
Chen K, Xu H, Lei Y, Lio P, Li Y, Guo H, Ali Moni M. Integration and interplay of machine learning and bioinformatics approach to identify genetic interaction related to ovarian cancer chemoresistance. Brief Bioinform 2021; 22:6272796. [PMID: 33971668 DOI: 10.1093/bib/bbab100] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/13/2021] [Revised: 03/04/2021] [Accepted: 03/06/2021] [Indexed: 11/15/2022]
Abstract
Although chemotherapy is the first-line treatment for ovarian cancer (OCa) patients, chemoresistance (CR) decreases their progression-free survival. This paper investigates the genetic interaction (GI) related to OCa-CR. To reduce the complexity of establishing gene networks, individual signature genes related to OCa-CR are identified using a gradient boosting decision tree algorithm. Additionally, the genetic interaction coefficient (GIC) is proposed to quantify the correlation between two signature genes and explain their joint influence on OCa-CR. A gene pair with a high GIC is identified as a signature pair. A total of 24 signature gene pairs, comprising 10 individual signature genes, are selected, and the influence of these pairs on OCa-CR is explored. Finally, a signature gene pair-based prediction of OCa-CR is developed. The area under the curve (AUC) is a widely used performance measure for machine learning prediction. The AUC of the signature gene pair-based prediction reaches 0.9658, whereas that of the individual signature gene-based prediction is only 0.6823. The identified signature gene pairs not only build an efficient GI network of OCa-CR but also provide a promising approach to OCa-CR prediction. This improvement shows that our proposed method is a useful tool to investigate GI related to OCa-CR.
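The abstract compares predictors by the area under the ROC curve (AUC). One standard way to compute AUC uses the Mann-Whitney equivalence: AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The sketch below is a generic illustration of that definition, not the authors' pipeline. The scores and labels in the example are hypothetical.

```python
def auc_mann_whitney(scores, labels):
    """ROC AUC as P(score of random positive > score of random negative),
    counting ties as 0.5. Quadratic in sample size; fine for illustration."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores: 2 positives, 3 negatives
print(auc_mann_whitney([0.9, 0.8, 0.4, 0.35, 0.1], [1, 0, 1, 0, 0]))  # 5 of 6 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking, which helps put the reported gap between 0.6823 and 0.9658 in context.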
Affiliation(s)
- Kexin Chen
- School of Electronics Engineering and Computer Science, Peking University, 100871, Beijing, China
- Haoming Xu
- Department of Biomedical Engineering, Duke University, 27708, Durham, United States
- Yiming Lei
- School of Electronics Engineering and Computer Science, Peking University, 100871, Beijing, China
- Pietro Lio
- Computer Laboratory, University of Cambridge, CB3-0FD, Cambridge, United Kingdom
- Yuan Li
- Department of Obstetrics and Gynecology, Peking University Third Hospital, 100083, Beijing, China
- Hongyan Guo
- Department of Obstetrics and Gynecology, Peking University Third Hospital, 100083, Beijing, China
- Mohammad Ali Moni
- School of Public Health and Community Medicine, University of New South Wales, 2052, Sydney, Australia
36
Banegas-Luna AJ, Peña-García J, Iftene A, Guadagni F, Ferroni P, Scarpato N, Zanzotto FM, Bueno-Crespo A, Pérez-Sánchez H. Towards the Interpretability of Machine Learning Predictions for Medical Applications Targeting Personalised Therapies: A Cancer Case Survey. Int J Mol Sci 2021; 22:4394. [PMID: 33922356 PMCID: PMC8122817 DOI: 10.3390/ijms22094394] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/30/2021] [Revised: 04/16/2021] [Accepted: 04/20/2021] [Indexed: 12/18/2022]
Abstract
Artificial Intelligence is providing astonishing results, with medicine being one of its favourite playgrounds. Machine Learning and, in particular, Deep Neural Networks are behind this revolution. Among the most challenging targets of interest in medicine are cancer diagnosis and therapies but, to start this revolution, software tools need to be adapted to cover the new requirements. In this sense, learning tools are becoming a commodity but, to be able to assist doctors on a daily basis, it is essential to fully understand how models can be interpreted. In this survey, we analyse current machine learning models and other in-silico tools as applied to medicine (specifically, to cancer research), and we discuss their interpretability, performance and the input data they are fed with. Artificial neural networks (ANN), logistic regression (LR) and support vector machines (SVM) have been observed to be the preferred models. In addition, convolutional neural networks (CNNs), supported by the rapid development of graphic processing units (GPUs) and high-performance computing (HPC) infrastructures, are gaining importance when image processing is feasible. However, the interpretability of machine learning predictions so that doctors can understand them, trust them and gain useful insights for the clinical practice is still rarely considered, which is a factor that needs to be improved to enhance doctors' predictive capacity and achieve individualised therapies in the near future.
Affiliation(s)
- Antonio Jesús Banegas-Luna
- Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain
- Jorge Peña-García
- Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain
- Adrian Iftene
- Faculty of Computer Science, Universitatea Alexandru Ioan Cuza (UAIC), 700505 Iasi, Romania
- Fiorella Guadagni
- Interinstitutional Multidisciplinary Biobank (BioBIM), IRCCS San Raffaele Roma, 00166 Rome, Italy
- Department of Human Sciences and Promotion of the Quality of Life, San Raffaele Roma Open University, 00166 Rome, Italy
- Patrizia Ferroni
- Interinstitutional Multidisciplinary Biobank (BioBIM), IRCCS San Raffaele Roma, 00166 Rome, Italy
- Department of Human Sciences and Promotion of the Quality of Life, San Raffaele Roma Open University, 00166 Rome, Italy
- Noemi Scarpato
- Department of Human Sciences and Promotion of the Quality of Life, San Raffaele Roma Open University, 00166 Rome, Italy
- Fabio Massimo Zanzotto
- Dipartimento di Ingegneria dell’Impresa “Mario Lucertini”, University of Rome Tor Vergata, 00133 Rome, Italy
- Andrés Bueno-Crespo
- Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain
- Horacio Pérez-Sánchez
- Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain
37
Luxton JJ, McKenna MJ, Lewis AM, Taylor LE, Jhavar SG, Swanson GP, Bailey SM. Telomere Length Dynamics and Chromosomal Instability for Predicting Individual Radiosensitivity and Risk via Machine Learning. J Pers Med 2021; 11:188. [PMID: 33800260 PMCID: PMC8002073 DOI: 10.3390/jpm11030188] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/10/2021] [Revised: 02/23/2021] [Accepted: 03/02/2021] [Indexed: 12/11/2022]
Abstract
The ability to predict a cancer patient's response to radiotherapy and risk of developing adverse late health effects would greatly improve personalized treatment regimens and individual outcomes. Telomeres represent a compelling biomarker of individual radiosensitivity and risk, as exposure can result in dysfunctional telomere pathologies that coincidentally overlap with many radiation-induced late effects, ranging from degenerative conditions like fibrosis and cardiovascular disease to proliferative pathologies like cancer. Here, telomere length was longitudinally assessed in a cohort of fifteen prostate cancer patients undergoing Intensity Modulated Radiation Therapy (IMRT) utilizing Telomere Fluorescence in situ Hybridization (Telo-FISH). To evaluate genome instability and enhance predictions for individual patient risk of secondary malignancy, chromosome aberrations were assessed utilizing directional Genomic Hybridization (dGH) for high-resolution inversion detection. We present the first implementation of individual telomere length data in a machine learning model, XGBoost, trained on pre-radiotherapy (baseline) and in vitro exposed (4 Gy γ-rays) telomere length measurements, to predict post radiotherapy telomeric outcomes, which together with chromosomal instability provide insight into individual radiosensitivity and risk for radiation-induced late effects.
Affiliation(s)
- Jared J. Luxton
- Department of Environmental and Radiological Health Sciences, Colorado State University, Fort Collins, CO 80523, USA
- Cell and Molecular Biology Program, Colorado State University, Fort Collins, CO 80523, USA
- Miles J. McKenna
- Department of Environmental and Radiological Health Sciences, Colorado State University, Fort Collins, CO 80523, USA
- Cell and Molecular Biology Program, Colorado State University, Fort Collins, CO 80523, USA
- Aidan M. Lewis
- Department of Environmental and Radiological Health Sciences, Colorado State University, Fort Collins, CO 80523, USA
- Lynn E. Taylor
- Department of Environmental and Radiological Health Sciences, Colorado State University, Fort Collins, CO 80523, USA
- Sameer G. Jhavar
- Baylor Scott & White Medical Center, Temple, TX 76508, USA
- Gregory P. Swanson
- Baylor Scott & White Medical Center, Temple, TX 76508, USA
- Susan M. Bailey
- Department of Environmental and Radiological Health Sciences, Colorado State University, Fort Collins, CO 80523, USA
- Cell and Molecular Biology Program, Colorado State University, Fort Collins, CO 80523, USA
38
Clift AK, Le Lannou E, Tighe CP, Shah SS, Beatty M, Hyvärinen A, Lane SJ, Strauss T, Dunn DD, Lu J, Aral M, Vahdat D, Ponzo S, Plans D. Development and Validation of Risk Scores for All-Cause Mortality for a Smartphone-Based "General Health Score" App: Prospective Cohort Study Using the UK Biobank. JMIR Mhealth Uhealth 2021; 9:e25655. [PMID: 33591285 PMCID: PMC7925156 DOI: 10.2196/25655] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/11/2020] [Revised: 12/16/2020] [Accepted: 01/20/2021] [Indexed: 12/23/2022]
Abstract
Background Given the established links between an individual’s behaviors and lifestyle factors and potentially adverse health outcomes, univariate or simple multivariate health metrics and scores have been developed to quantify general health at a given point in time and estimate risk of negative future outcomes. However, these health metrics may be challenging for widespread use and are unlikely to be successful at capturing the broader determinants of health in the general population. Hence, there is a need for a multidimensional yet widely employable and accessible way to obtain a comprehensive health metric. Objective The objective of the study was to develop and validate a novel, easily interpretable, points-based health score (“C-Score”) derived from metrics measurable using smartphone components and iterations thereof that utilize statistical modeling and machine learning (ML) approaches. Methods A literature review was conducted to identify relevant predictor variables for inclusion in the first iteration of a points-based model. This was followed by a prospective cohort study in a UK Biobank population for the purposes of validating the C-Score and developing and comparatively validating variations of the score using statistical and ML models to assess the balance between expediency and ease of interpretability and model complexity. Primary and secondary outcome measures were discrimination of a points-based score for all-cause mortality within 10 years (Harrell c-statistic) and discrimination and calibration of Cox proportional hazards models and ML models that incorporate C-Score values (or raw data inputs) and other predictors to predict the risk of all-cause mortality within 10 years. Results The study cohort comprised 420,560 individuals. During a cohort follow-up of 4,526,452 person-years, there were 16,188 deaths from any cause (3.85%). The points-based model had good discrimination (c-statistic=0.66). 
There was a 31% relative reduction in risk of all-cause mortality per decile of increasing C-Score (hazard ratio of 0.69, 95% CI 0.663-0.675). A Cox model integrating age and C-Score had improved discrimination (8 percentage points; c-statistic=0.74) and good calibration. ML approaches did not offer improved discrimination over statistical modeling. Conclusions The novel health metric (“C-Score”) has good predictive capabilities for all-cause mortality within 10 years. Embedding the C-Score within a smartphone app may represent a useful tool for democratized, individualized health risk prediction. A simple Cox model using C-Score and age balances parsimony and accuracy of risk predictions and could be used to produce absolute risk estimations for app users.
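The study reports discrimination as Harrell's c-statistic. Below is a simplified sketch of that concordance index for right-censored data. A pair of subjects is comparable when the one with the shorter observed time had the event. The pair is concordant when that subject was also assigned the higher predicted risk. This is a generic illustration, not the study's code. It assumes a higher score means higher risk, so for a score such as the C-Score, where higher values mean lower risk, one would pass the negated score. As a simplification, pairs with tied follow-up times are skipped. The data in the example are hypothetical.

```python
from itertools import combinations

def harrell_c(times, events, risk_scores):
    """Simplified Harrell's concordance index for right-censored survival data.

    A pair is comparable when the subject with the shorter observed time had the
    event; it is concordant when that subject also has the higher predicted risk.
    Ties in predicted risk count 0.5; pairs with tied observed times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: tied follow-up times are not scored
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # censored before the other subject: ordering unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical follow-up (years), event indicators and predicted risks
print(harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.5, 0.7, 0.1]))  # 4 of 5 comparable pairs concordant
```

As a related check on the abstract's arithmetic, a hazard ratio of 0.69 per decile corresponds to a relative risk reduction of 1 - 0.69 = 0.31, the 31% quoted above.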
Affiliation(s)
- Ashley K Clift
- Department of Surgery and Cancer, Imperial College London, London, United Kingdom
- Christian P Tighe
- Department of Surgery and Cancer, Imperial College London, London, United Kingdom
- Huma Therapeutics, London, United Kingdom
- Jiahe Lu
- Huma Therapeutics, London, United Kingdom
- Mert Aral
- Huma Therapeutics, London, United Kingdom
- Dan Vahdat
- Huma Therapeutics, London, United Kingdom
- David Plans
- Huma Therapeutics, London, United Kingdom
- Department of Science, Innovation, Technology, and Entrepreneurship, University of Exeter, Exeter, United Kingdom
- Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
39
Zeng J, Lawrence WR, Yang J, Tian J, Li C, Lian W, He J, Qu H, Wang X, Liu H, Li G, Li G. Association between serum uric acid and obesity in Chinese adults: a 9-year longitudinal data analysis. BMJ Open 2021; 11:e041919. [PMID: 33550245 PMCID: PMC7908911 DOI: 10.1136/bmjopen-2020-041919] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 12/24/2022]
Abstract
OBJECTIVES Hyperuricaemia has been reported to be significantly associated with risk of obesity. However, previous studies on the association between serum uric acid (SUA) and body mass index (BMI) have yielded conflicting results. The present study examined the relationship between SUA and obesity among Chinese adults. METHODS Data were collected at Guangdong Second Provincial General Hospital in Guangzhou City, China, between January 2010 and December 2018. Participants with ≥2 medical check-ups were included in our analyses. Physical examination and laboratory measurement variables were obtained from the medical check-up system. Participants with hyperuricaemia were classified as the high SUA level group, and obesity was defined as BMI ≥28 kg/m². A logistic regression model was fitted to the baseline data. For all participants, a generalised estimating equation (GEE) model was used to assess the association between SUA and obesity, with data measured repeatedly over the 9-year study period. Subgroup analyses were performed by gender and age group. We calculated SUA cut-off values for obesity using receiver operating characteristic (ROC) curves. RESULTS A total of 15 959 participants (10 023 men and 5936 women) were included in this study, with an average age of 37.38 years (SD: 13.27) and an average SUA of 367.05 μmol/L (SD: 97.97) at baseline. In total, 1078 participants developed obesity over the 9-year period. The prevalence of obesity was approximately 14.2% among participants with a high SUA level. In logistic regression analysis at baseline, we observed a positive association between SUA and risk of obesity: OR=1.84 (95% CI: 1.77 to 1.90) per SD increase in SUA. Considering repeated measures over 9 years for all participants in the GEE model, the per-SD OR was 1.85 (95% CI: 1.77 to 1.91) for SUA, and the increased risk of obesity was greater for men (OR=1.45) and elderly participants (OR=1.01).
In subgroup analyses by gender and age, we observed significant associations between SUA and obesity, with higher risk in women (OR=2.35) and young participants (OR=1.87) than in men (OR=1.70) and elderly participants (OR=1.48). The SUA cut-off points for risk of obesity derived from ROC curves were approximately consistent with the international standard. CONCLUSIONS Our study observed that a higher SUA level was associated with an increased risk of obesity. More high-quality research is needed to further support these findings.
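The abstract reports odds ratios per SD of SUA and derives cut-offs from ROC curves. The sketch below illustrates both steps generically. A per-SD OR is exp(beta × SD), where beta is the logistic coefficient per raw unit. For the cut-off, the sketch picks the threshold that maximizes Youden's J (sensitivity + specificity - 1). That is an assumption: the abstract does not say which ROC criterion was used. The function names and SUA values are hypothetical. If the reported per-SD OR of 1.84 refers to the baseline SD of 97.97 μmol/L (also an assumption), the implied coefficient would be ln(1.84) / 97.97 ≈ 0.0062 per μmol/L.

```python
import math

def per_sd_odds_ratio(beta_per_unit: float, sd: float) -> float:
    """Odds ratio for a one-SD increase, given a logistic coefficient on the raw scale."""
    return math.exp(beta_per_unit * sd)

def youden_cutoff(values, labels):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1,
    treating value >= cutoff as test-positive. Returns (cutoff, J)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both outcome classes")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= cut and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < cut and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Hypothetical SUA values (umol/L) with obesity labels
print(youden_cutoff([300, 320, 350, 380, 400, 450], [0, 0, 0, 1, 0, 1]))  # (380, 0.75)
```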
Affiliation(s)
- Jie Zeng
- Center for Clinical Epidemiology and Methodology, Guangdong Second Provincial General Hospital, Guangdong, China
- Institute of Ultrasound in Musculoskeletal Sports Medicine, Guangdong Second Provincial General Hospital, Guangzhou, China
- Wayne R Lawrence
- Department of Epidemiology and Biostatistics, University at Albany State University of New York, Albany, New York, USA
- Jun Yang
- Institute for Environmental and Climate Research, Jinan University, Guangzhou, China
- Junzhang Tian
- Center for Clinical Epidemiology and Methodology, Guangdong Second Provincial General Hospital, Guangdong, China
- Cheng Li
- Guangdong Traditional Medical and Sports Injury Rehabilitation Research Institute, Guangdong Second Provincial General Hospital, Guangzhou, China
- Wanmin Lian
- Center for Information, Guangdong Second Provincial General Hospital, Guangzhou, China
- Jingjun He
- Center for Health Management and Examination, Guangdong Second Provincial General Hospital, Guangzhou, China
- Hongying Qu
- Center for Clinical Epidemiology and Methodology, Guangdong Second Provincial General Hospital, Guangdong, China
- Center for Health Management and Examination, Guangdong Second Provincial General Hospital, Guangzhou, China
- Xiaojie Wang
- Center for Clinical Epidemiology and Methodology, Guangdong Second Provincial General Hospital, Guangdong, China
- Hongmei Liu
- Institute of Ultrasound in Musculoskeletal Sports Medicine, Guangdong Second Provincial General Hospital, Guangzhou, China
- Department of Ultrasound, Guangdong Second Provincial General Hospital, Guangzhou, China
- Guanming Li
- Center for Clinical Epidemiology and Methodology, Guangdong Second Provincial General Hospital, Guangdong, China
- Guowei Li
- Center for Clinical Epidemiology and Methodology, Guangdong Second Provincial General Hospital, Guangdong, China
- Department of Health Research Methods, Evidence, and Impact (HEI), McMaster University, Hamilton, Ontario, Canada
40
Martínez-Velasco A, Perez-Ortiz AC, Antonio-Aguirre B, Martínez-Villaseñor L, Lira-Romero E, Palacio-Pastrana C, Zenteno JC, Ramirez I, Zepeda-Palacio C, Mendoza-Velásquez C, Camacho-Ordóñez A, Ortiz Bibriesca DM, Estrada-Mena FJ. Assessment of CFH and HTRA1 polymorphisms in age-related macular degeneration using classic and machine-learning approaches. Ophthalmic Genet 2020; 41:539-547. [PMID: 32838591 DOI: 10.1080/13816810.2020.1804945] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/03/2020] [Accepted: 07/26/2020] [Indexed: 10/23/2022]
Abstract
BACKGROUND CFH and HTRA1 are pivotal genes driving increased risk for age-related macular degeneration (AMD) among several populations. Here, we performed a hospital-based case-control study to evaluate the effects of three single nucleotide polymorphisms (SNPs) among Hispanics from Mexico. MATERIALS AND METHODS A total of 122 cases and 249 controls were genotyped using TaqMan probes. Experienced ophthalmologists diagnosed AMD following the American Association of Ophthalmology guidelines. We studied CFH (rs1329428, rs203687) and HTRA1 (rs11200638) SNPs thoroughly using logistic regression models (assuming different modes of inheritance) and machine learning (ML)-based methods. RESULTS HTRA1 rs11200638 is the polymorphism most significantly associated with AMD in our study population. In a multivariate regression model adjusted for clinically and statistically meaningful covariates, the A/G and A/A genotypes increased the odds of disease by factors of 2.32 and 7.81, respectively (P < .05), suggesting a multiplicative effect of the polymorphic A allele. Furthermore, this observation remains statistically meaningful in the allelic, dominant, and recessive models, and in the ML algorithms. When stratifying by phenotype, this polymorphism was significantly associated with increased odds of geographic atrophy (GA) in a recessive mode of inheritance (12.4, P < .05). CONCLUSIONS In sum, this work supports a strong association between HTRA1 genetic variants and AMD in Hispanics from Mexico, especially with GA. Moreover, ML was able to replicate the results of conventional biostatistical methods in an unbiased manner.
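The abstract tests the same SNP under several modes of inheritance. In practice, each mode is a different numeric coding of the genotype before it enters a logistic regression. The sketch below shows the usual additive, dominant and recessive codings for a risk allele, plus a crude odds ratio from a 2x2 table. It assumes the risk allele is "A", as the abstract suggests, and genotypes written as "A/G". The function names and counts are hypothetical. The study's reported ORs are covariate-adjusted, so a crude 2x2 OR would not reproduce them.

```python
def encode_genotype(genotype: str, risk_allele: str, model: str) -> int:
    """Code a biallelic genotype such as 'A/G' under a given inheritance model."""
    copies = genotype.split("/").count(risk_allele)
    if model == "additive":
        return copies              # 0, 1 or 2 copies of the risk allele
    if model == "dominant":
        return int(copies >= 1)    # carriers vs non-carriers
    if model == "recessive":
        return int(copies == 2)    # risk-allele homozygotes vs everyone else
    raise ValueError(f"unknown model: {model}")

def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Crude (unadjusted) odds ratio from a 2x2 table."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)

for g in ("G/G", "A/G", "A/A"):
    print(g, [encode_genotype(g, "A", m) for m in ("additive", "dominant", "recessive")])
# G/G [0, 0, 0]
# A/G [1, 1, 0]
# A/A [2, 1, 1]
```

The recessive coding is the one under which the abstract reports the geographic atrophy association.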
Affiliation(s)
- Andric C Perez-Ortiz
- Universidad Panamericana. Facultad De Ciencias De La Salud. Ciudad De México, México
- Transplant Center, Division of Surgery, Massachusetts General Hospital, Boston, MA, USA
- Bani Antonio-Aguirre
- Universidad Panamericana. Facultad De Ciencias De La Salud. Ciudad De México, México
- Esmeralda Lira-Romero
- Universidad Panamericana. Facultad De Ciencias De La Salud. Ciudad De México, México
- Claudia Palacio-Pastrana
- Department of Microsurgery of the Anterior Segment, Fundación Hospital Nuestra Señora De La Luz, IAP, Ciudad De México, México
- Department of Microsurgery of the Anterior Segment, Clínicas Oftalmologicas Salauno Salud, Hamburgo, Ciudad de México, México
- Israel Ramirez
- Universidad Panamericana. Facultad De Ciencias De La Salud. Ciudad De México, México
- Claudia Zepeda-Palacio
- Department of Microsurgery of the Anterior Segment, Fundación Hospital Nuestra Señora De La Luz, IAP, Ciudad De México, México
- Cristina Mendoza-Velásquez
- Department of Microsurgery of the Anterior Segment, Fundación Hospital Nuestra Señora De La Luz, IAP, Ciudad De México, México
- Azyadeh Camacho-Ordóñez
- Department of Microsurgery of the Anterior Segment, Fundación Hospital Nuestra Señora De La Luz, IAP, Ciudad De México, México
- F Javier Estrada-Mena
- Universidad Panamericana. Facultad De Ciencias De La Salud. Ciudad De México, México
41
Wang R, Luo W, Liu Z, Liu W, Liu C, Liu X, Zhu H, Li R, Song J, Hu X, Han S, Qiu W. Integration of the Extreme Gradient Boosting model with electronic health records to enable the early diagnosis of multiple sclerosis. Mult Scler Relat Disord 2020; 47:102632. [PMID: 33276240 DOI: 10.1016/j.msard.2020.102632] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/27/2020] [Revised: 10/31/2020] [Accepted: 11/13/2020] [Indexed: 10/22/2022]
Abstract
BACKGROUND Delayed multiple sclerosis (MS) diagnoses are not uncommon, so an early diagnostic tool is urgently needed. We aimed to develop an effective tool based on electronic health records and machine learning techniques to recognize MS patients early among hospital visitors in China. METHODS Two case sets were collected from January 2016 to December 2018. The training set had 239 MS cases and 1142 controls, and the test set had 23 MS cases and 92 controls. The utility of Extreme Gradient Boosting (XGBoost), Random Forest (RF), Naive Bayes, K-nearest-neighbor (KNN) and Support Vector Machine (SVM) in the early diagnosis of MS was evaluated by the area under the receiver operating characteristic curve, precision, recall, specificity, accuracy and F1 score. RESULTS XGBoost performed best and was used to generate the results. Thirty-four variables highly relevant to MS diagnosis were included in the XGBoost model, and their relative importance for MS was ranked. The training set recall was 0.632, with a precision of 0.576, and the test set recall was 0.609, with a precision of 0.609. Our study found that 61%, 51%, and 49% of the patients could be diagnosed with MS 1, 2, and 3 years earlier than their actual diagnosis, respectively. CONCLUSIONS A diagnostic tool for early MS recognition based on the XGBoost model and electronic health records was developed to help reduce diagnostic delays in MS.
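Because the abstract gives the number of MS cases in each set, the rounded recall and precision pin down the underlying confusion-matrix counts. The sketch below searches for integer true-positive and false-positive counts whose ratios round to the reported three-decimal values. Under the assumption that the figures were rounded to three decimals, the test set (23 MS cases, 115 subjects) admits only 14 true positives and 9 false positives. When precision equals recall, F1 equals both, so the test-set F1 would also be 0.609. These counts are inferred here, not reported by the authors. The function names are hypothetical.

```python
def consistent_counts(n_pos, n_total, recall, precision, decimals=3):
    """Return (TP, FP) pairs whose recall and precision round to the reported values.

    n_pos is the number of true cases; n_total bounds the number of predicted positives."""
    matches = []
    for tp in range(1, n_pos + 1):
        if round(tp / n_pos, decimals) != recall:
            continue
        # predicted positives = TP + FP, at most the size of the whole set
        for predicted_pos in range(tp, n_total + 1):
            if round(tp / predicted_pos, decimals) == precision:
                matches.append((tp, predicted_pos - tp))
    return matches

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(consistent_counts(23, 115, 0.609, 0.609))    # [(14, 9)]  test set
print(consistent_counts(239, 1381, 0.632, 0.576))  # [(151, 111)]  training set
```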
Affiliation(s)
- Ruoning Wang
- Department of Continuing Medical Education, Peking University Health Science Center, Beijing, China
- Wenjing Luo
- Department of Neurology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Zifeng Liu
- Department of Clinical Data Center, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Weilong Liu
- Medical Data Operation Department, Chengdu Medlinker Science and Technology Co., Ltd, Beijing, China
- Chunxin Liu
- Department of Neurology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Xun Liu
- Department of Clinical Data Center, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- He Zhu
- Department of Real-World Evidence and Pharmacoeconomics, International Research Center for Medicinal Administration, Peking University, Beijing, China
- Rui Li
- Department of Neurology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Jiafang Song
- Department of Real-World Evidence and Pharmacoeconomics, International Research Center for Medicinal Administration, Peking University, Beijing, China
- Xueqiang Hu
- Department of Neurology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Sheng Han
- Department of Real-World Evidence and Pharmacoeconomics, International Research Center for Medicinal Administration, Peking University, Beijing, China
- Wei Qiu
- Department of Neurology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
42
Schperberg AV, Boichard A, Tsigelny IF, Richard SB, Kurzrock R. Machine learning model to predict oncologic outcomes for drugs in randomized clinical trials. Int J Cancer 2020; 147:2537-2549. [PMID: 32745254 DOI: 10.1002/ijc.33240] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/31/2020] [Revised: 07/15/2020] [Accepted: 07/17/2020] [Indexed: 11/12/2022]
Abstract
Predicting oncologic outcome is challenging due to the diversity of cancer histologies and the complex network of underlying biological factors. In this study, we determine whether machine learning (ML) can extract meaningful associations between oncologic outcome and clinical trial, drug-related biomarker and molecular profile information. We analyzed therapeutic clinical trials corresponding to 1102 oncologic outcomes from 104 758 cancer patients with advanced colorectal adenocarcinoma, pancreatic adenocarcinoma, melanoma and non-small-cell lung cancer. For each intervention arm, a dataset with the following attributes was curated: line of treatment; the number of cytotoxic chemotherapies, small-molecule inhibitors, or monoclonal antibody agents; drug class; molecular alteration status of the clinical arm's population; cancer type; probability of drug sensitivity (PDS) (integrating the status of genomic, transcriptomic and proteomic biomarkers in the population of interest); and outcome. A total of 467 progression-free survival (PFS) and 369 overall survival (OS) data points were used as training sets to build our ML (random forest) model. Cross-validation on the PFS and OS sets yielded correlation coefficients (r) of 0.82 and 0.70, respectively (outcome vs model's parameters). A total of 156 PFS and 110 OS data points were used as test sets. The Spearman correlation (rs) between predicted and actual outcomes was statistically significant (PFS: rs = 0.879, OS: rs = 0.878, P < .0001). The better outcome arm was predicted in 81% (PFS: N = 59/73, z = 5.24, P < .0001) and 71% (OS: N = 37/52, z = 2.91, P = .004) of randomized trials. The success of our algorithm in predicting clinical outcome suggests that it may be exploitable as a model to optimize clinical trial design with pharmaceutical agents.
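The test-set agreement above is reported as a Spearman rank correlation (rs). This is the Pearson correlation computed on ranks, with tied values sharing the mean of their positions. The sketch below is a generic, dependency-free illustration, not the authors' code. The predicted and actual values in the example are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical predicted vs actual median PFS (months) for four trial arms
print(round(spearman_rho([5.1, 7.3, 3.2, 9.8], [4.0, 8.1, 3.5, 8.0]), 3))  # 0.8
```

Because it uses only ranks, the statistic measures whether the model orders arms correctly, not whether it predicts the outcome values themselves.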
Affiliation(s)
- Alexander V Schperberg: CureMatch, Inc., San Diego, California, USA; Department of Mechanical and Aerospace Engineering, University of California Los Angeles, Los Angeles, California, USA
- Amélie Boichard: Center for Personalized Cancer Therapy and Division of Hematology and Oncology, University of California San Diego Moores Cancer Center, La Jolla, California, USA
- Igor F Tsigelny: CureMatch, Inc., San Diego, California, USA; San Diego Supercomputer Center, University of California San Diego, La Jolla, California, USA; Department of Neurosciences, University of California San Diego, La Jolla, California, USA
- Stéphane B Richard: CureMatch, Inc., San Diego, California, USA; Oncodesign, Inc., New York, New York, USA
- Razelle Kurzrock: Center for Personalized Cancer Therapy and Division of Hematology and Oncology, University of California San Diego Moores Cancer Center, La Jolla, California, USA
43
Liang G, Fan W, Luo H, Zhu X. The emerging roles of artificial intelligence in cancer drug development and precision therapy. Biomed Pharmacother 2020; 128:110255. [DOI: 10.1016/j.biopha.2020.110255] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 03/07/2020] [Revised: 04/22/2020] [Accepted: 05/10/2020] [Indexed: 12/12/2022]
44
Cancer Prevention Using Machine Learning, Nudge Theory and Social Impact Bond. Int J Environ Res Public Health 2020; 17:790. [PMID: 32012838 PMCID: PMC7037430 DOI: 10.3390/ijerph17030790] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/02/2019] [Revised: 01/21/2020] [Accepted: 01/23/2020] [Indexed: 12/17/2022]
Abstract
There have been prior attempts to use machine learning to address issues in the medical field, particularly in diagnosis using medical images and in developing therapeutic regimens. However, few cases have demonstrated the usefulness of machine learning for enhancing the health consciousness of patients or the general public, which is necessary to bring about behavioral change. This paper describes a novel case in which the uptake rate for colorectal cancer examinations increased significantly owing to the application of machine learning and nudge theory. The paper also discusses the effectiveness of social impact bonds (SIBs) as a scheme for realizing these applications. During a healthcare SIB project conducted in the city of Hachioji, Tokyo, machine learning based on historical data from designated periodical health examinations, digitalized medical insurance receipts, and colorectal cancer examination records was used to identify the population segments to whom the examination should be recommended. Of the 12,162 people to whom the examination was recommended, 3264 (26.8%) received it, exceeding the upper expectation limit of the initial plan (19.0%). We conclude that this was a successful case that stimulated discussion on applying this approach to wider regions and more diseases.
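The headline uptake figures can be checked with simple arithmetic. The hypothetical helper below divides the examinations received by the recommendations issued and reports the gap against the planned 19.0% upper limit in percentage points. It assumes the planned rate applies to the same 12,162-person denominator, which the abstract does not state explicitly.

```python
def uptake_summary(recommended: int, examined: int, planned_rate: float) -> tuple[float, float]:
    """Hypothetical helper: return (observed uptake rate, gap vs plan in percentage points)."""
    rate = examined / recommended
    gap_pp = (rate - planned_rate) * 100
    return rate, gap_pp


# Figures from the abstract: 3264 of 12,162 recommended people were examined.
rate, gap_pp = uptake_summary(12_162, 3_264, 0.190)
print(f"uptake {rate:.1%}, {gap_pp:.1f} points above plan")  # uptake 26.8%, 7.8 points above plan
```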