1
Caruso CM, Guarrasi V, Ramella S, Soda P. A deep learning approach for overall survival prediction in lung cancer with missing values. Computer Methods and Programs in Biomedicine 2024; 254:108308. [PMID: 38968829 DOI: 10.1016/j.cmpb.2024.108308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 06/24/2024] [Accepted: 06/24/2024] [Indexed: 07/07/2024]
Abstract
BACKGROUND AND OBJECTIVE In the field of lung cancer research, particularly in the analysis of overall survival (OS), artificial intelligence (AI) serves crucial roles with specific aims. Given the prevalent issue of missing data in the medical domain, our primary objective is to develop an AI model capable of dynamically handling this missing data. Additionally, we aim to leverage all accessible data, effectively analyzing both uncensored patients who have experienced the event of interest and censored patients who have not, by embedding a specialized technique within our AI model, not commonly utilized in other AI tasks. Through the realization of these objectives, our model aims to provide precise OS predictions for non-small cell lung cancer (NSCLC) patients, thus overcoming these significant challenges. METHODS We present a novel approach to survival analysis with missing values in the context of NSCLC, which exploits the strengths of the transformer architecture to account only for available features without requiring any imputation strategy. More specifically, this model tailors the transformer architecture to tabular data by adapting its feature embedding and masked self-attention to mask missing data and fully exploit the available ones. By making use of ad-hoc designed losses for OS, it is able to account for both censored and uncensored patients, as well as changes in risks over time. RESULTS We compared our method with state-of-the-art models for survival analysis coupled with different imputation strategies. We evaluated the results obtained over a period of 6 years using different time granularities obtaining a Ct-index, a time-dependent variant of the C-index, of 71.97, 77.58 and 80.72 for time units of 1 month, 1 year and 2 years, respectively, outperforming all state-of-the-art methods regardless of the imputation method used. 
CONCLUSIONS The results show that our model not only outperforms the state-of-the-art's performance but also simplifies the analysis in the presence of missing data, by effectively eliminating the need to identify the most appropriate imputation strategy for predicting OS in NSCLC patients.
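The masking idea described in METHODS can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `masked_attention`, the `observed` flag list, and the use of plain Python lists are assumptions made for illustration. It shows scaled dot-product attention in which missing features are excluded from the softmax, so no imputed value is ever needed.

```python
import math

def masked_attention(queries, keys, values, observed):
    """Scaled dot-product attention over feature tokens, skipping missing ones.

    queries, keys, values: one vector (list of floats) per feature token.
    observed: one bool per feature; False marks a missing value (assumed convention).
    """
    if not any(observed):
        raise ValueError("at least one feature must be observed")
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Score only observed features; leaving a feature out of the softmax
        # is equivalent to assigning it a score of minus infinity.
        scores = {
            j: sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for j, k in enumerate(keys) if observed[j]
        }
        top = max(scores.values())  # subtract the max for numerical stability
        weights = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(weights.values())
        out = [0.0] * len(values[0])
        for j, w in weights.items():
            for i, v in enumerate(values[j]):
                out[i] += (w / total) * v
        outputs.append(out)
    return outputs
```

Because a missing feature receives zero attention weight, whatever placeholder sits in its value vector cannot influence the output, which is the property that removes the need for an imputation step.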
Affiliation(s)
- Camillo Maria Caruso
  - Research Unit of Computer Systems and Bioinformatics, Department of Engineering, Università Campus Bio-Medico di Roma, Rome, Italy.
- Valerio Guarrasi
  - Research Unit of Computer Systems and Bioinformatics, Department of Engineering, Università Campus Bio-Medico di Roma, Rome, Italy.
- Sara Ramella
  - Operative Research Unit of Radiation Oncology, Fondazione Policlinico Universitario Campus Bio-Medico, Rome, Italy.
- Paolo Soda
  - Research Unit of Computer Systems and Bioinformatics, Department of Engineering, Università Campus Bio-Medico di Roma, Rome, Italy; Department of Diagnostics and Intervention, Radiation Physics, Biomedical Engineering, Umeå University, Umeå, Sweden.
2
Fujimoto D, Hayashi H, Murotani K, Toi Y, Yokoyama T, Kato T, Yamaguchi T, Tanaka K, Miura S, Tamiya M, Tachihara M, Shukuya T, Tsuchiya-Kawano Y, Sato Y, Ikeda S, Sakata S, Masuda T, Takemoto S, Otsubo K, Shibaki R, Makino M, Okamoto I, Yamamoto N. Prediction of prognosis in lung cancer using machine learning with inter-institutional generalizability: A multicenter cohort study (WJOG15121L: REAL-WIND). Lung Cancer 2024; 194:107896. [PMID: 39043076 DOI: 10.1016/j.lungcan.2024.107896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 05/19/2024] [Accepted: 07/14/2024] [Indexed: 07/25/2024]
Abstract
OBJECTIVES Predicting the prognosis of lung cancer is crucial for providing optimal medical care. However, a method to accurately predict the overall prognosis in patients with stage IV lung cancer, even with the use of machine learning, has not been established. Moreover, the inter-institutional generalizability of such algorithms remains unexplored. This study aimed to establish machine learning-based algorithms with inter-institutional generalizability to predict prognosis. MATERIALS AND METHODS This multicenter, retrospective, hospital-based cohort study included consecutive patients with stage IV lung cancer who were randomly categorized into the training and independent test cohorts with a 2:1 ratio, respectively. The primary metric to assess algorithm performance was the area under the receiver operating characteristic curve in the independent test cohort. To assess the inter-institutional generalizability of the algorithms, we investigated their ability to predict patient outcomes in the remaining facility after being trained using data from 15 other facilities. RESULTS Overall, 6,751 patients (median age, 70 years) were enrolled, and 1,515 (22 %) showed mutated epidermal growth factor receptor expression. The median overall survival was 16.6 (95 % confidence interval, 15.9-17.5) months. Algorithm performance metrics in the test cohort showed that the areas under the curves were 0.90 (95 % confidence interval, 0.88-0.91), 0.85 (0.84-0.87), 0.83 (0.81-0.85), and 0.85 (0.82-0.87) at 180, 360, 720, and 1,080 predicted survival days, respectively. The performance test of 16 algorithms for investigating inter-institutional generalizability showed median areas under the curves of 0.87 (range, 0.84-0.92), 0.84 (0.78-0.88), 0.84 (0.76-0.89), and 0.84 (0.75-0.90) at 180, 360, 720, and 1,080 days, respectively. 
CONCLUSION This study developed machine learning algorithms that could accurately predict the prognosis in patients with stage IV lung cancer with high inter-institutional generalizability. This can enhance the accuracy of prognosis prediction and support informed and shared decision-making in clinical settings.
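The inter-institutional check described above (train on 15 facilities, test on the remaining one, repeated for each of the 16 facilities) follows a leave-one-group-out pattern. The sketch below shows only the splitting step; the function name `leave_one_facility_out` and the per-patient facility label list are hypothetical and not taken from the study.

```python
def leave_one_facility_out(facility_ids):
    """Yield (held_out, train_idx, test_idx) once per distinct facility.

    facility_ids: one facility label per patient (hypothetical input format).
    """
    for held_out in sorted(set(facility_ids)):
        # Train on every patient from the other facilities...
        train_idx = [i for i, f in enumerate(facility_ids) if f != held_out]
        # ...and evaluate only on patients from the held-out facility.
        test_idx = [i for i, f in enumerate(facility_ids) if f == held_out]
        yield held_out, train_idx, test_idx
```

With 16 distinct facility labels the generator yields 16 splits, one per trained algorithm, and the per-split AUCs can then be summarized by their median and range as in the abstract.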
Affiliation(s)
- Daichi Fujimoto
  - Internal Medicine III, Wakayama Medical University, Wakayama, Japan
- Hidetoshi Hayashi
  - Department of Medical Oncology, Kindai University Faculty of Medicine, Osaka, Japan.
- Yukihiro Toi
  - Department of Pulmonary Medicine, Sendai Kousei Hospital, Sendai, Japan
- Toshihide Yokoyama
  - Department of Respiratory Medicine, Kurashiki Central Hospital, Kurashiki, Japan
- Terufumi Kato
  - Department of Thoracic Oncology, Kanagawa Cancer Center, Yokohama, Japan
- Teppei Yamaguchi
  - Department of Thoracic Oncology, Aichi Cancer Center Hospital, Nagoya, Japan
- Kaoru Tanaka
  - Department of Medical Oncology, Kindai University Faculty of Medicine, Osaka, Japan
- Satoru Miura
  - Department of Internal Medicine, Niigata Cancer Center Hospital, Niigata, Japan
- Motohiro Tamiya
  - Department of Thoracic Oncology, Osaka International Cancer Institute, Osaka, Japan
- Motoko Tachihara
  - Division of Respiratory Medicine, Department of Internal Medicine, Kobe University Graduate School of Medicine, Kobe, Japan
- Takehito Shukuya
  - Department of Respiratory Medicine, Juntendo University, Graduate School of Medicine, Tokyo, Japan
- Yuko Tsuchiya-Kawano
  - Department of Respiratory Medicine, Kitakyushu Municipal Medical Center, Kitakyushu, Japan
- Yuki Sato
  - Department of Respiratory Medicine, Kobe City Medical Center General Hospital, Kobe, Japan
- Satoshi Ikeda
  - Department of Respiratory Medicine, Kanagawa Cardiovascular and Respiratory Center, Yokohama, Japan
- Shinya Sakata
  - Department of Respiratory Medicine, Kumamoto University Hospital, Kumamoto, Japan
- Takeshi Masuda
  - Department of Respiratory Medicine, Hiroshima University Hospital, Hiroshima, Japan
- Shinnosuke Takemoto
  - Department of Respiratory Medicine, Nagasaki University Graduate School of Biomedical Sciences, Nagasaki, Japan
- Kohei Otsubo
  - Department of Respiratory Medicine, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
- Ryota Shibaki
  - Internal Medicine III, Wakayama Medical University, Wakayama, Japan
- Miki Makino
  - NTT Data Corp., Res. & Dev. Headquarters, Tokyo, Japan
- Isamu Okamoto
  - Department of Respiratory Medicine, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
3
Li Y, Yang AY, Marelli A, Li Y. MixEHR-SurG: A joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records. J Biomed Inform 2024; 153:104638. [PMID: 38631461 DOI: 10.1016/j.jbi.2024.104638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 03/07/2024] [Accepted: 04/03/2024] [Indexed: 04/19/2024]
Abstract
Survival models can help medical practitioners to evaluate the prognostic importance of clinical variables to patient outcomes such as mortality or hospital readmission and subsequently design personalized treatment regimes. Electronic Health Records (EHRs) hold the promise for large-scale survival analysis based on systematically recorded clinical features for each patient. However, existing survival models either do not scale to high dimensional and multi-modal EHR data or are difficult to interpret. In this study, we present a supervised topic model called MixEHR-SurG to simultaneously integrate heterogeneous EHR data and model survival hazard. Our contributions are threefold: (1) integrating EHR topic inference with Cox proportional hazards likelihood; (2) integrating patient-specific topic hyperparameters using the PheCode concepts such that each topic can be identified with exactly one PheCode-associated phenotype; (3) multi-modal survival topic inference. This leads to a highly interpretable survival topic model that can infer PheCode-specific phenotype topics associated with patient mortality. We evaluated MixEHR-SurG using a simulated dataset and two real-world EHR datasets: the Quebec Congenital Heart Disease (CHD) data consisting of 8211 subjects with 75,187 outpatient claim records of 1767 unique ICD codes; the MIMIC-III consisting of 1458 subjects with multi-modal EHR records. Compared to the baselines, MixEHR-SurG achieved a superior dynamic AUROC for mortality prediction, with a mean AUROC score of 0.89 in the simulation dataset and a mean AUROC of 0.645 on the CHD dataset. Qualitatively, MixEHR-SurG associates severe cardiac conditions with high mortality risk among the CHD patients after the first heart failure hospitalization and critical brain injuries with increased mortality among the MIMIC-III patients after their ICU discharge.
Together, the integration of the Cox proportional hazards model and EHR topic inference in MixEHR-SurG not only leads to competitive mortality prediction but also meaningful phenotype topics for in-depth survival analysis. The software is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-SurG.
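For readers unfamiliar with the Cox proportional hazards likelihood mentioned in contribution (1), the following is a minimal sketch of the standard partial log-likelihood. It is not code from the MixEHR-SurG repository: the function name and argument layout are assumptions, the risk scores stand in for whatever linear predictor a model produces, and tied event times are handled in the simple Breslow style.

```python
import math

def cox_partial_log_likelihood(times, events, risk_scores):
    """Cox partial log-likelihood (Breslow-style handling of tied times).

    times: observed follow-up time per subject.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    risk_scores: the model's log-hazard score per subject (assumed given).
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_set = [risk_scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        total += risk_scores[i] - math.log(sum(math.exp(r) for r in risk_set))
    return total
```

Censored subjects never add a term of their own, yet they still appear in the denominators of earlier events, which is how the likelihood uses partially observed follow-up.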
Affiliation(s)
- Yixuan Li
  - Department of Mathematics and Statistics, McGill University, Montreal, Canada; Mila - Quebec AI institute, Montreal, Canada
- Archer Y Yang
  - Department of Mathematics and Statistics, McGill University, Montreal, Canada; Mila - Quebec AI institute, Montreal, Canada; School of Computer Science, McGill University, Montreal, Canada.
- Ariane Marelli
  - McGill Adult Unit for Congenital Heart Disease (MAUDE Unit), McGill University of Health Centre, Montreal, Canada.
- Yue Li
  - Mila - Quebec AI institute, Montreal, Canada; School of Computer Science, McGill University, Montreal, Canada.
4
Schreidah CM, Gordon ER, Adeuyan O, Chen C, Lapolla BA, Kent JA, Reynolds GB, Fahmy LM, Weng C, Tatonetti NP, Chase HS, Pe’er I, Geskin LJ. Current status of artificial intelligence methods for skin cancer survival analysis: a scoping review. Front Med (Lausanne) 2024; 11:1243659. [PMID: 38711781 PMCID: PMC11070520 DOI: 10.3389/fmed.2024.1243659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 02/22/2024] [Indexed: 05/08/2024] Open
Abstract
Skin cancer mortality rates continue to rise, and survival analysis is increasingly needed to understand who is at risk and what interventions improve outcomes. However, current statistical methods are limited by an inability to synthesize multiple data types, such as patient genetics, clinical history, demographics, and pathology, and to reveal significant multimodal relationships through predictive algorithms. Advances in computing power and data science enabled the rise of artificial intelligence (AI), which synthesizes vast amounts of data and applies algorithms that enable personalized diagnostic approaches. Here, we analyze AI methods used in skin cancer survival analysis, focusing on supervised learning, unsupervised learning, deep learning, and natural language processing. We illustrate strengths and weaknesses of these approaches with examples. Our PubMed search yielded 14 publications meeting inclusion criteria for this scoping review. Most publications focused on melanoma, particularly histopathologic interpretation with deep learning. Such concentration on a single type of skin cancer amid increasing focus on deep learning highlights growing areas for innovation; however, it also demonstrates opportunity for additional analysis that addresses other types of cutaneous malignancies and expands the scope of prognostication to combine genetic, histopathologic, and clinical data. Moreover, researchers may leverage multiple AI methods for enhanced benefit in analyses. Expanding AI to this arena may enable improved survival analysis, targeted treatments, and outcomes.
Affiliation(s)
- Celine M. Schreidah
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, United States
- Emily R. Gordon
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, United States
- Oluwaseyi Adeuyan
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, United States
- Caroline Chen
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, United States
- Brigit A. Lapolla
  - Department of Dermatology, Columbia University Irving Medical Center, New York, NY, United States
- Joshua A. Kent
  - Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, United States
- Lauren M. Fahmy
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, United States
- Chunhua Weng
  - The Data Science Institute, Columbia University, New York, NY, United States
  - Department of Biomedical Informatics, Columbia University, New York, NY, United States
- Nicholas P. Tatonetti
  - The Data Science Institute, Columbia University, New York, NY, United States
  - Department of Biomedical Informatics, Columbia University, New York, NY, United States
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States
  - Cedars-Sinai Cancer, Cedars-Sinai Medical Center, Los Angeles, CA, United States
- Herbert S. Chase
  - Department of Biomedical Informatics, Columbia University, New York, NY, United States
- Itsik Pe’er
  - The Data Science Institute, Columbia University, New York, NY, United States
  - Department of Systems Biology, Columbia University, New York, NY, United States
  - Department of Computer Science, Columbia University, New York, NY, United States
- Larisa J. Geskin
  - Department of Dermatology, Columbia University Irving Medical Center, New York, NY, United States
5
Oh W, Jayaraman P, Tandon P, Chaddha US, Kovatch P, Charney AW, Glicksberg BS, Nadkarni GN. A novel method leveraging time series data to improve subphenotyping and application in critically ill patients with COVID-19. Artif Intell Med 2024; 148:102750. [PMID: 38325922 PMCID: PMC10864255 DOI: 10.1016/j.artmed.2023.102750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 12/12/2023] [Accepted: 12/14/2023] [Indexed: 02/09/2024]
Abstract
Computational subphenotyping, a data-driven approach to understanding disease subtypes, is a prominent topic in medical research. Numerous ongoing studies are dedicated to developing advanced computational subphenotyping methods for cross-sectional data. However, the potential of time-series data has been underexplored until now. Here, we propose a Multivariate Levenshtein Distance (MLD) that can account for correlation among multiple discrete features over time-series data. Our algorithm has two distinct components: it integrates an optimal threshold score to enhance the sensitivity in discriminating between pairs of instances, and the MLD itself. We applied the proposed distance metric with the k-means clustering algorithm to derive temporal subphenotypes from time-series data of biomarkers and treatment administrations from 1039 critically ill patients with COVID-19 and compared its effectiveness to standard methods. In conclusion, the Multivariate Levenshtein Distance metric is a novel method to quantify the distance from multiple discrete features over time-series data and demonstrates superior clustering performance among competing time-series distance metrics.
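The abstract does not define the MLD, so the sketch below is only an illustration of the general idea of an edit distance over multivariate discrete sequences, not the authors' metric. It assumes each time step is a tuple of discrete feature values, that inserting or deleting a time step costs 1, and that substituting one step for another costs the fraction of features that differ. The function name and these costs are assumptions, and the paper's threshold-score component is not modeled.

```python
def multivariate_edit_distance(seq_a, seq_b):
    """Levenshtein-style distance between two sequences of feature tuples.

    Each element is a tuple of discrete values observed at one time step.
    Costs (assumed): insert/delete = 1, substitute = share of differing features.
    """
    def substitution_cost(x, y):
        return sum(a != b for a, b in zip(x, y)) / len(x)

    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = float(i)  # delete every step of seq_a
    for j in range(cols):
        dist[0][j] = float(j)  # insert every step of seq_b
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + 1.0,  # deletion
                dist[i][j - 1] + 1.0,  # insertion
                dist[i - 1][j - 1] + substitution_cost(seq_a[i - 1], seq_b[j - 1]),
            )
    return dist[-1][-1]
```

A pairwise matrix of such distances could then feed a clustering routine; note that standard k-means assumes Euclidean geometry, so how the paper couples its metric with k-means is not something this sketch reproduces.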
Affiliation(s)
- Wonsuk Oh
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Pushkala Jayaraman
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Pranai Tandon
  - Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Udit S Chaddha
  - Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Patricia Kovatch
  - Department of Scientific Computing, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alexander W Charney
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Benjamin S Glicksberg
  - Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Character Biosciences, New York, NY, USA
- Girish N Nadkarni
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
6
Wen J, Zhang T, Ye S, Zhang P, Han R, Chen X, Huang R, Chen A, Li Q. Quantitative patient graph analysis for transient ischemic attack risk factor distribution based on electronic medical records. Heliyon 2024; 10:e22766. [PMID: 38163107 PMCID: PMC10755279 DOI: 10.1016/j.heliyon.2023.e22766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Revised: 10/26/2023] [Accepted: 11/19/2023] [Indexed: 01/03/2024] Open
Abstract
A transient ischemic attack (TIA) affects millions of people worldwide. Although TIA risk factors have been identified individually, a systemic quantitative analysis of all health factors relevant to TIA using electronic medical records (EMR) remains lacking. This study employed a data-driven approach, leveraging hospital EMR data to create a TIA patient health factor graph. This graph consisted of 737 TIA and 737 control patient nodes, 740 health factor nodes, and over 33,000 relations between patients and factors. For all health factors in the graph, the connection delta ratios (CDRs) were determined and ranked, generating a quantitative distribution of TIA health factors. A literature review confirmed 56 risk factors in the distribution and unveiled a potential new risk factor "rhinosinusitis" for future validation. Moreover, the patient graph was visualized together with the TIA knowledge graph in the Unified Medical Language System. This integration enables clinicians to access and visualize patient data and international standard knowledge within a unified graph. In conclusion, graph CDR analysis can effectively quantify the distribution of TIA risk factors. The resulting TIA risk factor distribution might be instrumental in developing new risk prediction machine learning models for screening and early detection of TIA.
Affiliation(s)
- Jian Wen
  - Guilin Medical University Affiliated Hospital, Guilin, Guangxi, China
- Tianmei Zhang
  - Guilin Medical University Affiliated Hospital, Guilin, Guangxi, China
- Shangrong Ye
  - Guilin Medical University Affiliated Hospital, Guilin, Guangxi, China
- Peng Zhang
  - Guilin Medical University Affiliated Hospital, Guilin, Guangxi, China
- Ruobing Han
  - Guilin Medical University Affiliated Hospital, Guilin, Guangxi, China
- Xiaowang Chen
  - Guilin Medical University Affiliated Hospital, Guilin, Guangxi, China
- Ran Huang
  - West China Hospital, Chengdu, Sichuan, China
- Anjun Chen
  - Learning Health Community, Palo Alto, CA, USA
- Qinghua Li
  - Guilin Medical University Affiliated Hospital, Guilin, Guangxi, China
7
Hrebinko KA, Huckaby LV, Silver D, Ratnayake C, Hong Y, Curtis B, Handzel RM, van der Windt DJ, Dadashzadeh ER. Predictors of acute incisional hernia incarceration at initial hernia diagnosis on computed tomography. J Trauma Acute Care Surg 2024; 96:129-136. [PMID: 37335920 DOI: 10.1097/ta.0000000000003994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2023]
Abstract
BACKGROUND Acute incisional hernia incarceration is associated with high morbidity and mortality yet there is little evidence to guide which patients will benefit most from prophylactic repair. We explored baseline computed tomography (CT) characteristics associated with incarceration. METHODS A case-control study design was utilized to explore adults (≥18 years) diagnosed with an incisional hernia between 2010 and 2017 at a single institution with a 1-year minimum follow-up. Computed tomography imaging at the time of initial hernia diagnosis was examined. Following propensity score matching for baseline characteristics, multivariable logistic regression was performed to identify independent predictors associated with acute incarceration. RESULTS A total of 532 patients (27.26% male, mean 61.55 years) were examined, of whom 238 experienced an acute incarceration. Between two well-matched cohorts with and without incarceration, the presence of small bowel in the hernia sac (odds ratio [OR], 7.50; 95% confidence interval [CI], 3.35-16.38), increasing sac height (OR, 1.34; 95% CI, 1.10-1.64), more acute hernia angle (OR, 0.98 per degree; 95% CI, 0.97-0.99), decreased fascial defect width (OR, 0.68; 95% CI, 0.58-0.81), and greater outer abdominal fat (OR, 1.28; 95% CI, 1.02-1.60) were associated with acute incarceration. Using threshold analysis, a hernia angle of <91 degrees and a sac height of >3.25 cm were associated with increased incarceration risk. CONCLUSION Computed tomography features present at the time of hernia diagnosis provide insight into later acute incarceration risk. Improved understanding of acute incisional hernia incarceration can guide selection for prophylactic repair and thereby may mitigate the excess morbidity associated with incarceration. LEVEL OF EVIDENCE Prognostic and Epidemiological; Level III.
Affiliation(s)
- Katherine A Hrebinko
  - From the Department of Surgery (K.A.H., L.V.H., D.S., Y.H., R.M.H.), University of Pittsburgh Medical Center; Department of Emergency Medicine, University of Pennsylvania (C.R.), Philadelphia, PA; Department of Internal Medicine, University of Michigan (B.C.), Ann Arbor, MI; Department of Surgery (D.J.W.), University of Michigan, Ann Arbor, Michigan; and Section of Vascular Surgery, Department of Surgery (E.R.D.), Washington University School of Medicine in St. Louis, St. Louis, Missouri
8
Çalışkan M, Tazaki K. AI/ML advances in non-small cell lung cancer biomarker discovery. Front Oncol 2023; 13:1260374. [PMID: 38148837 PMCID: PMC10750392 DOI: 10.3389/fonc.2023.1260374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 11/16/2023] [Indexed: 12/28/2023] Open
Abstract
Lung cancer is the leading cause of cancer deaths among both men and women, representing approximately 25% of cancer fatalities each year. The treatment landscape for non-small cell lung cancer (NSCLC) is rapidly evolving due to the progress made in biomarker-driven targeted therapies. While advancements in targeted treatments have improved survival rates for NSCLC patients with actionable biomarkers, long-term survival remains low, with an overall 5-year relative survival rate below 20%. Artificial intelligence/machine learning (AI/ML) algorithms have shown promise in biomarker discovery, yet NSCLC-specific studies capturing the clinical challenges targeted and emerging patterns identified using AI/ML approaches are lacking. Here, we employed a text-mining approach and identified 215 studies that reported potential biomarkers of NSCLC using AI/ML algorithms. We catalogued these studies with respect to BEST (Biomarkers, EndpointS, and other Tools) biomarker sub-types and summarized emerging patterns and trends in AI/ML-driven NSCLC biomarker discovery. We anticipate that our comprehensive review will contribute to the current understanding of AI/ML advances in NSCLC biomarker research and provide an important catalogue that may facilitate clinical adoption of AI/ML-derived biomarkers.
Affiliation(s)
- Minal Çalışkan
  - Translational Science Department, Precision Medicine Function, Daiichi Sankyo, Inc., Basking Ridge, NJ, United States
- Koichi Tazaki
  - Translational Science Department I, Precision Medicine Function, Daiichi Sankyo, Tokyo, Japan
9
Matsumoto K, Nohara Y, Sakaguchi M, Takayama Y, Fukushige S, Soejima H, Nakashima N, Kamouchi M. Temporal Generalizability of Machine Learning Models for Predicting Postoperative Delirium Using Electronic Health Record Data: Model Development and Validation Study. JMIR Perioper Med 2023; 6:e50895. [PMID: 37883164 PMCID: PMC10636625 DOI: 10.2196/50895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 09/24/2023] [Accepted: 09/29/2023] [Indexed: 10/27/2023] Open
Abstract
BACKGROUND Although machine learning models demonstrate significant potential in predicting postoperative delirium, the advantages of their implementation in real-world settings remain unclear and require a comparison with conventional models in practical applications. OBJECTIVE The objective of this study was to validate the temporal generalizability of decision tree ensemble and sparse linear regression models for predicting delirium after surgery compared with that of the traditional logistic regression model. METHODS The health record data of patients hospitalized at an advanced emergency and critical care medical center in Kumamoto, Japan, were collected electronically. We developed a decision tree ensemble model using extreme gradient boosting (XGBoost) and a sparse linear regression model using least absolute shrinkage and selection operator (LASSO) regression. To evaluate the predictive performance of the model, we used the area under the receiver operating characteristic curve (AUROC) and the Matthews correlation coefficient (MCC) to measure discrimination and the slope and intercept of the regression between predicted and observed probabilities to measure calibration. The Brier score was evaluated as an overall performance metric. We included 11,863 consecutive patients who underwent surgery with general anesthesia between December 2017 and February 2022. The patients were divided into a derivation cohort before the COVID-19 pandemic and a validation cohort during the COVID-19 pandemic. Postoperative delirium was diagnosed according to the confusion assessment method. RESULTS A total of 6497 patients (68.5, SD 14.4 years, women n=2627, 40.4%) were included in the derivation cohort, and 5366 patients (67.8, SD 14.6 years, women n=2105, 39.2%) were included in the validation cohort. Regarding discrimination, the XGBoost model (AUROC 0.87-0.90 and MCC 0.34-0.44) did not significantly outperform the LASSO model (AUROC 0.86-0.89 and MCC 0.34-0.41). 
The logistic regression model (AUROC 0.84-0.88, MCC 0.33-0.40, slope 1.01-1.19, intercept -0.16 to 0.06, and Brier score 0.06-0.07), with 8 predictors (age, intensive care unit, neurosurgery, emergency admission, anesthesia time, BMI, blood loss during surgery, and use of an ambulance) achieved good predictive performance. CONCLUSIONS The XGBoost model did not significantly outperform the LASSO model in predicting postoperative delirium. Furthermore, a parsimonious logistic model with a few important predictors achieved comparable performance to machine learning models in predicting postoperative delirium.
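Two of the metrics named in METHODS, the Matthews correlation coefficient and the Brier score, have short closed forms. The sketch below implements their standard textbook definitions in plain Python; the function names and the 0/1 label encoding are assumptions for illustration, and nothing here reproduces the study's data or results.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """MCC from binary labels and binary predictions (both coded 0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
```

MCC needs thresholded predictions and so reflects discrimination at one operating point, whereas the Brier score is computed on the probabilities themselves and is sensitive to both discrimination and calibration, which is why the abstract treats it as an overall performance metric.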
Collapse
Affiliation(s)
| | - Yasunobu Nohara
- Big Data Science and Technology, Faculty of Advanced Science and Technology, Kumamoto University, Kumamoto, Japan
| | - Mikako Sakaguchi
- Department of Nursing, Saiseikai Kumamoto Hospital, Kumamoto, Japan
| | - Yohei Takayama
- Department of Nursing, Saiseikai Kumamoto Hospital, Kumamoto, Japan
| | - Syota Fukushige
- Department of Inspection, Saiseikai Kumamoto Hospital, Kumamoto, Japan
| | - Hidehisa Soejima
- Institute for Medical Information Research and Analysis, Saiseikai Kumamoto Hospital, Kumamoto, Japan
| | - Naoki Nakashima
- Medical Information Center, Kyushu University Hospital, Fukuoka, Japan
| | - Masahiro Kamouchi
- Department of Health Care Administration and Management, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
- Center for Cohort Studies, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
10
Vidula N, Peppercorn J. Clicking Away to Capture Cancer Staging-The Benefits and Challenges of Completing Standardized Staging Modules. JCO Oncol Pract 2023; 19:835-838. [PMID: 37729599 DOI: 10.1200/op.23.00500] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/09/2023] [Accepted: 08/15/2023] [Indexed: 09/22/2023] Open
Abstract
This article by Neelima Vidula and Jeffrey Peppercorn (@MGHCancerCenter) explores the benefits and challenges of completing standardized modules for cancer staging, as well as opportunities to improve module compliance while reducing burnout.
Affiliation(s)
- Neelima Vidula
  - Massachusetts General Hospital Cancer Center, Harvard Medical School, Boston, MA
- Jeffrey Peppercorn
  - Massachusetts General Hospital Cancer Center, Harvard Medical School, Boston, MA
11
Wang X, Ayakulangara Panickan V, Cai T, Xiong X, Cho K, Cai T, Bourgeois FT. Endovascular Aneurysm Repair Devices as a Use Case for Postmarketing Surveillance of Medical Devices. JAMA Intern Med 2023; 183:1090-1097. [PMID: 37603326 PMCID: PMC10442779 DOI: 10.1001/jamainternmed.2023.3562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 05/31/2023] [Indexed: 08/22/2023]
Abstract
Importance The US Food and Drug Administration (FDA) is building a national postmarketing surveillance system for medical devices, moving to a "total product life cycle" approach whereby more limited premarketing data are balanced with postmarketing surveillance to capture rare adverse events and long-term safety issues. Objective To assess the methodological requirements and feasibility of postmarketing device surveillance using endovascular aneurysm repair devices (EVARs), which have been the subject of safety concerns, using clinical data from a large health care system. Design, Setting, and Participants This retrospective cohort study included patients with electronic health record (EHR) data in the Veterans Affairs Corporate Data Warehouse. Exposure Implantation of an AFX Endovascular AAA System (AFX) device (any of 3 iterations) or a non-AFX comparator EVAR device from January 1, 2011, to December 21, 2021. Main Outcomes and Measures The primary outcomes were rates of type III endoleaks and all-cause mortality; and rates of these outcomes associated with AFX devices compared with non-AFX devices, assessed using Cox proportional hazards regression models and doubly robust causal modeling. Information on type III endoleaks was available only as free-text mentions in clinical notes, while all-cause mortality data could be extracted using structured data. Device-specific information required by the FDA is ascertained using unique device identifiers (UDIs), which include factors such as model numbers, catalog numbers, and manufacturer-specific product codes. The availability of UDIs in EHRs was assessed. Results In total, 13 941 patients (mean [SD] age, 71.8 [7.4] years) received 1 of the devices of interest (AFX with Strata [AFX-S]: 718 patients [5.2%]; AFX with Duraply [AFX-D]: 404 patients [2.9%]; or AFX2: 682 patients [4.9%]), and 12 137 (87.1%) received non-AFX devices. 
The UDIs were not recorded in the EHR for any patient with an AFX device, and partial UDIs were available for 19 patients (0.1%) with a non-AFX device. This necessitated the development of advanced natural language processing tools to define the cohort of patients for analysis. The study identified a significantly higher risk of type III endoleaks at 5 years among patients receiving any of the AFX device iterations, including the most recent version, AFX2 (11.6%; 95% CI, 8.1%-15.1%) compared with that among patients with non-AFX devices (5.7%; 95% CI, 2.2%-9.2%; absolute risk difference, 5.9%; 95% CI, 2.3%-9.4%). However, there was no significantly higher all-cause mortality for any of the AFX device iterations, including for AFX2 (19.0%; 95% CI, 16.0%-22.0%) compared with non-AFX devices (18.0%; 95% CI, 15.0%-21.0%; absolute risk difference, 1.0%; 95% CI, -2.1% to 4.1%). Conclusions and Relevance The findings of this cohort study suggest that clinical data can be used for the postmarketing device surveillance required by the FDA. The study also highlights ongoing challenges to performing larger-scale surveillance, including lack of consistent use of UDIs and insufficient relevant structured data to efficiently capture certain outcomes of interest.
Affiliation(s)
- Xuan Wang
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
  - Department of Population Health Sciences, University of Utah, Salt Lake City, Utah
- Tianrun Cai
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
- Xin Xiong
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Kelly Cho
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts
  - Division of Population Health and Data Sciences, Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston
- Tianxi Cai
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
  - Division of Population Health and Data Sciences, Massachusetts Veterans Epidemiology Research and Information Center, VA Boston Healthcare System, Boston
- Florence T. Bourgeois
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts
  - Department of Pediatrics, Harvard Medical School, Boston, Massachusetts
12
Morís DI, de Moura J, Marcos PJ, Rey EM, Novo J, Ortega M. Comprehensive analysis of clinical data for COVID-19 outcome estimation with machine learning models. Biomed Signal Process Control 2023; 84:104818. [PMID: 36915863 PMCID: PMC9995330 DOI: 10.1016/j.bspc.2023.104818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Revised: 11/22/2022] [Accepted: 03/05/2023] [Indexed: 03/11/2023]
Abstract
COVID-19 is a global threat to healthcare systems due to the rapid spread of the pathogen that causes it. In such a situation, clinicians must make important decisions in an environment where medical resources can be insufficient. Computer-aided diagnosis systems can be very useful here, not only for supporting clinical decisions but also for performing relevant analyses that allow clinicians to better understand the disease and the factors that identify high-risk patients. For those purposes, in this work we use several machine learning algorithms to estimate the outcome of COVID-19 patients given their clinical information. In particular, we perform 2 different studies: the first estimates whether the patient is at low or high risk of death, whereas the second estimates whether the patient needs hospitalization. The results of these analyses show the most relevant features for each studied scenario, as well as the classification performance of the considered machine learning models. In particular, the XGBoost algorithm is able to estimate the need for hospitalization of a patient with an AUC-ROC of 0.8415 ± 0.0217, while it can also estimate the risk of death with an AUC-ROC of 0.7992 ± 0.0104. The results demonstrate the great potential of the proposal to identify those patients who need a greater amount of medical resources because they are at higher risk. This provides healthcare services with a tool to better manage their resources.
Affiliation(s)
- Daniel I Morís
  - Centro de Investigación CITIC, Universidade da Coruña, Campus de Elviña, s/n, 15071 A Coruña, Spain
  - Grupo VARPA, Instituto de Investigación Biomédica de A Coruña (INIBIC), Universidade da Coruña, Xubias de Arriba, 84, 15006 A Coruña, Spain
- Joaquim de Moura
  - Centro de Investigación CITIC, Universidade da Coruña, Campus de Elviña, s/n, 15071 A Coruña, Spain
  - Grupo VARPA, Instituto de Investigación Biomédica de A Coruña (INIBIC), Universidade da Coruña, Xubias de Arriba, 84, 15006 A Coruña, Spain
- Pedro J Marcos
  - Dirección Asistencial y Servicio de Neumología, Complejo Hospitalario Universitario de A Coruña (CHUAC), Instituto de Investigación Biomédica de A Coruña (INIBIC), Universidade da Coruña, Sergas, 15006 A Coruña, Spain
- Enrique Míguez Rey
  - Grupo de Investigación en Virología Clínica, Sección de Enfermedades Infecciosas, Servicio de Medicina Interna, Instituto de Investigación Biomédica de A Coruña (INIBIC), Área Sanitaria A Coruña y CEE (ASCC), SERGAS, 15006 A Coruña, Spain
- Jorge Novo
  - Centro de Investigación CITIC, Universidade da Coruña, Campus de Elviña, s/n, 15071 A Coruña, Spain
  - Grupo VARPA, Instituto de Investigación Biomédica de A Coruña (INIBIC), Universidade da Coruña, Xubias de Arriba, 84, 15006 A Coruña, Spain
- Marcos Ortega
  - Centro de Investigación CITIC, Universidade da Coruña, Campus de Elviña, s/n, 15071 A Coruña, Spain
  - Grupo VARPA, Instituto de Investigación Biomédica de A Coruña (INIBIC), Universidade da Coruña, Xubias de Arriba, 84, 15006 A Coruña, Spain
13
Eskofier BM, Klucken J. Predictive Models for Health Deterioration: Understanding Disease Pathways for Personalized Medicine. Annu Rev Biomed Eng 2023; 25:131-156. [PMID: 36854259 DOI: 10.1146/annurev-bioeng-110220-030247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2023]
Abstract
Artificial intelligence (AI) and machine learning (ML) methods are currently widely employed in medicine and healthcare. A PubMed search returns more than 100,000 articles on these topics published between 2018 and 2022 alone. Notwithstanding several recent reviews in various subfields of AI and ML in medicine, we have yet to see a comprehensive review around the methods' use in longitudinal analysis and prediction of an individual patient's health status within a personalized disease pathway. This review seeks to fill that gap. After an overview of the AI and ML methods employed in this field and of specific medical applications of models of this type, the review discusses the strengths and limitations of current studies and looks ahead to future strands of research in this field. We aim to enable interested readers to gain a detailed impression of the research currently available and accordingly plan future work around predictive models for deterioration in health status.
Affiliation(s)
- Bjoern M Eskofier
  - Machine Learning and Data Analytics Lab, Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Jochen Klucken
  - Digital Medicine Group, Luxembourg Centre for Systems Biomedicine, Université du Luxembourg, Belvaux, Luxembourg
  - Digital Medicine Group, Department of Precision Health, Luxembourg Institute of Health, Strassen, Luxembourg
  - Centre Hospitalier de Luxembourg, Luxembourg City, Luxembourg
14
Fatapour Y, Abiri A, Kuan EC, Brody JP. Development of a Machine Learning Model to Predict Recurrence of Oral Tongue Squamous Cell Carcinoma. Cancers (Basel) 2023; 15:2769. [PMID: 37345106 DOI: 10.3390/cancers15102769] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/12/2023] [Revised: 05/10/2023] [Accepted: 05/12/2023] [Indexed: 06/23/2023] Open
Abstract
Despite diagnostic advancements, the development of reliable prognostic systems for assessing the risk of cancer recurrence remains a challenge. In this study, we developed a novel framework to generate highly representative machine-learning prediction models for oral tongue squamous cell carcinoma (OTSCC) recurrence. We identified cases of 5- and 10-year OTSCC recurrence from the SEER database. Four classification models were trained using the H2O.ai platform, and their performance was assessed according to accuracy, recall, precision, and the area under the curve (AUC) of their receiver operating characteristic (ROC) curves. Feature importance was studied by evaluating Shapley additive explanation contribution plots. Of the 130,979 patients studied, 36,042 (27.5%) were female, and the mean (SD) age was 58.2 (13.7) years. The Gradient Boosting Machine model performed the best, achieving 81.8% accuracy and 97.7% precision for 5-year prediction. Moreover, 10-year predictions demonstrated 80.0% accuracy and 94.0% precision. The number of prior tumors, patient age, the site of cancer recurrence, and tumor histology were the most significant predictors. The implementation of our novel SEER framework enabled the successful identification of patients with OTSCC recurrence, with which highly accurate and sensitive prediction models were generated. Thus, we demonstrate our framework's potential for application in various cancers to build generalizable screening tools to predict tumor recurrence.
Affiliation(s)
- Yasaman Fatapour
  - Department of Biomedical Engineering, University of California, Irvine, CA 92617, USA
- Arash Abiri
  - Department of Biomedical Engineering, University of California, Irvine, CA 92617, USA
  - Department of Otolaryngology-Head and Neck Surgery, University of California, Irvine, CA 92604, USA
- Edward C Kuan
  - Department of Otolaryngology-Head and Neck Surgery, University of California, Irvine, CA 92604, USA
- James P Brody
  - Department of Biomedical Engineering, University of California, Irvine, CA 92617, USA
15
Sondhi A, Rich AS, Wang S, Leek JT. Postprediction Inference for Clinical Characteristics Extracted With Machine Learning on Electronic Health Records. JCO Clin Cancer Inform 2023; 7:e2200174. [PMID: 37159871 PMCID: PMC10281422 DOI: 10.1200/cci.22.00174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 02/10/2023] [Accepted: 03/14/2023] [Indexed: 05/11/2023] Open
Abstract
PURPOSE Real-world data (RWD) derived from electronic health records (EHRs) are often used to understand population-level relationships between patient characteristics and cancer outcomes. Machine learning (ML) methods enable researchers to extract characteristics from unstructured clinical notes, and represent a more cost-effective and scalable approach than manual expert abstraction. These extracted data are then used in epidemiologic or statistical models as if they were abstracted observations. Analytical results derived from extracted data in this way may differ from those given by abstracted data, and the magnitude of this difference is not directly informed by standard ML performance metrics. METHODS In this paper, we define the task of postprediction inference, which is to recover similar estimation and inference from an ML-extracted variable that would be obtained from abstracting the variable. We consider fitting a Cox proportional hazards model that uses a binary ML-extracted variable as a covariate and evaluate four approaches for postprediction inference in this setting. The first two approaches only require the ML-predicted probability, while the latter two additionally require a labeled (human abstracted) validation data set. RESULTS Our results for both simulated data and EHR-derived RWD from a national cohort demonstrate that we can improve inference from ML-extracted variables by leveraging a limited amount of labeled data. CONCLUSION We describe and evaluate methods for fitting statistical models using ML-extracted variables subject to model error. We show that estimation and inference is generally valid when using extracted data from high-performing ML models. More complex methods that incorporate auxiliary labeled data provide further improvements.
Affiliation(s)
- Siruo Wang
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
- Jeffery T. Leek
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
16
Maier D, Vehreschild JJ, Uhl B, Meyer S, Berger-Thürmel K, Boerries M, Braren R, Grünwald V, Hadaschik B, Palm S, Singer S, Stuschke M, Juárez D, Delpy P, Lambarki M, Hummel M, Engels C, Andreas S, Gökbuget N, Ihrig K, Burock S, Keune D, Eggert A, Keilholz U, Schulz H, Büttner D, Löck S, Krause M, Esins M, Ressing F, Schuler M, Brandts C, Brucker DP, Husmann G, Oellerich T, Metzger P, Voigt F, Illert AL, Theobald M, Kindler T, Sudhof U, Reckmann A, Schwinghammer F, Nasseh D, Weichert W, von Bergwelt-Baildon M, Bitzer M, Malek N, Öner Ö, Schulze-Osthoff K, Bartels S, Haier J, Ammann R, Schmidt AF, Guenther B, Janning M, Kasper B, Loges S, Stilgenbauer S, Kuhn P, Tausch E, Runow S, Kerscher A, Neumann M, Breu M, Lablans M, Serve H. Profile of the multicenter cohort of the German Cancer Consortium's Clinical Communication Platform. Eur J Epidemiol 2023; 38:573-586. [PMID: 37017830 PMCID: PMC10073785 DOI: 10.1007/s10654-023-00990-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 03/09/2023] [Indexed: 04/06/2023]
Abstract
Treatment concepts in oncology are becoming increasingly personalized and diverse. Successively, changes in standards of care mandate continuous monitoring of patient pathways and clinical outcomes based on large, representative real-world data. The German Cancer Consortium's (DKTK) Clinical Communication Platform (CCP) provides such an opportunity. Connecting fourteen university hospital-based cancer centers, the CCP relies on a federated IT infrastructure sourcing data from facility-based cancer registry units and biobanks. Federated analyses resulted in a cohort of 600,915 patients, of whom 232,991 were incident since 2013 and have comprehensive documentation available. In addition to demographic data (i.e., age at diagnosis: 2.0% 0-20 years, 8.3% 21-40 years, 30.9% 41-60 years, 50.1% 61-80 years, 8.8% 81+ years; and gender: 45.2% female, 54.7% male, 0.1% other) and diagnoses (five most frequent tumor origins: 22,523 prostate, 18,409 breast, 15,575 lung, 13,964 skin/malignant melanoma, 9005 brain), the cohort dataset contains information about therapeutic interventions and response assessments and is connected to 287,883 liquid and tissue biosamples. Focusing on diagnoses and therapy sequences, showcase analyses of diagnosis-specific sub-cohorts (pancreas, larynx, kidney, thyroid gland) demonstrate the analytical opportunities offered by the cohort's data. Due to its data granularity and size, the cohort is a potential catalyst for translational cancer research. It provides rapid access to comprehensive patient groups and may improve the understanding of the clinical course of various (even rare) malignancies. Therefore, the cohort may serve as a decision-making tool for clinical trial design and contributes to the evaluation of scientific findings under real-world conditions.
Affiliation(s)
- Daniel Maier
  - University Hospital Frankfurt, Frankfurt, Germany
  - German Cancer Consortium (DKTK), Partner Site Frankfurt and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Jörg Janne Vehreschild
  - University Hospital Frankfurt, Frankfurt, Germany
  - Department of Internal Medicine I, University Hospital of Cologne, Cologne, Germany
  - German Centre for Infection Research (DZIF), Partner Site Bonn-Cologne, Cologne, Germany
- Barbara Uhl
  - University Hospital Frankfurt, Frankfurt, Germany
  - German Cancer Consortium (DKTK), Partner Site Frankfurt and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Sandra Meyer
  - University Hospital Frankfurt, Frankfurt, Germany
  - German Cancer Consortium (DKTK), Partner Site Frankfurt and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Karin Berger-Thürmel
  - University Hospital Munich, LMU Munich, Munich, Germany
  - German Cancer Consortium (DKTK), Partner Site Munich and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Melanie Boerries
  - Faculty of Medicine, Institute of Medical Bioinformatics and Systems Medicine, Medical Center, University of Freiburg, Freiburg, Germany
  - German Cancer Consortium (DKTK), Partner Site Freiburg and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Rickmer Braren
  - German Cancer Consortium (DKTK), Partner Site Munich and German Cancer Research Center (DKFZ), Heidelberg, Germany
  - School of Medicine, Technical University Munich, Munich, Germany
- Viktor Grünwald
  - West German Cancer Center, University Hospital Essen, Essen, Germany
  - German Cancer Consortium (DKTK), Partner Site Essen and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Boris Hadaschik
  - West German Cancer Center, University Hospital Essen, Essen, Germany
  - German Cancer Consortium (DKTK), Partner Site Essen and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Stefan Palm
  - West German Cancer Center, University Hospital Essen, Essen, Germany
  - German Cancer Consortium (DKTK), Partner Site Essen and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Susanne Singer
  - University Medical Center of the Johannes Gutenberg University, Mainz, Germany
  - German Cancer Consortium (DKTK), Partner Site Mainz and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Martin Stuschke
  - West German Cancer Center, University Hospital Essen, Essen, Germany
  - German Cancer Consortium (DKTK), Partner Site Essen and German Cancer Research Center (DKFZ), Heidelberg, Germany
- David Juárez
  - German Cancer Research Center (DKFZ), Federated Information Systems, Heidelberg, Germany
  - German Cancer Consortium (DKTK), Partner Site Heidelberg and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Pierre Delpy
  - German Cancer Research Center (DKFZ), Federated Information Systems, Heidelberg, Germany
  - German Cancer Consortium (DKTK), Partner Site Heidelberg and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Mohamed Lambarki
  - German Cancer Research Center (DKFZ), Federated Information Systems, Heidelberg, Germany
  - German Cancer Consortium (DKTK), Partner Site Heidelberg and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Michael Hummel
  - Charité Universitätsmedizin Berlin, Berlin, Germany
  - German Cancer Consortium (DKTK), Partner Site Berlin and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Cäcilia Engels
  - Charité Universitätsmedizin Berlin, Berlin, Germany
  - German Cancer Consortium (DKTK), Partner Site Berlin and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Stefanie Andreas
  - University Hospital Frankfurt, Frankfurt, Germany
  - German Cancer Consortium (DKTK), Partner Site Frankfurt and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Nicola Gökbuget
  - University Hospital Frankfurt, Frankfurt, Germany
  - German Cancer Consortium (DKTK), Partner Site Frankfurt and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Kristina Ihrig
  - University Hospital Frankfurt, Frankfurt, Germany
  - German Cancer Consortium (DKTK), Partner Site Frankfurt and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Susen Burock
  - Charité Universitätsmedizin Berlin, Berlin, Germany
  - German Cancer Consortium (DKTK), Partner Site Berlin and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Dietmar Keune
  - Charité Universitätsmedizin Berlin, Berlin, Germany
  - German Cancer Consortium (DKTK), Partner Site Berlin and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Angelika Eggert
  - Charité Universitätsmedizin Berlin, Berlin, Germany
  - German Cancer Consortium (DKTK), Partner Site Berlin and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Ulrich Keilholz
  - Charité Universitätsmedizin Berlin, Berlin, Germany
  - German Cancer Consortium (DKTK), Partner Site Berlin and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Hagen Schulz
  - University Hospital and Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
  - German Cancer Consortium (DKTK), Partner Site Dresden and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Daniel Büttner
  - University Hospital and Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Steffen Löck
  - University Hospital and Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
  - German Cancer Consortium (DKTK), Partner Site Dresden and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Mechthild Krause
  - University Hospital and Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
  - German Cancer Consortium (DKTK), Partner Site Dresden and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Mirko Esins
  - West German Cancer Center, University Hospital Essen, Essen, Germany
- Frank Ressing
  - West German Cancer Center, University Hospital Essen, Essen, Germany
- Martin Schuler
  - West German Cancer Center, University Hospital Essen, Essen, Germany
  - German Cancer Consortium (DKTK), Partner Site Essen and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Christian Brandts
  - University Hospital Frankfurt, Frankfurt, Germany
  - German Cancer Consortium (DKTK), Partner Site Frankfurt and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Daniel P Brucker
  - University Hospital Frankfurt, Frankfurt, Germany
  - German Cancer Consortium (DKTK), Partner Site Frankfurt and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Gabriele Husmann
  - University Hospital Frankfurt, Frankfurt, Germany
  - German Cancer Consortium (DKTK), Partner Site Frankfurt and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Thomas Oellerich
  - University Hospital Frankfurt, Frankfurt, Germany
  - German Cancer Consortium (DKTK), Partner Site Frankfurt and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Patrick Metzger
  - Faculty of Medicine, Institute of Medical Bioinformatics and Systems Medicine, Medical Center, University of Freiburg, Freiburg, Germany
  - German Cancer Consortium (DKTK), Partner Site Freiburg and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Frederik Voigt
  - Faculty of Medicine, Institute of Medical Bioinformatics and Systems Medicine, Medical Center, University of Freiburg, Freiburg, Germany
  - German Cancer Consortium (DKTK), Partner Site Freiburg and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Anna L Illert
  - German Cancer Consortium (DKTK), Partner Site Freiburg and German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Department of Medicine I, Faculty of Medicine, Medical Center, University of Freiburg, Freiburg, Germany
- Matthias Theobald
  - University Medical Center of the Johannes Gutenberg University, Mainz, Germany
  - German Cancer Consortium (DKTK), Partner Site Mainz and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Thomas Kindler
  - University Medical Center of the Johannes Gutenberg University, Mainz, Germany
  - German Cancer Consortium (DKTK), Partner Site Mainz and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Ursula Sudhof
  - University Medical Center of the Johannes Gutenberg University, Mainz, Germany
- Achim Reckmann
  - University Medical Center of the Johannes Gutenberg University, Mainz, Germany
  - German Cancer Consortium (DKTK), Partner Site Mainz and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Felix Schwinghammer
  - University Hospital Munich, LMU Munich, Munich, Germany
  - German Cancer Consortium (DKTK), Partner Site Munich and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Daniel Nasseh
  - University Hospital Munich, LMU Munich, Munich, Germany
  - German Cancer Consortium (DKTK), Partner Site Munich and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Wilko Weichert
  - German Cancer Consortium (DKTK), Partner Site Munich and German Cancer Research Center (DKFZ), Heidelberg, Germany
  - School of Medicine, Technical University Munich, Munich, Germany
- Michael von Bergwelt-Baildon
  - University Hospital Munich, LMU Munich, Munich, Germany
  - German Cancer Consortium (DKTK), Partner Site Munich and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Michael Bitzer
  - Center for Personalized Medicine, Eberhard-Karls University of Tübingen, Tübingen, Germany
  - German Cancer Consortium (DKTK), Partner Site Tübingen and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Nisar Malek
  - Center for Personalized Medicine, Eberhard-Karls University of Tübingen, Tübingen, Germany
  - German Cancer Consortium (DKTK), Partner Site Tübingen and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Öznur Öner
  - Center for Personalized Medicine, Eberhard-Karls University of Tübingen, Tübingen, Germany
  - German Cancer Consortium (DKTK), Partner Site Tübingen and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Klaus Schulze-Osthoff
  - Center for Personalized Medicine, Eberhard-Karls University of Tübingen, Tübingen, Germany
  - German Cancer Consortium (DKTK), Partner Site Tübingen and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Stefan Bartels
  - University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Jörg Haier
  - Comprehensive Cancer Center Hannover (Claudia von Schilling-Zentrum), Hannover Medical School, Hannover, Germany
- Raimund Ammann
  - Comprehensive Cancer Center Hannover (Claudia von Schilling-Zentrum), Hannover Medical School, Hannover, Germany
- Anja Franziska Schmidt
  - Comprehensive Cancer Center Hannover (Claudia von Schilling-Zentrum), Hannover Medical School, Hannover, Germany
- Bernd Guenther
  - Comprehensive Cancer Center Hannover (Claudia von Schilling-Zentrum), Hannover Medical School, Hannover, Germany
- Melanie Janning
  - DKFZ-Hector Cancer Institute at the University Medical Center Mannheim, Mannheim, Germany
  - Mannheim University Medical Center, University of Heidelberg, Mannheim, Germany
  - Department of Personalized Medical Oncology (A420), DKFZ German Cancer Research Center, Heidelberg, Germany
- Bernd Kasper
  - Mannheim University Medical Center, University of Heidelberg, Mannheim, Germany
- Sonja Loges
  - DKFZ-Hector Cancer Institute at the University Medical Center Mannheim, Mannheim, Germany
  - Mannheim University Medical Center, University of Heidelberg, Mannheim, Germany
  - Department of Personalized Medical Oncology (A420), DKFZ German Cancer Research Center, Heidelberg, Germany
- Peter Kuhn
  - Neu-Ulm University of Applied Sciences, Neu-Ulm, Germany
- Martin Breu
  - University Hospital of Würzburg, Würzburg, Germany
- Martin Lablans
  - German Cancer Research Center (DKFZ), Federated Information Systems, Heidelberg, Germany
  - German Cancer Consortium (DKTK), Partner Site Heidelberg and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Hubert Serve
  - University Hospital Frankfurt, Frankfurt, Germany
  - German Cancer Consortium (DKTK), Partner Site Frankfurt and German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Frankfurt Cancer Institute, Frankfurt, Germany
|
17
|
Susič D, Syed-Abdul S, Dovgan E, Jonnagaddala J, Gradišek A. Artificial intelligence based personalized predictive survival among colorectal cancer patients. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 231:107435. [PMID: 36842345 DOI: 10.1016/j.cmpb.2023.107435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 12/14/2022] [Accepted: 02/18/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND AND OBJECTIVE Colorectal cancer is a major health concern. It is now the third most common cancer and the fourth leading cause of cancer mortality worldwide. The aim of this study was to evaluate the performance of machine learning algorithms for predicting survival of colorectal cancer patients 1 to 5 years after diagnosis, and identify the most important variables. METHODS A sample of 1236 patients diagnosed with colorectal cancer and 118 predictor variables has been used. The outcome of interest was a binary variable indicating whether the patient survived the number of years in question or not. 20 predictor variables were selected using mutual information score with the outcome. We implemented 11 machine learning algorithms and evaluated their performance with a 5 by 2-fold cross-validation with stratified folds and with paired Student's t-tests. We compared the results with the Kaplan-Meier estimator and Cox's proportional hazard regression. RESULTS Using the 20 most important predictor variables for each of the survival years, the logistic regression algorithm achieved an area under the receiver operating characteristic curve of 0.850 (0.014 SD, 0.840-0.860 95 % CI) for the 1-year, and 0.872 (0.014 SD, 0.861-0.882 95% CI) for the 5-year survival prediction. Using only the 5 most important predictor variables, the corresponding values are 0.793 (0.020 SD, 0.778-0.807 95% CI) and 0.794 (0.011 SD, 0.785-0.802 95% CI). The most important variables for 1-year prediction were number of R residual, M distant metastasis, overall stage, probable recurrence within 5 years, and tumour length, whereas for 5-year prediction the most important were probable recurrence within 5 years, R residual, M distant metastasis, number of positive lymph nodes, and palliative chemotherapy. Biomarkers do not appear among the top 20 most important ones. 
For all survival intervals, the survival probability predicted by the top model agrees with the Kaplan-Meier estimate, both within one standard deviation and within the 95% confidence interval. CONCLUSIONS The findings suggest that machine learning algorithms can predict the survival probability of colorectal cancer patients and can be used to inform the patients and assist decision-making in clinical care management. In addition, this study unveils the most essential variables for estimating short- and long-term survival among patients with colorectal cancer.
Affiliation(s)
- David Susič
- Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia; Jožef Stefan International Postgraduate School, Jamova cesta 39, SI-1000 Ljubljana, Slovenia
| | - Shabbir Syed-Abdul
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 110, Taiwan.
| | - Erik Dovgan
- Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia
| | | | - Anton Gradišek
- Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia.
|
18
|
Steyaert S, Pizurica M, Nagaraj D, Khandelwal P, Hernandez-Boussard T, Gentles AJ, Gevaert O. Multimodal data fusion for cancer biomarker discovery with deep learning. NAT MACH INTELL 2023; 5:351-362. [PMID: 37693852 PMCID: PMC10484010 DOI: 10.1038/s42256-023-00633-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Received: 07/28/2022] [Accepted: 02/17/2023] [Indexed: 09/12/2023]
Abstract
Technological advances now make it possible to study a patient from multiple angles with high-dimensional, high-throughput multi-scale biomedical data. In oncology, massive amounts of data are being generated ranging from molecular, histopathology, radiology to clinical records. The introduction of deep learning has significantly advanced the analysis of biomedical data. However, most approaches focus on single data modalities leading to slow progress in methods to integrate complementary data types. Development of effective multimodal fusion approaches is becoming increasingly important as a single modality might not be consistent and sufficient to capture the heterogeneity of complex diseases to tailor medical care and improve personalised medicine. Many initiatives now focus on integrating these disparate modalities to unravel the biological processes involved in multifactorial diseases such as cancer. However, many obstacles remain, including lack of usable data as well as methods for clinical validation and interpretation. Here, we cover these current challenges and reflect on opportunities through deep learning to tackle data sparsity and scarcity, multimodal interpretability, and standardisation of datasets.
Affiliation(s)
- Sandra Steyaert
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
| | - Marija Pizurica
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
| | | | | | - Tina Hernandez-Boussard
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University
| | - Andrew J Gentles
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University
| | - Olivier Gevaert
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Department of Biomedical Data Science, Stanford University
|
19
|
Hossain E, Rana R, Higgins N, Soar J, Barua PD, Pisani AR, Turner K. Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review. Comput Biol Med 2023; 155:106649. [PMID: 36805219 DOI: 10.1016/j.compbiomed.2023.106649] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 10/03/2022] [Revised: 01/04/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023]
Abstract
BACKGROUND Natural Language Processing (NLP) is widely used to extract clinical insights from Electronic Health Records (EHRs). However, the lack of annotated data, automated tools, and other challenges hinder the full utilisation of NLP for EHRs. Various Machine Learning (ML), Deep Learning (DL) and NLP techniques are studied and compared to understand the limitations and opportunities in this space comprehensively. METHODOLOGY After screening 261 articles from 11 databases, we included 127 papers for full-text review covering seven categories of articles: (1) medical note classification, (2) clinical entity recognition, (3) text summarisation, (4) deep learning (DL) and transfer learning architecture, (5) information extraction, (6) medical language translation and (7) other NLP applications. This study follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. RESULT AND DISCUSSION EHR was the most commonly used data type among the selected articles, and the datasets were primarily unstructured. Various ML and DL methods were used, with prediction or classification being the most common application of ML or DL. The most common use cases were: the International Classification of Diseases, Ninth Revision (ICD-9) classification, clinical note analysis, and named entity recognition (NER) for clinical descriptions and research on psychiatric disorders. CONCLUSION We find that the adopted ML models were not adequately assessed. In addition, the data imbalance problem is quite important, yet techniques to address this underlying problem have still to be found. Future studies should address key limitations in studies, primarily identifying Lupus Nephritis, Suicide Attempts, perinatal self-harm and ICD-9 classification.
Affiliation(s)
- Elias Hossain
- School of Engineering & Physical Sciences, North South University, Dhaka 1229, Bangladesh.
| | - Rajib Rana
- School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield Central QLD 4300, Australia
| | - Niall Higgins
- School of Management and Enterprise, University of Southern Queensland, Darling Heights QLD 4350, Australia; School of Nursing, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4000, Australia; Metro North Mental Health, Herston QLD 4029, Australia
| | - Jeffrey Soar
- School of Business, University of Southern Queensland, Springfield Central QLD 4300, Australia
| | - Prabal Datta Barua
- School of Business, University of Southern Queensland, Springfield Central QLD 4300, Australia
| | - Anthony R Pisani
- Center for the Study and Prevention of Suicide, University of Rochester, Rochester, NY, United States
| | - Kathryn Turner
- School of Nursing, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4000, Australia
|
20
|
Araki K, Matsumoto N, Togo K, Yonemoto N, Ohki E, Xu L, Hasegawa Y, Satoh D, Takemoto R, Miyazaki T. Developing Artificial Intelligence Models for Extracting Oncologic Outcomes from Japanese Electronic Health Records. Adv Ther 2023; 40:934-950. [PMID: 36547809 PMCID: PMC9988800 DOI: 10.1007/s12325-022-02397-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/02/2022] [Accepted: 12/01/2022] [Indexed: 12/24/2022]
Abstract
INTRODUCTION A framework that extracts oncological outcomes from large-scale databases using artificial intelligence (AI) is not well established. Thus, we aimed to develop AI models to extract outcomes in patients with lung cancer using unstructured text data from electronic health records of multiple hospitals. METHODS We constructed AI models (Bidirectional Encoder Representations from Transformers [BERT], Naïve Bayes, and Longformer) for tumor evaluation using the University of Miyazaki Hospital (UMH) database. This data included both structured and unstructured data from progress notes, radiology reports, and discharge summaries. The BERT model was applied to the Life Data Initiative (LDI) data set of six hospitals. Study outcomes included the performance of AI models and time to progression of disease (TTP) for each line of treatment based on the treatment response extracted by AI models. RESULTS For the UMH data set, the BERT model exhibited higher precision accuracy compared to the Naïve Bayes or the Longformer models, respectively (precision [0.42 vs. 0.47 or 0.22], recall [0.63 vs. 0.46 or 0.33] and F1 scores [0.50 vs. 0.46 or 0.27]). When this BERT model was applied to LDI data, prediction accuracy remained quite similar. The Kaplan-Meier plots of TTP (months) showed similar trends for the first (median 14.9 [95% confidence interval 11.5, 21.1] and 16.8 [12.6, 21.8]), the second (7.8 [6.7, 10.7] and 7.8 [6.7, 10.7]), and the later lines of treatment for the predicted data by the BERT model and the manually curated data. CONCLUSION We developed AI models to extract treatment responses in patients with lung cancer using a large EHR database; however, the model requires further improvement.
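The F1 scores quoted in this abstract can be cross-checked against the quoted precision and recall, since F1 is conventionally their harmonic mean. The following is a minimal sketch under that assumption; the function name `f1_score` and the dictionary layout are illustrative and not taken from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the abstract for the UMH data set.
reported = {
    "BERT": (0.42, 0.63),
    "Naive Bayes": (0.47, 0.46),
    "Longformer": (0.22, 0.33),
}

for model, (precision, recall) in reported.items():
    # Recompute F1 from the rounded, published inputs.
    print(model, round(f1_score(precision, recall), 2))
```

Recomputed from the rounded inputs, BERT and Naïve Bayes come out at the reported 0.50 and 0.46; Longformer comes out at 0.26 rather than the reported 0.27, a gap that rounding of the published precision and recall could plausibly explain.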
Affiliation(s)
- Kenji Araki
- Patient Advocacy Center, University of Miyazaki Hospital, Miyazaki, Japan
| | - Nobuhiro Matsumoto
- Division of Respirology, Rheumatology, Infectious Diseases, and Neurology, Department of Internal Medicine, University of Miyazaki, Miyazaki, Japan
| | - Kanae Togo
- Health & Value, Pfizer Japan Inc., Tokyo, Japan.
| | | | - Emiko Ohki
- Oncology Medical Affairs, Pfizer Japan Inc, Tokyo, Japan
| | - Linghua Xu
- Health & Value, Pfizer Japan Inc., Tokyo, Japan
| | | | - Daisuke Satoh
- Research and Development Headquarters, NTT DATA Corporation, Tokyo, Japan
| | - Ryota Takemoto
- Manufacturing IT Innovation Sector, NTT DATA Corporation, Tokyo, Japan
| | - Taiga Miyazaki
- Division of Respirology, Rheumatology, Infectious Diseases, and Neurology, Department of Internal Medicine, University of Miyazaki, Miyazaki, Japan
|
21
|
Nunez JJ, Leung B, Ho C, Bates AT, Ng RT. Predicting the Survival of Patients With Cancer From Their Initial Oncology Consultation Document Using Natural Language Processing. JAMA Netw Open 2023; 6:e230813. [PMID: 36848085 PMCID: PMC9972192 DOI: 10.1001/jamanetworkopen.2023.0813] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/01/2023] Open
Abstract
IMPORTANCE Predicting short- and long-term survival of patients with cancer may improve their care. Prior predictive models either use data with limited availability or predict the outcome of only 1 type of cancer. OBJECTIVE To investigate whether natural language processing can predict survival of patients with general cancer from a patient's initial oncologist consultation document. DESIGN, SETTING, AND PARTICIPANTS This retrospective prognostic study used data from 47 625 of 59 800 patients who started cancer care at any of the 6 BC Cancer sites located in the province of British Columbia between April 1, 2011, and December 31, 2016. Mortality data were updated until April 6, 2022, and data were analyzed from update until September 30, 2022. All patients with a medical or radiation oncologist consultation document generated within 180 days of diagnosis were included; patients seen for multiple cancers were excluded. EXPOSURES Initial oncologist consultation documents were analyzed using traditional and neural language models. MAIN OUTCOMES AND MEASURES The primary outcome was the performance of the predictive models, including balanced accuracy and receiver operating characteristics area under the curve (AUC). The secondary outcome was investigating what words the models used. RESULTS Of the 47 625 patients in the sample, 25 428 (53.4%) were female and 22 197 (46.6%) were male, with a mean (SD) age of 64.9 (13.7) years. A total of 41 447 patients (87.0%) survived 6 months, 31 143 (65.4%) survived 36 months, and 27 880 (58.5%) survived 60 months, calculated from their initial oncologist consultation. The best models achieved a balanced accuracy of 0.856 (AUC, 0.928) for predicting 6-month survival, 0.842 (AUC, 0.918) for 36-month survival, and 0.837 (AUC, 0.918) for 60-month survival, on a holdout test set. Differences in what words were important for predicting 6- vs 60-month survival were found. 
CONCLUSIONS AND RELEVANCE These findings suggest that models performed comparably with or better than previous models predicting cancer survival and that they may be able to predict survival using readily available data without focusing on 1 cancer type.
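Balanced accuracy, the primary metric in this abstract, is conventionally the mean of sensitivity and specificity, which matters here because the outcome classes are imbalanced (87.0% of patients survived 6 months). The sketch below assumes that definition and, purely for illustration, applies it to a hypothetical classifier that predicts "survived" for every patient in the full cohort; the study itself evaluated models on a holdout test set, and the function names are not from the paper.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Plain accuracy: fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fn + tn + fp)


# Cohort counts quoted in the abstract: 41 447 of 47 625 survived 6 months.
survived = 41447
died = 47625 - survived

# Hypothetical trivial classifier: always predicts "survived" (the positive class).
trivial = dict(tp=survived, fn=0, tn=0, fp=died)

print(f"accuracy:          {accuracy(**trivial):.3f}")
print(f"balanced accuracy: {balanced_accuracy(**trivial):.3f}")
```

The trivial classifier reaches a plain accuracy of about 0.870 but a balanced accuracy of only 0.5, which is why the reported balanced accuracy of 0.856 for 6-month survival is more informative than raw accuracy would be.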
Affiliation(s)
- John-Jose Nunez
- BC Cancer, Vancouver, British Columbia, Canada
- Department of Computer Science, University of British Columbia, Vancouver, British Columbia, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, British Columbia, Canada
| | | | - Cheryl Ho
- BC Cancer, Vancouver, British Columbia, Canada
| | - Alan T. Bates
- BC Cancer, Vancouver, British Columbia, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, British Columbia, Canada
| | - Raymond T. Ng
- Department of Computer Science, University of British Columbia, Vancouver, British Columbia, Canada
|
22
|
Forrest IS, Petrazzini BO, Duffy Á, Park JK, Marquez-Luna C, Jordan DM, Rocheleau G, Cho JH, Rosenson RS, Narula J, Nadkarni GN, Do R. Machine learning-based marker for coronary artery disease: derivation and validation in two longitudinal cohorts. Lancet 2023; 401:215-225. [PMID: 36563696 PMCID: PMC10069625 DOI: 10.1016/s0140-6736(22)02079-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 07/11/2022] [Revised: 10/05/2022] [Accepted: 10/18/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND Binary diagnosis of coronary artery disease does not preserve the complexity of disease or quantify its severity or its associated risk with death; hence, a quantitative marker of coronary artery disease is warranted. We evaluated a quantitative marker of coronary artery disease derived from probabilities of a machine learning model. METHODS In this cohort study, we developed and validated a coronary artery disease-predictive machine learning model using 95 935 electronic health records and assessed its probabilities as in-silico scores for coronary artery disease (ISCAD; range 0 [lowest probability] to 1 [highest probability]) in participants in two longitudinal biobank cohorts. We measured the association of ISCAD with clinical outcomes-namely, coronary artery stenosis, obstructive coronary artery disease, multivessel coronary artery disease, all-cause death, and coronary artery disease sequelae. FINDINGS Among 95 935 participants, 35 749 were from the BioMe Biobank (median age 61 years [IQR 18]; 14 599 [41%] were male and 21 150 [59%] were female; 5130 [14%] were with diagnosed coronary artery disease) and 60 186 were from the UK Biobank (median age 62 [15] years; 25 031 [42%] male and 35 155 [58%] female; 8128 [14%] with diagnosed coronary artery disease). The model predicted coronary artery disease with an area under the receiver operating characteristic curve of 0·95 (95% CI 0·94-0·95; sensitivity of 0·94 [0·94-0·95] and specificity of 0·82 [0·81-0·83]) and 0·93 (0·92-0·93; sensitivity of 0·90 [0·89-0·90] and specificity of 0·88 [0·87-0·88]) in the BioMe validation and holdout sets, respectively, and 0·91 (0·91-0·91; sensitivity of 0·84 [0·83-0·84] and specificity of 0·83 [0·82-0·83]) in the UK Biobank external test set. ISCAD captured coronary artery disease risk from known risk factors, pooled cohort equations, and polygenic risk scores. 
Coronary artery stenosis increased quantitatively with ascending ISCAD quartiles (increase per quartile of 12 percentage points), including risk of obstructive coronary artery disease, multivessel coronary artery disease, and stenosis of major coronary arteries. Hazard ratios (HRs) and prevalence of all-cause death increased stepwise over ISCAD deciles (decile 1: HR 1·0 [95% CI 1·0-1·0], 0·2% prevalence; decile 6: 11 [3·9-31], 3·1% prevalence; and decile 10: 56 [20-158], 11% prevalence). A similar trend was observed for recurrent myocardial infarction. 12 (46%) undiagnosed individuals with high ISCAD (≥0·9) had clinical evidence of coronary artery disease according to the 2014 American College of Cardiology/American Heart Association Task Force guidelines. INTERPRETATION Electronic health record-based machine learning was used to generate an in-silico marker for coronary artery disease that can non-invasively quantify atherosclerosis and risk of death on a continuous spectrum, and identify underdiagnosed individuals. FUNDING National Institutes of Health.
Affiliation(s)
- Iain S Forrest
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, NY, USA; The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ben O Petrazzini
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Áine Duffy
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Joshua K Park
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Carla Marquez-Luna
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Daniel M Jordan
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ghislain Rocheleau
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Judy H Cho
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Robert S Rosenson
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Metabolism and Lipids Unit, Zena and Michael A Wiener Cardiovascular Institute, Mount Sinai Heart, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Mount Sinai Heart, Icahn School of Medicine at Mount Sinai, New York, New York
| | - Jagat Narula
- Mount Sinai Heart, Icahn School of Medicine at Mount Sinai, New York, New York
| | - Girish N Nadkarni
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ron Do
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; The BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
|
23
|
Segura T, Medrano IH, Collazo S, Maté C, Sguera C, Del Rio-Bermudez C, Casero H, Salcedo I, García-García J, Alcahut-Rodríguez C, Taberna M. Symptoms timeline and outcomes in amyotrophic lateral sclerosis using artificial intelligence. Sci Rep 2023; 13:702. [PMID: 36639403 PMCID: PMC9839769 DOI: 10.1038/s41598-023-27863-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/26/2022] [Accepted: 01/09/2023] [Indexed: 01/15/2023] Open
Abstract
Amyotrophic lateral sclerosis (ALS) is a fatal, neurodegenerative motor neuron disease. Although an early diagnosis is crucial to provide adequate care and improve survival, patients with ALS experience a significant diagnostic delay. This study aimed to use real-world data to describe the clinical profile and timing between symptom onset, diagnosis, and relevant outcomes in ALS. Retrospective and multicenter study in 5 representative hospitals and Primary Care services in the SESCAM Healthcare Network (Castilla-La Mancha, Spain). Using Natural Language Processing (NLP), the clinical information in electronic health records of all patients with ALS was extracted between January 2014 and December 2018. From a source population of all individuals attended in the participating hospitals, 250 ALS patients were identified (61.6% male, mean age 64.7 years). Of these, 64% had spinal and 36% bulbar ALS. For most defining symptoms, including dyspnea, dysarthria, dysphagia and fasciculations, the overall diagnostic delay from symptom onset was 11 (6-18) months. Prior to diagnosis, only 38.8% of patients had visited the neurologist. In a median post-diagnosis follow-up of 25 months, 52% underwent gastrostomy, 64% non-invasive ventilation, 16.4% tracheostomy, and 87.6% riluzole treatment; these were more commonly reported (all Ps < 0.05) and showed greater probability of occurrence (all Ps < 0.03) in bulbar ALS. Our results highlight the diagnostic delay in ALS and revealed differences in the clinical characteristics and occurrence of major disease-specific events across ALS subtypes. NLP holds great promise for its application in the wider context of rare neurological diseases.
Affiliation(s)
- Tomás Segura
- University Hospital of Albacete, Albacete, Spain.
| | | | | | | | - Carlo Sguera
- Savana Research, Madrid, Spain; UC3M-Santander Big Data Institute, Madrid, Spain
|
24
|
Natural Language Processing Applications for Computer-Aided Diagnosis in Oncology. Diagnostics (Basel) 2023; 13:diagnostics13020286. [PMID: 36673096 PMCID: PMC9857980 DOI: 10.3390/diagnostics13020286] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/15/2022] [Revised: 12/24/2022] [Accepted: 01/05/2023] [Indexed: 01/15/2023] Open
Abstract
In the era of big data, text-based medical data, such as electronic health records (EHR) and electronic medical records (EMR), are growing rapidly. EHR and EMR are collected from patients to record their basic information, lab tests, vital signs, clinical notes, and reports. EHR and EMR contain helpful information that can assist oncologists in computer-aided diagnosis and decision making. However, it is time-consuming for doctors to extract the valuable information they need from the EHR and EMR data and to analyze it. Recently, more and more research works have applied natural language processing (NLP) techniques, i.e., rule-based, machine learning-based, and deep learning-based techniques, to EHR and EMR data for computer-aided diagnosis in oncology. The objective of this paper is to narratively review the recent progress in the area of NLP applications for computer-aided diagnosis in oncology. Moreover, we intend to reduce the research gap between artificial intelligence (AI) experts and clinical specialists to design better NLP applications. We originally identified 295 articles from three electronic databases: PubMed, Google Scholar, and ACL Anthology; then, we removed the duplicated papers and manually screened out irrelevant papers based on the content of the abstract; finally, we included a total of 23 articles after the screening process of the literature review. Furthermore, we provided an in-depth analysis and categorized these studies into seven cancer types: breast cancer, lung cancer, liver cancer, prostate cancer, pancreatic cancer, colorectal cancer, and brain tumors. Additionally, we identified the current limitations of NLP applications in supporting clinical practice and we suggest some promising future research directions in this paper.
|
25
|
Lu SC, Swisher CL, Chung C, Jaffray D, Sidey-Gibbons C. On the importance of interpretable machine learning predictions to inform clinical decision making in oncology. Front Oncol 2023; 13:1129380. [PMID: 36925929 PMCID: PMC10013157 DOI: 10.3389/fonc.2023.1129380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Accepted: 02/14/2023] [Indexed: 03/04/2023] Open
Abstract
Machine learning-based tools are capable of guiding individualized clinical management and decision-making by providing predictions of a patient's future health state. Through their ability to model complex nonlinear relationships, ML algorithms can often outperform traditional statistical prediction approaches, but the use of nonlinear functions can mean that ML techniques may also be less interpretable than traditional statistical methodologies. While there are benefits of intrinsic interpretability, many model-agnostic approaches now exist and can provide insight into the way in which ML systems make decisions. In this paper, we describe how different algorithms can be interpreted and introduce some techniques for interpreting complex nonlinear algorithms.
Affiliation(s)
- Sheng-Chieh Lu
- Section of Patient-Centered Analytics, Division of Internal Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
| | - Christine L Swisher
- The Ronin Project, San Mateo, CA, United States; The Lawrence J. Ellison Institute for Transformative Medicine, Los Angeles, CA, United States
| | - Caroline Chung
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States; Institute for Data Science in Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
| | - David Jaffray
- Institute for Data Science in Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
| | - Chris Sidey-Gibbons
- Section of Patient-Centered Analytics, Division of Internal Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
|
26
|
Extracting patient-level data from the electronic health record: Expanding opportunities for health system research. PLoS One 2023; 18:e0280342. [PMID: 36897886 PMCID: PMC10004557 DOI: 10.1371/journal.pone.0280342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Accepted: 12/27/2022] [Indexed: 03/11/2023] Open
Abstract
BACKGROUND Epidemiological studies of interstitial lung disease (ILD) are limited by small numbers and tertiary care bias. Investigators have leveraged the widespread use of electronic health records (EHRs) to overcome these limitations, but struggle to extract patient-level, longitudinal clinical data needed to address many important research questions. We hypothesized that we could automate longitudinal ILD cohort development using the EHR of a large, community-based healthcare system. STUDY DESIGN AND METHODS We applied a previously validated algorithm to the EHR of a community-based healthcare system to identify ILD cases between 2012-2020. We then extracted disease-specific characteristics and outcomes using fully automated data-extraction algorithms and natural language processing of selected free-text. RESULTS We identified a community cohort of 5,399 ILD patients (prevalence = 118 per 100,000). Pulmonary function tests (71%) and serologies (54%) were commonly used in the diagnostic evaluation, whereas lung biopsy was rare (5%). IPF was the most common ILD diagnosis (n = 972, 18%). Prednisone was the most commonly prescribed medication (911, 17%). Nintedanib and pirfenidone were rarely prescribed (n = 305, 5%). ILD patients were high-utilizers of inpatient (40%/year hospitalized) and outpatient care (80%/year with pulmonary visit), with sustained utilization throughout the post-diagnosis study period. DISCUSSION We demonstrated the feasibility of robustly characterizing a variety of patient-level utilization and health services outcomes in a community-based EHR cohort. This represents a substantial methodological improvement by alleviating traditional constraints on the accuracy and clinical resolution of such ILD cohorts; we believe this approach will make community-based ILD research more efficient, effective, and scalable.
27
Hamamoto R, Koyama T, Kouno N, Yasuda T, Yui S, Sudo K, Hirata M, Sunami K, Kubo T, Takasawa K, Takahashi S, Machino H, Kobayashi K, Asada K, Komatsu M, Kaneko S, Yatabe Y, Yamamoto N. Introducing AI to the molecular tumor board: one direction toward the establishment of precision medicine using large-scale cancer clinical and biological information. Exp Hematol Oncol 2022; 11:82. [PMID: 36316731 PMCID: PMC9620610 DOI: 10.1186/s40164-022-00333-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 10/05/2022] [Indexed: 11/10/2022] Open
Abstract
Since U.S. President Barack Obama announced the Precision Medicine Initiative in his New Year's State of the Union address in 2015, the establishment of a precision medicine system has been emphasized worldwide, particularly in the field of oncology. With the advent of next-generation sequencers specifically, genome analysis technology has made remarkable progress, and there are active efforts to apply genome information to diagnosis and treatment. Generally, in the process of feeding back the results of next-generation sequencing analysis to patients, a molecular tumor board (MTB), consisting of experts in clinical oncology, genetic medicine, etc., is established to discuss the results. On the other hand, an MTB currently involves a large amount of work, with humans searching through vast databases and literature, selecting the best drug candidates, and manually confirming the status of available clinical trials. In addition, as personalized medicine advances, the burden on MTB members is expected to increase in the future. Under these circumstances, introducing cutting-edge artificial intelligence (AI) technology and information and communication technology to MTBs while reducing the burden on MTB members and building a platform that enables more accurate and personalized medical care would be of great benefit to patients. In this review, we introduced the latest status of elemental technologies that have potential for AI utilization in MTB, and discussed issues that may arise in the future as we progress with AI implementation.
Affiliation(s)
- Ryuji Hamamoto
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
- Takafumi Koyama
  - Department of Experimental Therapeutics, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
- Nobuji Kouno
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
  - Department of Surgery, Graduate School of Medicine, Kyoto University, Yoshida-konoe-cho, Sakyo-ku, Kyoto, 606-8303 Japan
- Tomohiro Yasuda
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
  - Research and Development Group, Hitachi, Ltd., 1-280 Higashi-koigakubo, Kokubunji, Tokyo, 185-8601 Japan
- Shuntaro Yui
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
  - Research and Development Group, Hitachi, Ltd., 1-280 Higashi-koigakubo, Kokubunji, Tokyo, 185-8601 Japan
- Kazuki Sudo
  - Department of Experimental Therapeutics, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
  - Department of Medical Oncology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
- Makoto Hirata
  - Department of Genetic Medicine and Services, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
- Kuniko Sunami
  - Department of Laboratory Medicine, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
- Takashi Kubo
  - Department of Laboratory Medicine, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
- Ken Takasawa
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
- Satoshi Takahashi
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
- Hidenori Machino
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
- Kazuma Kobayashi
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
- Ken Asada
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
- Masaaki Komatsu
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
- Syuzo Kaneko
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027 Japan
- Yasushi Yatabe
  - Department of Diagnostic Pathology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
  - Division of Molecular Pathology, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
- Noboru Yamamoto
  - Department of Experimental Therapeutics, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045 Japan
28
Li Y, Brendel M, Wu N, Ge W, Zhang H, Rietschel P, Quek RGW, Pouliot JF, Wang F, Harnett J. Machine learning models for identifying predictors of clinical outcomes with first-line immune checkpoint inhibitor therapy in advanced non-small cell lung cancer. Sci Rep 2022; 12:17670. [PMID: 36271096 PMCID: PMC9586943 DOI: 10.1038/s41598-022-20061-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/27/2022] [Accepted: 09/08/2022] [Indexed: 01/18/2023] Open
Abstract
Immune checkpoint inhibitors (ICIs) are standard-of-care as first-line (1L) therapy for advanced non-small cell lung cancer (aNSCLC) without actionable oncogenic driver mutations. While clinical trials demonstrated benefits of ICIs over chemotherapy, variation in outcomes across patients has been observed and trial populations may not be representative of clinical practice. Predictive models can help understand heterogeneity of treatment effects, identify predictors of meaningful clinical outcomes, and may inform treatment decisions. We applied machine learning (ML)-based survival models to a real-world cohort of patients with aNSCLC who received 1L ICI therapy extracted from a US-based electronic health record database. Model performance was evaluated using metrics including concordance index (c-index), and we used explainability techniques to identify significant predictors of overall survival (OS) and progression-free survival (PFS). The ML model achieved c-indices of 0.672 and 0.612 for OS and PFS, respectively, and Kaplan-Meier survival curves showed significant differences between low- and high-risk groups for OS and PFS (both log-rank test p < 0.0001). Identified predictors were mostly consistent with the published literature and/or clinical expectations and largely overlapped for OS and PFS; Eastern Cooperative Oncology Group performance status, programmed cell death-ligand 1 expression levels, and serum albumin were among the top 5 predictors for both outcomes. Prospective and independent data set evaluation is required to confirm these results.
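As an illustration of the concordance index named in the abstract above, the following is a minimal sketch of one common definition (Harrell's C) for right-censored data. It is not the study's implementation: the function name `concordance_index`, the toy data, and the choice to skip pairs with tied times are all assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs whose risk ordering matches outcome.

    times  -- observed follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    risks  -- predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplifying assumption: skip tied times. A pair is comparable only
        # if the shorter time is an observed event rather than a censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0      # higher risk failed first: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5      # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: the second patient is censored at time 10.
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 1]
toy_risks = [0.9, 0.2, 0.1, 0.6]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
print(toy_c)
```

In the toy cohort, four pairs are comparable and three are ordered correctly, so the sketch yields 0.75. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported c-indices of 0.672 (OS) and 0.612 (PFS).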
Affiliation(s)
- Ying Li
  - Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, NY 10591 USA
- Matthew Brendel
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY USA
- Ning Wu
  - Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, NY 10591 USA
- Wenzhen Ge
  - Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, NY 10591 USA
- Hao Zhang
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY USA
- Petra Rietschel
  - Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, NY 10591 USA
- Ruben G. W. Quek
  - Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, NY 10591 USA
- Jean-Francois Pouliot
  - Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, NY 10591 USA
- Fei Wang
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY USA
- James Harnett
  - Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, NY 10591 USA
29
Machine learning models to prognose 30-Day Mortality in Postoperative Disseminated Cancer Patients. Surg Oncol 2022; 44:101810. [DOI: 10.1016/j.suronc.2022.101810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 05/14/2022] [Accepted: 07/03/2022] [Indexed: 11/18/2022]
30
Hou J, Zhao R, Cai T, Beaulieu-Jones B, Seyok T, Dahal K, Yuan Q, Xiong X, Bonzel CL, Fox C, Christiani DC, Jemielita T, Liao KP, Liaw KL, Cai T. Temporal Trends in Clinical Evidence of 5-Year Survival Within Electronic Health Records Among Patients With Early-Stage Colon Cancer Managed With Laparoscopy-Assisted Colectomy vs Open Colectomy. JAMA Netw Open 2022; 5:e2218371. [PMID: 35737384 PMCID: PMC9227003 DOI: 10.1001/jamanetworkopen.2022.18371] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022] Open
Abstract
IMPORTANCE Temporal shifts in clinical knowledge and practice need to be adjusted for in treatment outcome assessment in clinical evidence. OBJECTIVE To use electronic health record (EHR) data to (1) assess the temporal trends in treatment decisions and patient outcomes and (2) emulate a randomized clinical trial (RCT) using EHR data with proper adjustment for temporal trends. DESIGN, SETTING, AND PARTICIPANTS The Clinical Outcomes of Surgical Therapy (COST) Study Group Trial assessing overall survival of patients with stages I to III early-stage colon cancer was chosen as the target trial. The RCT was emulated using EHR data of patients from a single health care system cohort who underwent colectomy for early-stage colon cancer from January 1, 2006, to December 31, 2017, and were followed up to January 1, 2020, from Mass General Brigham. Analyses were conducted from December 2, 2019, to January 24, 2022. EXPOSURES Laparoscopy-assisted colectomy (LAC) vs open colectomy (OC). MAIN OUTCOMES AND MEASURES The primary outcome was 5-year overall survival. To address confounding in the emulation, pretreatment variables were selected and adjusted. The temporal trends were adjusted by stratification of the calendar year when the colectomies were performed with cotraining across strata. 
RESULTS A total of 943 patients met key RCT eligibility criteria in the EHR emulation cohort, including 518 undergoing LAC (median age, 63 [range, 20-95] years; 268 [52%] women; 121 [23%] with stage I, 165 [32%] with stage II, and 232 [45%] with stage III cancer; 32 [6%] with colon adhesion; 278 [54%] with right-sided colon cancer; 18 [3%] with left-sided colon cancer; and 222 [43%] with sigmoid colon cancer) and 425 undergoing OC (median age, 65 [range, 28-99] years; 223 [52%] women; 61 [14%] with stage I, 153 [36%] with stage II, and 211 [50%] with stage III cancer; 39 [9%] with colon adhesion; 202 [47%] with right-sided colon cancer; 39 [9%] with left-sided colon cancer; and 201 [47%] with sigmoid colon cancer). Tests for temporal trends in treatment assignment (χ2 = 60.3; P < .001) and overall survival (χ2 = 137.2; P < .001) were significant. The adjusted EHR emulation reached the same conclusion as the RCT: LAC is not inferior to OC in overall survival rate with risk difference at 5 years of -0.007 (95% CI, -0.070 to 0.057). The results were consistent for stratified analysis within each temporal period. CONCLUSIONS AND RELEVANCE These findings suggest that confounding bias from temporal trends should be considered when conducting clinical evidence studies with long time spans. Stratification of calendar time and cotraining of models is one solution. With proper adjustment, clinical evidence may supplement RCTs in the assessment of treatment outcome over time.
Affiliation(s)
- Jue Hou
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
- Rachel Zhao
  - Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Tianrun Cai
  - Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, Massachusetts
- Brett Beaulieu-Jones
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Thany Seyok
  - Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, Massachusetts
- Kumar Dahal
  - Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, Massachusetts
- Qianyu Yuan
  - Department of Environmental Health, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
- Xin Xiong
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
- Clara-Lea Bonzel
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
- David C. Christiani
  - Department of Environmental Health, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
- Katherine P. Liao
  - Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, Massachusetts
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Tianxi Cai
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
31
Wongvibulsin S, Frech TM, Chren MM, Tkaczyk ER. Expanding Personalized, Data-Driven Dermatology: Leveraging Digital Health Technology and Machine Learning to Improve Patient Outcomes. JID INNOVATIONS 2022; 2:100105. [PMID: 35462957 PMCID: PMC9026581 DOI: 10.1016/j.xjidi.2022.100105] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/24/2021] [Revised: 12/13/2021] [Accepted: 01/07/2022] [Indexed: 11/30/2022] Open
Abstract
The current revolution of digital health technology and machine learning offers enormous potential to improve patient care. Nevertheless, it is essential to recognize that dermatology requires an approach different from those of other specialties. For many dermatological conditions, there is a lack of standardized methodology for quantitatively tracking disease progression and treatment response (clinimetrics). Furthermore, dermatological diseases impact patients in complex ways, some of which can be measured only through patient reports (psychometrics). New tools using digital health technology (e.g., smartphone applications, wearable devices) can aid in capturing both clinimetric and psychometric variables over time. With these data, machine learning can inform efforts to improve health care by, for example, the identification of high-risk patient groups, optimization of treatment strategies, and prediction of disease outcomes. We use the term personalized, data-driven dermatology to refer to the use of comprehensive data to inform individual patient care and improve patient outcomes. In this paper, we provide a framework that includes data from multiple sources, leverages digital health technology, and uses machine learning. Although this framework is applicable broadly to dermatological conditions, we use the example of a serious inflammatory skin condition, chronic cutaneous graft-versus-host disease, to illustrate personalized, data-driven dermatology.
Affiliation(s)
- Shannon Wongvibulsin
  - Department of Dermatology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
  - Department of Medicine, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA
- Tracy M. Frech
  - Division of Rheumatology and Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - VA Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, Tennessee, USA
- Mary-Margaret Chren
  - Department of Dermatology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Eric R. Tkaczyk
  - VA Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, Tennessee, USA
  - Department of Dermatology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Biomedical Engineering, School of Engineering, Vanderbilt University, Nashville, Tennessee, USA
32
Hu D, Li S, Zhang H, Wu N, Lu X. Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non-Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study. JMIR Med Inform 2022; 10:e35475. [PMID: 35468085 PMCID: PMC9086872 DOI: 10.2196/35475] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/22/2021] [Revised: 03/31/2022] [Accepted: 04/11/2022] [Indexed: 11/21/2022] Open
Abstract
Background Lymph node metastasis (LNM) is critical for treatment decision making of patients with resectable non–small cell lung cancer, but it is difficult to precisely diagnose preoperatively. Electronic medical records (EMRs) contain a large volume of valuable information about LNM, but some key information is recorded in free text, which hinders its secondary use. Objective This study aims to develop LNM prediction models based on EMRs using natural language processing (NLP) and machine learning algorithms. Methods We developed a multiturn question answering NLP model to extract features about the primary tumor and lymph nodes from computed tomography (CT) reports. We then combined these features with other structured clinical characteristics to develop LNM prediction models using machine learning algorithms. We conducted extensive experiments to explore the effectiveness of the predictive models and compared them with size criteria based on CT image findings (the maximum short axis diameter of lymph node >10 mm was regarded as a metastatic node) and clinician’s evaluation. Since the NLP model may extract features with mistakes, we also calculated the concordance correlation between the predicted probabilities of models using NLP-extracted features and gold standard features to explore the influence of NLP-driven automatic extraction. Results Experimental results show that the random forest models achieved the best performances with 0.792 area under the receiver operating characteristic curve (AUC) value and 0.456 average precision (AP) value for pN2 LNM prediction and 0.768 AUC value and 0.524 AP value for pN1&N2 LNM prediction. And all machine learning models outperformed the size criteria and clinician’s evaluation. The concordance correlation between the random forest models using NLP-extracted features and gold standard features is 0.950 and improved to 0.984 when the top 5 important NLP-extracted features were replaced with gold standard features. 
Conclusions The LNM models developed can achieve competitive performance using only limited EMR data such as CT reports and tumor markers in comparison with the clinician’s evaluation. The multiturn question answering NLP model can extract features effectively to support the development of LNM prediction models, which may facilitate the clinical application of predictive models.
Affiliation(s)
- Danqing Hu
  - College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, China
- Shaolei Li
  - Department of Thoracic Surgery II, Peking University Cancer Hospital and Institute, Beijing, China
- Huanyao Zhang
  - College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, China
- Nan Wu
  - Department of Thoracic Surgery II, Peking University Cancer Hospital and Institute, Beijing, China
- Xudong Lu
  - College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, China
33
Zhou J, Xin H. Emerging artificial intelligence methods for fighting lung cancer: a survey. CLINICAL EHEALTH 2022. [DOI: 10.1016/j.ceh.2022.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
34
Ren G, Yu K, Xie Z, Liu L, Wang P, Zhang W, Wang Y, Wu X. Differentiation of lumbar disc herniation and lumbar spinal stenosis using natural language processing–based machine learning based on positive symptoms. Neurosurg Focus 2022; 52:E7. [DOI: 10.3171/2022.1.focus21561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 01/20/2022] [Indexed: 11/06/2022]
Abstract
OBJECTIVE
The purpose of this study was to develop natural language processing (NLP)–based machine learning algorithms to automatically differentiate lumbar disc herniation (LDH) and lumbar spinal stenosis (LSS) based on positive symptoms in free-text admission notes. The secondary purpose was to compare the performance of the deep learning algorithm with the ensemble model on the current task.
METHODS
In total, 1921 patients whose principal diagnosis was LDH or LSS between June 2013 and June 2020 at Zhongda Hospital, affiliated with Southeast University, were retrospectively analyzed. The data set was randomly divided into a training set and testing set at a 7:3 ratio. Long Short-Term Memory (LSTM) and extreme gradient boosting (XGBoost) models were developed in this study. NLP algorithms were assessed on the testing set by the following metrics: receiver operating characteristic (ROC) curve, area under the curve (AUC), accuracy score, recall score, F1 score, and precision score.
RESULTS
In the testing set, the LSTM model achieved an AUC of 0.8487, accuracy score of 0.7818, recall score of 0.9045, F1 score of 0.8108, and precision score of 0.7347. In comparison, the XGBoost model achieved an AUC of 0.7565, accuracy score of 0.6961, recall score of 0.7387, F1 score of 0.7153, and precision score of 0.6934.
CONCLUSIONS
NLP-based machine learning algorithms were a promising auxiliary to the electronic health record in spine disease diagnosis. LSTM, the deep learning model, showed better capacity compared with the widely used ensemble model, XGBoost, in differentiation of LDH and LSS using positive symptoms. This study presents a proof of concept for the application of NLP in prediagnosis of spine disease.
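The F1 scores reported in the RESULTS above can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the helper name `f1_from_precision_recall` is an assumption for this example, and the inputs are the rounded figures from the abstract, not the study's raw predictions.

```python
def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall values as reported in the abstract.
lstm_f1 = f1_from_precision_recall(precision=0.7347, recall=0.9045)
xgb_f1 = f1_from_precision_recall(precision=0.6934, recall=0.7387)

print(round(lstm_f1, 4))  # 0.8108, matching the reported LSTM F1 score
print(round(xgb_f1, 4))   # 0.7153, matching the reported XGBoost F1 score
```

Both recomputed values agree with the abstract to four decimal places, so the reported precision, recall, and F1 figures are internally consistent.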
Affiliation(s)
- GuanRui Ren
  - Zhongda Hospital, Medical College, Southeast University, Nanjing
- Kun Yu
  - Nanjing Jiangbei Hospital, Nanjing, Jiangsu, China
- ZhiYang Xie
  - Zhongda Hospital, Medical College, Southeast University, Nanjing
- Lei Liu
  - Zhongda Hospital, Medical College, Southeast University, Nanjing
- PeiYang Wang
  - Zhongda Hospital, Medical College, Southeast University, Nanjing
- Wei Zhang
  - Zhongda Hospital, Medical College, Southeast University, Nanjing
- YunTao Wang
  - Zhongda Hospital, Medical College, Southeast University, Nanjing
- XiaoTao Wu
  - Zhongda Hospital, Medical College, Southeast University, Nanjing
35
Andjelkovic J, Ljubic B, Hai AA, Stanojevic M, Pavlovski M, Diaz W, Obradovic Z. Sequential machine learning in prediction of common cancers. INFORMATICS IN MEDICINE UNLOCKED 2022. [DOI: 10.1016/j.imu.2022.100928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022] Open
36
Ganguli R, Franklin J, Yu X, Lin A, Heffernan DS. Machine learning methods to predict presence of residual cancer following hysterectomy. Sci Rep 2022; 12:2738. [PMID: 35177700 PMCID: PMC8854708 DOI: 10.1038/s41598-022-06585-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/17/2021] [Accepted: 01/24/2022] [Indexed: 12/24/2022] Open
Abstract
Surgical management for gynecologic malignancies often involves hysterectomy, often constituting the most common gynecologic surgery worldwide. Despite maximal surgical and medical care, gynecologic malignancies have a high rate of recurrence following surgery. Current machine learning models use advanced pathology data that is often inaccessible within low-resource settings and are specific to singular cancer types. There is currently a need for machine learning models to predict non-clinically evident residual disease using only clinically available health data. Here we developed and tested multiple machine learning models to assess the risk of residual disease post-hysterectomy based on clinical and operative parameters. Data from 3656 hysterectomy patients from the NSQIP dataset over 14 years were used to develop models with a training set of 2925 patients and a validation set of 731 patients. Our models revealed the top postoperative predictors of residual disease were the initial presence of gross abdominal disease on the diaphragm, disease located on the bowel mesentery, located on the bowel serosa, and disease located within the adjacent pelvis prior to resection. There were no statistically significant differences in performances of the top three models. Extreme gradient Boosting, Random Forest, and Logistic Regression models had comparable AUC ROC (0.90) and accuracy metrics (87–88%). Using these models, physicians can identify gynecologic cancer patients post-hysterectomy that may benefit from additional treatment. For patients at high risk for disease recurrence despite adequate surgical intervention, machine learning models may lay the basis for potential prospective trials with prophylactic/adjuvant therapy for non-clinically evident residual disease, particularly in under-resourced settings.
Affiliation(s)
- Reetam Ganguli
  - Brown University, Providence, USA
  - Department of Surgery, Rhode Island Hospital, Brown University, Providence, USA
- Jordan Franklin
  - Department of Computer Sciences, Georgia Institute of Technology, Atlanta, USA
- Xiaotian Yu
  - Department of Mathematics, University of Virginia, Charlottesville, USA
- Alice Lin
  - Warren Alpert Medical School, Providence, USA
  - Department of Surgery, Rhode Island Hospital, Brown University, Providence, USA
- Daithi S Heffernan
  - Brown University, Providence, USA
  - Warren Alpert Medical School, Providence, USA
  - Department of Surgery, Rhode Island Hospital, Brown University, Providence, USA
  - Division of Trauma/Surgical Critical Care, Division of Surgical Research, Department of Surgery, Rhode Island Hospital, Brown University, Room 207, Aldrich Building, 593 Eddy Street, Providence, RI, 02903, USA
37
Yuan Q, Du M, Loehrer E, Johnson BE, Gainor JF, Lanuti M, Li Y, Christiani DC. Postdiagnosis BMI Change Is Associated with Non-Small Cell Lung Cancer Survival. Cancer Epidemiol Biomarkers Prev 2021; 31:262-268. [PMID: 34728470 DOI: 10.1158/1055-9965.epi-21-0503] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/16/2021] [Revised: 06/24/2021] [Accepted: 10/14/2021] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Body mass index (BMI) change after a lung cancer diagnosis has been associated with non-small cell lung cancer (NSCLC) survival. This study aimed to quantify the association based on a large-scale observational study. METHODS Included in the study were 7,547 patients with NSCLC with prospectively collected BMI data from Massachusetts General Hospital and Brigham and Women's Hospital/Dana-Farber Cancer Institute. Cox proportional hazards regression with time-dependent covariates was used to estimate effect of time-varying postdiagnosis BMI change rate (% per month) on overall survival (OS), stratified by clinical subgroups. Spline analysis was conducted to quantify the nonlinear association. A Mendelian Randomization (MR) analysis with a total of 3,495 patients further validated the association. RESULTS There was a J-shape association between postdiagnosis BMI change and OS among patients with NSCLC. Specifically, a moderate BMI decrease [0.5-2.0; HR = 2.45; 95% confidence interval (CI), 2.25-2.67] and large BMI decrease (≥2.0; HR = 4.65; 95% CI, 4.15-5.20) were strongly associated with worse OS, whereas moderate weight gain (0.5-2.0) reduced the risk for mortality (HR = 0.78; 95% CI, 0.68-0.89) and large weight gain (≥2.0) slightly increased the risk of mortality without reaching statistical significance (HR = 1.10; 95% CI, 0.86-1.42). MR analyses supported the potential causal roles of postdiagnosis BMI change in survival. CONCLUSIONS This study indicates that BMI change after diagnosis was associated with mortality risk. IMPACT Our findings, which reinforce the importance of postdiagnosis BMI surveillance, suggest that weight loss or large weight gain may be unwarranted.
Affiliation(s)
- Qianyu Yuan
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Mulong Du
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Department of Biostatistics, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
- Elizabeth Loehrer
- Department of Epidemiology, Erasmus University Medical Center, Rotterdam, the Netherlands
- Bruce E Johnson
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
- Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts
- Justin F Gainor
- Center for Thoracic Cancers, Massachusetts General Hospital Cancer Center, Boston, Massachusetts
- Department of Medicine, Massachusetts General Hospital Cancer Center, Boston, Massachusetts
- Michael Lanuti
- Center for Thoracic Cancers, Massachusetts General Hospital Cancer Center, Boston, Massachusetts
- Yi Li
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan
- David C Christiani
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Department of Medicine, Massachusetts General Hospital/Harvard Medical School, Boston, Massachusetts
|
38
|
Bohn MK, Fabiano GF, Adeli K. Electronic tools in clinical laboratory diagnostics: key examples, limitations, and value in laboratory medicine. J LAB MED 2021. [DOI: 10.1515/labmed-2021-0114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Electronic tools in clinical laboratory diagnostics can assist laboratory professionals, clinicians, and patients in medical diagnostic management and laboratory test interpretation. With increasing implementation of electronic health records (EHRs) and laboratory information systems worldwide, there is increasing demand for well-designed and evidence-based electronic resources. Both complex data-driven and simple interpretative electronic healthcare tools are currently available to improve the integration of clinical and laboratory information towards a more patient-centered approach to medicine. Several studies have reported positive clinical impact of electronic healthcare tool implementation in clinical laboratory diagnostics, including in the management of neonatal bilirubinemia, cardiac disease, and nutritional status. As patients have increasing access to their medical laboratory data, it is essential that accessible electronic healthcare tools are evidence-based and user-friendly for individuals of varying digital and medical literacy. Indeed, studies suggest electronic healthcare tool development processes significantly lack the involvement of relevant healthcare professionals and often present misinformation, including erroneous calculation algorithms or inappropriate interpretative recommendations. The current review provides an overview of the utility of available electronic healthcare tools in clinical laboratory diagnostics and critically reviews potential limitations and benefits of their clinical implementation. The Canadian Laboratory Initiative on Pediatric Reference Intervals (CALIPER) online database is also detailed as an example of a pediatric diagnostic tool with widespread global impact.
Affiliation(s)
- Mary Kathryn Bohn
- Molecular Medicine and Clinical Biochemistry, The Hospital for Sick Children, Toronto, ON, Canada
- Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Giulia F. Fabiano
- Molecular Medicine and Clinical Biochemistry, The Hospital for Sick Children, Toronto, ON, Canada
- Khosrow Adeli
- Molecular Medicine and Clinical Biochemistry, The Hospital for Sick Children, Toronto, ON, Canada
- Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
|