1. McIlroy DR. Predictive modelling for postoperative acute kidney injury: big data enhancing quality or the Emperor's new clothes? Br J Anaesth 2024; 133:476-478. PMID: 38902116. DOI: 10.1016/j.bja.2024.05.013.
Abstract
The increased availability of large clinical datasets together with increasingly sophisticated computing power has facilitated development of numerous risk prediction models for various adverse perioperative outcomes, including acute kidney injury (AKI). The rationale for developing such models is straightforward. However, despite numerous purported benefits, the uptake of preoperative prediction models into clinical practice has been limited. Barriers to implementation of predictive models, including limitations in their discrimination and accuracy, as well as their ability to meaningfully impact clinical practice and patient outcomes, are increasingly recognised. Some of the purported benefits of predictive modelling, particularly when applied to postoperative AKI, might not fare well under detailed scrutiny. Future research should address existing limitations and seek to demonstrate both benefit to patients and value to healthcare systems from implementation of these models in clinical practice.
Affiliation(s)
- David R McIlroy
- Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Anaesthesia, Monash University, Melbourne, VIC, Australia.
2. Zhuo XY, Lei SH, Sun L, Bai YW, Wu J, Zheng YJ, Liu KX, Liu WF, Zhao BC. Preoperative risk prediction models for acute kidney injury after noncardiac surgery: an independent external validation cohort study. Br J Anaesth 2024; 133:508-518. PMID: 38527923. DOI: 10.1016/j.bja.2024.02.018.
Abstract
BACKGROUND Numerous models have been developed to predict acute kidney injury (AKI) after noncardiac surgery, yet there is a lack of independent validation and comparison among them. METHODS We conducted a systematic literature search to review published risk prediction models for AKI after noncardiac surgery. An independent external validation was performed using a retrospective surgical cohort at a large Chinese hospital from January 2019 to October 2022. The cohort included patients undergoing a wide range of noncardiac surgeries with perioperative creatinine measurements. Postoperative AKI was defined according to the Kidney Disease Improving Global Outcomes creatinine criteria. Model performance was assessed in terms of discrimination (area under the receiver operating characteristic curve, AUROC), calibration (calibration plot), and clinical utility (net benefit), before and after model recalibration through intercept and slope updates. A sensitivity analysis was conducted by including patients without postoperative creatinine measurements in the validation cohort and categorising them as non-AKI cases. RESULTS Nine prediction models were evaluated, each with varying clinical and methodological characteristics, including the types of surgical cohorts used for model development, AKI definitions, and predictors. In the validation cohort involving 13,186 patients, 650 (4.9%) developed AKI. Three models demonstrated fair discrimination (AUROC between 0.71 and 0.75); other models had poor or failed discrimination. All models exhibited some miscalibration; five of the nine models were well-calibrated after intercept and slope updates. Decision curve analysis indicated that the three models with fair discrimination consistently provided a positive net benefit after recalibration. The results were confirmed in the sensitivity analysis. CONCLUSIONS We identified three models with fair discrimination and potential clinical utility after recalibration for assessing the risk of acute kidney injury after noncardiac surgery.
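For reference, a minimal sketch of the intercept-and-slope (logistic) recalibration step described in this abstract, assuming predicted probabilities from an existing model and observed outcomes in the new cohort (variable names and the toy data are illustrative, not the study's code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def logistic_recalibration(p_orig, y):
    """Refit intercept and slope on the original model's linear predictor.

    p_orig: predicted probabilities from the original model in the new cohort
    y:      observed binary outcomes (e.g. postoperative AKI) in the new cohort
    Returns (recalibrated probabilities, intercept, slope).
    """
    eps = 1e-12
    p = np.clip(p_orig, eps, 1 - eps)
    lp = np.log(p / (1 - p))                                    # logit of original predictions
    fit = LogisticRegression(C=1e6).fit(lp.reshape(-1, 1), y)   # effectively unpenalized fit
    a, b = fit.intercept_[0], fit.coef_[0, 0]
    return 1 / (1 + np.exp(-(a + b * lp))), a, b

# Toy demonstration with a model that over-predicts risk by roughly a factor of two
rng = np.random.default_rng(0)
true_p = rng.uniform(0.01, 0.30, 5000)
y = rng.binomial(1, true_p)
p_orig = np.clip(2 * true_p, 0, 0.99)
p_recal, a, b = logistic_recalibration(p_orig, y)
print(f"intercept update = {a:.2f}, slope update = {b:.2f}")
```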
Affiliation(s)
- Xiao-Yu Zhuo
- Department of Anaesthesiology, Nanfang Hospital, Southern Medical University, Guangzhou, China; Guangdong Provincial Key Laboratory of Precision Anaesthesia and Perioperative Organ Protection, Guangzhou, China
- Shao-Hui Lei
- Department of Anaesthesiology, Nanfang Hospital, Southern Medical University, Guangzhou, China; Guangdong Provincial Key Laboratory of Precision Anaesthesia and Perioperative Organ Protection, Guangzhou, China; College of Anaesthesiology, Southern Medical University, Guangzhou, China
- Lan Sun
- Department of Anaesthesiology, Nanfang Hospital, Southern Medical University, Guangzhou, China; Department of Biostatistics, Lejiu Healthcare Technology Co., Ltd, Hangzhou, China
- Ya-Wen Bai
- College of Anaesthesiology, Southern Medical University, Guangzhou, China
- Jiao Wu
- College of Anaesthesiology, Southern Medical University, Guangzhou, China
- Yong-Jia Zheng
- College of Anaesthesiology, Southern Medical University, Guangzhou, China
- Ke-Xuan Liu
- Department of Anaesthesiology, Nanfang Hospital, Southern Medical University, Guangzhou, China; Guangdong Provincial Key Laboratory of Precision Anaesthesia and Perioperative Organ Protection, Guangzhou, China; College of Anaesthesiology, Southern Medical University, Guangzhou, China; Outcomes Research Consortium, Cleveland, OH, USA.
- Wei-Feng Liu
- Department of Anaesthesiology, Nanfang Hospital, Southern Medical University, Guangzhou, China; Guangdong Provincial Key Laboratory of Precision Anaesthesia and Perioperative Organ Protection, Guangzhou, China; College of Anaesthesiology, Southern Medical University, Guangzhou, China.
- Bing-Cheng Zhao
- Department of Anaesthesiology, Nanfang Hospital, Southern Medical University, Guangzhou, China; Guangdong Provincial Key Laboratory of Precision Anaesthesia and Perioperative Organ Protection, Guangzhou, China; College of Anaesthesiology, Southern Medical University, Guangzhou, China; Outcomes Research Consortium, Cleveland, OH, USA.
3. Han L, Char DS, Aghaeepour N. Artificial Intelligence in Perioperative Care: Opportunities and Challenges. Anesthesiology 2024; 141:379-387. PMID: 38980160. PMCID: PMC11239120. DOI: 10.1097/aln.0000000000005013.
Abstract
Artificial intelligence (AI) applications have great potential to enhance perioperative care. This paper explores promising areas for AI in anesthesiology; expertise, stakeholders, and infrastructure for development; and barriers and challenges to implementation.
Affiliation(s)
- Lichy Han
- Department of Anesthesiology, Perioperative, and Pain Medicine, School of Medicine, Stanford University, Stanford, California
- Danton S Char
- Department of Anesthesiology, Perioperative, and Pain Medicine, School of Medicine, Stanford University, Stanford, California
- Nima Aghaeepour
- Department of Anesthesiology, Perioperative, and Pain Medicine, School of Medicine, Stanford University, Stanford, California
4. Silverman AL, Shung D, Stidham RW, Kochhar GS, Iacucci M. How Artificial Intelligence Will Transform Clinical Care, Research, and Trials for Inflammatory Bowel Disease. Clin Gastroenterol Hepatol 2024:S1542-3565(24)00598-6. PMID: 38992406. DOI: 10.1016/j.cgh.2024.05.048.
Abstract
Artificial intelligence (AI) refers to computer-based methodologies that use data to teach a computer to solve pre-defined tasks; these methods can be applied to identify patterns in large multi-modal data sources. AI applications in inflammatory bowel disease (IBD) include predicting response to therapy, disease activity scoring of endoscopy, drug discovery, and identifying bowel damage in images. As a complex disease with entangled relationships between genomics, metabolomics, microbiome, and the environment, IBD stands to benefit greatly from methodologies that can handle this complexity. We describe current applications and critical challenges of AI in IBD, and propose future directions.
Affiliation(s)
- Anna L Silverman
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, Mayo Clinic, Scottsdale, Arizona.
- Dennis Shung
- Section of Digestive Diseases, Department of Medicine, Yale School of Medicine, Yale University, New Haven, Connecticut
- Ryan W Stidham
- Division of Gastroenterology, Department of Internal Medicine, Michigan Medicine, Ann Arbor, Michigan; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan; Michigan Institute for Data Science, University of Michigan, Ann Arbor, Michigan
- Gursimran S Kochhar
- Division of Gastroenterology, Hepatology, and Nutrition, Allegheny Health Network, Pittsburgh, Pennsylvania
- Marietta Iacucci
- University of Birmingham, Institute of Immunology and Immunotherapy, Birmingham, United Kingdom; College of Medicine and Health, University College Cork, and APC Microbiome Ireland, Cork, Ireland
5. Liou L, Scott E, Parchure P, Ouyang Y, Egorova N, Freeman R, Hofer IS, Nadkarni GN, Timsina P, Kia A, Levin MA. Assessing calibration and bias of a deployed machine learning malnutrition prediction model within a large healthcare system. NPJ Digit Med 2024; 7:149. PMID: 38844546. PMCID: PMC11156633. DOI: 10.1038/s41746-024-01141-5.
Abstract
Malnutrition is a frequently underdiagnosed condition leading to increased morbidity, mortality, and healthcare costs. The Mount Sinai Health System (MSHS) deployed a machine learning model (MUST-Plus) to detect malnutrition upon hospital admission. However, in diverse patient groups, a poorly calibrated model may lead to misdiagnosis, exacerbating health care disparities. We explored the model's calibration across different variables and methods to improve calibration. Data from adult patients admitted to five MSHS hospitals from January 1, 2021 - December 31, 2022, were analyzed. We compared MUST-Plus prediction to the registered dietitian's formal assessment. Hierarchical calibration was assessed and compared between the recalibration sample (N = 49,562) of patients admitted between January 1, 2021 - December 31, 2022, and the hold-out sample (N = 17,278) of patients admitted between January 1, 2023 - September 30, 2023. Statistical differences in calibration metrics were tested using bootstrapping with replacement. Before recalibration, the overall model calibration intercept was -1.17 (95% CI: -1.20, -1.14), slope was 1.37 (95% CI: 1.34, 1.40), and Brier score was 0.26 (95% CI: 0.25, 0.26). Both weak and moderate measures of calibration were significantly different between White and Black patients and between male and female patients. Logistic recalibration significantly improved calibration of the model across race and gender in the hold-out sample. The original MUST-Plus model showed significant differences in calibration between White vs. Black patients. It also overestimated malnutrition in females compared to males. Logistic recalibration effectively reduced miscalibration across all patient subgroups. Continual monitoring and timely recalibration can improve model accuracy.
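A minimal sketch of this kind of subgroup calibration audit, computing weak-calibration metrics and the Brier score per group with a bootstrap interval for the between-group difference (arrays, group labels, and data are illustrative; this is not the MUST-Plus implementation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

def weak_calibration(p, y):
    """Calibration intercept and slope from a logistic fit on logit(p), plus Brier score.
    (A joint fit; the formal calibration-in-the-large fixes the slope at 1.)"""
    eps = 1e-12
    lp = np.log(np.clip(p, eps, 1 - eps) / np.clip(1 - p, eps, 1 - eps))
    fit = LogisticRegression(C=1e6).fit(lp.reshape(-1, 1), y)
    return fit.intercept_[0], fit.coef_[0, 0], brier_score_loss(y, p)

def brier_gap_ci(p, y, group, a, b, n_boot=500, seed=0):
    """Bootstrap 95% CI for the difference in Brier score between groups a and b."""
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        pa, ya = p[idx][group[idx] == a], y[idx][group[idx] == a]
        pb, yb = p[idx][group[idx] == b], y[idx][group[idx] == b]
        diffs.append(brier_score_loss(ya, pa) - brier_score_loss(yb, pb))
    return np.percentile(diffs, [2.5, 97.5])

# Toy example: predictions that over-estimate risk in one group
rng = np.random.default_rng(1)
base = rng.beta(2, 8, 10000)
group = rng.choice(["F", "M"], 10000)
y = rng.binomial(1, base)
p = np.clip(base + 0.05 * (group == "F"), 0.01, 0.99)
print(weak_calibration(p, y), brier_gap_ci(p, y, group, "F", "M"))
```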
Affiliation(s)
- Lathan Liou
- Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Prathamesh Parchure
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yuxia Ouyang
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Natalia Egorova
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Robert Freeman
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ira S Hofer
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Division of Data Driven and Digital Medicine (D3M), The Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Girish N Nadkarni
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Division of Data Driven and Digital Medicine (D3M), The Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Prem Timsina
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Arash Kia
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Matthew A Levin
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
6. Kigo J, Kamau S, Mawji A, Mwaniki P, Dunsmuir D, Pillay Y, Zhang C, Pallot K, Ogero M, Kimutai D, Ouma M, Mohamed I, Chege M, Thuranira L, Kissoon N, Ansermino JM, Akech S. External validation of a paediatric Smart triage model for use in resource limited facilities. PLOS Digital Health 2024; 3:e0000293. PMID: 38905166. PMCID: PMC11192416. DOI: 10.1371/journal.pdig.0000293.
Abstract
Models for digital triage of sick children at emergency departments of hospitals in resource-poor settings have been developed. However, prior to their adoption, external validation should be performed to ensure their generalizability. We externally validated a previously published nine-predictor paediatric triage model (Smart Triage) developed in Uganda using data from two hospitals in Kenya. Both discrimination and calibration were assessed, and recalibration was performed by optimizing the intercept for classifying patients into emergency, priority, or non-urgent categories based on low-risk and high-risk thresholds. A total of 2539 patients were eligible at Hospital 1 and 2464 at Hospital 2, and 5003 for both hospitals combined; admission rates were 8.9%, 4.5%, and 6.8%, respectively. The model showed good discrimination, with areas under the receiver operating characteristic curve (AUC) of 0.826, 0.784, and 0.821, respectively. Before recalibration, the model achieved a sensitivity of 93% (95% confidence interval [CI]: 89%-96%), 81% (CI: 74%-88%), and 89% (CI: 85%-92%), respectively, at a low-risk threshold of 8%, and a specificity of 86% (CI: 84%-87%), 96% (CI: 95%-97%), and 91% (CI: 90%-92%), respectively, at a high-risk threshold of 40%. Recalibration improved the graphical fit, but new risk thresholds were required to optimize sensitivity and specificity. The Smart Triage model showed good discrimination on external validation but required recalibration to improve the graphical fit of the calibration plot. There was no change in the order of prioritization of patients following recalibration in the respective triage categories. Recalibration required new site-specific risk thresholds that may not be needed if prioritization based on rank is all that is required. The Smart Triage model shows promise for wider use in triage of sick children in different settings.
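A minimal sketch of the threshold-based triage step this abstract describes, mapping predicted admission risk to emergency/priority/non-urgent categories and checking sensitivity and specificity at the two thresholds (the 8% and 40% thresholds follow the abstract; the data are synthetic):

```python
import numpy as np

def triage_category(p, low=0.08, high=0.40):
    """Map predicted admission risk to Smart Triage-style categories."""
    return np.where(p >= high, "emergency",
                    np.where(p >= low, "priority", "non-urgent"))

def sens_spec(p, y, low=0.08, high=0.40):
    """Sensitivity at the low-risk threshold, specificity at the high-risk threshold."""
    sens = np.mean(p[y == 1] >= low)     # escalated patients among true admissions
    spec = np.mean(p[y == 0] < high)     # non-admitted patients kept below 'emergency'
    return sens, spec

# Toy example with skewed risk scores and loosely linked outcomes
rng = np.random.default_rng(2)
p = rng.beta(1, 10, 2500)
y = rng.binomial(1, np.clip(1.2 * p, 0, 1))
print(triage_category(p[:5]), sens_spec(p, y))
```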
Affiliation(s)
- Joyce Kigo
- Health Service Unit, Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Nairobi, Kenya
- Stephen Kamau
- Health Service Unit, Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Nairobi, Kenya
- Alishah Mawji
- Centre for International Child Health, BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
- Paul Mwaniki
- Health Service Unit, Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Nairobi, Kenya
- Dustin Dunsmuir
- Centre for International Child Health, BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
- Department of Anesthesiology, Pharmacology & Therapeutics, University of British Columbia, Vancouver, British Columbia, Canada
- Yashodani Pillay
- Department of Anesthesiology, Pharmacology & Therapeutics, University of British Columbia, Vancouver, British Columbia, Canada
- Cherri Zhang
- Department of Anesthesiology, Pharmacology & Therapeutics, University of British Columbia, Vancouver, British Columbia, Canada
- Katija Pallot
- Department of Anesthesiology, Pharmacology & Therapeutics, University of British Columbia, Vancouver, British Columbia, Canada
- Morris Ogero
- Health Service Unit, Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Nairobi, Kenya
- David Kimutai
- Department of Pediatrics, Mbagathi County Hospital, Nairobi, Kenya
- Mary Ouma
- Department of Pediatrics, Mbagathi County Hospital, Nairobi, Kenya
- Ismael Mohamed
- Department of Pediatrics, Mbagathi County Hospital, Nairobi, Kenya
- Mary Chege
- Department of Pediatrics, Kiambu County Referral Hospital, Kiambu, Kenya
- Lydia Thuranira
- Department of Pediatrics, Kiambu County Referral Hospital, Kiambu, Kenya
- Niranjan Kissoon
- Department of Pediatrics, University of British Columbia, Vancouver, British Columbia, Canada
- J. Mark Ansermino
- Centre for International Child Health, BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
- Department of Anesthesiology, Pharmacology & Therapeutics, University of British Columbia, Vancouver, British Columbia, Canada
- Samuel Akech
- Health Service Unit, Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Nairobi, Kenya
7. Brosula R, Corbin CK, Chen JH. Pathophysiological Features in Electronic Medical Records Sustain Model Performance under Temporal Dataset Shift. AMIA Joint Summits on Translational Science Proceedings 2024; 2024:95-104. PMID: 38827052. PMCID: PMC11141811.
Abstract
Access to real-world data streams like electronic medical records (EMRs) has accelerated the development of supervised machine learning (ML) models for clinical applications. However, few studies investigate the differential impact of particular features in the EMR on model performance under temporal dataset shift. To explain how features in the EMR impact models over time, this study aggregates features into feature groups by their source (e.g. medication orders, diagnosis codes and lab results) and feature categories based on their reflection of patient pathophysiology or healthcare processes. We adapt Shapley values to explain feature groups' and feature categories' marginal contribution to initial and sustained model performance. We investigate three standard clinical prediction tasks and find that while feature contributions to initial performance differ across tasks, pathophysiological features help mitigate temporal discrimination deterioration. These results provide interpretable insights on how specific feature groups contribute to model performance and robustness to temporal dataset shift.
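A minimal sketch of estimating a feature group's marginal contribution to discrimination by averaging over group orderings, in the spirit of the grouped Shapley analysis described above (the grouping, model, and data are illustrative, not the study's implementation):

```python
import itertools
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def group_shapley_auroc(model, X_test, y_test, groups, seed=0):
    """Shapley-style contribution of each feature *group* to test AUROC.

    groups: dict mapping group name -> list of column indices.
    Columns outside the current coalition are value-permuted to approximate removal.
    """
    rng = np.random.default_rng(seed)
    names = list(groups)

    def auroc_with(coalition):
        Xp = X_test.copy()
        for g in names:
            if g not in coalition:
                for c in groups[g]:
                    Xp[:, c] = rng.permutation(Xp[:, c])
        return roc_auc_score(y_test, model.predict_proba(Xp)[:, 1])

    contrib = {g: 0.0 for g in names}
    perms = list(itertools.permutations(names))
    for order in perms:
        coalition = set()
        prev = auroc_with(coalition)
        for g in order:
            coalition.add(g)
            cur = auroc_with(coalition)
            contrib[g] += (cur - prev) / len(perms)   # average marginal gain
            prev = cur
    return contrib

# Toy example with three "feature groups"
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=2000) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)
groups = {"labs": [0, 1], "diagnoses": [2, 3], "medications": [4, 5]}
print(group_shapley_auroc(model, X_te, y_te, groups))
```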
Affiliation(s)
- Raphael Brosula
- Genomic Center for Infectious Diseases, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Conor K Corbin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Jonathan H Chen
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
8. Kistanova E, Yotov S, Zaimova D. Intelligent Animal Husbandry: Present and Future. Animals (Basel) 2024; 14:1645. PMID: 38891691. PMCID: PMC11171394. DOI: 10.3390/ani14111645.
Abstract
The main priorities in the contemporary breeding of different animal species have been directed toward the use of intelligent approaches for accelerating genetic progress, ensuring animal welfare and environmental protection by reducing the release of manure and gas emissions [...].
Affiliation(s)
- Elena Kistanova
- Institute of Biology and Immunology of Reproduction, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria
- Stanimir Yotov
- Department of Obstetrics, Reproduction and Reproductive Disorders, Trakia University, 6000 Stara Zagora, Bulgaria
- Darina Zaimova
- Department of Industrial Business and Entrepreneurship, Faculty of Economics, Trakia University, 6000 Stara Zagora, Bulgaria
9. Pean CA, Buddhiraju A, Shimizu MR, Chen TLW, Esposito JG, Kwon YM. Prediction of 30-Day Mortality Following Revision Total Hip and Knee Arthroplasty: Machine Learning Algorithms Outperform CARDE-B, 5-Item, and 6-Item Modified Frailty Index Risk Scores. J Arthroplasty 2024:S0883-5403(24)00528-X. PMID: 38797444. DOI: 10.1016/j.arth.2024.05.056.
Abstract
BACKGROUND Although risk calculators are used to prognosticate postoperative outcomes following revision total hip and knee arthroplasty (total joint arthroplasty [TJA]), machine learning (ML)-based predictive tools have emerged as a promising alternative for improved risk stratification. This study aimed to compare the predictive ability of ML models for 30-day mortality following revision TJA to that of traditional risk-assessment indices such as the CARDE-B score (congestive heart failure, albumin (< 3.5 mg/dL), renal failure on dialysis, dependence for daily living, elderly (> 65 years of age), and body mass index (BMI) of < 25 kg/m2), 5-item modified frailty index (5MFI), and 6MFI. METHODS Adult patients undergoing revision TJA between 2013 and 2020 were selected from the American College of Surgeons National Surgical Quality Improvement Program database and randomly split 80:20 to compose the training and validation cohorts. Three ML models (extreme gradient boosting, random forest, and elastic-net penalized logistic regression [NEPLR]) were developed and evaluated using discrimination, calibration metrics, and accuracy. The discrimination of CARDE-B, 5MFI, and 6MFI scores was assessed individually and compared to that of the ML models. RESULTS All models were equally accurate (Brier score = 0.005) and demonstrated outstanding discrimination with similar areas under the receiver operating characteristic curve (AUCs, extreme gradient boosting = 0.94, random forest = NEPLR = 0.93). The NEPLR was the best-calibrated model overall (slope = 0.54, intercept = -0.004). CARDE-B had the highest discrimination among the scores (AUC = 0.89), followed by 6MFI (AUC = 0.80), and 5MFI (AUC = 0.68). Albumin < 3.5 mg/dL and BMI (< 30.15) were the most important predictors of 30-day mortality following revision TJA. CONCLUSIONS The ML models outperformed traditional risk-assessment indices in predicting postoperative 30-day mortality after revision TJA. Our findings highlight the utility of ML for risk stratification in a clinical setting. The identification of hypoalbuminemia and BMI as prognostic markers may allow patient-specific perioperative optimization strategies to improve outcomes following revision TJA.
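A minimal sketch of the kind of comparison reported here: a gradient-boosted classifier versus a simple additive index, each scored by AUROC on a held-out split (synthetic data and an invented three-item index stand in for the NSQIP variables and the CARDE-B/mFI scores; GradientBoostingClassifier stands in for XGBoost):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier   # stand-in for XGBoost
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic cohort: three binary risk items plus BMI drive a rare mortality outcome
rng = np.random.default_rng(42)
n = 20000
albumin_low = rng.binomial(1, 0.20, n)
dialysis = rng.binomial(1, 0.05, n)
age_over_65 = rng.binomial(1, 0.50, n)
bmi = rng.normal(28, 6, n)
logit = -5 + 1.2 * albumin_low + 1.5 * dialysis + 0.8 * age_over_65 - 0.03 * bmi
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([albumin_low, dialysis, age_over_65, bmi])
index_score = albumin_low + dialysis + age_over_65         # simple count-style index

X_tr, X_te, y_tr, y_te, idx_tr, idx_te = train_test_split(
    X, y, index_score, test_size=0.2, random_state=0)
ml = GradientBoostingClassifier().fit(X_tr, y_tr)
print("ML AUROC:   ", round(roc_auc_score(y_te, ml.predict_proba(X_te)[:, 1]), 3))
print("Index AUROC:", round(roc_auc_score(y_te, idx_te), 3))
```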
Affiliation(s)
- Christian A Pean
- Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts; Department of Orthopaedic Trauma and Reconstruction Surgery, Duke University School of Medicine, Durham, North Carolina
- Anirudh Buddhiraju
- Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- Michelle R Shimizu
- Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- Tony L-W Chen
- Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- John G Esposito
- Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- Young-Min Kwon
- Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
10. Harrison-Brown M, Scholes C, Ebrahimi M, Bell C, Kirwan G. Applying models of care for total hip and knee arthroplasty: External validation of a published predictive model to identify extended stay risk prior to lower-limb arthroplasty. Clin Rehabil 2024; 38:700-712. PMID: 38377957. DOI: 10.1177/02692155241233348.
Abstract
OBJECTIVE This study aimed to externally validate a reported model for identifying patients requiring extended stay following lower limb arthroplasty in a new setting. DESIGN External validation of a previously reported prognostic model, using retrospective data. SETTING Medium-sized hospital orthopaedic department, Australia. PARTICIPANTS Electronic medical records were accessed for data collection between Sep-2019 and Feb-2020 and retrospective data extracted from 200 randomly selected total hip or knee arthroplasty patients. INTERVENTION Participants received total hip or knee replacement between 2-Feb-16 and 4-Apr-19. This study was a non-interventional retrospective study. MAIN MEASURES Model validation was assessed in terms of discrimination and calibration, using both the original and adjusted forms of the candidate model. Decision curve analysis was conducted on the outputs of the adjusted model to determine net benefit at a predetermined decision threshold (0.5). RESULTS The original model performed poorly, grossly overestimating length of stay with mean calibration of -3.6 (95% confidence interval -3.9 to -3.2) and calibration slope of 0.52. Performance improved following adjustment of the model intercept and model coefficients (mean calibration 0.48, 95% confidence interval 0.16 to 0.80 and slope of 1.0), but the model remained poorly calibrated at low and medium risk thresholds, and net benefit was modest (three additional patients per hundred identified as at-risk) at the a priori risk threshold. CONCLUSIONS External validation demonstrated poor performance when the model was applied to a new patient population, and the model would provide limited benefit for our institution. Implementation of predictive models for arthroplasty should include practical assessment of discrimination, calibration and net benefit at a clinically acceptable threshold.
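A minimal sketch of the net-benefit calculation underlying the decision curve analysis mentioned above, evaluated at a single decision threshold (0.5, as in the study); the data are illustrative:

```python
import numpy as np

def net_benefit(p, y, threshold):
    """Net benefit of acting on predictions >= threshold (standard DCA formula)."""
    n = len(y)
    act = p >= threshold
    tp = np.sum(act & (y == 1))
    fp = np.sum(act & (y == 0))
    return tp / n - (fp / n) * (threshold / (1 - threshold))

# Toy comparison at the 0.5 decision threshold: model vs. treating everyone as at-risk
rng = np.random.default_rng(3)
y = rng.binomial(1, 0.3, 1000)
p = np.clip(0.3 + 0.4 * (y - 0.3) + rng.normal(0, 0.15, 1000), 0.01, 0.99)
print("model    :", round(net_benefit(p, y, 0.5), 4))
print("treat all:", round(net_benefit(np.ones_like(p), y, 0.5), 4))
```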
Affiliation(s)
- Christopher Bell
- Department of Orthopaedics, QEII Jubilee Hospital, Brisbane, Australia
- Garry Kirwan
- Department of Physiotherapy, QEII Jubilee Hospital, Brisbane, Australia
- School of Health Sciences and Social Work, Griffith University, Brisbane, Australia
11. Huguet N, Chen J, Parikh RB, Marino M, Flocke SA, Likumahuwa-Ackman S, Bekelman J, DeVoe JE. Applying Machine Learning Techniques to Implementation Science. Online J Public Health Inform 2024; 16:e50201. PMID: 38648094. DOI: 10.2196/50201.
Abstract
Machine learning (ML) approaches could expand the usefulness and application of implementation science methods in clinical medicine and public health settings. The aim of this viewpoint is to introduce a roadmap for applying ML techniques to address implementation science questions, such as predicting what will work best, for whom, under what circumstances, and with what predicted level of support, and what and when adaptation or deimplementation are needed. We describe how ML approaches could be used and discuss challenges that implementation scientists and methodologists will need to consider when using ML throughout the stages of implementation.
Affiliation(s)
- Nathalie Huguet
- Department of Family Medicine, Oregon Health & Science University, Portland, OR, United States
- BRIDGE-C2 Implementation Science Center for Cancer Control, Oregon Health & Science University, Portland, OR, United States
- Jinying Chen
- Section of Preventive Medicine and Epidemiology, Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, United States
- Data Science Core, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, United States
- iDAPT Implementation Science Center for Cancer Control, Wake Forest School of Medicine, Winston-Salem, NC, United States
- Ravi B Parikh
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Miguel Marino
- Department of Family Medicine, Oregon Health & Science University, Portland, OR, United States
- BRIDGE-C2 Implementation Science Center for Cancer Control, Oregon Health & Science University, Portland, OR, United States
- Susan A Flocke
- Department of Family Medicine, Oregon Health & Science University, Portland, OR, United States
- BRIDGE-C2 Implementation Science Center for Cancer Control, Oregon Health & Science University, Portland, OR, United States
- Sonja Likumahuwa-Ackman
- Department of Family Medicine, Oregon Health & Science University, Portland, OR, United States
- BRIDGE-C2 Implementation Science Center for Cancer Control, Oregon Health & Science University, Portland, OR, United States
- Justin Bekelman
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Penn Center for Cancer Care Innovation, Abramson Cancer Center, Penn Medicine, Philadelphia, PA, United States
- Jennifer E DeVoe
- Department of Family Medicine, Oregon Health & Science University, Portland, OR, United States
- BRIDGE-C2 Implementation Science Center for Cancer Control, Oregon Health & Science University, Portland, OR, United States
12. Andersen ES, Röttger R, Brasen CL, Brandslund I. Analytical Performance Specifications for Input Variables: Investigation of the Model of End-Stage Liver Disease. Clin Chem 2024; 70:653-659. PMID: 38416710. DOI: 10.1093/clinchem/hvae019.
Abstract
BACKGROUND Artificial intelligence models constitute specific uses of analysis results and, therefore, necessitate evaluation of analytical performance specifications (APS) for this context specifically. The Model of End-stage Liver Disease (MELD) is a clinical prediction model based on measurements of bilirubin, creatinine, and the international normalized ratio (INR). This study evaluates the propagation of error through the MELD, to inform choice of APS for the MELD input variables. METHODS A total of 6093 consecutive MELD scores and underlying analysis results were retrospectively collected. "Desirable analytical variation" based on biological variation as well as current local analytical variation was simulated onto the data set as well as onto a constructed data set, representing a worst-case scenario. Resulting changes in MELD score and risk classification were calculated. RESULTS Biological variation-based APS in the worst-case scenario resulted in 3.26% of scores changing by ≥1 MELD point. In the patient-derived data set, the same variation resulted in 0.92% of samples changing by ≥1 MELD point, and 5.5% of samples changing risk category. Local analytical performance resulted in lower reclassification rates. CONCLUSIONS Error propagation through MELD is complex and includes population-dependent mechanisms. Biological variation-derived APS were acceptable for all uses of the MELD score. Other combinations of APS can yield equally acceptable results. This analysis exemplifies how error propagation through artificial intelligence models can become highly complex. This complexity will necessitate that both model suppliers and clinical laboratories address analytical performance specifications for the specific use case, as these may differ from performance specifications for traditional use of the analyses.
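A minimal sketch of the error-propagation idea investigated here: simulate analytical variation on the MELD inputs and count how often the rounded score shifts by at least one point (this uses the classic pre-MELD-Na UNOS formula; the coefficient-of-variation values and patient panel are placeholders, not the study's APS or data):

```python
import numpy as np

def meld(bili, creat, inr):
    """Classic (pre-MELD-Na) MELD score: bilirubin and creatinine in mg/dL, plus INR."""
    bili = np.maximum(bili, 1.0)
    creat = np.clip(creat, 1.0, 4.0)
    inr = np.maximum(inr, 1.0)
    score = 9.57 * np.log(creat) + 3.78 * np.log(bili) + 11.2 * np.log(inr) + 6.43
    return np.clip(np.round(score), 6, 40)

def simulate_shift(bili, creat, inr, cv=(0.10, 0.05, 0.05), n_sim=1000, seed=0):
    """Fraction of patients whose MELD changes by >= 1 point under added analytical CV."""
    rng = np.random.default_rng(seed)
    base = meld(bili, creat, inr)
    changed = 0.0
    for _ in range(n_sim):
        b = bili * rng.normal(1, cv[0], bili.shape)
        c = creat * rng.normal(1, cv[1], creat.shape)
        i = inr * rng.normal(1, cv[2], inr.shape)
        changed += np.mean(np.abs(meld(b, c, i) - base) >= 1)
    return changed / n_sim

# Toy patient panel
rng = np.random.default_rng(7)
bili = rng.lognormal(0.5, 0.8, 500)
creat = rng.lognormal(0.2, 0.5, 500)
inr = rng.lognormal(0.2, 0.3, 500)
print("mean fraction of scores shifting >= 1 point:", round(simulate_shift(bili, creat, inr), 3))
```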
Affiliation(s)
- Eline S Andersen
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Odense, Denmark
- Richard Röttger
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Claus L Brasen
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Odense, Denmark
- Ivan Brandslund
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Odense, Denmark
13. Zhuang Y, Dyas A, Meguid RA, Henderson WG, Bronsert M, Madsen H, Colborn KL. Preoperative Prediction of Postoperative Infections Using Machine Learning and Electronic Health Record Data. Ann Surg 2024; 279:720-726. PMID: 37753703. DOI: 10.1097/sla.0000000000006106.
Abstract
OBJECTIVE To estimate preoperative risk of postoperative infections using structured electronic health record (EHR) data. BACKGROUND Surveillance and reporting of postoperative infections is primarily done through costly, labor-intensive manual chart reviews on a small sample of patients. Automated methods using statistical models applied to postoperative EHR data have shown promise to augment manual review as they can cover all operations in a timely manner. However, there are no specific models for risk-adjusting infectious complication rates using EHR data. METHODS Preoperative EHR data from 30,639 patients (2013-2019) were linked to the American College of Surgeons National Surgical Quality Improvement Program preoperative data and postoperative infection outcomes data from 5 hospitals in the University of Colorado Health System. EHR data included diagnoses, procedures, operative variables, patient characteristics, and medications. Lasso and the knockoff filter were used to perform controlled variable selection. Outcomes included surgical site infection, urinary tract infection, sepsis/septic shock, and pneumonia up to 30 days postoperatively. RESULTS Among >15,000 candidate predictors, 7 were chosen for the surgical site infection model and 6 for each of the urinary tract infection, sepsis, and pneumonia models. Important variables included preoperative presence of the specific outcome, wound classification, comorbidities, and American Society of Anesthesiologists physical status classification. The area under the receiver operating characteristic curve for each model ranged from 0.73 to 0.89. CONCLUSIONS Parsimonious preoperative models for predicting postoperative infection risk using EHR data were developed and showed comparable performance to existing American College of Surgeons National Surgical Quality Improvement Program risk models that use manual chart review. These models can be used to estimate risk-adjusted postoperative infection rates applied to large volumes of EHR data in a timely manner.
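A minimal sketch of L1-penalized (lasso) variable selection for a binary postoperative outcome, in the spirit of the controlled selection described above (the knockoff-filter step is omitted and the predictors are synthetic):

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 5000, 200
X = rng.normal(size=(n, p))                                   # stand-in for EHR-derived predictors
beta = np.zeros(p)
beta[:5] = [1.0, -0.8, 0.6, 0.5, -0.5]                        # only 5 truly informative predictors
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta - 2.5))))      # binary infection outcome

Xs = StandardScaler().fit_transform(X)
lasso = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=10, cv=5, scoring="roc_auc")
lasso.fit(Xs, y)
selected = np.flatnonzero(lasso.coef_[0] != 0)
print(f"{len(selected)} predictors selected; first few indices: {selected[:10]}")
```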
Affiliation(s)
- Yaxu Zhuang
- Department of Surgery, Surgical Outcomes and Applied Research Program, University of Colorado Anschutz Medical Campus
- Department of Biostatistics and Informatics, Colorado School of Public Health
- Adam Dyas
- Department of Surgery, Surgical Outcomes and Applied Research Program, University of Colorado Anschutz Medical Campus
- Department of Surgery, School of Medicine, University of Colorado Anschutz Medical Campus
- Robert A Meguid
- Department of Surgery, Surgical Outcomes and Applied Research Program, University of Colorado Anschutz Medical Campus
- Department of Surgery, School of Medicine, University of Colorado Anschutz Medical Campus
- Adult and Child Consortium for Health Outcomes Research and Delivery Science, University of Colorado Anschutz Medical Campus, Aurora, CO
- William G Henderson
- Department of Surgery, Surgical Outcomes and Applied Research Program, University of Colorado Anschutz Medical Campus
- Michael Bronsert
- Department of Surgery, Surgical Outcomes and Applied Research Program, University of Colorado Anschutz Medical Campus
- Adult and Child Consortium for Health Outcomes Research and Delivery Science, University of Colorado Anschutz Medical Campus, Aurora, CO
- Helen Madsen
- Department of Surgery, Surgical Outcomes and Applied Research Program, University of Colorado Anschutz Medical Campus
- Department of Surgery, School of Medicine, University of Colorado Anschutz Medical Campus
- Kathryn L Colborn
- Department of Surgery, Surgical Outcomes and Applied Research Program, University of Colorado Anschutz Medical Campus
- Department of Biostatistics and Informatics, Colorado School of Public Health
- Department of Surgery, School of Medicine, University of Colorado Anschutz Medical Campus
- Adult and Child Consortium for Health Outcomes Research and Delivery Science, University of Colorado Anschutz Medical Campus, Aurora, CO
14. Perschinka F, Peer A, Joannidis M. [Artificial intelligence and acute kidney injury]. Med Klin Intensivmed Notfmed 2024; 119:199-207. PMID: 38396124. PMCID: PMC10995052. DOI: 10.1007/s00063-024-01111-5.
Abstract
Digitalization is increasingly finding its way into intensive care units, and with it artificial intelligence (AI) for critically ill patients. One promising area for the use of AI is acute kidney injury (AKI). The use of AI is primarily focused on the prediction of AKI, but further approaches are also being used to classify existing AKI into different phenotypes. Different AI models are used for prediction. The area under the receiver operating characteristic curve (AUROC) values achieved with these models vary and are influenced by several factors, such as the prediction time and the definition of AKI. Most models have an AUROC between 0.650 and 0.900, with lower values for predictions further into the future and when applying Acute Kidney Injury Network (AKIN) rather than KDIGO criteria. Classification into phenotypes already makes it possible to categorize patients into groups with different risks of mortality or requirement for renal replacement therapy (RRT), but the etiologies or therapeutic consequences derived from this are still lacking. However, all these models suffer from AI-specific shortcomings. Reliance on large historical databases means that recent changes in therapy and newly implemented biomarkers cannot be promptly incorporated in a relevant proportion of cases. For this reason, serum creatinine and urinary output, with their known limitations, dominate current AI prediction models and limit their performance. On the other hand, the increasingly complex models no longer allow physicians to understand the basis on which a warning of impending AKI is calculated and on which subsequent initiation of therapy should take place. The successful use of AI in routine clinical practice will depend largely on physicians' trust in these systems and on overcoming the aforementioned weaknesses. The clinician, however, will remain irreplaceable as the decisive authority for critically ill patients by combining measurable and non-measurable parameters.
Affiliation(s)
- Michael Joannidis
- Gemeinsame Einrichtung für Internistische Notfall- und Intensivmedizin, Department Innere Medizin, Medizinische Universität Innsbruck, Anichstraße 35, 6020, Innsbruck, Österreich.
15. Levin TR, Jensen CD, Marks AR, Schlessinger D, Liu V, Udaltsova N, Badalov J, Layefsky E, Corley DA, Nugent JR, Lee JK. Development and External Validation of a Prediction Model for Colorectal Cancer Among Patients Awaiting Surveillance Colonoscopy Following Polypectomy. Gastro Hep Advances 2024; 3:671-683. PMID: 39165417. PMCID: PMC11330934. DOI: 10.1016/j.gastha.2024.03.008.
Abstract
Background and Aims Demand for surveillance colonoscopy can sometimes exceed capacity, such as during and following the coronavirus disease 2019 pandemic, yet no tools exist to prioritize the patients most likely to be diagnosed with colorectal cancer (CRC) among those awaiting surveillance colonoscopy. We developed a multivariable prediction model for CRC at surveillance, comparing its performance to a model that assigned patients as low or high risk based solely on polyp characteristics (guideline-based model). Methods Logistic regression was used for model development among patients receiving surveillance colonoscopy in 2014-2019. Candidate predictors included index colonoscopy indication, findings, and endoscopist adenoma detection rate, and patient and clinical characteristics at surveillance. Patients were randomly divided into model development (n = 36,994) and internal validation cohorts (n = 15,854). External validation was performed on 30,015 patients receiving surveillance colonoscopy in 2020-2022, and the multivariable model was then updated and retested. Results One hundred fourteen, 43, and 71 CRCs were detected at surveillance in the 3 cohorts, respectively. Polyp size ≥10 mm, adenoma detection rate <32.5% or missing, patient age, and ever having smoked tobacco were significant CRC predictors; this multivariable model outperformed the guideline-based model (internal validation cohort area under the receiver-operating characteristic curve: 0.73, 95% confidence interval (CI): 0.66-0.81 vs 0.52, 95% CI: 0.45-0.60). Performance declined at external validation but recovered with model updating (area under the receiver-operating characteristic curve: 0.72, 95% CI: 0.66-0.77). Conclusion When surveillance colonoscopy demand exceeds capacity, a prediction model featuring common clinical predictors may help prioritize patients at highest risk for CRC among those awaiting surveillance. Also, regular model updates can address model performance drift.
Affiliation(s)
- Theodore R. Levin
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Gastroenterology Department, Kaiser Permanente Medical Center, Walnut Creek, California
- Amy R Marks
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- David Schlessinger
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Vincent Liu
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Natalia Udaltsova
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Jessica Badalov
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Evan Layefsky
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Douglas A. Corley
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Joshua R. Nugent
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Jeffrey K. Lee
- Division of Research, Kaiser Permanente Northern California, Oakland, California
16. Lasko TA, Strobl EV, Stead WW. Why do probabilistic clinical models fail to transport between sites? NPJ Digit Med 2024; 7:53. PMID: 38429353. PMCID: PMC10907678. DOI: 10.1038/s41746-024-01037-4.
Abstract
The rising popularity of artificial intelligence in healthcare is highlighting the problem that a computational model achieving super-human clinical performance at its training sites may perform substantially worse at new sites. In this perspective, we argue that we should typically expect this failure to transport, and we present common sources for it, divided into those under the control of the experimenter and those inherent to the clinical data-generating process. Among the inherent sources, we look more closely at site-specific clinical practices that can affect the data distribution, and we propose a potential solution intended to isolate the imprint of those practices on the data from the patterns of disease cause and effect that are the usual target of probabilistic clinical models.
Affiliation(s)
- Thomas A Lasko
- Vanderbilt University Medical Center, Nashville, TN, USA.
- Eric V Strobl
- Vanderbilt University Medical Center, Nashville, TN, USA
17. Andersen ES, Birk-Korch JB, Röttger R, Brasen CL, Brandslund I, Madsen JS. Monitoring performance of clinical artificial intelligence: a scoping review protocol. JBI Evid Synth 2024; 22:453-460. PMID: 38328955. DOI: 10.11124/jbies-23-00390.
Abstract
OBJECTIVE The objective of this scoping review is to describe the scope and nature of research on the monitoring of clinical artificial intelligence (AI) systems. The review will identify the various methodologies used to monitor clinical AI, while also mapping the factors that influence the selection of monitoring approaches. INTRODUCTION AI is being used in clinical decision-making at an increasing rate. While much attention has been directed toward the development and validation of AI for clinical applications, the practical implementation aspects, notably the establishment of rational monitoring/quality assurance systems, has received comparatively limited scientific interest. Given the scarcity of evidence and the heterogeneity of methodologies used in this domain, there is a compelling rationale for conducting a scoping review on this subject. INCLUSION CRITERIA This scoping review will include any publications that describe systematic, continuous, or repeated initiatives that evaluate or predict clinical performance of AI models with direct implications for the management of patients in any segment of the health care system. METHODS Publications will be identified through searches of the MEDLINE (Ovid), Embase (Ovid), and Scopus databases. Additionally, backward and forward citation searches, as well as a thorough investigation of gray literature, will be conducted. Title and abstract screening, full-text evaluation, and data extraction will be performed by 2 or more independent reviewers. Data will be extracted using a tool developed by the authors. The results will be presented graphically and narratively. REVIEW REGISTRATION Open Science Framework https://osf.io/afkrn.
Affiliation(s)
- Eline Sandvig Andersen
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Vejle, Denmark
- Johan Baden Birk-Korch
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Vejle, Denmark
- Richard Röttger
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Claus Lohman Brasen
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Vejle, Denmark
- Ivan Brandslund
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Vejle, Denmark
- Jonna Skov Madsen
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Vejle, Denmark
18. Collins GS, Dhiman P, Ma J, Schlussel MM, Archer L, Van Calster B, Harrell FE, Martin GP, Moons KGM, van Smeden M, Sperrin M, Bullock GS, Riley RD. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ 2024; 384:e074819. PMID: 38191193. PMCID: PMC10772854. DOI: 10.1136/bmj-2023-074819.
Affiliation(s)
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Paula Dhiman
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Jie Ma
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Michael M Schlussel
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Lucinda Archer
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, UK
- Ben Van Calster
- KU Leuven, Department of Development and Regeneration, Leuven, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, Netherlands
- EPI-Centre, KU Leuven, Belgium
- Frank E Harrell
- Department of Biostatistics, Vanderbilt University, Nashville, TN, USA
- Glen P Martin
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
- Karel G M Moons
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Maarten van Smeden
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Matthew Sperrin
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
- Garrett S Bullock
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, UK
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, UK
19. Deisenhofer AK, Barkham M, Beierl ET, Schwartz B, Aafjes-van Doorn K, Beevers CG, Berwian IM, Blackwell SE, Bockting CL, Brakemeier EL, Brown G, Buckman JEJ, Castonguay LG, Cusack CE, Dalgleish T, de Jong K, Delgadillo J, DeRubeis RJ, Driessen E, Ehrenreich-May J, Fisher AJ, Fried EI, Fritz J, Furukawa TA, Gillan CM, Gómez Penedo JM, Hitchcock PF, Hofmann SG, Hollon SD, Jacobson NC, Karlin DR, Lee CT, Levinson CA, Lorenzo-Luaces L, McDanal R, Moggia D, Ng MY, Norris LA, Patel V, Piccirillo ML, Pilling S, Rubel JA, Salazar-de-Pablo G, Schleider JL, Schnurr PP, Schueller SM, Siegle GJ, Uher R, Watkins E, Webb CA, Wiltsey Stirman S, Wynants L, Youn SJ, Zilcha-Mano S, Lutz W, Cohen ZD. Implementing precision methods in personalizing psychological therapies: Barriers and possible ways forward. Behav Res Ther 2024; 172:104443. DOI: 10.1016/j.brat.2023.104443.
Affiliation(s)
- Claudi L Bockting
- AmsterdamUMC, Department of Psychiatry, Research Program Amsterdam Public Health and Centre for Urban Mental Health, University of Amsterdam, the Netherlands
- Kim de Jong
- Leiden University, Institute of Psychology, the Netherlands
- Jessica Fritz
- University of Cambridge, UK; Philipps University of Marburg, Germany
- Claire M Gillan
- School of Psychology, Trinity College Institute for Neuroscience, and Global Brain Health Institute, Trinity College Dublin, Ireland
- Mei Yi Ng
- Florida International University, USA
- Jessica L Schleider
- Stony Brook University and Feinberg School of Medicine Northwestern University, USA
- Paula P Schnurr
- National Center for PTSD and Geisel School of Medicine at Dartmouth, USA
- Soo Jeong Youn
- Reliant Medical Group, OptumCare and Harvard Medical School, USA
- Zachary D Cohen
- University of California, Los Angeles and University of Arizona, USA.
20. Lou SS, Liu Y, Cohen ME, Ko CY, Hall BL, Kannampallil T. National Multi-Institutional Validation of a Surgical Transfusion Risk Prediction Model. J Am Coll Surg 2024; 238:99-105. PMID: 37737660. DOI: 10.1097/xcs.0000000000000874.
Abstract
BACKGROUND Accurate estimation of surgical transfusion risk is important for many aspects of surgical planning, yet few methods are available for estimating such risk. There is a need for reliable, validated methods of transfusion risk stratification to support effective perioperative planning and resource stewardship. STUDY DESIGN This study was conducted using the American College of Surgeons NSQIP datafile from 2019. The performance of S-PATH, a previously developed surgical transfusion risk prediction model, was evaluated at each contributing hospital, with and without hospital-specific model tuning. Linear regression was used to assess the relationship between hospital characteristics and the area under the receiver operating characteristic curve (AUROC). RESULTS A total of 1,000,927 surgical cases from 414 hospitals were evaluated. Aggregate AUROC was 0.910 (95% CI 0.904 to 0.916) without model tuning and 0.925 (95% CI 0.919 to 0.931) with model tuning. AUROC varied across individual hospitals (median 0.900, interquartile range 0.849 to 0.944), but no statistically significant relationships were found between the hospital-level characteristics studied and model AUROC. CONCLUSIONS S-PATH demonstrated excellent discriminative performance, although there was variation across hospitals that was not well explained by hospital-level characteristics. These results highlight S-PATH's viability as a generalizable surgical transfusion risk prediction tool.
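To make the hospital-level validation concrete, the following sketch (an illustration on synthetic data, not the authors' code) scores a fixed risk model at each hospital and then regresses hospital-level AUROC on a hospital characteristic; the column names (hospital_id, pred_risk, transfused, bed_size) are hypothetical placeholders.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "hospital_id": rng.integers(0, 20, n),   # 20 simulated hospitals
    "pred_risk": rng.uniform(0, 1, n),       # score from a pre-trained model
})
# Simulated transfusion outcome, loosely correlated with the predicted risk.
df["transfused"] = rng.binomial(1, 0.05 + 0.25 * df["pred_risk"])

# Discrimination at each hospital (hospitals with a single outcome class are skipped).
rows = []
for hid, g in df.groupby("hospital_id"):
    if g["transfused"].nunique() == 2:
        rows.append({"hospital_id": hid,
                     "auroc": roc_auc_score(g["transfused"], g["pred_risk"]),
                     "n_cases": len(g)})
auroc_df = pd.DataFrame(rows)

# Relate hospital-level AUROC to a (simulated) hospital characteristic.
auroc_df["bed_size"] = rng.integers(100, 900, len(auroc_df))
fit = linregress(auroc_df["bed_size"], auroc_df["auroc"])
print(auroc_df[["hospital_id", "auroc", "n_cases"]].head())
print(f"slope = {fit.slope:.5f}, p = {fit.pvalue:.3f}")
```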
Collapse
Affiliation(s)
- Sunny S Lou
- From the Department of Anesthesiology, Washington University School of Medicine, St Louis, MO (Lou, Kannampallil)
| | - Yaoming Liu
- Division of Research and Optimal Patient Care, American College of Surgeons, Chicago, IL (Liu, Ko, Hall, Cohen)
| | - Mark E Cohen
- Division of Research and Optimal Patient Care, American College of Surgeons, Chicago, IL (Liu, Ko, Hall, Cohen)
| | - Clifford Y Ko
- Division of Research and Optimal Patient Care, American College of Surgeons, Chicago, IL (Liu, Ko, Hall, Cohen)
- Department of Surgery, David Geffen School of Medicine, University of California Los Angeles, and the VA Greater Los Angeles Health System, Los Angeles, CA (Ko)
| | - Bruce L Hall
- Division of Research and Optimal Patient Care, American College of Surgeons, Chicago, IL (Liu, Ko, Hall, Cohen)
- Department of Surgery, Washington University School of Medicine; Center for Health Policy and the Olin Business School at Washington University in St Louis; John Cochran Veterans Affairs Medical Center; and BJC Healthcare, St Louis, MO (Hall)
| | - Thomas Kannampallil
- From the Department of Anesthesiology, Washington University School of Medicine, St Louis, MO (Lou, Kannampallil)
| |
Collapse
|
21
|
Bednorz A, Mak JKL, Jylhävä J, Religa D. Use of Electronic Medical Records (EMR) in Gerontology: Benefits, Considerations and a Promising Future. Clin Interv Aging 2023; 18:2171-2183. [PMID: 38152074 PMCID: PMC10752027 DOI: 10.2147/cia.s400887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2023] [Accepted: 11/05/2023] [Indexed: 12/29/2023] Open
Abstract
Electronic medical records (EMRs) have many benefits in clinical research in gerontology, enabling data analysis, development of prognostic tools, and disease risk prediction. EMRs also offer a range of advantages in clinical practice, such as comprehensive medical records, streamlined communication with healthcare providers, remote data access, and rapid retrieval of test results. These features lead to increased efficiency, enhanced patient safety, and improved quality of care in gerontology, with benefits that include reduced medication use and better history taking and physical examination. The use of artificial intelligence (AI) and machine learning (ML) approaches on EMRs can further improve disease diagnosis and symptom classification and support clinical decision-making. However, there are also challenges related to data quality and data entry errors, as well as the ethics and safety of using AI in healthcare. This article discusses the future of EMRs in gerontology and the application of AI and ML in clinical research. Ethical and legal issues surrounding data sharing and the need for healthcare professionals to critically evaluate and integrate these technologies are also emphasized. The article concludes by discussing the challenges related to the use of EMRs in research as well as in their primary intended use, daily clinical practice.
Collapse
Affiliation(s)
- Adam Bednorz
- John Paul II Geriatric Hospital, Katowice, Poland
- Institute of Psychology, Humanitas Academy, Sosnowiec, Poland
| | - Jonathan K L Mak
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Juulia Jylhävä
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Faculty of Social Sciences (Health Sciences) and Gerontology Research Center (GEREC), University of Tampere, Tampere, Finland
| | - Dorota Religa
- Division of Clinical Geriatrics, Department of Neurobiology, Care sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Theme Inflammation and Aging, Karolinska University Hospital, Huddinge, Sweden
| |
Collapse
|
22
|
Bergquist T, Schaffter T, Yan Y, Yu T, Prosser J, Gao J, Chen G, Charzewski Ł, Nawalany Z, Brugere I, Retkute R, Prusokiene A, Prusokas A, Choi Y, Lee S, Choe J, Lee I, Kim S, Kang J, Mooney SD, Guinney J. Evaluation of crowdsourced mortality prediction models as a framework for assessing artificial intelligence in medicine. J Am Med Inform Assoc 2023; 31:35-44. [PMID: 37604111 PMCID: PMC10746301 DOI: 10.1093/jamia/ocad159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2023] [Revised: 07/05/2023] [Accepted: 08/08/2023] [Indexed: 08/23/2023] Open
Abstract
OBJECTIVE Applications of machine learning in healthcare are of high interest and have the potential to improve patient care. Yet, the real-world accuracy of these models in clinical practice and on different patient subpopulations remains unclear. To address these important questions, we hosted a community challenge to evaluate methods that predict healthcare outcomes. We focused on the prediction of all-cause mortality as the community challenge question. MATERIALS AND METHODS Using a Model-to-Data framework, 345 registered participants, coalescing into 25 independent teams spread over 3 continents and 10 countries, generated 25 accurate models, all trained on a dataset of over 1.1 million patients and evaluated on a patient cohort collected prospectively over a 1-year observation period in a large health system. RESULTS The top-performing team achieved a final area under the receiver operating characteristic curve of 0.947 (95% CI, 0.942-0.951) and an area under the precision-recall curve of 0.487 (95% CI, 0.458-0.499) on the prospectively collected patient cohort. DISCUSSION Post hoc analysis after the challenge revealed that models differed in accuracy across subpopulations delineated by race or gender, even when they were trained on the same data. CONCLUSION This is the largest community challenge to date focused on the evaluation of state-of-the-art machine learning methods in a healthcare system, revealing both opportunities and pitfalls of clinical AI.
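A minimal sketch of the kind of post hoc subgroup evaluation described above, assuming synthetic scores and placeholder demographic columns rather than the challenge data: overall AUROC and AUPRC are computed and then recomputed within strata defined by race or sex.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(42)
n = 20000
df = pd.DataFrame({
    "sex": rng.choice(["F", "M"], n),
    "race": rng.choice(["A", "B", "C"], n),
    "score": rng.uniform(0, 1, n),          # model output (synthetic)
})
# Synthetic mortality outcome weakly driven by the score.
df["died"] = rng.binomial(1, 0.02 + 0.08 * df["score"])

print("overall AUROC:", round(roc_auc_score(df["died"], df["score"]), 3))
print("overall AUPRC:", round(average_precision_score(df["died"], df["score"]), 3))

# Same metrics by subgroup; large gaps here would flag subpopulation disparities.
for col in ["sex", "race"]:
    for level, g in df.groupby(col):
        print(f"{col}={level}: "
              f"AUROC={roc_auc_score(g['died'], g['score']):.3f}, "
              f"AUPRC={average_precision_score(g['died'], g['score']):.3f}")
```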
Collapse
Affiliation(s)
- Timothy Bergquist
- Sage Bionetworks, Seattle, WA, United States
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
| | | | - Yao Yan
- Sage Bionetworks, Seattle, WA, United States
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, United States
| | - Thomas Yu
- Sage Bionetworks, Seattle, WA, United States
| | - Justin Prosser
- Institute of Translational Health Sciences, University of Washington, Seattle, WA, United States
| | - Jifan Gao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
| | - Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
| | - Łukasz Charzewski
- Proacta, Warsaw, Poland
- Division of Biophysics, University of Warsaw, Warsaw, Poland
| | | | - Ivan Brugere
- Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
| | - Renata Retkute
- Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
| | - Alisa Prusokiene
- Plant and Molecular Sciences, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Augustinas Prusokas
- Department of Life Sciences, Imperial College London, London, United Kingdom
| | - Yonghwa Choi
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
| | - Sanghoon Lee
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
| | - Junseok Choe
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
| | - Inggeol Lee
- Department of Interdisciplinary Program in Bioinformatics, College of Informatics, Korea University, Seoul, Republic of Korea
| | - Sunkyu Kim
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
| | - Jaewoo Kang
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Department of Interdisciplinary Program in Bioinformatics, College of Informatics, Korea University, Seoul, Republic of Korea
| | - Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
| | - Justin Guinney
- Sage Bionetworks, Seattle, WA, United States
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
| |
Collapse
|
23
|
Riley S, Tam K, Tse WY, Connor A, Wei Y. An external validation of the Kidney Donor Risk Index in the UK transplant population in the presence of semi-competing events. Diagn Progn Res 2023; 7:20. [PMID: 37986130 PMCID: PMC10662562 DOI: 10.1186/s41512-023-00159-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/13/2023] [Accepted: 09/11/2023] [Indexed: 11/22/2023] Open
Abstract
BACKGROUND Transplantation represents the optimal treatment for many patients with end-stage kidney disease. When a donor kidney is available to a waitlisted patient, clinicians responsible for the care of the potential recipient must make the decision to accept or decline the offer based upon complex and variable information about the donor, the recipient and the transplant process. A clinical prediction model may be able to support clinicians in their decision-making. The Kidney Donor Risk Index (KDRI) was developed in the United States to predict graft failure following kidney transplantation. The survival process following transplantation consists of semi-competing events where death precludes graft failure, but not vice versa. METHODS We externally validated the KDRI in the UK kidney transplant population and assessed whether validation under a semi-competing risks framework impacted predictive performance. Additionally, we explored whether the KDRI requires updating. We included 20,035 adult recipients of first, deceased donor, single, kidney-only transplants between January 1, 2004, and December 31, 2018, collected by the UK Transplant Registry and held by NHS Blood and Transplant. The outcomes of interest were 1- and 5-year graft failure following transplantation. In light of the semi-competing events, recipient death was handled in two ways: censoring patients at the time of death, and modelling death as a competing event. Cox proportional hazards models were used to validate the KDRI when censoring graft failure by death, and cause-specific Cox models were used to account for death as a competing event. RESULTS The KDRI underestimated event probabilities for those at higher risk of graft failure. For 5-year graft failure, discrimination was poorer in the semi-competing risks model (0.625, 95% CI 0.611 to 0.640, vs 0.611, 95% CI 0.597 to 0.625), but predictions were more accurate (Brier score 0.117, 95% CI 0.112 to 0.121, vs 0.114, 95% CI 0.109 to 0.118). Calibration plots were similar regardless of whether death was modelled as a competing event or not. Updating the KDRI worsened calibration but marginally improved discrimination. CONCLUSIONS Predictive performance for 1-year graft failure was similar between death-censored and competing-event graft failure, but differences appeared when predicting 5-year graft failure. The updated index did not have superior performance and we conclude that updating the KDRI in its present form is not required.
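The competing-risk distinction can be illustrated with a short sketch that uses synthetic data and non-parametric estimators from the lifelines package, rather than the Cox and cause-specific Cox models used in the study: censoring deaths and taking 1 minus the Kaplan-Meier survival overstates the cumulative incidence of graft failure relative to the Aalen-Johansen estimate, which treats death as a competing event.

```python
import numpy as np
from lifelines import KaplanMeierFitter, AalenJohansenFitter

rng = np.random.default_rng(1)
n = 5000
t_graft = rng.exponential(12.0, n)        # latent time to graft failure (years)
t_death = rng.exponential(10.0, n)        # latent time to death with a functioning graft
t_admin = rng.uniform(4.0, 8.0, n)        # administrative censoring

time = np.minimum.reduce([t_graft, t_death, t_admin])
event = np.select([t_graft <= time, t_death <= time], [1, 2], default=0)

# Naive approach: censor deaths and estimate 1 - S(t) for graft failure.
kmf = KaplanMeierFitter().fit(time, event_observed=(event == 1))
naive_5y = 1 - kmf.predict(5.0)

# Competing-risks approach: Aalen-Johansen cumulative incidence of graft failure.
ajf = AalenJohansenFitter().fit(time, event, event_of_interest=1)
cif = ajf.cumulative_density_
cif_5y = float(cif[cif.index <= 5.0].iloc[-1, 0])

print(f"naive 1-KM at 5 years: {naive_5y:.3f}")
print(f"Aalen-Johansen CIF at 5 years: {cif_5y:.3f}")
```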
Collapse
Affiliation(s)
- Stephanie Riley
- Centre for Mathematical Sciences, School of Engineering, Computing and Mathematics, University of Plymouth, Plymouth, UK.
| | - Kimberly Tam
- School of Engineering, Computing and Mathematics, University of Plymouth, Plymouth, UK
| | - Wai-Yee Tse
- Department of Renal Medicine, South West Transplant Centre, University Hospitals Plymouth NHS Trust, Plymouth, UK
| | - Andrew Connor
- Department of Renal Medicine, South West Transplant Centre, University Hospitals Plymouth NHS Trust, Plymouth, UK
| | - Yinghui Wei
- Centre for Mathematical Sciences, School of Engineering, Computing and Mathematics, University of Plymouth, Plymouth, UK.
| |
Collapse
|
24
|
Bullock GS, Ward P, Impellizzeri FM, Kluzek S, Hughes T, Dhiman P, Riley RD, Collins GS. The Trade Secret Taboo: Open Science Methods are Required to Improve Prediction Models in Sports Medicine and Performance. Sports Med 2023; 53:1841-1849. [PMID: 37160562 DOI: 10.1007/s40279-023-01849-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/25/2023] [Indexed: 05/11/2023]
Abstract
Clinical prediction models in sports medicine that utilize regression or machine learning techniques have become more widely published, used, and disseminated. However, these models are typically characterized by poor methodology, incomplete reporting, and an inadequate evaluation of performance, leading to unreliable predictions and weak clinical utility within their intended sport population. Before implementation in practice, models require a thorough evaluation. Strong, replicable methods and transparent reporting allow practitioners and researchers to make independent judgments as to a model's validity, performance, clinical usefulness, and confidence that it will do no harm. However, this is not reflected in the sports medicine literature. As shown in a recent systematic review of models for predicting sports injury, most were characterized by poor methodology, incomplete reporting, and inadequate performance evaluation. Because of constraints imposed by data from individual teams, the development of accurate, reliable, and useful models is highly reliant on external validation. However, a barrier to collaboration is a desire to maintain a competitive advantage; a team's proprietary information is often perceived as high value, and so these 'trade secrets' are frequently guarded. These 'trade secrets' also apply to commercially available models, as developers are unwilling to share proprietary (and potentially profitable) development and validation information. In this Current Opinion, we: (1) argue that open science is essential for improving sport prediction models and (2) critically examine sport prediction models for open science practices.
Collapse
Affiliation(s)
- Garrett S Bullock
- Department of Orthopaedic Surgery and Rehabilitation, Wake Forest School of Medicine, 475 Vine St., Winston-Salem, NC, 27101, USA.
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC, USA.
- Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, UK.
| | | | - Franco M Impellizzeri
- School of Sport, Exercise, and Rehabilitation, University of Technology Sydney, Sydney, NSW, Australia
| | - Stefan Kluzek
- Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, UK
- Sports Medicine Research Department, University of Nottingham, Nottingham, UK
- English Institute of Sport, Bisham Abbey, UK
| | - Tom Hughes
- Manchester United Football Club, Manchester, UK
- Department of Health Professions, Manchester Metropolitan University, Manchester, UK
| | - Paula Dhiman
- Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, UK
- Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Richard D Riley
- Centre for Prognosis Research, School of Medicine, Keele University, Keele, UK
| | - Gary S Collins
- Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, UK
- Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| |
Collapse
|
25
|
Bottani S, Burgos N, Maire A, Saracino D, Ströer S, Dormont D, Colliot O. Evaluation of MRI-based machine learning approaches for computer-aided diagnosis of dementia in a clinical data warehouse. Med Image Anal 2023; 89:102903. [PMID: 37523918 DOI: 10.1016/j.media.2023.102903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2022] [Revised: 06/01/2023] [Accepted: 07/12/2023] [Indexed: 08/02/2023]
Abstract
A variety of algorithms have been proposed for computer-aided diagnosis of dementia from anatomical brain MRI. These approaches achieve high accuracy when applied to research data sets, but their performance on real-life clinical routine data has not yet been evaluated. The aim of this work was to study the performance of such approaches on clinical routine data, based on a hospital data warehouse, and to compare the results to those obtained on a research data set. The clinical data set was extracted from the hospital data warehouse of the Greater Paris area, which includes 39 different hospitals. The research set was composed of data from the Alzheimer's Disease Neuroimaging Initiative data set. In the clinical set, the population of interest was identified by exploiting the diagnostic codes from the 10th revision of the International Classification of Diseases that are assigned to each patient. We studied how the imbalance of the training sets, in terms of contrast agent injection and image quality, may bias the results. We demonstrated that computer-aided diagnosis performance was strongly biased upwards (by over 17 percentage points of balanced accuracy) by the confounders of image quality and contrast agent injection, a phenomenon known as the Clever Hans effect or shortcut learning. When these biases were removed, the performance was very poor. In any case, the performance was considerably lower than on the research data set. Our study highlights that there are still considerable challenges in translating dementia computer-aided diagnosis systems to clinical routine.
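A toy sketch of the shortcut-learning problem described above (entirely synthetic, not the paper's imaging pipeline): a classifier that keys on a confounder such as contrast-agent injection can look accurate overall while collapsing toward chance within confounder strata.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(13)
n = 4000
contrast = rng.binomial(1, 0.5, n)                       # confounder: contrast injection
# Diagnosis correlated with the confounder (e.g. demented patients imaged with contrast more often).
dementia = rng.binomial(1, np.where(contrast == 1, 0.75, 0.25))
# A "Clever Hans" classifier that mostly keys on the confounder, not the disease.
pred = np.where(rng.random(n) < 0.9, contrast, rng.binomial(1, 0.5, n))

print("overall balanced accuracy:",
      round(balanced_accuracy_score(dementia, pred), 3))
# Within each confounder stratum the apparent skill largely disappears.
for c in (0, 1):
    mask = contrast == c
    print(f"within contrast={c}: ",
          round(balanced_accuracy_score(dementia[mask], pred[mask]), 3))
```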
Collapse
Affiliation(s)
- Simona Bottani
- Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, Paris, 75013, France
| | - Ninon Burgos
- Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, Paris, 75013, France
| | | | - Dario Saracino
- Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, Paris, 75013, France; IM2A, Reference Centre for Rare or Early-Onset Dementias, Département de Neurologie, AP-HP, Hôpital de la Pitié Salpêtrière, Paris, 75013, France
| | - Sebastian Ströer
- AP-HP, Hôpital de la Pitié Salpêtrière, Department of Neuroradiology, Paris, 75013, France
| | - Didier Dormont
- AP-HP, Hôpital de la Pitié Salpêtrière, Department of Neuroradiology, Paris, 75013, France; Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, DMU DIAMENT, Paris, 75013, France
| | - Olivier Colliot
- Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, Paris, 75013, France.
| |
Collapse
|
26
|
Vaid A, Sawant A, Suarez-Farinas M, Lee J, Kaul S, Kovatch P, Freeman R, Jiang J, Jayaraman P, Fayad Z, Argulian E, Lerakis S, Charney AW, Wang F, Levin M, Glicksberg B, Narula J, Hofer I, Singh K, Nadkarni GN. Implications of the Use of Artificial Intelligence Predictive Models in Health Care Settings : A Simulation Study. Ann Intern Med 2023; 176:1358-1369. [PMID: 37812781 DOI: 10.7326/m23-0949] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 10/11/2023] Open
Abstract
BACKGROUND Substantial effort has been directed toward demonstrating uses of predictive models in health care. However, implementation of these models into clinical practice may influence patient outcomes, which in turn are captured in electronic health record data. As a result, deployed models may affect the predictive ability of current and future models. OBJECTIVE To estimate changes in predictive model performance with use through 3 common scenarios: model retraining, sequentially implementing 1 model after another, and intervening in response to a model when 2 are simultaneously implemented. DESIGN Simulation of model implementation and use in critical care settings at various levels of intervention effectiveness and clinician adherence. Models were either trained or retrained after simulated implementation. SETTING Admissions to the intensive care unit (ICU) at Mount Sinai Health System (New York, New York) and Beth Israel Deaconess Medical Center (Boston, Massachusetts). PATIENTS 130 000 critical care admissions across both health systems. INTERVENTION Across 3 scenarios, interventions were simulated at varying levels of clinician adherence and effectiveness. MEASUREMENTS Statistical measures of performance, including threshold-independent (area under the curve) and threshold-dependent measures. RESULTS At a fixed 90% sensitivity, a mortality prediction model lost 9% to 39% specificity after a single retraining in scenario 1, and lost 8% to 15% specificity in scenario 2 when it was created after the implementation of an acute kidney injury (AKI) prediction model; in scenario 3, when models for AKI and mortality prediction were implemented simultaneously, each reduced the effective accuracy of the other by 1% to 28%. LIMITATIONS In real-world practice, the effectiveness of and adherence to model-based recommendations are rarely known in advance. Only binary classifiers for tabular ICU admissions data were simulated. CONCLUSION In simulated ICU settings, a universally effective model-updating approach for maintaining model performance does not seem to exist. Model use may have to be recorded to maintain the viability of predictive modeling. PRIMARY FUNDING SOURCE National Center for Advancing Translational Sciences.
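The core feedback mechanism can be sketched in a few lines (a simplified simulation under assumed adherence and effectiveness values, not the authors' ICU simulation): after deployment, an effective intervention averts some events among flagged patients, and a model retrained on those post-deployment labels can lose specificity at a fixed 90% sensitivity when judged against the untreated outcome process.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

rng = np.random.default_rng(7)

def make_cohort(n=20000):
    X = rng.normal(size=(n, 5))
    p = 1 / (1 + np.exp(-(X @ np.array([1.0, 0.8, -0.6, 0.4, 0.0]) - 2)))
    return X, rng.binomial(1, p)

def spec_at_90_sens(y_true, scores):
    fpr, tpr, _ = roc_curve(y_true, scores)
    return 1 - fpr[np.searchsorted(tpr, 0.90)]   # specificity at the first threshold reaching 90% sensitivity

# Train and evaluate the original model on pre-deployment data.
X0, y0 = make_cohort()
model = LogisticRegression(max_iter=1000).fit(X0, y0)
Xt, yt = make_cohort()                            # untreated evaluation cohort
print("specificity @90% sens, pre-deployment:",
      round(spec_at_90_sens(yt, model.predict_proba(Xt)[:, 1]), 3))

# Deployment: an intervention (assumed 70% adherence, 50% effectiveness) averts
# a fraction of events among the flagged top decile, changing observed labels.
X1, y1 = make_cohort()
risk = model.predict_proba(X1)[:, 1]
flagged = risk >= np.quantile(risk, 0.9)
averted = flagged & (rng.random(len(y1)) < 0.7 * 0.5)
y1_obs = np.where(averted, 0, y1)

# Retrain on post-deployment labels and re-evaluate on the untreated cohort.
retrained = LogisticRegression(max_iter=1000).fit(X1, y1_obs)
print("specificity @90% sens, after retraining on post-deployment data:",
      round(spec_at_90_sens(yt, retrained.predict_proba(Xt)[:, 1]), 3))
```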
Collapse
Affiliation(s)
- Akhil Vaid
- Division of Data-Driven and Digital Medicine, Department of Medicine, and The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York (A.V., P.J.)
| | - Ashwin Sawant
- Division of Data-Driven and Digital Medicine, Department of Medicine; The Charles Bronfman Institute of Personalized Medicine; and Division of Hospital Medicine, Icahn School of Medicine at Mount Sinai, New York, New York (A.S.)
| | - Mayte Suarez-Farinas
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York (M.S., J.L.)
| | - Juhee Lee
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York (M.S., J.L.)
| | - Sanjeev Kaul
- Department of Surgery, Hackensack Meridian School of Medicine, Nutley, New Jersey (S.K.)
| | - Patricia Kovatch
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York (P.K., B.G.)
| | - Robert Freeman
- Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York (R.F.)
| | - Joy Jiang
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York (J.J.)
| | - Pushkala Jayaraman
- Division of Data-Driven and Digital Medicine, Department of Medicine, and The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York (A.V., P.J.)
| | - Zahi Fayad
- BioMedical Engineering and Imaging Institute and Mount Sinai Heart, Icahn School of Medicine at Mount Sinai, New York, New York (Z.F.)
| | - Edgar Argulian
- Mount Sinai Heart, Icahn School of Medicine at Mount Sinai, New York, New York (E.A., S.L., J.N.)
| | - Stamatios Lerakis
- Mount Sinai Heart, Icahn School of Medicine at Mount Sinai, New York, New York (E.A., S.L., J.N.)
| | - Alexander W Charney
- The Charles Bronfman Institute of Personalized Medicine and Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, and Department of Surgery, Hackensack Meridian School of Medicine, Nutley, New Jersey (A.W.C.)
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, New York, New York (F.W.)
| | - Matthew Levin
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, New York (M.L.)
| | - Benjamin Glicksberg
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York (P.K., B.G.)
| | - Jagat Narula
- Mount Sinai Heart, Icahn School of Medicine at Mount Sinai, New York, New York (E.A., S.L., J.N.)
| | - Ira Hofer
- Division of Data-Driven and Digital Medicine, Department of Medicine; The Charles Bronfman Institute of Personalized Medicine; and Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York (I.H.)
| | - Karandeep Singh
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan (K.S.)
| | - Girish N Nadkarni
- Division of Data-Driven and Digital Medicine, Department of Medicine; The Charles Bronfman Institute of Personalized Medicine; and Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York (G.N.N.)
| |
Collapse
|
27
|
Wu L, Li Y, Zhang X, Chen X, Li D, Nie S, Li X, Bellou A. Prediction differences and implications of acute kidney injury with and without urine output criteria in adult critically ill patients. Nephrol Dial Transplant 2023; 38:2368-2378. [PMID: 37019835 PMCID: PMC10539235 DOI: 10.1093/ndt/gfad065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2022] [Indexed: 04/07/2023] Open
Abstract
BACKGROUND Due to the convenience of serum creatinine (SCr) monitoring and the relative complexity of urine output (UO) monitoring, most studies have predicted acute kidney injury (AKI) based only on SCr criteria. This study aimed to compare the differences between SCr criteria alone and combined SCr and UO criteria in predicting AKI. METHODS We applied machine learning methods to evaluate the performance of 13 prediction models composed of different feature categories on 16 risk assessment tasks (half used only SCr criteria, half used both SCr and UO criteria). The area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC) and calibration were used to assess prediction performance. RESULTS In the first week after ICU admission, the prevalence of any AKI was 29% under SCr criteria alone and increased to 60% when the UO criteria were added. Adding the UO criteria to the SCr criteria identified significantly more patients with AKI. The predictive importance of feature types differed with and without the UO criteria. Using only laboratory data maintained predictive performance similar to the full-feature model under SCr criteria alone [e.g. for AKI within the 48-h time window after 1 day of ICU admission, AUROC (95% confidence interval) 0.83 (0.82, 0.84) vs 0.84 (0.83, 0.85)], but it was not sufficient when the UO criteria were added [corresponding AUROC (95% confidence interval) 0.75 (0.74, 0.76) vs 0.84 (0.83, 0.85)]. CONCLUSIONS This study found that SCr and UO measures should not be regarded as equivalent criteria for AKI staging, and it emphasizes the importance and necessity of UO criteria in AKI risk assessment.
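As a narrow illustration of what the UO criteria involve operationally (a hedged sketch with hypothetical column names, not the study's feature or labelling pipeline), the KDIGO stage 1 urine-output criterion of less than 0.5 mL/kg/h sustained for at least 6 consecutive hours can be flagged from hourly urine-output records with a rolling window.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
hours = pd.date_range("2022-01-01", periods=48, freq="h")
df = pd.DataFrame({
    "charttime": hours,
    "urine_ml": rng.gamma(shape=2.0, scale=30.0, size=len(hours)),  # synthetic hourly UO
})
weight_kg = 80.0  # assumed patient weight

df["uo_ml_per_kg_h"] = df["urine_ml"] / weight_kg
df["below_threshold"] = df["uo_ml_per_kg_h"] < 0.5

# Rolling 6-hour window: criterion met when all 6 consecutive hours are below 0.5 mL/kg/h.
df["uo_aki_stage1"] = (
    df["below_threshold"].rolling(window=6, min_periods=6).sum().eq(6)
)
print(df.loc[df["uo_aki_stage1"], ["charttime", "uo_ml_per_kg_h"]].head())
```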
Collapse
Affiliation(s)
- Lijuan Wu
- Institute of Sciences in Emergency Medicine, Department of Emergency Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
- Medical Research Institute, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
| | - Yanqin Li
- Division of Nephrology, Nanfang Hospital, Southern Medical University; National Clinical Research Center for Kidney Disease; State Key Laboratory of Organ Failure Research; Guangdong Provincial Institute of Nephrology; Guangdong Provincial Key Laboratory of Renal Failure Research, Guangzhou, China
| | - Xiangzhou Zhang
- Big Data Decision Institute, Jinan University, Guangzhou, China
| | - Xuanhui Chen
- Medical Big Data Center, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, Guangdong Province, China
| | - Deyang Li
- Institute of Sciences in Emergency Medicine, Department of Emergency Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
| | - Sheng Nie
- Division of Nephrology, Nanfang Hospital, Southern Medical University; National Clinical Research Center for Kidney Disease; State Key Laboratory of Organ Failure Research; Guangdong Provincial Institute of Nephrology; Guangdong Provincial Key Laboratory of Renal Failure Research, Guangzhou, China
| | - Xin Li
- Department of Emergency Medicine, Guangdong Provincial People's Hospital, (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, Guangdong, China
| | - Abdelouahab Bellou
- Institute of Sciences in Emergency Medicine, Department of Emergency Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
- Department of Emergency Medicine, Guangdong Provincial People's Hospital, (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, Guangdong, China
- Department of Emergency Medicine, Wayne State University School of Medicine, Detroit, MI, USA
- Global Network on Emergency Medicine, Brookline, MA, USA
| |
Collapse
|
28
|
Luther SL, Thomason SS, Sabharwal S, Finch DK, McCart J, Toyinbo P, Bouayad L, Lapcevic W, Hahm B, Hauser RG, Matheny ME, Powell-Cope G. Machine learning to develop a predictive model of pressure injury in persons with spinal cord injury. Spinal Cord 2023; 61:513-520. [PMID: 37598263 DOI: 10.1038/s41393-023-00924-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2022] [Revised: 07/18/2023] [Accepted: 07/24/2023] [Indexed: 08/21/2023]
Abstract
STUDY DESIGN A 5-year longitudinal, retrospective, cohort study. OBJECTIVES Develop a prediction model based on electronic health record (EHR) data to identify veterans with spinal cord injury/diseases (SCI/D) at highest risk for new pressure injuries (PIs). SETTING Structured (coded) and text EHR data for veterans with SCI/D treated in a VHA SCI/D Center between October 1, 2008, and September 30, 2013. METHODS A total of 4709 veterans were available for analysis after randomly selecting 175 to act as a validation (gold standard) sample. Machine learning models were created using ten-fold cross validation and three techniques: (1) two-step logistic regression; (2) a regression model employing the adaptive LASSO; and (3) gradient boosting. Models based on each method were compared using area under the receiver operating characteristic curve (AUC) analysis. RESULTS The AUC value for the gradient boosting model was 0.62 (95% CI = 0.54-0.70), for the logistic regression model it was 0.67 (95% CI = 0.59-0.75), and for the adaptive LASSO model it was 0.72 (95% CI = 0.65-0.80). Based on these results, the adaptive LASSO model was chosen for interpretation. The strongest predictors of new PI cases were fewer total days in the hospital in the year before the annual exam, higher vs. lower weight, and more severe vs. less severe grade of injury based on the American Spinal Injury Association (ASIA) Impairment Scale. CONCLUSIONS While the analyses resulted in a potentially useful predictive model, clinical implications were limited because modifiable risk factors were absent from the models.
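A sketch of the model-comparison step under stated assumptions (synthetic tabular data; the adaptive LASSO approximated by the usual feature-rescaling trick, with the initial weights computed on the full data for brevity rather than inside each fold): candidate models are compared by 10-fold cross-validated AUC.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

X, y = make_classification(n_samples=4000, n_features=30, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)

# Initial ridge-penalised fit provides the adaptive weights |beta|.
init = make_pipeline(StandardScaler(),
                     LogisticRegression(penalty="l2", C=1.0, max_iter=2000)).fit(X, y)
w = np.abs(init.named_steps["logisticregression"].coef_.ravel()) + 1e-6

models = {
    "adaptive LASSO (approx.)": make_pipeline(
        StandardScaler(),
        FunctionTransformer(lambda Z: Z * w),   # rescaling implements the adaptive penalty
        LogisticRegression(penalty="l1", solver="liblinear", C=0.5)),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, m in models.items():
    auc = cross_val_score(m, X, y, cv=10, scoring="roc_auc")
    print(f"{name}: AUC = {auc.mean():.3f} (sd {auc.std():.3f})")
```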
Collapse
Affiliation(s)
- Stephen L Luther
- Research Service, James A. Haley Veterans' Hospital, Tampa, FL, USA.
- College of Public Health, University of South Florida, Tampa, FL, USA.
| | | | - Sunil Sabharwal
- VA Boston Health Care System, Spinal Cord Injury Service, Harvard Medical School, Boston, MA, USA
- Department of Physical Medicine and Rehabilitation, Harvard Medical School, Boston, MA, USA
| | - Dezon K Finch
- Research Service, James A. Haley Veterans' Hospital, Tampa, FL, USA
| | - James McCart
- Research Service, James A. Haley Veterans' Hospital, Tampa, FL, USA
| | - Peter Toyinbo
- Research Service, James A. Haley Veterans' Hospital, Tampa, FL, USA
| | - Lina Bouayad
- College of Business, Florida International University, Miami, FL, USA
| | - William Lapcevic
- Research Service, James A. Haley Veterans' Hospital, Tampa, FL, USA
| | - Bridget Hahm
- Research Service, James A. Haley Veterans' Hospital, Tampa, FL, USA
| | | | - Michael E Matheny
- Geriatrics Research Education and Clinical Care, Tennessee Valley Healthcare System, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Division of General Internal Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Research & Development Service, Tennessee Valley Healthcare System, Nashville, TN College of Nursing, Nashville, TN, USA
| | | |
Collapse
|
29
|
Rahmani K, Thapa R, Tsou P, Casie Chetty S, Barnes G, Lam C, Foon Tso C. Assessing the effects of data drift on the performance of machine learning models used in clinical sepsis prediction. Int J Med Inform 2023; 173:104930. [PMID: 36893656 DOI: 10.1016/j.ijmedinf.2022.104930] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2022] [Revised: 10/30/2022] [Accepted: 11/15/2022] [Indexed: 11/21/2022]
Abstract
BACKGROUND Data drift can negatively impact the performance of machine learning algorithms (MLAs) that were trained on historical data. As such, MLAs should be continuously monitored and tuned to overcome the systematic changes that occur in the distribution of data. In this paper, we study the extent of data drift and provide insights about its characteristics for sepsis onset prediction. This study will help elucidate the nature of data drift for prediction of sepsis and similar diseases. This may aid with the development of more effective patient monitoring systems that can stratify risk for dynamic disease states in hospitals. METHODS We devise a series of simulations that measure the effects of data drift in patients with sepsis, using electronic health records (EHR). We simulate multiple scenarios in which data drift may occur, namely the change in the distribution of the predictor variables (covariate shift), the change in the statistical relationship between the predictors and the target (concept shift), and the occurrence of a major healthcare event (major event) such as the COVID-19 pandemic. We measure the impact of data drift on model performances, identify the circumstances that necessitate model retraining, and compare the effects of different retraining methodologies and model architecture on the outcomes. We present the results for two different MLAs, eXtreme Gradient Boosting (XGB) and Recurrent Neural Network (RNN). RESULTS Our results show that the properly retrained XGB models outperform the baseline models in all simulation scenarios, hence signifying the existence of data drift. In the major event scenario, the area under the receiver operating characteristic curve (AUROC) at the end of the simulation period is 0.811 for the baseline XGB model and 0.868 for the retrained XGB model. In the covariate shift scenario, the AUROC at the end of the simulation period for the baseline and retrained XGB models is 0.853 and 0.874 respectively. In the concept shift scenario and under the mixed labeling method, the retrained XGB models perform worse than the baseline model for most simulation steps. However, under the full relabeling method, the AUROC at the end of the simulation period for the baseline and retrained XGB models is 0.852 and 0.877 respectively. The results for the RNN models were mixed, suggesting that retraining based on a fixed network architecture may be inadequate for an RNN. We also present the results in the form of other performance metrics such as the ratio of observed to expected probabilities (calibration) and the normalized rate of positive predictive values (PPV) by prevalence, referred to as lift, at a sensitivity of 0.8. CONCLUSION Our simulations reveal that retraining periods of a couple of months or using several thousand patients are likely to be adequate to monitor machine learning models that predict sepsis. This indicates that a machine learning system for sepsis prediction will probably need less infrastructure for performance monitoring and retraining compared to other applications in which data drift is more frequent and continuous. Our results also show that in the event of a concept shift, a full overhaul of the sepsis prediction model may be necessary because it indicates a discrete change in the definition of sepsis labels, and mixing the labels for the sake of incremental training may not produce the desired results.
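A compact sketch of the covariate-shift scenario (synthetic data rather than the study's EHR; all parameter values are arbitrary assumptions): an XGBoost model trained on the historical distribution is compared with one retrained after the shift, both evaluated on post-drift data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

rng = np.random.default_rng(11)
beta = np.array([1.2, -0.8, 0.5, 0.0, 0.3])

def cohort(n, mean_shift=0.0):
    # Covariate shift: the feature distribution moves, the outcome model does not.
    X = rng.normal(loc=mean_shift, scale=1.0, size=(n, 5))
    p = 1 / (1 + np.exp(-(X @ beta - 1.5)))
    return X, rng.binomial(1, p)

X_hist, y_hist = cohort(20000, mean_shift=0.0)   # pre-drift training data
X_new, y_new = cohort(20000, mean_shift=0.8)     # post-drift data for retraining
X_test, y_test = cohort(10000, mean_shift=0.8)   # post-drift evaluation data

baseline = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1,
                         eval_metric="logloss").fit(X_hist, y_hist)
retrained = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1,
                          eval_metric="logloss").fit(
    np.vstack([X_hist, X_new]), np.concatenate([y_hist, y_new]))

for name, m in [("baseline", baseline), ("retrained", retrained)]:
    auroc = roc_auc_score(y_test, m.predict_proba(X_test)[:, 1])
    print(f"{name} AUROC on post-drift data: {auroc:.3f}")
```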
Collapse
Affiliation(s)
- Keyvan Rahmani
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
| | - Rahul Thapa
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
| | - Peiling Tsou
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
| | - Satish Casie Chetty
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA.
| | - Gina Barnes
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
| | - Carson Lam
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
| | - Chak Foon Tso
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
| |
Collapse
|
30
|
Lam G, Rish I, Dixon PC. Estimating individual minimum calibration for deep-learning with predictive performance recovery: An example case of gait surface classification from wearable sensor gait data. J Biomech 2023; 154:111606. [PMID: 37187130 DOI: 10.1016/j.jbiomech.2023.111606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2023] [Revised: 04/19/2023] [Accepted: 04/26/2023] [Indexed: 05/17/2023]
Abstract
Clinical datasets often comprise multiple data points or trials sampled from a single participant. When these datasets are used to train machine learning models, the method used to extract train and test sets must be carefully chosen. Using the standard machine learning approach (random-wise split), different trials from the same participant may appear in both training and test sets. This has led to schemes capable of segregating data points from a same participant into a single set (subject-wise split). Past investigations have demonstrated that models trained in this manner underperform compared to those trained using random-split schemes. Additional training of models via a small subset of trials, known as calibration, bridges the gap in performance across split schemes; however, the amount of calibration trials required to achieve strong model performance is unclear. Thus, this study aims to investigate the relationship between calibration training set size and prediction accuracy on the calibration test set. A database of 30 young, healthy adults performing multiple walking trials across nine different surfaces while fit with inertial measurement unit sensors on the lower limbs was used to develop a deep-learning classifier. For subject-wise trained models, calibration on a single gait cycle per surface yielded a 70% increase in F1-score, the harmonic mean of precision and recall, while 10 gait cycles per surface were sufficient to match the performance of a random-wise trained model. Code to generate calibration curves may be found at (https://github.com/GuillaumeLam/PaCalC).
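The split-scheme issue generalises beyond gait data and is easy to demonstrate (synthetic features below, not the PaCalC dataset): when each participant has a consistent, subject-specific signature, a record-wise split lets the classifier exploit it, while a subject-wise split via GroupKFold does not. The calibration step itself is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(5)
n_subjects, n_surfaces, trials, n_feat = 30, 9, 30, 20
subjects = np.repeat(np.arange(n_subjects), trials)
surface = rng.integers(0, n_surfaces, size=subjects.size)

shared = rng.normal(size=(n_surfaces, n_feat))                # surface effect shared by everyone
personal = rng.normal(size=(n_subjects, n_surfaces, n_feat))  # subject-specific surface signature
X = (0.5 * shared[surface]
     + personal[subjects, surface]
     + 0.3 * rng.normal(size=(subjects.size, n_feat)))

clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc_record = cross_val_score(clf, X, surface,
                             cv=KFold(5, shuffle=True, random_state=0))
acc_subject = cross_val_score(clf, X, surface,
                              cv=GroupKFold(5), groups=subjects)
print(f"record-wise CV accuracy : {acc_record.mean():.3f}")
print(f"subject-wise CV accuracy: {acc_subject.mean():.3f}")
```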
Collapse
Affiliation(s)
- Guillaume Lam
- Department of Computer Science and Operations Research, Université de Montréal, Canada.
| | - Irina Rish
- Department of Computer Science and Operations Research, Université de Montréal, Canada; Mila - Quebec AI Institute, Université de Montréal, Canada
| | - Philippe C Dixon
- School of Kinesiology and Physical Activity Sciences, Faculty of Medicine, Université de Montréal, Canada; Research Center of the Sainte-Justine University Hospital (CRCHUSJ), Canada; Institute of Biomedical Engineering, Faculty of medicine, Université de Montréal, Canada
| |
Collapse
|
31
|
Andonov DI, Ulm B, Graessner M, Podtschaske A, Blobner M, Jungwirth B, Kagerbauer SM. Impact of the Covid-19 pandemic on the performance of machine learning algorithms for predicting perioperative mortality. BMC Med Inform Decis Mak 2023; 23:67. [PMID: 37046259 PMCID: PMC10092913 DOI: 10.1186/s12911-023-02151-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2022] [Accepted: 03/15/2023] [Indexed: 04/14/2023] Open
Abstract
BACKGROUND Machine-learning models are susceptible to external influences which can result in performance deterioration. The aim of our study was to elucidate the impact of a sudden shift in covariates, like the one caused by the Covid-19 pandemic, on model performance. METHODS After ethical approval and registration in Clinical Trials (NCT04092933, initial release 17/09/2019), we developed different models for the prediction of perioperative mortality based on preoperative data: one for the pre-pandemic data period until March 2020, one including data before the pandemic and from the first wave until May 2020, and one that covers the complete period before and during the pandemic until October 2021. We applied XGBoost as well as a Deep Learning neural network (DL). Performance metrics of each model during the different pandemic phases were determined, and XGBoost models were analysed for changes in feature importance. RESULTS XGBoost and DL provided similar performance on the pre-pandemic data with respect to the area under the receiver operating characteristic curve (AUROC, 0.951 vs. 0.942) and the area under the precision-recall curve (AUPR, 0.144 vs. 0.187). Validation in patient cohorts of the different pandemic waves showed large fluctuations in both AUROC and AUPR for DL, whereas the XGBoost models seemed more stable. Changes in variable frequencies with the onset of the pandemic were visible in age, ASA score, and a higher proportion of emergency operations, among others. Age consistently showed the highest information gain. Models based on pre-pandemic data performed worse during the first pandemic wave (AUROC 0.914 for XGBoost and DL), whereas models augmented with data from the first wave performed poorly after the first wave (AUROC 0.907 for XGBoost and 0.747 for DL). The deterioration was also visible in AUPR, which worsened by over 50% in both XGBoost and DL in the first phase after re-training. CONCLUSIONS A sudden shift in data impacts model performance. Re-training the model with updated data may cause degradation in predictive accuracy if the changes are only transient. Re-training too early should therefore be avoided, and close model surveillance is necessary.
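A small sketch of the feature-importance check (synthetic perioperative-style variables with assumed names and effects, not the study's data): gain-based importances from XGBoost models trained on pre- and post-shift cohorts can be compared to see whether the drivers of prediction change.

```python
import numpy as np
import pandas as pd
from xgboost import XGBClassifier

rng = np.random.default_rng(2024)
cols = ["age", "asa_score", "emergency", "duration", "lab_value"]

def cohort(n, emergency_rate):
    X = pd.DataFrame({
        "age": rng.normal(60, 15, n),
        "asa_score": rng.integers(1, 5, n),
        "emergency": rng.binomial(1, emergency_rate, n),
        "duration": rng.gamma(2.0, 60.0, n),
        "lab_value": rng.normal(0, 1, n),
    })
    logit = 0.05 * (X["age"] - 60) + 0.8 * (X["asa_score"] - 2) + 1.2 * X["emergency"] - 5
    y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
    return X, y

for label, emergency_rate in [("pre-shift", 0.10), ("post-shift", 0.35)]:
    X, y = cohort(30000, emergency_rate)
    model = XGBClassifier(n_estimators=300, max_depth=3, importance_type="gain",
                          eval_metric="logloss").fit(X, y)
    imp = pd.Series(model.feature_importances_, index=cols).sort_values(ascending=False)
    print(f"{label} gain-based importance:\n{imp.round(3)}\n")
```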
Collapse
Affiliation(s)
- D I Andonov
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany
| | - B Ulm
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany
| | - M Graessner
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany
| | - A Podtschaske
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany
| | - M Blobner
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany
| | - B Jungwirth
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany
| | - S M Kagerbauer
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany.
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany.
| |
Collapse
|
32
|
Tohidinezhad F, Bontempi D, Zhang Z, Dingemans AM, Aerts J, Bootsma G, Vansteenkiste J, Hashemi S, Smit E, Gietema H, Aerts HJ, Dekker A, Hendriks LEL, Traverso A, De Ruysscher D. Computed tomography-based radiomics for the differential diagnosis of pneumonitis in stage IV non-small cell lung cancer patients treated with immune checkpoint inhibitors. Eur J Cancer 2023; 183:142-151. [PMID: 36857819 DOI: 10.1016/j.ejca.2023.01.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Revised: 01/29/2023] [Accepted: 01/29/2023] [Indexed: 02/11/2023]
Abstract
INTRODUCTION Immunotherapy-induced pneumonitis (IIP) is a serious side-effect which requires accurate diagnosis and management with high-dose corticosteroids. The differential diagnosis between IIP and other types of pneumonitis (OTP) remains challenging due to similar radiological patterns. This study aimed to develop a prediction model to differentiate IIP from OTP in patients with stage IV non-small cell lung cancer (NSCLC) who developed pneumonitis during immunotherapy. METHODS Consecutive patients with metastatic NSCLC treated with immunotherapy in six centres in the Netherlands and Belgium from 2017 to 2020 were reviewed and cause-specific pneumonitis events were identified. Seven regions of interest (segmented lungs and spheroidal/cubical regions surrounding the inflammation) were examined to extract the most predictive radiomic features from the chest computed tomography images obtained at pneumonitis manifestation. Models were internally tested regarding discrimination, calibration and clinical utility. To evaluate the clinical application of the models, predicted labels were compared with the separate clinical and radiological judgements. RESULTS A total of 556 patients were reviewed; 31 patients (5.6%) developed IIP and 41 patients (7.4%) developed OTP. The line of immunotherapy was the only predictive factor in the clinical model (odds ratio for second vs first line 0.08, 95% confidence interval: 0.01-0.77). The best radiomic model was achieved using a 75-mm spheroidal region of interest, which showed an optimism-corrected area under the receiver operating characteristic curve of 0.83 (95% confidence interval: 0.77-0.95) with negative and positive predictive values of 80% and 79%, respectively. Good calibration and net benefit were achieved for the radiomic model across the entire range of probabilities. A correct diagnosis was provided by the radiomic model in 10 out of 12 cases with non-conclusive radiological judgements. CONCLUSION Radiomic biomarkers applied to computed tomography imaging may support clinicians making the differential diagnosis of pneumonitis in patients with NSCLC receiving immunotherapy, especially when the radiologic assessment is non-conclusive.
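The optimism correction reported above is typically obtained by bootstrap resampling; the sketch below shows the generic procedure on synthetic data, with a plain logistic regression standing in for the radiomic model: refit on bootstrap resamples, measure the average gap between bootstrap-sample and original-sample AUROC, and subtract it from the apparent AUROC.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           weights=[0.6, 0.4], random_state=0)
rng = np.random.default_rng(0)

def fit_auc(Xtr, ytr, Xev, yev):
    m = LogisticRegression(max_iter=2000).fit(Xtr, ytr)
    return roc_auc_score(yev, m.predict_proba(Xev)[:, 1])

apparent = fit_auc(X, y, X, y)                  # apparent (resubstitution) AUROC
optimism = []
for _ in range(200):
    idx = rng.integers(0, len(y), len(y))       # bootstrap resample with replacement
    boot_auc = fit_auc(X[idx], y[idx], X[idx], y[idx])
    test_auc = fit_auc(X[idx], y[idx], X, y)    # bootstrap model evaluated on original data
    optimism.append(boot_auc - test_auc)

print(f"apparent AUROC           : {apparent:.3f}")
print(f"optimism-corrected AUROC : {apparent - np.mean(optimism):.3f}")
```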
Collapse
Affiliation(s)
- Fariba Tohidinezhad
- Department of Radiation Oncology (Maastro Clinic), School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Dennis Bontempi
- Department of Radiation Oncology (Maastro Clinic), School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands; Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA; Department of Radiology and Nuclear Medicine, Maastricht University Medical Center, Maastricht, the Netherlands
| | - Zhen Zhang
- Department of Radiation Oncology (Maastro Clinic), School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Anne-Marie Dingemans
- Department of Pulmonary Diseases, School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Joachim Aerts
- Department of Pulmonary Medicine, School of Medicine, Erasmus University Medical Center, Rotterdam, the Netherlands
| | - Gerben Bootsma
- Department of Pulmonary Diseases, Zuyderland Hospital, Heerlen, the Netherlands
| | - Johan Vansteenkiste
- Department of Respiratory Oncology, University Hospital KU Leuven, Leuven, Belgium
| | - Sayed Hashemi
- Department of Pulmonary Medicine, Amsterdam UMC, VU University Medical Center, Amsterdam, the Netherlands
| | - Egbert Smit
- Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - Hester Gietema
- Department of Radiology and Nuclear Medicine, Maastricht University Medical Center, Maastricht, the Netherlands
| | - Hugo Jwl Aerts
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA; Department of Radiology and Nuclear Medicine, Maastricht University Medical Center, Maastricht, the Netherlands; Departments of Radiation Oncology and Radiology, Brigham and Women's Hospital, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
| | - Andre Dekker
- Department of Radiation Oncology (Maastro Clinic), School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Lizza E L Hendriks
- Department of Pulmonary Diseases, School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Alberto Traverso
- Department of Radiation Oncology (Maastro Clinic), School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Dirk De Ruysscher
- Department of Radiation Oncology (Maastro Clinic), School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands.
| |
Collapse
|
33
|
Yu X, Wu R, Ji Y, Feng Z. Bibliometric and visual analysis of machine learning-based research in acute kidney injury worldwide. Front Public Health 2023; 11:1136939. [PMID: 37006534 PMCID: PMC10063840 DOI: 10.3389/fpubh.2023.1136939] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2023] [Accepted: 03/01/2023] [Indexed: 03/19/2023] Open
Abstract
Background Acute kidney injury (AKI) is a serious clinical complication associated with adverse short-term and long-term outcomes. In recent years, with the rapid uptake of electronic health records and artificial intelligence and machine learning technology, the detection rate and treatment of AKI have greatly improved. Many studies have now been published in this field, but little is known about the quality of this research output or about the focus and trends of current work. Methods Based on the Web of Science Core Collection, studies reporting machine learning-based AKI research published from 2013 to 2022 were retrieved and collected after manual review. VOSviewer and other software were used for bibliometric visualization analysis, including publication trends, geographical distribution characteristics, journal distribution characteristics, author contributions, citations, funding source characteristics, and keyword clustering. Results A total of 336 documents were analyzed. Since 2018, publications and citations have increased dramatically, with the United States (143) and China (101) as the main contributors. Regarding authors, Bihorac A and Ozrazgat-Baslanti T from the Kansas City Medical Center have published 10 articles. Regarding institutions, the University of California (18) had the most publications. Approximately one-third of the publications appeared in Q1 and Q2 journals, of which Scientific Reports (19) was the most prolific. The study by Tomašev et al. published in 2019 has been widely cited by researchers. Cluster analysis of co-occurring keywords suggests that the construction of AKI prediction models for critically ill patients and patients with sepsis is the research frontier, and that the XGBoost algorithm is also popular. Conclusion This study provides an updated perspective on machine learning-based AKI research, which may help subsequent researchers choose suitable journals and collaborators and may provide a more convenient and in-depth understanding of the field's research basis, hotspots and frontiers.
Collapse
Affiliation(s)
- Xiang Yu
- State Key Laboratory of Kidney Diseases, Department of Nephrology, Chinese People's Liberation Army General Hospital, Chinese People's Liberation Army Institute of Nephrology, National Clinical Research Center of Kidney Diseases, Beijing, China
| | - RiLiGe Wu
- Medical Big Data Research Center, Chinese People's Liberation Army General Hospital, Beijing, China
| | - YuWei Ji
- State Key Laboratory of Kidney Diseases, Department of Nephrology, Chinese People's Liberation Army General Hospital, Chinese People's Liberation Army Institute of Nephrology, National Clinical Research Center of Kidney Diseases, Beijing, China
| | - Zhe Feng
- State Key Laboratory of Kidney Diseases, Department of Nephrology, Chinese People's Liberation Army General Hospital, Chinese People's Liberation Army Institute of Nephrology, National Clinical Research Center of Kidney Diseases, Beijing, China
| |
Collapse
|
34
|
Guo LL, Steinberg E, Fleming SL, Posada J, Lemmon J, Pfohl SR, Shah N, Fries J, Sung L. EHR foundation models improve robustness in the presence of temporal distribution shift. Sci Rep 2023; 13:3767. [PMID: 36882576 PMCID: PMC9992466 DOI: 10.1038/s41598-023-30820-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2022] [Accepted: 03/02/2023] [Indexed: 03/09/2023] Open
Abstract
Temporal distribution shift negatively impacts the performance of clinical prediction models over time. Pretraining foundation models using self-supervised learning on electronic health records (EHR) may be effective in acquiring informative global patterns that can improve the robustness of task-specific models. The objective was to evaluate the utility of EHR foundation models in improving the in-distribution (ID) and out-of-distribution (OOD) performance of clinical prediction models. Transformer- and gated recurrent unit-based foundation models were pretrained on EHR of up to 1.8 M patients (382 M coded events) collected within pre-determined year groups (e.g., 2009-2012) and were subsequently used to construct patient representations for patients admitted to inpatient units. These representations were used to train logistic regression models to predict hospital mortality, long length of stay, 30-day readmission, and ICU admission. We compared our EHR foundation models with baseline logistic regression models learned on count-based representations (count-LR) in ID and OOD year groups. Performance was measured using area-under-the-receiver-operating-characteristic curve (AUROC), area-under-the-precision-recall curve, and absolute calibration error. Both transformer and recurrent-based foundation models generally showed better ID and OOD discrimination relative to count-LR and often exhibited less decay in tasks where there is observable degradation of discrimination performance (average AUROC decay of 3% for transformer-based foundation model vs. 7% for count-LR after 5-9 years). In addition, the performance and robustness of transformer-based foundation models continued to improve as pretraining set size increased. These results suggest that pretraining EHR foundation models at scale is a useful approach for developing clinical prediction models that perform well in the presence of temporal distribution shift.
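Illustrative only (not the authors' code): a minimal sketch of the in-distribution (ID) vs. out-of-distribution (OOD) temporal evaluation pattern described above, in which a model trained on earlier year groups is scored on held-out data from the same years and from later, shifted years. The feature counts, year-group simulation, and drift mechanism are invented assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_year_group(n, shift=0):
    """Simulate count-style EHR features; `shift` rotates which features drive risk."""
    X = rng.poisson(lam=3.0, size=(n, 20)).astype(float)
    w = np.roll(np.linspace(0.6, -0.6, 20), shift)   # later years: different risk drivers
    p = 1 / (1 + np.exp(-(X @ w - 1.0)))
    return X, rng.binomial(1, p)

X_tr, y_tr = make_year_group(5000, shift=0)          # e.g. development-era cohort
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

for label, shift in [("ID (same years)", 0), ("OOD (later years)", 7)]:
    X_te, y_te = make_year_group(2000, shift=shift)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{label}: AUROC = {auc:.3f}")
```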
Collapse
Affiliation(s)
- Lin Lawrence Guo
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
| | - Ethan Steinberg
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
| | - Scott Lanyon Fleming
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
| | - Jose Posada
- Universidad del Norte, Barranquilla, Colombia
| | - Joshua Lemmon
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
| | - Stephen R Pfohl
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
| | - Nigam Shah
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
| | - Jason Fries
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
| | - Lillian Sung
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada; Division of Haematology/Oncology, The Hospital for Sick Children, 555 University Avenue, Toronto, ON, M5G1X8, Canada.
| |
Collapse
|
35
|
Van Calster B, Steyerberg EW, Wynants L, van Smeden M. There is no such thing as a validated prediction model. BMC Med 2023.
Abstract
BACKGROUND Clinical prediction models should be validated before implementation in clinical practice. But is favorable performance at internal validation or one external validation sufficient to claim that a prediction model works well in the intended clinical context? MAIN BODY We argue to the contrary because (1) patient populations vary, (2) measurement procedures vary, and (3) populations and measurements change over time. Hence, we have to expect heterogeneity in model performance between locations and settings, and across time. It follows that prediction models are never truly validated. This does not imply that validation is not important. Rather, the current focus on developing new models should shift to a focus on more extensive, well-conducted, and well-reported validation studies of promising models. CONCLUSION Principled validation strategies are needed to understand and quantify heterogeneity, monitor performance over time, and update prediction models when appropriate. Such strategies will help to ensure that prediction models stay up-to-date and safe to support clinical decision-making.
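Illustrative only: one concrete form of the model updating the authors call for is logistic recalibration, refitting an existing model's intercept and calibration slope on local data. The linear predictors and local cohort below are simulated assumptions, not taken from any cited study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Pretend these are linear predictors (log-odds) from a previously developed model,
# applied to a new local cohort where the model is miscalibrated.
lp = rng.normal(-2.0, 1.0, size=5000)
true_p = 1 / (1 + np.exp(-(0.7 * lp - 0.5)))        # true risks differ in slope and intercept
y = rng.binomial(1, true_p)

# Fit y ~ a + b * lp to obtain the recalibration intercept (a) and slope (b).
recal = LogisticRegression(C=1e6, max_iter=1000).fit(lp.reshape(-1, 1), y)
a, b = recal.intercept_[0], recal.coef_[0][0]
print(f"recalibration intercept = {a:.2f}, slope = {b:.2f}")

# Updated predicted risk for a new patient with linear predictor lp_new:
lp_new = -1.2
p_updated = 1 / (1 + np.exp(-(a + b * lp_new)))
print(f"updated risk = {p_updated:.3f}")
```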
Collapse
|
36
|
Debray TPA, Collins GS, Riley RD, Snell KIE, Van Calster B, Reitsma JB, Moons KGM. Transparent reporting of multivariable prediction models developed or validated using clustered data (TRIPOD-Cluster): explanation and elaboration. BMJ 2023; 380:e071058. [PMID: 36750236 PMCID: PMC9903176 DOI: 10.1136/bmj-2022-071058] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 10/07/2022] [Indexed: 02/09/2023]
Affiliation(s)
- Thomas P A Debray
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
| | - Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Oxford, UK
- National Institute for Health and Care Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford, UK
| | - Richard D Riley
- Centre for Prognosis Research, School of Medicine, Keele University, Keele, UK
| | - Kym I E Snell
- Centre for Prognosis Research, School of Medicine, Keele University, Keele, UK
| | - Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- EPI-centre, KU Leuven, Leuven, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands
| | - Johannes B Reitsma
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
| | - Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
| |
Collapse
|
37
|
Parikh RB, Zhang Y, Kolla L, Chivers C, Courtright KR, Zhu J, Navathe AS, Chen J. Performance drift in a mortality prediction algorithm among patients with cancer during the SARS-CoV-2 pandemic. J Am Med Inform Assoc 2023; 30:348-354. [PMID: 36409991 PMCID: PMC9846686 DOI: 10.1093/jamia/ocac221] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2022] [Revised: 10/28/2022] [Accepted: 11/03/2022] [Indexed: 11/22/2022] Open
Abstract
Sudden changes in health care utilization during the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic may have impacted the performance of clinical predictive models that were trained prior to the pandemic. In this study, we evaluated the performance over time of a machine learning, electronic health record-based mortality prediction algorithm currently used in clinical practice to identify patients with cancer who may benefit from early advance care planning conversations. We show that during the pandemic period, algorithm identification of high-risk patients had a substantial and sustained decline. Decreases in laboratory utilization during the peak of the pandemic may have contributed to drift. Calibration and overall discrimination did not markedly decline during the pandemic. This argues for careful attention to the performance and retraining of predictive algorithms that use inputs from the pandemic period.
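Illustrative only: a minimal monitoring sketch in the spirit of the drift analysis above, tracking discrimination and the fraction of patients flagged as high risk per calendar quarter. The scores, outcomes, quarters, and 0.4 alert threshold are invented.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 8000
df = pd.DataFrame({
    "quarter": rng.choice(["2019Q4", "2020Q1", "2020Q2", "2020Q3"], size=n),
    "score": rng.uniform(0, 1, size=n),
})
# Simulate outcomes whose link to the score weakens in mid-2020 (drift).
weak = df["quarter"].isin(["2020Q2", "2020Q3"]).to_numpy()
p = np.where(weak, 0.3 * df["score"] + 0.1, 0.7 * df["score"])
df["death_180d"] = rng.binomial(1, p)

for quarter, grp in df.groupby("quarter"):
    auc = roc_auc_score(grp["death_180d"], grp["score"])
    flagged = (grp["score"] >= 0.4).mean()
    print(f"{quarter}: AUROC = {auc:.2f}, flagged as high risk = {flagged:.1%}")
```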
Collapse
Affiliation(s)
- Ravi B Parikh
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Medical Ethics and Health Policy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, Pennsylvania, USA
| | - Yichen Zhang
- Department of Medical Ethics and Health Policy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Likhitha Kolla
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Corey Chivers
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Katherine R Courtright
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Jingsan Zhu
- Department of Medical Ethics and Health Policy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Amol S Navathe
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Medical Ethics and Health Policy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, Pennsylvania, USA
| | - Jinbo Chen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| |
Collapse
|
38
|
Azimi V, Zaydman MA. Optimizing Equity: Working towards Fair Machine Learning Algorithms in Laboratory Medicine. J Appl Lab Med 2023; 8:113-128. [PMID: 36610413 DOI: 10.1093/jalm/jfac085] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2022] [Accepted: 09/09/2022] [Indexed: 01/09/2023]
Abstract
BACKGROUND Methods of machine learning provide opportunities to use real-world data to solve complex problems. Applications of these methods in laboratory medicine promise to increase diagnostic accuracy and streamline laboratory operations leading to improvement in the quality and efficiency of healthcare delivery. However, machine learning models are vulnerable to learning from undesirable patterns in the data that reflect societal biases. As a result, irresponsible application of machine learning may lead to the perpetuation, or even amplification, of existing disparities in healthcare outcomes. CONTENT In this work, we review what it means for a model to be unfair, discuss the various ways that machine learning models become unfair, and present engineering principles emerging from the field of algorithmic fairness. These materials are presented with a focus on the development of machine learning models in laboratory medicine. SUMMARY We hope that this work will serve to increase awareness, and stimulate further discussion, of this important issue among laboratorians as the field moves forward with the incorporation of machine learning models into laboratory practice.
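Illustrative only: a minimal subgroup audit of the kind discussed above, comparing sensitivity at a fixed decision threshold across two patient groups (an equal-opportunity check). Groups, scores, and the 0.5 threshold are simulated assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000
group = rng.choice(["A", "B"], size=n)
y = rng.binomial(1, 0.2, size=n)
# Simulate a model that is systematically less confident for true cases in group B.
score = np.clip(0.5 * y + rng.normal(0, 0.2, n) - 0.1 * (group == "B") * y, 0, 1)
pred = score >= 0.5

tprs = {}
for g in ["A", "B"]:
    mask = (group == g) & (y == 1)
    tprs[g] = pred[mask].mean()
    print(f"group {g}: sensitivity at threshold 0.5 = {tprs[g]:.2f}")
print(f"equal-opportunity gap = {abs(tprs['A'] - tprs['B']):.2f}")
```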
Collapse
Affiliation(s)
- Vahid Azimi
- Washington University in St. Louis School of Medicine, Department of Pathology and Immunology, St. Louis, MO 63110, United States
| | - Mark A Zaydman
- Washington University in St. Louis School of Medicine, Department of Pathology and Immunology, St. Louis, MO 63110, United States
| |
Collapse
|
39
|
Vagliano I, Chesnaye NC, Leopold JH, Jager KJ, Abu-Hanna A, Schut MC. Machine learning models for predicting acute kidney injury: a systematic review and critical appraisal. Clin Kidney J 2022; 15:2266-2280. [PMID: 36381375 PMCID: PMC9664575 DOI: 10.1093/ckj/sfac181] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2022] [Indexed: 09/08/2023] Open
Abstract
BACKGROUND The number of studies applying machine learning (ML) to predict acute kidney injury (AKI) has grown steadily over the past decade. We assess and critically appraise the state of the art in ML models for AKI prediction, considering performance, methodological soundness, and applicability. METHODS We searched PubMed and ArXiv, extracted data, and critically appraised studies based on the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD), Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS), and Prediction Model Risk of Bias Assessment Tool (PROBAST) guidelines. RESULTS Forty-six studies from 3166 titles were included. Thirty-eight studies developed a model, five developed and externally validated one, and three studies externally validated one. Flexible ML methods were used more often than deep learning, although the latter was common with temporal variables and text as predictors. Predictive performance showed an area under receiver operating curves ranging from 0.49 to 0.99. Our critical appraisal identified a high risk of bias in 39 studies. Some studies lacked internal validation, whereas external validation and interpretability of results were rarely considered. Fifteen studies focused on AKI prediction in the intensive care setting, and the US-derived Medical Information Mart for Intensive Care (MIMIC) data set was commonly used. Reproducibility was limited as data and code were usually unavailable. CONCLUSIONS Flexible ML methods are popular for the prediction of AKI, although more complex models based on deep learning are emerging. Our critical appraisal identified a high risk of bias in most models: Studies should use calibration measures and external validation more often, improve model interpretability, and share data and code to improve reproducibility.
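Illustrative only: one of the calibration measures the review recommends using more often, comparing mean predicted risk with the observed event rate by risk decile. The predictions and outcomes below are simulated.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
pred = rng.beta(2, 10, size=5000)                      # predicted AKI probabilities
obs = rng.binomial(1, np.clip(pred * 1.3, 0, 1))       # outcomes from a deliberately miscalibrated truth

df = pd.DataFrame({"pred": pred, "obs": obs})
df["decile"] = pd.qcut(df["pred"], 10, labels=False)
cal = df.groupby("decile").agg(mean_pred=("pred", "mean"), obs_rate=("obs", "mean"))
print(cal.round(3))                                    # well calibrated when the two columns match
```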
Collapse
Affiliation(s)
- Iacopo Vagliano
- Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| | - Nicholas C Chesnaye
- ERA Registry, Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| | - Jan Hendrik Leopold
- Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| | - Kitty J Jager
- ERA Registry, Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| | - Ameen Abu-Hanna
- Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| | - Martijn C Schut
- Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| |
Collapse
|
40
|
Zhang X, Liu K, Yuan B, Wang H, Chen S, Xue Y, Chen W, Liu M, Hu Y. A hybrid adaptive approach for instance transfer learning with dynamic and imbalanced data. INT J INTELL SYST 2022; 37:11582-11599. [PMID: 36816520 PMCID: PMC9936919 DOI: 10.1002/int.23055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2022] [Accepted: 08/16/2022] [Indexed: 11/06/2022]
Abstract
Machine learning has demonstrated success in clinical risk prediction modeling with complex electronic health record data. However, the evolving nature of clinical practices can dynamically change the underlying data distribution over time, leading to model performance drift. Adopting an outdated model is potentially risky and may result in unintentional losses. In this paper, we propose a novel Hybrid Adaptive Boosting approach (HA-Boost) for transfer learning. HA-Boost is characterized by the domain similarity-based and class imbalance-based adaptation mechanisms, which simultaneously address two critical limitations of the classical TrAdaBoost algorithm. We validated HA-Boost in predicting hospital-acquired acute kidney injury using real-world longitudinal electronic health records data. The experiment results demonstrate that HA-Boost stably outperforms the competing baselines in terms of both AUROC and AUPRC across a 7-year time span. This study has confirmed the effectiveness of transfer learning as a superior model updating approach in dynamic environment.
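Illustrative only: HA-Boost itself is not reproduced here; the sketch shows the classical TrAdaBoost weight update that it builds on, in which old-cohort (source) instances the current learner misclassifies are down-weighted while misclassified new-cohort (target) instances are up-weighted. The base learner, cohorts, and round count are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tradaboost_round(Xs, ys, Xt, yt, ws, wt, n_rounds):
    """One round of TrAdaBoost-style instance-weight updating."""
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    w = np.concatenate([ws, wt])
    clf = DecisionTreeClassifier(max_depth=3).fit(X, y, sample_weight=w / w.sum())
    miss_s = (clf.predict(Xs) != ys).astype(float)
    miss_t = (clf.predict(Xt) != yt).astype(float)
    err_t = np.clip(np.sum(wt * miss_t) / wt.sum(), 1e-6, 0.499)   # weighted error on new data
    beta_t = err_t / (1 - err_t)                                   # AdaBoost factor (target)
    beta_s = 1 / (1 + np.sqrt(2 * np.log(len(ys)) / n_rounds))     # source decay factor
    ws_new = ws * beta_s ** miss_s        # shrink weight of old cases the learner misses
    wt_new = wt * beta_t ** -miss_t       # boost weight of new cases the learner misses
    return clf, ws_new, wt_new

# Tiny usage with synthetic old (source) and new (target) cohorts:
rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(500, 5)), rng.integers(0, 2, 500)
Xt, yt = rng.normal(0.5, 1.0, size=(100, 5)), rng.integers(0, 2, 100)
ws, wt = np.ones(500), np.ones(100)
_, ws, wt = tradaboost_round(Xs, ys, Xt, yt, ws, wt, n_rounds=10)
```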
Collapse
Affiliation(s)
- Xiangzhou Zhang
- Big Data Decision Institute, Jinan University, Guangzhou, China
| | - Kang Liu
- Big Data Decision Institute, Jinan University, Guangzhou, China
- School of Management, Jinan University, Guangzhou, China
| | - Borong Yuan
- Big Data Decision Institute, Jinan University, Guangzhou, China
- College of Information Science and Technology, Jinan University, Guangzhou, China
| | - Hongnian Wang
- Big Data Decision Institute, Jinan University, Guangzhou, China
- School of Management, Jinan University, Guangzhou, China
| | - Shaoyong Chen
- Big Data Decision Institute, Jinan University, Guangzhou, China
- College of Information Science and Technology, Jinan University, Guangzhou, China
| | - Yunfei Xue
- Big Data Decision Institute, Jinan University, Guangzhou, China
- College of Information Science and Technology, Jinan University, Guangzhou, China
| | - Weiqi Chen
- Big Data Decision Institute, Jinan University, Guangzhou, China
| | - Mei Liu
- Division of Medical Informatics, University of Kansas Medical Center, Kansas City, KS, United States of America
| | - Yong Hu
- Big Data Decision Institute, Jinan University, Guangzhou, China
| |
Collapse
|
41
|
Zhang A, Xing L, Zou J, Wu JC. Shifting machine learning for healthcare from development to deployment and from models to data. Nat Biomed Eng 2022; 6:1330-1345. [PMID: 35788685 DOI: 10.1038/s41551-022-00898-y] [Citation(s) in RCA: 58] [Impact Index Per Article: 29.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2021] [Accepted: 05/03/2022] [Indexed: 01/14/2023]
Abstract
In the past decade, the application of machine learning (ML) to healthcare has helped drive the automation of physician tasks as well as enhancements in clinical capabilities and access to care. This progress has emphasized that, from model development to model deployment, data play central roles. In this Review, we provide a data-centric view of the innovations and challenges that are defining ML for healthcare. We discuss deep generative models and federated learning as strategies to augment datasets for improved model performance, as well as the use of the more recent transformer models for handling larger datasets and enhancing the modelling of clinical text. We also discuss data-focused problems in the deployment of ML, emphasizing the need to efficiently deliver data to ML models for timely clinical predictions and to account for natural data shifts that can deteriorate model performance.
Collapse
Affiliation(s)
- Angela Zhang
- Stanford Cardiovascular Institute, School of Medicine, Stanford University, Stanford, CA, USA; Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA; Greenstone Biosciences, Palo Alto, CA, USA; Department of Computer Science, Stanford University, Stanford, CA, USA.
| | - Lei Xing
- Department of Radiation Oncology, School of Medicine, Stanford University, Stanford, CA, USA
| | - James Zou
- Department of Computer Science, Stanford University, Stanford, CA, USA; Department of Biomedical Informatics, School of Medicine, Stanford University, Stanford, CA, USA
| | - Joseph C Wu
- Stanford Cardiovascular Institute, School of Medicine, Stanford University, Stanford, CA, USA; Greenstone Biosciences, Palo Alto, CA, USA; Departments of Medicine, Division of Cardiovascular Medicine, Stanford University, Stanford, CA, USA; Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA.
| |
Collapse
|
42
|
Wu JTY, de la Hoz MÁA, Kuo PC, Paguio JA, Yao JS, Dee EC, Yeung W, Jurado J, Moulick A, Milazzo C, Peinado P, Villares P, Cubillo A, Varona JF, Lee HC, Estirado A, Castellano JM, Celi LA. Developing and Validating Multi-Modal Models for Mortality Prediction in COVID-19 Patients: a Multi-center Retrospective Study. J Digit Imaging 2022; 35:1514-1529. [PMID: 35789446 PMCID: PMC9255527 DOI: 10.1007/s10278-022-00674-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2021] [Revised: 05/15/2022] [Accepted: 06/08/2022] [Indexed: 01/07/2023] Open
Abstract
The unprecedented global crisis brought about by the COVID-19 pandemic has sparked numerous efforts to create predictive models for the detection and prognostication of SARS-CoV-2 infections with the goal of helping health systems allocate resources. Machine learning models, in particular, hold promise for their ability to leverage patient clinical information and medical images for prediction. However, most of the published COVID-19 prediction models thus far have little clinical utility due to methodological flaws and lack of appropriate validation. In this paper, we describe our methodology to develop and validate multi-modal models for COVID-19 mortality prediction using multi-center patient data. The models for COVID-19 mortality prediction were developed using retrospective data from Madrid, Spain (N = 2547) and were externally validated in patient cohorts from a community hospital in New Jersey, USA (N = 242) and an academic center in Seoul, Republic of Korea (N = 336). The models we developed performed differently across various clinical settings, underscoring the need for a guided strategy when employing machine learning for clinical decision-making. We demonstrated that using features from both the structured electronic health records and chest X-ray imaging data resulted in better 30-day mortality prediction performance across all three datasets (areas under the receiver operating characteristic curves: 0.85 (95% confidence interval: 0.83-0.87), 0.76 (0.70-0.82), and 0.95 (0.92-0.98)). We discuss the rationale for the decisions made at every step in developing the models and have made our code available to the research community. We employed the best machine learning practices for clinical model development. Our goal is to create a toolkit that would assist investigators and organizations in building multi-modal models for prediction, classification, and/or optimization.
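Illustrative only (not the authors' released code): the general multimodal pattern described above, concatenating structured-EHR features with image-derived features before a single classifier and bootstrapping a confidence interval for the AUROC. All shapes, names, and data are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 1000
ehr = rng.normal(size=(n, 12))        # e.g. labs, vitals, demographics
cxr = rng.normal(size=(n, 64))        # e.g. embedding from a chest X-ray encoder
y = rng.binomial(1, 1 / (1 + np.exp(-(ehr[:, 0] + 0.5 * cxr[:, 0]))))

X = np.hstack([ehr, cxr])             # simple early fusion of the two modalities
train, test = np.arange(n) < 700, np.arange(n) >= 700
clf = LogisticRegression(max_iter=2000).fit(X[train], y[train])
scores = clf.predict_proba(X[test])[:, 1]

boot = []
idx = np.arange(test.sum())
for _ in range(500):                  # nonparametric bootstrap of the test set
    b = rng.choice(idx, size=len(idx), replace=True)
    if len(np.unique(y[test][b])) == 2:
        boot.append(roc_auc_score(y[test][b], scores[b]))
print(f"AUROC {roc_auc_score(y[test], scores):.2f} "
      f"(95% CI {np.percentile(boot, 2.5):.2f}-{np.percentile(boot, 97.5):.2f})")
```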
Collapse
Affiliation(s)
- Joy Tzung-Yu Wu
- Department of Radiology and Nuclear Medicine, Stanford University, Palo Alto, CA, USA
| | - Miguel Ángel Armengol de la Hoz
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Anesthesia, Critical Care and Pain Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Big Data Department, Fundacion Progreso Y Salud, Regional Ministry of Health of Andalucia, Andalucia, Spain
| | - Po-Chih Kuo
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.
| | - Joseph Alexander Paguio
- Albert Einstein Medical Center, Philadelphia, PA, USA
- Hoboken University Medical Center-CarePoint Health, Hoboken, NJ, USA
| | - Jasper Seth Yao
- Albert Einstein Medical Center, Philadelphia, PA, USA
- Hoboken University Medical Center-CarePoint Health, Hoboken, NJ, USA
| | - Edward Christopher Dee
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Wesley Yeung
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- National University Heart Center, National University Hospital, Singapore, Singapore
| | - Jerry Jurado
- Hoboken University Medical Center-CarePoint Health, Hoboken, NJ, USA
| | - Achintya Moulick
- Hoboken University Medical Center-CarePoint Health, Hoboken, NJ, USA
| | - Carmelo Milazzo
- Hoboken University Medical Center-CarePoint Health, Hoboken, NJ, USA
| | - Paloma Peinado
- Centro Integral de Enfermedades Cardiovasculares, Hospital Universitario Monteprincipe, Grupo HM Hospitales, Madrid, Spain
| | - Paula Villares
- Centro Integral de Enfermedades Cardiovasculares, Hospital Universitario Monteprincipe, Grupo HM Hospitales, Madrid, Spain
| | - Antonio Cubillo
- Centro Integral de Enfermedades Cardiovasculares, Hospital Universitario Monteprincipe, Grupo HM Hospitales, Madrid, Spain
| | - José Felipe Varona
- Centro Integral de Enfermedades Cardiovasculares, Hospital Universitario Monteprincipe, Grupo HM Hospitales, Madrid, Spain
| | - Hyung-Chul Lee
- Department of Anesthesiology and Pain Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Alberto Estirado
- Centro Integral de Enfermedades Cardiovasculares, Hospital Universitario Monteprincipe, Grupo HM Hospitales, Madrid, Spain
| | - José Maria Castellano
- Centro Integral de Enfermedades Cardiovasculares, Hospital Universitario Monteprincipe, Grupo HM Hospitales, Madrid, Spain
- Centro Nacional de Investigaciones Cardiovasculares, Instituto de Salud Carlos III, Madrid, Spain
| | - Leo Anthony Celi
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| |
Collapse
|
43
|
Zhang X, Xue Y, Su X, Chen S, Liu K, Chen W, Liu M, Hu Y. A Transfer Learning Approach to Correct the Temporal Performance Drift of Clinical Prediction Models: Retrospective Cohort Study. JMIR Med Inform 2022; 10:e38053. [DOI: 10.2196/38053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2022] [Revised: 07/31/2022] [Accepted: 10/12/2022] [Indexed: 11/11/2022] Open
Abstract
Background
Clinical prediction models suffer from performance drift as the patient population shifts over time. There is a great need for model updating approaches or modeling frameworks that can effectively use the old and new data.
Objective
Based on the paradigm of transfer learning, we aimed to develop a novel modeling framework that transfers old knowledge to the new environment for prediction tasks, and contributes to performance drift correction.
Methods
The proposed predictive modeling framework maintains a logistic regression–based stacking ensemble of 2 gradient boosting machine (GBM) models representing old and new knowledge learned from old and new data, respectively (referred to as transfer learning gradient boosting machine [TransferGBM]). The ensemble learning procedure can dynamically balance the old and new knowledge. Using 2010-2017 electronic health record data on a retrospective cohort of 141,696 patients, we validated TransferGBM for hospital-acquired acute kidney injury prediction.
Results
The baseline models (ie, transported models) that were trained on 2010 and 2011 data showed significant performance drift in the temporal validation with 2012-2017 data. Refitting these models using updated samples resulted in performance gains in nearly all cases. The proposed TransferGBM model succeeded in achieving uniformly better performance than the refitted models.
Conclusions
Under the scenario of population shift, incorporating new knowledge while preserving old knowledge is essential for maintaining stable performance. Transfer learning combined with stacking ensemble learning can help achieve a balance of old and new knowledge in a flexible and adaptive way, even in the case of insufficient new data.
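Illustrative only (not the authors' code): a minimal sketch of the stacking idea described above, with two gradient boosting models representing old and new data and a logistic-regression meta-learner balancing their predicted probabilities. The cohorts and the simulated population shift are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(4)

def cohort(n, shift):                 # simulate a population that shifts over time
    X = rng.normal(shift, 1.0, size=(n, 10))
    y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - shift))))
    return X, y

X_old, y_old = cohort(4000, shift=0.0)     # e.g. early-era admissions
X_new, y_new = cohort(1500, shift=0.8)     # e.g. recent admissions

gbm_old = GradientBoostingClassifier().fit(X_old, y_old)

# Meta-features on the new cohort: old-model probabilities plus out-of-fold
# probabilities from a GBM trained on the new data itself.
p_old = gbm_old.predict_proba(X_new)[:, 1]
p_new = cross_val_predict(GradientBoostingClassifier(), X_new, y_new,
                          cv=5, method="predict_proba")[:, 1]
meta = LogisticRegression().fit(np.column_stack([p_old, p_new]), y_new)
print("stacking weights (old, new):", np.round(meta.coef_[0], 2))
```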
Collapse
|
44
|
Budhwani KI, Patel ZH, Guenter RE, Charania AA. A hitchhiker's guide to cancer models. Trends Biotechnol 2022; 40:1361-1373. [PMID: 35534320 PMCID: PMC9588514 DOI: 10.1016/j.tibtech.2022.04.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2022] [Revised: 03/31/2022] [Accepted: 04/08/2022] [Indexed: 01/21/2023]
Abstract
Cancer is a complex and uniquely personal disease. More than 1.7 million people in the United States are diagnosed with cancer every year. As the burden of cancer grows, so does the need for new, more effective therapeutics and for predictive tools to identify optimal, personalized treatment options for every patient. Cancer models that recapitulate various aspects of the disease are fundamental to making advances along the continuum of cancer treatment from benchside discoveries to bedside delivery. In this review, we use a thought experiment as a vehicle to arrive at four broad categories of cancer models and explore the strengths, weaknesses, opportunities, and threats for each category in advancing our understanding of the disease and improving treatment strategies.
Collapse
Affiliation(s)
- Karim I Budhwani
- CerFlux, Inc., Birmingham, AL, USA; Department of Radiation Oncology, Heersink School of Medicine, University of Alabama at Birmingham (UAB), Birmingham, AL, USA; Department of Physics, Coe College, Cedar Rapids, IA, USA.
| | | | | | | |
Collapse
|
45
|
Lu J, Sattler A, Wang S, Khaki AR, Callahan A, Fleming S, Fong R, Ehlert B, Li RC, Shieh L, Ramchandran K, Gensheimer MF, Chobot S, Pfohl S, Li S, Shum K, Parikh N, Desai P, Seevaratnam B, Hanson M, Smith M, Xu Y, Gokhale A, Lin S, Pfeffer MA, Teuteberg W, Shah NH. Considerations in the reliability and fairness audits of predictive models for advance care planning. Front Digit Health 2022; 4:943768. [PMID: 36339512 PMCID: PMC9634737 DOI: 10.3389/fdgth.2022.943768] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2022] [Accepted: 08/17/2022] [Indexed: 11/30/2022] Open
Abstract
Multiple reporting guidelines for artificial intelligence (AI) models in healthcare recommend that models be audited for reliability and fairness. However, there is a gap of operational guidance for performing reliability and fairness audits in practice. Following guideline recommendations, we conducted a reliability audit of two models based on model performance and calibration as well as a fairness audit based on summary statistics, subgroup performance and subgroup calibration. We assessed the Epic End-of-Life (EOL) Index model and an internally developed Stanford Hospital Medicine (HM) Advance Care Planning (ACP) model in 3 practice settings: Primary Care, Inpatient Oncology and Hospital Medicine, using clinicians' answers to the surprise question (“Would you be surprised if [patient X] passed away in [Y years]?”) as a surrogate outcome. For performance, the models had positive predictive value (PPV) at or above 0.76 in all settings. In Hospital Medicine and Inpatient Oncology, the Stanford HM ACP model had higher sensitivity (0.69, 0.89 respectively) than the EOL model (0.20, 0.27), and better calibration (O/E 1.5, 1.7) than the EOL model (O/E 2.5, 3.0). The Epic EOL model flagged fewer patients (11%, 21% respectively) than the Stanford HM ACP model (38%, 75%). There were no differences in performance and calibration by sex. Both models had lower sensitivity in Hispanic/Latino male patients with Race listed as “Other.” 10 clinicians were surveyed after a presentation summarizing the audit. 10/10 reported that summary statistics, overall performance, and subgroup performance would affect their decision to use the model to guide care; 9/10 said the same for overall and subgroup calibration. The most commonly identified barriers for routinely conducting such reliability and fairness audits were poor demographic data quality and lack of data access. This audit required 115 person-hours across 8–10 months. Our recommendations for performing reliability and fairness audits include verifying data validity, analyzing model performance on intersectional subgroups, and collecting clinician-patient linkages as necessary for label generation by clinicians. Those responsible for AI models should require such audits before model deployment and mediate between model auditors and impacted stakeholders.
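Illustrative only (not the Stanford audit code): the core audit metrics reported above, positive predictive value, sensitivity, and the observed/expected (O/E) calibration ratio, computed per care setting. The data frame, setting names, and flag threshold are invented.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
n = 6000
df = pd.DataFrame({
    "setting": rng.choice(["primary_care", "oncology", "hosp_medicine"], size=n),
    "risk": rng.beta(2, 6, size=n),                  # model-predicted probability
})
df["outcome"] = rng.binomial(1, np.clip(df["risk"] * 1.6, 0, 1))  # deliberately miscalibrated
df["flag"] = df["risk"] >= 0.3                        # would prompt an ACP conversation

for setting, grp in df.groupby("setting"):
    tp = (grp["flag"] & (grp["outcome"] == 1)).sum()
    ppv = tp / grp["flag"].sum()
    sens = tp / (grp["outcome"] == 1).sum()
    o_over_e = grp["outcome"].mean() / grp["risk"].mean()
    print(f"{setting}: PPV={ppv:.2f}, sensitivity={sens:.2f}, O/E={o_over_e:.2f}")
```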
Collapse
Affiliation(s)
- Jonathan Lu
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Correspondence: Jonathan Hsijing Lu
| | - Amelia Sattler
- Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Samantha Wang
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Ali Raza Khaki
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Alison Callahan
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Scott Fleming
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Rebecca Fong
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Benjamin Ehlert
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Ron C. Li
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Lisa Shieh
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Kavitha Ramchandran
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Michael F. Gensheimer
- Department of Radiation Oncology, Stanford University School of Medicine, Palo Alto, United States
| | - Sarah Chobot
- Inpatient Palliative Care, Stanford Health Care, Palo Alto, United States
| | - Stephen Pfohl
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Siyun Li
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Kenny Shum
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
| | - Nitin Parikh
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
| | - Priya Desai
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
| | - Briththa Seevaratnam
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Melanie Hanson
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Margaret Smith
- Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Yizhe Xu
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Arjun Gokhale
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Steven Lin
- Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Michael A. Pfeffer
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
| | - Winifred Teuteberg
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Nigam H. Shah
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
- Clinical Excellence Research Center, Stanford University School of Medicine, Palo Alto, United States
| |
Collapse
|
46
|
Davis SE, Walsh CG, Matheny ME. Open questions and research gaps for monitoring and updating AI-enabled tools in clinical settings. Front Digit Health 2022; 4:958284. [PMID: 36120717 PMCID: PMC9478183 DOI: 10.3389/fdgth.2022.958284] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2022] [Accepted: 08/11/2022] [Indexed: 11/15/2022] Open
Abstract
As the implementation of artificial intelligence (AI)-enabled tools is realized across diverse clinical environments, there is a growing understanding of the need for ongoing monitoring and updating of prediction models. Dataset shift-temporal changes in clinical practice, patient populations, and information systems-is now well-documented as a source of deteriorating model accuracy and a challenge to the sustainability of AI-enabled tools in clinical care. While best practices are well-established for training and validating new models, there has been limited work developing best practices for prospective validation and model maintenance. In this paper, we highlight the need for updating clinical prediction models and discuss open questions regarding this critical aspect of the AI modeling lifecycle in three focus areas: model maintenance policies, performance monitoring perspectives, and model updating strategies. With the increasing adoption of AI-enabled tools, the need for such best practices must be addressed and incorporated into new and existing implementations. This commentary aims to encourage conversation and motivate additional research across clinical and data science stakeholders.
Collapse
Affiliation(s)
- Sharon E. Davis
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Correspondence: Sharon E. Davis
| | - Colin G. Walsh
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Psychiatry, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Michael E. Matheny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States; Tennessee Valley Healthcare System VA Medical Center, Veterans Health Administration, Nashville, TN, United States
| |
Collapse
|
47
|
Plana D, Shung DL, Grimshaw AA, Saraf A, Sung JJY, Kann BH. Randomized Clinical Trials of Machine Learning Interventions in Health Care: A Systematic Review. JAMA Netw Open 2022; 5:e2233946. [PMID: 36173632 PMCID: PMC9523495 DOI: 10.1001/jamanetworkopen.2022.33946] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Abstract
IMPORTANCE Despite the potential of machine learning to improve multiple aspects of patient care, barriers to clinical adoption remain. Randomized clinical trials (RCTs) are often a prerequisite to large-scale clinical adoption of an intervention, and important questions remain regarding how machine learning interventions are being incorporated into clinical trials in health care. OBJECTIVE To systematically examine the design, reporting standards, risk of bias, and inclusivity of RCTs for medical machine learning interventions. EVIDENCE REVIEW In this systematic review, the Cochrane Library, Google Scholar, Ovid Embase, Ovid MEDLINE, PubMed, Scopus, and Web of Science Core Collection online databases were searched and citation chasing was done to find relevant articles published from the inception of each database to October 15, 2021. Search terms for machine learning, clinical decision-making, and RCTs were used. Exclusion criteria included implementation of a non-RCT design, absence of original data, and evaluation of nonclinical interventions. Data were extracted from published articles. Trial characteristics, including primary intervention, demographics, adherence to the CONSORT-AI reporting guideline, and Cochrane risk of bias were analyzed. FINDINGS Literature search yielded 19 737 articles, of which 41 RCTs involved a median of 294 participants (range, 17-2488 participants). A total of 16 RCTS (39%) were published in 2021, 21 (51%) were conducted at single sites, and 15 (37%) involved endoscopy. No trials adhered to all CONSORT-AI standards. Common reasons for nonadherence were not assessing poor-quality or unavailable input data (38 trials [93%]), not analyzing performance errors (38 [93%]), and not including a statement regarding code or algorithm availability (37 [90%]). Overall risk of bias was high in 7 trials (17%). Of 11 trials (27%) that reported race and ethnicity data, the median proportion of participants from underrepresented minority groups was 21% (range, 0%-51%). CONCLUSIONS AND RELEVANCE This systematic review found that despite the large number of medical machine learning-based algorithms in development, few RCTs for these technologies have been conducted. Among published RCTs, there was high variability in adherence to reporting standards and risk of bias and a lack of participants from underrepresented minority groups. These findings merit attention and should be considered in future RCT design and reporting.
Collapse
Affiliation(s)
| | - Dennis L Shung
- Department of Medicine, Yale University, New Haven, Connecticut
| | - Alyssa A Grimshaw
- Harvey Cushing/John Hay Whitney Medical Library, Yale University, New Haven, Connecticut
| | - Anurag Saraf
- Department of Radiation Oncology, Massachusetts General Hospital, Boston, Massachusetts
| | - Joseph J Y Sung
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
| | - Benjamin H Kann
- Artificial Intelligence in Medicine Program, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
| |
Collapse
|
48
|
Zhang KS, Schelb P, Netzer N, Tavakoli AA, Keymling M, Wehrse E, Hog R, Rotkopf LT, Wennmann M, Glemser PA, Thierjung H, von Knebel Doeberitz N, Kleesiek J, Görtz M, Schütz V, Hielscher T, Stenzinger A, Hohenfellner M, Schlemmer HP, Maier-Hein K, Bonekamp D. Pseudoprospective Paraclinical Interaction of Radiology Residents With a Deep Learning System for Prostate Cancer Detection: Experience, Performance, and Identification of the Need for Intermittent Recalibration. Invest Radiol 2022; 57:601-612. [PMID: 35467572 DOI: 10.1097/rli.0000000000000878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVES The aim of this study was to estimate the prospective utility of a previously retrospectively validated convolutional neural network (CNN) for prostate cancer (PC) detection on prostate magnetic resonance imaging (MRI). MATERIALS AND METHODS The biparametric (T2-weighted and diffusion-weighted) portion of clinical multiparametric prostate MRI from consecutive men included between November 2019 and September 2020 was fully automatically and individually analyzed by a CNN briefly after image acquisition (pseudoprospective design). Radiology residents performed 2 research Prostate Imaging Reporting and Data System (PI-RADS) assessments of the multiparametric dataset independent from clinical reporting (paraclinical design) before and after review of the CNN results and completed a survey. Presence of clinically significant PC was determined by the presence of an International Society of Urological Pathology grade 2 or higher PC on combined targeted and extended systematic transperineal MRI/transrectal ultrasound fusion biopsy. Sensitivities and specificities on a patient and prostate sextant basis were compared using the McNemar test and compared with the receiver operating characteristic (ROC) curve of CNN. Survey results were summarized as absolute counts and percentages. RESULTS A total of 201 men were included. The CNN achieved an ROC area under the curve of 0.77 on a patient basis. Using PI-RADS ≥3-emulating probability threshold (c3), CNN had a patient-based sensitivity of 81.8% and specificity of 54.8%, not statistically different from the current clinical routine PI-RADS ≥4 assessment at 90.9% and 54.8%, respectively ( P = 0.30/ P = 1.0). In general, residents achieved similar sensitivity and specificity before and after CNN review. On a prostate sextant basis, clinical assessment possessed the highest ROC area under the curve of 0.82, higher than CNN (AUC = 0.76, P = 0.21) and significantly higher than resident performance before and after CNN review (AUC = 0.76 / 0.76, P ≤ 0.03). The resident survey indicated CNN to be helpful and clinically useful. CONCLUSIONS Pseudoprospective paraclinical integration of fully automated CNN-based detection of suspicious lesions on prostate multiparametric MRI was demonstrated and showed good acceptance among residents, whereas no significant improvement in resident performance was found. General CNN performance was preserved despite an observed shift in CNN calibration, identifying the requirement for continuous quality control and recalibration.
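Illustrative only: paired sensitivities on the same set of biopsy-confirmed cases (for example reader vs. CNN, or reader before vs. after CNN review) can be compared with McNemar's test, as in the analysis above. The 2x2 discordance counts below are invented.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Rows: assessment A correct / incorrect; columns: assessment B correct / incorrect,
# counted over the same patients.
table = np.array([[150, 12],
                  [  5, 34]])
result = mcnemar(table, exact=True)   # exact binomial version for small discordant cells
print(f"McNemar p-value = {result.pvalue:.3f}")
```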
Collapse
Affiliation(s)
- Kevin Sun Zhang
- From the Division of Radiology, German Cancer Research Center (DKFZ)
| | | | | | | | - Myriam Keymling
- From the Division of Radiology, German Cancer Research Center (DKFZ)
| | - Eckhard Wehrse
- From the Division of Radiology, German Cancer Research Center (DKFZ)
| | - Robert Hog
- From the Division of Radiology, German Cancer Research Center (DKFZ)
| | | | - Markus Wennmann
- From the Division of Radiology, German Cancer Research Center (DKFZ)
| | | | - Heidi Thierjung
- From the Division of Radiology, German Cancer Research Center (DKFZ)
| | | | | | | | - Viktoria Schütz
- Department of Urology, University of Heidelberg Medical Center
| | | | | | | | | | | | | |
Collapse
|
49
|
Galuzio PP, Cherif A. Recent Advances and Future Perspectives in the Use of Machine Learning and Mathematical Models in Nephrology. Adv Chronic Kidney Dis 2022; 29:472-479. [PMID: 36253031 DOI: 10.1053/j.ackd.2022.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2022] [Revised: 06/21/2022] [Accepted: 07/07/2022] [Indexed: 01/25/2023]
Abstract
We reviewed some of the latest advancements in the use of mathematical models in nephrology. We looked over 2 distinct categories of mathematical models that are widely used in biological research and pointed out some of their strengths and weaknesses when applied to health care, especially in the context of nephrology. A mechanistic dynamical system allows the representation of causal relations among the system variables but with a more complex and longer development/implementation phase. Artificial intelligence/machine learning provides predictive tools that allow identifying correlative patterns in large data sets, but they are usually harder-to-interpret black boxes. Chronic kidney disease (CKD), a major worldwide health problem, generates copious quantities of data that can be leveraged by choice of the appropriate model; also, there is a large number of dialysis parameters that need to be determined at every treatment session that can benefit from predictive mechanistic models. Following important steps in the use of mathematical methods in medical science might be in the intersection of seemingly antagonistic frameworks, by leveraging the strength of each to provide better care.
Collapse
Affiliation(s)
| | - Alhaji Cherif
- Research Division, Renal Research Institute, New York, NY.
| |
Collapse
|
50
|
Gottlieb ER, Samuel M, Bonventre JV, Celi LA, Mattie H. Machine Learning for Acute Kidney Injury Prediction in the Intensive Care Unit. Adv Chronic Kidney Dis 2022; 29:431-438. [PMID: 36253026 PMCID: PMC9586459 DOI: 10.1053/j.ackd.2022.06.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2022] [Revised: 06/01/2022] [Accepted: 06/22/2022] [Indexed: 01/25/2023]
Abstract
Machine learning is the field of artificial intelligence in which computers are trained to make predictions or to identify patterns in data through complex mathematical algorithms. It has great potential in critical care to predict outcomes, such as acute kidney injury, and can be used for prognosis and to suggest management strategies. Machine learning can also be used as a research tool to advance our clinical and biochemical understanding of acute kidney injury. In this review, we introduce basic concepts in machine learning and review recent research in each of these domains.
Collapse
Affiliation(s)
- Eric R Gottlieb
- Renal Section, Brigham and Women's Hospital, Boston, MA; Harvard Medical School, Boston, MA; Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA.
| | | | - Joseph V Bonventre
- Renal Section, Brigham and Women's Hospital, Boston, MA; Harvard Medical School, Boston, MA
| | - Leo A Celi
- Harvard Medical School, Boston, MA; Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA; MIT Critical Data, Cambridge, MA; Harvard T.H. Chan School of Public Health, Boston, MA; Beth Israel Deaconess Medical Center, Boston, MA
| | | |
Collapse
|