1. McIlroy DR. Predictive modelling for postoperative acute kidney injury: big data enhancing quality or the Emperor's new clothes? Br J Anaesth 2024; 133:476-478. PMID: 38902116. DOI: 10.1016/j.bja.2024.05.013.
Abstract
The increased availability of large clinical datasets together with increasingly sophisticated computing power has facilitated development of numerous risk prediction models for various adverse perioperative outcomes, including acute kidney injury (AKI). The rationale for developing such models is straightforward. However, despite numerous purported benefits, the uptake of preoperative prediction models into clinical practice has been limited. Barriers to implementation of predictive models, including limitations in their discrimination and accuracy, as well as their ability to meaningfully impact clinical practice and patient outcomes, are increasingly recognised. Some of the purported benefits of predictive modelling, particularly when applied to postoperative AKI, might not fare well under detailed scrutiny. Future research should address existing limitations and seek to demonstrate both benefit to patients and value to healthcare systems from implementation of these models in clinical practice.
Affiliation(s)
- David R McIlroy
- Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Anaesthesia, Monash University, Melbourne, VIC, Australia.
2. Zhuo XY, Lei SH, Sun L, Bai YW, Wu J, Zheng YJ, Liu KX, Liu WF, Zhao BC. Preoperative risk prediction models for acute kidney injury after noncardiac surgery: an independent external validation cohort study. Br J Anaesth 2024; 133:508-518. PMID: 38527923. DOI: 10.1016/j.bja.2024.02.018.
Abstract
BACKGROUND Numerous models have been developed to predict acute kidney injury (AKI) after noncardiac surgery, yet there is a lack of independent validation and comparison among them. METHODS We conducted a systematic literature search to review published risk prediction models for AKI after noncardiac surgery. An independent external validation was performed using a retrospective surgical cohort at a large Chinese hospital from January 2019 to October 2022. The cohort included patients undergoing a wide range of noncardiac surgeries with perioperative creatinine measurements. Postoperative AKI was defined according to the Kidney Disease Improving Global Outcomes creatinine criteria. Model performance was assessed in terms of discrimination (area under the receiver operating characteristic curve, AUROC), calibration (calibration plot), and clinical utility (net benefit), before and after model recalibration through intercept and slope updates. A sensitivity analysis was conducted by including patients without postoperative creatinine measurements in the validation cohort and categorising them as non-AKI cases. RESULTS Nine prediction models were evaluated, each with varying clinical and methodological characteristics, including the types of surgical cohorts used for model development, AKI definitions, and predictors. In the validation cohort involving 13,186 patients, 650 (4.9%) developed AKI. Three models demonstrated fair discrimination (AUROC between 0.71 and 0.75); other models had poor or failed discrimination. All models exhibited some miscalibration; five of the nine models were well-calibrated after intercept and slope updates. Decision curve analysis indicated that the three models with fair discrimination consistently provided a positive net benefit after recalibration. The results were confirmed in the sensitivity analysis. CONCLUSIONS We identified three models with fair discrimination and potential clinical utility after recalibration for assessing the risk of acute kidney injury after noncardiac surgery.
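For reference, a minimal sketch of the intercept-and-slope (logistic) recalibration step described in this abstract, assuming predicted probabilities from an existing model and observed outcomes in the new cohort (variable names and the toy data are illustrative, not the study's code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def logistic_recalibration(p_orig, y):
    """Refit intercept and slope on the original model's linear predictor.

    p_orig: predicted probabilities from the original model in the new cohort
    y:      observed binary outcomes (e.g. postoperative AKI) in the new cohort
    Returns (recalibrated probabilities, intercept, slope).
    """
    eps = 1e-12
    p = np.clip(p_orig, eps, 1 - eps)
    lp = np.log(p / (1 - p))                                    # logit of original predictions
    fit = LogisticRegression(C=1e6).fit(lp.reshape(-1, 1), y)   # effectively unpenalized fit
    a, b = fit.intercept_[0], fit.coef_[0, 0]
    return 1 / (1 + np.exp(-(a + b * lp))), a, b

# Toy demonstration with a model that over-predicts risk by roughly a factor of two
rng = np.random.default_rng(0)
true_p = rng.uniform(0.01, 0.30, 5000)
y = rng.binomial(1, true_p)
p_orig = np.clip(2 * true_p, 0, 0.99)
p_recal, a, b = logistic_recalibration(p_orig, y)
print(f"intercept update = {a:.2f}, slope update = {b:.2f}")
```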
Affiliation(s)
- Xiao-Yu Zhuo
- Department of Anaesthesiology, Nanfang Hospital, Southern Medical University, Guangzhou, China; Guangdong Provincial Key Laboratory of Precision Anaesthesia and Perioperative Organ Protection, Guangzhou, China
- Shao-Hui Lei
- Department of Anaesthesiology, Nanfang Hospital, Southern Medical University, Guangzhou, China; Guangdong Provincial Key Laboratory of Precision Anaesthesia and Perioperative Organ Protection, Guangzhou, China; College of Anaesthesiology, Southern Medical University, Guangzhou, China
- Lan Sun
- Department of Anaesthesiology, Nanfang Hospital, Southern Medical University, Guangzhou, China; Department of Biostatistics, Lejiu Healthcare Technology Co., Ltd, Hangzhou, China
- Ya-Wen Bai
- College of Anaesthesiology, Southern Medical University, Guangzhou, China
- Jiao Wu
- College of Anaesthesiology, Southern Medical University, Guangzhou, China
- Yong-Jia Zheng
- College of Anaesthesiology, Southern Medical University, Guangzhou, China
- Ke-Xuan Liu
- Department of Anaesthesiology, Nanfang Hospital, Southern Medical University, Guangzhou, China; Guangdong Provincial Key Laboratory of Precision Anaesthesia and Perioperative Organ Protection, Guangzhou, China; College of Anaesthesiology, Southern Medical University, Guangzhou, China; Outcomes Research Consortium, Cleveland, OH, USA.
- Wei-Feng Liu
- Department of Anaesthesiology, Nanfang Hospital, Southern Medical University, Guangzhou, China; Guangdong Provincial Key Laboratory of Precision Anaesthesia and Perioperative Organ Protection, Guangzhou, China; College of Anaesthesiology, Southern Medical University, Guangzhou, China.
- Bing-Cheng Zhao
- Department of Anaesthesiology, Nanfang Hospital, Southern Medical University, Guangzhou, China; Guangdong Provincial Key Laboratory of Precision Anaesthesia and Perioperative Organ Protection, Guangzhou, China; College of Anaesthesiology, Southern Medical University, Guangzhou, China; Outcomes Research Consortium, Cleveland, OH, USA.
3. Han L, Char DS, Aghaeepour N. Artificial Intelligence in Perioperative Care: Opportunities and Challenges. Anesthesiology 2024; 141:379-387. PMID: 38980160. PMCID: PMC11239120. DOI: 10.1097/aln.0000000000005013.
Abstract
Artificial intelligence (AI) applications have great potential to enhance perioperative care. This paper explores promising areas for AI in anesthesiology; expertise, stakeholders, and infrastructure for development; and barriers and challenges to implementation.
Affiliation(s)
- Lichy Han
- Department of Anesthesiology, Perioperative, and Pain Medicine, School of Medicine, Stanford University, Stanford, California
- Danton S Char
- Department of Anesthesiology, Perioperative, and Pain Medicine, School of Medicine, Stanford University, Stanford, California
- Nima Aghaeepour
- Department of Anesthesiology, Perioperative, and Pain Medicine, School of Medicine, Stanford University, Stanford, California
4. Silverman AL, Shung D, Stidham RW, Kochhar GS, Iacucci M. How Artificial Intelligence Will Transform Clinical Care, Research, and Trials for Inflammatory Bowel Disease. Clin Gastroenterol Hepatol 2024:S1542-3565(24)00598-6. PMID: 38992406. DOI: 10.1016/j.cgh.2024.05.048.
Abstract
Artificial intelligence (AI) refers to computer-based methodologies that use data to teach a computer to solve pre-defined tasks; these methods can be applied to identify patterns in large multi-modal data sources. AI applications in inflammatory bowel disease (IBD) include predicting response to therapy, disease activity scoring of endoscopy, drug discovery, and identifying bowel damage in images. As a complex disease with entangled relationships between genomics, metabolomics, microbiome, and the environment, IBD stands to benefit greatly from methodologies that can handle this complexity. We describe current applications and critical challenges of AI in IBD, and propose future directions.
Affiliation(s)
- Anna L Silverman
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, Mayo Clinic, Scottsdale, Arizona.
- Dennis Shung
- Section of Digestive Diseases, Department of Medicine, Yale School of Medicine, Yale University, New Haven, Connecticut
- Ryan W Stidham
- Division of Gastroenterology, Department of Internal Medicine, Michigan Medicine, Ann Arbor, Michigan; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan; Michigan Institute for Data Science, University of Michigan, Ann Arbor, Michigan
- Gursimran S Kochhar
- Division of Gastroenterology, Hepatology, and Nutrition, Allegheny Health Network, Pittsburgh, Pennsylvania
- Marietta Iacucci
- University of Birmingham, Institute of Immunology and Immunotherapy, Birmingham, United Kingdom; College of Medicine and Health, University College Cork, and APC Microbiome Ireland, Cork, Ireland
5. Liou L, Scott E, Parchure P, Ouyang Y, Egorova N, Freeman R, Hofer IS, Nadkarni GN, Timsina P, Kia A, Levin MA. Assessing calibration and bias of a deployed machine learning malnutrition prediction model within a large healthcare system. NPJ Digit Med 2024; 7:149. PMID: 38844546. PMCID: PMC11156633. DOI: 10.1038/s41746-024-01141-5.
Abstract
Malnutrition is a frequently underdiagnosed condition leading to increased morbidity, mortality, and healthcare costs. The Mount Sinai Health System (MSHS) deployed a machine learning model (MUST-Plus) to detect malnutrition upon hospital admission. However, in diverse patient groups, a poorly calibrated model may lead to misdiagnosis, exacerbating health care disparities. We explored the model's calibration across different variables and methods to improve calibration. Data from adult patients admitted to five MSHS hospitals from January 1, 2021 - December 31, 2022, were analyzed. We compared MUST-Plus prediction to the registered dietitian's formal assessment. Hierarchical calibration was assessed and compared between the recalibration sample (N = 49,562) of patients admitted between January 1, 2021 - December 31, 2022, and the hold-out sample (N = 17,278) of patients admitted between January 1, 2023 - September 30, 2023. Statistical differences in calibration metrics were tested using bootstrapping with replacement. Before recalibration, the overall model calibration intercept was -1.17 (95% CI: -1.20, -1.14), slope was 1.37 (95% CI: 1.34, 1.40), and Brier score was 0.26 (95% CI: 0.25, 0.26). Both weak and moderate measures of calibration were significantly different between White and Black patients and between male and female patients. Logistic recalibration significantly improved calibration of the model across race and gender in the hold-out sample. The original MUST-Plus model showed significant differences in calibration between White vs. Black patients. It also overestimated malnutrition in females compared to males. Logistic recalibration effectively reduced miscalibration across all patient subgroups. Continual monitoring and timely recalibration can improve model accuracy.
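A minimal sketch of this kind of subgroup calibration audit, computing weak-calibration metrics and the Brier score per group with a bootstrap interval for the between-group difference (arrays, group labels, and data are illustrative; this is not the MUST-Plus implementation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

def weak_calibration(p, y):
    """Calibration intercept and slope from a logistic fit on logit(p), plus Brier score.
    (A joint fit; the formal calibration-in-the-large fixes the slope at 1.)"""
    eps = 1e-12
    lp = np.log(np.clip(p, eps, 1 - eps) / np.clip(1 - p, eps, 1 - eps))
    fit = LogisticRegression(C=1e6).fit(lp.reshape(-1, 1), y)
    return fit.intercept_[0], fit.coef_[0, 0], brier_score_loss(y, p)

def brier_gap_ci(p, y, group, a, b, n_boot=500, seed=0):
    """Bootstrap 95% CI for the difference in Brier score between groups a and b."""
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        pa, ya = p[idx][group[idx] == a], y[idx][group[idx] == a]
        pb, yb = p[idx][group[idx] == b], y[idx][group[idx] == b]
        diffs.append(brier_score_loss(ya, pa) - brier_score_loss(yb, pb))
    return np.percentile(diffs, [2.5, 97.5])

# Toy example: predictions that over-estimate risk in one group
rng = np.random.default_rng(1)
base = rng.beta(2, 8, 10000)
group = rng.choice(["F", "M"], 10000)
y = rng.binomial(1, base)
p = np.clip(base + 0.05 * (group == "F"), 0.01, 0.99)
print(weak_calibration(p, y), brier_gap_ci(p, y, group, "F", "M"))
```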
Affiliation(s)
- Lathan Liou
- Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Prathamesh Parchure
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yuxia Ouyang
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Natalia Egorova
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Robert Freeman
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ira S Hofer
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Division of Data Driven and Digital Medicine (D3M), The Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Girish N Nadkarni
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Division of Data Driven and Digital Medicine (D3M), The Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Prem Timsina
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Arash Kia
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Matthew A Levin
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
6. Kigo J, Kamau S, Mawji A, Mwaniki P, Dunsmuir D, Pillay Y, Zhang C, Pallot K, Ogero M, Kimutai D, Ouma M, Mohamed I, Chege M, Thuranira L, Kissoon N, Ansermino JM, Akech S. External validation of a paediatric Smart triage model for use in resource limited facilities. PLOS Digital Health 2024; 3:e0000293. PMID: 38905166. PMCID: PMC11192416. DOI: 10.1371/journal.pdig.0000293.
Abstract
Models for digital triage of sick children at emergency departments of hospitals in resource-poor settings have been developed. However, prior to their adoption, external validation should be performed to ensure their generalizability. We externally validated a previously published nine-predictor paediatric triage model (Smart Triage) developed in Uganda using data from two hospitals in Kenya. Both discrimination and calibration were assessed, and recalibration was performed by optimizing the intercept for classifying patients into emergency, priority, or non-urgent categories based on low-risk and high-risk thresholds. A total of 2539 patients were eligible at Hospital 1 and 2464 at Hospital 2, and 5003 for both hospitals combined; admission rates were 8.9%, 4.5%, and 6.8%, respectively. The model showed good discrimination, with areas under the receiver operating characteristic curve (AUC) of 0.826, 0.784, and 0.821, respectively. Before recalibration, the model achieved a sensitivity of 93% (95% confidence interval [CI]: 89%-96%), 81% (CI: 74%-88%), and 89% (CI: 85%-92%), respectively, at a low-risk threshold of 8%, and a specificity of 86% (CI: 84%-87%), 96% (CI: 95%-97%), and 91% (CI: 90%-92%), respectively, at a high-risk threshold of 40%. Recalibration improved the graphical fit, but new risk thresholds were required to optimize sensitivity and specificity. The Smart Triage model showed good discrimination on external validation but required recalibration to improve the graphical fit of the calibration plot. There was no change in the order of prioritization of patients following recalibration in the respective triage categories. Recalibration required new site-specific risk thresholds that may not be needed if prioritization based on rank is all that is required. The Smart Triage model shows promise for wider use in triage of sick children in different settings.
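A minimal sketch of the threshold-based triage step this abstract describes, mapping predicted admission risk to emergency/priority/non-urgent categories and checking sensitivity and specificity at the two thresholds (the 8% and 40% thresholds follow the abstract; the data are synthetic):

```python
import numpy as np

def triage_category(p, low=0.08, high=0.40):
    """Map predicted admission risk to Smart Triage-style categories."""
    return np.where(p >= high, "emergency",
                    np.where(p >= low, "priority", "non-urgent"))

def sens_spec(p, y, low=0.08, high=0.40):
    """Sensitivity at the low-risk threshold, specificity at the high-risk threshold."""
    sens = np.mean(p[y == 1] >= low)     # escalated patients among true admissions
    spec = np.mean(p[y == 0] < high)     # non-admitted patients kept below 'emergency'
    return sens, spec

# Toy example with skewed risk scores and loosely linked outcomes
rng = np.random.default_rng(2)
p = rng.beta(1, 10, 2500)
y = rng.binomial(1, np.clip(1.2 * p, 0, 1))
print(triage_category(p[:5]), sens_spec(p, y))
```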
Affiliation(s)
- Joyce Kigo
- Health Service Unit, Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Nairobi, Kenya
- Stephen Kamau
- Health Service Unit, Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Nairobi, Kenya
- Alishah Mawji
- Centre for International Child Health, BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
- Paul Mwaniki
- Health Service Unit, Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Nairobi, Kenya
- Dustin Dunsmuir
- Centre for International Child Health, BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
- Department of Anesthesiology, Pharmacology & Therapeutics, University of British Columbia, Vancouver, British Columbia, Canada
- Yashodani Pillay
- Department of Anesthesiology, Pharmacology & Therapeutics, University of British Columbia, Vancouver, British Columbia, Canada
- Cherri Zhang
- Department of Anesthesiology, Pharmacology & Therapeutics, University of British Columbia, Vancouver, British Columbia, Canada
- Katija Pallot
- Department of Anesthesiology, Pharmacology & Therapeutics, University of British Columbia, Vancouver, British Columbia, Canada
- Morris Ogero
- Health Service Unit, Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Nairobi, Kenya
- David Kimutai
- Department of Pediatrics, Mbagathi County Hospital, Nairobi, Kenya
- Mary Ouma
- Department of Pediatrics, Mbagathi County Hospital, Nairobi, Kenya
- Ismael Mohamed
- Department of Pediatrics, Mbagathi County Hospital, Nairobi, Kenya
- Mary Chege
- Department of Pediatrics, Kiambu County Referral Hospital, Kiambu, Kenya
- Lydia Thuranira
- Department of Pediatrics, Kiambu County Referral Hospital, Kiambu, Kenya
- Niranjan Kissoon
- Department of Pediatrics, University of British Columbia, Vancouver, British Columbia, Canada
- J. Mark Ansermino
- Centre for International Child Health, BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
- Department of Anesthesiology, Pharmacology & Therapeutics, University of British Columbia, Vancouver, British Columbia, Canada
- Samuel Akech
- Health Service Unit, Kenya Medical Research Institute (KEMRI)-Wellcome Trust Research Programme, Nairobi, Kenya
7. Brosula R, Corbin CK, Chen JH. Pathophysiological Features in Electronic Medical Records Sustain Model Performance under Temporal Dataset Shift. AMIA Joint Summits on Translational Science Proceedings 2024; 2024:95-104. PMID: 38827052. PMCID: PMC11141811.
Abstract
Access to real-world data streams like electronic medical records (EMRs) has accelerated the development of supervised machine learning (ML) models for clinical applications. However, few studies investigate the differential impact of particular features in the EMR on model performance under temporal dataset shift. To explain how features in the EMR impact models over time, this study aggregates features into feature groups by their source (e.g. medication orders, diagnosis codes and lab results) and feature categories based on their reflection of patient pathophysiology or healthcare processes. We adapt Shapley values to explain feature groups' and feature categories' marginal contribution to initial and sustained model performance. We investigate three standard clinical prediction tasks and find that while feature contributions to initial performance differ across tasks, pathophysiological features help mitigate temporal discrimination deterioration. These results provide interpretable insights on how specific feature groups contribute to model performance and robustness to temporal dataset shift.
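A minimal sketch of estimating a feature group's marginal contribution to discrimination by averaging over group orderings, in the spirit of the grouped Shapley analysis described above (the grouping, model, and data are illustrative, not the study's implementation):

```python
import itertools
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def group_shapley_auroc(model, X_test, y_test, groups, seed=0):
    """Shapley-style contribution of each feature *group* to test AUROC.

    groups: dict mapping group name -> list of column indices.
    Columns outside the current coalition are value-permuted to approximate removal.
    """
    rng = np.random.default_rng(seed)
    names = list(groups)

    def auroc_with(coalition):
        Xp = X_test.copy()
        for g in names:
            if g not in coalition:
                for c in groups[g]:
                    Xp[:, c] = rng.permutation(Xp[:, c])
        return roc_auc_score(y_test, model.predict_proba(Xp)[:, 1])

    contrib = {g: 0.0 for g in names}
    perms = list(itertools.permutations(names))
    for order in perms:
        coalition = set()
        prev = auroc_with(coalition)
        for g in order:
            coalition.add(g)
            cur = auroc_with(coalition)
            contrib[g] += (cur - prev) / len(perms)   # average marginal gain
            prev = cur
    return contrib

# Toy example with three "feature groups"
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=2000) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)
groups = {"labs": [0, 1], "diagnoses": [2, 3], "medications": [4, 5]}
print(group_shapley_auroc(model, X_te, y_te, groups))
```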
Affiliation(s)
- Raphael Brosula
- Genomic Center for Infectious Diseases, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Conor K Corbin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Jonathan H Chen
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
8. Kistanova E, Yotov S, Zaimova D. Intelligent Animal Husbandry: Present and Future. Animals (Basel) 2024; 14:1645. PMID: 38891691. PMCID: PMC11171394. DOI: 10.3390/ani14111645.
Abstract
The main priorities in the contemporary breeding of different animal species have been directed toward the use of intelligent approaches for accelerating genetic progress, ensuring animal welfare and environmental protection by reducing the release of manure and gas emissions [...].
Affiliation(s)
- Elena Kistanova
- Institute of Biology and Immunology of Reproduction, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria
- Stanimir Yotov
- Department of Obstetrics, Reproduction and Reproductive Disorders, Trakia University, 6000 Stara Zagora, Bulgaria
- Darina Zaimova
- Department of Industrial Business and Entrepreneurship, Faculty of Economics, Trakia University, 6000 Stara Zagora, Bulgaria
9. Pean CA, Buddhiraju A, Shimizu MR, Chen TLW, Esposito JG, Kwon YM. Prediction of 30-Day Mortality Following Revision Total Hip and Knee Arthroplasty: Machine Learning Algorithms Outperform CARDE-B, 5-Item, and 6-Item Modified Frailty Index Risk Scores. J Arthroplasty 2024:S0883-5403(24)00528-X. PMID: 38797444. DOI: 10.1016/j.arth.2024.05.056.
Abstract
BACKGROUND Although risk calculators are used to prognosticate postoperative outcomes following revision total hip and knee arthroplasty (total joint arthroplasty [TJA]), machine learning (ML)-based predictive tools have emerged as a promising alternative for improved risk stratification. This study aimed to compare the predictive ability of ML models for 30-day mortality following revision TJA to that of traditional risk-assessment indices such as the CARDE-B score (congestive heart failure, albumin (< 3.5 mg/dL), renal failure on dialysis, dependence for daily living, elderly (> 65 years of age), and body mass index (BMI) of < 25 kg/m2), 5-item modified frailty index (5MFI), and 6MFI. METHODS Adult patients undergoing revision TJA between 2013 and 2020 were selected from the American College of Surgeons National Surgical Quality Improvement Program database and randomly split 80:20 to compose the training and validation cohorts. Three ML models (extreme gradient boosting, random forest, and elastic-net penalized logistic regression [NEPLR]) were developed and evaluated using discrimination, calibration metrics, and accuracy. The discrimination of CARDE-B, 5MFI, and 6MFI scores was assessed individually and compared to that of the ML models. RESULTS All models were equally accurate (Brier score = 0.005) and demonstrated outstanding discrimination with similar areas under the receiver operating characteristic curve (AUCs, extreme gradient boosting = 0.94, random forest = NEPLR = 0.93). The NEPLR was the best-calibrated model overall (slope = 0.54, intercept = -0.004). CARDE-B had the highest discrimination among the scores (AUC = 0.89), followed by 6MFI (AUC = 0.80), and 5MFI (AUC = 0.68). Albumin < 3.5 mg/dL and BMI (< 30.15) were the most important predictors of 30-day mortality following revision TJA. CONCLUSIONS The ML models outperformed traditional risk-assessment indices in predicting postoperative 30-day mortality after revision TJA. Our findings highlight the utility of ML for risk stratification in a clinical setting. The identification of hypoalbuminemia and BMI as prognostic markers may allow patient-specific perioperative optimization strategies to improve outcomes following revision TJA.
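A minimal sketch of the kind of comparison reported here: a gradient-boosted classifier versus a simple additive index, each scored by AUROC on a held-out split (synthetic data and an invented three-item index stand in for the NSQIP variables and the CARDE-B/mFI scores; GradientBoostingClassifier stands in for XGBoost):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier   # stand-in for XGBoost
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic cohort: three binary risk items plus BMI drive a rare mortality outcome
rng = np.random.default_rng(42)
n = 20000
albumin_low = rng.binomial(1, 0.20, n)
dialysis = rng.binomial(1, 0.05, n)
age_over_65 = rng.binomial(1, 0.50, n)
bmi = rng.normal(28, 6, n)
logit = -5 + 1.2 * albumin_low + 1.5 * dialysis + 0.8 * age_over_65 - 0.03 * bmi
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([albumin_low, dialysis, age_over_65, bmi])
index_score = albumin_low + dialysis + age_over_65         # simple count-style index

X_tr, X_te, y_tr, y_te, idx_tr, idx_te = train_test_split(
    X, y, index_score, test_size=0.2, random_state=0)
ml = GradientBoostingClassifier().fit(X_tr, y_tr)
print("ML AUROC:   ", round(roc_auc_score(y_te, ml.predict_proba(X_te)[:, 1]), 3))
print("Index AUROC:", round(roc_auc_score(y_te, idx_te), 3))
```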
Affiliation(s)
- Christian A Pean
- Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts; Department of Orthopaedic Trauma and Reconstruction Surgery, Duke University School of Medicine, Durham, North Carolina
- Anirudh Buddhiraju
- Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- Michelle R Shimizu
- Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- Tony L-W Chen
- Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- John G Esposito
- Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- Young-Min Kwon
- Bioengineering Laboratory, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
10. Harrison-Brown M, Scholes C, Ebrahimi M, Bell C, Kirwan G. Applying models of care for total hip and knee arthroplasty: External validation of a published predictive model to identify extended stay risk prior to lower-limb arthroplasty. Clin Rehabil 2024; 38:700-712. PMID: 38377957. DOI: 10.1177/02692155241233348.
Abstract
OBJECTIVE This study aimed to externally validate a reported model for identifying patients requiring extended stay following lower limb arthroplasty in a new setting. DESIGN External validation of a previously reported prognostic model, using retrospective data. SETTING Medium-sized hospital orthopaedic department, Australia. PARTICIPANTS Electronic medical records were accessed for data collection between Sep-2019 and Feb-2020 and retrospective data extracted from 200 randomly selected total hip or knee arthroplasty patients. INTERVENTION Participants received total hip or knee replacement between 2-Feb-16 and 4-Apr-19. This study was a non-interventional retrospective study. MAIN MEASURES Model validation was assessed in terms of discrimination and calibration, using both the original and adjusted forms of the candidate model. Decision curve analysis was conducted on the outputs of the adjusted model to determine net benefit at a predetermined decision threshold (0.5). RESULTS The original model performed poorly, grossly overestimating length of stay with mean calibration of -3.6 (95% confidence interval -3.9 to -3.2) and calibration slope of 0.52. Performance improved following adjustment of the model intercept and model coefficients (mean calibration 0.48, 95% confidence interval 0.16 to 0.80 and slope of 1.0), but the model remained poorly calibrated at low and medium risk thresholds, and net benefit was modest (three additional patients per hundred identified as at-risk) at the a priori risk threshold. CONCLUSIONS External validation demonstrated poor performance when the model was applied to a new patient population, and the model would provide limited benefit for our institution. Implementation of predictive models for arthroplasty should include practical assessment of discrimination, calibration and net benefit at a clinically acceptable threshold.
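A minimal sketch of the net-benefit calculation underlying the decision curve analysis mentioned above, evaluated at a single decision threshold (0.5, as in the study); the data are illustrative:

```python
import numpy as np

def net_benefit(p, y, threshold):
    """Net benefit of acting on predictions >= threshold (standard DCA formula)."""
    n = len(y)
    act = p >= threshold
    tp = np.sum(act & (y == 1))
    fp = np.sum(act & (y == 0))
    return tp / n - (fp / n) * (threshold / (1 - threshold))

# Toy comparison at the 0.5 decision threshold: model vs. treating everyone as at-risk
rng = np.random.default_rng(3)
y = rng.binomial(1, 0.3, 1000)
p = np.clip(0.3 + 0.4 * (y - 0.3) + rng.normal(0, 0.15, 1000), 0.01, 0.99)
print("model    :", round(net_benefit(p, y, 0.5), 4))
print("treat all:", round(net_benefit(np.ones_like(p), y, 0.5), 4))
```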
Affiliation(s)
- Christopher Bell
- Department of Orthopaedics, QEII Jubilee Hospital, Brisbane, Australia
- Garry Kirwan
- Department of Physiotherapy, QEII Jubilee Hospital, Brisbane, Australia
- School of Health Sciences and Social Work, Griffith University, Brisbane, Australia
11. Huguet N, Chen J, Parikh RB, Marino M, Flocke SA, Likumahuwa-Ackman S, Bekelman J, DeVoe JE. Applying Machine Learning Techniques to Implementation Science. Online J Public Health Inform 2024; 16:e50201. PMID: 38648094. DOI: 10.2196/50201.
Abstract
Machine learning (ML) approaches could expand the usefulness and application of implementation science methods in clinical medicine and public health settings. The aim of this viewpoint is to introduce a roadmap for applying ML techniques to address implementation science questions, such as predicting what will work best, for whom, under what circumstances, and with what predicted level of support, and what and when adaptation or deimplementation are needed. We describe how ML approaches could be used and discuss challenges that implementation scientists and methodologists will need to consider when using ML throughout the stages of implementation.
Affiliation(s)
- Nathalie Huguet
- Department of Family Medicine, Oregon Health & Science University, Portland, OR, United States
- BRIDGE-C2 Implementation Science Center for Cancer Control, Oregon Health & Science University, Portland, OR, United States
- Jinying Chen
- Section of Preventive Medicine and Epidemiology, Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, United States
- Data Science Core, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, United States
- iDAPT Implementation Science Center for Cancer Control, Wake Forest School of Medicine, Winston-Salem, NC, United States
- Ravi B Parikh
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Miguel Marino
- Department of Family Medicine, Oregon Health & Science University, Portland, OR, United States
- BRIDGE-C2 Implementation Science Center for Cancer Control, Oregon Health & Science University, Portland, OR, United States
- Susan A Flocke
- Department of Family Medicine, Oregon Health & Science University, Portland, OR, United States
- BRIDGE-C2 Implementation Science Center for Cancer Control, Oregon Health & Science University, Portland, OR, United States
- Sonja Likumahuwa-Ackman
- Department of Family Medicine, Oregon Health & Science University, Portland, OR, United States
- BRIDGE-C2 Implementation Science Center for Cancer Control, Oregon Health & Science University, Portland, OR, United States
- Justin Bekelman
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Penn Center for Cancer Care Innovation, Abramson Cancer Center, Penn Medicine, Philadelphia, PA, United States
- Jennifer E DeVoe
- Department of Family Medicine, Oregon Health & Science University, Portland, OR, United States
- BRIDGE-C2 Implementation Science Center for Cancer Control, Oregon Health & Science University, Portland, OR, United States
12. Andersen ES, Röttger R, Brasen CL, Brandslund I. Analytical Performance Specifications for Input Variables: Investigation of the Model of End-Stage Liver Disease. Clin Chem 2024; 70:653-659. PMID: 38416710. DOI: 10.1093/clinchem/hvae019.
Abstract
BACKGROUND Artificial intelligence models constitute specific uses of analysis results and, therefore, necessitate evaluation of analytical performance specifications (APS) for this context specifically. The Model of End-stage Liver Disease (MELD) is a clinical prediction model based on measurements of bilirubin, creatinine, and the international normalized ratio (INR). This study evaluates the propagation of error through the MELD, to inform choice of APS for the MELD input variables. METHODS A total of 6093 consecutive MELD scores and underlying analysis results were retrospectively collected. "Desirable analytical variation" based on biological variation as well as current local analytical variation was simulated onto the data set as well as onto a constructed data set, representing a worst-case scenario. Resulting changes in MELD score and risk classification were calculated. RESULTS Biological variation-based APS in the worst-case scenario resulted in 3.26% of scores changing by ≥1 MELD point. In the patient-derived data set, the same variation resulted in 0.92% of samples changing by ≥1 MELD point, and 5.5% of samples changing risk category. Local analytical performance resulted in lower reclassification rates. CONCLUSIONS Error propagation through MELD is complex and includes population-dependent mechanisms. Biological variation-derived APS were acceptable for all uses of the MELD score. Other combinations of APS can yield equally acceptable results. This analysis exemplifies how error propagation through artificial intelligence models can become highly complex. This complexity will necessitate that both model suppliers and clinical laboratories address analytical performance specifications for the specific use case, as these may differ from performance specifications for traditional use of the analyses.
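A minimal sketch of the error-propagation idea investigated here: simulate analytical variation on the MELD inputs and count how often the rounded score shifts by at least one point (this uses the classic pre-MELD-Na UNOS formula; the coefficient-of-variation values and patient panel are placeholders, not the study's APS or data):

```python
import numpy as np

def meld(bili, creat, inr):
    """Classic (pre-MELD-Na) MELD score: bilirubin and creatinine in mg/dL, plus INR."""
    bili = np.maximum(bili, 1.0)
    creat = np.clip(creat, 1.0, 4.0)
    inr = np.maximum(inr, 1.0)
    score = 9.57 * np.log(creat) + 3.78 * np.log(bili) + 11.2 * np.log(inr) + 6.43
    return np.clip(np.round(score), 6, 40)

def simulate_shift(bili, creat, inr, cv=(0.10, 0.05, 0.05), n_sim=1000, seed=0):
    """Fraction of patients whose MELD changes by >= 1 point under added analytical CV."""
    rng = np.random.default_rng(seed)
    base = meld(bili, creat, inr)
    changed = 0.0
    for _ in range(n_sim):
        b = bili * rng.normal(1, cv[0], bili.shape)
        c = creat * rng.normal(1, cv[1], creat.shape)
        i = inr * rng.normal(1, cv[2], inr.shape)
        changed += np.mean(np.abs(meld(b, c, i) - base) >= 1)
    return changed / n_sim

# Toy patient panel
rng = np.random.default_rng(7)
bili = rng.lognormal(0.5, 0.8, 500)
creat = rng.lognormal(0.2, 0.5, 500)
inr = rng.lognormal(0.2, 0.3, 500)
print("mean fraction of scores shifting >= 1 point:", round(simulate_shift(bili, creat, inr), 3))
```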
Affiliation(s)
- Eline S Andersen
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Odense, Denmark
- Richard Röttger
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Claus L Brasen
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Odense, Denmark
- Ivan Brandslund
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Odense, Denmark
13. Zhuang Y, Dyas A, Meguid RA, Henderson WG, Bronsert M, Madsen H, Colborn KL. Preoperative Prediction of Postoperative Infections Using Machine Learning and Electronic Health Record Data. Ann Surg 2024; 279:720-726. PMID: 37753703. DOI: 10.1097/sla.0000000000006106.
Abstract
OBJECTIVE To estimate preoperative risk of postoperative infections using structured electronic health record (EHR) data. BACKGROUND Surveillance and reporting of postoperative infections is primarily done through costly, labor-intensive manual chart reviews on a small sample of patients. Automated methods using statistical models applied to postoperative EHR data have shown promise to augment manual review as they can cover all operations in a timely manner. However, there are no specific models for risk-adjusting infectious complication rates using EHR data. METHODS Preoperative EHR data from 30,639 patients (2013-2019) were linked to the American College of Surgeons National Surgical Quality Improvement Program preoperative data and postoperative infection outcomes data from 5 hospitals in the University of Colorado Health System. EHR data included diagnoses, procedures, operative variables, patient characteristics, and medications. Lasso and the knockoff filter were used to perform controlled variable selection. Outcomes included surgical site infection, urinary tract infection, sepsis/septic shock, and pneumonia up to 30 days postoperatively. RESULTS Among >15,000 candidate predictors, 7 were chosen for the surgical site infection model and 6 for each of the urinary tract infection, sepsis, and pneumonia models. Important variables included preoperative presence of the specific outcome, wound classification, comorbidities, and American Society of Anesthesiologists physical status classification. The area under the receiver operating characteristic curve for each model ranged from 0.73 to 0.89. CONCLUSIONS Parsimonious preoperative models for predicting postoperative infection risk using EHR data were developed and showed comparable performance to existing American College of Surgeons National Surgical Quality Improvement Program risk models that use manual chart review. These models can be used to estimate risk-adjusted postoperative infection rates applied to large volumes of EHR data in a timely manner.
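A minimal sketch of L1-penalized (lasso) variable selection for a binary postoperative outcome, in the spirit of the controlled selection described above (the knockoff-filter step is omitted and the predictors are synthetic):

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 5000, 200
X = rng.normal(size=(n, p))                                   # stand-in for EHR-derived predictors
beta = np.zeros(p)
beta[:5] = [1.0, -0.8, 0.6, 0.5, -0.5]                        # only 5 truly informative predictors
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta - 2.5))))      # binary infection outcome

Xs = StandardScaler().fit_transform(X)
lasso = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=10, cv=5, scoring="roc_auc")
lasso.fit(Xs, y)
selected = np.flatnonzero(lasso.coef_[0] != 0)
print(f"{len(selected)} predictors selected; first few indices: {selected[:10]}")
```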
Affiliation(s)
- Yaxu Zhuang
- Department of Surgery, Surgical Outcomes and Applied Research Program, University of Colorado Anschutz Medical Campus
- Department of Biostatistics and Informatics, Colorado School of Public Health
- Adam Dyas
- Department of Surgery, Surgical Outcomes and Applied Research Program, University of Colorado Anschutz Medical Campus
- Department of Surgery, School of Medicine, University of Colorado Anschutz Medical Campus
- Robert A Meguid
- Department of Surgery, Surgical Outcomes and Applied Research Program, University of Colorado Anschutz Medical Campus
- Department of Surgery, School of Medicine, University of Colorado Anschutz Medical Campus
- Adult and Child Consortium for Health Outcomes Research and Delivery Science, University of Colorado Anschutz Medical Campus, Aurora, CO
- William G Henderson
- Department of Surgery, Surgical Outcomes and Applied Research Program, University of Colorado Anschutz Medical Campus
- Michael Bronsert
- Department of Surgery, Surgical Outcomes and Applied Research Program, University of Colorado Anschutz Medical Campus
- Adult and Child Consortium for Health Outcomes Research and Delivery Science, University of Colorado Anschutz Medical Campus, Aurora, CO
- Helen Madsen
- Department of Surgery, Surgical Outcomes and Applied Research Program, University of Colorado Anschutz Medical Campus
- Department of Surgery, School of Medicine, University of Colorado Anschutz Medical Campus
- Kathryn L Colborn
- Department of Surgery, Surgical Outcomes and Applied Research Program, University of Colorado Anschutz Medical Campus
- Department of Biostatistics and Informatics, Colorado School of Public Health
- Department of Surgery, School of Medicine, University of Colorado Anschutz Medical Campus
- Adult and Child Consortium for Health Outcomes Research and Delivery Science, University of Colorado Anschutz Medical Campus, Aurora, CO
14. Perschinka F, Peer A, Joannidis M. [Artificial intelligence and acute kidney injury]. Med Klin Intensivmed Notfmed 2024; 119:199-207. PMID: 38396124. PMCID: PMC10995052. DOI: 10.1007/s00063-024-01111-5.
Abstract
Digitalization is increasingly finding its way into intensive care units, and with it artificial intelligence (AI) for critically ill patients. One promising area for the use of AI is acute kidney injury (AKI). The use of AI is primarily focused on the prediction of AKI, but further approaches are also being used to classify existing AKI into different phenotypes. Different AI models are used for prediction. The area under the receiver operating characteristic curve (AUROC) values achieved with these models vary and are influenced by several factors, such as the prediction time and the definition of AKI. Most models have an AUROC between 0.650 and 0.900, with lower values for predictions further into the future and when applying Acute Kidney Injury Network (AKIN) rather than KDIGO criteria. Classification into phenotypes already makes it possible to categorize patients into groups with different risks of mortality or requirement for renal replacement therapy (RRT), but the etiologies or therapeutic consequences derived from this are still lacking. However, all these models suffer from AI-specific shortcomings. Reliance on large historical databases means that recent changes in therapy and newly implemented biomarkers cannot be promptly incorporated in a relevant proportion of cases. For this reason, serum creatinine and urinary output, with their known limitations, dominate current AI prediction models and limit their performance. On the other hand, the increasingly complex models no longer allow physicians to understand the basis on which a warning of impending AKI is calculated and on which subsequent initiation of therapy should take place. The successful use of AI in routine clinical practice will depend largely on physicians' trust in these systems and on overcoming the aforementioned weaknesses. The clinician, however, will remain irreplaceable as the decisive authority for critically ill patients by combining measurable and non-measurable parameters.
Affiliation(s)
- Michael Joannidis
- Gemeinsame Einrichtung für Internistische Notfall- und Intensivmedizin, Department Innere Medizin, Medizinische Universität Innsbruck, Anichstraße 35, 6020, Innsbruck, Österreich.
15. Levin TR, Jensen CD, Marks AR, Schlessinger D, Liu V, Udaltsova N, Badalov J, Layefsky E, Corley DA, Nugent JR, Lee JK. Development and External Validation of a Prediction Model for Colorectal Cancer Among Patients Awaiting Surveillance Colonoscopy Following Polypectomy. Gastro Hep Advances 2024; 3:671-683. PMID: 39165417. PMCID: PMC11330934. DOI: 10.1016/j.gastha.2024.03.008.
Abstract
Background and Aims Demand for surveillance colonoscopy can sometimes exceed capacity, such as during and following the coronavirus disease 2019 pandemic, yet no tools exist to prioritize the patients most likely to be diagnosed with colorectal cancer (CRC) among those awaiting surveillance colonoscopy. We developed a multivariable prediction model for CRC at surveillance, comparing its performance to a model that assigned patients as low or high risk based solely on polyp characteristics (guideline-based model). Methods Logistic regression was used for model development among patients receiving surveillance colonoscopy in 2014-2019. Candidate predictors included index colonoscopy indication, findings, and endoscopist adenoma detection rate, and patient and clinical characteristics at surveillance. Patients were randomly divided into model development (n = 36,994) and internal validation cohorts (n = 15,854). External validation was performed on 30,015 patients receiving surveillance colonoscopy in 2020-2022, and the multivariable model was then updated and retested. Results One hundred fourteen, 43, and 71 CRCs were detected at surveillance in the 3 cohorts, respectively. Polyp size ≥10 mm, adenoma detection rate <32.5% or missing, patient age, and ever having smoked tobacco were significant CRC predictors; this multivariable model outperformed the guideline-based model (internal validation cohort area under the receiver-operating characteristic curve: 0.73, 95% confidence interval (CI): 0.66-0.81 vs 0.52, 95% CI: 0.45-0.60). Performance declined at external validation but recovered with model updating (area under the receiver-operating characteristic curve: 0.72, 95% CI: 0.66-0.77). Conclusion When surveillance colonoscopy demand exceeds capacity, a prediction model featuring common clinical predictors may help prioritize patients at highest risk for CRC among those awaiting surveillance. Also, regular model updates can address model performance drift.
Affiliation(s)
- Theodore R. Levin
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Gastroenterology Department, Kaiser Permanente Medical Center, Walnut Creek, California
- Amy R Marks
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- David Schlessinger
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Vincent Liu
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Natalia Udaltsova
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Jessica Badalov
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Evan Layefsky
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Douglas A. Corley
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Joshua R. Nugent
- Division of Research, Kaiser Permanente Northern California, Oakland, California
- Jeffrey K. Lee
- Division of Research, Kaiser Permanente Northern California, Oakland, California
16. Lasko TA, Strobl EV, Stead WW. Why do probabilistic clinical models fail to transport between sites? NPJ Digit Med 2024; 7:53. PMID: 38429353. PMCID: PMC10907678. DOI: 10.1038/s41746-024-01037-4.
Abstract
The rising popularity of artificial intelligence in healthcare is highlighting the problem that a computational model achieving super-human clinical performance at its training sites may perform substantially worse at new sites. In this perspective, we argue that we should typically expect this failure to transport, and we present common sources for it, divided into those under the control of the experimenter and those inherent to the clinical data-generating process. Among the inherent sources, we look more closely at site-specific clinical practices that can affect the data distribution, and we propose a potential solution intended to isolate the imprint of those practices on the data from the patterns of disease cause and effect that are the usual target of probabilistic clinical models.
Affiliation(s)
- Thomas A Lasko
- Vanderbilt University Medical Center, Nashville, TN, USA.
- Eric V Strobl
- Vanderbilt University Medical Center, Nashville, TN, USA
17. Andersen ES, Birk-Korch JB, Röttger R, Brasen CL, Brandslund I, Madsen JS. Monitoring performance of clinical artificial intelligence: a scoping review protocol. JBI Evid Synth 2024; 22:453-460. PMID: 38328955. DOI: 10.11124/jbies-23-00390.
Abstract
OBJECTIVE The objective of this scoping review is to describe the scope and nature of research on the monitoring of clinical artificial intelligence (AI) systems. The review will identify the various methodologies used to monitor clinical AI, while also mapping the factors that influence the selection of monitoring approaches. INTRODUCTION AI is being used in clinical decision-making at an increasing rate. While much attention has been directed toward the development and validation of AI for clinical applications, the practical implementation aspects, notably the establishment of rational monitoring/quality assurance systems, has received comparatively limited scientific interest. Given the scarcity of evidence and the heterogeneity of methodologies used in this domain, there is a compelling rationale for conducting a scoping review on this subject. INCLUSION CRITERIA This scoping review will include any publications that describe systematic, continuous, or repeated initiatives that evaluate or predict clinical performance of AI models with direct implications for the management of patients in any segment of the health care system. METHODS Publications will be identified through searches of the MEDLINE (Ovid), Embase (Ovid), and Scopus databases. Additionally, backward and forward citation searches, as well as a thorough investigation of gray literature, will be conducted. Title and abstract screening, full-text evaluation, and data extraction will be performed by 2 or more independent reviewers. Data will be extracted using a tool developed by the authors. The results will be presented graphically and narratively. REVIEW REGISTRATION Open Science Framework https://osf.io/afkrn.
Affiliation(s)
- Eline Sandvig Andersen
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Vejle, Denmark
- Johan Baden Birk-Korch
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Vejle, Denmark
- Richard Röttger
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Claus Lohman Brasen
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Vejle, Denmark
- Ivan Brandslund
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Vejle, Denmark
- Jonna Skov Madsen
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Vejle, Denmark
18. Collins GS, Dhiman P, Ma J, Schlussel MM, Archer L, Van Calster B, Harrell FE, Martin GP, Moons KGM, van Smeden M, Sperrin M, Bullock GS, Riley RD. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ 2024; 384:e074819. PMID: 38191193. PMCID: PMC10772854. DOI: 10.1136/bmj-2023-074819.
Affiliation(s)
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Paula Dhiman
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Jie Ma
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Michael M Schlussel
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Lucinda Archer
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, UK
- Ben Van Calster
- KU Leuven, Department of Development and Regeneration, Leuven, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, Netherlands
- EPI-Centre, KU Leuven, Belgium
- Frank E Harrell
- Department of Biostatistics, Vanderbilt University, Nashville, TN, USA
- Glen P Martin
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
- Karel G M Moons
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Maarten van Smeden
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Matthew Sperrin
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
- Garrett S Bullock
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, UK
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, UK
19. Deisenhofer AK, Barkham M, Beierl ET, Schwartz B, Aafjes-van Doorn K, Beevers CG, Berwian IM, Blackwell SE, Bockting CL, Brakemeier EL, Brown G, Buckman JEJ, Castonguay LG, Cusack CE, Dalgleish T, de Jong K, Delgadillo J, DeRubeis RJ, Driessen E, Ehrenreich-May J, Fisher AJ, Fried EI, Fritz J, Furukawa TA, Gillan CM, Gómez Penedo JM, Hitchcock PF, Hofmann SG, Hollon SD, Jacobson NC, Karlin DR, Lee CT, Levinson CA, Lorenzo-Luaces L, McDanal R, Moggia D, Ng MY, Norris LA, Patel V, Piccirillo ML, Pilling S, Rubel JA, Salazar-de-Pablo G, Schleider JL, Schnurr PP, Schueller SM, Siegle GJ, Uher R, Watkins E, Webb CA, Wiltsey Stirman S, Wynants L, Youn SJ, Zilcha-Mano S, Lutz W, Cohen ZD. Implementing precision methods in personalizing psychological therapies: Barriers and possible ways forward. Behav Res Ther 2024; 172:104443. DOI: 10.1016/j.brat.2023.104443.
Affiliation(s)
- Claudi L Bockting
- AmsterdamUMC, Department of Psychiatry, Research Program Amsterdam Public Health and Centre for Urban Mental Health, University of Amsterdam, the Netherlands
- Kim de Jong
- Leiden University, Institute of Psychology, the Netherlands
- Jessica Fritz
- University of Cambridge, UK; Philipps University of Marburg, Germany
- Claire M Gillan
- School of Psychology, Trinity College Institute for Neuroscience, and Global Brain Health Institute, Trinity College Dublin, Ireland
- Mei Yi Ng
- Florida International University, USA
- Jessica L Schleider
- Stony Brook University and Feinberg School of Medicine Northwestern University, USA
- Paula P Schnurr
- National Center for PTSD and Geisel School of Medicine at Dartmouth, USA
- Soo Jeong Youn
- Reliant Medical Group, OptumCare and Harvard Medical School, USA
- Zachary D Cohen
- University of California, Los Angeles and University of Arizona, USA.
20. Lou SS, Liu Y, Cohen ME, Ko CY, Hall BL, Kannampallil T. National Multi-Institutional Validation of a Surgical Transfusion Risk Prediction Model. J Am Coll Surg 2024; 238:99-105. PMID: 37737660. DOI: 10.1097/xcs.0000000000000874.
Abstract
BACKGROUND Accurate estimation of surgical transfusion risk is important for many aspects of surgical planning, yet few methods are available for estimating such risk. There is a need for reliable, validated methods of transfusion risk stratification to support effective perioperative planning and resource stewardship. STUDY DESIGN This study was conducted using the American College of Surgeons NSQIP datafile from 2019. The performance of S-PATH, a previously developed surgical transfusion risk prediction model, was evaluated at each contributing hospital, with and without hospital-specific model tuning. Linear regression was used to assess the relationship between hospital characteristics and the area under the receiver operating characteristic curve (AUROC). RESULTS A total of 1,000,927 surgical cases from 414 hospitals were evaluated. Aggregate AUROC was 0.910 (95% CI 0.904 to 0.916) without model tuning and 0.925 (95% CI 0.919 to 0.931) with model tuning. AUROC varied across individual hospitals (median 0.900, interquartile range 0.849 to 0.944), but no statistically significant relationships were found between the hospital-level characteristics studied and model AUROC. CONCLUSIONS S-PATH demonstrated excellent discriminative performance, although there was variation across hospitals that was not well explained by hospital-level characteristics. These results highlight S-PATH's viability as a generalizable surgical transfusion risk prediction tool.
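To make the hospital-level validation concrete, the following sketch (an illustration on synthetic data, not the authors' code) scores a fixed risk model at each hospital and then regresses hospital-level AUROC on a hospital characteristic; the column names (hospital_id, pred_risk, transfused, bed_size) are hypothetical placeholders.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "hospital_id": rng.integers(0, 20, n),   # 20 simulated hospitals
    "pred_risk": rng.uniform(0, 1, n),       # score from a pre-trained model
})
# Simulated transfusion outcome, loosely correlated with the predicted risk.
df["transfused"] = rng.binomial(1, 0.05 + 0.25 * df["pred_risk"])

# Discrimination at each hospital (hospitals with a single outcome class are skipped).
rows = []
for hid, g in df.groupby("hospital_id"):
    if g["transfused"].nunique() == 2:
        rows.append({"hospital_id": hid,
                     "auroc": roc_auc_score(g["transfused"], g["pred_risk"]),
                     "n_cases": len(g)})
auroc_df = pd.DataFrame(rows)

# Relate hospital-level AUROC to a (simulated) hospital characteristic.
auroc_df["bed_size"] = rng.integers(100, 900, len(auroc_df))
fit = linregress(auroc_df["bed_size"], auroc_df["auroc"])
print(auroc_df[["hospital_id", "auroc", "n_cases"]].head())
print(f"slope = {fit.slope:.5f}, p = {fit.pvalue:.3f}")
```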
Collapse
Affiliation(s)
- Sunny S Lou
- From the Department of Anesthesiology, Washington University School of Medicine, St Louis, MO (Lou, Kannampallil)
| | - Yaoming Liu
- Division of Research and Optimal Patient Care, American College of Surgeons, Chicago, IL (Liu, Ko, Hall, Cohen)
| | - Mark E Cohen
- Division of Research and Optimal Patient Care, American College of Surgeons, Chicago, IL (Liu, Ko, Hall, Cohen)
| | - Clifford Y Ko
- Division of Research and Optimal Patient Care, American College of Surgeons, Chicago, IL (Liu, Ko, Hall, Cohen)
- Department of Surgery, David Geffen School of Medicine, University of California Los Angeles, and the VA Greater Los Angeles Health System, Los Angeles, CA (Ko)
| | - Bruce L Hall
- Division of Research and Optimal Patient Care, American College of Surgeons, Chicago, IL (Liu, Ko, Hall, Cohen)
- Department of Surgery, Washington University School of Medicine; Center for Health Policy and the Olin Business School at Washington University in St Louis; John Cochran Veterans Affairs Medical Center; and BJC Healthcare, St Louis, MO (Hall)
| | - Thomas Kannampallil
- From the Department of Anesthesiology, Washington University School of Medicine, St Louis, MO (Lou, Kannampallil)
| |
Collapse
|
21
|
Bednorz A, Mak JKL, Jylhävä J, Religa D. Use of Electronic Medical Records (EMR) in Gerontology: Benefits, Considerations and a Promising Future. Clin Interv Aging 2023; 18:2171-2183. [PMID: 38152074 PMCID: PMC10752027 DOI: 10.2147/cia.s400887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2023] [Accepted: 11/05/2023] [Indexed: 12/29/2023] Open
Abstract
Electronic medical records (EMRs) have many benefits in clinical research in gerontology, enabling data analysis, development of prognostic tools, and disease risk prediction. EMRs also offer a range of advantages in clinical practice, such as comprehensive medical records, streamlined communication with healthcare providers, remote data access, and rapid retrieval of test results. These features lead to increased efficiency, enhanced patient safety, and improved quality of care in gerontology, with benefits that include reduced medication use and better history taking and physical examination. The use of artificial intelligence (AI) and machine learning (ML) approaches on EMRs can further improve disease diagnosis and symptom classification and support clinical decision-making. However, there are also challenges related to data quality and data entry errors, as well as the ethics and safety of using AI in healthcare. This article discusses the future of EMRs in gerontology and the application of AI and ML in clinical research. Ethical and legal issues surrounding data sharing and the need for healthcare professionals to critically evaluate and integrate these technologies are also emphasized. The article concludes by discussing the challenges related to the use of EMRs in research as well as in their primary intended use, daily clinical practice.
Collapse
Affiliation(s)
- Adam Bednorz
- John Paul II Geriatric Hospital, Katowice, Poland
- Institute of Psychology, Humanitas Academy, Sosnowiec, Poland
| | - Jonathan K L Mak
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Juulia Jylhävä
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Faculty of Social Sciences (Health Sciences) and Gerontology Research Center (GEREC), University of Tampere, Tampere, Finland
| | - Dorota Religa
- Division of Clinical Geriatrics, Department of Neurobiology, Care sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Theme Inflammation and Aging, Karolinska University Hospital, Huddinge, Sweden
| |
Collapse
|
22
|
Bergquist T, Schaffter T, Yan Y, Yu T, Prosser J, Gao J, Chen G, Charzewski Ł, Nawalany Z, Brugere I, Retkute R, Prusokiene A, Prusokas A, Choi Y, Lee S, Choe J, Lee I, Kim S, Kang J, Mooney SD, Guinney J. Evaluation of crowdsourced mortality prediction models as a framework for assessing artificial intelligence in medicine. J Am Med Inform Assoc 2023; 31:35-44. [PMID: 37604111 PMCID: PMC10746301 DOI: 10.1093/jamia/ocad159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2023] [Revised: 07/05/2023] [Accepted: 08/08/2023] [Indexed: 08/23/2023] Open
Abstract
OBJECTIVE Applications of machine learning in healthcare are of high interest and have the potential to improve patient care. Yet, the real-world accuracy of these models in clinical practice and on different patient subpopulations remains unclear. To address these important questions, we hosted a community challenge to evaluate methods that predict healthcare outcomes. We focused on the prediction of all-cause mortality as the community challenge question. MATERIALS AND METHODS Using a Model-to-Data framework, 345 registered participants, coalescing into 25 independent teams spread over 3 continents and 10 countries, generated 25 accurate models, all trained on a dataset of over 1.1 million patients and evaluated on a patient cohort collected prospectively over a 1-year observation period in a large health system. RESULTS The top-performing team achieved a final area under the receiver operating characteristic curve of 0.947 (95% CI, 0.942-0.951) and an area under the precision-recall curve of 0.487 (95% CI, 0.458-0.499) on the prospectively collected patient cohort. DISCUSSION Post hoc analysis after the challenge revealed that models differed in accuracy across subpopulations delineated by race or gender, even when they were trained on the same data. CONCLUSION This is the largest community challenge to date focused on the evaluation of state-of-the-art machine learning methods in a healthcare system, revealing both opportunities and pitfalls of clinical AI.
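A minimal sketch of the kind of post hoc subgroup evaluation described above, assuming synthetic scores and placeholder demographic columns rather than the challenge data: overall AUROC and AUPRC are computed and then recomputed within strata defined by race or sex.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(42)
n = 20000
df = pd.DataFrame({
    "sex": rng.choice(["F", "M"], n),
    "race": rng.choice(["A", "B", "C"], n),
    "score": rng.uniform(0, 1, n),          # model output (synthetic)
})
# Synthetic mortality outcome weakly driven by the score.
df["died"] = rng.binomial(1, 0.02 + 0.08 * df["score"])

print("overall AUROC:", round(roc_auc_score(df["died"], df["score"]), 3))
print("overall AUPRC:", round(average_precision_score(df["died"], df["score"]), 3))

# Same metrics by subgroup; large gaps here would flag subpopulation disparities.
for col in ["sex", "race"]:
    for level, g in df.groupby(col):
        print(f"{col}={level}: "
              f"AUROC={roc_auc_score(g['died'], g['score']):.3f}, "
              f"AUPRC={average_precision_score(g['died'], g['score']):.3f}")
```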
Collapse
Affiliation(s)
- Timothy Bergquist
- Sage Bionetworks, Seattle, WA, United States
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
| | | | - Yao Yan
- Sage Bionetworks, Seattle, WA, United States
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, United States
| | - Thomas Yu
- Sage Bionetworks, Seattle, WA, United States
| | - Justin Prosser
- Institute of Translational Health Sciences, University of Washington, Seattle, WA, United States
| | - Jifan Gao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
| | - Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
| | - Łukasz Charzewski
- Proacta, Warsaw, Poland
- Division of Biophysics, University of Warsaw, Warsaw, Poland
| | | | - Ivan Brugere
- Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
| | - Renata Retkute
- Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
| | - Alisa Prusokiene
- Plant and Molecular Sciences, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Augustinas Prusokas
- Department of Life Sciences, Imperial College London, London, United Kingdom
| | - Yonghwa Choi
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
| | - Sanghoon Lee
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
| | - Junseok Choe
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
| | - Inggeol Lee
- Department of Interdisciplinary Program in Bioinformatics, College of Informatics, Korea University, Seoul, Republic of Korea
| | - Sunkyu Kim
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
| | - Jaewoo Kang
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Department of Interdisciplinary Program in Bioinformatics, College of Informatics, Korea University, Seoul, Republic of Korea
| | - Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
| | - Justin Guinney
- Sage Bionetworks, Seattle, WA, United States
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
| |
Collapse
|
23
|
Riley S, Tam K, Tse WY, Connor A, Wei Y. An external validation of the Kidney Donor Risk Index in the UK transplant population in the presence of semi-competing events. Diagn Progn Res 2023; 7:20. [PMID: 37986130 PMCID: PMC10662562 DOI: 10.1186/s41512-023-00159-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/13/2023] [Accepted: 09/11/2023] [Indexed: 11/22/2023] Open
Abstract
BACKGROUND Transplantation represents the optimal treatment for many patients with end-stage kidney disease. When a donor kidney is available to a waitlisted patient, clinicians responsible for the care of the potential recipient must make the decision to accept or decline the offer based upon complex and variable information about the donor, the recipient and the transplant process. A clinical prediction model may be able to support clinicians in their decision-making. The Kidney Donor Risk Index (KDRI) was developed in the United States to predict graft failure following kidney transplantation. The survival process following transplantation consists of semi-competing events where death precludes graft failure, but not vice versa. METHODS We externally validated the KDRI in the UK kidney transplant population and assessed whether validation under a semi-competing risks framework impacted predictive performance. Additionally, we explored whether the KDRI requires updating. We included 20,035 adult recipients of first, deceased donor, single, kidney-only transplants between January 1, 2004, and December 31, 2018, collected by the UK Transplant Registry and held by NHS Blood and Transplant. The outcomes of interest were 1- and 5-year graft failure following transplantation. In light of the semi-competing events, recipient death was handled in two ways: censoring patients at the time of death, and modelling death as a competing event. Cox proportional hazards models were used to validate the KDRI when censoring graft failure by death, and cause-specific Cox models were used to account for death as a competing event. RESULTS The KDRI underestimated event probabilities for those at higher risk of graft failure. For 5-year graft failure, discrimination was poorer in the semi-competing risks model (0.625, 95% CI 0.611 to 0.640, vs 0.611, 95% CI 0.597 to 0.625), but predictions were more accurate (Brier score 0.117, 95% CI 0.112 to 0.121, vs 0.114, 95% CI 0.109 to 0.118). Calibration plots were similar regardless of whether death was modelled as a competing event or not. Updating the KDRI worsened calibration but marginally improved discrimination. CONCLUSIONS Predictive performance for 1-year graft failure was similar between death-censored and competing-event graft failure, but differences appeared when predicting 5-year graft failure. The updated index did not have superior performance and we conclude that updating the KDRI in its present form is not required.
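The competing-risk distinction can be illustrated with a short sketch that uses synthetic data and non-parametric estimators from the lifelines package, rather than the Cox and cause-specific Cox models used in the study: censoring deaths and taking 1 minus the Kaplan-Meier survival overstates the cumulative incidence of graft failure relative to the Aalen-Johansen estimate, which treats death as a competing event.

```python
import numpy as np
from lifelines import KaplanMeierFitter, AalenJohansenFitter

rng = np.random.default_rng(1)
n = 5000
t_graft = rng.exponential(12.0, n)        # latent time to graft failure (years)
t_death = rng.exponential(10.0, n)        # latent time to death with a functioning graft
t_admin = rng.uniform(4.0, 8.0, n)        # administrative censoring

time = np.minimum.reduce([t_graft, t_death, t_admin])
event = np.select([t_graft <= time, t_death <= time], [1, 2], default=0)

# Naive approach: censor deaths and estimate 1 - S(t) for graft failure.
kmf = KaplanMeierFitter().fit(time, event_observed=(event == 1))
naive_5y = 1 - kmf.predict(5.0)

# Competing-risks approach: Aalen-Johansen cumulative incidence of graft failure.
ajf = AalenJohansenFitter().fit(time, event, event_of_interest=1)
cif = ajf.cumulative_density_
cif_5y = float(cif[cif.index <= 5.0].iloc[-1, 0])

print(f"naive 1-KM at 5 years: {naive_5y:.3f}")
print(f"Aalen-Johansen CIF at 5 years: {cif_5y:.3f}")
```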
Collapse
Affiliation(s)
- Stephanie Riley
- Centre for Mathematical Sciences, School of Engineering, Computing and Mathematics, University of Plymouth, Plymouth, UK.
| | - Kimberly Tam
- School of Engineering, Computing and Mathematics, University of Plymouth, Plymouth, UK
| | - Wai-Yee Tse
- Department of Renal Medicine, South West Transplant Centre, University Hospitals Plymouth NHS Trust, Plymouth, UK
| | - Andrew Connor
- Department of Renal Medicine, South West Transplant Centre, University Hospitals Plymouth NHS Trust, Plymouth, UK
| | - Yinghui Wei
- Centre for Mathematical Sciences, School of Engineering, Computing and Mathematics, University of Plymouth, Plymouth, UK.
| |
Collapse
|
24
|
Bullock GS, Ward P, Impellizzeri FM, Kluzek S, Hughes T, Dhiman P, Riley RD, Collins GS. The Trade Secret Taboo: Open Science Methods are Required to Improve Prediction Models in Sports Medicine and Performance. Sports Med 2023; 53:1841-1849. [PMID: 37160562 DOI: 10.1007/s40279-023-01849-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/25/2023] [Indexed: 05/11/2023]
Abstract
Clinical prediction models in sports medicine that utilize regression or machine learning techniques have become more widely published, used, and disseminated. However, these models are typically characterized by poor methodology, incomplete reporting, and an inadequate evaluation of performance, leading to unreliable predictions and weak clinical utility within their intended sport population. Before implementation in practice, models require a thorough evaluation. Strong, replicable methods and transparent reporting allow practitioners and researchers to make independent judgments as to a model's validity, performance, clinical usefulness, and confidence that it will do no harm. However, this is not reflected in the sports medicine literature. As shown in a recent systematic review of models for predicting sports injury, most were characterized by poor methodology, incomplete reporting, and inadequate performance evaluation. Because of constraints imposed by data from individual teams, the development of accurate, reliable, and useful models is highly reliant on external validation. However, a barrier to collaboration is a desire to maintain a competitive advantage; a team's proprietary information is often perceived as high value, and so these 'trade secrets' are frequently guarded. These 'trade secrets' also apply to commercially available models, as developers are unwilling to share proprietary (and potentially profitable) development and validation information. In this Current Opinion, we: (1) argue that open science is essential for improving sport prediction models and (2) critically examine sport prediction models for open science practices.
Collapse
Affiliation(s)
- Garrett S Bullock
- Department of Orthopaedic Surgery and Rehabilitation, Wake Forest School of Medicine, 475 Vine St., Winston-Salem, NC, 27101, USA.
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC, USA.
- Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, UK.
| | | | - Franco M Impellizzeri
- School of Sport, Exercise, and Rehabilitation, University of Technology Sydney, Sydney, NSW, Australia
| | - Stefan Kluzek
- Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, UK
- Sports Medicine Research Department, University of Nottingham, Nottingham, UK
- English Institute of Sport, Bisham Abbey, UK
| | - Tom Hughes
- Manchester United Football Club, Manchester, UK
- Department of Health Professions, Manchester Metropolitan University, Manchester, UK
| | - Paula Dhiman
- Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, UK
- Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Richard D Riley
- Centre for Prognosis Research, School of Medicine, Keele University, Keele, UK
| | - Gary S Collins
- Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, UK
- Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| |
Collapse
|
25
|
Bottani S, Burgos N, Maire A, Saracino D, Ströer S, Dormont D, Colliot O. Evaluation of MRI-based machine learning approaches for computer-aided diagnosis of dementia in a clinical data warehouse. Med Image Anal 2023; 89:102903. [PMID: 37523918 DOI: 10.1016/j.media.2023.102903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2022] [Revised: 06/01/2023] [Accepted: 07/12/2023] [Indexed: 08/02/2023]
Abstract
A variety of algorithms have been proposed for computer-aided diagnosis of dementia from anatomical brain MRI. These approaches achieve high accuracy when applied to research data sets, but their performance on real-life clinical routine data has not yet been evaluated. The aim of this work was to study the performance of such approaches on clinical routine data, based on a hospital data warehouse, and to compare the results to those obtained on a research data set. The clinical data set was extracted from the hospital data warehouse of the Greater Paris area, which includes 39 different hospitals. The research set was composed of data from the Alzheimer's Disease Neuroimaging Initiative data set. In the clinical set, the population of interest was identified by exploiting the diagnostic codes from the 10th revision of the International Classification of Diseases that are assigned to each patient. We studied how the imbalance of the training sets, in terms of contrast agent injection and image quality, may bias the results. We demonstrated that computer-aided diagnosis performance was strongly biased upwards (by over 17 percentage points of balanced accuracy) by the confounders of image quality and contrast agent injection, a phenomenon known as the Clever Hans effect or shortcut learning. When these biases were removed, the performance was very poor. In any case, the performance was considerably lower than on the research data set. Our study highlights that there are still considerable challenges in translating dementia computer-aided diagnosis systems to clinical routine.
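A toy sketch of the shortcut-learning problem described above (entirely synthetic, not the paper's imaging pipeline): a classifier that keys on a confounder such as contrast-agent injection can look accurate overall while collapsing toward chance within confounder strata.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(13)
n = 4000
contrast = rng.binomial(1, 0.5, n)                       # confounder: contrast injection
# Diagnosis correlated with the confounder (e.g. demented patients imaged with contrast more often).
dementia = rng.binomial(1, np.where(contrast == 1, 0.75, 0.25))
# A "Clever Hans" classifier that mostly keys on the confounder, not the disease.
pred = np.where(rng.random(n) < 0.9, contrast, rng.binomial(1, 0.5, n))

print("overall balanced accuracy:",
      round(balanced_accuracy_score(dementia, pred), 3))
# Within each confounder stratum the apparent skill largely disappears.
for c in (0, 1):
    mask = contrast == c
    print(f"within contrast={c}: ",
          round(balanced_accuracy_score(dementia[mask], pred[mask]), 3))
```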
Collapse
Affiliation(s)
- Simona Bottani
- Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, Paris, 75013, France
| | - Ninon Burgos
- Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, Paris, 75013, France
| | | | - Dario Saracino
- Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, Paris, 75013, France; IM2A, Reference Centre for Rare or Early-Onset Dementias, Département de Neurologie, AP-HP, Hôpital de la Pitié Salpêtrière, Paris, 75013, France
| | - Sebastian Ströer
- AP-HP, Hôpital de la Pitié Salpêtrière, Department of Neuroradiology, Paris, 75013, France
| | - Didier Dormont
- AP-HP, Hôpital de la Pitié Salpêtrière, Department of Neuroradiology, Paris, 75013, France; Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, DMU DIAMENT, Paris, 75013, France
| | - Olivier Colliot
- Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, Paris, 75013, France.
| |
Collapse
|
26
|
Vaid A, Sawant A, Suarez-Farinas M, Lee J, Kaul S, Kovatch P, Freeman R, Jiang J, Jayaraman P, Fayad Z, Argulian E, Lerakis S, Charney AW, Wang F, Levin M, Glicksberg B, Narula J, Hofer I, Singh K, Nadkarni GN. Implications of the Use of Artificial Intelligence Predictive Models in Health Care Settings : A Simulation Study. Ann Intern Med 2023; 176:1358-1369. [PMID: 37812781 DOI: 10.7326/m23-0949] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 10/11/2023] Open
Abstract
BACKGROUND Substantial effort has been directed toward demonstrating uses of predictive models in health care. However, implementation of these models into clinical practice may influence patient outcomes, which in turn are captured in electronic health record data. As a result, deployed models may affect the predictive ability of current and future models. OBJECTIVE To estimate changes in predictive model performance with use through 3 common scenarios: model retraining, sequentially implementing 1 model after another, and intervening in response to a model when 2 are simultaneously implemented. DESIGN Simulation of model implementation and use in critical care settings at various levels of intervention effectiveness and clinician adherence. Models were either trained or retrained after simulated implementation. SETTING Admissions to the intensive care unit (ICU) at Mount Sinai Health System (New York, New York) and Beth Israel Deaconess Medical Center (Boston, Massachusetts). PATIENTS 130 000 critical care admissions across both health systems. INTERVENTION Across 3 scenarios, interventions were simulated at varying levels of clinician adherence and effectiveness. MEASUREMENTS Statistical measures of performance, including threshold-independent (area under the curve) and threshold-dependent measures. RESULTS At a fixed 90% sensitivity, a mortality prediction model lost 9% to 39% specificity after a single retraining in scenario 1, and lost 8% to 15% specificity in scenario 2 when it was created after the implementation of an acute kidney injury (AKI) prediction model; in scenario 3, when models for AKI and mortality prediction were implemented simultaneously, each reduced the effective accuracy of the other by 1% to 28%. LIMITATIONS In real-world practice, the effectiveness of and adherence to model-based recommendations are rarely known in advance. Only binary classifiers for tabular ICU admissions data were simulated. CONCLUSION In simulated ICU settings, a universally effective model-updating approach for maintaining model performance does not seem to exist. Model use may have to be recorded to maintain the viability of predictive modeling. PRIMARY FUNDING SOURCE National Center for Advancing Translational Sciences.
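The core feedback mechanism can be sketched in a few lines (a simplified simulation under assumed adherence and effectiveness values, not the authors' ICU simulation): after deployment, an effective intervention averts some events among flagged patients, and a model retrained on those post-deployment labels can lose specificity at a fixed 90% sensitivity when judged against the untreated outcome process.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

rng = np.random.default_rng(7)

def make_cohort(n=20000):
    X = rng.normal(size=(n, 5))
    p = 1 / (1 + np.exp(-(X @ np.array([1.0, 0.8, -0.6, 0.4, 0.0]) - 2)))
    return X, rng.binomial(1, p)

def spec_at_90_sens(y_true, scores):
    fpr, tpr, _ = roc_curve(y_true, scores)
    return 1 - fpr[np.searchsorted(tpr, 0.90)]   # specificity at the first threshold reaching 90% sensitivity

# Train and evaluate the original model on pre-deployment data.
X0, y0 = make_cohort()
model = LogisticRegression(max_iter=1000).fit(X0, y0)
Xt, yt = make_cohort()                            # untreated evaluation cohort
print("specificity @90% sens, pre-deployment:",
      round(spec_at_90_sens(yt, model.predict_proba(Xt)[:, 1]), 3))

# Deployment: an intervention (assumed 70% adherence, 50% effectiveness) averts
# a fraction of events among the flagged top decile, changing observed labels.
X1, y1 = make_cohort()
risk = model.predict_proba(X1)[:, 1]
flagged = risk >= np.quantile(risk, 0.9)
averted = flagged & (rng.random(len(y1)) < 0.7 * 0.5)
y1_obs = np.where(averted, 0, y1)

# Retrain on post-deployment labels and re-evaluate on the untreated cohort.
retrained = LogisticRegression(max_iter=1000).fit(X1, y1_obs)
print("specificity @90% sens, after retraining on post-deployment data:",
      round(spec_at_90_sens(yt, retrained.predict_proba(Xt)[:, 1]), 3))
```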
Collapse
Affiliation(s)
- Akhil Vaid
- Division of Data-Driven and Digital Medicine, Department of Medicine, and The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York (A.V., P.J.)
| | - Ashwin Sawant
- Division of Data-Driven and Digital Medicine, Department of Medicine; The Charles Bronfman Institute of Personalized Medicine; and Division of Hospital Medicine, Icahn School of Medicine at Mount Sinai, New York, New York (A.S.)
| | - Mayte Suarez-Farinas
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York (M.S., J.L.)
| | - Juhee Lee
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York (M.S., J.L.)
| | - Sanjeev Kaul
- Department of Surgery, Hackensack Meridian School of Medicine, Nutley, New Jersey (S.K.)
| | - Patricia Kovatch
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York (P.K., B.G.)
| | - Robert Freeman
- Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York (R.F.)
| | - Joy Jiang
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York (J.J.)
| | - Pushkala Jayaraman
- Division of Data-Driven and Digital Medicine, Department of Medicine, and The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York (A.V., P.J.)
| | - Zahi Fayad
- BioMedical Engineering and Imaging Institute and Mount Sinai Heart, Icahn School of Medicine at Mount Sinai, New York, New York (Z.F.)
| | - Edgar Argulian
- Mount Sinai Heart, Icahn School of Medicine at Mount Sinai, New York, New York (E.A., S.L., J.N.)
| | - Stamatios Lerakis
- Mount Sinai Heart, Icahn School of Medicine at Mount Sinai, New York, New York (E.A., S.L., J.N.)
| | - Alexander W Charney
- The Charles Bronfman Institute of Personalized Medicine and Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, and Department of Surgery, Hackensack Meridian School of Medicine, Nutley, New Jersey (A.W.C.)
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, New York, New York (F.W.)
| | - Matthew Levin
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, New York (M.L.)
| | - Benjamin Glicksberg
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York (P.K., B.G.)
| | - Jagat Narula
- Mount Sinai Heart, Icahn School of Medicine at Mount Sinai, New York, New York (E.A., S.L., J.N.)
| | - Ira Hofer
- Division of Data-Driven and Digital Medicine, Department of Medicine; The Charles Bronfman Institute of Personalized Medicine; and Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York (I.H.)
| | - Karandeep Singh
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan (K.S.)
| | - Girish N Nadkarni
- Division of Data-Driven and Digital Medicine, Department of Medicine; The Charles Bronfman Institute of Personalized Medicine; and Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York (G.N.N.)
| |
Collapse
|
27
|
Wu L, Li Y, Zhang X, Chen X, Li D, Nie S, Li X, Bellou A. Prediction differences and implications of acute kidney injury with and without urine output criteria in adult critically ill patients. Nephrol Dial Transplant 2023; 38:2368-2378. [PMID: 37019835 PMCID: PMC10539235 DOI: 10.1093/ndt/gfad065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2022] [Indexed: 04/07/2023] Open
Abstract
BACKGROUND Due to the convenience of serum creatinine (SCr) monitoring and the relative complexity of urine output (UO) monitoring, most studies have predicted acute kidney injury (AKI) based only on SCr criteria. This study aimed to compare the differences between SCr criteria alone and combined SCr and UO criteria in predicting AKI. METHODS We applied machine learning methods to evaluate the performance of 13 prediction models composed of different feature categories on 16 risk assessment tasks (half used only SCr criteria, half used both SCr and UO criteria). The area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC) and calibration were used to assess prediction performance. RESULTS In the first week after ICU admission, the prevalence of any AKI was 29% under SCr criteria alone and increased to 60% when the UO criteria were added. Adding the UO criteria to the SCr criteria identified significantly more patients with AKI. The predictive importance of feature types differed with and without the UO criteria. Using only laboratory data maintained predictive performance similar to the full-feature model under SCr criteria alone [e.g. for AKI within the 48-h time window after 1 day of ICU admission, AUROC (95% confidence interval) 0.83 (0.82, 0.84) vs 0.84 (0.83, 0.85)], but it was not sufficient when the UO criteria were added [corresponding AUROC (95% confidence interval) 0.75 (0.74, 0.76) vs 0.84 (0.83, 0.85)]. CONCLUSIONS This study found that SCr and UO measures should not be regarded as equivalent criteria for AKI staging, and it emphasizes the importance and necessity of UO criteria in AKI risk assessment.
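As a narrow illustration of what the UO criteria involve operationally (a hedged sketch with hypothetical column names, not the study's feature or labelling pipeline), the KDIGO stage 1 urine-output criterion of less than 0.5 mL/kg/h sustained for at least 6 consecutive hours can be flagged from hourly urine-output records with a rolling window.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
hours = pd.date_range("2022-01-01", periods=48, freq="h")
df = pd.DataFrame({
    "charttime": hours,
    "urine_ml": rng.gamma(shape=2.0, scale=30.0, size=len(hours)),  # synthetic hourly UO
})
weight_kg = 80.0  # assumed patient weight

df["uo_ml_per_kg_h"] = df["urine_ml"] / weight_kg
df["below_threshold"] = df["uo_ml_per_kg_h"] < 0.5

# Rolling 6-hour window: criterion met when all 6 consecutive hours are below 0.5 mL/kg/h.
df["uo_aki_stage1"] = (
    df["below_threshold"].rolling(window=6, min_periods=6).sum().eq(6)
)
print(df.loc[df["uo_aki_stage1"], ["charttime", "uo_ml_per_kg_h"]].head())
```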
Collapse
Affiliation(s)
- Lijuan Wu
- Institute of Sciences in Emergency Medicine, Department of Emergency Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
- Medical Research Institute, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
| | - Yanqin Li
- Division of Nephrology, Nanfang Hospital, Southern Medical University; National Clinical Research Center for Kidney Disease; State Key Laboratory of Organ Failure Research; Guangdong Provincial Institute of Nephrology; Guangdong Provincial Key Laboratory of Renal Failure Research, Guangzhou, China
| | - Xiangzhou Zhang
- Big Data Decision Institute, Jinan University, Guangzhou, China
| | - Xuanhui Chen
- Medical Big Data Center, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, Guangdong Province, China
| | - Deyang Li
- Institute of Sciences in Emergency Medicine, Department of Emergency Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
| | - Sheng Nie
- Division of Nephrology, Nanfang Hospital, Southern Medical University; National Clinical Research Center for Kidney Disease; State Key Laboratory of Organ Failure Research; Guangdong Provincial Institute of Nephrology; Guangdong Provincial Key Laboratory of Renal Failure Research, Guangzhou, China
| | - Xin Li
- Department of Emergency Medicine, Guangdong Provincial People's Hospital, (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, Guangdong, China
| | - Abdelouahab Bellou
- Institute of Sciences in Emergency Medicine, Department of Emergency Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
- Department of Emergency Medicine, Guangdong Provincial People's Hospital, (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, Guangdong, China
- Department of Emergency Medicine, Wayne State University School of Medicine, Detroit, MI, USA
- Global Network on Emergency Medicine, Brookline, MA, USA
| |
Collapse
|
28
|
Luther SL, Thomason SS, Sabharwal S, Finch DK, McCart J, Toyinbo P, Bouayad L, Lapcevic W, Hahm B, Hauser RG, Matheny ME, Powell-Cope G. Machine learning to develop a predictive model of pressure injury in persons with spinal cord injury. Spinal Cord 2023; 61:513-520. [PMID: 37598263 DOI: 10.1038/s41393-023-00924-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2022] [Revised: 07/18/2023] [Accepted: 07/24/2023] [Indexed: 08/21/2023]
Abstract
STUDY DESIGN A 5-year longitudinal, retrospective, cohort study. OBJECTIVES Develop a prediction model based on electronic health record (EHR) data to identify veterans with spinal cord injury/diseases (SCI/D) at highest risk for new pressure injuries (PIs). SETTING Structured (coded) and text EHR data for veterans with SCI/D treated in a VHA SCI/D Center between October 1, 2008, and September 30, 2013. METHODS A total of 4709 veterans were available for analysis after randomly selecting 175 to act as a validation (gold standard) sample. Machine learning models were created using ten-fold cross validation and three techniques: (1) two-step logistic regression; (2) a regression model employing the adaptive LASSO; and (3) gradient boosting. Models based on each method were compared using area under the receiver operating characteristic curve (AUC) analysis. RESULTS The AUC value for the gradient boosting model was 0.62 (95% CI = 0.54-0.70), for the logistic regression model it was 0.67 (95% CI = 0.59-0.75), and for the adaptive LASSO model it was 0.72 (95% CI = 0.65-0.80). Based on these results, the adaptive LASSO model was chosen for interpretation. The strongest predictors of new PI cases were fewer total days in the hospital in the year before the annual exam, higher vs. lower weight, and more severe vs. less severe grade of injury based on the American Spinal Injury Association (ASIA) Impairment Scale. CONCLUSIONS While the analyses resulted in a potentially useful predictive model, clinical implications were limited because modifiable risk factors were absent from the models.
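A sketch of the model-comparison step under stated assumptions (synthetic tabular data; the adaptive LASSO approximated by the usual feature-rescaling trick, with the initial weights computed on the full data for brevity rather than inside each fold): candidate models are compared by 10-fold cross-validated AUC.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

X, y = make_classification(n_samples=4000, n_features=30, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)

# Initial ridge-penalised fit provides the adaptive weights |beta|.
init = make_pipeline(StandardScaler(),
                     LogisticRegression(penalty="l2", C=1.0, max_iter=2000)).fit(X, y)
w = np.abs(init.named_steps["logisticregression"].coef_.ravel()) + 1e-6

models = {
    "adaptive LASSO (approx.)": make_pipeline(
        StandardScaler(),
        FunctionTransformer(lambda Z: Z * w),   # rescaling implements the adaptive penalty
        LogisticRegression(penalty="l1", solver="liblinear", C=0.5)),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, m in models.items():
    auc = cross_val_score(m, X, y, cv=10, scoring="roc_auc")
    print(f"{name}: AUC = {auc.mean():.3f} (sd {auc.std():.3f})")
```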
Collapse
Affiliation(s)
- Stephen L Luther
- Research Service, James A. Haley Veterans' Hospital, Tampa, FL, USA.
- College of Public Health, University of South Florida, Tampa, FL, USA.
| | | | - Sunil Sabharwal
- VA Boston Health Care System, Spinal Cord Injury Service, Harvard Medical School, Boston, MA, USA
- Department of Physical Medicine and Rehabilitation, Harvard Medical School, Boston, MA, USA
| | - Dezon K Finch
- Research Service, James A. Haley Veterans' Hospital, Tampa, FL, USA
| | - James McCart
- Research Service, James A. Haley Veterans' Hospital, Tampa, FL, USA
| | - Peter Toyinbo
- Research Service, James A. Haley Veterans' Hospital, Tampa, FL, USA
| | - Lina Bouayad
- College of Business, Florida International University, Miami, FL, USA
| | - William Lapcevic
- Research Service, James A. Haley Veterans' Hospital, Tampa, FL, USA
| | - Bridget Hahm
- Research Service, James A. Haley Veterans' Hospital, Tampa, FL, USA
| | | | - Michael E Matheny
- Geriatrics Research Education and Clinical Care, Tennessee Valley Healthcare System, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Division of General Internal Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Research & Development Service, Tennessee Valley Healthcare System, Nashville, TN College of Nursing, Nashville, TN, USA
| | | |
Collapse
|
29
|
Rahmani K, Thapa R, Tsou P, Casie Chetty S, Barnes G, Lam C, Foon Tso C. Assessing the effects of data drift on the performance of machine learning models used in clinical sepsis prediction. Int J Med Inform 2023; 173:104930. [PMID: 36893656 DOI: 10.1016/j.ijmedinf.2022.104930] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2022] [Revised: 10/30/2022] [Accepted: 11/15/2022] [Indexed: 11/21/2022]
Abstract
BACKGROUND Data drift can negatively impact the performance of machine learning algorithms (MLAs) that were trained on historical data. As such, MLAs should be continuously monitored and tuned to overcome the systematic changes that occur in the distribution of data. In this paper, we study the extent of data drift and provide insights about its characteristics for sepsis onset prediction. This study will help elucidate the nature of data drift for prediction of sepsis and similar diseases. This may aid with the development of more effective patient monitoring systems that can stratify risk for dynamic disease states in hospitals. METHODS We devise a series of simulations that measure the effects of data drift in patients with sepsis, using electronic health records (EHR). We simulate multiple scenarios in which data drift may occur, namely the change in the distribution of the predictor variables (covariate shift), the change in the statistical relationship between the predictors and the target (concept shift), and the occurrence of a major healthcare event (major event) such as the COVID-19 pandemic. We measure the impact of data drift on model performances, identify the circumstances that necessitate model retraining, and compare the effects of different retraining methodologies and model architecture on the outcomes. We present the results for two different MLAs, eXtreme Gradient Boosting (XGB) and Recurrent Neural Network (RNN). RESULTS Our results show that the properly retrained XGB models outperform the baseline models in all simulation scenarios, hence signifying the existence of data drift. In the major event scenario, the area under the receiver operating characteristic curve (AUROC) at the end of the simulation period is 0.811 for the baseline XGB model and 0.868 for the retrained XGB model. In the covariate shift scenario, the AUROC at the end of the simulation period for the baseline and retrained XGB models is 0.853 and 0.874 respectively. In the concept shift scenario and under the mixed labeling method, the retrained XGB models perform worse than the baseline model for most simulation steps. However, under the full relabeling method, the AUROC at the end of the simulation period for the baseline and retrained XGB models is 0.852 and 0.877 respectively. The results for the RNN models were mixed, suggesting that retraining based on a fixed network architecture may be inadequate for an RNN. We also present the results in the form of other performance metrics such as the ratio of observed to expected probabilities (calibration) and the normalized rate of positive predictive values (PPV) by prevalence, referred to as lift, at a sensitivity of 0.8. CONCLUSION Our simulations reveal that retraining periods of a couple of months or using several thousand patients are likely to be adequate to monitor machine learning models that predict sepsis. This indicates that a machine learning system for sepsis prediction will probably need less infrastructure for performance monitoring and retraining compared to other applications in which data drift is more frequent and continuous. Our results also show that in the event of a concept shift, a full overhaul of the sepsis prediction model may be necessary because it indicates a discrete change in the definition of sepsis labels, and mixing the labels for the sake of incremental training may not produce the desired results.
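A compact sketch of the covariate-shift scenario (synthetic data rather than the study's EHR; all parameter values are arbitrary assumptions): an XGBoost model trained on the historical distribution is compared with one retrained after the shift, both evaluated on post-drift data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

rng = np.random.default_rng(11)
beta = np.array([1.2, -0.8, 0.5, 0.0, 0.3])

def cohort(n, mean_shift=0.0):
    # Covariate shift: the feature distribution moves, the outcome model does not.
    X = rng.normal(loc=mean_shift, scale=1.0, size=(n, 5))
    p = 1 / (1 + np.exp(-(X @ beta - 1.5)))
    return X, rng.binomial(1, p)

X_hist, y_hist = cohort(20000, mean_shift=0.0)   # pre-drift training data
X_new, y_new = cohort(20000, mean_shift=0.8)     # post-drift data for retraining
X_test, y_test = cohort(10000, mean_shift=0.8)   # post-drift evaluation data

baseline = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1,
                         eval_metric="logloss").fit(X_hist, y_hist)
retrained = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1,
                          eval_metric="logloss").fit(
    np.vstack([X_hist, X_new]), np.concatenate([y_hist, y_new]))

for name, m in [("baseline", baseline), ("retrained", retrained)]:
    auroc = roc_auc_score(y_test, m.predict_proba(X_test)[:, 1])
    print(f"{name} AUROC on post-drift data: {auroc:.3f}")
```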
Collapse
Affiliation(s)
- Keyvan Rahmani
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
| | - Rahul Thapa
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
| | - Peiling Tsou
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
| | - Satish Casie Chetty
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA.
| | - Gina Barnes
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
| | - Carson Lam
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
| | - Chak Foon Tso
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
| |
Collapse
|
30
|
Lam G, Rish I, Dixon PC. Estimating individual minimum calibration for deep-learning with predictive performance recovery: An example case of gait surface classification from wearable sensor gait data. J Biomech 2023; 154:111606. [PMID: 37187130 DOI: 10.1016/j.jbiomech.2023.111606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2023] [Revised: 04/19/2023] [Accepted: 04/26/2023] [Indexed: 05/17/2023]
Abstract
Clinical datasets often comprise multiple data points or trials sampled from a single participant. When these datasets are used to train machine learning models, the method used to extract train and test sets must be carefully chosen. Using the standard machine learning approach (random-wise split), different trials from the same participant may appear in both training and test sets. This has led to schemes capable of segregating data points from a same participant into a single set (subject-wise split). Past investigations have demonstrated that models trained in this manner underperform compared to those trained using random-split schemes. Additional training of models via a small subset of trials, known as calibration, bridges the gap in performance across split schemes; however, the amount of calibration trials required to achieve strong model performance is unclear. Thus, this study aims to investigate the relationship between calibration training set size and prediction accuracy on the calibration test set. A database of 30 young, healthy adults performing multiple walking trials across nine different surfaces while fit with inertial measurement unit sensors on the lower limbs was used to develop a deep-learning classifier. For subject-wise trained models, calibration on a single gait cycle per surface yielded a 70% increase in F1-score, the harmonic mean of precision and recall, while 10 gait cycles per surface were sufficient to match the performance of a random-wise trained model. Code to generate calibration curves may be found at (https://github.com/GuillaumeLam/PaCalC).
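The split-scheme issue generalises beyond gait data and is easy to demonstrate (synthetic features below, not the PaCalC dataset): when each participant has a consistent, subject-specific signature, a record-wise split lets the classifier exploit it, while a subject-wise split via GroupKFold does not. The calibration step itself is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(5)
n_subjects, n_surfaces, trials, n_feat = 30, 9, 30, 20
subjects = np.repeat(np.arange(n_subjects), trials)
surface = rng.integers(0, n_surfaces, size=subjects.size)

shared = rng.normal(size=(n_surfaces, n_feat))                # surface effect shared by everyone
personal = rng.normal(size=(n_subjects, n_surfaces, n_feat))  # subject-specific surface signature
X = (0.5 * shared[surface]
     + personal[subjects, surface]
     + 0.3 * rng.normal(size=(subjects.size, n_feat)))

clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc_record = cross_val_score(clf, X, surface,
                             cv=KFold(5, shuffle=True, random_state=0))
acc_subject = cross_val_score(clf, X, surface,
                              cv=GroupKFold(5), groups=subjects)
print(f"record-wise CV accuracy : {acc_record.mean():.3f}")
print(f"subject-wise CV accuracy: {acc_subject.mean():.3f}")
```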
Collapse
Affiliation(s)
- Guillaume Lam
- Department of Computer Science and Operations Research, Université de Montréal, Canada.
| | - Irina Rish
- Department of Computer Science and Operations Research, Université de Montréal, Canada; Mila - Quebec AI Institute, Université de Montréal, Canada
| | - Philippe C Dixon
- School of Kinesiology and Physical Activity Sciences, Faculty of Medicine, Université de Montréal, Canada; Research Center of the Sainte-Justine University Hospital (CRCHUSJ), Canada; Institute of Biomedical Engineering, Faculty of medicine, Université de Montréal, Canada
| |
Collapse
|
31
|
Andonov DI, Ulm B, Graessner M, Podtschaske A, Blobner M, Jungwirth B, Kagerbauer SM. Impact of the Covid-19 pandemic on the performance of machine learning algorithms for predicting perioperative mortality. BMC Med Inform Decis Mak 2023; 23:67. [PMID: 37046259 PMCID: PMC10092913 DOI: 10.1186/s12911-023-02151-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2022] [Accepted: 03/15/2023] [Indexed: 04/14/2023] Open
Abstract
BACKGROUND Machine-learning models are susceptible to external influences which can result in performance deterioration. The aim of our study was to elucidate the impact of a sudden shift in covariates, like the one caused by the Covid-19 pandemic, on model performance. METHODS After ethical approval and registration in Clinical Trials (NCT04092933, initial release 17/09/2019), we developed different models for the prediction of perioperative mortality based on preoperative data: one for the pre-pandemic data period until March 2020, one including data before the pandemic and from the first wave until May 2020, and one that covers the complete period before and during the pandemic until October 2021. We applied XGBoost as well as a Deep Learning neural network (DL). Performance metrics of each model during the different pandemic phases were determined, and XGBoost models were analysed for changes in feature importance. RESULTS XGBoost and DL provided similar performance on the pre-pandemic data with respect to the area under the receiver operating characteristic curve (AUROC, 0.951 vs. 0.942) and the area under the precision-recall curve (AUPR, 0.144 vs. 0.187). Validation in patient cohorts of the different pandemic waves showed large fluctuations in both AUROC and AUPR for DL, whereas the XGBoost models seemed more stable. Changes in variable frequencies with the onset of the pandemic were visible in age, ASA score, and a higher proportion of emergency operations, among others. Age consistently showed the highest information gain. Models based on pre-pandemic data performed worse during the first pandemic wave (AUROC 0.914 for XGBoost and DL), whereas models augmented with data from the first wave performed poorly after the first wave (AUROC 0.907 for XGBoost and 0.747 for DL). The deterioration was also visible in AUPR, which worsened by over 50% in both XGBoost and DL in the first phase after re-training. CONCLUSIONS A sudden shift in data impacts model performance. Re-training the model with updated data may cause degradation in predictive accuracy if the changes are only transient. Re-training too early should therefore be avoided, and close model surveillance is necessary.
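A small sketch of the feature-importance check (synthetic perioperative-style variables with assumed names and effects, not the study's data): gain-based importances from XGBoost models trained on pre- and post-shift cohorts can be compared to see whether the drivers of prediction change.

```python
import numpy as np
import pandas as pd
from xgboost import XGBClassifier

rng = np.random.default_rng(2024)
cols = ["age", "asa_score", "emergency", "duration", "lab_value"]

def cohort(n, emergency_rate):
    X = pd.DataFrame({
        "age": rng.normal(60, 15, n),
        "asa_score": rng.integers(1, 5, n),
        "emergency": rng.binomial(1, emergency_rate, n),
        "duration": rng.gamma(2.0, 60.0, n),
        "lab_value": rng.normal(0, 1, n),
    })
    logit = 0.05 * (X["age"] - 60) + 0.8 * (X["asa_score"] - 2) + 1.2 * X["emergency"] - 5
    y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
    return X, y

for label, emergency_rate in [("pre-shift", 0.10), ("post-shift", 0.35)]:
    X, y = cohort(30000, emergency_rate)
    model = XGBClassifier(n_estimators=300, max_depth=3, importance_type="gain",
                          eval_metric="logloss").fit(X, y)
    imp = pd.Series(model.feature_importances_, index=cols).sort_values(ascending=False)
    print(f"{label} gain-based importance:\n{imp.round(3)}\n")
```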
Collapse
Affiliation(s)
- D I Andonov
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany
| | - B Ulm
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany
| | - M Graessner
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany
| | - A Podtschaske
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany
| | - M Blobner
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany
| | - B Jungwirth
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany
| | - S M Kagerbauer
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany.
- Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany.
| |
Collapse
|
32
|
Tohidinezhad F, Bontempi D, Zhang Z, Dingemans AM, Aerts J, Bootsma G, Vansteenkiste J, Hashemi S, Smit E, Gietema H, Aerts HJ, Dekker A, Hendriks LEL, Traverso A, De Ruysscher D. Computed tomography-based radiomics for the differential diagnosis of pneumonitis in stage IV non-small cell lung cancer patients treated with immune checkpoint inhibitors. Eur J Cancer 2023; 183:142-151. [PMID: 36857819 DOI: 10.1016/j.ejca.2023.01.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Revised: 01/29/2023] [Accepted: 01/29/2023] [Indexed: 02/11/2023]
Abstract
INTRODUCTION Immunotherapy-induced pneumonitis (IIP) is a serious side-effect which requires accurate diagnosis and management with high-dose corticosteroids. The differential diagnosis between IIP and other types of pneumonitis (OTP) remains challenging due to similar radiological patterns. This study aimed to develop a prediction model to differentiate IIP from OTP in patients with stage IV non-small cell lung cancer (NSCLC) who developed pneumonitis during immunotherapy. METHODS Consecutive patients with metastatic NSCLC treated with immunotherapy in six centres in the Netherlands and Belgium from 2017 to 2020 were reviewed and cause-specific pneumonitis events were identified. Seven regions of interest (segmented lungs and spheroidal/cubical regions surrounding the inflammation) were examined to extract the most predictive radiomic features from the chest computed tomography images obtained at pneumonitis manifestation. Models were internally tested regarding discrimination, calibration and clinical utility. To evaluate the clinical application of the models, predicted labels were compared with the separate clinical and radiological judgements. RESULTS A total of 556 patients were reviewed; 31 patients (5.6%) developed IIP and 41 patients (7.4%) developed OTP. The line of immunotherapy was the only predictive factor in the clinical model (odds ratio for second vs first line 0.08, 95% confidence interval: 0.01-0.77). The best radiomic model was achieved using a 75-mm spheroidal region of interest, which showed an optimism-corrected area under the receiver operating characteristic curve of 0.83 (95% confidence interval: 0.77-0.95) with negative and positive predictive values of 80% and 79%, respectively. Good calibration and net benefit were achieved for the radiomic model across the entire range of probabilities. A correct diagnosis was provided by the radiomic model in 10 out of 12 cases with non-conclusive radiological judgements. CONCLUSION Radiomic biomarkers applied to computed tomography imaging may support clinicians making the differential diagnosis of pneumonitis in patients with NSCLC receiving immunotherapy, especially when the radiologic assessment is non-conclusive.
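The optimism correction reported above is typically obtained by bootstrap resampling; the sketch below shows the generic procedure on synthetic data, with a plain logistic regression standing in for the radiomic model: refit on bootstrap resamples, measure the average gap between bootstrap-sample and original-sample AUROC, and subtract it from the apparent AUROC.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           weights=[0.6, 0.4], random_state=0)
rng = np.random.default_rng(0)

def fit_auc(Xtr, ytr, Xev, yev):
    m = LogisticRegression(max_iter=2000).fit(Xtr, ytr)
    return roc_auc_score(yev, m.predict_proba(Xev)[:, 1])

apparent = fit_auc(X, y, X, y)                  # apparent (resubstitution) AUROC
optimism = []
for _ in range(200):
    idx = rng.integers(0, len(y), len(y))       # bootstrap resample with replacement
    boot_auc = fit_auc(X[idx], y[idx], X[idx], y[idx])
    test_auc = fit_auc(X[idx], y[idx], X, y)    # bootstrap model evaluated on original data
    optimism.append(boot_auc - test_auc)

print(f"apparent AUROC           : {apparent:.3f}")
print(f"optimism-corrected AUROC : {apparent - np.mean(optimism):.3f}")
```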
Collapse
Affiliation(s)
- Fariba Tohidinezhad
- Department of Radiation Oncology (Maastro Clinic), School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Dennis Bontempi
- Department of Radiation Oncology (Maastro Clinic), School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands; Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA; Department of Radiology and Nuclear Medicine, Maastricht University Medical Center, Maastricht, the Netherlands
| | - Zhen Zhang
- Department of Radiation Oncology (Maastro Clinic), School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Anne-Marie Dingemans
- Department of Pulmonary Diseases, School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Joachim Aerts
- Department of Pulmonary Medicine, School of Medicine, Erasmus University Medical Center, Rotterdam, the Netherlands
| | - Gerben Bootsma
- Department of Pulmonary Diseases, Zuyderland Hospital, Heerlen, the Netherlands
| | - Johan Vansteenkiste
- Department of Respiratory Oncology, University Hospital KU Leuven, Leuven, Belgium
| | - Sayed Hashemi
- Department of Pulmonary Medicine, Amsterdam UMC, VU University Medical Center, Amsterdam, the Netherlands
| | - Egbert Smit
- Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - Hester Gietema
- Department of Radiology and Nuclear Medicine, Maastricht University Medical Center, Maastricht, the Netherlands
| | - Hugo Jwl Aerts
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA; Department of Radiology and Nuclear Medicine, Maastricht University Medical Center, Maastricht, the Netherlands; Departments of Radiation Oncology and Radiology, Brigham and Women's Hospital, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
| | - Andre Dekker
- Department of Radiation Oncology (Maastro Clinic), School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Lizza E L Hendriks
- Department of Pulmonary Diseases, School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Alberto Traverso
- Department of Radiation Oncology (Maastro Clinic), School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands
| | - Dirk De Ruysscher
- Department of Radiation Oncology (Maastro Clinic), School for Oncology and Reproduction (GROW), Maastricht University Medical Center, Maastricht, the Netherlands.
| |
Collapse
|
33
|
Yu X, Wu R, Ji Y, Feng Z. Bibliometric and visual analysis of machine learning-based research in acute kidney injury worldwide. Front Public Health 2023; 11:1136939. [PMID: 37006534 PMCID: PMC10063840 DOI: 10.3389/fpubh.2023.1136939] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2023] [Accepted: 03/01/2023] [Indexed: 03/19/2023] Open
Abstract
Background Acute kidney injury (AKI) is a serious clinical complication associated with adverse short-term and long-term outcomes. In recent years, with the rapid uptake of electronic health records and artificial intelligence and machine learning technology, the detection rate and treatment of AKI have greatly improved. Many studies have now been published in this field, but little is known about the quality of this research output or about the focus and trends of current work. Methods Based on the Web of Science Core Collection, studies reporting machine learning-based AKI research published from 2013 to 2022 were retrieved and collected after manual review. VOSviewer and other software were used for bibliometric visualization analysis, including publication trends, geographical distribution characteristics, journal distribution characteristics, author contributions, citations, funding source characteristics, and keyword clustering. Results A total of 336 documents were analyzed. Since 2018, publications and citations have increased dramatically, with the United States (143) and China (101) as the main contributors. Regarding authors, Bihorac A and Ozrazgat-Baslanti T from the Kansas City Medical Center have published 10 articles. Regarding institutions, the University of California (18) had the most publications. Approximately one-third of the publications appeared in Q1 and Q2 journals, of which Scientific Reports (19) was the most prolific. The study by Tomašev et al. published in 2019 has been widely cited by researchers. Cluster analysis of co-occurring keywords suggests that the construction of AKI prediction models for critically ill patients and patients with sepsis is the research frontier, and that the XGBoost algorithm is also popular. Conclusion This study provides an updated perspective on machine learning-based AKI research, which may help subsequent researchers choose suitable journals and collaborators and may provide a more convenient and in-depth understanding of the field's research basis, hotspots and frontiers.
Collapse
Affiliation(s)
- Xiang Yu
- State Key Laboratory of Kidney Diseases, Department of Nephrology, Chinese People's Liberation Army General Hospital, Chinese People's Liberation Army Institute of Nephrology, National Clinical Research Center of Kidney Diseases, Beijing, China
| | - RiLiGe Wu
- Medical Big Data Research Center, Chinese People's Liberation Army General Hospital, Beijing, China
| | - YuWei Ji
- State Key Laboratory of Kidney Diseases, Department of Nephrology, Chinese People's Liberation Army General Hospital, Chinese People's Liberation Army Institute of Nephrology, National Clinical Research Center of Kidney Diseases, Beijing, China
| | - Zhe Feng
- State Key Laboratory of Kidney Diseases, Department of Nephrology, Chinese People's Liberation Army General Hospital, Chinese People's Liberation Army Institute of Nephrology, National Clinical Research Center of Kidney Diseases, Beijing, China
| |
Collapse
|
34
|
Guo LL, Steinberg E, Fleming SL, Posada J, Lemmon J, Pfohl SR, Shah N, Fries J, Sung L. EHR foundation models improve robustness in the presence of temporal distribution shift. Sci Rep 2023; 13:3767. [PMID: 36882576 PMCID: PMC9992466 DOI: 10.1038/s41598-023-30820-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2022] [Accepted: 03/02/2023] [Indexed: 03/09/2023] Open
Abstract
Temporal distribution shift negatively impacts the performance of clinical prediction models over time. Pretraining foundation models using self-supervised learning on electronic health records (EHR) may be effective in acquiring informative global patterns that can improve the robustness of task-specific models. The objective was to evaluate the utility of EHR foundation models in improving the in-distribution (ID) and out-of-distribution (OOD) performance of clinical prediction models. Transformer- and gated recurrent unit-based foundation models were pretrained on EHR of up to 1.8 M patients (382 M coded events) collected within pre-determined year groups (e.g., 2009-2012) and were subsequently used to construct patient representations for patients admitted to inpatient units. These representations were used to train logistic regression models to predict hospital mortality, long length of stay, 30-day readmission, and ICU admission. We compared our EHR foundation models with baseline logistic regression models learned on count-based representations (count-LR) in ID and OOD year groups. Performance was measured using area-under-the-receiver-operating-characteristic curve (AUROC), area-under-the-precision-recall curve, and absolute calibration error. Both transformer and recurrent-based foundation models generally showed better ID and OOD discrimination relative to count-LR and often exhibited less decay in tasks where there is observable degradation of discrimination performance (average AUROC decay of 3% for transformer-based foundation model vs. 7% for count-LR after 5-9 years). In addition, the performance and robustness of transformer-based foundation models continued to improve as pretraining set size increased. These results suggest that pretraining EHR foundation models at scale is a useful approach for developing clinical prediction models that perform well in the presence of temporal distribution shift.
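Illustrative only (not the authors' code): a minimal sketch of the in-distribution (ID) vs. out-of-distribution (OOD) temporal evaluation pattern described above, in which a model trained on earlier year groups is scored on held-out data from the same years and from later, shifted years. The feature counts, year-group simulation, and drift mechanism are invented assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_year_group(n, shift=0):
    """Simulate count-style EHR features; `shift` rotates which features drive risk."""
    X = rng.poisson(lam=3.0, size=(n, 20)).astype(float)
    w = np.roll(np.linspace(0.6, -0.6, 20), shift)   # later years: different risk drivers
    p = 1 / (1 + np.exp(-(X @ w - 1.0)))
    return X, rng.binomial(1, p)

X_tr, y_tr = make_year_group(5000, shift=0)          # e.g. development-era cohort
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

for label, shift in [("ID (same years)", 0), ("OOD (later years)", 7)]:
    X_te, y_te = make_year_group(2000, shift=shift)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{label}: AUROC = {auc:.3f}")
```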
Collapse
Affiliation(s)
- Lin Lawrence Guo
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
| | - Ethan Steinberg
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
| | - Scott Lanyon Fleming
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
| | - Jose Posada
- Universidad del Norte, Barranquilla, Colombia
| | - Joshua Lemmon
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
| | - Stephen R Pfohl
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
| | - Nigam Shah
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
| | - Jason Fries
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
| | - Lillian Sung
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada; Division of Haematology/Oncology, The Hospital for Sick Children, 555 University Avenue, Toronto, ON, M5G1X8, Canada.
| |
Collapse
|
35
|
Van Calster B, Steyerberg EW, Wynants L, van Smeden M. There is no such thing as a validated prediction model. BMC Med 2023.
Abstract
BACKGROUND Clinical prediction models should be validated before implementation in clinical practice. But is favorable performance at internal validation or one external validation sufficient to claim that a prediction model works well in the intended clinical context? MAIN BODY We argue to the contrary because (1) patient populations vary, (2) measurement procedures vary, and (3) populations and measurements change over time. Hence, we have to expect heterogeneity in model performance between locations and settings, and across time. It follows that prediction models are never truly validated. This does not imply that validation is not important. Rather, the current focus on developing new models should shift to a focus on more extensive, well-conducted, and well-reported validation studies of promising models. CONCLUSION Principled validation strategies are needed to understand and quantify heterogeneity, monitor performance over time, and update prediction models when appropriate. Such strategies will help to ensure that prediction models stay up-to-date and safe to support clinical decision-making.
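Illustrative only: one concrete form of the model updating the authors call for is logistic recalibration, refitting an existing model's intercept and calibration slope on local data. The linear predictors and local cohort below are simulated assumptions, not taken from any cited study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Pretend these are linear predictors (log-odds) from a previously developed model,
# applied to a new local cohort where the model is miscalibrated.
lp = rng.normal(-2.0, 1.0, size=5000)
true_p = 1 / (1 + np.exp(-(0.7 * lp - 0.5)))        # true risks differ in slope and intercept
y = rng.binomial(1, true_p)

# Fit y ~ a + b * lp to obtain the recalibration intercept (a) and slope (b).
recal = LogisticRegression(C=1e6, max_iter=1000).fit(lp.reshape(-1, 1), y)
a, b = recal.intercept_[0], recal.coef_[0][0]
print(f"recalibration intercept = {a:.2f}, slope = {b:.2f}")

# Updated predicted risk for a new patient with linear predictor lp_new:
lp_new = -1.2
p_updated = 1 / (1 + np.exp(-(a + b * lp_new)))
print(f"updated risk = {p_updated:.3f}")
```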
Collapse
|
36
|
Debray TPA, Collins GS, Riley RD, Snell KIE, Van Calster B, Reitsma JB, Moons KGM. Transparent reporting of multivariable prediction models developed or validated using clustered data (TRIPOD-Cluster): explanation and elaboration. BMJ 2023; 380:e071058. [PMID: 36750236 PMCID: PMC9903176 DOI: 10.1136/bmj-2022-071058] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 10/07/2022] [Indexed: 02/09/2023]
Affiliation(s)
- Thomas P A Debray
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
| | - Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Oxford, UK
- National Institute for Health and Care Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford, UK
| | - Richard D Riley
- Centre for Prognosis Research, School of Medicine, Keele University, Keele, UK
| | - Kym I E Snell
- Centre for Prognosis Research, School of Medicine, Keele University, Keele, UK
| | - Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- EPI-centre, KU Leuven, Leuven, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands
| | - Johannes B Reitsma
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
| | - Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
| |
Collapse
|
37
|
Parikh RB, Zhang Y, Kolla L, Chivers C, Courtright KR, Zhu J, Navathe AS, Chen J. Performance drift in a mortality prediction algorithm among patients with cancer during the SARS-CoV-2 pandemic. J Am Med Inform Assoc 2023; 30:348-354. [PMID: 36409991 PMCID: PMC9846686 DOI: 10.1093/jamia/ocac221] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2022] [Revised: 10/28/2022] [Accepted: 11/03/2022] [Indexed: 11/22/2022] Open
Abstract
Sudden changes in health care utilization during the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic may have impacted the performance of clinical predictive models that were trained prior to the pandemic. In this study, we evaluated the performance over time of a machine learning, electronic health record-based mortality prediction algorithm currently used in clinical practice to identify patients with cancer who may benefit from early advance care planning conversations. We show that during the pandemic period, algorithm identification of high-risk patients had a substantial and sustained decline. Decreases in laboratory utilization during the peak of the pandemic may have contributed to drift. Calibration and overall discrimination did not markedly decline during the pandemic. This argues for careful attention to the performance and retraining of predictive algorithms that use inputs from the pandemic period.
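Illustrative only: a minimal monitoring sketch in the spirit of the drift analysis above, tracking discrimination and the fraction of patients flagged as high risk per calendar quarter. The scores, outcomes, quarters, and 0.4 alert threshold are invented.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 8000
df = pd.DataFrame({
    "quarter": rng.choice(["2019Q4", "2020Q1", "2020Q2", "2020Q3"], size=n),
    "score": rng.uniform(0, 1, size=n),
})
# Simulate outcomes whose link to the score weakens in mid-2020 (drift).
weak = df["quarter"].isin(["2020Q2", "2020Q3"]).to_numpy()
p = np.where(weak, 0.3 * df["score"] + 0.1, 0.7 * df["score"])
df["death_180d"] = rng.binomial(1, p)

for quarter, grp in df.groupby("quarter"):
    auc = roc_auc_score(grp["death_180d"], grp["score"])
    flagged = (grp["score"] >= 0.4).mean()
    print(f"{quarter}: AUROC = {auc:.2f}, flagged as high risk = {flagged:.1%}")
```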
Collapse
Affiliation(s)
- Ravi B Parikh
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Medical Ethics and Health Policy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, Pennsylvania, USA
| | - Yichen Zhang
- Department of Medical Ethics and Health Policy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Likhitha Kolla
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Corey Chivers
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Katherine R Courtright
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Jingsan Zhu
- Department of Medical Ethics and Health Policy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Amol S Navathe
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Medical Ethics and Health Policy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, Pennsylvania, USA
| | - Jinbo Chen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| |
Collapse
|
38
|
Azimi V, Zaydman MA. Optimizing Equity: Working towards Fair Machine Learning Algorithms in Laboratory Medicine. J Appl Lab Med 2023; 8:113-128. [PMID: 36610413 DOI: 10.1093/jalm/jfac085] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2022] [Accepted: 09/09/2022] [Indexed: 01/09/2023]
Abstract
BACKGROUND Methods of machine learning provide opportunities to use real-world data to solve complex problems. Applications of these methods in laboratory medicine promise to increase diagnostic accuracy and streamline laboratory operations leading to improvement in the quality and efficiency of healthcare delivery. However, machine learning models are vulnerable to learning from undesirable patterns in the data that reflect societal biases. As a result, irresponsible application of machine learning may lead to the perpetuation, or even amplification, of existing disparities in healthcare outcomes. CONTENT In this work, we review what it means for a model to be unfair, discuss the various ways that machine learning models become unfair, and present engineering principles emerging from the field of algorithmic fairness. These materials are presented with a focus on the development of machine learning models in laboratory medicine. SUMMARY We hope that this work will serve to increase awareness, and stimulate further discussion, of this important issue among laboratorians as the field moves forward with the incorporation of machine learning models into laboratory practice.
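Illustrative only: a minimal subgroup audit of the kind discussed above, comparing sensitivity at a fixed decision threshold across two patient groups (an equal-opportunity check). Groups, scores, and the 0.5 threshold are simulated assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000
group = rng.choice(["A", "B"], size=n)
y = rng.binomial(1, 0.2, size=n)
# Simulate a model that is systematically less confident for true cases in group B.
score = np.clip(0.5 * y + rng.normal(0, 0.2, n) - 0.1 * (group == "B") * y, 0, 1)
pred = score >= 0.5

tprs = {}
for g in ["A", "B"]:
    mask = (group == g) & (y == 1)
    tprs[g] = pred[mask].mean()
    print(f"group {g}: sensitivity at threshold 0.5 = {tprs[g]:.2f}")
print(f"equal-opportunity gap = {abs(tprs['A'] - tprs['B']):.2f}")
```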
Collapse
Affiliation(s)
- Vahid Azimi
- Washington University in St. Louis School of Medicine, Department of Pathology and Immunology, St. Louis, MO 63110, United States
| | - Mark A Zaydman
- Washington University in St. Louis School of Medicine, Department of Pathology and Immunology, St. Louis, MO 63110, United States
| |
Collapse
|
39
|
Vagliano I, Chesnaye NC, Leopold JH, Jager KJ, Abu-Hanna A, Schut MC. Machine learning models for predicting acute kidney injury: a systematic review and critical appraisal. Clin Kidney J 2022; 15:2266-2280. [PMID: 36381375 PMCID: PMC9664575 DOI: 10.1093/ckj/sfac181] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2022] [Indexed: 09/08/2023] Open
Abstract
BACKGROUND The number of studies applying machine learning (ML) to predict acute kidney injury (AKI) has grown steadily over the past decade. We assess and critically appraise the state of the art in ML models for AKI prediction, considering performance, methodological soundness, and applicability. METHODS We searched PubMed and ArXiv, extracted data, and critically appraised studies based on the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD), Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS), and Prediction Model Risk of Bias Assessment Tool (PROBAST) guidelines. RESULTS Forty-six studies from 3166 titles were included. Thirty-eight studies developed a model, five developed and externally validated one, and three studies externally validated one. Flexible ML methods were used more often than deep learning, although the latter was common with temporal variables and text as predictors. Predictive performance showed an area under receiver operating curves ranging from 0.49 to 0.99. Our critical appraisal identified a high risk of bias in 39 studies. Some studies lacked internal validation, whereas external validation and interpretability of results were rarely considered. Fifteen studies focused on AKI prediction in the intensive care setting, and the US-derived Medical Information Mart for Intensive Care (MIMIC) data set was commonly used. Reproducibility was limited as data and code were usually unavailable. CONCLUSIONS Flexible ML methods are popular for the prediction of AKI, although more complex models based on deep learning are emerging. Our critical appraisal identified a high risk of bias in most models: Studies should use calibration measures and external validation more often, improve model interpretability, and share data and code to improve reproducibility.
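Illustrative only: one of the calibration measures the review recommends using more often, comparing mean predicted risk with the observed event rate by risk decile. The predictions and outcomes below are simulated.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
pred = rng.beta(2, 10, size=5000)                      # predicted AKI probabilities
obs = rng.binomial(1, np.clip(pred * 1.3, 0, 1))       # outcomes from a deliberately miscalibrated truth

df = pd.DataFrame({"pred": pred, "obs": obs})
df["decile"] = pd.qcut(df["pred"], 10, labels=False)
cal = df.groupby("decile").agg(mean_pred=("pred", "mean"), obs_rate=("obs", "mean"))
print(cal.round(3))                                    # well calibrated when the two columns match
```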
Collapse
Affiliation(s)
- Iacopo Vagliano
- Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| | - Nicholas C Chesnaye
- ERA Registry, Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| | - Jan Hendrik Leopold
- Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| | - Kitty J Jager
- ERA Registry, Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| | - Ameen Abu-Hanna
- Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| | - Martijn C Schut
- Department of Medical Informatics, Amsterdam UMC, University of Amsterdam, Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| |
Collapse
|
40
|
Zhang X, Liu K, Yuan B, Wang H, Chen S, Xue Y, Chen W, Liu M, Hu Y. A hybrid adaptive approach for instance transfer learning with dynamic and imbalanced data. INT J INTELL SYST 2022; 37:11582-11599. [PMID: 36816520 PMCID: PMC9936919 DOI: 10.1002/int.23055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2022] [Accepted: 08/16/2022] [Indexed: 11/06/2022]
Abstract
Machine learning has demonstrated success in clinical risk prediction modeling with complex electronic health record data. However, the evolving nature of clinical practices can dynamically change the underlying data distribution over time, leading to model performance drift. Adopting an outdated model is potentially risky and may result in unintentional losses. In this paper, we propose a novel Hybrid Adaptive Boosting approach (HA-Boost) for transfer learning. HA-Boost is characterized by the domain similarity-based and class imbalance-based adaptation mechanisms, which simultaneously address two critical limitations of the classical TrAdaBoost algorithm. We validated HA-Boost in predicting hospital-acquired acute kidney injury using real-world longitudinal electronic health records data. The experiment results demonstrate that HA-Boost stably outperforms the competing baselines in terms of both AUROC and AUPRC across a 7-year time span. This study has confirmed the effectiveness of transfer learning as a superior model updating approach in dynamic environment.
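Illustrative only: HA-Boost itself is not reproduced here; the sketch shows the classical TrAdaBoost weight update that it builds on, in which old-cohort (source) instances the current learner misclassifies are down-weighted while misclassified new-cohort (target) instances are up-weighted. The base learner, cohorts, and round count are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tradaboost_round(Xs, ys, Xt, yt, ws, wt, n_rounds):
    """One round of TrAdaBoost-style instance-weight updating."""
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    w = np.concatenate([ws, wt])
    clf = DecisionTreeClassifier(max_depth=3).fit(X, y, sample_weight=w / w.sum())
    miss_s = (clf.predict(Xs) != ys).astype(float)
    miss_t = (clf.predict(Xt) != yt).astype(float)
    err_t = np.clip(np.sum(wt * miss_t) / wt.sum(), 1e-6, 0.499)   # weighted error on new data
    beta_t = err_t / (1 - err_t)                                   # AdaBoost factor (target)
    beta_s = 1 / (1 + np.sqrt(2 * np.log(len(ys)) / n_rounds))     # source decay factor
    ws_new = ws * beta_s ** miss_s        # shrink weight of old cases the learner misses
    wt_new = wt * beta_t ** -miss_t       # boost weight of new cases the learner misses
    return clf, ws_new, wt_new

# Tiny usage with synthetic old (source) and new (target) cohorts:
rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(500, 5)), rng.integers(0, 2, 500)
Xt, yt = rng.normal(0.5, 1.0, size=(100, 5)), rng.integers(0, 2, 100)
ws, wt = np.ones(500), np.ones(100)
_, ws, wt = tradaboost_round(Xs, ys, Xt, yt, ws, wt, n_rounds=10)
```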
Collapse
Affiliation(s)
- Xiangzhou Zhang
- Big Data Decision Institute, Jinan University, Guangzhou, China
| | - Kang Liu
- Big Data Decision Institute, Jinan University, Guangzhou, China
- School of Management, Jinan University, Guangzhou, China
| | - Borong Yuan
- Big Data Decision Institute, Jinan University, Guangzhou, China
- College of Information Science and Technology, Jinan University, Guangzhou, China
| | - Hongnian Wang
- Big Data Decision Institute, Jinan University, Guangzhou, China
- School of Management, Jinan University, Guangzhou, China
| | - Shaoyong Chen
- Big Data Decision Institute, Jinan University, Guangzhou, China
- College of Information Science and Technology, Jinan University, Guangzhou, China
| | - Yunfei Xue
- Big Data Decision Institute, Jinan University, Guangzhou, China
- College of Information Science and Technology, Jinan University, Guangzhou, China
| | - Weiqi Chen
- Big Data Decision Institute, Jinan University, Guangzhou, China
| | - Mei Liu
- Division of Medical Informatics, University of Kansas Medical Center, Kansas City, KS, United States of America
| | - Yong Hu
- Big Data Decision Institute, Jinan University, Guangzhou, China
| |
Collapse
|
41
|
Zhang A, Xing L, Zou J, Wu JC. Shifting machine learning for healthcare from development to deployment and from models to data. Nat Biomed Eng 2022; 6:1330-1345. [PMID: 35788685 DOI: 10.1038/s41551-022-00898-y] [Citation(s) in RCA: 58] [Impact Index Per Article: 29.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2021] [Accepted: 05/03/2022] [Indexed: 01/14/2023]
Abstract
In the past decade, the application of machine learning (ML) to healthcare has helped drive the automation of physician tasks as well as enhancements in clinical capabilities and access to care. This progress has emphasized that, from model development to model deployment, data play central roles. In this Review, we provide a data-centric view of the innovations and challenges that are defining ML for healthcare. We discuss deep generative models and federated learning as strategies to augment datasets for improved model performance, as well as the use of the more recent transformer models for handling larger datasets and enhancing the modelling of clinical text. We also discuss data-focused problems in the deployment of ML, emphasizing the need to efficiently deliver data to ML models for timely clinical predictions and to account for natural data shifts that can deteriorate model performance.
Collapse
Affiliation(s)
- Angela Zhang
- Stanford Cardiovascular Institute, School of Medicine, Stanford University, Stanford, CA, USA; Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA; Greenstone Biosciences, Palo Alto, CA, USA; Department of Computer Science, Stanford University, Stanford, CA, USA.
| | - Lei Xing
- Department of Radiation Oncology, School of Medicine, Stanford University, Stanford, CA, USA
| | - James Zou
- Department of Computer Science, Stanford University, Stanford, CA, USA; Department of Biomedical Informatics, School of Medicine, Stanford University, Stanford, CA, USA
| | - Joseph C Wu
- Stanford Cardiovascular Institute, School of Medicine, Stanford University, Stanford, CA, USA; Greenstone Biosciences, Palo Alto, CA, USA; Departments of Medicine, Division of Cardiovascular Medicine, Stanford University, Stanford, CA, USA; Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA.
| |
Collapse
|
42
|
Wu JTY, de la Hoz MÁA, Kuo PC, Paguio JA, Yao JS, Dee EC, Yeung W, Jurado J, Moulick A, Milazzo C, Peinado P, Villares P, Cubillo A, Varona JF, Lee HC, Estirado A, Castellano JM, Celi LA. Developing and Validating Multi-Modal Models for Mortality Prediction in COVID-19 Patients: a Multi-center Retrospective Study. J Digit Imaging 2022; 35:1514-1529. [PMID: 35789446 PMCID: PMC9255527 DOI: 10.1007/s10278-022-00674-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2021] [Revised: 05/15/2022] [Accepted: 06/08/2022] [Indexed: 01/07/2023] Open
Abstract
The unprecedented global crisis brought about by the COVID-19 pandemic has sparked numerous efforts to create predictive models for the detection and prognostication of SARS-CoV-2 infections with the goal of helping health systems allocate resources. Machine learning models, in particular, hold promise for their ability to leverage patient clinical information and medical images for prediction. However, most of the published COVID-19 prediction models thus far have little clinical utility due to methodological flaws and lack of appropriate validation. In this paper, we describe our methodology to develop and validate multi-modal models for COVID-19 mortality prediction using multi-center patient data. The models for COVID-19 mortality prediction were developed using retrospective data from Madrid, Spain (N = 2547) and were externally validated in patient cohorts from a community hospital in New Jersey, USA (N = 242) and an academic center in Seoul, Republic of Korea (N = 336). The models we developed performed differently across various clinical settings, underscoring the need for a guided strategy when employing machine learning for clinical decision-making. We demonstrated that using features from both the structured electronic health records and chest X-ray imaging data resulted in better 30-day mortality prediction performance across all three datasets (areas under the receiver operating characteristic curves: 0.85 (95% confidence interval: 0.83-0.87), 0.76 (0.70-0.82), and 0.95 (0.92-0.98)). We discuss the rationale for the decisions made at every step in developing the models and have made our code available to the research community. We employed the best machine learning practices for clinical model development. Our goal is to create a toolkit that would assist investigators and organizations in building multi-modal models for prediction, classification, and/or optimization.
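Illustrative only (not the authors' released code): the general multimodal pattern described above, concatenating structured-EHR features with image-derived features before a single classifier and bootstrapping a confidence interval for the AUROC. All shapes, names, and data are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 1000
ehr = rng.normal(size=(n, 12))        # e.g. labs, vitals, demographics
cxr = rng.normal(size=(n, 64))        # e.g. embedding from a chest X-ray encoder
y = rng.binomial(1, 1 / (1 + np.exp(-(ehr[:, 0] + 0.5 * cxr[:, 0]))))

X = np.hstack([ehr, cxr])             # simple early fusion of the two modalities
train, test = np.arange(n) < 700, np.arange(n) >= 700
clf = LogisticRegression(max_iter=2000).fit(X[train], y[train])
scores = clf.predict_proba(X[test])[:, 1]

boot = []
idx = np.arange(test.sum())
for _ in range(500):                  # nonparametric bootstrap of the test set
    b = rng.choice(idx, size=len(idx), replace=True)
    if len(np.unique(y[test][b])) == 2:
        boot.append(roc_auc_score(y[test][b], scores[b]))
print(f"AUROC {roc_auc_score(y[test], scores):.2f} "
      f"(95% CI {np.percentile(boot, 2.5):.2f}-{np.percentile(boot, 97.5):.2f})")
```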
Collapse
Affiliation(s)
- Joy Tzung-Yu Wu
- Department of Radiology and Nuclear Medicine, Stanford University, Palo Alto, CA, USA
| | - Miguel Ángel Armengol de la Hoz
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Anesthesia, Critical Care and Pain Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Big Data Department, Fundacion Progreso Y Salud, Regional Ministry of Health of Andalucia, Andalucia, Spain
| | - Po-Chih Kuo
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.
| | - Joseph Alexander Paguio
- Albert Einstein Medical Center, Philadelphia, PA, USA
- Hoboken University Medical Center-CarePoint Health, Hoboken, NJ, USA
| | - Jasper Seth Yao
- Albert Einstein Medical Center, Philadelphia, PA, USA
- Hoboken University Medical Center-CarePoint Health, Hoboken, NJ, USA
| | - Edward Christopher Dee
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Wesley Yeung
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- National University Heart Center, National University Hospital, Singapore, Singapore
| | - Jerry Jurado
- Hoboken University Medical Center-CarePoint Health, Hoboken, NJ, USA
| | - Achintya Moulick
- Hoboken University Medical Center-CarePoint Health, Hoboken, NJ, USA
| | - Carmelo Milazzo
- Hoboken University Medical Center-CarePoint Health, Hoboken, NJ, USA
| | - Paloma Peinado
- Centro Integral de Enfermedades Cardiovasculares, Hospital Universitario Monteprincipe, Grupo HM Hospitales, Madrid, Spain
| | - Paula Villares
- Centro Integral de Enfermedades Cardiovasculares, Hospital Universitario Monteprincipe, Grupo HM Hospitales, Madrid, Spain
| | - Antonio Cubillo
- Centro Integral de Enfermedades Cardiovasculares, Hospital Universitario Monteprincipe, Grupo HM Hospitales, Madrid, Spain
| | - José Felipe Varona
- Centro Integral de Enfermedades Cardiovasculares, Hospital Universitario Monteprincipe, Grupo HM Hospitales, Madrid, Spain
| | - Hyung-Chul Lee
- Department of Anesthesiology and Pain Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Alberto Estirado
- Centro Integral de Enfermedades Cardiovasculares, Hospital Universitario Monteprincipe, Grupo HM Hospitales, Madrid, Spain
| | - José Maria Castellano
- Centro Integral de Enfermedades Cardiovasculares, Hospital Universitario Monteprincipe, Grupo HM Hospitales, Madrid, Spain
- Centro Nacional de Investigaciones Cardiovasculares, Instituto de Salud Carlos III, Madrid, Spain
| | - Leo Anthony Celi
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| |
Collapse
|
43
|
Zhang X, Xue Y, Su X, Chen S, Liu K, Chen W, Liu M, Hu Y. A Transfer Learning Approach to Correct the Temporal Performance Drift of Clinical Prediction Models: Retrospective Cohort Study. JMIR Med Inform 2022; 10:e38053. [DOI: 10.2196/38053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2022] [Revised: 07/31/2022] [Accepted: 10/12/2022] [Indexed: 11/11/2022] Open
Abstract
Background
Clinical prediction models suffer from performance drift as the patient population shifts over time. There is a great need for model updating approaches or modeling frameworks that can effectively use the old and new data.
Objective
Based on the paradigm of transfer learning, we aimed to develop a novel modeling framework that transfers old knowledge to the new environment for prediction tasks, and contributes to performance drift correction.
Methods
The proposed predictive modeling framework maintains a logistic regression–based stacking ensemble of 2 gradient boosting machine (GBM) models representing old and new knowledge learned from old and new data, respectively (referred to as transfer learning gradient boosting machine [TransferGBM]). The ensemble learning procedure can dynamically balance the old and new knowledge. Using 2010-2017 electronic health record data on a retrospective cohort of 141,696 patients, we validated TransferGBM for hospital-acquired acute kidney injury prediction.
Results
The baseline models (ie, transported models) that were trained on 2010 and 2011 data showed significant performance drift in the temporal validation with 2012-2017 data. Refitting these models using updated samples resulted in performance gains in nearly all cases. The proposed TransferGBM model succeeded in achieving uniformly better performance than the refitted models.
Conclusions
Under the scenario of population shift, incorporating new knowledge while preserving old knowledge is essential for maintaining stable performance. Transfer learning combined with stacking ensemble learning can help achieve a balance of old and new knowledge in a flexible and adaptive way, even in the case of insufficient new data.
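Illustrative only (not the authors' code): a minimal sketch of the stacking idea described above, with two gradient boosting models representing old and new data and a logistic-regression meta-learner balancing their predicted probabilities. The cohorts and the simulated population shift are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(4)

def cohort(n, shift):                 # simulate a population that shifts over time
    X = rng.normal(shift, 1.0, size=(n, 10))
    y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - shift))))
    return X, y

X_old, y_old = cohort(4000, shift=0.0)     # e.g. early-era admissions
X_new, y_new = cohort(1500, shift=0.8)     # e.g. recent admissions

gbm_old = GradientBoostingClassifier().fit(X_old, y_old)

# Meta-features on the new cohort: old-model probabilities plus out-of-fold
# probabilities from a GBM trained on the new data itself.
p_old = gbm_old.predict_proba(X_new)[:, 1]
p_new = cross_val_predict(GradientBoostingClassifier(), X_new, y_new,
                          cv=5, method="predict_proba")[:, 1]
meta = LogisticRegression().fit(np.column_stack([p_old, p_new]), y_new)
print("stacking weights (old, new):", np.round(meta.coef_[0], 2))
```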
Collapse
|
44
|
Budhwani KI, Patel ZH, Guenter RE, Charania AA. A hitchhiker's guide to cancer models. Trends Biotechnol 2022; 40:1361-1373. [PMID: 35534320 PMCID: PMC9588514 DOI: 10.1016/j.tibtech.2022.04.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2022] [Revised: 03/31/2022] [Accepted: 04/08/2022] [Indexed: 01/21/2023]
Abstract
Cancer is a complex and uniquely personal disease. More than 1.7 million people in the United States are diagnosed with cancer every year. As the burden of cancer grows, so does the need for new, more effective therapeutics and for predictive tools to identify optimal, personalized treatment options for every patient. Cancer models that recapitulate various aspects of the disease are fundamental to making advances along the continuum of cancer treatment from benchside discoveries to bedside delivery. In this review, we use a thought experiment as a vehicle to arrive at four broad categories of cancer models and explore the strengths, weaknesses, opportunities, and threats for each category in advancing our understanding of the disease and improving treatment strategies.
Collapse
Affiliation(s)
- Karim I Budhwani
- CerFlux, Inc., Birmingham, AL, USA; Department of Radiation Oncology, Heersink School of Medicine, University of Alabama at Birmingham (UAB), Birmingham, AL, USA; Department of Physics, Coe College, Cedar Rapids, IA, USA.
| | | | | | | |
Collapse
|
45
|
Lu J, Sattler A, Wang S, Khaki AR, Callahan A, Fleming S, Fong R, Ehlert B, Li RC, Shieh L, Ramchandran K, Gensheimer MF, Chobot S, Pfohl S, Li S, Shum K, Parikh N, Desai P, Seevaratnam B, Hanson M, Smith M, Xu Y, Gokhale A, Lin S, Pfeffer MA, Teuteberg W, Shah NH. Considerations in the reliability and fairness audits of predictive models for advance care planning. Front Digit Health 2022; 4:943768. [PMID: 36339512 PMCID: PMC9634737 DOI: 10.3389/fdgth.2022.943768] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2022] [Accepted: 08/17/2022] [Indexed: 11/30/2022] Open
Abstract
Multiple reporting guidelines for artificial intelligence (AI) models in healthcare recommend that models be audited for reliability and fairness. However, there is a gap of operational guidance for performing reliability and fairness audits in practice. Following guideline recommendations, we conducted a reliability audit of two models based on model performance and calibration as well as a fairness audit based on summary statistics, subgroup performance and subgroup calibration. We assessed the Epic End-of-Life (EOL) Index model and an internally developed Stanford Hospital Medicine (HM) Advance Care Planning (ACP) model in 3 practice settings: Primary Care, Inpatient Oncology and Hospital Medicine, using clinicians' answers to the surprise question (“Would you be surprised if [patient X] passed away in [Y years]?”) as a surrogate outcome. For performance, the models had positive predictive value (PPV) at or above 0.76 in all settings. In Hospital Medicine and Inpatient Oncology, the Stanford HM ACP model had higher sensitivity (0.69, 0.89 respectively) than the EOL model (0.20, 0.27), and better calibration (O/E 1.5, 1.7) than the EOL model (O/E 2.5, 3.0). The Epic EOL model flagged fewer patients (11%, 21% respectively) than the Stanford HM ACP model (38%, 75%). There were no differences in performance and calibration by sex. Both models had lower sensitivity in Hispanic/Latino male patients with Race listed as “Other.” 10 clinicians were surveyed after a presentation summarizing the audit. 10/10 reported that summary statistics, overall performance, and subgroup performance would affect their decision to use the model to guide care; 9/10 said the same for overall and subgroup calibration. The most commonly identified barriers for routinely conducting such reliability and fairness audits were poor demographic data quality and lack of data access. This audit required 115 person-hours across 8–10 months. Our recommendations for performing reliability and fairness audits include verifying data validity, analyzing model performance on intersectional subgroups, and collecting clinician-patient linkages as necessary for label generation by clinicians. Those responsible for AI models should require such audits before model deployment and mediate between model auditors and impacted stakeholders.
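Illustrative only (not the Stanford audit code): the core audit metrics reported above, positive predictive value, sensitivity, and the observed/expected (O/E) calibration ratio, computed per care setting. The data frame, setting names, and flag threshold are invented.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
n = 6000
df = pd.DataFrame({
    "setting": rng.choice(["primary_care", "oncology", "hosp_medicine"], size=n),
    "risk": rng.beta(2, 6, size=n),                  # model-predicted probability
})
df["outcome"] = rng.binomial(1, np.clip(df["risk"] * 1.6, 0, 1))  # deliberately miscalibrated
df["flag"] = df["risk"] >= 0.3                        # would prompt an ACP conversation

for setting, grp in df.groupby("setting"):
    tp = (grp["flag"] & (grp["outcome"] == 1)).sum()
    ppv = tp / grp["flag"].sum()
    sens = tp / (grp["outcome"] == 1).sum()
    o_over_e = grp["outcome"].mean() / grp["risk"].mean()
    print(f"{setting}: PPV={ppv:.2f}, sensitivity={sens:.2f}, O/E={o_over_e:.2f}")
```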
Collapse
Affiliation(s)
- Jonathan Lu
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Correspondence: Jonathan Hsijing Lu
| | - Amelia Sattler
- Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Samantha Wang
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Ali Raza Khaki
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Alison Callahan
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Scott Fleming
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Rebecca Fong
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Benjamin Ehlert
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Ron C. Li
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Lisa Shieh
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Kavitha Ramchandran
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Michael F. Gensheimer
- Department of Radiation Oncology, Stanford University School of Medicine, Palo Alto, United States
| | - Sarah Chobot
- Inpatient Palliative Care, Stanford Health Care, Palo Alto, United States
| | - Stephen Pfohl
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Siyun Li
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Kenny Shum
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
| | - Nitin Parikh
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
| | - Priya Desai
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
| | - Briththa Seevaratnam
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Melanie Hanson
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Margaret Smith
- Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Yizhe Xu
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Arjun Gokhale
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Steven Lin
- Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Michael A. Pfeffer
- Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
| | - Winifred Teuteberg
- Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
| | - Nigam H. Shah
- Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States
- Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine, Palo Alto, United States
- Clinical Excellence Research Center, Stanford University School of Medicine, Palo Alto, United States
| |
Collapse
|
46
|
Davis SE, Walsh CG, Matheny ME. Open questions and research gaps for monitoring and updating AI-enabled tools in clinical settings. Front Digit Health 2022; 4:958284. [PMID: 36120717 PMCID: PMC9478183 DOI: 10.3389/fdgth.2022.958284] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2022] [Accepted: 08/11/2022] [Indexed: 11/15/2022] Open
Abstract
As the implementation of artificial intelligence (AI)-enabled tools is realized across diverse clinical environments, there is a growing understanding of the need for ongoing monitoring and updating of prediction models. Dataset shift-temporal changes in clinical practice, patient populations, and information systems-is now well-documented as a source of deteriorating model accuracy and a challenge to the sustainability of AI-enabled tools in clinical care. While best practices are well-established for training and validating new models, there has been limited work developing best practices for prospective validation and model maintenance. In this paper, we highlight the need for updating clinical prediction models and discuss open questions regarding this critical aspect of the AI modeling lifecycle in three focus areas: model maintenance policies, performance monitoring perspectives, and model updating strategies. With the increasing adoption of AI-enabled tools, the need for such best practices must be addressed and incorporated into new and existing implementations. This commentary aims to encourage conversation and motivate additional research across clinical and data science stakeholders.
Collapse
Affiliation(s)
- Sharon E. Davis
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Correspondence: Sharon E. Davis
| | - Colin G. Walsh
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Psychiatry, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Michael E. Matheny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States; Tennessee Valley Healthcare System VA Medical Center, Veterans Health Administration, Nashville, TN, United States
| |
Collapse
|
47
|
Plana D, Shung DL, Grimshaw AA, Saraf A, Sung JJY, Kann BH. Randomized Clinical Trials of Machine Learning Interventions in Health Care: A Systematic Review. JAMA Netw Open 2022; 5:e2233946. [PMID: 36173632 PMCID: PMC9523495 DOI: 10.1001/jamanetworkopen.2022.33946] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Abstract
IMPORTANCE Despite the potential of machine learning to improve multiple aspects of patient care, barriers to clinical adoption remain. Randomized clinical trials (RCTs) are often a prerequisite to large-scale clinical adoption of an intervention, and important questions remain regarding how machine learning interventions are being incorporated into clinical trials in health care. OBJECTIVE To systematically examine the design, reporting standards, risk of bias, and inclusivity of RCTs for medical machine learning interventions. EVIDENCE REVIEW In this systematic review, the Cochrane Library, Google Scholar, Ovid Embase, Ovid MEDLINE, PubMed, Scopus, and Web of Science Core Collection online databases were searched and citation chasing was done to find relevant articles published from the inception of each database to October 15, 2021. Search terms for machine learning, clinical decision-making, and RCTs were used. Exclusion criteria included implementation of a non-RCT design, absence of original data, and evaluation of nonclinical interventions. Data were extracted from published articles. Trial characteristics, including primary intervention, demographics, adherence to the CONSORT-AI reporting guideline, and Cochrane risk of bias were analyzed. FINDINGS Literature search yielded 19 737 articles, of which 41 RCTs involved a median of 294 participants (range, 17-2488 participants). A total of 16 RCTS (39%) were published in 2021, 21 (51%) were conducted at single sites, and 15 (37%) involved endoscopy. No trials adhered to all CONSORT-AI standards. Common reasons for nonadherence were not assessing poor-quality or unavailable input data (38 trials [93%]), not analyzing performance errors (38 [93%]), and not including a statement regarding code or algorithm availability (37 [90%]). Overall risk of bias was high in 7 trials (17%). Of 11 trials (27%) that reported race and ethnicity data, the median proportion of participants from underrepresented minority groups was 21% (range, 0%-51%). CONCLUSIONS AND RELEVANCE This systematic review found that despite the large number of medical machine learning-based algorithms in development, few RCTs for these technologies have been conducted. Among published RCTs, there was high variability in adherence to reporting standards and risk of bias and a lack of participants from underrepresented minority groups. These findings merit attention and should be considered in future RCT design and reporting.
Collapse
Affiliation(s)
| | - Dennis L Shung
- Department of Medicine, Yale University, New Haven, Connecticut
| | - Alyssa A Grimshaw
- Harvey Cushing/John Hay Whitney Medical Library, Yale University, New Haven, Connecticut
| | - Anurag Saraf
- Department of Radiation Oncology, Massachusetts General Hospital, Boston, Massachusetts
| | - Joseph J Y Sung
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
| | - Benjamin H Kann
- Artificial Intelligence in Medicine Program, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
| |
Collapse
|
48
|
Zhang KS, Schelb P, Netzer N, Tavakoli AA, Keymling M, Wehrse E, Hog R, Rotkopf LT, Wennmann M, Glemser PA, Thierjung H, von Knebel Doeberitz N, Kleesiek J, Görtz M, Schütz V, Hielscher T, Stenzinger A, Hohenfellner M, Schlemmer HP, Maier-Hein K, Bonekamp D. Pseudoprospective Paraclinical Interaction of Radiology Residents With a Deep Learning System for Prostate Cancer Detection: Experience, Performance, and Identification of the Need for Intermittent Recalibration. Invest Radiol 2022; 57:601-612. [PMID: 35467572 DOI: 10.1097/rli.0000000000000878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVES The aim of this study was to estimate the prospective utility of a previously retrospectively validated convolutional neural network (CNN) for prostate cancer (PC) detection on prostate magnetic resonance imaging (MRI). MATERIALS AND METHODS The biparametric (T2-weighted and diffusion-weighted) portion of clinical multiparametric prostate MRI from consecutive men included between November 2019 and September 2020 was fully automatically and individually analyzed by a CNN briefly after image acquisition (pseudoprospective design). Radiology residents performed 2 research Prostate Imaging Reporting and Data System (PI-RADS) assessments of the multiparametric dataset independent from clinical reporting (paraclinical design) before and after review of the CNN results and completed a survey. Presence of clinically significant PC was determined by the presence of an International Society of Urological Pathology grade 2 or higher PC on combined targeted and extended systematic transperineal MRI/transrectal ultrasound fusion biopsy. Sensitivities and specificities on a patient and prostate sextant basis were compared using the McNemar test and compared with the receiver operating characteristic (ROC) curve of CNN. Survey results were summarized as absolute counts and percentages. RESULTS A total of 201 men were included. The CNN achieved an ROC area under the curve of 0.77 on a patient basis. Using PI-RADS ≥3-emulating probability threshold (c3), CNN had a patient-based sensitivity of 81.8% and specificity of 54.8%, not statistically different from the current clinical routine PI-RADS ≥4 assessment at 90.9% and 54.8%, respectively ( P = 0.30/ P = 1.0). In general, residents achieved similar sensitivity and specificity before and after CNN review. On a prostate sextant basis, clinical assessment possessed the highest ROC area under the curve of 0.82, higher than CNN (AUC = 0.76, P = 0.21) and significantly higher than resident performance before and after CNN review (AUC = 0.76 / 0.76, P ≤ 0.03). The resident survey indicated CNN to be helpful and clinically useful. CONCLUSIONS Pseudoprospective paraclinical integration of fully automated CNN-based detection of suspicious lesions on prostate multiparametric MRI was demonstrated and showed good acceptance among residents, whereas no significant improvement in resident performance was found. General CNN performance was preserved despite an observed shift in CNN calibration, identifying the requirement for continuous quality control and recalibration.
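Illustrative only: paired sensitivities on the same set of biopsy-confirmed cases (for example reader vs. CNN, or reader before vs. after CNN review) can be compared with McNemar's test, as in the analysis above. The 2x2 discordance counts below are invented.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Rows: assessment A correct / incorrect; columns: assessment B correct / incorrect,
# counted over the same patients.
table = np.array([[150, 12],
                  [  5, 34]])
result = mcnemar(table, exact=True)   # exact binomial version for small discordant cells
print(f"McNemar p-value = {result.pvalue:.3f}")
```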
Collapse
Affiliation(s)
- Kevin Sun Zhang
- From the Division of Radiology, German Cancer Research Center (DKFZ)
| | | | | | | | - Myriam Keymling
- From the Division of Radiology, German Cancer Research Center (DKFZ)
| | - Eckhard Wehrse
- From the Division of Radiology, German Cancer Research Center (DKFZ)
| | - Robert Hog
- From the Division of Radiology, German Cancer Research Center (DKFZ)
| | | | - Markus Wennmann
- From the Division of Radiology, German Cancer Research Center (DKFZ)
| | | | - Heidi Thierjung
- From the Division of Radiology, German Cancer Research Center (DKFZ)
| | | | | | | | - Viktoria Schütz
- Department of Urology, University of Heidelberg Medical Center
| | | | | | | | | | | | | |
Collapse
|
49
|
Galuzio PP, Cherif A. Recent Advances and Future Perspectives in the Use of Machine Learning and Mathematical Models in Nephrology. Adv Chronic Kidney Dis 2022; 29:472-479. [PMID: 36253031 DOI: 10.1053/j.ackd.2022.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2022] [Revised: 06/21/2022] [Accepted: 07/07/2022] [Indexed: 01/25/2023]
Abstract
We reviewed some of the latest advancements in the use of mathematical models in nephrology. We looked over 2 distinct categories of mathematical models that are widely used in biological research and pointed out some of their strengths and weaknesses when applied to health care, especially in the context of nephrology. A mechanistic dynamical system allows the representation of causal relations among the system variables but with a more complex and longer development/implementation phase. Artificial intelligence/machine learning provides predictive tools that allow identifying correlative patterns in large data sets, but they are usually harder-to-interpret black boxes. Chronic kidney disease (CKD), a major worldwide health problem, generates copious quantities of data that can be leveraged by choice of the appropriate model; also, there is a large number of dialysis parameters that need to be determined at every treatment session that can benefit from predictive mechanistic models. Following important steps in the use of mathematical methods in medical science might be in the intersection of seemingly antagonistic frameworks, by leveraging the strength of each to provide better care.
Collapse
Affiliation(s)
| | - Alhaji Cherif
- Research Division, Renal Research Institute, New York, NY.
| |
Collapse
|
50
|
Gottlieb ER, Samuel M, Bonventre JV, Celi LA, Mattie H. Machine Learning for Acute Kidney Injury Prediction in the Intensive Care Unit. Adv Chronic Kidney Dis 2022; 29:431-438. [PMID: 36253026 PMCID: PMC9586459 DOI: 10.1053/j.ackd.2022.06.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2022] [Revised: 06/01/2022] [Accepted: 06/22/2022] [Indexed: 01/25/2023]
Abstract
Machine learning is the field of artificial intelligence in which computers are trained to make predictions or to identify patterns in data through complex mathematical algorithms. It has great potential in critical care to predict outcomes, such as acute kidney injury, and can be used for prognosis and to suggest management strategies. Machine learning can also be used as a research tool to advance our clinical and biochemical understanding of acute kidney injury. In this review, we introduce basic concepts in machine learning and review recent research in each of these domains.
Collapse
Affiliation(s)
- Eric R Gottlieb
- Renal Section, Brigham and Women's Hospital, Boston, MA; Harvard Medical School, Boston, MA; Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA.
| | | | - Joseph V Bonventre
- Renal Section, Brigham and Women's Hospital, Boston, MA; Harvard Medical School, Boston, MA
| | - Leo A Celi
- Harvard Medical School, Boston, MA; Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA; MIT Critical Data, Cambridge, MA; Harvard T.H. Chan School of Public Health, Boston, MA; Beth Israel Deaconess Medical Center, Boston, MA
| | | |
Collapse
|