1
Brosula R, Corbin CK, Chen JH. Pathophysiological Features in Electronic Medical Records Sustain Model Performance under Temporal Dataset Shift. AMIA Jt Summits Transl Sci Proc 2024;2024:95-104. PMID: 38827052; PMCID: PMC11141811.
Abstract
Access to real-world data streams like electronic medical records (EMRs) has accelerated the development of supervised machine learning (ML) models for clinical applications. However, few studies investigate the differential impact of particular features in the EMR on model performance under temporal dataset shift. To explain how features in the EMR impact models over time, this study aggregates features into feature groups by their source (e.g. medication orders, diagnosis codes and lab results) and feature categories based on their reflection of patient pathophysiology or healthcare processes. We adapt Shapley values to explain feature groups' and feature categories' marginal contribution to initial and sustained model performance. We investigate three standard clinical prediction tasks and find that while feature contributions to initial performance differ across tasks, pathophysiological features help mitigate temporal discrimination deterioration. These results provide interpretable insights on how specific feature groups contribute to model performance and robustness to temporal dataset shift.
Affiliation(s)
- Raphael Brosula
- Genomic Center for Infectious Diseases, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Conor K Corbin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Jonathan H Chen
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
2
Davis SE, Embí PJ, Matheny ME. Sustainable deployment of clinical prediction tools-a 360° approach to model maintenance. J Am Med Inform Assoc 2024;31:1195-1198. PMID: 38422379; PMCID: PMC11031208; DOI: 10.1093/jamia/ocae036.
Abstract
BACKGROUND As the enthusiasm for integrating artificial intelligence (AI) into clinical care grows, so has our understanding of the challenges associated with deploying impactful and sustainable clinical AI models. Complex dataset shifts resulting from evolving clinical environments strain the longevity of AI models as predictive accuracy and associated utility deteriorate over time. OBJECTIVE Responsible practice thus necessitates the lifecycle of AI models be extended to include ongoing monitoring and maintenance strategies within health system algorithmovigilance programs. We describe a framework encompassing a 360° continuum of preventive, preemptive, responsive, and reactive approaches to address model monitoring and maintenance from critically different angles. DISCUSSION We describe the complementary advantages and limitations of these four approaches and highlight the importance of such a coordinated strategy to help ensure the promise of clinical AI is not short-lived.
Affiliation(s)
- Sharon E Davis
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Peter J Embí
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States
- Michael E Matheny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Geriatric Research, Education, and Clinical Care, Tennessee Valley Healthcare System VA Medical Center, Veterans Health Administration, Nashville, TN 37212, United States
3
Ahmad FS, Hu TL, Adler ED, Petito LC, Wehbe RM, Wilcox JE, Mutharasan RK, Nardone B, Tadel M, Greenberg B, Yagil A, Campagnari C. Performance of risk models to predict mortality risk for patients with heart failure: evaluation in an integrated health system. Clin Res Cardiol 2024. PMID: 38565710; DOI: 10.1007/s00392-024-02433-2.
Abstract
BACKGROUND Referral of patients with heart failure (HF) who are at high mortality risk for specialist evaluation is recommended. Yet, most tools for identifying such patients are difficult to implement in electronic health record (EHR) systems. OBJECTIVE To assess the performance and ease of implementation of Machine learning Assessment of RisK and EaRly mortality in Heart Failure (MARKER-HF), a machine-learning model that uses structured data readily available in the EHR, and to compare it with two commonly used risk scores: the Seattle Heart Failure Model (SHFM) and the Meta-Analysis Global Group in Chronic (MAGGIC) Heart Failure Risk Score. DESIGN Retrospective cohort study. PARTICIPANTS Data from 6764 adults with HF were abstracted from EHRs at a large integrated health system from 1/1/10 to 12/31/19. MAIN MEASURES One-year survival from the time of the first cardiology or primary care visit was estimated using MARKER-HF, SHFM, and MAGGIC. Discrimination was measured by the area under the receiver operating characteristic curve (AUC); calibration was assessed graphically. KEY RESULTS Compared to MARKER-HF, both SHFM and MAGGIC required considerably more data engineering and imputation to generate risk score estimates. MARKER-HF, SHFM, and MAGGIC exhibited similar discrimination, with AUCs of 0.70 (95% CI 0.69-0.73), 0.71 (0.69-0.72), and 0.71 (0.70-0.73), respectively. All three scores showed good calibration across the full risk spectrum. CONCLUSIONS These findings suggest that MARKER-HF, which uses clinical and lab measurements readily available in the EHR and required less imputation and data engineering than SHFM or MAGGIC, is an easier tool for identifying high-risk patients in ambulatory clinics who could benefit from referral to an HF specialist.
Affiliation(s)
- Faraz S Ahmad
- Division of Cardiology, Department of Medicine, Feinberg School of Medicine, Northwestern University, 676 North Saint Clair Street, Suite 600, Chicago, IL, 60611, USA.
- Bluhm Cardiovascular Institute Center for Artificial Intelligence, Northwestern Medicine, Chicago, IL, USA.
- Institute for Augmented Intelligence in Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.
- Ted Ling Hu
- Institute for Augmented Intelligence in Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Eric D Adler
- Division of Cardiology, Department of Medicine, UC San Diego School of Medicine, La Jolla, CA, USA
- Lucia C Petito
- Division of Biostatistics, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Ramsey M Wehbe
- Bluhm Cardiovascular Institute Center for Artificial Intelligence, Northwestern Medicine, Chicago, IL, USA
- Division of Cardiology, Department of Medicine, Medical University of South Carolina, Charleston, SC, USA
- Jane E Wilcox
- Division of Cardiology, Department of Medicine, Feinberg School of Medicine, Northwestern University, 676 North Saint Clair Street, Suite 600, Chicago, IL, 60611, USA
- Bluhm Cardiovascular Institute Center for Artificial Intelligence, Northwestern Medicine, Chicago, IL, USA
- R Kannan Mutharasan
- Division of Cardiology, Department of Medicine, Feinberg School of Medicine, Northwestern University, 676 North Saint Clair Street, Suite 600, Chicago, IL, 60611, USA
- Bluhm Cardiovascular Institute Center for Artificial Intelligence, Northwestern Medicine, Chicago, IL, USA
- Beatrice Nardone
- Institute for Augmented Intelligence in Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Division of General Internal Medicine, Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Matevz Tadel
- Physics Department, UC San Diego, La Jolla, CA, USA
- Barry Greenberg
- Division of Cardiology, Department of Medicine, UC San Diego School of Medicine, La Jolla, CA, USA
- Avi Yagil
- Physics Department, UC San Diego, La Jolla, CA, USA
4
Andersen ES, Birk-Korch JB, Röttger R, Brasen CL, Brandslund I, Madsen JS. Monitoring performance of clinical artificial intelligence: a scoping review protocol. JBI Evid Synth 2024;22:453-460. PMID: 38328955; DOI: 10.11124/jbies-23-00390.
Abstract
OBJECTIVE The objective of this scoping review is to describe the scope and nature of research on the monitoring of clinical artificial intelligence (AI) systems. The review will identify the various methodologies used to monitor clinical AI, while also mapping the factors that influence the selection of monitoring approaches. INTRODUCTION AI is being used in clinical decision-making at an increasing rate. While much attention has been directed toward the development and validation of AI for clinical applications, the practical implementation aspects, notably the establishment of rational monitoring/quality assurance systems, have received comparatively limited scientific interest. Given the scarcity of evidence and the heterogeneity of methodologies used in this domain, there is a compelling rationale for conducting a scoping review on this subject. INCLUSION CRITERIA This scoping review will include any publications that describe systematic, continuous, or repeated initiatives that evaluate or predict the clinical performance of AI models with direct implications for the management of patients in any segment of the health care system. METHODS Publications will be identified through searches of the MEDLINE (Ovid), Embase (Ovid), and Scopus databases. Additionally, backward and forward citation searches, as well as a thorough investigation of gray literature, will be conducted. Title and abstract screening, full-text evaluation, and data extraction will be performed by 2 or more independent reviewers. Data will be extracted using a tool developed by the authors. The results will be presented graphically and narratively. REVIEW REGISTRATION Open Science Framework https://osf.io/afkrn.
Affiliation(s)
- Eline Sandvig Andersen
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Vejle, Denmark
- Johan Baden Birk-Korch
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Vejle, Denmark
- Richard Röttger
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Claus Lohman Brasen
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Vejle, Denmark
- Ivan Brandslund
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Vejle, Denmark
- Jonna Skov Madsen
- Department of Biochemistry and Immunology, Lillebaelt Hospital, Vejle, Denmark
- Department of Regional Health Research, University of Southern Denmark, Vejle, Denmark
5
Guo LL, Morse KE, Aftandilian C, Steinberg E, Fries J, Posada J, Fleming SL, Lemmon J, Jessa K, Shah N, Sung L. Characterizing the limitations of using diagnosis codes in the context of machine learning for healthcare. BMC Med Inform Decis Mak 2024;24:51. PMID: 38355486; PMCID: PMC10868117; DOI: 10.1186/s12911-024-02449-8.
Abstract
BACKGROUND Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored. The primary objective was to describe lab- and diagnosis-based labels for 7 selected outcomes at three institutions. Secondary objectives were to describe the agreement, sensitivity, and specificity of diagnosis-based labels against lab-based labels. METHODS This study included three cohorts: SickKids from The Hospital for Sick Children, and StanfordPeds and StanfordAdults from Stanford Medicine. We included seven clinical outcomes with lab-based definitions: acute kidney injury, hyperkalemia, hypoglycemia, hyponatremia, anemia, neutropenia, and thrombocytopenia. For each outcome, we created four lab-based labels (abnormal, mild, moderate, and severe) based on the test result and one diagnosis-based label. The proportion of admissions with a positive label was presented for each outcome, stratified by cohort. Using lab-based labels as the gold standard, agreement (Cohen's Kappa), sensitivity, and specificity were calculated for each lab-based severity level. RESULTS The number of admissions included were: SickKids (n = 59,298), StanfordPeds (n = 24,639), and StanfordAdults (n = 159,985). The proportion of admissions with a positive diagnosis-based label was significantly higher for StanfordPeds compared to SickKids across all outcomes, with odds ratios (99.9% confidence intervals) for an abnormal diagnosis-based label ranging from 2.2 (1.7-2.7) for neutropenia to 18.4 (10.1-33.4) for hyperkalemia. Lab-based labels were more similar by institution. When using lab-based labels as the gold standard, Cohen's Kappa and sensitivity were lower at SickKids for all severity levels compared to StanfordPeds.
CONCLUSIONS Across multiple outcomes, diagnosis codes were consistently different between the two pediatric institutions. This difference was not explained by differences in test results. These results may have implications for machine learning model development and deployment.
Affiliation(s)
- Lin Lawrence Guo
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
- Keith E Morse
- Division of Pediatric Hospital Medicine, Department of Pediatrics, Stanford University, Palo Alto, CA, USA
- Catherine Aftandilian
- Division of Hematology/Oncology, Department of Pediatrics, Stanford University, Palo Alto, CA, USA
- Ethan Steinberg
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
- Jason Fries
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
- Jose Posada
- Universidad del Norte, Barranquilla, Colombia
- Scott Lanyon Fleming
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
- Joshua Lemmon
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
- Karim Jessa
- Information Services, The Hospital for Sick Children, Toronto, ON, Canada
- Nigam Shah
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
- Lillian Sung
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada.
- Division of Haematology/Oncology, The Hospital for Sick Children, 555 University Avenue, M5G1X8, Toronto, ON, Canada.
6
Bhaskhar N, Ip W, Chen JH, Rubin DL. Clinical outcome prediction using observational supervision with electronic health records and audit logs. J Biomed Inform 2023;147:104522. PMID: 37827476; DOI: 10.1016/j.jbi.2023.104522.
Abstract
OBJECTIVE Audit logs in electronic health record (EHR) systems capture interactions of providers with clinical data. We determine if machine learning (ML) models trained using audit logs in conjunction with clinical data ("observational supervision") outperform ML models trained using clinical data alone in clinical outcome prediction tasks, and whether they are more robust to temporal distribution shifts in the data. MATERIALS AND METHODS Using clinical and audit log data from Stanford Healthcare, we trained and evaluated various ML models, including logistic regression, support vector machine (SVM) classifiers, neural networks, random forests, and gradient boosted machines (GBMs), on clinical EHR data, with and without audit logs, for two clinical outcome prediction tasks: major adverse kidney events within 120 days of ICU admission (MAKE-120) in acute kidney injury (AKI) patients and 30-day readmission in acute stroke patients. We further tested the best-performing models using patient data acquired during different time intervals to evaluate the impact of temporal distribution shifts on model performance. RESULTS Performance generally improved for all models when trained with clinical EHR data and audit log data compared with those trained with only clinical EHR data, with GBMs tending to have the overall best performance. GBMs trained with clinical EHR data and audit logs outperformed GBMs trained without audit logs in both clinical outcome prediction tasks: AUROC 0.88 (95% CI: 0.85-0.91) vs. 0.79 (95% CI: 0.77-0.81) for MAKE-120 prediction in AKI patients, and AUROC 0.74 (95% CI: 0.71-0.77) vs. 0.63 (95% CI: 0.62-0.64) for 30-day readmission prediction in acute stroke patients. The performance of GBM models trained using audit log and clinical data degraded less in later time intervals than that of models trained using only clinical data.
CONCLUSION Observational supervision with audit logs improved the performance of ML models trained to predict important clinical outcomes in patients with AKI and acute stroke, and improved robustness to temporal distribution shifts.
Affiliation(s)
- Nandita Bhaskhar
- Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA.
- Wui Ip
- Department of Pediatrics, Stanford School of Medicine, Palo Alto, CA 94305, USA
- Jonathan H Chen
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305, USA; Division of Hospital Medicine, Stanford School of Medicine, Palo Alto, CA 94305, USA; Clinical Excellence Research Center, Stanford School of Medicine, Palo Alto, CA 94305, USA
- Daniel L Rubin
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA; Department of Radiology, Stanford University, Stanford, CA 94305, USA; Department of Medicine, Stanford School of Medicine, Palo Alto, CA 94305, USA
7
Zeng Z, Wang L, Wu Y, Hu Z, Evans J, Zhu X, Ye G, He S. Utilizing Mixed Training and Multi-Head Attention to Address Data Shift in AI-Based Electromagnetic Solvers for Nano-Structured Metamaterials. Nanomaterials (Basel) 2023;13:2778. PMID: 37887929; PMCID: PMC10609168; DOI: 10.3390/nano13202778.
Abstract
When designing nano-structured metamaterials with an iterative optimization method, a fast deep learning solver is desirable to replace a time-consuming numerical solver, and the related issue of data shift is a subtle yet easily overlooked challenge. In this work, we explore the data shift challenge in an AI-based electromagnetic solver and present innovative solutions. Using a one-dimensional grating coupler as a case study, we demonstrate the presence of data shift through the probability density method and principal component analysis, and show the degradation of neural network performance through experiments dealing with data affected by data shift. We propose three effective strategies to mitigate the effects of data shift: mixed training, adding multi-head attention, and a comprehensive approach that combines both. The experimental results validate the efficacy of these approaches in addressing data shift. Specifically, the combination of mixed training and multi-head attention significantly reduces the mean absolute error, by approximately 36%, when applied to data affected by data shift. Our work provides crucial insights and guidance for AI-based electromagnetic solvers in the optimal design of nano-structured metamaterials.
Affiliation(s)
- Zhenjia Zeng
- National Engineering Research Center for Optical Instruments, Centre for Optical and Electromagnetic Research, Zhejiang University, Hangzhou 310058, China
- Lei Wang
- National Engineering Research Center for Optical Instruments, Centre for Optical and Electromagnetic Research, Zhejiang University, Hangzhou 310058, China
- Yiran Wu
- National Engineering Research Center for Optical Instruments, Centre for Optical and Electromagnetic Research, Zhejiang University, Hangzhou 310058, China
- Zhipeng Hu
- National Engineering Research Center for Optical Instruments, Centre for Optical and Electromagnetic Research, Zhejiang University, Hangzhou 310058, China
- Julian Evans
- National Engineering Research Center for Optical Instruments, Centre for Optical and Electromagnetic Research, Zhejiang University, Hangzhou 310058, China
- Xinhua Zhu
- Shanghai Institute for Advanced Study, Zhejiang University, Shanghai 201203, China
- Gaoao Ye
- Taizhou Research Institute, Zhejiang University, Taizhou 317700, China
- Sailing He
- National Engineering Research Center for Optical Instruments, Centre for Optical and Electromagnetic Research, Zhejiang University, Hangzhou 310058, China
- Taizhou Research Institute, Zhejiang University, Taizhou 317700, China
- Department of Electrical Engineering, Royal Institute of Technology, 100 44 Stockholm, Sweden
8
Sahiner B, Chen W, Samala RK, Petrick N. Data drift in medical machine learning: implications and potential remedies. Br J Radiol 2023;96:20220878. PMID: 36971405; PMCID: PMC10546450; DOI: 10.1259/bjr.20220878.
Abstract
Data drift refers to differences between the data used in training a machine learning (ML) model and that applied to the model in real-world operation. Medical ML systems can be exposed to various forms of data drift, including differences between the data sampled for training and used in clinical operation, differences between medical practices or context of use between training and clinical use, and time-related changes in patient populations, disease patterns, and data acquisition, to name a few. In this article, we first review the terminology used in ML literature related to data drift, define distinct types of drift, and discuss in detail potential causes within the context of medical applications with an emphasis on medical imaging. We then review the recent literature regarding the effects of data drift on medical ML systems, which overwhelmingly show that data drift can be a major cause for performance deterioration. We then discuss methods for monitoring data drift and mitigating its effects with an emphasis on pre- and post-deployment techniques. Some of the potential methods for drift detection and issues around model retraining when drift is detected are included. Based on our review, we find that data drift is a major concern in medical ML deployment and that more research is needed so that ML models can identify drift early, incorporate effective mitigation strategies and resist performance decay.
Affiliation(s)
- Berkman Sahiner
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
- Weijie Chen
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
- Ravi K. Samala
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
- Nicholas Petrick
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
9
Oikonomou EK, Khera R. Machine learning in precision diabetes care and cardiovascular risk prediction. Cardiovasc Diabetol 2023;22:259. PMID: 37749579; PMCID: PMC10521578; DOI: 10.1186/s12933-023-01985-3.
Abstract
Artificial intelligence and machine learning are driving a paradigm shift in medicine, promising data-driven, personalized solutions for managing diabetes and the excess cardiovascular risk it poses. In this comprehensive review of machine learning applications in the care of patients with diabetes at increased cardiovascular risk, we offer a broad overview of various data-driven methods and how they may be leveraged in developing predictive models for personalized care. We review existing as well as expected artificial intelligence solutions in the context of diagnosis, prognostication, phenotyping, and treatment of diabetes and its cardiovascular complications. In addition to discussing the key properties of such models that enable their successful application in complex risk prediction, we define challenges that arise from their misuse and the role of methodological standards in overcoming these limitations. We also identify key issues in equity and bias mitigation in healthcare and discuss how the current regulatory framework should ensure the efficacy and safety of medical artificial intelligence products in transforming cardiovascular care and outcomes in diabetes.
Affiliation(s)
- Evangelos K Oikonomou
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Rohan Khera
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA.
- Section of Health Informatics, Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA.
- Section of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA.
- Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, 195 Church St, 6th floor, New Haven, CT, 06510, USA.
10
Ekemeyong Awong LE, Zielinska T. Comparative Analysis of the Clustering Quality in Self-Organizing Maps for Human Posture Classification. Sensors (Basel) 2023;23:7925. PMID: 37765983; PMCID: PMC10538130; DOI: 10.3390/s23187925.
Abstract
The objective of this article is to develop a methodology for selecting the appropriate number of clusters to group and identify human postures using neural networks with unsupervised self-organizing maps. Although unsupervised clustering algorithms have proven effective in recognizing human postures, many works are limited to testing which data are correctly or incorrectly recognized. They often neglect the task of selecting the appropriate number of groups (where the number of clusters corresponds to the number of output neurons, i.e., the number of postures) using clustering quality assessments. The use of quality scores to determine the number of clusters frees the expert from making subjective decisions about the number of postures, enabling the use of unsupervised learning. Due to high dimensionality and data variability, expert decisions (referred to as data labeling) can be difficult and time-consuming; in our case, there is no manual labeling step. We introduce a new clustering quality score: the discriminant score (DS). We describe the process of selecting the most suitable number of postures using human activity records captured by RGB-D cameras. Comparative studies on the usefulness of popular clustering quality scores (the silhouette coefficient, Dunn index, Calinski-Harabasz index, Davies-Bouldin index, and DS) for posture classification tasks are presented, along with graphical illustrations of the results produced by DS. The findings show that DS offers good quality in posture recognition, effectively following postural transitions and similarities.
Affiliation(s)
- Lisiane Esther Ekemeyong Awong
- Faculty of Power and Aeronautical Engineering, Division of Theory of Machines and Robots, Warsaw University of Technology, 00-665 Warszawa, Poland
- Teresa Zielinska
- Faculty of Power and Aeronautical Engineering, Division of Theory of Machines and Robots, Warsaw University of Technology, 00-665 Warszawa, Poland
11
Corbin CK, Maclay R, Acharya A, Mony S, Punnathanam S, Thapa R, Kotecha N, Shah NH, Chen JH. DEPLOYR: a technical framework for deploying custom real-time machine learning models into the electronic medical record. J Am Med Inform Assoc 2023;30:1532-1542. PMID: 37369008; PMCID: PMC10436147; DOI: 10.1093/jamia/ocad114.
Abstract
OBJECTIVE Healthcare institutions are establishing frameworks to govern and promote the implementation of accurate, actionable, and reliable machine learning models that integrate with clinical workflows. Such governance frameworks require an accompanying technical framework to deploy models in a resource-efficient, safe, and high-quality manner. Here we present DEPLOYR, a technical framework for enabling real-time deployment and monitoring of researcher-created models into a widely used electronic medical record system. MATERIALS AND METHODS We discuss core functionality and design decisions, including mechanisms to trigger inference based on actions within electronic medical record software, modules that collect real-time data to make inferences, mechanisms that close the loop by displaying inferences back to end-users within their workflow, monitoring modules that track performance of deployed models over time, silent deployment capabilities, and mechanisms to prospectively evaluate a deployed model's impact. RESULTS We demonstrate the use of DEPLOYR by silently deploying and prospectively evaluating 12 machine learning models trained using electronic medical record data that predict laboratory diagnostic results, triggered by clinician button-clicks in Stanford Health Care's electronic medical record. DISCUSSION Our study highlights the need and feasibility for such silent deployment, because prospectively measured performance varies from retrospective estimates. When possible, we recommend using prospectively estimated performance measures during silent trials to make final go decisions for model deployment. CONCLUSION Machine learning applications in healthcare are extensively researched, but successful translations to the bedside are rare. By describing DEPLOYR, we aim to inform machine learning deployment best practices and help bridge the model implementation gap.
Collapse
Affiliation(s)
- Conor K Corbin
- Department of Biomedical Data Science, Stanford, California, USA
| | - Rob Maclay
- Stanford Children’s Health, Palo Alto, California, USA
| | | | | | | | - Rahul Thapa
- Stanford Health Care, Palo Alto, California, USA
| | | | - Nigam H Shah
- Center for Biomedical Informatics Research, Division of Hospital Medicine, Department of Medicine, Stanford University, School of Medicine, Stanford, California, USA
| | - Jonathan H Chen
- Center for Biomedical Informatics Research, Division of Hospital Medicine, Department of Medicine, Stanford University, School of Medicine, Stanford, California, USA
| |
Collapse
|
12
|
Chen RJ, Wang JJ, Williamson DFK, Chen TY, Lipkova J, Lu MY, Sahai S, Mahmood F. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat Biomed Eng 2023; 7:719-742. [PMID: 37380750 PMCID: PMC10632090 DOI: 10.1038/s41551-023-01056-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2021] [Accepted: 04/13/2023] [Indexed: 06/30/2023]
Abstract
In healthcare, the development and deployment of insufficiently fair systems of artificial intelligence (AI) can undermine the delivery of equitable care. Assessments of AI models stratified across subpopulations have revealed inequalities in how patients are diagnosed, treated and billed. In this Perspective, we outline fairness in machine learning through the lens of healthcare, and discuss how algorithmic biases (in data acquisition, genetic variation and intra-observer labelling variability, in particular) arise in clinical workflows and the resulting healthcare disparities. We also review emerging technology for mitigating biases via disentanglement, federated learning and model explainability, and their role in the development of AI-based software as a medical device.
Collapse
Affiliation(s)
- Richard J Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Judy J Wang
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Boston University School of Medicine, Boston, MA, USA
| | - Drew F K Williamson
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Tiffany Y Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Jana Lipkova
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Ming Y Lu
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Sharifa Sahai
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Faisal Mahmood
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA.
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA.
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
- Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA.
| |
Collapse
|
13
|
Yamga E, Mullie L, Durand M, Cadrin-Chenevert A, Tang A, Montagnon E, Chartrand-Lefebvre C, Chassé M. Interpretable clinical phenotypes among patients hospitalized with COVID-19 using cluster analysis. Front Digit Health 2023; 5:1142822. [PMID: 37114183 PMCID: PMC10128042 DOI: 10.3389/fdgth.2023.1142822] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2023] [Accepted: 03/13/2023] [Indexed: 04/29/2023] Open
Abstract
Background Multiple clinical phenotypes have been proposed for coronavirus disease (COVID-19), but few have used multimodal data. Using clinical and imaging data, we aimed to identify distinct clinical phenotypes in patients admitted with COVID-19 and to assess their clinical outcomes. Our secondary objective was to demonstrate the clinical applicability of this method by developing an interpretable model for phenotype assignment. Methods We analyzed data from 547 patients hospitalized with COVID-19 at a Canadian academic hospital. We processed the data by applying a factor analysis of mixed data (FAMD) and compared four clustering algorithms: k-means, partitioning around medoids (PAM), divisive hierarchical clustering, and agglomerative hierarchical clustering. We used imaging data and 34 clinical variables collected within the first 24 h of admission to train our algorithm. We conducted a survival analysis to compare the clinical outcomes across phenotypes. With the data split into training and validation sets (75/25 ratio), we developed a decision-tree-based model to facilitate the interpretation and assignment of the observed phenotypes. Results Agglomerative hierarchical clustering was the most robust algorithm. We identified three clinical phenotypes: 79 patients (14%) in Cluster 1, 275 patients (50%) in Cluster 2, and 203 patients (37%) in Cluster 3. Cluster 2 and Cluster 3 were both characterized by a low-risk respiratory and inflammatory profile but differed in terms of demographics. Compared with Cluster 3, Cluster 2 comprised older patients with more comorbidities. Cluster 1 represented the group with the most severe clinical presentation, as inferred by the highest rate of hypoxemia and the highest radiological burden. Intensive care unit (ICU) admission and mechanical ventilation risks were the highest in Cluster 1.
Using only two to four decision rules, the classification and regression tree (CART) phenotype assignment model achieved an AUC of 84% (95% CI: 81.5%-86.5%) on the validation set. Conclusions We conducted a multidimensional phenotypic analysis of adult inpatients with COVID-19 and identified three distinct phenotypes associated with different clinical outcomes. We also demonstrated the clinical usability of this approach, as phenotypes can be accurately assigned using a simple decision tree. Further research is still needed to properly incorporate these phenotypes in the management of patients with COVID-19.
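The phenotyping pipeline above (dimensionality reduction, agglomerative hierarchical clustering, then a shallow CART for interpretable phenotype assignment) can be sketched as follows. Synthetic blobs stand in for the clinical and imaging variables, and PCA substitutes for FAMD since the toy features are purely numeric:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 34 admission variables with 3 latent phenotypes.
X, _ = make_blobs(n_samples=547, centers=3, n_features=34, cluster_std=2.0, random_state=42)

# FAMD reduces mixed categorical/numeric variables; with purely numeric toy
# data, PCA plays the same role here.
X_reduced = PCA(n_components=5, random_state=42).fit_transform(X)

# Agglomerative hierarchical clustering discovers the phenotypes.
phenotype = AgglomerativeClustering(n_clusters=3).fit_predict(X_reduced)

# Interpretable assignment model: a shallow CART trained on the raw variables
# with a 75/25 train/validation split, mirroring the study design.
X_tr, X_val, y_tr, y_val = train_test_split(X, phenotype, test_size=0.25, random_state=42)
cart = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_tr, y_tr)
accuracy = cart.score(X_val, y_val)
print(f"held-out phenotype-assignment accuracy: {accuracy:.2f}")
```

The shallow tree's rules can then be read off directly, which is what makes this kind of assignment model clinically usable.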
Collapse
Affiliation(s)
- Eric Yamga
- Department of Medicine, Centre Hospitalier de l’Université de Montréal (CHUM), Montréal, QC, Canada
| | - Louis Mullie
- Department of Medicine, Centre Hospitalier de l’Université de Montréal (CHUM), Montréal, QC, Canada
| | - Madeleine Durand
- Department of Medicine, Centre Hospitalier de l’Université de Montréal (CHUM), Montréal, QC, Canada
- Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CRCHUM), Montréal, QC, Canada
| | | | - An Tang
- Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CRCHUM), Montréal, QC, Canada
- Department of Radiology and Nuclear Medicine, Centre Hospitalier de l’Université de Montréal (CHUM), Montréal, QC, Canada
| | - Emmanuel Montagnon
- Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CRCHUM), Montréal, QC, Canada
| | - Carl Chartrand-Lefebvre
- Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CRCHUM), Montréal, QC, Canada
- Department of Radiology and Nuclear Medicine, Centre Hospitalier de l’Université de Montréal (CHUM), Montréal, QC, Canada
| | - Michaël Chassé
- Department of Medicine, Centre Hospitalier de l’Université de Montréal (CHUM), Montréal, QC, Canada
- Centre de Recherche du Centre Hospitalier de l'Université de Montréal (CRCHUM), Montréal, QC, Canada
| |
Collapse
|
14
|
Moor M, Banerjee O, Abad ZSH, Krumholz HM, Leskovec J, Topol EJ, Rajpurkar P. Foundation models for generalist medical artificial intelligence. Nature 2023; 616:259-265. [PMID: 37045921 DOI: 10.1038/s41586-023-05881-4] [Citation(s) in RCA: 191] [Impact Index Per Article: 191.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2022] [Accepted: 02/22/2023] [Indexed: 04/14/2023]
Abstract
The exceptionally rapid development of highly flexible, reusable artificial intelligence (AI) models is likely to usher in newfound capabilities in medicine. We propose a new paradigm for medical AI, which we refer to as generalist medical AI (GMAI). GMAI models will be capable of carrying out a diverse set of tasks using very little or no task-specific labelled data. Built through self-supervision on large, diverse datasets, GMAI will flexibly interpret different combinations of medical modalities, including data from imaging, electronic health records, laboratory results, genomics, graphs or medical text. Models will in turn produce expressive outputs such as free-text explanations, spoken recommendations or image annotations that demonstrate advanced medical reasoning abilities. Here we identify a set of high-impact potential applications for GMAI and lay out specific technical capabilities and training datasets necessary to enable them. We expect that GMAI-enabled applications will challenge current strategies for regulating and validating AI devices for medicine and will shift practices associated with the collection of large medical datasets.
Collapse
Affiliation(s)
- Michael Moor
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Oishi Banerjee
- Department of Biomedical Informatics, Harvard University, Cambridge, MA, USA
| | - Zahra Shakeri Hossein Abad
- Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| | - Harlan M Krumholz
- Yale University School of Medicine, Center for Outcomes Research and Evaluation, Yale New Haven Hospital, New Haven, CT, USA
| | - Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Eric J Topol
- Scripps Research Translational Institute, La Jolla, CA, USA.
| | - Pranav Rajpurkar
- Department of Biomedical Informatics, Harvard University, Cambridge, MA, USA.
| |
Collapse
|
15
|
Guo LL, Steinberg E, Fleming SL, Posada J, Lemmon J, Pfohl SR, Shah N, Fries J, Sung L. EHR foundation models improve robustness in the presence of temporal distribution shift. Sci Rep 2023; 13:3767. [PMID: 36882576 PMCID: PMC9992466 DOI: 10.1038/s41598-023-30820-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2022] [Accepted: 03/02/2023] [Indexed: 03/09/2023] Open
Abstract
Temporal distribution shift negatively impacts the performance of clinical prediction models over time. Pretraining foundation models using self-supervised learning on electronic health records (EHR) may be effective in acquiring informative global patterns that can improve the robustness of task-specific models. The objective was to evaluate the utility of EHR foundation models in improving the in-distribution (ID) and out-of-distribution (OOD) performance of clinical prediction models. Transformer- and gated recurrent unit-based foundation models were pretrained on EHR of up to 1.8 M patients (382 M coded events) collected within pre-determined year groups (e.g., 2009-2012) and were subsequently used to construct patient representations for patients admitted to inpatient units. These representations were used to train logistic regression models to predict hospital mortality, long length of stay, 30-day readmission, and ICU admission. We compared our EHR foundation models with baseline logistic regression models learned on count-based representations (count-LR) in ID and OOD year groups. Performance was measured using area-under-the-receiver-operating-characteristic curve (AUROC), area-under-the-precision-recall curve, and absolute calibration error. Both transformer- and recurrent-based foundation models generally showed better ID and OOD discrimination relative to count-LR and often exhibited less decay in tasks where there was observable degradation of discrimination performance (average AUROC decay of 3% for the transformer-based foundation model vs. 7% for count-LR after 5-9 years). In addition, the performance and robustness of transformer-based foundation models continued to improve as pretraining set size increased. These results suggest that pretraining EHR foundation models at scale is a useful approach for developing clinical prediction models that perform well in the presence of temporal distribution shift.
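The count-based baseline (count-LR) that the foundation models are compared against can be sketched on toy coded-event data: each patient's EHR is reduced to a bag-of-codes count vector and fed to logistic regression. The code vocabulary, risk model, and cohort sizes below are invented for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
codes = [f"c{i}" for i in range(50)]

def sample_patient(high_risk):
    """Draw a toy sequence of EHR codes; high-risk patients favour the last 10 codes."""
    n_events = rng.integers(5, 20)
    p = np.ones(50)
    p[-10:] += 5 * high_risk
    p /= p.sum()
    return " ".join(rng.choice(codes, size=n_events, p=p))

y = rng.integers(0, 2, size=800)
docs = [sample_patient(label) for label in y]

# Count-based representation: bag-of-codes counts + logistic regression (count-LR).
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(docs[:600])
X_test = vectorizer.transform(docs[600:])
count_lr = LogisticRegression(max_iter=1000).fit(X_train, y[:600])
auc = roc_auc_score(y[600:], count_lr.predict_proba(X_test)[:, 1])
print(f"count-LR AUROC on held-out patients: {auc:.3f}")
```

A pretrained foundation model would replace the count vectors with learned patient representations; the study's claim is that those representations decay less than the counts do as the year groups drift.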
Collapse
Affiliation(s)
- Lin Lawrence Guo
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
| | - Ethan Steinberg
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
| | - Scott Lanyon Fleming
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
| | - Jose Posada
- Universidad del Norte, Barranquilla, Colombia
| | - Joshua Lemmon
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
| | - Stephen R Pfohl
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
| | - Nigam Shah
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
| | - Jason Fries
- Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA
| | - Lillian Sung
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
- Division of Haematology/Oncology, The Hospital for Sick Children, 555 University Avenue, Toronto, ON, M5G 1X8, Canada
| |
Collapse
|
16
|
Lemmon J, Guo LL, Posada J, Pfohl SR, Fries J, Fleming SL, Aftandilian C, Shah N, Sung L. Evaluation of Feature Selection Methods for Preserving Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine. Methods Inf Med 2023; 62:60-70. [PMID: 36812932 DOI: 10.1055/s-0043-1762904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/24/2023]
Abstract
BACKGROUND Temporal dataset shift can cause degradation in model performance as discrepancies between training and deployment data grow over time. The primary objective was to determine whether parsimonious models produced by specific feature selection methods are more robust to temporal dataset shift as measured by out-of-distribution (OOD) performance, while maintaining in-distribution (ID) performance. METHODS Our dataset consisted of intensive care unit patients from MIMIC-IV categorized by year groups (2008-2010, 2011-2013, 2014-2016, and 2017-2019). We trained baseline models using L2-regularized logistic regression on 2008-2010 to predict in-hospital mortality, long length of stay (LOS), sepsis, and invasive ventilation in all year groups. We evaluated three feature selection methods: L1-regularized logistic regression (L1), Remove and Retrain (ROAR), and causal feature selection. We assessed whether a feature selection method could maintain ID performance (2008-2010) and improve OOD performance (2017-2019). We also assessed whether parsimonious models retrained on OOD data performed as well as oracle models trained on all features in the OOD year group. RESULTS The baseline model showed significantly worse OOD performance with the long LOS and sepsis tasks when compared with the ID performance. L1 and ROAR retained 3.7 to 12.6% of all features, whereas causal feature selection generally retained fewer features. Models produced by L1 and ROAR exhibited similar ID and OOD performance as the baseline models. The retraining of these models on 2017-2019 data using features selected from training on 2008-2010 data generally reached parity with oracle models trained directly on 2017-2019 data using all available features. Causal feature selection led to heterogeneous results with the superset maintaining ID performance while improving OOD calibration only on the long LOS task. 
CONCLUSIONS While model retraining can mitigate the impact of temporal dataset shift on parsimonious models produced by L1 and ROAR, new methods are required to proactively improve temporal robustness.
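A minimal sketch of the L1 feature-selection workflow evaluated above: select a parsimonious feature set on the ID years, then retrain on OOD data using only those features and compare against an oracle model given all OOD features. The synthetic cohorts and regularization strength are assumptions, and evaluation is in-sample for brevity:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic ID cohort (e.g. 2008-2010) and a perturbed OOD cohort (e.g. 2017-2019).
X_id, y_id = make_classification(n_samples=1000, n_features=100, n_informative=10, random_state=0)
rng = np.random.default_rng(0)
X_ood = X_id + rng.normal(scale=0.5, size=X_id.shape)  # crude stand-in for temporal shift
y_ood = y_id

# Step 1: L1-regularized logistic regression selects a parsimonious feature set on ID data.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X_id, y_id)
selected = np.flatnonzero(l1.coef_[0])
print(f"retained {selected.size} of {X_id.shape[1]} features")

# Step 2: retrain the parsimonious model on OOD data with the ID-selected features,
# and compare against an oracle given all OOD features.
parsimonious = LogisticRegression(max_iter=1000).fit(X_ood[:, selected], y_ood)
oracle = LogisticRegression(max_iter=1000).fit(X_ood, y_ood)
auc_pars = roc_auc_score(y_ood, parsimonious.predict_proba(X_ood[:, selected])[:, 1])
auc_oracle = roc_auc_score(y_ood, oracle.predict_proba(X_ood)[:, 1])
print(f"parsimonious AUROC={auc_pars:.3f}, oracle AUROC={auc_oracle:.3f}")
```

The study's parity finding corresponds to `auc_pars` landing close to `auc_oracle` after retraining, despite the parsimonious model seeing only the ID-selected features.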
Collapse
Affiliation(s)
- Joshua Lemmon
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Lin Lawrence Guo
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Jose Posada
- Biomedical Informatics Research, Stanford University, Palo Alto, California, United States
- Department of Systems Engineering, Universidad del Norte, Barranquilla, Atlantico, Colombia
| | - Stephen R Pfohl
- Biomedical Informatics Research, Stanford University, Palo Alto, California, United States
| | - Jason Fries
- Biomedical Informatics Research, Stanford University, Palo Alto, California, United States
| | - Scott Lanyon Fleming
- Biomedical Informatics Research, Stanford University, Palo Alto, California, United States
| | - Catherine Aftandilian
- Division of Pediatric Hematology/Oncology, Stanford University, Palo Alto, California, United States
| | - Nigam Shah
- Biomedical Informatics Research, Stanford University, Palo Alto, California, United States
| | - Lillian Sung
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Ontario, Canada
- Division of Haematology/Oncology, The Hospital for Sick Children, Toronto, Ontario, Canada
| |
Collapse
|
17
|
Artificial intelligence in bronchopulmonary dysplasia- current research and unexplored frontiers. Pediatr Res 2023; 93:287-290. [PMID: 36385519 DOI: 10.1038/s41390-022-02387-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/15/2022] [Revised: 10/21/2022] [Accepted: 10/30/2022] [Indexed: 11/17/2022]
Abstract
We provide an overview of bronchopulmonary dysplasia, its definitions, and their shortcomings, and explore areas where machine learning may be used to further our understanding of the disease.
Collapse
|
18
|
Parimbelli E, Buonocore TM, Nicora G, Michalowski W, Wilk S, Bellazzi R. Why did AI get this one wrong? - Tree-based explanations of machine learning model predictions. Artif Intell Med 2023; 135:102471. [PMID: 36628785 DOI: 10.1016/j.artmed.2022.102471] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2022] [Revised: 11/25/2022] [Accepted: 11/28/2022] [Indexed: 12/02/2022]
Abstract
Increasingly complex learning methods such as boosting, bagging and deep learning have made ML models more accurate, but harder to interpret and explain, culminating in black-box machine learning models. Model developers and users alike are often presented with a trade-off between performance and intelligibility, especially in high-stakes applications like medicine. In the present article we propose a novel methodological approach for generating explanations for the predictions of a generic machine learning model, given a specific instance for which the prediction has been made. The method, named AraucanaXAI, is based on surrogate, locally-fitted classification and regression trees that are used to provide post-hoc explanations of the prediction of a generic machine learning model. Advantages of the proposed XAI approach include superior fidelity to the original model, ability to deal with non-linear decision boundaries, and native support to both classification and regression problems. We provide a packaged, open-source implementation of the AraucanaXAI method and evaluate its behaviour in a number of different settings that are commonly encountered in medical applications of AI. These include potential disagreement between the model prediction and physician's expert opinion and low reliability of the prediction due to data scarcity.
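A local surrogate tree in the spirit of AraucanaXAI can be sketched as follows: perturb the instance to build a neighbourhood, label it with the black-box model, and fit a shallow classification tree whose rules serve as the explanation. (AraucanaXAI itself additionally uses real neighbours and oversampling; the neighbourhood radius and tree depth here are illustrative choices.)

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# A black-box model whose individual predictions we want to explain.
X, y = make_classification(n_samples=500, n_features=6, random_state=1)
black_box = RandomForestClassifier(random_state=1).fit(X, y)

def local_tree_explanation(instance, n_samples=300, radius=0.5, max_depth=3):
    """Fit a shallow surrogate tree to black-box predictions around one instance."""
    rng = np.random.default_rng(1)
    neighbourhood = instance + rng.normal(scale=radius, size=(n_samples, instance.size))
    surrogate_labels = black_box.predict(neighbourhood)
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=1)
    tree.fit(neighbourhood, surrogate_labels)
    fidelity = tree.score(neighbourhood, surrogate_labels)  # agreement with the black box
    return tree, fidelity

tree, fidelity = local_tree_explanation(X[0])
print(export_text(tree, feature_names=[f"x{i}" for i in range(6)]))
print(f"local fidelity to the black box: {fidelity:.2f}")
```

The fidelity score quantifies the fidelity-to-the-original-model property the abstract highlights: a surrogate whose rules disagree with the black box in the neighbourhood would not be a trustworthy explanation.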
Collapse
Affiliation(s)
- Enea Parimbelli
- Department of Electric, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy; Telfer School of Management, University of Ottawa, Ottawa, Ontario, Canada
| | - Tommaso Mario Buonocore
- Department of Electric, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| | - Giovanna Nicora
- Department of Electric, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy; enGenome srl, Pavia, Italy
| | - Wojtek Michalowski
- Telfer School of Management, University of Ottawa, Ottawa, Ontario, Canada
| | - Szymon Wilk
- Division of Intelligent Decision Support Systems, Institute of Computing Science, Poznan University of Technology, Poznan, Poland
| | - Riccardo Bellazzi
- Department of Electric, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
| |
Collapse
|
19
|
Sperrin M, Riley RD, Collins GS, Martin GP. Targeted validation: validating clinical prediction models in their intended population and setting. Diagn Progn Res 2022; 6:24. [PMID: 36550534 PMCID: PMC9773429 DOI: 10.1186/s41512-022-00136-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/24/2022] [Accepted: 11/14/2022] [Indexed: 12/24/2022] Open
Abstract
Clinical prediction models must be appropriately validated before they can be used. While validation studies are sometimes carefully designed to match an intended population/setting of the model, it is common for validation studies to take place with arbitrary datasets, chosen for convenience rather than relevance. We call estimating how well a model performs within the intended population/setting "targeted validation". Use of this term sharpens the focus on the intended use of a model, which may increase the applicability of developed models, avoid misleading conclusions, and reduce research waste. It also exposes that external validation may not be required when the intended population for the model matches the population used to develop the model; here, a robust internal validation may be sufficient, especially if the development dataset was large.
Collapse
Affiliation(s)
- Matthew Sperrin
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK.
| | - Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
| | - Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Glen P Martin
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
| |
Collapse
|
20
|
Nelson AE, Arbeeva L. Narrative Review of Machine Learning in Rheumatic and Musculoskeletal Diseases for Clinicians and Researchers: Biases, Goals, and Future Directions. J Rheumatol 2022; 49:1191-1200. [PMID: 35840150 PMCID: PMC9633365 DOI: 10.3899/jrheum.220326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/21/2022] [Indexed: 11/22/2022]
Abstract
There has been rapid growth in the use of artificial intelligence (AI) analytics in medicine in recent years, including in rheumatic and musculoskeletal diseases (RMDs). Such methods represent a challenge to clinicians, patients, and researchers, given the "black box" nature of most algorithms, the unfamiliarity of the terms, and the lack of awareness of potential issues around these analyses. Therefore, this review aims to introduce this subject area in a way that is relevant and meaningful to clinicians and researchers. We hope to provide some insights into relevant strengths and limitations, reporting guidelines, as well as recent examples of such analyses in key areas, with a focus on lessons learned and future directions in diagnosis, phenotyping, prognosis, and precision medicine in RMDs.
Collapse
Affiliation(s)
- Amanda E Nelson
- A.E. Nelson, MD, MSCR, Department of Medicine, Division of Rheumatology, Allergy, and Immunology, University of North Carolina at Chapel Hill;
| | - Liubov Arbeeva
- L. Arbeeva, MS, Thurston Arthritis Research Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| |
Collapse
|
21
|
Davis SE, Walsh CG, Matheny ME. Open questions and research gaps for monitoring and updating AI-enabled tools in clinical settings. Front Digit Health 2022; 4:958284. [PMID: 36120717 PMCID: PMC9478183 DOI: 10.3389/fdgth.2022.958284] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2022] [Accepted: 08/11/2022] [Indexed: 11/15/2022] Open
Abstract
As the implementation of artificial intelligence (AI)-enabled tools is realized across diverse clinical environments, there is a growing understanding of the need for ongoing monitoring and updating of prediction models. Dataset shift-temporal changes in clinical practice, patient populations, and information systems-is now well-documented as a source of deteriorating model accuracy and a challenge to the sustainability of AI-enabled tools in clinical care. While best practices are well-established for training and validating new models, there has been limited work developing best practices for prospective validation and model maintenance. In this paper, we highlight the need for updating clinical prediction models and discuss open questions regarding this critical aspect of the AI modeling lifecycle in three focus areas: model maintenance policies, performance monitoring perspectives, and model updating strategies. With the increasing adoption of AI-enabled tools, the need for such best practices must be addressed and incorporated into new and existing implementations. This commentary aims to encourage conversation and motivate additional research across clinical and data science stakeholders.
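One of the maintenance policies discussed above, flagging a deployed model for updating when monitored discrimination falls below a floor, can be sketched on simulated monthly batches. The drift process, batch size, and AUROC floor are all illustrative assumptions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical monthly batches of (label, score) pairs from a deployed model,
# with gradually weakening discrimination to mimic dataset shift.
rng = np.random.default_rng(3)
batches = []
for month in range(12):
    separation = 1.5 - 0.1 * month
    y = rng.integers(0, 2, size=500)
    scores = rng.normal(loc=y * separation, scale=1.0)
    batches.append((y, scores))

# A simple maintenance policy: flag the model for updating the first time the
# monitored AUROC falls below a fixed floor (the floor value is illustrative).
AUROC_FLOOR = 0.75
flagged_month = None
for month, (y, scores) in enumerate(batches):
    auc = roc_auc_score(y, scores)
    if auc < AUROC_FLOOR:
        flagged_month = month
        break
print(f"model flagged for updating in month {flagged_month}")
```

The open questions the paper raises sit exactly here: how to choose the floor, the monitoring window, and the updating strategy once the flag fires.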
Collapse
Affiliation(s)
- Sharon E. Davis
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Colin G. Walsh
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
- Department of Psychiatry, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Michael E. Matheny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States
- Tennessee Valley Healthcare System VA Medical Center, Veterans Health Administration, Nashville, TN, United States
| |
Collapse
|
22
|
Ahmad FS, Luo Y, Wehbe RM, Thomas JD, Shah SJ. Advances in Machine Learning Approaches to Heart Failure with Preserved Ejection Fraction. Heart Fail Clin 2022; 18:287-300. [PMID: 35341541 PMCID: PMC8983114 DOI: 10.1016/j.hfc.2021.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
Heart failure with preserved ejection fraction (HFpEF) represents a prototypical cardiovascular condition in which machine learning may improve targeted therapies and mechanistic understanding of pathogenesis. Machine learning, which involves algorithms that learn from data, has the potential to guide precision medicine approaches for complex clinical syndromes such as HFpEF. It is therefore important to understand the potential utility and common pitfalls of machine learning so that it can be applied and interpreted appropriately. Although machine learning holds considerable promise for HFpEF, it is subject to several potential pitfalls, which are important factors to consider when interpreting machine learning studies.
Collapse
Affiliation(s)
- Faraz S. Ahmad
- Division of Cardiology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Bluhm Cardiovascular Institute Center for Artificial Intelligence, Northwestern Medicine, Chicago, IL
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Bluhm Cardiovascular Institute Center for Artificial Intelligence, Northwestern Medicine, Chicago, IL
| | - Ramsey M. Wehbe
- Division of Cardiology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Bluhm Cardiovascular Institute Center for Artificial Intelligence, Northwestern Medicine, Chicago, IL
| | - James D. Thomas
- Division of Cardiology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Bluhm Cardiovascular Institute Center for Artificial Intelligence, Northwestern Medicine, Chicago, IL
| | - Sanjiv J. Shah
- Division of Cardiology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Bluhm Cardiovascular Institute Center for Artificial Intelligence, Northwestern Medicine, Chicago, IL
| |
Collapse
|
23
|
Guo LL, Pfohl SR, Fries J, Johnson AEW, Posada J, Aftandilian C, Shah N, Sung L. Evaluation of domain generalization and adaptation on improving model robustness to temporal dataset shift in clinical medicine. Sci Rep 2022; 12:2726. [PMID: 35177653 PMCID: PMC8854561 DOI: 10.1038/s41598-022-06484-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2021] [Accepted: 01/31/2022] [Indexed: 11/24/2022] Open
Abstract
Temporal dataset shift associated with changes in healthcare over time is a barrier to deploying machine learning-based clinical decision support systems. Algorithms that learn robust models by estimating invariant properties across time periods for domain generalization (DG) and unsupervised domain adaptation (UDA) might be suitable to proactively mitigate dataset shift. The objective was to characterize the impact of temporal dataset shift on clinical prediction models and benchmark DG and UDA algorithms on improving model robustness. In this cohort study, intensive care unit patients from the MIMIC-IV database were categorized by year groups (2008–2010, 2011–2013, 2014–2016 and 2017–2019). Tasks were predicting mortality, long length of stay, sepsis and invasive ventilation. Feedforward neural networks were used as prediction models. The baseline experiment trained models using empirical risk minimization (ERM) on 2008–2010 (ERM[08–10]) and evaluated them on subsequent year groups. The DG experiment trained models using algorithms that estimated invariant properties using 2008–2016 and evaluated them on 2017–2019. The UDA experiment leveraged unlabelled samples from 2017 to 2019 for unsupervised distribution matching. DG and UDA models were compared to ERM[08–16] models trained using 2008–2016. Main performance measures were area-under-the-receiver-operating-characteristic curve (AUROC), area-under-the-precision-recall curve and absolute calibration error. Threshold-based metrics including false-positives and false-negatives were used to assess the clinical impact of temporal dataset shift and its mitigation strategies. In the baseline experiments, dataset shift was most evident for sepsis prediction (maximum AUROC drop, 0.090; 95% confidence interval (CI), 0.080–0.101). Considering a scenario of 100 consecutively admitted patients showed that ERM[08–10] applied to 2017–2019 was associated with one additional false-negative among 11 patients with sepsis, when compared to the model applied to 2008–2010. When compared with ERM[08–16], DG and UDA experiments failed to produce more robust models (range of AUROC difference, −0.003 to 0.050). In conclusion, DG and UDA failed to produce more robust models compared to ERM in the setting of temporal dataset shift. Alternate approaches are required to preserve model performance over time in clinical medicine.
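The baseline experiment described in this abstract (train a model on an early year group, then measure how much discrimination it loses on later year groups) can be sketched in a few lines. The cohorts, risk scores, and rank-based AUROC helper below are illustrative assumptions for demonstration only, not the study's actual data, models, or code:

```python
def auroc(y_true, scores):
    """Rank-based AUROC (Mann-Whitney U); tied scores receive the average rank."""
    pairs = sorted(zip(scores, y_true))
    rank_sum_pos, idx, n = 0.0, 0, len(pairs)
    while idx < n:
        j = idx
        while j < n and pairs[j][0] == pairs[idx][0]:
            j += 1
        avg_rank = (idx + 1 + j) / 2.0  # 1-based ranks idx+1..j share their average
        rank_sum_pos += sum(avg_rank for k in range(idx, j) if pairs[k][1] == 1)
        idx = j
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical held-out cohorts: outcome labels and the frozen model's risk
# scores for each year group (the model was fit only on the early period).
cohorts = {
    "2008-2010": ([1, 1, 1, 0, 0, 0, 0], [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]),
    "2017-2019": ([1, 1, 1, 0, 0, 0, 0], [0.6, 0.4, 0.8, 0.5, 0.3, 0.7, 0.2]),
}
results = {period: auroc(y, s) for period, (y, s) in cohorts.items()}
auroc_drop = results["2008-2010"] - results["2017-2019"]  # discrimination lost over time
```

In the study, this AUROC drop (up to 0.090 for sepsis prediction) is the quantity the DG and UDA algorithms were benchmarked against mitigating.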
Collapse
Affiliation(s)
- Lin Lawrence Guo
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
| | - Stephen R Pfohl
- Biomedical Informatics Research, Stanford University, Palo Alto, USA
| | - Jason Fries
- Biomedical Informatics Research, Stanford University, Palo Alto, USA
| | - Alistair E W Johnson
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
| | - Jose Posada
- Biomedical Informatics Research, Stanford University, Palo Alto, USA
| | - Nigam Shah
- Biomedical Informatics Research, Stanford University, Palo Alto, USA
| | - Lillian Sung
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
- Division of Haematology/Oncology, The Hospital for Sick Children, 555 University Avenue, Toronto, ON, M5G1X8, Canada
| |
Collapse
|
24
|
Machine Learning Approaches to Investigate Clostridioides difficile Infection and Outcomes: A Systematic Review. Int J Med Inform 2022; 160:104706. [DOI: 10.1016/j.ijmedinf.2022.104706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2021] [Revised: 12/21/2021] [Accepted: 01/22/2022] [Indexed: 11/20/2022]
|