1
Bergquist T, Schaffter T, Yan Y, Yu T, Prosser J, Gao J, Chen G, Charzewski Ł, Nawalany Z, Brugere I, Retkute R, Prusokas A, Prusokas A, Choi Y, Lee S, Choe J, Lee I, Kim S, Kang J, Mooney SD, Guinney J. Evaluation of crowdsourced mortality prediction models as a framework for assessing artificial intelligence in medicine. J Am Med Inform Assoc 2023; 31:35-44. [PMID: 37604111; PMCID: PMC10746301; DOI: 10.1093/jamia/ocad159]
Abstract
OBJECTIVE Applications of machine learning in healthcare are of high interest and have the potential to improve patient care. Yet, the real-world accuracy of these models in clinical practice and on different patient subpopulations remains unclear. To address these important questions, we hosted a community challenge to evaluate methods that predict healthcare outcomes. We focused on the prediction of all-cause mortality as the community challenge question. MATERIALS AND METHODS Using a Model-to-Data framework, 345 registered participants, coalescing into 25 independent teams spread over 3 continents and 10 countries, generated 25 accurate models, all trained on a dataset of over 1.1 million patients and evaluated on patients prospectively collected over a 1-year observation period in a large health system. RESULTS The top-performing team achieved a final area under the receiver operating characteristic curve of 0.947 (95% CI, 0.942-0.951) and an area under the precision-recall curve of 0.487 (95% CI, 0.458-0.499) on a prospectively collected patient cohort. DISCUSSION Post hoc analysis after the challenge revealed that models differ in accuracy on subpopulations, delineated by race or gender, even when they are trained on the same data. CONCLUSION This is the largest community challenge focused on the evaluation of state-of-the-art machine learning methods in a healthcare system performed to date, revealing both opportunities and pitfalls of clinical AI.
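The challenge scored submissions with the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC), reported with confidence intervals. A minimal sketch of how such bootstrap-based scoring might look is shown below; the data, column layout, and bootstrap settings are hypothetical, and this is not the challenge's actual evaluation harness.

```python
# Illustrative sketch only: bootstrap AUROC/AUPRC scoring of a mortality
# prediction model, in the spirit of the challenge evaluation described above.
# The data, column layout, and bootstrap settings are assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def bootstrap_metrics(y_true, y_score, n_boot=1000, seed=0):
    """Return AUROC and AUPRC point estimates with 95% bootstrap CIs."""
    rng = np.random.default_rng(seed)
    aurocs, auprcs = [], []
    n = len(y_true)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)           # resample patients with replacement
        if len(np.unique(y_true[idx])) < 2:   # skip resamples with a single class
            continue
        aurocs.append(roc_auc_score(y_true[idx], y_score[idx]))
        auprcs.append(average_precision_score(y_true[idx], y_score[idx]))
    ci = lambda v: (float(np.percentile(v, 2.5)), float(np.percentile(v, 97.5)))
    return {
        "auroc": roc_auc_score(y_true, y_score), "auroc_ci": ci(aurocs),
        "auprc": average_precision_score(y_true, y_score), "auprc_ci": ci(auprcs),
    }

# Hypothetical labels and scores standing in for the prospective cohort.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 5000)
y_score = np.clip(0.6 * y_true + 0.5 * rng.random(5000), 0, 1)
print(bootstrap_metrics(y_true, y_score))
```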
Affiliation(s)
- Timothy Bergquist
- Sage Bionetworks, Seattle, WA, United States
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
- Yao Yan
- Sage Bionetworks, Seattle, WA, United States
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, United States
- Thomas Yu
- Sage Bionetworks, Seattle, WA, United States
- Justin Prosser
- Institute of Translational Health Sciences, University of Washington, Seattle, WA, United States
- Jifan Gao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
- Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
- Łukasz Charzewski
- Proacta, Warsaw, Poland
- Division of Biophysics, University of Warsaw, Warsaw, Poland
- Ivan Brugere
- Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
- Renata Retkute
- Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
- Alidivinas Prusokas
- Plant and Molecular Sciences, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
- Augustinas Prusokas
- Department of Life Sciences, Imperial College London, London, United Kingdom
- Yonghwa Choi
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Sanghoon Lee
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Junseok Choe
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Inggeol Lee
- Department of Interdisciplinary Program in Bioinformatics, College of Informatics, Korea University, Seoul, Republic of Korea
- Sunkyu Kim
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Jaewoo Kang
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Department of Interdisciplinary Program in Bioinformatics, College of Informatics, Korea University, Seoul, Republic of Korea
- Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
- Justin Guinney
- Sage Bionetworks, Seattle, WA, United States
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
2
Yan C, Yan Y, Wan Z, Zhang Z, Omberg L, Guinney J, Mooney SD, Malin BA. A Multifaceted benchmarking of synthetic electronic health record generation models. Nat Commun 2022; 13:7609. [PMID: 36494374; PMCID: PMC9734113; DOI: 10.1038/s41467-022-35295-1]
Abstract
Synthetic health data have the potential to mitigate privacy concerns in supporting biomedical research and healthcare applications. Modern approaches for data generation continue to evolve and demonstrate remarkable potential. Yet there is no systematic assessment framework to benchmark methods as they emerge and to determine which are most appropriate for which use cases. In this work, we introduce a systematic benchmarking framework to appraise key characteristics with respect to utility and privacy metrics. We apply the framework to evaluate synthetic data generation methods for electronic health records data from two large academic medical centers with respect to several use cases. The results illustrate that there is a utility-privacy tradeoff in sharing synthetic health data and further indicate that no method is unequivocally the best on all criteria in every use case, which makes it evident why synthetic data generation methods need to be assessed in context.
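A benchmarking framework of this kind scores each synthetic dataset against its real source on separate utility and privacy axes. The sketch below illustrates that general pattern with two deliberately simple proxy metrics; they are illustrative assumptions, not the metric definitions used in the paper.

```python
# Illustrative sketch only: a toy utility metric (per-feature prevalence
# difference) and a toy privacy proxy (nearest-neighbor Hamming distance).
# These are NOT the paper's metric definitions; the binary data are hypothetical.
import numpy as np

def dimension_wise_utility(real, synth):
    """Mean absolute difference in per-feature prevalence (lower = more useful)."""
    return float(np.abs(real.mean(axis=0) - synth.mean(axis=0)).mean())

def nn_distance_privacy(real, synth):
    """Median Hamming distance from each synthetic record to its nearest real
    record (higher = less apparent memorization of real patients)."""
    dists = [np.abs(real - s).sum(axis=1).min() for s in synth]
    return float(np.median(dists))

rng = np.random.default_rng(0)
real = rng.integers(0, 2, size=(500, 100))   # hypothetical binary EHR code matrix
synth = rng.integers(0, 2, size=(500, 100))  # hypothetical synthetic output
print("utility gap:", dimension_wise_utility(real, synth))
print("privacy proxy:", nn_distance_privacy(real, synth))
```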
Affiliation(s)
- Chao Yan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Yao Yan
- Sage Bionetworks, Seattle, WA, USA
- Zhiyu Wan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Ziqi Zhang
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Larsson Omberg
- Sage Bionetworks, Seattle, WA, USA
- Justin Guinney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Tempus Labs, Chicago, IL, USA
- Sean D. Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Bradley A. Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
3
Yan Y, Schaffter T, Bergquist T, Yu T, Prosser J, Aydin Z, Jabeer A, Brugere I, Gao J, Chen G, Causey J, Yao Y, Bryson K, Long DR, Jarvik JG, Lee CI, Wilcox A, Guinney J, Mooney S. A Continuously Benchmarked and Crowdsourced Challenge for Rapid Development and Evaluation of Models to Predict COVID-19 Diagnosis and Hospitalization. JAMA Netw Open 2021; 4:e2124946. [PMID: 34633425; PMCID: PMC8506231; DOI: 10.1001/jamanetworkopen.2021.24946]
Abstract
Importance Machine learning could be used to predict the likelihood of diagnosis and severity of illness. Lack of COVID-19 patient data has hindered the data science community in developing models to aid in the response to the pandemic. Objectives To describe the rapid development and evaluation of clinical algorithms to predict COVID-19 diagnosis and hospitalization using patient data by citizen scientists, provide an unbiased assessment of model performance, and benchmark model performance on subgroups. Design, Setting, and Participants This diagnostic and prognostic study operated a continuous, crowdsourced challenge using a model-to-data approach to securely enable the use of regularly updated COVID-19 patient data from the University of Washington by participants from May 6 to December 23, 2020. A postchallenge analysis was conducted from December 24, 2020, to April 7, 2021, to assess the generalizability of models on the cumulative data set as well as subgroups stratified by age, sex, race, and time of COVID-19 test. By December 23, 2020, the challenge had engaged 482 participants from 90 teams and 7 countries. Main Outcomes and Measures Machine learning algorithms used patient data to output a score that represented the probability of patients receiving a positive COVID-19 test result or being hospitalized within 21 days after receiving a positive COVID-19 test result. Algorithms were evaluated using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC) scores. Ensemble models aggregating models from the top challenge teams were developed and evaluated. Results In the analysis using the cumulative data set, the best performance for COVID-19 diagnosis prediction was an AUROC of 0.776 (95% CI, 0.775-0.777) and an AUPRC of 0.297, and for hospitalization prediction, an AUROC of 0.796 (95% CI, 0.794-0.798) and an AUPRC of 0.188. Analysis of the top models submitted to the challenge showed consistently better performance on the female group than on the male group. Among all age groups, the best performance was obtained for the 25- to 49-year age group, and the worst performance was obtained for the group aged 17 years or younger. Conclusions and Relevance In this diagnostic and prognostic study, models submitted by citizen scientists achieved high performance for the prediction of COVID-19 testing and hospitalization outcomes. Evaluation of challenge models on demographic subgroups and prospective data revealed performance discrepancies, providing insights into potential bias and limitations of the models.
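Two of the analyses described above, subgroup-stratified evaluation and ensembling of top-team models, can be illustrated with a short sketch. The column names, synthetic data, and simple score-averaging ensemble below are assumptions for illustration rather than the challenge's actual code.

```python
# Illustrative sketch only: subgroup-stratified AUROC plus a simple
# score-averaging ensemble. Column names, data, and the averaging rule are
# assumptions; the challenge's actual ensembling code is not reproduced here.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "label": rng.integers(0, 2, 2000),            # hypothetical outcomes
    "sex": rng.choice(["female", "male"], 2000),  # hypothetical subgroup
    "model_a": rng.random(2000),                  # hypothetical scores from two teams
    "model_b": rng.random(2000),
})

# Ensemble by averaging the per-model probability scores.
df["ensemble"] = df[["model_a", "model_b"]].mean(axis=1)

# Evaluate each model within each demographic subgroup.
for sex, grp in df.groupby("sex"):
    for col in ("model_a", "model_b", "ensemble"):
        print(sex, col, round(roc_auc_score(grp["label"], grp[col]), 3))
```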
Affiliation(s)
- Yao Yan
- Sage Bionetworks, Seattle, Washington
- Molecular Engineering and Sciences Institute, University of Washington, Seattle
- Timothy Bergquist
- Sage Bionetworks, Seattle, Washington
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle
- Thomas Yu
- Sage Bionetworks, Seattle, Washington
- Justin Prosser
- Institute of Translational Health Sciences, University of Washington, Seattle
- Zafer Aydin
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
- Amhar Jabeer
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
- Ivan Brugere
- Department of Computer Science, University of Illinois at Chicago, Chicago
- Jifan Gao
- Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison
- Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison
- Jason Causey
- Computer Science Department, College of Engineering and Computer Science, Arkansas State University, Jonesboro
- Arkansas AI-Campus, Center for No-Boundary Thinking, Arkansas State University, Jonesboro
- Yuxin Yao
- Department of Computer Science, University College London, London, United Kingdom
- Kevin Bryson
- Department of Computer Science, University College London, London, United Kingdom
- Dustin R. Long
- Division of Critical Care Medicine, Department of Anesthesiology and Pain Medicine, University of Washington, Seattle
- Jeffrey G. Jarvik
- The University of Washington Clinical Learning, Evidence And Research Center for Musculoskeletal Disorders, Seattle
- Department of Radiology, University of Washington School of Medicine, Seattle
- Christoph I. Lee
- Department of Radiology, University of Washington School of Medicine, Seattle
- Adam Wilcox
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle
- Sean Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle
4
Oh EJ, Parikh RB, Chivers C, Chen J. Two-Stage Approaches to Accounting for Patient Heterogeneity in Machine Learning Risk Prediction Models in Oncology. JCO Clin Cancer Inform 2021; 5:1015-1023. [PMID: 34591602; PMCID: PMC8812620; DOI: 10.1200/cci.21.00077]
Abstract
PURPOSE Machine learning models developed from electronic health records data have been increasingly used to predict risk of mortality for general oncology patients. However, these models may have suboptimal performance because of patient heterogeneity. The objective of this work is to develop a new modeling approach to predicting short-term mortality that accounts for heterogeneity across multiple subgroups in the presence of a large number of electronic health record predictors. METHODS We proposed a two-stage approach to addressing heterogeneity among oncology patients of different cancer types for predicting their risk of mortality. Structured data were extracted from the University of Pennsylvania Health System for 20,723 patients of 11 cancer types, where 1,340 (6.5%) patients were deceased. We first modeled the overall risk for all patients without differentiating cancer types, as is done in current practice. We then developed cancer type-specific models using the overall risk score as a predictor along with preselected type-specific predictors. The overall and type-specific models were compared with respect to discrimination using the area under the precision-recall curve (AUPRC) and calibration using the calibration slope. We also proposed metrics that characterize the degree of risk heterogeneity by comparing risk predictors in the overall and type-specific models. RESULTS The two-stage modeling resulted in improved calibration and discrimination across all 11 cancer types. The improvement in AUPRC was significant for hematologic malignancies including leukemia, lymphoma, and myeloma. For instance, the AUPRC increased from 0.358 to 0.519 (∆ = 0.161; 95% CI, 0.102 to 0.224) and from 0.299 to 0.354 (∆ = 0.055; 95% CI, 0.009 to 0.107) for leukemia and lymphoma, respectively. For all 11 cancer types, the two-stage approach generated well-calibrated risks. A high degree of heterogeneity between type-specific and overall risk predictors was observed for most cancer types. CONCLUSION Our two-stage modeling approach, which accounts for cancer type-specific risk heterogeneity, achieves better calibration and discrimination than a model agnostic to cancer types.
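The two-stage structure described above, an overall model followed by cancer type-specific models that take the overall risk score as a predictor, can be illustrated with a short sketch. The data, features, and choice of logistic regression below are assumptions, not the authors' implementation.

```python
# Illustrative sketch only of the two-stage idea: an overall model fit on all
# patients, then per-cancer-type models that include the overall risk score as
# a predictor. Data, features, and logistic regression are assumptions.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 3000
df = pd.DataFrame({
    "cancer_type": rng.choice(["leukemia", "lymphoma", "myeloma"], n),
    "x1": rng.normal(size=n),   # hypothetical predictor shared by all types
    "x2": rng.normal(size=n),   # hypothetical type-specific predictor
})
df["died"] = (rng.random(n) < 1 / (1 + np.exp(-(0.8 * df["x1"] - 2)))).astype(int)

# Stage 1: overall model, agnostic to cancer type.
overall = LogisticRegression().fit(df[["x1", "x2"]], df["died"])
df["overall_risk"] = overall.predict_proba(df[["x1", "x2"]])[:, 1]

# Stage 2: one model per cancer type, with the overall risk as an input feature.
type_models = {
    ctype: LogisticRegression().fit(grp[["overall_risk", "x2"]], grp["died"])
    for ctype, grp in df.groupby("cancer_type")
}

# Differences in how each type-specific model weights the overall risk score
# give a rough picture of risk heterogeneity across cancer types.
for ctype, model in type_models.items():
    print(ctype, "coefficient on overall risk:", round(model.coef_[0][0], 2))
```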
Affiliation(s)
- Eun Jeong Oh
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Ravi B. Parikh
- Department of Medical Ethics and Health Policy, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Corey Chivers
- University of Pennsylvania Health System, Philadelphia, PA
- Jinbo Chen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA