1
Bergquist T, Schaffter T, Yan Y, Yu T, Prosser J, Gao J, Chen G, Charzewski Ł, Nawalany Z, Brugere I, Retkute R, Prusokas A, Prusokas A, Choi Y, Lee S, Choe J, Lee I, Kim S, Kang J, Mooney SD, Guinney J. Evaluation of crowdsourced mortality prediction models as a framework for assessing artificial intelligence in medicine. J Am Med Inform Assoc 2023; 31:35-44. [PMID: 37604111; PMCID: PMC10746301; DOI: 10.1093/jamia/ocad159]
Abstract
OBJECTIVE Applications of machine learning in healthcare are of high interest and have the potential to improve patient care. Yet, the real-world accuracy of these models in clinical practice and on different patient subpopulations remains unclear. To address these important questions, we hosted a community challenge to evaluate methods that predict healthcare outcomes. We focused on the prediction of all-cause mortality as the community challenge question. MATERIALS AND METHODS Using a Model-to-Data framework, 345 registered participants, coalescing into 25 independent teams spread over 3 continents and 10 countries, generated 25 accurate models, all trained on a dataset of over 1.1 million patients and evaluated on patients prospectively collected over a 1-year observation period in a large health system. RESULTS The top-performing team achieved a final area under the receiver operating characteristic curve of 0.947 (95% CI, 0.942-0.951) and an area under the precision-recall curve of 0.487 (95% CI, 0.458-0.499) on a prospectively collected patient cohort. DISCUSSION Post hoc analysis after the challenge revealed that models differ in accuracy on subpopulations, delineated by race or gender, even when they are trained on the same data. CONCLUSION This is the largest community challenge focused on the evaluation of state-of-the-art machine learning methods in a healthcare system performed to date, revealing both opportunities and pitfalls of clinical AI.
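The challenge scored submissions with the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC), reported with confidence intervals. A minimal sketch of how such bootstrap-based scoring might look is shown below; the data, column layout, and bootstrap settings are hypothetical, and this is not the challenge's actual evaluation harness.

```python
# Illustrative sketch only: bootstrap AUROC/AUPRC scoring of a mortality
# prediction model, in the spirit of the challenge evaluation described above.
# The data, column layout, and bootstrap settings are assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def bootstrap_metrics(y_true, y_score, n_boot=1000, seed=0):
    """Return AUROC and AUPRC point estimates with 95% bootstrap CIs."""
    rng = np.random.default_rng(seed)
    aurocs, auprcs = [], []
    n = len(y_true)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)           # resample patients with replacement
        if len(np.unique(y_true[idx])) < 2:   # skip resamples with a single class
            continue
        aurocs.append(roc_auc_score(y_true[idx], y_score[idx]))
        auprcs.append(average_precision_score(y_true[idx], y_score[idx]))
    ci = lambda v: (float(np.percentile(v, 2.5)), float(np.percentile(v, 97.5)))
    return {
        "auroc": roc_auc_score(y_true, y_score), "auroc_ci": ci(aurocs),
        "auprc": average_precision_score(y_true, y_score), "auprc_ci": ci(auprcs),
    }

# Hypothetical labels and scores standing in for the prospective cohort.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 5000)
y_score = np.clip(0.6 * y_true + 0.5 * rng.random(5000), 0, 1)
print(bootstrap_metrics(y_true, y_score))
```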
Affiliation(s)
- Timothy Bergquist
- Sage Bionetworks, Seattle, WA, United States
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
- Yao Yan
- Sage Bionetworks, Seattle, WA, United States
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, United States
- Thomas Yu
- Sage Bionetworks, Seattle, WA, United States
- Justin Prosser
- Institute of Translational Health Sciences, University of Washington, Seattle, WA, United States
- Jifan Gao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
- Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
- Łukasz Charzewski
- Proacta, Warsaw, Poland
- Division of Biophysics, University of Warsaw, Warsaw, Poland
- Ivan Brugere
- Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
- Renata Retkute
- Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
- Alidivinas Prusokas
- Plant and Molecular Sciences, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
- Augustinas Prusokas
- Department of Life Sciences, Imperial College London, London, United Kingdom
- Yonghwa Choi
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Sanghoon Lee
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Junseok Choe
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Inggeol Lee
- Department of Interdisciplinary Program in Bioinformatics, College of Informatics, Korea University, Seoul, Republic of Korea
- Sunkyu Kim
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Jaewoo Kang
- Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Department of Interdisciplinary Program in Bioinformatics, College of Informatics, Korea University, Seoul, Republic of Korea
- Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
- Justin Guinney
- Sage Bionetworks, Seattle, WA, United States
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
2
Yan C, Yan Y, Wan Z, Zhang Z, Omberg L, Guinney J, Mooney SD, Malin BA. A Multifaceted benchmarking of synthetic electronic health record generation models. Nat Commun 2022; 13:7609. [PMID: 36494374; PMCID: PMC9734113; DOI: 10.1038/s41467-022-35295-1]
Abstract
Synthetic health data have the potential to mitigate privacy concerns in supporting biomedical research and healthcare applications. Modern approaches for data generation continue to evolve and demonstrate remarkable potential. Yet there is no systematic assessment framework to benchmark methods as they emerge and to determine which are most appropriate for which use cases. In this work, we introduce a systematic benchmarking framework to appraise key characteristics with respect to utility and privacy metrics. We apply the framework to evaluate synthetic data generation methods for electronic health records data from two large academic medical centers with respect to several use cases. The results illustrate that there is a utility-privacy tradeoff in sharing synthetic health data and further indicate that no method is unequivocally the best on all criteria in every use case, which makes it evident why synthetic data generation methods need to be assessed in context.
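A benchmarking framework of this kind scores each synthetic dataset against its real source on separate utility and privacy axes. The sketch below illustrates that general pattern with two deliberately simple proxy metrics; they are illustrative assumptions, not the metric definitions used in the paper.

```python
# Illustrative sketch only: a toy utility metric (per-feature prevalence
# difference) and a toy privacy proxy (nearest-neighbor Hamming distance).
# These are NOT the paper's metric definitions; the binary data are hypothetical.
import numpy as np

def dimension_wise_utility(real, synth):
    """Mean absolute difference in per-feature prevalence (lower = more useful)."""
    return float(np.abs(real.mean(axis=0) - synth.mean(axis=0)).mean())

def nn_distance_privacy(real, synth):
    """Median Hamming distance from each synthetic record to its nearest real
    record (higher = less apparent memorization of real patients)."""
    dists = [np.abs(real - s).sum(axis=1).min() for s in synth]
    return float(np.median(dists))

rng = np.random.default_rng(0)
real = rng.integers(0, 2, size=(500, 100))   # hypothetical binary EHR code matrix
synth = rng.integers(0, 2, size=(500, 100))  # hypothetical synthetic output
print("utility gap:", dimension_wise_utility(real, synth))
print("privacy proxy:", nn_distance_privacy(real, synth))
```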
Affiliation(s)
- Chao Yan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Yao Yan
- Sage Bionetworks, Seattle, WA, USA
- Zhiyu Wan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Ziqi Zhang
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Larsson Omberg
- Sage Bionetworks, Seattle, WA, USA
- Justin Guinney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Tempus Labs, Chicago, IL, USA
- Sean D. Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Bradley A. Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
3
Yan Y, Schaffter T, Bergquist T, Yu T, Prosser J, Aydin Z, Jabeer A, Brugere I, Gao J, Chen G, Causey J, Yao Y, Bryson K, Long DR, Jarvik JG, Lee CI, Wilcox A, Guinney J, Mooney S. A Continuously Benchmarked and Crowdsourced Challenge for Rapid Development and Evaluation of Models to Predict COVID-19 Diagnosis and Hospitalization. JAMA Netw Open 2021; 4:e2124946. [PMID: 34633425; PMCID: PMC8506231; DOI: 10.1001/jamanetworkopen.2021.24946]
Abstract
Importance Machine learning could be used to predict the likelihood of diagnosis and severity of illness. Lack of COVID-19 patient data has hindered the data science community in developing models to aid in the response to the pandemic. Objectives To describe the rapid development and evaluation of clinical algorithms to predict COVID-19 diagnosis and hospitalization using patient data by citizen scientists, provide an unbiased assessment of model performance, and benchmark model performance on subgroups. Design, Setting, and Participants This diagnostic and prognostic study operated a continuous, crowdsourced challenge using a model-to-data approach to securely enable the use of regularly updated COVID-19 patient data from the University of Washington by participants from May 6 to December 23, 2020. A postchallenge analysis was conducted from December 24, 2020, to April 7, 2021, to assess the generalizability of models on the cumulative data set as well as subgroups stratified by age, sex, race, and time of COVID-19 test. By December 23, 2020, the challenge had engaged 482 participants from 90 teams and 7 countries. Main Outcomes and Measures Machine learning algorithms used patient data to output a score that represented the probability of patients receiving a positive COVID-19 test result or being hospitalized within 21 days after receiving a positive COVID-19 test result. Algorithms were evaluated using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC) scores. Ensemble models aggregating models from the top challenge teams were developed and evaluated. Results In the analysis using the cumulative data set, the best performance for COVID-19 diagnosis prediction was an AUROC of 0.776 (95% CI, 0.775-0.777) and an AUPRC of 0.297, and for hospitalization prediction, an AUROC of 0.796 (95% CI, 0.794-0.798) and an AUPRC of 0.188. Analysis of the top models submitted to the challenge showed consistently better performance on the female group than on the male group. Among all age groups, the best performance was obtained for the 25- to 49-year age group, and the worst performance was obtained for the group aged 17 years or younger. Conclusions and Relevance In this diagnostic and prognostic study, models submitted by citizen scientists achieved high performance for the prediction of COVID-19 testing and hospitalization outcomes. Evaluation of challenge models on demographic subgroups and prospective data revealed performance discrepancies, providing insights into potential bias and limitations of the models.
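Two of the analyses described above, subgroup-stratified evaluation and ensembling of top-team models, can be illustrated with a short sketch. The column names, synthetic data, and simple score-averaging ensemble below are assumptions for illustration rather than the challenge's actual code.

```python
# Illustrative sketch only: subgroup-stratified AUROC plus a simple
# score-averaging ensemble. Column names, data, and the averaging rule are
# assumptions; the challenge's actual ensembling code is not reproduced here.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "label": rng.integers(0, 2, 2000),            # hypothetical outcomes
    "sex": rng.choice(["female", "male"], 2000),  # hypothetical subgroup
    "model_a": rng.random(2000),                  # hypothetical scores from two teams
    "model_b": rng.random(2000),
})

# Ensemble by averaging the per-model probability scores.
df["ensemble"] = df[["model_a", "model_b"]].mean(axis=1)

# Evaluate each model within each demographic subgroup.
for sex, grp in df.groupby("sex"):
    for col in ("model_a", "model_b", "ensemble"):
        print(sex, col, round(roc_auc_score(grp["label"], grp[col]), 3))
```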
Affiliation(s)
- Yao Yan
- Sage Bionetworks, Seattle, Washington
- Molecular Engineering and Sciences Institute, University of Washington, Seattle
- Timothy Bergquist
- Sage Bionetworks, Seattle, Washington
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle
- Thomas Yu
- Sage Bionetworks, Seattle, Washington
- Justin Prosser
- Institute of Translational Health Sciences, University of Washington, Seattle
- Zafer Aydin
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
- Amhar Jabeer
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
- Ivan Brugere
- Department of Computer Science, University of Illinois at Chicago, Chicago
- Jifan Gao
- Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison
- Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison
- Jason Causey
- Computer Science Department, College of Engineering and Computer Science, Arkansas State University, Jonesboro
- Arkansas AI-Campus, Center for No-Boundary Thinking, Arkansas State University, Jonesboro
- Yuxin Yao
- Department of Computer Science, University College London, London, United Kingdom
- Kevin Bryson
- Department of Computer Science, University College London, London, United Kingdom
- Dustin R. Long
- Division of Critical Care Medicine, Department of Anesthesiology and Pain Medicine, University of Washington, Seattle
- Jeffrey G. Jarvik
- The University of Washington Clinical Learning, Evidence And Research Center for Musculoskeletal Disorders, Seattle
- Department of Radiology, University of Washington School of Medicine, Seattle
- Christoph I. Lee
- Department of Radiology, University of Washington School of Medicine, Seattle
- Adam Wilcox
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle
- Sean Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle
4
Oh EJ, Parikh RB, Chivers C, Chen J. Two-Stage Approaches to Accounting for Patient Heterogeneity in Machine Learning Risk Prediction Models in Oncology. JCO Clin Cancer Inform 2021; 5:1015-1023. [PMID: 34591602; PMCID: PMC8812620; DOI: 10.1200/cci.21.00077]
Abstract
PURPOSE Machine learning models developed from electronic health records data have been increasingly used to predict risk of mortality for general oncology patients. However, these models may have suboptimal performance because of patient heterogeneity. The objective of this work is to develop a new modeling approach to predicting short-term mortality that accounts for heterogeneity across multiple subgroups in the presence of a large number of electronic health record predictors. METHODS We proposed a two-stage approach to addressing heterogeneity among oncology patients of different cancer types for predicting their risk of mortality. Structured data were extracted from the University of Pennsylvania Health System for 20,723 patients of 11 cancer types, where 1,340 (6.5%) patients were deceased. We first modeled the overall risk for all patients without differentiating cancer types, as is done in current practice. We then developed cancer type-specific models using the overall risk score as a predictor along with preselected type-specific predictors. The overall and type-specific models were compared with respect to discrimination using the area under the precision-recall curve (AUPRC) and calibration using the calibration slope. We also proposed metrics that characterize the degree of risk heterogeneity by comparing risk predictors in the overall and type-specific models. RESULTS The two-stage modeling resulted in improved calibration and discrimination across all 11 cancer types. The improvement in AUPRC was significant for hematologic malignancies including leukemia, lymphoma, and myeloma. For instance, the AUPRC increased from 0.358 to 0.519 (∆ = 0.161; 95% CI, 0.102 to 0.224) and from 0.299 to 0.354 (∆ = 0.055; 95% CI, 0.009 to 0.107) for leukemia and lymphoma, respectively. For all 11 cancer types, the two-stage approach generated well-calibrated risks. A high degree of heterogeneity between type-specific and overall risk predictors was observed for most cancer types. CONCLUSION Our two-stage modeling approach, which accounts for cancer type-specific risk heterogeneity, achieves better calibration and discrimination than a model agnostic to cancer types.
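The two-stage structure described above, an overall model followed by cancer type-specific models that take the overall risk score as a predictor, can be illustrated with a short sketch. The data, features, and choice of logistic regression below are assumptions, not the authors' implementation.

```python
# Illustrative sketch only of the two-stage idea: an overall model fit on all
# patients, then per-cancer-type models that include the overall risk score as
# a predictor. Data, features, and logistic regression are assumptions.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 3000
df = pd.DataFrame({
    "cancer_type": rng.choice(["leukemia", "lymphoma", "myeloma"], n),
    "x1": rng.normal(size=n),   # hypothetical predictor shared by all types
    "x2": rng.normal(size=n),   # hypothetical type-specific predictor
})
df["died"] = (rng.random(n) < 1 / (1 + np.exp(-(0.8 * df["x1"] - 2)))).astype(int)

# Stage 1: overall model, agnostic to cancer type.
overall = LogisticRegression().fit(df[["x1", "x2"]], df["died"])
df["overall_risk"] = overall.predict_proba(df[["x1", "x2"]])[:, 1]

# Stage 2: one model per cancer type, with the overall risk as an input feature.
type_models = {
    ctype: LogisticRegression().fit(grp[["overall_risk", "x2"]], grp["died"])
    for ctype, grp in df.groupby("cancer_type")
}

# Differences in how each type-specific model weights the overall risk score
# give a rough picture of risk heterogeneity across cancer types.
for ctype, model in type_models.items():
    print(ctype, "coefficient on overall risk:", round(model.coef_[0][0], 2))
```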
Affiliation(s)
- Eun Jeong Oh
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Ravi B. Parikh
- Department of Medical Ethics and Health Policy, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
- Corey Chivers
- University of Pennsylvania Health System, Philadelphia, PA
- Jinbo Chen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA