1
Yang Y, Zhang H, Gichoya JW, Katabi D, Ghassemi M. The limits of fair medical imaging AI in real-world generalization. Nat Med 2024. [PMID: 38942996 DOI: 10.1038/s41591-024-03113-4]
Abstract
As artificial intelligence (AI) rapidly approaches human-level performance in medical imaging, it is crucial that it does not exacerbate or propagate healthcare disparities. Previous research established AI's capacity to infer demographic data from chest X-rays, leading to a key concern: do models using demographic shortcuts have unfair predictions across subpopulations? In this study, we conducted a thorough investigation into the extent to which medical AI uses demographic encodings, focusing on potential fairness discrepancies within both in-distribution training sets and external test sets. Our analysis covers three key medical imaging disciplines-radiology, dermatology and ophthalmology-and incorporates data from six global chest X-ray datasets. We confirm that medical imaging AI leverages demographic shortcuts in disease classification. Although correcting shortcuts algorithmically effectively addresses fairness gaps to create 'locally optimal' models within the original data distribution, this optimality is not true in new test settings. Surprisingly, we found that models with less encoding of demographic attributes are often most 'globally optimal', exhibiting better fairness during model evaluation in new test environments. Our work establishes best practices for medical imaging models that maintain their performance and fairness in deployments beyond their initial training contexts, underscoring critical considerations for AI clinical deployments across populations and sites.
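The core evaluation in this abstract — comparing subgroup performance gaps within the training distribution and again in external test settings — can be illustrated with a small metric sketch. The snippet below is illustrative only (variable names such as `race_id` are placeholders, not the study's data or code) and uses the false-negative-rate gap as an example fairness measure.

```python
# Illustrative sketch (not the authors' code): contrast a model's per-group
# fairness gap on in-distribution vs. external test data.
import numpy as np

def false_negative_rate(y_true, y_pred):
    """FNR = FN / (FN + TP); returns NaN if the group has no positive cases."""
    positives = y_true == 1
    if positives.sum() == 0:
        return float("nan")
    return np.mean(y_pred[positives] == 0)

def fairness_gap(y_true, y_score, groups, threshold=0.5):
    """Largest difference in FNR between any two demographic subgroups."""
    y_pred = (y_score >= threshold).astype(int)
    rates = {g: false_negative_rate(y_true[groups == g], y_pred[groups == g])
             for g in np.unique(groups)}
    values = [v for v in rates.values() if not np.isnan(v)]
    return max(values) - min(values), rates

# gap_id, _  = fairness_gap(y_true_id,  score_id,  race_id)   # original distribution
# gap_ext, _ = fairness_gap(y_true_ext, score_ext, race_ext)  # new deployment site
# A model that looks "locally optimal" (small gap_id) may still show a large gap_ext.
```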
Affiliation(s)
- Yuzhe Yang
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Haoran Zhang
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Judy W Gichoya
  - Department of Radiology, Emory University School of Medicine, Atlanta, GA, USA
- Dina Katabi
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Marzyeh Ghassemi
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA, USA
2
Kale AU, Hogg HDJ, Pearson R, Glocker B, Golder S, Coombe A, Waring J, Liu X, Moore DJ, Denniston AK. Detecting Algorithmic Errors and Patient Harms for AI-Enabled Medical Devices in Randomized Controlled Trials: Protocol for a Systematic Review. JMIR Res Protoc 2024; 13:e51614. [PMID: 38941147 PMCID: PMC11245650 DOI: 10.2196/51614]
Abstract
BACKGROUND Artificial intelligence (AI) medical devices have the potential to transform existing clinical workflows and ultimately improve patient outcomes. AI medical devices have shown potential for a range of clinical tasks such as diagnostics, prognostics, and therapeutic decision-making such as drug dosing. There is, however, an urgent need to ensure that these technologies remain safe for all populations. Recent literature demonstrates the need for rigorous performance error analysis to identify issues such as algorithmic encoding of spurious correlations (eg, protected characteristics) or specific failure modes that may lead to patient harm. Guidelines for reporting on studies that evaluate AI medical devices require the mention of performance error analysis; however, there is still a lack of understanding around how performance errors should be analyzed in clinical studies, and what harms authors should aim to detect and report. OBJECTIVE This systematic review will assess the frequency and severity of AI errors and adverse events (AEs) in randomized controlled trials (RCTs) investigating AI medical devices as interventions in clinical settings. The review will also explore how performance errors are analyzed including whether the analysis includes the investigation of subgroup-level outcomes. METHODS This systematic review will identify and select RCTs assessing AI medical devices. Search strategies will be deployed in MEDLINE (Ovid), Embase (Ovid), Cochrane CENTRAL, and clinical trial registries to identify relevant papers. RCTs identified in bibliographic databases will be cross-referenced with clinical trial registries. The primary outcomes of interest are the frequency and severity of AI errors, patient harms, and reported AEs. Quality assessment of RCTs will be based on version 2 of the Cochrane risk-of-bias tool (RoB2). Data analysis will include a comparison of error rates and patient harms between study arms, and a meta-analysis of the rates of patient harm in control versus intervention arms will be conducted if appropriate. RESULTS The project was registered on PROSPERO in February 2023. Preliminary searches have been completed and the search strategy has been designed in consultation with an information specialist and methodologist. Title and abstract screening started in September 2023. Full-text screening is ongoing and data collection and analysis began in April 2024. CONCLUSIONS Evaluations of AI medical devices have shown promising results; however, reporting of studies has been variable. Detection, analysis, and reporting of performance errors and patient harms is vital to robustly assess the safety of AI medical devices in RCTs. Scoping searches have illustrated that the reporting of harms is variable, often with no mention of AEs. The findings of this systematic review will identify the frequency and severity of AI performance errors and patient harms and generate insights into how errors should be analyzed to account for both overall and subgroup performance. TRIAL REGISTRATION PROSPERO CRD42023387747; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=387747. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) PRR1-10.2196/51614.
Affiliation(s)
- Aditya U Kale
  - Institute of Inflammation and Ageing, University of Birmingham, Birmingham, United Kingdom
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
  - NIHR Birmingham Biomedical Research Centre, Birmingham, United Kingdom
  - NIHR Incubator for AI and Digital Health Research, Birmingham, United Kingdom
- Henry David Jeffry Hogg
  - Population Health Science Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
- Russell Pearson
  - Medicines and Healthcare Products Regulatory Agency, London, United Kingdom
- Ben Glocker
  - Kheiron Medical Technologies, London, United Kingdom
  - Department of Computing, Imperial College London, London, United Kingdom
- Su Golder
  - Department of Health Sciences, University of York, York, United Kingdom
- April Coombe
  - Institute of Applied Health Research, University of Birmingham, Birmingham, United Kingdom
- Justin Waring
  - Health Services Management Centre, University of Birmingham, Birmingham, United Kingdom
- Xiaoxuan Liu
  - Institute of Inflammation and Ageing, University of Birmingham, Birmingham, United Kingdom
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
  - NIHR Birmingham Biomedical Research Centre, Birmingham, United Kingdom
  - NIHR Incubator for AI and Digital Health Research, Birmingham, United Kingdom
- David J Moore
  - Institute of Applied Health Research, University of Birmingham, Birmingham, United Kingdom
- Alastair K Denniston
  - Institute of Inflammation and Ageing, University of Birmingham, Birmingham, United Kingdom
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
  - NIHR Birmingham Biomedical Research Centre, Birmingham, United Kingdom
  - NIHR Incubator for AI and Digital Health Research, Birmingham, United Kingdom
3
Stanley EAM, Souza R, Winder AJ, Gulve V, Amador K, Wilms M, Forkert ND. Towards objective and systematic evaluation of bias in artificial intelligence for medical imaging. J Am Med Inform Assoc 2024. [PMID: 38942737 DOI: 10.1093/jamia/ocae165]
Abstract
OBJECTIVE Artificial intelligence (AI) models trained using medical images for clinical tasks often exhibit bias in the form of subgroup performance disparities. However, since not all sources of bias in real-world medical imaging data are easily identifiable, it is challenging to comprehensively assess their impacts. In this article, we introduce an analysis framework for systematically and objectively investigating the impact of biases in medical images on AI models. MATERIALS AND METHODS Our framework utilizes synthetic neuroimages with known disease effects and sources of bias. We evaluated the impact of bias effects and the efficacy of 3 bias mitigation strategies in counterfactual data scenarios on a convolutional neural network (CNN) classifier. RESULTS The analysis revealed that training a CNN model on the datasets containing bias effects resulted in expected subgroup performance disparities. Moreover, reweighing was the most successful bias mitigation strategy for this setup. Finally, we demonstrated that explainable AI methods can aid in investigating the manifestation of bias in the model using this framework. DISCUSSION The value of this framework is showcased in our findings on the impact of bias scenarios and efficacy of bias mitigation in a deep learning model pipeline. This systematic analysis can be easily expanded to conduct further controlled in silico trials in other investigations of bias in medical imaging AI. CONCLUSION Our novel methodology for objectively studying bias in medical imaging AI can help support the development of clinical decision-support tools that are robust and responsible.
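For readers unfamiliar with the "reweighing" mitigation that the authors found most effective in this setup, a minimal sketch of the standard reweighing scheme (Kamiran and Calders style) is given below; the column names are assumptions, and this is not the authors' implementation.

```python
# Minimal sketch of reweighing: per-sample weights that make subgroup membership
# and the class label statistically independent under the weighted distribution.
import numpy as np
import pandas as pd

def reweighing_weights(df, group_col="subgroup", label_col="label"):
    """Weight = P(group) * P(label) / P(group, label) for each sample's cell."""
    n = len(df)
    p_group = df[group_col].value_counts(normalize=True)
    p_label = df[label_col].value_counts(normalize=True)
    p_joint = df.groupby([group_col, label_col]).size() / n
    weights = df.apply(
        lambda r: p_group[r[group_col]] * p_label[r[label_col]]
        / p_joint[(r[group_col], r[label_col])],
        axis=1,
    )
    return weights.values

# The resulting weights can be passed as per-sample weights to the training
# loss (e.g. weighted cross-entropy) of the CNN classifier.
```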
Affiliation(s)
- Emma A M Stanley
  - Biomedical Engineering Graduate Program, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
  - Department of Radiology, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
  - Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
- Raissa Souza
  - Biomedical Engineering Graduate Program, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
  - Department of Radiology, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
  - Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
- Anthony J Winder
  - Department of Radiology, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
  - Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
- Vedant Gulve
  - Department of Radiology, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
- Kimberly Amador
  - Biomedical Engineering Graduate Program, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
  - Department of Radiology, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
  - Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
- Matthias Wilms
  - Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
  - Department of Pediatrics, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
  - Department of Community Health Sciences, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
- Nils D Forkert
  - Department of Radiology, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
  - Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
  - Department of Community Health Sciences, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
  - Department of Clinical Neuroscience, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
  - Department of Electrical and Software Engineering, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
4
Meerwijk EL, McElfresh DC, Martins S, Tamang SR. Evaluating accuracy and fairness of clinical decision support algorithms when health care resources are limited. J Biomed Inform 2024; 156:104664. [PMID: 38851413 DOI: 10.1016/j.jbi.2024.104664]
Abstract
OBJECTIVE Guidance on how to evaluate accuracy and algorithmic fairness across subgroups is missing for clinical models that flag patients for an intervention when health care resources to administer that intervention are limited. We aimed to propose a framework of metrics that would fit this specific use case. METHODS We evaluated the following metrics and applied them to a Veterans Health Administration clinical model that flags patients for intervention who are at risk of overdose or a suicidal event among outpatients who were prescribed opioids (N = 405,817): receiver-operating characteristic curve and area under the curve, precision-recall curve, calibration-reliability curve, false positive rate, false negative rate, and false omission rate. In addition, we developed a new approach to visualize false positives and false negatives that we named 'per true positive bars.' We demonstrate the utility of these metrics for our use case for three cohorts of patients at the highest risk (top 0.5%, 1.0%, and 5.0%) by evaluating algorithmic fairness across the following age groups: ≤30, 31-50, 51-65, and >65 years old. RESULTS Metrics that allowed us to assess group differences more clearly were the false positive rate, false negative rate, false omission rate, and the new 'per true positive bars'. Metrics with limited utility for our use case were the receiver-operating characteristic curve and area under the curve, the calibration-reliability curve, and the precision-recall curve. CONCLUSION There is no "one size fits all" approach to model performance monitoring and bias analysis. Our work informs future researchers and clinicians who seek to evaluate accuracy and fairness of predictive models that identify patients to intervene on in the context of limited health care resources. In terms of ease of interpretation and utility for our use case, the new 'per true positive bars' may be the most intuitive to a range of stakeholders and facilitates choosing a threshold that allows weighing false positives against false negatives, which is especially important when predicting severe adverse events.
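The capacity-limited evaluation described here — flagging only the highest-risk fraction of patients and then comparing error rates across age groups — can be sketched as follows; the function, the example flag fraction, and the variable names are illustrative assumptions, not the study's code.

```python
# Sketch of the error metrics the authors found most informative (FPR, FNR,
# false omission rate), computed per age group at a capacity-limited cut-off,
# e.g. only the top 1% riskiest patients can receive the intervention.
import numpy as np

def capacity_limited_errors(y_true, risk_score, groups, flag_fraction=0.01):
    cutoff = np.quantile(risk_score, 1.0 - flag_fraction)  # flag top fraction
    flagged = risk_score >= cutoff
    out = {}
    for g in np.unique(groups):
        m = groups == g
        tp = np.sum(flagged[m] & (y_true[m] == 1))
        fp = np.sum(flagged[m] & (y_true[m] == 0))
        fn = np.sum(~flagged[m] & (y_true[m] == 1))
        tn = np.sum(~flagged[m] & (y_true[m] == 0))
        out[g] = {
            "FPR": fp / (fp + tn) if (fp + tn) else float("nan"),
            "FNR": fn / (fn + tp) if (fn + tp) else float("nan"),
            # false omission rate: share of un-flagged patients who had the event
            "FOR": fn / (fn + tn) if (fn + tn) else float("nan"),
        }
    return out
```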
Affiliation(s)
- Esther L Meerwijk
  - Program Evaluation and Resource Center, Office of Mental Health and Suicide Prevention, Department of Veterans Affairs, Menlo Park, CA, USA
  - VA Health Systems Research, Center for Innovation to Implementation (Ci2i), VA Palo Alto Health Care System, Menlo Park, CA, USA
- Duncan C McElfresh
  - Program Evaluation and Resource Center, Office of Mental Health and Suicide Prevention, Department of Veterans Affairs, Menlo Park, CA, USA
- Susana Martins
  - Program Evaluation and Resource Center, Office of Mental Health and Suicide Prevention, Department of Veterans Affairs, Menlo Park, CA, USA
- Suzanne R Tamang
  - Program Evaluation and Resource Center, Office of Mental Health and Suicide Prevention, Department of Veterans Affairs, Menlo Park, CA, USA
  - VA Health Systems Research, Center for Innovation to Implementation (Ci2i), VA Palo Alto Health Care System, Menlo Park, CA, USA
  - Department of Medicine, Stanford University, Stanford, CA, USA
5
Kotter E, Pinto Dos Santos D. [Ethics and artificial intelligence]. Radiologie (Heidelberg, Germany) 2024; 64:498-502. [PMID: 38499692 DOI: 10.1007/s00117-024-01286-0]
Abstract
The introduction of artificial intelligence (AI) into radiology promises to enhance efficiency and improve diagnostic accuracy, yet it also raises manifold ethical questions. These include data protection issues, the future role of radiologists, liability when using AI systems, and the avoidance of bias. To prevent data bias, the datasets need to be compiled carefully and to be representative of the target population. Accordingly, the upcoming European Union AI act sets particularly high requirements for the datasets used in training medical AI systems. Cognitive bias occurs when radiologists place too much trust in the results provided by AI systems (overreliance). So far, diagnostic AI systems are used almost exclusively as "second look" systems. If diagnostic AI systems are to be used in the future as "first look" systems or even as autonomous AI systems in order to enhance efficiency in radiology, the question of liability needs to be addressed, comparable to liability for autonomous driving. Such use of AI would also significantly change the role of radiologists.
Affiliation(s)
- Elmar Kotter
  - Klinik für Diagnostische und Interventionelle Radiologie, Universitätsklinikum Freiburg, Hugstetterstr. 55, 79106 Freiburg, Germany
- Daniel Pinto Dos Santos
  - Institut für Diagnostische und Interventionelle Radiologie, Uniklinik Köln, Kerpener Str. 62, 50937 Köln, Germany
  - Institut für Diagnostische und Interventionelle Radiologie, Universitätsklinik Frankfurt, Theodor-Stern-Kai 7, 60596 Frankfurt am Main, Germany
6
Restrepo D, Wu C, Vásquez-Venegas C, Nakayama LF, Celi LA, López DM. DF-DM: A foundational process model for multimodal data fusion in the artificial intelligence era. Research Square 2024 (preprint). [PMID: 38746100 PMCID: PMC11092829 DOI: 10.21203/rs.3.rs-4277992/v1]
Abstract
In the big data era, integrating diverse data modalities poses significant challenges, particularly in complex fields like healthcare. This paper introduces a new process model for multimodal Data Fusion for Data Mining, integrating embeddings and the Cross-Industry Standard Process for Data Mining with the existing Data Fusion Information Group model. Our model aims to decrease computational costs, complexity, and bias while improving efficiency and reliability. We also propose "disentangled dense fusion," a novel embedding fusion method designed to optimize mutual information and facilitate dense inter-modality feature interaction, thereby minimizing redundant information. We demonstrate the model's efficacy through three use cases: predicting diabetic retinopathy using retinal images and patient metadata, domestic violence prediction employing satellite imagery, internet, and census data, and identifying clinical and demographic features from radiography images and clinical notes. The model achieved a Macro F1 score of 0.92 in diabetic retinopathy prediction, an R-squared of 0.854 and sMAPE of 24.868 in domestic violence prediction, and a macro AUC of 0.92 and 0.99 for disease prediction and sex classification, respectively, in radiological analysis. These results underscore the Data Fusion for Data Mining model's potential to significantly impact multimodal data processing, promoting its adoption in diverse, resource-constrained settings.
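As a point of reference for the embedding-based fusion discussed above, the sketch below shows a generic late-fusion baseline: concatenating per-patient embeddings from two modalities and fitting a classifier. It is deliberately simple and is not the paper's "disentangled dense fusion" method; all variable names are placeholders.

```python
# Generic late-fusion baseline over modality embeddings (e.g. retinal image
# embeddings plus patient-metadata embeddings). Illustration only; this is not
# the DF-DM fusion scheme proposed by the authors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def late_fusion_classifier(img_emb, meta_emb, labels):
    """Standardise each modality's embedding, concatenate, and fit a classifier."""
    fused = np.concatenate(
        [StandardScaler().fit_transform(img_emb),
         StandardScaler().fit_transform(meta_emb)],
        axis=1,
    )
    return LogisticRegression(max_iter=1000).fit(fused, labels)
```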
Affiliation(s)
- David Restrepo
  - Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
  - Departamento de Telemática, Universidad del Cauca, Popayán, Cauca, Colombia
- Chenwei Wu
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, United States of America
- Luis Filipe Nakayama
  - Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
  - Department of Ophthalmology, São Paulo Federal University, São Paulo, São Paulo, Brazil
- Leo Anthony Celi
  - Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
  - Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
  - Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
- Diego M López
  - Departamento de Telemática, Universidad del Cauca, Popayán, Cauca, Colombia
7
Vaidya A, Chen RJ, Williamson DFK, Song AH, Jaume G, Yang Y, Hartvigsen T, Dyer EC, Lu MY, Lipkova J, Shaban M, Chen TY, Mahmood F. Demographic bias in misdiagnosis by computational pathology models. Nat Med 2024; 30:1174-1190. [PMID: 38641744 DOI: 10.1038/s41591-024-02885-z]
Abstract
Despite increasing numbers of regulatory approvals, deep learning-based computational pathology systems often overlook the impact of demographic factors on performance, potentially leading to biases. This concern is all the more important as computational pathology has leveraged large public datasets that underrepresent certain demographic groups. Using publicly available data from The Cancer Genome Atlas and the EBRAINS brain tumor atlas, as well as internal patient data, we show that whole-slide image classification models display marked performance disparities across different demographic groups when used to subtype breast and lung carcinomas and to predict IDH1 mutations in gliomas. For example, when using common modeling approaches, we observed performance gaps (in area under the receiver operating characteristic curve) between white and Black patients of 3.0% for breast cancer subtyping, 10.9% for lung cancer subtyping and 16.0% for IDH1 mutation prediction in gliomas. We found that richer feature representations obtained from self-supervised vision foundation models reduce performance variations between groups. These representations provide improvements upon weaker models even when those weaker models are combined with state-of-the-art bias mitigation strategies and modeling choices. Nevertheless, self-supervised vision foundation models do not fully eliminate these discrepancies, highlighting the continuing need for bias mitigation efforts in computational pathology. Finally, we demonstrate that our results extend to other demographic factors beyond patient race. Given these findings, we encourage regulatory and policy agencies to integrate demographic-stratified evaluation into their assessment guidelines.
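The demographic-stratified evaluation the authors recommend amounts to reporting per-group performance and its gap from the overall figure. A hedged sketch is shown below; variable names are placeholders, not the study's data or code.

```python
# Sketch of demographic-stratified evaluation: per-race AUROC and the gap
# relative to the overall AUROC of a slide-level classifier.
import numpy as np
from sklearn.metrics import roc_auc_score

def stratified_auc_gaps(y_true, y_score, race):
    overall = roc_auc_score(y_true, y_score)
    per_group, gaps = {}, {}
    for g in np.unique(race):
        m = race == g
        if len(np.unique(y_true[m])) < 2:   # AUROC undefined with one class only
            continue
        per_group[g] = roc_auc_score(y_true[m], y_score[m])
        gaps[g] = overall - per_group[g]
    return overall, per_group, gaps
```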
Affiliation(s)
- Anurag Vaidya
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
  - Health Sciences and Technology, Harvard-MIT, Cambridge, MA, USA
- Richard J Chen
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Drew F K Williamson
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Pathology and Laboratory Medicine, Emory University School of Medicine, Atlanta, GA, USA
- Andrew H Song
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Guillaume Jaume
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Yuzhe Yang
  - Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
- Thomas Hartvigsen
  - School of Data Science, University of Virginia, Charlottesville, VA, USA
- Emma C Dyer
  - T.H. Chan School of Public Health, Harvard University, Cambridge, MA, USA
- Ming Y Lu
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
  - Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
- Jana Lipkova
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Muhammad Shaban
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Tiffany Y Chen
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Faisal Mahmood
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
  - Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA
8
Wang R, Kuo PC, Chen LC, Seastedt KP, Gichoya JW, Celi LA. Drop the shortcuts: image augmentation improves fairness and decreases AI detection of race and other demographics from medical images. EBioMedicine 2024; 102:105047. [PMID: 38471396 PMCID: PMC10945176 DOI: 10.1016/j.ebiom.2024.105047]
Abstract
BACKGROUND It has been shown that AI models can learn race from medical images, leading to algorithmic bias. Our aim in this study was to enhance the fairness of medical image models by eliminating bias related to race, age, and sex. We hypothesise that models may be learning demographics via shortcut learning, and we combat this using image augmentation. METHODS This study included 44,953 patients who identified as Asian, Black, or White (mean age, 60.68 years ±18.21; 23,499 women) for a total of 194,359 chest X-rays (CXRs) from the MIMIC-CXR database. The CheXpert dataset, comprising 45,095 patients (mean age, 63.10 years ±18.14; 20,437 women) for a total of 134,300 CXRs, was used for external validation. We also collected 1195 3D brain magnetic resonance imaging (MRI) scans from the ADNI database, covering 273 participants with an average age of 76.97 years ±14.22, of whom 142 were female. DL models were trained on either non-augmented or augmented images and assessed using disparity metrics. The features learned by the models were analysed using task transfer experiments and model visualisation techniques. FINDINGS In the detection of radiological findings, training a model using augmented CXR images was shown to reduce disparities in error rate among racial groups (-5.45%), age groups (-13.94%), and sex (-22.22%). For Alzheimer's disease detection, the model trained with augmented MRI images showed 53.11% and 31.01% reductions of disparities in error rate among age and sex groups, respectively. Image augmentation led to a reduction in the model's ability to identify demographic attributes and resulted in the model trained for clinical purposes incorporating fewer demographic features. INTERPRETATION The model trained using the augmented images was less likely to be influenced by demographic information in detecting image labels. These results demonstrate that the proposed augmentation scheme could enhance the fairness of interpretations by DL models when dealing with data from patients with different demographic backgrounds. FUNDING National Science and Technology Council (Taiwan), National Institutes of Health.
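A rough idea of the kind of training-time image augmentation described above can be conveyed with a torchvision-style pipeline; the specific transforms and parameters below are assumptions for illustration rather than the authors' exact scheme.

```python
# Illustrative augmentation pipeline for training-set CXR images. Perturbing
# low-level cues during training can make it harder for the model to rely on
# demographic shortcuts, while the disease label is preserved.
from torchvision import transforms

train_augmentations = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # crop / zoom jitter
    transforms.RandomRotation(degrees=10),                  # small rotations
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),   # intensity perturbation
    transforms.ToTensor(),
])
```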
Affiliation(s)
- Ryan Wang
  - Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Po-Chih Kuo
  - Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Li-Ching Chen
  - Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Kenneth Patrick Seastedt
  - Department of Surgery, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
  - Department of Thoracic Surgery, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
- Leo Anthony Celi
  - Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Division of Pulmonary Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
9
Meissen F, Breuer S, Knolle M, Buyx A, Müller R, Kaissis G, Wiestler B, Rückert D. (Predictable) performance bias in unsupervised anomaly detection. EBioMedicine 2024; 101:105002. [PMID: 38335791 PMCID: PMC10873649 DOI: 10.1016/j.ebiom.2024.105002]
Abstract
BACKGROUND With the ever-increasing amount of medical imaging data, the demand for algorithms to assist clinicians has amplified. Unsupervised anomaly detection (UAD) models promise to aid in the crucial first step of disease detection. While previous studies have thoroughly explored fairness in supervised models in healthcare, for UAD, this has so far been unexplored. METHODS In this study, we evaluated how dataset composition regarding subgroups manifests in disparate performance of UAD models along multiple protected variables on three large-scale publicly available chest X-ray datasets. Our experiments were validated using two state-of-the-art UAD models for medical images. Finally, we introduced subgroup-AUROC (sAUROC), which aids in quantifying fairness in machine learning. FINDINGS Our experiments revealed empirical "fairness laws" (similar to "scaling laws" for Transformers) for training-dataset composition: Linear relationships between anomaly detection performance within a subpopulation and its representation in the training data. Our study further revealed performance disparities, even in the case of balanced training data, and compound effects that exacerbate the drop in performance for subjects associated with multiple adversely affected groups. INTERPRETATION Our study quantified the disparate performance of UAD models against certain demographic subgroups. Importantly, we showed that this unfairness cannot be mitigated by balanced representation alone. Instead, the representation of some subgroups seems harder to learn by UAD models than that of others. The empirical "fairness laws" discovered in our study make disparate performance in UAD models easier to estimate and aid in determining the most desirable dataset composition. FUNDING European Research Council Deep4MI.
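Subgroup-stratified AUROC of an anomaly detector can be sketched as below; this is one plausible formalisation for illustration, and readers should consult the paper for the precise sAUROC definition. Variable names are placeholders.

```python
# Sketch of a subgroup-AUROC-style metric: anomaly-detection AUROC evaluated
# within each protected subgroup of the test set.
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auroc(anomaly_score, is_anomalous, subgroup):
    scores = {}
    for g in np.unique(subgroup):
        m = subgroup == g
        if len(np.unique(is_anomalous[m])) < 2:
            continue  # need both normal and anomalous cases within the subgroup
        scores[g] = roc_auc_score(is_anomalous[m], anomaly_score[m])
    return scores

# Plotting these per-subgroup scores against each subgroup's share of the
# training data is one way to visualise the "fairness laws" described above.
```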
Affiliation(s)
- Felix Meissen
  - Chair for AI in Healthcare and Medicine, Klinikum rechts der Isar der Technischen Universität München, Einsteinstr. 25, Munich, 81675, Germany
- Svenja Breuer
  - Department of Science, Technology and Society, School of Social Sciences and Technology, Technical University of Munich, Arcisstr. 21, Munich, 80333, Germany
  - Department of Economics and Policy, School of Management, Technical University of Munich, Arcisstraße 21, 80333, Munich, Germany
- Moritz Knolle
  - Chair for AI in Healthcare and Medicine, Klinikum rechts der Isar der Technischen Universität München, Einsteinstr. 25, Munich, 81675, Germany
  - Konrad Zuse School of Excellence in Reliable AI, Munich Data Science Institute (MDSI), Walther-von-Dyck-Str. 10, Garching, 85748, Germany
- Alena Buyx
  - Department of Science, Technology and Society, School of Social Sciences and Technology, Technical University of Munich, Arcisstr. 21, Munich, 80333, Germany
  - Institute for History and Ethics of Medicine, School of Medicine, Technical University of Munich, Prinzregentenstraße 68, Munich, 81675, Germany
- Ruth Müller
  - Department of Science, Technology and Society, School of Social Sciences and Technology, Technical University of Munich, Arcisstr. 21, Munich, 80333, Germany
  - Department of Economics and Policy, School of Management, Technical University of Munich, Arcisstraße 21, 80333, Munich, Germany
- Georgios Kaissis
  - Chair for AI in Healthcare and Medicine, Klinikum rechts der Isar der Technischen Universität München, Einsteinstr. 25, Munich, 81675, Germany
  - Institute for Machine Learning in Biomedical Imaging, Helmholtz Munich, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
  - Department of Computing, Imperial College London, London, SW7 2AZ, UK
- Benedikt Wiestler
  - Department of Diagnostic and Interventional Neuroradiology, Klinikum rechts der Isar, Ismaninger Str. 22, Munich, 81675, Germany
  - TranslaTUM, Center for Translational Cancer Research, Technical University of Munich, Ismaninger Str. 22, Munich, 81675, Germany
- Daniel Rückert
  - Chair for AI in Healthcare and Medicine, Klinikum rechts der Isar der Technischen Universität München, Einsteinstr. 25, Munich, 81675, Germany
  - Department of Computing, Imperial College London, London, SW7 2AZ, UK
10
Khara G, Trivedi H, Newell MS, Patel R, Rijken T, Kecskemethy P, Glocker B. Generalisable deep learning method for mammographic density prediction across imaging techniques and self-reported race. Communications Medicine 2024; 4:21. [PMID: 38374436 PMCID: PMC10876691 DOI: 10.1038/s43856-024-00446-6]
Abstract
BACKGROUND Breast density is an important risk factor for breast cancer complemented by a higher risk of cancers being missed during screening of dense breasts due to reduced sensitivity of mammography. Automated, deep learning-based prediction of breast density could provide subject-specific risk assessment and flag difficult cases during screening. However, there is a lack of evidence for generalisability across imaging techniques and, importantly, across race. METHODS This study used a large, racially diverse dataset with 69,697 mammographic studies comprising 451,642 individual images from 23,057 female participants. A deep learning model was developed for four-class BI-RADS density prediction. A comprehensive performance evaluation assessed the generalisability across two imaging techniques, full-field digital mammography (FFDM) and two-dimensional synthetic (2DS) mammography. A detailed subgroup performance and bias analysis assessed the generalisability across participants' race. RESULTS Here we show that a model trained on FFDM-only achieves a 4-class BI-RADS classification accuracy of 80.5% (79.7-81.4) on FFDM and 79.4% (78.5-80.2) on unseen 2DS data. When trained on both FFDM and 2DS images, the performance increases to 82.3% (81.4-83.0) and 82.3% (81.3-83.1). Racial subgroup analysis shows unbiased performance across Black, White, and Asian participants, despite a separate analysis confirming that race can be predicted from the images with a high accuracy of 86.7% (86.0-87.4). CONCLUSIONS Deep learning-based breast density prediction generalises across imaging techniques and race. No substantial disparities are found for any subgroup, including races that were never seen during model development, suggesting that density predictions are unbiased.
Affiliation(s)
- Hari Trivedi
  - Winship Cancer Institute, Emory University, Atlanta, GA, USA
- Mary S Newell
  - Winship Cancer Institute, Emory University, Atlanta, GA, USA
- Ravi Patel
  - Kheiron Medical Technologies, London, UK
- Ben Glocker
  - Kheiron Medical Technologies, London, UK
  - Department of Computing, Imperial College London, London, UK
11
Weng WH, Sellergen A, Kiraly AP, D'Amour A, Park J, Pilgrim R, Pfohl S, Lau C, Natarajan V, Azizi S, Karthikesalingam A, Cole-Lewis H, Matias Y, Corrado GS, Webster DR, Shetty S, Prabhakara S, Eswaran K, Celi LAG, Liu Y. An intentional approach to managing bias in general purpose embedding models. Lancet Digit Health 2024; 6:e126-e130. [PMID: 38278614 DOI: 10.1016/s2589-7500(23)00227-3]
Abstract
Advances in machine learning for health care have brought concerns about bias from the research community; specifically, the introduction, perpetuation, or exacerbation of care disparities. Reinforcing these concerns is the finding that medical images often reveal signals about sensitive attributes in ways that are hard to pinpoint by both algorithms and people. This finding raises a question about how to best design general purpose pretrained embeddings (GPPEs, defined as embeddings meant to support a broad array of use cases) for building downstream models that are free from particular types of bias. The downstream model should be carefully evaluated for bias, and audited and improved as appropriate. However, in our view, well intentioned attempts to prevent the upstream components-GPPEs-from learning sensitive attributes can have unintended consequences on the downstream models. Despite producing a veneer of technical neutrality, the resultant end-to-end system might still be biased or poorly performing. We present reasons, by building on previously published data, to support the reasoning that GPPEs should ideally contain as much information as the original data contain, and highlight the perils of trying to remove sensitive attributes from a GPPE. We also emphasise that downstream prediction models trained for specific tasks and settings, whether developed using GPPEs or not, should be carefully designed and evaluated to avoid bias that makes models vulnerable to issues such as distributional shift. These evaluations should be done by a diverse team, including social scientists, on a diverse cohort representing the full breadth of the patient population for which the final model is intended.
Affiliation(s)
- Leo A G Celi
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Yun Liu
  - Google, Mountain View, CA, USA
12
Glocker B, Jones C, Roschewitz M, Winzeck S. Risk of Bias in Chest Radiography Deep Learning Foundation Models. Radiol Artif Intell 2023; 5:e230060. [PMID: 38074789 PMCID: PMC10698597 DOI: 10.1148/ryai.230060]
Abstract
PURPOSE To analyze a recently published chest radiography foundation model for the presence of biases that could lead to subgroup performance disparities across biologic sex and race. MATERIALS AND METHODS This Health Insurance Portability and Accountability Act-compliant retrospective study used 127 118 chest radiographs from 42 884 patients (mean age, 63 years ± 17 [SD]; 23 623 male, 19 261 female) from the CheXpert dataset that were collected between October 2002 and July 2017. To determine the presence of bias in features generated by a chest radiography foundation model and baseline deep learning model, dimensionality reduction methods together with two-sample Kolmogorov-Smirnov tests were used to detect distribution shifts across sex and race. A comprehensive disease detection performance analysis was then performed to associate any biases in the features to specific disparities in classification performance across patient subgroups. RESULTS Ten of 12 pairwise comparisons across biologic sex and race showed statistically significant differences in the studied foundation model, compared with four significant tests in the baseline model. Significant differences were found between male and female (P < .001) and Asian and Black (P < .001) patients in the feature projections that primarily capture disease. Compared with average model performance across all subgroups, classification performance on the "no finding" label decreased between 6.8% and 7.8% for female patients, and performance in detecting "pleural effusion" decreased between 10.7% and 11.6% for Black patients. CONCLUSION The studied chest radiography foundation model demonstrated racial and sex-related bias, which led to disparate performance across patient subgroups; thus, this model may be unsafe for clinical applications.
Keywords: Conventional Radiography, Computer Application-Detection/Diagnosis, Chest Radiography, Bias, Foundation Models. Supplemental material is available for this article. Published under a CC BY 4.0 license. See also commentary by Czum and Parr in this issue.
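The feature-level bias check described in the methods — dimensionality reduction followed by two-sample Kolmogorov-Smirnov tests across subgroups — can be approximated with the following sketch; PCA stands in for the paper's dimensionality-reduction step, and all variable names are placeholders.

```python
# Rough sketch: reduce foundation-model features with PCA and test whether
# their distribution differs between two subgroups using two-sample KS tests.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.decomposition import PCA

def feature_shift_tests(features, groups, group_a, group_b, n_components=4):
    z = PCA(n_components=n_components).fit_transform(features)
    results = {}
    for c in range(n_components):
        stat, p = ks_2samp(z[groups == group_a, c], z[groups == group_b, c])
        results[f"PC{c + 1}"] = (stat, p)
    return results  # small p-values suggest a distribution shift in that component

# e.g. feature_shift_tests(embeddings, sex, "Male", "Female")
```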
Affiliation(s)
- Ben Glocker
  - Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
- Charles Jones
  - Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
- Mélanie Roschewitz
  - Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
- Stefan Winzeck
  - Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
13
Baughan N, Whitney HM, Drukker K, Sahiner B, Hu T, Kim GH, McNitt-Gray M, Myers KJ, Giger ML. Sequestration of imaging studies in MIDRC: stratified sampling to balance demographic characteristics of patients in a multi-institutional data commons. J Med Imaging (Bellingham) 2023; 10:064501. [PMID: 38074627 PMCID: PMC10704184 DOI: 10.1117/1.jmi.10.6.064501]
Abstract
Purpose The Medical Imaging and Data Resource Center (MIDRC) is a multi-institutional effort to accelerate medical imaging machine intelligence research and create a publicly available image repository/commons as well as a sequestered commons for performance evaluation and benchmarking of algorithms. After de-identification, approximately 80% of the medical images and associated metadata become part of the open commons and 20% are sequestered from the open commons. To ensure that both commons are representative of the population available, we introduced a stratified sampling method to balance the demographic characteristics across the two datasets. Approach Our method uses multi-dimensional stratified sampling where several demographic variables of interest are sequentially used to separate the data into individual strata, each representing a unique combination of variables. Within each resulting stratum, patients are assigned to the open or sequestered commons. This algorithm was used on an example dataset containing 5000 patients using the variables of race, age, sex at birth, ethnicity, COVID-19 status, and image modality and compared resulting demographic distributions to naïve random sampling of the dataset over 2000 independent trials. Results Resulting prevalence of each demographic variable matched the prevalence from the input dataset within one standard deviation. Mann-Whitney U test results supported the hypothesis that sequestration by stratified sampling provided more balanced subsets than naïve randomization, except for demographic subcategories with very low prevalence. Conclusions The developed multi-dimensional stratified sampling algorithm can partition a large dataset while maintaining balance across several variables, superior to the balance achieved from naïve randomization.
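The multi-dimensional stratified sampling described in the approach can be sketched in a few lines of pandas; the column names and random seed below are illustrative assumptions, while the approximately 80/20 split follows the abstract.

```python
# Simplified sketch of multi-dimensional stratified sequestration: form strata
# from the joint demographic variables and split each stratum ~80/20 into the
# open and sequestered commons.
import numpy as np
import pandas as pd

def stratified_sequestration(patients: pd.DataFrame,
                             strata_cols=("race", "sex", "age_bin",
                                          "ethnicity", "covid_status", "modality"),
                             open_fraction=0.8, seed=0):
    rng = np.random.default_rng(seed)
    assignments = pd.Series(index=patients.index, dtype=object)
    for _, stratum in patients.groupby(list(strata_cols), dropna=False):
        idx = stratum.index.to_numpy()
        rng.shuffle(idx)                      # randomise order within the stratum
        n_open = int(round(open_fraction * len(idx)))
        assignments.loc[idx[:n_open]] = "open"
        assignments.loc[idx[n_open:]] = "sequestered"
    return assignments
```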
Affiliation(s)
- Natalie Baughan
  - University of Chicago, Department of Radiology, Chicago, Illinois, United States
- Heather M. Whitney
  - University of Chicago, Department of Radiology, Chicago, Illinois, United States
- Karen Drukker
  - University of Chicago, Department of Radiology, Chicago, Illinois, United States
- Berkman Sahiner
  - US Food and Drug Administration, Bethesda, Maryland, United States
- Tingting Hu
  - US Food and Drug Administration, Bethesda, Maryland, United States
- Grace Hyun Kim
  - University of California, Los Angeles, Los Angeles, California, United States
- Michael McNitt-Gray
  - University of California, Los Angeles, Los Angeles, California, United States
- Maryellen L. Giger
  - University of Chicago, Department of Radiology, Chicago, Illinois, United States
14
Brown A, Tomasev N, Freyberg J, Liu Y, Karthikesalingam A, Schrouff J. Detecting shortcut learning for fair medical AI using shortcut testing. Nat Commun 2023; 14:4314. [PMID: 37463884 DOI: 10.1038/s41467-023-39902-7]
Abstract
Machine learning (ML) holds great promise for improving healthcare, but it is critical to ensure that its use will not propagate or amplify health disparities. An important step is to characterize the (un)fairness of ML models-their tendency to perform differently across subgroups of the population-and to understand its underlying mechanisms. One potential driver of algorithmic unfairness, shortcut learning, arises when ML models base predictions on improper correlations in the training data. Diagnosing this phenomenon is difficult as sensitive attributes may be causally linked with disease. Using multitask learning, we propose a method to directly test for the presence of shortcut learning in clinical ML systems and demonstrate its application to clinical tasks in radiology and dermatology. Finally, our approach reveals instances when shortcutting is not responsible for unfairness, highlighting the need for a holistic approach to fairness mitigation in medical AI.
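A schematic of the multitask setup that shortcut testing builds on — a shared encoder with a clinical head and an auxiliary demographic head whose influence can be varied — is sketched below. This is a conceptual illustration under stated assumptions, not the authors' implementation; the paper describes modulating how strongly the encoder represents the sensitive attribute, and here a simple loss-weight sweep stands in for that idea.

```python
# Conceptual PyTorch sketch: shared encoder, disease head, and an auxiliary
# head predicting a sensitive attribute, with a tunable auxiliary loss weight.
import torch
import torch.nn as nn

class MultitaskModel(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int, n_attr_classes: int):
        super().__init__()
        self.encoder = encoder
        self.disease_head = nn.Linear(feat_dim, 1)
        self.attribute_head = nn.Linear(feat_dim, n_attr_classes)

    def forward(self, x):
        z = self.encoder(x)
        return self.disease_head(z), self.attribute_head(z)

def multitask_loss(disease_logit, attr_logit, disease_y, attr_y, attr_weight=0.1):
    bce = nn.functional.binary_cross_entropy_with_logits(
        disease_logit.squeeze(-1), disease_y.float())
    ce = nn.functional.cross_entropy(attr_logit, attr_y)
    # Sweeping attr_weight while tracking the fairness gap of the disease head
    # indicates whether encoding of the attribute (a shortcut) drives unfairness.
    return bce + attr_weight * ce
```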
Affiliation(s)
- Yuan Liu
  - Google Research, Palo Alto, CA, USA
15
Petersen E, Holm S, Ganz M, Feragen A. The path toward equal performance in medical machine learning. Patterns (New York, N.Y.) 2023; 4:100790. [PMID: 37521051 PMCID: PMC10382979 DOI: 10.1016/j.patter.2023.100790]
Abstract
To ensure equitable quality of care, differences in machine learning model performance between patient groups must be addressed. Here, we argue that two separate mechanisms can cause performance differences between groups. First, model performance may be worse than theoretically achievable in a given group. This can occur due to a combination of group underrepresentation, modeling choices, and the characteristics of the prediction task at hand. We examine scenarios in which underrepresentation leads to underperformance, scenarios in which it does not, and the differences between them. Second, the optimal achievable performance may also differ between groups due to differences in the intrinsic difficulty of the prediction task. We discuss several possible causes of such differences in task difficulty. In addition, challenges such as label biases and selection biases may confound both learning and performance evaluation. We highlight consequences for the path toward equal performance, and we emphasize that leveling up model performance may require gathering not only more data from underperforming groups but also better data. Throughout, we ground our discussion in real-world medical phenomena and case studies while also referencing relevant statistical theory.
Affiliation(s)
- Eike Petersen
  - DTU Compute, Technical University of Denmark, Richard Petersens Plads, 2800 Kgs. Lyngby, Denmark
  - Pioneer Centre for AI, Øster Voldgade 3, 1350 Copenhagen, Denmark
- Sune Holm
  - Pioneer Centre for AI, Øster Voldgade 3, 1350 Copenhagen, Denmark
  - Department of Food and Resource Economics, University of Copenhagen, Rolighedsvej 23, 1958 Frederiksberg C., Denmark
- Melanie Ganz
  - Pioneer Centre for AI, Øster Voldgade 3, 1350 Copenhagen, Denmark
  - Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark
  - Neurobiology Research Unit, Rigshospitalet, Inge Lehmanns Vej 6–8, 2100 Copenhagen, Denmark
- Aasa Feragen
  - DTU Compute, Technical University of Denmark, Richard Petersens Plads, 2800 Kgs. Lyngby, Denmark
  - Pioneer Centre for AI, Øster Voldgade 3, 1350 Copenhagen, Denmark
16
Chen RJ, Wang JJ, Williamson DFK, Chen TY, Lipkova J, Lu MY, Sahai S, Mahmood F. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat Biomed Eng 2023; 7:719-742. [PMID: 37380750 PMCID: PMC10632090 DOI: 10.1038/s41551-023-01056-8]
Abstract
In healthcare, the development and deployment of insufficiently fair systems of artificial intelligence (AI) can undermine the delivery of equitable care. Assessments of AI models stratified across subpopulations have revealed inequalities in how patients are diagnosed, treated and billed. In this Perspective, we outline fairness in machine learning through the lens of healthcare, and discuss how algorithmic biases (in data acquisition, genetic variation and intra-observer labelling variability, in particular) arise in clinical workflows and the resulting healthcare disparities. We also review emerging technology for mitigating biases via disentanglement, federated learning and model explainability, and their role in the development of AI-based software as a medical device.
Affiliation(s)
- Richard J Chen
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
  - Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Judy J Wang
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Boston University School of Medicine, Boston, MA, USA
- Drew F K Williamson
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Tiffany Y Chen
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Jana Lipkova
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Ming Y Lu
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
  - Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
  - Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Sharifa Sahai
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Faisal Mahmood
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
  - Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA
17
Fairness metrics for health AI: we have a long way to go. EBioMedicine 2023; 90:104525. [PMID: 36924621 PMCID: PMC10114188 DOI: 10.1016/j.ebiom.2023.104525]