1
Carrillo-Larco RM. Recognition of Patient Gender: A Machine Learning Preliminary Analysis Using Heart Sounds from Children and Adolescents. Pediatr Cardiol 2024. PMID: 38937337. DOI: 10.1007/s00246-024-03561-2.
Abstract
Research has shown that X-rays and fundus images can classify gender, age group, and race, raising concerns about bias and fairness in medical AI applications. However, the potential for physiological sounds to classify sociodemographic traits has not been investigated. Exploring this gap is crucial for understanding the implications and ensuring fairness in the field of medical sound analysis. We aimed to develop classifiers to determine gender (men/women) from heart sound recordings using machine learning (ML). This was a data-driven ML analysis. We utilized the open-access CirCor DigiScope Phonocardiogram Dataset, obtained from cardiac screening programs in Brazil; participants were volunteers under 21 years of age. Each participant completed a questionnaire and underwent a clinical examination, including electronic auscultation at four cardiac points: aortic (AV), mitral (MV), pulmonary (PV), and tricuspid (TV). We used Mel-frequency cepstral coefficients (MFCCs) to develop the ML classifiers. From each patient and from each auscultation sound recording, we extracted 10 MFCCs. In sensitivity analyses, we additionally extracted 20, 30, 40, and 50 MFCCs. The most effective gender classifier was developed using PV recordings (AUC ROC = 70.3%). The second best came from MV recordings (AUC ROC = 58.8%). AV and TV recordings produced classifiers with an AUC ROC of 56.4% and 56.1%, respectively. Using more MFCCs did not substantially improve the classifiers. It is possible to classify males and females using phonocardiogram data. As health-related audio recordings become more prominent in ML applications, research is required to explore whether these recordings contain signals that could distinguish sociodemographic features.
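The pipeline this abstract describes can be sketched in a few lines: summarize each recording by a small MFCC vector, then fit a binary sex classifier and report AUC. The sketch below uses synthetic 10-dimensional features as a stand-in (in practice the MFCCs would come from the recordings, e.g. via `librosa.feature.mfcc`); the effect size and the random-forest choice are illustrative assumptions, not the paper's exact model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for the extracted features: the paper summarizes each recording by
# 10 MFCCs; here we simulate them with a weak sex-related shift in a few
# coefficients (the shift size is an arbitrary illustrative choice).
n = 400
y = rng.integers(0, 2, n)          # 0 = male, 1 = female (synthetic labels)
X = rng.normal(size=(n, 10))       # 10 "MFCCs" per recording
X[:, :3] += 0.6 * y[:, None]       # weak signal, echoing the modest reported AUCs

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"sex-classification AUC on held-out data: {auc:.2f}")
```

With a deliberately weak signal the held-out AUC lands well below 1.0, mirroring the modest discriminative power reported above.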
Affiliation(s)
- Rodrigo M Carrillo-Larco
- Hubert Department of Global Health, Rollins School of Public Health, Emory University, Atlanta, GA, USA.
2
Tejani AS, Ng YS, Xi Y, Rayan JC. Understanding and Mitigating Bias in Imaging Artificial Intelligence. Radiographics 2024; 44:e230067. PMID: 38635456. DOI: 10.1148/rg.230067.
Abstract
Artificial intelligence (AI) algorithms are prone to bias at multiple stages of model development, with potential for exacerbating health disparities. However, bias in imaging AI is a complex topic that encompasses multiple coexisting definitions. Bias may refer to unequal preference for a person or group owing to preexisting attitudes or beliefs, either intentional or unintentional. In contrast, cognitive bias refers to systematic deviation from objective judgment due to reliance on heuristics, and statistical bias refers to differences between true and expected values, commonly manifesting as systematic error in model prediction (ie, a model with output unrepresentative of real-world conditions). Clinical decisions informed by biased models may lead to patient harm due to action on inaccurate AI results or exacerbate health inequities due to differing performance among patient populations. However, while inequitable bias can harm patients in this context, a mindful approach leveraging equitable bias can address underrepresentation of minority groups or rare diseases. Radiologists should also be aware of bias after AI deployment such as automation bias, or a tendency to agree with automated decisions despite contrary evidence. Understanding common sources of imaging AI bias and the consequences of using biased models can guide preventive measures to mitigate its impact. Accordingly, the authors focus on sources of bias at stages along the imaging machine learning life cycle, attempting to simplify potentially intimidating technical terminology for general radiologists using AI tools in practice or collaborating with data scientists and engineers for AI tool development. The authors review definitions of bias in AI, describe common sources of bias, and present recommendations to guide quality control measures to mitigate the impact of bias in imaging AI. Understanding the terms featured in this article will enable a proactive approach to identifying and mitigating bias in imaging AI.
Affiliation(s)
- Ali S Tejani
- From the Department of Radiology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390
- Yee Seng Ng
- From the Department of Radiology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390
- Yin Xi
- From the Department of Radiology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390
- Jesse C Rayan
- From the Department of Radiology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390
3
Vaidya A, Chen RJ, Williamson DFK, Song AH, Jaume G, Yang Y, Hartvigsen T, Dyer EC, Lu MY, Lipkova J, Shaban M, Chen TY, Mahmood F. Demographic bias in misdiagnosis by computational pathology models. Nat Med 2024; 30:1174-1190. PMID: 38641744. DOI: 10.1038/s41591-024-02885-z.
Abstract
Despite increasing numbers of regulatory approvals, deep learning-based computational pathology systems often overlook the impact of demographic factors on performance, potentially leading to biases. This concern is all the more important as computational pathology has leveraged large public datasets that underrepresent certain demographic groups. Using publicly available data from The Cancer Genome Atlas and the EBRAINS brain tumor atlas, as well as internal patient data, we show that whole-slide image classification models display marked performance disparities across different demographic groups when used to subtype breast and lung carcinomas and to predict IDH1 mutations in gliomas. For example, when using common modeling approaches, we observed performance gaps (in area under the receiver operating characteristic curve) between white and Black patients of 3.0% for breast cancer subtyping, 10.9% for lung cancer subtyping and 16.0% for IDH1 mutation prediction in gliomas. We found that richer feature representations obtained from self-supervised vision foundation models reduce performance variations between groups. These representations provide improvements upon weaker models even when those weaker models are combined with state-of-the-art bias mitigation strategies and modeling choices. Nevertheless, self-supervised vision foundation models do not fully eliminate these discrepancies, highlighting the continuing need for bias mitigation efforts in computational pathology. Finally, we demonstrate that our results extend to other demographic factors beyond patient race. Given these findings, we encourage regulatory and policy agencies to integrate demographic-stratified evaluation into their assessment guidelines.
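The demographic-stratified evaluation the authors advocate reduces to computing the metric per subgroup and reporting the gap. The sketch below does this on synthetic stand-in data (group labels, noise levels, and the 80/20 split are illustrative assumptions, not the paper's models or results).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic test set with an underrepresented subgroup whose model scores are
# noisier, producing a performance gap of the kind reported in the abstract.
n = 2000
group = rng.choice(["white", "black"], size=n, p=[0.8, 0.2])
y = rng.integers(0, 2, n)
noise = np.where(group == "black", 1.5, 0.8)     # noisier scores for the minority
scores = y + rng.normal(scale=noise)

# Stratified evaluation: one AUROC per demographic group, then the gap.
auc_by_group = {g: roc_auc_score(y[group == g], scores[group == g])
                for g in ("white", "black")}
gap = auc_by_group["white"] - auc_by_group["black"]
print({k: round(v, 3) for k, v in auc_by_group.items()}, f"AUC gap = {gap:.3f}")
```

An aggregate AUROC over the pooled test set would hide exactly this gap, which is the abstract's argument for stratified reporting.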
Affiliation(s)
- Anurag Vaidya
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Health Sciences and Technology, Harvard-MIT, Cambridge, MA, USA
- Richard J Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Drew F K Williamson
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology and Laboratory Medicine, Emory University School of Medicine, Atlanta, GA, USA
- Andrew H Song
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Guillaume Jaume
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Yuzhe Yang
- Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
- Thomas Hartvigsen
- School of Data Science, University of Virginia, Charlottesville, VA, USA
- Emma C Dyer
- T.H. Chan School of Public Health, Harvard University, Cambridge, MA, USA
- Ming Y Lu
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
- Jana Lipkova
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Muhammad Shaban
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Tiffany Y Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Faisal Mahmood
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA.
- Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA.
4
Hong GS, Jang M, Kyung S, Cho K, Jeong J, Lee GY, Shin K, Kim KD, Ryu SM, Seo JB, Lee SM, Kim N. Overcoming the Challenges in the Development and Implementation of Artificial Intelligence in Radiology: A Comprehensive Review of Solutions Beyond Supervised Learning. Korean J Radiol 2023; 24:1061-1080. PMID: 37724586. PMCID: PMC10613849. DOI: 10.3348/kjr.2023.0393.
Abstract
Artificial intelligence (AI) in radiology is a rapidly developing field with several prospective clinical studies demonstrating its benefits in clinical practice. In 2022, the Korean Society of Radiology held a forum to discuss the challenges and drawbacks in AI development and implementation. Various barriers hinder the successful application and widespread adoption of AI in radiology, such as limited annotated data, data privacy and security, data heterogeneity, imbalanced data, model interpretability, overfitting, and integration with clinical workflows. In this review, some of the various possible solutions to these challenges are presented and discussed; these include training with longitudinal and multimodal datasets, dense training with multitask learning and multimodal learning, self-supervised contrastive learning, various image modifications and syntheses using generative models, explainable AI, causal learning, federated learning with large data models, and digital twins.
Affiliation(s)
- Gil-Sun Hong
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Miso Jang
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Sunggu Kyung
- Department of Biomedical Engineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Kyungjin Cho
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Department of Biomedical Engineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Jiheon Jeong
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Grace Yoojin Lee
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Keewon Shin
- Laboratory for Biosignal Analysis and Perioperative Outcome Research, Biomedical Engineering Center, Asan Institute of Lifesciences, Asan Medical Center, Seoul, Republic of Korea
- Ki Duk Kim
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Seung Min Ryu
- Department of Orthopedic Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Joon Beom Seo
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Sang Min Lee
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.
- Namkug Kim
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.
5
Glocker B, Jones C, Roschewitz M, Winzeck S. Risk of Bias in Chest Radiography Deep Learning Foundation Models. Radiol Artif Intell 2023; 5:e230060. PMID: 38074789. PMCID: PMC10698597. DOI: 10.1148/ryai.230060.
Abstract
PURPOSE To analyze a recently published chest radiography foundation model for the presence of biases that could lead to subgroup performance disparities across biologic sex and race. MATERIALS AND METHODS This Health Insurance Portability and Accountability Act-compliant retrospective study used 127 118 chest radiographs from 42 884 patients (mean age, 63 years ± 17 [SD]; 23 623 male, 19 261 female) from the CheXpert dataset that were collected between October 2002 and July 2017. To determine the presence of bias in features generated by a chest radiography foundation model and baseline deep learning model, dimensionality reduction methods together with two-sample Kolmogorov-Smirnov tests were used to detect distribution shifts across sex and race. A comprehensive disease detection performance analysis was then performed to associate any biases in the features to specific disparities in classification performance across patient subgroups. RESULTS Ten of 12 pairwise comparisons across biologic sex and race showed statistically significant differences in the studied foundation model, compared with four significant tests in the baseline model. Significant differences were found between male and female (P < .001) and Asian and Black (P < .001) patients in the feature projections that primarily capture disease. Compared with average model performance across all subgroups, classification performance on the "no finding" label decreased between 6.8% and 7.8% for female patients, and performance in detecting "pleural effusion" decreased between 10.7% and 11.6% for Black patients. CONCLUSION The studied chest radiography foundation model demonstrated racial and sex-related bias, which led to disparate performance across patient subgroups; thus, this model may be unsafe for clinical applications.
Keywords: Conventional Radiography, Computer Application-Detection/Diagnosis, Chest Radiography, Bias, Foundation Models
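The feature-inspection step described in this abstract, dimensionality reduction followed by two-sample Kolmogorov-Smirnov tests between subgroups, can be sketched directly. The 64-dimensional synthetic features below stand in for the foundation model's embeddings, with sex deliberately encoded along one direction so the test has something to find.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

n, d = 1000, 64
sex = rng.integers(0, 2, n)               # 0 = male, 1 = female (synthetic)
feats = rng.normal(size=(n, d))           # stand-in image embeddings
feats[:, 0] += 1.5 * sex                  # deliberately encode sex in one direction

# Project to a few components, then test each component for a distribution
# shift between the sex subgroups.
proj = PCA(n_components=4).fit_transform(feats)
pvals = []
for k in range(proj.shape[1]):
    stat, p = ks_2samp(proj[sex == 0, k], proj[sex == 1, k])
    pvals.append(p)
    print(f"component {k}: KS statistic = {stat:.3f}, p = {p:.2e}")
```

A component with a tiny p-value flags a direction in the embedding space that separates the subgroups, which is the kind of encoded protected characteristic the study reports.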
Affiliation(s)
- Ben Glocker
- From the Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
- Charles Jones
- From the Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
- Mélanie Roschewitz
- From the Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
- Stefan Winzeck
- From the Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
6
Tripathi S, Gabriel K, Dheer S, Parajuli A, Augustin AI, Elahi A, Awan O, Dako F. Understanding Biases and Disparities in Radiology AI Datasets: A Review. J Am Coll Radiol 2023; 20:836-841. PMID: 37454752. DOI: 10.1016/j.jacr.2023.06.015.
Abstract
Artificial intelligence (AI) continues to show great potential in disease detection and diagnosis on medical imaging with increasingly high accuracy. An important component of AI model creation is dataset development for training, validation, and testing. Diverse and high-quality datasets are critical to ensure robust and unbiased AI models that maintain validity, especially in traditionally underserved populations globally. Yet publicly available datasets demonstrate problems with quality and inclusivity. In this literature review, the authors evaluate publicly available medical imaging datasets for demographic, geographic, genetic, and disease representation or lack thereof and call for an increased emphasis on dataset development to maximize the impact of AI models.
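The dataset audit this review calls for can start very simply: tabulate demographic representation in the dataset's metadata and flag groups below a threshold. The column names, toy metadata table, and 20% cutoff below are illustrative assumptions, not from the review.

```python
import pandas as pd

# Hypothetical metadata for a small imaging dataset; real audits would load the
# dataset's own demographics file instead.
meta = pd.DataFrame({
    "sex":  ["M", "M", "F", "M", "F", "M", "M", "M"],
    "race": ["white", "white", "white", "black", "white", "asian", "white", "white"],
})

flags = {}
for col in ("sex", "race"):
    share = meta[col].value_counts(normalize=True)          # group proportions
    flags[col] = sorted(share[share < 0.20].index)          # below-threshold groups
    print(f"{col}: {share.round(2).to_dict()}; underrepresented: {flags[col]}")
```

Even this crude tabulation surfaces the kind of representation gaps the review documents in public imaging datasets.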
Affiliation(s)
- Satvik Tripathi
- Department of Radiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania.
- Kyla Gabriel
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Suhani Dheer
- Department of Radiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania
- Aastha Parajuli
- Department of Radiology, Kathmandu University School of Medical Sciences, Dhulikhel, Nepal
- Ameena Elahi
- Department of Information Services, University of Pennsylvania Health System, Philadelphia, Pennsylvania
- Omar Awan
- Department of Radiology, University of Maryland School of Medicine, Baltimore, Maryland
- Farouk Dako
- Department of Radiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania
7
Petersen E, Holm S, Ganz M, Feragen A. The path toward equal performance in medical machine learning. Patterns (N Y) 2023; 4:100790. PMID: 37521051. PMCID: PMC10382979. DOI: 10.1016/j.patter.2023.100790.
Abstract
To ensure equitable quality of care, differences in machine learning model performance between patient groups must be addressed. Here, we argue that two separate mechanisms can cause performance differences between groups. First, model performance may be worse than theoretically achievable in a given group. This can occur due to a combination of group underrepresentation, modeling choices, and the characteristics of the prediction task at hand. We examine scenarios in which underrepresentation leads to underperformance, scenarios in which it does not, and the differences between them. Second, the optimal achievable performance may also differ between groups due to differences in the intrinsic difficulty of the prediction task. We discuss several possible causes of such differences in task difficulty. In addition, challenges such as label biases and selection biases may confound both learning and performance evaluation. We highlight consequences for the path toward equal performance, and we emphasize that leveling up model performance may require gathering not only more data from underperforming groups but also better data. Throughout, we ground our discussion in real-world medical phenomena and case studies while also referencing relevant statistical theory.
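The first mechanism discussed above, underrepresentation leading to avoidable underperformance, can be illustrated with a toy simulation: when a minority group's feature-label relationship differs from the majority's, a model trained on the pooled data fits the majority. The group sizes, weight vectors, and noise level below are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, w):
    """Labels depend on a group-specific weight vector over the same features."""
    X = rng.normal(size=(n, 5))
    y = (X @ w + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

w_major = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
w_minor = np.array([0.0, 0.0, 1.0, 1.0, 0.0])   # different predictive features

Xa, ya = make_group(1900, w_major)               # majority: 95% of training data
Xb, yb = make_group(100, w_minor)                # minority: 5%
clf = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

Xa_t, ya_t = make_group(1000, w_major)           # fresh test data per group
Xb_t, yb_t = make_group(1000, w_minor)
acc_major = clf.score(Xa_t, ya_t)
acc_minor = clf.score(Xb_t, yb_t)
print(f"majority accuracy = {acc_major:.2f}, minority accuracy = {acc_minor:.2f}")
```

Note the paper's complementary point: simply adding more minority samples helps only if the added data are informative, i.e., "leveling up" may require better data, not just more.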
Affiliation(s)
- Eike Petersen
- DTU Compute, Technical University of Denmark, Richard Pedersens Plads, 2800 Kgs. Lyngby, Denmark
- Pioneer Centre for AI, Øster Voldgade 3, 1350 Copenhagen, Denmark
- Sune Holm
- Pioneer Centre for AI, Øster Voldgade 3, 1350 Copenhagen, Denmark
- Department of Food and Resource Economics, University of Copenhagen, Rolighedsvej 23, 1958 Frederiksberg C., Denmark
- Melanie Ganz
- Pioneer Centre for AI, Øster Voldgade 3, 1350 Copenhagen, Denmark
- Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark
- Neurobiology Research Unit, Rigshospitalet, Inge Lehmanns Vej 6–8, 2100 Copenhagen, Denmark
- Aasa Feragen
- DTU Compute, Technical University of Denmark, Richard Pedersens Plads, 2800 Kgs. Lyngby, Denmark
- Pioneer Centre for AI, Øster Voldgade 3, 1350 Copenhagen, Denmark
8
Glocker B, Jones C, Bernhardt M, Winzeck S. Algorithmic encoding of protected characteristics in chest X-ray disease detection models. EBioMedicine 2023; 89:104467. PMID: 36791660. PMCID: PMC10025760. DOI: 10.1016/j.ebiom.2023.104467.
Abstract
BACKGROUND It has been rightfully emphasized that the use of AI for clinical decision making could amplify health disparities. An algorithm may encode protected characteristics, and then use this information for making predictions due to undesirable correlations in the (historical) training data. It remains unclear how we can establish whether such information is actually used. Besides the scarcity of data from underserved populations, very little is known about how dataset biases manifest in predictive models and how this may result in disparate performance. This article aims to shed some light on these issues by exploring methodology for subgroup analysis in image-based disease detection models. METHODS We utilize two publicly available chest X-ray datasets, CheXpert and MIMIC-CXR, to study performance disparities across race and biological sex in deep learning models. We explore test set resampling, transfer learning, multitask learning, and model inspection to assess the relationship between the encoding of protected characteristics and disease detection performance across subgroups. FINDINGS We confirm subgroup disparities in terms of shifted true and false positive rates which are partially removed after correcting for population and prevalence shifts in the test sets. We find that transfer learning alone is insufficient for establishing whether specific patient information is used for making predictions. The proposed combination of test-set resampling, multitask learning, and model inspection reveals valuable insights about the way protected characteristics are encoded in the feature representations of deep neural networks. INTERPRETATION Subgroup analysis is key for identifying performance disparities of AI models, but statistical differences across subgroups need to be taken into account when analyzing potential biases in disease detection. 
The proposed methodology provides a comprehensive framework for subgroup analysis enabling further research into the underlying causes of disparities. FUNDING European Research Council Horizon 2020, UK Research and Innovation.
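One ingredient of the framework described above, test-set resampling, can be sketched as follows: each subgroup's test set is resampled to a common disease prevalence, so that prevalence shift between groups is not mistaken for model bias when comparing true-positive rates. The data, target prevalence, and sample sizes below are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def resample_to_prevalence(y, idx_pool, prevalence, n, rng):
    """Draw n test indices from idx_pool with the requested positive prevalence."""
    pos = idx_pool[y[idx_pool] == 1]
    neg = idx_pool[y[idx_pool] == 0]
    n_pos = int(round(prevalence * n))
    return np.concatenate([rng.choice(pos, n_pos, replace=True),
                           rng.choice(neg, n - n_pos, replace=True)])

# Synthetic test set: two subgroups with different native disease prevalences
# (10% vs 30%) but identical model sensitivity (0.8) and false-positive rate (0.1).
n = 4000
group = rng.integers(0, 2, n)
y = (rng.random(n) < np.where(group == 1, 0.3, 0.1)).astype(int)
pred = np.where(y == 1, rng.random(n) < 0.8, rng.random(n) < 0.1).astype(int)

tpr_by_group = {}
for g in (0, 1):
    idx = resample_to_prevalence(y, np.where(group == g)[0], 0.2, 1000, rng)
    tpr_by_group[g] = pred[idx][y[idx] == 1].mean()
    print(f"group {g}: TPR at matched 20% prevalence = {tpr_by_group[g]:.2f}")
```

After matching prevalence, any remaining TPR gap is attributable to the model rather than to the case mix, which is the point of the correction described in the findings.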
Affiliation(s)
- Ben Glocker
- Department of Computing, Imperial College London, London, SW7 2AZ, UK.
- Charles Jones
- Department of Computing, Imperial College London, London, SW7 2AZ, UK
- Mélanie Bernhardt
- Department of Computing, Imperial College London, London, SW7 2AZ, UK
- Stefan Winzeck
- Department of Computing, Imperial College London, London, SW7 2AZ, UK
9
Deep learning in sex estimation from knee radiographs - A proof-of-concept study utilizing the Terry Anatomical Collection. Leg Med (Tokyo) 2023; 61:102211. PMID: 36738551. DOI: 10.1016/j.legalmed.2023.102211.
Abstract
Although knee measurements yield high classification rates in metric sex estimation, there is a paucity of studies exploring the knee in artificial intelligence-based sexing. This proof-of-concept study aimed to develop deep learning algorithms for sex estimation from radiographs of reconstructed cadaver knee joints belonging to the Terry Anatomical Collection. A total of 199 knee radiographs were obtained from 100 skeletons (46 male and 54 female cadavers; mean age at death 64.2 years, range 50-102 years) whose tibiofemoral joints were reconstructed in standard anatomical position. The AIDeveloper software was used to train, validate, and test neural network architectures in sex estimation based on image classification. Of the explored algorithms, an MhNet-based model reached the highest overall testing accuracy of 90.3%. The model was able to classify all females (100.0%) and most males (78.6%) correctly. These preliminary findings encourage further research on artificial intelligence-based methods in sex estimation from the knee joint. Combining radiographic data with automated and externally validated algorithms may establish valuable tools to be utilized in forensic anthropology.
10
Confounders mediate AI prediction of demographics in medical imaging. NPJ Digit Med 2022; 5:188. PMID: 36550271. PMCID: PMC9780355. DOI: 10.1038/s41746-022-00720-8.
Abstract
Deep learning has been shown to accurately assess "hidden" phenotypes from medical imaging beyond traditional clinician interpretation. Using large echocardiography datasets from two healthcare systems, we test whether it is possible to predict age, race, and sex from cardiac ultrasound images using deep learning algorithms and assess the impact of varying confounding variables. Using a total of 433,469 videos from Cedars-Sinai Medical Center and 99,909 videos from Stanford Medical Center, we trained video-based convolutional neural networks to predict age, sex, and race. We found that deep learning models were able to identify age and sex, while unable to reliably predict race. Without considering confounding differences between categories, the AI model predicted sex with an AUC of 0.85 (95% CI 0.84-0.86), age with a mean absolute error of 9.12 years (95% CI 9.00-9.25), and race with AUCs ranging from 0.63 to 0.71. When predicting race, we show that tuning the proportion of confounding variables (age or sex) in the training data significantly impacts model AUC (ranging from 0.53 to 0.85), while sex and age prediction was not particularly impacted by adjusting the race proportion in the training dataset (AUCs of 0.81-0.83 and 0.80-0.84, respectively). This suggests that a significant proportion of the AI's performance in predicting race could come from confounding features being detected. Further work remains to identify the particular imaging features that associate with demographic information and to better understand the risks of demographic identification in medical AI as it pertains to potentially perpetuating bias and disparities.
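The confounding mechanism this study describes can be reproduced in a toy simulation: if "race" labels are correlated with sex in the training data while the features genuinely encode only sex, the apparent race-prediction AUC tracks that correlation. Everything below (feature dimensions, effect size, correlation levels) is a synthetic illustration, not the study's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def simulate(corr, n=4000):
    sex = rng.integers(0, 2, n)
    # "race" agrees with sex with probability `corr` (0.5 = independent).
    race = np.where(rng.random(n) < corr, sex, 1 - sex)
    X = rng.normal(size=(n, 8))
    X[:, 0] += 1.5 * sex                  # features encode sex, not race
    clf = LogisticRegression().fit(X[: n // 2], race[: n // 2])
    return roc_auc_score(race[n // 2:], clf.predict_proba(X[n // 2:])[:, 1])

aucs = {}
for corr in (0.5, 0.7, 0.9):
    aucs[corr] = simulate(corr)
    print(f"sex-race correlation {corr}: apparent race AUC = {aucs[corr]:.2f}")
```

Apparent race-prediction performance rises with the confounder correlation even though the features carry no race signal at all, mirroring the AUC range (0.53 to 0.85) the study reports when tuning confounder proportions.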
11
Ieki H, Ito K, Saji M, Kawakami R, Nagatomo Y, Takada K, Kariyasu T, Machida H, Koyama S, Yoshida H, Kurosawa R, Matsunaga H, Miyazawa K, Ozaki K, Onouchi Y, Katsushika S, Matsuoka R, Shinohara H, Yamaguchi T, Kodera S, Higashikuni Y, Fujiu K, Akazawa H, Iguchi N, Isobe M, Yoshikawa T, Komuro I. Deep learning-based age estimation from chest X-rays indicates cardiovascular prognosis. Commun Med (Lond) 2022; 2:159. PMID: 36494479. PMCID: PMC9734197. DOI: 10.1038/s43856-022-00220-6.
Abstract
BACKGROUND In recent years, there has been considerable research on the use of artificial intelligence to estimate age and disease status from medical images. However, age estimation from chest X-ray (CXR) images has not been well studied and the clinical significance of estimated age has not been fully determined. METHODS To address this, we trained a deep neural network (DNN) model using more than 100,000 CXRs to estimate patients' age solely from CXRs. We applied our DNN to CXRs of 1562 consecutive hospitalized heart failure patients, and 3586 patients admitted to the intensive care unit with cardiovascular disease. RESULTS The DNN's estimated age (X-ray age) showed a strong, significant correlation with chronological age on the hold-out test data and independent test data. Elevated X-ray age was associated with worse clinical outcomes (heart failure readmission and all-cause death) in patients with heart failure, and with a worse prognosis in the 3586 patients admitted to the intensive care unit with cardiovascular disease. CONCLUSIONS Our results suggest that X-ray age can serve as a useful indicator of cardiovascular abnormalities, which will help clinicians to predict, prevent and manage cardiovascular diseases.
Affiliation(s)
- Hirotaka Ieki
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan; Department of Cardiology, Sakakibara Heart Institute, Tokyo, Japan
- Kaoru Ito
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Mike Saji
- Department of Cardiology, Sakakibara Heart Institute, Tokyo, Japan
- Rei Kawakami
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo, Japan
- Yuji Nagatomo
- Department of Cardiology, Sakakibara Heart Institute, Tokyo, Japan; Department of Cardiology, National Defense Medical College, Tokorozawa, Japan
- Kaori Takada
- Department of Radiology, Sakakibara Heart Institute, Tokyo, Japan
- Toshiya Kariyasu
- Department of Radiology, Sakakibara Heart Institute, Tokyo, Japan; Department of Radiology, Tokyo Women's Medical University, Medical Center East, Tokyo, Japan
- Haruhiko Machida
- Department of Radiology, Sakakibara Heart Institute, Tokyo, Japan; Department of Radiology, Tokyo Women's Medical University, Medical Center East, Tokyo, Japan
- Satoshi Koyama
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Hiroki Yoshida
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Ryo Kurosawa
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Hiroshi Matsunaga
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Kazuo Miyazawa
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Kouichi Ozaki
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Division for Genomic Medicine, Medical Genome Center, National Center for Geriatrics and Gerontology, Obu, Japan
- Yoshihiro Onouchi
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Department of Public Health, Chiba University Graduate School of Medicine, Chiba, Japan
- Susumu Katsushika
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Ryo Matsuoka
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Hiroki Shinohara
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Toshihiro Yamaguchi
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan; Center for Epidemiology and Preventive Medicine, The University of Tokyo Hospital, Tokyo, Japan
- Satoshi Kodera
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Yasutomi Higashikuni
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Katsuhito Fujiu
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Hiroshi Akazawa
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Nobuo Iguchi
- Department of Cardiology, Sakakibara Heart Institute, Tokyo, Japan
- Tsutomu Yoshikawa
- Department of Cardiology, Sakakibara Heart Institute, Tokyo, Japan
- Issei Komuro
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
12
Adleberg J, Wardeh A, Doo FX, Marinelli B, Cook TS, Mendelson DS, Kagen A. Predicting Patient Demographics From Chest Radiographs With Deep Learning. J Am Coll Radiol 2022; 19:1151-1161. [PMID: 35964688 DOI: 10.1016/j.jacr.2022.06.008] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 03/09/2022] [Revised: 06/13/2022] [Accepted: 06/21/2022] [Indexed: 11/29/2022]
Abstract
BACKGROUND Deep learning models are increasingly informing medical decision making, for instance, in the detection of acute intracranial hemorrhage and pulmonary embolism. However, many models are trained on medical image databases that poorly represent the diversity of the patients they serve. In turn, many artificial intelligence models may not perform as well on assisting providers with important medical decisions for underrepresented populations. PURPOSE To assess the ability of deep learning models to classify the self-reported gender, age, self-reported ethnicity, and insurance status of an individual patient from a given chest radiograph. METHODS Models were trained and tested with 55,174 radiographs in the MIMIC Chest X-ray (MIMIC-CXR) database. External validation data came from two separate databases, one from CheXpert and another from a multihospital urban health care system after institutional review board approval. Macro-averaged area under the curve (AUC) values were used to evaluate performance of models. Code used for this study is open-source and available at https://github.com/ai-bias/cxr-bias, and pixelstopatients.com/models/demographics. RESULTS Accuracy of models to predict gender was nearly perfect, with 0.999 (95% confidence interval: 0.99-0.99) AUC on held-out test data and 0.994 (0.99-0.99) and 0.997 (0.99-0.99) on external validation data. There was high accuracy to predict age and ethnicity, ranging from 0.854 (0.80-0.91) to 0.911 (0.88-0.94) AUC, and moderate accuracy to predict insurance status, with AUC ranging from 0.705 (0.60-0.81) on held-out test data to 0.675 (0.54-0.79) on external validation data. CONCLUSIONS Deep learning models can predict the age, self-reported gender, self-reported ethnicity, and insurance status of a patient from a chest radiograph. Visualization techniques are useful to ensure deep learning models function as intended and to demonstrate anatomical regions of interest. These models can be used to ensure that training data are diverse, thereby ensuring artificial intelligence models that work on diverse populations.
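The macro-averaged AUC reported in this abstract treats each demographic class one-vs-rest, computes a binary AUC per class, and averages with equal weight per class. A minimal stdlib sketch (not the authors' code; the score matrix and class names are invented for illustration) using the rank-statistic definition of AUC:

```python
def binary_auc(scores, labels):
    # AUC as the probability that a random positive outranks a random
    # negative (Mann-Whitney U statistic); ties count as half a win.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_auc(score_matrix, labels, classes):
    # One-vs-rest AUC per class, averaged with equal weight per class.
    aucs = []
    for k, c in enumerate(classes):
        y = [1 if lab == c else 0 for lab in labels]
        s = [row[k] for row in score_matrix]
        aucs.append(binary_auc(s, y))
    return sum(aucs) / len(aucs)

# Hypothetical two-class example: per-row class probabilities.
classes = ["male", "female"]
labels = ["male", "female", "male", "female"]
scores = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]]
auc = macro_auc(scores, labels, classes)  # perfect separation here -> 1.0
```

Macro averaging (as opposed to micro averaging) keeps rare classes from being drowned out, which matters when demographic groups are imbalanced.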
Affiliation(s)
- Jason Adleberg
- Department of Radiology, Mount Sinai Health System, New York, New York
- Amr Wardeh
- Department of Radiology, Upstate University Hospital, Syracuse, New York
- Florence X Doo
- Department of Radiology, Mount Sinai Health System, New York, New York
- Brett Marinelli
- Department of Radiology, Mount Sinai Health System, New York, New York
- Tessa S Cook
- Director, 3D and Advanced Imaging Laboratory and Director, Center for Practice Transformation in Radiology, Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- David S Mendelson
- Vice Chair, Informatics, Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, New York
- Alexander Kagen
- Site Chair, Department of Radiology, Mount Sinai West and Mount Sinai St. Luke's Hospitals, Icahn School of Medicine at Mount Sinai, New York, New York
13
Gichoya JW, Banerjee I, Bhimireddy AR, Burns JL, Celi LA, Chen LC, Correa R, Dullerud N, Ghassemi M, Huang SC, Kuo PC, Lungren MP, Palmer LJ, Price BJ, Purkayastha S, Pyrros AT, Oakden-Rayner L, Okechukwu C, Seyyed-Kalantari L, Trivedi H, Wang R, Zaiman Z, Zhang H. AI recognition of patient race in medical imaging: a modelling study. Lancet Digit Health 2022; 4:e406-e414. [PMID: 35568690 PMCID: PMC9650160 DOI: 10.1016/s2589-7500(22)00063-2] [Citation(s) in RCA: 122] [Impact Index Per Article: 61.0] [Received: 12/08/2021] [Revised: 03/03/2022] [Accepted: 03/18/2022] [Indexed: 02/01/2023]
Abstract
BACKGROUND Previous studies in medical imaging have shown disparate abilities of artificial intelligence (AI) to detect a person's race, yet there is no known correlate for race on medical imaging that would be obvious to human experts when interpreting the images. We aimed to conduct a comprehensive evaluation of the ability of AI to recognise a patient's racial identity from medical images. METHODS Using private (Emory CXR, Emory Chest CT, Emory Cervical Spine, and Emory Mammogram) and public (MIMIC-CXR, CheXpert, National Lung Cancer Screening Trial, RSNA Pulmonary Embolism CT, and Digital Hand Atlas) datasets, we evaluated, first, performance quantification of deep learning models in detecting race from medical images, including the ability of these models to generalise to external environments and across multiple imaging modalities. Second, we assessed possible confounding of anatomic and phenotypic population features by assessing the ability of these hypothesised confounders to detect race in isolation using regression models, and by re-evaluating the deep learning models by testing them on datasets stratified by these hypothesised confounding variables. Last, by exploring the effect of image corruptions on model performance, we investigated the underlying mechanism by which AI models can recognise race. FINDINGS In our study, we show that standard AI deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities, which was sustained under external validation conditions (x-ray imaging [area under the receiver operating characteristics curve (AUC) range 0·91-0·99], CT chest imaging [0·87-0·96], and mammography [0·81]). We also showed that this detection is not due to proxies or imaging-related surrogate covariates for race (eg, performance of possible confounders: body-mass index [AUC 0·55], disease distribution [0·61], and breast density [0·61]). Finally, we provide evidence to show that the ability of AI deep learning models persisted over all anatomical regions and frequency spectra of the images, suggesting that efforts to control this behaviour when it is undesirable will be challenging and demand further study. INTERPRETATION The results from our study emphasise that the ability of AI deep learning models to predict self-reported race is itself not the issue of importance. However, our finding that AI can accurately predict self-reported race, even from corrupted, cropped, and noised medical images, often when clinical experts cannot, creates an enormous risk for all model deployments in medical imaging. FUNDING National Institute of Biomedical Imaging and Bioengineering, MIDRC grant of National Institutes of Health, US National Science Foundation, National Library of Medicine of the National Institutes of Health, and Taiwan Ministry of Science and Technology.
14
Li D, Lin CT, Sulam J, Yi PH. Deep learning prediction of sex on chest radiographs: a potential contributor to biased algorithms. Emerg Radiol 2022; 29:365-370. [PMID: 35006495 DOI: 10.1007/s10140-022-02019-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/13/2021] [Accepted: 01/06/2022] [Indexed: 11/26/2022]
Abstract
BACKGROUND Deep convolutional neural networks (DCNNs) for diagnosis of disease on chest radiographs (CXR) have been shown to be biased against males or females if the datasets used to train them have unbalanced sex representation. Prior work has suggested that DCNNs can predict sex on CXR, which could aid forensic evaluations, but also be a source of bias. OBJECTIVE To (1) evaluate the performance of DCNNs for predicting sex across different datasets and architectures and (2) evaluate visual biomarkers used by DCNNs to predict sex on CXRs. MATERIALS AND METHODS Chest radiographs were obtained from the Stanford CheXpert and NIH ChestX-ray14 datasets, which comprised 224,316 and 112,120 CXRs, respectively. To control for dataset size and class imbalance, random undersampling was used to reduce each dataset to 97,560 images that were balanced for sex. Each dataset was randomly split into training (70%), validation (10%), and test (20%) sets. Four DCNN architectures pre-trained on ImageNet were used for transfer learning. DCNNs were externally validated using a test set from the opposing dataset. Performance was evaluated using area under the receiver operating characteristic curve (AUC). Class activation mapping (CAM) was used to generate heatmaps visualizing the regions contributing to the DCNN's prediction. RESULTS On the internal test set, DCNNs achieved AUROCs ranging from 0.98 to 0.99. On external validation, the models reached peak cross-dataset performance of 0.94 for the VGG19-Stanford model and 0.95 for the InceptionV3-NIH model. Heatmaps highlighted similar regions of attention between model architectures and datasets, localizing to the mediastinal and upper rib regions, as well as to the lower chest/diaphragmatic regions. CONCLUSION DCNNs trained on two large CXR datasets accurately predicted sex on internal and external test data with similar heatmap localizations across DCNN architectures and datasets. These findings support the notion that DCNNs can leverage imaging biomarkers to predict sex and potentially confound the accurate prediction of disease on CXRs and contribute to biased models. On the other hand, these DCNNs can be beneficial to emergency radiologists for forensic evaluations and identifying patient sex for patients whose identities are unknown, such as in acute trauma.
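The abstract controls for class imbalance by random undersampling, i.e. downsampling every class to the size of the smallest one before the train/validation/test split. A generic stdlib sketch of that step (not the authors' code; the item/label shapes are invented for illustration):

```python
import random

def undersample(items, labels, seed=0):
    # Randomly downsample every class to the size of the smallest class,
    # returning a shuffled list of (item, label) pairs.
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_class = {}
    for item, lab in zip(items, labels):
        by_class.setdefault(lab, []).append(item)
    n = min(len(v) for v in by_class.values())
    balanced = []
    for lab, members in by_class.items():
        for item in rng.sample(members, n):
            balanced.append((item, lab))
    rng.shuffle(balanced)
    return balanced

# Hypothetical example: 7 male and 3 female radiograph IDs -> 3 of each.
balanced = undersample(list(range(10)), ["M"] * 7 + ["F"] * 3)
```

Undersampling discards majority-class data; when the majority class is scarce, class-weighted losses or oversampling are common alternatives.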
Affiliation(s)
- David Li
- Faculty of Medicine, University of Ottawa, Roger Guindon Hall, 451 Smyth Rd #2044, Ottawa, ON, K1H 8M5, Canada
- University of Maryland Medical Intelligent Imaging (UM2II) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, Room 1172, Baltimore, MD, 21201, USA
- Cheng Ting Lin
- Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, 601 N Caroline St, Baltimore, MD, 21231, USA
- Jeremias Sulam
- Department of Biomedical Engineering, Johns Hopkins University, Clark 320B, 3400 N Charles St, Baltimore, MD, 21218, USA
- Paul H Yi
- University of Maryland Medical Intelligent Imaging (UM2II) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, Room 1172, Baltimore, MD, 21201, USA
15
Padash S, Mohebbian MR, Adams SJ, Henderson RDE, Babyn P. Pediatric chest radiograph interpretation: how far has artificial intelligence come? A systematic literature review. Pediatr Radiol 2022; 52:1568-1580. [PMID: 35460035 PMCID: PMC9033522 DOI: 10.1007/s00247-022-05368-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/11/2021] [Revised: 02/28/2022] [Accepted: 03/24/2022] [Indexed: 10/24/2022] Open
Abstract
Most artificial intelligence (AI) studies have focused primarily on adult imaging, with less attention to the unique aspects of pediatric imaging. The objectives of this study were to (1) identify all publicly available pediatric datasets and determine their potential utility and limitations for pediatric AI studies and (2) systematically review the literature to assess the current state of AI in pediatric chest radiograph interpretation. We searched PubMed, Web of Science and Embase to retrieve all studies from 1990 to 2021 that assessed AI for pediatric chest radiograph interpretation and abstracted the datasets used to train and test AI algorithms, approaches and performance metrics. Of 29 publicly available chest radiograph datasets, 2 datasets included solely pediatric chest radiographs, and 7 datasets included pediatric and adult patients. We identified 55 articles that implemented an AI model to interpret pediatric chest radiographs or pediatric and adult chest radiographs. Classification of chest radiographs as pneumonia was the most common application of AI, evaluated in 65% of the studies. Although many studies report high diagnostic accuracy, most algorithms were not validated on external datasets. Most AI studies for pediatric chest radiograph interpretation have focused on a limited number of diseases, and progress is hindered by a lack of large-scale pediatric chest radiograph datasets.
Affiliation(s)
- Sirwa Padash
- Department of Medical Imaging, University of Saskatchewan, 103 Hospital Drive, Saskatoon, Saskatchewan, S7N 0W8, Canada; Department of Radiology, Mayo Clinic, Rochester, MN, USA
- Mohammad Reza Mohebbian
- Department of Electrical and Computer Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Scott J. Adams
- Department of Medical Imaging, University of Saskatchewan, 103 Hospital Drive, Saskatoon, Saskatchewan, S7N 0W8, Canada
- Robert D. E. Henderson
- Department of Medical Imaging, University of Saskatchewan, 103 Hospital Drive, Saskatoon, Saskatchewan, S7N 0W8, Canada
- Paul Babyn
- Department of Medical Imaging, University of Saskatchewan, 103 Hospital Drive, Saskatoon, Saskatchewan, S7N 0W8, Canada
16
Herrmann P, Busana M, Cressoni M, Lotz J, Moerer O, Saager L, Meissner K, Quintel M, Gattinoni L. Using Artificial Intelligence for Automatic Segmentation of CT Lung Images in Acute Respiratory Distress Syndrome. Front Physiol 2021; 12:676118. [PMID: 34594233 PMCID: PMC8476971 DOI: 10.3389/fphys.2021.676118] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 03/04/2021] [Accepted: 08/17/2021] [Indexed: 01/17/2023] Open
Abstract
Knowledge of gas volume, tissue mass and recruitability measured by the quantitative CT scan analysis (CT-qa) is important when setting the mechanical ventilation in acute respiratory distress syndrome (ARDS). Yet, the manual segmentation of the lung requires a considerable workload. Our goal was to provide an automatic, clinically applicable and reliable lung segmentation procedure. Therefore, a convolutional neural network (CNN) was used to train an artificial intelligence (AI) algorithm on 15 healthy subjects (1,302 slices), 100 ARDS patients (12,279 slices), and 20 COVID-19 patients (1,817 slices). Eighty percent of this population was used for training, 20% for testing. The AI and manual segmentation at slice level were compared by intersection over union (IoU). The CT-qa variables were compared by regression and Bland-Altman analysis. The AI segmentation of a single patient required 5–10 s vs. 1–2 h for manual segmentation. At slice level, the algorithm showed on the test set an IoU across all CT slices of 91.3 ± 10.0, 85.2 ± 13.9, and 84.7 ± 14.0%, and across all lung volumes of 96.3 ± 0.6, 88.9 ± 3.1, and 86.3 ± 6.5% for normal lungs, ARDS and COVID-19, respectively, with a U-shape in the performance: better in the lung middle region, worse at the apex and base. At patient level, on the test set, the total lung volume measured by AI and manual segmentation had an R2 of 0.99 and a bias of −9.8 ml [CI: +56.0/−75.7 ml]. The recruitability measured with manual and AI segmentation, as change in non-aerated tissue fraction, had a bias of +0.3% [CI: +6.2/−5.5%] and −0.5% [CI: +2.3/−3.3%] expressed as change in well-aerated tissue fraction. The AI-powered lung segmentation provided fast and clinically reliable results. It is able to segment the lungs of seriously ill ARDS patients fully automatically.
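The slice-level agreement metric above, intersection over union (IoU), is the ratio of pixels both masks label as lung to pixels either mask labels as lung. A minimal sketch on flattened binary masks (the tiny example masks are invented for illustration, not CT data):

```python
def iou(mask_a, mask_b):
    # Intersection over union of two same-length binary masks
    # (1 = lung pixel, 0 = background).
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 and b == 1)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 or b == 1)
    return inter / union if union else 1.0  # two empty masks agree fully

# Hypothetical 6-pixel slice: intersection = 2, union = 4 -> IoU = 0.5.
a = [1, 1, 1, 0, 0, 0]
b = [0, 1, 1, 1, 0, 0]
score = iou(a, b)
```

On real CT volumes the same computation is typically vectorized over 2-D or 3-D arrays, but the definition is unchanged.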
Affiliation(s)
- Peter Herrmann
- Department of Anesthesiology, University Medical Center Göttingen, Göttingen, Germany
- Mattia Busana
- Department of Anesthesiology, University Medical Center Göttingen, Göttingen, Germany
- Joachim Lotz
- Institute for Diagnostic and Interventional Radiology, University Medical Center Göttingen, Göttingen, Germany
- Onnen Moerer
- Department of Anesthesiology, University Medical Center Göttingen, Göttingen, Germany
- Leif Saager
- Department of Anesthesiology, University Medical Center Göttingen, Göttingen, Germany
- Konrad Meissner
- Department of Anesthesiology, University Medical Center Göttingen, Göttingen, Germany
- Michael Quintel
- Department of Anesthesiology, University Medical Center Göttingen, Göttingen, Germany; Department of Anesthesiology, DONAUISAR Klinikum Deggendorf, Deggendorf, Germany
- Luciano Gattinoni
- Department of Anesthesiology, University Medical Center Göttingen, Göttingen, Germany