1
Carrillo-Larco RM. Recognition of Patient Gender: A Machine Learning Preliminary Analysis Using Heart Sounds from Children and Adolescents. Pediatr Cardiol 2024. PMID: 38937337. DOI: 10.1007/s00246-024-03561-2.
Abstract
Research has shown that X-rays and fundus images can classify gender, age group, and race, raising concerns about bias and fairness in medical AI applications. However, the potential for physiological sounds to classify sociodemographic traits has not been investigated. Exploring this gap is crucial for understanding the implications and ensuring fairness in the field of medical sound analysis. We aimed to develop classifiers to determine gender (men/women) from heart sound recordings using machine learning (ML). This was a data-driven ML analysis. We utilized the open-access CirCor DigiScope Phonocardiogram Dataset, obtained from cardiac screening programs in Brazil; participants were volunteers under 21 years of age. Each participant completed a questionnaire and underwent a clinical examination, including electronic auscultation at four cardiac points: aortic (AV), mitral (MV), pulmonary (PV), and tricuspid (TV). We used Mel-frequency cepstral coefficients (MFCCs) to develop the ML classifiers. From each patient and from each auscultation sound recording, we extracted 10 MFCCs. In sensitivity analyses, we additionally extracted 20, 30, 40, and 50 MFCCs. The most effective gender classifier was developed using PV recordings (AUC ROC = 70.3%). The second best came from MV recordings (AUC ROC = 58.8%). AV and TV recordings produced classifiers with an AUC ROC of 56.4% and 56.1%, respectively. Using more MFCCs did not substantially improve the classifiers. It is possible to classify males and females using phonocardiogram data. As health-related audio recordings become more prominent in ML applications, research is required to explore whether these recordings contain signals that could distinguish sociodemographic features.
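The pipeline this abstract describes can be sketched in a few lines: summarize each recording by a small MFCC vector, then fit a binary sex classifier and report AUC. The sketch below uses synthetic 10-dimensional features as a stand-in (in practice the MFCCs would come from the recordings, e.g. via `librosa.feature.mfcc`); the effect size and the random-forest choice are illustrative assumptions, not the paper's exact model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for the extracted features: the paper summarizes each recording by
# 10 MFCCs; here we simulate them with a weak sex-related shift in a few
# coefficients (the shift size is an arbitrary illustrative choice).
n = 400
y = rng.integers(0, 2, n)          # 0 = male, 1 = female (synthetic labels)
X = rng.normal(size=(n, 10))       # 10 "MFCCs" per recording
X[:, :3] += 0.6 * y[:, None]       # weak signal, echoing the modest reported AUCs

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"sex-classification AUC on held-out data: {auc:.2f}")
```

With a deliberately weak signal the held-out AUC lands well below 1.0, mirroring the modest discriminative power reported above.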
Affiliation(s)
- Rodrigo M Carrillo-Larco
- Hubert Department of Global Health, Rollins School of Public Health, Emory University, Atlanta, GA, USA.
2
Tejani AS, Ng YS, Xi Y, Rayan JC. Understanding and Mitigating Bias in Imaging Artificial Intelligence. Radiographics 2024; 44:e230067. PMID: 38635456. DOI: 10.1148/rg.230067.
Abstract
Artificial intelligence (AI) algorithms are prone to bias at multiple stages of model development, with potential for exacerbating health disparities. However, bias in imaging AI is a complex topic that encompasses multiple coexisting definitions. Bias may refer to unequal preference for a person or group owing to preexisting attitudes or beliefs, either intentional or unintentional. In contrast, cognitive bias refers to systematic deviation from objective judgment due to reliance on heuristics, and statistical bias refers to differences between true and expected values, commonly manifesting as systematic error in model prediction (ie, a model with output unrepresentative of real-world conditions). Clinical decisions informed by biased models may lead to patient harm due to action on inaccurate AI results or exacerbate health inequities due to differing performance among patient populations. However, while inequitable bias can harm patients in this context, a mindful approach leveraging equitable bias can address underrepresentation of minority groups or rare diseases. Radiologists should also be aware of bias after AI deployment such as automation bias, or a tendency to agree with automated decisions despite contrary evidence. Understanding common sources of imaging AI bias and the consequences of using biased models can guide preventive measures to mitigate its impact. Accordingly, the authors focus on sources of bias at stages along the imaging machine learning life cycle, attempting to simplify potentially intimidating technical terminology for general radiologists using AI tools in practice or collaborating with data scientists and engineers for AI tool development. The authors review definitions of bias in AI, describe common sources of bias, and present recommendations to guide quality control measures to mitigate the impact of bias in imaging AI. Understanding the terms featured in this article will enable a proactive approach to identifying and mitigating bias in imaging AI.
Affiliation(s)
- Ali S Tejani
- From the Department of Radiology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390
- Yee Seng Ng
- From the Department of Radiology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390
- Yin Xi
- From the Department of Radiology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390
- Jesse C Rayan
- From the Department of Radiology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390
3
Vaidya A, Chen RJ, Williamson DFK, Song AH, Jaume G, Yang Y, Hartvigsen T, Dyer EC, Lu MY, Lipkova J, Shaban M, Chen TY, Mahmood F. Demographic bias in misdiagnosis by computational pathology models. Nat Med 2024; 30:1174-1190. PMID: 38641744. DOI: 10.1038/s41591-024-02885-z.
Abstract
Despite increasing numbers of regulatory approvals, deep learning-based computational pathology systems often overlook the impact of demographic factors on performance, potentially leading to biases. This concern is all the more important as computational pathology has leveraged large public datasets that underrepresent certain demographic groups. Using publicly available data from The Cancer Genome Atlas and the EBRAINS brain tumor atlas, as well as internal patient data, we show that whole-slide image classification models display marked performance disparities across different demographic groups when used to subtype breast and lung carcinomas and to predict IDH1 mutations in gliomas. For example, when using common modeling approaches, we observed performance gaps (in area under the receiver operating characteristic curve) between white and Black patients of 3.0% for breast cancer subtyping, 10.9% for lung cancer subtyping and 16.0% for IDH1 mutation prediction in gliomas. We found that richer feature representations obtained from self-supervised vision foundation models reduce performance variations between groups. These representations provide improvements upon weaker models even when those weaker models are combined with state-of-the-art bias mitigation strategies and modeling choices. Nevertheless, self-supervised vision foundation models do not fully eliminate these discrepancies, highlighting the continuing need for bias mitigation efforts in computational pathology. Finally, we demonstrate that our results extend to other demographic factors beyond patient race. Given these findings, we encourage regulatory and policy agencies to integrate demographic-stratified evaluation into their assessment guidelines.
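The demographic-stratified evaluation the authors advocate reduces to computing the metric per subgroup and reporting the gap. The sketch below does this on synthetic stand-in data (group labels, noise levels, and the 80/20 split are illustrative assumptions, not the paper's models or results).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic test set with an underrepresented subgroup whose model scores are
# noisier, producing a performance gap of the kind reported in the abstract.
n = 2000
group = rng.choice(["white", "black"], size=n, p=[0.8, 0.2])
y = rng.integers(0, 2, n)
noise = np.where(group == "black", 1.5, 0.8)     # noisier scores for the minority
scores = y + rng.normal(scale=noise)

# Stratified evaluation: one AUROC per demographic group, then the gap.
auc_by_group = {g: roc_auc_score(y[group == g], scores[group == g])
                for g in ("white", "black")}
gap = auc_by_group["white"] - auc_by_group["black"]
print({k: round(v, 3) for k, v in auc_by_group.items()}, f"AUC gap = {gap:.3f}")
```

An aggregate AUROC over the pooled test set would hide exactly this gap, which is the abstract's argument for stratified reporting.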
Affiliation(s)
- Anurag Vaidya
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Health Sciences and Technology, Harvard-MIT, Cambridge, MA, USA
- Richard J Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Drew F K Williamson
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology and Laboratory Medicine, Emory University School of Medicine, Atlanta, GA, USA
- Andrew H Song
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Guillaume Jaume
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Yuzhe Yang
- Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
- Thomas Hartvigsen
- School of Data Science, University of Virginia, Charlottesville, VA, USA
- Emma C Dyer
- T.H. Chan School of Public Health, Harvard University, Cambridge, MA, USA
- Ming Y Lu
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
- Jana Lipkova
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Muhammad Shaban
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Tiffany Y Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Faisal Mahmood
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA.
- Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA.
4
Hong GS, Jang M, Kyung S, Cho K, Jeong J, Lee GY, Shin K, Kim KD, Ryu SM, Seo JB, Lee SM, Kim N. Overcoming the Challenges in the Development and Implementation of Artificial Intelligence in Radiology: A Comprehensive Review of Solutions Beyond Supervised Learning. Korean J Radiol 2023; 24:1061-1080. PMID: 37724586. PMCID: PMC10613849. DOI: 10.3348/kjr.2023.0393.
Abstract
Artificial intelligence (AI) in radiology is a rapidly developing field with several prospective clinical studies demonstrating its benefits in clinical practice. In 2022, the Korean Society of Radiology held a forum to discuss the challenges and drawbacks in AI development and implementation. Various barriers hinder the successful application and widespread adoption of AI in radiology, such as limited annotated data, data privacy and security, data heterogeneity, imbalanced data, model interpretability, overfitting, and integration with clinical workflows. In this review, some of the various possible solutions to these challenges are presented and discussed; these include training with longitudinal and multimodal datasets, dense training with multitask learning and multimodal learning, self-supervised contrastive learning, various image modifications and syntheses using generative models, explainable AI, causal learning, federated learning with large data models, and digital twins.
Affiliation(s)
- Gil-Sun Hong
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Miso Jang
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Sunggu Kyung
- Department of Biomedical Engineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Kyungjin Cho
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Department of Biomedical Engineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Jiheon Jeong
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Grace Yoojin Lee
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Keewon Shin
- Laboratory for Biosignal Analysis and Perioperative Outcome Research, Biomedical Engineering Center, Asan Institute of Lifesciences, Asan Medical Center, Seoul, Republic of Korea
- Ki Duk Kim
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Seung Min Ryu
- Department of Orthopedic Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Joon Beom Seo
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Sang Min Lee
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.
- Namkug Kim
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.
5
Glocker B, Jones C, Roschewitz M, Winzeck S. Risk of Bias in Chest Radiography Deep Learning Foundation Models. Radiol Artif Intell 2023; 5:e230060. PMID: 38074789. PMCID: PMC10698597. DOI: 10.1148/ryai.230060.
Abstract
PURPOSE To analyze a recently published chest radiography foundation model for the presence of biases that could lead to subgroup performance disparities across biologic sex and race. MATERIALS AND METHODS This Health Insurance Portability and Accountability Act-compliant retrospective study used 127 118 chest radiographs from 42 884 patients (mean age, 63 years ± 17 [SD]; 23 623 male, 19 261 female) from the CheXpert dataset that were collected between October 2002 and July 2017. To determine the presence of bias in features generated by a chest radiography foundation model and baseline deep learning model, dimensionality reduction methods together with two-sample Kolmogorov-Smirnov tests were used to detect distribution shifts across sex and race. A comprehensive disease detection performance analysis was then performed to associate any biases in the features to specific disparities in classification performance across patient subgroups. RESULTS Ten of 12 pairwise comparisons across biologic sex and race showed statistically significant differences in the studied foundation model, compared with four significant tests in the baseline model. Significant differences were found between male and female (P < .001) and Asian and Black (P < .001) patients in the feature projections that primarily capture disease. Compared with average model performance across all subgroups, classification performance on the "no finding" label decreased between 6.8% and 7.8% for female patients, and performance in detecting "pleural effusion" decreased between 10.7% and 11.6% for Black patients. CONCLUSION The studied chest radiography foundation model demonstrated racial and sex-related bias, which led to disparate performance across patient subgroups; thus, this model may be unsafe for clinical applications.
Keywords: Conventional Radiography, Computer Application-Detection/Diagnosis, Chest Radiography, Bias, Foundation Models
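The feature-inspection step described in this abstract, dimensionality reduction followed by two-sample Kolmogorov-Smirnov tests between subgroups, can be sketched directly. The 64-dimensional synthetic features below stand in for the foundation model's embeddings, with sex deliberately encoded along one direction so the test has something to find.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

n, d = 1000, 64
sex = rng.integers(0, 2, n)               # 0 = male, 1 = female (synthetic)
feats = rng.normal(size=(n, d))           # stand-in image embeddings
feats[:, 0] += 1.5 * sex                  # deliberately encode sex in one direction

# Project to a few components, then test each component for a distribution
# shift between the sex subgroups.
proj = PCA(n_components=4).fit_transform(feats)
pvals = []
for k in range(proj.shape[1]):
    stat, p = ks_2samp(proj[sex == 0, k], proj[sex == 1, k])
    pvals.append(p)
    print(f"component {k}: KS statistic = {stat:.3f}, p = {p:.2e}")
```

A component with a tiny p-value flags a direction in the embedding space that separates the subgroups, which is the kind of encoded protected characteristic the study reports.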
Affiliation(s)
- Ben Glocker
- From the Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
- Charles Jones
- From the Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
- Mélanie Roschewitz
- From the Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
- Stefan Winzeck
- From the Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
6
Tripathi S, Gabriel K, Dheer S, Parajuli A, Augustin AI, Elahi A, Awan O, Dako F. Understanding Biases and Disparities in Radiology AI Datasets: A Review. J Am Coll Radiol 2023; 20:836-841. PMID: 37454752. DOI: 10.1016/j.jacr.2023.06.015.
Abstract
Artificial intelligence (AI) continues to show great potential in disease detection and diagnosis on medical imaging with increasingly high accuracy. An important component of AI model creation is dataset development for training, validation, and testing. Diverse and high-quality datasets are critical to ensure robust and unbiased AI models that maintain validity, especially in traditionally underserved populations globally. Yet publicly available datasets demonstrate problems with quality and inclusivity. In this literature review, the authors evaluate publicly available medical imaging datasets for demographic, geographic, genetic, and disease representation or lack thereof and call for an increased emphasis on dataset development to maximize the impact of AI models.
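The dataset audit this review calls for can start very simply: tabulate demographic representation in the dataset's metadata and flag groups below a threshold. The column names, toy metadata table, and 20% cutoff below are illustrative assumptions, not from the review.

```python
import pandas as pd

# Hypothetical metadata for a small imaging dataset; real audits would load the
# dataset's own demographics file instead.
meta = pd.DataFrame({
    "sex":  ["M", "M", "F", "M", "F", "M", "M", "M"],
    "race": ["white", "white", "white", "black", "white", "asian", "white", "white"],
})

flags = {}
for col in ("sex", "race"):
    share = meta[col].value_counts(normalize=True)          # group proportions
    flags[col] = sorted(share[share < 0.20].index)          # below-threshold groups
    print(f"{col}: {share.round(2).to_dict()}; underrepresented: {flags[col]}")
```

Even this crude tabulation surfaces the kind of representation gaps the review documents in public imaging datasets.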
Affiliation(s)
- Satvik Tripathi
- Department of Radiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania.
- Kyla Gabriel
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Suhani Dheer
- Department of Radiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania
- Aastha Parajuli
- Department of Radiology, Kathmandu University School of Medical Sciences, Dhulikhel, Nepal
- Ameena Elahi
- Department of Information Services, University of Pennsylvania Health System, Philadelphia, Pennsylvania
- Omar Awan
- Department of Radiology, University of Maryland School of Medicine, Baltimore, Maryland
- Farouk Dako
- Department of Radiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania
7
Petersen E, Holm S, Ganz M, Feragen A. The path toward equal performance in medical machine learning. Patterns (N Y) 2023; 4:100790. PMID: 37521051. PMCID: PMC10382979. DOI: 10.1016/j.patter.2023.100790.
Abstract
To ensure equitable quality of care, differences in machine learning model performance between patient groups must be addressed. Here, we argue that two separate mechanisms can cause performance differences between groups. First, model performance may be worse than theoretically achievable in a given group. This can occur due to a combination of group underrepresentation, modeling choices, and the characteristics of the prediction task at hand. We examine scenarios in which underrepresentation leads to underperformance, scenarios in which it does not, and the differences between them. Second, the optimal achievable performance may also differ between groups due to differences in the intrinsic difficulty of the prediction task. We discuss several possible causes of such differences in task difficulty. In addition, challenges such as label biases and selection biases may confound both learning and performance evaluation. We highlight consequences for the path toward equal performance, and we emphasize that leveling up model performance may require gathering not only more data from underperforming groups but also better data. Throughout, we ground our discussion in real-world medical phenomena and case studies while also referencing relevant statistical theory.
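The first mechanism discussed above, underrepresentation leading to avoidable underperformance, can be illustrated with a toy simulation: when a minority group's feature-label relationship differs from the majority's, a model trained on the pooled data fits the majority. The group sizes, weight vectors, and noise level below are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, w):
    """Labels depend on a group-specific weight vector over the same features."""
    X = rng.normal(size=(n, 5))
    y = (X @ w + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

w_major = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
w_minor = np.array([0.0, 0.0, 1.0, 1.0, 0.0])   # different predictive features

Xa, ya = make_group(1900, w_major)               # majority: 95% of training data
Xb, yb = make_group(100, w_minor)                # minority: 5%
clf = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

Xa_t, ya_t = make_group(1000, w_major)           # fresh test data per group
Xb_t, yb_t = make_group(1000, w_minor)
acc_major = clf.score(Xa_t, ya_t)
acc_minor = clf.score(Xb_t, yb_t)
print(f"majority accuracy = {acc_major:.2f}, minority accuracy = {acc_minor:.2f}")
```

Note the paper's complementary point: simply adding more minority samples helps only if the added data are informative, i.e., "leveling up" may require better data, not just more.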
Affiliation(s)
- Eike Petersen
- DTU Compute, Technical University of Denmark, Richard Pedersens Plads, 2800 Kgs. Lyngby, Denmark
- Pioneer Centre for AI, Øster Voldgade 3, 1350 Copenhagen, Denmark
- Sune Holm
- Pioneer Centre for AI, Øster Voldgade 3, 1350 Copenhagen, Denmark
- Department of Food and Resource Economics, University of Copenhagen, Rolighedsvej 23, 1958 Frederiksberg C., Denmark
- Melanie Ganz
- Pioneer Centre for AI, Øster Voldgade 3, 1350 Copenhagen, Denmark
- Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark
- Neurobiology Research Unit, Rigshospitalet, Inge Lehmanns Vej 6–8, 2100 Copenhagen, Denmark
- Aasa Feragen
- DTU Compute, Technical University of Denmark, Richard Pedersens Plads, 2800 Kgs. Lyngby, Denmark
- Pioneer Centre for AI, Øster Voldgade 3, 1350 Copenhagen, Denmark
8
Glocker B, Jones C, Bernhardt M, Winzeck S. Algorithmic encoding of protected characteristics in chest X-ray disease detection models. EBioMedicine 2023; 89:104467. PMID: 36791660. PMCID: PMC10025760. DOI: 10.1016/j.ebiom.2023.104467.
Abstract
BACKGROUND It has been rightfully emphasized that the use of AI for clinical decision making could amplify health disparities. An algorithm may encode protected characteristics, and then use this information for making predictions due to undesirable correlations in the (historical) training data. It remains unclear how we can establish whether such information is actually used. Besides the scarcity of data from underserved populations, very little is known about how dataset biases manifest in predictive models and how this may result in disparate performance. This article aims to shed some light on these issues by exploring methodology for subgroup analysis in image-based disease detection models. METHODS We utilize two publicly available chest X-ray datasets, CheXpert and MIMIC-CXR, to study performance disparities across race and biological sex in deep learning models. We explore test set resampling, transfer learning, multitask learning, and model inspection to assess the relationship between the encoding of protected characteristics and disease detection performance across subgroups. FINDINGS We confirm subgroup disparities in terms of shifted true and false positive rates which are partially removed after correcting for population and prevalence shifts in the test sets. We find that transfer learning alone is insufficient for establishing whether specific patient information is used for making predictions. The proposed combination of test-set resampling, multitask learning, and model inspection reveals valuable insights about the way protected characteristics are encoded in the feature representations of deep neural networks. INTERPRETATION Subgroup analysis is key for identifying performance disparities of AI models, but statistical differences across subgroups need to be taken into account when analyzing potential biases in disease detection. 
The proposed methodology provides a comprehensive framework for subgroup analysis enabling further research into the underlying causes of disparities. FUNDING European Research Council Horizon 2020, UK Research and Innovation.
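One ingredient of the framework described above, test-set resampling, can be sketched as follows: each subgroup's test set is resampled to a common disease prevalence, so that prevalence shift between groups is not mistaken for model bias when comparing true-positive rates. The data, target prevalence, and sample sizes below are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def resample_to_prevalence(y, idx_pool, prevalence, n, rng):
    """Draw n test indices from idx_pool with the requested positive prevalence."""
    pos = idx_pool[y[idx_pool] == 1]
    neg = idx_pool[y[idx_pool] == 0]
    n_pos = int(round(prevalence * n))
    return np.concatenate([rng.choice(pos, n_pos, replace=True),
                           rng.choice(neg, n - n_pos, replace=True)])

# Synthetic test set: two subgroups with different native disease prevalences
# (10% vs 30%) but identical model sensitivity (0.8) and false-positive rate (0.1).
n = 4000
group = rng.integers(0, 2, n)
y = (rng.random(n) < np.where(group == 1, 0.3, 0.1)).astype(int)
pred = np.where(y == 1, rng.random(n) < 0.8, rng.random(n) < 0.1).astype(int)

tpr_by_group = {}
for g in (0, 1):
    idx = resample_to_prevalence(y, np.where(group == g)[0], 0.2, 1000, rng)
    tpr_by_group[g] = pred[idx][y[idx] == 1].mean()
    print(f"group {g}: TPR at matched 20% prevalence = {tpr_by_group[g]:.2f}")
```

After matching prevalence, any remaining TPR gap is attributable to the model rather than to the case mix, which is the point of the correction described in the findings.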
Affiliation(s)
- Ben Glocker
- Department of Computing, Imperial College London, London, SW7 2AZ, UK.
- Charles Jones
- Department of Computing, Imperial College London, London, SW7 2AZ, UK
- Mélanie Bernhardt
- Department of Computing, Imperial College London, London, SW7 2AZ, UK
- Stefan Winzeck
- Department of Computing, Imperial College London, London, SW7 2AZ, UK
9
Deep learning in sex estimation from knee radiographs - A proof-of-concept study utilizing the Terry Anatomical Collection. Leg Med (Tokyo) 2023; 61:102211. PMID: 36738551. DOI: 10.1016/j.legalmed.2023.102211.
Abstract
Although knee measurements yield high classification rates in metric sex estimation, there is a paucity of studies exploring the knee in artificial intelligence-based sexing. This proof-of-concept study aimed to develop deep learning algorithms for sex estimation from radiographs of reconstructed cadaver knee joints belonging to the Terry Anatomical Collection. A total of 199 knee radiographs were obtained from 100 skeletons (46 male and 54 female cadavers; mean age at death 64.2 years, range 50-102 years) whose tibiofemoral joints were reconstructed in standard anatomical position. The AIDeveloper software was used to train, validate, and test neural network architectures in sex estimation based on image classification. Of the explored algorithms, an MhNet-based model reached the highest overall testing accuracy of 90.3%. The model was able to classify all females (100.0%) and most males (78.6%) correctly. These preliminary findings encourage further research on artificial intelligence-based methods in sex estimation from the knee joint. Combining radiographic data with automated and externally validated algorithms may establish valuable tools to be utilized in forensic anthropology.
10
Confounders mediate AI prediction of demographics in medical imaging. NPJ Digit Med 2022; 5:188. PMID: 36550271. PMCID: PMC9780355. DOI: 10.1038/s41746-022-00720-8.
Abstract
Deep learning has been shown to accurately assess "hidden" phenotypes from medical imaging beyond traditional clinician interpretation. Using large echocardiography datasets from two healthcare systems, we test whether it is possible to predict age, race, and sex from cardiac ultrasound images using deep learning algorithms and assess the impact of varying confounding variables. Using a total of 433,469 videos from Cedars-Sinai Medical Center and 99,909 videos from Stanford Medical Center, we trained video-based convolutional neural networks to predict age, sex, and race. We found that deep learning models were able to identify age and sex, while unable to reliably predict race. Without considering confounding differences between categories, the AI model predicted sex with an AUC of 0.85 (95% CI 0.84-0.86), age with a mean absolute error of 9.12 years (95% CI 9.00-9.25), and race with AUCs ranging from 0.63 to 0.71. When predicting race, we show that tuning the proportion of confounding variables (age or sex) in the training data significantly impacts model AUC (ranging from 0.53 to 0.85), while sex and age prediction was not particularly impacted by adjusting the race proportion in the training dataset (AUCs of 0.81-0.83 and 0.80-0.84, respectively). This suggests that a significant proportion of the AI's performance in predicting race could come from confounding features being detected. Further work remains to identify the particular imaging features that associate with demographic information and to better understand the risks of demographic identification in medical AI as it pertains to potentially perpetuating bias and disparities.
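The confounding mechanism this study describes can be reproduced in a toy simulation: if "race" labels are correlated with sex in the training data while the features genuinely encode only sex, the apparent race-prediction AUC tracks that correlation. Everything below (feature dimensions, effect size, correlation levels) is a synthetic illustration, not the study's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def simulate(corr, n=4000):
    sex = rng.integers(0, 2, n)
    # "race" agrees with sex with probability `corr` (0.5 = independent).
    race = np.where(rng.random(n) < corr, sex, 1 - sex)
    X = rng.normal(size=(n, 8))
    X[:, 0] += 1.5 * sex                  # features encode sex, not race
    clf = LogisticRegression().fit(X[: n // 2], race[: n // 2])
    return roc_auc_score(race[n // 2:], clf.predict_proba(X[n // 2:])[:, 1])

aucs = {}
for corr in (0.5, 0.7, 0.9):
    aucs[corr] = simulate(corr)
    print(f"sex-race correlation {corr}: apparent race AUC = {aucs[corr]:.2f}")
```

Apparent race-prediction performance rises with the confounder correlation even though the features carry no race signal at all, mirroring the AUC range (0.53 to 0.85) the study reports when tuning confounder proportions.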
11
Ieki H, Ito K, Saji M, Kawakami R, Nagatomo Y, Takada K, Kariyasu T, Machida H, Koyama S, Yoshida H, Kurosawa R, Matsunaga H, Miyazawa K, Ozaki K, Onouchi Y, Katsushika S, Matsuoka R, Shinohara H, Yamaguchi T, Kodera S, Higashikuni Y, Fujiu K, Akazawa H, Iguchi N, Isobe M, Yoshikawa T, Komuro I. Deep learning-based age estimation from chest X-rays indicates cardiovascular prognosis. Commun Med (Lond) 2022; 2:159. PMID: 36494479. PMCID: PMC9734197. DOI: 10.1038/s43856-022-00220-6.
Abstract
BACKGROUND In recent years, there has been considerable research on the use of artificial intelligence to estimate age and disease status from medical images. However, age estimation from chest X-ray (CXR) images has not been well studied and the clinical significance of estimated age has not been fully determined. METHODS To address this, we trained a deep neural network (DNN) model using more than 100,000 CXRs to estimate patients' age solely from CXRs. We applied our DNN to CXRs of 1562 consecutive hospitalized heart failure patients, and 3586 patients admitted to the intensive care unit with cardiovascular disease. RESULTS The DNN's estimated age (X-ray age) showed a strong, significant correlation with chronological age on the hold-out test data and independent test data. Elevated X-ray age was associated with worse clinical outcomes (heart failure readmission and all-cause death) in patients with heart failure, and with a worse prognosis in the 3586 patients admitted to the intensive care unit with cardiovascular disease. CONCLUSIONS Our results suggest that X-ray age can serve as a useful indicator of cardiovascular abnormalities, which will help clinicians to predict, prevent and manage cardiovascular diseases.
Affiliation(s)
- Hirotaka Ieki
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan; Department of Cardiology, Sakakibara Heart Institute, Tokyo, Japan
- Kaoru Ito
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Mike Saji
- Department of Cardiology, Sakakibara Heart Institute, Tokyo, Japan
- Rei Kawakami
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo, Japan
- Yuji Nagatomo
- Department of Cardiology, Sakakibara Heart Institute, Tokyo, Japan; Department of Cardiology, National Defense Medical College, Tokorozawa, Japan
- Kaori Takada
- Department of Radiology, Sakakibara Heart Institute, Tokyo, Japan
- Toshiya Kariyasu
- Department of Radiology, Sakakibara Heart Institute, Tokyo, Japan; Department of Radiology, Tokyo Women's Medical University, Medical Center East, Tokyo, Japan
- Haruhiko Machida
- Department of Radiology, Sakakibara Heart Institute, Tokyo, Japan; Department of Radiology, Tokyo Women's Medical University, Medical Center East, Tokyo, Japan
- Satoshi Koyama
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Hiroki Yoshida
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Ryo Kurosawa
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Hiroshi Matsunaga
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Kazuo Miyazawa
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Kouichi Ozaki
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Division for Genomic Medicine, Medical Genome Center, National Center for Geriatrics and Gerontology, Obu, Japan
- Yoshihiro Onouchi
- Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan; Department of Public Health, Chiba University Graduate School of Medicine, Chiba, Japan
- Susumu Katsushika
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Ryo Matsuoka
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Hiroki Shinohara
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Toshihiro Yamaguchi
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan; Center for Epidemiology and Preventive Medicine, The University of Tokyo Hospital, Tokyo, Japan
- Satoshi Kodera
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Yasutomi Higashikuni
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Katsuhito Fujiu
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Hiroshi Akazawa
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Nobuo Iguchi
- Department of Cardiology, Sakakibara Heart Institute, Tokyo, Japan
- Tsutomu Yoshikawa
- Department of Cardiology, Sakakibara Heart Institute, Tokyo, Japan
- Issei Komuro
- Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
12
Adleberg J, Wardeh A, Doo FX, Marinelli B, Cook TS, Mendelson DS, Kagen A. Predicting Patient Demographics From Chest Radiographs With Deep Learning. J Am Coll Radiol 2022; 19:1151-1161. [PMID: 35964688 DOI: 10.1016/j.jacr.2022.06.008] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 03/09/2022] [Revised: 06/13/2022] [Accepted: 06/21/2022] [Indexed: 11/29/2022]
Abstract
BACKGROUND Deep learning models are increasingly informing medical decision making, for instance, in the detection of acute intracranial hemorrhage and pulmonary embolism. However, many models are trained on medical image databases that poorly represent the diversity of the patients they serve. In turn, many artificial intelligence models may not perform as well on assisting providers with important medical decisions for underrepresented populations. PURPOSE To assess the ability of deep learning models to classify the self-reported gender, age, self-reported ethnicity, and insurance status of an individual patient from a given chest radiograph. METHODS Models were trained and tested with 55,174 radiographs in the MIMIC Chest X-ray (MIMIC-CXR) database. External validation data came from two separate databases, one from CheXpert and another from a multihospital urban health care system after institutional review board approval. Macro-averaged area under the curve (AUC) values were used to evaluate performance of models. Code used for this study is open-source and available at https://github.com/ai-bias/cxr-bias, and pixelstopatients.com/models/demographics. RESULTS Accuracy of models to predict gender was nearly perfect, with 0.999 (95% confidence interval: 0.99-0.99) AUC on held-out test data and 0.994 (0.99-0.99) and 0.997 (0.99-0.99) on external validation data. There was high accuracy to predict age and ethnicity, ranging from 0.854 (0.80-0.91) to 0.911 (0.88-0.94) AUC, and moderate accuracy to predict insurance status, with AUC ranging from 0.705 (0.60-0.81) on held-out test data to 0.675 (0.54-0.79) on external validation data. CONCLUSIONS Deep learning models can predict the age, self-reported gender, self-reported ethnicity, and insurance status of a patient from a chest radiograph. Visualization techniques are useful to ensure deep learning models function as intended and to demonstrate anatomical regions of interest. These models can be used to ensure that training data are diverse, thereby ensuring artificial intelligence models that work on diverse populations.
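The macro-averaged AUC reported in this abstract treats each demographic class one-vs-rest, computes a binary AUC per class, and averages with equal weight per class. A minimal stdlib sketch (not the authors' code; the score matrix and class names are invented for illustration) using the rank-statistic definition of AUC:

```python
def binary_auc(scores, labels):
    # AUC as the probability that a random positive outranks a random
    # negative (Mann-Whitney U statistic); ties count as half a win.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_auc(score_matrix, labels, classes):
    # One-vs-rest AUC per class, averaged with equal weight per class.
    aucs = []
    for k, c in enumerate(classes):
        y = [1 if lab == c else 0 for lab in labels]
        s = [row[k] for row in score_matrix]
        aucs.append(binary_auc(s, y))
    return sum(aucs) / len(aucs)

# Hypothetical two-class example: per-row class probabilities.
classes = ["male", "female"]
labels = ["male", "female", "male", "female"]
scores = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]]
auc = macro_auc(scores, labels, classes)  # perfect separation here -> 1.0
```

Macro averaging (as opposed to micro averaging) keeps rare classes from being drowned out, which matters when demographic groups are imbalanced.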
Affiliation(s)
- Jason Adleberg
- Department of Radiology, Mount Sinai Health System, New York, New York
- Amr Wardeh
- Department of Radiology, Upstate University Hospital, Syracuse, New York
- Florence X Doo
- Department of Radiology, Mount Sinai Health System, New York, New York
- Brett Marinelli
- Department of Radiology, Mount Sinai Health System, New York, New York
- Tessa S Cook
- Director, 3D and Advanced Imaging Laboratory and Director, Center for Practice Transformation in Radiology, Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- David S Mendelson
- Vice Chair, Informatics, Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, New York
- Alexander Kagen
- Site Chair, Department of Radiology, Mount Sinai West and Mount Sinai St. Luke's Hospitals, Icahn School of Medicine at Mount Sinai, New York, New York
13
Gichoya JW, Banerjee I, Bhimireddy AR, Burns JL, Celi LA, Chen LC, Correa R, Dullerud N, Ghassemi M, Huang SC, Kuo PC, Lungren MP, Palmer LJ, Price BJ, Purkayastha S, Pyrros AT, Oakden-Rayner L, Okechukwu C, Seyyed-Kalantari L, Trivedi H, Wang R, Zaiman Z, Zhang H. AI recognition of patient race in medical imaging: a modelling study. Lancet Digit Health 2022; 4:e406-e414. [PMID: 35568690 PMCID: PMC9650160 DOI: 10.1016/s2589-7500(22)00063-2] [Citation(s) in RCA: 122] [Impact Index Per Article: 61.0] [Received: 12/08/2021] [Revised: 03/03/2022] [Accepted: 03/18/2022] [Indexed: 02/01/2023]
Abstract
BACKGROUND Previous studies in medical imaging have shown disparate abilities of artificial intelligence (AI) to detect a person's race, yet there is no known correlate for race on medical imaging that would be obvious to human experts when interpreting the images. We aimed to conduct a comprehensive evaluation of the ability of AI to recognise a patient's racial identity from medical images. METHODS Using private (Emory CXR, Emory Chest CT, Emory Cervical Spine, and Emory Mammogram) and public (MIMIC-CXR, CheXpert, National Lung Cancer Screening Trial, RSNA Pulmonary Embolism CT, and Digital Hand Atlas) datasets, we evaluated, first, performance quantification of deep learning models in detecting race from medical images, including the ability of these models to generalise to external environments and across multiple imaging modalities. Second, we assessed possible confounding of anatomic and phenotypic population features by assessing the ability of these hypothesised confounders to detect race in isolation using regression models, and by re-evaluating the deep learning models by testing them on datasets stratified by these hypothesised confounding variables. Last, by exploring the effect of image corruptions on model performance, we investigated the underlying mechanism by which AI models can recognise race. FINDINGS In our study, we show that standard AI deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities, which was sustained under external validation conditions (x-ray imaging [area under the receiver operating characteristics curve (AUC) range 0·91-0·99], CT chest imaging [0·87-0·96], and mammography [0·81]). We also showed that this detection is not due to proxies or imaging-related surrogate covariates for race (eg, performance of possible confounders: body-mass index [AUC 0·55], disease distribution [0·61], and breast density [0·61]). Finally, we provide evidence to show that the ability of AI deep learning models persisted over all anatomical regions and frequency spectra of the images, suggesting that efforts to control this behaviour when it is undesirable will be challenging and demand further study. INTERPRETATION The results from our study emphasise that the ability of AI deep learning models to predict self-reported race is itself not the issue of importance. However, our finding that AI can accurately predict self-reported race, even from corrupted, cropped, and noised medical images, often when clinical experts cannot, creates an enormous risk for all model deployments in medical imaging. FUNDING National Institute of Biomedical Imaging and Bioengineering, MIDRC grant of National Institutes of Health, US National Science Foundation, National Library of Medicine of the National Institutes of Health, and Taiwan Ministry of Science and Technology.
14
Li D, Lin CT, Sulam J, Yi PH. Deep learning prediction of sex on chest radiographs: a potential contributor to biased algorithms. Emerg Radiol 2022; 29:365-370. [PMID: 35006495 DOI: 10.1007/s10140-022-02019-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/13/2021] [Accepted: 01/06/2022] [Indexed: 11/26/2022]
Abstract
BACKGROUND Deep convolutional neural networks (DCNNs) for diagnosis of disease on chest radiographs (CXR) have been shown to be biased against males or females if the datasets used to train them have unbalanced sex representation. Prior work has suggested that DCNNs can predict sex on CXR, which could aid forensic evaluations, but also be a source of bias. OBJECTIVE To (1) evaluate the performance of DCNNs for predicting sex across different datasets and architectures and (2) evaluate visual biomarkers used by DCNNs to predict sex on CXRs. MATERIALS AND METHODS Chest radiographs were obtained from the Stanford CheXpert and NIH ChestX-ray14 datasets, which comprised 224,316 and 112,120 CXRs, respectively. To control for dataset size and class imbalance, random undersampling was used to reduce each dataset to 97,560 images that were balanced for sex. Each dataset was randomly split into training (70%), validation (10%), and test (20%) sets. Four DCNN architectures pre-trained on ImageNet were used for transfer learning. DCNNs were externally validated using a test set from the opposing dataset. Performance was evaluated using area under the receiver operating characteristic curve (AUC). Class activation mapping (CAM) was used to generate heatmaps visualizing the regions contributing to the DCNN's prediction. RESULTS On the internal test set, DCNNs achieved AUROCs ranging from 0.98 to 0.99. On external validation, the models reached peak cross-dataset performance of 0.94 for the VGG19-Stanford model and 0.95 for the InceptionV3-NIH model. Heatmaps highlighted similar regions of attention between model architectures and datasets, localizing to the mediastinal and upper rib regions, as well as to the lower chest/diaphragmatic regions. CONCLUSION DCNNs trained on two large CXR datasets accurately predicted sex on internal and external test data with similar heatmap localizations across DCNN architectures and datasets. These findings support the notion that DCNNs can leverage imaging biomarkers to predict sex and potentially confound the accurate prediction of disease on CXRs and contribute to biased models. On the other hand, these DCNNs can be beneficial to emergency radiologists for forensic evaluations and identifying patient sex for patients whose identities are unknown, such as in acute trauma.
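The abstract controls for class imbalance by random undersampling, i.e. downsampling every class to the size of the smallest one before the train/validation/test split. A generic stdlib sketch of that step (not the authors' code; the item/label shapes are invented for illustration):

```python
import random

def undersample(items, labels, seed=0):
    # Randomly downsample every class to the size of the smallest class,
    # returning a shuffled list of (item, label) pairs.
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_class = {}
    for item, lab in zip(items, labels):
        by_class.setdefault(lab, []).append(item)
    n = min(len(v) for v in by_class.values())
    balanced = []
    for lab, members in by_class.items():
        for item in rng.sample(members, n):
            balanced.append((item, lab))
    rng.shuffle(balanced)
    return balanced

# Hypothetical example: 7 male and 3 female radiograph IDs -> 3 of each.
balanced = undersample(list(range(10)), ["M"] * 7 + ["F"] * 3)
```

Undersampling discards majority-class data; when the majority class is scarce, class-weighted losses or oversampling are common alternatives.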
Affiliation(s)
- David Li
- Faculty of Medicine, University of Ottawa, Roger Guindon Hall, 451 Smyth Rd #2044, Ottawa, ON, K1H 8M5, Canada
- University of Maryland Medical Intelligent Imaging (UM2II) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, Room 1172, Baltimore, MD, 21201, USA
- Cheng Ting Lin
- Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, 601 N Caroline St, Baltimore, MD, 21231, USA
- Jeremias Sulam
- Department of Biomedical Engineering, Johns Hopkins University, Clark 320B, 3400 N Charles St, Baltimore, MD, 21218, USA
- Paul H Yi
- University of Maryland Medical Intelligent Imaging (UM2II) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, Room 1172, Baltimore, MD, 21201, USA
15
Padash S, Mohebbian MR, Adams SJ, Henderson RDE, Babyn P. Pediatric chest radiograph interpretation: how far has artificial intelligence come? A systematic literature review. Pediatr Radiol 2022; 52:1568-1580. [PMID: 35460035 PMCID: PMC9033522 DOI: 10.1007/s00247-022-05368-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/11/2021] [Revised: 02/28/2022] [Accepted: 03/24/2022] [Indexed: 10/24/2022] Open
Abstract
Most artificial intelligence (AI) studies have focused primarily on adult imaging, with less attention to the unique aspects of pediatric imaging. The objectives of this study were to (1) identify all publicly available pediatric datasets and determine their potential utility and limitations for pediatric AI studies and (2) systematically review the literature to assess the current state of AI in pediatric chest radiograph interpretation. We searched PubMed, Web of Science and Embase to retrieve all studies from 1990 to 2021 that assessed AI for pediatric chest radiograph interpretation and abstracted the datasets used to train and test AI algorithms, approaches and performance metrics. Of 29 publicly available chest radiograph datasets, 2 datasets included solely pediatric chest radiographs, and 7 datasets included pediatric and adult patients. We identified 55 articles that implemented an AI model to interpret pediatric chest radiographs or pediatric and adult chest radiographs. Classification of chest radiographs as pneumonia was the most common application of AI, evaluated in 65% of the studies. Although many studies report high diagnostic accuracy, most algorithms were not validated on external datasets. Most AI studies for pediatric chest radiograph interpretation have focused on a limited number of diseases, and progress is hindered by a lack of large-scale pediatric chest radiograph datasets.
Affiliation(s)
- Sirwa Padash
- Department of Medical Imaging, University of Saskatchewan, 103 Hospital Drive, Saskatoon, Saskatchewan, S7N 0W8, Canada; Department of Radiology, Mayo Clinic, Rochester, MN, USA
- Mohammad Reza Mohebbian
- Department of Electrical and Computer Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Scott J. Adams
- Department of Medical Imaging, University of Saskatchewan, 103 Hospital Drive, Saskatoon, Saskatchewan, S7N 0W8, Canada
- Robert D. E. Henderson
- Department of Medical Imaging, University of Saskatchewan, 103 Hospital Drive, Saskatoon, Saskatchewan, S7N 0W8, Canada
- Paul Babyn
- Department of Medical Imaging, University of Saskatchewan, 103 Hospital Drive, Saskatoon, Saskatchewan, S7N 0W8, Canada
16
Herrmann P, Busana M, Cressoni M, Lotz J, Moerer O, Saager L, Meissner K, Quintel M, Gattinoni L. Using Artificial Intelligence for Automatic Segmentation of CT Lung Images in Acute Respiratory Distress Syndrome. Front Physiol 2021; 12:676118. [PMID: 34594233 PMCID: PMC8476971 DOI: 10.3389/fphys.2021.676118] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 03/04/2021] [Accepted: 08/17/2021] [Indexed: 01/17/2023] Open
Abstract
Knowledge of gas volume, tissue mass and recruitability measured by the quantitative CT scan analysis (CT-qa) is important when setting the mechanical ventilation in acute respiratory distress syndrome (ARDS). Yet, the manual segmentation of the lung requires a considerable workload. Our goal was to provide an automatic, clinically applicable and reliable lung segmentation procedure. Therefore, a convolutional neural network (CNN) was used to train an artificial intelligence (AI) algorithm on 15 healthy subjects (1,302 slices), 100 ARDS patients (12,279 slices), and 20 COVID-19 patients (1,817 slices). Eighty percent of this population was used for training, 20% for testing. The AI and manual segmentation at slice level were compared by intersection over union (IoU). The CT-qa variables were compared by regression and Bland-Altman analysis. The AI segmentation of a single patient required 5–10 s vs. 1–2 h for manual segmentation. At slice level, the algorithm showed on the test set an IoU across all CT slices of 91.3 ± 10.0, 85.2 ± 13.9, and 84.7 ± 14.0%, and across all lung volumes of 96.3 ± 0.6, 88.9 ± 3.1, and 86.3 ± 6.5% for normal lungs, ARDS and COVID-19, respectively, with a U-shape in the performance: better in the lung middle region, worse at the apex and base. At patient level, on the test set, the total lung volume measured by AI and manual segmentation had an R2 of 0.99 and a bias of −9.8 ml [CI: +56.0/−75.7 ml]. The recruitability measured with manual and AI segmentation, as change in non-aerated tissue fraction, had a bias of +0.3% [CI: +6.2/−5.5%] and −0.5% [CI: +2.3/−3.3%] expressed as change in well-aerated tissue fraction. The AI-powered lung segmentation provided fast and clinically reliable results. It is able to segment the lungs of seriously ill ARDS patients fully automatically.
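The slice-level agreement metric above, intersection over union (IoU), is the ratio of pixels both masks label as lung to pixels either mask labels as lung. A minimal sketch on flattened binary masks (the tiny example masks are invented for illustration, not CT data):

```python
def iou(mask_a, mask_b):
    # Intersection over union of two same-length binary masks
    # (1 = lung pixel, 0 = background).
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 and b == 1)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 or b == 1)
    return inter / union if union else 1.0  # two empty masks agree fully

# Hypothetical 6-pixel slice: intersection = 2, union = 4 -> IoU = 0.5.
a = [1, 1, 1, 0, 0, 0]
b = [0, 1, 1, 1, 0, 0]
score = iou(a, b)
```

On real CT volumes the same computation is typically vectorized over 2-D or 3-D arrays, but the definition is unchanged.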
Affiliation(s)
- Peter Herrmann
- Department of Anesthesiology, University Medical Center Göttingen, Göttingen, Germany
- Mattia Busana
- Department of Anesthesiology, University Medical Center Göttingen, Göttingen, Germany
- Joachim Lotz
- Institute for Diagnostic and Interventional Radiology, University Medical Center Göttingen, Göttingen, Germany
- Onnen Moerer
- Department of Anesthesiology, University Medical Center Göttingen, Göttingen, Germany
- Leif Saager
- Department of Anesthesiology, University Medical Center Göttingen, Göttingen, Germany
- Konrad Meissner
- Department of Anesthesiology, University Medical Center Göttingen, Göttingen, Germany
- Michael Quintel
- Department of Anesthesiology, University Medical Center Göttingen, Göttingen, Germany; Department of Anesthesiology, DONAUISAR Klinikum Deggendorf, Deggendorf, Germany
- Luciano Gattinoni
- Department of Anesthesiology, University Medical Center Göttingen, Göttingen, Germany