1. A Guideline for Open-Source Tools to Make Medical Imaging Data Ready for Artificial Intelligence Applications: A Society of Imaging Informatics in Medicine (SIIM) Survey. Journal of Imaging Informatics in Medicine 2024. [PMID: 38558368] [DOI: 10.1007/s10278-024-01083-0] [Received: 01/11/2024] [Revised: 02/29/2024] [Accepted: 03/08/2024]
Abstract
In recent years, the role of Artificial Intelligence (AI) in medical imaging has become increasingly prominent: in 2023, the majority of AI applications approved by the FDA were in imaging and radiology. The surge in AI model development to tackle clinical challenges underscores the necessity of preparing high-quality medical imaging data. Proper data preparation is crucial, as it fosters the creation of standardized and reproducible AI models while minimizing biases. Data curation transforms raw data into a valuable, organized, and dependable resource and is a process fundamental to the success of machine learning and analytical projects. Given the plethora of tools available for the different stages of data curation, it is crucial to stay informed about the most relevant tools within specific research areas. In the current work, we propose a descriptive outline of the different steps of data curation and furnish, for each stage, compilations of tools collected through a survey conducted among members of the Society of Imaging Informatics in Medicine (SIIM). This collection has the potential to enhance the decision-making process for researchers as they select the most appropriate tool for their specific tasks.
2. MIDRC-MetricTree: a decision tree-based tool for recommending performance metrics in artificial intelligence-assisted medical image analysis. J Med Imaging (Bellingham) 2024; 11:024504. [PMID: 38576536] [PMCID: PMC10990563] [DOI: 10.1117/1.jmi.11.2.024504] [Received: 08/30/2023] [Revised: 02/16/2024] [Accepted: 03/18/2024]
Abstract
Purpose The Medical Imaging and Data Resource Center (MIDRC) was created to facilitate medical imaging machine learning (ML) research for tasks including early detection, diagnosis, prognosis, and assessment of treatment response related to the coronavirus disease 2019 pandemic and beyond. The purpose of this work was to create a publicly available metrology resource to assist researchers in evaluating the performance of their medical image analysis ML algorithms. Approach An interactive decision tree, called MIDRC-MetricTree, has been developed, organized by the type of task that the ML algorithm was trained to perform. The criteria for this decision tree were that (1) users can select information such as the type of task, the nature of the reference standard, and the type of the algorithm output and (2) based on the user input, recommendations are provided regarding appropriate performance evaluation approaches and metrics, including literature references and, when possible, links to publicly available software/code as well as short tutorial videos. Results Five types of tasks were identified for the decision tree: (a) classification, (b) detection/localization, (c) segmentation, (d) time-to-event (TTE) analysis, and (e) estimation. As an example, the classification branch of the decision tree includes two-class (binary) and multiclass classification tasks and provides suggestions for methods, metrics, software/code recommendations, and literature references for situations where the algorithm produces either binary or non-binary (e.g., continuous) output and for reference standards with negligible or non-negligible variability and unreliability. Conclusions The publicly available decision tree is a resource to assist researchers in conducting task-specific performance evaluations, including classification, detection/localization, segmentation, TTE, and estimation tasks.
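The branching logic described above can be sketched as a simple table lookup. This is an illustrative toy in the spirit of MIDRC-MetricTree, not its actual decision logic; the recommended metrics are common textbook choices rather than the tool's real output, and the key names are assumptions.

```python
# Toy decision-tree-style metric recommendation, keyed on (task, output type).
# All entries below are illustrative, not MIDRC-MetricTree's actual content.
RECOMMENDATIONS = {
    ("classification", "binary"): ["ROC AUC", "sensitivity/specificity at operating point"],
    ("classification", "continuous"): ["ROC AUC"],
    ("detection/localization", "binary"): ["FROC analysis"],
    ("segmentation", "mask"): ["Dice coefficient", "Hausdorff distance"],
    ("time-to-event", "risk score"): ["concordance index (C-index)"],
    ("estimation", "continuous"): ["mean absolute error", "Bland-Altman analysis"],
}

def recommend_metrics(task, output_type):
    """Return candidate performance metrics for a (task, output type) pair,
    mimicking a user's walk down the decision tree."""
    try:
        return RECOMMENDATIONS[(task, output_type)]
    except KeyError:
        raise ValueError(f"no recommendation for ({task!r}, {output_type!r})")

assert "Dice coefficient" in recommend_metrics("segmentation", "mask")
```

In the real tool the user also supplies the nature of the reference standard (e.g., its variability and unreliability), which would add another key dimension to the lookup.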
3. Role of sureness in evaluating AI/CADx: Lesion-based repeatability of machine learning classification performance on breast MRI. Med Phys 2024; 51:1812-1821. [PMID: 37602841] [PMCID: PMC10879454] [DOI: 10.1002/mp.16673] [Received: 02/27/2023] [Revised: 07/24/2023] [Accepted: 07/24/2023]
Abstract
BACKGROUND Artificial intelligence/computer-aided diagnosis (AI/CADx) and its use of radiomics have shown potential in the diagnosis and prognosis of breast cancer. Performance metrics such as the area under the receiver operating characteristic (ROC) curve (AUC) are frequently used as figures of merit for the evaluation of CADx. Methods for evaluating lesion-based measures of performance may enhance the assessment of AI/CADx pipelines, particularly when comparing performance by classifier. PURPOSE The purpose of this study was to use two standard classifiers to (1) compare overall classification performance in the task of distinguishing between benign and malignant breast lesions using radiomic features extracted from dynamic contrast-enhanced magnetic resonance (DCE-MR) images, (2) define a new repeatability metric (termed sureness), and (3) use sureness to examine whether one classifier provides an advantage in AI diagnostic performance by lesion when using radiomic features. METHODS Images of 1052 breast lesions (201 benign, 851 cancers) had been retrospectively collected under HIPAA/IRB compliance. The lesions had been segmented automatically using a fuzzy c-means method, and thirty-two radiomic features had been extracted. Classification was investigated for the task of distinguishing malignant lesions (81% of the dataset) from benign lesions (19%). Two classifiers (linear discriminant analysis, LDA, and support vector machine, SVM) were trained and tested within 0.632 bootstrap analyses (2000 iterations). Whole-set classification performance was evaluated at two levels: (1) the 0.632+ bias-corrected AUC and (2) performance metric curves, which give the variability in operating sensitivity and specificity at a target operating point (95% target sensitivity). Sureness was defined as 1 minus the width of the 95% confidence interval of the classifier output for each lesion, for each classifier.
Lesion-based repeatability was evaluated at two levels: (1) repeatability profiles, which represent the distribution of sureness across the decision threshold and (2) sureness of each lesion. The latter was used to identify lesions with better sureness with one classifier over another while maintaining lesion-based performance across the bootstrap iterations. RESULTS In classification performance assessment, the median and 95% CI of difference in AUC between the two classifiers did not show evidence of difference (ΔAUC = -0.003 [-0.031, 0.018]). Both classifiers achieved the target sensitivity. Sureness was more consistent across the classifier output range for the SVM classifier than the LDA classifier. The SVM resulted in a net gain of 33 benign lesions and 307 cancers with higher sureness and maintained lesion-based performance. However, with the LDA there was a notable percentage of benign lesions (42%) with better sureness but lower lesion-based performance. CONCLUSIONS When there is no evidence for difference in performance between classifiers using AUC or other performance summary measures, a lesion-based sureness metric may provide additional insight into AI pipeline design. These findings present and emphasize the utility of lesion-based repeatability via sureness in AI/CADx as a complementary enhancement to other evaluation measures.
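The sureness definition above lends itself to a direct sketch. The helper names, the interpolation scheme, and the assumption that classifier outputs are scaled to [0, 1] are illustrative, not taken from the paper.

```python
def percentile(samples, q):
    """Linear-interpolation percentile (q in [0, 100]) of a list of numbers."""
    s = sorted(samples)
    if len(s) == 1:
        return s[0]
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] * (1 - frac) + s[hi] * frac

def sureness(bootstrap_outputs):
    """1 minus the 95% confidence-interval width of one lesion's classifier
    outputs (assumed scaled to [0, 1]) across bootstrap iterations."""
    lo = percentile(bootstrap_outputs, 2.5)
    hi = percentile(bootstrap_outputs, 97.5)
    return 1.0 - (hi - lo)

# A lesion whose output barely varies across iterations has high sureness.
stable = [0.80, 0.81, 0.79, 0.80, 0.82]
unstable = [0.10, 0.90, 0.30, 0.70, 0.50]
assert sureness(stable) > sureness(unstable)
```

Comparing per-lesion sureness between two classifiers, as the study does, then reduces to computing this quantity for the same lesion under each classifier's bootstrap outputs.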
4. Pseudo-spectral angle mapping for automated pixel-level analysis of highly multiplexed tissue image data. bioRxiv [Preprint] 2024. [PMID: 38260318] [PMCID: PMC10802447] [DOI: 10.1101/2024.01.09.574920]
Abstract
The rapid development of highly multiplexed microscopy systems has enabled the study of cells embedded within their native tissue, which is providing exciting insights into the spatial features of human disease [1]. However, computational methods for analyzing these high-content images are still emerging, and there is a need for more robust and generalizable tools for evaluating the cellular constituents and underlying stroma captured by high-plex imaging [2]. To address this need, we have adapted spectral angle mapping - an algorithm used widely in hyperspectral image analysis - to compress the channel dimension of high-plex immunofluorescence images. Because many high-plex immunofluorescence imaging experiments probe unique sets of protein markers, existing cell and pixel classification models typically do not generalize well. Pseudo-spectral angle mapping (pSAM) uses reference pseudospectra - or pixel vectors - to assign each pixel in an image a similarity score to several cell class reference vectors, which are defined by each unique staining panel. Here, we demonstrate that the class maps provided by pSAM can directly provide insight into the prevalence of each class defined by the reference pseudospectra. In a dataset of high-plex images of colon biopsies from patients with gut autoimmune conditions, sixteen pSAM class representation maps were combined with instance segmentation of cells to provide cell class predictions. Finally, pSAM detected a diverse set of structural and immune cells when applied to a novel dataset of kidney biopsies imaged with a 43-marker panel. In summary, pSAM provides a powerful and readily generalizable method for evaluating high-plex immunofluorescence image data.
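The core of spectral angle mapping is the angle between a pixel's channel-intensity vector and a reference pseudospectrum. A minimal sketch, with hypothetical class names and reference vectors (the paper's actual panels and references differ):

```python
import math

def spectral_angle(pixel, reference):
    """Angle in radians between a pixel's channel-intensity vector and a
    reference pseudospectrum; smaller angles mean greater similarity."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    cos_theta = max(-1.0, min(1.0, dot / (norm_p * norm_r)))  # clamp fp error
    return math.acos(cos_theta)

def classify_pixel(pixel, references):
    """Assign a pixel to the reference class whose pseudospectrum makes the
    smallest angle with it."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))

# Hypothetical 3-channel panel: a pixel bright in channels 1 and 3 maps to
# the class whose reference vector shares that pattern.
references = {"class_a": [1.0, 0.0, 1.0], "class_b": [0.0, 1.0, 0.0]}
assert classify_pixel([0.9, 0.1, 0.8], references) == "class_a"
```

A useful property of the angle is scale invariance: multiplying a pixel vector by a constant leaves the angle unchanged, making the score robust to overall staining intensity.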
5. Longitudinal assessment of demographic representativeness in the Medical Imaging and Data Resource Center open data commons. J Med Imaging (Bellingham) 2023; 10:061105. [PMID: 37469387] [PMCID: PMC10353566] [DOI: 10.1117/1.jmi.10.6.061105] [Received: 01/31/2023] [Revised: 06/21/2023] [Accepted: 06/23/2023]
Abstract
Purpose The Medical Imaging and Data Resource Center (MIDRC) open data commons was launched to accelerate the development of artificial intelligence (AI) algorithms to help address the COVID-19 pandemic. The purpose of this study was to quantify longitudinal representativeness of the demographic characteristics of the primary MIDRC dataset compared to the United States general population (US Census) and COVID-19 positive case counts from the Centers for Disease Control and Prevention (CDC). Approach The Jensen-Shannon distance (JSD), a measure of similarity of two distributions, was used to longitudinally measure the representativeness of the distribution of (1) all unique patients in the MIDRC data to the 2020 US Census and (2) all unique COVID-19 positive patients in the MIDRC data to the case counts reported by the CDC. The distributions were evaluated in the demographic categories of age at index, sex, race, ethnicity, and the combination of race and ethnicity. Results Representativeness of the MIDRC data by ethnicity and the combination of race and ethnicity was impacted by the percentage of CDC case counts for which these categories were not reported. The distributions by sex and race have retained their level of representativeness over time. Conclusion The representativeness of the open medical imaging datasets in the curated public data commons at MIDRC has evolved over time as the number of contributing institutions and the overall number of subjects have grown. The use of metrics such as the JSD to support the measurement of representativeness is one step needed for fair and generalizable AI algorithm development.
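The Jensen-Shannon distance used above is straightforward to compute for discrete distributions such as demographic category counts. A minimal sketch (base-2 logarithm, so values fall in [0, 1]; the function names are illustrative):

```python
import math

def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance between two same-length discrete probability
    distributions, using base-2 logarithms so the result lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability terms.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    # The JS distance is the square root of the JS divergence.
    return math.sqrt((kl(p, m) + kl(q, m)) / 2)

# Identical distributions have distance 0; disjoint ones have distance 1.
assert jensen_shannon_distance([0.25, 0.75], [0.25, 0.75]) == 0.0
assert abs(jensen_shannon_distance([1.0, 0.0], [0.0, 1.0]) - 1.0) < 1e-12
```

Applied to this study, `p` and `q` would be, for example, the normalized age-bin proportions in the MIDRC data versus the 2020 US Census, recomputed at each time point to track representativeness longitudinally.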
6. Pilot study of machine learning in the task of distinguishing high and low-grade pediatric hydronephrosis on ultrasound. Investig Clin Urol 2023; 64:588-596. [PMID: 37932570] [PMCID: PMC10630684] [DOI: 10.4111/icu.20230170] [Received: 05/24/2023] [Revised: 08/22/2023] [Accepted: 09/07/2023]
Abstract
PURPOSE Hydronephrosis is a common pediatric urological condition, characterized by dilation of the renal collecting system. Accurate identification of the severity of hydronephrosis is crucial in clinical management, as high-grade hydronephrosis can cause significant damage to the kidney. In this pilot study, we demonstrate the feasibility of machine learning in differentiating between high and low-grade hydronephrosis in pediatric patients. MATERIALS AND METHODS We retrospectively reviewed 592 images from 90 unique patients ages 0-8 years diagnosed with hydronephrosis at the University of Chicago's Pediatric Urology Clinic. The study included 74 cases of high-grade hydronephrosis (145 images) and 227 cases of low-grade hydronephrosis (447 images). Patients were excluded if they had fewer than two studies prior to surgical intervention or had structural abnormalities. We developed a radiomic-based artificial intelligence algorithm incorporating computerized texture analysis and machine learning (support-vector machine) to yield a predictor of hydronephrosis grade. RESULTS Receiver operating characteristic analysis of the classifier output yielded an area under the curve value of 0.86 (95% CI 0.81-0.92) in the task of distinguishing between low and high-grade hydronephrosis using a five-fold cross-validation by kidney. In addition, a Mann-Kendall trend test between computer output and clinical hydronephrosis grade yielded a statistically significant upward trend (p<0.001). CONCLUSIONS Our findings demonstrate the potential of machine learning in the differentiation between low and high-grade hydronephrosis. Further studies are warranted to validate our findings and their generalizability for use in clinical practice as a means to predict clinical outcomes and the resolution of hydronephrosis.
7. Assessment of a deep learning model for COVID-19 classification on chest radiographs: a comparison across image acquisition techniques and clinical factors. J Med Imaging (Bellingham) 2023; 10:064504. [PMID: 38162317] [PMCID: PMC10753846] [DOI: 10.1117/1.jmi.10.6.064504] [Received: 06/21/2023] [Revised: 11/30/2023] [Accepted: 12/06/2023]
Abstract
Purpose The purpose is to assess the performance of a pre-trained deep learning model in the task of classifying between coronavirus disease (COVID)-positive and COVID-negative patients from chest radiographs (CXRs) while considering various image acquisition parameters, clinical factors, and patient demographics. Methods Standard and soft-tissue CXRs of 9860 patients comprised the "original dataset," consisting of training and test sets, which was used to train a DenseNet-121 architecture model to classify COVID-19 using three classification algorithms: standard, soft tissue, and a combination of both image types via feature fusion. A larger, more current test set of 5893 patients (the "current test set") was used to assess the performance of the pre-trained model. The current test set spanned a larger range of dates, incorporated different variants of the virus, and included different immunization statuses. Model performance between the original and current test sets was evaluated using the area under the receiver operating characteristic curve (ROC AUC) [95% CI]. Results The model achieved AUC values of 0.67 [0.65, 0.70] for cropped standard images, 0.65 [0.63, 0.67] for cropped soft-tissue images, and 0.67 [0.65, 0.69] for both types of cropped images. These were all significantly lower than the performance of the model on the original test set. Investigations regarding matching the acquisition dates between the test sets (i.e., controlling for virus variants), immunization status, disease severity, and age and sex distributions did not fully explain the discrepancy in performance. Conclusions Several relevant factors were considered to determine whether differences existed between the test sets, including the time period of image acquisition, vaccination status, and disease severity. The lower performance on the current test set may have occurred due to model overfitting and a lack of generalizability.
8. Sequestration of imaging studies in MIDRC: stratified sampling to balance demographic characteristics of patients in a multi-institutional data commons. J Med Imaging (Bellingham) 2023; 10:064501. [PMID: 38074627] [PMCID: PMC10704184] [DOI: 10.1117/1.jmi.10.6.064501] [Received: 01/25/2023] [Revised: 10/23/2023] [Accepted: 10/25/2023]
Abstract
Purpose The Medical Imaging and Data Resource Center (MIDRC) is a multi-institutional effort to accelerate medical imaging machine intelligence research and create a publicly available image repository/commons as well as a sequestered commons for performance evaluation and benchmarking of algorithms. After de-identification, approximately 80% of the medical images and associated metadata become part of the open commons and 20% are sequestered from the open commons. To ensure that both commons are representative of the population available, we introduced a stratified sampling method to balance the demographic characteristics across the two datasets. Approach Our method uses multi-dimensional stratified sampling where several demographic variables of interest are sequentially used to separate the data into individual strata, each representing a unique combination of variables. Within each resulting stratum, patients are assigned to the open or sequestered commons. This algorithm was used on an example dataset containing 5000 patients using the variables of race, age, sex at birth, ethnicity, COVID-19 status, and image modality and compared resulting demographic distributions to naïve random sampling of the dataset over 2000 independent trials. Results Resulting prevalence of each demographic variable matched the prevalence from the input dataset within one standard deviation. Mann-Whitney U test results supported the hypothesis that sequestration by stratified sampling provided more balanced subsets than naïve randomization, except for demographic subcategories with very low prevalence. Conclusions The developed multi-dimensional stratified sampling algorithm can partition a large dataset while maintaining balance across several variables, superior to the balance achieved from naïve randomization.
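The stratified assignment described above can be sketched as follows. The 80/20 split and the idea of splitting within each unique combination of variables follow the abstract; the function names, per-stratum rounding rule, and seeding are assumptions of this sketch:

```python
import random
from collections import defaultdict

def stratified_split(patients, keys, sequestered_fraction=0.2, seed=0):
    """Partition patients into open and sequestered subsets so that the joint
    distribution of the chosen demographic keys is preserved in both."""
    rng = random.Random(seed)
    # Group patients into strata: one stratum per unique combination of keys.
    strata = defaultdict(list)
    for patient in patients:
        strata[tuple(patient[k] for k in keys)].append(patient)
    open_set, sequestered = [], []
    # Split each stratum independently so both subsets mirror its prevalence.
    for members in strata.values():
        rng.shuffle(members)
        n_seq = round(len(members) * sequestered_fraction)
        sequestered.extend(members[:n_seq])
        open_set.extend(members[n_seq:])
    return open_set, sequestered

# 100 synthetic patients, half of each sex: the 20% sequestered set keeps
# the same 50/50 balance because each stratum is split independently.
patients = [{"id": i, "sex": "F" if i < 50 else "M"} for i in range(100)]
open_set, sequestered = stratified_split(patients, keys=("sex",))
assert len(sequestered) == 20
assert sum(1 for p in sequestered if p["sex"] == "F") == 10
```

Naive randomization, by contrast, only matches these proportions in expectation, which is why the study observed larger imbalances under it, particularly for low-prevalence subgroups.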
9. Radiomic and deep learning characterization of breast parenchyma on full field digital mammograms and specimen radiographs: a pilot study of a potential cancer field effect. J Med Imaging (Bellingham) 2023; 10:044501. [PMID: 37426053] [PMCID: PMC10329416] [DOI: 10.1117/1.jmi.10.4.044501] [Received: 11/23/2022] [Revised: 06/11/2023] [Accepted: 06/20/2023]
Abstract
Purpose In women with biopsy-proven breast cancer, histologically normal areas of the parenchyma have shown molecular similarity to the tumor, supporting a potential cancer field effect. The purpose of this work was to investigate relationships of human-engineered radiomic and deep learning features between regions across the breast in mammographic parenchymal patterns and specimen radiographs. Approach This study included mammograms from 74 patients with at least 1 identified malignant tumor, of whom 32 also possessed intraoperative radiographs of mastectomy specimens. Mammograms were acquired with a Hologic system and specimen radiographs were acquired with a Fujifilm imaging system. All images were retrospectively collected under an Institutional Review Board-approved protocol. Regions of interest (ROIs) of 128×128 pixels were selected from three regions: within the identified tumor, near to the tumor, and far from the tumor. Radiographic texture analysis was used to extract 45 radiomic features and transfer learning was used to extract 20 deep learning features in each region. Kendall's Tau-b and Pearson correlation tests were performed to assess relationships between features in each region. Results Statistically significant correlations in select subgroups of features among the tumor, near-to-the-tumor, and far-from-the-tumor ROI regions were identified in both mammograms and specimen radiographs. Intensity-based features were found to show significant correlations with ROI regions across both modalities. Conclusions Results support our hypothesis of a potential cancer field effect, accessible radiographically, across tumor and non-tumor regions, thus indicating the potential for computerized analysis of mammographic parenchymal patterns to predict breast cancer risk.
10.
Abstract
In response to the unprecedented global healthcare crisis of the COVID-19 pandemic, the scientific community has joined forces to tackle the challenges and prepare for future pandemics. Multiple modalities of data have been investigated to understand the nature of COVID-19. In this paper, MIDRC investigators present an overview of the state-of-the-art development of multimodal machine learning for COVID-19 and model assessment considerations for future studies. We begin with a discussion of the lessons learned from radiogenomic studies for cancer diagnosis. We then summarize the multi-modality COVID-19 data investigated in the literature including symptoms and other clinical data, laboratory tests, imaging, pathology, physiology, and other omics data. Publicly available multimodal COVID-19 data provided by MIDRC and other sources are summarized. After an overview of machine learning developments using multimodal data for COVID-19, we present our perspectives on the future development of multimodal machine learning models for COVID-19.
11. Predicting intensive care need for COVID-19 patients using deep learning on chest radiography. J Med Imaging (Bellingham) 2023; 10:044504. [PMID: 37608852] [PMCID: PMC10440543] [DOI: 10.1117/1.jmi.10.4.044504] [Received: 01/01/2023] [Revised: 07/12/2023] [Accepted: 08/01/2023]
Abstract
Purpose Image-based prediction of coronavirus disease 2019 (COVID-19) severity and resource needs can be an important means to address the COVID-19 pandemic. In this study, we propose an artificial intelligence/machine learning (AI/ML) COVID-19 prognosis method to predict patients' needs for intensive care by analyzing chest X-ray radiography (CXR) images using deep learning. Approach The dataset consisted of 8357 CXR exams from 5046 COVID-19-positive patients as confirmed by reverse transcription polymerase chain reaction (RT-PCR) tests for the SARS-CoV-2 virus, with a training/validation/test split of 64%/16%/20% on a by-patient level. Our model involved a DenseNet121 network with a sequential transfer learning technique employed to train on a sequence of gradually more specific and complex tasks: (1) fine-tuning a model pretrained on ImageNet using a previously established CXR dataset with a broad spectrum of pathologies; (2) refining on another established dataset to detect pneumonia; and (3) fine-tuning using our in-house training/validation datasets to predict patients' needs for intensive care within 24, 48, 72, and 96 h following the CXR exams. The classification performances were evaluated on our independent test set (CXR exams of 1048 patients) using the area under the receiver operating characteristic curve (AUC) as the figure of merit in the task of distinguishing between those COVID-19-positive patients who required intensive care following the imaging exam and those who did not. Results Our proposed AI/ML model achieved an AUC (95% confidence interval) of 0.78 (0.74, 0.81) when predicting the need for intensive care 24 h in advance, and at least 0.76 (0.73, 0.80) for 48 h or more in advance using predictions based on the AI prognostic marker derived from CXR images. Conclusions This AI/ML prediction model for patients' needs for intensive care has the potential to support both clinical decision-making and resource management.
12. Temporal Machine Learning Analysis of Prior Mammograms for Breast Cancer Risk Prediction. Cancers (Basel) 2023; 15:2141. [PMID: 37046802] [PMCID: PMC10093086] [DOI: 10.3390/cancers15072141] [Received: 02/17/2023] [Revised: 03/24/2023] [Accepted: 03/29/2023]
Abstract
The identification of women at risk for sporadic breast cancer remains a clinical challenge. We hypothesize that the temporal analysis of annual screening mammograms, using a long short-term memory (LSTM) network, could accurately identify women at risk of future breast cancer. Women with an imaging abnormality, which had been biopsy-confirmed to be cancer or benign, who also had antecedent imaging available were included in this case-control study. Sequences of antecedent mammograms were retrospectively collected under HIPAA-approved guidelines. Radiomic and deep-learning-based features were extracted on regions of interest placed posterior to the nipple in antecedent images. These features were input to LSTM recurrent networks to classify whether the future lesion would be malignant or benign. Classification performance was assessed using all available antecedent time-points and using a single antecedent time-point in the task of lesion classification. Classifiers incorporating multiple time-points with LSTM, based either on deep-learning-extracted features or on radiomic features, tended to perform statistically better than chance, whereas those using only a single time-point failed to show improved performance compared to chance, as judged by area under the receiver operating characteristic curves (AUC: 0.63 ± 0.05, 0.65 ± 0.05, 0.52 ± 0.06 and 0.54 ± 0.06, respectively). Lastly, similar classification performance was observed when using features extracted from the affected versus the contralateral breast in predicting future unilateral malignancy (AUC: 0.63 ± 0.05 vs. 0.59 ± 0.06 for deep-learning-extracted features; 0.65 ± 0.05 vs. 0.62 ± 0.06 for radiomic features). The results of this study suggest that the incorporation of temporal information into radiomic analyses may improve the overall classification performance through LSTM, as demonstrated by the improved discrimination of future lesions as malignant or benign. 
Further, our data suggest that a potential field effect, changes in the breast extending beyond the lesion itself, is present in both the affected and contralateral breasts in antecedent imaging, and, thus, the evaluation of either breast might inform on the future risk of breast cancer.
13. Differences in Molecular Subtype Reference Standards Impact AI-based Breast Cancer Classification with Dynamic Contrast-enhanced MRI. Radiology 2023; 307:e220984. [PMID: 36594836] [PMCID: PMC10068887] [DOI: 10.1148/radiol.220984] [Received: 05/02/2022] [Revised: 10/20/2022] [Accepted: 11/01/2022]
Abstract
Background Breast cancer tumors can be identified as different luminal molecular subtypes depending on either immunohistochemical (IHC) staining or St Gallen criteria that includes Ki-67. Purpose To characterize molecular subtypes and understand the impact of disagreement among IHC and St Gallen molecular subtype reference standards on artificial intelligence classification of luminal A and luminal B tumors with use of radiomic features extracted from dynamic contrast-enhanced (DCE) MRI scans. Materials and Methods In this retrospective study, 28 radiomic features previously extracted from DCE-MRI scans of breast tumors imaged between February 2015 and October 2017 were examined in the following groups: (a) tumors classified as luminal A by both reference standards ("agreement"), (b) tumors classified as luminal A by IHC and luminal B by St Gallen ("disagreement"), and (c) tumors classified as luminal B by both ("agreement"). Luminal A or luminal B tumor classification with use of radiomic features was conducted with use of three sets: (a) IHC molecular subtyping, (b) St Gallen molecular subtyping, and (c) agreement tumors. The Kruskal-Wallis test was followed by the Mann-Whitney U test to determine pair-wise differences of radiomic features among agreement and disagreement tumors. Fivefold cross-validation with use of stepwise feature selection and linear discriminant analysis classified tumors in each set, with performance measured with use of area under the receiver operating characteristic curve (AUC). Results A total of 877 breast cancer tumors from 872 women (mean age, 48 years [range, 19-75 years]) were analyzed. Six features (sphericity, irregularity, surface area to volume ratio, variance of radial gradient histogram, sum average, volume of most enhancing voxels) were different (P ≤ .001) among agreement and disagreement tumors. 
Classification using the agreement tumors yielded a higher AUC (median, 0.74 [95% CI: 0.68, 0.80]) than classification using tumors subtyped by either reference standard alone (IHC, 0.66 [0.60, 0.71], P = .003; St Gallen, 0.62 [0.58, 0.67], P = .001). Conclusion Differences in reference standards can hinder artificial intelligence classification performance of luminal molecular subtypes with dynamic contrast-enhanced MRI. © RSNA, 2023 Supplemental material is available for this article. See also the editorial by Bae in this issue.
14. Patient-specific fetal radiation dosimetry for pregnant patients undergoing abdominal and pelvic CT imaging. Med Phys 2023. [PMID: 36799714] [DOI: 10.1002/mp.16304] [Received: 08/03/2022] [Revised: 01/18/2023] [Accepted: 01/31/2023]
Abstract
BACKGROUND Accurate estimation of fetal radiation dose is crucial for the risk-benefit analysis of radiological imaging, and dosimetry studies based on individual pregnant patients are highly desirable. PURPOSE To use Monte Carlo calculations to estimate fetal radiation dose from abdominal and pelvic computed tomography (CT) examinations for a population of patients with a range of variations in anatomy, abdominal circumference, gestational age (GA), fetal depth (FD), and fetal development. METHODS Forty-four patient-specific pregnant female models were constructed based on CT imaging data of pregnant patients, with gestational ages ranging from 8 to 35 weeks. Abdominal and pelvic helical CT examinations were simulated on three validated commercial scanner systems to calculate organ-level fetal radiation dose. RESULTS The absorbed radiation dose to the fetus ranged between 0.97 and 2.24 mGy, with an average of 1.63 ± 0.33 mGy. The CTDIvol-normalized fetal dose ranged between 0.56 and 1.30, with an average of 0.94 ± 0.25. The normalized fetal organ dose showed significant correlations with gestational age, maternal abdominal circumference (MAC), and fetal depth. The use of the automatic tube current modulation (ATCM) technique increased the fetal radiation dose in some patients. CONCLUSION A technique enabling the calculation of organ-level radiation dose to the fetus was developed from models of actual anatomy representing a range of gestational ages, maternal sizes, and fetal positions. The developed maternal and fetal models provide a basis for reliable and accurate radiation dose estimation for fetal organs.
|
15
|
Evaluation of emphysema on thoracic low-dose CTs through attention-based multiple instance deep learning. Sci Rep 2023; 13:1187. [PMID: 36681685 PMCID: PMC9867724 DOI: 10.1038/s41598-023-27549-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2022] [Accepted: 01/04/2023] [Indexed: 01/22/2023] Open
Abstract
In addition to lung cancer, other thoracic abnormalities such as emphysema can be visualized in low-dose CT (LDCT) scans originally obtained in cancer screening programs, so opportunistic evaluation of these diseases may be highly valuable. However, manual assessment of each scan is tedious and often subjective; we therefore developed an automatic, rapid computer-aided diagnosis system for emphysema using attention-based multiple instance deep learning and 865 LDCTs. In the task of determining whether a CT scan presented with emphysema, our novel Transfer AMIL approach yielded an area under the ROC curve of 0.94 ± 0.04, a statistically significant improvement over the other methods evaluated in our study according to the DeLong test with correction for multiple comparisons. Further, from our novel attention-weight curves, we found that the upper lung demonstrated a stronger influence in all scan classes, indicating that the model prioritized upper-lobe information. Overall, our Transfer AMIL method yielded high performance and provided interpretable information by identifying the slices most influential to the classification decision, demonstrating strong potential for clinical implementation.
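As an illustration of the attention-based multiple-instance pooling idea this abstract describes, here is a minimal sketch. The feature dimensions and the random projection matrices are illustrative placeholders, not the paper's trained Transfer AMIL architecture; in practice `V` and `w` are learned end to end.

```python
import numpy as np

def attention_mil_pool(instance_feats, V, w):
    """Attention pooling over a bag of instance features.

    instance_feats: (n_instances, d) array, e.g. per-slice CNN embeddings.
    V: (d, h) projection matrix; w: (h,) attention vector (learned in a
    real model; random here for illustration).
    Returns the bag embedding and the per-instance attention weights.
    """
    scores = np.tanh(instance_feats @ V) @ w      # (n_instances,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over instances
    bag_embedding = weights @ instance_feats       # attention-weighted average
    return bag_embedding, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(10, 16))                  # 10 slices, 16-d features
V = rng.normal(size=(16, 8))
w = rng.normal(size=8)
bag, attn = attention_mil_pool(feats, V, w)
```

The attention weights `attn` are what give a slice-level interpretability signal: slices with high weight contributed most to the scan-level decision.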
|
16
|
Special Issue Editorial: The SPIE Medical Imaging Symposium Celebrates 50 Years. J Med Imaging (Bellingham) 2022; 9:S12200. [PMID: 36247334 PMCID: PMC9555234 DOI: 10.1117/1.jmi.9.s1.s12200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
The article introduces the JMI Special Issue Celebrating 50 Years of SPIE Medical Imaging.
|
17
|
Past, Present, and Future of Machine Learning and Artificial Intelligence for Breast Cancer Screening. JOURNAL OF BREAST IMAGING 2022; 4:451-459. [PMID: 38416954 DOI: 10.1093/jbi/wbac052] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2022] [Indexed: 03/01/2024]
Abstract
Breast cancer screening has evolved substantially over the past few decades because of advancements in new image acquisition systems and novel artificial intelligence (AI) algorithms. This review provides a brief overview of the history, current state, and future of AI in breast cancer screening and diagnosis, along with the challenges involved in developing AI systems. Although AI has been developed for interpretation tasks in breast cancer screening for decades, its potential to counter the subjective nature, and improve the efficiency, of human image interpretation continues to expand. Rapid advances in computational power and deep learning have greatly accelerated AI research, with promising performance in detection and classification tasks across imaging modalities. Most AI systems, based on human-engineered or deep learning methods, serve as concurrent or secondary readers, that is, as aids to radiologists for a specific, well-defined task. In the future, AI may be able to perform multiple integrated tasks, making decisions at or beyond the level of human ability. Artificial intelligence may also serve as a partial primary reader to streamline ancillary tasks, triaging cases or ruling out obviously normal cases. However, before AI is used as an independent, autonomous reader, various challenges need to be addressed, including explainability and interpretability, in addition to repeatability and generalizability, to ensure that AI will provide a significant clinical benefit to breast cancer screening across all populations.
|
18
|
AAPM Task Group 298: Recommendations on certificate program/alternative pathway candidate education and training. J Appl Clin Med Phys 2022; 23:e13777. [PMID: 36125203 PMCID: PMC9797172 DOI: 10.1002/acm2.13777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2022] [Revised: 08/05/2022] [Accepted: 08/17/2022] [Indexed: 01/01/2023] Open
Abstract
Entry into the field of clinical medical physics is most commonly accomplished through the completion of a Commission on Accreditation of Medical Physics Educational Programs (CAMPEP)-accredited graduate and residency program. To allow a mechanism to bring valuable expertise from other disciplines into clinical practice in medical physics, an "alternative pathway" approach was also established. To ensure those trainees who have completed a doctoral degree in physics or a related discipline have the appropriate background and didactic training in medical physics, certificate programs and a CAMPEP-accreditation process for these programs were initiated. However, medical physics-specific didactic, research, and clinical exposure of those entering medical physics residencies from these certificate programs is often comparatively modest when evaluated against individuals holding Master's and/or Doctoral degrees in CAMPEP-accredited graduate programs. In 2016, the AAPM approved the formation of Task Group (TG) 298, "Alternative Pathway Candidate Education and Training." The TG was charged with reviewing previous published recommendations for alternative pathway candidates and developing recommendations on the appropriate education and training of these candidates. This manuscript is a summary of the AAPM TG 298 report.
|
19
|
Construction of A Digital Fetus Library for Radiation Dosimetry. Med Phys 2022; 50:2577-2589. [PMID: 35962972 DOI: 10.1002/mp.15905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2022] [Revised: 05/12/2022] [Accepted: 07/08/2022] [Indexed: 11/06/2022] Open
Abstract
PURPOSE Accurate estimation of fetal absorbed dose and radiation risk is crucial for radiation protection and important for radiological imaging research owing to the high radiosensitivity of the fetus. Computational anthropomorphic models have been widely used in patient-specific radiation dosimetry calculations. In this work, we aim to build the first digital fetus library for more reliable and accurate radiation dosimetry studies. ACQUISITION AND VALIDATION METHODS Computed tomography (CT) images of the abdominal and pelvic regions of 46 pregnant females were segmented by experienced medical physicists. The segmented tissues/organs included the body contour, skeleton, uterus, liver, kidney, intestine, stomach, lung, bladder, gall bladder, spleen, and pancreas for the maternal body, and the placenta, amniotic fluid, fetal body, fetal brain, and fetal skeleton. Non-Uniform Rational B-Spline (NURBS) surfaces of each identified region were constructed manually using 3D modeling software. The Hounsfield unit (HU) values of each identified organ were gathered from the CT images and converted to tissue density. Organ volumes were further adjusted according to reference measurements for the developing fetus recommended by the World Health Organization (WHO) and the International Commission on Radiological Protection (ICRP). A series of anatomical parameters, including femur length (FL), humerus length (HL), biparietal diameter (BPD), fetal abdominal circumference (FAC), and head circumference (HC), were measured and compared with WHO recommendations. DATA FORMAT AND USAGE NOTES The first fetal patient-specific model library was developed, with the anatomical characteristics of each model derived from the corresponding patient, whose gestational age varies between 8 and 35 weeks. Voxelized models are represented as MCNP matrix input files describing the three-dimensional model of the fetus. The size distributions of each model are also provided in text files. All data are stored on Zenodo and are publicly accessible at the following link: https://zenodo.org/record/6471884. POTENTIAL APPLICATIONS The constructed fetal models and maternal anatomical characteristics are consistent with the corresponding patients. The resulting computational fetus could be used in radiation dosimetry studies to improve the reliability of fetal dosimetry and radiation risk assessment. The advantages of NURBS surfaces in adapting fetal postures and positions enable us to adequately assess their impact on radiation dosimetry calculations.
|
20
|
Impact of continuous learning on diagnostic breast MRI AI: evaluation on an independent clinical dataset. J Med Imaging (Bellingham) 2022; 9:034502. [DOI: 10.1117/1.jmi.9.3.034502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2021] [Accepted: 05/12/2022] [Indexed: 11/14/2022] Open
|
21
|
Comment on "Machine Learning for Early Detection of Hypoxic-Ischemic Brain Injury After Cardiac Arrest" Submitted by Noah Salomon Molinski et al. Neurocrit Care 2022; 37:365-366. [PMID: 35612784 PMCID: PMC9131979 DOI: 10.1007/s12028-022-01527-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2022] [Accepted: 04/22/2022] [Indexed: 11/29/2022]
|
22
|
Specific in situ inflammatory states associate with progression to renal failure in lupus nephritis. J Clin Invest 2022; 132:155350. [PMID: 35608910 PMCID: PMC9246394 DOI: 10.1172/jci155350] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2021] [Accepted: 05/19/2022] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND In human lupus nephritis (LN), tubulointerstitial inflammation (TII) on biopsy predicts progression to end-stage renal disease (ESRD). However, only about half of patients with moderate-to-severe TII develop ESRD. We hypothesized that this heterogeneity in outcome reflects different underlying inflammatory states. Therefore, we interrogated renal biopsies from longitudinal and cross-sectional LN cohorts. METHODS Data were acquired using conventional and highly multiplexed confocal microscopy. To accurately segment cells across whole biopsies, and to understand their spatial relationships, we developed computational pipelines by training and implementing several deep-learning models and other computer vision techniques. RESULTS High B cell densities were associated with protection from ESRD. In contrast, high densities of CD8+, γδ, and other CD4–CD8– T cells were associated with both acute renal failure and progression to ESRD. B cells were often organized into large periglomerular neighborhoods with Tfh cells, while CD4– T cells formed small neighborhoods in the tubulointerstitium, at a frequency that predicted progression to ESRD. CONCLUSION These data reveal that specific in situ inflammatory states are associated with refractory and progressive renal disease. FUNDING This study was funded by the NIH Autoimmunity Centers of Excellence (AI082724), the Department of Defense (LRI180083), the Alliance for Lupus Research, and NIH awards S10-OD025081, S10-RR021039, and P30-CA14599.
|
23
|
SPIE Computer-Aided Diagnosis conference anniversary review. J Med Imaging (Bellingham) 2022; 9:012208. [DOI: 10.1117/1.jmi.9.s1.012208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2022] [Accepted: 04/13/2022] [Indexed: 11/14/2022] Open
|
24
|
Performance metric curve analysis framework to assess impact of the decision variable threshold, disease prevalence, and dataset variability in two-class classification. J Med Imaging (Bellingham) 2022; 9:035502. [PMID: 35656541 PMCID: PMC9152992 DOI: 10.1117/1.jmi.9.3.035502] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2021] [Accepted: 05/11/2022] [Indexed: 08/23/2023] Open
Abstract
Purpose: The aim of this study is to (1) demonstrate a graphical method and interpretation framework to extend performance evaluation beyond receiver operating characteristic curve analysis and (2) assess the impact of disease prevalence and variability in training and testing sets, particularly when a specific operating point is used. Approach: The proposed performance metric curves (PMCs) simultaneously assess sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), and the 95% confidence intervals thereof, as a function of the threshold for the decision variable. We investigated the utility of PMCs using six example operating points associated with commonly used methods for selecting operating points (including the Youden index and maximum mutual information). As an example, we applied PMCs to the task of distinguishing between malignant and benign breast lesions using human-engineered radiomic features extracted from dynamic contrast-enhanced magnetic resonance images. The dataset had 1885 lesions, with the images acquired in 2015 and 2016 serving as the training set (1450 lesions) and those acquired in 2017 as the test set (435 lesions). Our study used this dataset in two ways: (1) the clinical dataset itself and (2) simulated datasets with features based on the clinical set but with five different disease prevalences. The median and 95% CI of the number of type I (false positive) and type II (false negative) errors were determined for each operating point of interest. Results: PMCs from both the clinical and simulated datasets demonstrated that they can support interpretation of how the choice of decision threshold affects type I and type II classification errors, particularly in relation to prevalence. Conclusion: PMCs allow simultaneous evaluation of the four performance metrics of sensitivity, specificity, PPV, and NPV as a function of the decision threshold, and may foster a better understanding of two-class classifier performance in machine learning.
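The four metrics traced by a performance metric curve can be computed directly from a continuous decision variable at each candidate threshold. This is a minimal sketch with synthetic scores and labels, not the authors' implementation (which also derives 95% confidence intervals):

```python
import numpy as np

def metrics_at_threshold(scores, labels, thresh):
    """Sensitivity, specificity, PPV, and NPV at one decision threshold.

    scores: continuous decision variable; labels: 1 = diseased, 0 = not.
    Cases with score >= thresh are called positive.
    """
    pred = scores >= thresh
    tp = np.sum(pred & (labels == 1))   # true positives
    tn = np.sum(~pred & (labels == 0))  # true negatives
    fp = np.sum(pred & (labels == 0))   # type I errors
    fn = np.sum(~pred & (labels == 1))  # type II errors
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return sens, spec, ppv, npv

scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])  # toy classifier output
labels = np.array([0, 0, 1, 1, 1, 0])
sens, spec, ppv, npv = metrics_at_threshold(scores, labels, 0.3)
```

Sweeping `thresh` over the range of `scores` and plotting all four values against it yields the PMC described above; PPV and NPV are the prevalence-dependent pair.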
|
25
|
A machine-learning algorithm for distinguishing malignant from benign indeterminate thyroid nodules using ultrasound radiomic features. J Med Imaging (Bellingham) 2022; 9:034501. [PMID: 35692282 PMCID: PMC9133922 DOI: 10.1117/1.jmi.9.3.034501] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2021] [Accepted: 05/11/2022] [Indexed: 11/02/2023] Open
Abstract
Background: Ultrasound (US)-guided fine needle aspiration (FNA) cytology is the gold standard for the evaluation of thyroid nodules. However, up to 30% of FNA results are indeterminate, requiring further testing. In this study, we present a machine-learning analysis of indeterminate thyroid nodules on ultrasound with the aim of improving cancer diagnosis. Methods: Ultrasound images were collected from two institutions and labeled according to their FNA (F) and surgical pathology (S) diagnoses: malignant (M), benign (B), and indeterminate (I). The subgroup breakdown (FS) included 90 BB, 83 IB, 70 MM, and 59 IM thyroid nodules. Margins of thyroid nodules were manually annotated, and computerized radiomic texture analysis was conducted within the tumor contours. The initial investigation used a five-fold cross-validation paradigm with a two-class Bayesian artificial neural network classifier, including stepwise feature selection. Testing was conducted on an independent set and compared with a commercial molecular testing platform. Performance was evaluated using receiver operating characteristic analysis in the task of distinguishing between malignant and benign nodules. Results: A total of 1052 ultrasound images from 302 thyroid nodules were used for radiomic feature extraction and analysis. On the training/validation set comprising 263 nodules, five-fold cross-validation yielded areas under the curve (AUCs) of 0.75 [standard error (SE) = 0.04; P < 0.001] and 0.67 (SE = 0.05; P = 0.0012) for the classification tasks of MM versus BB and IM versus IB, respectively. On an independent test set of 19 IM/IB cases, the algorithm for distinguishing indeterminate nodules yielded an AUC of 0.88 (SE = 0.09; P < 0.001), higher than that of a commercially available molecular testing platform (AUC = 0.81, SE = 0.11; P < 0.005). Conclusion: Machine learning of computer-extracted texture features on gray-scale ultrasound images showed promising results for classifying indeterminate thyroid nodules according to their surgical pathology.
|
26
|
Machine Learning for Early Detection of Hypoxic-Ischemic Brain Injury After Cardiac Arrest. Neurocrit Care 2021; 36:974-982. [PMID: 34873672 PMCID: PMC8647961 DOI: 10.1007/s12028-021-01405-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2021] [Accepted: 11/16/2021] [Indexed: 11/25/2022]
Abstract
Background Establishing whether a patient who survived a cardiac arrest has suffered hypoxic-ischemic brain injury (HIBI) shortly after return of spontaneous circulation (ROSC) can be of paramount importance for informing families and identifying patients who may benefit the most from neuroprotective therapies. We hypothesized that using deep transfer learning on normal-appearing head computed tomography (HCT) scans performed after ROSC would allow us to identify early evidence of HIBI. Methods We analyzed 54 adult comatose survivors of cardiac arrest for whom both an initial HCT scan, obtained early after ROSC, and a follow-up HCT scan were available. The initial HCT scan of each included patient was read as normal by a board-certified neuroradiologist. Deep transfer learning was used to evaluate the initial HCT scan and predict progression of HIBI on the follow-up HCT scan. A naive set of 16 additional patients was used for external validation of the model. Results The median age (interquartile range) of our cohort was 61 (16) years, and 25 (46%) patients were female. Although all initial HCT scans appeared normal, follow-up HCT scans showed signs of HIBI in 29 (54%) patients (computed tomography progression). Evaluating the first HCT scan with deep transfer learning accurately predicted progression to HIBI. The deep learning score was the most significant predictor of progression (area under the receiver operating characteristic curve = 0.96 [95% confidence interval 0.91–1.00]), with a deep learning score of 0.494 yielding a sensitivity of 1.00, specificity of 0.88, accuracy of 0.94, and positive predictive value of 0.91. An additional assessment on an independent test set confirmed high performance (area under the receiver operating characteristic curve = 0.90 [95% confidence interval 0.74–1.00]). Conclusions Deep transfer learning applied to normal-appearing HCT scans obtained early after ROSC in comatose survivors of cardiac arrest accurately identifies patients who progress to show radiographic evidence of HIBI on follow-up HCT scans.
|
27
|
A review of explainable and interpretable AI with applications in COVID-19 imaging. Med Phys 2021; 49:1-14. [PMID: 34796530 PMCID: PMC8646613 DOI: 10.1002/mp.15359] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2021] [Revised: 10/14/2021] [Accepted: 10/25/2021] [Indexed: 12/24/2022] Open
Abstract
The development of medical imaging artificial intelligence (AI) systems for evaluating COVID-19 patients has demonstrated potential for improving clinical decision making and assessing patient outcomes during the recent COVID-19 pandemic. These systems have been applied to many medical imaging tasks, including disease diagnosis and patient prognosis, and have augmented other clinical measurements to better inform treatment decisions. Because these systems are used in life-or-death decisions, clinical implementation relies on user trust in the AI output. This has led many developers to employ explainability techniques in an attempt to help users understand when an AI algorithm is likely to succeed and which cases may be problematic for automatic assessment, thus increasing the potential for rapid clinical translation. AI application to COVID-19 has recently been marred by controversy. This review discusses several aspects of explainable and interpretable AI as they pertain to the evaluation of COVID-19 disease and how they can restore trust in AI applications to this disease. This includes the identification of common tasks relevant to explainable medical imaging AI, an overview of several modern approaches for producing explainable output as appropriate for a given imaging scenario, a discussion of how to evaluate explainable AI, and recommendations for best practices in explainable/interpretable AI implementation. This review will allow developers of AI systems for COVID-19 to quickly understand the basics of several explainable AI techniques and will assist in the selection of an approach that is both appropriate and effective for a given scenario.
|
28
|
Abstract
This article gives a brief overview of the development of artificial intelligence (AI) in clinical breast imaging. For multiple decades, AI methods have been developed and translated for breast imaging tasks such as detection, diagnosis, and assessing response to therapy. As imaging modalities have arisen to support breast cancer screening programs and diagnostic examinations, including full-field digital mammography, breast tomosynthesis, ultrasound, and MRI, AI techniques have paralleled these efforts with more complex algorithms, faster computers, and larger data sets. AI methods include human-engineered radiomics algorithms and deep learning methods. Examples of these AI-supported clinical tasks are given, along with commentary on the future.
|
29
|
Robustness of radiomic features of benign breast lesions and hormone receptor positive/HER2-negative cancers across DCE-MR magnet strengths. Magn Reson Imaging 2021; 82:111-121. [PMID: 34174331 PMCID: PMC8386988 DOI: 10.1016/j.mri.2021.06.021] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2020] [Revised: 06/19/2021] [Accepted: 06/21/2021] [Indexed: 02/07/2023]
Abstract
Radiomic features extracted from breast lesion images have shown potential in the diagnosis and prognosis of breast cancer. As medical centers transition from 1.5 T to 3.0 T magnetic resonance (MR) imaging, it is beneficial to identify radiomic features that are potentially robust across field strengths, because images acquired at different field strengths could be used together in machine learning models. Dynamic contrast-enhanced MR images of benign breast lesions and hormone receptor positive/HER2-negative (HR+/HER2-) breast cancers were acquired retrospectively, yielding 612 unique cases: 150 and 99 benign lesions imaged at 1.5 T and 3.0 T, and 223 and 140 HR+/HER2- cancerous lesions imaged at 1.5 T and 3.0 T, respectively. In addition, an independent set of seven lesions imaged at both field strengths, three benign lesions and four HR+/HER2- cancers, was analyzed separately. Lesions were automatically segmented using a 4D fuzzy c-means method, and thirty-eight radiomic features were extracted. Feature value distributions were compared by cancer status and imaging field strength using the Kolmogorov-Smirnov test. Features that did not demonstrate a statistically significant difference were considered potentially robust. The area under the receiver operating characteristic curve (AUC), for the task of classifying lesions as benign or HR+/HER2- cancer, was determined for each feature at each field strength. Three features were found to be both potentially robust across field strengths and of high classification performance, i.e., with AUCs statistically greater than 0.5 in the classification task: one shape feature (irregularity), one texture feature (sum average), and one enhancement variance kinetics feature (enhancement variance increasing rate). In the demonstration set of lesions imaged at both field strengths, two of the three potentially robust features showed qualitative agreement across field strengths. These findings may contribute to the development of computer-aided diagnosis models that are robust across field strengths for this classification task.
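The robustness screen described above, comparing a feature's distribution across field strengths and then checking its discriminative power, can be sketched with a two-sample Kolmogorov-Smirnov statistic and a rank-based AUC. The tiny feature-value arrays below are synthetic placeholders, not the study's data, and the significance test on the KS statistic is omitted:

```python
import numpy as np

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic (maximum ECDF gap)."""
    allv = np.sort(np.concatenate([x, y]))
    cdf_x = np.searchsorted(np.sort(x), allv, side="right") / len(x)
    cdf_y = np.searchsorted(np.sort(y), allv, side="right") / len(y)
    return np.max(np.abs(cdf_x - cdf_y))

def auc_pairwise(neg, pos):
    """AUC as the probability a positive case outscores a negative one
    (ties count half), per the Mann-Whitney relation."""
    neg = np.asarray(neg, dtype=float)[:, None]
    pos = np.asarray(pos, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

# Toy values of one radiomic feature by lesion class:
benign = np.array([0.1, 0.2, 0.3, 0.4])
cancer = np.array([0.35, 0.5, 0.6, 0.7])
shift = ks_statistic(benign, cancer)  # distribution gap (by class or field strength)
auc = auc_pairwise(benign, cancer)    # classification performance of the feature
```

A feature would be kept as "potentially robust" when its 1.5 T vs. 3.0 T KS comparison is not significant, and flagged as useful when its benign-vs-cancer AUC is statistically above 0.5.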
|
30
|
Artificial Intelligence and Cellular Segmentation in Tissue Microscopy Images. THE AMERICAN JOURNAL OF PATHOLOGY 2021; 191:1693-1701. [PMID: 34129842 PMCID: PMC8485056 DOI: 10.1016/j.ajpath.2021.05.022] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/19/2021] [Revised: 05/07/2021] [Accepted: 05/17/2021] [Indexed: 02/05/2023]
Abstract
With applications in object detection, image feature extraction, image classification, and image segmentation, artificial intelligence is facilitating high-throughput analysis of image data in a variety of biomedical imaging disciplines, ranging from radiology and pathology to cancer biology and immunology. Specifically, growth in deep learning research has led to the widespread application of computer-vision techniques for analyzing and mining data from biomedical images. The availability of open-source software packages and the development of novel, trainable deep neural network architectures have led to increased accuracy in cell detection and segmentation algorithms. By automating cell segmentation, it is now possible to mine quantifiable cellular and spatio-cellular features from microscopy images, providing insight into the organization of cells in various pathologies. This mini-review provides an overview of the current state of the art in deep learning- and artificial intelligence-based methods for the segmentation and data mining of cells in microscopy images of tissue.
|
31
|
Multi-Stage Harmonization for Robust AI across Breast MR Databases. Cancers (Basel) 2021; 13:cancers13194809. [PMID: 34638294 PMCID: PMC8508003 DOI: 10.3390/cancers13194809] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2021] [Revised: 09/16/2021] [Accepted: 09/18/2021] [Indexed: 12/22/2022] Open
Abstract
Simple Summary: Batch harmonization of radiomic features extracted from magnetic resonance images of breast lesions from two databases was applied to an artificial intelligence/machine learning classification workflow. Training and independent test sets from the two databases, as well as their combination, were used in pre-harmonization and post-harmonization forms to investigate the generalizability of performance in the task of distinguishing between malignant and benign lesions. Most training and independent test scenarios were statistically equivalent, demonstrating that batch harmonization with feature selection harmonization can potentially yield generalizable classification models. Abstract: Radiomic features extracted from medical images may demonstrate a batch effect when cases come from different sources. We investigated classification performance using training and independent test sets drawn from two sources, using both pre-harmonization and post-harmonization features. In this retrospective study, a database of thirty-two radiomic features, extracted from DCE-MR images of breast lesions after fuzzy c-means segmentation, was collected. There were 944 unique lesions in Database A (208 benign lesions, 736 cancers) and 1986 unique lesions in Database B (481 benign lesions, 1505 cancers). The lesions from each database were divided by year of image acquisition into training and independent test sets, separately by database and in combination. ComBat batch harmonization was conducted on the combined training set to minimize the batch effect by database on eligible features. The empirical Bayes estimates from the feature harmonization were applied to the eligible features of the combined independent test set. The training sets (A, B, and combined) were then used to train linear discriminant analysis classifiers after stepwise feature selection. The classifiers were then run on the A, B, and combined independent test sets. Classification performance was compared between pre-harmonization and post-harmonization features, including their corresponding feature selection, using the area under the receiver operating characteristic curve (AUC) as the figure of merit. Four out of five training and independent test scenarios demonstrated statistically equivalent classification performance pre- and post-harmonization. These results demonstrate that translating machine learning techniques with batch data harmonization can potentially yield generalizable models that maintain classification performance.
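ComBat itself fits an empirical Bayes model that shrinks per-batch estimates; the sketch below shows only the simpler location/scale alignment idea it builds on, with synthetic features standing in for the two databases. It is an illustration of the batch-effect concept, not the study's ComBat implementation:

```python
import numpy as np

def locscale_harmonize(features, batch):
    """Simplified location/scale batch alignment (a stand-in for ComBat;
    real ComBat additionally shrinks batch estimates via empirical Bayes).

    features: (n_cases, n_features) array; batch: (n_cases,) batch labels.
    Each batch is standardized, then mapped to the pooled mean and std.
    """
    out = np.empty_like(features, dtype=float)
    grand_mu = features.mean(axis=0)
    grand_sd = features.std(axis=0)
    for b in np.unique(batch):
        m = batch == b
        mu = features[m].mean(axis=0)
        sd = features[m].std(axis=0)
        out[m] = (features[m] - mu) / sd * grand_sd + grand_mu
    return out

rng = np.random.default_rng(1)
feat_a = rng.normal(0.0, 1.0, size=(50, 3))  # "Database A" feature values
feat_b = rng.normal(2.0, 3.0, size=(60, 3))  # "Database B": shifted and rescaled
X = np.vstack([feat_a, feat_b])
batch = np.array(["A"] * 50 + ["B"] * 60)
Xh = locscale_harmonize(X, batch)            # per-batch moments now agree
```

In the study's workflow the transformation parameters are estimated on the combined training set only and then applied to the held-out test cases, so no test-set information leaks into training.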
|
32
|
Improved Classification of Benign and Malignant Breast Lesions Using Deep Feature Maximum Intensity Projection MRI in Breast Cancer Diagnosis Using Dynamic Contrast-enhanced MRI. Radiol Artif Intell 2021; 3:e200159. [PMID: 34235439 PMCID: PMC8231792 DOI: 10.1148/ryai.2021200159] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2020] [Revised: 02/04/2021] [Accepted: 02/09/2021] [Indexed: 04/16/2023]
Abstract
PURPOSE To develop a deep transfer learning method that incorporates four-dimensional (4D) information in dynamic contrast-enhanced (DCE) MRI to classify benign and malignant breast lesions. MATERIALS AND METHODS The retrospective dataset was composed of 1990 distinct lesions (1494 malignant and 496 benign) from 1979 women (mean age, 47 years ± 10). Lesions were split into a training and validation set of 1455 lesions (acquired in 2015-2016) and an independent test set of 535 lesions (acquired in 2017). Features were extracted from a convolutional neural network (CNN), and lesions were classified as benign or malignant using support vector machines. Volumetric information was collapsed into two dimensions by taking the maximum intensity projection (MIP) at the image level or at the feature level within the CNN architecture. Performance was evaluated using the area under the receiver operating characteristic curve (AUC) as the figure of merit and compared using the DeLong test. RESULTS The image MIP and feature MIP methods yielded AUCs of 0.91 (95% CI: 0.87, 0.94) and 0.93 (95% CI: 0.91, 0.96), respectively, on the independent test set. The feature MIP method achieved higher performance than the image MIP method (∆AUC 95% CI: 0.003, 0.051; P = .03). CONCLUSION Incorporating 4D information in DCE MRI by MIP of features in deep transfer learning demonstrated superior classification performance compared with using MIP images as input in the task of distinguishing between benign and malignant breast lesions. Keywords: Breast, Computer Aided Diagnosis (CAD), Convolutional Neural Network (CNN), MR-Dynamic Contrast Enhanced, Supervised learning, Support vector machines (SVM), Transfer learning, Volume Analysis © RSNA, 2021.
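The contrast between the two MIP strategies compared above reduces to where the `max` over slices is taken: before the CNN (on image intensities) or inside it (on per-slice feature maps, preserving the channel dimension). This NumPy sketch uses illustrative array shapes, not the paper's actual network:

```python
import numpy as np

def image_mip(volume):
    """Image-level MIP: collapse a 3D volume (slices, H, W) to a single
    2D image before it is fed to the CNN."""
    return volume.max(axis=0)

def feature_mip(per_slice_features):
    """Feature-level MIP: collapse per-slice CNN feature maps
    (slices, C, H, W) inside the network, keeping the channels."""
    return per_slice_features.max(axis=0)

vol = np.random.default_rng(2).normal(size=(32, 64, 64))       # 32 DCE slices
feats = np.random.default_rng(3).normal(size=(32, 128, 8, 8))  # per-slice feature maps
img2d = image_mip(vol)     # (64, 64) input image
fmap = feature_mip(feats)  # (128, 8, 8) pooled feature map
```

Taking the maximum after feature extraction lets each slice contribute its own learned responses before pooling, which is one plausible reading of why the feature-MIP variant performed better.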
|
33
|
Enhanced detection of oral dysplasia by structured illumination fluorescence lifetime imaging microscopy. Sci Rep 2021; 11:4984. [PMID: 33654229 PMCID: PMC7925521 DOI: 10.1038/s41598-021-84552-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/18/2020] [Accepted: 02/17/2021] [Indexed: 12/15/2022]
Abstract
We demonstrate that structured illumination microscopy has the potential to enhance fluorescence lifetime imaging microscopy (FLIM) as an early detection method for oral squamous cell carcinoma. FLIM can be used to monitor or detect changes in the fluorescence lifetime of metabolic cofactors (e.g. NADH and FAD) associated with the onset of carcinogenesis. However, out-of-focus fluorescence often interferes with this lifetime measurement. Structured illumination fluorescence lifetime imaging (SI-FLIM) addresses this by providing depth-resolved lifetime measurements and, applied to oral mucosa, can localize the collected signal to the epithelium. In this study, the hamster model of oral carcinogenesis was used to evaluate SI-FLIM in premalignant and malignant oral mucosa. Cheek pouches were imaged in vivo and correlated to histopathological diagnoses. The potential of NADH fluorescence signal and lifetime, as measured by widefield FLIM and SI-FLIM, to differentiate dysplasia (pre-malignancy) from normal tissue was evaluated. ROC analysis was carried out with the task of discriminating between normal tissue and mild dysplasia, when changes in fluorescence characteristics are localized to the epithelium only. The results demonstrate that SI-FLIM (AUC = 0.83) is a significantly better (p-value = 0.031) marker for mild dysplasia when compared to widefield FLIM (AUC = 0.63).
|
34
|
Lessons learned in transitioning to AI in the medical imaging of COVID-19. J Med Imaging (Bellingham) 2021; 8:010902. [PMID: 34646912 PMCID: PMC8488974 DOI: 10.1117/1.jmi.8.s1.010902] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/12/2021] [Accepted: 09/20/2021] [Indexed: 12/12/2022]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has wreaked havoc across the world. It also created an urgent need for efficacious predictive diagnostics, specifically artificial intelligence (AI) methods applied to medical imaging. This need has brought together experts from multiple disciplines, including clinicians, medical physicists, imaging scientists, computer scientists, and informatics experts, to bring the best of these fields to bear on the challenges of the COVID-19 pandemic. However, such a convergence over a very brief period of time has had unintended consequences and created its own challenges. As part of the Medical Imaging Data and Resource Center (MIDRC) initiative, we discuss the lessons learned from career transitions across the three involved disciplines (radiology, medical imaging physics, and computer science) and draw recommendations from these experiences by analyzing the challenges associated with each of the three transition types: (1) AI of non-imaging data to AI of medical imaging data, (2) medical imaging clinician to AI of medical imaging, and (3) AI of medical imaging to AI of COVID-19 imaging. The diffusion of knowledge among these disciplines can be accomplished more effectively by recognizing the intricacies of each transition. These lessons can inform and enhance future AI applications, making the whole of the transitions more than the sum of each discipline, for confronting an emergency like the COVID-19 pandemic or solving emerging problems in biomedicine.
|
35
|
Quantifying the effects of biopsy fixation and staining panel design on automatic instance segmentation of immune cells in human lupus nephritis. JOURNAL OF BIOMEDICAL OPTICS 2021; 26:JBO-200195SSR. [PMID: 33420765 PMCID: PMC7791891 DOI: 10.1117/1.jbo.26.2.022910] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/29/2020] [Accepted: 12/11/2020] [Indexed: 06/12/2023]
Abstract
SIGNIFICANCE Lupus nephritis (LuN) is a chronic inflammatory kidney disease. The cellular mechanisms by which LuN progresses to kidney failure are poorly characterized. Automated instance segmentation of immune cells in immunofluorescence images of LuN can probe these cellular interactions. AIM Our specific goal is to quantify how sample fixation and staining panel design impact automated instance segmentation and characterization of immune cells. APPROACH Convolutional neural networks (CNNs) were trained to segment immune cells in fluorescence confocal images of LuN biopsies. Three datasets were used to probe the effects of fixation methods on cell features and the effects of one-marker versus two-marker per cell staining panels on CNN performance. RESULTS Networks trained for multi-class instance segmentation on fresh-frozen and formalin-fixed, paraffin-embedded (FFPE) samples stained with a two-marker panel had sensitivities of 0.87 and 0.91 and specificities of 0.82 and 0.88, respectively. Training on samples with a one-marker panel reduced sensitivity (0.72). Cell size and intercellular distances were significantly smaller in FFPE samples compared to fresh-frozen samples (Kolmogorov-Smirnov, p ≪ 0.0001). CONCLUSIONS Fixation method significantly reduces cell size and intercellular distances in LuN biopsies. The use of two markers to identify cell subsets showed improved CNN sensitivity relative to using a single marker.
|
36
|
Contributors. Mol Imaging 2021. [DOI: 10.1016/b978-0-12-816386-3.01004-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
37
|
List of contributors. Artif Intell Med 2021. [DOI: 10.1016/b978-0-12-821259-2.00035-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
38
|
Role of standard and soft tissue chest radiography images in deep-learning-based early diagnosis of COVID-19. J Med Imaging (Bellingham) 2021; 8:014503. [PMID: 34595245 PMCID: PMC8478672 DOI: 10.1117/1.jmi.8.s1.014503] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/26/2021] [Accepted: 09/13/2021] [Indexed: 12/24/2022]
Abstract
Purpose: We propose a deep learning method for the automatic diagnosis of COVID-19 at patient presentation on chest radiography (CXR) images and investigate the role of standard and soft tissue CXR in this task. Approach: The dataset consisted of the first CXR exams of 9860 patients acquired within 2 days after their initial reverse transcription polymerase chain reaction tests for the SARS-CoV-2 virus, 1523 (15.5%) of whom tested positive and 8337 (84.5%) of whom tested negative for COVID-19. A sequential transfer learning strategy was employed to fine-tune a convolutional neural network in phases on increasingly specific and complex tasks. The COVID-19 positive/negative classification was performed on standard images, soft tissue images, and both combined via feature fusion. A U-Net variant was used to segment and crop the lung region from each image prior to classification. Classification performances were evaluated and compared on a held-out test set of 1972 patients using the area under the receiver operating characteristic curve (AUC) and the DeLong test. Results: Using full standard, cropped standard, cropped soft tissue, and both types of cropped CXR yielded AUC values of 0.74 [0.70, 0.77], 0.76 [0.73, 0.79], 0.73 [0.70, 0.76], and 0.78 [0.74, 0.81], respectively. Soft tissue images significantly underperformed standard images, and using both types of CXR failed to significantly outperform using standard images alone. Conclusions: The proposed method was able to automatically diagnose COVID-19 at patient presentation with promising performance, and the inclusion of soft tissue images did not result in a significant performance improvement.
|
39
|
Cascaded deep transfer learning on thoracic CT in COVID-19 patients treated with steroids. J Med Imaging (Bellingham) 2021; 8:014501. [PMID: 33415179 PMCID: PMC7773028 DOI: 10.1117/1.jmi.8.s1.014501] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/09/2020] [Accepted: 11/04/2020] [Indexed: 12/15/2022]
Abstract
Purpose: Given the recent COVID-19 pandemic and its stress on global medical resources, presented here is the development of a machine intelligent method for thoracic computed tomography (CT) to inform management of patients on steroid treatment. Approach: Transfer learning has demonstrated strong performance when applied to medical imaging, particularly when only limited data are available. A cascaded transfer learning approach extracted quantitative features from thoracic CT sections using a fine-tuned VGG19 network. The extracted slice features were axially pooled to provide a CT-scan-level representation of thoracic characteristics and a support vector machine was trained to distinguish between patients who required steroid administration and those who did not, with performance evaluated through receiver operating characteristic (ROC) curve analysis. Least-squares fitting was used to assess temporal trends using the transfer learning approach, providing a preliminary method for monitoring disease progression. Results: In the task of identifying patients who should receive steroid treatments, this approach yielded an area under the ROC curve of 0.85 ± 0.10 and demonstrated significant separation between patients who received steroids and those who did not. Furthermore, temporal trend analysis of the prediction score matched expected progression during hospitalization for both groups, with separation at early timepoints prior to convergence near the end of the duration of hospitalization. Conclusions: The proposed cascade deep learning method has strong clinical potential for informing clinical decision-making and monitoring patient treatment.
|
40
|
Automated mesenchymal stem cell segmentation and machine learning-based phenotype classification using morphometric and textural analysis. J Med Imaging (Bellingham) 2021; 8:014503. [PMID: 33542945 PMCID: PMC7849042 DOI: 10.1117/1.jmi.8.1.014503] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/27/2020] [Accepted: 01/11/2021] [Indexed: 01/22/2023]
Abstract
Purpose: Mesenchymal stem cells (MSCs) have demonstrated clinically relevant therapeutic effects for treatment of trauma and chronic diseases. The proliferative potential, immunomodulatory characteristics, and multipotentiality of MSCs in monolayer culture are reflected by their morphological phenotype. Standard techniques to evaluate culture viability are subjective, destructive, or time-consuming. We present an image analysis approach to objectively determine morphological phenotype of MSCs for prediction of culture efficacy. Approach: The algorithm was trained using phase-contrast micrographs acquired during the early and mid-logarithmic stages of MSC expansion. Cell regions are localized using edge detection, thresholding, and morphological operations, followed by cell marker identification using the H-minima transform within each region to differentiate individual cells from cell clusters. Clusters are segmented using marker-controlled watershed to obtain single cells. Morphometric and textural features are extracted to classify cells based on phenotype using machine learning. Results: Algorithm performance was validated using an independent test dataset of 186 MSCs in 36 culture images. Results show 88% sensitivity and 86% precision for overall cell detection and a mean Sorensen-Dice coefficient of 0.849 ± 0.106 for segmentation per image. The algorithm exhibited an area under the curve of 0.816 (95% CI: 0.769, 0.886) and 0.787 (95% CI: 0.716, 0.851) for classifying MSCs according to their phenotype at early and mid-logarithmic expansion, respectively. Conclusions: The proposed method shows potential to segment and classify low and moderately dense MSCs based on phenotype with high accuracy and robustness. It enables quantifiable and consistent morphology-based quality assessment for various culture protocols to facilitate cytotherapy development.
|
41
|
AI/Machine Learning in Medical Imaging. Mol Imaging 2021. [DOI: 10.1016/b978-0-12-816386-3.00052-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
42
|
Radiomics methodology for breast cancer diagnosis using multiparametric magnetic resonance imaging. J Med Imaging (Bellingham) 2020; 7:044502. [PMID: 32864390 PMCID: PMC7444714 DOI: 10.1117/1.jmi.7.4.044502] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/10/2020] [Accepted: 07/29/2020] [Indexed: 12/30/2022]
Abstract
Purpose: This study aims to develop and compare human-engineered radiomics methodologies that use multiparametric magnetic resonance imaging (mpMRI) to diagnose breast cancer. Approach: The dataset comprises clinical multiparametric MR images of 852 unique lesions from 612 patients. Each MR study included a dynamic contrast-enhanced (DCE)-MRI sequence and a T2-weighted (T2w) MRI sequence, and a subset of 389 lesions were also imaged with a diffusion-weighted imaging (DWI) sequence. Lesions were automatically segmented using the fuzzy C-means algorithm. Radiomic features were extracted from each MRI sequence. Two approaches to utilizing multiparametric information, feature fusion and classifier fusion, were investigated. A support vector machine classifier was trained for each method to differentiate between benign and malignant lesions. Area under the receiver operating characteristic curve (AUC) was used to evaluate and compare diagnostic performance. Analyses were first performed on the entire dataset and then on the subset that was imaged using the three-sequence protocol. Results: When using the full dataset, the single-parametric classifiers yielded the following AUCs and 95% confidence intervals: AUC_DCE = 0.84 [0.82, 0.87], AUC_T2w = 0.83 [0.80, 0.86], and AUC_DWI = 0.69 [0.62, 0.75]. The two multiparametric classifiers both yielded AUCs of 0.87 [0.84, 0.89] and significantly outperformed all single-parametric classifiers. When using the three-sequence subset, the mpMRI classifiers' performances significantly decreased. Conclusions: The proposed mpMRI radiomics methods can improve the performance of computer-aided diagnostics for breast cancer and handle missing sequences in the imaging protocol.
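The feature-fusion versus classifier-fusion distinction in this abstract can be illustrated with a toy stand-in for the paper's SVM (a nearest-centroid scorer here); the sequence names, dimensions, and synthetic data below are invented for illustration:

```python
import numpy as np

def centroid_score(train_X, train_y, test_X):
    """Toy stand-in for an SVM: score = distance to the benign centroid minus
    distance to the malignant centroid (higher = more malignant-like)."""
    c0 = train_X[train_y == 0].mean(axis=0)
    c1 = train_X[train_y == 1].mean(axis=0)
    return np.linalg.norm(test_X - c0, axis=1) - np.linalg.norm(test_X - c1, axis=1)

rng = np.random.default_rng(2)
n = 200
y = rng.integers(0, 2, n)
# Two hypothetical "sequences" (e.g. DCE and T2w radiomic feature sets).
dce = rng.normal(y[:, None] * 1.0, 1.0, size=(n, 5))
t2w = rng.normal(y[:, None] * 0.8, 1.0, size=(n, 4))
tr, te = slice(0, 150), slice(150, n)

# Feature fusion: concatenate per-sequence features, train one classifier.
fused = np.hstack([dce, t2w])
s_feat = centroid_score(fused[tr], y[tr], fused[te])

# Classifier fusion: one classifier per sequence, then average the scores.
s_clf = (centroid_score(dce[tr], y[tr], dce[te])
         + centroid_score(t2w[tr], y[tr], t2w[te])) / 2

print(s_feat.shape, s_clf.shape)  # (50,) (50,)
```

Classifier fusion degrades more gracefully when a sequence is missing (its classifier is simply dropped from the average), which relates to the abstract's point about handling missing sequences in the protocol.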
|
43
|
Artificial Intelligence: reshaping the practice of radiological sciences in the 21st century. Br J Radiol 2020; 93:20190855. [PMID: 31965813 DOI: 10.1259/bjr.20190855] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Indexed: 12/15/2022]
Abstract
Advances in computing hardware and software platforms have led to the recent resurgence in artificial intelligence (AI) touching almost every aspect of our daily lives by its capability for automating complex tasks or providing superior predictive analytics. AI applications are currently spanning many diverse fields from economics to entertainment, to manufacturing, as well as medicine. Since modern AI's inception decades ago, practitioners in radiological sciences have been pioneering its development and implementation in medicine, particularly in areas related to diagnostic imaging and therapy. In this anniversary article, we embark on a journey to reflect on the learned lessons from past AI's chequered history. We further summarize the current status of AI in radiological sciences, highlighting, with examples, its impressive achievements and effect on re-shaping the practice of medical imaging and radiotherapy in the areas of computer-aided detection, diagnosis, prognosis, and decision support. Moving beyond the commercial hype of AI into reality, we discuss the current challenges to overcome, for AI to achieve its promised hope of providing better precision healthcare for each patient while reducing cost burden on their families and the society at large.
|
44
|
Harmonization of radiomic features of breast lesions across international DCE-MRI datasets. J Med Imaging (Bellingham) 2020; 7:012707. [PMID: 32206682 PMCID: PMC7056633 DOI: 10.1117/1.jmi.7.1.012707] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/29/2019] [Accepted: 02/24/2020] [Indexed: 12/12/2022]
Abstract
Purpose: Radiomic features extracted from medical images acquired in different countries may demonstrate a batch effect. Thus, we investigated the effect of harmonization on a database of radiomic features extracted from dynamic contrast-enhanced magnetic resonance (DCE-MR) breast imaging studies of 3150 benign lesions and cancers collected from international datasets, as well as the potential of harmonization to improve classification of malignancy. Approach: Eligible features were harmonized by category using the ComBat method. The effect of harmonization on features was evaluated using the Davies-Bouldin index as a measure of the degree of clustering between populations, for both benign lesions and cancers. Performance in distinguishing between cancers and benign lesions was evaluated for each dataset using 10-fold cross validation, with the area under the receiver operating characteristic curve (AUC) determined on the pre- and postharmonization sets of radiomic features in each dataset and a combined one. Differences in AUCs were evaluated for statistical significance. Results: The Davies-Bouldin index increased by 27% for benign lesions and by 43% for cancers, indicating that the postharmonization features were more similar. Classification using postharmonization features performed better than classification using preharmonization features (p < 0.001 for all three). Conclusion: Harmonization of radiomic features may enable combining databases from different populations for more comprehensive computer-aided diagnosis models of breast cancer.
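The batch-effect removal this abstract evaluates can be sketched with a much-simplified location-scale version of the ComBat idea: align each site's per-feature mean and variance to the pooled values. This omits ComBat's empirical-Bayes shrinkage and covariate handling entirely, and the two-site synthetic data are invented for illustration:

```python
import numpy as np

def harmonize(features, batch):
    """Location-scale harmonization: shift/rescale each batch so its per-feature
    mean and standard deviation match the pooled (grand) values. A simplified
    sketch of the ComBat concept, not the full method."""
    features = np.asarray(features, dtype=float)
    out = np.empty_like(features)
    grand_mean = features.mean(axis=0)
    grand_std = features.std(axis=0)
    for b in np.unique(batch):
        idx = batch == b
        mu, sd = features[idx].mean(axis=0), features[idx].std(axis=0)
        out[idx] = (features[idx] - mu) / sd * grand_std + grand_mean
    return out

rng = np.random.default_rng(1)
site_a = rng.normal(0.0, 1.0, size=(100, 3))
site_b = rng.normal(2.0, 3.0, size=(100, 3))  # strong simulated batch effect
X = np.vstack([site_a, site_b])
batch = np.array([0] * 100 + [1] * 100)

Xh = harmonize(X, batch)
# After harmonization the two sites share per-feature means and variances.
print(np.allclose(Xh[:100].mean(axis=0), Xh[100:].mean(axis=0)))  # True
```

A caveat the full ComBat method addresses: naive per-batch standardization can also remove real biological differences between sites, which is why the paper harmonizes within feature categories and validates the effect on downstream classification.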
|
45
|
Comparison of Breast MRI Tumor Classification Using Human-Engineered Radiomics, Transfer Learning From Deep Convolutional Neural Networks, and Fusion Methods. PROCEEDINGS OF THE IEEE. INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS 2020; 108:163-177. [PMID: 34045769 PMCID: PMC8152568 DOI: 10.1109/jproc.2019.2950187] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 06/12/2023]
Abstract
Digital image-based signatures of breast tumors may ultimately contribute to the design of patient-specific breast cancer diagnostics and treatments. Beyond traditional human-engineered computer vision methods, tumor classification methods using transfer learning from deep convolutional neural networks (CNNs) are actively under development. This article will first discuss our progress in using CNN-based transfer learning to characterize breast tumors for various diagnostic, prognostic, or predictive image-based tasks across multiple imaging modalities, including mammography, digital breast tomosynthesis, ultrasound (US), and magnetic resonance imaging (MRI), compared to both human-engineered feature-based radiomics and fusion classifiers created through combination of such features. Second, a new study is presented that reports on a comprehensive comparison of the classification performances of features derived from human-engineered radiomic features, CNN transfer learning, and fusion classifiers for breast lesions imaged with MRI. These studies demonstrate the utility of transfer learning for computer-aided diagnosis and highlight the synergistic improvement in classification performance using fusion classifiers.
|
46
|
Independent validation of machine learning in diagnosing breast Cancer on magnetic resonance imaging within a single institution. Cancer Imaging 2019; 19:64. [PMID: 31533838 PMCID: PMC6751793 DOI: 10.1186/s40644-019-0252-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 07/05/2019] [Accepted: 09/11/2019] [Indexed: 11/30/2022]
Abstract
Background As artificial intelligence methods for the diagnosis of disease advance, we aimed to evaluate machine learning in the predictive task of distinguishing between malignant and benign breast lesions on an independent clinical magnetic resonance imaging (MRI) dataset within a single institution for subsequent use as a computer aid for radiologists. Methods Computer analysis was conducted on consecutive dynamic contrast-enhanced MRI (DCE-MRI) studies from 1483 breast cancer and 496 benign patients who underwent MRI examinations between February 2015 and October 2017, with the age ranges of the cancer and benign patients being 19 to 77 and 16 to 76 years old, respectively. Cases were separated into a training dataset (years 2015 & 2016; 1444 cases) and an independent testing dataset (year 2017; 535 cases) based solely on MRI examination date. After radiologist indication of the lesion, the computer automatically segmented and extracted radiomic features, which were subsequently merged with a support-vector machine (SVM) to yield a lesion signature. Area under the receiver operating characteristic (ROC) curve (AUC) with 95% confidence intervals (CI) served as the primary figure of merit in the statistical evaluation for this clinical classification task. Results In the task of distinguishing malignant and benign breast lesions on DCE-MRI, the trained predictive model yielded an AUC value of 0.89 (95% CI: 0.858, 0.922) on the independent image set. AUC values of 0.88 (95% CI: 0.845, 0.926) and 0.90 (95% CI: 0.837, 0.940) were obtained for mass lesions only and non-mass lesions only, respectively. Compared with actual clinical management decisions, the predictive model achieved 99.5% sensitivity with 9.6% fewer recommended biopsies. Conclusion On an independent, consecutive clinical dataset within a single institution, a trained machine learning system yielded promising performance in distinguishing between malignant and benign breast lesions.
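The AUC figure of merit used throughout these studies has a simple rank-based form via its relation to the Mann-Whitney U statistic: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting half. A minimal sketch with toy scores (not data from the study):

```python
import numpy as np

def auc(scores, labels):
    """AUC via the Mann-Whitney U relation: fraction of positive/negative
    pairs in which the positive case scores higher (ties count 0.5)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # 1.0 (perfect separation)
print(auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

Confidence intervals and significance tests on AUC differences (e.g. the DeLong test cited in several entries) build on this same pairwise-comparison structure.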
|
47
|
Artificial intelligence in the interpretation of breast cancer on MRI. J Magn Reson Imaging 2019; 51:1310-1324. [PMID: 31343790 DOI: 10.1002/jmri.26878] [Citation(s) in RCA: 82] [Impact Index Per Article: 16.4] [Received: 04/30/2019] [Accepted: 07/08/2019] [Indexed: 12/13/2022]
Abstract
Advances in both imaging and computers have led to the rise in the potential use of artificial intelligence (AI) in various tasks in breast imaging, going beyond the current use in computer-aided detection to include diagnosis, prognosis, response to therapy, and risk assessment. The automated capabilities of AI offer the potential to enhance the diagnostic expertise of clinicians, including accurate demarcation of tumor volume, extraction of characteristic cancer phenotypes, translation of tumoral phenotype features to clinical genotype implications, and risk prediction. The combination of image-specific findings with the underlying genomic, pathologic, and clinical features is becoming of increasing value in breast cancer. The concurrent emergence of newer imaging techniques has provided radiologists with greater diagnostic tools and image datasets to analyze and interpret. Integrating an AI-based workflow within breast imaging enables the integration of multiple data streams into powerful multidisciplinary applications that may lead the path to personalized patient-specific medicine. In this article we describe the goals of AI in breast cancer imaging, in particular MRI, and review the literature as it relates to the current application, potential, and limitations in breast cancer. Level of Evidence: 3 Technical Efficacy: Stage 3 J. Magn. Reson. Imaging 2020;51:1310-1324.
|
48
|
Breast MRI radiomics for the pretreatment prediction of response to neoadjuvant chemotherapy in node-positive breast cancer patients. J Med Imaging (Bellingham) 2019; 6:034502. [PMID: 31592438 PMCID: PMC6768440 DOI: 10.1117/1.jmi.6.3.034502] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/11/2019] [Accepted: 09/09/2019] [Indexed: 12/16/2022]
Abstract
The purpose of this study was to evaluate breast MRI radiomics in predicting, prior to any treatment, the response to neoadjuvant chemotherapy (NAC) in patients with invasive lymph node (LN)-positive breast cancer for two tasks: (1) prediction of pathologic complete response and (2) prediction of post-NAC LN status. Our study included 158 patients, with 19 showing post-NAC complete pathologic response (pathologic TNM stage T0,N0,MX) and 139 showing incomplete response. Forty-two patients were post-NAC LN-negative, and 116 were post-NAC LN-positive. We further analyzed prediction of response by hormone receptor subtype of the primary cancer (77 hormone receptor-positive, 39 HER2-enriched, 38 triple negative, and 4 cancers with unknown receptor status). Only pre-NAC MRIs underwent computer analysis, initialized by an expert breast radiologist indicating index cancers and metastatic axillary sentinel LNs on DCE-MRI images. Forty-nine computer-extracted radiomics features were obtained, both for the primary cancers and for the metastatic sentinel LNs. Since the dataset contained MRIs acquired at 1.5 T and at 3.0 T, we eliminated features affected by magnet strength using the Mann-Whitney U-test with the null-hypothesis that 1.5 T and 3.0 T samples were selected from populations having the same distribution. Bootstrapping and ROC analysis were used to assess performance of individual features in the two classification tasks. Eighteen features appeared unaffected by magnet strength. Pre-NAC tumor features generally appeared uninformative in predicting response to therapy. In contrast, some pre-NAC LN features were able to predict response: two pre-NAC LN features were able to predict pathologic complete response (area under the ROC curve (AUC) up to 0.82 [0.70; 0.88]), and another two were able to predict post-NAC LN-status (AUC up to 0.72 [0.62; 0.77]), respectively. 
In the analysis by hormone receptor subtype, several potentially useful features were identified for predicting response to therapy in the hormone receptor-positive and HER2-enriched cancers.
|
49
|
Effect of biopsy on the MRI radiomics classification of benign lesions and luminal A cancers. J Med Imaging (Bellingham) 2019; 6:031408. [PMID: 35834307 PMCID: PMC6378704 DOI: 10.1117/1.jmi.6.3.031408] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/13/2018] [Accepted: 01/14/2019] [Indexed: 09/05/2023]
Abstract
Radiomic features extracted from magnetic resonance (MR) images have potential for diagnosis and prognosis of breast cancer. However, presentation of lesions on images may be affected by biopsy. Thirty-four nonsize features were extracted from 338 dynamic contrast-enhanced MR images of benign lesions and luminal A cancers (80 benign/34 luminal A prebiopsy; 46 benign/178 luminal A postbiopsy). Feature value distributions were compared by biopsy condition using the Kolmogorov-Smirnov test. Classification performance was assessed by biopsy condition in the task of distinguishing between lesion types, using the area under the receiver operating characteristic curve (AUCROC) as the performance metric. Superiority and equivalence testing of differences in AUCROC between biopsy conditions were conducted using Bonferroni-Holm-adjusted significance levels. Distributions for most nonsize features for each lesion type failed to show a statistically significant difference between biopsy conditions. Fourteen features outperformed random guessing in classification. Their differences in AUCROC by biopsy condition failed to reach statistical significance, but we were unable to prove equivalence using a margin of ΔAUCROC = ±0.10. However, classification performance for lesions imaged either prebiopsy or postbiopsy appears to be similar when biopsy condition is taken into account.
|
50
|
Digital Mammography in Breast Cancer: Additive Value of Radiomics of Breast Parenchyma. Radiology 2019; 291:15-20. [PMID: 30747591 PMCID: PMC6445042 DOI: 10.1148/radiol.2019181113] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 05/09/2018] [Revised: 12/14/2018] [Accepted: 01/02/2019] [Indexed: 11/11/2022]
Abstract
Background Previous studies have suggested that breast parenchymal texture features may reflect the biologic risk factors associated with breast cancer development. Therefore, combining the characteristics of normal parenchyma from the contralateral breast with radiomic features of breast tumors may improve the accuracy of digital mammography in the diagnosis of breast cancer. Purpose To determine whether the addition of radiomic analysis of contralateral breast parenchyma to the characterization of breast lesions with digital mammography improves lesion classification over that with radiomic tumor features alone. Materials and Methods This HIPAA-compliant, retrospective study included 182 patients (age range, 25-90 years; mean age, 55.9 years ± 14.9) who underwent mammography between June 2002 and July 2009. There were 106 malignant and 76 benign lesions. Automatic lesion segmentation and radiomic analysis were performed for each breast lesion. Radiomic texture analysis was applied in the normal regions of interest in the contralateral breast parenchyma to assess the mammographic parenchymal patterns. The classification performance of both individual features and the output from a Bayesian artificial neural network classifier was evaluated with the leave-one-patient-out method by using the area under the receiver operating characteristic curve (AUC) as the figure of merit in the task of differentiating between malignant and benign lesions. Results The performance of the combined lesion and parenchyma classifier in the differentiation between malignant and benign mammographic lesions was better than that with the lesion features alone (AUC = 0.84 ± 0.03 vs 0.79 ± 0.03, respectively; P = .047). 
Overall, six radiomic features (spiculation, margin sharpness, size, and circularity from the tumor feature set, and skewness and power law beta from the parenchymal feature set) were selected more than 50% of the time during the feature selection process on the combined feature set. Conclusion Combining quantitative radiomic data from tumors with contralateral parenchyma characterizations may improve diagnostic accuracy for breast cancer. © RSNA, 2019 Online supplemental material is available for this article. See also the editorial by Shaffer in this issue.
|