51. Gomez W, Pereira WCA, Infantosi AFC. Analysis of co-occurrence texture statistics as a function of gray-level quantization for classifying breast ultrasound. IEEE Transactions on Medical Imaging 2012;31:1889-99. [PMID: 22759441] [DOI: 10.1109/tmi.2012.2206398]
Abstract
In this paper, we investigated the behavior of 22 co-occurrence statistics combined with six gray-scale quantization levels to classify breast lesions on ultrasound (BUS) images. The database of 436 BUS images used in this investigation comprised 217 carcinoma and 219 benign lesion images. The region delimited by a minimum bounding rectangle around the lesion was used to calculate the gray-level co-occurrence matrix (GLCM). Next, 22 co-occurrence statistics were computed for six quantization levels (8, 16, 32, 64, 128, and 256), four orientations (0°, 45°, 90°, and 135°), and ten distances (1, 2, ..., 10 pixels). Also, to reduce feature space dimensionality, texture descriptors of the same distance were averaged over all orientations, which is a common practice in the literature. Thereafter, the feature space was ranked using the mutual-information-based minimal-redundancy-maximal-relevance (mRMR) criterion. Fisher linear discriminant analysis (FLDA) was applied to assess the discriminative power of the texture features by iteratively adding the first m-ranked features to the classification procedure until all of them were considered. The area under the ROC curve (AUC) was used as the figure of merit to measure classifier performance. It was observed that averaging texture descriptors of the same distance negatively impacts classification performance, since the best AUC of 0.81 was achieved with 32 gray levels and 109 features. On the other hand, for single texture features (i.e., without the averaging procedure), the quantization level does not impact the discriminative power, since AUC = 0.87 was obtained for all six quantization levels. Moreover, the number of features was reduced (to between 17 and 24 features). The texture descriptors that contributed most notably to distinguishing breast lesions were contrast and correlation computed from GLCMs with an orientation of 90° and distances greater than five pixels.
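The feature-extraction step described above maps directly onto scikit-image. The sketch below is illustrative rather than the authors' implementation: it quantizes an 8-bit ROI to a chosen gray-level count and computes statistics per distance and per orientation (no averaging, which the paper found harmful); scikit-image exposes only six co-occurrence statistics, so contrast and correlation, the paper's two strongest descriptors, stand in for the full set of 22.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(roi, levels=32, distances=range(1, 11),
                  angles=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Contrast/correlation from GLCMs of a quantized 8-bit ROI."""
    # Quantize 0..255 values to the requested number of gray levels.
    q = (roi.astype(np.float64) / 256.0 * levels).astype(np.uint8)
    glcm = graycomatrix(q, distances=list(distances), angles=list(angles),
                        levels=levels, symmetric=True, normed=True)
    # Each property has shape (n_distances, n_angles); keeping the full
    # grid preserves the per-orientation information.
    return {stat: graycoprops(glcm, stat)
            for stat in ("contrast", "correlation")}

# Example: glcm_features(roi)["correlation"][5:, 2] gives correlation at
# 90 degrees for distances of 6-10 pixels, the descriptors highlighted above.
```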
Affiliation(s)
- W Gomez
- Technology Information Laboratory, Center for Research and Advanced Studies of the National Polytechnic Institute, Ciudad Victoria, 87130 Tamaulipas, Mexico.
52. Xu JW, Suzuki K. Massive-training support vector regression and Gaussian process for false-positive reduction in computer-aided detection of polyps in CT colonography. Med Phys 2011;38:1888-902. [PMID: 21626922] [DOI: 10.1118/1.3562898]
Abstract
PURPOSE A massive-training artificial neural network (MTANN) has been developed for the reduction of false positives (FPs) in computer-aided detection (CADe) of polyps in CT colonography (CTC). A major limitation of the MTANN is the long training time. To address this issue, the authors investigated the feasibility of two state-of-the-art regression models, namely, support vector regression (SVR) and Gaussian process regression (GPR) models, in the massive-training framework and developed massive-training SVR (MTSVR) and massive-training GPR (MTGPR) for the reduction of FPs in CADe of polyps.
METHODS The authors applied SVR and GPR as volume-processing techniques in the distinction of polyps from FP detections in a CTC CADe scheme. Unlike artificial neural networks (ANNs), both SVR and GPR are memory-based methods that store a part of or the entire training data for testing. Therefore, their training is generally fast, and they are able to improve the efficiency of the massive-training methodology. Rooted in a maximum margin property, SVR offers excellent generalization ability and robustness to outliers. GPR, on the other hand, approaches nonlinear regression from a Bayesian perspective, which produces both the optimal estimated function and the covariance associated with the estimation. Therefore, both SVR and GPR, as state-of-the-art nonlinear regression models, are able to offer performance comparable or potentially superior to that of ANNs, with highly efficient training. Both MTSVR and MTGPR were trained directly with voxel values from CTC images. A 3D scoring method based on a 3D Gaussian weighting function was applied to the outputs of MTSVR and MTGPR for distinction between polyps and nonpolyps. To test the performance of the proposed models, the authors compared them to the original MTANN in the distinction between actual polyps and various types of FPs in terms of training time reduction and FP reduction performance. The authors' CTC database consisted of 240 CTC data sets obtained from 120 patients in the supine and prone positions. The training set consisted of 27 patients, 10 of whom had 10 polyps. The authors selected 10 nonpolyps (i.e., FP sources) from the training set. These ten polyps and ten nonpolyps were used for training the proposed models. The testing set consisted of 93 patients, including 19 polyps in 7 patients and 86 negative patients with 474 FPs produced by an original CADe scheme.
RESULTS With the MTSVR, the training time was reduced by a factor of 190, while an FP reduction performance [by-polyp sensitivity of 94.7% (18/19) with 2.5 (230/93) FPs/patient] comparable to that of the original MTANN [the same sensitivity with 2.6 (244/93) FPs/patient] was achieved. The classification performance in terms of the area under the receiver-operating-characteristic curve of the MTGPR (0.82) was statistically significantly higher than that of the original MTANN (0.77), with a two-sided p-value of 0.03. The MTGPR yielded a 94.7% (18/19) by-polyp sensitivity at an FP rate of 2.5 (235/93) per patient and reduced the training time by a factor of 1.3.
CONCLUSIONS Both MTSVR and MTGPR improve the efficiency of training in the massive-training framework while maintaining comparable performance.
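A toy sketch of the massive-training regression idea with scikit-learn's SVR: each voxel contributes one training sample (the raw values of a small subvolume around it), and the regression target is a Gaussian "teacher" likelihood peaked at a polyp center. The patch size, sigma, and all data here are assumed for illustration; the MTGPR variant would swap in a Gaussian process regressor.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# One training sample per voxel: the raw values of a 9x9x9 subvolume
# around it (size assumed); targets are a Gaussian teacher signal that
# peaks at the polyp center and decays with distance.
n_voxels = 500
X = rng.normal(size=(n_voxels, 9 * 9 * 9))     # synthetic voxel patches
dist = rng.uniform(0.0, 10.0, size=n_voxels)   # distance to polyp center
y = np.exp(-dist**2 / (2 * 3.0**2))            # teacher likelihood, sigma=3

svr = SVR(kernel="rbf", C=1.0, epsilon=0.05).fit(X, y)
out = svr.predict(X)                           # voxel-wise likelihood map

# 3D Gaussian-weighted scoring (sketch): weight output voxels by their
# distance from the candidate center and average into one score.
w = np.exp(-dist**2 / (2 * 3.0**2))
score = float(np.sum(w * out) / np.sum(w))
```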
Affiliation(s)
- Jian-Wu Xu
- Department of Radiology, The University of Chicago, 5841 South Maryland Avenue, Chicago, Illinois 60637, USA.
53. Chen S, Suzuki K, MacMahon H. Development and evaluation of a computer-aided diagnostic scheme for lung nodule detection in chest radiographs by means of two-stage nodule enhancement with support vector classification. Med Phys 2011;38:1844-58. [PMID: 21626918] [DOI: 10.1118/1.3561504]
Abstract
PURPOSE To develop a computer-aided detection (CADe) scheme for nodules in chest radiographs (CXRs) with a high sensitivity and a low false-positive (FP) rate.
METHODS The authors developed a CADe scheme consisting of five major steps, designed to improve the overall performance of CADe schemes. First, to segment the lung fields accurately, the authors developed a multisegment active shape model. Then, a two-stage nodule-enhancement technique was developed to improve the conspicuity of nodules. Initial nodule candidates were detected and segmented by using a clustering watershed algorithm. Thirty-one shape-, gray-level-, surface-, and gradient-based features were extracted from each segmented candidate to form the feature space, including a new feature based on the Canny edge detector designed to eliminate a major FP source caused by rib crossings. Finally, a nonlinear support vector machine (SVM) with a Gaussian kernel was employed for classification of the nodule candidates.
RESULTS To evaluate the scheme and compare it to other published CADe schemes, the authors used a publicly available database containing 140 nodules in 140 CXRs and 93 normal CXRs. The CADe scheme based on the SVM classifier achieved sensitivities of 78.6% (110/140) and 71.4% (100/140) with averages of 5.0 (1165/233) FPs/image and 2.0 (466/233) FPs/image, respectively, in a leave-one-out cross-validation test, whereas the CADe scheme based on a linear discriminant analysis classifier had a sensitivity of 60.7% (85/140) at an FP rate of 5.0 FPs/image. For nodules classified as "very subtle" and "extremely subtle," a sensitivity of 57.1% (24/42) was achieved at an FP rate of 5.0 FPs/image. When the authors used a database developed at the University of Chicago, the sensitivities were 83.3% (40/48) and 77.1% (37/48) at FP rates of 5.0 (240/48) FPs/image and 2.0 (96/48) FPs/image, respectively.
CONCLUSIONS These results compare favorably to those described for other commercial and noncommercial CADe nodule detection systems.
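The final classification step, a Gaussian-kernel SVM over the 31 candidate features evaluated with leave-one-out cross-validation, can be sketched with scikit-learn as below; the features and labels are synthetic placeholders, and the feature standardization is an assumption rather than a detail taken from the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 31))            # 31 features per nodule candidate
y = rng.integers(0, 2, size=60)          # 1 = nodule, 0 = false positive

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
# Leave-one-out test, mirroring the paper's evaluation protocol.
proba = cross_val_predict(clf, X, y, cv=LeaveOneOut(),
                          method="predict_proba")[:, 1]
print("LOO AUC:", roc_auc_score(y, proba))
```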
Affiliation(s)
- Sheng Chen
- Department of Radiology, The University of Chicago, 5841 South Maryland Avenue, MC 2026, Chicago, Illinois 60637, USA.
54. Jamieson AR, Giger ML, Drukker K, Pesce LL. Enhancement of breast CADx with unlabeled data. Med Phys 2010;37:4155-72. [PMID: 20879576] [DOI: 10.1118/1.3455704]
Abstract
PURPOSE Unlabeled medical image data are abundant, yet the process of converting them into a labeled ("truth-known") database is expensive in time and resources and fraught with ethical and logistical issues. The authors propose a dual-stage CADx scheme in which both labeled and unlabeled (truth-known and "truth-unknown") data are used. This study is an initial exploration of the potential for leveraging unlabeled data toward enhancing breast CADx.
METHODS From a labeled ultrasound image database consisting of 1126 lesions with an empirical cancer prevalence of 14%, 200 different randomly sampled subsets were selected, and the truth status of a variable number of cases was masked to the algorithm to mimic different types of labeled and unlabeled data sources. The prevalence was fixed at 50% cancerous for the labeled data and 5% cancerous for the unlabeled data. In the first stage of the dual-stage CADx scheme, which the authors term "transductive dimension reduction regularization" (TDR-R), both labeled and unlabeled images, characterized by extracted lesion features, were combined using dimension reduction (DR) techniques and mapped to a lower-dimensional representation. (The first stage ignored truth status and was therefore unsupervised.) In the second stage, the labeled data from the reduced-dimension embedding were used to train a classifier to estimate the probability of malignancy. For the first CADx stage, the authors investigated three DR approaches: Laplacian eigenmaps, t-distributed stochastic neighbor embedding (t-SNE), and principal component analysis. For the TDR-R methods, the classifier in the second stage was a supervised (i.e., truth-utilizing) Bayesian neural net. The dual-stage CADx schemes were compared to a single-stage scheme based on manifold regularization (MR) in a semisupervised setting via the LapSVM algorithm. Performance in terms of the area under the ROC curve (AUC) was evaluated in leave-one-out and .632+ bootstrap analyses on a by-lesion basis. Additionally, the trained algorithms were applied to an independent test data set consisting of 101 lesions with approximately 50% cancer prevalence. The difference in AUC (ΔAUC) between training with and without the use of unlabeled data was computed.
RESULTS Statistically significant differences in the average AUC value (ΔAUC) were found in many instances between training with and without unlabeled data, based on the sample set distributions generated from this particular ultrasound data set during cross-validation and using the independent test set. For example, when using 100 labeled and 900 unlabeled cases and testing on the independent test set, the TDR-R methods produced an average ΔAUC = 0.0361 with 95% interval [0.0301; 0.0408] (p < 0.0001, adjusted for multiple comparisons but considering the test set fixed) using t-SNE, and an average ΔAUC = 0.026 with interval [0.0227; 0.0298] (adjusted p < 0.0001) using Laplacian eigenmaps, while the MR-based LapSVM produced an average ΔAUC = 0.0381 with interval [0.0351; 0.0405] (adjusted p < 0.0001). The authors also found that schemes initially obtaining lower than average performance when using labeled data only showed the most prominent increase in performance when unlabeled data were added in the first CADx stage, suggesting a regularization effect due to the injection of unlabeled data.
CONCLUSIONS The findings reveal evidence that incorporating unlabeled data into the overall development of CADx methods may improve classifier performance by non-negligible amounts and warrants further investigation.
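The dual-stage idea reduces to a few lines: embed labeled and unlabeled cases together with an unsupervised DR method, then train a supervised classifier on the labeled portion of the embedding. The sketch below uses t-SNE (which has no out-of-sample transform, so all cases must be embedded at once, hence "transductive") and substitutes logistic regression for the paper's Bayesian neural net; the dimensions and data are placeholders.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X_lab = rng.normal(size=(100, 81))       # labeled lesion feature vectors
y_lab = rng.integers(0, 2, size=100)     # 1 = malignant
X_unl = rng.normal(size=(900, 81))       # truth-masked (unlabeled) cases

# Stage 1 (unsupervised): embed labeled + unlabeled cases jointly.
X_all = np.vstack([X_lab, X_unl])
emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X_all)

# Stage 2 (supervised): train only on the labeled rows of the embedding.
clf = LogisticRegression().fit(emb[:100], y_lab)
p_malignant = clf.predict_proba(emb[:100])[:, 1]
```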
Affiliation(s)
- Andrew R Jamieson
- Department of Radiology, University of Chicago, Illinois 60637, USA.
55. Zheng B, Wang X, Lederman D, Tan J, Gur D. Computer-aided detection: the effect of training databases on detection of subtle breast masses. Acad Radiol 2010;17:1401-8. [PMID: 20650667] [PMCID: PMC2952663] [DOI: 10.1016/j.acra.2010.06.009]
Abstract
RATIONALE AND OBJECTIVES Lesion conspicuity is typically highly correlated with the visual difficulty of lesion detection, and computer-aided detection (CAD) has been widely used as a "second reader" in mammography. Hence, increasing CAD sensitivity in detecting subtle cancers without increasing false-positive rates is important. The aim of this study was to investigate the effect of training database case selection on CAD performance in detecting low-conspicuity breast masses.
MATERIALS AND METHODS A full-field digital mammographic image database that included 525 cases depicting malignant masses was randomly partitioned into three subsets. A CAD scheme was applied to detect all initially suspected mass regions and compute region conspicuity. Training samples were iteratively selected from two of the subsets. Four types of training data sets were assembled: (1) one including all available true-positive mass regions in the two subsets ("all"), (2) one including 350 randomly selected mass regions ("diverse"), (3) one including 350 high-conspicuity mass regions ("easy"), and (4) one including 350 low-conspicuity mass regions ("difficult"). Each training data set also included the same number of randomly selected false-positive regions as true-positive regions. Two classifiers, an artificial neural network (ANN) and a k-nearest neighbor (KNN) algorithm, were trained using each of the four training data sets and tested on all suspected regions in the remaining subset. Using a threefold cross-validation method, the performance changes of the CAD schemes trained using each of the four training data sets were computed and compared.
RESULTS CAD initially detected 1025 true-positive mass regions depicted on 507 cases (97% case-based sensitivity) and 9569 false-positive regions (3.5 per image) in the entire database. Using the "all" training data set, CAD achieved the highest overall performance on the entire testing database. However, CAD detected the highest number of low-conspicuity masses when the "difficult" training data set was used. Results agreed for both the ANN-based and KNN-based classifiers in all tests. Compared to the use of the "all" training data set, the sensitivity of the schemes trained using the "difficult" data set decreased by 8.6% and 8.4% for the ANN and KNN algorithms on the entire database, respectively, but the detection of low-conspicuity masses increased by 7.1% and 15.1% for the ANN and KNN algorithms at a false-positive rate of 0.3 per image.
CONCLUSIONS CAD performance depends on the size, diversity, and difficulty level of the training database. To increase CAD sensitivity in detecting subtle cancers, one should increase the fraction of difficult cases in the training database rather than simply increasing the training data set size.
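The training-set assembly strategy compared above can be expressed as a small helper: rank true-positive regions by computed conspicuity, keep the "all", "diverse", "easy", or "difficult" subset, and pair it with an equal number of random false positives. This is a sketch under assumed array inputs, not the authors' code.

```python
import numpy as np

def assemble_training_set(conspicuity, labels, mode="difficult", n=350, seed=0):
    """Return indices of the selected TP regions plus matched random FPs."""
    rng = np.random.default_rng(seed)
    tp = np.flatnonzero(labels == 1)          # true-positive mass regions
    fp = np.flatnonzero(labels == 0)          # false-positive regions
    by_consp = tp[np.argsort(conspicuity[tp])]
    if mode == "difficult":                   # low-conspicuity masses
        chosen = by_consp[:n]
    elif mode == "easy":                      # high-conspicuity masses
        chosen = by_consp[-n:]
    elif mode == "diverse":                   # random sample of masses
        chosen = rng.choice(tp, size=n, replace=False)
    else:                                     # "all" available masses
        chosen = tp
    # Match with the same number of randomly selected FP regions.
    return np.concatenate([chosen,
                           rng.choice(fp, size=len(chosen), replace=False)])
```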
Affiliation(s)
- Bin Zheng
- Department of Radiology, University of Pittsburgh, 3362 Fifth Avenue, Room 128, Pittsburgh, PA 15213, USA.
56. Drukker K, Pesce L, Giger M. Repeatability in computer-aided diagnosis: application to breast cancer diagnosis on sonography. Med Phys 2010;37:2659-69. [PMID: 20632577] [DOI: 10.1118/1.3427409]
Abstract
PURPOSE The aim of this study was to investigate the concept of repeatability in a case-based performance evaluation of two classifiers commonly used in computer-aided diagnosis in the task of distinguishing benign from malignant lesions.
METHODS The authors performed .632+ bootstrap analyses using a data set of 1251 sonographic lesions, of which 212 were malignant. Several analyses were performed investigating the impact of sample size and the number of bootstrap iterations. The classifiers investigated were a Bayesian neural net (BNN) with five hidden units and linear discriminant analysis (LDA). Both used the same four input lesion features. While the authors did evaluate classifier performance using receiver operating characteristic (ROC) analysis, the main focus was to investigate case-based performance based on the classifier output for individual cases, i.e., the classifier outputs for each test case measured over the bootstrap iterations. In this case-based analysis, the authors examined the classifier output variability and linked it to the concept of repeatability. Repeatability was assessed on the level of individual cases, overall for all cases in the data set, and with regard to its dependence on the case-based classifier output. The impact of repeatability was studied when aiming to operate at a constant sensitivity or specificity and when aiming to operate at a constant threshold value for the classifier output.
RESULTS The BNN slightly outperformed the LDA, with an area under the ROC curve of 0.88 versus 0.85 (p < 0.05). In the repeatability analysis on an individual case basis, it was evident that different cases posed different degrees of difficulty to each classifier, as measured by the by-case output variability. When considering the entire data set, however, the overall repeatability of the BNN classifier was lower than that of the LDA classifier, i.e., the by-case variability for the BNN was higher. The dependence of the by-case variability on the average by-case classifier output was markedly different for the two classifiers. The BNN achieved the lowest variability (best repeatability) when operating at high sensitivity (> 90%) and low specificity (< 66%), while the LDA achieved this at moderate sensitivity (approximately 74%) and specificity (approximately 84%). When operating at constant 90% sensitivity or constant 90% specificity, the width of the 95% confidence intervals for the corresponding classifier output was considerable for both classifiers and increased for smaller sample sizes. When operating at a constant threshold value for the classifier output, the width of the 95% confidence intervals for the corresponding sensitivity and specificity ranged from 9 percentage points (pp) to 30 pp.
CONCLUSIONS The repeatability of the classifier output can have a substantial effect on the obtained sensitivity and specificity. Knowledge of classifier repeatability, in addition to overall performance level, is important for successful translation and implementation of computer-aided diagnosis in clinical decision making.
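By-case repeatability is easy to reproduce in outline: at each bootstrap iteration, train on the in-bag cases and record the classifier output for every out-of-bag case, then summarize each case's output spread. The sketch uses LDA on synthetic four-feature data; the paper's BNN and the .632+ weighting are omitted for brevity.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
n = 300
X = rng.normal(size=(n, 4))                  # four lesion features
y = rng.integers(0, 2, size=n)

outputs = [[] for _ in range(n)]             # per-case outputs across iterations
for _ in range(200):                         # bootstrap iterations
    boot = rng.integers(0, n, size=n)        # in-bag training indices
    oob = np.setdiff1d(np.arange(n), boot)   # out-of-bag test cases
    clf = LinearDiscriminantAnalysis().fit(X[boot], y[boot])
    for i, p in zip(oob, clf.predict_proba(X[oob])[:, 1]):
        outputs[i].append(p)

# Repeatability on the level of individual cases: output spread per case.
by_case_sd = np.array([np.std(o) for o in outputs if len(o) > 1])
print("median by-case output SD:", np.median(by_case_sd))
```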
Affiliation(s)
- Karen Drukker
- Department of Radiology, The University of Chicago, 5841 S. Maryland Ave., MC 2026 Chicago, Illinois 60637, USA.
57. Way TW, Sahiner B, Hadjiiski LM, Chan HP. Effect of finite sample size on feature selection and classification: a simulation study. Med Phys 2010;37:907-20. [PMID: 20229900] [DOI: 10.1118/1.3284974]
Abstract
PURPOSE The small number of samples available for training and testing is often the limiting factor in finding the most effective features and designing an optimal computer-aided diagnosis (CAD) system. Training on a limited set of samples introduces bias and variance in the performance of a CAD system relative to that trained with an infinite sample size. In this work, the authors conducted a simulation study to evaluate the performance of various combinations of classifiers and feature selection techniques and its dependence on the class distribution, dimensionality, and training sample size. Understanding these relationships will facilitate development of effective CAD systems under the constraint of limited available samples.
METHODS Three feature selection techniques, stepwise feature selection (SFS), sequential floating forward search (SFFS), and principal component analysis (PCA), and two commonly used classifiers, Fisher's linear discriminant analysis (LDA) and the support vector machine (SVM), were investigated. Samples were drawn from multidimensional feature spaces of multivariate Gaussian distributions with equal or unequal covariance matrices and unequal means, and with equal covariance matrices and unequal means estimated from a clinical data set. Classifier performance was quantified by the area under the receiver operating characteristic curve (Az). The mean Az values obtained by resubstitution and hold-out methods were evaluated for training sample sizes ranging from 15 to 100 per class. The number of simulated features available for selection was chosen to be 50, 100, and 200.
RESULTS It was found that the relative performance of the different combinations of classifier and feature selection method depends on the feature space distributions, the dimensionality, and the available training sample size. The LDA and the SVM with radial kernel performed similarly for most of the conditions evaluated in this study, although the SVM classifier showed a slightly higher hold-out performance than LDA for some conditions and vice versa for others. PCA was comparable to or better than SFS and SFFS for LDA at small sample sizes, but inferior for the SVM with polynomial kernel. For the class distributions simulated from clinical data, PCA did not show advantages over the other two feature selection methods. Under this condition, the SVM with radial kernel performed better than the LDA when few training samples were available, while the LDA performed better when a large number of training samples was available.
CONCLUSIONS None of the investigated feature selection-classifier combinations provided consistently superior performance under the studied conditions for different sample sizes and feature space distributions. In general, the SFFS method was comparable to the SFS method, while PCA may have an advantage for Gaussian feature spaces with unequal covariance matrices. The performance of the SVM with radial kernel was better than, or comparable to, that of the SVM with polynomial kernel under most conditions studied.
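The core of such a simulation fits in a few lines: draw finite training samples from two multivariate Gaussians, fit a classifier, and compare the optimistically biased resubstitution Az with the hold-out Az on a large independent draw. The dimensionality, sample size, and mean separation below are arbitrary choices, not the paper's settings.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
dim, n_train, n_test = 50, 30, 1000
cov = np.eye(dim)                            # equal covariance matrices
mu0, mu1 = np.zeros(dim), np.full(dim, 0.3)  # unequal class means

def draw(n, mu):
    return rng.multivariate_normal(mu, cov, size=n)

X_tr = np.vstack([draw(n_train, mu0), draw(n_train, mu1)])
y_tr = np.r_[np.zeros(n_train), np.ones(n_train)]
X_te = np.vstack([draw(n_test, mu0), draw(n_test, mu1)])
y_te = np.r_[np.zeros(n_test), np.ones(n_test)]

lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
az_resub = roc_auc_score(y_tr, lda.decision_function(X_tr))    # biased high
az_holdout = roc_auc_score(y_te, lda.decision_function(X_te))  # biased low
print(f"resubstitution Az={az_resub:.3f}, hold-out Az={az_holdout:.3f}")
```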
Affiliation(s)
- Ted W Way
- Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109-5842, USA
58. Suzuki K, Rockey DC, Dachman AH. CT colonography: advanced computer-aided detection scheme utilizing MTANNs for detection of "missed" polyps in a multicenter clinical trial. Med Phys 2010;37:12-21. [PMID: 20175461] [DOI: 10.1118/1.3263615]
Abstract
PURPOSE The purpose of this study was to develop an advanced computer-aided detection (CAD) scheme utilizing massive-training artificial neural networks (MTANNs) to allow detection of "difficult" polyps in CT colonography (CTC) and to evaluate its performance on false-negative (FN) CTC cases that radiologists "missed" in a multicenter clinical trial.
METHODS The authors developed an advanced CAD scheme consisting of an initial polyp-detection scheme for identification of polyp candidates and a mixture of expert MTANNs for substantial reduction of false positives (FPs) while maintaining sensitivity. The initial polyp-detection scheme consisted of (1) colon segmentation based on anatomy-based extraction and colon-based analysis and (2) detection of polyp candidates based on a morphologic analysis of the segmented colon. The mixture of expert MTANNs consisted of (1) supervised enhancement of polyps and suppression of various types of nonpolyps, (2) a scoring scheme for converting output voxels into a score for each polyp candidate, and (3) combination of scores from multiple MTANNs by use of a mixing artificial neural network. For testing the advanced CAD scheme, the authors created a database containing 24 FN cases with 23 polyps (range, 6-15 mm; average, 8 mm) and a mass (35 mm), which were "missed" by radiologists in CTC in the original trial, in which 15 institutions participated.
RESULTS The initial polyp-detection scheme detected 63% (15/24) of the missed polyps with 21.0 (505/24) FPs per patient. The MTANNs removed 76% of the FPs with the loss of one true positive; thus, the performance of the advanced CAD scheme was improved to a sensitivity of 58% (14/24) with 8.6 (207/24) FPs per patient, whereas a conventional CAD scheme yielded a sensitivity of 25% at the same FP rate (the difference was statistically significant).
CONCLUSIONS With the advanced MTANN CAD scheme, 58% of the polyps missed by radiologists in the original trial were detected, with a reasonable number of FPs. The results suggest that the use of an advanced MTANN CAD scheme may potentially enhance the detection of "difficult" polyps.
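Two pieces of the pipeline lend themselves to a compact sketch: converting an expert MTANN's output voxels into a single candidate score via 3D Gaussian weighting, and mixing the per-expert scores into one decision value. The logistic mixer below stands in for the paper's mixing ANN, and all parameter values are assumed.

```python
import numpy as np

def candidate_score(output_volume, center, sigma=5.0):
    """One score per candidate: 3D-Gaussian-weighted mean of output voxels."""
    zz, yy, xx = np.indices(output_volume.shape)
    d2 = (zz - center[0])**2 + (yy - center[1])**2 + (xx - center[2])**2
    w = np.exp(-d2 / (2 * sigma**2))
    return float(np.sum(w * output_volume) / np.sum(w))

def mix_scores(expert_scores, weights, bias=0.0):
    """Combine per-expert scores (a logistic unit in place of the mixing ANN)."""
    z = float(np.dot(weights, expert_scores)) + bias
    return 1.0 / (1.0 + np.exp(-z))

# Example: score a candidate with each expert's output volume, then mix.
vol = np.random.default_rng(5).random((32, 32, 32))
s = candidate_score(vol, center=(16, 16, 16))
print(mix_scores(np.array([s, 0.5 * s]), weights=np.array([1.0, 1.0])))
```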
Affiliation(s)
- Kenji Suzuki
- Department of Radiology, The University of Chicago, 5841 South Maryland Avenue, Chicago, Illinois 60637, USA.
59. Jamieson AR, Giger ML, Drukker K, Li H, Yuan Y, Bhooshan N. Exploring nonlinear feature space dimension reduction and data representation in breast CADx with Laplacian eigenmaps and t-SNE. Med Phys 2010;37:339-51. [PMID: 20175497] [DOI: 10.1118/1.3267037]
Abstract
PURPOSE In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: ultrasound (US) with 1126 cases, dynamic contrast-enhanced magnetic resonance imaging with 356 cases, and full-field digital mammography with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput. 15, 1373-1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res. 9, 2579-2605 (2008)].
METHODS These methods attempt to map originally high-dimensional feature spaces to more human-interpretable lower-dimensional spaces while preserving both local and global information. The properties of these methods as applied to breast computer-aided diagnosis (CADx) were evaluated in the context of malignancy classification performance as well as in the visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced-dimension mapped feature output as input to both linear and nonlinear classifiers: a Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN) and linear discriminant analysis. The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination (ARD) and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+ bootstrap validation, 95% empirical confidence intervals were computed for each classifier's AUC performance.
RESULTS In the large US data set, sample high-performance results include AUC0.632+ = 0.88 with 95% empirical bootstrap interval [0.787; 0.895] for 13 ARD-selected features and AUC0.632+ = 0.87 with interval [0.817; 0.906] for four LSW-selected features, compared to a 4D t-SNE mapping (from the original 81D feature space) giving AUC0.632+ = 0.90 with interval [0.847; 0.919], all using the MCMC-BANN.
CONCLUSIONS Preliminary results appear to indicate that the new methods can match or exceed the classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement of feature selection in CADx problems, DR techniques offer a complementary approach, which can aid elucidation of additional properties associated with the data. Specifically, the new techniques were shown to possess the added benefit of delivering sparse lower-dimensional representations for visual interpretation, revealing the intricate data structure of the feature space.
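Both DR methods are available in scikit-learn, so the embedding step above can be sketched directly; Laplacian eigenmaps appear there as SpectralEmbedding. The data, neighbor count, and perplexity below are placeholders, and the downstream classifier (a BANN in the paper) is left out.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding, TSNE

rng = np.random.default_rng(6)
X = rng.normal(size=(1126, 81))   # 81 computer-extracted lesion features

# Laplacian eigenmaps: graph-based embedding preserving local neighborhoods.
lap = SpectralEmbedding(n_components=3, n_neighbors=15).fit_transform(X)

# t-SNE: low-dimensional mapping suited to visual inspection of sparseness.
tsne = TSNE(n_components=2, perplexity=30, init="pca",
            random_state=0).fit_transform(X)
# Either mapping can then serve as the reduced feature input to a classifier.
```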
Affiliation(s)
- Andrew R Jamieson
- Department of Radiology, University of Chicago, Chicago, Illinois 60637, USA.
60. Wu YT, Zhou C, Chan HP, Paramagul C, Hadjiiski LM, Daly CP, Douglas JA, Zhang Y, Sahiner B, Shi J, Wei J. Dynamic multiple thresholding breast boundary detection algorithm for mammograms. Med Phys 2010;37:391-401. [PMID: 20175501] [DOI: 10.1118/1.3273062]
Abstract
PURPOSE Automated detection of the breast boundary is one of the fundamental steps for computer-aided analysis of mammograms. In this study, the authors developed a new dynamic multiple-thresholding-based breast boundary (MTBB) detection method for digitized mammograms.
METHODS A large data set of 716 screen-film mammograms (442 CC views and 274 MLO views) obtained from consecutive cases of an Institutional Review Board approved project was used. An experienced breast radiologist manually traced the breast boundary on each digitized image using a graphical interface to provide a reference standard. The initial breast boundary (MTBB-Initial) was obtained by dynamically adapting the threshold to the gray-level range in local regions of the breast periphery. The initial breast boundary was then refined by using gradient information from horizontal and vertical Sobel filtering to obtain the final breast boundary (MTBB-Final). The accuracy of the breast boundary detection algorithm was evaluated by comparison with the reference standard using three performance metrics: the Hausdorff distance (HDist), the average minimum Euclidean distance (AMinDist), and the area overlap measure (AOM).
RESULTS In comparison with the authors' previously developed gradient-based breast boundary (GBB) algorithm, it was found that 68%, 85%, and 94% of images had HDist errors less than 6 pixels (4.8 mm) for GBB, MTBB-Initial, and MTBB-Final, respectively; 89%, 90%, and 96% of images had AMinDist errors less than 1.5 pixels (1.2 mm), respectively; and 96%, 98%, and 99% of images had AOM values larger than 0.9, respectively. The improvement achieved by the MTBB-Final method was statistically significant for all evaluation measures by the Wilcoxon signed rank test (p < 0.0001).
CONCLUSIONS The MTBB approach, which combined dynamic multiple thresholding and gradient information, provided better performance than the breast boundary detection algorithm that mainly used gradient information.
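The two MTBB stages can be outlined as follows: a dynamic threshold chosen from the local gray-level range in each peripheral band yields the initial mask, and Sobel gradients are then available to refine the boundary. The band count, the 10% threshold fraction, and the stub refinement step are assumptions for illustration, not values from the paper.

```python
import numpy as np
from scipy import ndimage

def initial_breast_mask(img, n_bands=8):
    """MTBB-Initial sketch: per-band threshold from the local gray-level range."""
    mask = np.zeros(img.shape, dtype=bool)
    for rows in np.array_split(np.arange(img.shape[0]), n_bands):
        local = img[rows].astype(float)
        t = local.min() + 0.1 * (local.max() - local.min())  # dynamic threshold
        mask[rows] = local > t
    return mask

def boundary_and_gradient(img, mask):
    """Inputs to MTBB-Final: initial boundary pixels and Sobel gradient magnitude."""
    g = np.hypot(ndimage.sobel(img.astype(float), axis=0),
                 ndimage.sobel(img.astype(float), axis=1))
    boundary = mask ^ ndimage.binary_erosion(mask)
    return boundary, g   # a full implementation would snap boundary pixels
                         # to nearby gradient maxima
```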
Affiliation(s)
- Yi-Ta Wu
- Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109, USA.
61. Muniz AMS, Liu W, Liu H, Lyons KE, Pahwa R, Nobre FF, Nadal J. Assessment of the effects of subthalamic stimulation in Parkinson disease patients by artificial neural network. Annu Int Conf IEEE Eng Med Biol Soc 2009;2009:5673-6. [PMID: 19964412] [DOI: 10.1109/iembs.2009.5333545]
Abstract
This study aims to use a probabilistic neural network (PNN) to discriminate between normal and Parkinson disease (PD) subjects, using as input the principal components (PCs) derived from the vertical component of the ground reaction force (vGRF). The trained PNN was further used to evaluate the effects of deep brain stimulation of the subthalamic nucleus (STN DBS) on PD, with and without medication. A sample of 45 subjects (30 normal and 15 PD subjects who underwent STN DBS) was evaluated by gait analysis. PD subjects were assessed under four test conditions: without treatment (mof-sof), with stimulation only (mof-son), with medication only (mon-sof), and with combined treatments (mon-son). PC analysis was applied to the vGRF, and six PC scores were chosen by the broken-stick test. Using a bootstrap approach for the PNN model, with the area under the receiver operating characteristic curve (AUC) as the performance measure, the first three and the fifth PCs were selected as input variables. The PNN achieved AUC = 0.995 for classifying controls and PD subjects in the mof-sof condition. When applied to classify the PD subjects under treatment, the PNN indicated that STN DBS alone is more effective than medication alone, and that further vGRF enhancement is obtained with combined therapies.
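A PNN is a Parzen-window classifier: one Gaussian kernel per training pattern, with class posteriors formed from the per-class density estimates. The minimal implementation below (the kernel width sigma is an assumed hyperparameter) would take the selected vGRF PC scores as input.

```python
import numpy as np

class PNN:
    """Minimal probabilistic neural network (Parzen-window classifier)."""
    def __init__(self, sigma=0.5):
        self.sigma = sigma
    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y)
        self.classes = np.unique(self.y)
        return self
    def predict_proba(self, Xnew):
        Xnew = np.asarray(Xnew, float)
        # Squared distances between every new and every training pattern.
        d2 = ((Xnew[:, None, :] - self.X[None, :, :]) ** 2).sum(-1)
        k = np.exp(-d2 / (2 * self.sigma ** 2))
        # Per-class average kernel response = Parzen density estimate.
        dens = np.stack([k[:, self.y == c].mean(axis=1)
                         for c in self.classes], axis=1)
        return dens / dens.sum(axis=1, keepdims=True)

# Usage sketch (pc_train, pc_test, labels are hypothetical arrays):
# p_pd = PNN(sigma=0.5).fit(pc_train, labels).predict_proba(pc_test)[:, 1]
```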
Affiliation(s)
- A M S Muniz
- Biomedical Engineering Program, COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil.
62. Comparison among probabilistic neural network, support vector machine and logistic regression for evaluating the effect of subthalamic stimulation in Parkinson disease on ground reaction force during gait. J Biomech 2009;43:720-6. [PMID: 19914622] [DOI: 10.1016/j.jbiomech.2009.10.018]
Abstract
Deep brain stimulation of the subthalamic nucleus (DBS-STN) is an approved treatment for advanced Parkinson disease (PD) patients; however, there is a need to further evaluate its effect on gait. This study compares logistic regression (LR), probabilistic neural network (PNN), and support vector machine (SVM) classifiers for discriminating between normal and PD subjects when assessing the effects of DBS-STN on the ground reaction force (GRF), with and without medication. Gait analysis of 45 subjects (30 normal and 15 PD subjects who underwent bilateral DBS-STN) was performed. PD subjects were assessed under four test conditions: without treatment (mof-sof), with stimulation alone (mof-son), with medication alone (mon-sof), and with medication and stimulation (mon-son). Principal component (PC) analysis was applied to the three components of the GRF separately; six PC scores from the vertical component, one from the anterior-posterior component, and one from the medial-lateral component were chosen by the broken-stick test. Stepwise LR analysis employed the first two and the fifth vertical PC scores as input variables. Using the bootstrap approach to compare model performance for classifying GRF patterns from normal and untreated PD subjects, the first three and the fifth vertical PCs were selected as SVM input variables, while the same ones plus the first anterior-posterior PC were selected as PNN input variables. The PNN performed better than LR and SVM according to the area under the receiver operating characteristic curve and the negative likelihood ratio. When evaluating treatment effects, the classifiers indicated that DBS-STN alone was more effective than medication alone, but the greatest improvements occurred with both treatments together.
63. Wunderlich A, Noo F. Estimation of channelized Hotelling observer performance with known class means or known difference of class means. IEEE Transactions on Medical Imaging 2009;28:1198-1207. [PMID: 19164081] [PMCID: PMC2860872] [DOI: 10.1109/tmi.2009.2012705]
Abstract
This paper concerns task-based image quality assessment for the task of discriminating between two classes of images. We address the problem of estimating two widely used detection performance measures, SNR and AUC, from a finite number of images, assuming that the class discrimination is performed with a channelized Hotelling observer. In particular, we investigate the advantage that can be gained when either 1) the means of the signal-absent and signal-present classes are both known, or 2) the difference of the class means is known. For these two scenarios, we propose uniformly minimum variance unbiased estimators of SNR², derive the corresponding sampling distributions, and provide variance expressions. In addition, we demonstrate how the bias and variance of the related AUC estimators may be calculated numerically by using the sampling distributions of the SNR² estimators. We find that for both SNR² and AUC, the new estimators have significantly lower bias and mean-square error than the traditional estimator, which assumes that the class means, and their difference, are unknown.
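For orientation, the quantity being estimated is the Hotelling SNR², Δμᵀ Σ⁻¹ Δμ, computed from channelized data. The sketch below is the naive plug-in for scenario 2 (known difference of class means, estimated covariance); it omits the paper's UMVU bias correction, which is precisely what the proposed estimators add.

```python
import numpy as np

def cho_snr2_known_dmu(ch_a, ch_b, dmu):
    """Plug-in Hotelling SNR^2 with a known difference of class means.

    ch_a, ch_b: (n_images, n_channels) channelized outputs per class.
    dmu: known difference of class means in channel space.
    """
    # Pooled covariance estimated from both classes' channelized outputs.
    s = 0.5 * (np.cov(ch_a, rowvar=False) + np.cov(ch_b, rowvar=False))
    return float(dmu @ np.linalg.solve(s, dmu))
```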
Affiliation(s)
- Adam Wunderlich
- Utah Center for Advanced Imaging Research, Department of Radiology, University of Utah, Salt Lake City, UT 84108, USA.
64. Sahiner B, Chan HP, Hadjiiski LM, Roubidoux MA, Paramagul C, Bailey JE, Nees AV, Blane CE, Adler DD, Patterson SK, Klein KA, Pinsky RW, Helvie MA. Multi-modality CADx: ROC study of the effect on radiologists' accuracy in characterizing breast masses on mammograms and 3D ultrasound images. Acad Radiol 2009;16:810-8. [PMID: 19375953] [DOI: 10.1016/j.acra.2009.01.011]
Abstract
RATIONALE AND OBJECTIVES To investigate the effect of a computer-aided diagnosis (CADx) system on radiologists' performance in discriminating malignant and benign masses on mammograms and three-dimensional (3D) ultrasound (US) images.
MATERIALS AND METHODS Our data set contained mammograms and 3D US volumes from 67 women (median age, 51 years; range, 27-86 years) with 67 biopsy-proven breast masses (32 benign and 35 malignant). A CADx system was designed to automatically delineate the mass boundaries on the mammograms and US volumes, extract features, and merge the extracted features into a multi-modality malignancy score. Ten experienced readers (subspecialty academic breast imaging radiologists) first viewed the mammograms alone and provided likelihood of malignancy (LM) ratings and Breast Imaging Reporting and Data System (BI-RADS) assessments. Subsequently, each reader viewed the US images with the mammograms and provided LM and action category ratings. Finally, the CADx score was shown, and the reader had the opportunity to revise the ratings. The LM ratings were analyzed using receiver operating characteristic (ROC) methodology, and the action category ratings were used to determine the sensitivity and specificity of cancer diagnosis.
RESULTS Without CADx, the readers' average area under the ROC curve, A(z), was 0.93 (range, 0.86-0.96) for combined assessment of the mass on both the US volume and the mammograms. With CADx, their average A(z) increased to 0.95 (range, 0.91-0.98), which was borderline significant (P = .05). The average sensitivity of the readers increased from 98% to 99% with CADx, while the average specificity increased from 27% to 29%. The change in sensitivity with CADx did not achieve statistical significance for the individual radiologists, and the change in specificity was statistically significant for one of the radiologists.
CONCLUSIONS A well-trained CADx system that combines features extracted from mammograms and US images may have the potential to improve radiologists' performance in distinguishing malignant from benign breast masses and making biopsy decisions.
Affiliation(s)
- Berkman Sahiner
- Department of Radiology, The University of Michigan, MIB C480A, 1500 East Medical Center Drive, Ann Arbor, MI 48109-5842, USA
65. Filev P, Hadjiiski L, Chan HP, Sahiner B, Ge J, Helvie MA, Roubidoux M, Zhou C. Automated regional registration and characterization of corresponding microcalcification clusters on temporal pairs of mammograms for interval change analysis. Med Phys 2009;35:5340-50. [PMID: 19175093] [DOI: 10.1118/1.3002311]
Abstract
A computerized regional registration and characterization system for analysis of microcalcification clusters on serial mammograms is being developed in our laboratory. The system consists of two stages. In the first stage, based on the location of a detected cluster on the current mammogram, a regional registration procedure identifies the local area on the prior mammogram that may contain the corresponding cluster. A search program is used to detect cluster candidates within the local area. The detected cluster on the current image is then paired with the cluster candidates on the prior image to form true (TP-TP) or false (TP-FP) pairs. Automatically extracted features were used in a newly designed correspondence classifier to reduce the number of false pairs. In the second stage, a temporal classifier, based on both current and prior information, is used if a cluster has been detected on the prior image, and a current classifier, based on current information alone, is used if no prior cluster has been detected. The data set used in this study consisted of 261 serial pairs containing biopsy-proven calcification clusters. An MQSA radiologist identified the corresponding clusters on the mammograms. On the priors, the radiologist rated the subtlety of 30 of the 261 clusters as 9 or 10 on a scale of 1 (very obvious) to 10 (very subtle). Leave-one-case-out resampling was used for feature selection and classification in both the correspondence and malignant/benign classification schemes. The search program detected 91.2% (238/261) of the clusters on the priors with an average of 0.42 FPs/image. The correspondence classifier identified 86.6% (226/261) of the TP-TP pairs with 20 false matches (0.08 FPs/image) relative to the entire set of 261 image pairs. In the malignant/benign classification stage, the temporal classifier achieved a test A(z) of 0.81 for the 246 pairs that contained a detection on the prior. In addition, a classifier was designed using the clusters on the current mammograms only; it achieved a test A(z) of 0.72 in classifying the clusters as malignant or benign. The difference between the performance of the temporal classifier and the current classifier was statistically significant (p = 0.0014). Our interval change analysis system can detect corresponding clusters on prior mammograms with high sensitivity and classify them with satisfactory accuracy.
Affiliation(s)
- Peter Filev
- Department of Radiology, The University of Michigan, Ann Arbor, Michigan 48109-0904, USA
66. Giger ML, Chan HP, Boone J. Anniversary paper: History and status of CAD and quantitative image analysis: the role of Medical Physics and AAPM. Med Phys 2009;35:5799-820. [PMID: 19175137] [PMCID: PMC2673617] [DOI: 10.1118/1.3013555]
Abstract
The roles of physicists in medical imaging have expanded over the years, from the study of imaging systems (sources and detectors) and dose to the assessment of image quality and perception, the development of image processing techniques, and the development of image analysis methods to assist in detection and diagnosis. The latter is a natural extension of medical physicists' goals in developing imaging techniques to help physicians acquire diagnostic information and improve clinical decisions.

Studies indicate that radiologists do not detect all abnormalities on images that are visible on retrospective review, and they do not always correctly characterize the abnormalities that are found. Since the 1950s, the potential use of computers had been considered for analysis of radiographic abnormalities. In the mid-1980s, however, medical physicists and radiologists began major research efforts for computer-aided detection or computer-aided diagnosis (CAD), that is, using the computer output as an aid to radiologists (as opposed to a completely automatic computer interpretation), focusing initially on methods for the detection of lesions on chest radiographs and mammograms. Since then, extensive investigations of computerized image analysis for detection or diagnosis of abnormalities in a variety of 2D and 3D medical images have been conducted. The growth of CAD over the past 20 years has been tremendous: from the early days of time-consuming film digitization and CPU-intensive computations on a limited number of cases to its current status, in which developed CAD approaches are evaluated rigorously on large clinically relevant databases.

CAD research by medical physicists includes many aspects: collecting relevant normal and pathological cases; developing computer algorithms appropriate for the medical interpretation task, including those for segmentation, feature extraction, and classifier design; developing methodology for assessing CAD performance; validating the algorithms using appropriate cases to measure performance and robustness; conducting observer studies with which to evaluate radiologists in the diagnostic task without and with the use of the computer aid; and ultimately assessing performance with a clinical trial.

Medical physicists also have an important role in quantitative imaging, by validating the quantitative integrity of scanners and developing imaging techniques and image analysis tools that extract quantitative data in a more accurate and automated fashion. As imaging systems become more complex and the need for better quantitative information from images grows, the future includes combined research efforts from physicists working in CAD and those working on quantitative imaging systems to readily yield information on morphology, function, molecular structure, and more, from animal imaging research to clinical patient care. A historical review of CAD and a discussion of challenges for the future are presented here, along with the extension to quantitative image analysis.
Affiliation(s)
- Maryellen L Giger
- Department of Radiology, University of Chicago, Chicago, Illinois 60637, USA.