26
Lederman D, Zheng B, Wang X, Sumkin JH, Gur D. A GMM-based breast cancer risk stratification using a resonance-frequency electrical impedance spectroscopy. Med Phys 2011; 38:1649-59. [PMID: 21520878 DOI: 10.1118/1.3555300] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
Abstract
PURPOSE The authors developed and tested a multiprobe-based resonance-frequency-based electrical impedance spectroscopy (REIS) system. The purpose of this study was to preliminarily assess the performance of this system in classifying younger women into two groups, those ultimately recommended for biopsy during imaging-based diagnostic workups that followed screening and those rated as negative during mammography. METHODS A seven probe-based REIS system was designed, assembled, and is currently being tested in the breast imaging facility. During an examination, contact is made with the nipple and six concentric points on the breast skin. For each measurement channel between the center probe and one of the six external probes, a set of electrical impedance spectroscopy (EIS) signal sweeps is performed and signal outputs ranging from 200 to 800 kHz at 5 kHz interval are recorded. An initial subset of 174 examinations from an ongoing prospective clinical study was selected for this preliminary analysis. An initial set of 35 features, 33 of which represented the corresponding EIS signal differences between the left and right breasts, was established. A Gaussian mixture model (GMM) classifier was developed to differentiate between "positive" (biopsy recommended) cases and "negative" (nonbiopsy) cases. Selecting an optimal feature set was performed using genetic algorithms with an area under a receiver operating characteristic curve (AUC) as the fitness criterion. RESULTS The recorded EIS signal sweeps showed that, in general, negative (nonbiopsy) examinations have a higher level of electrical impedance symmetry between the two breasts than positive (biopsy) examinations. Fourteen features were selected by genetic algorithm and used in the optimized GMM classifier. 
Using a leave-one-case-out test, the GMM classifier yielded a performance level of AUC = 0.78, which compared favorably with three other widely used classifiers: support vector machine, classification tree, and linear discriminant analysis. These results also suggest that the REIS signal-based GMM classifier could be used as a prescreening tool to correctly identify a fraction of younger women at higher risk of developing breast cancer (i.e., 47% sensitivity at 90% specificity). CONCLUSIONS The study confirms that asymmetry in electrical impedance characteristics between two breasts provides valuable information regarding the presence of a developing breast abnormality; hence, REIS data may be useful in classifying younger women into two groups of "average" and "significantly higher than average" risk of having or developing a breast abnormality that would ultimately result in a later imaging-based recommendation for biopsy.
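The two summary figures quoted in this abstract (an AUC and a sensitivity at 90% specificity) can be computed from classifier scores as in the minimal sketch below. It is not the authors' code: the function names and the convention that a case is called positive when its score exceeds the threshold are assumptions made for illustration.

```python
def empirical_auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of the AUC: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity_at_specificity(pos_scores, neg_scores, target_spec=0.90):
    """Highest sensitivity over thresholds whose specificity is >= target_spec.

    Assumption: a case is called positive when score > threshold.
    """
    best = 0.0
    for t in sorted(set(pos_scores) | set(neg_scores)):
        spec = sum(n <= t for n in neg_scores) / len(neg_scores)
        if spec >= target_spec:
            sens = sum(p > t for p in pos_scores) / len(pos_scores)
            best = max(best, sens)
    return best
```

In a genetic-algorithm feature search such as the one described, a function like `empirical_auc` would play the role of the fitness criterion for each candidate feature subset.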
27
Zheng B, Sumkin JH, Zuley ML, Lederman D, Wang X, Gur D. Computer-aided detection of breast masses depicted on full-field digital mammograms: a performance assessment. Br J Radiol 2011; 85:e153-61. [PMID: 21343322 DOI: 10.1259/bjr/51461617] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Indexed: 11/05/2022]
Abstract
OBJECTIVES To investigate the feasibility of converting a computer-aided detection (CAD) scheme for digitised screen-film mammograms to full-field digital mammograms (FFDMs) and to assess CAD performance on a large database. METHODS The database included 6478 FFDM images acquired on 1120 females, with 525 cancer cases and 595 negative cases. The database was divided into five case groups: (1) cancer detected during screening, (2) interval cancers, (3) "high-risk" recommended for surgical excision, (4) recalled but negative and (5) negative (not recalled). A previously developed CAD scheme for masses depicted on digitised images was converted and re-optimised for FFDM images while keeping the same image-processing structure. CAD performance was analysed on the entire database. RESULTS The case-based sensitivity was 75.6% (397/525) for the current mammograms and 40.8% (42/103) for the prior mammograms deemed negative during clinical interpretation but "visible" during retrospective review. The region-based sensitivity was 58.1% (618/1064) for the current mammograms and 28.4% (57/201) for the prior mammograms. The CAD scheme marked 55.7% (221/397) and 35.7% (15/42) of the masses on both views of the current and the prior examinations, respectively. The overall CAD-cued false-positive rate was 0.32 per image, ranging from 0.29 to 0.51 for the five case groups. CONCLUSION This study indicated that (1) digitised image-based CAD can be converted for FFDMs while performing at a comparable, or better, level; (2) CAD detects a substantial fraction of cancers depicted on prior examinations, albeit most having been marked only on one view; and (3) CAD tends to mark more false-positive results on "difficult" negative cases that are more visually difficult for radiologists to interpret.
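As a quick arithmetic check, the percentages reported in this abstract can be recomputed from the quoted counts. The sketch below uses only numbers that appear in the abstract; the dictionary keys and the helper name `percent` are labels chosen here for illustration.

```python
# (numerator, denominator, percentage as reported in the abstract)
REPORTED = {
    "case-based, current": (397, 525, 75.6),
    "case-based, prior": (42, 103, 40.8),
    "region-based, current": (618, 1064, 58.1),
    "region-based, prior": (57, 201, 28.4),
    "marked on both views, current": (221, 397, 55.7),
    "marked on both views, prior": (15, 42, 35.7),
}


def percent(numerator, denominator):
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100.0 * numerator / denominator, 1)


for label, (num, den, stated) in REPORTED.items():
    print(f"{label}: {percent(num, den)}% (reported {stated}%)")
```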
28
Zheng B, Lederman D, Sumkin JH, Zuley ML, Gruss MZ, Lovy LS, Gur D. A preliminary evaluation of multi-probe resonance-frequency electrical impedance based measurements of the breast. Acad Radiol 2011; 18:220-9. [PMID: 21126888 DOI: 10.1016/j.acra.2010.09.017] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/29/2010] [Revised: 09/22/2010] [Accepted: 09/29/2010] [Indexed: 11/25/2022]
Abstract
RATIONALE AND OBJECTIVES The aim of this study was to preliminarily assess the performance of a new, resonance-frequency electrical impedance spectroscopy (REIS) system in identifying young women who were recommended to undergo breast biopsy following imaging. MATERIALS AND METHODS A seven-probe REIS system was designed and assembled and is currently being prospectively tested. During examination, contact is made with the nipple and six concentric points on the breast skin. Signal sweeps are performed, and outputs ranging from 200 to 800 kHz at 5-kHz intervals are recorded. An initial set of 140 patients, including 56 who eventually had biopsies, 63 who had negative results on screening mammography, and 21 recalled for additional imaging but later determined to have negative results, was used. An initial set of 35 features, 33 representing impedance signal differences between breasts and two representing participant age and average breast density, was assembled and reduced by a genetic algorithm to 14. The performance of an artificial neural network-based classifier was assessed using a case-based leave-one-out method. RESULTS The substantially greater asymmetry between signals of mirror-matched regions ascertained from biopsy ("positive") compared to nonbiopsy ("negative") cases resulted in an artificial neural network classifier performance (area under the curve) of 0.830 ± 0.023. At 90% specificity, this classifier, optimized for "recommendation for biopsy" rather than "cancer," detected 30 REIS-positive cases (54%), including six of nine (67%) actual cancer cases and six of nine women (67%) recommended for surgical excision of high-risk lesions. CONCLUSIONS Asymmetry in impedance measurements between bilateral breasts may provide valuable discriminatory information regarding the presence of highly suspicious imaging-based findings.
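The sweep described in the methods (200 to 800 kHz at 5-kHz intervals) implies 121 sampled frequencies per channel. The sketch below enumerates that grid and illustrates one possible left-right asymmetry measure. The abstract does not define the 33 difference features, so `mean_abs_asymmetry` is a hypothetical stand-in, not the authors' feature definition.

```python
def sweep_frequencies_khz(start=200, stop=800, step=5):
    """Frequencies sampled in one sweep, inclusive of both end points."""
    return list(range(start, stop + 1, step))


def mean_abs_asymmetry(left_sweep, right_sweep):
    """Hypothetical asymmetry feature: mean absolute difference between
    mirror-matched sweeps from the left and right breasts."""
    if len(left_sweep) != len(right_sweep):
        raise ValueError("sweeps must have the same number of samples")
    diffs = [abs(l - r) for l, r in zip(left_sweep, right_sweep)]
    return sum(diffs) / len(diffs)


print(len(sweep_frequencies_khz()))  # number of sampled frequencies per sweep
```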
29
Gur D, Bandos AI, Rockette HE, Zuley ML, Hakim CM, Chough DM, Ganott MA, Sumkin JH. Is an ROC-type response truly always better than a binary response in observer performance studies? Acad Radiol 2010; 17:639-45. [PMID: 20236840 DOI: 10.1016/j.acra.2009.12.012] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 10/05/2009] [Revised: 12/17/2009] [Accepted: 12/27/2009] [Indexed: 01/20/2023]
Abstract
RATIONALE AND OBJECTIVES The aim of this study was to assess similarities and differences between methods of performance comparisons under binary (yes or no) and receiver-operating characteristic (ROC)-type pseudocontinuous (0-100) rating data ascertained during an observer performance study of interpretation of full-field digital mammography (FFDM) versus FFDM plus digital breast tomosynthesis. MATERIALS AND METHODS Rating data consisted of ROC-type pseudocontinuous and binary ratings generated by eight radiologists evaluating 77 digital mammographic examinations. Overall performance levels were summarized with a conventionally used probability of correct discrimination or, equivalently, the area under the ROC curve (AUC), which under a binary scale is related to Youden's index. Magnitudes of differences in the reader-averaged empirical AUCs between FFDM alone and FFDM plus digital breast tomosynthesis were compared in the context of fixed-reader and random-reader variability of the estimates. RESULTS The absolute differences between modes using the empirical AUCs were larger on average for the binary scale (0.12 vs 0.07) and for the majority of individual readers (six of eight). Standardized differences were consistent with this finding (2.32 vs 1.63 on average). Reader-averaged differences in AUCs standardized by fixed-reader and random-reader variances were also smaller under the ROC-type rating paradigm. The discrepancy between AUC differences depended on the location of the reader-specific binary operating points. CONCLUSIONS The human observer's operating point should be a primary consideration in designing an observer performance study. Although in general, the ROC-type rating paradigm provides more detailed information on the characteristics of different modes, it does not reflect the actual operating point adopted by human observers. There are application-driven scenarios in which analysis based on binary responses may provide statistical advantages.
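The relation between the binary-scale AUC and Youden's index mentioned in the methods can be made explicit. With a single operating point, the empirical ROC curve is the two-segment path (0, 0) to (1 - specificity, sensitivity) to (1, 1), and its trapezoidal area is (sensitivity + specificity) / 2, which equals J / 2 + 0.5 for Youden's index J = sensitivity + specificity - 1. The sketch below is a worked illustration; the input values are hypothetical, not study data.

```python
def youden_index(sensitivity, specificity):
    """Youden's index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def binary_auc(sensitivity, specificity):
    """Trapezoidal area under the ROC path (0,0) -> (1-spec, sens) -> (1,1)."""
    fpf = 1.0 - specificity
    left = 0.5 * fpf * sensitivity                    # triangle under first segment
    right = 0.5 * (1.0 - fpf) * (sensitivity + 1.0)   # trapezoid under second segment
    return left + right


# Hypothetical operating point, for illustration only.
sens, spec = 0.80, 0.90
print(binary_auc(sens, spec), youden_index(sens, spec) / 2 + 0.5)
```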
30
Zuley ML, Bandos AI, Abrams GS, Cohen C, Hakim CM, Sumkin JH, Drescher J, Rockette HE, Gur D. Time to diagnosis and performance levels during repeat interpretations of digital breast tomosynthesis: preliminary observations. Acad Radiol 2010; 17:450-5. [PMID: 20036584 DOI: 10.1016/j.acra.2009.11.011] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Received: 09/18/2009] [Revised: 11/06/2009] [Accepted: 11/08/2009] [Indexed: 10/20/2022]
Abstract
RATIONALE AND OBJECTIVES To compare time to interpretation and diagnostic performance levels during repeat readings of full-field digital mammography (FFDM) and digital breast tomosynthesis (DBT) in a retrospective study. MATERIALS AND METHODS Three experienced radiologists twice interpreted 125 selected examinations, 35 with verified cancers and 90 negative for cancer, during a period of 22 months using FFDM alone followed by a combined FFDM + DBT mode. Changes in time to "review and rate" these examinations as well as in diagnostic performance levels were assessed. A fixed-effect analysis accounting for cross-correlation due to the review of the same examinations by the same readers was performed. RESULTS The total (combined) time to review and rate an examination increased on average by 33% between the first and second readings of the same examinations (P < .001). Radiologists reduced their time to review FFDM before making the DBT available for viewing. However, they spent more time reviewing the combined FFDM + DBT mode. The recall rates for examinations depicting cancer remained largely unchanged. Among the groups of examinations with concordant and discordant recall recommendations during the two readings, only the group of examinations that were "newly recalled" during repeat reading took significantly longer (P < .01). CONCLUSION DBT-based breast imaging may ultimately result in a substantial increase in performance; however, without efficiency improvements DBT may take longer to interpret. Addition of "false-positive recalls" was most strongly associated with increase in interpretation time while elimination of "false-positive recalls" did not require longer interpretation time.
31
Gur D, Bandos AI, King JL, Klym AH, Cohen CS, Hakim CM, Hardesty LA, Ganott MA, Perrin RL, Poller WR, Shah R, Sumkin JH, Wallace LP, Rockette HE. Binary and multi-category ratings in a laboratory observer performance study: a comparison. Med Phys 2008; 35:4404-9. [PMID: 18975686 DOI: 10.1118/1.2977766] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
Abstract
The authors investigated radiologists' performances during retrospective interpretation of screening mammograms when using a binary decision whether to recall a woman for additional procedures or not and compared them with their receiver operating characteristic (ROC) type performance curves using a semi-continuous rating scale. Under an Institutional Review Board approved protocol, nine experienced radiologists independently rated an enriched set of 155 examinations that they had not personally read in the clinic, mixed with other enriched sets of examinations that they had individually read in the clinic, using both a screening BI-RADS rating scale (recall/not recall) and a semi-continuous ROC type rating scale (0 to 100). The vertical distance, namely the difference in sensitivity levels at the same specificity level, between the empirical ROC curve and the binary operating point was computed for each reader. The vertical distance averaged over all readers was used to assess the proximity of the performance levels under the binary and ROC-type rating scale. There does not appear to be any systematic tendency of the readers towards a better performance when using either of the two rating approaches: four readers performed better using the semi-continuous rating scale, four readers performed better with the binary scale, and one reader had the point exactly on the empirical ROC curve. Only one of the nine readers had a binary "operating point" that was statistically distant from the same reader's empirical ROC curve. Reader-specific differences ranged from -0.046 to 0.128 with an average width of the corresponding 95% confidence intervals of 0.2 and p-values ranging for individual readers from 0.050 to 0.966. On average, radiologists performed similarly when using the two rating scales in that the average distance between an individual reader's binary operating point and their ROC curve was close to zero. 
The 95% confidence interval for the fixed-reader average (0.016) was (-0.0206, 0.0631) (two-sided p-value 0.35). In conclusion the authors found that in retrospective observer performance studies the use of a binary response or a semi-continuous rating scale led to consistent results in terms of performance as measured by sensitivity-specificity operating points.
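The "vertical distance" used in this study can be sketched as follows: build a reader's empirical ROC curve from the 0 to 100 ratings, read off its sensitivity at the specificity of the binary operating point, and subtract the binary sensitivity. The sign convention (ROC minus binary), the linear interpolation between ROC vertices, and all function names are assumptions made here; the abstract does not spell them out.

```python
def roc_points(pos_ratings, neg_ratings):
    """Empirical ROC vertices (false-positive fraction, sensitivity),
    ordered from (0, 0) to (1, 1). A case is positive when rating >= threshold."""
    thresholds = sorted(set(pos_ratings) | set(neg_ratings), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpf = sum(n >= t for n in neg_ratings) / len(neg_ratings)
        tpf = sum(p >= t for p in pos_ratings) / len(pos_ratings)
        points.append((fpf, tpf))
    return points


def roc_sensitivity_at(points, fpf):
    """Sensitivity of the piecewise-linear ROC curve at a given FPF
    (the top of the step is used on vertical segments)."""
    best = None
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpf <= x1:
            y = max(y0, y1) if x1 == x0 else y0 + (y1 - y0) * (fpf - x0) / (x1 - x0)
            best = y if best is None else max(best, y)
    if best is None:
        raise ValueError("fpf must lie in [0, 1]")
    return best


def vertical_distance(points, binary_sensitivity, binary_specificity):
    """Assumed convention: ROC sensitivity minus binary sensitivity
    at the same specificity."""
    return roc_sensitivity_at(points, 1.0 - binary_specificity) - binary_sensitivity
```

Under this convention a positive distance means the ROC-type curve lies above the reader's binary operating point, and zero means the point falls exactly on the curve.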
32
Gur D, Bandos AI, Klym AH, Cohen CS, Hakim CM, Hardesty LA, Ganott MA, Perrin RL, Poller WR, Shah R, Sumkin JH, Wallace LP, Rockette HE. Agreement of the order of overall performance levels under different reading paradigms. Acad Radiol 2008; 15:1567-73. [PMID: 19000873 DOI: 10.1016/j.acra.2008.07.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 06/10/2008] [Revised: 07/15/2008] [Accepted: 07/15/2008] [Indexed: 11/27/2022]
Abstract
RATIONALE AND OBJECTIVES To investigate consistency of the orders of performance levels when interpreting mammograms under three different reading paradigms. MATERIALS AND METHODS We performed a retrospective observer study in which nine experienced radiologists rated an enriched set of mammography examinations that they personally had read in the clinic ("individualized") mixed with a set that none of them had read in the clinic ("common set"). Examinations were interpreted under three different reading paradigms: binary using screening Breast Imaging Reporting and Data System (BI-RADS), receiver-operating characteristic (ROC), and free-response ROC (FROC). The performance in discriminating between cancer and noncancer findings under each of the paradigms was summarized using Youden's index/2 + 0.5 (Binary), nonparametric area under the ROC curve (AUC), and an overall FROC index (JAFROC-2). Pearson correlation coefficients were then computed to assess consistency in the ordering of observers' performance levels. Statistical significance of the computed correlation coefficients was assessed using bootstrap confidence intervals obtained by resampling sets of examination-specific observations. RESULTS All but one of the computed pair-wise correlation coefficients were larger than 0.66 and were significantly different from zero. The correlation between the overall performance measures under the Binary and ROC paradigms was the lowest (0.43) and was not significantly different from zero (95% confidence interval -0.078 to 0.733). CONCLUSION The use of different evaluation paradigms in the laboratory tends to lead to consistent ordering of the overall performance levels of observers. However, one should recognize that conceptually similar performance indexes resulting from different paradigms often measure different performance characteristics and thus disagreements are not only possible but frequently quite natural.
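The consistency measure in this study is a Pearson correlation between per-reader performance indices obtained under two paradigms. A minimal sketch of that computation is shown below; the reader values are invented for illustration and are not data from the study, and the bootstrap confidence intervals described in the methods are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-reader indices under two paradigms (illustration only).
binary_index = [0.78, 0.81, 0.74, 0.80, 0.77]
roc_auc = [0.82, 0.85, 0.80, 0.81, 0.83]
print(round(pearson(binary_index, roc_auc), 3))
```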
33
Zheng B, Zuley ML, Sumkin JH, Catullo VJ, Abrams GS, Rathfon GY, Chough DM, Gruss MZ, Gur D. Detection of breast abnormalities using a prototype resonance electrical impedance spectroscopy system: a preliminary study. Med Phys 2008; 35:3041-8. [PMID: 18697526 DOI: 10.1118/1.2936221] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022]
Abstract
Electrical impedance spectroscopy has been investigated with only limited success as an adjunct procedure to mammography and as a possible pre-screening tool to stratify risk for having or developing breast cancer in younger women. In this study, the authors explored a new resonance frequency based [resonance electrical impedance spectroscopy (REIS)] approach to identify breasts that may have highly suspicious abnormalities that had been recommended for biopsies. The authors assembled a prototype REIS system generating multifrequency electrical sweeps ranging from 100 to 4100 kHz every 12 s. Using only two probes, one in contact with the nipple and the other with the outer breast skin surface 60 mm away, a paired transmission signal detection system is generated. The authors recruited 150 women between 30 and 50 years old to participate in this study. REIS measurements were performed on both breasts. Of these women, 58 had been scheduled for a breast biopsy and 13 had been recalled for additional imaging procedures due to suspicious findings. The remaining 79 women had negative screening examinations. Eight REIS output signals at and around the resonance frequency were computed for each breast, and the subtracted signals between the left and right breasts were used in a simple jackknifing method to select an optimal feature set to be input into a multi-feature based artificial neural network (ANN) that aims to predict whether a woman's breast had been determined as abnormal (warranting a biopsy) or not. The classification performance was evaluated using a leave-one-case-out method and receiver operating characteristics (ROC) analysis. The study shows that REIS examination is easy to perform, short in duration, and acceptable to all participants in terms of comfort level, and there is no indication of sensation of an electrical current during the measurements. Six REIS difference features were selected as input signals to the ANN. 
The area under the ROC curve (A(z)) was 0.707 +/- 0.033 for classifying between biopsy cases and non-biopsy (including recalled and screening negative) and the performance (A(z)) increased to 0.746 +/- 0.033 after excluding recalled but negative cases. At 95% specificity, the sensitivity levels were approximately 20.5% and 30.4% in the two data sets tested. The results suggest that differences in REIS signals between two breasts measured in and around the tissue resonance frequency can be used to identify at least some of the women with suspicious abnormalities warranting biopsy with high specificity.
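The leave-one-case-out evaluation mentioned above follows a simple loop: each case is scored by a model fitted to all remaining cases. The sketch below shows that loop with a deliberately simple stand-in model (distance to class means on a single hypothetical feature) in place of the ANN, whose architecture the abstract does not describe.

```python
def leave_one_case_out_scores(features, labels):
    """Score every case with a model fitted on all other cases.

    Stand-in model (an assumption, not the ANN of the study): the score is the
    distance to the negative-class mean minus the distance to the
    positive-class mean, so larger scores look more 'positive'.
    Requires at least two cases per class so no training fold is empty.
    """
    scores = []
    for i, x in enumerate(features):
        train = [(f, y) for j, (f, y) in enumerate(zip(features, labels)) if j != i]
        pos = [f for f, y in train if y == 1]
        neg = [f for f, y in train if y == 0]
        mean_pos = sum(pos) / len(pos)
        mean_neg = sum(neg) / len(neg)
        scores.append(abs(x - mean_neg) - abs(x - mean_pos))
    return scores


# Hypothetical left-right difference feature: 1 = biopsy, 0 = non-biopsy.
features = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
labels = [0, 0, 0, 1, 1, 1]
print(leave_one_case_out_scores(features, labels))
```

The resulting held-out scores are what an ROC analysis would then be run on, so that no case contributes to the model that scores it.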
34
Gur D, Bandos AI, Cohen CS, Hakim CM, Hardesty LA, Ganott MA, Perrin RL, Poller WR, Shah R, Sumkin JH, Wallace LP, Rockette HE. The "laboratory" effect: comparing radiologists' performance and variability during prospective clinical and laboratory mammography interpretations. Radiology 2008; 249:47-53. [PMID: 18682584 DOI: 10.1148/radiol.2491072025] [Citation(s) in RCA: 126] [Impact Index Per Article: 7.9] [Indexed: 02/02/2023]
Abstract
PURPOSE To compare radiologists' performance during interpretation of screening mammograms in the clinic with their performance when reading the same mammograms in a retrospective laboratory study. MATERIALS AND METHODS This study was conducted under an institutional review board-approved, HIPAA-compliant protocol; the need for informed consent was waived. Nine experienced radiologists rated an enriched set of mammograms that they had personally read in the clinic (the "reader-specific" set) mixed with an enriched "common" set of mammograms that none of the participants had previously read in the clinic by using a screening Breast Imaging Reporting and Data System (BI-RADS) rating scale. The original clinical recommendations to recall the women for a diagnostic work-up, for both reader-specific and common sets, were compared with their recommendations during the retrospective experiment. The results are presented in terms of reader-specific and group-averaged sensitivity and specificity levels and the dispersion (spread) of reader-specific performance estimates. RESULTS On average, the radiologists' performance was significantly better in the clinic than in the laboratory (P = .035). Interreader dispersion of the computed performance levels was significantly lower during the clinical interpretations (P < .01). CONCLUSION Retrospective laboratory experiments may not accurately represent either the expected performance levels or the interreader variability seen during clinical interpretations of the same set of mammograms.
35
Zheng B, Mello-Thoms C, Wang XH, Abrams GS, Sumkin JH, Chough DM, Ganott MA, Lu A, Gur D. Interactive computer-aided diagnosis of breast masses: computerized selection of visually similar image sets from a reference library. Acad Radiol 2007; 14:917-27. [PMID: 17659237 PMCID: PMC2043128 DOI: 10.1016/j.acra.2007.04.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 03/27/2007] [Revised: 04/15/2007] [Accepted: 04/18/2007] [Indexed: 10/23/2022]
Abstract
RATIONALE AND OBJECTIVES The clinical utility of interactive computer-aided diagnosis (ICAD) systems depends on clinical relevance and visual similarity between the queried breast lesions and the ICAD-selected reference regions. The objective of this study is to develop and test a new ICAD scheme that aims to improve visual similarity of ICAD-selected reference regions. MATERIALS AND METHODS A large and diverse reference library involving 3,000 regions of interest was established. For each breast mass lesion queried by the observer, the ICAD scheme segments the lesion, classifies its boundary spiculation level, and computes 14 image features representing the segmented lesion and its surrounding tissue background. A conditioned k-nearest neighbor algorithm is applied to select a set of the 25 most "similar" lesions from the reference library. After computing the mutual information between the queried lesion and each of these initially selected 25 lesions, the scheme displays the six reference lesions with the highest mutual information scores. To evaluate the automated selection process of the six "visually similar" lesions to the queried lesion, we conducted a two-alternative forced-choice observer preference study using 85 queried mass lesions. Two sets of reference lesions, one selected by the new automated ICAD scheme and the other by a previously reported scheme using a subjective rating method, were randomly displayed on the left and right side of the queried lesion. Nine observers were asked to decide for each of the 85 queried lesions which one of the two reference sets was "more visually similar" to the queried lesion. RESULTS In classification of mass boundary spiculation levels, the overall agreement rate between the automated scheme and an observer is 58.8% (Kappa = 0.31). 
In the observer preference study, the nine observers preferred on average the reference lesion sets selected by the automated scheme as being more visually similar than the set selected by the subjective rating approach in 53.2% of the queried lesions. The results were not significantly different for the two methods (P = .128). CONCLUSIONS This study suggests that, by using the new automated ICAD scheme, issues related to interobserver variability can be avoided. Furthermore, the new scheme maintains a performance level similar to that of the previous scheme using the subjective rating method, which can select reference sets that are significantly more visually similar (P < .05) than when using traditional ICAD schemes in which the mass boundary spiculation levels are not accurately detected and quantified.
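The two-stage selection described in the methods (a nearest-neighbor shortlist of 25, then the six with the highest mutual information) can be sketched as below. This is a structural illustration only: plain Euclidean distance stands in for the "conditioned" k-nearest neighbor rule, and the `similarity` callable stands in for the image-based mutual information score, neither of which is specified in the abstract.

```python
import math


def select_references(query_features, library, similarity, shortlist=25, shown=6):
    """Two-stage reference selection.

    library: list of (region_id, feature_vector) pairs.
    similarity: callable region_id -> score (stand-in for mutual information
                between the queried lesion and that reference region).
    """
    # Stage 1: shortlist the nearest regions in feature space.
    nearest = sorted(
        library, key=lambda item: math.dist(query_features, item[1])
    )[:shortlist]
    # Stage 2: re-rank the shortlist by the similarity score, highest first.
    reranked = sorted(nearest, key=lambda item: similarity(item[0]), reverse=True)
    return [region_id for region_id, _ in reranked[:shown]]
```

Separating the two stages keeps the expensive image-level comparison limited to a small shortlist rather than the whole 3,000-region library.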
36
Leader JK, Hakim CM, Ganott MA, Chough DM, Wallace LP, Clearfield RJ, Perrin RL, Drescher JM, Maitz GS, Sumkin JH, Gur D. A multisite telemammography system for remote management of screening mammography: an assessment of technical, operational, and clinical issues. J Digit Imaging 2007; 19:216-25. [PMID: 16710798 PMCID: PMC3045147 DOI: 10.1007/s10278-006-0585-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 10/24/2022]
Abstract
OBJECTIVE This paper describes a high-quality, multisite telemammography system to enable "almost real-time" remote patient management while the patient remains in the clinic. One goal is to reduce the number of women who would physically need to return to the clinic for additional imaging procedures (termed "recall") to supplement "routine" imaging of screening mammography. MATERIALS AND METHODS Mammography films from current and prior (when available) examinations are digitized at three remote sites and transmitted along with other pertinent information across low-level communication systems to the central site. Images are automatically cropped, wavelet compressed, and encrypted prior to transmission to the central site. At the central site, radiologists review and rate examinations on a high-resolution workstation that displays the images, computer-assisted detection results, and the technologist's communication. Intersite communication is provided instantly via a messaging "chat" window. RESULTS The technologists recommended additional procedures at 2.7 times the actual clinical recall rate for the same cases. Using the telemammography system during a series of "off-line" clinically simulated studies, radiologists recommended additional procedures at 1.3 times the actual clinical recall rate. Percent agreement and kappa between the study and actual clinical interpretations were 66.1% and 0.315, respectively. For every physical recall potentially avoided using the telemammography system, approximately one presumed "unnecessary" imaging procedure was recommended. CONCLUSION Remote patient management can reduce the number of women recalled by as much as 50% without performing an unreasonable number of presumed "unnecessary" procedures.
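The agreement statistics quoted in the results (percent agreement and kappa) are related through the chance-expected agreement: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes Cohen's kappa from a cross-tabulation and, going the other way, the chance agreement implied by a reported pair of values. The example table is invented; only the 66.1% and 0.315 figures come from the abstract.

```python
def cohens_kappa(table):
    """Cohen's kappa from a square cross-tabulation.

    table[i][j]: number of cases rated category i by rater A and j by rater B.
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(size)) / total
    row_frac = [sum(table[i]) / total for i in range(size)]
    col_frac = [sum(table[i][j] for i in range(size)) / total for j in range(size)]
    expected = sum(r * c for r, c in zip(row_frac, col_frac))
    return (observed - expected) / (1.0 - expected)


def implied_chance_agreement(observed, kappa):
    """Solve kappa = (p_o - p_e) / (1 - p_e) for p_e."""
    return (observed - kappa) / (1.0 - kappa)


# Chance agreement implied by the reported 66.1% agreement and kappa of 0.315.
print(round(implied_chance_agreement(0.661, 0.315), 3))
```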
37
Abstract
OBJECTIVE The benefit and cost of computer-assisted detection (CAD) mammography screening remains a topic of great interest in breast imaging. Our purpose is to reflect on and interleave two articles in this issue of the AJR that highlight the difficulty in assessing the actual benefit of using CAD from either retrospective or prospective studies. CONCLUSION This commentary describes the possible benefit and some of the issues associated with the clinical use of current CAD technology while emphasizing the expectation of and need for future improvements in CAD performance.
38
Rubinstein WS, Latimer JJ, Sumkin JH, Huerbin M, Grant SG, Vogel VG. Prospective screening study of 0.5 Tesla dedicated magnetic resonance imaging for the detection of breast cancer in young, high-risk women. BMC WOMENS HEALTH 2006; 6:10. [PMID: 16800895 PMCID: PMC1553433 DOI: 10.1186/1472-6874-6-10] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 11/22/2005] [Accepted: 06/26/2006] [Indexed: 11/10/2022]
Abstract
Background Evidence-based screening guidelines are needed for women under 40 with a family history of breast cancer, a BRCA1 or BRCA2 mutation, or other risk factors. An accurate assessment of breast cancer risk is required to balance the benefits and risks of surveillance, yet published studies have used narrow risk assessment schemata for enrollment. Breast density limits the sensitivity of film-screen mammography but is not thought to pose a limitation to MRI; however, the utility of MRI surveillance has not been specifically examined before in women with dense breasts. Also, all MRI surveillance studies yet reported have used high-strength magnets that may not be practical for dedicated imaging in many breast centers. Medium-strength 0.5 Tesla MRI may provide an alternative economic option for surveillance. Methods We conducted a prospective, nonrandomized pilot study of 30 women age 25–49 years with dense breasts evaluating the addition of 0.5 Tesla MRI to conventional screening. All participants had a high quantitative breast cancer risk, defined as ≥ 3.5% over the next 5 years per the Gail or BRCAPRO models, and/or a known BRCA1 or BRCA2 germline mutation. Results The average age at enrollment was 41.4 years and the average 5-year risk was 4.8%. Twenty-two subjects had BIRADS category 1 or 2 breast MRIs (negative or probably benign), whereas no category 4 or 5 MRIs (possibly or probably malignant) were observed. Eight subjects had BIRADS 3 results, identifying lesions that were "probably benign", yet prompting further evaluation. One of these subjects was diagnosed with a stage T1aN0M0 invasive ductal carcinoma, and later determined to be a BRCA1 mutation carrier. Conclusion Using medium-strength MRI we were able to detect 1 early breast tumor that was mammographically undetectable among 30 young high-risk women with dense breasts. 
These results support the concept that breast MRI can enhance surveillance for young high-risk women with dense breasts, and further suggest that a medium-strength instrument is sufficient for this application. For the first time, we demonstrate the use of quantitative breast cancer risk assessment via a combination of the Gail and BRCAPRO models for enrollment in a screening trial.
|
39
|
Sumkin JH, Gur D. Computer-aided Detection with Screening Mammography: Improving Performance or Simply Shifting the Operating Point? Radiology 2006; 239:916-7; author reply 917-8. [PMID: 16714469 DOI: 10.1148/radiol.2393051392] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
|
40
|
Zheng B, Lu A, Hardesty LA, Sumkin JH, Hakim CM, Ganott MA, Gur D. A method to improve visual similarity of breast masses for an interactive computer-aided diagnosis environment. Med Phys 2006; 33:111-7. [PMID: 16485416 DOI: 10.1118/1.2143139] [Citation(s) in RCA: 90] [Impact Index Per Article: 5.0] [Indexed: 11/07/2022]
Abstract
The purpose of this study was to develop and test a method for selecting "visually similar" regions of interest depicting breast masses from a reference library to be used in an interactive computer-aided diagnosis (CAD) environment. A reference library including 1000 malignant mass regions and 2000 benign and CAD-generated false-positive regions was established. When a suspicious mass region is identified, the scheme segments the region and searches for similar regions from the reference library using a multifeature-based k-nearest neighbor (KNN) algorithm. To improve selection of reference images, we added an interactive step. All actual masses in the reference library were subjectively rated on a scale from 1 to 9 as to their "visual margin spiculation". When an observer identifies a suspected mass region during a case interpretation, he/she first rates the margins, and the computerized search is then limited to regions rated as having similar levels of spiculation (within ±1 scale difference). In an observer preference study including 85 test regions, two sets of six "similar" reference regions, selected by the KNN with and without the interactive step, were displayed side by side with each test region. Four radiologists and five nonclinician observers selected the more appropriate ("similar") reference set in a two-alternative forced-choice preference experiment. All four radiologists and five nonclinician observers preferred the sets of regions selected by the interactive method, with an average frequency of 76.8% and 74.6%, respectively. The overall preference for the interactive method was highly significant (p < 0.001). The study demonstrated that a simple interactive approach that includes subjectively perceived ratings of one feature alone, namely a rating of margin "spiculation," could substantially improve the selection of "visually similar" reference images.
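The interactive search described in this abstract can be illustrated with a minimal sketch: filter the reference library to regions whose spiculation rating is within one scale point of the observer's rating, then take the nearest neighbors in feature space. The abstract does not state the distance metric or the feature set, so the Euclidean distance, the two-element feature tuples, the function name `similar_regions`, and the four-region library below are all hypothetical.

```python
import math

def similar_regions(query_features, query_rating, library, k=6, tolerance=1):
    """Return up to k library regions nearest to the query, restricted to
    regions whose spiculation rating is within `tolerance` of the query's."""
    # Interactive step: keep only regions with a similar spiculation rating.
    candidates = [r for r in library
                  if abs(r["rating"] - query_rating) <= tolerance]
    # KNN step: rank the remaining regions by distance in feature space
    # (Euclidean distance is an assumption, not taken from the abstract).
    candidates.sort(key=lambda r: math.dist(r["features"], query_features))
    return candidates[:k]

# Hypothetical reference library: feature vectors and 1-9 spiculation ratings.
library = [
    {"id": "A", "features": (0.10, 0.20), "rating": 2},
    {"id": "B", "features": (0.12, 0.22), "rating": 8},
    {"id": "C", "features": (0.50, 0.60), "rating": 7},
    {"id": "D", "features": (0.40, 0.10), "rating": 9},
]

result = similar_regions((0.11, 0.21), query_rating=8, library=library, k=2)
print([r["id"] for r in result])  # prints ['B', 'D']
```

Region A is close to the query in feature space but is excluded because its rating differs by more than one point, which is the effect the interactive step is meant to have.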
|
41
|
Ganott MA, Sumkin JH, King JL, Klym AH, Catullo VJ, Cohen CS, Gur D. Screening Mammography: Do Women Prefer a Higher Recall Rate Given the Possibility of Earlier Detection of Cancer? Radiology 2006; 238:793-800. [PMID: 16505392 DOI: 10.1148/radiol.2383050852] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Indexed: 11/11/2022]
Abstract
PURPOSE To prospectively survey women undergoing screening mammography to assess their attitudes toward and preference for the level of recall rates given the possibility that an increase in recall rates may result in earlier detection of cancer. MATERIALS AND METHODS This HIPAA-compliant survey was performed with an institutional review board-approved protocol. Women who arrived for their routine screening mammographic examination from November 2004 to March 2005 were informed before they consented to participate. The distribution of responses for each survey question was summarized, and proportions for the entire group and different subgroups were computed. The z score statistic was used to assess significant differences between subgroups. RESULTS Fifteen hundred seventy anonymized questionnaires were collected; 1171 (75%) were from women between 40 and 59 years of age. Of 1528 respondents, 1486 (97%) believed that a false-positive result would not deter them from continuing with regular screening, and most would have been willing to be recalled more often for either a noninvasive (86% [1308 of 1519 respondents]) or an invasive (82% [1248 of 1515 respondents]) procedure if it might increase the chance of detecting a cancer (if present) earlier. Compared with respondents undergoing their initial screening mammographic examination, women who had undergone at least one prior screening examination reported that they were more likely to continue with screening if they had received a previous false-positive result (P = .02). Women younger than 60 years and those previously recalled were more willing to be called back more often for a noninvasive or, when indicated, an invasive procedure (P < .05). CONCLUSION A substantial fraction of women in this study would have preferred the inconvenience of and anxiety associated with a higher recall rate if it resulted in the possibility of detecting breast cancer earlier.
|
42
|
Wang XH, Good WF, Fuhrman CR, Sumkin JH, Britton CA, Golla SK. Stereo CT image compositing methods for lung nodule detection and characterization. Acad Radiol 2005; 12:1512-20. [PMID: 16321739 DOI: 10.1016/j.acra.2005.06.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 03/14/2005] [Revised: 05/09/2005] [Accepted: 06/12/2005] [Indexed: 11/18/2022]
Abstract
RATIONALE AND OBJECTIVES Stereographic display has been proposed as a possible method of improving performance in reading computed tomographic (CT) examinations acquired for lung cancer screening. Optimizing such displays is important given the large volume of image data that must be evaluated for each of these examinations. This study is designed to explore certain tradeoffs between rendering methods designed for the stereo display of CT images. MATERIALS AND METHODS Stereo CT image compositing methods, including distance-weighted averaging, distance-weighted maximum intensity projection (MIP), and conventional MIP, were applied to lung CT images and compared for lung nodule detection and characterization. RESULTS The Jonckheere test indicated a statistically significant (P < .01) increase in contrast among the three compositing methods. The Wilcoxon-Mann-Whitney test showed significant differences in contrast between distance-weighted averaging and conventional MIP (P < .01) and between averaging and distance-weighted MIP (P < .05), but not between distance-weighted MIP and conventional MIP (P > .05). Conventional MIP compositing provided the highest image contrast, but produced ambiguities in local geometric detail and texture, whereas averaging resulted in the lowest contrast, but preserved geometric detail. Distance-weighted MIP partially recovered geometric information, which was lost in images composited by means of conventional MIP. CONCLUSION Our results indicate that distance-weighted MIP may be a better choice for nodule detection in stereo lung CT images for its high local contrast and partial preservation of geometric information, whereas compositing by means of distance-weighted averaging is preferable for nodule characterization. The relative clinical value of these compositing methods needs to be evaluated further.
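To make the three compositing methods concrete, the sketch below applies each to a single ray of CT samples ordered from nearest to farthest. The abstract does not give the weighting function, so the linear falloff in `depth_weights`, the `falloff` parameter, and the sample values are assumptions chosen for illustration only.

```python
def depth_weights(n, falloff=0.5):
    """Hypothetical linear weights: 1.0 for the nearest sample,
    decreasing to 1 - falloff for the farthest of n samples."""
    if n == 1:
        return [1.0]
    return [1.0 - falloff * i / (n - 1) for i in range(n)]

def conventional_mip(ray):
    """Conventional MIP: the brightest sample along the ray."""
    return max(ray)

def distance_weighted_mip(ray, falloff=0.5):
    """MIP taken after attenuating each sample by its depth weight."""
    weights = depth_weights(len(ray), falloff)
    return max(w * v for w, v in zip(weights, ray))

def distance_weighted_average(ray, falloff=0.5):
    """Weighted mean of the samples, with nearer samples counting more."""
    weights = depth_weights(len(ray), falloff)
    return sum(w * v for w, v in zip(weights, ray)) / sum(weights)

# Hypothetical intensities along one ray, nearest sample first.
ray = [100.0, 400.0, 200.0]
print(conventional_mip(ray))           # 400.0
print(distance_weighted_mip(ray))      # 300.0
print(distance_weighted_average(ray))  # about 222.2
```

In this toy ray the weighted MIP still responds to the bright middle sample but reduces it according to depth, while the average blends all samples; how that translates into contrast and geometric detail in real images is what the study evaluated.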
|
43
|
Gur D, Wallace LP, Klym AH, Hardesty LA, Abrams GS, Shah R, Sumkin JH. Trends in Recall, Biopsy, and Positive Biopsy Rates for Screening Mammography in an Academic Practice. Radiology 2005; 235:396-401. [PMID: 15770039 DOI: 10.1148/radiol.2352040422] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Indexed: 11/11/2022]
Abstract
PURPOSE To retrospectively evaluate whether recall, biopsy, and positive biopsy rates for a group of radiologists who met requirements of Mammography Quality Standards Act of 1992 (MQSA) demonstrated any change over time during a 27-month period (nine consecutive calendar quarters). MATERIALS AND METHODS Institutional review board approved study protocol, and informed consent was waived. All screening mammograms that had been interpreted by MQSA-qualified radiologists between January 1, 2001, and March 31, 2003, were reviewed. Group recall rates, biopsy rates, and detected cancer rates for nine calendar quarters were computed and attributed to performance date of original screening mammogram. Type of biopsy performed was classified as follows: stereotactic vacuum-assisted biopsy, ultrasonography (US)-guided core biopsy, US-guided fine-needle aspiration biopsy, surgical excision, and multiple biopsies. The χ² test for trend (two-sided) and linear regression were used to assess trends over time for recall and biopsy rates, biopsy rates according to type of biopsy performed, and percentage of biopsy results positive for cancer. RESULTS Group recall rate did not show a statistically significant trend during period studied (P = .59). Biopsy rates increased significantly from 13.02 to 20.12 per 1000 screening examinations (P < .001). A corresponding substantial decrease was seen in percentage of biopsies in which malignancy was found, although this trend was not statistically significant (P = .24). A significant increase (from 4.72 to 9.88 per 1000 screening examinations) was found in rate of stereotactic vacuum-assisted 11-gauge core biopsies performed (P < .001). CONCLUSION Observed increase in biopsy rates reinforces the need to carefully select patients for biopsy to achieve efficient, efficacious, and cost-effective programs for early detection of breast cancers.
|
44
|
Hardesty LA, Klym AH, Shindel BE, Chough DM, Sumkin JH, Gur D. Is Maximum Positive Predictive Value a Good Indicator of an Optimal Screening Mammography Practice? AJR Am J Roentgenol 2005; 184:1505-7. [PMID: 15855105 DOI: 10.2214/ajr.184.5.01841505] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
OBJECTIVE Positive predictive value (PPV1) has been used as one important indicator of the quality of screening mammography programs. We show how the relationship between sensitivity and recall rate may affect the operating point at which optimal (maximum) PPV1 occurs. CONCLUSION Optimal (maximum) PPV1 can occur at any sensitivity level and should not be used as the sole indicator for practice optimization because it does not take into account the number of cancers that would be missed at that sensitivity.
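The point made in this abstract can be illustrated with a small calculation. Assuming every detected cancer is among the recalled examinations, PPV1 is the fraction of recalls that turn out to be cancer, that is, sensitivity times prevalence divided by recall rate. The prevalence and the two operating points below are hypothetical values chosen only to show the effect; they are not taken from the paper.

```python
def ppv1(sensitivity, recall_rate, prevalence):
    """Fraction of recalled examinations that are cancers, assuming all
    detected cancers are among the recalled cases."""
    return sensitivity * prevalence / recall_rate

def missed_per_1000(sensitivity, prevalence):
    """Cancers missed per 1000 screening examinations."""
    return (1.0 - sensitivity) * prevalence * 1000

prevalence = 0.005  # hypothetical: 5 cancers per 1000 screens

# Two hypothetical operating points: (sensitivity, recall rate).
for sensitivity, recall_rate in [(0.60, 0.04), (0.85, 0.10)]:
    print(sensitivity, recall_rate,
          round(ppv1(sensitivity, recall_rate, prevalence), 4),
          round(missed_per_1000(sensitivity, prevalence), 2))
```

With these numbers the lower-sensitivity operating point has the higher PPV1 (0.075 versus 0.0425) yet misses more cancers per 1000 screens (2.0 versus 0.75), which is why maximum PPV1 alone says little about how many cancers go undetected.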
|
45
|
Gur D, Sumkin JH, Hardesty LA. Author reply. Cancer 2004. [DOI: 10.1002/cncr.20685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
46
|
Gur D, Stalder JS, Hardesty LA, Zheng B, Sumkin JH, Chough DM, Shindel BE, Rockette HE. Computer-aided detection performance in mammographic examination of masses: assessment. Radiology 2004; 233:418-23. [PMID: 15358846 DOI: 10.1148/radiol.2332040277] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.6] [Indexed: 11/11/2022]
Abstract
PURPOSE To compare performance of two computer-aided detection (CAD) systems and an in-house scheme applied to five groups of sequentially acquired screening mammograms. MATERIALS AND METHODS Two hundred nineteen film-based mammographic examinations, classified into five groups, were included in this study. Group 1 included 58 examinations in which verified malignant masses were detected during screening; group 2, 39 in which all available latest examinations were performed prior to diagnosis of these malignant masses (subset of 39 women from group 1); group 3, 22 in which findings were interpreted as negative but were verified as cancer within 1 year from the negative interpretation (missed cancers); group 4, 50 in which findings were negative and patients were not recalled for additional procedures; and group 5, 50 in which patients were recalled for additional procedures and findings were negative for cancer. In all examinations, images were processed with two Food and Drug Administration-approved commercially available CAD systems and an in-house scheme. Performance levels in terms of true-positive detection rates and number of false-positive identifications per image and per examination were compared. RESULTS Mass detection rates in positive examinations (group 1) were 67%-72%. Detection rates among three systems were not significantly different (P > .05). In 50 negative screening examinations (group 4), false-positive rates ranged from 1.08 to 1.68 per four-view examination. Performance level differences among systems were significant for false-positive rates (P = .008). Performance of all systems was at levels lower than publicly suggested in some retrospective studies. False-positive CAD cueing rates were significantly higher for negative examinations in which patients were recalled (group 5) than they were for those in which patients were not recalled (group 4) (P ≤ .002).
CONCLUSION Performance of CAD systems for mass detection at mammography varies significantly, depending on examination and system used. Actual performance of all systems in clinical environment can be improved.
|
47
|
Gur D, Sumkin JH, Hardesty LA, Clearfield RJ, Cohen CS, Ganott MA, Hakim CM, Harris KM, Poller WR, Shah R, Wallace LP, Rockette HE. Recall and detection rates in screening mammography. Cancer 2004; 100:1590-4. [PMID: 15073844 DOI: 10.1002/cncr.20053] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.7] [Indexed: 11/08/2022]
Abstract
BACKGROUND The authors investigated the correlation between recall and detection rates in a group of 10 radiologists who had read a high volume of screening mammograms in an academic institution. METHODS Practice-related and outcome-related databases of verified cases were used to compute recall rates and tumor detection rates for a group of 10 Mammography Quality Standard Act (MQSA)-certified radiologists who interpreted a total of 98,668 screening mammograms during the years 2000, 2001, and 2002. The relation between recall and detection rates for these individuals was investigated using parametric Pearson (r) and nonparametric Spearman (rho) correlation coefficients. The effect of the volume of mammograms interpreted by individual radiologists was assessed using partial correlations controlling for total reading volumes. RESULTS A wide variability of recall rates (range, 7.7-17.2%) and detection rates (range, 2.6-5.4 per 1000 mammograms) was observed in the current study. A statistically significant correlation (P < 0.05) between recall and detection rates was observed in this group of 10 experienced radiologists. The results remained significant (P < 0.05) after accounting for the volume of mammograms interpreted by each radiologist. CONCLUSIONS Optimal performance in screening mammography should be evaluated quantitatively. The general pressure to reduce recall rates through "practice guidelines" to below a fixed level for all radiologists should be assessed carefully.
|
48
|
Gur D, Sumkin JH, Hardesty LA, Rockette HE. Re: Computer-Aided Detection of Breast Cancer: Has Promise Outstripped Performance? J Natl Cancer Inst 2004; 96:717-8; author reply 718. [PMID: 15126614 DOI: 10.1093/jnci/djh129] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022]
|
49
|
Gur D, Sumkin JH, Rockette HE, Ganott M, Hakim C, Hardesty L, Poller WR, Shah R, Wallace L. Changes in breast cancer detection and mammography recall rates after the introduction of a computer-aided detection system. J Natl Cancer Inst 2004; 96:185-90. [PMID: 14759985 DOI: 10.1093/jnci/djh067] [Citation(s) in RCA: 195] [Impact Index Per Article: 9.8] [Indexed: 11/14/2022]
Abstract
BACKGROUND Computer-aided mammography is rapidly gaining clinical acceptance, but few data demonstrate its actual benefit in the clinical environment. We assessed changes in mammography recall and cancer detection rates after the introduction of a computer-aided detection system into a clinical radiology practice in an academic setting. METHODS We used verified practice- and outcome-related databases to compute recall rates and cancer detection rates for 24 Mammography Quality Standards Act-certified academic radiologists in our practice who interpreted 115,571 screening mammograms with (n = 59,139) or without (n = 56,432) the use of a computer-aided detection system. All statistical tests were two-sided. RESULTS For the entire group of 24 radiologists, recall rates were similar for mammograms interpreted without and with computer-aided detection (11.39% versus 11.40%; percent difference = 0.09, 95% confidence interval [CI] = -11 to 11; P = .96), as were the breast cancer detection rates for mammograms interpreted without and with computer-aided detection (3.49% versus 3.55% per 1000 screening examinations; percent difference = 1.7, 95% CI = -11 to 19; P = .68). For the seven high-volume radiologists (i.e., those who interpreted more than 8000 screening mammograms each over a 3-year period), the recall rates were similar for mammograms interpreted without and with computer-aided detection (11.62% versus 11.05%; percent difference = -4.9, 95% CI = -21 to 4; P = .16), as were the breast cancer detection rates for mammograms interpreted without and with computer-aided detection (3.61% versus 3.49% per 1000 screening examinations; percent difference = -3.2, 95% CI = -15 to 9; P = .54).
CONCLUSION The introduction of computer-aided detection into this practice was not associated with statistically significant changes in recall and breast cancer detection rates, both for the entire group of radiologists and for the subset of radiologists who interpreted high volumes of mammograms.
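The percent differences quoted in this abstract appear to be relative changes from the rate without computer-aided detection to the rate with it. That reading is an assumption, as is the function name `percent_difference`; the short check below recomputes the values from the published, rounded rates.

```python
def percent_difference(without_cad, with_cad):
    """Relative change from the rate without CAD to the rate with CAD,
    expressed in percent (assumed definition)."""
    return 100.0 * (with_cad - without_cad) / without_cad

# (rate without CAD, rate with CAD), as printed in the abstract.
rates = {
    "recall, all 24 radiologists": (11.39, 11.40),
    "detection, all 24 radiologists": (3.49, 3.55),
    "recall, 7 high-volume radiologists": (11.62, 11.05),
    "detection, 7 high-volume radiologists": (3.61, 3.49),
}

for label, (without_cad, with_cad) in rates.items():
    print(label, round(percent_difference(without_cad, with_cad), 2))
```

The first three values come out at about 0.09, 1.72, and -4.91, matching the reported 0.09, 1.7, and -4.9. The last comes out near -3.3 rather than the reported -3.2, presumably because the published rates are rounded before this recomputation.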
|
50
|
Zheng B, Hardesty LA, Poller WR, Sumkin JH, Golla S. Mammography with computer-aided detection: reproducibility assessment initial experience. Radiology 2003; 228:58-62. [PMID: 12759470 DOI: 10.1148/radiol.2281020489] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.9] [Indexed: 11/11/2022]
Abstract
PURPOSE To examine the performance and reproducibility of a commercially available computer-aided detection (CAD) system with a set of mammograms obtained in 100 patients who had undergone biopsy after positive findings at mammography. MATERIALS AND METHODS One hundred positive mammographic examinations (four views each), depicting 96 masses and 50 microcalcification clusters, were scanned and analyzed three times by the CAD system. Reproducibility of detection sensitivity and the individual CAD-generated cues in the three images were examined. Both abnormality- and region-based detection sensitivities were compared. RESULTS Forty-eight (96.0%) of 50 microcalcification clusters were marked on all three images in the abnormality-based analysis. Of the remaining two clusters, one was marked in two images and one was marked in only one. The abnormality-based sensitivity for mass detection ranged from 66.7% (64 of 96) to 70.8% (68 of 96). The system generated identical patterns (including images with and those without cues) for all three images in 53.3% (213 of 400) of images. For true-positive cluster regions, 88.9% (80 of 90) were marked at the same location in all images. For true-positive mass regions, 69.5% (82 of 118) were marked at the same locations in all images. In false-positive detections, only 44.0% (81 of 184) of false-positive mass regions and 31.9% (38 of 119) of false-positive cluster regions were marked at the same locations on all three images. CONCLUSION Reproducibility of marked regions generated by the CAD system is improved from that reported previously, largely as a result of the substantial reduction in the false-positive detection rates. Reproducibility of true-positive identification of masses remains an important issue that may have methodologic and clinical practice implications.
|