1
|
Armato SG, McLennan G, Bidaut L, McNitt-Gray MF, Meyer CR, Reeves AP, Zhao B, Aberle DR, Henschke CI, Hoffman EA, Kazerooni EA, MacMahon H, Van Beeke EJR, Yankelevitz D, Biancardi AM, Bland PH, Brown MS, Engelmann RM, Laderach GE, Max D, Pais RC, Qing DPY, Roberts RY, Smith AR, Starkey A, Batrah P, Caligiuri P, Farooqi A, Gladish GW, Jude CM, Munden RF, Petkovska I, Quint LE, Schwartz LH, Sundaram B, Dodd LE, Fenimore C, Gur D, Petrick N, Freymann J, Kirby J, Hughes B, Casteele AV, Gupte S, Sallamm M, Heath MD, Kuhn MH, Dharaiya E, Burns R, Fryd DS, Salganicoff M, Anand V, Shreter U, Vastagh S, Croft BY. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans. Med Phys 2011; 38:915-31. [PMID: 21452728 PMCID: PMC3041807 DOI: 10.1118/1.3528204] [Citation(s) in RCA: 940] [Impact Index Per Article: 67.1] [Received: 09/08/2010] [Revised: 11/16/2010] [Accepted: 11/20/2010] [Indexed: 11/07/2022] Open
Abstract
PURPOSE The development of computer-aided diagnostic (CAD) methods for lung nodule detection, classification, and quantitative assessment can be facilitated through a well-characterized repository of computed tomography (CT) scans. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI) completed such a database, establishing a publicly available reference for the medical imaging research community. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process. METHODS Seven academic centers and eight medical imaging companies collaborated to identify, address, and resolve challenging organizational, technical, and clinical issues to provide a solid foundation for a robust database. The LIDC/IDRI Database contains 1018 cases, each of which includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories ("nodule ≥3 mm," "nodule <3 mm," and "non-nodule ≥3 mm"). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion. The goal of this process was to identify as completely as possible all lung nodules in each CT scan without requiring forced consensus. RESULTS The Database contains 7371 lesions marked "nodule" by at least one radiologist. 2669 of these lesions were marked "nodule ≥3 mm" by at least one radiologist, of which 928 (34.7%) received such marks from all four radiologists. These 2669 lesions include nodule outlines and subjective nodule characteristic ratings. CONCLUSIONS The LIDC/IDRI Database is expected to provide an essential medical imaging research resource to spur CAD development, validation, and dissemination in clinical practice.
|
Research Support, N.I.H., Extramural |
14 |
940 |
2
|
Grani G, Lamartina L, Cantisani V, Maranghi M, Lucia P, Durante C. Interobserver agreement of various thyroid imaging reporting and data systems. Endocr Connect 2018; 7:1-7. [PMID: 29196301 PMCID: PMC5744624 DOI: 10.1530/ec-17-0336] [Citation(s) in RCA: 133] [Impact Index Per Article: 19.0] [Received: 11/04/2017] [Accepted: 11/09/2017] [Indexed: 12/29/2022]
Abstract
Ultrasonography is the best available tool for the initial work-up of thyroid nodules. Substantial interobserver variability has been documented in the recognition and reporting of some of the lesion characteristics. A number of classification systems have been developed to estimate the likelihood of malignancy: several of them have been endorsed by scientific societies, but their reproducibility is yet to be assessed. We evaluated the interobserver variability of the AACE/ACE/AME, ACR, ATA, EU-TIRADS and K-TIRADS classification systems and the interobserver concordance in the indication for FNA biopsy. Two raters independently evaluated 1055 ultrasound images of thyroid nodules identified in 265 patients at multiple time points, in two separate sets (501 and 554 images). After the first set of nodules, a joint reading was performed to reach a consensus on the feature definitions. The interobserver agreement (Krippendorff alpha) in the first set of nodules was 0.47, 0.49, 0.49, 0.61 and 0.53, for AACE/ACE/AME, ACR, ATA, EU-TIRADS and K-TIRADS systems, respectively. The agreement for the indication for biopsy was substantial to near-perfect, being 0.73, 0.61, 0.75, 0.68 and 0.82, respectively (Cohen's kappa). For all systems, agreement on the nodules of the second set increased. Despite the wide variability in the description of single ultrasonographic features, the classification systems may improve interobserver agreement, which improves further after specific training. When selecting nodules to be submitted to FNA biopsy, which is the main purpose of these classifications, the interobserver agreement is substantial to almost perfect.
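The abstract above reports two-rater agreement on the biopsy indication as Cohen's kappa. As a rough illustration of what that statistic measures, the following minimal sketch computes kappa from two lists of labels. The ratings are hypothetical and are not data from the study; the function name `cohens_kappa` is an assumption made for this example.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # convention: both raters used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)

# Hypothetical biopsy indications (1 = FNA indicated, 0 = not) for ten nodules.
rater_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_2 = [1, 1, 0, 0, 1, 0, 0, 1, 0, 1]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the raters agree on 8 of 10 nodules, but half of that agreement is expected by chance, so kappa is lower than the raw agreement rate.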
|
research-article |
7 |
133 |
3
|
Interobserver Variability of Sonographic Features Used in the American College of Radiology Thyroid Imaging Reporting and Data System. AJR Am J Roentgenol 2018; 211:162-167. [PMID: 29702015 DOI: 10.2214/ajr.17.19192] [Citation(s) in RCA: 120] [Impact Index Per Article: 17.1] [Indexed: 02/07/2023]
Abstract
OBJECTIVE The purpose of this study was to assess interobserver variability in assigning features in the American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS) lexicon and in making recommendations for thyroid nodule biopsy. MATERIALS AND METHODS The study cohort comprised 100 nodules in 92 patients who underwent fine-needle aspiration with definitive cytologic results (Bethesda category II or VI) or diagnostic lobectomy between April 2009 and May 2010. Eight board-certified radiologists evaluated the nodules according to the five feature categories that constitute ACR TI-RADS and gave a biopsy recommendation based on their own practice. Variability in feature assignment and biopsy recommendation was assessed with the Fleiss kappa statistic. RESULTS Agreement in interpretation was fair to moderate for all features except shape (κ = 0.61) and macrocalcifications (κ = 0.73), which had substantial agreement. The features with the poorest agreement were margin and other types of echogenic foci, which had kappa values ranging from 0.25 to 0.39, indicating fair agreement. Interobserver agreement regarding biopsy recommendation was fair (κ = 0.22) based on radiologists' current practice. Applying ACR TI-RADS resulted in moderate agreement (κ = 0.51). CONCLUSION Variability in interpreting thyroid nodule sonographic features was highest for margin and all types of echogenic foci, except for macrocalcifications. Because radiologists' interpretations of these features change the level of suspicion of thyroid malignancy, the results of this study suggest a need for further education. Despite the variability in assigning features, adoption of ACR TI-RADS improves agreement for recommending biopsy.
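The abstract above uses the Fleiss kappa statistic because more than two readers rated each nodule. The sketch below shows the standard Fleiss formula on a table of per-item category counts. The example tables are invented for illustration and do not come from the study, and `fleiss_kappa` is a name chosen here, not a library call.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a table where counts[i][j] is the number of raters
    who assigned item i to category j (same number of raters per item)."""
    if not counts:
        raise ValueError("at least one rated item is required")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    # Per-item agreement: share of rater pairs that agree on that item.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items
    # Chance agreement from the overall category proportions.
    p_categories = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_chance = sum(p * p for p in p_categories)
    if p_chance == 1.0:
        return 1.0  # convention: only one category was ever used
    return (p_observed - p_chance) / (1.0 - p_chance)

# Hypothetical tables: 8 readers, columns = (biopsy, no biopsy).
perfect = fleiss_kappa([[8, 0], [0, 8], [8, 0], [0, 8]])  # unanimous on every nodule
split = fleiss_kappa([[4, 4], [4, 4]])                    # readers evenly divided
```

Unanimous ratings give a kappa of 1, while an even split on every nodule gives a slightly negative value, meaning agreement is no better than chance.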
|
Journal Article |
7 |
120 |
4
|
Hoetzenecker K, Benazzo A, Stork T, Sinn K, Schwarz S, Schweiger T, Klepetko W. Bilateral lung transplantation on intraoperative extracorporeal membrane oxygenator: An observational study. J Thorac Cardiovasc Surg 2019; 160:320-327.e1. [PMID: 31932054 DOI: 10.1016/j.jtcvs.2019.10.155] [Citation(s) in RCA: 98] [Impact Index Per Article: 16.3] [Received: 06/10/2019] [Revised: 10/05/2019] [Accepted: 10/07/2019] [Indexed: 02/06/2023]
Abstract
OBJECTIVE Intraoperative extracorporeal membrane oxygenation (ECMO) is usually reserved to support patients during complex lung transplantation. We hypothesized that a routine application of intraoperative ECMO in all patients improves primary graft function. METHODS Patients receiving a bilateral lung transplantation between November 2016 and July 2018 at the Medical University of Vienna were included in this prospective, single-center observational study. All transplantations were uniformly performed on central venoarterial ECMO support, with the possibility to extend ECMO into the early postoperative period whenever graft function did not meet established quality criteria at the end of implantation. Primary graft dysfunction (PGD) grades were evaluated at 24, 48, and 72 hours after transplantation. Perioperative complications and survival outcome were assessed. RESULTS A total of 159 patients were included in the study. At 24 hours post-transplantation, 38.4% (n = 61) of patients were already extubated, 48.4% (n = 77) were classified as PGD0, 4.4% (n = 7) as PGD1, 3.1% (n = 5) as PGD2, 2.5% (n = 4) as PGD3, and 3.1% (n = 5) were "ungradable" due to prophylactic postoperative prolongation of ECMO. At 72 hours after transplantation, 76.7% (n = 122) of the patients were extubated, as opposed to only 1.3% (n = 2) of patients classified as PGD3. The median time of mechanical ventilation was 29 hours (interquartile range, 17-58). The 90-day-mortality was 3.1%, and 2-year survival was 86%. CONCLUSIONS Routine use of intraoperative ECMO resulted in excellent primary graft function and mid-term outcome in patients undergoing lung transplantation. To the best of our knowledge, the herein measured PGD rates are the lowest reported in the literature to date. Our results advocate a routine intraoperative use of ECMO in bilateral lung transplantation.
|
Observational Study |
6 |
98 |
5
|
Segedin B, Petric P. Uncertainties in target volume delineation in radiotherapy - are they relevant and what can we do about them? Radiol Oncol 2016; 50:254-62. [PMID: 27679540 PMCID: PMC5024655 DOI: 10.1515/raon-2016-0023] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Received: 12/29/2015] [Accepted: 02/01/2016] [Indexed: 02/03/2023] Open
Abstract
Background Modern radiotherapy techniques enable delivery of high doses to the target volume without escalating dose to organs at risk, offering the possibility of better local control while preserving good quality of life. Uncertainties in target volume delineation have been demonstrated for most tumour sites, and various studies indicate that inconsistencies in target volume delineation may be larger than errors in all other steps of the treatment planning and delivery process. The aim of this paper is to summarize the degree of delineation uncertainties for different tumour sites reported in the literature and review the effect of strategies to minimize them. Conclusions Our review confirmed that interobserver variability in target volume contouring represents the largest uncertainty in the process for most tumour sites, potentially resulting in a systematic error in dose delivery, which could influence local control in individual patients. For most tumour sites the optimal combination of imaging modalities for target delineation still needs to be determined. Strict use of delineation guidelines and protocols is advisable both in everyday clinical practice and in clinical studies to diminish interobserver variability. Continuing medical education of radiation oncologists cannot be overemphasized, and intensive formal training on interpretation of sectional imaging should be included in the program for radiation oncology residents.
|
Journal Article |
9 |
90 |
6
|
Kweldam CF, Nieboer D, Algaba F, Amin MB, Berney DM, Billis A, Bostwick DG, Bubendorf L, Cheng L, Compérat E, Delahunt B, Egevad L, Evans AJ, Hansel DE, Humphrey PA, Kristiansen G, van der Kwast TH, Magi-Galluzzi C, Montironi R, Netto GJ, Samaratunga H, Srigley JR, Tan PH, Varma M, Zhou M, van Leenders GJLH. Gleason grade 4 prostate adenocarcinoma patterns: an interobserver agreement study among genitourinary pathologists. Histopathology 2016; 69:441-9. [PMID: 27028587 DOI: 10.1111/his.12976] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Received: 12/04/2015] [Accepted: 03/27/2016] [Indexed: 01/02/2023]
Abstract
AIMS To assess the interobserver reproducibility of individual Gleason grade 4 growth patterns. METHODS AND RESULTS Twenty-three genitourinary pathologists participated in the evaluation of 60 selected high-magnification photographs. The selection included 10 cases of Gleason grade 3, 40 of Gleason grade 4 (10 per growth pattern), and 10 of Gleason grade 5. Participants were asked to select a single predominant Gleason grade per case (3, 4, or 5), and to indicate the predominant Gleason grade 4 growth pattern, if present. 'Consensus' was defined as at least 80% agreement, and 'favoured' as 60-80% agreement. Consensus on Gleason grading was reached in 47 of 60 (78%) cases, 35 of which were assigned to grade 4. In the 13 non-consensus cases, ill-formed (6/13, 46%) and fused (7/13, 54%) patterns were involved in the disagreement. Among the 20 cases where at least one pathologist assigned the ill-formed growth pattern, none (0%, 0/20) reached consensus. Consensus for fused, cribriform and glomeruloid glands was reached in 2%, 23% and 38% of cases, respectively. In nine of 35 (26%) consensus Gleason grade 4 cases, participants disagreed on the growth pattern. Six of these were characterized by large epithelial proliferations with delicate intervening fibrovascular cores, which were alternatively given the designation fused or cribriform growth pattern ('complex fused'). CONCLUSIONS Consensus on Gleason grade 4 growth pattern was predominantly reached on cribriform and glomeruloid patterns, but rarely on ill-formed and fused glands. The complex fused glands seem to constitute a borderline pattern of unknown prognostic significance on which a consensus could not be reached.
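The abstract above defines 'consensus' as at least 80% agreement and 'favoured' as 60-80% agreement. A minimal sketch of that rule follows. How a share of exactly 80% is handled is an assumption here (it is counted as consensus), and the votes and the function name `classify_agreement` are hypothetical.

```python
from collections import Counter

def classify_agreement(votes, consensus=0.80, favoured=0.60):
    """Return (most common label, its share of votes, agreement level)."""
    if not votes:
        raise ValueError("at least one vote is required")
    label, count = Counter(votes).most_common(1)[0]
    share = count / len(votes)
    if share >= consensus:      # assumption: exactly 80% counts as consensus
        level = "consensus"
    elif share >= favoured:
        level = "favoured"
    else:
        level = "none"
    return label, share, level

# Hypothetical votes from 23 pathologists on three photographs.
case_a = classify_agreement([4] * 19 + [3] * 4)            # 19/23, about 83%
case_b = classify_agreement([4] * 15 + [3] * 8)            # 15/23, about 65%
case_c = classify_agreement([4] * 12 + [3] * 6 + [5] * 5)  # 12/23, about 52%
```

With 23 participants, at least 19 matching votes are needed to reach the 80% threshold, since 18 of 23 is only about 78%.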
|
Journal Article |
9 |
81 |
7
|
van Ooij P, Powell AL, Potters WV, Carr JC, Markl M, Barker AJ. Reproducibility and interobserver variability of systolic blood flow velocity and 3D wall shear stress derived from 4D flow MRI in the healthy aorta. J Magn Reson Imaging 2015; 43:236-48. [PMID: 26140480 DOI: 10.1002/jmri.24959] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Received: 03/25/2015] [Accepted: 05/13/2015] [Indexed: 11/08/2022] Open
Abstract
PURPOSE To investigate the reproducibility and interobserver variability of 3D aortic velocity vector fields and wall shear stress (WSS) averaged over five systolic timeframes derived from noncontrast 4D flow magnetic resonance imaging (MRI). MATERIALS AND METHODS Fourteen controls underwent test-retest 4D flow MRI examinations separated by 16 ± 3 days (resolution = 3.0-3.6 × 2.3-2.6 × 2.5-2.7 mm³; TE/TR/FA = 2.5/4.9 msec/7°; Venc = 150 cm/s). Two observers segmented the aorta, and WSS was calculated for both series of scans and both segmentations. Test-retest and interobserver velocity and WSS vectors were compared on a voxel-by-voxel basis in the aorta and on a regional basis by subdividing the aortas in six segments. RESULTS Test-retest: voxel-by-voxel Bland-Altman analysis revealed small differences (-0.03/-0.02 m/s/Pa), limits of agreement (LOA) of 0.25 m/s/0.29 Pa, and coefficients of variation (CV) of 20% for velocity/WSS. Voxel-by-voxel orthogonal regression analysis showed moderate agreement (slope: 1.14/1.16, intraclass correlation coefficient [ICC]: 0.76/0.67 for velocity/WSS). The regional analysis revealed a CV of 9%/8% and ICC of 0.9/0.9 for velocity/WSS. Interobserver: voxel-by-voxel difference for WSS was 0, LOA: 0.17/0.19 Pa, CV: 12/13%, slope: 1.01/1.09, ICC: 0.87/0.85 for test/retest. The CV/ICC for WSS in the regional analysis was 4%/1.0 for test and 3%/1.0 for retest. CONCLUSION Systolic velocity and WSS derived from 4D flow MRI are reproducible between consecutive visits, with low interobserver variability in healthy volunteers.
|
Research Support, Non-U.S. Gov't |
10 |
79 |
8
|
Leung SCY, Nielsen TO, Zabaglo LA, Arun I, Badve SS, Bane AL, Bartlett JMS, Borgquist S, Chang MC, Dodson A, Ehinger A, Fineberg S, Focke CM, Gao D, Gown AM, Gutierrez C, Hugh JC, Kos Z, Laenkholm AV, Mastropasqua MG, Moriya T, Nofech-Mozes S, Osborne CK, Penault-Llorca FM, Piper T, Sakatani T, Salgado R, Starczynski J, Sugie T, van der Vegt B, Viale G, Hayes DF, McShane LM, Dowsett M. Analytical validation of a standardised scoring protocol for Ki67 immunohistochemistry on breast cancer excision whole sections: an international multicentre collaboration. Histopathology 2019; 75:225-235. [PMID: 31017314 DOI: 10.1111/his.13880] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Received: 12/18/2018] [Accepted: 04/19/2019] [Indexed: 01/12/2023]
Abstract
AIMS The nuclear proliferation marker Ki67 assayed by immunohistochemistry has multiple potential uses in breast cancer, but an unacceptable level of interlaboratory variability has hampered its clinical utility. The International Ki67 in Breast Cancer Working Group has undertaken a systematic programme to determine whether Ki67 measurement can be analytically validated and standardised among laboratories. This study addresses whether acceptable scoring reproducibility can be achieved on excision whole sections. METHODS AND RESULTS Adjacent sections from 30 primary ER+ breast cancers were centrally stained for Ki67 and sections were circulated among 23 pathologists in 12 countries. All pathologists scored Ki67 by two methods: (i) global: four fields of 100 tumour cells each were selected to reflect observed heterogeneity in nuclear staining; (ii) hot-spot: the field with highest apparent Ki67 index was selected and up to 500 cells scored. The intraclass correlation coefficient (ICC) for the global method (0.87; 95% confidence interval [CI] = 0.799-0.93) marginally met the prespecified success criterion (lower 95% CI ≥ 0.8), while the ICC for the hot-spot method (0.83; 95% CI = 0.74-0.90) did not. Visually, interobserver concordance in location of selected hot-spots varies between cases. The median times for scoring were 9 and 6 min for global and hot-spot methods, respectively. CONCLUSIONS The global scoring method demonstrates adequate reproducibility to warrant next steps towards evaluation for technical and clinical validity in appropriate cohorts of cases. The time taken for scoring by either method is practical using counting software we are making publicly available. Establishment of external quality assessment schemes is likely to improve the reproducibility between laboratories further.
|
Validation Study |
6 |
71 |
9
|
Kini AS, Vengrenyuk Y, Yoshimura T, Matsumura M, Pena J, Baber U, Moreno P, Mehran R, Maehara A, Sharma S, Narula J. Fibrous Cap Thickness by Optical Coherence Tomography In Vivo. J Am Coll Cardiol 2016; 69:644-657. [PMID: 27989887 DOI: 10.1016/j.jacc.2016.10.028] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 10/11/2016] [Revised: 10/24/2016] [Accepted: 10/24/2016] [Indexed: 11/15/2022]
Abstract
BACKGROUND Optical coherence tomography (OCT) imaging is considered to be the only imaging modality with sufficient resolution to measure fibrous cap thickness (FCT) in vivo. However, reproducibility of the measurements in vivo has been unsatisfactory. OBJECTIVES The authors aimed to investigate whether satisfactory reproducibility of FCT measurements by OCT in vivo can be achieved between independent observers. METHODS One hundred seventy OCT pullbacks were analyzed by 2 independent observers with intravascular imaging expertise in accordance with current guidelines to assess the interobserver variability of FCT measurement by intraclass correlation coefficient (ICC). The main sources of the variability were analyzed and incorporated in lesion assessment criteria. The same 170 OCT pullbacks were reanalyzed by the same observers using the developed criteria, and the interobserver reproducibility of the measurements was reassessed. On the basis of the developed criteria, a third independent observer interpreted all 170 OCT images. Assessment of the maximal lipid arc was also undertaken similarly before and after the development of interpretation criteria. RESULTS The original ICC of the FC thickness was 0.56 (95% confidence interval [CI]: 0.38 to 0.69). The poor definition of necrotic core facing border of FC and the neointimal presence of macrophages and calcification contributed to the high interobserver variability of FCT measurement. The ICC of FCT measurements by OCT in vivo was 0.88 (95% CI: 0.80 to 0.93) after we developed lesion assessment criteria. The ICC for the maximal lipid arc assessment before and after was 0.76 and 0.82 respectively. The third independent observer was extensively coached and returned the ICC of 0.82 (95% CI: 0.74 to 0.87) with observer 1 and 0.90 (95% CI: 0.86 to 0.94) with observer 2. 
CONCLUSIONS Careful consideration of OCT features mimicking fibroatheroma lesions and imaging artifacts contributed to significantly higher levels of interobserver agreement. Interobserver variation can be partially resolved by development of standard interpretation algorithms.
|
Journal Article |
9 |
56 |
10
|
Girometti R, Giannarini G, Greco F, Isola M, Cereser L, Como G, Sioletic S, Pizzolitto S, Crestani A, Ficarra V, Zuiani C. Interreader agreement of PI-RADS v. 2 in assessing prostate cancer with multiparametric MRI: A study using whole-mount histology as the standard of reference. J Magn Reson Imaging 2018; 49:546-555. [PMID: 30187600 DOI: 10.1002/jmri.26220] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 03/19/2018] [Accepted: 05/23/2018] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Most studies assessing interreader agreement of Prostate Imaging Reporting and Data System v. 2 (PI-RADS v2) have used biopsy as the standard of reference, thus carrying the risk of not definitively noting all existent cancers. PURPOSE To evaluate the interreader agreement in assessing prostate cancer (PCa) of PI-RADS v2, using whole-mount histology as the standard of reference. STUDY TYPE Monocentric prospective cohort study. POPULATION In all, 48 patients with biopsy-proven PCa referred for radical prostatectomy, undergoing staging multiparametric magnetic resonance imaging (mpMRI) between May 2016 and February 2017. FIELD STRENGTH/SEQUENCE 3.0T system using high-resolution T2-weighted imaging, diffusion-weighted imaging (echo-planar imaging with maximum b-value 2000 sec/mm²), and dynamic contrast-enhanced imaging (T1-weighted high resolution isotropic volume examination; THRIVE). ASSESSMENT Three radiologists blinded to final histology (2-8 years of experience) analyzed mpMRI images independently, scoring imaging findings in accordance with PI-RADS v2. On a per-lesion basis, we calculated overall and pairwise interreader agreement in assigning PI-RADS categories, as well as assessing malignancy with categories ≥3 or ≥4, and stage ≥pT3. STATISTICAL TESTS Cohen's kappa analysis of agreement. RESULTS On 71 lesions found on histology, there was moderate agreement in assigning PI-RADS categories to all cancers (k = 0.53) and clinically significant cancers (csPCa) (k = 0.47). Assessing csPCa with PI-RADS ≥4 cutoff provided higher agreement than PI-RADS ≥3 cutoff (k = 0.63 vs. 0.57). Interreader agreement was higher between more experienced readers, with the most experienced one achieving the highest cancer detection rate (0.73 for csPCa using category ≥4). There was substantial agreement in assessing stage ≥pT3 (k = 0.72). 
DATA CONCLUSION We found moderate to substantial agreement in assigning the PI-RADS v2 categories and assessing the spectrum of cancers found on whole-mount histology, with category 4 as the most reproducible cutoff for csPCa. Readers' experience influenced interreader agreement and cancer detection rate. LEVEL OF EVIDENCE 2 Technical Efficacy: Stage 2 J. Magn. Reson. Imaging 2019;49:546-555.
|
Journal Article |
7 |
50 |
11
|
Lee HJ, Yoon DY, Seo YL, Kim JH, Baek S, Lim KJ, Cho YK, Yun EJ. Intraobserver and Interobserver Variability in Ultrasound Measurements of Thyroid Nodules. J Ultrasound Med 2018; 37:173-178. [PMID: 28736947 DOI: 10.1002/jum.14316] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 01/31/2017] [Revised: 04/01/2017] [Accepted: 04/03/2017] [Indexed: 06/07/2023]
Abstract
OBJECTIVES The purpose of this study was to assess the intraobserver and interobserver variability in ultrasound (US) measurements of thyroid nodules. METHODS We performed a prospective study of the US examinations of 73 patients with 122 thyroid nodules greater than 5 mm in size. Ultrasound measurements in 4 dimensions (anteroposterior, transverse, longitudinal, and maximum diameters) and measurement of the estimated volume (using the ellipsoid formula) of each thyroid nodule were performed twice by 2 independent radiologists (A and B, with 10 years and 6 months of experience, respectively). The intraobserver and interobserver variability in measurements of thyroid nodules was assessed by a Bland-Altman analysis of agreement. The absolute values for intraobserver and interobserver variability were compared by a paired t test. RESULTS The 95% intraobserver and interobserver limits of agreement for the anteroposterior, transverse, longitudinal, and maximum diameters and estimated volume of thyroid nodules were ±18.2%, ±14.3%, and ±21.0%; ±17.2%, ±17.3%, and ±18.2%; ±14.6%, ±15.5%, and ±22.3%; ±13.8%, ±15.5%, and ±19.6%; and ±30.2%, ±27.7%, and ±44.1%, respectively. The absolute values for intraobserver variability were lower than those for interobserver variability for all measurements. CONCLUSIONS There was considerable intraobserver and interobserver variability in US measurement of thyroid nodules, which must be taken into account during follow-up US examinations of patients with thyroid nodules.
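The abstract above relies on two calculations: the ellipsoid formula for nodule volume and Bland-Altman 95% limits of agreement expressed in percent. The sketch below illustrates both under stated assumptions: the volume uses the usual π/6 × d1 × d2 × d3 form, and each paired difference is expressed as a percentage of the pair's mean, which may differ from the exact procedure used in the study. The measurements are hypothetical.

```python
import math
from statistics import mean, stdev

def ellipsoid_volume(d1, d2, d3):
    """Volume of an ellipsoid from three orthogonal diameters."""
    return math.pi / 6.0 * d1 * d2 * d3

def bland_altman_percent(first, second):
    """Bias and 95% limits of agreement, with each paired difference
    expressed as a percentage of the pair's mean (an assumption here)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    pct_diffs = [100.0 * (a - b) / ((a + b) / 2.0) for a, b in zip(first, second)]
    bias = mean(pct_diffs)
    half_width = 1.96 * stdev(pct_diffs)  # 1.96 sample standard deviations
    return bias, bias - half_width, bias + half_width

# Hypothetical maximum diameters (mm) measured by two observers on five nodules.
observer_a = [10.0, 12.0, 8.0, 15.0, 20.0]
observer_b = [11.0, 11.5, 8.5, 14.0, 21.0]
bias, lower, upper = bland_altman_percent(observer_a, observer_b)
print(f"bias {bias:.1f}%, limits of agreement {lower:.1f}% to {upper:.1f}%")
```

Because the volume multiplies three diameters, relative errors in each diameter compound, which is consistent with the wider limits of agreement reported for volume than for any single diameter.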
|
|
7 |
45 |
12
|
Ma C, Liu L, Li J, Wang L, Chen LG, Zhang Y, Chen SY, Lu JP. Apparent diffusion coefficient (ADC) measurements in pancreatic adenocarcinoma: A preliminary study of the effect of region of interest on ADC values and interobserver variability. J Magn Reson Imaging 2015; 43:407-13. [PMID: 26182908 DOI: 10.1002/jmri.25007] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 01/19/2015] [Revised: 06/30/2015] [Accepted: 06/30/2015] [Indexed: 12/13/2022] Open
Abstract
PURPOSE To assess the influence of region of interest (ROI) on tumor apparent diffusion coefficient (ADC) measurements and interobserver variability in pancreatic ductal adenocarcinoma (PDAC). MATERIALS AND METHODS Twenty-two patients recruited with pathology-proven PDAC underwent diffusion-weighted imaging (DWI, 3.0T) prior to the surgical resection. Two independent readers measured tumor ADCs according to three ROI methods: whole-volume, single-slice, and small solid sample of tumor. Minimum and mean ADCs were obtained. The interobserver variability for each of the three methods was analyzed using the intraclass correlation coefficient (ICC) and Bland-Altman analysis. The minimum and mean ADCs among the ROI methods were compared using nonparametric tests. RESULTS The single-slice ROI method showed the best reproducibility in the minimum ADC measurements (mean difference ± limits of agreement between two readers were 0.025 ± 0.25 × 10⁻³ mm²/s; ICC, 0.92) among the three ROI methods. For the solid tumor sample ROI, reproducibility of both the minimum and mean ADC measurements was the worst, with limits of agreement up to ±0.50 × 10⁻³ mm²/s and ±0.32 × 10⁻³ mm²/s, respectively (ICCs, 0.41/0.58). Both the minimum and mean ADCs demonstrated significant differences among the three ROI methods (both P < 0.001). The post-hoc analyses showed no significant difference with regard to the mean ADCs between whole-volume and single-slice ROI methods (P = 0.14). CONCLUSION The ROI method had a considerable influence on both the minimum and mean ADC values and the interobserver variability in PDAC. The worst interobserver variability was observed for both the minimum and mean ADCs derived from small solid-sample ROI.
|
Research Support, Non-U.S. Gov't |
10 |
42 |
13
|
Solass W, Sempoux C, Carr NJ, Bibeau F, Neureiter D, Jäger T, Di Caterino T, Brunel C, Klieser E, Fristrup CW, Mortensen MB, Detlefsen S. Reproducibility of the peritoneal regression grading score for assessment of response to therapy in peritoneal metastasis. Histopathology 2019; 74:1014-1024. [PMID: 30687944 DOI: 10.1111/his.13829] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 11/25/2018] [Accepted: 01/23/2019] [Indexed: 12/11/2022]
Abstract
AIMS The four-tiered peritoneal regression grading score (PRGS) assesses the response to chemotherapy in peritoneal metastasis (PM). The PRGS is used, for example, to assess the response to pressurised intraperitoneal aerosol chemotherapy (PIPAC). However, the reproducibility of the PRGS is currently unknown. We aimed to evaluate the inter- and intraobserver variability of the PRGS. METHODS AND RESULTS Thirty-three patients who underwent at least three PIPAC treatments as part of the PIPAC-OPC1 or PIPAC-OPC2 clinical trials at Odense University Hospital, Denmark, were included. Prior to each therapy cycle, peritoneal quadrant biopsies were obtained and three haematoxylin and eosin (H&E)-stained step sections were scanned and uploaded to a pseudonymised web library. For determining interobserver variability, eight pathologists assessed the PRGS for each quadrant biopsy, and Krippendorff's alpha and intraclass correlation coefficients (ICCs) were calculated. For determining intraobserver variability, three pathologists repeated their own assessments and Cohen's kappa and ICCs were calculated. A total of 331 peritoneal biopsies were analysed. Interobserver variability for PRGS of each biopsy and for the mean and maximum PRGS per biopsy set was moderate to good/substantial. The intraobserver variability for PRGS of each biopsy and for the mean and maximum PRGS per biopsy set was good to excellent/almost perfect. CONCLUSIONS Our data support the PRGS as a reproducible and useful tool to assess response to intraperitoneal chemotherapy in PM. Future studies should evaluate the prognostic and predictive role of the PRGS.
|
Observational Study |
6 |
41 |
14
|
den Hartogh MD, Philippens MEP, van Dam IE, Kleynen CE, Tersteeg RJHA, Pijnappel RM, Kotte ANTJ, Verkooijen HM, van den Bosch MAAJ, van Vulpen M, van Asselen B, van den Bongard HJGD. MRI and CT imaging for preoperative target volume delineation in breast-conserving therapy. Radiat Oncol 2014; 9:63. [PMID: 24571783 PMCID: PMC3942765 DOI: 10.1186/1748-717x-9-63] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 09/06/2013] [Accepted: 02/14/2014] [Indexed: 11/27/2022] Open
Abstract
BACKGROUND Accurate tumor bed delineation after breast-conserving surgery is important. However, consistency among observers on standard postoperative radiotherapy planning CT is low and volumes can be large due to seroma formation. A preoperative delineation of the tumor might be more consistent. Therefore, the purpose of this study was to determine the consistency of preoperative target volume delineation on CT and MRI for breast-conserving radiotherapy. METHODS Tumors were delineated on preoperative contrast-enhanced (CE) CT and newly developed 3D CE-MR images, by four breast radiation oncologists. Clinical target volumes (CTVs) were created by addition of a 1.5 cm margin around the tumor, excluding skin and chest wall. Consistency in target volume delineation was expressed by the interobserver variability. Therefore, the conformity index (CI), center of mass distance (dCOM) and volumes were calculated. Tumor characteristics on CT and MRI were scored by an experienced breast radiologist. RESULTS Preoperative tumor delineation resulted in a high interobserver agreement with a high median CI for the CTV, for both CT (0.80) and MRI (0.84). The tumor was missed on CT in 2/14 patients (14%). Leaving these 2 patients out of the analysis, CI was higher on MRI compared to CT for the GTV (p<0.001) while not for the CTV (CT (0.82) versus MRI (0.84), p=0.123). The dCOM did not differ between CT and MRI. The median CTV was 48 cm³ (range 28-137 cm³) on CT and 59 cm³ (range 30-153 cm³) on MRI (p<0.001). Tumor shapes and margins were rated as more irregular and spiculated on CE-MRI. CONCLUSIONS This study showed that preoperative target volume delineation resulted in small target volumes with a high consistency among observers. MRI appeared to be necessary for tumor detection and the visualization of irregularities and spiculations. Regarding the tumor delineation itself, no clinically relevant differences in interobserver variability were observed. These results will be used to study the potential for future MRI-guided and neoadjuvant radiotherapy. TRIAL REGISTRATION International Clinical Trials Registry Platform NTR3198.
|
Clinical Trial |
11 |
41 |
15
|
Krajewski KM, Nishino M, Franchetti Y, Ramaiya NH, Van den Abbeele AD, Choueiri TK. Intraobserver and interobserver variability in computed tomography size and attenuation measurements in patients with renal cell carcinoma receiving antiangiogenic therapy: implications for alternative response criteria. Cancer 2013; 120:711-21. [PMID: 24264883 DOI: 10.1002/cncr.28493] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Received: 08/22/2013] [Revised: 10/23/2013] [Accepted: 10/28/2013] [Indexed: 01/31/2023]
Abstract
BACKGROUND Alternative response criteria have been proposed in patients with metastatic renal cell carcinoma (mRCC) who are receiving vascular endothelial growth factor (VEGF)-targeted therapy, including 10% tumor shrinkage as an indicator of response/outcome. However, to the authors' knowledge, intraobserver and interobserver measurement variability have not been defined in this setting. The objective of the current study was to determine intraobserver and interobserver agreement of computed tomography (CT) size and attenuation measurements to establish reproducible response indicators. METHODS Seventy-one patients with mRCC with 179 target lesions were enrolled in phase 2 and phase 3 trials of VEGF-targeted therapies and retrospectively studied with Institutional Review Board approval. Two radiologists independently measured the long axis diameter and mean attenuation of target lesions at baseline and on follow-up CT. Concordance correlation coefficients and Bland-Altman plots were used to assess intraobserver and interobserver agreement. RESULTS High concordance correlation coefficients (range, 0.8602-0.9984) were observed in all types of measurements. The 95% limits of agreement for the percentage change of the sum longest diameter were -7.30% to 7.86% for intraobserver variability, indicating that 10% tumor shrinkage represents a true change in tumor size when measured by a single observer. The 95% limits of interobserver variability were -16.3% to 15.4%. On multivariate analysis, the location of the lesion was found to significantly contribute to interobserver variability (P = .048). The 95% limits of intraobserver agreement for the percentage change in CT attenuation were -18.34% to 16.7%. CONCLUSIONS In patients with mRCC who are treated with VEGF inhibitors, 10% tumor shrinkage is a reproducible radiologic response indicator when baseline and follow-up studies are measured by a single radiologist. Lesion location contributes significantly to measurement variability and should be considered when selecting target lesions.
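Bland-Altman 95% limits of agreement, as named in the abstract above, are conventionally computed as the mean paired difference plus or minus 1.96 sample standard deviations. The following is a minimal Python sketch of that convention, not the study's own analysis code; the function name `limits_of_agreement` and the sample readings are hypothetical and are not data from the paper.

```python
from statistics import mean, stdev


def limits_of_agreement(first_read, second_read, z=1.96):
    """Return (bias, lower, upper) for two paired series of measurements.

    bias is the mean paired difference; lower/upper are bias -/+ z * SD,
    where SD is the sample standard deviation of the differences.
    """
    if len(first_read) != len(second_read) or len(first_read) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(first_read, second_read)]
    bias = mean(diffs)
    spread = z * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical percentage changes in summed diameter, read twice by one observer.
read_1 = [-12.0, -5.0, 3.0, -20.0, 8.0]
read_2 = [-10.0, -7.0, 1.0, -17.0, 9.0]
bias, lower, upper = limits_of_agreement(read_1, read_2)
print(f"bias={bias:.2f}%, 95% limits=({lower:.2f}%, {upper:.2f}%)")
```

Under this reading, a measured change counts as exceeding measurement noise only when it falls outside the limits, which is the reasoning behind the 10% threshold discussed in the abstract.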
|
Research Support, Non-U.S. Gov't |
12 |
38 |
16
|
Ordi J, Bergeron C, Hardisson D, McCluggage WG, Hollema H, Felix A, Soslow RA, Oliva E, Tavassoli FA, Alvarado-Cabrero I, Wells M, Nogales FF. Reproducibility of current classifications of endometrial endometrioid glandular proliferations: further evidence supporting a simplified classification. Histopathology 2013; 64:284-92. [PMID: 24111732 DOI: 10.1111/his.12249] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 07/03/2013] [Accepted: 08/07/2013] [Indexed: 11/29/2022]
Abstract
AIMS To compare the reproducibility of the current (2003) World Health Organization (WHO), endometrial intraepithelial neoplasia (EIN) and European Working Group (EWG) classifications of endometrial endometrioid proliferations. METHODS AND RESULTS Nine expert gynaecological pathologists from Europe and North America reviewed 198 endometrial biopsy/curettage specimens originally diagnosed as low-grade lesions. All observers were asked to classify the cases by using the categories described in each scheme: six for WHO, four for EIN, and three for EWG. The results were evaluated by kappa statistics for more than two observations. The analysis was repeated using only two major categories (benign versus atypical/carcinoma). Both the WHO and EIN classifications showed poor interobserver agreement (κ = 0.337 and κ = 0.419, respectively), whereas the EWG classification showed moderate agreement (κ = 0.530). Full agreement between pathologists occurred in only 28% for the WHO classification, 39% for the EIN classification, and 59% for the EWG classification. With only two diagnostic categories, kappa values increased in all classifications, but only the EWG classification reached a substantial level of agreement (κ = 0.621); similarly, full agreement among all pathologists increased to 70% for the WHO classification, 69% for the EIN classification, and 72% for the EWG classification. CONCLUSIONS A two-tier classification of endometrial endometrioid proliferative lesions improves reproducibility, and should be considered for the diagnosis of endometrial biopsy/curettage specimens.
|
Journal Article |
12 |
34 |
17
|
Tammaa A, Fritzer N, Lozano P, Krell A, Salzer H, Salama M, Hudelist G. Interobserver agreement and accuracy of non-invasive diagnosis of endometriosis by transvaginal sonography. Ultrasound Obstet Gynecol 2015; 46:737-740. [PMID: 25766661 DOI: 10.1002/uog.14843] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 08/26/2014] [Revised: 03/02/2015] [Accepted: 03/07/2015] [Indexed: 06/04/2023]
Abstract
OBJECTIVES To evaluate interobserver agreement and accuracy of transvaginal sonography (TVS) in diagnosing deep infiltrating endometriosis (DIE) and endometriomas. METHODS A total of 67 consecutive patients referred to a pelvic pain clinic and scheduled for laparoscopy were enrolled in the study between January 2013 and January 2014. Patients were independently examined prospectively by two experienced sonographers (Observers A and B) who were blinded to the other's results. For the two observers, Gwet's first-order agreement coefficient (Gwet's AC1) was used to calculate interobserver agreement and diagnostic accuracy, as well as sensitivity, specificity, positive (PPV) and negative (NPV) predictive values using TVS, as compared to laparoscopy, for diagnosing DIE and endometriomas. RESULTS Among the 67 patients enrolled, 65 were analyzed. For the diagnosis of DIE and endometriomas by TVS, the level of agreement (Gwet's AC1) between Observers A and B and sensitivity/specificity values for the respective Observers were, by site: vagina (Gwet's AC1, 0.933; 62%/94% and 82%/94%), bladder (Gwet's AC1, 1.00; 67%/97% and 67%/97%), uterosacral ligaments (Gwet's AC1, 0.84; 73%/83% and 53%/90%), adnexa (Gwet's AC1, 0.95; 71%/93% and 71%/93%), rectovaginal septum (Gwet's AC1, 0.95; 40%/90% and 33%/87%) and rectosigmoid (Gwet's AC1, 0.98; 93%/96% and 94%/98%) which reflected high interobserver agreement. With the exception of sensitivity of diagnosis of DIE affecting the RVS, similar results were observed when TVS was compared with laparoscopy. CONCLUSIONS TVS is a highly accurate and reproducible method for non-invasive diagnosis of DIE by well-trained professionals.
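Gwet's AC1, the agreement coefficient used above, replaces the chance term of Cohen's kappa with one based on the average category prevalence across both raters. A minimal two-rater sketch in Python follows, assuming the standard textbook definition; the function name `gwet_ac1` and the sample calls are hypothetical and do not come from the study.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b):
    """Gwet's first-order agreement coefficient for two raters.

    chance = sum_k pi_k * (1 - pi_k) / (q - 1), where pi_k is the share of
    category k averaged over both raters and q is the number of categories.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equally long rating lists")
    n = len(rater_a)
    categories = sorted(set(rater_a) | set(rater_b))
    q = len(categories)
    if q < 2:
        return 1.0  # both raters used one and the same category throughout
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    prevalence = [(count_a[c] + count_b[c]) / (2 * n) for c in categories]
    chance = sum(p * (1 - p) for p in prevalence) / (q - 1)
    return (observed - chance) / (1 - chance)


# Hypothetical calls for a rare finding (1 = present, 0 = absent).
observer_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
observer_b = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(f"AC1 = {gwet_ac1(observer_a, observer_b):.3f}")
```

In this hypothetical example the finding is rare, so Cohen's kappa would be slightly negative despite 80% raw agreement, whereas AC1 stays high; that prevalence behaviour is the usual motivation for choosing AC1.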
|
Evaluation Study |
10 |
31 |
18
|
Bowman ZS, Eller AG, Kennedy AM, Richards DS, Winter TC, Woodward PJ, Silver RM. Interobserver variability of sonography for prediction of placenta accreta. J Ultrasound Med 2014; 33:2153-2158. [PMID: 25425372 DOI: 10.7863/ultra.33.12.2153] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 05/28/2023]
Abstract
OBJECTIVES The sensitivity of sonography to predict accreta has been reported as higher than 90%. However, most studies are from single expert investigators. Our objective was to analyze interobserver variability of sonography for prediction of placenta accreta. METHODS Patients with previa with and without accreta were ascertained, and images with placental views were collected, deidentified, and placed in random sequence. Three radiologists and 3 maternal-fetal medicine specialists interpreted each study for the presence of accreta and specific findings reported to be associated with its diagnosis. Investigator-specific sensitivity, specificity, and accuracy were calculated. κ statistics were used to assess variability between individuals and types of investigators. RESULTS A total of 229 sonographic studies from 55 patients with accreta and 56 control patients were examined. Accuracy ranged from 55.9% to 76.4%. Of imaging studies yielding diagnoses, sensitivity ranged from 53.4% to 74.4%, and specificity ranged from 70.8% to 94.8%. Overall interobserver agreement was moderate (mean κ ± SD = 0.47 ± 0.12). κ values between pairs of investigators ranged from 0.32 (fair agreement) to 0.73 (substantial agreement). Average individual agreement ranged from fair (κ = 0.35) to moderate (κ = 0.53). CONCLUSIONS Blinded from clinical data, sonography has significant interobserver variability for the diagnosis of placenta accreta.
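The pairwise κ values above compare observed agreement with the agreement expected from each reader's marginal frequencies. A minimal Python sketch of Cohen's kappa for one reader pair is given below. The verbal bands are assumed to follow the Landis and Koch convention, which matches the labels used in the abstract (0.32 fair, 0.73 substantial); the function names and the sample calls are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equally long rating lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of each rater's marginal shares.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when no variation exists
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Map kappa to the assumed Landis and Koch verbal bands."""
    k = round(kappa, 2)  # avoid floating-point noise at band edges
    if k < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if k <= upper:
            return label
    return "almost perfect"


# Hypothetical accreta (1) / no accreta (0) calls by two readers.
reader_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
reader_2 = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
kappa = cohen_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.2f} ({landis_koch_label(kappa)})")
```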
|
Controlled Clinical Trial |
11 |
30 |
19
|
Finazzo F, D'antonio F, Masselli G, Forlani F, Palacios-Jaraquemada J, Minneci G, Gambarini S, Timor-Tritsch I, Prefumo F, Buca D, Liberati M, Khalil A, Cali G. Interobserver agreement in MRI assessment of severity of placenta accreta spectrum disorders. Ultrasound Obstet Gynecol 2020; 55:467-473. [PMID: 31237043 DOI: 10.1002/uog.20381] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 01/27/2019] [Revised: 06/03/2019] [Accepted: 06/12/2019] [Indexed: 06/09/2023]
Abstract
OBJECTIVE To evaluate the level of agreement in the prenatal magnetic resonance imaging (MRI) assessment of the presence and severity of placenta accreta spectrum (PAS) disorders between examiners with expertise in the diagnosis and management of these conditions. METHODS This was a secondary analysis of a prospective study including women with placenta previa or low-lying placenta and at least one prior Cesarean delivery or uterine surgery, who underwent MRI assessment at a regional referral center for PAS disorders in Italy, between 2007 and 2017. The MRI scans were retrieved from the hospital electronic database and assessed by four examiners, who are considered to be experts in the diagnosis and surgical management of PAS disorders. The examiners were blinded to the ultrasound diagnosis, histopathological findings and clinical data of the patients. Each examiner was asked to assess 20 features on the MRI scans, including the presence, depth and topography of placental invasion. Depth of invasion was defined as the degree of adhesion and invasion of the placenta into the myometrium and uterine serosa (placenta accreta, increta or percreta) and the histopathological examination of the removed uterus was considered the reference standard. Topography of the placental invasion was defined as the site of placental invasion within the uterus in relation to the posterior bladder wall (posterior upper bladder wall and uterine body, posterior lower bladder wall and lower uterine segment and cervix or no visible bladder invasion) and the site of invasion at surgery was considered the reference standard. The degree of interrater agreement (IRA) was evaluated by calculating both the percentage of observed agreement among raters and the Fleiss kappa (κ) value. RESULTS Forty-six women were included in the study. The median gestational age at MRI was 33.8 (interquartile range, 33.1-34.0) weeks. 
A final diagnosis of placenta accreta, increta and percreta was made in 15.2%, 17.4% and 50.0% patients, respectively. There was excellent agreement between the four examiners in the assessment of the overall presence of a PAS disorder (IRA, 92.1% (95% CI, 86.8-94.0%); κ, 0.90 (95% CI, 0.89-1.00)). However, there was significant heterogeneity in IRA when assessing the different MRI signs suggestive of a PAS disorder. There was excellent agreement between the examiners in the identification of the depth of placental invasion on MRI (IRA, 98.9% (95% CI, 96.8-100.0%); κ, 0.95 (95% CI, 0.89-1.00)). However, agreement in assessing the topography of placental invasion was only moderate (IRA, 72.8% (95% CI, 72.7-72.9%); κ, 0.56 (95% CI, 0.54-0.66)). More importantly, when assessing parametrial invasion, which is one of the most significant prognostic factors in women affected by PAS, the agreement was substantial and moderate in judging the presence of invasion in the coronal (IRA, 86.6% (95% CI, 86.5-86.7%); κ, 0.69 (95% CI, 0.59-0.71)) and axial (IRA, 78.6% (95% CI, 78.5-78.7%); κ, 0.56 (95% CI, 0.33-0.60)) planes, respectively. Likewise, interobserver agreement in judging the presence and the number of newly formed vessels in the parametrial tissue was moderate (IRA, 88.0% (95% CI, 88.0-88.1%); κ, 0.59 (95% CI, 0.45-0.68)) and fair (IRA, 66.7% (95% CI, 66.6-66.7%); κ, 0.22 (95% CI, 0.12-0.37)), respectively. CONCLUSIONS MRI has excellent interobserver agreement in detecting the presence and depth of placental invasion, while agreement between the examiners is lower when assessing the topography of invasion. The findings of this study highlight the need for a standardized MRI staging system for PAS disorders, in order to facilitate objective correlation between prenatal imaging, pregnancy outcome and surgical management of these patients. Copyright © 2019 ISUOG. Published by John Wiley & Sons Ltd.
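Fleiss' kappa, used above alongside the percentage of observed agreement, extends chance-corrected agreement to more than two raters. A minimal Python sketch under the standard definition follows; it returns the mean pairwise agreement as well, which is only an assumption about how an observed-agreement percentage might be computed and may differ from the study's definition. The function name `fleiss_kappa` and the sample counts are hypothetical.

```python
def fleiss_kappa(counts):
    """Return (observed_agreement, kappa) for a cases-by-categories table.

    counts[i][j] is the number of raters who assigned case i to category j;
    every case must be rated by the same number of raters.
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    # Share of agreeing rater pairs within each case.
    per_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_case) / n_cases
    # Overall share of ratings falling in each category.
    shares = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(len(counts[0]))
    ]
    chance = sum(p * p for p in shares)
    if chance == 1.0:
        return observed, float("nan")  # undefined when one category is used
    return observed, (observed - chance) / (1 - chance)


# Hypothetical depth calls (accreta, increta, percreta) by four examiners.
depth_counts = [
    [4, 0, 0],
    [0, 4, 0],
    [0, 0, 4],
    [0, 1, 3],
    [1, 3, 0],
]
agreement, kappa = fleiss_kappa(depth_counts)
print(f"observed agreement = {agreement:.1%}, Fleiss kappa = {kappa:.2f}")
```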
|
Evaluation Study |
5 |
28 |
20
|
Ghofrani M, Tapia B, Tavassoli FA. Discrepancies in the diagnosis of intraductal proliferative lesions of the breast and its management implications: results of a multinational survey. Virchows Arch 2006; 449:609-16. [PMID: 17058097 PMCID: PMC1888715 DOI: 10.1007/s00428-006-0245-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Received: 03/10/2006] [Accepted: 05/31/2006] [Indexed: 12/21/2022]
Abstract
To measure discrepancies in diagnoses and recommendations impacting management of proliferative lesions of the breast, a questionnaire of five problem scenarios was distributed among over 300 practicing pathologists. Of the 230 respondents, 56.5% considered a partial cribriform proliferation within a duct adjacent to unequivocal ductal carcinoma in situ (DCIS) as atypical ductal hyperplasia (ADH), 37.7% of whom recommended reexcision if it were at a resection margin. Of the 43.5% who diagnosed the partially involved duct as DCIS, 28.0% would not recommend reexcision if the lesion were at a margin. When only five ducts had a partial cribriform proliferation, 35.7% considered it as DCIS, while if ≥20 ducts were so involved, this figure rose to 60.4%. When one duct with a complete cribriform pattern measured 0.5, 1.5, or 4 mm, a diagnosis of DCIS was made by 22.6, 31.3, and 94.8%, respectively. When multiple ducts with flat epithelial atypia were at a margin, 20.9% recommended reexcision. Many of these discrepancies arise from the artificial separation of ADH and low-grade DCIS and emphasize the need for combining these two under the umbrella designation of ductal intraepithelial neoplasia grade 1 (DIN 1) to diminish the impact of different terminologies applied to biologically similar lesions.
|
research-article |
19 |
25 |
21
|
van Seijen M, Jóźwiak K, Pinder SE, Hall A, Krishnamurthy S, Thomas JSJ, Collins LC, Bijron J, Bart J, Cohen D, Ng W, Bouybayoune I, Stobart H, Hudecek J, Schaapveld M, Thompson A, Lips EH, Wesseling J. Variability in grading of ductal carcinoma in situ among an international group of pathologists. J Pathol Clin Res 2021; 7:233-242. [PMID: 33620141 PMCID: PMC8073001 DOI: 10.1002/cjp2.201] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 09/25/2020] [Revised: 12/11/2020] [Accepted: 01/08/2021] [Indexed: 01/04/2023]
Abstract
The prognostic value of cytonuclear grade in ductal carcinoma in situ (DCIS) is debated, partly due to high interobserver variability and the use of multiple guidelines. The aim of this study was to evaluate interobserver agreement in grading DCIS between Dutch, British, and American pathologists. Haematoxylin and eosin-stained slides of 425 women with primary DCIS were independently reviewed by nine breast pathologists based in the Netherlands, the UK, and the USA. Chance-corrected kappa (κma) for association between pathologists was calculated based on a generalised linear mixed model using the ordinal package in R. Overall κma for grade of DCIS (low, intermediate, or high) was estimated to be 0.50 (95% confidence interval [CI] 0.44-0.56), indicating a moderate association between pathologists. When the model was adjusted for national guidelines, the association for grade did not change (κma = 0.53; 95% CI 0.48-0.57); subgroup analysis for pathologists using the UK pathology guidelines only had significantly higher association (κma = 0.58; 95% CI 0.56-0.61). To assess if concordance of grading relates to the expression of the oestrogen receptor (ER) and HER2, archived immunohistochemistry was analysed on a subgroup (n = 106). This showed that non-high grade according to the majority opinion was associated with ER positivity and HER2 negativity (100 and 89% of non-high grade cases, respectively). In conclusion, DCIS grade showed only moderate association using whole slide images scored by nine breast pathologists. As therapeutic decisions and inclusion in ongoing clinical trials are guided by DCIS grade, there is a pressing need to reduce interobserver variability in grading. ER and HER2 might be supportive to prevent the accidental and unwanted inclusion of high-grade DCIS in such trials.
|
Multicenter Study |
4 |
25 |
22
|
Harris DL, Bloomfield FH, Teele RL, Harding JE. Variable interpretation of ultrasonograms may contribute to variation in the reported incidence of white matter damage between newborn intensive care units in New Zealand. Arch Dis Child Fetal Neonatal Ed 2006; 91:F11-6. [PMID: 16159954 PMCID: PMC2672639 DOI: 10.1136/adc.2005.079806] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/03/2022]
Abstract
BACKGROUND The incidence of cerebral white matter damage reported to the Australian and New Zealand Neonatal Network (ANZNN) varies between neonatal intensive care units (NICUs). HYPOTHESIS Differences in the capture, storage, and interpretation of the cerebral ultrasound scans could account for some of this variation. METHODS A total of 255 infants of birth weight <1500 g and gestation <32 weeks born between 1997 and 2002 and drawn equally from each of the six NICUs in New Zealand were randomly selected from the ANZNN database. Half had early cerebral ultrasound scans previously reported to ANZNN as normal, and half had scans reported as abnormal. The original scans were copied, anonymised, and independently read by a panel of three experts using a standardised method of reviewing and reporting. RESULTS There was considerable variation between NICUs in methods of image capture, quality, and completeness of the scans. There was only moderate agreement between the reviewers' reports and the original reports to the ANZNN (kappa 0.45-0.51) and between the reviewers (kappa 0.54-0.64). The reviewers reported three to six times more white matter damage than had been reported to the ANZNN. CONCLUSION Some of the reported variation in white matter damage between NICUs may be due to differences in capture and interpretation of cerebral ultrasound scans.
|
research-article |
19 |
23 |
23
|
Bracamonte E, Gibson BA, Klein R, Krupinski EA, Weinstein RS. Communicating Uncertainty in Surgical Pathology Reports: A Survey of Staff Physicians and Residents at an Academic Medical Center. Acad Pathol 2016; 3:2374289516659079. [PMID: 28725774 PMCID: PMC5497900 DOI: 10.1177/2374289516659079] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 05/04/2016] [Revised: 06/17/2016] [Accepted: 06/18/2016] [Indexed: 11/17/2022] Open
Abstract
In order to document perceptions of text comments appearing in surgical pathology reports, questionnaires were distributed to 4 groups of caregivers: university staff pathologists, resident pathologists, faculty clinicians (other than pathologists), and resident clinicians at a teaching hospital. Results of this pilot study showed a wide degree of variability existed within each group of surgical pathology report users, with respect to percent confidence assigned to various phrases, commonly used to express diagnostic uncertainty, appearing often as free-text comments in surgical pathology reports. The unavailability of immunohistochemistry tests, or ambiguous immunohistochemistry test results, was especially problematic. With respect to modes of communication between the surgical pathology laboratory and its service users, clinicians indicated they preferred to use tumor boards/interdisciplinary conferences, face-to-face meetings, and phone calls to clarify their interpretations of a pathologist’s diagnoses, as compared with simply reading free-text comments. On the other hand, surgical pathologists rely heavily on their use of the comment portion of a surgical pathology report to clarify, modify, or expand on the diagnoses they render. The majority of clinicians stated that they “always” read the free-text comment portion of a surgical pathology report, whereas some acknowledged they do not always read it. Pathology residents had significantly less confidence in the ability of a free-text comment on a surgical pathology report to clarify a diagnosis (χ² = 46.36, P < .0001). Pathology departments should consider standardizing definitions and weighting the words and phrases they use in their free-text comment sections of surgical pathology reports.
|
Journal Article |
9 |
20 |
24
|
Van Bockstal M, Baldewijns M, Colpaert C, Dano H, Floris G, Galant C, Lambein K, Peeters D, Van Renterghem S, Van Rompuy AS, Verbeke S, Verschuere S, Van Dorpe J. Dichotomous histopathological assessment of ductal carcinoma in situ of the breast results in substantial interobserver concordance. Histopathology 2018; 73:923-932. [PMID: 30168167 DOI: 10.1111/his.13741] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 05/15/2018] [Accepted: 08/20/2018] [Indexed: 12/28/2022]
Abstract
AIMS Robust prognostic markers for ductal carcinoma in situ (DCIS) of the breast require high reproducibility and thus low interobserver variability. The aim of this study was to compare interobserver variability among 13 pathologists, in order to enable the identification of robust histopathological characteristics. METHODS AND RESULTS One representative haematoxylin and eosin-stained slide was selected for 153 DCIS cases. All pathologists independently assessed nuclear grade, intraductal calcifications, necrosis, solid growth, stromal changes, stromal inflammation, and apocrine differentiation. All characteristics were assessed categorically. Krippendorff's alpha was calculated to assess overall interobserver concordance. Cohen's kappa was calculated for every observer duo to further explore interobserver variability. The highest concordance was observed for necrosis, calcifications, and stromal inflammation. Assessment of solid growth, nuclear grade and stromal changes resulted in lower concordance. Poor concordance was observed for apocrine differentiation. Kappa values for each observer duo identified the 'ideal' cut-off for dichotomisation of multicategory variables. For instance, concordance was higher for 'non-high versus high' nuclear grade than for 'low versus non-low' nuclear grade. 'Absent/mild' versus 'moderate/extensive' stromal inflammation resulted in substantially higher concordance than other dichotomous cut-offs. CONCLUSIONS Dichotomous assessment of the histopathological features of DCIS resulted in moderate to substantial agreement among pathologists. Future studies on prognostic markers in DCIS should take into account this degree of interobserver variability to define cut-offs for categorically assessed histopathological features, as reproducibility is paramount for robust prognostic markers in daily clinical practice. A new prognostic index for DCIS might be considered, based on two-tier grading of histopathological features. 
Future research should explore the prognostic potential of such two-tier assessment.
|
Journal Article |
7 |
20 |
25
|
Younes M, Hanly PJ. Minimizing Interrater Variability in Staging Sleep by Use of Computer-Derived Features. J Clin Sleep Med 2016; 12:1347-1356. [PMID: 27448418 DOI: 10.5664/jcsm.6186] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 04/18/2016] [Accepted: 06/06/2016] [Indexed: 01/16/2023]
Abstract
STUDY OBJECTIVES Inter-scorer variability in sleep staging of polysomnograms (PSGs) results primarily from difficulty in determining whether: (1) an electroencephalogram pattern of wakefulness spans > 15 sec in transitional epochs, (2) spindles or K complexes are present, and (3) duration of delta waves exceeds 6 sec in a 30-sec epoch. We hypothesized that providing digitally derived information about these variables to PSG scorers may reduce inter-scorer variability. METHODS Fifty-six PSGs were scored (five-stage) by two experienced technologists, (first manual, M1). Months later, the technologists edited their own scoring (second manual, M2). PSGs were then scored with an automatic system and the same two technologists and an additional experienced technologist edited them, epoch-by-epoch (Edited-Auto). This resulted in seven manual scores for each PSG. The two M2 scores were then independently modified using digitally obtained values for sleep depth and delta duration and digitally identified spindles and K complexes. RESULTS Percent agreement between scorers in M2 was 78.9 ± 9.0% before modification and 96.5 ± 2.6% after. Errors of this approach were defined as a change in a manual score to a stage that was not assigned by any scorer during the seven manual scoring sessions. Total errors averaged 7.1 ± 3.7% and 6.9 ± 3.8% of epochs for scorers 1 and 2, respectively, and there was excellent agreement between the modified score and the initial manual score of each technologist. CONCLUSIONS Providing digitally obtained information about sleep depth, delta duration, spindles and K complexes during manual scoring can greatly reduce interrater variability in sleep staging by eliminating the guesswork in scoring epochs with equivocal features.
|
Research Support, Non-U.S. Gov't |
9 |
16 |