1
Huff DT, Santoro-Fernandes V, Chen S, Chen M, Kashuk C, Weisman AJ, Jeraj R, Perk TG. Performance of an automated registration-based method for longitudinal lesion matching and comparison to inter-reader variability. Phys Med Biol 2023; 68:175031. [PMID: 37567220 PMCID: PMC10461173 DOI: 10.1088/1361-6560/acef8f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 07/25/2023] [Accepted: 08/11/2023] [Indexed: 08/13/2023]
Abstract
Objective. Patients with metastatic disease are followed throughout treatment with medical imaging, and accurately assessing changes of individual lesions is critical to properly inform clinical decisions. The goal of this work was to assess the performance of an automated lesion-matching algorithm in comparison to inter-reader variability (IRV) of matching lesions between scans of metastatic cancer patients. Approach. Forty pairs of longitudinal PET/CT and CT scans were collected and organized into four cohorts: lung cancers, head and neck cancers, lymphomas, and advanced cancers. Cases were also divided by cancer burden: low-burden (<10 lesions), intermediate-burden (10-29), and high-burden (30+). Two nuclear medicine physicians conducted independent reviews of each scan-pair and manually matched lesions. Matching differences between readers were assessed to quantify the IRV of lesion matching. The two readers met to form a consensus, which was considered a gold standard and compared against the output of an automated lesion-matching algorithm. IRV and performance of the automated method were quantified using precision, recall, F1-score, and the number of differences. Main results. The performance of the automated method did not differ significantly from IRV for any metric in any cohort (p > 0.05, Wilcoxon paired test). In high-burden cases, the F1-score (median [range]) was 0.89 [0.63, 1.00] between the automated method and reader consensus and 0.93 [0.72, 1.00] between readers. In low-burden cases, F1-scores were 1.00 [0.40, 1.00] and 1.00 [0.40, 1.00], for the automated method and IRV, respectively. Automated matching was significantly more efficient than either reader (p < 0.001). In high-burden cases, median matching time for the readers was 60 and 30 min, respectively, while automated matching took a median of 3.9 min. Significance. The automated lesion-matching algorithm was successful in performing lesion matching, meeting the benchmark of IRV. Automated lesion matching can significantly expedite and improve the consistency of longitudinal lesion-matching.
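The precision, recall, F1-score, and number-of-differences metrics named in this abstract can be illustrated with a minimal Python sketch. It assumes each lesion match is recorded as a (baseline lesion, follow-up lesion) pair and that the reader consensus serves as the reference set. The function name `match_scores`, the lesion labels, and the pair-wise scoring rule are illustrative assumptions, not the authors' implementation.

```python
def match_scores(predicted, reference):
    """Score one set of lesion matches against a reference set.

    Both arguments are iterables of (baseline_id, followup_id) pairs.
    This pair-wise definition is an assumption made for illustration.
    """
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)   # matches present in both sets
    fp = len(predicted - reference)   # matches only in the predicted set
    fn = len(reference - predicted)   # matches only in the reference set

    # Define the ratios as 0.0 when their denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "differences": fp + fn,       # total disagreeing matches
    }


# Hypothetical example: consensus has four matches, the algorithm proposes three.
consensus = {("B1", "F1"), ("B2", "F2"), ("B3", "F3"), ("B4", "F4")}
automated = {("B1", "F1"), ("B2", "F2"), ("B3", "F5")}
scores = match_scores(automated, consensus)
print(scores)
```

The same function could be applied to reader 1 versus reader 2 to express IRV on the same scale, which is what makes the comparison in the abstract possible.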
Affiliation(s)
- Daniel T Huff
  - AIQ Solutions, Madison, WI, United States of America
- Victor Santoro-Fernandes
  - University of Wisconsin-Madison, Department of Medical Physics, Madison, WI, United States of America
- Song Chen
  - The First Hospital of China Medical University, Department of Nuclear Medicine, Shenyang, Liaoning, CN, People’s Republic of China
- Meijie Chen
  - The First Hospital of China Medical University, Department of Nuclear Medicine, Shenyang, Liaoning, CN, People’s Republic of China
- Carl Kashuk
  - AIQ Solutions, Madison, WI, United States of America
- Amy J Weisman
  - AIQ Solutions, Madison, WI, United States of America
- Robert Jeraj
  - University of Wisconsin-Madison, Department of Medical Physics, Madison, WI, United States of America
  - University of Ljubljana, Faculty of Mathematics and Physics, Ljubljana, SI, Slovenia
2
Zimmermann M, Kuhl C, Engelke H, Bettermann G, Keil S. Volumetric measurements of target lesions: does it improve inter-reader variability for oncological response assessment according to RECIST 1.1 guidelines compared to standard unidimensional measurements? Pol J Radiol 2021; 86:e594-600. [PMID: 34876940 DOI: 10.5114/pjr.2021.111048] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/08/2020] [Accepted: 11/23/2020] [Indexed: 11/30/2022]
Abstract
Purpose Target lesion selection is known to be a major factor for inter-reader discordance in RECIST 1.1. The purpose of this study was to assess whether volumetric measurements of target lesions result in different response categorization, as opposed to standard unidimensional measurements, and to evaluate the impact on inter-reader agreement for response categorization when different readers select different sets of target lesions. Material and methods Fifty patients with measurable disease from solid tumours, in which 3 readers had blindly and independently selected different sets of target lesions and subsequently reached clinically significant discordant response categorizations (progressive disease [PD] vs. non-progressive disease [non-PD]) based on RECIST 1.1 analyses, were included in this study. Additional volumetric measurements of all target lesions were performed by the same readers in a second read. Intra-reader agreement between standard unidimensional measurements (uRECIST) and volumetric measurements (vRECIST) was assessed using Cohen’s κ statistic. Fleiss’ κ statistic was used to analyse the inter-reader agreement for uRECIST and vRECIST results. Results The 3 readers assigned the same response classifications based on uRECIST and vRECIST in 33/50 patients (66%), 42/50 patients (84%), and 44/50 patients (88%), respectively. Inter-reader agreement improved from 0% when using uRECIST to 36% when using vRECIST. Conclusions Volumetric measurement of target lesions may improve inter-reader variability for response assessment as opposed to standard unidimensional measurements. However, in about two-thirds of patients, readers disagreed regardless of the measurement method, indicating that a limited set of target lesions may not be sufficiently representative of the whole-body tumour burden.
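As a hedged illustration of the Cohen's kappa statistic mentioned in this abstract, the sketch below computes percent agreement and kappa for two lists of categorical labels, for example one reader's uRECIST and vRECIST categorizations of the same patients. The function name `cohen_kappa` and the ten example labels are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: fraction of cases given the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of the two marginal proportions per category.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:  # both lists use a single identical category
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for ten patients (not taken from the study).
u_recist = ["PD"] * 5 + ["non-PD"] * 5
v_recist = ["PD"] * 4 + ["non-PD"] * 5 + ["PD"]

agreement = sum(a == b for a, b in zip(u_recist, v_recist)) / len(u_recist)
kappa = cohen_kappa(u_recist, v_recist)
print(f"agreement = {agreement:.0%}, kappa = {kappa:.2f}")
```

Kappa is lower than raw agreement here because half of the agreement would be expected by chance given the balanced marginals. Fleiss' kappa, used in the study for three readers, generalizes the same idea and is not shown.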
3
Chen Y, Zee J, Smith A, Jayapandian C, Hodgin J, Howell D, Palmer M, Thomas D, Cassol C, Farris AB, Perkinson K, Madabhushi A, Barisoni L, Janowczyk A. Assessment of a computerized quantitative quality control tool for whole slide images of kidney biopsies. J Pathol 2021; 253:268-278. [PMID: 33197281 DOI: 10.1002/path.5590] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 08/10/2020] [Revised: 10/30/2020] [Accepted: 11/11/2020] [Indexed: 12/16/2022]
Abstract
Inconsistencies in the preparation of histology slides and whole-slide images (WSIs) may lead to challenges with subsequent image analysis and machine learning approaches for interrogating the WSI. These variabilities are especially pronounced in multicenter cohorts, where batch effects (i.e. systematic technical artifacts unrelated to biological variability) may introduce biases to machine learning algorithms. To date, manual quality control (QC) has been the de facto standard for dataset curation, but remains highly subjective and is too laborious in light of the increasing scale of tissue slide digitization efforts. This study aimed to evaluate a computer-aided QC pipeline for facilitating a reproducible QC process of WSI datasets. An open source tool, HistoQC, was employed to identify image artifacts and compute quantitative metrics describing visual attributes of WSIs to the Nephrotic Syndrome Study Network (NEPTUNE) digital pathology repository. A comparison in inter-reader concordance between HistoQC aided and unaided curation was performed to quantify improvements in curation reproducibility. HistoQC metrics were additionally employed to quantify the presence of batch effects within NEPTUNE WSIs. Of the 1814 WSIs (458 H&E, 470 PAS, 438 silver, 448 trichrome) from n = 512 cases considered in this study, approximately 9% (163) were identified as unsuitable for subsequent computational analysis. The concordance in the identification of these WSIs among computational pathologists rose from moderate (Gwet's AC1 range 0.43 to 0.59 across stains) to excellent (Gwet's AC1 range 0.79 to 0.93 across stains) agreement when aided by HistoQC. Furthermore, statistically significant batch effects (p < 0.001) in the NEPTUNE WSI dataset were discovered. Taken together, our findings strongly suggest that quantitative QC is a necessary step in the curation of digital pathology cohorts. © 2020 The Pathological Society of Great Britain and Ireland. 
Published by John Wiley & Sons, Ltd.
Affiliation(s)
- Yijiang Chen
  - Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH, USA
- Jarcy Zee
  - Arbor Research Collaborative for Health, Ann Arbor, MI, USA
- Abigail Smith
  - Arbor Research Collaborative for Health, Ann Arbor, MI, USA
- Catherine Jayapandian
  - Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH, USA
- Jeffrey Hodgin
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- David Howell
  - Department of Pathology, Duke University, Durham, NC, USA
- Matthew Palmer
  - Department of Pathology, University of Pennsylvania, Philadelphia, PA, USA
- David Thomas
  - Department of Pathology, Duke University, Durham, NC, USA
  - Nephrocor, Memphis, TN, USA
- Clarissa Cassol
  - Renal Pathology Division, Arkana Laboratories, Little Rock, AR, USA
  - Department of Pathology - Renal Pathology Division, Ohio State University Medical Center, Columbus, OH, USA
- Alton B Farris
  - Department of Pathology and Laboratory Medicine, Emory University, Atlanta, GA, USA
- Anant Madabhushi
  - Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH, USA
  - Louis Stokes VA Medical Center, Cleveland, OH, USA
- Laura Barisoni
  - Department of Pathology, Duke University, Durham, NC, USA
  - Department of Medicine, Division of Nephrology, Duke University, Durham, NC, USA
- Andrew Janowczyk
  - Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH, USA
  - Precision Oncology Center, University of Lausanne, Lausanne, Switzerland
4
Ming S, Yang W, Cui SJ, Huang S, Gong XY. Consistency of radiologists in identifying pulmonary nodules based on low-dose computed tomography. J Thorac Dis 2019; 11:2973-2980. [PMID: 31463127 DOI: 10.21037/jtd.2019.07.52] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Background To study the consistency of radiologists in identifying pulmonary nodules based on low-dose computed tomography (LDCT), and to analyze factors that affect the consistency. Methods A total of 750 LDCT cases were collected randomly from three medical centers. Three experienced chest radiologists independently evaluated and detected the pulmonary nodules on 625 cases of LDCT images. The detected nodules were classified into 3 groups: group I (detected by all radiologists); group II (detected by two radiologists); group III (detected by only one radiologist). The consistency with respect to the image features of individual nodules was assessed. Results A total of 1,206 nodules were identified by the three radiologists. There were 234 (19.4%) nodules in group I, 377 (31.3%) nodules in group II, and 595 (49.3%) nodules in group III. Logistic regression showed that the size, density, and location of the nodules correlated with the detection of nodules. Nodules sized greater than or equal to 4 mm were more consistently identified than nodules sized less than 4 mm. Solid and calcified nodules were more consistently identified than sub-solid nodules. Nodules located in the outer zone were more consistently identified than hilar nodules. Conclusions There was considerable inter-reader variability with respect to identification of pulmonary nodules in LDCT. Larger nodules, solid or calcified nodules, and nodules located in the outer zone were more consistently identified.
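The three-group classification described in this abstract can be sketched in a few lines of Python. The sketch assumes each nodule is stored with the set of readers who detected it; the function name `consistency_groups`, the nodule and reader labels, and the data layout are hypothetical. The second half simply re-derives the reported percentages from the reported counts.

```python
from collections import Counter


def consistency_groups(detections):
    """Count nodules per group by how many of three readers detected them.

    `detections` maps a nodule id to the set of readers who marked it
    (a hypothetical data layout chosen for illustration).
    """
    labels = {3: "I", 2: "II", 1: "III"}
    return Counter(labels[len(readers)] for readers in detections.values())


# Hypothetical detections for four nodules.
detections = {
    "n1": {"R1", "R2", "R3"},
    "n2": {"R1", "R3"},
    "n3": {"R2"},
    "n4": {"R2"},
}
groups = consistency_groups(detections)
print(dict(groups))

# Check the percentages against the counts reported in the abstract.
reported = {"I": 234, "II": 377, "III": 595}
total = sum(reported.values())
shares = {g: round(100 * c / total, 1) for g, c in reported.items()}
print(total, shares)
```

The reported counts sum to the stated 1,206 nodules, and the rounded shares agree with the percentages given in the abstract.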
Affiliation(s)
- Shuai Ming
  - Department of Radiology, Zhejiang Provincial People's Hospital, Affiliated People's Hospital of Hangzhou Medical College, Hangzhou 310014, China
- Wei Yang
  - Department of Radiology, Zhejiang Provincial People's Hospital, Affiliated People's Hospital of Hangzhou Medical College, Hangzhou 310014, China
- Si-Jia Cui
  - Department of Radiology, Zhejiang Provincial People's Hospital, Affiliated People's Hospital of Hangzhou Medical College, Hangzhou 310014, China
- Shuai Huang
  - Department of Radiology, Zhejiang Provincial People's Hospital, Affiliated People's Hospital of Hangzhou Medical College, Hangzhou 310014, China
- Xiang-Yang Gong
  - Department of Radiology, Zhejiang Provincial People's Hospital, Affiliated People's Hospital of Hangzhou Medical College, Hangzhou 310014, China
5
Saha A, Harowicz MR, Mazurowski MA. Breast cancer MRI radiomics: An overview of algorithmic features and impact of inter-reader variability in annotating tumors. Med Phys 2018; 45:3076-3085. [PMID: 29663411 DOI: 10.1002/mp.12925] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 07/26/2017] [Revised: 03/01/2018] [Accepted: 04/04/2018] [Indexed: 12/30/2022]
Abstract
PURPOSE To review features used in MRI radiomics of breast cancer and study the inter-reader stability of the features. METHODS We implemented 529 algorithmic features that can be extracted from tumor and fibroglandular tissue (FGT) in breast MRIs. The features were identified based on a review of the existing literature with consideration of their usage, prognostic ability, and uniqueness. The set was then extended so that it comprehensively describes breast cancer imaging characteristics. The features were classified into 10 groups based on the type of data used to extract them and the type of calculation being performed. For the assessment of inter-reader variability, four fellowship-trained readers annotated tumors on preoperative dynamic contrast-enhanced MRIs for 50 breast cancer patients. Based on the annotations, an algorithm automatically segmented the image and extracted all features resulting in one set of features for each reader. For a given feature, the inter-reader stability was defined as the intraclass correlation coefficient (ICC) computed using the feature values obtained through all readers for all cases. RESULTS The average inter-reader stability for all features was 0.8474 (95% CI: 0.8068-0.8858). The mean inter-reader stability was lower for tumor-based features (0.6348, 95% CI: 0.5391-0.7257) than FGT-based features (0.9984, 95% CI: 0.9970-0.9992). The feature group with the highest inter-reader stability quantifies breast and FGT volume. The feature group with the lowest inter-reader stability quantifies variations in tumor enhancement. CONCLUSIONS Breast MRI radiomics features widely vary in terms of their stability in the presence of inter-reader variability. Appropriate measures need to be taken for reducing this variability in tumor-based radiomics.
Affiliation(s)
- Ashirbani Saha
  - Department of Radiology, Duke University School of Medicine, 2424 Erwin Road, Suite 302, Durham, NC, 27705, USA
- Michael R Harowicz
  - Department of Radiology, Duke University School of Medicine, 2424 Erwin Road, Suite 302, Durham, NC, 27705, USA
- Maciej A Mazurowski
  - Department of Radiology, Duke University School of Medicine, 2424 Erwin Road, Suite 302, Durham, NC, 27705, USA
  - Department of Electrical and Computer Engineering, Duke University, Box 90291, Durham, NC, 27708, USA
  - Duke University Medical Physics Program, DUMC 2729, 2424 Erwin Road, Suite 101, Durham, NC, 27705, USA
6
Belluco S, Carnier P, Castagnaro M, Chiers K, Millanta F, Peña L, Pires I, Queiroga F, Riffard S, Scase T, Polton G. Immunohistochemical Labelling for Cyclo-oxygenase-2: Does the Positive Control Guarantee Standardized Results? J Comp Pathol 2016; 154:186-94. [PMID: 26895886 DOI: 10.1016/j.jcpa.2016.01.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/22/2015] [Revised: 01/05/2016] [Accepted: 01/08/2016] [Indexed: 01/11/2023]
Abstract
Since the identification of cyclo-oxygenase-2 as a potentially important therapeutic target in veterinary oncology, numerous studies on its expression have been conducted. Unfortunately, results have been heterogeneous and conclusions are difficult to draw. We tested the ability of a defined positive control to guarantee reproducibility of results among different laboratories. Valid positive controls were defined by positivity of the renal macula densa without background labelling. Fifteen colorectal tumours and 15 oral squamous cell carcinomas were labelled immunohistochemically by six European laboratories. Slides were evaluated in blinded fashion for percentage of positive cells and labelling intensity by three pathologists, and results were analyzed statistically for reproducibility and inter-reader variability. Macula densa positivity was an insufficiently sensitive control to guarantee reproducible results for percentage of positive cells and labelling intensity. Inter-reader variability was proven statistically, making the case for image analysis or other automated quantitative evaluation techniques.
Affiliation(s)
- S Belluco
  - Equipe Recherche UPSP ICE 2011-03-101: Oncology, Vetagro-sup, Campus Vétérinaire, 1 Avenue Bourgelat, Marcy l'etoile, France
- P Carnier
  - Department of Comparative Biomedicine and Food Science, Faculty of Veterinary Medicine, AGRIPOLIS, Viale dell'Università 16, Legnaro, Italy
- M Castagnaro
  - Department of Comparative Biomedicine and Food Science, Faculty of Veterinary Medicine, AGRIPOLIS, Viale dell'Università 16, Legnaro, Italy
- K Chiers
  - Faculty of Veterinary Medicine, University of Ghent, Salisburylaan 133, Merelbeke, Belgium
- F Millanta
  - Department of Animal Pathology, School of Veterinary Medicine, University of Pisa, Viale delle Piagge 2, Pisa, Italy
- L Peña
  - Veterinary School, Complutense University Madrid, Madrid, Spain
- I Pires
  - University of Trás-os-Montes and Alto Douro, Quinta de Prados, Vila Real, Portugal
- F Queiroga
  - University of Trás-os-Montes and Alto Douro, Quinta de Prados, Vila Real, Portugal
- S Riffard
  - Merial, 254 rue Marcel Mérieux, Lyon, France
- T Scase
  - Bridge Pathology Ltd., Courtyard House, 26A Oakfield Road, Bristol, UK
- G Polton
  - North Downs Specialist Referrals, Friesian Building 3&4, The Brewer Street Dairy Business Park, Brewer Street, Bletchingley, Surrey, UK
7
Lozanski G, Pennell M, Shana'ah A, Zhao W, Gewirtz A, Racke F, Hsi E, Simpson S, Mosse C, Alam S, Swierczynski S, Hasserjian RP, Gurcan MN. Inter-reader variability in follicular lymphoma grading: Conventional and digital reading. J Pathol Inform 2013; 4:30. [PMID: 24392244 PMCID: PMC3869955 DOI: 10.4103/2153-3539.120747] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 05/30/2013] [Accepted: 09/03/2013] [Indexed: 11/04/2022]
Abstract
CONTEXT Pathologists grade follicular lymphoma (FL) cases by selecting 10 random high power fields (HPFs), counting the number of centroblasts (CBs) in these HPFs under the microscope and then calculating the average CB count for the whole slide. Previous studies have demonstrated that there is high inter-reader variability among pathologists using this methodology in grading. AIMS The objective of this study was to explore if newly available digital reading technologies can reduce inter-reader variability. SETTINGS AND DESIGN In this study, we considered three different reading conditions (RCs) in grading FL: (1) conventional (glass-slide based) to establish the baseline, (2) digital whole slide viewing, (3) digital whole slide viewing with selected HPFs. Six board-certified pathologists from five different institutions read 17 FL slides in these three different RCs. RESULTS Although there was relatively poor consensus in conventional reading, with lack of consensus in 41.2% of cases, which was similar to previously reported studies, we found that digital reading with pre-selected fields improved the inter-reader agreement, with only 5.9% lacking consensus among pathologists. CONCLUSIONS Digital whole slide RC resulted in the worst concordance among pathologists while digital whole slide reading with selected HPFs improved the concordance. Further studies are underway to determine if this performance can be sustained with a larger dataset and whether our automated HPF and CB detection algorithms can be employed to further improve the concordance.
Affiliation(s)
- Gerard Lozanski
  - Department of Pathology, The Ohio State University, Columbus, OH, USA
- Michael Pennell
  - Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH, USA
- Arwa Shana'ah
  - Department of Pathology, The Ohio State University, Columbus, OH, USA
- Weiqiang Zhao
  - Department of Pathology, The Ohio State University, Columbus, OH, USA
- Amy Gewirtz
  - Department of Pathology, The Ohio State University, Columbus, OH, USA
- Frederick Racke
  - Department of Pathology, The Ohio State University, Columbus, OH, USA
- Eric Hsi
  - Cleveland Clinic, Cleveland, OH, USA
- Sabrina Simpson
  - Department of Pathology, Central Ohio Pathology Associates, Westerville, OH, USA
- Shadia Alam
  - Department of Pathology, Battle Creek, MI, USA
- Metin N Gurcan
  - Department of Biomedical Informatics, Ohio State University, Columbus, OH, USA
8
Chapman CB, Ewer SM, Kelly AF, Jacobson KM, Leal MA, Rahko PS. Classification of left ventricular diastolic function using American Society of Echocardiography Guidelines: agreement among echocardiographers. Echocardiography 2013; 30:1022-31. [PMID: 23551740 DOI: 10.1111/echo.12185] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/20/2022]
Abstract
Guidelines for assessing diastolic function by echocardiography are continually being updated. Our ability to use available guidelines effectively has not been completely investigated. Six trained echocardiographers were asked to interpret 105 echocardiograms using current American Society of Echocardiography (ASE) algorithms for interpretation of diastolic grade and estimation of left atrial (LA) pressure. Diastolic grade was categorized as normal, mild, moderate, or severe dysfunction. The presence or absence of elevated LA pressure was determined using a second ASE algorithm. As a reference comparison for level of agreement, left ventricular ejection fraction was visually determined. By the ASE algorithm, 29 subjects (28%) met all measurement criteria in their assigned grade and 57 subjects (55%) met all or all but one criterion of their assigned grade. Of the 45 subjects (43%) for whom the guidelines disagreed by more than 1 criterion, the readers debated between normal and moderate dysfunction in 22% or mild and moderate diastolic dysfunction in 31%. Percent inter-reader agreement and kappa values were 76% (0.7) for determining diastolic grade, 84% (0.67) for determining elevated LA pressure, and 84% (0.67) for estimation of ejection fraction, the reference standard. For all subjects, if multiple echocardiographic criteria failed to fit into the proposed guidelines, agreement fell to 66% (0.58) for determining diastolic grade and 74% (0.48) for determining LA pressure. There is reasonable agreement estimating diastolic grade and LA pressure using current guidelines. Further refinements in the definition of mild and moderate dysfunction may improve agreement.
Affiliation(s)
- Carrie B Chapman
  - Cardiovascular Medicine Division, Department of Medicine, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin