1
Elfer K, Dudgeon S, Garcia V, Blenman K, Hytopoulos E, Wen S, Li X, Ly A, Werness B, Sheth MS, Amgad M, Gupta R, Saltz J, Hanna MG, Ehinger A, Peeters D, Salgado R, Gallas BD. Pilot study to evaluate tools to collect pathologist annotations for validating machine learning algorithms. J Med Imaging (Bellingham) 2022; 9:047501. [PMID: 35911208 PMCID: PMC9326105 DOI: 10.1117/1.jmi.9.4.047501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 06/28/2022] [Indexed: 11/14/2022] [Open access]
Abstract
Purpose: Validation of artificial intelligence (AI) algorithms in digital pathology with a reference standard is necessary before widespread clinical use, but few examples focus on creating a reference standard based on pathologist annotations. This work assesses the results of a pilot study that collects density estimates of stromal tumor-infiltrating lymphocytes (sTILs) in breast cancer biopsy specimens. This work will inform the creation of a validation dataset for the evaluation of AI algorithms fit for a regulatory purpose.
Approach: Collaborators and crowdsourced pathologists contributed glass slides, digital images, and annotations. Here, "annotations" refer to any marks, segmentations, measurements, or labels a pathologist adds to a report, image, region of interest (ROI), or biological feature. Pathologists estimated sTILs density in 640 ROIs from hematoxylin and eosin stained slides of 64 patients via two modalities: an optical light microscope and two digital image viewing platforms.
Results: The pilot study generated 7373 sTILs density estimates from 29 pathologists. Analysis of annotations found the variability of density estimates per ROI increases with the mean; the root mean square differences were 4.46, 14.25, and 26.25 as the mean density ranged from 0% to 10%, 11% to 40%, and 41% to 100%, respectively. The pilot study informs three areas of improvement for future work: technical workflows, annotation platforms, and agreement analysis methods. Upgrades to the workflows and platforms will improve operability and increase annotation speed and consistency.
Conclusions: Exploratory data analysis demonstrates the need to develop new statistical approaches for agreement. The pilot study dataset and analysis methods are publicly available to allow community feedback. The development and results of the validation dataset will be publicly available to serve as an instructive tool that can be replicated by developers and researchers.
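The abstract reports root mean square differences per mean-density range but does not spell out the computation. The sketch below shows one plausible reading, assuming the statistic pools squared differences over all pairs of pathologist estimates within each ROI and groups ROIs by their mean density. The function names, the handling of means that fall between the stated ranges, and the example data are all hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import sqrt


def rms_difference(estimates):
    """Root mean square of pairwise differences among one ROI's estimates."""
    pairs = list(combinations(estimates, 2))
    if not pairs:
        raise ValueError("need at least two estimates for an ROI")
    return sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))


def density_bin(mean_density):
    """Assign a mean density (percent) to one of the three reported ranges.

    Assumption: means between two stated ranges (e.g. 10.5) go to the upper one.
    """
    if mean_density <= 10:
        return "0-10%"
    if mean_density <= 40:
        return "11-40%"
    return "41-100%"


def rms_by_bin(roi_estimates):
    """Pool squared pairwise differences over the ROIs in each density bin."""
    pooled = {}
    for estimates in roi_estimates.values():
        if len(estimates) < 2:
            continue  # a single estimate has no pairwise difference
        mean = sum(estimates) / len(estimates)
        squared = pooled.setdefault(density_bin(mean), [])
        squared.extend((a - b) ** 2 for a, b in combinations(estimates, 2))
    return {name: sqrt(sum(sq) / len(sq)) for name, sq in pooled.items()}


# Made-up estimates (percent sTILs density) for three hypothetical ROIs.
example = {
    "roi_a": [0, 5, 10],
    "roi_b": [20, 30, 40],
    "roi_c": [50, 70, 90],
}
result = rms_by_bin(example)
for name, value in result.items():
    print(f"{name}: RMS difference = {value:.2f}")
```

Pooling over pairs rather than measuring deviation from the ROI mean avoids treating any single pathologist or the average as the reference, which matches the abstract's framing of the statistic as a difference between readers.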
Affiliation(s)
- Katherine Elfer
- United States Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging Diagnostics & Software Reliability, Silver Spring, Maryland, United States
- National Institutes of Health, National Cancer Institute, Division of Cancer Prevention, Cancer Prevention Fellowship Program, Bethesda, Maryland, United States
- Sarah Dudgeon
- Yale University Computational Biology and Bioinformatics, New Haven, Connecticut, United States
- Yale New Haven Hospital, Center for Outcomes Research and Evaluation, New Haven, Connecticut, United States
- Victor Garcia
- United States Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging Diagnostics & Software Reliability, Silver Spring, Maryland, United States
- Kim Blenman
- School of Medicine, Yale Cancer Center, Department of Internal Medicine, Section of Medical Oncology, New Haven, Connecticut, United States
- Yale University, School of Engineering and Applied Science, Department of Computer Science, New Haven, Connecticut, United States
- Si Wen
- United States Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging Diagnostics & Software Reliability, Silver Spring, Maryland, United States
- Xiaoxian Li
- Emory University School of Medicine, Department of Pathology and Laboratory Medicine, Atlanta, Georgia, United States
- Amy Ly
- Massachusetts General Hospital, Boston, Massachusetts, United States
- Bruce Werness
- Inova Health System Department of Pathology, Falls Church, Virginia, United States
- Arrive Bio LLC, San Francisco, California, United States
- Manasi S. Sheth
- United States Food and Drug Administration (FDA), Center for Devices and Radiologic Health, Office of Product Evaluation and Quality, Office of Clinical Evidence and Analysis, Division of Biostatistics, White Oak, Maryland, United States
- Mohamed Amgad
- Northwestern University Feinberg School of Medicine, Department of Pathology, Chicago, Illinois, United States
- Rajarsi Gupta
- SUNY Stony Brook Medicine, Department of Biomedical Informatics, Stony Brook, New York, United States
- Joel Saltz
- SUNY Stony Brook Medicine, Department of Biomedical Informatics, Stony Brook, New York, United States
- SUNY Stony Brook Medicine, Department of Pathology, Stony Brook, New York, United States
- Matthew G. Hanna
- Memorial Sloan Kettering Cancer Center, New York, New York, United States
- Anna Ehinger
- Lund University, Laboratory Medicine, Region Skåne, Department of Genetics and Pathology, Lund, Sweden
- Dieter Peeters
- Sint-Maarten Hospital, Department of Pathology, Mechelen, Belgium
- University of Antwerp, Department of Biomedical Sciences, Antwerp, Belgium
- Roberto Salgado
- Peter Mac Callum Cancer Centre, Division of Research, Melbourne, Australia
- GZA-ZNA Hospitals, Department of Pathology, Antwerp, Belgium
- Brandon D. Gallas
- United States Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging Diagnostics & Software Reliability, Silver Spring, Maryland, United States
- Address all correspondence to Brandon D. Gallas,
2
Elfer KN, Blenman K, Dudgeon SN, Garcia V, Ehinger A, Li X, Ly A, Peeters D, Werness B, Hanna M, Salgado R. Abstract 460: Tools for collecting pathologist annotations and understanding interobserver variability. Cancer Res 2022. [DOI: 10.1158/1538-7445.am2022-460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Background: The High Throughput Truthing (HTT) project is assessing pathologist agreement in estimates of stromal tumor-infiltrating lymphocytes (sTILs) density in hematoxylin and eosin (H&E) stained breast cancer biopsy slides. The HTT project will create a validation dataset for artificial intelligence and machine learning (AI/ML) algorithms in digital pathology that is fit for training, proficiency testing, and regulatory purposes.
Methods: The pilot study crowdsourced pathologists to estimate sTIL density in 640 regions of interest (ROIs) across 64 slides via two modalities: an optical microscope (eeDAP) and two digital platforms (caMicroscope and PathPresenter). eeDAP is a hardware-software interface that presents the observer with predefined fields of view on an H&E slide, each corresponding to an ROI on the whole slide image. The PathPresenter and caMicroscope web applications replicate the eeDAP workflow on the whole slide image without microscope hardware. In the workflow, pathologists first evaluated whether an ROI was eligible for sTILs assessment, then estimated the densities of tumor-associated stroma and sTILs in the ROI. Inter-pathologist agreement within ROIs was characterized with the root mean-squared difference. Seven practicing pathologists then participated in a focus group, using 72 of the highest-variability ROIs from the pilot study, to improve the clinical training and data-collection workflows.
Results: The pilot study collected 7,373 sTIL density estimates from 35 pathologists between February 2020 and May 2021. The focus group provided an additional 411 evaluations on 72 ROIs and in-depth discussions to identify pitfalls, gaps in training, and workflow improvements. Installation of eeDAP for physical data collection guided improvements in documentation and operation capabilities. Updated training materials refine the definition of tumor-associated stroma, provide reference images to differentiate sTILs from other cell types, and provide feedback during training. Digital and microscope platforms benefitted from enforcing registration and training, standardizing workflows, and accelerating eeDAP slide-image registration.
Conclusions: The slides, images, and annotations provided by volunteer collaborators and participants for our pilot study led to improvements in data collection tools and crowdsourcing workflows to ensure consistency and minimize annotation variability. Our pilot dataset and analysis methods are available in a public HTT GitHub repository to allow open access to our methodology and feedback from the digital pathology and statistics communities. These data-collection and analysis methods are applicable to other quantitative biomarkers for validation of AI/ML algorithms. The lessons learned from this work will be applied to the HTT pivotal study and inform future quality data-collection methods for pathologist annotations.
Citation Format: Katherine N. Elfer, Kim Blenman, Sarah N. Dudgeon, Victor Garcia, Anna Ehinger, Xiaoxian Li, Amy Ly, Dieter Peeters, Bruce Werness, Matthew Hanna, Roberto Salgado. Tools for collecting pathologist annotations and understanding interobserver variability [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 460.
Affiliation(s)
- Katherine N. Elfer
- National Cancer Institute, Cancer Prevention Fellowship Program; United States Food and Drug Administration, Center for Devices and Radiologic Health, Office of Science and Engineering Laboratories, Division of Imaging Diagnostics & Software Reliability, Bethesda; White Oak, MD
- Kim Blenman
- Yale University: School of Medicine and Yale Cancer Center; School of Engineering and Applied Science, New Haven, CT
- Victor Garcia
- United States Food and Drug Administration, Center for Devices and Radiologic Health, Office of Science and Engineering Laboratories, Division of Imaging Diagnostics & Software Reliability, Bethesda, MD
- Anna Ehinger
- Lund University, Skåne University Hospital, Lund, Sweden
- Xiaoxian Li
- Emory University School of Medicine, Atlanta, GA
- Amy Ly
- Massachusetts General Hospital, Boston, MA
- Dieter Peeters
- Sint-Maarten Hospital; University of Antwerp, Mechelen; Antwerp, Belgium
- Bruce Werness
- Inova Health System; Arrive Bio, Falls Church, VA; San Francisco, CA
- Matthew Hanna
- Memorial Sloan Kettering Cancer Center, New York, NY
- Roberto Salgado
- Peter Mac Callum Cancer Centre, Division of Research, Melbourne, Australia; GZA-ZNA Hospitals, Antwerp, Belgium
3
Dudgeon SN, Wen S, Hanna MG, Gupta R, Amgad M, Sheth M, Marble H, Huang R, Herrmann MD, Szu CH, Tong D, Werness B, Szu E, Larsimont D, Madabhushi A, Hytopoulos E, Chen W, Singh R, Hart SN, Sharma A, Saltz J, Salgado R, Gallas BD. A Pathologist-Annotated Dataset for Validating Artificial Intelligence: A Project Description and Pilot Study. J Pathol Inform 2021; 12:45. [PMID: 34881099 PMCID: PMC8609287 DOI: 10.4103/jpi.jpi_83_20] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/28/2020] [Revised: 01/23/2021] [Accepted: 03/16/2021] [Indexed: 12/13/2022] [Open access]
Abstract
Purpose: Validating artificial intelligence algorithms for clinical use in medical images is a challenging endeavor due to a lack of standard reference data (ground truth). This topic typically occupies a small portion of the discussion in research papers since most of the efforts are focused on developing novel algorithms. In this work, we present a collaboration to create a validation dataset of pathologist annotations for algorithms that process whole slide images. We focus on data collection and evaluation of algorithm performance in the context of estimating the density of stromal tumor-infiltrating lymphocytes (sTILs) in breast cancer.
Methods: We digitized 64 glass slides of hematoxylin- and eosin-stained invasive ductal carcinoma core biopsies prepared at a single clinical site. A collaborating pathologist selected 10 regions of interest (ROIs) per slide for evaluation. We created training materials and workflows to crowdsource pathologist image annotations on two modes: an optical microscope and two digital platforms. The microscope platform allows the same ROIs to be evaluated in both modes. The workflows collect the ROI type, a decision on whether the ROI is appropriate for estimating the density of sTILs, and if appropriate, the sTIL density value for that ROI.
Results: In total, 19 pathologists made 1645 ROI evaluations during a data collection event and the following 2 weeks. The pilot study yielded an abundant number of cases with nominal sTIL infiltration. Furthermore, we found that the sTIL densities are correlated within a case, and there is notable pathologist variability. Consequently, we outline plans to improve our ROI and case sampling methods. We also outline statistical methods to account for ROI correlations within a case and pathologist variability when validating an algorithm.
Conclusion: We have built workflows for efficient data collection and tested them in a pilot study. As we prepare for pivotal studies, we will investigate methods to use the dataset as an external validation tool for algorithms. We will also consider what it will take for the dataset to be fit for a regulatory purpose: study size, patient population, and pathologist training and qualifications. To this end, we will elicit feedback from the Food and Drug Administration via the Medical Device Development Tool program and from the broader digital pathology and AI community. Ultimately, we intend to share the dataset, statistical methods, and lessons learned.
Affiliation(s)
- Sarah N Dudgeon
- Division of Imaging Diagnostics and Software Reliability, Office of Science and Engineering Laboratories, Center for Devices and Radiologic Health, United States Food and Drug Administration, White Oak, MD, USA
- Si Wen
- Division of Imaging Diagnostics and Software Reliability, Office of Science and Engineering Laboratories, Center for Devices and Radiologic Health, United States Food and Drug Administration, White Oak, MD, USA
- Rajarsi Gupta
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Mohamed Amgad
- Department of Pathology, Northwestern University, Chicago, IL, USA
- Manasi Sheth
- Division of Biostatistics, Center for Devices and Radiologic Health, United States Food and Drug Administration, White Oak, MD, USA
- Hetal Marble
- Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Richard Huang
- Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Markus D Herrmann
- Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Evan Szu
- Arrive Bio, San Francisco, CA, USA
- Denis Larsimont
- Department of Pathology, Institute Jules Bordet, Brussels, Belgium
- Anant Madabhushi
- Louis Stokes Cleveland Veterans Administration Medical Center, Cleveland, OH, USA
- Weijie Chen
- Division of Imaging Diagnostics and Software Reliability, Office of Science and Engineering Laboratories, Center for Devices and Radiologic Health, United States Food and Drug Administration, White Oak, MD, USA
- Rajendra Singh
- Northwell Health and Zucker School of Medicine, New York, NY, USA
- Steven N Hart
- Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Ashish Sharma
- Department of Biomedical Informatics, Emory University, Atlanta, GA, USA
- Joel Saltz
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Roberto Salgado
- Division of Research, Peter Mac Callum Cancer Centre, Melbourne, Australia; Department of Pathology, GZA-ZNA Hospitals, Antwerp, Belgium
- Brandon D Gallas
- Division of Imaging Diagnostics and Software Reliability, Office of Science and Engineering Laboratories, Center for Devices and Radiologic Health, United States Food and Drug Administration, White Oak, MD, USA
4
Ramus SJ, Pharoah PDP, Harrington P, Pye C, Werness B, Bobrow L, Ayhan A, Wells D, Fishman A, Gore M, DiCioccio RA, Piver MS, Whittemore AS, Ponder BAJ, Gayther SA. BRCA1/2 mutation status influences somatic genetic progression in inherited and sporadic epithelial ovarian cancer cases. Cancer Res 2003; 63:417-23. [PMID: 12543797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2023]
Abstract
Metaphase comparative genomic hybridization was used to analyze the spectrum of genetic alterations in 141 epithelial ovarian cancers from BRCA1 and BRCA2 mutation carriers, individuals with familial non-BRCA1/2 epithelial ovarian cancer, and women with nonfamilial epithelial ovarian cancer. Multiple genetic alterations were identified in almost all tumors. The high frequency with which some alterations were identified suggests the location of genes that are commonly altered during ovarian tumor development. In multiple chromosome regions, there were significant differences in alteration frequency between the four tumor types, suggesting that BRCA1/2 mutation status and a family history of ovarian cancer influence the somatic genetic pathway of ovarian cancer progression. These findings were supported by hierarchical cluster analysis, which identified genetic events that tend to occur together during tumorigenesis and several alterations that were specific to tumors of a particular type. In addition, some genetic alterations were strongly associated with differences in tumor differentiation and disease stage. Taken together, these data provide molecular genetic evidence to support previous findings from histopathological studies, which suggest that clinical features of ovarian and breast tumors differ with respect to BRCA1/2 mutation status and/or cancer family history.
Affiliation(s)
- Susan J Ramus
- Department of Oncology, Strangeways Research Laboratories, Cambridge, United Kingdom
5
Abstract
Thyroid nodules are found in 5% to 10% of the population. While these nodules carry only a 5% to 10% risk of malignancy, tests that complement fine-needle aspiration (FNA) cytology in preoperative diagnosis and risk stratification are lacking. Telomerase is a ribonucleoprotein polymerase with activity found in many malignant tissues, but absent from most normal adult tissue. In this study, we have investigated telomerase activity in 24 thyroid tumors, 14 matched adjacent thyroid tissues, and 3 chronic thyroiditis tissue samples. Using a telomeric repeat amplification protocol (TRAP) assay on frozen tissue, telomerase activity was detected in 11 of 20 thyroid carcinomas, including 10 of 14 papillary carcinomas and a Hurthle cell carcinoma. Telomerase activity was not detected in 4 benign adenomas, 3 follicular carcinomas, or a single case each of medullary and anaplastic thyroid carcinoma. Telomerase activity was detected in 3 of 14 samples of adjacent thyroid tissue from patients with thyroid tumors. Interestingly, all 3 cases of adjacent thyroid tissue that tested positive had a moderate to marked degree of chronic inflammation. In addition, 3 of 3 samples from chronic thyroiditis specimens tested positive for telomerase activity. When tumor invasiveness (vascular and/or capsular) was compared with telomerase activity in papillary carcinomas, only 1 of 4 telomerase-negative tumors was invasive, while 6 of 10 telomerase-positive tumors were invasive. Moreover, 6 of 7 invasive papillary carcinomas had telomerase activity. In summary, this is the first report of telomerase activity in thyroid tissue and nodules. This activity was detected in a large percentage of papillary thyroid carcinomas, but not benign adenomas, follicular carcinomas, or most normal thyroid tissue. Telomerase activity may also correlate with tumor invasiveness. Further studies will focus on larger numbers of tumors, metastatic tissue, and undifferentiated carcinomas, as well as application of this assay to products from fine-needle aspirates as a potential diagnostic and prognostic marker in thyroid neoplasms.
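The invasiveness comparison in papillary carcinomas can be laid out as a two-by-two table. The short sketch below tallies it from the counts stated in the abstract; the non-invasive counts (4 and 3) are derived here by subtraction rather than quoted, and the variable and function names are illustrative only. It is an arithmetic check, not a statistical test from the original study.

```python
# Papillary carcinoma counts by telomerase status and invasiveness.
# Invasive counts come from the abstract; non-invasive counts are derived
# as 10 - 6 and 4 - 1 (an assumption that every tumor was classified).
counts = {
    ("positive", "invasive"): 6,
    ("positive", "noninvasive"): 4,
    ("negative", "invasive"): 1,
    ("negative", "noninvasive"): 3,
}


def invasive_fraction(status):
    """Return (invasive, total, fraction) for one telomerase status."""
    invasive = counts[(status, "invasive")]
    total = invasive + counts[(status, "noninvasive")]
    return invasive, total, invasive / total


# Cross-check against the other direction reported in the abstract:
# how many of the invasive tumors were telomerase-positive.
invasive_total = counts[("positive", "invasive")] + counts[("negative", "invasive")]
positive_among_invasive = counts[("positive", "invasive")] / invasive_total

for status in ("positive", "negative"):
    invasive, total, fraction = invasive_fraction(status)
    print(f"telomerase-{status}: {invasive} of {total} invasive ({fraction:.0%})")
print(f"telomerase-positive among invasive: "
      f"{counts[('positive', 'invasive')]} of {invasive_total} "
      f"({positive_among_invasive:.0%})")
```

With only 14 papillary carcinomas, these proportions are descriptive; the abstract itself says only that telomerase activity "may" correlate with invasiveness.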
Affiliation(s)
- B R Haugen
- Department of Medicine, University of Colorado Health Sciences Center, Denver, USA