1
Kefeli J, Tatonetti N. TCGA-Reports: A machine-readable pathology report resource for benchmarking text-based AI models. Patterns (N Y) 2024; 5:100933. [PMID: 38487800 PMCID: PMC10935496 DOI: 10.1016/j.patter.2024.100933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 10/16/2023] [Accepted: 01/25/2024] [Indexed: 03/17/2024]
Abstract
In cancer research, pathology report text is a largely untapped data source. Pathology reports are routinely generated, more nuanced than structured data, and contain added insight from pathologists. However, there are no publicly available datasets for benchmarking report-based models. Two recent advances suggest the urgent need for a benchmark dataset. First, improved optical character recognition (OCR) techniques will make it possible to access older pathology reports in an automated way, increasing the data available for analysis. Second, recent improvements in natural language processing (NLP) techniques using artificial intelligence (AI) allow more accurate prediction of clinical targets from text. We apply state-of-the-art OCR and customized post-processing to report PDFs from The Cancer Genome Atlas, generating a machine-readable corpus of 9,523 reports. Finally, we perform a proof-of-principle cancer-type classification across 32 tissues, achieving 0.992 average AU-ROC. This dataset will be useful to researchers across specialties, including research clinicians, clinical trial investigators, and clinical NLP researchers.
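The abstract reports a 0.992 average AU-ROC across 32 tissue classes but does not say how the per-class scores are combined. This sketch assumes one common convention rather than taking it from the paper: score each class one-vs-rest, then take the unweighted (macro) mean. It uses the rank-based definition of AUC, which is the probability that a random positive outscores a random negative, with ties counted as one half. The function names and the three TCGA-style labels are illustrative only.

```python
def binary_auc(labels, scores):
    """Rank-based AUC: P(score of a random positive > score of a random negative).

    Ties count as one half. Raises if either class is absent.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, prob_rows, classes):
    """Average one-vs-rest AUC over classes with equal weight (assumed convention)."""
    aucs = []
    for k, cls in enumerate(classes):
        labels = [y == cls for y in y_true]          # this class vs. all others
        scores = [row[k] for row in prob_rows]       # predicted probability for this class
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Toy example with three TCGA-style cohort labels (not real model output).
classes = ["BRCA", "LUAD", "COAD"]
y_true = ["BRCA", "LUAD", "COAD", "BRCA"]
probs = [
    [0.8, 0.1, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],
]
print(macro_ovr_auc(y_true, probs, classes))  # 1.0
```

The pairwise loop costs O(P x N) per class, which is acceptable for a sketch. For a corpus of about 9,500 reports, a sort-based rank computation would be more practical, as would scikit-learn's `roc_auc_score` with `multi_class="ovr"` and `average="macro"`.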
Affiliation(s)
- Jenna Kefeli
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Nicholas Tatonetti
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA

2
Kim HAJ, Zeng PYF, Cecchini M, Shaikh MH, Laxague F, Deng X, Jarycki L, Ryan SEB, Dawson A, Liu MH, Palma DA, Patel K, Mundi N, Barrett JW, Mymryk JS, Boutros PC, Nichols AC. HPV-negative head and neck cancers with adverse pathological features carry specific molecular changes that are associated with survival. Head Neck 2024; 46:353-366. [PMID: 38059331 DOI: 10.1002/hed.27591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 10/21/2023] [Accepted: 11/21/2023] [Indexed: 12/08/2023]
Abstract
BACKGROUND Adverse pathological features following surgery in head and neck squamous cell carcinoma (HNSCC) are strongly associated with survival and guide adjuvant therapy. We investigated molecular changes associated with these features. METHODS We downloaded data from the Cancer Genome Atlas and Cancer Proteome Atlas HNSCC cohorts. We compared tumors positive versus negative for perineural invasion (PNI), lymphovascular invasion (LVI), extracapsular spread (ECS), and positive margins (PSM), with multivariable analysis. RESULTS All pathological features were associated with poor survival, as were the following molecular changes: low cyclin E1 (HR = 1.7) and high PKC-alpha (HR = 1.8) in tumors with PNI; six of 13 protein abundance changes with LVI; greater tumor hypoxia and high Raptor (HR = 2.0) and Rictor (HR = 1.6) with ECS; and low p38 (HR = 2.3), high fibronectin (HR = 1.6), low annexin A1 (HR = 3.1), and high caspase-9 (HR = 1.6) abundances with PSM. CONCLUSIONS Pathological features in HNSCC carry specific molecular changes that may explain their poor prognostic associations.
Affiliation(s)
- Hugh Andrew Jinwook Kim
  - Department of Otolaryngology-Head and Neck Surgery, University of Toronto, Toronto, Ontario, Canada
- Peter Y F Zeng
  - Department of Otolaryngology-Head and Neck Surgery, University of Western Ontario, London, Ontario, Canada
- Matthew Cecchini
  - Department of Pathology and Laboratory Medicine, University of Western Ontario, London, Ontario, Canada
- Mushfiq Hassan Shaikh
  - Department of Otolaryngology-Head and Neck Surgery, University of Western Ontario, London, Ontario, Canada
- Francisco Laxague
  - Department of Otolaryngology-Head and Neck Surgery, University of Western Ontario, London, Ontario, Canada
- Xiaoxiao Deng
  - Department of Otolaryngology-Head and Neck Surgery, University of Western Ontario, London, Ontario, Canada
- Laura Jarycki
  - Department of Otolaryngology-Head and Neck Surgery, University of Western Ontario, London, Ontario, Canada
- Sarah Elizabeth Belle Ryan
  - Department of Otolaryngology-Head and Neck Surgery, University of Western Ontario, London, Ontario, Canada
  - Department of Pathology and Laboratory Medicine, University of Western Ontario, London, Ontario, Canada
- Alice Dawson
  - Department of Otolaryngology-Head and Neck Surgery, University of Western Ontario, London, Ontario, Canada
  - Department of Pathology and Laboratory Medicine, University of Western Ontario, London, Ontario, Canada
- Mu Han Liu
  - Department of Otolaryngology-Head and Neck Surgery, University of Western Ontario, London, Ontario, Canada
- David A Palma
  - Department of Otolaryngology-Head and Neck Surgery, University of Western Ontario, London, Ontario, Canada
  - Department of Oncology, University of Western Ontario, London, Ontario, Canada
- Krupal Patel
  - Department of Otolaryngology-Head & Neck Surgery, Moffitt Cancer Center, Tampa, Florida, USA
- Neil Mundi
  - Department of Otolaryngology-Head & Neck Surgery, Southern Illinois University School of Medicine, Springfield, Illinois, USA
- John W Barrett
  - Department of Otolaryngology-Head and Neck Surgery, University of Western Ontario, London, Ontario, Canada
  - Department of Oncology, University of Western Ontario, London, Ontario, Canada
- Joe S Mymryk
  - Department of Otolaryngology-Head and Neck Surgery, University of Western Ontario, London, Ontario, Canada
  - Department of Oncology, University of Western Ontario, London, Ontario, Canada
  - Department of Microbiology & Immunology, University of Western Ontario, London, Ontario, Canada
- Paul C Boutros
  - Department of Human Genetics, University of California, Los Angeles, California, USA
  - Department of Urology, University of California, Los Angeles, California, USA
  - Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, California, USA
  - Institute for Precision Health, University of California, Los Angeles, California, USA
  - Jonsson Comprehensive Cancer Centre, University of California, Los Angeles, California, USA
- Anthony C Nichols
  - Department of Otolaryngology-Head and Neck Surgery, University of Western Ontario, London, Ontario, Canada
  - Department of Oncology, University of Western Ontario, London, Ontario, Canada

3
Negoita S, Chen HS, Sanchez PV, Sherman RL, Henley SJ, Siegel R, Sung H, Scott S, Benard VB, Kohler BA, Jemal A, Cronin K. Annual Report to the Nation on the Status of Cancer, part 2: Early assessment of the COVID-19 pandemic's impact on cancer diagnosis. Cancer 2024; 130:117-127. [PMID: 37755665 PMCID: PMC10841454 DOI: 10.1002/cncr.35026] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/24/2023] [Revised: 07/14/2023] [Accepted: 08/11/2023] [Indexed: 09/28/2023]
Abstract
BACKGROUND With access to cancer care services limited because of coronavirus disease 2019 control measures, cancer diagnosis and treatment have been delayed. The authors explored changes in the counts of US incident cases by cancer type, age, sex, race, and disease stage in 2020. METHODS Data were extracted from selected US population-based cancer registries for diagnosis years 2015-2020 using first-submission data from the North American Association of Central Cancer Registries. After a quality assessment, the monthly numbers of newly diagnosed cancer cases were extracted for six cancer types: colorectal, female breast, lung, pancreas, prostate, and thyroid. The observed numbers of incident cancer cases in 2020 were compared with the estimated numbers by calculating observed-to-expected (O/E) ratios. The expected numbers of incident cases were extrapolated using Joinpoint trend models. RESULTS The authors report an O/E ratio <1.0 for major screening-eligible cancer sites, indicating fewer newly diagnosed cases than expected in 2020. The O/E ratios were lowest in April 2020. For every cancer site except pancreas, Asians/Pacific Islanders had the lowest O/E ratio of any race group. O/E ratios were lower for cases diagnosed at localized stages than for cases diagnosed at advanced stages. CONCLUSIONS The current analysis provides strong evidence for declines in cancer diagnoses, relative to the expected numbers, between March and May of 2020. The declines correlate with reductions in pathology reports and are greater for cases diagnosed at in situ and localized stage, triggering concerns about potential poor cancer outcomes in the coming years, especially in Asians/Pacific Islanders.
PLAIN LANGUAGE SUMMARY To help control the spread of coronavirus disease 2019 (COVID-19), health care organizations suspended nonessential medical procedures, including preventive cancer screening, during early 2020. Many individuals canceled or postponed cancer screening, potentially delaying cancer diagnosis. This study examines the impact of the COVID-19 pandemic on the number of newly diagnosed cancer cases in 2020 using a first-submission, population-based cancer registry database. The monthly numbers of newly diagnosed cancer cases in 2020 were compared with the expected numbers based on past trends for six cancer sites. April 2020 had the sharpest decrease in cases compared with previous years, most likely because of the COVID-19 pandemic.
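The O/E ratio in this abstract is a simple quotient: the observed 2020 count divided by a count extrapolated from the 2015-2019 trend. The authors extrapolated with Joinpoint trend models. This sketch swaps in the simplest special case instead: a single log-linear segment (zero joinpoints) fitted by ordinary least squares. It therefore illustrates the arithmetic only and does not reproduce the published expected counts. The function names are hypothetical and every count in the example is a made-up toy value.

```python
import math


def loglinear_expected(years, counts, target_year):
    """Extrapolate a count to target_year from one log-linear trend.

    Fits log(count) = a + b * year by ordinary least squares (a trend with
    zero joinpoints). Joinpoint models may add changes in slope; this does not.
    """
    n = len(years)
    if n < 2 or n != len(counts):
        raise ValueError("need at least two matching (year, count) pairs")
    logs = [math.log(c) for c in counts]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    # Predict relative to the mean year to avoid cancellation with large year values.
    return math.exp(mean_y + slope * (target_year - mean_x))


def oe_ratio(observed, expected):
    """Observed-to-expected ratio; below 1.0 means fewer cases than the trend predicts."""
    return observed / expected


# Hypothetical April counts for one cancer site, 2015-2019 (toy numbers).
years = [2015, 2016, 2017, 2018, 2019]
april_counts = [1000, 1020, 1040, 1060, 1080]
expected_2020 = loglinear_expected(years, april_counts, 2020)
observed_2020 = 700  # toy value, not from the study
print(round(oe_ratio(observed_2020, expected_2020), 2))
```

Repeating this per month and per site yields a monthly O/E series. In that series a dip such as the one the authors report for April 2020 would appear as the minimum.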
Affiliation(s)
- Serban Negoita
  - Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, Maryland
- Huann-Sheng Chen
  - Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, Maryland
- Pamela V. Sanchez
  - Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, Maryland
- Recinda L. Sherman
  - North American Association of Central Cancer Registries, Springfield, Illinois
- S. Jane Henley
  - Division of Cancer Prevention and Control, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, Georgia
- Rebecca Siegel
  - Surveillance and Health Services Research, American Cancer Society, Atlanta, Georgia
- Hyuna Sung
  - Surveillance and Health Services Research, American Cancer Society, Atlanta, Georgia
- Susan Scott
  - Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, Maryland
- Vicki B. Benard
  - Division of Cancer Prevention and Control, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, Georgia
- Betsy A. Kohler
  - North American Association of Central Cancer Registries, Springfield, Illinois
- Ahmedin Jemal
  - Surveillance and Health Services Research, American Cancer Society, Atlanta, Georgia
- Kathleen Cronin
  - Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, Maryland

4
Kefeli J, Tatonetti N. Benchmark Pathology Report Text Corpus with Cancer Type Classification. medRxiv 2023:2023.08.03.23293618. [PMID: 37609238 PMCID: PMC10441484 DOI: 10.1101/2023.08.03.23293618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
In cancer research, pathology report text is a largely untapped data source. Pathology reports are routinely generated, more nuanced than structured data, and contain added insight from pathologists. However, there are no publicly available datasets for benchmarking report-based models. Two recent advances suggest the urgent need for a benchmark dataset. First, improved optical character recognition (OCR) techniques will make it possible to access older pathology reports in an automated way, increasing data available for analysis. Second, recent improvements in natural language processing (NLP) techniques using AI allow more accurate prediction of clinical targets from text. We apply state-of-the-art OCR and customized post-processing to publicly available report PDFs from The Cancer Genome Atlas, generating a machine-readable corpus of 9,523 reports. We perform a proof-of-principle cancer-type classification across 32 tissues, achieving 0.992 average AU-ROC. This dataset will be useful to researchers across specialties, including research clinicians, clinical trial investigators, and clinical NLP researchers.
Affiliation(s)
- Jenna Kefeli
  - Department of Systems Biology, Columbia University, New York, New York, 10032, United States
- Nicholas Tatonetti
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, 90048, United States

5
Diab KM, Deng J, Wu Y, Yesha Y, Collado-Mesa F, Nguyen P. Natural Language Processing for Breast Imaging: A Systematic Review. Diagnostics (Basel) 2023; 13:diagnostics13081420. [PMID: 37189521 DOI: 10.3390/diagnostics13081420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 04/05/2023] [Accepted: 04/11/2023] [Indexed: 05/17/2023]
Abstract
Natural Language Processing (NLP) has gained prominence in diagnostic radiology, offering a promising tool for improving breast imaging triage, diagnosis, lesion characterization, and treatment management in breast cancer and other breast diseases. This review provides a comprehensive overview of recent advances in NLP for breast imaging, covering the main techniques and applications in this field. Specifically, we discuss various NLP methods used to extract relevant information from clinical notes, radiology reports, and pathology reports and their potential impact on the accuracy and efficiency of breast imaging. In addition, we review the state of the art in NLP-based decision support systems for breast imaging, highlighting the challenges and opportunities of NLP applications for breast imaging in the future. Overall, this review underscores the potential of NLP in enhancing breast imaging care and offers insights for clinicians and researchers interested in this exciting and rapidly evolving field.
Affiliation(s)
- Kareem Mahmoud Diab
  - Institute for Data Science and Computing, University of Miami, Miami, FL 33146, USA
- Jamie Deng
  - Department of Computer Science, University of Miami, Miami, FL 33146, USA
- Yusen Wu
  - Institute for Data Science and Computing, University of Miami, Miami, FL 33146, USA
- Yelena Yesha
  - Institute for Data Science and Computing, University of Miami, Miami, FL 33146, USA
  - Department of Computer Science, University of Miami, Miami, FL 33146, USA
  - Department of Radiology, Miller School of Medicine, University of Miami, Miami, FL 33146, USA
- Fernando Collado-Mesa
  - Department of Radiology, Miller School of Medicine, University of Miami, Miami, FL 33146, USA
- Phuong Nguyen
  - Institute for Data Science and Computing, University of Miami, Miami, FL 33146, USA
  - Department of Computer Science, University of Miami, Miami, FL 33146, USA
  - OpenKnect Inc., Halethorpe, MD 21227, USA

6
Levy J, Vattikonda N, Haudenschild C, Christensen B, Vaickus L. Comparison of Machine-Learning Algorithms for the Prediction of Current Procedural Terminology (CPT) Codes from Pathology Reports. J Pathol Inform 2022; 13:3. [PMID: 35127232 PMCID: PMC8802304 DOI: 10.4103/jpi.jpi_52_21] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/29/2021] [Revised: 11/20/2021] [Accepted: 11/30/2021] [Indexed: 02/03/2023]
Abstract
BACKGROUND Pathology reports serve as an auditable trail of a patient's clinical narrative, containing text pertaining to diagnosis, prognosis, and specimen processing. Recent works have utilized natural language processing (NLP) pipelines, which include rule-based or machine-learning analytics, to uncover textual patterns that inform clinical endpoints and biomarker information. Although deep learning methods have come to the forefront of NLP, there have been limited comparisons with the performance of other machine-learning methods in extracting key insights for the prediction of medical procedure information, which is used to inform reimbursement for pathology departments. In addition, the utility of combining and ranking information from multiple report subfields as compared with exclusively using the diagnostic field for the prediction of Current Procedural Terminology (CPT) codes and signing pathologists remains unclear. METHODS After preprocessing pathology reports, we utilized advanced topic modeling to identify topics that characterize a cohort of 93,039 pathology reports at the Dartmouth-Hitchcock Department of Pathology and Laboratory Medicine (DPLM). We separately compared XGBoost, SVM, and BERT (Bidirectional Encoder Representations from Transformers) methodologies for the prediction of primary CPT codes (CPT 88302, 88304, 88305, 88307, 88309) as well as 38 ancillary CPT codes, using both the diagnostic text alone and text from all subfields. We performed similar analyses for characterizing text from a group of the 20 pathologists with the most pathology report sign-outs. Finally, we uncovered important report subcomponents by using model explanation techniques. RESULTS We identified 20 topics that pertained to diagnostic and procedural information. Operating on diagnostic text alone, BERT outperformed XGBoost for the prediction of primary CPT codes. When utilizing all report subfields, XGBoost outperformed BERT for the prediction of primary CPT codes.
Utilizing additional subfields of the pathology report increased prediction accuracy across ancillary CPT codes, and performance gains for using additional report subfields were high for the XGBoost model for primary CPT codes. Misclassifications of CPT codes were between codes of a similar complexity, and misclassifications between pathologists were subspecialty related. CONCLUSIONS Our approach generated CPT code predictions with an accuracy that was higher than previously reported. Although diagnostic text is an important source of information, additional insights may be extracted from other report subfields. Although BERT approaches performed comparably to the XGBoost approaches, they may lend valuable information to pipelines that combine image, text, and -omics information. Future resource-saving opportunities exist to help hospitals detect mis-billing, standardize report text, and estimate productivity metrics that pertain to pathologist compensation (relative value units, RVUs).
Affiliation(s)
- Joshua Levy
  - Emerging Diagnostic and Investigative Technologies, Clinical Genomics and Advanced Technologies, Department of Pathology and Laboratory Medicine, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA
  - Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
  - Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
  - Corresponding author at: Emerging Diagnostic and Investigative Technologies, Clinical Genomics and Advanced Technologies, Department of Pathology and Laboratory Medicine, Dartmouth Hitchcock Medical Center, 1 Medical Center Drive, Borwell Building 4th Floor, Lebanon NH 03766, USA.
- Nishitha Vattikonda
  - Thomas Jefferson High School for Science and Technology, Alexandria, VA, USA
- Brock Christensen
  - Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
  - Department of Community and Family Medicine, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Louis Vaickus
  - Emerging Diagnostic and Investigative Technologies, Clinical Genomics and Advanced Technologies, Department of Pathology and Laboratory Medicine, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA

7
Warren JL, Noone AM, Stevens J, Wu XC, Hseih MC, Mumphrey B, Schmidt R, Coyle L, Shields R, Mariotto AB. The Utility of Pathology Reports to Identify Persons With Cancer Recurrence. Med Care 2022; 60:44-49. [PMID: 34812787 PMCID: PMC8720471 DOI: 10.1097/mlr.0000000000001669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
Abstract
BACKGROUND Cancer recurrence is an important measure of the impact of cancer treatment. However, no population-based data on recurrence are available. Pathology reports could potentially identify cancer recurrences, but their utility to capture recurrences is unknown. OBJECTIVE This analysis assesses the sensitivity of pathology reports to identify patients with cancer recurrence and the stage at recurrence. SUBJECTS The study includes patients with recurrent breast (n=214) or colorectal (n=203) cancers. RESEARCH DESIGN This retrospective analysis included patients from a population-based cancer registry who were part of the Patient-Centered Outcomes Research (PCOR) Study, a project that followed cancer patients in depth for 5 years after diagnosis to identify recurrences. MEASURES Information abstracted from pathology reports for patients with recurrence was compared with their PCOR data (gold standard) to determine what percent had a pathology report at the time of recurrence, the sensitivity of text in the report to identify recurrence, and whether the stage at recurrence could be determined from the pathology report. RESULTS Half of the cancer patients had a pathology report near the time of recurrence. For patients with a pathology report, the report's sensitivity to identify recurrence was 98.1% for breast cancer cases and 95.7% for colorectal cancer cases. The specific stage at recurrence from the pathology report showed moderate agreement with gold-standard data. CONCLUSIONS Pathology reports alone cannot measure population-based recurrence of solid cancers but can identify specific cohorts of recurrent cancer patients. As electronic submission of pathology reports increases, these reports may identify specific recurrent patients in near real-time.
Affiliation(s)
- Joan L. Warren
  - National Cancer Institute/Division of Cancer Control and Population Science, Bethesda, Maryland 20892
- Anne-Michelle Noone
  - National Cancer Institute/Division of Cancer Control and Population Science, Bethesda, Maryland 20892
- Xiao-Cheng Wu
  - Louisiana Tumor Registry, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112
- Mei-chin Hseih
  - Louisiana Tumor Registry, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112
- Brent Mumphrey
  - Louisiana Tumor Registry, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112
- Linda Coyle
  - Information Management Services, Calverton, Maryland 20705
- Rusty Shields
  - Information Management Services, Calverton, Maryland 20705
- Angela B. Mariotto
  - National Cancer Institute/Division of Cancer Control and Population Science, Bethesda, Maryland 20892

8
Robertson S. A Novel Web Application for Rapidly Searching the Diagnostic Case Archive. J Pathol Inform 2020; 11:39. [PMID: 33828897 PMCID: PMC8020840 DOI: 10.4103/jpi.jpi_43_20] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/14/2020] [Revised: 06/24/2020] [Accepted: 08/31/2020] [Indexed: 01/12/2023]
Abstract
Academic pathologists must have the ability to search their institution's archive of diagnostic case data. This ability is foundational for research, education, and other academic activities. However, the built-in search functions of commercial laboratory information systems are not always optimized for this activity, leading to delays between an initial search request and eventual results delivery. To solve this problem, a novel web-based search platform named Pathtools was developed, which allows our staff and trainees to directly and rapidly search our diagnostic case archive. Pathtools was built with open-source components and features a web-based user interface. Pathtools uses an SQL database that was populated with anatomic pathology case data going back to 1980 and contains 4.2 million cases (as of July 31, 2020). Pathtools has two major modes of operation, “Preview Mode” and “Research Mode.” Since deployment in February 2019, Pathtools has carried out 33,817 searches in Preview Mode, averaging 0.72 s (standard deviation = 1.7) between search submission and on-screen display of search results. In Research Mode, Pathtools has also been used to produce data sets for research activity, providing the data used in many abstracts and manuscripts our investigators submitted recently. Interestingly, 75% of search activity is from trainees during their preview time. In a survey of residents and fellows, 83% used Pathtools during the majority of their preview sessions, demonstrating an important role for this resource in trainee education. In conclusion, a web-based search tool can rapidly and securely provide search capability directly to end-users, which has augmented trainee education and research activity in our department.
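The abstract describes Pathtools as an SQL database of anatomic pathology cases behind a web interface, but it does not give the schema or queries. This sketch illustrates only the core lookup, using Python's built-in `sqlite3` with a date-bounded text search that passes user input as query parameters. The `cases` table, its columns, the helper names, and the sample accessions are all hypothetical, invented for the example. They are not Pathtools' design.

```python
import sqlite3
import time


def create_archive(conn):
    """Create a minimal, hypothetical case table (not Pathtools' actual schema)."""
    conn.execute(
        "CREATE TABLE cases ("
        " accession TEXT PRIMARY KEY,"
        " signout_date TEXT NOT NULL,"      # ISO 8601 dates compare correctly as text
        " final_diagnosis TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_cases_signout ON cases (signout_date)")


def search_cases(conn, term, start=None, end=None):
    """Return matching cases (newest first) and the query time in seconds.

    The search term and dates are bound as parameters, never spliced into SQL.
    """
    sql = ("SELECT accession, signout_date, final_diagnosis FROM cases "
           "WHERE final_diagnosis LIKE ?")
    params = [f"%{term}%"]
    if start is not None:
        sql += " AND signout_date >= ?"
        params.append(start)
    if end is not None:
        sql += " AND signout_date <= ?"
        params.append(end)
    sql += " ORDER BY signout_date DESC"
    t0 = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - t0


conn = sqlite3.connect(":memory:")
create_archive(conn)
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?)",
    [
        ("S19-0001", "2019-02-11", "Colon, biopsy: tubular adenoma"),
        ("S19-0002", "2019-03-04", "Rectum, resection: invasive adenocarcinoma"),
        ("S20-0003", "2020-07-30", "Lung, wedge: Adenocarcinoma, acinar predominant"),
    ],
)
# SQLite's LIKE is case-insensitive for ASCII by default, so both adenocarcinomas match.
rows, seconds = search_cases(conn, "adenocarcinoma", start="2019-01-01")
print([r[0] for r in rows], f"{seconds:.6f} s")
```

A `LIKE '%term%'` predicate cannot use an ordinary index, so it scans every row. With millions of cases, a full-text index would be the more likely route to sub-second searches, for example SQLite's FTS5 extension where the build includes it.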
Affiliation(s)
- Scott Robertson
  - Department of Anatomic Pathology, Cleveland Clinic, Cleveland, OH, USA

9
Short B. Selected Aspects of Ocular Toxicity Studies With a Focus on High-Quality Pathology Reports: A Pathology/Toxicology Consultant's Perspective. Toxicol Pathol 2020; 49:673-699. [PMID: 32815474 DOI: 10.1177/0192623320946712] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/26/2022]
Abstract
Ocular toxicity studies are the bedrock of nonclinical ocular drug and drug-device development, and there has been an evolution in experience, technologies, and challenges to address that ensures safe clinical trials and marketing authorization. The expectations of a well-designed ocular toxicity study and the generation of a coherent, integrative ocular toxicology report and subreports are high, and this article provides a pathology/toxicology consultant's perspective on achieving that goal. The first objective is to cover selected aspects of study designs for ocular toxicity studies including considerations for contract research organization selection, minipig species selection, unilateral versus bilateral dosing, and in-life parameters based on fit-for-purpose study objectives. The main objective is a focus on a high-quality ocular pathology report that includes ocular histology procedures to meet regulatory expectations and a report narrative and tables that correlate microscopic findings with key ophthalmic findings and presents a clear interpretation of test article-, vehicle-, and procedure-related ocular and extraocular findings with identification of adversity and a pathology peer review. The last objective covers considerations for a high-quality ophthalmology report, which in concert with a high-quality pathology report, will pave the way for a best quality toxicology report for an ocular toxicity study.
Affiliation(s)
- Brian Short
  - Brian Short Consulting, LLC, Laguna Beach, CA, USA

10
Alawad M, Gao S, Qiu J, Schaefferkoetter N, Hinkle JD, Yoon HJ, Christian JB, Wu XC, Durbin EB, Jeong JC, Hands I, Rust D, Tourassi G. Deep Transfer Learning Across Cancer Registries for Information Extraction from Pathology Reports. IEEE EMBS Int Conf Biomed Health Inform 2019; 2019:10.1109/bhi.2019.8834586. [PMID: 36081613 PMCID: PMC9450101 DOI: 10.1109/bhi.2019.8834586] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 05/30/2023]
Abstract
Automated text information extraction from cancer pathology reports is an active area of research to support national cancer surveillance. A well-known challenge is how to develop information extraction tools with robust performance across cancer registries. In this study we investigated whether transfer learning (TL) with a convolutional neural network (CNN) can facilitate cross-registry knowledge sharing. Specifically, we performed a series of experiments to determine whether a CNN trained with single-registry data is capable of transferring knowledge to another registry or whether developing a cross-registry knowledge database produces a more effective and generalizable model. Using data from two cancer registries and primary tumor site and topography as the information extraction task of interest, our study showed that TL results in 6.90% and 17.22% improvement of classification macro F-score over the baseline single-registry models. Detailed analysis illustrated that the observed improvement is evident in the low prevalence classes.
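The abstract reports gains in macro F-score and notes that the improvement shows up in low-prevalence classes. That link follows from how macro averaging works. F1 is computed per class and the class scores are averaged with equal weight, so a rare class that is never predicted lowers the score as much as a common class would. The minimal pure-Python sketch below shows the effect; the function name and the two-class toy labels are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be the same length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0   # 0 when the class is never predicted
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy data: a classifier that always predicts the majority site.
y_true = ["common"] * 9 + ["rare"]
y_pred = ["common"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, round(macro_f1(y_true, y_pred), 2))  # 0.9 0.47
```

Accuracy stays at 0.9 while macro F1 falls to about 0.47. This is why better handling of rare tumor sites can move a macro score substantially.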
Collapse
Affiliation(s)
- Mohammed Alawad
- Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Shang Gao
- Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- John Qiu
- Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Noah Schaefferkoetter
- Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Jacob D Hinkle
- Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Hong-Jun Yoon
- Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- J Blair Christian
- Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Xiao-Cheng Wu
- Louisiana Tumor Registry, Louisiana State University Health Sciences Center, New Orleans, LA, USA
- Eric B Durbin
- Kentucky Cancer Registry, University of Kentucky, Lexington, KY, USA
- Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, University of Kentucky, Lexington, KY, USA
- Jong Cheol Jeong
- Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, University of Kentucky, Lexington, KY, USA
- Isaac Hands
- Kentucky Cancer Registry, University of Kentucky, Lexington, KY, USA
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, University of Kentucky, Lexington, KY, USA
- David Rust
- Kentucky Cancer Registry, University of Kentucky, Lexington, KY, USA
- Georgia Tourassi
- Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, TN, USA
11
Musselman RP, Rothwell D, Auer RC, Moloo H, Boushey RP, van Walraven C. Can Text-Search Methods of Pathology Reports Accurately Identify Patients with Rectal Cancer in Large Administrative Databases? J Pathol Inform 2018; 9:18. [PMID: 29862128 PMCID: PMC5952547 DOI: 10.4103/jpi.jpi_71_17] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/02/2017] [Accepted: 02/26/2018] [Indexed: 01/05/2023]
Abstract
Background: The aim of this study is to derive and to validate a cohort of rectal cancer surgical patients within administrative datasets using text-search analysis of pathology reports. Materials and Methods: A text-search algorithm was developed and validated on pathology reports from 694 known rectal cancers, 1000 known colon cancers, and 1000 noncolorectal specimens. The algorithm was applied to all pathology reports available within the Ottawa Hospital Data Warehouse from 1996 to 2010. Identified pathology reports were validated as rectal cancer specimens through manual chart review. Sensitivity, specificity, and positive predictive value (PPV) of the text-search methodology were calculated. Results: In the derivation cohort of pathology reports (n = 2694), the text-search algorithm had a sensitivity and specificity of 100% and 98.6%, respectively. When this algorithm was applied to all pathology reports from 1996 to 2010 (n = 284,032), 5588 pathology reports were identified as consistent with rectal cancer. Medical record review determined that 4550 patients did not have rectal cancer, leaving a final cohort of 1038 rectal cancer patients. Sensitivity and specificity of the text-search algorithm were 100% and 98.4%, respectively. PPV of the algorithm was 18.6%. Conclusions: Text-search methodology is a feasible way to identify all rectal cancer surgery patients through administrative datasets with high sensitivity and specificity. However, in the presence of a low pretest probability, text-search methods must be combined with a validation method, such as manual chart review, to be a viable approach.
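The sensitivity, specificity, and PPV figures in the validation results can be reproduced from the counts in the abstract. The sketch below assumes each pathology report stands for one patient (the abstract mixes report and patient counts) and sets false negatives to zero, as implied by the reported 100% sensitivity. `ConfusionCounts` and `ppv_from_prevalence` are illustrative names, not part of the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # flagged by the text search and confirmed as rectal cancer
    fp: int  # flagged but ruled out on chart review
    fn: int  # rectal cancers the search missed
    tn: int  # not flagged and not rectal cancer

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)


def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: P(disease | positive result) from test properties and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


TOTAL_REPORTS = 284_032
FLAGGED = 5_588
CONFIRMED = 1_038

counts = ConfusionCounts(
    tp=CONFIRMED,
    fp=FLAGGED - CONFIRMED,      # 4,550 ruled out on chart review
    fn=0,                        # assumption implied by the reported 100% sensitivity
    tn=TOTAL_REPORTS - FLAGGED,  # everything not flagged, given fn = 0
)
print(f"sensitivity={counts.sensitivity:.1%} specificity={counts.specificity:.1%} PPV={counts.ppv:.1%}")

prevalence = CONFIRMED / TOTAL_REPORTS
print(f"prevalence={prevalence:.2%} PPV via Bayes={ppv_from_prevalence(1.0, 0.984, prevalence):.1%}")
```

The Bayes form shows why PPV stays low despite a specificity of about 98%: with a prevalence of roughly 0.37%, false positives drawn from the large non-cancer pool outnumber the true positives, which is the low-pretest-probability problem the conclusions describe.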
Affiliation(s)
- Deanna Rothwell
- Department of Epidemiology and Community Medicine, Ottawa Hospital Research Institute, Ottawa, ON, Canada
- Rebecca C Auer
- Division of General Surgery, University of Ottawa, Ottawa, ON, Canada
- Husein Moloo
- Division of General Surgery, University of Ottawa, Ottawa, ON, Canada
- Robin P Boushey
- Division of General Surgery, University of Ottawa, Ottawa, ON, Canada
- Carl van Walraven
- Department of Epidemiology and Community Medicine, Ottawa Hospital Research Institute, Ottawa, ON, Canada
12
Xie F, Lee J, Munoz-Plaza CE, Hahn EE, Chen W. Application of Text Information Extraction System for Real-Time Cancer Case Identification in an Integrated Healthcare Organization. J Pathol Inform 2017; 8:48. [PMID: 29416911 PMCID: PMC5760847 DOI: 10.4103/jpi.jpi_55_17] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 07/28/2017] [Accepted: 11/01/2017] [Indexed: 12/29/2022]
Abstract
Background: Surgical pathology reports (SPRs) contain rich clinical diagnosis information. The text information extraction system (TIES) is an end-to-end application that leverages natural language processing technologies and focuses on processing pathology and/or radiology reports. Methods: We deployed the TIES system at Kaiser Permanente Southern California and integrated SPRs into it on a daily basis. The breast cancer cases diagnosed in December 2013 from the Cancer Registry (CANREG) were used to validate the performance of the TIES system. The National Cancer Institute Metathesaurus (NCIM) concept terms and codes describing breast cancer were identified through the Unified Medical Language System Terminology Service (UTS) application. The identified NCIM codes were used to search the coded SPRs directly in the back-end datastore. The identified cases were then compared with the breast cancer patients pulled from CANREG. Results: A total of 437 breast cancer concept terms and 14 combinations of “breast” and “cancer” terms were identified from the UTS application. A total of 249 breast cancer cases diagnosed in December 2013 were pulled from CANREG. Of these 249 cases, 241 were successfully identified by the TIES system from a total of 457 reports. The TIES system also identified an additional 277 cases that were not part of the validation sample. Of these 277 cases, 11% were judged highly likely to be cases after manual examination, and 86% were in CANREG but had been diagnosed in months other than December 2013. Conclusions: The study demonstrated that the TIES system can effectively identify potential breast cancer cases in our care setting. Identified potential cases can be easily confirmed by reviewing the corresponding annotated reports through the front-end visualization interface. The TIES system is a useful tool for identifying various potential cancer cases in a timely manner and on a regular basis in support of clinical research studies.
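The search step described in the Methods (select reports annotated with any breast cancer concept code, then compare the resulting cases with the registry) reduces to set operations. This is a minimal sketch assuming reports have already been annotated with concept codes. The code strings and patient IDs are hypothetical placeholders, not real NCIM identifiers, and the actual TIES query interface is not shown.

```python
from typing import Dict, Iterable, Set, Tuple


def candidate_patients(
    reports: Iterable[Tuple[str, Set[str]]], concept_codes: Set[str]
) -> Set[str]:
    """Patients with at least one report annotated with any target concept code."""
    return {patient for patient, codes in reports if codes & concept_codes}


def compare_to_registry(candidates: Set[str], registry: Set[str]) -> Dict[str, Set[str]]:
    """Split candidates into registry matches, registry misses, and extras to review."""
    return {
        "matched": candidates & registry,
        "missed": registry - candidates,
        "extra": candidates - registry,
    }


# Hypothetical concept codes and annotated reports (placeholders, not NCIM codes).
BREAST_CANCER_CODES = {"CODE_BREAST_CARCINOMA", "CODE_DUCTAL_CARCINOMA"}
reports = [
    ("p1", {"CODE_BREAST_CARCINOMA"}),
    ("p2", {"CODE_DUCTAL_CARCINOMA", "CODE_LYMPH_NODE"}),
    ("p3", {"CODE_COLON_ADENOCARCINOMA"}),
    ("p4", {"CODE_BREAST_CARCINOMA"}),
]
registry = {"p1", "p2", "p5"}  # hypothetical registry cases for the validation month

candidates = candidate_patients(reports, BREAST_CANCER_CODES)
result = compare_to_registry(candidates, registry)
for key in ("matched", "missed", "extra"):
    print(key, sorted(result[key]))

# Same matched-over-registry ratio applied to the counts reported in the abstract.
print(f"registry cases found: {241 / 249:.1%}")
```

Applied to the abstract's counts, the matched-over-registry ratio is 241/249, or about 96.8%; the "extra" set plays the role of the 277 additional cases outside the December 2013 validation sample, which required further checking.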
Affiliation(s)
- Fagen Xie
- Department of Research and Evaluation, Kaiser Permanente Southern California Medical Group, Pasadena, CA, USA
- Janet Lee
- Department of Research and Evaluation, Kaiser Permanente Southern California Medical Group, Pasadena, CA, USA
- Corrine E Munoz-Plaza
- Department of Research and Evaluation, Kaiser Permanente Southern California Medical Group, Pasadena, CA, USA
- Erin E Hahn
- Department of Research and Evaluation, Kaiser Permanente Southern California Medical Group, Pasadena, CA, USA
- Wansu Chen
- Department of Research and Evaluation, Kaiser Permanente Southern California Medical Group, Pasadena, CA, USA
13
Krigbaum NY, Rubin RA, Cirillo PM, Terry MB, Habel LA, Morris C, Cohn BA. Feasibility of collecting tumor samples of breast cancer patients diagnosed up to 50 years ago in the Child Health and Development Studies. J Dev Orig Health Dis 2017; 8:331-336. [PMID: 28260556 PMCID: PMC7089678 DOI: 10.1017/s204017441700006x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/07/2023]
Abstract
Environmental exposures during pregnancy may increase breast cancer risk for mothers and female offspring. Tumor tissue assays may provide insight into the mechanisms. This study assessed the feasibility of obtaining tumor samples and pathology reports from mothers (F0) who were enrolled in the Child Health and Development Studies during pregnancy from 1959 to 1967 and from their daughters (F1) who developed breast cancer over more than 50 years of follow-up. Breast cancer cases were identified through linkage to the California Cancer Registry and self-report. Written consent was obtained from 116 F0 and 95 F1 breast cancer survivors to access their pathology reports and tumor blocks. Of those contacted, 62% consented, 13% refused and 24% did not respond. We obtained tissue samples for 57% and pathology reports for 75%; when the diagnosis had been made ⩽10 years ago, we obtained tissue samples and pathology reports for 91% and 79%, respectively. Obtaining pathology reports and tumor tissues from two generations is feasible and will support investigation of the relationship between early-life exposures and molecular tumor markers. However, we found that a more recent diagnosis increased the accessibility of tumor tissue. We recommend that cohorts request consent for obtaining future tumor tissues at study enrollment and implement real-time tissue collection to improve the success of collecting tumor samples and data.
Affiliation(s)
- N. Y. Krigbaum
- Child Health and Development Studies, Public Health Institute, Oakland, CA, USA
- R. A. Rubin
- Child Health and Development Studies, Public Health Institute, Oakland, CA, USA
- P. M. Cirillo
- Child Health and Development Studies, Public Health Institute, Oakland, CA, USA
- M. B. Terry
- Columbia University, Mailman School of Public Health, New York, NY, USA
- L. A. Habel
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
- C. Morris
- California Cancer Reporting and Epidemiologic Surveillance Program, Institute for Population Health Improvement, UCD Health System, Sacramento, CA, USA
- B. A. Cohn
- Child Health and Development Studies, Public Health Institute, Oakland, CA, USA
14
Zheng S, Lu JJ, Appin C, Brat D, Wang F. Support patient search on pathology reports with interactive online learning based data extraction. J Pathol Inform 2015; 6:51. [PMID: 26605116 PMCID: PMC4629306 DOI: 10.4103/2153-3539.166012] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/03/2015] [Accepted: 06/06/2015] [Indexed: 11/09/2022]
Abstract
Background: Structured reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entry, information in pathology reports remains primarily in narrative free-text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often built on a fixed training dataset and are not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system that can adjust and self-improve by learning from a user's interactions with minimal human effort. Methods: We have developed an online machine learning-based information extraction system called IDEAL-X. Through its graphical user interface, the system's data extraction engine automatically annotates values for users to review as each report text is loaded. The system analyzes users' corrections to these annotations with online machine learning and incrementally refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist data extraction. As the accuracy of automatic annotation improves over time, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be used to conveniently define queries based on the extracted structured data. Results: We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographic data, diagnosis, genetic markers, and procedures. The system achieves F1 scores of around 95% for the majority of tests. Conclusions: Extracting data from pathology reports could enable more accurate knowledge to support biomedical research and clinical diagnosis. IDEAL-X provides a bridge that combines online machine learning-based data extraction with knowledge from human feedback. By combining iterative online learning and adaptive controlled vocabularies, IDEAL-X can deliver highly adaptive and accurate data extraction to support patient search.
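The feedback loop described in the Methods (suggest a value, let the user correct it, learn from the correction) can be illustrated with a deliberately simple stand-in: a controlled vocabulary that grows whenever a reviewer overrides a suggestion. This is a hypothetical sketch, not the IDEAL-X online learning algorithm; the class name, the longest-match rule, and the sample report snippets are all assumptions for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class AdaptiveVocabularyExtractor:
    """Hypothetical extractor: suggests a vocabulary term and grows it from corrections."""

    vocabulary: Set[str] = field(default_factory=set)
    corrections: int = 0

    def suggest(self, text: str) -> Optional[str]:
        lowered = text.lower()
        hits = [
            term
            for term in self.vocabulary
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)
        ]
        # Prefer the longest match so "invasive ductal carcinoma" beats "carcinoma".
        return max(hits, key=len) if hits else None

    def review(self, text: str, reviewer_value: str) -> Optional[str]:
        """Return the pre-filled suggestion; learn the reviewer's value if it differs."""
        suggestion = self.suggest(text)
        accepted = reviewer_value.lower()
        if suggestion != accepted:
            self.corrections += 1
            self.vocabulary.add(accepted)
        return suggestion


# Hypothetical report snippets paired with the value a reviewer would confirm.
reports = [
    ("Final diagnosis: glioblastoma.", "glioblastoma"),
    ("Diagnosis: oligodendroglioma.", "oligodendroglioma"),
    ("Consistent with glioblastoma.", "glioblastoma"),
    ("Oligodendroglioma, IDH-mutant.", "oligodendroglioma"),
]

extractor = AdaptiveVocabularyExtractor()
suggestions = [extractor.review(text, value) for text, value in reports]
print(suggestions)
print("manual corrections:", extractor.corrections)
```

Only the first occurrence of each diagnosis needs a manual correction here; later reports are pre-filled correctly, which mirrors in miniature the claim that annotation effort falls as processing proceeds.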
Affiliation(s)
- Shuai Zheng
- Department of Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA
- James J Lu
- Department of Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA
- Christina Appin
- Department of Pathology and Laboratory Medicine, School of Medicine, Emory University, Atlanta, GA 30322, USA
- Daniel Brat
- Department of Pathology and Laboratory Medicine, School of Medicine, Emory University, Atlanta, GA 30322, USA
- Fusheng Wang
- Department of Biomedical Informatics and Computer Science, Stony Brook University, Stony Brook, NY 11794, USA