1
Penberthy L, Friedman S. The SEER Program's evolution: supporting clinically meaningful population-level research. J Natl Cancer Inst Monogr 2024; 2024:110-117. [PMID: 39102886 DOI: 10.1093/jncimonographs/lgae022] [Received: 01/23/2024] [Revised: 04/06/2024] [Accepted: 04/17/2024] [Indexed: 08/07/2024]
Abstract
Although the Surveillance, Epidemiology, and End Results (SEER) Program has maintained high standards of quality and completeness, the traditional data captured through population-based cancer surveillance are no longer sufficient to understand the impact of cancer and its outcomes. Therefore, in recent years, the SEER Program has expanded the population it covers and enhanced the types of data being collected. Traditionally, surveillance systems collected data characterizing the patient and their cancer at the time of diagnosis, as well as limited information on the initial course of therapy. SEER performs active follow-up on cancer patients from diagnosis until death, ascertaining critical information on mortality and survival over time. With the growth of precision oncology and the rapid development and dissemination of new diagnostics and treatments, the limited data that registries have traditionally captured around the time of diagnosis, although useful for characterizing the cancer, are insufficient for understanding why similar patients may have different outcomes. The molecular composition of the tumor and genetic factors such as BRCA status affect the patient's treatment response and outcomes. Capturing and stratifying by these critical risk factors are essential if we are to understand differences in outcomes among patients who may be demographically similar, have the same cancer, be diagnosed at the same stage, and receive the same treatment. In addition to the tumor characteristics, it is essential to understand all the therapies that a patient receives over time, not only during the initial treatment period but also if the cancer recurs or progresses. Capturing this subsequent therapy is critical not only for research but also to help patients understand their risk at the time of therapeutic decision making.
This article serves as an introduction and foundation for a JNCI Monograph with specific articles focusing on innovative new methods and processes implemented or under development for the SEER Program. The following sections describe the need to evaluate the SEER Program and provide a summary or introduction of those key enhancements that have been or are in the process of being implemented for SEER.
Affiliation(s)
- Lynne Penberthy
- Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, USA
- Steven Friedman
- Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, USA
2
Peluso A, Danciu I, Yoon HJ, Yusof JM, Bhattacharya T, Spannaus A, Schaefferkoetter N, Durbin EB, Wu XC, Stroup A, Doherty J, Schwartz S, Wiggins C, Coyle L, Penberthy L, Tourassi GD, Gao S. Deep learning uncertainty quantification for clinical text classification. J Biomed Inform 2024; 149:104576. [PMID: 38101690 PMCID: PMC11467893 DOI: 10.1016/j.jbi.2023.104576] [Received: 04/22/2023] [Revised: 12/06/2023] [Accepted: 12/10/2023] [Indexed: 12/17/2023]
Abstract
INTRODUCTION Machine learning algorithms are expected to work side-by-side with humans in decision-making pipelines, so the ability of classifiers to make reliable decisions is of paramount importance. Deep neural networks (DNNs) are the state-of-the-art models for real-world classification. Although the strength of activation in DNNs is often correlated with the network's confidence, in-depth analyses are needed to establish whether they are well calibrated. METHOD In this paper, we demonstrate the use of DNN-based classification tools to benefit cancer registries by automating the extraction of information on disease at diagnosis and at surgery from electronic text pathology reports from the US National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) population-based cancer registries. In particular, we introduce multiple methods for selective classification that achieve a target level of accuracy on multiple classification tasks while minimizing the rejection amount, that is, the number of electronic pathology reports for which the model's predictions are unreliable. We evaluate the proposed methods by comparing them with the current in-house deep learning-based abstaining classifier. RESULTS Overall, all the proposed selective classification methods achieve the targeted level of accuracy or higher in a trade-off analysis aimed at minimizing the rejection rate. On in-distribution validation and holdout test data, all the proposed methods reach the required target accuracy on all tasks with a lower rejection rate than the deep abstaining classifier (DAC). The results on the out-of-distribution test data are more complex to interpret; nevertheless, here too the best of the proposed methods achieving 97% accuracy or higher has a lower rejection rate than the DAC.
CONCLUSIONS We show that although both approaches can flag the samples that should be manually reviewed and labeled by human annotators, the newly proposed methods retain a larger fraction of samples and do so without retraining, thus offering a reduced computational cost compared with the in-house deep learning-based abstaining classifier.
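The selective-classification idea in this abstract (reject low-confidence reports so that the accepted ones meet a target accuracy, while rejecting as few as possible) can be illustrated with a minimal sketch. It assumes a generic confidence-threshold rule (for example, on the maximum softmax probability) tuned on validation data; it is not the authors' method, and the helper name select_threshold and the example values are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def select_threshold(
    confidences: Sequence[float],
    correct: Sequence[bool],
    target_accuracy: float,
) -> Optional[Tuple[float, float, float]]:
    """Pick the lowest confidence threshold whose accepted predictions reach
    target_accuracy on validation data (hypothetical helper).

    Predictions with confidence >= threshold are accepted; the rest are
    rejected and would be sent for manual review. Returns
    (threshold, accuracy_on_accepted, rejection_rate), or None if no
    threshold reaches the target.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")

    # Most confident predictions first; accepting the top k is one candidate cut.
    pairs = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    n = len(pairs)
    best: Optional[Tuple[float, float, float]] = None
    n_correct = 0
    for k, (conf, ok) in enumerate(pairs, start=1):
        n_correct += int(ok)
        # Only cut between distinct confidence values, so tied reports are
        # accepted or rejected together.
        if k < n and pairs[k][0] == conf:
            continue
        accuracy = n_correct / k
        if accuracy >= target_accuracy:
            # A larger k means fewer rejections, so keep the latest valid cut.
            best = (conf, accuracy, 1 - k / n)
    return best
```

In this sketch the chosen threshold would then be applied to new reports, with anything below it routed to a human annotator; because the rule only needs model outputs, no retraining is involved, which mirrors the cost argument made in the conclusions.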
Affiliation(s)
- Alina Peluso
- Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States
- Ioana Danciu
- Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States
- Hong-Jun Yoon
- Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States
- Adam Spannaus
- Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States
- Eric B Durbin
- University of Kentucky, Lexington, KY 40536, United States
- Xiao-Cheng Wu
- Louisiana State University, New Orleans, LA 70112, United States
- Antoinette Stroup
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, United States
- Stephen Schwartz
- Fred Hutchinson Cancer Research Center, Seattle, WA 98109, United States
- Charles Wiggins
- University of New Mexico, Albuquerque, NM 87131, United States
- Linda Coyle
- Information Management Services Inc., Calverton, MD 20705, United States
- Lynne Penberthy
- National Cancer Institute, Bethesda, MD 20814, United States
- Shang Gao
- Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States
3
Hochheiser H, Finan S, Yuan Z, Durbin EB, Jeong JC, Hands I, Rust D, Kavuluru R, Wu XC, Warner JL, Savova G. DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction. medRxiv: The Preprint Server for Health Sciences 2023:2023.05.05.23289524. [PMID: 37205575 PMCID: PMC10187451 DOI: 10.1101/2023.05.05.23289524] [Indexed: 05/21/2023]
Abstract
Objective The manual extraction of case details from patient records for cancer surveillance efforts is a resource-intensive task. Natural Language Processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) for integration into cancer registry data abstraction tools in a computer-assisted abstraction setting. Methods We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was done through NLP methods validated using established workflows. A container-based implementation including the NLP was developed. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools. Results API calls support submission of single documents and summarization of cases across multiple documents. The container-based implementation uses a REST router to handle requests and support a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade at 0.79-1.00 F1 across common and rare cancer types (breast, prostate, lung, colorectal, ovary, and pediatric brain) on data from two cancer registries. Usability study participants were able to use the tool effectively and expressed interest in adopting the tool. Discussion Our DeepPhe-CR system provides a flexible architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improving user interactions in client tools may be needed to realize the potential of these approaches. DeepPhe-CR: https://deepphe.github.io/.
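Summarizing a case across multiple documents can be pictured with a deliberately simple sketch: each document yields coded values for attributes such as topography or laterality, and the case summary keeps the most frequent value per attribute. This majority vote is an assumption for illustration only, not DeepPhe-CR's actual summarization logic; summarize_case, DocumentExtraction, and the example codes are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical per-document output: attribute name -> coded value.
DocumentExtraction = Dict[str, str]


def summarize_case(documents: List[DocumentExtraction]) -> Dict[str, str]:
    """Combine per-document extractions into one case-level summary by
    majority vote per attribute (an illustrative assumption)."""
    votes: Dict[str, Counter] = {}
    for doc in documents:
        for attribute, value in doc.items():
            votes.setdefault(attribute, Counter())[value] += 1
    # Counter.most_common orders equal counts by first occurrence, so ties
    # go to the value seen in the earliest document.
    return {attr: counts.most_common(1)[0][0] for attr, counts in votes.items()}


case = summarize_case([
    {"topography": "C50.9", "laterality": "Left"},
    {"topography": "C50.4", "laterality": "Left"},
    {"topography": "C50.4"},
])
print(case)  # {'topography': 'C50.4', 'laterality': 'Left'}
```

A real registrar-facing system would need richer reconciliation (for example, weighting report types or surfacing conflicts for review) than a plain vote; the sketch only shows the shape of a document-to-case aggregation step.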
Affiliation(s)
- Harry Hochheiser
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Sean Finan
- Boston Children's Hospital, Boston, MA, USA and Harvard Medical School, Boston, MA, USA
- Zhou Yuan
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Eric B Durbin
- Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY, USA
- Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA
- Jong Cheol Jeong
- Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA
- Isaac Hands
- Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY, USA
- Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA
- David Rust
- Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY, USA
- Ramakanth Kavuluru
- Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA
- Jeremy L Warner
- Lifespan Health System, Providence, RI, USA
- Legorreta Cancer Center at Brown University, Providence, RI, USA
- Guergana Savova
- Boston Children's Hospital, Boston, MA, USA and Harvard Medical School, Boston, MA, USA
4
Hochheiser H, Finan S, Yuan Z, Durbin EB, Jeong JC, Hands I, Rust D, Kavuluru R, Wu XC, Warner JL, Savova G. DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction. JCO Clin Cancer Inform 2023; 7:e2300156. [PMID: 38113411 PMCID: PMC10752457 DOI: 10.1200/cci.23.00156] [Received: 08/18/2023] [Revised: 10/04/2023] [Accepted: 10/04/2023] [Indexed: 12/21/2023]
Abstract
PURPOSE Manual extraction of case details from patient records for cancer surveillance is a resource-intensive task. Natural Language Processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) for integration into cancer registry data abstraction tools in a computer-assisted abstraction setting. METHODS We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was performed through NLP methods validated using established workflows. A container-based implementation of the NLP methods and the supporting infrastructure was developed. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools. RESULTS API calls support submission of single documents and summarization of cases across one or more documents. The container-based implementation uses a REST router to handle requests and support a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade at 0.79-1.00 F1 across multiple cancer types (breast, prostate, lung, colorectal, ovary, and pediatric brain) from data of two population-based cancer registries. Usability study participants were able to use the tool effectively and expressed interest in the tool. CONCLUSION The DeepPhe-CR system provides an architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improved user interactions in client tools may be needed to realize the potential of these approaches.
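The reported 0.79-1.00 F1 range refers to the harmonic mean of precision and recall. One common way to compute it for extracted values is to compare exact-match (document, attribute, value) triples against a gold standard, as sketched here; the paper's exact scoring protocol may differ, and the names and example values are hypothetical.

```python
from typing import Set, Tuple

# (document_id, attribute, value); names and values are illustrative.
Triple = Tuple[str, str, str]


def f1_score(predicted: Set[Triple], gold: Set[Triple]) -> float:
    """Micro-averaged F1 over exact-match (document, attribute, value) triples."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0  # also covers empty predicted or gold sets
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {("d1", "grade", "2"), ("d1", "laterality", "Left"), ("d2", "grade", "3")}
predicted = {("d1", "grade", "2"), ("d1", "laterality", "Right"), ("d2", "grade", "3")}
print(round(f1_score(predicted, gold), 3))  # 0.667
```

Here one wrong laterality value counts both as a false positive (the wrong triple) and a false negative (the missed gold triple), which is why precision and recall are both 2/3.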
Affiliation(s)
- Harry Hochheiser
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA
- Sean Finan
- Boston Children's Hospital, Boston, MA
- Harvard Medical School, Boston, MA
- Zhou Yuan
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
- Eric B. Durbin
- Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY
- Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY
- Jong Cheol Jeong
- Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY
- Isaac Hands
- Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY
- Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY
- David Rust
- Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY
- Ramakanth Kavuluru
- Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY
- Jeremy L. Warner
- Lifespan Health System, Providence, RI
- Legorreta Cancer Center at Brown University, Providence, RI
- Guergana Savova
- Boston Children's Hospital, Boston, MA
- Harvard Medical School, Boston, MA