1
McGonagle K, Dematt EJ, Mi Z, Biswas K, Schroeck FR. Non-Muscle Invasive Bladder Cancer: Many More Patients Die With It Than Of It. Bladder Cancer 2024; 10:113-117. [PMID: 39131873 PMCID: PMC11308635 DOI: 10.3233/blc-230099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 04/06/2024] [Indexed: 08/13/2024]
Abstract
BACKGROUND The National Cancer Institute SEER Program regularly publishes bladder cancer-specific survival statistics. However, these data cover all bladder cancers, and information for non-muscle invasive bladder cancer (NMIBC) is difficult to obtain. OBJECTIVE To quantify 5-year overall and bladder cancer-specific survival in a cohort of Department of Veterans Affairs (VA) patients diagnosed with NMIBC. METHODS We identified VA patients diagnosed with NMIBC who underwent a transurethral resection from 2003-2013. The patient demographics and Charlson Comorbidity Index were categorized. We acquired the patients' date of death from the Veterans Health Administration's Death Ascertainment File and their cause of death from the Mortality Data Repository. We calculated Kaplan-Meier estimates of survival. RESULTS A total of 27,008 patients were included; median age was 69 and almost all were male (99%). The median comorbidity score was 4. The most prevalent comorbidity indicators included chronic pulmonary disease (48%), cancer other than bladder (41%), and diabetes (40%). This cohort was found to have a 5-year overall survival of 68% (99% CI 67%-69%) and a 5-year bladder cancer-specific survival of 93% (99% CI 92%-94%). CONCLUSIONS The 5-year bladder cancer-specific survival in patients diagnosed with non-muscle invasive bladder cancer is substantially higher than the 5-year overall survival. This difference may be related to the severity and number of comorbidities that patients in this population must manage. This warrants further research into the necessity of currently recommended high-intensity cancer surveillance for individuals with NMIBC.
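The gap between overall and cancer-specific survival in the abstract above comes from how deaths are counted in the Kaplan-Meier estimate. The following is a minimal sketch, not the authors' code: the function name `kaplan_meier` and the six-patient follow-up data are hypothetical, and deaths from other causes are simply treated as censored when estimating cancer-specific survival, which is one common convention and is assumed here.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct time where an event occurs.

    durations: follow-up time per patient
    events:    True if the event of interest occurred, False if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)          # everyone leaves the risk set at their time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk  # product-limit step
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical follow-up (years) and cause of death; None = alive at last follow-up
durations = [1, 2, 3, 4, 5, 5]
causes = ["other", "bladder", "other", None, None, None]

# Overall survival: any death is an event
overall = kaplan_meier(durations, [c is not None for c in causes])
# Cancer-specific survival: only bladder cancer deaths are events (assumption)
specific = kaplan_meier(durations, [c == "bladder" for c in causes])
```

With this toy data the overall curve drops at every death, while the cancer-specific curve drops only once, which mirrors the direction of the difference reported in the abstract.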
Affiliation(s)
- Kathryn McGonagle
  - Geisel School of Medicine at Dartmouth, Hanover, NH, USA
  - White River Junction Department of Veterans Affairs (VA) Healthcare System, White River Junction, VT, USA
- Ellen J. Dematt
  - VA Cooperative Studies Program Coordinating Center, Perry Point, MD, USA
- Zhibao Mi
  - VA Cooperative Studies Program Coordinating Center, Perry Point, MD, USA
- Kousick Biswas
  - VA Cooperative Studies Program Coordinating Center, Perry Point, MD, USA
- Florian R. Schroeck
  - Geisel School of Medicine at Dartmouth, Hanover, NH, USA
  - White River Junction Department of Veterans Affairs (VA) Healthcare System, White River Junction, VT, USA
  - Dartmouth Cancer Center and the Dartmouth Institute for Health Policy and Clinical Practice, Lebanon, NH, USA
2
Hashemi Gheinani A, Kim J, You S, Adam RM. Bioinformatics in urology - molecular characterization of pathophysiology and response to treatment. Nat Rev Urol 2024; 21:214-242. [PMID: 37604982 DOI: 10.1038/s41585-023-00805-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/13/2023] [Indexed: 08/23/2023]
Abstract
The application of bioinformatics has revolutionized the practice of medicine in the past 20 years. From early studies that uncovered subtypes of cancer to broad efforts spearheaded by the Cancer Genome Atlas initiative, the use of bioinformatics strategies to analyse high-dimensional data has provided unprecedented insights into the molecular basis of disease. In addition to the identification of disease subtypes - which enables risk stratification - informatics analysis has facilitated the identification of novel risk factors and drivers of disease, biomarkers of progression and treatment response, as well as possibilities for drug repurposing or repositioning; moreover, bioinformatics has guided research towards precision and personalized medicine. Implementation of specific computational approaches such as artificial intelligence, machine learning and molecular subtyping has yet to become widespread in urology clinical practice for reasons of cost, disruption of clinical workflow and need for prospective validation of informatics approaches in independent patient cohorts. Solving these challenges might accelerate routine integration of bioinformatics into clinical settings.
Affiliation(s)
- Ali Hashemi Gheinani
  - Department of Urology, Boston Children's Hospital, Boston, MA, USA
  - Department of Surgery, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Urology, Inselspital, Bern, Switzerland
  - Department for BioMedical Research, University of Bern, Bern, Switzerland
- Jina Kim
  - Department of Urology, Cedars-Sinai Medical Center, Los Angeles, CA, USA
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
  - Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Sungyong You
  - Department of Urology, Cedars-Sinai Medical Center, Los Angeles, CA, USA
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
  - Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Rosalyn M Adam
  - Department of Urology, Boston Children's Hospital, Boston, MA, USA
  - Department of Surgery, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
3
Narayan VM, Siolas D, Meadows ES, Turzhitsky V, Sillah A, Imai K, McMurry AJ, Li H. Evaluation of a Natural Language Processing Model to Identify and Characterize Patients in the United States With High-Risk Non-Muscle-Invasive Bladder Cancer. JCO Clin Cancer Inform 2023; 7:e2300096. [PMID: 37906722 PMCID: PMC10642898 DOI: 10.1200/cci.23.00096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 08/08/2023] [Accepted: 09/14/2023] [Indexed: 11/02/2023]
Abstract
PURPOSE Treatment of non-muscle-invasive bladder cancer (NMIBC) is guided by risk stratification using clinical and pathologic criteria. This study aimed to develop a natural language processing (NLP) model for identifying patients with high-risk NMIBC retrospectively from unstructured electronic medical records (EMRs) and to apply the model to describe patient and tumor characteristics. METHODS We used three independent EMR-derived data sets including adult patients with a bladder cancer diagnosis in 2011-2020 for NLP model development and training (n = 140), validation (n = 697), and application for the retrospective cohort analysis (n = 4,402). Deep learning methods were used to train NLP recognition of medical chart terminology to identify seven high-risk NMIBC criteria; model performance was assessed using the F1 score, weighted across features. An algorithm was then used to classify each patient as high-risk NMIBC (yes/no). Manually reviewed records served as the gold standard. RESULTS The F1 scores after model training were >0.7 for all but one uncommon feature (prostatic urethral involvement). The highest area under the receiver operating characteristic curve (AUC) was observed for Ta (0.897) and T1 (0.897); the lowest AUC was for carcinoma in situ (CIS; 0.617). For high-risk NMIBC classification, positive predictive value was 79.4%, negative predictive value was 93.2%, and false-positive rate was 8.9%. Sensitivity and specificity were 83.7% and 91.1%, respectively. Of 748 patients manually confirmed as having high-risk NMIBC, 196 (26%) had CIS (of whom 19% also had T1 and 23% also had Ta disease); 552 tumors (74%) had no associated CIS. CONCLUSION The NLP model, combined with a rule-based algorithm, identified high-risk NMIBC with good performance and will enable future work to study real-world treatment patterns and clinical outcomes for high-risk NMIBC.
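The classification metrics quoted in the abstract above all derive from a single 2x2 confusion matrix against the manual gold standard. The sketch below shows how they relate. The function name `diagnostic_metrics` and the four counts are hypothetical: the counts are not reported in the abstract and were chosen only so that the rounded percentages match the quoted values for a sample of 697 records.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),           # recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),                   # precision
        "npv": tn / (tn + fn),
        "false_positive_rate": fp / (fp + tn),   # equals 1 - specificity
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Hypothetical counts (NOT from the paper), summing to 697
m = diagnostic_metrics(tp=170, fp=44, fn=33, tn=450)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

Note that the false-positive rate and specificity always sum to 100%, so the 8.9% and 91.1% figures in the abstract carry the same information.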
Affiliation(s)
- Vikram M. Narayan
  - Emory University School of Medicine, Grady Memorial Hospital, Atlanta, GA
- Despina Siolas
  - Weill Cornell Medical College, New York, NY
  - Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY
- Andrew J. McMurry
  - Ciox Health, Alpharetta, GA
  - Boston Children's Hospital, Harvard Medical School, Boston, MA
4
Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz IB, Wang N, Yang P, Xu H, Warner JL, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clin Cancer Inform 2022; 6:e2200006. [PMID: 35917480 PMCID: PMC9470142 DOI: 10.1200/cci.22.00006] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 01/19/2022] [Revised: 03/18/2022] [Accepted: 06/15/2022] [Indexed: 11/20/2022]
Abstract
PURPOSE The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and to facilitate patient care. In this review, we aim to assess EHRs for cancer research and patient care by using the Minimal Common Oncology Data Elements (mCODE), which is a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and review existing NLP methodologies for extracting said data elements. METHODS Published literature was searched to retrieve cancer-related NLP articles that were written in English and published between January 2010 and September 2020 from main literature databases. After the retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for relevant study analysis and used to categorize data including four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS A total of 123 publications were ultimately selected and included in our analysis. We found that cancer research and patient care require, as expected, some data elements beyond mCODE. Transparency and reproducibility are not sufficient in NLP methods, and inconsistency in NLP evaluation exists. CONCLUSION We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers for wide adoption of cancer NLP were identified and discussed.
Affiliation(s)
- Liwei Wang
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sunyang Fu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Andrew Wen
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Xiaoyang Ruan
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Huan He
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sijia Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sungrim Moon
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Michelle Mai
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Irbaz B. Riaz
  - Department of Hematology/Oncology, Mayo Clinic, Scottsdale, AZ
- Nan Wang
  - Department of Computer Science and Engineering, College of Science and Engineering, University of Minnesota, Minneapolis, MN
- Ping Yang
  - Department of Quantitative Health Sciences, Mayo Clinic, Scottsdale, AZ
- Hua Xu
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX
- Jeremy L. Warner
  - Departments of Medicine (Hematology/Oncology), Vanderbilt University, Nashville, TN
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
- Hongfang Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
5
Santos T, Tariq A, Gichoya JW, Trivedi H, Banerjee I. Automatic Classification of Cancer Pathology Reports: A Systematic Review. J Pathol Inform 2022; 13:100003. [PMID: 35242443 PMCID: PMC8860734 DOI: 10.1016/j.jpi.2022.100003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/10/2021] [Accepted: 11/12/2021] [Indexed: 11/30/2022]
Abstract
Pathology reports primarily consist of unstructured free text and thus the clinical information contained in the reports is not trivial to access or query. Multiple natural language processing (NLP) techniques have been proposed to automate the coding of pathology reports via text classification. In this systematic review, we follow the guidelines proposed by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Page et al., 2020: BMJ.) to identify the NLP systems for classifying pathology reports published between 2010 and 2021. Based on our search criteria, a total of 3445 records were retrieved, and 25 articles met the final review criteria. We benchmarked the systems based on methodology, complexity of the prediction task and core types of NLP models: i) rule-based and intelligent systems, ii) statistical machine learning, and iii) deep learning. While certain tasks are well addressed by these models, many others have limitations and remain open challenges, such as extraction of many cancer characteristics (size, shape, type of cancer, others) from pathology reports. We investigated the final set of 25 papers and addressed their potential as well as their limitations. We hope that this systematic review helps researchers prioritize the development of innovative approaches to tackle the current limitations and help the advancement of cancer research.
Affiliation(s)
- Thiago Santos
  - Department of Computer Science, Emory University, Atlanta, GA, USA
  - Department of Biomedical Informatics, Emory School of Medicine, Atlanta, GA, USA
  - Corresponding author.
- Amara Tariq
  - Department of Radiology, Mayo Clinic, Phoenix, AZ, USA
- Judy Wawira Gichoya
  - Department of Biomedical Informatics, Emory School of Medicine, Atlanta, GA, USA
  - Department of Radiology, Emory School of Medicine, Atlanta, GA, USA
- Hari Trivedi
  - Department of Biomedical Informatics, Emory School of Medicine, Atlanta, GA, USA
  - Department of Radiology, Emory School of Medicine, Atlanta, GA, USA
- Imon Banerjee
  - Department of Radiology, Mayo Clinic, Phoenix, AZ, USA
  - Department of Computer Engineering, Arizona State University, AZ, USA
6
Yang R, Zhu D, Howard LE, De Hoedt A, Schroeck FR, Klaassen Z, Freedland SJ, Williams SB. Context-Based Identification of Muscle Invasion Status in Patients With Bladder Cancer Using Natural Language Processing. JCO Clin Cancer Inform 2022; 6:e2100097. [PMID: 35073149 DOI: 10.1200/cci.21.00097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
PURPOSE Mortality from bladder cancer (BC) increases exponentially once it invades the muscle, but invasion status is inherently challenging to delineate at the population level. We sought to develop and validate a natural language processing (NLP) model for automatically identifying patients with muscle-invasive bladder cancer (MIBC). METHODS All patients with a Current Procedural Terminology code for transurethral resection of bladder tumor (TURBT; n = 76,060) were selected from the Department of Veterans Affairs (VA) database. A sample of 600 patients (with 2,337 full-text notes) who had TURBT and confirmed pathology results were selected for NLP model development and validation. The NLP performance was assessed by calculating the sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and overall accuracy at the individual note and patient levels. RESULTS In the validation cohort, the NLP model had average overall accuracies of 94% and 96% at the note and patient levels, respectively. Specifically, the F1 score and overall accuracy for predicting muscle invasion at the patient level were 0.87 and 96%, respectively. The model classified non-muscle-invasive bladder cancer (NMIBC) with overall accuracies of 90% and 93% at the note and patient levels. When applying the model to 71,200 patients VA-wide, the model classified 13,642 (19%) as having MIBC and 47,595 (66%) as NMIBC and was able to identify invasion status for 96% of patients with TURBT at the population level. Inherent limitations include a relatively small training set, given the size of the VA population. CONCLUSION This NLP model, with high accuracy, may be a practical tool for efficiently identifying BC invasion status and aid in population-based BC research.
Affiliation(s)
- Ruixin Yang
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
- Di Zhu
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
- Lauren E Howard
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
  - Duke Cancer Institute, Duke University School of Medicine, Durham, NC
- Amanda De Hoedt
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
- Florian R Schroeck
  - White River Junction VA Medical Center, White River Junction, VT
  - The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, NH
- Zachary Klaassen
  - Division of Urology, Medical College of Georgia at Augusta University, Augusta, GA
  - Georgia Cancer Center, Augusta, GA
- Stephen J Freedland
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
  - Division of Urology, Department of Surgery, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA
  - Center for Integrated Research in Cancer and Lifestyle, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA
- Stephen B Williams
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
  - Department of Surgery, Division of Urology, The University of Texas Medical Branch at Galveston, Galveston, TX
7
Park B, Altieri N, DeNero J, Odisho AY, Yu B. Improving natural language information extraction from cancer pathology reports using transfer learning and zero-shot string similarity. JAMIA Open 2021; 4:ooab085. [PMID: 34604711 PMCID: PMC8484934 DOI: 10.1093/jamiaopen/ooab085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2021] [Revised: 09/06/2021] [Accepted: 09/22/2021] [Indexed: 11/16/2022]
Abstract
OBJECTIVE We develop natural language processing (NLP) methods capable of accurately classifying tumor attributes from pathology reports given minimal labeled examples. Our hierarchical cancer to cancer transfer (HCTC) and zero-shot string similarity (ZSS) methods are designed to exploit shared information between cancers and auxiliary class features, respectively, to boost performance using enriched annotations which give both location-based information and document level labels for each pathology report. MATERIALS AND METHODS Our data consists of 250 pathology reports each for kidney, colon, and lung cancer from 2002 to 2019 from a single institution (UCSF). For each report, we classified 5 attributes: procedure, tumor location, histology, grade, and presence of lymphovascular invasion. We develop novel NLP techniques involving transfer learning and string similarity trained on enriched annotations. We compare HCTC and ZSS methods to the state-of-the-art including conventional machine learning methods as well as deep learning methods. RESULTS For our HCTC method, we see an improvement of up to 0.1 micro-F1 score and 0.04 macro-F1 averaged across cancer and applicable attributes. For our ZSS method, we see an improvement of up to 0.26 micro-F1 and 0.23 macro-F1 averaged across cancer and applicable attributes. These comparisons are made after adjusting training data sizes to correct for the 20% increase in annotation time for enriched annotations compared to ordinary annotations. CONCLUSIONS Methods based on transfer learning across cancers and augmenting information methods with string similarity priors can significantly reduce the amount of labeled data needed for accurate information extraction from pathology reports.
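The abstract above reports both micro-F1 and macro-F1, which can diverge sharply when classes are imbalanced. The sketch below illustrates the difference in plain Python. The function names and the ten histology labels are hypothetical illustrations, not data from the study: micro-F1 pools counts over all classes, whereas macro-F1 averages per-class F1 scores so that rare classes weigh as much as common ones.

```python
from collections import Counter

def f1(tp, fp, fn):
    """F1 from counts; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[t] += 1   # true class gains a false negative
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro

# Hypothetical, imbalanced labels: 8 common-class reports, 2 rare-class reports
y_true = ["clear cell"] * 8 + ["papillary"] * 2
y_pred = ["clear cell"] * 8 + ["clear cell", "papillary"]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

A single error on the rare class barely moves micro-F1 but pulls macro-F1 down noticeably, which is why reporting both is informative for attributes with uncommon values.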
Affiliation(s)
- Briton Park
  - Department of Statistics, University of California, Berkeley, California, USA
- Nicholas Altieri
  - Department of Statistics, University of California, Berkeley, California, USA
- John DeNero
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California, USA
- Anobel Y Odisho
  - Department of Urology and Helen Diller Family Comprehensive Cancer Center, School of Medicine, University of California, San Francisco, California, USA
  - Department of Epidemiology & Biostatistics, School of Medicine, University of California, San Francisco, California, USA
  - Center for Digital Health Innovation, University of California, San Francisco, California, USA
- Bin Yu
  - Department of Statistics, University of California, Berkeley, California, USA
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California, USA
  - Chan-Zuckerberg Biohub, San Francisco, California, USA
8
Chang RW, Tucker LY, Rothenberg KA, Lancaster EM, Avins AL, Kuang HC, Faruqi RM, Nguyen-Huynh MN. Establishing a carotid artery stenosis disease cohort for comparative effectiveness research using natural language processing. J Vasc Surg 2021; 74:1937-1947.e3. [PMID: 34182027 DOI: 10.1016/j.jvs.2021.05.054] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/03/2020] [Accepted: 05/19/2021] [Indexed: 11/24/2022]
Abstract
OBJECTIVE Investigation of asymptomatic carotid stenosis treatment is hindered by the lack of a contemporary population-based disease cohort. We describe the use of natural language processing (NLP) to identify stenosis in patients undergoing carotid imaging. METHODS Adult patients with carotid imaging between 2008 and 2012 in a large integrated health care system were identified and followed through 2017. An NLP process was developed to characterize carotid stenosis according to the Society of Radiologists in Ultrasound (for ultrasounds) and North American Symptomatic Carotid Endarterectomy Trial (NASCET) (for axial imaging) guidelines. The resulting algorithm assessed text descriptors to categorize normal/non-hemodynamically significant stenosis, moderate or severe stenosis as well as occlusion in both carotid ultrasound (US) and axial imaging (computed tomography and magnetic resonance angiography [CTA/MRA]). For US reports, internal carotid artery systolic and diastolic velocities and velocity ratios were assessed and matched for laterality to supplement accuracy. To validate the NLP algorithm, positive predictive value (PPV or precision) and sensitivity (recall) were calculated from simple random samples from the population of all imaging studies. Lastly, all non-normal studies were manually reviewed for confirmation for prevalence estimates and disease cohort assembly. RESULTS A total of 95,896 qualifying index studies (76,276 US and 19,620 CTA/MRA) were identified among 94,822 patients including 1059 patients who underwent multiple studies on the same day. For studies of normal/non-hemodynamically significant stenosis arteries, the NLP algorithm showed excellent performance with a PPV of 99% for US and 96.5% for CTA/MRA. 
PPV/sensitivity to identify a non-normal artery with correct laterality in the CTA/MRA and US samples were 76.9% (95% confidence interval [CI], 74.1%-79.5%)/93.1% (95% CI, 91.1%-94.8%) and 74.7% (95% CI, 69.3%-79.5%)/94% (95% CI, 90.2%-96.7%), respectively. Regarding cohort assembly, 15,522 patients were identified with diseased carotid artery, including 2674 exhibiting equal bilateral disease. This resulted in a laterality-specific cohort with 12,828 moderate, 5283 severe, and 1895 occluded arteries and 326 diseased arteries with unknown stenosis. During follow-up, 30.1% of these patients underwent 61,107 additional studies. CONCLUSIONS Use of NLP to detect carotid stenosis or occlusion can result in accurate exclusion of normal/non-hemodynamically significant stenosis disease states with more moderate precision with lesion identification, which can substantially reduce the need for manual review. The resulting cohort allows for efficient research and holds promise for similar reporting in other vascular diseases.
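The PPV and sensitivity figures above are proportions estimated from random samples, each quoted with a 95% confidence interval. The abstract does not state which interval method was used, so the sketch below uses the Wilson score interval purely as one common choice. The function name `wilson_interval` and the counts (77 correct out of 100 flagged studies) are hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical sample: 77 of 100 NLP-flagged arteries confirmed on manual review
low, high = wilson_interval(77, 100)
print(f"PPV = 77.0% (95% CI, {low:.1%}-{high:.1%})")
```

The interval narrows roughly with the square root of the sample size, which is consistent with the tighter interval reported for the larger of the two samples in the abstract.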
Affiliation(s)
- Robert W Chang
  - Department of Vascular Surgery, The Permanente Medical Group, South San Francisco, Calif
  - Division of Research, Kaiser Permanente, Oakland, Calif
- Kara A Rothenberg
  - Department of Surgery, University of California San Francisco - East Bay, Oakland, Calif
- Andrew L Avins
  - Division of Research, Kaiser Permanente, Oakland, Calif
  - Departments of Medicine and Epidemiology and Biostatistics, University of California, San Francisco, Calif
- Hui C Kuang
  - Department of Vascular Surgery, The Permanente Medical Group, San Francisco, Calif
- Rishad M Faruqi
  - Department of Vascular Surgery, The Permanente Medical Group, Santa Clara, Calif
- Mai N Nguyen-Huynh
  - Division of Research, Kaiser Permanente, Oakland, Calif
  - Department of Neurology, The Permanente Medical Group, Walnut Creek, Calif
9
Senders JT, Cho LD, Calvachi P, McNulty JJ, Ashby JL, Schulte IS, Almekkawi AK, Mehrtash A, Gormley WB, Smith TR, Broekman MLD, Arnaout O. Automating Clinical Chart Review: An Open-Source Natural Language Processing Pipeline Developed on Free-Text Radiology Reports From Patients With Glioblastoma. JCO Clin Cancer Inform 2021; 4:25-34. [PMID: 31977252 DOI: 10.1200/cci.19.00060] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/05/2023]
Abstract
PURPOSE The aim of this study was to develop an open-source natural language processing (NLP) pipeline for text mining of medical information from clinical reports. We also aimed to provide insight into why certain variables or reports are more suitable for clinical text mining than others. MATERIALS AND METHODS Various NLP models were developed to extract 15 radiologic characteristics from free-text radiology reports for patients with glioblastoma. Ten-fold cross-validation was used to optimize the hyperparameter settings and estimate model performance. We examined how model performance was associated with quantitative attributes of the radiologic characteristics and reports. RESULTS In total, 562 unique brain magnetic resonance imaging reports were retrieved. NLP extracted 15 radiologic characteristics with high to excellent discrimination (area under the curve, 0.82 to 0.98) and accuracy (78.6% to 96.6%). Model performance was correlated with the inter-rater agreement of the manually provided labels (ρ = 0.904; P < .001) but not with the frequency distribution of the variables of interest (ρ = 0.179; P = .52). All variables labeled with a near perfect inter-rater agreement were classified with excellent performance (area under the curve > 0.95). Excellent performance could be achieved for variables with only 50 to 100 observations in the minority group and class imbalances up to a 9:1 ratio. Report-level classification accuracy was not associated with the number of words or the vocabulary size in the distinct text documents. CONCLUSION This study provides an open-source NLP pipeline that allows for text mining of narratively written clinical reports. Small sample sizes and class imbalance should not be considered as absolute contraindications for text mining in clinical research. 
However, future studies should report measures of inter-rater agreement whenever ground truth is based on a consensus label and use this measure to identify clinical variables eligible for text mining.
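The abstract above relies on ten-fold cross-validation to estimate model performance. As a minimal sketch of how such a split works, the generator below partitions report indices into ten disjoint test folds, with the remaining nine folds forming each training set. The function name `k_fold_indices` and the fixed seed are assumptions for illustration; only the report count of 562 is taken from the abstract, and this is not the authors' pipeline.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # fold sizes differ by at most 1
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

# 562 reports, as in the abstract; each report is tested exactly once
splits = list(k_fold_indices(562, k=10))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))
```

Because every report appears in exactly one test fold, averaging the ten fold-level scores gives a performance estimate that uses all the data without testing on anything the model was trained on.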
Affiliation(s)
- Joeky T Senders
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
  - Department of Neurosurgery, Leiden University Medical Center, Leiden, the Netherlands
- Logan D Cho
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
  - Department of Neuroscience, Brown University, Providence, RI
- Paola Calvachi
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- John J McNulty
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
- Joanna L Ashby
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Isabelle S Schulte
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Ahmad Kareem Almekkawi
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Alireza Mehrtash
  - Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- William B Gormley
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Timothy R Smith
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Marike L D Broekman
  - Department of Neurosurgery, Leiden University Medical Center, Leiden, the Netherlands
  - Department of Neurosurgery, Haaglanden Medical Center, The Hague, the Netherlands
- Omar Arnaout
  - Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
10
Rezaee ME, Ismail AAO, Okorie CL, Seigne JD, Lynch KE, Schroeck FR. Partial Versus Complete Bacillus Calmette-Guérin Intravesical Therapy and Bladder Cancer Outcomes in High-risk Non-muscle-invasive Bladder Cancer: Is NIMBUS the Full Story? EUR UROL SUPPL 2021; 26:35-43. [PMID: 34337506 PMCID: PMC8317819 DOI: 10.1016/j.euros.2021.01.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 01/25/2021] [Indexed: 01/09/2023]
Abstract
Background It is important to understand the implications of reduced bacillus Calmette-Guérin (BCG) treatment intensity, given global shortages and early termination of the NIMBUS trial. Objective To assess the association of partial versus complete BCG induction with outcomes. Design, setting, and participants This is a retrospective cohort study of veterans diagnosed with high-risk non–muscle-invasive bladder cancer (NMIBC; high grade [HG] Ta, T1, or carcinoma in situ) between 2005 and 2011 with follow-up through 2014. Intervention Patients were categorized into partial versus complete BCG induction (one to five vs five or more instillations). Partial BCG induction subgroups were defined for comparison with the NIMBUS trial. Outcome measurements and statistical analysis Propensity score–adjusted regression models were used to assess the association of partial BCG induction with risk of recurrence and bladder cancer death. Results and limitations Among 540 patients, 114 (21.1%) underwent partial BCG induction. Partial versus complete BCG induction was not significantly associated with the risk of recurrence in HG Ta (cumulative incidence [CIn] 46.6% vs 53.9% at 5 yr, p = 0.38) or T1 (CIn 47.1% vs 56.7% at 5 yr, p = 0.19) disease. Similarly, we found no increased risk of bladder cancer death (HG Ta: CIn 4.7% vs 5.4% at 5 yr, p = 0.87; T1: CIn 10.0% vs 11.4% at 5 yr, p = 0.77). NIMBUS-like induction was associated with an increased risk of recurrence in patients with HG Ta disease, although not statistically significant. Unmeasured confounding is a limitation. Conclusions Cancer outcomes were similar among high-risk NMIBC patients who underwent partial versus complete BCG induction, suggesting that future research is needed to determine how to optimize BCG delivery for the greatest number of patients, especially during global shortages.
Patient summary Outcomes were similar between patients receiving partial and complete courses of bacillus Calmette-Guérin (BCG) therapy. Future research is needed to determine how to best deliver BCG to the greatest number of patients, particularly during medication shortages.
Affiliation(s)
- Michael E Rezaee
- White River Junction VA Medical Center, White River Junction, VT, USA; Section of Urology, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA
- John D Seigne
- White River Junction VA Medical Center, White River Junction, VT, USA; Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA
- Kristine E Lynch
- VA Salt Lake City Health Care System and University of Utah, Salt Lake City, UT, USA
- Florian R Schroeck
- White River Junction VA Medical Center, White River Junction, VT, USA; Section of Urology, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA; Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA; The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, NH, USA
|
11
|
Oliveira CR, Niccolai P, Ortiz AM, Sheth SS, Shapiro ED, Niccolai LM, Brandt CA. Natural Language Processing for Surveillance of Cervical and Anal Cancer and Precancer: Algorithm Development and Split-Validation Study. JMIR Med Inform 2020; 8:e20826. [PMID: 32469840 PMCID: PMC7671846 DOI: 10.2196/20826] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/29/2020] [Revised: 09/18/2020] [Accepted: 10/04/2020] [Indexed: 12/13/2022]
Abstract
Background Accurate identification of new diagnoses of human papillomavirus–associated cancers and precancers is an important step toward the development of strategies that optimize the use of human papillomavirus vaccines. The diagnosis of human papillomavirus cancers hinges on a histopathologic report, which is typically stored in electronic medical records as free-form, or unstructured, narrative text. Previous efforts to perform surveillance for human papillomavirus cancers have relied on the manual review of pathology reports to extract diagnostic information, a process that is both labor- and resource-intensive. Natural language processing can be used to automate the structuring and extraction of clinical data from unstructured narrative text in medical records and may provide a practical and effective method for identifying patients with vaccine-preventable human papillomavirus disease for surveillance and research. Objective This study's objective was to develop and assess the accuracy of a natural language processing algorithm for the identification of individuals with cancer or precancer of the cervix and anus. Methods A pipeline-based natural language processing algorithm was developed, which incorporated machine learning and rule-based methods to extract diagnostic elements from the narrative pathology reports. To test the algorithm’s classification accuracy, we used a split-validation study design. Full-length cervical and anal pathology reports were randomly selected from 4 clinical pathology laboratories. Two study team members, blinded to the classifications produced by the natural language processing algorithm, manually and independently reviewed all reports and classified them at the document level according to 2 domains (diagnosis and human papillomavirus testing results). Using the manual review as the gold standard, the algorithm’s performance was evaluated using standard measurements of accuracy, recall, precision, and F-measure. 
Results The natural language processing algorithm’s performance was validated on 949 pathology reports. The algorithm demonstrated accurate identification of abnormal cytology, histology, and positive human papillomavirus tests with accuracies greater than 0.91. Precision was lowest for anal histology reports (0.87, 95% CI 0.59-0.98) and highest for cervical cytology (0.98, 95% CI 0.95-0.99). The natural language processing algorithm missed 2 out of the 15 abnormal anal histology reports, which led to a relatively low recall (0.68, 95% CI 0.43-0.87). Conclusions This study outlines the development and validation of a freely available and easily implementable natural language processing algorithm that can automate the extraction and classification of clinical data from cervical and anal cytology and histology.
Affiliation(s)
- Carlos R Oliveira
- Department of Pediatrics, Yale University School of Medicine, New Haven, CT, United States
- Patrick Niccolai
- Department of Pediatrics, Yale University School of Medicine, New Haven, CT, United States
- Anette Michelle Ortiz
- Department of Pediatrics, Yale University School of Medicine, New Haven, CT, United States
- Sangini S Sheth
- Department of Obstetrics, Gynecology, and Reproductive Sciences, Yale University School of Medicine, New Haven, CT, United States
- Eugene D Shapiro
- Department of Pediatrics, Yale University School of Medicine, New Haven, CT, United States; Departments of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, United States
- Linda M Niccolai
- Departments of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, United States
- Cynthia A Brandt
- Departments of Emergency Medicine, Biostatistics, and Health Informatics, Yale Schools of Medicine and Public Health, New Haven, CT, United States; Veteran Affairs Connecticut Healthcare System, West Haven, CT, United States
|
12
|
Odisho AY, Park B, Altieri N, DeNero J, Cooperberg MR, Carroll PR, Yu B. Natural language processing systems for pathology parsing in limited data environments with uncertainty estimation. JAMIA Open 2020; 3:431-438. [PMID: 33381748 PMCID: PMC7751177 DOI: 10.1093/jamiaopen/ooaa029] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/21/2020] [Revised: 06/09/2020] [Accepted: 07/13/2020] [Indexed: 12/05/2022]
Abstract
OBJECTIVE Cancer is a leading cause of death, but much of the diagnostic information is stored as unstructured data in pathology reports. We aim to improve uncertainty estimates of machine learning-based pathology parsers and evaluate performance in low data settings. MATERIALS AND METHODS Our data comes from the Urologic Outcomes Database at UCSF which includes 3232 annotated prostate cancer pathology reports from 2001 to 2018. We approach 17 separate information extraction tasks, involving a wide range of pathologic features. To handle the diverse range of fields, we required 2 statistical models, a document classification method for pathologic features with a small set of possible values and a token extraction method for pathologic features with a large set of values. For each model, we used isotonic calibration to improve the model's estimates of its likelihood of being correct. RESULTS Our best document classifier method, a convolutional neural network, achieves a weighted F1 score of 0.97 averaged over 12 fields and our best extraction method achieves an accuracy of 0.93 averaged over 5 fields. The performance saturates as a function of dataset size with as few as 128 data points. Furthermore, while our document classifier methods have reliable uncertainty estimates, our extraction-based methods do not, but after isotonic calibration, expected calibration error drops to below 0.03 for all extraction fields. CONCLUSIONS We find that when applying machine learning to pathology parsing, large datasets may not always be needed, and that calibration methods can improve the reliability of uncertainty estimates.
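The expected calibration error reported in this abstract can be illustrated with a minimal sketch. It uses the common equal-width binned definition, which is an assumption here (the abstract does not state which variant the authors used), and the function name and sample values are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| per bin.

    confidences: predicted probabilities in [0, 1] that each prediction is right.
    correct: booleans, True where the prediction matched the reference label.
    The equal-width binning is an illustrative assumption, not the paper's code.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    # Assign each prediction to an equal-width confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Hypothetical example: four predictions at 0.9 confidence, three of them correct.
example_ece = expected_calibration_error([0.9] * 4, [True, True, True, False])
```

A well-calibrated extractor would score close to zero on this measure, since the stated confidence in each bin would match the observed accuracy there.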
Affiliation(s)
- Anobel Y Odisho
- Department of Urology, UCSF Helen Diller Family Comprehensive Cancer Center, San Francisco, California, USA
- Briton Park
- Department of Statistics, University of California, Berkeley, California, USA
- Nicholas Altieri
- Department of Statistics, University of California, Berkeley, California, USA
- John DeNero
- Department of Electrical Engineering and Computer Science, University of California, Berkeley, California, USA
- Matthew R Cooperberg
- Department of Urology, UCSF Helen Diller Family Comprehensive Cancer Center, San Francisco, California, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, California, USA
- Peter R Carroll
- Department of Urology, UCSF Helen Diller Family Comprehensive Cancer Center, San Francisco, California, USA
- Bin Yu
- Department of Statistics, University of California, Berkeley, California, USA
- Department of Electrical Engineering and Computer Science, University of California, Berkeley, California, USA
- Chan-Zuckerberg Biohub, San Francisco, California, USA
|
13
|
Rezaee ME, Lynch KE, Li Z, MacKenzie TA, Seigne JD, Robertson DJ, Sirovich B, Goodney PP, Schroeck FR. The impact of low- versus high-intensity surveillance cystoscopy on surgical care and cancer outcomes in patients with high-risk non-muscle-invasive bladder cancer (NMIBC). PLoS One 2020; 15:e0230417. [PMID: 32203532 PMCID: PMC7089561 DOI: 10.1371/journal.pone.0230417] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/21/2020] [Accepted: 02/28/2020] [Indexed: 11/18/2022]
Abstract
Purpose To assess the association of low- vs. guideline-recommended high-intensity cystoscopic surveillance with outcomes among patients with high-risk non-muscle invasive bladder cancer (NMIBC). Materials & methods A retrospective cohort study of Veterans Affairs patients diagnosed with high-risk NMIBC between 2005 and 2011 with follow-up through 2014. Patients were categorized by number of surveillance cystoscopies over two years following diagnosis: low- (1–5) vs. high-intensity (6 or more) surveillance. Propensity score adjusted regression models were used to assess the association of low-intensity cystoscopic surveillance with frequency of transurethral resections, and risk of progression to invasive disease and bladder cancer death. Results Among 1,542 patients, 520 (33.7%) underwent low-intensity cystoscopic surveillance. Patients undergoing low-intensity surveillance had fewer transurethral resections (37 vs. 99 per 100 person-years; p<0.001). Risk of death from bladder cancer did not differ significantly by low (cumulative incidence [CIn] 8.4% [95% CI 6.5–10.9] at 5 years) vs. high-intensity surveillance (CIn 9.1% [95% CI 7.4–11.2] at 5 years, p = 0.61). Low vs. high-intensity surveillance was not associated with increased risk of bladder cancer death among patients with Ta (CIn 5.7% vs. 8.2% at 5 years, p = 0.24) or T1 disease at diagnosis (CIn 10.2% vs. 9.1% at 5 years, p = 0.58). Among patients with Ta disease, low-intensity surveillance was associated with decreased risk of progression to invasive disease (T1 or T2) or bladder cancer death (CIn 19.3% vs. 31.3% at 5 years, p = 0.002). Conclusions Patients with high-risk NMIBC undergoing low- vs. high-intensity cystoscopic surveillance underwent fewer transurethral resections, but did not experience an increased risk of progression or bladder cancer death.
These findings provide a strong rationale for a clinical trial to determine whether low-intensity surveillance is comparable to high-intensity surveillance for cancer control in high-risk NMIBC.
Affiliation(s)
- Michael E. Rezaee
- White River Junction VA Medical Center, White River Junction, VT, United States of America
- Section of Urology Dartmouth Hitchcock Medical Center, Lebanon, NH, United States of America
- Kristine E. Lynch
- VA Salt Lake City Health Care System and University of Utah, Salt Lake City, UT, United States of America
- Zhongze Li
- Biomedical Data Science Department, Geisel School of Medicine at Dartmouth College, Lebanon, NH, United States of America
- Todd A. MacKenzie
- Biomedical Data Science Department, Geisel School of Medicine at Dartmouth College, Lebanon, NH, United States of America
- The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, NH, United States of America
- John D. Seigne
- White River Junction VA Medical Center, White River Junction, VT, United States of America
- Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, NH, United States of America
- Douglas J. Robertson
- White River Junction VA Medical Center, White River Junction, VT, United States of America
- The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, NH, United States of America
- Brenda Sirovich
- White River Junction VA Medical Center, White River Junction, VT, United States of America
- The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, NH, United States of America
- Philip P. Goodney
- White River Junction VA Medical Center, White River Junction, VT, United States of America
- The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, NH, United States of America
- Florian R. Schroeck
- White River Junction VA Medical Center, White River Junction, VT, United States of America
- Section of Urology Dartmouth Hitchcock Medical Center, Lebanon, NH, United States of America
- The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, NH, United States of America
- Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, NH, United States of America
|
14
|
Levine MN, Alexander G, Sathiyapalan A, Agrawal A, Pond G. Learning Health System for Breast Cancer: Pilot Project Experience. JCO Clin Cancer Inform 2019; 3:1-11. [DOI: 10.1200/cci.19.00032] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/20/2022]
Abstract
PURPOSE Clinicians need accurate and timely information on the impact of treatments on patient outcomes. The electronic health record (EHR) offers the potential for insight into real-world patient experiences and outcomes, but it is difficult to tap into. Our goal was to apply artificial intelligence technology to the EHR to characterize the clinical course of patients with stage III breast cancer. PATIENTS AND METHODS Data from patients with stage III breast cancer who presented between 2013 and 2015 were extracted from the EHR, de-identified, and imported into the IBM Cloud. Specialized natural language processing (NLP) annotators were developed to extract medical concepts from unstructured clinical text and transform them to structured attributes. In the validation phase, these annotators were applied to 19 additional patients with stage III breast cancer from the same period. The resulting data were compared with that in the medical chart (gold standard) for nine key indicators. RESULTS Information was extracted for 50 patients, including tumor stage (94% stage IIIA, 6% stage IIIB), age (28% 50 years or younger, 52% between 51 and 70 years, and 24% older than 70 years), receptor status (84% estrogen receptor positive, 74% progesterone receptor positive), and first treatment (72% surgery, 26% chemotherapy, 2% endocrine). Events in the patient’s journey were compiled to create a timeline. For 171 data elements, NLP and the chart disagreed for 41 (24%; 95% CI, 17.8% to 31.1%). With additional manipulation using simple logic, the disagreement was reduced to six elements (3.5%; 95% CI, 1.3% to 7.5%; F1 statistic, 0.9694). CONCLUSION It is possible to extract, read, and combine data from the EHR to view the patient journey. The agreement between NLP and the gold standard was high, which supports validity.
Affiliation(s)
- Mark N. Levine
- McMaster University, Hamilton, Ontario, Canada
- Escarpment Cancer Research Institute, Hamilton, Ontario, Canada
- Greg Pond
- McMaster University, Hamilton, Ontario, Canada
- Escarpment Cancer Research Institute, Hamilton, Ontario, Canada
|
15
|
A frame semantic overview of NLP-based information extraction for cancer-related EHR notes. J Biomed Inform 2019; 100:103301. [PMID: 31589927 DOI: 10.1016/j.jbi.2019.103301] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 04/02/2019] [Revised: 09/04/2019] [Accepted: 10/03/2019] [Indexed: 02/07/2023]
Abstract
OBJECTIVE There is a lot of information about cancer in Electronic Health Record (EHR) notes that can be useful for biomedical research provided natural language processing (NLP) methods are available to extract and structure this information. In this paper, we present a scoping review of existing clinical NLP literature for cancer. METHODS We identified studies describing an NLP method to extract specific cancer-related information from EHR sources from PubMed, Google Scholar, ACL Anthology, and existing reviews. Two exclusion criteria were used in this study. We excluded articles where the extraction techniques used were too broad to be represented as frames (e.g., document classification) and also where very low-level extraction methods were used (e.g. simply identifying clinical concepts). 78 articles were included in the final review. We organized this information according to frame semantic principles to help identify common areas of overlap and potential gaps. RESULTS Frames were created from the reviewed articles pertaining to cancer information such as cancer diagnosis, tumor description, cancer procedure, breast cancer diagnosis, prostate cancer diagnosis and pain in prostate cancer patients. These frames included both a definition as well as specific frame elements (i.e. extractable attributes). We found that cancer diagnosis was the most common frame among the reviewed papers (36 out of 78), with recent work focusing on extracting information related to treatment and breast cancer diagnosis. CONCLUSION The list of common frames described in this paper identifies important cancer-related information extracted by existing NLP techniques and serves as a useful resource for future researchers requiring cancer information extracted from EHR notes. We also argue, due to the heavy duplication of cancer NLP systems, that a general purpose resource of annotated cancer frames and corresponding NLP tools would be valuable.
|
16
|
Odisho AY, Bridge M, Webb M, Ameli N, Eapen RS, Stauf F, Cowan JE, Washington SL, Herlemann A, Carroll PR, Cooperberg MR. Automating the Capture of Structured Pathology Data for Prostate Cancer Clinical Care and Research. JCO Clin Cancer Inform 2019; 3:1-8. [PMID: 31314550 PMCID: PMC6874052 DOI: 10.1200/cci.18.00084] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Accepted: 05/17/2019] [Indexed: 01/19/2023]
Abstract
PURPOSE Cancer pathology findings are critical for many aspects of care but are often locked away as unstructured free text. Our objective was to develop a natural language processing (NLP) system to extract prostate pathology details from postoperative pathology reports and a parallel structured data entry process for use by urologists during routine documentation care and compare accuracy when compared with manual abstraction and concordance between NLP and clinician-entered approaches. MATERIALS AND METHODS From February 2016, clinicians used note templates with custom structured data elements (SDEs) during routine clinical care for men with prostate cancer. We also developed an NLP algorithm to parse radical prostatectomy pathology reports and extract structured data. We compared accuracy of clinician-entered SDEs and NLP-parsed data to manual abstraction as a gold standard and compared concordance (Cohen's κ) between approaches assuming no gold standard. RESULTS There were 523 patients with NLP-extracted data, 319 with SDE data, and 555 with manually abstracted data. For Gleason scores, NLP and clinician SDE accuracy was 95.6% and 95.8%, respectively, compared with manual abstraction, with concordance of 0.93 (95% CI, 0.89 to 0.98). For margin status, extracapsular extension, and seminal vesicle invasion, stage, and lymph node status, NLP accuracy was 94.8% to 100%, SDE accuracy was 87.7% to 100%, and concordance between NLP and SDE ranged from 0.92 to 1.0. CONCLUSION We show that a real-world deployment of an NLP algorithm to extract pathology data and structured data entry by clinicians during routine clinical care in a busy clinical practice can generate accurate data when compared with manual abstraction for some, but not all, components of a prostate pathology report.
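The concordance statistic named in this abstract, Cohen's κ, can be sketched as follows. This is the textbook unweighted formula, κ = (p_o − p_e) / (1 − p_e); the function name and the sample Gleason labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: share of items on which both sources agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each source's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if expected == 1.0:
        # Both sources used a single identical label; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical Gleason labels from an NLP parser and from clinician entry.
nlp_labels = ["3+4", "4+3", "3+4", "4+3"]
sde_labels = ["3+4", "4+3", "3+4", "4+3"]
kappa = cohen_kappa(nlp_labels, sde_labels)
```

Unlike raw percent agreement, κ discounts the agreement two sources would reach by chance, which matters when one category dominates.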
Affiliation(s)
- Mark Bridge
- University of California, San Francisco, San Francisco, CA
- Mitchell Webb
- University of California, San Francisco Medical Center, San Francisco, CA
- Niloufar Ameli
- University of California, San Francisco, San Francisco, CA
- Renu S Eapen
- University of California, San Francisco, San Francisco, CA
- Frank Stauf
- University of California, San Francisco, San Francisco, CA
- Janet E Cowan
- University of California, San Francisco, San Francisco, CA
|
17
|
Jain NM, Culley A, Knoop T, Micheel C, Osterman T, Levy M. Conceptual Framework to Support Clinical Trial Optimization and End-to-End Enrollment Workflow. JCO Clin Cancer Inform 2019; 3:1-10. [PMID: 31225983 PMCID: PMC6873934 DOI: 10.1200/cci.19.00033] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Accepted: 05/02/2019] [Indexed: 12/19/2022]
Abstract
In this work, we present a conceptual framework to support clinical trial optimization and enrollment workflows and review the current state, limitations, and future trends in this space. This framework includes knowledge representation of clinical trials, clinical trial optimization, clinical trial design, enrollment workflows for prospective clinical trial matching, waitlist management, and, finally, evaluation strategies for assessing improvement.
Affiliation(s)
- Neha M. Jain
- Vanderbilt University Medical Center, Nashville, TN
- Teresa Knoop
- Vanderbilt University Medical Center, Nashville, TN
- Mia Levy
- Vanderbilt University Medical Center, Nashville, TN
- Rush University Medical Center, Chicago, IL
|
18
|
Han DS, Lynch KE, Chang JW, Sirovich B, Robertson DJ, Swanton AR, Seigne JD, Goodney PP, Schroeck FR. Overuse of Cystoscopic Surveillance Among Patients With Low-risk Non-Muscle-invasive Bladder Cancer - A National Study of Patient, Provider, and Facility Factors. Urology 2019; 131:112-119. [PMID: 31145947 DOI: 10.1016/j.urology.2019.04.036] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 12/14/2018] [Revised: 03/05/2019] [Accepted: 04/06/2019] [Indexed: 11/24/2022]
Abstract
OBJECTIVE To understand cystoscopic surveillance practices among patients with low-risk non-muscle-invasive bladder cancer (NMIBC) within the Department of Veterans Affairs (VA). METHODS Using a validated natural language processing algorithm, we included patients newly diagnosed with low-risk (ie low-grade Ta) NMIBC from 2005 to 2011 in the VA. Patients were followed until cancer recurrence, death, last contact, or 2 years after diagnosis. Based on guidelines, surveillance overuse was defined as >1 cystoscopy if followed <1 year, >2 cystoscopies if followed 1 to <2 years, or >3 cystoscopies if followed for 2 years. We identified patient, provider, and facility factors associated with overuse using multilevel logistic regression. RESULTS Overuse occurred in 75% of patients (852/1135) - with an excess of 1846 more cystoscopies performed than recommended. Adjusting for 14 factors, overuse was associated with patient race (odds ratio [OR] 0.49, 95% confidence interval [CI]: 0.28, 0.85 unlisted race vs White), having 2 comorbidities (OR 1.60, 95% CI: 1.00, 2.55 vs no comorbidities), and earlier year of diagnosis (OR 2.50, 95% CI: 1.29, 4.83 for 2005 vs 2011, and OR 2.03, 95% CI: 1.11, 3.69 for 2006 vs 2011). On sensitivity analyses assuming all patients were diagnosed with multifocal or large low-grade tumors (ie, intermediate-risk), overuse would have still occurred in 45% of patients. CONCLUSION Overuse of cystoscopy among patients with low-risk NMIBC was common, raising concerns about bladder cancer surveillance cost and quality. However, few factors were associated with overuse. Further qualitative research is needed to identify other determinants of overuse not readily captured in administrative data.
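The overuse definition in this abstract is a simple threshold rule, which the sketch below encodes. The thresholds come from the abstract; the function names, the use of fractional years for follow-up, and the example patient are assumptions made for illustration, and this is not the authors' code.

```python
def recommended_max_cystoscopies(follow_up_years):
    """Upper limit of guideline-concordant cystoscopies for a follow-up length.

    Thresholds follow the abstract: 1 if followed <1 year, 2 if followed
    1 to <2 years, 3 if followed for 2 years (follow-up was capped at 2 years).
    """
    if follow_up_years < 0:
        raise ValueError("follow-up time cannot be negative")
    if follow_up_years < 1:
        return 1
    if follow_up_years < 2:
        return 2
    return 3


def is_overuse(n_cystoscopies, follow_up_years):
    """True when a patient had more cystoscopies than the limit allows."""
    return n_cystoscopies > recommended_max_cystoscopies(follow_up_years)


def excess_cystoscopies(n_cystoscopies, follow_up_years):
    """Number of cystoscopies beyond the recommended limit (never negative)."""
    return max(0, n_cystoscopies - recommended_max_cystoscopies(follow_up_years))


# Hypothetical patient: 5 cystoscopies during 1.5 years of follow-up.
flag = is_overuse(5, 1.5)
excess = excess_cystoscopies(5, 1.5)
```

Summing the per-patient excess over a cohort gives the kind of total the abstract reports as cystoscopies performed beyond the recommendation.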
Affiliation(s)
- David S Han
- The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, NH
- Kristine E Lynch
- VA Salt Lake City Health Care System and the Division of Epidemiology, University of Utah, Salt Lake City, UT
- Ji Won Chang
- VA Salt Lake City Health Care System and the Division of Epidemiology, University of Utah, Salt Lake City, UT
- Brenda Sirovich
- The White River Junction VA Medical Center, White River Junction, VT
- Amanda R Swanton
- Section of Urology, Dartmouth-Hitchcock Medical Center, Lebanon, NH
- John D Seigne
- Section of Urology, Dartmouth-Hitchcock Medical Center, Lebanon, NH; Norris Cotton Cancer Center, Dartmouth-Hitchcock Medical Center, Lebanon, NH
- Philip P Goodney
- The White River Junction VA Medical Center, White River Junction, VT
- Florian R Schroeck
- The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, NH; The White River Junction VA Medical Center, White River Junction, VT; Section of Urology, Dartmouth-Hitchcock Medical Center, Lebanon, NH; Norris Cotton Cancer Center, Dartmouth-Hitchcock Medical Center, Lebanon, NH.
|
19
|
Schroeck FR, Lynch KE, Li Z, MacKenzie TA, Han DS, Seigne JD, Robertson DJ, Sirovich B, Goodney PP. The impact of frequent cystoscopy on surgical care and cancer outcomes among patients with low-risk, non-muscle-invasive bladder cancer. Cancer 2019; 125:3147-3154. [PMID: 31120559 DOI: 10.1002/cncr.32185] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 02/11/2019] [Revised: 03/21/2019] [Accepted: 04/29/2019] [Indexed: 01/23/2023]
Abstract
BACKGROUND Surveillance recommendations for patients with low-risk, non-muscle-invasive bladder cancer (NMIBC) are based on limited evidence. The objective of this study was to add to the evidence by assessing outcomes after frequent versus recommended cystoscopic surveillance. METHODS This was a retrospective cohort study of patients diagnosed with low-risk (low-grade Ta (AJCC)) NMIBC from 2005 to 2011 with follow-up through 2014 from the Department of Veterans Affairs. Patients were classified as having undergone frequent versus recommended cystoscopic surveillance (>3 vs 1-3 cystoscopies in the first 2 years after diagnosis). By using propensity score-adjusted models, the authors estimated the impact of frequent cystoscopy on the number of transurethral resections, the number of resections without cancer in the specimen, and the risk of progression to muscle-invasive cancer or bladder cancer death. RESULTS Among 1042 patients, 798 (77%) had more frequent cystoscopy than recommended. In adjusted analyses, the frequent cystoscopy group had twice as many transurethral resections (55 vs 26 per 100 person-years; P < .001) and more than 3 times as many resections without cancer in the specimen (5.7 vs 1.6 per 100 person-years; P < .001). Frequent cystoscopy was not associated with time to progression or bladder cancer death (3% at 5 years in both groups; P = .990). CONCLUSIONS Frequent cystoscopy among patients with low-risk NMIBC was associated with twice as many transurethral resections and did not decrease the risk for bladder cancer progression or death, supporting current guidelines.
Affiliation(s)
- Florian R Schroeck
- Department of Veterans Affairs (VA) Outcomes Group, White River Junction VA Medical Center, White River Junction, Vermont; Section of Urology, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire; Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire; The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, New Hampshire
- Kristine E Lynch
- VA Salt Lake City Health Care System and Division of Epidemiology, University of Utah, Salt Lake City, Utah
- Zhongze Li
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Lebanon, New Hampshire
- Todd A MacKenzie
- The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, New Hampshire; Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Lebanon, New Hampshire
- David S Han
- Section of Urology, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire; The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, New Hampshire
- John D Seigne
- Section of Urology, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire; Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire
- Douglas J Robertson
- Department of Veterans Affairs (VA) Outcomes Group, White River Junction VA Medical Center, White River Junction, Vermont; The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, New Hampshire
- Brenda Sirovich
- Department of Veterans Affairs (VA) Outcomes Group, White River Junction VA Medical Center, White River Junction, Vermont; The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, New Hampshire
- Philip P Goodney
- Department of Veterans Affairs (VA) Outcomes Group, White River Junction VA Medical Center, White River Junction, Vermont; The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, New Hampshire
| |
|
20
|
Graham LA. Databases for surgical health services research: Veterans Health Administration data. Surgery 2018; 165:876-878. [PMID: 30177251 DOI: 10.1016/j.surg.2018.07.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2018] [Accepted: 07/25/2018] [Indexed: 11/20/2022]
Affiliation(s)
- Laura A Graham
- Center for Innovation to Implementation (Ci2i), VA Palo Alto Health Care System, CA; Stanford-Surgery Policy Improvement Research & Education (S-SPIRE), Stanford University, CA.
| |
|
21
|
Schroeck FR, Lynch KE, Chang JW, MacKenzie TA, Seigne JD, Robertson DJ, Goodney PP, Sirovich B. Extent of Risk-Aligned Surveillance for Cancer Recurrence Among Patients With Early-Stage Bladder Cancer. JAMA Netw Open 2018; 1:e183442. [PMID: 30465041 PMCID: PMC6241521 DOI: 10.1001/jamanetworkopen.2018.3442] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 04/26/2018] [Accepted: 08/12/2018] [Indexed: 12/23/2022]
Abstract
IMPORTANCE Cancer care guidelines recommend aligning surveillance frequency with underlying cancer risk, ie, more frequent surveillance for patients at high vs low risk of cancer recurrence. OBJECTIVE To assess the extent to which such risk-aligned surveillance is practiced within US Department of Veterans Affairs facilities by classifying surveillance patterns for low- vs high-risk patients with early-stage bladder cancer. DESIGN, SETTING, AND PARTICIPANTS US national retrospective cohort study of a population-based sample of patients diagnosed with low-risk or high-risk early-stage bladder cancer between January 1, 2005, and December 31, 2011, with follow-up through December 31, 2014. Analyses were performed March 2017 to April 2018. The study included all Veterans Affairs facilities (n = 85) where both low- and high-risk patients were treated. EXPOSURES Low-risk vs high-risk cancer status, based on definitions from the European Association of Urology risk stratification guidelines and on data extracted from diagnostic pathology reports via validated natural language processing algorithms. MAIN OUTCOMES AND MEASURES Adjusted cystoscopy frequency for low-risk and high-risk patients for each facility, estimated using multilevel modeling. RESULTS The study included 1278 low-risk and 2115 high-risk patients (median [interquartile range] age, 77 [71-82] years; 99% [3368 of 3393] male). Across facilities, the adjusted frequency of surveillance cystoscopy ranged from 3.7 to 6.2 (mean, 4.8) procedures over 2 years per patient for low-risk patients and from 4.6 to 6.0 (mean, 5.4) procedures over 2 years per patient for high-risk patients. In 70 of 85 facilities, surveillance was performed at a comparable frequency for low- and high-risk patients, differing by less than 1 cystoscopy over 2 years. Surveillance frequency among high-risk patients statistically significantly exceeded surveillance among low-risk patients at only 4 facilities. Across all facilities, surveillance frequencies for low- vs high-risk patients were moderately strongly correlated (r = 0.52; P < .001). CONCLUSIONS AND RELEVANCE Patients with early-stage bladder cancer undergo cystoscopic surveillance at comparable frequencies regardless of risk. This finding highlights the need to understand barriers to risk-aligned surveillance with the goal of making it easier for clinicians to deliver it in routine practice.
Affiliation(s)
- Florian R. Schroeck
- Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire
- The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, New Hampshire
- Section of Urology, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire
- White River Junction VA Medical Center, White River Junction, Vermont
| | - Kristine E. Lynch
- VA Salt Lake City Health Care System, Salt Lake City, Utah
- University of Utah, Salt Lake City
| | - Ji won Chang
- VA Salt Lake City Health Care System, Salt Lake City, Utah
- University of Utah, Salt Lake City
| | - Todd A. MacKenzie
- Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire
- The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, New Hampshire
| | - John D. Seigne
- Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire
- Section of Urology, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire
| | - Douglas J. Robertson
- The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, New Hampshire
- White River Junction VA Medical Center, White River Junction, Vermont
| | - Philip P. Goodney
- The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, New Hampshire
- White River Junction VA Medical Center, White River Junction, Vermont
| | - Brenda Sirovich
- The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College, Lebanon, New Hampshire
- White River Junction VA Medical Center, White River Junction, Vermont
| |
|