1
Yang S, Yang X, Lyu T, Huang JL, Chen A, He X, Braithwaite D, Mehta HJ, Wu Y, Guo Y, Bian J. Extracting Pulmonary Nodules and Nodule Characteristics from Radiology Reports of Lung Cancer Screening Patients Using Transformer Models. Journal of Healthcare Informatics Research 2024; 8:463-477. [PMID: 39131104 PMCID: PMC11310180 DOI: 10.1007/s41666-024-00166-5] [Received: 10/14/2022] [Revised: 04/12/2024] [Accepted: 05/12/2024] [Indexed: 08/13/2024]
Abstract
Pulmonary nodules and nodule characteristics are important indicators of lung nodule malignancy. However, nodule information is often documented as free text in clinical narratives such as radiology reports in electronic health record systems. Natural language processing (NLP) is the key technology for extracting and standardizing patient information from radiology reports into structured data elements. This study aimed to develop an NLP system using state-of-the-art transformer models to extract pulmonary nodules and associated nodule characteristics from radiology reports. We identified a cohort of 3080 patients who underwent low-dose computed tomography (LDCT) at the University of Florida health system and collected their radiology reports. We manually annotated 394 reports as the gold standard. We explored eight pretrained transformer models from three transformer architectures, including bidirectional encoder representations from transformers (BERT), robustly optimized BERT approach (RoBERTa), and A Lite BERT (ALBERT), for clinical concept extraction, relation identification, and negation detection. We examined general transformer models pretrained using general English corpora, transformer models fine-tuned using a clinical corpus, and a large clinical transformer model, GatorTron, which was trained from scratch using 90 billion words of clinical text. We compared the transformer models with two baseline models: a recurrent neural network implemented using bidirectional long short-term memory with a conditional random fields layer, and support vector machines. RoBERTa-mimic achieved the best F1-score of 0.9279 for nodule concept and nodule characteristics extraction. ALBERT-base and GatorTron achieved the best F1-score of 0.9737 in linking nodule characteristics to pulmonary nodules. Seven out of eight transformers achieved the best F1-score of 1.0000 for negation detection. Our end-to-end system achieved an overall F1-score of 0.8869.
This study demonstrated the advantage of state-of-the-art transformer models for pulmonary nodule information extraction from radiology reports.
Supplementary Information: The online version contains supplementary material available at 10.1007/s41666-024-00166-5.
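The results above are reported as F1-scores, the harmonic mean of precision and recall. As a minimal sketch of how such a score can be computed for concept extraction, assume each annotated or predicted concept is a hypothetical `(start, end, label)` tuple and that only exact span-and-label matches count; the abstract does not say whether strict or lenient matching was used, and `entity_prf` is an illustrative name.

```python
from typing import Iterable, Tuple

Entity = Tuple[int, int, str]  # (start offset, end offset, concept label)


def entity_prf(gold: Iterable[Entity], predicted: Iterable[Entity]) -> Tuple[float, float, float]:
    """Strict entity-level precision, recall, and F1.

    A predicted concept is a true positive only if both its span and its
    label exactly match a gold-standard concept.
    """
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical character offsets for one report; not data from the study.
    gold = [(10, 16, "Nodule"), (20, 24, "Size"), (30, 38, "Location")]
    pred = [(10, 16, "Nodule"), (20, 24, "Size"), (40, 45, "Density")]
    print(entity_prf(gold, pred))
```

Pooling true positives across all reports before computing the scores (micro-averaging) is one common way to aggregate; the averaging scheme used in the study is not stated in the abstract.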
Affiliation(s)
- Shuang Yang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
- Xi Yang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
- Tianchen Lyu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
- James L. Huang
- Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL USA
- Aokun Chen
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
- Xing He
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
- Dejana Braithwaite
- Departments of Surgery and Epidemiology, University of Florida, Gainesville, FL USA
- Hiren J. Mehta
- Division of Pulmonary, Critical Care, and Sleep Medicine, College of Medicine, University of Florida, Gainesville, FL USA
- Yonghui Wu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
- Yi Guo
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
- Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA
2
Mojibian A, Jaskolka J, Ching G, Lee B, Myers R, Devine C, Nicolaou S, Parker W. The Efficacy of a Named Entity Recognition AI Model for Identifying Incidental Pulmonary Nodules in CT Reports. Can Assoc Radiol J 2024:8465371241266785. [PMID: 39066637 DOI: 10.1177/08465371241266785] [Indexed: 07/30/2024]
Abstract
Purpose: This study evaluates the efficacy of a commercial medical Named Entity Recognition (NER) model combined with a post-processing protocol in identifying incidental pulmonary nodules from CT reports. Methods: We analyzed 9165 anonymized CT reports and classified them into 3 categories: no nodules, nodules present, and nodules >6 mm. For each report, a generic medical NER model annotated entities and their relations, which were then filtered through inclusion/exclusion criteria selected to identify pulmonary nodules. Ground truth was established by manual review. To better understand the relationship between model performance and nodule prevalence, a subset of the data was programmatically balanced to equalize the number of reports in each class category. Results: In the unbalanced subset of the data, the model achieved a sensitivity of 97%, specificity of 99%, and accuracy of 99% in detecting pulmonary nodules mentioned in the reports. For nodules >6 mm, sensitivity was 95%, specificity was 100%, and accuracy was 100%. In the balanced subset of the data, sensitivity was 99%, specificity 96%, and accuracy 97% for nodule detection; for larger nodules, sensitivity was 94%, specificity 99%, and accuracy 98%. Conclusions: The NER model demonstrated high sensitivity and specificity in detecting pulmonary nodules reported in CT scans, including those >6 mm, which are potentially clinically significant. The results were consistent across both unbalanced and balanced datasets, indicating that model performance is independent of nodule prevalence. Implementing this technology in hospital systems could automate the identification of at-risk patients, ensuring timely follow-up and potentially reducing missed or late-stage cancer diagnoses.
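Sensitivity and specificity are each computed within one true class (reports with nodules, reports without), whereas accuracy pools both classes and therefore shifts with nodule prevalence. The sketch below illustrates this with hypothetical counts; `Confusion` and `at_prevalence` are illustrative names and are not part of the model evaluated in the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # reports with a nodule that the model flagged
    fn: int  # reports with a nodule that the model missed
    tn: int  # reports without a nodule that the model did not flag
    fp: int  # reports without a nodule that the model flagged

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)  # uses positive reports only

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)  # uses negative reports only

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)  # mixes both classes


def at_prevalence(sens: float, spec: float, n: int, prevalence: float) -> Confusion:
    """Expected (rounded) counts for n reports at a given nodule prevalence,
    holding sensitivity and specificity fixed."""
    positives = round(n * prevalence)
    negatives = n - positives
    tp = round(sens * positives)
    tn = round(spec * negatives)
    return Confusion(tp=tp, fn=positives - tp, tn=tn, fp=negatives - tn)


if __name__ == "__main__":
    for prevalence in (0.1, 0.5):
        c = at_prevalence(0.97, 0.99, 1000, prevalence)
        print(prevalence, round(c.sensitivity(), 3), round(c.specificity(), 3), round(c.accuracy(), 3))
```

With a hypothetical sensitivity of 0.97 and specificity of 0.99, accuracy is 0.988 at 10% prevalence and 0.98 at 50% prevalence, although sensitivity and specificity are unchanged.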
Affiliation(s)
- Alireza Mojibian
- Sapien Machine Learning Corporation (SapienML), Vancouver, BC, Canada
- Jeff Jaskolka
- Radiology Department, Brampton Civic Hospital, Brampton, ON, Canada
- Faculty of Medicine - Medical Imaging, University of Toronto, Toronto, ON, Canada
- Geoffrey Ching
- Schulich School of Medicine & Dentistry - University of Western Ontario, London, ON, Canada
- Brian Lee
- Sapien Machine Learning Corporation (SapienML), Vancouver, BC, Canada
- Renelle Myers
- Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- BC Cancer Agency, Provincial Health Services Authority, Vancouver, BC, Canada
- Respirology, Vancouver General Hospital, Vancouver, BC, Canada
- Chloe Devine
- Sapien Machine Learning Corporation (SapienML), Vancouver, BC, Canada
- Savvas Nicolaou
- Sapien Machine Learning Corporation (SapienML), Vancouver, BC, Canada
- Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Radiology Department, Vancouver General Hospital, Vancouver, BC, Canada
- William Parker
- Sapien Machine Learning Corporation (SapienML), Vancouver, BC, Canada
- Radiology Department, Vancouver General Hospital, Vancouver, BC, Canada
- Radiology Department, Nanaimo Regional General Hospital, Nanaimo, BC, Canada
3
Li M, Zhuang L, Hu S, Sun L, Liu Y, Dou Z, Jiang T. Intelligent diagnosis of lung nodule images based on machine learning in the context of lung teaching. Medicine (Baltimore) 2024; 103:e37266. [PMID: 38457590 PMCID: PMC10919509 DOI: 10.1097/md.0000000000037266] [Received: 07/23/2023] [Revised: 01/21/2024] [Accepted: 01/24/2024] [Indexed: 03/10/2024]
Abstract
Most intelligent diagnosis models have widespread problems that seriously affect medical staff's judgment of patients' conditions, so different algorithms are needed depending on the situation. This study proposes a model for intelligent diagnosis of lung nodule images based on machine learning, using a support vector machine (SVM)-based algorithm, with the aim of improving the diagnostic accuracy of intelligent lung nodule image diagnosis. The objectives are broken down into algorithm selection and model construction, and the proposed optimized model is solved using machine learning techniques to implement the algorithm selected for intelligent diagnosis of lung nodule images. The validation results showed that dimensionality reduction of the features produced 17 × 1120 and 17 × 2980 non-node matrices with 1216 nodes and 3407 non-nodes in 17 features. Compared with other classification methods, the SVM classification method offered advantages in accuracy, sensitivity, and specificity. Because there were some anomalies among both benign and malignant tumors and no discernible difference between them, the distribution of median values showed that the data were symmetrical in terms of texture and gray scale. Non-small nodules can be distinguished from benign nodules, but more training is needed to separate them from the other 2 types. Pulmonary nodules are a common disease. MN are distinct from the other 2 types, non-small nodules and benign small nodules, which require further training to differentiate. This has great practical value in teaching practice. Therefore, building a machine learning-based intelligent diagnostic model for pulmonary nodules is of significant importance in helping to solve medical imaging diagnostic problems.
Affiliation(s)
- Miaomiao Li
- Department of Respiratory and Critical Care Medicine, The Fourth Affiliated Hospital Zhejiang University School of Medicine, Yiwu, Zhejiang, People’s Republic of China
- Lilei Zhuang
- Department of Gastroenterology, Yiwu Central Hospital, Yiwu, Zhejiang, People’s Republic of China
- Sheng Hu
- Department of Radiology, The Fourth Affiliated Hospital Zhejiang University School of Medicine, Yiwu, Zhejiang, People’s Republic of China
- Li Sun
- Department of Respiratory and Critical Care Medicine, The Fourth Affiliated Hospital Zhejiang University School of Medicine, Yiwu, Zhejiang, People’s Republic of China
- Yangxiang Liu
- Department of Respiratory and Critical Care Medicine, The Fourth Affiliated Hospital Zhejiang University School of Medicine, Yiwu, Zhejiang, People’s Republic of China
- Zhengwei Dou
- Department of Respiratory and Critical Care Medicine, The Fourth Affiliated Hospital Zhejiang University School of Medicine, Yiwu, Zhejiang, People’s Republic of China
- Tao Jiang
- Department of Respiratory and Critical Care Medicine, The Fourth Affiliated Hospital Zhejiang University School of Medicine, Yiwu, Zhejiang, People’s Republic of China
4
Nobel JM, Puts S, Krdzalic J, Zegers KML, Lobbes MBI, Robben SGF, Dekker ALAJ. Natural Language Processing Algorithm Used for Staging Pulmonary Oncology from Free-Text Radiological Reports: "Including PET-CT and Validation Towards Clinical Use". Journal of Imaging Informatics in Medicine 2024; 37:3-12. [PMID: 38343237 DOI: 10.1007/s10278-023-00913-x] [Received: 05/22/2023] [Revised: 08/26/2023] [Accepted: 09/03/2023] [Indexed: 03/02/2024]
Abstract
Natural language processing (NLP) can be used to process and structure free text, such as free-text radiological reports. In radiology, it is important that reports are complete and accurate for clinical staging of, for instance, pulmonary oncology. A computed tomography (CT) or positron emission tomography (PET)-CT scan is of great importance in tumor staging, and NLP may add value to the radiological report when used in the staging process, as it may be able to extract the T and N stages of the 8th edition of the tumor-node-metastasis (TNM) classification system. The purpose of this study is to evaluate a new TN algorithm (TN-PET-CT), created by adding a layer of metabolic activity to an existing rule-based NLP algorithm (TN-CT). This new TN-PET-CT algorithm is capable of staging chest CT examinations as well as PET-CT scans. The study design also allowed a subgroup analysis for external validation of the prior TN-CT algorithm. For information extraction and matching, pyContextNLP, spaCy, and regular expressions were used. The overall TN accuracy of the TN-PET-CT algorithm was 0.73 in the training set (N = 63) and 0.62 in the validation set (N = 100). In external validation, the TN-CT classifier (N = 65) achieved an accuracy of 0.72. Overall, it is possible to adapt the TN-CT algorithm into a TN-PET-CT algorithm. However, outcomes depend heavily on the accuracy of the report, the vocabulary used, and the context used to express, for example, uncertainty. This is true both for the adapted PET-CT algorithm and for the CT algorithm when applied in another hospital.
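Regular expressions are among the extraction tools named above. As an illustration only, the sketch below pulls the largest tumor dimension out of a single report sentence and maps it to a T category using the size thresholds of the 8th edition TNM classification for lung cancer. It is a deliberate simplification and not the TN-CT or TN-PET-CT algorithm: actual T staging also depends on invasion, location, and separate tumor nodules, and the sketch assumes the sentence has already been identified as describing the primary tumor, so lymph node measurements are not mixed in. The pattern and function names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern: one or more dimensions ("3.4 x 2.1") followed by a unit (mm or cm).
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?(?:\s*[x×]\s*\d+(?:\.\d+)?)*)\s*(mm|cm)\b", re.IGNORECASE)

# Inclusive upper bounds (cm) of the size-based T categories in the 8th edition TNM.
T_BY_SIZE = [(1.0, "T1a"), (2.0, "T1b"), (3.0, "T1c"), (4.0, "T2a"), (5.0, "T2b"), (7.0, "T3")]


def largest_size_cm(sentence: str) -> Optional[float]:
    """Return the largest dimension mentioned in the sentence, in cm, or None."""
    sizes = []
    for match in SIZE_RE.finditer(sentence):
        dims = [float(d) for d in re.split(r"\s*[x×]\s*", match.group(1), flags=re.IGNORECASE)]
        largest = max(dims)
        sizes.append(largest / 10 if match.group(2).lower() == "mm" else largest)
    return max(sizes) if sizes else None


def t_category_by_size(size_cm: float) -> str:
    """Map a greatest tumor dimension to a size-only T category (>7 cm is T4)."""
    for upper_cm, category in T_BY_SIZE:
        if size_cm <= upper_cm:
            return category
    return "T4"


if __name__ == "__main__":
    sentence = "Spiculated mass in the right upper lobe measuring 3.4 x 2.1 cm."
    size = largest_size_cm(sentence)
    print(size, t_category_by_size(size))
```

For the example sentence this prints `3.4 T2a`. Handling hedged or negated mentions (for example "no suspicious mass") is the kind of context that tools such as pyContextNLP address and is not covered here.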
Affiliation(s)
- J Martijn Nobel
- Department of Radiology and Nuclear Medicine, Maastricht University Medical Center+, Postbox 5800, 6202 AZ, Maastricht, Netherlands
- School of Health Professions Education, Maastricht University, Maastricht, Netherlands
- Sander Puts
- Department of Radiation Oncology (MAASTRO), Maastricht, Netherlands
- GROW School for Oncology and Reproduction, Maastricht University, Maastricht, Netherlands
- Jasenko Krdzalic
- Zuyderland Medical Center, Department of Medical Imaging, Sittard-Geleen, Netherlands
- Karen M L Zegers
- Department of Radiation Oncology (MAASTRO), Maastricht, Netherlands
- GROW School for Oncology and Reproduction, Maastricht University, Maastricht, Netherlands
- Marc B I Lobbes
- Department of Radiology and Nuclear Medicine, Maastricht University Medical Center+, Postbox 5800, 6202 AZ, Maastricht, Netherlands
- GROW School for Oncology and Reproduction, Maastricht University, Maastricht, Netherlands
- Zuyderland Medical Center, Department of Medical Imaging, Sittard-Geleen, Netherlands
- Simon G F Robben
- Department of Radiology and Nuclear Medicine, Maastricht University Medical Center+, Postbox 5800, 6202 AZ, Maastricht, Netherlands
- School of Health Professions Education, Maastricht University, Maastricht, Netherlands
- André L A J Dekker
- Department of Radiation Oncology (MAASTRO), Maastricht, Netherlands
- GROW School for Oncology and Reproduction, Maastricht University, Maastricht, Netherlands
5
Jin Y, Kattan MW. Methodologic Issues Specific to Prediction Model Development and Evaluation. Chest 2023; 164:1281-1289. [PMID: 37414333 DOI: 10.1016/j.chest.2023.06.038] [Received: 03/23/2023] [Revised: 06/26/2023] [Accepted: 06/27/2023] [Indexed: 07/08/2023]
Abstract
Developing and evaluating statistical prediction models is challenging, and many pitfalls can arise. This article identifies what the authors believe are some common methodologic concerns that may be encountered. We describe each problem and make suggestions regarding how to address them. The hope is that this article will result in higher-quality publications of statistical prediction models.
Affiliation(s)
- Yuxuan Jin
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH
- Michael W Kattan
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH
6
Basilio R, Carvalho AR, Rodrigues R, Conrado M, Accorsi S, Forghani R, Machuca T, Zanon M, Altmayer S, Hochhegger B. Natural Language Processing for the Identification of Incidental Lung Nodules in Computed Tomography Reports: A Quality Control Tool. JCO Glob Oncol 2023; 9:e2300191. [PMID: 37769221 PMCID: PMC10581645 DOI: 10.1200/go.23.00191] [Received: 06/12/2023] [Revised: 07/09/2023] [Accepted: 08/22/2023] [Indexed: 09/30/2023]
Abstract
PURPOSE To evaluate the diagnostic performance of a natural language processing (NLP) model in detecting incidental lung nodules (ILNs) in unstructured chest computed tomography (CT) reports. METHODS All consecutive unstructured reports of chest CT scans performed at a tertiary hospital between 2020 and 2021 (n = 21,542) were retrospectively reviewed to train the NLP tool. Internal validation was performed on a separate cohort of 300 chest CT scans, using reference readings of both the CT scans and the reports by two radiologists. Second, external validation was performed on a random cohort of unstructured chest CT reports from 57 different hospitals, for examinations performed in May 2022. A review by the same thoracic radiologists was used as the gold standard. Sensitivity, specificity, and accuracy were calculated. RESULTS Of 21,542 CT reports, 484 mentioned at least one ILN (mean age, 71 ± 17.6 [standard deviation] years; women, 52%) and were included in the training set. In the internal validation (n = 300), the NLP tool detected ILNs with a sensitivity of 100.0% (95% CI, 97.6 to 100.0), a specificity of 95.9% (95% CI, 91.3 to 98.5), and an accuracy of 98.0% (95% CI, 95.7 to 99.3). In the external validation (n = 977), the NLP tool yielded a sensitivity of 98.4% (95% CI, 94.5 to 99.8), a specificity of 98.6% (95% CI, 97.5 to 99.3), and an accuracy of 98.6% (95% CI, 97.6 to 99.2). Twelve months after the initial reports, 8 (8.60%) patients had a final diagnosis of lung cancer, of whom 2 (2.15%) would have been lost to follow-up without the NLP tool. CONCLUSION NLP can be used to identify ILNs in unstructured reports with high accuracy, allowing timely recall of patients and a potential diagnosis of early-stage lung cancer that might otherwise have been lost to follow-up.
Affiliation(s)
- Rodrigo Basilio
- D'Or Institute for Research and Education (IDOR), Rio de Janeiro, Brazil
- Rosana Rodrigues
- D'Or Institute for Research and Education (IDOR), Rio de Janeiro, Brazil
- Marco Conrado
- D'Or Institute for Research and Education (IDOR), Rio de Janeiro, Brazil
- Sephania Accorsi
- D'Or Institute for Research and Education (IDOR), Rio de Janeiro, Brazil
- Reza Forghani
- Radiomics and Augmented Intelligence Laboratory (RAIL), University of Florida, Gainesville, FL
- Tiago Machuca
- D'Or Institute for Research and Education (IDOR), Rio de Janeiro, Brazil
- Matheus Zanon
- Federal University of Health Sciences of Porto Alegre, Porto Alegre, Brazil
- Stephan Altmayer
- Stanford Hospital, Stanford University Medical Center, Palo Alto, CA
- Bruno Hochhegger
- Radiomics and Augmented Intelligence Laboratory (RAIL), University of Florida, Gainesville, FL
- Federal University of Health Sciences of Porto Alegre, Porto Alegre, Brazil
7
Lee K, Liu Z, Chandran U, Kalsekar I, Laxmanan B, Higashi MK, Jun T, Ma M, Li M, Mai Y, Gilman C, Wang T, Ai L, Aggarwal P, Pan Q, Oh W, Stolovitzky G, Schadt E, Wang X. Detecting Ground Glass Opacity Features in Patients With Lung Cancer: Automated Extraction and Longitudinal Analysis via Deep Learning-Based Natural Language Processing. JMIR AI 2023; 2:e44537. [PMID: 38875565 PMCID: PMC11041451 DOI: 10.2196/44537] [Received: 11/23/2022] [Revised: 01/30/2023] [Accepted: 03/31/2023] [Indexed: 06/16/2024]
Abstract
BACKGROUND Ground-glass opacities (GGOs) appearing in computed tomography (CT) scans may indicate potential lung malignancy. Proper management of GGOs based on their features can prevent the development of lung cancer. Electronic health records are rich sources of information on GGO nodules and their granular features, but most of the valuable information is embedded in unstructured clinical notes. OBJECTIVE We aimed to develop, test, and validate a deep learning-based natural language processing (NLP) tool that automatically extracts GGO features from large-scale radiology notes to inform the longitudinal trajectory of GGO status. METHODS We developed a deep learning NLP pipeline based on a bidirectional long short-term memory network with a conditional random field layer to retrospectively extract GGOs and their granular features from the radiology notes of 13,216 lung cancer patients. We evaluated the pipeline with quality assessments and characterized the cohort by analyzing the longitudinal distribution of nodule features to assess changes in size and solidity over time. RESULTS Our NLP pipeline, built on the GGO ontology we developed, achieved precision of 95% to 100%, recall of 89% to 100%, and F1-scores of 92% to 100% across different GGO features. We deployed this GGO NLP model to extract and structure comprehensive characteristics of GGOs from 29,496 radiology notes of 4521 lung cancer patients. Longitudinal analysis revealed that, in the last note compared with the first note, size increased in 16.8% (240/1424) of patients, decreased in 14.6% (208/1424), and remained unchanged in 68.5% (976/1424). Among 1127 patients who had longitudinal radiology notes of GGO status, 815 (72.3%) were reported to have stable status, and 259 (23%) had increased/progressed status in subsequent notes.
CONCLUSIONS Our deep learning-based NLP pipeline can automatically extract granular GGO features at scale from electronic health records when this information is documented in radiology notes and help inform the natural history of GGO. This will open the way for a new paradigm in lung cancer prevention and early detection.
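The longitudinal comparison in the RESULTS (size in the last note versus the first note, per patient) can be sketched as follows. The 1 mm tolerance for calling a size "unchanged" is a hypothetical parameter, not a value from the study, and `size_trajectory` and `summarize` are illustrative names; the sketch also assumes sizes have already been extracted and normalized to millimeters.

```python
from collections import Counter
from typing import Dict, Sequence


def size_trajectory(sizes_mm: Sequence[float], tolerance_mm: float = 1.0) -> str:
    """Compare the last reported size with the first (tolerance_mm is hypothetical)."""
    if len(sizes_mm) < 2:
        raise ValueError("need at least two notes to compare")
    delta = sizes_mm[-1] - sizes_mm[0]
    if delta > tolerance_mm:
        return "increased"
    if delta < -tolerance_mm:
        return "decreased"
    return "unchanged"


def summarize(patients: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Proportion of patients in each trajectory category."""
    if not patients:
        raise ValueError("no patients to summarize")
    counts = Counter(size_trajectory(sizes) for sizes in patients.values())
    total = sum(counts.values())
    return {k: counts[k] / total for k in ("increased", "decreased", "unchanged")}


if __name__ == "__main__":
    # Hypothetical per-patient size series (mm), in note order.
    demo = {"p1": [5.0, 8.0], "p2": [6.0, 6.5], "p3": [10.0, 7.0], "p4": [4.0, 4.0]}
    print(summarize(demo))
```

Only the first and last values are compared, matching the description above; intermediate notes would matter for other definitions of progression.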
Affiliation(s)
- Urmila Chandran
- Lung Cancer Initiative, Johnson & Johnson, New Brunswick, NJ, United States
- Iftekhar Kalsekar
- Lung Cancer Initiative, Johnson & Johnson, New Brunswick, NJ, United States
- Balaji Laxmanan
- Lung Cancer Initiative, Johnson & Johnson, New Brunswick, NJ, United States
- Tomi Jun
- Sema4, Stamford, CT, United States
- Meng Ma
- Sema4, Stamford, CT, United States
- Yun Mai
- Sema4, Stamford, CT, United States
- Lei Ai
- Sema4, Stamford, CT, United States
- Qi Pan
- Sema4, Stamford, CT, United States
- William Oh
- Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Eric Schadt
- Icahn School of Medicine at Mount Sinai, New York, NY, United States
8
Bobba PS, Sailer A, Pruneski JA, Beck S, Mozayan A, Mozayan S, Arango J, Cohan A, Chheang S. Natural language processing in radiology: Clinical applications and future directions. Clin Imaging 2023; 97:55-61. [PMID: 36889116 DOI: 10.1016/j.clinimag.2023.02.014] [Received: 12/19/2022] [Revised: 02/10/2023] [Accepted: 02/20/2023] [Indexed: 03/07/2023]
Abstract
Natural language processing (NLP) encompasses a wide range of techniques that allow computers to interact with human text. Everyday applications of NLP include language translation aids, chatbots, and text prediction. NLP has been increasingly used in the medical field as reliance on electronic health records has grown. Because findings in radiology are communicated primarily via text, the field is particularly well suited to benefit from NLP-based applications. Furthermore, rapidly increasing imaging volume will continue to add to the burden on clinicians, emphasizing the need for workflow improvements. In this article, we highlight the numerous non-clinical, provider-focused, and patient-focused applications of NLP in radiology. We also comment on challenges associated with the development and incorporation of NLP-based applications in radiology, as well as potential future directions.
Affiliation(s)
- Pratheek S Bobba
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, United States
- Anne Sailer
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, United States
- Spencer Beck
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, United States
- Ali Mozayan
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, United States
- Sara Mozayan
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, United States
- Jennifer Arango
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, United States
- Arman Cohan
- Department of Computer Science, Yale University, New Haven, CT, United States
- Sophie Chheang
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, United States
9
Zhang Y, Grant BMM, Hope AJ, Hung RJ, Warkentin MT, Lam ACL, Aggawal R, Xu M, Shepherd FA, Tsao MS, Xu W, Pakkal M, Liu G, McInnis MC. Using Recurrent Neural Networks to Extract High-Quality Information From Lung Cancer Screening Computerized Tomography Reports for Inter-Radiologist Audit and Feedback Quality Improvement. JCO Clin Cancer Inform 2023; 7:e2200153. [PMID: 36930839 DOI: 10.1200/cci.22.00153] [Indexed: 03/19/2023]
Abstract
PURPOSE Lung cancer screening programs generate a high volume of low-dose computed tomography (LDCT) reports that contain valuable information, typically in a free-text format. High-performance named-entity recognition (NER) models can extract relevant information from these reports automatically for inter-radiologist quality control. METHODS Using LDCT report data from a longitudinal lung cancer screening program (8,305 reports; 3,124 participants; 2006-2019), we trained a rule-based model and two bidirectional long short-term memory (Bi-LSTM) NER neural network models to detect clinically relevant information from LDCT reports. Model performance was tested using F1 scores and compared with a published open-source radiology NER model (Stanza) in an independent evaluation set of 150 reports. The top performing model was applied to a data set of 6,948 reports for an inter-radiologist quality control assessment. RESULTS The best performing model, a Bi-LSTM NER recurrent neural network model, had an overall F1 score of 0.950, which outperformed Stanza (F1 score = 0.872) and a rule-based NER model (F1 score = 0.809). Recall (sensitivity) for the best Bi-LSTM model ranged from 0.916 to 0.991 for different entity types; precision (positive predictive value) ranged from 0.892 to 0.997. Test performance remained stable across time periods. There was an average of a 2.86-fold difference in the number of identified entities between the most and the least detailed radiologists. CONCLUSION We built an open-source Bi-LSTM NER model that outperformed other open-source or rule-based radiology NER models. This model can efficiently extract clinically relevant information from lung cancer screening computerized tomography reports with high accuracy, enabling efficient audit and feedback to improve quality of patient care.
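Bi-LSTM NER models of this kind usually assign one tag per token, commonly in a BIO (begin/inside/outside) scheme, and the tags must be grouped into entity spans before entity-level F1 scores can be computed. The abstract does not specify the tagging scheme, so the sketch below assumes BIO tags; `bio_to_spans` is a hypothetical helper, and the tokens and labels in the example are illustrative.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (first token index, end index exclusive, entity label)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Group per-token BIO tags into entity spans.

    A "B-" tag always opens a new entity. An "I-" tag continues the open entity
    of the same label; otherwise it is treated as opening a new entity (a
    common lenient convention). "O" closes any open entity.
    """
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if starts_new or tag == "O":
            if label is not None:
                spans.append((start, i, label))  # close the open entity
            start, label = (i, tag[2:]) if starts_new else (None, None)
        # An "I-" tag matching the open label simply extends the entity.
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


if __name__ == "__main__":
    tokens = ["A", "4", "mm", "nodule", "in", "RUL"]
    tags = ["O", "B-Size", "I-Size", "B-Nodule", "O", "B-Location"]
    for s, e, lab in bio_to_spans(tags):
        print(lab, " ".join(tokens[s:e]))
```

The resulting spans can then be compared with gold-standard spans by exact match to obtain entity-level precision, recall, and F1.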
Affiliation(s)
- Yucheng Zhang
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Benjamin M M Grant
- Medical Oncology and Hematology, Princess Margaret Cancer Centre, Toronto, ON, Canada
- Andrew J Hope
- Radiation Medicine Program, Princess Margaret Cancer Centre, and Department of Radiation Oncology, University of Toronto, Toronto, ON, Canada
- Rayjean J Hung
- Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health Systems, Toronto, ON, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Matthew T Warkentin
- Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health Systems, Toronto, ON, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Andrew C L Lam
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Medical Oncology and Hematology, Princess Margaret Cancer Centre, Toronto, ON, Canada
- Reenika Aggawal
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Medical Oncology and Hematology, Princess Margaret Cancer Centre, Toronto, ON, Canada
- Maria Xu
- Medical Oncology and Hematology, Princess Margaret Cancer Centre, Toronto, ON, Canada
- Frances A Shepherd
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Medical Oncology and Hematology, Princess Margaret Cancer Centre, Toronto, ON, Canada
- Ming-Sound Tsao
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Laboratory Medicine and Pathology, University Health Network, Toronto, ON, Canada
- Wei Xu
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Biostatistics, Princess Margaret Cancer Centre, Toronto, ON, Canada
- Computational Biology and Medicine Program, Princess Margaret Cancer Centre, Toronto, ON, Canada
- Mini Pakkal
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Division of Cardiothoracic Imaging, Joint Department of Medical Imaging, Toronto General Hospital, Toronto, ON, Canada
- Geoffrey Liu
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Medical Oncology and Hematology, Princess Margaret Cancer Centre, Toronto, ON, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Biostatistics, Princess Margaret Cancer Centre, Toronto, ON, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Micheal C McInnis
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Division of Cardiothoracic Imaging, Joint Department of Medical Imaging, Toronto General Hospital, Toronto, ON, Canada
10
Wayne MT, Prescott HC, Arenberg DA. Prevalence and consequences of non-adherence to an evidence-based approach for incidental pulmonary nodules. PLoS One 2022; 17:e0274107. [PMID: 36084105 PMCID: PMC9462825 DOI: 10.1371/journal.pone.0274107] [Received: 02/16/2022] [Accepted: 08/22/2022] [Indexed: 11/18/2022]
Abstract
Importance: Distinguishing benign from malignant pulmonary nodules is challenging. Evidence-based guidelines exist, but their impact on patient-centered outcomes is unknown. Objective: To understand whether evaluation of incidental pulmonary nodules that follows an evidence-based management strategy is associated with fewer invasive procedures for benign lesions and/or fewer delays in cancer diagnosis. Design: Retrospective cohort study. Setting: Large academic medical center. Participants: Adults (≥18 years of age) with an incidental pulmonary nodule discovered between January 2012 and December 2014. Patients with calcified nodules, prior nodules, a prior diagnosis of cancer, high suspicion for pulmonary metastasis, or limited life expectancy were excluded. Exposure: Nodule management strategy (pre-specified based on evidence-based practices). Outcome: Composite of any invasive procedure for a benign nodule or delay in diagnosis in patients with cancer (>3-month delay once the probability of cancer was >15%). Results: Of 314 patients who met the inclusion criteria, the median age was 61, 46.5% were men, and 66.5% had current or former tobacco use. The mean nodule size was 10.3 mm, the mean probability of cancer was 11.8%, and 14.3% of nodules were malignant. Evaluation followed an evidence-based strategy in 245 patients (78.0%) and deviated in 69 patients (22%). The composite outcome occurred in 26 (8.3%) patients. Among patients whose nodule evaluation was concordant with an evidence-based evaluation, 6.1% (15/245) experienced the composite outcome, versus 15.9% (11/69) of patients whose evaluation deviated from evidence-based recommendations (P<0.01). Conclusions and relevance: At a large academic medical center, more than 1 in 5 patients with an incidental pulmonary nodule underwent evaluation that deviated from evidence-based practice recommendations.
Nodule evaluation that deviated from an evidence-based strategy was associated with biopsy of benign lesions and delays in cancer diagnosis, suggesting a need to improve guideline uptake.
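The headline comparison in this abstract (15/245 versus 11/69, P<0.01) can be approximately reproduced with a pooled two-proportion z-test. This is a minimal sketch, assuming that test (equivalent to an uncorrected chi-square test on the 2×2 table); the abstract does not state which test the authors used, and the function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z-test (two-sided).

    Returns (p1, p2, z, p_value). Assumes the normal approximation is
    adequate, i.e. expected cell counts are not too small.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0: p1 == p2
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1, p2, z, p_value


# Counts reported in the abstract: concordant (15/245) vs deviated (11/69).
p1, p2, z, p = two_proportion_z_test(15, 245, 11, 69)
print(f"{p1:.1%} vs {p2:.1%}, z = {z:.2f}, P = {p:.4f}")
```

Under this assumed test, the output reproduces the 6.1% and 15.9% composite-outcome rates and a P value below 0.01. A continuity-corrected or exact test would give a somewhat larger P value.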
Affiliation(s)
- Max T. Wayne
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, United States of America
- Hallie C. Prescott
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, United States of America
- VA Center for Clinical Management Research, Ann Arbor, MI, United States of America
- Douglas A. Arenberg
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, United States of America
11
Farjah F, Monsell SE, Smith-Bindman R, Gould MK, Banegas MP, Ramaprasan A, Schoen K, Buist DSM, Greenlee R. Fleischner Society Guideline Recommendations for Incidentally Detected Pulmonary Nodules and the Probability of Lung Cancer. J Am Coll Radiol 2022; 19:1226-1235. [PMID: 36049538 DOI: 10.1016/j.jacr.2022.06.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/06/2022] [Revised: 05/24/2022] [Accepted: 06/03/2022] [Indexed: 11/18/2022]
Abstract
PURPOSE The Fleischner Society aims to limit further evaluation of incidentally detected pulmonary nodules when the probability of lung cancer is <1% and to pursue further evaluation when the probability of lung cancer is ≥1%. To evaluate the internal consistency of the guideline goals and recommendations, the authors evaluated stratum-specific recommendations and 2-year probabilities of lung cancer. METHODS A retrospective cohort study (2005-2015) was conducted of individuals enrolled in one of two integrated health systems with solid nodules incidentally detected on CT. The 2017 Fleischner Society guidelines were used to define strata on the basis of smoking status and nodule size and number. Lung cancer diagnoses within 2 years of nodule detection were ascertained using cancer registry data. Confidence interval (CI) inspection was used to determine whether stratum-specific probabilities of lung cancer differed from 1%. RESULTS Among 5,444 individuals with incidentally detected lung nodules (median age, 66 years; 54% women; 57% smoked; median nodule size, 5.5 mm; 55% with multiple nodules), 214 (3.9%; 95% CI, 3.4%-4.5%) were diagnosed with lung cancer within 2 years. Fleischner Society goals and recommendations were aligned for 7 of 12 strata (58%), comprising 2,765 patients (51%) and 194 lung cancer cases (91%). Alignment was indeterminate for 5 strata (42%), comprising 2,679 patients (49%) and 20 lung cancer cases (9%), because the CIs for the probability of lung cancer spanned 1%. CONCLUSIONS Fleischner Society guideline goals and recommendations align at least half the time. It is uncertain whether they align more often.
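The CI-inspection step described above can be sketched as follows: compute a 95% CI for a stratum's proportion of lung cancer and check whether it lies entirely above, entirely below, or across the 1% threshold. The Wilson score interval is an assumption (the abstract does not say how the CIs were computed), the helper names `wilson_interval` and `position_vs_threshold` are hypothetical, and only the overall count of 214 cancers among 5,444 individuals comes from the abstract.

```python
import math


def wilson_interval(events: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% when z = 1.96)."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("require n > 0 and 0 <= events <= n")
    p = events / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def position_vs_threshold(events: int, n: int, threshold: float = 0.01) -> str:
    """Classify a stratum by where its CI lies relative to the threshold."""
    lower, upper = wilson_interval(events, n)
    if lower > threshold:
        return "above"  # CI entirely >= threshold: further evaluation is the goal
    if upper < threshold:
        return "below"  # CI entirely < threshold: limiting evaluation is the goal
    return "indeterminate"  # CI spans the threshold


# Overall cohort from the abstract: 214 cancers among 5,444 individuals.
low, high = wilson_interval(214, 5444)
print(f"{214 / 5444:.1%} (95% CI, {low:.1%}-{high:.1%}):",
      position_vs_threshold(214, 5444))
```

For the overall cohort this sketch gives an interval of about 3.4% to 4.5%, consistent with the abstract. Alignment for a stratum would then be judged by comparing the label with that stratum's Fleischner recommendation; an "indeterminate" label corresponds to the 5 strata whose CIs spanned 1%.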
Affiliation(s)
- Farhood Farjah
- Department of Surgery, University of Washington, Seattle, Washington
- Sarah E Monsell
- Department of Biostatistics, University of Washington, Seattle, Washington
- Rebecca Smith-Bindman
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California
- Michael K Gould
- Department of Health Systems Science, Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, California
- Matthew P Banegas
- Department of Radiation Medicine and Applied Sciences, University of San Diego, San Diego, California
- Arvind Ramaprasan
- Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Kurt Schoen
- Marshfield Clinic Research Institute, Marshfield, Wisconsin
- Diana S M Buist
- Kaiser Permanente Washington Health Research Institute, Seattle, Washington
12
Zhang D, Neely B, Lo JY, Patel BN, Hyslop T, Gupta RT. Utility of a Rule-Based Algorithm in the Assessment of Standardized Reporting in PI-RADS. Acad Radiol 2022; 30:1141-1147. [PMID: 35909050 DOI: 10.1016/j.acra.2022.06.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2022] [Revised: 06/15/2022] [Accepted: 06/28/2022] [Indexed: 11/26/2022]
Abstract
RATIONALE AND OBJECTIVES Adoption of the Prostate Imaging Reporting & Data System (PI-RADS) has been shown to increase detection of clinically significant prostate cancer on prostate mpMRI. We propose that a rule-based algorithm using Regular Expression (RegEx) matching can automatically categorize prostate mpMRI reports as a means of identifying opportunities for quality improvement. MATERIALS AND METHODS All prostate mpMRIs performed in the Duke University Health System from January 2, 2015, to January 29, 2021, were analyzed. After exclusion criteria were applied, 5343 male patients and 6264 prostate mpMRI reports remained. Our RegEx algorithm then categorized these reports as PI-RADS 1 through PI-RADS 5, Recurrent Disease, or "No Information Available." A stratified random sample of 502 mpMRI reports was reviewed by a blinded clinical team to assess the performance of the RegEx algorithm. RESULTS Compared with manual review, the RegEx algorithm achieved an overall accuracy of 92.6%, average precision of 88.8%, average recall of 85.6%, and F1 score of 0.871. The clinical team also reviewed 344 cases classified as "No Information Available" and found that in 150 instances, no numerical PI-RADS score for any lesion was included in the impression section of the mpMRI report. CONCLUSION Rule-based processing is an accurate method for the large-scale, automated extraction of PI-RADS scores from the text of radiology reports. These natural language processing approaches can be used in future quality improvement initiatives for prostate mpMRI reporting with PI-RADS.
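A rule-based RegEx categorizer of the kind described above might look like the sketch below. The patterns, the rule of taking the highest PI-RADS score mentioned, and the precedence of an explicit score over recurrence keywords are all hypothetical assumptions; the abstract does not publish the actual expressions. Only the output categories come from the abstract, and the sketch assumes the impression section has already been isolated.

```python
import re

# Hypothetical pattern: "PI-RADS 4", "PIRADS score: 5", "PI-RADS v2.1 category 3", ...
PIRADS_RE = re.compile(
    r"PI-?RADS(?:\s*v2(?:\.1)?)?\s*(?:category|score)?\s*[:=]?\s*([1-5])\b",
    re.IGNORECASE,
)
# Hypothetical recurrence keywords; negation ("no evidence of recurrence")
# is NOT handled in this sketch.
RECURRENCE_RE = re.compile(r"\brecurren(?:t|ce)\b", re.IGNORECASE)


def categorize_report(impression: str) -> str:
    """Map the impression section of an mpMRI report to one category."""
    scores = [int(s) for s in PIRADS_RE.findall(impression)]
    if scores:
        # Assumption: the report-level category is the highest lesion score.
        return f"PI-RADS {max(scores)}"
    if RECURRENCE_RE.search(impression):
        return "Recurrent Disease"
    return "No Information Available"


for text in [
    "Lesion 1: PI-RADS 4. Lesion 2: PI-RADS 3.",
    "Findings suspicious for local recurrence in the prostatectomy bed.",
    "No focal lesion identified.",
]:
    print(categorize_report(text))
```

Checking for an explicit score first means a report that states a PI-RADS score and also mentions recurrence is assigned the score; a production rule set would need negation handling and a deliberate policy for such conflicts.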
13
Hu D, Li S, Zhang H, Wu N, Lu X. Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non-Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study. JMIR Med Inform 2022; 10:e35475. [PMID: 35468085 PMCID: PMC9086872 DOI: 10.2196/35475] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/22/2021] [Revised: 03/31/2022] [Accepted: 04/11/2022] [Indexed: 11/21/2022]
Abstract
Background Lymph node metastasis (LNM) is critical to treatment decision-making for patients with resectable non–small cell lung cancer, but it is difficult to diagnose precisely before surgery. Electronic medical records (EMRs) contain a large volume of valuable information about LNM, but some key information is recorded in free text, which hinders its secondary use. Objective This study aims to develop LNM prediction models based on EMRs using natural language processing (NLP) and machine learning algorithms. Methods We developed a multiturn question answering NLP model to extract features of the primary tumor and lymph nodes from computed tomography (CT) reports. We then combined these features with other structured clinical characteristics to develop LNM prediction models using machine learning algorithms. We conducted extensive experiments to explore the effectiveness of the predictive models and compared them with size criteria based on CT image findings (a lymph node with a maximum short-axis diameter >10 mm was regarded as metastatic) and with the clinician's evaluation. Because the NLP model may extract some features incorrectly, we also calculated the concordance correlation between the predicted probabilities of models using NLP-extracted features and those of models using gold standard features, to explore the influence of NLP-driven automatic extraction. Results The random forest models achieved the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.792 and an average precision (AP) of 0.456 for pN2 LNM prediction, and an AUC of 0.768 and an AP of 0.524 for pN1&N2 LNM prediction. All machine learning models outperformed the size criteria and the clinician's evaluation. The concordance correlation between the random forest models using NLP-extracted features and those using gold standard features was 0.950, and it improved to 0.984 when the top 5 most important NLP-extracted features were replaced with gold standard features.
Conclusions The LNM models developed can achieve performance competitive with the clinician's evaluation using only limited EMR data, such as CT reports and tumor markers. The multiturn question answering NLP model can extract features effectively to support the development of LNM prediction models, which may facilitate the clinical application of predictive models.
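The concordance correlation reported above (0.950, rising to 0.984) measures agreement between two sets of predicted probabilities for the same patients. Assuming it is Lin's concordance correlation coefficient (the abstract does not name the estimator), it is rho_c = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2). The function name `lins_ccc` and the example probabilities below are hypothetical.

```python
from statistics import fmean


def lins_ccc(x: list[float], y: list[float]) -> float:
    """Lin's concordance correlation coefficient between paired values.

    Uses population (1/n) moments. Unlike Pearson's r, it penalizes
    systematic shifts, so it equals 1 only when x[i] == y[i] for all i.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    denom = sxx + syy + (mx - my) ** 2
    if denom == 0:
        return 1.0  # both sequences constant and identical
    return 2 * sxy / denom


# Hypothetical predicted probabilities for the same five patients.
gold_features = [0.12, 0.35, 0.50, 0.71, 0.88]
nlp_features = [0.10, 0.38, 0.47, 0.75, 0.86]
print(round(lins_ccc(gold_features, nlp_features), 3))
```

Replacing error-prone NLP-extracted features with gold standard ones moves the two prediction sets closer together and pushes the coefficient toward 1, which is the direction of change the abstract reports.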
Affiliation(s)
- Danqing Hu
- College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, China
- Shaolei Li
- Department of Thoracic Surgery II, Peking University Cancer Hospital and Institute, Beijing, China
- Huanyao Zhang
- College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, China
- Nan Wu
- Department of Thoracic Surgery II, Peking University Cancer Hospital and Institute, Beijing, China
- Xudong Lu
- College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, China
14
Donnelly LF, Grzeszczuk R, Guimaraes CV. Use of Natural Language Processing (NLP) in Evaluation of Radiology Reports: An Update on Applications and Technology Advances. Semin Ultrasound CT MR 2022; 43:176-181. [PMID: 35339258 DOI: 10.1053/j.sult.2022.02.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]
Abstract
Natural language processing (NLP), which focuses on the computer interpretation of human language, can be used to evaluate radiology reports and has demonstrated useful applications in essentially all aspects of medical imaging delivery: interpretation of imaging data, improvement of image acquisition, image analysis, and increased efficiency of imaging services. This manuscript reviews general technological approaches to NLP at a level intended to be understandable by clinical radiologists, discusses recent advances in NLP techniques, and discusses current and potential applications of NLP in radiology.
Affiliation(s)
- Lane F Donnelly
- University of North Carolina, School of Medicine, Department of Radiology, Chapel Hill, NC; Stanford University, School of Medicine, Department of Radiology, Palo Alto, CA; Stanford University, School of Medicine, Department of Pediatrics, Palo Alto, CA.
- Carolina V Guimaraes
- University of North Carolina, School of Medicine, Department of Radiology, Chapel Hill, NC; Stanford University, School of Medicine, Department of Radiology, Palo Alto, CA
15
Zhuang W, Tang Y, Xu W, Huang S, Deng C, Chen R, Zhang D, Zeng C, Tian D, Ben X, Lan Z, Wu H, Gao Z, Wang M, Chen Y, Shi Q, Qiao G. Should psychological distress be listed as a surgical indication for indeterminate pulmonary nodules: protocol for a prospective cohort study in real-world settings. J Thorac Dis 2022; 14:769-778. [PMID: 35399240 PMCID: PMC8987829 DOI: 10.21037/jtd-21-1423] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/27/2021] [Accepted: 02/14/2022] [Indexed: 02/05/2023]
Abstract
BACKGROUND Pulmonary nodules (PNs) are documented in up to 30% of computed tomography (CT) reports. PNs of indeterminate nature (IPNs) have been reported to be associated with increased psychological distress and deterioration in quality of life. Despite a lack of solid evidence, severe anxiety or depression has been proposed as a surgical indication in expert consensus on IPN management. So far, there is no established criterion to guide the decision-making process or to ensure evidence-based management. This study aims to evaluate whether psychological distress could be a surgical indication for IPNs and to establish an evidence-based distress threshold for necessary surgical intervention. METHODS This prospective observational study in a real-world setting will involve an expected sample of 1,253 IPN patients from the thoracic clinic of Guangdong Provincial People's Hospital. Web-based questionnaires hosted on the Wen Juan Xing (WJX) platform will be delivered to patients for baseline data collection and psychological screening. Based on our pilot study, a total of 376 IPN patients with abnormal or borderline abnormal psychological states, as assessed by the Hospital Anxiety and Depression Scale (HADS), will be followed for 1 year before the final analysis. The planned study period is from Jan 1, 2021, to Sept 30, 2022, and will entail two HADS assessments, at baseline and at follow-up. Sleep quality and indicators of healthcare-seeking behavior, such as the number of unplanned clinic visits or CT scans per year, will be used as anchors of psychological state. Patients who undergo surgical resection against the follow-up plan will be enrolled in a surgical group (expected n=94), while those who adhere to their plan will be automatically classified as a follow-up group after 1 year of follow-up (expected n=282).
Statistical methods such as the independent-samples t-test and receiver operating characteristic (ROC) analysis will be used to assess differences in psychological changes between the groups and to derive an optimal threshold signaling the need for surgery. A chi-square test or nonparametric test will be used to compare baseline characteristics. Contributors to psychological burden and their effect sizes will be evaluated using general linear regression. DISCUSSION To date, data on the psychological benefits of surgical resection of IPNs remain scarce. An evidence-based patient selection procedure using appropriate psychological screening tools is crucial for improving the quality of care and preventing overtreatment. This protocol describes the rationale and methodology for addressing this unmet clinical need using real-world data, aiming to bridge the gap between clinical guidelines and real-world practice. TRIAL REGISTRATION ClinicalTrials.gov Identifier: NCT04857333. Registered April 23, 2021.
Affiliation(s)
- Weitao Zhuang
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Yong Tang
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Wei Xu
- School of Public Health and Management, Chongqing Medical University, Chongqing, China
- Shujie Huang
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Cheng Deng
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Rixin Chen
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Research Center of Medical Sciences, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Dongkun Zhang
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Ceng Zeng
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China
- Dan Tian
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Xiaosong Ben
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Zihua Lan
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Hansheng Wu
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Department of Thoracic Surgery, The First Affiliated Hospital of Shantou University Medical College, Shantou, China
- Zhen Gao
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China
- Mengdie Wang
- School of Medicine, South China University of Technology, Guangzhou, China
- Yali Chen
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Qiuling Shi
- School of Public Health and Management, Chongqing Medical University, Chongqing, China
- Guibin Qiao
- Department of Thoracic Surgery, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
16
Li YH, Lee IT, Chen YW, Lin YK, Liu YH, Lai FP. Using Text Content From Coronary Catheterization Reports to Predict 5-Year Mortality Among Patients Undergoing Coronary Angiography: A Deep Learning Approach. Front Cardiovasc Med 2022; 9:800864. [PMID: 35295250 PMCID: PMC8918537 DOI: 10.3389/fcvm.2022.800864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2021] [Accepted: 01/24/2022] [Indexed: 11/13/2022]
Abstract
Background Current predictive models for patients undergoing coronary angiography have complex parameters that limit their clinical application. Coronary catheterization reports, which describe coronary lesions and the corresponding interventions, provide information on the severity of coronary artery disease and the completeness of revascularization. This information is relevant for predicting patient prognosis. However, no predictive model had previously been constructed from the text content of coronary catheterization reports. Objective To develop a deep learning model using text content from coronary catheterization reports to predict 5-year all-cause mortality and 5-year cardiovascular mortality for patients undergoing coronary angiography, and to compare the performance of the model with established clinical scores. Method This retrospective cohort study was conducted between January 1, 2006, and December 31, 2015. Patients admitted for coronary angiography were enrolled and followed up until August 2019. The main outcomes were 5-year all-cause mortality and 5-year cardiovascular mortality. In total, 11,576 coronary catheterization reports were collected. BioBERT (bidirectional encoder representations from transformers for biomedical text mining), a BERT-based model for the biomedical domain, was used to construct the model. The area under the receiver operating characteristic curve (AUC) was used to assess model performance. We also compared our results with the residual SYNTAX (SYNergy between PCI with TAXUS and Cardiac Surgery) score. Results The dataset was divided into training (60%), validation (20%), and test (20%) sets. The mean ages of the patients in these sets were 65.5 ± 12.1, 65.4 ± 11.2, and 65.6 ± 11.2 years, respectively. A total of 1,411 (12.2%) patients died, and 664 (5.8%) patients died of cardiovascular causes within 5 years after coronary angiography.
Our best model had an AUC of 0.822 (95% CI, 0.790–0.855) for 5-year all-cause mortality and an AUC of 0.858 (95% CI, 0.816–0.900) for 5-year cardiovascular mortality. In 300 randomly selected patients who underwent percutaneous coronary intervention (PCI), our model outperformed the residual SYNTAX score in predicting 5-year all-cause mortality (AUC, 0.867 [95% CI, 0.813–0.921] vs. 0.590 [95% CI, 0.503–0.684]) and 5-year cardiovascular mortality (AUC, 0.880 [95% CI, 0.873–0.925] vs. 0.649 [95% CI, 0.535–0.764]) after PCI. Conclusions We developed a predictive model that uses text content from coronary catheterization reports to predict 5-year mortality in patients undergoing coronary angiography. Because interventional cardiologists routinely write reports after procedures, our model can be easily implemented in the clinical setting.
Affiliation(s)
- Yu-Hsuan Li
- Department of Computer Science & Information Engineering, National Taiwan University, Taipei, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung, Taiwan
- I-Te Lee
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung, Taiwan
- School of Medicine, National Yang-Ming University, Taipei, Taiwan
- School of Medicine, Chung Shan Medical University, Taichung, Taiwan
- Yu-Wei Chen
- Cardiovascular Center, Taichung Veterans General Hospital, Taichung, Taiwan
- Yow-Kuan Lin
- Department of Computer Science, Columbia University, New York, NY, United States
- Yu-Hsin Liu
- Department of Computer Science, Columbia University, New York, NY, United States
- Fei-Pei Lai
- Department of Computer Science & Information Engineering, National Taiwan University, Taipei, Taiwan
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
- Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan
- Correspondence: Fei-Pei Lai
17
Vachani A, Zheng C, Amy Liu IL, Huang BZ, Osuji TA, Gould MK. The Probability of Lung Cancer in Patients With Incidentally Detected Pulmonary Nodules: Clinical Characteristics and Accuracy of Prediction Models. Chest 2021; 161:562-571. [PMID: 34364866 DOI: 10.1016/j.chest.2021.07.2168] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/11/2021] [Revised: 06/18/2021] [Accepted: 07/28/2021] [Indexed: 12/26/2022]
Abstract
BACKGROUND The frequency of cancer and accuracy of prediction models have not been studied in large, population-based samples of patients with incidental pulmonary nodules measuring > 8 mm in diameter. RESEARCH QUESTIONS How does the frequency of cancer vary by size and smoking history among patients with incidental nodules? How accurate are two widely used models for identifying cancer in these patients? STUDY DESIGN AND METHODS We assembled a retrospective cohort of individuals with incidental nodules measuring > 8 mm in diameter identified by chest CT imaging between 2006 and 2016. We used a validated natural language processing algorithm to identify nodules and their characteristics by scanning the text of dictated radiology reports. We reported patient and nodule characteristics stratified by the presence or absence of a lung cancer diagnosis within 27 months of nodule identification and estimated the area under the receiver operating characteristic curve (AUC) to compare the accuracy of the Mayo Clinic and Brock models for identifying cancer. RESULTS The sample included 23,780 individuals with a nodule measuring > 8 mm, including 2,356 patients (9.9%) with a lung cancer diagnosis within 27 months of nodule identification. Cancer was diagnosed in 5.4% of never smokers, 12.2% of former smokers, and 17.7% of current smokers. Cancer was diagnosed in 5.7% of patients with nodules measuring 9 to 15 mm, 12.1% of patients with nodules > 15 to 20 mm, and 18.4% of patients with nodules > 20 to 30 mm. In the full sample, the Mayo Clinic model (AUC, 0.747; 95% CI, 0.737-0.757) was more accurate than the Brock model (AUC, 0.713; 95% CI, 0.702-0.724; P < .0001). When restricted to ever smokers, the Mayo Clinic model was still more accurate. Both models overestimated the probability of cancer. INTERPRETATION Almost 10% of patients with an incidental pulmonary nodule measuring > 8 mm in diameter will receive a lung cancer diagnosis. 
Existing prediction models have only fair accuracy and overestimate the probability of cancer.
Affiliation(s)
- Anil Vachani
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA
- Chengyi Zheng
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA
- In-Lu Amy Liu
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA
- Brian Z Huang
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA
- Thearis A Osuji
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA
- Michael K Gould
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA; Department of Health Systems Science, Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, CA.