Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Bae JH, Han HW, Yang SY, Song G, Sa S, Chung GE, Seo JY, Jin EH, Kim H, An D. Development of a Natural Language Processing System for Assessing Quality Indicators from Free-Text Colonoscopy and Pathology Reports: Methodology Development and Applications (Preprint). JMIR Med Inform 2022;10:e35257. [PMID: 35436226 PMCID: PMC9055472 DOI: 10.2196/35257] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/29/2021] [Revised: 02/13/2022] [Accepted: 02/25/2022] [Indexed: 12/25/2022] Open

Cited by Other Article(s)

Omar M, Levkovich I. Exploring the efficacy and potential of large language models for depression: A systematic review. J Affect Disord 2025;371:234-244. [PMID: 39581383 DOI: 10.1016/j.jad.2024.11.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 10/21/2024] [Accepted: 11/15/2024] [Indexed: 11/26/2024]

Lindsay SE, Madison CJ, Ramsey DC, Doung YC, Gundle KR. De Novo Natural Language Processing Algorithm Accurately Identifies Myxofibrosarcoma From Pathology Reports. Clin Orthop Relat Res 2025;483:80-87. [PMID: 39360774 DOI: 10.1097/corr.0000000000003270] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/15/2023] [Accepted: 09/13/2024] [Indexed: 01/04/2025]

Abstract

BACKGROUND

Available codes in the ICD-10 do not accurately reflect soft tissue sarcoma diagnoses, and this can result in an underrepresentation of soft tissue sarcoma in databases. The National VA Database provides a unique opportunity for soft tissue sarcoma investigation because of the availability of all clinical results and pathology reports. In the setting of soft tissue sarcoma, natural language processing (NLP) has the potential to be applied to clinical documents such as pathology reports to identify soft tissue sarcoma independent of ICD codes, allowing sarcoma researchers to build more comprehensive databases capable of answering a myriad of research questions.

QUESTIONS/PURPOSES

(1) What proportion of patients with myxofibrosarcoma within the National VA Database would be missed by searching only by soft tissue sarcoma ICD codes? (2) Is a de novo NLP algorithm capable of analyzing pathology reports to accurately identify patients with myxofibrosarcoma?

METHODS

All pathology reports (10.7 million) in the national VA corporate data warehouse were identified from 2003 to 2022. Using the word-search functionality, reports from 403 veterans were found to contain the term "myxofibrosarcoma." The resulting pathology reports were manually reviewed to develop a gold-standard cohort that contained only those veterans with pathologist-confirmed myxofibrosarcoma diagnoses. The cohort had a mean ± SD age of 70 ± 12 years, and 96% (287 of 300) were men. Diagnosis codes were abstracted, and differences in appropriate ICD coding were compared. An NLP algorithm was iteratively refined and tested using confounders, negation, and emphasis terms for myxofibrosarcoma. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated for the NLP-generated cohorts through comparison with the manually reviewed gold-standard cohorts.
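
As a rough illustration of why negation terms matter when searching reports for a diagnosis, the following minimal sketch flags a report only if the target term appears without a nearby negation cue. The cue list, the 40-character window, and the function name `classify_report` are assumptions made for this example; they are not taken from the published algorithm.

```python
import re

# Hypothetical negation cues, chosen for illustration only.
NEGATION_CUES = ("no evidence of", "negative for", "rule out", "not consistent with")
TARGET = "myxofibrosarcoma"


def classify_report(text: str, window: int = 40) -> bool:
    """Return True if TARGET occurs at least once with no negation cue
    in the `window` characters immediately preceding that occurrence."""
    lowered = text.lower()
    for match in re.finditer(re.escape(TARGET), lowered):
        preceding = lowered[max(0, match.start() - window):match.start()]
        if not any(cue in preceding for cue in NEGATION_CUES):
            return True
    return False


print(classify_report("Final diagnosis: high-grade myxofibrosarcoma."))      # True
print(classify_report("No evidence of myxofibrosarcoma in this specimen."))  # False
```

A plain word search would flag both example reports; the negation check is what separates them.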

RESULTS

The records of 27% (81 of 300) of myxofibrosarcoma patients within the VA database were missing a sarcoma ICD code. A de novo NLP algorithm more accurately (92% [276 of 300]) identified patients with myxofibrosarcoma compared with ICD codes (73% [219 of 300]) or basic word searches (74% [300 of 403]) (p < 0.001). Three final algorithm models were generated with accuracies ranging from 92% to 100%.
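
The reported measures all derive from the four cells of a confusion matrix. The sketch below shows the standard formulas; the class and function names are hypothetical, and the mapping of the abstract's word-search figures onto the cells (all 403 candidate reports flagged, 300 confirmed) is an assumed reading rather than something the abstract states cell by cell.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Compute the five standard diagnostic-accuracy measures."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "accuracy": _ratio(c.tp + c.tn, total),
    }


# Assumed reading: the word search flags all 403 candidate reports, of which
# 300 are confirmed cases, so there are no predicted negatives.
word_search = diagnostic_metrics(ConfusionCounts(tp=300, fp=103, fn=0, tn=0))
print(round(word_search["ppv"], 2))  # 0.74
print(word_search["npv"])            # None (no predicted negatives)
```

Under this reading, the 74% figure for basic word search corresponds to 300 of 403 flagged reports being true cases.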

CONCLUSION

An NLP algorithm can identify patients with myxofibrosarcoma from pathology reports with high accuracy, which is an improvement over ICD-based cohort creation and simple word search. This algorithm is freely available on GitHub ( https://github.com/sarcoma-shark/myxofibrosarcoma-shark ) and is available to facilitate external validation and improvement through testing in other cohorts.

LEVEL OF EVIDENCE

Level II, diagnostic study.

Gadi SR, Muralidharan SS, Glissen Brown JR. Colonoscopy Quality, Innovation, and the Assessment of New Technology. Techniques and Innovations in Gastrointestinal Endoscopy 2024;26:177-192. [DOI: 10.1016/j.tige.2024.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]

Guével E, Priou S, Flicoteaux R, Lamé G, Bey R, Tannier X, Cohen A, Chatellier G, Daniel C, Tournigand C, Kempf E. Development of a natural language processing model for deriving breast cancer quality indicators: A cross-sectional, multicenter study. Rev Epidemiol Sante Publique 2023;71:102189. [PMID: 37972522 DOI: 10.1016/j.respe.2023.102189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 10/13/2023] [Accepted: 10/13/2023] [Indexed: 11/19/2023] Open

Abstract

OBJECTIVES

Medico-administrative data hold promise for automating the calculation of Healthcare Quality and Safety Indicators (HQSIs). Nevertheless, not all relevant indicators can be calculated from these data alone. The objectives of our feasibility study were to (1) analyze the availability of data sources, (2) analyze the availability of each indicator's elementary variables, and (3) apply natural language processing to retrieve such information automatically.

METHOD

We performed a multicenter cross-sectional observational feasibility study on the clinical data warehouse of Assistance Publique - Hôpitaux de Paris (AP-HP). We studied the management of breast cancer patients treated at AP-HP between January 2019 and June 2021, and the quality indicators published by the European Society of Breast Cancer Specialists, using claims data from the Programme de Médicalisation du Système d'Information (PMSI) and pathology reports. For each indicator, we calculated the number (%) of patients for whom all necessary data sources were available, and the number (%) of patients for whom all elementary variables were available in the sources and for whom the related HQSI was therefore computable. To extract useful data from the free-text reports, we developed and validated dedicated rule-based algorithms, whose performance was assessed with recall, precision, and F1-score.

RESULTS

Out of 5,785 female patients diagnosed with breast cancer (60.9 years, IQR [50.0-71.9]), 5,147 (89.0%) had procedures related to breast cancer recorded in the PMSI, and 3,732 (72.5%) had at least one surgery. Out of the 34 key indicators, 9 could be calculated with the PMSI alone, and 6 others became calculable using the data from pathology reports. Ten elementary variables were needed to calculate the 6 indicators combining the PMSI and pathology reports. The necessary sources were available for 58.8% to 94.6% of patients, depending on the indicator. The extraction algorithms developed had an average accuracy of 76.5% (min-max [32.7%-93.3%]), an average precision of 77.7% [10.0%-97.4%], and an average sensitivity of 71.6% [2.8%-100.0%]. Once these algorithms were applied, the variables needed to calculate the indicators were extracted for 2% to 88% of patients, depending on the indicator.

DISCUSSION

The availability of medical reports in the electronic health records, of the elementary variables within the reports, and the performance of the extraction algorithms limit the population for which the indicators can be calculated.

CONCLUSIONS

The automated calculation of quality indicators from electronic health records is a prospect that comes up against many practical obstacles.

Sabrie N, Khan R, Jogendran R, Scaffidi M, Bansal R, Gimpaya N, Youssef M, Forbes N, Mosko JD, Berzin TM, Lightfoot D, Grover SC. Performance of natural language processing in identifying adenomas from colonoscopy reports: a systematic review and meta-analysis. IGIE 2023;2:350-356.e7. [DOI: 10.1016/j.igie.2023.07.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2025]

Seong D, Choi YH, Shin SY, Yi BK. Deep learning approach to detection of colonoscopic information from unstructured reports. BMC Med Inform Decis Mak 2023;23:28. [PMID: 36750932 PMCID: PMC9903463 DOI: 10.1186/s12911-023-02121-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 01/23/2023] [Indexed: 02/09/2023] Open

Abstract

BACKGROUND

Colorectal cancer is a leading cause of cancer deaths. Several screening tests, such as colonoscopy, can be used to find polyps or colorectal cancer. Colonoscopy reports are often written in unstructured narrative text. The information embedded in the reports can be used for various purposes, including colorectal cancer risk prediction, follow-up recommendation, and quality measurement. However, the availability and accessibility of unstructured text data are still insufficient despite the large amounts of accumulated data. We aimed to develop and apply deep learning-based natural language processing (NLP) methods to detect colonoscopic information.

METHODS

This study applied several deep learning-based NLP models to colonoscopy reports. Approximately 280,668 colonoscopy reports were extracted from the clinical data warehouse of Samsung Medical Center. For 5,000 reports, procedural information and colonoscopic findings were manually annotated with 17 labels. We compared long short-term memory (LSTM) and BioBERT models to select the one with the best performance for colonoscopy reports, which was the bidirectional LSTM with conditional random fields. Then, we applied pre-trained word embeddings, learned from the large unlabeled data set (280,668 reports), to the selected model.

RESULTS

The NLP model with pre-trained word embedding performed better for most labels than the model with one-hot encoding. The F1 scores for colonoscopic findings were: 0.9564 for lesions, 0.9722 for locations, 0.9809 for shapes, 0.9720 for colors, 0.9862 for sizes, and 0.9717 for numbers.

CONCLUSIONS

This study applied deep learning-based clinical NLP models to extract meaningful information from colonoscopy reports. The method in this study achieved promising results that demonstrate it can be applied to various practical purposes.

Rex DK. Key quality indicators in colonoscopy. Gastroenterol Rep (Oxf) 2023;11:goad009. [PMID: 36911141 PMCID: PMC10005623 DOI: 10.1093/gastro/goad009] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/22/2022] [Accepted: 01/05/2023] [Indexed: 03/12/2023] Open

Benson R, Winterton C, Winn M, Krick B, Liu M, Abu-el-rub N, Conway M, Del Fiol G, Gawron A, Hardikar S. Leveraging Natural Language Processing to Extract Features of Colorectal Polyps From Pathology Reports for Epidemiologic Study. JCO Clin Cancer Inform 2023;7:e2200131. [PMID: 36753686 PMCID: PMC10166420 DOI: 10.1200/cci.22.00131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2022] [Indexed: 02/10/2023] Open

Abstract

PURPOSE

Histopathologic features are critical for studying risk factors of colorectal polyps, but remain deeply embedded within unstructured pathology reports, requiring costly and time-consuming manual abstraction for research. In this study, we developed and evaluated a natural language processing (NLP) pipeline to automatically extract histopathologic features of colorectal polyps from pathology reports, with an emphasis on individual polyp size. These data were then linked with structured electronic health record (EHR) data, creating an analysis-ready epidemiologic data set.

METHODS

We obtained 24,584 pathology reports from colonoscopies performed at the University of Utah's Gastroenterology Clinic. Two investigators annotated 350 reports to determine inter-rater agreement, develop an annotation scheme, and create a reference standard for performance evaluation. The pipeline was then developed, and performance was compared against the reference for extracting polyp location, histology, size, shape, dysplasia, and the number of polyps. Finally, the pipeline was applied to 24,225 unseen reports and NLP-extracted data were linked with structured EHR data.

RESULTS

Across all features, our pipeline achieved a precision of 98.9%, a recall of 98.0%, and an F1-score of 98.4%. In patients with polyps, the pipeline correctly extracted 95.6% of sizes, 97.2% of polyp locations, 97.8% of histology, 98.3% of shapes, and 98.3% of dysplasia levels. When applied to unseen data, the pipeline classified 12,889 patients as having polyps, 4,907 patients without polyps, and extracted the features of 28,387 polyps. Tubular adenomas were the most common subtype (55.9%), 8.1% of polyps were advanced adenomas, and the mean polyp size was 0.57 (±0.4) cm.
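
The F1-score is the harmonic mean of precision and recall, so the three headline figures can be cross-checked against each other. The snippet below is a small illustrative check; the function name `f1_score` is chosen for this example and does not refer to any code from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported precision of 98.9% and recall of 98.0%.
print(round(f1_score(0.989, 0.980), 3))  # 0.984
```

The result agrees with the reported F1-score of 98.4%.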

CONCLUSION

Our pipeline extracted histopathologic features of colorectal polyps from colonoscopy pathology reports, most notably individual polyp sizes, with considerable accuracy. This study demonstrates the utility of NLP for extracting polyp features and linking these data with EHR data to create an epidemiologic data set to study colorectal polyp risk factors and outcomes.

Binsfeld Gonçalves L, Nesic I, Obradovic M, Stieltjes B, Weikert T, Bremerich J. Natural Language Processing and Graph Theory: Making Sense of Imaging Records in a Novel Representation Frame. JMIR Med Inform 2022;10:e40534. [PMID: 36542426 PMCID: PMC9813822 DOI: 10.2196/40534] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2022] [Revised: 09/13/2022] [Accepted: 11/30/2022] [Indexed: 12/24/2022] Open

Abstract

BACKGROUND

A concise visualization framework of related reports would increase readability and improve patient management. To this end, temporal referrals to prior comparative exams are an essential connection to previous exams in written reports. Due to unstructured narrative texts' variable structure and content, their extraction is hampered by poor computer readability. Natural language processing (NLP) permits the extraction of structured information from unstructured texts automatically and can serve as an essential input for such a novel visualization framework.

OBJECTIVE

This study proposes and evaluates an NLP-based algorithm capable of extracting the temporal referrals in written radiology reports, applies it to all the radiology reports generated for 10 years, introduces a graphical representation of imaging reports, and investigates its benefits for clinical and research purposes.

METHODS

In this single-center, university hospital, retrospective study, we developed a convolutional neural network capable of extracting the date of referrals from imaging reports. The model's performance was assessed by calculating precision, recall, and F1-score using an independent test set of 149 reports. Next, the algorithm was applied to our department's radiology reports generated from 2011 to 2021. Finally, the reports and their metadata were represented in a modulable graph.

RESULTS

For extracting the date of referrals, the named-entity recognition (NER) model had a high precision of 0.93, a recall of 0.95, and an F1-score of 0.94. A total of 1,684,635 reports were included in the analysis. Temporal reference was mentioned in 53.3% (656,852/1,684,635), explicitly stated as not available in 21.0% (258,386/1,684,635), and omitted in 25.7% (317,059/1,684,635) of the reports. Imaging records can be visualized in a directed and modulable graph, in which the referring links represent the connecting arrows.

CONCLUSIONS

Automatically extracting the date of referrals from unstructured radiology reports using deep learning NLP algorithms is feasible. Graphs refined the selection of distinct pathology pathways, facilitated the revelation of missing comparisons, and enabled the query of specific referring exam sequences. Further work is needed to evaluate its benefits in clinics, research, and resource planning.

Song G, Chung SJ, Seo JY, Yang SY, Jin EH, Chung GE, Shim SR, Sa S, Hong MS, Kim KH, Jang E, Lee CW, Bae JH, Han HW. Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research. J Clin Med 2022;11:2967. [PMID: 35683353 PMCID: PMC9181010 DOI: 10.3390/jcm11112967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Revised: 05/16/2022] [Accepted: 05/23/2022] [Indexed: 11/21/2022] Open

Affiliation(s)

Gyuseon Song: Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
Su Jin Chung: Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea
Ji Yeon Seo: Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea
Sun Young Yang: Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea
Eun Hyo Jin: Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea
Goh Eun Chung: Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea
Sung Ryul Shim: Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; Department of Health and Medical Informatics, Kyungnam University College of Health Sciences, Changwon 51767, Korea
Soonok Sa: Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
Moongi Simon Hong: Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
Kang Hyun Kim: Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
Eunchan Jang: Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
Chae Won Lee: Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
Jung Ho Bae: Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea. Correspondence: Tel.: +82-2-2112-5574; Fax: +82-2-2112-5635
Hyun Wook Han: Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea. Correspondence: Tel.: +82-31-881-7109; Fax: +82-31-881-7069
