1
|
Omar M, Levkovich I. Exploring the efficacy and potential of large language models for depression: A systematic review. J Affect Disord 2025; 371:234-244. [PMID: 39581383 DOI: 10.1016/j.jad.2024.11.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 10/21/2024] [Accepted: 11/15/2024] [Indexed: 11/26/2024]
Abstract
BACKGROUND AND OBJECTIVE Depression is a substantial public health issue, with global ramifications. While initial literature reviews explored the intersection between artificial intelligence (AI) and mental health, they have not yet critically assessed the specific contributions of Large Language Models (LLMs) in this domain. The objective of this systematic review was to examine the usefulness of LLMs in diagnosing and managing depression, as well as to investigate their incorporation into clinical practice. METHODS This review was based on a thorough search of the PubMed, Embase, Web of Science, and Scopus databases for the period January 2018 through March 2024. The review was registered with PROSPERO and adhered to PRISMA guidelines. Original research articles, preprints, and conference papers were included, while non-English and non-research publications were excluded. Data extraction was standardized, and the risk of bias was evaluated using the ROBINS-I, QUADAS-2, and PROBAST tools. RESULTS Our review included 34 studies that focused on the application of LLMs in detecting and classifying depression through clinical data and social media texts. LLMs such as RoBERTa and BERT demonstrated high effectiveness, particularly in early detection and symptom classification. Nevertheless, the integration of LLMs into clinical practice is in its nascent stage, with ongoing concerns about data privacy and ethical implications. CONCLUSION LLMs exhibit significant potential for transforming strategies for diagnosing and treating depression. Nonetheless, full integration of LLMs into clinical practice requires rigorous testing, ethical considerations, and enhanced privacy measures to ensure their safe and effective use.
Affiliation(s)
- Mahmud Omar
- Tel-Aviv University, Faculty of Medicine, Israel.
| | | |
|
2
|
Lindsay SE, Madison CJ, Ramsey DC, Doung YC, Gundle KR. De Novo Natural Language Processing Algorithm Accurately Identifies Myxofibrosarcoma From Pathology Reports. Clin Orthop Relat Res 2025; 483:80-87. [PMID: 39360774 DOI: 10.1097/corr.0000000000003270] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/15/2023] [Accepted: 09/13/2024] [Indexed: 01/04/2025]
Abstract
BACKGROUND Available codes in the ICD-10 do not accurately reflect soft tissue sarcoma diagnoses, and this can result in an underrepresentation of soft tissue sarcoma in databases. The National VA Database provides a unique opportunity for soft tissue sarcoma investigation because of the availability of all clinical results and pathology reports. In the setting of soft tissue sarcoma, natural language processing (NLP) has the potential to be applied to clinical documents such as pathology reports to identify soft tissue sarcoma independent of ICD codes, allowing sarcoma researchers to build more comprehensive databases capable of answering a myriad of research questions. QUESTIONS/PURPOSES (1) What proportion of patients with myxofibrosarcoma within the National VA Database would be missed by searching only by soft tissue sarcoma ICD codes? (2) Is a de novo NLP algorithm capable of analyzing pathology reports to accurately identify patients with myxofibrosarcoma? METHODS All pathology reports (10.7 million) in the national VA corporate data warehouse were identified from 2003 to 2022. Using the word-search functionality, reports from 403 veterans were found to contain the term "myxofibrosarcoma." The resulting pathology reports were manually reviewed to develop a gold-standard cohort that contained only those veterans with pathologist-confirmed myxofibrosarcoma diagnoses. The cohort had a mean ± SD age of 70 ± 12 years, and 96% (287 of 300) were men. Diagnosis codes were abstracted, and differences in appropriate ICD coding were compared. An NLP algorithm was iteratively refined and tested using confounders, negation, and emphasis terms for myxofibrosarcoma. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated for the NLP-generated cohorts through comparison with the manually reviewed gold-standard cohorts. 
RESULTS The records of 27% (81 of 300) of myxofibrosarcoma patients within the VA database were missing a sarcoma ICD code. A de novo NLP algorithm more accurately (92% [276 of 300]) identified patients with myxofibrosarcoma compared with ICD codes (73% [219 of 300]) or basic word searches (74% [300 of 403]) (p < 0.001). Three final algorithm models were generated with accuracies ranging from 92% to 100%. CONCLUSION An NLP algorithm can identify patients with myxofibrosarcoma from pathology reports with high accuracy, which is an improvement over ICD-based cohort creation and simple word search. The algorithm is freely available on GitHub (https://github.com/sarcoma-shark/myxofibrosarcoma-shark) to facilitate external validation and improvement through testing in other cohorts. LEVEL OF EVIDENCE Level II, diagnostic study.
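As a rough illustration of the kind of negation handling the methods describe, the sketch below counts a report as positive only when "myxofibrosarcoma" appears without a negation cue shortly before it. The cue list, the 40-character window, and the function name `mentions_confirmed_mfs` are assumptions made for this example; it is not the authors' algorithm, which is the one published in their GitHub repository.

```python
import re

# Hypothetical cue list for illustration; a real system would refine it iteratively.
NEGATION_CUES = (
    "no evidence of",
    "negative for",
    "rule out",
    "ruled out",
    "not consistent with",
)
TARGET = "myxofibrosarcoma"


def mentions_confirmed_mfs(report: str, window: int = 40) -> bool:
    """Return True if TARGET occurs at least once with no negation cue
    in the `window` characters immediately preceding it."""
    text = report.lower()
    for match in re.finditer(re.escape(TARGET), text):
        preceding = text[max(0, match.start() - window):match.start()]
        if not any(cue in preceding for cue in NEGATION_CUES):
            return True
    return False
```

A plain word search would count "No evidence of myxofibrosarcoma" as a hit, which is one plausible reason the abstract's basic word search reached only 74%; the window-based check above is the simplest way to suppress such matches, at the cost of missing negations phrased outside the cue list.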
Affiliation(s)
- Sarah E Lindsay
- Department of Orthopaedics and Rehabilitation, Oregon Health & Science University, Portland, OR, USA
| | | | - Duncan C Ramsey
- Department of Orthopaedics and Rehabilitation, Oregon Health & Science University, Portland, OR, USA
| | - Yee-Cheen Doung
- Department of Orthopaedics and Rehabilitation, Oregon Health & Science University, Portland, OR, USA
| | - Kenneth R Gundle
- Department of Orthopaedics and Rehabilitation, Oregon Health & Science University, Portland, OR, USA
- Portland VA Medical Center, Portland, OR, USA
| |
|
3
|
Gadi SR, Muralidharan SS, Glissen Brown JR. Colonoscopy Quality, Innovation, and the Assessment of New Technology. Techniques and Innovations in Gastrointestinal Endoscopy 2024; 26:177-192. [DOI: 10.1016/j.tige.2024.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
|
4
|
Guével E, Priou S, Flicoteaux R, Lamé G, Bey R, Tannier X, Cohen A, Chatellier G, Daniel C, Tournigand C, Kempf E. Development of a natural language processing model for deriving breast cancer quality indicators: A cross-sectional, multicenter study. Rev Epidemiol Sante Publique 2023; 71:102189. [PMID: 37972522 DOI: 10.1016/j.respe.2023.102189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 10/13/2023] [Accepted: 10/13/2023] [Indexed: 11/19/2023] Open
Abstract
OBJECTIVES Medico-administrative data are promising to automate the calculation of Healthcare Quality and Safety Indicators. Nevertheless, not all relevant indicators can be calculated with these data alone. The objectives of this feasibility study were to 1) analyze the availability of data sources; 2) analyze the availability of each indicator's elementary variables; and 3) apply natural language processing to automatically retrieve such information. METHOD We performed a multicenter cross-sectional observational feasibility study on the clinical data warehouse of Assistance Publique - Hôpitaux de Paris (AP-HP). We studied the management of breast cancer patients treated at AP-HP between January 2019 and June 2021, and the quality indicators published by the European Society of Breast Cancer Specialists, using claims data from the Programme de Médicalisation du Système d'Information (PMSI) and pathology reports. For each indicator, we calculated the number (%) of patients for whom all necessary data sources were available, and the number (%) of patients for whom all elementary variables were available in the sources, and for whom the related HQSI was computable. To extract useful data from the free text reports, we developed and validated dedicated rule-based algorithms, whose performance metrics were assessed with recall, precision, and f1-score. RESULTS Out of 5,785 female patients diagnosed with breast cancer (60.9 years, IQR [50.0-71.9]), 5,147 (89.0%) had procedures related to breast cancer recorded in the PMSI, and 3,732 (72.5%) had at least one surgery. Out of the 34 key indicators, 9 could be calculated with the PMSI alone, and 6 more became calculable when data from pathology reports were added. Ten elementary variables were needed to calculate the 6 indicators combining the PMSI and pathology reports. The necessary sources were available for 58.8% to 94.6% of patients, depending on the indicators.
The extraction algorithms developed had an average accuracy of 76.5% (min-max [32.7%-93.3%]), an average precision of 77.7% [10.0%-97.4%] and an average sensitivity of 71.6% [2.8%-100.0%]. Once these algorithms were applied, the variables needed to calculate the indicators were extracted for 2% to 88% of patients, depending on the indicators. DISCUSSION The availability of medical reports in the electronic health records, of the elementary variables within the reports, and the performance of the extraction algorithms limit the population for which the indicators can be calculated. CONCLUSIONS The automated calculation of quality indicators from electronic health records is a prospect that comes up against many practical obstacles.
Affiliation(s)
- Etienne Guével
- Assistance Publique - Hôpitaux de Paris, Innovation and Data, IT Department, 75012 Paris, France
| | - Sonia Priou
- Université Paris-Saclay, CentraleSupélec, Laboratoire Génie Industriel, 91192 Gif-sur-Yvette, France
| | - Rémi Flicoteaux
- Assistance Publique - Hôpitaux de Paris, Department of medical information, 75012 Paris, France
| | - Guillaume Lamé
- Université Paris-Saclay, CentraleSupélec, Laboratoire Génie Industriel, 91192 Gif-sur-Yvette, France
| | - Romain Bey
- Assistance Publique - Hôpitaux de Paris, Innovation and Data, IT Department, 75012 Paris, France
| | - Xavier Tannier
- Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, 75006 Paris, France
| | - Ariel Cohen
- Assistance Publique - Hôpitaux de Paris, Innovation and Data, IT Department, 75012 Paris, France
| | - Gilles Chatellier
- Université Paris Cité, Department of medical informatics, Assistance Publique Hôpitaux de Paris, Centre-Université de Paris (APHP-CUP), 75015 Paris, France
| | - Christel Daniel
- Assistance Publique - Hôpitaux de Paris, Innovation and Data, IT Department, 75012 Paris, France
| | - Christophe Tournigand
- Université Paris Est Créteil, Assistance Publique - Hôpitaux de Paris, Department of medical oncology, Henri Mondor and Albert Chenevier University Hospital, 94000 Créteil, France
| | - Emmanuelle Kempf
- Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, 75006 Paris, France; Université Paris Est Créteil, Assistance Publique - Hôpitaux de Paris, Department of medical oncology, Henri Mondor and Albert Chenevier University Hospital, 94000 Créteil, France.
| |
|
5
|
Sabrie N, Khan R, Jogendran R, Scaffidi M, Bansal R, Gimpaya N, Youssef M, Forbes N, Mosko JD, Berzin TM, Lightfoot D, Grover SC. Performance of natural language processing in identifying adenomas from colonoscopy reports: a systematic review and meta-analysis. IGIE 2023; 2:350-356.e7. [DOI: 10.1016/j.igie.2023.07.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2025]
|
6
|
Seong D, Choi YH, Shin SY, Yi BK. Deep learning approach to detection of colonoscopic information from unstructured reports. BMC Med Inform Decis Mak 2023; 23:28. [PMID: 36750932 PMCID: PMC9903463 DOI: 10.1186/s12911-023-02121-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 01/23/2023] [Indexed: 02/09/2023] Open
Abstract
BACKGROUND Colorectal cancer is a leading cause of cancer deaths. Several screening tests, such as colonoscopy, can be used to find polyps or colorectal cancer. Colonoscopy reports are often written in unstructured narrative text. The information embedded in the reports can be used for various purposes, including colorectal cancer risk prediction, follow-up recommendation, and quality measurement. However, the availability and accessibility of unstructured text data are still insufficient despite the large amounts of accumulated data. We aimed to develop and apply deep learning-based natural language processing (NLP) methods to detect colonoscopic information. METHODS This study applied several deep learning-based NLP models to colonoscopy reports. Approximately 280,668 colonoscopy reports were extracted from the clinical data warehouse of Samsung Medical Center. For 5,000 reports, procedural information and colonoscopic findings were manually annotated with 17 labels. We compared the long short-term memory (LSTM) and BioBERT model to select the one with the best performance for colonoscopy reports, which was the bidirectional LSTM with conditional random fields. Then, we applied pre-trained word embedding using large unlabeled data (280,668 reports) to the selected model. RESULTS The NLP model with pre-trained word embedding performed better for most labels than the model with one-hot encoding. The F1 scores for colonoscopic findings were: 0.9564 for lesions, 0.9722 for locations, 0.9809 for shapes, 0.9720 for colors, 0.9862 for sizes, and 0.9717 for numbers. CONCLUSIONS This study applied deep learning-based clinical NLP models to extract meaningful information from colonoscopy reports. The method in this study achieved promising results that demonstrate it can be applied to various practical purposes.
Affiliation(s)
- Donghyeong Seong
- Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Seoul, 06355 Republic of Korea
| | - Yoon Ho Choi
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06355 Republic of Korea
| | - Soo-Yong Shin
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06355 Republic of Korea
- Research Institute for Future Medicine, Samsung Medical Center, Seoul, 06351 Republic of Korea
| | - Byoung-Kee Yi
- Department of Artificial Intelligence Convergence, Kangwon National University, 1 Kangwondaehak-Gil, Chuncheon-si, Gangwon-do, 24341, Republic of Korea.
| |
|
7
|
Rex DK. Key quality indicators in colonoscopy. Gastroenterol Rep (Oxf) 2023; 11:goad009. [PMID: 36911141 PMCID: PMC10005623 DOI: 10.1093/gastro/goad009] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/22/2022] [Accepted: 01/05/2023] [Indexed: 03/12/2023] Open
Abstract
Many quality indicators have been proposed for colonoscopy, but most colonoscopists and endoscopy groups focus on measuring the adenoma detection rate and the cecal intubation rate. Use of proper screening and surveillance intervals is another accepted key indicator but it is seldom evaluated in clinical practice. Bowel preparation efficacy and polyp resection skills are areas that are emerging as potential key or priority indicators. This review summarizes and provides an update on key performance indicators for colonoscopy quality.
Affiliation(s)
- Douglas K Rex
- Division of Gastroenterology/Hepatology, Indiana University School of Medicine, Indianapolis, IN, USA
| |
|
8
|
Benson R, Winterton C, Winn M, Krick B, Liu M, Abu-el-rub N, Conway M, Del Fiol G, Gawron A, Hardikar S. Leveraging Natural Language Processing to Extract Features of Colorectal Polyps From Pathology Reports for Epidemiologic Study. JCO Clin Cancer Inform 2023; 7:e2200131. [PMID: 36753686 PMCID: PMC10166420 DOI: 10.1200/cci.22.00131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2022] [Indexed: 02/10/2023] Open
Abstract
PURPOSE Histopathologic features are critical for studying risk factors of colorectal polyps, but remain deeply embedded within unstructured pathology reports, requiring costly and time-consuming manual abstraction for research. In this study, we developed and evaluated a natural language processing (NLP) pipeline to automatically extract histopathologic features of colorectal polyps from pathology reports, with an emphasis on individual polyp size. These data were then linked with structured electronic health record (EHR) data, creating an analysis-ready epidemiologic data set. METHODS We obtained 24,584 pathology reports from colonoscopies performed at the University of Utah's Gastroenterology Clinic. Two investigators annotated 350 reports to determine inter-rater agreement, develop an annotation scheme, and create a reference standard for performance evaluation. The pipeline was then developed, and performance was compared against the reference for extracting polyp location, histology, size, shape, dysplasia, and the number of polyps. Finally, the pipeline was applied to 24,225 unseen reports and NLP-extracted data were linked with structured EHR data. RESULTS Across all features, our pipeline achieved a precision of 98.9%, a recall of 98.0%, and an F1-score of 98.4%. In patients with polyps, the pipeline correctly extracted 95.6% of sizes, 97.2% of polyp locations, 97.8% of histology, 98.3% of shapes, and 98.3% of dysplasia levels. When applied to unseen data, the pipeline classified 12,889 patients as having polyps, 4,907 patients without polyps, and extracted the features of 28,387 polyps. Tubular adenomas were the most common subtype (55.9%), 8.1% of polyps were advanced adenomas, and the mean polyp size was 0.57 (±0.4) cm. CONCLUSION Our pipeline extracted histopathologic features of colorectal polyps from colonoscopy pathology reports, most notably individual polyp sizes, with considerable accuracy. 
This study demonstrates the utility of NLP for extracting polyp features and linking these data with EHR data to create an epidemiologic data set to study colorectal polyp risk factors and outcomes.
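The overall figures reported above can be cross-checked, since the F1-score is the harmonic mean of precision and recall. The short sketch below does that arithmetic; the function name `f1_score` is chosen for this example and nothing here comes from the authors' pipeline.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 98.9% and recall 98.0% as reported in the abstract.
overall_f1 = f1_score(0.989, 0.980)
print(f"{overall_f1:.3f}")  # prints 0.984
```

The result matches the reported F1-score of 98.4%, so the three overall figures are internally consistent.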
Affiliation(s)
- Ryzen Benson
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT
| | | | - Maci Winn
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT
- Department of Population Health Sciences, University of Utah, Salt Lake City, UT
| | - Benjamin Krick
- Department of Political Science, Duke University, Durham, NC
| | - Mei Liu
- Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS
| | - Noor Abu-el-rub
- Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS
| | - Mike Conway
- School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, Australia
| | - Guilherme Del Fiol
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT
| | - Andrew Gawron
- Salt Lake City VA Specialty Care Center of Innovation, University of Utah, Salt Lake City, UT
- Department of Internal Medicine, University of Utah, Salt Lake City, UT
| | - Sheetal Hardikar
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT
- Department of Population Health Sciences, University of Utah, Salt Lake City, UT
| |
|
9
|
Binsfeld Gonçalves L, Nesic I, Obradovic M, Stieltjes B, Weikert T, Bremerich J. Natural Language Processing and Graph Theory: Making Sense of Imaging Records in a Novel Representation Frame. JMIR Med Inform 2022; 10:e40534. [PMID: 36542426 PMCID: PMC9813822 DOI: 10.2196/40534] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2022] [Revised: 09/13/2022] [Accepted: 11/30/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND A concise visualization framework of related reports would increase readability and improve patient management. To this end, temporal referrals to prior comparative exams are an essential connection to previous exams in written reports. Due to unstructured narrative texts' variable structure and content, their extraction is hampered by poor computer readability. Natural language processing (NLP) permits the extraction of structured information from unstructured texts automatically and can serve as an essential input for such a novel visualization framework. OBJECTIVE This study proposes and evaluates an NLP-based algorithm capable of extracting the temporal referrals in written radiology reports, applies it to all the radiology reports generated for 10 years, introduces a graphical representation of imaging reports, and investigates its benefits for clinical and research purposes. METHODS In this single-center, university hospital, retrospective study, we developed a convolutional neural network capable of extracting the date of referrals from imaging reports. The model's performance was assessed by calculating precision, recall, and F1-score using an independent test set of 149 reports. Next, the algorithm was applied to our department's radiology reports generated from 2011 to 2021. Finally, the reports and their metadata were represented in a modulable graph. RESULTS For extracting the date of referrals, the named-entity recognition (NER) model had a high precision of 0.93, a recall of 0.95, and an F1-score of 0.94. A total of 1,684,635 reports were included in the analysis. Temporal reference was mentioned in 53.3% (656,852/1,684,635), explicitly stated as not available in 21.0% (258,386/1,684,635), and omitted in 25.7% (317,059/1,684,635) of the reports. Imaging records can be visualized in a directed and modulable graph, in which the referring links represent the connecting arrows. 
CONCLUSIONS Automatically extracting the date of referrals from unstructured radiology reports using deep learning NLP algorithms is feasible. Graphs refined the selection of distinct pathology pathways, facilitated the revelation of missing comparisons, and enabled the query of specific referring exam sequences. Further work is needed to evaluate its benefits in clinics, research, and resource planning.
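To make the graph representation concrete, the sketch below links each report to the earlier report it cites, assuming the comparison date has already been extracted (the study does this step with a neural NER model, which is not reproduced here). The record layout, the sample data, and the name `build_referral_graph` are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (report_id, patient_id, exam_date, extracted_prior_date or None)
reports = [
    ("r1", "p1", date(2019, 3, 1), None),
    ("r2", "p1", date(2020, 3, 5), date(2019, 3, 1)),
    ("r3", "p1", date(2021, 3, 9), date(2020, 3, 5)),
    ("r4", "p2", date(2021, 6, 1), date(2018, 1, 1)),  # cited exam is not in the archive
]


def build_referral_graph(records):
    """Return (edges, unresolved): edges maps each report to the same-patient
    reports dated on its cited comparison date; unresolved lists reports whose
    cited date matches no stored report."""
    by_patient_date = defaultdict(list)
    for report_id, patient_id, exam_date, _ in records:
        by_patient_date[(patient_id, exam_date)].append(report_id)

    edges = {}
    unresolved = []
    for report_id, patient_id, _, prior_date in records:
        edges[report_id] = []
        if prior_date is None:
            continue  # no comparison mentioned in this report
        targets = by_patient_date.get((patient_id, prior_date), [])
        if targets:
            edges[report_id].extend(targets)
        else:
            unresolved.append(report_id)
    return edges, unresolved


edges, unresolved = build_referral_graph(reports)
```

Following the edges from `r3` back to `r1` reconstructs one patient's exam sequence, and the `unresolved` list is one simple way to surface the missing comparisons the conclusions mention.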
Affiliation(s)
- Laurent Binsfeld Gonçalves
- Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
| | - Ivan Nesic
- Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
| | - Marko Obradovic
- Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
| | - Bram Stieltjes
- Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
| | - Thomas Weikert
- Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
| | - Jens Bremerich
- Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
| |
|
10
|
Song G, Chung SJ, Seo JY, Yang SY, Jin EH, Chung GE, Shim SR, Sa S, Hong MS, Kim KH, Jang E, Lee CW, Bae JH, Han HW. Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research. J Clin Med 2022; 11:2967. [PMID: 35683353 PMCID: PMC9181010 DOI: 10.3390/jcm11112967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Revised: 05/16/2022] [Accepted: 05/23/2022] [Indexed: 11/21/2022] Open
Abstract
Background and Aims: The utility of clinical information from esophagogastroduodenoscopy (EGD) reports has been limited because of its unstructured narrative format. We developed a natural language processing (NLP) pipeline that automatically extracts information about gastric diseases from unstructured EGD reports and demonstrated its applicability in clinical research. Methods: An NLP pipeline was developed using 2000 EGD and associated pathology reports that were retrieved from a single healthcare center. The pipeline extracted clinical information, including the presence, location, and size, for 10 gastric diseases from the EGD reports. It was validated with 1000 EGD reports by evaluating sensitivity, positive predictive value (PPV), accuracy, and F1 score. The pipeline was applied to 248,966 EGD reports from 2010–2019 to identify patient demographics and clinical information for 10 gastric diseases. Results: For gastritis information extraction, we achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.966, 0.972, 0.996, and 0.967, respectively. Other gastric diseases, such as ulcers, and neoplastic diseases achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.975, 0.982, 0.999, and 0.978, respectively. The study of EGD data of over 10 years revealed the demographics of patients with gastric diseases by sex and age. In addition, the study identified the extent and locations of gastritis and other gastric diseases, respectively. Conclusions: We demonstrated the feasibility of the NLP pipeline providing an automated extraction of gastric disease information from EGD reports. Incorporating the pipeline can facilitate large-scale clinical research to better understand gastric diseases.
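The results above report accuracy (0.996 to 0.999) well above sensitivity and PPV. That pattern is expected when most reports are true negatives for a given finding, as the sketch below shows. The counts are invented for illustration and are not taken from the study; `diagnostic_metrics` is a name chosen for this example.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute sensitivity, PPV, accuracy, and F1 from confusion-matrix counts.
    Assumes at least one true positive, so no denominator is zero."""
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "ppv": ppv, "accuracy": accuracy, "f1": f1}


# Hypothetical counts, NOT from the study: a finding present in 50 of 1,000 reports.
metrics = diagnostic_metrics(tp=45, fp=2, fn=5, tn=948)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 0.993 while sensitivity is only 0.900, because the 948 true negatives dominate the total. For rare findings, sensitivity, PPV, and F1 are therefore the more informative figures.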
Affiliation(s)
- Gyuseon Song
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
| | - Su Jin Chung
- Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea
| | - Ji Yeon Seo
- Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea
| | - Sun Young Yang
- Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea
| | - Eun Hyo Jin
- Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea
| | - Goh Eun Chung
- Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea
| | - Sung Ryul Shim
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Department of Health and Medical Informatics, Kyungnam University College of Health Sciences, Changwon 51767, Korea
| | - Soonok Sa
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
| | - Moongi Simon Hong
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
| | - Kang Hyun Kim
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
| | - Eunchan Jang
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
| | - Chae Won Lee
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
| | - Jung Ho Bae
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea
- Correspondence: (J.H.B.); (H.W.H.); Tel.: +82-2-2112-5574 (J.H.B.); +82-31-881-7109 (H.W.H.); Fax: +82-2-2112-5635 (J.H.B.); +82-31-881-7069 (H.W.H.)
| | - Hyun Wook Han
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Correspondence: (J.H.B.); (H.W.H.); Tel.: +82-2-2112-5574 (J.H.B.); +82-31-881-7109 (H.W.H.); Fax: +82-2-2112-5635 (J.H.B.); +82-31-881-7069 (H.W.H.)
| |
|