1. Gholipour M, Khajouei R, Amiri P, Hajesmaeel Gohari S, Ahmadian L. Extracting cancer concepts from clinical notes using natural language processing: a systematic review. BMC Bioinformatics 2023; 24:405. [PMID: 37898795 PMCID: PMC10613366 DOI: 10.1186/s12859-023-05480-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 09/13/2023] [Indexed: 10/30/2023] Open
Abstract
BACKGROUND Extracting information from free text using natural language processing (NLP) can save time and reduce the burden of manually extracting large quantities of data from the highly complex clinical notes of cancer patients. This study aimed to systematically review studies that used NLP methods to automatically identify cancer concepts in clinical notes. METHODS PubMed, Scopus, Web of Science, and Embase were searched for English-language papers using a combination of terms concerning "Cancer", "NLP", "Coding", and "Registries" until June 29, 2021. Two reviewers independently assessed the eligibility of papers for inclusion in the review. RESULTS Most of the software programs reported for concept extraction were developed by the researchers themselves (n = 7). Rule-based algorithms were the most frequently used algorithms for developing these programs. In most articles, accuracy (n = 14) and sensitivity (n = 12) were used to evaluate the algorithms. In addition, Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) and Unified Medical Language System (UMLS) were the most commonly used terminologies to identify concepts. The most frequently studied cancers were breast cancer (n = 4, 19%) and lung cancer (n = 4, 19%). CONCLUSION The use of NLP for extracting the concepts and symptoms of cancer has increased in recent years. Rule-based algorithms are popular among developers. Given the high accuracy and sensitivity of these algorithms in identifying and extracting cancer concepts, we suggest that future studies use them to extract the concepts of other diseases as well.
Affiliation(s)
- Maryam Gholipour
- Student Research Committee, Kerman University of Medical Sciences, Kerman, Iran
- Reza Khajouei
- Department of Health Information Sciences, Faculty of Management and Medical Information Sciences, Kerman University of Medical Sciences, Kerman, Iran
- Parastoo Amiri
- Student Research Committee, Kerman University of Medical Sciences, Kerman, Iran
- Sadrieh Hajesmaeel Gohari
- Medical Informatics Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
- Leila Ahmadian
- Department of Health Information Sciences, Faculty of Management and Medical Information Sciences, Kerman University of Medical Sciences, Kerman, Iran.
2. Berloco F, Ciavarella S, Colucci S, Grieco LA, Guarini A, Zaccaria GM. ARGO 2.0: a Hybrid NLP/ML Framework for Diagnosis Standardization. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-4. [PMID: 38083100 DOI: 10.1109/embc40787.2023.10340022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
A relevant problem in medicine is the standardization of the diagnosis associated with a clinical case. Although diagnosis formulation is an intrinsically subjective and uncertain process, its standardization may benefit from digital solutions that automate the routines underlying such a decision. In this work, we propose ARGO 2.0: a framework for the development of decision support systems for diagnosis formulation. The framework can read free-text reports and store their clinically relevant information as personalized electronic Case Report Forms. A hybrid strategy, exploiting the synergy of Natural Language Processing and Machine Learning techniques, is used to automatically suggest a diagnosis in a standardized fashion. ARGO 2.0 has been designed to be template-independent and easily tailored to specific medical fields. We here demonstrate its feasibility in hemolymphopathology by detailing its implementation, which is the object of an ongoing validation campaign in a standing medical institute. On the test set, ARGO 2.0 achieved an average accuracy of 95.07%, an average precision of 94.85%, an average recall of 96.31%, and an F-score of 95.32%, outperforming both of its embedded components, which are based on Natural Language Processing and Machine Learning.
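The averaged accuracy, precision, recall, and F-score reported above can be illustrated with a minimal sketch. It assumes the averages are macro-averages of one-vs-rest scores over diagnosis classes, which the abstract does not state; the labels and predictions are invented toy data, not results from ARGO 2.0.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, label):
    """One-vs-rest accuracy, precision, recall and F-score for one label."""
    pairs = Counter((t == label, p == label) for t, p in zip(y_true, y_pred))
    tp = pairs[(True, True)]    # label present and predicted
    fn = pairs[(True, False)]   # label present but missed
    fp = pairs[(False, True)]   # label predicted but absent
    tn = pairs[(False, False)]  # label absent and not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


def macro_average(y_true, y_pred):
    """Unweighted mean of the per-class metrics (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_label = [per_class_metrics(y_true, y_pred, lab) for lab in labels]
    return {key: sum(m[key] for m in per_label) / len(labels)
            for key in per_label[0]}


# Hypothetical diagnosis labels for six reports.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
scores = macro_average(y_true, y_pred)
```

Macro-averaging weights every class equally, so a rare diagnosis influences the summary as much as a common one; a support-weighted average would be a reasonable alternative if class frequencies differ widely.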
3. Barr B, Harasemiw O, Gibson IW, Tremblay-Savard O, Tangri N. The Development of a Comprehensive Clinicopathologic Registry for Glomerular Diseases Using Natural Language Processing. Can J Kidney Health Dis 2023; 10:20543581231178963. [PMID: 37342151 PMCID: PMC10278432 DOI: 10.1177/20543581231178963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 04/22/2023] [Indexed: 06/22/2023] Open
Abstract
Background Glomerulonephritis (GN) represents a common cause of chronic kidney disease, and treatment to slow or prevent progression of GN is associated with significant morbidity. Large patient registries have improved the understanding of risk stratification, treatment selection, and definitions of treatment response in GN, but can be resource-intensive, with incomplete patient capture. Objective To describe the creation of a comprehensive clinicopathologic registry for all patients undergoing kidney biopsy in Manitoba, using natural language processing software for data extraction from pathology reports, as well as to describe cohort characteristics and outcomes. Design Retrospective population-based cohort study. Setting Tertiary care center in the province of Manitoba. Patients All patients undergoing a kidney biopsy in the province of Manitoba from 2002 to 2019. Measurements Descriptive statistics are presented for the most common glomerular diseases, along with outcomes of kidney failure and mortality for the individual diseases. Methods Data from native kidney biopsy reports from January 2002 to December 2019 were extracted into a structured database using a natural language processing algorithm employing regular expressions. The pathology database was then linked with population-level clinical, laboratory, and medication data, creating a comprehensive clinicopathologic registry. Kaplan-Meier curves and Cox models were constructed to assess the relationship between type of GN and outcomes of kidney failure and mortality. Results Of 2421 available biopsies, 2103 individuals were linked to administrative data, of which 1292 had a common glomerular disease. The incidence of yearly biopsies increased almost 3-fold over the study period. Among common glomerular diseases, immunoglobulin A (IgA) nephropathy was the most common (28.6%), whereas infection-related GN had the highest proportions of kidney failure (70.3%) and all-cause mortality (42.3%). 
Predictors of kidney failure included urine albumin-to-creatinine ratio at the time of biopsy (adjusted hazard ratio [HR] = 1.43, 95% confidence interval [CI] = 1.24-1.65), whereas predictors of mortality included age at the time of biopsy (adjusted HR = 1.05, 95% CI = 1.04-1.06) and infection-related GN (adjusted HR = 1.85, 95% CI = 1.14-2.99, compared with the reference category of IgA nephropathy). Limitations Retrospective, single-center study with a relatively small number of biopsies. Conclusions Creation of a comprehensive glomerular diseases registry is feasible and can be facilitated through the use of novel data extraction methods. This registry will facilitate further epidemiological research in GN.
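The regular-expression extraction described in the Methods can be sketched as follows. This is only an illustration of the general technique: the field names, patterns, and sample report are hypothetical, since the abstract does not give the registry's actual expressions or report layout.

```python
import re

# Hypothetical patterns, one per structured field to be filled.
PATTERNS = {
    "diagnosis": re.compile(r"(?im)^\s*(?:final\s+)?diagnosis\s*:\s*(.+?)\s*$"),
    "glomeruli_total": re.compile(r"(?i)(\d+)\s+glomeruli\b"),
    "globally_sclerosed": re.compile(
        r"(?i)(\d+)\s+(?:are\s+|were\s+)?globally\s+sclero(?:sed|tic)"
    ),
}


def extract_fields(report_text):
    """Return one structured record; fields with no match are set to None."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(report_text)
        value = match.group(1) if match else None
        if value is not None and field != "diagnosis":
            value = int(value)  # count fields are stored as integers
        record[field] = value
    return record


# Invented report text for demonstration only.
sample_report = """\
Specimen: native kidney biopsy
Light microscopy: 18 glomeruli are present, of which 4 are globally sclerosed.
Diagnosis: IgA nephropathy
"""
record = extract_fields(sample_report)
empty_record = extract_fields("No relevant content.")
```

Returning `None` for unmatched fields keeps missing data explicit, which matters when the extracted table is later linked to other sources.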
Affiliation(s)
- Bryce Barr
- Department of Internal Medicine, University of Manitoba, Winnipeg, Canada
- Chronic Disease Innovation Centre, Seven Oaks General Hospital, Winnipeg, MB, Canada
- Oksana Harasemiw
- Department of Internal Medicine, University of Manitoba, Winnipeg, Canada
- Chronic Disease Innovation Centre, Seven Oaks General Hospital, Winnipeg, MB, Canada
- Ian W Gibson
- Department of Pathology, University of Manitoba, Winnipeg, Canada
- Shared Health Services Manitoba, Winnipeg, Canada
- Navdeep Tangri
- Department of Internal Medicine, University of Manitoba, Winnipeg, Canada
- Chronic Disease Innovation Centre, Seven Oaks General Hospital, Winnipeg, MB, Canada
4. Keloth VK, Banda JM, Gurley M, Heider PM, Kennedy G, Liu H, Liu F, Miller T, Natarajan K, Patterson OV, Peng Y, Raja K, Reeves RM, Rouhizadeh M, Shi J, Wang X, Wang Y, Wei WQ, Williams AE, Zhang R, Belenkaya R, Reich C, Blacketer C, Ryan P, Hripcsak G, Elhadad N, Xu H. Representing and utilizing clinical textual data for real world studies: An OHDSI approach. J Biomed Inform 2023; 142:104343. [PMID: 36935011 PMCID: PMC10428170 DOI: 10.1016/j.jbi.2023.104343] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/25/2022] [Revised: 01/21/2023] [Accepted: 03/13/2023] [Indexed: 03/19/2023]
Abstract
Clinical documentation in electronic health records contains crucial narratives and details about patients and their care. Natural language processing (NLP) can unlock the information conveyed in clinical notes and reports, and thus plays a critical role in real-world studies. The NLP Working Group at the Observational Health Data Sciences and Informatics (OHDSI) consortium was established to develop methods and tools to promote the use of textual data and NLP in real-world observational studies. In this paper, we describe a framework for representing and utilizing textual data in real-world evidence generation, including representations of information from clinical text in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), the workflow and tools that were developed to extract, transform and load (ETL) data from clinical notes into tables in OMOP CDM, as well as current applications and specific use cases of the proposed OHDSI NLP solution at large consortia and individual institutions with English textual data. Challenges faced and lessons learned during the process are also discussed to provide valuable insights for researchers who are planning to implement NLP solutions in real-world studies.
Affiliation(s)
- Vipina K Keloth
- Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Juan M Banda
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Michael Gurley
- Lurie Cancer Center, Northwestern University, Chicago, Illinois, USA
- Paul M Heider
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
- Georgina Kennedy
- Ingham Institute for Applied Medical Research, Sydney, Australia
- Hongfang Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
- Feifan Liu
- Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Timothy Miller
- Computational Health Informatics Program, Boston Children's Hospital, and Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Karthik Natarajan
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Olga V Patterson
- VA Informatics and Computing Infrastructure, Department of Veterans Affairs Salt Lake City Health Care System, Salt Lake City, Utah, USA; Division of Epidemiology, Department of Internal Medicine, School of Medicine, University of Utah, Salt Lake City, Utah, USA; Verily Life Sciences, Mountain View, CA, USA
- Yifan Peng
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Kalpana Raja
- Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Ruth M Reeves
- TN Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Masoud Rouhizadeh
- Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, FL, USA; Biomedical Informatics and Data Science, Johns Hopkins University, Baltimore, MD, USA
- Jianlin Shi
- VA Informatics and Computing Infrastructure, Department of Veterans Affairs Salt Lake City Health Care System, Salt Lake City, Utah, USA; Division of Epidemiology, Department of Internal Medicine, School of Medicine, University of Utah, Salt Lake City, Utah, USA; Department of Biomedical Informatics, University of Utah, Salt Lake City, USA
- Xiaoyan Wang
- Sema4 Mount Sinai Genomics Incorporation, Stamford, CT, USA
- Yanshan Wang
- Department of Health Information Management, Department of Biomedical Informatics, and Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Rui Zhang
- Institute for Health Informatics, and Department of Pharmaceutical Care & Health Systems, University of Minnesota, Minneapolis, MN, USA
- Clair Blacketer
- Janssen Pharmaceutical Research and Development LLC, Titusville, NJ, USA; Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
- Patrick Ryan
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA; Janssen Pharmaceutical Research and Development LLC, Titusville, NJ, USA
- George Hripcsak
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Noémie Elhadad
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Hua Xu
- Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
5. Kempf E, Vaterkowski M, Leprovost D, Griffon N, Ouagne D, Breant S, Serre P, Mouchet A, Rance B, Chatellier G, Bellamine A, Frank M, Guerin J, Tannier X, Livartowski A, Hilka M, Daniel C. How to Improve Cancer Patients ENrollment in Clinical Trials From rEal-Life Databases Using the Observational Medical Outcomes Partnership Oncology Extension: Results of the PENELOPE Initiative in Urologic Cancers. JCO Clin Cancer Inform 2023; 7:e2200179. [PMID: 37167578 DOI: 10.1200/cci.22.00179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023] Open
Abstract
PURPOSE To compare the computability of Observational Medical Outcomes Partnership (OMOP)-based queries related to prescreening of patients using two versions of the OMOP common data model (CDM; v5.3 and v5.4) and to assess the performance of the Greater Paris University Hospital (APHP) prescreening tool. MATERIALS AND METHODS We identified the prescreening information items being relevant for prescreening of patients with cancer. We randomly selected 15 academic and industry-sponsored urology phase I-IV clinical trials (CTs) launched at APHP between 2016 and 2021. The computability of the related prescreening criteria (PC) was defined by their translation rate in OMOP-compliant queries and by their execution rate on the APHP clinical data warehouse (CDW) containing data of 205,977 patients with cancer. The overall performance of the prescreening tool was assessed by the rate of true- and false-positive cases of three randomly selected CTs. RESULTS We defined a list of 15 minimal information items being relevant for patients' prescreening. We identified 83 PC of the 534 eligibility criteria from the 15 CTs. We translated 33 and 62 PC in queries on the basis of OMOP CDM v5.3 and v5.4, respectively (translation rates of 40% and 75%, respectively). Of the 33 PC translated in the v5.3 of the OMOP CDM, 19 could be executed on the APHP CDW (execution rate of 58%). Of 83 PC, the computability rate on the APHP CDW reached 23%. On the basis of three CTs, we identified 17, 32, and 63 patients as being potentially eligible for inclusion in those CTs, resulting in positive predictive values of 53%, 41%, and 21%, respectively. CONCLUSION We showed that PC could be formalized according to the OMOP CDM and that the oncology extension increased their translation rate through better representation of cancer natural history.
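The translation, execution, and computability rates in the Results follow from the reported counts, as this short worked check shows. The counts are taken from the abstract; the denominator used for the computability rate (all 83 prescreening criteria) is an assumption, chosen because 19/83 reproduces the reported 23%.

```python
def rate(numerator, denominator):
    """Percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


# Counts reported in the abstract.
total_pc = 83        # prescreening criteria (PC) identified in the 15 trials
translated_v53 = 33  # PC translated into OMOP CDM v5.3 queries
translated_v54 = 62  # PC translated into OMOP CDM v5.4 queries
executed_v53 = 19    # v5.3 queries that could be executed on the APHP CDW

translation_rate_v53 = rate(translated_v53, total_pc)    # 33/83
translation_rate_v54 = rate(translated_v54, total_pc)    # 62/83
execution_rate_v53 = rate(executed_v53, translated_v53)  # 19/33
# Assumed definition: executed queries over all PC.
computability_rate = rate(executed_v53, total_pc)        # 19/83
```

Under this reading, the computability rate is the product of the v5.3 translation and execution rates, so a criterion counts as computable only if it can be both expressed in the data model and run against the warehouse.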
Affiliation(s)
- Emmanuelle Kempf
- Sorbonne Université, Inserm, Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Department of Medical Oncology, Assistance Publique Hôpitaux de Paris, Henri Mondor Teaching Hospital, Créteil, France
- Morgan Vaterkowski
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
- EPITA School of Engineering and Computer Science, Paris, France
- Damien Leprovost
- Sorbonne Université, Inserm, Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
- Nicolas Griffon
- Sorbonne Université, Inserm, Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
- David Ouagne
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
- Stéphane Breant
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
- Patricia Serre
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
- Alexandre Mouchet
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
- Bastien Rance
- Department of Medical Informatics, Assistance Publique Hôpitaux de Paris, Centre-Université de Paris (APHP-CUP), Université de Paris, Paris, France
- Gilles Chatellier
- Department of Medical Informatics, Assistance Publique Hôpitaux de Paris, Centre-Université de Paris (APHP-CUP), Université de Paris, Paris, France
- Ali Bellamine
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
- Marie Frank
- Department of Medical Information, Paris Saclay Teaching Hospital, Assistance Publique Hôpitaux de Paris, Paris, France
- Xavier Tannier
- Sorbonne Université, Inserm, Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Martin Hilka
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
- Christel Daniel
- Sorbonne Université, Inserm, Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Innovation and Data, Paris, IT Department, Assistance Publique Hôpitaux de Paris, Paris, France
6. Seong D, Choi YH, Shin SY, Yi BK. Deep learning approach to detection of colonoscopic information from unstructured reports. BMC Med Inform Decis Mak 2023; 23:28. [PMID: 36750932 PMCID: PMC9903463 DOI: 10.1186/s12911-023-02121-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 01/23/2023] [Indexed: 02/09/2023] Open
Abstract
BACKGROUND Colorectal cancer is a leading cause of cancer deaths. Several screening tests, such as colonoscopy, can be used to find polyps or colorectal cancer. Colonoscopy reports are often written in unstructured narrative text. The information embedded in the reports can be used for various purposes, including colorectal cancer risk prediction, follow-up recommendation, and quality measurement. However, the availability and accessibility of unstructured text data are still insufficient despite the large amounts of accumulated data. We aimed to develop and apply deep learning-based natural language processing (NLP) methods to detect colonoscopic information. METHODS This study applied several deep learning-based NLP models to colonoscopy reports. Approximately 280,668 colonoscopy reports were extracted from the clinical data warehouse of Samsung Medical Center. For 5,000 reports, procedural information and colonoscopic findings were manually annotated with 17 labels. We compared the long short-term memory (LSTM) and BioBERT model to select the one with the best performance for colonoscopy reports, which was the bidirectional LSTM with conditional random fields. Then, we applied pre-trained word embedding using large unlabeled data (280,668 reports) to the selected model. RESULTS The NLP model with pre-trained word embedding performed better for most labels than the model with one-hot encoding. The F1 scores for colonoscopic findings were: 0.9564 for lesions, 0.9722 for locations, 0.9809 for shapes, 0.9720 for colors, 0.9862 for sizes, and 0.9717 for numbers. CONCLUSIONS This study applied deep learning-based clinical NLP models to extract meaningful information from colonoscopy reports. The method in this study achieved promising results that demonstrate it can be applied to various practical purposes.
Affiliation(s)
- Donghyeong Seong
- Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Seoul, 06355 Republic of Korea
- Yoon Ho Choi
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06355 Republic of Korea
- Soo-Yong Shin
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06355 Republic of Korea; Research Institute for Future Medicine, Samsung Medical Center, Seoul, 06351 Republic of Korea
- Byoung-Kee Yi
- Department of Artificial Intelligence Convergence, Kangwon National University, 1 Kangwondaehak-Gil, Chuncheon-si, Gangwon-do, 24341, Republic of Korea.
7. Lovis C, Vakkuri A, Palojoki S. Systematized Nomenclature of Medicine-Clinical Terminology (SNOMED CT) Clinical Use Cases in the Context of Electronic Health Record Systems: Systematic Literature Review. JMIR Med Inform 2023; 11:e43750. [PMID: 36745498 PMCID: PMC9941898 DOI: 10.2196/43750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2022] [Revised: 12/05/2022] [Accepted: 12/22/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND The Systematized Nomenclature of Medicine-Clinical Terminology (SNOMED CT) is a clinical terminology system that provides a standardized and scientifically validated way of representing clinical information captured by clinicians. It can be integrated into electronic health records (EHRs) to increase the possibilities for effective data use and ensure a better quality of documentation that supports continuity of care, thus enabling better quality in the care process. Even though SNOMED CT consists of extensively studied clinical terminology, previous research has repeatedly documented a lack of scientific evidence for SNOMED CT in the form of reported clinical use cases in electronic health record systems. OBJECTIVE The aim of this study was to explore evidence in previous literature reviews of clinical use cases of SNOMED CT integrated into EHR systems or other clinical applications during the last 5 years of continued development. The study sought to identify the main clinical use purposes, use phases, and key clinical benefits documented in SNOMED CT use cases. METHODS The Cochrane review protocol was applied for the study design. The application of the protocol was modified step-by-step to fit the research problem by first defining the search strategy, then identifying the articles for the review by isolating the exclusion and inclusion criteria for assessing the search results, and lastly, evaluating and summarizing the review results. RESULTS In total, 17 research articles illustrating SNOMED CT clinical use cases were reviewed. The use purpose of SNOMED CT was documented in all the articles, with the terminology as a standard in EHR being the most common (8/17). The clinical use phase was documented in all the articles. The most common category of use phases was SNOMED CT in development (6/17). Core benefits achieved by applying SNOMED CT in a clinical context were identified by the researchers. These were related to terminology use outcomes, that is, to data quality in general or to enabling a consistent way of indexing, storing, retrieving, and aggregating clinical data (8/17). Additional benefits were linked to the productivity of coding or to advances in the quality and continuity of care. CONCLUSIONS While the SNOMED CT use categories were well supported by previous research, this review demonstrates that further systematic research on clinical use cases is needed to promote the scalability of the review results. To get the most out of use case reports, more emphasis is suggested on describing the contextual factors, such as the electronic health care system and the use of previous frameworks to enable comparability of results. A lesson to be drawn from our study is that SNOMED CT is essential for structuring clinical data; however, research is needed to gather more evidence of how SNOMED CT benefits clinical care and patient safety.
Affiliation(s)
- Anne Vakkuri
- Perioperative, Intensive Care and Pain Medicine, Helsinki University Hospital, Vantaa, Finland
- Sari Palojoki
- Unit for Digital Transformation, European Centre for Disease Prevention and Control, Stockholm, Sweden
8. López-Úbeda P, Martín-Noguerol T, Aneiros-Fernández J, Luna A. Natural Language Processing in Pathology: Current Trends and Future Insights. Am J Pathol 2022; 192:1486-1495. [PMID: 35985480 DOI: 10.1016/j.ajpath.2022.07.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/07/2022] [Revised: 07/21/2022] [Accepted: 07/29/2022] [Indexed: 06/15/2023]
Abstract
Natural language processing (NLP) plays a key role in advancing health care, being key to extracting structured information from electronic health reports. In the last decade, several advances in the field of pathology have been derived from the application of NLP to pathology reports. Herein, a comprehensive review of the most used NLP methods for extracting, coding, and organizing information from pathology reports is presented, including how the development of tools is used to improve workflow. In addition, this article discusses, from a practical point of view, the steps necessary to extract data and encode natural language information for its analytical processing, ranging from preprocessing of text to its inclusion in complex algorithms. Finally, the potential of NLP-based automatic solutions for improving workflow in pathology and their further applications in the near future is highlighted.
Affiliation(s)
- Antonio Luna
- MRI Unit, Radiology Department, HT Medica, Jaén, Spain
9. Yoo S, Yoon E, Boo D, Kim B, Kim S, Paeng JC, Yoo IR, Choi IY, Kim K, Ryoo HG, Lee SJ, Song E, Joo YH, Kim J, Lee HY. Transforming Thyroid Cancer Diagnosis and Staging Information from Unstructured Reports to the Observational Medical Outcome Partnership Common Data Model. Appl Clin Inform 2022; 13:521-531. [PMID: 35705182 PMCID: PMC9200482 DOI: 10.1055/s-0042-1748144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022] Open
Abstract
BACKGROUND Cancer staging information is an essential component of cancer research. However, the information is primarily stored in clinical documents that are either fully free-text or semistructured, which limits use of the data. By transforming the cancer-specific data to the Observational Medical Outcome Partnership Common Data Model (OMOP CDM), the information can contribute to establishing multicenter observational cancer studies. To the best of our knowledge, there have been no studies on OMOP CDM transformation and natural language processing (NLP) for thyroid cancer to date. OBJECTIVE We aimed to demonstrate the applicability of the OMOP CDM oncology extension module for thyroid cancer diagnosis and cancer stage information by processing free-text medical reports. METHODS Thyroid cancer diagnosis and stage-related modifiers were extracted with rule-based NLP from 63,795 thyroid cancer pathology reports and 56,239 iodine whole-body scan reports from three medical institutions in the Observational Health Data Sciences and Informatics data network. The data were converted into the OMOP CDM v6.0 according to the OMOP CDM oncology extension module. The cancer staging group was derived and populated using the transformed CDM data. RESULTS The extracted thyroid cancer data were completely converted into the OMOP CDM. The distributions of histopathological types of thyroid cancer were approximately 95.3 to 98.8% papillary carcinoma, 0.9 to 3.7% follicular carcinoma, 0.04 to 0.54% adenocarcinoma, 0.17 to 0.81% medullary carcinoma, and 0 to 0.3% anaplastic carcinoma. Regarding cancer staging, stage-I thyroid cancer accounted for 55 to 64% of the cases, while stage III accounted for 24 to 26% of the cases. Stage-II and -IV thyroid cancers were detected at a low rate of 2 to 6%.
CONCLUSION As a first study on OMOP CDM transformation and NLP for thyroid cancer, this study will help other institutions to standardize thyroid cancer-specific data for retrospective observational research and participate in multicenter studies.
Affiliation(s)
- Sooyoung Yoo
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
- Eunsil Yoon
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
- Dachung Boo
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
- Borham Kim
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
- Seok Kim
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea
- Jin Chul Paeng
- Department of Nuclear Medicine, Seoul National University, College of Medicine, Seoul, South Korea
- Ie Ryung Yoo
- Division of Nuclear Medicine, Department of Radiology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, South Korea
- In Young Choi
- Department of Medical Informatics, The Catholic University of Korea, College of Medicine, Seoul, South Korea; Department of Biomedicine and Health Sciences, The Catholic University of Korea, College of Medicine, Seoul, South Korea
- Kwangsoo Kim
- Transdisciplinary Department of Medicine and Advanced Technology, Seoul National University Hospital, Seoul, South Korea
- Hyun Gee Ryoo
- Department of Nuclear Medicine, Seoul National University Hospital, Seoul, South Korea; Department of Nuclear Medicine, Seoul National University Bundang Hospital, Seongnam, South Korea
- Sun Jung Lee
- Department of Medical Informatics, The Catholic University of Korea, College of Medicine, Seoul, South Korea; Department of Biomedicine and Health Sciences, The Catholic University of Korea, College of Medicine, Seoul, South Korea
- Eunhye Song
- Department of Data Science Research, Innovative Medical Technology Research Institute, Seoul National University Hospital, Seoul, South Korea
- Young-Hwan Joo
- Biomedical Research Institute, Seoul National University Hospital, Seoul, South Korea
- Junmo Kim
- Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, South Korea
- Ho-Young Lee
- Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea; Department of Nuclear Medicine, Seoul National University, College of Medicine, Seoul, South Korea
|
10
|
Arvisais-Anhalt S, Lehmann CU, Bishop JA, Balani J, Boutte L, Morales M, Park JY, Araj E. Searching Full-Text Anatomic Pathology Reports Using Business Intelligence Software. J Pathol Inform 2022; 13:100014. [PMID: 35251753 PMCID: PMC8892022 DOI: 10.1016/j.jpi.2022.100014] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 12/07/2021] [Indexed: 01/24/2023] Open
Abstract
Although the laboratory information system has largely solved the problem of storing anatomic pathology reports and disseminating their contents across the healthcare system, the retrospective query of anatomic pathology reports remains an area for improvement across laboratory information system vendors. Our institution desired the ability to query our repository of anatomic pathology reports for clinical, operational, research, and educational purposes. To address this need, we developed a full-text anatomic pathology search tool using the business intelligence software Tableau. Our search tool allows users to query the 333,685 anatomic pathology reports from our institutional clinical relational database using the business intelligence tool's built-in regular expression functionality. Users securely access the search tool using any web browser, thereby avoiding the cost of installing or maintaining software on users' computers. This tool is laboratory information system vendor-agnostic, and because many institutions already subscribe to business intelligence software, we believe this solution could be easily reproduced at other institutions and in other clinical departments.
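The regular-expression search this abstract describes can be illustrated with a minimal Python sketch. This is not the authors' Tableau implementation: the function name `search_reports`, the report identifiers, and the sample report texts are all hypothetical, and the standard-library `re` module stands in for the business intelligence tool's built-in regex support.

```python
import re

def search_reports(reports: dict[str, str], pattern: str) -> list[str]:
    """Return the IDs of reports whose free text matches the regex (case-insensitive)."""
    rx = re.compile(pattern, flags=re.IGNORECASE)
    return [report_id for report_id, text in reports.items() if rx.search(text)]

# Hypothetical report texts, invented for illustration only.
reports = {
    "R1": "Papillary thyroid carcinoma, 1.2 cm.",
    "R2": "Benign colonic mucosa.",
    "R3": "Invasive ductal CARCINOMA, grade 2.",
}

hits = search_reports(reports, r"\bcarcinoma\b")
print(hits)
```

The word-boundary anchors (`\b`) keep the query from matching longer tokens such as "adenocarcinoma"; whether that is desirable depends on the question being asked of the report repository.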
Affiliation(s)
- Simone Arvisais-Anhalt
- Department of Hospital Medicine and Department of Laboratory Medicine, University of California San Francisco, San Francisco, CA, USA
- Christoph U. Lehmann
- Clinical Informatics Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Justin A. Bishop
- Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jyoti Balani
- Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Laurie Boutte
- Health System Quality & Operational Excellence, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Marjorie Morales
- Health System Quality & Operational Excellence, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jason Y. Park
- Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Ellen Araj
- Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA. Corresponding author at: Department of Pathology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX 75390-9072, USA.
|
11
|
Zaccaria GM, Colella V, Colucci S, Clemente F, Pavone F, Vegliante MC, Esposito F, Opinto G, Scattone A, Loseto G, Minoia C, Rossini B, Quinto AM, Angiulli V, Grieco LA, Fama A, Ferrero S, Moia R, Di Rocco A, Quaglia FM, Tabanelli V, Guarini A, Ciavarella S. Electronic case report forms generation from pathology reports by ARGO, automatic record generator for onco-hematology. Sci Rep 2021; 11:23823. [PMID: 34893665 PMCID: PMC8664934 DOI: 10.1038/s41598-021-03204-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/01/2021] [Accepted: 11/23/2021] [Indexed: 12/04/2022] Open
Abstract
The unstructured nature of Real-World (RW) data from onco-hematological patients and the scarce accessibility to integrated systems restrain the use of RW information for research purposes. Natural Language Processing (NLP) might help in transposing unstructured reports into standardized electronic health records. We exploited NLP to develop an automated tool, named ARGO (Automatic Record Generator for Onco-hematology) to recognize information from pathology reports and populate electronic case report forms (eCRFs) pre-implemented by REDCap. ARGO was applied to hemo-lymphopathology reports of diffuse large B-cell, follicular, and mantle cell lymphomas, and assessed for accuracy (A), precision (P), recall (R) and F1-score (F) on internal (n = 239) and external (n = 93) report series. 326 (98.2%) reports were converted into corresponding eCRFs. Overall, ARGO showed high performance in capturing (1) identification report number (all metrics > 90%), (2) biopsy date (all metrics > 90% in both series), (3) specimen type (86.6% and 91.4% of A, 98.5% and 100.0% of P, 92.5% and 95.5% of F, and 87.2% and 91.4% of R for internal and external series, respectively), (4) diagnosis (100% of P with A, R and F of 90% in both series). We developed and validated a generalizable tool that generates structured eCRFs from real-life pathology reports.
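As a reading aid, the F1-scores quoted for specimen type can be recomputed from the precision and recall values in the same sentence, since F1 is the harmonic mean of the two. The sketch below is an illustrative check only; the helper name `f1_score` is an assumption and nothing here comes from the ARGO code base.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output share the same unit)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Specimen-type precision and recall as quoted in the abstract, in percent.
internal_f1 = f1_score(98.5, 87.2)   # internal series
external_f1 = f1_score(100.0, 91.4)  # external series

print(round(internal_f1, 1), round(external_f1, 1))  # prints: 92.5 95.5
```

Both values agree with the 92.5% and 95.5% reported for the internal and external series.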
Affiliation(s)
- Gian Maria Zaccaria
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy.
- Vito Colella
- Department of Electrical and Information Engineering, Politecnico of Bari, Bari, Italy
- Simona Colucci
- Department of Electrical and Information Engineering, Politecnico of Bari, Bari, Italy
- Felice Clemente
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Fabio Pavone
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Maria Carmela Vegliante
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Flavia Esposito
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy; Department of Mathematics, University of Bari Aldo Moro, Bari, Italy
- Giuseppina Opinto
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Anna Scattone
- Pathology Department, IRCCS Istituto Tumori 'Giovanni Paolo II', Bari, Italy
- Giacomo Loseto
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Carla Minoia
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Bernardo Rossini
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Angela Maria Quinto
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Vito Angiulli
- Clinical Engineering Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Bari, Italy
- Luigi Alfredo Grieco
- Department of Electrical and Information Engineering, Politecnico of Bari, Bari, Italy
- Angelo Fama
- Hematology, Azienda USL - IRCCS Di Reggio Emilia, Reggio Emilia, Italy
- Simone Ferrero
- Division of Hematology 1, AOU "Città Della Salute e Della Scienza di Torino", Torino, Italy; Department of Molecular Biotechnologies and Health Sciences, University of Torino, Torino, Italy
- Riccardo Moia
- Division of Hematology, Azienda Ospedaliero-Universitaria Maggiore Della Carità Di Novara, Novara, Italy
- Alice Di Rocco
- Unit of Hematology, Azienda Ospedaliero-Universitaria Policlinico Umberto I, Roma, Italy
- Valentina Tabanelli
- Division of Diagnostic Haematopathology, European Institute of Oncology, IRCCS, Milano, Italy
- Attilio Guarini
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Sabino Ciavarella
- Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
|
12
|
Lamer A, Abou-Arab O, Bourgeois A, Parrot A, Popoff B, Beuscart JB, Tavernier B, Moussa MD. Transforming Anesthesia Data Into the Observational Medical Outcomes Partnership Common Data Model: Development and Usability Study. J Med Internet Res 2021; 23:e29259. [PMID: 34714250 PMCID: PMC8590192 DOI: 10.2196/29259] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/26/2021] [Revised: 06/14/2021] [Accepted: 07/05/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Electronic health records (EHRs, such as those created by an anesthesia management system) generate a large amount of data that can notably be reused for clinical audits and scientific research. The sharing of these data and tools is generally affected by the lack of system interoperability. To overcome these issues, Observational Health Data Sciences and Informatics (OHDSI) developed the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) to standardize EHR data and promote large-scale observational and longitudinal research. Anesthesia data have not previously been mapped into the OMOP CDM. OBJECTIVE The primary objective was to transform anesthesia data into the OMOP CDM. The secondary objective was to provide vocabularies, queries, and dashboards that might promote the exploitation and sharing of anesthesia data through the CDM. METHODS Using our local anesthesia data warehouse, a group of 5 experts from 5 different medical centers identified local concepts related to anesthesia. The concepts were then matched with standard concepts in the OHDSI vocabularies. We performed structural mapping between the design of our local anesthesia data warehouse and the OMOP CDM tables and fields. To validate the implementation of anesthesia data into the OMOP CDM, we developed a set of queries and dashboards. RESULTS We identified 522 concepts related to anesthesia care. They were classified as demographics, units, measurements, operating room steps, drugs, periods of interest, and features. After semantic mapping, 353 (67.7%) of these anesthesia concepts were mapped to OHDSI concepts. Further, 169 (32.3%) concepts related to periods and features were added to the OHDSI vocabularies. Then, 8 OMOP CDM tables were implemented with anesthesia data and 2 new tables (EPISODE and FEATURE) were added to store secondarily computed data. 
We integrated data from 572,609 operations and provided the code for a set of 8 queries and 4 dashboards related to anesthesia care. CONCLUSIONS Generic data concerning demographics, drugs, units, measurements, and operating room steps were already available in OHDSI vocabularies. However, most of the intraoperative concepts (the duration of specific steps, an episode of hypotension, etc) were not present in OHDSI vocabularies. The OMOP mapping provided here enables anesthesia data reuse.
Affiliation(s)
- Antoine Lamer
- Univ. Lille, CHU Lille, ULR 2694 - METRICS: Évaluation des technologies de santé et des pratiques médicales, Lille, France
- InterHop, Paris, France
- Univ. Lille, Faculté Ingénierie et Management de la Santé, Lille, France
- Osama Abou-Arab
- Department of Anaesthesiology and Critical Care Medicine, Amiens Picardie University Hospital, Amiens, France
- Alexandre Bourgeois
- Department of Anesthesiology and Critical Care Medicine, Regional University Hospital of Nancy, Nancy, France
- Benjamin Popoff
- Department of Anaesthesiology and Critical Care, Rouen University Hospital, Rouen, France
- Jean-Baptiste Beuscart
- Univ. Lille, CHU Lille, ULR 2694 - METRICS: Évaluation des technologies de santé et des pratiques médicales, Lille, France
- Benoît Tavernier
- Univ. Lille, CHU Lille, ULR 2694 - METRICS: Évaluation des technologies de santé et des pratiques médicales, Lille, France
- Department of Anesthesiology and Critical Care, CHU Lille, Lille, France
|