1
Sivarajkumar S, Mohammad HA, Oniani D, Roberts K, Hersh W, Liu H, He D, Visweswaran S, Wang Y. Clinical Information Retrieval: A Literature Review. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2024; 8:313-352. [PMID: 38681755 PMCID: PMC11052968 DOI: 10.1007/s41666-024-00159-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 12/07/2023] [Accepted: 01/08/2024] [Indexed: 05/01/2024]
Abstract
Clinical information retrieval (IR) plays a vital role in modern healthcare by facilitating efficient access and analysis of medical literature for clinicians and researchers. This scoping review aims to offer a comprehensive overview of the current state of clinical IR research and identify gaps and potential opportunities for future studies in this field. The main objective was to assess and analyze the existing literature on clinical IR, focusing on the methods, techniques, and tools employed for effective retrieval and analysis of medical information. Adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we conducted an extensive search across databases such as Ovid Embase, Ovid Medline, Scopus, ACM Digital Library, IEEE Xplore, and Web of Science, covering publications from January 1, 2010, to January 4, 2023. The rigorous screening process led to the inclusion of 184 papers in our review. Our findings provide a detailed analysis of the clinical IR research landscape, covering aspects like publication trends, data sources, methodologies, evaluation metrics, and applications. The review identifies key research gaps in clinical IR methods such as indexing, ranking, and query expansion, offering insights and opportunities for future studies in clinical IR, thus serving as a guiding framework for upcoming research efforts in this rapidly evolving field. The study also underscores an imperative for innovative research on advanced clinical IR systems capable of fast semantic vector search and adoption of neural IR techniques for effective retrieval of information from unstructured electronic health records (EHRs). Supplementary Information The online version contains supplementary material available at 10.1007/s41666-024-00159-4.
Affiliation(s)
- David Oniani
  - Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA USA
- Kirk Roberts
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX USA
- William Hersh
  - Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR USA
- Hongfang Liu
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX USA
- Daqing He
  - Department of Information Science, University of Pittsburgh, Pittsburgh, PA USA
- Shyam Visweswaran
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA USA
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA USA
  - Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA USA
- Yanshan Wang
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA USA
  - Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA USA
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA USA
  - Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA USA
2
Wen A, He H, Fu S, Liu S, Miller K, Wang L, Roberts KE, Bedrick SD, Hersh WR, Liu H. The IMPACT framework and implementation for accessible in silico clinical phenotyping in the digital era. NPJ Digit Med 2023; 6:132. [PMID: 37479735 PMCID: PMC10362064 DOI: 10.1038/s41746-023-00878-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Accepted: 07/13/2023] [Indexed: 07/23/2023] Open
Abstract
Clinical phenotyping is often a foundational requirement for obtaining datasets necessary for the development of digital health applications. Traditionally done via manual abstraction, this task is often a bottleneck in development due to time and cost requirements, therefore raising significant interest in accomplishing this task via in-silico means. Nevertheless, current in-silico phenotyping development tends to be focused on a single phenotyping task resulting in a dearth of reusable tools supporting cross-task generalizable in-silico phenotyping. In addition, in-silico phenotyping remains largely inaccessible for a substantial portion of potentially interested users. Here, we highlight the barriers to the usage of in-silico phenotyping and potential solutions in the form of a framework of several desiderata as observed during our implementation of such tasks. In addition, we introduce an example implementation of said framework as a software application, with a focus on ease of adoption, cross-task reusability, and facilitating the clinical phenotyping algorithm development process.
Affiliation(s)
- Andrew Wen
  - Department of AI & Informatics, Mayo Clinic, Rochester, MN, 55905, USA
  - School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
- Huan He
  - Department of AI & Informatics, Mayo Clinic, Rochester, MN, 55905, USA
- Sunyang Fu
  - Department of AI & Informatics, Mayo Clinic, Rochester, MN, 55905, USA
  - School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
- Sijia Liu
  - Department of AI & Informatics, Mayo Clinic, Rochester, MN, 55905, USA
- Kurt Miller
  - Department of AI & Informatics, Mayo Clinic, Rochester, MN, 55905, USA
- Liwei Wang
  - Department of AI & Informatics, Mayo Clinic, Rochester, MN, 55905, USA
  - School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
- Kirk E Roberts
  - School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
- Steven D Bedrick
  - Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, 97239, USA
- William R Hersh
  - Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, 97239, USA
- Hongfang Liu
  - Department of AI & Informatics, Mayo Clinic, Rochester, MN, 55905, USA
  - School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
3
Rui A, Garabedian PM, Marceau M, Syrowatka A, Volk LA, Edrees HH, Seger DL, Amato MG, Cambre J, Dulgarian S, Newmark LP, Nanji KC, Schultz P, Jackson GP, Rozenblum R, Bates DW. Performance of a Web-Based Reference Database With Natural Language Searching Capabilities: Usability Evaluation of DynaMed and Micromedex With Watson. JMIR Hum Factors 2023; 10:e43960. [PMID: 37067858 PMCID: PMC10152386 DOI: 10.2196/43960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 01/25/2023] [Accepted: 01/29/2023] [Indexed: 04/18/2023] Open
Abstract
BACKGROUND Evidence-based point-of-care information (POCI) tools can facilitate patient safety and care by helping clinicians to answer disease state and drug information questions in less time and with less effort. However, these tools may also be visually challenging to navigate or lack the comprehensiveness needed to sufficiently address a medical issue. OBJECTIVE This study aimed to collect clinicians' feedback and directly observe their use of the combined POCI tool DynaMed and Micromedex with Watson, now known as DynaMedex. EBSCO partnered with IBM Watson Health, now known as Merative, to develop the combined tool as a resource for clinicians. We aimed to identify areas for refinement based on participant feedback and examine participant perceptions to inform further development. METHODS Participants (N=43) within varying clinical roles and specialties were recruited from Brigham and Women's Hospital and Massachusetts General Hospital in Boston, Massachusetts, United States, between August 10, 2021, and December 16, 2021, to take part in usability sessions aimed at evaluating the efficiency and effectiveness of, as well as satisfaction with, the DynaMed and Micromedex with Watson tool. Usability testing methods, including think aloud and observations of user behavior, were used to identify challenges regarding the combined tool. Data collection included measurements of time on task; task ease; satisfaction with the answer; posttest feedback on likes, dislikes, and perceived reliability of the tool; and interest in recommending the tool to a colleague. RESULTS On a 7-point Likert scale, pharmacists rated ease (mean 5.98, SD 1.38) and satisfaction (mean 6.31, SD 1.34) with the combined POCI tool higher than the physicians, nurse practitioner, and physician's assistants (ease: mean 5.57, SD 1.64, and satisfaction: mean 5.82, SD 1.60). 
Pharmacists spent longer (mean 2 minutes, 26 seconds, SD 1 minute, 41 seconds) on average finding an answer to their question than the physicians, nurse practitioner, and physician's assistants (mean 1 minute, 40 seconds, SD 1 minute, 23 seconds). CONCLUSIONS Overall, the tool performed well, but this usability evaluation identified multiple opportunities for improvement that would help inexperienced users.
Affiliation(s)
- Angela Rui
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, United States
- Pamela M Garabedian
  - Clinical and Quality Analysis, Mass General Brigham, Somerville, MA, United States
- Marlika Marceau
  - Clinical and Quality Analysis, Mass General Brigham, Somerville, MA, United States
- Ania Syrowatka
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Lynn A Volk
  - Clinical and Quality Analysis, Mass General Brigham, Somerville, MA, United States
- Heba H Edrees
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
  - Massachusetts College of Pharmacy and Health Sciences (MCPHS), Boston, MA, United States
- Diane L Seger
  - Clinical and Quality Analysis, Mass General Brigham, Somerville, MA, United States
- Mary G Amato
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, United States
  - Massachusetts College of Pharmacy and Health Sciences (MCPHS), Boston, MA, United States
- Jacob Cambre
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, United States
- Sevan Dulgarian
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, United States
- Lisa P Newmark
  - Clinical and Quality Analysis, Mass General Brigham, Somerville, MA, United States
- Karen C Nanji
  - Clinical and Quality Analysis, Mass General Brigham, Somerville, MA, United States
  - Harvard Medical School, Boston, MA, United States
  - Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, Boston, MA, United States
- Gretchen Purcell Jackson
  - Vanderbilt University Medical Center, Nashville, TN, United States
  - Intuitive Surgical, Sunnyvale, CA, United States
- Ronen Rozenblum
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- David W Bates
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, United States
  - Clinical and Quality Analysis, Mass General Brigham, Somerville, MA, United States
  - Harvard Medical School, Boston, MA, United States
  - Harvard TH Chan School of Public Health, Boston, MA, United States
4
The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria. Sci Data 2022; 9:490. [PMID: 35953524 PMCID: PMC9372145 DOI: 10.1038/s41597-022-01521-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/01/2022] [Accepted: 06/28/2022] [Indexed: 11/08/2022] Open
Abstract
Identifying cohorts of patients based on eligibility criteria such as medical conditions, procedures, and medication use is critical to recruitment for clinical trials. Such criteria are often most naturally described in free text, using language familiar to clinicians and researchers. In order to identify potential participants at scale, these criteria must first be translated into queries on clinical databases, which can be labor-intensive and error-prone. Natural language processing (NLP) methods offer a potential means of such conversion into database queries automatically. However, they must first be trained and evaluated using corpora which capture clinical trials criteria in sufficient detail. In this paper, we introduce the Leaf Clinical Trials (LCT) corpus, a human-annotated corpus of over 1,000 clinical trial eligibility criteria descriptions using highly granular structured labels capturing a range of biomedical phenomena. We provide details of our schema, annotation process, corpus quality, and statistics. Additionally, we present baseline information extraction results on this corpus as benchmarks for future work. Measurement(s): Clinical Trial Eligibility Criteria. Technology Type(s): natural language processing. Sample Characteristic (Organism): Homo sapiens.
5
Patel R, Wee SN, Ramaswamy R, Thadani S, Tandi J, Garg R, Calvanese N, Valko M, Rush AJ, Rentería ME, Sarkar J, Kollins SH. NeuroBlu, an electronic health record (EHR) trusted research environment (TRE) to support mental healthcare analytics with real-world data. BMJ Open 2022; 12:e057227. [PMID: 35459671 PMCID: PMC9036423 DOI: 10.1136/bmjopen-2021-057227] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/21/2022] Open
Abstract
PURPOSE NeuroBlu is a real-world data (RWD) repository that contains deidentified electronic health record (EHR) data from US mental healthcare providers operating the MindLinc EHR system. NeuroBlu enables users to perform statistical analysis through a secure web-based interface. Structured data are available for sociodemographic characteristics, mental health service contacts, hospital admissions, International Classification of Diseases ICD-9/ICD-10 diagnosis, prescribed medications, family history of mental disorders, Clinical Global Impression-Severity and Improvement (CGI-S/CGI-I) and Global Assessment of Functioning (GAF). To further enhance the data set, natural language processing (NLP) tools have been applied to obtain mental state examination (MSE) and social/environmental data. This paper describes the development and implementation of NeuroBlu, the procedures to safeguard data integrity and security and how the data set supports the generation of real-world evidence (RWE) in mental health. PARTICIPANTS As of 31 July 2021, 562 940 individuals (48.9% men) were present in the data set with a mean age of 33.4 years (SD: 18.4 years). The most frequently recorded diagnoses were substance use disorders (152 790 patients), major depressive disorder (129 120 patients) and anxiety disorders (103 923 patients). The median duration of follow-up was 7 months (IQR: 1.3 to 24.4 months). FINDINGS TO DATE The data set has supported epidemiological studies demonstrating increased risk of psychiatric hospitalisation and reduced antidepressant treatment effectiveness among people with comorbid substance use disorders. It has also been used to develop data visualisation tools to support clinical decision-making, evaluate comparative effectiveness of medications, derive models to predict treatment response and develop NLP applications to obtain clinical information from unstructured EHR data.
FUTURE PLANS The NeuroBlu data set will be further analysed to better understand factors related to poor clinical outcome, treatment responsiveness and the development of predictive analytic tools that may be incorporated into the source EHR system to support real-time clinical decision-making in the delivery of mental healthcare services.
Affiliation(s)
- Rashmi Patel
  - Holmusk Technologies Inc, New York, New York, USA
  - Department of Psychosis Studies, King's College London, Institute of Psychiatry Psychology and Neuroscience, London, UK
- Soon Nan Wee
  - Holmusk Technologies Inc, New York, New York, USA
- Ruchir Garg
  - Holmusk Technologies Inc, New York, New York, USA
- A John Rush
  - Curbstone Consultant LLC, Santa Fe, New Mexico, USA
- Scott H Kollins
  - Holmusk Technologies Inc, New York, New York, USA
  - Duke University School of Medicine, Durham, North Carolina, USA
6
Daniel C, Bellamine A, Kalra D. Key Contributions in Clinical Research Informatics. Yearb Med Inform 2021; 30:233-238. [PMID: 34479395 PMCID: PMC8416193 DOI: 10.1055/s-0041-1726514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
Abstract
Objectives:
To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select best papers published in 2020.
Method:
A bibliographic search using a combination of Medical Subject Headings (MeSH) descriptors and free-text terms on CRI was performed using PubMed, followed by a double-blind review in order to select a list of candidate best papers to be then peer-reviewed by external reviewers. After peer-review ranking, a consensus meeting between two section editors and the editorial team was organized to finally conclude on the selected four best papers.
Results:
Among the 877 papers published in 2020 and returned by the search, there were four best papers selected. The first best paper describes a method for mining temporal sequences from clinical documents to infer disease trajectories and enhancing high-throughput phenotyping. The authors of the second best paper demonstrate that the generation of synthetic Electronic Health Record (EHR) data through Generative Adversarial Networks (GANs) could be substantially improved by more appropriate training and evaluation criteria. The third best paper offers an efficient advance on methods to detect adverse drug events by computer-assisting expert reviewers with annotated candidate mentions in clinical documents. The large-scale data quality assessment study reported by the fourth best paper has clinical research informatics implications, in terms of the trustworthiness of inferences made from analysing electronic health records.
Conclusions:
The most significant research efforts in the CRI field are currently focusing on data science with active research in the development and evaluation of Artificial Intelligence/Machine Learning (AI/ML) algorithms based on ever more intensive use of real-world data and especially EHR real or synthetic data. A major lesson that the coronavirus disease 2019 (COVID-19) pandemic has already taught the scientific CRI community is that timely international high-quality data-sharing and collaborative data analysis is absolutely vital to inform policy decisions.
Affiliation(s)
- Christel Daniel
  - Information Technology Department, AP-HP, F-75012 Paris, France
  - Sorbonne University, University Paris 13, Sorbonne Paris Cité, INSERM UMR_S 1142, LIMICS, F-75006 Paris, France
- Ali Bellamine
  - Information Technology Department, AP-HP, F-75012 Paris, France
7
Almeida JR, Silva JF, Matos S, Oliveira JL. A two-stage workflow to extract and harmonize drug mentions from clinical notes into observational databases. J Biomed Inform 2021; 120:103849. [PMID: 34214696 DOI: 10.1016/j.jbi.2021.103849] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/02/2020] [Revised: 06/04/2021] [Accepted: 06/19/2021] [Indexed: 01/02/2023]
Abstract
BACKGROUND The content of the clinical notes that have been continuously collected throughout patients' health history has the potential to provide relevant information about treatments and diseases, and to increase the value of structured data available in Electronic Health Records (EHR) databases. EHR databases are currently being used in observational studies which lead to important findings in medical and biomedical sciences. However, the information present in clinical notes is not being used in those studies, since the computational analysis of this unstructured data is much more complex than that of structured data. METHODS We propose a two-stage workflow for solving an existing gap in Extraction, Transformation and Loading (ETL) procedures regarding observational databases. The first stage of the workflow extracts prescriptions present in patients' clinical notes, while the second stage harmonises the extracted information into their standard definition and stores the resulting information in a common database schema used in observational studies. RESULTS We validated this methodology using two distinct data sets, in which the goal was to extract and store drug-related information in a new Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) database. We analysed the performance of the annotator used as well as its limitations. Finally, we described some practical examples of how users can explore these datasets once migrated to OMOP CDM databases. CONCLUSION With this methodology, we were able to show a strategy for using the information extracted from the clinical notes in business intelligence tools, or for other applications such as data exploration through the use of SQL queries. In addition, the extracted information complements the data present in OMOP CDM databases with content that was not directly available in the EHR database.
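The two stages described in this abstract can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the vocabulary, the placeholder concept IDs, the regular expression, and the simplified `drug_exposure` columns (`drug_name`, `dose_mg`) are all hypothetical, and a real OMOP CDM table has a different, larger schema.

```python
import re
import sqlite3

# Hypothetical local vocabulary: surface form -> (placeholder concept id, standard name).
# The ids are illustrative only and are NOT real OMOP concept identifiers.
VOCAB = {
    "aspirin": (1001, "aspirin"),
    "asa": (1001, "aspirin"),
    "metformin": (1002, "metformin"),
}

# Hypothetical pattern for mentions of the form "<drug> <dose> mg".
MENTION = re.compile(r"\b([A-Za-z]+)\s+(\d+(?:\.\d+)?)\s*mg\b")


def extract_mentions(note):
    """Stage 1: pull (surface form, dose in mg) pairs for known drugs out of free text."""
    return [
        (m.group(1).lower(), float(m.group(2)))
        for m in MENTION.finditer(note)
        if m.group(1).lower() in VOCAB
    ]


def harmonise_and_load(conn, person_id, mentions):
    """Stage 2: map each surface form to its standard concept and store one row per mention."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS drug_exposure "
        "(person_id INTEGER, drug_concept_id INTEGER, drug_name TEXT, dose_mg REAL)"
    )
    for surface, dose in mentions:
        concept_id, name = VOCAB[surface]
        conn.execute(
            "INSERT INTO drug_exposure VALUES (?, ?, ?, ?)",
            (person_id, concept_id, name, dose),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
note = "Patient continues ASA 81 mg daily and metformin 500 mg twice a day."
harmonise_and_load(conn, 1, extract_mentions(note))

# Once loaded, the harmonised rows can be explored with plain SQL.
rows = conn.execute(
    "SELECT drug_name, dose_mg FROM drug_exposure ORDER BY drug_name"
).fetchall()
```

Keeping extraction and harmonisation in separate functions mirrors the split the abstract describes, so either stage could in principle be swapped out without touching the other.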
Affiliation(s)
- João Rafael Almeida
  - DETI/IEETA, University of Aveiro, Aveiro, Portugal
  - Department of Computation, University of A Coruña, A Coruña, Spain
- Sérgio Matos
  - DETI/IEETA, University of Aveiro, Aveiro, Portugal
8
Park J, You SC, Jeong E, Weng C, Park D, Roh J, Lee DY, Cheong JY, Choi JW, Kang M, Park RW. A Framework (SOCRATex) for Hierarchical Annotation of Unstructured Electronic Health Records and Integration Into a Standardized Medical Database: Development and Usability Study. JMIR Med Inform 2021; 9:e23983. [PMID: 33783361 PMCID: PMC8044740 DOI: 10.2196/23983] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/31/2020] [Revised: 11/14/2020] [Accepted: 01/23/2021] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Although electronic health records (EHRs) have been widely used in secondary assessments, clinical documents are relatively less utilized owing to the lack of standardized clinical text frameworks across different institutions. OBJECTIVE This study aimed to develop a framework for processing unstructured clinical documents of EHRs and integration with standardized structured data. METHODS We developed a framework known as Staged Optimization of Curation, Regularization, and Annotation of clinical text (SOCRATex). SOCRATex has the following four aspects: (1) extracting clinical notes for the target population and preprocessing the data, (2) defining the annotation schema with a hierarchical structure, (3) performing document-level hierarchical annotation using the annotation schema, and (4) indexing annotations for a search engine system. To test the usability of the proposed framework, proof-of-concept studies were performed on EHRs. We defined three distinctive patient groups and extracted their clinical documents (ie, pathology reports, radiology reports, and admission notes). The documents were annotated and integrated into the Observational Medical Outcomes Partnership (OMOP)-common data model (CDM) database. The annotations were used for creating Cox proportional hazard models with different settings of clinical analyses to measure (1) all-cause mortality, (2) thyroid cancer recurrence, and (3) 30-day hospital readmission. RESULTS Overall, 1055 clinical documents of 953 patients were extracted and annotated using the defined annotation schemas. The generated annotations were indexed into an unstructured textual data repository. Using the annotations of pathology reports, we identified that node metastasis and lymphovascular tumor invasion were associated with all-cause mortality among colon and rectum cancer patients (both P=.02). 
The other analyses involving measuring thyroid cancer recurrence using radiology reports and 30-day hospital readmission using admission notes in depressive disorder patients also showed results consistent with previous findings. CONCLUSIONS We propose a framework for hierarchical annotation of textual data and integration into a standardized OMOP-CDM medical database. The proof-of-concept studies demonstrated that our framework can effectively process and integrate diverse clinical documents with standardized structured data for clinical research.
Affiliation(s)
- Jimyung Park
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Republic of Korea
- Seng Chan You
  - Department of Preventive Medicine and Public Health, Yonsei University College of Medicine, Seoul, Republic of Korea
- Eugene Jeong
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, United States
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York, NY, United States
- Dongsu Park
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea
- Jin Roh
  - Department of Pathology, Ajou University Hospital, Suwon, Republic of Korea
- Dong Yun Lee
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea
- Jae Youn Cheong
  - Department of Gastroenterology, Ajou University School of Medicine, Suwon, Republic of Korea
- Jin Wook Choi
  - Department of Radiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Mira Kang
  - Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology, Sungkyunkwan University, Seoul, Republic of Korea
- Rae Woong Park
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Republic of Korea
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea
9
Ryu B, Yoon E, Kim S, Lee S, Baek H, Yi S, Na HY, Kim JW, Baek RM, Hwang H, Yoo S. Transformation of Pathology Reports Into the Common Data Model With Oncology Module: Use Case for Colon Cancer. J Med Internet Res 2020; 22:e18526. [PMID: 33295294 PMCID: PMC7758167 DOI: 10.2196/18526] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/04/2020] [Revised: 05/20/2020] [Accepted: 11/11/2020] [Indexed: 01/15/2023] Open
Abstract
Background Common data models (CDMs) help standardize electronic health record data and facilitate outcome analysis for observational and longitudinal research. An analysis of pathology reports is required to establish fundamental information infrastructure for data-driven colon cancer research. The Observational Medical Outcomes Partnership (OMOP) CDM is used in distributed research networks for clinical data; however, it requires conversion of free text–based pathology reports into the CDM’s format. There are few use cases of representing cancer data in CDM. Objective In this study, we aimed to construct a CDM database of colon cancer–related pathology with natural language processing (NLP) for a research platform that can utilize both clinical and omics data. The essential text entities from the pathology reports are extracted, standardized, and converted to the OMOP CDM format in order to utilize the pathology data in cancer research. Methods We extracted clinical text entities, mapped them to the standard concepts in the Observational Health Data Sciences and Informatics vocabularies, and built databases and defined relations for the CDM tables. Major clinical entities were extracted through NLP on pathology reports of surgical specimens, immunohistochemical studies, and molecular studies of colon cancer patients at a tertiary general hospital in South Korea. Items were extracted from each report using regular expressions in Python. Unstructured data, such as text that does not have a pattern, were handled with expert advice by adding regular expression rules. Our own dictionary was used for normalization and standardization to deal with biomarker and gene names and other ungrammatical expressions. The extracted clinical and genetic information was mapped to the Logical Observation Identifiers Names and Codes databases and the Systematized Nomenclature of Medicine (SNOMED) standard terminologies recommended by the OMOP CDM. 
The database-table relationships were newly defined through SNOMED standard terminology concepts. The standardized data were inserted into the CDM tables. For evaluation, 100 reports were randomly selected and independently annotated by a medical informatics expert and a nurse. Results We examined and standardized 1848 immunohistochemical study reports, 3890 molecular study reports, and 12,352 pathology reports of surgical specimens (from 2017 to 2018). The constructed and updated database contained the following extracted colorectal entities: (1) NOTE_NLP, (2) MEASUREMENT, (3) CONDITION_OCCURRENCE, (4) SPECIMEN, and (5) FACT_RELATIONSHIP of specimen with condition and measurement. Conclusions This study aimed to prepare CDM data for a research platform to take advantage of all omics clinical and patient data at Seoul National University Bundang Hospital for colon cancer pathology. A more sophisticated preparation of the pathology data is needed for further research on cancer genomics, and various types of text narratives are the next target for additional research on the use of data in the CDM.
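The abstract states that items were extracted from each report using regular expressions in Python. A minimal sketch of what such rule-based extraction might look like is given below. The report text, the patterns, and the output field names are hypothetical; the row layout is only loosely modelled on NOTE_NLP-style records (`note_id`, `lexical_variant`, `offset`) and is not the study's actual schema or rule set.

```python
import re

# Hypothetical report snippet; the real report layout is not given in the abstract.
REPORT = """Histologic type: Adenocarcinoma
Tumor size: 4.5 x 3.2 cm
Lymphovascular invasion: present
KRAS mutation: not detected"""

# Hypothetical extraction rules, one compiled pattern per item of interest.
PATTERNS = {
    "histologic_type": re.compile(r"Histologic type:\s*(.+)"),
    "tumor_size_cm": re.compile(r"Tumor size:\s*([\d.]+)\s*x\s*([\d.]+)\s*cm"),
    "lymphovascular_invasion": re.compile(
        r"Lymphovascular invasion:\s*(present|absent)", re.IGNORECASE
    ),
    "kras_mutation": re.compile(
        r"KRAS mutation:\s*(detected|not detected)", re.IGNORECASE
    ),
}


def extract_items(text, note_id=1):
    """Apply every rule to the report and return one record per matched item."""
    rows = []
    for item, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            continue  # no rule fired; such text would need a new rule
        rows.append(
            {
                "note_id": note_id,
                "item": item,
                "lexical_variant": match.group(0),  # matched text as written
                "offset": match.start(),            # character offset in the note
                "values": match.groups(),           # captured value(s)
            }
        )
    return rows


rows = extract_items(REPORT)
```

Text that no pattern matches is simply skipped here; in the workflow the abstract describes, such cases were handled by adding further regular expression rules with expert advice.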
Affiliation(s)
- Borim Ryu
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Eunsil Yoon
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Seok Kim
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Sejoon Lee
  - Department of Pathology and Translational Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Hyunyoung Baek
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Soyoung Yi
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Hee Young Na
  - Department of Pathology and Translational Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Ji-Won Kim
  - Division of Hematology and Medical Oncology, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Rong-Min Baek
  - Department of Plastic Surgery, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Hee Hwang
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Sooyoung Yoo
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea