1
Stellmach C, Hopff SM, Jaenisch T, Nunes de Miranda SM, Rinaldi E. Creation of Standardized Common Data Elements for Diagnostic Tests in Infectious Disease Studies: Semantic and Syntactic Mapping. J Med Internet Res 2024; 26:e50049. [PMID: 38857066 PMCID: PMC11196918 DOI: 10.2196/50049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 10/10/2023] [Accepted: 01/18/2024] [Indexed: 06/11/2024] Open
Abstract
BACKGROUND It is necessary to harmonize and standardize data variables used in case report forms (CRFs) of clinical studies to facilitate the merging and sharing of the collected patient data across several clinical studies. This is particularly true for clinical studies that focus on infectious diseases. Public health may be highly dependent on the findings of such studies. Hence, there is an elevated urgency to generate meaningful, reliable insights, ideally based on a high sample number and quality data. The implementation of core data elements and the incorporation of interoperability standards can facilitate the creation of harmonized clinical data sets. OBJECTIVE This study's objective was to compare, harmonize, and standardize variables focused on diagnostic tests used as part of CRFs in 6 international clinical studies of infectious diseases and, ultimately, to make the panstudy common data elements (CDEs) available for ongoing and future studies to foster interoperability and comparability of collected data across trials. METHODS We reviewed and compared the metadata that comprised the CRFs used for data collection in and across all 6 infectious disease studies under consideration in order to identify CDEs. We examined the availability of international semantic standard codes within the Systematized Nomenclature of Medicine - Clinical Terms, the National Cancer Institute Thesaurus, and the Logical Observation Identifiers Names and Codes system for the unambiguous representation of diagnostic testing information that makes up the CDEs. We then proposed 2 data models that incorporate semantic and syntactic standards for the identified CDEs.
RESULTS Of 216 variables that were considered in the scope of the analysis, we identified 11 CDEs to describe diagnostic tests (in particular, serology and sequencing) for infectious diseases: viral lineage/clade; test date, type, performer, and manufacturer; target gene; quantitative and qualitative results; and specimen identifier, type, and collection date. CONCLUSIONS The identification of CDEs for infectious diseases is the first step in facilitating the exchange and possible merging of a subset of data across clinical studies (and with that, large research projects) for possible shared analysis to increase the power of findings. The path to harmonization and standardization of clinical study data in the interest of interoperability can be paved in 2 ways. First, a map to standard terminologies ensures that each data element's (variable's) definition is unambiguous and that it has a single, unique interpretation across studies. Second, the exchange of these data is assisted by "wrapping" them in a standard exchange format, such as Fast Healthcare Interoperability Resources or the Clinical Data Interchange Standards Consortium's Clinical Data Acquisition Standards Harmonization Model.
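As an illustration only, the 11 CDEs listed in the RESULTS of this abstract could be held in a single record type. The Python sketch below is an assumption, not either of the data models proposed by the authors: the class name `DiagnosticTestRecord`, the helper `missing_elements`, the field names, and the field types are all hypothetical, and no terminology codes or exchange-format bindings are shown.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional


@dataclass
class DiagnosticTestRecord:
    """Hypothetical container with one field per CDE named in the abstract."""
    viral_lineage_clade: Optional[str] = None
    test_date: Optional[date] = None
    test_type: Optional[str] = None
    test_performer: Optional[str] = None
    test_manufacturer: Optional[str] = None
    target_gene: Optional[str] = None
    quantitative_result: Optional[float] = None
    qualitative_result: Optional[str] = None
    specimen_identifier: Optional[str] = None
    specimen_type: Optional[str] = None
    specimen_collection_date: Optional[date] = None


def missing_elements(record: DiagnosticTestRecord) -> List[str]:
    """Return the names of the CDEs that have no value in this record."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Illustrative values only; they are not taken from any of the studies.
record = DiagnosticTestRecord(
    test_date=date(2023, 1, 5),
    test_type="serology",
    qualitative_result="positive",
)
cde_names = [f.name for f in fields(DiagnosticTestRecord)]
unfilled = missing_elements(record)
```

Listing the unfilled elements per record is one simple way to compare how completely different studies cover the shared CDEs.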
Affiliation(s)
- Caroline Stellmach
- Berlin Institute of Health, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Sina Marie Hopff
- Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf, Department I of Internal Medicine, University Hospital Cologne and Faculty of Medicine, University of Cologne, Cologne, Germany
- Thomas Jaenisch
- Heidelberg Institut für Global Health, Universitätsklinikum Heidelberg, Heidelberg, Germany
- Susana Marina Nunes de Miranda
- Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf, Department I of Internal Medicine, University Hospital Cologne and Faculty of Medicine, University of Cologne, Cologne, Germany
- Eugenia Rinaldi
- Berlin Institute of Health, Charité - Universitätsmedizin Berlin, Berlin, Germany
2
Greulich L, Hegselmann S, Dugas M. An Open-Source, Standard-Compliant, and Mobile Electronic Data Capture System for Medical Research (OpenEDC): Design and Evaluation Study. JMIR Med Inform 2021; 9:e29176. [PMID: 34806987 PMCID: PMC8663450 DOI: 10.2196/29176] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/29/2021] [Revised: 07/13/2021] [Accepted: 09/28/2021] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Medical research and machine learning for health care depend on high-quality data. Electronic data capture (EDC) systems have been widely adopted for metadata-driven digital data collection. However, many systems use proprietary and incompatible formats that inhibit clinical data exchange and metadata reuse. In addition, the configuration and financial requirements of typical EDC systems frequently prevent small-scale studies from benefiting from their inherent advantages. OBJECTIVE The aim of this study is to develop and publish an open-source EDC system that addresses these issues. We aim to plan a system that is applicable to a wide range of research projects. METHODS We conducted a literature-based requirements analysis to identify the academic and regulatory demands for digital data collection. After designing and implementing OpenEDC, we performed a usability evaluation to obtain feedback from users. RESULTS We identified 20 frequently stated requirements for EDC. According to the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) 25010 norm, we categorized the requirements into functional suitability, availability, compatibility, usability, and security. We developed OpenEDC based on the regulatory-compliant Clinical Data Interchange Standards Consortium Operational Data Model (CDISC ODM) standard. Mobile device support enables the collection of patient-reported outcomes. OpenEDC is publicly available and released under the MIT open-source license. CONCLUSIONS Adopting an established standard without modifications supports metadata reuse and clinical data exchange, but it limits item layouts. OpenEDC is a stand-alone web app that can be used without a setup or configuration. This should foster compatibility between medical research and open science. OpenEDC is targeted at observational and translational research studies by clinicians.
Affiliation(s)
- Leonard Greulich
- Institute of Medical Informatics, University of Münster, Münster, Germany
- Stefan Hegselmann
- Institute of Medical Informatics, University of Münster, Münster, Germany
- Martin Dugas
- Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
3
Kim HH, Park YR, Lee KH, Song YS, Kim JH. Clinical MetaData ontology: a simple classification scheme for data elements of clinical data based on semantics. BMC Med Inform Decis Mak 2019; 19:166. [PMID: 31429750 PMCID: PMC6701018 DOI: 10.1186/s12911-019-0877-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/17/2018] [Accepted: 07/24/2019] [Indexed: 11/26/2022] Open
Abstract
Background The increasing use of common data elements (CDEs) in numerous research projects and clinical applications has made it imperative to create an effective classification scheme for the efficient management of these data elements. We applied high-level integrative modeling of entire clinical documents from real-world practice to create the Clinical MetaData Ontology (CMDO) for the appropriate classification and integration of CDEs that are in practical use in current clinical documents. Methods CMDO was developed using the General Formal Ontology method with a manual iterative process comprising five steps: (1) defining the scope of CMDO by conceptualizing its first-level terms based on an analysis of clinical-practice procedures, (2) identifying CMDO concepts for representing clinical data of general CDEs by examining how and what clinical data are generated with flows of clinical care practices, (3) assigning hierarchical relationships for CMDO concepts, (4) developing CMDO properties (e.g., synonyms, preferred terms, and definitions) for each CMDO concept, and (5) evaluating the utility of CMDO. Results We created CMDO comprising 189 concepts under the 4 first-level classes of Description, Event, Finding, and Procedure. CMDO has 256 definitions that cover the 189 CMDO concepts, with 459 synonyms for 139 (74.0%) of the concepts. All of the CDEs extracted from 6 HL7 templates, 25 clinical documents of 5 teaching hospitals, and 1 personal health record specification were successfully annotated by 41 (21.9%), 89 (47.6%), and 13 (7.0%) of the CMDO concepts, respectively. We created a CMDO Browser to facilitate navigation of the CMDO concept hierarchy and a CMDO-enabled CDE Browser for displaying the relationships between CMDO concepts and the CDEs extracted from the clinical documents that are used in current practice. Conclusions CMDO is an ontology and classification scheme for CDEs used in clinical documents. 
Given the increasing use of CDEs in many studies and real-world clinical documentation, CMDO will be a useful tool for integrating numerous CDEs from different research projects and clinical documents. The CMDO Browser and CMDO-enabled CDE Browser make it easy to search, share, and reuse CDEs, and also effectively integrate and manage CDEs from different studies and clinical documents.
Affiliation(s)
- Hye Hyeon Kim
- Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul, 03080, Republic of Korea
- Seoul National University Hospital Biomedical Research Institute, Seoul National University Hospital, Seoul, 03080, Republic of Korea
- Yu Rang Park
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, 03722, Republic of Korea
- Kye Hwa Lee
- Precision Medicine Center, Seoul National University Hospital, Seoul, 03080, Republic of Korea
- Young Soo Song
- Department of Pathology, Hanyang University College of Medicine, Seoul, 04763, Republic of Korea
- Ju Han Kim
- Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul, 03080, Republic of Korea
- Division of Biomedical Informatics, Seoul National University College of Medicine, 103 Daehak-ro Jongno-gu, Seoul, 03080, Republic of Korea
4
Shaheen NA, Manezhi B, Thomas A, AlKelya M. Reducing defects in the datasets of clinical research studies: conformance with data quality metrics. BMC Med Res Methodol 2019; 19:98. [PMID: 31077148 PMCID: PMC6511206 DOI: 10.1186/s12874-019-0735-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/20/2019] [Accepted: 04/15/2019] [Indexed: 12/26/2022] Open
Abstract
Background A dataset is indispensable to answer the research questions of clinical research studies. Inaccurate data lead to ambiguous results, and the removal of errors results in increased cost. The aim of this Quality Improvement Project (QIP) was to improve the Data Quality (DQ) by enhancing conformance and minimizing data entry errors. Methods This QIP was conducted in the Department of Biostatistics using historical datasets submitted for statistical data analysis from the department’s knowledge base system. Forty-five datasets received for statistical data analysis were included at baseline. A 12-item checklist based on six DQ domains, (i) completeness, (ii) uniqueness, (iii) timeliness, (iv) accuracy, (v) validity, and (vi) consistency, was developed to assess the DQ. The checklist comprised 12 items: missing values, un-coded values, miscoded values, embedded values, implausible values, unformatted values, missing codebook, inconsistencies with the codebook, inaccurate format, unanalyzable data structure, missing outcome variables, and missing analytic variables. The outcome was the number of defects per dataset. The quality improvement DMAIC (Define, Measure, Analyze, Improve, Control) framework and sigma improvement tools were used. A pre-post design was implemented using mode of interventions. The pre-post change in defects (zero, one, two or more defects) was compared using the chi-square test. Results At baseline, out of forty-five datasets, six (13.3%) had zero defects, eight (17.8%) had one defect, and 31 (69%) had ≥2 defects. The association between the nature of data capture (single vs. multiple data points) and defective data was statistically significant (p = 0.008). Twenty-one datasets were received during post-intervention for statistical data analysis. Seventeen (81%) had zero defects, two (9.5%) had one defect, and two (9.5%) had two or more defects.
The proportion of datasets with zero defects increased from 13.3 to 81%, whereas the proportion of datasets with two or more defects decreased from 69 to 9.5% (p < 0.001). Conclusion Clinical research study teams often have limited knowledge of data structuring. Given the need for good quality data, we recommend training programs, consultation with data experts prior to data structuring, and use of electronic data capturing methods.
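The pre-post comparison reported in this abstract can be re-derived from the counts it gives (baseline: 6, 8, and 31 datasets; post-intervention: 17, 2, and 2). The Python sketch below computes a Pearson chi-square statistic for that 2 × 3 table using only the standard library. It is an assumption that the authors applied the uncorrected Pearson test to exactly this table, and the function name `pearson_chi_square` is hypothetical. With 2 degrees of freedom, the chi-square survival function reduces to exp(-x/2), which is used here for the p-value.

```python
import math
from typing import List, Tuple


def pearson_chi_square(table: List[List[int]]) -> Tuple[float, int]:
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of period and defect category.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Columns: zero defects, one defect, two or more defects (counts from the abstract).
counts = [
    [6, 8, 31],   # baseline, 45 datasets
    [17, 2, 2],   # post-intervention, 21 datasets
]
chi2, dof = pearson_chi_square(counts)

# For exactly 2 degrees of freedom, P(X > x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2) if dof == 2 else float("nan")
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.2e}")
```

One expected cell count in this table is below 5 (post-intervention, one defect: about 3.2), so an exact test may be preferable in practice; the sketch only shows that the reported p < 0.001 is consistent with the published counts.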
Affiliation(s)
- Naila A Shaheen
- Department of Biostatistics and Bioinformatics, King Abdullah International Medical Research Center, P.O. Box 22490, Mail Code 1515, Riyadh, 11426, Kingdom of Saudi Arabia
- King Saud bin Abdulaziz University for Health Sciences, Riyadh, Kingdom of Saudi Arabia
- Ministry of National Guard-Health Affairs, Riyadh, Kingdom of Saudi Arabia
- Bipin Manezhi
- Public Health Division, Central Australian Aboriginal Congress, Alice Springs, Australia
- Abin Thomas
- Department of Biostatistics and Bioinformatics, King Abdullah International Medical Research Center, P.O. Box 22490, Mail Code 1515, Riyadh, 11426, Kingdom of Saudi Arabia
- King Saud bin Abdulaziz University for Health Sciences, Riyadh, Kingdom of Saudi Arabia
- Ministry of National Guard-Health Affairs, Riyadh, Kingdom of Saudi Arabia
- Mohammed AlKelya
- Research Quality Management Section, King Abdullah International Medical Research Center, Riyadh, Kingdom of Saudi Arabia
- King Saud bin Abdulaziz University for Health Sciences, Riyadh, Kingdom of Saudi Arabia
- Ministry of National Guard-Health Affairs, Riyadh, Kingdom of Saudi Arabia
- Center for Health Research Studies, Saudi Health Council, Riyadh, Kingdom of Saudi Arabia
5
Rahbar MH, Lee M, Hessabi M, Tahanan A, Brown MA, Learch TJ, Diekman LA, Weisman MH, Reveille JD. Harmonization, data management, and statistical issues related to prospective multicenter studies in Ankylosing spondylitis (AS): Experience from the Prospective Study Of Ankylosing Spondylitis (PSOAS) cohort. Contemp Clin Trials Commun 2018; 11:127-135. [PMID: 30094388 PMCID: PMC6071581 DOI: 10.1016/j.conctc.2018.07.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 02/22/2018] [Revised: 07/10/2018] [Accepted: 07/24/2018] [Indexed: 01/13/2023] Open
Abstract
Ankylosing spondylitis (AS) is characterized by inflammation of the spine and sacroiliac joints causing pain and stiffness and, in some patients, ultimately new bone formation and progressive joint ankylosis. The classical definition of AS is based on the modified New York (mNY) criteria. Limited data have been reported regarding data quality assurance procedures for multicenter or multisite prospective cohorts of patients with AS. Since 2002, 1272 qualified AS patients have been enrolled from five sites (4 US sites and 1 Australian site) in the Prospective Study Of Ankylosing Spondylitis (PSOAS). In 2012, a Data Management and Statistical Core (DMSC) was added to the PSOAS team to assist in study design, establish a systematic approach to data management and data quality, and develop and apply appropriate statistical analysis of data. With assistance from the PSOAS investigators, the DMSC modified Case Report Forms and developed a database in Research Electronic Data Capture (REDCap). The DMSC also developed additional data quality assurance procedures. The error rate for various forms in PSOAS databases ranged from 0.07% for medications data to 1.1% for arthritis activity questionnaire-Global pain. Furthermore, based on data from a substudy of 48 patients with AS, we showed a strong level (90.0%) of agreement between the two readers of X-rays with respect to the modified Stoke Ankylosing Spondylitis Spine Score (mSASSS). This paper could serve not only as a reference for future publications from the PSOAS cohort but also as a basic guide to ensuring data quality for multicenter clinical studies.
Affiliation(s)
- Mohammad H. Rahbar
- Department of Epidemiology, Human Genetics, and Environmental Sciences (EHGES), University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Division of Clinical and Translational Sciences, Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Biostatistics/Epidemiology/Research Design (BERD) Component, Center for Clinical and Translational Sciences (CCTS), University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- MinJae Lee
- Division of Clinical and Translational Sciences, Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Biostatistics/Epidemiology/Research Design (BERD) Component, Center for Clinical and Translational Sciences (CCTS), University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Manouchehr Hessabi
- Biostatistics/Epidemiology/Research Design (BERD) Component, Center for Clinical and Translational Sciences (CCTS), University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Amirali Tahanan
- Biostatistics/Epidemiology/Research Design (BERD) Component, Center for Clinical and Translational Sciences (CCTS), University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Matthew A. Brown
- Institute of Health and Biomedical Innovation, Queensland University of Technology, Translational Research Institute, Princess Alexandra Hospital, Brisbane, Queensland, Australia
- Thomas J. Learch
- Division of Rheumatology, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Laura A. Diekman
- Division of Rheumatology, Department of Internal Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Michael H. Weisman
- Division of Rheumatology, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- John D. Reveille
- Division of Rheumatology, Department of Internal Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
6
A standard-driven approach for electronic submission to pharmaceutical regulatory authorities. J Biomed Inform 2018; 79:60-70. [PMID: 29355783 DOI: 10.1016/j.jbi.2018.01.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/19/2017] [Revised: 10/31/2017] [Accepted: 01/15/2018] [Indexed: 11/20/2022]
Abstract
OBJECTIVE Using standards is useful not only for data interchange during a clinical trial but also for analyzing data in a review process. Any step that speeds up approval of new drugs may benefit patients. As a result, adopting standards for regulatory submission has become mandatory in some countries. However, preparing standard-compliant documents, such as the annotated case report form (aCRF), requires a great deal of knowledge and experience. The process is complex and labor-intensive. Therefore, there is a need to use information technology to facilitate this process. MATERIALS AND METHODS Instead of standardizing data after the completion of a clinical trial, this study proposed a standard-driven approach. This approach was achieved by implementing a computer-assisted "standard-driven pipeline (SDP)" in an existing clinical data management system. SDP used CDISC standards to drive all processes of a clinical trial, such as the design, data acquisition, tabulation, etc. RESULTS A completed phase I/II trial was used to prove the concept and to evaluate the effects of this approach. By using the CDISC-compliant question library, aCRFs were generated automatically when the eCRFs were completed. For comparison purposes, the data collection process was simulated and the collected data were transformed by the SDP. This new approach reduced the missing data fields from sixty-two to eight and the controlled term mismatch fields from eight to zero during data tabulation. CONCLUSION This standard-driven approach accelerated CRF annotation and assured data tabulation integrity. The benefits of this approach include an improvement in the use of standards during the clinical trial and a reduction in missing and unexpected data during tabulation. The standard-driven approach is an advanced design idea that can be used for future clinical information system development.
7
Park YR, Kim JJ, Yoon YJ, Yoon YK, Koo HY, Hong YM, Jang GY, Shin SY, Lee JK. Establishment of Kawasaki disease database based on metadata standard. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2017; 2016:baw109. [PMID: 27630202 PMCID: PMC4962667 DOI: 10.1093/database/baw109] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 02/28/2016] [Accepted: 06/29/2016] [Indexed: 12/17/2022]
Abstract
Kawasaki disease (KD) is a rare disease that occurs predominantly in infants and young children. To identify KD susceptibility genes and to develop a diagnostic test, a specific therapy, or prevention method, collecting KD patients’ clinical and genomic data is one of the major issues. For this purpose, the Kawasaki Disease Database (KDD) was developed based on the efforts of the Korean Kawasaki Disease Genetics Consortium (KKDGC). KDD is a collection of 1292 clinical data and genomic samples of 1283 patients from 13 KKDGC-participating hospitals. Each sample contains the relevant clinical data, genomic DNA and plasma samples isolated from patients’ blood, omics data and KD-associated genotype data. Clinical data were collected and saved using the common data elements based on the ISO/IEC 11179 metadata standard. Two genome-wide association study datasets with a total of 482 samples and whole exome sequencing data of 12 samples were also collected. In addition, KDD includes the rare cases of KD (16 cases with family history, 46 cases with recurrence, 119 cases with intravenous immunoglobulin non-responsiveness, and 52 cases with coronary artery aneurysm). As the first public database for KD, KDD can significantly facilitate KD studies. All data in KDD are searchable and downloadable. KDD was implemented in PHP, MySQL and Apache, with all major browsers supported. Database URL: http://www.kawasakidisease.kr
Affiliation(s)
- Yu Rang Park
- Clinical Research Center, Asan Institute of Life Sciences, Asan Medical Center, Seoul, Korea
- Department of Biomedical Informatics, Asan Medical Center, Seoul, Korea
- Jae-Jung Kim
- Asan Institute of Life Sciences, Asan Medical Center, Seoul, Korea
- Young Jo Yoon
- Clinical Research Center, Asan Institute of Life Sciences, Asan Medical Center, Seoul, Korea
- Young-Kwang Yoon
- Clinical Research Center, Asan Institute of Life Sciences, Asan Medical Center, Seoul, Korea
- Ha Yeong Koo
- Clinical Research Center, Asan Institute of Life Sciences, Asan Medical Center, Seoul, Korea
- Young Mi Hong
- Department of Pediatrics, Ewha Womans University Hospital, Seoul, Korea
- Gi Young Jang
- Department of Pediatrics, Korea University Hospital, Seoul, Korea
- Soo-Yong Shin
- Department of Biomedical Informatics, Asan Medical Center, Seoul, Korea
- Jong-Keuk Lee
- Asan Institute of Life Sciences, Asan Medical Center, Seoul, Korea
8
Kreimeyer K, Foster M, Pandey A, Arya N, Halford G, Jones SF, Forshee R, Walderhaug M, Botsis T. Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review. J Biomed Inform 2017; 73:14-29. [PMID: 28729030 DOI: 10.1016/j.jbi.2017.07.012] [Citation(s) in RCA: 290] [Impact Index Per Article: 41.4] [Received: 03/14/2017] [Revised: 06/07/2017] [Accepted: 07/14/2017] [Indexed: 12/24/2022]
Abstract
We followed a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses to identify existing clinical natural language processing (NLP) systems that generate structured information from unstructured free text. Seven literature databases were searched with a query combining the concepts of natural language processing and structured data capture. Two reviewers screened all records for relevance during two screening phases, and information about clinical NLP systems was collected from the final set of papers. A total of 7149 records (after removing duplicates) were retrieved and screened, and 86 were determined to fit the review criteria. These papers contained information about 71 different clinical NLP systems, which were then analyzed. The NLP systems address a wide variety of important clinical and research tasks. Certain tasks are well addressed by the existing systems, while others remain as open challenges that only a small number of systems attempt, such as extraction of temporal information or normalization of concepts to standard terminologies. This review has identified many NLP systems capable of processing clinical free text and generating structured output, and the information collected and evaluated here will be important for prioritizing development of new approaches for clinical NLP.
Affiliation(s)
- Kory Kreimeyer
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- Matthew Foster
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- Abhishek Pandey
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- Nina Arya
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- Gwendolyn Halford
- FDA Library, US Food and Drug Administration, Silver Spring, MD, United States
- Sandra F Jones
- Cancer Surveillance Branch, Division of Cancer Prevention and Control, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, United States
- Richard Forshee
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- Mark Walderhaug
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- Taxiarchis Botsis
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States