1
Lin AY, Arabandi S, Beale T, Duncan WD, Hicks A, Hogan WR, Jensen M, Koppel R, Martínez-Costa C, Nytrø Ø, Obeid JS, de Oliveira JP, Ruttenberg A, Seppälä S, Smith B, Soergel D, Zheng J, Schulz S. Improving the Quality and Utility of Electronic Health Record Data through Ontologies. STANDARDS (BASEL, SWITZERLAND) 2023; 3:316-340. [PMID: 37873508 PMCID: PMC10591519 DOI: 10.3390/standards3030023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
The translational research community in general, and the Clinical and Translational Science Awards (CTSA) community in particular, share the vision of repurposing electronic health records (EHRs) for research that will improve the quality of clinical practice. Many members of these communities are also aware that EHRs suffer from serious limitations: their data are often poorly structured, biased, and unusable outside their original context. This creates obstacles to continuity of care, data utility, quality improvement, and translational research. Analogous limitations to sharing objective data in other areas of the natural sciences have been successfully overcome by developing and using common ontologies. This White Paper presents the authors' rationale for using ontologies with computable semantics to improve clinical data quality and EHR usability. It is written for researchers who have a stake in clinical and translational science and who advocate the use of information technology in medicine, but who are also concerned by its current major shortfalls. The White Paper outlines pitfalls, opportunities, and solutions, and recommends increased investment in the research and development of ontologies with computable semantics for a new generation of EHRs.
Affiliation(s)
- Asiyah Yu Lin
- National Institutes of Health, Bethesda, MD 20892, USA
- William D. Duncan
- College of Dentistry, University of Florida, Gainesville, FL 32610, USA
- Amanda Hicks
- The Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723, USA
- William R. Hogan
- Data Science Institute, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Ross Koppel
- Department of Medical Informatics, Jacobs School of Medicine, University at Buffalo, Buffalo, NY 14260, USA
- Department of Medical Informatics, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Catalina Martínez-Costa
- Department of Informatics and Systems, Faculty of Computer Science, University of Murcia, 30100 Murcia, Spain
- Øystein Nytrø
- Department of Computer Science, UiT The Arctic University of Norway, 9037 Tromsø, Norway
- Department of Computer Science, Norwegian University of Science and Technology, 7491 Trondheim, Norway
- Jihad S. Obeid
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA
- Alan Ruttenberg
- School of Dental Medicine, University at Buffalo, Buffalo, NY 14260, USA
- Selja Seppälä
- Department of Business Information Systems, University College Cork, T12 K8AF Cork, Ireland
- Barry Smith
- Department of Philosophy, University at Buffalo, Buffalo, NY 14260, USA
- Dagobert Soergel
- Department of Philosophy, University at Buffalo, Buffalo, NY 14260, USA
- Jie Zheng
- Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI 48104, USA
- Stefan Schulz
- Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, 8036 Graz, Austria
- Averbis GmbH, Salzstrasse 15, 79098 Freiburg im Breisgau, Germany
2
Chang E, Mostafa J. The use of SNOMED CT, 2013-2020: a literature review. J Am Med Inform Assoc 2021; 28:2017-2026. [PMID: 34151978 DOI: 10.1093/jamia/ocab084] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/05/2021] [Revised: 03/30/2021] [Accepted: 04/26/2021] [Indexed: 11/12/2022]
Abstract
OBJECTIVE This article reviews recent literature on the use of SNOMED CT, extending Lee et al's 2014 review of the same topic. Lee et al's article covered literature published from 2001 to 2012; the scope of this review is 2013 to 2020. MATERIALS AND METHODS Following Lee et al's methods, we searched the PubMed and Embase databases and identified 1002 articles published between January 2013 and September 2020 for review. The retrieved articles were categorized and analyzed by SNOMED CT focus category (ie, indeterminate, theoretical, pre-development, implementation, and evaluation/commodity), usage category (eg, illustrating terminology systems theory, prospective content coverage, classifying or coding in a study, and retrieving or analyzing patient data), medical domain, and country. RESULTS After applying inclusion and exclusion criteria, 622 articles were selected for final review. Compared with papers published between 2001 and 2012, papers published between 2013 and 2020 showed more mature uses of SNOMED CT, and the number of papers in the "implementation" and "evaluation/commodity" focus categories grew. Analyzed by decade, papers in the "pre-development," "implementation," and "evaluation/commodity" categories were much more numerous in 2011-2020 than in 2001-2010, increasing from 169 to 293, 30 to 138, and 3 to 65, respectively. CONCLUSION Published papers in more mature usage categories have increased substantially since 2012. Since 2013, SNOMED CT has increasingly been implemented in practical settings. Future research should address whether SNOMED CT contributes to improvements in patient care.
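The decade comparison in the abstract can be made concrete with a little arithmetic. This minimal Python sketch uses only the three before/after counts reported above and computes the absolute increase and fold change per focus category; the dictionary `DECADE_COUNTS` and the helper `growth` are illustrative names, not part of the review.

```python
# Paper counts per SNOMED CT focus category, as reported in the abstract:
# (2001-2010, 2011-2020).
DECADE_COUNTS = {
    "pre-development": (169, 293),
    "implementation": (30, 138),
    "evaluation/commodity": (3, 65),
}


def growth(before: int, after: int) -> tuple[int, float]:
    """Return the absolute increase and the fold change between two counts."""
    return after - before, after / before


for category, (before, after) in DECADE_COUNTS.items():
    increase, fold = growth(before, after)
    print(f"{category:<21} {before:>3} -> {after:>3}  (+{increase}, x{fold:.1f})")
```

Expressed as fold change, the smallest category grew fastest: evaluation/commodity papers rose roughly 22-fold and implementation papers 4.6-fold, while pre-development papers grew by less than a factor of two.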
Affiliation(s)
- Eunsuk Chang
- Carolina Health Informatics Program, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Javed Mostafa
- Carolina Health Informatics Program, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
3
Joseph A, Mullett C, Lilly C, Armistead M, Cox HJ, Denney M, Varma M, Rich D, Adjeroh DA, Doretto G, Neal W, Pyles LA. Coronary Artery Disease Phenotype Detection in an Academic Hospital System Setting. Appl Clin Inform 2021; 12:10-16. [PMID: 33406541 DOI: 10.1055/s-0040-1721012] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/05/2023]
Abstract
BACKGROUND The United States, and West Virginia in particular, bear a heavy burden of coronary artery disease (CAD). Undiagnosed familial hypercholesterolemia (FH) is an important contributor to CAD in the U.S., and identifying a CAD phenotype is an initial step toward finding families with FH. OBJECTIVE We hypothesized that a CAD phenotype detection algorithm that uses discrete data elements from electronic health records (EHRs) can be validated against EHR information housed in a data repository. METHODS We developed an algorithm to detect a CAD phenotype by searching discrete data elements such as diagnoses, problem lists, medical history, billing, and procedure codes (International Classification of Diseases [ICD]-9/10 and Current Procedural Terminology [CPT]). The algorithm was applied to two cohorts of 500 patients with differing characteristics; the second (younger) cohort consisted of parents from a school child screening program. Next, we determined which patients had CAD by systematic, blinded review of their EHRs. We then revised the algorithm by refining the acceptable diagnoses and procedures, ran the revised algorithm on the same cohorts, and determined the accuracy of the modification. RESULTS CAD phenotype Algorithm I was 89.6% accurate, 94.6% sensitive, and 85.6% specific for group 1. The revised algorithm (CAD Algorithm II), applied to the same groups 1 and 2, achieved 98.2% sensitivity, 87.8% specificity, and 92.4% accuracy for group 1, and 93% accuracy for group 2. The group 1 F1 score was 92.4%. Specific ICD-10 and CPT codes, such as "coronary angiography through a vein graft," were more useful than generic terms. CONCLUSION We have created an algorithm, CAD Algorithm II, that detects CAD on a large scale with high accuracy and sensitivity (recall). It has proven useful across varied patient populations. The algorithm could be extended to monitoring a registry of patients in an EHR and/or identifying a group such as patients with likely FH.
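The abstract describes a rule-based phenotype (flag a patient when a discrete data element carries an accepted code) that is then scored against blinded chart review using sensitivity, specificity, accuracy, and F1. The Python sketch below illustrates both steps under stated assumptions: `ACCEPTED_PREFIXES` holds only two ICD-10 category prefixes (I21, acute myocardial infarction; I25, chronic ischaemic heart disease) as placeholders for the authors' refined diagnosis and procedure lists, which are not given here, and the `Patient` record and confusion counts are hypothetical.

```python
from dataclasses import dataclass, field

# Placeholder accepted-code prefixes: ICD-10 I21 (acute myocardial infarction)
# and I25 (chronic ischaemic heart disease). NOT the authors' validated list;
# CPT procedure codes are omitted for brevity.
ACCEPTED_PREFIXES = ("I21", "I25")


@dataclass
class Patient:
    """Hypothetical record pooling codes from diagnoses, problem list, history, billing."""
    patient_id: str
    codes: list[str] = field(default_factory=list)


def has_cad_phenotype(patient: Patient) -> bool:
    """Flag the CAD phenotype if any discrete code starts with an accepted prefix."""
    return any(code.startswith(ACCEPTED_PREFIXES) for code in patient.codes)


def evaluate(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compare algorithm output with blinded chart review (the reference standard)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall is the same quantity as sensitivity
    return {
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical usage; the counts are invented, not the study's data.
print(has_cad_phenotype(Patient("p1", ["E78.0", "I25.10"])))  # True
print(evaluate(tp=90, fp=20, tn=80, fn=10))
```

With these invented counts the sketch yields sensitivity 0.9, specificity 0.8, accuracy 0.85, and F1 of about 0.857. The abstract's finding that specific codes outperformed generic terms would correspond, in this toy model, to replacing broad category prefixes with narrower, fully specified codes.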
Affiliation(s)
- Amy Joseph
- Department of Pediatrics, School of Medicine, West Virginia University, Morgantown, West Virginia, United States
- Charles Mullett
- Department of Pediatrics, School of Medicine, West Virginia University, Morgantown, West Virginia, United States
- West Virginia Clinical and Translational Science Institute, West Virginia University, Morgantown, West Virginia, United States
- Christa Lilly
- Department of Biostatistics, School of Public Health, West Virginia University, Morgantown, West Virginia, United States
- Matthew Armistead
- West Virginia Clinical and Translational Science Institute, West Virginia University, Morgantown, West Virginia, United States
- Harold J Cox
- West Virginia Clinical and Translational Science Institute, West Virginia University, Morgantown, West Virginia, United States
- Michael Denney
- West Virginia Clinical and Translational Science Institute, West Virginia University, Morgantown, West Virginia, United States
- Misha Varma
- Department of Pediatrics, School of Medicine, West Virginia University, Morgantown, West Virginia, United States
- David Rich
- West Virginia University Hospital System, Morgantown, West Virginia, United States
- Donald A Adjeroh
- Lane Department of Computer Science and Electrical Engineering, Benjamin M. Statler College of Engineering and Mineral Resources, West Virginia University, Morgantown, West Virginia, United States
- Gianfranco Doretto
- Lane Department of Computer Science and Electrical Engineering, Benjamin M. Statler College of Engineering and Mineral Resources, West Virginia University, Morgantown, West Virginia, United States
- William Neal
- Department of Pediatrics, School of Medicine, West Virginia University, Morgantown, West Virginia, United States
- Lee A Pyles
- Department of Pediatrics, School of Medicine, West Virginia University, Morgantown, West Virginia, United States
4
Zhou L, Cheng C, Ou D, Huang H. Construction of a semi-automatic ICD-10 coding system. BMC Med Inform Decis Mak 2020; 20:67. [PMID: 32293423 PMCID: PMC7157985 DOI: 10.1186/s12911-020-1085-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/18/2019] [Accepted: 03/30/2020] [Indexed: 01/29/2023]
Abstract
Background The International Classification of Diseases, 10th Revision (ICD-10) is widely used to record patients' diagnoses. Automatic ICD-10 coding is important because assigning codes manually is expensive, time-consuming, and error-prone. Although numerous approaches to automatic coding have been developed, few have been applied in practice. Our aim was to build a practical automatic ICD-10 coding system that improves coding efficiency and quality in daily work. Methods We propose using regular expressions (regexps) to establish correspondences between diagnosis codes and diagnosis descriptions in outpatient settings and at admission and discharge. The regexp description models were embedded in our upgraded coding system, which queries a diagnosis description and assigns a unique diagnosis code. As in most studies, precision (P), recall (R), F-measure (F), and overall accuracy (A) were used to evaluate system performance. The study had two stages. The datasets were obtained from the diagnosis information on the front page of the discharge medical record; the testing sets covered October 1, 2017 to April 30, 2018 and July 1, 2018 to January 31, 2019. Results P was 89.27% in the first testing phase and 88.38% in the second, demonstrating high precision. The automatic ICD-10 coding system assigned more than 160,000 codes in 16 months, reducing the coders' workload. In addition, a comparison of the time needed for manual and automatic coding showed the system's effectiveness: automatic coding took nearly 100 times less time than manual coding. Conclusions Our automatic coding system is well suited to the coding task. Further studies are warranted to refine the regexp description models and to develop integrated approaches that improve system performance.
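The core mechanism described in Methods, a regexp "description model" per code that maps a diagnosis description to a unique code, can be sketched in a few lines of Python. The patterns in `DESCRIPTION_MODELS`, the helper names, and the gold-standard pairs are hypothetical (the authors' models worked on their own records and are not reproduced here). Two further assumptions are labeled: a description matching zero or several models gets no code and is left to a human coder, which keeps the system semi-automatic; and precision is computed over automatically coded cases while recall is computed over all cases.

```python
import re
from typing import Optional

# Hypothetical description models: one regular expression per ICD-10 code,
# matched against the whole (whitespace-normalised) diagnosis description.
# Illustrative English patterns, not the authors' models.
DESCRIPTION_MODELS = {
    "I10": re.compile(r"(essential |primary )?hypertension", re.IGNORECASE),
    "E11.9": re.compile(r"type (2|ii) diabetes( mellitus)?", re.IGNORECASE),
    "J18.9": re.compile(r"pneumonia(, unspecified)?", re.IGNORECASE),
}


def assign_code(description: str) -> Optional[str]:
    """Return the unique matching code, or None so a human coder handles the case."""
    text = " ".join(description.split())
    hits = [code for code, rx in DESCRIPTION_MODELS.items() if rx.fullmatch(text)]
    return hits[0] if len(hits) == 1 else None


def precision_recall(cases: list[tuple[str, str]]) -> tuple[float, float]:
    """Precision over automatically coded cases; recall over all cases."""
    predictions = [(assign_code(desc), gold) for desc, gold in cases]
    coded = [(pred, gold) for pred, gold in predictions if pred is not None]
    correct = sum(pred == gold for pred, gold in coded)
    precision = correct / len(coded) if coded else 0.0
    recall = correct / len(cases) if cases else 0.0
    return precision, recall


# Hypothetical gold-standard pairs (description, code assigned by a coder).
gold = [
    ("Essential hypertension", "I10"),
    ("Type 2 diabetes mellitus", "E11.9"),
    ("Pneumonia", "J18.9"),
    ("Acute appendicitis", "K35"),  # no model matches: left for manual coding
]
print(precision_recall(gold))  # (1.0, 0.75)
```

Using `fullmatch` rather than `search` is a deliberately conservative choice in this sketch: a longer description such as "hypertensive heart disease" does not accidentally match the hypertension model, which favors precision over recall.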
Affiliation(s)
- Lingling Zhou
- Department of Information, Daping Hospital of Army Medical University, 10 Changjiang Access Road, Chongqing, 400042, China
- Cheng Cheng
- Department of Information, Daping Hospital of Army Medical University, 10 Changjiang Access Road, Chongqing, 400042, China
- Dong Ou
- Department of Information, Daping Hospital of Army Medical University, 10 Changjiang Access Road, Chongqing, 400042, China
- Hao Huang
- Department of Information, Daping Hospital of Army Medical University, 10 Changjiang Access Road, Chongqing, 400042, China
5
Bousquet C, Souvignet J, Sadou É, Jaulent MC, Declerck G. Ontological and Non-Ontological Resources for Associating Medical Dictionary for Regulatory Activities Terms to SNOMED Clinical Terms With Semantic Properties. Front Pharmacol 2019; 10:975. [PMID: 31551780 PMCID: PMC6747929 DOI: 10.3389/fphar.2019.00975] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/18/2018] [Accepted: 07/31/2019] [Indexed: 11/20/2022]
Abstract
Background: Formal definitions allow terms to be selected (e.g., identifying all terms related to “Infectious disease” using the query “has causative agent organism”) and support terminological reasoning (e.g., “hepatitis B” is a “hepatitis” and is an “infectious disease”). However, the standard international terminology used to code adverse drug reactions in pharmacovigilance databases, the Medical Dictionary for Regulatory Activities (MedDRA), does not benefit from such formal definitions. Our objective was to evaluate the potential for reusing ontological and non-ontological resources to generate such definitions for MedDRA. Methods: We developed several methods that together allow semiautomatic semantic enrichment of MedDRA: 1) using MedDRA-to-SNOMED Clinical Terms (SNOMED CT) mappings (available in the Unified Medical Language System metathesaurus or in other mapping resources; e.g., the MedDRA preferred term “hepatitis B” is associated with the SNOMED CT concept “type B viral hepatitis”) to extract term definitions (e.g., “hepatitis B” is associated with the properties has finding site liver structure, has associated morphology inflammation morphology, and has causative agent hepatitis B virus); and 2) using MedDRA labels and lexical/syntactic methods to automatically decompose complex MedDRA terms (e.g., the MedDRA system organ class “blood and lymphatic system disorders” is decomposed into blood system disorders and lymphatic system disorders) or to suggest properties automatically (e.g., the string “cyclic” in the preferred term “cyclic neutropenia” leads to the property has clinical course cyclic). Results: The Unified Medical Language System metathesaurus was the main reusable ontological resource for generating formal definitions of MedDRA terms. The non-ontological resources (a further mapping resource provided by Nadkarni and Darer in 2010, and MedDRA labels) allowed only a few additional preferred terms to be defined.
While the Ci4SeR tool helped the curator define 1,935 terms by suggesting potential supplemental relations based on the semantic definitions of parent and sibling terms, manually defining all MedDRA terms remains time-consuming. Discussion: Several ontological and non-ontological resources are available for associating MedDRA terms with SNOMED CT concepts and their semantic properties, but manual definitions are still necessary. The Ontology of Adverse Events is a possible alternative but does not cover all MedDRA terms either. Future work will aim to implement more efficient techniques for finding logical relations between SNOMED CT and MedDRA automatically.
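The two uses of formal definitions named at the start of this abstract (selecting terms by a property query and inferring subsumption), together with the lexical decomposition of coordinated labels, can be sketched over a toy knowledge base. The Python below encodes the abstract's own examples as plain dictionaries: the properties of “hepatitis B”, “hepatitis B” as both a “hepatitis” and an “infectious disease”, “cyclic neutropenia” with has clinical course cyclic, and the split of “blood and lymphatic system disorders”. The remaining hierarchy entries (for example “hepatitis B virus” is-a “virus” is-a “organism”) and all function names are illustrative assumptions. Real SNOMED CT reasoning relies on a description logic classifier, not on this breadth-first lookup.

```python
import re
from collections import deque

# Toy is-a hierarchy (child -> parents). Only "hepatitis B" -> "hepatitis" /
# "infectious disease" comes from the abstract; the rest is illustrative.
IS_A = {
    "hepatitis B": {"hepatitis", "infectious disease"},
    "hepatitis": {"disease"},
    "infectious disease": {"disease"},
    "hepatitis B virus": {"virus"},
    "virus": {"organism"},
}

# Formal definitions as (attribute, value) pairs, taken from the abstract's examples.
DEFINITIONS = {
    "hepatitis B": {
        ("has finding site", "liver structure"),
        ("has associated morphology", "inflammation morphology"),
        ("has causative agent", "hepatitis B virus"),
    },
    "cyclic neutropenia": {("has clinical course", "cyclic")},
}


def ancestors(concept: str) -> set[str]:
    """All transitive is-a ancestors of a concept, found breadth-first."""
    seen: set[str] = set()
    queue = deque([concept])
    while queue:
        for parent in IS_A.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def subsumed_by(concept: str, ancestor: str) -> bool:
    """True if `concept` is `ancestor` or one of its descendants."""
    return concept == ancestor or ancestor in ancestors(concept)


def select(attribute: str, value: str) -> list[str]:
    """Terms having `attribute` with a value subsumed by `value`."""
    return sorted(
        term
        for term, props in DEFINITIONS.items()
        if any(a == attribute and subsumed_by(v, value) for a, v in props)
    )


def decompose(label: str) -> list[str]:
    """Split a coordinated label 'X and Y <head>' into 'X <head>' and 'Y <head>'."""
    match = re.fullmatch(r"(\w+) and (\w+) (.+)", label)
    if match is None:
        return [label]
    first, second, head = match.groups()
    return [f"{first} {head}", f"{second} {head}"]


print(select("has causative agent", "organism"))  # ['hepatitis B']
print(subsumed_by("hepatitis B", "infectious disease"))  # True
print(decompose("blood and lymphatic system disorders"))
# ['blood system disorders', 'lymphatic system disorders']
```

The query `select("has causative agent", "organism")` returns “hepatitis B” only because the agent “hepatitis B virus” is inferred to be an organism through the hierarchy. This is the benefit the abstract attributes to formal definitions, and it is what undefined MedDRA terms cannot offer.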
Affiliation(s)
- Cédric Bousquet
- Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, LIMICS, Sorbonne Université, Inserm, Université Paris 13, Paris, France
- Unit of Public Health and Medical Informatics, University of Saint Etienne, Saint Etienne, France
- Julien Souvignet
- Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, LIMICS, Sorbonne Université, Inserm, Université Paris 13, Paris, France
- Unit of Public Health and Medical Informatics, University of Saint Etienne, Saint Etienne, France
- Éric Sadou
- Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, LIMICS, Sorbonne Université, Inserm, Université Paris 13, Paris, France
- Marie-Christine Jaulent
- Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, LIMICS, Sorbonne Université, Inserm, Université Paris 13, Paris, France
- Gunnar Declerck
- EA 2223 Costech (Connaissance, Organisation et Systèmes Techniques), Centre de Recherche, Sorbonne Universités, Université de technologie de Compiègne, Compiègne, France
6
Diercke M, Beermann S, Tolksdorf K, Buda S, Kirchner G. [Infectious diseases and their ICD coding: What could be improved by the introduction of ICD-11?]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2018; 61:806-811. [PMID: 29846743 PMCID: PMC7079900 DOI: 10.1007/s00103-018-2758-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]
Abstract
The revision of the International Statistical Classification of Diseases and Related Health Problems (ICD) entails fundamental changes to morbidity and mortality statistics, which also affect the field of infectious diseases. In the current ICD-10, individual infectious diseases are assigned to chapters according to different concepts: partly by causative agent, partly by affected organ system, and partly by period of life. Particular challenges in classifying infectious diseases include the recurring need for adjustment as new pathogens emerge. In addition, the scope and depth of the information in ICD-10 are in some cases insufficient for epidemiological analyses of the data. The ICD enables worldwide comparison of statistics on infectious diseases. Increasingly, however, the ICD is also used to collect surveillance and research data, for example in the notification system (identifying notifiable conditions), in the syndromic surveillance of acute respiratory infections, in building new surveillance systems, and in evaluating data quality by comparison with secondary data. The main opportunity of ICD-11 is that infectious diseases can be coded less ambiguously and that their codes carry more information relevant to epidemiological assessment. Its high complexity, however, may introduce biases into the data that make it harder to continue morbidity and mortality statistics as consistent time series.
Affiliation(s)
- Michaela Diercke
- Department of Infectious Disease Epidemiology, Robert Koch-Institut, Seestraße 10, 13353 Berlin, Germany
- Sandra Beermann
- Department of Infectious Disease Epidemiology, Robert Koch-Institut, Seestraße 10, 13353 Berlin, Germany
- Kristin Tolksdorf
- Department of Infectious Disease Epidemiology, Robert Koch-Institut, Seestraße 10, 13353 Berlin, Germany
- Silke Buda
- Department of Infectious Disease Epidemiology, Robert Koch-Institut, Seestraße 10, 13353 Berlin, Germany
- Göran Kirchner
- Department of Infectious Disease Epidemiology, Robert Koch-Institut, Seestraße 10, 13353 Berlin, Germany