1
Stransky ML, Bremer-Kamens M, Kistin CJ, Sheldrick RC, Cohen RT. Using Electronic Health Records to Identify Asthma-Related Acute Care Encounters. Acad Pediatr 2024; 24:1229-1235. [PMID: 38761891 DOI: 10.1016/j.acap.2024.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 05/06/2024] [Accepted: 05/10/2024] [Indexed: 05/20/2024]
Abstract
OBJECTIVE Leveraging "big data" to improve care requires that clinical concepts be operationalized using available data. Electronic health record (EHR) data can be used to evaluate asthma care, but relying solely on diagnosis codes may misclassify asthma-related encounters. We created streamlined, feasible and transparent prototype algorithms for EHR data to classify emergency department (ED) encounters and hospitalizations as "asthma-related." METHODS As part of an asthma program evaluation, expert clinicians conducted a multi-phase iterative chart review to evaluate 467 pediatric ED encounters and 136 hospitalizations with asthma diagnosis codes from calendar years 2017 and 2019, rating the likelihood that each encounter was actually asthma-related. Using this as a reference standard, we developed rule-based algorithms for EHR data to classify visits. Accuracy was evaluated using sensitivity, specificity, and positive and negative predictive values (PPV, NPV). RESULTS Clinicians categorized 38% of ED encounters as "definitely" or "probably" asthma-related; 13% as "possibly" asthma-related; and 49% as "probably not" or "definitely not" related to asthma. Based on this reference standard, we created two rule-based algorithms to identify "definitely" or "probably" asthma-related encounters, one using text and non-text EHR fields and another using non-text fields only. Sensitivity, specificity, PPV, and NPV were >95% for the algorithm using text and non-text fields and >87% for the algorithm using only non-text fields compared to the reference standard. We created a two-rule algorithm to identify asthma-related hospitalizations using only non-text fields. CONCLUSIONS Diagnostic codes alone are insufficient to identify asthma-related visits, but EHR-based prototype algorithms that include additional methods of identification can predict clinician-identified visits with sufficient accuracy.
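Sensitivity, specificity, PPV and NPV all come from one 2x2 table that compares algorithm output with the clinician reference standard. The sketch below assumes each encounter's reference rating has already been reduced to a yes/no label, for instance treating "definitely" or "probably" asthma-related as positive. The abstract does not say how "possibly" ratings enter that mapping, so the sketch leaves it to the reader. `ConfusionCounts`, `tally` and `accuracy_metrics` are hypothetical helpers and are not the study's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts comparing an algorithm's label with a reference-standard label."""
    tp: int  # algorithm positive, reference positive
    fp: int  # algorithm positive, reference negative
    fn: int  # algorithm negative, reference positive
    tn: int  # algorithm negative, reference negative


def tally(pairs: list[tuple[bool, bool]]) -> ConfusionCounts:
    """Count (algorithm_label, reference_label) pairs into the four cells."""
    tp = sum(1 for a, r in pairs if a and r)
    fp = sum(1 for a, r in pairs if a and not r)
    fn = sum(1 for a, r in pairs if not a and r)
    tn = sum(1 for a, r in pairs if not a and not r)
    return ConfusionCounts(tp, fp, fn, tn)


def accuracy_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV; NaN when a denominator is zero."""

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),  # reference positives the algorithm flags
        "specificity": ratio(c.tn, c.tn + c.fp),  # reference negatives left unflagged
        "ppv": ratio(c.tp, c.tp + c.fp),          # flagged encounters that are reference positive
        "npv": ratio(c.tn, c.tn + c.fn),          # unflagged encounters that are reference negative
    }


if __name__ == "__main__":
    # Hypothetical labels for illustration; these are not the study's data.
    example = [(True, True), (True, True), (True, False),
               (False, True), (False, False), (False, False)]
    print(accuracy_metrics(tally(example)))
```

The six hypothetical pairs give two true positives, one false positive, one false negative and two true negatives. All four measures therefore come out to 2/3.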
Affiliation(s)
- Michelle L Stransky
- Center for the Urban Child and Healthy Family (ML Stransky and M Bremer-Kamens), Boston Medical Center, Boston, Mass; Department of Pediatrics (ML Stransky, RT Cohen), Boston University Chobanian and Avedisian School of Medicine, Boston, Mass.
- Miriam Bremer-Kamens
- Center for the Urban Child and Healthy Family (ML Stransky and M Bremer-Kamens), Boston Medical Center, Boston, Mass
- Caroline J Kistin
- Hassenfeld Child Health Innovation Institute (CJ Kistin), Brown University, Providence, RI; Department of Health Services, Policy and Practice (CJ Kistin), Brown University, Providence, RI
- R Christopher Sheldrick
- Department of Psychiatry, University of Massachusetts Chan Medical School (RC Sheldrick), Worcester, Mass
- Robyn T Cohen
- Department of Pediatrics (ML Stransky, RT Cohen), Boston University Chobanian and Avedisian School of Medicine, Boston, Mass; Division of Pediatric Pulmonary and Allergy (RT Cohen), Boston Medical Center, Boston, MA
2
Carrell DS, Floyd JS, Gruber S, Hazlehurst BL, Heagerty PJ, Nelson JC, Williamson BD, Ball R. A general framework for developing computable clinical phenotype algorithms. J Am Med Inform Assoc 2024; 31:1785-1796. [PMID: 38748991 PMCID: PMC11258420 DOI: 10.1093/jamia/ocae121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 05/07/2024] [Accepted: 05/14/2024] [Indexed: 07/20/2024]
Abstract
OBJECTIVE To present a general framework providing high-level guidance to developers of computable algorithms for identifying patients with specific clinical conditions (phenotypes) through a variety of approaches, including but not limited to machine learning and natural language processing methods to incorporate rich electronic health record data. MATERIALS AND METHODS Drawing on extensive prior phenotyping experience and on insights derived from 3 algorithm development projects conducted specifically for this purpose, our team, with expertise in clinical medicine, statistics, informatics, pharmacoepidemiology, and healthcare data science methods, conceptualized stages of development and corresponding sets of principles, strategies, and practical guidelines for improving the algorithm development process. RESULTS We propose 5 stages of algorithm development and corresponding principles, strategies, and guidelines: (1) assessing fitness-for-purpose, (2) creating gold standard data, (3) feature engineering, (4) model development, and (5) model evaluation. DISCUSSION AND CONCLUSION This framework is intended to provide practical guidance and serve as a basis for future elaboration and extension.
Affiliation(s)
- David S Carrell
- Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
- James S Floyd
- Department of Medicine, School of Medicine, University of Washington, Seattle, WA 98195, United States
- Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA 98195, United States
- Susan Gruber
- Putnam Data Sciences, LLC, Cambridge, MA 02139, United States
- Brian L Hazlehurst
- Center for Health Research, Kaiser Permanente Northwest, Portland, OR 97227, United States
- Patrick J Heagerty
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA 98195, United States
- Jennifer C Nelson
- Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
- Brian D Williamson
- Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
- Robert Ball
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD 20993, United States
3
Brandt PS, Pacheco JA, Adekkanattu P, Sholle ET, Abedian S, Stone DJ, Knaack DM, Xu J, Xu Z, Peng Y, Benda NC, Wang F, Luo Y, Jiang G, Pathak J, Rasmussen LV. Design and validation of a FHIR-based EHR-driven phenotyping toolbox. J Am Med Inform Assoc 2022; 29:1449-1460. [PMID: 35799370 PMCID: PMC9382394 DOI: 10.1093/jamia/ocac063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/27/2021] [Revised: 04/04/2022] [Accepted: 06/17/2022] [Indexed: 12/14/2022]
Abstract
OBJECTIVES To develop and validate a standards-based phenotyping tool to author electronic health record (EHR)-based phenotype definitions and demonstrate execution of the definitions against heterogeneous clinical research data platforms. MATERIALS AND METHODS We developed an open-source, standards-compliant phenotyping tool known as the PhEMA Workbench that enables phenotype representation using the Fast Healthcare Interoperability Resources (FHIR) and Clinical Quality Language (CQL) standards. We then demonstrated how this tool can be used to conduct EHR-based phenotyping, including phenotype authoring, execution, and validation. We validated the performance of the tool by executing a thrombotic event phenotype definition at 3 sites, Mayo Clinic (MC), Northwestern Medicine (NM), and Weill Cornell Medicine (WCM), and used manual review to determine precision and recall. RESULTS An initial version of the PhEMA Workbench has been released, which supports phenotype authoring, execution, and publishing to a shared phenotype definition repository. The resulting thrombotic event phenotype definition consisted of 11 CQL statements and 24 value sets containing a total of 834 codes. Technical validation showed satisfactory performance (NM and MC both had 100% precision and recall; WCM had a precision of 95% and a recall of 84%). CONCLUSIONS We demonstrate that the PhEMA Workbench can facilitate EHR-driven phenotype definition, execution, and phenotype sharing in heterogeneous clinical research data environments. Phenotype definitions that integrate with existing standards-compliant systems and use a formal representation facilitate automation and can decrease the potential for human error.
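Precision and recall at each site compare the patients that a definition flags with the patients confirmed by manual review. The Python sketch below is only an analogue. It does not use CQL, FHIR or the PhEMA Workbench. It also reduces a definition to a single value-set membership test, whereas the published definition combined 11 CQL statements and 24 value sets. The value-set name, the placeholder codes and the patient records are all hypothetical.

```python
# Hypothetical value set: placeholder codes, not the PhEMA thrombotic event definition.
THROMBOTIC_EVENT_CODES = frozenset({"CODE_A", "CODE_B", "CODE_C"})


def flag_patients(records: dict[str, set[str]], value_set: frozenset[str]) -> set[str]:
    """Return IDs of patients with at least one recorded code in the value set."""
    return {pid for pid, codes in records.items() if codes & value_set}


def precision_recall(flagged: set[str], confirmed: set[str]) -> tuple[float, float]:
    """Precision and recall of algorithm-flagged IDs against manually confirmed IDs."""
    true_positives = len(flagged & confirmed)
    precision = true_positives / len(flagged) if flagged else float("nan")
    recall = true_positives / len(confirmed) if confirmed else float("nan")
    return precision, recall


if __name__ == "__main__":
    records = {
        "p1": {"CODE_A"},
        "p2": {"CODE_X"},  # a confirmed case whose code is not in the value set
        "p3": {"CODE_B", "CODE_Y"},
        "p4": set(),
    }
    flagged = flag_patients(records, THROMBOTIC_EVENT_CODES)
    precision, recall = precision_recall(flagged, confirmed={"p1", "p2", "p3"})
    print(sorted(flagged), precision, recall)
```

In this example the value set flags `p1` and `p3`. Both are confirmed, so precision is 1.0. `p2` is a confirmed case that the value set misses, which brings recall down to 2/3.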
Affiliation(s)
- Pascal S Brandt
- Department of Biomedical Informatics & Medical Education, University of Washington, Box 358047, Seattle, WA 98195, USA (corresponding author)
- Jennifer A Pacheco
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Prakash Adekkanattu
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Evan T Sholle
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Sajjad Abedian
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Daniel J Stone
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- David M Knaack
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Jie Xu
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Zhenxing Xu
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Yifan Peng
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Natalie C Benda
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Fei Wang
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Yuan Luo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Guoqian Jiang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Jyotishman Pathak
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Luke V Rasmussen
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
4
Binkheder S, Asiri MA, Altowayan KW, Alshehri TM, Alzarie MF, Aldekhyyel RN, Almaghlouth IA, Almulhem JA. Real-World Evidence of COVID-19 Patients' Data Quality in the Electronic Health Records. Healthcare (Basel) 2021; 9:1648. [PMID: 34946374 PMCID: PMC8701465 DOI: 10.3390/healthcare9121648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2021] [Revised: 11/18/2021] [Accepted: 11/25/2021] [Indexed: 11/19/2022]
Abstract
Despite the importance of electronic health record (EHR) data, less attention has been given to data quality. This study aimed to evaluate the quality of COVID-19 patients' records and their readiness for secondary use. We conducted a retrospective chart review of all COVID-19 inpatients at an academic hospital in 2020; patients were identified using ICD-10 codes and case definition guidelines. COVID-19 signs and symptoms were documented more often in unstructured clinical notes than in structured coded data. COVID-19 cases were categorized as 218 (66.46%) "confirmed cases", 10 (3.05%) "probable cases", 9 (2.74%) "suspected cases", and 91 (27.74%) "no sufficient evidence". Identifying "probable cases" and "suspected cases" was more challenging than identifying "confirmed cases", for which laboratory confirmation was sufficient. COVID-19 cases were identified more accurately using laboratory tests than using ICD-10 codes. When validating against laboratory results, we found that ICD-10 codes were inaccurately assigned to 238 (72.56%) patients' records. Records with "no sufficient evidence" might indicate inaccurate and incomplete EHR data. Data quality evaluation should be incorporated to ensure patient safety and the readiness of data for secondary-use research and predictive analytics. We encourage education and training efforts that make healthcare providers aware of the importance of accurate documentation at the point of care.
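The reported proportions can be checked with simple arithmetic. The sketch below assumes the four case categories partition the cohort, which makes the denominator their sum, 328. The abstract does not state that number directly. Under this assumption the sketch reproduces each published percentage, including the 238 inaccurately coded records.

```python
# Counts reported in the abstract. The denominator is ASSUMED to be their sum,
# because the abstract does not state the cohort size directly.
CATEGORY_COUNTS = {
    "confirmed": 218,
    "probable": 10,
    "suspected": 9,
    "no sufficient evidence": 91,
}
MISCODED_RECORDS = 238  # records whose ICD-10 codes were judged inaccurate against lab results


def percentages(counts: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of `total` for each count, rounded to two decimals as in the abstract."""
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


TOTAL = sum(CATEGORY_COUNTS.values())

if __name__ == "__main__":
    print(TOTAL)
    print(percentages(CATEGORY_COUNTS, TOTAL))
    print(round(100 * MISCODED_RECORDS / TOTAL, 2))
```

With a denominator of 328, the categories work out to 66.46, 3.05, 2.74 and 27.74 percent, and the miscoded records to 72.56 percent. These match the abstract.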
Affiliation(s)
- Samar Binkheder
- Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
- Mohammed Ahmed Asiri
- Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
- Department of Medicine, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
- Khaled Waleed Altowayan
- Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
- Department of Medicine, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
- Turki Mohammed Alshehri
- Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
- Department of Medicine, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
- Mashhour Faleh Alzarie
- Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
- Department of Medicine, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
- Raniah N. Aldekhyyel
- Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
- Ibrahim A. Almaghlouth
- Department of Medicine, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
- Jwaher A. Almulhem
- Medical Informatics and E-Learning Unit, Medical Education Department, College of Medicine, King Saud University, Riyadh 12372, Saudi Arabia
5
Li M, Cai H, Nan S, Li J, Lu X, Duan H. A Patient-Screening Tool for Clinical Research Based on Electronic Health Records Using OpenEHR: Development Study. JMIR Med Inform 2021; 9:e33192. [PMID: 34673526 PMCID: PMC8569542 DOI: 10.2196/33192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2021] [Revised: 09/27/2021] [Accepted: 09/27/2021] [Indexed: 11/28/2022]
Abstract
Background The widespread adoption of electronic health records (EHRs) has facilitated the secondary use of EHR data for clinical research. However, screening eligible patients from EHRs is challenging. The concepts in eligibility criteria do not completely match those in EHRs, especially derived concepts. Because Structured Query Language (SQL) lacks high-level expressions, expressing these concepts is difficult and time-consuming. The openEHR Expression Language (EL), a domain-specific language based on clinical information models, shows promise for representing complex eligibility criteria. Objective The study aimed to develop a patient-screening tool based on EHRs for clinical research using openEHR, to solve concept mismatch and improve query performance. Methods We proposed a patient-screening tool based on EHRs using openEHR. It uses the information models and EL of openEHR to provide high-level expressions and improve query performance. First, openEHR archetypes and templates were chosen to define concepts directly from EHRs; these are called simple concepts. Second, openEHR EL was used to generate derived concepts by combining simple concepts and constraints. Third, a hierarchical index corresponding to archetypes was generated in Elasticsearch (ES) to improve query performance for subqueries and join queries related to the derived concepts. Finally, we implemented a patient-screening tool for clinical research. Results In total, 500 sentences randomly selected from 4691 eligibility criteria in 389 clinical trials on stroke from the Chinese Clinical Trial Registry (ChiCTR) were evaluated. An openEHR-based clinical data repository (CDR) in a grade A tertiary hospital in China served as the experimental environment. The 500 sentences contained 589 medical concepts. Of these, 513 (87.1%) could be represented; the others could not, owing to a lack of information models and to coarse-grained requirements.
In addition, our case study of 6 queries showed that our tool had better query performance in 4 cases (66.67%). Conclusions We developed a patient-screening tool using openEHR. It not only helps solve concept mismatch but also improves query performance, reducing the burden on researchers. In addition, we demonstrated a promising solution for secondary use of EHR data using openEHR, which other researchers can draw on.
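Predicates offer one way to illustrate the difference between simple and derived concepts. The sketch below is a plain Python analogue and is not openEHR EL syntax. It leaves out archetypes, templates and the Elasticsearch index entirely. The `Patient` fields, the helper names and the `adult_stroke` criterion are all hypothetical. In this sketch, simple concepts are fields read directly from a record, and a derived concept combines them with constraints.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Patient:
    """Hypothetical flattened record; openEHR stores data in archetype-based compositions."""
    patient_id: str
    age_years: int  # simple concept read directly from the record
    diagnosis: str  # simple concept read directly from the record


Predicate = Callable[[Patient], bool]


def at_least(field: str, threshold: int) -> Predicate:
    """Constraint: a numeric simple concept is >= threshold."""
    return lambda p: getattr(p, field) >= threshold


def equals(field: str, value: str) -> Predicate:
    """Constraint: a coded simple concept equals a given value."""
    return lambda p: getattr(p, field) == value


def all_of(*criteria: Predicate) -> Predicate:
    """Derived concept: every constraint must hold."""
    return lambda p: all(c(p) for c in criteria)


# Hypothetical derived concept, loosely in the spirit of a stroke-trial criterion.
adult_stroke = all_of(at_least("age_years", 18), equals("diagnosis", "stroke"))


def screen(patients: list[Patient], criterion: Predicate) -> list[str]:
    """Return IDs of patients satisfying the criterion, in input order."""
    return [p.patient_id for p in patients if criterion(p)]


if __name__ == "__main__":
    cohort = [Patient("a", 45, "stroke"), Patient("b", 12, "stroke"), Patient("c", 60, "asthma")]
    print(screen(cohort, adult_stroke))
```

Screening the three example patients returns only `a`. Patient `b` fails the age constraint and patient `c` fails the diagnosis constraint.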
Affiliation(s)
- Mengyang Li
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China; Key Laboratory for Biomedical Engineering, Ministry of Education, Hangzhou, Zhejiang, China
- Hailing Cai
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China; Key Laboratory for Biomedical Engineering, Ministry of Education, Hangzhou, Zhejiang, China
- Shan Nan
- Hainan University School of Biomedical Engineering, Haikou City, China
- Jialin Li
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China; Key Laboratory for Biomedical Engineering, Ministry of Education, Hangzhou, Zhejiang, China
- Xudong Lu
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China; Key Laboratory for Biomedical Engineering, Ministry of Education, Hangzhou, Zhejiang, China
- Huilong Duan
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China; Key Laboratory for Biomedical Engineering, Ministry of Education, Hangzhou, Zhejiang, China
6
Negro-Calduch E, Azzopardi-Muscat N, Krishnamurthy RS, Novillo-Ortiz D. Technological progress in electronic health record system optimization: Systematic review of systematic literature reviews. Int J Med Inform 2021; 152:104507. [PMID: 34049051 PMCID: PMC8223493 DOI: 10.1016/j.ijmedinf.2021.104507] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 02/20/2021] [Revised: 05/19/2021] [Accepted: 05/20/2021] [Indexed: 01/08/2023]
Abstract
BACKGROUND The recent, rapid development of digital technologies offers new possibilities for more efficient implementation of electronic health record (EHR) and personal health record (PHR) systems. A growing volume of healthcare data has been the hallmark of this digital transformation. The complexity and dynamic nature of large healthcare datasets pose various challenges related to processing, analysis, storage, security, privacy, data exchange, and usability. MATERIALS AND METHODS We performed a systematic review of systematic reviews to assess technological progress in EHR and PHR systems. We searched MEDLINE, Cochrane, Web of Science, and Scopus for systematic literature reviews on technological advancements that support EHR and PHR systems published between January 1, 2010, and October 6, 2020. RESULTS The searches returned a total of 2,448 hits, from which we selected 23 systematic reviews. Most of the included papers dealt with information extraction tools and natural language processing technology (n = 10), followed by studies that assessed the use of blockchain technology in healthcare (n = 8). Other areas of digital technology research included EHR and PHR systems in austere settings (n = 1), de-identification methods (n = 1), visualization techniques (n = 1), communication tools within EHR and PHR systems (n = 1), and methodologies for defining Clinical Information Models that promote EHR and PHR interoperability (n = 1). CONCLUSIONS Technological advancements can improve the efficiency of implementing EHR and PHR systems in numerous ways. Natural language processing techniques, whether rule-based, machine learning-based, or deep learning-based, can extract information from clinical narratives and other unstructured data locked in EHRs and PHRs, enabling secondary research (e.g., phenotyping).
Moreover, EHRs and PHRs are expected to be the primary beneficiaries of blockchain implementation in health information systems. Governance regulations, lack of trust, poor scalability, security, privacy, low performance, and high cost remain the most critical challenges for implementing these technologies.
Affiliation(s)
- Elsa Negro-Calduch
- World Health Organization Regional Office for Europe, Copenhagen, Denmark
7
Pellathy T, Saul M, Clermont G, Dubrawski AW, Pinsky MR, Hravnak M. Accuracy of identifying hospital acquired venous thromboembolism by administrative coding: implications for big data and machine learning research. J Clin Monit Comput 2021; 36:397-405. [PMID: 33558981 DOI: 10.1007/s10877-021-00664-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/19/2020] [Accepted: 01/20/2021] [Indexed: 12/23/2022]
Abstract
Big data analytics research using heterogeneous electronic health record (EHR) data requires accurate identification of disease phenotype cases and controls. Overreliance on ground truth determined from administrative data can lead to biased and inaccurate findings. Hospital-acquired venous thromboembolism (HA-VTE) is challenging to identify because of its temporal evolution and variable EHR documentation. To establish ground truth for machine learning modeling, we compared the accuracy of HA-VTE diagnoses made by administrative coding with manual review of gold standard diagnostic test results. We retrospectively analyzed EHR data from 3680 adult stepdown unit patients to identify HA-VTE. International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes for VTE were identified. A total of 4544 radiology reports associated with VTE diagnostic tests were screened using terminology extraction and then manually reviewed by a clinical expert to confirm the diagnosis. Of 415 cases with ICD-9-CM codes for VTE, 219 had acute-onset codes. Test report review identified 158 new-onset HA-VTE cases. Only 40% of ICD-9-CM-coded cases (n = 87) were confirmed by a positive diagnostic test report, leaving most administratively coded cases unsubstantiated by a confirmatory diagnostic test. Additionally, 45% of diagnostic test-confirmed HA-VTE cases lacked corresponding ICD codes. ICD-9-CM coding missed diagnostic test-confirmed HA-VTE cases and assigned codes to cases without confirmed VTE, suggesting that dependence on administrative coding leads to inaccurate HA-VTE phenotyping. Alternative methods for developing more sensitive and specific VTE phenotypes that are portable across EHR vendors' data are needed to support case finding in big data analytics.
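The two headline figures are ratios over different denominators, which makes them easy to misread. The sketch below assumes the 87 confirmed cases are the overlap between the 219 acute-onset coded cases and the 158 test-confirmed cases. The abstract does not spell out the denominators, but this reading reproduces both figures: 87/219 is about 40%, and (158 - 87)/158 = 71/158 is about 45%. The sensitivity-of-codes ratio (87/158, about 55%) is derived here for illustration only and is not reported in the abstract. The function name is hypothetical.

```python
def coding_agreement(coded: int, confirmed: int, both: int) -> dict[str, float]:
    """Compare code-based case finding with diagnostic-test confirmation.

    coded:     cases flagged by administrative (ICD) codes
    confirmed: cases confirmed by diagnostic test review (reference standard)
    both:      cases that are both coded and test-confirmed
    """
    if coded <= 0 or confirmed <= 0:
        raise ValueError("group sizes must be positive")
    if not 0 <= both <= min(coded, confirmed):
        raise ValueError("overlap cannot exceed either group")
    return {
        "ppv_of_codes": both / coded,                              # coded cases a test confirms
        "sensitivity_of_codes": both / confirmed,                  # confirmed cases that were coded
        "confirmed_without_code": (confirmed - both) / confirmed,  # confirmed cases missing a code
    }


if __name__ == "__main__":
    # Counts from the abstract; treating 87 as the overlap of 219 and 158 is an assumption.
    for name, value in coding_agreement(coded=219, confirmed=158, both=87).items():
        print(f"{name}: {value:.1%}")
```

The overlap check catches inconsistent inputs, such as an overlap larger than either group, before any ratio is computed.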
Affiliation(s)
- Tiffany Pellathy
- University of Pittsburgh School of Nursing, 336 Victoria Hall; 3500 Victoria Street, Pittsburgh, PA, 15213, USA.
- Melissa Saul
- University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
- Gilles Clermont
- University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
- Artur W Dubrawski
- School of Computer Science, Auton Lab, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Michael R Pinsky
- University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
- Marilyn Hravnak
- University of Pittsburgh School of Nursing, 336 Victoria Hall; 3500 Victoria Street, Pittsburgh, PA, 15213, USA
8
Lee S, Doktorchik C, Martin EA, D'Souza AG, Eastwood C, Shaheen AA, Naugler C, Lee J, Quan H. Electronic Medical Record-Based Case Phenotyping for the Charlson Conditions: Scoping Review. JMIR Med Inform 2021; 9:e23934. [PMID: 33522976 PMCID: PMC7884219 DOI: 10.2196/23934] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/28/2020] [Revised: 11/20/2020] [Accepted: 12/05/2020] [Indexed: 12/16/2022]
Abstract
Background Electronic medical records (EMRs) contain large amounts of rich clinical information. Developing EMR-based case definitions, also known as EMR phenotyping, is an active area of research with implications for epidemiology, clinical care, and health services research. Objective This review aims to describe and assess the current landscape of EMR-based case phenotyping for the Charlson conditions. Methods We completed a scoping review of EMR-based algorithms for defining the Charlson comorbidity index conditions, covering articles published between January 2000 and April 2020, inclusive. Embase (Excerpta Medica database) and MEDLINE (Medical Literature Analysis and Retrieval System Online) were searched using keywords developed in the following 3 domains: terms related to EMR, terms related to case finding, and disease-specific terms. The manuscript follows the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines. Results A total of 274 articles representing 299 algorithms were assessed and summarized. Most studies were undertaken in the United States (181/299, 60.5%), followed by the United Kingdom (42/299, 14.0%) and Canada (15/299, 5.0%). These algorithms were mostly developed in either primary care (103/299, 34.4%) or inpatient (168/299, 56.2%) settings. Diabetes, congestive heart failure, myocardial infarction, and rheumatologic disease had the highest numbers of developed algorithms. Both data-driven and clinical rule–based approaches were identified. EMR-based phenotype and algorithm development reflects the data access allowed by the respective health systems, and algorithms vary in performance. Conclusions Recognizing similarities and differences in health systems, data collection strategies, extraction, data release protocols, and existing clinical pathways is critical to algorithm development strategies.
Several strategies to assist with phenotype-based case definitions have been proposed.
Affiliation(s)
- Seungwon Lee
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada; Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Chelsea Doktorchik
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Elliot Asher Martin
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada
- Adam Giles D'Souza
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada
- Cathy Eastwood
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Abdel Aziz Shaheen
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Christopher Naugler
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Pathology and Laboratory Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Joon Lee
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Cardiac Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Hude Quan
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
9
Electronic Health Record Algorithm Development for Research Subject Recruitment Using Colonoscopy Appointment Scheduling. J Am Board Fam Med 2021; 34:49-60. [PMID: 33452082 PMCID: PMC8185576 DOI: 10.3122/jabfm.2021.01.200417] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
INTRODUCTION Electronic health records (EHRs) are often leveraged in medical research to recruit study participants efficiently. The purpose of this study was to validate and refine the logic of an EHR algorithm for identifying potentially eligible participants for a comparative effectiveness study of fecal immunochemical tests (FITs), using colonoscopy as the standard. METHODS An Epic report was built to identify eligible patients having a screening or surveillance colonoscopy. To maximize the number of potentially eligible patients who could be recruited, researchers, assisted by information technology and scheduling staff, developed the algorithm for identifying potential subjects in the EHR. Two validation methods were used: descriptive statistics and manual verification. RESULTS The algorithm was refined over 3 iterations, which led to the following criteria being used to generate the report: Age, Appointment Made On/Cancel Date, Appointment Procedure, Contact Type, Date Range, Encounter Departments, ICD-10 codes, and Patient Type. Appointment Serial Number/Contact Serial Number were output fields that allowed cancellations and reschedules to be tracked. CONCLUSION Developing an EHR algorithm saved time because most individuals ineligible for the study were excluded before medical record review. Running daily reports that included cancelled and rescheduled appointments maximized recruitment within a time frame appropriate for the use of the FITs. This work demonstrates that iteratively refining the algorithm and including cancelled and rescheduled colonoscopies increased the accuracy of reaching all potential patients for recruitment.
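The refined report logic can be thought of as a set of filters applied to scheduled appointments. The sketch below is hypothetical Python and is not an Epic report definition. Its field names only loosely echo the criteria listed above. The age limits, procedure and department values, and status vocabulary are placeholders that the abstract does not specify. The sketch keeps cancelled rows rather than discarding them, reflecting the study's finding that tracking cancellations and reschedules helped recruitment.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Appointment:
    """One scheduled procedure; field names loosely echo the report criteria (hypothetical)."""
    appointment_serial: str
    patient_id: str
    age: int
    procedure: str
    department: str
    appointment_date: date
    status: str  # hypothetical vocabulary, e.g. "scheduled" or "cancelled"


def daily_report(
    appointments: Iterable[Appointment],
    *,
    procedures: set[str],
    departments: set[str],
    min_age: int,
    max_age: int,
    start: date,
    end: date,
) -> list[Appointment]:
    """Rows matching the procedure, department, age, and date-range filters.

    Cancelled appointments are kept rather than dropped, so recruiters can
    follow up if the patient reschedules.
    """
    report = []
    for appt in appointments:
        if appt.procedure not in procedures or appt.department not in departments:
            continue
        if not min_age <= appt.age <= max_age:
            continue
        if not start <= appt.appointment_date <= end:
            continue
        report.append(appt)
    return report


if __name__ == "__main__":
    day = date(2020, 3, 2)  # placeholder date
    rows = daily_report(
        [
            Appointment("1", "p1", 55, "COLONOSCOPY", "GI", day, "scheduled"),
            Appointment("2", "p2", 40, "COLONOSCOPY", "GI", day, "scheduled"),
            Appointment("3", "p3", 60, "COLONOSCOPY", "GI", day, "cancelled"),
            Appointment("4", "p4", 60, "EGD", "GI", day, "scheduled"),
        ],
        procedures={"COLONOSCOPY"}, departments={"GI"},
        min_age=50, max_age=75, start=date(2020, 3, 1), end=date(2020, 3, 31),
    )
    for row in rows:
        print(row.patient_id, row.status)
```

In the example, `p2` is excluded by the placeholder age range and `p4` by the procedure filter. The cancelled appointment for `p3` stays in the report so it can be followed up.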
10
Wagholikar KB, Estiri H, Murphy M, Murphy SN. Polar labeling: silver standard algorithm for training disease classifiers. Bioinformatics 2020; 36:3200-3206. [PMID: 32049335 PMCID: PMC7214041 DOI: 10.1093/bioinformatics/btaa088] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/14/2019] [Revised: 01/30/2020] [Accepted: 02/04/2020] [Indexed: 01/29/2023]
Abstract
MOTIVATION Expert-labeled data are essential to train phenotyping algorithms for cohort identification. However, expert labeling is time- and labor-intensive, and its costs remain prohibitive for scaling phenotyping to wider use cases. RESULTS We present an approach, referred to as polar labeling (PL), for creating a silver standard to train machine learning (ML) models for disease classification. We test the hypothesis that ML models trained on a silver standard, created by applying PL to unlabeled patient records, perform comparably to ML models trained on a gold standard created by clinical experts through manual review of patient records. We perform experimental validation using health records of 38 023 patients spanning six diseases. Our results demonstrate the superior performance of the proposed approach. AVAILABILITY AND IMPLEMENTATION We provide a Python implementation of the algorithm and the Python code developed for this study on GitHub. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hossein Estiri
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02114, USA
- Shawn N Murphy
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02114, USA
11
Denaxas S, Gonzalez-Izquierdo A, Direk K, Fitzpatrick NK, Fatemifar G, Banerjee A, Dobson RJB, Howe LJ, Kuan V, Lumbers RT, Pasea L, Patel RS, Shah AD, Hingorani AD, Sudlow C, Hemingway H. UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER. J Am Med Inform Assoc 2019; 26:1545-1559. [PMID: 31329239 PMCID: PMC6857510 DOI: 10.1093/jamia/ocz105] [Citation(s) in RCA: 104] [Impact Index Per Article: 20.8] [Received: 01/29/2019] [Revised: 04/25/2019] [Accepted: 05/29/2019] [Indexed: 01/13/2023]
Abstract
OBJECTIVE Electronic health records (EHRs) are a rich source of information on human diseases, but the information is variably structured, fragmented, curated using different coding systems, and collected for purposes other than medical research. We describe an approach for developing, validating, and sharing reproducible phenotypes from national structured EHRs in the United Kingdom with applications for translational research. MATERIALS AND METHODS We implemented a rule-based phenotyping framework with up to 6 validation approaches. We applied our framework to a sample of 15 million individuals in a national EHR data source (population-based primary care, all ages) linked to hospitalization and death records in England. Data comprised continuous measurements (for example, blood pressure), medication information, and coded diagnoses, symptoms, procedures, and referrals, recorded using 5 controlled clinical terminologies: (1) Read codes (primary care, subset of SNOMED-CT [Systematized Nomenclature of Medicine Clinical Terms]), (2) International Classification of Diseases, Ninth Revision and Tenth Revision (secondary care diagnoses and cause of mortality), (3) Office of Population Censuses and Surveys Classification of Surgical Operations and Procedures, Fourth Revision (hospital surgical procedures), and (4) DM+D prescription codes. RESULTS Using the CALIBER phenotyping framework, we created algorithms for 51 diseases, syndromes, biomarkers, and lifestyle risk factors and provide up to 6 validation approaches. The EHR phenotypes are curated in the open-access CALIBER Portal (https://www.caliberresearch.org/portal) and have been used by 40 national and international research groups in 60 peer-reviewed publications. CONCLUSIONS We describe a UK EHR phenomics approach within the CALIBER EHR data platform with initial evidence of validity and use, as an important step toward international use of UK EHR data for health research.
Affiliation(s)
- Spiros Denaxas
- Institute of Health Informatics, University College London, London, United Kingdom
- Health Data Research UK, London, United Kingdom
- The Alan Turing Institute, London, United Kingdom
- The National Institute for Health Research University College London Hospitals Biomedical Research Centre, University College London, London, United Kingdom
- British Heart Foundation Research Accelerator, University College London, London, United Kingdom
- Arturo Gonzalez-Izquierdo
- Institute of Health Informatics, University College London, London, United Kingdom
- Health Data Research UK, London, United Kingdom
- The National Institute for Health Research University College London Hospitals Biomedical Research Centre, University College London, London, United Kingdom
- Kenan Direk
- Institute of Health Informatics, University College London, London, United Kingdom
- Health Data Research UK, London, United Kingdom
- The National Institute for Health Research University College London Hospitals Biomedical Research Centre, University College London, London, United Kingdom
- Natalie K Fitzpatrick
- Institute of Health Informatics, University College London, London, United Kingdom
- Health Data Research UK, London, United Kingdom
- Ghazaleh Fatemifar
- Institute of Health Informatics, University College London, London, United Kingdom
- Health Data Research UK, London, United Kingdom
- Amitava Banerjee
- Institute of Health Informatics, University College London, London, United Kingdom
- Health Data Research UK, London, United Kingdom
- British Heart Foundation Research Accelerator, University College London, London, United Kingdom
- Richard J B Dobson
- Institute of Health Informatics, University College London, London, United Kingdom
- Health Data Research UK, London, United Kingdom
- Department of Biostatistics and Health Informatics, Institute of Psychiatry Psychology and Neuroscience, King’s College London, London, United Kingdom
- The National Institute for Health Research University College London Hospitals Biomedical Research Centre, University College London, London, United Kingdom
- British Heart Foundation Research Accelerator, University College London, London, United Kingdom
- Laurence J Howe
- Institute of Cardiovascular Science, University College London, London, United Kingdom
- Valerie Kuan
- Health Data Research UK, London, United Kingdom
- Institute of Cardiovascular Science, University College London, London, United Kingdom
- R Tom Lumbers
- Institute of Health Informatics, University College London, London, United Kingdom
- Health Data Research UK, London, United Kingdom
- British Heart Foundation Research Accelerator, University College London, London, United Kingdom
- Laura Pasea
- Institute of Health Informatics, University College London, London, United Kingdom
- Health Data Research UK, London, United Kingdom
- Riyaz S Patel
- Institute of Cardiovascular Science, University College London, London, United Kingdom
- British Heart Foundation Research Accelerator, University College London, London, United Kingdom
- Anoop D Shah
- Institute of Health Informatics, University College London, London, United Kingdom
- Health Data Research UK, London, United Kingdom
- British Heart Foundation Research Accelerator, University College London, London, United Kingdom
- Aroon D Hingorani
- Health Data Research UK, London, United Kingdom
- Institute of Cardiovascular Science, University College London, London, United Kingdom
- Cathie Sudlow
- Centre for Medical Informatics, Usher Institute of Population Health Science and Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Health Data Research UK, Scotland, United Kingdom
- Harry Hemingway
- Institute of Health Informatics, University College London, London, United Kingdom
- Health Data Research UK, London, United Kingdom
- The National Institute for Health Research University College London Hospitals Biomedical Research Centre, University College London, London, United Kingdom
- British Heart Foundation Research Accelerator, University College London, London, United Kingdom
12
Ernecoff NC, Wessell KL, Hanson LC, Lee AM, Shea CM, Dusetzina SB, Weinberger M, Bennett AV. Electronic Health Record Phenotypes for Identifying Patients with Late-Stage Disease: a Method for Research and Clinical Application. J Gen Intern Med 2019; 34:2818-2823. [PMID: 31396813 PMCID: PMC6854193 DOI: 10.1007/s11606-019-05219-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/09/2019] [Accepted: 07/12/2019] [Indexed: 01/03/2023]
Abstract
BACKGROUND Systematic identification of patients allows researchers and clinicians to test new models of care delivery. EHR phenotypes (structured algorithms based on clinical indicators from EHRs) can aid in such identification. OBJECTIVE To develop EHR phenotypes to identify decedents with stage 4 solid-tumor cancer or stage 4-5 chronic kidney disease (CKD). DESIGN We developed two EHR phenotypes. Each phenotype included International Classification of Diseases (ICD)-9 and ICD-10 codes. We used natural language processing (NLP) to further specify stage 4 cancer and laboratory values to further specify CKD stage. SUBJECTS Decedents with cancer or CKD who had been admitted to an academic medical center in the last 6 months of life and died between August 26, 2017 and December 31, 2017. MAIN MEASURE We calculated positive predictive values (PPV), false discovery rates (FDR), false negative rates (FNR), and sensitivity. Phenotypes were validated by comparison with manual chart review. We also compared the EHR phenotype results with patients admitted to the oncology and nephrology inpatient services. KEY RESULTS The EHR phenotypes identified 271 decedents with cancer, of whom 186 had stage 4 disease; of 192 decedents with CKD, 89 had stage 4-5 disease. The EHR phenotype for stage 4 cancer had a PPV of 68.6%, FDR of 31.4%, FNR of 0.5%, and sensitivity of 99.5%. The EHR phenotype for stage 4-5 CKD had a PPV of 46.4%, FDR of 53.6%, FNR of 0.0%, and sensitivity of 100%. CONCLUSIONS EHR phenotypes efficiently identified patients who died with late-stage cancer or CKD. Future EHR phenotypes can prioritize specificity over sensitivity and incorporate stratification by high versus low palliative care need. EHR phenotypes are a promising method for identifying patients for research and clinical purposes, including equitable distribution of specialty palliative care.
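The reported PPV and FDR follow directly from the counts in the key results: PPV is the share of phenotype-flagged decedents confirmed on chart review, and FDR is its complement. A minimal sketch of that arithmetic follows; the function names are illustrative. Sensitivity and FNR also need the number of missed cases, which the abstract does not report, so that helper is shown only with toy numbers.

```python
def ppv_fdr(true_positives: int, flagged: int) -> tuple[float, float]:
    """PPV = TP / (TP + FP); FDR = FP / (TP + FP) = 1 - PPV.
    `flagged` is everyone the phenotype selected, i.e. TP + FP."""
    ppv = true_positives / flagged
    return ppv, 1 - ppv

def sensitivity_fnr(true_positives: int, false_negatives: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); FNR = FN / (TP + FN) = 1 - sensitivity."""
    sens = true_positives / (true_positives + false_negatives)
    return sens, 1 - sens

# Counts from the abstract: 186 of 271 flagged cancer decedents and
# 89 of 192 flagged CKD decedents were confirmed as late-stage.
for label, tp, n in [("stage 4 cancer", 186, 271), ("stage 4-5 CKD", 89, 192)]:
    ppv, fdr = ppv_fdr(tp, n)
    print(f"{label}: PPV {ppv:.1%}, FDR {fdr:.1%}")
```

With these counts the cancer phenotype gives a PPV of 68.6% and an FDR of 31.4%, and the CKD phenotype gives a PPV of 46.4% and an FDR of 53.6%. Because FDR = 1 - PPV, the two values must always sum to 100%.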
Affiliation(s)
- Natalie C Ernecoff
- Division of General Internal Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
- Kathryn L Wessell
- Sheps Center for Health Services Research, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Laura C Hanson
- Sheps Center for Health Services Research, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Division of Geriatric Medicine & Palliative Care Program, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Adam M Lee
- North Carolina Translational and Clinical Sciences Institute, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Christopher M Shea
- Department of Health Policy and Management, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Stacie B Dusetzina
- Department of Health Policy, Vanderbilt University Medical Center, Nashville, TN, USA
- Morris Weinberger
- Department of Health Policy and Management, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Antonia V Bennett
- Department of Health Policy and Management, Gillings School of Global Public Health, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
13
Li RC, Garg T, Cun T, Shieh L, Krishnan G, Fang D, Chen JH. Impact of problem-based charting on the utilization and accuracy of the electronic problem list. J Am Med Inform Assoc 2019; 25:548-554. [PMID: 29360995 DOI: 10.1093/jamia/ocx154] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 08/26/2017] [Accepted: 12/20/2017] [Indexed: 12/11/2022]
Abstract
Objective Problem-based charting (PBC) is a method for clinician documentation in commercially available electronic medical record systems that integrates note writing and problem list management. We report the effect of PBC on problem list utilization and accuracy at an academic intensive care unit (ICU). Materials and Methods An interrupted time series design was used to assess the effect of PBC on problem list utilization, defined as the number of new problems added to the problem list by clinicians per patient encounter, and on problem list accuracy, determined by calculating the recall and precision of the problem list in capturing 5 common ICU diagnoses. Results In total, 3650 and 4344 patient records were identified before and after PBC implementation, respectively, at Stanford Hospital. An increase of 2.18 problems (>50% increase) in the mean number of new problems added to the problem list per patient encounter can be attributed to the initiation of PBC. There was a significant increase in recall attributed to the initiation of PBC for sepsis (β = 0.45, P < .001) and acute renal failure (β = 0.2, P = .007), but not for acute respiratory failure, pneumonia, or venous thromboembolism. Discussion The problem list is an underutilized component of the electronic medical record that can be a source of clinician-structured data representing the patient's clinical condition in real time. PBC is a readily available tool that can integrate problem list management into physician workflow. Conclusion PBC improved problem list utilization and accuracy at an academic ICU.
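Problem list accuracy here is measured per diagnosis as recall and precision. A minimal sketch for a single diagnosis, treating both the problem list and a reference standard as sets of encounter IDs. The abstract does not say what reference standard was used, so `reference` is an assumption, and the encounter IDs below are hypothetical.

```python
def recall_precision(on_problem_list: set[str], reference: set[str]) -> tuple[float, float]:
    """Recall and precision of the problem list for one diagnosis.

    on_problem_list: encounter IDs whose problem list contains the diagnosis.
    reference:       encounter IDs that truly have the diagnosis according to
                     some reference standard (assumed; not specified above).
    """
    true_pos = len(on_problem_list & reference)
    # Recall: share of truly affected encounters that the problem list captured.
    recall = true_pos / len(reference) if reference else float("nan")
    # Precision: share of problem-list entries confirmed by the reference.
    precision = true_pos / len(on_problem_list) if on_problem_list else float("nan")
    return recall, precision

# Hypothetical encounter IDs for a single diagnosis such as sepsis.
ref = {"e1", "e2", "e3", "e4"}
plist = {"e1", "e2", "e5"}
print(recall_precision(plist, ref))  # (0.5, 0.6666666666666666)
```

Running this once per diagnosis (sepsis, acute renal failure, and so on) for each time period would give the kind of per-diagnosis recall series that an interrupted time series analysis could then model.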
Affiliation(s)
- Ron C Li
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Trit Garg
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Tony Cun
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Lisa Shieh
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Gomathi Krishnan
- IRT Research Technology, Stanford University School of Medicine, Stanford, CA, USA
- Daniel Fang
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Jonathan H Chen
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
14
Abstract
Electronic health records (EHRs) are a rich repository of valuable clinical information held in primary and secondary care databases. To use EHRs for observational medical research, a range of algorithms for automatically identifying individuals with a specific phenotype have been developed. This review summarizes and critically evaluates the literature on the development of EHR phenotyping systems, describing systems and techniques based on both structured and unstructured EHR data. Articles published on PubMed and Google Scholar between 2013 and 2017 were reviewed, using search terms derived from Medical Subject Headings (MeSH). The use of natural language processing (NLP) techniques to extract features from narrative text has grown in popularity, owing to the availability of open-source NLP algorithms combined with improvements in accuracy. Concept extraction was the most popular NLP technique in this review, used by more than 50% of the reviewed papers to extract features from EHRs. High-throughput phenotyping systems using unsupervised machine learning techniques have also gained popularity because they can extract a phenotype efficiently and automatically with minimal human effort.
15
Kagawa R, Shinohara E, Imai T, Kawazoe Y, Ohe K. Bias of Inaccurate Disease Mentions in Electronic Health Record-based Phenotyping. Int J Med Inform 2019; 124:90-96. [PMID: 30784432 DOI: 10.1016/j.ijmedinf.2018.12.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/07/2018] [Revised: 11/13/2018] [Accepted: 12/12/2018] [Indexed: 01/21/2023]
Abstract
OBJECTIVES Electronic health record (EHR)-based phenotyping is an automated technique for identifying patients diagnosed with a particular disease using EHR data. However, EHR-based phenotyping has difficulty achieving satisfactorily high performance because clinical notes include disease mentions that ultimately signify something other than the patient's diagnosis (such as a differential diagnosis or screening). Our objective is to quantify the influence of such disease mentions on EHR-based phenotyping performance. METHODS Physicians manually reviewed whether the disease mentions indicated the patients' diseases in 487,300 clinical notes of 4,430 patients. Particular focus was placed on disease mentions that did not signify the patient's diagnosis even though they had no syntactic modifier or indicator in the same sentence. Patients were then classified according to whether their clinical notes included such disease mentions. RESULTS Among the patients whose clinical notes included disease mentions without any modifier or indicator, the proportion of patients whose disease mentions signified the patients' diagnosis was 78.1% on average. This value can be interpreted as the bias that disease mentions not signifying the patient's diagnosis introduce into the precision of EHR-based phenotyping that extracts disease mentions from clinical notes. CONCLUSION This study quantified, across four dataset types, the bias in the precision of EHR-based phenotyping arising from disease mentions that incorrectly signify a patient's diagnosis. The results of this study will help researchers in diverse research environments with different available data types.
Affiliation(s)
- Rina Kagawa
- Department of Medical Informatics, Strategic Planning, and Management, University of Tsukuba Hospital, Japan; Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Japan.
- Emiko Shinohara
- Department of Artificial Intelligence in Healthcare, Graduate School of Medicine, The University of Tokyo, Japan
- Takeshi Imai
- Center for Disease Biology and Integrative Medicine, Graduate School of Medicine, The University of Tokyo, Japan
- Yoshimasa Kawazoe
- Department of Artificial Intelligence in Healthcare, Graduate School of Medicine, The University of Tokyo, Japan
- Kazuhiko Ohe
- Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Japan
16
Henderson J, He H, Malin BA, Denny JC, Kho AN, Ghosh J, Ho JC. Phenotyping through Semi-Supervised Tensor Factorization (PSST). AMIA Annu Symp Proc 2018; 2018:564-573. [PMID: 30815097 PMCID: PMC6371355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
A computational phenotype is a set of clinically relevant and interesting characteristics that describe patients with a given condition. Various machine learning methods have been proposed to derive phenotypes in an automatic, high-throughput manner. Among these methods, computational phenotyping through tensor factorization has been shown to produce clinically interesting phenotypes. However, few of these methods incorporate auxiliary patient information into the phenotype derivation process. In this work, we introduce Phenotyping through Semi-Supervised Tensor Factorization (PSST), a method that leverages knowledge of disease status for subsets of patients to generate computational phenotypes from tensors constructed from the electronic health records of patients. We demonstrate the potential of PSST to uncover predictive and clinically interesting computational phenotypes through case studies focusing on type 2 diabetes and resistant hypertension. PSST yields more discriminative phenotypes than unsupervised methods and more meaningful phenotypes than a supervised method.
Affiliation(s)
- Huan He
- Emory University, Atlanta, GA
17
Pacheco JA, Rasmussen LV, Kiefer RC, Campion TR, Speltz P, Carroll RJ, Stallings SC, Mo H, Ahuja M, Jiang G, LaRose ER, Peissig PL, Shang N, Benoit B, Gainer VS, Borthwick K, Jackson KL, Sharma A, Wu AY, Kho AN, Roden DM, Pathak J, Denny JC, Thompson WK. A case study evaluating the portability of an executable computable phenotype algorithm across multiple institutions and electronic health record environments. J Am Med Inform Assoc 2018; 25:1540-1546. [PMID: 30124903 PMCID: PMC6213083 DOI: 10.1093/jamia/ocy101] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 05/04/2018] [Revised: 06/13/2018] [Accepted: 07/10/2018] [Indexed: 12/12/2022]
Abstract
Electronic health record (EHR) algorithms for defining patient cohorts are commonly shared as free-text descriptions that require human intervention both to interpret and to implement. We developed the Phenotype Execution and Modeling Architecture (PhEMA, http://projectphema.org) to author and execute standardized computable phenotype algorithms. With PhEMA, we converted an algorithm for benign prostatic hyperplasia, developed for the electronic Medical Records and Genomics network (eMERGE), into a standards-based computable format. Eight sites (7 within eMERGE) received the computable algorithm, and 6 successfully executed it against local data warehouses and/or i2b2 instances. Blinded random chart review of cases selected by the computable algorithm showed a PPV ≥90%, and 3 of 5 sites had >90% overlap of selected cases when comparing the computable algorithm to their original eMERGE implementation. This case study demonstrates the potential of PhEMA computable representations to automate phenotyping across different EHR systems, but also highlights some ongoing challenges.
Affiliation(s)
- Jennifer A Pacheco
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Luke V Rasmussen
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Richard C Kiefer
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Thomas R Campion
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Peter Speltz
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Robert J Carroll
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Sarah C Stallings
- Meharry-Vanderbilt Alliance, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Huan Mo
- Department of Pathology, Loma Linda University Health, Loma Linda, California, USA
- Monika Ahuja
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Guoqian Jiang
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Eric R LaRose
- Department of Biomedical Informatics, Marshfield Clinic Research Institute, Marshfield, Wisconsin, USA
- Peggy L Peissig
- Department of Biomedical Informatics, Marshfield Clinic Research Institute, Marshfield, Wisconsin, USA
- Ning Shang
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Barbara Benoit
- Research IS and Computing, Partners HealthCare, Harvard University, Somerville, Massachusetts, USA
- Vivian S Gainer
- Research IS and Computing, Partners HealthCare, Harvard University, Somerville, Massachusetts, USA
- Kenneth Borthwick
- Henry Hood Center for Health Research, Geisinger, Danville, Pennsylvania, USA
- Kathryn L Jackson
- Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Ambrish Sharma
- Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Andy Yizhou Wu
- Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Abel N Kho
- Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Dan M Roden
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jyotishman Pathak
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- William K Thompson
- Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
18
Ernecoff NC, Wessell KL, Gabriel S, Carey TS, Hanson LC. A Novel Screening Method to Identify Late-Stage Dementia Patients for Palliative Care Research and Practice. J Pain Symptom Manage 2018; 55:1152-1158.e1. [PMID: 29288881 PMCID: PMC6036617 DOI: 10.1016/j.jpainsymman.2017.12.480] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 11/06/2017] [Revised: 12/14/2017] [Accepted: 12/18/2017] [Indexed: 12/19/2022]
Abstract
CONTEXT Investigators need novel methods for timely identification of patients with serious illness to test or implement new palliative care models. OBJECTIVES The study's aim was to develop an electronic health record (EHR) phenotype to identify patients with late-stage dementia for a clinical trial of palliative care consultation. METHODS We developed a computerized method to identify patients with dementia on hospital admission. Within a data warehouse derived from the hospital's EHR, we used age, admission date, and ICD-9 and ICD-10 diagnosis codes as search criteria to create an EHR dementia phenotype, followed by brief medical record review to confirm late-stage dementia. We calculated the positive predictive value, false discovery rate, and false negative rate of this novel screening method. RESULTS Compared with physician assessment, the EHR phenotype screening method had a positive predictive value of 76.3% for dementia patients and 24.5% for late-stage dementia patients, and a false discovery rate of 23.7% for dementia patients and 75.5% for late-stage dementia patients. The sensitivity of this screening method was 59.7% for identifying hospitalized patients with dementia. Daily screening, including confirmatory chart review, averaged 20 minutes and was more feasible, efficient, and complete than manual screening. CONCLUSION A novel method using an EHR phenotype plus brief medical record review is effective for identifying hospitalized patients with late-stage dementia. In health care systems with similar clinical data warehouses, this method may be applied to serious illness populations to improve enrollment in clinical trials of palliative care or to facilitate access to palliative care services.
Affiliation(s)
- Natalie C Ernecoff
- Cecil G. Sheps Center for Health Services Research, University of North Carolina, Chapel Hill, North Carolina, USA; Department of Health Policy and Management, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, USA.
- Kathryn L Wessell
- Cecil G. Sheps Center for Health Services Research, University of North Carolina, Chapel Hill, North Carolina, USA
- Stacey Gabriel
- Cecil G. Sheps Center for Health Services Research, University of North Carolina, Chapel Hill, North Carolina, USA
- Timothy S Carey
- Cecil G. Sheps Center for Health Services Research, University of North Carolina, Chapel Hill, North Carolina, USA; Departments of Medicine and Social Medicine, School of Medicine, University of North Carolina, Chapel Hill, North Carolina, USA
- Laura C Hanson
- Cecil G. Sheps Center for Health Services Research, University of North Carolina, Chapel Hill, North Carolina, USA; Division of Geriatric Medicine & Palliative Care Program, University of North Carolina, Chapel Hill, North Carolina, USA
19
Koola JD, Davis SE, Al-Nimri O, Parr SK, Fabbri D, Malin BA, Ho SB, Matheny ME. Development of an automated phenotyping algorithm for hepatorenal syndrome. J Biomed Inform 2018; 80:87-95. [PMID: 29530803 PMCID: PMC5920557 DOI: 10.1016/j.jbi.2018.03.001] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 07/26/2017] [Revised: 02/21/2018] [Accepted: 03/07/2018] [Indexed: 12/27/2022]
Abstract
OBJECTIVE Hepatorenal Syndrome (HRS) is a devastating form of acute kidney injury (AKI) in advanced liver disease patients with high morbidity and mortality, but phenotyping algorithms have not yet been developed using large electronic health record (EHR) databases. We evaluated and compared multiple phenotyping methods to achieve an accurate algorithm for HRS identification. MATERIALS AND METHODS A national retrospective cohort of patients with cirrhosis and AKI admitted to 124 Veterans Affairs hospitals was assembled from electronic health record data collected from 2005 to 2013. AKI was defined by the Kidney Disease: Improving Global Outcomes criteria. Five hundred and four hospitalizations were selected for manual chart review and served as the gold standard. Electronic Health Record based predictors were identified using structured and free text clinical data, subjected through NLP from the clinical Text Analysis Knowledge Extraction System. We explored several dimension reduction techniques for the NLP data, including newer high-throughput phenotyping and word embedding methods, and ascertained their effectiveness in identifying the phenotype without structured predictor variables. With the combined structured and NLP variables, we analyzed five phenotyping algorithms: penalized logistic regression, naïve Bayes, support vector machines, random forest, and gradient boosting. Calibration and discrimination metrics were calculated using 100 bootstrap iterations. In the final model, we report odds ratios and 95% confidence intervals. RESULTS The area under the receiver operating characteristic curve (AUC) for the different models ranged from 0.73 to 0.93; with penalized logistic regression having the best discriminatory performance. Calibration for logistic regression was modest, but gradient boosting and support vector machines were superior. 
NLP identified 6,985 variables; a priori variable selection performed similarly to dimensionality reduction using high-throughput phenotyping and semantic-similarity-informed clustering (AUC 0.81 to 0.82). CONCLUSION This study demonstrated improved phenotyping of a challenging AKI etiology, HRS, compared with ICD-9 coding. We also compared multiple approaches to EHR-derived phenotyping and found similar performance across methods. Lastly, we showed that automated NLP dimension reduction is viable for acute illness.
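The abstract states that calibration and discrimination were computed over 100 bootstrap iterations but does not describe the resampling details. The Python sketch below shows one common way to bootstrap a discrimination metric, assuming a rank-based (Mann-Whitney) AUC and a simple percentile interval. The function names and toy data are hypothetical, and this is not the authors' code.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: probability that a random positive case scores above a
    random negative case, counting ties as one half (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels: Sequence[int], scores: Sequence[float],
                  n_iter: int = 100, seed: int = 0) -> List[float]:
    """Resample cases with replacement n_iter times and return each resample's AUC.
    Resamples that contain only one class are redrawn."""
    rng = random.Random(seed)
    n = len(labels)
    aucs: List[float] = []
    while len(aucs) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # both classes present
            aucs.append(auc(ys, [scores[i] for i in idx]))
    return aucs


def percentile_interval(values: Sequence[float], alpha: float = 0.05) -> Tuple[float, float]:
    """Crude percentile interval: the order statistics nearest alpha/2 and 1 - alpha/2."""
    v = sorted(values)
    last = len(v) - 1
    return v[round(alpha / 2 * last)], v[round((1 - alpha / 2) * last)]


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
    print(auc(labels, scores))  # 8 of 9 positive-negative pairs are ordered correctly
    print(percentile_interval(bootstrap_auc(labels, scores, n_iter=100, seed=1)))
```

With so few cases the interval is very wide; in practice the same loop would run over the chart-reviewed cohort, and optimism-corrected bootstrap variants are also common.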
Affiliation(s)
- Jejo D Koola
- Geriatric Research Education and Clinical Center (GRECC), Tennessee Valley Healthcare System Veterans Administration Medical Center, Nashville, TN, USA; Division of Biomedical Informatics, Department of Medicine, University of California, San Diego, CA, USA; Division of Hospital Medicine, Department of Medicine, University of California, San Diego, CA, USA.
- Sharon E Davis
- Geriatric Research Education and Clinical Center (GRECC), Tennessee Valley Healthcare System Veterans Administration Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Sharidan K Parr
- Geriatric Research Education and Clinical Center (GRECC), Tennessee Valley Healthcare System Veterans Administration Medical Center, Nashville, TN, USA; Division of Nephrology and Hypertension, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Daniel Fabbri
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
- Bradley A Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
- Samuel B Ho
- VA San Diego Healthcare System, San Diego, CA, USA; Division of Gastroenterology, Department of Medicine, University of California, San Diego, CA, USA
- Michael E Matheny
- Geriatric Research Education and Clinical Center (GRECC), Tennessee Valley Healthcare System Veterans Administration Medical Center, Nashville, TN, USA; Division of General Internal Medicine and Public Health, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
20
Kennell TI, Willig JH, Cimino JJ. Clinical Informatics Researcher's Desiderata for the Data Content of the Next Generation Electronic Health Record. Appl Clin Inform 2017; 8:1159-1172. [PMID: 29270955 DOI: 10.4338/aci-2017-06-r-0101] [Indexed: 01/29/2023]
Abstract
OBJECTIVE Clinical informatics researchers depend on the availability of high-quality data from the electronic health record (EHR) to design and implement new methods and systems for clinical practice and research. However, these data are frequently unavailable or present in a format that requires substantial revision. This article reports the results of a review of informatics literature published from 2010 to 2016 that addresses these issues by identifying categories of data content that might be included or revised in the EHR. MATERIALS AND METHODS We used an iterative review process on 1,215 biomedical informatics research articles. We placed them into generic categories, reviewed and refined the categories, and then assigned additional articles, for a total of three iterations. RESULTS Our process identified eight categories of data content issues: Adverse Events, Clinician Cognitive Processes, Data Standards Creation and Data Communication, Genomics, Medication List Data Capture, Patient Preferences, Patient-reported Data, and Phenotyping. DISCUSSION These categories summarize discussions in biomedical informatics literature that concern data content issues restricting clinical informatics research. These barriers to research result from data that are either absent from the EHR or are inadequate (e.g., in narrative text form) for the downstream applications of the data. In light of these categories, we discuss changes to EHR data storage that should be considered in the redesign of EHRs, to promote continued innovation in clinical informatics. CONCLUSION Based on published literature of clinical informaticians' reuse of EHR data, we characterize eight types of data content that, if included in the next generation of EHRs, would find immediate application in advanced informatics tools and techniques.
Affiliation(s)
- Timothy I Kennell
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
- James H Willig
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States; Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
- James J Cimino
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States; Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
21
Kagawa R, Kawazoe Y, Ida Y, Shinohara E, Tanaka K, Imai T, Ohe K. Development of Type 2 Diabetes Mellitus Phenotyping Framework Using Expert Knowledge and Machine Learning Approach. J Diabetes Sci Technol 2017; 11:791-799. [PMID: 27932531 PMCID: PMC5588819 DOI: 10.1177/1932296816681584] [Indexed: 12/28/2022]
Abstract
BACKGROUND Phenotyping is an automated technique that can be used to distinguish patients based on electronic health records. To improve the quality of medical care and advance type 2 diabetes mellitus (T2DM) research, the demand for T2DM phenotyping has been increasing. Some existing phenotyping algorithms are not sufficiently accurate for screening or identifying clinical research subjects. OBJECTIVE We propose a practical phenotyping framework using both expert knowledge and a machine learning approach to develop 2 phenotyping algorithms: one is for screening; the other is for identifying research subjects. METHODS We employ expert knowledge as rules to exclude obvious control patients and machine learning to increase accuracy for complicated patients. We developed phenotyping algorithms on the basis of our framework and performed binary classification to determine whether a patient has T2DM. To facilitate development of practical phenotyping algorithms, this study introduces new evaluation metrics: area under the precision-sensitivity curve (AUPS) with a high sensitivity and AUPS with a high positive predictive value. RESULTS The proposed phenotyping algorithms based on our framework show higher performance than baseline algorithms. Our proposed framework can be used to develop 2 types of phenotyping algorithms depending on the tuning approach: one for screening, the other for identifying research subjects. CONCLUSIONS We develop a novel phenotyping framework that can be easily implemented on the basis of proper evaluation metrics, which are in accordance with users' objectives. The phenotyping algorithms based on our framework are useful for extraction of T2DM patients in retrospective studies.
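The framework described above uses expert rules to exclude obvious controls and machine learning for the remaining, more complicated patients, and it is tuned either for screening or for identifying research subjects. A minimal Python sketch of that two-stage shape follows. The features, the HbA1c rule, the scoring weights, and both thresholds are hypothetical, and `toy_score` is a stand-in for a trained model, not the classifier used in the paper.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Patient:
    # Hypothetical features for illustration; not the paper's actual inputs.
    has_t2dm_code: bool
    on_glucose_lowering_drug: bool
    max_hba1c: Optional[float] = None  # percent; None if never measured


def is_obvious_control(p: Patient) -> bool:
    """Stage 1 (hypothetical expert rule): no diabetes code, no glucose-lowering
    drug, and no HbA1c at or above 6.5% means the patient is treated as a control."""
    return (not p.has_t2dm_code
            and not p.on_glucose_lowering_drug
            and (p.max_hba1c is None or p.max_hba1c < 6.5))


def toy_score(p: Patient) -> float:
    """Stand-in for a trained classifier: logistic function with invented weights."""
    z = -2.0 + 1.5 * float(p.has_t2dm_code) + 1.5 * float(p.on_glucose_lowering_drug)
    if p.max_hba1c is not None:
        z += 0.8 * (p.max_hba1c - 6.5)
    return 1.0 / (1.0 + math.exp(-z))


def classify(p: Patient, score: Callable[[Patient], float], threshold: float) -> bool:
    """Two-stage phenotype: rules remove obvious controls, the model decides the rest."""
    if is_obvious_control(p):
        return False
    return score(p) >= threshold


SCREENING_THRESHOLD = 0.3  # lower cut-off favors sensitivity (hypothetical value)
RESEARCH_THRESHOLD = 0.8   # higher cut-off favors PPV (hypothetical value)

if __name__ == "__main__":
    borderline = Patient(has_t2dm_code=True, on_glucose_lowering_drug=False, max_hba1c=6.8)
    print(classify(borderline, toy_score, SCREENING_THRESHOLD))  # True, score is about 0.44
    print(classify(borderline, toy_score, RESEARCH_THRESHOLD))   # False
```

Only the decision threshold changes between the two uses; the paper's AUPS metrics would be one way to choose that threshold, and they are not implemented here.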
Affiliation(s)
- Rina Kagawa
- Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Yoshimasa Kawazoe
- Department of Healthcare Information Management, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Yusuke Ida
- Department of Healthcare Information Management, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Emiko Shinohara
- Department of Healthcare Information Management, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Katsuya Tanaka
- Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Takeshi Imai
- Center for Disease Biology and Integrative Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Kazuhiko Ohe
- Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Department of Healthcare Information Management, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
22
Williams R, Kontopantelis E, Buchan I, Peek N. Clinical code set engineering for reusing EHR data for research: A review. J Biomed Inform 2017; 70:1-13. [PMID: 28442434 DOI: 10.1016/j.jbi.2017.04.010] [Received: 01/23/2017] [Revised: 03/21/2017] [Accepted: 04/13/2017] [Indexed: 01/26/2023]
Abstract
INTRODUCTION The construction of reliable, reusable clinical code sets is essential when re-using electronic health record (EHR) data for research. Yet code set definitions are rarely transparent, and sharing them is almost non-existent. There is a lack of methodological standards for the management (construction, sharing, revision, and reuse) of clinical code sets, and this needs to be addressed to ensure the reliability and credibility of studies that use code sets. OBJECTIVE To review methodological literature on the management of sets of clinical codes used in research on clinical databases and to provide a list of best-practice recommendations for future studies and software tools. METHODS We performed an exhaustive search for methodological papers about clinical code set engineering for re-using EHR data in research. This was supplemented with papers identified by snowball sampling. In addition, a list of e-phenotyping systems was constructed by merging references from several systematic reviews on this topic, and the processes those systems adopted for code set management were reviewed. RESULTS Thirty methodological papers were reviewed. Common approaches included creating an initial list of synonyms for the condition of interest (n=20); making use of the hierarchical nature of coding terminologies during searching (n=23); reviewing sets with clinician input (n=20); and reusing and updating an existing code set (n=20). Three open-source software tools were identified. DISCUSSION There is a need for software tools that enable users to easily and quickly create, revise, extend, review, and share code sets, and we provide a list of recommendations for their design and implementation. CONCLUSION Research re-using EHR data could be improved through further development, more widespread use, and routine reporting of the methods by which clinical codes were selected.
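One of the common approaches reported in this review is using the hierarchical structure of coding terminologies during searching. A minimal Python sketch of that step follows, assuming the hierarchy is available as a parent-to-children mapping: it expands seed codes to all of their descendants and lets a reviewer prune whole subtrees. The hierarchy fragment and `expand_codes` are hypothetical illustrations, not one of the reviewed tools, and the fragment is not a validated code set.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Illustrative fragment shaped like ICD-10-CM asthma (J45) codes.
# It is deliberately incomplete and is NOT a validated code set.
HIERARCHY: Dict[str, List[str]] = {
    "J45": ["J45.2", "J45.9"],
    "J45.2": ["J45.20", "J45.21"],
    "J45.9": ["J45.90"],
    "J45.90": ["J45.901", "J45.909"],
}


def expand_codes(seeds: Iterable[str], hierarchy: Dict[str, List[str]],
                 exclude: Iterable[str] = ()) -> Set[str]:
    """Return the seed codes plus all of their descendants (breadth-first).
    In a tree hierarchy, excluded codes are skipped together with their subtrees."""
    excluded = set(exclude)
    result: Set[str] = set()
    queue = deque(seeds)
    while queue:
        code = queue.popleft()
        if code in excluded or code in result:
            continue
        result.add(code)
        queue.extend(hierarchy.get(code, []))
    return result


if __name__ == "__main__":
    code_set = expand_codes(["J45"], HIERARCHY, exclude=["J45.9"])
    print(sorted(code_set))  # ['J45', 'J45.2', 'J45.20', 'J45.21']
```

Storing the seed list, the exclusions, and the terminology version alongside the expanded set is one way to support the routine reporting of code selection methods that the review recommends.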
Affiliation(s)
- Richard Williams
- MRC Health eResearch Centre, University of Manchester, Manchester, UK; NIHR Greater Manchester Primary Care Patient Safety Translational Research Centre, University of Manchester, Manchester, UK.
- Evangelos Kontopantelis
- MRC Health eResearch Centre, University of Manchester, Manchester, UK; NIHR School for Primary Care Research, University of Manchester, Manchester, UK
- Iain Buchan
- MRC Health eResearch Centre, University of Manchester, Manchester, UK; NIHR Greater Manchester Primary Care Patient Safety Translational Research Centre, University of Manchester, Manchester, UK; NIHR Manchester Biomedical Research Centre, University of Manchester, Manchester, UK
- Niels Peek
- MRC Health eResearch Centre, University of Manchester, Manchester, UK; NIHR Greater Manchester Primary Care Patient Safety Translational Research Centre, University of Manchester, Manchester, UK
23
Hochheiser H, Castine M, Harris D, Savova G, Jacobson RS. An information model for computable cancer phenotypes. BMC Med Inform Decis Mak 2016; 16:121. [PMID: 27629872 PMCID: PMC5024416 DOI: 10.1186/s12911-016-0358-4] [Received: 06/06/2016] [Accepted: 09/01/2016] [Indexed: 12/27/2022]
Abstract
BACKGROUND Standards, methods, and tools supporting the integration of clinical data and genomic information are an area of significant need and rapid growth in biomedical informatics. Integration of cancer clinical data and cancer genomic information poses unique challenges because of the high volume and complexity of clinical data, as well as the heterogeneity and instability of cancer genome data when compared with germline data. Current information models of clinical and genomic data are not sufficiently expressive to represent individual observations and to aggregate those observations into longitudinal summaries over the course of cancer care. Such models are acutely needed to support the development of systems and tools for generating the so-called clinical "deep phenotype" of individual cancer patients, a process that remains almost entirely manual in cancer research and precision medicine. METHODS Reviews of existing ontologies and interviews with cancer researchers were used to inform iterative development of a cancer phenotype information model. We translated a subset of the Fast Healthcare Interoperability Resources (FHIR) models into the OWL 2 Description Logic (DL) representation, and added extensions as needed for modeling cancer phenotypes with terms derived from the NCI Thesaurus. Models were validated with domain experts and evaluated against competency questions. RESULTS The DeepPhe information model represents cancer phenotype data at increasing levels of abstraction, from the mention level in clinical documents to summaries of key events and findings. We describe the model using breast cancer as an example, depicting methods to represent phenotypic features of cancers, tumors, treatment regimens, and specific biologic behaviors that span the entire course of a patient's disease.
CONCLUSIONS We present a multi-scale information model for representing individual document mentions, document level classifications, episodes along a disease course, and phenotype summarization, linking individual observations to high-level summaries in support of subsequent integration and analysis.
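The DeepPhe model links individual document mentions to document-level classifications, episodes, and patient-level summaries, and it is expressed in OWL 2 with FHIR-derived classes. The plain-Python sketch below only illustrates the general idea of rolling mention-level observations up to a summary. It skips the episode level, and the `Mention` schema, the concept names, and the "most recent value wins" rule are simplifying assumptions, not part of the published model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

DocKey = Tuple[str, str]  # (document date as ISO string, document id)


@dataclass(frozen=True)
class Mention:
    """A single concept mention extracted from one clinical document (hypothetical schema)."""
    doc_id: str
    doc_date: str  # ISO 8601 date, so string order matches date order
    concept: str
    value: str
    negated: bool = False


def document_level(mentions: Iterable[Mention]) -> Dict[DocKey, Dict[str, str]]:
    """Collapse mentions into per-document findings, dropping negated mentions.
    If a concept is mentioned twice in one document, the later mention wins."""
    docs: Dict[DocKey, Dict[str, str]] = defaultdict(dict)
    for m in mentions:
        if not m.negated:
            docs[(m.doc_date, m.doc_id)][m.concept] = m.value
    return dict(docs)


def patient_summary(doc_findings: Dict[DocKey, Dict[str, str]]) -> Dict[str, str]:
    """Summarize across documents in date order: the most recent value of each concept wins."""
    summary: Dict[str, str] = {}
    for key in sorted(doc_findings):
        summary.update(doc_findings[key])
    return summary


if __name__ == "__main__":
    notes = [
        Mention("n1", "2015-01-10", "laterality", "left"),
        Mention("n1", "2015-01-10", "ER_status", "positive"),
        Mention("n2", "2015-02-01", "metastasis", "present", negated=True),
        Mention("n2", "2015-02-01", "tumor_size_cm", "1.8"),
        Mention("n3", "2015-03-05", "tumor_size_cm", "2.1"),
    ]
    print(patient_summary(document_level(notes)))
    # {'laterality': 'left', 'ER_status': 'positive', 'tumor_size_cm': '2.1'}
```

Keeping the document-level dictionary alongside the summary preserves the link from each summarized value back to the notes it came from, which is the kind of traceability the multi-scale model is designed to support.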
Affiliation(s)
- Harry Hochheiser
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Rm 523, Pittsburgh, 15206-3701, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA.
- Melissa Castine
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Rm 523, Pittsburgh, 15206-3701, PA, USA
- David Harris
- Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Guergana Savova
- Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Rebecca S Jacobson
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Boulevard, Rm 523, Pittsburgh, 15206-3701, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA; University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA
24
Richesson RL, Smerek MM, Blake Cameron C. A Framework to Support the Sharing and Reuse of Computable Phenotype Definitions Across Health Care Delivery and Clinical Research Applications. EGEMS 2016; 4:1232. [PMID: 27563686 PMCID: PMC4975566 DOI: 10.13063/2327-9214.1232] [Indexed: 11/29/2022]
Abstract
Introduction: The ability to reproducibly identify clinically equivalent patient populations is critical to the vision of learning health care systems that implement and evaluate evidence-based treatments. The use of common or semantically equivalent phenotype definitions across research and health care use cases will support this aim. Currently, there is no single consolidated repository for computable phenotype definitions, making it difficult to find all definitions that already exist, and also hindering the sharing of definitions between user groups. Method: Drawing from our experience in an academic medical center that supports a number of multisite research projects and quality improvement studies, we articulate a framework that will support the sharing of phenotype definitions across research and health care use cases, and highlight gaps and areas that need attention and collaborative solutions. Framework: An infrastructure for re-using computable phenotype definitions and sharing experience across health care delivery and clinical research applications includes: access to a collection of existing phenotype definitions, information to evaluate their appropriateness for particular applications, a knowledge base of implementation guidance, supporting tools that are user-friendly and intuitive, and a willingness to use them. Next Steps: We encourage prospective researchers and health administrators to re-use existing EHR-based condition definitions where appropriate and share their results with others to support a national culture of learning health care. There are a number of federally funded resources to support these activities, and research sponsors should encourage their use.