1
Somani S, Yoffie S, Teng S, Havaldar S, Nadkarni GN, Zhao S, Glicksberg BS. Development and validation of techniques for phenotyping ST-elevation myocardial infarction encounters from electronic health records. JAMIA Open 2021; 4:ooab068. [PMID: 34423260 PMCID: PMC8374370 DOI: 10.1093/jamiaopen/ooab068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2021] [Revised: 06/07/2021] [Accepted: 07/29/2021] [Indexed: 11/12/2022] Open
Abstract
Objectives Classifying hospital admissions into various acute myocardial infarction phenotypes in electronic health records (EHRs) is a challenging task with strong research implications that remains unsolved. To our knowledge, this is the first study to design and validate phenotyping algorithms that use cardiac catheterizations to identify not only patients with an ST-elevation myocardial infarction (STEMI), but also the specific encounter in which it occurred. Materials and Methods We design and validate multi-modal algorithms to phenotype STEMI on a multicenter EHR containing 5.1 million patients and 115 million patient encounters by using discharge summaries, diagnosis codes, electrocardiography readings, and the presence of cardiac catheterizations on the encounter. Results We demonstrate that robustly phenotyping STEMIs by selecting discharge summaries containing “STEM” has the potential to capture the largest number of STEMIs (positive predictive value [PPV] = 0.36, N = 2110), but that adding a STEMI-related International Classification of Disease (ICD) code and cardiac catheterizations to these summaries yields the highest precision (PPV = 0.94, N = 952). Discussion and Conclusion In this study, we demonstrate that the incorporation of percutaneous coronary intervention increases the PPV for detecting STEMI-related patient encounters from the EHR.
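The trade-off described in this abstract, a broad text rule versus a stricter rule that also requires an ICD code and a catheterization, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Encounter` fields, the two rule functions, and the toy records are all hypothetical, and the chart-review label stands in for whatever gold standard a study would use.

```python
from dataclasses import dataclass


@dataclass
class Encounter:
    """Hypothetical per-encounter flags; field names are illustrative only."""
    summary_mentions_stem: bool  # discharge summary contains "STEM"
    has_stemi_icd: bool          # a STEMI-related ICD code is present
    has_cath: bool               # a cardiac catheterization was recorded
    is_stemi: bool               # gold-standard label, e.g. from chart review


def broad_rule(e):
    """Flag any encounter whose summary mentions "STEM"."""
    return e.summary_mentions_stem


def strict_rule(e):
    """Additionally require an ICD code and a catheterization."""
    return e.summary_mentions_stem and e.has_stemi_icd and e.has_cath


def ppv(encounters, rule):
    """Positive predictive value: true positives / all flagged encounters."""
    flagged = [e for e in encounters if rule(e)]
    if not flagged:
        return None  # PPV is undefined when nothing is flagged
    return sum(e.is_stemi for e in flagged) / len(flagged)


# Made-up toy data, not taken from the study.
TOY = [
    Encounter(True, True, True, True),
    Encounter(True, True, True, True),
    Encounter(True, False, False, False),
    Encounter(True, True, False, False),
    Encounter(False, False, True, False),
]

print(ppv(TOY, broad_rule), ppv(TOY, strict_rule))
```

On this toy data the stricter rule flags fewer encounters but a larger share of them are true STEMIs, which is the same direction of effect the abstract reports.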
Affiliation(s)
- Sulaiman Somani
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Stephen Yoffie
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Shelly Teng
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Shreyas Havaldar
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Girish N Nadkarni
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Shan Zhao
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Benjamin S Glicksberg
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
2
Broseta JJ. Different approaches to improve cohort identification using electronic health records: X-linked hypophosphatemia as an example. Intractable Rare Dis Res 2021; 10:17-22. [PMID: 33614371 PMCID: PMC7882088 DOI: 10.5582/irdr.2020.03123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Abstract
Electronic Health Records (EHRs) represent a source of high-value data that is often underutilized because exploiting the information they contain requires specialized techniques unavailable to the end user, i.e., the physician or the investigator. Here I describe four simple and practical avenues that allow the standard EHR end user to identify patient cohorts: the use of diagnostic codes from different international catalogues; a search in reports from complementary tests (e.g. radiographs or lab tests) for any result of interest; a free text search; or a drug prescription search in the patient's electronic prescription record. This approach is acquiring great importance in the field of rare diseases, and here I demonstrate its application with X-linked hypophosphatemia. Using these four EHR querying approaches makes finding a cohort of patients with any condition or disease feasible and manageable, and once each case record is checked, a well-defined cohort can be assembled.
Affiliation(s)
- Jose Jesus Broseta
- Department of Nephrology and Renal Transplantation, Hospital Clínic of Barcelona, Barcelona, Spain
3
Rodrigues-Jr JF, Gutierrez MA, Spadon G, Brandoli B, Amer-Yahia S. LIG-Doctor: Efficient patient trajectory prediction using bidirectional minimal gated-recurrent networks. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.09.024] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/26/2022]
4
Chen L, Gu Y, Ji X, Lou C, Sun Z, Li H, Gao Y, Huang Y. Clinical trial cohort selection based on multi-level rule-based natural language processing system. J Am Med Inform Assoc 2019; 26:1218-1226. [PMID: 31300825 DOI: 10.1093/jamia/ocz109] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 01/16/2019] [Revised: 04/16/2019] [Accepted: 06/07/2019] [Indexed: 01/08/2023] Open
Abstract
OBJECTIVE Identifying patients who meet selection criteria for clinical trials is typically challenging and time-consuming. In this article, we describe our clinical natural language processing (NLP) system to automatically assess patients' eligibility based on their longitudinal medical records. This work was part of the 2018 National NLP Clinical Challenges (n2c2) Shared-Task and Workshop on Cohort Selection for Clinical Trials. MATERIALS AND METHODS The authors developed an integrated rule-based clinical NLP system which employs a generic rule-based framework plugged in with lexical-, syntactic- and meta-level, task-specific knowledge inputs. In addition, the authors also implemented and evaluated a general clinical NLP (cNLP) system which is built with the Unified Medical Language System and Unstructured Information Management Architecture. RESULTS AND DISCUSSION The systems were evaluated as part of the 2018 n2c2-1 challenge, and authors' rule-based system obtained an F-measure of 0.9028, ranking fourth at the challenge and had less than 1% difference from the best system. While the general cNLP system didn't achieve performance as good as the rule-based system, it did establish its own advantages and potential in extracting clinical concepts. CONCLUSION Our results indicate that a well-designed rule-based clinical NLP system is capable of achieving good performance on cohort selection even with a small training data set. In addition, the investigation of a Unified Medical Language System-based general cNLP system suggests that a hybrid system combining these 2 approaches is promising to surpass the state-of-the-art performance.
Affiliation(s)
- Long Chen
- Med Data Quest, Inc, La Jolla, California, USA
- Yu Gu
- Med Data Quest, Inc, La Jolla, California, USA
- Xin Ji
- Med Data Quest, Inc, La Jolla, California, USA
- Chao Lou
- Med Data Quest, Inc, La Jolla, California, USA
- Zhiyong Sun
- Med Data Quest, Inc, La Jolla, California, USA
- Haodan Li
- Med Data Quest, Inc, La Jolla, California, USA
- Yuan Gao
- Med Data Quest, Inc, La Jolla, California, USA
- Yang Huang
- Med Data Quest, Inc, La Jolla, California, USA
5
CardioResyncApp: Un aplicativo móvil para recolectar datos de investigación en Cardiología [CardioResyncApp: A mobile application for collecting research data in cardiology]. REVISTA COLOMBIANA DE CARDIOLOGÍA 2020. [DOI: 10.1016/j.rccar.2020.01.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022] Open
6
Hassanzadeh H, Karimi S, Nguyen A. Matching patients to clinical trials using semantically enriched document representation. J Biomed Inform 2020; 105:103406. [DOI: 10.1016/j.jbi.2020.103406] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/28/2019] [Revised: 01/28/2020] [Accepted: 03/02/2020] [Indexed: 12/16/2022]
7
Electronic health records for the diagnosis of rare diseases. Kidney Int 2020; 97:676-686. [DOI: 10.1016/j.kint.2019.11.037] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 04/15/2019] [Revised: 11/15/2019] [Accepted: 11/22/2019] [Indexed: 01/13/2023]
8
Al-Shammari A, Zhou R, Naseriparsaa M, Liu C. An effective density-based clustering and dynamic maintenance framework for evolving medical data streams. Int J Med Inform 2019; 126:176-186. [PMID: 31029259 DOI: 10.1016/j.ijmedinf.2019.03.016] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/04/2018] [Revised: 02/12/2019] [Accepted: 03/26/2019] [Indexed: 10/27/2022]
Abstract
BACKGROUND Medical data stream clustering has become an integral part of medical decision systems since it extracts highly sensitive information from a tremendous flow of medical data. However, clustering and maintaining medical data streams is still a challenging task, because the evolving nature of medical data streams imposes various requirements on clustering, such as the ability to discover clusters of arbitrary shape, the ability to group data streams without a predefined number of clusters, and the ability to maintain the data clusters dynamically. OBJECTIVE To support online medical decisions, there is a need to address these clustering challenges. Therefore, in this paper, we propose an effective density-based clustering and dynamic maintenance framework for grouping patients with similar symptoms into meaningful clusters and monitoring the patients' status frequently. METHODS For clustering, we generate a set of initial medical data clusters using an algorithm that combines Piece-wise Aggregate Approximation with density-based spatial clustering of applications with noise (PAA+DBSCAN). For maintenance, when new medical data streams arrive, we maintain the initially generated medical data clusters dynamically. Since incremental cluster maintenance is time-consuming, we further propose an Advanced Cluster Maintenance (ACM) approach to improve the performance of the dynamic cluster maintenance. RESULTS The experimental results on real-world medical datasets demonstrate the effectiveness and efficiency of our proposed approaches. The PAA+DBSCAN algorithm is more efficient and effective than the exact DBSCAN algorithm. Moreover, the ACM approach requires less running time than the Baseline Cluster Maintenance (BCM) approach across different tuning parameter values in all datasets, because the BCM approach tracks all the data points in the cluster. CONCLUSION The proposed framework is capable of clustering and maintaining medical data streams effectively by grouping patients who share similar symptoms and tracking the patients' status, which naturally tends to change over time.
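For readers unfamiliar with the dimensionality-reduction step named in the methods, the following Python sketch shows textbook Piecewise Aggregate Approximation: a series is cut into consecutive segments and each segment is replaced by its mean. It is an illustration only, not the authors' implementation; the function name `paa` and the way uneven lengths are split are assumptions, and the DBSCAN clustering and cluster-maintenance steps are omitted.

```python
def paa(series, n_segments):
    """Reduce `series` to `n_segments` values by averaging consecutive chunks.

    Chunk boundaries use integer arithmetic, so lengths that are not a
    multiple of `n_segments` give chunks differing in size by at most one
    (a simplification; other PAA variants weight boundary points instead).
    """
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    reduced = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        chunk = series[start:end]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


# Six readings compressed to three segment means.
print(paa([1, 2, 3, 4, 5, 6], 3))
```

The reduced vectors are what a density-based clusterer would then operate on, which is where the efficiency gain over clustering the raw streams would come from.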
Affiliation(s)
- Ahmed Al-Shammari
- Department of Computer Science and Software Engineering, Faculty of Science Engineering and Technology, Swinburne University of Technology, Melbourne, Australia; University of Al-Qadisiyah, Al Diwaniyah, Iraq
- Rui Zhou
- Department of Computer Science and Software Engineering, Faculty of Science Engineering and Technology, Swinburne University of Technology, Melbourne, Australia
- Mehdi Naseriparsaa
- Department of Computer Science and Software Engineering, Faculty of Science Engineering and Technology, Swinburne University of Technology, Melbourne, Australia
- Chengfei Liu
- Department of Computer Science and Software Engineering, Faculty of Science Engineering and Technology, Swinburne University of Technology, Melbourne, Australia
9
Experiences of Transforming a Complex Nephrologic Care and Research Database into i2b2 Using the IDRT Tools. JOURNAL OF HEALTHCARE ENGINEERING 2019; 2019:5640685. [PMID: 30800257 PMCID: PMC6360056 DOI: 10.1155/2019/5640685] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/07/2018] [Revised: 09/18/2018] [Accepted: 12/05/2018] [Indexed: 01/15/2023]
Abstract
The secondary use of data from electronic medical records has become an important factor to determine and to identify various causes of disease. For this reason, applications like informatics for integrating biology and the bedside (i2b2) offer a GUI-based front end to select patient cohorts. To make use of those tools, however, clinical data need to be extracted from the Electronic Health Record (EHR) system and integrated into the data schema of i2b2. We used TBase, a documentation system for nephrologic transplantations, as a source system and applied the Integrated Data Repository Toolkit (IDRT) for the Extract, Transform, and Load (ETL) process to load the data into i2b2. Since i2b2 uses an entity-attribute-value (EAV) schema, which is a fundamentally different way of modeling data in comparison to a standard relational schema in TBase, we evaluated if (a) the data relationship of the source system entities can still be represented in the i2b2 schema and if (b) the IDRT is a suitable solution for loading the data of a comprehensive data schema like TBase into i2b2. For that reason, we identified entities in the TBase data schema which were relevant for answering questions on cohort identification. By doing so, we found out that the entities had different structures that needed to be handled differently for the ETL process. Furthermore, the use of IDRT revealed shortcomings with regard to large input data and specific data structures that are part of most modern EHR systems. However, this project also showed that our way of modeling the TBase data in i2b2 has been proven to be successful in terms of answering the most common questions of clinicians on cohort identification.
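The difference between a relational row and an entity-attribute-value (EAV) layout, which this abstract identifies as the core modeling gap, can be sketched in a few lines of Python. This is a generic illustration: the helper names `to_eav` and `from_eav` and the sample columns are hypothetical, and the sketch does not reproduce the actual i2b2 tables, TBase, or the IDRT.

```python
def to_eav(entity_id, row):
    """Unpivot one relational row (a dict of column -> value) into
    (entity, attribute, value) triples, skipping empty columns."""
    return [(entity_id, attr, val) for attr, val in row.items() if val is not None]


def from_eav(triples):
    """Regroup EAV triples into one dict of attributes per entity."""
    entities = {}
    for entity_id, attr, val in triples:
        entities.setdefault(entity_id, {})[attr] = val
    return entities


# Hypothetical source row; column names are made up for illustration.
row = {"creatinine_mg_dl": 1.2, "donor_type": "living", "rejection_date": None}
triples = to_eav("patient-1", row)
print(triples)
print(from_eav(triples))
```

Each non-empty column becomes its own triple, so relationships that the relational schema expresses through shared rows or foreign keys have to be re-encoded explicitly, which is the kind of question the authors evaluate.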
10
Rahman N, Wang DD, Ng SHX, Ramachandran S, Sridharan S, Khoo A, Tan CS, Goh WP, Tan XQ. Processing of Electronic Medical Records for Health Services Research in an Academic Medical Center: Methods and Validation. JMIR Med Inform 2018; 6:e10933. [PMID: 30578188 PMCID: PMC6320424 DOI: 10.2196/10933] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/02/2018] [Revised: 10/09/2018] [Accepted: 10/10/2018] [Indexed: 01/08/2023] Open
Abstract
Background Electronic medical records (EMRs) contain a wealth of information that can support data-driven decision making in health care policy design and service planning. Although research using EMRs has become increasingly prevalent, challenges such as coding inconsistency, data validity, and lack of suitable measures in important domains still hinder the progress. Objective The objective of this study was to design a structured way to process records in administrative EMR systems for health services research and assess validity in selected areas. Methods On the basis of a local hospital EMR system in Singapore, we developed a structured framework for EMR data processing, including standardization and phenotyping of diagnosis codes, construction of cohort with multilevel views, and generation of variables and proxy measures to supplement primary data. Disease complexity was estimated by Charlson Comorbidity Index (CCI) and Polypharmacy Score (PPS), whereas socioeconomic status (SES) was estimated by housing type. Validity of modified diagnosis codes and derived measures were investigated. Results Visit-level (N=7,778,761) and patient-level records (n=549,109) were generated. The International Classification of Diseases, Tenth Revision, Australian Modification (ICD-10-AM) codes were standardized to the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) with a mapping rate of 87.1%. In all, 97.4% of the ICD-9-CM codes were phenotyped successfully using Clinical Classification Software by Agency for Healthcare Research and Quality. Diagnosis codes that underwent modification (truncation or zero addition) in standardization and phenotyping procedures had the modification validated by physicians, with validity rates of more than 90%. Disease complexity measures (CCI and PPS) and SES were found to be valid and robust after a correlation analysis and a multivariate regression analysis. 
CCI and PPS were correlated with each other and positively correlated with health care utilization measures. Larger housing type was associated with lower government subsidies received, suggesting association with higher SES. Profile of constructed cohorts showed differences in disease prevalence, disease complexity, and health care utilization in those aged above 65 years and those aged 65 years or younger. Conclusions The framework proposed in this study would be useful for other researchers working with EMR data for health services research. Further analyses would be needed to better understand differences observed in the cohorts.
Affiliation(s)
- Nabilah Rahman
- Centre for Health Services and Policy Research, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Debby D Wang
- Centre for Health Services and Policy Research, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Sheryl Hui-Xian Ng
- Centre for Health Services and Policy Research, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Sravan Ramachandran
- Centre for Health Services and Policy Research, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Srinath Sridharan
- Centre for Health Services and Policy Research, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Astrid Khoo
- Regional Health System Planning Office, National University Health System, Singapore, Singapore
- Chuen Seng Tan
- Centre for Health Services and Policy Research, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore; Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Wei-Ping Goh
- University Medicine Cluster, National University Hospital, Singapore, Singapore
- Xin Quan Tan
- Regional Health System Planning Office, National University Health System, Singapore, Singapore; Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
11
He Z, Bian J, Carretta HJ, Lee J, Hogan WR, Shenkman E, Charness N. Prevalence of Multiple Chronic Conditions Among Older Adults in Florida and the United States: Comparative Analysis of the OneFlorida Data Trust and National Inpatient Sample. J Med Internet Res 2018; 20:e137. [PMID: 29650502 PMCID: PMC5920146 DOI: 10.2196/jmir.8961] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 09/13/2017] [Revised: 01/20/2018] [Accepted: 02/15/2018] [Indexed: 12/17/2022] Open
Abstract
Background Older patients with multiple chronic conditions are often faced with increased health care needs and subsequent higher medical costs, posing significant financial burden to patients, their caregivers, and the health care system. The increasing adoption of electronic health record systems and the proliferation of clinical data offer new opportunities for prevalence studies and for population health assessment. The last few years have witnessed an increasing number of clinical research networks focused on building large collections of clinical data from electronic health records and claims to make it easier and less costly to conduct clinical research. Objective The aim of this study was to compare the prevalence of common chronic conditions and multiple chronic conditions in older adults between Florida and the United States using data from the OneFlorida Clinical Research Consortium and the Healthcare Cost and Utilization Project (HCUP) National Inpatient Sample (NIS). Methods We first analyzed the basic demographic characteristics of the older adults in 3 datasets—the 2013 OneFlorida data, the 2013 HCUP NIS data, and the combined 2012 to 2016 OneFlorida data. Then we analyzed the prevalence of each of the 25 chronic conditions in each of the 3 datasets. We stratified the analysis of older adults with hypertension, the most prevalent condition. Additionally, we examined trends (ie, overall trends and then by age, race, and gender) in the prevalence of discharge records representing multiple chronic conditions over time for the OneFlorida (2012-2016) and HCUP NIS cohorts (2003-2013). Results The rankings of the top 10 prevalent conditions are the same across the OneFlorida and HCUP NIS datasets. 
The most prevalent pairs of chronic conditions across the 3 datasets were hyperlipidemia and hypertension; hypertension and ischemic heart disease; diabetes and hypertension; chronic kidney disease and hypertension; anemia and hypertension; and hyperlipidemia and ischemic heart disease. We observed increasing trends in multiple chronic conditions in both data sources. Conclusions The results showed that chronic conditions and multiple chronic conditions are prevalent in older adults across Florida and the United States. Even though slight differences were observed, the similar estimates of prevalence of chronic conditions and multiple chronic conditions across OneFlorida and HCUP NIS suggested that clinical research data networks such as OneFlorida, built from heterogeneous data sources, can provide rich data resources for conducting large-scale secondary data analyses.
Affiliation(s)
- Zhe He
- School of Information, Florida State University, Tallahassee, FL, United States
- Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, United States
- Henry J Carretta
- Department of Behavioral Sciences and Social Medicine, Florida State University, Tallahassee, FL, United States
- Jiwon Lee
- Department of Statistics, Florida State University, Tallahassee, FL, United States
- William R Hogan
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, United States
- Elizabeth Shenkman
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, United States
- Neil Charness
- Department of Psychology, Florida State University, Tallahassee, FL, United States