1
Abstract
Laboratory clinical decision support (CDS) typically relies on data from the electronic health record (EHR). The implementation of a sustainable, effective laboratory CDS program requires a commitment to standardization and harmonization of key EHR data elements that are the foundation of laboratory CDS. The direct use of artificial intelligence algorithms in CDS programs will be limited unless key elements of the EHR are structured. The identification, curation, maintenance, and preprocessing steps necessary to implement robust laboratory-based algorithms must account for the heterogeneity of data present in a typical EHR.
2
Ozonze O, Scott PJ, Hopgood AA. Automating Electronic Health Record Data Quality Assessment. J Med Syst 2023; 47:23. [PMID: 36781551 PMCID: PMC9925537 DOI: 10.1007/s10916-022-01892-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/10/2021] [Accepted: 11/15/2022] [Indexed: 02/15/2023]
Abstract
Information systems such as Electronic Health Record (EHR) systems are susceptible to data quality (DQ) issues. Given the growing importance of EHR data, there is an increasing demand for strategies and tools to help ensure that available data are fit for use. However, developing reliable data quality assessment (DQA) tools necessary for guiding and evaluating improvement efforts has remained a fundamental challenge. This review examines the state of research on operationalising EHR DQA, mainly automated tooling, and highlights necessary considerations for future implementations. We reviewed 1841 articles from PubMed, Web of Science, and Scopus published between 2011 and 2021. 23 DQA programs deployed in real-world settings to assess EHR data quality (n = 14), and a few experimental prototypes (n = 9), were identified. Many of these programs investigate completeness (n = 15) and value conformance (n = 12) quality dimensions and are backed by knowledge items gathered from domain experts (n = 9), literature reviews and existing DQ measurements (n = 3). A few DQA programs also explore the feasibility of using data-driven techniques to assess EHR data quality automatically. Overall, the automation of EHR DQA is gaining traction, but current efforts are fragmented and not backed by relevant theory. Existing programs also vary in scope, type of data supported, and how measurements are sourced. There is a need to standardise programs for assessing EHR data quality, as current evidence suggests their quality may be unknown.
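The two quality dimensions most often measured by the reviewed programs, completeness and value conformance, can be illustrated with a minimal Python sketch. The record layout, field names, allowed codes, and plausibility range below are hypothetical and are not taken from any of the reviewed DQA programs.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Optional[Any]]


def _present(value: Any) -> bool:
    """Treat None and the empty string as missing."""
    return value is not None and value != ""


def completeness(records: List[Record], field: str) -> float:
    """Fraction of records in which `field` is present and non-empty."""
    if not records:
        return 0.0
    return sum(1 for r in records if _present(r.get(field))) / len(records)


def value_conformance(
    records: List[Record], field: str, is_valid: Callable[[Any], bool]
) -> float:
    """Fraction of non-missing values of `field` that pass `is_valid`."""
    values = [r[field] for r in records if _present(r.get(field))]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


# Hypothetical records; the fields and rules are illustrative assumptions.
records: List[Record] = [
    {"patient_id": "p1", "sex": "F", "heart_rate": 72},
    {"patient_id": "p2", "sex": "", "heart_rate": 310},
    {"patient_id": "p3", "sex": "X?", "heart_rate": None},
    {"patient_id": "p4", "sex": "M", "heart_rate": 64},
]

sex_completeness = completeness(records, "sex")
sex_conformance = value_conformance(records, "sex", lambda v: v in {"F", "M"})
hr_conformance = value_conformance(
    records, "heart_rate", lambda v: 20 <= v <= 250
)
```

Keeping the two measures separate matters: a field can be fully populated and still fail conformance, so a single combined score would hide which problem is present.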
Affiliation(s)
- Obinwa Ozonze
  - School of Computing, University of Portsmouth, Buckingham Building, Lion Terrace, Portsmouth, PO1 3HE, UK
- Philip J Scott
  - Institute of Management and Health, University of Wales Trinity Saint David, Lampeter, SA48 7ED, UK
- Adrian A Hopgood
  - School of Computing, University of Portsmouth, Buckingham Building, Lion Terrace, Portsmouth, PO1 3HE, UK
3
Al-Sowi AM, AlMasri N, Hammo B, Al-Qwaqzeh FAZ. Cerebral Palsy classification based on multi-feature analysis using machine learning. Informatics in Medicine Unlocked 2023. [DOI: 10.1016/j.imu.2023.101197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
4
Miao Z, Sealey MD, Sathyanarayanan S, Delen D, Zhu L, Shepherd S. A data preparation framework for cleaning electronic health records and assessing cleaning outcomes for secondary analysis. Inform Syst 2022. [DOI: 10.1016/j.is.2022.102130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
5
van de Sande D, Van Genderen ME, Smit JM, Huiskens J, Visser JJ, Veen RER, van Unen E, Ba OH, Gommers D, Bommel JV. Developing, implementing and governing artificial intelligence in medicine: a step-by-step approach to prevent an artificial intelligence winter. BMJ Health Care Inform 2022; 29:bmjhci-2021-100495. [PMID: 35185012 PMCID: PMC8860016 DOI: 10.1136/bmjhci-2021-100495] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 10/06/2021] [Accepted: 01/24/2022] [Indexed: 12/23/2022]
Abstract
Objective: Although the role of artificial intelligence (AI) in medicine is increasingly studied, most patients do not benefit because the majority of AI models remain in the testing and prototyping environment. The development and implementation trajectory of clinical AI models are complex and a structured overview is missing. We therefore propose a step-by-step overview to enhance clinicians’ understanding and to promote quality of medical AI research.
Methods: We summarised key elements (such as current guidelines, challenges, regulatory documents and good practices) that are needed to develop and safely implement AI in medicine.
Conclusion: This overview complements other frameworks in a way that it is accessible to stakeholders without prior AI knowledge and as such provides a step-by-step approach incorporating all the key elements and current guidelines that are essential for implementation, and can thereby help to move AI from bytes to bedside.
Affiliation(s)
- Davy van de Sande
  - Department of Adult Intensive Care, Erasmus Medical Center, Rotterdam, The Netherlands
- Michel E Van Genderen
  - Department of Adult Intensive Care, Erasmus Medical Center, Rotterdam, The Netherlands
- Jim M Smit
  - Department of Adult Intensive Care, Erasmus Medical Center, Rotterdam, The Netherlands
  - Pattern Recognition and Bioinformatics group, EEMCS, Delft University of Technology, Delft, The Netherlands
- Jacob J Visser
  - Department of Radiology and Nuclear Medicine, Erasmus Medical Center, Rotterdam, The Netherlands
  - Department of Information Technology, Chief Medical Information Officer, Erasmus Medical Center, Rotterdam, The Netherlands
- Robert E R Veen
  - Department of Information Technology, theme Research Suite, Erasmus Medical Center, Rotterdam, The Netherlands
- Oliver Hilgers Ba
  - Active Medical Devices/Medical Device Software, CE Plus GmbH, Badenweiler, Germany
- Diederik Gommers
  - Department of Adult Intensive Care, Erasmus Medical Center, Rotterdam, The Netherlands
- Jasper van Bommel
  - Department of Adult Intensive Care, Erasmus Medical Center, Rotterdam, The Netherlands
6
Macieira TGR, Yao Y, Keenan GM. Use of machine learning to transform complex standardized nursing care plan data into meaningful research variables: a palliative care exemplar. J Am Med Inform Assoc 2021; 28:2695-2701. [PMID: 34569603 DOI: 10.1093/jamia/ocab205] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/25/2021] [Accepted: 09/07/2021] [Indexed: 11/13/2022]
Abstract
The aim of this article was to describe a novel methodology for transforming complex nursing care plan data into meaningful variables to assess the impact of nursing care. We extracted standardized care plan data for older adults from the electronic health records of 4 hospitals. We created a palliative care framework with 8 categories. A subset of the data was manually classified under the framework, which was then used to train random forest machine learning algorithms that performed automated classification. Two expert raters achieved a 78% agreement rate. Random forest classifiers trained using the expert consensus achieved accuracy (agreement with consensus) between 77% and 89%. The best classifier was utilized for the automated classification of the remaining data. Utilizing machine learning reduces the cost of transforming raw data into representative constructs that can be used in research and practice to understand the essence of nursing specialty care, such as palliative care.
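The agreement figures quoted above (78% between the two expert raters, 77% to 89% between classifiers and the expert consensus) are simple proportions of matching labels. A minimal Python sketch of that calculation follows; the category names and label sequences are hypothetical placeholders, since the abstract does not list the 8 framework categories.

```python
from typing import Sequence


def percent_agreement(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Proportion of items that received the same label from both sources."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label sequences must cover the same items")
    if not labels_a:
        raise ValueError("at least one labelled item is required")
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return matches / len(labels_a)


# Hypothetical labels for five care plan items (category names are invented).
rater_1 = ["symptom", "family", "comfort", "symptom", "spiritual"]
rater_2 = ["symptom", "family", "comfort", "comfort", "spiritual"]
consensus = ["symptom", "family", "comfort", "symptom", "spiritual"]
classifier = ["symptom", "family", "symptom", "symptom", "comfort"]

rater_agreement = percent_agreement(rater_1, rater_2)
classifier_accuracy = percent_agreement(classifier, consensus)
```

The same function serves both purposes because accuracy against a consensus label set is just agreement with one particular reference rater. Raw agreement does not correct for chance, which is one reason studies sometimes also report a chance-corrected statistic.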
Affiliation(s)
- Tamara G R Macieira
  - Department of Family, Community and Health Systems Science, College of Nursing, University of Florida, Gainesville, Florida, USA
- Yingwei Yao
  - Department of Biobehavioral Nursing Science, College of Nursing, University of Florida, Gainesville, Florida, USA
- Gail M Keenan
  - Department of Family, Community and Health Systems Science, College of Nursing, University of Florida, Gainesville, Florida, USA
7
Reimer AP, Dai W, Smith B, Schiltz NK, Sun J, Koroukian SM. Subcategorizing EHR diagnosis codes to improve clinical application of machine learning models. Int J Med Inform 2021; 156:104588. [PMID: 34607290 DOI: 10.1016/j.ijmedinf.2021.104588] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/02/2021] [Revised: 06/17/2021] [Accepted: 09/19/2021] [Indexed: 11/26/2022]
Abstract
BACKGROUND: Electronic health record (EHR) data is commonly used for secondary purposes such as research and clinical decision support. However, reuse of EHR data presents several challenges including but not limited to identifying all diagnoses associated with a patient's clinical encounter. The purpose of this study was to assess the feasibility of developing a schema to identify and subclassify all structured diagnosis codes for a patient encounter.
METHODS: To develop a subclassification schema we used EHR data from an interhospital transport data repository that contained complete hospital encounter level data. Eight discrete data sources containing structured diagnosis codes were identified. Diagnosis codes were normalized using the Unified Medical Language System and additional EHR data were combined with standardized terminologies to create and validate the subcategories. We then employed random forest to assess the usefulness of the new subcategorized diagnoses to predict post-interhospital transfer mortality by building 2 models, one using standard diagnosis codes, and one using the new subcategorized diagnosis codes.
RESULTS: Six subcategories of diagnoses were identified and validated. The subcategories included: primary or admitting diagnoses (10%), past medical, surgical or social history (9%), problem list (20%), comorbidity (24%), discharge diagnoses (6%), and unmapped diagnoses (31%). The subcategorized model outperformed the standard model, achieving a training AUROC of 0.97 versus 0.95 and testing model AUROC of 0.81 versus 0.46.
DISCUSSION: Our work demonstrates that merging structured diagnosis codes with additional EHR data and secondary data sources provides additional information to understand the role of diagnosis throughout a clinical encounter and improves predictive model performance. Further work is necessary to assess if subcategorizing produces benefits in interpreting the results of prognostic models and/or operationalizing the results in clinical decision support applications.
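The AUROC values reported above can be read as the probability that a randomly chosen positive case (here, a death after transfer) receives a higher predicted score than a randomly chosen negative case, so 0.5 corresponds to random ranking. The Python sketch below computes AUROC directly from that pairwise definition. It is an illustration only: the labels and scores are invented, and it does not reproduce the study's random forest models.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented outcome labels (1 = event) and model scores for six encounters.
labels = [1, 1, 0, 0, 1, 0]
train_scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.1]  # perfectly separated
test_scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.6]   # one positive ranked low

train_auroc = auroc(labels, train_scores)
test_auroc = auroc(labels, test_scores)
```

A large drop from training to testing AUROC, as reported for the standard-code model (0.95 versus 0.46), is the usual sign that a model has fitted the training data without learning a ranking that generalizes. The pairwise loop is quadratic in the number of cases, which is fine for illustration but not for large cohorts.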
Affiliation(s)
- Andrew P Reimer
  - Frances Payne Bolton School of Nursing, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH, United States
  - Critical Care Transport, Cleveland Clinic, 9800 Euclid Ave, Cleveland, OH, United States
- Wei Dai
  - Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, United States
- Benjamin Smith
  - Department of Mathematics, Applied Mathematics and Statistics, College of Arts and Sciences, Case Western Reserve University, Cleveland, OH, United States
- Nicholas K Schiltz
  - Frances Payne Bolton School of Nursing, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH, United States
- Jiayang Sun
  - Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, United States
- Siran M Koroukian
  - Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, United States
8
Leveraging electronic health record data to inform hospital resource management: A systematic data mining approach. Health Care Manag Sci 2021; 24:716-741. [PMID: 34031792 DOI: 10.1007/s10729-021-09554-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/26/2020] [Accepted: 02/02/2021] [Indexed: 10/21/2022]
Abstract
Early identification of resource needs is instrumental in promoting efficient hospital resource management. Hospital information systems, and electronic health records (EHR) in particular, collect valuable demographic and clinical patient data from the moment patients are admitted, which can help predict expected resource needs in early stages of patient episodes. To this end, this article proposes a data mining methodology to systematically obtain predictions for relevant managerial variables by leveraging structured EHR data. Specifically, these managerial variables are: i) Diagnosis categories, ii) procedure codes, iii) diagnosis-related groups (DRGs), iv) outlier episodes and v) length of stay (LOS). The proposed methodology approaches the problem in four stages: Feature set construction, feature selection, prediction model development, and model performance evaluation. We tested this approach with an EHR dataset of 5,089 inpatient episodes and compared different classification and regression models (for categorical and continuous variables, respectively), performed temporal analysis of model performance, analyzed the impact of training set homogeneity on performance and assessed the contribution of different EHR data elements for model predictive power. Overall, our results indicate that inpatient EHR data can effectively be leveraged to inform resource management on multiple perspectives. Logistic regression (combined with minimal redundancy maximum relevance feature selection) and bagged decision trees yielded best results for predicting categorical and numerical managerial variables, respectively. Furthermore, our temporal analysis indicated that, while DRG classes are more difficult to predict, several diagnosis categories, procedure codes and LOS amongst shorter-stay patients can be predicted with higher confidence in early stages of patient stay. 
Lastly, value of information analysis indicated that diagnoses, medication and structured assessment forms were the most valuable EHR data elements in predicting managerial variables of interest through a data mining approach.
9
Pellathy T, Saul M, Clermont G, Dubrawski AW, Pinsky MR, Hravnak M. Accuracy of identifying hospital acquired venous thromboembolism by administrative coding: implications for big data and machine learning research. J Clin Monit Comput 2021; 36:397-405. [PMID: 33558981 DOI: 10.1007/s10877-021-00664-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/19/2020] [Accepted: 01/20/2021] [Indexed: 12/23/2022]
Abstract
Big data analytics research using heterogeneous electronic health record (EHR) data requires accurate identification of disease phenotype cases and controls. Overreliance on ground truth determination based on administrative data can lead to biased and inaccurate findings. Hospital-acquired venous thromboembolism (HA-VTE) is challenging to identify due to its temporal evolution and variable EHR documentation. To establish ground truth for machine learning modeling, we compared accuracy of HA-VTE diagnoses made by administrative coding to manual review of gold standard diagnostic test results. We performed retrospective analysis of EHR data on 3680 adult stepdown unit patients identifying HA-VTE. International Classification of Diseases, Ninth Revision (ICD-9-CM) codes for VTE were identified. 4544 radiology reports associated with VTE diagnostic tests were screened using terminology extraction and then manually reviewed by a clinical expert to confirm diagnosis. Of 415 cases with ICD-9-CM codes for VTE, 219 were identified with acute onset type codes. Test report review identified 158 new-onset HA-VTE cases. Only 40% of ICD-9-CM coded cases (n = 87) were confirmed by a positive diagnostic test report, leaving the majority of administratively coded cases unsubstantiated by confirmatory diagnostic test. Additionally, 45% of diagnostic test confirmed HA-VTE cases lacked corresponding ICD codes. ICD-9-CM coding missed diagnostic test-confirmed HA-VTE cases and inaccurately assigned cases without confirmed VTE, suggesting dependence on administrative coding leads to inaccurate HA-VTE phenotyping. Alternative methods to develop more sensitive and specific VTE phenotype solutions portable across EHR vendor data are needed to support case-finding in big-data analytics.
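The percentages in this abstract follow from three of its counts, which the short Python sketch below makes explicit. Two readings are assumed here rather than stated in the abstract: that the 40% figure uses the 219 acute-onset coded cases as its denominator, and that the 87 confirmed coded cases are a subset of the 158 test-confirmed cases.

```python
# Counts taken from the abstract.
coded_acute_cases = 219      # cases with acute-onset ICD-9-CM VTE codes
test_confirmed_cases = 158   # new-onset HA-VTE cases found by report review
coded_and_confirmed = 87     # coded cases with a positive diagnostic test report

# Assumption: coded_and_confirmed is the overlap of the two groups above.
positive_predictive_value = coded_and_confirmed / coded_acute_cases
sensitivity = coded_and_confirmed / test_confirmed_cases
missed_by_coding = test_confirmed_cases - coded_and_confirmed
missed_fraction = missed_by_coding / test_confirmed_cases

print(f"PPV of coding:         {positive_predictive_value:.1%}")
print(f"Sensitivity of coding: {sensitivity:.1%}")
print(f"Confirmed but uncoded: {missed_by_coding} ({missed_fraction:.1%})")
```

Under these assumptions the arithmetic is consistent with the abstract: 87 of 219 is about 40%, and the 71 confirmed cases without a code are about 45% of 158. Using administrative codes alone as ground truth would therefore mislabel a large share of both cases and controls.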
Affiliation(s)
- Tiffany Pellathy
  - University of Pittsburgh School of Nursing, 336 Victoria Hall; 3500 Victoria Street, Pittsburgh, PA, 15213, USA
- Melissa Saul
  - University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
- Gilles Clermont
  - University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
- Artur W Dubrawski
  - School of Computer Science, Auton Lab, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Michael R Pinsky
  - University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
- Marilyn Hravnak
  - University of Pittsburgh School of Nursing, 336 Victoria Hall; 3500 Victoria Street, Pittsburgh, PA, 15213, USA
10
Goodwin AJ, Eytan D, Greer RW, Mazwi M, Thommandram A, Goodfellow SD, Assadi A, Jegatheeswaran A, Laussen PC. A practical approach to storage and retrieval of high-frequency physiological signals. Physiol Meas 2020; 41:035008. [PMID: 32131060 DOI: 10.1088/1361-6579/ab7cb5] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 02/08/2023]
Abstract
OBJECTIVE: Storage of physiological waveform data for retrospective analysis presents significant challenges. Resultant data can be very large, and therefore becomes expensive to store and complicated to manage. Traditional database approaches are not appropriate for large scale storage of physiological waveforms. Our goal was to apply modern time series compression and indexing techniques to the problem of physiological waveform storage and retrieval.
APPROACH: We deployed a vendor-agnostic data collection system and developed domain-specific compression approaches that allowed long term storage of physiological waveform data and other associated clinical and medical device data. The database (called AtriumDB) also facilitates rapid retrieval of retrospective data for high-performance computing and machine learning applications.
MAIN RESULTS: A prototype system has been recording data in a 42-bed pediatric critical care unit at The Hospital for Sick Children in Toronto, Ontario since February 2016. As of December 2019, the database contains over 720,000 patient-hours of data collected from over 5300 patients, all with complete waveform capture. One year of full resolution physiological waveform storage from this 42-bed unit can be losslessly compressed and stored in less than 300 GB of disk space. Retrospective data can be delivered to analytical applications at a rate of up to 50 million time-value pairs per second.
SIGNIFICANCE: Stored data are not pre-processed or filtered. Having access to a large retrospective dataset with realistic artefacts lends itself to the process of anomaly discovery and understanding. Retrospective data can be replayed to simulate a realistic streaming data environment where analytical tools can be rapidly tested at scale.
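The abstract does not describe how AtriumDB compresses waveforms, so the Python sketch below should not be read as its method. It only illustrates one generic lossless approach to time series compression: store the difference between consecutive integer samples (delta encoding), then apply a general-purpose compressor. The synthetic sine waveform, the 32-bit sample width, and the function names are all assumptions made for the example.

```python
import math
import struct
import zlib
from typing import List


def compress_samples(samples: List[int]) -> bytes:
    """Delta-encode integer samples, pack as little-endian int32, deflate."""
    # Pair each sample with its predecessor (0 before the first sample).
    deltas = [cur - prev for prev, cur in zip([0] + samples, samples)]
    raw = struct.pack(f"<{len(deltas)}i", *deltas)
    return zlib.compress(raw, 9)


def decompress_samples(blob: bytes) -> List[int]:
    """Invert compress_samples exactly by cumulatively summing the deltas."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 4}i", raw)
    samples: List[int] = []
    running = 0
    for delta in deltas:
        running += delta
        samples.append(running)
    return samples


# Synthetic stand-in for a sampled waveform: 5000 integer samples of a sine.
waveform = [int(1000 * math.sin(2 * math.pi * i / 250)) for i in range(5000)]

blob = compress_samples(waveform)
restored = decompress_samples(blob)
uncompressed_bytes = 4 * len(waveform)
```

Delta encoding helps because adjacent samples of a physiological signal tend to be close in value, so the differences are small and repetitive, which a dictionary-based compressor handles well. Real recordings contain noise and artefacts, so the ratio achieved on this clean synthetic signal says nothing about ratios on clinical data.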
Affiliation(s)
- Andrew J Goodwin
  - Department of Critical Care Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
  - School of Biomedical Engineering, University of Sydney, Sydney, New South Wales, Australia
11
Park AJ, Weintraub GS, Asgari MM. Leveraging the electronic health record to improve dermatologic care delivery: The importance of finding structure in data. J Am Acad Dermatol 2020; 82:773-775. [DOI: 10.1016/j.jaad.2019.10.064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2019] [Revised: 10/24/2019] [Accepted: 10/26/2019] [Indexed: 10/25/2022]
12
Horvat CM, Ismail HM, Au AK, Garibaldi L, Siripong N, Kantawala S, Aneja RK, Hupp DS, Kochanek PM, Clark RSB. Presenting predictors and temporal trends of treatment-related outcomes in diabetic ketoacidosis. Pediatr Diabetes 2018; 19:985-992. [PMID: 29573523 PMCID: PMC6863166 DOI: 10.1111/pedi.12663] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/09/2018] [Revised: 02/06/2018] [Accepted: 02/07/2018] [Indexed: 12/17/2022]
Abstract
OBJECTIVE: This study examines temporal trends in treatment-related outcomes surrounding a diabetic ketoacidosis (DKA) performance improvement intervention consisting of mandated intensive care unit admission and implementation of a standardized management pathway, and identifies physical and biochemical characteristics associated with outcomes in this population.
METHODS: A retrospective cohort of 1225 children with DKA was identified in the electronic health record by International Classification of Diseases codes and a minimum pH less than 7.3 during hospitalization at a quaternary children's hospital between April 2009 and May 2016. Multivariable regression examined predictors and trends of hypoglycemia, central venous line placement, severe hyperchloremia, head computed tomography (CT) utilization, treated cerebral edema and hospital length of stay (LOS).
RESULTS: The incidence of severe hyperchloremia and head CT utilization decreased during the study period. Among patients with severe DKA (presenting pH < 7.1), the intervention was associated with decreasing LOS and less variability in LOS. Lower pH at presentation was independently associated with increased risk for all outcomes except hypoglycemia, which was associated with higher pH. Patients treated for cerebral edema had a lower presenting mean systolic blood pressure (SBP) z score (0.58 [95% confidence interval (CI) -0.02 to 1.17] vs 1.23 [1.13 to 1.33]) and a higher maximum mean SBP z score during hospitalization (3.75 [3.19 to 4.31] vs 2.48 [2.38 to 2.58]) compared to patients not receiving cerebral edema treatment. Blood pressure and cerebral edema remained significantly associated after covariate adjustment.
CONCLUSION: Treatment-related outcomes improved over the entire study period and following a performance improvement intervention. The association of SBP with cerebral edema warrants further study.
Affiliation(s)
- Christopher M. Horvat
  - Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
  - Safar Center for Resuscitation Research, University of Pittsburgh, Pittsburgh, PA
  - Brain Care Institute, Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA
  - Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA
- Heba M. Ismail
  - Division of Pediatric Endocrinology, Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA
- Alicia K. Au
  - Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
  - Safar Center for Resuscitation Research, University of Pittsburgh, Pittsburgh, PA
  - Brain Care Institute, Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA
  - Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA
- Luigi Garibaldi
  - Division of Pediatric Endocrinology, Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA
- Nalyn Siripong
  - The Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA
- Sajel Kantawala
  - Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
  - Brain Care Institute, Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA
  - Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA
- Rajesh K. Aneja
  - Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
  - Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA
- Diane S. Hupp
  - Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA
- Patrick M. Kochanek
  - Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
  - Safar Center for Resuscitation Research, University of Pittsburgh, Pittsburgh, PA
  - Brain Care Institute, Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA
  - Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA
- Robert S. B. Clark
  - Department of Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA
  - Safar Center for Resuscitation Research, University of Pittsburgh, Pittsburgh, PA
  - Brain Care Institute, Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA
  - Children’s Hospital of Pittsburgh of UPMC, Pittsburgh, PA