1
Coutinho-Almeida J, Saez C, Correia R, Rodrigues PP. Development and initial validation of a data quality evaluation tool in obstetrics real-world data through HL7-FHIR interoperable Bayesian networks and expert rules. JAMIA Open 2024; 7:ooae062. PMID: 39070966; PMCID: PMC11283181; DOI: 10.1093/jamiaopen/ooae062.
Abstract
Background The increasing prevalence of electronic health records (EHRs) in healthcare systems globally has underscored the importance of data quality for clinical decision-making and research, particularly in obstetrics. High-quality data are vital for an accurate representation of patient populations and for avoiding erroneous healthcare decisions. However, existing studies have highlighted significant challenges in EHR data quality, necessitating innovative tools and methodologies for effective data quality assessment and improvement. Objective This article addresses the critical need for data quality evaluation in obstetrics by developing a novel tool. The tool combines Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) standards with Bayesian networks and expert rules, offering a novel approach to assessing the quality of real-world obstetrics data. Methods A harmonized framework focusing on completeness, plausibility, and conformance underpins our methodology. We employed Bayesian networks for advanced probabilistic modeling, integrated outlier detection methods, and a rule-based system grounded in domain-specific knowledge. The development and validation of the tool were based on obstetrics data from 9 Portuguese hospitals, spanning the years 2019-2020. Results The developed tool demonstrated strong potential for identifying data quality issues in obstetrics EHRs. The Bayesian networks used in the tool showed high performance for various features, with area under the receiver operating characteristic curve (AUROC) between 75% and 97%. The tool's infrastructure and interoperable format as a FHIR Application Programming Interface (API) enable possible deployment of real-time data quality assessment in obstetrics settings. Our initial assessments show promise: even when compared with physicians' assessments of real records, the tool can reach an AUROC of 88%, depending on the threshold defined.
Discussion Our results also show that the quality of obstetrics clinical records is difficult to assess, and that assessments like ours could benefit from more categorical approaches that rank records from bad to good quality. Conclusion This study contributes significantly to the field of EHR data quality assessment, with a specific focus on obstetrics. The combination of HL7-FHIR interoperability, machine learning techniques, and expert knowledge presents a robust, adaptable solution to the challenges of healthcare data quality. Future research should explore tailored data quality evaluations for different healthcare contexts, as well as further validation of the tool's capabilities, enhancing its utility across diverse medical domains.
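The rule-based component this abstract describes (completeness and plausibility checks grounded in domain knowledge) can be sketched in a few lines. The field names and plausible ranges below are illustrative assumptions, not the rules used by the published tool:

```python
# Illustrative expert-rule checks for an obstetrics record. Thresholds are
# hypothetical examples of domain-specific plausibility rules.
OBSTETRIC_RULES = {
    # field: (min, max) plausible range
    "gestational_age_weeks": (20, 44),
    "birth_weight_grams": (300, 6500),
    "maternal_age_years": (10, 60),
}

def check_record(record: dict) -> dict:
    """Return a per-field verdict: 'missing', 'implausible', or 'ok'."""
    verdicts = {}
    for field, (lo, hi) in OBSTETRIC_RULES.items():
        value = record.get(field)
        if value is None:
            verdicts[field] = "missing"        # completeness issue
        elif not (lo <= value <= hi):
            verdicts[field] = "implausible"    # plausibility issue
        else:
            verdicts[field] = "ok"
    return verdicts

print(check_record({"gestational_age_weeks": 39, "birth_weight_grams": 150}))
```

In the published tool, verdicts like these are combined with Bayesian-network probability estimates rather than used alone.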
Affiliation(s)
- João Coutinho-Almeida
- CINTESIS@RISE—Centre for Health Technologies and Services Research, University of Porto, 4200-319 Porto, Portugal
- MEDCIDS—Faculty of Medicine of University of Porto, 4200-319 Porto, Portugal
- Health Data Science PhD Program, Faculty of Medicine of the University of Porto, 4200-319 Porto, Portugal
- Carlos Saez
- Instituto Universitario de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas, Universitat Politècnica de València, 46022 Valencia, Spain
- Ricardo Correia
- CINTESIS@RISE—Centre for Health Technologies and Services Research, University of Porto, 4200-319 Porto, Portugal
- MEDCIDS—Faculty of Medicine of University of Porto, 4200-319 Porto, Portugal
- Health Data Science PhD Program, Faculty of Medicine of the University of Porto, 4200-319 Porto, Portugal
- Pedro Pereira Rodrigues
- CINTESIS@RISE—Centre for Health Technologies and Services Research, University of Porto, 4200-319 Porto, Portugal
- MEDCIDS—Faculty of Medicine of University of Porto, 4200-319 Porto, Portugal
- Health Data Science PhD Program, Faculty of Medicine of the University of Porto, 4200-319 Porto, Portugal
2
Crowson MG, Nwosu OI. The Integration and Impact of Artificial Intelligence in Otolaryngology-Head and Neck Surgery: Navigating the Last Mile. Otolaryngol Clin North Am 2024; 57:887-895. PMID: 38705741; DOI: 10.1016/j.otc.2024.04.001.
Abstract
Incorporating artificial intelligence and machine learning into otolaryngology requires careful data handling, security, and ethical consideration. Success depends on interdisciplinary cooperation, consistent innovation, and regulatory compliance to improve clinical outcomes, provider experience, and operational effectiveness.
Affiliation(s)
- Matthew G Crowson
- Department of Otolaryngology-Head & Neck Surgery, Massachusetts Eye & Ear Hospital, Boston, MA, USA; Department of Otolaryngology-Head & Neck Surgery, Harvard Medical School, Boston, MA, USA.
- Obinna I Nwosu
- Department of Otolaryngology-Head & Neck Surgery, Massachusetts Eye & Ear Hospital, Boston, MA, USA; Department of Otolaryngology-Head & Neck Surgery, Harvard Medical School, Boston, MA, USA.
3
Balczewski EA, Cao J, Singh K. Risk Prediction and Machine Learning: A Case-Based Overview. Clin J Am Soc Nephrol 2023; 18:524-526. PMID: 36749160; PMCID: PMC10103261; DOI: 10.2215/cjn.0000000000000083.
Affiliation(s)
- Emily A. Balczewski
- Medical Scientist Training Program, University of Michigan Medical School, Ann Arbor, Michigan
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan
- Jie Cao
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan
- Karandeep Singh
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan
- Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan
- School of Information, University of Michigan, Ann Arbor, Michigan
4
Syed R, Eden R, Makasi T, Chukwudi I, Mamudu A, Kamalpour M, Kapugama Geeganage D, Sadeghianasl S, Leemans SJJ, Goel K, Andrews R, Wynn MT, Ter Hofstede A, Myers T. Digital Health Data Quality Issues: Systematic Review. J Med Internet Res 2023; 25:e42615. PMID: 37000497; PMCID: PMC10131725; DOI: 10.2196/42615.
Abstract
BACKGROUND The promise of digital health is principally dependent on the ability to electronically capture data that can be analyzed to improve decision-making. However, the ability to effectively harness data has proven elusive, largely because of the quality of the data captured. Despite the importance of data quality (DQ), an agreed-upon DQ taxonomy is still missing from the literature. When consolidated frameworks are developed, the dimensions are often fragmented, without consideration of the interrelationships among the dimensions or their resultant impact. OBJECTIVE The aim of this study was to develop a consolidated digital health DQ dimension and outcome (DQ-DO) framework to provide insights into 3 research questions: What are the dimensions of digital health DQ? How are the dimensions of digital health DQ related? And what are the impacts of digital health DQ? METHODS Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, a developmental systematic literature review was conducted of peer-reviewed literature focusing on digital health DQ in predominantly hospital settings. A total of 227 relevant articles were retrieved and inductively analyzed, through open coding, constant comparison, and card sorting with subject matter experts, to identify digital health DQ dimensions and outcomes. Subsequently, a computer-assisted analysis was performed and verified by DQ experts to identify the interrelationships among the DQ dimensions and the relationships between DQ dimensions and outcomes. The analysis resulted in the development of the DQ-DO framework.
RESULTS The digital health DQ-DO framework consists of 6 dimensions of DQ, namely accessibility, accuracy, completeness, consistency, contextual validity, and currency; interrelationships among the dimensions of digital health DQ, with consistency being the most influential dimension impacting all other digital health DQ dimensions; 5 digital health DQ outcomes, namely clinical, clinician, research-related, business process, and organizational outcomes; and relationships between the digital health DQ dimensions and DQ outcomes, with the consistency and accessibility dimensions impacting all DQ outcomes. CONCLUSIONS The DQ-DO framework developed in this study demonstrates the complexity of digital health DQ and the necessity for reducing digital health DQ issues. The framework further provides health care executives with holistic insights into DQ issues and resultant outcomes, which can help them prioritize which DQ-related problems to tackle first.
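The framework's structure, as stated in the abstract, can be encoded as a small data model. Only the links the abstract names explicitly (consistency influencing all other dimensions; consistency and accessibility impacting all outcomes) are included; anything beyond that would be an assumption:

```python
# The six DQ dimensions and five outcome categories of the DQ-DO framework,
# with only the relationships stated in the abstract encoded.
DIMENSIONS = ["accessibility", "accuracy", "completeness",
              "consistency", "contextual validity", "currency"]
OUTCOMES = ["clinical", "clinician", "research-related",
            "business process", "organizational"]

# Consistency is reported as the most influential dimension, impacting all
# other dimensions; consistency and accessibility impact all DQ outcomes.
DIMENSION_LINKS = {"consistency": [d for d in DIMENSIONS if d != "consistency"]}
OUTCOME_LINKS = {"consistency": list(OUTCOMES), "accessibility": list(OUTCOMES)}

def outcomes_affected_by(dimension: str) -> list:
    """Outcomes impacted by a dimension, per the links encoded above."""
    return OUTCOME_LINKS.get(dimension, [])

print(outcomes_affected_by("consistency"))
```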
Affiliation(s)
- Rehan Syed
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Rebekah Eden
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Tendai Makasi
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Ignatius Chukwudi
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Azumah Mamudu
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Mostafa Kamalpour
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Dakshi Kapugama Geeganage
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Sareh Sadeghianasl
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Sander J J Leemans
- Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen University, Aachen, Germany
- Kanika Goel
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Robert Andrews
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Moe Thandar Wynn
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Arthur Ter Hofstede
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Trina Myers
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
5
Burns CM, Pung L, Witt D, Gao M, Sendak M, Balu S, Krakower D, Marcus JL, Okeke NL, Clement ME. Development of a Human Immunodeficiency Virus Risk Prediction Model Using Electronic Health Record Data From an Academic Health System in the Southern United States. Clin Infect Dis 2023; 76:299-306. PMID: 36125084; PMCID: PMC10202432; DOI: 10.1093/cid/ciac775.
Abstract
BACKGROUND Human immunodeficiency virus (HIV) pre-exposure prophylaxis (PrEP) is underutilized in the southern United States. Rapid identification of individuals vulnerable to diagnosis of HIV using electronic health record (EHR)-based tools may augment PrEP uptake in the region. METHODS Using machine learning, we developed EHR-based models to predict incident HIV diagnosis as a surrogate for PrEP candidacy. We included patients from a southern medical system with encounters between October 2014 and August 2016, training the model to predict incident HIV diagnosis between September 2016 and August 2018. We obtained 74 EHR variables as potential predictors. We compared Extreme Gradient Boosting (XGBoost) versus least absolute shrinkage and selection operator (LASSO) logistic regression models, and assessed performance, overall and among women, using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). RESULTS Of 998,787 eligible patients, 162 had an incident HIV diagnosis, of whom 49 were women. The XGBoost model outperformed the LASSO model for the total cohort, achieving an AUROC of 0.89 and AUPRC of 0.01. The female-only cohort XGBoost model resulted in an AUROC of 0.78 and AUPRC of 0.00025. The most predictive variables for the overall cohort were race, sex, and male partner. The strongest positive predictors for the female-only cohort were history of pelvic inflammatory disease, drug use, and tobacco use. CONCLUSIONS Our machine-learning models effectively predicted incident HIV diagnoses, including among women. This study establishes the feasibility of using these models to identify persons most suitable for PrEP in the South.
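The seemingly tiny AUPRC values make sense once the outcome's rarity is taken into account: the expected AUPRC of a random ranker is roughly the outcome prevalence. A back-of-the-envelope check using the counts from the abstract (the comparison logic, not the study's code):

```python
# Contextualizing AUPRC on a rare outcome: 162 incident diagnoses among
# 998,787 patients (figures taken from the abstract above).
n_patients = 998_787
n_cases = 162

prevalence = n_cases / n_patients        # ~0.00016
random_auprc_baseline = prevalence       # expected AUPRC of a random ranker

reported_auprc = 0.01                    # XGBoost, total cohort
lift_over_chance = reported_auprc / random_auprc_baseline

print(f"prevalence            : {prevalence:.6f}")
print(f"random AUPRC baseline : {random_auprc_baseline:.6f}")
print(f"reported AUPRC lift   : {lift_over_chance:.0f}x over chance")
```

By the same logic, the female-only AUPRC of 0.00025 sits above its own chance baseline (49/998,787 ≈ 0.00005), despite looking near zero in absolute terms.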
Affiliation(s)
- Charles M Burns
- Division of Infectious Diseases, Duke University Medical Center, Durham, North Carolina, USA
- Leland Pung
- School of Medicine, Duke University, Durham, North Carolina, USA
- Duke Institute for Health Innovation, Durham, North Carolina, USA
- Daniel Witt
- Duke Institute for Health Innovation, Durham, North Carolina, USA
- Michael Gao
- Duke Institute for Health Innovation, Durham, North Carolina, USA
- Mark Sendak
- Duke Institute for Health Innovation, Durham, North Carolina, USA
- Suresh Balu
- Duke Institute for Health Innovation, Durham, North Carolina, USA
- Douglas Krakower
- Division of Infectious Disease, Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
- Department of Population Medicine, Harvard Medical School, Boston, Massachusetts, USA
- Julia L Marcus
- Department of Population Medicine, Harvard Medical School, Boston, Massachusetts, USA
- Nwora Lance Okeke
- Division of Infectious Diseases, Duke University Medical Center, Durham, North Carolina, USA
- Meredith E Clement
- Division of Infectious Diseases, Louisiana State University Health Sciences Center, New Orleans, Louisiana, USA
6
Lu JH, Callahan A, Patel BS, Morse KE, Dash D, Pfeffer MA, Shah NH. Assessment of Adherence to Reporting Guidelines by Commonly Used Clinical Prediction Models From a Single Vendor: A Systematic Review. JAMA Netw Open 2022; 5:e2227779. PMID: 35984654; PMCID: PMC9391954; DOI: 10.1001/jamanetworkopen.2022.27779.
Abstract
Importance Various model reporting guidelines have been proposed to ensure clinical prediction models are reliable and fair. However, no consensus exists about which model details are essential to report, and commonalities and differences among reporting guidelines have not been characterized. Furthermore, how well documentation of deployed models adheres to these guidelines has not been studied. Objectives To assess information requested by model reporting guidelines and whether the documentation for commonly used machine learning models developed by a single vendor provides the information requested. Evidence Review MEDLINE was queried using "machine learning model card" and "reporting machine learning" from November 4 to December 6, 2020. References were reviewed to find additional publications, and publications without specific reporting recommendations were excluded. Similar elements requested for reporting were merged into representative items. Four independent reviewers and 1 adjudicator assessed how often documentation for the most commonly used models developed by a single vendor reported the items. Findings From 15 model reporting guidelines, 220 unique items were identified that represented the collective reporting requirements. Although 12 items were commonly requested (requested by 10 or more guidelines), 77 items were requested by just 1 guideline. Documentation for 12 commonly used models from a single vendor reported a median of 39% (IQR, 37%-43%; range, 31%-47%) of items from the collective reporting requirements. Many of the commonly requested items had 100% reporting rates, including items concerning outcome definition, area under the receiver operating characteristic curve, internal validation, and intended clinical use. Several items related to reliability, such as external validation, uncertainty measures, and strategy for handling missing data, were reported half the time or less.
Other frequently unreported items related to fairness, including summary statistics and subgroup analyses by race and ethnicity or sex. Conclusions and Relevance These findings suggest that consistent reporting recommendations for clinical predictive models are needed so that model developers share the information necessary for model deployment. The many published guidelines would, collectively, require reporting more than 200 items. Model documentation from 1 vendor reported the most commonly requested items from model reporting guidelines; however, areas for improvement were identified in items related to model reliability and fairness. This analysis led to feedback to the vendor, which motivated updates to the documentation for future users.
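The adherence measure described above is essentially a set-coverage calculation: the fraction of collective guideline items that a model's documentation reports. The item names below are invented for illustration (the study used 220 unique items from 15 guidelines):

```python
# Hedged sketch of the adherence calculation: documentation coverage of
# collective reporting-guideline items. Item names are hypothetical.
requested_items = {          # illustrative subset of the 220 collective items
    "outcome definition", "AUROC", "internal validation",
    "intended clinical use", "external validation", "uncertainty measures",
    "missing data strategy", "subgroup analysis by sex",
    "subgroup analysis by race and ethnicity",
}

documented_items = {         # hypothetical documentation for one model
    "outcome definition", "AUROC", "internal validation",
    "intended clinical use",
}

coverage = len(requested_items & documented_items) / len(requested_items)
missing = sorted(requested_items - documented_items)

print(f"coverage: {coverage:.0%}")   # 4 of 9 items
print("unreported:", missing)
```

Running this per model and taking the median over 12 models would reproduce the kind of summary statistic (median 39%, IQR 37%-43%) the study reports.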
Affiliation(s)
- Jonathan H. Lu
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Alison Callahan
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Birju S. Patel
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Keith E. Morse
- Department of Pediatrics, Stanford University School of Medicine, Stanford, California
- Department of Clinical Informatics, Lucile Packard Children’s Hospital, Palo Alto, California
- Dev Dash
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Michael A. Pfeffer
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Technology and Digital Solutions, Stanford Medicine, Stanford, California
- Nigam H. Shah
- Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Technology and Digital Solutions, Stanford Medicine, Stanford, California
- Clinical Excellence Research Center, Stanford Medicine, Stanford, California
7
Andrei AC. Data Quality in Electronic Health Records: Practical Considerations. J Am Coll Surg 2020; 230:305. PMID: 32093899; DOI: 10.1016/j.jamcollsurg.2020.01.002.