1
Coutinho-Almeida J, Saez C, Correia R, Rodrigues PP. Development and initial validation of a data quality evaluation tool in obstetrics real-world data through HL7-FHIR interoperable Bayesian networks and expert rules. JAMIA Open 2024; 7:ooae062. [PMID: 39070966; PMCID: PMC11283181; DOI: 10.1093/jamiaopen/ooae062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 06/05/2024] [Accepted: 06/19/2024] [Indexed: 07/30/2024]
Abstract
Background The increasing prevalence of electronic health records (EHRs) in healthcare systems globally has underscored the importance of data quality for clinical decision-making and research, particularly in obstetrics. High-quality data is vital for an accurate representation of patient populations and to avoid erroneous healthcare decisions. However, existing studies have highlighted significant challenges in EHR data quality, necessitating innovative tools and methodologies for effective data quality assessment and improvement. Objective This article addresses the critical need for data quality evaluation in obstetrics by developing a novel tool. The tool uses Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) standards in conjunction with Bayesian networks and expert rules to assess data quality in real-world obstetrics data. Methods A harmonized framework focusing on completeness, plausibility, and conformance underpins our methodology. We employed Bayesian networks for advanced probabilistic modeling, integrated outlier detection methods, and a rule-based system grounded in domain-specific knowledge. The development and validation of the tool were based on obstetrics data from 9 Portuguese hospitals, spanning the years 2019-2020. Results The developed tool demonstrated strong potential for identifying data quality issues in obstetrics EHRs. The Bayesian networks used in the tool showed high performance for various features, with area under the receiver operating characteristic curve (AUROC) between 75% and 97%. The tool's infrastructure and interoperable format as a FHIR Application Programming Interface (API) could enable real-time data quality assessment in obstetrics settings. Our initial assessments show promise: even when compared with physicians' assessments of real records, the tool can reach an AUROC of 88%, depending on the threshold defined. Discussion Our results also show that obstetrics clinical records are difficult to assess in terms of quality, and assessments like ours could benefit from more categorical approaches to ranking records between bad and good quality. Conclusion This study contributes significantly to the field of EHR data quality assessment, with a specific focus on obstetrics. The combination of HL7-FHIR interoperability, machine learning techniques, and expert knowledge presents a robust, adaptable solution to the challenges of healthcare data quality. Future research should explore tailored data quality evaluations for different healthcare contexts, as well as further validation of the tool's capabilities, enhancing the tool's utility across diverse medical domains.
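The completeness and rule-based plausibility checks named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' tool: the field names (`gestational_age_weeks`, `birth_weight_g`), the rule set, and the thresholds are hypothetical placeholders chosen only for illustration, and the Bayesian network and FHIR components are not shown.

```python
from typing import Any, Callable

Record = dict[str, Any]


def completeness(records: list[Record], fields: list[str]) -> dict[str, float]:
    """Return, per field, the fraction of records with a non-missing value."""
    n = len(records)
    return {
        f: (sum(r.get(f) is not None for r in records) / n) if n else 0.0
        for f in fields
    }


# Hypothetical expert rules (illustrative thresholds, not clinical guidance).
# Each predicate returns True when the record is plausible; a missing value
# passes here because missingness is counted by completeness() instead.
RULES: dict[str, Callable[[Record], bool]] = {
    "gestational_age_range": lambda r: r.get("gestational_age_weeks") is None
    or 20 <= r["gestational_age_weeks"] <= 44,
    "birth_weight_positive": lambda r: r.get("birth_weight_g") is None
    or r["birth_weight_g"] > 0,
}


def violations(record: Record) -> list[str]:
    """Return the names of the rules that the record fails."""
    return [name for name, is_plausible in RULES.items() if not is_plausible(record)]


if __name__ == "__main__":
    sample = [
        {"gestational_age_weeks": 39, "birth_weight_g": 3200},
        {"gestational_age_weeks": 60, "birth_weight_g": None},
    ]
    print(completeness(sample, ["gestational_age_weeks", "birth_weight_g"]))
    print([violations(r) for r in sample])
```

Keeping the rules in a named dictionary lets a domain expert add or adjust checks without touching the scoring code, which is one plausible way to combine expert knowledge with automated checks.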
Affiliation(s)
- João Coutinho-Almeida
  - CINTESIS@RISE - Centre for Health Technologies and Services Research, University of Porto, 4200-319 Porto, Portugal
  - MEDCIDS - Faculty of Medicine of University of Porto, 4200-319 Porto, Portugal
  - Health Data Science PhD Program, Faculty of Medicine of the University of Porto, 4200-319 Porto, Portugal
- Carlos Saez
  - Instituto Universitario de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas, Universitat Politècnica de València, 46022 Valencia, Spain
- Ricardo Correia
  - CINTESIS@RISE - Centre for Health Technologies and Services Research, University of Porto, 4200-319 Porto, Portugal
  - MEDCIDS - Faculty of Medicine of University of Porto, 4200-319 Porto, Portugal
  - Health Data Science PhD Program, Faculty of Medicine of the University of Porto, 4200-319 Porto, Portugal
- Pedro Pereira Rodrigues
  - CINTESIS@RISE - Centre for Health Technologies and Services Research, University of Porto, 4200-319 Porto, Portugal
  - MEDCIDS - Faculty of Medicine of University of Porto, 4200-319 Porto, Portugal
  - Health Data Science PhD Program, Faculty of Medicine of the University of Porto, 4200-319 Porto, Portugal
2
Sáez C, Ferri P, García-Gómez JM. Resilient Artificial Intelligence in Health: Synthesis and Research Agenda Toward Next-Generation Trustworthy Clinical Decision Support. J Med Internet Res 2024; 26:e50295. [PMID: 38941134; PMCID: PMC11245653; DOI: 10.2196/50295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 04/16/2024] [Accepted: 05/18/2024] [Indexed: 06/29/2024]
Abstract
Artificial intelligence (AI)-based clinical decision support systems are gaining momentum by relying on a greater volume and variety of secondary use data. However, the uncertainty, variability, and biases in real-world data environments still pose significant challenges to the development of health AI, its routine clinical use, and its regulatory frameworks. Health AI should be resilient against real-world environments throughout its lifecycle, including the training and prediction phases and maintenance during production, and health AI regulations should evolve accordingly. Data quality issues, variability over time or across sites, information uncertainty, human-computer interaction, and fundamental rights assurance are among the most relevant challenges. If health AI is not designed resiliently with regard to these real-world data effects, potentially biased data-driven medical decisions can risk the safety and fundamental rights of millions of people. In this viewpoint, we review the challenges, requirements, and methods for resilient AI in health and provide a research framework to improve the trustworthiness of next-generation AI-based clinical decision support.
Affiliation(s)
- Carlos Sáez
  - Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Valencia, Spain
- Pablo Ferri
  - Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Valencia, Spain
- Juan M García-Gómez
  - Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Valencia, Spain
3
Leviton A, Loddenkemper T. Design, implementation, and inferential issues associated with clinical trials that rely on data in electronic medical records: a narrative review. BMC Med Res Methodol 2023; 23:271. [PMID: 37974111; PMCID: PMC10652539; DOI: 10.1186/s12874-023-02102-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 11/08/2023] [Indexed: 11/19/2023]
Abstract
Real world evidence is now accepted by authorities charged with assessing the benefits and harms of new therapies. Clinical trials based on real world evidence are much less expensive than randomized clinical trials that do not rely on "real world evidence" such as that contained in electronic health records (EHR). Consequently, we can expect an increase in the number of reports of these types of trials, which we identify here as 'EHR-sourced trials.' In this selected literature review, we discuss the various designs and the ethical issues they raise. EHR-sourced trials have the potential to improve and increase common data elements and other aspects of the EHR and related systems. Caution is advised, however, in drawing causal inferences about the relationships among EHR variables. Nevertheless, we anticipate that EHR-CTs will play a central role in answering research and regulatory questions.
Affiliation(s)
- Alan Leviton
  - Department of Neurology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Tobias Loddenkemper
  - Department of Neurology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
4
Syed R, Eden R, Makasi T, Chukwudi I, Mamudu A, Kamalpour M, Kapugama Geeganage D, Sadeghianasl S, Leemans SJJ, Goel K, Andrews R, Wynn MT, Ter Hofstede A, Myers T. Digital Health Data Quality Issues: Systematic Review. J Med Internet Res 2023; 25:e42615. [PMID: 37000497; PMCID: PMC10131725; DOI: 10.2196/42615] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/12/2022] [Revised: 12/07/2022] [Accepted: 12/31/2022] [Indexed: 04/01/2023]
Abstract
BACKGROUND The promise of digital health is principally dependent on the ability to electronically capture data that can be analyzed to improve decision-making. However, the ability to effectively harness data has proven elusive, largely because of the quality of the data captured. Despite the importance of data quality (DQ), an agreed-upon DQ taxonomy eludes the literature. When consolidated frameworks are developed, the dimensions are often fragmented, without consideration of the interrelationships among the dimensions or their resultant impact. OBJECTIVE The aim of this study was to develop a consolidated digital health DQ dimension and outcome (DQ-DO) framework to provide insights into 3 research questions: What are the dimensions of digital health DQ? How are the dimensions of digital health DQ related? and What are the impacts of digital health DQ? METHODS Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, a developmental systematic literature review was conducted of peer-reviewed literature focusing on digital health DQ in predominantly hospital settings. A total of 227 relevant articles were retrieved and inductively analyzed to identify digital health DQ dimensions and outcomes. The inductive analysis was performed through open coding, constant comparison, and card sorting with subject matter experts. Subsequently, a computer-assisted analysis was performed and verified by DQ experts to identify the interrelationships among the DQ dimensions and the relationships between DQ dimensions and outcomes. The analysis resulted in the development of the DQ-DO framework. RESULTS The digital health DQ-DO framework consists of 6 dimensions of DQ, namely accessibility, accuracy, completeness, consistency, contextual validity, and currency; interrelationships among the dimensions of digital health DQ, with consistency being the most influential dimension impacting all other digital health DQ dimensions; 5 digital health DQ outcomes, namely clinical, clinician, research-related, business process, and organizational outcomes; and relationships between the digital health DQ dimensions and DQ outcomes, with the consistency and accessibility dimensions impacting all DQ outcomes. CONCLUSIONS The DQ-DO framework developed in this study demonstrates the complexity of digital health DQ and the necessity for reducing digital health DQ issues. The framework further provides health care executives with holistic insights into DQ issues and resultant outcomes, which can help them prioritize which DQ-related problems to tackle first.
Affiliation(s)
- Rehan Syed
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Rebekah Eden
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Tendai Makasi
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Ignatius Chukwudi
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Azumah Mamudu
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Mostafa Kamalpour
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Dakshi Kapugama Geeganage
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Sareh Sadeghianasl
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Sander J J Leemans
  - Rheinisch-Westfälische Technische Hochschule, Aachen University, Aachen, Germany
- Kanika Goel
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Robert Andrews
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Moe Thandar Wynn
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Arthur Ter Hofstede
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Trina Myers
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
5
Tute E, Mast M, Wulff A. Targeted Data Quality Analysis for a Clinical Decision Support System for SIRS Detection in Critically Ill Pediatric Patients. Methods Inf Med 2023. [PMID: 36630987; DOI: 10.1055/s-0042-1760238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
Abstract
BACKGROUND Data quality issues can cause false decisions of clinical decision support systems (CDSSs). Analyzing local data quality has the potential to prevent data quality-related failure of CDSS adoption. OBJECTIVES To define a shareable set of applicable measurement methods (MMs) for a targeted data quality assessment determining the suitability of local data for our CDSS. METHODS We derived task-specific MMs using four approaches: (1) a GUI-based data quality analysis using the open source tool openCQA; (2) analysis of cases of known false CDSS decisions; (3) data-driven learning on MM results; and (4) a systematic check to find blind spots in our set of MMs based on the HIDQF data quality framework. We expressed the derived data quality-related knowledge about the CDSS using the 5-tuple formalization for MMs. RESULTS We identified some task-specific dataset characteristics that a targeted data quality assessment for our use case should inspect. Altogether, we defined 394 MMs organized in 13 data quality knowledge bases. CONCLUSIONS We have created a set of shareable, applicable MMs that can support targeted data quality assessment for CDSS-based systemic inflammatory response syndrome (SIRS) detection in critically ill pediatric patients. With the demonstrated approaches for deriving and expressing task-specific MMs, we intend to help promote targeted data quality assessment as a commonly recognized, routine part of research on data-consuming application systems in health care.
Affiliation(s)
- Erik Tute
  - Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, Hannover, Niedersachsen, Germany
- Marcel Mast
  - Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, Hannover, Niedersachsen, Germany
- Antje Wulff
  - Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, Hannover, Niedersachsen, Germany
  - Big Data in Medicine, Department of Health Services Research, School of Medicine and Health Sciences, Carl von Ossietzky University Oldenburg, Oldenburg, Niedersachsen, Germany
6
Blanes-Selva V, Asensio-Cuesta S, Doñate-Martínez A, Pereira Mesquita F, García-Gómez JM. User-centred design of a clinical decision support system for palliative care: Insights from healthcare professionals. Digit Health 2023; 9:20552076221150735. [PMID: 36644661; PMCID: PMC9837281; DOI: 10.1177/20552076221150735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 12/26/2022] [Indexed: 01/13/2023]
Abstract
Objective Although clinical decision support systems (CDSS) have many benefits for clinical practice, they also face several barriers to acceptance by professionals. Our objective in this study was to design and validate The Aleph palliative care (PC) CDSS through a user-centred method, considering the predictions of the artificial intelligence (AI) core, usability and user experience (UX). Methods We performed two rounds of individual evaluation sessions with potential users. Each session included a model evaluation, a task test and a usability and UX assessment. Results The machine learning (ML) predictive models outperformed the participants in the three predictive tasks. System Usability Scale (SUS) scores were 62.7 ± 14.1 and 65 ± 26.2 on a 100-point rating scale for the first and second rounds, respectively, while User Experience Questionnaire - Short Version (UEQ-S) scores were 1.42 and 1.5 on the -3 to 3 scale. Conclusions The think-aloud method and the inclusion of the UX dimension helped us to identify most of the workflow implementation issues. The system has good UX hedonic qualities; participants were interested in the tool and responded positively to it. Performance regarding usability was modest but acceptable.
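For readers unfamiliar with how a 100-point SUS score arises, the standard SUS scoring rule can be sketched in Python as follows. The abstract does not report per-item responses, so the response vectors below are invented purely for illustration; the function name `sus_score` is likewise an assumption of this sketch.

```python
def sus_score(responses: list[int]) -> float:
    """Standard System Usability Scale scoring.

    `responses` holds the 10 item answers on a 1-5 scale, in questionnaire
    order. Odd-numbered items contribute (answer - 1), even-numbered items
    contribute (5 - answer); the sum is scaled by 2.5 to a 0-100 range.
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for item_number, answer in enumerate(responses, start=1):
        total += (answer - 1) if item_number % 2 == 1 else (5 - answer)
    return total * 2.5


if __name__ == "__main__":
    # Invented responses from one hypothetical participant.
    print(sus_score([4, 2, 4, 2, 3, 3, 4, 2, 4, 3]))
```

A round-level figure such as those reported in the abstract would then be the mean and standard deviation of the per-participant scores.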
Affiliation(s)
- Vicent Blanes-Selva
  - Biomedical Data Science Lab, Instituto Universitarios de Tecnologías de La Información y Comunicaciones (ITACA), Universitat Politècnica de València, Valencia, 46022, Spain
- Sabina Asensio-Cuesta
  - Biomedical Data Science Lab, Instituto Universitarios de Tecnologías de La Información y Comunicaciones (ITACA), Universitat Politècnica de València, Valencia, Spain
- Felipe Pereira Mesquita
  - Divisão de Hematologia, departamento de Clínica Médica, da Universidade Federal de Juiz de Fora, Minas Gerais, Brasil
- Juan M. García-Gómez
  - Biomedical Data Science Lab, Instituto Universitarios de Tecnologías de La Información y Comunicaciones (ITACA), Universitat Politècnica de València, Valencia, Spain
7
Li Y, Salimi-Khorshidi G, Rao S, Canoy D, Hassaine A, Lukasiewicz T, Rahimi K, Mamouei M. Validation of risk prediction models applied to longitudinal electronic health record data for the prediction of major cardiovascular events in the presence of data shifts. Eur Heart J Digit Health 2022; 3:535-547. [PMID: 36710898; PMCID: PMC9779795; DOI: 10.1093/ehjdh/ztac061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 09/22/2022] [Indexed: 12/24/2022]
Abstract
Aims Deep learning has dominated predictive modelling across different fields, but in medicine it has been met with mixed reception. In clinical practice, simple, statistical models and risk scores continue to inform cardiovascular disease risk predictions. This is due in part to the knowledge gap about how deep learning models perform in practice when they are subject to dynamic data shifts; a key criterion that common internal validation procedures do not address. We evaluated the performance of a novel deep learning model, BEHRT, under data shifts and compared it with several ML-based and established risk models. Methods and results Using linked electronic health records of 1.1 million patients across England aged at least 35 years between 1985 and 2015, we replicated three established statistical models for predicting 5-year risk of incident heart failure, stroke, and coronary heart disease. The results were compared with a widely accepted machine learning model (random forests), and a novel deep learning model (BEHRT). In addition to internal validation, we investigated how data shifts affect model discrimination and calibration. To this end, we tested the models on cohorts from (i) distinct geographical regions; (ii) different periods. Using internal validation, the deep learning models substantially outperformed the best statistical models by 6%, 8%, and 11% in heart failure, stroke, and coronary heart disease, respectively, in terms of the area under the receiver operating characteristic curve. Conclusion The performance of all models declined as a result of data shifts; despite this, the deep learning models maintained the best performance in all risk prediction tasks. Updating the model with the latest information can improve discrimination but if the prior distribution changes, the model may remain miscalibrated.
Affiliation(s)
- Yikuan Li
  - Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
- Gholamreza Salimi-Khorshidi
  - Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
- Shishir Rao
  - Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
- Dexter Canoy
  - Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
  - NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Abdelaali Hassaine
  - Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
- Kazem Rahimi
  - Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
  - NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Mohammad Mamouei
  - Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
8
Souza J, Caballero I, Vasco Santos J, Fernandes Lobo M, Pinto A, Viana J, Sáez C, Lopes F, Freitas A. Multisource and temporal variability in Portuguese hospital administrative datasets: data quality implications. J Biomed Inform 2022; 136:104242. [DOI: 10.1016/j.jbi.2022.104242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2021] [Revised: 08/18/2022] [Accepted: 11/06/2022] [Indexed: 11/13/2022]
9
Singh H, Mhasawade V, Chunara R. Generalizability challenges of mortality risk prediction models: A retrospective analysis on a multi-center database. PLOS Digit Health 2022; 1:e0000023. [PMID: 36812510; PMCID: PMC9931319; DOI: 10.1371/journal.pdig.0000023] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/03/2021] [Accepted: 02/17/2022] [Indexed: 12/23/2022]
Abstract
Modern predictive models require large amounts of data for training and evaluation, absence of which may result in models that are specific to certain locations, populations in them and clinical practices. Yet, best practices for clinical risk prediction models have not yet considered such challenges to generalizability. Here we ask whether population- and group-level performance of mortality prediction models vary significantly when applied to hospitals or geographies different from the ones in which they are developed. Further, what characteristics of the datasets explain the performance variation? In this multi-center cross-sectional study, we analyzed electronic health records from 179 hospitals across the US with 70,126 hospitalizations from 2014 to 2015. Generalization gap, defined as difference between model performance metrics across hospitals, is computed for area under the receiver operating characteristic curve (AUC) and calibration slope. To assess model performance by the race variable, we report differences in false negative rates across groups. Data were also analyzed using a causal discovery algorithm "Fast Causal Inference" that infers paths of causal influence while identifying potential influences associated with unmeasured variables. When transferring models across hospitals, AUC at the test hospital ranged from 0.777 to 0.832 (1st-3rd quartile or IQR; median 0.801); calibration slope from 0.725 to 0.983 (IQR; median 0.853); and disparity in false negative rates from 0.046 to 0.168 (IQR; median 0.092). Distribution of all variable types (demography, vitals, and labs) differed significantly across hospitals and regions. The race variable also mediated differences in the relationship between clinical variables and mortality, by hospital/region. In conclusion, group-level performance should be assessed during generalizability checks to identify potential harms to the groups. Moreover, for developing methods to improve model performance in new environments, a better understanding and documentation of provenance of data and health processes are needed to identify and mitigate sources of variation.
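The generalization gap for AUC described above can be made concrete with a short Python sketch. The abstract defines the gap only as a difference between performance metrics across hospitals, so the sign convention used here (source hospital minus test hospital) and the function names are assumptions; the AUC itself is computed with the standard pairwise (Mann-Whitney) formulation, and the scores are invented.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def generalization_gap(auc_source: float, auc_target: float) -> float:
    """Assumed convention: performance at the development hospital minus
    performance at the hospital the model is transferred to."""
    return auc_source - auc_target


if __name__ == "__main__":
    # Invented predictions at a development hospital and a test hospital.
    source_auc = auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
    target_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
    print(source_auc, target_auc, generalization_gap(source_auc, target_auc))
```

The same gap function applies unchanged to other metrics such as calibration slope, provided both values are computed the same way at each site.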
Affiliation(s)
- Rumi Chunara
  - New York University, Tandon School of Engineering
  - New York University, School of Global Public Health
10
Blanes-Selva V, Doñate-Martínez A, Linklater G, García-Gómez JM. Complementary frailty and mortality prediction models on older patients as a tool for assessing palliative care needs. Health Informatics J 2022; 28:14604582221092592. [PMID: 35642719; DOI: 10.1177/14604582221092592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Palliative care (PC) has demonstrated benefits for life-limiting illnesses. Poor survival prognosis and patients' decline are working criteria to guide PC decision-making for older patients. Still, there is no clear consensus on when to initiate early PC. This work proposes machine learning approaches to predict frailty and mortality in older patients in support of PC decision-making. Predictive models based on Gradient Boosting Machines (GBM) and Deep Neural Networks (DNN) were implemented for binary 1-year mortality classification, survival estimation and 1-year frailty classification. In addition, we tested the similarity between mortality and frailty distributions. The 1-year mortality classifier achieved an area under the receiver operating characteristic curve (AUC ROC) of 0.87 [0.86, 0.87], whereas the mortality regression model achieved a mean absolute error (MAE) of 333.13 [323.10, 342.49] days. Moreover, the 1-year frailty classifier obtained an AUC ROC of 0.89 [0.88, 0.90]. Mortality and frailty criteria were weakly correlated and had different distributions, which suggests that these assessment measures are complementary for PC decision-making. This study provides new models that can be part of decision-making systems for PC services in older patients after their external validation.
Affiliation(s)
- Vicent Blanes-Selva
  - Biomedical Data Science Lab, Instituto Universitarios de Tecnologías de La Información y Comunicaciones (ITACA), Universitat Politècnica de València, Valencia, Spain
- Juan M García-Gómez
  - Biomedical Data Science Lab, Instituto Universitarios de Tecnologías de La Información y Comunicaciones (ITACA), Universitat Politècnica de València, Valencia, Spain
11
Zhou L, Romero N, Martínez-Miranda J, Conejero JA, García-Gómez JM, Sáez C. Subphenotyping of COVID-19 patients at pre-admission towards anticipated severity stratification: an analysis of 778 692 Mexican patients through an age-sex unbiased meta-clustering technique. JMIR Public Health Surveill 2022; 8:e30032. [PMID: 35144239; PMCID: PMC9098229; DOI: 10.2196/30032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2021] [Revised: 01/29/2022] [Accepted: 02/08/2022] [Indexed: 11/13/2022]
Abstract
Background The COVID-19 pandemic has led to an unprecedented global health care challenge for both medical institutions and researchers. Recognizing different COVID-19 subphenotypes (the division of populations of patients into more meaningful subgroups driven by clinical features) and their severity characterization may assist clinicians during the clinical course, the vaccination process, research efforts, the surveillance system, and the allocation of limited resources. Objective We aimed to discover age-sex unbiased COVID-19 patient subphenotypes based on easily available phenotypical data before admission, such as pre-existing comorbidities, lifestyle habits, and demographic features, and to study the potential early severity stratification capabilities of the discovered subgroups by characterizing their severity patterns, including prognostic, intensive care unit (ICU), and morbimortality outcomes. Methods We used the Mexican Government COVID-19 open data, including population-based, patient-level data on 778,692 SARS-CoV-2 patients as of September 2020. We applied a meta-clustering technique that consists of a 2-stage clustering approach combining dimensionality reduction (ie, principal components analysis and multiple correspondence analysis) and hierarchical clustering using the Ward minimum variance method with Euclidean squared distance. Results In the independent age-sex clustering analyses, 56 clusters supported 11 clinically distinguishable meta-clusters (MCs). MCs 1-3 showed high recovery rates (90.27%-95.22%), including healthy patients of all ages, children with comorbidities and priority in receiving medical resources (ie, higher rates of hospitalization, intubation, and ICU admission) compared with other adult subgroups that have similar conditions, and young obese smokers. MCs 4-5 showed moderate recovery rates (81.30%-82.81%), including patients with hypertension or diabetes of all ages and obese patients with pneumonia, hypertension, and diabetes. MCs 6-11 showed low recovery rates (53.96%-66.94%), including immunosuppressed patients with high comorbidity rates, patients with chronic kidney disease with a poor survival length and probability of recovery, older smokers with chronic obstructive pulmonary disease, older adults with severe diabetes and hypertension, and the oldest obese smokers with chronic obstructive pulmonary disease and mild cardiovascular disease. Group outcomes conformed to the recent literature on dedicated age-sex groups. Mexican states and several types of clinical institutions showed relevant heterogeneity regarding severity, potentially linked to socioeconomic or health inequalities. Conclusions The proposed 2-stage cluster analysis methodology produced a discriminative characterization of the sample and explainability over age and sex. These results can potentially help in understanding the clinical patient and their stratification for automated early triage before further tests and laboratory results are available, even in locations where additional tests are not available, or help decide resource allocation among vulnerable subgroups, for example to prioritize vaccination or treatments.
Affiliation(s)
- Lexin Zhou
  - Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones (ITACA), Universitat Politècnica de València (UPV), Valencia, ES
- Nekane Romero
  - Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones (ITACA), Universitat Politècnica de València (UPV), Valencia, ES
- Juan Martínez-Miranda
  - CONACyT - Centro de Investigación Científica y de Educación Superior de Ensenada - CICESE-UT3, Ensenada, MX
- J Alberto Conejero
  - Instituto Universitario de Matemática Pura y Aplicada (IUMPA), Universitat Politècnica de València, Valencia, ES
- Juan M García-Gómez
  - Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones (ITACA), Universitat Politècnica de València (UPV), Valencia, ES
- Carlos Sáez
  - Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones (ITACA), Universitat Politècnica de València (UPV), Camino de Vera s/n, Valencia 46022, ES
|
12
|
Dockès J, Varoquaux G, Poline JB. Preventing dataset shift from breaking machine-learning biomarkers. Gigascience 2021; 10:giab055. [PMID: 34585237 PMCID: PMC8478611 DOI: 10.1093/gigascience/giab055] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 03/10/2021] [Revised: 07/06/2021] [Accepted: 08/02/2021] [Indexed: 01/20/2023] Open
Abstract
Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g., because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning-extracted biomarkers, as well as detection and correction strategies.
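The abstract does not say which correction strategies the article covers. As one commonly used illustration for covariate shift (for example, a recruitment bias in age groups), the sketch below reweights validation errors from the source cohort by the ratio of target to source group frequencies. The function names and the toy data are hypothetical, and the approach assumes every group seen in the source cohort can be matched in the target population.

```python
from collections import Counter


def importance_weights(source_groups, target_groups):
    """Weight for each group: target frequency divided by source frequency."""
    src, tgt = Counter(source_groups), Counter(target_groups)
    n_src, n_tgt = len(source_groups), len(target_groups)
    return {g: (tgt[g] / n_tgt) / (src[g] / n_src) for g in src}


def reweighted_error(errors, groups, weights):
    """Weighted mean of per-sample errors (0 = correct, 1 = wrong)."""
    total = sum(weights[g] * e for e, g in zip(errors, groups))
    return total / sum(weights[g] for g in groups)


# Hypothetical cohort: mostly young participants, while the target is mostly old.
source = ["young", "young", "young", "old"]
target = ["young", "old", "old", "old"]
errors = [0, 0, 0, 1]  # the biomarker fails only on the older participant

weights = importance_weights(source, target)
naive = sum(errors) / len(errors)
corrected = reweighted_error(errors, source, weights)
```

In this toy case the unweighted error understates what would be seen in the target population, because the group on which the biomarker fails is underrepresented in the source cohort.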
Affiliation(s)
- Jérôme Dockès
  - McGill University, 845 Sherbrooke St W, Montreal, Quebec H3A 0G4, Canada
- Gaël Varoquaux
  - McGill University, 845 Sherbrooke St W, Montreal, Quebec H3A 0G4, Canada
  - INRIA
|
13
|
Aerts H, Kalra D, Sáez C, Ramírez-Anguita JM, Mayer MA, Garcia-Gomez JM, Durà-Hernández M, Thienpont G, Coorevits P. Quality of Hospital Electronic Health Record (EHR) Data Based on the International Consortium for Health Outcomes Measurement (ICHOM) in Heart Failure: Pilot Data Quality Assessment Study. JMIR Med Inform 2021; 9:e27842. [PMID: 34346902 PMCID: PMC8374665 DOI: 10.2196/27842] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/09/2021] [Revised: 05/30/2021] [Accepted: 06/05/2021] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND: There is increasing recognition that health care providers need to focus attention on, and be judged against, the impact they have on the health outcomes experienced by patients. The measurement of health outcomes as a routine part of clinical documentation is probably the only scalable way of collecting outcomes evidence, since secondary data collection is expensive and error-prone. However, it is uncertain whether routinely collected clinical data within electronic health record (EHR) systems include the data most relevant to measuring and comparing outcomes, and whether those items are collected at a data quality good enough to be relied upon for outcomes assessment, since several studies have pointed out significant issues regarding EHR data availability and quality.
OBJECTIVE: In this paper, we first describe a practical approach to data quality assessment of health outcomes, based on a literature review of existing frameworks for quality assessment of health data and multistakeholder consultation. Adopting this approach, we performed a pilot study on a subset of 21 International Consortium for Health Outcomes Measurement (ICHOM) outcomes data items from patients with congestive heart failure.
METHODS: All available registries compatible with the diagnosis of heart failure within an EHR data repository of a general hospital (142,345 visits and 12,503 patients) were extracted and mapped to the ICHOM format. We focused our pilot assessment on 5 commonly used data quality dimensions: completeness, correctness, consistency, uniqueness, and temporal stability.
RESULTS: We found high scores (>95%) for the consistency, completeness, and uniqueness dimensions. Temporal stability analyses showed some changes over time in the reported use of medication to treat heart failure, as well as in the recording of past medical conditions. Finally, the investigation of data correctness suggested several issues concerning the characterization of missing data values. Many of these issues appear to be introduced while mapping the IMASIS-2 relational database contents to the ICHOM format, as the latter requires a level of detail that is not explicitly available in the coded data of an EHR.
CONCLUSIONS: Overall, results of this pilot study revealed good data quality for the subset of heart failure outcomes collected at the Hospital del Mar. Nevertheless, some important data errors were identified that were caused by fundamentally different data collection practices in routine clinical care versus research, for which the ICHOM standard set was originally developed. To truly examine to what extent hospitals today are able to routinely collect the evidence of their success in achieving good health outcomes, future research would benefit from performing more extensive data quality assessments, including all data items from the ICHOM standard set and across multiple hospitals.
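Two of the dimensions named in the Methods, completeness and uniqueness, can be expressed as simple ratios. The sketch below uses assumed definitions (share of non-missing values, and share of distinct key values), which may differ from those used in the study; the field names `patient_id` and `lvef` are hypothetical.

```python
def completeness(records, field):
    """Share of records in which `field` is present and non-empty."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records, key_fields):
    """Share of records that are distinct on the given key fields."""
    keys = [tuple(r.get(f) for f in key_fields) for r in records]
    return len(set(keys)) / len(keys)


# Hypothetical extract: one duplicated patient_id and two missing values.
records = [
    {"patient_id": 1, "lvef": 55},
    {"patient_id": 2, "lvef": None},
    {"patient_id": 2, "lvef": 40},
    {"patient_id": 3, "lvef": ""},
]

lvef_completeness = completeness(records, "lvef")
id_uniqueness = uniqueness(records, ["patient_id"])
```

Scores like these are only as meaningful as the encoding of missing values, which is the kind of issue the Results attribute to the mapping into the ICHOM format.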
Affiliation(s)
- Hannelore Aerts
  - Medical Informatics and Statistics Unit, Department of Public Health and Primary Care, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
  - The European Institute for Innovation through Health Data (i~HD), Ghent, Belgium
- Dipak Kalra
  - Medical Informatics and Statistics Unit, Department of Public Health and Primary Care, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
  - The European Institute for Innovation through Health Data (i~HD), Ghent, Belgium
- Carlos Sáez
  - Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Valencia, Spain
- Juan Manuel Ramírez-Anguita
  - Research Programme on Biomedical Informatics, Hospital del Mar Medical Research Institute and Universitat Pompeu Fabra, Barcelona, Spain
- Miguel-Angel Mayer
  - Research Programme on Biomedical Informatics, Hospital del Mar Medical Research Institute and Universitat Pompeu Fabra, Barcelona, Spain
- Juan M Garcia-Gomez
  - Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Valencia, Spain
- Marta Durà-Hernández
  - Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Valencia, Spain
- Geert Thienpont
  - The European Institute for Innovation through Health Data (i~HD), Ghent, Belgium
  - Research in Advanced Medical Informatics and Telematics (RAMIT), Ghent, Belgium
- Pascal Coorevits
  - Medical Informatics and Statistics Unit, Department of Public Health and Primary Care, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
|
14
|
Guo LL, Pfohl SR, Fries J, Posada J, Fleming SL, Aftandilian C, Shah N, Sung L. Systematic Review of Approaches to Preserve Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine. Appl Clin Inform 2021; 12:808-815. [PMID: 34470057 PMCID: PMC8410238 DOI: 10.1055/s-0041-1735184] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 04/28/2021] [Accepted: 07/12/2021] [Indexed: 10/20/2022] Open
Abstract
OBJECTIVE: The change in performance of machine learning models over time as a result of temporal dataset shift is a barrier to machine learning-derived models facilitating decision-making in clinical practice. Our aim was to describe technical procedures used to preserve the performance of machine learning models in the presence of temporal dataset shifts.
METHODS: Studies were included if they were fully published articles that used machine learning and implemented a procedure to mitigate the effects of temporal dataset shift in a clinical setting. We described how dataset shift was measured, the procedures used to preserve model performance, and their effects.
RESULTS: Of 4,457 potentially relevant publications identified, 15 were included. The impact of temporal dataset shift was primarily quantified using changes, usually deterioration, in calibration or discrimination. Calibration deterioration was more common (n = 11) than discrimination deterioration (n = 3). Mitigation strategies were categorized as model level or feature level. Model-level approaches (n = 15) were more common than feature-level approaches (n = 2), with the most common approaches being model refitting (n = 12), probability calibration (n = 7), model updating (n = 6), and model selection (n = 6). In general, all mitigation strategies were successful at preserving calibration but not uniformly successful in preserving discrimination.
CONCLUSION: There was limited research in preserving the performance of machine learning models in the presence of temporal dataset shift in clinical medicine. Future research could focus on the impact of dataset shift on clinical decision making, benchmark the mitigation strategies on a wider range of datasets and tasks, and identify optimal strategies for specific settings.
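The distinction between calibration and discrimination drawn in the Results can be made concrete with a small sketch. It assumes two simple measures chosen for illustration (calibration-in-the-large and a pairwise AUROC), not the specific metrics used by the reviewed studies, and the two time windows contain made-up predictions.

```python
def calibration_in_the_large(probs, labels):
    """Mean predicted risk minus observed event rate (0 means well calibrated)."""
    return sum(probs) / len(probs) - sum(labels) / len(labels)


def auroc(probs, labels):
    """Probability that a random positive is scored above a random negative,
    counting ties as one half. Requires at least one positive and one negative."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions from the same model in two time windows.
probs = [0.1, 0.2, 0.8, 0.9]
early_labels = [0, 0, 1, 1]  # event rate 50%
late_labels = [0, 1, 1, 1]   # event rate rises to 75% after the shift

early = (calibration_in_the_large(probs, early_labels), auroc(probs, early_labels))
late = (calibration_in_the_large(probs, late_labels), auroc(probs, late_labels))
```

In this toy example the ranking of patients stays perfect in both windows while the average predicted risk falls behind the observed event rate, which is the pattern of calibration deteriorating without a loss of discrimination.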
Affiliation(s)
- Lin Lawrence Guo
  - Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Canada
- Stephen R. Pfohl
  - Biomedical Informatics Research, Stanford University, Palo Alto, California, United States
- Jason Fries
  - Biomedical Informatics Research, Stanford University, Palo Alto, California, United States
- Jose Posada
  - Biomedical Informatics Research, Stanford University, Palo Alto, California, United States
- Scott Lanyon Fleming
  - Biomedical Informatics Research, Stanford University, Palo Alto, California, United States
- Catherine Aftandilian
  - Division of Pediatric Hematology/Oncology, Stanford University, Palo Alto, United States
- Nigam Shah
  - Biomedical Informatics Research, Stanford University, Palo Alto, California, United States
- Lillian Sung
  - Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Canada
  - Division of Haematology/Oncology, The Hospital for Sick Children, Toronto, Canada
|
15
|
Deep ensemble multitask classification of emergency medical call incidents combining multimodal data improves emergency medical dispatch. Artif Intell Med 2021; 117:102088. [PMID: 34127234 DOI: 10.1016/j.artmed.2021.102088] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/09/2020] [Revised: 04/19/2021] [Accepted: 05/03/2021] [Indexed: 11/20/2022]
Abstract
The objective of this work was to develop a predictive model to help non-clinical dispatchers classify emergency medical call incidents by their life-threatening level (yes/no), admissible response delay (undelayable, minutes, hours, days) and emergency system jurisdiction (emergency system/primary care) in real time. We used a total of 1,244,624 independent incidents from the Valencian emergency medical dispatch service in Spain, compiled retrospectively from 2009 to 2012, including clinical features, demographics, circumstantial factors and free text dispatcher observations. Based on them, we designed and developed DeepEMC2, a deep ensemble multitask model integrating four subnetworks: three specialized in context, clinical and text data, respectively, and another to ensemble the former. The four subnetworks are composed in turn of multi-layer perceptron modules, bidirectional long short-term memory units and a bidirectional encoder representations from transformers (BERT) module. DeepEMC2 showed a macro F1-score of 0.759 in life-threatening classification, 0.576 in admissible response delay and 0.757 in emergency system jurisdiction. These results show a substantial performance increase of 12.5%, 17.5% and 5.1%, respectively, with respect to the current in-house triage protocol of the Valencian emergency medical dispatch service. In addition, DeepEMC2 significantly outperformed a set of baseline machine learning models, including naive Bayes, logistic regression, random forest and gradient boosting (α = 0.05). Hence, DeepEMC2 is able to: 1) capture information present in emergency medical calls not considered by the existing triage protocol, and 2) model complex data dependencies that the tested baseline models could not capture. Likewise, our results suggest that most of this unconsidered information is present in the free text dispatcher observations.
To our knowledge, this study describes the first deep learning model undertaking emergency medical call incident classification. Its adoption in medical dispatch centers would potentially improve emergency dispatch processes, resulting in a positive impact on patient wellbeing and health services sustainability.
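The macro F1-score reported for each task is the unweighted mean of the per-class F1 values, so rare classes count as much as frequent ones. A minimal sketch follows; the labels reuse the delay categories named in the abstract, but the predictions are invented for illustration and do not reproduce any result from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, with F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical predictions for the admissible response delay task.
y_true = ["undelayable", "undelayable", "minutes", "minutes", "hours"]
y_pred = ["undelayable", "minutes", "minutes", "minutes", "hours"]
score = macro_f1(y_true, y_pred)
```

Because each class contributes equally, a model that ignores an infrequent but critical class is penalized more heavily than it would be under plain accuracy.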
|
16
|
Sáez C, Romero N, Conejero JA, García-Gómez JM. Potential limitations in COVID-19 machine learning due to data source variability: A case study in the nCov2019 dataset. J Am Med Inform Assoc 2021; 28:360-364. [PMID: 33027509 PMCID: PMC7797735 DOI: 10.1093/jamia/ocaa258] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 06/19/2020] [Revised: 09/07/2020] [Accepted: 09/28/2020] [Indexed: 02/02/2023] Open
Abstract
OBJECTIVE: The lack of representative coronavirus disease 2019 (COVID-19) data is a bottleneck for reliable and generalizable machine learning. Data sharing is insufficient without data quality, in which source variability plays an important role. We showcase and discuss potential biases from data source variability for COVID-19 machine learning.
MATERIALS AND METHODS: We used the publicly available nCov2019 dataset, including patient-level data from several countries. We aimed to discover and classify severity subgroups using symptoms and comorbidities.
RESULTS: Cases from the 2 countries with the highest prevalence were divided into separate subgroups with distinct severity manifestations. This variability can reduce the representativeness of training data with respect to the model's target populations and increase model complexity, raising the risk of overfitting.
CONCLUSIONS: Data source variability is a potential contributor to bias in distributed research networks. We call for systematic assessment and reporting of data source variability and data quality in COVID-19 data sharing, as key information for reliable and generalizable machine learning.
Affiliation(s)
- Carlos Sáez
  - Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, Spain
- Nekane Romero
  - Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, Spain
- J Alberto Conejero
  - Instituto Universitario de Matemática Pura y Aplicada, Universitat Politècnica de València, Valencia, Spain
- Juan M García-Gómez
  - Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, Spain
|