1
|
Usyk M, Hayes RB, Knight R, Gonzalez A, Li H, Osman I, Weber JS, Ahn J. Gut microbiome is associated with recurrence-free survival in patients with resected Stage IIIB-D or Stage IV melanoma treated with immune checkpoint inhibitors. bioRxiv 2024:2024.04.16.589761. [PMID: 38659744 PMCID: PMC11042335 DOI: 10.1101/2024.04.16.589761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
The gut microbiome (GMB) has been associated with outcomes of immune checkpoint blockade therapy in melanoma, but there is limited consensus on the specific taxa involved, particularly across different geographic regions. We analyzed pre-treatment stool samples from 674 melanoma patients participating in a phase-III trial of adjuvant nivolumab plus ipilimumab versus nivolumab, across three continents and five regions. Longitudinal analysis revealed that GMB was largely unchanged following treatment, offering promise for lasting GMB-based interventions. In region-specific and cross-region meta-analyses, we identified pre-treatment taxonomic markers associated with recurrence, including Eubacterium, Ruminococcus, Firmicutes, and Clostridium. Recurrence prediction by these markers was best achieved across regions by matching participants on GMB compositional similarity between the intra-regional discovery and external validation sets. AUCs for prediction ranged from 0.83-0.94 (depending on the initial discovery region) for patients closely matched on GMB composition (e.g., JSD ≤0.11). This evidence indicates that taxonomic markers for prediction of recurrence are generalizable across regions, for individuals of similar GMB composition.
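The matching step described in this abstract relies on the Jensen-Shannon divergence (JSD) between gut microbiome compositions. The following is a minimal sketch of that idea, not the authors' pipeline: it assumes each sample is a vector of relative abundances over the same ordered taxa, uses base-2 logarithms so that JSD lies in [0, 1], and treats the 0.11 cutoff as a divergence (the abstract does not say whether the reported value is the divergence or its square root). The function names `jensen_shannon_divergence` and `matched_ids` are hypothetical.

```python
import math


def jensen_shannon_divergence(p, q, base=2.0):
    """JSD between two abundance vectors defined over the same ordered taxa."""
    if len(p) != len(q):
        raise ValueError("both samples must cover the same taxa")
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]  # normalise to relative abundances
    q = [x / total_q for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence; zero-abundance terms contribute nothing
        return sum(x * math.log(x / y, base) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def matched_ids(discovery, validation, threshold=0.11):
    """Validation sample IDs whose nearest discovery sample is within threshold."""
    keep = []
    for sample_id, profile in validation.items():
        nearest = min(
            jensen_shannon_divergence(profile, ref) for ref in discovery.values()
        )
        if nearest <= threshold:
            keep.append(sample_id)
    return keep
```

For example, a validation sample with the same composition as a discovery sample has a JSD of 0 and is retained, whereas a sample dominated by a single taxon that is rare in the discovery set falls well above the cutoff and is excluded.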
Affiliation(s)
- Mykhaylo Usyk
- Department of Population Health, NYU Grossman School of Medicine, New York, NY, USA
| | - Richard B. Hayes
- Department of Population Health, NYU Grossman School of Medicine, New York, NY, USA
- NYU Laura and Isaac Perlmutter Cancer Center, New York, NY, USA
| | - Rob Knight
- Departments of Pediatrics, Computer Science & Engineering, and Bioengineering; Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA
| | - Antonio Gonzalez
- Departments of Pediatrics, Computer Science & Engineering, and Bioengineering; Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA
| | - Huilin Li
- Department of Population Health, NYU Grossman School of Medicine, New York, NY, USA
- NYU Laura and Isaac Perlmutter Cancer Center, New York, NY, USA
| | - Iman Osman
- NYU Laura and Isaac Perlmutter Cancer Center, New York, NY, USA
- The Ronald O. Perelman Department of Dermatology, NYU Grossman School of Medicine, New York, NY, USA
- Department of Medicine, NYU Grossman School of Medicine, New York, NY, USA
| | - Jeffrey S. Weber
- NYU Laura and Isaac Perlmutter Cancer Center, New York, NY, USA
- Department of Medicine, NYU Grossman School of Medicine, New York, NY, USA
| | - Jiyoung Ahn
- Department of Population Health, NYU Grossman School of Medicine, New York, NY, USA
- NYU Laura and Isaac Perlmutter Cancer Center, New York, NY, USA
|
2
|
Souza J, Caballero I, Vasco Santos J, Fernandes Lobo M, Pinto A, Viana J, Sáez C, Lopes F, Freitas A. Multisource and temporal variability in Portuguese hospital administrative datasets: data quality implications. J Biomed Inform 2022; 136:104242. [DOI: 10.1016/j.jbi.2022.104242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2021] [Revised: 08/18/2022] [Accepted: 11/06/2022] [Indexed: 11/13/2022]
|
3
|
Zhou L, Romero N, Martínez-Miranda J, Conejero JA, García-Gómez JM, Sáez C. Subphenotyping of COVID-19 patients at pre-admission towards anticipated severity stratification: an analysis of 778 692 Mexican patients through an age-sex unbiased meta-clustering technique. JMIR Public Health Surveill 2022; 8:e30032. [PMID: 35144239 PMCID: PMC9098229 DOI: 10.2196/30032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2021] [Revised: 01/29/2022] [Accepted: 02/08/2022] [Indexed: 11/13/2022]
Abstract
Background The COVID-19 pandemic has led to an unprecedented global health care challenge for both medical institutions and researchers. Recognizing different COVID-19 subphenotypes—the division of populations of patients into more meaningful subgroups driven by clinical features—and their severity characterization may assist clinicians during the clinical course, the vaccination process, research efforts, the surveillance system, and the allocation of limited resources. Objective We aimed to discover age-sex unbiased COVID-19 patient subphenotypes based on easily available phenotypical data before admission, such as pre-existing comorbidities, lifestyle habits, and demographic features, to study the potential early severity stratification capabilities of the discovered subgroups through characterizing their severity patterns, including prognostic, intensive care unit (ICU), and morbimortality outcomes. Methods We used the Mexican Government COVID-19 open data, including 778,692 SARS-CoV-2 population-based patient-level data as of September 2020. We applied a meta-clustering technique that consists of a 2-stage clustering approach combining dimensionality reduction (ie, principal components analysis and multiple correspondence analysis) and hierarchical clustering using the Ward minimum variance method with Euclidean squared distance. Results In the independent age-sex clustering analyses, 56 clusters supported 11 clinically distinguishable meta-clusters (MCs). MCs 1-3 showed high recovery rates (90.27%-95.22%), including healthy patients of all ages, children with comorbidities and priority in receiving medical resources (ie, higher rates of hospitalization, intubation, and ICU admission) compared with other adult subgroups that have similar conditions, and young obese smokers. MCs 4-5 showed moderate recovery rates (81.30%-82.81%), including patients with hypertension or diabetes of all ages and obese patients with pneumonia, hypertension, and diabetes. 
MCs 6-11 showed low recovery rates (53.96%-66.94%), including immunosuppressed patients with high comorbidity rates, patients with chronic kidney disease with a poor survival length and probability of recovery, older smokers with chronic obstructive pulmonary disease, older adults with severe diabetes and hypertension, and the oldest obese smokers with chronic obstructive pulmonary disease and mild cardiovascular disease. Group outcomes conformed to the recent literature on dedicated age-sex groups. Mexican states and several types of clinical institutions showed relevant heterogeneity regarding severity, potentially linked to socioeconomic or health inequalities. Conclusions The proposed 2-stage cluster analysis methodology produced a discriminative characterization of the sample and explainability over age and sex. These results can potentially help in understanding the clinical patient and their stratification for automated early triage before further tests and laboratory results are available and even in locations where additional tests are not available or to help decide resource allocation among vulnerable subgroups such as to prioritize vaccination or treatments.
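The hierarchical step named in the Methods (Ward minimum variance with squared Euclidean distance) can be illustrated with a small, naive sketch. This is not the authors' code: it omits the dimensionality reduction stage and the second (meta-clustering) stage, assumes the input is already a list of numeric coordinate vectors, and uses a hypothetical function name `ward_clusters`. At each step it merges the two clusters whose fusion increases the within-cluster sum of squares the least, which for clusters of sizes n_a and n_b with centroids c_a and c_b is n_a·n_b/(n_a+n_b)·‖c_a − c_b‖².

```python
def ward_clusters(points, k):
    """Agglomerate points until k clusters remain, using Ward's criterion.

    Returns the clusters as sorted lists of point indices.
    """
    # Each cluster is stored as (member indices, centroid coordinates).
    clusters = [([i], [float(x) for x in p]) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b were merged.
        na, nb = len(a[0]), len(b[0])
        squared_dist = sum((x - y) ** 2 for x, y in zip(a[1], b[1]))
        return na * nb / (na + nb) * squared_dist

    while len(clusters) > max(k, 1):
        # Exhaustive search over all pairs: O(n^2) per merge, fine for a demo.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        a, b = clusters[i], clusters[j]
        na, nb = len(a[0]), len(b[0])
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(a[1], b[1])]
        merged = (a[0] + b[0], centroid)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)

    return sorted(sorted(members) for members, _ in clusters)
```

In a two-stage setting such as the one described, a routine like this would presumably be run on the reduced coordinates within each age-sex group, and the resulting cluster summaries would then be clustered again; that second stage is not shown here.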
Affiliation(s)
- Lexin Zhou
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones (ITACA), Universitat Politècnica de València (UPV), Valencia, ES
| | - Nekane Romero
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones (ITACA), Universitat Politècnica de València (UPV), Valencia, ES
| | - Juan Martínez-Miranda
- CONACyT - Centro de Investigación Científica y de Educación Superior de Ensenada - CICESE-UT3, Ensenada, MX
| | - J Alberto Conejero
- Instituto Universitario de Matemática Pura y Aplicada (IUMPA), Universitat Politècnica de València, Valencia, ES
| | - Juan M García-Gómez
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones (ITACA), Universitat Politècnica de València (UPV), Valencia, ES
| | - Carlos Sáez
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones (ITACA), Universitat Politècnica de València (UPV), Camino de Vera s/n, Valencia 46022, España, Valencia, ES
|
4
|
Penzel T, Dietz-Terjung S, Woehrle H, Schöbel C. New Paths in Respiratory Sleep Medicine: Consumer Devices, e-Health, and Digital Health Measurements. Sleep Med Clin 2021; 16:619-634. [PMID: 34711386 DOI: 10.1016/j.jsmc.2021.08.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/16/2022]
Abstract
Sleep health and tracking sleep with contemporary wearables have become more popular. Sleep disorders, in particular, sleep-disordered breathing, have a higher prevalence than estimated previously. Many patients with apnea and hypopnea events suffer whereas others do not report complaints or show cardiovascular consequences. Assessment with wearables may support efforts to distinguish which type of apnea is related to aging and which to cardiovascular comorbidities. Innovative methods offer smart solutions for problems that are insufficiently addressed. Telemedical concepts help bring patients to sleep medicine expertise at an early stage. To use these methods clinically, they must be certified as medical devices.
Affiliation(s)
- Thomas Penzel
- Interdisciplinary Sleep Medicine Center, Charité - Universitätsmedizin Berlin, Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Berlin, Germany; Department of Biology, Saratov State University, Astrakhanskaya Str. 12, Saratov 410012, Russia.
| | - Sarah Dietz-Terjung
- Universitätsmedizin Essen, Ruhrlandklinik, Westdeutsches Lungenzentrum am Universitätsklinikum Essen gGmbH, Tüschener Weg 40, 45239 Essen, Germany
| | | | - Christoph Schöbel
- Universitätsmedizin Essen, Ruhrlandklinik, Westdeutsches Lungenzentrum am Universitätsklinikum Essen gGmbH, Tüschener Weg 40, 45239 Essen, Germany
|
5
|
Deep ensemble multitask classification of emergency medical call incidents combining multimodal data improves emergency medical dispatch. Artif Intell Med 2021; 117:102088. [PMID: 34127234 DOI: 10.1016/j.artmed.2021.102088] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/09/2020] [Revised: 04/19/2021] [Accepted: 05/03/2021] [Indexed: 11/20/2022]
Abstract
The objective of this work was to develop a predictive model to help non-clinical dispatchers classify emergency medical call incidents by their life-threatening level (yes/no), admissible response delay (undelayable, minutes, hours, days) and emergency system jurisdiction (emergency system/primary care) in real time. We used a total of 1 244 624 independent incidents from the Valencian emergency medical dispatch service in Spain, compiled retrospectively from 2009 to 2012, including clinical features, demographics, circumstantial factors and free-text dispatcher observations. Based on them, we designed and developed DeepEMC2, a deep ensemble multitask model integrating four subnetworks: three specialized in context, clinical and text data, respectively, and another to ensemble the former. The four subnetworks are in turn composed of multi-layer perceptron modules, bidirectional long short-term memory units and a bidirectional encoder representations from transformers (BERT) module. DeepEMC2 showed a macro F1-score of 0.759 in life-threatening classification, 0.576 in admissible response delay and 0.757 in emergency system jurisdiction. These results show a substantial performance increase of 12.5 %, 17.5 % and 5.1 %, respectively, with respect to the current in-house triage protocol of the Valencian emergency medical dispatch service. In addition, DeepEMC2 significantly outperformed a set of baseline machine learning models, including naive Bayes, logistic regression, random forest and gradient boosting (α = 0.05). Hence, DeepEMC2 is able to: 1) capture information present in emergency medical calls not considered by the existing triage protocol, and 2) model complex data dependencies that the tested baseline models could not capture. Likewise, our results suggest that most of this unconsidered information is present in the free-text dispatcher observations. To our knowledge, this study describes the first deep learning model for the classification of emergency medical call incidents. Its adoption in medical dispatch centers could improve emergency dispatch processes, with a positive impact on patient wellbeing and health services sustainability.
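The results above are reported as macro F1-scores. As a reminder of what that metric measures, here is a minimal sketch, assuming two parallel lists of true and predicted class labels; the function name `macro_f1` is illustrative and nothing here is taken from the DeepEMC2 code. Macro averaging computes F1 separately for each class and then takes the unweighted mean, so rare classes (for example, the less frequent response-delay categories) count as much as common ones.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # F1 = 2TP / (2TP + FP + FN); the denominator is never zero here
        # because every label occurs at least once in y_true or y_pred.
        per_class.append(2 * tp / (2 * tp + fp + fn))
    return sum(per_class) / len(per_class)
```

With true labels `["yes", "yes", "no", "no"]` and predictions `["yes", "no", "no", "no"]`, the per-class scores are 2/3 for "yes" and 4/5 for "no", giving a macro F1 of 11/15.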
|
6
|
Sáez C, Romero N, Conejero JA, García-Gómez JM. Potential limitations in COVID-19 machine learning due to data source variability: A case study in the nCov2019 dataset. J Am Med Inform Assoc 2021; 28:360-364. [PMID: 33027509 PMCID: PMC7797735 DOI: 10.1093/jamia/ocaa258] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 06/19/2020] [Revised: 09/07/2020] [Accepted: 09/28/2020] [Indexed: 02/02/2023]
Abstract
OBJECTIVE The lack of representative coronavirus disease 2019 (COVID-19) data is a bottleneck for reliable and generalizable machine learning. Data sharing is insufficient without data quality, in which source variability plays an important role. We showcase and discuss potential biases from data source variability for COVID-19 machine learning. MATERIALS AND METHODS We used the publicly available nCov2019 dataset, including patient-level data from several countries. We aimed to discover and classify severity subgroups using symptoms and comorbidities. RESULTS Cases from the 2 countries with the highest prevalence were divided into separate subgroups with distinct severity manifestations. This variability can reduce the representativeness of training data with respect to the model's target populations and increase model complexity, at the risk of overfitting. CONCLUSIONS Data source variability is a potential contributor to bias in distributed research networks. We call for systematic assessment and reporting of data source variability and data quality in COVID-19 data sharing, as key information for reliable and generalizable machine learning.
Affiliation(s)
- Carlos Sáez
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, España
| | - Nekane Romero
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, España
| | - J Alberto Conejero
- Instituto Universitario de Matemática Pura y Aplicada, Universitat Politécnica de València, Valencia, Spain
| | - Juan M García-Gómez
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, España
|
7
|
Catala ODT, Igual IS, Perez-Benito FJ, Escriva DM, Castello VO, Llobet R, Perez-Cortes JC. Bias Analysis on Public X-Ray Image Datasets of Pneumonia and COVID-19 Patients. IEEE Access 2021; 9:42370-42383. [PMID: 34812384 PMCID: PMC8545228 DOI: 10.1109/access.2021.3065456] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/10/2021] [Accepted: 03/07/2021] [Indexed: 05/03/2023]
Abstract
Chest X-ray images are useful for early COVID-19 diagnosis with the advantage that X-ray devices are already available in health centers and images are obtained immediately. Some datasets containing X-ray images with cases (pneumonia or COVID-19) and controls have been made available to develop machine-learning-based methods to aid in diagnosing the disease. However, these datasets are mainly composed of different sources coming from pre-COVID-19 datasets and COVID-19 datasets. Particularly, we have detected a significant bias in some of the released datasets used to train and test diagnostic systems, which might imply that the results published are optimistic and may overestimate the actual predictive capacity of the techniques proposed. In this article, we analyze the existing bias in some commonly used datasets and propose a series of preliminary steps to carry out before the classic machine learning pipeline in order to detect possible biases, to avoid them if possible and to report results that are more representative of the actual predictive power of the methods under analysis.
Affiliation(s)
- Omar Del Tejo Catala
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València 46022 Valencia Spain
| | - Ismael Salvador Igual
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València 46022 Valencia Spain
| | | | - David Millan Escriva
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València 46022 Valencia Spain
| | - Vicent Ortiz Castello
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València 46022 Valencia Spain
| | - Rafael Llobet
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València 46022 Valencia Spain
- Department of Computer Systems and Computation (DSIC)Universitat Politècnica de València 46022 Valencia Spain
| | - Juan-Carlos Perez-Cortes
- Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València 46022 Valencia Spain
- Department of Computing Engineering (DISCA)Universitat Politècnica de València 46022 Valencia Spain
|
8
|
Pérez-Benito FJ, Signol F, Perez-Cortes JC, Fuster-Baggetto A, Pollan M, Pérez-Gómez B, Salas-Trejo D, Casals M, Martínez I, LLobet R. A deep learning system to obtain the optimal parameters for a threshold-based breast and dense tissue segmentation. Comput Methods Programs Biomed 2020; 195:105668. [PMID: 32755754 DOI: 10.1016/j.cmpb.2020.105668] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/06/2020] [Accepted: 07/13/2020] [Indexed: 06/11/2023]
Abstract
BACKGROUND AND OBJECTIVE Breast cancer is the most frequent cancer in women. The Spanish healthcare network established population-based screening programs in all Autonomous Communities, where mammograms of asymptomatic women are taken for early diagnosis purposes. Breast density assessed from digital mammograms is a biomarker known to be related to a higher risk of developing breast cancer. It is thus crucial to provide a reliable method to measure breast density from mammograms. Furthermore, the complete automation of this segmentation process is becoming fundamental as the number of mammograms increases every day. Important challenges are related to the differences in images from different devices and the lack of an objective gold standard. This paper presents a fully automated framework based on deep learning to estimate breast density. The framework covers breast detection, pectoral muscle exclusion, and fibroglandular tissue segmentation. METHODS A multi-center study was conducted with 1785 women whose "for presentation" mammograms were segmented by two experienced radiologists. A total of 4992 of the 6680 mammograms were used as the training corpus and the remaining 1688 formed the test corpus. This paper presents a histogram normalization step that smoothed the differences between acquisitions, a regression architecture that learned segmentation parameters as intrinsic image features, and a loss function based on the DICE score. RESULTS The results obtained indicate that the level of concordance (DICE score) reached by the two radiologists (0.77) was also achieved by the automated framework when it was compared to the closest breast segmentation from the radiologists. For the images acquired with the highest-quality device, the DICE score reached 0.84, while the concordance between radiologists was 0.76. CONCLUSIONS An automatic breast density estimator based on deep learning exhibits similar performance when compared with two experienced radiologists. This suggests that the system could be used to support radiologists and ease their work.
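Concordance in this abstract is measured with the DICE score. The following minimal sketch shows how that score can be computed for two binary segmentation masks; it assumes masks are given as equally sized nested lists of 0/1 values, the function name `dice_score` is illustrative, and the convention of returning 1.0 for two empty masks is a choice made here, not something stated in the abstract.

```python
def dice_score(mask_a, mask_b):
    """DICE overlap 2|A∩B| / (|A| + |B|) between two binary masks."""
    a = [bool(v) for row in mask_a for v in row]  # flatten to 1-D
    b = [bool(v) for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2 * intersection / total
```

For instance, two masks that each mark three pixels and share two of them score 2·2/(3+3) ≈ 0.67, somewhat below the inter-radiologist concordance of 0.77 reported above.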
Affiliation(s)
- Francisco Javier Pérez-Benito
- Instituto Tecnológico de la Informática, Universitat Politècnica de València, Camino de Vera, s/n, València 46022, Spain.
| | - François Signol
- Instituto Tecnológico de la Informática, Universitat Politècnica de València, Camino de Vera, s/n, València 46022, Spain.
| | - Juan-Carlos Perez-Cortes
- Instituto Tecnológico de la Informática, Universitat Politècnica de València, Camino de Vera, s/n, València 46022, Spain.
| | - Alejandro Fuster-Baggetto
- Instituto Tecnológico de la Informática, Universitat Politècnica de València, Camino de Vera, s/n, València 46022, Spain.
| | - Marina Pollan
- National Center for Epidemiology, Carlos III Institute of Health, Monforte de lemos 5, Madrid 28029, Spain; Consortium for Biomedical Research in Epidemiology and Public Health (CIBER en Epidemiología y Salud Pública - CIBERESP), Carlos III Institute of Health, Monforte de Lemos 5, Madrid 28029, Spain.
| | - Beatriz Pérez-Gómez
- National Center for Epidemiology, Carlos III Institute of Health, Monforte de lemos 5, Madrid 28029, Spain; Consortium for Biomedical Research in Epidemiology and Public Health (CIBER en Epidemiología y Salud Pública - CIBERESP), Carlos III Institute of Health, Monforte de Lemos 5, Madrid 28029, Spain.
| | - Dolores Salas-Trejo
- Valencian Breast Cancer Screening Program, General Directorate of Public Health, València, Spain; Centro Superior de Investigación en Salud Pública CSISP, FISABIO, València, Spain.
| | - Maria Casals
- Valencian Breast Cancer Screening Program, General Directorate of Public Health, València, Spain; Centro Superior de Investigación en Salud Pública CSISP, FISABIO, València, Spain.
| | - Inmaculada Martínez
- Valencian Breast Cancer Screening Program, General Directorate of Public Health, València, Spain; Centro Superior de Investigación en Salud Pública CSISP, FISABIO, València, Spain.
| | - Rafael LLobet
- Instituto Tecnológico de la Informática, Universitat Politècnica de València, Camino de Vera, s/n, València 46022, Spain.
|
9
|
Todd OM, Burton JK, Dodds RM, Hollinghurst J, Lyons RA, Quinn TJ, Schneider A, Walesby KE, Wilkinson C, Conroy S, Gale CP, Hall M, Walters K, Clegg AP. New Horizons in the use of routine data for ageing research. Age Ageing 2020; 49:716-722. [PMID: 32043136 PMCID: PMC7444666 DOI: 10.1093/ageing/afaa018] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 10/29/2019] [Revised: 12/02/2019] [Accepted: 01/16/2020] [Indexed: 12/14/2022]
Abstract
The past three decades have seen a steady increase in the availability of routinely collected health and social care data and the processing power to analyse it. These developments represent a major opportunity for ageing research, especially with the integration of different datasets across traditional boundaries of health and social care, for prognostic research and novel evaluations of interventions with representative populations of older people. However, there are considerable challenges in using routine data at the level of coding, data analysis and in the application of findings to everyday care. New Horizons in applying routine data to investigate novel questions in ageing research require a collaborative approach between clinicians, data scientists, biostatisticians, epidemiologists and trial methodologists. This requires building capacity for the next generation of research leaders in this important area. There is a need to develop consensus code lists and standardised, validated algorithms for common conditions and outcomes that are relevant for older people to maximise the potential of routine data research in this group. Lastly, we must help drive the application of routine data to improve the care of older people, through the development of novel methods for evaluation of interventions using routine data infrastructure. We believe that harnessing routine data can help address knowledge gaps for older people living with multiple conditions and frailty, and design interventions and pathways of care to address the complex health issues we face in caring for older people.
Affiliation(s)
- Oliver M Todd
- Academic Unit of Elderly Care and Rehabilitation, Bradford Teaching Hospitals NHS Trust, University of Leeds, Bradford, UK
- Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
| | - Jennifer K Burton
- Academic Section of Geriatric Medicine, Institute of Cardiovascular and Medical Sciences, University of Glasgow, Glasgow G4 OSF, UK
| | - Richard M Dodds
- AGE Research Group, Translational and Clinical Research Institute, Newcastle University, Newcastle, UK
| | - Joe Hollinghurst
- Health Data Research UK (HDR-UK), Swansea University, Swansea, UK
| | - Ronan A Lyons
- Health Data Research UK (HDR-UK), Swansea University, Swansea, UK
| | - Terence J Quinn
- Academic Section of Geriatric Medicine, Institute of Cardiovascular and Medical Sciences, University of Glasgow, Glasgow G4 OSF, UK
| | - Anna Schneider
- School of Health & Social Care, Scottish Centre for Administrative Data Research, Edinburgh Napier University, Edinburgh, UK
| | - Katherine E Walesby
- Alzheimer Scotland Dementia Research Centre, University of Edinburgh, Edinburgh EH8 9JZ, UK
| | - Chris Wilkinson
- Leeds Institute of Cardiovascular and Metabolic Medicine, University of Leeds, Leeds, UK
- Institute of Cellular Medicine, Newcastle University, Newcastle upon Tyne, UK
| | - Simon Conroy
- Department of Health Sciences, University of Leicester, Leicester, UK
| | - Chris P Gale
- Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
- Leeds Institute of Cardiovascular and Metabolic Medicine, University of Leeds, Leeds, UK
| | - Marlous Hall
- Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
- Leeds Institute of Cardiovascular and Metabolic Medicine, University of Leeds, Leeds, UK
| | - Kate Walters
- Centre for Ageing Population Studies, Department of Primary Care & Population Health, Institute of Epidemiology & Health Care, University College, London, UK
| | - Andrew P Clegg
- Academic Unit of Elderly Care and Rehabilitation, Bradford Teaching Hospitals NHS Trust, University of Leeds, Bradford, UK
|
10
|
Perez-Pozuelo I, Zhai B, Palotti J, Mall R, Aupetit M, Garcia-Gomez JM, Taheri S, Guan Y, Fernandez-Luque L. The future of sleep health: a data-driven revolution in sleep science and medicine. NPJ Digit Med 2020; 3:42. [PMID: 32219183 PMCID: PMC7089984 DOI: 10.1038/s41746-020-0244-4] [Citation(s) in RCA: 93] [Impact Index Per Article: 23.3] [Received: 09/30/2019] [Accepted: 02/18/2020] [Indexed: 01/04/2023]
Abstract
In recent years, there has been a significant expansion in the development and use of multi-modal sensors and technologies to monitor physical activity, sleep and circadian rhythms. These developments make accurate sleep monitoring at scale a possibility for the first time. Vast amounts of multi-sensor data are being generated with potential applications ranging from large-scale epidemiological research linking sleep patterns to disease, to wellness applications, including the sleep coaching of individuals with chronic conditions. However, in order to realise the full potential of these technologies for individuals, medicine and research, several significant challenges must be overcome. There are important outstanding questions regarding performance evaluation, as well as data storage, curation, processing, integration, modelling and interpretation. Here, we leverage expertise across neuroscience, clinical medicine, bioengineering, electrical engineering, epidemiology, computer science, mHealth and human-computer interaction to discuss the digitisation of sleep from an interdisciplinary perspective. We introduce the state of the art in sleep-monitoring technologies, and discuss the opportunities and challenges from data acquisition to the eventual application of insights in clinical and consumer settings. Further, we explore the strengths and limitations of current and emerging sensing methods with a particular focus on novel data-driven technologies, such as Artificial Intelligence.
Affiliation(s)
- Ignacio Perez-Pozuelo
- Department of Medicine, University of Cambridge, Cambridge, UK
- The Alan Turing Institute, London, UK
| | - Bing Zhai
- Open Lab, University of Newcastle, Newcastle, UK
| | - Joao Palotti
- Qatar Computing Research Institute, HBKU, Doha, Qatar
- CSAIL, Massachusetts Institute of Technology, Cambridge, MA USA
| | | | | | - Juan M. Garcia-Gomez
- BDSLab, Instituto Universitario de Tecnologias de la Informacion y Comunicaciones-ITACA, Universitat Politecnica de Valencia, Valencia, Spain
| | - Shahrad Taheri
- Department of Medicine and Clinical Research Core, Weill Cornell Medicine - Qatar, Qatar Foundation, Doha, Qatar
| | - Yu Guan
- Open Lab, University of Newcastle, Newcastle, UK
|
11
|
Rockenschaub P, Nguyen V, Aldridge RW, Acosta D, García-Gómez JM, Sáez C. Data-driven discovery of changes in clinical code usage over time: a case-study on changes in cardiovascular disease recording in two English electronic health records databases (2001-2015). BMJ Open 2020; 10:e034396. [PMID: 32060159 PMCID: PMC7045100 DOI: 10.1136/bmjopen-2019-034396] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/17/2022]
Abstract
OBJECTIVES To demonstrate how data-driven variability methods can be used to identify changes in disease recording in two English electronic health records databases between 2001 and 2015. DESIGN Repeated cross-sectional analysis that applied data-driven temporal variability methods to assess month-by-month changes in routinely collected medical data. A measure of difference between months was calculated based on joint distributions of age, gender, socioeconomic status and recorded cardiovascular diseases. Distances between months were used to identify temporal trends in data recording. SETTING 400 English primary care practices from the Clinical Practice Research Datalink (CPRD GOLD) and 451 hospital providers from the Hospital Episode Statistics (HES). MAIN OUTCOMES The proportion of patients (CPRD GOLD) and hospital admissions (HES) with a recorded cardiovascular disease (CPRD GOLD: coronary heart disease, heart failure, peripheral arterial disease, stroke; HES: International Classification of Disease codes I20-I69/G45). RESULTS Both databases showed gradual changes in cardiovascular disease recording between 2001 and 2008. The recorded prevalence of included cardiovascular diseases in CPRD GOLD increased by 47%-62%, which partially reversed after 2008. For hospital records in HES, there was a relative decrease in angina pectoris (-34.4%) and unspecified stroke (-42.3%) over the same time period, with a concomitant increase in chronic coronary heart disease (+14.3%). Multiple abrupt changes in the use of myocardial infarction codes in hospital were found in March/April 2010, 2012 and 2014, possibly linked to updates of clinical coding guidelines. CONCLUSIONS Identified temporal variability could be related to potentially non-medical causes such as updated coding guidelines. These artificial changes may introduce temporal correlation among diagnoses inferred from routine data, violating the assumptions of frequently used statistical methods. Temporal variability measures provide an objective and robust technique to identify, and subsequently account for, those changes in electronic health records studies without any prior knowledge of the data collection process.
Affiliation(s)
- Patrick Rockenschaub
- Institute of Health Informatics, University College London, London, UK
- Health Data Research UK, London, UK
| | - Vincent Nguyen
- Institute of Health Informatics, University College London, London, UK
- Health Data Research UK, London, UK
| | - Robert W Aldridge
- Institute of Health Informatics, University College London, London, UK
- Health Data Research UK, London, UK
| | - Dionisio Acosta
- Institute of Health Informatics, University College London, London, UK
- Health Data Research UK, London, UK
| | - Juan Miguel García-Gómez
- Instituto de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas (ITACA), Universitat Politècnica de València, Valencia, Spain
| | - Carlos Sáez
- Instituto de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas (ITACA), Universitat Politècnica de València, Valencia, Spain
| |
|
12
|
Li Y, Sperrin M, Martin GP, Ashcroft DM, van Staa TP. Examining the impact of data quality and completeness of electronic health records on predictions of patients' risks of cardiovascular disease. Int J Med Inform 2019; 133:104033. [PMID: 31785526 DOI: 10.1016/j.ijmedinf.2019.104033] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/17/2019] [Revised: 10/16/2019] [Accepted: 11/08/2019] [Indexed: 10/25/2022]
Abstract
OBJECTIVE To assess the extent of variation of data quality and completeness of electronic health records and impact on the robustness of risk predictions of incident cardiovascular disease (CVD) using a risk prediction tool that is based on routinely collected data (QRISK3). DESIGN Longitudinal cohort study. SETTINGS 392 general practices (including 3.6 million patients) linked to hospital admission data. METHODS Variation in data quality was assessed using Sáez's stability metrics quantifying outlyingness of each practice. Statistical frailty models evaluated whether accuracy of QRISK3 predictions on individual predictions and effects of overall risk factors (linear predictor) varied between practices. RESULTS There was substantial heterogeneity between practices in CVD incidence unaccounted for by QRISK3. In the lowest quintile of statistical frailty, a QRISK3 predicted risk of 10% for a female patient was in a range between 7.1% and 9.0% when incorporating practice variability into the statistical frailty models; for the highest quintile, this was 10.9%-16.4%. Data quality (using Sáez metrics) and completeness were comparable across different levels of statistical frailty. For example, recording of missing information on ethnicity was 55.7%, 62.7%, 57.8%, 64.8% and 62.1% for practices from lowest to highest quintiles of statistical frailty respectively. The effects of risk factors did not vary between practices, with little statistical variation of beta coefficients. CONCLUSIONS The considerable unmeasured heterogeneity in CVD incidence between practices was not explained by variations in data quality or effects of risk factors. QRISK3 risk prediction should be supplemented with clinical judgement and evidence of additional risk factors.
Affiliation(s)
- Yan Li
- Health e-Research Centre, Farr Institute, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester Academic Health Sciences Centre (MAHSC), Oxford Road, Manchester, M13 9PL, UK
| | - Matthew Sperrin
- Health e-Research Centre, Farr Institute, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester Academic Health Sciences Centre (MAHSC), Oxford Road, Manchester, M13 9PL, UK
| | - Glen P Martin
- Health e-Research Centre, Farr Institute, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester Academic Health Sciences Centre (MAHSC), Oxford Road, Manchester, M13 9PL, UK
| | - Darren M Ashcroft
- Centre for Pharmacoepidemiology and Drug Safety, School of Health Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Oxford Road, Manchester, M13 9PL, UK; NIHR Greater Manchester Patient Safety Translational Research Centre, School of Health Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
| | - Tjeerd Pieter van Staa
- Health e-Research Centre, Farr Institute, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester Academic Health Sciences Centre (MAHSC), Oxford Road, Manchester, M13 9PL, UK; Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Utrecht, the Netherlands; Alan Turing Institute, Headquartered at the British Library, London, UK.
| |
|
13
|
Pérez-Benito FJ, Sáez C, Conejero JA, Tortajada S, Valdivieso B, García-Gómez JM. Temporal variability analysis reveals biases in electronic health records due to hospital process reengineering interventions over seven years. PLoS One 2019; 14:e0220369. [PMID: 31390350 PMCID: PMC6685618 DOI: 10.1371/journal.pone.0220369] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/07/2019] [Accepted: 07/15/2019] [Indexed: 12/28/2022] Open
Abstract
OBJECTIVE To evaluate the effects of Process-Reengineering interventions on the Electronic Health Records (EHR) of a hospital over 7 years. MATERIALS AND METHODS Temporal Variability Assessment (TVA) based on probabilistic data quality assessment was applied to the historic monthly-batched admission data of Hospital La Fe Valencia, Spain from 2010 to 2016. Routine healthcare data with a complete EHR was expanded by processed variables such as the Charlson Comorbidity Index. RESULTS Four Process-Reengineering interventions were detected by quantifiable effects on the EHR: (1) the hospital relocation in 2011 involved a progressive reduction in admissions during the next four months; (2) the reconfiguration of hospital services increased the number of inter-service transfers; (3) the redistribution of care services led to transfers between facilities; and (4) the assignment to the hospital of a new area with 80,000 patients in 2015 prompted discharge to home for follow-up and an update of the pre-surgery planned admissions protocol, which produced a significant decrease in patient length of stay. DISCUSSION TVA provides an indicator of the effect of process re-engineering interventions on healthcare practice. Evaluating the effect of facilities' relocation and the increase in the population served (findings 1, 3-4), the impact of strategies (findings 2-3), and gradual changes in protocols (finding 4) may help hospital management optimize interventions based on their effect on EHRs or on data reuse. CONCLUSIONS The effects of process re-engineering interventions on a hospital's EHR can be evaluated using the TVA methodology. Being aware of conditioned variations in EHR is of the utmost importance for the reliable reuse of routine hospitalization data.
Affiliation(s)
- Francisco Javier Pérez-Benito
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de Información y Comunicaciones Avanzadas (ITACA), Universitat Politécnica de València, València, Spain
- Instituto Universitario de Matemática Pura y Aplicada, Universitat Politécnica de València, València, Spain
| | - Carlos Sáez
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de Información y Comunicaciones Avanzadas (ITACA), Universitat Politécnica de València, València, Spain
| | - J. Alberto Conejero
- Instituto Universitario de Matemática Pura y Aplicada, Universitat Politécnica de València, València, Spain
| | - Salvador Tortajada
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de Información y Comunicaciones Avanzadas (ITACA), Universitat Politécnica de València, València, Spain
- Unidad conjunta de investigación en reingeniería de procesos socio-sanitarios, Instituto de Investigación Sanitaria La Fe, Hospital Universitario La Fe, València, Spain
- Red de Investigación en Servicios de Salud en Enfermedades Crónicas (REDISSEC), València, Spain
| | - Bernardo Valdivieso
- Unidad conjunta de investigación en reingeniería de procesos socio-sanitarios, Instituto de Investigación Sanitaria La Fe, Hospital Universitario La Fe, València, Spain
| | - Juan M. García-Gómez
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de Información y Comunicaciones Avanzadas (ITACA), Universitat Politécnica de València, València, Spain
- Unidad conjunta de investigación en reingeniería de procesos socio-sanitarios, Instituto de Investigación Sanitaria La Fe, Hospital Universitario La Fe, València, Spain
- Red de Investigación en Servicios de Salud en Enfermedades Crónicas (REDISSEC), València, Spain
| |
|
14
|
Do population-level risk prediction models that use routinely collected health data reliably predict individual risks? Sci Rep 2019; 9:11222. [PMID: 31375726 PMCID: PMC6677736 DOI: 10.1038/s41598-019-47712-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/21/2019] [Accepted: 06/28/2019] [Indexed: 11/20/2022] Open
Abstract
The objective of this study was to assess the reliability of individual risk predictions based on routinely collected data considering the heterogeneity between clinical sites in data and populations. Cardiovascular disease (CVD) risk prediction with QRISK3 was used as exemplar. The study included 3.6 million patients in 392 sites from the Clinical Practice Research Datalink. Cox models with QRISK3 predictors and a frailty (random effect) term for each site were used to incorporate unmeasured site variability. There was considerable variation in data recording between general practices (missingness of body mass index ranged from 18.7% to 60.1%). Incidence rates varied considerably between practices (from 0.4 to 1.3 CVD events per 100 patient-years). Individual CVD risk predictions with the random effect model were inconsistent with the QRISK3 predictions. For patients with a QRISK3 predicted risk of 10%, the 95% range of predicted risks was between 7.2% and 13.7% with the random effects model. Random variability only explained a small part of this. The random effects model was equivalent to QRISK3 for discrimination and calibration. Risk prediction models based on routinely collected health data perform well for populations but with great uncertainty for individuals. Clinicians and patients need to understand this uncertainty.
|
15
|
Juan-Albarracín J, Fuster-Garcia E, García-Ferrando GA, García-Gómez JM. ONCOhabitats: A system for glioblastoma heterogeneity assessment through MRI. Int J Med Inform 2019; 128:53-61. [DOI: 10.1016/j.ijmedinf.2019.05.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/20/2018] [Revised: 04/30/2019] [Accepted: 05/05/2019] [Indexed: 01/19/2023]
|
16
|
Sáez C, García-Gómez JM. Kinematics of Big Biomedical Data to characterize temporal variability and seasonality of data repositories: Functional Data Analysis of data temporal evolution over non-parametric statistical manifolds. Int J Med Inform 2018; 119:109-124. [DOI: 10.1016/j.ijmedinf.2018.09.015] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/17/2018] [Revised: 09/05/2018] [Accepted: 09/13/2018] [Indexed: 01/26/2023]
|
17
|
Juan-Albarracín J, Fuster-Garcia E, Pérez-Girbés A, Aparici-Robles F, Alberich-Bayarri Á, Revert-Ventura A, Martí-Bonmatí L, García-Gómez JM. Glioblastoma: Vascular Habitats Detected at Preoperative Dynamic Susceptibility-weighted Contrast-enhanced Perfusion MR Imaging Predict Survival. Radiology 2018; 287:944-954. [DOI: 10.1148/radiol.2017170845] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Indexed: 12/20/2022]
Affiliation(s)
- Javier Juan-Albarracín
- From the Instituto Universitario de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas, Universitat Politècnica de València, Camino de Vera s/n, 46022 Valencia, Spain
| | - Elies Fuster-Garcia
- From the Instituto Universitario de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas, Universitat Politècnica de València, Camino de Vera s/n, 46022 Valencia, Spain
| | - Alexandre Pérez-Girbés
- From the Instituto Universitario de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas, Universitat Politècnica de València, Camino de Vera s/n, 46022 Valencia, Spain
| | - Fernando Aparici-Robles
- From the Instituto Universitario de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas, Universitat Politècnica de València, Camino de Vera s/n, 46022 Valencia, Spain
| | - Ángel Alberich-Bayarri
- From the Instituto Universitario de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas, Universitat Politècnica de València, Camino de Vera s/n, 46022 Valencia, Spain
| | - Antonio Revert-Ventura
- From the Instituto Universitario de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas, Universitat Politècnica de València, Camino de Vera s/n, 46022 Valencia, Spain
| | - Luis Martí-Bonmatí
- From the Instituto Universitario de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas, Universitat Politècnica de València, Camino de Vera s/n, 46022 Valencia, Spain
| | - Juan M. García-Gómez
- From the Instituto Universitario de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas, Universitat Politècnica de València, Camino de Vera s/n, 46022 Valencia, Spain
| |
|
18
|
García-de-León-Chocano R, Muñoz-Soler V, Sáez C, García-de-León-González R, García-Gómez JM. Construction of quality-assured infant feeding process of care data repositories: Construction of the perinatal repository (Part 2). Comput Biol Med 2016; 71:214-22. [DOI: 10.1016/j.compbiomed.2016.01.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/31/2015] [Revised: 12/03/2015] [Accepted: 01/06/2016] [Indexed: 10/22/2022]
|
19
|
Automated glioblastoma segmentation based on a multiparametric structured unsupervised classification. PLoS One 2015; 10:e0125143. [PMID: 25978453 PMCID: PMC4433123 DOI: 10.1371/journal.pone.0125143] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 09/25/2014] [Accepted: 03/09/2015] [Indexed: 12/20/2022] Open
Abstract
Automatic brain tumour segmentation has become a key component for the future of brain tumour treatment. Currently, most brain tumour segmentation approaches arise from the supervised learning standpoint, which requires a labelled training dataset from which to infer the models of the classes. The performance of these models is directly determined by the size and quality of the training corpus, whose retrieval is a tedious and time-consuming task. On the other hand, unsupervised approaches avoid these limitations but often do not reach results comparable to those of the supervised methods. In this sense, we propose an automated unsupervised method for brain tumour segmentation based on anatomical Magnetic Resonance (MR) images. Four unsupervised classification algorithms, grouped by their structured or non-structured condition, were evaluated within our pipeline. As non-structured algorithms, we evaluated K-means, Fuzzy K-means and Gaussian Mixture Model (GMM), whereas as a structured classification algorithm we evaluated Gaussian Hidden Markov Random Field (GHMRF). An automated postprocess based on a statistical approach supported by tissue probability maps is proposed to automatically identify the tumour classes after the segmentations. We evaluated our brain tumour segmentation method with the public BRAin Tumor Segmentation (BRATS) 2013 Test and Leaderboard datasets. Our approach based on the GMM model improves the results obtained by most of the supervised methods evaluated with the Leaderboard set and reaches the second position in the ranking. Our variant based on the GHMRF achieves the first position in the Test ranking of the unsupervised approaches and the seventh position in the general Test ranking, which confirms the method as a viable alternative for brain tumour segmentation.
|
20
|
Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality. Data Min Knowl Discov 2014. [DOI: 10.1007/s10618-014-0378-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 11/25/2022]
|