1
Thayer DS, Mumtaz S, Elmessary MA, Scanlon I, Zinnurov A, Coldea AI, Scanlon J, Chapman M, Curcin V, John A, DelPozo-Banos M, Davies H, Karwath A, Gkoutos GV, Fitzpatrick NK, Quint JK, Varma S, Milner C, Oliveira C, Parkinson H, Denaxas S, Hemingway H, Jefferson E. Creating a next-generation phenotype library: the health data research UK Phenotype Library. JAMIA Open 2024; 7:ooae049. [PMID: 38895652 PMCID: PMC11182945 DOI: 10.1093/jamiaopen/ooae049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 02/12/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024] [Open Access]
Abstract
Objective: To enable reproducible research at scale by creating a platform that enables health data users to find, access, curate, and re-use electronic health record phenotyping algorithms.
Materials and Methods: We undertook a structured approach to identifying requirements for a phenotype algorithm platform by engaging with key stakeholders. User experience analysis was used to inform the design, which we implemented as a web application featuring a novel metadata standard for defining phenotyping algorithms, access via Application Programming Interface (API), support for computable data flows, and version control. The application has creation and editing functionality, enabling researchers to submit phenotypes directly.
Results: We created and launched the Phenotype Library in October 2021. The platform currently hosts 1049 phenotype definitions defined against 40 health data sources and >200K terms across 16 medical ontologies. We present several case studies demonstrating its utility for supporting and enabling research: the library hosts curated phenotype collections for the BREATHE respiratory health research hub and the Adolescent Mental Health Data Platform, and it is supporting the development of an informatics tool to generate clinical evidence for clinical guideline development groups.
Discussion: This platform makes an impact by being open to all health data users and accepting all appropriate content, as well as implementing key features that have not been widely available, including managing structured metadata, access via an API, and support for computable phenotypes.
Conclusions: We have created the first openly available, programmatically accessible resource enabling the global health research community to store and manage phenotyping algorithms. Removing barriers to describing, sharing, and computing phenotypes will help unleash the potential benefit of health data for patients and the public.
Affiliation(s)
- Daniel S Thayer
  - SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
- Shahzad Mumtaz
  - Health Informatics Centre, School of Medicine, University of Dundee, Dundee, DD1 9SY, United Kingdom
  - School of Natural and Computing Sciences, University of Aberdeen, Aberdeen, AB24 3UE, United Kingdom
- Muhammad A Elmessary
  - SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
- Ieuan Scanlon
  - SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
- Artur Zinnurov
  - SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
- Alex-Ioan Coldea
  - SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
- Jack Scanlon
  - SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
- Martin Chapman
  - Department of Population Health Sciences, King’s College London, London, SE1 1UL, United Kingdom
- Vasa Curcin
  - Department of Population Health Sciences, King’s College London, London, SE1 1UL, United Kingdom
- Ann John
  - Adolescent Mental Health Data Platform and DATAMIND, Swansea University, Swansea, SA2 8PP, United Kingdom
- Marcos DelPozo-Banos
  - Adolescent Mental Health Data Platform and DATAMIND, Swansea University, Swansea, SA2 8PP, United Kingdom
- Hannah Davies
  - SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
- Andreas Karwath
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom
- Georgios V Gkoutos
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom
- Natalie K Fitzpatrick
  - Institute of Health Informatics, University College London, London, NW1 2DA, United Kingdom
- Jennifer K Quint
  - School of Public Health and National Heart and Lung Institute, Imperial College London, London, W12 0BZ, United Kingdom
- Susheel Varma
  - Health Data Research United Kingdom, London, NW1 2BE, United Kingdom
- Chris Milner
  - Health Data Research United Kingdom, London, NW1 2BE, United Kingdom
- Carla Oliveira
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Helen Parkinson
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Spiros Denaxas
  - Institute of Health Informatics, University College London, London, NW1 2DA, United Kingdom
  - University College London Hospitals National Institute of Health Research Biomedical Research Centre, London, NW1 2BU, United Kingdom
  - British Heart Foundation Data Science Center, Health Data Research United Kingdom, London, NW1 2BE, United Kingdom
- Harry Hemingway
  - Institute of Health Informatics, University College London, London, NW1 2DA, United Kingdom
  - University College London Hospitals National Institute of Health Research Biomedical Research Centre, London, NW1 2BU, United Kingdom
- Emily Jefferson
  - Health Informatics Centre, School of Medicine, University of Dundee, Dundee, DD1 9SY, United Kingdom
  - Health Data Research United Kingdom, London, NW1 2BE, United Kingdom
2
Pan CX, He ZF, Lin SZ, Yue JQ, Chen ZM, Guan WJ. Clinical Characteristics and Outcomes of the Phenotypes of COPD-Bronchiectasis Association. Arch Bronconeumol 2024; 60:356-363. [PMID: 38714385 DOI: 10.1016/j.arbres.2024.04.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 02/16/2024] [Accepted: 04/03/2024] [Indexed: 05/09/2024]
Abstract
INTRODUCTION: Although COPD may frequently co-exist with bronchiectasis [COPD-bronchiectasis associated (CBA)], little is known regarding the clinical heterogeneity. We aimed to identify the phenotypes and compare the clinical characteristics and prognosis of CBA.
METHODS: We conducted a retrospective cohort study involving 2928 bronchiectasis patients, 5158 COPD patients, and 1219 patients with CBA hospitalized between July 2017 and December 2020. We phenotyped CBA with a two-step clustering approach and validated in an independent retrospective cohort with decision-tree algorithms.
RESULTS: Compared with patients with COPD or bronchiectasis alone, patients with CBA had significantly longer disease duration, greater lung function impairment, and increased use of intravenous antibiotics during hospitalization. We identified five clusters of CBA. Cluster 1 (N=120, CBA-MS) had predominantly moderate-severe bronchiectasis, Cluster 2 (N=108, CBA-FH) was characterized by frequent hospitalization within the previous year, Cluster 3 (N=163, CBA-BI) had bacterial infection, Cluster 4 (N=143, CBA-NB) had infrequent hospitalization but no bacterial infection, and Cluster 5 (N=113, CBA-NHB) had no hospitalization or bacterial infection in the past year. The decision-tree model predicted the cluster assignment in the validation cohort with 91.8% accuracy. CBA-MS, CBA-BI, and CBA-FH exhibited higher risks of hospital re-admission and intensive care unit admission compared with CBA-NHB during follow-up (all P<0.05). Of the five clusters, CBA-FH conferred the worst clinical prognosis.
CONCLUSION: Bronchiectasis severity, recent hospitalizations and sputum culture findings are three defining variables accounting for most heterogeneity of CBA, the characterization of which will help refine personalized clinical management.
Affiliation(s)
- Cui-Xia Pan
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, National Center for Respiratory Medicine, Department of Respiratory and Critical Care Medicine, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
- Zhen-Feng He
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, National Center for Respiratory Medicine, Department of Respiratory and Critical Care Medicine, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
- Sheng-Zhu Lin
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, National Center for Respiratory Medicine, Department of Respiratory and Critical Care Medicine, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
- Jun-Qing Yue
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, National Center for Respiratory Medicine, Department of Respiratory and Critical Care Medicine, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
- Zhao-Ming Chen
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, National Center for Respiratory Medicine, Department of Respiratory and Critical Care Medicine, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
- Wei-Jie Guan
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, National Center for Respiratory Medicine, Department of Respiratory and Critical Care Medicine, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
  - Guangzhou National Laboratory, Guangzhou, Guangdong, China
3
Calabria S, Ronconi G, Dondi L, Dondi L, Dell'Anno I, Nordon C, Rhodes K, Rogliani P, Dentali F, Martini N, Maggioni AP. Cardiovascular events after exacerbations of chronic obstructive pulmonary disease: Results from the EXAcerbations of COPD and their OutcomeS in CardioVascular diseases study in Italy. Eur J Intern Med 2024:S0953-6205(24)00181-X. [PMID: 38729787 DOI: 10.1016/j.ejim.2024.04.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 04/16/2024] [Accepted: 04/27/2024] [Indexed: 05/12/2024]
Abstract
INTRODUCTION: Exacerbations of chronic obstructive pulmonary disease (COPD) can increase the risk of severe cardiovascular events.
OBJECTIVE: Assess the crude incidence rates (IR) of cardiovascular events and the impact of exacerbations on the risk of cardiovascular events within different time periods following an exacerbation.
METHODS: COPD patients aged ≥45 years between 01/01/2015 and 12/31/2018 were identified from the Fondazione Ricerca e Salute administrative database. IRs of severe non-fatal and fatal cardiovascular events were obtained for post-exacerbation time periods (1-7, 8-14, 15-30, 31-180, 181-365 days). Time-dependent Cox proportional hazard models compared cardiovascular risks between periods with and without exacerbations.
RESULTS: Of 216,864 COPD patients, >55% were male, mean age was 74 years, frequent comorbidities were cardiovascular, metabolic and psychiatric. During an average 34-month follow-up, 69,620 (32%) patients had ≥1 exacerbation and 46,214 (21%) experienced ≥1 cardiovascular event. During follow-up, 55,470 patients died; 4,661 were in-hospital cardiovascular-related deaths. Among 10,269 patients experiencing cardiovascular events within 365 days post-exacerbation, the IR was 15.8 per 100 person-years (95% CI 15.5-16.1). Estimated hazard ratios (HR) for the cardiovascular event risk associated with periods post-exacerbation were highest within 7 days (HR 34.3; 95% CI 33.1-35.6), especially for heart failure (HR 50.6; 95% CI 48.6-52.7) and remained elevated throughout 365 days (HR 1.1; 95% CI 1.02-1.13).
CONCLUSIONS: COPD patients in Italy are at high risk of severe cardiovascular events following exacerbations, suggesting the need to prevent exacerbations and possible subsequent cardiovascular events through early interventions and treatment optimization.
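The percentages quoted in this abstract can be checked from the reported counts, and a crude incidence rate has the usual form of events divided by person-time. The following is a minimal Python sketch under stated assumptions: the function names are illustrative, and the person-years figure in the last example is a hypothetical value chosen for illustration, not a number reported by the study.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


def crude_incidence_rate(events: int, person_years: float, per: float = 100.0) -> float:
    """Crude incidence rate: events per `per` person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return per * events / person_years


# Counts taken from the abstract.
COHORT = 216_864
exacerbation_pct = percent(69_620, COHORT)  # patients with >=1 exacerbation
cv_event_pct = percent(46_214, COHORT)      # patients with >=1 cardiovascular event

# Hypothetical illustration only: 158 events observed over 1,000 person-years.
example_ir = crude_incidence_rate(158, 1_000.0)

print(f"{exacerbation_pct:.1f}% of patients had an exacerbation")
print(f"{cv_event_pct:.1f}% of patients had a cardiovascular event")
print(f"example IR: {example_ir:.1f} per 100 person-years")
```

Rounded to whole numbers, the two proportions agree with the 32% and 21% given in the Results.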
Affiliation(s)
- Silvia Calabria
  - Fondazione Ricerca e Salute (ReS) - Research and Health Foundation, Rome, Italy
- Giulia Ronconi
  - Fondazione Ricerca e Salute (ReS) - Research and Health Foundation, Rome, Italy
- Letizia Dondi
  - Fondazione Ricerca e Salute (ReS) - Research and Health Foundation, Rome, Italy
- Leonardo Dondi
  - Fondazione Ricerca e Salute (ReS) - Research and Health Foundation, Rome, Italy
- Irene Dell'Anno
  - Fondazione Ricerca e Salute (ReS) - Research and Health Foundation, Rome, Italy
- Kirsty Rhodes
  - BioPharmaceuticals Medical, AstraZeneca, Cambridge, UK
- Paola Rogliani
  - Unit of Respiratory Medicine, Department of Experimental Medicine, University of Rome "Tor Vergata", Rome, Italy
- Francesco Dentali
  - Department of Internal Medicine, ASST dei Sette Laghi, Varese, Italy
- Nello Martini
  - Fondazione Ricerca e Salute (ReS) - Research and Health Foundation, Rome, Italy
- Aldo Pietro Maggioni
  - Fondazione Ricerca e Salute (ReS) - Research and Health Foundation, Rome, Italy
  - ANMCO Research Center Heart Care Foundation, Firenze, Italy
4
Tate NM, Yamkate P, Xenoulis PG, Steiner JM, Behling‐Kelly EL, Rendahl AK, Wu Y, Furrow E. Clustering analysis of lipoprotein profiles to identify subtypes of hypertriglyceridemia in Miniature Schnauzers. J Vet Intern Med 2024; 38:971-979. [PMID: 38348783 PMCID: PMC10937497 DOI: 10.1111/jvim.17010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Accepted: 01/26/2024] [Indexed: 02/18/2024] [Open Access]
Abstract
BACKGROUND: Hypertriglyceridemia (HTG) is prevalent in Miniature Schnauzers, predisposing them to life-threatening diseases. Varied responses to management strategies suggest the possibility of multiple subtypes.
HYPOTHESIS/OBJECTIVE: To identify and characterize HTG subtypes in Miniature Schnauzers through cluster analysis of lipoprotein profiles. We hypothesize that multiple phenotypes of primary HTG exist in this breed.
ANIMALS: Twenty Miniature Schnauzers with normal serum triglyceride concentration (NTG), 25 with primary HTG, and 5 with secondary HTG.
METHODS: Cross-sectional study using archived samples. Lipoprotein profiles, generated using continuous lipoprotein density profiling, were clustered with hierarchical cluster analysis. Clinical data (age, sex, body condition score, and dietary fat content) was compared between clusters.
RESULTS: Six clusters were identified. Dogs with primary HTG were dispersed among 4 clusters. One cluster showed the highest intensities for triglyceride-rich lipoprotein (TRL) and low-density lipoprotein (LDL) fractions and also included 4 dogs with secondary HTG. Two clusters had moderately high TRL fraction intensities and low-to-intermediate LDL intensities. The fourth cluster had high LDL but variable TRL fraction intensities with equal numbers of NTG and mild HTG dogs. The final 2 clusters comprised only NTG dogs with low TRL intensities and low-to-intermediate LDL intensities. The clusters did not appear to be driven by differences in the clinical data.
CONCLUSIONS AND CLINICAL IMPORTANCE: The results of this study support a spectrum of lipoprotein phenotypes within Miniature Schnauzers that cannot be predicted by triglyceride concentration alone. Lipoprotein profiling might be useful to determine if subtypes have different origins, clinical consequences, and response to treatment.
Affiliation(s)
- Nicole M. Tate
  - Department of Veterinary Clinical Sciences, University of Minnesota, College of Veterinary Medicine, St. Paul, Minnesota, USA
- Punyamanee Yamkate
  - Gastrointestinal Laboratory, Department of Small Animal Clinical Sciences, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas, USA
- Panagiotis G. Xenoulis
  - Gastrointestinal Laboratory, Department of Small Animal Clinical Sciences, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas, USA
  - Clinic of Medicine, Faculty of Veterinary Medicine, University of Thessaly, Karditsa, Greece
- Jörg M. Steiner
  - Gastrointestinal Laboratory, Department of Small Animal Clinical Sciences, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas, USA
- Erica L. Behling‐Kelly
  - Department of Population Medicine and Diagnostic Sciences, College of Veterinary Medicine, Cornell University, Ithaca, New York, USA
- Aaron K. Rendahl
  - Department of Veterinary Clinical Sciences, University of Minnesota, College of Veterinary Medicine, St. Paul, Minnesota, USA
- Yu‐An Wu
  - Gastrointestinal Laboratory, Department of Small Animal Clinical Sciences, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas, USA
- Eva Furrow
  - Department of Veterinary Clinical Sciences, University of Minnesota, College of Veterinary Medicine, St. Paul, Minnesota, USA
5
Rhodes JS, Aumon A, Morin S, Girard M, Larochelle C, Brunet-Ratnasingham E, Pagliuzza A, Marchitto L, Zhang W, Cutler A, Grand'Maison F, Zhou A, Finzi A, Chomont N, Kaufmann DE, Zandee S, Prat A, Wolf G, Moon KR. Gaining Biological Insights through Supervised Data Visualization. bioRxiv 2024:2023.11.22.568384. [PMID: 38293135 PMCID: PMC10827133 DOI: 10.1101/2023.11.22.568384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Dimensionality reduction-based data visualization is pivotal in comprehending complex biological data. The most common methods, such as PHATE, t-SNE, and UMAP, are unsupervised and therefore reflect the dominant structure in the data, which may be independent of expert-provided labels. Here we introduce a supervised data visualization method called RF-PHATE, which integrates expert knowledge for further exploration of the data. RF-PHATE leverages random forests to capture intricate feature-label relationships. Extracting information from the forest, RF-PHATE generates low-dimensional visualizations that highlight relevant data relationships while disregarding extraneous features. This approach scales to large datasets and applies to classification and regression. We illustrate RF-PHATE's prowess through three case studies. In a multiple sclerosis study using longitudinal clinical and imaging data, RF-PHATE unveils a sub-group of patients with non-benign relapsing-remitting Multiple Sclerosis, demonstrating its aptitude for time-series data. In the context of Raman spectral data, RF-PHATE effectively showcases the impact of antioxidants on diesel exhaust-exposed lung cells, highlighting its proficiency in noisy environments. Furthermore, RF-PHATE aligns established geometric structures with COVID-19 patient outcomes, enriching interpretability in a hierarchical manner. RF-PHATE bridges expert insights and visualizations, promising knowledge generation. Its adaptability, scalability, and noise tolerance underscore its potential for widespread adoption.
6
Koblizek V, Milenkovic B, Svoboda M, Kocianova J, Holub S, Zindr V, Ilic M, Jankovic J, Cupurdija V, Jarkovsky J, Popov B, Valipour A. RETRO-POPE: A Retrospective, Multicenter, Real-World Study of All-Cause Mortality in COPD. Int J Chron Obstruct Pulmon Dis 2023; 18:2661-2672. [PMID: 38022829 PMCID: PMC10661906 DOI: 10.2147/copd.s426919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 11/09/2023] [Indexed: 12/01/2023] [Open Access]
Abstract
Purpose: The Phenotypes of COPD in Central and Eastern Europe (POPE) study assessed the prevalence and clinical characteristics of four clinical COPD phenotypes, but not mortality. This retrospective analysis of the POPE study (RETRO-POPE) investigated the relationship between all-cause mortality and patient characteristics using two grouping methods: clinical phenotyping (as in POPE) and Burgel clustering, to better identify high-risk patients.
Patients and Methods: The two largest POPE study patient cohorts (Czech Republic and Serbia) were categorized into one of four clinical phenotypes (acute exacerbators [with/without chronic bronchitis], non-exacerbators, asthma-COPD overlap), and one of five Burgel clusters based on comorbidities, lung function, age, body mass index (BMI) and dyspnea (very severe comorbid, very severe respiratory, moderate-to-severe respiratory, moderate-to-severe comorbid/obese, and mild respiratory). Patients were followed-up for approximately 7 years for survival status.
Results: Overall, 801 of 1,003 screened patients had sufficient data for analysis. Of these, 440 patients (54.9%) were alive and 361 (45.1%) had died at the end of follow-up. Analysis of survival by clinical phenotype showed no significant differences between the phenotypes (P=0.211). However, Burgel clustering demonstrated significant differences in survival between clusters (P<0.001), with patients in the "very severe comorbid" and "very severe respiratory" clusters most likely to die. Overall survival was not significantly different between Serbia and the Czech Republic after adjustment for age, BMI, comorbidities and forced expiratory volume in 1 second (hazard ratio [HR] 0.80, 95% confidence interval [CI] 0.65-0.99; P=0.036 [unadjusted]; HR 0.88, 95% CI 0.7-1.1; P=0.257 [adjusted]). The most common causes of death were respiratory-related (36.8%), followed by cardiovascular (25.2%) then neoplasm (15.2%).
Conclusion: Patient clusters based on comorbidities, lung function, age, BMI and dyspnea were more likely to show differences in COPD mortality risk than phenotypes defined by exacerbation history and presence/absence of chronic bronchitis and/or asthmatic features.
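The country comparison in this abstract reports two hazard ratios, and the contrast between them is easier to follow when each confidence interval is checked against the null value of 1. The sketch below is a rough reading aid in Python, not the authors' analysis: the helper name is illustrative, and treating a 95% CI that excludes 1 as "significant at the 5% level" is the conventional interpretation assumed here.

```python
def ci_excludes_null(lower: float, upper: float, null: float = 1.0) -> bool:
    """True when the interval [lower, upper] does not contain the null value."""
    return not (lower <= null <= upper)


# Survival status counts reported in the abstract.
analysed, alive, died = 801, 440, 361
alive_pct = round(100.0 * alive / analysed, 1)
died_pct = round(100.0 * died / analysed, 1)

# Hazard ratio confidence intervals reported in the abstract.
unadjusted_significant = ci_excludes_null(0.65, 0.99)  # HR 0.80, P=0.036
adjusted_significant = ci_excludes_null(0.7, 1.1)      # HR 0.88, P=0.257

print(f"alive: {alive_pct}%, died: {died_pct}%")
print(f"unadjusted CI excludes 1: {unadjusted_significant}")
print(f"adjusted CI excludes 1: {adjusted_significant}")
```

Read this way, the between-country difference is apparent only before adjustment, which matches the reported P values.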
Affiliation(s)
- Vladimir Koblizek
  - Department of Pneumology, University Hospital, Hradec Kralove, Czech Republic
  - Faculty of Medicine Hradec Kralove, Charles University, Hradec Kralove, Czech Republic
- Branislava Milenkovic
  - Clinic for Pulmonary Diseases, Clinical Center of Serbia, Belgrade, Serbia
  - Faculty of Medicine, University of Belgrade, Belgrade, Serbia
- Michal Svoboda
  - Institute of Biostatistics and Analyses Ltd., Brno, Czech Republic
  - Institute of Biostatistics and Analyses, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Jana Kocianova
  - Outpatient Department of Pneumology Alveolus, APRO MED, Ostrava, Czech Republic
- Stanislav Holub
  - Outpatient Chest Clinic, Plicni Stredisko Teplice Ltd., Teplice, Czech Republic
- Vladimir Zindr
  - Outpatient Chest Clinic, PNEUMO KV Ltd., Karlovy Vary, Czech Republic
- Miroslav Ilic
  - Faculty of Medicine, University of Novi Sad, Novi Sad, Serbia
  - Clinic for Tuberculosis and Interstitial Lung Diseases, PolyClinic Department, Institute for Pulmonary Diseases of Vojvodina, Sremska Kamenica, Serbia
- Jelena Jankovic
  - Clinic for Pulmonary Diseases, Clinical Center of Serbia, Belgrade, Serbia
  - Faculty of Medicine, University of Belgrade, Belgrade, Serbia
- Vojislav Cupurdija
  - Department of Internal Medicine, Faculty of Medical Sciences, University of Kragujevac, Kragujevac, Serbia
  - Clinic for Pulmonology, University Clinical Center Kragujevac, Kragujevac, Serbia
- Jiri Jarkovsky
  - Institute of Biostatistics and Analyses, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Boris Popov
  - Medicine Department, Boehringer Ingelheim Serbia d.o.o. Beograd, Belgrade, Serbia
- Arschang Valipour
  - Karl Landsteiner Institute for Lung Research and Pulmonary Oncology, Klinik Floridsdorf, Vienna Health Care Group, Vienna, Austria
7
Abdulazeem H, Whitelaw S, Schauberger G, Klug SJ. A systematic review of clinical health conditions predicted by machine learning diagnostic and prognostic models trained or validated using real-world primary health care data. PLoS One 2023; 18:e0274276. [PMID: 37682909 PMCID: PMC10491005 DOI: 10.1371/journal.pone.0274276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 08/29/2023] [Indexed: 09/10/2023] [Open Access]
Abstract
With the advances in technology and data science, machine learning (ML) is being rapidly adopted by the health care sector. However, there is a lack of literature addressing the health conditions targeted by the ML prediction models within primary health care (PHC) to date. To fill this gap in knowledge, we conducted a systematic review following the PRISMA guidelines to identify health conditions targeted by ML in PHC. We searched the Cochrane Library, Web of Science, PubMed, Elsevier, BioRxiv, Association of Computing Machinery (ACM), and IEEE Xplore databases for studies published from January 1990 to January 2022. We included primary studies addressing ML diagnostic or prognostic predictive models that were supplied completely or partially by real-world PHC data. Studies selection, data extraction, and risk of bias assessment using the prediction model study risk of bias assessment tool were performed by two investigators. Health conditions were categorized according to international classification of diseases (ICD-10). Extracted data were analyzed quantitatively. We identified 106 studies investigating 42 health conditions. These studies included 207 ML prediction models supplied by the PHC data of 24.2 million participants from 19 countries. We found that 92.4% of the studies were retrospective and 77.3% of the studies reported diagnostic predictive ML models. A majority (76.4%) of all the studies were for models' development without conducting external validation. Risk of bias assessment revealed that 90.8% of the studies were of high or unclear risk of bias. The most frequently reported health conditions were diabetes mellitus (19.8%) and Alzheimer's disease (11.3%). Our study provides a summary on the presently available ML prediction models within PHC. We draw the attention of digital health policy makers, ML models developer, and health care professionals for more future interdisciplinary research collaboration in this regard.
Affiliation(s)
- Hebatullah Abdulazeem
  - Chair of Epidemiology, Department of Sport and Health Sciences, Technical University of Munich (TUM), Munich, Germany
- Sera Whitelaw
  - Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada
- Gunther Schauberger
  - Chair of Epidemiology, Department of Sport and Health Sciences, Technical University of Munich (TUM), Munich, Germany
- Stefanie J. Klug
  - Chair of Epidemiology, Department of Sport and Health Sciences, Technical University of Munich (TUM), Munich, Germany
8
Beijers RJHCG, Steiner MC, Schols AMWJ. The role of diet and nutrition in the management of COPD. Eur Respir Rev 2023; 32:32/168/230003. [PMID: 37286221 DOI: 10.1183/16000617.0003-2023] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 01/05/2023] [Accepted: 03/27/2023] [Indexed: 06/09/2023] [Open Access]
Abstract
In 2014, the European Respiratory Society published a statement on nutritional assessment and therapy in COPD. Since then, increasing research has been performed on the role of diet and nutrition in the prevention and management of COPD. Here, we provide an overview of recent scientific advances and clinical implications. Evidence for a potential role of diet and nutrition as a risk factor in the development of COPD has been accumulating and is reflected in the dietary patterns of patients with COPD. Consuming a healthy diet should, therefore, be promoted in patients with COPD. Distinct COPD phenotypes have been identified incorporating nutritional status, ranging from cachexia and frailty to obesity. The importance of body composition assessment and the need for tailored nutritional screening instruments is further highlighted. Dietary interventions and targeted single or multi-nutrient supplementation can be beneficial when optimal timing is considered. The therapeutic window of opportunity for nutritional interventions during and recovering from an acute exacerbation and hospitalisation is underexplored.
Affiliation(s)
- Rosanne J H C G Beijers
  - Department of Respiratory Medicine, NUTRIM School of Nutrition and Translational Research in Metabolism, Maastricht University Medical Centre+, Maastricht, The Netherlands
- Michael C Steiner
  - Leicester NIHR Biomedical Research Centre - Respiratory, Department of Respiratory Sciences, University of Leicester, Leicester, UK
- Annemie M W J Schols
  - Department of Respiratory Medicine, NUTRIM School of Nutrition and Translational Research in Metabolism, Maastricht University Medical Centre+, Maastricht, The Netherlands
9
Pikoula M, Kallis C, Madjiheurem S, Quint JK, Bafadhel M, Denaxas S. Evaluation of data processing pipelines on real-world electronic health records data for the purpose of measuring patient similarity. PLoS One 2023; 18:e0287264. [PMID: 37319288 PMCID: PMC10270623 DOI: 10.1371/journal.pone.0287264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 06/01/2023] [Indexed: 06/17/2023] [Open Access]
Abstract
BACKGROUND The ever-growing size, breadth, and availability of patient data allows for a wide variety of clinical features to serve as inputs for phenotype discovery using cluster analysis. Data of mixed types in particular are not straightforward to combine into a single feature vector, and techniques used to address this can be biased towards certain data types in ways that are not immediately obvious or intended. In this context, the process of constructing clinically meaningful patient representations from complex datasets has not been systematically evaluated. AIMS Our aim was to a) outline and b) implement an analytical framework to evaluate distinct methods of constructing patient representations from routine electronic health record data for the purpose of measuring patient similarity. We applied the analysis on a patient cohort diagnosed with chronic obstructive pulmonary disease. METHODS Using data from the CALIBER data resource, we extracted clinically relevant features for a cohort of patients diagnosed with chronic obstructive pulmonary disease. We used four different data processing pipelines to construct lower dimensional patient representations from which we calculated patient similarity scores. We described the resulting representations, ranked the influence of each individual feature on patient similarity and evaluated the effect of different pipelines on clustering outcomes. Experts evaluated the resulting representations by rating the clinical relevance of similar patient suggestions with regard to a reference patient. RESULTS Each of the four pipelines resulted in similarity scores primarily driven by a unique set of features. It was demonstrated that data transformations according to each pipeline prior to clustering can result in a variation of clustering results of over 40%. The most appropriate pipeline was selected on the basis of feature ranking and clinical expertise. 
There was moderate agreement between clinicians as measured by Cohen's kappa coefficient. CONCLUSIONS Data transformation has downstream and unforeseen consequences in cluster analysis. Rather than viewing this process as a black box, we have shown ways to quantitatively and qualitatively evaluate and select the appropriate preprocessing pipeline.
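The inter-rater agreement statistic named in the abstract above, Cohen's kappa, compares the observed agreement between two raters with the agreement expected by chance. A minimal Python sketch follows; the function name `cohens_kappa` and the example ratings are illustrative assumptions, not code or data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings (invented for illustration) of whether a suggested
# patient is clinically similar to a reference patient.
clinician_1 = ["relevant", "relevant", "not", "relevant", "not", "not"]
clinician_2 = ["relevant", "not", "not", "relevant", "not", "relevant"]
kappa = cohens_kappa(clinician_1, clinician_2)
```

Values near 1 indicate agreement well beyond chance and values near 0 indicate agreement no better than chance; the verbal label attached to a given value (for example "moderate") depends on the interpretation scale a study adopts.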
Affiliation(s)
- Maria Pikoula
  - Institute of Health Informatics, University College London, London, United Kingdom
- Constantinos Kallis
  - National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Sephora Madjiheurem
  - Department of Electronic and Electrical Engineering, University College London, London, United Kingdom
- Jennifer K. Quint
  - National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Mona Bafadhel
  - School of Immunology and Microbial Sciences, King’s College London, London, United Kingdom
- Spiros Denaxas
  - Institute of Health Informatics, University College London, London, United Kingdom
10
Zhang B, Wang J, Chen J, Ling Z, Ren Y, Xiong D, Guo L. Machine learning in chronic obstructive pulmonary disease. Chin Med J (Engl) 2023; 136:536-538. [PMID: 35946787 PMCID: PMC10106241 DOI: 10.1097/cm9.0000000000002247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Indexed: 11/26/2022] Open
Affiliation(s)
- Bochao Zhang
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230026, China
  - Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science, Suzhou, Jiangsu 215163, China
- Jiping Wang
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230026, China
  - Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science, Suzhou, Jiangsu 215163, China
- Jing Chen
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230026, China
  - Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science, Suzhou, Jiangsu 215163, China
- Zongquan Ling
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230026, China
  - Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science, Suzhou, Jiangsu 215163, China
- Yuhao Ren
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230026, China
  - Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science, Suzhou, Jiangsu 215163, China
- Daxi Xiong
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230026, China
  - Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science, Suzhou, Jiangsu 215163, China
- Liquan Guo
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230026, China
  - Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science, Suzhou, Jiangsu 215163, China
11
Dashtban A, Mizani MA, Pasea L, Denaxas S, Corbett R, Mamza JB, Gao H, Morris T, Hemingway H, Banerjee A. Identifying subtypes of chronic kidney disease with machine learning: development, internal validation and prognostic validation using linked electronic health records in 350,067 individuals. EBioMedicine 2023; 89:104489. [PMID: 36857859 PMCID: PMC9989643 DOI: 10.1016/j.ebiom.2023.104489] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/05/2022] [Revised: 01/31/2023] [Accepted: 02/06/2023] [Indexed: 03/01/2023] Open
Abstract
BACKGROUND Although chronic kidney disease (CKD) is associated with high multimorbidity, polypharmacy, morbidity and mortality, existing classification systems (mild to severe, usually based on estimated glomerular filtration rate, proteinuria or urine albumin-creatinine ratio) and risk prediction models largely ignore the complexity of CKD, its risk factors and its outcomes. Improved subtype definition could improve prediction of outcomes and inform effective interventions. METHODS We analysed individuals ≥18 years with incident and prevalent CKD (n = 350,067 and 195,422 respectively) from a population-based electronic health record resource (2006-2020; Clinical Practice Research Datalink, CPRD). We included factors (n = 264 with 2670 derived variables), e.g. demography, history, examination, blood laboratory values and medications. Using a published framework, we identified subtypes through seven unsupervised machine learning (ML) methods (K-means, Diana, HC, Fanny, PAM, Clara, Model-based) with 66 (of 2670) variables in each dataset. We evaluated subtypes for: (i) internal validity (within dataset, across methods); (ii) prognostic validity (predictive accuracy for 5-year all-cause mortality and admissions); and (iii) medications (new and existing by British National Formulary chapter). FINDINGS After identifying five clusters across seven approaches, we labelled CKD subtypes: 1. Early-onset, 2. Late-onset, 3. Cancer, 4. Metabolic, and 5. Cardiometabolic. Internal validity: We trained a high performing model (using XGBoost) that could predict disease subtypes with 95% accuracy for incident and prevalent CKD (Sensitivity: 0.81-0.98, F1 score:0.84-0.97). Prognostic validity: 5-year all-cause mortality, hospital admissions, and incidence of new chronic diseases differed across CKD subtypes. 
The 5-year risk of mortality and admissions in the overall incident CKD population were highest in cardiometabolic subtype: 43.3% (42.3-42.8%) and 29.5% (29.1-30.0%), respectively, and lowest in the early-onset subtype: 5.7% (5.5-5.9%) and 18.7% (18.4-19.1%). MEDICATIONS Across CKD subtypes, the distribution of prescription medication classes at baseline varied, with highest medication burden in cardiometabolic and metabolic subtypes, and higher burden in prevalent than incident CKD. INTERPRETATION In the largest CKD study using ML, to-date, we identified five distinct subtypes in individuals with incident and prevalent CKD. These subtypes have relevance to study of aetiology, therapeutics and risk prediction. FUNDING AstraZeneca UK Ltd, Health Data Research UK.
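The internal-validity figures quoted in the abstract above (sensitivity and F1 score) are per-class summaries of a classifier's predictions. The sketch below shows one way such values can be computed from true and predicted subtype labels; the function name `sensitivity_and_f1` and the toy labels are assumptions for illustration and do not reproduce the study's model or data.

```python
def sensitivity_and_f1(y_true, y_pred, positive):
    """Return (sensitivity, F1) treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    # F1 is the harmonic mean of precision and sensitivity (recall).
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, f1


# Hypothetical labels: subtype names are borrowed from the abstract,
# the assignments themselves are invented.
true_labels = ["Metabolic", "Cancer", "Metabolic", "Late-onset", "Metabolic"]
pred_labels = ["Metabolic", "Metabolic", "Metabolic", "Late-onset", "Cancer"]
sens, f1 = sensitivity_and_f1(true_labels, pred_labels, positive="Metabolic")
```

Calling the function once per subtype yields a range of per-class values, which is presumably how a range such as the one in the abstract arises.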
Affiliation(s)
- Ashkan Dashtban
  - Institute of Health Informatics, University College London, London, UK
- Mehrdad A Mizani
  - Institute of Health Informatics, University College London, London, UK; British Heart Foundation Data Science Centre, Health Data Research UK, London, UK
- Laura Pasea
  - Institute of Health Informatics, University College London, London, UK
- Spiros Denaxas
  - Institute of Health Informatics, University College London, London, UK
- Jil B Mamza
  - Medical and Scientific Affairs, BioPharmaceuticals Medical, AstraZeneca, London, UK
- He Gao
  - Medical and Scientific Affairs, BioPharmaceuticals Medical, AstraZeneca, London, UK
- Tamsin Morris
  - Medical and Scientific Affairs, BioPharmaceuticals Medical, AstraZeneca, London, UK
- Harry Hemingway
  - Institute of Health Informatics, University College London, London, UK; Health Data Research UK, University College London, London, UK
- Amitava Banerjee
  - Institute of Health Informatics, University College London, London, UK; Barts Health NHS Trust, London, UK; University College London Hospitals NHS Trust, London, UK
12
Agglomerative and divisive hierarchical Bayesian clustering. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
13
Liu H, Dai H, Chen J, Xu J, Tao Y, Lin H. Interactive similar patient retrieval for visual summary of patient outcomes. J Vis (Tokyo) 2022. [DOI: 10.1007/s12650-022-00898-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
14
Li XF, Wan CQ, Mao YM. Analysis of pathogenesis and drug treatment of chronic obstructive pulmonary disease complicated with cardiovascular disease. Front Med (Lausanne) 2022; 9:979959. [PMID: 36405582 PMCID: PMC9672343 DOI: 10.3389/fmed.2022.979959] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/28/2022] [Accepted: 10/05/2022] [Indexed: 09/19/2023] Open
Abstract
Chronic obstructive pulmonary disease (COPD) is a disease characterized by persistent airflow limitation, and is associated with abnormal inflammatory responses in the lungs to cigarette smoke and toxic and harmful gases. Due to the existence of common risk factors, COPD is prone to multiple complications, among which cardiovascular disease (CVD) is the most common. It is currently established that cardiovascular comorbidities increase the risk of exacerbations and mortality from COPD. COPD is also an independent risk factor for CVD, and its specific mechanism is still unclear, which may be related to chronic systemic inflammation, oxidative stress, and vascular dysfunction. There is evidence that chronic inflammation of the airways can lead to destruction of the lung parenchyma and decreased lung function. Inflammatory cells in the airways also generate reactive oxygen species in the lungs, and reactive oxygen species further promote lung inflammation through signal transduction and other pathways. Inflammatory mediators circulate from the lungs to the whole body, causing intravascular dysfunction, promoting the formation and rupture of atherosclerotic plaques, and ultimately leading to the occurrence and development of CVD. This article reviews the pathophysiological mechanisms of COPD complicated by CVD and the effects of common cardiovascular drugs on COPD.
Affiliation(s)
- Xiao-Fang Li
  - Department of Respiratory Medicine, The First Affiliated Hospital, College of Clinical Medicine of Henan University of Science and Technology, Luoyang, Henan, China
- Cheng-Quan Wan
  - Department of Neonatology, Luoyang Maternal and Child Health Hospital, Luoyang, Henan, China
- Yi-Min Mao
  - Department of Respiratory Medicine, The First Affiliated Hospital, College of Clinical Medicine of Henan University of Science and Technology, Luoyang, Henan, China
15
Zohdi H, Natale L, Scholkmann F, Wolf U. Intersubject Variability in Cerebrovascular Hemodynamics and Systemic Physiology during a Verbal Fluency Task under Colored Light Exposure: Clustering of Subjects by Unsupervised Machine Learning. Brain Sci 2022; 12:1449. [PMID: 36358375 PMCID: PMC9688708 DOI: 10.3390/brainsci12111449] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/12/2022] [Revised: 10/19/2022] [Accepted: 10/21/2022] [Indexed: 10/18/2023] Open
Abstract
There is large intersubject variability in cerebrovascular hemodynamic and systemic physiological responses induced by a verbal fluency task (VFT) under colored light exposure (CLE). We hypothesized that machine learning would enable us to classify the response patterns and provide new insights into the common response patterns between subjects. In total, 32 healthy subjects (15 men and 17 women, age: 25.5 ± 4.3 years) were exposed to two different light colors (red vs. blue) in a randomized cross-over study design for 9 min while performing a VFT. We used the systemic physiology augmented functional near-infrared spectroscopy (SPA-fNIRS) approach to measure cerebrovascular hemodynamics and oxygenation at the prefrontal cortex (PFC) and visual cortex (VC) concurrently with systemic physiological parameters. We found that subjects were suitably classified by unsupervised machine learning into different groups according to the changes in the following parameters: end-tidal carbon dioxide, arterial oxygen saturation, skin conductance, oxygenated hemoglobin in the VC, and deoxygenated hemoglobin in the PFC. With hard clustering methods, three and five different groups of subjects were found for the blue and red light exposure, respectively. Our results highlight the fact that humans show specific reactivity types to the CLE-VFT experimental paradigm.
Affiliation(s)
- Hamoon Zohdi
  - Institute of Complementary and Integrative Medicine, University of Bern, 3012 Bern, Switzerland
- Luciano Natale
  - Institute of Complementary and Integrative Medicine, University of Bern, 3012 Bern, Switzerland
- Felix Scholkmann
  - Institute of Complementary and Integrative Medicine, University of Bern, 3012 Bern, Switzerland
  - Biomedical Optics Research Laboratory, Neonatology Research, Department of Neonatology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
- Ursula Wolf
  - Institute of Complementary and Integrative Medicine, University of Bern, 3012 Bern, Switzerland
16
Hurst JR, Han MK, Singh B, Sharma S, Kaur G, de Nigris E, Holmgren U, Siddiqui MK. Prognostic risk factors for moderate-to-severe exacerbations in patients with chronic obstructive pulmonary disease: a systematic literature review. Respir Res 2022; 23:213. [PMID: 35999538 PMCID: PMC9396841 DOI: 10.1186/s12931-022-02123-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/02/2022] [Accepted: 07/20/2022] [Indexed: 11/30/2022] Open
Abstract
Background Chronic obstructive pulmonary disease (COPD) is a leading cause of morbidity and mortality worldwide. COPD exacerbations are associated with a worsening of lung function, increased disease burden, and mortality, and, therefore, preventing their occurrence is an important goal of COPD management. This review was conducted to identify the evidence base regarding risk factors and predictors of moderate-to-severe exacerbations in patients with COPD. Methods A literature review was performed in Embase, MEDLINE, MEDLINE In-Process, and the Cochrane Central Register of Controlled Trials (CENTRAL). Searches were conducted from January 2015 to July 2019. Eligible publications were peer-reviewed journal articles, published in English, that reported risk factors or predictors for the occurrence of moderate-to-severe exacerbations in adults age ≥ 40 years with a diagnosis of COPD. Results The literature review identified 5112 references, of which 113 publications (reporting results for 76 studies) met the eligibility criteria and were included in the review. Among the 76 studies included, 61 were observational and 15 were randomized controlled clinical trials. Exacerbation history was the strongest predictor of future exacerbations, with 34 studies reporting a significant association between history of exacerbations and risk of future moderate or severe exacerbations. Other significant risk factors identified in multiple studies included disease severity or bronchodilator reversibility (39 studies), comorbidities (34 studies), higher symptom burden (17 studies), and higher blood eosinophil count (16 studies). Conclusions This systematic literature review identified several demographic and clinical characteristics that predict the future risk of COPD exacerbations. Prior exacerbation history was confirmed as the most important predictor of future exacerbations. 
These prognostic factors may help clinicians identify patients at high risk of exacerbations, which are a major driver of the global burden of COPD, including morbidity and mortality. Supplementary Information The online version contains supplementary material available at 10.1186/s12931-022-02123-5.
Affiliation(s)
- John R Hurst
  - UCL Respiratory, University College London, London, WC1E 6BT, UK
- MeiLan K Han
  - Division of Pulmonary and Critical Care, University of Michigan, Ann Arbor, MI, USA
17
Balbirsingh V, Mohammed AS, Turner AM, Newnham M. Cardiovascular disease in chronic obstructive pulmonary disease: a narrative review. Thorax 2022; 77:thoraxjnl-2021-218333. [PMID: 35772939 DOI: 10.1136/thoraxjnl-2021-218333] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 10/07/2021] [Accepted: 06/06/2022] [Indexed: 11/04/2022]
Abstract
Patients with chronic obstructive pulmonary disease (COPD) are at increased risk of cardiovascular disease (CVD) and concomitant disease leads to reduced quality of life, increased hospitalisations and worse survival. Acute pulmonary exacerbations are an important contributor to COPD burden and are associated with increased cardiovascular (CV) events. Both COPD and CVD represent a significant global disease impact and understanding the relationship between the two could potentially reduce this burden. The association between CVD and COPD could be a consequence of (1) shared risk factors (environmental and/or genetic) (2) shared pathophysiological pathways (3) coassociation from a high prevalence of both diseases (4) adverse effects (including pulmonary exacerbations) of COPD contributing to CVD and (5) CVD medications potentially worsening COPD and vice versa. CV risk in COPD has traditionally been associated with increasing disease severity, but there are other relevant COPD subtype associations including radiological subtypes, those with frequent pulmonary exacerbations and novel disease clusters. While the prevalence of CVD is high in COPD populations, it may be underdiagnosed, and improved risk prediction, diagnosis and treatment optimisation could lead to improved outcomes. This state-of-the-art review will explore the incidence/prevalence, COPD subtype associations, shared pathophysiology and genetics, risk prediction, and treatment of CVD in COPD.
Affiliation(s)
- Vishanna Balbirsingh
  - College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Andrea S Mohammed
  - College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Alice M Turner
  - Institute of Applied Health Research, University of Birmingham College of Medical and Dental Sciences, Birmingham, UK
- Michael Newnham
  - Institute of Applied Health Research, University of Birmingham College of Medical and Dental Sciences, Birmingham, UK
18
Maurits MP, Korsunsky I, Raychaudhuri S, Murphy SN, Smoller JW, Weiss ST, Petukhova LM, Weng C, Wei WQ, Huizinga TWJ, Reinders MJT, Karlson EW, van den Akker EB, Knevel R. A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history. J Am Med Inform Assoc 2022; 29:761-769. [PMID: 35139533 PMCID: PMC9122640 DOI: 10.1093/jamia/ocac008] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/28/2021] [Revised: 11/24/2021] [Accepted: 01/27/2022] [Indexed: 11/23/2022] Open
Abstract
OBJECTIVE To facilitate patient disease subset and risk factor identification by constructing a pipeline which is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health records (EHRs) batch effects. MATERIAL AND METHODS We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specific batch effects and performed clustering to identify patients with highly similar medical history patterns across the various centers. Our visualization method (PheSpec) depicts the phenotypic profile of clusters, applies a novel filtering of noninformative codes (Ranked Scope Pervasion), and indicates the most distinguishing features. RESULTS We observed 114 clinically meaningful profiles, for example, linking prostate hyperplasia with cancer and diabetes with cardiovascular problems and grouping pediatric developmental disorders. Our framework identified disease subsets, exemplified by 6 "other headache" clusters, where phenotypic profiles suggested different underlying mechanisms: migraine, convulsion, injury, eye problems, joint pain, and pituitary gland disorders. Phenotypic patterns replicated well, with high correlations of ≥0.75 to an average of 6 (2-8) of the 12 different cohorts, demonstrating the consistency with which our method discovers disease history profiles. DISCUSSION Costly clinical research ventures should be based on solid hypotheses. We repurpose methods from single-cell omics to build these hypotheses from observational EHR data, distilling useful information from complex data. CONCLUSION We establish a generalizable pipeline for the identification and replication of clinically meaningful (sub)phenotypes from widely available high-dimensional billing codes. This approach overcomes datatype problems and produces comprehensive visualizations of validation-ready phenotypes.
Affiliation(s)
- Marc P Maurits
  - Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
  - Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Ilya Korsunsky
  - Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Soumya Raychaudhuri
  - Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Shawn N Murphy
  - Research Information Science and Computing, Mass General Brigham, Boston, MA, USA
- Jordan W Smoller
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Scott T Weiss
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Lynn M Petukhova
  - Department of Dermatology at NewYork-Presbyterian/Columbia University Medical Center (CUMC)
- Chunhua Weng
  - Biomedical Informatics, Columbia University
- Wei-Qi Wei
  - Biomedical Informatics in the School of Medicine at Vanderbilt University
- Thomas W J Huizinga
  - Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
- Marcel J T Reinders
  - Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
  - The Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
- Elizabeth W Karlson
  - Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Erik B van den Akker
  - Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
  - Section of Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Rachel Knevel
  - Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
  - Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
19
MacRae C, Whittaker H, Mukherjee M, Daines L, Morgan A, Iwundu C, Alsallakh M, Vasileiou E, O’Rourke E, Williams AT, Stone PW, Sheikh A, Quint JK. Deriving a Standardised Recommended Respiratory Disease Codelist Repository for Future Research. Pragmat Obs Res 2022; 13:1-8. [PMID: 35210898 PMCID: PMC8859726 DOI: 10.2147/por.s353400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Accepted: 01/26/2022] [Indexed: 11/23/2022] Open
Abstract
Background Electronic health record (EHR) databases provide rich, longitudinal data on interactions with healthcare providers and can be used to advance research into respiratory conditions. However, since these data are primarily collected to support health care delivery, clinical coding can be inconsistent, resulting in inherent challenges in using these data for research purposes. Methods We systematically searched existing international literature and UK code repositories to find respiratory disease codelists for asthma from January 2018, and chronic obstructive pulmonary disease and respiratory tract infections from January 2020, based on prior searches. Medline searches using key terms provided in article lists. Full-text articles, supplementary files, and reference lists were examined for codelists, and codelists repositories were searched. A reproducible methodology for codelists creation was developed with recommended lists for each disease created based on multidisciplinary expert opinion and previously published literature. Results Medline searches returned 1126 asthma articles, 70 COPD articles, and 90 respiratory infection articles, with 3%, 22% and 5% including codelists, respectively. Repository searching returned 12 asthma, 23 COPD, and 64 respiratory infection codelists. We have systematically compiled respiratory disease codelists and from these derived recommended lists for use by researchers to find the most up-to-date and relevant respiratory disease codelists that can be tailored to individual research questions. Conclusion Few published papers include codelists, and where published diverse codelists were used, even when answering similar research questions. Whilst some advances have been made, greater consistency and transparency across studies using routine data to study respiratory diseases are needed.
Affiliation(s)
- Clare MacRae
  - Usher Institute, University of Edinburgh, Edinburgh, UK
- Hannah Whittaker
  - National Heart and Lung Institute, Imperial College London, London, UK
- Luke Daines
  - Usher Institute, University of Edinburgh, Edinburgh, UK
- Ann Morgan
  - National Heart and Lung Institute, Imperial College London, London, UK
- Chukwuma Iwundu
  - National Heart and Lung Institute, Imperial College London, London, UK
- Eimear O’Rourke
  - National Heart and Lung Institute, Imperial College London, London, UK
- Philip W Stone
  - National Heart and Lung Institute, Imperial College London, London, UK
- Aziz Sheikh
  - Usher Institute, University of Edinburgh, Edinburgh, UK
- Jennifer K Quint
  - National Heart and Lung Institute, Imperial College London, London, UK
  - Correspondence: Jennifer K Quint, National Heart and Lung Institute, Imperial College London, G48, Emmanuel Kaye Building, Manresa Road, London, SW3 6LR, UK, Tel +44 207 594 8821
20
Exarchos K, Aggelopoulou A, Oikonomou A, Biniskou T, Beli V, Antoniadou E, Kostikas K. Review of Artificial Intelligence techniques in Chronic Obstructive Lung Disease. IEEE J Biomed Health Inform 2021; 26:2331-2338. [PMID: 34914601 DOI: 10.1109/jbhi.2021.3135838] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND Artificial Intelligence (AI) has proven to be an invaluable asset in the healthcare domain, where massive amounts of data are produced. Chronic Obstructive Pulmonary Disease (COPD) is a heterogeneous chronic condition with multiscale manifestations and complex interactions that represents an ideal target for AI. OBJECTIVE The aim of this review article is to appraise the adoption of AI in COPD research, and more specifically its applications to date along with reported results, potential challenges and future prospects. METHODS We performed a review of the literature from PubMed and DBLP and assembled studies published up to 2020, yielding 156 articles relevant to the scope of this review. RESULTS The resulting articles were assessed and organized into four basic contextual categories, namely: i) COPD diagnosis, ii) COPD prognosis, iii) Patient classification, iv) COPD management, and subsequently presented in an orderly manner based on a set of qualitative and quantitative criteria. CONCLUSIONS We observed considerable acceleration of research activity utilizing AI techniques in COPD research, especially in the last couple of years, nevertheless, the massive production of large and complex data in COPD calls for broader adoption of AI and more advanced techniques.
21
Alexander N, Alexander DC, Barkhof F, Denaxas S. Identifying and evaluating clinical subtypes of Alzheimer's disease in care electronic health records using unsupervised machine learning. BMC Med Inform Decis Mak 2021; 21:343. [PMID: 34879829 PMCID: PMC8653614 DOI: 10.1186/s12911-021-01693-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/22/2021] [Accepted: 11/15/2021] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Alzheimer's disease (AD) is a highly heterogeneous disease with diverse trajectories and outcomes observed in clinical populations. Understanding this heterogeneity can enable better treatment, prognosis and disease management. Studies to date have mainly used imaging or cognition data and have been limited in terms of data breadth and sample size. Here we examine the clinical heterogeneity of Alzheimer's disease patients using electronic health records (EHR) to identify and characterise disease subgroups using multiple clustering methods, identifying clusters which are clinically actionable. METHODS We identified AD patients in primary care EHR from the Clinical Practice Research Datalink (CPRD) using a previously validated rule-based phenotyping algorithm. We extracted and included a range of comorbidities, symptoms and demographic features as patient features. We evaluated four different clustering methods (k-means, kernel k-means, affinity propagation and latent class analysis) to cluster Alzheimer's disease patients. We compared clusters on clinically relevant outcomes and evaluated each method using measures of cluster structure, stability, efficiency of outcome prediction and replicability in external data sets. RESULTS We identified 7,913 AD patients, with a mean age of 82 and 66.2% female. We included 21 features in our analysis. We observed 5, 2, 5 and 6 clusters in k-means, kernel k-means, affinity propagation and latent class analysis respectively. K-means was found to produce the most consistent results based on four evaluative measures. We discovered a consistent cluster found in three of the four methods composed of predominantly female, younger disease onset (43% between ages 42-73) diagnosed with depression and anxiety, with a quicker rate of progression compared to the average across other clusters. 
CONCLUSION Each clustering approach produced substantially different clusters and K-Means performed the best out of the four methods based on the four evaluative criteria. However, the consistent appearance of one particular cluster across three of the four methods potentially suggests the presence of a distinct disease subtype that merits further exploration. Our study underlines the variability of the results obtained from different clustering approaches and the importance of systematically evaluating different approaches for identifying disease subtypes in complex EHR.
Affiliation(s)
- Nonie Alexander
  - Institute of Health Informatics, University College London, London, UK
  - Health Data Research UK, London, UK
- Daniel C Alexander
  - Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK
- Frederik Barkhof
  - Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK
  - UCL Institute of Neurology, University College London, London, UK
  - Department of Radiology and Nuclear Medicine, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Spiros Denaxas
  - Institute of Health Informatics, University College London, London, UK
  - Health Data Research UK, London, UK
  - Alan Turing Institute, London, UK
22
|
Cabrera C, Quélen C, Ouwens M, Hedman K, Rigney U, Quint JK. Evaluating a Cox marginal structural model to assess the comparative effectiveness of inhaled corticosteroids versus no inhaled corticosteroid treatment in chronic obstructive pulmonary disease. Ann Epidemiol 2021; 67:19-28. [PMID: 34798296 DOI: 10.1016/j.annepidem.2021.11.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/30/2021] [Revised: 10/20/2021] [Accepted: 11/04/2021] [Indexed: 12/25/2022]
Abstract
PURPOSE To evaluate the potential of a Cox marginal structural model (MSM) to estimate the time-varying causal inference of a known clinical trial association where the effectiveness of inhaled corticosteroid- (ICS-) versus non-ICS-containing treatments has been compared in patients with chronic obstructive pulmonary disease (COPD). METHODS This retrospective study from 2006 to 2016 used linked data from Clinical Practice Research Datalink-GOLD, Hospital Episode Statistics and Office for National Statistics mortality. A Cox MSM, incorporating a new-user design, was deemed capable of replicating a clinical trial-like pathway. Repeated outcomes for exacerbation events and stabilised weights were used to include time-varying and fixed covariate exposures. RESULTS Of 45,958 patients, 55% were male; 52% had moderate COPD. ICS-treated patients had a higher incidence of comorbid asthma than non-ICS-treated patients. Adjusted hazard risk ratios for any exacerbation event: ICS/long-acting β2-agonist (LABA) versus long-acting muscarinic antagonist (LAMA), 1.07 (95% confidence interval 1.04-1.10); ICS/LABA versus LABA/LAMA, 1.05 (1.00-1.10); ICS/LABA/LAMA versus LAMA, 1.04 (1.01-1.06); ICS/LABA/LAMA versus LABA/LAMA 1.02 (0.97-1.07). CONCLUSIONS The Cox MSM was not able to fully demonstrate results consistent with the previously established benefits of ICS-containing treatments seen in clinical trials. Future studies should continue to investigate causal inference methods and their capability to estimate the long-term outcomes of treatment in COPD.
Affiliation(s)
- Claudia Cabrera
  - Real World Science and Digital, BioPharmaceuticals Medical, AstraZeneca, Gothenburg, Sweden
- Mario Ouwens
  - Real World Science and Digital, BioPharmaceuticals Medical, AstraZeneca, Gothenburg, Sweden
- Jennifer K Quint
  - National Heart & Lung Institute, Imperial College London, London, UK
23
|
Nikolaou V, Massaro S, Garn W, Fakhimi M, Stergioulas L, Price DB. Fast decliner phenotype of chronic obstructive pulmonary disease (COPD): applying machine learning for predicting lung function loss. BMJ Open Respir Res 2021; 8:e000980. [PMID: 34716217 PMCID: PMC8559126 DOI: 10.1136/bmjresp-2021-000980] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/06/2021] [Accepted: 10/19/2021] [Indexed: 12/31/2022] Open
Abstract
Background Chronic obstructive pulmonary disease (COPD) is a heterogeneous group of lung conditions that is challenging to diagnose and treat. Identification of phenotypes of patients with lung function loss may allow early intervention and improve disease management. We characterised patients with the ‘fast decliner’ phenotype, determined its reproducibility and predicted lung function decline after COPD diagnosis. Methods A prospective 4-year observational study applied machine learning tools to identify COPD phenotypes among 13 260 patients from the UK Royal College of General Practitioners and Surveillance Centre database. The phenotypes were identified prior to diagnosis (training data set), and their reproducibility was assessed after COPD diagnosis (validation data set). Results Three COPD phenotypes were identified, the most common of which was the ‘fast decliner’, characterised by patients of younger age with the lowest number of COPD exacerbations and better lung function, yet a fast decline in lung function with increasing number of exacerbations. The other two phenotypes were characterised by (a) patients with the highest prevalence of COPD severity and (b) patients of older age, mostly men, with the highest prevalence of diabetes, cardiovascular comorbidities and hypertension. These phenotypes were reproduced in the validation data set with 80% accuracy. Gender, COPD severity and exacerbations were the most important risk factors for lung function decline in the most common phenotype. Conclusions In this study, three COPD phenotypes were identified prior to patients being diagnosed with COPD. The reproducibility of those phenotypes in a blind data set following COPD diagnosis suggests their generalisability among different populations.
Affiliation(s)
- Sebastiano Massaro
  - University of Surrey, Surrey Business School, Guildford, UK
  - The Organizational Neuroscience Laboratory, London, UK
- Wolfgang Garn
  - University of Surrey, Surrey Business School, Guildford, UK
- Masoud Fakhimi
  - University of Surrey, Surrey Business School, Guildford, UK
- David B Price
  - Academic Primary Care, University of Aberdeen, Aberdeen, UK
  - Optimum Patient Care, Cambridge, UK
  - Observational and Pragmatic Research Institute, Singapore
24
|
Nikolaou V, Massaro S, Garn W, Fakhimi M, Stergioulas L, Price D. The cardiovascular phenotype of Chronic Obstructive Pulmonary Disease (COPD): Applying machine learning to the prediction of cardiovascular comorbidities. Respir Med 2021; 186:106528. [PMID: 34260974 DOI: 10.1016/j.rmed.2021.106528] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/28/2021] [Revised: 06/29/2021] [Accepted: 07/01/2021] [Indexed: 01/31/2023]
Abstract
BACKGROUND Chronic Obstructive Pulmonary Disease (COPD) is a heterogeneous group of lung conditions that are challenging to diagnose and treat. As the presence of comorbidities often exacerbates this scenario, the characterization of patients with COPD and cardiovascular comorbidities may allow early intervention and improve disease management and care. METHODS We analysed a 4-year observational cohort of 6883 UK patients who were ultimately diagnosed with COPD and at least one cardiovascular comorbidity. The cohort was extracted from the UK Royal College of General Practitioners and Surveillance Centre database. The COPD phenotypes were identified prior to diagnosis and their reproducibility was assessed following COPD diagnosis. We then developed four classifiers for predicting cardiovascular comorbidities. RESULTS Three subtypes of the COPD cardiovascular phenotype were identified prior to diagnosis. Phenotype A was characterised by a higher prevalence of severe COPD, emphysema and hypertension. Phenotype B was characterised by a larger male majority, a lower prevalence of hypertension, the highest prevalence of the other cardiovascular comorbidities, and diabetes. Finally, phenotype C was characterised by universal hypertension, a higher prevalence of mild COPD and a low prevalence of COPD exacerbations. These phenotypes were reproduced after diagnosis with 92% accuracy. The random forest model was highly accurate for predicting hypertension while ruling out less prevalent comorbidities. CONCLUSIONS This study identified three subtypes of the COPD cardiovascular phenotype that may generalize to other populations. Among the four models tested, the random forest classifier was the most accurate at predicting cardiovascular comorbidities in COPD patients with the cardiovascular phenotype.
Affiliation(s)
- Vasilis Nikolaou
  - University of Surrey, Surrey Business School, Guildford, GU2 7HX, United Kingdom
- Sebastiano Massaro
  - University of Surrey, Surrey Business School, Guildford, GU2 7HX, United Kingdom
  - The Organizational Neuroscience Laboratory, London, WC1N 3AX, United Kingdom
- Wolfgang Garn
  - University of Surrey, Surrey Business School, Guildford, GU2 7HX, United Kingdom
- Masoud Fakhimi
  - University of Surrey, Surrey Business School, Guildford, GU2 7HX, United Kingdom
- Lampros Stergioulas
  - The Hague University of Applied Sciences, Johanna Westerdijkplein 75, 2521 EN Den Haag, Netherlands
- David Price
  - Optimum Patient Care, Cambridge, UK
  - Observational and Pragmatic Research Institute, Singapore
  - Centre of Academic Primary Care, Division of Applied Health Sciences, University of Aberdeen, Aberdeen, United Kingdom
25
|
Brat K, Svoboda M, Zatloukal J, Plutinsky M, Volakova E, Popelkova P, Novotna B, Dvorak T, Koblizek V. The Relation Between Clinical Phenotypes, GOLD Groups/Stages and Mortality in COPD Patients - A Prospective Multicenter Study. Int J Chron Obstruct Pulmon Dis 2021; 16:1171-1182. [PMID: 33953554 PMCID: PMC8089082 DOI: 10.2147/copd.s297087] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/12/2020] [Accepted: 03/22/2021] [Indexed: 11/23/2022] Open
Abstract
Introduction The concept of phenotyping emerged, reflecting specific clinical, pulmonary and extrapulmonary features of each particular chronic obstructive pulmonary disease (COPD) case. Our aim was to analyze the prognostic utility of “Czech” COPD phenotypes and their most frequent combinations, “Spanish” phenotypes, and Global Initiative for Chronic Obstructive Lung Disease (GOLD) stages and groups in relation to long-term mortality risk. Methods Data were extracted from the Czech Multicenter Research Database (CMRD) of COPD. Kaplan-Meier (KM) estimates (at 60 months from inclusion) were used for mortality assessment. Survival rates were calculated for the six elementary “Czech” phenotypes and their most frequent and relevant combinations, “Spanish” phenotypes, GOLD grades and groups. Statistically significant differences were tested by Log Rank test. Factors underlying mortality risk (the role of confounders) were assessed using classification and regression tree (CART) analysis. Basic factors showing significant differences between deceased and living patients were entered into the CART model. This showed six different risk groups; the differences in risk were tested by a Log Rank test. Results The cohort (n=720) was 73.1% men, with a mean age of 66.6 years and mean FEV1 44.4% pred. KM estimates showed bronchiectases/COPD overlap (HR 1.425, p=0.045), frequent exacerbator (HR 1.58, p<0.001), cachexia (HR 2.262, p<0.001) and emphysematous (HR 1.786, p=0.015) phenotypes to be associated with higher mortality risk. Co-presence of multiple phenotypes in a single patient had an additive effect on risk; the combination of emphysema, cachexia and frequent exacerbations translated into the poorest prognosis (HR 3.075; p<0.001). Of the “Spanish” phenotypes, AE CB and AE non-CB were associated with greater risk of mortality (HR 1.787 and 2.001; both p=0.001). FEV1% pred., cachexia and chronic heart failure in patient history were the major underlying factors determining mortality risk in our cohort. Conclusion Certain phenotypes (“Czech” or “Spanish”) of COPD are associated with higher risk of death. Co-presence of multiple phenotypes (emphysematous plus cachectic plus frequent exacerbator) in a single individual was associated with amplified risk of mortality.
Affiliation(s)
- Kristian Brat
  - Department of Respiratory Diseases, University Hospital Brno, Brno, Czech Republic
  - Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Michal Svoboda
  - Faculty of Medicine, Masaryk University, Brno, Czech Republic
  - Institute of Biostatistics and Analyses, Ltd., Brno, Czech Republic
- Jaromir Zatloukal
  - Pulmonary Department, University Hospital Olomouc, Olomouc, Czech Republic
  - Faculty of Medicine, Palacky University, Olomouc, Czech Republic
- Marek Plutinsky
  - Department of Respiratory Diseases, University Hospital Brno, Brno, Czech Republic
  - Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Eva Volakova
  - Pulmonary Department, University Hospital Olomouc, Olomouc, Czech Republic
  - Faculty of Medicine, Palacky University, Olomouc, Czech Republic
- Patrice Popelkova
  - Pulmonary Department, University Hospital Ostrava, Ostrava, Czech Republic
  - Faculty of Medicine, University of Ostrava, Ostrava, Czech Republic
- Barbora Novotna
  - Pulmonary Department, Bulovka Hospital, Prague, Czech Republic
- Tomas Dvorak
  - Pulmonary Department, Mlada Boleslav Hospital, Mlada Boleslav, Czech Republic
- Vladimir Koblizek
  - Pulmonary Department, University Hospital Hradec Kralove, Hradec Kralove, Czech Republic
  - Faculty of Medicine in Hradec Kralove, Charles University, Prague, Czech Republic
26
|
Coombes CE, Liu X, Abrams ZB, Coombes KR, Brock G. Simulation-derived best practices for clustering clinical data. J Biomed Inform 2021; 118:103788. [PMID: 33862229 DOI: 10.1016/j.jbi.2021.103788] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/10/2020] [Revised: 03/23/2021] [Accepted: 04/11/2021] [Indexed: 11/18/2022]
Abstract
INTRODUCTION Clustering analyses in clinical contexts hold promise to improve the understanding of patient phenotype and disease course in chronic and acute clinical medicine. However, work remains to ensure that solutions are rigorous, valid, and reproducible. In this paper, we evaluate best practices for dissimilarity matrix calculation and clustering on mixed-type, clinical data. METHODS We simulate clinical data to represent problems in clinical trials, cohort studies, and EHR data, including single-type datasets (binary, continuous, categorical) and 4 data mixtures. We test 5 single distance metrics (Jaccard, Hamming, Gower, Manhattan, Euclidean) and 3 mixed distance metrics (DAISY, Supersom, and Mercator) with 3 clustering algorithms (hierarchical (HC), k-medoids, self-organizing maps (SOM)). We quantitatively and visually validate by Adjusted Rand Index (ARI) and silhouette width (SW). We applied our best methods to two real-world data sets: (1) 21 features collected on 247 patients with chronic lymphocytic leukemia, and (2) 40 features collected on 6000 patients admitted to an intensive care unit. RESULTS HC outperformed k-medoids and SOM by ARI across data types. DAISY produced the highest mean ARI for mixed data types for all mixtures except unbalanced mixtures dominated by continuous data. Compared to other methods, DAISY with HC uncovered superior, separable clusters in both real-world data sets. DISCUSSION Selecting an appropriate mixed-type metric allows the investigator to obtain optimal separation of patient clusters and get maximum use of their data. Superior metrics for mixed-type data handle multiple data types using multiple, type-focused distances. Better subclassification of disease opens avenues for targeted treatments, precision medicine, clinical decision support, and improved patient outcomes.
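The abstract above validates clusterings by the Adjusted Rand Index (ARI), which scores the agreement between two partitions of the same patients and corrects for chance agreement. A minimal sketch of the standard ARI formula follows; the function name and the toy labels are illustrative assumptions and are not taken from the paper, whose own implementation is not shown in the source.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items.

    Hypothetical helper for illustration; uses the standard
    pair-counting formula over the contingency table.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions agree trivially

    # Contingency table cells and its row/column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs placed together in both / in each labeling.
    index = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # chance agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (index - expected) / (max_index - expected)


# Same partition with renamed clusters: perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
# Partitions that cut across each other score below zero.
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]), 3))  # -0.5
```

Because ARI depends only on which items are grouped together, it is unaffected by how clusters are numbered, which is what makes it usable for comparing a clustering against simulated ground truth.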
Affiliation(s)
- Caitlin E Coombes
  - The Ohio State University College of Medicine, 370 W 9th Ave, Columbus, OH 43210, USA
- Xin Liu
  - Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Dr, Columbus, OH 43210, USA
- Zachary B Abrams
  - Institute for Informatics, Washington University in St. Louis, 444 Forest Park Ave., St. Louis, MO 63108, USA
- Kevin R Coombes
  - Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Dr, Columbus, OH 43210, USA
- Guy Brock
  - Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Dr, Columbus, OH 43210, USA
27
|
Banerjee A, Chen S, Fatemifar G, Zeina M, Lumbers RT, Mielke J, Gill S, Kotecha D, Freitag DF, Denaxas S, Hemingway H. Machine learning for subtype definition and risk prediction in heart failure, acute coronary syndromes and atrial fibrillation: systematic review of validity and clinical utility. BMC Med 2021; 19:85. [PMID: 33820530 PMCID: PMC8022365 DOI: 10.1186/s12916-021-01940-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 10/01/2020] [Accepted: 02/12/2021] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Machine learning (ML) is increasingly used in research for subtype definition and risk prediction, particularly in cardiovascular diseases. No existing ML models are routinely used for cardiovascular disease management, and their phase of clinical utility is unknown, partly due to a lack of clear criteria. We evaluated ML for subtype definition and risk prediction in heart failure (HF), acute coronary syndromes (ACS) and atrial fibrillation (AF). METHODS For ML studies of subtype definition and risk prediction, we conducted a systematic review in HF, ACS and AF, using PubMed, MEDLINE and Web of Science from January 2000 until December 2019. By adapting published criteria for diagnostic and prognostic studies, we developed a seven-domain, ML-specific checklist. RESULTS Of 5918 studies identified, 97 were included. Across studies for subtype definition (n = 40) and risk prediction (n = 57), there was variation in data source, population size (median 606 and median 6769), clinical setting (outpatient, inpatient, different departments), number of covariates (median 19 and median 48) and ML methods. All studies were single disease, most were North American (n = 61/97) and only 14 studies combined definition and risk prediction. Subtype definition and risk prediction studies respectively had limitations in development (e.g. 15.0% and 78.9% of studies related to patient benefit; 15.0% and 15.8% had low patient selection bias), validation (12.5% and 5.3% externally validated) and impact (32.5% and 91.2% improved outcome prediction; no effectiveness or cost-effectiveness evaluations). CONCLUSIONS Studies of ML in HF, ACS and AF are limited by number and type of included covariates, ML methods, population size, country, clinical setting and focus on single diseases, not overlap or multimorbidity. Clinical utility and implementation rely on improvements in development, validation and impact, facilitated by simple checklists. 
We provide clear steps prior to safe implementation of machine learning in clinical practice for cardiovascular diseases and other disease areas.
Affiliation(s)
- Amitava Banerjee
  - Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
  - Health Data Research UK, University College London, London, UK
  - University College London Hospitals NHS Trust, 235 Euston Road, London, UK
  - Barts Health NHS Trust, The Royal London Hospital, Whitechapel Rd, London, UK
- Suliang Chen
  - Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
  - Health Data Research UK, University College London, London, UK
- Ghazaleh Fatemifar
  - Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
  - Health Data Research UK, University College London, London, UK
- R Thomas Lumbers
  - Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
  - Health Data Research UK, University College London, London, UK
  - University College London Hospitals NHS Trust, 235 Euston Road, London, UK
- Johanna Mielke
  - Bayer AG, Division Pharmaceuticals, Open Innovation & Digital Technologies, Wuppertal, Germany
- Simrat Gill
  - University of Birmingham Institute of Cardiovascular Sciences and University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Dipak Kotecha
  - University of Birmingham Institute of Cardiovascular Sciences and University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - Department of Cardiology, University Medical Centre Utrecht, Utrecht, the Netherlands
- Daniel F Freitag
  - Bayer AG, Division Pharmaceuticals, Open Innovation & Digital Technologies, Wuppertal, Germany
- Spiros Denaxas
  - Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
  - Health Data Research UK, University College London, London, UK
  - The Alan Turing Institute, London, UK
- Harry Hemingway
  - Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
  - Health Data Research UK, University College London, London, UK
  - University College London Hospitals Biomedical Research Centre (UCLH BRC), London, UK
28
|
Coombes CE, Abrams ZB, Nakayiza S, Brock G, Coombes KR. Umpire 2.0: Simulating realistic, mixed-type, clinical data for machine learning. F1000Res 2021. [DOI: 10.12688/f1000research.25877.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open
Abstract
The Umpire 2.0 R-package offers a streamlined, user-friendly workflow to simulate complex, heterogeneous, mixed-type data with known subgroup identities, dichotomous outcomes, and time-to-event data, while providing ample opportunities for fine-tuning and flexibility. Here, we describe how we have expanded the core Umpire 1.0 R-package, developed to simulate gene expression data, to generate clinically realistic, mixed-type data for use in evaluating unsupervised and supervised machine learning (ML) methods. As the availability of large-scale clinical data for ML has increased, clinical data has posed unique challenges, including widely variable size, individual biological heterogeneity, data collection and measurement noise, and mixed data types. Developing and validating ML methods for clinical data requires data sets with known ground truth, generated from simulation. Umpire 2.0 addresses challenges to simulating realistic clinical data by providing the user a series of modules to generate survival parameters and subgroups, apply meaningful additive noise, and discretize to single or mixed data types. Umpire 2.0 provides broad functionality across sample sizes, feature spaces, and data types, allowing the user to simulate correlated, heterogeneous, binary, continuous, categorical, or mixed type data from the scale of a small clinical trial to data on thousands of patients drawn from electronic health records. The user may generate elaborate simulations by varying parameters in order to compare algorithms or interrogate operating characteristics of an algorithm in both supervised and unsupervised ML.
29
|
Bohn L, Zheng Y, McFall GP, Dixon RA. Portals to frailty? Data-driven analyses detect early frailty profiles. Alzheimers Res Ther 2021; 13:1. [PMID: 33397495 PMCID: PMC7780374 DOI: 10.1186/s13195-020-00736-w] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 04/02/2020] [Accepted: 11/29/2020] [Indexed: 03/21/2023]
Abstract
BACKGROUND Frailty is an aging condition that reflects multisystem decline and an increased risk for adverse outcomes, including differential cognitive decline and impairment. Two prominent approaches for measuring frailty are the frailty phenotype and the frailty index. We explored a complementary data-driven approach for frailty assessment that could detect early frailty profiles (or subtypes) in relatively healthy older adults. Specifically, we tested whether (1) modalities of early frailty profiles could be empirically determined, (2) the extracted profiles were differentially related to longitudinal cognitive decline, and (3) the profile and prediction patterns were robust for males and females. METHODS Participants (n = 649; M age = 70.61, range 53-95) were community-dwelling older adults from the Victoria Longitudinal Study who contributed data for baseline multi-morbidity assessment and longitudinal cognitive trajectory analyses. An exploratory factor analysis on 50 multi-morbidity items produced 7 separable health domains. The proportion of deficits in each domain was calculated and used as continuous indicators in a data-driven latent profile analysis (LPA). We subsequently examined how frailty profiles related to the level and rate of change in a latent neurocognitive speed variable. RESULTS LPA results distinguished three profiles: not-clinically-frail (NCF; characterized by limited impairment across indicators; 84%), mobility-type frailty (MTF; characterized by impaired mobility function; 9%), and respiratory-type frailty (RTF; characterized by impaired respiratory function; 7%). These profiles showed differential neurocognitive slowing, such that MTF was associated with the steepest decline, followed by RTF, and then NCF. The baseline frailty index scores were the highest for MTF and RTF and increased over time. All observations were robust across sex. 
CONCLUSIONS A data-driven approach to early frailty assessment detected differentiable profiles that may be characterized as morbidity-intensive portals into broader and chronic frailty. Early interventions targeting mobility or respiratory deficits may have positive downstream effects on frailty progression and cognitive decline.
Affiliation(s)
- Linzy Bohn
  - Department of Psychology, University of Alberta, P217 Biological Sciences Building, Edmonton, AB, T6G 2E9, Canada
- Yao Zheng
  - Department of Psychology, University of Alberta, P217 Biological Sciences Building, Edmonton, AB, T6G 2E9, Canada
- G Peggy McFall
  - Department of Psychology, University of Alberta, P217 Biological Sciences Building, Edmonton, AB, T6G 2E9, Canada
  - Neuroscience and Mental Health Institute, University of Alberta, 2-132 Li Ka Shing Center for Health Research Innovation, Edmonton, AB, T6G 2E1, Canada
- Roger A Dixon
  - Department of Psychology, University of Alberta, P217 Biological Sciences Building, Edmonton, AB, T6G 2E9, Canada
  - Neuroscience and Mental Health Institute, University of Alberta, 2-132 Li Ka Shing Center for Health Research Innovation, Edmonton, AB, T6G 2E1, Canada
30
|
Feng Y, Wang Y, Zeng C, Mao H. Artificial Intelligence and Machine Learning in Chronic Airway Diseases: Focus on Asthma and Chronic Obstructive Pulmonary Disease. Int J Med Sci 2021; 18:2871-2889. [PMID: 34220314 PMCID: PMC8241767 DOI: 10.7150/ijms.58191] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/14/2021] [Accepted: 05/20/2021] [Indexed: 02/05/2023] Open
Abstract
Chronic airway diseases are characterized by airway inflammation, obstruction, and remodeling and show high prevalence, especially in developing countries. Among them, asthma and chronic obstructive pulmonary disease (COPD) show the highest morbidity and socioeconomic burden worldwide. Although there are extensive guidelines for the prevention, early diagnosis, and rational treatment of these lifelong diseases, their value in precision medicine is very limited. Artificial intelligence (AI) and machine learning (ML) techniques have emerged as effective methods for mining and integrating large-scale, heterogeneous medical data for clinical practice, and several AI and ML methods have recently been applied to asthma and COPD. However, very few methods have significantly contributed to clinical practice. Here, we review four aspects of AI and ML implementation in asthma and COPD to summarize existing knowledge and indicate future steps required for the safe and effective application of AI and ML tools by clinicians.
Affiliation(s)
- Yinhe Feng
  - Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
  - Department of Respiratory and Critical Care Medicine, People's Hospital of Deyang City, Affiliated Hospital of Chengdu College of Medicine, Deyang, Sichuan Province, China
- Yubin Wang
  - Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
- Chunfang Zeng
  - Department of Respiratory and Critical Care Medicine, People's Hospital of Deyang City, Affiliated Hospital of Chengdu College of Medicine, Deyang, Sichuan Province, China
- Hui Mao
  - Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
31
|
Lai AG, Pasea L, Banerjee A, Hall G, Denaxas S, Chang WH, Katsoulis M, Williams B, Pillay D, Noursadeghi M, Linch D, Hughes D, Forster MD, Turnbull C, Fitzpatrick NK, Boyd K, Foster GR, Enver T, Nafilyan V, Humberstone B, Neal RD, Cooper M, Jones M, Pritchard-Jones K, Sullivan R, Davie C, Lawler M, Hemingway H. Estimated impact of the COVID-19 pandemic on cancer services and excess 1-year mortality in people with cancer and multimorbidity: near real-time data on cancer care, cancer deaths and a population-based cohort study. BMJ Open 2020; 10:e043828. [PMID: 33203640 PMCID: PMC7674020 DOI: 10.1136/bmjopen-2020-043828] [Citation(s) in RCA: 185] [Impact Index Per Article: 46.3] [Received: 08/14/2020] [Revised: 10/20/2020] [Accepted: 10/23/2020] [Indexed: 12/30/2022] Open
Abstract
OBJECTIVES To estimate the impact of the COVID-19 pandemic on cancer care services and overall (direct and indirect) excess deaths in people with cancer. METHODS We employed near real-time weekly data on cancer care to determine the adverse effect of the pandemic on cancer services. We also used these data, together with national death registrations until June 2020, to model deaths in excess of background (pre-COVID-19) mortality in people with cancer. Background mortality risks for 24 cancers with and without COVID-19-relevant comorbidities were obtained from a population-based primary care cohort (Clinical Practice Research Datalink) of 3 862 012 adults in England. RESULTS Declines in urgent referrals (median=-70.4%) and chemotherapy attendances (median=-41.5%) to a nadir (lowest point) in the pandemic were observed. By 31 May, these declines had only partially recovered: urgent referrals (median=-44.5%) and chemotherapy attendances (median=-31.2%). There were short-term excess death registrations for cancer (without COVID-19), with a peak relative risk (RR) of 1.17 in the week ending on 3 April. The peak RR for all-cause deaths was 2.1 in the week ending on 17 April. Based on these findings and recent literature, we modelled 40% and 80% of cancer patients being affected by the pandemic in the long term. At 40% affected, we estimated 1-year total (direct and indirect) excess deaths in people with cancer as between 7165 and 17 910, using RRs of 1.2 and 1.5, respectively, where 78% of excess deaths occurred in patients with ≥1 comorbidity. CONCLUSIONS Dramatic reductions were detected in the demand for, and supply of, cancer services, which have not fully recovered with lockdown easing. These may contribute, over a 1-year time horizon, to substantial excess mortality among people with cancer and multimorbidity. It is urgent to understand how the recovery of general practitioner, oncology and other hospital services might best mitigate these long-term excess mortality risks.
Affiliation(s)
- Alvina G Lai
  - Institute of Health Informatics, University College London, London, UK
  - Health Data Research UK, University College London, London, UK
- Laura Pasea
  - Institute of Health Informatics, University College London, London, UK
  - Health Data Research UK, University College London, London, UK
- Amitava Banerjee
  - Institute of Health Informatics, University College London, London, UK
  - Health Data Research UK, University College London, London, UK
  - Barts Health NHS Trust, The Royal London Hospital, Whitechapel Rd, London, UK
- Geoff Hall
  - DATA-CAN, Health Data Research UK hub for cancer hosted by UCLPartners, London, UK
  - Leeds Institute of Medical Research, University of Leeds, Leeds, UK
  - Leeds Teaching Hospitals NHS Trust, Leeds, UK
- Spiros Denaxas
  - Institute of Health Informatics, University College London, London, UK
  - Health Data Research UK, University College London, London, UK
  - University College London Hospitals NIHR Biomedical Research Centre, London, UK
  - The Alan Turing Institute, London, UK
- Wai Hoong Chang
  - Institute of Health Informatics, University College London, London, UK
  - Health Data Research UK, University College London, London, UK
- Michail Katsoulis
  - Institute of Health Informatics, University College London, London, UK
- Bryan Williams
  - University College London Hospitals NIHR Biomedical Research Centre, London, UK
  - Institute of Cardiovascular Science, University College London, London, UK
  - University College London Hospitals NHS Trust, London, UK
- Deenan Pillay
  - Division of Infection and Immunity, University College London, London, UK
- Mahdad Noursadeghi
  - Division of Infection and Immunity, University College London, London, UK
- David Linch
  - University College London Hospitals NIHR Biomedical Research Centre, London, UK
  - Department of Hematology, University College London Cancer Institute, London, UK
- Derralynn Hughes
  - University College London Cancer Institute, London, UK
  - Royal Free NHS Foundation Trust, London, UK
- Martin D Forster
  - University College London Hospitals NHS Trust, London, UK
  - University College London Cancer Institute, London, UK
- Clare Turnbull
  - Division of Genetics and Epidemiology, Institute of Cancer Research, London, UK
- Natalie K Fitzpatrick
  - Institute of Health Informatics, University College London, London, UK
  - Health Data Research UK, University College London, London, UK
- Kathryn Boyd
  - Northern Ireland Cancer Network, Northern Ireland, UK
- Graham R Foster
  - Barts Liver Centre, Blizard Institute, Queen Mary University of London, London, UK
- Tariq Enver
  - University College London Cancer Institute, London, UK
- Richard D Neal
  - Leeds Institute of Health Sciences, University of Leeds, Leeds, UK
- Matt Cooper
  - DATA-CAN, Health Data Research UK hub for cancer hosted by UCLPartners, London, UK
  - Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Monica Jones
  - DATA-CAN, Health Data Research UK hub for cancer hosted by UCLPartners, London, UK
  - Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Kathy Pritchard-Jones
  - DATA-CAN, Health Data Research UK hub for cancer hosted by UCLPartners, London, UK
  - UCLPartners Academic Health Science Partnership, London, UK
  - Centre for Cancer Outcomes, University College London Hospitals NHS Foundation Trust, London, UK
  - UCL Great Ormond Street Institute of Child Health, University College London, London, UK
- Richard Sullivan
  - Conflict and Health Research Group, Institute of Cancer Policy, King's College London, London, UK
- Charlie Davie
  - DATA-CAN, Health Data Research UK hub for cancer hosted by UCLPartners, London, UK
  - Royal Free NHS Foundation Trust, London, UK
  - UCLPartners Academic Health Science Partnership, London, UK
- Mark Lawler
  - DATA-CAN, Health Data Research UK hub for cancer hosted by UCLPartners, London, UK
  - Patrick G Johnston Centre for Cancer Research, Queen's University Belfast, Belfast, UK
- Harry Hemingway
  - Institute of Health Informatics, University College London, London, UK
  - Health Data Research UK, University College London, London, UK
  - University College London Hospitals NIHR Biomedical Research Centre, London, UK
|
32
|
Zucchi JW, Franco EAT, Schreck T, Castro e Silva MH, Migliorini SRDS, Garcia T, Mota GAF, de Morais BEB, Machado LHS, Batista ANR, de Paiva SAR, de Godoy I, Tanni SE. Different Clusters in Patients with Chronic Obstructive Pulmonary Disease (COPD): A Two-Center Study in Brazil. Int J Chron Obstruct Pulmon Dis 2020; 15:2847-2856. [PMID: 33192058 PMCID: PMC7654519 DOI: 10.2147/copd.s268332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2020] [Accepted: 09/06/2020] [Indexed: 11/23/2022] Open
Abstract
Background Chronic obstructive pulmonary disease (COPD) has a functional definition. However, differences in clinical characteristics and systemic manifestations make COPD a heterogeneous disease and some manifestations have been associated with different risks of acute exacerbations, hospitalizations, and death. Objective Therefore, the objective of the study was to evaluate possible clinical clusters in COPD at two study centers in Brazil and identify the associated exacerbation and mortality rate during 1 year of follow-up. Methods We included patients with COPD and all underwent an evaluation composed of the Charlson Index, body mass index (BMI), current pharmacological treatment, smoking history (packs-year), history of exacerbations/hospitalizations in the last year, spirometry, six-minute walking test (6MWT), quality of life questionnaires, dyspnea, and hospital anxiety and depression scale. Blood samples were also collected for measurements of C-reactive protein (CRP), blood gases, laboratory analysis, and blood count. For the construction of the clusters, 13 continuous variables of clinical importance were considered: hematocrit, CRP, triglycerides, low density lipoprotein, absolute number of peripheral eosinophils, age, pulse oximetry, BMI, forced expiratory volume in the first second, dyspnea, 6MWD, total score of the Saint George Respiratory Questionnaire and packs-year of smoking. We used the Ward and K-means methods and determined the best silhouette value to identify similarities of individuals within the cluster (cohesion) in relation to the other clusters (separation). The number of clusters was determined by the heterogeneity values of the cluster, which in this case was determined as four clusters. Results We evaluated 301 COPD patients and identified four different groups of COPD patients. 
The first cluster (203 patients) was characterized by fewer symptoms and lower functional severity of the disease, the second cluster by higher values of peripheral eosinophils, the third cluster by more systemic inflammation and the fourth cluster by greater obstructive severity and worse gas exchange. Cluster 2 had an average of 959±3 peripheral eosinophils, cluster 3 had a higher prevalence of nutritional depletion (46.1%), and cluster 4 had a higher BODE index. Regarding the associated comorbidities, we found that only obstructive sleep apnea syndrome and pulmonary thromboembolism were more prevalent in cluster 4. Almost 50% of all patients presented an exacerbation during 1 year of follow-up. However, the rate was higher in cluster 4, with 65% of all patients having at least one exacerbation. The mortality rate was statistically higher in cluster 4, with 26.9%, vs 9.6% in cluster 1. Conclusion We could identify four clinically different clusters in these COPD populations, which were related to different clinical manifestations, comorbidities, exacerbation, and mortality rates. We also identified a specific cluster with higher values of peripheral eosinophils.
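The silhouette value mentioned in the Methods compares each individual's cohesion (mean distance to its own cluster) with its separation (mean distance to the nearest other cluster). The following is a minimal sketch of that calculation, not the authors' implementation: the function name, the use of Euclidean distance, and the four toy points are assumptions made purely for illustration.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance assumed)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # Cohesion: mean distance to the other members of the same cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # Separation: smallest mean distance to any other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

A labelling that follows the real grouping scores close to 1, while one that cuts across it scores below 0, which is why the best silhouette value can guide the choice among candidate cluster solutions.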
Affiliation(s)
- José William Zucchi
  - Pulmonology Division of Botucatu Medical School, São Paulo State University (UNESP), Botucatu, Brazil
- Thomas Schreck
  - Ostbayerische Technische Hochschule Regensburg (OTH Regensburg), Faculty of Business Studies, Regensburg, Germany
- Thaís Garcia
  - Pulmonology Division of Botucatu Medical School, São Paulo State University (UNESP), Botucatu, Brazil
- Irma de Godoy
  - Pulmonology Division of Botucatu Medical School, São Paulo State University (UNESP), Botucatu, Brazil
- Suzana Erico Tanni
  - Pulmonology Division of Botucatu Medical School, São Paulo State University (UNESP), Botucatu, Brazil
|
33
|
Zhuang H, Cui J, Liu T, Wang H. A physical model inspired density peak clustering. PLoS One 2020; 15:e0239406. [PMID: 32970727 PMCID: PMC7514087 DOI: 10.1371/journal.pone.0239406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2020] [Accepted: 09/05/2020] [Indexed: 12/02/2022] Open
Abstract
Clustering is an important data mining technique that plays a vital role in bioscience, social network and network analysis. As a clustering algorithm based on density and distance, density peak clustering is extensively used to solve practical problems. The algorithm assumes that the clustering center has a larger local density and is farther away from the higher density points. However, the density peak clustering algorithm is highly sensitive to density and distance and cannot accurately identify clusters in a dataset having significant differences in cluster structure. In addition, the density peak clustering algorithm's allocation strategy can easily cause attached allocation errors in data point allocation. To solve these problems, this study proposes a potential-field-diffusion-based density peak clustering. Compared with existing clustering algorithms, the advantages of the potential-field-diffusion-based density peak clustering algorithm are threefold: 1) The potential field concept is introduced in the proposed algorithm, and a density measure based on the potential field's diffusion is proposed. The cluster center can be accurately selected using this measure. 2) The potential-field-diffusion-based density peak clustering algorithm defines the judgment conditions of similar points and adopts different allocation strategies for dissimilar points to avoid attached errors in data point allocation. 3) This study conducted many experiments on synthetic and real-world datasets. Results demonstrate that the proposed potential-field-diffusion-based density peak clustering algorithm achieves an excellent clustering effect and is suitable for complex datasets of different sizes, dimensions, and shapes. In addition, the proposed algorithm shows particularly excellent performance on variable density and nonconvex datasets.
Affiliation(s)
- Hui Zhuang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Jiancong Cui
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Taoran Liu
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Hong Wang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
  - Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Shandong Normal University, Jinan, China
|
34
|
Nikolaou V, Massaro S, Fakhimi M, Stergioulas L, Price D. COPD phenotypes and machine learning cluster analysis: A systematic review and future research agenda. Respir Med 2020; 171:106093. [PMID: 32745966 DOI: 10.1016/j.rmed.2020.106093] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 04/25/2020] [Revised: 07/19/2020] [Accepted: 07/21/2020] [Indexed: 12/21/2022]
Abstract
Chronic Obstructive Pulmonary Disease (COPD) is a highly heterogeneous condition projected to become the third leading cause of death worldwide by 2030. To better characterize this condition, clinicians have classified patients sharing certain symptomatic characteristics, such as symptom intensity and history of exacerbations, into distinct phenotypes. In recent years, the growing use of machine learning algorithms, and cluster analysis in particular, has promised to advance this classification through the integration of additional patient characteristics, including comorbidities, biomarkers, and genomic information. This combination would allow researchers to more reliably identify new COPD phenotypes, as well as better characterize existing ones, with the aim of improving diagnosis and developing novel treatments. Here, we systematically review the last decade of research progress, which uses cluster analysis to identify COPD phenotypes. Collectively, we provide a systematized account of the extant evidence, describe the strengths and weaknesses of the main methods used, identify gaps in the literature, and suggest recommendations for future research.
Affiliation(s)
- Vasilis Nikolaou
  - Surrey Business School, University of Surrey, Guildford, GU2 7HX, UK
- Sebastiano Massaro
  - Surrey Business School, University of Surrey, Guildford, GU2 7HX, UK; The Organizational Neuroscience Laboratory, London, WC1N 3AX, UK
- Masoud Fakhimi
  - Surrey Business School, University of Surrey, Guildford, GU2 7HX, UK
- David Price
  - Observational and Pragmatic Research Institute, Singapore, Singapore; Centre of Academic Primary Care, Division of Applied Health Sciences, University of Aberdeen, Aberdeen, UK
|
35
|
Coombes CE, Abrams ZB, Li S, Abruzzo LV, Coombes KR. Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia. J Am Med Inform Assoc 2020; 27:1019-1027. [PMID: 32483590 PMCID: PMC7647286 DOI: 10.1093/jamia/ocaa060] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/25/2020] [Revised: 04/08/2020] [Accepted: 04/24/2020] [Indexed: 12/22/2022] Open
Abstract
OBJECTIVE Unsupervised machine learning approaches hold promise for large-scale clinical data. However, the heterogeneity of clinical data raises new methodological challenges in feature selection, choosing a distance metric that captures biological meaning, and visualization. We hypothesized that clustering could discover prognostic groups from patients with chronic lymphocytic leukemia, a disease that provides biological validation through well-understood outcomes. METHODS To address this challenge, we applied k-medoids clustering with 10 distance metrics to 2 experiments ("A" and "B") with mixed clinical features collapsed to binary vectors and visualized with both multidimensional scaling and t-distributed stochastic neighbor embedding. To assess prognostic utility, we performed survival analysis using a Cox proportional hazards model, log-rank test, and Kaplan-Meier curves. RESULTS In both experiments, survival analysis revealed a statistically significant association between clusters and survival outcomes (A: overall survival, P = .0164; B: time from diagnosis to treatment, P = .0039). Multidimensional scaling separated clusters along a gradient mirroring the order of overall survival. Longer survival was associated with mutated immunoglobulin heavy-chain variable region gene (IGHV) status, absent ZAP70 expression, female sex, and younger age. CONCLUSIONS This approach to mixed-type data handling and selection of distance metric captured well-understood, binary, prognostic markers in chronic lymphocytic leukemia (sex, IGHV mutation status, ZAP70 expression status) with high fidelity.
MESH Headings
- Adult
- Aged
- Aged, 80 and over
- Female
- Humans
- Immunoglobulin Heavy Chains/genetics
- Kaplan-Meier Estimate
- Leukemia, Lymphocytic, Chronic, B-Cell/immunology
- Leukemia, Lymphocytic, Chronic, B-Cell/metabolism
- Leukemia, Lymphocytic, Chronic, B-Cell/mortality
- Male
- Middle Aged
- Mutation
- Prognosis
- Proportional Hazards Models
- Unsupervised Machine Learning
- ZAP-70 Protein-Tyrosine Kinase/metabolism
Affiliation(s)
- Caitlin E Coombes
  - The Ohio State University College of Medicine, Columbus, Ohio, USA
  - Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
- Zachary B Abrams
  - Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
- Suli Li
  - Department of Statistics and Data Science, Cornell University, Ithaca, New York, USA
- Lynne V Abruzzo
  - Department of Pathology, The Ohio State University, Columbus, Ohio, USA
- Kevin R Coombes
  - Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
|
36
|
Carrillo-Larco RM, Castillo-Cara M. Using country-level variables to classify countries according to the number of confirmed COVID-19 cases: An unsupervised machine learning approach. Wellcome Open Res 2020; 5:56. [PMID: 32587900 PMCID: PMC7308996 DOI: 10.12688/wellcomeopenres.15819.3] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Accepted: 06/11/2020] [Indexed: 12/13/2022] Open
Abstract
Background: The COVID-19 pandemic has attracted the attention of researchers and clinicians who have provided evidence about risk factors and clinical outcomes. Research on the COVID-19 pandemic benefiting from open-access data and machine learning algorithms is still scarce yet can produce relevant and pragmatic information. With country-level pre-COVID-19-pandemic variables, we aimed to cluster countries in groups with shared profiles of the COVID-19 pandemic. Methods: Unsupervised machine learning algorithms (k-means) were used to define data-driven clusters of countries; the algorithm was informed by disease prevalence estimates, metrics of air pollution, socio-economic status and health system coverage. Using the one-way ANOVA test, we compared the clusters in terms of number of confirmed COVID-19 cases, number of deaths, case fatality rate and order in which the country reported the first case. Results: The model to define the clusters was developed with 155 countries. The model with three principal component analysis parameters and five or six clusters showed the best ability to group countries in relevant sets. There was strong evidence that the model with five or six clusters could stratify countries according to the number of confirmed COVID-19 cases (p<0.001). However, the model could not stratify countries in terms of number of deaths or case fatality rate. Conclusions: A simple data-driven approach using global information available before the COVID-19 pandemic seemed able to classify countries in terms of the number of confirmed COVID-19 cases. The model was not able to stratify countries based on COVID-19 mortality data.
Affiliation(s)
- Rodrigo M. Carrillo-Larco
  - Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
  - CRONICAS Centre of Excellence in Chronic Diseases, Universidad Peruana Cayetano Heredia, Lima, Peru
  - Universidad Católica de Trullijo, Instituto de Investigación, Chimbote, Peru
- Manuel Castillo-Cara
  - Center of Information and Communication Technologies, Universidad Nacional de Ingeniería, Lima, Peru
|
37
|
Carrillo-Larco RM, Castillo-Cara M. Using country-level variables to classify countries according to the number of confirmed COVID-19 cases: An unsupervised machine learning approach. Wellcome Open Res 2020; 5:56. [PMID: 32587900 PMCID: PMC7308996 DOI: 10.12688/wellcomeopenres.15819.2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Accepted: 06/01/2020] [Indexed: 11/04/2023] Open
Abstract
Background: The COVID-19 pandemic has attracted the attention of researchers and clinicians who have provided evidence about risk factors and clinical outcomes. Research on the COVID-19 pandemic benefiting from open-access data and machine learning algorithms is still scarce yet can produce relevant and pragmatic information. With country-level pre-COVID-19-pandemic variables, we aimed to cluster countries in groups with shared profiles of the COVID-19 pandemic. Methods: Unsupervised machine learning algorithms (k-means) were used to define data-driven clusters of countries; the algorithm was informed by disease prevalence estimates, metrics of air pollution, socio-economic status and health system coverage. Using the one-way ANOVA test, we compared the clusters in terms of number of confirmed COVID-19 cases, number of deaths, case fatality rate and order in which the country reported the first case. Results: The model to define the clusters was developed with 155 countries. The model with three principal component analysis parameters and five or six clusters showed the best ability to group countries in relevant sets. There was strong evidence that the model with five or six clusters could stratify countries according to the number of confirmed COVID-19 cases (p<0.001). However, the model could not stratify countries in terms of number of deaths or case fatality rate. Conclusions: A simple data-driven approach using global information available before the COVID-19 pandemic seemed able to classify countries in terms of the number of confirmed COVID-19 cases. The model was not able to stratify countries based on COVID-19 mortality data.
Affiliation(s)
- Rodrigo M. Carrillo-Larco
  - Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
  - CRONICAS Centre of Excellence in Chronic Diseases, Universidad Peruana Cayetano Heredia, Lima, Peru
  - Universidad Católica de Trullijo, Instituto de Investigación, Chimbote, Peru
- Manuel Castillo-Cara
  - Center of Information and Communication Technologies, Universidad Nacional de Ingeniería, Lima, Peru
|
38
|
Banerjee A, Pasea L, Harris S, Gonzalez-Izquierdo A, Torralbo A, Shallcross L, Noursadeghi M, Pillay D, Sebire N, Holmes C, Pagel C, Wong WK, Langenberg C, Williams B, Denaxas S, Hemingway H. Estimating excess 1-year mortality associated with the COVID-19 pandemic according to underlying conditions and age: a population-based cohort study. Lancet 2020; 395:1715-1725. [PMID: 32405103 PMCID: PMC7217641 DOI: 10.1016/s0140-6736(20)30854-0] [Citation(s) in RCA: 312] [Impact Index Per Article: 78.0] [Received: 03/23/2020] [Revised: 04/02/2020] [Accepted: 04/06/2020] [Indexed: 01/19/2023]
Abstract
BACKGROUND The medical, societal, and economic impact of the coronavirus disease 2019 (COVID-19) pandemic has unknown effects on overall population mortality. Previous models of population mortality are based on death over days among infected people, nearly all of whom thus far have underlying conditions. Models have not incorporated information on high-risk conditions or their longer-term baseline (pre-COVID-19) mortality. We estimated the excess number of deaths over 1 year under different COVID-19 incidence scenarios based on varying levels of transmission suppression and differing mortality impacts based on different relative risks for the disease. METHODS In this population-based cohort study, we used linked primary and secondary care electronic health records from England (Health Data Research UK-CALIBER). We report prevalence of underlying conditions defined by Public Health England guidelines (from March 16, 2020) in individuals aged 30 years or older registered with a practice between 1997 and 2017, using validated, openly available phenotypes for each condition. We estimated 1-year mortality in each condition, developing simple models (and a tool for calculation) of excess COVID-19-related deaths, assuming relative impact (as relative risks [RRs]) of the COVID-19 pandemic (compared with background mortality) of 1·5, 2·0, and 3·0 at differing infection rate scenarios, including full suppression (0·001%), partial suppression (1%), mitigation (10%), and do nothing (80%). We also developed an online, public, prototype risk calculator for excess death estimation. FINDINGS We included 3 862 012 individuals (1 957 935 [50·7%] women and 1 904 077 [49·3%] men). We estimated that more than 20% of the study population are in the high-risk category, of whom 13·7% were older than 70 years and 6·3% were aged 70 years or younger with at least one underlying condition. 1-year mortality in the high-risk population was estimated to be 4·46% (95% CI 4·41-4·51). 
Age and underlying conditions combined to influence background risk, varying markedly across conditions. In a full suppression scenario in the UK population, we estimated that there would be two excess deaths (vs baseline deaths) with an RR of 1·5, four with an RR of 2·0, and seven with an RR of 3·0. In a mitigation scenario, we estimated 18 374 excess deaths with an RR of 1·5, 36 749 with an RR of 2·0, and 73 498 with an RR of 3·0. In a do nothing scenario, we estimated 146 996 excess deaths with an RR of 1·5, 293 991 with an RR of 2·0, and 587 982 with an RR of 3·0. INTERPRETATION We provide policy makers, researchers, and the public a simple model and an online tool for understanding excess mortality over 1 year from the COVID-19 pandemic, based on age, sex, and underlying condition-specific estimates. These results signal the need for sustained stringent suppression measures as well as sustained efforts to target those at highest risk because of underlying conditions with a range of preventive interventions. Countries should assess the overall (direct and indirect) effects of the pandemic on excess mortality. FUNDING National Institute for Health Research University College London Hospitals Biomedical Research Centre, Health Data Research UK.
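The scenario figures above are consistent with a model in which excess deaths scale linearly with the infection rate and with (RR − 1). A minimal sketch of that arithmetic follows, assuming the form excess = baseline deaths × infection rate × (RR − 1). The function name is hypothetical, and the baseline 1-year death count is back-calculated here from the reported mitigation figure at an RR of 1·5 rather than taken from the paper.

```python
def excess_deaths(baseline_deaths, infection_rate, relative_risk):
    """Excess 1-year deaths, assuming the infected share of the population
    has its baseline mortality multiplied by relative_risk (hypothetical helper)."""
    return baseline_deaths * infection_rate * (relative_risk - 1.0)

# Assumed baseline: back-calculated from the abstract's mitigation scenario
# (18 374 excess deaths at a 10% infection rate and RR 1.5); not a published figure.
BASELINE_DEATHS = 18374 / (0.10 * 0.5)

# Infection-rate scenarios as given in the abstract.
SCENARIOS = {
    "full suppression": 0.00001,  # 0.001%
    "mitigation": 0.10,
    "do nothing": 0.80,
}

for name, rate in SCENARIOS.items():
    for rr in (1.5, 2.0, 3.0):
        print(f"{name:>16}  RR {rr}: {excess_deaths(BASELINE_DEATHS, rate, rr):,.0f}")
```

Under these assumptions the sketch approximately reproduces the other reported scenario figures, differing by a few deaths because the back-calculated baseline starts from a rounded number.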
Affiliation(s)
- Amitava Banerjee
  - Institute of Health Informatics, University College London, London, UK; University College London Hospitals NHS Trust, London, UK; Barts Health NHS Trust, The Royal London Hospital, London, UK
- Laura Pasea
  - Institute of Health Informatics, University College London, London, UK
- Steve Harris
  - University College London Hospitals NHS Trust, London, UK
- Ana Torralbo
  - Institute of Health Informatics, University College London, London, UK
- Laura Shallcross
  - Institute of Health Informatics, University College London, London, UK
- Mahdad Noursadeghi
  - Division of Infection and Immunity, University College London, London, UK
- Deenan Pillay
  - Division of Infection and Immunity, University College London, London, UK
- Chris Holmes
  - University of Oxford, Oxford, UK; Alan Turing Institute, London, UK
- Christina Pagel
  - Clinical Operational Research Unit, University College London, London, UK
- Wai Keong Wong
  - University College London Hospitals NHS Trust, London, UK
- Bryan Williams
  - Institute of Cardiovascular Science, University College London, London, UK; University College London Hospitals NHS Trust, London, UK; University College London Hospitals National Institute for Health Research Biomedical Research Centre, London, UK
- Spiros Denaxas
  - Institute of Health Informatics, University College London, London, UK; Alan Turing Institute, London, UK; Health Data Research UK, London, UK
- Harry Hemingway
  - Institute of Health Informatics, University College London, London, UK; Health Data Research UK, London, UK
|
39
|
Horne E, Tibble H, Sheikh A, Tsanas A. Challenges of Clustering Multimodal Clinical Data: Review of Applications in Asthma Subtyping. JMIR Med Inform 2020; 8:e16452. [PMID: 32463370 PMCID: PMC7290450 DOI: 10.2196/16452] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/30/2019] [Revised: 12/10/2019] [Accepted: 02/10/2020] [Indexed: 12/27/2022] Open
Abstract
Background In the current era of personalized medicine, there is increasing interest in understanding the heterogeneity in disease populations. Cluster analysis is a method commonly used to identify subtypes in heterogeneous disease populations. The clinical data used in such applications are typically multimodal, which can make the application of traditional cluster analysis methods challenging. Objective This study aimed to review the research literature on the application of clustering multimodal clinical data to identify asthma subtypes. We assessed common problems and shortcomings in the application of cluster analysis methods in determining asthma subtypes, such that they can be brought to the attention of the research community and avoided in future studies. Methods We searched PubMed and Scopus bibliographic databases with terms related to cluster analysis and asthma to identify studies that applied dissimilarity-based cluster analysis methods. We recorded the analytic methods used in each study at each step of the cluster analysis process. Results Our literature search identified 63 studies that applied cluster analysis to multimodal clinical data to identify asthma subtypes. The features fed into the cluster algorithms were of a mixed type in 47 (75%) studies and continuous in 12 (19%), and the feature type was unclear in the remaining 4 (6%) studies. A total of 23 (37%) studies used hierarchical clustering with Ward linkage, and 22 (35%) studies used k-means clustering. Of these 45 studies, 39 had mixed-type features, but only 5 specified dissimilarity measures that could handle mixed-type features. A further 9 (14%) studies used a preclustering step to create small clusters to feed into a hierarchical method. The original sample sizes in these 9 studies ranged from 84 to 349.
The remaining studies used hierarchical clustering with other linkages (n=3), medoid-based methods (n=3), spectral clustering (n=1), and multiple kernel k-means clustering (n=1), and in 1 study, the methods were unclear. Of 63 studies, 54 (86%) explained the methods used to determine the number of clusters, 24 (38%) studies tested the quality of their cluster solution, and 11 (17%) studies tested the stability of their solution. Reporting of the cluster analysis was generally poor in terms of the methods employed and their justification. Conclusions This review highlights common issues in the application of cluster analysis to multimodal clinical data to identify asthma subtypes. Some of these issues were related to the multimodal nature of the data, but many were more general issues in the application of cluster analysis. Although cluster analysis may be a useful tool for investigating disease subtypes, we recommend that future studies carefully consider the implications of clustering multimodal data, the cluster analysis process itself, and the reporting of methods to facilitate replication and interpretation of findings.
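To make concrete what a dissimilarity measure that handles mixed-type features looks like, the sketch below implements the Gower dissimilarity, one well-known example. The abstract does not name the measures the reviewed studies used, so this choice, the function name, and the two patient records are assumptions made for illustration only.

```python
def gower_dissimilarity(x, y, numeric_ranges):
    """Gower dissimilarity between two mixed-type records.

    numeric_ranges maps a feature index to that feature's observed range
    (max - min) in the dataset; every other feature is treated as categorical.
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in numeric_ranges:
            feature_range = numeric_ranges[i]
            # Numeric feature: absolute difference scaled into [0, 1].
            total += abs(a - b) / feature_range if feature_range > 0 else 0.0
        else:
            # Categorical feature: 0 on a match, 1 on a mismatch.
            total += 0.0 if a == b else 1.0
    return total / len(x)

# Hypothetical records: (age in years, FEV1 % predicted, sex, atopic status).
ranges = {0: 50.0, 1: 60.0}  # assumed dataset-wide ranges for age and FEV1
patient_a = (30, 90.0, "F", True)
patient_b = (55, 60.0, "M", True)
print(gower_dissimilarity(patient_a, patient_b, ranges))
```

Because every feature contributes a value between 0 and 1, continuous and categorical variables are weighted equally, which Euclidean distance on raw mixed-type data does not guarantee.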
Collapse
Affiliation(s)
- Elsie Horne
- Usher Institute, Edinburgh Medical School, University of Edinburgh, Edinburgh, United Kingdom
- Holly Tibble
- Usher Institute, Edinburgh Medical School, University of Edinburgh, Edinburgh, United Kingdom
- Aziz Sheikh
- Usher Institute, Edinburgh Medical School, University of Edinburgh, Edinburgh, United Kingdom
- Athanasios Tsanas
- Usher Institute, Edinburgh Medical School, University of Edinburgh, Edinburgh, United Kingdom
|
40
|
Loughran KJ, Atkinson G, Beauchamp MK, Dixon J, Martin D, Rahim S, Harrison SL. Balance impairment in individuals with COPD: a systematic review with meta-analysis. Thorax 2020; 75:539-546. [PMID: 32409612 DOI: 10.1136/thoraxjnl-2019-213608] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 05/21/2019] [Revised: 01/29/2020] [Accepted: 03/06/2020] [Indexed: 11/04/2022]
Abstract
BACKGROUND People with chronic obstructive pulmonary disease (COPD) are four times more likely to fall than healthy peers, leading to increased morbidity and mortality. Poor balance is a major risk factor for falls. This review aims to quantify the extent of balance impairment in COPD, and establish contributing clinical factors, which at present are sparse. METHODS Five electronic databases were searched in July 2017, and updated searches were performed in March 2019, for studies comparing balance in COPD with healthy controls. Meta-analyses were conducted on sample mean differences (MD) and reported correlations between balance and clinical factors. Meta-regression was used to quantify the association between mean difference in percentage predicted forced expiratory volume in 1 s (FEV1) and mean balance impairment. Narrative summaries were provided where data were insufficient for meta-analysis. RESULTS Twenty-three studies were included (n=2751). Meta-analysis indicated COPD patients performed worse than healthy controls on timed up and go (MD=2.77 s, 95% CI 1.46 s to 4.089 s, p<0.005), single leg stance (MD=-11.75 s, 95% CI -15.12 s to -8.38 s, p<0.005) and Berg Balance Scale (MD=-6.66, 95% CI -8.95 to -4.37, p<0.005). The pooled correlation coefficient between balance and reduced quadriceps strength was weak-moderate (r=0.37, 95% CI 0.23 to 0.45, p<0.005). The relationship between differences in percentage predicted FEV1 and balance was negligible (r2<0.04). CONCLUSIONS Compared with healthy controls, people with COPD have a clinically meaningful balance reduction, which may be related to reduced muscle strength, physical activity and exercise capacity. Our findings support a need to expand the focus of pulmonary rehabilitation to include balance assessment and training, and further exploration of balance impairment in COPD. PROSPERO registration number CRD4201769041.
Affiliation(s)
- Greg Atkinson
- School of Health and Life Sciences, Teesside University, Middlesbrough, UK
- Marla K Beauchamp
- School of Rehabilitation Science, McMaster University Faculty of Health Sciences, Hamilton, Ontario, Canada
- John Dixon
- School of Health and Life Sciences, Teesside University, Middlesbrough, UK
- Denis Martin
- School of Health and Life Sciences, Teesside University, Middlesbrough, UK
- Shaera Rahim
- School of Rehabilitation Science, McMaster University Faculty of Health Sciences, Hamilton, Ontario, Canada
|
41
|
Wu JJ, Xu HR, Zhang YX, Li YX, Yu HY, Jiang LD, Wang CX, Han M. The characteristics of the frequent exacerbator with chronic bronchitis phenotype and non-exacerbator phenotype in patients with chronic obstructive pulmonary disease: a meta-analysis and system review. BMC Pulm Med 2020; 20:103. [PMID: 32326924 PMCID: PMC7181594 DOI: 10.1186/s12890-020-1126-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2019] [Accepted: 03/27/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Chronic obstructive pulmonary disease (COPD) patients with different phenotypes show different clinical characteristics. Therefore, we conducted a meta-analysis to explore the clinical characteristics between the non-exacerbator (NE) phenotype and the frequent exacerbator with chronic bronchitis (FE-CB) phenotype among patients with COPD. METHODS CNKI, Wanfang, Chongqing VIP, China Biology Medicine disc, PubMed, Cochrane Library, and EMBASE databases were searched from the times of their inception to April 30, 2019. All studies that reported the clinical characteristics of the COPD phenotypes and which met the inclusion criteria were included. The quality assessment was analyzed by Cross-Sectional/Prevalence Study Quality recommendations. The meta-analysis was carried out using RevMan5.3. RESULTS Ten cross-sectional observation studies (n = 8848) were included. Compared with the NE phenotype, patients with the FE-CB phenotype showed significantly lower forced expiratory volume in 1 s percent predicted (FEV1%pred) (mean difference (MD) -8.50, 95% confidence interval (CI) -11.36 to -5.65, P < 0.001, I2 = 91%), forced vital capacity percent predicted (FVC%pred) (MD -6.69, 95% CI -7.73 to -5.65, P < 0.001, I2 = 5%), and forced expiratory volume in 1 s/forced vital capacity (FEV1/FVC) (MD -3.76, 95% CI -4.58 to -2.95, P < 0.001, I2 = 0%); in contrast, Charlson comorbidity index (MD 0.47, 95% CI 0.37 to 0.58, P < 0.001, I2 = 0%), COPD assessment test (CAT) score (MD 5.61, 95% CI 4.62 to 6.60, P < 0.001, I2 = 80%), the quantity of cigarettes smoked (pack-years) (MD 3.09, 95% CI 1.60 to 4.58, P < 0.001, I2 = 41%), exacerbations in previous year (MD 2.65, 95% CI 2.32 to 2.97, P < 0.001, I2 = 91%), modified Medical Research Council (mMRC) score (MD 0.72, 95% CI 0.63 to 0.82, P < 0.001, I2 = 57%), and body mass index (BMI), obstruction, dyspnea, exacerbations (BODEx) (MD 1.78, 95% CI 1.28 to 2.28, P < 0.001, I2 = 91%), I2 = 34%) were significantly higher in patients with the FE-CB phenotype. No significant between-group difference was observed with respect to BMI (MD -0.14, 95% CI -0.70 to 0.42, P = 0.62, I2 = 75%). CONCLUSION COPD patients with the FE-CB phenotype had worse pulmonary function and higher CAT and mMRC scores, frequency of acute exacerbations, and quantity of cigarettes smoked (pack-years) than those with the NE phenotype.
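For readers unfamiliar with how a pooled mean difference and the I2 heterogeneity statistic are derived, the following is a minimal sketch of inverse-variance fixed-effect pooling with Cochran's Q. It is not the RevMan implementation and may not match the model the authors selected; the function name `pool_fixed_effect` and the three study estimates are hypothetical.

```python
from typing import Sequence, Tuple


def pool_fixed_effect(
    estimates: Sequence[float], std_errors: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (pooled estimate, Cochran's Q, I2 in percent)."""
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    # I2 = 100% * (Q - df) / Q, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical mean differences in FEV1 % predicted from three studies,
# each with a standard error of 1.0 (illustrative values only).
md = [-10.0, -6.0, -8.0]
se = [1.0, 1.0, 1.0]

pooled_md, q_stat, i2_pct = pool_fixed_effect(md, se)
print(pooled_md, q_stat, i2_pct)
```

With these made-up inputs the pooled MD is -8.0, Q is 8.0 on 2 degrees of freedom, and I2 is 75%, meaning that roughly three quarters of the observed variation would be attributed to between-study heterogeneity rather than chance.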
Affiliation(s)
- Jian-Jun Wu
- The Third Affiliated Hospital of Beijing University of Chinese Medicine, No. 51, Xiaoguan Street outside Anding Men, Chaoyang, Beijing, 100029, People's Republic of China
- Hong-Ri Xu
- The Third Affiliated Hospital of Beijing University of Chinese Medicine, No. 51, Xiaoguan Street outside Anding Men, Chaoyang, Beijing, 100029, People's Republic of China
- Ying-Xue Zhang
- The Third Affiliated Hospital of Beijing University of Chinese Medicine, No. 51, Xiaoguan Street outside Anding Men, Chaoyang, Beijing, 100029, People's Republic of China
- Yi-Xuan Li
- The Third Affiliated Hospital of Beijing University of Chinese Medicine, No. 51, Xiaoguan Street outside Anding Men, Chaoyang, Beijing, 100029, People's Republic of China
- Hui-Yong Yu
- The Third Affiliated Hospital of Beijing University of Chinese Medicine, No. 51, Xiaoguan Street outside Anding Men, Chaoyang, Beijing, 100029, People's Republic of China
- Liang-Duo Jiang
- Dongzhimen Hospital, Beijing University of Chinese Medicine, No.5 Haiyuncang, Dongcheng District, Beijing, 100700, People's Republic of China
- Cheng-Xiang Wang
- The Third Affiliated Hospital of Beijing University of Chinese Medicine, No. 51, Xiaoguan Street outside Anding Men, Chaoyang, Beijing, 100029, People's Republic of China
- Mei Han
- Centre for Evidence-Based Chinese Medicine, Beijing University of Chinese Medicine, 11 East Road North 3rd Ring Road, Beijing, 100029, People's Republic of China
|
42
|
Carrillo-Larco RM, Castillo-Cara M. Using country-level variables to classify countries according to the number of confirmed COVID-19 cases: An unsupervised machine learning approach. Wellcome Open Res 2020; 5:56. [PMID: 32587900 PMCID: PMC7308996 DOI: 10.12688/wellcomeopenres.15819.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Accepted: 03/25/2020] [Indexed: 11/04/2023] Open
Abstract
Background: The COVID-19 pandemic has attracted the attention of researchers and clinicians who have provided evidence about risk factors and clinical outcomes. Research on the COVID-19 pandemic benefiting from open-access data and machine learning algorithms is still scarce yet can produce relevant and pragmatic information. With country-level pre-COVID-19-pandemic variables, we aimed to cluster countries in groups with shared profiles of the COVID-19 pandemic. Methods: Unsupervised machine learning algorithms (k-means) were used to define data-driven clusters of countries; the algorithm was informed by disease prevalence estimates, metrics of air pollution, socio-economic status and health system coverage. Using the one-way ANOVA test, we compared the clusters in terms of number of confirmed COVID-19 cases, number of deaths, case fatality rate and order in which the country reported the first case. Results: The model to define the clusters was developed with 155 countries. The model with three principal component analysis parameters and five or six clusters showed the best ability to group countries in relevant sets. There was strong evidence that the model with five or six clusters could stratify countries according to the number of confirmed COVID-19 cases (p<0.001). However, the model could not stratify countries in terms of number of deaths or case fatality rate. Conclusions: A simple data-driven approach using available global information before the COVID-19 pandemic seemed able to classify countries in terms of the number of confirmed COVID-19 cases. The model was not able to stratify countries based on COVID-19 mortality data.
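The k-means step named in the Methods can be sketched as the assignment and update loop below. This is a plain-Python illustration of Lloyd's algorithm only: the principal component step is omitted, the starting centroids are fixed by hand rather than chosen by any initialisation scheme, and the function name `kmeans` and the two-dimensional points are hypothetical rather than taken from the study.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_centroid(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(
    points: Sequence[Point], centroids: Sequence[Point], iterations: int = 10
) -> Tuple[List[Point], List[int]]:
    """Run a fixed number of Lloyd iterations from the given starting centroids."""
    current = list(centroids)
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in current]
        for p in points:
            clusters[nearest_centroid(p, current)].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        current = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members
            else current[j]
            for j, members in enumerate(clusters)
        ]
    labels = [nearest_centroid(p, current) for p in points]
    return current, labels


# Hypothetical country-level scores on two components (illustrative only).
countries = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
start = [(0.0, 0.0), (10.0, 10.0)]

final_centroids, cluster_labels = kmeans(countries, start)
print(final_centroids, cluster_labels)
```

In practice the number of clusters and the starting centroids both affect the solution, which is why the abstract reports comparing models with different cluster counts.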
Affiliation(s)
- Rodrigo M. Carrillo-Larco
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
- CRONICAS Centre of Excellence in Chronic Diseases, Universidad Peruana Cayetano Heredia, Lima, Peru
- Universidad Católica de Trujillo, Instituto de Investigación, Chimbote, Peru
- Manuel Castillo-Cara
- Center of Information and Communication Technologies, Universidad Nacional de Ingeniería, Lima, Peru
|
43
|
Aldibbiat AM, Al-Sharefi A. Do Benefits Outweigh Risks for Corticosteroid Therapy in Acute Exacerbation of Chronic Obstructive Pulmonary Disease in People with Diabetes Mellitus? Int J Chron Obstruct Pulmon Dis 2020; 15:567-574. [PMID: 32214806 PMCID: PMC7084124 DOI: 10.2147/copd.s236305] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/28/2019] [Accepted: 02/21/2020] [Indexed: 12/22/2022] Open
Abstract
Chronic obstructive pulmonary disease (COPD) and diabetes mellitus (DM) are chronic health conditions with significant impacts on quality and extent of life. People with COPD and DM appear to have worse outcomes in each of the comorbid conditions. Treatment with corticosteroids in acute exacerbation of COPD (AECOPD) has been shown to reduce treatment failure and exacerbation relapse, and to shorten length of hospital stay, but not to affect the inexorable gradual worsening of lung function. Treatment with corticosteroids can lead to a wide spectrum of side effects and complications, including worsening hyperglycemia and deterioration of diabetes control in those with pre-existing DM. The relationship between COPD and DM is rather complex and accumulating evidence indicates a distinct phenotype of the comorbid state. Several randomized controlled trials on corticosteroid treatment in AECOPD excluded people with DM or did not report on outcomes in this subgroup. As such, the perceived benefits of corticosteroids in AECOPD in people with DM have not been validated. In people with COPD and DM, the detrimental side effects of corticosteroids are guaranteed, while the benefits are not confirmed and only presumed based on extrapolation from the general COPD population. Therefore, the potential for harm when prescribing corticosteroids for AECOPD in people with DM cannot be excluded.
Affiliation(s)
- Ali M Aldibbiat
- Dasman Diabetes Institute, Kuwait City, Kuwait
- Institute of Cellular Medicine, Newcastle University, Newcastle upon Tyne, UK
- Ahmed Al-Sharefi
- Metabolic and Diabetes Unit, Sunderland Royal Hospital, South Tyneside and Sunderland NHS Foundation Trust, Sunderland, UK
|
44
|
Ahn GY, Lee J, Won S, Ha E, Kim H, Nam B, Kim JS, Kang J, Kim JH, Song GG, Kim K, Bae SC. Identifying damage clusters in patients with systemic lupus erythematosus. Int J Rheum Dis 2019; 23:84-91. [PMID: 31762221 DOI: 10.1111/1756-185x.13745] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/14/2019] [Revised: 09/19/2019] [Accepted: 10/17/2019] [Indexed: 12/11/2022]
Abstract
AIM Systemic lupus erythematosus (SLE) causes irreversible damage to organ systems. Recently, evidence has been obtained for subphenotypes of SLE. This study aimed to identify damage clusters and compare the associated clinical manifestations, SLE disease activity, mortality, and genetic risk scores (GRS). METHODS The study was conducted on the Hanyang BAE lupus cohort. Patients with disease duration <5 years were excluded to minimize confounding effects of disease duration. They were grouped into 3 clusters based on the Systemic Lupus International Collaborating Clinics Damage Index using k-means cluster analysis. RESULTS Among the 1130 analyzed patients, musculoskeletal damage was most prevalent (20.2%), followed by ocular (11.4%), renal (10.5%), and neuropsychiatric damage (10.2%). Three significantly different damage clusters were identified. Patients in cluster 1 (n = 824) showed the least damage. Cluster 2 (n = 195) was characterized by frequent renal (55.4%) and ocular (58.0%) damage, and cluster 3 (n = 111) was dominated by neuropsychiatric (100%) and musculoskeletal damage (35.1%). Cluster 2 had the highest adjusted mean SLE Disease Activity Index score (AMS; mean ± SD: 5.4 ± 2.9), while cluster 3 had the highest mortality (14.4%). Weighted GRS did not differ significantly between the clusters. CONCLUSION Patients in the cluster with prevalent renal and ocular damage had the highest AMS, while the cluster with frequent neuropsychiatric damage had the highest mortality.
Affiliation(s)
- Ga Young Ahn
- Department of Rheumatology, Hanyang University Hospital for Rheumatic Diseases, Seoul, Korea; Division of Rheumatology, Department of Internal Medicine, Korea University Guro Hospital, Seoul, Korea
- Jiyoung Lee
- Clinical Research Center for Rheumatoid Arthritis (CRCRA), Seoul, Korea
- Soyoung Won
- Clinical Research Center for Rheumatoid Arthritis (CRCRA), Seoul, Korea
- Eunji Ha
- Department of Biology, Kyung Hee University, Seoul, Korea
- Hyoungyoung Kim
- Department of Rheumatology, Hanyang University Hospital for Rheumatic Diseases, Seoul, Korea
- Bora Nam
- Department of Rheumatology, Hanyang University Hospital for Rheumatic Diseases, Seoul, Korea
- Ji Soong Kim
- Department of Rheumatology, Hanyang University Hospital for Rheumatic Diseases, Seoul, Korea
- Juyeon Kang
- Department of Rheumatology, Hanyang University Hospital for Rheumatic Diseases, Seoul, Korea
- Jae-Hoon Kim
- Division of Rheumatology, Department of Internal Medicine, Korea University Guro Hospital, Seoul, Korea
- Gwan Gyu Song
- Division of Rheumatology, Department of Internal Medicine, Korea University Guro Hospital, Seoul, Korea
- Kwangwoo Kim
- Department of Biology, Kyung Hee University, Seoul, Korea
- Sang-Cheol Bae
- Department of Rheumatology, Hanyang University Hospital for Rheumatic Diseases, Seoul, Korea; Clinical Research Center for Rheumatoid Arthritis (CRCRA), Seoul, Korea
|
45
|
Sánchez-Rico M, Alvarado JM. A Machine Learning Approach for Studying the Comorbidities of Complex Diagnoses. Behav Sci (Basel) 2019; 9:E122. [PMID: 31766665 PMCID: PMC6960661 DOI: 10.3390/bs9120122] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 10/14/2019] [Revised: 11/16/2019] [Accepted: 11/20/2019] [Indexed: 02/08/2023] Open
Abstract
The study of diagnostic associations entails a large number of methodological problems regarding the application of machine learning algorithms, collinearity and wide variability being some of the most prominent ones. To overcome these, we proposed and tested the use of uniform manifold approximation and projection (UMAP), a very recent, popular dimensionality reduction technique. We showed its effectiveness by using it on a large Spanish clinical database of patients diagnosed with depression, to whom we applied UMAP before grouping them using a hierarchical agglomerative cluster analysis. By extensively studying its behavior and results, and validating them with purely unsupervised metrics, we show that they are consistent with well-known relationships, which validates the applicability of UMAP to advance the study of comorbidities.
Affiliation(s)
- Marina Sánchez-Rico
- Department of Psychobiology & Behavioral Sciences Methods, Faculty of Psychology, Universidad Complutense de Madrid, Campus de Somosaguas S/N, 28223 Pozuelo de Alarcon, Spain
|
46
|
Antonelli Incalzi R, Canonica GW, Scichilone N, Rizzoli S, Simoni L, Blasi F. The COPD multi-dimensional phenotype: A new classification from the STORICO Italian observational study. PLoS One 2019; 14:e0221889. [PMID: 31518364 PMCID: PMC6743765 DOI: 10.1371/journal.pone.0221889] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/23/2019] [Accepted: 08/17/2019] [Indexed: 12/03/2022] Open
Abstract
Background This paper aims to (i) develop an innovative classification of COPD, the multi-dimensional phenotype, based on a multidimensional assessment; and (ii) describe the identified multi-dimensional phenotypes. Methods An exploratory factor analysis to identify the main classificatory variables and, then, a cluster analysis based on these variables were run to classify the 514 patients diagnosed with COPD and enrolled in the STORICO study (trial registration number: NCT03105999) into multi-dimensional phenotypes. Results The circadian rhythm of symptoms and health-related quality of life, but neither comorbidity nor respiratory function, qualified as primary classificatory variables. Five multidimensional phenotypes were identified: the MILD COPD, characterized by no night-time symptoms and the best health status in terms of quality of life, quality of sleep, and level of depression and anxiety; the MILD EMPHYSEMATOUS, with prevalent dyspnea in the early morning and day-time; the SEVERE BRONCHITIC, with nocturnal and diurnal cough and phlegm; the SEVERE EMPHYSEMATOUS, with nocturnal and diurnal dyspnea; and the SEVERE MIXED COPD, distinguished by a higher frequency of symptoms during 24h, the worst quality of life and of sleep, and the highest levels of depression and anxiety. Conclusions Our results showed that properly collected respiratory symptoms play a primary role in classifying COPD patients. The longitudinal observation will disclose the discriminative and prognostic potential of the proposed multidimensional phenotype. Trial registration Trial registration number: NCT03105999, date of registration: 10th April 2017.
Affiliation(s)
- Giorgio Walter Canonica
- Personalized Medicine Asthma & Allergy Clinic Humanitas University Humanitas research Hospital, Rozzano, Milan, Italy
- Nicola Scichilone
- DIBIMIS, University of Palermo, Piazza delle Cliniche, Palermo, Italy
- Francesco Blasi
- Department of Pathophysiology and Transplantation, University of Milan, Internal Medicine Department, Respiratory Unit and Cystic Fibrosis Adult Center Fondazione IRCCS Cà Granda Maggiore Hospital, Milan, Italy
|
47
|
Impact of Disease-Specific Fears on Pulmonary Rehabilitation Trajectories in Patients with COPD. J Clin Med 2019; 8:jcm8091460. [PMID: 31540306 PMCID: PMC6780973 DOI: 10.3390/jcm8091460] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 08/23/2019] [Revised: 09/10/2019] [Accepted: 09/10/2019] [Indexed: 01/23/2023] Open
Abstract
Disease-specific fears predict health status in chronic obstructive pulmonary disease (COPD), but their role in pulmonary rehabilitation (PR) remains poorly understood, and longer-term evaluations in particular are lacking. We therefore investigated changes in disease-specific fears over the course of PR and six months after PR, and investigated associations with PR outcomes (COPD assessment test (CAT) and St. George's Respiratory Questionnaire (SGRQ)) in a subset of patients with COPD (n = 146) undergoing a 3-week inpatient PR program as part of the STAR study (Clinicaltrials.gov, ID: NCT02966561). Disease-specific fears as measured with the COPD anxiety questionnaire improved after PR. For fear of dyspnea, fear of physical activity and fear of disease progression, improvements remained significant at six-month follow-up. Patients with higher disease-specific fears at baseline showed elevated symptom burden (CAT and SGRQ Symptom scores), which persisted after PR and at follow-up. Elevated disease-specific fears also resulted in reduced improvements in Quality of Life (SGRQ activity and impact scales) after PR and at follow-up. Finally, improvement in disease-specific fears was associated with improvement in symptom burden and quality of life. Adjustment for potential confounding variables (sex, smoking status, age, lung function, and depressive symptoms) resulted in comparable effects. These findings show the role of disease-specific fears in patients with COPD during PR and highlight the need to target disease-specific fears to further improve the effects of PR.
|