Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Pikoula M, Quint JK, Nissen F, Hemingway H, Smeeth L, Denaxas S. Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records. BMC Med Inform Decis Mak 2019;19:86. [PMID: 30999919 PMCID: PMC6472089 DOI: 10.1186/s12911-019-0805-0] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 11/30/2018] [Accepted: 03/27/2019] [Indexed: 12/28/2022] Open

Cited by Other Article(s)

Thayer DS, Mumtaz S, Elmessary MA, Scanlon I, Zinnurov A, Coldea AI, Scanlon J, Chapman M, Curcin V, John A, DelPozo-Banos M, Davies H, Karwath A, Gkoutos GV, Fitzpatrick NK, Quint JK, Varma S, Milner C, Oliveira C, Parkinson H, Denaxas S, Hemingway H, Jefferson E. Creating a next-generation phenotype library: the health data research UK Phenotype Library. JAMIA Open 2024;7:ooae049. [PMID: 38895652 PMCID: PMC11182945 DOI: 10.1093/jamiaopen/ooae049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 02/12/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024] Open

Abstract

Objective

To enable reproducible research at scale by creating a platform that enables health data users to find, access, curate, and re-use electronic health record phenotyping algorithms.

Materials and Methods

We undertook a structured approach to identifying requirements for a phenotype algorithm platform by engaging with key stakeholders. User experience analysis was used to inform the design, which we implemented as a web application featuring a novel metadata standard for defining phenotyping algorithms, access via Application Programming Interface (API), support for computable data flows, and version control. The application has creation and editing functionality, enabling researchers to submit phenotypes directly.

Results

We created and launched the Phenotype Library in October 2021. The platform currently hosts 1049 phenotype definitions defined against 40 health data sources and >200K terms across 16 medical ontologies. We present several case studies demonstrating its utility for supporting and enabling research: the library hosts curated phenotype collections for the BREATHE respiratory health research hub and the Adolescent Mental Health Data Platform, and it is supporting the development of an informatics tool to generate clinical evidence for clinical guideline development groups.

Discussion

This platform makes an impact by being open to all health data users and accepting all appropriate content, as well as implementing key features that have not been widely available, including managing structured metadata, access via an API, and support for computable phenotypes.

Conclusions

We have created the first openly available, programmatically accessible resource enabling the global health research community to store and manage phenotyping algorithms. Removing barriers to describing, sharing, and computing phenotypes will help unleash the potential benefit of health data for patients and the public.

Affiliation(s)

Daniel S Thayer SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
Shahzad Mumtaz Health Informatics Centre, School of Medicine, University of Dundee, Dundee, DD1 9SY, United Kingdom School of Natural and Computing Sciences, University of Aberdeen, Aberdeen, AB24 3UE, United Kingdom
Muhammad A Elmessary SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
Ieuan Scanlon SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
Artur Zinnurov SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
Alex-Ioan Coldea SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
Jack Scanlon SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
Martin Chapman Department of Population Health Sciences, King’s College London, London, SE1 1UL, United Kingdom
Vasa Curcin Department of Population Health Sciences, King’s College London, London, SE1 1UL, United Kingdom
Ann John Adolescent Mental Health Data Platform and DATAMIND, Swansea University, Swansea, SA2 8PP, United Kingdom
Marcos DelPozo-Banos Adolescent Mental Health Data Platform and DATAMIND, Swansea University, Swansea, SA2 8PP, United Kingdom
Hannah Davies SAIL Databank, Medical School, Swansea University, Swansea, SA2 8PP, United Kingdom
Andreas Karwath Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom
Georgios V Gkoutos Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom
Natalie K Fitzpatrick Institute of Health Informatics, University College London, London, NW1 2DA, United Kingdom
Jennifer K Quint School of Public Health and National Heart and Lung Institute, Imperial College London, London, W12 0BZ, United Kingdom
Susheel Varma Health Data Research United Kingdom, London, NW1 2BE, United Kingdom
Chris Milner Health Data Research United Kingdom, London, NW1 2BE, United Kingdom
Carla Oliveira European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
Helen Parkinson European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
Spiros Denaxas Institute of Health Informatics, University College London, London, NW1 2DA, United Kingdom University College London Hospitals National Institute of Health Research Biomedical Research Centre, London, NW1 2BU, United Kingdom British Heart Foundation Data Science Center, Health Data Research United Kingdom, London, NW1 2BE, United Kingdom
Harry Hemingway Institute of Health Informatics, University College London, London, NW1 2DA, United Kingdom University College London Hospitals National Institute of Health Research Biomedical Research Centre, London, NW1 2BU, United Kingdom
Emily Jefferson Health Informatics Centre, School of Medicine, University of Dundee, Dundee, DD1 9SY, United Kingdom Health Data Research United Kingdom, London, NW1 2BE, United Kingdom

Pan CX, He ZF, Lin SZ, Yue JQ, Chen ZM, Guan WJ. Clinical Characteristics and Outcomes of the Phenotypes of COPD-Bronchiectasis Association. Arch Bronconeumol 2024;60:356-363. [PMID: 38714385 DOI: 10.1016/j.arbres.2024.04.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 02/16/2024] [Accepted: 04/03/2024] [Indexed: 05/09/2024]

Abstract

INTRODUCTION

Although COPD may frequently co-exist with bronchiectasis [COPD-bronchiectasis associated (CBA)], little is known regarding the clinical heterogeneity. We aimed to identify the phenotypes and compare the clinical characteristics and prognosis of CBA.

METHODS

We conducted a retrospective cohort study involving 2928 bronchiectasis patients, 5158 COPD patients, and 1219 patients with CBA hospitalized between July 2017 and December 2020. We phenotyped CBA with a two-step clustering approach and validated in an independent retrospective cohort with decision-tree algorithms.

RESULTS

Compared with patients with COPD or bronchiectasis alone, patients with CBA had significantly longer disease duration, greater lung function impairment, and increased use of intravenous antibiotics during hospitalization. We identified five clusters of CBA. Cluster 1 (N=120, CBA-MS) had predominantly moderate-severe bronchiectasis, Cluster 2 (N=108, CBA-FH) was characterized by frequent hospitalization within the previous year, Cluster 3 (N=163, CBA-BI) had bacterial infection, Cluster 4 (N=143, CBA-NB) had infrequent hospitalization but no bacterial infection, and Cluster 5 (N=113, CBA-NHB) had no hospitalization or bacterial infection in the past year. The decision-tree model predicted the cluster assignment in the validation cohort with 91.8% accuracy. CBA-MS, CBA-BI, and CBA-FH exhibited higher risks of hospital re-admission and intensive care unit admission compared with CBA-NHB during follow-up (all P<0.05). Of the five clusters, CBA-FH conferred the worst clinical prognosis.

CONCLUSION

Bronchiectasis severity, recent hospitalizations and sputum culture findings are three defining variables accounting for most heterogeneity of CBA, the characterization of which will help refine personalized clinical management.

Affiliation(s)

Cui-Xia Pan State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, National Center for Respiratory Medicine, Department of Respiratory and Critical Care Medicine, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
Zhen-Feng He State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, National Center for Respiratory Medicine, Department of Respiratory and Critical Care Medicine, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
Sheng-Zhu Lin State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, National Center for Respiratory Medicine, Department of Respiratory and Critical Care Medicine, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
Jun-Qing Yue State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, National Center for Respiratory Medicine, Department of Respiratory and Critical Care Medicine, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
Zhao-Ming Chen State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, National Center for Respiratory Medicine, Department of Respiratory and Critical Care Medicine, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
Wei-Jie Guan State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, National Center for Respiratory Medicine, Department of Respiratory and Critical Care Medicine, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China; Guangzhou National Laboratory, Guangzhou, Guangdong, China.

Calabria S, Ronconi G, Dondi L, Dondi L, Dell'Anno I, Nordon C, Rhodes K, Rogliani P, Dentali F, Martini N, Maggioni AP. Cardiovascular events after exacerbations of chronic obstructive pulmonary disease: Results from the EXAcerbations of COPD and their OutcomeS in CardioVascular diseases study in Italy. Eur J Intern Med 2024:S0953-6205(24)00181-X. [PMID: 38729787 DOI: 10.1016/j.ejim.2024.04.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 04/16/2024] [Accepted: 04/27/2024] [Indexed: 05/12/2024]

Tate NM, Yamkate P, Xenoulis PG, Steiner JM, Behling‐Kelly EL, Rendahl AK, Wu Y, Furrow E. Clustering analysis of lipoprotein profiles to identify subtypes of hypertriglyceridemia in Miniature Schnauzers. J Vet Intern Med 2024;38:971-979. [PMID: 38348783 PMCID: PMC10937497 DOI: 10.1111/jvim.17010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Accepted: 01/26/2024] [Indexed: 02/18/2024] Open

Abstract

BACKGROUND

Hypertriglyceridemia (HTG) is prevalent in Miniature Schnauzers, predisposing them to life-threatening diseases. Varied responses to management strategies suggest the possibility of multiple subtypes.

HYPOTHESIS/OBJECTIVE

To identify and characterize HTG subtypes in Miniature Schnauzers through cluster analysis of lipoprotein profiles. We hypothesize that multiple phenotypes of primary HTG exist in this breed.

ANIMALS

Twenty Miniature Schnauzers with normal serum triglyceride concentration (NTG), 25 with primary HTG, and 5 with secondary HTG.

METHODS

Cross-sectional study using archived samples. Lipoprotein profiles, generated using continuous lipoprotein density profiling, were clustered with hierarchical cluster analysis. Clinical data (age, sex, body condition score, and dietary fat content) were compared between clusters.
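
As a rough illustration of the hierarchical cluster analysis mentioned above, the following sketch merges the two closest clusters repeatedly until a requested number remain. The abstract does not state the linkage rule or distance metric, so average linkage with Euclidean distance is an assumption here, and the function name `average_linkage` and the toy profiles are hypothetical.

```python
import math


def average_linkage(points, n_clusters):
    """Agglomerative clustering sketch: returns lists of point indices.

    Assumes average linkage and Euclidean distance; neither is
    confirmed by the abstract.
    """
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(c1, c2):
        # Mean pairwise Euclidean distance between members of two clusters.
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest average distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge cluster b into cluster a (a < b, so index a stays valid).
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical two-feature profiles (for example, TRL and LDL intensities).
toy_profiles = [(0.1, 0.2), (0.2, 0.1), (5.0, 5.1), (5.2, 4.9), (9.9, 0.1)]
toy_clusters = average_linkage(toy_profiles, n_clusters=3)
```

In practice a library routine would normally be used; the loop above only shows the merging logic on a handful of points.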

RESULTS

Six clusters were identified. Dogs with primary HTG were dispersed among 4 clusters. One cluster showed the highest intensities for triglyceride-rich lipoprotein (TRL) and low-density lipoprotein (LDL) fractions and also included 4 dogs with secondary HTG. Two clusters had moderately high TRL fraction intensities and low-to-intermediate LDL intensities. The fourth cluster had high LDL but variable TRL fraction intensities with equal numbers of NTG and mild HTG dogs. The final 2 clusters comprised only NTG dogs with low TRL intensities and low-to-intermediate LDL intensities. The clusters did not appear to be driven by differences in the clinical data.

CONCLUSIONS AND CLINICAL IMPORTANCE

The results of this study support a spectrum of lipoprotein phenotypes within Miniature Schnauzers that cannot be predicted by triglyceride concentration alone. Lipoprotein profiling might be useful to determine if subtypes have different origins, clinical consequences, and response to treatment.

Rhodes JS, Aumon A, Morin S, Girard M, Larochelle C, Brunet-Ratnasingham E, Pagliuzza A, Marchitto L, Zhang W, Cutler A, Grand'Maison F, Zhou A, Finzi A, Chomont N, Kaufmann DE, Zandee S, Prat A, Wolf G, Moon KR. Gaining Biological Insights through Supervised Data Visualization. bioRxiv 2024:2023.11.22.568384. [PMID: 38293135 PMCID: PMC10827133 DOI: 10.1101/2023.11.22.568384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]

Koblizek V, Milenkovic B, Svoboda M, Kocianova J, Holub S, Zindr V, Ilic M, Jankovic J, Cupurdija V, Jarkovsky J, Popov B, Valipour A. RETRO-POPE: A Retrospective, Multicenter, Real-World Study of All-Cause Mortality in COPD. Int J Chron Obstruct Pulmon Dis 2023;18:2661-2672. [PMID: 38022829 PMCID: PMC10661906 DOI: 10.2147/copd.s426919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 11/09/2023] [Indexed: 12/01/2023] Open

Abstract

Purpose

The Phenotypes of COPD in Central and Eastern Europe (POPE) study assessed the prevalence and clinical characteristics of four clinical COPD phenotypes, but not mortality. This retrospective analysis of the POPE study (RETRO-POPE) investigated the relationship between all-cause mortality and patient characteristics using two grouping methods: clinical phenotyping (as in POPE) and Burgel clustering, to better identify high-risk patients.

Patients and Methods

The two largest POPE study patient cohorts (Czech Republic and Serbia) were categorized into one of four clinical phenotypes (acute exacerbators [with/without chronic bronchitis], non-exacerbators, asthma-COPD overlap), and one of five Burgel clusters based on comorbidities, lung function, age, body mass index (BMI) and dyspnea (very severe comorbid, very severe respiratory, moderate-to-severe respiratory, moderate-to-severe comorbid/obese, and mild respiratory). Patients were followed up for approximately 7 years for survival status.

Results

Overall, 801 of 1,003 screened patients had sufficient data for analysis. Of these, 440 patients (54.9%) were alive and 361 (45.1%) had died at the end of follow-up. Analysis of survival by clinical phenotype showed no significant differences between the phenotypes (P=0.211). However, Burgel clustering demonstrated significant differences in survival between clusters (P<0.001), with patients in the "very severe comorbid" and "very severe respiratory" clusters most likely to die. Overall survival was not significantly different between Serbia and the Czech Republic after adjustment for age, BMI, comorbidities and forced expiratory volume in 1 second (hazard ratio [HR] 0.80, 95% confidence interval [CI] 0.65-0.99; P=0.036 [unadjusted]; HR 0.88, 95% CI 0.7-1.1; P=0.257 [adjusted]). The most common causes of death were respiratory-related (36.8%), followed by cardiovascular (25.2%) then neoplasm (15.2%).
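
The survival percentages reported above can be re-derived from the stated counts. The short check below is only a worked-arithmetic sketch; the variable names are illustrative, and the share of screened patients analysed is computed here rather than reported in the abstract.

```python
# Counts as reported in the Results paragraph above.
screened = 1003
analysed = 801
alive = 440
died = 361

# The two outcome groups should partition the analysed cohort.
assert alive + died == analysed

# Percentages of the analysed cohort, rounded to one decimal place.
pct_alive = round(100 * alive / analysed, 1)
pct_died = round(100 * died / analysed, 1)

# Share of screened patients with sufficient data
# (derived here for illustration, not stated in the abstract).
pct_analysed = round(100 * analysed / screened, 1)

print(pct_alive, pct_died, pct_analysed)
```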

Conclusion

Patient clusters based on comorbidities, lung function, age, BMI and dyspnea were more likely to show differences in COPD mortality risk than phenotypes defined by exacerbation history and presence/absence of chronic bronchitis and/or asthmatic features.

Abdulazeem H, Whitelaw S, Schauberger G, Klug SJ. A systematic review of clinical health conditions predicted by machine learning diagnostic and prognostic models trained or validated using real-world primary health care data. PLoS One 2023;18:e0274276. [PMID: 37682909 PMCID: PMC10491005 DOI: 10.1371/journal.pone.0274276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 08/29/2023] [Indexed: 09/10/2023] Open

Abstract

With the advances in technology and data science, machine learning (ML) is being rapidly adopted by the health care sector. However, there is a lack of literature addressing the health conditions targeted by the ML prediction models within primary health care (PHC) to date. To fill this gap in knowledge, we conducted a systematic review following the PRISMA guidelines to identify health conditions targeted by ML in PHC. We searched the Cochrane Library, Web of Science, PubMed, Elsevier, BioRxiv, Association of Computing Machinery (ACM), and IEEE Xplore databases for studies published from January 1990 to January 2022. We included primary studies addressing ML diagnostic or prognostic predictive models that were supplied completely or partially by real-world PHC data. Study selection, data extraction, and risk of bias assessment using the prediction model study risk of bias assessment tool were performed by two investigators. Health conditions were categorized according to the International Classification of Diseases (ICD-10). Extracted data were analyzed quantitatively. We identified 106 studies investigating 42 health conditions. These studies included 207 ML prediction models supplied by the PHC data of 24.2 million participants from 19 countries. We found that 92.4% of the studies were retrospective and 77.3% of the studies reported diagnostic predictive ML models. A majority (76.4%) of all the studies were for model development without external validation. Risk of bias assessment revealed that 90.8% of the studies were of high or unclear risk of bias. The most frequently reported health conditions were diabetes mellitus (19.8%) and Alzheimer's disease (11.3%). Our study provides a summary of the presently available ML prediction models within PHC. We draw the attention of digital health policy makers, ML model developers, and health care professionals to the need for more interdisciplinary research collaboration in this regard.

Beijers RJHCG, Steiner MC, Schols AMWJ. The role of diet and nutrition in the management of COPD. Eur Respir Rev 2023;32:230003. [PMID: 37286221 DOI: 10.1183/16000617.0003-2023] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 01/05/2023] [Accepted: 03/27/2023] [Indexed: 06/09/2023] Open

Pikoula M, Kallis C, Madjiheurem S, Quint JK, Bafadhel M, Denaxas S. Evaluation of data processing pipelines on real-world electronic health records data for the purpose of measuring patient similarity. PLoS One 2023;18:e0287264. [PMID: 37319288 PMCID: PMC10270623 DOI: 10.1371/journal.pone.0287264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 06/01/2023] [Indexed: 06/17/2023] Open

Abstract

BACKGROUND

The ever-growing size, breadth, and availability of patient data allows for a wide variety of clinical features to serve as inputs for phenotype discovery using cluster analysis. Data of mixed types in particular are not straightforward to combine into a single feature vector, and techniques used to address this can be biased towards certain data types in ways that are not immediately obvious or intended. In this context, the process of constructing clinically meaningful patient representations from complex datasets has not been systematically evaluated.

AIMS

Our aim was to (a) outline and (b) implement an analytical framework to evaluate distinct methods of constructing patient representations from routine electronic health record data for the purpose of measuring patient similarity. We applied the analysis to a patient cohort diagnosed with chronic obstructive pulmonary disease.

METHODS

Using data from the CALIBER data resource, we extracted clinically relevant features for a cohort of patients diagnosed with chronic obstructive pulmonary disease. We used four different data processing pipelines to construct lower dimensional patient representations from which we calculated patient similarity scores. We described the resulting representations, ranked the influence of each individual feature on patient similarity and evaluated the effect of different pipelines on clustering outcomes. Experts evaluated the resulting representations by rating the clinical relevance of similar patient suggestions with regard to a reference patient.

RESULTS

Each of the four pipelines resulted in similarity scores primarily driven by a unique set of features. It was demonstrated that data transformations according to each pipeline prior to clustering can result in a variation of clustering results of over 40%. The most appropriate pipeline was selected on the basis of feature ranking and clinical expertise. There was moderate agreement between clinicians as measured by Cohen's kappa coefficient.
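
For readers unfamiliar with the agreement statistic named above, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance from their marginal label frequencies. The sketch below is a minimal illustration; the function name `cohens_kappa` and the clinician ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical relevance ratings of eight suggested similar patients.
clinician_1 = ["relevant", "relevant", "not", "relevant",
               "not", "not", "relevant", "not"]
clinician_2 = ["relevant", "not", "not", "relevant",
               "not", "relevant", "relevant", "not"]
kappa = cohens_kappa(clinician_1, clinician_2)
```

With these made-up ratings the raters agree on six of eight items while chance agreement is 0.5, so kappa is 0.5.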

CONCLUSIONS

Data transformation has downstream and unforeseen consequences in cluster analysis. Rather than viewing this process as a black box, we have shown ways to quantitatively and qualitatively evaluate and select the appropriate preprocessing pipeline.

Zhang B, Wang J, Chen J, Ling Z, Ren Y, Xiong D, Guo L. Machine learning in chronic obstructive pulmonary disease. Chin Med J (Engl) 2023;136:536-538. [PMID: 35946787 PMCID: PMC10106241 DOI: 10.1097/cm9.0000000000002247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Indexed: 11/26/2022] Open

Affiliation(s)

Bochao Zhang School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230026, China Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science, Suzhou, Jiangsu 215163, China
Jiping Wang School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230026, China Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science, Suzhou, Jiangsu 215163, China
Jing Chen School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230026, China Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science, Suzhou, Jiangsu 215163, China
Zongquan Ling School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230026, China Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science, Suzhou, Jiangsu 215163, China
Yuhao Ren School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230026, China Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science, Suzhou, Jiangsu 215163, China
Daxi Xiong School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230026, China Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science, Suzhou, Jiangsu 215163, China
Liquan Guo School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230026, China Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Science, Suzhou, Jiangsu 215163, China

Dashtban A, Mizani MA, Pasea L, Denaxas S, Corbett R, Mamza JB, Gao H, Morris T, Hemingway H, Banerjee A. Identifying subtypes of chronic kidney disease with machine learning: development, internal validation and prognostic validation using linked electronic health records in 350,067 individuals. EBioMedicine 2023;89:104489. [PMID: 36857859 PMCID: PMC9989643 DOI: 10.1016/j.ebiom.2023.104489] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/05/2022] [Revised: 01/31/2023] [Accepted: 02/06/2023] [Indexed: 03/01/2023] Open

Abstract

BACKGROUND

Although chronic kidney disease (CKD) is associated with high multimorbidity, polypharmacy, morbidity and mortality, existing classification systems (mild to severe, usually based on estimated glomerular filtration rate, proteinuria or urine albumin-creatinine ratio) and risk prediction models largely ignore the complexity of CKD, its risk factors and its outcomes. Improved subtype definition could improve prediction of outcomes and inform effective interventions.

METHODS

We analysed individuals ≥18 years with incident and prevalent CKD (n = 350,067 and 195,422 respectively) from a population-based electronic health record resource (2006-2020; Clinical Practice Research Datalink, CPRD). We included factors (n = 264 with 2670 derived variables), e.g. demography, history, examination, blood laboratory values and medications. Using a published framework, we identified subtypes through seven unsupervised machine learning (ML) methods (K-means, Diana, HC, Fanny, PAM, Clara, Model-based) with 66 (of 2670) variables in each dataset. We evaluated subtypes for: (i) internal validity (within dataset, across methods); (ii) prognostic validity (predictive accuracy for 5-year all-cause mortality and admissions); and (iii) medications (new and existing by British National Formulary chapter).

FINDINGS

After identifying five clusters across seven approaches, we labelled CKD subtypes: 1. Early-onset, 2. Late-onset, 3. Cancer, 4. Metabolic, and 5. Cardiometabolic. Internal validity: We trained a high-performing model (using XGBoost) that could predict disease subtypes with 95% accuracy for incident and prevalent CKD (Sensitivity: 0.81-0.98, F1 score: 0.84-0.97). Prognostic validity: 5-year all-cause mortality, hospital admissions, and incidence of new chronic diseases differed across CKD subtypes. The 5-year risks of mortality and admissions in the overall incident CKD population were highest in the cardiometabolic subtype: 43.3% (42.3-42.8%) and 29.5% (29.1-30.0%), respectively, and lowest in the early-onset subtype: 5.7% (5.5-5.9%) and 18.7% (18.4-19.1%).
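
The per-subtype sensitivity and F1 score quoted above are standard one-vs-rest classification metrics. The sketch below shows how they could be computed for a single subtype from true and predicted labels; the function name `sensitivity_and_f1` and the toy labels are hypothetical, and nothing here reproduces the study's XGBoost model or data.

```python
def sensitivity_and_f1(true_labels, predicted_labels, positive):
    """One-vs-rest sensitivity (recall) and F1 score for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    # Count true positives, false negatives and false positives.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    # Sensitivity = TP / (TP + FN); precision = TP / (TP + FP).
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1 is the harmonic mean of precision and sensitivity.
    if precision + sensitivity == 0:
        return sensitivity, 0.0
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, f1


# Hypothetical subtype assignments for six patients.
true_subtypes = ["early", "early", "late", "late", "late", "cancer"]
predicted_subtypes = ["early", "late", "late", "late", "early", "cancer"]
late_sensitivity, late_f1 = sensitivity_and_f1(
    true_subtypes, predicted_subtypes, positive="late"
)
```

Repeating the call for each subtype label would give a range of values like the one reported in the abstract.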

MEDICATIONS

Across CKD subtypes, the distribution of prescription medication classes at baseline varied, with highest medication burden in cardiometabolic and metabolic subtypes, and higher burden in prevalent than incident CKD.

INTERPRETATION

In the largest CKD study using ML to date, we identified five distinct subtypes in individuals with incident and prevalent CKD. These subtypes have relevance to the study of aetiology, therapeutics and risk prediction.

FUNDING

AstraZeneca UK Ltd, Health Data Research UK.

Agglomerative and divisive hierarchical Bayesian clustering. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]

Liu H, Dai H, Chen J, Xu J, Tao Y, Lin H. Interactive similar patient retrieval for visual summary of patient outcomes. J Vis (Tokyo) 2022. [DOI: 10.1007/s12650-022-00898-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]

Li XF, Wan CQ, Mao YM. Analysis of pathogenesis and drug treatment of chronic obstructive pulmonary disease complicated with cardiovascular disease. Front Med (Lausanne) 2022;9:979959. [PMID: 36405582 PMCID: PMC9672343 DOI: 10.3389/fmed.2022.979959] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/28/2022] [Accepted: 10/05/2022] [Indexed: 09/19/2023] Open

Zohdi H, Natale L, Scholkmann F, Wolf U. Intersubject Variability in Cerebrovascular Hemodynamics and Systemic Physiology during a Verbal Fluency Task under Colored Light Exposure: Clustering of Subjects by Unsupervised Machine Learning. Brain Sci 2022;12:1449. [PMID: 36358375 PMCID: PMC9688708 DOI: 10.3390/brainsci12111449] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/12/2022] [Revised: 10/19/2022] [Accepted: 10/21/2022] [Indexed: 10/18/2023] Open

Hurst JR, Han MK, Singh B, Sharma S, Kaur G, de Nigris E, Holmgren U, Siddiqui MK. Prognostic risk factors for moderate-to-severe exacerbations in patients with chronic obstructive pulmonary disease: a systematic literature review. Respir Res 2022;23:213. [PMID: 35999538 PMCID: PMC9396841 DOI: 10.1186/s12931-022-02123-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/02/2022] [Accepted: 07/20/2022] [Indexed: 11/30/2022] Open

Abstract

Background

Chronic obstructive pulmonary disease (COPD) is a leading cause of morbidity and mortality worldwide. COPD exacerbations are associated with a worsening of lung function, increased disease burden, and mortality, and, therefore, preventing their occurrence is an important goal of COPD management. This review was conducted to identify the evidence base regarding risk factors and predictors of moderate-to-severe exacerbations in patients with COPD.

Methods

A literature review was performed in Embase, MEDLINE, MEDLINE In-Process, and the Cochrane Central Register of Controlled Trials (CENTRAL). Searches were conducted from January 2015 to July 2019. Eligible publications were peer-reviewed journal articles, published in English, that reported risk factors or predictors for the occurrence of moderate-to-severe exacerbations in adults age ≥ 40 years with a diagnosis of COPD.

Results

The literature review identified 5112 references, of which 113 publications (reporting results for 76 studies) met the eligibility criteria and were included in the review. Among the 76 studies included, 61 were observational and 15 were randomized controlled clinical trials. Exacerbation history was the strongest predictor of future exacerbations, with 34 studies reporting a significant association between history of exacerbations and risk of future moderate or severe exacerbations. Other significant risk factors identified in multiple studies included disease severity or bronchodilator reversibility (39 studies), comorbidities (34 studies), higher symptom burden (17 studies), and higher blood eosinophil count (16 studies).

Conclusions

This systematic literature review identified several demographic and clinical characteristics that predict the future risk of COPD exacerbations. Prior exacerbation history was confirmed as the most important predictor of future exacerbations. These prognostic factors may help clinicians identify patients at high risk of exacerbations, which are a major driver of the global burden of COPD, including morbidity and mortality.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12931-022-02123-5.

Balbirsingh V, Mohammed AS, Turner AM, Newnham M. Cardiovascular disease in chronic obstructive pulmonary disease: a narrative review. Thorax 2022;77:thoraxjnl-2021-218333. [PMID: 35772939 DOI: 10.1136/thoraxjnl-2021-218333] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 10/07/2021] [Accepted: 06/06/2022] [Indexed: 11/04/2022]

Maurits MP, Korsunsky I, Raychaudhuri S, Murphy SN, Smoller JW, Weiss ST, Petukhova LM, Weng C, Wei WQ, Huizinga TWJ, Reinders MJT, Karlson EW, van den Akker EB, Knevel R. A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history. J Am Med Inform Assoc 2022;29:761-769. [PMID: 35139533 PMCID: PMC9122640 DOI: 10.1093/jamia/ocac008] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/28/2021] [Revised: 11/24/2021] [Accepted: 01/27/2022] [Indexed: 11/23/2022] Open

Abstract

OBJECTIVE

To facilitate patient disease subset and risk factor identification by constructing a pipeline which is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health records (EHRs) batch effects.

MATERIAL AND METHODS

We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specific batch effects and performed clustering to identify patients with highly similar medical history patterns across the various centers. Our visualization method (PheSpec) depicts the phenotypic profile of clusters, applies a novel filtering of noninformative codes (Ranked Scope Pervasion), and indicates the most distinguishing features.

RESULTS

We observed 114 clinically meaningful profiles, for example, linking prostate hyperplasia with cancer and diabetes with cardiovascular problems and grouping pediatric developmental disorders. Our framework identified disease subsets, exemplified by 6 "other headache" clusters, where phenotypic profiles suggested different underlying mechanisms: migraine, convulsion, injury, eye problems, joint pain, and pituitary gland disorders. Phenotypic patterns replicated well, with high correlations of ≥0.75 to an average of 6 (2-8) of the 12 different cohorts, demonstrating the consistency with which our method discovers disease history profiles.

DISCUSSION

Costly clinical research ventures should be based on solid hypotheses. We repurpose methods from single-cell omics to build these hypotheses from observational EHR data, distilling useful information from complex data.

CONCLUSION

We establish a generalizable pipeline for the identification and replication of clinically meaningful (sub)phenotypes from widely available high-dimensional billing codes. This approach overcomes datatype problems and produces comprehensive visualizations of validation-ready phenotypes.

Affiliation(s)

Marc P Maurits Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
Ilya Korsunsky Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Soumya Raychaudhuri Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Shawn N Murphy Research Information Science and Computing, Mass General Brigham, Boston, MA, USA
Jordan W Smoller Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
Scott T Weiss Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Lynn M Petukhova Department of Dermatology, NewYork-Presbyterian/Columbia University Medical Center (CUMC)
Chunhua Weng Biomedical Informatics, Columbia University
Wei-Qi Wei Biomedical Informatics, School of Medicine, Vanderbilt University
Thomas W J Huizinga Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
Marcel J T Reinders Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands The Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
Elizabeth W Karlson Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
Erik B van den Akker Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands Section of Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
Rachel Knevel Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA

MacRae C, Whittaker H, Mukherjee M, Daines L, Morgan A, Iwundu C, Alsallakh M, Vasileiou E, O’Rourke E, Williams AT, Stone PW, Sheikh A, Quint JK. Deriving a Standardised Recommended Respiratory Disease Codelist Repository for Future Research. Pragmat Obs Res 2022;13:1-8. [PMID: 35210898 PMCID: PMC8859726 DOI: 10.2147/por.s353400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Accepted: 01/26/2022] [Indexed: 11/23/2022] Open

Abstract

Background

Electronic health record (EHR) databases provide rich, longitudinal data on interactions with healthcare providers and can be used to advance research into respiratory conditions. However, since these data are primarily collected to support health care delivery, clinical coding can be inconsistent, resulting in inherent challenges in using these data for research purposes.

Methods

We systematically searched existing international literature and UK code repositories to find respiratory disease codelists for asthma from January 2018, and chronic obstructive pulmonary disease and respiratory tract infections from January 2020, based on prior searches. Medline searches using key terms provided article lists. Full-text articles, supplementary files, and reference lists were examined for codelists, and codelist repositories were searched. A reproducible methodology for codelist creation was developed, with recommended lists for each disease created based on multidisciplinary expert opinion and previously published literature.

Results

Medline searches returned 1126 asthma articles, 70 COPD articles, and 90 respiratory infection articles, with 3%, 22% and 5% including codelists, respectively. Repository searching returned 12 asthma, 23 COPD, and 64 respiratory infection codelists. We have systematically compiled respiratory disease codelists and from these derived recommended lists for use by researchers to find the most up-to-date and relevant respiratory disease codelists that can be tailored to individual research questions.

Conclusion

Few published papers include codelists and, where they were published, diverse codelists were used, even when answering similar research questions. Whilst some advances have been made, greater consistency and transparency across studies using routine data to study respiratory diseases are needed.

Exarchos K, Aggelopoulou A, Oikonomou A, Biniskou T, Beli V, Antoniadou E, Kostikas K. Review of Artificial Intelligence techniques in Chronic Obstructive Lung Disease. IEEE J Biomed Health Inform 2021;26:2331-2338. [PMID: 34914601 DOI: 10.1109/jbhi.2021.3135838] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022]

Alexander N, Alexander DC, Barkhof F, Denaxas S. Identifying and evaluating clinical subtypes of Alzheimer's disease in care electronic health records using unsupervised machine learning. BMC Med Inform Decis Mak 2021;21:343. [PMID: 34879829 PMCID: PMC8653614 DOI: 10.1186/s12911-021-01693-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/22/2021] [Accepted: 11/15/2021] [Indexed: 02/02/2023] Open

Abstract

BACKGROUND

Alzheimer's disease (AD) is a highly heterogeneous disease with diverse trajectories and outcomes observed in clinical populations. Understanding this heterogeneity can enable better treatment, prognosis and disease management. Studies to date have mainly used imaging or cognition data and have been limited in terms of data breadth and sample size. Here we examine the clinical heterogeneity of Alzheimer's disease patients using electronic health records (EHR) to identify and characterise disease subgroups using multiple clustering methods, identifying clusters which are clinically actionable.

METHODS

We identified AD patients in primary care EHR from the Clinical Practice Research Datalink (CPRD) using a previously validated rule-based phenotyping algorithm. We extracted and included a range of comorbidities, symptoms and demographic features as patient features. We evaluated four different clustering methods (k-means, kernel k-means, affinity propagation and latent class analysis) to cluster Alzheimer's disease patients. We compared clusters on clinically relevant outcomes and evaluated each method using measures of cluster structure, stability, efficiency of outcome prediction and replicability in external data sets.

RESULTS

We identified 7,913 AD patients, with a mean age of 82 and 66.2% female. We included 21 features in our analysis. We observed 5, 2, 5 and 6 clusters in k-means, kernel k-means, affinity propagation and latent class analysis respectively. K-means was found to produce the most consistent results based on four evaluative measures. We discovered a consistent cluster, found in three of the four methods, composed predominantly of female patients with younger disease onset (43% between ages 42-73) who were diagnosed with depression and anxiety and had a quicker rate of progression than the average across other clusters.

CONCLUSION

Each clustering approach produced substantially different clusters, and k-means performed the best of the four methods based on the four evaluative criteria. However, the consistent appearance of one particular cluster across three of the four methods potentially suggests the presence of a distinct disease subtype that merits further exploration. Our study underlines the variability of the results obtained from different clustering approaches and the importance of systematically evaluating different approaches for identifying disease subtypes in complex EHR.

Cabrera C, Quélen C, Ouwens M, Hedman K, Rigney U, Quint JK. Evaluating a Cox marginal structural model to assess the comparative effectiveness of inhaled corticosteroids versus no inhaled corticosteroid treatment in chronic obstructive pulmonary disease. Ann Epidemiol 2021;67:19-28. [PMID: 34798296 DOI: 10.1016/j.annepidem.2021.11.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/30/2021] [Revised: 10/20/2021] [Accepted: 11/04/2021] [Indexed: 12/25/2022]

Nikolaou V, Massaro S, Garn W, Fakhimi M, Stergioulas L, Price DB. Fast decliner phenotype of chronic obstructive pulmonary disease (COPD): applying machine learning for predicting lung function loss. BMJ Open Respir Res 2021;8:e000980. [PMID: 34716217 PMCID: PMC8559126 DOI: 10.1136/bmjresp-2021-000980] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/06/2021] [Accepted: 10/19/2021] [Indexed: 12/31/2022] Open

Nikolaou V, Massaro S, Garn W, Fakhimi M, Stergioulas L, Price D. The cardiovascular phenotype of Chronic Obstructive Pulmonary Disease (COPD): Applying machine learning to the prediction of cardiovascular comorbidities. Respir Med 2021;186:106528. [PMID: 34260974 DOI: 10.1016/j.rmed.2021.106528] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/28/2021] [Revised: 06/29/2021] [Accepted: 07/01/2021] [Indexed: 01/31/2023]

Brat K, Svoboda M, Zatloukal J, Plutinsky M, Volakova E, Popelkova P, Novotna B, Dvorak T, Koblizek V. The Relation Between Clinical Phenotypes, GOLD Groups/Stages and Mortality in COPD Patients - A Prospective Multicenter Study. Int J Chron Obstruct Pulmon Dis 2021;16:1171-1182. [PMID: 33953554 PMCID: PMC8089082 DOI: 10.2147/copd.s297087] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/12/2020] [Accepted: 03/22/2021] [Indexed: 11/23/2022] Open

Abstract

Introduction

The concept of phenotyping emerged, reflecting specific clinical, pulmonary and extrapulmonary features of each particular chronic obstructive pulmonary disease (COPD) case. Our aim was to analyze the prognostic utility of “Czech” COPD phenotypes and their most frequent combinations, “Spanish” phenotypes, and Global Initiative for Chronic Obstructive Lung Disease (GOLD) stages and groups in relation to long-term mortality risk.

Methods

Data were extracted from the Czech Multicenter Research Database (CMRD) of COPD. Kaplan-Meier (KM) estimates (at 60 months from inclusion) were used for mortality assessment. Survival rates were calculated for the six elementary “Czech” phenotypes and their most frequent and relevant combinations, “Spanish” phenotypes, GOLD grades and groups. Statistically significant differences were tested by Log Rank test. Factors underlying mortality risk (the role of confounders) were assessed using classification and regression tree (CART) analysis. Basic factors showing significant differences between deceased and living patients were entered into the CART model. This showed six different risk groups; the differences in risk between them were tested by a Log Rank test.

Results

The cohort (n=720) was 73.1% men, with a mean age of 66.6 years and mean FEV₁ 44.4% pred. KM estimates showed that the bronchiectases/COPD overlap (HR 1.425, p=0.045), frequent exacerbator (HR 1.58, p<0.001), cachexia (HR 2.262, p<0.001) and emphysematous (HR 1.786, p=0.015) phenotypes were associated with higher mortality risk. Co-presence of multiple phenotypes in a single patient had an additive effect on risk; the combination of emphysema, cachexia and frequent exacerbations translated into the poorest prognosis (HR 3.075; p<0.001). Of the “Spanish” phenotypes, AE CB and AE non-CB were associated with greater risk of mortality (HR 1.787 and 2.001; both p=0.001). FEV₁% pred., cachexia and chronic heart failure in patient history were the major underlying factors determining mortality risk in our cohort.

Conclusion

Certain phenotypes (“Czech” or “Spanish”) of COPD are associated with higher risk of death. Co-presence of multiple phenotypes (emphysematous plus cachectic plus frequent exacerbator) in a single individual was associated with amplified risk of mortality.
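As an illustration of the Kaplan-Meier (KM) estimates mentioned in the Methods above, the following is a minimal sketch in plain Python of the product-limit estimator. The function name `kaplan_meier` and the toy follow-up data are assumptions made for illustration only; they are not taken from the study or its database.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times  -- follow-up time for each patient (e.g. months from inclusion)
    events -- 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival_probability) at each death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    data = sorted(zip(times, events))
    at_risk = len(data)      # patients still under observation
    survival = 1.0
    curve = []

    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0          # deaths plus censored observations at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical toy cohort: four patients, one censored at month 2.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
for month, s in toy_curve:
    print(f"month {month}: estimated survival {s:.3f}")
```

Censored patients leave the risk set without lowering the survival estimate, which is what distinguishes this estimator from a simple proportion of survivors.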

Coombes CE, Liu X, Abrams ZB, Coombes KR, Brock G. Simulation-derived best practices for clustering clinical data. J Biomed Inform 2021;118:103788. [PMID: 33862229 DOI: 10.1016/j.jbi.2021.103788] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/10/2020] [Revised: 03/23/2021] [Accepted: 04/11/2021] [Indexed: 11/18/2022]

Abstract

INTRODUCTION

Clustering analyses in clinical contexts hold promise to improve the understanding of patient phenotype and disease course in chronic and acute clinical medicine. However, work remains to ensure that solutions are rigorous, valid, and reproducible. In this paper, we evaluate best practices for dissimilarity matrix calculation and clustering on mixed-type, clinical data.

METHODS

We simulate clinical data to represent problems in clinical trials, cohort studies, and EHR data, including single-type datasets (binary, continuous, categorical) and 4 data mixtures. We test 5 single distance metrics (Jaccard, Hamming, Gower, Manhattan, Euclidean) and 3 mixed distance metrics (DAISY, Supersom, and Mercator) with 3 clustering algorithms (hierarchical (HC), k-medoids, self-organizing maps (SOM)). We quantitatively and visually validate by Adjusted Rand Index (ARI) and silhouette width (SW). We applied our best methods to two real-world data sets: (1) 21 features collected on 247 patients with chronic lymphocytic leukemia, and (2) 40 features collected on 6000 patients admitted to an intensive care unit.

RESULTS

HC outperformed k-medoids and SOM by ARI across data types. DAISY produced the highest mean ARI for mixed data types for all mixtures except unbalanced mixtures dominated by continuous data. Compared to other methods, DAISY with HC uncovered superior, separable clusters in both real-world data sets.

DISCUSSION

Selecting an appropriate mixed-type metric allows the investigator to obtain optimal separation of patient clusters and get maximum use of their data. Superior metrics for mixed-type data handle multiple data types using multiple, type-focused distances. Better subclassification of disease opens avenues for targeted treatments, precision medicine, clinical decision support, and improved patient outcomes.
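The Adjusted Rand Index (ARI) used for validation in the abstract above compares a clustering against known labels while correcting for chance agreement. The following is a minimal sketch in plain Python using the standard contingency-table formula; the function name `adjusted_rand_index` and the toy labels are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treated as perfect agreement here

    # Contingency table cells and its row/column totals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)   # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2           # largest attainable agreement
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_cells - expected) / (maximum - expected)


# Hypothetical toy example: cluster names differ, but the grouping is identical.
print(adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"]))
```

Because the index depends only on which items are grouped together, relabeling the clusters does not change the score.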

Banerjee A, Chen S, Fatemifar G, Zeina M, Lumbers RT, Mielke J, Gill S, Kotecha D, Freitag DF, Denaxas S, Hemingway H. Machine learning for subtype definition and risk prediction in heart failure, acute coronary syndromes and atrial fibrillation: systematic review of validity and clinical utility. BMC Med 2021;19:85. [PMID: 33820530 PMCID: PMC8022365 DOI: 10.1186/s12916-021-01940-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 10/01/2020] [Accepted: 02/12/2021] [Indexed: 02/08/2023] Open

Abstract

BACKGROUND

Machine learning (ML) is increasingly used in research for subtype definition and risk prediction, particularly in cardiovascular diseases. No existing ML models are routinely used for cardiovascular disease management, and their phase of clinical utility is unknown, partly due to a lack of clear criteria. We evaluated ML for subtype definition and risk prediction in heart failure (HF), acute coronary syndromes (ACS) and atrial fibrillation (AF).

METHODS

For ML studies of subtype definition and risk prediction, we conducted a systematic review in HF, ACS and AF, using PubMed, MEDLINE and Web of Science from January 2000 until December 2019. By adapting published criteria for diagnostic and prognostic studies, we developed a seven-domain, ML-specific checklist.

RESULTS

Of 5918 studies identified, 97 were included. Across studies for subtype definition (n = 40) and risk prediction (n = 57), there was variation in data source, population size (median 606 and median 6769), clinical setting (outpatient, inpatient, different departments), number of covariates (median 19 and median 48) and ML methods. All studies were single disease, most were North American (n = 61/97) and only 14 studies combined definition and risk prediction. Subtype definition and risk prediction studies respectively had limitations in development (e.g. 15.0% and 78.9% of studies related to patient benefit; 15.0% and 15.8% had low patient selection bias), validation (12.5% and 5.3% externally validated) and impact (32.5% and 91.2% improved outcome prediction; no effectiveness or cost-effectiveness evaluations).

CONCLUSIONS

Studies of ML in HF, ACS and AF are limited by number and type of included covariates, ML methods, population size, country, clinical setting and focus on single diseases, not overlap or multimorbidity. Clinical utility and implementation rely on improvements in development, validation and impact, facilitated by simple checklists. We provide clear steps prior to safe implementation of machine learning in clinical practice for cardiovascular diseases and other disease areas.

Affiliation(s)

Amitava Banerjee Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK. Health Data Research UK, University College London, London, UK. University College London Hospitals NHS Trust, 235 Euston Road, London, UK. Barts Health NHS Trust, The Royal London Hospital, Whitechapel Rd, London, UK.
Suliang Chen Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK Health Data Research UK, University College London, London, UK
Ghazaleh Fatemifar Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK Health Data Research UK, University College London, London, UK
Mohamad Zeina Medical School, King's College London, London, UK
R Thomas Lumbers Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK Health Data Research UK, University College London, London, UK University College London Hospitals NHS Trust, 235 Euston Road, London, UK
Johanna Mielke Bayer AG, Division Pharmaceuticals, Open Innovation & Digital Technologies, Wuppertal, Germany
Simrat Gill University of Birmingham Institute of Cardiovascular Sciences and University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
Dipak Kotecha University of Birmingham Institute of Cardiovascular Sciences and University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK Department of Cardiology, University Medical Centre Utrecht, Utrecht, the Netherlands
Daniel F Freitag Bayer AG, Division Pharmaceuticals, Open Innovation & Digital Technologies, Wuppertal, Germany
Spiros Denaxas Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK Health Data Research UK, University College London, London, UK The Alan Turing Institute, London, UK
Harry Hemingway Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK Health Data Research UK, University College London, London, UK University College London Hospitals Biomedical Research Centre (UCLH BRC), London, UK

Coombes CE, Abrams ZB, Nakayiza S, Brock G, Coombes KR. Umpire 2.0: Simulating realistic, mixed-type, clinical data for machine learning. F1000Res 2021. [DOI: 10.12688/f1000research.25877.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open

Bohn L, Zheng Y, McFall GP, Dixon RA. Portals to frailty? Data-driven analyses detect early frailty profiles. Alzheimers Res Ther 2021;13:1. [PMID: 33397495 PMCID: PMC7780374 DOI: 10.1186/s13195-020-00736-w] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 04/02/2020] [Accepted: 11/29/2020] [Indexed: 03/21/2023]

Abstract

BACKGROUND

Frailty is an aging condition that reflects multisystem decline and an increased risk for adverse outcomes, including differential cognitive decline and impairment. Two prominent approaches for measuring frailty are the frailty phenotype and the frailty index. We explored a complementary data-driven approach for frailty assessment that could detect early frailty profiles (or subtypes) in relatively healthy older adults. Specifically, we tested whether (1) modalities of early frailty profiles could be empirically determined, (2) the extracted profiles were differentially related to longitudinal cognitive decline, and (3) the profile and prediction patterns were robust for males and females.

METHODS

Participants (n = 649; M age = 70.61, range 53-95) were community-dwelling older adults from the Victoria Longitudinal Study who contributed data for baseline multi-morbidity assessment and longitudinal cognitive trajectory analyses. An exploratory factor analysis on 50 multi-morbidity items produced 7 separable health domains. The proportion of deficits in each domain was calculated and used as continuous indicators in a data-driven latent profile analysis (LPA). We subsequently examined how frailty profiles related to the level and rate of change in a latent neurocognitive speed variable.

RESULTS

LPA results distinguished three profiles: not-clinically-frail (NCF; characterized by limited impairment across indicators; 84%), mobility-type frailty (MTF; characterized by impaired mobility function; 9%), and respiratory-type frailty (RTF; characterized by impaired respiratory function; 7%). These profiles showed differential neurocognitive slowing, such that MTF was associated with the steepest decline, followed by RTF, and then NCF. The baseline frailty index scores were the highest for MTF and RTF and increased over time. All observations were robust across sex.

CONCLUSIONS

A data-driven approach to early frailty assessment detected differentiable profiles that may be characterized as morbidity-intensive portals into broader and chronic frailty. Early interventions targeting mobility or respiratory deficits may have positive downstream effects on frailty progression and cognitive decline.

Feng Y, Wang Y, Zeng C, Mao H. Artificial Intelligence and Machine Learning in Chronic Airway Diseases: Focus on Asthma and Chronic Obstructive Pulmonary Disease. Int J Med Sci 2021;18:2871-2889. [PMID: 34220314 PMCID: PMC8241767 DOI: 10.7150/ijms.58191] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/14/2021] [Accepted: 05/20/2021] [Indexed: 02/05/2023] Open

Lai AG, Pasea L, Banerjee A, Hall G, Denaxas S, Chang WH, Katsoulis M, Williams B, Pillay D, Noursadeghi M, Linch D, Hughes D, Forster MD, Turnbull C, Fitzpatrick NK, Boyd K, Foster GR, Enver T, Nafilyan V, Humberstone B, Neal RD, Cooper M, Jones M, Pritchard-Jones K, Sullivan R, Davie C, Lawler M, Hemingway H. Estimated impact of the COVID-19 pandemic on cancer services and excess 1-year mortality in people with cancer and multimorbidity: near real-time data on cancer care, cancer deaths and a population-based cohort study. BMJ Open 2020;10:e043828. [PMID: 33203640 PMCID: PMC7674020 DOI: 10.1136/bmjopen-2020-043828] [Citation(s) in RCA: 185] [Impact Index Per Article: 46.3] [Received: 08/14/2020] [Revised: 10/20/2020] [Accepted: 10/23/2020] [Indexed: 12/30/2022] Open

Abstract

OBJECTIVES

To estimate the impact of the COVID-19 pandemic on cancer care services and overall (direct and indirect) excess deaths in people with cancer.

METHODS

We employed near real-time weekly data on cancer care to determine the adverse effect of the pandemic on cancer services. We also used these data, together with national death registrations until June 2020, to model deaths in excess of background (pre-COVID-19) mortality in people with cancer. Background mortality risks for 24 cancers with and without COVID-19-relevant comorbidities were obtained from a population-based primary care cohort (Clinical Practice Research Datalink) of 3 862 012 adults in England.

RESULTS

Declines in urgent referrals (median=-70.4%) and chemotherapy attendances (median=-41.5%) to a nadir (lowest point) in the pandemic were observed. By 31 May, these declines had only partially recovered: urgent referrals (median=-44.5%) and chemotherapy attendances (median=-31.2%). There were short-term excess death registrations for cancer (without COVID-19), with peak relative risk (RR) of 1.17 at week ending on 3 April. The peak RR for all-cause deaths was 2.1 from week ending on 17 April. Based on these findings and recent literature, we modelled 40% and 80% of cancer patients being affected by the pandemic in the long term. At 40% affected, we estimated 1-year total (direct and indirect) excess deaths in people with cancer as between 7165 and 17 910, using RRs of 1.2 and 1.5, respectively, where 78% of excess deaths occurred in patients with ≥1 comorbidity.

CONCLUSIONS

Dramatic reductions were detected in the demand for, and supply of, cancer services which have not fully recovered with lockdown easing. These may contribute, over a 1-year time horizon, to substantial excess mortality among people with cancer and multimorbidity. It is urgent to understand how the recovery of general practitioner, oncology and other hospital services might best mitigate these long-term excess mortality risks.
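The relative-risk scenarios in the Results above can be read as a simple multiplicative model. The sketch below assumes that excess deaths equal expected background deaths, times the proportion of patients affected, times (RR - 1); this formula, the function name `excess_deaths` and the baseline figure are assumptions for illustration and are not stated in the abstract.

```python
def excess_deaths(baseline_deaths, proportion_affected, relative_risk):
    """Deaths above background under an assumed multiplicative RR model.

    baseline_deaths     -- expected 1-year deaths at background (pre-pandemic) risk
    proportion_affected -- share of patients whose risk is raised (0 to 1)
    relative_risk       -- RR applied to the affected share
    """
    if not 0 <= proportion_affected <= 1:
        raise ValueError("proportion_affected must be between 0 and 1")
    if relative_risk < 0:
        raise ValueError("relative_risk must be non-negative")
    return baseline_deaths * proportion_affected * (relative_risk - 1)


# Hypothetical baseline, NOT a figure from the study.
baseline = 100_000
low = excess_deaths(baseline, 0.4, 1.2)
high = excess_deaths(baseline, 0.4, 1.5)
print(f"RR 1.2: {low:.0f} excess deaths; RR 1.5: {high:.0f} excess deaths")
print(f"ratio high/low: {high / low:.2f}")
```

Under this assumed model the estimate scales linearly with (RR - 1), so moving from RR 1.2 to RR 1.5 multiplies it by 2.5. The reported range of 7165 to 17 910 has a ratio of about 2.5, which is consistent with that reading, although the published model may include further detail.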

Affiliation(s)

Alvina G Lai Institute of Health Informatics, University College London, London, UK Health Data Research UK, University College London, London, UK
Laura Pasea Institute of Health Informatics, University College London, London, UK Health Data Research UK, University College London, London, UK
Amitava Banerjee Institute of Health Informatics, University College London, London, UK Health Data Research UK, University College London, London, UK Barts Health NHS Trust, The Royal London Hospital, Whitechapel Rd, London, UK
Geoff Hall DATA-CAN, Health Data Research UK hub for cancer hosted by UCLPartners, London, UK Leeds Institute of Medical Research, University of Leeds, Leeds, UK Leeds Teaching Hospitals NHS Trust, Leeds, UK
Spiros Denaxas Institute of Health Informatics, University College London, London, UK Health Data Research UK, University College London, London, UK University College London Hospitals NIHR Biomedical Research Centre, London, UK The Alan Turing Institute, London, UK
Wai Hoong Chang Institute of Health Informatics, University College London, London, UK Health Data Research UK, University College London, London, UK
Michail Katsoulis Institute of Health Informatics, University College London, London, UK
Bryan Williams University College London Hospitals NIHR Biomedical Research Centre, London, UK Institute of Cardiovascular Science, University College London, London, UK University College London Hospitals NHS Trust, London, UK
Deenan Pillay Division of Infection and Immunity, University College London, London, UK
Mahdad Noursadeghi Division of Infection and Immunity, University College London, London, UK
David Linch University College London Hospitals NIHR Biomedical Research Centre, London, UK Department of Hematology, University College London Cancer Institute, London, UK
Derralynn Hughes University College London Cancer Institute, London, UK Royal Free NHS Foundation Trust, London, UK
Martin D Forster University College London Hospitals NHS Trust, London, UK University College London Cancer Institute, London, UK
Clare Turnbull Division of Genetics and Epidemiology, Institute of Cancer Research, London, UK
Natalie K Fitzpatrick Institute of Health Informatics, University College London, London, UK Health Data Research UK, University College London, London, UK
Kathryn Boyd Northern Ireland Cancer Network, Northern Ireland, UK
Graham R Foster Barts Liver Centre, Blizard Institute, Queen Mary University of London, London, UK
Tariq Enver University College London Cancer Institute, London, UK
Vahe Nafilyan Office for National Statistics, London, UK
Ben Humberstone Office for National Statistics, London, UK
Richard D Neal Leeds Institute of Health Sciences, University of Leeds, Leeds, UK
Matt Cooper DATA-CAN, Health Data Research UK hub for cancer hosted by UCLPartners, London, UK Leeds Institute of Medical Research, University of Leeds, Leeds, UK
Monica Jones DATA-CAN, Health Data Research UK hub for cancer hosted by UCLPartners, London, UK Leeds Institute of Medical Research, University of Leeds, Leeds, UK
Kathy Pritchard-Jones DATA-CAN, Health Data Research UK hub for cancer hosted by UCLPartners, London, UK UCLPartners Academic Health Science Partnership, London, UK Centre for Cancer Outcomes, University College London Hospitals NHS Foundation Trust, London, UK UCL Great Ormond Street Institute of Child Health, University College London, London, UK
Richard Sullivan Conflict and Health Research Group, Institute of Cancer Policy, King's College London, London, UK
Charlie Davie DATA-CAN, Health Data Research UK hub for cancer hosted by UCLPartners, London, UK Royal Free NHS Foundation Trust, London, UK UCLPartners Academic Health Science Partnership, London, UK
Mark Lawler DATA-CAN, Health Data Research UK hub for cancer hosted by UCLPartners, London, UK Patrick G Johnston Centre for Cancer Research, Queen's University Belfast, Belfast, UK
Harry Hemingway Institute of Health Informatics, University College London, London, UK Health Data Research UK, University College London, London, UK University College London Hospitals NIHR Biomedical Research Centre, London, UK

Zucchi JW, Franco EAT, Schreck T, Castro e Silva MH, Migliorini SRDS, Garcia T, Mota GAF, de Morais BEB, Machado LHS, Batista ANR, de Paiva SAR, de Godoy I, Tanni SE. Different Clusters in Patients with Chronic Obstructive Pulmonary Disease (COPD): A Two-Center Study in Brazil. Int J Chron Obstruct Pulmon Dis 2020;15:2847-2856. [PMID: 33192058 PMCID: PMC7654519 DOI: 10.2147/copd.s268332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2020] [Accepted: 09/06/2020] [Indexed: 11/23/2022] Open

Abstract

Background

Chronic obstructive pulmonary disease (COPD) has a functional definition. However, differences in clinical characteristics and systemic manifestations make COPD a heterogeneous disease and some manifestations have been associated with different risks of acute exacerbations, hospitalizations, and death.

Objective

Therefore, the objective of the study was to evaluate possible clinical clusters in COPD at two study centers in Brazil and identify the associated exacerbation and mortality rate during 1 year of follow-up.

Methods

We included patients with COPD, all of whom underwent an evaluation comprising the Charlson Index, body mass index (BMI), current pharmacological treatment, smoking history (pack-years), history of exacerbations/hospitalizations in the last year, spirometry, six-minute walking test (6MWT), quality of life questionnaires, dyspnea, and hospital anxiety and depression scale. Blood samples were also collected for measurements of C-reactive protein (CRP), blood gases, laboratory analysis, and blood count. For the construction of the clusters, 13 continuous variables of clinical importance were considered: hematocrit, CRP, triglycerides, low density lipoprotein, absolute number of peripheral eosinophils, age, pulse oximetry, BMI, forced expiratory volume in the first second, dyspnea, 6MWD, total score of the Saint George Respiratory Questionnaire and pack-years of smoking. We used the Ward and K-means methods and determined the best silhouette value to identify similarities of individuals within the cluster (cohesion) in relation to the other clusters (separation). The number of clusters was determined by the heterogeneity values of the cluster, which in this case was determined as four clusters.

Results

We evaluated 301 COPD patients and identified four different groups of COPD patients. The first cluster (203 patients) was characterized by fewer symptoms and lower functional severity of the disease, the second cluster by higher values of peripheral eosinophils, the third cluster by more systemic inflammation and the fourth cluster by greater obstructive severity and worse gas exchange. Cluster 2 had an average of 959±3 peripheral eosinophils, cluster 3 had a higher prevalence of nutritional depletion (46.1%), and cluster 4 had a higher BODE index. Regarding the associated comorbidities, we found that only obstructive sleep apnea syndrome and pulmonary thromboembolism were more prevalent in cluster 4. Almost 50% of all patients presented an exacerbation during 1 year of follow-up. However, it was higher in cluster 4, with 65% of all patients having at least one exacerbation. The mortality rate was statistically higher in cluster 4, with 26.9%, vs 9.6% in cluster 1.

Conclusion

We could identify four clinically different clusters in these COPD populations that were related to different clinical manifestations, comorbidities, exacerbation, and mortality rate. We also identified a specific cluster with higher values of peripheral eosinophils.
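The silhouette value mentioned in the Methods above compares each individual's cohesion (mean distance to its own cluster) with its separation (mean distance to the nearest other cluster). The following is a minimal sketch of that calculation in plain Python, assuming Euclidean distance and a made-up toy dataset; the function name `silhouette_score` and the sample points are illustrative assumptions, not the study's code or data.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster (cohesion);
        # dist(p, p) is 0, so summing over the whole cluster is safe
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to any other cluster (separation)
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]    # mixes the two groups

print(silhouette_score(toy_points, good_labels))
print(silhouette_score(toy_points, bad_labels))
```

A labelling that respects the grouping scores close to 1, while a labelling that mixes the groups scores much lower, which is why the value can be compared across candidate numbers of clusters.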

Zhuang H, Cui J, Liu T, Wang H. A physical model inspired density peak clustering. PLoS One 2020;15:e0239406. [PMID: 32970727 PMCID: PMC7514087 DOI: 10.1371/journal.pone.0239406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2020] [Accepted: 09/05/2020] [Indexed: 12/02/2022] Open

Abstract

Clustering is an important technology of data mining, which plays a vital role in bioscience, social network and network analysis. As a clustering algorithm based on density and distance, density peak clustering is extensively used to solve practical problems. The algorithm assumes that the clustering center has a larger local density and is farther away from the higher density points. However, the density peak clustering algorithm is highly sensitive to density and distance and cannot accurately identify clusters in a dataset having significant differences in cluster structure. In addition, the density peak clustering algorithm's allocation strategy can easily cause attached allocation errors in data point allocation. To solve these problems, this study proposes a potential-field-diffusion-based density peak clustering. As compared to existing clustering algorithms, the advantages of the potential-field-diffusion-based density peak clustering algorithm are three-fold: 1) The potential field concept is introduced in the proposed algorithm, and a density measure based on the potential field's diffusion is proposed. The cluster center can be accurately selected using this measure. 2) The potential-field-diffusion-based density peak clustering algorithm defines the judgment conditions of similar points and adopts different allocation strategies for dissimilar points to avoid attached errors in data point allocation. 3) This study conducted many experiments on synthetic and real-world datasets. Results demonstrate that the proposed potential-field-diffusion-based density peak clustering algorithm achieves excellent clustering effect and is suitable for complex datasets of different sizes, dimensions, and shapes. Besides, the proposed potential-field-diffusion-based density peak clustering algorithm shows particularly excellent performance on variable density and nonconvex datasets.
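The baseline assumption stated in the abstract (a cluster center has a high local density and lies far from any point of higher density) can be sketched as follows. This is only the classic density peak idea with a simple cutoff-count density, not the potential-field-diffusion measure the paper proposes; the function names, the cutoff value and the toy points are assumptions for illustration.

```python
from math import dist


def density_peak_scores(points, cutoff):
    """Return (rho, delta) for each point.

    rho[i]   : number of other points closer than `cutoff` (local density)
    delta[i] : distance to the nearest point of strictly higher density;
               for a point with no denser neighbour, the largest distance
               from it to any point is used instead
    """
    n = len(points)
    rho = [
        sum(1 for j in range(n) if j != i and dist(points[i], points[j]) < cutoff)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        to_denser = [dist(points[i], points[j]) for j in range(n) if rho[j] > rho[i]]
        if to_denser:
            delta.append(min(to_denser))
        else:
            delta.append(max(dist(points[i], points[j]) for j in range(n)))
    return rho, delta


def pick_centers(rho, delta, k):
    """Indices of the k points with the largest rho * delta product."""
    ranked = sorted(range(len(rho)), key=lambda i: rho[i] * delta[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: two blobs, each with one obvious central point.
toy_points = [
    (0, 0), (0, 1), (1, 0), (0, -1), (-1, 0),   # blob A, center at index 0
    (10, 10), (10, 11), (11, 10), (10, 9),      # blob B, center at index 5
]
rho, delta = density_peak_scores(toy_points, cutoff=1.2)
print(pick_centers(rho, delta, k=2))
```

Because both quantities depend on a single cutoff and on raw distances, the result is sensitive to those choices, which is the weakness the abstract says the proposed variant is meant to address.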

Nikolaou V, Massaro S, Fakhimi M, Stergioulas L, Price D. COPD phenotypes and machine learning cluster analysis: A systematic review and future research agenda. Respir Med 2020;171:106093. [PMID: 32745966 DOI: 10.1016/j.rmed.2020.106093] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/25/2020] [Revised: 07/19/2020] [Accepted: 07/21/2020] [Indexed: 12/21/2022]

Coombes CE, Abrams ZB, Li S, Abruzzo LV, Coombes KR. Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia. J Am Med Inform Assoc 2020;27:1019-1027. [PMID: 32483590 PMCID: PMC7647286 DOI: 10.1093/jamia/ocaa060] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2020] [Revised: 04/08/2020] [Accepted: 04/24/2020] [Indexed: 12/22/2022] Open

Carrillo-Larco RM, Castillo-Cara M. Using country-level variables to classify countries according to the number of confirmed COVID-19 cases: An unsupervised machine learning approach. Wellcome Open Res 2020;5:56. [PMID: 32587900 PMCID: PMC7308996 DOI: 10.12688/wellcomeopenres.15819.3] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/11/2020] [Indexed: 12/13/2022] Open

Carrillo-Larco RM, Castillo-Cara M. Using country-level variables to classify countries according to the number of confirmed COVID-19 cases: An unsupervised machine learning approach. Wellcome Open Res 2020;5:56. [PMID: 32587900 PMCID: PMC7308996 DOI: 10.12688/wellcomeopenres.15819.2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/01/2020] [Indexed: 11/04/2023] Open

Banerjee A, Pasea L, Harris S, Gonzalez-Izquierdo A, Torralbo A, Shallcross L, Noursadeghi M, Pillay D, Sebire N, Holmes C, Pagel C, Wong WK, Langenberg C, Williams B, Denaxas S, Hemingway H. Estimating excess 1-year mortality associated with the COVID-19 pandemic according to underlying conditions and age: a population-based cohort study. Lancet 2020;395:1715-1725. [PMID: 32405103 PMCID: PMC7217641 DOI: 10.1016/s0140-6736(20)30854-0] [Citation(s) in RCA: 312] [Impact Index Per Article: 78.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/23/2020] [Revised: 04/02/2020] [Accepted: 04/06/2020] [Indexed: 01/19/2023]

Abstract

BACKGROUND

The medical, societal, and economic impact of the coronavirus disease 2019 (COVID-19) pandemic has unknown effects on overall population mortality. Previous models of population mortality are based on death over days among infected people, nearly all of whom thus far have underlying conditions. Models have not incorporated information on high-risk conditions or their longer-term baseline (pre-COVID-19) mortality. We estimated the excess number of deaths over 1 year under different COVID-19 incidence scenarios based on varying levels of transmission suppression and differing mortality impacts based on different relative risks for the disease.

METHODS

In this population-based cohort study, we used linked primary and secondary care electronic health records from England (Health Data Research UK-CALIBER). We report prevalence of underlying conditions defined by Public Health England guidelines (from March 16, 2020) in individuals aged 30 years or older registered with a practice between 1997 and 2017, using validated, openly available phenotypes for each condition. We estimated 1-year mortality in each condition, developing simple models (and a tool for calculation) of excess COVID-19-related deaths, assuming relative impact (as relative risks [RRs]) of the COVID-19 pandemic (compared with background mortality) of 1·5, 2·0, and 3·0 at differing infection rate scenarios, including full suppression (0·001%), partial suppression (1%), mitigation (10%), and do nothing (80%). We also developed an online, public, prototype risk calculator for excess death estimation.

FINDINGS

We included 3 862 012 individuals (1 957 935 [50·7%] women and 1 904 077 [49·3%] men). We estimated that more than 20% of the study population are in the high-risk category, of whom 13·7% were older than 70 years and 6·3% were aged 70 years or younger with at least one underlying condition. 1-year mortality in the high-risk population was estimated to be 4·46% (95% CI 4·41-4·51). Age and underlying conditions combined to influence background risk, varying markedly across conditions. In a full suppression scenario in the UK population, we estimated that there would be two excess deaths (vs baseline deaths) with an RR of 1·5, four with an RR of 2·0, and seven with an RR of 3·0. In a mitigation scenario, we estimated 18 374 excess deaths with an RR of 1·5, 36 749 with an RR of 2·0, and 73 498 with an RR of 3·0. In a do nothing scenario, we estimated 146 996 excess deaths with an RR of 1·5, 293 991 with an RR of 2·0, and 587 982 with an RR of 3·0.

INTERPRETATION

We provide policy makers, researchers, and the public a simple model and an online tool for understanding excess mortality over 1 year from the COVID-19 pandemic, based on age, sex, and underlying condition-specific estimates. These results signal the need for sustained stringent suppression measures as well as sustained efforts to target those at highest risk because of underlying conditions with a range of preventive interventions. Countries should assess the overall (direct and indirect) effects of the pandemic on excess mortality.

FUNDING

National Institute for Health Research University College London Hospitals Biomedical Research Centre, Health Data Research UK.
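The scenario figures reported in the Findings above appear consistent with a model that is linear in both the infection rate and (RR - 1). The sketch below checks that arithmetic under this assumption. The constant `BASELINE_DEATHS` is back-calculated here from one reported figure (36 749 excess deaths at RR 2.0 in the 10% mitigation scenario); it is not a number stated in the abstract, and the function is not the authors' published calculator.

```python
def excess_deaths(baseline_deaths, infection_rate, relative_risk):
    """Excess 1-year deaths under a simple linear model (assumed form).

    baseline_deaths : expected background deaths if nobody were infected
    infection_rate  : fraction of the population infected (0 to 1)
    relative_risk   : mortality RR among those infected, vs background
    """
    return baseline_deaths * infection_rate * (relative_risk - 1.0)


# Assumption: back-derived from the reported mitigation figure at RR 2.0,
# i.e. 36 749 = baseline * 0.10 * (2.0 - 1.0).
BASELINE_DEATHS = 36749 / (0.10 * (2.0 - 1.0))

scenarios = {
    "full suppression": 0.00001,  # 0.001%
    "mitigation": 0.10,           # 10%
    "do nothing": 0.80,           # 80%
}

for name, rate in scenarios.items():
    for rr in (1.5, 2.0, 3.0):
        estimate = excess_deaths(BASELINE_DEATHS, rate, rr)
        print(f"{name:17s} RR {rr}: {estimate:,.0f}")
```

Under this assumed form, the values agree with the abstract's reported estimates to within a few deaths, with small differences attributable to rounding.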

Horne E, Tibble H, Sheikh A, Tsanas A. Challenges of Clustering Multimodal Clinical Data: Review of Applications in Asthma Subtyping. JMIR Med Inform 2020;8:e16452. [PMID: 32463370 PMCID: PMC7290450 DOI: 10.2196/16452] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2019] [Revised: 12/10/2019] [Accepted: 02/10/2020] [Indexed: 12/27/2022] Open

Abstract

Background

In the current era of personalized medicine, there is increasing interest in understanding the heterogeneity in disease populations. Cluster analysis is a method commonly used to identify subtypes in heterogeneous disease populations. The clinical data used in such applications are typically multimodal, which can make the application of traditional cluster analysis methods challenging.

Objective

This study aimed to review the research literature on the application of clustering multimodal clinical data to identify asthma subtypes. We assessed common problems and shortcomings in the application of cluster analysis methods in determining asthma subtypes, such that they can be brought to the attention of the research community and avoided in future studies.

Methods

We searched PubMed and Scopus bibliographic databases with terms related to cluster analysis and asthma to identify studies that applied dissimilarity-based cluster analysis methods. We recorded the analytic methods used in each study at each step of the cluster analysis process.

Results

Our literature search identified 63 studies that applied cluster analysis to multimodal clinical data to identify asthma subtypes. The features fed into the cluster algorithms were of a mixed type in 47 (75%) studies and continuous in 12 (19%), and the feature type was unclear in the remaining 4 (6%) studies. A total of 23 (37%) studies used hierarchical clustering with Ward linkage, and 22 (35%) studies used k-means clustering. Of these 45 studies, 39 had mixed-type features, but only 5 specified dissimilarity measures that could handle mixed-type features. A further 9 (14%) studies used a preclustering step to create small clusters to feed on a hierarchical method. The original sample sizes in these 9 studies ranged from 84 to 349. The remaining studies used hierarchical clustering with other linkages (n=3), medoid-based methods (n=3), spectral clustering (n=1), and multiple kernel k-means clustering (n=1), and in 1 study, the methods were unclear. Of 63 studies, 54 (86%) explained the methods used to determine the number of clusters, 24 (38%) studies tested the quality of their cluster solution, and 11 (17%) studies tested the stability of their solution. Reporting of the cluster analysis was generally poor in terms of the methods employed and their justification.

Conclusions

This review highlights common issues in the application of cluster analysis to multimodal clinical data to identify asthma subtypes. Some of these issues were related to the multimodal nature of the data, but many were more general issues in the application of cluster analysis. Although cluster analysis may be a useful tool for investigating disease subtypes, we recommend that future studies carefully consider the implications of clustering multimodal data, the cluster analysis process itself, and the reporting of methods to facilitate replication and interpretation of findings.
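As an illustration of what a dissimilarity measure for mixed-type features looks like, here is a minimal sketch of Gower's distance in plain Python. Gower's distance is one well-known option and is used here as an example; the abstract does not name the measures the reviewed studies used. The feature names and patient records below are hypothetical.

```python
def feature_ranges(records, kinds):
    """Range (max - min) of each numeric feature; None for categorical ones."""
    ranges = []
    for idx, kind in enumerate(kinds):
        if kind == "num":
            values = [rec[idx] for rec in records]
            ranges.append(max(values) - min(values))
        else:
            ranges.append(None)
    return ranges


def gower_distance(a, b, kinds, ranges):
    """Average per-feature dissimilarity between two records.

    Numeric features contribute |x - y| / range (0 to 1);
    categorical features contribute 0 if equal, 1 otherwise.
    """
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "num":
            total += abs(x - y) / rng if rng else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(kinds)


# Hypothetical records: (age in years, FEV1 % predicted, smoking status, atopy)
kinds = ["num", "num", "cat", "cat"]
records = [
    (30, 90.0, "never", True),
    (60, 60.0, "current", True),
    (45, 75.0, "never", False),
]
ranges = feature_ranges(records, kinds)

print(gower_distance(records[0], records[1], kinds, ranges))  # 0.75
print(gower_distance(records[0], records[2], kinds, ranges))  # 0.5
```

The resulting pairwise distances can be passed to a dissimilarity-based method such as hierarchical clustering. Applying plain Euclidean distance to the same records would require an arbitrary numeric coding of the categorical features.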

Loughran KJ, Atkinson G, Beauchamp MK, Dixon J, Martin D, Rahim S, Harrison SL. Balance impairment in individuals with COPD: a systematic review with meta-analysis. Thorax 2020;75:539-546. [PMID: 32409612 DOI: 10.1136/thoraxjnl-2019-213608] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2019] [Revised: 01/29/2020] [Accepted: 03/06/2020] [Indexed: 11/04/2022]

Abstract

BACKGROUND

People with chronic obstructive pulmonary disease (COPD) are four times more likely to fall than healthy peers, leading to increased morbidity and mortality. Poor balance is a major risk factor for falls. This review aims to quantify the extent of balance impairment in COPD, and establish contributing clinical factors, which at present are sparse.

METHODS

Five electronic databases were searched in July 2017, with updated searches performed in March 2019, for studies comparing balance in COPD with healthy controls. Meta-analyses were conducted on sample mean differences (MD) and reported correlations between balance and clinical factors. Meta-regression was used to quantify the association between mean difference in percentage predicted forced expiratory volume in 1 s (FEV₁) and mean balance impairment. Narrative summaries were provided where data were insufficient for meta-analysis.

RESULTS

Twenty-three studies were included (n=2751). Meta-analysis indicated COPD patients performed worse than healthy controls on timed up and go (MD=2.77 s, 95% CI 1.46 s to 4.089 s, p<0.005), single leg stance (MD=-11.75 s, 95% CI -15.12 s to -8.38 s, p<0.005) and Berg Balance Scale (MD=-6.66, 95% CI -8.95 to -4.37, p<0.005). The pooled correlation coefficient between balance and reduced quadriceps strength was weak-moderate (r=0.37, 95% CI 0.23 to 0.45, p<0.005). The relationship between differences in percentage predicted FEV₁ and balance was negligible (r²<0.04).

CONCLUSIONS

Compared with healthy controls, people with COPD have a clinically meaningful balance reduction, which may be related to reduced muscle strength, physical activity and exercise capacity. Our findings support a need to expand the focus of pulmonary rehabilitation to include balance assessment and training, and further exploration of balance impairment in COPD. PROSPERO registration number CRD4201769041.

Wu JJ, Xu HR, Zhang YX, Li YX, Yu HY, Jiang LD, Wang CX, Han M. The characteristics of the frequent exacerbator with chronic bronchitis phenotype and non-exacerbator phenotype in patients with chronic obstructive pulmonary disease: a meta-analysis and system review. BMC Pulm Med 2020;20:103. [PMID: 32326924 PMCID: PMC7181594 DOI: 10.1186/s12890-020-1126-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2019] [Accepted: 03/27/2020] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Chronic obstructive pulmonary disease (COPD) patients with different phenotypes show different clinical characteristics. Therefore, we conducted a meta-analysis to explore the clinical characteristics between the non-exacerbator (NE) phenotype and the frequent exacerbator with chronic bronchitis (FE-CB) phenotype among patients with COPD.

METHODS

CNKI, Wan fang, Chongqing VIP, China Biology Medicine disc, PubMed, Cochrane Library, and EMBASE databases were searched from the times of their inception to April 30, 2019. All studies that reported the clinical characteristics of the COPD phenotypes and which met the inclusion criteria were included. The quality assessment was analyzed by Cross-Sectional/Prevalence Study Quality recommendations. The meta-analysis was carried out using RevMan5.3.

RESULTS

Ten cross-sectional observation studies (n = 8848) were included. Compared with the NE phenotype, patients with the FE-CB phenotype showed significantly lower forced expiratory volume in 1 s percent predicted (FEV₁%pred) (mean difference (MD) -8.50, 95% confidence interval (CI) -11.36 to -5.65, P < 0.001, I² = 91%), forced vital capacity percent predicted (FVC%pred) (MD -6.69, 95% CI -7.73 to -5.65, P < 0.001, I² = 5%), and forced expiratory volume in 1 s/forced vital capacity (FEV₁/FVC) (MD -3.76, 95% CI -4.58 to -2.95, P < 0.001, I² = 0%); in contrast, Charlson comorbidity index (MD 0.47, 95% CI 0.37 to 0.58, P < 0.001, I² = 0%), COPD assessment test (CAT) score (MD 5.61, 95% CI 4.62 to 6.60, P < 0.001, I² = 80%), the quantity of cigarettes smoked (pack-years) (MD 3.09, 95% CI 1.60 to 4.58, P < 0.001, I² = 41%), exacerbations in previous year (2.65, 95% CI 2.32 to 2.97, P < 0.001, I² = 91%), modified Medical Research Council (mMRC) score (MD 0.72, 95% CI 0.63 to 0.82, P < 0.001, I² = 57%), and body mass index (BMI), obstruction, dyspnea, exacerbations (BODEx) (MD 1.78, 95% CI 1.28 to 2.28, P < 0.001, I² = 91%), I² = 34%) were significantly higher in patients with FE-CB phenotype. No significant between-group difference was observed with respect to BMI (MD -0.14, 95% CI -0.70 to 0.42, P = 0.62, I² = 75%).

CONCLUSION

COPD patients with the FE-CB phenotype had worse pulmonary function and higher CAT score, mMRC scores, frequency of acute exacerbations, and the quantity of cigarettes smoked (pack-years) than those with the NE phenotype.

Carrillo-Larco RM, Castillo-Cara M. Using country-level variables to classify countries according to the number of confirmed COVID-19 cases: An unsupervised machine learning approach. Wellcome Open Res 2020;5:56. [PMID: 32587900 PMCID: PMC7308996 DOI: 10.12688/wellcomeopenres.15819.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/25/2020] [Indexed: 11/04/2023] Open

Aldibbiat AM, Al-Sharefi A. Do Benefits Outweigh Risks for Corticosteroid Therapy in Acute Exacerbation of Chronic Obstructive Pulmonary Disease in People with Diabetes Mellitus? Int J Chron Obstruct Pulmon Dis 2020;15:567-574. [PMID: 32214806 PMCID: PMC7084124 DOI: 10.2147/copd.s236305] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2019] [Accepted: 02/21/2020] [Indexed: 12/22/2022] Open

Ahn GY, Lee J, Won S, Ha E, Kim H, Nam B, Kim JS, Kang J, Kim JH, Song GG, Kim K, Bae SC. Identifying damage clusters in patients with systemic lupus erythematosus. Int J Rheum Dis 2019;23:84-91. [PMID: 31762221 DOI: 10.1111/1756-185x.13745] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2019] [Revised: 09/19/2019] [Accepted: 10/17/2019] [Indexed: 12/11/2022]

Sánchez-Rico M, Alvarado JM. A Machine Learning Approach for Studying the Comorbidities of Complex Diagnoses. Behav Sci (Basel) 2019;9:E122. [PMID: 31766665 PMCID: PMC6960661 DOI: 10.3390/bs9120122] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2019] [Revised: 11/16/2019] [Accepted: 11/20/2019] [Indexed: 02/08/2023] Open

Antonelli Incalzi R, Canonica GW, Scichilone N, Rizzoli S, Simoni L, Blasi F. The COPD multi-dimensional phenotype: A new classification from the STORICO Italian observational study. PLoS One 2019;14:e0221889. [PMID: 31518364 PMCID: PMC6743765 DOI: 10.1371/journal.pone.0221889] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2019] [Accepted: 08/17/2019] [Indexed: 12/03/2022] Open

Impact of Disease-Specific Fears on Pulmonary Rehabilitation Trajectories in Patients with COPD. J Clin Med 2019;8:jcm8091460. [PMID: 31540306 PMCID: PMC6780973 DOI: 10.3390/jcm8091460] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2019] [Revised: 09/10/2019] [Accepted: 09/10/2019] [Indexed: 01/23/2023] Open