1
|
Bashir MBA, Basna R, Zhang GQ, Backman H, Lindberg A, Ekerljung L, Axelsson M, Hedman L, Vanfleteren L, Lundbäck B, Rönmark E, Nwaru BI. Computational phenotyping of obstructive airway diseases: protocol for a systematic review. Syst Rev 2022; 11:216. [PMID: 36229872 PMCID: PMC9559879 DOI: 10.1186/s13643-022-02078-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2020] [Accepted: 09/18/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Over the last decade, computational sciences have contributed immensely to characterization of phenotypes of airway diseases, but it is difficult to compare derived phenotypes across studies, perhaps as a result of the different decisions that fed into these phenotyping exercises. We aim to perform a systematic review of studies using computational approaches to phenotype obstructive airway diseases in children and adults. METHODS AND ANALYSIS We will search PubMed, Embase, Scopus, Web of Science, and Google Scholar for papers published between 2010 and 2020. Conference proceedings, reference lists of included papers, and experts will form additional sources of literature. We will include observational epidemiological studies that used a computational approach to derive phenotypes of chronic airway diseases, whether in a general population or in a clinical setting. Two reviewers will independently screen the retrieved studies for eligibility, extract relevant data, and perform quality appraisal of included studies. A third reviewer will arbitrate any disagreements in these processes. Quality appraisal of the studies will be undertaken using the Effective Public Health Practice Project quality assessment tool. We will use summary tables to describe the included studies. We will narratively synthesize the generated evidence, providing critical assessment of the populations, variables, and computational approaches used in deriving the phenotypes across studies. CONCLUSION As progress continues to be made in the area of computational phenotyping of chronic obstructive airway diseases, this systematic review, the first on this topic, will provide the state of the art in the field and highlight important perspectives for future work. ETHICS AND DISSEMINATION No ethical approval is needed, as this work is based only on the published literature and does not involve collection of any primary or human data.
REGISTRATION AND REPORTING SYSTEMATIC REVIEW REGISTRATION: PROSPERO CRD42020164898.
Affiliation(s)
- Muwada Bashir Awad Bashir
- Krefting Research Centre, Institute of Medicine, University of Gothenburg, SE-405 30, Gothenburg, Sweden.
- Rani Basna
- Krefting Research Centre, Institute of Medicine, University of Gothenburg, SE-405 30, Gothenburg, Sweden
- Guo-Qiang Zhang
- Krefting Research Centre, Institute of Medicine, University of Gothenburg, SE-405 30, Gothenburg, Sweden
- Helena Backman
- Section of Sustainable Health/the OLIN Unit, Department of Public Health and Clinical Medicine, Umeå University, Umeå, Sweden
- Anne Lindberg
- Section of Medicine/the OLIN Unit, Department of Public Health and Clinical Medicine, Umeå University, Umeå, Sweden
- Linda Ekerljung
- Krefting Research Centre, Institute of Medicine, University of Gothenburg, SE-405 30, Gothenburg, Sweden
- Malin Axelsson
- Department of Care Science, Faculty of Health and Society, Malmö University, Malmö, Sweden
- Linnea Hedman
- Department of Health Sciences, Luleå University of Technology, Luleå, Sweden
- Lowie Vanfleteren
- COPD Center, Sahlgrenska University Hospital, University of Gothenburg, Gothenburg, Sweden
- Bo Lundbäck
- Krefting Research Centre, Institute of Medicine, University of Gothenburg, SE-405 30, Gothenburg, Sweden
- Eva Rönmark
- Section of Sustainable Health/the OLIN Unit, Department of Public Health and Clinical Medicine, Umeå University, Umeå, Sweden
- Bright I Nwaru
- Krefting Research Centre, Institute of Medicine, University of Gothenburg, SE-405 30, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, Institute of Medicine, University of Gothenburg, Gothenburg, Sweden
2
|
Odajiu I, Covantsev S, Sivapalan P, Mathioudakis AG, Jensen JUS, Davidescu EI, Chatzimavridou-Grigoriadou V, Corlateanu A. Peripheral neuropathy: A neglected cause of disability in COPD - A narrative review. Respir Med 2022; 201:106952. [PMID: 36029697 DOI: 10.1016/j.rmed.2022.106952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Revised: 07/17/2022] [Accepted: 08/05/2022] [Indexed: 11/30/2022]
Abstract
Chronic obstructive pulmonary disease (COPD) is a chronic inflammatory syndrome with systemic involvement leading to various cardiovascular, metabolic, and neurological comorbidities. It is well known that conditions associated with oxygen deprivation and metabolic disturbance are associated with polyneuropathy, but current data regarding the relationship between COPD and peripheral nervous system pathology is limited. This review summarizes the available data on the association between COPD and polyneuropathy, including possible pathophysiological mechanisms such as the role of hypoxia, proinflammatory state, and smoking in nerve damage; the role of cardiovascular and metabolic comorbidities, as well as the diagnostic methods and screening tools for identifying polyneuropathy. Furthermore, it outlines the available options for managing and preventing polyneuropathy in COPD patients. Overall, current data suggest that optimal screening strategies to diagnose polyneuropathy early should be implemented in COPD patients due to their relatively common association and the additional burden of polyneuropathy on quality of life.
Affiliation(s)
- Irina Odajiu
- Department of Neurology, Colentina Clinical Hospital, Bucharest, Romania
- Pradeesh Sivapalan
- Department of Medicine, Section of Respiratory Medicine, Herlev and Gentofte Hospital, University of Copenhagen, Hellerup, Denmark
- Alexander G Mathioudakis
- Division of Infection, Immunity and Respiratory Medicine, School of Biological Sciences, The University of Manchester, Manchester Academic Health Science Centre, UK; The North-West Lung Centre, Wythenshawe Hospital, Manchester University NHS Foundation Trust, Manchester, UK.
- Jens-Ulrik Stæhr Jensen
- Department of Medicine, Section of Respiratory Medicine, Herlev and Gentofte Hospital, University of Copenhagen, Hellerup, Denmark; Department of Clinical Medicine, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
- Eugenia Irene Davidescu
- Department of Neurology, Colentina Clinical Hospital, Bucharest, Romania; Department of Clinical Neurosciences, "Carol Davila" University of Medicine and Pharmacy, Bucharest, Romania
- Alexandru Corlateanu
- Department of Respiratory Medicine, State University of Medicine and Pharmacy "Nicolae Testemitanu", Chisinau, Moldavia.
3
|
Mohamed I. Prediction of Chronic Obstructive Pulmonary Disease Stages Using Machine Learning Algorithms. International Journal of Decision Support System Technology 2022. [DOI: 10.4018/ijdsst.286693] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Abstract
Identifying chronic obstructive pulmonary disease (COPD) severity stages is of great importance to control the related mortality rates and reduce the associated costs. This study aims to build prediction models for COPD stages and to compare the relative performance of five machine learning algorithms to determine the optimal prediction algorithm. This research is based on data collected from a private hospital in Egypt for the two calendar years 2018 and 2019. Five machine learning algorithms were used for the comparison. The F1 score, specificity, sensitivity, accuracy, positive predictive value and negative predictive value were the performance measures used for algorithm comparison. The analysis included 211 patient records. Our results show that the best-performing algorithm in most of the disease stages is the PNN, which achieved the optimal prediction accuracy, and hence it can be considered a powerful prediction tool for decision makers in predicting severity stages of COPD.
4
|
Lund B, Ma J. A review of cluster analysis techniques and their uses in library and information science research: k-means and k-medoids clustering. Performance Measurement and Metrics 2021. [DOI: 10.1108/pmm-05-2021-0026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
Abstract
Purpose: This literature review explores the definitions and characteristics of cluster analysis, a machine-learning technique that is frequently implemented to identify groupings in big datasets and its applicability to library and information science (LIS) research. This overview is intended for researchers who are interested in expanding their data analysis repertory to include cluster analysis, rather than for existing experts in this area. Design/methodology/approach: A review of LIS articles included in the Library and Information Source (EBSCO) database that employ cluster analysis is performed. An overview of cluster analysis in general (how it works from a statistical standpoint, and how it can be performed by researchers), the most popular cluster analysis techniques and the uses of cluster analysis in LIS is presented. Findings: The number of LIS studies that employ a cluster analytic approach has grown from about 5 per year in the early 2000s to an average of 35 studies per year in the mid- and late-2010s. The journal Scientometrics has the most articles published within LIS that use cluster analysis (102 studies). Scientometrics is the most common subject area to employ a cluster analytic approach (152 studies). The findings of this review indicate that cluster analysis could make LIS research more accessible by providing an innovative and insightful process of knowledge discovery. Originality/value: This review is the first to present cluster analysis as an accessible data analysis approach, specifically from an LIS perspective.
5
|
Sivakumaran S, Alsallakh MA, Lyons RA, Quint JK, Davies GA. Identifying COPD in routinely collected electronic health records: a systematic scoping review. ERJ Open Res 2021; 7:00167-2021. [PMID: 34527726 PMCID: PMC8435805 DOI: 10.1183/23120541.00167-2021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2021] [Accepted: 06/24/2021] [Indexed: 11/23/2022]
Abstract
Although routinely collected electronic health records (EHRs) are widely used to examine outcomes related to COPD, consensus regarding the identification of cases from electronic healthcare databases is lacking. We systematically examine and summarise approaches from the recent literature. MEDLINE via EBSCOhost was searched for COPD-related studies using EHRs published from January 1, 2018 to November 30, 2019. Data were extracted relating to the case definition of COPD and determination of COPD severity and phenotypes. From 185 eligible studies, we found widespread variation in the definitions used to identify people with COPD in terms of code sets used (with 20 different code sets in use based on the ICD-10 classification alone) and requirement of additional criteria (relating to age (n=139), medication (n=31), multiplicity of events (n=21), spirometry (n=19) and smoking status (n=9)). Only seven studies used a case definition which had been validated against a reference standard in the same dataset. Various proxies of disease severity were used since spirometry results and patient-reported outcomes were not often available. To enable the research community to draw reliable insights from EHRs and aid comparability between studies, clear reporting and greater consistency of the definitions used to identify COPD and related outcome measures is key.
Affiliation(s)
- Ronan A. Lyons
- Population Data Science, Swansea University Medical School, Swansea, UK
- Jennifer K. Quint
- National Heart and Lung Institute, Imperial College London, London, UK
- Gwyneth A. Davies
- Population Data Science, Swansea University Medical School, Swansea, UK
6
|
Webster AJ, Gaitskell K, Turnbull I, Cairns BJ, Clarke R. Characterisation, identification, clustering, and classification of disease. Sci Rep 2021; 11:5405. [PMID: 33686097 PMCID: PMC7940639 DOI: 10.1038/s41598-021-84860-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/21/2020] [Accepted: 02/17/2021] [Indexed: 12/25/2022]
Abstract
The importance of quantifying the distribution and determinants of multimorbidity has prompted novel data-driven classifications of disease. Applications have included improved statistical power and refined prognoses for a range of respiratory, infectious, autoimmune, and neurological diseases, with studies using molecular information, age of disease incidence, and sequences of disease onset ("disease trajectories") to classify disease clusters. Here we consider whether easily measured risk factors such as height and BMI can effectively characterise diseases in UK Biobank data, combining established statistical methods in new but rigorous ways to provide clinically relevant comparisons and clusters of disease. Over 400 common diseases were selected for analysis using clinical and epidemiological criteria, and conventional proportional hazards models were used to estimate associations with 12 established risk factors. Several diseases had strongly sex-dependent associations of disease risk with BMI. Importantly, a large proportion of diseases affecting both sexes could be identified by their risk factors, and equivalent diseases tended to cluster adjacently. These included 10 diseases presently classified as "Symptoms, signs, and abnormal clinical and laboratory findings, not elsewhere classified". Many clusters are associated with a shared, known pathogenesis, others suggest likely but presently unconfirmed causes. The specificity of associations and shared pathogenesis of many clustered diseases provide a new perspective on the interactions between biological pathways, risk factors, and patterns of disease such as multimorbidity.
Affiliation(s)
- A J Webster
- Nuffield Department of Population Health, University of Oxford, Oxford, UK.
- K Gaitskell
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Nuffield Division of Clinical Laboratory Sciences, Radcliffe Department of Medicine, University of Oxford, Oxford, UK
- I Turnbull
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- B J Cairns
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- R Clarke
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
7
|
Puteikis K, Mameniškienė R, Jurevičienė E. Neurological and Psychiatric Comorbidities in Chronic Obstructive Pulmonary Disease. Int J Chron Obstruct Pulmon Dis 2021; 16:553-562. [PMID: 33688180 PMCID: PMC7937394 DOI: 10.2147/copd.s290363] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/12/2020] [Accepted: 01/25/2021] [Indexed: 12/22/2022]
Abstract
Background and Purpose Chronic obstructive pulmonary disease (COPD) is often accompanied by different neurological and psychiatric comorbidities. The purpose of this study was to examine which of them are the most frequent and to explore whether their manifestation can be explained by underlying latent variables. Methods Data about patients with COPD and their neurological and psychiatric comorbidities were extracted from an electronic database of the National Health Insurance Fund of Lithuania for the period between January 1, 2012, and June 30, 2014. Exploratory factor analysis (EFA) was used to investigate comorbidity patterns. Results A study sample of 4834 patients with COPD was obtained from the database, 3338 (69.1%) of whom were male. The most frequent neurological and psychiatric comorbidities were nerve, nerve root and plexus disorders (n=1439, 29.8%), sleep disorders (n=666, 13.8%), transient ischemic attack (n=545, 11.3%), depression (n=364, 7.5%) and ischemic stroke (n=349, 7.2%). The prevalence of ischemic stroke, transient ischemic attack, Parkinson’s disease, dementia and sleep disorders increased with age. One latent variable outlined during EFA grouped neurological disorders, namely ischemic stroke, transient ischemic attack, epilepsy, dementia and Parkinson’s disease. The second encompassed depression, anxiety, somatoform and sleep disorders. While similar patterns emerged in data from male patients, no clear comorbidity profiles among women with COPD were obtained. Conclusion Our study provides novel insights into the neurological and psychiatric comorbidities in COPD by outlining an association among cerebrovascular, neurodegenerative disorders and epilepsy, and psychiatric and sleep disorders. Future studies could substantiate the discrete pathological mechanisms that underlie these comorbidity groups.
Affiliation(s)
- Elena Jurevičienė
- Vilnius University, Center for Pulmonology and Allergology, Vilnius, Lithuania
8
|
Lee S, Doktorchik C, Martin EA, D'Souza AG, Eastwood C, Shaheen AA, Naugler C, Lee J, Quan H. Electronic Medical Record-Based Case Phenotyping for the Charlson Conditions: Scoping Review. JMIR Med Inform 2021; 9:e23934. [PMID: 33522976 PMCID: PMC7884219 DOI: 10.2196/23934] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/28/2020] [Revised: 11/20/2020] [Accepted: 12/05/2020] [Indexed: 12/16/2022]
Abstract
Background Electronic medical records (EMRs) contain large amounts of rich clinical information. Developing EMR-based case definitions, also known as EMR phenotyping, is an active area of research that has implications for epidemiology, clinical care, and health services research. Objective This review aims to describe and assess the present landscape of EMR-based case phenotyping for the Charlson conditions. Methods A scoping review of EMR-based algorithms for defining the Charlson comorbidity index conditions was completed. This study covered articles published between January 2000 and April 2020, both inclusive. Embase (Excerpta Medica database) and MEDLINE (Medical Literature Analysis and Retrieval System Online) were searched using keywords developed in the following 3 domains: terms related to EMR, terms related to case finding, and disease-specific terms. The manuscript follows the Preferred Reporting Items for Systematic reviews and Meta-analyses extension for Scoping Reviews (PRISMA) guidelines. Results A total of 274 articles representing 299 algorithms were assessed and summarized. Most studies were undertaken in the United States (181/299, 60.5%), followed by the United Kingdom (42/299, 14.0%) and Canada (15/299, 5.0%). These algorithms were mostly developed either in primary care (103/299, 34.4%) or inpatient (168/299, 56.2%) settings. Diabetes, congestive heart failure, myocardial infarction, and rheumatology had the highest number of developed algorithms. Data-driven and clinical rule–based approaches have been identified. EMR-based phenotype and algorithm development reflect the data access allowed by respective health systems, and algorithms vary in their performance. Conclusions Recognizing similarities and differences in health systems, data collection strategies, extraction, data release protocols, and existing clinical pathways is critical to algorithm development strategies. 
Several strategies to assist with phenotype-based case definitions have been proposed.
Affiliation(s)
- Seungwon Lee
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada; Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Chelsea Doktorchik
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Elliot Asher Martin
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada
- Adam Giles D'Souza
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada
- Cathy Eastwood
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Abdel Aziz Shaheen
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Christopher Naugler
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Pathology and Laboratory Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Joon Lee
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Cardiac Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Hude Quan
- Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
9
|
Multiscale classification of heart failure phenotypes by unsupervised clustering of unstructured electronic medical record data. Sci Rep 2020; 10:21340. [PMID: 33288774 PMCID: PMC7721729 DOI: 10.1038/s41598-020-77286-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/11/2020] [Accepted: 11/05/2020] [Indexed: 12/13/2022]
Abstract
As a leading cause of death and morbidity, heart failure (HF) is responsible for a large portion of healthcare and disability costs worldwide. Current approaches to define specific HF subpopulations may fail to account for the diversity of etiologies, comorbidities, and factors driving disease progression, and therefore have limited value for clinical decision making and development of novel therapies. Here we present a novel and data-driven approach to understand and characterize the real-world manifestation of HF by clustering disease and symptom-related clinical concepts (complaints) captured from unstructured electronic health record clinical notes. We used natural language processing to construct vectorized representations of patient complaints followed by clustering to group HF patients by similarity of complaint vectors. We then identified complaints that were significantly enriched within each cluster using statistical testing. Breaking the HF population into groups of similar patients revealed a clinically interpretable hierarchy of subgroups characterized by similar HF manifestation. Importantly, our methodology revealed well-known etiologies, risk factors, and comorbid conditions of HF (including ischemic heart disease, aortic valve disease, atrial fibrillation, congenital heart disease, various cardiomyopathies, obesity, hypertension, diabetes, and chronic kidney disease) and yielded additional insights into the details of each HF subgroup's clinical manifestation of HF. Our approach is entirely hypothesis free and can therefore be readily applied for discovery of novel insights in alternative diseases or patient populations.
10
|
Nikolaou V, Massaro S, Fakhimi M, Stergioulas L, Price D. COPD phenotypes and machine learning cluster analysis: A systematic review and future research agenda. Respir Med 2020; 171:106093. [PMID: 32745966 DOI: 10.1016/j.rmed.2020.106093] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 04/25/2020] [Revised: 07/19/2020] [Accepted: 07/21/2020] [Indexed: 12/21/2022]
Abstract
Chronic Obstructive Pulmonary Disease (COPD) is a highly heterogeneous condition projected to become the third leading cause of death worldwide by 2030. To better characterize this condition, clinicians have classified patients sharing certain symptomatic characteristics, such as symptom intensity and history of exacerbations, into distinct phenotypes. In recent years, the growing use of machine learning algorithms, and cluster analysis in particular, has promised to advance this classification through the integration of additional patient characteristics, including comorbidities, biomarkers, and genomic information. This combination would allow researchers to more reliably identify new COPD phenotypes, as well as better characterize existing ones, with the aim of improving diagnosis and developing novel treatments. Here, we systematically review the last decade of research progress, which uses cluster analysis to identify COPD phenotypes. Collectively, we provide a systematized account of the extant evidence, describe the strengths and weaknesses of the main methods used, identify gaps in the literature, and suggest recommendations for future research.
Affiliation(s)
- Vasilis Nikolaou
- Surrey Business School, University of Surrey, Guildford, GU2 7HX, UK.
- Sebastiano Massaro
- Surrey Business School, University of Surrey, Guildford, GU2 7HX, UK; The Organizational Neuroscience Laboratory, London, WC1N 3AX, UK
- Masoud Fakhimi
- Surrey Business School, University of Surrey, Guildford, GU2 7HX, UK
- David Price
- Observational and Pragmatic Research Institute, Singapore, Singapore; Centre of Academic Primary Care, Division of Applied Health Sciences, University of Aberdeen, Aberdeen, UK
11
|
Gárate-Escamilla AK, Garza-Padilla E, Carvajal Rivera A, Salas-Castro C, Andrès E, Hajjam El Hassani A. Cluster Analysis: A New Approach for Identification of Underlying Risk Factors and Demographic Features of First Trimester Pregnancy Women. J Clin Med 2020; 9:E2247. [PMID: 32679845 PMCID: PMC7408845 DOI: 10.3390/jcm9072247] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/12/2020] [Revised: 07/10/2020] [Accepted: 07/13/2020] [Indexed: 12/31/2022]
Abstract
Thyroid pathology is reported internationally in 5-10% of all pregnancies. The overall aim of this research was to determine the prevalence of hypothyroidism and risk factors during the first trimester screening in a sample of Mexican patients. We included the records of 306 patients who attended a prenatal control consultation between January 2016 and December 2017 at the Women's Institute in Monterrey, Mexico. The studied sample had homogeneous demographic characteristics in terms of age, weight, height, BMI (body mass index) and number of pregnancies. The presence of at least one of the risk factors for thyroid disease was observed in 39.2% of the sample. Two and three clusters were identified, in which patients varied considerably among risk factors, symptoms and pregnancy complications. Compared to Cluster 0, one or more symptoms or signs of hypothyroidism occurred, while Cluster 1 was characterized by healthier patients. When three clusters were used, Cluster 2 had a higher TSH (thyroid stimulating hormone) value and pregnancy complications. There were no significant differences in perinatal variables. In addition, high TSH levels in first trimester pregnancy are characterized by pregnancy complications and decreased newborn weight. Our findings underline the high degree of disease heterogeneity among pregnant hypothyroid patients and the need to improve the phenotyping of the syndrome in the Mexican population.
Affiliation(s)
- Edelmiro Garza-Padilla
- Monterrey Institute of Technology and Higher Education, Monterrey 64700, Mexico
- Agustín Carvajal Rivera
- Monterrey Institute of Technology and Higher Education, Monterrey 64700, Mexico
- Celina Salas-Castro
- Monterrey Institute of Technology and Higher Education, Monterrey 64700, Mexico
- Emmanuel Andrès
- Service de Médecine Interne, Diabète et Maladies Métaboliques de la Clinique Médicale B, CHRU de Strasbourg, 67091 Strasbourg, France
12
|
Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, Tropsha A. QSAR without borders. Chem Soc Rev 2020; 49:3525-3564. [PMID: 32356548 PMCID: PMC8008490 DOI: 10.1039/d0cs00098a] [Citation(s) in RCA: 323] [Impact Index Per Article: 80.8] [Indexed: 12/15/2022]
Abstract
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modelling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
Affiliation(s)
- Eugene N Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA.
13
Wang C, Chen X, Du L, Zhan Q, Yang T, Fang Z. Comparison of machine learning algorithms for the identification of acute exacerbations in chronic obstructive pulmonary disease. Computer Methods and Programs in Biomedicine 2020; 188:105267. [PMID: 31841787 DOI: 10.1016/j.cmpb.2019.105267] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/05/2019] [Revised: 11/19/2019] [Accepted: 12/08/2019] [Indexed: 05/05/2023]
Abstract
OBJECTIVES Identifying acute exacerbations in chronic obstructive pulmonary disease (AECOPDs) is of utmost importance for reducing the associated mortality and financial burden. In this research, the authors aimed to develop identification models for AECOPDs and to compare the relative performance of different modeling paradigms to find the best model for this task. METHODS Data were extracted from electronic medical records (EMRs) of patients with chronic obstructive pulmonary disease who were admitted to the China-Japan Friendship Hospital between February 2011 and March 2017. Five machine learning algorithms (random forest, support vector machine, logistic regression, K-nearest neighbor and naïve Bayes) were used to develop the AECOPD identification models. Feature selection was performed to find an optimal feature subset. Ten-fold cross-validation was used to find the best hyperparameters for each model. Model performance was evaluated using the area under the receiver operating characteristic curve, sensitivity, specificity, positive predictive value, and negative predictive value. RESULTS A total of 303 EMRs (AECOPD patients: 135; non-AECOPD patients: 168) were included in the study. The SVM model obtained the best performance (sensitivity: 0.80, specificity: 0.83, positive predictive value: 0.81, negative predictive value: 0.85 and area under the receiver operating characteristic curve: 0.90) after feature selection. CONCLUSIONS Our research confirms that the proposed model based on the support vector machine is a powerful tool for identifying patients with AECOPD, and it shows promise for providing decision support to clinicians who are struggling to reach a confirmed clinical diagnosis.
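The threshold-based metrics named in this abstract can be illustrated with a short Python sketch. It is not the authors' code: the function name `diagnostic_metrics`, the label coding (1 = AECOPD, 0 = non-AECOPD) and the toy labels are assumptions made here for illustration, and the numbers are not data from the study.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, PPV and NPV for binary labels.

    Assumed coding: 1 = AECOPD, 0 = non-AECOPD.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical toy labels, not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

In a cross-validated comparison such as the one described, these values would typically be computed per fold and then averaged, which is one reason reported figures need not correspond to a single confusion matrix.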
Affiliation(s)
- Chenshuo Wang
- Institute of Electronics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Xianxiang Chen
- Institute of Electronics, Chinese Academy of Sciences, Beijing, China; Personalized Management of Chronic Respiratory Disease, Chinese Academy of Medical Sciences, China
- Lidong Du
- Institute of Electronics, Chinese Academy of Sciences, Beijing, China; Personalized Management of Chronic Respiratory Disease, Chinese Academy of Medical Sciences, China
- Ting Yang
- China-Japan Friendship Hospital, Beijing, China.
- Zhen Fang
- Institute of Electronics, Chinese Academy of Sciences, Beijing, China; Personalized Management of Chronic Respiratory Disease, Chinese Academy of Medical Sciences, China; University of Chinese Academy of Sciences, Beijing, China.
14
Abstract
Medical care services can be organized into a network. Understanding the structure of this network can not only help analyze common clinical protocols but also reveal previously unknown patterns of care. The objective of this research is to introduce the concept and methods for constructing and analyzing the network of medical care services. We start by demonstrating how to build the network itself and then develop algorithms, based on principal component analysis and social network analysis, to detect communities of services. Finally, we propose novel graphical techniques for representing and assessing patterns of care. We demonstrate the application of our algorithms using data from an Emergency Department in New York State. One of the implications of our research is that clinical experts could use our algorithms to detect deviations from either existing protocols of care or administrative norms.
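One plausible way to build such a network can be sketched in Python. The abstract does not say how edges are defined, so this sketch assumes that two services are linked whenever they are delivered in the same visit, with the edge weight counting such visits. The function name `build_service_network` and the example visits are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_service_network(visits):
    """Count how often each pair of services occurs within the same visit.

    Assumption: co-occurrence in a visit defines an edge; the source
    abstract does not specify the edge definition.
    """
    edges = Counter()
    for services in visits:
        # Deduplicate and sort so each undirected pair has one canonical key.
        for a, b in combinations(sorted(set(services)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical emergency department visits, not data from the study.
visits = [
    ["triage", "ecg", "troponin"],
    ["triage", "ecg"],
    ["triage", "xray"],
]
network = build_service_network(visits)
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

The resulting weighted edge list is the kind of input a community detection step would take; the principal component analysis and social network analysis methods mentioned in the abstract are not reproduced here.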
15
Abstract
Electronic Health Records (EHRs) are a rich repository of valuable clinical information held in primary and secondary care databases. To make EHRs usable for medical observational research, a range of algorithms for automatically identifying individuals with a specific phenotype have been developed. This review summarizes and critically evaluates the literature on the development of EHR phenotyping systems. It describes phenotyping systems and techniques based on structured and unstructured EHR data. Articles published on PubMed and Google Scholar between 2013 and 2017 were reviewed, using search terms derived from Medical Subject Headings (MeSH). The popularity of Natural Language Processing (NLP) techniques for extracting features from narrative text has increased. This increased attention is due to the availability of open source NLP algorithms, combined with improvements in accuracy. In this review, concept extraction is the most popular NLP technique, having been used by more than 50% of the reviewed papers to extract features from EHRs. High-throughput phenotyping systems using unsupervised machine learning techniques have gained popularity due to their ability to efficiently and automatically extract a phenotype with minimal human effort.
16
Pikoula M, Quint JK, Nissen F, Hemingway H, Smeeth L, Denaxas S. Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records. BMC Med Inform Decis Mak 2019; 19:86. [PMID: 30999919 PMCID: PMC6472089 DOI: 10.1186/s12911-019-0805-0] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 11/30/2018] [Accepted: 03/27/2019] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND COPD is a highly heterogeneous disease composed of different phenotypes with different aetiological and prognostic profiles and current classification systems do not fully capture this heterogeneity. In this study we sought to discover, describe and validate COPD subtypes using cluster analysis on data derived from electronic health records. METHODS We applied two unsupervised learning algorithms (k-means and hierarchical clustering) in 30,961 current and former smokers diagnosed with COPD, using linked national structured electronic health records in England available through the CALIBER resource. We used 15 clinical features, including risk factors and comorbidities and performed dimensionality reduction using multiple correspondence analysis. We compared the association between cluster membership and COPD exacerbations and respiratory and cardiovascular death with 10,736 deaths recorded over 146,466 person-years of follow-up. We also implemented and tested a process to assign unseen patients into clusters using a decision tree classifier. RESULTS We identified and characterized five COPD patient clusters with distinct patient characteristics with respect to demographics, comorbidities, risk of death and exacerbations. The four subgroups were associated with 1) anxiety/depression; 2) severe airflow obstruction and frailty; 3) cardiovascular disease and diabetes and 4) obesity/atopy. A fifth cluster was associated with low prevalence of most comorbid conditions. CONCLUSIONS COPD patients can be sub-classified into groups with differing risk factors, comorbidities, and prognosis, based on data included in their primary care records. The identified clusters confirm findings of previous clustering studies and draw attention to anxiety and depression as important drivers of the disease in young, female patients.
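The k-means step mentioned in the methods can be illustrated with a minimal pure-Python sketch of Lloyd's algorithm. This is not the authors' pipeline: the two-dimensional points stand in for coordinates that a dimensionality reduction step might produce, the initialisation (first `k` points) is a deliberately naive assumption, and the names `k_means` and `nearest` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def k_means(points, k, iterations=100):
    """Lloyd's algorithm with a naive deterministic initialisation."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    labels = None
    for _ in range(iterations):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical two-dimensional patient coordinates, not data from the study.
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

Calling `nearest(new_point, centroids)` assigns an unseen patient to the closest cluster. That is a simpler stand-in for the decision tree classifier the study reports using for this purpose, not a reproduction of it.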
Affiliation(s)
- Maria Pikoula
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK.
- Health Data Research UK London, University College London, 222 Euston Road, London, NW1 2DA, UK.
- Jennifer Kathleen Quint
- Health Data Research UK London, University College London, 222 Euston Road, London, NW1 2DA, UK
- Respiratory Epidemiology, Occupational Medicine and Public Health, National Heart and Lung Institute, Imperial College London, London, UK
- EHR Research Group, School of Hygiene and Tropical Medicine, London, UK
- Francis Nissen
- Health Data Research UK London, University College London, 222 Euston Road, London, NW1 2DA, UK
- EHR Research Group, School of Hygiene and Tropical Medicine, London, UK
- Harry Hemingway
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
- Health Data Research UK London, University College London, 222 Euston Road, London, NW1 2DA, UK
- Liam Smeeth
- Health Data Research UK London, University College London, 222 Euston Road, London, NW1 2DA, UK
- EHR Research Group, School of Hygiene and Tropical Medicine, London, UK
- Spiros Denaxas
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
- Health Data Research UK London, University College London, 222 Euston Road, London, NW1 2DA, UK
17
Singh U, Wangia-Anderson V, Bernstein JA. Chronic Rhinitis Is a High-Risk Comorbidity for 30-Day Hospital Readmission of Patients with Asthma and Chronic Obstructive Pulmonary Disease. The Journal of Allergy and Clinical Immunology: In Practice 2018; 7:279-285.e6. [PMID: 30053594 DOI: 10.1016/j.jaip.2018.06.029] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 02/13/2018] [Revised: 06/21/2018] [Accepted: 06/28/2018] [Indexed: 12/20/2022]
Abstract
BACKGROUND Early hospital readmission for asthma and chronic obstructive pulmonary disease (COPD), measured as hospital readmission within 30 days of the last discharge, is a major economic burden on our health care system. The association of this measure with comorbid chronic rhinitis (CR) has not been investigated before, despite the significant clinical association between CR and asthma or COPD. OBJECTIVE To investigate the association of CR with the risk of asthma- or COPD-related early hospital readmission. METHODS This retrospective cohort study was performed using asthma- and COPD-related hospital encounter and patient comorbidity data between June 15, 2012, and July 19, 2017, from a large hospital care system in Cincinnati, Ohio. Patients (any sex, race or socioeconomic status, and of all ages) with a primary discharge diagnosis of asthma (n = 4754 patients, 10,111 encounters) and COPD (n = 2176 patients, 4748 encounters) based on International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) codes were included. Relevant comorbidities, including comorbid allergic rhinitis (AR) or nonallergic rhinitis (NAR), in such patients were identified using ICD-10-CM codes. The association between 30-day asthma- or COPD-related hospital readmission (1670 such encounters for asthma and 736 for COPD) and comorbid CR in the affected patients was determined using Cox proportional hazards models. Multivariate-adjusted hazard ratios (HRs), adjusted for relevant patient comorbidities, compared 30-day asthma- and COPD-related readmissions of patients with CR with those of patients without a CR diagnosis. RESULTS Analysis was performed on 4754 patients with asthma and 2176 patients with COPD. The median follow-up period (+interquartile range) for asthma was 980 (+760) days and for COPD was 553 (+827) days. The HRs for 30-day asthma- or COPD-related readmission were significantly higher in patients with AR (HR = 4.4 [3.9, 5.0] and 2.4 [1.7, 3.2], respectively) or NAR (HR = 3.7 [2.9, 4.9] and 2.6 [1.8, 3.7], respectively) compared with patients without rhinitis. For asthma, both AR and NAR had higher HRs than all other comorbidities analyzed. For COPD, both AR and NAR had HRs of similar magnitude to those of obesity and hypertension. CONCLUSIONS Comorbid CR is significantly associated with 30-day asthma- and COPD-related readmissions. These findings are useful for guiding health care professionals to focus on outpatient management of both the upper and lower respiratory tracts to reduce early readmission of patients with asthma and COPD.
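The outcome definition used here, readmission within 30 days of the last discharge, can be sketched in Python as follows. This is an illustration only, not the study's code: the tuple layout `(patient_id, admit_date, discharge_date)`, the function name `flag_30_day_readmissions` and the example encounters are assumptions, and the Cox proportional hazards modeling itself is not reproduced.

```python
from datetime import date


def flag_30_day_readmissions(encounters, window_days=30):
    """Return (patient_id, admit_date) for admissions that fall within
    `window_days` of the same patient's previous discharge.

    encounters: iterable of (patient_id, admit_date, discharge_date) tuples
    (an assumed layout for this sketch).
    """
    readmissions = []
    last_discharge = {}
    # Process each patient's encounters in chronological order.
    for patient_id, admitted, discharged in sorted(
            encounters, key=lambda e: (e[0], e[1])):
        previous = last_discharge.get(patient_id)
        if previous is not None and 0 <= (admitted - previous).days <= window_days:
            readmissions.append((patient_id, admitted))
        last_discharge[patient_id] = discharged
    return readmissions


# Hypothetical encounters, not data from the study.
encounters = [
    ("p1", date(2017, 1, 1), date(2017, 1, 5)),
    ("p1", date(2017, 1, 20), date(2017, 1, 22)),  # 15 days after discharge
    ("p1", date(2017, 4, 1), date(2017, 4, 3)),    # 69 days after discharge
    ("p2", date(2017, 2, 1), date(2017, 2, 2)),    # single encounter
]
readmissions = flag_30_day_readmissions(encounters)
print(readmissions)
```

Whether the window boundary is inclusive, and whether only disease-related encounters count, are choices the abstract does not spell out; the sketch treats day 30 as inclusive and counts every encounter it is given.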
Affiliation(s)
- Umesh Singh
- Department of Internal Medicine, University of Cincinnati College of Medicine, Cincinnati, Ohio
- Victoria Wangia-Anderson
- Clin & Health Info Sci, University of Cincinnati College of Allied Health Sciences, Cincinnati, Ohio
- Jonathan A Bernstein
- Department of Internal Medicine, University of Cincinnati College of Medicine, Cincinnati, Ohio.