1
Pikoula M, Kallis C, Madjiheurem S, Quint JK, Bafadhel M, Denaxas S. Evaluation of data processing pipelines on real-world electronic health records data for the purpose of measuring patient similarity. PLoS One 2023; 18:e0287264. [PMID: 37319288 PMCID: PMC10270623 DOI: 10.1371/journal.pone.0287264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 06/01/2023] [Indexed: 06/17/2023] Open
Abstract
BACKGROUND The ever-growing size, breadth, and availability of patient data allow for a wide variety of clinical features to serve as inputs for phenotype discovery using cluster analysis. Data of mixed types in particular are not straightforward to combine into a single feature vector, and techniques used to address this can be biased towards certain data types in ways that are not immediately obvious or intended. In this context, the process of constructing clinically meaningful patient representations from complex datasets has not been systematically evaluated. AIMS Our aim was to a) outline and b) implement an analytical framework to evaluate distinct methods of constructing patient representations from routine electronic health record data for the purpose of measuring patient similarity. We applied the analysis to a patient cohort diagnosed with chronic obstructive pulmonary disease. METHODS Using data from the CALIBER data resource, we extracted clinically relevant features for a cohort of patients diagnosed with chronic obstructive pulmonary disease. We used four different data processing pipelines to construct lower dimensional patient representations from which we calculated patient similarity scores. We described the resulting representations, ranked the influence of each individual feature on patient similarity and evaluated the effect of different pipelines on clustering outcomes. Experts evaluated the resulting representations by rating the clinical relevance of similar patient suggestions with regard to a reference patient. RESULTS Each of the four pipelines resulted in similarity scores primarily driven by a unique set of features. It was demonstrated that data transformations according to each pipeline prior to clustering can result in a variation of clustering results of over 40%. The most appropriate pipeline was selected on the basis of feature ranking and clinical expertise. There was moderate agreement between clinicians as measured by Cohen's kappa coefficient. CONCLUSIONS Data transformation has downstream and unforeseen consequences in cluster analysis. Rather than viewing this process as a black box, we have shown ways to quantitatively and qualitatively evaluate and select the appropriate preprocessing pipeline.
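The abstract above reports inter-rater agreement as Cohen's kappa. As a rough illustration of how that statistic is computed for two raters, here is a minimal Python sketch. The ratings, the label names and the function name `cohens_kappa` are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical data: two clinicians rate ten suggested "similar patients"
# as clinically relevant ("rel") or not ("not").
clinician_1 = ["rel", "rel", "rel", "not", "not", "rel", "not", "rel", "not", "not"]
clinician_2 = ["rel", "rel", "not", "not", "not", "rel", "rel", "rel", "not", "not"]
kappa = cohens_kappa(clinician_1, clinician_2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over simple percent agreement when label frequencies are unbalanced.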
Affiliation(s)
- Maria Pikoula
  - Institute of Health Informatics, University College London, London, United Kingdom
- Constantinos Kallis
  - National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Sephora Madjiheurem
  - Department of Electronic and Electrical Engineering, University College London, London, United Kingdom
- Jennifer K. Quint
  - National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Mona Bafadhel
  - School of Immunology and Microbial Sciences, King’s College London, London, United Kingdom
- Spiros Denaxas
  - Institute of Health Informatics, University College London, London, United Kingdom
2
Unsupervised Learning Identifies Computed Tomographic Measurements as Primary Drivers of Progression, Exacerbation, and Mortality in Chronic Obstructive Pulmonary Disease. Ann Am Thorac Soc 2022; 19:1993-2002. [PMID: 35830591 DOI: 10.1513/annalsats.202110-1127oc] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022] Open
Abstract
Rationale: Chronic obstructive pulmonary disease (COPD) is a heterogeneous syndrome with phenotypic manifestations that tend to be distributed along a continuum. Unsupervised machine learning based on a broad selection of imaging and clinical phenotypes may be used to identify primary variables that define disease axes and stratify patients with COPD. Objectives: To identify primary variables driving COPD heterogeneity using principal component analysis and to define disease axes and assess the prognostic value of these axes across three outcomes: progression, exacerbation, and mortality. Methods: We included 7,331 patients between 39 and 85 years old, of whom 40.3% were Black and 45.8% were female smokers with a mean of 44.6 pack-years, from the COPDGene (Genetic Epidemiology of COPD) phase I cohort (2008-2011) in our analysis. Out of a total of 916 phenotypes, 147 continuous clinical, spirometric, and computed tomography (CT) features were selected. For each principal component (PC), we computed a PC score based on feature weights. We used PC score distributions to define disease axes along which we divided the patients into quartiles. To assess the prognostic value of these axes, we applied logistic regression analyses to estimate 5-year (n = 4,159) and 10-year (n = 1,487) odds of progression. Cox regression and Kaplan-Meier analyses were performed to estimate 5-year and 10-year risk of exacerbation (n = 6,532) and all-cause mortality (n = 7,331). Results: The first PC, accounting for 43.7% of variance, was defined by CT measures of air trapping and emphysema. The second PC, accounting for 13.7% of variance, was defined by spirometric and CT measures of vital capacity and lung volume. The third PC, accounting for 7.9% of the variance, was defined by CT measures of lung mass, airway thickening, and body habitus. Stratification of patients across each disease axis revealed up to 3.2-fold (95% confidence interval [CI] 2.4, 4.3) greater odds of 5-year progression, 5.4-fold (95% CI 4.6, 6.3) greater risk of 5-year exacerbation, and 5.0-fold (95% CI 4.2, 6.0) greater risk of 10-year mortality between the highest and lowest quartiles. Conclusions: Unsupervised learning analysis of the COPDGene cohort reveals that CT measurements may bolster patient stratification along the continuum of COPD phenotypes. Each of the disease axes also individually demonstrates prognostic potential, predictive of future forced expiratory volume in 1 second decline, exacerbation, and mortality.
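The two steps described in the Methods above (a per-patient PC score built from feature weights, then a split into quartiles along that score) can be sketched as follows. This is only an illustration under stated assumptions: the two features, the eight patients and the weights are hypothetical placeholders, and the weights are not derived from an actual principal component analysis as they were in the study.

```python
from statistics import mean, pstdev, quantiles


def standardise(column):
    """Centre a feature on its mean and scale it to unit variance."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


def pc_scores(rows, weights):
    """Score per patient: weighted sum of that patient's standardised features."""
    columns = [standardise(col) for col in zip(*rows)]
    return [sum(w * z for w, z in zip(weights, patient)) for patient in zip(*columns)]


def quartile_of(score, cuts):
    """Return 1..4 given the three ascending quartile cut points."""
    return 1 + sum(score > c for c in cuts)


# Hypothetical data: 8 patients x 2 CT-like features.
rows = [(1.0, 10.0), (2.0, 12.0), (3.0, 15.0), (4.0, 20.0),
        (6.0, 22.0), (8.0, 30.0), (12.0, 35.0), (20.0, 50.0)]
weights = (0.7, 0.7)  # hypothetical loadings, NOT fitted by PCA

scores = pc_scores(rows, weights)
cuts = quantiles(scores, n=4)  # three cut points splitting the scores into quartiles
groups = [quartile_of(s, cuts) for s in scores]
print(groups)
```

Outcomes such as progression or mortality would then be compared between the highest and lowest quartile groups, which is the comparison the fold differences in the Results refer to.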
3
Usmani OS, Dhand R, Lavorini F, Price D. Why We Should Target Small Airways Disease in Our Management of Chronic Obstructive Pulmonary Disease. Mayo Clin Proc 2021; 96:2448-2463. [PMID: 34183115 DOI: 10.1016/j.mayocp.2021.03.016] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/20/2020] [Revised: 02/12/2021] [Accepted: 03/16/2021] [Indexed: 12/23/2022]
Abstract
For more than 50 years, small airways disease has been considered a key feature of chronic obstructive pulmonary disease (COPD) and a major cause of airway obstruction. Both preventable and treatable, small airways disease has important clinical consequences if left unchecked. Small airways disease is associated with poor spirometry results, increased lung hyperinflation, and poor health status, making the small airways an important treatment target in COPD. The early detection of small airways disease remains the key barrier; if detected early, treatments designed to target small airways may help reduce symptoms and allow patients to maintain their activities. Studies are needed to evaluate the possible role of new drugs and novel drug formulations, inhalers, and inhalation devices for treating small airways disease. These developments will help to improve our management of small airways disease in patients with COPD.
Affiliation(s)
- Omar S Usmani
  - National Heart and Lung Institute, Imperial College London, and Royal Brompton Hospital, Airways Disease Section, London, UK
- Rajiv Dhand
  - Department of Medicine, University of Tennessee Graduate School of Medicine, Knoxville
- Federico Lavorini
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- David Price
  - Observational and Pragmatic Research Institute, Singapore; Optimum Patient Care, Cambridge, UK; Centre of Academic Primary Care, Division of Applied Health Sciences, University of Aberdeen, Aberdeen, UK
4
Guan Z, Chen XG, Hay J, van Gerven J, Burggraaf J, de Kam M. Stability analysis of clustering of Norris' visual analogue scale: Applying the consensus clustering approach. Medicine (Baltimore) 2021; 100:e25363. [PMID: 33907093 PMCID: PMC8084085 DOI: 10.1097/md.0000000000025363] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/14/2020] [Revised: 01/25/2021] [Accepted: 03/11/2021] [Indexed: 11/19/2022] Open
Abstract
Visual analogue scales are widely used to measure subjective responses. Norris' 16 visual analogue scales (N_VAS) measure subjective feelings of alertness and mood. To date, researchers have clustered the N_VAS items in different ways, and Bond and Lader's clustering has been the most frequently used in clinical research. However, there are concerns about the stability of this clustering over different subject samples and different drug classes. The aim of this study was to test whether Bond and Lader's clustering was stable in terms of subject samples and drug effects. Alternative clustering of N_VAS was tested. Data from studies with 3 types of drugs: cannabinoid receptor agonist (delta-9-tetrahydrocannabinol [THC]), muscarinic antagonist (scopolamine), and benzodiazepines (midazolam and lorazepam), collected between 2005 and 2012, were used for this analysis. Exploratory factor analysis (EFA) was used to test the clustering algorithm of Bond and Lader. Consensus clustering was performed to test the stability of clustering results over samples and over different drug types. Stability analysis was performed first under a three-cluster assumption and then under alternative assumptions. Heat maps of the consensus matrix (CM) and density plots showed instability of the three-cluster hypothesis and suggested instability over the 3 drug classes. Two- and four-cluster hypotheses were also tested. Heat maps of the CM and density plots suggested that the two-cluster assumption was superior. In summary, the two-cluster assumption leads to a provably stable outcome over samples and the 3 drug types based on the data used.
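The consensus matrix (CM) mentioned above records, for each pair of items, how often the pair landed in the same cluster across repeated clustering runs on resampled data. The sketch below shows only that bookkeeping step, assuming the cluster labels from each run are already available. The runs, labels and function name are hypothetical, and the base clustering algorithm used in the study is not reproduced here.

```python
from itertools import combinations


def consensus_matrix(runs, n_items):
    """Build a consensus matrix from repeated clustering runs.

    runs: list of dicts mapping item index -> cluster label, one dict per run,
          containing only the items that were sampled in that run.
    Entry [i][j] is the fraction of runs containing both i and j in which
    the two items were assigned to the same cluster.
    """
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    matrix = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        matrix[i][i] = 1.0  # an item always co-clusters with itself
        for j in range(n_items):
            if i != j and sampled[i][j]:
                matrix[i][j] = together[i][j] / sampled[i][j]
    return matrix


# Hypothetical labels from four resampled runs over four items (0..3).
# Label names are arbitrary within each run; only co-membership matters.
runs = [
    {0: "a", 1: "a", 2: "b", 3: "b"},
    {0: "x", 1: "x", 2: "y"},
    {1: "p", 2: "q", 3: "q"},
    {0: "m", 2: "m", 3: "n"},
]
cm = consensus_matrix(runs, 4)
for row in cm:
    print([round(v, 2) for v in row])
```

A stable clustering produces entries close to 0 or 1, while many intermediate values indicate that item pairs are grouped inconsistently across resamples.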
Affiliation(s)
- Zheng Guan
  - Centre for Human Drug Research
  - Leiden University Medical Center, The Netherlands
- Joop van Gerven
  - Centre for Human Drug Research
  - Leiden University Medical Center, The Netherlands
- Jacobus Burggraaf
  - Centre for Human Drug Research
  - Leiden University Medical Center, The Netherlands
5
Zahraei HN, Guissard F, Paulus V, Henket M, Donneau AF, Louis R. Comprehensive Cluster Analysis for COPD Including Systemic and Airway Inflammatory Markers. COPD 2020; 17:672-683. [PMID: 33092418 DOI: 10.1080/15412555.2020.1833853] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/23/2023]
Abstract
Chronic obstructive pulmonary disease (COPD) is a complex, multidimensional and heterogeneous disease. The main purpose of the present study was to identify clinical phenotypes through cluster analysis in adults suffering from COPD. A retrospective study was conducted on 178 COPD patients in stable state recruited from ambulatory care at the University Hospital of Liège. All patients were above 40 years, had a smoking history of more than 20 pack years, post bronchodilator FEV1/FVC <70% and denied any history of asthma before 40 years. In this study, the patients were described by a total of 84 mixed sets of variables with some missing values. Hierarchical clustering on principal components (HCPC) was applied on multiple imputation. In the final step, patients were classified into homogeneous distinct groups by consensus clustering. Three different clusters, which shared similar smoking history, were found. Cluster 1 included men with moderate airway obstruction (n = 67) while cluster 2 comprised men who were exacerbation-prone, with severe airflow limitation and intense granulocytic airway and neutrophilic systemic inflammation (n = 56). Cluster 3 essentially included women with moderate airway obstruction (n = 55). All clusters had a low rate of bacterial colonization (5%), a low median FeNO value (<20 ppb) and a very low sensitization rate toward common aeroallergens (0-5%). CAT score did not differ between clusters. By including markers of systemic airway inflammation and atopy and applying a comprehensive cluster analysis, we provide here evidence for 3 clusters markedly shaped by sex, airway obstruction and neutrophilic inflammation but not by symptoms and T2 biomarkers.
Affiliation(s)
- Halehsadat Nekoee Zahraei
  - Biostatistics Unit, Department of Public Health, University of Liège, Liège, Belgium
  - Department of Pneumology, GIGA, University of Liège, Liège, Belgium
- Virginie Paulus
  - Department of Pneumology, GIGA, University of Liège, Liège, Belgium
- Monique Henket
  - Department of Pneumology, GIGA, University of Liège, Liège, Belgium
- Renaud Louis
  - Department of Pneumology, GIGA, University of Liège, Liège, Belgium
6
Nikolaou V, Massaro S, Fakhimi M, Stergioulas L, Price D. COPD phenotypes and machine learning cluster analysis: A systematic review and future research agenda. Respir Med 2020; 171:106093. [PMID: 32745966 DOI: 10.1016/j.rmed.2020.106093] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 04/25/2020] [Revised: 07/19/2020] [Accepted: 07/21/2020] [Indexed: 12/21/2022]
Abstract
Chronic Obstructive Pulmonary Disease (COPD) is a highly heterogeneous condition projected to become the third leading cause of death worldwide by 2030. To better characterize this condition, clinicians have classified patients sharing certain symptomatic characteristics, such as symptom intensity and history of exacerbations, into distinct phenotypes. In recent years, the growing use of machine learning algorithms, and cluster analysis in particular, has promised to advance this classification through the integration of additional patient characteristics, including comorbidities, biomarkers, and genomic information. This combination would allow researchers to more reliably identify new COPD phenotypes, as well as better characterize existing ones, with the aim of improving diagnosis and developing novel treatments. Here, we systematically review the last decade of research progress, which uses cluster analysis to identify COPD phenotypes. Collectively, we provide a systematized account of the extant evidence, describe the strengths and weaknesses of the main methods used, identify gaps in the literature, and suggest recommendations for future research.
Affiliation(s)
- Vasilis Nikolaou
  - Surrey Business School, University of Surrey, Guildford, GU2 7HX, UK
- Sebastiano Massaro
  - Surrey Business School, University of Surrey, Guildford, GU2 7HX, UK; The Organizational Neuroscience Laboratory, London, WC1N 3AX, UK
- Masoud Fakhimi
  - Surrey Business School, University of Surrey, Guildford, GU2 7HX, UK
- David Price
  - Observational and Pragmatic Research Institute, Singapore, Singapore; Centre of Academic Primary Care, Division of Applied Health Sciences, University of Aberdeen, Aberdeen, UK
7
Li L, Song Q, Yang X. Categorization of β-cell capacity in patients with obesity via OGTT using K-means clustering. Endocr Connect 2020; 9:135-143. [PMID: 31910150 PMCID: PMC6993255 DOI: 10.1530/ec-19-0476] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/23/2019] [Accepted: 01/07/2020] [Indexed: 01/21/2023]
Abstract
Insufficient insulin release plays a crucial role in the development of unhealthy status in patients with obesity; the present study aimed to classify these patients by indices of insulin resistance and insulin release. After the indices from OGTT were assessed to achieve high differentiability and low redundancy in classifying patients, HOMA-IR and IGI30min were chosen to classify the patients using the K-means clustering method. A total of 249 non-diabetic patients with obesity were classified into four groups. In Group 1, 19 patients were characterized by high insulin resistance and high insulin release, as well as well-controlled glucose levels, the highest BMI, the youngest age, and the highest early phase release of insulin. In Group 2, 38 patients were the unhealthiest in terms of high insulin resistance, reduced insulin release and IGT status. Group 3 consisted of 63 patients who were the healthiest, with low insulin resistance and high insulin release. In Group 4, 46 IGT patients and 14 IFG patients were identified among 129 patients who showed low insulin resistance, low insulin release, moderate obesity and older age. The concurrence of weak insulin release, older age, and moderate obesity indicated decreasing obesity with increasing age and reduced insulin release. The classification of patients with obesity by HOMA-IR and IGI30min using the K-means clustering method provides more information about the development of obesity and unhealthy status. Patients with distinct insulin resistance and insulin release should be followed up, especially those with a reduced or even absent insulin response to glucose stimulation.
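The two clustering inputs named above are simple ratios, and a short sketch may make them concrete. The formulas below are the commonly used definitions of HOMA-IR and the 30-minute insulinogenic index. The abstract does not state the units or exact formulas the authors used, so the units in the parameter names and the example values are assumptions.

```python
def homa_ir(fasting_glucose_mmol_l, fasting_insulin_uU_ml):
    """HOMA-IR: fasting glucose (mmol/L) x fasting insulin (uU/mL) / 22.5."""
    return fasting_glucose_mmol_l * fasting_insulin_uU_ml / 22.5


def igi_30(glucose_0, glucose_30, insulin_0, insulin_30):
    """Insulinogenic index at 30 min: rise in insulin per unit rise in glucose."""
    delta_glucose = glucose_30 - glucose_0
    if delta_glucose <= 0:
        # Design choice for this sketch: the ratio is not interpretable
        # when glucose does not rise during the first 30 minutes.
        raise ValueError("glucose did not rise at 30 min; index undefined")
    return (insulin_30 - insulin_0) / delta_glucose


# Hypothetical OGTT values for a single patient (units assumed as above).
resistance = homa_ir(fasting_glucose_mmol_l=5.0, fasting_insulin_uU_ml=18.0)
release = igi_30(glucose_0=5.0, glucose_30=9.0, insulin_0=18.0, insulin_30=98.0)
print(resistance, release)
```

Each patient then becomes a two-dimensional point (insulin resistance, insulin release). Features on such different scales are often standardised before K-means, although the abstract does not say whether that was done here.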
Affiliation(s)
- Li Li
  - Department of Endocrinology and Metabolism, Ningbo First Hospital, Ningbo, Zhejiang, China
- Qifa Song
  - Department of Microbiology, Ningbo Municipal Centre for Disease Control and Prevention, Ningbo, Zhejiang, China
  - Correspondence should be addressed to Q Song:
- Xi Yang
  - Department of Endocrinology and Metabolism, Ningbo First Hospital, Ningbo, Zhejiang, China
8
A probabilistic framework for predicting disease dynamics: A case study of psychotic depression. J Biomed Inform 2019; 95:103232. [PMID: 31201965 DOI: 10.1016/j.jbi.2019.103232] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/08/2019] [Revised: 04/30/2019] [Accepted: 06/11/2019] [Indexed: 11/23/2022]
Abstract
Unsupervised learning is often used to obtain insight into the underlying structure of medical data, but it is not always clear how to use such structure in an effective way. In this paper, we propose a probabilistic framework for predicting disease dynamics guided by latent states. The framework is based on hidden Markov models and aims to facilitate the selection of hypotheses that might yield insight into the dynamics. We demonstrate this by using clinical trial data for psychotic depression treatment as a case study. The discovered latent structure and proposed outcome are then validated using standard depression criteria, and are shown to provide new insight into the heterogeneity of psychotic depression in terms of predictive symptoms for different interventions.
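Since the framework above is built on hidden Markov models, a small example of the standard forward recursion may help readers unfamiliar with them. It computes the likelihood of an observed symptom sequence given latent states. This is a generic textbook sketch and not the authors' framework. The two latent states, the observation symbols and all probabilities are hypothetical.

```python
def forward_likelihood(observations, initial, transition, emission):
    """Probability of an observation sequence under a discrete HMM (forward algorithm)."""
    if not observations:
        raise ValueError("need at least one observation")
    states = list(initial)
    # alpha[s] = P(observations so far, current latent state = s)
    alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emission[s][obs] * sum(alpha[prev] * transition[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state model: latent disease state vs. observed symptom score.
initial = {"severe": 0.6, "mild": 0.4}
transition = {
    "severe": {"severe": 0.7, "mild": 0.3},
    "mild": {"severe": 0.2, "mild": 0.8},
}
emission = {
    "severe": {"high": 0.9, "low": 0.1},
    "mild": {"high": 0.2, "low": 0.8},
}
likelihood = forward_likelihood(["high", "high", "low"], initial, transition, emission)
print(likelihood)
```

Comparing such likelihoods across candidate models, or across treatment arms, is one way latent-state structure can be tied to observed trajectories.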
9
Bueno MLP, Hommersom A, Lucas PJF, Janzing J. A Data-Driven Exploration of Hypotheses on Disease Dynamics. Artif Intell Med 2019. [DOI: 10.1007/978-3-030-21642-9_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
10
Haghighi B, Choi S, Choi J, Hoffman EA, Comellas AP, Newell JD, Graham Barr R, Bleecker E, Cooper CB, Couper D, Han ML, Hansel NN, Kanner RE, Kazerooni EA, Kleerup EAC, Martinez FJ, O'Neal W, Rennard SI, Woodruff PG, Lin CL. Imaging-based clusters in current smokers of the COPD cohort associate with clinical characteristics: the SubPopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS). Respir Res 2018; 19:178. [PMID: 30227877 PMCID: PMC6145340 DOI: 10.1186/s12931-018-0888-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 06/05/2018] [Accepted: 09/10/2018] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Classification of COPD is usually based on the severity of airflow, which may not sensitively differentiate subpopulations. Using a multiscale imaging-based cluster analysis (MICA), we aim to identify subpopulations of current smokers with COPD. METHODS Among the SPIROMICS subjects, we analyzed computed tomography images at total lung capacity (TLC) and residual volume (RV) of 284 current smokers. Functional variables, e.g. functional small airways disease (fSAD%), were derived from registration of TLC and RV images. Structural variables, e.g. emphysema and airway wall thickness and diameter, were assessed on TLC images. We employed an unsupervised method for clustering. RESULTS Four clusters were identified. Cluster 1 had relatively normal airway structures; Cluster 2 had an increase of fSAD% and wall thickness; Cluster 3 exhibited a further increase of fSAD% but a decrease of wall thickness and airway diameter; Cluster 4 had a significant increase of fSAD% and emphysema. Clinically, Cluster 1 showed normal FEV1/FVC and low exacerbations. Cluster 4 showed relatively low FEV1/FVC and high exacerbations. While Cluster 2 and Cluster 3 showed similar exacerbations, Cluster 2 had the highest BMI among all clusters. CONCLUSIONS Association of imaging-based clusters with existing clinical metrics suggests the sensitivity of MICA in differentiating subpopulations.
Affiliation(s)
- Babak Haghighi
  - Department of Mechanical and Industrial Engineering, University of Iowa, 2406 Seamans Center for the Engineering Art and Science, Iowa City, Iowa, 52242, USA
  - IIHR-Hydroscience & Engineering, University of Iowa, 2406 Seamans Center for the Engineering Art and Science, Iowa City, Iowa, 52242, USA
- Sanghun Choi
  - Department of Mechanical Engineering, Kyungpook National University, Daegu, Republic of Korea
- Jiwoong Choi
  - Department of Mechanical and Industrial Engineering, University of Iowa, 2406 Seamans Center for the Engineering Art and Science, Iowa City, Iowa, 52242, USA
  - IIHR-Hydroscience & Engineering, University of Iowa, 2406 Seamans Center for the Engineering Art and Science, Iowa City, Iowa, 52242, USA
- Eric A Hoffman
  - Department of Radiology, University of Iowa, Iowa City, Iowa, USA
- John D Newell
  - Department of Radiology, University of Iowa, Iowa City, Iowa, USA
- R Graham Barr
  - Department of Epidemiology, Mailman School of Public Health, Columbia University Medical School, New York, NY, USA
- Eugene Bleecker
  - Division of Genetics, Genomics and Precision Medicine, Department of Medicine, University of Arizona, Tucson, AZ, USA
- David Couper
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Mei Lan Han
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Ella A Kazerooni
  - Department of Radiology, University of Michigan, Ann Arbor, MI, USA
- Wanda O'Neal
  - School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Stephen I Rennard
  - Department of Internal Medicine, University of Nebraska College of Medicine, NE, USA and Clinical Discovery Unit, AstraZeneca, Cambridge, UK
- Prescott G Woodruff
  - Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Ching-Long Lin
  - Department of Mechanical and Industrial Engineering, University of Iowa, 2406 Seamans Center for the Engineering Art and Science, Iowa City, Iowa, 52242, USA
  - IIHR-Hydroscience & Engineering, University of Iowa, 2406 Seamans Center for the Engineering Art and Science, Iowa City, Iowa, 52242, USA
11
Basile AO, Ritchie MD. Informatics and machine learning to define the phenotype. Expert Rev Mol Diagn 2018; 18:219-226. [PMID: 29431517 DOI: 10.1080/14737159.2018.1439380] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 12/19/2022]
Abstract
INTRODUCTION For the past decade, the focus of complex disease research has been the genotype. From technological advancements to the development of analysis methods, great progress has been made. However, advances in our definition of the phenotype have remained stagnant. Phenotype characterization has recently emerged as an exciting area of informatics and machine learning. The copious amounts of diverse biomedical data that have been collected may be leveraged with data-driven approaches to elucidate trait-related features and patterns. Areas covered: In this review, the authors discuss the phenotype in traditional genetic associations and the challenges this has imposed. Approaches for phenotype refinement that can aid in more accurate characterization of traits are also discussed. Further, the authors highlight promising machine learning approaches for establishing a phenotype and the challenges of electronic health record (EHR)-derived data. Expert commentary: The authors hypothesize that through unsupervised machine learning, data-driven approaches can be used to define phenotypes rather than relying on expert clinician knowledge. Through the use of machine learning and an unbiased set of features extracted from clinical repositories, researchers will have the potential to further understand complex traits and identify patient subgroups. This knowledge may lead to more preventative and precise clinical care.
Affiliation(s)
- Anna Okula Basile
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, State College, PA, USA
- Marylyn DeRiggi Ritchie
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, State College, PA, USA
  - Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
12
Castaldi PJ, Benet M, Petersen H, Rafaels N, Finigan J, Paoletti M, Marike Boezen H, Vonk JM, Bowler R, Pistolesi M, Puhan MA, Anto J, Wauters E, Lambrechts D, Janssens W, Bigazzi F, Camiciottoli G, Cho MH, Hersh CP, Barnes K, Rennard S, Boorgula MP, Dy J, Hansel NN, Crapo JD, Tesfaigzi Y, Agusti A, Silverman EK, Garcia-Aymerich J. Do COPD subtypes really exist? COPD heterogeneity and clustering in 10 independent cohorts. Thorax 2017. [PMID: 28637835 DOI: 10.1136/thoraxjnl-2016-209846] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Indexed: 11/04/2022]
Abstract
BACKGROUND COPD is a heterogeneous disease, but there is little consensus on specific definitions for COPD subtypes. Unsupervised clustering offers the promise of 'unbiased' data-driven assessment of COPD heterogeneity. Multiple groups have identified COPD subtypes using cluster analysis, but there has been no systematic assessment of the reproducibility of these subtypes. OBJECTIVE We performed clustering analyses across 10 cohorts in North America and Europe in order to assess the reproducibility of (1) correlation patterns of key COPD-related clinical characteristics and (2) clustering results. METHODS We studied 17 146 individuals with COPD using identical methods and common COPD-related characteristics across cohorts (FEV1, FEV1/FVC, FVC, body mass index, Modified Medical Research Council score, asthma and cardiovascular comorbid disease). Correlation patterns between these clinical characteristics were assessed by principal components analysis (PCA). Cluster analysis was performed using k-medoids and hierarchical clustering, and concordance of clustering solutions was quantified with normalised mutual information (NMI), a metric that ranges from 0 to 1 with higher values indicating greater concordance. RESULTS The reproducibility of COPD clustering subtypes across studies was modest (median NMI range 0.17-0.43). For methods that excluded individuals that did not clearly belong to any cluster, agreement was better but still suboptimal (median NMI range 0.32-0.60). Continuous representations of COPD clinical characteristics derived from PCA were much more consistent across studies. CONCLUSIONS Identical clustering analyses across multiple COPD cohorts showed modest reproducibility. COPD heterogeneity is better characterised by continuous disease traits coexisting in varying degrees within the same individual, rather than by mutually exclusive COPD subtypes.
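The concordance metric used above, normalised mutual information (NMI), can be sketched in a few lines. Several normalisations of mutual information are in use and the abstract does not say which one the authors applied, so the arithmetic-mean normalisation below is an assumption. The example partitions are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a cluster labelling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalised_mutual_information(labels_a, labels_b):
    """NMI between two clusterings of the same items, in the range 0 to 1.

    Normalised by the arithmetic mean of the two entropies (an assumption;
    other normalisations exist).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("labelings must be non-empty and of equal length")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    mutual_info = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a, h_b = entropy(labels_a), entropy(labels_b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both clusterings put every item in a single cluster
    return 2.0 * mutual_info / (h_a + h_b)


# Hypothetical partitions of six patients produced by two clustering solutions.
solution_1 = [0, 0, 0, 1, 1, 1]
solution_2 = ["x", "x", "y", "y", "y", "y"]
nmi = normalised_mutual_information(solution_1, solution_2)
print(round(nmi, 3))
```

NMI depends only on how items are grouped and not on the label names, which is what makes it suitable for comparing clustering solutions whose cluster numbering is arbitrary.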
Affiliation(s)
- Peter J Castaldi
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
  - Division of General Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- Marta Benet
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
- Hans Petersen
  - COPD Program, Lovelace Respiratory Research Institute, Albuquerque, New Mexico, USA
- Nicholas Rafaels
  - Center for Biomedical Informatics and Personalized Medicine, University of Colorado Anschutz Medical Center, Aurora, Colorado, USA
- James Finigan
  - Department of Medicine, National Jewish Health, Denver, Colorado, USA
- Matteo Paoletti
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- H Marike Boezen
  - Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Judith M Vonk
  - Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Russell Bowler
  - Department of Medicine, National Jewish Health, Denver, Colorado, USA
- Massimo Pistolesi
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Milo A Puhan
  - Epidemiology, Biostatistics & Prevention Institute, University of Zurich, Zurich, Switzerland
- Josep Anto
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
- Els Wauters
  - Vesalius Research Center (VRC), VIB, Leuven, Belgium
  - Laboratory for Translational Genetics, Department of Oncology, KU Leuven, Leuven, Belgium
  - Respiratory Division, University Hospital Gasthuisberg, KU Leuven, Leuven, Belgium
- Diether Lambrechts
  - Vesalius Research Center (VRC), VIB, Leuven, Belgium
  - Laboratory for Translational Genetics, Department of Oncology, KU Leuven, Leuven, Belgium
- Wim Janssens
  - Respiratory Division, University Hospital Gasthuisberg, KU Leuven, Leuven, Belgium
- Francesca Bigazzi
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Gianna Camiciottoli
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Michael H Cho
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
  - Pulmonary and Critical Care Division, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Craig P Hersh
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
  - Pulmonary and Critical Care Division, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Kathleen Barnes
  - Center for Biomedical Informatics and Personalized Medicine, University of Colorado Anschutz Medical Center, Aurora, Colorado, USA
- Stephen Rennard
  - Division of Pulmonary and Critical Care Medicine, University of Nebraska Medical Center, Omaha, Nebraska, USA
  - Clinical Discovery Unit, AstraZeneca, Cambridge, UK
- Meher Preethi Boorgula
  - Center for Biomedical Informatics and Personalized Medicine, University of Colorado Anschutz Medical Center, Aurora, Colorado, USA
- Jennifer Dy
  - Department of Computer Science, Northeastern University, Boston, Massachusetts, USA
- Nadia N Hansel
  - Department of Medicine, School of Medicine, Johns Hopkins University, Baltimore, Maryland, USA
  - Department of Environmental Health Sciences, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA
- James D Crapo
  - Department of Medicine, National Jewish Health, Denver, Colorado, USA
- Yohannes Tesfaigzi
  - COPD Program, Lovelace Respiratory Research Institute, Albuquerque, New Mexico, USA
- Alvar Agusti
  - Respiratory Institute, Hospital Clinic, University of Barcelona, IDIBAPS and CIBERES, Barcelona, Spain
- Edwin K Silverman
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
  - Division of Pulmonary and Critical Care Medicine, University of Nebraska Medical Center, Omaha, Nebraska, USA
| | - Judith Garcia-Aymerich
- ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain.,CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain.,Universitat Pompeu Fabra (UPF), Barcelona, Spain
13
Lim S, Tucker CS, Kumara S. An unsupervised machine learning model for discovering latent infectious diseases using social media data. J Biomed Inform 2016; 66:82-94. [PMID: 28034788 DOI: 10.1016/j.jbi.2016.12.007] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 08/04/2016] [Revised: 12/03/2016] [Accepted: 12/14/2016] [Indexed: 10/20/2022]
Abstract
INTRODUCTION The authors of this work propose an unsupervised machine learning model that has the ability to identify real-world latent infectious diseases by mining social media data. In this study, a latent infectious disease is defined as a communicable disease that has not yet been formalized by national public health institutes and explicitly communicated to the general public. Most existing approaches to modeling infectious-disease-related knowledge discovery through social media networks are top-down approaches that are based on already known information, such as the names of diseases and their symptoms. In existing top-down approaches, necessary but unknown information, such as disease names and symptoms, is mostly unidentified in social media data until national public health institutes have formalized that disease. Most of the formalizing processes for latent infectious diseases are time consuming. Therefore, this study presents a bottom-up approach for latent infectious disease discovery in a given location without prior information, such as disease names and related symptoms. METHODS Social media messages with user and temporal information are extracted during the data preprocessing stage. An unsupervised sentiment analysis model is then presented. Users' expressions about symptoms, body parts, and pain locations are also identified from social media data. Then, symptom weighting vectors for each individual and time period are created, based on their sentiment and social media expressions. Finally, latent-infectious-disease-related information is retrieved from individuals' symptom weighting vectors. DATASETS AND RESULTS Twitter data from August 2012 to May 2013 are used to validate this study. Real electronic medical records for 104 individuals, who were diagnosed with influenza in the same period, are used to serve as ground truth validation. 
The results are promising, with the highest precision, recall, and F1 score values of 0.773, 0.680, and 0.724, respectively. CONCLUSION This work uses individuals' social media messages to identify latent infectious diseases, without prior information, quicker than when the disease(s) is formalized by national public health institutes. In particular, the unsupervised machine learning model using user, textual, and temporal information in social media data, along with sentiment analysis, identifies latent infectious diseases in a given location.
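As a quick consistency check on the figures reported above, the F1 score is the harmonic mean of precision and recall. The minimal sketch below recomputes it from the reported precision (0.773) and recall (0.680); the function name `f1_score` is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported_f1 = f1_score(0.773, 0.680)
print(round(reported_f1, 3))  # 0.724
```

The recomputed value agrees with the reported F1 of 0.724 to three decimal places.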
Affiliation(s)
- Sunghoon Lim
- Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
- Conrad S Tucker
- School of Engineering Design, Technology, and Professional Programs, The Pennsylvania State University, University Park, PA 16802, USA; Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
- Soundar Kumara
- Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
14
Camiciottoli G, Bigazzi F, Magni C, Bonti V, Diciotti S, Bartolucci M, Mascalchi M, Pistolesi M. Prevalence of comorbidities according to predominant phenotype and severity of chronic obstructive pulmonary disease. Int J Chron Obstruct Pulmon Dis 2016; 11:2229-2236. [PMID: 27695310 PMCID: PMC5028079 DOI: 10.2147/copd.s111724] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 12/19/2022] Open
Abstract
Background In addition to lung involvement, several other diseases and syndromes coexist in patients with chronic obstructive pulmonary disease (COPD). Our purpose was to investigate the prevalence of idiopathic arterial hypertension (IAH), ischemic heart disease, heart failure, peripheral vascular disease (PVD), diabetes, osteoporosis, and anxious depressive syndrome in a clinical setting of COPD outpatients whose phenotypes (predominant airway disease and predominant emphysema) and severity (mild and severe diseases) were determined by clinical and functional parameters. Methods A total of 412 outpatients with COPD were assigned either a predominant airway disease or a predominant emphysema phenotype of mild or severe degree according to predictive models based on pulmonary functions (forced expiratory volume in 1 second/vital capacity; total lung capacity %; functional residual capacity %; and diffusing capacity of lung for carbon monoxide %) and sputum characteristics. Comorbidities were assessed by objective medical records. Results Eighty-four percent of patients suffered from at least one comorbidity and 75% from at least one cardiovascular comorbidity, with IAH and PVD being the most prevalent ones (62% and 28%, respectively). IAH prevailed significantly in predominant airway disease, osteoporosis prevailed significantly in predominant emphysema, and ischemic heart disease and PVD prevailed in mild COPD. All cardiovascular comorbidities prevailed significantly in predominant airway phenotype of COPD and mild COPD severity. Conclusion Specific comorbidities prevail in different phenotypes of COPD; this fact may be relevant to identify patients at risk for specific, phenotype-related comorbidities. The highest prevalence of comorbidities in patients with mild disease indicates that these patients should be investigated for coexisting diseases or syndromes even in the less severe, pauci-symptomatic stages of COPD. 
The simple method employed to phenotype and score COPD allows these results to be translated easily into daily clinical practice.
Affiliation(s)
- Gianna Camiciottoli
- Section of Respiratory Medicine, Department of Clinical and Experimental Medicine; Department of Clinical and Experimental Biomedical Sciences, University of Florence, Florence
- Francesca Bigazzi
- Section of Respiratory Medicine, Department of Clinical and Experimental Medicine
- Chiara Magni
- Section of Respiratory Medicine, Department of Clinical and Experimental Medicine
- Viola Bonti
- Section of Respiratory Medicine, Department of Clinical and Experimental Medicine
- Stefano Diciotti
- Department of Electrical, Electronic, and Information Engineering "Guglielmo Marconi," University of Bologna, Cesena
- Mario Mascalchi
- Radiodiagnostic Section, Department of Clinical and Experimental Biomedical Sciences, University of Florence, Florence, Italy
- Massimo Pistolesi
- Section of Respiratory Medicine, Department of Clinical and Experimental Medicine
15
Liang Z, Liu L, Zhao H, Xia Y, Zhang W, Ye Y, Jiang M, Cai S. A Systemic Inflammatory Endotype of Asthma With More Severe Disease Identified by Unbiased Clustering of the Serum Cytokine Profile. Medicine (Baltimore) 2016; 95:e3774. [PMID: 27336865 PMCID: PMC4998303 DOI: 10.1097/md.0000000000003774] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 11/26/2022] Open
Abstract
Asthma is considered a clinically and molecularly heterogeneous disorder. Systemic inflammation is suggested to play an important role in a group of asthma patients. We hypothesized that there is a subgroup of patients with asthma characterized by systemic inflammation. In this study, we aimed to discriminate asthma subtypes based on circulating biomarkers and to determine whether a systemic inflammatory endotype of asthma could be identified. In the present cross-sectional study, 50 patients with untreated asthma were prospectively recruited from a single academic outpatient clinic and characterized with respect to clinical, functional, and inflammatory parameters. The expression profiles of 20 serum cytokines were assessed by anti-human cytokine antibody array. Hierarchical clustering analysis was then performed on principal component analysis (PCA)-transformed data to classify the clinical groups. PCA showed that 6 independent components accounted for 80.113% of the variance, and PCA-based hierarchical clustering identified 3 endotypes. One of the endotypes was characterized by elevated systemic inflammation markers such as leptin and vascular endothelial growth factor (VEGF), and by reduced levels of soluble receptor for advanced glycation end products (sRAGE), an anti-inflammatory molecule. This endotype included more female patients, with higher circulating neutrophil counts and more severe symptoms. In conclusion, we identified an endotype of asthma characterized by systemic inflammation and severe symptoms. Increased levels of VEGF and leptin and a decreased level of sRAGE may contribute to the systemic inflammation of this asthma endotype.
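The statement that a small number of components accounts for roughly 80% of the variance can be illustrated with a short sketch. It assumes the number of components is chosen as the smallest count whose cumulative explained variance reaches a threshold. The abstract does not state the selection rule, and the eigenvalues below are hypothetical, not the study's data.

```python
def components_for_variance(eigenvalues, threshold):
    """Return (k, fraction): the smallest number of leading components whose
    cumulative share of total variance is at least `threshold`."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, eigenvalue in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += eigenvalue
        if cumulative / total >= threshold:
            return k, cumulative / total
    return len(eigenvalues), 1.0


# Hypothetical eigenvalues of a covariance matrix (illustration only).
eigenvalues = [5.0, 4.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
k, fraction = components_for_variance(eigenvalues, threshold=0.8)
print(k, fraction)
```

With these made-up eigenvalues, six components are retained; the retained component scores would then be passed to the hierarchical clustering step.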
Affiliation(s)
- Zhenyu Liang
- From the Department of Respiratory and Critical Care Medicine, Chronic Airways Diseases Laboratory, Nanfang Hospital, Southern Medical University (ZL, LL, HZ, YX, WZ, YY, SC); and State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Disease, First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China (ZL, MJ)
16
Fragoso E, André S, Boleo-Tomé JP, Areias V, Munhá J, Cardoso J. Understanding COPD: A vision on phenotypes, comorbidities and treatment approach. Revista Portuguesa de Pneumologia 2016; 22:101-11. [PMID: 26827246 DOI: 10.1016/j.rppnen.2015.12.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/17/2015] [Revised: 11/27/2015] [Accepted: 12/02/2015] [Indexed: 01/31/2023] Open
Abstract
Chronic Obstructive Pulmonary Disease (COPD) phenotypes have become increasingly recognized as important for grouping patients with similar presentation and/or behavior, within the heterogeneity of the disease. The primary aim of identifying phenotypes is to provide patients with the best health care possible, tailoring the therapeutic approach to each patient. However, the identification of specific phenotypes has been hindered by several factors such as which specific attributes are relevant, which discriminant features should be used for assigning patients to specific phenotypes, and how relevant are they to the therapeutic approach, prognostic and clinical outcome. Moreover, the definition of phenotype is still not consensual. Comorbidities, risk factors, modifiable risk factors and disease severity, although not phenotypes, have impact across all COPD phenotypes. Although there are some identified phenotypes that are fairly consensual, many others have been proposed, but currently lack validation. The on-going debate about which instruments and tests should be used in the identification and definition of phenotypes has contributed to this uncertainty. In this paper, the authors review present knowledge regarding COPD phenotyping, discuss the role of phenotypes and comorbidities on the severity of COPD, propose new phenotypes and suggest a phenotype-based pharmacological therapeutic approach. The authors conclude that a patient-tailored treatment approach, which takes into account each patient's specific attributes and specificities, should be pursued.
Affiliation(s)
- E Fragoso
- Pulmonology Department, Hospital de Santa Maria, Centro Hospitalar Lisboa Norte, EPE (CHLN), Lisbon, Portugal.
- S André
- Pulmonology Department, Hospital Egas Moniz, Centro Hospitalar de Lisboa Ocidental, EPE (CHLO), Lisbon, Portugal.
- J P Boleo-Tomé
- Pulmonology Department, Hospital Prof. Doutor Fernando da Fonseca, EPE, Amadora, Portugal.
- V Areias
- Pulmonology Department, Hospital de Faro, Centro Hospitalar do Algarve, EPE, Faro, Portugal; Department of Biomedical Sciences and Medicine, Algarve University, Portugal.
- J Munhá
- Pulmonology Department, Centro Hospitalar do Barlavento Algarvio, EPE, Portimão, Portugal.
- J Cardoso
- Pulmonology Department, Hospital de Santa Marta, Centro Hospitalar de Lisboa Central, EPE (CHLC), Lisbon, Portugal; Nova Medical School, Nova University, Lisbon, Portugal.
17
Liu J, Brodley CE, Healy BC, Chitnis T. Removing confounding factors via constraint-based clustering: An application to finding homogeneous groups of multiple sclerosis patients. Artif Intell Med 2015; 65:79-88. [PMID: 26253753 DOI: 10.1016/j.artmed.2015.06.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/05/2015] [Revised: 06/06/2015] [Accepted: 06/26/2015] [Indexed: 11/24/2022]
Abstract
OBJECTIVES Confounding factors in unsupervised data can lead to undesirable clustering results. For example, in medical datasets, age is often a confounding factor in tests designed to judge the severity of a patient's disease through measures of mobility, eyesight and hearing. In such cases, removing age from each instance will not remove its effect from the data, as other features will be correlated with age. Motivated by the need to find homogeneous groups of multiple sclerosis (MS) patients, we apply our approach to remove physician subjectivity from patient data. METHODS We present a method based on constraint-based clustering to remove the impact of such confounding factors. Given knowledge of which feature (or set of features) is a confounding factor, call it F, our method first partitions the data into b bins: if F is categorical, instances from the same category construct one bin; if F is numeric, then we split bins such that each bin contains instances of similar F value. Thus each instance is assigned to a single bin for factor F. We then remove feature F from each instance for the remaining steps. Next, we cluster the data separately in each bin. Using these clustering results, we generate pair-wise constraints and then run a constraint-based clustering algorithm to produce a final grouping. RESULTS In a series of experiments with synthetic datasets, we compare our proposed methods to detrending when one has numeric confounding factors. We apply our method to the Comprehensive Longitudinal Investigation of Multiple Sclerosis at Brigham and Women's Hospital dataset, and find a novel grouping of patients that can help uncover the factors that impact disease progression in MS. CONCLUSIONS Our method groups data while removing the effect of confounding factors without making any assumptions about the form of the influence of these factors on the other features. We identified clusters of MS patients that have clinically recognizable differences. 
Because patients more likely to progress are found using this approach, our results have the potential to aid physicians in tailoring treatment decisions for MS patients.
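The binning and constraint-generation steps described in the METHODS can be sketched as follows. This is a minimal illustration under several assumptions that the abstract does not confirm: the numeric confounder is split into equal-frequency bins, the within-bin clusterer is a stand-in (a split at the bin mean of a single feature), and the pairwise constraints take the must-link / cannot-link form. All names and the toy data are hypothetical, and the final constraint-based clustering run is not shown.

```python
from collections import defaultdict
from itertools import combinations


def bin_by_confounder(instances, confounder, n_bins):
    """Assign each instance index to one bin of similar confounder value
    (assumption: equal-frequency bins for a numeric confounder F)."""
    order = sorted(range(len(instances)), key=lambda i: instances[i][confounder])
    bins = defaultdict(list)
    for rank, i in enumerate(order):
        bins[rank * n_bins // len(instances)].append(i)
    return dict(bins)


def cluster_within_bin(indices, instances, feature):
    """Stand-in clusterer: split a bin at the mean of one remaining feature.
    The confounder itself is not used here, mirroring its removal."""
    mean = sum(instances[i][feature] for i in indices) / len(indices)
    return {i: int(instances[i][feature] > mean) for i in indices}


def pairwise_constraints(bins, instances, feature):
    """Turn per-bin cluster labels into must-link / cannot-link index pairs."""
    must_link, cannot_link = [], []
    for indices in bins.values():
        labels = cluster_within_bin(indices, instances, feature)
        for a, b in combinations(sorted(indices), 2):
            (must_link if labels[a] == labels[b] else cannot_link).append((a, b))
    return must_link, cannot_link


# Hypothetical toy data: "age" plays the role of F, "score" is a severity measure.
patients = [
    {"age": 30, "score": 1.0}, {"age": 35, "score": 1.2}, {"age": 33, "score": 5.0},
    {"age": 70, "score": 4.0}, {"age": 75, "score": 4.2}, {"age": 72, "score": 9.0},
]
bins = bin_by_confounder(patients, "age", n_bins=2)
must_link, cannot_link = pairwise_constraints(bins, patients, "score")
print(must_link)
print(cannot_link)
```

In this toy example, patients 2 and 3 have similar raw scores but sit at opposite ends of their own age bins, which is the kind of age-relative structure the constraints are meant to capture before the final constrained clustering.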
Affiliation(s)
- Jingjing Liu
- Department of Computer Science, Tufts University, 161 College Avenue, Medford, MA 02155, USA.
- Carla E Brodley
- College of Computer and Information Science, Northeastern University, 440 Huntington Avenue, 202 West Village H, Boston, MA 02115, USA.
- Brian C Healy
- Biostatistics Center, Massachusetts General Hospital, Boston, MA 02114, USA.
- Tanuja Chitnis
- Partners Multiple Sclerosis Center, Brigham and Women's Hospital, Brookline, MA 02115, USA.
18
Pinto LM, Alghamdi M, Benedetti A, Zaihra T, Landry T, Bourbeau J. Derivation and validation of clinical phenotypes for COPD: a systematic review. Respir Res 2015; 16:50. [PMID: 25928208 PMCID: PMC4460884 DOI: 10.1186/s12931-015-0208-4] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 09/15/2014] [Accepted: 03/19/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The traditional classification of COPD, which relies solely on spirometry, fails to account for the complexity and heterogeneity of the disease. Phenotyping is a method that attempts to derive a single or combination of disease attributes that are associated with clinically meaningful outcomes. Deriving phenotypes entails the use of cluster analyses, and helps individualize patient management by identifying groups of individuals with similar characteristics. We aimed to systematically review the literature for studies that had derived such phenotypes using unsupervised methods. METHODS Two independent reviewers systematically searched multiple databases for studies that performed validated statistical analyses, free of definitive pre-determined hypotheses, to derive phenotypes among patients with COPD. Data were extracted independently. RESULTS 9156 citations were retrieved, of which, 8 studies were included. The number of subjects ranged from 213 to 1543. Most studies appeared to be biased: patients were more likely males, with severe disease, and recruited in tertiary care settings. Statistical methods used to derive phenotypes varied by study. The number of phenotypes identified ranged from 2 to 5. Two phenotypes, with poor longitudinal health outcomes, were common across multiple studies: young patients with severe respiratory disease, few cardiovascular co-morbidities, poor nutritional status and poor health status, and a phenotype of older patients with moderate respiratory disease, obesity, cardiovascular and metabolic co-morbidities. CONCLUSIONS The recognition that two phenotypes of COPD were often reported may have clinical implications for altering the course of the disease. This review also provided important information on limitations of phenotype studies in COPD and the need for improvement in future studies.
Affiliation(s)
- Lancelot M Pinto
- Respiratory Division, McGill University Health Centre, Montreal, Quebec, Canada; Respiratory Epidemiology and Clinical Research Unit, Montreal Chest Institute, McGill University Health Centre, Montreal, Quebec, Canada.
- Majed Alghamdi
- Respiratory Division, McGill University Health Centre, Montreal, Quebec, Canada; Respiratory Epidemiology and Clinical Research Unit, Montreal Chest Institute, McGill University Health Centre, Montreal, Quebec, Canada.
- Andrea Benedetti
- Respiratory Epidemiology and Clinical Research Unit, Montreal Chest Institute, McGill University Health Centre, Montreal, Quebec, Canada; Department of Epidemiology, Biostatistics & Occupational Health, Montreal, Quebec, Canada.
- Tasneem Zaihra
- School of PH & OT, Faculty of Medicine, McGill University, Quebec, Canada; Division of Clinical Epidemiology, McGill University Health Centre, Montreal, Quebec, Canada.
- Tara Landry
- Medical Library, Montreal General Hospital, McGill University Health Centre, Montreal, Quebec, Canada.
- Jean Bourbeau
- Respiratory Division, McGill University Health Centre, Montreal, Quebec, Canada; Respiratory Epidemiology and Clinical Research Unit, Montreal Chest Institute, McGill University Health Centre, Montreal, Quebec, Canada; Montreal Chest Institute, McGill University Health Centre, 3650 St.Urbain, Room K1.32, H2X 2P4, Montréal (Québec), Canada.
19
Castaldi PJ, Dy J, Ross J, Chang Y, Washko GR, Curran-Everett D, Williams A, Lynch DA, Make BJ, Crapo JD, Bowler RP, Regan EA, Hokanson JE, Kinney GL, Han MK, Soler X, Ramsdell JW, Barr RG, Foreman M, van Beek E, Casaburi R, Criner GJ, Lutz SM, Rennard SI, Santorico S, Sciurba FC, DeMeo DL, Hersh CP, Silverman EK, Cho MH. Cluster analysis in the COPDGene study identifies subtypes of smokers with distinct patterns of airway disease and emphysema. Thorax 2014; 69:415-22. [PMID: 24563194 DOI: 10.1136/thoraxjnl-2013-203601] [Citation(s) in RCA: 112] [Impact Index Per Article: 11.2] [Indexed: 11/03/2022]
Abstract
BACKGROUND There is notable heterogeneity in the clinical presentation of patients with COPD. To characterise this heterogeneity, we sought to identify subgroups of smokers by applying cluster analysis to data from the COPDGene study. METHODS We applied a clustering method, k-means, to data from 10 192 smokers in the COPDGene study. After splitting the sample into a training and validation set, we evaluated three sets of input features across a range of k (user-specified number of clusters). Stable solutions were tested for association with four COPD-related measures and five genetic variants previously associated with COPD at genome-wide significance. The results were confirmed in the validation set. FINDINGS We identified four clusters that can be characterised as (1) relatively resistant smokers (ie, no/mild obstruction and minimal emphysema despite heavy smoking), (2) mild upper zone emphysema-predominant, (3) airway disease-predominant and (4) severe emphysema. All clusters are strongly associated with COPD-related clinical characteristics, including exacerbations and dyspnoea (p<0.001). We found strong genetic associations between the mild upper zone emphysema group and rs1980057 near HHIP, and between the severe emphysema group and rs8034191 in the chromosome 15q region (p<0.001). All significant associations were replicated at p<0.05 in the validation sample (12/12 associations with clinical measures and 2/2 genetic associations). INTERPRETATION Cluster analysis identifies four subgroups of smokers that show robust associations with clinical characteristics of COPD and known COPD-associated genetic variants.
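The procedure of fitting k-means on a training split across a range of k and then checking the solution on a validation split can be sketched in a few lines. The sketch below is a one-dimensional toy with a deterministic, quantile-based initialisation. The abstract does not say how stability was assessed, so held-out within-cluster sum of squares is used here purely as a stand-in, and all data and names are hypothetical.

```python
def kmeans_1d(values, k, n_iter=50):
    """Plain Lloyd's algorithm on 1-D data with deterministic initial centres
    taken at evenly spaced positions of the sorted values."""
    ordered = sorted(values)
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest centre.
        labels = [min(range(k), key=lambda j: (v - centers[j]) ** 2) for v in values]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, label in zip(values, labels) if label == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def validation_sse(centers, values):
    """Sum of squared distances from held-out values to their nearest centre."""
    return sum(min((v - c) ** 2 for c in centers) for v in values)


# Hypothetical training and validation splits (illustration only).
train = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0, 21.0, 22.0, 23.0]
validation = [2.0, 12.0, 22.0, 1.0, 13.0]

for k in range(1, 5):
    centers, _ = kmeans_1d(train, k)
    print(k, centers, validation_sse(centers, validation))
```

In the study itself the inputs were multi-dimensional feature sets and the retained solutions were further tested against clinical and genetic measures, which this sketch does not attempt.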
Affiliation(s)
- Peter J Castaldi
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
20
Identification of clinical phenotypes using cluster analyses in COPD patients with multiple comorbidities. BioMed Research International 2014; 2014:420134. [PMID: 24683548 PMCID: PMC3934315 DOI: 10.1155/2014/420134] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 11/10/2013] [Accepted: 01/02/2014] [Indexed: 11/17/2022]
Abstract
Chronic obstructive pulmonary disease (COPD) is characterized by persistent airflow limitation, the severity of which is assessed using forced expiratory volume in 1 sec (FEV1, % predicted). Cohort studies have confirmed that COPD patients with similar levels of airflow limitation showed marked heterogeneity in clinical manifestations and outcomes. Chronic coexisting diseases, also called comorbidities, are highly prevalent in COPD patients and likely contribute to this heterogeneity. In recent years, investigators have used innovative statistical methods (e.g., cluster analyses) to examine the hypothesis that subgroups of COPD patients sharing clinically relevant characteristics (phenotypes) can be identified. The objectives of the present paper are to review recent studies that have used cluster analyses for defining phenotypes in observational cohorts of COPD patients. Strengths and weaknesses of these statistical approaches are briefly described. Description of the phenotypes that were reasonably reproducible across studies and received prospective validation in at least one study is provided, with a special focus on differences in age and comorbidities (including cardiovascular diseases). Finally, gaps in current knowledge are described, leading to proposals for future studies.
21
Bon J, Liao S, Tseng G, Sciurba FC. Considerations and pitfalls in phenotyping and reclassification of chronic obstructive pulmonary disease. Transl Res 2013; 162:252-7. [PMID: 23920431 DOI: 10.1016/j.trsl.2013.07.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/20/2013] [Revised: 06/24/2013] [Accepted: 07/11/2013] [Indexed: 10/26/2022]
Abstract
As the clinical and research focus of chronic obstructive pulmonary disease (COPD) evolves from regarding obstructive lung disease as a single disease entity to recognizing the complexity of disease expression, the importance of COPD phenotyping rises to the forefront. The reclassification of COPD holds both prognostic and therapeutic implications but does not come without issues that may complicate classification efforts. In this review, we discuss the significance of refining the definition of the term phenotype, consider the impact of variations in cohort severity and attribute mix, account for the contrast of longitudinal vs cross-sectional cohort analysis, recognize the differing criteria used to define disease traits along with the nuances of combining cohorts, and identify the interaction of covariates as we advance in the field of COPD phenotyping.
Affiliation(s)
- Jessica Bon
- Division of Pulmonary Allergy and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, PA
22
Sánchez Morillo D, Astorga Moreno S, Fernández Granero MÁ, León Jiménez A. Computerized analysis of respiratory sounds during COPD exacerbations. Comput Biol Med 2013; 43:914-21. [PMID: 23746734 DOI: 10.1016/j.compbiomed.2013.03.011] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 06/16/2012] [Revised: 03/26/2013] [Accepted: 03/27/2013] [Indexed: 10/27/2022]
Abstract
Acute exacerbation of chronic obstructive pulmonary disease (AECOPD) is a major event in the natural course of the disease, and is associated with significant mortality and socioeconomic impact. Abnormal respiratory sounds are commonly present in patients with AECOPD. Computerized analysis of these sounds can assist in diagnosis and in evaluation during follow-up. Exploratory data analysis methods were applied to respiratory sounds in these patients when they were hospitalized because of exacerbation. Two different patterns of presentation and evolution of respiratory sounds in AECOPD were found and described from the method of computerized respiratory sound analysis and unsupervised clustering that was devised. Based on the findings of the study, remote monitoring of respiratory sounds may be useful for the detection and/or follow-up of COPD exacerbation.
Affiliation(s)
- Daniel Sánchez Morillo
- Biomedical Engineering and Telemedicine Researching Group, University of Cádiz, Cádiz 11003, Spain.
23
Maslove DM, Podchiyska T, Lowe HJ. Discretization of continuous features in clinical datasets. J Am Med Inform Assoc 2012; 20:544-53. [PMID: 23059731 DOI: 10.1136/amiajnl-2012-000929] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Indexed: 11/04/2022] Open
Abstract
BACKGROUND The increasing availability of clinical data from electronic medical records (EMRs) has created opportunities for secondary uses of health information. When used in machine learning classification, many data features must first be transformed by discretization. OBJECTIVE To evaluate six discretization strategies, both supervised and unsupervised, using EMR data. MATERIALS AND METHODS We classified laboratory data (arterial blood gas (ABG) measurements) and physiologic data (cardiac output (CO) measurements) derived from adult patients in the intensive care unit using decision trees and naïve Bayes classifiers. Continuous features were partitioned using two supervised, and four unsupervised discretization strategies. The resulting classification accuracy was compared with that obtained with the original, continuous data. RESULTS Supervised methods were more accurate and consistent than unsupervised, but tended to produce larger decision trees. Among the unsupervised methods, equal frequency and k-means performed well overall, while equal width was significantly less accurate. DISCUSSION This is, we believe, the first dedicated evaluation of discretization strategies using EMR data. It is unlikely that any one discretization method applies universally to EMR data. Performance was influenced by the choice of class labels and, in the case of unsupervised methods, the number of intervals. In selecting the number of intervals there is generally a trade-off between greater accuracy and greater consistency. CONCLUSIONS In general, supervised methods yield higher accuracy, but are constrained to a single specific application. Unsupervised methods do not require class labels and can produce discretized data that can be used for multiple purposes.
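Two of the unsupervised strategies named above, equal width and equal frequency, can be contrasted with a small sketch. The implementations below are generic textbook versions, not the authors' code, and the sample values are hypothetical. Note that the rank-based equal-frequency version can place tied values in different bins, which a production implementation would need to handle.

```python
def equal_width_bins(values, n_bins):
    """Split the observed range [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign bins by rank so that each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = min(rank * n_bins // len(values), n_bins - 1)
    return labels


# Hypothetical measurements with one outlier.
values = [1, 2, 3, 4, 5, 6, 7, 100]
print(equal_width_bins(values, 4))      # [0, 0, 0, 0, 0, 0, 0, 3]
print(equal_frequency_bins(values, 4))  # [0, 0, 1, 1, 2, 2, 3, 3]
```

A single outlier leaves most equal-width bins empty, whereas equal-frequency bins stay balanced. This sensitivity to skewed distributions is one plausible reason for the weaker equal-width results reported above, although the abstract itself does not give a cause.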
Affiliation(s)
- David M Maslove
- Center for Clinical Informatics, Stanford University School of Medicine, Stanford, CA 94305, USA.
|
24
|
Chen H, Wang X. Significance of bioinformatics in research of chronic obstructive pulmonary disease. J Clin Bioinforma 2011; 1:35. [PMID: 22185624 PMCID: PMC3285039 DOI: 10.1186/2043-9113-1-35] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 10/03/2010] [Accepted: 12/20/2011] [Indexed: 01/06/2023]
Abstract
Chronic obstructive pulmonary disease (COPD) is an inflammatory disease characterized by the progressive deterioration of pulmonary function and increasing airway obstruction, with high mortality all over the world. The advent of high-throughput omics techniques provided an opportunity to gain insights into disease pathogenesis and the processes that contribute to its heterogeneity, and to find target-specific and disease-specific therapies. As an interdisciplinary field, bioinformatics supplied vital information for an integrative understanding of COPD. This review focused on the application of bioinformatics in COPD research, including biomarker searching and systems biology. We also presented the requirements and challenges in implementing bioinformatics in COPD research and interpreted these results as clinical physicians.
Affiliation(s)
- Hong Chen
- Department of Pulmonary Medicine, Zhongshan Hospital, Fudan University, Shanghai, China.
|
25
|
Kaminsky DA, Irvin CG, Sterk PJ. Complex systems in pulmonary medicine: a systems biology approach to lung disease. J Appl Physiol (1985) 2010; 110:1716-22. [PMID: 21183622 DOI: 10.1152/japplphysiol.01310.2010] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Abstract
The lung is a highly complex organ that can only be understood by integrating the many aspects of its structure and function into a comprehensive view. Such a view is provided by a systems biology approach, whereby the many layers of complexity, from the molecular genetic, to the cellular, to the tissue, to the whole organ, and finally to the whole body, are synthesized into a working model of understanding. The systems biology approach therefore relies on the expertise of many disciplines, including genomics, proteomics, metabolomics, physiomics, and, ultimately, clinical medicine. The overall structure and functioning of the lung cannot be predicted from studying any one of these systems in isolation, and so this approach highlights the importance of emergence as the fundamental feature of systems biology. In this paper, we will provide an overview of a systems biology approach to lung disease by briefly reviewing the advances made at many of these levels, with special emphasis on recent work done in the realm of pulmonary physiology and the analysis of clinical phenotypes.
Affiliation(s)
- David A Kaminsky
- Pulmonary and Critical Care Medicine, Given D-213, 89 Beaumont Ave., Burlington, VT 05405, USA.
|
26
|
Han MK, Agusti A, Calverley PM, Celli BR, Criner G, Curtis JL, Fabbri LM, Goldin JG, Jones PW, Macnee W, Make BJ, Rabe KF, Rennard SI, Sciurba FC, Silverman EK, Vestbo J, Washko GR, Wouters EFM, Martinez FJ. Chronic obstructive pulmonary disease phenotypes: the future of COPD. Am J Respir Crit Care Med 2010; 182:598-604. [PMID: 20522794 DOI: 10.1164/rccm.200912-1843cc] [Citation(s) in RCA: 689] [Impact Index Per Article: 49.2] [Indexed: 12/25/2022]
Abstract
Significant heterogeneity of clinical presentation and disease progression exists within chronic obstructive pulmonary disease (COPD). Although FEV(1) inadequately describes this heterogeneity, a clear alternative has not emerged. The goal of phenotyping is to identify patient groups with unique prognostic or therapeutic characteristics, but significant variation and confusion surrounds use of the term "phenotype" in COPD. Phenotype classically refers to any observable characteristic of an organism, and up until now, multiple disease characteristics have been termed COPD phenotypes. We, however, propose the following variation on this definition: "a single or combination of disease attributes that describe differences between individuals with COPD as they relate to clinically meaningful outcomes (symptoms, exacerbations, response to therapy, rate of disease progression, or death)." This more focused definition allows for classification of patients into distinct prognostic and therapeutic subgroups for both clinical and research purposes. Ideally, individuals sharing a unique phenotype would also ultimately be determined to have a similar underlying biologic or physiologic mechanism(s) to guide the development of therapy where possible. It follows that any proposed phenotype, whether defined by symptoms, radiography, physiology, or cellular or molecular fingerprint will require an iterative validation process in which "candidate" phenotypes are identified before their relevance to clinical outcome is determined. Although this schema represents an ideal construct, we acknowledge any phenotype may be etiologically heterogeneous and that any one individual may manifest multiple phenotypes. We have much yet to learn, but establishing a common language for future research will facilitate our understanding and management of the complexity implicit to this disease.
Affiliation(s)
- MeiLan K Han
- University of Michigan-Pulmonary and Critical Care, 1500 E. Medical Center Drive, 3916 Taubman, Ann Arbor, MI 48109, USA.
|
27
|
Cho MH, Washko GR, Hoffmann TJ, Criner GJ, Hoffman EA, Martinez FJ, Laird N, Reilly JJ, Silverman EK. Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation. Respir Res 2010; 11:30. [PMID: 20233420 PMCID: PMC2850331 DOI: 10.1186/1465-9921-11-30] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Received: 09/09/2009] [Accepted: 03/16/2010] [Indexed: 11/10/2022]
Abstract
Background Numerous studies have demonstrated associations between genetic markers and COPD, but results have been inconsistent. One reason may be heterogeneity in disease definition. Unsupervised learning approaches may assist in understanding disease heterogeneity. Methods We selected 31 phenotypic variables and 12 SNPs from five candidate genes in 308 subjects in the National Emphysema Treatment Trial (NETT) Genetics Ancillary Study cohort. We used factor analysis to select a subset of phenotypic variables, and then used cluster analysis to identify subtypes of severe emphysema. We examined the phenotypic and genotypic characteristics of each cluster. Results We identified six factors accounting for 75% of the shared variability among our initial phenotypic variables. We selected four phenotypic variables from these factors for cluster analysis: 1) post-bronchodilator FEV1 percent predicted, 2) percent bronchodilator responsiveness, and quantitative CT measurements of 3) apical emphysema and 4) airway wall thickness. K-means cluster analysis revealed four clusters, though separation between clusters was modest: 1) emphysema predominant, 2) bronchodilator responsive, with higher FEV1; 3) discordant, with a lower FEV1 despite less severe emphysema and lower airway wall thickness, and 4) airway predominant. Of the genotypes examined, membership in cluster 1 (emphysema-predominant) was associated with TGFB1 SNP rs1800470. Conclusions Cluster analysis may identify meaningful disease subtypes and/or groups of related phenotypic variables even in a highly selected group of severe emphysema subjects, and may be useful for genetic association studies.
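The k-means step mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration of Lloyd's algorithm under stated assumptions, not the study's analysis: the `kmeans` helper, the two-feature synthetic points, the choice of two clusters, and the fixed initial centroids are all hypothetical (the study clustered four phenotypic variables into four clusters).

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    labels = []
    for _ in range(n_iter):
        # Assignment step: give each point the index of its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return labels, centroids

# Synthetic two-feature observations forming two well-separated groups (not study data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Real clinical variables are measured on different scales, so in practice they would typically be standardized before clustering; the abstract does not say how the study handled this, and the sketch omits it.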
Affiliation(s)
- Michael H Cho
- Channing Laboratory, Brigham & Women's Hospital, Boston, MA, USA
|