1
Lu Z, Chandra NK. A sparse factor model for clustering high-dimensional longitudinal data. Stat Med 2024; 43:3633-3648. [PMID: 38885953 DOI: 10.1002/sim.10151] [Received: 10/06/2023] [Revised: 04/09/2024] [Accepted: 06/06/2024]
Abstract
Recent advances in engineering technologies have enabled the collection of a large number of longitudinal features. This wealth of information presents unique opportunities for researchers to investigate the complex nature of diseases and uncover underlying disease mechanisms. However, analyzing such data can be difficult because of its high dimensionality, its heterogeneity, and the computational demands involved. In this article, we propose a Bayesian nonparametric mixture model for clustering high-dimensional mixed-type (eg, continuous, discrete, and categorical) longitudinal features. We employ a sparse factor model on the joint distribution of the random effects; the key idea is to induce clustering at the latent factor level rather than on the original data, in order to escape the curse of dimensionality. The number of clusters is estimated through a Dirichlet process prior. An efficient Gibbs sampler is developed to estimate the posterior distribution of the model parameters. Analyses of real and simulated data are presented and discussed. Our study demonstrates that the proposed model serves as a useful analytical tool for clustering high-dimensional longitudinal data.
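The abstract notes that the number of clusters is estimated through a Dirichlet process prior. As a generic illustration only (not the authors' sparse factor model or their Gibbs sampler), the sketch below draws a partition from the Chinese restaurant process, which is the clustering distribution implied by a Dirichlet process with concentration parameter `alpha`. The function names and the value of `alpha` are hypothetical.

```python
import random


def crp_partition(n, alpha, seed=None):
    """Draw cluster labels for n items from a Chinese restaurant process.

    This is the partition induced by a Dirichlet process prior with
    concentration parameter alpha (hypothetical illustration).
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of items already assigned to cluster k
    for _ in range(n):
        # The item joins existing cluster k with weight counts[k] and opens a
        # new cluster with weight alpha; choices() normalises the weights.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


def expected_num_clusters(n, alpha):
    """Prior mean number of occupied clusters: sum_{i=0}^{n-1} alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    labels, counts = crp_partition(250, alpha=1.0, seed=1)
    print(len(counts), round(expected_num_clusters(250, 1.0), 2))
```

Because the number of occupied clusters is itself random under this prior, its posterior can be learned from the data instead of being fixed in advance, which is the property the abstract relies on.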
Affiliation(s)
- Zihang Lu
- Department of Public Health Sciences, Queen's University, Kingston, Ontario, Canada
- Department of Mathematics and Statistics, Queen's University, Kingston, Ontario, Canada
- Noirrit Kiran Chandra
- Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, Texas, USA
2
Lu Z, Ahmadiankalati M, Tan Z. Joint clustering multiple longitudinal features: A comparison of methods and software packages with practical guidance. Stat Med 2023; 42:5513-5540. [PMID: 37789706 DOI: 10.1002/sim.9917] [Received: 10/15/2022] [Revised: 06/07/2023] [Accepted: 09/13/2023]
Abstract
Clustering longitudinal features is a common goal in medical studies that seek to identify distinct disease developmental trajectories. Compared with clustering a single longitudinal feature, integrating multiple longitudinal features allows additional information to be incorporated into the clustering process, which may reveal co-existing longitudinal patterns and generate deeper biological insight. Despite its increasing importance and popularity, there is limited practical guidance for implementing cluster analysis approaches for multiple longitudinal features and for evaluating their comparative performance in medical datasets. In this paper, we provide an overview of several commonly used approaches to clustering multiple longitudinal features, with an emphasis on application and implementation in R. These methods fall broadly into two categories: model-based (including frequentist and Bayesian) approaches and algorithm-based approaches. To evaluate their performance, we compare these approaches using real-life and simulated datasets. The results provide practical guidance to applied researchers who are interested in using these approaches to cluster multiple longitudinal features. Recommendations for applied researchers and suggestions for future research in this area are also discussed.
Affiliation(s)
- Zihang Lu
- Department of Public Health Sciences, Queen's University, Kingston, Ontario, Canada
- Department of Mathematics and Statistics, Queen's University, Kingston, Ontario, Canada
- Zhiwen Tan
- Department of Public Health Sciences, Queen's University, Kingston, Ontario, Canada
3
Poulakis K, Pereira JB, Muehlboeck JS, Wahlund LO, Smedby Ö, Volpe G, Masters CL, Ames D, Niimi Y, Iwatsubo T, Ferreira D, Westman E. Multi-cohort and longitudinal Bayesian clustering study of stage and subtype in Alzheimer's disease. Nat Commun 2022; 13:4566. [PMID: 35931678 PMCID: PMC9355993 DOI: 10.1038/s41467-022-32202-6] [Received: 04/30/2021] [Accepted: 07/18/2022]
Abstract
Understanding Alzheimer's disease (AD) heterogeneity is important for understanding the underlying pathophysiological mechanisms of AD. However, AD atrophy subtypes may reflect different disease stages or biologically distinct subtypes. Here we use longitudinal magnetic resonance imaging data (891 participants with AD dementia, 305 healthy control participants) from four international cohorts and apply longitudinal clustering to estimate differential atrophy trajectories from the age of clinical disease onset. Our findings (in amyloid-β positive AD patients) show five distinct longitudinal patterns of atrophy with different demographic and cognitive characteristics. Some previously reported atrophy subtypes may reflect disease stages rather than distinct subtypes. The heterogeneity in atrophy rates and cognitive decline within the five longitudinal atrophy patterns potentially reflects a complex combination of protective/risk factors and concomitant non-AD pathologies. By alternating between cross-sectional and longitudinal understanding of AD subtypes, these analyses may allow a better understanding of disease heterogeneity.
Affiliation(s)
- Konstantinos Poulakis
- Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Joana B Pereira
- Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Clinical Memory Research Unit, Department of Clinical Sciences, Lund University, Malmo, Sweden
- J-Sebastian Muehlboeck
- Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Lars-Olof Wahlund
- Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Örjan Smedby
- Department of Biomedical Engineering and Health Systems (MTH), KTH Royal Institute of Technology, Stockholm, Sweden
- Giovanni Volpe
- Department of Physics, University of Gothenburg, Gothenburg, Sweden
- Colin L Masters
- The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Victoria, Australia
- David Ames
- Academic Unit for Psychiatry of Old Age, St George's Hospital, University of Melbourne, Melbourne, Victoria, Australia
- National Ageing Research Institute, Parkville, Victoria, Australia
- Yoshiki Niimi
- Unit for Early and Exploratory Clinical Development, The University of Tokyo Hospital, Tokyo, Japan
- Takeshi Iwatsubo
- Unit for Early and Exploratory Clinical Development, The University of Tokyo Hospital, Tokyo, Japan
- Daniel Ferreira
- Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Department of Radiology, Mayo Clinic, Rochester, MN, USA
- Eric Westman
- Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Department of Neuroimaging, Centre for Neuroimaging Sciences, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
4
Vávra J, Komárek A. Classification based on multivariate mixed type longitudinal data with an application to the EU-SILC database. Adv Data Anal Classif 2022. [DOI: 10.1007/s11634-022-00504-8]
5
Yang L, Wu TT. Model-based clustering of high-dimensional longitudinal data via regularization. Biometrics 2022. [DOI: 10.1111/biom.13672] [Received: 07/23/2021] [Accepted: 03/28/2022]
Affiliation(s)
- Luoying Yang
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY, USA
- Tong Tong Wu
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY, USA
6
Dharmaratne ADVTT, Dini S, O’Flaherty K, Price DJ, Beeson J, McGready R, Nosten F, Fowkes FJI, Simpson JA, Zaloumis SG. Quantification of the dynamics of antibody response to malaria to inform sero-surveillance in pregnant women. Malar J 2022; 21:75. [PMID: 35248084 PMCID: PMC8897879 DOI: 10.1186/s12936-022-04111-y] [Received: 09/07/2021] [Accepted: 02/28/2022]
Abstract
Background Malaria remains a major public health threat, and tools sensitive enough to detect infections in low malaria transmission areas are needed to advance elimination efforts. Pregnant women are particularly vulnerable to malaria infections. Throughout pregnancy they access routine antenatal care, making them a unique sentinel population in which to apply novel sero-surveillance tools to measure malaria transmission. The aim of this study was to quantify the dynamic antibody responses to multiple antigens during pregnancy in order to identify one or more antibody responses indicative of exposure to malaria in pregnancy. Methods This study involved a secondary analysis of antibody responses to six parasite antigens [five commonly studied merozoite antigens and the variant surface antigen 2-chondroitin sulphate A (VAR2CSA), a pregnancy-specific erythrocytic antigen] measured by enzyme-linked immunosorbent assay (ELISA) over the gestation period until delivery (median of 7 measurements/woman) in 250 pregnant women who attended antenatal clinics located at the Thai-Myanmar border. Using a Bayesian approach, a multivariate mixture linear mixed model was used to cluster the pregnant women into groups with similar longitudinal antibody responses to all six antigens over the gestational period. The variable-specific entropy was calculated to identify the antibody responses with the highest influence on the classification of the women into clusters and on the subsequent agreement with the grouping of women by exposure to malaria during pregnancy. Results Of the 250 pregnant women, 135 (defined as cases) had a Plasmodium infection detected by light microscopy during pregnancy (39% Plasmodium falciparum only, 33% Plasmodium vivax only and 28% mixed/other species). The antibody responses to all six antigens accurately identified the women who did not have a malaria infection detected during pregnancy (93%, 107/115 controls). Antibody responses to P. falciparum merozoite surface protein 3 (PfMSP3) and P. vivax apical membrane antigen 1 (PvAMA1) were the least dynamic. Antibody responses to the antigens P. falciparum apical membrane antigen 1 (PfAMA1) and PfVAR2CSA identified the majority of the cases more accurately (63%, 85/135). Conclusion These findings suggest that the combination of antibodies, PfAMA1 and PfVAR2CSA, may be useful for sero-surveillance of malaria infections in pregnant women, particularly in low malaria transmission settings. Further investigation of other antibody markers is warranted, given that these antibodies combined detected only 63% of the malaria infections during pregnancy. Supplementary Information The online version contains supplementary material available at 10.1186/s12936-022-04111-y.
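The Methods mention an entropy measure of classification quality. As a hedged, generic sketch, the code below computes the overall normalised (relative) entropy often reported for mixture models from an N x K matrix of posterior cluster-membership probabilities. The study's variable-specific entropy, which attributes classification quality to individual antibody responses, is a related quantity that is not reproduced here; the function name is hypothetical.

```python
import math


def normalized_entropy(post_probs):
    """Relative entropy E = 1 - sum_i sum_k (-p_ik log p_ik) / (N log K).

    post_probs: list of N rows, each holding K posterior membership
    probabilities that sum to 1. E near 1 indicates confident, well-separated
    classification; E near 0 indicates memberships close to uniform.
    """
    n = len(post_probs)
    k = len(post_probs[0])
    if k < 2:
        raise ValueError("entropy is only defined here for K >= 2 clusters")
    total = 0.0
    for row in post_probs:
        for p in row:
            if p > 0:  # treat 0 * log 0 as 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))


if __name__ == "__main__":
    # Hypothetical posterior probabilities for three women and two clusters.
    print(normalized_entropy([[0.95, 0.05], [0.10, 0.90], [0.50, 0.50]]))
```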
7
Identifying Clusters of Adolescents Based on Their Daily-Life Social Withdrawal Experience. J Youth Adolesc 2022; 51:915-926. [PMID: 35066708 DOI: 10.1007/s10964-021-01558-1] [Received: 10/16/2021] [Accepted: 12/10/2021]
Abstract
Social withdrawal is often presented as overall negative, with a focus on loneliness and peer exclusion. However, social withdrawal is also part of normative adolescent development, which suggests that different groups of adolescents may experience social withdrawal differently. This study investigated whether different groups of adolescents experienced social withdrawal in daily life as positive versus negative, using experience sampling data from a large-scale study on mental health in general-population adolescents aged 11 to 20 (n = 1913, mean age = 13.8, SD = 1.9, 63% female) from the Flemish region of Belgium. Two social withdrawal clusters were identified using model-based cluster analysis: one characterized by high levels of positive affect and one characterized by high levels of negative affect, loneliness and exclusion. Logistic regression showed that boys had 66% lower odds of belonging to the negative cluster. These results show that daily-life social withdrawal experiences are heterogeneous in adolescence, which strengthens the view that, in both research and clinical practice, social withdrawal should not be seen as necessarily maladaptive.
8
Lu Z, Lou W. Bayesian approaches to variable selection in mixture models with application to disease clustering. J Appl Stat 2021; 50:387-407. [PMID: 36698543 PMCID: PMC9869999 DOI: 10.1080/02664763.2021.1994529]
Abstract
In biomedical research, cluster analysis is often performed to identify patient subgroups based on patients' characteristics or traits. In model-based clustering, mixture models have played a fundamental role in identifying such subgroups. While there is increasing interest in using mixture modeling to identify patient subgroups, little work has been done on selecting the predictors that are associated with class assignment. In this study, we develop and compare two approaches to variable selection in the context of a mixture model, to identify important predictors associated with class assignment: the one-step approach and the stepwise approach. The former performs clustering and variable selection simultaneously in one overall model, whereas the latter performs clustering and variable selection in two sequential steps. We considered both shrinkage priors and spike-and-slab priors to assess the importance of variables. Markov chain Monte Carlo algorithms are developed to estimate the posterior distribution of the model parameters. Practical applications and simulation studies are carried out to evaluate the clustering and variable selection performance of the proposed models.
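The abstract mentions spike-and-slab priors for selecting predictors of class assignment. The sketch below is a generic illustration under stated assumptions, not the authors' MCMC algorithm: it draws coefficients from a point-mass spike-and-slab prior and summarises MCMC draws of the binary inclusion indicators as posterior inclusion probabilities, keeping predictors whose probability exceeds 0.5 (the median probability model). The normal slab, the 0.5 threshold, and all function names are assumptions.

```python
import random


def draw_spike_and_slab(p, pi, slab_sd, rng):
    """One prior draw for p coefficients: gamma_j ~ Bernoulli(pi);
    beta_j ~ Normal(0, slab_sd^2) if gamma_j = 1, else beta_j = 0 (the spike)."""
    gammas = [1 if rng.random() < pi else 0 for _ in range(p)]
    betas = [rng.gauss(0.0, slab_sd) if g else 0.0 for g in gammas]
    return gammas, betas


def posterior_inclusion_probs(gamma_draws):
    """Average each predictor's inclusion indicator over the MCMC draws."""
    n_draws = len(gamma_draws)
    p = len(gamma_draws[0])
    return [sum(draw[j] for draw in gamma_draws) / n_draws for j in range(p)]


def median_probability_model(gamma_draws, threshold=0.5):
    """Indices of predictors whose posterior inclusion probability exceeds threshold."""
    pips = posterior_inclusion_probs(gamma_draws)
    return [j for j, pip in enumerate(pips) if pip > threshold]


if __name__ == "__main__":
    rng = random.Random(42)
    print(draw_spike_and_slab(5, pi=0.3, slab_sd=1.0, rng=rng))
    # Hypothetical indicator draws for three predictors (one row per MCMC draw).
    draws = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [0, 0, 1]]
    print(posterior_inclusion_probs(draws), median_probability_model(draws))
```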
Affiliation(s)
- Zihang Lu
- Department of Public Health Sciences, Queen's University, Kingston, Ontario, Canada
- Wendy Lou
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
9
Lu Z, Lou W. Bayesian consensus clustering for multivariate longitudinal data. Stat Med 2021; 41:108-127. [PMID: 34672001 DOI: 10.1002/sim.9225] [Received: 03/24/2021] [Revised: 09/26/2021] [Accepted: 09/27/2021]
Abstract
In clinical and epidemiological studies, there is growing interest in studying the heterogeneity among patients based on longitudinal characteristics to identify subtypes of the study population. Compared with clustering a single longitudinal marker, simultaneously clustering multiple longitudinal markers allows additional information to be incorporated into the clustering process, which can reveal co-existing longitudinal patterns and generate deeper biological insight. In the current study, we propose a Bayesian consensus clustering (BCC) model for multivariate longitudinal data. Instead of arriving at a single overall clustering, the proposed model allows each marker to follow a marker-specific local clustering, and these local clusterings are aggregated to find a global (consensus) clustering. To estimate the posterior distribution of the model parameters, a Gibbs sampling algorithm is proposed. We apply our proposed model to the primary biliary cirrhosis study to identify patient subtypes that may be associated with prognosis. We also perform simulation studies to compare the clustering performance of the proposed model and existing models under several scenarios. The results demonstrate that the proposed BCC model serves as a useful tool for clustering multivariate longitudinal data.
Affiliation(s)
- Zihang Lu
- Department of Public Health Sciences, Queen's University, Kingston, Ontario, Canada
- Wendy Lou
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
10
Ensari I, Caceres BA, Jackman KB, Suero-Tejeda N, Shechter A, Odlum ML, Bakken S. Digital phenotyping of sleep patterns among heterogenous samples of Latinx adults using unsupervised learning. Sleep Med 2021; 85:211-220. [PMID: 34364092 DOI: 10.1016/j.sleep.2021.07.023] [Received: 03/29/2021] [Revised: 06/17/2021] [Accepted: 07/12/2021]
Abstract
OBJECTIVE This study aimed to identify sleep disturbance subtypes ("phenotypes") among Latinx adults based on objective sleep data, using a flexible unsupervised machine learning technique. METHODS This study analyzed sleep data from three cross-sectional studies of the Precision in Symptom Self-Management Center at Columbia University. All studies focused on sleep health in Latinx adults at increased risk for sleep disturbance. Data on total sleep time (TST), time in bed (TIB), wake after sleep onset (WASO), sleep efficiency (SE), number of awakenings (NOA) and mean length of nightly awakenings were collected using wrist-mounted accelerometers. Cluster analysis of the sleep data was conducted using an unsupervised machine learning approach that relies on mixtures of multivariate generalized linear mixed models. RESULTS The analytic sample included 494 days of data from 118 adults (ages 19-77). A 3-cluster model provided the best fit based on deviance indices (ie, DΔ ∼ -75 and -17 when moving from the 1- and 2-cluster models, respectively, to the 3-cluster model) and likelihood ratio (Pdiff ∼ 0.93). Phenotype 1 (n = 64) was associated with a greater likelihood of overall adequate SE and less variability in SE and WASO. Phenotype 2 (n = 11) was characterized by more NOAs and greater WASO and TIB than the other phenotypes. Phenotype 3 (n = 43) was characterized by greater variability in SE, bed times and awakening times. CONCLUSION Robust, digital, data-driven modeling approaches can be useful for detecting sleep phenotypes in heterogeneous patient populations, and have implications for designing precision sleep health strategies for the management and early detection of sleep problems.
Affiliation(s)
- Ipek Ensari
- Columbia University Data Science Institute, New York, NY, 10025, USA
- Billy A Caceres
- Columbia University Data Science Institute, New York, NY, 10025, USA; Columbia University School of Nursing, New York, NY, 10032, USA
- Kasey B Jackman
- Columbia University School of Nursing, New York, NY, 10032, USA; New York-Presbyterian Hospital, New York, 10032, USA
- Ari Shechter
- Columbia University Irving Medical Center, New York, NY, 10032, USA
- Suzanne Bakken
- Columbia University Data Science Institute, New York, NY, 10025, USA; Columbia University School of Nursing, New York, NY, 10032, USA
11
Feely A, Lim LS, Jiang D, Lix LM. A population-based study to develop juvenile arthritis case definitions for administrative health data using model-based dynamic classification. BMC Med Res Methodol 2021; 21:105. [PMID: 33993875 PMCID: PMC8127203 DOI: 10.1186/s12874-021-01296-9] [Received: 10/06/2020] [Accepted: 04/27/2021]
Abstract
BACKGROUND Previous research has shown that chronic disease case definitions constructed using population-based administrative health data may have low accuracy for ascertaining cases of episodic diseases such as rheumatoid arthritis, which are characterized by periods of good health followed by periods of illness. No studies have considered a dynamic approach that uses statistical (i.e., probability) models for repeated measures data to classify individuals into disease, non-disease, and indeterminate categories as an alternative to deterministic (i.e., non-probability) methods that use summary data for case ascertainment. The research objectives were to validate a model-based dynamic classification approach for ascertaining cases of juvenile arthritis (JA) from administrative data, and to compare its performance with a deterministic approach for case ascertainment. METHODS The study cohort comprised JA cases and non-JA controls aged 16 years or younger, identified from a pediatric clinical registry in the Canadian province of Manitoba and born between 1980 and 2002. Registry data were linked to hospital records and physician billing claims up to 2018. Longitudinal discriminant analysis (LoDA) models and dynamic classification were applied to annual healthcare utilization measures. The deterministic case definition was based on JA diagnoses in healthcare use data at any time between birth and age 16 years; it required one hospitalization ever or two physician visits. Case definitions based on the model-based dynamic classification and deterministic approaches were assessed on sensitivity, specificity, and positive and negative predictive values (PPV, NPV). Mean time to classification was also measured for the former. RESULTS The cohort included 797 individuals; 386 (48.4%) were JA cases. A model-based dynamic classification approach using an annual measure of any JA-related healthcare contact had sensitivity = 0.70 and PPV = 0.82. Mean classification time was 9.21 years. The deterministic case definition had sensitivity = 0.91 and PPV = 0.92. CONCLUSIONS A model-based dynamic classification approach had lower accuracy for ascertaining JA cases than a deterministic approach. However, the dynamic approach required a shorter duration of time to produce a case definition with acceptable PPV. The choice of methods to construct case definitions and their performance may depend on the characteristics of the chronic disease under investigation.
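The case definitions above are compared on sensitivity, specificity, PPV and NPV. The sketch below shows how these four measures follow from a 2 x 2 table of classified versus true case status. The counts in the usage line are purely illustrative, not the study's data; individuals that a dynamic approach leaves in the indeterminate category would need a separate handling rule (for example, exclusion) that is not shown. `ConfusionCounts` and `validity_measures` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # true cases classified as cases
    fp: int  # non-cases classified as cases
    fn: int  # true cases classified as non-cases
    tn: int  # non-cases classified as non-cases


def validity_measures(c: ConfusionCounts) -> dict:
    """Standard accuracy measures for a binary case definition."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # P(classified case | true case)
        "specificity": c.tn / (c.tn + c.fp),  # P(classified non-case | non-case)
        "ppv": c.tp / (c.tp + c.fp),          # P(true case | classified case)
        "npv": c.tn / (c.tn + c.fn),          # P(non-case | classified non-case)
    }


if __name__ == "__main__":
    # Illustrative counts only; not taken from the study.
    print(validity_measures(ConfusionCounts(tp=90, fp=10, fn=30, tn=70)))
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on the proportion of cases in the cohort, so they should be compared only across definitions applied to the same population.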
Affiliation(s)
- Allison Feely
- Department of Epidemiology and Cancer Registry, CancerCare Manitoba, Winnipeg, Canada
- Lily Sh Lim
- Department of Paediatrics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Canada
- Depeng Jiang
- Department of Community Health Sciences, Rady Faculty of Health Sciences, University of Manitoba, S113-750 Bannatyne Avenue, R3E 0W3, Winnipeg, Canada
- Lisa M Lix
- Department of Community Health Sciences, Rady Faculty of Health Sciences, University of Manitoba, S113-750 Bannatyne Avenue, R3E 0W3, Winnipeg, Canada
12
El Saeiti R, García-Fiñana M, Hughes DM. The effect of random-effects misspecification on classification accuracy. Int J Biostat 2021; 18:279-292. [PMID: 33770823 PMCID: PMC9156334 DOI: 10.1515/ijb-2019-0159] [Received: 12/16/2019] [Revised: 01/21/2021] [Accepted: 02/17/2021]
Abstract
Mixed models are a useful way of analysing longitudinal data. Random-effects terms allow modelling of patient-specific deviations from the overall trend over time. Correlation between repeated measurements is captured by specifying a joint distribution for all random effects in a model. Typically, this joint distribution is assumed to be multivariate normal. For Gaussian outcomes, misspecification of the random-effects distribution usually has little impact. However, when the outcome is discrete (e.g. counts or binary outcomes), generalised linear mixed models (GLMMs) are used to analyse longitudinal trends, and opinion is divided about how robust GLMMs are to misspecification of the random effects. Previous work explored the impact of random-effects misspecification on the bias of model parameters in single-outcome GLMMs. Accepting that these model parameters may be biased, we investigate whether this affects our ability to classify patients into clinical groups using a longitudinal discriminant analysis. We also consider multiple outcomes, which can substantially increase the dimension of the random-effects distribution when modelled simultaneously. We show that when there is a severe departure from normality, more flexible mixture distributions can give better classification accuracy. However, in many cases, wrongly assuming a single multivariate normal distribution has little impact on classification accuracy.
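In a longitudinal discriminant analysis, a patient is typically assigned to the clinical group with the highest posterior probability given their longitudinal profile. The sketch below applies Bayes' rule to group-specific log marginal likelihoods, which are assumed to have been computed elsewhere from mixed models fitted separately to each group; this is where the assumed random-effects distribution (a single normal or a mixture) affects classification. It is a generic illustration, not the authors' code, and the function names are hypothetical.

```python
import math


def posterior_group_probs(log_liks, priors):
    """Posterior probability of membership in each group via Bayes' rule.

    log_liks[g]: log marginal likelihood of one patient's longitudinal data
                 under the mixed model fitted to group g (computed elsewhere).
    priors[g]:   prior probability of group g (must be > 0).
    """
    scores = [ll + math.log(pr) for ll, pr in zip(log_liks, priors)]
    m = max(scores)  # subtract the maximum (log-sum-exp) to avoid underflow
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def classify(log_liks, priors):
    """Return the index of the most probable group and all posterior probabilities."""
    probs = posterior_group_probs(log_liks, priors)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


if __name__ == "__main__":
    # Hypothetical log-likelihoods of one patient under two group models.
    print(classify([-1040.2, -1043.5], [0.5, 0.5]))
```

Working on the log scale matters in practice: marginal likelihoods of long multivariate profiles are usually far too small to represent directly as floating-point numbers.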
Affiliation(s)
- Riham El Saeiti
- Health Data Science, University of Liverpool Faculty of Health and Life Sciences, Liverpool, UK
- Marta García-Fiñana
- Health Data Science, University of Liverpool Faculty of Health and Life Sciences, Liverpool, UK
- David M Hughes
- Health Data Science, University of Liverpool Faculty of Health and Life Sciences, Liverpool, UK
13
Yeager KA, Waldrop-Valverde D, Paul S, Bruner DW, Klisovic R, Burns E, Mason TA, Patel N, Jennings BM. Adherence trajectories in oral therapy for chronic myeloid leukemia: Overview of a research protocol. Res Nurs Health 2020; 43:443-452. [PMID: 32866350 DOI: 10.1002/nur.22069] [Received: 05/21/2020] [Accepted: 08/16/2020]
Abstract
Over a quarter of chemotherapy regimens now include oral agents. Individuals living with cancer are now responsible for administering this lifesaving therapy at home by taking every dose as prescribed. One type of oral chemotherapy, tyrosine kinase inhibitors (TKIs), is the currently recommended treatment for chronic myeloid leukemia (CML). This targeted therapy has markedly improved survival but comes with significant side effects and financial costs. In the study described in this protocol, the investigators seek to understand the dynamic nature of TKI adherence experienced by individuals diagnosed with CML. Using a mixed-methods approach in this prospective observational study, funded by the National Cancer Institute, we seek to describe subjects' adherence trajectories over 1 year. We aim to characterize adherence trajectories in individuals taking TKIs using model-based cluster analysis. Next, we will determine how side effects and financial toxicity influence adherence trajectories. Then we will examine the influence of TKI adherence trajectories on disease outcomes. Additionally, we will explore the experience of patients taking TKIs by interviewing a subset of participants in different adherence trajectories. The projected sample includes 120 individuals taking TKIs, whom we will assess monthly for 12 months, measuring adherence with an objective measure (Medication Event Monitoring System). Identifying differential trajectories of adherence for TKIs is important for detecting subgroups at the highest risk of nonadherence and will support the design of targeted interventions. Results from this study can potentially translate to other oral agents to improve care across different types of cancer.
Affiliation(s)
- Katherine A Yeager
- Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, Georgia
- Sudeshna Paul
- Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, Georgia
- Deborah Watkins Bruner
- Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, Georgia; Department of Radiation Oncology, Emory School of Medicine, Atlanta, Georgia
- Rebecca Klisovic
- Department of Hematology and Medical Oncology, Emory School of Medicine, Atlanta, Georgia
- Emily Burns
- Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, Georgia
- Tamara A Mason
- Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, Georgia
- Nisha Patel
- Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, Georgia
14
Poulakis K, Ferreira D, Pereira JB, Smedby Ö, Vemuri P, Westman E. Fully Bayesian longitudinal unsupervised learning for the assessment and visualization of AD heterogeneity and progression. Aging (Albany NY) 2020; 12:12622-12647. [PMID: 32644944 PMCID: PMC7377879 DOI: 10.18632/aging.103623] [Received: 03/22/2020] [Accepted: 06/19/2020]
Abstract
Tau pathology and brain atrophy are the closest correlates of cognitive decline in Alzheimer's disease (AD). Understanding the heterogeneity and longitudinal progression of atrophy during the disease course will play a key role in understanding AD pathogenesis. We propose a framework for longitudinal clustering that simultaneously: 1) incorporates whole-brain data, 2) leverages unequal numbers of visits per individual, 3) compares clusters with a control group, 4) accounts for confounding effects, 5) provides cluster visualization, and 6) measures clustering uncertainty. We used amyloid-β positive AD patients and amyloid-β negative healthy subjects with three longitudinal structural magnetic resonance imaging scans (cortical thickness and subcortical volume) over two years. We found three distinct longitudinal AD brain atrophy patterns: one typical diffuse pattern (n=34, 47.2%) and two atypical patterns, minimal atrophy (n=23, 31.9%) and hippocampal sparing (n=9, 12.5%). We also identified outliers (n=3, 4.2%) and observations with uncertain classification (n=3, 4.2%). The clusters differed not only in regional distributions of atrophy at baseline, but also in longitudinal atrophy progression, age at AD onset, and cognitive decline. A framework for the longitudinal assessment of variability in cohorts with several neuroimaging measures was successfully developed. We believe this framework may aid in disentangling distinct subtypes of AD from disease staging.
Affiliation(s)
- Konstantinos Poulakis
- Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Daniel Ferreira
- Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Joana B. Pereira
- Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Örjan Smedby
- Department of Biomedical Engineering and Health Systems (MTH), KTH Royal Institute of Technology, Stockholm, Sweden
- Eric Westman
- Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Department of Neuroimaging, Centre for Neuroimaging Sciences, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, UK
15
Auder B, Gassiat E, Loum MA. Least squares moment identification of binary regression mixture models. Metrika 2020. [DOI: 10.1007/s00184-020-00787-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
16
Lim Y, Cheung YK, Oh HS. A generalization of functional clustering for discrete multivariate longitudinal data. Stat Methods Med Res 2020; 29:3205-3217. [PMID: 32368950 DOI: 10.1177/0962280220921912] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
This paper presents a new model-based generalized functional clustering method for discrete longitudinal data, such as multivariate binomial and Poisson distributed data. To this end, we propose a clustering procedure based on multivariate functional principal component analysis (MFPCA) that is applied to a latent multivariate Gaussian process rather than directly to the original functional data. The main contribution of this study is twofold: modeling discrete longitudinal data with a latent multivariate Gaussian process, and developing a clustering algorithm based on MFPCA coupled with that latent process. Numerical experiments, including a real data analysis and a simulation study, demonstrate the promising empirical properties of the proposed approach.
Affiliation(s)
- Yaeji Lim
- Department of Applied Statistics, Chung-Ang University, Seoul, Republic of Korea
- Hee-Seok Oh
- Department of Statistics, Seoul National University, Seoul, Republic of Korea
17
Park S, Lim J, Choi H, Kwak M. Clustering of longitudinal interval-valued data via mixture distribution under covariance separability. J Appl Stat 2019; 47:1739-1756. [DOI: 10.1080/02664763.2019.1692795] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
Affiliation(s)
- Seongoh Park
- Department of Statistics, Seoul National University, Seoul, Korea
- Johan Lim
- Department of Statistics, Seoul National University, Seoul, Korea
- Hyejeong Choi
- Department of Statistics, Seoul National University, Seoul, Korea
- Minjung Kwak
- Department of Statistics, Yeungnam University, Gyeongsan, Korea
18
Exploring the longitudinal dynamics of herd BVD antibody test results using model-based clustering. Sci Rep 2019; 9:11353. [PMID: 31388019 PMCID: PMC6684638 DOI: 10.1038/s41598-019-47339-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/22/2019] [Accepted: 07/15/2019] [Indexed: 11/08/2022]
Abstract
Determining the Bovine Viral Diarrhoea (BVD) infection status of cattle herds is a challenge for control and eradication schemes. Given the changing dynamics of BVD virus (BVDV) antibody responses in cattle, classifying herds by longitudinal changes in BVDV antibody test results could offer a novel, complementary approach to categorising herds. Such an approach is less likely than the present system to change a herd's status from year to year, because it is more likely to capture the true exposure dynamics of the farms. This paper describes the dynamics of BVDV antibody test values (measured as percentage positivity, PP) obtained from 15,500 bovines between 2007 and 2010 from thirty-nine cattle herds located in Scotland and Northern England. It explores approaches to classifying herds based on the trend, magnitude and shape of their antibody PP trajectories and investigates the epidemiological similarities between farms within the same cluster. Gaussian mixture models were used for the magnitude and shape clustering. Epidemiologically meaningful clusters were obtained. Farm cluster membership depends on the clustering approach used. Moderate concordance was found between the shape and magnitude clusters. These methods hold potential for application to enhance control efforts for BVD and other infectious livestock diseases.
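The abstract does not give the authors' model specification, so the following is only a rough illustration of Gaussian mixture clustering of herd-level summaries. It is a minimal sketch that fits a one-dimensional, two-component mixture by expectation-maximization (EM). The input values are hypothetical per-herd mean PP values. The helper names `normal_pdf` and `fit_gmm_1d`, the starting means, and the fixed iteration count are assumptions made for illustration.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a normal distribution N(mu, var) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, init_means, n_iter=200, var_floor=1e-6):
    """Fit a one-dimensional Gaussian mixture by EM (hypothetical helper).

    Returns (weights, means, variances, responsibilities), where
    responsibilities[i][j] is the posterior probability that xs[i]
    belongs to component j.
    """
    n, k = len(xs), len(init_means)
    means = list(init_means)
    grand_mean = sum(xs) / n
    variances = [sum((x - grand_mean) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior component probabilities for every observation.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_floor, spread)  # guard against a collapsing component
    return weights, means, variances, resp


if __name__ == "__main__":
    pp = [10, 12, 11, 9, 13, 60, 62, 58, 61, 59]  # hypothetical herd-level PP summaries
    _, means, _, resp = fit_gmm_1d(pp, [0.0, 100.0])
    labels = [max(range(len(r)), key=lambda j: r[j]) for r in resp]
    print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Hard cluster labels are read off as the component with the largest responsibility for each herd. In practice the number of components would itself be chosen by comparing fits, which this sketch does not do.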
19
Chen W, Subbarao P, McGihon RE, Feldman LY, Zhu J, Lou W, Gershon AS, Abdullah K, Moraes TJ, Dubeau A, Sears MR, Lefebvre DL, Turvey SE, Mandhane PJ, Azad MB, To T. Patterns of health care use related to respiratory conditions in early life: A birth cohort study with linked administrative data. Pediatr Pulmonol 2019; 54:1267-1276. [PMID: 31172683 DOI: 10.1002/ppul.24381] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/06/2019] [Revised: 05/03/2019] [Accepted: 05/09/2019] [Indexed: 11/11/2022]
Abstract
OBJECTIVES To identify distinctive patterns of respiratory-related health services use (HSU) between birth and 3 years of age, and to examine associated symptom and risk profiles. METHODS This study included 729 mother and child pairs enrolled in the Toronto site of the Canadian Healthy Infant Longitudinal Development study in 2009-2012; they were linked to Ontario health administrative databases (2009-2016). A model-based cluster analysis was performed to identify distinct groups of children who followed a similar pattern of respiratory-related HSU between birth and 3 years of age, regarding hospitalization, emergency department (ED) and physician office visits for respiratory conditions and total health care costs (2016 Canadian dollars). RESULTS The majority (estimated cluster weight = 0.905) showed a pattern of low and stable respiratory care use (low HSU) while the remainder (weight = 0.095) showed a pattern of high use (high HSU). From 0 to 3 years of age, the low- and high-HSU groups differed in mean trajectories of total health care costs ($783 per 6 months decreased to $114, vs $1796 to $177, respectively). Compared to low-HSU, the high-HSU group was associated with a constant risk of hospitalizations, early high ED utilization and physician visits for respiratory problems. The two groups differed significantly in the timing of wheezing (late onset in low-HSU vs early in high-HSU) and future total costs (stable vs increased). CONCLUSIONS One in ten children had high respiratory care use in early life. Such information can help identify high-risk young children in a large population, monitor their long-term health, and inform resource allocation.
Affiliation(s)
- Wenjia Chen
- Child Health Evaluative Sciences Program, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada; Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada
- Padmaja Subbarao
- Translational Medicine and Division of Respiratory Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada; Department of Pediatrics, University of Toronto, Toronto, Ontario, Canada; Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Rachel E McGihon
- Child Health Evaluative Sciences Program, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada
- Laura Y Feldman
- Child Health Evaluative Sciences Program, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada
- Jingqin Zhu
- Child Health Evaluative Sciences Program, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada; Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada
- Wendy Lou
- Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Andrea S Gershon
- Child Health Evaluative Sciences Program, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada; Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada; Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada; Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Kawsari Abdullah
- Child Health Evaluative Sciences Program, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada
- Theo J Moraes
- Translational Medicine and Division of Respiratory Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada; Department of Pediatrics, University of Toronto, Toronto, Ontario, Canada
- Aimée Dubeau
- Translational Medicine and Division of Respiratory Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
- Malcolm R Sears
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Diana L Lefebvre
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Stuart E Turvey
- Department of Pediatrics, BC Children's Hospital, University of British Columbia, Vancouver, British Columbia, Canada
- Piush J Mandhane
- Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Meghan B Azad
- Department of Pediatrics & Child Health, Children's Hospital Research Institute of Manitoba, University of Manitoba, Winnipeg, Manitoba, Canada
- Teresa To
- Child Health Evaluative Sciences Program, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada; Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada; Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
20
Paul S, Corwin EJ. Identifying clusters from multidimensional symptom trajectories in postpartum women. Res Nurs Health 2019; 42:119-127. [PMID: 30710373 DOI: 10.1002/nur.21935] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/09/2018] [Accepted: 01/01/2019] [Indexed: 12/15/2022]
Abstract
Depressive symptoms, stress, fatigue, and lack of sleep are often experienced by women in the perinatal period and are potential contributors to adverse maternal and child health outcomes. To explore the evolution of symptoms and identify groups of women with similar symptom severity and patterns, we clustered multidimensional symptom trajectories. In an observational study, data were collected from pregnant women in the third trimester (36 weeks prenatal), in the postnatal period at weeks 1 and 2, and at 1, 2, 3, and 6 months postpartum. Depressive symptoms and maternal stress were measured using the Edinburgh Postnatal Depression Scale (EPDS) and the Perceived Stress Scale (PSS), respectively. Self-reported sleep duration and levels of fatigue were also collected. A model-based clustering approach was used to classify women by their symptom severity. The sample included 151 pregnant women with a 6-month follow-up. Two clusters were identified. Cluster 1 (n = 43) comprised women with fewer depressive symptoms, less perceived stress, a lower likelihood of fatigue, longer sleep duration, and negative trends in EPDS (β = -0.05, CI [-0.09, -0.001]) and PSS (β = -0.09, CI [-0.17, -0.01]). Cluster 2 (n = 108) comprised women with higher EPDS and PSS scores, an increased likelihood of fatigue, and shorter sleep duration, with a positive trend in sleep hours (β = 0.02, CI [0.01, 0.03]). The pro-inflammatory markers interleukin-6 and tumor necrosis factor-α were associated with longer sleep duration and fewer depressive symptoms, respectively. Using this methodology in maternal and child health research can potentially help predict women's risk of developing severe symptoms and help clinicians provide timely interventions.
Affiliation(s)
- Sudeshna Paul
- Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, Georgia
- Elizabeth J Corwin
- Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, Georgia
21
Pencina MJ, Parikh CR, Kimmel PL, Cook NR, Coresh J, Feldman HI, Foulkes A, Gimotty PA, Hsu CY, Lemley K, Song P, Wilkins K, Gossett DR, Xie Y, Star RA. Statistical methods for building better biomarkers of chronic kidney disease. Stat Med 2019; 38:1903-1917. [PMID: 30663113 DOI: 10.1002/sim.8091] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/12/2017] [Revised: 10/17/2018] [Accepted: 12/12/2018] [Indexed: 12/23/2022]
Abstract
The last two decades have witnessed an explosion in research focused on the development and assessment of novel biomarkers for improved prognosis of diseases. As a result, best practice standards guiding biomarker research have undergone extensive development. Currently, there is great interest in the promise of biomarkers to enhance research efforts and clinical practice in the setting of chronic kidney disease, acute kidney injury, and glomerular disease. However, some have questioned whether biomarkers currently add value to the clinical practice of nephrology. Understanding the current state of the art in statistical analyses of such measures is therefore critical. In December 2014, the National Institute of Diabetes and Digestive and Kidney Diseases convened a meeting, "Toward Building Better Biomarker Statistical Methodology," with the goals of summarizing the current best practice recommendations and articulating new directions for methodological research. This report summarizes its conclusions and describes areas that need attention. Suggestions are made regarding metrics that should be commonly reported. We outline the methodological issues related to traditional metrics and considerations in prognostic modeling, including discrimination and case mix, calibration, validation, and cost-benefit analysis. We highlight the approach to improved risk communication and the value of graphical displays. Finally, we address some "new frontiers" in prognostic biomarker research, including the competing risk framework, the use of longitudinal biomarkers, and analyses in distributed research networks.
Affiliation(s)
- Michael J Pencina
- Duke Clinical Research Institute, Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, North Carolina
- Chirag R Parikh
- Division of Nephrology, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Paul L Kimmel
- Division of Kidney, Urologic and Hematologic Diseases, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
- Nancy R Cook
- Division of Preventive Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
- Josef Coresh
- Departments of Epidemiology, Medicine and Biostatistics, Johns Hopkins University, Baltimore, Maryland
- Harold I Feldman
- Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania; Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Andrea Foulkes
- Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, Massachusetts
- Phyllis A Gimotty
- Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania; Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Chi-Yuan Hsu
- Division of Nephrology, University of California, San Francisco, San Francisco, California
- Kevin Lemley
- Division of Nephrology, Children's Hospital Los Angeles, Department of Pediatrics, Keck School of Medicine, University of Southern California, Los Angeles, California
- Peter Song
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Kenneth Wilkins
- Biostatistics Program, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland; Department of Preventive Medicine and Biostatistics, F. Edward Hébert School of Medicine, Uniformed Services University of the Health Sciences, Bethesda, Maryland
- Daniel R Gossett
- Division of Kidney, Urologic and Hematologic Diseases, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
- Yining Xie
- Division of Kidney, Urologic and Hematologic Diseases, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
- Robert A Star
- Division of Kidney, Urologic and Hematologic Diseases, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
22
Xu P, Peng H, Huang T. Unsupervised learning of mixture regression models for longitudinal data. Comput Stat Data Anal 2018. [DOI: 10.1016/j.csda.2018.03.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/10/2023]
23
Hughes DM, Komárek A, Czanner G, Garcia-Fiñana M. Dynamic longitudinal discriminant analysis using multiple longitudinal markers of different types. Stat Methods Med Res 2018; 27:2060-2080. [PMID: 27789653 PMCID: PMC5985589 DOI: 10.1177/0962280216674496] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/16/2022]
Abstract
There is an emerging need in clinical research to accurately predict patients' disease status and disease progression by optimally integrating multivariate clinical information. Clinical data are often collected over time for multiple biomarkers of different types (e.g. continuous, binary and counts). In this paper, we present a flexible and dynamic (time-dependent) discriminant analysis approach in which multiple biomarkers of various types are jointly modelled for classification purposes by the multivariate generalized linear mixed model. We propose a mixture of normal distributions for the random effects to allow additional flexibility when modelling the complex correlation between longitudinal biomarkers and to robustify the model and the classification procedure against misspecification of the random effects distribution. These longitudinal models are subsequently used in a multivariate time-dependent discriminant scheme to predict, at any time point, the probability of belonging to a particular risk group. The methodology is illustrated using clinical data from patients with epilepsy, where the aim is to identify patients who will not achieve remission of seizures within a five-year follow-up period.
Affiliation(s)
- David M Hughes
- Department of Biostatistics, University of Liverpool, UK
- Arnošt Komárek
- Charles University, Faculty of Mathematics and Physics, Department of Probability and Mathematical Statistics, Prague, Czech Republic
- Gabriela Czanner
- Department of Biostatistics, University of Liverpool, UK
- Department of Eye and Vision Science, University of Liverpool, UK
24
Hughes DM, Komárek A, Bonnett LJ, Czanner G, García-Fiñana M. Dynamic classification using credible intervals in longitudinal discriminant analysis. Stat Med 2017; 36:3858-3874. [PMID: 28762546 PMCID: PMC5655752 DOI: 10.1002/sim.7397] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 12/15/2016] [Accepted: 06/14/2017] [Indexed: 11/08/2022]
Abstract
Recently developed methods of longitudinal discriminant analysis allow for classification of subjects into prespecified prognostic groups using the longitudinal history of both continuous and discrete biomarkers. The classification uses Bayesian estimates of the group membership probabilities for each prognostic group. These estimates are derived from a multivariate generalised linear mixed model of the biomarkers' longitudinal evolution in each of the groups and can be updated each time new data become available for a patient, providing a dynamic (over time) allocation scheme. However, the precision of the estimated group probabilities differs between patients and also over time. This precision can be assessed by looking at credible intervals for the group membership probabilities. In this paper, we propose a new allocation rule that incorporates credible intervals for use in the context of a dynamic longitudinal discriminant analysis and show that this can decrease the number of false positives in a prognostic test, improving the positive predictive value. We also establish that by leaving some patients unclassified for a certain period, the classification accuracy of those patients who are classified can be improved, giving clinicians increased confidence in their decision making. Finally, we show that determining a stopping rule dynamically can be more accurate than specifying a set time point at which to decide on a patient's status. We illustrate our methodology using data from patients with epilepsy and show that patients who fail to achieve adequate seizure control are more accurately identified using credible intervals than with existing methods.
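The core idea of a credible-interval allocation rule can be sketched as follows. This sketch assumes we already have posterior (MCMC) draws of one patient's probability of belonging to the poor-prognosis group. The function names `quantile` and `allocate`, the 0.5 threshold, the 95% equal-tailed interval, and the three-way labels are illustrative assumptions, not the paper's exact specification. A patient is allocated only when the whole credible interval lies on one side of the threshold. Otherwise the patient is left unclassified until more visits accrue.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def allocate(prob_draws, threshold=0.5, level=0.95):
    """Classify a patient from posterior draws of P(poor-prognosis group | data).

    Hypothetical rule: 'poor' if the equal-tailed credible interval lies entirely
    above the threshold, 'good' if it lies entirely below, else 'unclassified'.
    """
    if not prob_draws:
        raise ValueError("need at least one posterior draw")
    draws = sorted(prob_draws)
    alpha = (1 - level) / 2
    lower = quantile(draws, alpha)
    upper = quantile(draws, 1 - alpha)
    if lower > threshold:
        return "poor"
    if upper < threshold:
        return "good"
    return "unclassified"


if __name__ == "__main__":
    draws = [0.3, 0.45, 0.55, 0.7]  # hypothetical posterior draws for one patient
    print(allocate(draws))  # unclassified
```

A rule based only on the posterior mean would force a decision for every patient. Requiring the interval to clear the threshold trades some delay for fewer false positives, which is the effect the abstract reports.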
Affiliation(s)
- David M. Hughes
- Department of Biostatistics, University of Liverpool, Liverpool, U.K.
- Arnošt Komárek
- Department of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
- Gabriela Czanner
- Department of Biostatistics, University of Liverpool, Liverpool, U.K.
- Department of Eye and Vision Science, University of Liverpool, Liverpool, U.K.
25
Trajectories of Glycemic Change in a National Cohort of Adults With Previously Controlled Type 2 Diabetes. Med Care 2017; 55:956-964. [PMID: 28922296 DOI: 10.1097/mlr.0000000000000807] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/26/2022]
Abstract
BACKGROUND Individualized diabetes management would benefit from prospectively identifying well-controlled patients at risk of losing glycemic control. OBJECTIVES To identify patterns of hemoglobin A1c (HbA1c) change among patients with stable controlled diabetes. RESEARCH DESIGN Cohort study using OptumLabs Data Warehouse, 2001-2013. We develop and apply a machine learning framework that uses a Bayesian estimation of the mixture of generalized linear mixed effect models to discover glycemic trajectories, and a random forest feature contribution method to identify patient characteristics predictive of their future glycemic trajectories. SUBJECTS The study cohort consisted of 27,005 US adults with type 2 diabetes, age 18 years and older, and stable index HbA1c <7.0%. MEASURES HbA1c values during 24 months of observation. RESULTS We compared models with k=1, 2, 3, 4, 5 trajectories and baseline variables including patient age, sex, race/ethnicity, comorbidities, medications, and HbA1c. The k=3 model had the best fit, reflecting 3 distinct trajectories of glycemic change: (T1) rapidly deteriorating HbA1c among 302 (1.1%) youngest (mean, 55.2 y) patients with lowest mean baseline HbA1c, 6.05%; (T2) gradually deteriorating HbA1c among 902 (3.3%) patients (mean, 56.5 y) with highest mean baseline HbA1c, 6.53%; and (T3) stable glycemic control among 25,800 (95.5%) oldest (mean, 58.5 y) patients with mean baseline HbA1c 6.21%. After 24 months, HbA1c rose to 8.75% in T1 and 8.40% in T2, but remained stable at 6.56% in T3. CONCLUSIONS Patients with controlled type 2 diabetes follow 3 distinct trajectories of glycemic control. This novel application of advanced analytic methods can facilitate individualized and population diabetes care by proactively identifying high risk patients.
26
Sun J, Herazo-Maya JD, Kaminski N, Zhao H, Warren JL. A Dirichlet process mixture model for clustering longitudinal gene expression data. Stat Med 2017; 36:3495-3506. [PMID: 28620908 PMCID: PMC5583037 DOI: 10.1002/sim.7374] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 02/03/2017] [Revised: 04/15/2017] [Accepted: 05/23/2017] [Indexed: 12/27/2022]
Abstract
Subgroup identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly used to define subgroups. Longitudinal gene expression profiles might provide more information on disease progression than baseline profiles alone. Therefore, subgroup identification could be more accurate and effective with the aid of longitudinal gene expression data. However, existing statistical methods are unable to fully use these data for patient clustering. In this article, we introduce a novel Bayesian clustering method based on longitudinal gene expression profiles. This method, called BClustLonG, adopts a linear mixed-effects framework to model the trajectory of each gene over time, while clustering is conducted jointly on the regression coefficients obtained from all genes. To account for the correlations among genes and alleviate the challenges of high dimensionality, we adopt a factor analysis model for the regression coefficients. A Dirichlet process prior is placed on the means of the regression coefficients to induce clustering. Through extensive simulation studies, we show that BClustLonG performs better than other clustering methods. When applied to a dataset of severely injured (burn or trauma) patients, our model identifies interesting subgroups.
Affiliation(s)
- Jiehuan Sun
- Department of Biostatistics, Yale University, New Haven, CT 06520, U.S.A.
- Jose D Herazo-Maya
- Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT 06520, U.S.A.
- Naftali Kaminski
- Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT 06520, U.S.A.
- Hongyu Zhao
- Department of Biostatistics, Yale University, New Haven, CT 06520, U.S.A.
- Joshua L Warren
- Department of Biostatistics, Yale University, New Haven, CT 06520, U.S.A.
27
|
Hughes DM, El Saeiti R, García-Fiñana M. A comparison of group prediction approaches in longitudinal discriminant analysis. Biom J 2017; 60:307-322. [PMID: 28833412 PMCID: PMC5873537 DOI: 10.1002/bimj.201700013] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/01/2017] [Revised: 05/18/2017] [Accepted: 05/25/2017] [Indexed: 01/10/2023]
Abstract
Longitudinal discriminant analysis (LoDA) can be used to classify patients into prognostic groups based on their clinical history, which often involves longitudinal measurements of various clinically relevant markers. Patients' longitudinal data are first modelled using multivariate generalised linear mixed models, allowing markers of different types (e.g. continuous, binary, counts) to be modelled simultaneously. We describe three approaches, outlined in previous studies, to calculating a patient's posterior group membership probabilities, based respectively on the marginal distribution of the longitudinal markers, their conditional distribution, and the distribution of the random effects. Here we compare the three approaches, first using data from the Mayo Primary Biliary Cirrhosis study and then by way of a simulation study to explore in which situations each approach is expected to give the best prediction. We demonstrate situations in which the marginal or random-effects approaches perform well, but find that the conditional approach offers little extra information beyond the random-effects and marginal approaches.
Affiliation(s)
- David M Hughes
- Department of Biostatistics, University of Liverpool, Liverpool, UK
- Riham El Saeiti
- Department of Biostatistics, University of Liverpool, Liverpool, UK; Department of Statistics, University of Benghazi, Benghazi, Libya
28
Maruotti A, Vichi M. Time-varying clustering of multivariate longitudinal observations. Commun Stat Theory Methods 2016. [DOI: 10.1080/03610926.2013.821488] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/22/2022]
29
Heinzl F, Tutz G. Clustering in linear-mixed models with a group fused lasso penalty. Biom J 2013; 56:44-68. [DOI: 10.1002/bimj.201200111] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 05/11/2012] [Revised: 12/04/2012] [Accepted: 08/18/2013] [Indexed: 11/06/2022]
Affiliation(s)
- Felix Heinzl
- Department of Statistics; Ludwig-Maximilians-University Munich, Akademiestr. 1; 80799 Munich Germany
- Gerhard Tutz
- Department of Statistics; Ludwig-Maximilians-University Munich, Akademiestr. 1; 80799 Munich Germany
30
Heinzl F, Tutz G. Clustering in linear mixed models with approximate Dirichlet process mixtures using EM algorithm. Stat Model 2013. [DOI: 10.1177/1471082x12471372] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 11/15/2022]
Abstract
In linear mixed models, the assumption of normally distributed random effects is often inappropriate and unnecessarily restrictive. The proposed approximate Dirichlet process mixture assumes a hierarchical Gaussian mixture based on a truncated version of the stick-breaking representation of the Dirichlet process. In addition to weakening distributional assumptions, the specification allows clusters of observations with a similar random-effects structure to be identified. An Expectation-Maximization algorithm is given that solves the estimation problem and that, in certain respects, may have advantages over Markov chain Monte Carlo approaches when modelling with Dirichlet processes. The method is evaluated in a simulation study and applied to the dynamics of unemployment in Germany as well as to lung function growth data.
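The truncated stick-breaking construction that this approximate Dirichlet process mixture builds on can be illustrated briefly. This is a minimal sketch that assumes a user-chosen concentration parameter `alpha` and truncation level `K`; both names, and the function name `truncated_stick_breaking`, are illustrative. For each k < K it draws v_k ~ Beta(1, alpha) and sets pi_k = v_k * prod_{j<k} (1 - v_j). Fixing the last break at v_K = 1 makes the K weights sum to one. Only the weight construction is shown, not the paper's EM algorithm.

```python
import random


def truncated_stick_breaking(alpha, K, rng=None):
    """Draw K mixture weights from a Dirichlet process truncated at K components.

    v_k ~ Beta(1, alpha) for k < K, and the last break is fixed at v_K = 1
    so that the returned weights sum to one.
    """
    rng = rng or random.Random()
    weights = []
    remaining = 1.0  # length of stick not yet broken off
    for k in range(K):
        v = 1.0 if k == K - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


if __name__ == "__main__":
    w = truncated_stick_breaking(alpha=1.0, K=10, rng=random.Random(42))
    print([round(x, 3) for x in w], round(sum(w), 6))
```

Because E[v_k] = 1/(1 + alpha), small values of alpha tend to put most of the mass on the first few components. This is how the mixture can settle on an effective number of clusters smaller than the truncation level.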
Affiliation(s)
- Felix Heinzl
- Department of Statistics, Ludwig-Maximilians-University Munich, Munich, Germany
- Gerhard Tutz
- Department of Statistics, Ludwig-Maximilians-University Munich, Munich, Germany