1
D’Angelo S, Brennan L, Gormley IC. Inferring food intake from multiple biomarkers using a latent variable model. Ann Appl Stat 2021. [DOI: 10.1214/21-aoas1478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Silvia D’Angelo
- School of Mathematics and Statistics, Insight Centre for Data Analytics, University College Dublin
- Lorraine Brennan
- School of Agriculture and Food Science, University College Dublin
- Isobel Claire Gormley
- School of Mathematics and Statistics, Insight Centre for Data Analytics, University College Dublin
2
Holmes JB, Schofield MR, Barker RJ. Pólya‐gamma data augmentation and latent variable models for multivariate binomial data. J R Stat Soc Ser C Appl Stat 2021. [DOI: 10.1111/rssc.12528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
3
Zangrossi A, Montemurro S, Altoè G, Mondini S. Heterogeneity and Factorial Structure in Alzheimer's Disease: A Cognitive Perspective. J Alzheimers Dis 2021; 83:1341-1351. [PMID: 34420975 DOI: 10.3233/jad-210719] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
BACKGROUND: Alzheimer's disease (AD) patients show heterogeneous cognitive profiles which suggest the existence of cognitive subgroups. A deeper comprehension of this heterogeneity could contribute to move toward a precision medicine perspective. OBJECTIVE: In this study, we aimed 1) to investigate AD cognitive heterogeneity as a product of the combination of within- (factors) and between-patients (sub-phenotypes) components, and 2) to promote its assessment in clinical practice by defining a small set of critical tests for this purpose. METHODS: We performed factor mixture analysis (FMA) on neurocognitive assessment results of N = 230 patients with a clinical diagnosis of AD. This technique allowed to investigate the structure of cognitive heterogeneity in this sample and to characterize the core features of cognitive sub-phenotypes. Subsequently, we performed a tests selection based on logistic regression to highlight the best tests to detect AD patients in our sample. Finally, the accuracy of the same tests in the discrimination of sub-phenotypes was evaluated. RESULTS: FMA revealed a structure characterized by five latent factors and four groups, which were identifiable by means of a few cognitive tests and were mainly characterized by memory deficits with visuospatial difficulties ("Visuospatial AD"), typical AD cognitive pattern ("Typical AD"), less impaired memory ("Mild AD"), and language/praxis deficits with relatively spared memory ("Nonamnestic AD"). CONCLUSION: The structure of cognitive heterogeneity in our sample of AD patients, as studied by FMA, could be summarized by four sub-phenotypes with distinct cognitive characteristics easily identifiable in clinical practice. Clinical implications under the precision medicine framework are discussed.
Affiliation(s)
- Andrea Zangrossi
- Department of Neuroscience, University of Padua, Padua, Italy; Padova Neuroscience Center (PNC), University of Padua, Padua, Italy
- Gianmarco Altoè
- Department of Developmental and Social Psychology, University of Padua, Padua, Italy
- Sara Mondini
- Department of Philosophy, Sociology, Pedagogy and Applied Psychology, University of Padua, Padua, Italy; Human Inspired Technology Research Centre, University of Padua, Padua, Italy
4
Dimension reduction for longitudinal multivariate data by optimizing class separation of projected latent Markov models. TEST-SPAIN 2021. [DOI: 10.1007/s11749-020-00727-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/23/2022]
Abstract
We present a method for dimension reduction of multivariate longitudinal data, where new variables are assumed to follow a latent Markov model. New variables are obtained as linear combinations of the multivariate outcome as usual. Weights of each linear combination maximize a measure of separation of the latent intercepts, subject to orthogonality constraints. We evaluate our proposal in a simulation study and illustrate it using an EU-level data set on income and living conditions, where dimension reduction leads to an optimal scoring system for material deprivation. An implementation of our approach can be downloaded from .
5
Capdeville V, Gonçalves KCM, Pereira JBM. Bayesian factor models for multivariate categorical data obtained from questionnaires. J Appl Stat 2020; 48:3150-3173. [PMID: 35707256 PMCID: PMC9041874 DOI: 10.1080/02664763.2020.1796935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2019] [Accepted: 07/12/2020] [Indexed: 10/23/2022]
Abstract
Factor analysis is a flexible technique for assessment of multivariate dependence and codependence. Besides being an exploratory tool used to reduce the dimensionality of multivariate data, it allows estimation of common factors that often have an interesting theoretical interpretation in real problems. However, standard factor analysis is only applicable when the variables are scaled, which is often inappropriate, for example, in data obtained from questionnaires in the field of psychology, where the variables are often categorical. In this framework, we propose a factor model for the analysis of multivariate ordered and non-ordered polychotomous data. The inference procedure is done under the Bayesian approach via Markov chain Monte Carlo methods. Two Monte Carlo simulation studies are presented to investigate the performance of this approach in terms of estimation bias, precision and assessment of the number of factors. We also illustrate the proposed method to analyze participants' responses to the Motivational State Questionnaire dataset, developed to study emotions in laboratory and field settings.
Affiliation(s)
- Vitor Capdeville
- Departamento de Métodos Estatísticos, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Kelly C. M. Gonçalves
- Departamento de Métodos Estatísticos, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- João B. M. Pereira
- Departamento de Métodos Estatísticos, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
6
7
Cagnone S, Viroli C. Multivariate latent variable transition models of longitudinal mixed data: an analysis on alcohol use disorder. J R Stat Soc Ser C Appl Stat 2018. [DOI: 10.1111/rssc.12285] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
8
Altani A, Protopapas A, Georgiou GK. Using Serial and Discrete Digit Naming to Unravel Word Reading Processes. Front Psychol 2018; 9:524. [PMID: 29706918 PMCID: PMC5908969 DOI: 10.3389/fpsyg.2018.00524] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 11/27/2017] [Accepted: 03/27/2018] [Indexed: 11/13/2022]
Abstract
During reading acquisition, word recognition is assumed to undergo a developmental shift from slow serial/sublexical processing of letter strings to fast parallel processing of whole word forms. This shift has been proposed to be detected by examining the size of the relationship between serial- and discrete-trial versions of word reading and rapid naming tasks. Specifically, a strong association between serial naming of symbols and single word reading suggests that words are processed serially, whereas a strong association between discrete naming of symbols and single word reading suggests that words are processed in parallel as wholes. In this study, 429 Grade 1, 3, and 5 English-speaking Canadian children were tested on serial and discrete digit naming and word reading. Across grades, single word reading was more strongly associated with discrete naming than with serial naming of digits, indicating that short high-frequency words are processed as whole units early in the development of reading ability in English. In contrast, serial naming was not a unique predictor of single word reading across grades, suggesting that within-word sequential processing was not required for the successful recognition for this set of words. Factor mixture analysis revealed that our participants could be clustered into two classes, namely beginning and more advanced readers. Serial naming uniquely predicted single word reading only among the first class of readers, indicating that novice readers rely on a serial strategy to decode words. Yet, a considerable proportion of Grade 1 students were assigned to the second class, evidently being able to process short high-frequency words as unitized symbols. We consider these findings together with those from previous studies to challenge the hypothesis of a binary distinction between serial/sublexical and parallel/lexical processing in word reading. 
We argue instead that sequential processing in word reading operates on a continuum, depending on the level of reading proficiency, the degree of orthographic transparency, and word-specific characteristics.
Affiliation(s)
- Angeliki Altani
- Department of Educational Psychology, Faculty of Education, University of Alberta, Edmonton, AB, Canada
- Athanassios Protopapas
- Department of Special Needs Education, Faculty of Educational Sciences, University of Oslo, Oslo, Norway
- George K Georgiou
- Department of Educational Psychology, Faculty of Education, University of Alberta, Edmonton, AB, Canada
9
Choi J, Zeng D, Olshan AF, Cai J. Joint modeling of survival time and longitudinal outcomes with flexible random effects. Lifetime Data Analysis 2018; 24:126-152. [PMID: 28856493 PMCID: PMC5756108 DOI: 10.1007/s10985-017-9405-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/28/2016] [Accepted: 08/17/2017] [Indexed: 06/07/2023]
Abstract
Joint models with shared Gaussian random effects have been conventionally used in analysis of longitudinal outcome and survival endpoint in biomedical or public health research. However, misspecifying the normality assumption of random effects can lead to serious bias in parameter estimation and future prediction. In this paper, we study joint models of general longitudinal outcomes and survival endpoint but allow the underlying distribution of shared random effect to be completely unknown. For inference, we propose to use a mixture of Gaussian distributions as an approximation to this unknown distribution and adopt an Expectation-Maximization (EM) algorithm for computation. Either AIC or BIC criteria are adopted for selecting the number of mixtures. We demonstrate the proposed method via a number of simulation studies. We illustrate our approach with the data from the Carolina Head and Neck Cancer Study (CHANCE).
Affiliation(s)
- Jaeun Choi
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, 1300 Morris Park Avenue, New York, NY, 10461, USA
- Donglin Zeng
- Department of Biostatistics, University of North Carolina at Chapel Hill, McGavran-Greenberg Hl, 135 Dauer Drive, CB 7420, Chapel Hill, NC, 27599, USA
- Andrew F Olshan
- Department of Epidemiology, University of North Carolina at Chapel Hill, McGavran-Greenberg Hl, 135 Dauer Drive, CB 7435, Chapel Hill, NC, 27599, USA
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, McGavran-Greenberg Hl, 135 Dauer Drive, CB 7420, Chapel Hill, NC, 27599, USA.
10
McParland D, Phillips CM, Brennan L, Roche HM, Gormley IC. Clustering high-dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data. Stat Med 2017; 36:4548-4569. [PMID: 28664564 DOI: 10.1002/sim.7371] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 04/29/2016] [Revised: 04/28/2017] [Accepted: 05/23/2017] [Indexed: 12/31/2022]
Abstract
The LIPGENE-SU.VI.MAX study, like many others, recorded high-dimensional continuous phenotypic data and categorical genotypic data. LIPGENE-SU.VI.MAX focuses on the need to account for both phenotypic and genetic factors when studying the metabolic syndrome (MetS), a complex disorder that can lead to higher risk of type 2 diabetes and cardiovascular disease. Interest lies in clustering the LIPGENE-SU.VI.MAX participants into homogeneous groups or sub-phenotypes, by jointly considering their phenotypic and genotypic data, and in determining which variables are discriminatory. A novel latent variable model that elegantly accommodates high dimensional, mixed data is developed to cluster LIPGENE-SU.VI.MAX participants using a Bayesian finite mixture model. A computationally efficient variable selection algorithm is incorporated, estimation is via a Gibbs sampling algorithm and an approximate BIC-MCMC criterion is developed to select the optimal model. Two clusters or sub-phenotypes ('healthy' and 'at risk') are uncovered. A small subset of variables is deemed discriminatory, which notably includes phenotypic and genotypic variables, highlighting the need to jointly consider both factors. Further, 7 years after the LIPGENE-SU.VI.MAX data were collected, participants underwent further analysis to diagnose presence or absence of the MetS. The two uncovered sub-phenotypes strongly correspond to the 7-year follow-up disease classification, highlighting the role of phenotypic and genotypic factors in the MetS and emphasising the potential utility of the clustering approach in early screening. Additionally, the ability of the proposed approach to define the uncertainty in sub-phenotype membership at the participant level is synonymous with the concepts of precision medicine and nutrition. Copyright © 2017 John Wiley & Sons, Ltd.
Affiliation(s)
- D McParland
- School of Mathematics and Statistics, University College Dublin, Dublin, Ireland
- C M Phillips
- HRB Centre for Diet and Health Research, Department of Epidemiology and Public Health, University College Cork, Cork, Ireland; HRB Centre for Diet and Health Research, School of Public Health, Physiotherapy and Sports Science, University College Dublin, Dublin, Ireland; Nutrigenomics Research Group, UCD Conway Institute, University College Dublin, Dublin, Ireland
- L Brennan
- School of Agriculture and Food Science, UCD Institute of Food and Health, University College Dublin, Dublin, Ireland
- H M Roche
- Nutrigenomics Research Group, UCD Conway Institute, University College Dublin, Dublin, Ireland
- I C Gormley
- School of Mathematics and Statistics, University College Dublin, Dublin, Ireland; INSIGHT: The National Centre for Data Analytics, University College Dublin, Dublin, Ireland
11
Ranalli M, Rocci R. A Model-Based Approach to Simultaneous Clustering and Dimensional Reduction of Ordinal Data. Psychometrika 2017; 82:1007-1034. [PMID: 28879568 DOI: 10.1007/s11336-017-9578-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2015] [Revised: 05/22/2017] [Indexed: 06/07/2023]
Abstract
The literature on clustering for continuous data is rich and wide; differently, that one developed for categorical data is still limited. In some cases, the clustering problem is made more difficult by the presence of noise variables/dimensions that do not contain information about the clustering structure and could mask it. The aim of this paper is to propose a model for simultaneous clustering and dimensionality reduction of ordered categorical data able to detect the discriminative dimensions discarding the noise ones. Following the underlying response variable approach, the observed variables are considered as a discretization of underlying first-order latent continuous variables distributed as a Gaussian mixture. To recognize discriminative and noise dimensions, these variables are considered to be linear combinations of two independent sets of second-order latent variables where only one contains the information about the cluster structure while the other one contains noise dimensions. The model specification involves multidimensional integrals that make the maximum likelihood estimation cumbersome and in some cases infeasible. To overcome this issue, the parameter estimation is carried out through an EM-like algorithm maximizing a composite log-likelihood based on low-dimensional margins. Examples of application of the proposal on real and simulated data are performed to show the effectiveness of the proposal.
Affiliation(s)
- Monia Ranalli
- Department of Statistics, The Pennsylvania State University, State College, PA, USA.
- Roberto Rocci
- Department of Economics and Finance, University of Tor Vergata, Rome, Italy
12
Ranalli M, Rocci R. Mixture models for mixed-type data through a composite likelihood approach. Comput Stat Data Anal 2017. [DOI: 10.1016/j.csda.2016.12.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/16/2022]
13
Baghfalaki T, Ganjali M, Verbeke G. A shared parameter model of longitudinal measurements and survival time with heterogeneous random-effects distribution. J Appl Stat 2016. [DOI: 10.1080/02664763.2016.1266309] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/20/2022]
Affiliation(s)
- Taban Baghfalaki
- Department of Statistics, Faculty of Mathematical Sciences, Tarbiat Modares University, Tehran, Iran
- School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Mojtaba Ganjali
- Department of Statistics, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Geert Verbeke
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Katholieke Universiteit Leuven, Leuven, Belgium
14
15
Wall MM, Park JY, Moustaki I. IRT Modeling in the Presence of Zero-Inflation With Application to Psychiatric Disorder Severity. Applied Psychological Measurement 2015; 39:583-597. [PMID: 29881029 PMCID: PMC5978495 DOI: 10.1177/0146621615588184] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 05/19/2023]
Abstract
Item response theory (IRT) has been increasingly utilized in psychiatry for the purpose of describing the relationship among items in psychiatric disorder symptom batteries hypothesized to be indicators of an underlying latent continuous trait representing the severity of the psychiatric disorder. It is common to find zero-inflated (ZI) data such that a large proportion of the sample has none of the symptoms. It has been argued that standard IRT models of psychiatric disorder symptoms may be problematic due to the unipolar nature of many clinical traits. In the current article, the authors propose to address this by using a mixture model to approximate the unknown latent trait distribution in the IRT model while allowing for the presence of a non-pathological subgroup. The basic idea is that instead of assuming normality for the underlying trait, the latent trait will be allowed to follow a mixture of normals including a degenerate component that is fixed to represent a non-pathological group for whom the psychiatric symptoms simply are not relevant and hence are all expected to be zero. The authors demonstrate how the ZI mixture IRT method can be implemented in Mplus and present a simulation study comparing its performance with a standard IRT model assuming normality under different scenarios representative of psychiatric disorder symptom batteries. The model incorrectly assuming normality is shown to have biased discrimination and severity estimates. An application further illustrates the method using data from an alcohol use disorder criteria battery.
Affiliation(s)
- Melanie M. Wall
- Columbia University, New York, NY, USA
- New York State Psychiatric Institute, New York City, USA
- Irini Moustaki
- London School of Economics and Political Science, UK
- Irini Moustaki, Department of Statistics, London School of Economics, Houghton Street, London WC2A 2AE, UK.
16
17
Cagnone S, Viroli C. A factor mixture model for analyzing heterogeneity and cognitive structure of dementia. AStA Advances in Statistical Analysis 2013. [DOI: 10.1007/s10182-012-0206-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/27/2022]