1
Milligan BG, Rohde AT. Why More Biologists Must Embrace Quantitative Modeling. Integr Comp Biol 2024; 64:975-986. [PMID: 38740442 DOI: 10.1093/icb/icae038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 04/26/2024] [Accepted: 05/01/2024] [Indexed: 05/16/2024]
Abstract
Biology as a field has transformed since the time of its foundation from an organized enterprise cataloging the diversity of the natural world to a quantitatively rigorous science seeking to answer complex questions about the functions of organisms and their interactions with each other and their environments. As the mathematical rigor of biological analyses has improved, quantitative models have been developed to describe multi-mechanistic systems and to test complex hypotheses. However, applications of quantitative models have been uneven across fields, and many biologists lack the foundational training necessary to apply them in their research or to interpret their results to inform biological problem-solving efforts. This gap in scientific training has created a false dichotomy of "biologists" and "modelers" that only exacerbates the barriers to working biologists seeking additional training in quantitative modeling. Here, we make the argument that all biologists are modelers and are capable of using sophisticated quantitative modeling in their work. We highlight four benefits of conducting biological research within the framework of quantitative models, identify the potential producers and consumers of information produced by such models, and make recommendations for strategies to overcome barriers to their widespread implementation. Improved understanding of quantitative modeling could guide the producers of biological information to better apply biological measurements through analyses that evaluate mechanisms, and allow consumers of biological information to better judge the quality and applications of the information they receive. As our explanations of biological phenomena increase in complexity, so too must we embrace modeling as a foundational skill.
Affiliation(s)
- Brook G Milligan
- Department of Biology, New Mexico State University, Las Cruces, NM 88001, USA
- Ashley T Rohde
- Department of Biology, New Mexico State University, Las Cruces, NM 88001, USA
2
Liu SH, Chen Y, Kuiper JR, Ho E, Buckley JP, Feuerstahler L. Applying Latent Variable Models to Estimate Cumulative Exposure Burden to Chemical Mixtures and Identify Latent Exposure Subgroups: A Critical Review and Future Directions. STATISTICS IN BIOSCIENCES 2024; 16:482-502. [PMID: 39494216 PMCID: PMC11529820 DOI: 10.1007/s12561-023-09410-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/06/2022] [Revised: 11/03/2023] [Accepted: 11/11/2023] [Indexed: 11/05/2024]
Abstract
Environmental mixtures, which reflect joint exposure to multiple environmental agents, are a major focus of environmental health and risk assessment research. Advancements in latent variable modeling and psychometrics can be used to address contemporary questions in environmental mixtures research. In particular, latent variable models can quantify an individual's cumulative exposure burden to mixtures and identify hidden subpopulations with distinct exposure patterns. Here, we first provide a review of measurement approaches from the psychometrics field, including structural equation modeling and latent class/profile analysis, and discuss their prior environmental epidemiologic applications. Then, we discuss additional, underutilized opportunities to leverage the strengths of psychometric approaches. This includes using item response theory to create a common scale for comparing exposure burden scores across studies; facilitating data harmonization through the use of anchors. We also discuss studying fairness or appropriateness of measurement models to quantify exposure burden across diverse populations, through the use of mixture item response theory and through evaluation of measurement invariance and differential item functioning. Multi-dimensional models to quantify correlated exposure burden sub-scores, and methods to adjust for imprecision of chemical exposure data, are also discussed. We show that there is great potential to address pressing environmental epidemiology and exposure science questions using latent variable methods.
Affiliation(s)
- Shelley H. Liu
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yitong Chen
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jordan R. Kuiper
- Department of Environmental and Occupational Health, The George Washington University Milken Institute School of Public Health, Washington, DC, USA
- Emily Ho
- Medical Social Sciences, Northwestern University, Chicago, IL, USA
- Jessie P. Buckley
- Department of Environmental Health and Engineering, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
3
Sørensen Ø, Fjell AM, Walhovd KB. Longitudinal Modeling of Age-Dependent Latent Traits with Generalized Additive Latent and Mixed Models. PSYCHOMETRIKA 2023; 88:456-486. [PMID: 36976415 PMCID: PMC10188428 DOI: 10.1007/s11336-023-09910-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Indexed: 05/17/2023]
Abstract
We present generalized additive latent and mixed models (GALAMMs) for analysis of clustered data with responses and latent variables depending smoothly on observed variables. A scalable maximum likelihood estimation algorithm is proposed, utilizing the Laplace approximation, sparse matrix computation, and automatic differentiation. Mixed response types, heteroscedasticity, and crossed random effects are naturally incorporated into the framework. The models developed were motivated by applications in cognitive neuroscience, and two case studies are presented. First, we show how GALAMMs can jointly model the complex lifespan trajectories of episodic memory, working memory, and speed/executive function, measured by the California Verbal Learning Test (CVLT), digit span tests, and Stroop tests, respectively. Next, we study the effect of socioeconomic status on brain structure, using data on education and income together with hippocampal volumes estimated by magnetic resonance imaging. By combining semiparametric estimation with latent variable modeling, GALAMMs allow a more realistic representation of how brain and cognition vary across the lifespan, while simultaneously estimating latent traits from measured items. Simulation experiments suggest that model estimates are accurate even with moderate sample sizes.
Affiliation(s)
- Anders M Fjell
- Department of Psychology, University of Oslo, Oslo, Norway
- Department of Radiology and Nuclear Medicine, Oslo University Hospital, Oslo, Norway
- Kristine B Walhovd
- Department of Psychology, University of Oslo, Oslo, Norway
- Department of Radiology and Nuclear Medicine, Oslo University Hospital, Oslo, Norway
4
Jiang Y, Zhang N. Does commerce promote theft? A quantitative study from Beijing, China. HUMANITIES & SOCIAL SCIENCES COMMUNICATIONS 2023; 10:203. [PMID: 37192942 PMCID: PMC10161168 DOI: 10.1057/s41599-023-01706-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 04/18/2023] [Indexed: 05/18/2023]
Abstract
Commerce, as both an environmental and a social factor, is essential to the study of the causes of urban crimes. This paper aims to comprehensively propose research hypotheses based on these two types of commercial factors and optimise statistical tools with which to analyse commerce's impact on the level of theft in Beijing. Combining criminal verdicts, census data, points of interest, and information on nighttime lighting, this paper first applies a hierarchical regression model to verify the effectiveness of using commercial environmental and social factors to explain theft statistics and then constructs a structural equation model to analyse the joint influence of multiple commercial factors on those statistics. This paper finds that Beijing's commerce does not significantly promote theft, verifies the effectiveness of two types of commercial variables and the corresponding Western theories in explaining commerce's impact on theft in Beijing, and provides empirical data for the study of the causes of theft in a non-Western context.
Affiliation(s)
- Yutian Jiang
- Department of Economics, School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Na Zhang
- Department of Economics, School of Economics and Management, Beijing Jiaotong University, Beijing, China
- Beijing Laboratory of National Economic Security Early-warning Engineering, Beijing Jiaotong University, Beijing, China
5
Ranalli MG, Matei A, Neri A. Generalised calibration with latent variables for the treatment of unit nonresponse in sample surveys. STAT METHOD APPL-GER 2022. [DOI: 10.1007/s10260-022-00646-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
Sample surveys may suffer from nonignorable unit nonresponse. This happens when the decision of whether or not to participate in the survey is correlated with variables of interest; in such a case, nonresponse produces biased estimates for parameters related to those variables, even after adjustments that account for auxiliary information. This paper presents a method to deal with nonignorable unit nonresponse that uses generalised calibration and latent variable modelling. Generalised calibration makes it possible to model unit nonresponse using a set of auxiliary variables (instrumental or model variables) that can be different from those used in the calibration constraints (calibration variables). We propose to use latent variables to estimate the probability of participating in the survey and to construct a reweighting system incorporating such latent variables. The proposed methodology is illustrated, and its properties are discussed and tested in two simulation studies. Finally, it is applied to adjust estimates of the finite population mean wealth from the Italian Survey of Household Income and Wealth.
6
Donneyong MM, Fischer MA, Langston MA, Joseph JJ, Juarez PD, Zhang P, Kline DM. Examining the Drivers of Racial/Ethnic Disparities in Non-Adherence to Antihypertensive Medications and Mortality Due to Heart Disease and Stroke: A County-Level Analysis. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph182312702. [PMID: 34886429 PMCID: PMC8657217 DOI: 10.3390/ijerph182312702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Revised: 11/30/2021] [Accepted: 11/30/2021] [Indexed: 11/16/2022]
Abstract
Background: Prior research has identified disparities in anti-hypertensive medication (AHM) non-adherence between Black/African Americans (BAAs) and non-Hispanic Whites (nHWs) but the role of determinants of health in these gaps is unclear. Non-adherence to AHM may be associated with increased mortality (due to heart disease and stroke) and the extent to which such associations are modified by contextual determinants of health may inform future interventions. Methods: We linked the Centers for Disease Control and Prevention (CDC) Atlas of Heart Disease and Stroke (2014-2016) and the 2016 County Health Ranking (CHR) dataset to investigate the associations between AHM non-adherence, mortality, and determinants of health. A proportion of days covered (PDC) with AHM < 80%, was considered as non-adherence. We computed the prevalence rate ratio (PRR)-the ratio of the prevalence among BAAs to that among nHWs-as an index of BAA-nHW disparity. Hierarchical linear models (HLM) were used to assess the role of four pre-defined determinants of health domains-health behaviors, clinical care, social and economic and physical environment-as contributors to BAA-nHW disparities in AHM non-adherence. A Bayesian paradigm framework was used to quantify the associations between AHM non-adherence and mortality (heart disease and stroke) and to assess whether the determinants of health factors moderated these associations. Results: Overall, BAAs were significantly more likely to be non-adherent: PRR = 1.37, 95% Confidence Interval (CI):1.36, 1.37. The four county-level constructs of determinants of health accounted for 24% of the BAA-nHW variation in AHM non-adherence. The clinical care (β = -0.21, p < 0.001) and social and economic (β = -0.11, p < 0.01) domains were significantly inversely associated with the observed BAA-nHW disparity. AHM non-adherence was associated with both heart disease and stroke mortality among both BAAs and nHWs. 
We observed that the determinants of health, specifically clinical care and physical environment domains, moderated the effects of AHM non-adherence on heart disease mortality among BAAs but not among nHWs. For the AHM non-adherence-stroke mortality association, the determinants of health did not moderate this association among BAAs; the social and economic domain did moderate this association among nHWs. Conclusions: The socioeconomic, clinical care and physical environmental attributes of the places that patients live are significant contributors to BAA-nHW disparities in AHM non-adherence and mortality due to heart diseases and stroke.
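The two quantities defined in the Methods above, proportion of days covered (PDC) and the prevalence rate ratio (PRR), can be sketched in a few lines of Python. This is only an illustration of the definitions as stated in the abstract, not the authors' code: the function names and all input numbers are hypothetical and are not taken from the study.

```python
def proportion_of_days_covered(days_covered, days_in_period):
    """PDC: fraction of days in the observation period with medication on hand."""
    return days_covered / days_in_period


def is_non_adherent(pdc, threshold=0.80):
    """Non-adherence as defined in the abstract: PDC strictly below 80%."""
    return pdc < threshold


def prevalence_rate_ratio(prevalence_group_a, prevalence_group_b):
    """PRR: prevalence in one group divided by prevalence in the reference group."""
    return prevalence_group_a / prevalence_group_b


# Hypothetical patient: medication on hand for 270 of 365 days.
pdc = proportion_of_days_covered(270, 365)
flag = is_non_adherent(pdc)

# Hypothetical county: non-adherence prevalence of 0.37 in one group
# and 0.27 in the reference group.
prr = prevalence_rate_ratio(0.37, 0.27)

print(f"PDC = {pdc:.3f}, non-adherent = {flag}, PRR = {prr:.2f}")
```

A PRR above 1 means the first group has the higher prevalence of non-adherence; a value of exactly 1 would indicate no disparity on this index.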
Affiliation(s)
- Macarius M. Donneyong
- College of Pharmacy, The Ohio State University, Columbus, OH 43210, USA
- Correspondence: Tel.: +614-292-0075
- Michael A. Fischer
- General Internal Medicine at Boston Medical Center, Boston University School of Medicine, Boston, MA 02118, USA
- Michael A. Langston
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA
- Joshua J. Joseph
- College of Medicine, The Ohio State University Wexner Medical Center, Columbus, OH 43210, USA
- Paul D. Juarez
- Department of Family and Community Medicine, Meharry Medical College, Nashville, TN 37208, USA
- Ping Zhang
- Division of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- David M. Kline
- Department of Biostatistics and Data Science, Division of Public Health Sciences, Wake Forest School of Medicine, Winston-Salem, NC 27101, USA
7
Adam NS, Twabi HS, Manda SOM. A simulation study for evaluating the performance of clustering measures in multilevel logistic regression. BMC Med Res Methodol 2021; 21:245. [PMID: 34772354 PMCID: PMC8590272 DOI: 10.1186/s12874-021-01417-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2021] [Accepted: 09/22/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Multilevel logistic regression models are widely used in health sciences research to account for clustering in multilevel data when estimating effects on subject binary outcomes of individual-level and cluster-level covariates. Several measures for quantifying between-cluster heterogeneity have been proposed. This study compared the performance of between-cluster variance based heterogeneity measures (the Intra-class Correlation Coefficient (ICC) and the Median Odds Ratio (MOR)), and cluster-level covariate based heterogeneity measures (the 80% Interval Odds Ratio (IOR-80) and the Sorting Out Index (SOI)). METHODS We used several simulation datasets of a two-level logistic regression model to assess the performance of the four clustering measures for a multilevel logistic regression model. We also empirically compared the four measures of cluster variation with an analysis of childhood anemia to investigate the importance of unexplained heterogeneity between communities and community geographic type (rural vs urban) effect in Malawi. RESULTS Our findings showed that the estimates of SOI and ICC were generally unbiased with at least 10 clusters and a cluster size of at least 20. On the other hand, estimates of MOR and IOR-80 were less accurate with 50 or fewer clusters regardless of the cluster size. The performance of the four clustering measures improved with increased clusters and cluster size at all cluster variances. In the analysis of childhood anemia, the estimate of the between-community variance was 0.455, and the effect of community geographic type (rural vs urban) had an odds ratio (OR)=1.21 (95% CI: 0.97, 1.52). 
The resulting estimates of ICC, MOR, IOR-80 and SOI were 0.122 (indicative of low homogeneity of childhood anemia in the same community); 1.898 (indicative of large unexplained heterogeneity); 0.345-3.978 and 56.7% (implying that the between community heterogeneity was more significant in explaining the variations in childhood anemia than the estimated effect of community geographic type (rural vs urban)), respectively. CONCLUSION At least 300 clusters with sizes of at least 50 would be adequate to estimate the strength of clustering in multilevel logistic regression with negligible bias. We recommend using the SOI to assess unexplained heterogeneity between clusters when the interest also involves the effect of cluster-level covariates, otherwise, the usual intra-cluster correlation coefficient would suffice in multilevel logistic regression analyses.
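As a rough check on how two of these measures relate to the between-cluster variance, the sketch below uses the standard latent-variable formulas for a two-level logistic model: ICC = σ² / (σ² + π²/3) and MOR = exp(√(2σ²) · z₀.₇₅), where z₀.₇₅ is the 75th percentile of the standard normal distribution. It assumes these are the definitions the authors used, which the abstract does not state; the function names are illustrative. With the reported variance of 0.455, the results agree with the reported ICC and MOR to about two decimal places, with small differences that are plausibly due to rounding of the variance.

```python
import math
from statistics import NormalDist


def icc_latent(between_cluster_variance):
    """ICC on the latent (logistic) scale; residual variance is pi^2 / 3."""
    return between_cluster_variance / (between_cluster_variance + math.pi ** 2 / 3)


def median_odds_ratio(between_cluster_variance):
    """MOR: median odds ratio between two randomly chosen clusters."""
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal
    return math.exp(math.sqrt(2 * between_cluster_variance) * z75)


# Between-community variance reported in the abstract.
sigma2 = 0.455
icc = icc_latent(sigma2)
mor = median_odds_ratio(sigma2)

print(f"ICC = {icc:.4f}, MOR = {mor:.4f}")
```

Both measures depend only on the between-cluster variance, which is why the abstract groups them as variance-based, in contrast to the IOR-80 and SOI, which also involve a cluster-level covariate effect.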
Affiliation(s)
- Nicholas Siame Adam
- Department of Mathematical Sciences, University of Malawi, Chirunga, Zomba, P.O. Box 280, Malawi; African Institute for Development Policy, Petroda Glasshouse, Area 14, plot number 14/191, Lilongwe 3, 31024, Malawi
- Halima S Twabi
- Department of Mathematical Sciences, University of Malawi, Chirunga, Zomba, P.O. Box 280, Malawi
- Samuel O M Manda
- Biostatistics Research Unit, South African Medical Research Council, Pretoria, South Africa; Department of Statistics, University of Pretoria, Pretoria, South Africa
8
Zhang A, Fang J, Hu W, Calhoun VD, Wang YP. A Latent Gaussian Copula Model for Mixed Data Analysis in Brain Imaging Genetics. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1350-1360. [PMID: 31689199 PMCID: PMC7756188 DOI: 10.1109/tcbb.2019.2950904] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Recent advances in imaging genetics make it possible to combine different types of data, including medical images like functional magnetic resonance imaging (fMRI) and genetic data like single nucleotide polymorphisms (SNPs), for comprehensive diagnosis of mental disorders. Understanding complex interactions among these heterogeneous data may give rise to a new perspective, while at the same time demanding statistical models for their integration. Various graphical models have been proposed for the study of interaction or association networks with continuous, binary, and count data, as well as mixtures of them. However, limited efforts have been made for the multinomial case, for instance, SNP data. Our goal is therefore to fill the void by developing a graphical model for the integration of fMRI image and SNP data, which can provide deeper understanding of the unknown neurogenetic mechanism. In this article, we propose a latent Gaussian copula model for mixed data containing multinomial components. We assume that the discrete variable is obtained by discretizing a latent (unobserved) continuous variable and then create a semi-rank based estimator of the graph structure. The simulation results demonstrate that the proposed latent correlation has more stable and accurate performance than several existing methods in detecting graph structure. When applied to real schizophrenia data consisting of SNP array and fMRI image data collected by the Mind Clinical Imaging Consortium (MCIC), the proposed method reveals a set of distinct SNP-brain associations, which are verified to be biologically significant. The proposed model is statistically promising in handling mixed types of data including multinomial components, and can find widespread applications. To promote reproducible research, the R code is available at https://github.com/Aiying0512/LGCM.
9
Congdon P. A diabetes risk index for small areas in England. Health Place 2020; 63:102340. [PMID: 32543429 DOI: 10.1016/j.healthplace.2020.102340] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/26/2019] [Revised: 03/26/2020] [Accepted: 04/06/2020] [Indexed: 01/03/2023]
Abstract
UK and international studies point to significant area variation in diabetes risk, and summary indices of diabetic risk are potentially of value in effective targeting of health interventions and healthcare resources. This paper aims to develop a summary measure of the diabetic risk environment which can act as an index for targeting health care resources. The diabetes risk index is for 6791 English small areas (which provide entire coverage of England) and has advantages in incorporating evidence from both diabetes outcomes and area risk factors, and in including spatial correlation in its construction. The analysis underlying the risk index shows that area socio-economic status, social fragmentation and south Asian ethnic concentration are all positive risk factors for diabetes risk. However, urban-rural and regional differences in risk intersect with these socio-demographic influences.
Affiliation(s)
- Peter Congdon
- School of Geography, Queen Mary University of London, Mile End Rd, London, E1 4NS, UK.
10
Yousefi A, Basu I, Paulk AC, Peled N, Eskandar EN, Dougherty DD, Cash SS, Widge AS, Eden UT. Decoding Hidden Cognitive States From Behavior and Physiology Using a Bayesian Approach. Neural Comput 2019; 31:1751-1788. [DOI: 10.1162/neco_a_01196] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/17/2022]
Abstract
Cognitive processes, such as learning and cognitive flexibility, are both difficult to measure and to sample continuously using objective tools because cognitive processes arise from distributed, high-dimensional neural activity. For both research and clinical applications, that dimensionality must be reduced. To reduce dimensionality and measure underlying cognitive processes, we propose a modeling framework in which a cognitive process is defined as a low-dimensional dynamical latent variable—called a cognitive state, which links high-dimensional neural recordings and multidimensional behavioral readouts. This framework allows us to decompose the hard problem of modeling the relationship between neural and behavioral data into separable encoding-decoding approaches. We first use a state-space modeling framework, the behavioral decoder, to articulate the relationship between an objective behavioral readout (e.g., response times) and cognitive state. The second step, the neural encoder, involves using a generalized linear model (GLM) to identify the relationship between the cognitive state and neural signals, such as local field potential (LFP). We then use the neural encoder model and a Bayesian filter to estimate cognitive state using neural data (LFP power) to generate the neural decoder. We provide goodness-of-fit analysis and model selection criteria in support of the encoding-decoding result. We apply this framework to estimate an underlying cognitive state from neural data in human participants ([Formula: see text]) performing a cognitive conflict task. We successfully estimated the cognitive state within the 95% confidence intervals of that estimated using behavior readout for an average of 90% of task trials across participants. In contrast to previous encoder-decoder models, our proposed modeling framework incorporates LFP spectral power to encode and decode a cognitive state. 
The framework allowed us to capture the temporal evolution of the underlying cognitive processes, which could be key to the development of closed-loop experiments and treatments.
Affiliation(s)
- Ali Yousefi
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, U.S.A
- Ishita Basu
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, U.S.A
- Angelique C. Paulk
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, U.S.A
- Noam Peled
- Department of Radiology, MBGH/HST Martinos Center for Biomedical Imaging and Harvard Medical School, Boston, MA 02114, U.S.A
- Emad N. Eskandar
- Department of Neurological Surgery, Albert Einstein College of Medicine, Bronx, NY 10461, U.S.A
- Darin D. Dougherty
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Charlestown, MA 02129, U.S.A
- Sydney S. Cash
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, U.S.A
- Alik S. Widge
- Department of Psychiatry, University of Minnesota, Minneapolis, MN 55454, U.S.A
- Uri T. Eden
- Department of Mathematics and Statistics, Boston University, Boston, MA 02215, U.S.A
11
Gronsbell J, Minnier J, Yu S, Liao K, Cai T. Automated feature selection of predictors in electronic medical records data. Biometrics 2019; 75:268-277. [PMID: 30353541 DOI: 10.1111/biom.12987] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 04/30/2017] [Accepted: 10/01/2018] [Indexed: 01/29/2023]
Abstract
The use of Electronic Health Records (EHR) for translational research can be challenging due to difficulty in extracting accurate disease phenotype data. Historically, EHR algorithms for annotating phenotypes have been either rule-based or trained with billing codes and gold standard labels curated via labor intensive medical chart review. These simplistic algorithms tend to have unpredictable portability across institutions and low accuracy for many disease phenotypes due to imprecise billing codes. Recently, more sophisticated machine learning algorithms have been developed to improve the robustness and accuracy of EHR phenotyping algorithms. These algorithms are typically trained via supervised learning, relating gold standard labels to a wide range of candidate features including billing codes, procedure codes, medication prescriptions and relevant clinical concepts extracted from narrative notes via Natural Language Processing (NLP). However, due to the time intensiveness of gold standard labeling, the size of the training set is often insufficient to build a generalizable algorithm with the large number of candidate features extracted from EHR. To reduce the number of candidate predictors and in turn improve model performance, we present an automated feature selection method based entirely on unlabeled observations. The proposed method generates a comprehensive surrogate for the underlying phenotype with an unsupervised clustering of disease status based on several highly predictive features such as diagnosis codes and mentions of the disease in text fields available in the entire set of EHR data. A sparse regression model is then built with the estimated outcomes and remaining covariates to identify those features most informative of the phenotype of interest. Relying on the results of Li and Duan (1989), we demonstrate that variable selection for the underlying phenotype model can be achieved by fitting the surrogate-based model. 
We explore the performance of our methods in numerical simulations and present the results of a prediction model for Rheumatoid Arthritis (RA) built on a large EHR data mart from the Partners Health System consisting of billing codes and NLP terms. Empirical results suggest that our procedure reduces the number of gold-standard labels necessary for phenotyping thereby harnessing the automated power of EHR data and improving efficiency.
Affiliation(s)
- Jessica Gronsbell
- Department of Biomedical Data Science, Stanford University, Stanford, California
- Jessica Minnier
- OHSU-PSU School of Public Health, Oregon Health & Science University, Portland, Oregon
- Sheng Yu
- Center for Statistical Science, Tsinghua University, Beijing, China
- Tianxi Cai
- Department of Biostatistics, Harvard University, Boston, Massachusetts
12
Silva SSM, Jayawardana MW, Meyer D. Statistical methods to model and evaluate physical activity programs, using step counts: A systematic review. PLoS One 2018; 13:e0206763. [PMID: 30388164 PMCID: PMC6214537 DOI: 10.1371/journal.pone.0206763] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022]
Abstract
Background Physical activity reduces the risk of noncommunicable diseases and is therefore an essential component of a healthy lifestyle. Regular engagement in physical activity can produce immediate and long term health benefits. However, physical activity levels are not as high as might be expected. For example, according to the global World Health Organization (WHO) 2017 statistics, more than 80% of the world’s adolescents are insufficiently physically active. In response to this problem, physical activity programs have become popular, with step counts commonly used to measure program performance. Analysing step count data and the statistical modeling of this data is therefore important for evaluating individual and program performance. This study reviews the statistical methods that are used to model and evaluate physical activity programs, using step counts. Methods Adhering to PRISMA guidelines, this review systematically searched for relevant journal articles which were published between January 2000 and August 2017 in any of three databases (PubMed, PsycINFO and Web of Science). Only the journal articles which used a statistical model in analysing step counts for a healthy sample of participants, enrolled in an intervention involving physical exercise or a physical activity program, were included in this study. In these programs the activities considered were natural elements of everyday life rather than special activity interventions. Results This systematic review was able to identify 78 unique articles describing statistical models for analysing step counts obtained through physical activity programs. General linear models and generalized linear models were the most popular methods used followed by multilevel models, while structural equation modeling was only used for measuring the personal and psychological factors related to step counts. Surprisingly no use was made of time series analysis for analysing step count data. 
The review also suggested several strategies for the personalisation of physical activity programs. Conclusions Overall, it appears that the physical activity levels of people involved in such programs vary across individuals depending on psychosocial, demographic, weather and climatic factors. Statistical models can provide a better understanding of the impact of these factors, allowing for the provision of more personalised physical activity programs, which are expected to produce better immediate and long-term outcomes for participants. It is hoped that this review will identify the statistical methods which are most suitable for this purpose.
Affiliation(s)
- S. S. M. Silva
- Department of Statistics, Data Science and Epidemiology, Swinburne University of Technology, Hawthorn, Victoria, Australia
- * E-mail:
| | - Madawa W. Jayawardana
- Department of Statistics, Data Science and Epidemiology, Swinburne University of Technology, Hawthorn, Victoria, Australia
| | - Denny Meyer
- Department of Statistics, Data Science and Epidemiology, Swinburne University of Technology, Hawthorn, Victoria, Australia
| |
|
13
|
Schauber SK, Hecht M, Nouns ZM. Why assessment in medical education needs a solid foundation in modern test theory. ADVANCES IN HEALTH SCIENCES EDUCATION : THEORY AND PRACTICE 2018; 23:217-232. [PMID: 28303398 DOI: 10.1007/s10459-017-9771-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/03/2015] [Accepted: 03/09/2017] [Indexed: 06/06/2023]
Abstract
Despite the frequent use of state-of-the-art psychometric models in the field of medical education, there is a growing body of literature that questions their usefulness in the assessment of medical competence. Essentially, a number of authors raised doubt about the appropriateness of psychometric models as a guiding framework to secure and refine current approaches to the assessment of medical competence. In addition, an intriguing phenomenon known as case specificity is specific to the controversy on the use of psychometric models for the assessment of medical competence. Broadly speaking, case specificity is the finding of instability of performances across clinical cases, tasks, or problems. As stability of performances is, generally speaking, a central assumption in psychometric models, case specificity may limit their applicability. This has probably fueled critiques of the field of psychometrics with a substantial amount of potential empirical evidence. This article aimed to explain the fundamental ideas employed in psychometric theory, and how they might be problematic in the context of assessing medical competence. We further aimed to show why and how some critiques do not hold for the field of psychometrics as a whole, but rather only for specific psychometric approaches. Hence, we highlight approaches that, from our perspective, seem to offer promising possibilities when applied in the assessment of medical competence. In conclusion, we advocate for a more differentiated view on psychometric models and their usage.
Affiliation(s)
- Stefan K Schauber
- Centre for Educational Measurement at the University of Oslo (CEMO) and Centre for Health Sciences Education, University of Oslo, Oslo, Norway.
| | - Martin Hecht
- Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Zineb M Nouns
- Institute of Medical Education, Faculty of Medicine, University of Bern, Konsumstrasse 13, 3010, Bern, Switzerland
| |
|
14
|
Lang JWB, Bliese PD, de Voogt A. Modeling consensus emergence in groups using longitudinal multilevel methods. PERSONNEL PSYCHOLOGY 2018. [DOI: 10.1111/peps.12260] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Indexed: 11/30/2022]
|
15
|
Luan H, Law J, Lysy M. Diving into the consumer nutrition environment: A Bayesian spatial factor analysis of neighborhood restaurant environment. Spat Spatiotemporal Epidemiol 2018; 24:39-51. [PMID: 29413713 DOI: 10.1016/j.sste.2017.12.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/11/2016] [Revised: 12/03/2017] [Accepted: 12/09/2017] [Indexed: 10/18/2022]
Abstract
Neighborhood restaurant environment (NRE) plays a vital role in shaping residents' eating behaviors. While NRE 'healthfulness' is a multifaceted concept, most studies evaluate it based only on restaurant type, thus largely ignoring variations of in-restaurant features. In the few studies that do account for such features, healthfulness scores are simply averaged over accessible restaurants, thereby concealing any uncertainty that is attributed to neighborhoods' size or spatial correlation. To address these limitations, this paper presents a Bayesian Spatial Factor Analysis for assessing NRE healthfulness in the city of Kitchener, Canada. Several in-restaurant characteristics are included. By treating NRE healthfulness as a spatially correlated latent variable, the adopted modeling approach can: (i) identify specific indicators most relevant to NRE healthfulness, (ii) provide healthfulness estimates for neighborhoods without accessible restaurants, and (iii) readily quantify uncertainties in the healthfulness index. Implications of the analysis for intervention program development and community food planning are discussed.
Affiliation(s)
- Hui Luan
- School of Geodesy and Geomatics, Wuhan University, 129 Luoyu Road, Wuchang District, Wuhan, Hubei, China; School of Human Kinetics and Recreation, Memorial University of Newfoundland, 230 Elizabeth Avenue, St. John's, NL, Canada.
| | - Jane Law
- School of Planning, University of Waterloo, 200 University Avenue West, Waterloo, ON, Canada; School of Public Health and Health Systems, University of Waterloo, 200 University Avenue West, Waterloo, ON, Canada.
| | - Martin Lysy
- Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, ON, Canada.
| |
|
16
|
Wu JY, Lin JJH, Nian MW, Hsiao YC. A Solution to Modeling Multilevel Confirmatory Factor Analysis with Data Obtained from Complex Survey Sampling to Avoid Conflated Parameter Estimates. Front Psychol 2017; 8:1464. [PMID: 29018369 PMCID: PMC5614970 DOI: 10.3389/fpsyg.2017.01464] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/14/2017] [Accepted: 08/15/2017] [Indexed: 11/13/2022] Open
Abstract
The issue of equality in the between- and within-level structures in Multilevel Confirmatory Factor Analysis (MCFA) models has been influential for obtaining unbiased parameter estimates and statistical inferences. A commonly seen condition is the inequality of factor loadings under equal level-varying structures. With mathematical investigation and Monte Carlo simulation, this study compared the robustness of five statistical models including two model-based (a true and a mis-specified model), one design-based, and two maximum models (two models where the full rank of variance-covariance matrix is estimated in between level and within level, respectively) in analyzing complex survey measurement data with level-varying factor loadings. The empirical data of 120 3rd graders' (from 40 classrooms) perceived Harter competence scale were modeled using MCFA and the parameter estimates were used as true parameters to perform the Monte Carlo simulation study. Results showed that maximum models were robust to unequal factor loadings, while the design-based and the mis-specified model-based approaches produced conflated results and spurious statistical inferences. We recommend the use of maximum models if researchers have limited information about the pattern of factor loadings and measurement structures. Measurement models are key components of Structural Equation Modeling (SEM); therefore, the findings can be generalized to multilevel SEM and CFA models. Mplus codes are provided for maximum models and other analytical models.
Affiliation(s)
- Jiun-Yu Wu
- Institute of Education, National Chiao Tung University, Hsinchu, Taiwan
| | - John J. H. Lin
- Office of Institutional Research, National Central University, Taoyuan, Taiwan
| | - Mei-Wen Nian
- Institute of Education, National Chiao Tung University, Hsinchu, Taiwan
| | - Yi-Cheng Hsiao
- Institute of Education, National Chiao Tung University, Hsinchu, Taiwan
| |
|
17
|
Su Z, Li D, Li H, Luo X. Boosting attribute recognition with latent topics by matrix factorization. J Assoc Inf Sci Technol 2017. [DOI: 10.1002/asi.23827] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
Affiliation(s)
- Zhuo Su
- School of Data and Computer Science; Sun Yat-sen University; Guangzhou China
| | - Donghui Li
- National Engineering Research Center of Digital Life; State-Province Joint Laboratory of Digital Home Interactive Applications, School of Data and Computer Science, Sun Yat-sen University; Guangzhou China
| | - Hanhui Li
- National Engineering Research Center of Digital Life; State-Province Joint Laboratory of Digital Home Interactive Applications, School of Data and Computer Science, Sun Yat-sen University; Guangzhou China
| | - Xiaonan Luo
- School of Electronics and Information Technology; Sun Yat-sen University; Guangzhou China
| |
|
18
|
Milla J, Martín ES, Van Bellegem S. Higher Education Value Added Using Multiple Outcomes. JOURNAL OF EDUCATIONAL MEASUREMENT 2016. [DOI: 10.1111/jedm.12114] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
|
19
|
Luan H, Minaker LM, Law J. Do marginalized neighbourhoods have less healthy retail food environments? An analysis using Bayesian spatial latent factor and hurdle models. Int J Health Geogr 2016; 15:29. [PMID: 27550019 PMCID: PMC4994297 DOI: 10.1186/s12942-016-0060-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 06/28/2016] [Accepted: 08/09/2016] [Indexed: 01/01/2023] Open
Abstract
Background Findings of whether marginalized neighbourhoods have less healthy retail food environments (RFE) are mixed across countries, in part because inconsistent approaches have been used to characterize RFE ‘healthfulness’ and marginalization, and researchers have used non-spatial statistical methods to respond to this ultimately spatial issue. Methods This study uses in-store features to categorize healthy and less healthy food outlets. Bayesian spatial hierarchical models are applied to explore the association between marginalization dimensions and RFE healthfulness (i.e., relative healthy food access that is modelled via a probability distribution) at various geographical scales. Marginalization dimensions are derived from a spatial latent factor model. Zero-inflation occurring at the walkable-distance scale is accounted for with a spatial hurdle model. Results Neighbourhoods with higher residential instability, material deprivation, and population density are more likely to have access to healthy food outlets within a walkable distance from a binary ‘have’ or ‘not have’ access perspective. At the walkable-distance scale, however, materially deprived neighbourhoods are found to have less healthy RFE (lower relative healthy food access). Conclusion Food intervention programs should be developed to strike a balance between healthy and less healthy food access in the study region as well as to improve opportunities for residents to buy and consume foods consistent with dietary recommendations.
Affiliation(s)
- Hui Luan
- Faculty of Environment, School of Planning, University of Waterloo, 200 University Avenue West, Waterloo, ON, Canada.
| | - Leia M Minaker
- Propel Centre for Population Health Impact, University of Waterloo, 200 University Avenue West, Waterloo, ON, Canada
| | - Jane Law
- Faculty of Environment, School of Planning, University of Waterloo, 200 University Avenue West, Waterloo, ON, Canada; Faculty of Applied Health Sciences, School of Public Health and Health System, University of Waterloo, 200 University Avenue West, Waterloo, ON, Canada
| |
|
20
|
He B, Luo S. Joint modeling of multivariate longitudinal measurements and survival data with applications to Parkinson's disease. Stat Methods Med Res 2016; 25:1346-58. [PMID: 23592717 PMCID: PMC3883896 DOI: 10.1177/0962280213480877] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Indexed: 11/16/2022]
Abstract
In many clinical trials studying neurodegenerative diseases, including Parkinson's disease (PD), multiple longitudinal outcomes are collected in order to fully explore the multidimensional impairment caused by these diseases. The follow-up of some patients can be stopped by an outcome-dependent terminal event, e.g. death or dropout. In this article, we develop a joint model that consists of a multilevel item response theory (MLIRT) model for the multiple longitudinal outcomes, and a Cox proportional hazards model with piecewise constant baseline hazards for the event time data. Shared random effects are used to link the two models. The model inference is conducted using a Bayesian framework via Markov Chain Monte Carlo simulation implemented in the BUGS language. Our proposed model is evaluated by simulation studies and is applied to the DATATOP study, a motivating clinical trial assessing the effect of tocopherol on PD among patients with early PD.
Affiliation(s)
- Bo He
- Division of Biostatistics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Sheng Luo
- Division of Biostatistics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| |
|
21
|
Item response theory and structural equation modelling for ordinal data: Describing the relationship between KIDSCREEN and Life-H. Stat Methods Med Res 2016; 25:1892-1924. [DOI: 10.1177/0962280213504177] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
Both item response theory and structural equation models are useful in the analysis of ordered categorical responses from health assessment questionnaires. We highlight the advantages and disadvantages of the item response theory and structural equation modelling approaches to modelling ordinal data, from within a community health setting. Using data from the SPARCLE project focussing on children with cerebral palsy, this paper investigates the relationship between two ordinal rating scales, the KIDSCREEN, which measures quality-of-life, and Life-H, which measures participation. Practical issues relating to fitting models, such as non-positive definite observed or fitted correlation matrices, and approaches to assessing model fit are discussed. Item response theory models allow properties such as the conditional independence of particular domains of a measurement instrument to be assessed. When, as with the SPARCLE data, the latent traits are multidimensional, structural equation models generally provide a much more convenient modelling framework.
|
22
|
Kruijver M. Characterizing the genetic structure of a forensic DNA database using a latent variable approach. Forensic Sci Int Genet 2016; 23:130-149. [DOI: 10.1016/j.fsigen.2016.03.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/09/2015] [Revised: 02/24/2016] [Accepted: 03/21/2016] [Indexed: 12/11/2022]
|
23
|
Singer M, Krivobokova T, Munk A, de Groot B. Partial least squares for dependent data. Biometrika 2016; 103:351-362. [DOI: 10.1093/biomet/asw010] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022] Open
|
24
|
Fan J, Liu H, Ning Y, Zou H. High dimensional semiparametric latent graphical model for mixed data. J R Stat Soc Series B Stat Methodol 2016. [DOI: 10.1111/rssb.12168] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Indexed: 01/20/2023]
Affiliation(s)
| | | | | | - Hui Zou
- University of Minnesota Minneapolis USA
| |
|
25
|
Rahbar MH, Ning J, Choi S, Piao J, Hong C, Huang H, Del Junco DJ, Fox EE, Rahbar E, Holcomb JB. A joint latent class model for classifying severely hemorrhaging trauma patients. BMC Res Notes 2015; 8:602. [PMID: 26498438 PMCID: PMC4620016 DOI: 10.1186/s13104-015-1563-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/23/2014] [Accepted: 10/05/2015] [Indexed: 01/03/2023] Open
Abstract
BACKGROUND In trauma research, "massive transfusion" (MT), historically defined as receiving ≥10 units of red blood cells (RBCs) within 24 h of admission, has been routinely used as a "gold standard" for quantifying bleeding severity. Due to early in-hospital mortality, however, MT is subject to survivor bias and thus a poorly defined criterion to classify bleeding trauma patients. METHODS Using the data from a retrospective trauma transfusion study, we applied a latent-class (LC) mixture model to identify severely hemorrhaging (SH) patients. Based on the joint distribution of cumulative units of RBCs and binary survival outcome at 24 h of admission, we applied an expectation-maximization (EM) algorithm to obtain model parameters. Estimated posterior probabilities were used for patients' classification and compared with the MT rule. To evaluate predictive performance of the LC-based classification, we examined the role of six clinical variables as predictors using two separate logistic regression models. RESULTS Out of 471 trauma patients, 211 (45 %) were MT, while our latent SH classifier identified only 127 (27 %) of patients as SH. The agreement between the two classification methods was 73 %. A non-ignorable portion of patients (17 out of 68, 25 %) who died within 24 h were not classified as MT but the SH group included 62 patients (91 %) who died during the same period. Our comparison of the predictive models based on MT and SH revealed significant differences between the coefficients of potential predictors of patients who may be in need of activation of the massive transfusion protocol. CONCLUSIONS The traditional MT classification does not adequately reflect transfusion practices and outcomes during the trauma reception and initial resuscitation phase. 
Although we have demonstrated that joint latent class modeling could be used to correct for potential bias caused by misclassification of severely bleeding patients, improvement in this approach could be made in the presence of time to event data from prospective studies.
Affiliation(s)
- Mohammad H Rahbar
- Division of Clinical and Translational Sciences, Department of Internal Medicine, The University of Texas Medical School at Houston, The University of Texas Health Science Center at Houston, Fannin St, Houston, TX, USA; Division of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Sciences Center at Houston, Pressler St, Houston, TX, USA.
| | - Jing Ning
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Holcombe Blvd, Houston, TX, USA.
| | - Sangbum Choi
- Division of Clinical and Translational Sciences, Department of Internal Medicine, The University of Texas Medical School at Houston, The University of Texas Health Science Center at Houston, Fannin St, Houston, TX, USA.
| | - Jin Piao
- Division of Biostatistics, School of Public Health, The University of Texas Health Sciences Center at Houston, Pressler St, Houston, TX, USA.
| | - Chuan Hong
- Division of Biostatistics, School of Public Health, The University of Texas Health Sciences Center at Houston, Pressler St, Houston, TX, USA.
| | - Hanwen Huang
- Epidemiology and Biostatistics, College of Public Health, University of Georgia, Buck Road, Athens, GA, 30602, USA.
| | - Deborah J Del Junco
- Division of Acute Care Surgery, Department of Surgery, Center for Translational Injury Research, The University of Texas Health Science Center at Houston, Fannin St, Houston, TX, USA.
| | - Erin E Fox
- Division of Acute Care Surgery, Department of Surgery, Center for Translational Injury Research, The University of Texas Health Science Center at Houston, Fannin St, Houston, TX, USA.
| | - Elaheh Rahbar
- Department of Biomedical Engineering, Wake Forest University, Winston-Salem, NC, USA.
| | - John B Holcomb
- Division of Acute Care Surgery, Department of Surgery, Center for Translational Injury Research, The University of Texas Health Science Center at Houston, Fannin St, Houston, TX, USA.
| |
|
26
|
de Vos S, Wardenaar KJ, Bos EH, Wit EC, de Jonge P. Decomposing the heterogeneity of depression at the person-, symptom-, and time-level: latent variable models versus multimode principal component analysis. BMC Med Res Methodol 2015; 15:88. [PMID: 26471992 PMCID: PMC4608190 DOI: 10.1186/s12874-015-0080-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 10/27/2014] [Accepted: 10/05/2015] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND Heterogeneity of psychopathological concepts such as depression hampers progress in research and clinical practice. Latent Variable Models (LVMs) have been widely used to reduce this problem by identification of more homogeneous factors or subgroups. However, heterogeneity exists at multiple levels (persons, symptoms, time) and LVMs cannot capture all these levels and their interactions simultaneously, which leads to incomplete models. Our objective is to briefly review the most widely used LVMs in depression research, illustrating their use and incompatibility in real data, and to consider an alternative, statistical approach, namely multimode principal component analysis (MPCA). METHODS We applied LVMs to data from 147 patients, who filled out the Quick Inventory of Depressive Symptomatology (QIDS) at 9 time points. Compatibility of the results and suitability of the LVMs to capture the heterogeneity of the data were evaluated. Alternatively, MPCA was used to simultaneously decompose depression on the person-, symptom- and time-level and to investigate the interactions between these levels. RESULTS QIDS-data could be decomposed on the person-level (2 classes), symptom-level (2 factors) and time-level (2 trajectory-classes). However, these results could not be integrated into a single model. Instead, MPCA allowed for decomposition of the data at the person- (3 components), symptom- (2 components) and time-level (2 components) and for the investigation of these components' interactions. CONCLUSIONS Traditional LVMs have limited use when trying to define an integrated model of depression heterogeneity at the person, symptom and time level. More integrative statistical techniques such as MPCA can be used to address these relatively complex data patterns and could be used in future attempts to identify empirically-based subtypes/phenotypes of depression.
Affiliation(s)
- Stijn de Vos
- University of Groningen, University Medical Center Groningen, Interdisciplinary Center Psychopathology and Emotion regulation (ICPE), (internal mail CC-72), P.O. Box 30.001, 9700 RB, Groningen, The Netherlands.
| | - Klaas J Wardenaar
- University of Groningen, University Medical Center Groningen, Interdisciplinary Center Psychopathology and Emotion regulation (ICPE), (internal mail CC-72), P.O. Box 30.001, 9700 RB, Groningen, The Netherlands.
| | - Elisabeth H Bos
- University of Groningen, University Medical Center Groningen, Interdisciplinary Center Psychopathology and Emotion regulation (ICPE), (internal mail CC-72), P.O. Box 30.001, 9700 RB, Groningen, The Netherlands.
| | - Ernst C Wit
- University of Groningen, Johann Bernoulli Institute of Mathematics and Computer Science, Groningen, The Netherlands.
| | - Peter de Jonge
- University of Groningen, University Medical Center Groningen, Interdisciplinary Center Psychopathology and Emotion regulation (ICPE), (internal mail CC-72), P.O. Box 30.001, 9700 RB, Groningen, The Netherlands.
| |
|
27
|
Affiliation(s)
- Youngjo Lee
- Department of Statistics; Seoul National University; Seoul Korea
| | - Gwangsu Kim
- Department of Statistics; Korea University; Seoul Korea
| |
|
28
|
Choi S, Huang X, Cormier JN. Efficient semiparametric mixture inferences on cure rate models for competing risks. CAN J STAT 2015. [DOI: 10.1002/cjs.11256] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
Affiliation(s)
- Sangbum Choi
- Division of Clinical and Translational Sciences, Department of Internal Medicine; The University of Texas Health Science Center at Houston; Houston, TX U.S.A
| | - Xuelin Huang
- Department of Biostatistics; The University of Texas MD Anderson Cancer Center; Houston, TX U.S.A
| | - Janice N. Cormier
- Department of Surgical Oncology; The University of Texas MD Anderson Cancer Center; Houston, TX U.S.A
| |
|
29
|
|
30
|
Kim G. Likelihood-Based Inference of Random Effects and Application in Logistic Regression. KOREAN JOURNAL OF APPLIED STATISTICS 2015. [DOI: 10.5351/kjas.2015.28.2.269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
31
|
Gordon RA. Measuring Constructs in Family Science: How Can Item Response Theory Improve Precision and Validity? JOURNAL OF MARRIAGE AND THE FAMILY 2015; 77:147-176. [PMID: 25663714 PMCID: PMC4313622 DOI: 10.1111/jomf.12157] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 10/01/2013] [Accepted: 10/03/2014] [Indexed: 06/04/2023]
Abstract
This article provides family scientists with an understanding of contemporary measurement perspectives and the ways in which item response theory (IRT) can be used to develop measures with desired evidence of precision and validity for research uses. The article offers a nontechnical introduction to some key features of IRT, including its orientation toward locating items along an underlying dimension and toward estimating precision of measurement for persons with different levels of that same construct. It also offers a didactic example of how the approach can be used to refine conceptualization and operationalization of constructs in the family sciences, using data from the National Longitudinal Survey of Youth 1979 (n = 2,732). Three basic models are considered: (a) the Rasch and (b) two-parameter logistic models for dichotomous items and (c) the Rating Scale Model for multicategory items. Throughout, the author highlights the potential for researchers to elevate measurement to a level on par with theorizing and testing about relationships among constructs.
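As a rough illustration of the two dichotomous models named in the abstract above, the sketch below uses the standard logistic form of the two-parameter model, which reduces to the Rasch model when the discrimination is fixed at 1. It is not taken from the article; the function name `item_response_prob` and the symbols `theta` (person location), `b` (item location) and `a` (discrimination) are conventional notation assumed here for illustration.

```python
import math


def item_response_prob(theta, b, a=1.0):
    """Probability of endorsing a dichotomous item (illustrative sketch).

    theta: person's level on the underlying dimension (assumed notation)
    b:     item location (difficulty) on the same dimension
    a:     item discrimination; a = 1.0 gives the Rasch form
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# A person located exactly at the item's position endorses it half the time.
print(item_response_prob(theta=0.0, b=0.0))  # 0.5
```

Raising `a` above 1 makes the curve steeper around `b`, which is how the two-parameter model lets some items separate nearby persons more sharply than others.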
Affiliation(s)
- Rachel A Gordon
- Department of Sociology and Institute of Government and Public Affairs, University of Illinois at Chicago, 815 West Van Buren St., Suite 525, Chicago, IL 60607
| |
|
32
|
Exploring somatization types among patients in Indonesia: latent class analysis using the Adult Symptom Inventory. CURRENT ISSUES IN PERSONALITY PSYCHOLOGY 2014. [DOI: 10.5114/cipp.2014.47810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
Background The aim of this study was to explore somatization types by reducing patient complaints to their most basic and parsimonious characteristics. We hypothesized that there were latent groups representing distinct types of somatization. Participants and procedure Data were collected from patients undergoing both inpatient and outpatient treatment at two hospitals in Yogyakarta, Indonesia (N = 212). Results Results from latent class analysis revealed four classes of somatization: two classes (Classes 1 and 2) referring to levels of somatization and two classes (Classes 3 and 4) referring to unique types of somatization. The first two classes (Classes 1 and 2; low and high levels of somatization, respectively) corresponded to the number of different symptoms that patients reported out of the list of physical symptoms in the Adult Symptom Inventory. The second two classes (Classes 3 and 4; non-serious and critical complaints, respectively) corresponded to two different sets of symptoms. Patients in Class 3 tended to report temporary mild complaints that are common in daily life, such as dizziness, nausea, and stomach pain. Patients in Class 4 tended to report severe complaints and medical problems that require serious treatment or medication, such as deafness or blindness. Conclusions The present study does not confirm somatization as a unidimensional experience reflecting a general tendency to report somatic symptoms, but rather supports the understanding of somatization as a multidimensional construct.
|
33
|
Zhou L, Lin H, Song X, Li YI. Selection of latent variables for multiple mixed-outcome models. Scand Stat Theory Appl 2014; 41:1064-1082. [PMID: 27642219 PMCID: PMC5026194 DOI: 10.1111/sjos.12084] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/22/2012] [Accepted: 02/02/2014] [Indexed: 11/27/2022]
Abstract
Latent variable models have been widely used for modeling the dependence structure of multiple outcomes data. However, the formulation of a latent variable model is often unknown a priori, and misspecification will distort the dependence structure and lead to unreliable model inference. Moreover, multiple outcomes with varying types present enormous analytical challenges. In this paper, we present a class of general latent variable models that can accommodate mixed types of outcomes. We propose a novel selection approach that simultaneously selects latent variables and estimates parameters. We show that the proposed estimator is consistent, asymptotically normal and has the oracle property. The practical utility of the methods is confirmed via simulations as well as an application to the analysis of the World Values Survey, a global research project that explores peoples' values and beliefs and the social and personal characteristics that might influence them.
Affiliation(s)
- Ling Zhou
- Center of Statistical Research, School of Statistics, Southwestern University of Finance and Economics
- Huazhen Lin
- Center of Statistical Research, School of Statistics, Southwestern University of Finance and Economics
- Xinyuan Song
- Department of Statistics, The Chinese University of Hong Kong
- Y I Li
- Department of Biostatistics, University of Michigan
|
34
|
Arima S. Item selection via Bayesian IRT models. Stat Med 2014; 34:487-503. [PMID: 25327293 DOI: 10.1002/sim.6341] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/10/2014] [Revised: 09/23/2014] [Accepted: 10/06/2014] [Indexed: 11/11/2022]
Abstract
With reference to a questionnaire that aimed to assess the quality of life for dysarthric speakers, we investigate the usefulness of a model-based procedure for reducing the number of items. We propose a mixed cumulative logit model, which is known in the psychometrics literature as the graded response model: responses to different items are modelled as a function of individual latent traits and as a function of item characteristics, such as their difficulty and their discrimination power. We jointly model the discrimination and the difficulty parameters by using a k-component mixture of normal distributions. Mixture components correspond to disjoint groups of items. Items that belong to the same groups can be considered equivalent in terms of both difficulty and discrimination power. According to decision criteria, we select a subset of items such that the reduced questionnaire is able to provide the same information that the complete questionnaire provides. The model is estimated by using a Bayesian approach, and the choice of the number of mixture components is justified according to information criteria. We illustrate the proposed approach on the basis of data that are collected for 104 dysarthric patients by local health authorities in Lecce and in Milan.
Affiliation(s)
- Serena Arima
- Dipartimento di Metodi e Modelli per l'Economia, il Territorio e la Finanza, Sapienza Università di Roma, Rome, Italy
|
35
|
Affiliation(s)
- Mogens Fenger
- Clinical Biochemistry, Molecular Biology, and Genetics, KBA339 Hvidovre, Denmark
|
36
|
Reichenheim ME, Hökerberg YHM, Moraes CL. Assessing construct structural validity of epidemiological measurement tools: a seven-step roadmap. Cad Saude Publica 2014; 30:927-39. [DOI: 10.1590/0102-311x00143613] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 08/07/2013] [Accepted: 12/17/2013] [Indexed: 11/21/2022] Open
Abstract
Guidelines have been proposed for assessing the quality of clinical trials, observational studies and validation studies of diagnostic tests. More recently, the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) initiative extended these to epidemiological measurement tools in general. Among the various facets proposed for assessment is the validity of an instrument’s dimensional structure (or structural validity). The purpose of this article is to extend these guidelines. A seven-step roadmap is proposed to examine (1) the hypothesized dimensional structure; (2) strength of component indicators regarding loading patterns and measurement errors; (3) measurement error correlations; (4) factor-based convergent and discriminant validity of scales; (5) item discrimination and intensity vis-à-vis the latent trait spectrum; (6) the properties of raw scores; and (7) factorial invariance. The paper also holds that the suggested steps still require debate and are open to refinements.
Affiliation(s)
- Claudia Leite Moraes
- Universidade do Estado do Rio de Janeiro, Brasil; Universidade Estácio de Sá, Brasil
|
37
|
Luo S. A Bayesian approach to joint analysis of multivariate longitudinal data and parametric accelerated failure time. Stat Med 2014; 33:580-94. [PMID: 24009073 PMCID: PMC3947121 DOI: 10.1002/sim.5956] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 11/20/2012] [Revised: 06/24/2013] [Accepted: 07/30/2013] [Indexed: 11/10/2022]
Abstract
Impairment caused by Parkinson's disease (PD) is multidimensional (e.g., sensoria, functions, and cognition) and progressive. Its multidimensional nature means that no single outcome can measure disease progression. Clinical trials of PD use multiple categorical and continuous longitudinal outcomes to assess the treatment effects on overall improvement. A terminal event such as death or dropout can stop the follow-up process. Moreover, the time to the terminal event may depend on the multivariate longitudinal measurements. In this article, we consider a joint random-effects model for the correlated outcomes. A multilevel item response theory model is used for the multivariate longitudinal outcomes, and a parametric accelerated failure time model is used for the failure time because the proportional hazards assumption is violated. These two models are linked via random effects. Bayesian inference via MCMC is implemented in the BUGS language. Our proposed method is evaluated by a simulation study and is applied to the DATATOP study, a motivating clinical trial to determine if deprenyl slows the progression of PD.
Affiliation(s)
- Sheng Luo
- Division of Biostatistics, University of Texas School of Public Health, 1200 Pressler St., Houston, TX 77030, U.S.A
|
38
|
Agresti A, Kateri M. Some Remarks on Latent Variable Models in Categorical Data Analysis. Commun Stat Theory Methods 2014. [DOI: 10.1080/03610926.2013.814783] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
|
39
|
Usami S. Performance of information criteria for model selection in a latent growth curve mixture model. Journal of the Japanese Society of Computational Statistics 2014. [DOI: 10.5183/jjscs.1309001_207] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022]
Affiliation(s)
- Satoshi Usami
- Division of Psychology, Faculty of Human Sciences, University of Tsukuba
|
40
|
Proust-Lima C, Amieva H, Jacqmin-Gadda H. Analysis of multivariate mixed longitudinal data: a flexible latent process approach. Br J Math Stat Psychol 2013; 66:470-487. [PMID: 23082854 DOI: 10.1111/bmsp.12000] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 06/01/2023]
Abstract
Multivariate ordinal and quantitative longitudinal data measuring the same latent construct are frequently collected in psychology. We propose an approach to describe change over time of the latent process underlying multiple longitudinal outcomes of different types (binary, ordinal, quantitative). By relying on random-effect models, this approach handles individually varying and outcome-specific measurement times. A linear mixed model describes the latent process trajectory, while the observation equations combine outcome-specific threshold models for binary or ordinal outcomes with models based on flexible parameterized non-linear families of transformations for Gaussian and non-Gaussian quantitative outcomes. As models assuming continuous distributions may also be used with discrete outcomes, we propose likelihood and information criteria for discrete data to compare the goodness of fit of models assuming either a continuous or a discrete distribution for discrete data. Two analyses of the repeated measures of the Mini-Mental State Examination, a 20-item psychometric test, illustrate the method. First, we highlight the usefulness of parameterized non-linear transformations by comparing different flexible families of transformation for modelling the test as a sum score. Then, change over time of the latent construct directly underlying the 20 items is described using two-parameter longitudinal item response models that are specific cases of the approach.
Affiliation(s)
- Cécile Proust-Lima
- INSERM, ISPED, Centre INSERM U897-Epidemiologie-Biostatistique, Bordeaux, France; Université Bordeaux, ISPED, Centre INSERM U897-Epidemiologie-Biostatistique, Bordeaux, France
|
41
|
Huggins R. Using speeding detections and numbers of fatalities to estimate relative risk of a fatality for motorcyclists and car drivers. Accid Anal Prev 2013; 59:296-300. [PMID: 23845409 DOI: 10.1016/j.aap.2013.06.020] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/06/2012] [Revised: 06/11/2013] [Accepted: 06/14/2013] [Indexed: 06/02/2023]
Abstract
Precise estimation of the relative risk of motorcyclists being involved in a fatal accident compared to car drivers is difficult. Simple estimates based on the proportions of licenced drivers or riders that are killed in a fatal accident are biased, as they do not take into account the exposure to risk. However, exposure is difficult to quantify. Here we adapt the ideas behind the well-known induced exposure methods and use available summary data on speeding detections and fatalities for motorcycle riders and car drivers to estimate the relative risk of a fatality for motorcyclists compared to car drivers under mild assumptions. The method is applied to data on motorcycle riders and car drivers in Victoria, Australia, in 2010, and a small simulation study is conducted.
Affiliation(s)
- Richard Huggins
- Department of Mathematics and Statistics, The University of Melbourne, Victoria 3010, Australia.
|
42
|
Affiliation(s)
- Ben Van Calster
- Department of Development and Regeneration, KU Leuven, B-3000 Leuven, Belgium
|
43
|
An X, Bentler PM. Efficient direct sampling MCEM algorithm for latent variable models with binary responses. Comput Stat Data Anal 2012. [DOI: 10.1016/j.csda.2011.06.028] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
|
44
|
Usami S. Generalized graded unfolding model with structural equation for subject parameters. Jpn Psychol Res 2011. [DOI: 10.1111/j.1468-5884.2011.00476.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/27/2022]
|
45
|
|