1
McNeish D, Somers JA, Savord A. Dynamic structural equation models with binary and ordinal outcomes in Mplus. Behav Res Methods 2024; 56:1506-1532. PMID: 37118647; PMCID: PMC10611901; DOI: 10.3758/s13428-023-02107-3.
Abstract
Intensive longitudinal designs are increasingly popular, as are dynamic structural equation models (DSEM) to accommodate unique features of these designs. Many helpful resources on DSEM exist, though they focus on continuous outcomes while categorical outcomes are omitted, briefly mentioned, or considered as a straightforward extension. This viewpoint regarding categorical outcomes is not unwarranted for technical audiences, but there are non-trivial nuances in model building and interpretation with categorical outcomes that are not necessarily straightforward for empirical researchers. Furthermore, categorical outcomes are common given that binary behavioral indicators or Likert responses are frequently solicited as low-burden variables to discourage participant non-response. This tutorial paper is therefore dedicated to providing an accessible treatment of DSEM in Mplus exclusively for categorical outcomes. We cover the general probit model whereby the raw categorical responses are assumed to come from an underlying normal process. We cover probit DSEM and expound why existing treatments have considered categorical outcomes as a straightforward extension of the continuous case. Data from a motivating ecological momentary assessment study with a binary outcome are used to demonstrate an unconditional model, a model with disaggregated covariates, and a model for data with a time trend. We provide annotated Mplus code for these models and discuss interpretation of the results. We then discuss model specification and interpretation in the case of an ordinal outcome and provide an example to highlight differences between ordinal and binary outcomes. We conclude with a discussion of caveats and extensions.
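The latent-response formulation described here, where a binary response is a coarsened version of an underlying normal variable, can be illustrated with a minimal simulation. This is a generic Python sketch of the probit idea, not the paper's Mplus code, and all values (threshold, sample size) are illustrative:

```python
import random
from statistics import NormalDist

random.seed(2023)

# Latent-response (probit) formulation: each binary response y is a
# coarsened version of an underlying normal variable y* ~ N(0, 1),
# with y = 1 observed whenever y* exceeds a threshold tau.
tau = 0.5
n = 100_000
observed = [1 if random.gauss(0.0, 1.0) > tau else 0 for _ in range(n)]

# The threshold is identified from the marginal proportion of 1s:
# P(y = 1) = 1 - Phi(tau), so tau = Phi^{-1}(1 - P(y = 1)).
p1 = sum(observed) / n
tau_hat = NormalDist().inv_cdf(1.0 - p1)
```

The same logic extends to ordinal items by estimating one threshold per category boundary from the cumulative response proportions.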
Affiliation(s)
- Daniel McNeish
- Arizona State University, PO Box 871104, Tempe, AZ, 85287, USA.
- Andrea Savord
- Arizona State University, PO Box 871104, Tempe, AZ, 85287, USA
2
McNeish D, Dumas D, Dong Y, Duellberg D. Promoting inclusive recruiting and selection into military training schools: Admission waivers versus retesting. J Appl Psychol 2024; 109:415-436. PMID: 37856410; DOI: 10.1037/apl0001147.
Abstract
There is high-level interest in diversifying workforces, which has led organizations-including the U.S. Armed Forces-to reevaluate recruiting and selection practices. The U.S. Coast Guard (USCG) has encountered particular difficulties in diversifying its workforce, and it relies mainly on the Armed Services Vocational Aptitude Battery (ASVAB) for assigning active-duty recruits to one of 19 specialized training schools. When recruits' scores fall below ASVAB entrance standards, the USCG sometimes offers admission waivers. Alternatively, recruits can retest until their ASVAB scores meet the entrance standard. Retesting has shown mixed results in the personnel selection literature, so our main interest is to determine whether retesting or waivers best support USCG recruits' training school outcomes, especially for recruits identifying as an underrepresented minority (URM). We use data from 16,624 USCG recruits entering between 2013 and 2021 and fit augmented inverse propensity weighted models to assess differences in training outcomes by pathway to admission while accounting for self-selection into pathways. Our analyses found (a) no difference in training outcomes between recruits who qualified from their initial scores and recruits who retested, (b) recruits who received waivers were less likely to complete training school on time and spent more time in remedial training when they failed training school compared to those who retested, and (c) improvement in training outcomes for retesting over waivers was larger for recruits identifying as a URM. Results suggest that retesting may be an effective strategy for workforce diversification and for improving outcomes among recruits identifying as a URM. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
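The augmented inverse propensity weighted (AIPW) estimator used in this study combines an outcome-model contrast with inverse-propensity-weighted residuals and is doubly robust. The following Python sketch plugs in the true propensity and outcome models for simplicity (the study itself estimated these from recruit covariates); all numbers are illustrative:

```python
import math
import random

random.seed(7)
n = 100_000
true_effect = 2.0

def propensity(x):        # P(T = 1 | x): treatment uptake depends on x
    return 1.0 / (1.0 + math.exp(-x))

def outcome_mean(t, x):   # E[Y | T = t, x]: outcome depends on t and x
    return true_effect * t + x

rows = []
for _ in range(n):
    x = random.uniform(-1.0, 1.0)                    # confounder
    t = 1 if random.random() < propensity(x) else 0  # self-selection
    y = outcome_mean(t, x) + random.gauss(0.0, 1.0)
    rows.append((x, t, y))

def aipw_term(x, t, y):
    # Outcome-model contrast plus inverse-propensity-weighted residual.
    e, m1, m0 = propensity(x), outcome_mean(1, x), outcome_mean(0, x)
    return (m1 - m0
            + t * (y - m1) / e
            - (1 - t) * (y - m0) / (1.0 - e))

ate_hat = sum(aipw_term(*row) for row in rows) / n
```

Because either a correct propensity model or a correct outcome model suffices for consistency, AIPW is a natural choice when self-selection into admission pathways is a concern.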
Affiliation(s)
- Denis Dumas
- Department of Educational Psychology, University of Georgia
- Yixiao Dong
- Department of Research Methods and Information Science, University of Denver
3
McNeish D. Dynamic fit index cutoffs for categorical factor analysis with Likert-type, ordinal, or binary responses. Am Psychol 2023; 78:1061-1075. PMID: 38166269; DOI: 10.1037/amp0001213.
Abstract
Scale validation is vital to psychological research because it ensures that scores from measurement scales represent the intended construct. Fit indices are commonly used to provide quantitative evidence that a proposed factor structure is plausible. However, there is a mismatch between guidelines for evaluating fit of the factor models and the data that most researchers have. Namely, fit guidelines are based on simulations that assume item responses are collected on a continuous scale, whereas most researchers collect discrete responses, such as Likert-type ratings. In this article, we show that common guidelines derived from assuming continuous responses (e.g., root-mean-square error of approximation < 0.06, comparative fit index > 0.95) do not generalize to factor models applied to discrete responses. Specifically, discrete responses provide less information than continuous responses, so less information about misfit is passed to fit indices. Traditional guidelines, therefore, end up being too lenient and lose their ability to identify that a model fits poorly. We provide one possible solution by extending the recently developed dynamic fit index framework to accommodate discrete responses common in psychology. We conduct a simulation study to provide evidence that the proposed method consistently distinguishes between well-fitting and poorly fitting models. Results showed that our proposed cutoffs maintained at least 90% sensitivity to misspecification across studied conditions, whereas traditional cutoffs were highly inconsistent and frequently exhibited sensitivity below 50%. The proposed method is included in the dynamic R package and as a web-based Shiny application to make it easily accessible to psychologists. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
4
McNeish D. Psychometric properties of sum scores and factor scores differ even when their correlation is 0.98: A response to Widaman and Revelle. Behav Res Methods 2023; 55:4269-4290. PMID: 36394821; DOI: 10.3758/s13428-022-02016-x.
Abstract
Commentary in Widaman and Revelle (2022) argued that sum scoring is justified as long as unidimensionality holds because sum score reliability is defined. My response begins with a review of the literature supporting the perspective we adopted in the original article. I then conduct simulation studies to assess the psychometric properties of sum scores created using Widaman and Revelle's justification relative to scores created by the weighted factor score approach in the original article. In my simulations, I generate data where sum and factor scores are correlated at 0.96 or 0.98 because high factor-sum score correlations are often used to support the contention that sum and factor scores have interchangeable psychometric properties. I explore (a) correlations between estimated scores and true scores, (b) classification accuracy of sum and factor scores, and (c) reliability of sum and factor scores. Results show that factor scores have (a) higher correlations with true scores (Δ = 0.02-0.04), (b) higher sensitivity (Δ = 4-8 percentage points), and (c) higher reliability (Δ = 0.04-0.07). Factor score performance metrics also have less sampling variability in most conditions. Psychometric properties of sum scores-even when highly correlated with factor scores-remain less desirable than those of factor scores. Additional considerations like models with multiple factors and measurement invariance are also discussed. Essentially, even if accepting Widaman and Revelle's justification for sum scoring, it is uncertain whether researchers generally would want to sum score after fitting a factor analysis unless sum and factor scores correlate at (and not merely close to) 1.00.
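The mechanism behind this result is that unit weights are suboptimal when loadings are unequal. A generic Python sketch (not the article's simulation design, and with illustrative loadings) shows that Bartlett-style loading weights track the true factor better than a sum score even though the two scores correlate highly with each other:

```python
import math
import random

random.seed(11)
loadings = [0.9, 0.8, 0.7, 0.5, 0.3]   # deliberately unequal (illustrative)
uniqueness = [1.0 - l * l for l in loadings]
bartlett_w = [l / u for l, u in zip(loadings, uniqueness)]  # lambda / psi
n = 50_000

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

true_f, sum_scores, weighted_scores = [], [], []
for _ in range(n):
    f = random.gauss(0.0, 1.0)
    items = [l * f + math.sqrt(u) * random.gauss(0.0, 1.0)
             for l, u in zip(loadings, uniqueness)]
    true_f.append(f)
    sum_scores.append(sum(items))                        # unit weights
    weighted_scores.append(sum(w * y for w, y in zip(bartlett_w, items)))

r_sum = pearson(sum_scores, true_f)            # sum score vs. true factor
r_weighted = pearson(weighted_scores, true_f)  # weighted score vs. true factor
r_between = pearson(sum_scores, weighted_scores)
```

With these loadings the two scores correlate around .95 with each other, yet the weighted score tracks the true factor noticeably better, which mirrors the article's core point that a high sum-factor score correlation does not imply interchangeable psychometric properties.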
Affiliation(s)
- Daniel McNeish
- Department of Psychology, Arizona State University, PO Box 871104, Tempe, AZ, 85287, USA.
5
McNeish D. A practical guide to selecting and blending approaches for clustered data: Clustered errors, multilevel models, and fixed-effect models. Psychol Methods 2023. PMID: 37956085; DOI: 10.1037/met0000620.
Abstract
Psychological data are often clustered within organizational units, which violates the independence assumption in standard regression models. Clustered errors, multilevel models, and fixed-effects models all address this issue, but in different ways. Disciplinary preferences for approaching clustered data are strong, which can restrict questions researchers ask because certain approaches are better equipped to handle particular types of questions. Resources comparing approaches to facilitate broader understanding of clustered data approaches exist for economists, political scientists, and biostatisticians. These existing resources use concepts and terminology consistent with statistical training in other disciplines, so this article provides a resource using language and principles familiar to psychologists. The article starts by walking through the origin and importance of the independence assumption to motivate the problem and emergence of different solutions in different fields. Then, information on clustered errors, multilevel models, and fixed-effect models is provided, including (a) how each approach addresses independence violations, (b) research questions ideally suited for each approach, and (c) example analyses highlighting advantages and disadvantages. The article then discusses how these approaches are not mutually exclusive but instead can be blended together to create tailor-made models that flexibly accommodate idiosyncrasies in research questions and are robust to nuances of a particular data set. The broader theme is that there is no one-size-fits-all approach to clustered data. The research question-not disciplinary preferences-should inform the statistical approach. Wider appreciation of the landscape of clustered data approaches can expand the questions researchers ask and improve the theoretical foundation of statistical models. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
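The cost of an independence violation can be shown in a few lines. In this generic Python sketch (illustrative numbers, not from the article), the naive standard error of a mean treats all N observations as independent, while a cluster-aware version treats the J cluster summaries as the independent units:

```python
import math
import random

random.seed(42)
n_clusters, per_cluster = 50, 20
tau, sigma = 1.0, 1.0   # between- and within-cluster SDs (illustrative)

y, cluster_means = [], []
for _ in range(n_clusters):
    u = random.gauss(0.0, tau)  # shared cluster effect violates independence
    obs = [u + random.gauss(0.0, sigma) for _ in range(per_cluster)]
    y.extend(obs)
    cluster_means.append(sum(obs) / per_cluster)

n = len(y)
grand_mean = sum(y) / n

# Naive SE: treats all N = 1000 observations as independent.
s2 = sum((v - grand_mean) ** 2 for v in y) / (n - 1)
se_naive = math.sqrt(s2 / n)

# Cluster-aware SE: treats the J = 50 cluster means as independent units.
cm = sum(cluster_means) / n_clusters
s2_cluster = sum((m - cm) ** 2 for m in cluster_means) / (n_clusters - 1)
se_cluster = math.sqrt(s2_cluster / n_clusters)
```

With a nontrivial cluster effect the naive standard error is several times too small, which is the inferential problem that clustered errors, multilevel models, and fixed-effect models each address in their own way.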
6
Pandika D, Guttmannova K, Skinner ML, Sanchez-Rodriguez M, McNeish D, Morales LS, Oesterle S. Tobacco use patterns from adolescence to young adulthood among Latinx youth from rural communities. J Adolesc Health 2023; 73:761-768. PMID: 37395693; PMCID: PMC10524685; DOI: 10.1016/j.jadohealth.2023.05.016.
Abstract
PURPOSE: To examine patterns in adolescent and young adult tobacco use, comparing Latinx foreign-born children and children of foreign-born parents (i.e., children of immigrants (COI)) with Latinx US-born children of US-born parents (i.e., children of nonimmigrants (CONI)) and with non-Latinx White CONI youth who grew up in small and rural towns.
METHODS: Data were from youth who lived in control communities that participated in a community-randomized trial of the Communities That Care prevention system. We compared Latinx CONI (n = 154) with Latinx COI (n = 316) and with non-Latinx White CONI (n = 918). We examined tobacco use in adolescence (any adolescent use, early onset, and chronic use) and young adulthood (any past-year tobacco use, any daily smoking, any nicotine dependence symptoms) with mixed-effects logistic regressions.
RESULTS: In adolescence, Latinx CONI had a higher prevalence of any and chronic tobacco use relative to Latinx COI, and of any and early-onset tobacco use relative to non-Latinx White CONI. In young adulthood, Latinx CONI were more likely to report tobacco use in the past year, any symptoms of nicotine dependence, and daily smoking relative to Latinx COI, and more likely to report daily smoking relative to non-Latinx White CONI. Generational differences in young adult tobacco use were explained by chronic tobacco use in adolescence.
DISCUSSION: The study suggests targeting chronic tobacco use in adolescence to prevent disparities in tobacco outcomes among Latinx young adults from rural communities.
Affiliation(s)
- Danielle Pandika
- Social Development Research Group, School of Social Work, University of Washington, Seattle, Washington.
- Katarina Guttmannova
- Center for the Study of Health and Risk Behavior, Department of Psychiatry and Behavioral Sciences, School of Medicine, University of Washington, Seattle, Washington
- Martie L Skinner
- Social Development Research Group, School of Social Work, University of Washington, Seattle, Washington
- Mariel Sanchez-Rodriguez
- Social Development Research Group, School of Social Work, University of Washington, Seattle, Washington
- Daniel McNeish
- Department of Psychology, Arizona State University, Tempe, Arizona
- Leo S Morales
- Departments of Medicine and Health Services, School of Medicine, University of Washington, Seattle, Washington
- Sabrina Oesterle
- Southwest Interdisciplinary Research Center, School of Social Work, Arizona State University, Phoenix, Arizona
7
Abstract
Growth mixture models (GMMs) are a popular method to identify latent classes of growth trajectories. One shortcoming of GMMs is nonconvergence, which often leads researchers to apply covariance equality constraints to simplify estimation, though this may be a dubious assumption. Alternative model specifications have been proposed to reduce nonconvergence without imposing covariance equality constraints. These methods perform well when the correct number of classes is known, but research has not yet examined their use when the number of classes is unknown. Given the importance of selecting the number of classes, more information about class enumeration performance is crucial to assess the potential utility of these methods. We conducted an extensive simulation to explore class enumeration and classification accuracy of model specifications that are more robust to nonconvergence. Results show that the typical approach of applying covariance equality constraints performs quite poorly. Instead, we recommended covariance pattern GMMs because they (a) had the highest convergence rates, (b) were most likely to identify the correct number of classes, and (c) had the highest classification accuracy in many conditions, even with modest sample sizes. An analysis of empirical posttraumatic stress disorder (PTSD) data is provided to show that the typical four-class solution found in many empirical PTSD studies may be an artifact of the covariance equality constraint method that has permeated this literature. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
Affiliation(s)
- Jeffrey R Harring
- Department of Human Development and Quantitative Methodology, University of Maryland, College Park
- Daniel J Bauer
- Department of Psychology and Neuroscience, University of North Carolina, Chapel Hill
8
English D, Smith JC, Scott-Walker L, Lopez FG, Morris M, Reid M, Lashay C, Bridges D, McNeish D. Feasibility, acceptability, and preliminary HIV care and psychological health effects of iTHRIVE 365 for Black same gender loving men. J Acquir Immune Defic Syndr 2023; 93:55-63. PMID: 36706362; PMCID: PMC10840385; DOI: 10.1097/qai.0000000000003167.
Abstract
OBJECTIVES: This uncontrolled pilot study examined the feasibility, acceptability, and preliminary HIV and psychological health effects of iTHRIVE 365, a multicomponent intervention designed by and for Black same gender loving men (SGLM) to promote health knowledge and motivation, Black SGLM social support, affirming health care, and housing and other economic resources.
DESIGN/METHODS: We conducted a 14-day daily diary study with 32 Black SGLM living with HIV connected to THRIVE SS in Atlanta, GA. Daily surveys assessed intervention engagement, antiretroviral medication (ART) use, depressive symptoms, anxiety symptoms, and emotion regulation difficulties. App paradata (i.e., process data detailing app usage) assessed amount of intervention engagement via page access. Participants began receiving access to the intervention on day 7. After the 14-day daily diary period, participants responded to follow-up items on user-friendliness, usefulness, helpfulness, and whether they would recommend iTHRIVE 365 to others. Chi-square analyses examined associations between intervention engagement and ART use, and dynamic structural equation modeling assessed longitudinal associations from intervention engagement to next-day psychological health. This intervention trial is registered on ClinicalTrials.gov (NCT05376397).
RESULTS: On average, participants engaged with iTHRIVE 365 more than once every other day and accessed intervention pages 4.65 times per day. Among participants who engaged with the intervention, 78% reported it was helpful to extremely helpful, 83% reported it was moderately to extremely useful, and 88% reported it was user-friendly and that they would recommend it to others. On intervention engagement days, participants had higher odds of ART use than on nonengagement days, χ²(1) = 4.09, p = .04. On days after intervention engagement, participants showed non-null decreases in depressive symptoms (τ = -0.14; 95% CI [-0.23, -0.05]) and emotion regulation difficulties (τ = -0.16; 95% CI [-0.24, -0.02]).
CONCLUSIONS: Findings suggest iTHRIVE 365 is feasible and acceptable and positively affects daily ART use, depressive symptoms, and emotion regulation difficulties.
Affiliation(s)
- Devin English
- Department of Urban-Global Public Health, Rutgers School of Public Health, Newark, NJ
- Malcolm Reid
- THRIVE Social Services (THRIVE SS), Inc., Atlanta, GA
- Dwain Bridges
- THRIVE Social Services (THRIVE SS), Inc., Atlanta, GA
- Daniel McNeish
- Department of Psychology, Arizona State University, Tempe, AZ
9
McNeish D, Bauer DJ, Dumas D, Clements DH, Cohen JR, Lin W, Sarama J, Sheridan MA. Modeling individual differences in the timing of change onset and offset. Psychol Methods 2023; 28:401-421. PMID: 34570554; PMCID: PMC8957627; DOI: 10.1037/met0000407.
Abstract
Individual differences in the timing of developmental processes are often of interest in longitudinal studies, yet common statistical approaches to modeling change cannot directly estimate the timing of when change occurs. The time-to-criterion framework was recently developed to incorporate the timing of a prespecified criterion value; however, this framework has difficulty accommodating contexts where the criterion value differs across people or when the criterion value is not known a priori, such as when the interest is in individual differences in when change starts or stops. This article combines aspects of reparameterized quadratic models and multiphase models to provide information on the timing of change. We first consider the more common situation of modeling decelerating change to an offset point, defined as the point in time at which change ceases. For increasing trajectories, the offset occurs when the criterion attains its maximum ("inverted J-shaped" trajectories). For decreasing trajectories, offset instead occurs at the minimum. Our model allows for individual differences in both the timing of offset and ultimate level of the outcome. The same model, reparameterized slightly, captures accelerating change from a point of onset ("J-shaped" trajectories). We then extend the framework to accommodate "S-shaped" curves where both the onset and offset of change are within the observation window. We provide demonstrations that span neuroscience, educational psychology, developmental psychology, and cognitive science, illustrating the applicability of the modeling framework to a variety of research questions about individual differences in the timing of change. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
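The article's central idea, writing a trajectory so that the timing of change is itself a parameter, can be sketched as piecewise quadratics with a zero-slope join. The functions below are an illustrative Python sketch of the fixed-effect shape only; the article embeds these parameters in a random-effects model so that asymptotes and onset/offset times vary across people:

```python
def quadratic_offset(t, asym, rate, t_offset):
    """Decelerating change ('inverted J'): rises toward asym, reaching it
    with zero slope exactly at t_offset, then stays flat afterward."""
    if t >= t_offset:
        return asym
    return asym - rate * (t_offset - t) ** 2

def quadratic_onset(t, base, rate, t_onset):
    """Accelerating change ('J-shaped'): flat at base until t_onset,
    then departs quadratically, again with a smooth zero-slope join."""
    if t <= t_onset:
        return base
    return base + rate * (t - t_onset) ** 2

# Illustrative trajectory: growth that ceases at t = 5 with asymptote 10.
traj = [quadratic_offset(t, 10.0, 0.4, 5.0) for t in range(0, 9)]
```

Because the slope is zero at the join, the curve transitions smoothly into the flat phase, and the offset (or onset) time can be estimated directly rather than inferred after the fact from a fitted curve.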
Affiliation(s)
- Weili Lin
- University of North Carolina, Chapel Hill, USA
10
McNeish D, Peña A, Vander Wyst KB, Ayers SL, Olson ML, Shaibi GQ. Facilitating growth mixture model convergence in preventive interventions. Prev Sci 2023; 24:505-516. PMID: 34235633; PMCID: PMC9004621; DOI: 10.1007/s11121-021-01262-3.
Abstract
Growth mixture models (GMMs) are applied to intervention studies with repeated measures to explore heterogeneity in the intervention effect. However, traditional GMMs are known to be difficult to estimate, especially at sample sizes common in single-center interventions. Common strategies to coerce GMMs to converge involve post hoc adjustments to the model, particularly constraining covariance parameters to equality across classes. Methodological studies have shown that although convergence is improved with post hoc adjustments, they embed additional tenuous assumptions into the model that can adversely impact key aspects of the model such as number of classes extracted and the estimated growth trajectories in each class. To facilitate convergence without post hoc adjustments, this paper reviews the recent literature on covariance pattern mixture models, which approach GMMs from a marginal modeling tradition rather than the random effect modeling tradition used by traditional GMMs. We discuss how the marginal modeling tradition can avoid complexities in estimation encountered by GMMs that feature random effects, and we use data from a lifestyle intervention for increasing insulin sensitivity (a risk factor for type 2 diabetes) among 90 Latino adolescents with obesity to demonstrate our point. Specifically, GMMs featuring random effects-even with post hoc adjustments-fail to converge due to estimation errors, whereas covariance pattern mixture models following the marginal model tradition encounter no issues with estimation while maintaining the ability to answer all the research questions.
Affiliation(s)
- Micha L Olson
- Arizona State University, Tempe, AZ, USA
- Phoenix Children's Hospital, Phoenix, AZ, USA
- Gabriel Q Shaibi
- Arizona State University, Tempe, AZ, USA
- Phoenix Children's Hospital, Phoenix, AZ, USA
11
Abstract
To evaluate the fit of a confirmatory factor analysis model, researchers often rely on fit indices such as SRMR, RMSEA, and CFI. These indices are frequently compared to benchmark values of .08, .06, and .96, respectively, established by Hu and Bentler (Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1-55). However, these indices are affected by model characteristics and their sensitivity to misfit can change across models. Decisions about model fit can therefore be improved by tailoring cutoffs to each model. The methodological literature has proposed methods for deriving customized cutoffs, although doing so can require knowledge of linear algebra and Monte Carlo simulation. Given that many empirical researchers do not have training in these technical areas, empirical studies largely continue to rely on fixed benchmarks even though they are known to generalize poorly and can be poor arbiters of fit. To address this, this paper introduces the R package dynamic to make computation of dynamic fit index cutoffs, which are tailored to the user's model, more accessible to empirical researchers. dynamic heavily automates this process and only requires a lavaan object to automatically conduct several custom Monte Carlo simulations and output fit index cutoffs designed to be sensitive to misfit with the user's model characteristics.
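The simulation logic that dynamic automates can be caricatured in a deliberately simplified sketch: simulate the sampling distribution of a fit statistic under a correctly specified model, take an upper percentile as the cutoff, and check that the cutoff flags a misspecified model. The Python sketch below uses a toy SRMR-like index with the population loadings held fixed (no actual factor estimation), so it is a stand-in for the package's lavaan-based simulations, not a reimplementation:

```python
import math
import random
from itertools import combinations

random.seed(3)
p, n, reps = 6, 400, 80
loading = 0.7
implied = loading * loading   # hypothesized-model correlation for every pair

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def one_srmr(misspecified):
    # Simulate one data set from a 1-factor model; if misspecified, items
    # 0 and 1 also share an unmodeled minor factor (residual correlation .3).
    items = [[] for _ in range(p)]
    for _ in range(n):
        f, minor = random.gauss(0, 1), random.gauss(0, 1)
        for j in range(p):
            extra = math.sqrt(0.3) * minor if misspecified and j < 2 else 0.0
            uvar = 1.0 - implied - (0.3 if misspecified and j < 2 else 0.0)
            items[j].append(loading * f + extra
                            + random.gauss(0, math.sqrt(uvar)))
    # SRMR-like index: RMS gap between sample and implied correlations.
    resid_sq = [(pearson(items[i], items[j]) - implied) ** 2
                for i, j in combinations(range(p), 2)]
    return math.sqrt(sum(resid_sq) / len(resid_sq))

correct = sorted(one_srmr(False) for _ in range(reps))
misspec = [one_srmr(True) for _ in range(reps)]
cutoff = correct[int(0.95 * reps)]   # ~95th percentile under correct model
hit_rate = sum(s > cutoff for s in misspec) / reps
```

The cutoff is calibrated to the model and sample size at hand rather than borrowed from a generic table, which is the essential difference between dynamic cutoffs and fixed benchmarks.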
Affiliation(s)
- Melissa G Wolf
- Gevirtz Graduate School of Education, University of California Santa Barbara
12
McNeish D. Generalizability of dynamic fit index, equivalence testing, and Hu & Bentler cutoffs for evaluating fit in factor analysis. Multivariate Behav Res 2023; 58:195-219. PMID: 36787523; DOI: 10.1080/00273171.2022.2163477.
Abstract
Factor analysis is often used to model scales created to measure latent constructs, and internal structure validity evidence is commonly assessed with indices like RMSEA and CFI. These indices are essentially effect size measures, and definitive benchmarks regarding which values connote reasonable fit have been elusive. Simulations from the 1990s suggesting possible benchmark values are among the most highly cited methodological papers across any discipline. However, simulations have suggested that fixed benchmarks do not generalize well: fit indices are systematically impacted by characteristics like the number of items and the magnitude of the loadings, so fixed benchmarks can confound misfit with model characteristics. Alternative frameworks for creating customized, model-specific benchmarks have recently been proposed to circumvent these issues but they have not been systematically evaluated. Motivated by two empirical applications where different methods yield inconsistent conclusions, two simulation studies are performed to assess the ability of three different approaches to correctly classify models that are correct or misspecified across different conditions. Results show that dynamic fit indices and equivalence testing both improved upon the traditional Hu & Bentler benchmarks, and dynamic fit indices appeared to be least confounded with model characteristics in the conditions studied.
13
Abstract
Much of the existing longitudinal mediation literature focuses on panel data where relatively few repeated measures are collected over a relatively broad timespan. However, technological advances in data collection (e.g., smartphones, wearables) have led to a proliferation of short duration, densely collected longitudinal data in behavioral research. These intensive longitudinal data differ in structure and focus relative to traditionally collected panel data. As a result, existing methodological resources do not necessarily extend to nuances present in the recent influx of intensive longitudinal data and designs. In this tutorial, we first cover potential limitations of traditional longitudinal mediation models to accommodate unique characteristics of intensive longitudinal data. Then, we discuss how recently developed dynamic structural equation models (DSEMs) may be well-suited for mediation modeling with intensive longitudinal data and can overcome some of the limitations associated with traditional approaches. We describe four increasingly complex intensive longitudinal mediation models: (a) stationary models where the indirect effect is constant over time and people, (b) person-specific models where the indirect effect varies across people, (c) dynamic models where the indirect effect varies across time, and (d) cross-classified models where the indirect effect varies across both time and people. We apply each model to a running example featuring a mobile health intervention designed to improve health behavior of individuals with binge eating disorder. In each example, we provide annotated Mplus code and interpretation of the output to guide empirical researchers through mediation modeling with this increasingly popular type of longitudinal data. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
14
McNeish D, Harring JR, Dumas D. A multilevel structured latent curve model for disaggregating student and school contributions to learning. Stat Methods Appl 2022. DOI: 10.1007/s10260-022-00667-w.
15
Aitken AA, Graham S, McNeish D. The effects of choice versus preference on writing and the mediating role of perceived competence. J Educ Psychol 2022. DOI: 10.1037/edu0000765.
16
Dumas D, Dong Y, McNeish D. How fair is my test? Eur J Psychol Assess 2022. DOI: 10.1027/1015-5759/a000724.
Abstract
The degree to which test scores can support justified and fair decisions about demographically diverse participants has been an important aspect of educational and psychological testing for millennia. In the last 30 years, this aspect of measurement has come to be known as consequential validity, and it has sparked scholarly debate as to how responsible psychometricians should be for the fairness of the tests they create and how the field might be able to quantify that fairness and communicate it to applied researchers and other stakeholders of testing programs. Here, we formulate a relatively simple-to-calculate ratio coefficient that is meant to capture how well the scores from a given test can predict a criterion free from the undue influence of student demographics. We posit three example calculations of this Consequential Validity Ratio (CVR): one where the CVR is quite strong, another where the CVR is more moderate, and a third where the CVR is weak. We provide preliminary suggestions for interpreting the CVR and discuss its utility in instances where new tests are being developed, tests are being adapted to a new population, or the fairness of an established test has become an empirical question.
Affiliation(s)
- Denis Dumas
- Department of Research Methods and Information Science, University of Denver, CO, USA
- Department of Educational Psychology, University of Georgia, Athens, GA, USA
- Yixiao Dong
- Department of Research Methods and Information Science, University of Denver, CO, USA
- Daniel McNeish
- Department of Psychology, Arizona State University, AZ, USA
17
Abstract
Deciding which random effects to retain is a central decision in mixed effect models. Recent recommendations advise a maximal structure whereby all theoretically relevant random effects are retained. Nonetheless, including many random effects often leads to nonpositive definiteness. A typical remedy is to simplify the random effect structure by removing random effects or associated covariances. However, this practice is known to bias estimates of remaining covariance parameters and compromise fixed effect inferences. Cholesky decompositions frequently are suggested as an alternative and are automatically implemented in some software. Instead of Cholesky decompositions, we describe factor analytic structures as an approach to avoid nonpositive definiteness. This approach is occasionally employed in biosciences like plant breeding but, ironically, has not been established in behavioral sciences despite the close historical connection with factor analysis in these fields. We discuss how a factor analytic structure facilitates estimation and conduct simulations to compare convergence and performance to simplifying the random effects structure or Cholesky decomposition approaches. Results show a lower rate of nonpositive definiteness with the factor analytic structure than with Cholesky decomposition and suggest that a factor analytic covariance structure may be useful for combating nonpositive definiteness, especially in models with many random effects.
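The positive-definiteness guarantee at the heart of this approach can be sketched in a few lines. The example below is illustrative only (the loadings and uniquenesses are invented, and a single factor is assumed): a covariance matrix built as Lambda Lambda' + diag(psi) with every psi_j > 0 passes a Cholesky factorization by construction, which is exactly the property a freely estimated random-effect covariance can fail to have.

```python
# Illustrative sketch (not from the article): a single-factor analytic
# structure for a random-effect covariance matrix is positive definite by
# construction whenever all uniquenesses are strictly positive.

def fa_covariance(loadings, uniquenesses):
    """Build Sigma = Lambda Lambda' + diag(psi) for one factor."""
    k = len(loadings)
    return [[loadings[i] * loadings[j] + (uniquenesses[i] if i == j else 0.0)
             for j in range(k)] for i in range(k)]

def cholesky(a):
    """Plain Cholesky; raises ValueError if the matrix is not positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("not positive definite")
                L[i][j] = d ** 0.5
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

sigma = fa_covariance([0.9, 0.7, 0.5, 0.3], [0.2, 0.2, 0.2, 0.2])
cholesky(sigma)  # succeeds: the structure guarantees positive definiteness
```

Because the structure cannot leave the admissible parameter space, the estimator never has to repair an indefinite covariance matrix mid-optimization, which is the intuition behind the convergence results reported above.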
18
Abstract
A large body of literature suggests that parent-child separation predicts child maladjustment. However, further advancement in methodology is needed to account for heterogeneity in types of separation. Additionally, given a lack of research examining different types of separation as predictors of offspring substance use, further research into this topic is warranted. The present study tested the relation between parent-child separation and young-adult substance use disorder (SUD), capturing heterogeneity in these effects based on group differences and measurement of separation. In a sample of 427 young adults from a larger longitudinal study oversampled for parental alcohol use disorder (AUD), effects of number and type of separations on SUD diagnosis were tested. Further, we explored whether these associations were moderated by gender, ethnicity, or parental AUD. Two underlying types of separation were found: parental health-related separation (i.e., parental death, hospitalization) and nonhealth-related separation (i.e., divorce, arrest). A higher sum of separations and greater nonhealth-related separation predicted higher odds of SUD. Greater health-related separation predicted lower odds of SUD. However, these effects were qualified by interactions with ethnicity and parental AUD. Although the vast majority of studies measure cumulative parent-child separation with sum scores, the present study demonstrates that measuring underlying "types" of cumulative separation also reveals important effects. Moreover, childhood separation is a significant risk factor for SUD. Future research on separation should implement methods to capture separation types and further account for potential effects of selection into separation types. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
Affiliation(s)
- Austin J Blake
- Department of Clinical Psychology, Arizona State University
- Daniel McNeish
- Department of Quantitative Psychology, Arizona State University
- Laurie Chassin
- Department of Clinical Psychology, Arizona State University
19
Abstract
Use of Bayesian methods has proliferated in recent years as technological and software developments have made Bayesian methods more approachable for researchers working with empirical data. Connected with the increased usage of Bayesian methods in empirical studies is a corresponding increase in recommendations and best practices for Bayesian methods. However, given the extensive scope of Bayes' theorem, there are various compelling perspectives one could adopt for its application. This paper first describes five different perspectives, including examples of different methodologies that are aligned within these perspectives. We then discuss how the different perspectives can have implications for modeling and reporting practices, such that approaches and recommendations that are perfectly reasonable under one perspective might be unreasonable when viewed from another perspective. The ultimate goal is to show the heterogeneity of defensible practices in Bayesian methods and to foster a greater appreciation for the variety of orientations that exist. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
20
Abstract
Model fit assessment is a central component of evaluating confirmatory factor analysis models and the validity of psychological assessments. Fit indices remain popular and researchers often judge fit with fixed cutoffs derived by Hu and Bentler (1999). Despite their overwhelming popularity, methodological studies have cautioned against fixed cutoffs, noting that the meaning of fit indices varies based on a complex interaction of model characteristics like factor reliability, number of items, and number of factors. Criticism of fixed cutoffs stems primarily from the fact that they were derived from one specific confirmatory factor analysis model and lack generalizability. To address this, we propose a simulation-based method called dynamic fit index cutoffs such that derivation of cutoffs is adaptively tailored to the specific model and data characteristics being evaluated. Unlike previously proposed simulation-based techniques, our method removes existing barriers to implementation by providing an open-source, web-based Shiny application that automates the entire process, so that users neither need to manually write any software code nor be knowledgeable about foundations of Monte Carlo simulation. Additionally, we extend fit index cutoff derivations to include sets of cutoffs for multiple levels of misspecification. In doing so, fit indices can more closely resemble their originally intended purpose as effect sizes quantifying misfit rather than improperly functioning as ad hoc hypothesis tests. We also provide an approach specifically designed for the nuances of 1-factor models, which have received surprisingly little attention in the literature despite frequent substantive interest in unidimensionality. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
21
Abstract
Context-appropriate infant physiological functioning may support emotion regulation and mother-infant emotion coregulation. Among a sample of 210 low-income Mexican-origin mothers and their 24-week-old infants, dynamic structural equation modeling (DSEM) was used to examine whether within-infant vagal functioning accounted for between-dyad differences in within-dyad second-by-second emotion regulation and coregulation during free play. Vagal functioning was captured by the within-infant mean and variability (standard deviation) of respiratory sinus arrhythmia (RSA) during free play. Infant emotion regulation was quantified as emotional equilibria (within-person mean), volatility (within-person deviation from equilibrium), carryover (how quickly equilibrium is restored following a disturbance), and feedback loops (the extent to which prior affect dampens or amplifies subsequent affect) in positive and negative affect during free play; coregulation was quantified as the influence of one partner's affect on the other's subsequent affect. Among infants with lower RSA variability, positive affect fluctuated around a higher equilibrium, and negative affect fluctuated around a lower equilibrium; these infants exhibited feedback loops where their positive affect dampened their subsequent negative affect. As expected, infants with higher mean RSA exhibited more volatility in positive affect, feedback loops between their positive and negative affect, and stronger mother-driven emotion coregulation. The results highlight differences in simultaneously occurring biological and emotional regulation.
Affiliation(s)
- Linda J Luecken
- Department of Psychology, Arizona State University, Tempe, AZ, USA
- Daniel McNeish
- Department of Psychology, Arizona State University, Tempe, AZ, USA
- Tracy L Spinrad
- School of Social and Family Dynamics, Arizona State University, Tempe, AZ, USA
22
Abstract
Growth mixture models are a popular method to uncover heterogeneity in growth trajectories. Harnessing the power of growth mixture models in applications is difficult given the prevalence of nonconvergence when fitting growth mixture models to empirical data. Growth mixture models are rooted in the random effect tradition, and nonconvergence often leads researchers to modify their intended model with constraints in the random effect covariance structure to facilitate estimation. While practical, doing so has been shown to adversely affect parameter estimates, class assignment, and class enumeration. Instead, we advocate specifying the models with a marginal approach to prevent the widespread practice of sacrificing class-specific covariance structures to appease nonconvergence. A simulation is provided to show the importance of modeling class-specific covariance structures and builds off existing literature showing that applying constraints to the covariance leads to poor performance. These results suggest that retaining class-specific covariance structures should be a top priority and that marginal models like covariance pattern growth mixture models that model the covariance structure without random effects are well-suited for such a purpose, particularly with modest sample sizes and attrition commonly found in applications. An application to PTSD data with such characteristics is provided to demonstrate (a) convergence difficulties with random effect models, (b) how covariance structure constraints improve convergence but to the detriment of performance, and (c) how covariance pattern growth mixture models may provide a path forward that improves convergence without forfeiting class-specific covariance structures.
23
Abstract
Technological advances have increased the prevalence of intensive longitudinal data as well as statistical techniques appropriate for these data, such as dynamic structural equation modeling (DSEM). Intensive longitudinal designs often investigate constructs related to affect or mood and do so with multiple item scales. However, applications of intensive longitudinal methods often rely on simple sums or averages of the administered items rather than considering a proper measurement model. This paper demonstrates how to incorporate measurement models into DSEM to (1) provide more rigorous measurement of constructs used in intensive longitudinal studies and (2) assess whether scales are invariant across time and across people, which is not possible when item responses are summed or averaged. We provide an example from an ecological momentary assessment study on self-regulation in adults with binge eating disorder and walk through how to fit the model in Mplus and how to interpret the results.
24
Peña A, McNeish D, Ayers SL, Olson ML, Vander Wyst KB, Williams AN, Shaibi GQ. Response heterogeneity to lifestyle intervention among Latino adolescents. Pediatr Diabetes 2020; 21:1430-1436. [PMID: 32939893] [PMCID: PMC8274397] [DOI: 10.1111/pedi.13120]
Abstract
OBJECTIVE: To characterize the heterogeneity in response to lifestyle intervention among Latino adolescents with obesity.
METHODS: We conducted a secondary data analysis of 90 Latino adolescents (age 15.4 ± 0.9 y, 56.7% female) with obesity (BMI% 98.1 ± 1.5%) who were enrolled in a 3-month lifestyle intervention and followed for a year. Covariance pattern mixture models identified response phenotypes defined by changes in insulin sensitivity as measured by a 2-hour oral glucose tolerance test. Baseline characteristics were compared across response phenotypes using one-way ANOVA and chi-square tests.
RESULTS: Three distinct response phenotypes (PH1, PH2, PH3) were identified. PH1 exhibited the most robust response, defined by the greatest increase in insulin sensitivity over time (β ± SE, linear 0.52 ± 0.17, P < .001; quadratic -0.03 ± 0.01, P = .001). PH2 showed non-significant changes, while PH3 demonstrated modest short-term increases in insulin sensitivity that were not sustained over time (linear 0.08 ± 0.03, P = .002; quadratic -0.01 ± 0.002, P = .003). At baseline, PH3 (1.1 ± 0.4) was the most insulin-resistant phenotype and exhibited the highest BMI% (98.5 ± 1.1%), highest 2-hour glucose concentrations (144.0 ± 27.5 mg/dL), and lowest beta-cell function as estimated by the oral disposition index (4.5 ± 2.8).
CONCLUSION: Response to lifestyle intervention varies among Latino youth with obesity, suggesting that precision approaches are warranted to meet the prevention needs of high-risk youth.
Affiliation(s)
- Armando Peña
- Center for Health Promotion and Disease Prevention, Arizona State University, Phoenix, AZ; College of Health Solutions, Arizona State University, Phoenix, AZ
- Daniel McNeish
- Department of Psychology, Arizona State University, Tempe, AZ
- Stephanie L. Ayers
- Southwest Interdisciplinary Research Center, Arizona State University, Phoenix, AZ
- Micah L. Olson
- Center for Health Promotion and Disease Prevention, Arizona State University, Phoenix, AZ; College of Health Solutions, Arizona State University, Phoenix, AZ
- Kiley B. Vander Wyst
- Center for Health Promotion and Disease Prevention, Arizona State University, Phoenix, AZ
- Allison N. Williams
- Center for Health Promotion and Disease Prevention, Arizona State University, Phoenix, AZ
- Gabriel Q. Shaibi
- Center for Health Promotion and Disease Prevention, Arizona State University, Phoenix, AZ; College of Health Solutions, Arizona State University, Phoenix, AZ; Department of Pediatric Endocrinology and Diabetes, Phoenix Children's Hospital, Phoenix, AZ
25
Abstract
Psychometric models for longitudinal test scores typically estimate quantities associated with single-administration tests, like ability at each time-point. However, models for longitudinal tests have not considered opportunities to estimate new quantities that are unavailable from single-administration tests. Specifically, we discuss dynamic measurement models, which combine aspects of longitudinal IRT, nonlinear growth models, and dynamic assessment, to directly estimate capacity, defined as the expected future score once the construct has fully developed. After discussing the history and connecting these areas into a single framework, we apply the model to verbal test scores from the Intergenerational Studies, which follow 494 people from 3 to 72 years old. The goal is to predict adult verbal scores (age ≥ 34) from adolescent scores (age ≤ 20). We held out the adult data for prediction and compared predictions from traditional longitudinal IRT ability scores and the proposed dynamic measurement capacity scores from models fit to the adolescent data. Results showed that the R2 from capacity scores was 2.5 times larger than the R2 from longitudinal IRT ability scores (43% vs. 16%), providing some evidence that exploring new quantities available from longitudinal testing could be worthwhile when the interest in testing is forecasting future performance.
26
Somers JA, Kerr ML, McNeish D, Smiley PA, Buttitta KV, Rasmussen HF, Borelli JL. Quantitatively representing real-time emotion dynamics: Attachment-based differences in mothers' emotion experiences. J Fam Psychol 2020; 34:480-489. [PMID: 31829672] [DOI: 10.1037/fam0000617]
Abstract
Given inconsistent findings in the literature on the relation between motherhood and emotional well-being, it is important to employ cutting-edge methods to evaluate mothers' dynamic emotional experiences. As anticipated by theory, attachment anxiety and avoidance may uniquely predict fluctuations in mothers' positive emotion, which may be yoked in particular to two aspects of their experiences: their emotional closeness with their children and their perceptions of their children's positive emotion. In the current study, 144 mothers (41% Hispanic) of young children (mean age = 20.9 months) reported on their positive emotion, closeness/distance with their children, and perceptions of their children's positive emotion up to 5 times per day for 10 days. We fit a dynamic structural equation model (DSEM) in order to evaluate attachment-based differences in mothers' emotional equilibrium (i.e., mean levels of positive emotion), intraindividual volatility in positive emotion, within-person emotional inertia, and cross-lagged emotion processes over time. Attachment anxiety was related to lower average maternal positive emotion ratings and to greater volatility in mothers' positive emotion and emotional closeness/distance. Attachment avoidance was related to higher average ratings of emotional distance, stronger inertia in mothers' positive emotion, and weaker inertia in mothers' emotional distance. Among mothers who were higher on attachment avoidance, emotional distance was related to greater subsequent feelings of positive emotion and perceived child positive emotion. The results are aligned with theory and have specific implications for attachment-informed parenting interventions. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
27
Abstract
Standard multilevel models focus on variables that predict the mean while the within-group variability is largely treated as a nuisance. Recent work has shown the advantage of including predictors for both the mean (the location submodel) and the variability (the scale submodel) within a single model. Constrained versions of the model can be fit in standard mixed effect model software, but the most general version with random effects in each of the location and scale submodels has been noted for being difficult to fit and estimate in software. However, the latest release of Mplus includes new capabilities that facilitate fitting the general version of the model as a multilevel structural equation model (SEM). This article introduces the general form of the model that includes location and scale random effects (called the location-scale model) and notes how it can be envisioned as a multilevel SEM. We provide a tutorial with example analyses and Mplus code for the model with two-level cross-sectional data and three-level repeated measures data and discuss how such a model has potential to extend recent developments in organizational science.
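The core of the location-scale model described above can be made concrete with a small data-generating sketch. All parameter values and function names below are invented for illustration: the location submodel shifts a person's mean via a random effect u_i, while the scale submodel sets that person's within-person variance on the log scale via a random effect v_i, so the variance stays positive.

```python
import math
import random

# Illustrative sketch of a mixed-effects location-scale data-generating model:
#   y_ti = (gamma0 + u_i) + e_ti,  e_ti ~ N(0, sigma2_i),
#   log(sigma2_i) = alpha0 + v_i.
# The location random effect u_i moves person i's mean; the scale random
# effect v_i moves person i's within-person variability.

def implied_within_sd(alpha0, v_i):
    """Model-implied within-person SD for a person with scale effect v_i."""
    return math.exp((alpha0 + v_i) / 2.0)

def simulate_person(gamma0, alpha0, u_i, v_i, n_obs, rng):
    sd_i = implied_within_sd(alpha0, v_i)
    return [gamma0 + u_i + rng.gauss(0.0, sd_i) for _ in range(n_obs)]

rng = random.Random(2024)
# Two people with the same mean but different scale random effects:
calm = simulate_person(gamma0=5.0, alpha0=-1.0, u_i=0.0, v_i=-0.5, n_obs=50, rng=rng)
volatile = simulate_person(gamma0=5.0, alpha0=-1.0, u_i=0.0, v_i=1.5, n_obs=50, rng=rng)
# implied_within_sd(-1.0, 1.5) is exactly e times implied_within_sd(-1.0, -0.5):
# the scale random effect alone drives the difference in volatility.
```

Estimating both submodels jointly is what the article frames as a multilevel SEM; the sketch only shows why the log link makes person-specific variances well-defined.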
28
Abstract
Technological advances have led to an increase in intensive longitudinal data and the statistical literature on modeling such data is rapidly expanding, as are software capabilities. Common methods in this area are related to time-series analysis, a framework that historically has received little exposure in psychology. There is a scarcity of psychology-based resources introducing the basic ideas of time-series analysis, especially for data sets featuring multiple people. We begin with basics of N = 1 time-series analysis and build up to complex dynamic structural equation models available in the newest release of Mplus Version 8. The goal is to provide readers with a basic conceptual understanding of common models, template code, and result interpretation. We provide short descriptions of some advanced issues, but our main priority is to supply readers with a solid knowledge base so that the more advanced literature on the topic is more readily digestible to a larger group of researchers. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
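The N = 1 building block referenced above is typically a first-order autoregressive (AR(1)) model, y_t = mu + phi(y_{t-1} - mu) + e_t, where phi captures carryover (inertia). A minimal sketch, with illustrative values rather than anything from the paper:

```python
import random

# Sketch of an N = 1 AR(1) process: y_t = mu + phi * (y_{t-1} - mu) + e_t.
# phi is the autoregressive (carryover) parameter; values are illustrative.

def ar1_series(mu, phi, n, noise_sd, seed=None):
    rng = random.Random(seed)
    y = [mu + 1.0]  # start slightly off-equilibrium
    for _ in range(n - 1):
        y.append(mu + phi * (y[-1] - mu) + rng.gauss(0.0, noise_sd))
    return y

def fit_ar1(y):
    """Estimate phi as the lag-1 least-squares slope of y_t on y_{t-1}."""
    x, z = y[:-1], y[1:]
    mx = sum(x) / len(x)
    mz = sum(z) / len(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxz / sxx

# With the noise switched off, the recursion is exact and the lag-1 slope
# recovers phi up to floating-point error:
noise_free = ar1_series(mu=0.0, phi=0.5, n=40, noise_sd=0.0)
fit_ar1(noise_free)
```

DSEM builds on this by letting mu and phi become random effects that vary over people, which is what moves the framework beyond single-subject time-series analysis.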
Affiliation(s)
- Ellen L Hamaker
- Department of Methodology and Statistics, Utrecht University
29
Abstract
Effect partitioning is almost exclusively performed with multilevel models (MLMs) - so much so that some have considered the two to be synonymous. MLMs are able to provide estimates with desirable statistical properties when data come from a hierarchical structure; but the random effects included in MLMs are not always integral to the analysis. As a result, other methods with relaxed assumptions are viable options in many cases. Through empirical examples and simulations, we show how generalized estimating equations (GEEs) can be used to effectively partition effects without random effects. We show that more onerous steps of MLMs such as determining the number of random effects and the structure for their covariance can be bypassed with GEEs while still obtaining identical or near-identical results. Additionally, violations of distributional assumptions adversely affect estimates with MLMs but have no effect on GEEs because no such assumptions are made. This makes GEEs a flexible alternative to MLMs with minimal assumptions that may warrant consideration. Limitations of GEEs for partitioning effects are also discussed.
30
Hussong AM, Ennett ST, McNeish D, Rothenberg WA, Cole V, Gottfredson NC, Faris RW. Teen Social Networks and Depressive Symptoms-Substance Use Associations: Developmental and Demographic Variation. J Stud Alcohol Drugs 2019. [PMID: 30422791] [DOI: 10.15288/jsad.2018.79.770]
Abstract
OBJECTIVE: The current study examined whether an adolescent's standing within a school-bounded social network moderated the association between depressive symptoms and substance use across adolescence as a function of developmental and demographic factors (gender, parental education, and race/ethnicity).
METHOD: The sample of 6,776 adolescents participated in up to seven waves of data collection spanning 6th to 12th grade.
RESULTS: Latent growth models showed that lower integration into the social network exacerbates risk for depression-related substance use in youth, particularly around the high school transition, but social status acted as both a risk factor and a protective factor at different points in development for different youth. Findings also varied as a function of youth gender and parental education status.
CONCLUSIONS: Together these findings suggest that lower integration into the social network exacerbates risk for depression-related substance use, particularly around the high school transition in general, just before the high school transition in those with lower parental education, and just after the high school transition in males. Thus, the risky impact of social isolation appears more consistent across this period. Social status, however, showed a more varied pattern, and further study is needed to understand the sometimes risky and sometimes protective effects of social status on depression-related substance use.
Affiliation(s)
- Andrea M Hussong
- Center for Developmental Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Susan T Ennett
- Department of Health Behavior, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Daniel McNeish
- Department of Psychology, Arizona State University, Tempe, Arizona
- Veronica Cole
- Center for Developmental Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Nisha C Gottfredson
- Center for Developmental Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; Department of Health Behavior, University of North Carolina Chapel Hill Gillings School of Global Public Health, Chapel Hill, North Carolina
- Robert W Faris
- Department of Sociology, University of California at Davis, Davis, California
31
Dumas D, McNeish D, Schreiber-Gregory D, Durning SJ, Torre DM. Dynamic Measurement in Health Professions Education: Rationale, Application, and Possibilities. Acad Med 2019; 94:1323-1328. [PMID: 31460924] [DOI: 10.1097/acm.0000000000002729]
Abstract
Dynamic measurement modeling (DMM) is a psychometric paradigm that uses longitudinal data to estimate individual students' growth in measured skills over the course of an educational program (i.e., growth scores). DMM represents a more formal way of assessing learning progress across the health professions education continuum. In this article, the authors provide justification for this approach in health professions education and demonstrate its proof-of-concept use with three time points of United States Medical Licensing Examination Step exams to generate growth scores for 454 current and recent medical learners. The authors demonstrate that learners vary substantially in their growth scores and that those growth scores exhibit psychometric reliability. In addition, growth scores correlated significantly and positively with indicators of medical learner readiness (e.g., undergraduate grade point average and Medical College Admission Test scores). Growth scores also correlated significantly and positively with future ratings of clinical competencies during internship (e.g., patient care, interpersonal skills), as assessed through a survey sent to program directors at the end of the first postgraduate year. These preliminary findings of reliability and validity for DMM growth scores provide initial evidence for further investigation into the suitability of a dynamic measurement paradigm in health professions education.
Affiliation(s)
- Denis Dumas
- D. Dumas is assistant professor of research methods and information science, University of Denver, Denver, Colorado. D. McNeish is assistant professor of quantitative psychology, Arizona State University, Phoenix, Arizona. D. Schreiber-Gregory is data analyst, Uniformed Services University of the Health Sciences, Bethesda, Maryland. S.J. Durning is professor of medicine, Uniformed Services University of the Health Sciences, Bethesda, Maryland. D.M. Torre is associate professor of medicine, Uniformed Services University of the Health Sciences, Bethesda, Maryland
32
Abstract
Recent methodological studies have investigated the properties of multilevel models with small samples. Previous work has primarily focused on continuous outcomes and little attention has been paid to count outcomes. The estimation of count outcome models can be difficult because the likelihood has no closed-form solution, meaning that approximation methods are required. Although adaptive Gaussian quadrature (AGQ) is generally seen as the gold standard, its comparative performance has primarily been investigated with larger samples. AGQ approximates the full likelihood, a function that is known to produce biased estimates with small samples with continuous outcomes. Conversely, penalized quasi-likelihood (PQL) is considered to be a less desirable approximation; however, it can approximate the restricted likelihood function, a function that is known to perform well with smaller samples with continuous outcomes. The goal of this paper is to compare the small sample bias of full likelihood methods to the linearization bias of PQL with restricted likelihood. Simulation results indicate that the linearization bias of PQL is preferable to the finite sample bias of AGQ with smaller samples.
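The integral that forces these approximations can be made concrete. The sketch below (with illustrative values, and a dense trapezoid grid standing in for true quadrature) evaluates the marginal success probability in a random-intercept logistic model, which has no closed form:

```python
import math

# Illustration (not the paper's simulation): in a random-intercept logistic
# model, the marginal probability of a success integrates the inverse logit
# over the random-effect distribution,
#   P(y = 1) = integral of inv_logit(b0 + u) * Normal(u; 0, tau^2) du,
# which has no closed form; hence approximations like AGQ or PQL.

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def normal_pdf(u, tau):
    return math.exp(-0.5 * (u / tau) ** 2) / (tau * math.sqrt(2.0 * math.pi))

def marginal_p(b0, tau, half_width=8.0, steps=4000):
    """Trapezoid approximation of the logistic-normal integral."""
    lo, hi = -half_width * tau, half_width * tau
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * inv_logit(b0 + u) * normal_pdf(u, tau)
    return total * h

p = marginal_p(b0=1.0, tau=1.5)
# p lies strictly between 0.5 and inv_logit(1.0): marginalizing over the
# random intercept pulls the probability toward 0.5 relative to the
# conditional probability at u = 0.
```

AGQ replaces the dense grid with a handful of optimally placed quadrature points at the integrand's mode, while PQL sidesteps the integral entirely by linearizing the model, which is the trade-off the simulation above examines.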
33
McNeish D, Hancock GR. The effect of measurement quality on targeted structural model fit indices: A comment on Lance, Beck, Fan, and Carter (2016). Psychol Methods 2019. [PMID: 29517268] [DOI: 10.1037/met0000157]
Abstract
Lance, Beck, Fan, and Carter (2016) recently advanced six new fit indices and associated cutoff values for assessing data-model fit in the structural portion of traditional latent variable path models. The authors appropriately argued that, although most researchers' theoretical interest rests with the latent structure, they still rely on indices of global model fit that simultaneously assess both the measurement and structural portions of the model. As such, Lance et al. proposed indices intended to assess the structural portion of the model in isolation from the measurement model. Unfortunately, although these strategies separate the assessment of the structure from the fit of the measurement model, they do not isolate the structure's assessment from the quality of the measurement model. That is, even with a perfectly fitting measurement model, poorer quality (i.e., less reliable) measurements will yield a more favorable verdict regarding structural fit, whereas better quality (i.e., more reliable) measurements will yield a less favorable structural assessment. This phenomenon, referred to by Hancock and Mueller (2011) as the reliability paradox, affects not only traditional global fit indices but also the structural indices proposed by Lance et al. Fortunately, as this comment will clarify, indices proposed by Hancock and Mueller help to mitigate this problem and allow the structural portion of the model to be assessed independently of both the fit of the measurement model and the quality of the indicator variables contained therein.
34
Abstract
Advances in data collection have made intensive longitudinal data easier to collect, unlocking potential for methodological innovations to model such data. Dynamic structural equation modeling (DSEM) is one such methodology, but recent studies have suggested that its small N performance is poor. This is problematic because small N data are omnipresent in empirical applications due to logistical and financial concerns associated with gathering many measurements on many people. In this paper, we discuss how previous studies considering small samples have focused on Bayesian methods with diffuse priors. The small sample literature has shown that diffuse priors may cause problems because they become unintentionally informative. Instead, we outline how researchers can create weakly informative admissible-range-restricted priors, even in the absence of previous studies. A simulation study shows that these admissible-range-restricted priors improve the small N properties of DSEM relative to diffuse priors on metrics such as relative bias and non-null detection rates.
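The admissible-range-restricted prior idea above can be sketched in a few lines. This is an illustrative construction, not the authors' implementation: it truncates a normal prior to the stationary region (-1, 1) of an AR(1) parameter using SciPy, so that no prior mass falls on inadmissible values, unlike a diffuse normal prior. The mean and SD values are made up for illustration.

```python
from scipy.stats import norm, truncnorm

def ar_prior(mean=0.2, sd=0.5, lower=-1.0, upper=1.0):
    """Normal prior truncated to the admissible (stationary) range of an AR(1) parameter."""
    a, b = (lower - mean) / sd, (upper - mean) / sd  # bounds on the standardized scale
    return truncnorm(a, b, loc=mean, scale=sd)

prior = ar_prior()
diffuse = norm(loc=0.0, scale=10.0)

# All prior mass is admissible under the restricted prior...
print(prior.cdf(1.0) - prior.cdf(-1.0))    # 1.0
# ...while a diffuse N(0, 10^2) prior puts most of its mass outside (-1, 1)
print(diffuse.cdf(1.0) - diffuse.cdf(-1.0))
```

The contrast in the two printed probabilities is the abstract's point in miniature: the "diffuse" prior is unintentionally informative because nearly all of its mass sits on values the model cannot take.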
35
Wentzel KR, Tomback R, Williams A, McNeish D. Perceptions of competence, control, and belongingness over the transition to high school: A mixed-method study. Contemporary Educational Psychology 2019. [DOI: 10.1016/j.cedpsych.2018.11.005]
36

37
Abstract
Debate continues about whether the likelihood ratio test (T_ML) or goodness-of-fit indices are most appropriate for assessing data-model fit in structural equation models. Though potential advantages and disadvantages of these methods with large samples are often discussed, shortcomings concomitant with smaller samples are not. This article aims to (a) highlight the broader small sample issues with both approaches to data-model fit assessment, (b) note that what constitutes a small sample is common in empirical studies (approximately 20% to 50% in review studies, depending on the definition of "small"), and (c) more widely introduce F-tests as a desirable alternative to traditional T_ML tests, small-sample corrections, or goodness-of-fit indices with smaller samples. Both goodness-of-fit indices and comparing T_ML to a chi-square distribution at smaller samples lead to overrejection of well-fitting models. Simulations and example analyses show that F-tests yield more desirable statistical properties—with or without normality—than standard approaches like chi-square tests or goodness-of-fit indices with smaller samples, roughly defined as N < 200 or N:df < 3.
38
Wentzel KR, Muenks K, McNeish D, Russell S. Emotional support, social goals, and classroom behavior: A multilevel, multisite study. Journal of Educational Psychology 2018. [DOI: 10.1037/edu0000239]
39
McNeish D, Kelley K. Fixed effects models versus mixed effects models for clustered data: Reviewing the approaches, disentangling the differences, and making recommendations. Psychol Methods 2018; 24:20-35. [PMID: 29863377] [DOI: 10.1037/met0000182]
Abstract
Clustered data are common in many fields. Some prominent examples of clustering are employees clustered within supervisors, students within classrooms, and clients within therapists. Many methods exist that explicitly consider the dependency introduced by a clustered data structure, but the multitude of available options has resulted in rigid disciplinary preferences. For example, those working in the psychological, organizational behavior, medical, and educational fields generally prefer mixed effects models, whereas those working in economics, behavioral finance, and strategic management generally prefer fixed effects models. However, increasingly interdisciplinary research has caused lines that separate the fields grounded in psychology and those grounded in economics to blur, leading to researchers encountering unfamiliar statistical methods commonly found in other disciplines. Persistent discipline-specific preferences can be particularly problematic because (a) each approach has certain limitations that can restrict the types of research questions that can be appropriately addressed, and (b) analyses based on the statistical modeling decisions common in one discipline can be difficult to understand for researchers trained in alternative disciplines. This can impede cross-disciplinary collaboration and limit the ability of scientists to make appropriate use of research from adjacent fields. This article discusses the differences between mixed effects and fixed effects models for clustered data, reviews each approach, and helps to identify when each approach is optimal. We then discuss the within-between specification, which blends advantageous properties of each framework into a single model.
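The within-between specification mentioned at the end of the abstract amounts to splitting each covariate into a cluster-mean (between) component and a cluster-mean-centered (within) component, which then enter the model as separate predictors. A minimal NumPy sketch (function name and data are illustrative, not from the article):

```python
import numpy as np

def within_between(x, cluster):
    """Split covariate x into between-cluster means and within-cluster deviations."""
    x = np.asarray(x, float)
    cluster = np.asarray(cluster)
    between = np.empty_like(x)
    for c in np.unique(cluster):
        mask = cluster == c
        between[mask] = x[mask].mean()   # cluster mean (between component)
    within = x - between                 # cluster-mean-centered (within component)
    return within, between

x = [1.0, 3.0, 2.0, 6.0]
cluster = [1, 1, 2, 2]
w, b = within_between(x, cluster)
print(w)  # [-1.  1. -2.  2.]
print(b)  # [2. 2. 4. 4.]
```

Entering `w` and `b` as separate predictors lets their coefficients differ, which is what allows the specification to blend the fixed-effects and mixed-effects perspectives.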
Affiliation(s)
- Ken Kelley
- Department of Information Technology, Analytics, and Operations, University of Notre Dame

40
Abstract
To date, small sample problems with latent growth models (LGMs) have not received the same amount of attention in the literature as those of related mixed-effects models (MEMs). Although many models can be interchangeably framed as an LGM or a MEM, LGMs uniquely provide criteria to assess global data-model fit. However, previous studies have demonstrated poor small sample performance of these global data-model fit criteria, and three post hoc small sample corrections have been proposed and shown to perform well with complete data. Yet these corrections use sample size in their computation, whose value is unclear when missing data are accommodated with full information maximum likelihood, as is common with LGMs. A simulation is provided to demonstrate the inadequacy of these small sample corrections in the near ubiquitous situation in growth modeling where data are incomplete. Then, a missing data correction for the small sample correction equations is proposed and shown through a simulation study to perform well in various conditions found in practice. An applied developmental psychology example is then provided to demonstrate how disregarding missing data in small sample correction equations can greatly affect assessment of global data-model fit.
Affiliation(s)
- Daniel McNeish
- University of Maryland, College Park, MD, USA
- Utrecht University, Utrecht, Netherlands

41
Abstract
Studies on small sample properties of multilevel models have become increasingly prominent in the methodological literature in response to the frequency with which small sample data appear in empirical studies. Simulation results generally recommend that empirical researchers employ restricted maximum likelihood estimation (REML) with a Kenward-Roger correction with small samples in frequentist contexts to minimize small sample bias in estimation and to prevent inflation of Type-I error rates. However, simulation studies focus on recommendations for best practice, and there is little to no explanation of why traditional maximum likelihood (ML) breaks down with smaller samples, what differentiates REML from ML, or how the Kenward-Roger correction remedies lingering small sample issues. Due to the complexity of these methods, most extant descriptions are highly mathematical and are intended to prove that the methods improve small sample performance as intended. Thus, empirical researchers have documentation that these methods are advantageous but still lack resources to help understand what the methods actually do and why they are needed. This tutorial explains why ML falters with small samples, how REML circumvents some issues, and how Kenward-Roger works. We do so without equations or derivations to support more widespread understanding and use of these valuable methods.
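The core intuition for why ML falters with small samples can be seen in the simplest possible case: estimating a variance when the mean must also be estimated. ML divides by N and ignores the degree of freedom spent on the mean, while REML's correction divides by N - 1. An illustrative simulation (not from the article; values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
true_var, n = 4.0, 5  # small sample

ml_est, reml_est = [], []
for _ in range(20_000):
    y = rng.normal(0.0, np.sqrt(true_var), size=n)
    ss = np.sum((y - y.mean()) ** 2)   # deviations from the *estimated* mean
    ml_est.append(ss / n)              # ML: ignores the df spent estimating the mean
    reml_est.append(ss / (n - 1))      # REML: accounts for it

print(np.mean(ml_est))    # ~3.2: biased downward by the factor (n - 1)/n
print(np.mean(reml_est))  # ~4.0: approximately unbiased
```

The (n - 1)/n factor is negligible with hundreds of observations but substantial with five, which is why the distinction matters mainly in small samples; the Kenward-Roger correction then adjusts inference for the remaining uncertainty in the estimated variance components.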
Affiliation(s)
- Daniel McNeish
- University of North Carolina, Chapel Hill; Arizona State University

42
McNeish D. Fitting Residual Error Structures for Growth Models in SAS PROC MCMC. Educ Psychol Meas 2017; 77:587-612. [PMID: 30034021] [PMCID: PMC5991787] [DOI: 10.1177/0013164416652441]
Abstract
In the behavioral sciences broadly, estimating growth models with Bayesian methods is becoming increasingly common, especially to combat the small samples common with longitudinal data. Although Mplus is an increasingly common program for applied research employing Bayesian methods, its limited selection of prior distributions for the elements of covariance structures makes more general software more advantageous under certain conditions. However, the cost of this flexibility is that general software has few preprogrammed commands for specifying covariance structures. For instance, PROC MIXED has a few dozen such preprogrammed options, but when researchers divert to a Bayesian framework, the software offers no such guidance and requires researchers to program these different structures manually, which is no small task. As such, the literature has noted that empirical papers tend to simplify their covariance matrices to circumvent this difficulty, which is not desirable because such a simplification will likely lead to biased estimates of variance components and standard errors. To facilitate wider implementation of Bayesian growth models that properly model covariance structures, this article overviews how to generally program a growth model in SAS PROC MCMC and then demonstrates how to program common residual error structures. Full annotated SAS code and an applied example are provided.
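As a companion to the article's SAS focus, one of the most common residual structures it addresses, AR(1), can be written directly as a covariance matrix whose elements are sigma^2 * rho^|i - j|. The NumPy sketch below is hypothetical and only illustrates the structure being programmed, not the authors' PROC MCMC code:

```python
import numpy as np

def ar1_cov(n_times, sigma2, rho):
    """AR(1) residual covariance matrix: element (i, j) is sigma2 * rho**|i - j|."""
    idx = np.arange(n_times)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

R = ar1_cov(4, sigma2=2.0, rho=0.5)
print(R)
# First row: [2.  1.  0.5  0.25] -- constant variance, correlation decaying geometrically with lag
```

Other structures mentioned in this literature (compound symmetry, Toeplitz, unstructured) can be built the same way; the manual-programming burden the abstract describes comes from having to specify each such matrix, and valid priors for its parameters, by hand.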
43
Abstract
Empirical studies in psychology commonly report Cronbach's alpha as a measure of internal consistency reliability despite the fact that many methodological studies have shown that Cronbach's alpha is riddled with problems stemming from unrealistic assumptions. In many circumstances, violating these assumptions yields estimates of reliability that are too small, making measures look less reliable than they actually are. Although methodological critiques of Cronbach's alpha are being cited with increasing frequency in empirical studies, in this tutorial we discuss how the trend is not necessarily improving methodology used in the literature. That is, many studies continue to use Cronbach's alpha without regard for its assumptions or merely cite methodological articles advising against its use to rationalize unfavorable Cronbach's alpha estimates. This tutorial first provides evidence that recommendations against Cronbach's alpha have not appreciably changed how empirical studies report reliability. Then, we summarize the drawbacks of Cronbach's alpha conceptually without relying on mathematical or simulation-based arguments so that these arguments are accessible to a broad audience. We continue by discussing several alternative measures that make less rigid assumptions and thereby provide justifiably higher estimates of reliability compared to Cronbach's alpha. We conclude with empirical examples to illustrate advantages of alternative measures of reliability including omega total, Revelle's omega total, the greatest lower bound, and Coefficient H. A detailed software appendix is also provided to help researchers implement alternative methods.
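The alpha-versus-omega contrast above can be made concrete with the standard formulas: alpha computed from an item covariance matrix, and omega total from one-factor loadings and residual variances. The loadings below are made up for illustration; with unequal (congeneric) loadings, alpha falls below omega total, matching the abstract's point that alpha can understate reliability.

```python
import numpy as np

def cronbach_alpha(cov):
    """Cronbach's alpha from an item covariance matrix."""
    cov = np.asarray(cov, float)
    k = cov.shape[0]
    return (k / (k - 1)) * (1 - np.trace(cov) / cov.sum())

def omega_total(loadings, resid_vars):
    """Omega total for a one-factor model: (sum lambda)^2 / ((sum lambda)^2 + sum theta)."""
    lam = np.asarray(loadings, float)
    theta = np.asarray(resid_vars, float)
    return lam.sum() ** 2 / (lam.sum() ** 2 + theta.sum())

# Congeneric (unequal-loading) items: alpha understates reliability
lam = np.array([0.9, 0.7, 0.5, 0.3])
theta = 1 - lam ** 2                        # standardized residual variances
cov = np.outer(lam, lam) + np.diag(theta)   # model-implied covariance matrix
print(cronbach_alpha(cov))      # ~0.677
print(omega_total(lam, theta))  # ~0.709
```

When the loadings are all equal (tau-equivalence, one of alpha's rigid assumptions), the two formulas agree exactly; the gap opens as loadings diverge.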
Affiliation(s)
- Daniel McNeish
- Department of Methodology and Statistics, Utrecht University

44
Elzakkers IFFM, Danner UN, Sternheim LC, McNeish D, Hoek HW, van Elburg AA. Mental capacity to consent to treatment and the association with outcome: a longitudinal study in patients with anorexia nervosa. BJPsych Open 2017; 3:147-153. [PMID: 28584660] [PMCID: PMC5445260] [DOI: 10.1192/bjpo.bp.116.003905]
Abstract
BACKGROUND: The relevance of diminished mental capacity in anorexia nervosa (AN) to the course of the disorder is unknown. AIMS: To examine the prognostic relevance of diminished mental capacity in AN. METHOD: A longitudinal study was conducted in 70 adult female patients with severe AN. At baseline, mental capacity was assessed by psychiatrists, and clinical and neuropsychological (decision-making) data were collected. After 1 and 2 years, clinical and neuropsychological assessments were repeated, and remission and admission rates were calculated. RESULTS: Patients with AN with diminished mental capacity had a less favourable outcome with regard to remission and were admitted more frequently. Their appreciation of illness remained hampered, and their decision-making did not improve, in contrast to patients with full mental capacity. CONCLUSIONS: Patients with AN with diminished mental capacity seem to do less well in treatment and display decision-making deficiencies that do not ameliorate with weight improvement.
Affiliation(s)
- Isis F F M Elzakkers, MD, MSc, Altrecht Eating Disorders Rintveld, Altrecht Mental Health Institute, Zeist, The Netherlands
- Unna N Danner, PhD, Altrecht Eating Disorders Rintveld, Altrecht Mental Health Institute, Zeist, The Netherlands; Department of Psychology, Utrecht University, Utrecht, The Netherlands
- Lot C Sternheim, PhD, Department of Psychology, Utrecht University, Utrecht, The Netherlands
- Daniel McNeish, PhD, Department of Methodology and Statistics, Utrecht University, Utrecht, The Netherlands
- Hans W Hoek, MD, PhD, Altrecht Eating Disorders Rintveld, Altrecht Mental Health Institute, Zeist, The Netherlands; Parnassia Psychiatric Institute, The Hague, The Netherlands; Department of Psychiatry, University Medical Center Groningen, Groningen, The Netherlands; Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, USA
- Annemarie A van Elburg, MD, PhD, Altrecht Eating Disorders Rintveld, Altrecht Mental Health Institute, Zeist, The Netherlands; Department of Psychology, Utrecht University, Utrecht, The Netherlands

45
Wentzel KR, Muenks K, McNeish D, Russell S. Peer and teacher supports in relation to motivation and effort: A multi-level study. Contemporary Educational Psychology 2017. [DOI: 10.1016/j.cedpsych.2016.11.002]
46
Abstract
Latent variable modeling is a popular and flexible statistical framework. Concomitant with fitting latent variable models is assessment of how well the theoretical model fits the observed data. Although firm cutoffs for these fit indexes are often cited, recent statistical proofs and simulations have shown that these fit indexes are highly susceptible to measurement quality. For instance, a root mean square error of approximation (RMSEA) value of 0.06 (conventionally thought to indicate good fit) can actually indicate poor fit with poor measurement quality (e.g., standardized factors loadings of around 0.40). Conversely, an RMSEA value of 0.20 (conventionally thought to indicate very poor fit) can indicate acceptable fit with very high measurement quality (standardized factor loadings around 0.90). Despite the wide-ranging effect on applications of latent variable models, the high level of technical detail involved with this phenomenon has curtailed the exposure of these important findings to empirical researchers who are employing these methods. This article briefly reviews these methodological studies in minimal technical detail and provides a demonstration to easily quantify the large influence measurement quality has on fit index values and how greatly the cutoffs would change if they were derived under an alternative level of measurement quality. Recommendations for best practice are also discussed.
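For reference, the RMSEA values the abstract discusses are computed from the model chi-square, its degrees of freedom, and the sample size via the standard point-estimate formula sqrt(max(T - df, 0) / (df * (N - 1))). A small sketch with made-up input values, only to show how the index behaves around the conventional 0.06 cutoff:

```python
import math

def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(T - df, 0) / (df * (N - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Illustrative inputs on either side of the conventional 0.06 cutoff
print(round(rmsea(120, 48, 300), 3))  # 0.071
print(round(rmsea(60, 48, 300), 3))   # 0.029
```

The abstract's argument is that the chi-square entering this formula is itself driven by measurement quality, so identical structural misspecification can land on either side of any fixed cutoff depending on how reliable the indicators are.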
Affiliation(s)
- Daniel McNeish
- Human Development and Quantitative Methodology Department, University of Maryland, College Park
- Ji An
- Human Development and Quantitative Methodology Department, University of Maryland, College Park
- Gregory R Hancock
- Human Development and Quantitative Methodology Department, University of Maryland, College Park

47
Abstract
Small sample sizes are a pervasive problem when modeling clustered data. In two-level models, this problem has been well studied, and several resources provide guidance for modeling such data. However, a recent review of small-sample clustered data methods has noted that no studies have investigated methods for modeling three-level data with small sample sizes. Furthermore, strategies for two-level models do not necessarily translate to the three-level context. Moreover, three-level models are prone to small samples because the "small sample" designation is primarily based on the sample size of the highest level, and large samples are increasingly difficult to amass as one progresses up a hierarchy. In this study, we focus on the case when the third level is incidental, meaning that the third level is important to consider but there are no explicit research questions at the third level. This study uses a simulation to examine the performance of seven methods for modeling three-level data with a small sample at the third level. A motivating educational psychology example is also provided to demonstrate how the choice of method can greatly affect results.
Affiliation(s)
- Daniel McNeish
- Department of Methodology and Statistics, Utrecht University
- Kathryn R Wentzel
- Department of Human Development and Quantitative Methodology, University of Maryland

48

49
Abstract
Recent methodological work has highlighted the promise of nonlinear growth models for addressing substantive questions in the behavioral sciences. In this article, we outline a second-order nonlinear growth model in order to measure a critical notion in development and education: potential. Here, potential is conceptualized as having three components (ability, capacity, and availability), where ability is the amount of skill a student is estimated to have at a given timepoint, capacity is the maximum amount of ability a student is predicted to be able to develop asymptotically, and availability is the difference between capacity and ability at any particular timepoint. We argue that single timepoint measures are typically insufficient for discerning information about potential, and we therefore describe a general framework that incorporates a growth model into the measurement model to capture these three components. Then, we provide an illustrative example using the public-use Early Childhood Longitudinal Study-Kindergarten data set using a Michaelis-Menten growth function (reparameterized from its common application in biochemistry) to demonstrate our proposed model as applied to measuring potential within an educational context. The advantage of this approach compared to currently utilized methods is discussed, as are future directions and limitations.
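The three components of potential defined above map directly onto the Michaelis-Menten function: ability at time t is capacity * t / (k + t), and availability is the remainder. A small illustrative sketch (parameter values are made up, not estimates from the ECLS-K data); a handy property is that ability is exactly half of capacity when t = k:

```python
import numpy as np

def ability(t, capacity, k):
    """Michaelis-Menten growth: estimated skill at time t, approaching capacity asymptotically."""
    return capacity * t / (k + t)

def availability(t, capacity, k):
    """Room left to grow: capacity minus current ability."""
    return capacity - ability(t, capacity, k)

t = np.array([0.0, 1.0, 4.0, 100.0])
print(ability(t, capacity=10.0, k=4.0))       # [0.  2.  5.  ~9.6]; half of capacity at t = k
print(availability(t, capacity=10.0, k=4.0))
```

In the article's framework these quantities come from a fitted second-order growth model rather than fixed inputs; the sketch only shows the functional form being reparameterized from biochemistry.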
Affiliation(s)
- Daniel McNeish
- Department of Human Development and Quantitative Methodology, University of Maryland
- Denis Dumas
- Department of Human Development and Psychoeducational Studies, Howard University

50
Abstract
Exploratory factor analysis (EFA) is an extremely popular method for determining the underlying factor structure for a set of variables. Due to its exploratory nature, EFA is notorious for being conducted with small sample sizes, and recent reviews of psychological research have reported that between 40% and 60% of applied studies have 200 or fewer observations. Recent methodological studies have addressed small sample size requirements for EFA models; however, these studies have only considered complete data, which are the exception rather than the rule in psychology. Furthermore, the extant literature on missing data techniques with small samples is scant, and nearly all existing studies focus on topics that are not of primary interest to EFA models. Therefore, this article presents a simulation to assess the performance of various missing data techniques for EFA models with both small samples and missing data. Results show that deletion methods do not extract the proper number of factors and estimate the factor loadings with severe bias, even when data are missing completely at random. Predictive mean matching is the best method overall when considering extracting the correct number of factors and estimating factor loadings without bias, although 2-stage estimation was a close second.
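Predictive mean matching, the best performer in the abstract, imputes each missing value with an *observed* value from the donor case whose model-predicted value is closest. The sketch below is a deliberately minimal single-predictor, single-draw version, not the full multiple-imputation algorithm (which adds posterior parameter draws and samples from a pool of nearest donors):

```python
import numpy as np

def pmm_impute(y, x):
    """Single-draw predictive mean matching sketch: fill NaNs in y using predictor x."""
    y = np.asarray(y, float)
    x = np.asarray(x, float)
    obs = ~np.isnan(y)
    b1, b0 = np.polyfit(x[obs], y[obs], 1)   # OLS of y on x among observed cases
    pred = b0 + b1 * x
    y_out = y.copy()
    for i in np.flatnonzero(~obs):
        donor = np.argmin(np.abs(pred[obs] - pred[i]))  # donor with the closest predicted value
        y_out[i] = y[obs][donor]   # borrow the donor's observed value, not the model prediction
    return y_out

y = np.array([1.0, 2.0, np.nan, 7.0])
x = np.array([1.0, 2.0, 3.0, 7.0])
print(pmm_impute(y, x))  # [1. 2. 2. 7.]
```

Because imputed values are always real observed values, PMM preserves the distributional shape of the items, which is one plausible reason it recovers factor structure better than deletion methods in the simulation.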
Affiliation(s)
- Daniel McNeish
- Department of Methodology and Statistics, Utrecht University, The Netherlands