1
|
Caronni A, Scarano S. Generalisability of the Barthel Index and the Functional Independence Measure: robustness of disability measures to Differential Item Functioning. Disabil Rehabil 2024:1-12. [PMID: 39221560 DOI: 10.1080/09638288.2024.2391554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 06/16/2024] [Accepted: 08/08/2024] [Indexed: 09/04/2024]
Abstract
PURPOSE Differential Item Functioning (DIF), an item malfunctioning, causes Differential Test Functioning (DTF), thus biasing questionnaire measures. The current study evaluates the relationship between DIF and DTF for the Barthel Index and the Functional Independence Measure, likely the most used disability measures. The aim is to understand under which conditions DIF can be ignored as its DTF is negligible. METHODS A simulation study was run. Disability measures were obtained for the Barthel Index and FIM motor domain using Rasch analysis with previously published item calibrations. Several DIF scenarios have been assessed. DTF was tolerable if ≤0.50 logits. RESULTS Simulations showed that the larger the DIF, the larger the DTF and that, keeping the overall DIF constant, the total number of items with DIF does not affect DTF. DIF of the items with the lowest or highest calibrations is the most dangerous. The DIF of central items should be so massive to matter in DTF terms that it is unlikely to happen in practice. The FIM robustness to DIF is better than that of the Barthel Index. CONCLUSIONS The FIM and the Barthel Index show remarkable robustness to DIF. Thanks to this feature, sample invariant, generalisable disability measures are available.
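The link between DIF and DTF described in this abstract can be illustrated with a minimal sketch. It assumes a dichotomous Rasch model (the Barthel Index and FIM items are polytomous, so this is a simplification), hypothetical item calibrations rather than the published ones, and one plausible reading of DTF as the change in the ability estimate for the same raw score. Only the 0.50 logit tolerance comes from the abstract; the function names are illustrative.

```python
import math

def expected_score(theta, difficulties):
    """Expected raw score under the dichotomous Rasch model."""
    return sum(1.0 / (1.0 + math.exp(b - theta)) for b in difficulties)

def ability_from_raw_score(raw, difficulties, lo=-10.0, hi=10.0, tol=1e-10):
    """Ability (logits) whose expected score equals the raw score.

    Uses bisection, which works because the expected score
    increases monotonically with ability.
    """
    if not 0 < raw < len(difficulties):
        raise ValueError("extreme raw scores have no finite estimate")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def dtf(raw, reference, focal):
    """Difference in ability estimates for the same raw score (assumed DTF definition)."""
    return ability_from_raw_score(raw, focal) - ability_from_raw_score(raw, reference)

TOLERABLE_DTF = 0.50  # logits, the tolerance stated in the abstract

# Hypothetical calibrations: five items, DIF of 1 logit on the central item.
reference = [-2.0, -1.0, 0.0, 1.0, 2.0]
focal_central = [-2.0, -1.0, 1.0, 1.0, 2.0]

shift = dtf(2, reference, focal_central)
print(f"DTF = {shift:.3f} logits, tolerable: {abs(shift) <= TOLERABLE_DTF}")
```

Moving the 1 logit of DIF to the easiest or hardest item, or changing the raw score, shows how the resulting DTF depends on where the affected item sits on the scale.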
Affiliation(s)
- Antonio Caronni
  - Department of Neurorehabilitation Sciences, IRCCS Istituto Auxologico Italiano, Milan, Italy
  - Department of Biomedical Sciences for Health, University of Milan, Milan, Italy
- Stefano Scarano
  - Department of Neurorehabilitation Sciences, IRCCS Istituto Auxologico Italiano, Milan, Italy
  - Department of Biomedical Sciences for Health, University of Milan, Milan, Italy
|
2
|
Wallin G, Chen Y, Moustaki I. DIF Analysis with Unknown Groups and Anchor Items. PSYCHOMETRIKA 2024; 89:267-295. [PMID: 38383880 PMCID: PMC11062998 DOI: 10.1007/s11336-024-09948-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Indexed: 02/23/2024]
Abstract
Ensuring fairness in instruments like survey questionnaires or educational tests is crucial. One way to address this is by a Differential Item Functioning (DIF) analysis, which examines if different subgroups respond differently to a particular item, controlling for their overall latent construct level. DIF analysis is typically conducted to assess measurement invariance at the item level. Traditional DIF analysis methods require knowing the comparison groups (reference and focal groups) and anchor items (a subset of DIF-free items). Such prior knowledge may not always be available, and psychometric methods have been proposed for DIF analysis when one piece of information is unknown. More specifically, when the comparison groups are unknown while anchor items are known, latent DIF analysis methods have been proposed that estimate the unknown groups by latent classes. When anchor items are unknown while comparison groups are known, methods have also been proposed, typically under a sparsity assumption: the number of DIF items is not too large. However, DIF analysis when both pieces of information are unknown has not received much attention. This paper proposes a general statistical framework under this setting. In the proposed framework, we model the unknown groups by latent classes and introduce item-specific DIF parameters to capture the DIF effects. Assuming the number of DIF items is relatively small, an L1-regularised estimator is proposed to simultaneously identify the latent classes and the DIF items. A computationally efficient Expectation-Maximisation (EM) algorithm is developed to solve the non-smooth optimisation problem for the regularised estimator. The performance of the proposed method is evaluated by simulation studies and an application to item response data from a real-world educational test.
Affiliation(s)
- Gabriel Wallin
  - Department of Mathematics and Statistics, Lancaster University, Umeå, Sweden
- Yunxiao Chen
  - Department of Statistics, London School of Economics and Political Science, Columbia House, Room 5.16 Houghton Street, London, WC2A 2AE, UK
- Irini Moustaki
  - Department of Statistics, London School of Economics and Political Science, Columbia House, Room 5.16 Houghton Street, London, WC2A 2AE, UK
|
3
|
Hladká A, Martinková P, Magis D. Combining Item Purification and Multiple Comparison Adjustment Methods in Detection of Differential Item Functioning. MULTIVARIATE BEHAVIORAL RESEARCH 2024; 59:46-61. [PMID: 37218672 DOI: 10.1080/00273171.2023.2205393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Many of the differential item functioning (DIF) detection methods rely on a principle of testing for DIF item by item, while considering the rest of the items or at least some of them being DIF-free. Computational algorithms of these DIF detection methods involve the selection of DIF-free items in an iterative procedure called item purification. Another aspect is the need to correct for multiple comparisons, which can be done with a number of existing multiple comparison adjustment methods. In this article, we demonstrate that implementation of these two controlling procedures together may have an impact on which items are detected as DIF items. We propose an iterative algorithm combining item purification and adjustment for multiple comparisons. Pleasant properties of the newly proposed algorithm are shown with a simulation study. The method is demonstrated on a real data example.
Affiliation(s)
- Adéla Hladká
  - Institute of Computer Science of the Czech Academy of Sciences
  - Faculty of Mathematics and Physics, Charles University
- Patrícia Martinková
  - Institute of Computer Science of the Czech Academy of Sciences
  - Faculty of Education, Charles University
|
4
|
Wijayanto F, Bucur IG, Mul K, Groot P, van Engelen BGM, Heskes T. Semi-automated Rasch analysis with differential item functioning. Behav Res Methods 2023; 55:3129-3148. [PMID: 36070131 PMCID: PMC10556135 DOI: 10.3758/s13428-022-01947-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 07/30/2022] [Indexed: 11/08/2022]
Abstract
Rasch analysis is a procedure to develop and validate instruments that aim to measure a person's traits. However, manual Rasch analysis is a complex and time-consuming task, even more so when the possibility of differential item functioning (DIF) is taken into consideration. Furthermore, manual Rasch analysis by construction relies on a modeler's subjective choices. As an alternative approach, we introduce a semi-automated procedure that is based on the optimization of a new criterion, called in-plus-out-of-questionnaire log likelihood with differential item functioning (IPOQ-LL-DIF), which extends our previous criterion. We illustrate our procedure on artificially generated data as well as on several real-world datasets containing potential DIF items. On these real-world datasets, our procedure found instruments with similar clinimetric properties as those suggested by experts through manual analyses.
Affiliation(s)
- Feri Wijayanto
  - Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands
  - Department of Informatics, Universitas Islam Indonesia, Yogyakarta, Indonesia
- Ioan Gabriel Bucur
  - Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands
- Karlien Mul
  - Department of Neurology, Donders Institute for Brain, Cognition, and Behaviour, Nijmegen, The Netherlands
- Perry Groot
  - Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands
- Baziel G M van Engelen
  - Department of Neurology, Donders Institute for Brain, Cognition, and Behaviour, Nijmegen, The Netherlands
- Tom Heskes
  - Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands
|
5
|
Eyring JB, Crandall A, Magnusson BM. A Modified Menstrual Attitudes Scale: Heteronormative Attitudes, Sexism, and Attitudes Toward Menstruation in Male and Female Adults. ARCHIVES OF SEXUAL BEHAVIOR 2023; 52:1535-1547. [PMID: 36856958 DOI: 10.1007/s10508-023-02565-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2022] [Revised: 02/11/2023] [Accepted: 02/13/2023] [Indexed: 06/18/2023]
Abstract
Social shame and stigma surround menstruation, which may compromise women's health and rights in various contexts. Men's attitudes are particularly important because men often hold positions of power that influence women's experience. This study examined factors associated with menstrual attitudes, including heteronormative attitudes, sexism, and family influences. A cross-sectional Qualtrics panel survey (n = 802; aged 18-44; 50.8% female) was performed. We tested a revised menstrual attitudes scale based on items drawn from previously validated measures. Data were analyzed using a structural equation modeling framework. Factor analysis identified and confirmed a 5-factor model for menstrual attitudes. Men endorsed more negative attitudes toward menstruation than women; however, this difference was largely explained by factors other than gender in the structural equation model. After controlling for family and demographic characteristics, attitudes toward openness and secrecy surrounding menstruation were most strongly associated with gender role expectations and hostile sexism. Benevolent sexism was associated with finding menstruation debilitating, denying menstrual symptoms, and endorsing avoidance of activities during menstruation. Heteronormative and sexist attitudes were associated with more negative menstrual attitudes, while increased menstrual knowledge was associated with more positive menstrual attitudes. The difference in menstrual attitudes between males and females was explained largely by heteronormative attitudes and sexism. This suggests that attitudes toward menstruation are closely linked to social ideals about men and women.
Affiliation(s)
- J B Eyring
  - Department of Public Health, College of Life Sciences, Brigham Young University, 4103 LSB, Provo, UT, 84003, USA
- AliceAnn Crandall
  - Department of Public Health, College of Life Sciences, Brigham Young University, 4103 LSB, Provo, UT, 84003, USA
- Brianna M Magnusson
  - Department of Public Health, College of Life Sciences, Brigham Young University, 4103 LSB, Provo, UT, 84003, USA
|
6
|
GÖKTENTÜRK T, SAĞLAM MH, ZUMBO BD. Zumbo'nun madde tepki sürecine yönelik eleştirel bakış açısıyla Türkçeyi Ölçme ve değerlendirmede yeni yaklaşımlar [New approaches to the measurement and assessment of Turkish from Zumbo's critical perspective on the item response process]. RUMELIDE DIL VE EDEBIYAT ARAŞTIRMALARI DERGISI 2023:224-245. [DOI: 10.29000/rumelide.1285296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
The measurement and assessment of Turkish as a native language continues with an approach focused on the item itself and on participants' responses to the item. In the 21st century, Turkish is measured through tests led by MEB (the Ministry of National Education) and ÖSYM (the Measuring, Selection and Placement Center), and stakeholders carry out this work with the same approach. Worldwide, however, research into the ecological background of the item response process is becoming increasingly important. A participant who encounters an item during the item response process is influenced in their response by a complex ecological pattern in the background. Consequently, leaving this pattern out of validity research yields a limited view of the picture and an incomplete interpretation of measurement results. Studies aimed at identifying differential item functioning, an important part of item bias research, are promising in this respect. On this basis, and in response to the restricted approaches to measuring Turkish as a native language, this study was structured in three steps. The first step presents a framework for the measurement and assessment of Turkish as a native language. The second step discusses the ecology of the item response process. The third step introduces the ecological model developed by Zumbo et al. (2015) as a potential solution, drawn from differential item functioning research, for exploring this complex background. Finally, recommendations and proposals are listed for item writers, researchers and policy makers, and at the institutional level primarily for MEB and ÖSYM. Based on the discussion, it can be said that the ecological approach and the ecological model hold great potential for validity research in the measurement and assessment of Turkish as a native language. An ecology-based perspective therefore needs to be adapted to the measurement of Turkish as a native language, in line with Türkiye's measurement needs. In this way, it will be possible to meet the Turkish course curriculum's need for an approach to measurement and assessment.
Affiliation(s)
- Talha GÖKTENTÜRK
  - Yıldız Technical University, Faculty of Education, Department of Turkish Education
- Mehmet Hilmi SAĞLAM
  - The University of British Columbia, Faculty of Education, The Department of Educational and Counselling Psychology, and Special Education
- Bruno D. ZUMBO
  - The University of British Columbia, Faculty of Education, The Department of Educational and Counselling Psychology, and Special Education
|
7
|
Henninger M, Debelak R, Strobl C. A New Stopping Criterion for Rasch Trees Based on the Mantel-Haenszel Effect Size Measure for Differential Item Functioning. EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 2023; 83:181-212. [PMID: 36601252 PMCID: PMC9806517 DOI: 10.1177/00131644221077135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
To detect differential item functioning (DIF), Rasch trees search for optimal splitpoints in covariates and identify subgroups of respondents in a data-driven way. To determine whether and in which covariate a split should be performed, Rasch trees use statistical significance tests. Consequently, Rasch trees are more likely to label small DIF effects as significant in larger samples. This leads to larger trees, which split the sample into more subgroups. What would be more desirable is an approach that is driven more by effect size rather than sample size. In order to achieve this, we suggest to implement an additional stopping criterion: the popular Educational Testing Service (ETS) classification scheme based on the Mantel-Haenszel odds ratio. This criterion helps us to evaluate whether a split in a Rasch tree is based on a substantial or an ignorable difference in item parameters, and it allows the Rasch tree to stop growing when DIF between the identified subgroups is small. Furthermore, it supports identifying DIF items and quantifying DIF effect sizes in each split. Based on simulation results, we conclude that the Mantel-Haenszel effect size further reduces unnecessary splits in Rasch trees under the null hypothesis, or when the sample size is large but DIF effects are negligible. To make the stopping criterion easy-to-use for applied researchers, we have implemented the procedure in the statistical software R. Finally, we discuss how DIF effects between different nodes in a Rasch tree can be interpreted and emphasize the importance of purification strategies for the Mantel-Haenszel procedure on tree stopping and DIF item classification.
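The effect size this abstract refers to can be sketched as follows. The code computes the Mantel-Haenszel common odds ratio across matched score strata, converts it to the ETS delta scale, and assigns an A/B/C label by magnitude only. That last step is a deliberate simplification: the full ETS rules also involve significance tests, which are omitted here. The function names and the example counts are hypothetical and are not taken from the paper or its R implementation.

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio over 2x2 tables, one per matched score level.

    Each table is (a, b, c, d):
      a = reference group correct, b = reference group incorrect,
      c = focal group correct,     d = focal group incorrect.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these tables")
    return numerator / denominator

def ets_delta(alpha_mh):
    """MH D-DIF: the odds ratio expressed on the ETS delta scale."""
    return -2.35 * math.log(alpha_mh)

def ets_category_by_magnitude(delta):
    """A/B/C label from |delta| alone (significance tests omitted)."""
    size = abs(delta)
    if size < 1.0:
        return "A"  # negligible
    if size < 1.5:
        return "B"  # moderate
    return "C"      # large

# Hypothetical counts for two score levels.
tables = [(30, 10, 20, 20), (40, 10, 30, 20)]
alpha = mantel_haenszel_odds_ratio(tables)
print(ets_category_by_magnitude(ets_delta(alpha)))
```

In a tree-based procedure, a label of this kind could serve as the stopping signal: a candidate split whose items all fall in category A would not be pursued.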
|
8
|
Wijayanto F, Mul K, Groot P, van Engelen BG, Heskes T. Semi-automated Rasch analysis using in-plus-out-of-questionnaire log likelihood. THE BRITISH JOURNAL OF MATHEMATICAL AND STATISTICAL PSYCHOLOGY 2021; 74:313-339. [PMID: 32857418 PMCID: PMC8246875 DOI: 10.1111/bmsp.12218] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/16/2019] [Revised: 07/09/2020] [Indexed: 06/11/2023]
Abstract
Rasch analysis is a popular statistical tool for developing and validating instruments that aim to measure human performance, attitudes and perceptions. Despite the availability of various software packages, constructing a good instrument based on Rasch analysis is still considered to be a complex, labour-intensive task, requiring human expertise and rather subjective judgements along the way. In this paper we propose a semi-automated method for Rasch analysis based on first principles that reduces the need for human input. To this end, we introduce a novel criterion, called in-plus-out-of-questionnaire log likelihood (IPOQ-LL). On artificial data sets, we confirm that optimization of IPOQ-LL leads to the desired behaviour in the case of multi-dimensional and inhomogeneous surveys. On three publicly available real-world data sets, our method leads to instruments that are, for all practical purposes, indistinguishable from those obtained by Rasch analysis experts through a manual procedure.
Affiliation(s)
- Feri Wijayanto
  - Department of Informatics, Universitas Islam Indonesia, Yogyakarta, Indonesia
  - Institute for Computing and Information Sciences, Radboud University, Nijmegen, The Netherlands
- Karlien Mul
  - Department of Neurology, Donders Institute for Brain, Cognition, and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- Perry Groot
  - Institute for Computing and Information Sciences, Radboud University, Nijmegen, The Netherlands
- Baziel G.M. van Engelen
  - Department of Neurology, Donders Institute for Brain, Cognition, and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- Tom Heskes
  - Institute for Computing and Information Sciences, Radboud University, Nijmegen, The Netherlands
|
9
|
Adams LB, Farrell M, Mall S, Mahlalela N, Berkman L. Dimensionality and differential item endorsement of depressive symptoms among aging Black populations in South Africa: Findings from the HAALSI study. J Affect Disord 2020; 277:850-856. [PMID: 33065826 PMCID: PMC7575820 DOI: 10.1016/j.jad.2020.08.073] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/24/2020] [Revised: 07/10/2020] [Accepted: 08/24/2020] [Indexed: 11/29/2022]
Abstract
BACKGROUND The Center for Epidemiologic Studies-Depression (CES-D) scale is a widely used measure of depressive symptoms, but its construct validity has not been adequately assessed in sub-Saharan Africa. This study validates the CES-D among an aging Shangaan-speaking and predominantly Black African sample in rural South Africa, with a special emphasis on gender differences. METHODS An 8-item CES-D scale was administered in Shangaan to 5059 respondents, aged 40+ years, residing in Agincourt, South Africa. We used Cronbach's alpha and exploratory and confirmatory factor analysis to examine and confirm dimensionality of the CES-D scale. Differential endorsement of CES-D items by gender were assessed using the Mantel-Haenszel (MH) odds ratio test. RESULTS Reliability of the CES-D scale differed by gender with women reporting higher internal consistency on items than men. A two-factor solution was retained and confirmed representing two latent factors: (1) Negative Affect (six items) and (2) Diminished Positive Affect (two items). MH results showed that men exhibited significantly higher odds of putting an effort in everything that they did (OR: 1.33, 95% CI: 1.15-1.54) and lower odds of feeling depressed (OR: 0.71, 95% CI: 0.56-0.89) and having restless sleep (OR: 0.67, 95% CI:0.58-0.77) than women. LIMITATIONS Analyses were limited to a dichotomous, short form of the CES-D, a self-reported population-based measure. CONCLUSION Aging Black Africans differ in endorsing affective and somatic items on the CES-D scale by gender, which may lead to skewed population-level estimates of depression in key subpopulations. These findings highlight the importance of continued research disentangling cross-cultural and gendered nuances of depression measurements.
Affiliation(s)
- Leslie B. Adams
  - Harvard Center for Population and Development Studies, Harvard University, Cambridge, MA, USA
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
- Meagan Farrell
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
- Sumaya Mall
  - Division of Epidemiology and Biostatistics, School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Nomsa Mahlalela
  - MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Lisa Berkman
  - Harvard Center for Population and Development Studies, Harvard University, Cambridge, MA, USA
|
10
|
Cotton J, Baker ST. A data mining and item response mixture modeling method to retrospectively measure Diagnostic and Statistical Manual of Mental Disorders-5 attention deficit hyperactivity disorder in the 1970 British Cohort Study. Int J Methods Psychiatr Res 2019; 28:e1753. [PMID: 30402897 PMCID: PMC6877163 DOI: 10.1002/mpr.1753] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/04/2018] [Revised: 09/21/2018] [Accepted: 10/06/2018] [Indexed: 12/18/2022]
Abstract
OBJECTIVE To facilitate future outcome studies, we aimed to develop a robust and replicable method for estimating a categorical and dimensional measure of Diagnostic and Statistical Manual of Mental Disorders-5 (DSM-5) attention deficit hyperactivity disorder (ADHD) in the 1970 British Cohort Study (BCS70). METHOD Following a data mining framework, we mapped DSM-5 ADHD symptoms to age 10 BCS70 data (N = 11,426) and derived a 16-item scale (α = 0.85). Mapping was validated by an expert panel. A categorical subgroup was derived (n = 594, 5.2%), and a zero-inflated item response theory (IRT) mixture model fitted to estimate a dimensional measure. RESULTS Subgroup composition was comparable with other ADHD samples. Relative risk ratios (ADHD/not ADHD) included boys = 1.38, unemployed fathers = 2.07, below average reading = 2.58, and depressed parent = 3.73. Our estimated measures correlated with two derived reference scales: Strengths and Difficulties Questionnaire hyperactivity (r = 0.74) and a Rutter/Conners-based scale (r = 0.81), supporting construct validity. IRT model items (symptoms) had moderate to high discrimination (0.90-2.81) and provided maximum information at average to moderate theta levels of ADHD (0.5-1.75). CONCLUSION We extended previous work to identify ADHD in BCS70, derived scales from existing data, modeled ADHD items with IRT, and adjusted for a zero-inflated distribution. Psychometric properties were promising, and this work will enable future studies of causal mechanisms in ADHD.
Affiliation(s)
- Joanne Cotton
  - Faculty of Education, University of Cambridge, Cambridge, UK
- Sara T. Baker
  - Faculty of Education, University of Cambridge, Cambridge, UK
|
11
|
von Davier M, Cho Y, Pan T. Effects of Discontinue Rules on Psychometric Properties of Test Scores. PSYCHOMETRIKA 2019; 84:147-163. [PMID: 30607661 DOI: 10.1007/s11336-018-09652-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/16/2017] [Indexed: 06/09/2023]
Abstract
This paper provides results on a form of adaptive testing that is used frequently in intelligence testing. In these tests, items are presented in order of increasing difficulty. The presentation of items is adaptive in the sense that a session is discontinued once a test taker produces a certain number of incorrect responses in sequence, with subsequent (not observed) responses commonly scored as wrong. The Stanford-Binet Intelligence Scales (SB5; Riverside Publishing Company, 2003) and the Kaufman Assessment Battery for Children (KABC-II; Kaufman and Kaufman, 2004), the Kaufman Adolescent and Adult Intelligence Test (Kaufman and Kaufman 2014) and the Universal Nonverbal Intelligence Test (2nd ed.) (Bracken and McCallum 2015) are some of the many examples using this rule. He and Wolfe (Educ Psychol Meas 72(5):808-826, 2012. https://doi.org/10.1177/0013164412441937 ) compared different ability estimation methods in a simulation study for this discontinue rule adaptation of test length. However, there has been no study, to our knowledge, of the underlying distributional properties based on analytic arguments drawing on probability theory, of what these authors call stochastic censoring of responses. The study results obtained by He and Wolfe (Educ Psychol Meas 72(5):808-826, 2012. https://doi.org/10.1177/0013164412441937 ) agree with results presented by DeAyala et al. (J Educ Meas 38:213-234, 2001) as well as Rose et al. (Modeling non-ignorable missing data with item response theory (IRT; ETS RR-10-11), Educational Testing Service, Princeton, 2010) and Rose et al. (Psychometrika 82:795-819, 2017. https://doi.org/10.1007/s11336-016-9544-7 ) in that ability estimates are biased most when scoring the not observed responses as wrong. This scoring is used operationally, so more research is needed in order to improve practice in this field. 
The paper extends existing research on adaptivity by discontinue rules in intelligence tests in multiple ways: First, an analytical study of the distributional properties of discontinue rule scored items is presented. Second, a simulation is presented that includes additional scoring rules and uses ability estimators that may be suitable to reduce bias for discontinue rule scored intelligence tests.
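The discontinue rule described in this abstract can be sketched in a few lines. The code assumes dichotomous responses (1 = correct, 0 = incorrect) already ordered by increasing difficulty, and it applies the operational scoring the abstract mentions, where responses after the stopping point are not observed and are scored as wrong. The function name and the example response pattern are hypothetical.

```python
def apply_discontinue_rule(responses, max_consecutive_wrong):
    """Score a response vector under a discontinue rule.

    The session stops once `max_consecutive_wrong` incorrect responses
    occur in sequence; all later items are scored as wrong.
    Returns (scored_responses, number_of_items_administered).
    """
    if max_consecutive_wrong < 1:
        raise ValueError("max_consecutive_wrong must be at least 1")
    scored = []
    run_of_wrong = 0
    for position, response in enumerate(responses):
        scored.append(response)
        run_of_wrong = run_of_wrong + 1 if response == 0 else 0
        if run_of_wrong == max_consecutive_wrong:
            administered = position + 1
            # Unobserved responses are scored as incorrect.
            scored.extend([0] * (len(responses) - administered))
            return scored, administered
    return scored, len(responses)

# Hypothetical complete response pattern, had every item been administered.
full_pattern = [1, 1, 0, 1, 0, 0, 1, 1]
scored, administered = apply_discontinue_rule(full_pattern, max_consecutive_wrong=2)
print(scored, administered, sum(full_pattern) - sum(scored))
```

The difference between the two raw scores is the number of correct responses lost to the rule for this pattern, which is the kind of censoring effect the abstract says biases ability estimates.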
Affiliation(s)
- Matthias von Davier
  - National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA, 19104-3102, USA
- Youngmi Cho
  - American Institutes for Research, 1000 Thomas Jefferson Street, NW, Washington D.C., 20007, USA
- Tianshu Pan
  - Pearson, 19500 Bulverde Rd, San Antonio, TX, 78259, USA
|
12
|
Guo H, Robin F, Dorans N. Detecting Item Drift in Large-Scale Testing. JOURNAL OF EDUCATIONAL MEASUREMENT 2017. [DOI: 10.1111/jedm.12144] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
|
13
|
Mau WCJ, Mau YH. Factors Influencing High School Students to Persist in Aspirations of Teaching Careers. JOURNAL OF CAREER DEVELOPMENT 2016. [DOI: 10.1177/0894845305282602] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
This study longitudinally tracks 10th grade students for 4 years regarding their persistence in aspirations of teaching careers using a nationally representative sample (National Educational Longitudinal Survey of 1988). Factors contributing to persistence in aspirations of teaching careers are examined based on the social-cognitive career theory (SCCT). Results suggest that there are racial differences in persistence in aspirations to teaching careers. Students who persist perform better on academic achievement, score higher on locus of control, and come from a family that had a higher socioeconomic status and a higher parental education level than students who do not persist. Results also suggest a good fit of the social-cognitive model in prediction of persistence in teaching aspirations.
|
14
|
Ziegler M, Kemper CJ, Lenzner T. The Issue of Fuzzy Concepts in Test Construction and Possible Remedies. EUROPEAN JOURNAL OF PSYCHOLOGICAL ASSESSMENT 2015. [DOI: 10.1027/1015-5759/a000255] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 12/20/2022]
Affiliation(s)
- Christoph J. Kemper
  - Institute for Medical and Pharmaceutical Proficiency Assessment, Mainz, Germany
- Timo Lenzner
  - GESIS – Leibniz-Institute for the Social Sciences, Mannheim, Germany
|
15
|
|
16
|
Zwick R. A Review of ETS Differential Item Functioning Assessment Procedures: Flagging Rules, Minimum Sample Size Requirements, and Criterion Refinement. ETS RESEARCH REPORT SERIES 2014. [DOI: 10.1002/j.2333-8504.2012.tb02290.x] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Indexed: 11/11/2022]
|
17
|
Mau WC, Domnick M, Ellsworth RA. Characteristics of Female Students Who Aspire to Science and Engineering or Homemaking Occupations. CAREER DEVELOPMENT QUARTERLY 2011. [DOI: 10.1002/j.2161-0045.1995.tb00437.x] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 11/10/2022]
|
18
|
Mau WC. Educational Planning and Academic Achievement of Middle School Students: A Racial and Cultural Comparison. JOURNAL OF COUNSELING AND DEVELOPMENT 2011. [DOI: 10.1002/j.1556-6676.1995.tb01788.x] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.8] [Indexed: 11/12/2022]
|
19
|
Job satisfaction and career persistence of beginning teachers. INTERNATIONAL JOURNAL OF EDUCATIONAL MANAGEMENT 2008. [DOI: 10.1108/09513540810844558] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/17/2022]
|
20
|
French AW, Miller TR. Logistic Regression and Its Use in Detecting Differential Item Functioning in Polytomous Items. JOURNAL OF EDUCATIONAL MEASUREMENT 1996. [DOI: 10.1111/j.1745-3984.1996.tb00495.x] [Citation(s) in RCA: 74] [Impact Index Per Article: 2.6] [Indexed: 11/27/2022]
|
21
|
Welch CJ, Miller TR. Assessing Differential Item Functioning in Direct Writing Assessments: Problems and an Example. JOURNAL OF EDUCATIONAL MEASUREMENT 1995. [DOI: 10.1111/j.1745-3984.1995.tb00461.x] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
|
22
|
Skaggs G, Lissitz RW. The Consistency of Detecting Item Bias Across Different Test Administrations: Implications of Another Failure. JOURNAL OF EDUCATIONAL MEASUREMENT 1992. [DOI: 10.1111/j.1745-3984.1992.tb00375.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
|
23
|
Freedle R, Kostin I. Item Difficulty of Four Verbal Item Types and an Index of Differential Item Functioning for Black and White Examinees. JOURNAL OF EDUCATIONAL MEASUREMENT 1990. [DOI: 10.1111/j.1745-3984.1990.tb00752.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
|
24
|
Kelderman H, Macready GB. The Use of Loglinear Models for Assessing Differential Item Functioning Across Manifest and Latent Examinee Groups. JOURNAL OF EDUCATIONAL MEASUREMENT 1990. [DOI: 10.1111/j.1745-3984.1990.tb00751.x] [Citation(s) in RCA: 54] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]
|
25
|
|
26
|
Johnson ST, Wallace MB. Characteristics of SAT Quantitative Items Showing Improvement After Coaching Among Black Students From Low-Income Families: An Exploratory Study. JOURNAL OF EDUCATIONAL MEASUREMENT 1989. [DOI: 10.1111/j.1745-3984.1989.tb00324.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
|
27
|
Tatsuoka KK, Linn RL, Tatsuoka MM, Yamamoto K. Differential Item Functioning Resulting From The Use of Different Solution Strategies. JOURNAL OF EDUCATIONAL MEASUREMENT 1988. [DOI: 10.1111/j.1745-3984.1988.tb00310.x] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
|
28
|
Dorans NJ, Kulick E. Demonstrating the Utility of the Standardization Approach to Assessing Unexpected Differential Item Performance on the Scholastic Aptitude Test. JOURNAL OF EDUCATIONAL MEASUREMENT 1986. [DOI: 10.1111/j.1745-3984.1986.tb00255.x] [Citation(s) in RCA: 228] [Impact Index Per Article: 6.0] [Indexed: 11/28/2022]
|