1. Man K. Multimodal Data Fusion to Detect Preknowledge Test-Taking Behavior Using Machine Learning. EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 2024; 84:753-779. PMID: 39055093; PMCID: PMC11268392; DOI: 10.1177/00131644231193625.
Abstract
In various fields, including college admission, medical board certifications, and military recruitment, high-stakes decisions are frequently made based on scores obtained from large-scale assessments. These decisions necessitate precise and reliable scores that enable valid inferences to be drawn about test-takers. However, the ability of such tests to provide reliable, accurate inference on a test-taker's performance could be jeopardized by aberrant test-taking practices, for instance, practicing real items prior to the test. As a result, it is crucial for administrators of such assessments to develop strategies that detect potential aberrant test-takers after data collection. The aim of this study is to explore the implementation of machine learning methods in combination with multimodal data fusion strategies that integrate bio-information technology, such as eye-tracking, and psychometric measures, including response times and item responses, to detect aberrant test-taking behaviors in technology-assisted remote testing settings.
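The workflow this abstract describes — fusing eye-tracking, response-time, and item-response features and training a machine-learning classifier to flag examinees — can be sketched as feature-level fusion. Everything below (the simulated features, the random-forest choice) is an illustrative assumption, not the paper's actual pipeline or data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_examinees, n_items = 200, 30

# Simulated per-examinee features (illustrative placeholders, not real data):
responses = rng.integers(0, 2, (n_examinees, n_items))   # item scores (0/1)
log_rts = rng.normal(4.0, 0.5, (n_examinees, n_items))   # log response times
fixations = rng.poisson(6, (n_examinees, n_items))       # gaze fixation counts
labels = rng.integers(0, 2, n_examinees)                 # 1 = flagged aberrant

# Feature-level (early) fusion: concatenate the three modalities per examinee,
# then train any standard classifier on the fused feature matrix.
X = np.hstack([responses, log_rts, fixations])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
pred = clf.predict(X)
```

Other fusion strategies (e.g., training one model per modality and combining their scores) follow the same pattern with the combination moved to the decision level.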
Affiliation(s)
- Kaiwen Man, The University of Alabama, Tuscaloosa, USA
2. Gorney K, Sinharay S, Liu X. Using item scores and response times in person-fit assessment. THE BRITISH JOURNAL OF MATHEMATICAL AND STATISTICAL PSYCHOLOGY 2024; 77:151-168. PMID: 37667833; DOI: 10.1111/bmsp.12320.
Abstract
The use of joint models for item scores and response times is becoming increasingly popular in educational and psychological testing. In this paper, we propose two new person-fit statistics for such models in order to detect aberrant behaviour. The first statistic is computed by combining two existing person-fit statistics: one for the item scores, and one for the item response times. The second statistic is computed directly using the likelihood function of the joint model. Using detailed simulations, we show that the empirical null distributions of the new statistics are very close to the theoretical null distributions, and that the new statistics tend to be more powerful than several existing statistics for item scores and/or response times. A real data example is also provided using data from a licensure examination.
Affiliation(s)
- Kylie Gorney, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Xiang Liu, Educational Testing Service, Princeton, New Jersey, USA
3. Man K, Harring JR. Detecting Preknowledge Cheating via Innovative Measures: A Mixture Hierarchical Model for Jointly Modeling Item Responses, Response Times, and Visual Fixation Counts. EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 2023; 83:1059-1080. PMID: 37663535; PMCID: PMC10470163; DOI: 10.1177/00131644221136142.
Abstract
Preknowledge cheating jeopardizes the validity of inferences based on test results. Many methods have been developed to detect preknowledge cheating by jointly analyzing item responses and response times. Gaze fixations, an essential eye-tracker measure, can be utilized to help detect aberrant testing behavior with improved accuracy beyond using product and process data types in isolation. As such, this study proposes a mixture hierarchical model that integrates item responses, response times, and visual fixation counts collected from an eye-tracker (a) to detect aberrant test takers who have different levels of preknowledge and (b) to account for nuances in behavioral patterns between normally behaved and aberrant examinees. A Bayesian approach to estimating model parameters is carried out via an MCMC algorithm. Finally, the proposed model is applied to experimental data to illustrate how the model can be used to identify test takers having preknowledge of the test items.
4. Fox JP, Klotzke K, Simsek AS. R-package LNIRT for joint modeling of response accuracy and times. PeerJ Comput Sci 2023; 9:e1232. PMID: 37346642; PMCID: PMC10280685; DOI: 10.7717/peerj-cs.1232.
Abstract
In computer-based testing it has become standard to collect response accuracy (RA) and response times (RTs) for each test item. IRT models are used to measure a latent variable (e.g., ability, intelligence) using the RA observations. The information in the RTs can help to improve routine operations in (educational) testing and provides information about working speed. In modern applications, joint models are needed to integrate RT information into a test analysis. The R-package LNIRT supports fitting joint models through a user-friendly setup that requires specifying only the RA data, the RT data, and the total number of Gibbs sampling iterations. More detailed specifications of the analysis are optional. The main results can be reported through the summary functions, but output can also be analysed with Markov chain Monte Carlo (MCMC) output tools (i.e., coda, mcmcse). The main functionality of the LNIRT package is illustrated with two real data applications.
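LNIRT builds on van der Linden's hierarchical framework for responses and response times; as a sketch (generic notation, which may differ from the package documentation):

```latex
% Level 1a: response accuracy, two-parameter IRT (normal-ogive form)
P(Y_{ij} = 1 \mid \theta_j) = \Phi\!\left(a_i(\theta_j - b_i)\right)

% Level 1b: response times, lognormal
\ln T_{ij} = \lambda_i - \tau_j + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N\!\left(0, \sigma_i^2\right)

% Level 2: person and item parameters linked via multivariate normal priors,
% which is where the RA and RT components share information
(\theta_j, \tau_j) \sim N_2(\boldsymbol{\mu}_P, \boldsymbol{\Sigma}_P),
\qquad (a_i, b_i, \lambda_i) \sim N_3(\boldsymbol{\mu}_I, \boldsymbol{\Sigma}_I)
```

Here $\theta_j$ and $\tau_j$ are person ability and speed, $\lambda_i$ is item time-intensity, and the second-level covariance structure is what makes the model "joint" rather than two separate analyses.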
Affiliation(s)
- Jean-Paul Fox, Faculty of Behavioral, Management, and Social Sciences, University of Twente, Enschede, Netherlands
- Konrad Klotzke, Faculty of Behavioral, Management, and Social Sciences, University of Twente, Enschede, Netherlands
- Ahmet Salih Simsek, Department of Measurement and Evaluation in Education, University of Kirsehir Ahi Evran, Kirsehir, Turkey
5. Man K, Harring JR, Zhan P. Bridging Models of Biometric and Psychometric Assessment: A Three-Way Joint Modeling Approach of Item Responses, Response Times, and Gaze Fixation Counts. APPLIED PSYCHOLOGICAL MEASUREMENT 2022; 46:361-381. PMID: 35812811; PMCID: PMC9265489; DOI: 10.1177/01466216221089344.
Abstract
Recently, joint models of item response data and response times have been proposed to better assess and understand test takers' learning processes. This article demonstrates how biometric information such as gaze fixation counts obtained from an eye-tracking machine can be integrated into the measurement model. The proposed joint modeling framework accommodates the relations among a test taker's latent ability, working speed and test engagement level via a person-side variance-covariance structure, while simultaneously permitting the modeling of item difficulty, time-intensity, and the engagement intensity through an item-side variance-covariance structure. A Bayesian estimation scheme is used to fit the proposed model to data. Posterior predictive model checking based on three discrepancy measures corresponding to various model components are introduced to assess model-data fit. Findings from a Monte Carlo simulation and results from analyzing experimental data demonstrate the utility of the model.
Affiliation(s)
- Kaiwen Man, Educational Research Program, Educational Studies in Psychology, Research Methodology, and Counseling, 313 Carmichael Box 870231, University of Alabama, Tuscaloosa, AL 35487, USA
- Peida Zhan, Zhejiang Normal University, Jinhua, China
6. Guo X, Jiao Y, Huang Z, Liu T. Joint Modeling of Response Accuracy and Time in Between-Item Multidimensional Tests Based on Bi-Factor Model. Front Psychol 2022; 13:763959. PMID: 35478766; PMCID: PMC9035624; DOI: 10.3389/fpsyg.2022.763959.
Abstract
With the popularity of computer-based testing (CBT), it has become easier to collect item response times (RTs) in psychological and educational assessments. RTs provide an important source of information about respondents and tests. To make full use of RTs, researchers have invested substantial effort in developing statistical models of RTs. Most of the proposed models posit a unidimensional latent speed to account for RTs. Many psychological and educational tests, however, are multidimensional, either deliberately or inadvertently, and general effects may be present in between-item multidimensional tests. Currently no RT model considers these general effects when analyzing between-item multidimensional test RT data, and no joint hierarchical model integrates RT and response accuracy (RA) for evaluating the general effects of between-item multidimensional tests. Therefore, a bi-factor joint hierarchical model for between-item multidimensional tests is proposed in this study. A simulation indicated that the Hamiltonian Monte Carlo (HMC) algorithm works well in parameter recovery, and information criteria showed that the bi-factor hierarchical model (BFHM) provides the best fit. This means that it is necessary to take into consideration the general effects (general latent trait) and the multidimensionality of the RTs in between-item multidimensional tests.
Affiliation(s)
- Xiaojun Guo, School of Education Science, Gannan Normal University, Ganzhou, China
- Yuyue Jiao, School of Education Science, Gannan Normal University, Ganzhou, China
- ZhengZheng Huang, School of Humanities, Hubei University of Chinese Medicine, Wuhan, China (corresponding author)
- TieChuan Liu, School of Education Science, Gannan Normal University, Ganzhou, China
|
7. Becker B, Debeer D, Weirich S, Goldhammer F. On the Speed Sensitivity Parameter in the Lognormal Model for Response Times and Implications for High-Stakes Measurement Practice. APPLIED PSYCHOLOGICAL MEASUREMENT 2021; 45:407-422. PMID: 34565944; PMCID: PMC8381695; DOI: 10.1177/01466216211008530.
Abstract
In high-stakes testing, multiple test forms are often used and a common time limit is enforced. Test fairness requires that ability estimates not depend on the administration of a specific test form. This requirement may be violated if speededness differs between test forms. The impact of not taking speed sensitivity into account on the comparability of test forms regarding speededness and ability estimation was investigated. The lognormal measurement model for response times by van der Linden was compared with its extension by Klein Entink, van der Linden, and Fox, which includes a speed sensitivity parameter. An empirical data example showed that the extended model can fit the data better than the model without speed sensitivity parameters. A simulation showed that test forms with different average speed sensitivity yielded substantially different ability estimates for slow test takers, especially those with high ability. Therefore, the use of the extended lognormal model for response times is recommended for the calibration of item pools in high-stakes testing situations. Limitations of the proposed approach and further research questions are discussed.
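The two models compared here differ only in whether an item-specific speed sensitivity parameter is estimated; schematically (generic notation, not necessarily the article's):

```latex
% van der Linden's lognormal RT model: speed sensitivity fixed at 1 for all items
\ln T_{ij} = \lambda_i - \tau_j + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma_i^2)

% Klein Entink, van der Linden, and Fox extension: item-specific sensitivity \phi_i
\ln T_{ij} = \lambda_i - \phi_i \tau_j + \varepsilon_{ij}
```

Because $\phi_i$ scales how strongly an examinee's speed $\tau_j$ translates into a shorter or longer response time, two forms with comparable time-intensities $\lambda_i$ but different average $\phi_i$ can impose unequal time pressure on slow examinees, which is the comparability problem the article investigates.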
Affiliation(s)
- Frank Goldhammer, DIPF – Leibniz Institute for Research and Information in Education, Frankfurt am Main, Germany; Centre for International Student Assessment (ZIB), Germany
8. Sinharay S, Johnson MS. The use of item scores and response times to detect examinees who may have benefited from item preknowledge. THE BRITISH JOURNAL OF MATHEMATICAL AND STATISTICAL PSYCHOLOGY 2020; 73:397-419. PMID: 31418458; DOI: 10.1111/bmsp.12187.
Abstract
According to Wollack and Schoenig (2018, The Sage encyclopedia of educational research, measurement, and evaluation. Thousand Oaks, CA: Sage, 260), benefiting from item preknowledge is one of the three broad types of test fraud that occur in educational assessments. We use tools from constrained statistical inference to suggest a new statistic that is based on item scores and response times and can be used to detect examinees who may have benefited from item preknowledge for the case when the set of compromised items is known. The asymptotic distribution of the new statistic under no preknowledge is proved to be a simple mixture of two χ2 distributions. We perform a detailed simulation study to show that the Type I error rate of the new statistic is very close to the nominal level and that the power of the new statistic is satisfactory in comparison to that of the existing statistics for detecting item preknowledge based on both item scores and response times. We also include a real data example to demonstrate the usefulness of the suggested statistic.
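The null distribution above is a simple two-component χ² mixture, so a p-value is just a weighted sum of the component survival functions. The weights and degrees of freedom below are illustrative placeholders (the abstract does not state the actual components, which depend on the constrained-inference setup):

```python
from scipy.stats import chi2


def mixture_chi2_sf(x, weights=(0.5, 0.5), dfs=(1, 2)):
    """Survival function (p-value) of a two-component chi-square mixture.

    `weights` and `dfs` are hypothetical placeholders, not the actual
    mixture from Sinharay & Johnson (2020); the computation pattern is
    the same for any simple chi-square mixture.
    """
    return sum(w * chi2.sf(x, df) for w, df in zip(weights, dfs))
```

An observed statistic would be flagged when `mixture_chi2_sf(stat)` falls below the chosen significance level.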
9. Ranger J, Kuhn J, Wolgast A. Robust Estimation of Ability and Mental Speed Employing the Hierarchical Model for Responses and Response Times. JOURNAL OF EDUCATIONAL MEASUREMENT 2020. DOI: 10.1111/jedm.12284.
10. Sinharay S. Detection of Item Preknowledge Using Response Times. APPLIED PSYCHOLOGICAL MEASUREMENT 2020; 44:376-392. PMID: 32879537; PMCID: PMC7433384; DOI: 10.1177/0146621620909893.
Abstract
Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. This article suggests a new statistic that can be used for detecting the examinees who may have benefited from item preknowledge using their response times. The statistic quantifies the difference in speed between the compromised items and the non-compromised items of the examinees. The distribution of the statistic under the null hypothesis of no preknowledge is proved to be the standard normal distribution. A simulation study is used to evaluate the Type I error rate and power of the suggested statistic. A real data example demonstrates the usefulness of the new statistic that is found to provide information that is not provided by statistics based only on item scores.
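The statistic described — a standardized difference in speed between compromised and non-compromised items — can be illustrated with a simplified version. This is a sketch under assumed known item parameters and a common residual SD, not the article's exact statistic:

```python
import numpy as np


def speed_difference_z(log_rts, time_intensity, compromised_mask, sigma=1.0):
    """Illustrative z-type contrast of speed on compromised vs. other items.

    A simplified sketch, not Sinharay's (2020) exact statistic: residuals
    are item time-intensity minus observed log response time (larger means
    faster responding), and the mean residual on compromised items is
    contrasted with the mean on non-compromised items, standardized using
    an assumed common residual SD `sigma`.
    """
    mask = np.asarray(compromised_mask, dtype=bool)
    resid = np.asarray(time_intensity, dtype=float) - np.asarray(log_rts, dtype=float)
    comp, clean = resid[mask], resid[~mask]
    diff = comp.mean() - clean.mean()
    se = sigma * np.sqrt(1.0 / comp.size + 1.0 / clean.size)
    return diff / se
```

A large positive value indicates unusually fast responding on the compromised items relative to the rest of the test, the pattern expected under preknowledge.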
11. The multidimensional log-normal response time model: An exploration of the multidimensionality of latent processing speed. ACTA PSYCHOLOGICA SINICA 2020. DOI: 10.3724/sp.j.1041.2020.01132.
12. Man K, Harring JR, Sinharay S. Use of Data Mining Methods to Detect Test Fraud. JOURNAL OF EDUCATIONAL MEASUREMENT 2019. DOI: 10.1111/jedm.12208.
13. Myszkowski N. The first glance is the weakest: “Tasteful” individuals are slower to judge visual art. PERSONALITY AND INDIVIDUAL DIFFERENCES 2019. DOI: 10.1016/j.paid.2019.01.010.
14. Sinharay S. A New Person-Fit Statistic for the Lognormal Model for Response Times. JOURNAL OF EDUCATIONAL MEASUREMENT 2018. DOI: 10.1111/jedm.12188.
15. Zhan P, Jiao H, Liao D. Cognitive diagnosis modelling incorporating item response times. THE BRITISH JOURNAL OF MATHEMATICAL AND STATISTICAL PSYCHOLOGY 2018; 71:262-286. PMID: 28872185; DOI: 10.1111/bmsp.12114.
Abstract
To provide more refined diagnostic feedback with collateral information in item response times (RTs), this study proposed joint modelling of attributes and response speed using item responses and RTs simultaneously for cognitive diagnosis. For illustration, an extended deterministic input, noisy 'and' gate (DINA) model was proposed for joint modelling of responses and RTs. Model parameter estimation was explored using the Bayesian Markov chain Monte Carlo (MCMC) method. The PISA 2012 computer-based mathematics data were analysed first. These real data estimates were treated as true values in a subsequent simulation study. A follow-up simulation study with ideal testing conditions was conducted as well to further evaluate model parameter recovery. The results indicated that model parameters could be well recovered using the MCMC approach. Further, incorporating RTs into the DINA model would improve attribute and profile correct classification rates and result in more accurate and precise estimation of the model parameters.
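The extended model pairs the standard DINA measurement model with a lognormal RT component; in generic notation (a sketch, not necessarily the paper's exact parameterization):

```latex
% DINA response model: ideal-response indicator from attribute profile \boldsymbol{\alpha}_j
\eta_{ij} = \prod_{k=1}^{K} \alpha_{jk}^{\,q_{ik}},
\qquad
P(Y_{ij} = 1 \mid \boldsymbol{\alpha}_j) = (1 - s_i)^{\eta_{ij}} \, g_i^{\,1 - \eta_{ij}}

% Lognormal RT component, linked to the DINA part through a shared
% hierarchical structure on the person parameters
\ln T_{ij} = \lambda_i - \tau_j + \varepsilon_{ij}
```

Here $s_i$ and $g_i$ are the slip and guessing parameters, $q_{ik}$ the Q-matrix entries, and $\tau_j$ the latent speed; it is the hierarchical link between $\tau_j$ and the attribute profile that lets the RTs sharpen attribute classification.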
Affiliation(s)
- Peida Zhan, Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, China
- Hong Jiao, Measurement, Statistics and Evaluation, Department of Human Development and Quantitative Methodology, University of Maryland, College Park, Maryland, USA
- Dandan Liao, Measurement, Statistics and Evaluation, Department of Human Development and Quantitative Methodology, University of Maryland, College Park, Maryland, USA
16. Zhan P, Liao M, Bian Y. Joint Testlet Cognitive Diagnosis Modeling for Paired Local Item Dependence in Response Times and Response Accuracy. Front Psychol 2018; 9:607. PMID: 29922192; PMCID: PMC5996944; DOI: 10.3389/fpsyg.2018.00607.
Abstract
In joint models for item response times (RTs) and response accuracy (RA), local item dependence is composed of local RA dependence and local RT dependence. The two components are usually caused by the same common stimulus and emerge as pairs. Thus, the violation of local item independence in the joint models is called paired local item dependence. To address the issue of paired local item dependence while applying the joint cognitive diagnosis models (CDMs), this study proposed a joint testlet cognitive diagnosis modeling approach. The proposed approach is an extension of Zhan et al. (2017) and it incorporates two types of random testlet effect parameters (one for RA and the other for RTs) to account for paired local item dependence. The model parameters were estimated using the full Bayesian Markov chain Monte Carlo (MCMC) method. The 2015 PISA computer-based mathematics data were analyzed to demonstrate the application of the proposed model. Further, a brief simulation study was conducted to demonstrate the acceptable parameter recovery and the consequence of ignoring paired local item dependence.
Affiliation(s)
- Peida Zhan, Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, Beijing, China
- Manqian Liao, Measurement, Statistics and Evaluation, Department of Human Development and Quantitative Methodology, University of Maryland, College Park, MD, United States
- Yufang Bian, Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University, Beijing, China
17. Erratum. JOURNAL OF EDUCATIONAL MEASUREMENT 2017. DOI: 10.1111/jedm.12149.