1
Jost L, Jansen P. The influence of the design of mental rotation trials on performance and possible differences between sexes: A theoretical review and experimental investigation. Q J Exp Psychol (Hove) 2023:17470218231200127. [PMID: 37644655] [DOI: 10.1177/17470218231200127]
Abstract
The sex difference in mental rotation performance is one of the largest in cognitive psychology. Men outperform women by up to 1 SD in psychometric mental rotation tests, but it is often neglected that there are no or only small sex differences in chronometric tests. As both kinds of test are supposed to measure the same ability, we suspect that some features of the tests themselves affect sex differences in performance. Following a theoretical review of the test features, we evaluate the effects of the number of possible answer alternatives, whether they are presented as pairwise mirrored, and their interaction on sex differences in mental rotation performance. In an online experiment, 838 German-speaking participants (421 women, 417 men; Mage = 42.58 years, SD = 12.54) solved four blocks of mental rotation trials with two or eight alternatives, which were either pairwise mirrored or not. The results show that overall performance was lower for more alternatives and for mixed alternatives, with no interaction between the two factors. We could not determine explanations for sex differences because we did not observe meaningful sex differences at all; possible reasons include differences between men and women in age and education. This study suggests that differences between test designs affect performance. Sex differences, however, need more investigation, including possible effects and interactions of test design, education, and age.
Affiliation(s)
- Leonardo Jost
- Faculty of Human Sciences, University of Regensburg, Regensburg, Germany
- Petra Jansen
- Faculty of Human Sciences, University of Regensburg, Regensburg, Germany
2
Lions S, Dartnell P, Toledo G, Godoy MI, Córdova N, Jiménez D, Lemarié J. Position of Correct Option and Distractors Impacts Responses to Multiple-Choice Items: Evidence From a National Test. Educ Psychol Meas 2023; 83:861-884. [PMID: 37663536] [PMCID: PMC10470158] [DOI: 10.1177/00131644221132335]
Abstract
Even though the impact of the position of response options on answers to multiple-choice items has been investigated for decades, it remains debated. Research on this topic is inconclusive, perhaps because too few studies have obtained experimental data from large samples in a real-world context while manipulating the position of both the correct response and the distractors. Since the outcomes of multiple-choice tests can be strikingly consequential and option-position effects constitute a potential source of measurement error, these effects should be clarified. In this study, two experiments in which the positions of the correct response and the distractors were carefully manipulated were embedded in a Chilean national high-stakes standardized test taken by 195,715 examinees. Results show small but clear and systematic effects of option position on examinees' responses in both experiments. They consistently indicate that a five-option item is slightly easier when the correct response is in position A rather than position E and when the most attractive distractor comes after, and far away from, the correct response. They clarify and extend previous findings, showing that the appeal of all options is influenced by position. The existence and nature of a potential interference phenomenon between the processing of the options are discussed, and implications for test development are considered.
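One common way to probe such position effects in item-response data is a logistic regression of response correctness on the position of the keyed option. The sketch below only illustrates that idea; it is not the authors' analysis, and the simulated data, effect size, and column names are assumptions.

```python
# Hypothetical illustration: does the position (A-E) of the correct option
# predict the probability of a correct response? Data are simulated; the
# small advantage for position A mirrors the direction reported in the paper.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 10_000
position = rng.choice(list("ABCDE"), size=n)
p_correct = 0.60 + 0.03 * (position == "A")        # assumed effect size
correct = (rng.random(n) < p_correct).astype(int)
df = pd.DataFrame({"position": position, "correct": correct})

# Logistic regression of correctness on keyed-option position (E as reference).
model = smf.logit("correct ~ C(position, Treatment('E'))", data=df).fit()
print(model.summary())
```

In a real administration the model would also need item and examinee effects; this is purely a shape illustration.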
3
Kanzow AF, Schmidt D, Kanzow P. Scoring Single-Response Multiple-Choice Items: Scoping Review and Comparison of Different Scoring Methods. JMIR Med Educ 2023; 9:e44084. [PMID: 37001510] [DOI: 10.2196/44084]
Abstract
BACKGROUND Single-choice items (eg, best-answer items, alternate-choice items, single true-false items) are one type of multiple-choice item and have been used in examinations for over 100 years. At the end of every examination, the examinees' responses have to be analyzed and scored to derive information about examinees' true knowledge. OBJECTIVE The aim of this paper is to compile scoring methods for individual single-choice items described in the literature. Furthermore, the metric expected chance score and the relation between examinees' true knowledge and expected scoring results (averaged percentage score) are analyzed. In addition, implications for potential pass marks to be used in examinations to test examinees for a predefined level of true knowledge are derived. METHODS Scoring methods for individual single-choice items were extracted from various databases (ERIC, PsycInfo, Embase via Ovid, MEDLINE via PubMed) in September 2020. Eligible sources reported on scoring methods for individual single-choice items in written examinations, including but not limited to medical education. For each identified scoring method, the metric expected chance score and the expected scoring results as a function of examinees' true knowledge were calculated separately for items with n=2 answer options (eg, alternate-choice items, single true-false items) and for best-answer items with n=5 answer options (eg, Type A items), using fictitious examinations with 100 single-choice items. RESULTS A total of 21 different scoring methods were identified from the 258 included sources, with varying consideration of correctly marked, omitted, and incorrectly marked items. Resulting credit varied between -3 and +1 credit points per item. For items with n=2 answer options, expected chance scores from random guessing ranged between -1 and +0.75 credit points. For items with n=5 answer options, expected chance scores ranged between -2.2 and +0.84 credit points. All scoring methods showed a linear relation between examinees' true knowledge and the expected scoring results. Depending on the scoring method used, examination results differed considerably: Expected scoring results from examinees with 50% true knowledge ranged between 0.0% (95% CI 0% to 0%) and 87.5% (95% CI 81.0% to 94.0%) for items with n=2 and between -60.0% (95% CI -60% to -60%) and 92.0% (95% CI 86.7% to 97.3%) for items with n=5. CONCLUSIONS In examinations with single-choice items, the scoring result is not always equivalent to examinees' true knowledge. When interpreting examination scores and setting pass marks, the number of answer options per item must usually be taken into account in addition to the scoring method used.
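The linear relation between true knowledge and expected score described above follows directly from assuming that unknown items are answered by random guessing. The short sketch below illustrates that relation for two common scoring rules; the specific rule parameters are chosen for the example and are not taken from the review.

```python
# Illustrative sketch, not the review's code: expected credit per item when a
# fraction p_known of items is truly known and the rest are guessed at random.
def expected_item_score(p_known: float, n_options: int,
                        credit_correct: float, credit_incorrect: float) -> float:
    """Expected credit per item under random guessing of unknown items."""
    p_correct = p_known + (1 - p_known) / n_options
    return p_correct * credit_correct + (1 - p_correct) * credit_incorrect

# Example rules (assumed): number-right scoring (+1 / 0) and a negative-marking
# rule (+1 / -0.25) for best-answer items with five options.
for name, c_ok, c_bad in [("number right", 1.0, 0.0), ("negative marking", 1.0, -0.25)]:
    chance = expected_item_score(0.0, 5, c_ok, c_bad)    # expected chance score
    at_half = expected_item_score(0.5, 5, c_ok, c_bad)   # 50% true knowledge
    print(f"{name}: chance score {chance:+.2f} per item, "
          f"expected result at 50% knowledge = {100 * at_half:.0f}%")
```

Because the probability of a correct response is linear in true knowledge, any such rule yields the linear knowledge-score relation reported in the review, but the intercept (the chance score), and therefore a sensible pass mark, shifts with the rule and with the number of options.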
Affiliation(s)
- Dennis Schmidt
- Department of Preventive Dentistry, Periodontology and Cariology, University Medical Center Göttingen, Göttingen, Germany
- Philipp Kanzow
- Department of Preventive Dentistry, Periodontology and Cariology, University Medical Center Göttingen, Göttingen, Germany
4
Schurter T, Escher M, Gachoud D, Bednarski P, Hug B, Kropf R, Meng-Hentschel J, König B, Beyeler C, Guttormsen S, Huwendiek S. Essential steps in the development, implementation, evaluation and quality assurance of the written part of the Swiss federal licensing examination for human medicine. GMS J Med Educ 2022; 39:Doc43. [PMID: 36310888] [PMCID: PMC9585413] [DOI: 10.3205/zma001564]
Abstract
PURPOSE This report describes the essential steps in the development, implementation, evaluation and quality assurance of the written part of the Swiss Federal Licensing Examination for Human Medicine (FLE) and the insights gained since its introduction in 2011. METHODS Based on existing scientific evidence, international expertise, and experience gained from previous examinations, the FLE is developed by experts from all five medical faculties in Switzerland with the support of the Institute for Medical Education and is held simultaneously at five locations. The exam organisers document and review every examination held and continuously optimise the processes; they have summarised the results in this report. RESULTS The essential steps comprise the development, revision and translation of questions; construction of the exam and production of materials; candidate preparation; and implementation and analysis. The quality assurance measures consist of adherence to guidelines in question development and exam implementation, revision processes, construction of the exam based on the national blueprint, multiphase review of the translations and exam material, and statistical analysis of the exam and of the comments from candidates. The intensive collaboration, especially between representatives of all the participating faculties and a central coordination unit that provides methodological support throughout and oversees the analysis of the exam, has proven successful. Successful administration and reliable results across the eleven examinations held so far represent the outcomes of these quality assurance measures. Significant insights from recent years are the importance of appreciating the work of those involved and of organising exam development centrally, both of which ensure the long-term success of the process. CONCLUSION Common guidelines and workshops, quality assurance measures accompanied by the continuous improvement of all processes, and appreciation of everyone involved are essential to carrying out such an examination at a high quality level in the long term.
Affiliation(s)
- Tina Schurter
- University of Bern, Institute for Medical Education, Department for Assessment and Evaluation, Bern, Switzerland
- Monica Escher
- University of Geneva, Medical Faculty, Geneva, Switzerland
- David Gachoud
- University of Lausanne, Medical Faculty, Lausanne, Switzerland
- Piotr Bednarski
- University of Fribourg, Medical Faculty, Fribourg, Switzerland
- University of Bern, Medical Faculty, Bern, Switzerland
- Balthasar Hug
- University of Basel, Medical Faculty, Basel, Switzerland
- University of Lucerne, Medical Faculty, Lucerne, Switzerland
- Roger Kropf
- University of Basel, Medical Faculty, Basel, Switzerland
- University of Zurich, Medical Faculty, Zurich, Switzerland
- Juliane Meng-Hentschel
- University of Bern, Institute for Medical Education, Department for Assessment and Evaluation, Bern, Switzerland
- Benjamin König
- University of Bern, Institute for Medical Education, Department for Assessment and Evaluation, Bern, Switzerland
- Christine Beyeler
- University of Bern, Institute for Medical Education, Department for Assessment and Evaluation, Bern, Switzerland
- Sissel Guttormsen
- University of Bern, Institute for Medical Education, Department for Assessment and Evaluation, Bern, Switzerland
- Sören Huwendiek
- University of Bern, Institute for Medical Education, Department for Assessment and Evaluation, Bern, Switzerland
5
Anschuetz W, Wagner F, Jucker-Kupper P, Huwendiek S. Workshops for developing written exam questions go online: appropriate format according to the participants. GMS J Med Educ 2021; 38:Doc17. [PMID: 33659622] [PMCID: PMC7899100] [DOI: 10.3205/zma001413]
Abstract
Background: The COVID-19 pandemic has made it difficult to hold face-to-face events, which is why two workshops planned for the development of multiple-choice (MC) questions were conducted online. To our knowledge, whether the online format is suitable for MC question development has not yet been described. Questions: The study aimed to answer the following questions from the perspective of the participants: How are the two online workshops evaluated in terms of their implementation? Are these online workshops suitable for developing MC questions? Is the online or the face-to-face format preferred? As a measure of efficiency, it was examined whether the expected question output (the standard of comparable face-to-face workshops) was achieved in the online workshops. Methods: In May and June 2020, two online workshops with a total of 24 participants were conducted for Swiss professional societies using SWITCHinteract. The participants' feedback was collected via an anonymous online survey with 21 questions. Results: 88% of the participants took part in the voluntary online survey. The participants were satisfied with the implementation and found the online format suitable. The majority of participants did not show a preference for a particular format (online vs. face-to-face), although when a preference was stated, the online format was indicated more often. The expected question output was exceeded in both workshops. Technical aspects were most frequently cited as requiring improvement. Conclusion: Based on these results, online workshops for MC question development can be considered a resource-saving and efficient alternative to face-to-face workshops. Increased use and optimisation of online tools could further facilitate implementation and influence format preferences.
Affiliation(s)
- Wilma Anschuetz
- University of Bern, Institute for Medical Education, Bern, Switzerland
- Felicitas Wagner
- University of Bern, Institute for Medical Education, Bern, Switzerland
- Sören Huwendiek
- University of Bern, Institute for Medical Education, Bern, Switzerland
6
Abstract
- We focus on the relationship between the COVID-19 threat and variety-seeking.
- Increased perceived threat of COVID-19 increases the number of different options selected in multiple choices.
- Increased perceived threat of COVID-19 increases the number of risky activities selected.
- The type of decision moderates the impact of the perceived threat on variety-seeking.
The COVID-19 pandemic has significantly influenced our daily and social lives as well as our consumption patterns. This paper focuses on the relationship between the COVID-19 threat and variety-seeking. Based on several theories, including reactance theory and terror management theory, we predict that the perceived threat of COVID-19 will increase the tendency to choose more and different options in multiple-choice settings. First, two empirical studies demonstrate that variety-seeking in food and stationery choices is enhanced as people's perceived threat from the disease increases. Study 3 further suggests a boundary condition of this pattern: the type of decision (i.e., multiple option selections across different brands vs. within the same brand) moderates the impact of the perceived threat on variety-seeking. Specifically, when the decision involved choices across different brands, participants showed higher variety-seeking under high (vs. low) perceived threat. However, the opposite pattern held when the decision involved choices within the same brand. This research offers a deeper understanding of how variety-seeking can be changed by the perceived threat of COVID-19.
7
Chau BKH, Law CK, Lopez-Persem A, Klein-Flügge MC, Rushworth MFS. Consistent patterns of distractor effects during decision making. eLife 2020; 9:e53850. [PMID: 32628109] [PMCID: PMC7371422] [DOI: 10.7554/eLife.53850]
Abstract
The value of a third potential option, or distractor, can alter the way in which decisions are made between two other options. Two hypotheses have received empirical support: that a high-value distractor improves the accuracy with which decisions between two other options are made, and that it impairs accuracy. Recently, however, it has been argued that neither observation is replicable. Inspired by neuroimaging data showing that high-value distractors have different impacts on prefrontal and parietal regions, we designed a dual-route decision-making model that mimics the neural signals of these regions. Here we show, in the dual-route model and in empirical data, that both enhancement and impairment effects are robust phenomena but predominate in different parts of the decision space defined by the options' and the distractor's values. Beyond these constraints, however, both effects co-exist under similar conditions. Both effects are robust and observable in six experiments.
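As a toy illustration of how a distractor's value can degrade the discriminability of two options, the sketch below uses divisive normalization, one mechanism commonly discussed in this literature. It is not the authors' dual-route model, the parameter values are arbitrary, and it illustrates only the impairment side of the pattern described in the abstract.

```python
# Toy sketch (assumed mechanism: divisive normalization; not the paper's model).
# Each option's subjective value is divided by the summed value of everything on
# screen, so a high-value distractor compresses the difference between the two
# choosable options and makes them harder to discriminate.
def normalized_values(v1: float, v2: float, v_distractor: float,
                      sigma: float = 1.0) -> tuple[float, float]:
    denom = sigma + v1 + v2 + v_distractor
    return v1 / denom, v2 / denom

for v_d in (0.0, 2.0, 8.0):
    n1, n2 = normalized_values(v1=5.0, v2=4.0, v_distractor=v_d)
    print(f"distractor value {v_d}: normalized value difference = {n1 - n2:.3f}")
```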
Affiliation(s)
- Bolton KH Chau
- Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
- University Research Facility in Behavioral and Systems Neuroscience, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
- Chun-Kit Law
- Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
- Alizée Lopez-Persem
- Wellcome Centre for Integrative Neuroimaging (WIN), Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- FrontLab, Paris Brain Institute (ICM), Inserm U 1127, CNRS UMR 7225, Sorbonne Université, Paris, France
- Miriam C Klein-Flügge
- Wellcome Centre for Integrative Neuroimaging (WIN), Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Matthew FS Rushworth
- Wellcome Centre for Integrative Neuroimaging (WIN), Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
8
Loudon C, Macias-Muñoz A. Item statistics derived from three-option versions of multiple-choice questions are usually as robust as four- or five-option versions: implications for exam design. Adv Physiol Educ 2018; 42:565-575. [PMID: 30192185] [DOI: 10.1152/advan.00186.2016]
Abstract
Different versions of multiple-choice exams were administered to an undergraduate class in human physiology as part of normal testing in the classroom. The goal was to evaluate whether the number of options (possible answers) per question influenced the effectiveness of this assessment. Three exams (each with three versions) were given to each of two sections during an academic quarter. All versions were equally long, with 30 questions: 10 questions with 3 options, 10 questions with 4, and 10 questions with 5 (always one correct answer plus distractors). Each question appeared in all three versions of an exam, with a different number of options in each version (three, four, or five). Discrimination (point biserial and upper-lower discrimination indexes) and difficulty were evaluated for each question. There was a small increase in difficulty (a lower average score on a question) when more options were provided. The upper-lower discrimination index indicated a small improvement in assessment of student learning with more options, although the point biserial did not. The total length of a question (number of words) was associated with a small increase in discrimination and difficulty, independent of the number of options. Quantitative questions were more likely to show an increase in discrimination with more options than nonquantitative questions, but this effect was very small. Therefore, for these testing conditions, there appears to be little advantage in providing more than three options per multiple-choice question, and there are disadvantages, such as needing more time for an exam.
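The abstract relies on two item statistics, the point-biserial correlation and the upper-lower discrimination index, plus item difficulty. The sketch below shows one conventional way to compute them from a 0/1 response matrix; the data are simulated, and the corrected item-total approach and the 27% grouping fraction are common conventions rather than details taken from the paper.

```python
# Illustration with simulated data (not the authors' analysis code).
import numpy as np

def point_biserial(item: np.ndarray, total: np.ndarray) -> float:
    """Correlation between one item's 0/1 scores and the rest-of-test score."""
    rest = total - item                        # corrected item-total correlation
    return float(np.corrcoef(item, rest)[0, 1])

def upper_lower_discrimination(item: np.ndarray, total: np.ndarray,
                               fraction: float = 0.27) -> float:
    """Proportion correct in the top scoring group minus the bottom group."""
    k = max(1, int(round(len(total) * fraction)))
    order = np.argsort(total)
    return float(item[order[-k:]].mean() - item[order[:k]].mean())

rng = np.random.default_rng(1)
responses = rng.integers(0, 2, size=(200, 30))   # 200 examinees x 30 items
totals = responses.sum(axis=1)
item0 = responses[:, 0]
print("difficulty (proportion correct):", item0.mean())
print("point biserial:", point_biserial(item0, totals))
print("upper-lower discrimination:", upper_lower_discrimination(item0, totals))
```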
Affiliation(s)
- Catherine Loudon
- Department of Ecology and Evolutionary Biology, University of California-Irvine, Irvine, California
- Aide Macias-Muñoz
- Department of Ecology and Evolutionary Biology, University of California-Irvine, Irvine, California
9
Stohl HE, Miller DA. Training residents to be factually accurate and articulate: A case study using foetal heart rate monitoring nomenclature. J Obstet Gynaecol 2016; 36:954-956. [PMID: 27184212] [DOI: 10.1080/01443615.2016.1174835]
Abstract
Careful communication between members of the obstetric team about the intrapartum foetal heart rate is critical for clinical management and patient safety. This study evaluated the benefits of two testing modalities in assessing resident physicians' knowledge of the 2008 NICHD nomenclature. Multiple-choice (MC) and short-answer (SA) examinations were administered to Obstetrics and Gynecology resident physicians before an educational intervention and then immediately after the training, at 6 months, and at 12 months. Test scores on both the MC and the SA examinations improved after the training session, and the improvement was sustained over the course of the study. Residents scored higher on the MC examination than on the SA examination. This study suggests that formalised teaching in foetal heart rate monitoring improves resident physicians' knowledge of the NICHD nomenclature and that SA examinations may better discriminate between residents who are and are not able to accurately articulate foetal heart rate monitoring terminology.
Affiliation(s)
- Hindi E Stohl
- Department of Obstetrics and Gynecology, Harbor-UCLA Medical Center, Torrance, CA, USA
- David A Miller
- Department of Obstetrics and Gynecology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
10
Hift RJ. Should essays and other "open-ended"-type questions retain a place in written summative assessment in clinical medicine? BMC Med Educ 2014; 14:249. [PMID: 25431359] [PMCID: PMC4275935] [DOI: 10.1186/s12909-014-0249-2]
Abstract
BACKGROUND Written assessments fall into two classes: constructed-response or open-ended questions, such as the essay and a number of variants of the short-answer question, and selected-response or closed-ended questions, typically in the form of multiple-choice questions. It is widely believed that constructed-response written questions test higher-order cognitive processes in a manner that multiple-choice questions cannot, and consequently have higher validity. DISCUSSION An extensive review of the literature suggests that in summative assessment neither premise is evidence-based. Well-structured open-ended and multiple-choice questions appear equivalent in their ability to assess higher cognitive functions, and performance in multiple-choice assessments may correlate more highly than the open-ended format with competence demonstrated in clinical practice following graduation. Studies of construct validity suggest that both formats measure essentially the same dimension, at least in mathematics, the physical sciences, biology and medicine. The persistence of the open-ended format in summative assessment may be due to the intuitive appeal of the belief that synthesising an answer to an open-ended question must be both more cognitively taxing and closer to actual clinical experience than selecting a correct response. I suggest that cognitive-constructivist learning theory would predict that a well-constructed, context-rich multiple-choice item represents a complex problem-solving exercise which activates a sequence of cognitive processes closely paralleling those required in clinical practice, hence explaining the high validity of the multiple-choice format. SUMMARY The evidence does not support the proposition that the open-ended assessment format is superior to the multiple-choice format, at least in exit-level summative assessment, in terms of either its ability to test higher-order cognitive functioning or its validity. This is explicable using a theory of mental models, which might predict that the multiple-choice format will have higher validity, a statement for which some empirical support exists. Given the superior reliability and cost-effectiveness of the multiple-choice format, consideration should be given to phasing out open-ended questions in summative assessment. Whether the same applies to non-exit-level and formative assessment remains to be answered, particularly in terms of the educational effect of testing, an area which deserves intensive study.
Affiliation(s)
- Richard J Hift
- Clinical and Professional Practice Research Group, School of Clinical Medicine, University of KwaZulu-Natal, Durban, 4013, South Africa
11
Scheithauer MC, Tiger JH, Miller SJ. On the efficacy of a computer-based program to teach visual Braille reading. J Appl Behav Anal 2013; 46:436-443. [PMID: 24114158] [DOI: 10.1002/jaba.48]
Abstract
Scheithauer and Tiger (2012) created an efficient computerized program that taught 4 sighted college students to select text letters when presented with visual depictions of braille alphabetic characters, which resulted in the emergence of some braille reading. The current study extended these results to a larger sample (n = 81) and compared the efficacy and efficiency of the instructional program using 2 different response modalities. One variation of the program required a response in a multiple-choice format, and the other required a keyed response. Both instructional programs resulted in increased braille letter identification and braille reading. These skills were maintained at a follow-up session 7 to 14 days later. The mean time needed to complete the program was 22.8 min across participants. Implications of these results for future research, as well as practical implications for teaching the braille alphabet, are discussed.
12
Gunderman RB, Ladowski JM. Inherent limitations of multiple-choice testing. Acad Radiol 2013; 20:1319-1321. [PMID: 24029066] [DOI: 10.1016/j.acra.2013.04.009]