1. Fitzek S, Choi KEA. Shaping future practices: German-speaking medical and dental students' perceptions of artificial intelligence in healthcare. BMC Medical Education 2024; 24:844. [PMID: 39107732] [PMCID: PMC11304766] [DOI: 10.1186/s12909-024-05826-z]

Abstract
BACKGROUND The growing use of artificial intelligence (AI) in healthcare necessitates understanding the perspectives of future practitioners. This study investigated the perceptions of German-speaking medical and dental students regarding the role of AI in their future practices. METHODS A 28-item survey adapted from the AI in Healthcare Education Questionnaire (AIHEQ) and the Medical Student's Attitude Toward AI in Medicine (MSATAIM) scale was administered to students in Austria, Germany, and Switzerland from April to July 2023. Participants were recruited through targeted advertisements on Facebook and Instagram and were required to be proficient in German and enrolled in medical or dental programs. The data analysis included descriptive statistics, correlations, t tests, and thematic analysis of the open-ended responses. RESULTS Of the 409 valid responses (mean age = 23.13 years), only 18.2% of participants reported receiving formal training in AI. Significant positive correlations were found between self-reported tech-savviness and AI familiarity (r = 0.67) and between confidence in finding reliable AI information and positive attitudes toward AI (r = 0.72). While no significant difference in AI familiarity was found between medical and dental students, dental students exhibited slightly more positive attitudes toward the integration of AI into their future practices. CONCLUSION This study underscores the need for comprehensive AI education in medical and dental curricula to address knowledge gaps and prepare future healthcare professionals for the ethical and effective integration of AI in practice.

Affiliation(s)
- Sebastian Fitzek: Health Services Research, Faculty of Medicine/Dentistry, Danube Private University, Steiner Landstraße 124, 3500 Krems-Stein, Austria
- Kyung-Eun Anna Choi: Health Services Research, Faculty of Medicine/Dentistry, Danube Private University, Steiner Landstraße 124, 3500 Krems-Stein, Austria; Center for Health Services Research, Brandenburg Medical School, Seebad 82/83, 15562 Rüdersdorf bei Berlin, Neuruppin, Germany

2. Homer M. Towards a more nuanced conceptualisation of differential examiner stringency in OSCEs. Advances in Health Sciences Education 2024; 29:919-934. [PMID: 37843678] [PMCID: PMC11208245] [DOI: 10.1007/s10459-023-10289-w]

Abstract
Quantitative measures of systematic differences in OSCE scoring across examiners (often termed examiner stringency) can threaten the validity of examination outcomes. Such effects are usually conceptualised and operationalised based solely on checklist/domain scores in a station, and global grades are not often used in this type of analysis. In this work, a large candidate-level exam dataset is analysed to develop a more sophisticated understanding of examiner stringency. Station scores are modelled based on global grades, with each candidate, station and examiner allowed to vary in their ability/stringency/difficulty in the modelling. In addition, examiners are also allowed to vary in how they discriminate across grades; to our knowledge, this is the first time this has been investigated. Results show that examiners contribute strongly to variance in scoring in two distinct ways: via the traditional conception of score stringency (34% of score variance), but also in how they discriminate in scoring across grades (7%). As one might expect, candidate and station account for only a small amount of score variance at the station level once candidate grades are accounted for (3% and 2% respectively), with the remainder being residual (54%). Investigation of impacts on station-level candidate pass/fail decisions suggests that examiner differential stringency effects combine to give false positive (candidates passing in error) and false negative (failing in error) rates in stations of around 5% each, but at the exam level this reduces to 0.4% and 3.3% respectively. This work adds to our understanding of examiner behaviour by demonstrating that examiners can vary in qualitatively different ways in their judgments. For institutions, it emphasises the key message that it is important to sample widely from the examiner pool via sufficient stations to ensure OSCE-level decisions are sufficiently defensible. It also suggests that examiner training should include discussion of global grading, and of the combined effect of scoring and grading on candidate outcomes.
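
The crossed-effects structure described above can be made concrete with a short simulation. The numpy sketch below is not the paper's actual model: the numbers of candidates, stations and examiners, the 1-5 grade scale, and all variance components are invented, with examiner stringency deliberately made the dominant source to echo the abstract's decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cand, n_stat, n_exam = 500, 18, 60

# Hypothetical standard deviations for each variance component.
sd_cand, sd_stat, sd_exam, sd_slope, sd_resid = 0.4, 0.3, 1.5, 0.6, 1.9

cand_ability = rng.normal(0, sd_cand, n_cand)     # candidate random intercepts
stat_difficulty = rng.normal(0, sd_stat, n_stat)  # station random intercepts
exam_stringency = rng.normal(0, sd_exam, n_exam)  # examiner random intercepts (stringency)
exam_slope = rng.normal(0, sd_slope, n_exam)      # examiner-specific grade discrimination

rows = []
for c in range(n_cand):
    for s in range(n_stat):
        e = rng.integers(n_exam)                  # examiner assigned to this encounter
        grade = rng.integers(1, 6)                # global grade on a 1-5 scale
        score = (2.0 * grade                      # fixed effect of grade on station score
                 + cand_ability[c] + stat_difficulty[s]
                 + exam_stringency[e]             # stringency shifts all of an examiner's scores
                 + exam_slope[e] * (grade - 3)    # discrimination stretches scores across grades
                 + rng.normal(0, sd_resid))       # residual
        rows.append((c, s, e, grade, score))

# In practice the components would be estimated with a crossed random-effects model,
# e.g. in R/lme4: score ~ grade + (1 | candidate) + (1 | station) + (grade | examiner).
```
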

Affiliation(s)
- Matt Homer: School of Medicine, University of Leeds, Leeds, LS2 9JT, UK

3. Tavares W, Pearce J. Attending to Variable Interpretations of Assessment Science and Practice. Teaching and Learning in Medicine 2024; 36:244-252. [PMID: 37431929] [DOI: 10.1080/10401334.2023.2231923]

Abstract
Issue: The way educators think about the nature of competence, the approaches they select for the assessment of competence, what the generated data imply, and what counts as good assessment now involve broader and more diverse interpretive processes. Broadening philosophical positions in assessment has educators applying different interpretations to similar assessment concepts. As a result, what is claimed through assessment, including what counts as quality, can differ for each of us despite the use of similar activities and language. This leads to some uncertainty about how to proceed or, worse, provides opportunities for questioning the legitimacy of any assessment activity or outcome. While some debate in assessment is inevitable, most debates have occurred within philosophical positions (e.g., how best to minimize error), whereas newer debates are happening across philosophical positions (e.g., whether error is a useful concept). As new ways of approaching assessment have emerged, the interpretive nature of the underlying philosophical positions has not been sufficiently attended to. Evidence: We illustrate interpretive processes of assessment in action by: (a) summarizing the current health professions assessment context from a philosophical perspective as a way of describing its evolution; (b) demonstrating implications in practice using two examples (i.e., analysis of assessment work and validity claims); and (c) examining pragmatism to demonstrate how, even within specific philosophical positions, opportunities for variable interpretation still exist. Implications: Our concern is not that assessment designers and users have different assumptions, but that, practically, educators may unknowingly (or insidiously) apply different assumptions and methodological and interpretive norms, and subsequently settle on different views of what serves as quality assessment, even for the same assessment program or event. With the state of assessment in the health professions in flux, we conclude by calling for a philosophically explicit approach to assessment, and underscore assessment as, fundamentally, an interpretive process, one which demands the careful elucidation of philosophical assumptions to promote understanding and, ultimately, the defensibility of assessment processes and outcomes.

Affiliation(s)
- Walter Tavares: The Wilson Centre for Health Professions Education Research and Post-Graduate Medical Education, Toronto, Canada; Temerty Faculty of Medicine, University Health Network and University of Toronto, Toronto, Canada; Department of Health and Society, University of Toronto, Toronto, Canada; York Region Paramedic Services, Community Health Services, Regional Municipality of York, Newmarket, Canada
- Jacob Pearce: Tertiary Education, Australian Council for Educational Research, Camberwell, Australia

4. Pearce J. What do student experiences of programmatic assessment tell us about scoring programmatic assessment data? Medical Education 2022; 56:872-875. [PMID: 35698736] [DOI: 10.1111/medu.14852]

Affiliation(s)
- Jacob Pearce: Australian Council for Educational Research - Tertiary Education (Assessment), Camberwell, Victoria, Australia

5. de Jong LH, Bok HGJ, Schellekens LH, Kremer WDJ, Jonker FH, van der Vleuten CPM. Shaping the right conditions in programmatic assessment: how quality of narrative information affects the quality of high-stakes decision-making. BMC Medical Education 2022; 22:409. [PMID: 35643442] [PMCID: PMC9148525] [DOI: 10.1186/s12909-022-03257-2]

Abstract
BACKGROUND Programmatic assessment is increasingly being implemented within competency-based health professions education. In this approach, a multitude of low-stakes assessment activities is aggregated into a holistic high-stakes decision on the student's performance. High-stakes decisions need to be of high quality. Part of this quality is whether an examiner perceives saturation of information when making a holistic decision. The purpose of this study was to explore the influence of narrative information on the perception of saturation of information during the interpretative process of high-stakes decision-making. METHODS In this mixed-method intervention study, the quality of the recorded narrative information (i.e., feedback and reflection) was manipulated within multiple portfolios to investigate its influence on 1) the perception of saturation of information and 2) the examiner's interpretative approach in making a high-stakes decision. Data were collected through surveys, screen recordings of the portfolio assessments, and semi-structured interviews. Descriptive statistics and template analysis were applied to analyze the data. RESULTS The examiners perceived saturation of information less frequently in the portfolios with low-quality narrative feedback. Additionally, they mentioned consistency of information as a factor that influenced their perception of saturation. Even though each examiner generally had an idiosyncratic approach to assessing a portfolio, variations arose in response to certain triggers, such as noticeable deviations in the student's performance and in the quality of narrative feedback. CONCLUSION The perception of saturation of information seemed to be influenced by the quality of the narrative feedback and, to a lesser extent, by the quality of reflection. These results emphasize the importance of high-quality narrative feedback in making robust decisions on portfolios that are expected to be more difficult to assess. Furthermore, within these "difficult" portfolios, examiners adapted their interpretative process in reaction to the intervention and other triggers, taking an iterative and responsive approach.

Affiliation(s)
- Lubberta H de Jong: Department Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Harold G J Bok: Department Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Lonneke H Schellekens: Department Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands; Faculty of Social and Behavioural Sciences, Educational Consultancy and Professional Development, Utrecht University, Utrecht, The Netherlands
- Wim D J Kremer: Department Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- F Herman Jonker: Department Population Health Sciences, Section Farm Animal Health, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Cees P M van der Vleuten: Department of Educational Development and Research, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands

6. Collares CF. Cognitive diagnostic modelling in healthcare professions education: an eye-opener. Advances in Health Sciences Education 2022; 27:427-440. [PMID: 35201484] [PMCID: PMC8866928] [DOI: 10.1007/s10459-022-10093-y]

Abstract
Criticisms about psychometric paradigms currently used in healthcare professions education include claims of reductionism, objectification, and poor compliance with assumptions. Nevertheless, perhaps the most crucial criticism comes from learners' difficulty in interpreting and making meaningful use of summative scores and the potentially detrimental impact these scores have on learners. The term "post-psychometric era" has become popular, despite persisting calls for the sensible use of modern psychometrics. In recent years, cognitive diagnostic modelling has emerged as a new psychometric paradigm capable of providing meaningful diagnostic feedback. Cognitive diagnostic modelling allows the classification of examinees on multiple cognitive attributes. This measurement is obtained by modelling the attributes as categorical, discrete latent variables, and items can reflect more than one latent variable simultaneously. The interactions between latent variables can be modelled flexibly, allowing a unique perspective on complex cognitive processes. These characteristic features enable diagnostic classification over a large number of constructs of interest, removing the need to provide numerical scores as feedback to test takers. This paper provides an overview of cognitive diagnostic modelling, including an introduction to its foundations and illustrations of potential applications, to help teachers become involved in developing and evaluating assessment tools used in healthcare professions education. Cognitive diagnosis may represent a revolutionary new psychometric paradigm, overcoming the known limitations found in frequently used psychometric approaches and offering the possibility of robust qualitative feedback and better alignment with competency-based curricula and modern programmatic assessment frameworks.
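
To give a flavour of what such modelling looks like, the sketch below implements the DINA ("deterministic-input, noisy-and-gate") model, one of the simplest cognitive diagnostic models: an item is answered correctly with probability 1 - slip when all required attributes are mastered, and with probability guess otherwise. The Q-matrix, the slip/guess values, and the three attributes are invented for illustration.

```python
import numpy as np
from itertools import product

# Q-matrix: rows = items, columns = attributes; a 1 means the item requires
# mastery of that attribute. All values here are hypothetical.
Q = np.array([
    [1, 0, 0],   # item 1 requires attribute A only
    [1, 1, 0],   # item 2 requires attributes A and B
    [0, 1, 1],   # item 3 requires attributes B and C
])

slip  = np.array([0.10, 0.15, 0.20])  # P(incorrect | all required attributes mastered)
guess = np.array([0.20, 0.10, 0.05])  # P(correct | some required attribute not mastered)

def dina_prob_correct(alpha):
    """Per-item P(correct) for an examinee with attribute-mastery vector alpha."""
    eta = np.all(np.asarray(alpha) >= Q, axis=1)   # ideal response under the DINA 'and' gate
    return np.where(eta, 1 - slip, guess)

def posterior_over_profiles(responses):
    """Posterior over all 2^K attribute profiles given 0/1 responses (uniform prior)."""
    profiles = list(product([0, 1], repeat=Q.shape[1]))
    resp = np.asarray(responses)
    like = np.array([
        np.prod(np.where(resp == 1, dina_prob_correct(a), 1 - dina_prob_correct(a)))
        for a in profiles
    ])
    return dict(zip(profiles, (like / like.sum()).round(3)))

# An examinee who answers items 1 and 2 correctly but misses item 3:
print(posterior_over_profiles([1, 1, 0]))  # mass concentrates on profiles mastering A and B
```

The diagnostic output is the posterior over attribute profiles, not a numeric score, which is exactly the kind of qualitative feedback the paper argues for.
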

Affiliation(s)
- Carlos Fernando Collares: Department of Educational Development and Research, Faculty of Health, Medicine and Life Sciences, School of Health Professions Education (SHE), Maastricht University, Postbus 616, 6200 Maastricht, The Netherlands; European Board of Medical Assessors, Edinburgh, UK; Stichting Aphasia.help, Maastricht, The Netherlands

7. Tavares W, Hodwitz K, Rowland P, Ng S, Kuper A, Friesen F, Shwetz K, Brydges R. Implicit and inferred: on the philosophical positions informing assessment science. Advances in Health Sciences Education 2021; 26:1597-1623. [PMID: 34370126] [DOI: 10.1007/s10459-021-10063-w]

Abstract
Assessment practices have been increasingly informed by a range of philosophical positions. While generally beneficial, this addition of options can lead to misalignment in the philosophical assumptions associated with different features of assessment (e.g., the nature of constructs and competence, ways of assessing, validation approaches). Such incompatibility can threaten the quality and defensibility of researchers' claims, especially when left implicit. We investigated how authors state and use their philosophical positions when designing and reporting on performance-based assessments (PBA) of intrinsic roles, as well as the (in)compatibility of assumptions across assessment features. Using a representative sample of studies examining PBA of intrinsic roles, we used qualitative content analysis to extract data on how authors enacted their philosophical positions across three key assessment features: (1) construct conceptualizations, (2) assessment activities, and (3) validation methods. We also examined patterns in philosophical positioning across features and studies. In reviewing 32 papers from established peer-reviewed journals, we found that (a) authors rarely reported their philosophical positions, meaning underlying assumptions could only be inferred; (b) authors approached features of assessment in variable ways that could be informed by, or associated with, different philosophical assumptions; and (c) we experienced uncertainty in determining the (in)compatibility of philosophical assumptions across features. Authors' philosophical positions were often vague or absent in the selected contemporary assessment literature. Leaving such details implicit may lead to misinterpretation by knowledge users wishing to implement, build on, or evaluate the work. As such, assessing claims, quality, and defensibility may come to depend more on who is interpreting than on what is being interpreted.

Affiliation(s)
- Walter Tavares: The Wilson Centre, Temerty Faculty of Medicine, Department of Medicine, Institute for Health Policy, Management and Evaluation, University of Toronto/University Health Network, Toronto, Ontario, Canada
- Kathryn Hodwitz: Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, Ontario, Canada
- Paula Rowland: The Wilson Centre, Temerty Faculty of Medicine, Department of Occupational Therapy and Occupational Science, University of Toronto/University Health Network, Toronto, Ontario, Canada
- Stella Ng: The Wilson Centre, Department of Speech-Language Pathology, Temerty Faculty of Medicine, University of Toronto; Centre for Faculty Development, Unity Health Toronto, Toronto, Ontario, Canada
- Ayelet Kuper: The Wilson Centre, University Health Network/University of Toronto; Division of General Internal Medicine, Sunnybrook Health Sciences Centre; Department of Medicine, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Farah Friesen: Centre for Faculty Development, Temerty Faculty of Medicine, University of Toronto at Unity Health Toronto, Toronto, Ontario, Canada
- Katherine Shwetz: Department of English, University of Toronto, Toronto, Ontario, Canada
- Ryan Brydges: The Wilson Centre, Temerty Faculty of Medicine, Department of Medicine, Unity Health Toronto, University of Toronto, Toronto, Ontario, Canada

8. Pearce J, Tavares W. A philosophical history of programmatic assessment: tracing shifting configurations. Advances in Health Sciences Education 2021; 26:1291-1310. [PMID: 33893881] [DOI: 10.1007/s10459-021-10050-1]

Abstract
Programmatic assessment is now well entrenched in medical education, allowing us to reflect on when it first emerged and how it evolved into the form we know today. Drawing upon the intellectual tradition of historical epistemology, we provide a philosophically oriented historiographical study of programmatic assessment. Our goal is to trace its relatively short historical trajectory by describing shifting configurations in its scene of inquiry, focusing on questions, practices, and philosophical presuppositions. We identify three historical phases: emergence, evolution, and entrenchment. For each, we describe the configurations of the scene; examine the underlying philosophical presuppositions driving changes; and detail the upshots in assessment practice. We find that programmatic assessment emerged in response to positivist 'turmoil' prior to 2005, driven by utility considerations and implicit pragmatist undertones. Once introduced, it evolved with notions of diversity and learning being underscored, and a constructivist ontology developing at its core. More recently, programmatic assessment has become entrenched as its own sub-discipline. Rich narratives have been emphasised, but philosophical underpinnings have been blurred. We hope to shed new light on current assessment practices in the medical education community by interrogating the history of programmatic assessment from this philosophical vantage point. Making philosophical presuppositions explicit highlights the perspectival nature of aspects of programmatic assessment and suggests reasons for perceived benefits, as well as potential tensions, contradictions, and vulnerabilities in the approach today. We conclude by offering some reflections on important points to emerge from our historical study, and suggest 'what next' for programmatic assessment in light of this endeavour.

Affiliation(s)
- J Pearce: Tertiary Education (Assessment), Australian Council for Educational Research, 19 Prospect Hill Road, Camberwell, VIC 3124, Australia
- W Tavares: The Wilson Centre and Post-MD Education, University Health Network and University of Toronto, Toronto, ON, Canada

9. Ginsburg S, Watling CJ, Schumacher DJ, Gingerich A, Hatala R. Numbers Encapsulate, Words Elaborate: Toward the Best Use of Comments for Assessment and Feedback on Entrustment Ratings. Academic Medicine 2021; 96:S81-S86. [PMID: 34183607] [DOI: 10.1097/acm.0000000000004089]

Abstract
The adoption of entrustment ratings in medical education is based on a seemingly simple premise: to align workplace-based supervision with resident assessment. Yet it has been difficult to operationalize this concept. Entrustment rating forms combine numeric scales with comments and are embedded in a programmatic assessment framework, which encourages the collection of a large quantity of data. The implicit assumption that more is better has led to an untamable volume of data that competency committees must grapple with. In this article, the authors explore the roles of numbers and words on entrustment rating forms, examining the intended and optimal use(s) of each, with particular attention to the words. They also unpack the problematic issue of dual-purposing words for both assessment and feedback. Words have enormous potential to elaborate, to contextualize, and to instruct; to realize this potential, educators must be crystal clear about their use. The authors set forth a number of possible ways to reconcile these tensions by more explicitly aligning words to purpose. For example, educators could focus written comments solely on assessment; create assessment encounters distinct from feedback encounters; or use different words collected from the same encounter to serve distinct feedback and assessment purposes. Finally, the authors address the tyranny of documentation created by programmatic assessment and urge caution about yielding to the temptation to reduce words to numbers to make them manageable. Instead, they encourage educators to preserve some educational encounters purely for feedback, and to consider that not all words need to become data.

Affiliation(s)
- Shiphra Ginsburg: S. Ginsburg is professor of medicine, Department of Medicine, Sinai Health System and Faculty of Medicine, University of Toronto; scientist, Wilson Centre for Research in Education, University of Toronto, Toronto, Ontario, Canada; and Canada Research Chair in Health Professions Education; ORCID: http://orcid.org/0000-0002-4595-6650
- Christopher J Watling: C.J. Watling is professor and director, Centre for Education Research and Innovation, Schulich School of Medicine & Dentistry, Western University, London, Ontario, Canada; ORCID: https://orcid.org/0000-0001-9686-795X
- Daniel J Schumacher: D.J. Schumacher is associate professor of pediatrics, Cincinnati Children's Hospital Medical Center and University of Cincinnati College of Medicine, Cincinnati, Ohio; ORCID: https://orcid.org/0000-0001-5507-8452
- Andrea Gingerich: A. Gingerich is assistant professor, Northern Medical Program, University of Northern British Columbia, Prince George, British Columbia, Canada; ORCID: https://orcid.org/0000-0001-5765-3975
- Rose Hatala: R. Hatala is professor, Department of Medicine, and director, Clinical Educator Fellowship, Centre for Health Education Scholarship, University of British Columbia, Vancouver, British Columbia, Canada; ORCID: https://orcid.org/0000-0003-0521-2590

10. Zoanetti N, Pearce J. The potential use of Bayesian Networks to support committee decisions in programmatic assessment. Medical Education 2021; 55:808-817. [PMID: 33151589] [DOI: 10.1111/medu.14407]

Abstract
CONTEXT The benefits of programmatic assessment are well established. Evidence from multiple assessment formats is accumulated and triangulated to inform progression committee decisions. Committees face an ongoing challenge in ensuring consistency and fairness in programmatic deliberations. Traditional statistical and psychometric techniques are not well suited to aggregating different assessment formats accumulated over time. Some of the strengths of programmatic assessment are also vulnerabilities when viewed through this lens. While emphasis is often placed on data richness and the considered input of qualified experts, committees reasonably wish for practical, defensible solutions to these challenges. METHODS We draw upon existing literature regarding Bayesian Networks (BNs), noting their utility and application in educational systems. We provide illustrative examples of how they could potentially be used in contexts that embed programmatic principles. We show a simple BN for a knowledge domain before presenting a full-scale 'proof of concept' BN to support committee decisions. We zoom in on one 'node' to demonstrate the capacity to incorporate disparate evidence throughout the network. CONCLUSIONS Bayesian Networks offer an approach that is theoretically well supported for programmatic assessment. They can aid committees in managing evidence accumulation, help them make inferences under conditions of uncertainty, and buttress decisions by adding a layer of defensibility to the process. They are a pragmatic tool that adds value to the programmatic space by applying a complementary statistical framework. We see four major benefits of BNs in programmatic assessment: BNs allow committees to visually capture evidentiary arguments during decision-making; 'recommendations' from probabilistic pathways can be used by committees to confirm their qualitative judgments; BNs can ensure precedents are maintained and consistency occurs over time; and the imperative to capture data richness is maintained without resorting to questionable methodological strategies such as adding qualitatively different things together. Further research into their feasibility and robustness in practice is warranted.
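
To make the committee-decision idea concrete, below is a minimal hand-rolled sketch of the simplest possible such network: a single latent "competence" node whose posterior is updated by conditionally independent assessment results. The structure, prior, and conditional probabilities are all hypothetical; the paper's full 'proof of concept' network is far richer, and a practical implementation would more likely use a dedicated BN library.

```python
import numpy as np

# States of the latent node: index 0 = "not yet competent", 1 = "competent".
# The prior and the conditional probability tables below are purely hypothetical.
prior = np.array([0.2, 0.8])

# P(pass | competence state) for three assessment formats feeding the committee.
p_pass = {
    "MCQ":  np.array([0.35, 0.90]),
    "OSCE": np.array([0.25, 0.85]),
    "WBA":  np.array([0.30, 0.95]),
}

def posterior_competence(evidence):
    """P(competence | evidence), where evidence maps format -> True (pass) / False (fail)."""
    post = prior.astype(float)
    for fmt, passed in evidence.items():
        likelihood = p_pass[fmt] if passed else 1 - p_pass[fmt]
        post = post * likelihood   # results assumed conditionally independent given competence
    return post / post.sum()

# Triangulating mixed evidence: failed MCQ, strong OSCE and workplace-based assessment.
print(posterior_competence({"MCQ": False, "OSCE": True, "WBA": True}))
# -> approximately [0.13, 0.87]; the network quantifies how much the MCQ failure
#    should shift the committee's belief, rather than leaving it entirely to intuition.
```
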

Affiliation(s)
- Nathan Zoanetti: Psychometrics and Methodology, Australian Council for Educational Research, Camberwell, Victoria, Australia
- Jacob Pearce: Tertiary Education (Assessment), Australian Council for Educational Research, Camberwell, Victoria, Australia

11. On Educational Assessment Theory: A High-Level Discussion of Adolphe Quetelet, Platonism, and Ergodicity. Philosophies 2021. [DOI: 10.3390/philosophies6020046]

Abstract
Educational assessments, specifically standardized and normalized exams, owe most of their foundations to psychological test theory in psychometrics. While the theoretical assumptions of these practices are widespread and relatively uncontroversial in the testing community, at least two are philosophically and mathematically suspect and have troubling implications in education. Assumption 1 is that repeated assessment measures, combined into an arithmetic mean, represent some real, stable, quantitative psychological trait or ability plus some error. Assumption 2 is that aggregated, group-level educational data collected from assessments can be interpreted to make inferences about a given individual over time without explicit justification. It is argued that the former assumption cannot be taken for granted; it is also argued that, while it is typically attributed to 20th-century thought, the assumption in a rigorous form can be traced back at least to the 1830s via an unattractive Platonistic statistical thesis offered by one of the founders of the social sciences, the Belgian mathematician Adolphe Quetelet (1796-1874). While contemporary research has moved away from using his work directly, it is demonstrated that cognitive psychology still preserves assumption 1, even as it is increasingly challenged by current paradigms that cast human cognition as a dynamical, complex system. How to deal with assumption 1, and whether it is broadly justified, is left as an open question. It is then argued that assumption 2 is justified only when assessments have ergodic properties, a criterion rarely met in education; specifically, some forms of normalized standardized exams are intrinsically non-ergodic and should be regarded as invalid for saying much about individual students and their capability. The article closes with a call for the introduction of dynamical mathematics into educational assessment at a conceptual level (e.g., through Bayesian networks), for the critical analysis of several key psychological testing assumptions, and for the introduction of dynamical language into philosophical discourse. These prima facie distinct areas ought to inform one another more closely in educational studies.
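
The ergodicity argument can be illustrated with a small simulation, sketched below under invented distributions: when people differ in stable trait levels, the ensemble average (across students at one moment) and the time average (one student across occasions) estimate different quantities, so group-level assessment statistics need not describe any individual.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_times = 1000, 200

# Non-ergodic process: each person has a stable personal mean (the trait differs
# across people), plus moment-to-moment noise. All parameters are invented.
personal_mean = rng.normal(50, 10, size=n_people)        # between-person spread
scores = personal_mean[:, None] + rng.normal(0, 5, size=(n_people, n_times))

ensemble_avg = scores[:, 0].mean()   # average across people at one time point
time_avg_one = scores[0].mean()      # average across time for a single person

print(f"ensemble average ~ {ensemble_avg:.1f}")   # close to the population mean (50)
print(f"person 0's time average ~ {time_avg_one:.1f}, "
      f"personal mean = {personal_mean[0]:.1f}")  # converges to their own mean, not 50
```
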
12. Homer M. Re-conceptualising and accounting for examiner (cut-score) stringency in a 'high frequency, small cohort' performance test. Advances in Health Sciences Education 2021; 26:369-383. [PMID: 32876815] [PMCID: PMC8041694] [DOI: 10.1007/s10459-020-09990-x]

Abstract
Variation in examiner stringency is an ongoing problem in many performance settings, such as OSCEs, and is usually conceptualised and measured based on the scores/grades examiners award. Under borderline regression, the standard within a station is set using checklist/domain scores and global grades acting in combination. This complexity requires a more nuanced view of what stringency might mean when considering sources of variation in station cut-scores. This study uses data from 349 administrations of an 18-station, 36-candidate, single-circuit OSCE for international medical graduates wanting to practise in the UK (PLAB2). The station-level data were gathered over a 34-month period up to July 2019. Linear mixed models are used to estimate, and then separate out, examiner (n = 547), station (n = 330) and examination (n = 349) effects on borderline-regression cut-scores. Examiners are the largest source of variation, accounting for 56% of the variance in cut-scores, compared with 6% for station, less than 1% for exam, and 37% residual. Aggregating to the exam level tends to ameliorate this effect. For 96% of examinations, a 'fair' cut-score, which equalises out the variation in examiner stringency that candidates experience, is within one standard error of measurement (SEM) of the actual cut-score. The addition of the SEM to produce the final pass mark generally ensures that the public is protected from almost all false positives in the examination caused by examiner cut-score stringency acting in candidates' favour.
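
For readers unfamiliar with borderline regression, the sketch below shows how a station cut-score is derived by regressing checklist scores on global grades and reading off the fitted score at the borderline grade; the scores, the grades, and the 1-5 grade scale are invented for the example.

```python
import numpy as np

# Hypothetical station data: each candidate has a checklist score and a global
# grade (1 = fail, 2 = borderline, 3 = pass, 4 = good, 5 = excellent).
grades = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 5])
scores = np.array([8, 11, 12, 15, 14, 16, 18, 19, 22, 21])

# Borderline regression: fit score on grade, then take the fitted score at the
# borderline grade as the station cut-score.
slope, intercept = np.polyfit(grades, scores, 1)
BORDERLINE_GRADE = 2
cut_score = slope * BORDERLINE_GRADE + intercept
print(f"station cut-score = {cut_score:.2f}")
```

Examiner stringency, in this re-conceptualisation, is the variation in such fitted cut-scores attributable to the examiner, which the study isolates with linear mixed models.
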

Affiliation(s)
- Matt Homer: Leeds Institute of Medical Education, School of Medicine, University of Leeds, Leeds, LS2 9JT, UK

13. Boursicot K, Kemp S, Wilkinson T, Findyartini A, Canning C, Cilliers F, Fuller R. Performance assessment: Consensus statement and recommendations from the 2020 Ottawa Conference. Medical Teacher 2021; 43:58-67. [PMID: 33054524] [DOI: 10.1080/0142159x.2020.1830052]

Abstract
INTRODUCTION In 2011, the Consensus Statement on Performance Assessment was published in Medical Teacher. That paper was commissioned by AMEE (Association for Medical Education in Europe) as part of the series of Consensus Statements following the 2010 Ottawa Conference. In 2019, it was recommended that a working group be reconvened to review and consider developments in performance assessment since the 2011 publication. METHODS Following a review of the original recommendations in the 2011 paper and of shifts in the field across the past 10 years, the group identified areas of consensus and yet-to-be-resolved issues for performance assessment. RESULTS AND DISCUSSION This paper addresses developments in performance assessment since 2011, reiterates relevant aspects of the 2011 paper, and summarises contemporary best-practice recommendations for OSCEs and WBAs as fit-for-purpose methods for performance assessment in the health professions.
Collapse
Affiliation(s)
- Katharine Boursicot
- Department of Assessment and Progression, Duke-National University of Singapore, Singapore, Singapore
| | - Sandra Kemp
- Curtin Medical School, Curtin University, Perth, Australia
| | - Tim Wilkinson
- Dean's Department, University of Otago, Christchurch, New Zealand
| | - Ardi Findyartini
- Department of Medical Education, Universitas Indonesia, Jakarta, Indonesia
| | - Claire Canning
- Department of Assessment and Progression, Duke-National University of Singapore, Singapore, Singapore
| | - Francois Cilliers
- Department of Health Sciences Education, University of Cape Town, Cape Town, South Africa
| | | |
Collapse
|