1. Fitzek S, Choi KEA. Shaping future practices: German-speaking medical and dental students' perceptions of artificial intelligence in healthcare. BMC Medical Education 2024; 24:844. [PMID: 39107732] [PMCID: PMC11304766] [DOI: 10.1186/s12909-024-05826-z]

Abstract
BACKGROUND The growing use of artificial intelligence (AI) in healthcare necessitates understanding the perspectives of future practitioners. This study investigated the perceptions of German-speaking medical and dental students regarding the role of AI in their future practices. METHODS A 28-item survey adapted from the AI in Healthcare Education Questionnaire (AIHEQ) and the Medical Student's Attitude Toward AI in Medicine (MSATAIM) scale was administered to students in Austria, Germany, and Switzerland from April to July 2023. Participants were recruited through targeted advertisements on Facebook and Instagram and were required to be proficient in German and enrolled in medical or dental programs. The data analysis included descriptive statistics, correlations, t tests, and thematic analysis of the open-ended responses. RESULTS Of the 409 valid responses (mean age = 23.13 years), only 18.2% of participants reported receiving formal training in AI. Significant positive correlations were found between self-reported tech-savviness and AI familiarity (r = 0.67) and between confidence in finding reliable AI information and positive attitudes toward AI (r = 0.72). While no significant difference in AI familiarity was found between medical and dental students, dental students exhibited slightly more positive attitudes toward the integration of AI into their future practices. CONCLUSION This study underscores the need for comprehensive AI education in medical and dental curricula to address knowledge gaps and prepare future healthcare professionals for the ethical and effective integration of AI in practice.

Affiliation(s)
- Sebastian Fitzek: Health Services Research, Faculty of Medicine/Dentistry, Danube Private University, Steiner Landstraße 124, 3500 Krems-Stein, Austria
- Kyung-Eun Anna Choi: Health Services Research, Faculty of Medicine/Dentistry, Danube Private University, Steiner Landstraße 124, 3500 Krems-Stein, Austria; Center for Health Services Research, Brandenburg Medical School, Seebad 82/83, 15562 Rüdersdorf bei Berlin, Neuruppin, Germany

2. Homer M. Towards a more nuanced conceptualisation of differential examiner stringency in OSCEs. Advances in Health Sciences Education 2024; 29:919-934. [PMID: 37843678] [PMCID: PMC11208245] [DOI: 10.1007/s10459-023-10289-w]

Abstract
Quantitative measures of systematic differences in OSCE scoring across examiners (often termed examiner stringency) can threaten the validity of examination outcomes. Such effects are usually conceptualised and operationalised based solely on checklist/domain scores in a station, and global grades are not often used in this type of analysis. In this work, a large candidate-level exam dataset is analysed to develop a more sophisticated understanding of examiner stringency. Station scores are modelled based on global grades, with each candidate, station and examiner allowed to vary in their ability/stringency/difficulty in the modelling. In addition, examiners are also allowed to vary in how they discriminate across grades; to our knowledge, this is the first time this has been investigated. Results show that examiners contribute strongly to variance in scoring in two distinct ways: via the traditional conception of score stringency (34% of score variance), but also in how they discriminate in scoring across grades (7%). As one might expect, candidate and station account for only a small amount of score variance at the station level once candidate grades are accounted for (3% and 2% respectively), with the remainder being residual (54%). Investigation of impacts on station-level candidate pass/fail decisions suggests that examiner differential stringency effects combine to give false positive (candidates passing in error) and false negative (failing in error) rates in stations of around 5% each, but at the exam level this reduces to 0.4% and 3.3% respectively. This work adds to our understanding of examiner behaviour by demonstrating that examiners can vary in qualitatively different ways in their judgments. For institutions, it emphasises the key message that it is important to sample widely from the examiner pool via sufficient stations to ensure OSCE-level decisions are sufficiently defensible. It also suggests that examiner training should include discussion of global grading, and of the combined effect of scoring and grading on candidate outcomes.
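
The crossed-effects structure described above can be made concrete with a short simulation. The numpy sketch below is not the paper's actual model: the numbers of candidates, stations and examiners, the 1-5 grade scale, and all variance components are invented, with examiner stringency deliberately made the dominant source to echo the abstract's decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cand, n_stat, n_exam = 500, 18, 60

# Hypothetical standard deviations for each variance component.
sd_cand, sd_stat, sd_exam, sd_slope, sd_resid = 0.4, 0.3, 1.5, 0.6, 1.9

cand_ability = rng.normal(0, sd_cand, n_cand)     # candidate random intercepts
stat_difficulty = rng.normal(0, sd_stat, n_stat)  # station random intercepts
exam_stringency = rng.normal(0, sd_exam, n_exam)  # examiner random intercepts (stringency)
exam_slope = rng.normal(0, sd_slope, n_exam)      # examiner-specific grade discrimination

rows = []
for c in range(n_cand):
    for s in range(n_stat):
        e = rng.integers(n_exam)                  # examiner assigned to this encounter
        grade = rng.integers(1, 6)                # global grade on a 1-5 scale
        score = (2.0 * grade                      # fixed effect of grade on station score
                 + cand_ability[c] + stat_difficulty[s]
                 + exam_stringency[e]             # stringency shifts all of an examiner's scores
                 + exam_slope[e] * (grade - 3)    # discrimination stretches scores across grades
                 + rng.normal(0, sd_resid))       # residual
        rows.append((c, s, e, grade, score))

# In practice the components would be estimated with a crossed random-effects model,
# e.g. in R/lme4: score ~ grade + (1 | candidate) + (1 | station) + (grade | examiner).
```
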

Affiliation(s)
- Matt Homer: School of Medicine, University of Leeds, Leeds, LS2 9JT, UK

3. Tavares W, Pearce J. Attending to Variable Interpretations of Assessment Science and Practice. Teaching and Learning in Medicine 2024; 36:244-252. [PMID: 37431929] [DOI: 10.1080/10401334.2023.2231923]

Abstract
Issue: The way educators think about the nature of competence, the approaches they select for the assessment of competence, what the generated data imply, and what counts as good assessment now involve broader and more diverse interpretive processes. Broadening philosophical positions in assessment has educators applying different interpretations to similar assessment concepts. As a result, what is claimed through assessment, including what counts as quality, can differ for each of us despite the use of similar activities and language. This leads to some uncertainty about how to proceed or, worse, provides opportunities for questioning the legitimacy of any assessment activity or outcome. While some debate in assessment is inevitable, most debates have occurred within philosophical positions (e.g., how best to minimize error), whereas newer debates are happening across philosophical positions (e.g., whether error is a useful concept). As new ways of approaching assessment have emerged, the interpretive nature of the underlying philosophical positions has not been sufficiently attended to. Evidence: We illustrate interpretive processes of assessment in action by: (a) summarizing the current health professions assessment context from a philosophical perspective as a way of describing its evolution; (b) demonstrating implications in practice using two examples (i.e., analysis of assessment work and validity claims); and (c) examining pragmatism to demonstrate how, even within specific philosophical positions, opportunities for variable interpretation still exist. Implications: Our concern is not that assessment designers and users have different assumptions, but that, practically, educators may unknowingly (or insidiously) apply different assumptions and methodological and interpretive norms, and subsequently settle on different views of what serves as quality assessment, even for the same assessment program or event. With the state of assessment in the health professions in flux, we conclude by calling for a philosophically explicit approach to assessment, and underscore assessment as, fundamentally, an interpretive process, one which demands the careful elucidation of philosophical assumptions to promote understanding and, ultimately, the defensibility of assessment processes and outcomes.

Affiliation(s)
- Walter Tavares: The Wilson Centre for Health Professions Education Research and Post-Graduate Medical Education, Toronto, Canada; Temerty Faculty of Medicine, University Health Network and University of Toronto, Toronto, Canada; Department of Health and Society, University of Toronto, Toronto, Canada; York Region Paramedic Services, Community Health Services, Regional Municipality of York, Newmarket, Canada
- Jacob Pearce: Tertiary Education, Australian Council for Educational Research, Camberwell, Australia

4. Pearce J. What do student experiences of programmatic assessment tell us about scoring programmatic assessment data? Medical Education 2022; 56:872-875. [PMID: 35698736] [DOI: 10.1111/medu.14852]

Affiliation(s)
- Jacob Pearce: Australian Council for Educational Research - Tertiary Education (Assessment), Camberwell, Victoria, Australia

5. de Jong LH, Bok HGJ, Schellekens LH, Kremer WDJ, Jonker FH, van der Vleuten CPM. Shaping the right conditions in programmatic assessment: how quality of narrative information affects the quality of high-stakes decision-making. BMC Medical Education 2022; 22:409. [PMID: 35643442] [PMCID: PMC9148525] [DOI: 10.1186/s12909-022-03257-2]

Abstract
BACKGROUND Programmatic assessment is increasingly being implemented within competency-based health professions education. In this approach, a multitude of low-stakes assessment activities is aggregated into a holistic high-stakes decision on the student's performance. High-stakes decisions need to be of high quality. Part of this quality is whether an examiner perceives saturation of information when making a holistic decision. The purpose of this study was to explore the influence of narrative information on the perception of saturation of information during the interpretative process of high-stakes decision-making. METHODS In this mixed-method intervention study, the quality of the recorded narrative information (i.e., feedback and reflection) was manipulated within multiple portfolios to investigate its influence on 1) the perception of saturation of information and 2) the examiner's interpretative approach in making a high-stakes decision. Data were collected through surveys, screen recordings of the portfolio assessments, and semi-structured interviews. Descriptive statistics and template analysis were applied to analyze the data. RESULTS The examiners perceived saturation of information less frequently in the portfolios with low-quality narrative feedback. Additionally, they mentioned consistency of information as a factor that influenced their perception of saturation. Even though each examiner generally had an idiosyncratic approach to assessing a portfolio, variations arose in response to certain triggers, such as noticeable deviations in the student's performance and in the quality of narrative feedback. CONCLUSION The perception of saturation of information seemed to be influenced by the quality of the narrative feedback and, to a lesser extent, by the quality of reflection. These results emphasize the importance of high-quality narrative feedback in making robust decisions on portfolios that are expected to be more difficult to assess. Furthermore, within these "difficult" portfolios, examiners adapted their interpretative process in reaction to the intervention and other triggers, taking an iterative and responsive approach.

Affiliation(s)
- Lubberta H de Jong: Department Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Harold G J Bok: Department Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Lonneke H Schellekens: Department Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands; Faculty of Social and Behavioural Sciences, Educational Consultancy and Professional Development, Utrecht University, Utrecht, The Netherlands
- Wim D J Kremer: Department Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- F Herman Jonker: Department Population Health Sciences, Section Farm Animal Health, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Cees P M van der Vleuten: Department of Educational Development and Research, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands

6. Collares CF. Cognitive diagnostic modelling in healthcare professions education: an eye-opener. Advances in Health Sciences Education 2022; 27:427-440. [PMID: 35201484] [PMCID: PMC8866928] [DOI: 10.1007/s10459-022-10093-y]

Abstract
Criticisms about psychometric paradigms currently used in healthcare professions education include claims of reductionism, objectification, and poor compliance with assumptions. Nevertheless, perhaps the most crucial criticism comes from learners' difficulty in interpreting and making meaningful use of summative scores and the potentially detrimental impact these scores have on learners. The term "post-psychometric era" has become popular, despite persisting calls for the sensible use of modern psychometrics. In recent years, cognitive diagnostic modelling has emerged as a new psychometric paradigm capable of providing meaningful diagnostic feedback. Cognitive diagnostic modelling allows the classification of examinees on multiple cognitive attributes. This measurement is obtained by modelling the attributes as categorical, discrete latent variables, and items can reflect more than one latent variable simultaneously. The interactions between latent variables can be modelled flexibly, allowing a unique perspective on complex cognitive processes. These characteristic features enable diagnostic classification over a large number of constructs of interest, removing the need to provide numerical scores as feedback to test takers. This paper provides an overview of cognitive diagnostic modelling, including an introduction to its foundations and illustrations of potential applications, to help teachers become involved in developing and evaluating assessment tools used in healthcare professions education. Cognitive diagnosis may represent a revolutionary new psychometric paradigm, overcoming the known limitations found in frequently used psychometric approaches and offering the possibility of robust qualitative feedback and better alignment with competency-based curricula and modern programmatic assessment frameworks.
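
To give a flavour of what such modelling looks like, the sketch below implements the DINA ("deterministic-input, noisy-and-gate") model, one of the simplest cognitive diagnostic models: an item is answered correctly with probability 1 - slip when all required attributes are mastered, and with probability guess otherwise. The Q-matrix, the slip/guess values, and the three attributes are invented for illustration.

```python
import numpy as np
from itertools import product

# Q-matrix: rows = items, columns = attributes; a 1 means the item requires
# mastery of that attribute. All values here are hypothetical.
Q = np.array([
    [1, 0, 0],   # item 1 requires attribute A only
    [1, 1, 0],   # item 2 requires attributes A and B
    [0, 1, 1],   # item 3 requires attributes B and C
])

slip  = np.array([0.10, 0.15, 0.20])  # P(incorrect | all required attributes mastered)
guess = np.array([0.20, 0.10, 0.05])  # P(correct | some required attribute not mastered)

def dina_prob_correct(alpha):
    """Per-item P(correct) for an examinee with attribute-mastery vector alpha."""
    eta = np.all(np.asarray(alpha) >= Q, axis=1)   # ideal response under the DINA 'and' gate
    return np.where(eta, 1 - slip, guess)

def posterior_over_profiles(responses):
    """Posterior over all 2^K attribute profiles given 0/1 responses (uniform prior)."""
    profiles = list(product([0, 1], repeat=Q.shape[1]))
    resp = np.asarray(responses)
    like = np.array([
        np.prod(np.where(resp == 1, dina_prob_correct(a), 1 - dina_prob_correct(a)))
        for a in profiles
    ])
    return dict(zip(profiles, (like / like.sum()).round(3)))

# An examinee who answers items 1 and 2 correctly but misses item 3:
print(posterior_over_profiles([1, 1, 0]))  # mass concentrates on profiles mastering A and B
```

The diagnostic output is the posterior over attribute profiles, not a numeric score, which is exactly the kind of qualitative feedback the paper argues for.
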

Affiliation(s)
- Carlos Fernando Collares: Department of Educational Development and Research, Faculty of Health, Medicine and Life Sciences, School of Health Professions Education (SHE), Maastricht University, Postbus 616, 6200 Maastricht, The Netherlands; European Board of Medical Assessors, Edinburgh, UK; Stichting Aphasia.help, Maastricht, The Netherlands

7. Tavares W, Hodwitz K, Rowland P, Ng S, Kuper A, Friesen F, Shwetz K, Brydges R. Implicit and inferred: on the philosophical positions informing assessment science. Advances in Health Sciences Education 2021; 26:1597-1623. [PMID: 34370126] [DOI: 10.1007/s10459-021-10063-w]

Abstract
Assessment practices have been increasingly informed by a range of philosophical positions. While generally beneficial, this addition of options can lead to misalignment in the philosophical assumptions associated with different features of assessment (e.g., the nature of constructs and competence, ways of assessing, validation approaches). Such incompatibility can threaten the quality and defensibility of researchers' claims, especially when left implicit. We investigated how authors state and use their philosophical positions when designing and reporting on performance-based assessments (PBA) of intrinsic roles, as well as the (in)compatibility of assumptions across assessment features. Using a representative sample of studies examining PBA of intrinsic roles, we used qualitative content analysis to extract data on how authors enacted their philosophical positions across three key assessment features: (1) construct conceptualizations, (2) assessment activities, and (3) validation methods. We also examined patterns in philosophical positioning across features and studies. In reviewing 32 papers from established peer-reviewed journals, we found that (a) authors rarely reported their philosophical positions, meaning underlying assumptions could only be inferred; (b) authors approached features of assessment in variable ways that could be informed by, or associated with, different philosophical assumptions; and (c) we experienced uncertainty in determining the (in)compatibility of philosophical assumptions across features. Authors' philosophical positions were often vague or absent in the selected contemporary assessment literature. Leaving such details implicit may lead to misinterpretation by knowledge users wishing to implement, build on, or evaluate the work. As such, assessing claims, quality, and defensibility may come to depend more on who is interpreting than on what is being interpreted.

Affiliation(s)
- Walter Tavares: The Wilson Centre, Temerty Faculty of Medicine, Department of Medicine, Institute for Health Policy, Management and Evaluation, University of Toronto/University Health Network, Toronto, Ontario, Canada
- Kathryn Hodwitz: Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, Ontario, Canada
- Paula Rowland: The Wilson Centre, Temerty Faculty of Medicine, Department of Occupational Therapy and Occupational Science, University of Toronto/University Health Network, Toronto, Ontario, Canada
- Stella Ng: The Wilson Centre, Department of Speech-Language Pathology, Temerty Faculty of Medicine, University of Toronto; Centre for Faculty Development, Unity Health Toronto, Toronto, Ontario, Canada
- Ayelet Kuper: The Wilson Centre, University Health Network/University of Toronto; Division of General Internal Medicine, Sunnybrook Health Sciences Centre; Department of Medicine, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Farah Friesen: Centre for Faculty Development, Temerty Faculty of Medicine, University of Toronto at Unity Health Toronto, Toronto, Ontario, Canada
- Katherine Shwetz: Department of English, University of Toronto, Toronto, Ontario, Canada
- Ryan Brydges: The Wilson Centre, Temerty Faculty of Medicine, Department of Medicine, Unity Health Toronto, University of Toronto, Toronto, Ontario, Canada

8. Pearce J, Tavares W. A philosophical history of programmatic assessment: tracing shifting configurations. Advances in Health Sciences Education 2021; 26:1291-1310. [PMID: 33893881] [DOI: 10.1007/s10459-021-10050-1]

Abstract
Programmatic assessment is now well entrenched in medical education, allowing us to reflect on when it first emerged and how it evolved into the form we know today. Drawing upon the intellectual tradition of historical epistemology, we provide a philosophically oriented historiographical study of programmatic assessment. Our goal is to trace its relatively short historical trajectory by describing shifting configurations in its scene of inquiry, focusing on questions, practices, and philosophical presuppositions. We identify three historical phases: emergence, evolution, and entrenchment. For each, we describe the configurations of the scene; examine the underlying philosophical presuppositions driving changes; and detail the upshots in assessment practice. We find that programmatic assessment emerged in response to positivist 'turmoil' prior to 2005, driven by utility considerations and implicit pragmatist undertones. Once introduced, it evolved with notions of diversity and learning being underscored, and a constructivist ontology developing at its core. More recently, programmatic assessment has become entrenched as its own sub-discipline. Rich narratives have been emphasised, but philosophical underpinnings have been blurred. We hope to shed new light on current assessment practices in the medical education community by interrogating the history of programmatic assessment from this philosophical vantage point. Making philosophical presuppositions explicit highlights the perspectival nature of aspects of programmatic assessment and suggests reasons for perceived benefits, as well as potential tensions, contradictions, and vulnerabilities in the approach today. We conclude by offering some reflections on important points to emerge from our historical study, and suggest 'what next' for programmatic assessment in light of this endeavour.

Affiliation(s)
- J Pearce: Tertiary Education (Assessment), Australian Council for Educational Research, 19 Prospect Hill Road, Camberwell, VIC 3124, Australia
- W Tavares: The Wilson Centre and Post-MD Education, University Health Network and University of Toronto, Toronto, ON, Canada

9. Ginsburg S, Watling CJ, Schumacher DJ, Gingerich A, Hatala R. Numbers Encapsulate, Words Elaborate: Toward the Best Use of Comments for Assessment and Feedback on Entrustment Ratings. Academic Medicine 2021; 96:S81-S86. [PMID: 34183607] [DOI: 10.1097/acm.0000000000004089]

Abstract
The adoption of entrustment ratings in medical education is based on a seemingly simple premise: to align workplace-based supervision with resident assessment. Yet it has been difficult to operationalize this concept. Entrustment rating forms combine numeric scales with comments and are embedded in a programmatic assessment framework, which encourages the collection of a large quantity of data. The implicit assumption that more is better has led to an untamable volume of data that competency committees must grapple with. In this article, the authors explore the roles of numbers and words on entrustment rating forms, examining the intended and optimal use(s) of each, with particular attention to the words. They also unpack the problematic issue of dual-purposing words for both assessment and feedback. Words have enormous potential to elaborate, to contextualize, and to instruct; to realize this potential, educators must be crystal clear about their use. The authors set forth a number of possible ways to reconcile these tensions by more explicitly aligning words to purpose. For example, educators could focus written comments solely on assessment; create assessment encounters distinct from feedback encounters; or use different words collected from the same encounter to serve distinct feedback and assessment purposes. Finally, the authors address the tyranny of documentation created by programmatic assessment and urge caution about yielding to the temptation to reduce words to numbers to make them manageable. Instead, they encourage educators to preserve some educational encounters purely for feedback, and to consider that not all words need to become data.

Affiliation(s)
- Shiphra Ginsburg: S. Ginsburg is professor of medicine, Department of Medicine, Sinai Health System and Faculty of Medicine, University of Toronto; scientist, Wilson Centre for Research in Education, University of Toronto, Toronto, Ontario, Canada; and Canada Research Chair in Health Professions Education; ORCID: http://orcid.org/0000-0002-4595-6650
- Christopher J Watling: C.J. Watling is professor and director, Centre for Education Research and Innovation, Schulich School of Medicine & Dentistry, Western University, London, Ontario, Canada; ORCID: https://orcid.org/0000-0001-9686-795X
- Daniel J Schumacher: D.J. Schumacher is associate professor of pediatrics, Cincinnati Children's Hospital Medical Center and University of Cincinnati College of Medicine, Cincinnati, Ohio; ORCID: https://orcid.org/0000-0001-5507-8452
- Andrea Gingerich: A. Gingerich is assistant professor, Northern Medical Program, University of Northern British Columbia, Prince George, British Columbia, Canada; ORCID: https://orcid.org/0000-0001-5765-3975
- Rose Hatala: R. Hatala is professor, Department of Medicine, and director, Clinical Educator Fellowship, Centre for Health Education Scholarship, University of British Columbia, Vancouver, British Columbia, Canada; ORCID: https://orcid.org/0000-0003-0521-2590

10. Zoanetti N, Pearce J. The potential use of Bayesian Networks to support committee decisions in programmatic assessment. Medical Education 2021; 55:808-817. [PMID: 33151589] [DOI: 10.1111/medu.14407]

Abstract
CONTEXT The benefits of programmatic assessment are well established. Evidence from multiple assessment formats is accumulated and triangulated to inform progression committee decisions. Committees face an ongoing challenge in ensuring consistency and fairness in programmatic deliberations. Traditional statistical and psychometric techniques are not well suited to aggregating different assessment formats accumulated over time. Some of the strengths of programmatic assessment are also vulnerabilities when viewed through this lens. While emphasis is often placed on data richness and the considered input of qualified experts, committees reasonably wish for practical, defensible solutions to these challenges. METHODS We draw upon existing literature regarding Bayesian Networks (BNs), noting their utility and application in educational systems. We provide illustrative examples of how they could potentially be used in contexts that embed programmatic principles. We show a simple BN for a knowledge domain before presenting a full-scale 'proof of concept' BN to support committee decisions. We zoom in on one 'node' to demonstrate the capacity to incorporate disparate evidence throughout the network. CONCLUSIONS Bayesian Networks offer an approach that is theoretically well supported for programmatic assessment. They can aid committees in managing evidence accumulation, help them make inferences under conditions of uncertainty, and buttress decisions by adding a layer of defensibility to the process. They are a pragmatic tool that adds value to the programmatic space by applying a complementary statistical framework. We see four major benefits of BNs in programmatic assessment: BNs allow committees to visually capture evidentiary arguments during decision-making; 'recommendations' from probabilistic pathways can be used by committees to confirm their qualitative judgments; BNs can ensure precedents are maintained and consistency occurs over time; and the imperative to capture data richness is maintained without resorting to questionable methodological strategies such as adding qualitatively different things together. Further research into their feasibility and robustness in practice is warranted.
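
To make the committee-decision idea concrete, below is a minimal hand-rolled sketch of the simplest possible such network: a single latent "competence" node whose posterior is updated by conditionally independent assessment results. The structure, prior, and conditional probabilities are all hypothetical; the paper's full 'proof of concept' network is far richer, and a practical implementation would more likely use a dedicated BN library.

```python
import numpy as np

# States of the latent node: index 0 = "not yet competent", 1 = "competent".
# The prior and the conditional probability tables below are purely hypothetical.
prior = np.array([0.2, 0.8])

# P(pass | competence state) for three assessment formats feeding the committee.
p_pass = {
    "MCQ":  np.array([0.35, 0.90]),
    "OSCE": np.array([0.25, 0.85]),
    "WBA":  np.array([0.30, 0.95]),
}

def posterior_competence(evidence):
    """P(competence | evidence), where evidence maps format -> True (pass) / False (fail)."""
    post = prior.astype(float)
    for fmt, passed in evidence.items():
        likelihood = p_pass[fmt] if passed else 1 - p_pass[fmt]
        post = post * likelihood   # results assumed conditionally independent given competence
    return post / post.sum()

# Triangulating mixed evidence: failed MCQ, strong OSCE and workplace-based assessment.
print(posterior_competence({"MCQ": False, "OSCE": True, "WBA": True}))
# -> approximately [0.13, 0.87]; the network quantifies how much the MCQ failure
#    should shift the committee's belief, rather than leaving it entirely to intuition.
```
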

Affiliation(s)
- Nathan Zoanetti: Psychometrics and Methodology, Australian Council for Educational Research, Camberwell, Victoria, Australia
- Jacob Pearce: Tertiary Education (Assessment), Australian Council for Educational Research, Camberwell, Victoria, Australia

11. On Educational Assessment Theory: A High-Level Discussion of Adolphe Quetelet, Platonism, and Ergodicity. Philosophies 2021. [DOI: 10.3390/philosophies6020046]

Abstract
Educational assessments, specifically standardized and normalized exams, owe most of their foundations to psychological test theory in psychometrics. While the theoretical assumptions of these practices are widespread and relatively uncontroversial in the testing community, at least two are philosophically and mathematically suspect and have troubling implications in education. Assumption 1 is that repeated assessment measures, combined into an arithmetic mean, represent some real, stable, quantitative psychological trait or ability plus some error. Assumption 2 is that aggregated, group-level educational data collected from assessments can be interpreted to make inferences about a given individual over time without explicit justification. It is argued that the former assumption cannot be taken for granted; it is also argued that, while it is typically attributed to 20th-century thought, the assumption in a rigorous form can be traced back at least to the 1830s via an unattractive Platonistic statistical thesis offered by one of the founders of the social sciences, the Belgian mathematician Adolphe Quetelet (1796-1874). While contemporary research has moved away from using his work directly, it is demonstrated that cognitive psychology still preserves assumption 1, even as it is increasingly challenged by current paradigms that cast human cognition as a dynamical, complex system. How to deal with assumption 1, and whether it is broadly justified, is left as an open question. It is then argued that assumption 2 is justified only when assessments have ergodic properties, a criterion rarely met in education; specifically, some forms of normalized standardized exams are intrinsically non-ergodic and should be regarded as invalid for saying much about individual students and their capability. The article closes with a call for the introduction of dynamical mathematics into educational assessment at a conceptual level (e.g., through Bayesian networks), for the critical analysis of several key psychological testing assumptions, and for the introduction of dynamical language into philosophical discourse. These prima facie distinct areas ought to inform one another more closely in educational studies.
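
The ergodicity argument can be illustrated with a small simulation, sketched below under invented distributions: when people differ in stable trait levels, the ensemble average (across students at one moment) and the time average (one student across occasions) estimate different quantities, so group-level assessment statistics need not describe any individual.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_times = 1000, 200

# Non-ergodic process: each person has a stable personal mean (the trait differs
# across people), plus moment-to-moment noise. All parameters are invented.
personal_mean = rng.normal(50, 10, size=n_people)        # between-person spread
scores = personal_mean[:, None] + rng.normal(0, 5, size=(n_people, n_times))

ensemble_avg = scores[:, 0].mean()   # average across people at one time point
time_avg_one = scores[0].mean()      # average across time for a single person

print(f"ensemble average ~ {ensemble_avg:.1f}")   # close to the population mean (50)
print(f"person 0's time average ~ {time_avg_one:.1f}, "
      f"personal mean = {personal_mean[0]:.1f}")  # converges to their own mean, not 50
```
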
12. Homer M. Re-conceptualising and accounting for examiner (cut-score) stringency in a 'high frequency, small cohort' performance test. Advances in Health Sciences Education 2021; 26:369-383. [PMID: 32876815] [PMCID: PMC8041694] [DOI: 10.1007/s10459-020-09990-x]

Abstract
Variation in examiner stringency is an ongoing problem in many performance settings, such as OSCEs, and is usually conceptualised and measured based on the scores/grades examiners award. Under borderline regression, the standard within a station is set using checklist/domain scores and global grades acting in combination. This complexity requires a more nuanced view of what stringency might mean when considering sources of variation in station cut-scores. This study uses data from 349 administrations of an 18-station, 36-candidate, single-circuit OSCE for international medical graduates wanting to practise in the UK (PLAB2). The station-level data were gathered over a 34-month period up to July 2019. Linear mixed models are used to estimate, and then separate out, examiner (n = 547), station (n = 330) and examination (n = 349) effects on borderline-regression cut-scores. Examiners are the largest source of variation, accounting for 56% of the variance in cut-scores, compared with 6% for station, less than 1% for exam, and 37% residual. Aggregating to the exam level tends to ameliorate this effect. For 96% of examinations, a 'fair' cut-score, which equalises out the variation in examiner stringency that candidates experience, is within one standard error of measurement (SEM) of the actual cut-score. The addition of the SEM to produce the final pass mark generally ensures that the public is protected from almost all false positives in the examination caused by examiner cut-score stringency acting in candidates' favour.
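
For readers unfamiliar with borderline regression, the sketch below shows how a station cut-score is derived by regressing checklist scores on global grades and reading off the fitted score at the borderline grade; the scores, the grades, and the 1-5 grade scale are invented for the example.

```python
import numpy as np

# Hypothetical station data: each candidate has a checklist score and a global
# grade (1 = fail, 2 = borderline, 3 = pass, 4 = good, 5 = excellent).
grades = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 5])
scores = np.array([8, 11, 12, 15, 14, 16, 18, 19, 22, 21])

# Borderline regression: fit score on grade, then take the fitted score at the
# borderline grade as the station cut-score.
slope, intercept = np.polyfit(grades, scores, 1)
BORDERLINE_GRADE = 2
cut_score = slope * BORDERLINE_GRADE + intercept
print(f"station cut-score = {cut_score:.2f}")
```

Examiner stringency, in this re-conceptualisation, is the variation in such fitted cut-scores attributable to the examiner, which the study isolates with linear mixed models.
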

Affiliation(s)
- Matt Homer: Leeds Institute of Medical Education, School of Medicine, University of Leeds, Leeds, LS2 9JT, UK

13. Boursicot K, Kemp S, Wilkinson T, Findyartini A, Canning C, Cilliers F, Fuller R. Performance assessment: Consensus statement and recommendations from the 2020 Ottawa Conference. Medical Teacher 2021; 43:58-67. [PMID: 33054524] [DOI: 10.1080/0142159x.2020.1830052]

Abstract
INTRODUCTION In 2011, the Consensus Statement on Performance Assessment was published in Medical Teacher. That paper was commissioned by AMEE (Association for Medical Education in Europe) as part of the series of Consensus Statements following the 2010 Ottawa Conference. In 2019, it was recommended that a working group be reconvened to review and consider developments in performance assessment since the 2011 publication. METHODS Following a review of the original recommendations in the 2011 paper and of shifts in the field across the past 10 years, the group identified areas of consensus and yet-to-be-resolved issues for performance assessment. RESULTS AND DISCUSSION This paper addresses developments in performance assessment since 2011, reiterates relevant aspects of the 2011 paper, and summarises contemporary best-practice recommendations for OSCEs and WBAs as fit-for-purpose methods for performance assessment in the health professions.
Collapse
Affiliation(s)
- Katharine Boursicot
- Department of Assessment and Progression, Duke-National University of Singapore, Singapore, Singapore
| | - Sandra Kemp
- Curtin Medical School, Curtin University, Perth, Australia
| | - Tim Wilkinson
- Dean's Department, University of Otago, Christchurch, New Zealand
| | - Ardi Findyartini
- Department of Medical Education, Universitas Indonesia, Jakarta, Indonesia
| | - Claire Canning
- Department of Assessment and Progression, Duke-National University of Singapore, Singapore, Singapore
| | - Francois Cilliers
- Department of Health Sciences Education, University of Cape Town, Cape Town, South Africa
| | | |
Collapse
|