1
Carter TM, Sun T, Jones A, Smith BK. A study of internal structure validity for the American Board of Surgery In-Training Examination. Am J Surg 2025; 242:116184. [PMID: 39826310 DOI: 10.1016/j.amjsurg.2025.116184]
Abstract
BACKGROUND As principles of competency-based medical education are implemented in graduate medical education (GME) programs, it is imperative that the assessments employed are reliable and valid. Internal structure is a core component of validity evidence that has been understudied. In this study, we examined elements of the internal structure of the American Board of Surgery In-Training Examination (ABSITE). METHODS This national retrospective cohort study used performance data for general surgery residents from 2018 to 2023 to examine aspects of the internal structure of the ABSITE. The item difficulty, item discrimination, internal consistency, and dimensionality of the exam were calculated. RESULTS A total of 55,986 unique test results were included in the study. The ABSITE exhibits strong reliability (Cronbach's alpha >0.9) and appropriate item difficulty (0.66) and item discrimination (index >0.2) for a formative examination. Results of the exploratory factor analysis reveal that the ABSITE is unidimensional. CONCLUSIONS The ABSITE is a unidimensional examination with strong reliability. Furthermore, the item difficulty and item discrimination levels of the exam are appropriate for a formative test. Future studies using consensus methods could determine which specific construct the ABSITE measures.
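For readers unfamiliar with the classical test theory indices this abstract reports, the sketch below shows how item difficulty, corrected item-total discrimination, and Cronbach's alpha are conventionally computed from a scored response matrix. The data are simulated under a one-parameter response model and the code is illustrative only, not the authors' analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_examinees, n_items = 500, 40
theta = rng.normal(0, 1, (n_examinees, 1))      # simulated examinee ability
b = rng.normal(-0.8, 1, (1, n_items))           # simulated item difficulty
p_correct = 1 / (1 + np.exp(-(theta - b)))      # 1-PL response probabilities
X = (rng.random((n_examinees, n_items)) < p_correct).astype(float)

# Item difficulty: proportion of examinees answering each item correctly.
difficulty = X.mean(axis=0)

# Item discrimination: corrected item-total (point-biserial) correlation,
# i.e., each item against the total score excluding that item.
total = X.sum(axis=1)
discrimination = np.array(
    [np.corrcoef(X[:, j], total - X[:, j])[0, 1] for j in range(n_items)]
)

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance).
k = n_items
alpha = k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum() / total.var(ddof=1))

print(f"{difficulty.mean():.2f} {discrimination.mean():.2f} {alpha:.2f}")
```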
Affiliation(s)
- Taylor M Carter
- Office of Surgical Education, Department of Surgery, University of Utah School of Medicine, Salt Lake City, UT, USA; Department of Surgery, University of North Carolina School of Medicine, Chapel Hill, NC, USA
- Ting Sun
- Office of Surgical Education, Department of Surgery, University of Utah School of Medicine, Salt Lake City, UT, USA
- Brigitte K Smith
- Division of Vascular Surgery, Department of Surgery, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA
2
Seeger P, Kaldis N, Nickel F, Hackert T, Lykoudis PM, Giannou AD. Surgical training simulation modalities in minimally invasive surgery: How to achieve evidence-based curricula by translational research. Am J Surg 2025; 242:116197. [PMID: 39889386 DOI: 10.1016/j.amjsurg.2025.116197]
Abstract
BACKGROUND Surgery has evolved from a hands-on discipline, in which skills were acquired via the "learning by doing" principle, to a surgical science with attention to patient safety, health care effectiveness, and evidence-based research. A variety of simulation modalities have been developed to meet the need for effective resident training. To date, research on surgical training for minimally invasive surgery has been extensive but heterogeneous in its grade of evidence. METHODS A literature search of PubMed, using a variety of terms and covering English-language articles up to October 2024, was conducted to summarize current knowledge about simulation training and to guide research towards evidence-based curricula with translational effects. Results are presented as a structured narrative review. RESULTS For virtual reality simulators, there is sound evidence of effective training outcomes. The instruments required to develop minimally invasive surgery curricula that combine different simulation modalities to create a clinical benefit are known and published. CONCLUSION Surgeons are the main creators of minimally invasive surgery training curricula and often follow a hands-on oriented approach that leaves out equally important aspects of assessment, evaluation, and feedback. Further high-quality research that incorporates the available evidence in this field promises to improve patient safety in surgical disciplines.
Affiliation(s)
- Philipp Seeger
- Department of General, Visceral and Thoracic Surgery, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Nikolaos Kaldis
- 3rd Department of Surgery, Attiko University Hospital, National and Kapodistrian University of Athens, Athens, Greece
- Felix Nickel
- Department of General, Visceral and Thoracic Surgery, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Thilo Hackert
- Department of General, Visceral and Thoracic Surgery, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Panagis M Lykoudis
- 3rd Department of Surgery, Attiko University Hospital, National and Kapodistrian University of Athens, Athens, Greece; Division of Surgery and Interventional Science, University College London (UCL), London, UK
- Anastasios D Giannou
- Department of General, Visceral and Thoracic Surgery, University Medical Center Hamburg-Eppendorf, Hamburg, Germany; Section of Molecular Immunology and Gastroenterology, I. Department of Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
3
Wong LY, Lam N, Son YA, Eddington H, Arnow KD, Tsai J, Anand A, Peralta FC, Shields S, Melcer EF, Lin DT, Liebert CA. Correlation of Performance on the ENTRUST Assessment Platform With Other Variables in Competency-Based Surgical Education. J Surg Educ 2025; 82:103293. [PMID: 39532037 DOI: 10.1016/j.jsurg.2024.09.012]
Abstract
OBJECTIVE With the implementation of American Board of Surgery (ABS) Entrustable Professional Activities (EPAs), there is a continued need for objective, evidence-based assessment tools to augment existing microassessments and inform readiness for entrustment. The ENTRUST Assessment Platform is an online virtual-patient simulation platform for assessing trainees' surgical decision-making competence across the preoperative, intraoperative, and postoperative phases of care. This study collects additional validity evidence for the ENTRUST platform in its relationship to other established variables in competency-based surgical education. DESIGN This is a prospective analysis of surgical resident performance on the ENTRUST Right Lower Quadrant (RLQ) Pain/Appendicitis EPA Assessment. ENTRUST scores were analyzed by PGY level, and correlations with Accreditation Council for Graduate Medical Education (ACGME) Case Logs, ACGME Surgery Milestones, and ABS In-Training Examination (ABSITE) scores were evaluated. Bivariate analyses were performed using Spearman rank correlations. SETTING This study was conducted at a tertiary academic center (Stanford University, Palo Alto, CA) in a proctored exam setting. PARTICIPANTS Thirty-two PGY-1 through PGY-5 general surgery residents completed the ENTRUST RLQ Pain/Appendicitis EPA Assessment, which contained four case scenarios that were iteratively developed, scored by expert consensus, and aligned with ABS EPA definitions. RESULTS ENTRUST grand total score was positively correlated with PGY level (rho = 0.57, p = 0.001), ACGME appendectomy case log volume (rho = 0.55, p = 0.002), and ABSITE raw score (rho = 0.66, p = 0.0004). ENTRUST performance was significantly correlated with all eighteen ACGME Surgery Milestones (rho = 0.43 to rho = 0.54, all p ≤ 0.01), with the strongest correlation seen for PC1 (Patient Evaluation and Decision Making) (rho = 0.54, p = 0.006). CONCLUSIONS Performance on ENTRUST was significantly correlated with established variables in surgical training, including ACGME appendectomy case logs, ABSITE scores, and ACGME Surgery Milestones. This study strengthens the existing validity evidence for the ENTRUST Assessment Platform as an objective assessment of clinical decision-making. ENTRUST can augment microassessments and support competency-based medical education.
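As a worked sketch of the bivariate analysis described above, the snippet below computes a Spearman rank correlation between an assessment score and PGY level. The data and variable names are fabricated for illustration; this is not the study's code or dataset.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
pgy = rng.integers(1, 6, size=32)                   # PGY-1 through PGY-5
score = 50 + 8 * pgy + rng.normal(0, 10, size=32)   # hypothetical ENTRUST-style score

# Spearman's rho is rank-based, so it tolerates non-normal, ordinal data
# such as training level, which is why it suits analyses like this one.
rho, p = spearmanr(score, pgy)
print(f"rho = {rho:.2f}, p = {p:.4f}")
```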
Affiliation(s)
- Lye-Yeng Wong
- Department of Cardiothoracic Surgery, Stanford University School of Medicine, Palo Alto, California 94305
- Nathan Lam
- Department of Surgery, Stanford University School of Medicine, Palo Alto, California 94305
- Yewon Ashley Son
- Stanford-Surgery Policy Improvement Research and Education Center (S-SPIRE), Department of Surgery, Stanford University School of Medicine, Palo Alto, California 94304
- Hyrum Eddington
- Stanford-Surgery Policy Improvement Research and Education Center (S-SPIRE), Department of Surgery, Stanford University School of Medicine, Palo Alto, California 94304
- Katherine D Arnow
- Stanford-Surgery Policy Improvement Research and Education Center (S-SPIRE), Department of Surgery, Stanford University School of Medicine, Palo Alto, California 94304
- Jason Tsai
- Department of Surgery, Stanford University School of Medicine, Palo Alto, California 94305
- Ananya Anand
- Department of Surgery, Stanford University School of Medicine, Palo Alto, California 94305
- Fatyma C Peralta
- Department of Computational Media, Baskin School of Engineering, University of California-Santa Cruz, Santa Cruz, California 95064
- Samuel Shields
- Department of Computational Media, Baskin School of Engineering, University of California-Santa Cruz, Santa Cruz, California 95064
- Edward F Melcer
- Department of Computational Media, Baskin School of Engineering, University of California-Santa Cruz, Santa Cruz, California 95064
- Dana T Lin
- Department of Surgery, Stanford University School of Medicine, Palo Alto, California 94305
- Cara A Liebert
- Department of Surgery, Stanford University School of Medicine, Palo Alto, California 94305; Surgical Services, VA Palo Alto Health Care System, Palo Alto, California 94304
4
Taha MH, Mohammed HEEG, Abdalla ME, Yusoff MSB, Mohd Napiah MK, Wadi MM. The pattern of reporting and presenting validity evidence of extended matching questions (EMQs) in health professions education: a systematic review. Med Educ Online 2024; 29:2412392. [PMID: 39445670 PMCID: PMC11504699 DOI: 10.1080/10872981.2024.2412392]
Abstract
Extended matching questions (EMQs), or R-type questions, are a selected-response format. Validity evidence for this format is crucial, but misunderstandings about validity have been reported, and it remains unclear what kinds of evidence should be presented, and how, to support the format's educational impact. This review explores the pattern and quality of reporting of the sources of validity evidence for EMQs in health professions education, encompassing content, response process, internal structure, relationships to other variables, and consequences. A systematic search of electronic databases, including MEDLINE (via PubMed), Scopus, Web of Science, CINAHL, and ERIC, was conducted to identify studies that use EMQs. The framework of a unitary concept of validity was applied to extract data. A total of 218 titles were initially selected; 19 were included in the final analysis. The most frequently reported evidence was the reliability coefficient, followed by relationships to other variables. Additionally, the definition of validity adopted was mostly the old tripartite concept. This study found that the reporting and presentation of validity evidence appeared deficient: the available evidence can hardly support a strong validity argument for the educational impact of EMQs. This review calls for more work on developing a tool to measure the reporting and presentation of validity evidence.
Affiliation(s)
- Mohamed H. Taha
- College of Medicine and Medical Education Centre, University of Sharjah, Sharjah, United Arab Emirates
- Muhamad Saiful Bahri Yusoff
- Medical Education Department, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian, Malaysia
- Majed M. Wadi
- Medical Education Department, College of Medicine, Qassim University, Qassim, Saudi Arabia
5
Smith SE, McColgan-Smith S, Stewart F, Mardon J, Tallentire VR. Beyond reliability: assessing rater competence when using a behavioural marker system. Adv Simul (Lond) 2024; 9:55. [PMID: 39736776 DOI: 10.1186/s41077-024-00329-9]
Abstract
BACKGROUND Behavioural marker systems are used across several healthcare disciplines to assess behavioural (non-technical) skills, but rater training is variable, and inter-rater reliability is generally poor. Inter-rater reliability provides data about the tool, but not about the competence of individual raters. This study aimed to test the inter-rater reliability of a new behavioural marker system (PhaBS, pharmacists' behavioural skills) with clinically experienced faculty raters and near-peer raters. It also aimed to assess rater competence when using PhaBS after brief familiarisation, by assessing completeness, agreement with an expert rater, ability to rank performance, stringency or leniency, and avoidance of the halo effect. METHODS Clinically experienced faculty raters and near-peer raters attended a 30-minute PhaBS familiarisation session. This was immediately followed by a marking session in which they rated a trainee pharmacist's behavioural skills in three scripted immersive acute care simulated scenarios, demonstrating good, mediocre, and poor performances respectively. Inter-rater reliability in each group was calculated using the two-way random, absolute-agreement, single-measures intraclass correlation coefficient (ICC). Differences in individual rater competence in each domain were compared using Pearson's chi-squared test. RESULTS The ICC for experienced faculty raters was good at 0.60 (0.48-0.72) and for near-peer raters was poor at 0.38 (0.27-0.54). Of the experienced faculty raters, 5/9 were competent in all domains versus 2/13 near-peer raters (difference not statistically significant). There was no statistically significant difference between clinically experienced and near-peer raters in agreement with an expert rater, ability to rank performance, stringency or leniency, or avoidance of the halo effect. The only statistically significant difference between groups was the ability to complete the assessment (9/9 experienced faculty raters versus 6/13 near-peer raters, p = 0.0077). CONCLUSIONS Experienced faculty have acceptable inter-rater reliability when using PhaBS, consistent with other behavioural marker systems; however, not all raters are competent. Competence measures from other assessments can usefully be applied to behavioural marker systems. When using behavioural marker systems for assessment, educators should adopt such rater competence frameworks. This is important to ensure fair and accurate assessments for learners, to provide educators with information about rater training programmes, and to provide individual raters with meaningful feedback.
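The reliability statistic named above, a two-way random-effects, absolute-agreement, single-measures ICC, is often written ICC(2,1). The sketch below computes it with the pingouin library on fabricated ratings (three raters scoring five scripted performances); pingouin's "ICC2" row corresponds to this form.

```python
import pandas as pd
import pingouin as pg

# Long format: each rater scores each scripted performance.
df = pd.DataFrame({
    "performance": sorted([1, 2, 3, 4, 5] * 3),
    "rater":       ["A", "B", "C"] * 5,
    "score":       [34, 31, 33, 22, 20, 25, 12, 14, 11, 28, 27, 30, 18, 16, 19],
})

icc = pg.intraclass_corr(data=df, targets="performance",
                         raters="rater", ratings="score")
# ICC2 = two-way random effects, absolute agreement, single measures.
print(icc[icc["Type"] == "ICC2"])
```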
Affiliation(s)
- Julie Mardon
- Scottish Centre for Simulation and Clinical Human Factors, NHS Forth Valley, Larbert, UK
6
Kinnear B, Schumacher DJ, Varpio L, Driessen EW, Konopasky A. Legitimation Without Argumentation: An Empirical Discourse Analysis of 'Validity as an Argument' in Assessment. Perspect Med Educ 2024; 13:469-480. [PMID: 39372230 PMCID: PMC11451546 DOI: 10.5334/pme.1404]
Abstract
Introduction Validity is frequently conceptualized in health professions education (HPE) assessment as an argument that supports the interpretation and uses of assessment data. However, previous work has shown that many validity scholars believe argument and argumentation are relatively lacking in HPE. To better understand HPE's discourse around argument and argumentation with regard to assessment validity, the authors explored the discourses present in published HPE manuscripts. Methods The authors used a bricolage of critical discourse analysis approaches to understand how the language of influential peer-reviewed manuscripts has shaped HPE's understanding of validity arguments and argumentation. They used multiple search strategies to develop a final corpus of 39 manuscripts seen as influential in how validity arguments are conceptualized within HPE. An analytic framework drawing on prior research on Argumentation Theory was used to code the manuscripts before developing themes relevant to the research question. Results The authors found that the elaboration of argument and argumentation within HPE's validity discourse is scant, with few components of Argumentation Theory (such as intended audience) present in the discourse. The validity-as-an-argument discourse was legitimized via authorization (reference to authority), rationalization (reference to institutionalized action), and mythopoesis (narrative building). This legitimation has cemented the validity-as-an-argument discourse in HPE despite minimal exploration of what argument and argumentation are. Discussion This study corroborates previous work showing the dearth of argument and argumentation within HPE's validity discourse. An opportunity exists to use Argumentation Theory in HPE to develop validation practices that better support the use of argument.
Affiliation(s)
- Benjamin Kinnear
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Daniel J. Schumacher
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center/University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Lara Varpio
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, USA
- Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Erik W. Driessen
- School of Health Professions Education (SHE), Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, NL
- Abigail Konopasky
- Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
7
Pollok F, Lund SB, Traynor MD, Alva-Ruiz R, MacArthur TA, Watkins RD, Mahony CR, Woerster M, Yeh VJH, Matovu A, Clarke DL, Laack TA, Rivera M. Systematic Review of Procedural Skill Simulation in Health Care in Low- and Middle-Income Countries. Simul Healthc 2024; 19:309-318. [PMID: 37440427 DOI: 10.1097/sih.0000000000000737]
Abstract
SUMMARY STATEMENT Low- and middle-income countries (LMICs) have adopted procedural skill simulation, and researchers are increasingly investigating simulation efforts in resource-strained settings. We aim to summarize the current state of procedural skill simulation research in LMICs, focusing on methodology, clinical area, types of outcomes, cost and cost-effectiveness, and overall sustainability. We performed a comprehensive literature review of original articles that assessed procedural skill simulation from database inception until April 2022. From 5371 screened articles, 262 were included in this review. All included studies were in English. Most were observational cohort studies (72.9%), and the most common focus was obstetrics and neonatal medicine (32.4%). The most frequently measured outcome was the process of task performance (56.5%). Several studies mentioned cost (38.9%) or sustainability (29.8%), but few included actual monetary cost information (11.1%), and only one assessed cost-effectiveness. Based on our review, future research on procedural skill simulation in LMICs should focus on more rigorous methodology, cost assessment, and less studied clinical areas.
Affiliation(s)
- Franziska Pollok
- From the Multidisciplinary Simulation Center (F.P., S.B.L., M.W., T.A.L.), Mayo Clinic, Rochester, MN; Department for Anesthesiology (F.P., M.W.), University Hospital Hamburg-Eppendorf, Hamburg, Germany; Department of Surgery (S.B.L., M.D.T., R.A.-R., T.A.M., R.D.W., C.R.M., V.J.-H.Y., M.R.), Mayo Clinic, Rochester, MN; Department of Surgery (A.M.), Mubende Regional Referral Hospital, Mubende, Uganda; Department of Molecular Medicine and Surgery (A.M.), Karolinska Institutet, Sweden; University of KwaZulu Natal, Pietermaritzburg (D.L.C.), KwaZulu Natal, South Africa; University of Witwatersrand, Johannesburg (D.L.C.), Gauteng, South Africa; and Department of Emergency Medicine (T.A.L.), Mayo Clinic, Rochester, MN
8
Kinnear B, St-Onge C, Schumacher DJ, Marceau M, Naidu T. Validity in the Next Era of Assessment: Consequences, Social Impact, and Equity. Perspect Med Educ 2024; 13:452-459. [PMID: 39280703 PMCID: PMC11396166 DOI: 10.5334/pme.1150]
Abstract
Validity has long held a venerated place in education, leading some authors to refer to it as the "sine qua non" or "cardinal virtue" of assessment. And yet, validity has not held a fixed meaning; rather, its definition and scope have shifted over time. In this Eye Opener, the authors explore if and how current conceptualizations of validity fit a next era of assessment that prioritizes patient care and learner equity. They posit that health professions education's conceptualization of validity will change in three related but distinct ways. First, consequences of assessment decisions will play a central role in validity arguments. Second, validity evidence regarding the impacts of assessment on patients and society will be prioritized. Third, equity will be seen as part of validity rather than as an unrelated concept. The authors argue that health professions education has the agency to change its ideology around validity and to align with the values that will predominate in the next era of assessment, such as high-quality care and equity for learners and patients.
Affiliation(s)
- Benjamin Kinnear
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Christina St-Onge
- Department of Medicine, Researcher at the Center for Health Sciences Pedagogy, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Daniel J Schumacher
- Department of Pediatrics, University of Cincinnati College of Medicine/Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Mélanie Marceau
- School of Nursing, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Thirusha Naidu
- Department of Innovation in Medical Education, Faculty of Medicine, University of Ottawa, Canada
- Department of Psychiatry, University of KwaZulu-Natal, South Africa
9
Wespi R, Schwendimann L, Neher A, Birrenbach T, Schauber SK, Manser T, Sauter TC, Kämmer JE. TEAMs go VR: validating the TEAM in a virtual reality (VR) medical team training. Adv Simul (Lond) 2024; 9:38. [PMID: 39261889 PMCID: PMC11389291 DOI: 10.1186/s41077-024-00309-z]
Abstract
BACKGROUND Inadequate collaboration in healthcare can lead to medical errors, highlighting the importance of interdisciplinary teamwork training. Virtual reality (VR) simulation-based training presents a promising, cost-effective approach. This study evaluates the suitability of the Team Emergency Assessment Measure (TEAM) for assessing healthcare student teams in VR environments, with the aim of improving training methodologies. METHODS Forty-two medical and nursing students participated in a VR-based neurological emergency scenario as part of an interprofessional team training program. Their performance was assessed by two trained coders using a modified TEAM tool. The reliability, internal consistency, and concurrent validity of the tool were evaluated using intraclass correlation coefficients (ICCs) and Cronbach's alpha. RESULTS Rater agreement on TEAM's leadership, teamwork, and task management domains was high, with ICC values between 0.75 and 0.90. Leadership demonstrated strong internal consistency (Cronbach's alpha = 0.90), while teamwork and task management showed moderate to acceptable consistency (alpha = 0.78 and 0.72, respectively). Overall, the TEAM tool exhibited high internal consistency (alpha = 0.89) and strong concurrent validity, with significant correlations with global performance ratings. CONCLUSION The TEAM tool proved to be a reliable and valid instrument for evaluating team dynamics in VR-based training scenarios. This study highlights VR's potential to enhance medical education, especially in remote or distance learning contexts, and demonstrates a dependable approach to team performance assessment in VR-based training. These findings pave the way for more effective, accessible interdisciplinary team assessments, contributing to the advancement of medical education.
Affiliation(s)
- Rafael Wespi
- Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Graduate School for Health Sciences, University of Bern, Bern, Switzerland
- Lukas Schwendimann
- Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Andrea Neher
- Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Graduate School for Health Sciences, University of Bern, Bern, Switzerland
- Tanja Birrenbach
- Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Stefan K Schauber
- Centre for Educational Measurement (CEMO) & Unit for Health Sciences Education, University of Oslo, Oslo, Norway
- Tanja Manser
- FHNW School of Applied Psychology, University of Applied Sciences and Arts Northwestern Switzerland, Olten, Switzerland
- Division of Anesthesiology and Intensive Care, Department of Clinical Sciences, Intervention and Technology, Karolinska Institutet, Huddinge, Sweden
- Thomas C Sauter
- Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Juliane E Kämmer
- Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Department of Social and Communication Psychology, University of Göttingen, Göttingen, Germany
10
Cook DA, Stephenson CR. Validation of the Learner Engagement Instrument for Continuing Professional Development. Acad Med 2024; 99:1024-1031. [PMID: 38683885 DOI: 10.1097/acm.0000000000005749]
Abstract
PURPOSE Learner engagement is the energy learners exert to remain focused and motivated to learn. The Learner Engagement Instrument (LEI) was developed to measure learner engagement in a short continuing professional development (CPD) activity. The authors validated LEI scores using validity evidence of internal structure and relationships with other variables. METHOD Participants attended 1 of 4 CPD courses (1 in-person, 2 online livestreamed, and 1 either in-person or livestreamed) in 2018, 2020, 2021, and 2022. Confirmatory factor analysis was used to examine model fit for several alternative structural models, separately for each course. The authors also conducted a generalizability study to estimate score reliability. Associations between LEI scores and Continuing Medical Education Teaching Effectiveness (CMETE) scores and participant demographics were evaluated. Statistical methods accounted for repeated measures by participants. RESULTS Four hundred fifteen unique participants attended 203 different CPD presentations and completed the LEI 11,567 times. The originally hypothesized 4-domain model of learner engagement (domains: emotional, behavioral, cognitive in-class, cognitive out-of-class) demonstrated the best model fit in all 4 courses, with comparative fit index ≥ 0.99, standardized root mean square residual ≤ 0.031, and root mean square error of approximation ≤ 0.047. Reliability for the overall scores and domain scores was acceptable (50-rater G-coefficient ≥ 0.74) except for the cognitive in-class domain (50-rater G-coefficient of 0.55 to 0.66). Findings were similar for the in-person and online delivery modalities. LEI scores correlated with teaching effectiveness (rho = 0.58), and a small correlation was found with participant age (rho = 0.19); other associations were small and not statistically significant. Using these findings, the authors generated a shortened 4-item instrument, the LEI Short Form. CONCLUSIONS This study confirms a 4-domain model of learner engagement and provides validity evidence supporting the use of LEI scores to measure learner engagement in both in-person and livestreamed CPD activities.
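A minimal sketch of the kind of confirmatory factor analysis described above, a four-factor model fit with the semopy library on simulated item responses, follows. Item names, loadings, indicator counts, and sample size are all assumptions for illustration; the real LEI items and data are not reproduced here.

```python
import numpy as np
import pandas as pd
from semopy import Model, calc_stats

# Simulate 400 respondents: 4 correlated-enough latent factors,
# each measured by 3 noisy indicators (all names are hypothetical).
rng = np.random.default_rng(2)
n = 400
latent = rng.normal(size=(n, 4))
cols = {}
for k, stem in enumerate(["emo", "beh", "cogin", "cogout"]):
    for j in (1, 2, 3):
        cols[f"{stem}{j}"] = latent[:, k] + rng.normal(0, 0.7, n)
data = pd.DataFrame(cols)

# Measurement model mirroring the abstract's four engagement domains.
desc = """
emotional  =~ emo1 + emo2 + emo3
behavioral =~ beh1 + beh2 + beh3
cog_in     =~ cogin1 + cogin2 + cogin3
cog_out    =~ cogout1 + cogout2 + cogout3
"""

model = Model(desc)
model.fit(data)
print(calc_stats(model)[["CFI", "TLI", "RMSEA"]])  # compare to reported cutoffs
```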
11
Amadou C, Veil R, Blanié A, Nicaise C, Rouquette A, Gajdos V. Variance due to the examination conditions and factors associated with success in objective structured clinical examinations (OSCEs): first experiences at Paris-Saclay medical school. BMC Med Educ 2024; 24:716. [PMID: 38956577 PMCID: PMC11221172 DOI: 10.1186/s12909-024-05688-5]
Abstract
BACKGROUND We aimed to measure the variance attributable to examination conditions during the first sessions of objective structured clinical examinations (OSCEs) performed at a French medical school and to identify factors associated with student success. METHODS We conducted a retrospective, observational study using data from the first three OSCE sessions performed at Paris-Saclay medical school in 2021 and 2022. For all sessions (each organized in 5 parallel circuits), we tested a circuit effect using a linear mixed-effects model adjusted for sex and the average academic level of students (according to written tests). We then studied the factors associated with student success at a station using a multivariate linear mixed-effects model including the characteristics of students, assessors, and standardized patients. RESULTS The study included three OSCE sessions, with 122, 175, and 197 students and mean (±SD) session scores of 13.7 (±1.5)/20, 12.7 (±1.7)/20, and 12.7 (±1.9)/20, respectively. The percentage of variance due to the circuit was 6.5%, 18.2% (statistically significant), and 3.8%, respectively. For all sessions, the student's average academic level and the station scenario were significantly associated with the score obtained at a station, but specific characteristics of assessors or standardized patients were associated with the student's score only in April 2021 (the first session). CONCLUSION The percentage of variance in students' performance due to examination conditions was significant in one of the three first OSCE sessions performed at Paris-Saclay medical school. This result seems related to individual behaviors rather than to specific characteristics of assessors or standardized patients, highlighting the need to continue training teaching teams. NATIONAL CLINICAL TRIAL NUMBER Not applicable.
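The "percentage of variance due to the circuit" reported above is a variance-partitioning quantity. The sketch below shows one assumed way to estimate it, a random-intercept mixed model fit with statsmodels on fabricated OSCE scores; it is an illustration of the technique, not the study's adjusted model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_circuits, n_students = 5, 35
circuit_effect = rng.normal(0, 0.8, n_circuits)    # hypothetical circuit effects
rows = [
    {"circuit": c, "score": 13 + circuit_effect[c] + rng.normal(0, 1.7)}
    for c in range(n_circuits) for _ in range(n_students)
]
df = pd.DataFrame(rows)

# Random intercept per circuit; fixed part is just the grand mean here.
m = smf.mixedlm("score ~ 1", df, groups=df["circuit"]).fit()
var_circuit = float(m.cov_re.iloc[0, 0])           # between-circuit variance
var_resid = float(m.scale)                         # residual (within) variance
pct = 100 * var_circuit / (var_circuit + var_resid)
print(f"variance due to circuit: {pct:.1f}%")
```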
Affiliation(s)
- Coralie Amadou
- Paris-Saclay Medical School, Kremlin-Bicêtre, France
- Department of Endocrinology and Diabetology, Hôpital Sud-Francilien, Corbeil-Essonnes, France
- Department of Public Health and Epidemiology, Hôpital Bicêtre, Assistance Publique Hôpitaux de Paris, Le Kremlin Bicêtre, France
- Raphael Veil
- Paris-Saclay Medical School, Kremlin-Bicêtre, France
- Department of Public Health and Epidemiology, Hôpital Bicêtre, Assistance Publique Hôpitaux de Paris, Le Kremlin Bicêtre, France
- Paris-Saclay University, UVSQ, Inserm, CESP, Paris, France
- Antonia Blanié
- Paris-Saclay Medical School, Kremlin-Bicêtre, France
- General Pediatrics Department, Hôpital Antoine Beclère, Assistance Publique Hôpitaux de Paris, Clamart, France
- Département d'Anesthésie-Réanimation-Médecine PériOpératoire, Hôpital Bicêtre, Assistance Publique Hôpitaux de Paris, Le Kremlin Bicêtre, France
- Alexandra Rouquette
- Paris-Saclay Medical School, Kremlin-Bicêtre, France
- Department of Public Health and Epidemiology, Hôpital Bicêtre, Assistance Publique Hôpitaux de Paris, Le Kremlin Bicêtre, France
- Paris-Saclay University, UVSQ, Inserm, CESP, Paris, France
- Vincent Gajdos
- Paris-Saclay Medical School, Kremlin-Bicêtre, France
- Department of Public Health and Epidemiology, Hôpital Bicêtre, Assistance Publique Hôpitaux de Paris, Le Kremlin Bicêtre, France
- Department of Diabetes and Endocrinology, Sud-Francilien Hospital, Paris-Saclay University, Corbeil-Essonnes, France
12
Lin J, Rooney DM, Yang SC, Antonoff M, Jaklitsch MT, Pickens A, Ha JS, Sudarshan M, Bribriesco A, Zapata D, Weiss K, Johnson C, Hennigar D, Orringer MB. Multi-institutional beta testing of a novel cervical esophagogastric anastomosis simulator. JTCVS Tech 2024; 25:254-263. [PMID: 38899103 PMCID: PMC11184443 DOI: 10.1016/j.xjtc.2024.01.028]
Abstract
Objective A novel simulator developed to offer hands-on practice for the stapled side-to-side cervical esophagogastric anastomosis was previously tested in a single-center study that supported its value in surgical education. This multi-institutional trial was undertaken to evaluate validity evidence from 6 independent thoracic surgery residency programs. Methods After a virtual session for simulation leaders, learners viewed a narrated video of the procedure and then alternated as surgeon or first assistant. Perceived value was measured with an online survey across fidelity domains: physical attributes, realism of materials, realism of experience, value, and relevance. Objective assessment included time, number of sutures tearing, bubble test, and direct inspection. Comparison across programs was performed using the Kruskal-Wallis test. Results Surveys were completed by 63 participants in the surgeon role (17 junior and 20 senior residents, 18 fellows, and 8 faculty). For 3 of 5 tasks, mean ratings of 4.35 to 4.44 corresponded to "somewhat easy" to "very easy" to perform. The interrupted outer layer of the anastomosis rated lowest, suggesting this task was the most difficult. The simulator was rated as a highly valuable training tool. Among the objective measurements of performance, "direct inspection" rated highest, followed by "time." A total of 90.5% of participants rated the simulator as ready for use with only minor improvements. Conclusions Results from this multi-institutional study suggest the cervical esophagogastric anastomosis simulator is a useful adjunct for training and assessment. Further research is needed to determine its value in assessing competence for independent operating and to examine associations between improved measured performance and clinical outcomes.
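The across-program comparison named above used the Kruskal-Wallis test, which generalizes the rank-based Mann-Whitney test to three or more groups. A minimal sketch with scipy on fabricated per-program ratings:

```python
from scipy.stats import kruskal

# Hypothetical fidelity ratings (1-5 Likert) from three residency programs.
program_a = [4, 5, 4, 4, 3, 5]
program_b = [4, 4, 3, 4, 4, 4]
program_c = [5, 5, 4, 5, 4, 5]

# Rank-based one-way test: no normality assumption, suitable for ordinal data.
h, p = kruskal(program_a, program_b, program_c)
print(f"H = {h:.2f}, p = {p:.3f}")
```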
Affiliation(s)
- Jules Lin
- Section of Thoracic Surgery, University of Michigan Medical School, Ann Arbor, Mich
- Deborah M. Rooney
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Mich
- Stephen C. Yang
- Division of Thoracic Surgery, Department of Surgery, The Johns Hopkins Medical Institutions, Baltimore, Md
- Mara Antonoff
- Department of Thoracic and Cardiovascular Surgery, MD Anderson Cancer Center, Houston, Tex
- Allan Pickens
- Department of Thoracic Surgery, Vanderbilt University Hospital, Nashville, Tenn
- Jinny S. Ha
- Division of Thoracic Surgery, Department of Surgery, The Johns Hopkins Medical Institutions, Baltimore, Md
- David Zapata
- Division of Cardiothoracic Surgery, University of Maryland, Baltimore, Md
- Kathleen Weiss
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Mass
- Christopher Johnson
- Division of Thoracic Surgery, Department of Surgery, The Johns Hopkins Medical Institutions, Baltimore, Md
- Mark B. Orringer
- Section of Thoracic Surgery, University of Michigan Medical School, Ann Arbor, Mich
13
Sample SH, Artemiou E, Donszelmann DJ, Adams C. Third Year Veterinary Student Academic Encumbrances and Tenacity: Navigating Clinical Skills Curricula and Assessment. J Vet Med Educ 2024:e20230153. [PMID: 39504191 DOI: 10.3138/jvme-2023-0153]
Abstract
This study is a qualitative exploration of the student experience of stress at the University of Calgary Faculty of Veterinary Medicine (UCVM). Ten third-year students from the class of 2019 participated in a semi-structured interview designed to explore the student experience of stress and resiliency and their perception of mental health awareness initiatives. Transcripts were de-identified and analyzed using thematic analysis. Two central themes, academic encumbrances and academic tenacity, were identified, and each was further delineated into three main sub-themes associated with the Objective Structured Clinical Examination (OSCE) assessment. Students described the clinical skills course OSCEs as a primary source of stress and fear. Academic encumbrances were delineated through the sub-themes of academic load, fear of failure, and low academic self-efficacy. Students navigated these encumbrances through their academic tenacity, rooted in a sense of belonging among peers, family, and friends, a growth mindset, and self-care practices. The results exemplify the importance of attending to student needs beyond the acquisition of clinical knowledge and skills. As we continue to redefine veterinary curricula and assessment, this work highlights the need for curricular streamlining, attention to student well-being, and fostering an academically balanced lifestyle. Lastly, the findings will help inform student services about the mental health needs of students specific to academic achievement.
Affiliation(s)
- Saundra H Sample
- Zoetis Reference Laboratories, Zoetis, Inc., 10 Sylvan Way, Parsippany, NJ 07054, USA
- Elpida Artemiou
- Texas Tech University, School of Veterinary Medicine, 7671 Evans Drive, Amarillo, Texas 79106, USA
- Darlene J Donszelmann
- University of Calgary, Faculty of Veterinary Medicine, 118977 85th Street NW, Calgary, Alberta, T3R 1J3, Canada
- Cindy Adams
- University of Calgary, Faculty of Veterinary Medicine, 118977 85th Street NW, Calgary, Alberta, T3R 1J3, Canada
14
Saberzadeh-Ardestani B, Sima AR, Khosravi B, Young M, Mortaz Hejri S. The impact of prior performance information on subsequent assessment: is there evidence of retaliation in an anonymous multisource assessment system? Adv Health Sci Educ Theory Pract 2024; 29:531-550. [PMID: 37488326 DOI: 10.1007/s10459-023-10267-2]
Abstract
Few studies have engaged in data-driven investigations of the presence, or frequency, of what could be considered retaliatory assessor behaviour in multi-source feedback (MSF) systems. In this study, the authors explored how assessors scored others if, before assessing others, they had received their own assessment score. The authors examined assessments from an established MSF system in which all clinical team members - medical students, interns, residents, fellows, and supervisors - anonymously assessed each other. They identified assessments in which an assessor (i.e., any team member providing a score to another) gave an aberrant score, defined as a score more than two standard deviations from the assessment receiver's average score. Assessors who gave aberrant scores were categorized according to (1) whether their behaviour was preceded by receiving a score from another individual in the MSF system, and (2) whether the score they received was itself aberrant. The authors used a multivariable logistic regression model to investigate the association between the type of score received and the type of score given by that same individual. In total, 367 unique assessors provided 6091 scores on the performance of 484 unique individuals. Aberrant scores were identified in 250 forms (4.1%). The chances of giving an aberrant score were 2.3 times higher for those who had received a score than for those who had not (odds ratio 2.30, 95% CI: 1.54-3.44, P < 0.001). After adjusting for all other variables, individuals who had received an aberrant score were also 2.17 times more likely to give an aberrant score than those who had received a non-aberrant score (odds ratio 2.17, 95% CI: 1.39-3.39, P < 0.005). This study documents an association between receiving scores within an anonymous MSF system and providing aberrant scores to team members. These findings suggest care must be taken in designing MSF systems to protect against the potential downstream consequences of providing and receiving anonymous feedback.
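The study's aberrance definition and model lend themselves to a short sketch: flag scores more than two standard deviations from each receiver's mean, then regress giving an aberrant score on having received one. The data and column names below are fabricated assumptions, not the study dataset, and the simple model omits the study's other covariates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 2000
df = pd.DataFrame({
    "receiver": rng.integers(0, 50, n),           # who is being scored
    "score": rng.normal(70, 10, n),
    "received_aberrant": rng.integers(0, 2, n),   # did the assessor get one first?
})

# Aberrant = more than 2 SD from the receiver's own mean score.
stats = df.groupby("receiver")["score"].agg(["mean", "std"])
df = df.join(stats, on="receiver")
df["gave_aberrant"] = ((df["score"] - df["mean"]).abs() > 2 * df["std"]).astype(int)

# Logistic regression; exponentiated coefficients are odds ratios.
fit = smf.logit("gave_aberrant ~ received_aberrant", df).fit(disp=False)
print(np.exp(fit.params))
```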
Affiliation(s)
- Bahar Saberzadeh-Ardestani
- Digestive Disease Research Center, Digestive Disease Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Ali Reza Sima
- Digestive Disease Research Center, Digestive Disease Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Bardia Khosravi
- Digestive Disease Research Center, Digestive Disease Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Meredith Young
- Institute of Health Sciences Education, McGill University, Montreal, QC, Canada
- Sara Mortaz Hejri
- Department of Medical Education, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
15
Brogaard L, Hinshaw K, Kierkegaard O, Manser T, Uldbjerg N, Hvidman L. Developing the TeamOBS-vacuum-assisted delivery checklist to assess clinical performance in a vacuum-assisted delivery: a Delphi study with initial validation. Front Med (Lausanne) 2024; 11:1330443. [PMID: 38371513 PMCID: PMC10869485 DOI: 10.3389/fmed.2024.1330443]
Abstract
Introduction In Northern Europe, vacuum-assisted delivery (VAD) accounts for 6-15% of all deliveries and is considered safe when conducted by adequately trained personnel. However, failed vacuum extraction can be harmful to both mother and child. Clinical performance in VAD must therefore be assessed to guide learning, establish a performance benchmark, and evaluate quality so that an overall high performance can be achieved. We were unable to identify a pre-existing tool for evaluating clinical performance in real-life vacuum-assisted births. Objective We aimed to develop and validate a checklist for assessing clinical performance in VAD. Methods We conducted a Delphi process, an iterative process in which experts answer questions until the answers converge toward a "joint opinion" (consensus). We invited international experts as Delphi panelists and reached consensus after four Delphi rounds, as follows: (1) the panelists were asked to add, remove, or suggest corrections to the preliminary list of items considered essential for evaluating clinical performance in VAD; (2) the panelists applied weights of clinical importance on a Likert scale of 1-5 for each item; (3) each panelist revised their original scores after reviewing a summary of the other panelists' scores and arguments; and (4) the TeamOBS-VAD was tested using videos of real-life VADs, and the Delphi panel made final adjustments and approved the checklist. Results Twelve Delphi panelists from the UK (n = 3), Norway (n = 2), Sweden (n = 3), Denmark (n = 3), and Iceland (n = 1) were included. After four Delphi rounds, the panel reached consensus on the checklist items and scores. The TeamOBS-VAD checklist was then tested using 60 videos of real-life vacuum extractions. The inter-rater agreement for a single rater had an intraclass correlation coefficient (ICC) of 0.73 (95% CI [0.58, 0.83]), and that for the average of two raters was 0.84 (95% CI [0.73, 0.91]). The TeamOBS-VAD score was not associated with difficulties of the delivery, such as the number of contractions during vacuum extraction, cephalic level, rotation, or position. Failed vacuum extraction occurred in 6% of the video deliveries, but none was associated with teams that had low clinical performance scores. Conclusion The TeamOBS-VAD checklist provides a valid and reliable evaluation of clinical performance in vacuum-assisted vaginal delivery.
Affiliation(s)
- Lise Brogaard
- Department of Obstetrics and Gynecology, Aarhus University Hospital, Aarhus, Denmark
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Kim Hinshaw
- Department of Obstetrics and Gynecology, Sunderland Royal Hospital, Sunderland, United Kingdom
- Ole Kierkegaard
- Department of Obstetrics and Gynecology, Horsens Regional Hospital, Horsens, Denmark
- Tanja Manser
- Fachhochschule Nordwestschweiz (FHNW) School of Applied Psychology, University of Applied Sciences and Arts Northwestern Switzerland, Olten, Switzerland
- Niels Uldbjerg
- Department of Obstetrics and Gynecology, Aarhus University Hospital, Aarhus, Denmark
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Lone Hvidman
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
16
Goldenberg MG. Surgical Artificial Intelligence in Urology: Educational Applications. Urol Clin North Am 2024; 51:105-115. [PMID: 37945096 DOI: 10.1016/j.ucl.2023.06.003]
Abstract
Surgical education has seen immense change in recent years. Increased demand for iterative evaluation of trainees, from medical school to independent practice, has generated an overwhelming amount of data related to an individual's competency. Artificial intelligence has been proposed as a way to automate and standardize how stakeholders assess the technical and nontechnical abilities of a surgical trainee. In both the simulation and clinical environments, evidence supports the use of machine learning algorithms both to evaluate trainee skill and to provide real-time, automated feedback, shortening the learning curve for many key procedural skills and supporting patient safety.
Affiliation(s)
- Mitchell G Goldenberg
- Catherine & Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, 1441 Eastlake Avenue, Suite 7416, Los Angeles, CA 90033, USA.
17
Mitchell EC, Ott M, Ross D, Grant A. Development of a Tool to Assess Surgical Resident Competence On-Call: The Western University Call Assessment Tool (WUCAT). J Surg Educ 2024; 81:106-114. [PMID: 38008642 DOI: 10.1016/j.jsurg.2023.10.001]
Abstract
BACKGROUND A central tenet of competency-based medical education is the formative assessment of trainees. There are currently no assessments designed to examine resident competence on-call, despite the on-call period being a significant component of residency characterized by less direct supervision than daytime work. The purpose of this study was to design a formative on-call assessment tool and collect validity evidence for its use. METHODS The nominal group technique was used to identify critical elements of surgical resident competence on-call to inform tool development. The tool was piloted over six months in the Division of Plastic & Reconstructive Surgery at our institution. Quantitative and qualitative evidence was collected to examine the tool's validity. RESULTS A ten-item tool was developed based on the consensus group results. Sixty-three assessments were completed by seven staff members on ten residents during the pilot. The tool had a reliability coefficient of 0.67 based on a generalizability study, and internal item consistency was 0.92. Scores were significantly associated with year of training. The tool improved the quantity and structure of feedback given and was considered feasible and acceptable by both residents and staff members. CONCLUSIONS The Western University Call Assessment Tool (WUCAT) has multiple sources of evidence supporting its use in assessing resident competence on-call.
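The generalizability (G) coefficient reported above comes from a variance-components analysis. The sketch below works one through for a fully crossed person-by-rater design using expected mean squares, on fabricated ratings; it illustrates the general technique under simplifying assumptions, not the WUCAT analysis itself.

```python
import numpy as np

rng = np.random.default_rng(6)
n_p, n_r = 10, 7                                     # 10 residents, 7 raters
X = (3 + rng.normal(0, 1.0, (n_p, 1))                # person (resident) effects
       + rng.normal(0, 0.5, (1, n_r))                # rater stringency effects
       + rng.normal(0, 0.8, (n_p, n_r)))             # residual (p x r, error)

# Mean squares for persons and for the residual (p x r interaction + error).
ms_p = n_r * X.mean(axis=1).var(ddof=1)
ss_e = ((X - X.mean(axis=1, keepdims=True)
           - X.mean(axis=0, keepdims=True) + X.mean()) ** 2).sum()
ms_e = ss_e / ((n_p - 1) * (n_r - 1))

var_p = (ms_p - ms_e) / n_r                          # person variance component
g_coef = var_p / (var_p + ms_e / n_r)                # relative G, mean of n_r raters
print(round(float(g_coef), 2))
```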
Affiliation(s)
- Eric C Mitchell
- Department of Surgery, Western University, London, Ontario, Canada
- Michael Ott
- Department of Surgery, Western University, London, Ontario, Canada
- Douglas Ross
- Department of Surgery, Western University, London, Ontario, Canada
- Aaron Grant
- Department of Surgery, Western University, London, Ontario, Canada
18
Tavares W, Kinnear B, Schumacher DJ, Forte M. "Rater training" re-imagined for work-based assessment in medical education. Adv Health Sci Educ Theory Pract 2023; 28:1697-1709. [PMID: 37140661 DOI: 10.1007/s10459-023-10237-8]
Abstract
In this perspective, the authors critically examine "rater training" as it has been conceptualized and used in medical education. By "rater training," they mean the educational events intended to improve rater performance and contributions during assessment events. Historically, rater training programs have focused on modifying faculty behaviours to achieve psychometric ideals (e.g., reliability, inter-rater reliability, accuracy). The authors argue these ideals may now be poorly aligned with contemporary research informing work-based assessment, introducing a compatibility threat with no clear direction on how to proceed. To address this issue, the authors provide a brief historical review of "rater training" and analyze the literature examining the effectiveness of rater training programs, focusing mainly on what has served to define effectiveness or improvement. They then draw on philosophical and conceptual shifts in assessment to demonstrate why the function, effectiveness aims, and structure of rater training require reimagining. These shifts include changing competencies for assessors, viewing assessment as a complex cognitive task enacted in a social context, evolving views on bias, and reprioritizing which validity evidence should be most sought in medical education. The authors aim to advance the discussion on rater training by challenging implicit incompatibility issues and stimulating ways to overcome them. They propose that "rater training" (a moniker they suggest be reserved for strong psychometric aims) be augmented with "assessor readiness" programs that link to contemporary assessment science and enact the principle of compatibility between that science and ways of engaging with advances in real-world faculty-learner contexts.
Affiliation(s)
- Walter Tavares
- Department of Health and Society, Wilson Centre, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Benjamin Kinnear
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Daniel J Schumacher
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Milena Forte
- Department of Family and Community Medicine, Temerty Faculty of Medicine, Mount Sinai Hospital, University of Toronto, Toronto, ON, Canada
19
Johansson T, Olsson Å, Tishelman C, Noonan K, Leonard R, Eriksson LE, Goliath I, Cohen J. Validation of a culturally adapted Swedish-language version of the Death Literacy Index. PLoS One 2023; 18:e0295141. [PMID: 38033042 PMCID: PMC10688853 DOI: 10.1371/journal.pone.0295141]
Abstract
The Death Literacy Index (DLI) was developed in Australia to measure death literacy, a set of experience-based knowledge needed to understand and act on end-of-life (EOL) care options, but it has not yet been validated outside its original context. The aim of this study was to develop a culturally adapted Swedish-language version of the DLI, the DLI-S, and to assess sources of evidence for its validity in a Swedish context. The study involved a multi-step process of translation and cultural adaptation and two validation phases: first, examining content and response process validity through expert review (n = 10) and cognitive interviews (n = 10); and second, examining the internal structure validity of DLI-S data collected from an online cross-sectional survey (n = 503). The psychometric evaluation involved analysis of descriptive statistics at the item and scale level, internal consistency and test-retest reliability, and confirmatory factor analysis. During translation and adaptation, changes were made to adjust items to the Swedish context, with additional adjustments following the findings of the expert review and cognitive interviews. The content validity index exceeded recommended thresholds (S-CVI/Ave = 0.926). The psychometric evaluation supported the validity of the DLI-S: the hypothesized six-factor model showed good fit (χ2 = 1107.631, p < 0.001, CFI = 0.993, TLI = 0.993, RMSEA = 0.064, SRMR = 0.054), high internal consistency reliability was demonstrated for the overall scale (Cronbach's α = 0.94) and each sub-scale (α = 0.81-0.92), and test-retest reliability was acceptable, with ICCs ranging from 0.66 to 0.85. Through a comprehensive assessment of several sources of evidence, we show that the DLI-S demonstrates satisfactory validity and acceptability for measuring death literacy in the Swedish context. There are, however, indications that the sub-scales measuring community capacity perform worse than the other sub-scales and may function differently in Sweden than in the original context. The DLI-S has the potential to contribute to research on community-based EOL interventions.
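The scale-level content validity index reported above (S-CVI/Ave) has a simple definition worth making concrete: the mean, over items, of the proportion of experts rating each item relevant (3 or 4 on a 4-point scale). The sketch below uses fabricated ratings matching the study's panel of 10 experts; the item count is an assumption.

```python
import numpy as np

rng = np.random.default_rng(5)
# 10 experts rate 28 hypothetical items for relevance on a 4-point scale.
ratings = rng.choice([1, 2, 3, 4], size=(10, 28), p=[0.02, 0.08, 0.3, 0.6])

item_cvi = (ratings >= 3).mean(axis=0)   # I-CVI: proportion rating 3 or 4
s_cvi_ave = item_cvi.mean()              # S-CVI/Ave: average of I-CVIs
print(round(float(s_cvi_ave), 3))
```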
Collapse
Affiliation(s)
- Therese Johansson
- Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Huddinge, Sweden
- Cicely Saunders Institute of Palliative Care, Policy & Rehabilitation, King’s College London, London, United Kingdom
| | - Åsa Olsson
- Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Huddinge, Sweden
- Swedish National Graduate School on Ageing and Health (SWEAH), Lund University, Lund, Sweden
| | - Carol Tishelman
- Department of Learning, Informatics, Management and Ethics, Karolinska Institutet, Solna, Sweden
- Stockholm Health Care Services, Region Stockholm, Sweden
- End-of-Life Care Research Group, Vrije Universiteit Brussel and Ghent University, Brussels, Belgium
| | - Kerrie Noonan
- School of Social Sciences, Western Sydney University, Sydney, Australia
- Death Literacy Institute, Australia
- Public Health Palliative Care Unit, La Trobe University, Melbourne, Australia
| | - Rosemary Leonard
- School of Social Sciences, Western Sydney University, Sydney, Australia
| | - Lars E. Eriksson
- Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Huddinge, Sweden
- School of Health and Psychological Sciences, City, University of London, London, United Kingdom
- Medical Unit Infectious Diseases, Karolinska University Hospital, Huddinge, Sweden
| | - Ida Goliath
- Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Huddinge, Sweden
- Stockholm Gerontology Research Center, Stockholm, Sweden
| | - Joachim Cohen
- End-of-Life Care Research Group, Vrije Universiteit Brussel and Ghent University, Brussels, Belgium
| |
Collapse
|
20
|
Xu X, Wang H, Luo J, Zhang C, Konge L, Tang L. Difficulties in using simulation to assess abdominal palpation skills. BMC MEDICAL EDUCATION 2023; 23:897. [PMID: 37996904 PMCID: PMC10668513 DOI: 10.1186/s12909-023-04861-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/10/2023] [Accepted: 11/09/2023] [Indexed: 11/25/2023]
Abstract
OBJECTIVES Abdominal palpation is an essential examination for diagnosing various digestive system diseases. This study aimed to develop an objective and standardized test based on abdominal palpation simulators and to establish a credible pass/fail standard of basic competency. METHODS Two tests were designed using the newly developed Jucheng abdominal palpation simulator (test 1) and the AbSim simulator (test 2), respectively. Validity evidence for both tests was gathered according to Messick's contemporary framework by using experts to define test content and then administering the tests in a highly standardized way to participants of different experience levels. Different simulator setups modified by the built-in software were selected from hepatomegaly, splenomegaly, positive McBurney's sign plus rebound tenderness, gallbladder tenderness (Murphy's sign), pancreas tenderness, and a normal setup without pathologies, with six sets used in test 1 and five sets used in test 2. Separate groups of novices and experienced physicians completed the tests, and test 1 was also administered to an intermediate group. Scores and test time were collected and analyzed statistically. RESULTS The internal consistency reliability of test 1 and test 2 showed low Cronbach's alphas of 0.35 and -0.41, respectively. Cronbach's alphas for palpation time across cases were 0.65 for test 1 and 0.76 for test 2. There was no statistical difference in total time spent and total scores among the three groups in test 1 (P-values (ANOVA) were 0.53 and 0.35, respectively), nor between the novice and experienced groups in test 2 (P-values (t-test) were 0.13 and 1.0, respectively). It was not relevant to try to establish pass/fail standards due to the low reliability and lack of discriminatory ability of the tests. CONCLUSIONS It was not possible to measure abdominal palpation skills in a valid way using either of the two standardized, simulation-based tests in our study. Assessment of the patient's abdomen using palpation is a challenging clinical skill that is difficult to simulate, as it relies heavily on tactile sensations and adequate responsiveness from the patients.
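Because both tests stand or fall on Cronbach's alpha, a minimal numpy sketch of the standard formula is shown below on invented case scores. Note that alpha is not bounded below by zero: when scores on different cases covary negatively, the estimate goes negative, as with the -0.41 reported for test 2.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (participants x cases) score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented data: 8 participants scored on 5 simulated palpation cases.
rng = np.random.default_rng(0)
scores = rng.integers(0, 6, size=(8, 5)).astype(float)
print(round(cronbach_alpha(scores), 2))
```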
Collapse
Affiliation(s)
- Xiaowei Xu
- The Seventh Affiliated Hospital, Sun Yat-Sen University, Shenzhen, China
- Guangdong Academy for Medical Simulation (GAMS), Guangzhou, China
| | - Haoyu Wang
- Guangdong Academy for Medical Simulation (GAMS), Guangzhou, China
- School of Medicine, Sun Yat-Sen University, Shenzhen, China
| | - Jingfang Luo
- The Seventh Affiliated Hospital, Sun Yat-Sen University, Shenzhen, China
- Guangdong Academy for Medical Simulation (GAMS), Guangzhou, China
| | - Changhua Zhang
- The Seventh Affiliated Hospital, Sun Yat-Sen University, Shenzhen, China
- Guangdong Academy for Medical Simulation (GAMS), Guangzhou, China
| | - Lars Konge
- Guangdong Academy for Medical Simulation (GAMS), Guangzhou, China
- Copenhagen Academy for Medical Education and Simulation, Copenhagen, Denmark
| | - Lina Tang
- The Seventh Affiliated Hospital, Sun Yat-Sen University, Shenzhen, China.
- Guangdong Academy for Medical Simulation (GAMS), Guangzhou, China.
| |
Collapse
|
21
|
Zhao Y, Jalloh S, Lam PK, Kwarshak YK, Mbuthia D, Misago N, Namedre M, Phương NTB, Qaloewa S, Summers R, Tang K, Tweheyo R, Wills B, Zhang F, Nicodemo C, Gathara D, English M. Development and validation of a new measurement instrument to assess internship experience of medical doctors in low-income and middle-income countries. BMJ Glob Health 2023; 8:e013399. [PMID: 37940205 PMCID: PMC10632816 DOI: 10.1136/bmjgh-2023-013399] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Accepted: 10/01/2023] [Indexed: 11/10/2023] Open
Abstract
Routine surveys are used to understand the training quality and experiences of junior doctors, but there is a lack of tools designed to evaluate the training experiences of interns in low-income and middle-income countries (LMICs), where working conditions and resource constraints are challenging. We describe our process of developing and validating a 'medical internship experience scale' to address this gap, work involving nine LMICs that varied in geographical location, income level, and internship training model. We used a scoping review of existing tools, content validity discussions with target populations and an expert panel, back-and-forth translations into four language versions, and cognitive interviews to develop and test the tool. Using data collected from 1646 interns and junior medical doctors, we assessed the tool's factor structure, reliability, and validity. Fifty items about experiences of medical internship were retained from an initial pool of 102 items. These 50 items represent 6 major factors (constructs): (1) clinical learning and supervision, (2) patient safety, (3) job satisfaction, (4) stress and burnout, (5) mental well-being, and (6) fairness and discrimination. We reflect on the process of multicountry scale development and highlight some considerations for others who may use our scale, using preliminary analyses of the 1646 responses to illustrate that the tool may produce useful data to identify priorities for action. We suggest this tool could enable LMICs to assess key metrics regarding intern training and initial work experiences, and possibly allow comparison across countries and over time, to inform better internship planning and management.
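As an illustration of the factor-structure step, the sketch below fits a six-factor model with scikit-learn's FactorAnalysis on placeholder data. A full psychometric workflow (polychoric correlations, fit indices, oblique rotations) would use dedicated tooling, and the random responses here will not reproduce the six constructs above.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Placeholder data standing in for 1646 respondents x 50 Likert items.
rng = np.random.default_rng(42)
responses = rng.integers(1, 6, size=(1646, 50)).astype(float)

fa = FactorAnalysis(n_components=6, rotation="varimax", random_state=0)
fa.fit(responses)

# The loadings matrix is (n_factors x n_items): inspect which items
# define each candidate construct.
for f, row in enumerate(fa.components_, start=1):
    top_items = np.argsort(-np.abs(row))[:5]
    print(f"Factor {f}: highest-loading items -> {top_items.tolist()}")
```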
Collapse
Affiliation(s)
- Yingxi Zhao
- NDM Centre for Global Health Research, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | | | - Phung Khanh Lam
- Oxford University Clinical Research Unit, Ho Chi Minh City, Viet Nam
- University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Vietnam
| | - Yakubu Kevin Kwarshak
- Department of Surgery, Division of Urology, Jos University Teaching Hospital, Jos, Plateau State, Nigeria
| | | | - Nadine Misago
- Interdisciplinary Research Group in Public Health / Doctoral School, University of Burundi, Bujumbura, Burundi
| | | | - Nguyễn Thị Bé Phương
- University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Vietnam
| | - Sefanaia Qaloewa
- College of Medicine, Nursing and Health Sciences, Fiji National University, Suva, Fiji
| | - Richard Summers
- School of Social Policy, University of Birmingham, Birmingham, UK
| | - Kun Tang
- Vanke School of Public Health, Tsinghua University, Beijing, People's Republic of China
| | - Raymond Tweheyo
- Department of Health Policy Planning and Management, Makerere University School of Public Health, Kampala, Uganda
- Centre for Health Systems Research and Development (CHSRD), University of the Free State, Bloemfontein, South Africa
| | - Bridget Wills
- NDM Centre for Global Health Research, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Oxford University Clinical Research Unit, Ho Chi Minh City, Viet Nam
| | - Fang Zhang
- Department of Endocrinology and Metabolism, Peking University People's Hospital, Beijing, People's Republic of China
| | - Catia Nicodemo
- Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Department of Economics, Verona University, Verona, Italy
| | - David Gathara
- KEMRI-Wellcome Trust Research Programme, Nairobi, Kenya
- MARCH Centre, London School of Hygiene and Tropical Medicine, London, UK
| | - Mike English
- NDM Centre for Global Health Research, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- KEMRI-Wellcome Trust Research Programme, Nairobi, Kenya
| |
Collapse
|
22
|
Chen JH, Costa P, Gardner AK. Maximizing Standardization While Ensuring Equity: Exploring the Role of Applicant Experiences, Attributes, and Metrics on Performance of a Surgery-Specific Situational Judgment Test. JOURNAL OF SURGICAL EDUCATION 2023; 80:1703-1710. [PMID: 37365117 DOI: 10.1016/j.jsurg.2023.05.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/25/2023] [Revised: 05/12/2023] [Accepted: 05/30/2023] [Indexed: 06/28/2023]
Abstract
BACKGROUND Situational judgment tests (SJT) are hypothetical but realistic scenario-based assessments that allow residency programs to measure judgment and decision-making among future trainees. A surgery-specific SJT was created to identify highly valued competencies among residency applicants. We aim to demonstrate a stepwise process for validation of this assessment for applicant screening through exploration of two often-overlooked sources of validity evidence - relations with other variables and consequences. METHODS This was a prospective multi-institutional study involving 7 general surgery residency programs. All applicants completed the SurgSJT, a 32-item test aimed at measuring 10 core competencies: adaptability, attention to detail, communication, dependability, feedback receptivity, integrity, professionalism, resilience, self-directed learning, and team orientation. Performance on the SJT was compared to application data, including race, ethnicity, gender, medical school, and USMLE scores. Medical school rankings were determined based on the 2022 U.S. News & World Report rankings. RESULTS In total, 1491 applicants across seven residency programs were invited to complete the SJT. Of these, 1454 (97.5%) candidates completed the assessment. Applicants were predominantly White (57.5%), Asian (21.6%), Hispanic (9.7%), and Black (7.3%); 52% were female. A total of 208 medical schools were represented, the majority of which were allopathic (87.1%) and located in the United States (98.7%). Less than a quarter of applicants (22.8%; N=337) were from a top 25 school based on U.S. News & World Report rankings for primary care, surgery, or research. The average USMLE Step 1 score was 235 (SD 37) and the average Step 2 score was 250 (SD 29). Sex, race, ethnicity, and medical school ranking did not significantly impact performance on the SJT. There was no relationship between SJT scores and USMLE scores or medical school rankings. CONCLUSIONS We demonstrate the process of validity testing and the importance of two specific sources of evidence - consequences and relations with other variables - in implementing future educational assessments.
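The adverse-impact screen described in the results is, in essence, a set of subgroup comparisons. A minimal scipy sketch of that style of check is below; the applicant records are invented.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Invented applicant records standing in for the 1454 completed assessments.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "sjt_score": rng.normal(70, 10, 400),
    "sex": rng.choice(["F", "M"], 400),
    "race": rng.choice(["Asian", "Black", "Hispanic", "White"], 400),
})

# Two-group screen: SJT score by sex.
_, p_sex = stats.ttest_ind(df.loc[df.sex == "F", "sjt_score"],
                           df.loc[df.sex == "M", "sjt_score"])

# Multi-group screen: SJT score across racial groups.
groups = [g["sjt_score"].to_numpy() for _, g in df.groupby("race")]
_, p_race = stats.f_oneway(*groups)

print(f"sex: p = {p_sex:.3f}; race: p = {p_race:.3f}")
```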
Collapse
Affiliation(s)
- Jennifer H Chen
- Department of Surgery, Baylor College of Medicine, Houston, Texas.
| | - Paula Costa
- SurgWise Consulting, Houston, Texas; ICF International, Fairfax, Virginia
| | - Aimee K Gardner
- Department of Surgery, Baylor College of Medicine, Houston, Texas; SurgWise Consulting, Houston, Texas
| |
Collapse
|
23
|
Whittaker G, Ghita IA, Taylor M, Salmasi MY, Granato F, Athanasiou T. Current Status of Simulation in Thoracic Surgical Training. Ann Thorac Surg 2023; 116:1107-1115. [PMID: 37201622 DOI: 10.1016/j.athoracsur.2023.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/29/2022] [Revised: 03/21/2023] [Accepted: 05/01/2023] [Indexed: 05/20/2023]
Abstract
BACKGROUND Simulation is playing an increasingly important role in surgical training but is not yet a mandatory part of most surgical curricula. A simulator must undergo rigorous validation to verify it as a reliable tool. The aim of this study was to review the literature to identify simulators that are currently available to augment thoracic surgical training and to analyze any evidence supporting or validating them. METHODS A literature search of the MEDLINE (1946 to November 2022) and Embase (1947 to November 2022) databases was performed to identify simulators for basic skills and procedures in thoracic surgery. A selection of keywords was used to perform the literature search. After identification of appropriate articles, data were extracted and analyzed. RESULTS Thirty-three simulators were found in 31 articles. Simulators for basic skills (n = 13) and thoracic lobectomy (n = 13) were most commonly described, followed by miscellaneous (n = 7). Most models were of a hybrid modality (n = 18). Evidence of validity was established in 48.5% (n = 16) of simulators. In total, 15.2% (n = 5) of simulators had 3 or more elements of validity demonstrated, and only 3.0% (n = 1) accomplished full validation. CONCLUSIONS Numerous simulators of varying modality and fidelity exist for a variety of thoracic surgical skills and procedures, although validation evidence is frequently inadequate. Simulation models may be able to provide training in basic surgical and procedural skills; however, further assessment of validity needs to be undertaken before consideration of their integration into training programs.
Collapse
Affiliation(s)
- George Whittaker
- Department of Surgery and Cancer, Imperial College London, London, United Kingdom; Department of Cardiothoracic Surgery, Manchester University NHS Foundation Trust, Manchester, United Kingdom.
| | - Ioana-Alexandra Ghita
- Faculty of Medicine, University of Medicine and Pharmacy of Craiova, Craiova, Romania
| | - Marcus Taylor
- Department of Cardiothoracic Surgery, Manchester University NHS Foundation Trust, Manchester, United Kingdom
| | - M Yousuf Salmasi
- Department of Surgery and Cancer, Imperial College London, London, United Kingdom
| | - Felice Granato
- Department of Cardiothoracic Surgery, Manchester University NHS Foundation Trust, Manchester, United Kingdom
| | - Thanos Athanasiou
- Department of Surgery and Cancer, Imperial College London, London, United Kingdom
| |
Collapse
|
24
|
Lüscher M, Konge L, Tingsgaard P, Barrett TQ, Andersen SAW. Gathering validity evidence for a 3D-printed simulator for training of myringotomy and ventilation tube insertion. Laryngoscope Investig Otolaryngol 2023; 8:1357-1364. [PMID: 37899878 PMCID: PMC10601587 DOI: 10.1002/lio2.1123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2023] [Revised: 06/17/2023] [Accepted: 06/30/2023] [Indexed: 10/31/2023] Open
Abstract
Objectives This study aimed to gather validity evidence according to Messick's framework for a novel 3D-printed simulator for myringotomy with ventilation tube insertion for use in technical skills training of otorhinolaryngology (ORL) residents. Methods The study included 15 junior ORL residents (trainees) and 13 experienced teaching otolaryngologists (experts). Experts and trainees first received an identically structured introduction to the procedure, simulator, and simulation setup. Five procedures performed by each participant were video-recorded and ordered randomly for blinded rating by two independent raters. The rating tools used were a global rating scale (GBRS) and a task-specific checklist. Validity evidence was collected according to Messick's framework. Differences in time consumption and performance scores were analyzed. Finally, a pass/fail standard was established using the contrasting groups' method. Results Trainees used significantly more time per procedure (109 s, 95% CI: 99-120) than experts (82 s, 95% CI: 71-93; p < .001). Adjusted for repetition and rater leniency, experts achieved an average GBRS score of 18.8 (95% CI: 18.3-19.2) out of 20 points, whereas trainees achieved an average of 17.1 points (95% CI: 16.6-17.5; p < .001). In contrast to the task-specific checklist, the GBRS score discriminated between repetition number and participant experience. The pass/fail standard for the GBRS was established at 18.4 points. Conclusion We established educational validity evidence for a novel 3D-printed model for simulation-based training of ventilation tube insertion and established a reliable pass/fail standard. Level of Evidence 1b.
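The contrasting groups method used for the 18.4-point standard is often operationalized by fitting normal curves to the trainee and expert score distributions and taking their intersection as the cut score. A sketch on invented GBRS scores (the study's adjusted means were roughly 17.1 and 18.8):

```python
import numpy as np

# Invented GBRS scores for the two contrasting groups.
trainees = np.array([16.2, 17.0, 17.4, 16.8, 17.5, 16.5, 17.8, 17.1])
experts = np.array([18.5, 19.0, 18.2, 19.4, 18.8, 18.6, 19.1, 18.9])

m1, s1 = trainees.mean(), trainees.std(ddof=1)
m2, s2 = experts.mean(), experts.std(ddof=1)

# The cut score solves N(x; m1, s1) == N(x; m2, s2), a quadratic in x.
a = 1 / s1**2 - 1 / s2**2
b = 2 * (m2 / s2**2 - m1 / s1**2)
c = m1**2 / s1**2 - m2**2 / s2**2 + 2 * np.log(s1 / s2)
roots = np.roots([a, b, c])
cut = roots[(roots > min(m1, m2)) & (roots < max(m1, m2))][0]
print(f"pass/fail cut score ≈ {cut:.1f}")
```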
Collapse
Affiliation(s)
| | - Lars Konge
- Copenhagen Academy for Medical Education and Simulation (CAMES), Center for Human Resources & Education, Copenhagen, Denmark
| | | | | | | |
Collapse
|
25
|
Frithioff A, Frendø M, Weiss K, Foghsgaard S, Mikkelsen PT, Frederiksen TW, Pedersen DB, Sørensen MS, Andersen SAW. 3-D-Printed Models for Temporal Bone Training: A Validity Study. Otol Neurotol 2023; 44:e497-e503. [PMID: 37442608 DOI: 10.1097/mao.0000000000003936] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/15/2023]
Abstract
OBJECTIVE 3-D printing offers convenient and low-cost mastoidectomy training; nonetheless, training benefits using 3-D-printed temporal bones remain largely unexplored. In this study, we have collected validity evidence for a low-cost, 3-D-printed temporal bone for mastoidectomy training and established a credible pass/fail score for performance on the model. STUDY DESIGN A prospective educational study gathering validity evidence using Messick's validity framework. SETTING Seven Danish otorhinolaryngology training institutions. PARTICIPANTS Eighteen otorhinolaryngology residents (novices) and 11 experienced otosurgeons (experts). INTERVENTION Residents and experienced otosurgeons each performed two to three anatomical mastoidectomies on a low-cost, 3-D-printed temporal bone model produced in-house. After drilling, mastoidectomy performances were rated by three blinded experts using a 25-item modified Welling scale (WS). MAIN OUTCOME MEASURE Validity evidence using Messick's framework including reliability assessment applying both classical test theory and Generalizability theory. RESULTS Novices achieved a mean score of 13.9 points; experienced otosurgeons achieved 23.2 points. Using the contrasting groups method, we established a 21/25-point pass/fail level. The Generalizability coefficient was 0.91, and 75% of the score variance was attributable to participant performance, indicating a high level of assessment reliability. Subsequent D studies revealed that two raters rating one performance or one rater rating two performances were sufficiently reliable for high-stakes assessment. CONCLUSION Validity evidence supports using a low-cost, 3-D-printed model for mastoidectomy training. The model can be printed in-house using consumer-grade 3-D printers and serves as an additional training tool in the temporal bone curriculum. For competency-based training, we established a cut-off score of 21 of 25 WS points using the contrasting groups method.
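The reliability figures above come from generalizability theory. The sketch below estimates variance components for a fully crossed participants-by-raters design from two-way ANOVA mean squares and then runs a miniature D-study, mirroring the rater-count question the authors asked; all scores are invented.

```python
import numpy as np

# Invented data: 10 mastoidectomy performances (rows) x 3 raters (columns).
scores = np.array([
    [14, 15, 13], [22, 23, 22], [18, 17, 18], [12, 13, 12], [24, 23, 24],
    [16, 15, 16], [20, 21, 20], [11, 12, 11], [23, 24, 23], [19, 18, 19],
], dtype=float)
n_p, n_r = scores.shape
grand = scores.mean()

ms_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
resid = (scores - scores.mean(axis=1, keepdims=True)
         - scores.mean(axis=0, keepdims=True) + grand)
ms_pr = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))

# Variance components for the p x r design (one observation per cell).
var_pr = ms_pr
var_p = max((ms_p - ms_pr) / n_r, 0.0)

# D-study: projected relative G-coefficient for n' raters.
for n_prime in (1, 2, 3):
    g = var_p / (var_p + var_pr / n_prime)
    print(f"{n_prime} rater(s): G = {g:.2f}")
```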
Collapse
Affiliation(s)
| | | | - Kenneth Weiss
- Department of Civil and Mechanical Engineering, Technical University of Denmark, Kgs. Lyngby
| | - Søren Foghsgaard
- Copenhagen Hearing and Balance Center, Dept. of Otorhinolaryngology-Head & Neck Surgery and Audiology, Rigshospitalet, Copenhagen, Denmark
| | - Peter Trier Mikkelsen
- Copenhagen Hearing and Balance Center, Dept. of Otorhinolaryngology-Head & Neck Surgery and Audiology, Rigshospitalet, Copenhagen, Denmark
| | | | - David Bue Pedersen
- Department of Civil and Mechanical Engineering, Technical University of Denmark, Kgs. Lyngby
| | - Mads Sølvsten Sørensen
- Copenhagen Hearing and Balance Center, Dept. of Otorhinolaryngology-Head & Neck Surgery and Audiology, Rigshospitalet, Copenhagen, Denmark
| | | |
Collapse
|
26
|
Marceau M, Vachon Lachiver É, Lambert D, Daoust J, Dion V, Langlois MF, McConnell M, Thomas A, St-Onge C. Assessment Practices in Continuing Professional Development Activities in Health Professions: A Scoping Review. THE JOURNAL OF CONTINUING EDUCATION IN THE HEALTH PROFESSIONS 2023; 44:81-89. [PMID: 37490015 DOI: 10.1097/ceh.0000000000000507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/26/2023]
Abstract
INTRODUCTION In continuing professional development (CPD), educators face the need to develop and implement innovative assessment strategies to adhere to accreditation standards and support lifelong learning. However, little is known about the development and validation of these assessment practices. We aimed to document the breadth and depth of what is known about the development and implementation of assessment practices within CPD activities. METHODS We conducted a scoping review using the framework proposed by Arksey and O'Malley (2005) and updated in 2020. We examined five databases and identified 1733 abstracts. Two team members screened titles and abstracts for inclusion/exclusion. After data extraction, we conducted a descriptive analysis of the quantitative data and a thematic analysis of the qualitative data. RESULTS A total of 130 studies were retained for the full review. Most reported assessments are written assessments (n = 100), such as multiple-choice items (n = 79). In 99 studies, authors developed an assessment for research purposes rather than for the CPD activity itself. The assessment validation process was detailed in 105 articles. In most cases, the authors examined the content with experts (n = 57) or pilot-tested the assessment (n = 50). We identified three themes: 1-satisfaction with assessment choices; 2-difficulties experienced during the administration of the assessment; and 3-complexity of the validation process. DISCUSSION Building on the adage "assessment drives learning," it is imperative that CPD assessment practices contribute to the intended learning and limit the unintended negative consequences of assessment. Our results suggest that validation processes must be considered and adapted within CPD contexts.
Collapse
Affiliation(s)
- Mélanie Marceau
- Dr. Marceau: Assistant Professor, School of Nursing, Université de Sherbrooke, Sherbrooke, Québec, Canada. Ms. Vachon Lachiver: PhD Candidate in research in Health Sciences, Université de Sherbrooke, Sherbrooke, Québec, Canada. Ms. Lambert: Student, Université du Québec en Outaouais, Gatineau, Québec, Canada. Ms. Daoust: Student at Ontario College of Traditional Chinese Medicine, Toronto, Ontario, Canada. Mr. Dion: Undergraduate Medical Student, Université de Sherbrooke, Sherbrooke, Québec, Canada. Dr. Langlois: Full Professor, Department of Medicine and Continuing Professional Development Office, Université de Sherbrooke, Sherbrooke, Québec, Canada. Dr. McConnell: Associate Professor, Department of Innovation in Medical Education, Department of Anesthesiology and Pain Medicine, University of Ottawa, Ottawa, Ontario, Canada. Dr. Thomas: Associate Professor, School of Physical and Occupational Therapy, Institute of Health Sciences Education, Faculty of Medicine and Health Sciences, McGill University, Montréal, Québec, Canada. Dr. St-Onge: Full Professor, Department of Medicine and Health Sciences Pedagogy Center, Université de Sherbrooke, Sherbrooke, Québec, Canada
| | | | | | | | | | | | | | | | | |
Collapse
|
27
|
Liebert CA, Melcer EF, Eddington H, Trickey A, Shields S, Lee M, Korndorffer JR, Bekele A, Wren SM, Lin DT. Correlation of Performance on ENTRUST and Traditional Oral Objective Structured Clinical Examination for High-Stakes Assessment in the College of Surgeons of East, Central, and Southern Africa. J Am Coll Surg 2023; 237:117-127. [PMID: 37144790 DOI: 10.1097/xcs.0000000000000740] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/06/2023]
Abstract
BACKGROUND To address the global need for accessible evidence-based tools for competency-based education, we developed ENTRUST, an innovative online virtual patient simulation platform to author and securely deploy case scenarios to assess surgical decision-making competence. STUDY DESIGN In partnership with the College of Surgeons of East, Central, and Southern Africa, ENTRUST was piloted during the Membership of the College of Surgeons (MCS) 2021 examination. Examinees (n = 110) completed the traditional 11-station oral objective structured clinical examinations (OSCEs), followed by 3 ENTRUST cases, authored to query similar clinical content of 3 corresponding OSCE cases. ENTRUST scores were analyzed for associations with MCS Examination outcome using independent sample t tests. Correlation of ENTRUST scores to MCS Examination Percentage and OSCE station scores was calculated with Pearson correlations. Bivariate and multivariate analyses were performed to evaluate predictors of performance. RESULTS ENTRUST performance was significantly higher in examinees who passed the MCS examination compared with those who failed (p < 0.001). The ENTRUST score was positively correlated with MCS Examination Percentage (p < 0.001) and combined OSCE station scores (p < 0.001). On multivariate analysis, there was a strong association between MCS Examination Percentage and ENTRUST Grand Total Score (p < 0.001), Simulation Total Score (p = 0.018), and Question Total Score (p < 0.001). Age was a negative predictor for ENTRUST Grand Total and Simulation Total Score, but not for Question Total Score. Sex, native language status, and intended specialty were not associated with performance on ENTRUST. CONCLUSIONS This study demonstrates feasibility and initial validity evidence for the use of ENTRUST in a high-stakes examination context for assessment of surgical decision-making. ENTRUST holds potential as an accessible learning and assessment platform for surgical trainees worldwide.
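The multivariable step described above is an ordinary regression of examination outcome on ENTRUST scores plus covariates. A statsmodels sketch on invented examinee data (variable names and coefficients are placeholders, not the study's fitted model):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented data standing in for the 110 examinees.
rng = np.random.default_rng(7)
n = 110
entrust = rng.normal(60, 12, n)
age = rng.normal(33, 4, n)
mcs_pct = 0.5 * entrust - 0.3 * age + rng.normal(30, 6, n)

df = pd.DataFrame({"mcs_pct": mcs_pct, "entrust": entrust, "age": age})
model = smf.ols("mcs_pct ~ entrust + age", data=df).fit()
print(model.summary().tables[1])  # coefficients and p-values
```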
Collapse
Affiliation(s)
- Cara A Liebert
- From the Department of Surgery, Stanford University School of Medicine, Stanford, CA (Liebert, Lee, Korndorffer, Wren, Lin)
- Surgical Services, VA Palo Alto Health Care System, Department of Veterans Affairs, Palo Alto, CA (Liebert, Korndorffer, Wren)
| | - Edward F Melcer
- Department of Computational Media, Baskin School of Engineering, University of California- Santa Cruz, Santa Cruz, CA (Melcer, Shields)
| | - Hyrum Eddington
- Stanford-Surgery Policy Improvement Research and Education Center, Stanford, CA (Eddington, Trickey)
| | - Amber Trickey
- Stanford-Surgery Policy Improvement Research and Education Center, Stanford, CA (Eddington, Trickey)
| | - Samuel Shields
- Surgical Services, VA Palo Alto Health Care System, Department of Veterans Affairs, Palo Alto, CA (Liebert, Korndorffer, Wren)
| | - Melissa Lee
- From the Department of Surgery, Stanford University School of Medicine, Stanford, CA (Liebert, Lee, Korndorffer, Wren, Lin)
| | - James R Korndorffer
- From the Department of Surgery, Stanford University School of Medicine, Stanford, CA (Liebert, Lee, Korndorffer, Wren, Lin)
- Surgical Services, VA Palo Alto Health Care System, Department of Veterans Affairs, Palo Alto, CA (Liebert, Korndorffer, Wren)
| | - Abebe Bekele
- School of Medicine, University of Global Health Equity, Kigali, Rwanda (Bekele)
| | - Sherry M Wren
- From the Department of Surgery, Stanford University School of Medicine, Stanford, CA (Liebert, Lee, Korndorffer, Wren, Lin)
- Surgical Services, VA Palo Alto Health Care System, Department of Veterans Affairs, Palo Alto, CA (Liebert, Korndorffer, Wren)
| | - Dana T Lin
- From the Department of Surgery, Stanford University School of Medicine, Stanford, CA (Liebert, Lee, Korndorffer, Wren, Lin)
| |
Collapse
|
28
|
Goldman MP, Auerbach MA. Autonomy Is Desired, Entrustment Is What Matters. Hosp Pediatr 2023; 13:e150-e152. [PMID: 37153966 DOI: 10.1542/hpeds.2023-007205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/10/2023]
Affiliation(s)
- Michael P Goldman
- Department of Pediatrics, Section of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut
| | - Marc A Auerbach
- Department of Pediatrics, Section of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut
| |
Collapse
|
29
|
Higham H, Greig P, Crabtree N, Hadjipavlou G, Young D, Vincent C. A study of validity and usability evidence for non-technical skills assessment tools in simulated adult resuscitation scenarios. BMC MEDICAL EDUCATION 2023; 23:153. [PMID: 36906567 PMCID: PMC10007667 DOI: 10.1186/s12909-023-04108-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/01/2022] [Accepted: 02/15/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND Non-technical skills (NTS) assessment tools are widely used to provide formative and summative assessment for healthcare professionals, and there are now many of them. This study examined three different tools designed for similar settings and gathered evidence to test their validity and usability. METHODS Three NTS assessment tools designed for use in the UK were used by three experienced faculty to review standardized videos of simulated cardiac arrest scenarios: ANTS (Anesthetists' Non-Technical Skills), Oxford NOTECHS (Oxford NOn-TECHnical Skills) and OSCAR (Observational Skill based Clinical Assessment tool for Resuscitation). Internal consistency and interrater reliability were analyzed for each tool, and usability was examined both quantitatively and qualitatively. RESULTS Internal consistency and interrater reliability (IRR) varied considerably for the three tools across NTS categories and elements. Intraclass correlation scores of three expert raters ranged from poor (task management in ANTS [0.26] and situation awareness (SA) in Oxford NOTECHS [0.34]) to very good (problem solving in Oxford NOTECHS [0.81] and cooperation [0.84] and SA [0.87] in OSCAR). Furthermore, different statistical tests of IRR produced different results for each tool. Quantitative and qualitative examination of usability also revealed challenges in using each tool. CONCLUSIONS The lack of standardization of NTS assessment tools and training in their use is unhelpful for healthcare educators and students. Educators require ongoing support in the use of NTS assessment tools for the evaluation of individual healthcare professionals or healthcare teams. Summative or high-stakes examinations using NTS assessment tools should be undertaken with at least two assessors to provide consensus scoring. In light of the renewed focus on simulation as an educational tool to support and enhance training recovery in the aftermath of COVID-19, it is even more important that assessment of these vital skills is standardized, simplified and supported with adequate training.
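Because the headline numbers here are intraclass correlations across three raters, a sketch of one common variant, ICC(2,1) (two-way random effects, absolute agreement, single rater) computed from ANOVA mean squares, may help; the ratings are invented.

```python
import numpy as np

def icc_2_1(x: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    x has shape (subjects, raters)."""
    n, k = x.shape
    grand = x.mean()
    msr = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # subjects
    msc = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # raters
    resid = (x - x.mean(axis=1, keepdims=True)
             - x.mean(axis=0, keepdims=True) + grand)
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Invented NTS category scores from 3 raters over 8 scenario videos.
ratings = np.array([
    [3, 4, 3], [2, 2, 3], [4, 4, 4], [3, 3, 2],
    [5, 4, 5], [2, 3, 2], [4, 5, 4], [3, 3, 3],
], dtype=float)
print(round(icc_2_1(ratings), 2))
```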
Collapse
Affiliation(s)
- Helen Higham
- Nuffield Department of Clinical Neurosciences, University of Oxford, Level 6, West Wing, John Radcliffe Hospital, Oxford, OX3 9DU England
| | - Paul Greig
- Nuffield Department of Clinical Neurosciences, University of Oxford, Level 6, West Wing, John Radcliffe Hospital, Oxford, OX3 9DU England
| | - Nick Crabtree
- The Medical Specialist Group LLP Guernsey, Saint Peter Port, Guernsey
| | | | - Duncan Young
- Nuffield Department of Clinical Neurosciences, University of Oxford, Level 6, West Wing, John Radcliffe Hospital, Oxford, OX3 9DU England
| | - Charles Vincent
- Department of Experimental Psychology, University of Oxford, Oxford, UK
| |
Collapse
|
30
|
Bursac I, Mema B. Assessment in Simulation versus Clinical Context: A Different Lens for Different Moments. ATS Sch 2023; 4:12-19. [PMID: 37089681 PMCID: PMC10117395 DOI: 10.34197/ats-scholar.2022-0040cm] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2022] [Accepted: 10/12/2022] [Indexed: 04/25/2023] Open
Affiliation(s)
- Iva Bursac
- Division of Pediatric Critical Care, Jim Pattison Children’s Hospital, Saskatoon, Saskatchewan, Canada
- Department of Pediatrics, College of Medicine, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
| | - Briseida Mema
- Department of Critical Care Medicine, Hospital for Sick Children, Toronto, Ontario, Canada; and
- Department of Pediatrics, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| |
Collapse
|
31
|
El Hussein MT, Hakkola J. Valid and Reliable Tools to Measure Safety of Nursing Students During Simulated Learning Experiences: A Scoping Review. TEACHING AND LEARNING IN NURSING 2023. [DOI: 10.1016/j.teln.2022.12.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/09/2023]
|
32
|
Lang F, Willuth E, Haney CM, Felinska EA, Wennberg E, Kowalewski KF, Schmidt MW, Wagner M, Müller-Stich BP, Nickel F. Serious gaming and virtual reality in the multimodal training of laparoscopic inguinal hernia repair: a randomized crossover study. Surg Endosc 2023; 37:2050-2061. [PMID: 36289083 PMCID: PMC10017619 DOI: 10.1007/s00464-022-09733-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2022] [Accepted: 10/11/2022] [Indexed: 11/26/2022]
Abstract
BACKGROUND The aim of this study was to assess the transferability of surgical skills for the laparoscopic hernia module between the serious game Touch Surgery™ (TS) and the virtual reality (VR) trainer Lap Mentor™. Furthermore, this study aimed to collect validity evidence and to discuss "sources of validity evidence" for the findings using the laparoscopic inguinal hernia module on TS. METHODS In a randomized crossover study, medical students (n = 40) in their clinical years performed laparoscopic inguinal hernia modules on TS and the VR trainer. The TS group started with the "Laparoscopic Inguinal Hernia Module" on TS (phase 1: Preparation, phase 2: Port Placement and Hernia Repair) and performed the module first in training mode, then in test mode, until proficiency was reached. The VR group started with the "Inguinal Hernia Module" on the VR trainer (task 1: Anatomy Identification, task 2: Incision and Dissection) and also performed the module until proficiency. Once proficiency was reached in the first modality, the groups performed the other training modality until reaching proficiency. The primary endpoint was the number of attempts needed to achieve proficiency for each group on each task/phase. RESULTS Students starting with TS needed significantly fewer attempts to reach proficiency for task 1 on the VR trainer than students who started with the VR trainer (TS = 2.7 ± 0.6 vs. VR = 3.2 ± 0.7; p = 0.028). No significant differences for task 2 were observed between groups (TS = 2.3 ± 1.1 vs. VR = 2.1 ± 0.8; p = 0.524). For both phases on TS, no significant skill transfer from the VR trainer to TS was observed. Aspects of validity evidence for the module on TS were collected. CONCLUSION The results show that TS brought additional benefit, improving performance on the VR trainer for task 1 but not for task 2. Skill transfer from the VR trainer to TS could not be shown. VR and TS should thus be used in combination, with TS first, in multimodal training to ensure optimal training conditions.
Collapse
Affiliation(s)
- Franziska Lang
- Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany
| | - E Willuth
- Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany
| | - C M Haney
- Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany
| | - E A Felinska
- Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany
| | - E Wennberg
- Temerty Faculty of Medicine, University of Toronto, Toronto, Canada
| | - K F Kowalewski
- Department of Urology and Urological Surgery, University Medical Center Mannheim, Heidelberg University, Heidelberg, Germany
| | - M W Schmidt
- Department of Gynecology and Obstetrics, University Medical Center of the Johannes Gutenberg University, Mainz, Germany
| | - M Wagner
- Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany
| | - B P Müller-Stich
- Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany
| | - F Nickel
- Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany.
| |
Collapse
|
33
|
Jørgensen RJ, Olsen RG, Svendsen MBS, Stadeager M, Konge L, Bjerrum F. Comparing Simulator Metrics and Rater Assessment of Laparoscopic Suturing Skills. JOURNAL OF SURGICAL EDUCATION 2023; 80:302-310. [PMID: 37683093 DOI: 10.1016/j.jsurg.2022.09.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/21/2022] [Revised: 08/17/2022] [Accepted: 09/25/2022] [Indexed: 09/10/2023]
Abstract
BACKGROUND Laparoscopic intracorporeal suturing is important to master, and competence should be ensured using an optimal method in a simulated environment before proceeding to real operations. The objectives of this study were to gather validity evidence for two tools for assessing laparoscopic intracorporeal knot tying and to compare rater-based assessment of laparoscopic intracorporeal suturing with assessment based on simulator metrics. METHODS Twenty-eight novices and 19 experienced surgeons performed four laparoscopic sutures on a Simball Box simulator twice. Two surgeons used the Intracorporeal Suturing Assessment Tool (ISAT) for blinded video rating. RESULTS The Composite Simulator Score (CSS) had higher test-retest reliability than the ISAT. The correlation between the number of performed procedures involving suturing and ISAT score was 0.51 (p < 0.001), and the corresponding correlation for CSS was 0.59 (p < 0.001). Inter-rater reliability was 0.72 (p < 0.001) for test 1 and 0.53 (p < 0.001) for test 2. The pass/fail rates for ISAT and CSS were similar. CONCLUSION CSS and ISAT provide similar results for assessing laparoscopic suturing but assess different aspects of performance. Using simulator metrics and rater assessments in combination should be considered for a more comprehensive evaluation of laparoscopic knot-tying competency.
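The abstract does not give the CSS formula; one common way to build such a composite (an assumption for illustration, not necessarily this study's construction) is to z-score each simulator metric against the cohort and sum, flipping the sign of metrics where lower is better.

```python
import numpy as np

# Invented simulator metrics for 6 participants: task time (s), instrument
# path length (cm), knot quality (0-10). Lower time/path is better.
metrics = np.array([
    [310, 540, 7], [250, 480, 8], [420, 660, 5],
    [280, 500, 9], [390, 610, 6], [300, 520, 8],
], dtype=float)
higher_is_better = np.array([False, False, True])

z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0, ddof=1)
z[:, ~higher_is_better] *= -1   # flip so that higher always means better
composite = z.sum(axis=1)
print(np.round(composite, 2))
```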
Collapse
Affiliation(s)
- Rikke Jeong Jørgensen
- Copenhagen Academy for Medical Education and Simulation, Centre for HR and Education, Capital Region, Copenhagen, Denmark.
| | - Rikke Groth Olsen
- Copenhagen Academy for Medical Education and Simulation, Centre for HR and Education, Capital Region, Copenhagen, Denmark
| | - Morten Bo Søndergaard Svendsen
- Copenhagen Academy for Medical Education and Simulation, Centre for HR and Education, Capital Region, Copenhagen, Denmark
| | - Morten Stadeager
- Copenhagen Academy for Medical Education and Simulation, Centre for HR and Education, Capital Region, Copenhagen, Denmark; Department of Surgery, Hvidovre Hospital, Copenhagen University Hospital, Copenhagen, Denmark
| | - Lars Konge
- Copenhagen Academy for Medical Education and Simulation, Centre for HR and Education, Capital Region, Copenhagen, Denmark; University of Copenhagen, Copenhagen, Denmark
| | - Flemming Bjerrum
- Copenhagen Academy for Medical Education and Simulation, Centre for HR and Education, Capital Region, Copenhagen, Denmark; Department of Surgery, Herlev-Gentofte Hospital, Herlev, Denmark
| |
Collapse
|
34
|
Buléon C, Mattatia L, Minehart RD, Rudolph JW, Lois FJ, Guillouet E, Philippon AL, Brissaud O, Lefevre-Scelles A, Benhamou D, Lecomte F, group TSAWS, Bellot A, Crublé I, Philippot G, Vanderlinden T, Batrancourt S, Boithias-Guerot C, Bréaud J, de Vries P, Sibert L, Sécheresse T, Boulant V, Delamarre L, Grillet L, Jund M, Mathurin C, Berthod J, Debien B, Gacia O, Der Sahakian G, Boet S, Oriot D, Chabot JM. Simulation-based summative assessment in healthcare: an overview of key principles for practice. ADVANCES IN SIMULATION (LONDON, ENGLAND) 2022; 7:42. [PMID: 36578052 PMCID: PMC9795938 DOI: 10.1186/s41077-022-00238-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 03/02/2022] [Accepted: 11/30/2022] [Indexed: 12/29/2022]
Abstract
BACKGROUND Healthcare curricula need summative assessments relevant to and representative of clinical situations to best select and train learners. Simulation provides multiple benefits, with a growing literature base proving its utility for training in a formative context. Advancing to the next step, the use of simulation for summative assessment, requires rigorous and evidence-based development, because any summative assessment is high stakes for participants, trainers, and programs. The first step of this process is to identify the baseline from which we can start. METHODS First, using a modified nominal group technique, a task force of 34 panelists defined topics to clarify the why, how, what, when, and who for using simulation-based summative assessment (SBSA). Second, each topic was explored by a group of panelists using state-of-the-art literature review techniques, with a snowball method to identify further references. Our goal was to identify current knowledge and potential recommendations for future directions. Results were cross-checked among groups and reviewed by an independent expert committee. RESULTS Seven topics were selected by the task force: "What can be assessed in simulation?", "Assessment tools for SBSA", "Consequences of undergoing the SBSA process", "Scenarios for SBSA", "Debriefing, video, and research for SBSA", "Trainers for SBSA", and "Implementation of SBSA in healthcare". Together, these seven explorations provide an overview of what is known and can be done with relative certainty, and what is unknown and probably needs further investigation. Based on this work, we highlighted the trustworthiness of different summative assessment-related conclusions, the remaining important problems and questions, and their consequences for participants and institutions in how SBSA is conducted. CONCLUSION Our results identified among the seven topics one area with robust evidence in the literature ("What can be assessed in simulation?"), three areas with evidence that require guidance by expert opinion ("Assessment tools for SBSA", "Scenarios for SBSA", "Implementation of SBSA in healthcare"), and three areas with weak or emerging evidence ("Consequences of undergoing the SBSA process", "Debriefing for SBSA", "Trainers for SBSA"). Using SBSA holds much promise, with increasing demand for this application. Due to the important stakes involved, it must be rigorously conducted and supervised. Guidelines for good practice should be formalized to help with conduct and implementation. We believe this baseline can direct future investigation and the development of guidelines.
Collapse
Affiliation(s)
- Clément Buléon
- Department of Anesthesiology, Intensive Care and Perioperative Medicine, Caen Normandy University Hospital, 6th Floor, Caen, France; Medical School, University of Caen Normandy, Caen, France; Center for Medical Simulation, Boston, MA, USA
| | - Laurent Mattatia
- Department of Anesthesiology, Intensive Care and Perioperative Medicine, Nîmes University Hospital, Nîmes, France
| | - Rebecca D. Minehart
- Center for Medical Simulation, Boston, MA, USA; Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
| | - Jenny W. Rudolph
- Center for Medical Simulation, Boston, MA, USA; Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
| | - Fernande J. Lois
- Department of Anesthesiology, Intensive Care and Perioperative Medicine, Liège University Hospital, Liège, Belgium
| | - Erwan Guillouet
- Department of Anesthesiology, Intensive Care and Perioperative Medicine, Caen Normandy University Hospital, 6th Floor, Caen, France; Medical School, University of Caen Normandy, Caen, France
| | - Anne-Laure Philippon
- Department of Emergency Medicine, Pitié Salpêtrière University Hospital, APHP, Paris, France
| | - Olivier Brissaud
- Department of Pediatric Intensive Care, Pellegrin University Hospital, Bordeaux, France
| | - Antoine Lefevre-Scelles
- Department of Emergency Medicine, Rouen University Hospital, Rouen, France
| | - Dan Benhamou
- Department of Anesthesiology, Intensive Care and Perioperative Medicine, Kremlin Bicêtre University Hospital, APHP, Paris, France
| | - François Lecomte
- Department of Emergency Medicine, Cochin University Hospital, APHP, Paris, France
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
35
|
McNamara L, Scott K, Boyd RN, Farmer E, Webb A, Bosanquet M, Nguyen K, Novak I. Can web-based implementation interventions improve physician early diagnosis of cerebral palsy? Protocol for a 3-arm parallel superiority randomised controlled trial and cost-consequence analysis comparing adaptive and non-adaptive virtual patient instructional designs with control to evaluate effectiveness on physician behaviour, diagnostic skills and patient outcomes. BMJ Open 2022; 12:e063558. [PMID: 36410832 PMCID: PMC9680174 DOI: 10.1136/bmjopen-2022-063558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/06/2022] [Accepted: 10/18/2022] [Indexed: 11/23/2022] Open
Abstract
INTRODUCTION Cerebral palsy (CP) is the most common childhood physical disability. Accurate diagnosis before 6 months is possible using predictive tools and decision-making skills. Yet diagnosis is typically made at 12-24 months of age, hindering access to early interventions that improve functional outcomes. Change in practice is required for physicians in key diagnostic behaviours. This study aims to close the identified research-practice gap and increase accurate CP diagnosis before 6 months of age through tailored web-based implementation interventions. This trial will determine whether adaptive e-learning using virtual patients, targeting CP diagnostic behaviours and clinical decision-making skills, effectively changes physician behaviour and practice compared with non-adaptive e-learning instructional design or control. METHODS AND ANALYSIS This study is a 3-arm parallel superiority randomised controlled trial of two tailored e-learning interventions developed to expedite physician CP diagnosis. The trial will compare adaptive (arm 1) and non-adaptive (arm 2) instructional designs with waitlist control (arm 3) to evaluate change in physician behaviour, skills and diagnostic practice. A sample size of 275 paediatric physicians enables detection of small magnitude effects (0.2) of primary outcomes between intervention comparators with 90% power (α=0.05), allowing for 30% attrition. Barrier analysis, Delphi survey, Behaviour Change Wheel and learning theory frameworks guided the intervention designs. Adaptive and non-adaptive video and navigation sequences utilising virtual patients and clinical practice guideline content were developed, integrating formative key features assessment targeting clinical decision-making skills relative to CP diagnosis.Physician outcomes will be evaluated based on postintervention key feature examination scores plus preintervention/postintervention behavioural intentions and practice measures. Associations with CP population registers will evaluate real-world diagnostic patient outcomes. Intervention costs will be reported in a cost-consequence analysis from funders' and societal perspectives. ETHICS AND DISSEMINATION Ethics approved from The University of Sydney (Project number 2021/386). Results will be disseminated through peer-reviewed journals and scientific conferences. TRIAL REGISTRATION NUMBER Australian New Zealand Clinical Trials Registry: ACTRN 12622000184774.
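For readers unfamiliar with this kind of power statement, the sketch below runs a generic two-group calculation with statsmodels and inflates for attrition. The protocol's own model (three arms, its chosen test and variance assumptions) differs, so this generic version will not reproduce the trial's figure of 275.

```python
from math import ceil
from statsmodels.stats.power import TTestIndPower

# Generic two-group illustration only; not the trial's actual design.
n_per_group = TTestIndPower().solve_power(effect_size=0.2, alpha=0.05,
                                          power=0.90)

# Inflate recruitment for 30% expected attrition.
n_recruit = ceil(n_per_group / (1 - 0.30))
print(f"analyzable n per group ≈ {ceil(n_per_group)}; recruit {n_recruit}")
```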
Collapse
Affiliation(s)
- Lynda McNamara
- Children's Hospital Westmead Clinical School, The University of Sydney, Westmead, New South Wales, Australia
| | - Karen Scott
- Discipline of Child and Adolescent Health, Children's Hospital Westmead Clinical School, The University of Sydney, Westmead, New South Wales, Australia
| | - Roslyn N Boyd
- Queensland Cerebral Palsy and Rehabilitation Research Centre, Child Health Research Centre, The University of Queensland, South Brisbane, Queensland, Australia
| | - Elizabeth Farmer
- Graduate School of Medicine, University of Wollongong, Wollongong, New South Wales, Australia
| | - Annabel Webb
- Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia
| | - Margot Bosanquet
- Paediatric Department, Townsville Hospital and Health Service District (THHS), Townsville, Queensland, Australia
| | - Kim Nguyen
- Faculty of Medicine, Centre for Health Service Research, The University of Queensland, Woolloongabba, Queensland, Australia
| | - Iona Novak
- Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia
| |
Collapse
|
36
|
Kinnear B, Schumacher DJ, Driessen EW, Varpio L. How argumentation theory can inform assessment validity: A critical review. MEDICAL EDUCATION 2022; 56:1064-1075. [PMID: 35851965 PMCID: PMC9796688 DOI: 10.1111/medu.14882] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Revised: 07/07/2022] [Accepted: 07/15/2022] [Indexed: 05/21/2023]
Abstract
INTRODUCTION Many health professions education (HPE) scholars frame assessment validity as a form of argumentation in which interpretations and uses of assessment scores must be supported by evidence. However, what are purported to be validity arguments are often merely clusters of evidence without a guiding framework to evaluate, prioritise, or debate their merits. Argumentation theory is a field of study dedicated to understanding the production, analysis, and evaluation of arguments (spoken or written). The aim of this study is to describe argumentation theory, articulating the unique insights it can offer to HPE assessment, and presenting how different argumentation orientations can help reconceptualize the nature of validity in generative ways. METHODS The authors followed a five-step critical review process consisting of iterative cycles of focusing, searching, appraising, sampling, and analysing the argumentation theory literature. The authors generated and synthesised a corpus of manuscripts on argumentation orientations deemed to be most applicable to HPE. RESULTS We selected two argumentation orientations that we considered particularly constructive for informing HPE assessment validity: New rhetoric and informal logic. In new rhetoric, the goal of argumentation is to persuade, with a focus on an audience's values and standards. Informal logic centres on identifying, structuring, and evaluating arguments in real-world settings, with a variety of normative standards used to evaluate argument validity. DISCUSSION Both new rhetoric and informal logic provide philosophical, theoretical, or practical groundings that can advance HPE validity argumentation. New rhetoric's foregrounding of audience aligns with HPE's social imperative to be accountable to specific stakeholders such as the public and learners. Informal logic provides tools for identifying and structuring validity arguments for analysis and evaluation.
Collapse
Affiliation(s)
- Benjamin Kinnear
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- School of Health Professions Education (SHE), Maastricht University, Maastricht, The Netherlands
| | - Daniel J. Schumacher
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
| | - Erik W. Driessen
- School of Health Professions Education, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
| | - Lara Varpio
- Uniformed Services University of the Health Sciences, Bethesda, Maryland, USA
| |
Collapse
|
37
|
Liebert CA, Melcer EF, Keehl O, Eddington H, Trickey AW, Lee M, Tsai J, Camacho F, Merrell SB, Korndorffer JR, Lin DT. Validity Evidence for ENTRUST as an Assessment of Surgical Decision-Making for the Inguinal Hernia Entrustable Professional Activity (EPA). JOURNAL OF SURGICAL EDUCATION 2022; 79:e202-e212. [PMID: 35909070 DOI: 10.1016/j.jsurg.2022.07.008] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/26/2022] [Revised: 06/02/2022] [Accepted: 07/05/2022] [Indexed: 06/15/2023]
Abstract
OBJECTIVE As the American Board of Surgery (ABS) moves toward implementation of Entrustable Professional Activities (EPAs), there is a growing need for objective evaluation of residents' readiness for entrustment. This requires not only assessment of technical skills and knowledge, but also of surgical decision-making in preoperative, intraoperative, and postoperative settings. We developed and piloted an Inguinal Hernia EPA Assessment on ENTRUST, a serious game-based online virtual patient simulation platform, to assess trainees' decision-making competence. DESIGN This is a prospective analysis of resident performance on the ENTRUST Inguinal Hernia EPA Assessment using bivariate analyses. SETTING This study was conducted at an academic institution in a proctored exam setting. PARTICIPANTS Forty-three surgical residents completed the ENTRUST Inguinal Hernia EPA Assessment. RESULTS Four case scenarios for the Inguinal Hernia EPA and corresponding scoring algorithms were iteratively developed by expert consensus, aligned with ABS EPA descriptions and functions. ENTRUST Inguinal Hernia Grand Total Score was positively correlated with PGY-level (p < 0.0001). Preoperative, Intraoperative, and Postoperative Total Scores were also positively correlated with PGY-level (p = 0.001, p = 0.006, and p = 0.038, respectively). Total Case Scores were positively correlated with PGY-level for cases representing elective unilateral inguinal hernia (p = 0.0004), strangulated inguinal hernia (p < 0.0001), and elective bilateral inguinal hernia (p = 0.0003). Preoperative Sub-Scores were positively correlated with PGY-level for all cases (p < 0.01). Intraoperative Sub-Scores were positively correlated with PGY-level for strangulated inguinal hernia and bilateral inguinal hernia (p = 0.0007 and p = 0.0002, respectively). Grand Total Score and Intraoperative Sub-Score were correlated with prior operative experience (p < 0.0001). Prior video game experience did not correlate with performance on ENTRUST (p = 0.56). CONCLUSIONS Performance on the ENTRUST Inguinal Hernia EPA Assessment was positively correlated with PGY-level and prior inguinal hernia operative performance, providing initial validity evidence for its use as an objective assessment of surgical decision-making. The ENTRUST platform holds potential as a tool for assessment of ABS EPAs in surgical residency programs.
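Since PGY-level is ordinal, a rank-based correlation is a natural bivariate check for the score-by-training-year associations reported above. A short scipy sketch on invented resident data:

```python
import numpy as np
from scipy import stats

# Invented data: PGY level and grand total score for 43 residents.
rng = np.random.default_rng(3)
pgy = rng.integers(1, 6, size=43)
score = 50 + 8 * pgy + rng.normal(0, 10, size=43)

rho, p = stats.spearmanr(pgy, score)
print(f"Spearman rho = {rho:.2f}, p = {p:.4f}")
```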
Collapse
Affiliation(s)
- Cara A Liebert
- Department of Surgery, Stanford University School of Medicine, Stanford, California; VA Palo Alto Health Care System, Surgical Services, Palo Alto, California.
| | - Edward F Melcer
- Department of Computational Media, University of California-Santa Cruz, Baskin School of Engineering, Santa Cruz, California
| | - Oleksandra Keehl
- Department of Computational Media, University of California-Santa Cruz, Baskin School of Engineering, Santa Cruz, California
| | - Hyrum Eddington
- Stanford-Surgery Policy Improvement Research and Education Center (S-SPIRE), Department of Surgery, Stanford University School of Medicine, Palo Alto, California
| | - Amber W Trickey
- Stanford-Surgery Policy Improvement Research and Education Center (S-SPIRE), Department of Surgery, Stanford University School of Medicine, Palo Alto, California
| | - Melissa Lee
- Stanford University School of Medicine, Stanford, California
| | - Jason Tsai
- Department of Computational Media, University of California-Santa Cruz, Baskin School of Engineering, Santa Cruz, California
| | - Fatyma Camacho
- Department of Computational Media, University of California-Santa Cruz, Baskin School of Engineering, Santa Cruz, California
| | - James R Korndorffer
- Department of Surgery, Stanford University School of Medicine, Stanford, California; VA Palo Alto Health Care System, Surgical Services, Palo Alto, California
| | - Dana T Lin
- Department of Surgery, Stanford University School of Medicine, Stanford, California
| |
Collapse
|
38
|
Oviedo-Peñata CA, Giraldo Mejía GE, Riaño-Benavides CH, Maldonado-Estrada JG, Lemos Duque JD. Development and validation of a composed canine simulator for advanced veterinary laparoscopic training. Front Vet Sci 2022; 9:936144. [PMID: 36325095 PMCID: PMC9621388 DOI: 10.3389/fvets.2022.936144] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2022] [Accepted: 09/05/2022] [Indexed: 11/04/2022] Open
Abstract
The development of innovative simulation models for veterinary laparoscopic surgery training is a priority today. This study aimed to describe a didactic simulation tool for training total laparoscopic gastropexy (TLG) with intracorporeal sutures in dogs. The CALMA Veterinary Lap-trainer composite simulator (CLVTS) was developed from a plaster cast of 2 Great Dane canines, mimicking the space and the correct position for carrying out a TLG. After video instruction, 16 veterinarians with different degrees of experience in minimally invasive surgery (experts, n = 6; intermediates, n = 10) performed four sequential simulated TLG exercises with intracorporeal suturing on the CLVTS. Subsequently, they completed an anonymous questionnaire assessing the realism, usefulness, and educational quality of the simulator. The CLVTS showed good preliminary acceptance (4.7/5) in terms of the usefulness and adequacy of the exercises, which, in the participants' opinion, are appropriate and related to the difficulty of the TLG. In addition, both expert and intermediate surgeons gave high marks (4.5/5) to the feeling of realism, the design, and the practicality. There were no significant differences between the responses of the two groups. The results suggest that the CLVTS has both face and content validity: it allows total laparoscopic gastropexy with intracorporeal sutures to be practiced in a structured environment without compromising patient safety, although the scope of the study imposes some limitations. Further studies are needed to establish its ability to assess or measure technical skills, including the degree of transferability to the actual surgical environment.
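The group comparison reported here (expert vs. intermediate Likert ratings, no significant difference) can be sketched with a Mann-Whitney U test; the ratings below are fabricated for illustration only.

```python
# Illustrative check for "no significant difference between groups" on a
# 5-point Likert questionnaire item; ratings are invented for the sketch.
import numpy as np
from scipy.stats import mannwhitneyu

experts = np.array([5, 4, 5, 4, 5, 4])                    # n = 6
intermediates = np.array([5, 5, 4, 4, 5, 4, 5, 4, 4, 5])  # n = 10

u, p = mannwhitneyu(experts, intermediates, alternative="two-sided")
print(f"U = {u}, p = {p:.3f}")
```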
Collapse
Affiliation(s)
- Carlos A. Oviedo-Peñata
- Tropical Animal Production Research Group, Faculty of Veterinary Medicine and Zootechny, University of Cordoba, Monteria, Colombia; OHVRI-Research Group, Faculty of Agrarian Sciences, College of Veterinary Medicine, University of Antioquia, Medellín, Colombia. *Correspondence: Carlos A. Oviedo-Peñata
| | - Gloria E. Giraldo Mejía
- OHVRI-Research Group, Faculty of Agrarian Sciences, College of Veterinary Medicine, University of Antioquia, Medellín, Colombia
| | - Carlos Humberto Riaño-Benavides
- OHVRI-Research Group, Faculty of Agrarian Sciences, College of Veterinary Medicine, University of Antioquia, Medellín, Colombia
| | - Juan G. Maldonado-Estrada
- OHVRI-Research Group, Faculty of Agrarian Sciences, College of Veterinary Medicine, University of Antioquia, Medellín, Colombia
| | - Juan D. Lemos Duque
- Bioinstrumentation and Clinical Engineering Research Group-GIBIC, Department of Bioengineering, Faculty of Engineering, Universidad de Antioquia, Medellín, Colombia
| |
Collapse
|
39
|
Vatral C, Biswas G, Cohn C, Davalos E, Mohammed N. Using the DiCoT framework for integrated multimodal analysis in mixed-reality training environments. Front Artif Intell 2022; 5:941825. [PMID: 35937140 PMCID: PMC9353401 DOI: 10.3389/frai.2022.941825] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2022] [Accepted: 06/27/2022] [Indexed: 11/17/2022] Open
Abstract
Simulation-based training (SBT) programs are commonly employed by organizations to train individuals and teams in the cognitive and psychomotor skills needed for effective workplace performance across a broad range of applications. Distributed cognition has become a popular cognitive framework for the design and evaluation of these SBT environments, with structured methodologies such as Distributed Cognition for Teamwork (DiCoT) used for analysis. However, the analyses and evaluations generated by such distributed cognition frameworks require extensive domain knowledge and manual coding and interpretation, and the analysis is primarily qualitative. In this work, we propose and develop the application of multimodal learning analytics techniques to SBT scenarios. Using these analysis methods, we can use the rich multimodal data collected in SBT environments to generate more automated interpretations of trainee performance that supplement and extend traditional DiCoT analysis. To demonstrate the use of these methods, we present a case study of nurses training in a mixed-reality manikin-based (MRMB) training environment. We show how the combined analysis of the video, speech, and eye-tracking data collected as the nurses train in the MRMB environment supports and enhances traditional qualitative DiCoT analysis. By applying such quantitative data-driven analysis methods, we can better analyze trainee activities online in SBT and MRMB environments. With continued development, these analysis methods could be used to provide targeted feedback to learners, a detailed review of training performance for instructors, and data-driven evidence to help simulation designers improve the environment.
Collapse
|
40
|
Marceau M, St-Onge C, Gallagher F, Young M. Validity as a social imperative: users' and leaders' perceptions. CANADIAN MEDICAL EDUCATION JOURNAL 2022; 13:22-36. [PMID: 35875440 PMCID: PMC9297243 DOI: 10.36834/cmej.73518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
INTRODUCTION Recently, validity as a social imperative was proposed as an emerging conceptualization of validity in the assessment literature in health professions education (HPE). To further develop our understanding, we explored the perceived acceptability and anticipated feasibility of validity as a social imperative with users and leaders engaged with assessment in HPE in Canada. METHODS We conducted a qualitative interpretive description study. Purposeful and snowball sampling were used to recruit participants for semi-structured individual interviews and focus groups. Each transcript was analyzed by two team members and discussed with the team until consensus was reached. RESULTS We conducted five focus groups and eleven interviews with two different stakeholder groups (users and leaders). Our findings suggest that participants perceived the concept of validity as a social imperative as acceptable. Regardless of group, participants shared similar considerations regarding: the limits of traditional validity models, the concept's timeliness and relevance, the need to clarify some terms used to characterize the concept, the similarities with modern theories of validity, and the anticipated challenges in applying the concept in practice. In addition, participants discussed some limits of current approaches to validity in the context of workplace-based and programmatic assessment. CONCLUSION Validity as a social imperative can be interwoven throughout existing theories of validity and may represent how HPE is adapting traditional models of validity in order to respond to the complexity of assessment in HPE; however, challenges likely remain in operationalizing the concept prior to its implementation.
Collapse
Affiliation(s)
- Mélanie Marceau
- School of Nursing, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Quebec, Canada
| | - Christina St-Onge
- Department of Medicine, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Quebec, Canada
| | - Frances Gallagher
- School of Nursing, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Quebec, Canada
| | - Meredith Young
- Institute of Health Sciences Education, Faculty of Medicine and Health Sciences, McGill University, Québec, Canada
| |
Collapse
|
41
|
Whalen AM, Merves MH, Kharayat P, Barry JS, Glass KM, Berg RA, Sawyer T, Nadkarni V, Boyer DL, Nishisaki A. Validity Evidence for a Novel, Comprehensive Bag-Mask Ventilation Assessment Tool. J Pediatr 2022; 245:165-171.e13. [PMID: 35181294 DOI: 10.1016/j.jpeds.2022.02.017] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/21/2021] [Revised: 01/20/2022] [Accepted: 02/09/2022] [Indexed: 01/15/2023]
Abstract
OBJECTIVE To develop a comprehensive competency assessment tool for pediatric bag-mask ventilation (pBMV) and demonstrate multidimensional validity evidence for this tool. STUDY DESIGN A novel pBMV assessment tool was developed, consisting of 3 components: a 22-item checklist (trichotomized response), a 5-point global rating scale (GRS), and a 4-point entrustment assessment. Participants' performance in a realistic simulation scenario was video-recorded and assessed by blinded raters. Multidimensional validity evidence for procedural assessment, including evidence for content, response process, internal structure, and relations to other variables, was assessed. The scores on each scale were compared with training level. Item-based checklist scores were also correlated with GRS and entrustment scores. RESULTS Fifty-eight participants (9 medical students, 10 pediatric residents, 18 critical care/neonatology fellows, 21 critical care/neonatology attendings) were evaluated. The pBMV tool was supported by high internal consistency (Cronbach α = 0.867). Inter-rater reliability for the item-based checklist component was acceptable (r = 0.65, P < .0001). The item-based checklist scores differentiated between medical students and other providers (P < .0001), but not between other training levels. GRS and entrustment scores significantly differentiated between training levels (P < .001). The correlation between the item-based checklist and the GRS was r = 0.489 (P = .0001), and between the item-based checklist and the entrustment score it was r = 0.52 (P < .001). These moderate correlations suggest that each component measures pBMV skills differently. The GRS and entrustment scores demonstrated moderate inter-rater reliability (0.42 and 0.46). CONCLUSIONS We established evidence of multidimensional validity for a novel entrustment-based pBMV competence assessment tool incorporating global and entrustment-based assessments. This comprehensive tool can provide learner feedback and aid in entrustment decisions as learners progress through training.
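Cronbach's α, the internal-consistency statistic this entry (and several entries below) reports, is straightforward to compute from an examinee-by-item score matrix. The sketch below uses simulated data; only the 22-item structure is borrowed from the abstract, everything else is invented.

```python
# Cronbach's alpha computed from first principles on a toy score matrix
# (rows = examinees, columns = items); the real checklist data are not public.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(1)
ability = rng.normal(0, 1, size=(58, 1))            # shared underlying trait
items = ability + rng.normal(0, 0.7, size=(58, 22)) # 22 correlated items
print(f"alpha = {cronbach_alpha(items):.3f}")
```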
Collapse
Affiliation(s)
- Allison M Whalen
- Division of Pediatric Critical Care Medicine, Department of Pediatrics, Medical University of South Carolina, Charleston, SC.
| | - Matthew H Merves
- Division of Neonatology, Department of Pediatrics, University of Arkansas for Medical Sciences and Arkansas Children's Hospital, Little Rock, AR
| | - Priyanka Kharayat
- Department of Pediatrics, Albert Einstein Medical Center, Philadelphia, PA
| | - James S Barry
- Section of Neonatology, Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO
| | - Kristen M Glass
- Division of Neonatal-Perinatal Medicine, Department of Pediatrics, Penn State College of Medicine, Milton S. Hershey Medical Center, Hershey, PA
| | - Robert A Berg
- Division of Critical Care Medicine, Children's Hospital of Philadelphia, Philadelphia, PA; Department of Anesthesiology & Critical Care, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA
| | - Taylor Sawyer
- Division of Neonatology, Department of Pediatrics, University of Washington School of Medicine, Seattle Children's Hospital, Seattle, WA
| | - Vinay Nadkarni
- Division of Critical Care Medicine, Children's Hospital of Philadelphia, Philadelphia, PA; Department of Anesthesiology & Critical Care, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA
| | - Donald L Boyer
- Division of Critical Care Medicine, Children's Hospital of Philadelphia, Philadelphia, PA; Department of Anesthesiology & Critical Care, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA
| | - Akira Nishisaki
- Division of Critical Care Medicine, Children's Hospital of Philadelphia, Philadelphia, PA; Department of Anesthesiology & Critical Care, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA
| |
Collapse
|
42
|
Gourbault LJ, Hopley EL, Finch F, Shiels S, Higham H. Non-technical Skills for Medical Students: Validating the Tools of the Trade. Cureus 2022; 14:e24776. [PMID: 35676998 PMCID: PMC9167572 DOI: 10.7759/cureus.24776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/05/2022] [Indexed: 12/01/2022] Open
Abstract
The Medical Students’ Non-Technical Skills (Medi-StuNTS) is a behavioural marker scheme (BMS) designed to assess non-technical skills (NTS) in medical students in emergency simulations. This study aimed to assess the evidence for validity and usability of Medi-StuNTS by naive, near-peer educators. Nine doctors assessed four students in simulations of common medical emergencies. The scores were used to assess inter-rater agreement, intraclass correlation, and observability. Students and assessors completed questionnaires that assessed the tool's usability and consequences. Inter-rater agreement across all skill elements was "high", with rWG scores >0.8. Intraclass correlation was "good", with ICC(3,k) values of 0.86 and 0.89 when measured per simulation and per skill element, respectively. Overall skill observability was high (>80%) except for coping with stress. Assessors found the tool "difficult to use" but "useful for feeding back in a constructive way". Students appreciated the comprehensiveness of the feedback as well as knowing what to expect during debriefs. This study has shown that the Medi-StuNTS BMS has good usability and evidence of validity with naive assessors and near-peer educators. It showed particularly good internal structure and overall beneficial consequences. Further study will be necessary to understand how best to deploy it in formative and summative contexts.
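The rWG index quoted here compares the observed variance of raters' scores against the variance expected under uniformly random responding. A minimal sketch, assuming a 5-point scale and invented ratings:

```python
# James, Demaree & Wolf rWG agreement index for a single rated item.
import numpy as np

def rwg(ratings: np.ndarray, n_options: int = 5) -> float:
    """rWG = 1 - observed variance / variance of a uniform A-point response."""
    expected_var = (n_options**2 - 1) / 12.0  # variance of uniform 1..A
    observed_var = ratings.var(ddof=1)
    return 1.0 - observed_var / expected_var

ratings = np.array([4, 4, 5, 4, 4, 5, 4, 5, 4])  # nine raters, one skill element
print(f"rWG = {rwg(ratings):.2f}")
```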
Collapse
|
43
|
Jacobsen N, Larsen JD, Falster C, Nolsøe CP, Konge L, Graumann O, Laursen CB. Using Immersive Virtual Reality Simulation to Ensure Competence in Contrast-Enhanced Ultrasound. ULTRASOUND IN MEDICINE & BIOLOGY 2022; 48:912-923. [PMID: 35227531 DOI: 10.1016/j.ultrasmedbio.2022.01.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Revised: 01/11/2022] [Accepted: 01/24/2022] [Indexed: 06/14/2023]
Abstract
Contrast-enhanced ultrasound (CEUS) is used in various medical specialties as a diagnostic imaging tool and for procedural guidance. Experience in the procedure is currently attained via supervised clinical practice, which is challenged by patient availability and risks. Prior simulation-based training and subsequent assessment could improve and ensure competence before performance on patients, but no simulator currently exists. Immersive virtual reality (IVR) is a promising new simulation tool that can replicate complex interactions and environments that are unfeasible to achieve with traditional simulators. This study aimed to develop an IVR simulation-based test for core CEUS competencies and to gather validity evidence for the test in accordance with Messick's framework. The test was developed by IVR software specialists and clinical experts in CEUS and medical education and imitated a CEUS examination of a patient with a focal liver lesion, with emphasis on the pre-contrast preparations. Twenty-five medical doctors with varying CEUS experience were recruited as test participants, and their results were used to analyze test quality and to establish a pass/fail standard. The final 23-item test had good internal reliability (Cronbach's α = 0.85) and discriminatory abilities. The risks of false positives and negatives (9.1% and 23.6%, respectively) were acceptable for the test to be used as a certification tool prior to supervised clinical training in CEUS.
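Pass/fail standards like the one mentioned here are often set with the contrasting-groups method: fit the score distributions of a non-competent and a competent group and place the cut where the two curves cross. The sketch below assumes normal fits and invented scores; the study's actual standard-setting details are not reproduced.

```python
# Contrasting-groups standard setting: the cut score is taken where the
# fitted novice and expert score densities intersect between the group means.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

novice_scores = np.array([8, 10, 11, 12, 13, 14, 15, 16])
expert_scores = np.array([16, 17, 18, 19, 20, 20, 21, 22])

f_nov = norm(novice_scores.mean(), novice_scores.std(ddof=1)).pdf
f_exp = norm(expert_scores.mean(), expert_scores.std(ddof=1)).pdf

cut = brentq(lambda x: f_nov(x) - f_exp(x),
             novice_scores.mean(), expert_scores.mean())
print(f"pass/fail cut score ~ {cut:.1f}")
```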
Collapse
Affiliation(s)
- Niels Jacobsen
- Department of Respiratory Medicine, Odense University Hospital, Odense, Denmark; Odense Respiratory Research Unit (ODIN), Department of Clinical Research, University of Southern Denmark, Odense, Denmark; Regional Center for Technical Simulation (TechSim), Odense University Hospital, Odense, Denmark.
| | - Jonas D Larsen
- Odense Respiratory Research Unit (ODIN), Department of Clinical Research, University of Southern Denmark, Odense, Denmark; Department of Radiology, Odense University Hospital, Odense, Denmark; Research and Innovation Unit of Radiology, University of Southern Denmark, Odense, Denmark
| | - Casper Falster
- Department of Respiratory Medicine, Odense University Hospital, Odense, Denmark; Odense Respiratory Research Unit (ODIN), Department of Clinical Research, University of Southern Denmark, Odense, Denmark
| | - Christian P Nolsøe
- Center for Surgical Ultrasound, Department of Surgery, Zealand University Hospital, Køge, Denmark; Copenhagen Academy for Medical Education and Simulation (CAMES), Center for Human Resources and Education, The Capital Region of Denmark, Copenhagen, Denmark
| | - Lars Konge
- Copenhagen Academy for Medical Education and Simulation (CAMES), Center for Human Resources and Education, The Capital Region of Denmark, Copenhagen, Denmark
| | - Ole Graumann
- Department of Radiology, Odense University Hospital, Odense, Denmark; Research and Innovation Unit of Radiology, University of Southern Denmark, Odense, Denmark
| | - Christian B Laursen
- Department of Respiratory Medicine, Odense University Hospital, Odense, Denmark; Odense Respiratory Research Unit (ODIN), Department of Clinical Research, University of Southern Denmark, Odense, Denmark
| |
Collapse
|
44
|
Gabarin N, Trinkaus M, Selby R, Goldberg N, Hanif H, Sholzberg M. Coagulation test understanding and ordering by medical trainees: Novel teaching approach. Res Pract Thromb Haemost 2022; 6:S2475-0379(22)01240-7. [PMID: 35755855 PMCID: PMC9204395 DOI: 10.1002/rth2.12746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Revised: 03/15/2022] [Accepted: 04/05/2022] [Indexed: 11/09/2022] Open
Abstract
Background Coagulation testing provides a prime opportunity to reduce unnecessary laboratory test ordering, as there are clear indications for testing. Despite the prothrombin time/international normalized ratio and activated partial thromboplastin time being validated for specific clinical indications, they are frequently ordered as screening tests and often ordered together, suggesting a gap in understanding of coagulation. Methods Based on a needs assessment, we developed an online educational module on coagulation for trainees, incorporating education on testing cost, specificity, and sensitivity. Fifty participating resident physicians and medical students completed a validated premodule quiz, a postmodule quiz after completing the module, and a latent quiz 3 to 6 months later to assess longer-term knowledge retention. Trainees provided responses regarding their subjective laboratory test-ordering practices before and after module completion. Results The median premodule quiz score was 67% (n = 50; range, 24%-86%), with an increase of 24% to a median postmodule quiz score of 91% (n = 50; range, 64%-100%). There was evidence of sustained knowledge acquisition, with a latent quiz median score of 89% (n = 40; range, 67%-100%). Trainees were more likely to consider the sensitivity, specificity, and cost of laboratory investigations before ordering them following completion of the educational module. Conclusions Using the expertise of medical educators and incorporating trainee feedback, we employed a novel approach to the teaching of coagulation to maximize its approachability and clinical relevance. We found sustained knowledge retention regarding coagulation and appropriate coagulation test ordering, and a subjective change in trainee ordering habits following participation in our educational intervention.
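The pre/post gain reported here is a paired comparison; a Wilcoxon signed-rank test is one common way to analyze it when score distributions are skewed. A sketch on made-up per-trainee percentages (the study's raw data are not shown):

```python
# Paired pre/post comparison of quiz scores via the Wilcoxon signed-rank test.
import numpy as np
from scipy.stats import wilcoxon

pre = np.array([67, 52, 71, 48, 76, 62, 69, 55, 81, 64])
post = np.array([91, 78, 95, 70, 96, 88, 90, 82, 100, 85])

stat, p = wilcoxon(pre, post)
print(f"median gain = {np.median(post - pre):.0f} points, p = {p:.4f}")
```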
Collapse
Affiliation(s)
- Nadia Gabarin
- Department of Medicine, Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada
| | - Martina Trinkaus
- Department of Medicine, St. Michael’s Hospital, University of Toronto, Toronto, Ontario, Canada
- Division of Hematology, Department of Medicine, St. Michael’s Hospital, University of Toronto, Toronto, Ontario, Canada
| | - Rita Selby
- Department of Laboratory Medicine & Pathobiology and Department of Medicine, University Health Network and Sunnybrook Health Sciences Centre, University of Toronto, Toronto, Ontario, Canada
| | - Nicola Goldberg
- Department of Medicine, St. Michael’s Hospital, University of Toronto, Toronto, Ontario, Canada
| | - Hina Hanif
- Department of Laboratory Medicine & Pathobiology, St. Michael’s Hospital, University of Toronto, Toronto, Ontario, Canada
| | - Michelle Sholzberg
- Department of Medicine, St. Michael’s Hospital, University of Toronto, Toronto, Ontario, Canada
- Division of Hematology, Department of Medicine, St. Michael’s Hospital, University of Toronto, Toronto, Ontario, Canada
- Department of Laboratory Medicine & Pathobiology, St. Michael’s Hospital, University of Toronto, Toronto, Ontario, Canada
| |
Collapse
|
45
|
Oh SY, Cook DA, Van Gerven PWM, Nicholson J, Fairbrother H, Smeenk FWJM, Pusic MV. Physician Training for Electrocardiogram Interpretation: A Systematic Review and Meta-Analysis. ACADEMIC MEDICINE : JOURNAL OF THE ASSOCIATION OF AMERICAN MEDICAL COLLEGES 2022; 97:593-602. [PMID: 35086115 DOI: 10.1097/acm.0000000000004607] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
PURPOSE Using electrocardiogram (ECG) interpretation as an example of a widely taught diagnostic skill, the authors conducted a systematic review and meta-analysis to demonstrate how research evidence on instruction in diagnosis can be synthesized to facilitate improvement of educational activities (instructional modalities, instructional methods, and interpretation approaches), guide the content and specificity of such activities, and provide direction for research. METHOD The authors searched PubMed/MEDLINE, Embase, Cochrane CENTRAL, PsycInfo, CINAHL, ERIC, and Web of Science databases through February 21, 2020, for empirical investigations of ECG interpretation training enrolling medical students, residents, or practicing physicians. They appraised study quality with the Medical Education Research Study Quality Instrument and pooled standardized mean differences (SMDs) using random effects meta-analysis. RESULTS Of 1,002 articles identified, 59 were included (enrolling 17,251 participants). Among 10 studies comparing instructional modalities, 8 compared computer-assisted and face-to-face instruction, with pooled SMD 0.23 (95% CI, 0.09, 0.36) indicating a small, statistically significant difference favoring computer-assisted instruction. Among 19 studies comparing instructional methods, 5 evaluated individual versus group training (pooled SMD -0.35 favoring group study [95% CI, -0.63, -0.06]), 4 evaluated peer-led versus faculty-led instruction (pooled SMD 0.38 favoring peer instruction [95% CI, 0.01, 0.74]), and 4 evaluated contrasting ECG features (e.g., QRS width) from 2 or more diagnostic categories versus routine examination of features within a single ECG or diagnosis (pooled SMD 0.23 not significantly favoring contrasting features [95% CI, -0.30, 0.76]). Eight studies compared ECG interpretation approaches, with pooled SMD 0.92 (95% CI, 0.48, 1.37) indicating a large, statistically significant effect favoring more systematic interpretation approaches. CONCLUSIONS Some instructional interventions appear to improve learning in ECG interpretation; however, many evidence-based instructional strategies are insufficiently investigated. The findings may have implications for future research and design of training to improve skills in ECG interpretation and other types of visual diagnosis.
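Pooled SMDs like those quoted here typically come from a random-effects model; the DerSimonian-Laird estimator is the classic choice (the review does not state which estimator it used, so that is an assumption). A sketch on placeholder study-level effects:

```python
# Random-effects pooling of standardized mean differences
# (DerSimonian-Laird between-study variance estimator). Placeholder data.
import numpy as np

smd = np.array([0.31, 0.12, 0.45, 0.08, 0.27, 0.19, 0.33, 0.10])
var = np.array([0.02, 0.03, 0.05, 0.01, 0.04, 0.02, 0.06, 0.03])

w_fixed = 1.0 / var
mu_fixed = np.sum(w_fixed * smd) / w_fixed.sum()
q = np.sum(w_fixed * (smd - mu_fixed) ** 2)      # Cochran's Q
df = len(smd) - 1
c = w_fixed.sum() - (w_fixed**2).sum() / w_fixed.sum()
tau2 = max(0.0, (q - df) / c)                    # between-study variance

w = 1.0 / (var + tau2)                           # random-effects weights
pooled = np.sum(w * smd) / w.sum()
se = np.sqrt(1.0 / w.sum())
print(f"pooled SMD = {pooled:.2f} "
      f"(95% CI {pooled - 1.96*se:.2f}, {pooled + 1.96*se:.2f})")
```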
Collapse
Affiliation(s)
- So-Young Oh
- S.-Y. Oh is assistant director, Program for Digital Learning, Institute for Innovations in Medical Education, NYU Grossman School of Medicine, NYU Langone Health, New York, New York; ORCID: https://orcid.org/0000-0002-4640-3695
| | - David A Cook
- D.A. Cook is professor of medicine and medical education, director of education science, Office of Applied Scholarship and Education Science, research chair, Mayo Clinic Rochester Multidisciplinary Simulation Center, and consultant, Division of General Internal Medicine, Mayo Clinic College of Medicine and Science, Rochester, Minnesota; ORCID: https://orcid.org/0000-0003-2383-4633
| | - Pascal W M Van Gerven
- P.W.M. Van Gerven is associate professor, Department of Educational Development and Research, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands; ORCID: https://orcid.org/0000-0002-8363-2534
| | - Joseph Nicholson
- J. Nicholson is director, NYU Health Sciences Library, NYU Grossman School of Medicine, NYU Langone Health, New York, New York
| | - Hilary Fairbrother
- H. Fairbrother is associate professor, Department of Emergency Medicine, Memorial Hermann-Texas Medical Center, Houston, Texas
| | - Frank W J M Smeenk
- F.W.J.M. Smeenk is professor, Department of Educational Development and Research, Maastricht University, Maastricht, and respiratory specialist, Catharina Hospital, Eindhoven, The Netherlands
| | - Martin V Pusic
- M.V. Pusic is associate professor of pediatrics and associate professor of emergency medicine, Harvard Medical School, Boston, Massachusetts; ORCID: https://orcid.org/0000-0001-5236-6598
| |
Collapse
|
46
|
Cook DA, Oh SY, Pusic MV. Assessments of Physicians' Electrocardiogram Interpretation Skill: A Systematic Review. ACADEMIC MEDICINE : JOURNAL OF THE ASSOCIATION OF AMERICAN MEDICAL COLLEGES 2022; 97:603-615. [PMID: 33913438 DOI: 10.1097/acm.0000000000004140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
PURPOSE To identify features of instruments, test procedures, study design, and validity evidence in published studies of electrocardiogram (ECG) skill assessments. METHOD The authors conducted a systematic review, searching MEDLINE, Embase, Cochrane CENTRAL, PsycINFO, CINAHL, ERIC, and Web of Science databases in February 2020 for studies that assessed the ECG interpretation skill of physicians or medical students. Two authors independently screened articles for inclusion and extracted information on test features, study design, risk of bias, and validity evidence. RESULTS The authors found 85 eligible studies. Participants included medical students (42 studies), postgraduate physicians (48 studies), and practicing physicians (13 studies). ECG selection criteria were infrequently reported: 25 studies (29%) selected single-diagnosis or straightforward ECGs; 5 (6%) selected complex cases. ECGs were selected by generalists (15 studies [18%]), cardiologists (10 studies [12%]), or unspecified experts (4 studies [5%]). The median number of ECGs per test was 10. The scoring rubric was defined by 2 or more experts in 32 studies (38%), by 1 expert in 5 (6%), and using clinical data in 5 (6%). Scoring was performed by a human rater in 34 studies (40%) and by computer in 7 (8%). Study methods were appraised as low risk of selection bias in 16 studies (19%), participant flow bias in 59 (69%), instrument conduct and scoring bias in 20 (24%), and applicability problems in 56 (66%). Evidence of test score validity was reported infrequently, namely evidence of content (39 studies [46%]), internal structure (11 [13%]), relations with other variables (10 [12%]), response process (2 [2%]), and consequences (3 [4%]). CONCLUSIONS ECG interpretation skill assessments consist of idiosyncratic instruments that are too short, composed of items of obscure provenance, with incompletely specified answers, graded by individuals with underreported credentials, yielding scores with limited interpretability. The authors suggest several best practices.
Collapse
Affiliation(s)
- David A Cook
- D.A. Cook is professor of medicine and medical education, director of education science, Office of Applied Scholarship and Education Science, research chair, Mayo Clinic Rochester Multidisciplinary Simulation Center, and consultant, Division of General Internal Medicine, Mayo Clinic College of Medicine and Science, Rochester, Minnesota; ORCID: https://orcid.org/0000-0003-2383-4633
| | - So-Young Oh
- S.-Y. Oh is assistant director, Program for Digital Learning, Institute for Innovations in Medical Education, NYU Grossman School of Medicine, NYU Langone Health, New York, New York; ORCID: https://orcid.org/0000-0002-4640-3695
| | - Martin V Pusic
- M.V. Pusic is associate professor of emergency medicine and pediatrics, Department of Emergency Medicine, NYU Grossman School of Medicine, New York, New York; ORCID: https://orcid.org/0000-0001-5236-6598
| |
Collapse
|
47
|
Cullen MW, Klarich KW, Baldwin KM, Engstler GJ, Mandrekar J, Scott CG, Beckman TJ. Validity of a cardiology fellow performance assessment: reliability and associations with standardized examinations and awards. BMC MEDICAL EDUCATION 2022; 22:177. [PMID: 35291995 PMCID: PMC8925146 DOI: 10.1186/s12909-022-03239-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/02/2021] [Accepted: 03/03/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND Most work on the validity of clinical assessments for measuring learner performance in graduate medical education has occurred at the residency level. Minimal research exists on the validity of clinical assessments for measuring learner performance in advanced subspecialties. We sought to determine validity characteristics of cardiology fellows' assessment scores during subspecialty training, which represents the largest subspecialty of internal medicine. Validity evidence included item content, internal consistency reliability, and associations between faculty-of-fellow clinical assessments and other pertinent variables. METHODS This was a retrospective validation study exploring the domains of content, internal structure, and relations to other variables validity evidence for scores on faculty-of-fellow clinical assessments that include the 10-item Mayo Cardiology Fellows Assessment (MCFA-10). Participants included 7 cardiology fellowship classes. The MCFA-10 item content included questions previously validated in the assessment of internal medicine residents. Internal structure evidence was assessed through Cronbach's α. The outcome for relations to other variables evidence was overall mean of faculty-of-fellow assessment score (scale 1-5). Independent variables included common measures of fellow performance. FINDINGS Participants included 65 cardiology fellows. The overall mean ± standard deviation faculty-of-fellow assessment score was 4.07 ± 0.18. Content evidence for the MCFA-10 scores was based on published literature and core competencies. Cronbach's α was 0.98, suggesting high internal consistency reliability and offering evidence for internal structure validity. In multivariable analysis to provide relations to other variables evidence, mean assessment scores were independently associated with in-training examination scores (beta = 0.088 per 10-point increase; p = 0.05) and receiving a departmental or institutional award (beta = 0.152; p = 0.001). Assessment scores were not associated with educational conference attendance, compliance with completion of required evaluations, faculty appointment upon completion of training, or performance on the board certification exam. R2 for the multivariable model was 0.25. CONCLUSIONS These findings provide sound validity evidence establishing item content, internal consistency reliability, and associations with other variables for faculty-of-fellow clinical assessment scores that include MCFA-10 items during cardiology fellowship. Relations to other variables evidence included associations of assessment scores with performance on the in-training examination and receipt of competitive awards. These data support the utility of the MCFA-10 as a measure of performance during cardiology training and could serve as the foundation for future research on the assessment of subspecialty learners.
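The multivariable model described here regresses mean assessment score on candidate predictors such as in-training examination score and an award indicator. The sketch below reproduces only the shape of that analysis on fabricated data; coefficients and variable names are illustrative, not the study's.

```python
# Least-squares fit of score ~ intercept + ITE score + award indicator,
# on fabricated data mimicking the model's structure.
import numpy as np

rng = np.random.default_rng(2)
n = 65
ite = rng.normal(500, 60, n)           # in-training examination score
award = rng.integers(0, 2, n)          # received an award (0/1)
score = 3.5 + 0.0009 * ite + 0.15 * award + rng.normal(0, 0.12, n)

X = np.column_stack([np.ones(n), ite, award])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
print(f"intercept = {beta[0]:.3f}, ITE beta = {beta[1]*10:.3f} per 10 points, "
      f"award beta = {beta[2]:.3f}")
```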
Collapse
Affiliation(s)
- Michael W Cullen
- Department of Cardiovascular Medicine, Mayo Clinic, 200 First St. SW, Rochester, Minnesota, 55905, USA.
| | - Kyle W Klarich
- Department of Cardiovascular Medicine, Mayo Clinic, 200 First St. SW, Rochester, Minnesota, 55905, USA
| | - Kristine M Baldwin
- Department of Cardiovascular Medicine, Mayo Clinic, 200 First St. SW, Rochester, Minnesota, 55905, USA
| | - Gregory J Engstler
- Department of Information Services, Mayo Clinic, 200 First St. SW, Rochester, Minnesota, 55905, USA
| | - Jay Mandrekar
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, 200 First St. SW, Rochester, Minnesota, 55905, USA
| | - Christopher G Scott
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, 200 First St. SW, Rochester, Minnesota, 55905, USA
| | - Thomas J Beckman
- Division of General Internal Medicine, Department of Internal Medicine, Mayo Clinic, 200 First St. SW, Rochester, Minnesota, 55905, USA
| |
Collapse
|
48
|
Ensuring competence in ultrasound-guided procedures-a validity study of a newly developed assessment tool. Eur Radiol 2022; 32:4954-4966. [PMID: 35195745 DOI: 10.1007/s00330-022-08542-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2021] [Revised: 11/17/2021] [Accepted: 12/16/2021] [Indexed: 11/04/2022]
Abstract
OBJECTIVES To investigate the validity of the Interventional Ultrasound Skills Evaluation (IUSE) tool for assessment of procedural competence in ultrasound-guided procedures in a clinical environment, including a pass/fail score. METHODS Novices and experienced radiologists were recruited from four hospitals and were observed and assessed while performing ultrasound-guided procedures. Performances were assessed using the IUSE tool by two independent raters. Validity evidence was gathered in accordance with Messick's framework: response process was ensured by standardisation of written rater instructions. Internal structure was explored using Cronbach's alpha for internal consistency reliability; inter-rater reliability was calculated as Pearson's r independently across all ratings, and test-retest reliability was reported using Cronbach's alpha. Relationship to other variables was investigated by comparing the performances of the participants in each group. Consequences evidence was explored by calculating a pass/fail standard using the contrasting groups method. RESULTS Six novices and twelve experienced radiologists were enrolled. The IUSE tool had high internal consistency (Cronbach's alpha = 0.96), high inter-rater reliability (Pearson's r = 0.95), and high test-retest reliability (Cronbach's alpha = 0.98); the mean score was 33.28 for novices and 59.25 for experienced radiologists, a highly significant difference (p < 0.001). The pass/fail score was set at 55, resulting in no false positives or false negatives. CONCLUSIONS Validity evidence from multiple sources supports the use of the IUSE tool for assessment of competence in ultrasound-guided procedures in a clinical environment, including its use in high-stakes assessment such as certification. A credible pass/fail criterion was established to inform decision-making. KEY POINTS • A multi-site validity investigation established that the Interventional Ultrasound Skills Evaluation (IUSE) tool can be used to assess procedural competence in ultrasound-guided procedures. • Validity evidence was gathered according to Messick's framework from the following sources: response process, internal structure, relationship to other variables, and consequences evidence. • The IUSE tool can be used for both formative and summative assessment, and a credible pass/fail score was established to help inform decision-making such as certification.
Collapse
|
49
|
Validity evidence for the Anesthesia Clinical Encounter Assessment (ACEA) tool to support competency-based medical education. Br J Anaesth 2022; 128:691-699. [PMID: 35027168 DOI: 10.1016/j.bja.2021.12.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2021] [Revised: 11/23/2021] [Accepted: 12/10/2021] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Workplace-based assessment (WBA) is key to a competency-based assessment strategy. Concomitantly with our programme's launch of competency-based medical education, we developed an entrustment-based WBA, the Anesthesia Clinical Encounter Assessment (ACEA), to assess readiness for independent practice of competencies essential to perioperative patient care. This study aimed to examine validity evidence of the ACEA during postgraduate anaesthesiology training. METHODS The ACEA comprises an eight-item global rating scale (GRS), an overall independence rating, an eight-item checklist, and case details. ACEA data were extracted for University of Toronto anaesthesia residents from July 2017 to January 2020 from the programme's online assessment portal. Validity evidence was generated following Messick's validity framework, including response process, internal structure, relations with other variables, and consequences. RESULTS We analysed 8664 assessments for 137 residents completed by 342 assessors. From generalisability analysis, 10 independent observations (two assessments each from five assessors) were sufficient to achieve a reliability threshold of ≥0.70 for in-training assessments. A composite GRS score of 3.65/5 provided optimal sensitivity (93.6%) and specificity (90.8%) for determining entrustment on receiver operating characteristic curve analysis. Test-retest reliability was high (intraclass correlation coefficient [ICC2,1]=0.81) for matched assessments within 14 days of each other. Composite GRS scores differed significantly between residents based on their training level (P<0.0001) and correlated highly with overall independence (0.91, P<0.001). The internal consistency of the GRS (α=0.96) was excellent. CONCLUSIONS This study supports the validity of the ACEA for assessing the competence of residents performing perioperative care and supports its use in competency-based anaesthesiology training.
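One way to pick a cut score with paired sensitivity/specificity like the values quoted here is to scan thresholds and maximize Youden's J (sensitivity + specificity - 1). Whether this study used Youden's J is not stated, so treat the criterion below as an assumption; the scores and entrustment labels are simulated.

```python
# ROC-style threshold search: choose the composite score cut that maximizes
# Youden's J against a binary entrustment label. All data are simulated.
import numpy as np

rng = np.random.default_rng(3)
entrusted = np.r_[np.ones(60), np.zeros(60)].astype(bool)
scores = np.r_[rng.normal(4.2, 0.35, 60), rng.normal(3.1, 0.40, 60)]

best_j, best_t = -1.0, None
for t in np.linspace(scores.min(), scores.max(), 200):
    sens = np.mean(scores[entrusted] >= t)   # true-positive rate at cut t
    spec = np.mean(scores[~entrusted] < t)   # true-negative rate at cut t
    if sens + spec - 1 > best_j:
        best_j, best_t = sens + spec - 1, t

print(f"optimal composite GRS cut ~ {best_t:.2f} (Youden J = {best_j:.2f})")
```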
Collapse
|
50
|
Kalet A, Ark TK, Monson V, Song HS, Buckvar-Keltz L, Harnik V, Yingling S, Rivera R, Tewksbury L, Lusk P, Crowe R. Does a measure of Medical Professional Identity Formation predict communication skills performance? PATIENT EDUCATION AND COUNSELING 2021; 104:3045-3052. [PMID: 33896685 DOI: 10.1016/j.pec.2021.03.040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/20/2020] [Revised: 03/05/2021] [Accepted: 03/31/2021] [Indexed: 06/12/2023]
Abstract
OBJECTIVE To validate an approach to measuring professional identity formation (PIF), we explore whether the Professional Identity Essay (PIE), a stage score measure of medical professional identity (PI), predicts clinical communication skills. METHODS Students completed the PIE during medical school orientation and a 3-case Objective Structured Clinical Exam (OSCE) in which standardized patients reliably assessed communication skills in 5 domains. Using mediation analyses, relationships between PIE stage scores and communication skills were explored. RESULTS For the 351 (89%) consenting students, controlling for individual characteristics, there were increases in patient counseling (6.5%, p<0.01), information gathering (4.3%, p = 0.01), organization and management (4.1%, p = 0.02), patient assessment (3.6%, p = 0.04), and relationship development (3.5%, p = 0.03) skills for every half-stage increase in PIE score. The communication skills of lower socio-economic status (SES) students are indirectly impacted by their slightly higher PIE stage scores. CONCLUSION Higher PIE stage scores are associated with higher communication skills and lower SES. PRACTICE IMPLICATIONS PIE predicts critical clinical skills and identifies how SES and other characteristics indirectly impact future clinical performance, providing validity evidence for using PIE as a tool in longitudinal formative academic coaching, program and curriculum evaluation, and research.
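The mediation analysis mentioned here decomposes the SES effect on communication skills into a direct path and an indirect path through PIE stage score. The sketch below bootstraps a product-of-coefficients indirect effect on invented data; it mirrors the general technique, not the authors' actual model.

```python
# Product-of-coefficients mediation with a bootstrap CI for the indirect
# effect (SES -> PIE -> skill). All variables and effect sizes are invented.
import numpy as np

rng = np.random.default_rng(4)
n = 351
ses = rng.integers(0, 2, n)                         # 1 = lower SES
pie = 3.0 + 0.2 * ses + rng.normal(0, 0.5, n)       # mediator: PIE stage score
skill = 60 + 4.0 * pie + rng.normal(0, 5, n)        # outcome: OSCE score

def indirect(idx):
    a = np.polyfit(ses[idx], pie[idx], 1)[0]        # path a: SES -> PIE
    X = np.column_stack([np.ones(len(idx)), pie[idx], ses[idx]])
    b = np.linalg.lstsq(X, skill[idx], rcond=None)[0][1]  # path b: PIE -> skill
    return a * b

boot = [indirect(rng.integers(0, n, n)) for _ in range(1000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect 95% CI: ({lo:.2f}, {hi:.2f})")
```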
Collapse
Affiliation(s)
- Adina Kalet
- Department of Medicine, NYU School of Medicine, New York, USA; Program on Medical Education and Technology, NYU School of Medicine, New York, USA; Robert D. and Patricia E. Kern Institute for the Transformation of Medical Education, Medical College of Wisconsin.
| | - Tavinder K Ark
- School of Public Health, University of British Columbia, Vancouver, Canada; Robert D. and Patricia E. Kern Institute for the Transformation of Medical Education, Medical College of Wisconsin
| | - Verna Monson
- Private Consultant; Robert D. and Patricia E. Kern Institute for the Transformation of Medical Education, Medical College of Wisconsin
| | - Hyuksoon S Song
- Department of Education, Georgian Court University, Lakewood, USA
| | - Lynn Buckvar-Keltz
- Department of Medicine, NYU School of Medicine, New York, USA; Office of Student Affairs, NYU School of Medicine, New York, USA; Program on Medical Education and Technology, NYU School of Medicine, New York, USA
| | - Victoria Harnik
- Office of Medical Education, NYU School of Medicine, New York, USA
| | - Sandra Yingling
- Department of Medical Education, University of Illinois-Chicago, Chicago, USA
| | - Rafael Rivera
- Office of Admissions and Financial Aid, NYU School of Medicine, New York, USA
| | - Linda Tewksbury
- Office of Student Affairs, NYU School of Medicine, New York, USA
| | - Penelope Lusk
- Department of Medicine, NYU School of Medicine, New York, USA
| | - Ruth Crowe
- Office of Medical Education, NYU School of Medicine, New York, USA
| |
Collapse
|