1
Hall AM, Gray A, Ragsdale JW. Making narrative feedback meaningful. Clinical Teacher 2024;21:e13766. PMID: 38651603. DOI: 10.1111/tct.13766.
Abstract
BACKGROUND Narrative written feedback given to students by faculty often fails to identify areas for improvement and recommended actions to lead to this improvement. When these elements are missing, it is challenging for students to improve and for medical schools to use narrative feedback in promotion decisions, to guide coaching plans and to pass on meaningful information to residency programs. Large-group faculty development has improved narrative written feedback, but less is known about individualised faculty development to supplement large-group sessions. To fill this gap, we built a curriculum with general and individualised faculty development to improve narrative written feedback from Internal Medicine faculty to clerkship students. APPROACH We used Kern's steps to build a curriculum with general and individualised one-on-one faculty development to address the problem of inadequate narrative written feedback. We used a novel narrative feedback rubric for pre- and post-intervention faculty scores. RESULTS/FINDINGS/EVALUATION Through general and individualised one-on-one faculty development with peer comparison scores, we were able to improve narrative written feedback from 3.7/6 to 4.6/6, for an increase of 23%. IMPLICATIONS We found our faculty development program effective in improving feedback and easy to implement. Our rubric was easy to use, and faculty were receptive to feedback in one-on-one meetings. We plan to extend this work locally to other divisions/departments and into graduate medical education; it should also be easily extended to other medical disciplines or health professions.
Affiliation(s)
- Alan M Hall: Departments of Internal Medicine and Pediatrics, University of Kentucky College of Medicine, Lexington, Kentucky, USA
- Adam Gray: Department of Internal Medicine, University of Kentucky College of Medicine, Lexington, Kentucky, USA
- John W Ragsdale: Department of Internal Medicine, University of Kentucky College of Medicine, Lexington, Kentucky, USA
2
Sekar DR, Ehrenberger KA, Dakroub A, Rothenberger S, Grau T, Carter AE. What/Why/When/Where/How Framework and Faculty Development Workshop to Improve the Utility of Narrative Evaluations for Assessing Internal Medicine Residents. MedEdPORTAL 2024;20:11420. PMID: 39081631. PMCID: PMC11286767. DOI: 10.15766/mep_2374-8265.11420.
Abstract
Introduction Clinical competency committees (CCCs) rely on narrative evaluations to assess resident competency. Despite the emphasis on these evaluations, their utility is frequently hindered by lack of sufficient detail for use by CCCs. Prior resources have sought to improve specificity of comments and use of evaluations by residents but not their utility for CCCs in assessing trainee performance. Methods We developed a 1-hour faculty development workshop focused on a newly devised framework for Department of Medicine faculty supervising internal medicine residents. The what/why/when/where/how framework highlighted key features of useful narrative evaluations: behaviors of strength and growth, contextualized observations, improvement over time, and actionable next steps. Workshop sessions were implemented at a large multisite internal medicine residency program. We assessed the workshop by measuring attendee confidence and skill in writing narrative evaluations useful for CCCs. Skill was assessed through a rubric adapted from literature on the utility of narrative evaluations. Results Fifty-four participants started the presurvey, and 33 completed the workshop, for a response rate of 61%. Participant confidence improved pre-, post-, and 3 months postworkshop. Total utility scores improved in mock evaluations from 12.4 to 15.5 and in real evaluations from 13.7 to 15.0, but only some subcomponent scores improved, with fewer improving in the real evaluations. Discussion A short workshop focusing on our framework improves confidence and utility of narrative evaluations of internal medicine residents for use by CCCs. Next steps should include developing more challenging components of narrative evaluations for continued improvement in trainee performance and faculty assessment.
Affiliation(s)
- Dheepa R. Sekar: Assistant Professor, Division of General Internal Medicine and Geriatrics, Department of Medicine, Emory University School of Medicine
- Kristen Ann Ehrenberger: Assistant Professor, Division of General Internal Medicine, Department of Medicine and Department of Pediatrics, University of Pittsburgh School of Medicine
- Allie Dakroub: Assistant Professor, Division of General Internal Medicine, Department of Medicine and Department of Pediatrics, University of Pittsburgh School of Medicine
- Scott Rothenberger: Assistant Professor, Division of General Internal Medicine, Department of Medicine, University of Pittsburgh School of Medicine
- Thomas Grau: Associate Professor, Division of General Internal Medicine, Department of Medicine, University of Pittsburgh School of Medicine; Associate Chief of Staff of Education, VA Pittsburgh Healthcare System
- Andrea E. Carter: Assistant Professor, Division of General Internal Medicine, Department of Medicine, University of Pittsburgh School of Medicine
3
Birman NA, Vashdi DR, Miller-Mor Atias R, Riskin A, Zangen S, Litmanovitz I, Sagi D. Unveiling the paradoxes of implementing post graduate competency based medical education programs. Medical Teacher 2024:1-8. PMID: 38803298. DOI: 10.1080/0142159x.2024.2356826.
Abstract
PURPOSE Competency-based medical education (CBME) has gained prominence as an innovative model for post-graduate medical education, yet its implementation poses significant challenges, especially with regard to its sustainability. Drawing on paradox theory, we suggest that revealing the paradoxes underlying these challenges may contribute to our understanding of post-graduate competency-based medical education (PGCBME) implementation processes and serve as a first step towards better implementation. Thus, the purpose of the current study is to identify the paradoxes associated with PGCBME implementation. METHOD A qualitative study was conducted, as part of a larger action research project, using in-depth semi-structured interviews with fellows and educators in eight neonatal wards. RESULTS Analysis revealed that the PGCBME program examined in this study involves three different levels of standardization, each serving as one side of a paradoxical tension: (1) a paradox between the need for standardized assessment tools and the need for free-flow, flexible assessment tools; (2) a paradox between the need for a standardized implementation process across all wards and the need for unique implementation protocols in each ward; and (3) a paradox between the need for a standardized meaning of competency proficiency and the need for flexible and personal competency achievement indicators. CONCLUSIONS Implementing PGCBME programs involves many challenges, some of which are paradoxical, i.e. two contradictory challenges in which solving one challenge exacerbates the other. Revealing these paradoxes is important in navigating them successfully.
Affiliation(s)
- Noa A Birman: University of Haifa, The Herta and Paul Amir Faculty of Social Sciences, School of Political Science, Department of Public Administration, Haifa, Israel
- Dana R Vashdi: University of Haifa, The Herta and Paul Amir Faculty of Social Sciences, School of Political Science, Department of Public Administration, Haifa, Israel
- Rotem Miller-Mor Atias: University of Haifa, The Herta and Paul Amir Faculty of Social Sciences, School of Political Science, Department of Public Administration, Haifa, Israel
- Arieh Riskin: Technion Israel Institute of Technology, The Ruth and Bruce Rappaport Faculty of Medicine, Haifa, Israel
- Shmuel Zangen: Ben-Gurion University of the Negev, Faculty of Health Sciences, Be'er-Sheva, Israel
- Ita Litmanovitz: Tel Aviv University, Faculty of Medicine & Health Sciences, Tel-Aviv, Israel
- Doron Sagi: The Israel Center for Medical Simulation, Sheba Medical Center, Tel-Hashomer, Ramat-Gan, Israel
4
Van Ostaeyen S, Embo M, Rotsaert T, De Clercq O, Schellens T, Valcke M. A Qualitative Textual Analysis of Feedback Comments in ePortfolios: Quality and Alignment with the CanMEDS Roles. Perspectives on Medical Education 2023;12:584-593. PMID: 38144672. PMCID: PMC10742175. DOI: 10.5334/pme.1050.
Abstract
Introduction Competency-based education requires high-quality feedback to guide students' acquisition of competencies. Sound assessment and feedback systems, such as ePortfolios, are needed to facilitate seeking and giving feedback during clinical placements. However, it is unclear whether the written feedback comments in ePortfolios are of high quality and aligned with the current competency focus. Therefore, this study investigates the quality of written feedback comments in ePortfolios of healthcare students, as well as how these feedback comments align with the CanMEDS roles. Methods A qualitative textual analysis was conducted. 2,349 written feedback comments retrieved from the ePortfolios of 149 healthcare students (specialist medicine, general practice, occupational therapy, speech therapy and midwifery) were analysed retrospectively using deductive content analysis. Two structured categorisation matrices, one based on four literature-derived feedback quality criteria (performance, judgment, elaboration and improvement) and another one on the seven CanMEDS roles (Medical Expert, Communicator, Collaborator, Leader, Health Advocate, Scholar and Professional), guided the analysis. Results The minority of the feedback comments (n = 352; 14.9%) could be considered of high quality because they met all four quality criteria. Most feedback comments were of moderate quality and met only two to three quality criteria. Regarding the CanMEDS roles, the Medical Expert role was most frequently represented in the feedback comments, as opposed to the roles Leader and Health Advocate. Discussion The results highlighted that providing high-quality feedback is challenging. To respond to these challenges, it is recommended to set up individual and continuous feedback training.
Affiliation(s)
- Sofie Van Ostaeyen: Department of Educational Sciences at Ghent University, Belgium
- Mieke Embo: Department of Nursing and Midwifery at the University of Antwerp, Belgium; Department of Educational Sciences at Ghent University and the Expertise Network Health and Care at the Artevelde University of Applied Sciences, Belgium
- Tijs Rotsaert: Department of Educational Sciences at Ghent University, Belgium
- Orphée De Clercq: Language and Translation Technology Team at Ghent University, Belgium
- Tammy Schellens: Department of Educational Sciences at Ghent University, Belgium
- Martin Valcke: Department of Educational Sciences at Ghent University, Belgium
5
Mooney CJ, Stone RT, Wang L, Blatt AE, Pascoe JM, Lang VJ. Examining Generalizability of Faculty Members' Narrative Assessments. Academic Medicine 2023;98:S210. PMID: 37983456. DOI: 10.1097/acm.0000000000005417.
Affiliation(s)
- C.J. Mooney, R.T. Stone, L. Wang, A.E. Blatt, J.M. Pascoe, V.J. Lang: University of Rochester School of Medicine and Dentistry
6
Renting N, Jaarsma D, Borleffs JC, Slaets JPJ, Cohen-Schotanus J, Gans ROB. Effectiveness of a supervisor training on quality of feedback to internal medicine residents: a controlled longitudinal multicentre study. BMJ Open 2023;13:e076946. PMID: 37770280. PMCID: PMC10546104. DOI: 10.1136/bmjopen-2023-076946.
Abstract
OBJECTIVES High-quality feedback on different dimensions of competence is important for resident learning. Supervisors may need additional training and information to fulfil this demanding task. This study aimed to evaluate whether a short and simple training improves the quality of feedback residents receive from their clinical supervisors in daily practice. DESIGN Longitudinal quasi-experimental controlled study with a pretest/post-test design. We collected multiple premeasurements and postmeasurements for each supervisor over 2 years. A repeated measurements ANOVA was performed on the data. SETTING Internal medicine departments of seven Dutch teaching hospitals. PARTICIPANTS Internal medicine supervisors (n=181) and residents (n=192). INTERVENTION Half of the supervisors attended a short 2.5-hour training session during which they could practise giving feedback in a simulated setting using video fragments. Highly experienced internal medicine educators guided the group discussions about the feedback. The other half of the supervisors formed the control group and received no feedback training. OUTCOME MEASURES Residents rated the quality of supervisors' oral feedback with a previously validated questionnaire. Furthermore, the completeness of the supervisors' written feedback on evaluation forms was analysed. RESULTS The data showed a significant increase in the quality of feedback after the training F (1, 87)=6.76, p=0.04. This effect remained significant up to 6 months after the training session. CONCLUSIONS A short training session in which supervisors practise giving feedback in a simulated setting increases the quality of their feedback. This is a promising outcome since it is a feasible approach to faculty development.
Affiliation(s)
- Nienke Renting: Faculty of Behavioral & Social Sciences, GION, University of Groningen, Groningen, The Netherlands
- Debbie Jaarsma: Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Jan CC Borleffs: Center for Education Development and Research in Health Professions, University Medical Center Groningen, Groningen, The Netherlands
- Joris P J Slaets: Geriatric Medicine, Leyden Academy on Vitality and Ageing, Leiden, The Netherlands
- Janke Cohen-Schotanus: Center for Education Development and Research in Health Professions, University Medical Center Groningen, Groningen, The Netherlands
- Rob O B Gans: Internal Medicine, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
7
Hauer KE, Park YS, Bullock JL, Tekian A. "My Assessments Are Biased!" Measurement and Sociocultural Approaches to Achieve Fairness in Assessment in Medical Education. Academic Medicine 2023;98:S16-S27. PMID: 37094278. DOI: 10.1097/acm.0000000000005245.
Abstract
Assessing learners is foundational to their training and developmental growth throughout the medical education continuum. However, growing evidence shows the prevalence and impact of harmful bias in assessments in medical education, accelerating the urgency to identify solutions. Assessment bias presents a critical problem for all stages of learning and the broader educational system. Bias poses significant challenges to learners, disrupts the learning environment, and threatens the pathway and transition of learners into health professionals. While the topic of assessment bias has been examined within the context of measurement literature, limited guidance and solutions exist for learners in medical education, particularly in the clinical environment. This article presents an overview of assessment bias, focusing on clinical learners. A definition of bias and its manifestations in assessments are presented. Consequences of assessment bias are discussed within the contexts of validity and fairness and their impact on learners, patients/caregivers, and the broader field of medicine. Messick's unified validity framework is used to contextualize assessment bias; in addition, perspectives from sociocultural contexts are incorporated into the discussion to elaborate the nuanced implications in the clinical training environment. Discussions of these topics are conceptualized within the literature and the interventions used to date. The article concludes with practical recommendations to overcome bias and to develop an ideal assessment system. Recommendations address articulating values to guide assessment, designing assessment to foster learning and outcomes, attending to assessment procedures, promoting continuous quality improvement of assessment, and fostering equitable learning and assessment environments.
Affiliation(s)
- Karen E Hauer: associate dean for competency assessment and professional standards, and professor, Department of Medicine, University of California, San Francisco School of Medicine, San Francisco, California; ORCID: http://orcid.org/0000-0002-8812-4045
- Yoon Soo Park: associate professor and associate head, Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois; ORCID: http://orcid.org/0000-0001-8583-4335
- Justin L Bullock: fellow, Department of Medicine, Division of Nephrology, University of Washington School of Medicine, Seattle, Washington; ORCID: http://orcid.org/0000-0003-4240-9798
- Ara Tekian: professor and associate dean for international education, Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois; ORCID: http://orcid.org/0000-0002-9252-1588
8
Chakroun M, Dion VR, Ouellet K, Graillon A, Désilets V, Xhignesse M, St-Onge C. Quality of Narratives in Assessment: Piloting a List of Evidence-Based Quality Indicators. Perspectives on Medical Education 2023;12:XX. PMID: 37252269. PMCID: PMC10215990. DOI: 10.5334/pme.925.
Abstract
Background & Need for Innovation Appraising the quality of narratives used in assessment is challenging for educators and administrators. Although some quality indicators for writing narratives exist in the literature, they remain context specific and not always sufficiently operational to be easily used. Creating a tool that gathers applicable quality indicators and ensuring its standardized use would equip assessors to appraise the quality of narratives. Steps taken for Development and Implementation of innovation We used DeVellis' framework to develop a checklist of evidence-informed indicators for quality narratives. Two team members independently piloted the checklist using four series of narratives coming from three different sources. After each series, team members documented their agreement and achieved a consensus. We calculated frequencies of occurrence for each quality indicator as well as the interrater agreement to assess the standardized application of the checklist. Outcomes of Innovation We identified seven quality indicators and applied them on narratives. Frequencies of quality indicators ranged from 0% to 100%. Interrater agreement ranged from 88.7% to 100% for the four series. Critical Reflection Although we were able to achieve a standardized application of a list of quality indicators for narratives used in health sciences education, it does not exclude the fact that users would need training to be able to write good quality narratives. We also noted that some quality indicators were less frequent than others and we suggested a few reflections on this.
Affiliation(s)
- Molk Chakroun: Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Québec, CA
- Vincent R. Dion: Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Québec, CA
- Kathleen Ouellet: Paul Grand'Maison de la Société des médecins de l'Université de Sherbrooke Research Chair in Medical Education, Sherbrooke, Québec, CA
- Ann Graillon: Centre de pédagogie et des sciences de la santé, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Québec, CA
- Valérie Désilets: Department of Pediatrics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Québec, CA
- Marianne Xhignesse: Department of Family and Emergency Medicine, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Québec, CA
- Christina St-Onge: Department of Medicine, Faculty of Medicine and Health Sciences, Université de Sherbrooke, and Paul Grand'Maison de la Société des médecins de l'Université de Sherbrooke Research Chair in Medical Education, Sherbrooke, Québec, CA
9
Kogan JR, Dine CJ, Conforti LN, Holmboe ES. Can Rater Training Improve the Quality and Accuracy of Workplace-Based Assessment Narrative Comments and Entrustment Ratings? A Randomized Controlled Trial. Academic Medicine 2023;98:237-247. PMID: 35857396. DOI: 10.1097/acm.0000000000004819.
Abstract
PURPOSE Prior research evaluating workplace-based assessment (WBA) rater training effectiveness has not measured improvement in narrative comment quality and accuracy, nor accuracy of prospective entrustment-supervision ratings. The purpose of this study was to determine whether rater training, using performance dimension and frame of reference training, could improve WBA narrative comment quality and accuracy. A secondary aim was to assess impact on entrustment rating accuracy. METHOD This single-blind, multi-institution, randomized controlled trial of a multifaceted, longitudinal rater training intervention consisted of in-person training followed by asynchronous online spaced learning. In 2018, investigators randomized 94 internal medicine and family medicine physicians involved with resident education. Participants assessed 10 scripted standardized resident-patient videos at baseline and follow-up. Differences in holistic assessment of narrative comment accuracy and specificity, accuracy of individual scenario observations, and entrustment rating accuracy were evaluated with t tests. Linear regression assessed impact of participant demographics and baseline performance. RESULTS Seventy-seven participants completed the study. At follow-up, the intervention group (n = 41), compared with the control group (n = 36), had higher scores for narrative holistic specificity (2.76 vs 2.31, P < .001, Cohen V = .25), accuracy (2.37 vs 2.06, P < .001, Cohen V = .20) and mean quantity of accurate (6.14 vs 4.33, P < .001), inaccurate (3.53 vs 2.41, P < .001), and overall observations (2.61 vs 1.92, P = .002, Cohen V = .47). In aggregate, the intervention group had more accurate entrustment ratings (58.1% vs 49.7%, P = .006, Phi = .30). Baseline performance was significantly associated with performance on final assessments. CONCLUSIONS Quality and specificity of narrative comments improved with rater training; the effect was mitigated by inappropriate stringency. Training improved accuracy of prospective entrustment-supervision ratings, but the effect was more limited. Participants with lower baseline rating skill may benefit most from training.
Affiliation(s)
- Jennifer R Kogan: associate dean, Student Success and Professional Development, and professor of medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania; ORCID: https://orcid.org/0000-0001-8426-9506
- C Jessica Dine: associate dean, Evaluation and Assessment, and associate professor of medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania; ORCID: https://orcid.org/0000-0001-5894-0861
- Lisa N Conforti: research associate for milestones evaluation, Accreditation Council for Graduate Medical Education, Chicago, Illinois; ORCID: https://orcid.org/0000-0002-7317-6221
- Eric S Holmboe: chief, research, milestones development and evaluation, Accreditation Council for Graduate Medical Education, Chicago, Illinois; ORCID: https://orcid.org/0000-0003-0108-6021
10
Mooney CJ, Pascoe JM, Blatt AE, Lang VJ, Kelly MS, Braun MK, Burch JE, Stone RT. Predictors of faculty narrative evaluation quality in medical school clerkships. Medical Education 2022;56:1223-1231. PMID: 35950329. DOI: 10.1111/medu.14911.
Abstract
INTRODUCTION Narrative approaches to assessment provide meaningful and valid representations of trainee performance. Yet, narratives are frequently perceived as vague, nonspecific and low quality. To date, there is little research examining factors associated with narrative evaluation quality, particularly in undergraduate medical education. The purpose of this study was to examine associations of faculty- and student-level characteristics with the quality of faculty member's narrative evaluations of clerkship students. METHODS The authors reviewed faculty narrative evaluations of 50 students' clinical performance in their inpatient medicine and neurology clerkships, resulting in 165 and 87 unique evaluations in the respective clerkships. The authors evaluated narrative quality using the Narrative Evaluation Quality Instrument (NEQI). The authors used linear mixed effects modelling to predict total NEQI score. Explanatory covariates included the following: time to evaluation completion, number of weeks spent with student, faculty total weeks on service per year, total faculty years in clinical education, student gender, faculty gender, and an interaction term between student and faculty gender. RESULTS Significantly higher narrative evaluation quality was associated with a shorter time to evaluation completion, with NEQI scores decreasing by approximately 0.3 points every 10 days following students' rotations (p = .004). Additionally, women faculty had statistically higher quality narrative evaluations with NEQI scores 1.92 points greater than men faculty (p = .012). All other covariates were not significant. CONCLUSIONS The quality of faculty members' narrative evaluations of medical students was associated with time to evaluation completion and faculty gender but not faculty experience in clinical education, faculty weeks on service, or the amount of time spent with students. Findings advance understanding on ways to improve the quality of narrative evaluations which are imperative given assessment models that will increase the volume and reliance on narratives.
Affiliation(s)
- Christopher J Mooney: School of Medicine and Dentistry, University of Rochester, Rochester, New York, USA
- Jennifer M Pascoe: School of Medicine and Dentistry, University of Rochester, Rochester, New York, USA
- Amy E Blatt: School of Medicine and Dentistry, University of Rochester, Rochester, New York, USA
- Valerie J Lang: School of Medicine and Dentistry, University of Rochester, Rochester, New York, USA
- Melanie K Braun: School of Medicine and Dentistry, University of Rochester, Rochester, New York, USA
- Jaclyn E Burch: School of Medicine and Dentistry, University of Rochester, Rochester, New York, USA
11
Kukulski P, Ahn J. Validity Evidence for the Emergency Medicine Standardized Letter of Evaluation. J Grad Med Educ 2021;13:490-499. PMID: 34434509. PMCID: PMC8370378. DOI: 10.4300/jgme-d-20-01110.1.
Abstract
BACKGROUND The standardized letter of evaluation (SLOE) is the application component that program directors value most when evaluating candidates to interview and rank for emergency medicine (EM) residency. Given its successful implementation, other specialties, including otolaryngology, dermatology, and orthopedics, have adopted similar SLOEs of their own, and more specialties are considering creating one. Unfortunately, for such a significant assessment tool, no study to date has comprehensively examined the validity evidence for the EM SLOE. OBJECTIVE We summarized the published evidence for validity for the EM SLOE using Messick's framework for validity evidence. METHODS A scoping review of the validity evidence of the EM SLOE was performed in 2020. A scoping review was chosen to identify gaps and future directions, and because the heterogeneity of the literature makes a systematic review difficult. Included articles were assigned to an aspect of Messick's framework and determined to provide evidence for or against validity. RESULTS There have been 22 articles published relating to validity evidence for the EM SLOE. There is evidence for content validity; however, there is a lack of evidence for internal structure, relation to other variables, and consequences. Additionally, the literature regarding response process demonstrates evidence against validity. CONCLUSIONS Overall, there is little published evidence in support of validity for the EM SLOE. Stakeholders need to consider changing the ranking system, improving standardization of clerkships, and further studying relation to other variables to improve validity. This will be important across GME as more specialties adopt a standardized letter.
Affiliation(s)
- Paul Kukulski: Assistant Professor and Assistant Clerkship Director, Section of Emergency Medicine, Department of Medicine, University of Chicago Medical Center
- James Ahn: Associate Professor and Program Director, Section of Emergency Medicine, Department of Medicine, University of Chicago Medical Center
12
Chan T, Oswald A, Hauer KE, Caretta-Weyer HA, Nousiainen MT, Cheung WJ. Diagnosing conflict: Conflicting data, interpersonal conflict, and conflicts of interest in clinical competency committees. Medical Teacher 2021;43:765-773. PMID: 34182879. DOI: 10.1080/0142159x.2021.1925101.
Abstract
Clinical competency committees (CCCs) are increasingly used within health professions education as their decisions are thought to be more defensible and fairer than those generated by previous training promotion processes. However, as with most group-based processes, it is inevitable that conflict will arise. In this paper the authors explore three ways conflict may arise within a CCC: (1) conflicting data submissions that are presented to the committee, (2) conflicts between members of the committee, and (3) conflicts of interest between a specific committee member and a trainee. The authors describe each of these conflict situations, dissect out the underlying problems, and explore possible solutions based on the current literature.
Affiliation(s)
- Teresa Chan: Faculty Development, Faculty of Health Sciences, McMaster University, Hamilton, Canada; Division of Emergency Medicine, Department of Medicine, McMaster University, Hamilton, Canada; McMaster program for Education Research, Innovation, and Theory (MERIT), Hamilton, Canada
- Anna Oswald: Competency Based Medical Education, Office of Postgraduate Medical Education, University of Alberta, Edmonton, Canada; CanMEDS Clinician Educator, Royal College of Physicians and Surgeons of Canada, Edmonton, Canada; Department of Medicine, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Canada
- Karen E Hauer: Competency Assessment and Professional Standards, San Francisco, CA, USA; Department of Medicine, University of California, San Francisco School of Medicine, San Francisco, CA, USA
- Holly A Caretta-Weyer: Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- Warren J Cheung: Department of Emergency Medicine, University of Ottawa, Ottawa, Canada; Senior Clinician Investigator, Ottawa Hospital Research Institute, Ottawa, Canada; CanMEDS Clinician Educator, Royal College of Physicians and Surgeons of Canada, Ottawa, Canada
13
Comparing the Ottawa Emergency Department Shift Observation Tool (O-EDShOT) to the traditional daily encounter card: measuring the quality of documented assessments. Can J Emerg Med 2021;23:383-389. PMID: 33512695. DOI: 10.1007/s43678-020-00070-y.
Abstract
OBJECTIVES The Ottawa Emergency Department Shift Observation Tool (O-EDShOT) is a workplace-based assessment designed to assess a trainee's performance across an entire shift. It was developed in response to validity concerns with traditional end-of-shift workplace-based assessments, such as the daily encounter card. The O-EDShOT previously demonstrated strong psychometric characteristics; however, it remains unknown whether the O-EDShOT facilitates measurable improvements in the quality of documented assessments compared to daily encounter cards. METHODS Three randomly selected daily encounter cards and three O-EDShOTs completed by 24 faculty were scored by two raters using the Completed Clinical Evaluation Report Rating (CCERR), a previously published 9-item quantitative measure of the quality of a completed workplace-based assessment. Automated-CCERR (A-CCERR) scores, which do not require raters, were also calculated. Paired sample t tests were conducted to compare the quality of assessments between O-EDShOTs and DECs as measured by the CCERR and A-CCERR. RESULTS CCERR scores were significantly higher for O-EDShOTs (mean(SD) = 25.6(2.6)) compared to daily encounter cards (21.5(3.9); t(23) = 5.2, p < 0.001, d = 1.1). A-CCERR scores were also significantly higher for O-EDShOTs (mean(SD) = 18.5(1.6)) than for daily encounter cards (15.5(1.2); t(24) = 8.4, p < 0.001). CCERR items 1, 4 and 9 were rated significantly higher for O-EDShOTs compared to daily encounter cards. CONCLUSIONS The O-EDShOT yields higher quality documented assessments when compared to the traditional end-of-shift daily encounter card. Our results provide additional validity evidence for the O-EDShOT as an assessment tool for capturing trainee on-shift performance that can be used as a stimulus for actionable feedback and as a source for high-quality workplace-based assessment data to inform decisions about emergency medicine trainee progress and promotion.
14
Vergis A, Leung C, Robertson R. Rater Training in Medical Education: A Scoping Review. Cureus 2020;12:e11363. PMID: 33304696. PMCID: PMC7721070. DOI: 10.7759/cureus.11363.
Abstract
There is an increasing focus in medical education on trainee evaluation. Often, reliability and other psychometric properties of evaluations fall below expected standards. Rater training, a process whereby raters undergo instruction on how to consistently evaluate trainees and produce reliable and accurate scores, has been suggested to improve rater performance within the behavioral sciences. A scoping literature review was undertaken to examine the effect of rater training in medical education and address the question: "Does rater training improve attending physician evaluations of medical trainees?" Two independent reviewers searched PubMed®, MEDLINE®, EMBASE™, the Cochrane Library, CINAHL®, ERIC™, and PsycInfo® databases and identified all prospective studies examining the effect of rater training on physician evaluations of medical trainees. Consolidated Standards of Reporting Trials (CONSORT) and Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) checklists were used to assess quality. Fourteen prospective studies met the inclusion criteria. All had heterogeneity in design, type of rater training, and measured outcomes. Pooled analysis was not performed. Four studies examined rater training used to assess technical skills; none identified a positive effect. Ten studies assessed its use to evaluate non-technical skills: six demonstrated no effect, while four showed a positive effect. The overall quality of studies was poor to moderate. Rater training in the medical education literature is heterogeneous, limited, and describes minimal improvement in the psychometric properties of trainee evaluations when implemented. Further research is required to assess rater training's efficacy in medical education.
Affiliation(s)
- Ashley Vergis: Surgery, St. Boniface Hospital, University of Manitoba, Winnipeg, CAN
- Caleb Leung: Surgery, St. Boniface Hospital, University of Manitoba, Winnipeg, CAN
- Reagan Robertson: Surgery, St. Boniface Hospital, University of Manitoba, Winnipeg, CAN
15
Ginsburg S, Kogan JR, Gingerich A, Lynch M, Watling CJ. Taken Out of Context: Hazards in the Interpretation of Written Assessment Comments. Academic Medicine 2020;95:1082-1088. PMID: 31651432. DOI: 10.1097/acm.0000000000003047.
Abstract
PURPOSE Written comments are increasingly valued for assessment; however, a culture of politeness and the conflation of assessment with feedback lead to ambiguity. Interpretation requires reading between the lines, which is untenable with large volumes of qualitative data. For computer analytics to help with interpreting comments, the factors influencing interpretation must be understood. METHOD Using constructivist grounded theory, the authors interviewed 17 experienced internal medicine faculty at 4 institutions between March and July, 2017, asking them to interpret and comment on 2 sets of words: those that might be viewed as "red flags" (e.g., good, improving) and those that might be viewed as signaling feedback (e.g., should, try). Analysis focused on how participants ascribed meaning to words. RESULTS Participants struggled to attach meaning to words presented acontextually. Four aspects of context were deemed necessary for interpretation: (1) the writer; (2) the intended and potential audiences; (3) the intended purpose(s) for the comments, including assessment, feedback, and the creation of a permanent record; and (4) the culture, including norms around assessment language. These contextual factors are not always apparent; readers must balance the inevitable need to interpret others' language with the potential hazards of second-guessing intent. CONCLUSIONS Comments are written for a variety of intended purposes and audiences, sometimes simultaneously; this reality creates dilemmas for faculty attempting to interpret these comments, with or without computer assistance. Attention to context is essential to reduce interpretive uncertainty and ensure that written comments can achieve their potential to enhance both assessment and feedback.
Affiliation(s)
- Shiphra Ginsburg: professor of medicine, Department of Medicine, Faculty of Medicine, University of Toronto; scientist, Wilson Centre for Research in Education, University Health Network, University of Toronto, Toronto, Ontario, Canada; Canada Research Chair in Health Professions Education; ORCID: http://orcid.org/0000-0002-4595-6650
- J.R. Kogan: professor of medicine, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- A. Gingerich: assistant professor, Northern Medical Program, University of Northern British Columbia, Prince George, British Columbia, Canada; ORCID: http://orcid.org/0000-0001-5765-3975
- M. Lynch: postdoctoral fellow, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- C.J. Watling: professor, Department of Clinical Neurological Sciences; scientist, Centre for Education Research and Innovation; and associate dean of postgraduate medical education, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada; ORCID: http://orcid.org/0000-0001-9686-795X
16
Dory V, Cummings BA, Mondou M, Young M. Nudging clinical supervisors to provide better in-training assessment reports. Perspectives on Medical Education 2020;9:66-70. PMID: 31848999. PMCID: PMC7012977. DOI: 10.1007/s40037-019-00554-3.
Abstract
INTRODUCTION In-training assessment reports (ITARs) summarize assessment during a clinical placement to inform decision-making and provide formal feedback to learners. Faculty development is an effective but resource-intensive means of improving the quality of completed ITARs. We examined whether the quality of completed ITARs could be improved by 'nudges' from the format of ITAR forms. METHODS Our first intervention consisted of placing the section for narrative comments at the beginning of the form, and using prompts for recommendations (Do more, Keep doing, Do less, Stop doing). In a second intervention, we provided a hyperlink to a detailed assessment rubric and shortened the checklist section. We analyzed a sample of 360 de-identified completed ITARs from six disciplines across the three academic years where the different versions of the ITAR were used. Two raters independently scored the ITARs using the Completed Clinical Evaluation Report Rating (CCERR) scale. We tested for differences between versions of the ITAR forms using a one-way ANOVA for the total CCERR score, and MANOVA for the nine CCERR item scores. RESULTS Changes to the form structure (nudges) improved the quality of information generated as measured by the CCERR instrument, from a total score of 18.0/45 (SD 2.6) to 18.9/45 (SD 3.1) and 18.8/45 (SD 2.6), p = 0.04. Specifically, comments were more balanced, more detailed, and more actionable compared with the original ITAR. DISCUSSION Nudge interventions, which are inexpensive and feasible, should be included in multipronged approaches to improve the quality of assessment reports.
Affiliation(s)
- Valérie Dory: Department of Medicine and Centre for Medical Education, Faculty of Medicine, McGill University, Montreal, QC, Canada
- Beth-Ann Cummings: Undergraduate Medical Education, Department of Medicine, and Institute of Health Sciences Education, Faculty of Medicine, McGill University, Montreal, QC, Canada
- Mélanie Mondou: Department of Medicine and Institute of Health Sciences Education, Faculty of Medicine, McGill University, Montreal, QC, Canada
- Meredith Young: Department of Medicine and Institute of Health Sciences Education, Faculty of Medicine, McGill University, Montreal, QC, Canada
17
Tekian A, Park YS, Tilton S, Prunty PF, Abasolo E, Zar F, Cook DA. Competencies and Feedback on Internal Medicine Residents' End-of-Rotation Assessments Over Time: Qualitative and Quantitative Analyses. Academic Medicine 2019;94:1961-1969. PMID: 31169541. PMCID: PMC6882536. DOI: 10.1097/acm.0000000000002821.
Abstract
PURPOSE To examine how qualitative narrative comments and quantitative ratings from end-of-rotation assessments change for a cohort of residents from entry to graduation, and explore associations between comments and ratings. METHOD The authors obtained end-of-rotation quantitative ratings and narrative comments for 1 cohort of internal medicine residents at the University of Illinois at Chicago College of Medicine from July 2013-June 2016. They inductively identified themes in comments, coded orientation (praising/critical) and relevance (specificity and actionability) of feedback, examined associations between codes and ratings, and evaluated changes in themes and ratings across years. RESULTS Data comprised 1,869 assessments (828 comments) on 33 residents. Five themes aligned with ACGME competencies (interpersonal and communication skills, professionalism, medical knowledge, patient care, and systems-based practice), and 3 did not (personal attributes, summative judgment, and comparison to training level). Work ethic was the most frequent subtheme. Comments emphasized medical knowledge more in year 1 and focused more on autonomy, leadership, and teaching in later years. Most comments (714/828 [86%]) contained high praise, and 412/828 (50%) were very relevant. Average ratings correlated positively with orientation (β = 0.46, P < .001) and negatively with relevance (β = -0.09, P = .01). Ratings increased significantly with each training year (year 1, mean [standard deviation]: 5.31 [0.59]; year 2: 5.58 [0.47]; year 3: 5.86 [0.43]; P < .001). CONCLUSIONS Narrative comments address resident attributes beyond the ACGME competencies and change as residents progress. Lower quantitative ratings are associated with more specific and actionable feedback.
Affiliation(s)
- Ara Tekian: professor and associate dean for international affairs, Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois; ORCID: https://orcid.org/0000-0002-9252-1588
- Yoon Soo Park: associate professor, Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois; ORCID: http://orcid.org/0000-0001-8583-4335
- Sarette Tilton: PharmD candidate, University of Illinois at Chicago College of Pharmacy, Chicago, Illinois
- Patrick F. Prunty: PharmD candidate, University of Illinois at Chicago College of Pharmacy, Chicago, Illinois
- Eric Abasolo: PharmD candidate, University of Illinois at Chicago College of Pharmacy, Chicago, Illinois
- Fred Zar: professor and program director, Department of Medicine, University of Illinois at Chicago College of Medicine, Chicago, Illinois
- David A. Cook: professor of medicine and medical education, associate director, Office of Applied Scholarship and Education Science, and consultant, Division of General Internal Medicine, Mayo Clinic College of Medicine, Rochester, Minnesota; ORCID: https://orcid.org/0000-0003-2383-4633
18
Dauphinee WD, Boulet JR, Norcini JJ. Considerations that will determine if competency-based assessment is a sustainable innovation. Advances in Health Sciences Education 2019;24:413-421. PMID: 29777463. DOI: 10.1007/s10459-018-9833-2.
Abstract
Educational assessment for the health professions has seen a major attempt to introduce competency based frameworks. As high level policy developments, the changes were intended to improve outcomes by supporting learning and skills development. However, we argue that previous experiences with major innovations in assessment offer an important road map for developing and refining assessment innovations, including careful piloting and analyses of their measurement qualities and impacts. Based on the literature, numerous assessment workshops, personal interactions with potential users, and our 40 years of experience in implementing assessment change, we lament the lack of a coordinated approach to clarify and improve measurement qualities and functionality of competency based assessment (CBA). To address this worrisome situation, we offer two roadmaps to guide CBA's further development. Initially, reframe and address CBA as a measurement development opportunity. Secondly, using a roadmap adapted from the management literature on sustainable innovation, the medical assessment community needs to initiate an integrated plan to implement CBA as a sustainable innovation within existing educational programs and self-regulatory enterprises. Further examples of down-stream opportunities to refocus CBA at the implementation level within faculties and within the regulatory framework of the profession are offered. In closing, we challenge the broader assessment community in medicine to step forward and own the challenge and opportunities to reframe CBA as an innovation to improve the quality of the clinical educational experience. The goal is to optimize assessment in health education and ultimately improve the public's health.
Affiliation(s)
- W Dale Dauphinee: Foundation for the Advancement of International Medical Education and Research, 3624 Market Street, Fourth Floor, Philadelphia, PA, 19104, USA; McGill University, 1140 Pine Avenue West, Montreal, QC, H3A 1A3, Canada; Saint Andrews, NB, Canada
- John R Boulet: Foundation for the Advancement of International Medical Education and Research, 3624 Market Street, Fourth Floor, Philadelphia, PA, 19104, USA
- John J Norcini: Foundation for the Advancement of International Medical Education and Research, 3624 Market Street, Fourth Floor, Philadelphia, PA, 19104, USA
19
Robertson RL, Vergis A, Gillman LM, Park J. Effect of rater training on the reliability of technical skill assessments: a randomized controlled trial. Can J Surg 2018;61:15917. PMID: 30265636. PMCID: PMC6281450. DOI: 10.1503/cjs.015917.
Abstract
BACKGROUND Rater training improves the reliability of observational assessment tools but has not been well studied for technical skills. This study assessed whether rater training could improve the reliability of technical skill assessment. METHODS Academic and community surgeons in Royal College of Physicians and Surgeons of Canada surgical subspecialties were randomly allocated to either rater training (7-minute video incorporating frame-of-reference training elements) or no training. Participants then assessed trainees performing a suturing and knot-tying task using 3 assessment tools: a visual analogue scale, a task-specific checklist and a modified version of the Objective Structured Assessment of Technical Skill global rating scale (GRS). We measured interrater reliability (IRR) using intraclass correlation type 2. RESULTS There were 24 surgeons in the training group and 23 in the no-training group. Mean assessment tool scores were not significantly different between the 2 groups. The training group had higher IRR than the no-training group on the visual analogue scale (0.71 v. 0.46), task-specific checklist (0.46 v. 0.33) and GRS (0.71 v. 0.61). However, confidence intervals were wide and overlapping for all 3 tools. CONCLUSION For education purposes, the reliability of the visual analogue scale and GRS would be considered "good" for the training group but "moderate" for the no-training group. However, a significant difference in IRR was not shown, and reliability remained below the desired level of 0.8 for high-stakes testing. Training did not significantly improve assessment tool reliability. Although rater training may represent a way to improve reliability, further study is needed to determine effective training methods.
Affiliation(s)
- Ashley Vergis: Department of Surgery, University of Manitoba, Winnipeg, Man.
- Jason Park: Department of Surgery, University of Manitoba, Winnipeg, Man.
20
Cheung WJ, Dudek NL, Wood TJ, Frank JR. Supervisor-trainee continuity and the quality of work-based assessments. Medical Education 2017;51:1260-1268. PMID: 28971502. DOI: 10.1111/medu.13415.
Abstract
CONTEXT Work-based assessments (WBAs) represent an increasingly important means of reporting expert judgements of trainee competence in clinical practice. However, the quality of WBAs completed by clinical supervisors is of concern. The episodic and fragmented interaction that often occurs between supervisors and trainees has been proposed as a barrier to the completion of high-quality WBAs. OBJECTIVES The primary purpose of this study was to determine the effect of supervisor-trainee continuity on the quality of assessments documented on daily encounter cards (DECs), a common form of WBA. The relationship between trainee performance and DEC quality was also examined. METHODS Daily encounter cards representing three differing degrees of supervisor-trainee continuity (low, intermediate, high) were scored by two raters using the Completed Clinical Evaluation Report Rating (CCERR), a previously published nine-item quantitative measure of DEC quality. An analysis of variance (anova) was performed to compare mean CCERR scores among the three groups. Linear regression analysis was conducted to examine the relationship between resident performance and DEC quality. RESULTS Differences in mean CCERR scores were observed between the three continuity groups (p = 0.02); however, the magnitude of the absolute differences was small (partial eta-squared = 0.03) and not educationally meaningful. Linear regression analysis demonstrated a significant inverse relationship between resident performance and CCERR score (p < 0.001, r2 = 0.18). This inverse relationship was observed in both groups representing on-service residents (p = 0.001, r2 = 0.25; p = 0.04, r2 = 0.19), but not in the Off-service group (p = 0.62, r2 = 0.05). CONCLUSIONS Supervisor-trainee continuity did not have an educationally meaningful influence on the quality of assessments documented on DECs. However, resident performance was found to affect assessor behaviours in the On-service group, whereas DEC quality remained poor regardless of performance in the Off-service group. The findings suggest that greater attention should be given to determining ways of improving the quality of assessments reported for off-service residents, as well as for those residents demonstrating appropriate clinical competence progression.
Affiliation(s)
- Warren J Cheung
- Department of Emergency Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Nancy L Dudek
- Division of Physical Medicine and Rehabilitation, Department of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Timothy J Wood
- Department of Innovation in Medical Education, University of Ottawa, Ottawa, Ontario, Canada
- Jason R Frank
- Department of Emergency Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Royal College of Physicians and Surgeons of Canada, Ottawa, Ontario, Canada
|
21
|
Sebok-Syer SS, Klinger DA, Sherbino J, Chan TM. Mixed Messages or Miscommunication? Investigating the Relationship Between Assessors' Workplace-Based Assessment Scores and Written Comments. ACADEMIC MEDICINE : JOURNAL OF THE ASSOCIATION OF AMERICAN MEDICAL COLLEGES 2017; 92:1774-1779. [PMID: 28562452 DOI: 10.1097/acm.0000000000001743] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]
Abstract
PURPOSE The shift toward broader, programmatic assessment has revolutionized the approaches that many take in assessing medical competence. To understand the association between quantitative and qualitative evaluations, the authors explored the relationships that exist among assessors' checklist scores, task ratings, global ratings, and written comments. METHOD The authors collected and analyzed, using regression analyses, data from the McMaster Modular Assessment Program. The data were from emergency medicine residents in their first or second year of postgraduate training from 2012 through 2014. Additionally, using content analysis, the authors analyzed narrative comments corresponding to the "done" and "done, but needs attention" checklist score options. RESULTS The regression analyses revealed that the task ratings, provided by faculty assessors, are associated with the use of the "done, but needs attention" checklist score option. Analyses also identified that the "done, but needs attention" option is associated with a narrative comment that is balanced, providing both strengths and areas for improvement. Analysis of qualitative comments revealed differences in the type of comments provided to higher- and lower-performing residents. CONCLUSIONS This study highlights some of the relationships that exist among checklist scores, rating scales, and written comments. The findings highlight that task ratings are associated with checklist options while global ratings are not. Furthermore, analysis of written comments supports the notion of a "hidden code" used to communicate assessors' evaluation of medical competence, especially when communicating areas for improvement or concern. This study has implications for how individuals should interpret information obtained from qualitative assessments.
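One way to probe the association reported here is to compare task ratings between encounters where the "done, but needs attention" option was and was not used. A hedged sketch on invented data (not the McMaster Modular Assessment Program analysis):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical encounters: 1 if the assessor chose "done, but needs attention",
# 0 if plain "done". Task ratings (1-7) trend lower when the flag is used.
needs_attention = rng.integers(0, 2, 150)
task_rating = np.clip(rng.normal(5.5 - 1.2 * needs_attention, 0.8), 1, 7)

# Point-biserial correlation between the checklist option and task rating.
r, p = stats.pointbiserialr(needs_attention, task_rating)
print(f"mean rating (flagged)   = {task_rating[needs_attention == 1].mean():.2f}")
print(f"mean rating (unflagged) = {task_rating[needs_attention == 0].mean():.2f}")
print(f"point-biserial r = {r:.2f}, p = {p:.4f}")
```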
Affiliation(s)
- Stefanie S Sebok-Syer
- S.S. Sebok-Syer is instructor of education, Queen's University, Kingston, Ontario, Canada. D.A. Klinger is professor of education, Queen's University, Kingston, Ontario, Canada. J. Sherbino is associate professor of medicine, McMaster University, Hamilton, Ontario, Canada. T.M. Chan is assistant professor of medicine, McMaster University, Hamilton, Ontario, Canada; ORCID: http://orcid.org/0000-0001-6104-462
|
22
|
Wilbur K. Does faculty development influence the quality of in-training evaluation reports in pharmacy? BMC MEDICAL EDUCATION 2017; 17:222. [PMID: 29157239 PMCID: PMC5697106 DOI: 10.1186/s12909-017-1054-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/12/2017] [Accepted: 11/02/2017] [Indexed: 06/02/2023]
Abstract
BACKGROUND In-training evaluation reports (ITERs) of student workplace-based learning are completed by clinical supervisors across various health disciplines. However, outside of medicine, the quality of submitted workplace-based assessments is largely uninvestigated. This study assessed the quality of ITERs in pharmacy and whether clinical supervisors could be trained to complete higher-quality reports. METHODS A random sample of ITERs submitted in a pharmacy program during 2013-2014 was evaluated. These ITERs served as a historical control (control group 1) for comparison with ITERs submitted in 2015-2016 by clinical supervisors who participated in an interactive faculty development workshop (intervention group) and those who did not (control group 2). Two trained independent raters scored the ITERs using a previously validated nine-item scale assessing report quality, the Completed Clinical Evaluation Report Rating (CCERR). The scoring scale for each item is anchored at 1 ("not at all") and 5 ("exemplary"), with 3 categorized as "acceptable". RESULTS The mean CCERR score for reports completed after the workshop (22.9 ± 3.39) was not significantly different from that of prospective control group 2 (22.7 ± 3.63, p = 0.84) and was lower than that of historical control group 1 (37.9 ± 8.21, p = 0.001). Mean item scores were below the acceptable threshold for 5 of the 9 CCERR domains in control group 1, including supervisor-documented evidence of specific examples to clearly explain weaknesses and concrete recommendations for student improvement. Mean item scores were below the acceptable threshold for 6 and 7 of the 9 domains in control group 2 and the intervention group, respectively. CONCLUSIONS This study is the first to use the CCERR to evaluate ITER quality outside of medicine. The findings demonstrate low baseline CCERR scores in a pharmacy program that were not demonstrably changed by a faculty development workshop; strategies are identified to augment future rater training.
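Because each CCERR item is anchored at 1-5 with 3 treated as "acceptable", item-level means can be flagged against that threshold and total scores compared between groups. A small illustrative sketch with invented scores (not the study data; the study's actual comparisons involved both historical and prospective controls):

```python
import numpy as np
from scipy import stats

CCERR_ITEMS = 9
ACCEPTABLE = 3.0

rng = np.random.default_rng(1)

# Hypothetical item-level CCERR scores: rows are ITERs, columns the 9 items.
intervention = rng.integers(1, 6, size=(30, CCERR_ITEMS)).astype(float)
control = rng.integers(1, 6, size=(30, CCERR_ITEMS)).astype(float)

# Flag items whose mean falls below the "acceptable" anchor of 3.
item_means = intervention.mean(axis=0)
below = [i + 1 for i, m in enumerate(item_means) if m < ACCEPTABLE]
print("Intervention items below threshold:", below)

# Compare total CCERR scores (sum of the 9 items) between groups.
t_stat, p_value = stats.ttest_ind(intervention.sum(axis=1), control.sum(axis=1))
print(f"Mean total: {intervention.sum(axis=1).mean():.1f} vs "
      f"{control.sum(axis=1).mean():.1f} (p = {p_value:.2f})")
```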
Affiliation(s)
- Kerry Wilbur
- College of Pharmacy, Qatar University, PO Box 2713, Doha, Qatar.
|
23
|
Hatala R, Sawatsky AP, Dudek N, Ginsburg S, Cook DA. Using In-Training Evaluation Report (ITER) Qualitative Comments to Assess Medical Students and Residents: A Systematic Review. ACADEMIC MEDICINE : JOURNAL OF THE ASSOCIATION OF AMERICAN MEDICAL COLLEGES 2017; 92:868-879. [PMID: 28557953 DOI: 10.1097/acm.0000000000001506] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]
Abstract
PURPOSE In-training evaluation reports (ITERs) constitute an integral component of medical student and postgraduate physician trainee (resident) assessment. ITER narrative comments have received less attention than the numeric scores. The authors sought both to determine what validity evidence informs the use of narrative comments from ITERs for assessing medical students and residents and to identify evidence gaps. METHOD Reviewers searched for relevant English-language studies in MEDLINE, EMBASE, Scopus, and ERIC (last search June 5, 2015), and in reference lists and author files. They included all original studies that evaluated ITERs for qualitative assessment of medical students and residents. Working in duplicate, they selected articles for inclusion, evaluated quality, and abstracted information on validity evidence using Kane's framework (inferences of scoring, generalization, extrapolation, and implications). RESULTS Of 777 potential articles, 22 met inclusion criteria. The scoring inference is supported by studies showing that rich narratives are possible, that changing the prompt can stimulate more robust narratives, and that comments vary by context. Generalization is supported by studies showing that narratives reach thematic saturation and that analysts make consistent judgments. Extrapolation is supported by favorable relationships between ITER narratives and numeric scores from ITERs and non-ITER performance measures, and by studies confirming that narratives reflect constructs deemed important in clinical work. Evidence supporting implications is scant. CONCLUSIONS The use of ITER narratives for trainee assessment is generally supported, except that evidence is lacking for implications and decisions. Future research should seek to confirm implicit assumptions and evaluate the impact of decisions.
Affiliation(s)
- Rose Hatala
- R. Hatala is associate professor of medicine, Faculty of Medicine, and director, Clinical Educator Fellowship, Centre for Health Education Scholarship, University of British Columbia, Vancouver, British Columbia, Canada. A.P. Sawatsky is assistant professor of medicine and senior associate consultant, Division of General Internal Medicine, Mayo Clinic College of Medicine, Rochester, Minnesota. N. Dudek is associate professor, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada. S. Ginsburg is professor, Department of Medicine, Faculty of Medicine, University of Toronto, scientist, Wilson Centre for Research in Education, University Health Network/University of Toronto, and staff physician, Mount Sinai Hospital, Toronto, Ontario, Canada. D.A. Cook is professor of medicine and medical education, associate director, Mayo Clinic Online Learning, and consultant, Division of General Internal Medicine, Mayo Clinic College of Medicine, Rochester, Minnesota
|
24
|
Cheung WJ, Dudek N, Wood TJ, Frank JR. Daily Encounter Cards-Evaluating the Quality of Documented Assessments. J Grad Med Educ 2016; 8:601-604. [PMID: 27777675 PMCID: PMC5058597 DOI: 10.4300/jgme-d-15-00505.1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open
Abstract
BACKGROUND Concerns over the quality of work-based assessment (WBA) completion have resulted in faculty development and rater training initiatives. Daily encounter cards (DECs) are a common form of WBA used in ambulatory care and shift-work settings. A tool is needed to evaluate initiatives aimed at improving the quality of completion of this widely used form of WBA. OBJECTIVE The Completed Clinical Evaluation Report Rating (CCERR) was designed to provide a measure of the quality of documented assessments on in-training evaluation reports. The purpose of this study was to provide validity evidence to support using the CCERR to assess the quality of DEC completion. METHODS Six experts in resident assessment grouped 60 DECs into 3 quality categories (high, average, and poor) based on how informative each DEC was for reporting judgments of the resident's performance. Eight supervisors (blinded to the expert groupings) scored the 10 most representative DECs in each group using the CCERR. Mean scores were compared to determine if the CCERR could discriminate based on DEC quality. RESULTS Statistically significant differences in CCERR scores were observed between all quality groups (P < .001). A generalizability analysis demonstrated that the majority of score variation was due to differences in DECs. The reliability with a single rater was 0.95. CONCLUSIONS The CCERR is a reliable and valid tool to evaluate DEC quality. It can serve as an outcome measure for studying interventions targeted at improving the quality of assessments documented on DECs.
Affiliation(s)
- Warren J. Cheung
- Corresponding author: Warren J. Cheung, MD, MMEd, FRCPC, University of Ottawa, Department of Emergency Medicine, F-Main, Room EM-206, 1053 Carling Avenue, Ottawa, Ontario K1Y 4E9 Canada, 613.798.5555, ext 17196, fax 613.761.5488,
|
25
|
Cook DA, Brydges R, Ginsburg S, Hatala R. A contemporary approach to validity arguments: a practical guide to Kane's framework. MEDICAL EDUCATION 2015; 49:560-75. [PMID: 25989405 DOI: 10.1111/medu.12678] [Citation(s) in RCA: 323] [Impact Index Per Article: 35.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/03/2014] [Revised: 11/20/2014] [Accepted: 12/19/2014] [Indexed: 05/13/2023]
Abstract
CONTEXT Assessment is central to medical education and the validation of assessments is vital to their use. Earlier validity frameworks suffer from a multiplicity of types of validity or failure to prioritise among sources of validity evidence. Kane's framework addresses both concerns by emphasising key inferences as the assessment progresses from a single observation to a final decision. Evidence evaluating these inferences is planned and presented as a validity argument. OBJECTIVES We aim to offer a practical introduction to the key concepts of Kane's framework that educators will find accessible and applicable to a wide range of assessment tools and activities. RESULTS All assessments are ultimately intended to facilitate a defensible decision about the person being assessed. Validation is the process of collecting and interpreting evidence to support that decision. Rigorous validation involves articulating the claims and assumptions associated with the proposed decision (the interpretation/use argument), empirically testing these assumptions, and organising evidence into a coherent validity argument. Kane identifies four inferences in the validity argument: Scoring (translating an observation into one or more scores); Generalisation (using the score[s] as a reflection of performance in a test setting); Extrapolation (using the score[s] as a reflection of real-world performance), and Implications (applying the score[s] to inform a decision or action). Evidence should be collected to support each of these inferences and should focus on the most questionable assumptions in the chain of inference. Key assumptions (and needed evidence) vary depending on the assessment's intended use or associated decision. Kane's framework applies to quantitative and qualitative assessments, and to individual tests and programmes of assessment. CONCLUSIONS Validation focuses on evaluating the key claims, assumptions and inferences that link assessment scores with their intended interpretations and uses. The Implications and associated decisions are the most important inferences in the validity argument.
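Kane's four inferences lend themselves to a simple bookkeeping structure for a validity argument: each inference carries a claim and the evidence gathered for it, and gaps become explicit. A hedged sketch (the classes and example entries are illustrative, not drawn from the paper; the sample evidence paraphrases the CCERR findings cited earlier in this list):

```python
from dataclasses import dataclass, field

@dataclass
class Inference:
    name: str                      # Scoring, Generalisation, Extrapolation, Implications
    claim: str                     # the assumption being tested
    evidence: list[str] = field(default_factory=list)

@dataclass
class ValidityArgument:
    decision: str                  # the proposed use of the scores
    inferences: list[Inference] = field(default_factory=list)

    def weakest_links(self) -> list[str]:
        """Return inferences with no supporting evidence yet."""
        return [i.name for i in self.inferences if not i.evidence]

# Example: a validity argument for using CCERR scores to judge report quality.
argument = ValidityArgument(
    decision="Use CCERR scores to target faculty development",
    inferences=[
        Inference("Scoring", "Ratings reflect observable report features",
                  ["CCERR discriminates high/average/poor DECs"]),
        Inference("Generalisation", "Scores are stable across raters",
                  ["Single-rater reliability of 0.95"]),
        Inference("Extrapolation", "Scores reflect real-world report quality", []),
        Inference("Implications", "Scores support defensible decisions", []),
    ],
)
print("Inferences still needing evidence:", argument.weakest_links())
```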
Affiliation(s)
- David A Cook
- Mayo Clinic Online Learning, Mayo Clinic College of Medicine, Rochester, Minnesota, USA
- Division of General Internal Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Ryan Brydges
- Department of Medicine, University of Toronto, Toronto, Ontario, Canada
- Wilson Centre, University Health Network, Toronto, Ontario, Canada
- Shiphra Ginsburg
- Department of Medicine, University of Toronto, Toronto, Ontario, Canada
- Wilson Centre, University Health Network, Toronto, Ontario, Canada
- Rose Hatala
- Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
|
26
|
Dudek N, Dojeiji S. Twelve tips for completing quality in-training evaluation reports. MEDICAL TEACHER 2014; 36:1038-1042. [PMID: 24986650 DOI: 10.3109/0142159x.2014.932897] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
Assessing learners in the clinical setting is vital to determining their level of professional competence. Clinical performance assessments can be documented using in-training evaluation reports (ITERs). Previous research has suggested a need for faculty development to improve the quality of these reports and has identified key features of high-quality completed ITERs, which primarily involve the narrative comments. This aligns well with the recent discourse in the assessment literature focusing on the value of qualitative assessments. Evidence exists to demonstrate that faculty can be trained to complete higher-quality ITERs. We present 12 key strategies to assist clinical supervisors in improving the quality of their completed ITERs. Higher-quality completed ITERs will improve the documentation of the trainee's progress and be more defensible when questioned in an appeal or legal process.
|
27
|
Bismil R, Dudek NL, Wood TJ. In-training evaluations: developing an automated screening tool to measure report quality. MEDICAL EDUCATION 2014; 48:724-732. [PMID: 24909534 DOI: 10.1111/medu.12490] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/06/2013] [Revised: 01/22/2014] [Accepted: 03/19/2014] [Indexed: 06/03/2023]
Abstract
OBJECTIVES In-training evaluation (ITE) is used to assess resident competencies in clinical settings. This assessment is documented on an In-Training Evaluation Report (ITER). Unfortunately, the quality of these reports can be questionable. Therefore, training programmes to improve report quality are common. The Completed Clinical Evaluation Report Rating (CCERR) was developed to assess completed report quality and has been shown to do so reliably, enabling the evaluation of these programmes. However, the CCERR is a resource-intensive instrument, which may limit its use. The purpose of this study was to create a screening measure (Proxy-CCERR) that can predict the CCERR outcome in a less resource-intensive manner. METHODS Using multiple regression, the authors analysed a dataset of 269 ITERs to create a model that can predict the associated CCERR scores. The resulting predictive model was tested on the CCERR scores for an additional sample of 300 ITERs. RESULTS The quality of an ITER, as measured by the CCERR, can be predicted using a model involving only three variables (R² = 0.61). The predictive variables were the total number of words in the comments, the variability of the ratings and the proportion of comment boxes completed on the form. CONCLUSIONS It is possible to model CCERR scores in a highly predictive manner, and the predictive variables can be extracted easily in an automated process. Because this model is less resource-intensive than the CCERR, it can be used to provide feedback from ITER training programmes to large groups of supervisors and institutions, and even to build automated feedback systems using Proxy-CCERR scores.
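The Proxy-CCERR idea is essentially automated feature extraction plus regression. A hedged sketch of how the three predictors named above could be computed and fitted on invented data (feature definitions, coefficients and data are illustrative; the published model is not reproduced here):

```python
import numpy as np

def proxy_features(comments, ratings):
    """The three predictors named in the abstract: total words in the comments,
    variability of the numeric ratings, and proportion of comment boxes used."""
    total_words = sum(len(c.split()) for c in comments)
    rating_sd = float(np.std(ratings))
    boxes_filled = sum(1 for c in comments if c.strip()) / len(comments)
    return np.array([total_words, rating_sd, boxes_filled])

# Example: features for one hypothetical ITER with three comment boxes.
print(proxy_features(
    ["Presented a focused plan for the CHF admission.", "",
     "Should tighten the differential for dyspnea; review handover notes."],
    [3, 4, 2, 4, 5]))

# --- Fit a Proxy-CCERR-style model on invented data (not the study dataset) ---
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([
    rng.poisson(40, n),            # total words in comments
    rng.uniform(0.0, 1.2, n),      # variability (SD) of ratings
    rng.uniform(0.2, 1.0, n),      # proportion of comment boxes completed
])
ccerr = 10 + 0.15 * X[:, 0] + 4 * X[:, 1] + 6 * X[:, 2] + rng.normal(0, 3, n)

X_design = np.column_stack([np.ones(n), X])       # add intercept column
coef, *_ = np.linalg.lstsq(X_design, ccerr, rcond=None)
pred = X_design @ coef
r2 = 1 - ((ccerr - pred) ** 2).sum() / ((ccerr - ccerr.mean()) ** 2).sum()
print("intercept + weights:", np.round(coef, 2), " R^2:", round(r2, 2))
```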
Affiliation(s)
- Ramprasad Bismil
- Department of Psychiatry, University of Ottawa, Ottawa, Ontario, Canada
|