1. Tanaka P, Park YS, Chen CY, Yumul R, Macario A. Domains Influencing Faculty Decisions on the Level of Supervision Required for Anesthesiology EPAs with Analysis of Feedback Comments. Journal of Surgical Education 2024; 81:741-752. PMID: 38553368; DOI: 10.1016/j.jsurg.2024.02.003.
Abstract
OBJECTIVE The purpose of this qualitative study was to examine responses related to entrustment and feedback comments from an assessment tool. DESIGN Qualitative analyses using semi-structured interviews and analysis of narrative comments. SETTING Main hospital OR suite at a large academic medical center. PARTICIPANTS Faculty and residents who work in the OR suite. RESULTS Seven of the 14 theoretical domains from the Theoretical Domains Framework were identified as influencing faculty decisions on entrustment: knowledge, skills, intention, memory/attention/decision processes, environmental context and resources, beliefs about capabilities, and reinforcement. The majority (651/1,116; 58.4%) of faculty comments were critical/modest praise and relevant, consistent across all 6 EPAs. The feedback comments written in for all 1,116 Web App EPA assessments yielded a total of 1,599 sub-competency-specific responses. These responses were mapped to the core competencies and to 13 of the 23 ACGME subcompetencies at least once. CONCLUSIONS The domains identified as influencing faculty decisions on entrustment were knowledge, skills, intention, memory/attention/decision processes, environmental context and resources, beliefs about capabilities, and reinforcement. Most narrative feedback comments were critical/modest praise and relevant, consistent across each of the EPAs.
Affiliation(s)
- Pedro Tanaka: Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, California
- Yoon Soo Park: Associate Professor, Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois
- Chien-Yu Chen: Department of Anesthesiology, Taipei Medical University Hospital, Taipei, Taiwan; Department of Humanities in Medicine, School of Medicine, College of Medicine, Taipei
- Roya Yumul: Professor, Cedars-Sinai Medical Center, Los Angeles, California
- Alex Macario: Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, California
2. Van Ostaeyen S, De Langhe L, De Clercq O, Embo M, Schellens T, Valcke M. Automating the Identification of Feedback Quality Criteria and the CanMEDS Roles in Written Feedback Comments Using Natural Language Processing. Perspectives on Medical Education 2023; 12:540-549. PMID: 38144670; PMCID: PMC10742245; DOI: 10.5334/pme.1056.
Abstract
Introduction Manually analysing the quality of large amounts of written feedback comments is time-consuming and demands extensive resources and human effort. Therefore, this study aimed to explore whether a state-of-the-art large language model (LLM) could be fine-tuned to identify the presence of four literature-derived feedback quality criteria (performance, judgment, elaboration and improvement) and the seven CanMEDS roles (Medical Expert, Communicator, Collaborator, Leader, Health Advocate, Scholar and Professional) in written feedback comments. Methods A set of 2,349 labelled feedback comments from five healthcare educational programs in Flanders (Belgium) (specialistic medicine, general practice, midwifery, speech therapy and occupational therapy) was split into 12,452 sentences to create two datasets for the machine learning analysis. The Dutch BERT models BERTje and RobBERT were used to train four multiclass-multilabel classification models: two to identify the four feedback quality criteria and two to identify the seven CanMEDS roles. Results The classification models trained with BERTje and RobBERT to predict the presence of the four feedback quality criteria attained macro average F1-scores of 0.73 and 0.76, respectively. The models predicting the presence of the CanMEDS roles attained F1-scores of 0.71 with BERTje and 0.72 with RobBERT. Discussion The results showed that a state-of-the-art LLM is able to identify the presence of the four feedback quality criteria and the CanMEDS roles in written feedback comments. This implies that the quality analysis of written feedback comments can be automated using an LLM, saving time and resources.
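As a concrete illustration of the fine-tuning setup this abstract describes, the sketch below configures a Dutch BERT checkpoint for multilabel sentence classification over the four feedback quality criteria. It is a minimal sketch rather than the authors' code: the checkpoint name, the example sentence, and the label assignment are assumptions, and a real run would fine-tune on the full labelled dataset before computing macro average F1-scores.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# The four literature-derived feedback quality criteria from the study
LABELS = ["performance", "judgment", "elaboration", "improvement"]

# Assumed public checkpoint for BERTje; RobBERT ("pdelobelle/robbert-v2-dutch-base")
# would be swapped in the same way
CHECKPOINT = "GroNLP/bert-base-dutch-cased"

tok = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT,
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss
)

sentences = ["Je stelde een duidelijk behandelplan op."]  # hypothetical feedback sentence
targets = torch.tensor([[1.0, 0.0, 0.0, 1.0]])            # multi-hot criterion labels

batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
out = model(**batch, labels=targets)             # loss for one training step
preds = (torch.sigmoid(out.logits) > 0.5).int()  # thresholded multilabel prediction
print(out.loss.item(), preds.tolist())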
Affiliation(s)
- Loic De Langhe: Language and Translation Technology Team at Ghent University, Belgium
- Orphée De Clercq: Language and Translation Technology Team at Ghent University, Belgium
- Mieke Embo: Department of Educational Sciences at Ghent University and the Expertise Network Health and Care at the Artevelde University of Applied Sciences, Belgium
- Tammy Schellens: Department of Educational Sciences at Ghent University, Belgium
- Martin Valcke: Department of Educational Sciences at Ghent University, Belgium
3. McGuire N, Acai A, Sonnadara RR. The McMaster Narrative Comment Rating Tool: Development and Initial Validity Evidence. Teaching and Learning in Medicine 2023:1-13. PMID: 37964518; DOI: 10.1080/10401334.2023.2276799.
Abstract
CONSTRUCT The McMaster Narrative Comment Rating Tool aims to capture critical features reflecting the quality of written narrative comments provided in the medical education context: valence/tone of language, degree of correction versus reinforcement, specificity, actionability, and overall usefulness. BACKGROUND Despite their role in competency-based medical education, not all narrative comments contribute meaningfully to the development of learners' competence. To develop solutions to mitigate this problem, robust measures of narrative comment quality are needed. While some tools exist, most were created in specialty-specific contexts, have focused on one or two features of feedback, or have focused on faculty perceptions of feedback, excluding learners from the validation process. In this study, we aimed to develop a detailed, broadly applicable narrative comment quality assessment tool that drew upon features of high-quality assessment and feedback and could be used by a variety of raters to inform future research, including applications related to automated analysis of narrative comment quality. APPROACH In Phase 1, we used the literature to identify five critical features of feedback. We then developed rating scales for each of the features and collected 670 competency-based assessments completed by first-year surgical residents in the first six weeks of training. Residents were from nine different programs at a Canadian institution. In Phase 2, we randomly selected 50 assessments with written feedback from the dataset. Two education researchers used the scale to independently score the written comments and refine the rating tool. In Phase 3, 10 raters, including two medical education researchers, two medical students, two residents, two clinical faculty members, and two laypersons from the community, used the tool to independently and blindly rate written comments from another 50 randomly selected assessments from the dataset. We compared scores between and across rater pairs to assess reliability. FINDINGS Single- and average-measures intraclass correlation (ICC) scores ranged from moderate to excellent (ICCs = .51-.83 and .91-.98) across all categories and rater pairs. All tool domains were significantly correlated (p's < .05), apart from valence, which was only significantly correlated with degree of correction versus reinforcement. CONCLUSION Our findings suggest that the McMaster Narrative Comment Rating Tool can reliably be used by multiple raters, across a variety of rater types, and in different surgical contexts. As such, it has the potential to support faculty development initiatives on assessment and feedback, and may be used as a tool to conduct research on different assessment strategies, including automated analysis of narrative comments.
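Because the tool's reliability evidence rests on intraclass correlations, here is a minimal sketch (with invented ratings) of the single- and average-measures ICCs reported above, computed from the standard Shrout and Fleiss two-way random effects, absolute agreement formulas, ICC(2,1) and ICC(2,k).

import numpy as np

# Toy ratings: n comments (rows) scored by k raters (columns)
Y = np.array([[3, 4, 3], [5, 5, 4], [2, 2, 2], [4, 5, 5], [1, 2, 1]], dtype=float)
n, k = Y.shape
grand = Y.mean()
MSR = k * ((Y.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # between-comment mean square
MSC = n * ((Y.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # between-rater mean square
SSE = ((Y - Y.mean(axis=1, keepdims=True) - Y.mean(axis=0, keepdims=True) + grand) ** 2).sum()
MSE = SSE / ((n - 1) * (k - 1))                             # residual mean square

icc_single = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)  # ICC(2,1)
icc_average = (MSR - MSE) / (MSR + (MSC - MSE) / n)                     # ICC(2,k)
print(round(icc_single, 2), round(icc_average, 2))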
Affiliation(s)
- Natalie McGuire: Office of Professional Development and Educational Scholarship, Queen's University, Kingston, Ontario, Canada
- Anita Acai: Department of Psychiatry and Behavioural Neurosciences and McMaster Education Research, Innovation and Theory (MERIT) Program, McMaster University, and St. Joseph's Education Research Centre (SERC), St. Joseph's Healthcare Hamilton, Hamilton, Canada
- Ranil R Sonnadara: Office of Education Science, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
4. Mooney CJ, Stone RT, Wang L, Blatt AE, Pascoe JM, Lang VJ. Examining Generalizability of Faculty Members' Narrative Assessments. Academic Medicine 2023; 98:S210. PMID: 37983456; DOI: 10.1097/acm.0000000000005417.
Affiliation(s)
- C.J. Mooney, R.T. Stone, L. Wang, A.E. Blatt, J.M. Pascoe, V.J. Lang: University of Rochester School of Medicine and Dentistry
5. Quinn JK, Mongelluzzo J, Addo N, Nip A, Graterol J, Chen EH. The Standardized Letter of Evaluation: How We Perceive the Quiet Student. Western Journal of Emergency Medicine 2023; 24:259-263. PMID: 36976603; PMCID: PMC10047751; DOI: 10.5811/westjem.2022.12.56137.
Abstract
INTRODUCTION The Standardized Letter of Evaluation (SLOE) is an emergency medicine (EM)-specific assessment designed to help EM residency programs differentiate applicants. We became interested in SLOE narrative language referencing personality when we observed less enthusiasm for applicants described as "quiet" in their SLOEs. In this study our objective was to compare how quiet-labeled, EM-bound applicants were ranked compared to their non-quiet peers in the global assessment (GA) and anticipated rank list (ARL) categories in the SLOE. METHODS We conducted a planned subgroup analysis of a retrospective cohort study of all core EM clerkship SLOEs submitted to one four-year academic EM residency program in the 2016-2017 recruitment cycle. We compared SLOEs of applicants who were described as "quiet," "shy," and/or "reserved" - collectively referred to as "quiet" - to SLOEs from all other applicants, referred to as "non-quiet." We compared frequencies of quiet to non-quiet students in GA and ARL categories using chi-square goodness-of-fit tests with a rejection criterion (alpha) of 0.05. RESULTS We reviewed 1,582 SLOEs from 696 applicants. Of these, 120 SLOEs described quiet applicants. The distributions of quiet and non-quiet applicants across GA and ARL categories were significantly different (P < 0.001). Quiet applicants were less likely than non-quiet applicants to be ranked in the top 10% and top one-third GA categories combined (31% vs 60%) and more likely to be in the middle one-third category (58% vs 32%). For ARL, quiet applicants were also less likely to be ranked in the top 10% and top one-third categories combined (33% vs 58%) and more likely to be in the middle one-third category (50% vs 31%). CONCLUSION Emergency medicine-bound students described as quiet in their SLOEs were less likely to be ranked in the top GA and ARL categories compared to non-quiet students. More research is needed to determine the cause of these ranking disparities and address potential biases in teaching and assessment practices.
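The category comparison described in the methods is a chi-square goodness-of-fit test; the sketch below runs one on invented counts shaped like the reported pattern (quiet applicants under-represented in the top categories). Neither the observed counts nor the expected proportions are the study's data.

import numpy as np
from scipy.stats import chisquare

# Invented counts of 120 "quiet" applicants across the four GA categories:
# top 10%, top one-third, middle one-third, lower one-third
quiet_obs = np.array([7, 30, 70, 13])

# Expected counts if quiet applicants followed the non-quiet distribution
nonquiet_prop = np.array([0.15, 0.45, 0.32, 0.08])
expected = nonquiet_prop * quiet_obs.sum()

stat, p = chisquare(f_obs=quiet_obs, f_exp=expected)
print(f"chi2 = {stat:.1f}, p = {p:.4f}")  # reject H0 if p < alpha = 0.05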
Affiliation(s)
- John K Quinn, Jillian Mongelluzzo, Newton Addo, Alyssa Nip, Joseph Graterol, Esther H Chen: Department of Emergency Medicine, University of California, San Francisco, San Francisco, California
6. Maimone C, Dolan BM, Green MM, Sanguino SM, Garcia PM, O’Brien CL. Utilizing Natural Language Processing of Narrative Feedback to Develop a Predictive Model of Pre-Clerkship Performance: Lessons Learned. Perspectives on Medical Education 2023; 12:141-148. PMID: 37151853; PMCID: PMC10162355; DOI: 10.5334/pme.40.
Abstract
Background Natural language processing is a promising technique that can be used to create efficiencies in the review of narrative feedback to learners. The Feinberg School of Medicine has implemented formal review of pre-clerkship narrative feedback since 2014 through its portfolio assessment system, but this process requires considerable time and effort. This article describes how natural language processing was used to build a predictive model of pre-clerkship student performance that can be utilized to assist competency committee reviews. Approach The authors took an iterative and inductive approach to the analysis, which allowed them to identify characteristics of narrative feedback that are both predictive of performance and useful to faculty reviewers. Words and phrases were manually grouped into topics that represented concepts illustrating student performance. Topics were reviewed by experienced reviewers, tested for consistency across time, and checked to ensure they did not demonstrate bias. Outcomes Sixteen topic groups of words and phrases were found to be predictive of performance. The best-fitting model used a combination of topic groups, word counts, and categorical ratings. The model had an AUC value of 0.92 on the training data and 0.88 on the test data. Reflection A thoughtful, careful approach to using natural language processing was essential. Given the idiosyncrasies of narrative feedback in medical education, standard natural language processing packages were not adequate for predicting student outcomes. Rather, employing qualitative techniques including repeated member checking and iterative revision resulted in a useful and salient predictive model.
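To make the pipeline concrete, the sketch below mirrors its shape with invented data: hand-curated topic groups of words and phrases are counted in each comment, combined with a word count, and fed to a classifier scored by AUC. The topic groups, comments, and labels are illustrative assumptions; the authors' actual model used sixteen topic groups plus categorical ratings.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical topic groups: words/phrases manually grouped into performance concepts
TOPICS = {
    "strong_knowledge": ["strong fund of knowledge", "excellent differential"],
    "needs_support": ["needs improvement", "struggled", "below expectations"],
}

def features(comment):
    text = comment.lower()
    counts = [sum(text.count(p) for p in phrases) for phrases in TOPICS.values()]
    return counts + [len(text.split())]  # topic counts plus overall word count

comments = [
    "Strong fund of knowledge; excellent differential diagnoses.",
    "Struggled with time management and needs improvement.",
    "Reliable, but below expectations on exams; struggled early.",
    "Excellent differential and strong fund of knowledge overall.",
]
y = np.array([1, 0, 0, 1])  # 1 = meeting pre-clerkship expectations

X = np.array([features(c) for c in comments])
clf = LogisticRegression().fit(X, y)
print(roc_auc_score(y, clf.predict_proba(X)[:, 1]))  # in-sample AUC on toy data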
Affiliation(s)
- Christina Maimone: Associate director of research data services, Northwestern IT Research Computing Services, Northwestern University, Evanston, Illinois, USA
- Brigid M. Dolan: Associate professor of medicine and medical education and director of assessment, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Marianne M. Green: Raymond H. Curry, MD Professor of Medical Education, professor of medicine, and vice dean for education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Sandra M. Sanguino: Associate professor of pediatrics and senior associate dean of medical education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Patricia M. Garcia: Professor of obstetrics and gynecology and medical education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Celia Laird O’Brien: Assistant professor of medical education and assistant dean of program evaluation and accreditation, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
7. Zavodnick J, Doroshow J, Rosenberg S, Banks J, Leiby BE, Mingioni N. Hawks and Doves: Perceptions and Reality of Faculty Evaluations. Journal of Medical Education and Curricular Development 2023; 10:23821205231197079. PMID: 37692558; PMCID: PMC10492463; DOI: 10.1177/23821205231197079.
Abstract
OBJECTIVES Internal medicine clerkship grades are important for residency selection, but inconsistencies between evaluator ratings threaten both their ability to accurately represent student performance and their perceived fairness. Clerkship grading committees are recommended as best practice, but the mechanisms by which they promote accuracy and fairness are not certain. The ability of a committee to reliably assess and account for grading stringency of individual evaluators has not been previously studied. METHODS This is a retrospective analysis of evaluations completed by faculty considered to be stringent, lenient, or neutral graders by members of a grading committee of a single medical college. Faculty evaluations were assessed for differences in ratings on individual skills and recommendations for final grade between perceived stringency categories. Logistic regression was used to determine if actual assigned ratings varied based on perceived faculty grading stringency category. RESULTS "Easy graders" consistently had the highest probability of awarding an above-average rating, and "hard graders" consistently had the lowest probability of awarding an above-average rating, though this finding reached statistical significance for only 2 of 8 questions on the evaluation form (P = .033 and P = .001). Odds ratios of assigning a higher final suggested grade followed the expected pattern (higher for "easy" and "neutral" compared to "hard," higher for "easy" compared to "neutral") but did not reach statistical significance. CONCLUSIONS Perceived differences in faculty grading stringency have a basis in reality for clerkship evaluation elements. However, final grades recommended by faculty perceived as "stringent" or "lenient" did not differ. Perceptions of "hawks" and "doves" are not just lore, but they may not have implications for students' final grades. Continued research to describe the "hawk and dove effect" will be crucial to enable assessment of local grading variation and empower local educational leadership to correct, but not overcorrect, for this effect to maintain fairness in student evaluations.
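The odds-ratio analysis can be sketched as a logistic regression on simulated evaluations, with perceived stringency as a categorical predictor and "hard" graders as the reference level. Only the structure mirrors the study; the data below are simulated.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
cat = rng.choice(["easy", "neutral", "hard"], size=300)
p_above = {"easy": 0.7, "neutral": 0.5, "hard": 0.3}  # simulated propensities
df = pd.DataFrame({
    "stringency": cat,
    "above_avg": [int(rng.random() < p_above[c]) for c in cat],
})

# Logistic regression with "hard" graders as the reference category
fit = smf.logit("above_avg ~ C(stringency, Treatment('hard'))", data=df).fit(disp=0)
print(np.exp(fit.params))  # odds ratios: easy vs hard, neutral vs hard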
Affiliation(s)
- Jillian Zavodnick: Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, USA
- Sarah Rosenberg: Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, USA
- Joshua Banks: Department of Pharmacology and Experimental Therapeutics, Division of Biostatistics, Thomas Jefferson University, Philadelphia, USA
- Benjamin E Leiby: Department of Pharmacology and Experimental Therapeutics, Division of Biostatistics, Thomas Jefferson University, Philadelphia, USA
- Nina Mingioni: Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, USA
8. Mooney CJ, Pascoe JM, Blatt AE, Lang VJ, Kelly MS, Braun MK, Burch JE, Stone RT. Predictors of faculty narrative evaluation quality in medical school clerkships. Medical Education 2022; 56:1223-1231. PMID: 35950329; DOI: 10.1111/medu.14911.
Abstract
INTRODUCTION Narrative approaches to assessment provide meaningful and valid representations of trainee performance. Yet, narratives are frequently perceived as vague, nonspecific and low quality. To date, there is little research examining factors associated with narrative evaluation quality, particularly in undergraduate medical education. The purpose of this study was to examine associations of faculty- and student-level characteristics with the quality of faculty members' narrative evaluations of clerkship students. METHODS The authors reviewed faculty narrative evaluations of 50 students' clinical performance in their inpatient medicine and neurology clerkships, resulting in 165 and 87 unique evaluations in the respective clerkships. The authors evaluated narrative quality using the Narrative Evaluation Quality Instrument (NEQI). The authors used linear mixed effects modelling to predict total NEQI score. Explanatory covariates included the following: time to evaluation completion, number of weeks spent with student, faculty total weeks on service per year, total faculty years in clinical education, student gender, faculty gender, and an interaction term between student and faculty gender. RESULTS Significantly higher narrative evaluation quality was associated with a shorter time to evaluation completion, with NEQI scores decreasing by approximately 0.3 points every 10 days following students' rotations (p = .004). Additionally, women faculty wrote significantly higher-quality narrative evaluations, with NEQI scores 1.92 points greater than those of men faculty (p = .012). All other covariates were not significant. CONCLUSIONS The quality of faculty members' narrative evaluations of medical students was associated with time to evaluation completion and faculty gender, but not with faculty experience in clinical education, faculty weeks on service, or the amount of time spent with students. Findings advance understanding of ways to improve the quality of narrative evaluations, which is imperative given assessment models that will increase the volume of, and reliance on, narratives.
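A linear mixed effects model of the kind described can be sketched as follows: NEQI-like quality scores regressed on days to evaluation completion and faculty gender, with a random intercept per faculty member. All data are simulated, with effects shaped like the reported ones (roughly -0.3 points per 10 days and +1.92 points for women faculty).

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_fac, per_fac = 30, 8
fac = np.repeat(np.arange(n_fac), per_fac)          # evaluations nested in faculty
days = rng.integers(0, 60, size=fac.size)           # days to evaluation completion
woman = rng.integers(0, 2, size=n_fac)[fac]         # faculty gender indicator
fac_re = rng.normal(0, 1.5, size=n_fac)[fac]        # random intercept per faculty
neqi = 12 - 0.03 * days + 1.9 * woman + fac_re + rng.normal(0, 2, size=fac.size)

df = pd.DataFrame({"neqi": neqi, "days": days, "woman_faculty": woman, "faculty": fac})
fit = smf.mixedlm("neqi ~ days + woman_faculty", df, groups=df["faculty"]).fit()
print(fit.summary())  # fixed effects should recover about -0.03 and +1.9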
Affiliation(s)
- Christopher J Mooney, Jennifer M Pascoe, Amy E Blatt, Valerie J Lang, Melanie K Braun, Jaclyn E Burch: School of Medicine and Dentistry, University of Rochester, Rochester, New York, USA
9. Branfield Day L, Rassos J, Billick M, Ginsburg S. 'Next steps are…': An exploration of coaching and feedback language in EPA assessment comments. Medical Teacher 2022; 44:1368-1375. PMID: 35944554; DOI: 10.1080/0142159x.2022.2098098.
Abstract
PURPOSE Entrustable Professional Activities (EPA) assessments are intended to facilitate meaningful, low-stakes coaching and feedback, partly through the provision of written comments. We sought to explore EPA assessment comments provided to internal medicine (IM) residents for evidence of feedback and coaching language as well as politeness. METHODS We collected all written comments from EPA assessments of communication from a first-year IM resident cohort at the University of Toronto. Sensitized by politeness theory, we analyzed data using principles of constructivist grounded theory. RESULTS Nearly all EPA assessments (94%) contained written feedback based on focused clinical encounters. The majority of comments demonstrated coaching language, including phrases like 'don't forget to' and 'next steps are,' followed by specific suggestions for improvement. A variety of words, including 'autonomy' and 'independence,' denoted entrustment decisions. Linguistic politeness strategies such as hedging were pervasive, seemingly to minimize harm to the supervisor-trainee relationship. CONCLUSION Evidence of written coaching feedback suggests that EPA assessment comments are being used as intended as a means of formative feedback to promote learning. Yet, the frequent use of polite language suggests that EPAs may be higher-stakes than expected, highlighting a need for changes to the assessment culture and improved feedback literacy.
Affiliation(s)
- Leora Branfield Day: Department of Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, Canada
- James Rassos: Department of Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, Canada
- Maxime Billick: Department of Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, Canada
- Shiphra Ginsburg: Department of Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, Canada; Wilson Centre for Research in Education, Toronto, Canada
10. Woods R, Singh S, Thoma B, Patocka C, Cheung W, Monteiro S, Chan TM. Validity evidence for the Quality of Assessment for Learning score: a quality metric for supervisor comments in Competency Based Medical Education. Canadian Medical Education Journal 2022; 13:19-35. PMID: 36440075; PMCID: PMC9684040; DOI: 10.36834/cmej.74860.
Abstract
BACKGROUND Competency based medical education (CBME) relies on supervisor narrative comments contained within entrustable professional activities (EPAs) for programmatic assessment, but the quality of these supervisor comments is unassessed. There is validity evidence supporting the QuAL (Quality of Assessment for Learning) score for rating the usefulness of short narrative comments in direct observation. OBJECTIVE We sought to establish validity evidence for the QuAL score to rate the quality of supervisor narrative comments contained within an EPA by surveying the key end-users of EPA narrative comments: residents, academic advisors, and competence committee members. METHODS In 2020, the authors randomly selected 52 de-identified narrative comments from two emergency medicine EPA databases using purposeful sampling. Six collaborators (two residents, two academic advisors, and two competence committee members) were recruited from each of four EM residency programs (Saskatchewan, McMaster, Ottawa, and Calgary) to rate these comments with a utility score and the QuAL score. Correlations between the utility and QuAL scores were calculated using Pearson's correlation coefficient. Sources of variance and reliability were calculated using a generalizability study. RESULTS All collaborators (n = 24) completed the full study. The QuAL score had a high positive correlation with the utility score amongst the residents (r = 0.80) and academic advisors (r = 0.75) and a moderately high correlation amongst competence committee members (r = 0.68). The generalizability study found that the major source of variance was the comment itself, indicating that the tool performs well across raters. CONCLUSION The QuAL score may serve as an outcome measure for program evaluation of supervisors, and as a resource for faculty development.
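The core statistic here is a Pearson correlation between paired ratings of the same comments; a minimal sketch with ten invented QuAL/utility pairs follows.

import numpy as np
from scipy.stats import pearsonr

# Invented paired ratings of the same comments: a 0-5 QuAL score and a utility score
qual = np.array([5, 4, 2, 1, 3, 5, 0, 2, 4, 3])
utility = np.array([5, 5, 2, 1, 4, 4, 1, 2, 5, 3])

r, p = pearsonr(qual, utility)
print(f"r = {r:.2f}, p = {p:.4f}")  # r around 0.7-0.8 would match the reported range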
Affiliation(s)
- Rob Woods: Department of Emergency Medicine, University of Saskatchewan, Saskatchewan, Canada
- Sim Singh: College of Medicine, University of Saskatchewan, Saskatchewan, Canada
- Brent Thoma: Department of Emergency Medicine, University of Saskatchewan, Saskatchewan, Canada
- Catherine Patocka: Department of Emergency Medicine, University of Calgary, Alberta, Canada
- Warren Cheung: Department of Emergency Medicine, University of Ottawa, Ontario, Canada
- Sandra Monteiro: Department of Health Research Methods, Evidence and Impact, McMaster University, Ontario, Canada
- Teresa M Chan: Division of Emergency Medicine and Education & Innovation, Department of Medicine, McMaster University, Ontario, Canada
11. Mooney CJ, Blatt A, Pascoe J, Lang V, Kelly M, Braun M, Burch J, Stone RT. Predictors of Narrative Evaluation Quality in Undergraduate Medical Education Clerkships. Academic Medicine 2022; 97:S168. PMID: 37838897; DOI: 10.1097/acm.0000000000004809.
Affiliation(s)
- C.J. Mooney, A. Blatt, J. Pascoe, V. Lang, M. Braun, J. Burch, R.T. Stone: University of Rochester School of Medicine and Dentistry; M. Kelly: Massachusetts General Hospital
12. Roy M, Kain N, Touchie C. Exploring Content Relationships Among Components of a Multisource Feedback Program. The Journal of Continuing Education in the Health Professions 2022; 42:243-248. PMID: 34609355; DOI: 10.1097/ceh.0000000000000398.
Abstract
INTRODUCTION A new multisource feedback (MSF) program was specifically designed to support physician quality improvement (QI) around the CanMEDS roles of Collaborator, Communicator, and Professional. Quantitative ratings and qualitative comments are collected from a sample of physician colleagues, co-workers (C), and patients (PT). These data are supplemented with self-ratings and given back to physicians in individualized reports. Each physician reviews the report with a trained feedback facilitator and creates one to three action plans for QI. This study explores how the content of the four aforementioned multisource feedback program components supports the elicitation and translation of feedback into a QI plan for change. METHODS Data included survey items, rater comments, a portion of facilitator reports, and action plan components for 159 physicians. Word frequency queries were used to identify common words and explore relationships among data sources. RESULTS Overlap between high frequency words in surveys and rater comments was substantial. The language used to describe goals in physician action plans was highly related to respondent comments, but less so to survey items. High frequency words in facilitator reports related heavily to action plan content. DISCUSSION All components of the program relate to one another, indicating that each plays a part in the process. Patterns of overlap suggest unique functions conducted by program components. This demonstration of coherence across components of this program is one piece of evidence that supports the program's validity.
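The analysis rests on word frequency queries and overlap across program components; the sketch below illustrates that idea with invented snippets standing in for survey items, rater comments, and action plans.

import re
from collections import Counter

STOP = {"the", "a", "and", "to", "of", "with", "i", "he", "how"}

def top_words(text, k=10):
    # Keep the k most frequent non-stopwords as a set for overlap comparisons
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP]
    return {w for w, _ in Counter(words).most_common(k)}

survey_items = "Communicates clearly with patients and listens to their concerns"
rater_comments = "He communicates clearly and listens well; patients feel heard"
action_plans = "Goal: listen more and communicate management plans clearly to patients"

# Overlap in high-frequency words approximates the content relationships examined
print(top_words(survey_items) & top_words(rater_comments))
print(top_words(rater_comments) & top_words(action_plans))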
Affiliation(s)
- Dr. Roy: Adjunct Professor, Department of Innovation in Medical Education, University of Ottawa, Ottawa, Ontario, Canada
- Dr. Kain: Program Manager, Research & Evaluation Unit, College of Physicians and Surgeons of Alberta, Edmonton, Alberta, Canada
- Dr. Touchie: Professor, Department of Innovation in Medical Education, University of Ottawa; Chief Medical Education Advisor, Medical Council of Canada; and Department of Medicine, University of Ottawa, Ottawa, Ontario, Canada
13. Yilmaz Y, Jurado Nunez A, Ariaeinejad A, Lee M, Sherbino J, Chan TM. Harnessing Natural Language Processing to Support Decisions Around Workplace-Based Assessment: Machine Learning Study of Competency-Based Medical Education. JMIR Medical Education 2022; 8:e30537. PMID: 35622398; PMCID: PMC9187970; DOI: 10.2196/30537.
Abstract
BACKGROUND Residents receive a numeric performance rating (eg, 1-7 scoring scale) along with narrative (ie, qualitative) feedback based on their performance in each workplace-based assessment (WBA). Aggregated qualitative data from WBA can be overwhelming to process and fairly adjudicate as part of a global decision about learner competence. Current approaches with qualitative data require a human rater to maintain attention and appropriately weigh various data inputs within the constraints of working memory before rendering a global judgment of performance. OBJECTIVE This study explores natural language processing (NLP) and machine learning (ML) applications for identifying trainees at risk using a large WBA narrative comment data set associated with numerical ratings. METHODS NLP was performed retrospectively on a complete data set of narrative comments (ie, text-based feedback to residents based on their performance on a task) derived from WBAs completed by faculty members from multiple hospitals associated with a single, large, residency program at McMaster University, Canada. Narrative comments were vectorized to quantitative ratings using the bag-of-n-grams technique with 3 input types: unigrams, bigrams, and trigrams. Supervised ML models using linear regression were trained with the quantitative ratings, performed binary classification, and output a prediction of whether a resident fell into the category of at risk or not at risk. Sensitivity, specificity, and accuracy metrics are reported. RESULTS The database comprised 7199 unique direct observation assessments, containing both narrative comments and a rating between 3 and 7 in an imbalanced distribution (scores 3-5: 726 ratings; scores 6-7: 4871 ratings). A total of 141 unique raters from 5 different hospitals and 45 unique residents participated over the course of 5 academic years. When comparing the 3 different input types for diagnosing if a trainee would be rated low (ie, 1-5) or high (ie, 6 or 7), our accuracy for trigrams was 87%, bigrams 86%, and unigrams 82%. We also found that all 3 input types had better prediction accuracy when using a bimodal cut (eg, lower or higher) compared with predicting performance along the full 7-point rating scale (50%-52%). CONCLUSIONS The ML models can accurately identify underperforming residents via narrative comments provided for WBAs. The words generated in WBAs can be a worthy data set to augment human decisions for educators tasked with processing large volumes of narrative assessments.
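As a compact illustration of the bag-of-n-grams approach, the sketch below vectorizes toy comments with unigrams through trigrams and classifies the bimodal at-risk cut. The comments and ratings are invented, and a logistic classifier stands in for the paper's linear-regression-based models.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

comments = [
    "excellent assessment and clear plan",
    "needs to improve differential and follow through",
    "strong communication, safe disposition",
    "disorganized presentation, missed key findings",
    "thorough history, appropriate management",
    "unsafe plan, required significant redirection",
]
ratings = [7, 4, 6, 3, 6, 3]
at_risk = [int(r <= 5) for r in ratings]  # bimodal cut: ratings 3-5 vs 6-7

vec = CountVectorizer(ngram_range=(1, 3))  # unigrams, bigrams, and trigrams
X = vec.fit_transform(comments)

clf = LogisticRegression().fit(X, at_risk)
tn, fp, fn, tp = confusion_matrix(at_risk, clf.predict(X)).ravel()
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))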
Affiliation(s)
- Yusuf Yilmaz: McMaster Education Research, Innovation, and Theory Program, Faculty of Health Sciences; Program for Faculty Development, Office of Continuing Professional Development; and Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada; Department of Medical Education, Ege University, Izmir, Turkey
- Alma Jurado Nunez: Department of Medicine and Masters in eHealth Program, McMaster University, Hamilton, ON, Canada
- Ali Ariaeinejad: Department of Medicine and Masters in eHealth Program, McMaster University, Hamilton, ON, Canada
- Mark Lee: McMaster Education Research, Innovation, and Theory Program, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Jonathan Sherbino: McMaster Education Research, Innovation, and Theory Program; Division of Emergency Medicine, Department of Medicine; and Division of Education and Innovation, Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Teresa M Chan: McMaster Education Research, Innovation, and Theory Program; Program for Faculty Development, Office of Continuing Professional Development; Division of Emergency Medicine, Department of Medicine; and Division of Education and Innovation, Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
14. Ginsburg S, Stroud L, Lynch M, Melvin L, Kulasegaram K. Beyond the ratings: gender effects in written comments from clinical teaching assessments. Advances in Health Sciences Education 2022; 27:355-374. PMID: 35088152; DOI: 10.1007/s10459-021-10088-1.
Abstract
Assessment of clinical teachers by learners is problematic. Construct-irrelevant factors influence ratings, and women teachers often receive lower ratings than men. However, most studies focus only on numeric scores. Therefore, the authors analyzed written comments on 4032 teacher assessments, representing 282 women and 448 men teachers in one Department of Medicine, to explore for gender differences. NVivo was used to search for 61 evidence- and theory-based terms purported to reflect teaching excellence, which were analyzed using 2 × 2 chi-squared tests. The Linguistic Inquiry and Word Count (LIWC) program was used to categorize comment data, which were analyzed using linear regressions. The only significant difference in NVivo was that men were more likely than women to have the word "available" in a comment (OR 1.4, p < .05). A subset of LIWC variables showed significant gender differences, but all effects were modest. Men teachers had more positive emotion words written about them, while negative emotion words appeared equally. Significant differences were more often associated with the gender of the residents who wrote the comments than with the gender of the teachers being assessed. For example, women residents used more social and gender-related words (β 1.87, p < 0.001) and fewer words related to power or achievement (β -3.78, p < 0.001) than men residents. Profound gender differences were not found in teacher assessment comments in this large, diverse academic department of medicine, which differs from other studies. The authors explore possible reasons, including differences in departmental culture and issues related to the methods used.
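The "available" finding is a 2 × 2 comparison of word presence by teacher gender; the sketch below computes the odds ratio and chi-squared test on a toy contingency table whose counts are invented but chosen to land near the reported OR of 1.4.

import numpy as np
from scipy.stats import chi2_contingency

# Rows: men teachers, women teachers; columns: comment mentions "available" or not
table = np.array([
    [120, 2180],  # men teachers
    [ 60, 1530],  # women teachers
])

chi2, p, dof, _ = chi2_contingency(table)
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p:.3f}")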
Affiliation(s)
- Shiphra Ginsburg: Department of Medicine, Sinai Health System, Temerty Faculty of Medicine, University of Toronto; Wilson Centre for Research in Education, University Health Network and University of Toronto; Canada Research Chair in Health Professions Education; Mount Sinai Hospital, 433-600 University Ave., Toronto, Ontario, M5G 1X5, Canada
- Lynfa Stroud: Wilson Centre for Research in Education, University Health Network and University of Toronto; Department of Medicine, Sunnybrook HSC and Temerty Faculty of Medicine, Toronto, Ontario, Canada
- Meghan Lynch: Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Lindsay Melvin: Department of Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Kulamakan Kulasegaram: Wilson Centre for Research in Education, University Health Network and University of Toronto; Department of Family and Community Medicine, Temerty Faculty of Medicine; Temerty Chair in Learner Assessment and Program Evaluation, University of Toronto, Toronto, Ontario, Canada
15. The effect of gender dyads on the quality of narrative assessments of general surgery trainees. American Journal of Surgery 2021; 224:179-184. PMID: 34911639; DOI: 10.1016/j.amjsurg.2021.12.001.
Abstract
BACKGROUND Prior studies have shown that gender can influence how learners are assessed and the feedback they receive. We investigated the quality of faculty narrative comments in general surgery trainee evaluation using trainee-assessor gender dyads. METHODS Narrative assessments of surgical trainees at the University of British Columbia were collected and rated using the McMaster Narrative Comment Rating Scale (MNCRS). Variables from the MNCRS were inputted into a generalized linear mixed model to explore the impact of gender dyads on the quality of narrative feedback. RESULTS 2,469 assessments were collected. Women assessors tended to give higher-quality comments (p's < 0.05) than men assessors. Comments from men assessors to women trainees were significantly more positive than comments from men assessors to men trainees (p = 0.02). Men assessors also gave women trainees a higher proportion of reinforcing (versus corrective) comments than they gave men trainees (p < 0.01). CONCLUSIONS There are significant differences in the quality of faculty feedback to trainees by gender dyad. A range of solutions to improve feedback quality and reduce these differences is discussed.
16. Kelleher M, Kinnear B, Sall DR, Weber DE, DeCoursey B, Nelson J, Klein M, Warm EJ, Schumacher DJ. Warnings in early narrative assessment that might predict performance in residency: signal from an internal medicine residency program. Perspectives on Medical Education 2021; 10:334-340. PMID: 34476730; PMCID: PMC8633188; DOI: 10.1007/s40037-021-00681-w.
Abstract
INTRODUCTION Narrative assessment data are valuable in understanding struggles in resident performance. However, it remains unknown which themes in narrative data that occur early in training may indicate a higher likelihood of struggles later in training, allowing programs to intervene sooner. METHODS Using learning analytics, we identified 26 internal medicine residents across three cohorts who were below expected entrustment during training. We compiled all narrative data from the first 6 months of training for these residents as well as for 13 typically performing residents for comparison. Narrative data for all 39 residents were blinded during the initial coding phases of an inductive thematic analysis. RESULTS Many similarities were identified between the two cohorts. Codes that differed between typically performing and lower-entrusted residents were grouped into six themes of two types: three explicit/manifest and three implicit/latent. The explicit/manifest themes focused on specific aspects of resident performance, with assessors describing 1) gaps in attention to detail, 2) communication deficits with patients, and 3) difficulty recognizing the "big picture" in patient care. Three implicit/latent themes, focused on how narrative data were written, were also identified: 1) feedback described as a deficiency rather than an opportunity to improve, 2) normative comparisons to identify a resident as being behind their peers, and 3) warning of possible risk to patient care. DISCUSSION Clinical competency committees (CCCs) usually rely on accumulated data and trends. Using the themes in this paper while reviewing narrative comments may help CCCs with earlier recognition and better allocation of resources to support residents' development.
Affiliation(s)
- Matthew Kelleher, Benjamin Kinnear, Danielle E Weber, Bailey DeCoursey, Jennifer Nelson, Melissa Klein, Daniel J Schumacher: Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Dana R Sall: HonorHealth Internal Medicine Residency Program, Scottsdale, Arizona, and University of Arizona College of Medicine, Phoenix, AZ, USA
- Eric J Warm: Department of Internal Medicine, University of Cincinnati College of Medicine, Cincinnati, OH, USA
17. Roshan A, Wagner N, Acai A, Emmerton-Coughlin H, Sonnadara RR, Scott TM, Karimuddin AA. Comparing the Quality of Narrative Comments by Rotation Setting. Journal of Surgical Education 2021; 78:2070-2077. PMID: 34301523; DOI: 10.1016/j.jsurg.2021.06.012.
Abstract
OBJECTIVE To investigate the effect of rotation setting on trainee-directed narrative comments within a Canadian general surgery residency program. The primary outcome was to use the McMaster Narrative Comment Rating Scale (MNCRS) to evaluate the quality of narrative comments across five domains: valence of language, degree of correction versus reinforcement, specificity, actionability, and overall usefulness. As distributed medical education in the postgraduate training context becomes more prevalent, delineating differences in feedback between various sites will be imperative, as it may affect how narrative comments are interpreted by clinical competency committee (CCC) members. DESIGN, SETTING, AND PARTICIPANTS A retrospective analysis of 2,469 assessments obtained between July 1, 2014 and May 5, 2019 from the General Surgery Residency Program at the University of British Columbia (UBC) was conducted. Narrative comments were rated using the MNCRS, a validated instrument for evaluating the quality of narrative comments. A repeated measures analysis of variance (ANOVA) was conducted to explore the impact of rotation setting (academic, urban tertiary, distributed urban, and distributed rural) on the quality of narrative feedback. RESULTS Overall, the quality of the narrative comments varied substantially between and within rotation settings. Academic sites tended to provide more actionable comments (p = 0.01) and more corrective versus reinforcing comments, compared with other sites (p's < 0.01). Comments produced in the urban tertiary rotation setting were consistently lower in quality across all scale categories compared with other settings (p's < 0.01). CONCLUSION The type of rotation setting has a significant effect on the quality of faculty feedback for trainees. Faculty development on the provision of feedback is necessary, regardless of rotation setting, and should appropriately combine rotation-specific needs and overarching program goals to ensure trainees and clinical competency committees receive high-quality narrative comments.
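A repeated measures ANOVA of the kind described can be sketched with statsmodels: each simulated trainee contributes a comment-quality score from all four rotation settings, with setting as the within-subject factor. The means below are invented to echo the reported pattern (urban tertiary lowest).

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(2)
means = {"academic": 3.6, "urban_tertiary": 2.8,
         "distributed_urban": 3.2, "distributed_rural": 3.3}
rows = [
    {"trainee": t, "setting": s, "quality": rng.normal(m, 0.5)}
    for t in range(40) for s, m in means.items()
]
df = pd.DataFrame(rows)

# Repeated measures ANOVA: setting as the within-subject factor
fit = AnovaRM(df, depvar="quality", subject="trainee", within=["setting"]).fit()
print(fit)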
Affiliation(s)
- Aishwarya Roshan: University of British Columbia, Vancouver, British Columbia, Canada
- Natalie Wagner: Office of Professional Development & Educational Scholarship, Queen's University, Kingston, Ontario, Canada
- Anita Acai: Department of Psychology, Neuroscience & Behavior; Department of Psychiatry and Behavioural Neurosciences; and Office of Education Science, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Heather Emmerton-Coughlin: Department of Surgery, University of British Columbia, Vancouver, British Columbia, Canada; Department of Surgery, Royal Jubilee Hospital, Victoria, British Columbia, Canada
- Ranil R Sonnadara: Office of Education Science, Department of Surgery, McMaster University, Hamilton, Ontario, Canada; Department of Surgery, University of Toronto, Toronto, Ontario, Canada
- Tracy M Scott: Department of Surgery, University of British Columbia, Vancouver, British Columbia, Canada; Department of Surgery, St. Paul's Hospital, Vancouver, British Columbia, Canada
- Ahmer A Karimuddin: Department of Surgery, University of British Columbia, Vancouver, British Columbia, Canada; Department of Surgery, St. Paul's Hospital, Vancouver, British Columbia, Canada
18. Ginsburg S, Watling CJ, Schumacher DJ, Gingerich A, Hatala R. Numbers Encapsulate, Words Elaborate: Toward the Best Use of Comments for Assessment and Feedback on Entrustment Ratings. Academic Medicine 2021; 96:S81-S86. PMID: 34183607; DOI: 10.1097/acm.0000000000004089.
Abstract
The adoption of entrustment ratings in medical education is based on a seemingly simple premise: to align workplace-based supervision with resident assessment. Yet it has been difficult to operationalize this concept. Entrustment rating forms combine numeric scales with comments and are embedded in a programmatic assessment framework, which encourages the collection of a large quantity of data. The implicit assumption that more is better has led to an untamable volume of data that competency committees must grapple with. In this article, the authors explore the roles of numbers and words on entrustment rating forms, focusing on the intended and optimal use(s) of each, with a focus on the words. They also unpack the problematic issue of dual-purposing words for both assessment and feedback. Words have enormous potential to elaborate, to contextualize, and to instruct; to realize this potential, educators must be crystal clear about their use. The authors set forth a number of possible ways to reconcile these tensions by more explicitly aligning words to purpose. For example, educators could focus written comments solely on assessment; create assessment encounters distinct from feedback encounters; or use different words collected from the same encounter to serve distinct feedback and assessment purposes. Finally, the authors address the tyranny of documentation created by programmatic assessment and urge caution in yielding to the temptation to reduce words to numbers to make them manageable. Instead, they encourage educators to preserve some educational encounters purely for feedback, and to consider that not all words need to become data.
Affiliation(s)
- Shiphra Ginsburg: Professor of medicine, Department of Medicine, Sinai Health System and Faculty of Medicine, University of Toronto; scientist, Wilson Centre for Research in Education, University of Toronto, Toronto, Ontario, Canada; Canada Research Chair in Health Professions Education. ORCID: http://orcid.org/0000-0002-4595-6650
- Christopher J Watling: Professor and director, Centre for Education Research and Innovation, Schulich School of Medicine & Dentistry, Western University, London, Ontario, Canada. ORCID: https://orcid.org/0000-0001-9686-795X
- Daniel J Schumacher: Associate professor of pediatrics, Cincinnati Children's Hospital Medical Center and University of Cincinnati College of Medicine, Cincinnati, Ohio. ORCID: https://orcid.org/0000-0001-5507-8452
- Andrea Gingerich: Assistant professor, Northern Medical Program, University of Northern British Columbia, Prince George, British Columbia, Canada. ORCID: https://orcid.org/0000-0001-5765-3975
- Rose Hatala: Professor, Department of Medicine, and director, Clinical Educator Fellowship, Centre for Health Education Scholarship, University of British Columbia, Vancouver, British Columbia, Canada. ORCID: https://orcid.org/0000-0003-0521-2590
19. Roy M, Wojcik J, Bartman I, Smee S. Augmenting physician examiner scoring in objective structured clinical examinations: including the standardized patient perspective. Advances in Health Sciences Education 2021; 26:313-328. PMID: 32816242; DOI: 10.1007/s10459-020-09987-6.
Abstract
In Canada, high stakes objective structured clinical examinations (OSCEs) administered by the Medical Council of Canada have relied exclusively on physician examiners (PEs) for scoring. Prior research has looked at using standardized patients (SPs) to replace PEs. This paper reports on two studies that implement and evaluate an SP scoring tool to augment PE scoring. The unique aspect of this study is that it explores the benefits of combining SP and PE scores. SP focus groups developed rating scales for four dimensions they labelled: Listening, Communication, Empathy/Rapport, and Global Impression. In Study I, 43 SPs from one site of a national PE-scored OSCE rated 60 examinees with the initial SP rating scales. In Study II, 137 SPs used slightly revised rating scales with optional narrative comments to score 275 examinees at two sites. Examinees were blinded to SP scoring, and SP ratings did not count toward results. Separate PE and SP scoring was examined using descriptive statistics and correlations. Combinations of SP and PE scoring were assessed using pass rates, reliability, and decision consistency and accuracy indices. In Study II, SP and PE comments were examined. SPs showed greater variability in their scoring, and rated examinees lower than PEs on common elements, resulting in slightly lower pass rates when combined. There was a moderate tendency for both SPs and PEs to make negative comments for the same examinee but for different reasons. We argue that SPs and PEs assess performance from different perspectives, and that combining scores from both augments the overall reliability of scores and pass/fail decisions. There is potential to provide examinees with feedback comments from each group.
Affiliation(s)
- Marguerite Roy, Josée Wojcik, Ilona Bartman, Sydney Smee: Medical Council of Canada, 1021 Thomas Spratt Place, Ottawa, ON, K1G 5L5, Canada
20. Ginsburg S, Gingerich A, Kogan JR, Watling CJ, Eva KW. Idiosyncrasy in Assessment Comments: Do Faculty Have Distinct Writing Styles When Completing In-Training Evaluation Reports? Academic Medicine 2020; 95:S81-S88. PMID: 32769454; DOI: 10.1097/acm.0000000000003643.
Abstract
PURPOSE Written comments are gaining traction as robust sources of assessment data. Compared with the structure of numeric scales, what faculty choose to write is ad hoc, leading to idiosyncratic differences in what is recorded. This study offers an exploration of which aspects of writing style are determined by the faculty offering comment and which are determined by the trainee being commented upon. METHOD The authors compiled in-training evaluation report comment data, generated from 2012 to 2015 by 4 large North American internal medicine training programs. The Linguistic Inquiry and Word Count (LIWC) program was used to categorize and quantify the language contained. Generalizability theory was used to determine whether faculty could be reliably discriminated from one another based on writing style. Correlations and ANOVAs were used to determine what styles were related to faculty or trainee demographics. RESULTS Datasets contained 23-142 faculty who provided 549-2,666 assessments on 161-989 trainees. Faculty could easily be discriminated from one another using a variety of LIWC metrics, including word count, words per sentence, and the use of "clout" words. These patterns appeared person specific and did not reflect demographic factors such as gender or rank. The metrics were similarly not consistently associated with trainee factors such as postgraduate year or gender. CONCLUSIONS Faculty seem to have detectable writing styles that are relatively stable across the trainees they assess, which may represent an under-recognized source of construct irrelevance. If written comments are to meaningfully contribute to decision making, we need to understand and account for idiosyncratic writing styles.
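The paper uses generalizability theory to show that faculty can be told apart by style; as a simpler stand-in, the sketch below derives word count and words-per-sentence metrics for toy comments and asks, via one-way ANOVA, whether word counts differ across the (invented) faculty who wrote them.

import numpy as np
from scipy.stats import f_oneway

def style_metrics(comment):
    sentences = [s for s in comment.split(".") if s.strip()]
    words = comment.split()
    return len(words), len(words) / max(len(sentences), 1)  # word count, words/sentence

# Toy comments from three faculty with habitual writing styles
faculty_comments = {
    "A": ["Good job. Keep reading.", "Fine work."],
    "B": ["An exceptionally thorough assessment that integrated the full history.",
          "A remarkably complete evaluation showing sophisticated reasoning throughout."],
    "C": ["Did well overall but should focus on oral presentations going forward.",
          "Worked hard this block and responded to feedback with clear improvement."],
}

word_counts = [[style_metrics(c)[0] for c in cs] for cs in faculty_comments.values()]
stat, p = f_oneway(*word_counts)  # do word counts differ by faculty author?
print(f"F = {stat:.2f}, p = {p:.3f}")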
Affiliation(s)
- Shiphra Ginsburg: Professor of medicine, Department of Medicine, Faculty of Medicine, University of Toronto; scientist, Wilson Centre for Research in Education, University Health Network, University of Toronto, Toronto, Ontario, Canada; Canada Research Chair in Health Professions Education. ORCID: http://orcid.org/0000-0002-4595-6650
- Andrea Gingerich: Assistant professor, Northern Medical Program, University of Northern British Columbia, Prince George, British Columbia, Canada. ORCID: https://orcid.org/0000-0001-5765-3975
- Jennifer R Kogan: Professor and associate dean for student success and professional development, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania. ORCID: https://orcid.org/0000-0001-8426-9506
- Christopher J Watling: Professor and director, Centre for Education Research and Innovation, Schulich School of Medicine & Dentistry, Western University, London, Ontario, Canada. ORCID: https://orcid.org/0000-0001-9686-795X
- Kevin W Eva: Professor and director of education research and scholarship, Department of Medicine, and associate director and senior scientist, Centre for Health Education Scholarship, University of British Columbia, Vancouver, British Columbia, Canada. ORCID: http://orcid.org/0000-0002-8672-2500