1. Tanaka P, Park YS, Chen CY, Yumul R, Macario A. Domains Influencing Faculty Decisions on the Level of Supervision Required for Anesthesiology EPAs with Analysis of Feedback Comments. Journal of Surgical Education 2024; 81:741-752. PMID: 38553368; DOI: 10.1016/j.jsurg.2024.02.003.
Abstract
OBJECTIVE The purpose of this qualitative study was to examine responses related to entrustment and feedback comments from an assessment tool. DESIGN Qualitative analyses using semi-structured interviews and analysis of narrative comments. SETTING Main hospital OR suite at a large academic medical center. PARTICIPANTS Faculty and residents who work in the OR suite. RESULTS Seven of the 14 theoretical domains from the Theoretical Domains Framework were identified as influencing faculty decisions on entrustment: knowledge, skills, intention, memory/attention/decision processes, environmental context and resources, beliefs about capabilities, and reinforcement. The majority (651/1,116; 58.4%) of faculty comments were critical/modest praise and relevant, consistent across all 6 EPAs. The feedback comments written in for all 1,116 Web App EPA assessments yielded a total of 1,599 sub-competency-specific responses. These responses were mapped to the core competencies and to 13 of the 23 ACGME subcompetencies at least once. CONCLUSIONS The domains identified as influencing faculty decisions on entrustment were knowledge, skills, intention, memory/attention/decision processes, environmental context and resources, beliefs about capabilities, and reinforcement. Most narrative feedback comments were critical/modest praise and relevant, consistent across each of the EPAs.
Affiliation(s)
- Pedro Tanaka: Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, California
- Yoon Soo Park: Associate Professor, Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois
- Chien-Yu Chen: Department of Anesthesiology, Taipei Medical University Hospital, Taipei, Taiwan; Department of Humanities in Medicine, School of Medicine, College of Medicine, Taipei
- Roya Yumul: Professor, Cedars-Sinai Medical Center, Los Angeles, California
- Alex Macario: Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, California
2. Van Ostaeyen S, De Langhe L, De Clercq O, Embo M, Schellens T, Valcke M. Automating the Identification of Feedback Quality Criteria and the CanMEDS Roles in Written Feedback Comments Using Natural Language Processing. Perspectives on Medical Education 2023; 12:540-549. PMID: 38144670; PMCID: PMC10742245; DOI: 10.5334/pme.1056.
Abstract
Introduction Manually analysing the quality of large amounts of written feedback comments is time-consuming and demands extensive resources and human effort. Therefore, this study aimed to explore whether a state-of-the-art large language model (LLM) could be fine-tuned to identify the presence of four literature-derived feedback quality criteria (performance, judgment, elaboration and improvement) and the seven CanMEDS roles (Medical Expert, Communicator, Collaborator, Leader, Health Advocate, Scholar and Professional) in written feedback comments. Methods A set of 2,349 labelled feedback comments from five healthcare educational programs in Flanders (Belgium) (specialistic medicine, general practice, midwifery, speech therapy and occupational therapy) was split into 12,452 sentences to create two datasets for the machine learning analysis. The Dutch BERT models BERTje and RobBERT were used to train four multiclass-multilabel classification models: two to identify the four feedback quality criteria and two to identify the seven CanMEDS roles. Results The classification models trained with BERTje and RobBERT to predict the presence of the four feedback quality criteria attained macro average F1-scores of 0.73 and 0.76, respectively. The models predicting the presence of the CanMEDS roles attained F1-scores of 0.71 with BERTje and 0.72 with RobBERT. Discussion The results showed that a state-of-the-art LLM is able to identify the presence of the four feedback quality criteria and the CanMEDS roles in written feedback comments. This implies that the quality analysis of written feedback comments can be automated using an LLM, saving time and resources.
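As a concrete illustration of the fine-tuning setup this abstract describes, the sketch below configures a Dutch BERT checkpoint for multilabel sentence classification over the four feedback quality criteria. It is a minimal sketch rather than the authors' code: the checkpoint name, the example sentence, and the label assignment are assumptions, and a real run would fine-tune on the full labelled dataset before computing macro average F1-scores.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# The four literature-derived feedback quality criteria from the study
LABELS = ["performance", "judgment", "elaboration", "improvement"]

# Assumed public checkpoint for BERTje; RobBERT ("pdelobelle/robbert-v2-dutch-base")
# would be swapped in the same way
CHECKPOINT = "GroNLP/bert-base-dutch-cased"

tok = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT,
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss
)

sentences = ["Je stelde een duidelijk behandelplan op."]  # hypothetical feedback sentence
targets = torch.tensor([[1.0, 0.0, 0.0, 1.0]])            # multi-hot criterion labels

batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
out = model(**batch, labels=targets)             # loss for one training step
preds = (torch.sigmoid(out.logits) > 0.5).int()  # thresholded multilabel prediction
print(out.loss.item(), preds.tolist())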
Affiliation(s)
- Loic De Langhe: Language and Translation Technology Team at Ghent University, Belgium
- Orphée De Clercq: Language and Translation Technology Team at Ghent University, Belgium
- Mieke Embo: Department of Educational Sciences at Ghent University and the Expertise Network Health and Care at the Artevelde University of Applied Sciences, Belgium
- Tammy Schellens: Department of Educational Sciences at Ghent University, Belgium
- Martin Valcke: Department of Educational Sciences at Ghent University, Belgium
3. McGuire N, Acai A, Sonnadara RR. The McMaster Narrative Comment Rating Tool: Development and Initial Validity Evidence. Teaching and Learning in Medicine 2023:1-13. PMID: 37964518; DOI: 10.1080/10401334.2023.2276799.
Abstract
CONSTRUCT The McMaster Narrative Comment Rating Tool aims to capture critical features reflecting the quality of written narrative comments provided in the medical education context: valence/tone of language, degree of correction versus reinforcement, specificity, actionability, and overall usefulness. BACKGROUND Despite their role in competency-based medical education, not all narrative comments contribute meaningfully to the development of learners' competence. To develop solutions to mitigate this problem, robust measures of narrative comment quality are needed. While some tools exist, most were created in specialty-specific contexts, have focused on one or two features of feedback, or have focused on faculty perceptions of feedback, excluding learners from the validation process. In this study, we aimed to develop a detailed, broadly applicable narrative comment quality assessment tool that drew upon features of high-quality assessment and feedback and could be used by a variety of raters to inform future research, including applications related to automated analysis of narrative comment quality. APPROACH In Phase 1, we used the literature to identify five critical features of feedback. We then developed rating scales for each of the features and collected 670 competency-based assessments completed by first-year surgical residents in the first six weeks of training. Residents were from nine different programs at a Canadian institution. In Phase 2, we randomly selected 50 assessments with written feedback from the dataset. Two education researchers used the scale to independently score the written comments and refine the rating tool. In Phase 3, 10 raters, including two medical education researchers, two medical students, two residents, two clinical faculty members, and two laypersons from the community, used the tool to independently and blindly rate written comments from another 50 randomly selected assessments from the dataset. We compared scores between and across rater pairs to assess reliability. FINDINGS Single- and average-measures intraclass correlation (ICC) scores ranged from moderate to excellent (ICCs = .51-.83 and .91-.98) across all categories and rater pairs. All tool domains were significantly correlated (p's < .05), apart from valence, which was only significantly correlated with degree of correction versus reinforcement. CONCLUSION Our findings suggest that the McMaster Narrative Comment Rating Tool can reliably be used by multiple raters, across a variety of rater types, and in different surgical contexts. As such, it has the potential to support faculty development initiatives on assessment and feedback, and may be used as a tool to conduct research on different assessment strategies, including automated analysis of narrative comments.
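Because the tool's reliability evidence rests on intraclass correlations, here is a minimal sketch (with invented ratings) of the single- and average-measures ICCs reported above, computed from the standard Shrout and Fleiss two-way random effects, absolute agreement formulas, ICC(2,1) and ICC(2,k).

import numpy as np

# Toy ratings: n comments (rows) scored by k raters (columns)
Y = np.array([[3, 4, 3], [5, 5, 4], [2, 2, 2], [4, 5, 5], [1, 2, 1]], dtype=float)
n, k = Y.shape
grand = Y.mean()
MSR = k * ((Y.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # between-comment mean square
MSC = n * ((Y.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # between-rater mean square
SSE = ((Y - Y.mean(axis=1, keepdims=True) - Y.mean(axis=0, keepdims=True) + grand) ** 2).sum()
MSE = SSE / ((n - 1) * (k - 1))                             # residual mean square

icc_single = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)  # ICC(2,1)
icc_average = (MSR - MSE) / (MSR + (MSC - MSE) / n)                     # ICC(2,k)
print(round(icc_single, 2), round(icc_average, 2))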
Affiliation(s)
- Natalie McGuire: Office of Professional Development and Educational Scholarship, Queen's University, Kingston, Ontario, Canada
- Anita Acai: Department of Psychiatry and Behavioural Neurosciences and McMaster Education Research, Innovation and Theory (MERIT) Program, McMaster University, and St. Joseph's Education Research Centre (SERC), St. Joseph's Healthcare Hamilton, Hamilton, Canada
- Ranil R Sonnadara: Office of Education Science, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
4. Mooney CJ, Stone RT, Wang L, Blatt AE, Pascoe JM, Lang VJ. Examining Generalizability of Faculty Members' Narrative Assessments. Academic Medicine 2023; 98:S210. PMID: 37983456; DOI: 10.1097/acm.0000000000005417.
Affiliation(s)
- C.J. Mooney, R.T. Stone, L. Wang, A.E. Blatt, J.M. Pascoe, V.J. Lang: University of Rochester School of Medicine and Dentistry
5. Quinn JK, Mongelluzzo J, Addo N, Nip A, Graterol J, Chen EH. The Standardized Letter of Evaluation: How We Perceive the Quiet Student. Western Journal of Emergency Medicine 2023; 24:259-263. PMID: 36976603; PMCID: PMC10047751; DOI: 10.5811/westjem.2022.12.56137.
Abstract
INTRODUCTION The Standardized Letter of Evaluation (SLOE) is an emergency medicine (EM)-specific assessment designed to help EM residency programs differentiate applicants. We became interested in SLOE narrative language referencing personality when we observed less enthusiasm for applicants described as "quiet" in their SLOEs. In this study our objective was to compare how quiet-labeled, EM-bound applicants were ranked compared to their non-quiet peers in the global assessment (GA) and anticipated rank list (ARL) categories in the SLOE. METHODS We conducted a planned subgroup analysis of a retrospective cohort study of all core EM clerkship SLOEs submitted to one four-year academic EM residency program in the 2016-2017 recruitment cycle. We compared SLOEs of applicants who were described as "quiet," "shy," and/or "reserved" - collectively referred to as "quiet" - to SLOEs from all other applicants, referred to as "non-quiet." We compared frequencies of quiet to non-quiet students in GA and ARL categories using chi-square goodness-of-fit tests with a rejection criterion (alpha) of 0.05. RESULTS We reviewed 1,582 SLOEs from 696 applicants. Of these, 120 SLOEs described quiet applicants. The distributions of quiet and non-quiet applicants across GA and ARL categories were significantly different (P < 0.001). Quiet applicants were less likely than non-quiet applicants to be ranked in the top 10% and top one-third GA categories combined (31% vs 60%) and more likely to be in the middle one-third category (58% vs 32%). For ARL, quiet applicants were also less likely to be ranked in the top 10% and top one-third categories combined (33% vs 58%) and more likely to be in the middle one-third category (50% vs 31%). CONCLUSION Emergency medicine-bound students described as quiet in their SLOEs were less likely to be ranked in the top GA and ARL categories compared to non-quiet students. More research is needed to determine the cause of these ranking disparities and address potential biases in teaching and assessment practices.
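The category comparison described in the methods is a chi-square goodness-of-fit test; the sketch below runs one on invented counts shaped like the reported pattern (quiet applicants under-represented in the top categories). Neither the observed counts nor the expected proportions are the study's data.

import numpy as np
from scipy.stats import chisquare

# Invented counts of 120 "quiet" applicants across the four GA categories:
# top 10%, top one-third, middle one-third, lower one-third
quiet_obs = np.array([7, 30, 70, 13])

# Expected counts if quiet applicants followed the non-quiet distribution
nonquiet_prop = np.array([0.15, 0.45, 0.32, 0.08])
expected = nonquiet_prop * quiet_obs.sum()

stat, p = chisquare(f_obs=quiet_obs, f_exp=expected)
print(f"chi2 = {stat:.1f}, p = {p:.4f}")  # reject H0 if p < alpha = 0.05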
Affiliation(s)
- John K Quinn, Jillian Mongelluzzo, Newton Addo, Alyssa Nip, Joseph Graterol, Esther H Chen: Department of Emergency Medicine, University of California, San Francisco, San Francisco, California
6. Maimone C, Dolan BM, Green MM, Sanguino SM, Garcia PM, O’Brien CL. Utilizing Natural Language Processing of Narrative Feedback to Develop a Predictive Model of Pre-Clerkship Performance: Lessons Learned. Perspectives on Medical Education 2023; 12:141-148. PMID: 37151853; PMCID: PMC10162355; DOI: 10.5334/pme.40.
Abstract
Background Natural language processing is a promising technique that can be used to create efficiencies in the review of narrative feedback to learners. The Feinberg School of Medicine has implemented formal review of pre-clerkship narrative feedback since 2014 through its portfolio assessment system, but this process requires considerable time and effort. This article describes how natural language processing was used to build a predictive model of pre-clerkship student performance that can be utilized to assist competency committee reviews. Approach The authors took an iterative and inductive approach to the analysis, which allowed them to identify characteristics of narrative feedback that are both predictive of performance and useful to faculty reviewers. Words and phrases were manually grouped into topics that represented concepts illustrating student performance. Topics were reviewed by experienced reviewers, tested for consistency across time, and checked to ensure they did not demonstrate bias. Outcomes Sixteen topic groups of words and phrases were found to be predictive of performance. The best-fitting model used a combination of topic groups, word counts, and categorical ratings. The model had an AUC value of 0.92 on the training data and 0.88 on the test data. Reflection A thoughtful, careful approach to using natural language processing was essential. Given the idiosyncrasies of narrative feedback in medical education, standard natural language processing packages were not adequate for predicting student outcomes. Rather, employing qualitative techniques including repeated member checking and iterative revision resulted in a useful and salient predictive model.
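To make the pipeline concrete, the sketch below mirrors its shape with invented data: hand-curated topic groups of words and phrases are counted in each comment, combined with a word count, and fed to a classifier scored by AUC. The topic groups, comments, and labels are illustrative assumptions; the authors' actual model used sixteen topic groups plus categorical ratings.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical topic groups: words/phrases manually grouped into performance concepts
TOPICS = {
    "strong_knowledge": ["strong fund of knowledge", "excellent differential"],
    "needs_support": ["needs improvement", "struggled", "below expectations"],
}

def features(comment):
    text = comment.lower()
    counts = [sum(text.count(p) for p in phrases) for phrases in TOPICS.values()]
    return counts + [len(text.split())]  # topic counts plus overall word count

comments = [
    "Strong fund of knowledge; excellent differential diagnoses.",
    "Struggled with time management and needs improvement.",
    "Reliable, but below expectations on exams; struggled early.",
    "Excellent differential and strong fund of knowledge overall.",
]
y = np.array([1, 0, 0, 1])  # 1 = meeting pre-clerkship expectations

X = np.array([features(c) for c in comments])
clf = LogisticRegression().fit(X, y)
print(roc_auc_score(y, clf.predict_proba(X)[:, 1]))  # in-sample AUC on toy data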
Affiliation(s)
- Christina Maimone: Associate director of research data services, Northwestern IT Research Computing Services, Northwestern University, Evanston, Illinois, USA
- Brigid M. Dolan: Associate professor of medicine and medical education and director of assessment, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Marianne M. Green: Raymond H. Curry, MD Professor of Medical Education, professor of medicine, and vice dean for education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Sandra M. Sanguino: Associate professor of pediatrics and senior associate dean of medical education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Patricia M. Garcia: Professor of obstetrics and gynecology and medical education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Celia Laird O’Brien: Assistant professor of medical education and assistant dean of program evaluation and accreditation, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
7. Zavodnick J, Doroshow J, Rosenberg S, Banks J, Leiby BE, Mingioni N. Hawks and Doves: Perceptions and Reality of Faculty Evaluations. Journal of Medical Education and Curricular Development 2023; 10:23821205231197079. PMID: 37692558; PMCID: PMC10492463; DOI: 10.1177/23821205231197079.
Abstract
OBJECTIVES Internal medicine clerkship grades are important for residency selection, but inconsistencies between evaluator ratings threaten both their ability to accurately represent student performance and their perceived fairness. Clerkship grading committees are recommended as best practice, but the mechanisms by which they promote accuracy and fairness are not certain. The ability of a committee to reliably assess and account for grading stringency of individual evaluators has not been previously studied. METHODS This is a retrospective analysis of evaluations completed by faculty considered to be stringent, lenient, or neutral graders by members of a grading committee of a single medical college. Faculty evaluations were assessed for differences in ratings on individual skills and recommendations for final grade between perceived stringency categories. Logistic regression was used to determine if actual assigned ratings varied based on perceived faculty grading stringency category. RESULTS "Easy graders" consistently had the highest probability of awarding an above-average rating, and "hard graders" consistently had the lowest probability of awarding an above-average rating, though this finding reached statistical significance for only 2 of 8 questions on the evaluation form (P = .033 and P = .001). Odds ratios of assigning a higher final suggested grade followed the expected pattern (higher for "easy" and "neutral" compared to "hard," higher for "easy" compared to "neutral") but did not reach statistical significance. CONCLUSIONS Perceived differences in faculty grading stringency have a basis in reality for clerkship evaluation elements. However, final grades recommended by faculty perceived as "stringent" or "lenient" did not differ. Perceptions of "hawks" and "doves" are not just lore, but they may not have implications for students' final grades. Continued research to describe the "hawk and dove effect" will be crucial to enable assessment of local grading variation and empower local educational leadership to correct, but not overcorrect, for this effect to maintain fairness in student evaluations.
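The odds-ratio analysis can be sketched as a logistic regression on simulated evaluations, with perceived stringency as a categorical predictor and "hard" graders as the reference level. Only the structure mirrors the study; the data below are simulated.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
cat = rng.choice(["easy", "neutral", "hard"], size=300)
p_above = {"easy": 0.7, "neutral": 0.5, "hard": 0.3}  # simulated propensities
df = pd.DataFrame({
    "stringency": cat,
    "above_avg": [int(rng.random() < p_above[c]) for c in cat],
})

# Logistic regression with "hard" graders as the reference category
fit = smf.logit("above_avg ~ C(stringency, Treatment('hard'))", data=df).fit(disp=0)
print(np.exp(fit.params))  # odds ratios: easy vs hard, neutral vs hard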
Affiliation(s)
- Jillian Zavodnick: Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, USA
- Sarah Rosenberg: Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, USA
- Joshua Banks: Department of Pharmacology and Experimental Therapeutics, Division of Biostatistics, Thomas Jefferson University, Philadelphia, USA
- Benjamin E Leiby: Department of Pharmacology and Experimental Therapeutics, Division of Biostatistics, Thomas Jefferson University, Philadelphia, USA
- Nina Mingioni: Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, USA
8. Mooney CJ, Pascoe JM, Blatt AE, Lang VJ, Kelly MS, Braun MK, Burch JE, Stone RT. Predictors of faculty narrative evaluation quality in medical school clerkships. Medical Education 2022; 56:1223-1231. PMID: 35950329; DOI: 10.1111/medu.14911.
Abstract
INTRODUCTION Narrative approaches to assessment provide meaningful and valid representations of trainee performance. Yet, narratives are frequently perceived as vague, nonspecific and low quality. To date, there is little research examining factors associated with narrative evaluation quality, particularly in undergraduate medical education. The purpose of this study was to examine associations of faculty- and student-level characteristics with the quality of faculty members' narrative evaluations of clerkship students. METHODS The authors reviewed faculty narrative evaluations of 50 students' clinical performance in their inpatient medicine and neurology clerkships, resulting in 165 and 87 unique evaluations in the respective clerkships. The authors evaluated narrative quality using the Narrative Evaluation Quality Instrument (NEQI). The authors used linear mixed effects modelling to predict total NEQI score. Explanatory covariates included the following: time to evaluation completion, number of weeks spent with student, faculty total weeks on service per year, total faculty years in clinical education, student gender, faculty gender, and an interaction term between student and faculty gender. RESULTS Significantly higher narrative evaluation quality was associated with a shorter time to evaluation completion, with NEQI scores decreasing by approximately 0.3 points every 10 days following students' rotations (p = .004). Additionally, women faculty wrote significantly higher-quality narrative evaluations, with NEQI scores 1.92 points greater than those of men faculty (p = .012). All other covariates were not significant. CONCLUSIONS The quality of faculty members' narrative evaluations of medical students was associated with time to evaluation completion and faculty gender, but not with faculty experience in clinical education, faculty weeks on service, or the amount of time spent with students. Findings advance understanding of ways to improve the quality of narrative evaluations, which is imperative given assessment models that will increase the volume of, and reliance on, narratives.
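A linear mixed effects model of the kind described can be sketched as follows: NEQI-like quality scores regressed on days to evaluation completion and faculty gender, with a random intercept per faculty member. All data are simulated, with effects shaped like the reported ones (roughly -0.3 points per 10 days and +1.92 points for women faculty).

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_fac, per_fac = 30, 8
fac = np.repeat(np.arange(n_fac), per_fac)          # evaluations nested in faculty
days = rng.integers(0, 60, size=fac.size)           # days to evaluation completion
woman = rng.integers(0, 2, size=n_fac)[fac]         # faculty gender indicator
fac_re = rng.normal(0, 1.5, size=n_fac)[fac]        # random intercept per faculty
neqi = 12 - 0.03 * days + 1.9 * woman + fac_re + rng.normal(0, 2, size=fac.size)

df = pd.DataFrame({"neqi": neqi, "days": days, "woman_faculty": woman, "faculty": fac})
fit = smf.mixedlm("neqi ~ days + woman_faculty", df, groups=df["faculty"]).fit()
print(fit.summary())  # fixed effects should recover about -0.03 and +1.9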
Affiliation(s)
- Christopher J Mooney, Jennifer M Pascoe, Amy E Blatt, Valerie J Lang, Melanie K Braun, Jaclyn E Burch: School of Medicine and Dentistry, University of Rochester, Rochester, New York, USA
9. Branfield Day L, Rassos J, Billick M, Ginsburg S. 'Next steps are…': An exploration of coaching and feedback language in EPA assessment comments. Medical Teacher 2022; 44:1368-1375. PMID: 35944554; DOI: 10.1080/0142159x.2022.2098098.
Abstract
PURPOSE Entrustable Professional Activities (EPA) assessments are intended to facilitate meaningful, low-stakes coaching and feedback, partly through the provision of written comments. We sought to explore EPA assessment comments provided to internal medicine (IM) residents for evidence of feedback and coaching language as well as politeness. METHODS We collected all written comments from EPA assessments of communication from a first-year IM resident cohort at the University of Toronto. Sensitized by politeness theory, we analyzed data using principles of constructivist grounded theory. RESULTS Nearly all EPA assessments (94%) contained written feedback based on focused clinical encounters. The majority of comments demonstrated coaching language, including phrases like 'don't forget to' and 'next steps are,' followed by specific suggestions for improvement. A variety of words, including 'autonomy' and 'independence,' denoted entrustment decisions. Linguistic politeness strategies such as hedging were pervasive, seemingly to minimize harm to the supervisor-trainee relationship. CONCLUSION Evidence of written coaching feedback suggests that EPA assessment comments are being used as intended as a means of formative feedback to promote learning. Yet, the frequent use of polite language suggests that EPAs may be higher-stakes than expected, highlighting a need for changes to the assessment culture and improved feedback literacy.
Affiliation(s)
- Leora Branfield Day: Department of Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, Canada
- James Rassos: Department of Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, Canada
- Maxime Billick: Department of Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, Canada
- Shiphra Ginsburg: Department of Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, Canada; Wilson Centre for Research in Education, Toronto, Canada
10. Woods R, Singh S, Thoma B, Patocka C, Cheung W, Monteiro S, Chan TM. Validity evidence for the Quality of Assessment for Learning score: a quality metric for supervisor comments in Competency Based Medical Education. Canadian Medical Education Journal 2022; 13:19-35. PMID: 36440075; PMCID: PMC9684040; DOI: 10.36834/cmej.74860.
Abstract
BACKGROUND Competency based medical education (CBME) relies on supervisor narrative comments contained within entrustable professional activities (EPAs) for programmatic assessment, but the quality of these supervisor comments is unassessed. There is validity evidence supporting the QuAL (Quality of Assessment for Learning) score for rating the usefulness of short narrative comments in direct observation. OBJECTIVE We sought to establish validity evidence for the QuAL score to rate the quality of supervisor narrative comments contained within an EPA by surveying the key end-users of EPA narrative comments: residents, academic advisors, and competence committee members. METHODS In 2020, the authors randomly selected 52 de-identified narrative comments from two emergency medicine EPA databases using purposeful sampling. Six collaborators (two residents, two academic advisors, and two competence committee members) were recruited from each of four EM residency programs (Saskatchewan, McMaster, Ottawa, and Calgary) to rate these comments with a utility score and the QuAL score. Correlations between the utility and QuAL scores were calculated using Pearson's correlation coefficient. Sources of variance and reliability were calculated using a generalizability study. RESULTS All collaborators (n = 24) completed the full study. The QuAL score had a high positive correlation with the utility score amongst the residents (r = 0.80) and academic advisors (r = 0.75) and a moderately high correlation amongst competence committee members (r = 0.68). The generalizability study found that the major source of variance was the comment itself, indicating that the tool performs well across raters. CONCLUSION The QuAL score may serve as an outcome measure for program evaluation of supervisors, and as a resource for faculty development.
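The core statistic here is a Pearson correlation between paired ratings of the same comments; a minimal sketch with ten invented QuAL/utility pairs follows.

import numpy as np
from scipy.stats import pearsonr

# Invented paired ratings of the same comments: a 0-5 QuAL score and a utility score
qual = np.array([5, 4, 2, 1, 3, 5, 0, 2, 4, 3])
utility = np.array([5, 5, 2, 1, 4, 4, 1, 2, 5, 3])

r, p = pearsonr(qual, utility)
print(f"r = {r:.2f}, p = {p:.4f}")  # r around 0.7-0.8 would match the reported range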
Affiliation(s)
- Rob Woods: Department of Emergency Medicine, University of Saskatchewan, Saskatchewan, Canada
- Sim Singh: College of Medicine, University of Saskatchewan, Saskatchewan, Canada
- Brent Thoma: Department of Emergency Medicine, University of Saskatchewan, Saskatchewan, Canada
- Catherine Patocka: Department of Emergency Medicine, University of Calgary, Alberta, Canada
- Warren Cheung: Department of Emergency Medicine, University of Ottawa, Ontario, Canada
- Sandra Monteiro: Department of Health Research Methods, Evidence and Impact, McMaster University, Ontario, Canada
- Teresa M Chan: Division of Emergency Medicine and Education & Innovation, Department of Medicine, McMaster University, Ontario, Canada
11. Mooney CJ, Blatt A, Pascoe J, Lang V, Kelly M, Braun M, Burch J, Stone RT. Predictors of Narrative Evaluation Quality in Undergraduate Medical Education Clerkships. Academic Medicine 2022; 97:S168. PMID: 37838897; DOI: 10.1097/acm.0000000000004809.
Affiliation(s)
- C.J. Mooney, A. Blatt, J. Pascoe, V. Lang, M. Braun, J. Burch, R.T. Stone: University of Rochester School of Medicine and Dentistry; M. Kelly: Massachusetts General Hospital
12. Roy M, Kain N, Touchie C. Exploring Content Relationships Among Components of a Multisource Feedback Program. The Journal of Continuing Education in the Health Professions 2022; 42:243-248. PMID: 34609355; DOI: 10.1097/ceh.0000000000000398.
Abstract
INTRODUCTION A new multisource feedback (MSF) program was specifically designed to support physician quality improvement (QI) around the CanMEDS roles of Collaborator, Communicator, and Professional. Quantitative ratings and qualitative comments are collected from a sample of physician colleagues, co-workers (C), and patients (PT). These data are supplemented with self-ratings and given back to physicians in individualized reports. Each physician reviews the report with a trained feedback facilitator and creates one to three action plans for QI. This study explores how the content of the four aforementioned multisource feedback program components supports the elicitation and translation of feedback into a QI plan for change. METHODS Data included survey items, rater comments, a portion of facilitator reports, and action plan components for 159 physicians. Word frequency queries were used to identify common words and explore relationships among data sources. RESULTS Overlap between high frequency words in surveys and rater comments was substantial. The language used to describe goals in physician action plans was highly related to respondent comments, but less so to survey items. High frequency words in facilitator reports related heavily to action plan content. DISCUSSION All components of the program relate to one another, indicating that each plays a part in the process. Patterns of overlap suggest unique functions conducted by program components. This demonstration of coherence across components of this program is one piece of evidence that supports the program's validity.
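The analysis rests on word frequency queries and overlap across program components; the sketch below illustrates that idea with invented snippets standing in for survey items, rater comments, and action plans.

import re
from collections import Counter

STOP = {"the", "a", "and", "to", "of", "with", "i", "he", "how"}

def top_words(text, k=10):
    # Keep the k most frequent non-stopwords as a set for overlap comparisons
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP]
    return {w for w, _ in Counter(words).most_common(k)}

survey_items = "Communicates clearly with patients and listens to their concerns"
rater_comments = "He communicates clearly and listens well; patients feel heard"
action_plans = "Goal: listen more and communicate management plans clearly to patients"

# Overlap in high-frequency words approximates the content relationships examined
print(top_words(survey_items) & top_words(rater_comments))
print(top_words(rater_comments) & top_words(action_plans))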
Affiliation(s)
- Dr. Roy: Adjunct Professor, Department of Innovation in Medical Education, University of Ottawa, Ottawa, Ontario, Canada
- Dr. Kain: Program Manager, Research & Evaluation Unit, College of Physicians and Surgeons of Alberta, Edmonton, Alberta, Canada
- Dr. Touchie: Professor, Department of Innovation in Medical Education, University of Ottawa; Chief Medical Education Advisor, Medical Council of Canada; and Department of Medicine, University of Ottawa, Ottawa, Ontario, Canada
13. Yilmaz Y, Jurado Nunez A, Ariaeinejad A, Lee M, Sherbino J, Chan TM. Harnessing Natural Language Processing to Support Decisions Around Workplace-Based Assessment: Machine Learning Study of Competency-Based Medical Education. JMIR Medical Education 2022; 8:e30537. PMID: 35622398; PMCID: PMC9187970; DOI: 10.2196/30537.
Abstract
BACKGROUND Residents receive a numeric performance rating (eg, 1-7 scoring scale) along with narrative (ie, qualitative) feedback based on their performance in each workplace-based assessment (WBA). Aggregated qualitative data from WBA can be overwhelming to process and fairly adjudicate as part of a global decision about learner competence. Current approaches with qualitative data require a human rater to maintain attention and appropriately weigh various data inputs within the constraints of working memory before rendering a global judgment of performance. OBJECTIVE This study explores natural language processing (NLP) and machine learning (ML) applications for identifying trainees at risk using a large WBA narrative comment data set associated with numerical ratings. METHODS NLP was performed retrospectively on a complete data set of narrative comments (ie, text-based feedback to residents based on their performance on a task) derived from WBAs completed by faculty members from multiple hospitals associated with a single, large, residency program at McMaster University, Canada. Narrative comments were vectorized to quantitative ratings using the bag-of-n-grams technique with 3 input types: unigrams, bigrams, and trigrams. Supervised ML models using linear regression were trained with the quantitative ratings, performed binary classification, and output a prediction of whether a resident fell into the category of at risk or not at risk. Sensitivity, specificity, and accuracy metrics are reported. RESULTS The database comprised 7199 unique direct observation assessments, containing both narrative comments and a rating between 3 and 7 in an imbalanced distribution (scores 3-5: 726 ratings; scores 6-7: 4871 ratings). A total of 141 unique raters from 5 different hospitals and 45 unique residents participated over the course of 5 academic years. When comparing the 3 different input types for diagnosing if a trainee would be rated low (ie, 1-5) or high (ie, 6 or 7), our accuracy for trigrams was 87%, bigrams 86%, and unigrams 82%. We also found that all 3 input types had better prediction accuracy when using a bimodal cut (eg, lower or higher) compared with predicting performance along the full 7-point rating scale (50%-52%). CONCLUSIONS The ML models can accurately identify underperforming residents via narrative comments provided for WBAs. The words generated in WBAs can be a worthy data set to augment human decisions for educators tasked with processing large volumes of narrative assessments.
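As a compact illustration of the bag-of-n-grams approach, the sketch below vectorizes toy comments with unigrams through trigrams and classifies the bimodal at-risk cut. The comments and ratings are invented, and a logistic classifier stands in for the paper's linear-regression-based models.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

comments = [
    "excellent assessment and clear plan",
    "needs to improve differential and follow through",
    "strong communication, safe disposition",
    "disorganized presentation, missed key findings",
    "thorough history, appropriate management",
    "unsafe plan, required significant redirection",
]
ratings = [7, 4, 6, 3, 6, 3]
at_risk = [int(r <= 5) for r in ratings]  # bimodal cut: ratings 3-5 vs 6-7

vec = CountVectorizer(ngram_range=(1, 3))  # unigrams, bigrams, and trigrams
X = vec.fit_transform(comments)

clf = LogisticRegression().fit(X, at_risk)
tn, fp, fn, tp = confusion_matrix(at_risk, clf.predict(X)).ravel()
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))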
Affiliation(s)
- Yusuf Yilmaz: McMaster Education Research, Innovation, and Theory Program, Faculty of Health Sciences; Program for Faculty Development, Office of Continuing Professional Development; and Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada; Department of Medical Education, Ege University, Izmir, Turkey
- Alma Jurado Nunez: Department of Medicine and Masters in eHealth Program, McMaster University, Hamilton, ON, Canada
- Ali Ariaeinejad: Department of Medicine and Masters in eHealth Program, McMaster University, Hamilton, ON, Canada
- Mark Lee: McMaster Education Research, Innovation, and Theory Program, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Jonathan Sherbino: McMaster Education Research, Innovation, and Theory Program; Division of Emergency Medicine, Department of Medicine; and Division of Education and Innovation, Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Teresa M Chan: McMaster Education Research, Innovation, and Theory Program; Program for Faculty Development, Office of Continuing Professional Development; Division of Emergency Medicine, Department of Medicine; and Division of Education and Innovation, Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
14. Ginsburg S, Stroud L, Lynch M, Melvin L, Kulasegaram K. Beyond the ratings: gender effects in written comments from clinical teaching assessments. Advances in Health Sciences Education 2022; 27:355-374. PMID: 35088152; DOI: 10.1007/s10459-021-10088-1.
Abstract
Assessment of clinical teachers by learners is problematic. Construct-irrelevant factors influence ratings, and women teachers often receive lower ratings than men. However, most studies focus only on numeric scores. Therefore, the authors analyzed written comments on 4032 teacher assessments, representing 282 women and 448 men teachers in one Department of Medicine, to explore for gender differences. NVivo was used to search for 61 evidence- and theory-based terms purported to reflect teaching excellence, which were analyzed using 2 × 2 chi-squared tests. The Linguistic Inquiry and Word Count (LIWC) program was used to categorize comment data, which were analyzed using linear regressions. The only significant difference in NVivo was that men were more likely than women to have the word "available" in a comment (OR 1.4, p < .05). A subset of LIWC variables showed significant gender differences, but all effects were modest. Men teachers had more positive emotion words written about them, while negative emotion words appeared equally. Significant differences were more often associated with the gender of the residents who wrote the comments than with the gender of the teachers being assessed. For example, women residents used more social and gender-related words (β 1.87, p < 0.001) and fewer words related to power or achievement (β -3.78, p < 0.001) than men residents. Profound gender differences were not found in teacher assessment comments in this large, diverse academic department of medicine, which differs from other studies. The authors explore possible reasons, including differences in departmental culture and issues related to the methods used.
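The "available" finding is a 2 × 2 comparison of word presence by teacher gender; the sketch below computes the odds ratio and chi-squared test on a toy contingency table whose counts are invented but chosen to land near the reported OR of 1.4.

import numpy as np
from scipy.stats import chi2_contingency

# Rows: men teachers, women teachers; columns: comment mentions "available" or not
table = np.array([
    [120, 2180],  # men teachers
    [ 60, 1530],  # women teachers
])

chi2, p, dof, _ = chi2_contingency(table)
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p:.3f}")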
Affiliation(s)
- Shiphra Ginsburg: Department of Medicine, Sinai Health System, Temerty Faculty of Medicine, University of Toronto; Wilson Centre for Research in Education, University Health Network and University of Toronto; Canada Research Chair in Health Professions Education; Mount Sinai Hospital, 433-600 University Ave., Toronto, Ontario, M5G 1X5, Canada
- Lynfa Stroud: Wilson Centre for Research in Education, University Health Network and University of Toronto; Department of Medicine, Sunnybrook HSC and Temerty Faculty of Medicine, Toronto, Ontario, Canada
- Meghan Lynch: Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Lindsay Melvin: Department of Medicine, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Kulamakan Kulasegaram: Wilson Centre for Research in Education, University Health Network and University of Toronto; Department of Family and Community Medicine, Temerty Faculty of Medicine; Temerty Chair in Learner Assessment and Program Evaluation, University of Toronto, Toronto, Ontario, Canada
15. The effect of gender dyads on the quality of narrative assessments of general surgery trainees. American Journal of Surgery 2021; 224:179-184. PMID: 34911639; DOI: 10.1016/j.amjsurg.2021.12.001.
Abstract
BACKGROUND Prior studies have shown that gender can influence how learners are assessed and the feedback they receive. We investigated the quality of faculty narrative comments in general surgery trainee evaluation using trainee-assessor gender dyads. METHODS Narrative assessments of surgical trainees at the University of British Columbia were collected and rated using the McMaster Narrative Comment Rating Scale (MNCRS). Variables from the MNCRS were inputted into a generalized linear mixed model to explore the impact of gender dyads on the quality of narrative feedback. RESULTS 2,469 assessments were collected. Women assessors tended to give higher-quality comments (p's < 0.05) than men assessors. Comments from men assessors to women trainees were significantly more positive than comments from men assessors to men trainees (p = 0.02). Men assessors also gave women trainees a higher proportion of reinforcing (versus corrective) comments than they gave men trainees (p < 0.01). CONCLUSIONS There are significant differences in the quality of faculty feedback to trainees by gender dyad. A range of solutions to improve feedback quality and reduce these differences is discussed.
16. Kelleher M, Kinnear B, Sall DR, Weber DE, DeCoursey B, Nelson J, Klein M, Warm EJ, Schumacher DJ. Warnings in early narrative assessment that might predict performance in residency: signal from an internal medicine residency program. Perspectives on Medical Education 2021; 10:334-340. PMID: 34476730; PMCID: PMC8633188; DOI: 10.1007/s40037-021-00681-w.
Abstract
INTRODUCTION Narrative assessment data are valuable in understanding struggles in resident performance. However, it remains unknown which themes in narrative data that occur early in training may indicate a higher likelihood of struggles later in training, allowing programs to intervene sooner. METHODS Using learning analytics, we identified 26 internal medicine residents across three cohorts who were below expected entrustment during training. We compiled all narrative data from the first 6 months of training for these residents as well as for 13 typically performing residents for comparison. Narrative data for all 39 residents were blinded during the initial coding phases of an inductive thematic analysis. RESULTS Many similarities were identified between the two cohorts. Codes that differed between typically performing and lower-entrusted residents were grouped into six themes of two types: three explicit/manifest and three implicit/latent. The explicit/manifest themes focused on specific aspects of resident performance, with assessors describing 1) gaps in attention to detail, 2) communication deficits with patients, and 3) difficulty recognizing the "big picture" in patient care. Three implicit/latent themes, focused on how narrative data were written, were also identified: 1) feedback described as a deficiency rather than an opportunity to improve, 2) normative comparisons to identify a resident as being behind their peers, and 3) warning of possible risk to patient care. DISCUSSION Clinical competency committees (CCCs) usually rely on accumulated data and trends. Using the themes in this paper while reviewing narrative comments may help CCCs with earlier recognition and better allocation of resources to support residents' development.
Affiliation(s)
- Matthew Kelleher, Benjamin Kinnear, Danielle E Weber, Bailey DeCoursey, Jennifer Nelson, Melissa Klein, Daniel J Schumacher: Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Dana R Sall: HonorHealth Internal Medicine Residency Program, Scottsdale, Arizona, and University of Arizona College of Medicine, Phoenix, AZ, USA
- Eric J Warm: Department of Internal Medicine, University of Cincinnati College of Medicine, Cincinnati, OH, USA
17. Roshan A, Wagner N, Acai A, Emmerton-Coughlin H, Sonnadara RR, Scott TM, Karimuddin AA. Comparing the Quality of Narrative Comments by Rotation Setting. Journal of Surgical Education 2021; 78:2070-2077. PMID: 34301523; DOI: 10.1016/j.jsurg.2021.06.012.
Abstract
OBJECTIVE To investigate the effect of rotation setting on trainee-directed narrative comments within a Canadian general surgery residency program. The primary outcome was to use the McMaster Narrative Comment Rating Scale (MNCRS) to evaluate the quality of narrative comments across five domains: valence of language, degree of correction versus reinforcement, specificity, actionability, and overall usefulness. As distributed medical education in the postgraduate training context becomes more prevalent, delineating differences in feedback between various sites will be imperative, as it may affect how narrative comments are interpreted by clinical competency committee (CCC) members. DESIGN, SETTING, AND PARTICIPANTS A retrospective analysis of 2,469 assessments obtained between July 1, 2014 and May 5, 2019 from the General Surgery Residency Program at the University of British Columbia (UBC) was conducted. Narrative comments were rated using the MNCRS, a validated instrument for evaluating the quality of narrative comments. A repeated measures analysis of variance (ANOVA) was conducted to explore the impact of rotation setting (academic, urban tertiary, distributed urban, and distributed rural) on the quality of narrative feedback. RESULTS Overall, the quality of the narrative comments varied substantially between and within rotation settings. Academic sites tended to provide more actionable comments (p = 0.01) and more corrective versus reinforcing comments, compared with other sites (p's < 0.01). Comments produced in the urban tertiary rotation setting were consistently lower in quality across all scale categories compared with other settings (p's < 0.01). CONCLUSION The type of rotation setting has a significant effect on the quality of faculty feedback for trainees. Faculty development on the provision of feedback is necessary, regardless of rotation setting, and should appropriately combine rotation-specific needs and overarching program goals to ensure trainees and clinical competency committees receive high-quality narrative comments.
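A repeated measures ANOVA of the kind described can be sketched with statsmodels: each simulated trainee contributes a comment-quality score from all four rotation settings, with setting as the within-subject factor. The means below are invented to echo the reported pattern (urban tertiary lowest).

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(2)
means = {"academic": 3.6, "urban_tertiary": 2.8,
         "distributed_urban": 3.2, "distributed_rural": 3.3}
rows = [
    {"trainee": t, "setting": s, "quality": rng.normal(m, 0.5)}
    for t in range(40) for s, m in means.items()
]
df = pd.DataFrame(rows)

# Repeated measures ANOVA: setting as the within-subject factor
fit = AnovaRM(df, depvar="quality", subject="trainee", within=["setting"]).fit()
print(fit)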
Affiliation(s)
- Aishwarya Roshan: University of British Columbia, Vancouver, British Columbia, Canada
- Natalie Wagner: Office of Professional Development & Educational Scholarship, Queen's University, Kingston, Ontario, Canada
- Anita Acai: Department of Psychology, Neuroscience & Behavior; Department of Psychiatry and Behavioural Neurosciences; and Office of Education Science, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Heather Emmerton-Coughlin: Department of Surgery, University of British Columbia, Vancouver, British Columbia, Canada; Department of Surgery, Royal Jubilee Hospital, Victoria, British Columbia, Canada
- Ranil R Sonnadara: Office of Education Science, Department of Surgery, McMaster University, Hamilton, Ontario, Canada; Department of Surgery, University of Toronto, Toronto, Ontario, Canada
- Tracy M Scott: Department of Surgery, University of British Columbia, Vancouver, British Columbia, Canada; Department of Surgery, St. Paul's Hospital, Vancouver, British Columbia, Canada
- Ahmer A Karimuddin: Department of Surgery, University of British Columbia, Vancouver, British Columbia, Canada; Department of Surgery, St. Paul's Hospital, Vancouver, British Columbia, Canada
18. Ginsburg S, Watling CJ, Schumacher DJ, Gingerich A, Hatala R. Numbers Encapsulate, Words Elaborate: Toward the Best Use of Comments for Assessment and Feedback on Entrustment Ratings. Academic Medicine 2021; 96:S81-S86. PMID: 34183607; DOI: 10.1097/acm.0000000000004089.
Abstract
The adoption of entrustment ratings in medical education is based on a seemingly simple premise: to align workplace-based supervision with resident assessment. Yet it has been difficult to operationalize this concept. Entrustment rating forms combine numeric scales with comments and are embedded in a programmatic assessment framework, which encourages the collection of a large quantity of data. The implicit assumption that more is better has led to an untamable volume of data that competency committees must grapple with. In this article, the authors explore the roles of numbers and words on entrustment rating forms, focusing on the intended and optimal use(s) of each, with a focus on the words. They also unpack the problematic issue of dual-purposing words for both assessment and feedback. Words have enormous potential to elaborate, to contextualize, and to instruct; to realize this potential, educators must be crystal clear about their use. The authors set forth a number of possible ways to reconcile these tensions by more explicitly aligning words to purpose. For example, educators could focus written comments solely on assessment; create assessment encounters distinct from feedback encounters; or use different words collected from the same encounter to serve distinct feedback and assessment purposes. Finally, the authors address the tyranny of documentation created by programmatic assessment and urge caution in yielding to the temptation to reduce words to numbers to make them manageable. Instead, they encourage educators to preserve some educational encounters purely for feedback, and to consider that not all words need to become data.
Affiliation(s)
- Shiphra Ginsburg: Professor of medicine, Department of Medicine, Sinai Health System and Faculty of Medicine, University of Toronto; scientist, Wilson Centre for Research in Education, University of Toronto, Toronto, Ontario, Canada; Canada Research Chair in Health Professions Education. ORCID: http://orcid.org/0000-0002-4595-6650
- Christopher J Watling: Professor and director, Centre for Education Research and Innovation, Schulich School of Medicine & Dentistry, Western University, London, Ontario, Canada. ORCID: https://orcid.org/0000-0001-9686-795X
- Daniel J Schumacher: Associate professor of pediatrics, Cincinnati Children's Hospital Medical Center and University of Cincinnati College of Medicine, Cincinnati, Ohio. ORCID: https://orcid.org/0000-0001-5507-8452
- Andrea Gingerich: Assistant professor, Northern Medical Program, University of Northern British Columbia, Prince George, British Columbia, Canada. ORCID: https://orcid.org/0000-0001-5765-3975
- Rose Hatala: Professor, Department of Medicine, and director, Clinical Educator Fellowship, Centre for Health Education Scholarship, University of British Columbia, Vancouver, British Columbia, Canada. ORCID: https://orcid.org/0000-0003-0521-2590
19. Roy M, Wojcik J, Bartman I, Smee S. Augmenting physician examiner scoring in objective structured clinical examinations: including the standardized patient perspective. Advances in Health Sciences Education 2021; 26:313-328. PMID: 32816242; DOI: 10.1007/s10459-020-09987-6.
Abstract
In Canada, high stakes objective structured clinical examinations (OSCEs) administered by the Medical Council of Canada have relied exclusively on physician examiners (PEs) for scoring. Prior research has looked at using standardized patients (SPs) to replace PEs. This paper reports on two studies that implement and evaluate an SP scoring tool to augment PE scoring. The unique aspect of this study is that it explores the benefits of combining SP and PE scores. SP focus groups developed rating scales for four dimensions they labelled: Listening, Communication, Empathy/Rapport, and Global Impression. In Study I, 43 SPs from one site of a national PE-scored OSCE rated 60 examinees with the initial SP rating scales. In Study II, 137 SPs used slightly revised rating scales with optional narrative comments to score 275 examinees at two sites. Examinees were blinded to SP scoring, and SP ratings did not count toward results. Separate PE and SP scoring was examined using descriptive statistics and correlations. Combinations of SP and PE scoring were assessed using pass rates, reliability, and decision consistency and accuracy indices. In Study II, SP and PE comments were examined. SPs showed greater variability in their scoring, and rated examinees lower than PEs on common elements, resulting in slightly lower pass rates when combined. There was a moderate tendency for both SPs and PEs to make negative comments for the same examinee but for different reasons. We argue that SPs and PEs assess performance from different perspectives, and that combining scores from both augments the overall reliability of scores and pass/fail decisions. There is potential to provide examinees with feedback comments from each group.
Affiliation(s)
- Marguerite Roy, Josée Wojcik, Ilona Bartman, Sydney Smee: Medical Council of Canada, 1021 Thomas Spratt Place, Ottawa, ON, K1G 5L5, Canada
20. Ginsburg S, Gingerich A, Kogan JR, Watling CJ, Eva KW. Idiosyncrasy in Assessment Comments: Do Faculty Have Distinct Writing Styles When Completing In-Training Evaluation Reports? Academic Medicine 2020; 95:S81-S88. PMID: 32769454; DOI: 10.1097/acm.0000000000003643.
Abstract
PURPOSE Written comments are gaining traction as robust sources of assessment data. Compared with the structure of numeric scales, what faculty choose to write is ad hoc, leading to idiosyncratic differences in what is recorded. This study offers an exploration of which aspects of writing style are determined by the faculty offering comment and which are determined by the trainee being commented upon. METHOD The authors compiled in-training evaluation report comment data, generated from 2012 to 2015 by 4 large North American internal medicine training programs. The Linguistic Inquiry and Word Count (LIWC) program was used to categorize and quantify the language contained. Generalizability theory was used to determine whether faculty could be reliably discriminated from one another based on writing style. Correlations and ANOVAs were used to determine what styles were related to faculty or trainee demographics. RESULTS Datasets contained 23-142 faculty who provided 549-2,666 assessments on 161-989 trainees. Faculty could easily be discriminated from one another using a variety of LIWC metrics, including word count, words per sentence, and the use of "clout" words. These patterns appeared person specific and did not reflect demographic factors such as gender or rank. The metrics were similarly not consistently associated with trainee factors such as postgraduate year or gender. CONCLUSIONS Faculty seem to have detectable writing styles that are relatively stable across the trainees they assess, which may represent an under-recognized source of construct irrelevance. If written comments are to meaningfully contribute to decision making, we need to understand and account for idiosyncratic writing styles.
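The paper uses generalizability theory to show that faculty can be told apart by style; as a simpler stand-in, the sketch below derives word count and words-per-sentence metrics for toy comments and asks, via one-way ANOVA, whether word counts differ across the (invented) faculty who wrote them.

import numpy as np
from scipy.stats import f_oneway

def style_metrics(comment):
    sentences = [s for s in comment.split(".") if s.strip()]
    words = comment.split()
    return len(words), len(words) / max(len(sentences), 1)  # word count, words/sentence

# Toy comments from three faculty with habitual writing styles
faculty_comments = {
    "A": ["Good job. Keep reading.", "Fine work."],
    "B": ["An exceptionally thorough assessment that integrated the full history.",
          "A remarkably complete evaluation showing sophisticated reasoning throughout."],
    "C": ["Did well overall but should focus on oral presentations going forward.",
          "Worked hard this block and responded to feedback with clear improvement."],
}

word_counts = [[style_metrics(c)[0] for c in cs] for cs in faculty_comments.values()]
stat, p = f_oneway(*word_counts)  # do word counts differ by faculty author?
print(f"F = {stat:.2f}, p = {p:.3f}")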
Affiliation(s)
- Shiphra Ginsburg: Professor of medicine, Department of Medicine, Faculty of Medicine, University of Toronto; scientist, Wilson Centre for Research in Education, University Health Network, University of Toronto, Toronto, Ontario, Canada; Canada Research Chair in Health Professions Education. ORCID: http://orcid.org/0000-0002-4595-6650
- Andrea Gingerich: Assistant professor, Northern Medical Program, University of Northern British Columbia, Prince George, British Columbia, Canada. ORCID: https://orcid.org/0000-0001-5765-3975
- Jennifer R Kogan: Professor and associate dean for student success and professional development, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania. ORCID: https://orcid.org/0000-0001-8426-9506
- Christopher J Watling: Professor and director, Centre for Education Research and Innovation, Schulich School of Medicine & Dentistry, Western University, London, Ontario, Canada. ORCID: https://orcid.org/0000-0001-9686-795X
- Kevin W Eva: Professor and director of education research and scholarship, Department of Medicine, and associate director and senior scientist, Centre for Health Education Scholarship, University of British Columbia, Vancouver, British Columbia, Canada. ORCID: http://orcid.org/0000-0002-8672-2500