King AJ, Kahn JM, Brant EB, Cooper GF, Mowery DL. Initial Development of an Automated Platform for Assessing Trainee Performance on Case Presentations. ATS Sch 2022;3:548-560. [PMID: 36726701; PMCID: PMC9886197; DOI: 10.34197/ats-scholar.2022-0010oc]
[Received: 01/18/2022] [Accepted: 08/08/2022] Open Access
Abstract
Background
Oral case presentation is a crucial skill of physicians and a key component of team-based care. However, consistent and objective assessment and feedback on presentations during training are infrequent.
Objective
To determine the potential value of applying natural language processing, computer software that extracts meaning from text, to transcripts of oral case presentations as a strategy to assess their quality automatically and objectively.
Methods
We transcribed a collection of simulated oral case presentations from eight critical care fellows and one critical care attending. Participants were instructed to review the medical charts of 11 real intensive care unit patient cases and to audio record themselves presenting each case as if on morning rounds. We then used natural language processing to convert the transcripts from human-readable text into machine-readable numbers representing details of presentation style and content. The distance between the numeric representations of two transcripts is inversely related to their similarity. We ranked fellows on the basis of how similar their presentations were to the attending's presentations.
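The ranking step described above can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: it stands in a plain bag-of-words vector for the paper's richer style and content features, and cosine similarity for their distance measure; the function and variable names are hypothetical.

```python
from collections import Counter
from math import sqrt

def vectorize(transcript: str) -> Counter:
    # Bag-of-words vector (token -> count); a stand-in for the
    # paper's numeric representation of style and content.
    return Counter(transcript.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Higher similarity corresponds to smaller distance between vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_fellows(attending_transcript: str, fellow_transcripts: dict) -> list:
    # Rank fellows by similarity of their transcript to the attending's
    # reference transcript, most similar first.
    reference = vectorize(attending_transcript)
    scores = {name: cosine_similarity(vectorize(text), reference)
              for name, text in fellow_transcripts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In the study, each fellow presented multiple cases, so a per-fellow score would presumably aggregate similarities across cases rather than score a single transcript.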
Results
The 99 presentations included 260 minutes of audio (mean length: 2.6 ± 1.24 min per case). On average, 23.88 ± 2.65 sentences were spoken, and each sentence had 14.10 ± 0.67 words, 3.62 ± 0.15 medical concepts, and 0.75 ± 0.09 medical adjectives. When ranking fellows on the basis of how similar their presentations were to the attending's presentation, we found a gap between the five fellows with the most similar presentations and the three fellows with the least similar presentations (average group similarity scores of 0.62 ± 0.01 and 0.53 ± 0.01, respectively). Rankings were sensitive to whether presentation style or content information was weighted more heavily when calculating transcript similarity.
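The sensitivity of the rankings to style-vs-content weighting can be illustrated with a simple blend. The abstract does not specify the combination rule, so a linear weighting is assumed here purely for illustration; the weights and scores below are hypothetical.

```python
def weighted_similarity(style_sim: float, content_sim: float,
                        style_weight: float) -> float:
    # Linear blend of two similarity components (assumed form):
    # style_weight = 1.0 -> style only, 0.0 -> content only.
    return style_weight * style_sim + (1.0 - style_weight) * content_sim

# Two hypothetical fellows: X matches the attending's style,
# Y matches the attending's content.
x_style, x_content = 0.9, 0.4
y_style, y_content = 0.4, 0.9

# Under a style-heavy weighting, X ranks first; under a
# content-heavy weighting, Y ranks first.
x_style_heavy = weighted_similarity(x_style, x_content, 0.8)  # 0.80
y_style_heavy = weighted_similarity(y_style, y_content, 0.8)  # 0.50
x_content_heavy = weighted_similarity(x_style, x_content, 0.2)  # 0.50
y_content_heavy = weighted_similarity(y_style, y_content, 0.2)  # 0.80
```

The flip between the two weightings shows why the choice of weights matters before similarity scores are turned into feedback.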
Conclusion
Natural language processing enabled the ranking of case presentations on the basis of how similar they were to a reference presentation. Although additional work is needed to convert these rankings, and underlying similarity scores, into actionable feedback for trainees, these methods may support new tools for improving medical education.