1
Gustafsson A, Rölfing JD, Palm H, Viberg B, Grimstrup S, Konge L. Setting proficiency standards for simulation-based mastery learning of short antegrade femoral nail osteosynthesis: a multicenter study. Acta Orthop 2024;95:275-281. PMID: 38819402; PMCID: PMC11141712; DOI: 10.2340/17453674.2024.40812.
Abstract
BACKGROUND AND PURPOSE Orthopedic trainees frequently perform short antegrade femoral nail osteosynthesis of trochanteric fractures, but virtual reality simulation-based training (SBT) with haptic feedback has been unavailable. We explored a novel simulator with the aim of gathering validity evidence for an embedded test and setting a credible pass/fail standard that allows trainees to practice to proficiency. PATIENTS AND METHODS The research, conducted from May to September 2020 across 3 Danish simulation centers, utilized the Swemac TraumaVision simulator for short antegrade femoral nail osteosynthesis. The validation process adhered to Messick's framework, covering all 5 sources of validity evidence. Participants included novice groups, categorized by training to plateau (n = 14) or to mastery (n = 10), and experts (n = 9), with a focus on their performance metrics and training duration. RESULTS The novices in the plateau group and the experts had hands-on training for 77 (95% confidence interval [CI] 59-95) and 52 (CI 36-69) minutes, respectively, while their plateau test scores, defined as the average of the last 4 scores, were 75% (CI 65-86) and 96% (CI 94-98). The pass/fail standard was established at the average expert plateau test score of 96%. All novices in the mastery group met this standard, interestingly without increased hands-on training time (65 [CI 46-84] minutes). CONCLUSION Our study provides supporting validity evidence from all sources of Messick's framework for a simulation-based test in short antegrade nail osteosynthesis of intertrochanteric hip fracture and establishes a defensible pass/fail standard for mastery learning in SBT. Novices who practiced using mastery learning reached the pre-defined pass/fail standard and outperformed novices who trained without a set goal as external motivation.
Affiliation(s)
- Amandus Gustafsson: Orthopaedic Department, Slagelse Hospital, Region Zealand, Slagelse; Copenhagen Academy for Medical Education and Simulation, Rigshospitalet, Copenhagen; Department of Clinical Medicine, Faculty of Health Science, University of Copenhagen, Copenhagen
- Jan D Rölfing: Department of Orthopaedics, Aarhus University Hospital, Aarhus; MidtSim, Corporate HR, Central Denmark Region, Aarhus
- Henrik Palm: Orthopaedic Department, Bispebjerg Hospital, Region H, Copenhagen
- Bjarke Viberg: Orthopaedic Department, Odense Hospital, Region Syd, Odense, Denmark
- Søren Grimstrup: Copenhagen Academy for Medical Education and Simulation, Rigshospitalet, Copenhagen
- Lars Konge: Copenhagen Academy for Medical Education and Simulation, Rigshospitalet, Copenhagen; Department of Clinical Medicine, Faculty of Health Science, University of Copenhagen, Copenhagen
2
Haque TF, Knudsen JE, You J, Hui A, Djaladat H, Ma R, Cen S, Goldenberg M, Hung AJ. Competency in Robotic Surgery: Standard Setting for Robotic Suturing Using Objective Assessment and Expert Evaluation. J Surg Educ 2024;81:422-430. PMID: 38290967; PMCID: PMC10923136; DOI: 10.1016/j.jsurg.2023.12.002.
Abstract
OBJECTIVE Surgical skill assessment tools such as the End-to-End Assessment of Suturing Expertise (EASE) can differentiate a surgeon's experience level. In this simulation-based study, we define a competency benchmark for intraoperative robotic suturing using EASE as a validated measure of performance. DESIGN Participants completed a dry-lab vesicourethral anastomosis (VUA) exercise. Videos were independently scored by 2 trained, blinded reviewers using EASE. Inter-rater reliability was measured with prevalence-adjusted bias-adjusted kappa (PABAK) using 2 example videos. All videos were reviewed by an expert surgeon, who determined whether the suturing skills exhibited were at the competency level expected at residency graduation (pass or fail). The Contrasting Groups (CG) method was then used to set a pass/fail score at the intercept of the pass and fail cohorts' EASE score distributions. SETTING Keck School of Medicine, University of Southern California. PARTICIPANTS Twenty-six participants: 8 medical students, 8 junior residents (PGY 1-2), 7 senior residents (PGY 3-5), and 3 attending urologists. RESULTS After 1 round of consensus-building, the average PABAK across EASE subskills was 0.90 (range 0.67-1.0). The CG method produced a competency benchmark EASE score of >35/39, with a pass rate of 10/26 (38%); 27% were deemed competent by expert evaluation. False positives and false negatives were defined as medical students who passed and attendings who failed the assessment, respectively. This pass/fail score produced no false positives or negatives, and fewer junior than senior residents were considered competent by both the expert and the CG benchmark. CONCLUSIONS Using an absolute standard-setting method, competency scores were set to identify trainees who could competently execute a standardized dry-lab robotic suturing exercise. This standard can be used for high-stakes decisions regarding a trainee's technical readiness for independent practice. Future work includes validation of this standard in the clinical environment through correlation with clinical outcomes.
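The Contrasting Groups method described in this abstract places the cut score where the score distributions of the expert-judged pass and fail cohorts cross. A minimal sketch of that idea (ours, not the authors' implementation; it assumes approximately normal scores within each cohort, and the function names are illustrative):

```python
from math import exp, pi, sqrt
from statistics import mean, stdev

def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    return exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))

def contrasting_groups_cutoff(fail_scores, pass_scores):
    """Fit a normal to each cohort, then grid-search between the two means
    for the point where the densities intersect: the pass/fail cut score."""
    mu_f, sd_f = mean(fail_scores), stdev(fail_scores)
    mu_p, sd_p = mean(pass_scores), stdev(pass_scores)
    lo, hi = min(mu_f, mu_p), max(mu_f, mu_p)
    grid = [lo + i * (hi - lo) / 10000 for i in range(10001)]
    # intersection = smallest absolute difference between the two densities
    return min(grid, key=lambda x: abs(normal_pdf(x, mu_f, sd_f) - normal_pdf(x, mu_p, sd_p)))
```

With equal spreads the intersection falls midway between the cohort means; unequal spreads shift the cut score toward the less variable cohort.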
Affiliation(s)
- Taseen F Haque: Catherine & Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles, California
- J Everett Knudsen: Catherine & Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles, California
- Jonathan You: Department of Urology, Cedars-Sinai Medical Center, Los Angeles, California
- Alvin Hui: Department of Urology, Cedars-Sinai Medical Center, Los Angeles, California
- Hooman Djaladat: Catherine & Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles, California
- Runzhuo Ma: Department of Urology, Cedars-Sinai Medical Center, Los Angeles, California
- Steven Cen: Department of Radiology, University of Southern California, Los Angeles, California
- Mitchell Goldenberg: Catherine & Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, Los Angeles, California
- Andrew J Hung: Department of Urology, Cedars-Sinai Medical Center, Los Angeles, California
3
Goldenberg MG. Surgical Artificial Intelligence in Urology: Educational Applications. Urol Clin North Am 2024;51:105-115. PMID: 37945096; DOI: 10.1016/j.ucl.2023.06.003.
Abstract
Surgical education has seen immense change recently. Increased demand for iterative evaluation of trainees from medical school to independent practice has led to the generation of an overwhelming amount of data related to an individual's competency. Artificial intelligence has been proposed as a solution to automate and standardize the ability of stakeholders to assess the technical and nontechnical abilities of a surgical trainee. In both the simulation and clinical environments, evidence supports the use of machine learning algorithms to both evaluate trainee skill and provide real-time and automated feedback, enabling a shortened learning curve for many key procedural skills and ensuring patient safety.
Affiliation(s)
- Mitchell G Goldenberg: Catherine & Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California, 1441 Eastlake Avenue, Suite 7416, Los Angeles, CA 90033, USA
4
Nayahangan LJ, Svendsen MBS, Bodtger U, Rahman N, Maskell N, Sidhu JS, Lawaetz J, Clementsen PF, Konge L. Assessment of competence in local anaesthetic thoracoscopy: development and validity investigation of a new assessment tool. J Thorac Dis 2021;13:3998-4007. PMID: 34422330; PMCID: PMC8339737; DOI: 10.21037/jtd-20-3560.
Abstract
Background The aims of the study were to develop an assessment tool for local anaesthetic thoracoscopy (LAT), investigate validity evidence, and establish a pass/fail standard. Methods Validity evidence for the assessment tool was gathered using the unified Messick framework. The tool was developed by five experts in respiratory medicine and medical education. Doctors with varying experience performed two consecutive procedures in a standardized, simulation-based setting using a newly developed thorax/lung silicone model. Performances were video-recorded and assessed by four expert raters using the new tool. The contrasting groups method was used to set a pass/fail standard. Results Nine novices and eight experienced participants were included, generating 34 recorded performances and 136 expert assessments. The tool had high internal consistency (Cronbach's alpha = 0.94) and high inter-rater reliability (Cronbach's alpha = 0.91). The total item score correlated significantly with the global score (rs = 0.86, P < 0.001). Participants' first performance correlated with their second performance (test-retest reliability) with a Pearson's r of 0.93, P < 0.001. A generalisability (G) study showed a G-coefficient of 0.92, and a decision (D) study estimated that one performance assessed by two raters, or four performances assessed by one rater, is needed to reach acceptable reliability, i.e., a G-coefficient > 0.80. The tool was able to discriminate between the two groups in both performances: experienced mean score = 30.8 ± 4.2; novice mean score = 15.8 ± 2.3; P < 0.001. The pass/fail standard was set at 22 points. Conclusions The newly developed assessment tool showed solid evidence of validity and can be used to ensure competence in LAT.
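Both reliability figures reported here are Cronbach's alpha, which can be computed directly from an items-by-subjects score matrix. A hedged sketch (function name ours; population variance is used, one common convention):

```python
from statistics import pvariance

def cronbachs_alpha(item_scores):
    """item_scores: one list per item, each holding that item's scores for the
    same subjects in the same order. Returns Cronbach's alpha."""
    k = len(item_scores)                                  # number of items
    totals = [sum(vals) for vals in zip(*item_scores)]    # per-subject total score
    item_var_sum = sum(pvariance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))
```

Perfectly parallel items (identical score columns) yield alpha = 1; uncorrelated items push alpha toward 0.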
Affiliation(s)
- Leizl Joy Nayahangan: Copenhagen Academy for Medical Education and Simulation, Centre for HR and Education, The Capital Region of Denmark, Copenhagen, Denmark
- Morten Bo Søndergaard Svendsen: Copenhagen Academy for Medical Education and Simulation, Centre for HR and Education, The Capital Region of Denmark, Copenhagen, Denmark
- Uffe Bodtger: Department of Internal and Respiratory Medicine, Zealand University Hospital, Roskilde, Denmark; Department of Respiratory Medicine, Næstved Hospital, Næstved, Denmark
- Najib Rahman: Nuffield Department of Medicine, Oxford Respiratory Trials Unit, University of Oxford, Oxford, UK
- Nick Maskell: Academic Respiratory Unit, School of Clinical Sciences, University of Bristol, Bristol, UK
- Jonathan Lawaetz: Copenhagen Academy for Medical Education and Simulation, Centre for HR and Education, The Capital Region of Denmark, Copenhagen, Denmark; Department of Vascular Surgery, Rigshospitalet, Copenhagen, Denmark
- Paul Frost Clementsen: Copenhagen Academy for Medical Education and Simulation, Centre for HR and Education, The Capital Region of Denmark, Copenhagen, Denmark; Department of Internal and Respiratory Medicine, Zealand University Hospital, Roskilde, Denmark; Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Lars Konge: Copenhagen Academy for Medical Education and Simulation, Centre for HR and Education, The Capital Region of Denmark, Copenhagen, Denmark; Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
5
Hennings LI, Sørensen JL, Hybscmann J, Strandbygaard J. Tools for measuring technical skills during gynaecologic surgery: a scoping review. BMC Med Educ 2021;21:402. PMID: 34311735; PMCID: PMC8314568; DOI: 10.1186/s12909-021-02790-w.
Abstract
BACKGROUND Standardised assessment is key to structured surgical training. Currently, there is no consensus on which surgical assessment tool to use in live gynaecologic surgery. The purpose of this review is to identify assessment tools measuring technical skills in gynaecologic surgery and evaluate the measurement characteristics of each tool. METHOD We utilized the scoping review methodology and searched PubMed, Medline, Embase and Cochrane. Inclusion criteria were studies that analysed assessment tools in live gynaecologic surgery. Kane's validity argument was applied to evaluate the assessment tools in the included studies. RESULTS Eight studies out of the 544 identified fulfilled the inclusion criteria. The assessment tools were categorised as global rating scales, global and procedure rating scales combined, procedure-specific rating scales or as a non-procedure-specific error assessment tool. CONCLUSION This scoping review presents the current different tools for observational assessment of technical skills in intraoperative, gynaecologic surgery. This scoping review can serve as a guide for surgical educators who want to apply a scale or a specific tool in surgical assessment.
Affiliation(s)
- Jette Led Sørensen: Juliane Marie Centre for Children, Women and Reproduction, Rigshospitalet, Copenhagen, Denmark; Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Jane Hybscmann: Juliane Marie Centre for Children, Women and Reproduction, Rigshospitalet, Copenhagen, Denmark; Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
6
Samaratunga R, Johnson L, Gatzidis C, Swain I, Wainwright T, Middleton R. A review of participant recruitment transparency for sound validation of hip surgery simulators: a novel umbrella approach. J Med Eng Technol 2021;45:434-456. PMID: 34016011; DOI: 10.1080/03091902.2021.1921868.
Abstract
Implant malposition, together with surgeon inexperience, is associated with complications, higher wear, and increased revision rates in total hip replacement (THR). Training THR residents to expert proficiency is hampered by the high cost and resource limitations of traditional training techniques. Research in extended reality (XR) technologies can overcome such barriers: XR offers a platform for learning, objective skill-monitoring and, potentially, automated certification. Prior to its incorporation into curricula, however, thorough validation must be undertaken. Because validity is heavily dependent on the participants recruited, and pre-defined standards are lacking, recruitment criteria must be reviewed, scrutinised, and defined for sound simulator validation. A systematic review of the PubMed and IEEE databases was conducted. Training-simulator validation research in fracture, arthroscopy, and arthroplasty relating to the hip was included. Forty-six validation studies were reviewed. There was no uniformity in reporting or recruitment criteria, rendering cross-comparison challenging. This work developed Umbrella categories to help prioritise recruitment and formulated a detailed template of fields and guidelines for reporting criteria so that, in future, research may reach a consensus on recruitment criteria for a hip "expert" or "novice".
Affiliation(s)
- Layla Johnson: Faculty of Science and Technology, Bournemouth University, Poole, UK
- Christos Gatzidis: Faculty of Science and Technology, Bournemouth University, Poole, UK
- Ian Swain: Faculty of Science and Technology, Bournemouth University, Poole, UK; Orthopaedic Research Institute, Bournemouth University, UK
- Thomas Wainwright: Orthopaedic Research Institute, Bournemouth University, UK; University Hospitals Dorset NHS Foundation Trust, UK
- Robert Middleton: Orthopaedic Research Institute, Bournemouth University, UK; University Hospitals Dorset NHS Foundation Trust, UK
7
Louridas M, de Montbrun S. Competency-Based Education in Minimally Invasive and Robotic Colorectal Surgery. Clin Colon Rectal Surg 2021;34:155-162. PMID: 33814997; DOI: 10.1055/s-0040-1718683.
Abstract
Minimally invasive and robotic techniques have become increasingly implemented in surgical practice and are now an essential part of the foundational skills of training colorectal surgeons. Over the past 5 years there has been a shift in the surgical educational paradigm toward competency-based education (CBE). CBE recognizes that trainees learn at different rates but are nonetheless required to meet a competent threshold of performance prior to independent practice. Thus, CBE attempts to replace the traditional "time" endpoint of training with "performance." Although conceptually sensible, implementing CBE has proven challenging. This article will define competence, outline appropriate tools for assessing technical skill, and review the literature on the number of cases required to achieve competence in colorectal procedures, while outlining the barriers to implementing CBE.
Affiliation(s)
- Marisa Louridas: Department of Surgery, University of Toronto, Toronto, Ontario, Canada
8
Simulation-based VATS resection of the five lung lobes: a technical skills test. Surg Endosc 2021;36:1234-1242. PMID: 33660123; DOI: 10.1007/s00464-021-08392-3.
Abstract
BACKGROUND Video-Assisted Thoracoscopic Surgery (VATS) lobectomy is an advanced procedure, and to maximize patient safety it is important to ensure the competency of thoracic surgeons before they perform it. The objective of this study was to investigate validity evidence for a virtual reality simulator-based test covering multiple lobes of the lungs. METHOD VATS experts from the Department of Cardiothoracic Surgery at Rigshospitalet, Copenhagen, Denmark, worked with Surgical Science (Gothenburg, Sweden) to develop VATS lobectomy modules for the LapSim® virtual reality simulator covering all five lobes of the lungs. Participants with varying experience in VATS were recruited and classified as novice, intermediate, or experienced surgeons. Each participant performed VATS lobectomy on the simulator for three different randomly chosen lobes. Nine predefined simulator metrics were automatically recorded by the simulator. RESULTS Twenty-two novice, ten intermediate, and nine experienced surgeons performed the test, resulting in a total of 123 lobectomies. Analysis of variance (ANOVA) found significant differences between the three groups for three parameters: blood loss (p < 0.001), procedure time (p < 0.001), and total instrument path length (p = 0.03). These three metrics demonstrated high internal consistency, and significant test-retest reliability was found for each of them. Relevant pass/fail levels were established for the three metrics: 541 ml, 30 min, and 71 m, respectively. CONCLUSION This study provides validity evidence for a simulator-based test of VATS lobectomy competence covering multiple lobes of the lungs. The test can be used to ensure basic competence at the end of a simulation-based training program for thoracic surgery trainees.
9
Mackenzie CF, Elster EA, Bowyer MW, Sevdalis N. Scoping Evidence Review on Training and Skills Assessment for Open Emergency Surgery. J Surg Educ 2020;77:1211-1226. PMID: 32224033; DOI: 10.1016/j.jsurg.2020.02.029.
Abstract
OBJECTIVE To scope evidence on technical performance metrics for open emergency surgery and to identify surgical performance metrics and procedures used in trauma training courses. DESIGN Structured literature searches of electronic databases were conducted from January 2010 to December 2019 to identify systematic reviews of tools used to measure surgical skills in vascular or trauma surgery evaluation and training. SETTING AND PARTICIPANTS Faculty of the Shock Trauma Anesthesiology Research Center, University of Maryland School of Medicine; Uniformed Services University of Health Sciences, Bethesda, Maryland; and Implementation Science, King's College, London. RESULTS Evidence from 21 systematic reviews, together including over 54,000 subjects enrolled in over 840 eligible studies, showed that the Objective Structured Assessment of Technical Skill was used for elective surgery but not for emergency trauma and vascular control procedures. The Individual Procedure Score (IPS), used to evaluate emergency trauma procedures performed before and after training, distinguished the performance of residents from that of experts and practicing surgeons. IPS predicted which surgeons make critical errors and need remediation interventions. No metrics showed Kirkpatrick Level 4 evidence that technical skills training benefits emergency surgery outcomes. CONCLUSIONS Expert benchmarks, errors, complication rates, task completion time, task-specific checklists, global rating scales, the Objective Structured Assessment of Technical Skills, and IPS were found to identify surgeons, at all levels of seniority, who are in need of remediation of technical skills for open surgical hemorrhage control. Large-scale, multicenter studies are needed to evaluate any benefit of trauma technical skills training on patient outcomes.
Affiliation(s)
- Eric A Elster: The Uniformed Services University of Health Sciences and the Walter Reed National Military Medical Center, Bethesda, Maryland
- Mark W Bowyer: The Uniformed Services University of Health Sciences and the Walter Reed National Military Medical Center, Bethesda, Maryland
- Nick Sevdalis: Center for Implementation Science, King's College, London, United Kingdom
10
Bourque J, Skinner H, Dupré J, Bacchus M, Ainslie M, Ma IWY, Cole G. Performance of the Ebel standard-setting method in spring 2019 Royal College of Physicians and Surgeons of Canada internal medicine certification examination consisted of multiple-choice questions. J Educ Eval Health Prof 2020;17:12. PMID: 32306708; PMCID: PMC7242791; DOI: 10.3352/jeehp.2020.17.12.
Abstract
PURPOSE This study examined the performance of the Ebel standard-setting method in the spring 2019 Royal College of Physicians and Surgeons of Canada internal medicine certification examination, which consisted of multiple-choice questions (MCQs). Specifically, we investigated: the inter-rater agreement; the correlation between Ebel scores and item facility indices; the impact of raters' knowledge of the correct answers on the Ebel score; and the effect of rater specialty on inter-rater agreement and Ebel scores. METHODS Data were drawn from a Royal College of Physicians and Surgeons of Canada certification exam. Ebel's method was applied to 203 MCQs by 49 raters. Facility indices came from 194 candidates. We computed Fleiss' kappa and the Pearson correlation between Ebel scores and item facility indices. We investigated differences in the Ebel score (correct answers provided or not) and differences between internists and other specialists with t-tests. RESULTS Kappa was below 0.15 for both facility and relevance. The correlation between Ebel scores and facility indices was low when correct answers were provided and negligible when they were not. The Ebel score was the same whether the correct answers were provided or not. Inter-rater agreement and Ebel scores did not differ between internists and other specialists. CONCLUSION Inter-rater agreement and correlations between item Ebel scores and facility indices were consistently low; furthermore, raters' knowledge of the correct answers and rater specialty had no effect on Ebel scores in the present setting.
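Fleiss' kappa, used in the study above to quantify agreement among many raters, can be computed from per-item category counts. A minimal sketch (ours; assumes every item is rated by the same number of raters):

```python
def fleiss_kappa(ratings):
    """ratings: list of items; each item is a list of counts per category,
    with every item rated by the same number n of raters."""
    N = len(ratings)            # number of items
    n = sum(ratings[0])         # raters per item
    k = len(ratings[0])         # number of categories
    # proportion of all assignments falling in each category
    p_j = [sum(item[j] for item in ratings) / (N * n) for j in range(k)]
    # observed agreement for each item
    P_i = [(sum(c * c for c in item) - n) / (n * (n - 1)) for item in ratings]
    P_bar = sum(P_i) / N                  # mean observed agreement
    P_e = sum(p * p for p in p_j)         # chance agreement
    return (P_bar - P_e) / (1 - P_e)
```

Perfect agreement (all raters pick the same category on every item) returns 1; values near 0 indicate chance-level agreement, consistent with the low kappas reported above.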
Affiliation(s)
- Jimmy Bourque: Exam Quality and Analytics Unit, Royal College of Physicians and Surgeons of Canada, Ottawa, ON, Canada
- Haley Skinner: Exam Quality and Analytics Unit, Royal College of Physicians and Surgeons of Canada, Ottawa, ON, Canada
- Jonathan Dupré: Exam Quality and Analytics Unit, Royal College of Physicians and Surgeons of Canada, Ottawa, ON, Canada
- Maria Bacchus: Department of Medicine, University of Calgary, Calgary, AB, Canada
- Martha Ainslie: Department of Medicine, University of Manitoba, Winnipeg, MB, Canada
- Irene W. Y. Ma: Department of Medicine, University of Calgary, Calgary, AB, Canada
- Gary Cole: Exam Quality and Analytics Unit, Royal College of Physicians and Surgeons of Canada, Ottawa, ON, Canada
11
Ensuring Competency in Open Aortic Aneurysm Repair - Development and Validation of a New Assessment Tool. Eur J Vasc Endovasc Surg 2020;59:767-774. PMID: 32089508; DOI: 10.1016/j.ejvs.2020.01.021.
Abstract
OBJECTIVE The aims of this study were to develop a procedure-specific assessment tool for open abdominal aortic aneurysm (AAA) repair, gather validity evidence for the tool, and establish a pass/fail standard. METHODS Validity was studied based on the contemporary framework by Messick. Three vascular surgeons experienced in open AAA repair and an expert in assessment and validation within medical education developed the OPEn aortic aneurysm Repair Assessment of Technical Expertise (OPERATE) tool. Vascular surgeons with varying experience performed open AAA repair in a standardised simulation-based setting. All procedures were video recorded with the faces anonymised and scored independently by three experts in a mutually blinded setup. The Angoff standard-setting method was used to establish a credible pass/fail score. RESULTS Sixteen novices and nine experienced open vascular surgeons were enrolled. The OPERATE tool achieved high internal consistency (Cronbach's alpha .92) and inter-rater reliability (Cronbach's alpha .95) and was able to differentiate between novices and experienced surgeons, with mean scores (higher is better) of 13.4 ± 12 and 25.6 ± 6, respectively (p = .01). The pass/fail score was set high (27.7): one novice passed the test while six experienced surgeons failed. CONCLUSION Validity evidence was established for the newly developed OPERATE tool, which was able to differentiate between novices and experienced surgeons, providing a good argument that this tool can be used for both formative and summative assessment in a simulation-based environment. The high pass/fail score emphasises the need for novices to train in a simulation-based environment up to a certain level of competency before apprenticeship training in the clinical environment under the tutelage of a supervisor. Familiarisation with the simulation equipment must be ensured before performance is assessed, as reflected by the low scores on the experienced group's first attempt.
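In an Angoff-style procedure like the one above, each judge estimates, item by item, the score a borderline-competent performer would obtain, and the cut score is the sum of the across-judge mean estimates. A hedged sketch of that aggregation (ours, not the authors' exact protocol; names and the rating-scale variant are illustrative):

```python
def angoff_cut_score(judge_item_estimates):
    """judge_item_estimates: rows = judges, columns = items; each entry is a
    judge's estimate of a borderline candidate's expected score on that item.
    The cut score is the sum over items of the across-judge mean estimate."""
    n_judges = len(judge_item_estimates)
    n_items = len(judge_item_estimates[0])
    item_means = [sum(row[j] for row in judge_item_estimates) / n_judges
                  for j in range(n_items)]
    return sum(item_means)
```

In the classic Angoff method for dichotomous items the entries are probabilities of a borderline candidate answering correctly, and the same aggregation applies.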
12
Goldenberg M, Ordon M, Honey JRD, Andonian S, Lee JY. Objective Assessment and Standard Setting for Basic Flexible Ureterorenoscopy Skills Among Urology Trainees Using Simulation-Based Methods. J Endourol 2020;34:495-501. PMID: 32059622; DOI: 10.1089/end.2019.0626.
Abstract
Objective: To objectively assess the performance of graduating urology residents performing flexible ureterorenoscopy (fURS) using a simulation-based model, and to set an entrustability standard or benchmark for use across the educational spectrum. Methods: Chief urology residents and attending endourologists performed a standardized fURS task (ureterorenoscopy and repositioning of stones) using a Boston Scientific LithoVue ureteroscope on a Cook Medical URS model. All performances were video-recorded and blindly scored by both endourology experts and crowd-workers (C-SATS) using the Ureteroscopic Global Rating Scale, plus an overall entrustability score. Validity evidence supporting the scores was collected and categorized. The Borderline Group (BG) method was used to set absolute performance standards for the expert and crowdsourced ratings. Results: A total of 44 participants (40 chief residents, 4 faculty) completed testing. Eighty-three percent of participants had performed >50 fURS cases at the time of the study. Only 47.7% (mean score 12.6/20) and 61.4% (mean score 12.4/20) of participants were deemed "entrustable" by experts and crowd-workers, respectively. The BG method produced entrustability benchmarks of 11.8/20 for expert and 11.4/20 for crowd-worker ratings, resulting in pass rates of 56.9% and 61.4%. Conclusion: Using absolute standard-setting methods, benchmark scores were set to identify trainees who could safely carry out fURS in the simulated setting. Only 60% of residents in our cohort were rated as entrustable. These findings support the use of benchmarks to identify trainees requiring remediation earlier.
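The Borderline Group method used above sets the benchmark at the mean rating-scale score of the performances a global rater judged borderline. A minimal sketch (ours; the function and label names are illustrative):

```python
from statistics import mean

def borderline_group_cut(scores, global_ratings, borderline_label="borderline"):
    """Cut score = mean checklist/scale score of performances whose global
    rating matched the borderline label."""
    borderline_scores = [s for s, g in zip(scores, global_ratings)
                         if g == borderline_label]
    return mean(borderline_scores)
```

Candidates scoring at or above this cut would pass; the method needs enough borderline-rated performances for the mean to be stable.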
Affiliation(s)
- Mitchell Goldenberg: Division of Urology, Department of Surgery, St. Michael's Hospital, University of Toronto, Toronto, Canada
- Michael Ordon: Division of Urology, Department of Surgery, St. Michael's Hospital, University of Toronto, Toronto, Canada
- John R D'A Honey: Division of Urology, Department of Surgery, St. Michael's Hospital, University of Toronto, Toronto, Canada
- Sero Andonian: Division of Urology, McGill University Health Centre, McGill University, Quebec, Canada
- Jason Y Lee: Division of Urology, Department of Surgery, University Health Network-Toronto General Hospital, University of Toronto, Toronto, Canada
13
Jørgensen M, Savran MM, Christakopoulos C, Bek T, Grauslund J, Toft PB, Ziemssen F, Konge L, Sørensen TL, Subhi Y. Development and validation of a multiple-choice questionnaire-based theoretical test in direct ophthalmoscopy. Acta Ophthalmol 2019;97:700-706. PMID: 30816642; DOI: 10.1111/aos.14065.
Abstract
PURPOSE Direct ophthalmoscopy can reveal systemic, neurologic, and ophthalmic conditions but is poorly mastered among young physicians. A theoretical test is needed to measure the effect of educational interventions. We developed and gathered validity evidence for a multiple-choice questionnaire (MCQ)-based theoretical test in direct ophthalmoscopy. METHODS The MCQ was developed by interviewing experts. Validity evidence was then evaluated using Messick's validity framework. Content was ensured by inviting the experts to contribute in a Delphi-like process. Response process was ensured by piloting and by streamlining all instructions. The test was then taken by ophthalmologists and by medical students without experience in direct ophthalmoscopy. Results were used to evaluate internal structure (item quality analysis and internal consistency), relations to other variables (correlation of test scores with experience level), and consequences (establishment of a pass/fail score and the consequences of its use). RESULTS The first phase of the study yielded 100 MCQs. In the second phase, we identified 60 items that fulfilled predefined relevance and item quality requirements. These items demonstrated very high internal consistency (Cronbach's alpha = 0.95), significantly discriminated medical students from specialists (p < 0.001, independent-samples t-test), and the established pass/fail score of 50 (83%) correct answers resulted in no false positives (students passing) and no false negatives (specialists failing). A decision study identified that sampling 15 items suffices for certification. CONCLUSION We developed and validated an MCQ-based theoretical test in direct ophthalmoscopy that enables an evidence-based approach to measuring, evaluating, and certifying the theoretical knowledge necessary for direct ophthalmoscopy.
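The internal-consistency figure reported above (Cronbach's alpha = 0.95) follows the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A minimal sketch on made-up MCQ response data (the item matrix below is hypothetical):

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one inner list per item, aligned across the same
    respondents (0/1 for MCQ items marked incorrect/correct)."""
    k = len(item_scores)
    sum_item_var = sum(pvariance(item) for item in item_scores)
    totals = [sum(resp) for resp in zip(*item_scores)]  # total score per respondent
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))

# Hypothetical responses: 4 items x 5 respondents (1 = correct answer).
items = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
]
print(f"alpha = {cronbach_alpha(items):.2f}")
```

The same calculation, applied to the 60 retained items across all test-takers, is what yields a coefficient like the 0.95 reported in the study.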
Affiliation(s)
- Morten Jørgensen, Department of Ophthalmology, Zealand University Hospital, Roskilde, Denmark; CAMES – Copenhagen Academy for Medical Education and Simulation, Capital Region of Denmark, Copenhagen, Denmark; Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Mona Meral Savran, CAMES – Copenhagen Academy for Medical Education and Simulation, Capital Region of Denmark, Copenhagen, Denmark; Department of Obstetrics and Gynaecology, Copenhagen University Hospital Amager and Hvidovre, Hvidovre, Denmark
- Toke Bek, Department of Ophthalmology, Aarhus University Hospital, Aarhus, Denmark
- Jakob Grauslund, Department of Ophthalmology, Odense University Hospital, Odense, Denmark; Department of Clinical Research, Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark
- Peter Bjerre Toft, Department of Ophthalmology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Focke Ziemssen, Center for Ophthalmology, Eberhard-Karl University Tübingen, Tübingen, Germany
- Lars Konge, CAMES – Copenhagen Academy for Medical Education and Simulation, Capital Region of Denmark, Copenhagen, Denmark; Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Torben Lykke Sørensen, Department of Ophthalmology, Zealand University Hospital, Roskilde, Denmark; Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Yousif Subhi, Department of Ophthalmology, Zealand University Hospital, Roskilde, Denmark; Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
|
14
|
Abstract
OBJECTIVE To identify current trends in the use of validity frameworks in surgical simulation, to provide an overview of the evidence behind the assessment of technical skills in all surgical specialties, and to present recommendations and guidelines for future validity studies. SUMMARY OF BACKGROUND DATA Validity evidence for assessment tools used in the evaluation of surgical performance is of paramount importance to ensure valid and reliable assessment of skills. METHODS We systematically reviewed the literature by searching 5 databases (PubMed, EMBASE, Web of Science, PsycINFO, and the Cochrane Library) for studies published from January 1, 2008, to July 10, 2017. We included original studies evaluating simulation-based assessments of health professionals in surgical specialties and extracted data on surgical specialty, simulator modality, participant characteristics, and the validity framework used. Data were synthesized qualitatively. RESULTS We identified 498 studies with a total of 18,312 participants. Publications involving validity assessments in surgical simulation more than doubled, from ∼30 studies/year in 2008-2010 to ∼70-90 studies/year in 2014-2016. Only 6.6% of the studies used the recommended contemporary validity framework (Messick); the majority used outdated frameworks such as face validity. Significant differences were identified across surgical specialties. The evaluated assessment tools were mostly inanimate or virtual reality simulation models. CONCLUSION An increasing number of studies have gathered validity evidence for simulation-based assessments in surgical specialties, but the use of outdated frameworks remains common. To address current practice, this paper presents guidelines on how to use the contemporary validity framework when designing validity studies.
|
15
|
Gustafsson A, Pedersen P, Rømer TB, Viberg B, Palm H, Konge L. Hip-fracture osteosynthesis training: exploring learning curves and setting proficiency standards. Acta Orthop 2019; 90:348-353. [PMID: 31017542] [PMCID: PMC6718183] [DOI: 10.1080/17453674.2019.1607111]
Abstract
Background and purpose - Orthopedic surgeons must be able to perform internal fixation of proximal femoral fractures early in their career, but inexperienced trainees prolong surgery and cause increased reoperation rates. Simulation-based virtual reality (VR) training has been proposed to overcome the initial steep part of the learning curve, but it is unknown how much simulation training is necessary before trainees can progress to supervised surgery on patients. We determined characteristics of learning curves for novices and experts and established a pass/fail mastery-learning standard for junior trainees. Methods - 38 first-year residents and 8 consultants specialized in orthopedic trauma surgery performed cannulated screw, Hansson pin, and sliding hip screw osteosynthesis on the Swemac TraumaVision VR simulator. A previously validated test was used. The participants repeated the procedures until they reached their learning plateau. Results - The novices and the experts reached their learning plateau after an average of 169 minutes (95% CI 152-187) and 143 minutes (CI 109-177), respectively. The highest achieved scores were 92% (CI 91-93) for novices and 96% (CI 94-97) for experts. Plateau score, defined as the average of the last 4 scores, was 85% (CI 82-87) and 92% (CI 89-96) for the novices and the experts, respectively. Interpretation - Training time to reach plateau varied widely, and it is paramount that simulation-based training continues to a predefined standard instead of ending after a fixed number of attempts or amount of time. A score of 92%, comparable to the experts' plateau score, could be used as a mastery learning pass/fail standard.
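The plateau score above is defined as the mean of the last four attempts, and the proposed standard is 92%. One way to sketch the resulting stopping rule (the attempt scores below are hypothetical):

```python
from statistics import mean

PASS_FAIL = 92  # mastery standard (%), the experts' plateau score in the study

def plateau_score(scores, window=4):
    """Average of the last `window` attempts, per the study's definition."""
    return mean(scores[-window:])

def has_passed(scores, window=4):
    return len(scores) >= window and plateau_score(scores, window) >= PASS_FAIL

# Hypothetical learning curve: train until the plateau score meets the standard,
# rather than stopping after a fixed number of attempts.
attempts = [55, 68, 74, 81, 88, 90, 93, 95, 94]
for n in range(1, len(attempts) + 1):
    if has_passed(attempts[:n]):
        print(f"standard reached after {n} attempts "
              f"(plateau score {plateau_score(attempts[:n]):.1f}%)")
        break
```

This captures the paper's point that training time varies widely: two trainees with the same standard may need very different numbers of attempts.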
Affiliation(s)
- Amandus Gustafsson, Copenhagen Academy for Medical Education and Simulation; Orthopedic Department, Slagelse Hospital, Region Zealand
- Henrik Palm, Orthopedic Department, University Hospital Bispebjerg
- Lars Konge, Copenhagen Academy for Medical Education and Simulation; Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
|
16
|
Savran MM, Hoffmann E, Konge L, Ottosen C, Larsen CR. Objective assessment of total laparoscopic hysterectomy: Development and validation of a feasible rating scale for formative and summative feedback. Eur J Obstet Gynecol Reprod Biol 2019; 237:74-78. [PMID: 31022656] [DOI: 10.1016/j.ejogrb.2019.04.011]
Abstract
OBJECTIVES The aims of the study were to develop and gather validity evidence for a feasible rating scale for formative and summative assessment of total laparoscopic hysterectomy in the operating theatre. STUDY DESIGN The study was a prospective observer-blinded cohort study. The rating scale was developed according to the generic format of Objective Structured Assessment of Technical Skills. We applied the contemporary framework of validity to examine validity evidence for content, response process, internal structure, relationship to other variables, and consequences. Two experienced gynecologists constructed a preliminary version of the rating scale, which was reviewed by a multicentre team of experienced gynecologists in a modified Delphi process. The surgeons (beginners and experienced surgeons) were video-recorded during live performance of total laparoscopic hysterectomies. Two blinded raters evaluated the performances independently using the rating scale. Internal consistency reliability and interrater reliability were calculated as measures of internal structure. The performances of the two groups were compared, and a pass/fail score was set to show the consequences of the rating scale. RESULTS The content of the rating scale was defined during three Delphi rounds and upon agreement comprised 12 items. Sixteen participants, 8 beginners and 8 experienced surgeons, performed total laparoscopic hysterectomies. The internal consistency reliability of the items was 0.95 (Cronbach's alpha), and the interrater reliabilities (intraclass correlation coefficient, absolute agreement) were 0.996 for one rater and 0.998 for two raters (P < 0.001 for all correlations). The beginners' mean performance score was 19.2 (SD 7.1) and the experienced surgeons' was 36.4 (SD 3.9); the difference between groups was statistically significant (P < 0.001). The pass/fail score was 29.3, with no false positives and no false negatives.
CONCLUSION With this study, a feasible rating scale for the objective assessment of total laparoscopic hysterectomy was developed with sound validity evidence. The rating scale is suitable for both formative and summative feedback at the commencement of surgical training in gynecology.
Affiliation(s)
- Mona M Savran, Department of Obstetrics and Gynecology, Copenhagen University Hospital Amager and Hvidovre, Hvidovre, Denmark
- Elise Hoffmann, Department of Obstetrics and Gynecology, Roskilde University Hospital, Roskilde, Denmark
- Lars Konge, Copenhagen Academy for Medical Education and Simulation (CAMES), Center for HR, Copenhagen, Denmark
- Christian Ottosen, Division of Obstetrics and Gynecology, Karolinska University Hospital, Stockholm, Sweden
- Christian Rifbjerg Larsen, Robotic- and Minimal Invasive Surgery Research Unit, Department of Obstetrics and Gynecology, Copenhagen University Hospital Herlev, Denmark
|
17
|
Jensen JK, Wisborg T. Training and assessment of anaesthesiologist skills: The contrasting groups method and mastery learning levels. Acta Anaesthesiol Scand 2018; 62:742-743. [PMID: 29864214] [DOI: 10.1111/aas.13143]
Affiliation(s)
- J K Jensen, Department of Anesthesiology and Intensive Care, Odense University Hospital, Odense C, Denmark
- T Wisborg, Anaesthesia and Critical Care Research Group, Faculty of Health Sciences, University of Tromsø, Tromsø, Norway; Department of Anaesthesiology and Intensive Care, Finnmark Health Trust, Hammerfest Hospital, Hammerfest, Norway; Norwegian National Advisory Unit on Trauma, Division of Emergencies and Critical Care, Oslo University Hospital, Oslo, Norway
|
18
|
Jørgensen M, Konge L, Subhi Y. Contrasting groups' standard setting for consequences analysis in validity studies: reporting considerations. Adv Simul (Lond) 2018; 3:5. [PMID: 29556423] [PMCID: PMC5845294] [DOI: 10.1186/s41077-018-0064-7]
Abstract
Background The contrasting groups’ standard-setting method is commonly used for consequences analysis in validity studies of performance in medicine and surgery. The method identifies a pass/fail cut-off score, from which it is possible to determine false positives and false negatives based on observed numbers in each group. Since groups in validity studies are often small, e.g., due to a limited number of experts, these analyses are sensitive to outliers on the normal distribution curve. Methods We propose that these shortcomings can be addressed in a simple manner using the cumulative distribution function. Results We demonstrate considerable absolute differences between the observed false positives/negatives and the theoretical false positives/negatives. In addition, several important examples are given. Conclusions We propose that a better reporting strategy is to report theoretical false positives and false negatives together with the observed false positives and negatives, and we have developed an Excel sheet to facilitate such calculations. Trial registration: Not relevant.
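The theoretical false positives/negatives proposed above come from treating each group's scores as normally distributed and reading tail areas of the fitted curves at the cut-off, instead of counting observed individuals. A minimal sketch using Python's `statistics.NormalDist` (the group means, SDs, and cut-off below are hypothetical, not the paper's data):

```python
from statistics import NormalDist

# Hypothetical group summaries from a contrasting-groups study.
novices = NormalDist(mu=65, sigma=10)
experts = NormalDist(mu=90, sigma=5)
cutoff = 80  # pass/fail score, e.g. where the two fitted distributions intersect

# Theoretical false positives: proportion of novices expected at/above the cut-off.
theoretical_fp = 1 - novices.cdf(cutoff)
# Theoretical false negatives: proportion of experts expected below the cut-off.
theoretical_fn = experts.cdf(cutoff)

print(f"theoretical false positives: {theoretical_fp:.1%}")
print(f"theoretical false negatives: {theoretical_fn:.1%}")
```

Because these figures come from the fitted distributions rather than from counts in a small sample, they are less sensitive to individual outliers, which is the paper's motivation for reporting both.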
Affiliation(s)
- Morten Jørgensen, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Copenhagen Academy for Medical Education and Simulation, Capital Region of Denmark, Copenhagen, Denmark; Department of Ophthalmology, Zealand University Hospital, Vestermarksvej 23, DK-4000 Roskilde, Denmark
- Lars Konge, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Copenhagen Academy for Medical Education and Simulation, Capital Region of Denmark, Copenhagen, Denmark
- Yousif Subhi, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Department of Ophthalmology, Zealand University Hospital, Vestermarksvej 23, DK-4000 Roskilde, Denmark
|
19
|
Abstract
OBJECTIVE To describe a novel, outcome-based method of standard setting that differentiates between clinical outcomes rather than arbitrary educational goals. BACKGROUND Standard setting methods used in assessments of procedural skill are currently not evidence-driven or outcome-based. This represents a potential obstacle for the broad implementation of these evaluations in summative assessments such as certification and credentialing. METHODS The concept is based on deriving a receiver operating characteristic (ROC) curve from a regression model that incorporates measures of intraoperative surgeon performance and confounding patient characteristics. This allows the creation of a performance standard that best predicts a clinically significant outcome of interest. The discovery cohort used to create the predictive model was derived from pilot data that used the Global Evaluative Assessment of Robotic Skill assessment tool to predict patient urinary continence 3 months following robotic-assisted radical prostatectomy. RESULTS A ROC curve with an area under the curve of 0.75 was created from the predicted probability statistic generated by the predictive model. We chose a predicted probability of 0.35, based on the optimal tradeoff between sensitivity and specificity (Youden index). Rearranging the regression equation, we determined the performance score required to predict a 35% patient-adjusted probability of postoperative urinary incontinence. CONCLUSIONS This novel methodology is context-, patient-, and assessment-specific. Current standard setting methods do not account for the heterogeneity of the clinical environment. Workplace-based assessments in competency-based medical education require standards that are credible to the educator and the trainee. High-stakes assessments must ensure that surgeons have been evaluated to a standard that prioritizes satisfactory patient outcomes and safety.
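The "rearranging the regression equation" step above can be made concrete with a logistic model: given logit(p) = b0 + b1·score + b2·confounder and the target predicted probability p* = 0.35, solve for the performance score. The coefficients and confounder value below are hypothetical, chosen only to illustrate the algebra:

```python
from math import log

# Hypothetical logistic-regression coefficients:
# logit(p) = b0 + b1 * performance_score + b2 * patient_confounder,
# with b1 < 0 so that better performance lowers the predicted risk.
b0, b1, b2 = 4.0, -0.35, 0.02

def score_for_probability(p_target, confounder):
    """Performance score at which the model predicts probability p_target
    of the adverse outcome, for a patient with the given confounder value."""
    logit = log(p_target / (1 - p_target))
    return (logit - b0 - b2 * confounder) / b1

# Benchmark score predicting a 35% probability of postoperative incontinence
# for a patient with confounder value 60, per the Youden-optimal cut-off above.
benchmark = score_for_probability(0.35, confounder=60)
print(f"required performance score: {benchmark:.1f}")
```

Because b1 is negative here, the resulting value is a minimum score: surgeons scoring above it predict a lower-than-35% adjusted probability of the outcome.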
|
20
|
Basic Laparoscopic Skills Assessment Study: Validation and Standard Setting among Canadian Urology Trainees. J Urol 2017; 197:1539-1544. [DOI: 10.1016/j.juro.2016.12.009]
|
21
|
Thomsen ASS. Intraocular surgery - assessment and transfer of skills using a virtual-reality simulator. Acta Ophthalmol 2017. [PMID: 28626885] [DOI: 10.1111/aos.13505]
Affiliation(s)
- Ann Sofia Skou Thomsen, Department of Ophthalmology, Rigshospitalet - Glostrup, University of Copenhagen, Copenhagen, Denmark
|
22
|
Assessing Surgeons' Technical Performance and Effect on Outcomes: Still Early Days. Ann Surg 2017; 267:e75-e76. [PMID: 28230664] [DOI: 10.1097/sla.0000000000002182]
|