1
Boal MWE, Anastasiou D, Tesfai F, Ghamrawi W, Mazomenos E, Curtis N, Collins JW, Sridhar A, Kelly J, Stoyanov D, Francis NK. Evaluation of objective tools and artificial intelligence in robotic surgery technical skills assessment: a systematic review. Br J Surg 2024; 111:znad331. [PMID: 37951600 PMCID: PMC10771126 DOI: 10.1093/bjs/znad331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 09/18/2023] [Accepted: 09/19/2023] [Indexed: 11/14/2023]
Abstract
BACKGROUND There is a need to standardize training in robotic surgery, including objective assessment for accreditation. This systematic review aimed to identify objective tools for technical skills assessment, providing evaluation statuses to guide research and inform implementation into training curricula. METHODS A systematic literature search was conducted in accordance with the PRISMA guidelines. Ovid Embase/Medline, PubMed and Web of Science were searched. Inclusion criterion: robotic surgery technical skills tools. Exclusion criteria: non-technical, laparoscopy or open skills only. Manual tools and automated performance metrics (APMs) were analysed using Messick's concept of validity and the Oxford Centre for Evidence-Based Medicine (OCEBM) Levels of Evidence and Recommendation (LoR). A bespoke tool analysed artificial intelligence (AI) studies. The Modified Downs-Black checklist was used to assess risk of bias. RESULTS Two hundred and forty-seven studies were analysed, identifying: 8 global rating scales, 26 procedure-/task-specific tools, 3 main error-based methods, 10 simulators, 28 studies analysing APMs and 53 AI studies. Global Evaluative Assessment of Robotic Skills and the da Vinci Skills Simulator were the most evaluated tools at LoR 1 (OCEBM). Three procedure-specific tools, 3 error-based methods and 1 non-simulator APMs reached LoR 2. AI models estimated outcomes (skill or clinical), demonstrating superior accuracy rates in the laboratory with 60 per cent of methods reporting accuracies over 90 per cent, compared to real surgery ranging from 67 to 100 per cent. CONCLUSIONS Manual and automated assessment tools for robotic surgery are not well validated and require further evaluation before use in accreditation processes. PROSPERO: registration ID CRD42022304901.
Affiliation(s)
- Matthew W E Boal
  - The Griffin Institute, Northwick Park & St Mark’s Hospital, London, UK
  - Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), London, UK
  - Division of Surgery and Interventional Science, Research Department of Targeted Intervention, UCL, London, UK
- Dimitrios Anastasiou
  - Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), London, UK
  - Medical Physics and Biomedical Engineering, UCL, London, UK
- Freweini Tesfai
  - The Griffin Institute, Northwick Park & St Mark’s Hospital, London, UK
  - Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), London, UK
- Walaa Ghamrawi
  - The Griffin Institute, Northwick Park & St Mark’s Hospital, London, UK
- Evangelos Mazomenos
  - Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), London, UK
  - Medical Physics and Biomedical Engineering, UCL, London, UK
- Nathan Curtis
  - Department of General Surgery, Dorset County Hospital NHS Foundation Trust, Dorchester, UK
- Justin W Collins
  - Division of Surgery and Interventional Science, Research Department of Targeted Intervention, UCL, London, UK
  - University College London Hospitals NHS Foundation Trust, London, UK
- Ashwin Sridhar
  - Division of Surgery and Interventional Science, Research Department of Targeted Intervention, UCL, London, UK
  - University College London Hospitals NHS Foundation Trust, London, UK
- John Kelly
  - Division of Surgery and Interventional Science, Research Department of Targeted Intervention, UCL, London, UK
  - University College London Hospitals NHS Foundation Trust, London, UK
- Danail Stoyanov
  - Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), London, UK
  - Computer Science, UCL, London, UK
- Nader K Francis
  - The Griffin Institute, Northwick Park & St Mark’s Hospital, London, UK
  - Division of Surgery and Interventional Science, Research Department of Targeted Intervention, UCL, London, UK
  - Yeovil District Hospital, Somerset Foundation NHS Trust, Yeovil, Somerset, UK
2
Hira S, Singh D, Kim TS, Gupta S, Hager G, Sikder S, Vedula SS. Video-based assessment of intraoperative surgical skill. Int J Comput Assist Radiol Surg 2022; 17:1801-1811. [PMID: 35635639 PMCID: PMC10323985 DOI: 10.1007/s11548-022-02681-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/26/2021] [Accepted: 05/11/2022] [Indexed: 11/27/2022]
Abstract
PURPOSE Surgeons' skill in the operating room is a major determinant of patient outcomes. Assessment of surgeons' skill is necessary to improve patient outcomes and quality of care through surgical training and coaching. Methods for video-based assessment of surgical skill can provide objective and efficient tools for surgeons. Our work introduces a new method based on attention mechanisms and provides a comprehensive comparative analysis of state-of-the-art methods for video-based assessment of surgical skill in the operating room. METHODS Using a dataset of 99 videos of capsulorhexis, a critical step in cataract surgery, we evaluated image feature-based methods and two deep learning methods to assess skill using RGB videos. In the first method, we predict instrument tips as keypoints and predict surgical skill using temporal convolutional neural networks. In the second method, we propose a frame-wise encoder (2D convolutional neural network) followed by a temporal model (recurrent neural network), both of which are augmented by visual attention mechanisms. We computed the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and predictive values through fivefold cross-validation. RESULTS To classify a binary skill label (expert vs. novice), the range of AUC estimates was 0.49 (95% confidence interval; CI = 0.37 to 0.60) to 0.76 (95% CI = 0.66 to 0.85) for image feature-based methods. None of the methods achieved consistently high sensitivity and specificity. For the deep learning methods, the AUC was 0.79 (95% CI = 0.70 to 0.88) using keypoints alone, 0.78 (95% CI = 0.69 to 0.88) and 0.75 (95% CI = 0.65 to 0.85) with and without attention mechanisms, respectively. CONCLUSION Deep learning methods are necessary for video-based assessment of surgical skill in the operating room. Attention mechanisms improved discrimination ability of the network. Our findings should be evaluated for external validity in other datasets.
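The evaluation metrics named in this abstract (AUC, sensitivity, specificity) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels (1 = expert, 0 = novice) and the scores are assumptions made up for illustration, and the cross-validation and confidence-interval steps are omitted.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = expert, 0 = novice.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.45)
```

In the toy data, three of the four expert/novice pairs are ranked correctly, so the AUC is 0.75. Unlike sensitivity and specificity, the AUC does not depend on a chosen threshold.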
Affiliation(s)
- Sanchit Hira
  - Laboratory for Computational Sensing & Robotics, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
- Digvijay Singh
  - Department of Computer Science, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
- Tae Soo Kim
  - Department of Computer Science, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
- Shobhit Gupta
  - Indian Institute of Technology, Hauz Khas, New Delhi, 110016, India
- Gregory Hager
  - Laboratory for Computational Sensing & Robotics, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
  - Department of Computer Science, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
- Shameema Sikder
  - Laboratory for Computational Sensing & Robotics, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
  - Wilmer Eye Institute, Johns Hopkins University School of Medicine, 615 N. Wolfe Street, Baltimore, MD, 21287, USA
- S Swaroop Vedula
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
3
Zheng Y, Ershad M, Fey AM. Toward Correcting Anxious Movements Using Haptic Cues on the Da Vinci Surgical Robot. Proceedings of the ... IEEE/RAS-EMBS International Conference on Biomedical Robotics and Biomechatronics 2022; 2022:10.1109/biorob52689.2022.9925380. [PMID: 37408769 PMCID: PMC10321328 DOI: 10.1109/biorob52689.2022.9925380] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 07/07/2023]
Abstract
Surgical movements have an important stylistic quality that individuals without formal surgical training can use to identify expertise. In our prior work, we sought to characterize quantitative metrics associated with surgical style and developed a near-real-time detection framework for stylistic deficiencies using a commercial haptic device. In this paper, we implement bimanual stylistic detection on the da Vinci Research Kit (dVRK) and focus on one stylistic deficiency, "Anxious", which may describe movements under stressful conditions. Our goal is to potentially correct these "Anxious" movements by exploring the effects of three different types of haptic cues (time-variant spring, damper, and spring-damper feedback) on performance during a basic surgical training task on the dVRK. Eight subjects were recruited to complete peg transfer tasks using a randomized order of haptic cues and with baseline trials between each task. Overall, all cues led to a significant improvement in economy of volume over baseline, and time-variant spring haptic cues led to a significant reduction in classified "Anxious" movements and also corresponded with significantly lower path length and economy of volume for the non-dominant hand. This work is the first step in evaluating our stylistic detection model on a surgical robot and could lay the groundwork for future methods to actively and adaptively reduce the negative effect of stress in the operating room.
Affiliation(s)
- Yi Zheng
  - Department of Mechanical Engineering, the University of Texas at Austin, 204 East Dean Keeton Street, Austin, TX 78712, USA
- Marzieh Ershad
  - Intuitive Surgical, Inc., 1020 Kifer Road, Sunnyvale, CA 94086
- Ann Majewicz Fey
  - Department of Mechanical Engineering, the University of Texas at Austin, 204 East Dean Keeton Street, Austin, TX 78712, USA
  - Department of Surgery, UT Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
4
Olsen RG, Genét MF, Konge L, Bjerrum F. Crowdsourced assessment of surgical skills: A systematic review. Am J Surg 2022; 224:1229-1237. [DOI: 10.1016/j.amjsurg.2022.07.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 05/30/2022] [Accepted: 07/14/2022] [Indexed: 11/25/2022]
5
Maier-Hein L, Eisenmann M, Sarikaya D, März K, Collins T, Malpani A, Fallert J, Feussner H, Giannarou S, Mascagni P, Nakawala H, Park A, Pugh C, Stoyanov D, Vedula SS, Cleary K, Fichtinger G, Forestier G, Gibaud B, Grantcharov T, Hashizume M, Heckmann-Nötzel D, Kenngott HG, Kikinis R, Mündermann L, Navab N, Onogur S, Roß T, Sznitman R, Taylor RH, Tizabi MD, Wagner M, Hager GD, Neumuth T, Padoy N, Collins J, Gockel I, Goedeke J, Hashimoto DA, Joyeux L, Lam K, Leff DR, Madani A, Marcus HJ, Meireles O, Seitel A, Teber D, Ückert F, Müller-Stich BP, Jannin P, Speidel S. Surgical data science - from concepts toward clinical translation. Med Image Anal 2022; 76:102306. [PMID: 34879287 PMCID: PMC9135051 DOI: 10.1016/j.media.2021.102306] [Citation(s) in RCA: 86] [Impact Index Per Article: 43.0] [Received: 11/10/2020] [Revised: 11/03/2021] [Accepted: 11/08/2021] [Indexed: 02/06/2023]
Abstract
Recent developments in data science in general and machine learning in particular have transformed the way experts envision the future of surgery. Surgical Data Science (SDS) is a new research field that aims to improve the quality of interventional healthcare through the capture, organization, analysis and modeling of data. While an increasing number of data-driven approaches and clinical applications have been studied in the fields of radiological and clinical data science, translational success stories are still lacking in surgery. In this publication, we shed light on the underlying reasons and provide a roadmap for future advances in the field. Based on an international workshop involving leading researchers in the field of SDS, we review current practice, key achievements and initiatives as well as available standards and tools for a number of topics relevant to the field, namely (1) infrastructure for data acquisition, storage and access in the presence of regulatory constraints, (2) data annotation and sharing and (3) data analytics. We further complement this technical perspective with (4) a review of currently available SDS products and the translational progress from academia and (5) a roadmap for faster clinical translation and exploitation of the full potential of SDS, based on an international multi-round Delphi process.
Affiliation(s)
- Lena Maier-Hein
  - Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Heidelberg, Germany; Faculty of Mathematics and Computer Science, Heidelberg University, Heidelberg, Germany; Medical Faculty, Heidelberg University, Heidelberg, Germany
- Matthias Eisenmann
  - Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Duygu Sarikaya
  - Department of Computer Engineering, Faculty of Engineering, Gazi University, Ankara, Turkey; LTSI, Inserm UMR 1099, University of Rennes 1, Rennes, France
- Keno März
  - Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Anand Malpani
  - The Malone Center for Engineering in Healthcare, The Johns Hopkins University, Baltimore, Maryland, USA
- Hubertus Feussner
  - Department of Surgery, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Stamatia Giannarou
  - The Hamlyn Centre for Robotic Surgery, Imperial College London, London, United Kingdom
- Pietro Mascagni
  - ICube, University of Strasbourg, CNRS, France; IHU Strasbourg, Strasbourg, France
- Adrian Park
  - Department of Surgery, Anne Arundel Health System, Annapolis, Maryland, USA; Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Carla Pugh
  - Department of Surgery, Stanford University School of Medicine, Stanford, California, USA
- Danail Stoyanov
  - Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Swaroop S Vedula
  - The Malone Center for Engineering in Healthcare, The Johns Hopkins University, Baltimore, Maryland, USA
- Kevin Cleary
  - The Sheikh Zayed Institute for Pediatric Surgical Innovation, Children's National Hospital, Washington, D.C., USA
- Germain Forestier
  - L'Institut de Recherche en Informatique, Mathématiques, Automatique et Signal (IRIMAS), University of Haute-Alsace, Mulhouse, France; Faculty of Information Technology, Monash University, Clayton, Victoria, Australia
- Bernard Gibaud
  - LTSI, Inserm UMR 1099, University of Rennes 1, Rennes, France
- Teodor Grantcharov
  - University of Toronto, Toronto, Ontario, Canada; The Li Ka Shing Knowledge Institute of St. Michael's Hospital, Toronto, Ontario, Canada
- Makoto Hashizume
  - Kyushu University, Fukuoka, Japan; Kitakyushu Koga Hospital, Fukuoka, Japan
- Doreen Heckmann-Nötzel
  - Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Hannes G Kenngott
  - Department for General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany
- Ron Kikinis
  - Department of Radiology, Brigham and Women's Hospital, and Harvard Medical School, Boston, Massachusetts, USA
- Nassir Navab
  - Computer Aided Medical Procedures, Technical University of Munich, Munich, Germany; Department of Computer Science, The Johns Hopkins University, Baltimore, Maryland, USA
- Sinan Onogur
  - Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Tobias Roß
  - Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Heidelberg, Germany; Medical Faculty, Heidelberg University, Heidelberg, Germany
- Raphael Sznitman
  - ARTORG Center for Biomedical Engineering Research, University of Bern, Bern, Switzerland
- Russell H Taylor
  - Department of Computer Science, The Johns Hopkins University, Baltimore, Maryland, USA
- Minu D Tizabi
  - Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Martin Wagner
  - Department for General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany
- Gregory D Hager
  - The Malone Center for Engineering in Healthcare, The Johns Hopkins University, Baltimore, Maryland, USA; Department of Computer Science, The Johns Hopkins University, Baltimore, Maryland, USA
- Thomas Neumuth
  - Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, Leipzig, Germany
- Nicolas Padoy
  - ICube, University of Strasbourg, CNRS, France; IHU Strasbourg, Strasbourg, France
- Justin Collins
  - Division of Surgery and Interventional Science, University College London, London, United Kingdom
- Ines Gockel
  - Department of Visceral, Transplant, Thoracic and Vascular Surgery, Leipzig University Hospital, Leipzig, Germany
- Jan Goedeke
  - Pediatric Surgery, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-University, Munich, Germany
- Daniel A Hashimoto
  - University Hospitals Cleveland Medical Center, Case Western Reserve University, Cleveland, Ohio, USA; Surgical AI and Innovation Laboratory, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Luc Joyeux
  - My FetUZ Fetal Research Center, Department of Development and Regeneration, Biomedical Sciences, KU Leuven, Leuven, Belgium; Center for Surgical Technologies, Faculty of Medicine, KU Leuven, Leuven, Belgium; Department of Obstetrics and Gynecology, Division Woman and Child, Fetal Medicine Unit, University Hospitals Leuven, Leuven, Belgium; Michael E. DeBakey Department of Surgery, Texas Children's Hospital and Baylor College of Medicine, Houston, Texas, USA
- Kyle Lam
  - Department of Surgery and Cancer, Imperial College London, London, United Kingdom
- Daniel R Leff
  - Department of BioSurgery and Surgical Technology, Imperial College London, London, United Kingdom; Hamlyn Centre for Robotic Surgery, Imperial College London, London, United Kingdom; Breast Unit, Imperial Healthcare NHS Trust, London, United Kingdom
- Amin Madani
  - Department of Surgery, University Health Network, Toronto, Ontario, Canada
- Hani J Marcus
  - National Hospital for Neurology and Neurosurgery, and UCL Queen Square Institute of Neurology, London, United Kingdom
- Ozanan Meireles
  - Massachusetts General Hospital, and Harvard Medical School, Boston, Massachusetts, USA
- Alexander Seitel
  - Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Dogu Teber
  - Department of Urology, City Hospital Karlsruhe, Karlsruhe, Germany
- Frank Ückert
  - Institute for Applied Medical Informatics, Hamburg University Hospital, Hamburg, Germany
- Beat P Müller-Stich
  - Department for General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany
- Pierre Jannin
  - LTSI, Inserm UMR 1099, University of Rennes 1, Rennes, France
- Stefanie Speidel
  - Division of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC) Dresden, Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), TU Dresden, Dresden, Germany
6
Bilgic E, Gorgy A, Yang A, Cwintal M, Ranjbar H, Kahla K, Reddy D, Li K, Ozturk H, Zimmermann E, Quaiattini A, Abbasgholizadeh-Rahimi S, Poenaru D, Harley JM. Exploring the roles of artificial intelligence in surgical education: A scoping review. Am J Surg 2021; 224:205-216. [PMID: 34865736 DOI: 10.1016/j.amjsurg.2021.11.023] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/25/2021] [Revised: 11/19/2021] [Accepted: 11/22/2021] [Indexed: 01/02/2023]
Abstract
BACKGROUND Technology-enhanced teaching and learning, including Artificial Intelligence (AI) applications, has started to evolve in surgical education. Hence, the purpose of this scoping review is to explore the current and future roles of AI in surgical education. METHODS Nine bibliographic databases were searched from January 2010 to January 2021. Full-text articles were included if they focused on AI in surgical education. RESULTS Out of 14,008 unique sources of evidence, 93 were included. Out of 93, 84 were conducted in the simulation setting, and 89 targeted technical skills. Fifty-six studies focused on skills assessment/classification, and 36 used multiple AI techniques. Also, increasing sample size, having balanced data, and using AI to provide feedback were major future directions mentioned by authors. CONCLUSIONS AI can help optimize the education of trainees and our results can help educators and researchers identify areas that need further investigation.
Affiliation(s)
- Elif Bilgic
  - Department of Surgery, McGill University, Montreal, Quebec, Canada
- Andrew Gorgy
  - Department of Surgery, McGill University, Montreal, Quebec, Canada
- Alison Yang
  - Department of Surgery, McGill University, Montreal, Quebec, Canada
- Michelle Cwintal
  - Department of Surgery, McGill University, Montreal, Quebec, Canada
- Hamed Ranjbar
  - Department of Surgery, McGill University, Montreal, Quebec, Canada
- Kalin Kahla
  - Department of Surgery, McGill University, Montreal, Quebec, Canada
- Dheeksha Reddy
  - Department of Surgery, McGill University, Montreal, Quebec, Canada
- Kexin Li
  - Department of Surgery, McGill University, Montreal, Quebec, Canada
- Helin Ozturk
  - Department of Surgery, McGill University, Montreal, Quebec, Canada
- Eric Zimmermann
  - Department of Surgery, McGill University, Montreal, Quebec, Canada
- Andrea Quaiattini
  - Schulich Library of Physical Sciences, Life Sciences, and Engineering, McGill University, Canada; Institute of Health Sciences Education, McGill University, Montreal, Quebec, Canada
- Samira Abbasgholizadeh-Rahimi
  - Department of Family Medicine, McGill University, Montreal, Quebec, Canada; Department of Electrical and Computer Engineering, McGill University, Montreal, Canada; Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Canada; Mila Quebec AI Institute, Montreal, Canada
- Dan Poenaru
  - Institute of Health Sciences Education, McGill University, Montreal, Quebec, Canada; Department of Pediatric Surgery, McGill University, Canada
- Jason M Harley
  - Department of Surgery, McGill University, Montreal, Quebec, Canada; Institute of Health Sciences Education, McGill University, Montreal, Quebec, Canada; Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada; Steinberg Centre for Simulation and Interactive Learning, McGill University, Montreal, Quebec, Canada
7
Explaining a model predicting quality of surgical practice: a first presentation to and review by clinical experts. Int J Comput Assist Radiol Surg 2021; 16:2009-2019. [PMID: 34143373 DOI: 10.1007/s11548-021-02422-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/30/2020] [Accepted: 05/27/2021] [Indexed: 10/21/2022]
Abstract
PURPOSE Surgical Data Science (SDS) is an emerging research domain offering data-driven answers to challenges encountered by clinicians during training and practice. We previously developed a framework to assess quality of practice based on two aspects: exposure of the surgical scene (ESS) and the surgeon's profile of practice (SPP). Here, we wished to investigate the clinical relevance of the parameters learned by this model by (1) interpreting these parameters and identifying associated representative video samples and (2) presenting this information to surgeons in the form of a video-enhanced questionnaire. To our knowledge, this is the first approach in the field of SDS for laparoscopy linking the choices made by a machine learning model predicting surgical quality to clinical expertise. METHOD Spatial features and quality of practice scores extracted from labeled and segmented frames in 30 laparoscopic videos were used to predict the ESS and the SPP. The relationships between the inputs and outputs of the model were then analyzed and translated into meaningful sentences (statements, e.g., "To optimize the ESS, it is very important to correctly handle the spleen"). Representative video clips illustrating these statements were semi-automatically identified. Eleven statements and video clips were used in a survey presented to six experienced digestive surgeons to gather their opinions on the algorithmic analyses. RESULTS All but one of the surgeons agreed with the proposed questionnaire overall. On average, surgeons agreed with 7/11 statements. CONCLUSION This proof-of-concept study provides preliminary validation of our model which has a high potential for use to analyze and understand surgical practices.
8
Ward TM, Fer DM, Ban Y, Rosman G, Meireles OR, Hashimoto DA. Challenges in surgical video annotation. Comput Assist Surg (Abingdon) 2021; 26:58-68. [PMID: 34126014 DOI: 10.1080/24699322.2021.1937320] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 10/21/2022]
Abstract
Annotation of surgical video is important for establishing ground truth in surgical data science endeavors that involve computer vision. With the growth of the field over the last decade, several challenges have been identified in annotating spatial, temporal, and clinical elements of surgical video as well as challenges in selecting annotators. In reviewing current challenges, we provide suggestions on opportunities for improvement and possible next steps to enable translation of surgical data science efforts in surgical video analysis to clinical research and practice.
Affiliation(s)
- Thomas M Ward
  - Surgical AI & Innovation Laboratory, Department of Surgery, Massachusetts General Hospital, Boston, MA, USA
- Danyal M Fer
  - Department of Surgery, University of California San Francisco East Bay, Hayward, CA, USA
- Yutong Ban
  - Surgical AI & Innovation Laboratory, Department of Surgery, Massachusetts General Hospital, Boston, MA, USA; Distributed Robotics Laboratory, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Guy Rosman
  - Surgical AI & Innovation Laboratory, Department of Surgery, Massachusetts General Hospital, Boston, MA, USA; Distributed Robotics Laboratory, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Ozanan R Meireles
  - Surgical AI & Innovation Laboratory, Department of Surgery, Massachusetts General Hospital, Boston, MA, USA
- Daniel A Hashimoto
  - Surgical AI & Innovation Laboratory, Department of Surgery, Massachusetts General Hospital, Boston, MA, USA
9
Gardenier J, Underwood J, Weary DM, Clark CEF. Pairwise comparison locomotion scoring for dairy cattle. J Dairy Sci 2021; 104:6185-6193. [PMID: 33663829 DOI: 10.3168/jds.2020-19356] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/26/2020] [Accepted: 12/10/2020] [Indexed: 11/19/2022]
Abstract
Conventional locomotion scoring is a subjective, absolute, and discrete assessment of locomotion. Here we assess pairwise comparison scoring to improve upon the limited intra- and interobserver consistency typical of conventional locomotion scoring. Five observers performed conventional 4-level locomotion scoring using 50 video recordings of dairy cattle, and also assessed 90 pairs of videos (composed from the same 50 recordings) using relative pairwise scoring. Intra- and interobserver consistency of pairwise scores [intraobserver: percentage agreement (PA) = 82%, κ = 0.63; interobserver: PA = 79%, κ = 0.57] were greater than of 4-level absolute scores (intraobserver: PA = 72%, κw = 0.74; interobserver: PA = 56%, κw = 0.59). Pairwise scores were scaled with an optimization method to obtain the position of the 50 recordings on a continuous locomotion scale. These continuous locomotion scores (CLS) were compared with the conventional mean absolute visual locomotion scores (VLS). Correlation between CLS and VLS was strong (τ = 0.69), and consistency between binarized CLS and binarized VLS was high (PA = 84%, κ = 0.66 for threshold VLS ≥1). Just noticeable difference (JND) for locomotion scoring was 0.3 on a 4-level scale ranging from 0 to 3. Pairwise scoring and scaling had the scoring consistency of binary absolute scoring with finer continuous granularity than 4-level absolute scoring. The pairwise scoring method, and associated scaling, offer a more consistent and informative alternative to conventional absolute multilevel locomotion scoring.
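The consistency statistics quoted in this abstract, percentage agreement (PA) and Cohen's κ, can be sketched for two observers as follows. This is an illustrative sketch, not the authors' analysis: the function names and the toy judgments ("A" or "B" for which video of a pair shows the worse locomotion) are assumptions, and the weighted κw used for the 4-level scores is not covered.

```python
from collections import Counter


def percentage_agreement(obs_a, obs_b):
    """Fraction of items on which two observers gave the same label."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("observers must rate the same non-empty set of items")
    return sum(x == y for x, y in zip(obs_a, obs_b)) / len(obs_a)


def cohen_kappa(obs_a, obs_b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(obs_a)
    p_observed = percentage_agreement(obs_a, obs_b)
    counts_a, counts_b = Counter(obs_a), Counter(obs_b)
    # Chance agreement from each observer's marginal label frequencies.
    p_chance = sum(counts_a[k] * counts_b[k] for k in set(counts_a) | set(counts_b)) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical judgments for eight video pairs.
observer_1 = ["A", "A", "B", "B", "A", "B", "A", "B"]
observer_2 = ["A", "A", "B", "A", "A", "B", "B", "B"]
pa = percentage_agreement(observer_1, observer_2)
kappa = cohen_kappa(observer_1, observer_2)
```

Here the observers agree on six of eight pairs (PA = 75%), and with balanced marginals the chance agreement is 50%, giving κ = 0.5. This shows why κ is lower than PA for the same data.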
Affiliation(s)
- John Gardenier
  - Australian Centre for Field Robotics, Faculty of Engineering, the University of Sydney, Darlington, NSW 2006, Australia
- James Underwood
  - Australian Centre for Field Robotics, Faculty of Engineering, the University of Sydney, Darlington, NSW 2006, Australia
- D M Weary
  - Animal Welfare Program, Faculty of Agricultural Science, University of British Columbia, Vancouver, BC, Canada, V6T 1Z4
- C E F Clark
  - Livestock Production and Welfare Group, School of Life and Environmental Sciences, Faculty of Science, the University of Sydney, Camden, NSW 2570, Australia
10
St John-Matthews J, Robinson L, Martin F, Newton PM, Grant AJ. Crowdsourcing: A novel tool to elicit the student voice in the curriculum design process for an undergraduate diagnostic radiography degree programme. Radiography (Lond) 2020; 26 Suppl 2:S54-S61. [PMID: 32507591 DOI: 10.1016/j.radi.2020.04.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/20/2020] [Revised: 04/13/2020] [Accepted: 04/30/2020] [Indexed: 11/26/2022]
Abstract
INTRODUCTION Stakeholder participation in healthcare curriculum design is an important aspect of higher education, with stakeholders including students, staff members, clinical partners, healthcare organisations, patients and members of the public. Student co-creation of the curriculum has become increasingly important, yet there is limited research addressing how to engage this group in design processes. METHODS This paper represents the first phase of a three-stage action research spiral in which the authors evaluated a novel tool for curriculum design processes: anonymised crowdsourcing. This initial phase was open to all students enrolled on an undergraduate diagnostic radiography programme in the UK. To confirm the reliability of the crowdsource design, an established eight-point crowdsourcing verification tool was applied. RESULTS Participants generated 23 unique ideas, made 40 comments and cast 173 votes. Inductive analysis of the comments generated five themes: the role of technology enhanced learning; simulation activities; patient focused curriculum; mental wealth (resilience); and authentic assessment approaches. An evaluation of those who had and had not engaged highlighted areas of improvement for the administration of the second and third iterations, which will include a wider pool of participants. CONCLUSION This study from a single programme offers lessons for others wishing to adopt and develop this approach elsewhere. IMPLICATIONS FOR PRACTICE Several ideas elicited by the crowdsource have been considered by the curriculum design team and will be implemented in the 2020 curriculum, thus demonstrating the impact of this research approach on local education practice.
Affiliation(s)
- J St John-Matthews
- Department of Allied Health Professions, Faculty of Health and Applied Sciences, University of the West of England, Bristol, BS16 1DD, UK
- L Robinson
- Research in Health Professions Education, Swansea University Medical School, Swansea, Wales, SA2 8PP, UK
- P M Newton
- Research in Health Professions Education, Swansea University Medical School, Swansea, Wales, SA2 8PP, UK
- A J Grant
- Research in Health Professions Education, Swansea University Medical School, Swansea, Wales, SA2 8PP, UK
|
11
|
Anh NX, Nataraja RM, Chauhan S. Towards near real-time assessment of surgical skills: A comparison of feature extraction techniques. Comput Methods Programs Biomed 2020; 187:105234. [PMID: 31794913 DOI: 10.1016/j.cmpb.2019.105234] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 08/13/2019] [Revised: 10/31/2019] [Accepted: 11/18/2019] [Indexed: 05/22/2023]
Abstract
BACKGROUND AND OBJECTIVE Surgical skill assessment aims to objectively evaluate trainee surgeons and provide them with constructive feedback. Conventional methods require direct observation and assessment by surgical experts, which is both unscalable and subjective. The recent introduction of surgical robotic systems in the operating room has made it possible to automatically evaluate the expertise level of trainees for certain representative maneuvers by using machine learning for motion analysis. The feature extraction technique plays a critical role in such an automated surgical skill assessment system. METHODS We present a direct comparison of nine well-known feature extraction techniques, namely statistical features, principal component analysis, discrete Fourier/Cosine transform, codebook, deep learning models and auto-encoder, for automated surgical skills evaluation. Towards near real-time evaluation, we also investigate the effect of time interval on the classification accuracy and efficiency. RESULTS We validate the study on the benchmark robotic surgical training JIGSAWS dataset. An accuracy of 95.63, 90.17 and 90.26% by the Principal Component Analysis and 96.84, 92.75 and 95.36% by the deep Convolutional Neural Network for suturing, knot tying and needle passing, respectively, highlighted the effectiveness of these two techniques in extracting the most discriminative features among different surgical skill levels. CONCLUSIONS This study contributes toward the development of an online automated and efficient surgical skills assessment technique.
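As a rough illustration of the "statistical features" and "time interval" ideas mentioned in the abstract above, the sketch below slides a fixed-length window over one kinematic channel and summarises each window with a few statistics. The choice of statistics (mean, standard deviation, minimum, maximum), the window and step sizes, and the function name are all assumptions made for illustration; the abstract does not specify them.

```python
from statistics import mean, pstdev


def window_features(signal, window, step):
    """Return (mean, std, min, max) for each window of one kinematic channel.

    `window` and `step` are counted in samples; shorter windows give feedback
    sooner but summarise less motion per feature vector.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    features = []
    for start in range(0, len(signal) - window + 1, step):
        chunk = signal[start:start + window]
        features.append((mean(chunk), pstdev(chunk), min(chunk), max(chunk)))
    return features


# Hypothetical tool-tip x-coordinate sampled at a fixed rate.
x_position = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
features = window_features(x_position, window=4, step=2)
for row in features:
    print(row)
```

Each tuple would then be passed to a classifier; comparing accuracy across several `window` values is one simple way to explore the trade-off between responsiveness and accuracy that the abstract refers to.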
Affiliation(s)
- Nguyen Xuan Anh
- Department of Mechanical and Aerospace Engineering, Monash University, Melbourne, Australia
- Ramesh Mark Nataraja
- Department of Surgical Simulation, Monash Children's Hospital, Melbourne, Australia
- Sunita Chauhan
- Department of Mechanical and Aerospace Engineering, Monash University, Melbourne, Australia
|
12
|
Crowdsourcing in health and medical research: a systematic review. Infect Dis Poverty 2020; 9:8. [PMID: 31959234 PMCID: PMC6971908 DOI: 10.1186/s40249-020-0622-9] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Received: 05/17/2019] [Accepted: 01/07/2020] [Indexed: 12/31/2022] Open
Abstract
Background Crowdsourcing is used increasingly in health and medical research. Crowdsourcing is the process of aggregating crowd wisdom to solve a problem. The purpose of this systematic review is to summarize quantitative evidence on crowdsourcing to improve health. Methods We followed Cochrane systematic review guidance and systematically searched seven databases up to September 4th 2019. Studies were included if they reported on crowdsourcing and related to health or medicine. Studies were excluded if recruitment was the only use of crowdsourcing. We determined the level of evidence associated with review findings using the GRADE approach. Results We screened 3508 citations, accessed 362 articles, and included 188 studies. Ninety-six studies examined effectiveness, 127 examined feasibility, and 37 examined cost. The most common purposes were to evaluate surgical skills (17 studies), to create sexual health messages (seven studies), and to provide layperson cardio-pulmonary resuscitation (CPR) out-of-hospital (six studies). Seventeen observational studies used crowdsourcing to evaluate surgical skills, finding that crowdsourcing evaluation was as effective as expert evaluation (low quality). Four studies used a challenge contest to solicit human immunodeficiency virus (HIV) testing promotion materials and increase HIV testing rates (moderate quality), and two of the four studies found this approach saved money. Three studies suggested that an interactive technology system increased rates of layperson initiated CPR out-of-hospital (moderate quality). However, studies analyzing crowdsourcing to evaluate surgical skills and layperson-initiated CPR were only from high-income countries. Five studies examined crowdsourcing to inform artificial intelligence projects, most often related to annotation of medical data. Crowdsourcing was evaluated using different outcomes, limiting the extent to which studies could be pooled. 
Conclusions Crowdsourcing has been used to improve health in many settings. Although crowdsourcing is effective at improving behavioral outcomes, more research is needed to understand effects on clinical outcomes and costs. More research is needed on crowdsourcing as a tool to develop artificial intelligence systems in medicine. Trial registration PROSPERO: CRD42017052835. December 27, 2016.
|
13
|
St John-Matthews J, Newton PM, Grant AJ, Robinson L. Crowdsourcing in health professions education: What radiography educators can learn from other disciplines. Radiography (Lond) 2019; 25:164-169. [PMID: 30955690 DOI: 10.1016/j.radi.2018.11.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/04/2018] [Revised: 11/15/2018] [Accepted: 11/20/2018] [Indexed: 10/27/2022]
Abstract
OBJECTIVES Crowdsourcing works through an institution outsourcing a function normally performed by an employee or group of individuals. Within a crowdsource, users, known as the crowd, form a community who voluntarily undertake a task which involves the pooling of knowledge resources. A literature review was undertaken to identify how the tool is being used in health professions education, and its potential for use in radiography education. KEY FINDINGS Seventeen papers were returned. The literature identified was assessed against an established crowdsourcing definition. Reviewing these yielded four themes for discussion: student selection procedures, lesson planning, teaching materials and assessment. CONCLUSION Crowdsourcing is associated with innovative activities through collective solution seeking via a large network of users. It is increasingly being adopted in healthcare training and may be transferable to educational activities within the field of radiography education.
Affiliation(s)
- J St John-Matthews
- Department of Allied Health Professions, Faculty of Health and Applied Sciences, University of the West of England, Bristol, BS16 1DD, UK
- P M Newton
- Research in Health Professions Education, Swansea University Medical School, Swansea, Wales, SA2 8PP, UK
- A J Grant
- Research in Health Professions Education, Swansea University Medical School, Swansea, Wales, SA2 8PP, UK
- L Robinson
- School of Health Science, Frederick Road Campus, University of Salford, Allerton Building, M6 6PU, UK
|
14
|
Ershad M, Rege R, Majewicz Fey A. Automatic and near real-time stylistic behavior assessment in robotic surgery. Int J Comput Assist Radiol Surg 2019; 14:635-643. [PMID: 30779023 DOI: 10.1007/s11548-019-01920-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 01/13/2018] [Accepted: 01/28/2019] [Indexed: 12/20/2022]
Abstract
PURPOSE Automatic skill evaluation is of great importance in surgical robotic training. Extensive research has been done to evaluate surgical skill, and a variety of quantitative metrics have been proposed. However, these methods primarily use expert selected features which may not capture latent information in movement data. In addition, these features are calculated over the entire task time and are provided to the user after the completion of the task. Thus, these quantitative metrics do not provide users with information on how to modify their movements to improve performance in real time. This study focuses on automatic stylistic behavior recognition that has the potential to be implemented in near real time. METHODS We propose a sparse coding framework for automatic stylistic behavior recognition in short time intervals using only position data from the hands, wrist, elbow, and shoulder. A codebook is built for each stylistic adjective using the positive and negative labels provided for each trial through crowd sourcing. Sparse code coefficients are obtained for short time intervals (0.25 s) in a trial using this codebook. A support vector machine classifier is trained and validated through tenfold cross-validation using the sparse codes from the training set. RESULTS The results indicate that the proposed dictionary learning method is able to assess stylistic behavior performance in near real time using user joint position data with improved accuracy compared to using PCA features or raw data. CONCLUSION The possibility to automatically evaluate a trainee's style of movement in short time intervals could provide the user with online customized feedback and thus improve performance during surgical tasks.
Affiliation(s)
- M Ershad
- Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX, 75080, USA
- R Rege
- Department of Surgery, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Ann Majewicz Fey
- Department of Surgery, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Department of Mechanical Engineering, University of Texas at Dallas, Richardson, TX, 75080, USA
|
15
|
A computer vision technique for automated assessment of surgical performance using surgeons’ console-feed videos. Int J Comput Assist Radiol Surg 2018; 14:697-707. [DOI: 10.1007/s11548-018-1881-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 07/19/2018] [Accepted: 10/24/2018] [Indexed: 11/26/2022]
|
16
|
Ershad M, Rege R, Fey AM. Automatic Surgical Skill Rating Using Stylistic Behavior Components. Annu Int Conf IEEE Eng Med Biol Soc 2018; 2018:1829-1832. [PMID: 30440751 DOI: 10.1109/embc.2018.8512593] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
A gold standard in surgical skill rating and evaluation is direct observation, in which a group of experts rates trainees on a Likert scale while observing their performance during a surgical task. This method is time and resource intensive. To alleviate this burden, many studies have focused on automatic surgical skill assessment; however, the metrics suggested by the literature for automatic evaluation do not capture the stylistic behavior of the user. In addition, very few studies focus on automatic rating of surgical skills based on available Likert scales. In a previous study we presented a stylistic behavior lexicon for surgical skill. In this study we evaluate the lexicon's ability to automatically rate robotic surgical skill, based on the 6 domains in the Global Evaluative Assessment of Robotic Skills (GEARS). Fourteen subjects of different skill levels performed two surgical tasks on the da Vinci surgical simulator. Different measurements were acquired as subjects performed the tasks, including limb (hand and arm) kinematics and joint (shoulder, elbow, wrist) positions. Posture videos of the subjects performing the task, as well as videos of the task being performed, were viewed and rated by faculty experts based on the 6 domains in GEARS. The paired videos were also rated via crowd-sourcing based on our stylistic behavior lexicon. Two separate regression learner models, one using the sensor measurements and the other using crowd ratings for our proposed lexicon, were trained for each domain in GEARS. The results indicate that the scores predicted by both models are in agreement with the gold standard faculty ratings.
|
17
|
Créquit P, Mansouri G, Benchoufi M, Vivot A, Ravaud P. Mapping of Crowdsourcing in Health: Systematic Review. J Med Internet Res 2018; 20:e187. [PMID: 29764795 PMCID: PMC5974463 DOI: 10.2196/jmir.9330] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 11/02/2017] [Revised: 02/10/2018] [Accepted: 03/14/2018] [Indexed: 11/22/2022] Open
Abstract
Background Crowdsourcing involves obtaining ideas, needed services, or content by soliciting Web-based contributions from a crowd. The 4 types of crowdsourced tasks (problem solving, data processing, surveillance or monitoring, and surveying) can be applied in the 3 categories of health (promotion, research, and care). Objective This study aimed to map the different applications of crowdsourcing in health to assess the fields of health that are using crowdsourcing and the crowdsourced tasks used. We also describe the logistics of crowdsourcing and the characteristics of crowd workers. Methods MEDLINE, EMBASE, and ClinicalTrials.gov were searched for available reports from inception to March 30, 2016, with no restriction on language or publication status. Results We identified 202 relevant studies that used crowdsourcing, including 9 randomized controlled trials, of which only one had posted results at ClinicalTrials.gov. Crowdsourcing was used in health promotion (91/202, 45.0%), research (73/202, 36.1%), and care (38/202, 18.8%). The 4 most frequent areas of application were public health (67/202, 33.2%), psychiatry (32/202, 15.8%), surgery (22/202, 10.9%), and oncology (14/202, 6.9%). Half of the reports (99/202, 49.0%) referred to data processing, 34.6% (70/202) referred to surveying, 10.4% (21/202) referred to surveillance or monitoring, and 5.9% (12/202) referred to problem-solving. Labor market platforms (eg, Amazon Mechanical Turk) were used in most studies (190/202, 94%). The crowd workers’ characteristics were poorly reported, and crowdsourcing logistics were missing from two-thirds of the reports. When reported, the median size of the crowd was 424 (first and third quartiles: 167-802); crowd workers’ median age was 34 years (32-36). Crowd workers were mainly recruited nationally, particularly in the United States. 
For many studies (58.9%, 119/202), previous experience in crowdsourcing was required, and passing a qualification test or training was seldom needed (11.9% of studies; 24/202). For half of the studies, monetary incentives were mentioned, with mainly less than US $1 to perform the task. The time needed to perform the task was mostly less than 10 min (58.9% of studies; 119/202). Data quality validation was used in 54/202 studies (26.7%), mainly by attention check questions or by replicating the task with several crowd workers. Conclusions The use of crowdsourcing, which allows access to a large pool of participants as well as saving time in data collection, lowering costs, and speeding up innovations, is increasing in health promotion, research, and care. However, the description of crowdsourcing logistics and crowd workers’ characteristics is frequently missing in study reports and needs to be precisely reported to better interpret the study findings and replicate them.
Affiliation(s)
- Perrine Créquit
- INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France; Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France; Cochrane France, Paris, France
- Ghizlène Mansouri
- INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France
- Mehdi Benchoufi
- Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
- Alexandre Vivot
- INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France; Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
- Philippe Ravaud
- INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France; Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France; Cochrane France, Paris, France; Department of Epidemiology, Columbia University, Mailman School of Public Health, New York, NY, United States
|
18
|
Ershad M, Rege R, Fey AM. Meaningful Assessment of Robotic Surgical Style using the Wisdom of Crowds. Int J Comput Assist Radiol Surg 2018; 13:1037-1048. [PMID: 29574500 DOI: 10.1007/s11548-018-1738-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/07/2017] [Accepted: 03/15/2018] [Indexed: 01/22/2023]
Abstract
OBJECTIVE Quantitative assessment of surgical skills is an important aspect of surgical training; however, the proposed metrics are sometimes difficult to interpret and may not capture the stylistic characteristics that define expertise. This study proposes a methodology for evaluating surgical skill, based on metrics associated with stylistic adjectives, and evaluates the ability of this method to differentiate expertise levels. METHODS We recruited subjects from different expertise levels to perform training tasks on a surgical simulator. A lexicon of contrasting adjective pairs, based on important skills for robotic surgery and inspired by the Global Evaluative Assessment of Robotic Skills tool, was developed. To validate the use of stylistic adjectives for surgical skill assessment, posture videos of the subjects performing the task, as well as videos of the task, were rated by crowd-workers. Metrics associated with each adjective were found using kinematic and physiological measurements through correlation with the crowd-sourced adjective assignment ratings. To evaluate the chosen metrics' ability to distinguish expertise levels, two classifiers were trained and tested using these metrics. RESULTS Crowd-assignment ratings for all adjectives were significantly correlated with expertise levels. The results indicate that the naive Bayes classifier performs best, with an accuracy of [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text] when classifying into four, three, and two levels of expertise, respectively. CONCLUSION The proposed method is effective at mapping understandable adjectives of expertise to the stylistic movements and physiological response of trainees.
Affiliation(s)
- M Ershad
- Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX, 75080, USA
- R Rege
- Department of Surgery, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- A Majewicz Fey
- Department of Surgery, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Department of Mechanical Engineering, University of Texas at Dallas, Richardson, TX, 75080, USA
|
19
|
Dai JC. Crowdsourcing in Surgical Skills Acquisition: A Developing Technology in Surgical Education. J Grad Med Educ 2017; 9:697-705. [PMID: 29270257 PMCID: PMC5734322 DOI: 10.4300/jgme-d-17-00322.1] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 05/01/2017] [Revised: 07/26/2017] [Accepted: 08/07/2017] [Indexed: 11/06/2022] Open
Abstract
BACKGROUND The application of crowdsourcing to surgical education is a recent phenomenon and adds to increasing demands on surgical residency training. The efficacy, range, and scope of this technology for surgical education remain incompletely defined. OBJECTIVE A systematic review was performed using the PubMed database of English-language literature on crowdsourced evaluation of surgical technical tasks up to April 2017. METHODS Articles were reviewed, abstracted, and analyzed, and were assessed for quality using the Medical Education Research Study Quality Instrument (MERSQI). Articles were evaluated with eligibility criteria for inclusion. Study information, performance task, subjects, evaluative standards, crowdworker compensation, time to response, and correlation between crowd and expert or standard evaluations were abstracted and analyzed. RESULTS Of 63 unique publications initially identified, 13 with MERSQI scores ranging from 10 to 13 (mean = 11.85) were included in the review. Overall, crowd and expert evaluations demonstrated good to excellent correlation across a wide range of tasks (Pearson's coefficient 0.59-0.95, Cronbach's alpha 0.32-0.92), with 1 exception being a study involving medical students. There was a wide range of reported interrater variability among experts. Nonexpert evaluation was consistently quicker than expert evaluation (ranging from 4.8 to 150.9 times faster), and was more cost effective. CONCLUSIONS Crowdsourced feedback appears to be comparable to expert feedback and is cost effective and efficient. Further work is needed to increase consistency in expert evaluations, to explore sources of discrepant assessments between surgeons and crowds, and to identify optimal populations and novel applications for this technology.
|
20
|
|
21
|
Educational Crowdsourcing: Developing RadExam. J Am Coll Radiol 2017; 14:800-803. [DOI: 10.1016/j.jacr.2017.01.033] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 01/24/2017] [Revised: 01/24/2017] [Accepted: 01/30/2017] [Indexed: 11/19/2022]
|
22
|
Abstract
BACKGROUND Assessing surgical skill is critical in improving patient care while reducing medical errors, length of stay, and readmission rates. Crowdsourcing provides 1 potential method for accurately assessing this; only recently has crowdsourcing been studied as a valid way to provide feedback to surgeons. The results of such studies are explored. DATA SOURCES A systematic literature search was performed on PubMed to identify studies that have attempted to validate crowdsourcing as a method for assessing surgical skill. Through a combination of abstract screening and full-length review, 9 studies that met the inclusion criteria were reviewed. CONCLUSIONS Crowdsourcing has been validated as an important way to provide feedback for surgical skill. It has been demonstrated to be effective in both dry-lab and live surgery, for a variety of tasks and methods. However, more studies must be performed to ensure that crowdsourcing can provide quality feedback in a wider variety of scenarios.
|
23
|
Vedula SS, Ishii M, Hager GD. Objective Assessment of Surgical Technical Skill and Competency in the Operating Room. Annu Rev Biomed Eng 2017; 19:301-325. [PMID: 28375649 DOI: 10.1146/annurev-bioeng-071516-044435] [Citation(s) in RCA: 73] [Impact Index Per Article: 10.4] [Indexed: 01/16/2023]
Abstract
Training skillful and competent surgeons is critical to ensure high quality of care and to minimize disparities in access to effective care. Traditional models to train surgeons are being challenged by rapid advances in technology, an intensified patient-safety culture, and a need for value-driven health systems. Simultaneously, technological developments are enabling capture and analysis of large amounts of complex surgical data. These developments are motivating a "surgical data science" approach to objective computer-aided technical skill evaluation (OCASE-T) for scalable, accurate assessment; individualized feedback; and automated coaching. We define the problem space for OCASE-T and summarize 45 publications representing recent research in this domain. We find that most studies on OCASE-T are simulation based; very few are in the operating room. The algorithms and validation methodologies used for OCASE-T are highly varied; there is no uniform consensus. Future research should emphasize competency assessment in the operating room, validation against patient outcomes, and effectiveness for surgical training.
Affiliation(s)
- S Swaroop Vedula
- Malone Center for Engineering in Healthcare, Department of Computer Science, The Johns Hopkins University Whiting School of Engineering, Baltimore, Maryland 21218
- Masaru Ishii
- Department of Otolaryngology-Head and Neck Surgery, The Johns Hopkins University School of Medicine, Baltimore, Maryland 21287
- Gregory D Hager
- Malone Center for Engineering in Healthcare, Department of Computer Science, The Johns Hopkins University Whiting School of Engineering, Baltimore, Maryland 21218
|
24
|
Yeung C, Carrillo B, Pope V, Hosseinpour S, Gerstle JT, Azzie G. Video assessment of laparoscopic skills by novices and experts: implications for surgical education. Surg Endosc 2017; 31:3883-3889. [DOI: 10.1007/s00464-017-5417-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/13/2016] [Accepted: 01/20/2017] [Indexed: 11/30/2022]
|
25
|
Vedula SS, Malpani A, Ahmidi N, Khudanpur S, Hager G, Chen CCG. Task-Level vs. Segment-Level Quantitative Metrics for Surgical Skill Assessment. J Surg Educ 2016; 73:482-489. [PMID: 26896147 DOI: 10.1016/j.jsurg.2015.11.009] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 05/22/2015] [Revised: 09/21/2015] [Accepted: 11/08/2015] [Indexed: 06/05/2023]
Abstract
OBJECTIVE Task-level metrics of time and motion efficiency are valid measures of surgical technical skill. Metrics may be computed for segments (maneuvers and gestures) within a task after hierarchical task decomposition. Our objective was to compare task-level and segment (maneuver and gesture)-level metrics for surgical technical skill assessment. DESIGN Our analyses include predictive modeling using data from a prospective cohort study. We used a hierarchical semantic vocabulary to segment a simple surgical task of passing a needle across an incision and tying a surgeon's knot into maneuvers and gestures. We computed time, path length, and movements for the task, maneuvers, and gestures using tool motion data. We fit logistic regression models to predict experience-based skill using the quantitative metrics. We compared the area under a receiver operating characteristic curve (AUC) for task-level, maneuver-level, and gesture-level models. SETTING Robotic surgical skills training laboratory. PARTICIPANTS In total, 4 faculty surgeons with experience in robotic surgery and 14 trainee surgeons with no or minimal experience in robotic surgery. RESULTS Experts performed the task in shorter time (49.74s; 95% CI = 43.27-56.21 vs. 81.97; 95% CI = 69.71-94.22), with shorter path length (1.63m; 95% CI = 1.49-1.76 vs. 2.23; 95% CI = 1.91-2.56), and with fewer movements (429.25; 95% CI = 383.80-474.70 vs. 728.69; 95% CI = 631.84-825.54) than novices. Experts differed from novices on metrics for individual maneuvers and gestures. The AUCs were 0.79; 95% CI = 0.62-0.97 for task-level models, 0.78; 95% CI = 0.6-0.96 for maneuver-level models, and 0.7; 95% CI = 0.44-0.97 for gesture-level models. There was no statistically significant difference in AUC between task-level and maneuver-level (p = 0.7) or gesture-level models (p = 0.17). 
CONCLUSIONS Maneuver-level and gesture-level metrics are discriminative of surgical skill and can be used to provide targeted feedback to surgical trainees.
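Two quantities from the abstract above, tool path length and the area under the ROC curve (AUC), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample positions and the sample scores are hypothetical, and the logistic regression fitting used in the study is not reproduced here. The AUC is computed with the standard pairwise definition, the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

```python
import math


def path_length(points):
    """Total Euclidean distance travelled along a sequence of (x, y, z) positions."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def auc(scores_pos, scores_neg):
    """AUC as P(positive score > negative score), counting ties as one half."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical tool-tip positions (metres) for one short segment.
trajectory = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
length = path_length(trajectory)

# Hypothetical model scores: higher means "more likely expert".
expert_scores = [0.9, 0.8, 0.4]
novice_scores = [0.7, 0.3]
area = auc(expert_scores, novice_scores)
print(f"path length = {length:.2f} m, AUC = {area:.3f}")
```

Applying `path_length` to the samples inside each maneuver or gesture, rather than to the whole task, gives the segment-level counterpart of the task-level metric.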
Affiliation(s)
- S Swaroop Vedula
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
- Anand Malpani
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
- Narges Ahmidi
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
- Sanjeev Khudanpur
- Department of Electrical & Computer Engineering, Johns Hopkins University, Baltimore, Maryland
- Gregory Hager
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
- Chi Chiung Grace Chen
- Department of Gynecology & Obstetrics, Johns Hopkins University School of Medicine, Baltimore, Maryland
|
26
|
Gao Y, Vedula SS, Lee GI, Lee MR, Khudanpur S, Hager GD. Query-by-example surgical activity detection. Int J Comput Assist Radiol Surg 2016; 11:987-96. [PMID: 27072835 DOI: 10.1007/s11548-016-1386-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/08/2016] [Accepted: 03/14/2016] [Indexed: 10/22/2022]
Abstract
PURPOSE Easy acquisition of surgical data opens many opportunities to automate skill evaluation and teaching. Current technology to search tool motion data for surgical activity segments of interest is limited by the need for manual pre-processing, which can be prohibitive at scale. We developed a content-based information retrieval method, query-by-example (QBE), to automatically detect activity segments within surgical data recordings of long duration that match a query. METHODS The example segment of interest (query) and the surgical data recording (target trial) are time series of kinematics. Our approach includes an unsupervised feature learning module using a stacked denoising autoencoder (SDAE), two scoring modules based on asymmetric subsequence dynamic time warping (AS-DTW) and template matching, respectively, and a detection module. A distance matrix of the query against the trial is computed using the SDAE features, followed by AS-DTW combined with template scoring, to generate a ranked list of candidate subsequences (substrings). To evaluate the quality of the ranked list against the ground truth, thresholding of conventional DTW distances and bipartite matching are applied. We computed the recall, precision, F1-score, and a Jaccard index-based score on three experimental setups. We evaluated our QBE method using a suture throw maneuver as the query, on two tool motion datasets (JIGSAWS and MISTIC-SL) captured in a training laboratory. RESULTS We observed a recall of 93, 90, and 87% and a precision of 93, 91, and 88% with same surgeon same trial (SSST), same surgeon different trial (SSDT), and different surgeon (DS) experiment setups on JIGSAWS, and a recall of 87, 81, and 75% and a precision of 72, 61, and 53% with SSST, SSDT, and DS experiment setups on MISTIC-SL, respectively. CONCLUSION We developed a novel, content-based information retrieval method to automatically detect multiple instances of an activity within long surgical recordings. Our method demonstrated adequate recall across datasets of differing complexity and across experimental conditions.
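The query-against-recording matching described in this abstract can be illustrated with textbook subsequence dynamic time warping, sketched below. This is an assumption-laden simplification and not the authors' AS-DTW: it uses the standard symmetric step pattern, absolute difference on raw one-dimensional samples instead of distances between SDAE features, and returns only the single best match instead of a ranked list. The function name `subsequence_dtw` is hypothetical.

```python
def subsequence_dtw(query, target):
    """Find the stretch of `target` that best matches `query`.

    Returns (cost, start, end), where start and end are 0-based inclusive
    indices into `target`. The match may begin anywhere in the target
    because row 0 of the accumulated-cost matrix is all zeros.
    """
    n, m = len(query), len(target)
    inf = float("inf")
    # D[i][j]: cheapest alignment of query[:i] ending at target[j - 1].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0] = [0.0] * (m + 1)  # free starting point along the target
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - target[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])

    # The best match ends at the column with the lowest cost in the last row.
    end = min(range(1, m + 1), key=lambda j: D[n][j])

    # Backtrack to the first query sample to recover where the match starts.
    i, j = n, end
    while i > 1:
        _, i, j = min(
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        )
    return D[n][end], j - 1, end - 1


# Toy example: the query pattern sits inside a longer recording.
cost, start, end = subsequence_dtw([1, 2, 3], [0, 0, 1, 2, 3, 0, 0])
print(cost, start, end)
```

Detecting multiple instances, as the abstract reports, would additionally require ranking several low-cost end points and suppressing overlapping candidates, which this sketch leaves out.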
Affiliation(s)
- Yixin Gao: Department of Computer Science, Whiting School of Engineering, The Johns Hopkins University, Baltimore, MD, 21218, USA
- S Swaroop Vedula: Department of Computer Science, Whiting School of Engineering, The Johns Hopkins University, Baltimore, MD, 21218, USA
- Gyusung I Lee: Department of Surgery, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
- Mija R Lee: Department of Surgery, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
- Sanjeev Khudanpur: Department of Electrical and Computer Engineering, Whiting School of Engineering, The Johns Hopkins University, Baltimore, MD, 21218, USA
- Gregory D Hager: Department of Computer Science, Whiting School of Engineering, The Johns Hopkins University, Baltimore, MD, 21218, USA
|