1
Huang SS, Lin YF, Huang AY, Lin JY, Yang YY, Lin SM, Lin WY, Huang PH, Chen TY, Yang SJH, Lirng JF, Chen CH. Using machine learning to identify key subject categories predicting the pre-clerkship and clerkship performance: 8-year cohort study. J Chin Med Assoc 2024;87:609-614. [PMID: 38648194] [DOI: 10.1097/jcma.0000000000001097]
Abstract
BACKGROUND Medical students need to build a solid foundation of knowledge to become physicians. Clerkship is often considered the first transition point, and clerkship performance is essential to students' development. We aimed to identify subjects that predict clerkship performance, thereby helping medical students direct their learning toward achieving it. METHODS This cohort study collected background and academic data from medical students who graduated between 2011 and 2019. Prediction models were developed with machine learning techniques to identify the features that predict pre-clerkship and clerkship performance. After data collection and preprocessing, different machine learning models were trained and validated using 10-fold cross-validation. RESULTS Thirteen subjects from the pre-med stage and 10 subjects from the basic medical science stage predicted either pre-clerkship or clerkship performance with an area under the ROC curve (AUC) >0.7. Within each subject category, medical humanities and sociology (social science); chemistry and physician scientist-related training (basic science); and pharmacology, immunology-microbiology, and histology (basic medical science) predicted top-tertile clerkship performance. A prediction model based on random forest predicted clerkship performance with 95% accuracy and an AUC of 0.88. CONCLUSION Clerkship performance was predicted by selected subjects, or by combinations of subject categories, from the pre-med and basic medical science stages. Demonstrating which subjects and categories are predictive may help students understand how each part of the medical program relates to clerkship performance and so enhance their preparedness for the clerkship.
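As a rough illustration of the modeling approach this abstract describes (a random forest validated with 10-fold cross-validation and summarized by AUC), the sketch below runs on synthetic data; the feature layout, sample size, and top-tertile threshold are placeholder assumptions, not the study's.

```python
# A minimal sketch: random forest + 10-fold CV scored by AUC.
# The feature matrix and labels are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 23))  # e.g., grades in 23 pre-clerkship subjects
# Proxy label: "top-tertile clerkship performance" driven by a few subjects
y = (X[:, :5].mean(axis=1) + rng.normal(scale=0.5, size=300)) > 0.3

model = RandomForestClassifier(n_estimators=200, random_state=0)
auc_scores = cross_val_score(model, X, y, cv=10, scoring="roc_auc")
print(f"10-fold CV AUC: {auc_scores.mean():.2f} +/- {auc_scores.std():.2f}")
```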
Affiliation(s)
- Shiau-Shian Huang
- Department of Medical Education, Clinical Innovation Center, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Department of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Yu-Fan Lin
- Department of Medical Education, Clinical Innovation Center, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Anna YuQing Huang
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan, ROC
- Ji-Yang Lin
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan, ROC
- Ying-Ying Yang
- Department of Medical Education, Clinical Innovation Center, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Department of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Sheng-Min Lin
- Department of Medical Education, Clinical Innovation Center, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Wen-Yu Lin
- Department of Medical Education, Clinical Innovation Center, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Pin-Hsiang Huang
- Department of Medical Education, Clinical Innovation Center, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Tzu-Yao Chen
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan, ROC
- Stephen J H Yang
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan, ROC
- Jiing-Feng Lirng
- Department of Medical Education, Clinical Innovation Center, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Department of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Chen-Huan Chen
- Department of Medical Education, Clinical Innovation Center, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Department of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
2
Thelen AE, George BC, Burkhardt JC, Khamees D, Haas MRC, Weinstein D. Improving Graduate Medical Education by Aggregating Data Across the Medical Education Continuum. Acad Med 2024;99:139-145. [PMID: 37406284] [DOI: 10.1097/acm.0000000000005313]
Abstract
Meaningful improvements to graduate medical education (GME) have been achieved in recent decades, yet many GME improvement pilots have been small trials without rigorous outcome measures and with limited generalizability. Lack of access to large-scale data is thus a key barrier to generating empiric evidence to improve GME. In this article, the authors examine the potential of a national GME data infrastructure to improve GME, review the output of 2 national workshops on this topic, and propose a path toward achieving this goal. The authors envision a future where medical education is shaped by evidence from rigorous research powered by comprehensive, multi-institutional data. To achieve this goal, premedical education, undergraduate medical education, GME, and practicing physician data must be collected using a common data dictionary and standards and longitudinally linked using unique individual identifiers. The envisioned data infrastructure could provide a foundation for evidence-based decisions across all aspects of GME and help optimize the education of individual residents. Two workshops hosted by the National Academies of Sciences, Engineering, and Medicine Board on Health Care Services explored the prospect of better using GME data to improve education and its outcomes. There was broad consensus about the potential value of a longitudinal data infrastructure to improve GME; significant obstacles were also noted. Suggested next steps outlined by the authors include producing a more complete inventory of data already being collected and managed by key medical education leadership organizations, pursuing a grassroots data-sharing pilot among GME-sponsoring institutions, and formulating the technical and governance frameworks needed to aggregate data across organizations. The power and potential of big data is evident across many disciplines, and the authors believe that harnessing it in GME is the best next step toward advancing evidence-based physician education.
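The linkage mechanism proposed here (a common data dictionary plus unique identifiers joining records across training stages) can be pictured with a small, purely hypothetical sketch; every column, value, and the salted-hash scheme below are illustrative assumptions, not part of the article.

```python
# Illustrative only: longitudinally linking records across training stages
# via a salted hash of a shared identifier, so stages can be joined
# without exchanging raw IDs. All names and values are hypothetical.
import hashlib
import pandas as pd

def pseudonymize(national_id: str, salt: str = "demo-salt") -> str:
    """Derive a stable pseudonymous key for cross-stage joins."""
    return hashlib.sha256((salt + national_id).encode()).hexdigest()

ume = pd.DataFrame({"national_id": ["A1", "B2"], "step1_pass": [True, True]})
gme = pd.DataFrame({"national_id": ["A1", "B2"], "milestone_avg": [3.4, 4.1]})

for df in (ume, gme):
    df["pid"] = df.pop("national_id").map(pseudonymize)

linked = ume.merge(gme, on="pid")  # one longitudinal record per trainee
print(linked)
```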
3
Baumgart DC. An intriguing vision for transatlantic collaborative health data use and artificial intelligence development. NPJ Digit Med 2024;7:19. [PMID: 38263436] [PMCID: PMC10806986] [DOI: 10.1038/s41746-024-01005-y]
Abstract
Our traditional approach to diagnosis, prognosis, and treatment can no longer process and transform the enormous volume of information into therapeutic success, innovative discovery, and health-economic performance. Precision health, i.e., the right treatment, for the right person, at the right time, in the right place, is enabled through a learning health system, in which medicine and multidisciplinary science, economic viability, diverse culture, and empowered patients' preferences are digitally integrated and conceptually aligned for continuous improvement and maintenance of health, wellbeing, and equity. Artificial intelligence (AI) has been successfully evaluated in risk stratification, accurate diagnosis, and treatment allocation, and in preventing health disparities. There is one caveat, though: dependable AI models need to be trained on population-representative, large, and deep data sets by multidisciplinary and multinational teams to avoid developer, statistical, and social bias. Such applications and models can neither be created nor validated with data at the country level, let alone the institutional level, and require a new dimension of collaboration: a cultural change with the establishment of trust in a precompetitive space. The Data for Health (#DFH23) conference in Berlin and the follow-up workshop at Harvard University in Boston hosted a representative group of stakeholders in society, academia, industry, and government. With the momentum #DFH23 created, the European Health Data Space (EHDS) as a solid and safe foundation for consented collaborative health data use, and the G7 Hiroshima AI process in place, we call on citizens and their governments to fully support the digital transformation of medicine, research, and innovation, including AI.
Affiliation(s)
- Daniel C Baumgart
- Precision Health Signature Area, College of Health Sciences and College of Natural and Applied Sciences, University of Alberta, Edmonton, Alberta, Canada
4
Bond WF, Zhou J, Bhat S, Park YS, Ebert-Allen RA, Ruger RL, Yudkowsky R. Automated Patient Note Grading: Examining Scoring Reliability and Feasibility. Acad Med 2023;98:S90-S97. [PMID: 37983401] [DOI: 10.1097/acm.0000000000005357]
Abstract
PURPOSE Scoring postencounter patient notes (PNs) yields significant insights into student performance, but the resource intensity of scoring limits its use. Recent advances in natural language processing (NLP) and machine learning allow the application of automated short answer grading (ASAG) to this task. This retrospective study evaluated the psychometric characteristics and reliability of an ASAG system for PNs, as well as factors contributing to implementation, including feasibility and the case-specific phrase annotation required to tune the system for a new case. METHOD PNs from standardized patient (SP) cases within a graduation competency exam were used to train the ASAG system, which applies a feed-forward neural network algorithm for scoring. Using faculty phrase-level annotation, 10 PNs per case were required to tune the ASAG system. After tuning, ASAG item-level ratings for 20 notes were compared across ASAG-faculty (4 cases, 80 pairings) and ASAG-nonfaculty (2 cases, 40 pairings) rater pairs. Psychometric characteristics were examined using item analysis and Cronbach's alpha. Interrater reliability (IRR) was examined using kappa. RESULTS ASAG scores demonstrated sufficient variability in differentiating learner PN performance and high IRR between machine and human ratings. Across all items, the mean ASAG-faculty kappa was .83 (SE ± .02); the ASAG-nonfaculty kappa was likewise .83 (SE ± .02). ASAG scoring demonstrated high item discrimination. Internal consistency reliability at the case level ranged from a Cronbach's alpha of .65 to .77. The faculty time cost to train and supervise nonfaculty raters for 4 cases was approximately $1,856; the faculty cost to tune the ASAG system was approximately $928. CONCLUSIONS NLP-based automated scoring of PNs demonstrated a high degree of reliability and psychometric confidence for use as learner feedback. The small number of phrase-level annotations required to tune the system to a new case enhances feasibility. ASAG-enabled PN scoring has broad implications for improving feedback in case-based learning contexts in medical education.
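For readers unfamiliar with the two reliability statistics reported here (kappa for machine-human agreement, Cronbach's alpha for internal consistency), a minimal sketch of how each is computed follows; the rating vectors are invented toy values, not study data.

```python
# Cohen's kappa for machine-human agreement, Cronbach's alpha for
# internal consistency. All values below are toy inputs.
import numpy as np
from sklearn.metrics import cohen_kappa_score

machine = [1, 0, 1, 1, 0, 1, 0, 1]  # ASAG item-level ratings (toy)
faculty = [1, 0, 1, 0, 0, 1, 0, 1]  # faculty ratings of the same items (toy)
print("kappa:", round(cohen_kappa_score(machine, faculty), 2))

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_examinees, n_items) score matrix."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

scores = np.random.default_rng(1).integers(0, 2, size=(20, 10)).astype(float)
print("alpha:", round(cronbach_alpha(scores), 2))
```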
Affiliation(s)
- William F Bond
- W.F. Bond is professor, Department of Emergency Medicine, University of Illinois College of Medicine, Peoria, Illinois, and is affiliated with Jump Simulation, an OSF HealthCare and University of Illinois College of Medicine at Peoria collaboration; ORCID: http://orcid.org/0000-0001-6714-7152
- Jianing Zhou
- J. Zhou is a PhD student, Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, Illinois
- Suma Bhat
- S. Bhat is assistant professor, Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Champaign, Illinois; ORCID: http://orcid.org/0000-0003-0324-5890
- Yoon Soo Park
- Y.S. Park is professor, Department of Medical Education, University of Illinois College of Medicine, Chicago, Illinois
- Rebecca A Ebert-Allen
- R.A. Ebert-Allen is a research project manager, Jump Simulation, an OSF HealthCare and University of Illinois College of Medicine at Peoria collaboration, Peoria, Illinois; ORCID: http://orcid.org/0000-0001-6607-0229
- Rebecca L Ruger
- R.L. Ruger was a research assistant, Jump Simulation, and is now a graduate student, Department of Psychology, Penn State University, University Park, Pennsylvania; ORCID: http://orcid.org/0009-0005-8739-3226
- Rachel Yudkowsky
- R. Yudkowsky is professor, Department of Medical Education, University of Illinois College of Medicine, Chicago, Illinois; ORCID: https://orcid.org/0000-0002-2145-7582
5
Srinivas S, Young AJ. Machine Learning and Artificial Intelligence in Surgical Research. Surg Clin North Am 2023;103:299-316. [PMID: 36948720] [DOI: 10.1016/j.suc.2022.11.002]
Abstract
Machine learning, a subtype of artificial intelligence, is an emerging field of surgical research dedicated to predictive modeling. From its inception, machine learning has been of interest in medical and surgical research. Built on traditional research metrics, avenues of research include diagnostics, prognosis, operative timing, and surgical education, across a variety of surgical subspecialties. Machine learning represents an exciting and developing future for surgical research, one that will allow for more personalized and comprehensive medical care.
Affiliation(s)
- Shruthi Srinivas
- Department of Surgery, The Ohio State University, 370 West 9th Avenue, Columbus, OH 43210, USA
- Andrew J Young
- Division of Trauma, Critical Care, and Burn, The Ohio State University, 181 Taylor Avenue, Suite 1102K, Columbus, OH 43203, USA
6
Stefan P, Pfandler M, Kullmann A, Eck U, Koch A, Mehren C, von der Heide A, Weidert S, Fürmetz J, Euler E, Lazarovici M, Navab N, Weigl M. Computer-assisted simulated workplace-based assessment in surgery: application of the universal framework of intraoperative performance within a mixed-reality simulation. BMJ Surg Interv Health Technol 2023;5:e000135. [PMID: 36687799] [PMCID: PMC9853221] [DOI: 10.1136/bmjsit-2022-000135]
Abstract
Objectives Workplace-based assessment (WBA) is a key requirement of competency-based medical education in postgraduate surgical education. Although simulated workplace-based assessment (SWBA) has been proposed to complement WBA, it is insufficiently adopted in surgical education. In particular, approaches to criterion-referenced and automated assessment of intraoperative surgical competency in contextualized SWBA settings are missing. The main objectives were (1) application of the universal framework of intraoperative performance and exemplary adaptation to spine surgery (vertebroplasty; VP); (2) development of computer-assisted assessment based on criterion-referenced metrics; and (3) implementation in contextualized, team-based operating room (OR) simulation, and evaluation of validity. Design Multistage development and assessment study: (1) expert-based definition of performance indicators based on the framework's performance domains; (2) development of respective assessment metrics based on preoperative planning and intraoperative performance data; (3) implementation in mixed-reality OR simulation and assessment of surgeons operating in a confederate team. Statistical analyses included internal consistency and interdomain associations, correlations with experience, and technical and non-technical performances. Setting Surgical simulation center; full surgical team set-up within mixed-reality OR simulation. Participants Eleven surgeons were recruited from two teaching hospitals. Eligibility criteria included surgical specialists in orthopedic, trauma, or neurosurgery with prior VP or kyphoplasty experience. Main outcome measures Computer-assisted assessment of surgeons' intraoperative performance. Results Performance scores were associated with surgeons' experience, observational assessment (Objective Structured Assessment of Technical Skill) scores, and overall pass/fail ratings. The results provide strong evidence for the validity of our computer-assisted SWBA approach. Diverse indicators of surgeons' technical and non-technical performances could be quantified and captured. Conclusions This study is the first to investigate computer-assisted assessment based on a competency framework in authentic, contextualized team-based OR simulation. Our approach discriminates surgical competency across the domains of intraoperative performance and advances previous automated assessment based on current surgical simulators in decontextualized settings. Our findings inform future use of computer-assisted multidomain competency assessments of surgeons using SWBA approaches.
Affiliation(s)
- Philipp Stefan
- Chair for Computer Aided Medical Procedures and Augmented Reality, Department of Informatics, Technical University of Munich, München, Germany
- Michael Pfandler
- Institute and Outpatient Clinic for Occupational, Social, and Environmental Medicine, University Hospital, Ludwig Maximilians University Munich, München, Germany
- Aljoscha Kullmann
- Chair for Computer Aided Medical Procedures and Augmented Reality, Department of Informatics, Technical University of Munich, München, Germany
- Ulrich Eck
- Chair for Computer Aided Medical Procedures and Augmented Reality, Department of Informatics, Technical University of Munich, München, Germany
- Amelie Koch
- Institute and Outpatient Clinic for Occupational, Social, and Environmental Medicine, University Hospital, Ludwig Maximilians University Munich, München, Germany
- Christoph Mehren
- Spine Center, Schön Klinik München Harlaching, München, Germany; Academic Teaching Hospital and Spine Research Institute, Paracelsus Medical University, Salzburg, Austria
- Anna von der Heide
- Department of General, Trauma and Reconstructive Surgery, University Hospital, Campus Grosshadern, Ludwig Maximilians University Munich, München, Germany
- Simon Weidert
- Department of General, Trauma and Reconstructive Surgery, University Hospital, Campus Grosshadern, Ludwig Maximilians University Munich, München, Germany
- Julian Fürmetz
- Department of General, Trauma and Reconstructive Surgery, University Hospital, Campus Innenstadt, Ludwig Maximilians University Munich, München, Germany
- Ekkehard Euler
- Department of General, Trauma and Reconstructive Surgery, University Hospital, Campus Innenstadt, Ludwig Maximilians University Munich, München, Germany
- Marc Lazarovici
- Institute for Emergency Medicine and Management in Medicine (INM), University Hospital, Ludwig Maximilians University Munich, München, Germany
- Nassir Navab
- Chair for Computer Aided Medical Procedures and Augmented Reality, Department of Informatics, Technical University of Munich, München, Germany
- Matthias Weigl
- Institute and Outpatient Clinic for Occupational, Social, and Environmental Medicine, University Hospital, Ludwig Maximilians University Munich, München, Germany; Institute for Patient Safety, University of Bonn, Bonn, Germany
7
Louis N, Zhou L, Yule SJ, Dias RD, Manojlovich M, Pagani FD, Likosky DS, Corso JJ. Temporally guided articulated hand pose tracking in surgical videos. Int J Comput Assist Radiol Surg 2023;18:117-125. [PMID: 36190616] [PMCID: PMC9883342] [DOI: 10.1007/s11548-022-02761-6]
Abstract
PURPOSE Articulated hand pose tracking is an under-explored problem that carries the potential for use in an extensive number of applications, especially in the medical domain. With a robust and accurate tracking system for surgical videos, the motion dynamics and movement patterns of the hands can be captured and analyzed for many rich tasks. METHODS In this work, we propose a novel hand pose estimation model, CondPose, which improves detection and tracking accuracy by incorporating a pose prior into its prediction. We show improvements over state-of-the-art methods, which provide frame-wise independent predictions, by following a temporally guided approach that effectively leverages past predictions. RESULTS We collect Surgical Hands, the first dataset to provide multi-instance articulated hand pose annotations for videos. The dataset provides over 8.1k annotated hand poses from publicly available surgical videos, with bounding boxes, pose annotations, and tracking IDs to enable multi-instance tracking. When evaluated on Surgical Hands, our method outperforms the state-of-the-art approach on mean Average Precision, which measures pose estimation accuracy, and on Multiple Object Tracking Accuracy, which assesses pose tracking performance. CONCLUSION In comparison to a frame-wise independent strategy, we show greater performance in detecting and tracking hand poses and a more substantial impact on localization accuracy. This has positive implications for generating more accurate representations of hands in the scene for targeted downstream tasks.
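CondPose itself is not reproduced here; as a loose intuition for "leveraging past predictions," the sketch below applies an exponential moving average to per-frame keypoint detections, a deliberately simplistic temporal prior run on synthetic stand-in data rather than the paper's method.

```python
# Toy temporal prior: blend each frame's keypoint detections with the
# smoothed previous frame, damping jitter at the cost of some lag.
# Keypoint arrays are synthetic stand-ins for a detector's output.
import numpy as np

def smooth_keypoints(frames: np.ndarray, alpha: float = 0.6) -> np.ndarray:
    """frames: (T, K, 2) array of K 2D keypoints per frame."""
    out = frames.copy().astype(float)
    for t in range(1, len(out)):
        out[t] = alpha * frames[t] + (1 - alpha) * out[t - 1]
    return out

detections = np.cumsum(np.random.default_rng(2).normal(size=(30, 21, 2)), axis=0)
smoothed = smooth_keypoints(detections)
print(smoothed.shape)  # (30, 21, 2)
```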
Affiliation(s)
- Steven J. Yule
- Clinical Surgery, University of Edinburgh, Edinburgh, Scotland, UK
- Roger D. Dias
- Emergency Medicine, Harvard Medical School, Boston, MA, USA
8
Lam A, Lam L, Blacketer C, Parnis R, Franke K, Wagner M, Wang D, Tan Y, Oakden-Rayner L, Gallagher S, Perry SW, Licinio J, Symonds I, Thomas J, Duggan P, Bacchi S. Professionalism and clinical short answer question marking with machine learning. Intern Med J 2022;52:1268-1271. [PMID: 35879236] [DOI: 10.1111/imj.15839]
Abstract
Machine learning may assist in medical student evaluation. This study involved scoring short answer questions administered at three centres. Bidirectional encoder representations from transformers (BERT) were particularly effective for professionalism question scoring (accuracy ranging from 41.6% to 92.5%). In the scoring of 3-mark professionalism questions, machine learning had lower classification accuracy than for clinical questions (P < 0.05). The role of machine learning in medical professionalism evaluation warrants further investigation.
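A minimal sketch of the kind of BERT-based marking setup this study reports follows, assuming a 4-way classification head (0-3 marks); the model shown is untrained and the answer text invented, so its output is arbitrary until fine-tuned on marked answers.

```python
# Sketch: short-answer mark prediction with a BERT classifier head.
# The 4 labels stand in for 0-3 marks; no study data or hyperparameters
# are reproduced, and the model here has not been fine-tuned.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4  # 0, 1, 2, or 3 marks
)

answers = ["Seek informed consent and document the discussion."]
batch = tokenizer(answers, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits
print("predicted mark:", logits.argmax(dim=-1).item())  # untrained: arbitrary
```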
Affiliation(s)
- Antoinette Lam
- University of Adelaide, Adelaide, South Australia, Australia
- Lydia Lam
- University of Adelaide, Adelaide, South Australia, Australia
- Charlotte Blacketer
- University of Adelaide, Adelaide, South Australia, Australia; Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Roger Parnis
- University of Adelaide, Adelaide, South Australia, Australia; Royal Darwin Hospital, Darwin, Northern Territory, Australia
- Kyle Franke
- University of Adelaide, Adelaide, South Australia, Australia
- Morganne Wagner
- State University of New York (SUNY) Upstate Medical University, Syracuse, New York, USA
- David Wang
- University of Otago, Dunedin, New Zealand
- Yiran Tan
- University of Adelaide, Adelaide, South Australia, Australia; Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Lauren Oakden-Rayner
- University of Adelaide, Adelaide, South Australia, Australia; Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Seth W Perry
- State University of New York (SUNY) Upstate Medical University, Syracuse, New York, USA
- Julio Licinio
- State University of New York (SUNY) Upstate Medical University, Syracuse, New York, USA
- Ian Symonds
- University of Adelaide, Adelaide, South Australia, Australia
- Josephine Thomas
- University of Adelaide, Adelaide, South Australia, Australia; Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Paul Duggan
- University of Adelaide, Adelaide, South Australia, Australia; Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Stephen Bacchi
- University of Adelaide, Adelaide, South Australia, Australia; Royal Adelaide Hospital, Adelaide, South Australia, Australia
9
Yilmaz Y, Jurado Nunez A, Ariaeinejad A, Lee M, Sherbino J, Chan TM. Harnessing Natural Language Processing to Support Decisions Around Workplace-Based Assessment: Machine Learning Study of Competency-Based Medical Education. JMIR Med Educ 2022;8:e30537. [PMID: 35622398] [PMCID: PMC9187970] [DOI: 10.2196/30537]
Abstract
BACKGROUND Residents receive a numeric performance rating (eg, on a 1-7 scale) along with narrative (ie, qualitative) feedback based on their performance in each workplace-based assessment (WBA). Aggregated qualitative data from WBAs can be overwhelming to process and fairly adjudicate as part of a global decision about learner competence. Current approaches to qualitative data require a human rater to maintain attention and appropriately weigh various data inputs within the constraints of working memory before rendering a global judgment of performance. OBJECTIVE This study explores natural language processing (NLP) and machine learning (ML) applications for identifying trainees at risk, using a large WBA narrative comment data set associated with numerical ratings. METHODS NLP was performed retrospectively on a complete data set of narrative comments (ie, text-based feedback to residents based on their performance on a task) derived from WBAs completed by faculty members from multiple hospitals associated with a single, large residency program at McMaster University, Canada. Narrative comments were vectorized using the bag-of-n-grams technique with 3 input types: unigrams, bigrams, and trigrams. Supervised ML models based on linear regression were trained with the quantitative ratings, performed binary classification, and output a prediction of whether a resident fell into the at-risk or not-at-risk category. Sensitivity, specificity, and accuracy metrics are reported. RESULTS The database comprised 7199 unique direct observation assessments containing both narrative comments and a rating between 3 and 7 in an imbalanced distribution (scores 3-5: 726 ratings; scores 6-7: 4871 ratings). A total of 141 unique raters from 5 different hospitals and 45 unique residents participated over the course of 5 academic years. When classifying whether a trainee would be rated low (ie, 1-5) or high (ie, 6 or 7), accuracy was 87% for trigrams, 86% for bigrams, and 82% for unigrams. All 3 input types also had better prediction accuracy with a bimodal cut (ie, lower or higher) than when predicting performance along the full 7-point rating scale (50%-52%). CONCLUSIONS ML models can accurately identify underperforming residents via the narrative comments provided for WBAs. The words generated in WBAs can be a worthy data set to augment human decisions for educators tasked with processing large volumes of narrative assessments.
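The pipeline this abstract describes (n-gram vectorization feeding a linear model that flags at-risk ratings) looks roughly like the sketch below. Logistic regression stands in for the paper's linear-regression classifier, and the comments and labels are invented.

```python
# Sketch of a bag-of-n-grams at-risk classifier: unigrams through
# trigrams, then a linear model. All text and labels are toy values.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "excellent focused history, clear plan",
    "struggled to prioritize, required prompting throughout",
    "safe independent management of a complex patient",
    "missed red flags, documentation incomplete",
]
at_risk = [0, 1, 0, 1]  # 1 = rating in the low (at-risk) band

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 3)),  # unigrams, bigrams, trigrams
    LogisticRegression(max_iter=1000),
)
clf.fit(comments, at_risk)
print(clf.predict(["needed close supervision and prompting"]))
```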
Affiliation(s)
- Yusuf Yilmaz
- McMaster Education Research, Innovation, and Theory Program, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Department of Medical Education, Ege University, Izmir, Turkey
- Program for Faculty Development, Office of Continuing Professional Development, McMaster University, Hamilton, ON, Canada
- Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Alma Jurado Nunez
- Department of Medicine and Masters in eHealth Program, McMaster University, Hamilton, ON, Canada
- Ali Ariaeinejad
- Department of Medicine and Masters in eHealth Program, McMaster University, Hamilton, ON, Canada
- Mark Lee
- McMaster Education Research, Innovation, and Theory Program, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Jonathan Sherbino
- McMaster Education Research, Innovation, and Theory Program, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Division of Emergency Medicine, Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Division of Education and Innovation, Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Teresa M Chan
- McMaster Education Research, Innovation, and Theory Program, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Program for Faculty Development, Office of Continuing Professional Development, McMaster University, Hamilton, ON, Canada
- Division of Emergency Medicine, Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Division of Education and Innovation, Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
10
Rees CA, Ryder HF. Machine Learning for the Prediction of Ranked Applicants and Matriculants to an Internal Medicine Residency Program. Teach Learn Med 2022:1-10. [PMID: 35591808] [DOI: 10.1080/10401334.2022.2059664]
Abstract
Phenomenon: Residency programs throughout the country each receive hundreds to thousands of applications every year. Holistic review of this many applications is challenging, and to date, few tools exist to streamline or assist in the process of selecting candidates to interview and rank. Machine learning could assist programs in predicting which applicants are likely to be ranked and, among ranked applicants, which are likely to matriculate. Approach: In the present study, we used the machine learning algorithm Random Forest (RF) to differentiate between ranked and unranked applicants, as well as between matriculants and ranked non-matriculants, to an internal medicine residency program in northern New England over a three-year period. In total, 5,067 ERAS applications were received during the 2016-17, 2017-18, and 2018-19 application cycles. Of these, 4,256 (84.0%) were unranked applicants, 754 (14.9%) were ranked non-matriculants, and 57 (1.12%) were ranked matriculants. Findings: For differentiating between ranked and unranked applicants, the RF algorithm achieved an area under the receiver operating characteristic curve (AUROC) of 0.925 (95% CI: 0.918-0.932) and an area under the precision-recall curve (AUPRC) of 0.652 (0.611-0.685), while for differentiating between matriculants and ranked non-matriculants, the AUROC was 0.597 (95% CI: 0.516-0.680) and the AUPRC was 0.114 (0.075-0.167). The ranks of matriculated applicants were significantly higher on the algorithmic rank list than on the actual rank list for the 2017-18 (median rank: 98 versus 204, p < .001) and 2018-19 cycles (74 versus 192, p = .006), but not the 2016-17 cycle (97 versus 144, p = .37). Insights: The RF algorithm predicted which applicants among the overall pool were ranked with impressive accuracy and identified matriculants among ranked candidates with modest but better-than-random accuracy. This approach could assist residency programs with triaging applicants based on the likelihood of a candidate being ranked and/or matriculating.
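With only 1.12% of applicants matriculating, AUROC and AUPRC tell very different stories, which is why the abstract reports both. The toy sketch below computes the two metrics on a synthetic, similarly imbalanced outcome; nothing here uses the study's data.

```python
# AUROC vs AUPRC on a rare outcome (~1.2% positives, like matriculants).
# Scores and labels are synthetic.
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(3)
y_true = rng.random(5000) < 0.012          # ~1.2% positive class
y_score = 0.3 * y_true + rng.random(5000)  # weakly informative scores

print("AUROC:", round(roc_auc_score(y_true, y_score), 3))
print("AUPRC:", round(average_precision_score(y_true, y_score), 3))
```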
Affiliation(s)
- Christiaan A Rees
- Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Hilary F Ryder
- Department of Medicine, Dartmouth-Hitchcock Medical Center, Lebanon, New Hampshire, USA
- Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
11
Mora J. [Data science projections in cardiac surgery]. Revista Médica Clínica Las Condes 2022. [DOI: 10.1016/j.rmclc.2022.05.007] (in Spanish)
12
Sukhera J, Ahmed H. Leveraging Machine Learning to Understand How Emotions Influence Equity Related Education: Quasi-Experimental Study. JMIR Med Educ 2022;8:e33934. [PMID: 35353048] [PMCID: PMC9008524] [DOI: 10.2196/33934]
Abstract
BACKGROUND Teaching and learning about topics such as bias are challenging due to the emotional nature of bias-related discourse. However, emotions can be challenging to study in health professions education for numerous reasons. With the emergence of machine learning and natural language processing, sentiment analysis (SA) has the potential to bridge the gap. OBJECTIVE To improve our understanding of the role of emotions in bias-related discourse, we developed and conducted an SA of bias-related discourse among health professionals. METHODS We conducted a 2-stage quasi-experimental study. First, we developed an SA algorithm using an existing archive of interviews with health professionals about bias. SA evaluates the sentiment of textual data by assigning scores to textual components and calculating an overall sentiment value for the text. Next, we applied our SA algorithm to an archive of social media discourse on Twitter that contained equity-related hashtags, to compare sentiment between health professionals and the general population. RESULTS When tested on the initial archive, our SA algorithm was highly accurate compared to human scoring of sentiment. An analysis of bias-related social media discourse demonstrated that health professional tweets (n=555) were less neutral than those of the general population (n=6680) when discussing social issues on professionally associated accounts (χ2(2, n=555)=35.455; P<.001), suggesting that health professionals attach more sentiment to their posts on Twitter than the general population does. CONCLUSIONS The finding that health professionals are more likely to show and convey emotions regarding equity-related issues on social media has implications for teaching and learning about sensitive topics in health professions education. Such emotions must therefore be considered in the design, delivery, and evaluation of equity- and bias-related education.
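The reported comparison is a chi-square test on sentiment-category counts between the two groups. In the sketch below the three-category counts are invented (only the group sizes, 555 and 6680, follow the abstract), so the statistic will not match the published value.

```python
# Chi-square test on sentiment category counts (negative / neutral /
# positive) for professionals vs the general population. Counts are
# invented; only the row totals mirror the abstract.
from scipy.stats import chi2_contingency

#                neg   neut  pos
professionals = [160, 230, 165]     # sums to 555
general = [1500, 3800, 1380]        # sums to 6680

chi2, p, dof, _ = chi2_contingency([professionals, general])
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")  # dof = 2 for a 2x3 table
```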
Affiliation(s)
- Javeed Sukhera
- Institute of Living, Hartford Hospital, Hartford, CT, United States
- Hasan Ahmed
- Centre for Education Research and Innovation, Western University, London, ON, Canada
13
Davids J, Ashrafian H. AIM and mHealth, Smartphones and Apps. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_242]
14
A Machine Learning Model for Predicting Unscheduled 72 h Return Visits to the Emergency Department by Patients with Abdominal Pain. Diagnostics (Basel) 2021;12:82. [PMID: 35054249] [PMCID: PMC8775134] [DOI: 10.3390/diagnostics12010082]
Abstract
Seventy-two-hour unscheduled return visits (URVs) are a key clinical index for evaluating the quality of care in emergency departments (EDs). This study aimed to develop a machine learning model to predict 72 h URVs for ED patients with abdominal pain. Electronic health records were collected from the Chang Gung Research Database (CGRD) for 25,151 ED visits by patients with abdominal pain, and a total of 617 features were used for analysis. We used supervised machine learning models, namely logistic regression (LR), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGB), and a voting classifier (VC), to predict URVs. The VC model achieved more favorable overall performance than the other models (AUROC: 0.74; 95% confidence interval (CI), 0.69–0.76; sensitivity, 0.39; specificity, 0.89; F1 score, 0.25). The reduced VC model achieved performance (AUROC: 0.72; 95% CI, 0.69–0.74) comparable to that of the full model using all clinical features. The VC model exhibited the most favorable performance in predicting 72 h URVs for patients with abdominal pain, for both the all-features and reduced-features models. Application of the VC model in the clinical setting, after validation, may help physicians make accurate decisions and decrease URVs.
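A voting classifier simply pools the predictions of its constituent models. The sketch below assembles the ensemble the abstract lists, with scikit-learn's GradientBoostingClassifier standing in for XGBoost to avoid an external dependency, and synthetic data in place of the CGRD records.

```python
# Soft-voting ensemble of the listed learners on synthetic, imbalanced data.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=40, weights=[0.9],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

vc = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC(probability=True)),   # needed for soft voting
                ("rf", RandomForestClassifier()),
                ("gb", GradientBoostingClassifier())],  # XGBoost stand-in
    voting="soft",
)
vc.fit(X_tr, y_tr)
print("AUROC:", round(roc_auc_score(y_te, vc.predict_proba(X_te)[:, 1]), 2))
```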
15
Richardson ML, Garwood ER, Lee Y, Li MD, Lo HS, Nagaraju A, Nguyen XV, Probyn L, Rajiah P, Sin J, Wasnik AP, Xu K. Noninterpretive Uses of Artificial Intelligence in Radiology. Acad Radiol 2021;28:1225-1235. [PMID: 32059956] [DOI: 10.1016/j.acra.2020.01.012]
Abstract
We deem a computer to exhibit artificial intelligence (AI) when it performs a task that would normally require intelligent action by a human. Much of the recent excitement about AI in the medical literature has revolved around the ability of AI models to recognize anatomy and detect pathology on medical images, sometimes at the level of expert physicians. However, AI can also be used to solve a wide range of noninterpretive problems that are relevant to radiologists and their patients. This review summarizes some of the newer noninterpretive uses of AI in radiology.
Affiliation(s)
- Elisabeth R Garwood
- Department of Radiology, University of Massachusetts, Worcester, Massachusetts
- Yueh Lee
- Department of Radiology, University of North Carolina, Chapel Hill, North Carolina
- Matthew D Li
- Department of Radiology, Harvard Medical School/Massachusetts General Hospital, Boston, Massachusetts
- Hao S Lo
- Department of Radiology, University of Washington, Seattle, Washington
- Arun Nagaraju
- Department of Radiology, University of Chicago, Chicago, Illinois
- Xuan V Nguyen
- Department of Radiology, The Ohio State University Wexner Medical Center, Columbus, Ohio
- Linda Probyn
- Department of Radiology, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, Ontario
- Prabhakar Rajiah
- Department of Radiology, University of Texas Southwestern Medical Center, Dallas, Texas
- Jessica Sin
- Department of Radiology, Dartmouth-Hitchcock Medical Center, Lebanon, New Hampshire
- Ashish P Wasnik
- Department of Radiology, University of Michigan, Ann Arbor, Michigan
- Kali Xu
- Department of Medicine, Santa Clara Valley Medical Center, Santa Clara, California
16
Jones KA, Jani KH, Jones GW, Nye ML, Duff JP, Cheng A, Lin Y, Davidson J, Chatfield J, Tofil N, Gaither S, Kessler DO. Using natural language processing to compare task-specific verbal cues in coached versus noncoached cardiac arrest teams during simulated pediatrics resuscitation. AEM Educ Train 2021;5:e10707. [PMID: 34926971] [PMCID: PMC8643156] [DOI: 10.1002/aet2.10707]
Abstract
OBJECTIVES Coaches improve cardiopulmonary resuscitation (CPR) outcomes in real-world and simulated settings. To explore verbal feedback that targets CPR quality, we applied natural language processing (NLP) methodologies to transcripts from a published pediatric randomized trial (coach vs. no coach in simulated CPR). Study objectives included determining any differences by trial arm in (1) overall communication and (2) metrics over minutes of CPR, and (3) exploring overall frequencies and temporal patterns according to degrees of CPR excellence. METHODS A human-generated transcription service produced 40 team transcripts. Automated text search with manual review assigned each transcript utterance a semantic category, a word count, and the presence of verbal cues for general CPR, compression depth or rate, or positive feedback. The resulting cue counts per minute (CPM) were mapped to CPR quality based on compression rate and depth per minute. CPMs were compared across trial arms and over the 18 minutes of CPR. Adaptation to excellence was analyzed across four patterns of CPR excellence determined by k-shape methods. RESULTS Overall, coached teams experienced more rate-directive, depth-directive, and positive verbal cues than noncoached teams. The frequency of coaches' depth cues changed over the minutes of CPR, indicating adaptation. In coached teams, the number of depth-directive cues differed among the four patterns of CPR excellence. Noncoached teams experienced fewer utterances by type, with no adaptation over time or to CPR performance. CONCLUSION NLP-extracted verbal metrics and their patterns in resuscitation sessions provide insight into the communication patterns and skills used by CPR coaches and other team members. This could help to further optimize CPR training, feedback, excellence, and outcomes.
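Counting directive cues per minute from timestamped utterances can be as simple as the sketch below; the keyword patterns and utterances are illustrative inventions, not the study's actual coding scheme, which combined automated search with human review.

```python
# Toy cue extraction: tally depth- and rate-directive cues per minute
# of CPR from (minute, text) utterance pairs.
import re

DEPTH_CUES = re.compile(r"\b(deeper|depth|harder|push down)\b", re.I)
RATE_CUES = re.compile(r"\b(faster|slower|rate|speed)\b", re.I)

utterances = [  # (minute of CPR, text) - invented examples
    (0, "Push harder, I need more depth"),
    (0, "Good compressions, keep going"),
    (1, "A little faster please"),
]

cpm = {}
for minute, text in utterances:
    counts = cpm.setdefault(minute, {"depth": 0, "rate": 0})
    counts["depth"] += len(DEPTH_CUES.findall(text))
    counts["rate"] += len(RATE_CUES.findall(text))
print(cpm)  # {0: {'depth': 2, 'rate': 0}, 1: {'depth': 0, 'rate': 1}}
```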
Affiliation(s)
- Kai A. Jones
- Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Karan H. Jani
- Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Megan L. Nye
- Department of Emergency Medicine, Columbia University, New York, New York, USA
- Jonathan P. Duff
- Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Adam Cheng
- Department of Pediatrics, University of Calgary, Alberta Children's Hospital, Calgary, Alberta, Canada
- Department of Emergency Medicine, University of Calgary, Calgary, Alberta, Canada
- Yiqun Lin
- Department of Pediatrics, University of Calgary, Alberta Children's Hospital, Calgary, Alberta, Canada
- Jennifer Davidson
- Department of Pediatrics, University of Calgary, Alberta Children's Hospital, Calgary, Alberta, Canada
- Jenny Chatfield
- Department of Pediatrics, University of Calgary, Alberta Children's Hospital, Calgary, Alberta, Canada
- Nancy Tofil
- Department of Pediatrics, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Stacy Gaither
- Department of Pediatrics, University of Alabama at Birmingham, Birmingham, Alabama, USA
- David O. Kessler
- Department of Emergency Medicine, Columbia University, New York, New York, USA
17
Cianciolo AT, LaVoie N, Parker J. Machine Scoring of Medical Students' Written Clinical Reasoning: Initial Validity Evidence. Acad Med 2021;96:1026-1035. [PMID: 33637657] [PMCID: PMC8243833] [DOI: 10.1097/acm.0000000000004010]
Abstract
PURPOSE Developing medical students' clinical reasoning requires a structured longitudinal curriculum with frequent targeted assessment and feedback. Performance-based assessments, which have the strongest validity evidence, are currently not feasible for this purpose because they are time-intensive to score. This study explored the potential of using machine learning technologies to score one such assessment: the diagnostic justification essay. METHOD From May to September 2018, machine scoring algorithms were trained to score a sample of 700 diagnostic justification essays written by 414 third-year medical students from the Southern Illinois University School of Medicine classes of 2012-2017. The algorithms applied semantically based natural language processing metrics (e.g., coherence, readability) to assess essay quality on 4 criteria (differential diagnosis, recognition and use of findings, workup, and thought process); the scores for these criteria were summed to create overall scores. Three sources of validity evidence (response process, internal structure, and association with other variables) were examined. RESULTS Machine scores correlated more strongly with faculty ratings than faculty ratings did with each other (machine: .28-.53; faculty: .13-.33) and were less case-specific. Machine scores and faculty ratings were similarly correlated with medical knowledge, clinical cognition, and prior diagnostic justification. Machine scores were more strongly associated with clinical communication than were faculty ratings (.43 vs .31). CONCLUSIONS Machine learning technologies may be useful for assessing medical students' long-form written clinical reasoning. Semantically based machine scoring may capture the communicative aspects of clinical reasoning better than faculty ratings, offering the potential for automated assessment that generalizes to the workplace. These results underscore the potential of machine scoring to capture an aspect of clinical reasoning performance that is difficult to assess with traditional analytic scoring methods. Additional research should investigate the generalizability of machine scoring and examine its acceptability to trainees and educators.
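The scoring engine rests on semantic metrics such as coherence. As a crude, assumption-laden stand-in for such a metric (the study's actual metrics are not reproduced here), the sketch below scores an essay's coherence as the mean TF-IDF cosine similarity of adjacent sentences.

```python
# Crude coherence proxy: mean cosine similarity of adjacent sentences
# under TF-IDF. A toy stand-in, not the study's metric.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def coherence_proxy(sentences: list[str]) -> float:
    vectors = TfidfVectorizer().fit_transform(sentences)
    sims = [cosine_similarity(vectors[i], vectors[i + 1])[0, 0]
            for i in range(vectors.shape[0] - 1)]
    return sum(sims) / len(sims)

essay = [
    "The leading diagnosis is community acquired pneumonia.",
    "Pneumonia is supported by fever, productive cough, and focal crackles.",
    "A chest radiograph and sputum culture will confirm the diagnosis.",
]
print(round(coherence_proxy(essay), 2))
```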
Affiliation(s)
- Anna T Cianciolo
- A.T. Cianciolo is associate professor of medical education, Southern Illinois University School of Medicine, Springfield, Illinois; ORCID: https://orcid.org/0000-0001-5948-9304
- Noelle LaVoie
- N. LaVoie is president, Parallel Consulting, Petaluma, California; ORCID: https://orcid.org/0000-0002-7013-3568
- James Parker
- J. Parker is senior research associate, Parallel Consulting, Petaluma, California
18
Davids J, Ashrafian H. AIM and mHealth, Smartphones and Apps. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_242-1]
19
Jani KH, Jones KA, Jones GW, Amiel J, Barron B, Elhadad N. Machine learning to extract communication and history-taking skills in OSCE transcripts. Med Educ 2020;54:1159-1170. [PMID: 32776345] [DOI: 10.1111/medu.14347]
Abstract
OBJECTIVES Objective Structured Clinical Examinations (OSCEs) allow assessment of, and provide feedback to, medical students. Clinical examiners and standardised patients (SPs) typically complete itemised checklists and global scoring scales, which have known shortcomings. In this study, we applied machine learning (ML) to label communication skills and interview content information in OSCE transcripts and compared several ML methodologies by performance and transferability. METHODS One hundred and twenty-one transcripts of two OSCE scenarios were manually annotated per utterance across 19 communication skills and content areas. Utterances were converted to two types of numeric sentence vector representations and were paired with three types of ML algorithms. First, ML models (MLMs) were evaluated using five-fold cross-validation on all transcripts in one scenario to generate precision, recall, and their harmonic mean, the F1 score. Second, MLMs were trained on all 101 transcripts from scenario 1 and tested for transferability on 20 scenario 2 transcripts. RESULTS Performance testing in the cross-validation demonstrated relatively high F1 scores: median 0.87, range 0.53-0.98 across all 19 labels. Transferability testing also demonstrated success: median F1 0.76, range 0.46-0.97. The combination of a bidirectional long short-term memory neural network (biLSTM) algorithm with GenSen numeric sentence vector representations was associated with greater F1 scores across both performance and transferability (P < .005). CONCLUSIONS We report the first application of ML in the context of student-SP OSCEs. We demonstrated that several MLMs automatically labelled OSCE transcripts for a range of interview content and some clinical communication skills, and that some MLMs achieved greater performance and transferability. Optimised MLMs could provide automated and accurate assessment of OSCEs, with the potential to track student progress and identify areas for further practice.
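A biLSTM utterance classifier of the kind this study favours can be outlined in a few lines of PyTorch. Everything below (vocabulary size, dimensions, and the use of a plain embedding layer rather than GenSen sentence vectors) is a toy assumption, not the authors' configuration.

```python
# Toy biLSTM utterance classifier: embed tokens, run a bidirectional
# LSTM, and score 19 communication/content labels from the concatenated
# final forward and backward hidden states.
import torch
import torch.nn as nn

class UtteranceBiLSTM(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden=128, n_labels=19):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_labels)  # one logit per label

    def forward(self, token_ids):  # (batch, seq_len)
        _, (h, _) = self.lstm(self.embed(token_ids))
        final = torch.cat([h[-2], h[-1]], dim=-1)  # fwd + bwd final states
        return self.head(final)  # (batch, n_labels) logits

model = UtteranceBiLSTM()
logits = model(torch.randint(0, 1000, (4, 12)))  # 4 utterances, 12 tokens
print(logits.shape)  # torch.Size([4, 19])
```

With multi-label annotations, these logits would typically be trained with a binary cross-entropy loss so each of the 19 labels is predicted independently.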
Affiliation(s)
- Karan H Jani
- Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Kai A Jones
- Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Jonathan Amiel
- Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Beth Barron
- Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Noémie Elhadad
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
20
Tolsgaard MG, Boscardin CK, Park YS, Cuddy MM, Sebok-Syer SS. The role of data science and machine learning in Health Professions Education: practical applications, theoretical contributions, and epistemic beliefs. Adv Health Sci Educ Theory Pract 2020;25:1057-1086. [PMID: 33141345] [DOI: 10.1007/s10459-020-10009-8]
Abstract
Data science is an interdisciplinary field that uses computer-based algorithms and methods to gain insights from large and often complex datasets. Data science, which includes artificial intelligence techniques such as machine learning (ML), has been credited with the promise to transform Health Professions Education (HPE) by offering approaches to handle big (and often messy) data. To examine this promise, we conducted a critical review to explore (1) published applications of data science and ML in the HPE literature and (2) the potential role of data science and ML in shifting theoretical and epistemological perspectives in HPE research and practice. Existing data science studies in HPE are often not informed by theory but rather oriented toward developing applications for specific problems, uses, and contexts. The most common areas currently being studied are procedural (e.g., computer-based tutoring or adaptive systems and assessment of technical skills). We found that the epistemic beliefs informing the use of data science and ML in HPE pose a challenge for existing views on what constitutes objective knowledge and on the role of human subjectivity in instruction and assessment. As a result, criticisms have emerged that the integration of data science in HPE is in danger of becoming technically driven and narrowly focused in its approach to teaching, learning, and assessment. Our findings suggest that researchers tend to formalize around an epistemological stance driven largely by the traditions of a research paradigm. Future data science studies in HPE need to involve both education scientists and data scientists to ensure mutual advancement in the development of educational theory and practical applications. This may be one of the most important tasks in the integration of data science and ML in HPE research in the years to come.
Affiliation(s)
- Martin G Tolsgaard
- Copenhagen Academy for Medical Education and Simulation (CAMES), Rigshospitalet, Copenhagen, Denmark
- Department of Obstetrics, Centre for Fetal Medicine, Copenhagen University Hospital Rigshospitalet, Copenhagen, Denmark
- Christy K Boscardin
- Department of Medicine, Department of Anesthesia, University of California San Francisco, San Francisco, CA, USA
- Yoon Soo Park
- Massachusetts General Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Monica M Cuddy
- National Board of Medical Examiners, Philadelphia, PA, USA
- Stefanie S Sebok-Syer
- Department of Emergency Medicine, Stanford University School of Medicine, Stanford University, Palo Alto, CA, USA
21
Abstract
The tremendous and rapid technological advances that humans have achieved in the last decade have decisively impacted how surgical tasks are performed in the operating room (OR). As a high-tech work environment, the contemporary OR has incorporated novel computational systems into the clinical workflow, aiming to optimize processes and support the surgical team. Artificial intelligence (AI) is increasingly important for surgical decision making, helping to address diverse sources of information, such as patient risk factors, anatomy, disease natural history, patient values, and cost, and assisting surgeons and patients in making better predictions regarding the consequences of surgical decisions. In this review, we discuss current initiatives that are using AI in cardiothoracic surgery and surgical care in general. We also address the future of AI and how high-tech ORs will leverage human-machine teaming to optimize performance and enhance patient safety.
Affiliation(s)
- Roger D Dias
- STRATUS Center for Medical Simulation, Brigham Health, Boston, MA, USA
- Department of Emergency Medicine, Harvard Medical School, Boston, MA, USA
- Julie A Shah
- Laboratory of Computer Science and Artificial Intelligence, Massachusetts Institute of Technology, Cambridge, MA, USA
- Marco A Zenati
- Laboratory of Medical Robotics and Computer Assisted Surgery (MRCAS), Division of Cardiothoracic Surgery, VA Boston Healthcare System, Boston, MA, USA
- Department of Surgery, Harvard Medical School, Boston, MA, USA
22
Rencic J, Schuwirth LWT, Gruppen LD, Durning SJ. A situated cognition model for clinical reasoning performance assessment: a narrative review. Diagnosis (Berl) 2020;7:227-240. [PMID: 32352400] [DOI: 10.1515/dx-2019-0106]
Abstract
Background Clinical reasoning performance assessment is challenging because it is a complex, multi-dimensional construct. In addition, clinical reasoning performance can be affected by contextual factors, leading to significant variation in performance. This phenomenon, called context specificity, has been described by social cognitive theories. Situated cognition theory, one of these social cognitive theories, posits that cognition emerges from the complex interplay of human beings with each other and the environment. It has been used as a valuable conceptual framework for exploring context specificity in clinical reasoning and its assessment. We developed a conceptual model of clinical reasoning performance assessment based on situated cognition theory. In this paper, we use situated cognition theory and the conceptual model to explore how this lens alters the interpretation of articles or provides additional insights into the interactions between the assessee, patient, rater, environment, assessment method, and task. Methods We culled 17 articles from a systematic literature search of clinical reasoning performance assessment that explicitly or implicitly demonstrated a situated cognition perspective, providing an "enriched" sample with which to explore how contextual factors affect clinical reasoning performance assessment. Results We found evidence for dyadic, triadic, and four-way interactions among different contextual factors, some of which led to dramatic changes in the assessment of clinical reasoning performance even when knowledge requirements were not significantly different. Conclusions The analysis of the selected articles highlighted the value of a situated cognition perspective in understanding variations in clinical reasoning performance assessment. Prospective studies that evaluate the impact of modifying various contextual factors, while holding others constant, can provide deeper insights into the mechanisms by which context affects clinical reasoning performance assessment.
Collapse
Affiliation(s)
- Joseph Rencic
- Department of Medicine, Boston University School of Medicine, 72 East Concord Street, Boston, MA 02118, USA
| | - Lambert W T Schuwirth
- Prideaux Centre for Research in Health Professions Education, Flinders University, Flinders, Australia
| | - Larry D Gruppen
- Department of Medical Education, University of Michigan, Ann Arbor, MI, USA
| | - Steven J Durning
- Department of Medicine, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
| |
Collapse
|
23
|
Carrillo-Larco RM, Tudor Car L, Pearson-Stuttard J, Panch T, Miranda JJ, Atun R. Machine learning health-related applications in low-income and middle-income countries: a scoping review protocol. BMJ Open 2020; 10:e035983. [PMID: 32393612 PMCID: PMC7223147 DOI: 10.1136/bmjopen-2019-035983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/25/2019] [Revised: 04/17/2020] [Accepted: 04/20/2020] [Indexed: 12/03/2022] Open
Abstract
INTRODUCTION Machine learning (ML) has been used in biomedical research, and recently in clinical and public health research. However, much of the available evidence comes from high-income countries, whose different health profiles challenge the application of this research to low/middle-income countries (LMICs). It is largely unknown what ML applications are available for LMICs that can support and advance clinical medicine and public health. We aim to address this gap by conducting a scoping review of health-related ML applications in LMICs. METHODS AND ANALYSIS This scoping review will follow the methodology proposed by Levac et al. The search strategy is informed by recent systematic reviews of health-related ML applications. We will search Embase, Medline, and Global Health (through Ovid), Cochrane, and Google Scholar; we will present the date of our searches in the final review. Titles and abstracts will be screened independently by two reviewers; the full texts of selected reports will likewise be assessed independently by two reviewers. Reports will be included if they are primary research in which data have been analysed, ML techniques have been applied to data from LMICs, and the work aimed to improve health-related outcomes. We will synthesise the information following evidence mapping recommendations. ETHICS AND DISSEMINATION The review will provide a comprehensive list of health-related ML applications in LMICs. The results will be disseminated through scientific publications. We also plan to launch a website where ML models can be hosted so that researchers, policymakers, and the general public can readily access them.
Collapse
Affiliation(s)
- Rodrigo M Carrillo-Larco
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
- CRONICAS Centre of Excellence in Chronic Diseases, Universidad Peruana Cayetano Heredia, Lima, Peru
| | - Lorainne Tudor Car
- Family Medicine and Primary Care, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- Department of Primary Care and Public Health, School of Public Health, Imperial College London, London, UK
| | - Jonathan Pearson-Stuttard
- Department of Epidemiology and Biostatistics and MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
| | | | - J Jaime Miranda
- CRONICAS Centre of Excellence in Chronic Diseases, Universidad Peruana Cayetano Heredia, Lima, Peru
- Facultad de Medicina "Alberto Hurtado", Universidad Peruana Cayetano Heredia, Lima, Peru
| | - Rifat Atun
- Harvard T.H. Chan School of Public Health and Harvard Medical School, Harvard University, Cambridge, Massachusetts, USA
| |
Collapse
|
24
|
Sparrow R, Hatherley J. High Hopes for "Deep Medicine"? AI, Economics, and the Future of Care. Hastings Cent Rep 2020; 50:14-17. [DOI: 10.1002/hast.1079] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
|
25
|
Annapureddy AR, Angraal S, Caraballo C, Grimshaw A, Huang C, Mortazavi BJ, Krumholz HM. The National Institutes of Health funding for clinical research applying machine learning techniques in 2017. NPJ Digit Med 2020; 3:13. [PMID: 32025574 PMCID: PMC6994580 DOI: 10.1038/s41746-020-0223-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2019] [Accepted: 12/03/2019] [Indexed: 11/22/2022] Open
Abstract
Machine learning (ML) techniques have become ubiquitous and indispensable for solving intricate problems in most disciplines. To determine the extent of funding for clinical research projects applying ML techniques by the National Institutes of Health (NIH) in 2017, we searched the NIH Research Portfolio Online Reporting Tools Expenditures and Results (RePORTER) system using relevant keywords. We identified 535 projects, which together received a total of $264 million, accounting for 2% of the NIH extramural budget for clinical research.
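The abstract's figures support a quick consistency check; the sketch below reproduces the arithmetic from the stated numbers (the implied total budget and per-project average are inferences from those figures, not values reported by the authors).

    # Back-of-the-envelope arithmetic from the abstract's stated figures.
    ml_projects = 535          # ML-related clinical research projects in 2017
    ml_funding = 264e6         # $264 million total across those projects
    budget_share = 0.02        # 2% of NIH extramural clinical research funding

    implied_total = ml_funding / budget_share   # ~$13.2 billion overall
    avg_per_project = ml_funding / ml_projects  # ~$0.49 million per project
    print(f"Implied extramural clinical research budget: ${implied_total / 1e9:.1f}B")
    print(f"Average funding per ML project: ${avg_per_project / 1e6:.2f}M")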
Collapse
Affiliation(s)
- Amarnath R. Annapureddy
- Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, CT, USA
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
| | - Suveen Angraal
- Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, CT, USA
- Department of Internal Medicine, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
| | - Cesar Caraballo
- Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, CT, USA
| | - Alyssa Grimshaw
- Harvey Cushing/John Hay Whitney Medical Library, Yale University, New Haven, CT, USA
| | - Chenxi Huang
- Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, CT, USA
| | - Bobak J. Mortazavi
- Department of Computer Science & Engineering, Texas A&M University, College Station, TX, USA
| | - Harlan M. Krumholz
- Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, CT, USA
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Department of Health Policy and Management, Yale School of Public Health, New Haven, CT, USA
| |
Collapse
|
26
|
Affiliation(s)
- Katrina A Armstrong
- Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts
- Harvard Medical School, Boston, Massachusetts
| | - Eileen E Reynolds
- Harvard Medical School, Boston, Massachusetts
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
| |
Collapse
|
27
|
Andolsek KM. One Small Step for Step 1. Acad Med 2019; 94:309-313. [PMID: 30570496 DOI: 10.1097/acm.0000000000002560] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/13/2023]
Abstract
Step 1 of the United States Medical Licensing Examination (USMLE) is a multiple-choice exam primarily measuring knowledge of foundational sciences and organ systems. The test was psychometrically designed as pass/fail so that licensing boards could decide whether physician candidates meet the minimum standards required for licensure to practice. Faced with an increasing number of applicants to review, residency program directors commonly use Step 1 scores to screen applicants, even though the exam was not intended for this purpose. Elsewhere in this issue, Chen and colleagues describe the "Step 1 climate" that has evolved in undergraduate medical education, affecting learning, diversity, and well-being. Addressing issues related to Step 1 is a challenge. Various stakeholders frequently spend more time demonizing one another than listening, addressing what lies under their respective control, and working collaboratively toward better long-term solutions. In this Invited Commentary, the author suggests how different constituencies can act now to improve the situation while aspirational future solutions are developed. One suggestion is to report Step 1 and Step 2 Clinical Knowledge scores as pass/fail and Step 2 Clinical Skills scores numerically. Any changes must be implemented carefully, mindful of the kind of unintended consequences that have befallen Step 1. The upcoming invitational conference on USMLE scoring (InCUS) will bring together representatives from all stakeholders. Until large-scale reform arrives, all stakeholders should commit to taking (at least) one small step toward fixing Step 1 today.
Collapse
Affiliation(s)
- Kathryn M Andolsek
- K.M. Andolsek is professor, Department of Community and Family Medicine, and assistant dean for premedical education, Duke University School of Medicine, Durham, North Carolina; ORCID: https://orcid.org/0000-0001-7994-3869
| |
Collapse
|