1
Sciannameo V, Pagliari DJ, Urru S, Grimaldi P, Ocagli H, Ahsani-Nasab S, Comoretto RI, Gregori D, Berchialla P. Information extraction from medical case reports using OpenAI InstructGPT. Computer Methods and Programs in Biomedicine 2024; 255:108326. [PMID: 39029416 DOI: 10.1016/j.cmpb.2024.108326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Revised: 06/15/2023] [Accepted: 07/11/2024] [Indexed: 07/21/2024]
Abstract
BACKGROUND AND OBJECTIVE Researchers commonly use automated solutions such as Natural Language Processing (NLP) systems to extract clinical information from large volumes of unstructured data. However, clinical text's poor semantic structure and domain-specific vocabulary can make it challenging to develop a one-size-fits-all solution. Large Language Models (LLMs), such as OpenAI's Generative Pre-Trained Transformer 3 (GPT-3), offer a promising solution for capturing and standardizing unstructured clinical information. This study evaluated the performance of InstructGPT, a family of models derived from LLM GPT-3, to extract relevant patient information from medical case reports and discussed the advantages and disadvantages of LLMs versus dedicated NLP methods.
METHODS In this paper, 208 articles related to case reports of foreign body injuries in children were identified by searching PubMed, Scopus, and Web of Science. A reviewer manually extracted information on sex, age, the object that caused the injury, and the injured body part for each patient to build a gold standard to compare the performance of InstructGPT.
RESULTS InstructGPT achieved high accuracy in classifying the sex, age, object and body part involved in the injury, with 94%, 82%, 94% and 89%, respectively. When excluding articles for which InstructGPT could not retrieve any information, the accuracy for determining the child's sex and age improved to 97%, and the accuracy for identifying the injured body part improved to 93%. InstructGPT was also able to extract information from non-English language articles.
CONCLUSIONS The study highlights that LLMs have the potential to eliminate the necessity for task-specific training (zero-shot extraction), allowing the retrieval of clinical information from unstructured natural language text, particularly from published scientific literature like case reports, by directly utilizing the PDF file of the article without any pre-processing and without requiring any technical expertise in NLP or Machine Learning. The diverse nature of the corpus, which includes articles written in languages other than English, some of which contain a wide range of clinical details while others lack information, adds to the strength of the study.
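The accuracy figures in this abstract come from comparing extracted fields with a manually built gold standard, once over all articles and once after excluding those for which nothing was retrieved. A minimal Python sketch of that kind of comparison is given here. The field names, the `field_accuracy` helper, the toy records, and the choice to treat a missing value for a field as "not retrieved" are all illustrative assumptions, not the authors' evaluation code.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def _norm(value: str) -> str:
    """Normalise a value for a case-insensitive exact-match comparison."""
    return value.strip().lower()


def field_accuracy(gold: List[Record], predicted: List[Record],
                   field: str, skip_missing: bool = False) -> Optional[float]:
    """Share of reports whose predicted value for `field` matches the gold value.

    With skip_missing=True, reports where the prediction is None are left out
    of the denominator (an assumed reading of "could not retrieve").
    """
    pairs = [(g[field], p.get(field)) for g, p in zip(gold, predicted)]
    if skip_missing:
        pairs = [(g, p) for g, p in pairs if p is not None]
    if not pairs:
        return None
    hits = sum(1 for g, p in pairs
               if g is not None and p is not None and _norm(g) == _norm(p))
    return hits / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    gold = [
        {"sex": "male", "object": "coin"},
        {"sex": "female", "object": "button battery"},
        {"sex": "male", "object": "peanut"},
    ]
    predicted = [
        {"sex": "Male", "object": "coin"},
        {"sex": None, "object": "battery"},
        {"sex": "male", "object": "peanut"},
    ]
    for field in ("sex", "object"):
        print(field,
              field_accuracy(gold, predicted, field),
              field_accuracy(gold, predicted, field, skip_missing=True))
```

Exact string matching is the simplest possible scoring rule; a real evaluation of free-text fields such as the object or body part would likely need a more tolerant comparison.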
Affiliation(s)
- Veronica Sciannameo
- Centre for Biostatistics, Epidemiology and Public Health, Department of Clinical and Biological Sciences, University of Turin, Regione Gonzole 10, Orbassano 10043, Italy
- Sara Urru
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padua, Italy
- Piercesare Grimaldi
- Department of Public Health and Pediatrics, University of Torino, Via Santena 5 bis, Torino 10126, Italy
- Honoria Ocagli
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padua, Italy
- Sara Ahsani-Nasab
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padua, Italy
- Rosanna Irene Comoretto
- Department of Public Health and Pediatrics, University of Torino, Via Santena 5 bis, Torino 10126, Italy
- Dario Gregori
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padua, Italy
- Paola Berchialla
- Centre for Biostatistics, Epidemiology and Public Health, Department of Clinical and Biological Sciences, University of Turin, Regione Gonzole 10, Orbassano 10043, Italy.
2
Kreimeyer K, Foster M, Pandey A, Arya N, Halford G, Jones SF, Forshee R, Walderhaug M, Botsis T. Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review. J Biomed Inform 2017; 73:14-29. [PMID: 28729030 DOI: 10.1016/j.jbi.2017.07.012] [Citation(s) in RCA: 290] [Impact Index Per Article: 41.4] [Received: 03/14/2017] [Revised: 06/07/2017] [Accepted: 07/14/2017] [Indexed: 12/24/2022]
Abstract
We followed a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses to identify existing clinical natural language processing (NLP) systems that generate structured information from unstructured free text. Seven literature databases were searched with a query combining the concepts of natural language processing and structured data capture. Two reviewers screened all records for relevance during two screening phases, and information about clinical NLP systems was collected from the final set of papers. A total of 7149 records (after removing duplicates) were retrieved and screened, and 86 were determined to fit the review criteria. These papers contained information about 71 different clinical NLP systems, which were then analyzed. The NLP systems address a wide variety of important clinical and research tasks. Certain tasks are well addressed by the existing systems, while others remain as open challenges that only a small number of systems attempt, such as extraction of temporal information or normalization of concepts to standard terminologies. This review has identified many NLP systems capable of processing clinical free text and generating structured output, and the information collected and evaluated here will be important for prioritizing development of new approaches for clinical NLP.
Affiliation(s)
- Kory Kreimeyer
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States.
- Matthew Foster
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- Abhishek Pandey
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- Nina Arya
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- Gwendolyn Halford
- FDA Library, US Food and Drug Administration, Silver Spring, MD, United States
- Sandra F Jones
- Cancer Surveillance Branch, Division of Cancer Prevention and Control, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, United States
- Richard Forshee
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- Mark Walderhaug
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- Taxiarchis Botsis
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
3
Storey J. Factors affecting the adoption of quality assurance technologies in healthcare. J Health Organ Manag 2013; 27:498-519. [PMID: 24003634 DOI: 10.1108/jhom-12-2011-0138] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
PURPOSE In the light of public concern and of strong policy emphasis on quality and safety in the nursing care of patients in hospital settings, this paper aims to focus on the factors affecting the adoption of innovative quality assurance technologies.
DESIGN/METHODOLOGY/APPROACH Two sets of complementary literature were mined for key themes. Next, new empirical insights were sought. Data gathering was conducted in three phases. The first involved contact with NHS Technology Hubs and other institutions which had insights into leading centres in quality assurance technologies. The second phase was a series of telephone interviews with lead nurses in those hospitals which were identified in the first phase as comprising the leading centres. The third phase comprised a series of face to face interviews with innovators and adopters of healthcare quality assurance technologies in five hospital trusts.
FINDINGS There were three main sets of findings. First, despite the strong policy push and the templates established at national level, there were significant variations in the nature and robustness of the quality assurance toolkits that were developed, adapted and adopted. Second, in most of the adopting cases there were important obstacles to the full adoption of the toolkits that were designed. Third, the extent and nature of the ambition of the developers varied dramatically - some wished to see their work impacting widely across the health service; others had a number of different reasons for wanting to restrict the impact of their work.
ORIGINALITY/VALUE The general concerns about front-line care and the various inquiries into care quality failures emphasise the need for improved and consistent care quality assurance methodologies and practice. The technology adoption literature gives only partial insight into the nature of the challenges; this paper offers specific insights into the factors inhibiting the full adoption of quality assurance technologies in ward-based care.
Affiliation(s)
- John Storey
- The Open University Business School, The Open University, Milton Keynes, UK.
4
Carmona-Cejudo JM, Hortas ML, Baena-García M, Lana-Linati J, González C, Redondo M, Morales-Bueno R. DB4US: A Decision Support System for Laboratory Information Management. Interact J Med Res 2012; 1:e16. [PMID: 23608745 PMCID: PMC3626127 DOI: 10.2196/ijmr.2126] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 03/30/2012] [Revised: 08/10/2012] [Accepted: 09/21/2012] [Indexed: 12/21/2022] Open
Abstract
Background Until recently, laboratory automation has focused primarily on improving hardware. Future advances are concentrated on intelligent software since laboratories performing clinical diagnostic testing require improved information systems to address their data processing needs. In this paper, we propose DB4US, an application that automates the processing of information related to laboratory quality indicators. Currently, there is a lack of ready-to-use management quality measures. This application addresses this deficiency through the extraction, consolidation, statistical analysis, and visualization of data related to the use of demographics, reagents, and turn-around times. The design and implementation issues, as well as the technologies used for the implementation of this system, are discussed in this paper.
Objective To develop a general methodology that integrates the computation of ready-to-use management quality measures and a dashboard to easily analyze the overall performance of a laboratory, as well as automatically detect anomalies or errors. The novelty of our approach lies in the application of integrated web-based dashboards as an information management system in hospital laboratories.
Methods We propose a new methodology for laboratory information management based on the extraction, consolidation, statistical analysis, and visualization of data related to demographics, reagents, and turn-around times, offering a dashboard-like web user interface to the laboratory manager. The methodology comprises a unified data warehouse that stores and consolidates multidimensional data from different data sources. The methodology is illustrated through the implementation and validation of DB4US, a novel web application based on this methodology that constructs an interface to obtain ready-to-use indicators, and offers the possibility to drill down from high-level metrics to more detailed summaries. The offered indicators are calculated beforehand so that they are ready to use when the user needs them. The design is based on a set of different parallel processes to precalculate indicators. The application displays information related to tests, requests, samples, and turn-around times. The dashboard is designed to show the set of indicators on a single screen.
Results DB4US was deployed for the first time in the Hospital Costa del Sol in 2008. In our evaluation we show the positive impact of this methodology for laboratory professionals, since the use of our application has reduced the time needed for the elaboration of the different statistical indicators and has also provided information that has been used to optimize the usage of laboratory resources by the discovery of anomalies in the indicators. DB4US users benefit from Internet-based communication of results, since this information is available from any computer without having to install any additional software.
Conclusions The proposed methodology and the accompanying web application, DB4US, automate the processing of information related to laboratory quality indicators and offer a novel approach for managing laboratory-related information, benefiting from an Internet-based communication mechanism. The application of this methodology has been shown to improve the usage of time, as well as other laboratory resources.
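The abstract describes indicators such as turn-around times being calculated beforehand so that the dashboard can display them immediately. As a rough illustration of that idea only, the Python sketch below groups laboratory requests by test and precomputes summary values. The record layout (`test`, `received`, `reported`), the function names, and the chosen statistics are hypothetical and are not taken from DB4US.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median
from typing import Dict, List


def turnaround_minutes(received: datetime, reported: datetime) -> float:
    """Minutes between sample reception and result reporting."""
    return (reported - received).total_seconds() / 60


def precompute_tat_indicators(requests: List[dict]) -> Dict[str, dict]:
    """Build a per-test summary once, so a dashboard can read it directly."""
    by_test = defaultdict(list)
    for req in requests:
        by_test[req["test"]].append(
            turnaround_minutes(req["received"], req["reported"]))
    return {
        test: {"n": len(values),
               "median_min": median(values),
               "max_min": max(values)}
        for test, values in by_test.items()
    }


if __name__ == "__main__":
    # Hypothetical requests, invented for the example.
    requests = [
        {"test": "glucose",
         "received": datetime(2012, 3, 1, 8, 0),
         "reported": datetime(2012, 3, 1, 8, 30)},
        {"test": "glucose",
         "received": datetime(2012, 3, 1, 9, 0),
         "reported": datetime(2012, 3, 1, 9, 50)},
        {"test": "hemogram",
         "received": datetime(2012, 3, 1, 9, 0),
         "reported": datetime(2012, 3, 1, 9, 20)},
    ]
    print(precompute_tat_indicators(requests))
```

In a setting like the one described, a table of this kind would presumably be refreshed by background processes and stored, so the user-facing page never recomputes it on request.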
5
Routine assessment of patient-reported outcomes in behavioral health: room for improvement. Qual Manag Health Care 2010; 19:70-81. [PMID: 20042935 DOI: 10.1097/qmh.0b013e3181ccbc53] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
Abstract
Outcomes assessment has become an important tool in assessing the quality of health care. To date, most quality initiatives have focused on adverse events, clinical processes, and/or cost variables. Considerably less attention has been paid to indices of clinical improvement, especially from a patient's perspective and in behavioral health settings. The relative inattention given to clinical improvement is attributable to a number of reasons, including (but not limited to) a lack of consensus regarding measures of improvement, few simple methods for data collection and analysis, and an inability to provide timely feedback. In this article, the authors describe a Web-based system designed to routinely collect quality-of-life ratings from patients in outpatient behavioral health clinics, allowing for real-time feedback at the patient level regarding clinical improvement. The system also allows for administrative evaluation of overall clinic performance. The costs and benefits of this system are discussed.
6
D'Avolio LW, Litwin MS, Rogers SO, Bui AAT. Facilitating Clinical Outcomes Assessment through the automated identification of quality measures for prostate cancer surgery. J Am Med Inform Assoc 2008; 15:341-8. [PMID: 18308980 DOI: 10.1197/jamia.m2649] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 01/25/2023] Open
Abstract
OBJECTIVES The College of American Pathologists (CAP) Category 1 quality measures, tumor stage, Gleason score, and surgical margin status, are used by physicians and cancer registrars to categorize patients into groups for clinical trials and treatment planning. This study was conducted to evaluate the effectiveness of an application designed to automatically extract these quality measures from the postoperative pathology reports of patients having undergone prostatectomies for treatment of prostate cancer.
DESIGN An application was developed with the Clinical Outcomes Assessment Toolkit that uses an information pipeline of regular expressions and support vector machines to extract CAP Category 1 quality measures. System performance was evaluated against a gold standard of 676 pathology reports from the University of California at Los Angeles Medical Center and Brigham and Women's Hospital. To evaluate the feasibility of clinical implementation, all pathology reports were gathered using administrative codes with no manual preprocessing of the data performed.
MEASUREMENTS The sensitivity, specificity, and overall accuracy of system performance were measured for all three quality measures. Performance at both hospitals was compared, and a detailed failure analysis was conducted to identify errors caused by poor data quality versus system shortcomings.
RESULTS Accuracies for Gleason score were 99.7%, tumor stage 99.1%, and margin status 97.2%, for an overall accuracy of 98.67%. System performance on data from both hospitals was comparable. Poor clinical data quality led to a decrease in overall accuracy of only 0.3% but accounted for 25.9% of the total errors.
CONCLUSION Despite differences in document format and pathologists' reporting styles, strong system performance indicates the potential of using a combination of regular expressions and support vector machines to automatically extract CAP Category 1 quality measures from postoperative prostate cancer pathology reports.
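The abstract describes a pipeline of regular expressions and support vector machines and reports an overall accuracy of 98.67%. The Python sketch below illustrates only the regular-expression stage, with a hypothetical pattern for Gleason scores written for this example (the authors' actual patterns and the SVM stage are not reproduced). It also shows that the reported overall figure matches the unweighted mean of the three per-measure accuracies quoted in the abstract.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern: "Gleason", an optional "score"/"grade" label,
# an optional ":" or "=", then "primary + secondary" single-digit grades.
GLEASON_RE = re.compile(
    r"gleason\s*(?:score|grade)?\s*[:=]?\s*(\d)\s*\+\s*(\d)",
    re.IGNORECASE,
)


def extract_gleason(report: str) -> Optional[Tuple[int, int, int]]:
    """Return (primary, secondary, total) for the first match, or None."""
    match = GLEASON_RE.search(report)
    if match is None:
        return None
    primary, secondary = int(match.group(1)), int(match.group(2))
    return primary, secondary, primary + secondary


def mean_accuracy(values: List[float]) -> float:
    """Unweighted mean of per-measure accuracies, rounded to two decimals."""
    return round(sum(values) / len(values), 2)


if __name__ == "__main__":
    # Invented report text, not from the study's corpus.
    print(extract_gleason("Prostatic adenocarcinoma, Gleason score 3 + 4 = 7."))
    print(extract_gleason("No grading reported."))
    # Per-measure accuracies quoted in the abstract.
    print(mean_accuracy([99.7, 99.1, 97.2]))  # 98.67
```

A single pattern like this would miss many real reporting styles, which is presumably why the described system combines pattern matching with a trained classifier.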
Affiliation(s)
- Leonard W D'Avolio
- Massachusetts Veterans Epidemiology Research and Information Center, Veterans Administration Hospital, Boston, MA, USA.