1
Ying L, Liu Z, Fang H, Kusko R, Wu L, Harris S, Tong W. Text summarization with ChatGPT for drug labeling documents. Drug Discov Today 2024:104018. [PMID: 38723763 DOI: 10.1016/j.drudis.2024.104018] [Received: 02/21/2024] [Revised: 04/22/2024] [Accepted: 05/02/2024] [Indexed: 05/13/2024]
Abstract
Text summarization is crucial in scientific research, drug discovery and development, regulatory review, and more. This task demands domain expertise, language proficiency, semantic prowess, and conceptual skill. The recent advent of large language models (LLMs), such as ChatGPT, offers unprecedented opportunities to automate this process. We compared ChatGPT-generated summaries with those produced by human experts using FDA drug labeling documents. The labeling contains summaries of key labeling sections, making them an ideal human benchmark to evaluate ChatGPT's summarization capabilities. Analyzing >14000 summaries, we observed that ChatGPT-generated summaries closely resembled those generated by human experts. Importantly, ChatGPT exhibited even greater similarity when summarizing drug safety information. These findings highlight ChatGPT's potential to accelerate work in critical areas, including drug safety.
Affiliation(s)
- Lan Ying
  - FDA National Center for Toxicological Research, Jefferson, AR 72079, USA
- Zhichao Liu
  - FDA National Center for Toxicological Research, Jefferson, AR 72079, USA; Nonclinical Drug Safety, Boehringer Ingelheim Pharmaceuticals, Inc, Ridgefield, CT 06877, USA
- Hong Fang
  - FDA National Center for Toxicological Research, Jefferson, AR 72079, USA
- Leihong Wu
  - FDA National Center for Toxicological Research, Jefferson, AR 72079, USA
- Stephen Harris
  - FDA National Center for Toxicological Research, Jefferson, AR 72079, USA
- Weida Tong
  - FDA National Center for Toxicological Research, Jefferson, AR 72079, USA
2
Sasaki K, Nishikawa J, Morita J. Evaluation of co-speech gestures grounded in word-distributed representation. Front Robot AI 2024; 11:1362463. [PMID: 38726067 PMCID: PMC11079185 DOI: 10.3389/frobt.2024.1362463] [Received: 12/28/2023] [Accepted: 03/25/2024] [Indexed: 05/12/2024] Open
Abstract
The condition for artificial agents to possess perceivable intentions can be considered that they have resolved a form of the symbol grounding problem. Here, the symbol grounding is considered an achievement of the state where the language used by the agent is endowed with some quantitative meaning extracted from the physical world. To achieve this type of symbol grounding, we adopt a method for characterizing robot gestures with quantitative meaning calculated from word-distributed representations constructed from a large corpus of text. In this method, a "size image" of a word is generated by defining an axis (index) that discriminates the "size" of the word in the word-distributed vector space. The generated size images are converted into gestures generated by a physical artificial agent (robot). The robot's gesture can be set to reflect either the size of the word in terms of the amount of movement or in terms of its posture. To examine the perception of communicative intention in the robot that performs the gestures generated as described above, the authors examine human ratings on "the naturalness" obtained through an online survey, yielding results that partially validate our proposed method. Based on the results, the authors argue for the possibility of developing advanced artifacts that achieve human-like symbolic grounding.
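The abstract does not say how the "size" axis is defined. One common way to build such an axis, shown here purely as an illustrative sketch and not as the authors' method, is to subtract the mean vector of a few "small" seed words from the mean vector of a few "large" seed words and project other words onto the result. The three-dimensional vectors, the seed words, and the function names below are all made up for illustration; real word-distributed representations have hundreds of dimensions.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def size_axis(large_seeds, small_seeds):
    """Unit vector pointing from the 'small' seed centroid to the 'large' one."""
    big = mean_vector(large_seeds)
    small = mean_vector(small_seeds)
    diff = [b - s for b, s in zip(big, small)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("seed centroids coincide; the axis is undefined")
    return [d / norm for d in diff]

def size_score(vector, axis):
    """Scalar projection of a word vector onto the size axis."""
    return sum(v * a for v, a in zip(vector, axis))

# Hypothetical toy embeddings (not taken from any real model).
toy_vectors = {
    "huge": [0.9, 0.1, 0.3],
    "giant": [0.8, 0.2, 0.1],
    "tiny": [-0.7, 0.1, 0.2],
    "small": [-0.9, 0.3, 0.2],
    "mountain": [0.6, 0.5, 0.0],
    "pebble": [-0.5, 0.4, 0.1],
}

axis = size_axis(
    [toy_vectors["huge"], toy_vectors["giant"]],
    [toy_vectors["tiny"], toy_vectors["small"]],
)

for word in ("mountain", "pebble"):
    print(word, round(size_score(toy_vectors[word], axis), 3))
```

A score of this kind could then be mapped to a gesture parameter such as movement amplitude or posture, which is the step the abstract describes next; that mapping is not sketched here because the abstract gives no details about it.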
Affiliation(s)
- Kosuke Sasaki
  - Department of Informatics, Graduate School of Integrated Science and Technology, Shizuoka University, Shizuoka, Japan
- Jumpei Nishikawa
  - Department of Information Science and Technology, Graduate School of Science and Technology, Shizuoka University, Shizuoka, Japan
- Junya Morita
  - Department of Informatics, Graduate School of Integrated Science and Technology, Shizuoka University, Shizuoka, Japan
  - Department of Information Science and Technology, Graduate School of Science and Technology, Shizuoka University, Shizuoka, Japan
  - Department of Behavior Informatics, Faculty of Informatics, Shizuoka University, Hamamatsu, Japan
3
Choi I, Kim J, Kim WC. An Explainable Prediction for Dietary-Related Diseases via Language Models. Nutrients 2024; 16:686. [PMID: 38474813 DOI: 10.3390/nu16050686] [Received: 01/30/2024] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/14/2024] Open
Abstract
Our study harnesses the power of natural language processing (NLP) to explore the relationship between dietary patterns and metabolic health outcomes among Korean adults using data from the Seventh Korea National Health and Nutrition Examination Survey (KNHANES VII). Using Latent Dirichlet Allocation (LDA) analysis, we identified three distinct dietary patterns: "Traditional and Staple", "Communal and Festive", and "Westernized and Convenience-Oriented". These patterns reflect the diversity of dietary preferences in Korea and reveal the cultural and social dimensions influencing eating habits and their potential implications for public health, particularly concerning obesity and metabolic disorders. Integrating NLP-based indices, including sentiment scores and the identified dietary patterns, into our predictive models significantly enhanced the accuracy of obesity and dyslipidemia predictions. This improvement was consistent across various machine learning techniques (XGBoost, LightGBM, and CatBoost), demonstrating the efficacy of NLP methodologies in refining disease prediction models. Our findings underscore the critical role of dietary patterns as indicators of metabolic diseases. The successful application of NLP techniques offers a novel approach to public health and nutritional epidemiology, providing a deeper understanding of the diet-disease nexus. This study contributes to the evolving field of personalized nutrition and emphasizes the potential of leveraging advanced computational tools to inform targeted nutritional interventions and public health strategies aimed at mitigating the prevalence of metabolic disorders in the Korean population.
Affiliation(s)
- Insu Choi
  - Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Jihye Kim
  - Department of Genetics and Biotechnology, College of Life Sciences, Kyung Hee University, Yongin 17104, Republic of Korea
- Woo Chang Kim
  - Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
4
Eberhardt ST, Schaffrath J, Moggia D, Schwartz B, Jaehde M, Rubel JA, Baur T, André E, Lutz W. Decoding emotions: Exploring the validity of sentiment analysis in psychotherapy. Psychother Res 2024:1-16. [PMID: 38415369 DOI: 10.1080/10503307.2024.2322522] [Received: 03/01/2023] [Accepted: 02/15/2024] [Indexed: 02/29/2024] Open
Abstract
OBJECTIVE: Given the importance of emotions in psychotherapy, valid measures are essential for research and practice. As emotions are expressed at different levels, multimodal measurements are needed for a nuanced assessment. Natural Language Processing (NLP) could augment the measurement of emotions. The study explores the validity of sentiment analysis in psychotherapy transcripts. METHOD: We used a transformer-based NLP algorithm to analyze sentiments in 85 transcripts from 35 patients. Construct and criterion validity were evaluated using self- and therapist reports and process and outcome measures via correlational, multitrait-multimethod, and multilevel analyses. RESULTS: The results provide indications in support of the sentiments' validity. For example, sentiments were significantly related to self- and therapist reports of emotions in the same session. Sentiments correlated significantly with in-session processes (e.g., coping experiences), and an increase in positive sentiments throughout therapy predicted better outcomes after treatment termination. DISCUSSION: Sentiment analysis could serve as a valid approach to assessing the emotional tone of psychotherapy sessions and may contribute to the multimodal measurement of emotions. Future research could combine sentiment analysis with automatic emotion recognition in facial expressions and vocal cues via the Nonverbal Behavior Analyzer (NOVA). Limitations (e.g., exploratory study with numerous tests) and opportunities are discussed.
5
Ahmad A, Azzeh M, Alnagi E, Abu Al-Haija Q, Halabi D, Aref A, AbuHour Y. Hate speech detection in the Arabic language: corpus design, construction, and evaluation. Front Artif Intell 2024; 7:1345445. [PMID: 38444962 PMCID: PMC10912174 DOI: 10.3389/frai.2024.1345445] [Received: 11/27/2023] [Accepted: 01/25/2024] [Indexed: 03/07/2024] Open
Abstract
Hate speech detection in Arabic presents a multifaceted challenge due to the broad and diverse linguistic terrain. With its multiple dialects and rich cultural subtleties, Arabic requires particular measures to address hate speech online successfully. To address this issue, academics and developers have used natural language processing (NLP) methods and machine learning algorithms adapted to the complexities of Arabic text. However, many proposed methods were hampered by a lack of a comprehensive dataset/corpus of Arabic hate speech. In this research, we propose a novel multi-class public Arabic dataset comprised of 403,688 annotated tweets categorized as extremely positive, positive, neutral, or negative based on the presence of hate speech. Using our developed dataset, we additionally characterize the performance of multiple machine learning models for hate speech identification in Arabic Jordanian dialect tweets. Specifically, the Word2Vec, TF-IDF, and AraBERT text representation models were applied to produce word vectors. With the help of these models, we can provide classification models with vectors representing text. After that, seven machine learning classifiers were evaluated: Support Vector Machine (SVM), Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), AdaBoost (Ada), XGBoost (XGB), and CatBoost (CatB). The experimental evaluation revealed that, in this challenging and unstructured setting, our gathered and annotated datasets were rather efficient and generated encouraging assessment outcomes. This will enable academics to delve further into this crucial field of study.
Affiliation(s)
- Ashraf Ahmad
  - Department of Computer Science, Princess Sumaya University for Technology (PSUT), Amman, Jordan
- Mohammad Azzeh
  - Department of Data Science, Princess Sumaya University for Technology (PSUT), Amman, Jordan
- Eman Alnagi
  - Department of Computer Science, Princess Sumaya University for Technology (PSUT), Amman, Jordan
- Qasem Abu Al-Haija
  - Department of Cybersecurity, Faculty of Computer and Information Technology, Jordan University of Science and Technology, Irbid, Jordan
- Dana Halabi
  - SAE Institute, Luminus Technical University College (LTUC), Amman, Jordan
- Abdullah Aref
  - Department of Computer Science, Princess Sumaya University for Technology (PSUT), Amman, Jordan
- Yousef AbuHour
  - Department of Basic Sciences, Princess Sumaya University for Technology (PSUT), Amman, Jordan
6
Mandava S, Oyer SL, Park SS. A quantitative analysis of Twitter ("X") trends in the discussion of rhinoplasty. Laryngoscope Investig Otolaryngol 2024; 9:e1227. [PMID: 38384363 PMCID: PMC10880128 DOI: 10.1002/lio2.1227] [Received: 01/10/2024] [Accepted: 02/04/2024] [Indexed: 02/23/2024] Open
Abstract
Introduction: Rhinoplasty is one of the most common cosmetic surgical procedures performed globally. Twitter, also known as "X," is used by both patients and physicians and has been studied as a useful tool for analyzing trends in healthcare. The public social media discourse of rhinoplasty has not been previously reported in the field of otolaryngology. The goal of this study was to characterize the most common user type, sentiment, and temporal trends in the discussion of rhinoplasty on Twitter to guide facial plastic surgeons in their clinical and social media practices. Methods: A total of 1,427,015 tweets published from 2015 to 2020 containing the keywords "rhinoplasty" or "nose job" were extracted using Twitter Academic API. Tweets were standardized and filtered for spam and duplication. Natural language processing (NLP) algorithms and data visualization techniques were applied to characterize tweets. Results: Significantly more "nose job" tweets (80.8%) were published compared with "rhinoplasty" (19.2%). Annual tweet frequency increased over the 5 years, with "rhinoplasty" tweets peaking in January and "nose job" tweets peaking in the summer and winter months. Most "rhinoplasty" tweets were linked to a surgeon or medical practice source, while most "nose job" tweets were from isolated laypersons. While discussion was positive in sentiment overall (M = +0.08), "nose job" tweets had lower average sentiment scores (P < .001) and over twice the proportion of negative tweets. The top 20 most prolific accounts contributed to 14,758 (10.6%) of total "rhinoplasty" tweets. Exactly 90% (18/20) of those accounts linked to non-academic surgeons compared with 10% (2/20) linked to academic surgeons. Conclusions: Rhinoplasty-related posts on Twitter were cumulatively positive in sentiment and tweet volume is steadily increasing over time, especially during popular holiday months. The search term "nose job" yields significantly more results than "rhinoplasty," and is the preferred term of non-healthcare users. We found a large digital contribution from surgeons and medical practices, particularly in the non-academic and private practice sector, utilizing Twitter for promotional purposes.
Affiliation(s)
- Shreya Mandava
  - School of Medicine, University of Virginia, Charlottesville, Virginia, USA
- Samuel L. Oyer
  - Department of Otolaryngology-Head and Neck Surgery, University of Virginia, Charlottesville, Virginia, USA
- Stephen S. Park
  - Department of Otolaryngology-Head and Neck Surgery, University of Virginia, Charlottesville, Virginia, USA
7
Nishioka S, Asano M, Yada S, Aramaki E, Yajima H, Kizaki H, Hori S. Detection of Adverse Event Signals with Severity Grade Classification from Cancer Patient Narrative. Stud Health Technol Inform 2024; 310:554-558. [PMID: 38269870 DOI: 10.3233/shti231026] [Indexed: 01/26/2024]
Abstract
Adverse event (AE) management is crucial to improve anti-cancer treatment outcomes, but it is reported that some AE signals can be missed in clinical visits. Thus, monitoring AE signals seamlessly, including events outside hospitals, would be helpful for early intervention. Here we investigated how to detect AE signals from texts written by cancer patients themselves by developing deep-learning (DL) models to classify posts mentioning AEs according to severity grade, in order to focus on those that might need immediate treatment interventions. Using patient blogs written in Japanese by cancer patients as a data source, we built DL models based on three approaches, BERT, ELECTRA, and T5. Among these models, T5 showed the best F1 scores for both Grade ≥ 1 and ≥ 2 article classification tasks (0.85 and 0.53, respectively). This model might benefit patients by enabling earlier AE signal detection, thereby improving quality of life.
Affiliation(s)
- Satoshi Nishioka
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Masaki Asano
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Hayato Kizaki
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Satoko Hori
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
8
Li Z, Zhao H, Zhu G, Du J, Wu Z, Jiang Z, Li Y. Classification method of traditional Chinese medicine compound decoction duration based on multi-dimensional feature weighted fusion. Comput Methods Biomech Biomed Engin 2024:1-15. [PMID: 38193238 DOI: 10.1080/10255842.2024.2302225] [Received: 10/19/2023] [Accepted: 12/10/2023] [Indexed: 01/10/2024]
Abstract
This paper extends a text classification method utilizing natural language processing (NLP) into the field of traditional Chinese medicine (TCM) compound decoction to classify TCM compound decoction duration effectively and scientifically. Specifically, a TCM compound decoction duration classification method named TCM-TextCNN is proposed to fuse multi-dimensional herb features and improve TextCNN. First, we utilize word vector technology to construct feature vectors of herb names and medicinal parts, aiming to describe the herb characteristics comprehensively. Second, considering the impact of different herb features on the decoction duration, we use an improved Term Frequency-Inverse Word Frequency (TF-IWF) algorithm to weight the feature vectors of herb names and medicinal parts. These weighted feature vectors are then concatenated to obtain a multi-dimensional herb feature vector, allowing for a more comprehensive representation. Finally, the feature vector is input into the improved TextCNN, which uses k-max pooling rather than max pooling to reduce information loss. Three fully connected layers are added to generate higher-level feature representations, followed by softmax to obtain the final results. Experimental results on a dataset of TCM compound decoction duration demonstrate that TCM-TextCNN improves accuracy, recall, and F1 score by 5.31%, 5.63%, and 5.22%, respectively, compared to methods that rely solely on herb name features, thereby confirming our method's effectiveness in classifying TCM compound decoction duration.
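To make the pooling step concrete, the following is a minimal sketch of k-max pooling over a single feature map, written in plain Python rather than a deep learning framework. It assumes the usual definition (keep the k largest activations in their original order); the function name and the toy activations are illustrative and are not taken from the paper.

```python
def k_max_pooling(values, k):
    """Return the k largest values of a 1-D feature map, preserving their order.

    Ordinary max pooling is the special case k = 1.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Indices of the k largest activations (ties keep the earlier position).
    top_indices = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    # Restore the original sequence order of the selected activations.
    return [values[i] for i in sorted(top_indices)]

# Hypothetical activations from one convolutional filter.
feature_map = [0.1, 0.9, 0.3, 0.7, 0.2]

print(k_max_pooling(feature_map, 1))  # [0.9], same as max pooling
print(k_max_pooling(feature_map, 3))  # [0.9, 0.3, 0.7]
```

With k = 1 only the single strongest activation survives, whereas a larger k keeps several strong activations and their relative order, which is the sense in which the abstract says information loss is reduced.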
Affiliation(s)
- Zhibiao Li
  - Computer Science College, Jiangxi University of Chinese Medicine, Nanchang, Jiangxi, China
  - Ganjiang New Area Zhiyao Shanhe Technology Co., Ltd, Nanchang, Jiangxi, China
  - Key Laboratory of Artificial Intelligence in Chinese Medicine, Nanchang, Jiangxi, China
- Huayong Zhao
  - Computer Science College, Jiangxi University of Chinese Medicine, Nanchang, Jiangxi, China
- Genhua Zhu
  - Computer Science College, Jiangxi University of Chinese Medicine, Nanchang, Jiangxi, China
- Jianqiang Du
  - Computer Science College, Jiangxi University of Chinese Medicine, Nanchang, Jiangxi, China
- Zhenfeng Wu
  - Computer Science College, Jiangxi University of Chinese Medicine, Nanchang, Jiangxi, China
  - Ganjiang New Area Zhiyao Shanhe Technology Co., Ltd, Nanchang, Jiangxi, China
- Zhicheng Jiang
  - Computer Science College, Jiangxi University of Chinese Medicine, Nanchang, Jiangxi, China
- Yiwen Li
  - Computer Science College, Jiangxi University of Chinese Medicine, Nanchang, Jiangxi, China
9
Al Zubaer A, Granitzer M, Mitrović J. Performance analysis of large language models in the domain of legal argument mining. Front Artif Intell 2023; 6:1278796. [PMID: 38045763 PMCID: PMC10691378 DOI: 10.3389/frai.2023.1278796] [Received: 08/16/2023] [Accepted: 10/25/2023] [Indexed: 12/05/2023] Open
Abstract
Generative pre-trained transformers (GPT) have recently demonstrated excellent performance in various natural language tasks. The development of ChatGPT and the recently released GPT-4 model has shown competence in solving complex and higher-order reasoning tasks without further training or fine-tuning. However, the applicability and strength of these models in classifying legal texts in the context of argument mining are yet to be realized and have not been tested thoroughly. In this study, we investigate the effectiveness of GPT-like models, specifically GPT-3.5 and GPT-4, for argument mining via prompting. We closely study the models' performance considering diverse prompt formulation and example selection in the prompt via semantic search using state-of-the-art embedding models from OpenAI and sentence transformers. We primarily concentrate on the argument component classification task on the legal corpus from the European Court of Human Rights. To address these models' inherent non-deterministic nature and make our result statistically sound, we conducted 5-fold cross-validation on the test set. Our experiments demonstrate, quite surprisingly, that relatively small domain-specific models outperform GPT-3.5 and GPT-4 in the F1-score for premise and conclusion classes, with 1.9% and 12% improvements, respectively. We hypothesize that the performance drop indirectly reflects the complexity of the structure in the dataset, which we verify through prompt and data analysis. Nevertheless, our results demonstrate a noteworthy variation in the performance of GPT models based on prompt formulation. We observe comparable performance between the two embedding models, with a slight improvement in the local model's ability for prompt selection. This suggests that local models are as semantically rich as the embeddings from the OpenAI model. Our results indicate that the structure of prompts significantly impacts the performance of GPT models and should be considered when designing them.
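The example-selection step mentioned in the abstract (choosing prompt examples by semantic search) can be sketched as a cosine-similarity ranking over precomputed embeddings. This is only an illustration under stated assumptions: the three-dimensional embeddings, the example sentences, the instruction wording, and the function names are all invented here, and in practice the vectors would come from an embedding model such as those the abstract mentions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def select_examples(query_vec, pool, k):
    """Return the k labelled examples whose embeddings are closest to the query."""
    ranked = sorted(
        pool,
        key=lambda item: cosine_similarity(query_vec, item["embedding"]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query_text, examples):
    """Assemble a few-shot classification prompt (hypothetical wording)."""
    parts = ["Classify the sentence as premise or conclusion."]
    for ex in examples:
        parts.append(f"Sentence: {ex['text']}\nLabel: {ex['label']}")
    parts.append(f"Sentence: {query_text}\nLabel:")
    return "\n\n".join(parts)

# Invented example pool with made-up embeddings.
pool = [
    {"text": "The applicant was detained for two years.", "label": "premise",
     "embedding": [0.9, 0.1, 0.0]},
    {"text": "There has been a violation of Article 5.", "label": "conclusion",
     "embedding": [0.1, 0.9, 0.1]},
    {"text": "The detention lacked judicial review.", "label": "premise",
     "embedding": [0.8, 0.2, 0.1]},
]

query_text = "The applicant had no access to a lawyer."
query_vec = [0.85, 0.15, 0.05]  # made-up embedding of the query sentence

chosen = select_examples(query_vec, pool, k=2)
print(build_prompt(query_text, chosen))
```

Swapping the embedding source (a hosted model versus a local sentence-transformer) changes only how `embedding` and `query_vec` are produced; the ranking logic stays the same, which is why the two can be compared directly.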
Affiliation(s)
- Abdullah Al Zubaer
  - Faculty of Computer Science and Mathematics, Chair of Data Science, University of Passau, Passau, Germany
- Michael Granitzer
  - Faculty of Computer Science and Mathematics, Chair of Data Science, University of Passau, Passau, Germany
- Jelena Mitrović
  - Faculty of Computer Science and Mathematics, Chair of Data Science, University of Passau, Passau, Germany
  - Group for Human Computer Interaction, Institute for Artificial Intelligence Research and Development of Serbia, Novi Sad, Serbia
10
Dong T, Sunderland N, Nightingale A, Fudulu DP, Chan J, Zhai B, Freitas A, Caputo M, Dimagli A, Mires S, Wyatt M, Benedetto U, Angelini GD. Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database. Bioengineering (Basel) 2023; 10:1307. [PMID: 38002431 PMCID: PMC10669818 DOI: 10.3390/bioengineering10111307] [Received: 09/26/2023] [Revised: 11/03/2023] [Accepted: 11/09/2023] [Indexed: 11/26/2023] Open
Abstract
BACKGROUND: Although electronic health records (EHR) provide useful insights into disease patterns and patient treatment optimisation, their reliance on unstructured data presents a difficulty. Echocardiography reports, which provide extensive pathology information for cardiovascular patients, are particularly challenging to extract and analyse because of their narrative structure. Although natural language processing (NLP) has been utilised successfully in a variety of medical fields, it is not commonly used in echocardiography analysis. OBJECTIVES: To develop an NLP-based approach for extracting and categorising data from echocardiography reports by accurately converting continuous (e.g., LVOT VTI, AV VTI and TR Vmax) and discrete (e.g., regurgitation severity) outcomes in a semi-structured narrative format into a structured and categorised format, allowing for future research or clinical use. METHODS: 135,062 Trans-Thoracic Echocardiogram (TTE) reports were derived from 146,967 baseline echocardiogram reports and split into three cohorts: Training and Validation (n = 1075), Test Dataset (n = 98) and Application Dataset (n = 133,889). The NLP system was developed and iteratively refined using medical expert knowledge. The system was used to curate a moderate-fidelity database from extractions of 133,889 reports. A hold-out validation set of 98 reports was blindly annotated and extracted by two clinicians for comparison with the NLP extraction. Agreement, discrimination, accuracy and calibration of outcome measure extractions were evaluated. RESULTS: Continuous outcomes including LVOT VTI, AV VTI and TR Vmax exhibited perfect inter-rater reliability using intra-class correlation scores (ICC = 1.00, p < 0.05) alongside high R² values, demonstrating an ideal alignment between the NLP system and clinicians. A good level (ICC = 0.75-0.9, p < 0.05) of inter-rater reliability was observed for outcomes such as LVOT Diam, Lateral MAPSE, Peak E Velocity, Lateral E' Velocity, PV Vmax, Sinuses of Valsalva and Ascending Aorta diameters. Furthermore, the accuracy rate for discrete outcome measures was 91.38% in the confusion matrix analysis, indicating effective performance. CONCLUSIONS: The NLP-based technique yielded good results when it came to extracting and categorising data from echocardiography reports. The system demonstrated a high degree of agreement and concordance with clinician extractions. This study contributes to the effective use of semi-structured data by providing a useful tool for converting semi-structured text to a structured echo report that can be used for data management. Additional validation and implementation in healthcare settings can improve data availability and support research and clinical decision-making.
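For readers unfamiliar with how an accuracy rate is read off a confusion matrix, the short sketch below divides the diagonal (agreeing) counts by the total. The counts are made up for illustration and are not the study's data; the row and column meaning (clinician grade versus NLP-extracted grade) and the function name are likewise assumptions.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy: sum of the diagonal divided by the sum of all cells."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

# Hypothetical counts: rows = clinician grade, columns = NLP-extracted grade
# (e.g. mild / moderate / severe regurgitation).
toy_confusion = [
    [8, 1, 0],
    [1, 6, 1],
    [0, 0, 3],
]

print(f"accuracy = {accuracy_from_confusion(toy_confusion):.2%}")
```

Overall accuracy hides per-class behaviour, so a rare class such as "severe" can be poorly extracted while the headline figure stays high; per-class recall from the same matrix is a natural companion check.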
Affiliation(s)
- Tim Dong
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Nicholas Sunderland
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Angus Nightingale
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Daniel P. Fudulu
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Jeremy Chan
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Ben Zhai
  - School of Computing Science, Northumbria University, Newcastle upon Tyne NE1 8ST, UK
- Alberto Freitas
  - Faculty of Medicine, University of Porto, 4100 Porto, Portugal
- Massimo Caputo
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Arnaldo Dimagli
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Stuart Mires
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Mike Wyatt
  - University Hospitals Bristol and Weston, Marlborough St, Bristol BS1 3NU, UK
- Umberto Benedetto
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Gianni D. Angelini
  - Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
11
Yu C, Huang Y, Yan W, Jiang X. A comprehensive overview of psoriatic research over the past 20 years: machine learning-based bibliometric analysis. Front Immunol 2023; 14:1272080. [PMID: 37954610 PMCID: PMC10637956 DOI: 10.3389/fimmu.2023.1272080] [Received: 08/03/2023] [Accepted: 10/16/2023] [Indexed: 11/14/2023] Open
Abstract
Background: The surge in the number of publications on psoriasis has posed significant challenges for researchers in effectively managing the vast amount of information. However, due to the lack of tools to process metadata, no comprehensive bibliometric analysis has been conducted. Objectives: This study aims to evaluate the trends and current hotspots of psoriatic research from a macroscopic perspective through a bibliometric analysis assisted by machine learning-based semantic analysis. Methods: Publications indexed under the Medical Subject Headings (MeSH) term "Psoriasis" from 2003 to 2022 were extracted from PubMed. The generative statistical algorithm latent Dirichlet allocation (LDA) was applied to identify specific topics and trends based on abstracts. The unsupervised Louvain algorithm was used to establish a network identifying relationships between topics. Results: A total of 28,178 publications were identified. The publications were derived from 176 countries, with the United States, China, and Italy being the top three countries. For the term "psoriasis", 9,183 MeSH terms appeared 337,545 times. Among them, the MeSH terms "Severity of illness index", "Treatment outcome", and "Dermatologic agents" occurred most frequently. A total of 21,928 publications were included in the LDA algorithm, which identified three main areas and 50 branched topics, with "Molecular pathogenesis", "Clinical trials", and "Skin inflammation" being the topics that increased most. LDA networks identified that "Skin inflammation" was tightly associated with "Molecular pathogenesis" and "Biological agents". "Nail psoriasis" and "Epidemiological study" have emerged as new research hotspots, and attention to comorbidity topics, including "Cardiovascular comorbidities", "Psoriatic arthritis", "Obesity" and "Psychological disorders", has increased gradually. Conclusions: Research on psoriasis is flourishing, with molecular pathogenesis, skin inflammation, and clinical trials being the current hotspots. The strong association between skin inflammation and biologic agents indicated the effective translation between basic research and clinical application in psoriasis. In addition, nail psoriasis, epidemiological study and comorbidities of psoriasis have also drawn increased attention.
Affiliation(s)
- Chenyang Yu
  - Department of Dermatology, West China Hospital, Sichuan University, Chengdu, China
  - Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Yingzhao Huang
  - Department of Thoracic Surgery, Sichuan Cancer Hospital and Institute, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Wei Yan
  - Department of Dermatology, West China Hospital, Sichuan University, Chengdu, China
- Xian Jiang
  - Department of Dermatology, West China Hospital, Sichuan University, Chengdu, China
  - Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China

12
Kamil MZ, Taleb-Berrouane M, Khan F, Amyotte P, Ahmed S. Textual data transformations using natural language processing for risk assessment. Risk Anal 2023; 43:2033-2052. [PMID: 36682740 DOI: 10.1111/risa.14100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Revised: 04/12/2022] [Accepted: 12/18/2022] [Indexed: 06/17/2023]
Abstract
Underlying information about failure, including observations made in free text, can be a good source for understanding, analyzing, and extracting meaningful information for determining causation. The unstructured nature of natural language expression demands advanced methodology to identify its underlying features. There is no available solution to utilize unstructured data for risk assessment purposes. Due to the scarcity of relevant data, textual data can be a vital learning source for developing a risk assessment methodology. This work addresses the knowledge gap in extracting relevant features from textual data to develop cause-effect scenarios with minimal manual interpretation. This study applies natural language processing and text-mining techniques to extract features from past accident reports. The extracted features are transformed into parametric form with the help of fuzzy set theory and utilized in Bayesian networks as prior probabilities for risk assessment. An application of the proposed methodology is shown in microbiologically influenced corrosion-related incident reports available from the Pipeline and Hazardous Material Safety Administration database. In addition, the trained named entity recognition (NER) model is verified on eight incidents, showing a promising preliminary result for identifying all relevant features from textual data and demonstrating the robustness and applicability of the NER method. The proposed methodology can be used in domain-specific risk assessment to analyze, predict, and prevent future mishaps, ameliorating overall process safety.
Affiliation(s)
- Mohammad Zaid Kamil
  - Centre for Risk, Integrity and Safety Engineering (C-RISE), Faculty of Engineering & Applied Science, Memorial University, St John's, Newfoundland, Canada
- Mohammed Taleb-Berrouane
  - Centre for Risk, Integrity and Safety Engineering (C-RISE), Faculty of Engineering & Applied Science, Memorial University, St John's, Newfoundland, Canada
- Faisal Khan
  - Centre for Risk, Integrity and Safety Engineering (C-RISE), Faculty of Engineering & Applied Science, Memorial University, St John's, Newfoundland, Canada
  - Mary Kay O'Connor Process Safety Center, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas, USA
- Paul Amyotte
  - Department of Process Engineering and Applied Science, Dalhousie University, Halifax, Nova Scotia, Canada
- Salim Ahmed
  - Centre for Risk, Integrity and Safety Engineering (C-RISE), Faculty of Engineering & Applied Science, Memorial University, St John's, Newfoundland, Canada

13
Mardikoraem M, Wang Z, Pascual N, Woldring D. Generative models for protein sequence modeling: recent advances and future directions. Brief Bioinform 2023; 24:bbad358. [PMID: 37864295 PMCID: PMC10589401 DOI: 10.1093/bib/bbad358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 09/08/2023] [Accepted: 09/12/2023] [Indexed: 10/22/2023]
Abstract
The widespread adoption of high-throughput omics technologies has exponentially increased the amount of protein sequence data involved in many salient disease pathways and their respective therapeutics and diagnostics. Despite the availability of large-scale sequence data, the lack of experimental fitness annotations underpins the need for self-supervised and unsupervised machine learning (ML) methods. These techniques leverage the meaningful features encoded in abundant unlabeled sequences to accomplish complex protein engineering tasks. Proficiency in the rapidly evolving fields of protein engineering and generative AI is required to realize the full potential of ML models as a tool for protein fitness landscape navigation. Here, we support this work by (i) providing an overview of the architecture and mathematical details of the most successful ML models applicable to sequence data (e.g. variational autoencoders, autoregressive models, generative adversarial neural networks, and diffusion models), (ii) offering guidance on how to effectively implement these models on protein sequence data to predict fitness or generate high-fitness sequences and (iii) highlighting several successful studies that implement these techniques in protein engineering (from paratope regions and subcellular localization prediction to high-fitness sequences and protein design rules generation). By providing a comprehensive survey of model details, novel architecture developments, comparisons of model applications, and current challenges, this study intends to provide structured guidance and a robust framework for delivering a prospective outlook in the ML-driven protein engineering field.
Affiliation(s)
- Mehrsa Mardikoraem
  - Michigan State University (MSU)’s Department of Chemical Engineering and Materials Science
- Zirui Wang
  - Regeneron Pharmaceuticals, Inc. Having received his B.S. in Chemical Engineering from MSU, he is currently pursuing an M.S. in Computer Science from Syracuse University
- Daniel Woldring
  - MSU’s Department of Chemical Engineering and Materials Science and a member of MSU’s Institute for Quantitative Health Sciences and Engineering

14
Baumgartner C, Baumgartner D. A regulatory challenge for natural language processing (NLP)-based tools such as ChatGPT to be legally used for healthcare decisions. Where are we now? Clin Transl Med 2023; 13:e1362. [PMID: 37548259 PMCID: PMC10405238 DOI: 10.1002/ctm2.1362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 07/29/2023] [Indexed: 08/08/2023]
Affiliation(s)
- Christian Baumgartner
  - Institute of Health Care Engineering with European Testing Center of Medical Devices, Graz University of Technology, Graz, Austria
- Daniela Baumgartner
  - Clinical Division of Pediatric Cardiology, Department of Pediatrics and Adolescent Medicine, Medical University of Graz, Graz, Austria

15
Zhu J, Yalamanchi N, Jin R, Kenne DR, Phan N. Investigating COVID-19's Impact on Mental Health: Trend and Thematic Analysis of Reddit Users' Discourse. J Med Internet Res 2023; 25:e46867. [PMID: 37436793 PMCID: PMC10365637 DOI: 10.2196/46867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/03/2023] [Accepted: 05/09/2023] [Indexed: 07/13/2023]
Abstract
BACKGROUND The COVID-19 pandemic has resulted in heightened levels of depression, anxiety, and other mental health issues due to sudden changes in daily life, such as economic stress, social isolation, and educational irregularity. Accurately assessing emotional and behavioral changes in response to the pandemic can be challenging, but it is essential to understand the evolving emotions, themes, and discussions surrounding the impact of COVID-19 on mental health. OBJECTIVE This study aims to understand the evolving emotions and themes associated with the impact of COVID-19 on mental health support groups (eg, r/Depression and r/Anxiety) on Reddit (Reddit Inc) during the initial phase and after the peak of the pandemic using natural language processing techniques and statistical methods. METHODS This study used data from the r/Depression and r/Anxiety Reddit communities, which consisted of posts contributed by 351,409 distinct users over a period spanning from 2019 to 2022. Topic modeling and Word2Vec embedding models were used to identify key terms associated with the targeted themes within the data set. A range of trend and thematic analysis techniques, including time-to-event analysis, heat map analysis, factor analysis, regression analysis, and k-means clustering analysis, were used to analyze the data. RESULTS The time-to-event analysis revealed that the first 28 days following a major event could be considered a critical window for mental health concerns to become more prominent. The theme trend analysis revealed key themes such as economic stress, social stress, suicide, and substance use, with varying trends and impacts in each community. The factor analysis highlighted pandemic-related stress, economic concerns, and social factors as primary themes during the analyzed period. 
Regression analysis showed that economic stress consistently demonstrated the strongest association with the suicide theme, whereas the substance theme had a notable association in both data sets. Finally, the k-means clustering analysis showed that in r/Depression, the number of posts related to the "depression, anxiety, and medication" cluster decreased after 2020, whereas the "social relationships and friendship" cluster showed a steady decrease. In r/Anxiety, the "general anxiety and feelings of unease" cluster peaked in April 2020 and remained high, whereas the "physical symptoms of anxiety" cluster showed a slight increase. CONCLUSIONS This study sheds light on the impact of COVID-19 on mental health and the related themes discussed in 2 web-based communities during the pandemic. The results offer valuable insights for developing targeted interventions and policies to support individuals and communities in similar crises.
Affiliation(s)
- Jianfeng Zhu
  - Department of Computer Science, Kent State University, Kent, OH, United States
- Neha Yalamanchi
  - Department of Computer Science, Kent State University, Kent, OH, United States
- Ruoming Jin
  - Department of Computer Science, Kent State University, Kent, OH, United States
- Deric R Kenne
  - Center for Public Policy and Health, Kent State University, Kent, OH, United States
  - College of Public Health, Kent State University, Kent, OH, United States
- NhatHai Phan
  - Data Science Department, New Jersey Institute of Technology, Newark, NJ, United States

16
Lau C, Zhu X, Chan WY. Automatic depression severity assessment with deep learning using parameter-efficient tuning. Front Psychiatry 2023; 14:1160291. [PMID: 37398577 PMCID: PMC10308283 DOI: 10.3389/fpsyt.2023.1160291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 06/01/2023] [Indexed: 07/04/2023]
Abstract
Introduction: To assist mental health care providers with the assessment of depression, research to develop a standardized, accessible, and non-invasive technique has garnered considerable attention. Our study focuses on the application of deep learning models for automatic assessment of depression severity based on clinical interview transcriptions. Despite the recent success of deep learning, the lack of large-scale high-quality datasets is a major performance bottleneck for many mental health applications. Methods: A novel approach is proposed to address the data scarcity problem for depression assessment. It leverages both pretrained large language models and parameter-efficient tuning techniques. The approach is built upon adapting a small set of tunable parameters, known as prefix vectors, to guide a pretrained model towards predicting the Patient Health Questionnaire (PHQ)-8 score of a person. Experiments were conducted on the Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ) benchmark dataset with 189 subjects, partitioned into training, development, and test sets. Model learning was done on the training set. Prediction performance mean and standard deviation of each model, with five randomly-initialized runs, were reported on the development set. Finally, optimized models were evaluated on the test set. Results: The proposed model with prefix vectors outperformed all previously published methods, including models which utilized multiple types of data modalities, and achieved the best reported performance on the test set of DAIC-WOZ with a root mean square error of 4.67 and a mean absolute error of 3.80 on the PHQ-8 scale. Compared to conventionally fine-tuned baseline models, prefix-enhanced models were less prone to overfitting by using far fewer training parameters (<6% relatively). Discussion: While transfer learning through pretrained large language models can provide a good starting point for downstream learning, prefix vectors can further adapt the pretrained models effectively to the depression assessment task by only adjusting a small number of parameters. The improvement is in part due to the fine-grain flexibility of prefix vector size in adjusting the model's learning capacity. Our results provide evidence that prefix-tuning can be a useful approach in developing tools for automatic depression assessment.
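The two error measures quoted in this abstract, root mean square error and mean absolute error, can be illustrated with a minimal sketch. The PHQ-8 values below are invented for illustration and are not taken from DAIC-WOZ; the function names `rmse` and `mae` are hypothetical helpers, not part of the cited study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared difference."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute differences."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)

# Made-up PHQ-8 scores (the scale runs from 0 to 24); illustration only.
true_scores = [2, 10, 15, 7]
predicted_scores = [4, 7, 15, 12]

example_rmse = rmse(true_scores, predicted_scores)
example_mae = mae(true_scores, predicted_scores)
print(f"RMSE = {example_rmse:.2f}, MAE = {example_mae:.2f}")
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE on the same predictions, which matches the ordering of the two figures reported above.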

17
Rho EH, Harrington M, Zhong Y, Pryzant R, Camp NP, Jurafsky D, Eberhardt JL. Escalated police stops of Black men are linguistically and psychologically distinct in their earliest moments. Proc Natl Acad Sci U S A 2023; 120:e2216162120. [PMID: 37253013 DOI: 10.1073/pnas.2216162120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Across the United States, police chiefs, city officials, and community leaders alike have highlighted the need to de-escalate police encounters with the public. This concern about escalation extends from encounters involving use of force to routine car stops, where Black drivers are disproportionately pulled over. Yet, despite the calls for action, we know little about the trajectory of police stops or how escalation unfolds. In study 1, we use methods from computational linguistics to analyze police body-worn camera footage from 577 stops of Black drivers. We find that stops with escalated outcomes (those ending in arrest, handcuffing, or a search) diverge from stops without these outcomes in their earliest moments, even in the first 45 words spoken by the officer. In stops that result in escalation, officers are more likely to issue commands as their opening words to the driver and less likely to tell drivers the reason why they are being stopped. In study 2, we expose Black males to audio clips of the same stops and find differences in how escalated stops are perceived: Participants report more negative emotion, appraise officers more negatively, worry about force being used, and predict worse outcomes after hearing only the officer's initial words in escalated versus non-escalated stops. Our findings show that car stops that end in escalated outcomes sometimes begin in an escalated fashion, with adverse effects for Black male drivers and, in turn, police-community relations.
Affiliation(s)
- Eugenia H Rho
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061
- Yuyang Zhong
  - Department of Organizational Studies, University of Michigan, Ann Arbor, MI 48109
- Reid Pryzant
  - Department of Computer Science, Stanford University, Stanford, CA 94305
- Nicholas P Camp
  - Department of Organizational Studies, University of Michigan, Ann Arbor, MI 48109
- Dan Jurafsky
  - Department of Computer Science, Stanford University, Stanford, CA 94305
  - Department of Linguistics, Stanford University, Stanford, CA 94305
- Jennifer L Eberhardt
  - Department of Psychology, Stanford University, Stanford, CA 94305
  - Graduate School of Business, Stanford University, Stanford, CA 94305

18
Lahat A, Shachar E, Avidan B, Glicksberg B, Klang E. Evaluating the Utility of a Large Language Model in Answering Common Patients' Gastrointestinal Health-Related Questions: Are We There Yet? Diagnostics (Basel) 2023; 13:diagnostics13111950. [PMID: 37296802 DOI: 10.3390/diagnostics13111950] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 03/27/2023] [Revised: 05/28/2023] [Accepted: 06/01/2023] [Indexed: 06/12/2023]
Abstract
BACKGROUND AND AIMS Patients frequently have concerns about their disease and find it challenging to obtain accurate information. OpenAI's ChatGPT chatbot (ChatGPT) is a new large language model developed to provide answers to a wide range of questions in various fields. Our aim is to evaluate the performance of ChatGPT in answering patients' questions regarding gastrointestinal health. METHODS To evaluate the performance of ChatGPT in answering patients' questions, we used a representative sample of 110 real-life questions. The answers provided by ChatGPT were rated in consensus by three experienced gastroenterologists. The accuracy, clarity, and efficacy of the answers provided by ChatGPT were assessed. RESULTS ChatGPT was able to provide accurate and clear answers to patients' questions in some cases, but not in others. For questions about treatments, the average accuracy, clarity, and efficacy scores (1 to 5) were 3.9 ± 0.8, 3.9 ± 0.9, and 3.3 ± 0.9, respectively. For symptoms questions, the average accuracy, clarity, and efficacy scores were 3.4 ± 0.8, 3.7 ± 0.7, and 3.2 ± 0.7, respectively. For diagnostic test questions, the average accuracy, clarity, and efficacy scores were 3.7 ± 1.7, 3.7 ± 1.8, and 3.5 ± 1.7, respectively. CONCLUSIONS While ChatGPT has potential as a source of information, further development is needed. The quality of information is contingent upon the quality of the online information provided. These findings may be useful for healthcare providers and patients alike in understanding the capabilities and limitations of ChatGPT.
Affiliation(s)
- Adi Lahat
  - Chaim Sheba Medical Center, Department of Gastroenterology, Affiliated to Tel Aviv University, Tel Aviv 69978, Israel
- Eyal Shachar
  - Chaim Sheba Medical Center, Department of Gastroenterology, Affiliated to Tel Aviv University, Tel Aviv 69978, Israel
- Benjamin Avidan
  - Chaim Sheba Medical Center, Department of Gastroenterology, Affiliated to Tel Aviv University, Tel Aviv 69978, Israel
- Benjamin Glicksberg
  - Mount Sinai Clinical Intelligence Center, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Eyal Klang
  - The Sami Sagol AI Hub, ARC Innovation Center, Chaim Sheba Medical Center, Affiliated to Tel-Aviv University, Tel Aviv 69978, Israel

19
Han L, Luther SL, Finch DK, Dobscha SK, Skanderson M, Bathulapalli H, Fodeh SJ, Hahm B, Bouayad L, Lee A, Goulet JL, Brandt CA, Kerns RD. Complementary and Integrative Health Approaches and Pain Care Quality in the Veterans Health Administration Primary Care Setting: A Quasi-Experimental Analysis. J Integr Complement Med 2023; 29:420-429. [PMID: 36971840 PMCID: PMC10280173 DOI: 10.1089/jicm.2022.0686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Background: Complementary and integrative health (CIH) approaches have been recommended in national and international clinical guidelines for chronic pain management. We set out to determine whether exposure to CIH approaches is associated with pain care quality (PCQ) in the Veterans Health Administration (VHA) primary care setting. Methods: We followed a cohort of 62,721 Veterans with newly diagnosed musculoskeletal disorders between October 2016 and September 2017 over 1-year. PCQ scores were derived from primary care progress notes using natural language processing. CIH exposure was defined as documentation of acupuncture, chiropractic or massage therapies by providers. Propensity scores (PSs) were used to match one control for each Veteran with CIH exposure. Generalized estimating equations were used to examine associations between CIH exposure and PCQ scores, accounting for potential selection and confounding bias. Results: CIH was documented for 14,114 (22.5%) Veterans over 16,015 primary care clinic visits during the follow-up period. The CIH exposure group and the 1:1 PS-matched control group achieved superior balance on all measured baseline covariates, with standardized differences ranging from 0.000 to 0.045. CIH exposure was associated with an adjusted rate ratio (aRR) of 1.147 (95% confidence interval [CI]: 1.142, 1.151) on PCQ total score (mean: 8.36). Sensitivity analyses using an alternative PCQ scoring algorithm (aRR: 1.155; 95% CI: 1.150-1.160) and redefining CIH exposure by chiropractic alone (aRR: 1.118; 95% CI: 1.110-1.126) derived consistent results. Discussion: Our data suggest that incorporating CIH approaches may reflect higher overall quality of care for patients with musculoskeletal pain seen in primary care settings, supporting VHA initiatives and the Declaration of Astana to build comprehensive, sustainable primary care capacity for pain management. 
Future investigation is warranted to better understand whether and to what degree the observed association may reflect the therapeutic benefits patients actually received or other factors such as empowering provider-patient education and communication about these approaches.
Affiliation(s)
- Ling Han
  - Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
  - VA Connecticut Healthcare System, Pain Research, Informatics, Multimorbidities and Education (PRIME) Center, West Haven, CT, USA
- Stephen L. Luther
  - James A. Haley Veterans Hospital, Tampa, FL, USA
  - University of South Florida, College of Public Health, Tampa, FL, USA
- Steven K. Dobscha
  - Oregon Health and Science University, Portland, OR, USA
  - VA Portland Health Care System, Portland, OR, USA
- Melissa Skanderson
  - VA Connecticut Healthcare System, Pain Research, Informatics, Multimorbidities and Education (PRIME) Center, West Haven, CT, USA
- Harini Bathulapalli
  - VA Connecticut Healthcare System, Pain Research, Informatics, Multimorbidities and Education (PRIME) Center, West Haven, CT, USA
- Samah J. Fodeh
  - VA Connecticut Healthcare System, Pain Research, Informatics, Multimorbidities and Education (PRIME) Center, West Haven, CT, USA
  - Department of Emergency Medicine, Yale School of Medicine, New Haven, CT, USA
- Bridget Hahm
  - James A. Haley Veterans Hospital, Tampa, FL, USA
- Lina Bouayad
  - James A. Haley Veterans Hospital, Tampa, FL, USA
  - Florida International University, Miami, FL, USA
- Allison Lee
  - VA Connecticut Healthcare System, Pain Research, Informatics, Multimorbidities and Education (PRIME) Center, West Haven, CT, USA
- Joseph L. Goulet
  - VA Connecticut Healthcare System, Pain Research, Informatics, Multimorbidities and Education (PRIME) Center, West Haven, CT, USA
  - Department of Emergency Medicine, Yale School of Medicine, New Haven, CT, USA
- Cynthia A. Brandt
  - VA Connecticut Healthcare System, Pain Research, Informatics, Multimorbidities and Education (PRIME) Center, West Haven, CT, USA
  - Department of Emergency Medicine, Yale School of Medicine, New Haven, CT, USA
- Robert D. Kerns
  - VA Connecticut Healthcare System, Pain Research, Informatics, Multimorbidities and Education (PRIME) Center, West Haven, CT, USA
  - Departments of Psychiatry, Neurology and Psychology, Yale University, New Haven, CT, USA

20
Bilal M, Khan A, Jan S, Musa S, Ali S. Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications. Sensors (Basel) 2023; 23:3909. [PMID: 37112249 PMCID: PMC10143294 DOI: 10.3390/s23083909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Revised: 01/17/2023] [Accepted: 01/19/2023] [Indexed: 06/19/2023]
Abstract
Social media applications, such as Twitter and Facebook, allow users to communicate and share their thoughts, status updates, opinions, photographs, and videos around the globe. Unfortunately, some people utilize these platforms to disseminate hate speech and abusive language. The growth of hate speech may result in hate crimes, cyber violence, and substantial harm to cyberspace, physical security, and social safety. As a result, hate speech detection is a critical issue for both cyberspace and physical society, necessitating the development of a robust application capable of detecting and combating it in real-time. Hate speech detection is a context-dependent problem that requires context-aware mechanisms for resolution. In this study, we employed a transformer-based model for Roman Urdu hate speech classification due to its ability to capture the text context. In addition, we developed the first Roman Urdu pre-trained BERT model, which we named BERT-RU. For this purpose, we exploited the capabilities of BERT by training it from scratch on the largest Roman Urdu dataset consisting of 173,714 text messages. Traditional and deep learning models were used as baseline models, including LSTM, BiLSTM, BiLSTM + Attention Layer, and CNN. We also investigated the concept of transfer learning by using pre-trained BERT embeddings in conjunction with deep learning models. The performance of each model was evaluated in terms of accuracy, precision, recall, and F-measure. The generalization of each model was evaluated on a cross-domain dataset. The experimental results revealed that the transformer-based model, when directly applied to the classification task of the Roman Urdu hate speech, outperformed traditional machine learning, deep learning models, and pre-trained transformer-based models in terms of accuracy, precision, recall, and F-measure, with scores of 96.70%, 97.25%, 96.74%, and 97.89%, respectively. 
In addition, the transformer-based model exhibited superior generalization on a cross-domain dataset.
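The four evaluation measures named in this abstract can be sketched from the counts of a binary confusion matrix. This is a minimal illustration, not the authors' evaluation code: the counts are invented, the function name `classification_metrics` is hypothetical, and the F-measure is assumed here to be the balanced F1 score, which the abstract does not specify.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F1) from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall (assumed F-measure variant).
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

# Made-up counts for a hate / not-hate classifier; illustration only.
acc, prec, rec, f1 = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} F1={f1:.4f}")
```

As a harmonic mean, F1 always lies between the precision and recall values it is computed from.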
Affiliation(s)
- Muhammad Bilal
  - Department of Computer Science, Islamia College Peshawar, Peshawar 25130, Pakistan
- Atif Khan
  - Department of Computer Science, Islamia College Peshawar, Peshawar 25130, Pakistan
- Salman Jan
  - Malaysian Institute of Information Technology, Universiti Kuala Lumpur, Kuala Lumpur 50250, Malaysia
  - Department of Computer Science, Bacha Khan University Charsadda, Charsadda 24420, Pakistan
- Shahrulniza Musa
  - Malaysian Institute of Information Technology, Universiti Kuala Lumpur, Kuala Lumpur 50250, Malaysia
- Shaukat Ali
  - Department of Computer Science, Islamia College Peshawar, Peshawar 25130, Pakistan

21
Hu G, Ahmed M, L'Abbé MR. Natural language processing and machine learning approaches for food categorization and nutrition quality prediction compared with traditional methods. Am J Clin Nutr 2023; 117:553-563. [PMID: 36872019 DOI: 10.1016/j.ajcnut.2022.11.022] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/14/2022] [Revised: 11/16/2022] [Accepted: 11/29/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND Food categorization and nutrient profiling are labor intensive, time consuming, and costly tasks, given the number of products and labels in large food composition databases and the dynamic food supply. OBJECTIVES This study used a pretrained language model and supervised machine learning to automate food category classification and nutrition quality score prediction based on manually coded and validated data, and compared prediction results with models using bag-of-words and structured nutrition facts as inputs for predictions. METHODS Food product information from University of Toronto Food Label Information and Price Database 2017 (n = 17,448) and University of Toronto Food Label Information and Price Database 2020 (n = 74,445) databases were used. Health Canada's Table of Reference Amounts (TRA) (24 categories and 172 subcategories) was used for food categorization and the Food Standards of Australia and New Zealand (FSANZ) nutrient profiling system was used for nutrition quality score evaluation. TRA categories and FSANZ scores were manually coded and validated by trained nutrition researchers. A modified pretrained sentence-Bidirectional Encoder Representations from Transformers model was used to encode unstructured text from food labels into lower-dimensional vector representations, followed by supervised machine learning algorithms (i.e., elastic net, k-Nearest Neighbors, and XGBoost) for multiclass classification and regression tasks. RESULTS Pretrained language model representations utilized by the XGBoost multiclass classification algorithm reached overall accuracy scores of 0.98 and 0.96 in predicting food TRA major and subcategories, outperforming bag-of-words methods. For FSANZ score prediction, our proposed method reached a similar prediction accuracy (R2: 0.87 and MSE: 14.4) compared with bag-of-words methods (R2: 0.72-0.84; MSE: 30.3-17.6), whereas structured nutrition facts machine learning model performed the best (R2: 0.98; MSE: 2.5). 
The pretrained language model had a higher generalizable ability on the external test datasets than bag-of-words methods. CONCLUSIONS Our automation achieved high accuracy in classifying food categories and predicting nutrition quality scores using text information found on food labels. This approach is effective and generalizable in a dynamic food environment, where large amounts of food label data can be obtained from websites.
Affiliation(s)
- Guanlan Hu
  - Department of Nutritional Sciences, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Mavra Ahmed
  - Department of Nutritional Sciences, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Joannah & Brian Lawson Centre for Child Nutrition, University of Toronto, ON, Canada
- Mary R L'Abbé
  - Department of Nutritional Sciences, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
22
Yang S, Duan X, Xiao Z, Li Z, Liu Y, Jie Z, Tang D, Du H. Sentiment Classification of Chinese Tourism Reviews Based on ERNIE-Gram+GCN. Int J Environ Res Public Health 2022; 19:13520. [PMID: 36294096 PMCID: PMC9602456 DOI: 10.3390/ijerph192013520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2022] [Revised: 10/11/2022] [Accepted: 10/14/2022] [Indexed: 06/16/2023]
Abstract
Tourists increasingly check reviews of attractions before deciding whether to visit them, so classifying attraction reviews with high precision is important. Tourists also increasingly use emojis to express satisfaction or dissatisfaction with attractions. In this paper, we built a dataset for Chinese attraction evaluation incorporating emojis (CAEIE) and proposed E2G, a model that combines ERNIE-Gram, a pre-trained model that uses explicit n-gram masking to integrate coarse-grained information during pre-training, with a Text Graph Convolutional Network (TextGCN), to classify the dataset with high accuracy. E2G preprocesses the text and feeds it to both ERNIE-Gram and TextGCN. ERNIE-Gram was trained using its mask mechanism to obtain category probabilities. TextGCN constructed heterogeneous graphs of comment texts and words from the dataset and was trained to obtain document representations and output category probabilities. The two sets of probabilities were combined to obtain the final results. To demonstrate the validity of E2G, we compared it with advanced models. Experiments showed that E2G classified the CAEIE dataset well, with an accuracy of up to 97.37%, which was 1.37% and 1.35% ahead of ERNIE-Gram and TextGCN, respectively. In addition, two sets of comparison experiments were conducted to compare TextGCN and TextGAT on the CAEIE dataset: when combined with ERNIE and with ERNIE-Gram, TextGCN outperformed TextGAT by 1.6% and 2.15%, respectively. We also compared eight activation functions on the second layer of TextGCN; rectified linear unit 6 (ReLU6) gave the best results.
Affiliation(s)
- Senqi Yang
  - College of Information Engineering, Sichuan Agricultural University, Ya’an 625000, China
  - The Lab of Agricultural Information Engineering, Sichuan Key Laboratory, Ya’an 625000, China
- Xuliang Duan
  - College of Information Engineering, Sichuan Agricultural University, Ya’an 625000, China
  - The Lab of Agricultural Information Engineering, Sichuan Key Laboratory, Ya’an 625000, China
- Zeyan Xiao
  - College of Information Engineering, Sichuan Agricultural University, Ya’an 625000, China
  - The Lab of Agricultural Information Engineering, Sichuan Key Laboratory, Ya’an 625000, China
- Zhiyao Li
  - College of Information Engineering, Sichuan Agricultural University, Ya’an 625000, China
  - The Lab of Agricultural Information Engineering, Sichuan Key Laboratory, Ya’an 625000, China
- Yuhai Liu
  - College of Information Engineering, Sichuan Agricultural University, Ya’an 625000, China
  - The Lab of Agricultural Information Engineering, Sichuan Key Laboratory, Ya’an 625000, China
- Zhihao Jie
  - College of Information Engineering, Sichuan Agricultural University, Ya’an 625000, China
  - The Lab of Agricultural Information Engineering, Sichuan Key Laboratory, Ya’an 625000, China
- Dezhao Tang
  - College of Information Engineering, Sichuan Agricultural University, Ya’an 625000, China
  - The Lab of Agricultural Information Engineering, Sichuan Key Laboratory, Ya’an 625000, China
- Hui Du
  - Housing and Urban-Rural Development Bureau of Lincheng County, Xingtai 054000, China
23
Li Z, Wang X, Xu M, Li Y, Wang Y, Chen Y, Li S, Li Z, Yang J, Tang C, Xiong F, Jian W, He P, Zhan Y, Zheng J, Ye F. Development and clinical application of an electronic health record quality control system for pulmonary aspergillosis based on guidelines and natural language processing technology. J Thorac Dis 2022; 14:3398-3407. [PMID: 36245604 PMCID: PMC9562533 DOI: 10.21037/jtd-22-532] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/18/2022] [Accepted: 08/19/2022] [Indexed: 11/26/2022]
Abstract
Background: There are considerable differences in the diagnosis and treatment of pulmonary aspergillosis (PA) between specialized and primary hospitals and between developed and underdeveloped areas in China. There is a lack of electronic systems that help respiratory physicians standardize the diagnosis and treatment of PA. Methods: We extracted 26 quality control points from the latest guidelines related to PA and developed an electronic health record (EHR) quality control system for PA based on natural language processing (NLP) techniques. We obtained PA patient records from the Department of Respiratory Medicine of the First Affiliated Hospital of Guangzhou Medical University to verify the effectiveness of the system against manual evaluation by respiratory experts. Results: We successfully developed a quality control system for PA. A total of 699 PA medical records from the EHR of the First Affiliated Hospital of Guangzhou Medical University between January 2015 and March 2020 were obtained and assessed by the system; 162 defects were found, which included 19 medical records with diagnostic defects, 76 medical records with examination defects, and 80 medical records with treatment defects. In a validation sample of 200 medical records, the sensitivity and accuracy of the quality control system for pulmonary aspergillosis (QCSA) were 0.99 and 0.96, the F1 value was 0.85, and the recall rate was 0.77 compared with experts' evaluation. Conclusions: Our system successfully uses medical guidelines and NLP technology to detect defects in the diagnosis and treatment of PA, which helps to improve the management quality of PA patients.
Affiliation(s)
- Zhengtu Li
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Xidong Wang
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Mengke Xu
  - Guangzhou Tianpeng Technology Co., Ltd., Guangzhou, China
- Yongming Li
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Yinguang Wang
  - Guangzhou Tianpeng Technology Co., Ltd., Guangzhou, China
- Yijun Chen
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Shaoqiang Li
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Zhun Li
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Jinglu Yang
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Chun Tang
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Fangshu Xiong
  - Guangzhou Tianpeng Technology Co., Ltd., Guangzhou, China
- Wenhua Jian
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Peimei He
  - Guangzhou Tianpeng Technology Co., Ltd., Guangzhou, China
- Yangqing Zhan
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Jinping Zheng
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Feng Ye
  - State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
24
Rathee S, MacMahon M, Liu A, Katritsis NM, Youssef G, Hwang W, Wollman L, Han N. DILI C: An AI-Based Classifier to Search for Drug-Induced Liver Injury Literature. Front Genet 2022; 13:867946. [PMID: 35846129 PMCID: PMC9277181 DOI: 10.3389/fgene.2022.867946] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/01/2022] [Accepted: 04/11/2022] [Indexed: 01/15/2023]
Abstract
Drug-induced liver injury (DILI) is a class of adverse drug reactions (ADR) that causes problems in both clinical and research settings. It is the most frequent cause of acute liver failure in the majority of Western countries and is a major cause of attrition of novel drug candidates. Manual trawling of the literature is the main route of deriving information on DILI from research studies. This makes it an inefficient process prone to human error. Therefore, an automated AI model capable of retrieving DILI-related articles from the huge ocean of literature could be invaluable for the drug discovery community. In this study, we built an artificial intelligence (AI) model combining the power of natural language processing (NLP) and machine learning (ML) to address this problem. This model uses NLP to filter out meaningless text (e.g., stop words) and uses customized functions to extract relevant keywords such as singleton, pair, and triplet. These keywords are processed by an apriori pattern mining algorithm to extract relevant patterns which are used to estimate initial weightings for a ML classifier. Along with pattern importance and frequency, an FDA-approved drug list mentioning DILI adds extra confidence in classification. The combined power of these methods builds a DILI classifier (DILI C), with 94.91% cross-validation and 94.14% external validation accuracy. To make DILI C as accessible as possible, including to researchers without coding experience, an R Shiny app capable of classifying single or multiple entries for DILI is developed to enhance ease of user experience and made available at https://researchmind.co.uk/diliclassifier/. Additionally, a GitHub link (https://github.com/sanjaysinghrathi/DILI-Classifier) for app source code and ISMB extended video talk (https://www.youtube.com/watch?v=j305yIVi_f8) are available as supplementary materials.
Affiliation(s)
- Sanjay Rathee
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Meabh MacMahon
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom; LifeArc, Stevenage, United Kingdom
- Anika Liu
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom; Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom
- Nicholas M Katritsis
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom; Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, United Kingdom
- Gehad Youssef
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Woochang Hwang
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Lilly Wollman
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Namshik Han
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom; Cambridge Centre for AI in Medicine, University of Cambridge, Cambridge, United Kingdom
25
Park J, Choi W, Jung SU. Exploring Trends in Environmental, Social, and Governance Themes and Their Sentimental Value Over Time. Front Psychol 2022; 13:890435. [PMID: 35837641 PMCID: PMC9275432 DOI: 10.3389/fpsyg.2022.890435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2022] [Accepted: 05/02/2022] [Indexed: 11/13/2022]
Abstract
Environmental, social, and governance (ESG) is an indicator that measures a company's non-financial performance. Many firms have recently emphasized the importance of ESG. Ascertaining what topics are being discussed around ESG and how they change over time will contribute significantly to gaining insight into ESG. Using 73,397,870 text entries scraped and refined from publicly available Twitter data, this study applied Latent Dirichlet Allocation (LDA) and the dynamic topic model (DTM) to ascertain the hidden structure of the ESG-related document collection and the topics being discussed. The study further conducted a sentiment analysis to examine the sentiment of the general public regarding ESG. Topic modeling shows that various topics regarding ESG are being discussed and evolve over time. Sentiment analysis shows that many people have neutral or positive sentiments toward ESG-related issues. This study contributes to exploring insights into ESG among the public and understanding public reactions toward ESG. We conclude the study with a discussion of managerial implications and potential future research.
Affiliation(s)
- Joonbeom Park
  - Graduate School of Information, Yonsei University, Seoul, South Korea
- Woojoo Choi
  - Graduate Business School, Hankuk University of Foreign Studies, Seoul, South Korea
- Sang-Uk Jung
  - Graduate Business School, Hankuk University of Foreign Studies, Seoul, South Korea
26
Wang F, Wang H, Wang L, Lu H, Qiu S, Zang T, Zhang X, Hu Y. MHCRoBERTa: pan-specific peptide-MHC class I binding prediction through transfer learning with label-agnostic protein sequences. Brief Bioinform 2022; 23:6571528. [PMID: 35443027 DOI: 10.1093/bib/bbab595] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/02/2021] [Revised: 12/14/2021] [Accepted: 12/23/2021] [Indexed: 11/14/2022]
Abstract
Predicting the binding of peptides and the major histocompatibility complex (MHC) plays a vital role in immunotherapy for cancer. The success of AlphaFold in applying natural language processing (NLP) algorithms to protein secondary structure prediction has inspired us to explore the possibility of NLP methods in predicting peptide-MHC class I binding. Based on this motivation, we propose MHCRoBERTa, a method based on the RoBERTa pre-training approach, for predicting the binding affinity between type I MHC and peptides. Analysis of the results on a benchmark dataset demonstrates that MHCRoBERTa can outperform other state-of-the-art prediction methods with an increase in the Spearman rank correlation coefficient (SRCC) value. Notably, our model gave a significant improvement on IC50 value. Our method achieved SRCC and AUC values of 0.785 and 0.817, respectively. Our SRCC value is 14.3% higher than NetMHCpan3.0 (the second highest SRCC value among pan-specific methods) and 3% higher than MHCflurry (the second highest SRCC value among all methods). The AUC value is also better than that of any other pan-specific method. Moreover, we visualize the multi-head self-attention for the token representation across the layers and heads. Through the analysis of the representation of each layer and head, we can show whether the model has learned the syntax and semantics necessary to perform the prediction task well. All these results demonstrate that our model can accurately predict the peptide-MHC class I binding affinity and that MHCRoBERTa is a powerful tool for screening potential neoantigens for cancer immunotherapy. MHCRoBERTa is available as open-source software on GitHub (https://github.com/FuxuWang/MHCRoBERTa).
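The Spearman rank correlation coefficient (SRCC) used as the headline metric in this abstract can be illustrated with a minimal sketch. It assumes SRCC is computed as the Pearson correlation of the ranks of predicted and measured affinities, with tied values sharing an average rank. The function names and the toy values are hypothetical and are not taken from MHCRoBERTa.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy (hypothetical) predicted vs. measured affinities with identical ordering.
predicted = [0.10, 0.40, 0.35, 0.80]
measured = [0.20, 0.50, 0.30, 0.90]
print(spearman(predicted, measured))
```

Because only the ordering matters, any monotonic rescaling of the predictions leaves the coefficient unchanged. The sketch does not handle constant inputs, where the variance of the ranks is zero.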
Affiliation(s)
- Fuxu Wang
  - Center for Bioinformatics, Faculty of computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Haoyan Wang
  - Center for Bioinformatics, Faculty of computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Lizhuang Wang
  - General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Haoyu Lu
  - Center for Bioinformatics, school of life science and technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Shizheng Qiu
  - Center for Bioinformatics, school of life science and technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Tianyi Zang
  - Cisco Research, NLP team, California, United States
- Xinjun Zhang
  - Center for Bioinformatics, Faculty of computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yang Hu
  - Center for Bioinformatics, Faculty of computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
27
Wang K, Herr I. Machine-Learning-Based Bibliometric Analysis of Pancreatic Cancer Research Over the Past 25 Years. Front Oncol 2022; 12:832385. [PMID: 35419289 PMCID: PMC8995465 DOI: 10.3389/fonc.2022.832385] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/09/2021] [Accepted: 03/03/2022] [Indexed: 12/21/2022]
Abstract
Machine learning and semantic analysis are computer-based methods to evaluate complex relationships and predict future perspectives. We used these technologies to define recent, current and future topics in pancreatic cancer research. Publications indexed under the Medical Subject Headings (MeSH) term 'Pancreatic Neoplasms' from January 1996 to October 2021 were downloaded from PubMed. Using the statistical computing language R and the general-purpose programming language Python, we extracted publication dates, geographic information, and abstracts from each publication's metadata for bibliometric analyses. The generative statistical algorithm "latent Dirichlet allocation" (LDA) was applied to identify specific research topics and trends. The unsupervised "Louvain algorithm" was used to establish a network to identify relationships between single topics. A total of 60,296 publications were identified and analyzed. The publications were derived from 133 countries, mostly from the Northern Hemisphere. For the term "pancreatic cancer research", 12,058 MeSH terms appeared 1,395,060 times. Among them, we identified the four main topics "Clinical Manifestation and Diagnosis", "Review and Management", "Treatment Studies", and "Basic Research". The number of publications has increased rapidly during the past 25 years. Based on the number of publications, the algorithm identified "Immunotherapy", "Prognostic research", "Protein expression", "Case reports", "Gemcitabine and mechanism", "Clinical study of gemcitabine", "Operation and postoperation", "Chemotherapy and resection", and "Review and management" as current research topics. To our knowledge, this is the first study on this subject of pancreatic cancer research, which has become possible due to the improvement of algorithms and hardware.
Affiliation(s)
- Kangtao Wang
  - Molecular OncoSurgery, Section Surgical Research, Department of General, Visceral and Transplantation Surgery, University of Heidelberg, Heidelberg, Germany
- Ingrid Herr
  - Molecular OncoSurgery, Section Surgical Research, Department of General, Visceral and Transplantation Surgery, University of Heidelberg, Heidelberg, Germany
28
Lanyi K, Green R, Craig D, Marshall C. COVID-19 Vaccine Hesitancy: Analysing Twitter to Identify Barriers to Vaccination in a Low Uptake Region of the UK. Front Digit Health 2022; 3:804855. [PMID: 35141699 PMCID: PMC8818664 DOI: 10.3389/fdgth.2021.804855] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 10/29/2021] [Accepted: 12/30/2021] [Indexed: 11/13/2022]
Abstract
To facilitate effective targeted COVID-19 vaccination strategies, it is important to understand reasons for vaccine hesitancy where uptake is low. Artificial intelligence (AI) techniques offer an opportunity for real-time analysis of public attitudes, sentiments, and key discussion topics from sources of soft-intelligence, including social media data. In this work, we explore the value of soft-intelligence, leveraged using AI, as an evidence source to support public health research. As a case study, we deployed a natural language processing (NLP) platform to rapidly identify and analyse key barriers to vaccine uptake from a collection of geo-located tweets from London, UK. We developed a search strategy to capture COVID-19 vaccine related tweets, identifying 91,473 tweets between 30 November 2020 and 15 August 2021. The platform's algorithm clustered tweets according to their topic and sentiment, from which we extracted 913 tweets from the top 12 negative sentiment topic clusters. These tweets were extracted for further qualitative analysis. We identified safety concerns; mistrust of government and pharmaceutical companies; and accessibility issues as key barriers limiting vaccine uptake. Our analysis also revealed widespread sharing of vaccine misinformation amongst Twitter users. This study further demonstrates that there is promising utility for using off-the-shelf NLP tools to leverage insights from social media data to support public health research. Future work to examine where this type of work might be integrated as part of a mixed-methods research approach to support local and national decision making is suggested.
Affiliation(s)
- Katherine Lanyi
  - National Institute for Health Research (NIHR) Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle, United Kingdom
29
Karim HT, Vahia IV, Iaboni A, Lee EE. Editorial: Artificial Intelligence in Geriatric Mental Health Research and Clinical Care. Front Psychiatry 2022; 13:859175. [PMID: 35299825 PMCID: PMC8921095 DOI: 10.3389/fpsyt.2022.859175] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/21/2022] [Accepted: 02/07/2022] [Indexed: 11/24/2022]
Affiliation(s)
- Helmet T Karim
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, United States; Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, United States
- Ipsit V Vahia
  - Division of Geriatrics, McLean Hospital, Belmont, MA, United States; Department of Psychiatry, Harvard Medical School, Boston, MA, United States
- Andrea Iaboni
  - Knowledge, Innovation, Talent, Everywhere (KITE), Toronto Rehab Institute, University Health Network, Toronto, ON, Canada; Department of Psychiatry, University of Toronto, Toronto, ON, Canada
- Ellen E Lee
  - Department of Psychiatry, University of California, San Diego, San Diego, CA, United States; Sam and Rose Stein Institute for Research on Aging, University of California, San Diego, San Diego, CA, United States; Desert-Pacific Mental Illness Research Education and Clinical Center, Veterans Affairs San Diego Healthcare System, San Diego, CA, United States
30
Pair E, Vicas N, Weber AM, Meausoone V, Zou J, Njuguna A, Darmstadt GL. Quantification of Gender Bias and Sentiment Toward Political Leaders Over 20 Years of Kenyan News Using Natural Language Processing. Front Psychol 2021; 12:712646. [PMID: 34955949 PMCID: PMC8703202 DOI: 10.3389/fpsyg.2021.712646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2021] [Accepted: 11/17/2021] [Indexed: 11/16/2022]
Abstract
Background: Despite a 2010 Kenyan constitutional amendment limiting members of elected public bodies to < two-thirds of the same gender, only 22 percent of the 12th Parliament members inaugurated in 2017 were women. Investigating gender bias in the media is a useful tool for understanding socio-cultural barriers to implementing legislation for gender equality. Natural language processing (NLP) methods, such as word embedding and sentiment analysis, can efficiently quantify media biases at a scope previously unavailable in the social sciences. Methods: We trained GloVe and word2vec word embeddings on text from 1998 to 2019 from Kenya’s Daily Nation newspaper. We measured gender bias in these embeddings and used sentiment analysis to predict quantitative sentiment scores for sentences surrounding female leader names compared to male leader names. Results: Bias in leadership words for men and women measured from Daily Nation word embeddings corresponded to temporal trends in men and women’s participation in political leadership (i.e., parliamentary seats) using GloVe (correlation 0.8936, p = 0.0067, r2 = 0.799) and word2vec (correlation 0.844, p = 0.0169, r2 = 0.712) algorithms. Women continue to be associated with domestic terms while men continue to be associated with influence terms, for both regular gender words and female and male political leaders’ names. Male words (e.g., he, him, man) were mentioned 1.84 million more times than female words from 1998 to 2019. Sentiment analysis showed an increase in relative negative sentiment associated with female leaders (p = 0.0152) and an increase in positive sentiment associated with male leaders over time (p = 0.0216). Conclusion: Natural language processing is a powerful method for gaining insights into and quantifying trends in gender biases and sentiment in news media. We found evidence of improvement in gender equality but also a backlash from increased female representation in high-level governmental leadership.
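As a quick arithmetic check, the r2 values reported in this abstract are consistent with squaring the reported correlations, assuming r2 denotes the square of the correlation coefficient. The helper name below is hypothetical; the numbers are the ones quoted in the abstract.

```python
def squared_correlation(r, digits=3):
    """Square a correlation coefficient and round it for comparison."""
    return round(r * r, digits)


# (correlation, reported r2) pairs as quoted in the abstract.
reported = {"GloVe": (0.8936, 0.799), "word2vec": (0.844, 0.712)}

for name, (r, r2) in reported.items():
    # Each line shows the recomputed value next to the reported one.
    print(name, squared_correlation(r), r2)
```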
Affiliation(s)
- Emma Pair
  - Department of Pediatrics, Global Center for Gender Equality, School of Medicine, Stanford University, Stanford, CA, United States
- Nikitha Vicas
  - Department of Neuroscience, University of Texas - Dallas, Dallas, TX, United States
- Ann M Weber
  - School of Public Health, University of Nevada, Reno, NV, United States
- Valerie Meausoone
  - Research Computing Center, Stanford University, Stanford, CA, United States
- James Zou
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Amos Njuguna
  - School of Graduate Studies, Research and Extension, United States International University - Africa, Nairobi, Kenya
- Gary L Darmstadt
  - Department of Pediatrics, Global Center for Gender Equality, School of Medicine, Stanford University, Stanford, CA, United States
31
Kim S, Lee CK, Choi Y, Baek ES, Choi JE, Lim JS, Kang J, Shin SJ. Deep-Learning-Based Natural Language Processing of Serial Free-Text Radiological Reports for Predicting Rectal Cancer Patient Survival. Front Oncol 2021; 11:747250. [PMID: 34868947 PMCID: PMC8635726 DOI: 10.3389/fonc.2021.747250] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/26/2021] [Accepted: 10/28/2021] [Indexed: 01/04/2023]
Abstract
Most electronic medical records, such as free-text radiological reports, are unstructured; however, the methodological approaches to analyzing these accumulating unstructured records are limited. This article proposes a deep-transfer-learning-based natural language processing model that analyzes serial magnetic resonance imaging reports of rectal cancer patients and predicts their overall survival. To evaluate the model, a retrospective cohort study of 4,338 rectal cancer patients was conducted. The experimental results revealed that the proposed model utilizing pre-trained clinical linguistic knowledge could predict the overall survival of patients without any structured information and was superior to the carcinoembryonic antigen in predicting survival. The deep-transfer-learning model using free-text radiological reports can predict the survival of patients with rectal cancer, thereby increasing the utility of unstructured medical big data.
Collapse
Affiliation(s)
- Sunkyu Kim
- Department of Computer Science and Engineering, Korea University, Seoul, South Korea
| | - Choong-Kun Lee
- Division of Medical Oncology, Department of Internal Medicine, Yonsei University College of Medicine, Seoul, South Korea.,Songdang Institute for Cancer Research, Yonsei University College of Medicine, Seoul, South Korea
| | - Yonghwa Choi
- Department of Computer Science and Engineering, Korea University, Seoul, South Korea
| | - Eun Sil Baek
- Songdang Institute for Cancer Research, Yonsei University College of Medicine, Seoul, South Korea
| | - Jeong Eun Choi
- Songdang Institute for Cancer Research, Yonsei University College of Medicine, Seoul, South Korea
| | - Joon Seok Lim
- Department of Radiology, Yonsei University College of Medicine, Seoul, South Korea
| | - Jaewoo Kang
- Department of Computer Science and Engineering, Korea University, Seoul, South Korea
| | - Sang Joon Shin
- Division of Medical Oncology, Department of Internal Medicine, Yonsei University College of Medicine, Seoul, South Korea; Songdang Institute for Cancer Research, Yonsei University College of Medicine, Seoul, South Korea
| |
|
32
|
Hunter B, Reis S, Campbell D, Matharu S, Ratnakumar P, Mercuri L, Hindocha S, Kalsi H, Mayer E, Glampson B, Robinson EJ, Al-Lazikani B, Scerri L, Bloch S, Lee R. Development of a Structured Query Language and Natural Language Processing Algorithm to Identify Lung Nodules in a Cancer Centre. Front Med (Lausanne) 2021; 8:748168. [PMID: 34805217 PMCID: PMC8599820 DOI: 10.3389/fmed.2021.748168] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/27/2021] [Accepted: 10/07/2021] [Indexed: 12/04/2022] Open
Abstract
Importance: The stratification of indeterminate lung nodules is a growing problem, but the burden of lung nodules on healthcare services is not well-described. Manual service evaluation and research cohort curation can be time-consuming and potentially improved by automation. Objective: To automate lung nodule identification in a tertiary cancer centre. Methods: This retrospective cohort study used Electronic Healthcare Records to identify CT reports generated between 31st October 2011 and 24th July 2020. A structured query language/natural language processing tool was developed to classify reports according to lung nodule status. Performance was externally validated. Sentences were used to train machine-learning classifiers to predict concerning nodule features in 2,000 patients. Results: 14,586 patients with lung nodules were identified. The cancer types most commonly associated with lung nodules were lung (39%), neuro-endocrine (38%), skin (35%), colorectal (33%) and sarcoma (33%). Lung nodule patients had a greater proportion of metastatic diagnoses (45 vs. 23%, p < 0.001), a higher mean post-baseline scan number (6.56 vs. 1.93, p < 0.001), and a shorter mean scan interval (4.1 vs. 5.9 months, p < 0.001) than those without nodules. Inter-observer agreement for sentence classification was 0.94 internally and 0.98 externally. Sensitivity and specificity for nodule identification were 93 and 99% internally, and 100 and 100% at external validation, respectively. A linear-support vector machine model predicted concerning sentence features with 94% accuracy. Conclusion: We have developed and validated an accurate tool for automated lung nodule identification that is valuable for service evaluation and research data acquisition.
Affiliation(s)
- Benjamin Hunter
- The Royal Marsden National Health Service (NHS) Foundation Trust, Lung Unit, London, United Kingdom; Department of Surgery and Cancer, Imperial College London, London, United Kingdom
| | - Sara Reis
- The Royal Marsden National Health Service (NHS) Foundation Trust, Lung Unit, London, United Kingdom
| | - Des Campbell
- The Royal Marsden National Health Service (NHS) Foundation Trust, Lung Unit, London, United Kingdom
| | - Sheila Matharu
- The Royal Marsden National Health Service (NHS) Foundation Trust, Lung Unit, London, United Kingdom
| | | | - Luca Mercuri
- Imperial College Healthcare National Health Service (NHS) Trust, Imperial Clinical Analytics, Research and Evaluation, London, United Kingdom
| | - Sumeet Hindocha
- The Royal Marsden National Health Service (NHS) Foundation Trust, Lung Unit, London, United Kingdom; Department of Surgery and Cancer, Imperial College London, London, United Kingdom
| | - Hardeep Kalsi
- The Royal Marsden National Health Service (NHS) Foundation Trust, Lung Unit, London, United Kingdom; Department of Surgery and Cancer, Imperial College London, London, United Kingdom
| | - Erik Mayer
- Department of Surgery and Cancer, Imperial College London, London, United Kingdom; Imperial College Healthcare National Health Service (NHS) Trust, Imperial Clinical Analytics, Research and Evaluation, London, United Kingdom
| | - Ben Glampson
- Imperial College Healthcare National Health Service (NHS) Trust, Imperial Clinical Analytics, Research and Evaluation, London, United Kingdom
| | - Emily J Robinson
- The Royal Marsden National Health Service (NHS) Foundation Trust, Royal Marsden Clinical Trials Unit, London, United Kingdom
| | - Bisan Al-Lazikani
- The Institute for Cancer Research, Computational Biology and Chromogenetics, London, United Kingdom
| | - Lisa Scerri
- The Royal Marsden National Health Service (NHS) Foundation Trust, Lung Unit, London, United Kingdom
| | - Susannah Bloch
- Imperial College Healthcare Trust, Respiratory Medicine, London, United Kingdom
| | - Richard Lee
- The Royal Marsden National Health Service (NHS) Foundation Trust, Lung Unit, London, United Kingdom; Imperial College London, National Heart and Lung Institute, London, United Kingdom; The Institute for Cancer Research, Early Diagnosis and Detection, Genetics and Epidemiology, London, United Kingdom
| |
|
33
|
Tsuji S, Wen A, Takahashi N, Zhang H, Ogasawara K, Jiang G. Developing a RadLex-Based Named Entity Recognition Tool for Mining Textual Radiology Reports: Development and Performance Evaluation Study. J Med Internet Res 2021; 23:e25378. [PMID: 34714247 PMCID: PMC8590187 DOI: 10.2196/25378] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/10/2020] [Revised: 07/06/2021] [Accepted: 07/27/2021] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Named entity recognition (NER) plays an important role in extracting the features of descriptions such as the name and location of a disease for mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities that can be extracted depends on the dictionary lookup. In particular, the recognition of compound terms is very complicated because of the variety of patterns. OBJECTIVE The aim of this study is to develop and evaluate an NER tool concerned with compound terms using RadLex for mining free-text radiology reports. METHODS We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general purpose dictionary). We manually annotated 400 radiology reports for compound terms in noun phrases and used them as the gold standard for performance evaluation (precision, recall, and F-measure). In addition, we created a compound terms-enhanced dictionary (CtED) by analyzing false negatives and false positives and applied it to another 100 radiology reports for validation. We also evaluated the stem terms of compound terms by defining two measures: occurrence ratio (OR) and matching ratio (MR). RESULTS The F-measure of cTAKES+RadLex+general purpose dictionary was 30.9% (precision 73.3% and recall 19.6%) and that of the combined CtED was 63.1% (precision 82.8% and recall 51%). The OR indicated that the stem terms of effusion, node, tube, and disease were used frequently, but the approach still fell short of capturing compound terms. The MR showed that 71.85% (9,411/13,098) of the stem terms matched those of the ontologies, and RadLex improved the MR by approximately 22% over the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms have the potential to help generate synonymous phrases using the ontologies. CONCLUSIONS We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that CtED and stem term analysis have the potential to improve dictionary-based NER performance with regard to expanding vocabularies.
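Note: the F-measures reported in this abstract can be checked against the stated precision and recall. The sketch below assumes the abstract's F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not say so explicitly, but the reported figures are consistent with that reading. The function and variable names are illustrative only and do not come from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
baseline = f_measure(73.3, 19.6)   # cTAKES + RadLex + general purpose dictionary
with_cted = f_measure(82.8, 51.0)  # combined CtED

print(f"{baseline:.1f} {with_cted:.1f}")  # prints "30.9 63.1"
```

Both values round to the figures given in the abstract (30.9% and 63.1%), and the gain comes mostly from recall rising from 19.6% to 51%.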
Affiliation(s)
- Shintaro Tsuji
- Department of Health Sciences Research, Department of Radiology, Rochester, MN, United States; Graduate School of Health Sciences, Hokkaido University, Sapporo, Japan
| | - Andrew Wen
- Department of Health Sciences Research, Department of Radiology, Rochester, MN, United States
| | - Naoki Takahashi
- Department of Radiology, Mayo Clinic, Rochester, MN, United States
| | - Hongjian Zhang
- Graduate School of Health Sciences, Hokkaido University, Sapporo, Japan
| | | | - Gouqian Jiang
- Department of Health Sciences Research, Department of Radiology, Rochester, MN, United States
| |
|
34
|
Affiliation(s)
- Saturnino Luz
- Usher Institute, Edinburgh Medical School, The University of Edinburgh, Edinburgh, United Kingdom
| | - Fasih Haider
- Usher Institute, Edinburgh Medical School, The University of Edinburgh, Edinburgh, United Kingdom
| | | | - Davida Fromm
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, United States
| | - Brian MacWhinney
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, United States
| |
|
35
|
Gupta P, Kumar S, Suman RR, Kumar V. Sentiment Analysis of Lockdown in India During COVID-19: A Case Study on Twitter. IEEE Trans Comput Soc Syst 2021; 8:992-1002. [PMID: 37982036 PMCID: PMC8545003 DOI: 10.1109/tcss.2020.3042446] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/12/2020] [Revised: 08/28/2020] [Accepted: 12/01/2020] [Indexed: 11/21/2023]
Abstract
With the rapid increase in the use of the Internet, sentiment analysis has become one of the most popular fields of natural language processing (NLP). Using sentiment analysis, the implied emotion in text can be mined effectively for different occasions. During the COVID-19 outbreak, people have been using social media to receive and communicate different types of information on a massive scale. Mining such content to evaluate people's sentiments can play a critical role in making decisions to keep the situation under control. The objective of this study is to mine the sentiments of Indian citizens regarding the nationwide lockdown enforced by the Indian government to reduce the rate of spread of the coronavirus. In this work, sentiment analysis of tweets posted by Indian citizens has been performed using NLP and machine learning classifiers. From April 5, 2020 to April 17, 2020, a total of 12,741 tweets containing the keyword "Indialockdown" were extracted. Data were extracted from Twitter using the Tweepy API, annotated using the TextBlob and VADER lexicons, and preprocessed using the Natural Language Toolkit (NLTK) for Python. Eight different classifiers were used to classify the data. The experiment achieved the highest accuracy of 84.4% with the LinearSVC classifier and unigrams. This study concludes that the majority of Indian citizens support the lockdown decision implemented by the Indian government during the coronavirus outbreak.
Affiliation(s)
- Prasoon Gupta
- National Institute of Technology Jamshedpur, Jamshedpur 831014, India
| | - Sanjay Kumar
- National Institute of Technology Jamshedpur, Jamshedpur 831014, India
| | - R. R. Suman
- National Institute of Technology Jamshedpur, Jamshedpur 831014, India
| | - Vinay Kumar
- National Institute of Technology Jamshedpur, Jamshedpur 831014, India
| |
|
36
|
Woo K, Song J, Adams V, Block LJ, Currie LM, Shang J, Topaz M. Exploring prevalence of wound infections and related patient characteristics in homecare using natural language processing. Int Wound J 2021; 19:211-221. [PMID: 34105873 PMCID: PMC8684883 DOI: 10.1111/iwj.13623] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/11/2021] [Revised: 05/06/2021] [Accepted: 05/12/2021] [Indexed: 12/13/2022] Open
Abstract
We aimed to create and validate a natural language processing algorithm to extract wound infection-related information from nursing notes. We also estimated wound infection prevalence in homecare settings and described related patient characteristics. In this retrospective cohort study, a natural language processing algorithm was developed and validated against a gold standard testing set. Cases with wound infection were identified using the algorithm and linked to Outcome and Assessment Information Set data to identify related patient characteristics. The final version of the natural language processing vocabulary contained 3914 terms and expressions related to the presence of wound infection. The natural language processing algorithm achieved overall good performance (F-measure = 0.88). The presence of wound infection was documented for 1.03% (n = 602) of patients without wounds, for 5.95% (n = 3232) of patients with wounds, and 19.19% (n = 152) of patients with wound-related hospitalisation or emergency department visits. Diabetes, peripheral vascular disease, and skin ulcer were significantly associated with wound infection among homecare patients. Our findings suggest that nurses frequently document wound infection-related information. The use of natural language processing demonstrated that valuable information can be extracted from nursing notes which can be used to improve our understanding of the care needs of people receiving homecare. By linking findings from clinical nursing notes with additional structured data, we can analyse related patients' characteristics and use them to develop a tailored intervention that may potentially lead to reduced wound infection-related hospitalizations.
Affiliation(s)
- Kyungmi Woo
- College of Nursing, Seoul National University, Seoul, South Korea
| | - Jiyoun Song
- School of Nursing, Columbia University, New York City, New York, USA
| | - Victoria Adams
- Visiting Nurse Service of New York, New York City, New York, USA
| | - Lorraine J Block
- School of Nursing, University of British Columbia, Vancouver, British Columbia, Canada
| | - Leanne M Currie
- School of Nursing, University of British Columbia, Vancouver, British Columbia, Canada
| | - Jingjing Shang
- School of Nursing, Columbia University, New York City, New York, USA
| | - Maxim Topaz
- School of Nursing, Columbia University, New York City, New York, USA; Visiting Nurse Service of New York, New York City, New York, USA; Data Science Institute, Columbia University, New York City, New York, USA
| |
|
37
|
Kjell O, Daukantaitė D, Sikström S. Computational Language Assessments of Harmony in Life - Not Satisfaction With Life or Rating Scales - Correlate With Cooperative Behaviors. Front Psychol 2021; 12:601679. [PMID: 34045988 PMCID: PMC8144476 DOI: 10.3389/fpsyg.2021.601679] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/01/2020] [Accepted: 03/31/2021] [Indexed: 01/07/2023] Open
Abstract
Different types of well-being are likely to be associated with different kinds of behaviors. The first objective of this study was, from a subjective well-being perspective, to examine whether harmony in life and satisfaction with life are related differently to cooperative behaviors depending on individuals' social value orientation. The second objective was, from a methodological perspective, to examine whether language-based assessments called computational language assessments (CLA), which enable respondents to answer with words that are analyzed using natural language processing, demonstrate stronger correlations with cooperation than traditional rating scales. Participants reported their harmony in life, satisfaction with life, and social value orientation before taking part in an online cooperative task. The results show that the CLA of overall harmony in life correlated with cooperation (all participants: r = 0.18, p < 0.05, n = 181) and that this was particularly true for prosocial participants (r = 0.35, p < 0.001, n = 96), whereas rating scales were not correlated (p > 0.05). No significant correlations (measured by the CLA or traditional rating scales) were found between satisfaction with life and cooperation. In conclusion, our study reveals an important behavioral difference between different types of subjective well-being. To our knowledge, this is the first study supporting the validity of self-reported CLA over traditional rating scales in relation to actual behaviors.
Affiliation(s)
- Oscar Kjell
- Department of Psychology, Lund University, Lund, Sweden
| | | | | |
|
38
|
Sundaram DSB, Arunachalam SP, Damani DN, Farahani NZ, Enayati M, Pasupathy KS, Arruda-Olson AM. Natural Language Processing Based Machine Learning Model Using Cardiac MRI Reports to Identify Hypertrophic Cardiomyopathy Patients. Proc Des Med Devices Conf 2021; 2021:V001T03A005. [PMID: 35463194 PMCID: PMC9032778 DOI: 10.1115/dmd2021-1076] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/14/2023]
Abstract
Hypertrophic cardiomyopathy (HCM) is the most common genetic heart disease in the US and is known to cause sudden cardiac death (SCD) in young adults. While significant advancements have been made in HCM diagnosis and management, there is a need for automated tools, based on natural language processing (NLP)-guided machine learning (ML) models, that accurately identify HCM cases from electronic health record (EHR) data in order to improve management and reduce adverse outcomes of HCM patients. Cardiac magnetic resonance (CMR) imaging plays a significant role in HCM diagnosis and risk stratification. CMR reports, generated by clinician annotation, offer rich data in the form of cardiac measurements as well as narratives describing interpretation and phenotypic description. The purpose of this study is to develop an NLP-based interpretable model utilizing impressions extracted from CMR reports to automatically identify HCM patients. CMR reports of patients with a suspected HCM diagnosis between 1995 and 2019 were used in this study. Patients were classified into three categories: yes HCM, no HCM, and possible HCM. A random forest (RF) model was developed to assess how well CMR measurements and impression features identify HCM patients. The RF model yielded an accuracy of 86% with 608 features and 85% with 30 features. These results offer promise for accurate identification of HCM patients using CMR reports from the EHR, supporting efficient clinical management and transforming health care delivery for these patients.
|
39
|
Ferrone L, Zanzotto FM. Symbolic, Distributed, and Distributional Representations for Natural Language Processing in the Era of Deep Learning: A Survey. Front Robot AI 2021; 6:153. [PMID: 33501168 PMCID: PMC7805717 DOI: 10.3389/frobt.2019.00153] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/05/2019] [Accepted: 12/20/2019] [Indexed: 11/13/2022] Open
Abstract
Natural language is inherently a discrete symbolic representation of human knowledge. Recent advances in machine learning (ML) and in natural language processing (NLP) seem to contradict this intuition: discrete symbols are fading away, erased by vectors or tensors called distributed and distributional representations. However, there is a strict link between distributed/distributional representations and discrete symbols, the former being an approximation of the latter. A clearer understanding of this link may lead to radically new deep learning networks. In this paper we present a survey that aims to renew the link between symbolic representations and distributed/distributional representations. This is the right time to revitalize the area of interpreting how discrete symbols are represented inside neural networks.
Affiliation(s)
- Lorenzo Ferrone
- Department of Enterprise Engineering, University of Rome Tor Vergata, Rome, Italy
| | | |
|
40
|
V. P. Singh H, Mahmoud QH. NLP-Based Approach for Predicting HMI State Sequences Towards Monitoring Operator Situational Awareness. Sensors (Basel) 2020; 20:s20113228. [PMID: 32517145 PMCID: PMC7309108 DOI: 10.3390/s20113228] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/27/2020] [Revised: 05/28/2020] [Accepted: 05/29/2020] [Indexed: 12/03/2022]
Abstract
The novel approach presented herein transforms Human Machine Interface (HMI) states, as a pattern of visual feedback states that encompass both operator actions and process states, from a multi-variate time-series to a natural language processing (NLP) modeling domain. The goal of this approach is to predict operator response patterns for an n-ahead time-step window given k-lagged past HMI state patterns. The NLP approach offers the possibility of encoding (semantic) contextual relations within HMI state patterns. To this end, a technique for framing raw HMI data for supervised training using sequence-to-sequence (seq2seq) deep-learning machine translation algorithms is presented. In addition, a custom seq2seq convolutional neural network (CNN) NLP model based on current state-of-the-art design elements, such as attention, is compared against a standard recurrent neural network (RNN) based NLP model. Results demonstrate comparable effectiveness of both NLP model designs evaluated for modeling HMI states. RNN NLP models showed higher (≈26%) forecast accuracy, in general, for both in-sample and out-of-sample test datasets. However, the custom CNN NLP model showed higher (≈53%) validation accuracy, indicative of less over-fitting with the same amount of available training data. The real-world application of the proposed NLP modeling of industrial HMIs, such as in power generating station control rooms, aviation (cockpits), and so forth, is towards the realization of a non-intrusive operator situational awareness monitoring framework through prediction of HMI states.
Affiliation(s)
- Harsh V. P. Singh
- Department of Electrical, Computer and Software Engineering, Ontario Tech University, Oshawa, ON L1G 0C5, Canada
- Computers, Controls and Design Department, Ontario Power Generation, Pickering, ON L1W 3J2, Canada
| | - Qusay H. Mahmoud
- Department of Electrical, Computer and Software Engineering, Ontario Tech University, Oshawa, ON L1G 0C5, Canada
| |
|
41
|
Zhao Y, Zhang J, Wu M. Finding Users' Voice on Social Media: An Investigation of Online Support Groups for Autism-Affected Users on Facebook. Int J Environ Res Public Health 2019; 16:ijerph16234804. [PMID: 31795451 PMCID: PMC6926495 DOI: 10.3390/ijerph16234804] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 11/07/2019] [Revised: 11/24/2019] [Accepted: 11/27/2019] [Indexed: 11/16/2022]
Abstract
The use of the Internet for health information purposes is rising. Utilization of various forms of social media has been a key interest in consumer health informatics (CHI). To reveal the information needs of autism-affected users, this study centers on users' interactions and information sharing within autism communities on social media. It aims to understand how autism-affected users utilize support groups on Facebook by applying natural language processing (NLP) techniques to unstructured health data in social media. An interactive visualization method (pyLDAvis) was employed to evaluate the produced models and visualize the inter-topic distance maps. The revealed topics (e.g., parenting, education, behavior traits) identify issues that individuals with autism were concerned about on a daily basis and how they addressed such concerns in the form of group communication. In addition to general social support, disease-specific information, collective coping strategies, and emotional support were provided by group members based on similar personal experiences. This study concluded that Latent Dirichlet Allocation (LDA) is feasible and appropriate for deriving topics (focus) from messages posted to the autism support groups on Facebook. The revealed topics help healthcare professionals (content providers) understand autism from users' perspectives and provide better patient communications.
Affiliation(s)
- Yuehua Zhao
- School of Information Management, Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing University, Nanjing 210023, China
| | - Jin Zhang
- School of Information Studies, University of Wisconsin Milwaukee, Milwaukee, WI 53211, USA
| | - Min Wu
- College of Health Sciences, University of Wisconsin Milwaukee, Milwaukee, WI 53211, USA
- Correspondence: Tel.: +1-414-229-4778
| |
|
42
|
Gavrielov-Yusim N, Kürzinger ML, Nishikawa C, Pan C, Pouget J, Epstein LB, Golant Y, Tcherny-Lessenot S, Lin S, Hamelin B, Juhaeri J. Comparison of text processing methods in social media-based signal detection. Pharmacoepidemiol Drug Saf 2019; 28:1309-1317. [PMID: 31392844 DOI: 10.1002/pds.4857] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/18/2018] [Revised: 06/12/2019] [Accepted: 06/14/2019] [Indexed: 11/08/2022]
Abstract
PURPOSE Adverse event (AE) identification in social media (SM) can be performed using various types of natural language processing (NLP) and machine learning (ML). These methods can be categorized by complexity and precision level. Co-occurrence-based ML methods are rather basic, as they identify simultaneous appearance of drugs and clinical events in a single post. In contrast, statistical learning methods involve more complex NLP and identify drugs, events, and associations between them. We aimed to compare the ability of co-occurrence and NLP to identify AEs and signals of disproportionate reporting (SDR) in patient-generated SM. We also examined the performance of lift in SM-based signal detection (SD). METHODS Our examination was performed in a corpus of SM posts crawled from open online patient forums and communities, using the spontaneously reported VigiBase data as reference data set. RESULTS We found that co-occurrence and NLP produce AEs, which are 57% and 93% consistent with VigiBase AEs, respectively. Among the SDRs identified both in SM and in VigiBase, up to 55.3% were identified earlier in co-occurrence, and up to 32.1% were identified earlier in NLP-processed SM. Using lift in SM SD provided performance similar to frequentist methods, both in co-occurrence and in NLP-processed AEs. CONCLUSION Our results indicate that using SM as a data source complementary to traditional pharmacovigilance sources should be considered further. Various levels of SM processing may be considered, depending on the preferred policies and tolerance for false-positive to false-negative balance in routine pharmacovigilance processes.
Affiliation(s)
| | | | - Chihiro Nishikawa
- Epidemiology and Benefit Risk Evaluation, Sanofi, Chilly-Mazarin, France
| | - Chunshen Pan
- Epidemiology and Benefit Risk Evaluation, Sanofi, Bridgewater, NJ, USA
| | - Julie Pouget
- Information Technology and Solutions, R&D CMO - SC Real World Evidence, Sanofi, Lyon, France
| | | | | | | | - Stephen Lin
- Global Pharmacovigilance, Sanofi, Bridgewater, NJ, USA
| | | | - Juhaeri Juhaeri
- Epidemiology and Benefit Risk Evaluation, Sanofi, Bridgewater, NJ, USA
| |
|
43
|
Assale M, Dui LG, Cina A, Seveso A, Cabitza F. The Revival of the Notes Field: Leveraging the Unstructured Content in Electronic Health Records. Front Med (Lausanne) 2019; 6:66. [PMID: 31058150 PMCID: PMC6478793 DOI: 10.3389/fmed.2019.00066] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 11/27/2018] [Accepted: 03/18/2019] [Indexed: 01/01/2023] Open
Abstract
Problem: Clinical practice requires the production of a great amount of notes, which is time- and resource-consuming. These notes contain relevant information, but their secondary use is almost impossible due to their unstructured nature. Researchers are trying to address this problem with both traditional and promising novel techniques. Application in real hospital settings does not yet seem possible, though, both because of relatively small and dirty datasets and because of the lack of language-specific pre-trained models. Aim: Our aim is to demonstrate the potential of the above techniques, but also to raise awareness of the still-open challenges that the scientific communities of IT and medical practitioners must jointly address to realize the full potential of the unstructured content that is produced and digitized daily in hospital settings, both to improve its data quality and to leverage the insights from data-driven predictive models. Methods: To this end, we present a narrative literature review of the most recent and relevant contributions on the application of Natural Language Processing techniques to the free-text content of electronic patient records. In particular, we focused on four selected application domains, namely: data quality, information extraction, sentiment analysis and predictive models, and automated patient cohort selection. We then present a few empirical studies that we undertook at a major teaching hospital specializing in musculoskeletal diseases. Results: We provide the reader with some simple and affordable pipelines, which demonstrate the feasibility of reaching literature performance levels with a single-institution non-English dataset. In this way, we bridged literature and real-world needs, taking a further step toward the revival of notes fields.
Affiliation(s)
- Michela Assale
- K-tree SRL, Pont-Saint-Martin, Italy
- University of Milano-Bicocca, Milan, Italy
| | - Linda Greta Dui
- Politecnico di Milano, Milan, Italy
- Link-Up Datareg, Cinisello Balsamo, Italy
| | - Andrea Cina
- K-tree SRL, Pont-Saint-Martin, Italy
- University of Milano-Bicocca, Milan, Italy
| | - Andrea Seveso
- University of Milano-Bicocca, Milan, Italy
- Link-Up Datareg, Cinisello Balsamo, Italy
| | - Federico Cabitza
- University of Milano-Bicocca, Milan, Italy
- IRCCS Istituto Ortopedico Galeazzi, Milan, Italy
| |
|
44
|
Müller MM, Salathé M. Crowdbreaks: Tracking Health Trends Using Public Social Media Data and Crowdsourcing. Front Public Health 2019; 7:81. [PMID: 31037238 PMCID: PMC6476276 DOI: 10.3389/fpubh.2019.00081] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 10/18/2018] [Accepted: 03/19/2019] [Indexed: 11/13/2022]
Abstract
In the past decade, tracking health trends using social media data has shown great promise, due to a powerful combination of massive adoption of social media around the world and increasingly potent hardware and software that enable us to work with these new big data streams. At the same time, many challenging problems have been identified. First, there is often a mismatch between how rapidly online data can change and how rapidly algorithms are updated, which means that algorithms trained on past data have limited reusability because their performance decreases over time. Second, much of the work focuses on specific issues during a specific past period, even though public health institutions need flexible tools to assess multiple evolving situations in real time. Third, most tools providing such capabilities are proprietary systems with little algorithmic or data transparency, and thus little buy-in from the global public health and research community. Here, we introduce Crowdbreaks, an open platform that allows tracking of health trends by making use of continuous crowdsourced labeling of public social media content. The system is built to automate the typical workflow of data collection, filtering, labeling, and training of machine learning classifiers, and can therefore greatly accelerate the research process in the public health domain. This work describes the technical aspects of the platform, covering its current functionality and exploring its future use cases and extensions.
Affiliation(s)
- Martin M Müller
- Digital Epidemiology Lab, EPFL, Geneva, Switzerland; School of Life Sciences, EPFL, Lausanne, Switzerland; School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland
| | - Marcel Salathé
- Digital Epidemiology Lab, EPFL, Geneva, Switzerland; School of Life Sciences, EPFL, Lausanne, Switzerland; School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland
| |
|
45
|
Trivedi G, Pham P, Chapman WW, Hwa R, Wiebe J, Hochheiser H. NLPReViz: an interactive tool for natural language processing on clinical text. J Am Med Inform Assoc 2019; 25:81-87. [PMID: 29016825 DOI: 10.1093/jamia/ocx070] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 02/17/2017] [Accepted: 06/21/2017] [Indexed: 11/14/2022]
Abstract
The gap between domain experts and natural language processing expertise is a barrier to extracting understanding from clinical text. We describe a prototype tool for interactive review and revision of natural language processing models of binary concepts extracted from clinical notes. We evaluated our prototype in a user study involving 9 physicians, who used our tool to build and revise models for 2 colonoscopy quality variables. We report changes in performance relative to the quantity of feedback. Using initial training sets as small as 10 documents, expert review led to final F1 scores for the "appendiceal-orifice" variable between 0.78 and 0.91 (with improvements ranging from 13.26% to 29.90%). F1 for "biopsy" ranged between 0.88 and 0.94 (-1.52% to 11.74% improvements). The average System Usability Scale score was 70.56. Subjective feedback also suggests possible design improvements.
Affiliation(s)
- Gaurav Trivedi
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
| | - Phuong Pham
- Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, USA
| | - Wendy W Chapman
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
| | - Rebecca Hwa
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA; Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, USA
| | - Janyce Wiebe
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA; Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, USA
| | - Harry Hochheiser
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA; Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
| |
|
46
|
Garvin JH, Kim Y, Gobbel GT, Matheny ME, Redd A, Bray BE, Heidenreich P, Bolton D, Heavirland J, Kelly N, Reeves R, Kalsy M, Goldstein MK, Meystre SM. Automating Quality Measures for Heart Failure Using Natural Language Processing: A Descriptive Study in the Department of Veterans Affairs. JMIR Med Inform 2018; 6:e5. [PMID: 29335238 PMCID: PMC5789165 DOI: 10.2196/medinform.9150] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 10/09/2017] [Revised: 12/08/2017] [Accepted: 12/10/2017] [Indexed: 12/11/2022]
Abstract
Background We developed an accurate, stakeholder-informed, automated, natural language processing (NLP) system to measure the quality of heart failure (HF) inpatient care, and explored the potential for adoption of this system within an integrated health care system. Objective To accurately automate a United States Department of Veterans Affairs (VA) quality measure for inpatients with HF. Methods We automated the HF quality measure Congestive Heart Failure Inpatient Measure 19 (CHI19) that identifies whether a given patient has left ventricular ejection fraction (LVEF) <40%, and if so, whether an angiotensin-converting enzyme inhibitor or angiotensin-receptor blocker was prescribed at discharge if there were no contraindications. We used documents from 1083 unique inpatients from eight VA medical centers to develop a reference standard (RS) to train (n=314) and test (n=769) the Congestive Heart Failure Information Extraction Framework (CHIEF). We also conducted semi-structured interviews (n=15) for stakeholder feedback on implementation of the CHIEF. Results The CHIEF classified each hospitalization in the test set with a sensitivity (SN) of 98.9% and positive predictive value of 98.7%, compared with an RS and SN of 98.5% for available External Peer Review Program assessments. Of the 1083 patients available for the NLP system, the CHIEF evaluated and classified 100% of cases. Stakeholders identified potential implementation facilitators and clinical uses of the CHIEF. Conclusions The CHIEF provided complete data for all patients in the cohort and could potentially improve the efficiency, timeliness, and utility of HF quality measurements.
Affiliation(s)
- Jennifer Hornung Garvin
- Health Information Management and Systems Division, School of Health and Rehabilitation Sciences, The Ohio State University, Columbus, OH, United States; IDEAS 2.0 Health Services Research and Development Research Center, Salt Lake City Veterans Affairs Healthcare System, Department of Veterans Affairs, Salt Lake City, UT, United States; Department of Biomedical Informatics, School of Medicine, University of Utah, Salt Lake City, UT, United States; Division of Epidemiology, Department of Medicine, University of Utah, Salt Lake City, UT, United States; Geriatric Research, Education and Clinical Center, Salt Lake City Veterans Affairs Healthcare System, Department of Veterans Affairs, Salt Lake City, UT, United States
| | - Youngjun Kim
- IDEAS 2.0 Health Services Research and Development Research Center, Salt Lake City Veterans Affairs Healthcare System, Department of Veterans Affairs, Salt Lake City, UT, United States; Translational Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, United States
| | - Glenn Temple Gobbel
- Geriatric Research, Education and Clinical Center, Tennessee Valley Healthcare System, Department of Veterans Affairs, Nashville, TN, United States; Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, United States
| | - Michael E Matheny
- Geriatric Research, Education and Clinical Center, Tennessee Valley Healthcare System, Department of Veterans Affairs, Nashville, TN, United States; Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, United States
| | - Andrew Redd
- IDEAS 2.0 Health Services Research and Development Research Center, Salt Lake City Veterans Affairs Healthcare System, Department of Veterans Affairs, Salt Lake City, UT, United States; Division of Epidemiology, Department of Medicine, University of Utah, Salt Lake City, UT, United States
| | - Bruce E Bray
- IDEAS 2.0 Health Services Research and Development Research Center, Salt Lake City Veterans Affairs Healthcare System, Department of Veterans Affairs, Salt Lake City, UT, United States; Department of Biomedical Informatics, School of Medicine, University of Utah, Salt Lake City, UT, United States
| | - Paul Heidenreich
- Palo Alto Geriatric Research, Education and Clinical Center, Veterans Affairs Palo Alto Health Care System, Department of Veterans Affairs, Stanford University, Palo Alto, CA, United States
| | - Dan Bolton
- IDEAS 2.0 Health Services Research and Development Research Center, Salt Lake City Veterans Affairs Healthcare System, Department of Veterans Affairs, Salt Lake City, UT, United States; Division of Epidemiology, Department of Medicine, University of Utah, Salt Lake City, UT, United States
| | - Julia Heavirland
- IDEAS 2.0 Health Services Research and Development Research Center, Salt Lake City Veterans Affairs Healthcare System, Department of Veterans Affairs, Salt Lake City, UT, United States
| | - Natalie Kelly
- IDEAS 2.0 Health Services Research and Development Research Center, Salt Lake City Veterans Affairs Healthcare System, Department of Veterans Affairs, Salt Lake City, UT, United States
| | - Ruth Reeves
- Geriatric Research, Education and Clinical Center, Tennessee Valley Healthcare System, Department of Veterans Affairs, Nashville, TN, United States; Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, United States
| | - Megha Kalsy
- IDEAS 2.0 Health Services Research and Development Research Center, Salt Lake City Veterans Affairs Healthcare System, Department of Veterans Affairs, Salt Lake City, UT, United States; Department of Biomedical Informatics, School of Medicine, University of Utah, Salt Lake City, UT, United States
| | - Mary Kane Goldstein
- Medical Service, Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, United States; Department of Medicine, Stanford University School of Medicine, Stanford, CA, United States
| | - Stephane M Meystre
- IDEAS 2.0 Health Services Research and Development Research Center, Salt Lake City Veterans Affairs Healthcare System, Department of Veterans Affairs, Salt Lake City, UT, United States; Translational Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, United States
| |
|
47
|
Beyer SE, McKee BJ, Regis SM, McKee AB, Flacke S, El Saadawi G, Wald C. Automatic Lung-RADS™ classification with a natural language processing system. J Thorac Dis 2017; 9:3114-3122. [PMID: 29221286 DOI: 10.21037/jtd.2017.08.13] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022]
Abstract
Background Our aim was to train a natural language processing (NLP) algorithm to capture imaging characteristics of lung nodules reported in a structured CT report and suggest the applicable Lung-RADS™ (LR) category. Methods Our study included structured, clinical reports of consecutive CT lung screening (CTLS) exams performed from 08/2014 to 08/2015 at an ACR-accredited Lung Cancer Screening Center. All patients screened were at high risk for lung cancer according to the NCCN Guidelines®. All exams were interpreted by one of three radiologists credentialed to read CTLS exams, using LR and a standard reporting template. Training and test sets consisted of consecutive exams. Lung screening exams were divided into two groups: three training sets (500, 120, and 383 reports each) and one final evaluation set (498 reports). NLP algorithm results were compared with the gold standard of the LR category assigned by the radiologist. Results The sensitivity/specificity of the NLP algorithm to correctly assign LR categories for suspicious nodules (LR 4) and positive nodules (LR 3/4) were 74.1%/98.6% and 75.0%/98.8%, respectively. The majority of mismatches occurred in cases with pulmonary findings not currently addressed by LR. Misclassifications also resulted from the failure to identify exams as follow-up and the failure to completely characterize part-solid nodules. In a sub-group analysis among structured reports with standardized language, the sensitivity and specificity to detect LR 4 nodules were 87.0% and 99.5%, respectively. Conclusions An NLP system can accurately suggest the appropriate LR category from CTLS exam findings when standardized reporting is used.
Affiliation(s)
- Sebastian E Beyer
- Department of Radiology, Lahey Hospital and Medical Center, Burlington, MA, USA
| | - Brady J McKee
- Department of Radiology, Lahey Hospital and Medical Center, Burlington, MA, USA
| | - Shawn M Regis
- Department of Radiation Oncology, Lahey Hospital and Medical Center, Burlington, MA, USA
| | - Andrea B McKee
- Department of Radiation Oncology, Lahey Hospital and Medical Center, Burlington, MA, USA
| | - Sebastian Flacke
- Department of Radiology, Lahey Hospital and Medical Center, Burlington, MA, USA
| | | | - Christoph Wald
- Department of Radiology, Lahey Hospital and Medical Center, Burlington, MA, USA
| |
|
48
|
Chen W, Kowatch R, Lin S, Splaingard M, Huang Y. Interactive Cohort Identification of Sleep Disorder Patients Using Natural Language Processing and i2b2. Appl Clin Inform 2015; 6:345-63. [PMID: 26171080 DOI: 10.4338/aci-2014-11-ra-0106] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 11/25/2014] [Accepted: 02/23/2015] [Indexed: 11/23/2022]
Abstract
Nationwide Children's Hospital established an i2b2 (Informatics for Integrating Biology & the Bedside) application for sleep disorder cohort identification. Discrete data were gleaned from semi-structured sleep study reports. The system was shown to work more efficiently than the traditional manual chart review method, and it also enabled searching capabilities that were previously not possible. OBJECTIVE We report on the development and implementation of the sleep disorder i2b2 cohort identification system using natural language processing of semi-structured documents. METHODS We developed a natural language processing approach to automatically parse concepts and their values from semi-structured sleep study documents. Two parsers were developed: a regular expression parser for extracting numeric concepts and an NLP-based tree parser for extracting textual concepts. Concepts were further organized into i2b2 ontologies based on document structures and in-domain knowledge. RESULTS 26,550 concepts were extracted, with 99% being textual concepts. 1.01 million facts were extracted from sleep study documents, such as demographic information, sleep study lab results, medications, procedures, and diagnoses, among others. The average accuracy of terminology parsing was over 83% when compared against parsing by experts. The system is capable of capturing both standard and non-standard terminologies. The time for cohort identification has been reduced significantly, from a few weeks to a few seconds. CONCLUSION Natural language processing was shown to be powerful for quickly converting large amounts of semi-structured or unstructured clinical data into discrete concepts, which, in combination with intuitive domain-specific ontologies, allows fast and effective interactive cohort identification through the i2b2 platform for research and clinical use.
Affiliation(s)
- W Chen
- Research Information Solutions and Innovations, Columbus, OH
| | - R Kowatch
- Center for Innovation in Pediatric Practice, Columbus, OH
| | - S Lin
- Research Information Solutions and Innovations, Columbus, OH
| | - M Splaingard
- Sleep Disorder Center, Nationwide Children's Hospital, Columbus, OH
| | - Y Huang
- Research Information Solutions and Innovations, Columbus, OH
| |
|
49
|
Patel R, Lloyd T, Jackson R, Ball M, Shetty H, Broadbent M, Geddes JR, Stewart R, McGuire P, Taylor M. Mood instability is a common feature of mental health disorders and is associated with poor clinical outcomes. BMJ Open 2015; 5:e007504. [PMID: 25998036 PMCID: PMC4452754 DOI: 10.1136/bmjopen-2014-007504] [Citation(s) in RCA: 76] [Impact Index Per Article: 8.4] [Indexed: 12/16/2022]
Abstract
OBJECTIVES Mood instability is a clinically important phenomenon but has received relatively little research attention. The objective of this study was to assess the impact of mood instability on clinical outcomes in a large sample of people receiving secondary mental healthcare. DESIGN Observational study using an anonymised electronic health record case register. SETTING South London and Maudsley NHS Trust (SLaM), a large provider of inpatient and community mental healthcare in the UK. PARTICIPANTS 27,704 adults presenting to SLaM between April 2006 and March 2013 with a psychotic, affective or personality disorder. EXPOSURE The presence of mood instability within 1 month of presentation, identified using natural language processing (NLP). MAIN OUTCOME MEASURES The number of days spent in hospital, frequency of hospital admission, compulsory hospital admission and prescription of antipsychotics or non-antipsychotic mood stabilisers over a 5-year follow-up period. RESULTS Mood instability was documented in 12.1% of people presenting to mental healthcare services. It was most frequently documented in people with bipolar disorder (22.6%), but was common in people with personality disorder (17.8%) and schizophrenia (15.5%). It was associated with a greater number of days spent in hospital (β coefficient 18.5, 95% CI 12.1 to 24.8), greater frequency of hospitalisation (incidence rate ratio 1.95, 1.75 to 2.17), greater likelihood of compulsory admission (OR 2.73, 2.34 to 3.19) and an increased likelihood of prescription of antipsychotics (2.03, 1.75 to 2.35) or non-antipsychotic mood stabilisers (2.07, 1.77 to 2.41). CONCLUSIONS Mood instability occurs in a wide range of mental disorders and is not limited to affective disorders. It is generally associated with relatively poor clinical outcomes. These findings suggest that clinicians should screen for mood instability across all common mental health disorders. The data also suggest that targeted interventions for mood instability may be useful in patients who do not have a formal affective disorder.
Affiliation(s)
- Rashmi Patel
- Department of Psychosis Studies, King's College London, Institute of Psychiatry, Psychology & Neuroscience, London, UK
| | - Theodore Lloyd
- Department of Psychosis Studies, King's College London, Institute of Psychiatry, Psychology & Neuroscience, London, UK
| | - Richard Jackson
- Department of Psychological Medicine, King's College London, Institute of Psychiatry, Psychology & Neuroscience, London, UK
| | - Michael Ball
- Department of Psychological Medicine, King's College London, Institute of Psychiatry, Psychology & Neuroscience, London, UK
| | - Hitesh Shetty
- South London and Maudsley NHS Foundation Trust, Biomedical Research Centre Nucleus, London, UK
| | - Matthew Broadbent
- South London and Maudsley NHS Foundation Trust, Biomedical Research Centre Nucleus, London, UK
| | - John R Geddes
- Department of Psychiatry, University of Oxford, Oxford, UK
| | - Robert Stewart
- Department of Psychological Medicine, King's College London, Institute of Psychiatry, Psychology & Neuroscience, London, UK
| | - Philip McGuire
- Department of Psychosis Studies, King's College London, Institute of Psychiatry, Psychology & Neuroscience, London, UK
| | - Matthew Taylor
- Department of Psychosis Studies, King's College London, Institute of Psychiatry, Psychology & Neuroscience, London, UK
| |
|
50
|
Garvin JH, Elkin PL, Shen S, Brown S, Trusko B, Wang E, Hoke L, Quiaoit Y, Lajoie J, Weiner MG, Graham P, Speroff T. Automated quality measurement in Department of the Veterans Affairs discharge instructions for patients with congestive heart failure. J Healthc Qual 2014; 35:16-24. [PMID: 23819743 DOI: 10.1111/j.1945-1474.2011.195.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
Quality measurement is an important issue for the United States Department of Veterans Affairs (VA). In this study, we piloted the use of an informatics tool, the Multithreaded Clinical Vocabulary Server (MCVS), which automatically extracted whether the VA Office of Quality and Performance measures of quality of care were met for the completion of discharge instructions for inpatients with congestive heart failure. We used a single document, the discharge instructions, from one section of the medical records for 152 patients and developed a reference standard using two independent reviewers to assess performance. When evaluated against the reference standard, MCVS achieved a sensitivity of 0.87, a specificity of 0.86, and a positive predictive value of 0.90. The automated process using the discharge instruction document worked effectively. The use of the MCVS tool for concept-based indexing resulted in mostly accurate data capture regarding quality measurement, but improvements are needed to further increase the accuracy of data extraction.
Affiliation(s)
- Jennifer H Garvin
- Department of Biomedical Informatics, University of Utah School of Medicine, UT, USA.
|