1
Bishal MM, Chowdory MRH, Das A, Kabir MA. COVIDHealth: A novel labeled dataset and machine learning-based web application for classifying COVID-19 discourses on Twitter. Heliyon 2024; 10:e34103. [PMID: 39100452 PMCID: PMC11295851 DOI: 10.1016/j.heliyon.2024.e34103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Revised: 06/27/2024] [Accepted: 07/03/2024] [Indexed: 08/06/2024] Open
Abstract
The COVID-19 pandemic has sparked widespread health-related discussions on social media platforms like Twitter (now named 'X'). However, the lack of labeled Twitter data poses significant challenges for theme-based classification and tweet aggregation. To address this gap, we developed a machine learning-based web application that automatically classifies COVID-19 discourses into five categories: health risks, prevention, symptoms, transmission, and treatment. We collected and labeled 6,667 COVID-19-related tweets using the Twitter API, and applied various feature extraction methods to extract relevant features. We then compared the performance of seven classical machine learning algorithms (Decision Tree, Random Forest, Stochastic Gradient Descent, Adaboost, K-Nearest Neighbor, Logistic Regression, and Linear SVC) and four deep learning techniques (LSTM, CNN, RNN, and BERT) for classification. Our results show that the CNN achieved the highest precision (90.41%), recall (90.4%), F1 score (90.4%), and accuracy (90.4%). The Linear SVC algorithm exhibited the highest precision (85.71%), recall (86.94%), and F1 score (86.13%) among classical machine learning approaches. Our study advances the field of health-related data analysis and classification, and offers a publicly accessible web-based tool for public health researchers and practitioners. This tool has the potential to support addressing public health challenges and enhancing awareness during pandemics. The dataset and application are accessible at https://github.com/Bishal16/COVID19-Health-Related-Data-Classification-Website.
Affiliation(s)
- Mahathir Mohammad Bishal
  - Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chattogram, 4349, Bangladesh
- Md. Rakibul Hassan Chowdory
  - Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chattogram, 4349, Bangladesh
- Anik Das
  - Department of Computer Science, St. Francis Xavier University, Antigonish, B2G 2W5, NS, Canada
- Muhammad Ashad Kabir
  - School of Computing, Mathematics, and Engineering, Charles Sturt University, Bathurst, 2795, NSW, Australia
2
Gu D, Wang Q, Chai Y, Yang X, Zhao W, Li M, Zolotarev O, Xu Z, Zhang G. Identifying the Risk Factors of Allergic Rhinitis Based on Zhihu Comment Data Using a Topic-Enhanced Word-Embedding Model: Mixed Method Study and Cluster Analysis. J Med Internet Res 2024; 26:e48324. [PMID: 38386404 PMCID: PMC10921335 DOI: 10.2196/48324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 10/30/2023] [Accepted: 01/03/2024] [Indexed: 02/23/2024] Open
Abstract
BACKGROUND Allergic rhinitis (AR) is a chronic disease, and several risk factors predispose individuals to the condition in their daily lives, including exposure to allergens and inhalation irritants. Analyzing the potential risk factors that can trigger AR can provide reference material for individuals to use to reduce its occurrence in their daily lives. Nowadays, social media is a part of daily life, with an increasing number of people using at least 1 platform regularly. Social media enables users to share experiences among large groups of people who share the same interests and experience the same afflictions. Notably, these channels promote the ability to share health information. OBJECTIVE This study aims to construct an intelligent method (TopicS-ClusterREV) for identifying the risk factors of AR based on these social media comments. The main questions were as follows: How many comments contained AR risk factor information? How many categories can these risk factors be summarized into? How do these risk factors trigger AR? METHODS This study crawled all the data from May 2012 to May 2022 under the topic of allergic rhinitis on Zhihu, obtaining a total of 9628 posts and 33,747 comments. We improved the Skip-gram model to train topic-enhanced word vector representations (TopicS) and then vectorized annotated text items for training the risk factor classifier. Furthermore, cluster analysis enabled a closer look into the opinions expressed in the category, namely gaining insight into how risk factors trigger AR. RESULTS Our classifier identified more comments containing risk factors than the other classification models, with an accuracy rate of 96.1% and a recall rate of 96.3%. In general, we clustered texts containing risk factors into 28 categories, with season, region, and mites being the most common risk factors. 
We gained insight into the risk factors expressed in each category; for example, seasonal changes and increased temperature differences between day and night can disrupt the body's immune system and lead to the development of allergies. CONCLUSIONS Our approach can handle the amount of data and extract risk factors effectively. Moreover, the summary of risk factors can serve as a reference for individuals to reduce AR in their daily lives. The experimental data also provide a potential pathway that triggers AR. This finding can guide the development of management plans and interventions for AR.
Affiliation(s)
- Dongxiao Gu
  - School of Management, Hefei University of Technology, Hefei, China
- Qin Wang
  - School of Management, Hefei University of Technology, Hefei, China
- Yidong Chai
  - School of Management, Hefei University of Technology, Hefei, China
- Xuejie Yang
  - School of Management, Hefei University of Technology, Hefei, China
- Wang Zhao
  - School of Management, Hefei University of Technology, Hefei, China
- Min Li
  - School of Management, Hefei University of Technology, Hefei, China
- Zhengfei Xu
  - School of Management, Hefei University of Technology, Hefei, China
- Gongrang Zhang
  - School of Management, Hefei University of Technology, Hefei, China
3
Yang M, Chen X, Tan L, Lan X, Luo Y. Listen carefully to experts when you classify data: A generic data classification ontology encoded from regulations. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2022.103186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2022]
4
Gu D, Li M, Yang X, Gu Y, Zhao Y, Liang C, Liu H. An analysis of cognitive change in online mental health communities: A textual data analysis based on post replies of support seekers. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2022.103192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
5
A Predictive Model Based on User Awareness and Multi-Type Rumors Forwarding Dynamics. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
6
Pu Y, Li J, Tang J, Guo F. DeepFusionDTA: Drug-Target Binding Affinity Prediction With Information Fusion and Hybrid Deep-Learning Ensemble Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2760-2769. [PMID: 34379594 DOI: 10.1109/tcbb.2021.3103966] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 06/13/2023]
Abstract
Identification of drug-target interaction (DTI) is the most important issue in the broad field of drug discovery. Using purely biological experiments to verify drug-target binding profiles takes a great deal of time and effort, so computational technologies for this task have clear benefits in reducing the drug search space. Most computational methods for predicting DTI are formulated as a binary classification problem, which ignores the influence of binding strength. Therefore, drug-target binding affinity prediction is still a challenging issue. Currently, many studies extract only sequence information, which lacks a feature-rich representation, but we consider more spatial features in order to merge various data in drug and target spaces. In this study, we propose a two-stage deep neural network ensemble model for detecting drug-target binding affinity, called DeepFusionDTA, via various information analysis modules. The first stage uses sequence and structure information to generate a fusion feature map of a candidate protein and drug pair through various deep-learning-based analysis modules. The second stage applies a bagging-based ensemble learning strategy for regression prediction, and we obtain outstanding results by combining the advantages of various algorithms in efficient feature abstraction and regression calculation. Importantly, compared with the existing prediction tool DeepDTA, our method, DeepFusionDTA, delivers a 1.5 percent CI increase on the KIBA dataset and a 1.0 percent increase on the Davis dataset. Furthermore, the ideas we have offered can be applied to in-silico screening of the interaction space, to provide novel DTIs which can be experimentally pursued. The codes and data are available from https://github.com/guofei-tju/DeepFusionDTA.
7
Amin S, Alharbi A, Uddin MI, Alyami H. Adapting recurrent neural networks for classifying public discourse on COVID-19 symptoms in Twitter content. Soft Comput 2022; 26:11077-11089. [PMID: 35966348 PMCID: PMC9364288 DOI: 10.1007/s00500-022-07405-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/18/2022] [Indexed: 12/15/2022]
Abstract
The COVID-19 infection, which began in December 2019, has claimed many lives and impacted all aspects of human life. With time, COVID-19 was identified as a pandemic outbreak by the World Health Organization (WHO), putting massive pressure on global health. During this ongoing pandemic, the exponential growth of social media platforms has provided valuable resources for distributing information, as well as a source for self-reported disease symptoms in public discourse. Therefore, there is an urgent need for effective approaches to detect self-reported symptoms or cases in social media content. In this study, we scraped public discourse on COVID-19 symptoms in Twitter content. For this, we developed a large dataset of COVID-19 self-reported symptoms and gold-annotated the tweets into four categories: confirmed, death, suspected, and recovered. Then, we used machine learning and deep learning models, each with its own set of features, such as feature representation. Furthermore, experiments were conducted with recurrent neural network (RNN) variants, and their performance was compared with traditional machine learning algorithms. Experimental results report that optimizing the area under the curve (AUC) enhances model performance, and the long short-term memory (LSTM) has the highest accuracy in detecting COVID-19 symptoms in real-time public messaging. Thus, the LSTM classifier in the proposed pipeline achieves a classification accuracy of 90.7%, outperforming existing state-of-the-art algorithms for multi-class classification.
Affiliation(s)
- Samina Amin
  - Institute of Computing, Kohat University of Science and Technology, Kohat, 2600 Pakistan
- Abdullah Alharbi
  - Department of Information Technology, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif, 21944 Saudi Arabia
- M. Irfan Uddin
  - Institute of Computing, Kohat University of Science and Technology, Kohat, 2600 Pakistan
- Hashem Alyami
  - Department of Computer Science, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif, 21944 Saudi Arabia
8
Pinto JP, Viana P, Teixeira I, Andrade M. Improving word embeddings in Portuguese: increasing accuracy while reducing the size of the corpus. PeerJ Comput Sci 2022; 8:e964. [PMID: 35875629 PMCID: PMC9301597 DOI: 10.7717/peerj-cs.964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Accepted: 04/07/2022] [Indexed: 06/15/2023]
Abstract
The subjectiveness of multimedia content description has a strong negative impact on tag-based information retrieval. In our work, we propose enhancing available descriptions by adding semantically related tags. To cope with this objective, we use a word embedding technique based on the Word2Vec neural network parameterized and trained using a new dataset built from online newspapers. A large number of news stories was scraped and pre-processed to build a new dataset. Our target language is Portuguese, one of the most spoken languages worldwide. The results achieved significantly outperform similar existing solutions developed in the scope of different languages, including Portuguese. Contributions include also an online application and API available for external use. Although the presented work has been designed to enhance multimedia content annotation, it can be used in several other application areas.
Affiliation(s)
- Paula Viana
  - INESC TEC, Porto, Portugal
  - School of Engineering, Polytechnic of Porto, Porto, Portugal
- Maria Andrade
  - INESC TEC, Porto, Portugal
  - Faculty of Engineering, University of Porto, Porto, Portugal
9
Research on Long Text Classification Model Based on Multi-Feature Weighted Fusion. Applied Sciences-Basel 2022. [DOI: 10.3390/app12136556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Text classification in the long-text domain has become a development challenge due to the significant increase in text data, complexity enhancement, and feature extraction of long texts in various domains of the Internet. A long text classification model based on multi-feature weighted fusion is proposed for the problems of contextual semantic relations, long-distance global relations, and multi-sense words in long text classification tasks. The BERT model is used to obtain feature representations containing global semantic and contextual feature information of text, convolutional neural networks to obtain features at different levels and combine attention mechanisms to obtain weighted local features, fuse global contextual features with weighted local features, and obtain classification results by equal-length convolutional pooling. The experimental results show that the proposed model outperforms other models in terms of accuracy, precision, recall, F1 value, etc., under the same data set conditions compared with traditional deep learning classification models, and it can be seen that the model has more obvious advantages in long text classification.
10
Jiang Q, Xue Y, Hu Y, Li Y. Public Social Media Discussions on Agricultural Product Safety Incidents: Chinese African Swine Fever Debate on Weibo. Front Psychol 2022; 13:903760. [PMID: 35668976 PMCID: PMC9165425 DOI: 10.3389/fpsyg.2022.903760] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/24/2022] [Accepted: 05/03/2022] [Indexed: 11/13/2022] Open
Abstract
Public concern over major agricultural product safety incidents, such as swine flu and avian flu, can intensify financial losses in the livestock and poultry industries. Crawler technology was applied to review the Weibo social media discussions on the African Swine Fever (ASF) incident in China that was reported on 3 August 2018, and content analysis and network analysis were used to examine the online public opinion network dissemination characteristics of verified individual users, institutional users and ordinary users. It was found that: (1) attention paid to topics related to "epidemic," "treatment," "effect" and "prevent" decreased in turn, with the interest in "prevent" increasing significantly when human infections were possible; (2) verified individual users were most concerned about epidemic prevention and control and played a supervisory role, while the greatest concerns of institutional users and ordinary users were issues related to the agricultural industry and agricultural product price fluctuations, respectively; (3) among institutional users, the media was the main opinion leader, and among non-institutional users, elites from all walks of life, especially food safety personnel, acted as opinion leaders. Based on these findings, some policy suggestions are given: determining the nature of the risk that the safety incident poses to human health, stabilizing the prices of relevant agricultural products, and giving play to the information dissemination role of relevant institutions.
Affiliation(s)
- Qian Jiang
  - School of Geography and Resource Science, Neijiang Normal University, Neijiang, China
- Ya Xue
  - Neijiang Center for Disease Control and Prevention, Neijiang, China
- Yan Hu
  - School of Economics and Management, Neijiang Normal University, Neijiang, China
  - Tuojiang River Basin High-Quality Development Research Center, Neijiang, China
- Yibin Li
  - School of Economics and Management, Neijiang Normal University, Neijiang, China
11
Blanco G, Lourenço A. Optimism and pessimism analysis using deep learning on COVID-19 related twitter conversations. Inf Process Manag 2022; 59:102918. [PMID: 36569234 PMCID: PMC9758015 DOI: 10.1016/j.ipm.2022.102918] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/24/2021] [Revised: 02/16/2022] [Accepted: 02/23/2022] [Indexed: 12/27/2022]
Abstract
This paper proposes a new deep learning approach to better understand how optimistic and pessimistic feelings are conveyed in Twitter conversations about COVID-19. A pre-trained transformer embedding is used to extract the semantic features and several network architectures are compared. Model performance is evaluated on two new, publicly available Twitter corpora of crisis-related posts. The best performing pessimism and optimism detection models are based on bidirectional long- and short-term memory networks. Experimental results on four periods of the COVID-19 pandemic show how the proposed approach can model optimism and pessimism in the context of a health crisis. There is a total of 150,503 tweets and 51,319 unique users. Conversations are characterised in terms of emotional signals and shifts to unravel empathy and support mechanisms. Conversations with stronger pessimistic signals denoted little emotional shift (i.e. 62.21% of these conversations experienced almost no change in emotion). In turn, only 10.42% of the conversations laying more on the optimistic side maintained the mood. User emotional volatility is further linked with social influence.
Affiliation(s)
- Guillermo Blanco
  - Universidade de Vigo, Departamento de Informática, Edificio Politécnico, Campus Universitario As Lagoas S/N 32004, Ourense, Spain
  - CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain
  - SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Anália Lourenço
  - Universidade de Vigo, Departamento de Informática, Edificio Politécnico, Campus Universitario As Lagoas S/N 32004, Ourense, Spain
  - CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain
  - SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
  - CEB - Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
  - Corresponding author
12
Social Media User Behavior and Emotions during Crisis Events. International Journal of Environmental Research and Public Health 2022; 19:ijerph19095197. [PMID: 35564591 PMCID: PMC9100990 DOI: 10.3390/ijerph19095197] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/04/2022] [Revised: 04/06/2022] [Accepted: 04/19/2022] [Indexed: 01/27/2023]
Abstract
The wide availability of smart mobile devices and Web 2.0 services has allowed people to easily access news, spread information, and express their opinions and emotions using various social media platforms. However, because of the ease of joining these sites, people also use them to spread rumors and vent their emotions, with the social platforms often playing a facilitation role. This paper collected more than 190,000 messages published on the Chinese Sina-Weibo platform to examine social media user behaviors and emotions during an emergency, with a particular research focus on the “Dr. Li Wenliang” reports associated with the COVID-19 epidemic in China. The verified accounts were found to have the strongest interactions with users, and the sentiment analysis revealed that the news from government agencies had a positive user effect and the national media and trusted experts were more favored by users in an emergency. This research provides a new perspective on trust and the use of social media platforms in crises, and therefore offers some guidance to government agencies.
13
Zhao S, Pan Q, Zou Q, Ju Y, Shi L, Su X. Identifying and Classifying Enhancers by Dinucleotide-Based Auto-Cross Covariance and Attention-Based Bi-LSTM. Computational and Mathematical Methods in Medicine 2022; 2022:7518779. [PMID: 35422876 PMCID: PMC9005296 DOI: 10.1155/2022/7518779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Accepted: 03/12/2022] [Indexed: 11/17/2022]
Abstract
Enhancers are a class of noncoding DNA elements located near structural genes. In recent years, their identification and classification have been the focus of research in the field of bioinformatics. However, due to their high free scattering and position variability, although the performance of the prediction model has been continuously improved, there is still a lot of room for progress. In this paper, density-based spatial clustering of applications with noise (DBSCAN) was used to screen the physicochemical properties of dinucleotides to extract dinucleotide-based auto-cross covariance (DACC) features; then, the features are reduced by feature selection Python toolkit MRMD 2.0. The reduced features are input into the random forest to identify enhancers. The enhancer classification model was built by word2vec and attention-based Bi-LSTM. Finally, the accuracies of our enhancer identification and classification models were 77.25% and 73.50%, respectively, and the Matthews' correlation coefficients (MCCs) were 0.5470 and 0.4881, respectively, which were better than the performance of most predictors.
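For readers unfamiliar with the Matthews correlation coefficient (MCC) reported in this abstract, the following is a minimal sketch of how it is computed from a binary confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC for a binary confusion matrix.

    Returns 0.0 when the denominator is zero (a common convention when
    one row or column of the confusion matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (not from the paper).
example_mcc = matthews_corrcoef(tp=80, tn=75, fp=25, fn=20)
print(round(example_mcc, 4))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction).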
Affiliation(s)
- Shulin Zhao
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Qingfeng Pan
  - General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Ying Ju
  - School of Informatics, Xiamen University, Xiamen, China
- Lei Shi
  - Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
- Xi Su
  - Foshan Maternal and Child Health Hospital, Foshan, Guangdong, China
14
Serrano-Guerrero J, Bani-Doumi M, Romero FP, Olivas JA. Understanding what patients think about hospitals: A deep learning approach for detecting emotions in patient opinions. Artif Intell Med 2022; 128:102298. [DOI: 10.1016/j.artmed.2022.102298] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/22/2021] [Revised: 03/02/2022] [Accepted: 04/04/2022] [Indexed: 11/02/2022]
15
Text Similarity Measurement Method and Application of Online Medical Community Based on Density Peak Clustering. J Organ End User Com 2022. [DOI: 10.4018/joeuc.302893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Text similarity measurement is a link between basic research such as text modeling and upper-level application research of text potential information. In order to improve the accuracy of text similarity measurement, this paper proposes a semantic similarity calculation method integrating word2vec model and TF-IDF, and applies it to the density peak clustering of Chinese text data consulted by patients in online medical community. Experimental results show that the proposed similarity measurement method is superior to the traditional method. Furthermore, the study is among the first to apply the density peak clustering algorithm to online medical community, which offers a reference for how to find out user demands from medical text data in the big data environment.
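The abstract does not say how word2vec and TF-IDF are integrated. One common way to combine them, shown here only as an assumed sketch and not as the authors' method, is to average word vectors weighted by TF-IDF and compare documents by cosine similarity. `TOY_VECTORS` is a hypothetical stand-in for a trained word2vec model, and the tokenized documents are invented.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional vectors standing in for a trained word2vec model.
TOY_VECTORS = {
    "headache": [0.9, 0.1, 0.0],
    "migraine": [0.8, 0.2, 0.1],
    "fever": [0.1, 0.9, 0.0],
    "cough": [0.0, 0.8, 0.3],
    "doctor": [0.3, 0.3, 0.9],
}


def idf_table(docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def doc_vector(doc, idf, vectors):
    """TF-IDF-weighted average of the vectors of the known tokens in doc."""
    tf = Counter(tok for tok in doc if tok in vectors)
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in tf.items():
        weight = (count / len(doc)) * idf.get(tok, 1.0)
        weight_sum += weight
        for i, x in enumerate(vectors[tok]):
            total[i] += weight * x
    return [x / weight_sum for x in total] if weight_sum else total


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


# Invented, pre-tokenized example documents.
docs = [["headache", "doctor"], ["migraine", "doctor"], ["fever", "cough"]]
idf = idf_table(docs)
vecs = [doc_vector(d, idf, TOY_VECTORS) for d in docs]
sim_close = cosine(vecs[0], vecs[1])  # related complaints
sim_far = cosine(vecs[0], vecs[2])    # unrelated complaints
print(sim_close, sim_far)
```

In this toy setup the two headache-related documents score as more similar than the headache and fever documents, even though they share only one literal token; that is the kind of semantic signal a pure TF-IDF comparison would miss.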
16
Kumar P, Sarin G. WELMSD – word embedding and language model based sarcasm detection. Online Information Review 2022. [DOI: 10.1108/oir-03-2021-0184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Purpose: Sarcasm is a sentiment in which human beings convey messages with the opposite meanings to hurt someone emotionally or condemn something in a witty manner. The difference between the text's literal and its intended meaning makes it tough to identify. Mostly, researchers and practitioners only consider explicit information for text classification; however, considering implicit with explicit information will enhance the classifier's accuracy. Several sarcasm detection studies focus on syntactic, lexical or pragmatic features that are uttered using words, emoticons and exclamation marks. Discrete models, which are utilized by many existing works, require manual features that are costly to uncover. Design/methodology/approach: In this research, word embeddings used for feature extraction are combined with context-aware language models to provide automatic feature engineering capabilities as well as superior classification performance compared to baseline models. Performance of the proposed models has been shown on three benchmark datasets over different evaluation metrics, namely misclassification rate, receiver operating characteristic (ROC) curve and area under curve (AUC). Findings: Experimental results suggest that the FastText word embedding technique with the BERT language model gives higher accuracy and helps to identify the sarcastic textual element correctly. Originality/value: Sarcasm detection is a sub-task of sentiment analysis. To help in appropriate data-driven decision-making, the sentiment of the text that gets reversed due to sarcasm needs to be detected properly. In online social environments, it is critical for businesses and individuals to detect the correct sentiment polarity. This will aid in the right selling and buying of products and/or services, leading to higher sales and better market share for businesses, and meeting the quality requirements of customers.
17
Qi P, Sun Y, Luo H, Guizani M. Scratch-Rec: a novel Scratch recommendation approach adapting user preference and programming skill for enhancing learning to program. Appl Intell 2022. [DOI: 10.1007/s10489-021-02970-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
18
Caliskan C. How does "A Bit of Everything American" state feel about COVID-19? A quantitative Twitter analysis of the pandemic in Ohio. Journal of Computational Social Science 2022; 5:19-45. [PMID: 33842722 PMCID: PMC8021216 DOI: 10.1007/s42001-021-00111-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/24/2020] [Accepted: 02/28/2021] [Indexed: 05/21/2023]
Abstract
COVID-19 has proven itself to be one of the most important events of the last two centuries. This defining moment in our lives has created wide-ranging discussions in many segments of our societies, both politically and socially. Over time, the pandemic has been associated with many social and political topics, as well as sentiments and emotions. Twitter offers a platform to understand these effects. The primary objective of this study is to capture the awareness and sentiment about COVID-19-related issues and to find how they relate to the number of cases and deaths in a representative region of the United States. The study uses a unique dataset consisting of over 46 million tweets from over 91,000 users in 88 counties of the state of Ohio, a state-of-the-art deep learning model to measure and detect awareness and emotions. The data collected is analyzed using OLS regression and System-GMM dynamic panel. Findings indicate that the pandemic has drastically changed the perception of the Republican party in the society. Individual motivations are strongly influenced by ideological choices and this ultimately affects individual pandemic-related outcomes. The paper contributes to the literature by expanding the knowledge on COVID-19 (i), offering a representative result for the United States by focusing on an "average" state like Ohio (ii), and incorporating the sentiment and emotions into the calculation of awareness (iii).
Affiliation(s)
- Cantay Caliskan
  - Department of Data Analytics, Denison University, 100 W. College Street, Granville, OH 43023 USA
19
Ravanmehr V, Blau H, Cappelletti L, Fontana T, Carmody L, Coleman B, George J, Reese J, Joachimiak M, Bocci G, Hansen P, Bult C, Rueter J, Casiraghi E, Valentini G, Mungall C, Oprea TI, Robinson PN. Supervised learning with word embeddings derived from PubMed captures latent knowledge about protein kinases and cancer. NAR Genom Bioinform 2021; 3:lqab113. [PMID: 34888523 PMCID: PMC8652379 DOI: 10.1093/nargab/lqab113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2021] [Revised: 10/14/2021] [Accepted: 11/24/2021] [Indexed: 11/17/2022] Open
Abstract
Inhibiting protein kinases (PKs) that cause cancers has been an important topic in cancer therapy for years. So far, almost 8% of >530 PKs have been targeted by FDA-approved medications, and around 150 protein kinase inhibitors (PKIs) have been tested in clinical trials. We present an approach based on natural language processing and machine learning to investigate the relations between PKs and cancers, predicting PKs whose inhibition would be efficacious to treat a certain cancer. Our approach represents PKs and cancers as semantically meaningful 100-dimensional vectors based on word and concept neighborhoods in PubMed abstracts. We use information about phase I-IV trials in ClinicalTrials.gov to construct a training set for random forest classification. Our results with historical data show that associations between PKs and specific cancers can be predicted years in advance with good accuracy. Our tool can be used to predict the relevance of inhibiting PKs for specific cancers and to support the design of well-focused clinical trials to discover novel PKIs for cancer therapy.
Affiliation(s)
- Vida Ravanmehr
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Hannah Blau
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Luca Cappelletti
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Italy
| | - Tommaso Fontana
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Italy
| | - Leigh Carmody
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Ben Coleman
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- University of Connecticut Health Center, Department of Genetics and Genome Sciences, Farmington, CT 06030, USA
| | - Joshy George
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Justin Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Marcin Joachimiak
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Giovanni Bocci
- Department of Internal Medicine and UNM Comprehensive Cancer Center, UNM School of Medicine, Albuquerque, NM 87102, USA
| | - Peter Hansen
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Carol Bult
- The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME 04609, USA
| | - Jens Rueter
- The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME 04609, USA
| | - Elena Casiraghi
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Italy
| | - Giorgio Valentini
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Italy
| | - Christopher Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Tudor I Oprea
- Department of Internal Medicine and UNM Comprehensive Cancer Center, UNM School of Medicine, Albuquerque, NM 87102, USA
| | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA
|
20
|
Using data mining to track the information spreading on social media about the COVID-19 outbreak. ELECTRONIC LIBRARY 2021. [DOI: 10.1108/el-04-2021-0086] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Purpose
COVID-19, a potentially fatal disease, has raised great global public health concern. Information spreading on social media about the COVID-19 outbreak can strongly influence people's behaviour. This paper addresses questions about information spreading during the COVID-19 outbreak through a massive data analysis of Twitter from a multidimensional perspective.
Design/methodology/approach
The evolutionary trend of user interaction and the network structure is analysed by social network analysis. A differential assessment on the topics evolving is provided by the method of text clustering. Visualization is further used to show different characteristics of user interaction networks and public opinion in different periods.
Findings
Information spreading in social media emerges from different characteristics during various periods. User interaction demonstrates multidimensional cross relations. The results interpret how people express their thoughts and detect topics people are most discussing in social media.
Research limitations/implications
This study is mainly limited by the size of the data sets and the unicity of the social media. It is challenging to expand the data sets and choose multiple social media to cross-validate the findings of this study.
Originality/value
This paper aims to find the evolutionary trend of information spreading on the COVID-19 outbreak in social media, including user interaction and topical issues. The findings are of great importance to help government and related regulatory units to manage the dissemination of information on emergencies, in terms of early detection and prevention.
|
21
|
Daouadi KE, Zghal Rebaï R, Amous I. Optimizing Semantic Deep Forest for tweet topic classification. INFORM SYST 2021. [DOI: 10.1016/j.is.2021.101801] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
22
|
An Improved Model for Analyzing Textual Sentiment Based on a Deep Neural Network Using Multi-Head Attention Mechanism. APPLIED SYSTEM INNOVATION 2021. [DOI: 10.3390/asi4040085] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Due to the increasing growth of social media content on websites such as Twitter and Facebook, analyzing textual sentiment has become a challenging task. Therefore, many studies have focused on textual sentiment analysis. Recently, deep learning models, such as convolutional neural networks and long short-term memory, have achieved promising performance in sentiment analysis. These models have proven their ability to cope with the arbitrary length of sequences. However, when they are used in the feature extraction layer, the feature distance is highly dimensional, the text data are sparse, and they assign equal importance to various features. To address these issues, we propose a hybrid model that combines a deep neural network with a multi-head attention mechanism (DNN–MHAT). In the DNN–MHAT model, we first design an improved deep neural network to capture the text’s actual context and extract the local features of position invariants by combining recurrent bidirectional long short-term memory units (Bi-LSTM) with a convolutional neural network (CNN). Second, we present a multi-head attention mechanism to capture the words in the text that are significantly related to long space and encoding dependencies, which adds a different focus to the information outputted from the hidden layers of BiLSTM. Finally, a global average pooling is applied for transforming the vector into a high-level sentiment representation to avoid model overfitting, and a sigmoid classifier is applied to carry out the sentiment polarity classification of texts. The DNN–MHAT model is tested on four reviews and two Twitter datasets. The results of the experiments illustrate the effectiveness of the DNN–MHAT model, which achieved excellent performance compared to the state-of-the-art baseline methods based on short tweets and long reviews.
|
23
|
Basiri ME, Nemati S, Abdar M, Asadi S, Acharya UR. A novel fusion-based deep learning model for sentiment analysis of COVID-19 tweets. Knowl Based Syst 2021; 228:107242. [PMID: 36570870 PMCID: PMC9759659 DOI: 10.1016/j.knosys.2021.107242] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2020] [Revised: 04/30/2021] [Accepted: 06/15/2021] [Indexed: 12/27/2022]
Abstract
Undoubtedly, coronavirus (COVID-19) has caused one of the biggest challenges of all time. The ongoing COVID-19 pandemic has caused more than 150 million infected cases and one million deaths globally as of May 5, 2021. Understanding the sentiment of people expressed in their social media comments can help in monitoring, controlling, and ultimately eradicating the disease. This is a sensitive matter, as the threat of infectious disease significantly affects the way people think and behave in various ways. In this study, we proposed a novel method based on the fusion of four deep learning models and one classical supervised machine learning model for sentiment analysis of coronavirus-related tweets from eight countries. Also, we analyzed coronavirus-related searches using Google Trends to better understand the change in the sentiment pattern at different times and places. Our findings reveal that the coronavirus attracted the attention of people from different countries at different times in varying intensities. Also, the sentiment in their tweets is correlated to the news and events that occurred in their countries, including the number of newly infected cases, number of recoveries and deaths. Moreover, common sentiment patterns can be observed in various countries during the spread of the virus. We believe that different social media platforms have great impact on raising people's awareness about the importance of this disease as well as promoting preventive measures among people in the community.
Affiliation(s)
| | - Shahla Nemati
- Department of Computer Engineering, Shahrekord University, Shahrekord, Iran
| | - Moloud Abdar
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Australia
| | - Somayeh Asadi
- Department of Architectural Engineering, Pennsylvania State University, 104 Engineering Unit A, University Park, PA, 16802, USA
| | - U Rajendra Acharya
- Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Clementi, Singapore
- Department Bioinformatics and Medical Engineering, Asia University, Taiwan
- International Research Organization for Advanced Science and Technology (IROAST), Kumamoto University, Kumamoto, Japan
|
24
|
|
25
|
Social Media Behavior and Emotional Evolution during Emergency Events. Healthcare (Basel) 2021; 9:healthcare9091109. [PMID: 34574883 PMCID: PMC8469477 DOI: 10.3390/healthcare9091109] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2021] [Revised: 08/22/2021] [Accepted: 08/23/2021] [Indexed: 11/17/2022] Open
Abstract
Online social networks have recently become a vital source for emergency event news and the consequent venting of emotions. However, knowledge of what drives user emotion and behavioral responses to emergency event developments is still limited. Therefore, unlike previous studies that have only explored trending themes and public sentiment in social media, this study sought to develop a holistic framework to assess the impact of emergency developments on emotions and behavior by exploring the evolution of trending themes and public sentiments in social media posts as a focal event developed. Taking the 2019 Wuxi viaduct collapse accident as the research object, the study examined the event timeline and the associated hashtags (the tagging function) on the popular Chinese social media site Sina-Weibo to analyze the behaviors and emotional changes of social media users and elucidate the correlations. It can be concluded that: (i) There were some social media rules being adhered to, and new focused news from the same event impacted user behavior and the popularity of previous thematic discussions. (ii) While the most critical function for users appeared to be expressing their emotions, the user foci changed when recent focus news emerged. (iii) As the news of the collapse deepened, the change in user sentiment was found to be positively correlated with the information released by personal-authentication accounts. This research provides a new perspective on the extraction of information from social media platforms in emergencies and social-emotional transmission rules.
|
26
|
Satu MS, Khan MI, Mahmud M, Uddin S, Summers MA, Quinn JMW, Moni MA. TClustVID: A novel machine learning classification model to investigate topics and sentiment in COVID-19 tweets. Knowl Based Syst 2021; 226:107126. [PMID: 33972817 PMCID: PMC8099549 DOI: 10.1016/j.knosys.2021.107126] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2020] [Revised: 05/01/2021] [Accepted: 05/03/2021] [Indexed: 01/31/2023]
Abstract
COVID-19, caused by SARS-CoV2 infection, varies greatly in its severity but presents with serious respiratory symptoms with vascular and other complications, particularly in older adults. The disease can be spread by both symptomatic and asymptomatic infected individuals. Uncertainty remains over key aspects of the virus infectiousness (particularly the newly emerging variants) and the disease has had severe economic impacts globally. For these reasons, COVID-19 is the subject of intense and widespread discussion on social media platforms including Facebook and Twitter. These public forums substantially influence public opinions and in some cases can exacerbate the widespread panic and misinformation spread during the crisis. Thus, this work aimed to design an intelligent clustering-based classification and topic extracting model named TClustVID that analyzes COVID-19-related public tweets to extract significant sentiments with high accuracy. We gathered COVID-19 Twitter datasets from the IEEE Dataport repository and employed a range of data preprocessing methods to clean the raw data, then applied tokenization and produced a word-to-index dictionary. Thereafter, different classifications were employed on these datasets which enabled the exploration of the performance of traditional classification and TClustVID. Our analysis found that TClustVID showed higher performance compared to traditional methodologies that are determined by clustering criteria. Finally, we extracted significant topics from the clusters, split them into positive, neutral and negative sentiments, and identified the most frequent topics using the proposed model. This approach is able to rapidly identify commonly prevailing aspects of public opinions and attitudes related to COVID-19 and infection prevention strategies spreading among different populations.
Affiliation(s)
- Md Shahriare Satu
- Department of Management Information Systems, Noakhali Science & Technology University, Noakhali, 3814, Bangladesh
| | - Md Imran Khan
- Department of Computer Science & Engineering, Gono Bishwabidyalay, Savar, Dhaka, 1344, Bangladesh
| | - Mufti Mahmud
- Department of Computer Science, and Medical Technology Innovation Facility, Nottingham Trent University, Clifton Campus, Clifton, Nottingham - NG11 8NS, UK
| | - Shahadat Uddin
- Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Darlington, NSW 2008, Australia
| | - Matthew A Summers
- The Garvan Institute of Medical Research, Healthy Ageing Theme, Darlinghurst, NSW 2010, Australia
| | - Julian M W Quinn
- The Garvan Institute of Medical Research, Healthy Ageing Theme, Darlinghurst, NSW 2010, Australia
| | - Mohammad Ali Moni
- The Garvan Institute of Medical Research, Healthy Ageing Theme, Darlinghurst, NSW 2010, Australia
- WHO Collaborating Centre on eHealth, UNSW Digital Health, School of Public Health and Community Medicine, Faculty of Medicine, University of New South Wales, Sydney, NSW 2052, Australia
|
27
|
|
28
|
Abstract
Chinese word embeddings have recently garnered considerable attention. Chinese characters and their sub-character components, which contain rich semantic information, are incorporated to learn Chinese word embeddings. Chinese characters can represent a combination of meaning, structure, and pronunciation. However, existing embedding learning methods focus on the structure and meaning of Chinese characters. In this study, we aim to develop an embedding learning method that can make complete use of the information represented by Chinese characters, including phonology, morphology, and semantics. Specifically, we propose a pronunciation-enhanced Chinese word embedding learning method, where the pronunciations of context characters and target characters are simultaneously encoded into the embeddings. Evaluation of word similarity, word analogy reasoning, text classification, and sentiment analysis validate the effectiveness of our proposed method.
|
29
|
Distante D, Faralli S, Rittinghaus S, Rosso P, Samsami N. DomainSenticNet: An Ontology and a Methodology Enabling Domain-Aware Sentic Computing. Cognit Comput 2021; 14:62-77. [PMID: 33558822 PMCID: PMC7859726 DOI: 10.1007/s12559-021-09825-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2020] [Accepted: 01/12/2021] [Indexed: 11/28/2022]
Abstract
In recent years, SenticNet and OntoSenticNet have represented important developments in the novel interdisciplinary field of research known as sentic computing, enabling the development of a variety of Sentic applications. In this paper, we propose an extension of the OntoSenticNet ontology, named DomainSenticNet, and contribute an unsupervised methodology to support the development of domain-aware Sentic applications. We developed an unsupervised methodology that, for each concept in OntoSenticNet, mines semantically related concepts from WordNet and Probase knowledge bases and computes domain distributional information from the entire collection of Kickstarter domain-specific crowdfunding campaigns. Subsequently, we applied DomainSenticNet to a prototype tool for Kickstarter campaign authoring and success prediction, demonstrating an improvement in the interpretability of sentiment intensities. DomainSenticNet is an extension of the OntoSenticNet ontology that integrates each of the 100,000 concepts included in OntoSenticNet with a set of semantically related concepts and domain distributional information. The defined unsupervised methodology is highly replicable and can be easily adapted to build similar domain-aware resources from different domain corpora and external knowledge bases. Used in combination with OntoSenticNet, DomainSenticNet may favor the development of novel hybrid aspect-based sentiment analysis systems and support further research on sentic computing in domain-aware applications.
Affiliation(s)
| | | | - Steve Rittinghaus
- Independent researcher, Freelancer Digital Transformation, Baden-Wurttemberg, Germany
| | - Paolo Rosso
- Universitat Politècnica de València, Valencia, Spain
| | - Nima Samsami
- Independent researcher, Software Architect, Baden-Wurttemberg, Germany
|
30
|
Abstract
Detection of mental disorders from textual input is an emerging field for applied machine and deep learning methods. Here, we explore the limits of automated detection of autism spectrum disorder (ASD) and schizophrenia (SCZ). We compared the performance of: (1) dedicated diagnostic tools that involve collecting textual data, (2) automated methods applied to the data gathered by these tools, and (3) psychiatrists. Our article tests the effectiveness of several baseline approaches, such as bag of words and dictionary-based vectors, followed by a machine learning model. We employed two more refined Sentic text representations using affective features and concept-level analysis on texts. Further, we applied selected state-of-the-art deep learning methods for text representation and inference, as well as experimented with transfer and zero-shot learning. Finally, we also explored few-shot methods dedicated to low data size scenarios, which is a typical problem for the clinical setting. The best breed of automated methods outperformed human raters (psychiatrists). Cross-dataset approaches turned out to be useful (only from SCZ to ASD) despite different data types. The few-shot learning methods revealed promising results on the SCZ dataset. However, more effort is needed to explore the approaches to efficiently training models, given the very limited amounts of labeled clinical data. Psychiatry is one of the few medical fields in which the diagnosis of most disorders is based on the subjective assessment of a psychiatrist. Therefore, the introduction of objective tools supporting diagnostics seems to be pivotal. This paper is a step in this direction.
|
31
|
Does Twitter Affect Stock Market Decisions? Financial Sentiment Analysis During Pandemics: A Comparative Study of the H1N1 and the COVID-19 Periods. Cognit Comput 2021; 14:372-387. [PMID: 33520006 PMCID: PMC7825382 DOI: 10.1007/s12559-021-09819-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2020] [Accepted: 01/05/2021] [Indexed: 11/15/2022]
Abstract
Investors are constantly aware of the behaviour of stock markets. This affects their emotions and motivates them to buy or sell shares. Financial sentiment analysis allows us to understand the effect of social media reactions and emotions on the stock market and vice versa. In this research, we analyse Twitter data and important worldwide financial indices to answer the following question: How does the polarity generated by Twitter posts influence the behaviour of financial indices during pandemics? This study is based on the financial sentiment analysis of influential Twitter accounts and its relationship with the behaviour of important financial indices. To carry out this analysis, we used fundamental and technical financial analysis combined with a lexicon-based approach on financial Twitter accounts. We calculated the correlations between the polarities of financial market indicators and posts on Twitter by applying a date shift on tweets. In addition, correlations were identified days before and after the existing posts on financial Twitter accounts. Our findings show that the markets reacted 0 to 10 days after the information was shared and disseminated on Twitter during the COVID-19 pandemic and 0 to 15 days after the information was shared and disseminated on Twitter during the H1N1 pandemic. We identified an inverse relationship: Twitter accounts presented reactions to financial market behaviour within a period of 0 to 11 days during the H1N1 pandemic and 0 to 6 days during the COVID-19 pandemic. We also found that our method is better at detecting highly shifted correlations by using SenticNet compared with other lexicons. With SenticNet, it is possible to detect correlations even on the same day as the Twitter posts. The most influential Twitter accounts during the period of the pandemic were The New York Times, Bloomberg, CNN News and Investing.com, presenting a very high correlation between sentiments on Twitter and stock market behaviour. 
The lexicon-based approach is enhanced by a shifted correlation analysis, as latent or hidden correlations can be found in the data.
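The shifted correlation idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pearson`, `shifted_correlations`, `best_shift`) and the daily-series inputs are hypothetical, and the sketch assumes one polarity score and one index change per day, aligned by date.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (sx * sy)


def shifted_correlations(polarity, index_change, max_shift):
    """Correlate polarity on day t with the index change on day t + shift.

    Returns a dict mapping each shift (in days) to its correlation.
    Shifts that leave fewer than three paired days are skipped.
    """
    results = {}
    for shift in range(max_shift + 1):
        xs = polarity[: len(polarity) - shift]
        ys = index_change[shift:]
        if len(xs) < 3:
            break
        results[shift] = pearson(xs, ys)
    return results


def best_shift(results):
    """Shift with the strongest correlation in absolute value."""
    return max(results, key=lambda s: abs(results[s]))
```

Swapping the two arguments of `shifted_correlations` gives the inverse direction mentioned in the abstract, where Twitter polarity lags behind market behaviour.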
|
32
|
El Akrouchi M, Benbrahim H, Kassou I. End-to-end LDA-based automatic weak signal detection in web news. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
33
|
Mirlohi Falavarjani SA, Jovanovic J, Fani H, Ghorbani AA, Noorian Z, Bagheri E. On the causal relation between real world activities and emotional expressions of social media users. J Assoc Inf Sci Technol 2020. [DOI: 10.1002/asi.24440] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
|
34
|
Rodriguez A, Okamura K. Enhancing data quality in real-time threat intelligence systems using machine learning. SOCIAL NETWORK ANALYSIS AND MINING 2020. [DOI: 10.1007/s13278-020-00707-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
35
|
Pang PCI, McKay D, Chang S, Chen Q, Zhang X, Cui L. Privacy concerns of the Australian My Health Record: Implications for other large-scale opt-out personal health records. Inf Process Manag 2020. [DOI: 10.1016/j.ipm.2020.102364] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
36
|
Ba Z, Zhao Y, Zhou L, Song S. Exploring the donation allocation of online charitable crowdfunding based on topical and spatial analysis: Evidence from the Tencent GongYi. Inf Process Manag 2020. [DOI: 10.1016/j.ipm.2020.102322] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
|
37
|
Grzeça M, Becker K, Galante R. Drink2Vec: Improving the classification of alcohol-related tweets using distributional semantics and external contextual enrichment. Inf Process Manag 2020. [DOI: 10.1016/j.ipm.2020.102369] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
38
|
He X, Meng X, Wu Y, Chan CS, Pang T. Semantic Matching Efficiency of Supply and Demand Texts on Online Technology Trading Platforms: Taking the Electronic Information of Three Platforms as an Example. Inf Process Manag 2020. [DOI: 10.1016/j.ipm.2020.102258] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
|
39
|
Al-Rakhami MS, Al-Amri AM. Lies Kill, Facts Save: Detecting COVID-19 Misinformation in Twitter. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020; 8:155961-155970. [PMID: 34192115 PMCID: PMC8043503 DOI: 10.1109/access.2020.3019600] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/09/2020] [Accepted: 08/21/2020] [Indexed: 05/05/2023]
Abstract
Online social networks (OSNs) such as Twitter have grown to be very useful tools for the dissemination of information. However, they have also become a fertile ground for the spread of false information, particularly regarding the ongoing coronavirus disease 2019 (COVID-19) pandemic. With the situation best described as an infodemic, there is a great need, now more than ever, for scientific fact-checking and misinformation detection regarding the dangers posed by these tools with regard to COVID-19. In this article, we analyze the credibility of information shared on Twitter pertaining to the COVID-19 pandemic. For our analysis, we propose an ensemble-learning-based framework for verifying the credibility of a vast number of tweets. In particular, we carry out analyses of a large dataset of tweets conveying information regarding COVID-19. In our approach, we classify the information into two categories: credible or non-credible. Our classifications of tweet credibility are based on various features, including tweet- and user-level features. We conduct multiple experiments on the collected and labeled dataset. The results obtained with the proposed framework reveal high accuracy in detecting credible and non-credible tweets containing COVID-19 information.
Affiliation(s)
- Mabrook S. Al-Rakhami
- Research Chair of Pervasive and Mobile Computing, King Saud University, Riyadh 11543, Saudi Arabia
- Information Systems Department, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
| | - Atif M. Al-Amri
- Research Chair of Pervasive and Mobile Computing, King Saud University, Riyadh 11543, Saudi Arabia
- Software Engineering Department, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
|
40
|
Jia Q, Guo Y, Wang G, Barnes SJ. Big Data Analytics in the Fight against Major Public Health Incidents (Including COVID-19): A Conceptual Framework. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:E6161. [PMID: 32854265 PMCID: PMC7503476 DOI: 10.3390/ijerph17176161] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/21/2020] [Revised: 08/19/2020] [Accepted: 08/21/2020] [Indexed: 11/16/2022]
Abstract
Major public health incidents such as COVID-19 typically have characteristics of being sudden, uncertain, and hazardous. If a government can effectively accumulate big data from various sources and use appropriate analytical methods, it may quickly respond to achieve optimal public health decisions, thereby ameliorating negative impacts from a public health incident and more quickly restoring normality. Although there are many reports and studies examining how to use big data for epidemic prevention, there is still a lack of an effective review and framework of the application of big data in the fight against major public health incidents such as COVID-19, which would be a helpful reference for governments. This paper provides clear information on the characteristics of COVID-19, as well as key big data resources, big data for the visualization of pandemic prevention and control, close contact screening, online public opinion monitoring, virus host analysis, and pandemic forecast evaluation. A framework is provided as a multidimensional reference for the effective use of big data analytics technology to prevent and control epidemics (or pandemics). The challenges and suggestions with respect to applying big data for fighting COVID-19 are also discussed.
Affiliation(s)
- Qiong Jia
- Department of Management, Hohai Business School, Hohai University, Nanjing 211100, China
| | - Yue Guo
- The Department of Information System and Management Engineering, Faculty of Business, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen 518055, China
| | - Guanlin Wang
- Department of Management, Hohai Business School, Hohai University, Nanjing 211100, China
| | - Stuart J. Barnes
- CODA Research Centre, King’s Business School, King’s College London, Bush House, 30 Aldwych, London WC2B 4BG, UK
|
41
|
López-Santillán R, Montes-Y-Gómez M, González-Gurrola LC, Ramírez-Alonso G, Prieto-Ordaz O. Richer Document Embeddings for Author Profiling tasks based on a heuristic search. Inf Process Manag 2020. [DOI: 10.1016/j.ipm.2020.102227] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
|
42
|
AL-Sharuee MT, Liu F, Pratama M. Sentiment analysis: dynamic and temporal clustering of product reviews. Appl Intell 2020. [DOI: 10.1007/s10489-020-01668-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
|
43
|
|
44
|
Zhang F. A hybrid structured deep neural network with Word2Vec for construction accident causes classification. International Journal of Construction Management 2019. [DOI: 10.1080/15623599.2019.1683692] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 10/25/2022]
Affiliation(s)
- Fan Zhang
- Departments of Microdata Analysis and Energy Technology, Dalarna University, Falun, Sweden
|
45
|
Mirlohi Falavarjani SA, Zarrinkalam F, Jovanovic J, Bagheri E, Ghorbani AA. The reflection of offline activities on users’ online social behavior: An observational study. Inf Process Manag 2019. [DOI: 10.1016/j.ipm.2019.102070] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/16/2022]
|
46
|
Edo-Osagie O, Smith G, Lake I, Edeghere O, De La Iglesia B. Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance. PLoS One 2019; 14:e0210689. [PMID: 31318885 PMCID: PMC6638773 DOI: 10.1371/journal.pone.0210689] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 12/06/2018] [Accepted: 06/13/2019] [Indexed: 11/19/2022] Open
Abstract
We investigate the use of Twitter data to deliver signals for syndromic surveillance in order to assess its ability to augment existing syndromic surveillance efforts and give a better understanding of symptomatic people who do not seek healthcare advice directly. We focus on a specific syndrome: asthma/difficulty breathing. We outline data collection using the Twitter streaming API as well as analysis and pre-processing of the collected data. Even with keyword-based data collection, many of the tweets collected are not relevant because they represent chatter, or talk of awareness instead of an individual suffering a particular condition. In light of this, we set out to identify relevant tweets to collect a strong and reliable signal. For this, we investigate text classification techniques, and in particular we focus on semi-supervised classification techniques since they enable us to use more of the Twitter data collected while only doing very minimal labelling. In this paper, we propose a semi-supervised approach to symptomatic tweet classification and relevance filtering. We also propose alternative techniques to popular deep learning approaches. Additionally, we highlight the use of emojis and other special features capturing the tweet's tone to improve the classification performance. Our results show that negative emojis and those that denote laughter provide the best classification performance in conjunction with a simple word-level n-gram approach. We obtain good performance in classifying symptomatic tweets with both supervised and semi-supervised algorithms and find that the proposed semi-supervised algorithms preserve more of the relevant tweets and may be advantageous in the context of a weak signal. Finally, we found some correlation (r = 0.414, p = 0.0004) between the Twitter signal generated with the semi-supervised system and data from consultations for related health conditions.
Affiliation(s)
- Oduwa Edo-Osagie
- School of Computing Science, University of East Anglia, Norwich, Norfolk, United Kingdom
- Gillian Smith
- Real-time Syndromic Surveillance Team, National Infection Service, Public Health England, Birmingham, United Kingdom
- Iain Lake
- School of Environmental Sciences, University of East Anglia, Norwich, Norfolk, United Kingdom
- Obaghe Edeghere
- Epidemiology West Midlands, Field Service, National Infection Service, Public Health England, Birmingham, United Kingdom
- Beatriz De La Iglesia
- School of Computing Science, University of East Anglia, Norwich, Norfolk, United Kingdom
|