1
|
Liu L, Fang J, Yang L, Han L, Hossin MA, Wen C. The power of talk: Exploring the effects of streamers’ linguistic styles on sales performance in B2B livestreaming commerce. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2022.103259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
|
2
|
Contextually Enriched Meta-Learning Ensemble Model for Urdu Sentiment Analysis. Symmetry (Basel) 2023. [DOI: 10.3390/sym15030645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2023] Open
Abstract
The task of analyzing sentiment has been extensively researched for a variety of languages. However, due to a dearth of readily available Natural Language Processing methods, Urdu sentiment analysis still necessitates additional study by academics. When it comes to text processing, Urdu has a lot to offer because of its rich morphological structure. The most difficult aspect is determining the optimal classifier. Several studies have incorporated ensemble learning into their methodology to boost performance by decreasing error rates and preventing overfitting. However, the baseline classifiers and the fusion procedure limit the performance of the ensemble approaches. This research makes several contributions that incorporate the concept of symmetry into the deep learning model and architecture. Firstly, it presents a new meta-learning ensemble (MLE) method for fusing basic machine learning and deep learning models utilizing two tiers of meta-classifiers for Urdu. The proposed ensemble technique combines the predictions of both the inter- and intra-committee classifiers on two separate levels. Secondly, a comparison is made between the performance of various committees of deep baseline classifiers and the performance of the suggested ensemble model. Finally, the study’s findings are expanded upon by contrasting the efficiency of the proposed ensemble approach with that of other, more advanced ensemble techniques. Additionally, the proposed model reduces complexity and overfitting in the training process. The results show that the classification accuracy of the baseline deep models is greatly enhanced by the proposed MLE approach.
|
3
|
Ding J, Li B, Xu C, Qiao Y, Zhang L. Diagnosing crop diseases based on domain-adaptive pre-training BERT of electronic medical records. Appl Intell 2022. [DOI: 10.1007/s10489-022-04346-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
|
4
|
Attention-Based RU-BiLSTM Sentiment Analysis Model for Roman Urdu. Applied Sciences-Basel 2022. [DOI: 10.3390/app12073641] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/10/2022]
Abstract
Deep neural networks have emerged as a leading approach towards handling many natural language processing (NLP) tasks. Deep networks initially conquered the problems of computer vision. However, dealing with sequential data such as text and sound was a nightmare for such networks, as traditional deep networks are not reliable in preserving contextual information. This may not harm the results in the case of image processing, where we do not care about the sequence, but when we consider the data collected from text for processing, such networks may trigger disastrous results. Moreover, establishing sentence semantics in a colloquial text such as Roman Urdu is a challenge. Additionally, the sparsity and high dimensionality of data in such informal text pose a significant challenge for building sentence semantics. To overcome this problem, we propose a deep recurrent architecture, RU-BiLSTM, based on bidirectional LSTM (BiLSTM) coupled with word embedding and an attention mechanism for sentiment analysis of Roman Urdu. Our proposed model uses the bidirectional LSTM to preserve the context in both directions and the attention mechanism to concentrate on more important features. Finally, the last dense softmax output layer is used to acquire the binary and ternary classification results. We empirically evaluated our model on two available datasets of Roman Urdu, i.e., RUECD and RUSA-19. Our proposed model outperformed the baseline models on many grounds, and a significant improvement of 6% to 8% is achieved over baseline models.
|
5
|
Multi-class sentiment analysis of urdu text using multilingual BERT. Sci Rep 2022; 12:5436. [PMID: 35361890 PMCID: PMC8971433 DOI: 10.1038/s41598-022-09381-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/16/2021] [Accepted: 03/22/2022] [Indexed: 12/02/2022] Open
Abstract
Sentiment analysis (SA) is an important task because of its vital role in analyzing people’s opinions. However, existing research is largely based on the English language, with limited work on low-resource languages. This study introduced a new multi-class Urdu dataset based on user reviews for sentiment analysis. This dataset is gathered from various domains such as food and beverages, movies and plays, software and apps, politics, and sports. Our proposed dataset contains 9312 reviews manually annotated by human experts into three classes: positive, negative and neutral. The main goal of this research study is to create a manually annotated dataset for Urdu sentiment analysis and to set baseline results using rule-based, machine learning (SVM, NB, AdaBoost, MLP, LR and RF) and deep learning (CNN-1D, LSTM, Bi-LSTM, GRU and Bi-GRU) techniques. Additionally, we fine-tuned Multilingual BERT (mBERT) for Urdu sentiment analysis. We used four text representations: word n-grams, char n-grams, pre-trained fastText and BERT word embeddings to train our classifiers. We trained these models on two different datasets for evaluation purposes. Findings show that the proposed mBERT model with BERT pre-trained word embeddings outperformed deep learning, machine learning and rule-based classifiers and achieved an F1 score of 81.49%.
|
6
|
Rana TA, Shahzadi K, Rana T, Arshad A, Tubishat M. An Unsupervised Approach for Sentiment Analysis on Social Media Short Text Classification in Roman Urdu. ACM T ASIAN LOW-RESO 2022. [DOI: 10.1145/3474119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
During the last two decades, sentiment analysis, also known as opinion mining, has become one of the most explored research areas in Natural Language Processing (NLP) and data mining. Sentiment analysis focuses on the sentiments or opinions of consumers expressed over social media or different web sites. Due to exposure on the Internet, sentiment analysis has attracted vast numbers of researchers over the globe. A large amount of research has been conducted in English, Chinese, and other languages used worldwide. However, Roman Urdu has been neglected despite being the third most used language for communication in the world, covering millions of users around the globe. Although some techniques have been proposed for sentiment analysis in Roman Urdu, these techniques are limited to a specific domain or developed incorrectly due to the unavailability of language resources for Roman Urdu. Therefore, in this article, we propose an unsupervised approach for sentiment analysis in Roman Urdu. First, the proposed model normalizes the text to overcome spelling variations of different words. After normalizing the text, we use Roman Urdu and English opinion lexicons to correctly identify users’ opinions from the text. We also incorporate negation terms and stemming to assign polarities to each extracted opinion. Furthermore, our model assigns a score to each sentence on the basis of the polarities of extracted opinions and classifies each sentence as positive, negative, or neutral. In order to verify our approach, we conducted experiments on two publicly available datasets for Roman Urdu and compared our approach with existing models. Results demonstrate that our approach outperforms existing models for sentiment analysis tasks in Roman Urdu. Furthermore, our approach does not suffer from domain dependency.
Affiliation(s)
- Toqir A. Rana
  - Department of Computer Science & IT, The University of Lahore, Lahore, Pakistan and School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia
- Kiran Shahzadi
  - Department of Software Engineering, The University of Lahore, Lahore, Pakistan
- Tauseef Rana
  - Department of Computer Software Engineering, MCS, National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Ahsan Arshad
  - Department of Computer Science & IT, The University of Lahore, Lahore, Pakistan
- Mohammad Tubishat
  - School of Information Technology, Skyline University College, Sharjah, United Arab Emirates
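The unsupervised pipeline summarized in the abstract of entry 6 (normalize spelling variants, look up opinion words in lexicons, handle negation, score the sentence, classify) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the lexicons, the spelling-variant map, the one-token negation window, and all function names are hypothetical, and the stemming step mentioned in the abstract is omitted.

```python
# Hypothetical toy lexicons; the paper's actual resources are not given in the abstract.
POSITIVE = {"acha", "zabardast", "behtareen", "good"}
NEGATIVE = {"bura", "bekar", "ganda", "bad"}
NEGATIONS = {"nahi", "na", "not"}
# Hypothetical map from spelling variants to a canonical form (normalization step).
VARIANTS = {"achaa": "acha", "achha": "acha", "bekaar": "bekar", "nai": "nahi"}


def normalize(token: str) -> str:
    """Lowercase, strip surrounding punctuation, and map known spelling variants."""
    token = token.lower().strip(".,!?")
    return VARIANTS.get(token, token)


def score_sentence(sentence: str) -> int:
    """Sum the polarities of opinion words, flipping those next to a negation term."""
    tokens = [normalize(t) for t in sentence.split()]
    tokens = [t for t in tokens if t]
    score = 0
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity == 0:
            continue
        # Assumption: a negation directly before or after the opinion word
        # flips its polarity ("acha nahi", "not good").
        neighbours = tokens[max(0, i - 1):i] + tokens[i + 1:i + 2]
        if any(n in NEGATIONS for n in neighbours):
            polarity = -polarity
        score += polarity
    return score


def classify(sentence: str) -> str:
    """Map the sentence score to one of the three classes named in the abstract."""
    score = score_sentence(sentence)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, `classify("Service acha nahi hai")` yields `"negative"` under these toy lexicons, because the negation term next to the positive word flips its polarity. A realistic system would need far larger lexicons and a more careful negation scope.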
|
7
|
Deep Sentiment Analysis Using CNN-LSTM Architecture of English and Roman Urdu Text Shared in Social Media. Applied Sciences-Basel 2022. [DOI: 10.3390/app12052694] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/17/2022]
Abstract
Sentiment analysis (SA) has been an active research subject in the domain of natural language processing due to its important functions in interpreting people’s perspectives and drawing successful opinion-based judgments. On social media, Roman Urdu is one of the most extensively utilized dialects. Sentiment analysis of Roman Urdu is difficult due to its morphological complexities and varied dialects. The purpose of this paper is to evaluate the performance of various word embeddings for Roman Urdu and English dialects using the CNN-LSTM architecture with traditional machine learning classifiers. We introduce a novel deep learning architecture for Roman Urdu and English dialect SA based on two layers: LSTM for long-term dependency preservation and a one-layer CNN model for local feature extraction. To obtain the final classification, the feature maps learned by CNN and LSTM are fed to several machine learning classifiers. Various word embedding models support this concept. Extensive tests on four corpora show that the proposed model performs exceptionally well in Roman Urdu and English text sentiment classification, with an accuracy of 0.904, 0.841, 0.740, and 0.748 against MDPI, RUSA, RUSA-19, and UCL datasets, respectively. The results show that the SVM classifier and the Word2Vec CBOW (Continuous Bag of Words) model are more beneficial options for Roman Urdu sentiment analysis, but that BERT word embedding, two-layer LSTM, and SVM as a classifier function are more suitable options for English language sentiment analysis. The suggested model outperforms existing well-known advanced models on relevant corpora, improving the accuracy by up to 5%.
|
8
|
SCNN-Attack: A Side-Channel Attack to Identify YouTube Videos in a VPN and Non-VPN Network Traffic. Electronics 2022. [DOI: 10.3390/electronics11030350] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/27/2023]
Abstract
Encryption protocols, e.g., HTTPS, are utilized to secure the traffic between servers and clients for YouTube and other video streaming services, and VPNs are used to further secure the communication. However, these protocols are not sufficient to hide the identity of the videos from someone who can sniff the network traffic. The present work explores the methodologies and features to identify the videos in VPN and non-VPN network traffic. To identify such videos, a side-channel attack using a Sequential Convolution Neural Network is proposed. The results demonstrate that a sequence of bytes per second from even one-minute sniffing of network traffic is sufficient to predict the video with high accuracy. For models with two-minute sniffing, the accuracy increases to 90% in non-VPN traffic, 66% in VPN traffic, and 77% in mixed VPN and non-VPN traffic.
|
9
|
Mohammed A, Kora R. An effective ensemble deep learning framework for text classification. Journal of King Saud University - Computer and Information Sciences 2021. [DOI: 10.1016/j.jksuci.2021.11.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/17/2022]
|
10
|
|
11
|
Ligthart A, Catal C, Tekinerdogan B. Systematic reviews in sentiment analysis: a tertiary study. Artif Intell Rev 2021. [DOI: 10.1007/s10462-021-09973-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/26/2022]
Abstract
With advanced digitalisation, we can observe a massive increase of user-generated content on the web that provides opinions of people on different subjects. Sentiment analysis is the computational study of analysing people's feelings and opinions for an entity. The field of sentiment analysis has been the topic of extensive research in the past decades. In this paper, we present the results of a tertiary study, which aims to investigate the current state of the research in this field by synthesizing the results of published secondary studies (i.e., systematic literature review and systematic mapping study) on sentiment analysis. This tertiary study follows the guidelines of systematic literature reviews (SLR) and covers only secondary studies. The outcome of this tertiary study provides a comprehensive overview of the key topics and the different approaches for a variety of tasks in sentiment analysis. Different features, algorithms, and datasets used in sentiment analysis models are mapped. Challenges and open problems are identified that can help to identify points that require research efforts in sentiment analysis. In addition to the tertiary study, we also identified 112 recent deep learning-based sentiment analysis papers and categorized them based on the applied deep learning algorithms. According to this analysis, LSTM and CNN algorithms are the most used deep learning algorithms for sentiment analysis.
|
12
|
Nawaz A, Bakhtyar M, Baber J, Ullah I, Noor W, Basit A. Extractive Text Summarization Models for Urdu Language. Inf Process Manag 2020. [DOI: 10.1016/j.ipm.2020.102383] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/01/2022]
|
13
|
Safder I, Hassan SU, Visvizi A, Noraset T, Nawaz R, Tuarob S. Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents. Inf Process Manag 2020. [DOI: 10.1016/j.ipm.2020.102269] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 11/17/2022]
|