1
|
Mercha EM, Benbrahim H. Machine Learning and Deep Learning for sentiment analysis across languages: A survey. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.02.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 02/18/2023]
|
2
|
An ensemble transformer-based model for Arabic sentiment analysis. SOCIAL NETWORK ANALYSIS AND MINING 2022. [DOI: 10.1007/s13278-022-01009-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/25/2022]
|
3
|
Explicit and implicit oriented Aspect-Based Sentiment Analysis with optimal feature selection and deep learning for demonetization in India. DATA KNOWL ENG 2022. [DOI: 10.1016/j.datak.2022.102092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
4
|
Arabic sentiment analysis using dependency-based rules and deep neural networks. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
5
|
Arabic Language Opinion Mining Based on Long Short-Term Memory (LSTM). APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12094140] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023]
Abstract
Arabic is one of the official languages recognized by the United Nations (UN) and is widely used in the Middle East and in parts of Asia, Africa, and other regions. Social media activity currently dominates textual communication on the Internet and potentially represents people’s views about specific issues. Opinion mining is an important task for understanding the polarity of public opinion towards an issue. Understanding public opinion leads to better decisions in many fields, such as public services and business. Language background plays a vital role in understanding opinion polarity. Variation is due not only to vocabulary but also to cultural background. A sentence is a time-series signal; therefore, word order correlates significantly with the meaning of the text. A recurrent neural network (RNN) is a variant of deep learning in which the sequence is taken into account. Long short-term memory (LSTM) is an implementation of an RNN with gates that keep or ignore specific word signals during a sequence of inputs. Text is unstructured data, and it cannot be processed further by a machine unless an algorithm transforms it into a machine-readable format, namely a vector of numerical values. Transformation algorithms range from the Term Frequency–Inverse Document Frequency (TF-IDF) transform to advanced word embedding. Word embedding methods include GloVe, word2vec, BERT, and fastText. This research experimented with those algorithms to perform vector transformation of the Arabic text dataset. This study implements and compares the GloVe and fastText word embedding algorithms and long short-term memory (LSTM) implemented in single-, double-, and triple-layer architectures. Finally, this research compares their accuracy for opinion mining on an Arabic dataset. It evaluates the proposed algorithm with the ASAD dataset of 55,000 annotated tweets in three classes.
The dataset was augmented to achieve equal proportions of positive, negative, and neutral classes. According to the evaluation results, the triple-layer LSTM with fastText word embedding achieved the best testing accuracy, at 90.9%, surpassing all other experimental scenarios.
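The TF-IDF transform mentioned in this abstract can be illustrated with a minimal sketch. This is not the paper's pipeline: the toy English tokens stand in for Arabic text, the function name `tf_idf` is hypothetical, and the unsmoothed variant (term frequency times `log(N / df)`) is only one of several common definitions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy corpus (hypothetical data, already tokenized).
docs = [
    ["service", "good", "good"],
    ["service", "bad"],
    ["service", "slow", "bad"],
]
vectors = tf_idf(docs)
```

A term that occurs in every document, such as `service` here, receives a weight of zero, which is why TF-IDF down-weights uninformative words before the vectors are passed to a classifier.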
|
6
|
Alqurashi T. Stance Analysis of Distance Education in the Kingdom of Saudi Arabia during the COVID-19 Pandemic Using Arabic Twitter Data. SENSORS 2022; 22:s22031006. [PMID: 35161752 PMCID: PMC8839923 DOI: 10.3390/s22031006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/04/2021] [Revised: 01/21/2022] [Accepted: 01/24/2022] [Indexed: 02/05/2023]
Abstract
The coronavirus has caused significant disruption to people's everyday lives, altering how people live, work, and study. The Kingdom of Saudi Arabia (KSA) reacted very quickly to suppress the spread of the virus even before the first case of COVID-19 was confirmed in the country. In the education sector, all face-to-face activities at public and private schools and universities were suspended, as they switched from traditional to distance learning for the entire 2020 academic year. This study collected 1,846,285 tweets to analyze the public's dynamic opinions towards distance education in the KSA during the 2020 academic year. Several classical machine-learning models and deep-learning models, including ensemble random forest (RF), support vector machine (SVM), adaptive boosting (AdaBoost), multinomial naïve Bayes (MNB), convolutional neural network (CNN), and long short-term memory (LSTM), were tested on this data, and the best-performing models were selected to analyze the public stance towards distance education. Additionally, I correlated my analysis with the major events that were announced by the Ministry of Education (MOE). I observed that people in the KSA took some time to react and express their stances at the start of the academic year. Regarding the news, I observed that any exam-related topic attracted high engagement. In-favor stances increased when news headlines covered the topic of exams compared to other topics. The results show that the primary Saudi public stance favored distance education during the 2020 academic year.
Affiliation(s)
- Tahani Alqurashi
- Common First Year Deanship, Computer Science Department, Umm Al-Qura University, Makkah 24382, Saudi Arabia
|
7
|
Moudjari L, Benamara F, Akli-Astouati K. Multi-level embeddings for processing Arabic social media contents. COMPUT SPEECH LANG 2021. [DOI: 10.1016/j.csl.2021.101240] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
|
8
|
BinSaeedan W, Alramlawi S. CS-BPSO: Hybrid feature selection based on chi-square and binary PSO algorithm for Arabic email authorship analysis. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107224] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/21/2022]
|
9
|
Filter gate network based on multi-head attention for aspect-level sentiment classification. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.02.041] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
|
10
|
Alzyout M, AL Bashabsheh E, Najadat H, Alaiad A. Sentiment Analysis of Arabic Tweets about Violence Against Women using Machine Learning. 2021 12TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS) 2021. [DOI: 10.1109/icics52457.2021.9464600] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 09/01/2023]
|
11
|
Abstract
Text classification is a prominent research area, gaining more interest in academia, industry and social media. Arabic is one of the world’s most famous languages and it had a significant role in science, mathematics and philosophy in Europe in the middle ages. During the Arab Spring, social media, that is, Facebook, Twitter and Instagram, played an essential role in establishing, running, and spreading these movements. Arabic Sentiment Analysis (ASA) and Arabic Text Classification (ATC) for these social media tools are hot topics, aiming to obtain valuable Arabic text insights. Although some surveys are available on this topic, the studies and research on Arabic Tweets need to be classified on the basis of machine learning algorithms. Machine learning algorithms and lexicon-based classifications are considered essential tools for text processing. In this paper, a comparison of previous surveys is presented, elaborating the need for a comprehensive study on Arabic Tweets. Research studies are classified according to machine learning algorithms, supervised learning, unsupervised learning, hybrid, and lexicon-based classifications, and their advantages/disadvantages are discussed comprehensively. We pose different challenges and future research directions.
|
12
|
|
13
|
Choudrie J, Patil S, Kotecha K, Matta N, Pappas I. Applying and Understanding an Advanced, Novel Deep Learning Approach: A Covid 19, Text Based, Emotions Analysis Study. INFORMATION SYSTEMS FRONTIERS: A JOURNAL OF RESEARCH AND INNOVATION 2021; 23:1431-1465. [PMID: 34188606 PMCID: PMC8225489 DOI: 10.1007/s10796-021-10152-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Accepted: 05/24/2021] [Indexed: 05/04/2023]
Abstract
The COVID 19 pandemic has altered individuals' daily lives across the globe. It has led to preventive measures such as physical distancing being imposed on individuals and to terms such as 'lockdown,' 'emergency,' or 'curfew' emerging in various countries. It has affected society not only physically and financially, but in terms of emotional wellbeing as well. This distress in the human emotional quotient results from multiple factors such as financial implications, family members' behavior and support, country-specific lockdown protocols, media influence, or fear of the pandemic. For efficient pandemic management, there is a need to understand the emotional variations among individuals, as this will provide insights into public sentiment towards various government pandemic management policies. From our investigations, it was found that individuals have increasingly used different microblogging platforms such as Twitter to remain connected and express their feelings and concerns during the pandemic. However, research in the area of expressed emotional wellbeing during COVID 19 is still growing, which motivated this team to form the aim: to identify, explore and understand globally the emotions expressed during the earlier months of the COVID 19 pandemic by utilizing Deep Learning and Natural Language Processing (NLP). For the data collection, over 2 million tweets from February-June 2020 were collected and analyzed using an advanced deep learning technique of Transfer Learning and the Robustly Optimized BERT Pretraining Approach (RoBERTa). A Reddit-based standard Emotion Dataset by Crowdflower was utilized for transfer learning. Using RoBERTa and the collated Twitter dataset, a multi-class emotion classifier system was formed. With the implemented methodology, a tweet classification accuracy of 80.33% and an average MCC score of 0.78 were achieved, improving on the existing AI-based emotion classification methods.
This study explains the novel application of the RoBERTa model during the pandemic, which provided insights into the changing emotional wellbeing over time of various citizens worldwide. It also offers novelty for data mining and analytics during this challenging pandemic era. These insights can be beneficial for formulating effective pandemic management strategies and devising a novel, predictive strategy for the emotional well-being of an entire country's citizens when facing future unexpected exogenous shocks.
Affiliation(s)
- Jyoti Choudrie
- University of Hertfordshire, Hertfordshire Business School, Hatfield, Hertfordshire, AL10 9EU UK
- Shruti Patil
- Symbiosis Centre for Applied Artificial Intelligence, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, MH 412115 India
- Ketan Kotecha
- Symbiosis Centre for Applied Artificial Intelligence, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, MH 412115 India
- Nikhil Matta
- Symbiosis International University, Symbiosis Institute of Technology, Pune, India
- Ilias Pappas
- University of Agder: Universitetet i Agder, Kristiansand, Norway
|
14
|
Elfaik H, Nfaoui EH. Deep Bidirectional LSTM Network Learning-Based Sentiment Analysis for Arabic Text. JOURNAL OF INTELLIGENT SYSTEMS 2020. [DOI: 10.1515/jisys-2020-0021] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 11/15/2022]
Abstract
Sentiment analysis aims to predict the sentiment polarity (positive, negative or neutral) of a given piece of text. It lies at the intersection of many fields such as Natural Language Processing (NLP), Computational Linguistics, and Data Mining. Sentiments can be expressed explicitly or implicitly. Arabic Sentiment Analysis is a challenging undertaking due to the language's complexity, ambiguity, various dialects, scarcity of resources, and morphological richness, as well as the absence of contextual information and of explicit sentiment words in an implicit piece of text. Recently, deep learning has shown great success in the field of sentiment analysis and is considered the state of the art in Arabic Sentiment Analysis. However, the state-of-the-art accuracy for Arabic sentiment analysis still needs improvement with regard to contextual information and implicit sentiment expressed in different real cases. In this paper, an efficient Bidirectional LSTM Network (BiLSTM) is investigated to enhance Arabic Sentiment Analysis by applying forward and backward passes to encapsulate contextual information from Arabic feature sequences. The experimental results on six benchmark sentiment analysis datasets demonstrate that our model achieves significant improvements over state-of-the-art deep learning models and baseline traditional machine learning methods.
Affiliation(s)
- Hanane Elfaik
- LISAC Laboratory, Faculty of Sciences Dhar EL Mehraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco
- El Habib Nfaoui
- LISAC Laboratory, Faculty of Sciences Dhar EL Mehraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco
|
15
|
Badaro G, Hajj H, Habash N. A Link Prediction Approach for Accurately Mapping a Large-scale Arabic Lexical Resource to English WordNet. ACM T ASIAN LOW-RESO 2020. [DOI: 10.1145/3404854] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/23/2022]
Abstract
The success of Natural Language Processing (NLP) models, just like that of all advanced machine learning models, relies heavily on large-scale lexical resources. For English, English WordNet (EWN) is a leading example of a large-scale resource that has enabled advances in Natural Language Understanding (NLU) tasks such as word sense disambiguation, question answering, sentiment analysis, and emotion recognition. EWN includes sets of cognitive synonyms called synsets, which are interlinked by means of conceptual-semantic and lexical relations and where each synset expresses a distinct concept. However, other languages are still lagging behind in having large-scale and rich lexical resources similar to EWN. In this article, we focus on enabling the development of such resources for Arabic. While there have been efforts to develop an Arabic WordNet (AWN), the current version of AWN is limited in size and lacks transliteration standards, which are important for compatibility with Arabic NLP tools. Previous efforts to extend AWN resulted in a lexicon, called ArSenL, that overcame the size and transliteration standard limitations but was limited in accuracy due to a heuristic approach that only considered surface matching between the English definitions from the Standard Arabic Morphological Analyzer (SAMA) and EWN synset terms, which resulted in inaccurate mapping of Arabic lemmas to EWN’s synsets. Furthermore, there has been limited exploration of other expansion methods due to the expensive manual validation needed. To address the challenge of simultaneously achieving large-scale size, high accuracy, and standard representations, the mapping problem is formulated as a link prediction problem between a large-scale Arabic lexicon and EWN, where a word in one lexicon is linked to a word in another lexicon if the two words are semantically related.
We use a semi-supervised approach to create a training dataset by finding common terms in the large-scale Arabic resource and AWN. This set of data becomes implicitly linked to EWN and can be used for training and evaluating prediction models. We propose the use of a two-step Boosting method, where the first step aims at linking English translations of SAMA’s terms to EWN’s synsets. The second step uses surface similarity between SAMA’s glosses and EWN’s synsets. The method results in a new large-scale Arabic lexicon that we call ArSenL 2.0 as a sequel to the previously developed sentiment lexicon ArSenL. A comprehensive study covering both intrinsic and extrinsic evaluations shows the superiority of the method compared to several baseline and state-of-the-art link prediction methods. Compared to previously developed ArSenL, ArSenL 2.0 included a larger set of sentimentally charged adjectives and verbs. It also showed higher linking accuracy on the ground truth data compared to previous ArSenL. For extrinsic evaluation, ArSenL 2.0 was used for sentiment analysis and showed, here, too, higher accuracy compared to previous ArSenL.
|
16
|
Basmmi ABMN, Halim SA, Saadon NA. Comparison of Web Services for Sentiment Analysis in Social Networking Sites. IOP CONFERENCE SERIES: MATERIALS SCIENCE AND ENGINEERING 2020; 884:012063. [DOI: 10.1088/1757-899x/884/1/012063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
With various types of web services available, it is hard to identify and compare which of the free-access web services work best in analysing the sentiment of extremist content on social networking sites. For that purpose, a generic approach that works with each web service's API using the PHP programming language is used to test each dataset, which was extracted based on the keyword ‘extremism’. Data from both Twitter and Facebook have been used, as these two are the most powerful platforms for expressing one’s feelings. The web services are compared based on an analysis of their accuracy, precision, recall and F-measure, with the aim of obtaining the lowest mean square error (MSE). Four sentiment analysis web services are used: Sentiment Analyzer, Aylien, ParallelDots, and MonkeyLearn. In the comparison, MonkeyLearn obtained the best final results among all web services, with the lowest MSE score of 14%. For the benefit of other researchers, this finding reveals a suitable web service for analysing sentiment on issues as critical as extremism.
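The evaluation measures named in this abstract can be sketched as follows. This is an illustration under assumptions, not the study's PHP code: the function name `binary_metrics`, the 0/1 label encoding, and the sample labels are all hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, F-measure and MSE for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mse": sum((t - p) ** 2 for t, p in pairs) / len(pairs),
    }


# Hypothetical gold labels and service predictions (1 = negative sentiment found).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

With labels encoded as 0 and 1, the MSE equals the fraction of misclassified items, so it is simply one minus the accuracy; other encodings (for example, graded polarity scores) would break that equivalence.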
|
17
|
Keyvanpour M, Karimi Zandian Z, Heidarypanah M. OMLML: a helpful opinion mining method based on lexicon and machine learning in social networks. SOCIAL NETWORK ANALYSIS AND MINING 2020. [DOI: 10.1007/s13278-019-0622-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 10/25/2022]
|
18
|
|
19
|
A set of parameters for automatically annotating a Sentiment Arabic Corpus. INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS 2019. [DOI: 10.1108/ijwis-03-2019-0008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
Purpose: This paper aims to propose an approach to automatically annotate a large corpus in Arabic dialect. This corpus is used to analyse the sentiments of Arabic users on social media. It focuses on the Algerian dialect, which is a sub-dialect of Maghrebi Arabic. Although Algerian is spoken by roughly 40 million speakers, few studies address its automated processing in general or sentiment analysis in particular.
Design/methodology/approach: The approach is based on the construction and use of a sentiment lexicon to automatically annotate a large corpus of Algerian text that is extracted from Facebook. This approach allows the size of the training corpus to be increased significantly without resorting to manual annotation. The annotated corpus is then vectorized using document embedding (doc2vec), which is an extension of word embeddings (word2vec). For sentiment classification, the authors used different classifiers such as support vector machines (SVM), Naive Bayes (NB) and logistic regression (LR).
Findings: The results suggest that NB and SVM classifiers generally led to the best results and MLP generally had the worst results. Further, the threshold that the authors use in selecting messages for the training set had a noticeable impact on recall and precision, with a threshold of 0.6 producing the best results. Using PV-DBOW led to slightly higher results than using PV-DM. Combining PV-DBOW and PV-DM representations led to slightly lower results than using PV-DBOW alone. The best results were obtained by the NB classifier with F1 up to 86.9 per cent.
Originality/value: The principal originality of this paper is to determine the right parameters for automatically annotating an Algerian dialect corpus. This annotation is based on a sentiment lexicon that was also constructed automatically.
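The idea of lexicon-based automatic annotation with a selection threshold might look roughly like the sketch below. Everything in it is an assumption made for illustration: the abstract does not define how the threshold is computed, so the confidence score (the share of lexicon hits that agree with the dominant polarity), the English toy lexicon, and the function name `annotate` are hypothetical rather than the authors' method.

```python
# Toy polarity lexicon (hypothetical; the paper builds an Algerian-dialect lexicon).
LEXICON = {
    "good": 1, "great": 1, "love": 1,
    "bad": -1, "awful": -1, "hate": -1,
}


def annotate(tokens, lexicon=LEXICON, threshold=0.6):
    """Label a tokenized message, or return None if it should be left out.

    Assumed rule: keep a message for the training set only when the
    dominant polarity accounts for at least `threshold` of the lexicon hits.
    """
    pos = sum(1 for tok in tokens if lexicon.get(tok, 0) > 0)
    neg = sum(1 for tok in tokens if lexicon.get(tok, 0) < 0)
    hits = pos + neg
    if hits == 0 or pos == neg:
        return None  # no evidence, or evenly split evidence
    confidence = max(pos, neg) / hits
    if confidence < threshold:
        return None
    return "positive" if pos > neg else "negative"


messages = [
    ["great", "food", "love", "it"],
    ["good", "but", "bad"],
    ["bad", "awful", "good"],
    ["the", "menu"],
]
labels = [annotate(message) for message in messages]
```

Raising the threshold keeps fewer but cleaner training messages, which is one plausible reading of why the choice of threshold affects precision and recall downstream.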
|
20
|
Sarkar K. Sentiment Polarity Detection in Bengali Tweets Using Deep Convolutional Neural Networks. JOURNAL OF INTELLIGENT SYSTEMS 2019. [DOI: 10.1515/jisys-2017-0418] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/15/2022]
Abstract
Sentiment polarity detection is one of the most popular sentiment analysis tasks. Sentiment polarity detection in tweets is a more difficult task than sentiment polarity detection in review documents, because tweets are relatively short and they contain limited contextual information. Although the amount of blog posts, tweets and comments in Indian languages is rapidly increasing on the web, research on sentiment analysis in Indian languages is at the early stage. In this paper, we present an approach that classifies the sentiment polarity of Bengali tweets using deep neural networks which consist of one convolutional layer, one hidden layer and one output layer, which is a soft-max layer. Our proposed approach has been tested on the Bengali tweet dataset released for Sentiment Analysis in Indian Languages contest 2015. We have compared the performance of our proposed convolutional neural networks (CNN)-based model with a sentiment polarity detection model that uses deep belief networks (DBN). Our experiments reveal that the performance of our proposed CNN-based system is better than our implemented DBN-based system and some existing Bengali sentiment polarity detection systems.
|
21
|
Abstract
Opinion-mining or sentiment analysis continues to gain interest in industry and academics. While there has been significant progress in developing models for sentiment analysis, the field remains an active area of research for many languages across the world, and in particular for the Arabic language, which is the fifth most-spoken language and has become the fourth most-used language on the Internet. With the flurry of research activity in Arabic opinion mining, several researchers have provided surveys to capture advances in the field. While these surveys capture a wealth of important progress in the field, the fast pace of advances in machine learning and natural language processing (NLP) necessitates a continuous need for a more up-to-date literature survey. The aim of this article is to provide a comprehensive literature survey for state-of-the-art advances in Arabic opinion mining. The survey goes beyond surveying previous works that were primarily focused on classification models. Instead, this article provides a comprehensive system perspective by covering advances in different aspects of an opinion-mining system, including advances in NLP software tools, lexical sentiment and corpora resources, classification models, and applications of opinion mining. It also presents future directions for opinion mining in Arabic. The survey also covers latest advances in the field, including deep learning advances in Arabic Opinion Mining. The article provides state-of-the-art information to help new or established researchers in the field as well as industry developers who aim to deploy an operational complete opinion-mining system. Key insights are captured at the end of each section for particular aspects of the opinion-mining system giving the reader a choice of focusing on particular aspects of interest.
|
22
|
Al-Ayyoub M, Khamaiseh AA, Jararweh Y, Al-Kabi MN. A comprehensive survey of arabic sentiment analysis. Inf Process Manag 2019. [DOI: 10.1016/j.ipm.2018.07.006] [Citation(s) in RCA: 88] [Impact Index Per Article: 14.7] [Indexed: 11/26/2022]
|
23
|
|
24
|
Abooraig R, Al-Zu'bi S, Kanan T, Hawashin B, Al Ayoub M, Hmeidi I. Automatic categorization of Arabic articles based on their political orientation. DIGIT INVEST 2018. [DOI: 10.1016/j.diin.2018.04.003] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 11/15/2022]
|
25
|
SentiALG: Automated Corpus Annotation for Algerian Sentiment Analysis. ADVANCES IN BRAIN INSPIRED COGNITIVE SYSTEMS 2018. [DOI: 10.1007/978-3-030-00563-4_54] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
|
26
|
|
27
|
El-Masri M, Altrabsheh N, Mansour H. Successes and challenges of Arabic sentiment analysis research: a literature review. SOCIAL NETWORK ANALYSIS AND MINING 2017. [DOI: 10.1007/s13278-017-0474-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Indexed: 11/30/2022]
|
28
|
Al-Moslmi T, Albared M, Al-Shabi A, Omar N, Abdullah S. Arabic senti-lexicon: Constructing publicly available language resources for Arabic sentiment analysis. J Inf Sci 2017. [DOI: 10.1177/0165551516683908] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Indexed: 11/16/2022]
Abstract
Sentiment analysis is held to be one of the highly dynamic recent research fields in Natural Language Processing, facilitated by the quickly growing volume of Web opinion data. Most of the approaches in this field are focused on English due to the lack of sentiment resources in other languages such as the Arabic language and its large variety of dialects. In most sentiment analysis applications, good sentiment resources play a critical role. Based on that, in this article, several publicly available sentiment analysis resources for Arabic are introduced. This article introduces the Arabic senti-lexicon, a list of 3880 positive and negative synsets annotated with their part of speech, polarity scores, dialects synsets and inflected forms. This article also presents a Multi-domain Arabic Sentiment Corpus (MASC) with a size of 8860 positive and negative reviews from different domains. In this article, an in-depth study has been conducted on five types of feature sets for exploiting effective features and investigating their effect on performance of Arabic sentiment analysis. The aim is to assess the quality of the developed language resources and to integrate different feature sets and classification algorithms to synthesise a more accurate sentiment analysis method. The Arabic senti-lexicon is used for generating feature vectors. Five well-known machine learning algorithms: naïve Bayes, k-nearest neighbours, support vector machines (SVMs), logistic linear regression and neural network are employed as base-classifiers for each of the feature sets. A wide range of comparative experiments on standard Arabic data sets were conducted, discussion is presented and conclusions are drawn. The experimental results show that the Arabic senti-lexicon is a very useful resource for Arabic sentiment analysis. 
Moreover, results show that classifiers which are trained on feature vectors derived from the corpus using the Arabic sentiment lexicon are more accurate than classifiers trained using the raw corpus.
Affiliation(s)
- Tareq Al-Moslmi
- Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Malaysia
- Mohammed Albared
- Faculty of Computer and Information Technology, Sana’a University, Yemen
- Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Malaysia
- Adel Al-Shabi
- Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Malaysia
- Nazlia Omar
- Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Malaysia
- Salwani Abdullah
- Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Malaysia
|
29
|
|
30
|
Abdullah M, Hadzikadic M. Sentiment Analysis on Arabic Tweets: Challenges to Dissecting the Language. SOCIAL COMPUTING AND SOCIAL MEDIA. APPLICATIONS AND ANALYTICS 2017. [DOI: 10.1007/978-3-319-58562-8_15] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
|
31
|
Erratum to: Multilingual Sentiment Analysis: State of the Art and Independent Comparison of Techniques. Cognit Comput 2016. [DOI: 10.1007/s12559-016-9421-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
|
32
|
Smadi MA, Obaidat I, Al-Ayyoub M, Mohawesh R, Jararweh Y. Using Enhanced Lexicon-Based Approaches for the Determination of Aspect Categories and Their Polarities in Arabic Reviews. INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY AND WEB ENGINEERING 2016. [DOI: 10.4018/ijitwe.2016070102] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 11/09/2022]
Abstract
Sentiment Analysis (SA) is the process of determining whether the sentiment of a text written in a natural language is positive, negative or neutral. It is one of the most interesting subfields of natural language processing (NLP) and Web mining due to its diverse applications and the challenges associated with applying it to the massive amounts of textual data available online (especially on social networks). Most current work on SA focuses on the English language and operates at the sentence level or the document level. This work focuses on a less studied version of SA, namely aspect-based SA (ABSA) for the Arabic language. Specifically, this work considers two ABSA tasks, aspect category determination and aspect category polarity determination, and makes use of the publicly available human annotated Arabic dataset (HAAD) along with the baseline experiments conducted by the HAAD providers. In this work, several lexicon-based approaches are presented for the two tasks at hand, and some of them are shown to significantly outperform the best-known result on the given dataset. Improvements of 9% and 46% were achieved in the aspect category determination and aspect category polarity determination tasks, respectively.
Affiliation(s)
- Islam Obaidat
- Jordan University of Science and Technology, Irbid, Jordan
- Mahmoud Al-Ayyoub
- Computer Science Department, Jordan University of Science and Technology, Irbid, Jordan
- Rami Mohawesh
- Jordan University of Science and Technology, Irbid, Jordan
- Yaser Jararweh
- Department of Computer Science, Jordan University of Science and Technology, Irbid, Jordan
|
33
|
Dashtipour K, Poria S, Hussain A, Cambria E, Hawalah AYA, Gelbukh A, Zhou Q. Multilingual Sentiment Analysis: State of the Art and Independent Comparison of Techniques. Cognit Comput 2016; 8:757-771. [PMID: 27563360 PMCID: PMC4981629 DOI: 10.1007/s12559-016-9415-7] [Citation(s) in RCA: 75] [Impact Index Per Article: 8.3] [Received: 12/18/2015] [Accepted: 05/10/2016] [Indexed: 11/21/2022]
Abstract
With the advent of the Internet, people actively express their opinions about products, services, events, political parties, etc., in social media, blogs, and website comments. The amount of research work on sentiment analysis is growing explosively. However, the majority of research efforts are devoted to English-language data, while a great share of information is available in other languages. We present a state-of-the-art review of multilingual sentiment analysis. More importantly, we compare our own implementations of existing approaches on common data. The precision observed in our experiments is typically lower than that reported by the original authors, which we attribute to the lack of detail in the original presentation of those approaches. Thus, we compare the existing works by what they really offer to the reader, including whether they allow for accurate implementation and for reliable reproduction of the reported results.
Affiliation(s)
- Kia Dashtipour
- Department of Computing Science and Mathematics, University of Stirling, Stirling, FK9 4LA Scotland, UK
- Soujanya Poria
- Temasek Laboratory, Nanyang Technological University, Singapore, Singapore
- Amir Hussain
- Department of Computing Science and Mathematics, University of Stirling, Stirling, FK9 4LA Scotland, UK
- Erik Cambria
- School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
|
34
|
|