1
Wang J. Evaluation and analysis of visual perception using attention-enhanced computation in multimedia affective computing. Front Neurosci 2024;18:1449527. doi: 10.3389/fnins.2024.1449527. PMID: 39170679; PMCID: PMC11335721.
Abstract
Facial expression recognition (FER) plays a crucial role in affective computing, enhancing human-computer interaction by enabling machines to understand and respond to human emotions. Despite advancements in deep learning, current FER systems often struggle with challenges such as occlusions, head pose variations, and motion blur in natural environments. These challenges highlight the need for more robust FER solutions. To address these issues, we propose the Attention-Enhanced Multi-Layer Transformer (AEMT) model, which integrates a dual-branch Convolutional Neural Network (CNN), an Attentional Selective Fusion (ASF) module, and a Multi-Layer Transformer Encoder (MTE) with transfer learning. The dual-branch CNN captures detailed texture and color information by processing RGB and Local Binary Pattern (LBP) features separately. The ASF module selectively enhances relevant features by applying global and local attention mechanisms to the extracted features. The MTE captures long-range dependencies and models the complex relationships between features, collectively improving feature representation and classification accuracy. Our model was evaluated on the RAF-DB and AffectNet datasets. Experimental results demonstrate that the AEMT model achieved an accuracy of 81.45% on RAF-DB and 71.23% on AffectNet, significantly outperforming existing state-of-the-art methods. These results indicate that our model effectively addresses the challenges of FER in natural environments, providing a more robust and accurate solution. The AEMT model significantly advances the field of FER by improving the robustness and accuracy of emotion recognition in complex real-world scenarios. This work not only enhances the capabilities of affective computing systems but also opens new avenues for future research in improving model efficiency and expanding multimodal data integration.
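As a concrete illustration of the LBP branch this abstract mentions, the sketch below implements the basic 8-neighbour Local Binary Pattern operator on a grayscale patch. It is a generic textbook version, not the authors' implementation; the function name and patch values are illustrative only.

```python
import numpy as np

def lbp_3x3(img):
    """Basic 8-neighbour Local Binary Pattern on a 2-D grayscale array.

    Each interior pixel is compared with its 8 neighbours; neighbours that
    are >= the centre contribute one bit to an 8-bit texture code.
    """
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # Clockwise neighbour offsets starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = img[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= ((neigh >= centre).astype(np.uint8) << bit)
    return codes

# A flat patch yields code 255 everywhere (every neighbour >= centre).
flat = np.full((4, 4), 7)
print(lbp_3x3(flat))
```

A histogram of these codes over a face region is the kind of texture descriptor an LBP branch would consume alongside the RGB branch.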
Affiliation(s)
- Jingyi Wang
- School of Mass-communication and Advertising, Tongmyong University, Busan, Republic of Korea
2
Quayson E, Ganaa ED, Zhu Q, Shen XJ. Multi-view Representation Induced Kernel Ensemble Support Vector Machine. Neural Process Lett 2023. doi: 10.1007/s11063-023-11250-z.
3
Jiang D, Liu H, Wei R, Tu G. CSAT-FTCN: A Fuzzy-Oriented Model with Contextual Self-attention Network for Multimodal Emotion Recognition. Cognit Comput 2023. doi: 10.1007/s12559-023-10119-6.
4
Zhou Y, Bu F. An Overview of Advancements in Lie Detection Technology in Speech. International Journal of Information Technologies and Systems Approach 2023. doi: 10.4018/ijitsa.316935.
Abstract
Lie detection technology in speech recognizes a lying psychological state through analysis of the speech signal. Normally, an emotion of tension is felt when people lie. This tension leads to subtle changes in the vocal tract; for example, the semantic characteristics, prosodic characteristics, formants, and psychoacoustic parameters can all differ from their usual values. In this paper, the current state of lie detection technology is presented. Several public speech databases for lie detection are also introduced. Then, the state of research on feature expression, selection, and extraction for lie detection is described. In addition, research progress on lie detection algorithms is highlighted. Finally, future directions and the existing problems of lie detection technology in speech are summarized.
Affiliation(s)
- Yan Zhou
- Suzhou Vocational University, China
- Feng Bu
- Suzhou Vocational University, China
5
Rahmani S, Hosseini S, Zall R, Kangavari MR, Kamran S, Hua W. Transfer-based adaptive tree for multimodal sentiment analysis based on user latent aspects. Knowl Based Syst 2022. doi: 10.1016/j.knosys.2022.110219.
6
Cheung TH, Lam KM. Crossmodal bipolar attention for multimodal classification on social media. Neurocomputing 2022. doi: 10.1016/j.neucom.2022.09.140.
7
Subject independent emotion recognition using EEG and physiological signals – a comparative study. Applied Computing and Informatics 2022. doi: 10.1108/aci-03-2022-0080.
Abstract
Purpose: The aim of this study is to investigate the subject independent emotion recognition capabilities of EEG and peripheral physiological signals, namely: electrooculogram (EOG), electromyography (EMG), electrodermal activity (EDA), temperature, plethysmograph and respiration. The experiments are conducted on both modalities independently and in combination. This study arranges the physiological signals in order based on the prediction accuracy obtained on test data using time and frequency domain features.
Design/methodology/approach: The DEAP dataset is used in this experiment. Time and frequency domain features of EEG and physiological signals are extracted, followed by correlation-based feature selection. Classifiers, namely Naïve Bayes, logistic regression, linear discriminant analysis, quadratic discriminant analysis, logit boost and stacking, are trained on the selected features. Based on the performance of the classifiers on the test set, the best modality for each dimension of emotion is identified.
Findings: The experimental results with EEG as one modality and all physiological signals as another indicate that EEG signals are better at arousal prediction than physiological signals by 7.18%, while physiological signals are better at valence prediction than EEG signals by 3.51%. The valence prediction accuracy of EOG is superior to zygomaticus electromyography (zEMG) and EDA by 1.75%, at the cost of a higher number of electrodes. This paper concludes that valence can be measured from the eyes (EOG) while arousal can be measured from the changes in blood volume (plethysmograph). The sorted order of physiological signals based on arousal prediction accuracy is plethysmograph, EOG (hEOG + vEOG), vEOG, hEOG, zEMG, tEMG, temperature, EMG (tEMG + zEMG), respiration, EDA, while based on valence prediction accuracy the sorted order is EOG (hEOG + vEOG), EDA, zEMG, hEOG, respiration, tEMG, vEOG, EMG (tEMG + zEMG), temperature and plethysmograph.
Originality/value: Many of the emotion recognition studies in the literature are subject dependent, and the limited subject independent studies report an average leave-one-subject-out (LOSO) validation result as accuracy. The work reported in this paper sets a baseline for subject independent emotion recognition using the DEAP dataset by clearly specifying the subjects used in the training and test sets. In addition, this work specifies the cut-off score used to classify the scale as low or high in the arousal and valence dimensions. Generally, statistical features are used for emotion recognition using physiological signals as a modality, whereas in this work, time and frequency domain features of both physiological signals and EEG are used. This paper concludes that valence can be identified from EOG while arousal can be predicted from the plethysmograph.
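The correlation-based feature selection step described above can be sketched as a simple filter: rank each feature by the absolute Pearson correlation between its values and the labels, then keep the top k. This is a minimal stand-in for the full procedure; the function name and toy data below are hypothetical.

```python
import numpy as np

def select_by_correlation(X, y, k):
    """Rank features by |Pearson correlation| with the label; keep the top k.

    X: (n_samples, n_features) feature matrix, y: (n_samples,) labels.
    Returns indices of the k best features, best first.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf          # constant features score 0
    corr = np.abs(Xc.T @ yc / denom)
    return np.argsort(corr)[::-1][:k]

rng = np.random.default_rng(0)
y = rng.normal(size=200)
X = np.column_stack([y + 0.1 * rng.normal(size=200),   # strongly correlated
                     rng.normal(size=200),              # pure noise
                     -y + 0.5 * rng.normal(size=200)])  # anti-correlated
print(select_by_correlation(X, y, 2))   # features 0 and 2 outrank the noise
```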
8
An element-wise kernel learning framework. Appl Intell 2022. doi: 10.1007/s10489-022-04020-2.
9
Cai M, Luo H, Meng X, Cui Y, Wang W. Influence of information attributes on information dissemination in public health emergencies. Humanit Soc Sci Commun 2022;9:257. doi: 10.1057/s41599-022-01278-2. PMID: 35967483; PMCID: PMC9361962.
Abstract
When public health emergencies occur, relevant information containing different topics, sentiments, and emotions spreads rapidly on social media. From the cognitive and emotional dimensions, this paper explores the relationship between information attributes and information dissemination behavior. At the same time, the moderating roles of a media factor (user influence) and a time factor (life cycle) in the relation between information attributes and information transmission are discussed. The results confirm differences in the spread of posts under different topic types, sentiment types, and emotion types on social media. The study also found that posts published by users with a high number of followers and by media-type users are more likely to spread on social media, and that posts with different information attributes spread more easily during the outbreak and recurrence periods. The driving effect of the life cycle is most obvious for topics of prayer and fact, negative sentiment, and emotions of fear and anger. These findings contribute to the information governance of public opinion, the development of social media theory, and the maintenance of network order; they can help weaken the negative impact of the information epidemic during public health emergencies, maintain normal social order, and thus create favorable conditions for the further promotion of global recovery.
Affiliation(s)
- Meng Cai
- School of Humanities and Social Sciences, Xi’an Jiaotong University, Xi’an, China
- Han Luo
- School of Humanities and Social Sciences, Xi’an Jiaotong University, Xi’an, China
- Xiao Meng
- School of Journalism and New Media, Xi’an Jiaotong University, Xi’an, China
- Ying Cui
- School of Mechano-Electronic Engineering, Xidian University, Xi’an, China
- Wei Wang
- School of Public Health, Chongqing Medical University, Chongqing, China
10
Storey VC, O’Leary DE. Text Analysis of Evolving Emotions and Sentiments in COVID-19 Twitter Communication. Cognit Comput 2022:1-24. doi: 10.1007/s12559-022-10025-3. PMID: 35915743; PMCID: PMC9330938.
Abstract
Scientists and regular citizens alike search for ways to manage the widespread effects of the COVID-19 pandemic. While scientists are busy in their labs, other citizens often turn to online sources to report their experiences and concerns and to seek and share knowledge of the virus. The text generated by those users on online social media platforms can provide valuable insights about users' evolving opinions and attitudes. The objective of this research is to analyze the text of such user disclosures to study human communication during a pandemic in four primary ways. First, we analyze Twitter tweet information generated throughout the pandemic to understand users' communications concerning COVID-19 and how those communications have evolved during the pandemic. Second, we analyze linguistic sentiment concepts (analytic, authentic, clout, and tone) in different Twitter settings (tweets with and without pictures, and tweets versus retweets). Third, we investigate the relationship between Twitter tweets and additional forms of internet activity, namely Google searches and Wikipedia page views. Finally, we create and use a dictionary of specific COVID-19-related concepts (e.g., the symptom of lost taste) to assess how the use of those concepts in tweets is related to the spread of information and the resulting influence of Twitter users. The analysis showed a surprising lack of emotion in the initial phases of the pandemic, when people were information seeking. As time progressed, there were more expressions of sentiment, including anger. Further, tweets with and without pictures and/or video had statistically significant differences in text sentiment characteristics, as did tweets versus retweets. We also found that Google and Wikipedia searches were predictive of sentiment in the tweets. Finally, a variable representing a dictionary of COVID-related concepts was statistically significant when related to users' Twitter influence scores and numbers of retweets, illustrating the general impact of COVID-19 on Twitter and human communication. Overall, the results provide insights into human communication as well as models of human internet and social media use. These findings could be useful for the management of global challenges beyond, or different from, a pandemic.
Affiliation(s)
- Veda C. Storey
- Dept. of Computer Information Systems, J. Mack Robinson College of Business, Georgia State University, Atlanta, GA 30302-4015 USA
- Daniel E. O’Leary
- Marshall School of Business, University of Southern California, Los Angeles, CA USA
11
Zall R, Kangavari MR. Comparative Analytical Survey on Cognitive Agents with Emotional Intelligence. Cognit Comput 2022. doi: 10.1007/s12559-022-10007-5.
12
Ghorbanali A, Sohrabi MK, Yaghmaee F. Ensemble transfer learning-based multimodal sentiment analysis using weighted convolutional neural networks. Inf Process Manag 2022. doi: 10.1016/j.ipm.2022.102929.
13
Ali L, He Z, Cao W, Rauf HT, Imrana Y, Bin Heyat MB. MMDD-Ensemble: A Multimodal Data-Driven Ensemble Approach for Parkinson's Disease Detection. Front Neurosci 2021;15:754058. doi: 10.3389/fnins.2021.754058. PMID: 34790091; PMCID: PMC8591047.
Abstract
Parkinson's disease (PD) is the second most common neurological disease and has no specific medical test for its diagnosis. In this study, we consider PD detection based on multimodal voice data that was collected through two channels, i.e., Smart Phone (SP) and Acoustic Cardioid (AC). Four types of data modalities were collected through each channel, namely sustained phonation (P), speech (S), voiced (V), and unvoiced (U). The contributions of this paper are twofold. First, it explores the optimal data modality and the features carrying the most information about PD. Second, it proposes a MultiModal Data-Driven Ensemble (MMDD-Ensemble) approach for PD detection. The MMDD-Ensemble has two levels. At the first level, different base classifiers are developed that are driven by multimodal voice data. At the second level, the predictions of the base classifiers are fused using blending and voting methods. In order to validate the robustness of the proposed method, five evaluation measures, namely accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC), and area under the curve (AUC), are adopted. The proposed method outperformed the best results produced by the optimal unimodal framework on both key evaluation aspects, i.e., accuracy and AUC. Furthermore, it also outperformed other state-of-the-art ensemble models. Experimental results show that the proposed multimodal approach yields 96% accuracy, 100% sensitivity, 88.88% specificity, an MCC of 0.914, and an AUC of 0.986. These results are promising compared to recently reported results for PD detection based on multimodal voice data.
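The two-level fusion described here, base classifiers whose predictions are combined by voting and blending, can be sketched as follows. Hard majority voting is shown directly; "blending" is simplified to a weighted average of class probabilities rather than a trained meta-learner, and all names and numbers are illustrative, not the paper's setup.

```python
import numpy as np

def majority_vote(preds):
    """Hard-voting fusion: preds is (n_classifiers, n_samples) of class labels."""
    preds = np.asarray(preds)
    n_classes = preds.max() + 1
    counts = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)

def blend(probas, weights):
    """Simplified blending: weighted average of per-classifier probabilities.

    probas: (n_classifiers, n_samples, n_classes); weights sum to 1.
    """
    probas = np.asarray(probas, dtype=float)
    w = np.asarray(weights, dtype=float)[:, None, None]
    return (w * probas).sum(axis=0).argmax(axis=1)

votes = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]   # three base classifiers, 3 samples
print(majority_vote(votes))                  # -> [1 1 1]
```

A real blending stage would instead train a second-level model on the base classifiers' held-out probabilities.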
Affiliation(s)
- Liaqat Ali
- Department of Electrical Engineering, University of Science and Technology, Bannu, Pakistan
- Zhiquan He
- Guangdong Multimedia Information Service Engineering Technology Research Center, Shenzhen University, Shenzhen, China
- Wenming Cao
- Guangdong Multimedia Information Service Engineering Technology Research Center, Shenzhen University, Shenzhen, China
- Hafiz Tayyab Rauf
- Faculty of Engineering & Informatics, University of Bradford, Bradford, United Kingdom
- Yakubu Imrana
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Md Belal Bin Heyat
- School of Electronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
14
Leveraging label hierarchy using transfer and multi-task learning: A case study on patent classification. Neurocomputing 2021. doi: 10.1016/j.neucom.2021.07.057.
15
Wu P, Li X, Ling C, Ding S, Shen S. Sentiment classification using attention mechanism and bidirectional long short-term memory network. Appl Soft Comput 2021. doi: 10.1016/j.asoc.2021.107792.
16
Jindal K, Aron R. A Novel Visual-Textual Sentiment Analysis Framework for Social Media Data. Cognit Comput 2021. doi: 10.1007/s12559-021-09929-3.
17
18
Zhou Z, Li Y, Zhang Y, Yin Z, Qi L, Ma R. Residual visualization-guided explainable copy-relationship learning for image copy detection in social networks. Knowl Based Syst 2021. doi: 10.1016/j.knosys.2021.107287.
19
20
Cai M, Luo H, Meng X, Cui Y. Topic-Emotion Propagation Mechanism of Public Emergencies in Social Networks. Sensors 2021;21(13):4516. doi: 10.3390/s21134516. PMID: 34282784; PMCID: PMC8271428.
Abstract
The information propagation of emergencies in social networks is often accompanied by the dissemination of topics and emotions. As a virtual sensor of public emergencies, social networks have been widely used in data mining, knowledge discovery, and machine learning. From a network perspective, this study aims to explore the topic and emotion propagation mechanism, as well as the interaction and communication relations of the public in social networks, under four types of emergencies: public health events, accidents and disasters, social security events, and natural disasters. Event topics were identified by Word2vec and K-means clustering. A biLSTM model was used to identify emotion in posts. The propagation maps of topic and emotion were presented visually on the network, and the synergistic relationship between topic and emotion propagation as well as the communication characteristics of multiple subjects were analyzed. The results show that there were similarities and differences in the propagation mechanisms of topic and emotion in different types of emergencies. There was a positive correlation between the topic and emotion of different types of users in social networks during emergencies. Users with a high level of topic influence were often accompanied by a high level of emotional appeal.
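The topic-identification step above (embedding posts, then clustering with K-means) can be sketched with plain Lloyd's algorithm. The toy vectors below stand in for Word2vec-averaged post embeddings, which the real pipeline would obtain from a trained Word2vec model; everything here is illustrative.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Two well-separated clouds of toy "post embeddings".
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (10, 3)), rng.normal(5, 0.1, (10, 3))])
_, labels = kmeans(X, k=2)
print(labels[:10], labels[10:])  # each half falls into a single cluster
```

Each resulting cluster would then be inspected (e.g., via its top words) and named as an event topic.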
Affiliation(s)
- Meng Cai
- School of Humanities and Social Sciences, Xi’an Jiaotong University, Xi’an 710049, China;
- Han Luo
- School of Humanities and Social Sciences, Xi’an Jiaotong University, Xi’an 710049, China;
- Xiao Meng
- School of Journalism and New Media, Xi’an Jiaotong University, Xi’an 710049, China;
- Ying Cui
- School of Mechano-Electronic Engineering, Xidian University, Xi’an 710071, China;
21
Li X, Zhang T, Zhao X, Sun X, Yi Z. Learning fused features with parallel training for person re-identification. Knowl Based Syst 2021. doi: 10.1016/j.knosys.2021.106941.
22
23
Li Y, Lu H. Multi-modal constraint propagation via compatible conditional distribution reconstruction. Neurocomputing 2021. doi: 10.1016/j.neucom.2020.09.067.
24
Abstract
In recent years, with the popularity of social media, users are increasingly keen to express their feelings and opinions in the form of pictures and text, which makes multimodal data combining text and pictures the fastest-growing content type. Most of the information posted by users on social media has obvious sentimental aspects, and multimodal sentiment analysis has become an important research field. Previous studies on multimodal sentiment analysis have primarily focused on extracting text and image features separately and then combining them for sentiment classification, often ignoring the interaction between text and images. Therefore, this paper proposes a new multimodal sentiment analysis model. The model first eliminates noise interference in the textual data and extracts the more important image features. Then, in the attention-based feature-fusion part, the text and image modalities symmetrically learn internal features from each other, and the fused features are applied to sentiment classification tasks. Experimental results on two common multimodal sentiment datasets demonstrate the effectiveness of the proposed model.
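The symmetric, attention-based fusion sketched in this abstract can be illustrated with scaled dot-product cross-attention, where each modality queries the other and the two attended summaries are concatenated. This is a generic formulation with made-up dimensions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: one modality queries the other.

    queries: (n_q, d); keys/values: (n_kv, d). Returns (n_q, d).
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

rng = np.random.default_rng(0)
text_feats = rng.normal(size=(4, 8))    # e.g. token-level text features
image_feats = rng.normal(size=(6, 8))   # e.g. region-level image features
# Text attends to image regions and, symmetrically, image attends to text.
text_aware_image = cross_attention(text_feats, image_feats, image_feats)
image_aware_text = cross_attention(image_feats, text_feats, text_feats)
fused = np.concatenate([text_aware_image.mean(axis=0),
                        image_aware_text.mean(axis=0)])
print(fused.shape)   # (16,)
```

The fused vector would then feed a small classifier head for sentiment prediction.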
25
Lauriola I, Polato M, Aiolli F. Learning deep kernels in the space of monotone conjunctive polynomials. Pattern Recognit Lett 2020. doi: 10.1016/j.patrec.2020.10.013.
26
Joint Sentiment Part Topic Regression Model for Multimodal Analysis. Information 2020. doi: 10.3390/info11100486.
Abstract
The development of multimodal media compensates for the lack of information expression in a single modality and thus gradually becomes the main carrier of sentiment. In this situation, automatic assessment for sentiment information in multimodal contents is of increasing importance for many applications. To achieve this, we propose a joint sentiment part topic regression model (JSP) based on latent Dirichlet allocation (LDA), with a sentiment part, which effectively utilizes the complementary information between the modalities and strengthens the relationship between the sentiment layer and multimodal content. Specifically, a linear regression module is developed to share implicit variables between image–text pairs, so that one modality can predict the other. Moreover, a sentiment label layer is added to model the relationship between sentiment distribution parameters and multimodal contents. Experimental results on several datasets verify the feasibility of our proposed approach for multimodal sentiment analysis.
27
Ren S, Liu F, Zhou W, Feng X, Siddique CN. Group-based local adaptive deep multiple kernel learning with lp norm. PLoS One 2020;15:e0238535. doi: 10.1371/journal.pone.0238535. PMID: 32941468; PMCID: PMC7498035.
Abstract
The deep multiple kernel learning (DMKL) method has attracted wide attention due to its better classification performance than shallow multiple kernel learning. However, existing DMKL methods struggle to find suitable global model parameters to improve classification accuracy across numerous datasets, and do not take into account inter-class correlation and intra-class diversity. In this paper, we present a group-based local adaptive deep multiple kernel learning (GLDMKL) method with lp norm. Our GLDMKL method divides samples into multiple groups according to the multiple kernel k-means clustering algorithm, and the learning process in each well-grouped local space is adaptive deep multiple kernel learning. Because the structure is adaptive, there is no fixed number of layers: the learning model in each group is trained independently, so the numbers of layers of the learning models may differ. In each local space, the model is adapted by optimizing the SVM model parameter α and the local kernel weight β in turn; the local kernel weight changes the proportion of each base kernel in the combined kernel at each layer and is constrained by the lp norm to avoid sparsity over the base kernels. The hyperparameters of the kernels are optimized by grid search. Experiments on the UCI and Caltech 256 datasets demonstrate that the proposed method is more accurate than other deep multiple kernel learning methods, especially on datasets with relatively complex data.
Affiliation(s)
- Shengbing Ren
- School of Computer Science and Engineering, Central South University, Changsha, China
- Fa Liu
- School of Computer Science and Engineering, Central South University, Changsha, China
- Weijia Zhou
- School of Computer Science and Engineering, Central South University, Changsha, China
- Xian Feng
- School of Computer Science and Engineering, Central South University, Changsha, China
28
Best Practices of Convolutional Neural Networks for Question Classification. Applied Sciences (Basel) 2020. doi: 10.3390/app10144710.
Abstract
Question Classification (QC) is of primary importance in question answering systems, since it enables extraction of the correct answer type. State-of-the-art solutions for short text classification have obtained remarkable results with Convolutional Neural Networks (CNNs). However, implementing such models requires choices usually based on subjective experience, or on rare works comparing different settings for general text classification, while solutions tailored to the QC task, depending on language and on dataset size, are needed. Therefore, this work aims at suggesting best practices for QC using CNNs. Different datasets were employed: (i) a multilingual set of labelled questions to evaluate the dependence of optimal settings on language; (ii) a large, widely used dataset for validation and comparison. Numerous experiments were executed to perform a multivariate analysis, to evaluate the statistical significance and the influence on QC performance of all the factors (regarding text representation, architectural characteristics, and learning hyperparameters) and some of their interactions, and to find the most appropriate strategies for QC. Results show the influence of CNN settings on performance, with optimal settings depending on language. Tests on different data validated the optimization performed and confirmed the transferability of the best settings. Comparisons with configurations suggested by previous works show that the configurations optimized here achieve the best classification accuracy. These findings can suggest the best choices for configuring a CNN for QC.
29
Abstract
Opinion mining in outdoor images posted by users during different activities can provide valuable information to better understand urban areas. In this regard, we propose a framework to classify the sentiment of outdoor images shared by users on social networks. We compare the performance of state-of-the-art ConvNet architectures and one specifically designed for sentiment analysis. We also evaluate how the merging of deep features and semantic information derived from the scene attributes can improve classification and cross-dataset generalization performance. The evaluation explores a novel dataset—namely, OutdoorSent—and other publicly available datasets. We observe that the incorporation of knowledge about semantic attributes improves the accuracy of all ConvNet architectures studied. Besides, we found that exploring only images related to the context of the study—outdoor, in our case—is recommended, i.e., indoor images were not significantly helpful. Furthermore, we demonstrated the applicability of our results in the United States city of Chicago, Illinois, showing that they can help to improve the knowledge of subjective characteristics of different areas of the city. For instance, particular areas of the city tend to concentrate more images of a specific class of sentiment, which are also correlated with median income, opening up opportunities in different fields.
Affiliation(s)
- Rodrigo Minetto
- Universidade Tecnológica Federal do Paraná - UTFPR, Curitiba, Brazil
30
Vázquez-Romero A, Gallardo-Antolín A. Automatic Detection of Depression in Speech Using Ensemble Convolutional Neural Networks. Entropy 2020;22:688. doi: 10.3390/e22060688. PMID: 33286460; PMCID: PMC7517226.
Abstract
This paper proposes a speech-based method for automatic depression classification. The system is based on ensemble learning for Convolutional Neural Networks (CNNs) and is evaluated using the data and experimental protocol provided in the Depression Classification Sub-Challenge (DCC) at the 2016 Audio-Visual Emotion Challenge (AVEC-2016). In the pre-processing phase, speech files are represented as sequences of log-spectrograms and randomly sampled to balance positive and negative samples. For the classification task itself, first, an architecture suited to this task, based on one-dimensional Convolutional Neural Networks, is built. Second, several of these CNN-based models are trained with different initializations, and the corresponding individual predictions are fused using an ensemble-averaging algorithm and combined per speaker to obtain the final decision. The proposed ensemble system achieves satisfactory results on the DCC at AVEC-2016 in comparison with a reference system based on Support Vector Machines and hand-crafted features, with a CNN+LSTM-based system called DepAudioNet, and with a single CNN-based classifier.
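The two fusion stages described in this abstract (averaging the predictions of several identically structured CNNs, then aggregating per-segment scores into one per-speaker decision) can be sketched as follows. The probabilities and the 0.5 threshold are toy assumptions for illustration, not values from the paper.

```python
import numpy as np

def ensemble_average(prob_list):
    """Average the positive-class probabilities of several CNN instances."""
    return np.mean(np.stack(prob_list, axis=0), axis=0)

def speaker_decision(segment_probs, threshold=0.5):
    """Combine per-segment ensemble probabilities into one speaker-level label."""
    return int(np.mean(segment_probs) >= threshold)

# Three model instances (different initializations), four speech segments each
probs = [np.array([0.6, 0.7, 0.4, 0.8]),
         np.array([0.5, 0.8, 0.3, 0.9]),
         np.array([0.7, 0.6, 0.5, 0.7])]
avg = ensemble_average(probs)     # per-segment averaged probabilities
label = speaker_decision(avg)     # single decision for the speaker
print(avg, label)
```

Averaging over differently initialized models reduces the variance of any single CNN's predictions, which is the statistical motivation for the ensemble.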
31
Bairavel S, Krishnamurthy M. Novel OGBEE-based feature selection and feature-level fusion with MLP neural network for social media multimodal sentiment analysis. Soft comput 2020. [DOI: 10.1007/s00500-020-05049-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
32
Ensemble Deep Learning for Multilabel Binary Classification of User-Generated Content. ALGORITHMS 2020. [DOI: 10.3390/a13040083] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Sentiment analysis usually refers to the analysis of human-generated content via a polarity filter, whereas affective computing deals with the exact emotions conveyed by the information. Emotional information frequently cannot be described accurately by a single emotion class. Multilabel classifiers can categorize human-generated content into multiple emotional classes, and ensemble learning can improve the statistical, computational, and representational aspects of such classifiers. We present a baseline stacked ensemble and propose a weighted ensemble. Our proposed weighted ensemble can use multiple classifiers to improve classification results without hyperparameter tuning or overfitting. We evaluate our ensemble models on two datasets. The first is from SemEval-2018 Task 1 and contains almost 7,000 tweets labeled with 11 sentiment classes. The second is the Toxic Comment dataset, with more than 150,000 comments labeled with six different levels of abuse or harassment. Our results suggest that ensemble learning improves classification results by 1.5% to 5.4%.
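A weighted ensemble of multilabel classifiers, as described here, can be sketched as a weighted average of per-label probabilities followed by a threshold. The weights, probabilities, and the 0.5 cutoff below are illustrative assumptions; the paper's actual weighting scheme may differ.

```python
import numpy as np

def weighted_ensemble(prob_matrices, weights, threshold=0.5):
    """Weighted average of per-label probabilities from several classifiers.

    prob_matrices: list of (n_samples, n_labels) arrays, one per classifier.
    weights: per-classifier weights (e.g., validation scores), normalized here.
    Returns a binary (n_samples, n_labels) multilabel prediction matrix.
    """
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    stacked = np.stack(prob_matrices, axis=0)   # (n_clf, n_samples, n_labels)
    avg = np.tensordot(w, stacked, axes=1)      # (n_samples, n_labels)
    return (avg >= threshold).astype(int)

# Two classifiers, two samples, three emotion labels
p1 = np.array([[0.9, 0.2, 0.6], [0.1, 0.8, 0.4]])
p2 = np.array([[0.7, 0.4, 0.2], [0.3, 0.6, 0.6]])
preds = weighted_ensemble([p1, p2], weights=[0.6, 0.4])
print(preds)
```

Unlike stacking, this scheme needs no meta-learner to be trained, which is why it avoids extra hyperparameter tuning.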
33
An efficient model-level fusion approach for continuous affect recognition from audiovisual signals. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.09.037] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
34
Kumar A, Srinivasan K, Cheng WH, Zomaya AY. Hybrid context enriched deep learning model for fine-grained sentiment analysis in textual and visual semiotic modality social data. Inf Process Manag 2020. [DOI: 10.1016/j.ipm.2019.102141] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
35
A memetic algorithm using emperor penguin and social engineering optimization for medical data classification. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105773] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
36
Chen MY, Liao CH, Hsieh RP. Modeling public mood and emotion: Stock market trend prediction with anticipatory computing approach. COMPUTERS IN HUMAN BEHAVIOR 2019. [DOI: 10.1016/j.chb.2019.03.021] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
37
Wang Z, Wang B, Cheng Y, Li D, Zhang J. Cost-sensitive Fuzzy Multiple Kernel Learning for imbalanced problem. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.06.065] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
38
Xu J, Huang F, Zhang X, Wang S, Li C, Li Z, He Y. Visual-textual sentiment classification with bi-directional multi-level attention networks. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.04.018] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
39
Arabic Sentiment Classification Using Convolutional Neural Network and Differential Evolution Algorithm. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2019; 2019:2537689. [PMID: 30936911 PMCID: PMC6413408 DOI: 10.1155/2019/2537689] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/20/2018] [Revised: 01/18/2019] [Accepted: 01/30/2019] [Indexed: 11/22/2022]
Abstract
In recent years, convolutional neural networks (CNNs) have attracted considerable attention owing to their impressive performance in various applications, such as Arabic sentence classification. However, building a powerful CNN for Arabic sentiment classification can be highly complicated and time-consuming. In this paper, we address this problem by combining the differential evolution (DE) algorithm with a CNN, where the DE algorithm automatically searches for the optimal configuration, including the CNN architecture and network parameters. To achieve this goal, five CNN parameters are searched by the DE algorithm: the convolution filter sizes that control the CNN architecture, the number of filters per convolution filter size (NFCS), the number of neurons in the fully connected (FC) layer, the initialization mode, and the dropout rate. In addition, the effect of the mutation and crossover operators in the DE algorithm was investigated. The performance of the proposed DE-CNN framework is evaluated on five Arabic sentiment datasets. Experimental results show that DE-CNN achieves higher accuracy and is less time-consuming than state-of-the-art algorithms.
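The DE loop described here (mutation, crossover, and greedy selection over a hyperparameter vector) can be sketched in a minimal continuous-relaxation form. The bounds, the surrogate fitness function (a toy quadratic standing in for validation accuracy, with an invented optimum), and the DE/rand/1/bin settings are all illustrative assumptions; in real use each candidate would be rounded to valid discrete values and scored by training and evaluating a CNN.

```python
import random

# Hypothetical search space mirroring the five parameters described:
# filter size, number of filters, FC neurons, init-mode index, dropout rate.
LOW  = [2, 16, 32, 0, 0.0]
HIGH = [7, 256, 512, 3, 0.5]

def fitness(cfg):
    """Stand-in for validation accuracy; peaks at an invented optimum."""
    target = [5, 128, 256, 1, 0.3]
    return -sum((c - t) ** 2 for c, t in zip(cfg, target))

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def de_search(pop_size=10, gens=30, F=0.8, CR=0.9, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(l, h) for l, h in zip(LOW, HIGH)]
           for _ in range(pop_size)]
    for _ in range(gens):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals (DE/rand/1)
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            # Binomial crossover between mutant and current individual
            trial = [clip(a[d] + F * (b[d] - c[d]), LOW[d], HIGH[d])
                     if rng.random() < CR else pop[i][d]
                     for d in range(len(LOW))]
            # Greedy selection: keep whichever scores better
            if fitness(trial) >= fitness(pop[i]):
                pop[i] = trial
    return max(pop, key=fitness)

best = de_search()
print([round(x, 2) for x in best])
```

Because each CNN evaluation is expensive, practical DE-based searches use small populations and few generations, exactly as the abstract's emphasis on reduced search time suggests.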
40
Wang D, Mao K. Task-generic semantic convolutional neural network for web text-aided image classification. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.09.042] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
41
Wang Z, Yan X, Jiang W, Sun M. Two-Way Affective Modeling for Hidden Movie Highlights' Extraction. SENSORS (BASEL, SWITZERLAND) 2018; 18:s18124241. [PMID: 30513936 PMCID: PMC6308599 DOI: 10.3390/s18124241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/15/2018] [Revised: 11/21/2018] [Accepted: 11/23/2018] [Indexed: 06/09/2023]
Abstract
Movie highlights are composed of video segments that induce a steady increase of the audience's excitement. Automatic movie-highlight extraction plays an important role in content analysis, ranking, indexing, and trailer production. To address this challenging problem, previous work suggested a direct mapping from low-level features to high-level perceptual categories. However, it treated highlights only as intense scenes, such as fighting, shooting, and explosions; many hidden highlights are ignored because their low-level feature values are too low. Driven by cognitive-psychology analysis, combined top-down and bottom-up processing is used to derive the proposed two-way excitement model. Under the criteria of global sensitivity and local abnormality, middle-level features are extracted in excitement modeling to bridge the gap between the feature space and the high-level perceptual space. To validate the proposed approach, a group of well-known movies covering several typical genres is employed. Quantitative assessment using the determined excitement levels indicates that the proposed method produces promising results in movie-highlight extraction, even when the response in the low-level audio-visual feature space is weak.
Affiliation(s)
- Zheng Wang
- Division of Intelligence and Computing, Tianjin University, Tianjin 300072, China.
- Xinyu Yan
- Division of Intelligence and Computing, Tianjin University, Tianjin 300072, China.
- Wei Jiang
- Division of Intelligence and Computing, Tianjin University, Tianjin 300072, China.
- School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China.
- Meijun Sun
- Division of Intelligence and Computing, Tianjin University, Tianjin 300072, China.
42
Majumder N, Hazarika D, Gelbukh A, Cambria E, Poria S. Multimodal sentiment analysis using hierarchical fusion with context modeling. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.07.041] [Citation(s) in RCA: 73] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
43

44
Huang N, Slaney M, Elhilali M. Connecting Deep Neural Networks to Physical, Perceptual, and Electrophysiological Auditory Signals. Front Neurosci 2018; 12:532. [PMID: 30154688 PMCID: PMC6102345 DOI: 10.3389/fnins.2018.00532] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2018] [Accepted: 07/16/2018] [Indexed: 11/13/2022] Open
Abstract
Deep neural networks have been recently shown to capture intricate information transformation of signals from the sensory profiles to semantic representations that facilitate recognition or discrimination of complex stimuli. In this vein, convolutional neural networks (CNNs) have been used very successfully in image and audio classification. Designed to imitate the hierarchical structure of the nervous system, CNNs reflect activation with increasing degrees of complexity that transform the incoming signal onto object-level representations. In this work, we employ a CNN trained for large-scale audio object classification to gain insights about the contribution of various audio representations that guide sound perception. The analysis contrasts activation of different layers of a CNN with acoustic features extracted directly from the scenes, perceptual salience obtained from behavioral responses of human listeners, as well as neural oscillations recorded by electroencephalography (EEG) in response to the same natural scenes. All three measures are tightly linked quantities believed to guide percepts of salience and object formation when listening to complex scenes. The results paint a picture of the intricate interplay between low-level and object-level representations in guiding auditory salience that is very much dependent on context and sound category.
Affiliation(s)
- Nicholas Huang
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, United States
- Malcolm Slaney
- Machine Hearing, Google AI, Google (United States), Mountain View, CA, United States
- Mounya Elhilali
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, United States
45
Abu-Salih B, Wongthongtham P, Chan KY, Zhu D. CredSaT: Credibility ranking of users in big social data incorporating semantic analysis and temporal factor. J Inf Sci 2018. [DOI: 10.1177/0165551518790424] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
The widespread use of big social data has influenced the research community in several significant ways. In particular, the notion of social trust has attracted a great deal of attention from information processors and computer scientists, as well as from information consumers and formal organisations. This attention is embodied in the various forms social trust has taken, such as its use in recommendation systems, viral marketing, and expertise retrieval. Hence, it is essential to implement frameworks that can temporally measure a user's credibility across all categories of big social data. To this end, this article proposes CredSaT (Credibility incorporating Semantic analysis and Temporal factor), a fine-grained credibility-analysis framework for big social data. A novel metric that includes both new and existing features, as well as the temporal factor, is harnessed to establish the credibility ranking of users. Experiments on real-world datasets demonstrate the efficacy and applicability of our model in determining highly trustworthy domain-specific users. Furthermore, CredSaT may also be used to identify spammers and other anomalous users.
46

47

48