1. Chong CS, Davis C, Kim J. A Cantonese Audio-Visual Emotional Speech (CAVES) dataset. Behav Res Methods 2024; 56:5264-5278. [PMID: 38017201 PMCID: PMC11289252 DOI: 10.3758/s13428-023-02270-7]
Abstract
We present a Cantonese emotional speech dataset that is suitable for use in research investigating the auditory and visual expression of emotion in tonal languages. This unique dataset consists of auditory and visual recordings of ten native speakers of Cantonese uttering 50 sentences each in the six basic emotions plus neutral (angry, happy, sad, surprise, fear, and disgust). The visual recordings have a full HD resolution of 1920 × 1080 pixels and were recorded at 50 fps. The important features of the dataset are outlined along with the factors considered when compiling the dataset. A validation study of the recorded emotion expressions was conducted in which 15 native Cantonese perceivers completed a forced-choice emotion identification task. The variability of the speakers and the sentences was examined by testing the degree of concordance between the intended and the perceived emotion. We compared these results with those of other emotion perception and evaluation studies that have tested spoken emotions in languages other than Cantonese. The dataset is freely available for research purposes.
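
To make the validation procedure concrete: a forced-choice identification study like this reduces to tallying, for each intended emotion, how often perceivers chose that emotion. The following sketch is a hypothetical Python illustration (the file name and column names are assumptions, not part of the released dataset) of deriving a confusion matrix, per-emotion hit rates, and speaker-level accuracy with pandas.

```python
import pandas as pd

# Hypothetical long-format response file: one row per perceiver judgment.
# Assumed columns: speaker, sentence, intended, response.
df = pd.read_csv("caves_validation_responses.csv")

# Confusion matrix: intended emotion (rows) vs. perceived emotion (columns),
# normalized so each row sums to 1.
confusion = pd.crosstab(df["intended"], df["response"], normalize="index")
print(confusion.round(2))

# Per-emotion hit rate: the diagonal of the normalized confusion matrix.
hit_rates = {e: confusion.loc[e, e] for e in confusion.index if e in confusion.columns}
print(hit_rates)

# Speaker variability: mean accuracy per speaker for each intended emotion.
df["correct"] = (df["intended"] == df["response"]).astype(int)
print(df.groupby(["speaker", "intended"])["correct"].mean().unstack().round(2))
```
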

Affiliation(s)
- Chee Seng Chong
  - The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Locked Bag 1797, Penrith, NSW, 2751, Australia
- Chris Davis
  - The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Locked Bag 1797, Penrith, NSW, 2751, Australia
- Jeesun Kim
  - The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Locked Bag 1797, Penrith, NSW, 2751, Australia

2. von Eiff CI, Kauk J, Schweinberger SR. The Jena Audiovisual Stimuli of Morphed Emotional Pseudospeech (JAVMEPS): A database for emotional auditory-only, visual-only, and congruent and incongruent audiovisual voice and dynamic face stimuli with varying voice intensities. Behav Res Methods 2024; 56:5103-5115. [PMID: 37821750 PMCID: PMC11289065 DOI: 10.3758/s13428-023-02249-4]
Abstract
We describe JAVMEPS, an audiovisual (AV) database for emotional voice and dynamic face stimuli, with voices varying in emotional intensity. JAVMEPS includes 2256 stimulus files comprising (A) recordings of 12 speakers, speaking four bisyllabic pseudowords with six naturalistic induced basic emotions plus neutral, in auditory-only, visual-only, and congruent AV conditions. It furthermore comprises (B) caricatures (140%), original voices (100%), and anti-caricatures (60%) for happy, fearful, angry, sad, disgusted, and surprised voices for eight speakers and two pseudowords. Crucially, JAVMEPS contains (C) precisely time-synchronized congruent and incongruent AV (and corresponding auditory-only) stimuli with two emotions (anger, surprise), (C1) with original intensity (ten speakers, four pseudowords), (C2) and with graded AV congruence (implemented via five voice morph levels, from caricatures to anti-caricatures; eight speakers, two pseudowords). We collected classification data for Stimulus Set A from 22 normal-hearing listeners and four cochlear implant users, for two pseudowords, in auditory-only, visual-only, and AV conditions. Normal-hearing individuals showed good classification performance (McorrAV = .59 to .92), with classification rates in the auditory-only condition ≥ .38 correct (surprise: .67, anger: .51). Despite compromised vocal emotion perception, CI users performed above chance levels of .14 for auditory-only stimuli, with best rates for surprise (.31) and anger (.30). We anticipate JAVMEPS to become a useful open resource for researchers into auditory emotion perception, especially when adaptive testing or calibration of task difficulty is desirable. With its time-synchronized congruent and incongruent stimuli, JAVMEPS can also contribute to filling a gap in research regarding dynamic audiovisual integration of emotion perception via behavioral or neurophysiological recordings.

Affiliation(s)
- Celina I von Eiff
  - Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University Jena, Am Steiger 3, 07743, Jena, Germany
  - Voice Research Unit, Institute of Psychology, Friedrich Schiller University Jena, Leutragraben 1, 07743, Jena, Germany
  - DFG SPP 2392 Visual Communication (ViCom), Frankfurt am Main, Germany
  - Jena University Hospital, 07747, Jena, Germany
- Julian Kauk
  - Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University Jena, Am Steiger 3, 07743, Jena, Germany
- Stefan R Schweinberger
  - Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University Jena, Am Steiger 3, 07743, Jena, Germany
  - Voice Research Unit, Institute of Psychology, Friedrich Schiller University Jena, Leutragraben 1, 07743, Jena, Germany
  - DFG SPP 2392 Visual Communication (ViCom), Frankfurt am Main, Germany
  - Jena University Hospital, 07747, Jena, Germany

3. Yue L, Hu P, Zhu J. Advanced differential evolution for gender-aware English speech emotion recognition. Sci Rep 2024; 14:17696. [PMID: 39085418 PMCID: PMC11291894 DOI: 10.1038/s41598-024-68864-z]
Abstract
Speech emotion recognition (SER) technology involves feature extraction and prediction models. However, recognition performance tends to decrease because of gender differences and the large number of extracted features. Consequently, this paper introduces a gender-aware SER system. First, gender and emotion features are extracted from speech signals to develop gender recognition and emotion classification models. Second, according to gender differences, distinct emotion recognition models are established for male and female speakers; the speaker's gender is determined before the corresponding emotion model is executed. Third, the accuracy of these emotion models is enhanced by using an advanced differential evolution algorithm (ADE) to select optimal features. ADE incorporates new difference vectors, mutation operators, and position learning, which effectively balance global and local search, and a new position-repairing method is proposed to address gender differences. Finally, experiments on four English datasets demonstrate that ADE outperforms comparison algorithms in recognition accuracy, recall, precision, F1-score, number of selected features, and execution time. The findings highlight the significance of gender in refining emotion models, with mel-frequency cepstral coefficients emerging as important factors in gender differences.
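
The feature-selection step described above is a wrapper approach driven by differential evolution. The sketch below is a generic binary DE feature-selection loop in Python, offered as an illustration only: it does not reproduce the paper's ADE (its new difference vectors, mutation operators, position learning, or gender-specific position repair), and the k-NN fitness function and parameter values are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Cross-validated accuracy of a simple classifier on the selected features."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

def binary_de_select(X, y, pop_size=20, gens=30, F=0.8, CR=0.9):
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat))                 # continuous positions in [0, 1]
    scores = np.array([fitness(p > 0.5, X, y) for p in pop])
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0.0, 1.0)  # classic DE/rand/1 mutation
            trial = np.where(rng.random(n_feat) < CR, mutant, pop[i])
            s = fitness(trial > 0.5, X, y)
            if s >= scores[i]:                           # greedy selection
                pop[i], scores[i] = trial, s
    best = pop[scores.argmax()] > 0.5
    return best, scores.max()
```

A gender-aware variant in the spirit of the paper would run one such selection per gender after a gender-classification step, so that male and female emotion models each get their own feature subset.
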

Affiliation(s)
- Liya Yue
  - Fanli Business School, Nanyang Institute of Technology, Nanyang, 473004, China
- Pei Hu
  - School of Computer and Software, Nanyang Institute of Technology, Nanyang, 473004, China
- Jiulong Zhu
  - Fanli Business School, Nanyang Institute of Technology, Nanyang, 473004, China

4. Alroobaea R. Cross-corpus speech emotion recognition with transformers: Leveraging handcrafted features and data augmentation. Comput Biol Med 2024; 179:108841. [PMID: 39002317 DOI: 10.1016/j.compbiomed.2024.108841]
Abstract
Speech emotion recognition (SER) stands as a prominent and dynamic research field in data science due to its extensive application in domains such as psychological assessment, mobile services, and computer games. In previous research, numerous studies utilized manually engineered features for emotion classification, achieving commendable accuracy. However, these features tend to underperform in complex scenarios, leading to reduced classification accuracy. Such scenarios include: (1) datasets containing diverse speech patterns, dialects, accents, or variations in emotional expression; (2) data with background noise; (3) cases where the distribution of emotions varies significantly across datasets; and (4) combinations of datasets from different sources, which introduce complexities due to variations in recording conditions, data quality, and emotional expression. Consequently, there is a need to improve the classification performance of SER techniques. To address this, a novel SER framework was introduced in this study. Prior to feature extraction, signal preprocessing and data augmentation methods were applied to augment the available data, and 18 informative features were then derived from each signal. A discriminative feature set was obtained using feature selection techniques and used as input for emotion recognition on the SAVEE, RAVDESS, and EMO-DB datasets. Furthermore, this research also implemented a cross-corpus model that incorporated all speech files related to common emotions from the three datasets. The experimental outcomes demonstrated the superior performance of the proposed SER framework compared with existing frameworks in the field. Notably, the framework achieved remarkable accuracy rates across datasets: 95%, 94%, 97%, and 97% on SAVEE, RAVDESS, EMO-DB, and the cross-corpus dataset, respectively. These results underscore the significant contribution of the proposed framework to the field of SER.
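
As a rough illustration of the handcrafted-features-plus-augmentation idea (not the paper's exact 18-feature set or pipeline), the sketch below adds white-noise augmentation and extracts a few features commonly used in SER with librosa; the file path, sampling rate, and SNR are assumptions.

```python
import numpy as np
import librosa

def augment_with_noise(y, snr_db=20.0):
    """Additive white-noise augmentation at a target signal-to-noise ratio."""
    noise_power = np.mean(y ** 2) / (10 ** (snr_db / 10))
    return y + np.random.normal(0.0, np.sqrt(noise_power), size=y.shape)

def extract_features(y, sr):
    """A small, illustrative feature vector: MFCC means plus a few spectral statistics."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    zcr = librosa.feature.zero_crossing_rate(y).mean()
    rms = librosa.feature.rms(y=y).mean()
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
    return np.hstack([mfcc, zcr, rms, centroid])

y, sr = librosa.load("savee_example.wav", sr=16000)            # hypothetical clip
X = np.vstack([extract_features(y, sr),                        # original
               extract_features(augment_with_noise(y), sr)])   # augmented copy
```

A feature-selection step and a classifier (or, as in the paper, a transformer-based model) would then operate on such feature matrices pooled across corpora.
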

Affiliation(s)
- Roobaea Alroobaea
  - Department of Computer Science, College of Computers and Information Technology, Taif University, Taif 21944, Saudi Arabia

5. Munsif M, Sajjad M, Ullah M, Tarekegn AN, Cheikh FA, Tsakanikas P, Muhammad K. Optimized efficient attention-based network for facial expressions analysis in neurological health care. Comput Biol Med 2024; 179:108822. [PMID: 38986286 DOI: 10.1016/j.compbiomed.2024.108822]
Abstract
Facial Expression Analysis (FEA) plays a vital role in diagnosing and treating early-stage neurological disorders (NDs) like Alzheimer's and Parkinson's. Manual FEA is hindered by expertise, time, and training requirements, while automatic methods confront difficulties with the unavailability of real patient data, high computational cost, and irrelevant feature extraction. To address these challenges, this paper proposes a novel approach: an efficient, lightweight convolutional block attention module (CBAM) based deep learning network (DLN) to aid doctors in diagnosing ND patients. The method comprises two stages: collection of real ND patient data, and pre-processing involving face detection and an attention-enhanced DLN for feature extraction and refinement. Extensive experiments with validation on real patient data showcase compelling performance, achieving an accuracy of up to 73.2%. Despite its efficacy, the proposed model is lightweight, occupying only 3 MB, making it suitable for deployment on resource-constrained mobile healthcare devices. Moreover, the method exhibits significant advancements over existing FEA approaches, holding tremendous promise for effectively diagnosing and treating ND patients. By accurately recognizing emotions and extracting relevant features, this approach empowers medical professionals in early ND detection and management, overcoming the challenges of manual analysis and heavy models. In conclusion, this research presents a significant step forward in FEA, promising to enhance ND diagnosis and care. The code and data used in this work are available at: https://github.com/munsif200/Neurological-Health-Care.
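
The convolutional block attention module (CBAM) named in the abstract is a standard building block that refines feature maps with channel attention followed by spatial attention. The PyTorch sketch below shows a generic CBAM block, not the authors' full lightweight network; the layer sizes and example tensor shape are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Generic CBAM: channel attention, then spatial attention."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP applied to average- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: convolution over channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        channel_att = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) +
                                    self.mlp(x.amax(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * channel_att
        spatial_in = torch.cat([x.mean(dim=1, keepdim=True),
                                x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(spatial_in))

# Example: refine 64-channel feature maps from a lightweight backbone.
refined = CBAM(64)(torch.randn(8, 64, 28, 28))
```
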

Affiliation(s)
- Muhammad Sajjad
  - Digital Image Processing Lab, Department of Computer Science, Islamia College, Peshawar, 25000, Pakistan; Department of Computer Science, Norwegian University for Science and Technology, 2815, Gjøvik, Norway
- Mohib Ullah
  - Intelligent Systems and Analytics Research Group (ISA), Department of Computer Science, Norwegian University for Science and Technology, 2815, Gjøvik, Norway
- Adane Nega Tarekegn
  - Department of Computer Science, Norwegian University for Science and Technology, 2815, Gjøvik, Norway
- Faouzi Alaya Cheikh
  - Department of Computer Science, Norwegian University for Science and Technology, 2815, Gjøvik, Norway
- Panagiotis Tsakanikas
  - Institute of Communication and Computer Systems, National Technical University of Athens, 15773 Athens, Greece
- Khan Muhammad
  - Visual Analytics for Knowledge Laboratory (VIS2KNOW Lab), Department of Applied Artificial Intelligence, School of Convergence, College of Computing and Informatics, Sungkyunkwan University, Seoul 03063, Republic of Korea

6. Becker C, Conduit R, Chouinard PA, Laycock R. Can deepfakes be used to study emotion perception? A comparison of dynamic face stimuli. Behav Res Methods 2024. [PMID: 38834812 DOI: 10.3758/s13428-024-02443-y]
Abstract
Video recordings accurately capture facial expression movements; however, they are difficult for face perception researchers to standardise and manipulate. For this reason, dynamic morphs of photographs are often used, despite their lack of naturalistic facial motion. This study aimed to investigate how humans perceive emotions from faces using real videos and two different approaches to artificially generating dynamic expressions: dynamic morphs and AI-synthesised deepfakes. Our participants perceived dynamic morphed expressions as less intense when compared with videos (all emotions) and deepfakes (fearful, happy, sad). Videos and deepfakes were perceived similarly. Additionally, they perceived morphed happiness and sadness, but not morphed anger or fear, as less genuine than other formats. Our findings support previous research indicating that social responses to morphed emotions are not representative of those to video recordings. The findings also suggest that deepfakes may offer a more suitable standardized stimulus type than morphs. Additionally, qualitative data were collected from participants and analysed using ChatGPT, a large language model. ChatGPT successfully identified themes in the data consistent with those identified by an independent human researcher. According to this analysis, our participants perceived dynamic morphs as less natural compared with videos and deepfakes. That participants perceived deepfakes and videos similarly suggests that deepfakes effectively replicate natural facial movements, making them a promising alternative for face perception research. The study contributes to the growing body of research exploring the usefulness of generative artificial intelligence for advancing the study of human perception.

7. Thomas AL, Assmann PF. Speech production and perception data collection in R: A tutorial for web-based methods using speechcollectr. Behav Res Methods 2024. [PMID: 38829553 DOI: 10.3758/s13428-024-02399-z]
Abstract
This tutorial is designed for speech scientists familiar with the R programming language who wish to construct experiment interfaces in R. We begin by discussing some of the benefits of building experiment interfaces in R, including R's existing tools for speech data analysis, platform independence, suitability for web-based testing, and the fact that R is open source. We explain basic concepts of reactive programming in R, and we apply these principles by detailing the development of two sample experiments. The first of these experiments comprises a speech production task in which participants are asked to read words with different emotions. The second sample experiment involves a speech perception task, in which participants listen to recorded speech and identify the emotion the talker expressed with forced-choice questions and confidence ratings. Throughout this tutorial, we introduce the new R package speechcollectr, which provides functions uniquely suited to web-based speech data collection. The package streamlines the code required for speech experiments by providing functions for common tasks like documenting participant consent, collecting participant demographic information, recording audio, checking the adequacy of a participant's microphone or headphones, and presenting audio stimuli. Finally, we describe some of the difficulties of remote speech data collection, along with the solutions we have incorporated into speechcollectr to meet these challenges.

Affiliation(s)
- Abbey L Thomas
  - School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA
- Peter F Assmann
  - School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, TX, USA

8. Cooper A, Eitel M, Fecher N, Johnson E, Cirelli LK. Who is singing? Voice recognition from spoken versus sung speech. JASA Express Lett 2024; 4:065203. [PMID: 38888432 DOI: 10.1121/10.0026385]
Abstract
Singing is socially important but constrains voice acoustics, potentially masking certain aspects of vocal identity. Little is known about how well listeners extract talker details from sung speech or identify talkers across the sung and spoken modalities. Here, listeners (n = 149) were trained to recognize sung or spoken voices and then tested on their identification of these voices in both modalities. Learning vocal identities was initially easier through speech than song. At test, cross-modality voice recognition was above chance, but weaker than within-modality recognition. We conclude that talker information is accessible in sung speech, despite acoustic constraints in song.

Affiliation(s)
- Angela Cooper
  - Department of Psychology, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Matthew Eitel
  - Department of Psychology, University of Toronto Scarborough, Toronto, Ontario, Canada
- Natalie Fecher
  - Department of Psychology, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Elizabeth Johnson
  - Department of Psychology, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Laura K Cirelli
  - Department of Psychology, University of Toronto Scarborough, Toronto, Ontario, Canada

9. Wurzberger F, Schwenker F. Learning in Deep Radial Basis Function Networks. Entropy (Basel) 2024; 26:368. [PMID: 38785617 PMCID: PMC11120405 DOI: 10.3390/e26050368]
Abstract
Learning in neural networks with locally tuned neuron models such as Radial Basis Function (RBF) networks is often seen as unstable, in particular when multi-layered architectures are used. Furthermore, universal approximation theorems for single-layered RBF networks are well established; therefore, deeper architectures are theoretically not required. Consequently, RBFs are mostly used in a single-layered manner. However, deep neural networks have proven their effectiveness on many different tasks. In this paper, we show that deeper RBF architectures with multiple radial basis function layers can be designed together with efficient learning schemes. We introduce an initialization scheme for deep RBF networks based on k-means clustering and covariance estimation. We further show how to use convolutions to speed up the calculation of the Mahalanobis distance in a partially connected way, similar to convolutional neural networks (CNNs). Finally, we evaluate our approach on image classification as well as speech emotion recognition tasks. Our results show that deep RBF networks perform very well, with results comparable to other deep neural network types, such as CNNs.
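
To make the described initialization and distance computation concrete, the sketch below implements a single RBF layer with k-means centres, per-centre covariance estimates, and Mahalanobis-based activations; stacking fitted layers greedily gives a toy "deep" variant. It is a simplified illustration under these assumptions, not the authors' convolutional, partially connected formulation.

```python
import numpy as np
from sklearn.cluster import KMeans

class RBFLayer:
    """One RBF layer: k-means centres, per-centre covariances, Mahalanobis activations."""
    def __init__(self, n_centres=10, reg=1e-3):
        self.n_centres, self.reg = n_centres, reg

    def fit(self, X):
        km = KMeans(n_clusters=self.n_centres, n_init=10, random_state=0).fit(X)
        self.centres = km.cluster_centers_
        self.precisions = []
        for k in range(self.n_centres):
            members = X[km.labels_ == k]
            cov = (np.cov(members, rowvar=False) if len(members) > 1
                   else np.zeros((X.shape[1], X.shape[1])))
            self.precisions.append(np.linalg.inv(cov + self.reg * np.eye(X.shape[1])))
        return self

    def transform(self, X):
        acts = np.empty((X.shape[0], self.n_centres))
        for k in range(self.n_centres):
            diff = X - self.centres[k]
            d2 = np.einsum("ij,jk,ik->i", diff, self.precisions[k], diff)  # squared Mahalanobis
            acts[:, k] = np.exp(-0.5 * d2)
        return acts

# Greedy layer-by-layer stacking on random data, purely for illustration.
X = np.random.randn(200, 8)
h1 = RBFLayer(16).fit(X).transform(X)
h2 = RBFLayer(8).fit(h1).transform(h1)
```
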

Affiliation(s)
- Fabian Wurzberger
  - Institute of Neural Information Processing, Ulm University, James-Franck-Ring, 89081 Ulm, Germany
- Friedhelm Schwenker
  - Institute of Neural Information Processing, Ulm University, James-Franck-Ring, 89081 Ulm, Germany

10. Wu D, Jia X, Rao W, Dou W, Li Y, Li B. Construction of a Chinese traditional instrumental music dataset: A validated set of naturalistic affective music excerpts. Behav Res Methods 2024; 56:3757-3778. [PMID: 38702502 PMCID: PMC11133124 DOI: 10.3758/s13428-024-02411-6]
Abstract
Music is omnipresent among human cultures and moves us both physically and emotionally. The perception of emotions in music is influenced by both psychophysical and cultural factors. Chinese traditional instrumental music differs significantly from Western music in cultural origin and music elements. However, previous studies on music emotion perception are based almost exclusively on Western music. Therefore, the construction of a dataset of Chinese traditional instrumental music is important for exploring the perception of music emotions in the context of Chinese culture. The present dataset included 273 10-second naturalistic music excerpts. We provided rating data for each excerpt on ten variables: familiarity, dimensional emotions (valence and arousal), and discrete emotions (anger, gentleness, happiness, peacefulness, sadness, solemnness, and transcendence). The excerpts were rated by a total of 168 participants on a seven-point Likert scale for the ten variables. Three labels for the excerpts were obtained: familiarity, discrete emotion, and cluster. Our dataset demonstrates good reliability, and we believe it could contribute to cross-cultural studies on emotional responses to music.

Affiliation(s)
- Di Wu
  - Institute of Brain Science and Department of Physiology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, 311121, China
  - Zhejiang Philosophy and Social Science Laboratory for Research in Early Development and Childcare, Hangzhou Normal University, Hangzhou, 311121, China
- Xi Jia
  - Institute of Brain Science and Department of Physiology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, 311121, China
  - Zhejiang Philosophy and Social Science Laboratory for Research in Early Development and Childcare, Hangzhou Normal University, Hangzhou, 311121, China
- Wenxin Rao
  - Institute of Brain Science and Department of Physiology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, 311121, China
- Wenjie Dou
  - Institute of Brain Science and Department of Physiology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, 311121, China
  - Zhejiang Philosophy and Social Science Laboratory for Research in Early Development and Childcare, Hangzhou Normal University, Hangzhou, 311121, China
- Yangping Li
  - Institute of Brain Science and Department of Physiology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, 311121, China
  - School of Foreign Studies, Xi'an Jiaotong University, Xi'an, 710049, China
- Baoming Li
  - Institute of Brain Science and Department of Physiology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, 311121, China
  - Zhejiang Philosophy and Social Science Laboratory for Research in Early Development and Childcare, Hangzhou Normal University, Hangzhou, 311121, China

11. Kim HN, Taylor S. Differences of people with visual disabilities in the perceived intensity of emotion inferred from speech of sighted people in online communication settings. Disabil Rehabil Assist Technol 2024; 19:633-640. [PMID: 35997772 DOI: 10.1080/17483107.2022.2114555]
Abstract
PURPOSE: As humans convey information about emotions by speech signals, emotion recognition via auditory information is often employed to assess one's affective states. There are numerous ways of applying knowledge of emotional vocal expressions to system designs that adequately accommodate users' needs. Yet, little is known about how people with visual disabilities infer emotions from speech stimuli, especially via online platforms (e.g., Zoom). This study focussed on examining the degree to which they perceive emotions strongly or weakly (i.e., perceived intensity), and on investigating the degree to which their sociodemographic backgrounds affect the intensity levels of emotions they perceive when exposed to a set of emotional speech stimuli via Zoom.
MATERIALS AND METHODS: A convenience sample of 30 individuals with visual disabilities participated in Zoom interviews. Participants were given a set of emotional speech stimuli and reported the intensity level of the perceived emotions on a rating scale from 1 (weak) to 8 (strong).
RESULTS: When the participants were exposed to the emotional speech stimuli (calm, happy, fearful, sad, and neutral), they reported that neutral was the dominant emotion perceived with the greatest intensity. Individual differences were also observed in the perceived intensity of emotions, associated with sociodemographic backgrounds such as health, vision, job, and age.
CONCLUSIONS: The results of this study are anticipated to contribute to the fundamental knowledge that will be helpful for many stakeholders such as voice technology engineers, user experience designers, health professionals, and social workers providing support to people with visual disabilities.
IMPLICATIONS FOR REHABILITATION:
- Technologies equipped with alternative user interfaces (e.g., Siri, Alexa, and Google Voice Assistant) that meet the needs of people with visual disabilities can promote independent living and quality of life.
- Such technologies can also be equipped with systems that recognize emotions from users' voices, such that users can obtain services customized to their emotional needs or that adequately address their emotional challenges (e.g., early detection of onset, provision of advice, and so on).
- The results of this study can benefit health professionals (e.g., social workers) who work closely with clients who have visual disabilities (e.g., in virtual telehealth sessions), as they could gain insight into the clients' emotional struggles by hearing their voices, contributing to enhanced emotional intelligence. They can thus provide better services to their clients, building strong bonds and trust between health professionals and clients with visual disabilities even when they meet virtually (e.g., over Zoom).

Affiliation(s)
- Hyung Nam Kim
  - North Carolina A&T State University, Greensboro, NC, USA
- Shaniah Taylor
  - North Carolina A&T State University, Greensboro, NC, USA

12. Leung FYN, Stojanovik V, Jiang C, Liu F. Investigating implicit emotion processing in autism spectrum disorder across age groups: A cross-modal emotional priming study. Autism Res 2024; 17:824-837. [PMID: 38488319 DOI: 10.1002/aur.3124]
Abstract
Cumulating evidence suggests that atypical emotion processing in autism may generalize across different stimulus domains. However, this evidence comes from studies examining explicit emotion recognition. It remains unclear whether domain-general atypicality also applies to implicit emotion processing in autism and its implication for real-world social communication. To investigate this, we employed a novel cross-modal emotional priming task to assess implicit emotion processing of spoken/sung words (primes) through their influence on subsequent emotional judgment of faces/face-like objects (targets). We assessed whether implicit emotional priming differed between 38 autistic and 38 neurotypical individuals across age groups as a function of prime and target type. Results indicated no overall group differences across age groups, prime types, and target types. However, differential, domain-specific developmental patterns emerged for the autism and neurotypical groups. For neurotypical individuals, speech but not song primed the emotional judgment of faces across ages. This speech-orienting tendency was not observed across ages in the autism group, as priming of speech on faces was not seen in autistic adults. These results outline the importance of the delicate weighting between speech- versus song-orientation in implicit emotion processing throughout development, providing more nuanced insights into the emotion processing profile of autistic individuals.

Affiliation(s)
- Florence Y N Leung
  - School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
  - Department of Psychology, University of Bath, Bath, UK
- Vesna Stojanovik
  - School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
- Cunmei Jiang
  - Music College, Shanghai Normal University, Shanghai, China
- Fang Liu
  - School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK

13. Krumpholz C, Quigley C, Fusani L, Leder H. Vienna Talking Faces (ViTaFa): A multimodal person database with synchronized videos, images, and voices. Behav Res Methods 2024; 56:2923-2940. [PMID: 37950115 PMCID: PMC11133183 DOI: 10.3758/s13428-023-02264-5]
Abstract
Social perception relies on different sensory channels, including vision and audition, which are specifically important for judgements of appearance. Therefore, to understand multimodal integration in person perception, it is important to study both face and voice in a synchronized form. We introduce the Vienna Talking Faces (ViTaFa) database, a high-quality audiovisual database focused on multimodal research of social perception. ViTaFa includes different stimulus modalities: audiovisual dynamic, visual dynamic, visual static, and auditory dynamic. Stimuli were recorded and edited under highly standardized conditions and were collected from 40 real individuals, and the sample matches typical student samples in psychological research (young individuals aged 18 to 45). Stimuli include sequences of various types of spoken content from each person, including German sentences, words, reading passages, vowels, and language-unrelated pseudo-words. Recordings were made with different emotional expressions (neutral, happy, angry, sad, and flirtatious). ViTaFa is freely accessible for academic non-profit research after signing a confidentiality agreement form via https://osf.io/9jtzx/ and stands out from other databases due to its multimodal format, high quality, and comprehensive quantification of stimulus features and human judgements related to attractiveness. Additionally, over 200 human raters validated emotion expression of the stimuli. In summary, ViTaFa provides a valuable resource for investigating audiovisual signals of social perception.

Affiliation(s)
- Christina Krumpholz
  - Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Liebiggasse 5, 1010, Vienna, Austria
  - Konrad Lorenz Institute of Ethology, University of Veterinary Medicine, Vienna, Austria
  - Department of Behavioural and Cognitive Biology, University of Vienna, Vienna, Austria
- Cliodhna Quigley
  - Department of Behavioural and Cognitive Biology, University of Vienna, Vienna, Austria
  - Vienna Cognitive Science Hub, University of Vienna, Vienna, Austria
- Leonida Fusani
  - Konrad Lorenz Institute of Ethology, University of Veterinary Medicine, Vienna, Austria
  - Department of Behavioural and Cognitive Biology, University of Vienna, Vienna, Austria
  - Vienna Cognitive Science Hub, University of Vienna, Vienna, Austria
- Helmut Leder
  - Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Liebiggasse 5, 1010, Vienna, Austria
  - Vienna Cognitive Science Hub, University of Vienna, Vienna, Austria

14. Sadok S, Leglaive S, Girin L, Alameda-Pineda X, Séguier R. A multimodal dynamical variational autoencoder for audiovisual speech representation learning. Neural Netw 2024; 172:106120. [PMID: 38266474 DOI: 10.1016/j.neunet.2024.106120]
Abstract
High-dimensional data such as natural images or speech signals exhibit some form of regularity, preventing their dimensions from varying independently. This suggests that there exists a lower dimensional latent representation from which the high-dimensional observed data were generated. Uncovering the hidden explanatory features of complex data is the goal of representation learning, and deep latent variable generative models have emerged as promising unsupervised approaches. In particular, the variational autoencoder (VAE) which is equipped with both a generative and an inference model allows for the analysis, transformation, and generation of various types of data. Over the past few years, the VAE has been extended to deal with data that are either multimodal or dynamical (i.e., sequential). In this paper, we present a multimodal and dynamical VAE (MDVAE) applied to unsupervised audiovisual speech representation learning. The latent space is structured to dissociate the latent dynamical factors that are shared between the modalities from those that are specific to each modality. A static latent variable is also introduced to encode the information that is constant over time within an audiovisual speech sequence. The model is trained in an unsupervised manner on an audiovisual emotional speech dataset, in two stages. In the first stage, a vector quantized VAE (VQ-VAE) is learned independently for each modality, without temporal modeling. The second stage consists in learning the MDVAE model on the intermediate representation of the VQ-VAEs before quantization. The disentanglement between static versus dynamical and modality-specific versus modality-common information occurs during this second training stage. Extensive experiments are conducted to investigate how audiovisual speech latent factors are encoded in the latent space of MDVAE. These experiments include manipulating audiovisual speech, audiovisual facial image denoising, and audiovisual speech emotion recognition. The results show that MDVAE effectively combines the audio and visual information in its latent space. They also show that the learned static representation of audiovisual speech can be used for emotion recognition with few labeled data, and with better accuracy compared with unimodal baselines and a state-of-the-art supervised model based on an audiovisual transformer architecture.
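
For orientation, every VAE variant, including the dynamical and multimodal extensions described here, is trained by maximizing an evidence lower bound (ELBO). The generic single-latent form is shown below as a reminder; it is not the MDVAE objective itself, whose latent variable is factored into static, dynamical, modality-specific, and modality-common parts:

$$
\mathcal{L}(\theta, \phi; x) \;=\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right) \;\le\; \log p_\theta(x).
$$

In the multimodal, dynamical case the latent z is replaced by that structured set of variables and the expectation runs over a sequence, but the training principle remains the maximization of such a bound.
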

Affiliation(s)
- Laurent Girin
  - Univ. Grenoble Alpes CNRS, Grenoble-INP, GIPSA-lab, France

15. Hsu JH, Wu CH, Lin ECL, Chen PS. MoodSensing: A smartphone app for digital phenotyping and assessment of bipolar disorder. Psychiatry Res 2024; 334:115790. [PMID: 38401488 DOI: 10.1016/j.psychres.2024.115790]
Abstract
BACKGROUND: Daily life tracking has proven to be of great help in the assessment of patients with bipolar disorder. Although there are many smartphone apps for tracking bipolar disorder, most of them lack academic validation, a privacy policy, and long-term maintenance.
METHODS: Our developed app, MoodSensing, aims to collect users' digital phenotyping for the assessment of bipolar disorder. The data collection was approved by the Institutional Review Board. This study collaborated with professional clinicians to ensure that the app meets both clinical needs and user experience requirements. Based on the collected digital phenotyping, deep learning techniques were applied to forecast participants' weekly HAM-D and YMRS scale scores.
RESULTS: In experiments, the data collected by our app effectively predicted the scale scores, reaching mean absolute errors of 0.84 and 0.22 on the two scales. The statistical data also demonstrate an increase in user engagement.
CONCLUSIONS: Our analysis reveals that the developed MoodSensing app not only provides a good user experience, but the recorded data also have discriminative value for clinical assessment. The app provides relevant policies to protect user privacy and has been launched in the Apple App Store and Google Play Store.

Affiliation(s)
- Jia-Hao Hsu
  - Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan
- Chung-Hsien Wu
  - Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan
- Po-See Chen
  - Institute of Behavioral Medicine, College of Medicine, National Cheng Kung University, Taiwan

16. Diemerling H, Stresemann L, Braun T, von Oertzen T. Implementing machine learning techniques for continuous emotion prediction from uniformly segmented voice recordings. Front Psychol 2024; 15:1300996. [PMID: 38572198 PMCID: PMC10987695 DOI: 10.3389/fpsyg.2024.1300996]
Abstract
Introduction: Emotion recognition from audio recordings is a rapidly advancing field, with significant implications for artificial intelligence and human-computer interaction. This study introduces a novel method for detecting emotions from short, 1.5-second audio samples, aiming to improve accuracy and efficiency in emotion recognition technologies.
Methods: We utilized 1,510 unique audio samples from two databases in German and English to train our models. We extracted various features for emotion prediction, employing Deep Neural Networks (DNN) for general feature analysis, Convolutional Neural Networks (CNN) for spectrogram analysis, and a hybrid model combining both approaches (C-DNN). The study addressed challenges associated with dataset heterogeneity, language differences, and the complexities of audio sample trimming.
Results: Our models demonstrated accuracy significantly surpassing random guessing and aligning closely with human evaluative benchmarks, indicating the effectiveness of our approach in recognizing emotional states from brief audio clips.
Discussion: Despite the challenges of integrating diverse datasets and managing short audio samples, our findings suggest considerable potential for this methodology in real-time emotion detection from continuous speech, which could contribute to improving the emotional intelligence of AI and its applications in various areas.
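
As a generic illustration of the spectrogram branch (not the authors' C-DNN architecture), the sketch below converts a 1.5-second clip into a log-mel spectrogram and passes it through a small convolutional classifier; the sampling rate, mel parameters, and seven-class output are assumptions.

```python
import librosa
import torch
import torch.nn as nn

def log_mel(path, sr=16000, duration=1.5, n_mels=64):
    y, _ = librosa.load(path, sr=sr, duration=duration)
    y = librosa.util.fix_length(y, size=int(sr * duration))   # pad/trim to exactly 1.5 s
    return librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))

class SmallCNN(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                  # x: (batch, 1, n_mels, frames)
        return self.classifier(self.features(x).flatten(1))

spec = log_mel("clip.wav")                 # hypothetical 1.5-second file
logits = SmallCNN()(torch.tensor(spec, dtype=torch.float32)[None, None])
```

A hybrid model in the spirit of the C-DNN would concatenate such CNN embeddings with a DNN over hand-picked features before the final classification layer.
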

Affiliation(s)
- Hannes Diemerling
  - Center for Lifespan Psychology, Max Planck Institute for Human Development, Berlin, Germany
  - Thomas Bayes Institute, Berlin, Germany
  - Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany
  - Department of Psychology, University of the Bundeswehr München, Neubiberg, Germany
- Leonie Stresemann
  - Department of Psychology, University of the Bundeswehr München, Neubiberg, Germany
- Tina Braun
  - Department of Psychology, University of the Bundeswehr München, Neubiberg, Germany
  - Department of Psychology, Charlotte-Fresenius University, Wiesbaden, Germany
- Timo von Oertzen
  - Center for Lifespan Psychology, Max Planck Institute for Human Development, Berlin, Germany
  - Thomas Bayes Institute, Berlin, Germany

17. Lingelbach K, Vukelić M, Rieger JW. GAUDIE: Development, validation, and exploration of a naturalistic German AUDItory Emotional database. Behav Res Methods 2024; 56:2049-2063. [PMID: 37221343 PMCID: PMC10991051 DOI: 10.3758/s13428-023-02135-z]
Abstract
Since thoroughly validated naturalistic affective German speech stimulus databases are rare, we present here a novel validated database of speech sequences assembled for the purpose of emotion induction. The database comprises 37 audio speech sequences with a total duration of 92 minutes for the induction of positive, neutral, and negative emotion: comedy shows intended to elicit humorous and amusing feelings, weather forecasts, and arguments between couples and relatives from movies or television series. Multiple continuous and discrete ratings are used to validate the database and to capture the time course and variability of valence and arousal. We analyse and quantify how well the audio sequences fulfil the quality criteria of differentiation, salience/strength, and generalizability across participants. Hence, we provide a validated speech database of naturalistic scenarios suitable for investigating emotion processing and its time course with German-speaking participants. Information on using the stimulus database for research purposes can be found at the OSF project repository GAUDIE: https://osf.io/xyr6j/.

Affiliation(s)
- Katharina Lingelbach
  - Fraunhofer Institute for Industrial Engineering IAO, Nobelstraße 12, 70569, Stuttgart, Germany
  - Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Mathias Vukelić
  - Fraunhofer Institute for Industrial Engineering IAO, Nobelstraße 12, 70569, Stuttgart, Germany
- Jochem W Rieger
  - Department of Psychology, University of Oldenburg, Oldenburg, Germany

18. Cooper H, Jennings BJ, Kumari V, Willard AK, Bennetts RJ. The association between childhood trauma and emotion recognition is reduced or eliminated when controlling for alexithymia and psychopathy traits. Sci Rep 2024; 14:3413. [PMID: 38341493 PMCID: PMC10858958 DOI: 10.1038/s41598-024-53421-5]
Abstract
Emotion recognition shows large inter-individual variability and is substantially affected by childhood trauma as well as by modality, emotion portrayed, and intensity. While research suggests childhood trauma influences emotion recognition, it is unclear whether this effect is consistent when controlling for interrelated individual differences. Further, the universality of the effects has not been explored, as most studies have not examined differing modalities or intensities. This study examined childhood trauma's association with recognition accuracy when controlling for alexithymia and psychopathy traits, and whether this varied across modality, emotion portrayed, and intensity. An adult sample (N = 122) completed childhood trauma, alexithymia, and psychopathy questionnaires and three emotion tasks: faces, voices, and audio-visual. When investigating childhood trauma alone, there was a significant association with poorer accuracy when exploring modality, emotion portrayed, and intensity. When controlling for alexithymia and psychopathy, childhood trauma remained significant when exploring emotion portrayed; however, it was no longer significant when exploring modality and intensity. In fact, alexithymia was significant when exploring intensity. The effect sizes overall were small. Our findings suggest the importance of controlling for interrelated individual differences. Future research should explore more sensitive measures of emotion recognition, such as intensity ratings and sensitivity to intensity, to see if these follow the accuracy findings.

Affiliation(s)
- Holly Cooper
  - Division of Psychology, College of Health, Medicine, and Life Sciences, Brunel University London, Uxbridge, UB8 3PH, UK
- Ben J Jennings
  - Division of Psychology, College of Health, Medicine, and Life Sciences, Brunel University London, Uxbridge, UB8 3PH, UK
- Veena Kumari
  - Division of Psychology, College of Health, Medicine, and Life Sciences, Brunel University London, Uxbridge, UB8 3PH, UK
- Aiyana K Willard
  - Division of Psychology, College of Health, Medicine, and Life Sciences, Brunel University London, Uxbridge, UB8 3PH, UK
- Rachel J Bennetts
  - Division of Psychology, College of Health, Medicine, and Life Sciences, Brunel University London, Uxbridge, UB8 3PH, UK

19. Islam B, McElwain NL, Li J, Davila MI, Hu Y, Hu K, Bodway JM, Dhekne A, Roy Choudhury R, Hasegawa-Johnson M. Preliminary Technical Validation of LittleBeats™: A Multimodal Sensing Platform to Capture Cardiac Physiology, Motion, and Vocalizations. Sensors (Basel) 2024; 24:901. [PMID: 38339617 PMCID: PMC10857055 DOI: 10.3390/s24030901]
Abstract
Across five studies, we present the preliminary technical validation of an infant-wearable platform, LittleBeats™, that integrates electrocardiogram (ECG), inertial measurement unit (IMU), and audio sensors. Each sensor modality is validated against data from gold-standard equipment using established algorithms and laboratory tasks. Interbeat interval (IBI) data obtained from the LittleBeats™ ECG sensor indicate acceptable mean absolute percent error rates for both adults (Study 1, N = 16) and infants (Study 2, N = 5) across low- and high-challenge sessions and expected patterns of change in respiratory sinus arrhythmia (RSA). For automated activity recognition (upright vs. walk vs. glide vs. squat) using accelerometer data from the LittleBeats™ IMU (Study 3, N = 12 adults), performance was good to excellent, with smartphone (industry standard) data outperforming LittleBeats™ by less than 4 percentage points. Speech emotion recognition (Study 4, N = 8 adults) applied to LittleBeats™ versus smartphone audio data indicated comparable performance, with no significant difference in error rates. On an automatic speech recognition task (Study 5, N = 12 adults), the best performing algorithm yielded relatively low word error rates, although error rates were somewhat higher for LittleBeats™ (4.16%) than for the smartphone (2.73%). Together, these validation studies indicate that LittleBeats™ sensors yield data quality that is largely comparable to that obtained from gold-standard devices and established protocols used in prior research.
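
The interbeat-interval (IBI) criterion used in Studies 1 and 2 boils down to comparing device-derived IBIs against a reference recording. The numpy sketch below is a hypothetical illustration of deriving IBIs from R-peak times and computing the mean absolute percent error; the peak arrays are assumed to be already detected and beat-aligned.

```python
import numpy as np

def interbeat_intervals(r_peak_times_s):
    """IBIs in milliseconds from an array of R-peak timestamps in seconds."""
    return np.diff(np.asarray(r_peak_times_s)) * 1000.0

def mean_abs_percent_error(test_ibi, reference_ibi):
    test_ibi, reference_ibi = np.asarray(test_ibi), np.asarray(reference_ibi)
    return np.mean(np.abs(test_ibi - reference_ibi) / reference_ibi) * 100.0

# Hypothetical, already-aligned R-peak times (s) from the wearable and a reference ECG.
wearable_peaks  = [0.00, 0.82, 1.66, 2.47, 3.31]
reference_peaks = [0.00, 0.80, 1.64, 2.46, 3.30]
mape = mean_abs_percent_error(interbeat_intervals(wearable_peaks),
                              interbeat_intervals(reference_peaks))
print(f"IBI MAPE: {mape:.2f}%")
```
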

Affiliation(s)
- Bashima Islam
  - Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, Worcester, MA 01609, USA
- Nancy L. McElwain
  - Department of Human Development and Family Studies, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
  - Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Jialu Li
  - Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Maria I. Davila
  - Research Triangle Institute, Research Triangle Park, NC 27709, USA
- Yannan Hu
  - Department of Human Development and Family Studies, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Kexin Hu
  - Department of Human Development and Family Studies, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Jordan M. Bodway
  - Department of Human Development and Family Studies, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Ashutosh Dhekne
  - School of Computer Science, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Romit Roy Choudhury
  - Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Mark Hasegawa-Johnson
  - Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
  - Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA

20. Ge Y, Tang C, Li H, Chen Z, Wang J, Li W, Cooper J, Chetty K, Faccio D, Imran M, Abbasi QH. A comprehensive multimodal dataset for contactless lip reading and acoustic analysis. Sci Data 2023; 10:895. [PMID: 38092796 PMCID: PMC10719268 DOI: 10.1038/s41597-023-02793-w]
Abstract
Small-scale motion detection using non-invasive remote sensing techniques has recently garnered significant interest in the field of speech recognition. Our dataset paper aims to facilitate the enhancement and restoration of speech information from diverse data sources for speakers. In this paper, we introduce a novel multimodal dataset based on Radio Frequency, visual, text, audio, laser and lip landmark information, also called RVTALL. Specifically, the dataset consists of 7.5 GHz Channel Impulse Response (CIR) data from ultra-wideband (UWB) radars, 77 GHz frequency modulated continuous wave (FMCW) data from millimeter wave (mmWave) radar, visual and audio information, lip landmarks and laser data, offering a unique multimodal approach to speech recognition research. Meanwhile, a depth camera is adopted to record the landmarks of the subject's lip and voice. Approximately 400 minutes of annotated speech profiles are provided, which are collected from 20 participants speaking 5 vowels, 15 words, and 16 sentences. The dataset has been validated and has potential for the investigation of lip reading and multimodal speech recognition.

Affiliation(s)
- Yao Ge
  - James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK
- Chong Tang
  - James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK
  - Department of Security and Crime Science, University College London, London, WC1E 6BT, UK
- Haobo Li
  - School of Physics & Astronomy, University of Glasgow, Glasgow, G12 8QQ, UK
- Zikang Chen
  - James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK
- Jingyan Wang
  - James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK
- Wenda Li
  - School of Science and Engineering, University of Dundee, Dundee, DD1 4HN, UK
- Jonathan Cooper
  - James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK
- Kevin Chetty
  - Department of Security and Crime Science, University College London, London, WC1E 6BT, UK
- Daniele Faccio
  - School of Physics & Astronomy, University of Glasgow, Glasgow, G12 8QQ, UK
- Muhammad Imran
  - James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK
- Qammer H Abbasi
  - James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK

21. Billah MM, Sarker ML, Akhand M. KBES: A dataset for realistic Bangla speech emotion recognition with intensity level. Data Brief 2023; 51:109741. [PMID: 37965597 PMCID: PMC10641593 DOI: 10.1016/j.dib.2023.109741]
Abstract
Speech Emotion Recognition (SER) identifies and categorizes emotional states by analyzing speech signals. SER is an emerging research area using machine learning and deep learning techniques due to its socio-cultural and business importance. An appropriate dataset is an important resource for SER-related studies in a particular language. There is an apparent lack of SER datasets in the Bangla language, although it is one of the most spoken languages in the world. There are a few Bangla SER datasets, but those consist of only a few dialogs with a minimal number of actors, making them unsuitable for real-world applications. Moreover, the existing datasets do not consider the intensity level of emotions, even though the intensity of a specific emotional expression, such as anger or sadness, plays a crucial role in social behavior. Therefore, a realistic Bangla speech dataset, called the KUET Bangla Emotional Speech (KBES) dataset, is developed in this study. The dataset consists of 900 audio signals (i.e., speech dialogs) from 35 actors (20 females and 15 males) with diverse age ranges. The speech dialogs are sourced from Bangla telefilms, dramas, TV series, and web series. There are five emotional categories: Neutral, Happy, Sad, Angry, and Disgust. Except for Neutral, samples of each emotion are divided into two intensity levels: Low and High. A distinguishing feature of the dataset is that the speech dialogs are almost all unique and come from a relatively large number of actors, whereas existing datasets (such as SUBESCO and BanglaSER) contain a few pre-defined dialogs spoken repeatedly by a few actors/research volunteers in a laboratory environment. Finally, the KBES dataset is presented as a nine-class problem, classifying emotions into nine categories: Neutral, Happy (Low), Happy (High), Sad (Low), Sad (High), Angry (Low), Angry (High), Disgust (Low), and Disgust (High). The dataset is kept balanced, containing 100 samples for each of the nine classes; each class is also gender balanced, with 50 samples each from male and female actors. The developed dataset appears more realistic when compared with existing SER datasets.
Collapse
Affiliation(s)
- Md. Masum Billah
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology (KUET), Bangladesh
| | - Md. Likhon Sarker
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology (KUET), Bangladesh
| | - M. A. H. Akhand
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology (KUET), Bangladesh
| |
Collapse
|
22
|
Won NR, Son YD, Kim SM, Bae S, Kim JH, Kim JH, Han DH. Attention Circuits Mediate the Connection between Emotional Experience and Expression within the Emotional Circuit. CLINICAL PSYCHOPHARMACOLOGY AND NEUROSCIENCE : THE OFFICIAL SCIENTIFIC JOURNAL OF THE KOREAN COLLEGE OF NEUROPSYCHOPHARMACOLOGY 2023; 21:715-723. [PMID: 37859444 PMCID: PMC10591168 DOI: 10.9758/cpn.22.1029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/21/2022] [Revised: 10/24/2022] [Accepted: 10/25/2022] [Indexed: 10/21/2023]
Abstract
Objective : Most affective neuroscience studies use pictures from the International Affective Picture System or standard facial expressions to elicit emotional experiences. The attention system, including the prefrontal cortex, can mediate emotional regulation in response to stimulation with emotional faces. We hypothesized that emotional experience is associated with brain activity within the neocortex. In addition, modification within the neocortex may be associated with brain activity within the attention system. Methods : Thirty-one healthy adult participants were recruited and assessed for emotional expression using clinical scales of happiness, sadness, anxiety, and anger, and for emotional experience using brain activity in response to pictures of facial emotional expressions. The attention system was assessed using brain activity in response to the go-no-go task. Results : We found that emotional experience was associated with brain activity within the frontotemporal cortices, while emotional expression was associated with brain activity within the temporal and insular cortices. In addition, the association of brain activity between emotional experiences and expressions of sadness and anxiety was affected by brain activity within the anterior cingulate gyrus in response to the go-no-go task. Conclusion : Emotional expression may be associated with brain activity within the temporal cortex, whereas emotional experience may be associated with brain activity within the frontotemporal cortices. In addition, the attention system may interfere with the connection between emotional expression and experience.
Collapse
Affiliation(s)
- Na Rae Won
- Department of Psychiatry, Chung-Ang University Hospital, Seoul, Korea
| | - Young-Don Son
- Department of Biomedical Engineering, Gachon University, Seongnam, Korea
| | - Sun Mi Kim
- Department of Psychiatry, Chung-Ang University Hospital, Seoul, Korea
| | - Sujin Bae
- Department of Psychiatry, Chung-Ang University Hospital, Seoul, Korea
| | - Jeong Hee Kim
- Department of Biomedical Engineering, Gachon University, Seongnam, Korea
| | - Jong-Hoon Kim
- Department of Psychiatry, Gachon University Gil Medical Center, Gachon University College of Medicine, Incheon, Korea
| | - Doug Hyun Han
- Department of Psychiatry, Chung-Ang University Hospital, Seoul, Korea
| |
Collapse
|
23
|
Rezapour Mashhadi MM, Osei-Bonsu K. Speech emotion recognition using machine learning techniques: Feature extraction and comparison of convolutional neural network and random forest. PLoS One 2023; 18:e0291500. [PMID: 37988352 PMCID: PMC10662716 DOI: 10.1371/journal.pone.0291500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Accepted: 08/31/2023] [Indexed: 11/23/2023] Open
Abstract
Speech is a direct and rich way of transmitting information and emotions from one point to another. In this study, we aimed to classify different emotions in speech using various audio features and machine learning models. We extracted various types of audio features, such as Mel-frequency cepstral coefficients, chromagram, Mel-scale spectrogram, spectral contrast, Tonnetz representation, and zero-crossing rate. We used a limited speech emotion recognition (SER) dataset and augmented it with additional audio recordings. In addition, in contrast to many previous studies, we combined all audio files before conducting our analysis. We compared the performance of two models: a one-dimensional convolutional neural network (conv1D) and a random forest (RF), with RF-based feature selection. Our results showed that RF with feature selection achieved higher average accuracy (69%) than conv1D and had the highest precision for fear (72%) and the highest recall for calm (84%). Our study demonstrates the effectiveness of RF with feature selection for speech emotion classification using a limited dataset. For both algorithms, we found that anger is misclassified mostly as happy, disgust as sad and neutral, and fear as sad. This could be due to the similarity of some acoustic features between these emotions, such as pitch, intensity, and tempo.
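The feature set named above maps closely onto standard librosa calls. As a rough illustration (not the authors' exact pipeline), the sketch below extracts the listed features, averages each over time, and fits a random forest; the synthetic signals, labels, and forest size are placeholders for a real SER dataset.

```python
# Rough sketch of the feature-extraction stage described above (not the authors' exact
# pipeline): extract the listed librosa features, average each over time, and feed the
# stacked vectors to a random forest. Signals and labels are synthetic stand-ins.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def extract_features(y, sr):
    """Return one fixed-length feature vector (time-averaged) for an utterance."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    tonnetz = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr)
    zcr = librosa.feature.zero_crossing_rate(y)
    return np.concatenate([f.mean(axis=1) for f in (mfcc, chroma, mel, contrast, tonnetz, zcr)])

# Synthetic stand-ins for two utterances; in practice use y, sr = librosa.load("utterance.wav").
sr = 22050
t = np.linspace(0, 1, sr, endpoint=False)
X = np.vstack([extract_features(np.sin(2 * np.pi * f0 * t), sr) for f0 in (180.0, 260.0)])
labels = ["angry", "calm"]                          # placeholder labels
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, labels)
# clf.feature_importances_ can then drive RF-based feature selection, as in the study.
```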
Collapse
|
24
|
Li N, Ross R. Invoking and identifying task-oriented interlocutor confusion in human-robot interaction. Front Robot AI 2023; 10:1244381. [PMID: 38054199 PMCID: PMC10694506 DOI: 10.3389/frobt.2023.1244381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2023] [Accepted: 10/31/2023] [Indexed: 12/07/2023] Open
Abstract
Successful conversational interaction with a social robot requires not only an assessment of a user's contribution to an interaction, but also awareness of their emotional and attitudinal states as the interaction unfolds. To this end, our research aims to systematically trigger and then interpret human behaviors to track different states of potential user confusion in interaction, so that systems can be primed to adjust their policies when users enter confusion states. In this paper, we present a detailed human-robot interaction study to prompt, investigate, and eventually detect confusion states in users. The study itself employs a Wizard-of-Oz (WoZ) style design with a Pepper robot to prompt confusion states for task-oriented dialogues in a well-defined manner. The data collected from 81 participants include audio and visual data, from both the robot's perspective and the environment, as well as participant survey data. From these data, we evaluated the correlations of induced confusion conditions with multimodal data, including eye gaze estimation, head pose estimation, facial emotion detection, silence duration, and user speech analysis (including emotion and pitch analysis). Analysis shows significant differences in participants' behaviors in states of confusion based on these signals, as well as a strong correlation between confusion conditions and participants' own self-reported confusion scores. The paper establishes strong correlations between confusion levels and these observable features, and lays the groundwork for a more complete social- and affect-oriented strategy for task-oriented human-robot interaction. The contributions of this paper include the methodology applied, the dataset, and our systematic analysis.
Collapse
Affiliation(s)
- Na Li
- School of Computer Science, Technological University Dublin, Dublin, Ireland
| | | |
Collapse
|
25
|
Franca M, Bolognini N, Brysbaert M. Seeing emotions in the eyes: a validated test to study individual differences in the perception of basic emotions. Cogn Res Princ Implic 2023; 8:67. [PMID: 37919608 PMCID: PMC10622392 DOI: 10.1186/s41235-023-00521-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2023] [Accepted: 10/20/2023] [Indexed: 11/04/2023] Open
Abstract
People are able to perceive emotions in the eyes of others and can therefore see emotions when individuals wear face masks. Research has been hampered by the lack of a good test to measure basic emotions in the eyes. In two studies with 358 and 200 participants, respectively, we developed a test measuring the ability to see anger, disgust, fear, happiness, sadness, and surprise in images of eyes. Each emotion is measured with 8 stimuli (4 male actors and 4 female actors), matched in terms of difficulty and item discrimination. Participants reliably differed in their performance on the Seeing Emotions in the Eyes test (SEE-48). The test correlated well not only with the Reading the Mind in the Eyes Test (RMET) but also with the Situational Test of Emotion Understanding (STEU), indicating that the SEE-48 measures not only low-level perceptual skills but also broader skills of emotion perception and emotional intelligence. The test is freely available for research and clinical purposes.
Collapse
Affiliation(s)
- Maria Franca
- Ph.D. Program in Neuroscience, School of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
| | - Nadia Bolognini
- Department of Psychology and NeuroMI - Milan Centre for Neuroscience, University of Milano-Bicocca, Milan, Italy.
- Laboratory of Neuropsychology, Department of Neurorehabilitation Sciences, IRCCS Istituto Auxologico Italiano, Via Mercalli 32, 20122, Milan, Italy.
| | - Marc Brysbaert
- Department of Experimental Psychology, Ghent University, H. Dunantlaan 2, 9000, Ghent, Belgium.
| |
Collapse
|
26
|
Caulley D, Alemu Y, Burson S, Cárdenas Bautista E, Abebe Tadesse G, Kottmyer C, Aeschbach L, Cheungvivatpant B, Sezgin E. Objectively Quantifying Pediatric Psychiatric Severity Using Artificial Intelligence, Voice Recognition Technology, and Universal Emotions: Pilot Study for Artificial Intelligence-Enabled Innovation to Address Youth Mental Health Crisis. JMIR Res Protoc 2023; 12:e51912. [PMID: 37870890 PMCID: PMC10628686 DOI: 10.2196/51912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2023] [Revised: 09/14/2023] [Accepted: 09/18/2023] [Indexed: 10/24/2023] Open
Abstract
BACKGROUND Providing psychotherapy, particularly for youth, is a pressing challenge in the health care system. Traditional methods are resource-intensive, and there is a need for objective benchmarks to guide therapeutic interventions. Automated emotion detection from speech, using artificial intelligence, presents an emerging approach to address these challenges. Speech can carry vital information about emotional states, which can be used to improve mental health care services, especially when the person is suffering. OBJECTIVE This study aims to develop and evaluate automated methods for detecting the intensity of emotions (anger, fear, sadness, and happiness) in audio recordings of patients' speech. We also demonstrate the viability of deploying the models. Our model was validated in a previous publication by Alemu et al with limited voice samples. This follow-up study used significantly more voice samples to validate the previous model. METHODS We used audio recordings of patients, specifically children with high adverse childhood experience (ACE) scores; the average ACE score was 5 or higher, placing them at the highest risk for chronic disease and social or emotional problems (only 1 in 6 have a score of 4 or above). Structured voice samples were collected by having patients read a fixed script. In total, 4 highly trained therapists classified audio segments, scoring the intensity level of each of the 4 emotions. We experimented with various preprocessing methods, including denoising, voice-activity detection, and diarization. Additionally, we explored various model architectures, including convolutional neural networks (CNNs) and transformers. We trained emotion-specific transformer-based models and a generalized CNN-based model to predict emotion intensities. RESULTS The emotion-specific transformer-based model achieved a test-set precision and recall of 86% and 79%, respectively, for binary emotional intensity classification (high or low). In contrast, the CNN-based model, generalized to predict the intensity of 4 different emotions, achieved test-set precision and recall of 83% for each. CONCLUSIONS Automated emotion detection from patients' speech using artificial intelligence models is found to be feasible, leading to a high level of accuracy. The transformer-based model exhibited better performance in emotion-specific detection, while the CNN-based model showed promise in generalized emotion detection. These models can serve as valuable decision-support tools for pediatricians and mental health providers to triage youth to appropriate levels of mental health care services. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) RR1-10.2196/51912.
Collapse
Affiliation(s)
- Desmond Caulley
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, United States
| | - Yared Alemu
- TQIntelligence, Inc, Atlanta, GA, United States
- Department of Psychiatry and Behavioral Sciences, Computational Psych Program, Morehouse School of Medicine, Atlanta, GA, United States
| | | | - Elizabeth Cárdenas Bautista
- TQIntelligence, Inc, Atlanta, GA, United States
- Department of Psychiatry and Behavioral Sciences, Computational Psych Program, Morehouse School of Medicine, Atlanta, GA, United States
| | | | - Christopher Kottmyer
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, United States
| | - Laurent Aeschbach
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, United States
| | - Bryan Cheungvivatpant
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, United States
| | - Emre Sezgin
- Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, United States
| |
Collapse
|
27
|
Zhou D, Cheng Y, Wen L, Luo H, Liu Y. Drivers' Comprehensive Emotion Recognition Based on HAM. SENSORS (BASEL, SWITZERLAND) 2023; 23:8293. [PMID: 37837124 PMCID: PMC10574905 DOI: 10.3390/s23198293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/07/2023] [Revised: 09/30/2023] [Accepted: 10/05/2023] [Indexed: 10/15/2023]
Abstract
Negative emotions in drivers may lead to dangerous driving behaviors, which in turn can cause serious traffic accidents. However, most current studies on driver emotion use a single modality, such as EEG, eye tracking, or driving data. In complex situations, a single modality may not fully capture a driver's emotional state and can provide poor robustness. In recent years, some studies have used multimodal approaches to monitor single emotions such as driver fatigue or anger, but in real driving environments negative emotions such as sadness, anger, fear, and fatigue all have a significant impact on driving safety, and very few studies have used multimodal data to accurately predict drivers' comprehensive emotions. Therefore, based on a multimodal approach, this paper aims to improve comprehensive driver emotion recognition. By combining three modalities (a driver's voice, facial image, and video sequence), drivers' emotions are classified into six categories: sadness, anger, fear, fatigue, happiness, and neutrality. To accurately identify drivers' negative emotions and improve driving safety, this paper proposes a multimodal fusion framework based on a CNN + Bi-LSTM + HAM architecture. The framework fuses feature vectors of driver audio, facial expressions, and video sequences for comprehensive driver emotion recognition. Experiments demonstrate the effectiveness of the proposed multimodal approach for driver emotion recognition, reaching a recognition accuracy of 85.52%. The validity of the method is further verified through comparative experiments and evaluation metrics such as accuracy and F1 score.
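The abstract describes feature-level fusion of audio, facial-image, and video-sequence vectors before classification. The PyTorch sketch below shows one minimal way such fusion could look; the feature dimensions, the simple gating layer standing in for an attention module, and the class count are assumptions, and the paper's CNN + Bi-LSTM + HAM architecture is not reproduced.

```python
# Minimal sketch of fusing per-modality feature vectors before a joint classifier.
# Dimensions, the gating layer, and the class count are illustrative assumptions.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, d_audio=128, d_face=256, d_video=256, n_classes=6):
        super().__init__()
        d = d_audio + d_face + d_video
        self.gate = nn.Sequential(nn.Linear(d, d), nn.Sigmoid())   # simple gating over fused features
        self.head = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, n_classes))

    def forward(self, audio_feat, face_feat, video_feat):
        fused = torch.cat([audio_feat, face_feat, video_feat], dim=-1)
        return self.head(fused * self.gate(fused))

model = FusionClassifier()
logits = model(torch.randn(4, 128), torch.randn(4, 256), torch.randn(4, 256))
print(logits.shape)   # torch.Size([4, 6])
```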
Collapse
Affiliation(s)
- Dongmei Zhou
- School of Mechanical and Electrical Engineering, Chengdu University of Technology, Chengdu 610059, China; (D.Z.); (L.W.); (H.L.)
| | - Yongjian Cheng
- School of Mechanical and Electrical Engineering, Chengdu University of Technology, Chengdu 610059, China; (D.Z.); (L.W.); (H.L.)
| | - Luhan Wen
- School of Mechanical and Electrical Engineering, Chengdu University of Technology, Chengdu 610059, China; (D.Z.); (L.W.); (H.L.)
| | - Hao Luo
- School of Mechanical and Electrical Engineering, Chengdu University of Technology, Chengdu 610059, China; (D.Z.); (L.W.); (H.L.)
| | - Ying Liu
- China Unicom Digital Technology Co., Ltd. Hubei Branch, Wuhan 430015, China;
| |
Collapse
|
28
|
Balel Y, Mercuri LG. Does Emotional State Improve Following Temporomandibular Joint Total Joint Replacement? J Oral Maxillofac Surg 2023; 81:1196-1203. [PMID: 37490998 DOI: 10.1016/j.joms.2023.06.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2023] [Revised: 06/23/2023] [Accepted: 06/26/2023] [Indexed: 07/27/2023]
Abstract
BACKGROUND Temporomandibular joint total joint replacement (TMJTJR) offers patients the opportunity for improved function and reduced pain. TMJTJR also has the potential to affect a patient's emotions in a positive or negative manner. PURPOSE The purpose of this study was to evaluate changes in emotional state for subjects undergoing TMJTJR. STUDY DESIGN, SETTING, SAMPLE The authors implemented a retrospective cohort study. Subjects who received TMJTJR were identified from the TMJ Inter Network, which is a study group comprising more than 130 temporomandibular joint surgeons. Subjects between the ages of 18 and 65 years with complete medical records and pre/post TMJTJR video/audio recordings were enrolled in the study. PREDICTOR VARIABLE The predictor variable was time (preoperative and postoperative). MAIN OUTCOME VARIABLES The primary outcome variable was change in emotional state. All subjects had a preoperative (T0) recorded interview as well as a postoperative (T1) interview at 3 to 6 months. The eight-category emotional state was classified as neutral, happy, sad, angry, fearful, disgusted, surprised, and bored. The three-category emotional state was classified as neutral, positive, and negative. The emotional state was measured using artificial intelligence at T0 and T1. The secondary outcome variables were pain score and maximal interincisal opening. COVARIATES The covariates were gender, age, diagnosis, prosthetic side, TMJTJR design, and TMJTJR type. ANALYSES The relationship between emotional state change and covariates was examined using both the χ2 test and the Kruskal-Wallis H test. The significance of the change in categorical data after surgery was examined using the McNemar-Bowker test. P values < .05 were considered statistically significant. RESULTS Thirty-three subjects were included in the study. The mean age was 30.09 ± 8.69 years, with 15 males (45%) and 18 females (55%). The percentage of subjects with preoperative neutral, happy, sad, angry, and fearful emotional states was 24, 15, 24, 9, and 27%, respectively. The percentage of subjects with postoperative neutral, happy, sad, angry, and fearful emotional states was 21, 39, 21, 12, and 6%, respectively. The change in emotional state was statistically significant (P = .037). There was no statistically significant relationship between covariates and emotional state changes (P > .05). CONCLUSION According to the assessment of artificial intelligence, TMJTJR improves the emotional state of patients.
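The McNemar-Bowker test used in the analysis is available in statsmodels as Bowker's test of symmetry on a square pre/post contingency table. The sketch below uses a made-up 3 x 3 table (preoperative vs. postoperative three-category state), not the study's data.

```python
# Bowker's test of symmetry (McNemar-Bowker) on a square pre/post contingency table.
# The 3 x 3 table below is invented for illustration; it is not the study's data.
import numpy as np
from statsmodels.stats.contingency_tables import SquareTable

# Rows: preoperative state (neutral, positive, negative); columns: postoperative state.
table = np.array([[5, 4, 1],
                  [1, 8, 0],
                  [2, 9, 3]])
result = SquareTable(table, shift_zeros=False).symmetry(method="bowker")
print(result.statistic, result.pvalue)
```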
Collapse
Affiliation(s)
- Yunus Balel
- Consultant, Department of Oral and Maxillofacial Surgery, Faculty of Dentistry, Tokat Gaziosmanpaşa University, Tokat, Turkey; Consultant, Department of Oral and Maxillofacial Surgery, TR Ministry of Health, Oral and Dental Health Hospital, Sivas, Turkey.
| | - Louis G Mercuri
- Visiting Professor, Department of Orthopedic Surgery, Rush University Medical Center, Chicago, IL
| |
Collapse
|
29
|
K A, Prasad S, Chakrabarty M. Trait anxiety modulates the detection sensitivity of negative affect in speech: an online pilot study. Front Behav Neurosci 2023; 17:1240043. [PMID: 37744950 PMCID: PMC10512416 DOI: 10.3389/fnbeh.2023.1240043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2023] [Accepted: 08/21/2023] [Indexed: 09/26/2023] Open
Abstract
Acoustic perception of emotions in speech is relevant for humans to navigate the social environment optimally. While sensory perception is known to be influenced by ambient noise and internal bodily states (e.g., emotional arousal and anxiety), their relationship to human auditory perception is relatively less understood. In a supervised, online pilot experiment conducted outside the artificially controlled laboratory environment, we asked whether the detection sensitivity of emotions conveyed by human speech-in-noise (acoustic signals) varies between individuals with relatively lower and higher levels of subclinical trait anxiety. In the task, participants (n = 28) discriminated the target emotion conveyed by temporally unpredictable acoustic signals (signal-to-noise ratio = 10 dB), which were manipulated at four levels (Happy, Neutral, Fear, and Disgust). We calculated the empirical area under the curve (a measure of acoustic signal detection sensitivity) based on signal detection theory to answer our questions. Individuals with High trait anxiety, relative to those with Low trait anxiety, showed significantly lower detection sensitivities to acoustic signals of the negative emotions Disgust and Fear, and significantly lower detection sensitivities averaged across all emotions. The results from this pilot study with a small but statistically relevant sample size suggest that trait anxiety levels influence the overall acoustic detection of speech-in-noise, especially signals conveying threatening/negative affect. The findings are relevant for future research on acoustic perception anomalies underlying affective traits and disorders.
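The empirical area under the curve used here as a detection-sensitivity measure can be computed directly from trial-level data. The sketch below is a minimal illustration with invented signal-present labels and response ratings, not the study's data or exact procedure.

```python
# Minimal illustration of an empirical ROC area (detection sensitivity) from trial-level data.
# The labels and ratings below are invented stand-ins, not the study's data.
import numpy as np
from sklearn.metrics import roc_auc_score

signal_present = np.array([1, 1, 0, 0, 1, 0, 1, 0])   # 1 = target emotion present in the noise
ratings = np.array([4, 3, 2, 1, 4, 2, 3, 1])          # higher = more confident "present" response
print("empirical AUC:", roc_auc_score(signal_present, ratings))
```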
Collapse
Affiliation(s)
- Achyuthanand K
- Department of Computational Biology, Indraprastha Institute of Information Technology Delhi, New Delhi, India
| | - Saurabh Prasad
- Department of Computer Science and Engineering, Indraprastha Institute of Information Technology Delhi, New Delhi, India
| | - Mrinmoy Chakrabarty
- Department of Social Sciences and Humanities, Indraprastha Institute of Information Technology Delhi, New Delhi, India
- Centre for Design and New Media, Indraprastha Institute of Information Technology Delhi, New Delhi, India
| |
Collapse
|
30
|
Şentürk YD, Tavacioglu EE, Duymaz İ, Sayim B, Alp N. The Sabancı University Dynamic Face Database (SUDFace): Development and validation of an audiovisual stimulus set of recited and free speeches with neutral facial expressions. Behav Res Methods 2023; 55:3078-3099. [PMID: 36018484 DOI: 10.3758/s13428-022-01951-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 08/06/2022] [Indexed: 11/08/2022]
Abstract
Faces convey a wide range of information, including one's identity, and emotional and mental states. Face perception is a major research topic in many research fields, such as cognitive science, social psychology, and neuroscience. Frequently, stimuli are selected from a range of available face databases. However, even though faces are highly dynamic, most databases consist of static face stimuli. Here, we introduce the Sabancı University Dynamic Face (SUDFace) database. The SUDFace database consists of 150 high-resolution audiovisual videos acquired in a controlled lab environment and stored with a resolution of 1920 × 1080 pixels at a frame rate of 60 Hz. The multimodal database consists of three videos of each human model in frontal view in three different conditions: vocalizing two scripted texts (conditions 1 and 2) and one Free Speech (condition 3). The main focus of the SUDFace database is to provide a large set of dynamic faces with neutral facial expressions and natural speech articulation. Variables such as face orientation, illumination, and accessories (piercings, earrings, facial hair, etc.) were kept constant across all stimuli. We provide detailed stimulus information, including facial features (pixel-wise calculations of face length, eye width, etc.) and speeches (e.g., duration of speech and repetitions). In two validation experiments, a total number of 227 participants rated each video on several psychological dimensions (e.g., neutralness and naturalness of expressions, valence, and the perceived mental states of the models) using Likert scales. The database is freely accessible for research purposes.
Collapse
Affiliation(s)
| | | | - İlker Duymaz
- Psychology, Sabancı University, Orta Mahalle, Tuzla, İstanbul, 34956, Turkey
| | - Bilge Sayim
- SCALab - Sciences Cognitives et Sciences Affectives, Université de Lille, CNRS, Lille, France
- Institute of Psychology, University of Bern, Fabrikstrasse 8, 3012, Bern, Switzerland
| | - Nihan Alp
- Psychology, Sabancı University, Orta Mahalle, Tuzla, İstanbul, 34956, Turkey.
| |
Collapse
|
31
|
Alhinti L, Cunningham S, Christensen H. The Dysarthric Expressed Emotional Database (DEED): An audio-visual database in British English. PLoS One 2023; 18:e0287971. [PMID: 37549162 PMCID: PMC10406321 DOI: 10.1371/journal.pone.0287971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2022] [Accepted: 06/19/2023] [Indexed: 08/09/2023] Open
Abstract
The Dysarthric Expressed Emotional Database (DEED) is a novel, parallel multimodal (audio-visual) database of dysarthric and typical emotional speech in British English, the first of its kind. It is an induced (elicited) emotional database that includes speech recorded in the six basic emotions: "happiness", "sadness", "anger", "surprise", "fear", and "disgust". A "neutral" state has also been recorded as a baseline condition. The dysarthric speech part includes recordings from 4 speakers: one female speaker with dysarthria due to cerebral palsy and 3 speakers with dysarthria due to Parkinson's disease (2 female and 1 male). The typical speech part includes recordings from 21 typical speakers (9 female and 12 male). This paper describes the collection of the database, covering its design, development, technical information related to the data capture, and a description of the data files, and presents the validation methodology. The database was validated subjectively (human performance) and objectively (automatic recognition). The results demonstrate that this database will be a valuable resource for understanding emotion communication by people with dysarthria and useful for research in dysarthric emotion classification. The database is freely available for research purposes under a Creative Commons licence at: https://sites.google.com/sheffield.ac.uk/deed.
Collapse
Affiliation(s)
- Lubna Alhinti
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
| | - Stuart Cunningham
- Health Sciences School, University of Sheffield, Sheffield, United Kingdom
- Centre for Assistive Technology and Connected Healthcare (CATCH), Sheffield, United Kingdom
| | - Heidi Christensen
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Centre for Assistive Technology and Connected Healthcare (CATCH), Sheffield, United Kingdom
| |
Collapse
|
32
|
Johnson KT, Narain J, Quatieri T, Maes P, Picard RW. ReCANVo: A database of real-world communicative and affective nonverbal vocalizations. Sci Data 2023; 10:523. [PMID: 37543663 PMCID: PMC10404278 DOI: 10.1038/s41597-023-02405-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2022] [Accepted: 07/24/2023] [Indexed: 08/07/2023] Open
Abstract
Nonverbal vocalizations, such as sighs, grunts, and yells, are informative expressions within typical verbal speech. Likewise, individuals who produce 0-10 spoken words or word approximations ("minimally speaking" individuals) convey rich affective and communicative information through nonverbal vocalizations even without verbal speech. Yet, despite their rich content, little to no data exists on the vocal expressions of this population. Here, we present ReCANVo: Real-World Communicative and Affective Nonverbal Vocalizations - a novel dataset of non-speech vocalizations labeled by function from minimally speaking individuals. The ReCANVo database contains over 7000 vocalizations spanning communicative and affective functions from eight minimally speaking individuals, along with communication profiles for each participant. Vocalizations were recorded in real-world settings and labeled in real-time by a close family member who knew the communicator well and had access to contextual information while labeling. ReCANVo is a novel database of nonverbal vocalizations from minimally speaking individuals, the largest available dataset of nonverbal vocalizations, and one of the only affective speech datasets collected amidst daily life across contexts.
Collapse
Affiliation(s)
- Kristina T Johnson
- Massachusetts Institute of Technology, MIT Media Lab, Cambridge, MA, USA.
| | - Jaya Narain
- Massachusetts Institute of Technology, MIT Media Lab, Cambridge, MA, USA.
| | - Thomas Quatieri
- Massachusetts Institute of Technology, Lincoln Laboratory, Lexington, MA, USA
| | - Pattie Maes
- Massachusetts Institute of Technology, MIT Media Lab, Cambridge, MA, USA
| | - Rosalind W Picard
- Massachusetts Institute of Technology, MIT Media Lab, Cambridge, MA, USA
| |
Collapse
|
33
|
Pulatov I, Oteniyazov R, Makhmudov F, Cho YI. Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders. SENSORS (BASEL, SWITZERLAND) 2023; 23:6640. [PMID: 37514933 PMCID: PMC10383041 DOI: 10.3390/s23146640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/22/2023] [Revised: 07/21/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023]
Abstract
Understanding and identifying emotional cues in human speech is a crucial aspect of human-computer communication. Extracting relevant emotional characteristics from speech and interpreting them computationally form a significant part of this process. The objective of this study was to design a speech emotion recognition framework based on spectrograms and semantic feature encoders, aiming to improve accuracy by addressing shortcomings of existing methods. Two complementary strategies were used to obtain informative features. First, a fully convolutional neural network model was used to encode speech spectrograms. Second, Mel-frequency cepstral coefficient features were extracted and integrated with Speech2Vec for semantic feature encoding. These two types of features were processed separately before being fed into a long short-term memory network and a fully connected layer for further representation, with the aim of improving the model's ability to accurately recognize and interpret emotion from human speech. The proposed approach was evaluated on two databases, RAVDESS and EMO-DB, achieving accuracies of 94.8% and 94.0%, respectively, and outperforming established models on accuracy metrics.
Collapse
Affiliation(s)
- Ilkhomjon Pulatov
- Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea
| | - Rashid Oteniyazov
- Department of Telecommunication Engineering, Nukus Branch of Tashkent University of Information Technologies Named after Muhammad Al-Khwarizmi, Nukus 230100, Uzbekistan
| | - Fazliddin Makhmudov
- Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea
| | - Young-Im Cho
- Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea
| |
Collapse
|
34
|
Ullah R, Asif M, Shah WA, Anjam F, Ullah I, Khurshaid T, Wuttisittikulkij L, Shah S, Ali SM, Alibakhshikenari M. Speech Emotion Recognition Using Convolution Neural Networks and Multi-Head Convolutional Transformer. SENSORS (BASEL, SWITZERLAND) 2023; 23:6212. [PMID: 37448062 DOI: 10.3390/s23136212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/16/2023] [Revised: 05/26/2023] [Accepted: 06/04/2023] [Indexed: 07/15/2023]
Abstract
Speech emotion recognition (SER) is a challenging task in human-computer interaction (HCI) systems. One of the key challenges in speech emotion recognition is to extract the emotional features effectively from a speech utterance. Despite the promising results of recent studies, they generally do not leverage advanced fusion algorithms for the generation of effective representations of emotional features in speech utterances. To address this problem, we describe the fusion of spatial and temporal feature representations of speech emotion by parallelizing convolutional neural networks (CNNs) and a Transformer encoder for SER. We stack two parallel CNNs for spatial feature representation in parallel to a Transformer encoder for temporal feature representation, thereby simultaneously expanding the filter depth and reducing the feature map with an expressive hierarchical feature representation at a lower computational cost. We use the RAVDESS dataset to recognize eight different speech emotions. We augment and intensify the variations in the dataset to minimize model overfitting. Additive White Gaussian Noise (AWGN) is used to augment the RAVDESS dataset. With the spatial and sequential feature representations of CNNs and the Transformer, the SER model achieves 82.31% accuracy for eight emotions on a hold-out dataset. In addition, the SER system is evaluated with the IEMOCAP dataset and achieves 79.42% recognition accuracy for five emotions. Experimental results on the RAVDESS and IEMOCAP datasets show the success of the presented SER system and demonstrate an absolute performance improvement over the state-of-the-art (SOTA) models.
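Additive white Gaussian noise augmentation of the kind described can be implemented in a few lines. The sketch below mixes noise into a signal at a chosen signal-to-noise ratio; the SNR value and the synthetic stand-in signal are illustrative, and the paper's exact augmentation settings are not reproduced.

```python
# Minimal additive white Gaussian noise (AWGN) augmentation at a target SNR.
# The SNR value and the synthetic stand-in waveform are illustrative assumptions.
import numpy as np

def add_awgn(signal, snr_db=15.0, seed=0):
    rng = np.random.default_rng(seed)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

sr = 22050
t = np.linspace(0, 1, sr, endpoint=False)
y = np.sin(2 * np.pi * 220.0 * t)        # stand-in waveform; replace with a loaded RAVDESS clip
y_augmented = add_awgn(y, snr_db=15.0)
```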
Collapse
Affiliation(s)
- Rizwan Ullah
- Wireless Communication Ecosystem Research Unit, Department of Electrical Engineering, Chulalongkorn University, Bangkok 10330, Thailand
| | - Muhammad Asif
- Department of Electrical Engineering, Main Campus, University of Science & Technology, Bannu 28100, Pakistan
| | - Wahab Ali Shah
- Department of Electrical Engineering, Namal University, Mianwali 42250, Pakistan
| | - Fakhar Anjam
- Department of Electrical Engineering, Main Campus, University of Science & Technology, Bannu 28100, Pakistan
| | - Ibrar Ullah
- Department of Electrical Engineering, Kohat Campus, University of Engineering and Technology Peshawar, Kohat 25000, Pakistan
| | - Tahir Khurshaid
- Department of Electrical Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
| | - Lunchakorn Wuttisittikulkij
- Wireless Communication Ecosystem Research Unit, Department of Electrical Engineering, Chulalongkorn University, Bangkok 10330, Thailand
| | - Shashi Shah
- Wireless Communication Ecosystem Research Unit, Department of Electrical Engineering, Chulalongkorn University, Bangkok 10330, Thailand
| | - Syed Mansoor Ali
- Department of Physics and Astronomy, College of Science, King Saud University, P.O. Box 2455, Riyadh 11451, Saudi Arabia
| | - Mohammad Alibakhshikenari
- Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Leganés, 28911 Madrid, Spain
| |
Collapse
|
35
|
John V, Kawanishi Y. Progressive Learning of a Multimodal Classifier Accounting for Different Modality Combinations. SENSORS (BASEL, SWITZERLAND) 2023; 23:4666. [PMID: 37430579 DOI: 10.3390/s23104666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/31/2023] [Revised: 05/09/2023] [Accepted: 05/09/2023] [Indexed: 07/12/2023]
Abstract
In classification tasks, such as face recognition and emotion recognition, multimodal information is used for accurate classification. Once a multimodal classification model is trained with a set of modalities, it estimates the class label by using the entire modality set. A trained classifier is typically not formulated to perform classification for various subsets of modalities. Thus, the model would be useful and portable if it could be used for any subset of modalities. We refer to this problem as the multimodal portability problem. Moreover, in the multimodal model, classification accuracy is reduced when one or more modalities are missing. We term this problem the missing modality problem. This article proposes a novel deep learning model, termed KModNet, and a novel learning strategy, termed progressive learning, to simultaneously address missing modality and multimodal portability problems. KModNet, formulated with the transformer, contains multiple branches corresponding to different k-combinations of the modality set S. KModNet is trained using a multi-step progressive learning framework, where the k-th step uses a k-modal model to train different branches up to the k-th combination branch. To address the missing modality problem, the training multimodal data is randomly ablated. The proposed learning framework is formulated and validated using two multimodal classification problems: audio-video-thermal person classification and audio-video emotion classification. The two classification problems are validated using the Speaking Faces, RAVDESS, and SAVEE datasets. The results demonstrate that the progressive learning framework enhances the robustness of multimodal classification, even under the conditions of missing modalities, while being portable to different modality subsets.
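Random ablation of training modalities, as described for the missing-modality problem, can be sketched as follows; the drop probability, modality names, and feature shapes are placeholders rather than the paper's settings.

```python
# Sketch of random modality ablation during training, so a multimodal classifier learns to
# cope with missing inputs. Drop probability, modality names, and shapes are placeholders.
import torch

def ablate_modalities(batch, p_drop=0.3):
    """batch: dict of modality name -> tensor. Zero out whole modalities at random."""
    return {name: torch.zeros_like(x) if torch.rand(1).item() < p_drop else x
            for name, x in batch.items()}

batch = {"audio": torch.randn(8, 128), "video": torch.randn(8, 256), "thermal": torch.randn(8, 64)}
ablated = ablate_modalities(batch)
print({name: float(x.abs().sum()) for name, x in ablated.items()})   # zeroed modalities sum to 0
```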
Collapse
Affiliation(s)
- Vijay John
- Guardian Robot Project, RIKEN, Seika-cho, Kyoto 619-0288, Japan
| | | |
Collapse
|
36
|
Razzaq MA, Hussain J, Bang J, Hua CH, Satti FA, Rehman UU, Bilal HSM, Kim ST, Lee S. A Hybrid Multimodal Emotion Recognition Framework for UX Evaluation Using Generalized Mixture Functions. SENSORS (BASEL, SWITZERLAND) 2023; 23:s23094373. [PMID: 37177574 PMCID: PMC10181635 DOI: 10.3390/s23094373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/08/2023] [Revised: 04/03/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023]
Abstract
Multimodal emotion recognition has gained much traction in the fields of affective computing, human-computer interaction (HCI), artificial intelligence (AI), and user experience (UX). There is growing demand to automate the analysis of user emotion for HCI, AI, and UX evaluation applications in order to provide affective services. Emotion information is increasingly being obtained from video, audio, text, or physiological signals, which has led to processing emotions from multiple modalities, usually combined through ensemble-based systems with static weights. Because of limitations such as missing modality data, inter-class variation, and intra-class similarity, an effective weighting scheme is required to improve discrimination between modalities. This article accounts for differences in the importance of individual modalities and assigns them dynamic weights through a more efficient combination process based on generalized mixture (GM) functions. We present a hybrid multimodal emotion recognition (H-MMER) framework that uses a multi-view learning approach for unimodal emotion recognition and introduces multimodal feature-level fusion and decision-level fusion using GM functions. In an experimental study, we evaluated the ability of the proposed framework to model four emotional states (Happiness, Neutral, Sadness, and Anger) and found that most of them can be modeled well with high accuracy using GM functions. The experiments show that the proposed framework can model emotional states with an average accuracy of 98.19%, a significant performance gain over traditional approaches. The overall evaluation results indicate that we can identify emotional states with high accuracy and increase the robustness of an emotion classification system required for UX measurement.
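As a rough illustration of decision-level fusion with dynamic weights, the sketch below weights each modality's class-probability vector by a confidence score derived from its entropy. This is only a stand-in: the paper's generalized mixture (GM) functions are not reproduced, and the probability vectors are invented.

```python
# Stand-in for decision-level fusion with dynamic weights: each modality's class-probability
# vector is weighted by a confidence score (negative-entropy based). Not the paper's GM functions.
import numpy as np

def dynamic_weighted_fusion(prob_vectors, eps=1e-12):
    probs = np.asarray(prob_vectors)                     # shape: (n_modalities, n_classes)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    confidence = np.exp(-entropy)                        # lower entropy -> larger weight
    weights = confidence / confidence.sum()
    return weights @ probs                               # dynamically weighted mixture

p_video = [0.70, 0.10, 0.10, 0.10]   # Happiness, Neutral, Sadness, Anger (invented values)
p_audio = [0.30, 0.30, 0.20, 0.20]
p_text  = [0.55, 0.25, 0.10, 0.10]
print(dynamic_weighted_fusion([p_video, p_audio, p_text]))
```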
Collapse
Affiliation(s)
- Muhammad Asif Razzaq
- Department of Computer Science, Fatima Jinnah Women University, Rawalpindi 46000, Pakistan
- Ubiquitous Computing Lab, Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si 17104, Republic of Korea
| | - Jamil Hussain
- Department of Data Science, Sejong University, Seoul 30019, Republic of Korea
| | - Jaehun Bang
- Hanwha Corporation/Momentum, Hanwha Building, 86 Cheonggyecheon-ro, Jung-gu, Seoul 04541, Republic of Korea
| | - Cam-Hao Hua
- Ubiquitous Computing Lab, Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si 17104, Republic of Korea
| | - Fahad Ahmed Satti
- Ubiquitous Computing Lab, Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si 17104, Republic of Korea
- Department of Computing, School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan
| | - Ubaid Ur Rehman
- Ubiquitous Computing Lab, Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si 17104, Republic of Korea
- Department of Computing, School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan
| | - Hafiz Syed Muhammad Bilal
- Department of Computing, School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan
| | - Seong Tae Kim
- Ubiquitous Computing Lab, Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si 17104, Republic of Korea
| | - Sungyoung Lee
- Ubiquitous Computing Lab, Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si 17104, Republic of Korea
| |
Collapse
|
37
|
Heffer N, Dennie E, Ashwin C, Petrini K, Karl A. Multisensory processing of emotional cues predicts intrusive memories after virtual reality trauma. VIRTUAL REALITY 2023; 27:2043-2057. [PMID: 37614716 PMCID: PMC10442266 DOI: 10.1007/s10055-023-00784-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/17/2022] [Accepted: 03/03/2023] [Indexed: 08/25/2023]
Abstract
Research has shown that high trait anxiety can alter multisensory processing of threat cues (by amplifying integration of angry faces and voices); however, it remains unknown whether differences in multisensory processing play a role in the psychological response to trauma. This study examined the relationship between multisensory emotion processing and intrusive memories over seven days following exposure to an analogue trauma in a sample of 55 healthy young adults. We used an adapted version of the trauma film paradigm, where scenes showing a car accident trauma were presented using virtual reality, rather than a conventional 2D film. Multisensory processing was assessed prior to the trauma simulation using a forced choice emotion recognition paradigm with happy, sad and angry voice-only, face-only, audiovisual congruent (face and voice expressed matching emotions) and audiovisual incongruent expressions (face and voice expressed different emotions). We found that increased accuracy in recognising anger (but not happiness and sadness) in the audiovisual condition relative to the voice- and face-only conditions was associated with more intrusions following VR trauma. Despite previous results linking trait anxiety and intrusion development, no significant influence of trait anxiety on intrusion frequency was observed. Enhanced integration of threat-related information (i.e. angry faces and voices) could lead to overly threatening appraisals of stressful life events and result in greater intrusion development after trauma. Supplementary Information The online version contains supplementary material available at 10.1007/s10055-023-00784-1.
Collapse
Affiliation(s)
- Naomi Heffer
- Department of Psychology, University of Bath, Claverton Down, Bath, BA2 7AY UK
- School of Sciences, Bath Spa University, Bath, UK
| | - Emma Dennie
- Mood Disorders Centre, University of Exeter, Exeter, UK
| | - Chris Ashwin
- Department of Psychology, University of Bath, Claverton Down, Bath, BA2 7AY UK
- Centre for Applied Autism Research (CAAR), Bath, UK
| | - Karin Petrini
- Department of Psychology, University of Bath, Claverton Down, Bath, BA2 7AY UK
- The Centre for the Analysis of Motion, Entertainment Research and Applications (CAMERA), Bath, UK
| | - Anke Karl
- Mood Disorders Centre, University of Exeter, Exeter, UK
| |
Collapse
|
38
|
Tanko D, Demir FB, Dogan S, Sahin SE, Tuncer T. Automated speech emotion polarization for a distance education system based on orbital local binary pattern and an appropriate sub-band selection technique. MULTIMEDIA TOOLS AND APPLICATIONS 2023:1-18. [PMID: 37362680 PMCID: PMC10068203 DOI: 10.1007/s11042-023-14648-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/15/2021] [Revised: 08/02/2022] [Accepted: 02/03/2023] [Indexed: 06/28/2023]
Abstract
Distance education was widely adopted by many institutions of learning during the Covid-19 pandemic. To measure the effectiveness of this mode of teaching, it is essential to evaluate the performance of lecturers, and an automated speech emotion recognition model is one solution. This research aims to develop an accurate speech emotion recognition model that assesses lecturers'/instructors' emotional state during lecture presentations. To achieve this aim, a new speech emotion dataset is collected and an automated speech emotion recognition (SER) model is proposed. The presented SER model contains three main phases: (i) feature extraction using a multi-level discrete wavelet transform (DWT) and a one-dimensional orbital local binary pattern (1D-OLBP), (ii) feature selection using neighborhood component analysis (NCA), and (iii) classification using a support vector machine (SVM) with ten-fold cross-validation. The proposed 1D-OLBP and NCA-based model is tested on the collected dataset, which contains three emotional states across 7101 sound segments, and achieves a classification accuracy of 93.40%. Moreover, the proposed architecture has been tested on three publicly available speech emotion recognition datasets to highlight its general classification ability. We reached classification accuracies of over 70% on all three public datasets, demonstrating the success of this model.
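Two of the ingredients named in the pipeline, multi-level DWT sub-bands and a one-dimensional local binary pattern, can be sketched with PyWavelets and NumPy as below. The orbital variant (1D-OLBP), the sub-band selection rule, and the NCA/SVM stages are not reproduced; the wavelet, decomposition level, and neighbourhood radius are assumptions.

```python
# Sketch of multi-level DWT sub-bands plus a basic 1-D local binary pattern histogram.
# The paper's orbital LBP variant, sub-band selection, and NCA/SVM stages are not reproduced.
import numpy as np
import pywt

def lbp_1d_histogram(x, radius=4):
    """Compare each sample with its 2*radius neighbours and histogram the resulting codes."""
    n_bins = 2 ** (2 * radius)
    codes = []
    for i in range(radius, len(x) - radius):
        neighbours = np.concatenate([x[i - radius:i], x[i + 1:i + 1 + radius]])
        bits = (neighbours >= x[i]).astype(int)
        codes.append(int("".join(map(str, bits)), 2))
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / max(hist.sum(), 1)

sr = 16000
signal = np.sin(2 * np.pi * 200.0 * np.linspace(0, 1, sr))      # stand-in speech segment
sub_bands = pywt.wavedec(signal, "db4", level=4)                 # approximation + detail sub-bands
features = np.concatenate([lbp_1d_histogram(band) for band in sub_bands])
```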
Collapse
Affiliation(s)
- Dahiru Tanko
- Department of Digital Forensics Engineering, College of Technology, Firat University, Elazig, Turkey
| | - Fahrettin Burak Demir
- Deparment of Software Engineering, Faculty of Engineering and Natural Sciences, Bandirma Onyedi Eylul University, Bandirma, Turkey
| | - Sengul Dogan
- Department of Digital Forensics Engineering, College of Technology, Firat University, Elazig, Turkey
| | - Sakir Engin Sahin
- Department of Computer Technologies, Arapgir Vocational School, Malatya Turgut Ozal University, Malatya, Turkey
| | - Turker Tuncer
- Department of Digital Forensics Engineering, College of Technology, Firat University, Elazig, Turkey
| |
Collapse
|
39
|
Gong B, Li N, Li Q, Yan X, Chen J, Li L, Wu X, Wu C. The Mandarin Chinese auditory emotions stimulus database: A validated set of Chinese pseudo-sentences. Behav Res Methods 2023; 55:1441-1459. [PMID: 35641682 DOI: 10.3758/s13428-022-01868-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/29/2022] [Indexed: 11/08/2022]
Abstract
Emotional prosody is fully embedded in language and can be influenced by the linguistic properties of a specific language. Considering the limitations of existing Chinese auditory stimulus database studies, we developed and validated an emotional auditory stimuli database composed of Chinese pseudo-sentences, recorded by six professional actors in Mandarin Chinese. Emotional expressions included happiness, sadness, anger, fear, disgust, pleasant surprise, and neutrality. All emotional categories were vocalized into two types of sentence patterns, declarative and interrogative. In addition, all emotional pseudo-sentences, except for neutral, were vocalized at two levels of emotional intensity: normal and strong. Each recording was validated with 40 native Chinese listeners in terms of the recognition accuracy of the intended emotion portrayal; finally, 4361 pseudo-sentence stimuli were included in the database. Validation of the database using a forced-choice recognition paradigm revealed high rates of emotional recognition accuracy. The detailed acoustic attributes of vocalization were provided and connected to the emotion recognition rates. This corpus could be a valuable resource for researchers and clinicians to explore the behavioral and neural mechanisms underlying emotion processing of the general population and emotional disturbances in neurological, psychiatric, and developmental disorders. The Mandarin Chinese auditory emotion stimulus database is available at the Open Science Framework ( https://osf.io/sfbm6/?view_only=e22a521e2a7d44c6b3343e11b88f39e3 ).
Collapse
Affiliation(s)
- Bingyan Gong
- School of Nursing, Peking University Health Science Center, Room 510, 38 Xueyuan Road, Haidian District, Beijing, 100191, China
| | - Na Li
- Theatre Pedagogy Department, Central Academy of Drama, Beijing, 100710, China
| | - Qiuhong Li
- School of Nursing, Peking University Health Science Center, Room 510, 38 Xueyuan Road, Haidian District, Beijing, 100191, China
| | - Xinyuan Yan
- School of Computing, University of Utah, Salt Lake City, UT, USA
| | - Jing Chen
- Department of Machine Intelligence, Peking University, 5 Yiheyuan Road, Haidian District, Beijing, 100871, China
- Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
| | - Liang Li
- School of Psychological and Cognitive Sciences, Peking University, Beijing, 100871, China
| | - Xihong Wu
- Department of Machine Intelligence, Peking University, 5 Yiheyuan Road, Haidian District, Beijing, 100871, China.
- Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China.
| | - Chao Wu
- School of Nursing, Peking University Health Science Center, Room 510, 38 Xueyuan Road, Haidian District, Beijing, 100191, China.
| |
Collapse
|
40
|
Reece A, Cooney G, Bull P, Chung C, Dawson B, Fitzpatrick C, Glazer T, Knox D, Liebscher A, Marin S. The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation. SCIENCE ADVANCES 2023; 9:eadf3197. [PMID: 37000886 PMCID: PMC10065445 DOI: 10.1126/sciadv.adf3197] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/13/2022] [Accepted: 03/02/2023] [Indexed: 06/19/2023]
Abstract
People spend a substantial portion of their lives engaged in conversation, and yet, our scientific understanding of conversation is still in its infancy. Here, we introduce a large, novel, and multimodal corpus of 1656 conversations recorded in spoken English. This 7+ million word, 850-hour corpus totals more than 1 terabyte of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, together with an extensive survey of speakers' postconversation reflections. By taking advantage of the considerable scope of the corpus, we explore many examples of how this large-scale public dataset may catalyze future research, particularly across disciplinary boundaries, as scholars from a variety of fields appear increasingly interested in the study of conversation.
Collapse
Affiliation(s)
| | - Gus Cooney
- University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Peter Bull
- DrivenData Inc., Berkeley, CA, 94709, USA
| | | | | | | | | | - Dean Knox
- University of Pennsylvania, Philadelphia, PA 19104, USA
| | | | | |
Collapse
|
41
|
Cronin SL, Lipp OV, Marinovic W. Pupil Dilation During Encoding, But Not Type of Auditory Stimulation, Predicts Recognition Success in Face Memory. Biol Psychol 2023; 178:108547. [PMID: 36972756 DOI: 10.1016/j.biopsycho.2023.108547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2022] [Revised: 03/19/2023] [Accepted: 03/24/2023] [Indexed: 03/29/2023]
Abstract
We encounter and process information from multiple sensory modalities in our daily lives, and research suggests that learning can be more efficient when contexts are multisensory. In this study, we were interested in whether face identity recognition memory might be improved in multisensory learning conditions, and we explored associated changes in pupil dilation during encoding and recognition. In two studies, participants completed old/new face recognition tasks in which visual face stimuli were presented in the context of sounds. Faces were learnt alongside no sound, low-arousal sounds (Experiment 1), high-arousal non-face-relevant sounds, or high-arousal face-relevant sounds (Experiment 2). We predicted that the presence of sounds during encoding would improve later recognition accuracy; however, the results did not support this, with no effect of sound condition on memory. Pupil dilation, however, was found to predict later successful recognition both at encoding and during recognition. While these results do not provide support for the notion that face learning is improved under multisensory conditions relative to unisensory conditions, they do suggest that pupillometry may be a useful tool to further explore face identity learning and recognition.
Collapse
Affiliation(s)
- Sophie L Cronin
- School of Population Health, Discipline of Psychology, Curtin University, Perth, Western Australia
| | - Ottmar V Lipp
- School of Psychology and Counselling, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Welber Marinovic
- School of Population Health, Discipline of Psychology, Curtin University, Perth, Western Australia.
| |
Collapse
|
42
|
Hajek P, Munk M. Speech emotion recognition and text sentiment analysis for financial distress prediction. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08470-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/29/2023]
Abstract
In recent years, there has been an increasing interest in text sentiment analysis and speech emotion recognition in finance due to their potential to capture the intentions and opinions of corporate stakeholders, such as managers and investors. A considerable performance improvement in forecasting company financial performance was achieved by taking textual sentiment into account. However, far too little attention has been paid to managerial emotional states and their potential contribution to financial distress prediction. This study seeks to address this problem by proposing a deep learning architecture that uniquely combines managerial emotional states extracted using speech emotion recognition with FinBERT-based sentiment analysis of earnings conference call transcripts. Thus, the obtained information is fused with traditional financial indicators to achieve a more accurate prediction of financial distress. The proposed model is validated using 1278 earnings conference calls of the 40 largest US companies. The findings of this study provide evidence on the essential role of managerial emotions in predicting financial distress, even when compared with sentiment indicators obtained from text. The experimental results also demonstrate the high accuracy of the proposed model compared with state-of-the-art prediction models.
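A schematic late-fusion sketch of the general idea, assuming three precomputed feature blocks per conference call (speech-emotion probabilities, text-sentiment scores, and traditional financial ratios); the placeholder arrays, their dimensions, and the classifier choice are illustrative assumptions rather than the authors' architecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_calls = 1278  # number of earnings calls in the study

# Hypothetical precomputed features per call (placeholders for illustration only):
speech_emotions = rng.random((n_calls, 7))     # e.g. probabilities over 7 emotion classes
text_sentiment = rng.random((n_calls, 3))      # e.g. positive/neutral/negative sentiment scores
financial_ratios = rng.random((n_calls, 10))   # e.g. leverage, liquidity, profitability ratios
distressed = rng.integers(0, 2, n_calls)       # binary financial-distress label

# Late fusion: concatenate the modality-specific features and fit a single classifier.
X = np.hstack([speech_emotions, text_sentiment, financial_ratios])
clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, distressed, cv=5).mean())  # random placeholders, so roughly chance
```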
Collapse
|
43
|
Olatinwo DD, Abu-Mahfouz A, Hancke G, Myburgh H. IoT-Enabled WBAN and Machine Learning for Speech Emotion Recognition in Patients. SENSORS (BASEL, SWITZERLAND) 2023; 23:2948. [PMID: 36991659 PMCID: PMC10056097 DOI: 10.3390/s23062948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/05/2023] [Revised: 02/27/2023] [Accepted: 03/03/2023] [Indexed: 06/19/2023]
Abstract
The Internet of things (IoT)-enabled wireless body area network (WBAN) is an emerging technology that combines medical devices, wireless devices, and non-medical devices for healthcare management applications. Speech emotion recognition (SER) is an active research field in the healthcare domain and machine learning; it is a technique that can be used to automatically identify speakers' emotions from their speech. However, SER systems, especially in the healthcare domain, face several challenges, such as low prediction accuracy, high computational complexity, delays in real-time prediction, and the difficulty of identifying appropriate features from speech. Motivated by these research gaps, we propose an emotion-aware IoT-enabled WBAN system within the healthcare framework in which data processing and long-range data transmission are performed by an edge AI system, enabling real-time prediction of patients' speech emotions and capturing changes in emotion before and after treatment. Additionally, we investigated the effectiveness of different machine learning and deep learning algorithms in terms of classification performance, feature extraction methods, and normalization methods. We developed a hybrid deep learning model, i.e., a convolutional neural network (CNN) combined with bidirectional long short-term memory (BiLSTM), and a regularized CNN model. We combined the models with different optimization strategies and regularization techniques to improve prediction accuracy, reduce generalization error, and reduce the computational complexity of the neural networks in terms of computational time, power, and space. Different experiments were performed to check the efficiency and effectiveness of the proposed machine learning and deep learning algorithms. The proposed models were compared with a related existing model using standard performance metrics such as prediction accuracy, precision, recall, F1 score, the confusion matrix, and the differences between actual and predicted values. The experimental results showed that one of the proposed models outperformed the existing model with an accuracy of about 98%.
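The hybrid CNN-BiLSTM idea can be sketched in a few lines of Keras; the input shape (MFCC frames by coefficients), the layer sizes, and the seven-class output are illustrative assumptions, not the paper's exact configuration.

```python
import tensorflow as tf

# Sketch of a CNN + BiLSTM speech-emotion classifier over MFCC sequences.
# Input shape (200 frames x 40 MFCCs) and layer sizes are assumptions for illustration.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(200, 40)),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),  # local spectral patterns
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),       # temporal context in both directions
    tf.keras.layers.Dropout(0.3),                                  # regularisation to curb over-fitting
    tf.keras.layers.Dense(7, activation="softmax"),                # e.g. six emotions plus neutral
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```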
Collapse
Affiliation(s)
- Damilola D. Olatinwo
- Department of Electrical, Electronic and Computer Engineering, University of Pretoria, Pretoria 0001, South Africa
| | - Adnan Abu-Mahfouz
- Department of Electrical, Electronic and Computer Engineering, University of Pretoria, Pretoria 0001, South Africa
- Council for Scientific and Industrial Research (CSIR), Pretoria 0184, South Africa
| | - Gerhard Hancke
- Department of Electrical, Electronic and Computer Engineering, University of Pretoria, Pretoria 0001, South Africa
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
| | - Hermanus Myburgh
- Department of Electrical, Electronic and Computer Engineering, University of Pretoria, Pretoria 0001, South Africa
| |
Collapse
|
44
|
Aspect-Based Sentiment Analysis of Customer Speech Data Using Deep Convolutional Neural Network and BiLSTM. Cognit Comput 2023. [DOI: 10.1007/s12559-023-10127-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/08/2023]
|
45
|
van Rijn P, Larrouy-Maestri P. Modelling individual and cross-cultural variation in the mapping of emotions to speech prosody. Nat Hum Behav 2023; 7:386-396. [PMID: 36646838 PMCID: PMC10038802 DOI: 10.1038/s41562-022-01505-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2021] [Accepted: 11/28/2022] [Indexed: 01/18/2023]
Abstract
The existence of a mapping between emotions and speech prosody is commonly assumed. We propose a Bayesian modelling framework to analyse this mapping. Our models are fitted to a large collection of intended emotional prosody comprising more than 3,000 minutes of recordings. Our descriptive study reveals that the mapping within corpora is relatively constant, whereas the mapping varies across corpora. To account for this heterogeneity, we fit a series of increasingly complex models. Model comparison reveals that models taking into account mapping differences across countries, languages, sexes and individuals outperform models that only assume a global mapping. Further analysis shows that differences across individuals, cultures and sexes contribute more to the model prediction than a shared global mapping. Our models, which can be explored in an online interactive visualization, offer a description of the mapping between acoustic features and emotions in prosody.
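A drastically simplified, non-Bayesian stand-in for the modelling idea: compare a model with a single global emotion-to-prosody mapping against one that lets the mapping vary by speaker within corpus. The file, column names, and the choice of mean F0 as the prosodic feature are assumptions, and the mixed model here is only a rough frequentist analogue of the paper's hierarchical Bayesian models.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format table: one row per utterance with its mean F0,
# the intended emotion, and speaker / corpus identifiers (column names assumed).
df = pd.read_csv("prosody_features.csv")

# Global mapping only: emotion as a fixed effect shared by everyone.
global_model = smf.ols("mean_f0 ~ C(emotion)", data=df).fit()

# Varying mapping: random emotion effects for speakers nested in corpora.
df["speaker_in_corpus"] = df["corpus"].astype(str) + ":" + df["speaker"].astype(str)
varying_model = smf.mixedlm("mean_f0 ~ C(emotion)", data=df,
                            groups=df["speaker_in_corpus"],
                            re_formula="~C(emotion)").fit()

# Higher log-likelihood means a better in-sample fit; a full comparison
# would also penalise the extra parameters of the varying-mapping model.
print(global_model.llf, varying_model.llf)
```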
Collapse
Affiliation(s)
- Pol van Rijn
- Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany.
| | - Pauline Larrouy-Maestri
- Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Max Planck-NYU Center for Language, Music, and Emotion, New York, NY, USA
| |
Collapse
|
46
|
Xia W, Zhang Y, Yang Y, Xue JH, Zhou B, Yang MH. GAN Inversion: A Survey. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:3121-3138. [PMID: 37022469 DOI: 10.1109/tpami.2022.3181070] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
GAN inversion aims to invert a given image back into the latent space of a pretrained GAN model so that the image can be faithfully reconstructed from the inverted code by the generator. As an emerging technique to bridge the real and fake image domains, GAN inversion plays an essential role in enabling pretrained GAN models, such as StyleGAN and BigGAN, for applications of real image editing. Moreover, GAN inversion interprets GAN's latent space and examines how realistic images can be generated. In this paper, we provide a survey of GAN inversion with a focus on its representative algorithms and its applications in image restoration and image manipulation. We further discuss the trends and challenges for future research. A curated list of GAN inversion methods, datasets, and other related information can be found at https://github.com/weihaox/awesome-gan-inversion.
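The core optimization-based inversion loop described above can be sketched in a few lines of PyTorch; the generator, its latent dimensionality, and the plain pixel-wise loss are placeholders here (practical systems typically add perceptual losses and encoder-based initialization), so this is a sketch of the technique rather than any particular method from the survey.

```python
import torch

# Minimal optimization-based GAN-inversion sketch. Assumptions: `generator` is a
# pretrained, frozen PyTorch generator mapping a latent vector to an image, and
# `target` is the real image tensor to invert; both are hypothetical inputs.
def invert(generator, target, latent_dim=512, steps=500, lr=0.05):
    generator.eval()                                           # generator weights stay fixed
    z = torch.randn(1, latent_dim, requires_grad=True)         # initial latent code
    optimizer = torch.optim.Adam([z], lr=lr)                   # only z is updated
    for _ in range(steps):
        optimizer.zero_grad()
        recon = generator(z)                                   # G(z): candidate reconstruction
        loss = torch.nn.functional.mse_loss(recon, target)     # pixel reconstruction loss
        loss.backward()
        optimizer.step()
    return z.detach()                                          # inverted code, reusable for editing
```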
Collapse
|
47
|
Mustaqeem, El Saddik A, Alotaibi FS, Pham NT. AAD-Net: Advanced end-to-end speech signal system for human emotion detection & recognition using attention-based deep echo state network. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/03/2023]
|
48
|
Ahmad M, Sanawar S, Alfandi O, Qadri SF, Saeed IA, Khan S, Hayat B, Ahmad A. Facial expression recognition using lightweight deep learning modeling. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:8208-8225. [PMID: 37161193 DOI: 10.3934/mbe.2023357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/11/2023]
Abstract
Facial expression is a form of communication and is useful in many areas of computer vision, including intelligent visual surveillance, human-robot interaction, and human behavior analysis. A deep learning approach is presented to classify happy, sad, angry, fearful, contemptuous, surprised, and disgusted expressions. Accurate detection and classification of human facial expressions is a challenging task in image processing because of complicating factors such as changes in illumination, occlusion, noise, and the risk of over-fitting. A stacked sparse auto-encoder for facial expression recognition (SSAE-FER) is used for unsupervised pre-training followed by supervised fine-tuning. SSAE-FER automatically extracts features from input images, and a softmax classifier is used to classify the expressions. Our method achieved an accuracy of 92.50% on the JAFFE dataset and 99.30% on the CK+ dataset, performing well compared with other methods in the same domain.
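A compact sketch of the stacked-sparse-autoencoder-plus-softmax recipe in Keras: each autoencoder is pre-trained greedily with an L1 activity penalty (the "sparse" part), then the encoders are stacked under a softmax head and fine-tuned. The image size, layer widths, penalty strength, and training schedule are assumptions, not the paper's settings, and the random arrays stand in for real face data.

```python
import numpy as np
import tensorflow as tf

L1 = tf.keras.regularizers.l1(1e-5)  # sparsity penalty on hidden activations

def pretrain_autoencoder(x, n_hidden):
    """Greedy unsupervised pre-training of one sparse autoencoder layer."""
    inp = tf.keras.Input(shape=(x.shape[1],))
    code = tf.keras.layers.Dense(n_hidden, activation="relu", activity_regularizer=L1)(inp)
    recon = tf.keras.layers.Dense(x.shape[1], activation="sigmoid")(code)
    ae = tf.keras.Model(inp, recon)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(x, x, epochs=5, batch_size=64, verbose=0)   # learn to reconstruct the input
    encoder = tf.keras.Model(inp, code)
    return encoder, encoder.predict(x, verbose=0)

# Placeholder data standing in for flattened 48x48 grey-scale face images and 7 labels.
x_train = np.random.rand(1000, 48 * 48).astype("float32")
y_train = np.random.randint(0, 7, 1000)

enc1, h1 = pretrain_autoencoder(x_train, 256)   # first sparse AE on raw pixels
enc2, h2 = pretrain_autoencoder(h1, 128)        # second sparse AE on first-layer codes

# Stack the pre-trained encoders under a softmax classifier and fine-tune end to end.
clf = tf.keras.Sequential([enc1, enc2, tf.keras.layers.Dense(7, activation="softmax")])
clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
clf.fit(x_train, y_train, epochs=5, batch_size=64, verbose=0)
```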
Collapse
Affiliation(s)
- Mubashir Ahmad
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Tobe Camp, Abbottabad-22060, Pakistan
- Department of Computer Science, the University of Lahore, Sargodha Campus 40100, Pakistan
| | - Saira Sanawar
- Department of Computer Science, the University of Lahore, Sargodha Campus 40100, Pakistan
| | - Omar Alfandi
- College of Technological Innovation, Zayed University, Abu Dhabi, UAE
| | - Syed Furqan Qadri
- Research Center for Healthcare Data Science, Zhejiang Lab, Hangzhou 311121, China
| | - Iftikhar Ahmed Saeed
- Department of Computer Science, the University of Lahore, Sargodha Campus 40100, Pakistan
| | - Salabat Khan
- College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China
| | - Bashir Hayat
- Department of Computer Science, Institute of Management Sciences, Peshawar, Pakistan
| | - Arshad Ahmad
- Department of IT & CS, Pak-Austria Fachhochschule: Institute of Applied Sciences and Technology (PAF-IAST), Haripur 22620, Pakistan
| |
Collapse
|
49
|
Pucci F, Fedele P, Dimitri GM. Speech emotion recognition with artificial intelligence for contact tracing in the COVID‐19 pandemic. COGNITIVE COMPUTATION AND SYSTEMS 2023. [DOI: 10.1049/ccs2.12076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/10/2023] Open
Affiliation(s)
- Francesco Pucci
- DIISM, Università degli Studi di Siena, Siena, Italy
- Blu Pantheon, Siena, Italy
| | | | | |
Collapse
|
50
|
Leung FYN, Stojanovik V, Micai M, Jiang C, Liu F. Emotion recognition in autism spectrum disorder across age groups: A cross-sectional investigation of various visual and auditory communicative domains. Autism Res 2023; 16:783-801. [PMID: 36727629 DOI: 10.1002/aur.2896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2022] [Accepted: 01/19/2023] [Indexed: 02/03/2023]
Abstract
Previous research on emotion processing in autism spectrum disorder (ASD) has predominantly focused on human faces and speech prosody, with little attention paid to other domains such as nonhuman faces and music. In addition, emotion processing in different domains was often examined in separate studies, making it challenging to evaluate whether emotion recognition difficulties in ASD generalize across domains and age cohorts. The present study investigated: (i) the recognition of basic emotions (angry, scared, happy, and sad) across four domains (human faces, face-like objects, speech prosody, and song) in 38 autistic and 38 neurotypical (NT) children, adolescents, and adults in a forced-choice labeling task, and (ii) the impact of pitch and visual processing profiles on this ability. Results showed similar recognition accuracy between the ASD and NT groups across age groups for all domains and emotion types, although processing speed was slower in the ASD group than in the NT group. Age-related differences were seen in both groups, which varied by emotion, domain, and performance index. Visual processing style was associated with facial emotion recognition speed, and pitch perception ability was associated with auditory emotion recognition, in the NT group but not in the ASD group. These findings suggest that autistic individuals may employ different emotion processing strategies compared to NT individuals, and that emotion recognition difficulties, as manifested by slower response times, may result from a generalized rather than a domain-specific underlying mechanism that governs emotion recognition processes across domains in ASD.
Collapse
Affiliation(s)
- Florence Y N Leung
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
- Department of Psychology, University of Bath, Bath, UK
| | - Vesna Stojanovik
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
| | - Martina Micai
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
| | - Cunmei Jiang
- Music College, Shanghai Normal University, Shanghai, China
| | - Fang Liu
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
| |
Collapse
|