Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Kim J, Toutios A, Lee S, Narayanan SS. Vocal tract shaping of emotional speech. COMPUT SPEECH LANG 2020;64:101100. [PMID: 32523241 DOI: 10.1016/j.csl.2020.101100] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/21/2022]

Cited by Other Article(s)

Anikin A, Barreda S, Reby D. A practical guide to calculating vocal tract length and scale-invariant formant patterns. Behav Res Methods 2024;56:5588-5604. [PMID: 38158551 DOI: 10.3758/s13428-023-02288-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/02/2023] [Indexed: 01/03/2024]

Badin P, Sawallis TR, Tabain M, Lamalle L. Bilinguals from Larynx to Lips: Exploring Bilingual Articulatory Strategies with Anatomic MRI Data. LANGUAGE AND SPEECH 2024:238309231224790. [PMID: 38680040 DOI: 10.1177/00238309231224790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2024]

Chatterjee M, Gajre S, Kulkarni AM, Barrett KC, Limb CJ. Predictors of Emotional Prosody Identification by School-Age Children With Cochlear Implants and Their Peers With Normal Hearing. Ear Hear 2024;45:411-424. [PMID: 37811966 PMCID: PMC10922148 DOI: 10.1097/aud.0000000000001436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]

Abstract

OBJECTIVES

Children with cochlear implants (CIs) vary widely in their ability to identify emotions in speech. The causes of this variability are unknown, but this knowledge will be crucial if we are to design improvements in technological or rehabilitative interventions that are effective for individual patients. The objective of this study was to investigate how well factors such as age at implantation, duration of device experience (hearing age), nonverbal cognition, vocabulary, and socioeconomic status predict prosody-based emotion identification in children with CIs, and how the key predictors in this population compare to children with normal hearing who are listening to either normal emotional speech or to degraded speech.

DESIGN

We measured vocal emotion identification in 47 school-age CI recipients aged 7 to 19 years in a single-interval, 5-alternative forced-choice task. None of the participants had usable residual hearing based on parent/caregiver report. Stimuli consisted of a set of semantically emotion-neutral sentences that were recorded by 4 talkers in child-directed and adult-directed prosody corresponding to five emotions: neutral, angry, happy, sad, and scared. Twenty-one children with normal hearing were also tested in the same tasks; they listened to both original speech and to versions that had been noise-vocoded to simulate CI information processing.
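To make the scoring of a 5-alternative forced-choice task concrete, here is a minimal Python sketch. The five response labels come from the abstract; the function name, the trial format, and the toy responses are assumptions made for illustration and do not reproduce the study's data or analysis. With five alternatives, guessing yields an expected score of 1/5.

```python
# The five response alternatives listed in the abstract.
EMOTIONS = ("neutral", "angry", "happy", "sad", "scared")

# Expected proportion correct under pure guessing: 1 / number of alternatives.
CHANCE_LEVEL = 1 / len(EMOTIONS)


def proportion_correct(trials):
    """Return the fraction of trials where the response matches the stimulus.

    `trials` is a hypothetical list of (presented, responded) label pairs.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    for presented, responded in trials:
        if presented not in EMOTIONS or responded not in EMOTIONS:
            raise ValueError("unknown emotion label")
    return sum(p == r for p, r in trials) / len(trials)


# Toy responses (invented for this example only).
trials = [
    ("happy", "happy"),
    ("sad", "neutral"),
    ("angry", "angry"),
    ("scared", "sad"),
    ("neutral", "neutral"),
]
score = proportion_correct(trials)
print(f"score={score:.2f}, chance={CHANCE_LEVEL:.2f}")
```

A score is only informative relative to the chance level, which is why the sketch reports both.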

RESULTS

Group comparison confirmed the expected deficit in CI participants' emotion identification relative to participants with normal hearing. Within the CI group, increasing hearing age (correlated with developmental age) and nonverbal cognition outcomes predicted emotion recognition scores. Stimulus-related factors such as talker and emotional category also influenced performance and were involved in interactions with hearing age and cognition. Age at implantation was not predictive of emotion identification. Unlike in the CI participants, neither cognitive status nor vocabulary predicted outcomes in participants with normal hearing, whether listening to original speech or CI-simulated speech. Age-related improvements in outcomes were similar in the two groups. Participants with normal hearing listening to original speech showed the greatest differences in their scores for different talkers and emotions. Participants with normal hearing listening to CI-simulated speech showed significant deficits compared with their performance with original speech materials, and their scores also showed the least effect of talker- and emotion-based variability. CI participants showed more variation in their scores with different talkers and emotions than participants with normal hearing listening to CI-simulated speech, but less than participants with normal hearing listening to original speech.

CONCLUSIONS

Taken together, these results confirm previous findings that pediatric CI recipients have deficits in emotion identification based on prosodic cues, but they improve with age and experience at a rate that is similar to peers with normal hearing. Unlike in participants with normal hearing, nonverbal cognition played a significant role in CI listeners' emotion identification. Specifically, nonverbal cognition predicted the extent to which individual CI users could benefit from some talkers being more expressive of emotions than others, and this effect was greater in CI users who had less experience with their device (or were younger) than in CI users who had more experience with their device (or were older). Thus, in young prelingually deaf children with CIs performing an emotional prosody identification task, cognitive resources may be harnessed to a greater degree than in older prelingually deaf children with CIs or in children with normal hearing.

Ruthven M, Peplinski AM, Adams DM, King AP, Miquel ME. Real-time speech MRI datasets with corresponding articulator ground-truth segmentations. Sci Data 2023;10:860. [PMID: 38042857 PMCID: PMC10693552 DOI: 10.1038/s41597-023-02766-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 11/20/2023] [Indexed: 12/04/2023] Open

Ruthven M, Miquel ME, King AP. A segmentation-informed deep learning framework to register dynamic two-dimensional magnetic resonance images of the vocal tract during speech. Biomed Signal Process Control 2023;80:104290. [PMID: 36743699 PMCID: PMC9746295 DOI: 10.1016/j.bspc.2022.104290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Revised: 09/29/2022] [Accepted: 10/08/2022] [Indexed: 11/06/2022]

Abstract

Objective

Dynamic magnetic resonance (MR) imaging enables visualisation of articulators during speech. There is growing interest in quantifying articulator motion in two-dimensional MR images of the vocal tract, to better understand speech production and potentially inform patient management decisions. Image registration is an established way to achieve this quantification. Recently, segmentation-informed deformable registration frameworks have been developed and have achieved state-of-the-art accuracy. This work aims to adapt such a framework and optimise it for estimating displacement fields between dynamic two-dimensional MR images of the vocal tract during speech.

Methods

A deep-learning-based registration framework was developed and compared with current state-of-the-art registration methods and frameworks (two traditional methods and three deep-learning-based frameworks, two of which are segmentation informed). The accuracy of the methods and frameworks was evaluated using the Dice coefficient (DSC), average surface distance (ASD) and a metric based on velopharyngeal closure. The closure-based metric evaluated whether the estimated displacement fields captured a clinically relevant and quantifiable aspect of articulator motion.
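As a rough illustration of the Dice coefficient (DSC) named in this abstract, the following Python sketch computes it for two binary masks as twice the overlap divided by the sum of the two mask sizes. The function name, the nested-list mask representation, the empty-mask convention, and the toy masks are assumptions made for this example; the paper's own implementation is not described here.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks (lists of rows)."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels labelled as foreground in both masks.
    intersection = sum(a and b for a, b in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks (invented): each has 3 foreground pixels, 2 of them shared.
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
dsc = dice_coefficient(predicted, reference)
print(f"DSC = {dsc:.3f}")  # 2 * 2 / (3 + 3)
```

A DSC of 1 means the two masks overlap exactly and 0 means they share no foreground pixels.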

Results

The segmentation-informed frameworks achieved higher DSCs and lower ASDs and captured more velopharyngeal closures than the traditional methods and the framework that was not segmentation informed. All segmentation-informed frameworks achieved similar DSCs and ASDs. However, the proposed framework captured the most velopharyngeal closures.

Conclusions

A framework was successfully developed and found to more accurately estimate articulator motion than five current state-of-the-art methods and frameworks.

Significance

The first deep-learning-based framework specifically for registering dynamic two-dimensional MR images of the vocal tract during speech has been developed and evaluated.

Anikin A, Pisanski K, Reby D. Static and dynamic formant scaling conveys body size and aggression. ROYAL SOCIETY OPEN SCIENCE 2022;9:211496. [PMID: 35242348 PMCID: PMC8753157 DOI: 10.1098/rsos.211496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2021] [Accepted: 12/09/2021] [Indexed: 05/03/2023]

Mexican Emotional Speech Database Based on Semantic, Frequency, Familiarity, Concreteness, and Cultural Shaping of Affective Prosody. DATA 2021. [DOI: 10.3390/data6120130] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/09/2023] Open

Lim Y, Toutios A, Bliesener Y, Tian Y, Lingala SG, Vaz C, Sorensen T, Oh M, Harper S, Chen W, Lee Y, Töger J, Monteserin ML, Smith C, Godinez B, Goldstein L, Byrd D, Nayak KS, Narayanan SS. A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images. Sci Data 2021;8:187. [PMID: 34285240 PMCID: PMC8292336 DOI: 10.1038/s41597-021-00976-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/15/2021] [Accepted: 06/22/2021] [Indexed: 12/11/2022] Open

Affiliation(s)

Yongwan Lim: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
Asterios Toutios: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
Yannick Bliesener: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
Ye Tian: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
Sajan Goud Lingala: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
Colin Vaz: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
Tanner Sorensen: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
Miran Oh: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
Sarah Harper: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
Weiyi Chen: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
Yoonjeong Lee: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
Johannes Töger: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
Mairym Lloréns Monteserin: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
Caitlin Smith: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
Bianca Godinez: Department of Linguistics, California State University Long Beach, Long Beach, California, USA
Louis Goldstein: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
Dani Byrd: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
Krishna S Nayak: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
Shrikanth S Narayanan: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA; Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA

Ruthven M, Miquel ME, King AP. Deep-learning-based segmentation of the vocal tract and articulators in real-time magnetic resonance images of speech. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2021;198:105814. [PMID: 33197740 PMCID: PMC7732702 DOI: 10.1016/j.cmpb.2020.105814] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/08/2020] [Accepted: 10/19/2020] [Indexed: 06/01/2023]

Abstract

BACKGROUND AND OBJECTIVE

Magnetic resonance (MR) imaging is increasingly used in studies of speech as it enables non-invasive visualisation of the vocal tract and articulators, thus providing information about their shape, size, motion and position. Extraction of this information for quantitative analysis is achieved using segmentation. Methods have been developed to segment the vocal tract; however, none of these also fully segments any articulators. The objective of this work was to develop a method to fully segment multiple groups of articulators as well as the vocal tract in two-dimensional MR images of speech, thus overcoming the limitations of existing methods.

METHODS

Five speech MR image sets (392 MR images in total), each of a different healthy adult volunteer, were used in this work. A fully convolutional network with an architecture similar to the original U-Net was developed to segment the following six regions in the image sets: the head, soft palate, jaw, tongue, vocal tract and tooth space. A five-fold cross-validation was performed to investigate the segmentation accuracy and generalisability of the network. The segmentation accuracy was assessed using standard overlap-based metrics (Dice coefficient and general Hausdorff distance) and a novel clinically relevant metric based on velopharyngeal closure.
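For readers unfamiliar with distance-based segmentation metrics, the sketch below computes the classic symmetric Hausdorff distance between two contours given as point lists. This is an assumption-laden illustration: the abstract refers to a "general Hausdorff distance" whose exact definition is not given here, so the classic form is swapped in, and the function names and toy coordinates are invented for the example.

```python
import math


def directed_hausdorff(points_a, points_b):
    """Largest distance from any point in A to its nearest point in B."""
    return max(min(math.dist(p, q) for q in points_b) for p in points_a)


def hausdorff_distance(points_a, points_b):
    """Classic symmetric Hausdorff distance between two non-empty point sets."""
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")
    return max(directed_hausdorff(points_a, points_b),
               directed_hausdorff(points_b, points_a))


# Toy contours (invented). If coordinates are in millimetres, i.e. pixel
# indices multiplied by the pixel spacing, the result is also in millimetres.
contour_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
contour_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 4.0)]
hd = hausdorff_distance(contour_a, contour_b)
print(f"Hausdorff distance = {hd:.1f}")  # driven by the outlier at (2.0, 4.0)
```

Unlike an overlap score such as the Dice coefficient, this measure is dominated by the single worst boundary mismatch, which is why the two are often reported together.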

RESULTS

The segmentations created by the method had a median Dice coefficient of 0.92 and a median general Hausdorff distance of 5 mm. The method segmented the head most accurately (median Dice coefficient of 0.99), and the soft palate and tooth space least accurately (median Dice coefficients of 0.92 and 0.93 respectively). The segmentations created by the method correctly showed 90% (27 out of 30) of the velopharyngeal closures in the MR image sets.

CONCLUSIONS

An automatic method to fully segment multiple groups of articulators as well as the vocal tract in two-dimensional MR images of speech was successfully developed. The method is intended for use in clinical and non-clinical speech studies which involve quantitative analysis of the shape, size, motion and position of the vocal tract and articulators. In addition, a novel clinically relevant metric for assessing the accuracy of vocal tract and articulator segmentation methods was developed.
