1
Mallikarjun A, Shroads E, Newman RS. Perception of vocoded speech in domestic dogs. Anim Cogn 2024; 27:34. PMID: 38625429. PMCID: PMC11021312. DOI: 10.1007/s10071-024-01869-3.
Abstract
Humans have an impressive ability to comprehend signal-degraded speech; however, the extent to which comprehension of degraded speech relies on human-specific features of speech perception vs. more general cognitive processes is unknown. Since dogs live alongside humans and regularly hear speech, they can be used as a model to differentiate between these possibilities. One often-studied type of degraded speech is noise-vocoded speech (sometimes thought of as cochlear-implant-simulation speech). Noise-vocoded speech is made by dividing the speech signal into frequency bands (channels), identifying the amplitude envelope of each individual band, and then using these envelopes to modulate bands of noise centered over the same frequency regions - the result is a signal with preserved temporal cues, but vastly reduced frequency information. Here, we tested dogs' recognition of familiar words produced in 16-channel vocoded speech. In the first study, dogs heard their names and unfamiliar dogs' names (foils) in vocoded speech as well as natural speech. In the second study, dogs heard 16-channel vocoded speech only. Dogs listened longer to their vocoded name than vocoded foils in both experiments, showing that they can comprehend a 16-channel vocoded version of their name without prior exposure to vocoded speech, and without immediate exposure to the natural-speech version of their name. Dogs' name recognition in the second study was mediated by the number of phonemes in the dogs' name, suggesting that phonological context plays a role in degraded speech comprehension.
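The vocoding procedure the abstract describes (band-split, envelope extraction, noise modulation) can be sketched as follows. This is a minimal illustration only, assuming a mono NumPy signal; the channel count matches the paper's 16 channels, but the band edges, filter order, and envelope method are illustrative choices, not the authors' stimulus parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=16, lo=80.0, hi=7000.0):
    """Replace spectral fine structure with noise, keeping per-band envelopes."""
    # Log-spaced band edges spanning the speech range (illustrative choice)
    edges = np.logspace(np.log10(lo), np.log10(hi), n_channels + 1)
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(len(x))
    out = np.zeros(len(x), dtype=float)
    for k in range(n_channels):
        sos = butter(4, [edges[k], edges[k + 1]], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)          # analysis band of the speech signal
        env = np.abs(hilbert(band))         # amplitude envelope of that band
        carrier = sosfiltfilt(sos, noise)   # noise restricted to the same band
        out += env * carrier                # envelope-modulated noise band
    # Match overall RMS level to the input
    out *= np.sqrt(np.mean(x**2) / (np.mean(out**2) + 1e-12))
    return out
```

Summing the envelope-modulated noise bands yields a signal with the original temporal cues but only n_channels-coarse frequency information, which is why intelligibility rises with channel count.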
Affiliation(s)
- Amritha Mallikarjun
- Penn Vet Working Dog Center, University of Pennsylvania School of Veterinary Medicine, Philadelphia, USA.
- Emily Shroads
- Department of Hearing and Speech Sciences, University of Maryland, College Park, USA
- Rochelle S Newman
- Department of Hearing and Speech Sciences, University of Maryland, College Park, USA
2
Chen YP, Schmidt F, Keitel A, Rösch S, Hauswald A, Weisz N. Speech intelligibility changes the temporal evolution of neural speech tracking. Neuroimage 2023; 268:119894. PMID: 36693596. DOI: 10.1016/j.neuroimage.2023.119894.
Abstract
Listening to speech with poor signal quality is challenging. Neural speech tracking of degraded speech has been used to advance the understanding of how brain processes and speech intelligibility are interrelated. However, the temporal dynamics of neural speech tracking and their relation to speech intelligibility are not clear. In the present MEG study, we exploited temporal response functions (TRFs), which have been used to describe the time course of speech tracking, on a gradient from intelligible to unintelligible degraded speech. In addition, we used interrelated facets of neural speech tracking (e.g., speech envelope reconstruction, speech-brain coherence, and components of broadband coherence spectra) to corroborate our TRF findings. Our TRF analysis yielded marked, temporally differential effects of vocoding: ∼50-110 ms (M50TRF), ∼175-230 ms (M200TRF), and ∼315-380 ms (M350TRF). Reduced intelligibility went along with large increases in the early peak response M50TRF but strongly reduced responses in M200TRF. In the late response M350TRF, the maximum response occurred for degraded speech that was still comprehensible and then declined with reduced intelligibility. Furthermore, we related the TRF components to our other neural "tracking" measures and found that M50TRF and M200TRF play differential roles in the shifting center frequency of the broadband coherence spectra. Overall, our study highlights the importance of time-resolved computation of neural speech tracking and decomposition of coherence spectra, and provides a better understanding of degraded speech processing.
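A TRF of the kind described above is commonly estimated as a time-lagged ridge regression from the stimulus envelope to the neural response (in the spirit of mTRF-style analyses). The sketch below is a hypothetical illustration, not the authors' pipeline; the lag window, regularization value, and variable names are assumptions.

```python
import numpy as np

def estimate_trf(stim, resp, fs, tmin=-0.05, tmax=0.4, lam=1.0):
    """Fit weights w so that resp(t) ~ sum_k w[k] * stim(t - lag_k)."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    # Time-lagged design matrix: one column per lagged copy of the stimulus
    X = np.column_stack([np.roll(stim, lag) for lag in lags])
    # Ridge-regularized least squares: w = (X'X + lam*I)^-1 X'y
    XtX = X.T @ X + lam * np.eye(len(lags))
    w = np.linalg.solve(XtX, X.T @ resp)
    times = lags / fs  # lag axis in seconds
    return times, w
```

Peaks in w at particular lags then correspond to response components such as the M50/M200/M350 discussed in the abstract.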
Affiliation(s)
- Ya-Ping Chen
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria.
- Fabian Schmidt
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria
- Anne Keitel
- Psychology, School of Social Sciences, University of Dundee, DD1 4HN Dundee, UK
- Sebastian Rösch
- Department of Otorhinolaryngology, Paracelsus Medical University, 5020 Salzburg, Austria
- Anne Hauswald
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria
- Nathan Weisz
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria; Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, 5020 Salzburg, Austria
3
Bsharat-Maalouf D, Karawani H. Learning and bilingualism in challenging listening conditions: How challenging can it be? Cognition 2022; 222:105018. PMID: 35032867. DOI: 10.1016/j.cognition.2022.105018.
Abstract
When speech is presented in their second language (L2), bilinguals have more difficulty with speech perception in noise than monolinguals do. However, how noise affects bilinguals' speech perception in their first language (L1) is still unclear. In addition, it is not clear whether bilinguals' speech perception in challenging listening conditions is specific to the type of degradation, or whether a shared mechanism underlies bilingual speech processing under complex listening conditions. Therefore, the current study examined the speech perception of 60 Arabic-Hebrew bilinguals and a control group of native Hebrew speakers under degraded (speech in noise, vocoded speech) and quiet listening conditions. Between-participant comparisons (native Hebrew speakers vs. bilinguals' perceptual performance in L1) and within-participant comparisons (bilinguals' perceptual performance in L1 vs. L2) were conducted. The findings showed that bilinguals in L1 had more difficulty in noisy conditions than their control counterparts did, even when they performed like controls under favorable listening conditions. However, bilingualism did not hinder language-learning mechanisms. Bilinguals in L1 outperformed native Hebrew speakers in the perception of vocoded speech, demonstrating more extended learning processes. Bilinguals' perceptual performance in L1 versus L2 varied by task complexity. Correlation analyses revealed that bilinguals who coped better with noise degradation were more successful in perceiving the vocoding distortion. Together, these results provide insights into the mechanisms that contribute to speech perception in challenging listening conditions and suggest that bilinguals' language proficiency and age of language acquisition are not the only factors that affect performance. Rather, duration of exposure to the languages, co-activation, and the ability to benefit from exposure to novel stimuli appear to affect bilinguals' perceptual performance, even when they operate in their dominant language. Our findings suggest that bilinguals use a shared mechanism for speech processing under challenging listening conditions.
Affiliation(s)
- Dana Bsharat-Maalouf
- Department of Communication Sciences and Disorders, University of Haifa, Haifa, Israel
- Hanin Karawani
- Department of Communication Sciences and Disorders, University of Haifa, Haifa, Israel
4
Defenderfer J, Forbes S, Wijeakumar S, Hedrick M, Plyler P, Buss AT. Frontotemporal activation differs between perception of simulated cochlear implant speech and speech in background noise: An image-based fNIRS study. Neuroimage 2021; 240:118385. PMID: 34256138. PMCID: PMC8503862. DOI: 10.1016/j.neuroimage.2021.118385.
Abstract
In this study we used functional near-infrared spectroscopy (fNIRS) to investigate neural responses in normal-hearing adults as a function of speech recognition accuracy, intelligibility of the speech stimulus, and the manner in which speech is distorted. Participants listened to sentences and reported aloud what they heard. Speech quality was distorted artificially by vocoding (simulated cochlear implant speech) or naturally by adding background noise. Each type of distortion included high- and low-intelligibility conditions. Sentences in quiet served as a baseline comparison. fNIRS data were analyzed using a newly developed image-reconstruction approach. First, elevated cortical responses in the middle temporal gyrus (MTG) and middle frontal gyrus (MFG) were associated with speech recognition during the low-intelligibility conditions. Second, activation in the MTG was associated with recognition of vocoded speech with low intelligibility, whereas MFG activity was largely driven by recognition of speech in background noise, suggesting that the cortical response varies as a function of distortion type. Lastly, an accuracy effect in the MFG demonstrated significantly higher activation during correct relative to incorrect perception of speech. These results suggest that normal-hearing adults (i.e., listeners untrained on vocoded stimuli) do not exploit the same attentional mechanisms of the frontal cortex used to resolve naturally degraded speech and may instead rely on segmental and phonetic analyses in the temporal lobe to discriminate vocoded speech.
Affiliation(s)
- Jessica Defenderfer
- Speech and Hearing Science, University of Tennessee Health Science Center, Knoxville, TN, United States.
- Samuel Forbes
- Psychology, University of East Anglia, Norwich, England
- Mark Hedrick
- Speech and Hearing Science, University of Tennessee Health Science Center, Knoxville, TN, United States
- Patrick Plyler
- Speech and Hearing Science, University of Tennessee Health Science Center, Knoxville, TN, United States
- Aaron T Buss
- Psychology, University of Tennessee, Knoxville, TN, United States
5
Lin IF, Itahashi T, Kashino M, Kato N, Hashimoto RI. Brain activations while processing degraded speech in adults with autism spectrum disorder. Neuropsychologia 2021; 152:107750. PMID: 33417913. DOI: 10.1016/j.neuropsychologia.2021.107750.
Abstract
Individuals with autism spectrum disorder (ASD) are found to have difficulty understanding speech in adverse conditions. In this study, we used noise-vocoded speech (VS) to investigate neural processing of degraded speech in individuals with ASD. We ran fMRI experiments in the ASD group and a typically developed control (TDC) group while they listened to clear speech (CS), VS, and spectrally rotated VS (SRVS); they were asked to pay attention to each heard sentence and answer whether it was intelligible or not. The VS used in this experiment was spectrally degraded but still intelligible, whereas the SRVS was unintelligible. We recruited 21 right-handed adult males with ASD and 24 age-matched, right-handed male TDC participants. Compared with the TDC group, we observed reduced functional connectivity (FC) between the left dorsal premotor cortex and left temporoparietal junction in the ASD group for the effect of task difficulty in speech processing, computed as VS - (CS + SRVS)/2. Furthermore, the observed reduction in FC was negatively correlated with Autism-Spectrum Quotient scores. This observation supports our hypothesis that a disrupted dorsal stream for attentive processing of degraded speech in individuals with ASD might be related to their difficulty understanding speech in adverse conditions.
Affiliation(s)
- I-Fan Lin
- Communication Science Laboratories, NTT Corporation, Atsugi, Kanagawa, 243-0124, Japan; Department of Medicine, Taipei Medical University, Taipei, Taiwan, 11031; Department of Occupational Medicine, Shuang Ho Hospital, New Taipei City, Taiwan, 23561.
- Takashi Itahashi
- Medical Institute of Developmental Disabilities Research, Showa University Karasuyama Hospital, Tokyo, 157-8577, Japan
- Makio Kashino
- Communication Science Laboratories, NTT Corporation, Atsugi, Kanagawa, 243-0124, Japan; School of Engineering, Tokyo Institute of Technology, Yokohama, 226-8503, Japan; Graduate School of Education, University of Tokyo, Tokyo, 113-0033, Japan
- Nobumasa Kato
- Medical Institute of Developmental Disabilities Research, Showa University Karasuyama Hospital, Tokyo, 157-8577, Japan
- Ryu-Ichiro Hashimoto
- Medical Institute of Developmental Disabilities Research, Showa University Karasuyama Hospital, Tokyo, 157-8577, Japan; Department of Language Sciences, Tokyo Metropolitan University, Tokyo, 192-0364, Japan
6
Jain S, Vipin Ghosh PG. Acoustic simulation of cochlear implant hearing: Effect of manipulating various acoustic parameters on intelligibility of speech. Cochlear Implants Int 2017; 19:46-53. PMID: 29032744. DOI: 10.1080/14670100.2017.1386384.
Abstract
OBJECTIVE Cochlear implants process the acoustic speech signal and convert it into electrical impulses. During this processing, many parameters contribute to speech perception. The available literature has examined the effect of manipulating one or two such parameters on speech intelligibility, but multiple parameters are seldom manipulated together. METHOD Acoustic parameters, including pulse rate, number of channels, 'n of m', number of electrodes, and channel spacing, were manipulated in acoustic simulations of cochlear implant hearing, and 90 different combinations were created. Speech intelligibility at the sentence level was measured using subjective and objective tests. RESULTS Principal component analysis was employed to select only those components with maximum factor loading, thus reducing the number of components to a reasonable limit. Perceptual speech intelligibility was maximal for signal-processing manipulations of 'n of m' and pulse rate. Regression analysis revealed that a lower rate (≤500 pps/channel) and fewer stimulated electrodes per cycle (2-4) contributed most to speech intelligibility. Perceptual Evaluation of Speech Quality (PESQ) and composite measures of spectral weights and likelihood ratio correlated with subjective speech intelligibility scores. DISCUSSION The findings are consistent with the reviewed literature, indicating that fewer stimulated channels per cycle reduce electrode interaction and hence improve the spectral resolution of speech, while a reduced pulse rate enhances its temporal resolution. Thus, these two components contribute significantly to speech intelligibility. CONCLUSION Pulse rate per channel and 'n of m' contribute most to speech intelligibility, at least in simulations of electric hearing.
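The 'n of m' strategy manipulated in the abstract stimulates, in each analysis frame, only the n channels (of m) with the largest envelopes. A minimal sketch of that selection step, assuming a precomputed envelope matrix; envelope extraction and frame handling are simplified, and the function name is illustrative.

```python
import numpy as np

def n_of_m_select(envelopes, n):
    """envelopes: (m_channels, n_frames) array; keep the n largest per frame."""
    m, frames = envelopes.shape
    mask = np.zeros_like(envelopes, dtype=bool)
    # Row indices of the n largest envelope values in each frame (column)
    top = np.argsort(envelopes, axis=0)[-n:, :]
    for f in range(frames):
        mask[top[:, f], f] = True
    # Zero out the unselected channels
    return np.where(mask, envelopes, 0.0)
```

Stimulating fewer channels per cycle in this way is the manipulation the study links to reduced electrode interaction and better spectral resolution.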
Affiliation(s)
- Saransh Jain
- Department of Audiology and Speech Language Pathology, Jagadguru Sri Shivarathreeswara (JSS) Institute of Speech and Hearing, University of Mysore, Mysuru, Karnataka, India
- P G Vipin Ghosh
- Department of Audiology and Speech Language Pathology, Jagadguru Sri Shivarathreeswara (JSS) Institute of Speech and Hearing, University of Mysore, Mysuru, Karnataka, India
7
McMurray B, Farris-Trimble A, Rigler H. Waiting for lexical access: Cochlear implants or severely degraded input lead listeners to process speech less incrementally. Cognition 2017; 169:147-164. PMID: 28917133. DOI: 10.1016/j.cognition.2017.08.013.
Abstract
Spoken language unfolds over time. Consequently, there are brief periods of ambiguity, when incomplete input can match many possible words. Typical listeners solve this problem by immediately activating multiple candidates, which compete for recognition. In two experiments using the visual world paradigm, we examined real-time lexical competition in prelingually deaf cochlear implant (CI) users and in normal-hearing (NH) adults listening to severely degraded speech. In Experiment 1, adolescent CI users and NH controls matched spoken words to arrays of pictures including the target word and phonological competitors. Eye movements to each referent were monitored as a measure of how strongly that candidate was considered over time. Relative to NH controls, CI users showed a large delay in fixating any object, less competition from onset competitors (e.g., sandwich after hearing sandal), and increased competition from rhyme competitors (e.g., candle after hearing sandal). Experiment 2 observed the same pattern with NH listeners hearing highly degraded speech. These studies suggest that, in contrast to all prior studies of word recognition in typical listeners, listeners recognizing words in severely degraded conditions can exhibit a substantively different pattern of dynamics, waiting to begin lexical access until substantial information has accumulated.
Affiliation(s)
- Bob McMurray
- Dept. of Psychological and Brain Sciences, University of Iowa, United States; Dept. of Communication Sciences and Disorders, University of Iowa, United States; Dept. of Otolaryngology, University of Iowa, United States; DeLTA Center, University of Iowa, United States.
- Hannah Rigler
- Dept. of Psychological and Brain Sciences, University of Iowa, United States