101. Herrmann B, Johnsrude IS. Absorption and Enjoyment During Listening to Acoustically Masked Stories. Trends Hear 2020; 24:2331216520967850. [PMID: 33143565] [DOI: 10.1177/2331216520967850]
Abstract
Comprehension of speech masked by background sound requires increased cognitive processing, which makes listening effortful. Research in hearing has focused on such challenging listening experiences, in part because they are thought to contribute to social withdrawal in people with hearing impairment. Research has focused less on positive listening experiences, such as enjoyment, despite their potential importance in motivating effortful listening. Moreover, the artificial speech materials-such as disconnected, brief sentences-commonly used to investigate speech intelligibility and listening effort may be ill-suited to capture positive experiences when listening is challenging. Here, we investigate how listening to naturalistic spoken stories under acoustic challenges influences the quality of listening experiences. We assess absorption (the feeling of being immersed/engaged in a story), enjoyment, and listening effort and show that (a) story absorption and enjoyment are only minimally affected by moderate speech masking although listening effort increases, (b) thematic knowledge increases absorption and enjoyment and reduces listening effort when listening to a story presented in multitalker babble, and (c) absorption and enjoyment increase and effort decreases over time as individuals listen to several stories successively in multitalker babble. Our research indicates that naturalistic, spoken stories can reveal several concurrent listening experiences and that expertise in a topic can increase engagement and reduce effort. Our work also demonstrates that, although listening effort may increase with speech masking, listeners may still find the experience both absorbing and enjoyable.
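The masking manipulation at issue here, mixing a spoken story with multitalker babble at a fixed signal-to-noise ratio, reduces to a simple RMS-scaling computation. A minimal NumPy sketch, assuming `story` and `babble` are mono float arrays at the same sampling rate (the names and the SNR value are illustrative, not the authors'):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise RMS ratio equals `snr_db`, then mix."""
    noise = np.resize(noise, speech.shape)          # loop/trim babble to match length
    rms_s = np.sqrt(np.mean(speech ** 2))
    rms_n = np.sqrt(np.mean(noise ** 2))
    gain = rms_s / (rms_n * 10 ** (snr_db / 20))    # noise gain yielding the target SNR
    mixture = speech + gain * noise
    return mixture / np.max(np.abs(mixture))        # normalize to avoid clipping

# e.g., moderate masking at +4 dB SNR (an illustrative value, not the paper's)
# mixed = mix_at_snr(story, babble, snr_db=4.0)
```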
Affiliation(s)
- Björn Herrmann
- Rotman Research Institute, Baycrest, Toronto, Ontario, Canada; Department of Psychology, University of Toronto, Toronto, Ontario, Canada; Department of Psychology, University of Western Ontario, London, Canada
- Ingrid S Johnsrude
- Department of Psychology, University of Western Ontario, London, Canada; School of Communication Sciences & Disorders, University of Western Ontario, London, Canada
102. Sohoglu E, Davis MH. Rapid computations of spectrotemporal prediction error support perception of degraded speech. eLife 2020; 9:e58077. [PMID: 33147138] [PMCID: PMC7641582] [DOI: 10.7554/elife.58077]
Abstract
Human speech perception can be described as Bayesian perceptual inference but how are these Bayesian computations instantiated neurally? We used magnetoencephalographic recordings of brain responses to degraded spoken words and experimentally manipulated signal quality and prior knowledge. We first demonstrate that spectrotemporal modulations in speech are more strongly represented in neural responses than alternative speech representations (e.g. spectrogram or articulatory features). Critically, we found an interaction between speech signal quality and expectations from prior written text on the quality of neural representations; increased signal quality enhanced neural representations of speech that mismatched with prior expectations, but led to greater suppression of speech that matched prior expectations. This interaction is a unique neural signature of prediction error computations and is apparent in neural responses within 100 ms of speech input. Our findings contribute to the detailed specification of a computational model of speech perception based on predictive coding frameworks.
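A spectrotemporal modulation representation of the kind pitted here against spectrogram and articulatory-feature representations can be roughly approximated by a 2D Fourier transform of a log spectrogram, giving energy as a joint function of temporal modulation rate and spectral modulation scale. A crude sketch of that idea (our simplification, not the authors' actual modulation filterbank):

```python
import numpy as np
from scipy.signal import spectrogram

def modulation_spectrum(x, fs):
    """Crude spectrotemporal modulation energy: 2D FFT of a log spectrogram."""
    f, t, S = spectrogram(x, fs=fs, nperseg=256, noverlap=192)
    logS = np.log(S + 1e-10)
    logS -= logS.mean()                            # remove DC before the 2D transform
    M = np.abs(np.fft.fftshift(np.fft.fft2(logS)))
    # axes: spectral modulation (cycles/Hz) x temporal modulation (Hz)
    temporal_rates = np.fft.fftshift(np.fft.fftfreq(len(t), d=t[1] - t[0]))
    spectral_scales = np.fft.fftshift(np.fft.fftfreq(len(f), d=f[1] - f[0]))
    return spectral_scales, temporal_rates, M
```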
Affiliation(s)
- Ediz Sohoglu
- School of Psychology, University of Sussex, Brighton, United Kingdom
- Matthew H Davis
- MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom
103. Banellis L, Sokoliuk R, Wild CJ, Bowman H, Cruse D. Event-related potentials reflect prediction errors and pop-out during comprehension of degraded speech. Neurosci Conscious 2020; 2020:niaa022. [PMID: 33133640] [PMCID: PMC7585676] [DOI: 10.1093/nc/niaa022]
Abstract
Comprehension of degraded speech requires higher-order expectations informed by prior knowledge. Accurate top-down expectations of incoming degraded speech cause a subjective semantic 'pop-out' or conscious breakthrough experience. Indeed, the same stimulus can be perceived as meaningless when no expectations are made in advance. We investigated the event-related potential (ERP) correlates of these top-down expectations, their error signals and the subjective pop-out experience in healthy participants. We manipulated expectations in a word-pair priming degraded (noise-vocoded) speech task and investigated the role of top-down expectation with a between-groups attention manipulation. Consistent with the role of expectations in comprehension, repetition priming significantly enhanced perceptual intelligibility of the noise-vocoded degraded targets for attentive participants. An early ERP was larger for mismatched (i.e. unexpected) targets than matched targets, indicative of an initial error signal not reliant on top-down expectations. Subsequently, a P3a-like ERP was larger to matched targets than mismatched targets only for attending participants-i.e. a pop-out effect-while a later ERP was larger for mismatched targets and did not significantly interact with attention. Rather than relying on complex post hoc interactions between prediction error and precision to explain this apredictive pattern, we consider our data to be consistent with prediction error minimization accounts for early stages of processing followed by Global Neuronal Workspace-like breakthrough and processing in service of task goals.
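The ERP measures here come down to averaging stimulus-locked EEG epochs per condition and contrasting mean amplitudes in component windows (e.g., the P3a-like effect for matched versus mismatched targets). A schematic NumPy sketch; array shapes and the analysis window are illustrative, not the study's exact parameters:

```python
import numpy as np

def erp(epochs):
    """Average event-locked epochs: (n_trials, n_channels, n_times) -> (n_channels, n_times)."""
    return epochs.mean(axis=0)

def mean_amplitude(erp_wave, times, t_start, t_end):
    """Mean amplitude in a post-stimulus window, e.g., a P3a-like component."""
    mask = (times >= t_start) & (times < t_end)
    return erp_wave[:, mask].mean(axis=1)

# difference wave: mismatched minus matched targets
# diff = erp(mismatched_epochs) - erp(matched_epochs)
# p3a_effect = mean_amplitude(diff, times, 0.25, 0.35)   # window is illustrative
```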
Affiliation(s)
- Leah Banellis
- School of Psychology and Centre for Human Brain Health, University of Birmingham, Edgbaston B15 2TT, UK
- Rodika Sokoliuk
- School of Psychology and Centre for Human Brain Health, University of Birmingham, Edgbaston B15 2TT, UK
- Conor J Wild
- Brain and Mind Institute, University of Western Ontario, London, ON N6A 3K7, Canada
- Howard Bowman
- School of Psychology and Centre for Human Brain Health, University of Birmingham, Edgbaston B15 2TT, UK
- School of Computing, University of Kent, Canterbury, Kent CT2 7NF, UK
- Damian Cruse
- School of Psychology and Centre for Human Brain Health, University of Birmingham, Edgbaston B15 2TT, UK
104. Tinnemore AR, Gordon-Salant S, Goupell MJ. Audiovisual Speech Recognition With a Cochlear Implant and Increased Perceptual and Cognitive Demands. Trends Hear 2020; 24:2331216520960601. [PMID: 33054620] [PMCID: PMC7575283] [DOI: 10.1177/2331216520960601]
Abstract
Speech recognition in complex environments involves focusing on the most relevant speech signal while ignoring distractions. Difficulties can arise due to the incoming signal's characteristics (e.g., accented pronunciation, background noise, distortion) or the listener's characteristics (e.g., hearing loss, advancing age, cognitive abilities). Listeners who use cochlear implants (CIs) must overcome these difficulties while listening to an impoverished version of the signals available to listeners with normal hearing (NH). In the real world, listeners often attempt tasks concurrent with, but unrelated to, speech recognition. This study sought to reveal the effects of visual distraction and performing a simultaneous visual task on audiovisual speech recognition. Two groups, those with CIs and those with NH listening to vocoded speech, were presented videos of unaccented and accented talkers with and without visual distractions, and with a secondary task. It was hypothesized that, compared with those with NH, listeners with CIs would be less influenced by visual distraction or a secondary visual task because their prolonged reliance on visual cues to aid auditory perception improves the ability to suppress irrelevant information. Results showed that visual distractions alone did not significantly decrease speech recognition performance for either group, but adding a secondary task did. Speech recognition was significantly poorer for accented compared with unaccented speech, and this difference was greater for CI listeners. These results suggest that speech recognition performance is likely more dependent on incoming signal characteristics than a difference in adaptive strategies for managing distractions between those who listen with and without a CI.
Affiliation(s)
- Anna R Tinnemore
- Department of Hearing and Speech Sciences, Neuroscience and Cognitive Science Program, University of Maryland at College Park, College Park, United States
- Sandra Gordon-Salant
- Department of Hearing and Speech Sciences, Neuroscience and Cognitive Science Program, University of Maryland at College Park, College Park, United States
- Matthew J Goupell
- Department of Hearing and Speech Sciences, Neuroscience and Cognitive Science Program, University of Maryland at College Park, College Park, United States
105. Roberts B, Summers RJ. Informational masking of speech depends on masker spectro-temporal variation but not on its coherence. J Acoust Soc Am 2020; 148:2416. [PMID: 33138537] [DOI: 10.1121/10.0002359]
Abstract
The impact of an extraneous formant on intelligibility is affected by the extent (depth) of variation in its formant-frequency contour. Two experiments explored whether this impact also depends on masker spectro-temporal coherence, using a method ensuring that interference occurred only through informational masking. Targets were monaural three-formant analogues (F1+F2+F3) of natural sentences presented alone or accompanied by a contralateral competitor for F2 (F2C) that listeners must reject to optimize recognition. The standard F2C was created using the inverted F2 frequency contour and constant amplitude. Variants were derived by dividing F2C into abutting segments (100-200 ms, 10-ms rise/fall). Segments were presented either in the correct order (coherent) or in random order (incoherent), introducing abrupt discontinuities into the F2C frequency contour. F2C depth was also manipulated (0%, 50%, or 100%) prior to segmentation, and the frequency contour of each segment either remained time-varying or was set to constant at the geometric mean frequency of that segment. The extent to which F2C lowered keyword scores depended on segment type (frequency-varying vs constant) and depth, but not segment order. This outcome indicates that the impact on intelligibility depends critically on the overall amount of frequency variation in the competitor, but not its spectro-temporal coherence.
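The competitor construction described above, inverting the F2 frequency contour, scaling its depth, cutting the result into segments that may be reordered, and optionally flattening each segment to its geometric mean, can be sketched directly on a sampled frequency track. Variable names, the log-scale inversion, and the default segment length are our assumptions for illustration:

```python
import numpy as np

def make_f2c(f2_track, depth=1.0, seg_len=150, shuffle=False, flatten_segments=False, rng=None):
    """Build an F2 competitor (F2C) from an F2 frequency track sampled every 1 ms.

    depth: 0.0 -> flat contour at the mean, 1.0 -> full inverted excursion.
    seg_len: segment length in ms; shuffle: randomize segment order (incoherent).
    """
    center = np.exp(np.mean(np.log(f2_track)))       # geometric mean frequency
    inverted = center ** 2 / f2_track                # mirror contour about the mean (log scale)
    f2c = center * (inverted / center) ** depth      # scale contour depth
    segs = [f2c[i:i + seg_len] for i in range(0, len(f2c), seg_len)]
    if flatten_segments:                             # constant-frequency segments
        segs = [np.full_like(s, np.exp(np.mean(np.log(s)))) for s in segs]
    if shuffle:                                      # incoherent (random-order) variant
        (rng or np.random.default_rng()).shuffle(segs)
    return np.concatenate(segs)
```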
Affiliation(s)
- Brian Roberts
- School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom
- Robert J Summers
- School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom
106. Kim DO, Carney L, Kuwada S. Amplitude modulation transfer functions reveal opposing populations within both the inferior colliculus and medial geniculate body. J Neurophysiol 2020; 124:1198-1215. [PMID: 32902353] [PMCID: PMC7717166] [DOI: 10.1152/jn.00279.2020]
Abstract
Based on single-unit recordings of modulation transfer functions (MTFs) in the inferior colliculus (IC) and the medial geniculate body (MGB) of the unanesthetized rabbit, we identified two opposing populations: band-enhanced (BE) and band-suppressed (BS) neurons. In response to amplitude-modulated (AM) sounds, firing rates of BE and BS neurons were enhanced and suppressed, respectively, relative to their responses to an unmodulated noise with a one-octave bandwidth. We also identified a third population, designated hybrid neurons, whose firing rates were enhanced by some modulation frequencies and suppressed by others. Our finding suggests that perception of AM may be based on the co-occurrence of enhancement and suppression of responses of the opposing populations of neurons. Because AM carries an important part of the content of speech, progress in understanding auditory processing of AM sounds should lead to progress in understanding speech perception. Each of the BE, BS, and hybrid types of MTFs comprised approximately one-third of the total sample. Modulation envelopes having short duty cycles of 20-50% and raised-sine envelopes accentuated the degree of enhancement and suppression and sharpened tuning of the MTFs. With sinusoidal envelopes, peak modulation frequencies were centered around 32-64 Hz among IC BE neurons, whereas the MGB peak frequencies skewed toward lower frequencies, with a median of 16 Hz. We also tested an auditory-brainstem model and found that a simple circuit containing fast excitatory synapses and slow inhibitory synapses was able to reproduce salient features of the BE- and BS-type MTFs of IC neurons.

NEW & NOTEWORTHY: Opposing populations of neurons have been identified in the mammalian auditory midbrain and thalamus. In response to amplitude-modulated sounds, responses of one population (band-enhanced) increased whereas responses of another (band-suppressed) decreased relative to their responses to an unmodulated sound. These opposing auditory populations are analogous to the ON and OFF populations of the visual system and may improve transfer of information carried by the temporal envelopes of complex sounds such as speech.
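The core classification rule, labeling a neuron by whether its AM-evoked firing rates sit above or below its unmodulated-noise rate, can be written compactly. A simplified sketch; the tolerance criterion and example rates are invented for illustration and stand in for the paper's statistical tests:

```python
import numpy as np

def classify_mtf(am_rates, unmod_rate, tol=0.0):
    """Label an MTF relative to the unmodulated-noise firing rate (spikes/s)."""
    enhanced = am_rates > unmod_rate + tol
    suppressed = am_rates < unmod_rate - tol
    if enhanced.any() and suppressed.any():
        return "hybrid"          # enhanced at some modulation frequencies, suppressed at others
    if enhanced.any():
        return "band-enhanced"
    if suppressed.any():
        return "band-suppressed"
    return "flat"

# example: a neuron enhanced around mid modulation frequencies
rates = np.array([50, 55, 70, 85, 80, 60, 52])        # illustrative MTF
print(classify_mtf(rates, unmod_rate=51.0, tol=2.0))  # -> "band-enhanced"
```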
Affiliation(s)
- Duck O Kim
- Department of Neuroscience, University of Connecticut Health Center, Farmington, Connecticut
- Laurel Carney
- Department of Biomedical Engineering, Neurobiology and Anatomy, University of Rochester, Rochester, New York
- Shigeyuki Kuwada
- Department of Neuroscience, University of Connecticut Health Center, Farmington, Connecticut
107. Rotman T, Lavie L, Banai K. Rapid Perceptual Learning: A Potential Source of Individual Differences in Speech Perception Under Adverse Conditions? Trends Hear 2020; 24:2331216520930541. [PMID: 32552477] [PMCID: PMC7303778] [DOI: 10.1177/2331216520930541]
Abstract
Challenging listening situations (e.g., when speech is rapid or noisy) result in substantial individual differences in speech perception. We propose that rapid auditory perceptual learning is one of the factors contributing to those individual differences. To explore this proposal, we assessed rapid perceptual learning of time-compressed speech in young adults with normal hearing and in older adults with age-related hearing loss. We also assessed the contribution of this learning as well as that of hearing and cognition (vocabulary, working memory, and selective attention) to the recognition of natural-fast speech (NFS; both groups) and speech in noise (younger adults). In young adults, rapid learning and vocabulary were significant predictors of NFS and speech in noise recognition. In older adults, hearing thresholds, vocabulary, and rapid learning were significant predictors of NFS recognition. In both groups, models that included learning fitted the speech data better than models that did not include learning. Therefore, under adverse conditions, rapid learning may be one of the skills listeners could employ to support speech recognition.
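The key modeling claim, that models including rapid learning fit the speech data better than models without it, corresponds to a nested regression comparison. A sketch using statsmodels; the data file and column names are placeholders, not the study's variables:

```python
import pandas as pd
import statsmodels.formula.api as smf

# df columns (hypothetical): nfs_score, vocab, wm, hearing, rapid_learning
df = pd.read_csv("speech_predictors.csv")   # placeholder file name

base = smf.ols("nfs_score ~ vocab + wm + hearing", data=df).fit()
full = smf.ols("nfs_score ~ vocab + wm + hearing + rapid_learning", data=df).fit()

print(base.aic, full.aic)            # lower AIC = better fit
print(full.compare_f_test(base))     # F-test for the nested comparison
```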
Affiliation(s)
- Tali Rotman
- Department of Communication Sciences and Disorders, University of Haifa
- Limor Lavie
- Department of Communication Sciences and Disorders, University of Haifa
- Karen Banai
- Department of Communication Sciences and Disorders, University of Haifa
108.
Abstract
Listeners exposed to accented speech must adjust how they map between acoustic features and lexical representations such as phonetic categories. A robust form of this adaptive perceptual learning is learning to perceive synthetic speech where the connections between acoustic features and phonetic categories must be updated. Both implicit learning through mere exposure and explicit learning through directed feedback have previously been shown to produce this type of adaptive learning. The present study crosses implicit exposure and explicit feedback with the presence or absence of a written identification task. We show that simple exposure produces some learning, but explicit feedback produces substantially stronger learning, whereas requiring written identification did not measurably affect learning. These results suggest that explicit feedback guides learning of new mappings between acoustic patterns and known phonetic categories. We discuss mechanisms that may support learning via implicit exposure.
109. Lehet M, Holt LL. Nevertheless, it persists: Dimension-based statistical learning and normalization of speech impact different levels of perceptual processing. Cognition 2020; 202:104328. [PMID: 32502867] [DOI: 10.1016/j.cognition.2020.104328]
Abstract
Speech is notoriously variable, with no simple mapping from acoustics to linguistically-meaningful units like words and phonemes. Empirical research on this theoretically central issue establishes at least two classes of perceptual phenomena that accommodate acoustic variability: normalization and perceptual learning. Intriguingly, perceptual learning is supported by learning across acoustic variability, but normalization is thought to counteract acoustic variability leaving open questions about how these two phenomena might interact. Here, we examine the joint impact of normalization and perceptual learning on how acoustic dimensions map to vowel categories. As listeners categorized nonwords as setch or satch, they experienced a shift in short-term distributional regularities across the vowels' acoustic dimensions. Introduction of this 'artificial accent' resulted in a shift in the contribution of vowel duration in categorization. Although this dimension-based statistical learning impacted the influence of vowel duration on vowel categorization, the duration of these very same vowels nonetheless maintained a consistent influence on categorization of a subsequent consonant via duration contrast, a form of normalization. Thus, vowel duration had a duplex role consistent with normalization and perceptual learning operating on distinct levels in the processing hierarchy. We posit that whereas normalization operates across auditory dimensions, dimension-based statistical learning impacts the connection weights among auditory dimensions and phonetic categories.
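The 'artificial accent' amounts to a change in the short-term joint distribution of the two acoustic dimensions signaling the vowel contrast. A schematic of how such stimulus blocks might be sampled; all numeric values and the specific dimensions are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_block(n, canonical=True):
    """Sample (spectral_hz, duration_ms) cue pairs for one vowel category.

    In the canonical block the two dimensions are positively related; in the
    'accent' block the duration cue is reversed relative to spectral quality.
    """
    spectral = rng.normal(650.0, 30.0, n)          # illustrative spectral cue (Hz)
    slope = 0.4 if canonical else -0.4             # accent reverses the correlation
    duration = 170.0 + slope * (spectral - 650.0) + rng.normal(0.0, 8.0, n)
    return np.column_stack([spectral, duration])

canonical_block = sample_block(60, canonical=True)
accent_block = sample_block(60, canonical=False)   # drives down-weighting of duration
```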
Affiliation(s)
- Matthew Lehet
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15232, USA; Center for the Neural Basis of Cognition, Pittsburgh, PA 15232, USA
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15232, USA; Center for the Neural Basis of Cognition, Pittsburgh, PA 15232, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15232, USA
110. Paulus M, Hazan V, Adank P. The relationship between talker acoustics, intelligibility, and effort in degraded listening conditions. J Acoust Soc Am 2020; 147:3348. [PMID: 32486777] [DOI: 10.1121/10.0001212]
Abstract
Listening to degraded speech is associated with decreased intelligibility and increased effort. However, listeners are generally able to adapt to certain types of degradations. While intelligibility of degraded speech is modulated by talker acoustics, it is unclear whether talker acoustics also affect effort and adaptation. Moreover, it has been demonstrated that talker differences are preserved across spectral degradations, but it is not known whether this effect extends to temporal degradations and which acoustic-phonetic characteristics are responsible. In a listening experiment combined with pupillometry, participants were presented with speech in quiet as well as in masking noise, time-compressed, and noise-vocoded speech by 16 Southern British English speakers. Results showed that intelligibility, but not adaptation, was modulated by talker acoustics. Talkers who were more intelligible under noise-vocoding were also more intelligible under masking and time-compression. This effect was linked to acoustic-phonetic profiles with greater vowel space dispersion (VSD) and energy in mid-range frequencies, as well as slower speaking rate. While pupil dilation indicated increasing effort with decreasing intelligibility, this study also linked reduced effort in quiet to talkers with greater VSD. The results emphasize the relevance of talker acoustics for intelligibility and effort in degraded listening conditions.
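Vowel space dispersion (VSD), one of the acoustic-phonetic measures linked to intelligibility and effort here, is commonly computed as the mean Euclidean distance of a talker's vowel tokens from the centroid of the F1-F2 space. A short sketch of that general measure, not necessarily the authors' exact variant:

```python
import numpy as np

def vowel_space_dispersion(f1, f2):
    """Mean Euclidean distance of (F1, F2) vowel tokens from the talker's centroid."""
    pts = np.column_stack([f1, f2]).astype(float)
    centroid = pts.mean(axis=0)
    return np.linalg.norm(pts - centroid, axis=1).mean()

# illustrative corner-vowel formants (Hz) for one talker
f1 = np.array([300, 320, 700, 750])
f2 = np.array([2300, 900, 1800, 1100])
print(vowel_space_dispersion(f1, f2))   # larger values = more dispersed vowel space
```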
Affiliation(s)
- Maximillian Paulus
- Speech, Hearing and Phonetic Sciences, University College London, London, United Kingdom
- Valerie Hazan
- Speech, Hearing and Phonetic Sciences, University College London, London, United Kingdom
- Patti Adank
- Speech, Hearing and Phonetic Sciences, University College London, London, United Kingdom
111. Fairchild S, Mathis A, Papafragou A. Pragmatics and social meaning: Understanding under-informativeness in native and non-native speakers. Cognition 2020; 200:104171. [PMID: 32244064] [DOI: 10.1016/j.cognition.2019.104171]
Abstract
Foreign-accented non-native speakers sometimes face negative biases compared to native speakers. Here we report an advantage in how comprehenders process the speech of non-native compared to native speakers. In a series of four experiments, we find that under-informative sentences are interpreted differently when attributed to non-native compared to native speakers. Specifically, under-informativeness is more likely to be attributed to inability (rather than unwillingness) to say more in non-native as compared to native speakers. This asymmetry has implications for learning: under-informative teachers are more likely to be given a second chance in case they are non-native speakers of the language (presumably because their prior under-informativeness is less likely to be intentional). Our results suggest strong effects of non-native speech on social-pragmatic inferences. Because these effects emerge for written stimuli, they support theories that stress the role of expectations on non-native comprehension, even in the absence of experience with foreign accents. Finally, our data bear on pragmatic theories of how speaker identity affects language comprehension and show how such theories offer an integrated framework for explaining how non-native language can lead to (sometimes unexpected) social meanings.
Affiliation(s)
- Sarah Fairchild
- Department of Psychological & Brain Sciences, University of Delaware, USA
- Ariel Mathis
- Department of Linguistics, University of Pennsylvania, USA
112. Newman RS, Morini G, Shroads E, Chatterjee M. Toddlers' fast-mapping from noise-vocoded speech. J Acoust Soc Am 2020; 147:2432. [PMID: 32359241] [PMCID: PMC7176458] [DOI: 10.1121/10.0001129]
Abstract
The ability to recognize speech that is degraded spectrally is a critical skill for successfully using a cochlear implant (CI). Previous research has shown that toddlers with normal hearing can successfully recognize noise-vocoded words as long as the signal contains at least eight spectral channels [Newman and Chatterjee. (2013). J. Acoust. Soc. Am. 133(1), 483-494; Newman, Chatterjee, Morini, and Remez. (2015). J. Acoust. Soc. Am. 138(3), EL311-EL317], although they have difficulty with signals that only contain four channels of information. Young children with CIs not only need to match a degraded speech signal to a stored representation (word recognition), but they also need to create new representations (word learning), a task that is likely to be more cognitively demanding. Normal-hearing toddlers aged 34 months were tested on their ability to initially learn (fast-map) new words in noise-vocoded stimuli. While children were successful at fast-mapping new words from 16-channel noise-vocoded stimuli, they failed to do so from 8-channel noise-vocoded speech. The level of degradation imposed by 8-channel vocoding appears sufficient to disrupt fast-mapping in young children. Recent results indicate that only CI patients with high spectral resolution can benefit from more than eight active electrodes. This suggests that for many children with CIs, reduced spectral resolution may limit their acquisition of novel words.
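Noise-vocoding, the degradation used in this and several other studies above, splits the signal into N frequency channels, extracts each channel's amplitude envelope, and reimposes those envelopes on bandpass-filtered noise. A compact SciPy sketch of a standard CI simulation; the filter order, log band spacing, and Hilbert-envelope extraction are illustrative choices:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=8, lo=100.0, hi=8000.0):
    """Noise-vocode `x` into n_channels log-spaced bands (assumes fs > 2 * hi)."""
    edges = np.geomspace(lo, hi, n_channels + 1)
    rng = np.random.default_rng(0)
    out = np.zeros_like(x, dtype=float)
    for low, high in zip(edges[:-1], edges[1:]):
        sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))                       # amplitude envelope of the band
        carrier = sosfiltfilt(sos, rng.standard_normal(len(x)))
        out += env * carrier                              # envelope-modulated narrowband noise
    return out / np.max(np.abs(out))
```

Varying `n_channels` (e.g., 8 vs. 16) reproduces the spectral-resolution manipulation tested here.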
Affiliation(s)
- Rochelle S Newman
- Department of Hearing and Speech Sciences, University of Maryland, 0100 Lefrak Hall, College Park, Maryland 20742, USA
- Giovanna Morini
- Department of Communication Sciences and Disorders, University of Delaware, 100 Discovery Boulevard, Newark, Delaware 19713, USA
- Emily Shroads
- Department of Hearing and Speech Sciences, University of Maryland, 0100 Lefrak Hall, College Park, Maryland 20742, USA
- Monita Chatterjee
- Boys Town National Research Hospital, 555 North 30th Street, Omaha, Nebraska 68131, USA
113. Kennedy-Higgins D, Devlin JT, Adank P. Cognitive mechanisms underpinning successful perception of different speech distortions. J Acoust Soc Am 2020; 147:2728. [PMID: 32359293] [DOI: 10.1121/10.0001160]
Abstract
Few studies thus far have investigated whether perception of distorted speech is consistent across different types of distortion. This study investigated whether participants show a consistent perceptual profile across three speech distortions: time-compressed, noise-vocoded, and speech in noise. Additionally, this study investigated whether and how individual differences in performance on a battery of audiological and cognitive tasks link to perception. Eighty-eight participants completed a speeded sentence-verification task, with increases in accuracy and reductions in response times used to indicate performance. Audiological and cognitive task measures included pure tone audiometry, speech recognition threshold, working memory, vocabulary knowledge, attention switching, and pattern analysis. Despite previous studies suggesting that temporal and spectral/environmental perception require different lexical or phonological mechanisms, this study shows significant positive correlations in accuracy and response time performance across all distortions. Results of a principal component analysis and multiple linear regressions suggest that a component based on vocabulary knowledge and working memory predicted performance in the speech-in-quiet, time-compressed, and speech-in-noise conditions. These results suggest that listeners employ a similar cognitive strategy to perceive different temporal and spectral/environmental speech distortions and that this mechanism is supported by vocabulary knowledge and working memory.
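The analysis pipeline, a principal component analysis over the audiological and cognitive battery followed by multiple linear regression onto distortion-condition performance, can be sketched with scikit-learn. File names, component count, and dimensions are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# X: participants x battery measures (PTA, SRT, working memory, vocabulary, ...)
# y: accuracy (or RT) in one distortion condition, e.g., time-compressed speech
X = np.loadtxt("battery.csv", delimiter=",")       # placeholder data files
y = np.loadtxt("tc_accuracy.csv", delimiter=",")

Z = StandardScaler().fit_transform(X)              # z-score before PCA
pca = PCA(n_components=3).fit(Z)
components = pca.transform(Z)

model = LinearRegression().fit(components, y)
print(pca.explained_variance_ratio_, model.coef_, model.score(components, y))
```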
Affiliation(s)
- Dan Kennedy-Higgins
- Department of Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London, WC1N 1PF, United Kingdom
- Joseph T Devlin
- Department of Experimental Psychology, University College London, 26 Bedford Way, London, WC1H 0AP, United Kingdom
- Patti Adank
- Department of Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London, WC1N 1PF, United Kingdom
114. Chan L, Johnson K, Babel M. Lexically-guided perceptual learning in early Cantonese-English bilinguals. J Acoust Soc Am 2020; 147:EL277. [PMID: 32237864] [DOI: 10.1121/10.0000942]
Abstract
Bilinguals are capable of retuning phonetic categories in both languages through lexically-guided perceptual learning, but recent work suggests that some bilingual speakers may lose the ability to adapt in the native language. In the study reported here, early Cantonese-English bilinguals, who are on average English-dominant, successfully retuned Cantonese /f/. Scores of Cantonese-English dominance were not shown to correlate with phonetic retuning. The results are discussed in light of what may support the maintenance of perceptual flexibility in a lesser-used language.
Affiliation(s)
- Leighanne Chan
- Communication Sciences and Disorders, University of Western Ontario, London, Ontario, Canada
- Khia Johnson
- Linguistics, University of British Columbia, Vancouver, Canada
- Molly Babel
- Linguistics, University of British Columbia, Vancouver, Canada
115. Effects of stimulus repetition and training schedule on the perceptual learning of time-compressed speech and its transfer. Atten Percept Psychophys 2020; 81:2944-2955. [PMID: 31161493] [DOI: 10.3758/s13414-019-01714-7]
Abstract
Perceptual learning can facilitate the recognition of hard-to-perceive (e.g., time-compressed or spectrally-degraded) speech. Although the learning induced by training with time-compressed speech is robust, previous findings suggest that intensive training yields learning that is partially specific to the items encountered during practice. Here, we asked whether three parameters of the training procedure - the overall number of training trials (training intensity), how these trials are distributed across sessions, and the number of semantically different items encountered during training (set size) - influence learning and transfer. Different groups of participants (69 normal-hearing young adults; nine to 11 participants/group) completed different training protocols (or served as an untrained control group) and were tested on the recognition of time-compressed sentences taken from the training set (learning), new time-compressed sentences presented by the trained talker (semantic transfer), and time-compressed sentences taken from the training set but presented by a different talker (acoustic transfer). Compared to untrained listeners, all training protocols yielded both learning and transfer. More intense training resulted in greater item-specific learning and greater acoustic transfer than less intense training with the same number of training sessions. Training on a smaller set size (i.e., greater token repetition during training) also resulted in greater acoustic transfer, whereas distributing practice over a number of sessions improved semantic transfer. Together, these data suggest that whereas practice on a small set that results in stimulus repetition during training is not harmful for learning, distributed training can support transfer to new stimuli, perhaps because it provides multiple opportunities to consolidate learning.
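Time-compressed speech of the kind trained here is typically produced with a pitch-preserving time-stretching algorithm rather than plain resampling, which would also shift the pitch. A minimal sketch using librosa's phase-vocoder routine; the file name and compression factor are illustrative:

```python
import librosa

y, sr = librosa.load("sentence.wav", sr=None)      # placeholder file name

# rate > 1 shortens the signal; 2.0 plays the sentence in half the time
compressed = librosa.effects.time_stretch(y, rate=2.0)

# naive alternative (NOT equivalent): resampling and playing at the original
# rate also compresses time, but shifts pitch upward as well
# resampled = librosa.resample(y, orig_sr=sr, target_sr=sr // 2)
```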
116. Summers RJ, Roberts B. Informational masking of speech by acoustically similar intelligible and unintelligible interferers. J Acoust Soc Am 2020; 147:1113. [PMID: 32113320] [DOI: 10.1121/10.0000688]
Abstract
Masking experienced when target speech is accompanied by a single interfering voice is often primarily informational masking (IM). IM is generally greater when the interferer is intelligible than when it is not (e.g., speech from an unfamiliar language), but the relative contributions of acoustic-phonetic and linguistic interference are often difficult to assess owing to acoustic differences between interferers (e.g., different talkers). Three-formant analogues (F1+F2+F3) of natural sentences were used as targets and interferers. Targets were presented monaurally either alone or accompanied contralaterally by interferers from another sentence (F0 = 4 semitones higher); a target-to-masker ratio (TMR) between ears of 0, 6, or 12 dB was used. Interferers were either intelligible or rendered unintelligible by delaying F2 and advancing F3 by 150 ms relative to F1, a manipulation designed to minimize spectro-temporal differences between corresponding interferers. Target-sentence intelligibility (keywords correct) was 67% when presented alone, but fell considerably when an unintelligible interferer was present (49%) and significantly further when the interferer was intelligible (41%). Changes in TMR produced neither a significant main effect nor an interaction with interferer type. Interference with acoustic-phonetic processing of the target can explain much of the impact on intelligibility, but linguistic factors-particularly interferer intrusions-also make an important contribution to IM.
Affiliation(s)
- Robert J Summers
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
- Brian Roberts
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
117. Casaponsa A, Sohoglu E, Moore DR, Füllgrabe C, Molloy K, Amitay S. Does training with amplitude modulated tones affect tone-vocoded speech perception? PLoS One 2019; 14:e0226288. [PMID: 31881550] [PMCID: PMC6934405] [DOI: 10.1371/journal.pone.0226288]
Abstract
Temporal-envelope cues are essential for successful speech perception. We asked here whether training on stimuli containing temporal-envelope cues without speech content can improve the perception of spectrally-degraded (vocoded) speech in which the temporal envelope (but not the temporal fine structure) is mainly preserved. Two groups of listeners were trained on different amplitude-modulation (AM) based tasks, either AM detection or AM-rate discrimination (21 blocks of 60 trials over two days, 1260 trials; modulation frequencies: 4, 8, and 16 Hz), while an additional control group did not undertake any training. Consonant identification in vocoded vowel-consonant-vowel stimuli was tested before and after training on the AM tasks (or at an equivalent time interval for the control group). Following training, only the trained groups showed a significant improvement in the perception of vocoded speech, but the improvement did not significantly differ from that observed for controls. Thus, we do not find convincing evidence that this amount of training with temporal-envelope cues without speech content provides a significant benefit for vocoded speech intelligibility. Alternative training regimens using vocoded speech along the linguistic hierarchy should be explored.
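The training stimuli, carriers with sinusoidal amplitude modulation at 4, 8, and 16 Hz, are straightforward to synthesize. A NumPy sketch; the tonal carrier frequency and modulation depth are illustrative choices:

```python
import numpy as np

def am_tone(fs, dur, carrier_hz=1000.0, mod_hz=8.0, depth=1.0):
    """Sinusoidally amplitude-modulated tone: (1 + m*sin(2*pi*fm*t)) * sin(2*pi*fc*t)."""
    t = np.arange(int(fs * dur)) / fs
    envelope = 1.0 + depth * np.sin(2 * np.pi * mod_hz * t)
    x = envelope * np.sin(2 * np.pi * carrier_hz * t)
    return x / np.max(np.abs(x))

# the three modulation rates used in the training tasks
stimuli = {fm: am_tone(fs=44100, dur=1.0, mod_hz=fm) for fm in (4.0, 8.0, 16.0)}
```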
Affiliation(s)
- Aina Casaponsa
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Department of Linguistics and English Language, Lancaster University, Lancaster, England, United Kingdom
- Ediz Sohoglu
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- David R. Moore
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Christian Füllgrabe
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Katharine Molloy
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Sygal Amitay
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
118. Moradi S, Lidestam B, Ning Ng EH, Danielsson H, Rönnberg J. Perceptual Doping: An Audiovisual Facilitation Effect on Auditory Speech Processing, From Phonetic Feature Extraction to Sentence Identification in Noise. Ear Hear 2019; 40:312-327. [PMID: 29870521] [PMCID: PMC6400397] [DOI: 10.1097/aud.0000000000000616]
Abstract
OBJECTIVE: We have previously shown that the gain provided by prior audiovisual (AV) speech exposure for subsequent auditory (A) sentence identification in noise is relatively larger than that provided by prior A speech exposure. We have called this effect "perceptual doping." Specifically, prior AV speech processing dopes (recalibrates) the phonological and lexical maps in the mental lexicon, which facilitates subsequent phonological and lexical access in the A modality, separately from other learning and priming effects. In this article, we use data from the n200 study and aim to replicate and extend the perceptual doping effect using two different A and two different AV speech tasks and a larger sample than in our previous studies.
DESIGN: The participants were 200 hearing aid users with bilateral, symmetrical, mild-to-severe sensorineural hearing loss. There were four speech tasks in the n200 study that were presented in both A and AV modalities (gated consonants, gated vowels, vowel duration discrimination, and sentence identification in noise tasks). The modality order of speech presentation was counterbalanced across participants: half of the participants completed the A modality first and the AV modality second (A1-AV2), and the other half completed the AV modality and then the A modality (AV1-A2). Based on the perceptual doping hypothesis, which assumes that the gain of prior AV exposure will be larger than that of prior A exposure for subsequent processing of speech stimuli, we predicted that the mean A scores in the AV1-A2 modality order would be better than the mean A scores in the A1-AV2 modality order. We therefore expected a significant difference in terms of the identification of A speech stimuli between the two modality orders (A1 versus A2). As prior A exposure provides a smaller gain than AV exposure, we also predicted that the difference in AV speech scores between the two modality orders (AV1 versus AV2) might not be statistically significant.
RESULTS: In the gated consonant and vowel tasks and the vowel duration discrimination task, there were significant differences in A performance of speech stimuli between the two modality orders. The participants' mean A performance was better in the AV1-A2 than in the A1-AV2 modality order (i.e., after AV processing). In terms of mean AV performance, no significant difference was observed between the two orders. In the sentence identification in noise task, a significant difference in the A identification of speech stimuli between the two orders was observed (A1 versus A2). In addition, a significant difference in the AV identification of speech stimuli between the two orders was also observed (AV1 versus AV2). This finding was most likely because of a procedural learning effect due to the greater complexity of the sentence materials or a combination of procedural learning and perceptual learning due to the presentation of sentential materials in noisy conditions.
CONCLUSIONS: The findings of the present study support the perceptual doping hypothesis, as prior AV relative to A speech exposure resulted in a larger gain for the subsequent processing of speech stimuli. For complex speech stimuli that were presented in degraded listening conditions, a procedural learning effect (or a combination of procedural learning and perceptual learning effects) also facilitated the identification of speech stimuli, irrespective of whether the prior modality was A or AV.
Affiliation(s)
- Shahram Moradi
- Linnaeus Centre HEAD, Swedish Institute for Disability Research, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Björn Lidestam
- Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Elaine Hoi Ning Ng
- Linnaeus Centre HEAD, Swedish Institute for Disability Research, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Oticon A/S, Smørum, Denmark
- Henrik Danielsson
- Linnaeus Centre HEAD, Swedish Institute for Disability Research, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Jerker Rönnberg
- Linnaeus Centre HEAD, Swedish Institute for Disability Research, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
119. Roettger TB, Franke M. Evidential Strength of Intonational Cues and Rational Adaptation to (Un-)Reliable Intonation. Cogn Sci 2019; 43:e12745. [DOI: 10.1111/cogs.12745]
Affiliation(s)
- Timo B. Roettger
- Department of Linguistics, Northwestern University & University of Cologne
120. Hierarchical contributions of linguistic knowledge to talker identification: Phonological versus lexical familiarity. Atten Percept Psychophys 2019; 81:1088-1107. [PMID: 31218598] [DOI: 10.3758/s13414-019-01778-5]
Abstract
Listeners identify talkers more accurately when listening to their native language compared to an unfamiliar, foreign language. This language-familiarity effect in talker identification has been shown to arise from familiarity with both the sound patterns (phonetics and phonology) and the linguistic content (words) of one's native language. However, it has been unknown whether these two sources of information contribute independently to talker identification abilities, particularly whether hearing familiar words can facilitate talker identification in the absence of familiar phonetics. To isolate the contribution of lexical familiarity, we conducted three experiments that tested listeners' ability to identify talkers saying familiar words, but with unfamiliar phonetics. In two experiments, listeners identified talkers from recordings of their native language (English), an unfamiliar foreign language (Mandarin Chinese), or "hybrid" speech stimuli (sentences spoken in Mandarin, but which can be convincingly coerced to sound like English when presented with subtitles that prime plausible English-language lexical interpretations based on the Mandarin phonetics). In a third experiment, we explored natural variation in lexical-phonetic congruence as listeners identified talkers with varying degrees of a Mandarin accent. Priming listeners to hear English speech did not improve their ability to identify talkers speaking Mandarin, even after additional training, and talker identification accuracy decreased as talkers' phonetics became increasingly dissimilar to American English. Together, these experiments indicate that unfamiliar sound patterns preclude talker identification benefits otherwise afforded by familiar words. These results suggest that linguistic representations contribute hierarchically to talker identification; the facilitatory effect of familiar words requires the availability of familiar phonological forms.
121. Yi HG, Leonard MK, Chang EF. The Encoding of Speech Sounds in the Superior Temporal Gyrus. Neuron 2019; 102:1096-1110. [PMID: 31220442] [PMCID: PMC6602075] [DOI: 10.1016/j.neuron.2019.04.023]
Abstract
The human superior temporal gyrus (STG) is critical for extracting meaningful linguistic features from speech input. Local neural populations are tuned to acoustic-phonetic features of all consonants and vowels and to dynamic cues for intonational pitch. These populations are embedded throughout broader functional zones that are sensitive to amplitude-based temporal cues. Beyond speech features, STG representations are strongly modulated by learned knowledge and perceptual goals. Currently, a major challenge is to understand how these features are integrated across space and time in the brain during natural speech comprehension. We present a theory that temporally recurrent connections within STG generate context-dependent phonological representations, spanning longer temporal sequences relevant for coherent percepts of syllables, words, and phrases.
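The central theoretical claim, that recurrent connections make the representation of a speech sound depend on what preceded it, can be illustrated with a toy recurrent network: identical input vectors yield different hidden states in different contexts. This is our didactic sketch, not a model from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_feat, n_hidden = 8, 16
W_in = rng.normal(0, 0.5, (n_hidden, n_feat))     # feedforward weights
W_rec = rng.normal(0, 0.5, (n_hidden, n_hidden))  # recurrent weights

def run(seq):
    """Simple tanh RNN: h_t = tanh(W_in x_t + W_rec h_{t-1})."""
    h = np.zeros(n_hidden)
    for x in seq:
        h = np.tanh(W_in @ x + W_rec @ h)
    return h

# the same 'phoneme' feature vector preceded by two different contexts
phone = rng.normal(size=n_feat)
ctx_a, ctx_b = rng.normal(size=n_feat), rng.normal(size=n_feat)
h_a, h_b = run([ctx_a, phone]), run([ctx_b, phone])
print(np.linalg.norm(h_a - h_b))   # nonzero: identical input, context-dependent state
```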
Affiliation(s)
- Han Gyol Yi
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
- Matthew K Leonard
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
- Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
122. Goehring T, Archer-Boyd A, Deeks JM, Arenberg JG, Carlyon RP. A Site-Selection Strategy Based on Polarity Sensitivity for Cochlear Implants: Effects on Spectro-Temporal Resolution and Speech Perception. J Assoc Res Otolaryngol 2019; 20:431-448. [PMID: 31161338] [PMCID: PMC6646483] [DOI: 10.1007/s10162-019-00724-4]
Abstract
Thresholds of asymmetric pulses presented to cochlear implant (CI) listeners depend on polarity in a way that differs across subjects and electrodes. It has been suggested that lower thresholds for cathodic-dominant compared to anodic-dominant pulses reflect good local neural health. We evaluated the hypothesis that this polarity effect (PE) can be used in a site-selection strategy to improve speech perception and spectro-temporal resolution. Detection thresholds were measured in eight users of Advanced Bionics CIs for 80-pps, triphasic, monopolar pulse trains where the central high-amplitude phase was either anodic or cathodic. Two experimental MAPs were then generated for each subject by deactivating the five electrodes with either the highest or the lowest PE magnitudes (cathodic minus anodic threshold). Performance with the two experimental MAPs was evaluated using two spectro-temporal tests (Spectro-Temporal Ripple for Investigating Processor EffectivenesS (STRIPES; Archer-Boyd et al. in J Acoust Soc Am 144:2983–2997, 2018) and Spectral-Temporally Modulated Ripple Test (SMRT; Aronoff and Landsberger in J Acoust Soc Am 134:EL217–EL222, 2013)) and with speech recognition in quiet and in noise. Performance was also measured with an experimental MAP that used all electrodes, similar to the subjects’ clinical MAP. The PE varied strongly across subjects and electrodes, with substantial magnitudes relative to the electrical dynamic range. There were no significant differences in performance between the three MAPs at group level, but there were significant effects at subject level—not all of which were in the hypothesized direction—consistent with previous reports of a large variability in CI users’ performance and in the potential benefit of site-selection strategies. The STRIPES but not the SMRT test successfully predicted which strategy produced the best speech-in-noise performance on a subject-by-subject basis. The average PE across electrodes correlated significantly with subject age, duration of deafness, and speech perception scores, consistent with a relationship between PE and neural health. These findings motivate further investigations into site-specific measures of neural health and their application to CI processing strategies.
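The site-selection rule itself is simple bookkeeping once polarity-specific thresholds are measured: compute the polarity effect (PE) per electrode and deactivate the five electrodes at one extreme. A NumPy sketch with invented thresholds and, for brevity, an 8-electrode array rather than a full clinical map:

```python
import numpy as np

def select_sites(cathodic_thr, anodic_thr, n_off=5, drop="highest"):
    """Return indices of electrodes to keep active, given per-electrode thresholds (dB).

    Polarity effect (PE) = cathodic minus anodic threshold; lower PE has been
    linked to better local neural health.
    """
    pe = np.asarray(cathodic_thr) - np.asarray(anodic_thr)
    order = np.argsort(pe)                          # ascending PE
    off = order[-n_off:] if drop == "highest" else order[:n_off]
    return np.setdiff1d(np.arange(pe.size), off)

cath = np.array([41.0, 38.5, 44.0, 40.0, 42.5, 39.0, 43.5, 40.5])  # illustrative
anod = np.array([40.0, 40.0, 41.0, 41.5, 40.5, 41.0, 40.0, 41.0])
print(select_sites(cath, anod, n_off=3))            # keep all but the 3 highest-PE sites
```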
Affiliation(s)
- Tobias Goehring
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, UK
- Alan Archer-Boyd
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, UK
- John M Deeks
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, UK
- Julie G Arenberg
- Department of Speech and Hearing Sciences, University of Washington, 1417 NE 42nd St., Seattle, WA, 98105, USA
- Robert P Carlyon
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, UK
123. Ritter C, Vongpaisal T. Multimodal and Spectral Degradation Effects on Speech and Emotion Recognition in Adult Listeners. Trends Hear 2019; 22:2331216518804966. [PMID: 30378469] [PMCID: PMC6236866] [DOI: 10.1177/2331216518804966]
Abstract
For cochlear implant (CI) users, degraded spectral input hampers the understanding of prosodic vocal emotion, especially in difficult listening conditions. Using a vocoder simulation of CI hearing, we examined the extent to which informative multimodal cues in a talker’s spoken expressions improve normal hearing (NH) adults’ speech and emotion perception under different levels of spectral degradation (two, three, four, and eight spectral bands). Participants repeated the words verbatim and identified emotions (among four alternative options: happy, sad, angry, and neutral) in meaningful sentences that are semantically congruent with the expression of the intended emotion. Sentences were presented in their natural speech form and in speech sampled through a noise-band vocoder in sound (auditory-only) and video (auditory–visual) recordings of a female talker. Visual information had a more pronounced benefit in enhancing speech recognition in the lower spectral band conditions. Spectral degradation, however, did not interfere with emotion recognition performance when dynamic visual cues in a talker’s expression were provided, as participants scored at ceiling levels across all spectral band conditions. Our use of familiar sentences that contained congruent semantic and prosodic information has high ecological validity, which likely optimized listener performance under simulated CI hearing and may better predict CI users’ outcomes in everyday listening contexts.
Affiliation(s)
- Chantel Ritter
- Department of Psychology, MacEwan University, Alberta, Canada
- Tara Vongpaisal
- Department of Psychology, MacEwan University, Alberta, Canada
124. Carlyon RP, Guérit F, Billig AJ, Tam YC, Harris F, Deeks JM. Effect of Chronic Stimulation and Stimulus Level on Temporal Processing by Cochlear Implant Listeners. J Assoc Res Otolaryngol 2019; 20:169-185. [PMID: 30543016] [PMCID: PMC6453997] [DOI: 10.1007/s10162-018-00706-y]
Abstract
A series of experiments investigated potential changes in temporal processing during the months following activation of a cochlear implant (CI) and as a function of stimulus level. Experiment 1 tested patients on the day of implant activation and 2 and 6 months later. All stimuli were presented using direct stimulation of a single apical electrode. The dependent variables were rate discrimination ratios (RDRs) for pulse trains with rates centred on 120 pulses per second (pps), obtained using an adaptive procedure, and a measure of the upper limit of temporal pitch, obtained using a pitch-ranking procedure. All stimuli were presented at their most comfortable level (MCL). RDRs decreased from 1.23 to 1.16 and the upper limit increased from 357 to 485 pps from 0 to 2 months post-activation, with no overall change from 2 to 6 months. Because MCLs and hence the testing level increased across sessions, two further experiments investigated whether the performance changes observed across sessions could be due to level differences. Experiment 2 re-tested a subset of subjects at 9 months post-activation, using current levels similar to those used at 0 months. Although the stimuli sounded softer, some subjects showed lower RDRs and/or higher upper limits at this re-test. Experiment 3 measured RDRs and the upper limit for a separate group of subjects at levels equal to 60 %, 80 % and 100 % of the dynamic range. RDRs decreased with increasing level. The upper limit increased with increasing level for most subjects, with two notable exceptions. Implications of the results for temporal plasticity are discussed, along with possible influences of the effects of level and of across-session learning.
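Rate discrimination ratios (RDRs) were measured with an adaptive procedure. The abstract does not spell out the staircase, so the sketch below shows a generic 2-down/1-up track on the ratio between pulse rates, which converges near the 70.7%-correct point; the step size, stopping rule, and simulated observer are invented:

```python
import numpy as np

def staircase_rdr(respond, start_ratio=1.4, step=1.05, n_reversals=8):
    """2-down/1-up staircase on the rate ratio; returns the mean of reversal ratios.

    `respond(ratio)` -> True/False, e.g., a listener (or simulated observer)
    judging which of two pulse trains (base vs. base*ratio pps) is higher in pitch.
    """
    ratio, correct, direction, reversals = start_ratio, 0, 0, []
    while len(reversals) < n_reversals:
        if respond(ratio):
            correct += 1
            if correct == 2:                    # 2 correct -> harder (smaller ratio)
                correct = 0
                if direction == +1:             # track turned around -> reversal
                    reversals.append(ratio)
                direction = -1
                ratio = max(ratio / step, 1.0)  # keep the two rates distinct
        else:                                   # 1 error -> easier (larger ratio)
            correct = 0
            if direction == -1:
                reversals.append(ratio)
            direction = +1
            ratio *= step
    return float(np.mean(reversals))

# simulated observer whose discrimination improves with larger ratios
rng = np.random.default_rng(0)
obs = lambda r: rng.random() < min(0.99, 0.5 + 2.0 * (r - 1.0))
print(staircase_rdr(obs))
```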
Affiliation(s)
- Robert P Carlyon: Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, UK
- François Guérit: Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, UK
- Alexander J Billig: Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, UK
- John M Deeks: Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, UK

125
Levy H, Konieczny L, Hanulíková A. Processing of unfamiliar accents in monolingual and bilingual children: effects of type and amount of accent experience. J Child Lang 2019; 46:368-392. [PMID: 30616700 DOI: 10.1017/s030500091800051x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Substantial individual differences exist in regard to type and amount of experience with variable speech resulting from foreign or regional accents. Whereas prior experience helps with processing familiar accents, research on how experience with accented speech affects processing of unfamiliar accents is inconclusive, ranging from perceptual benefits to processing disadvantages. We examined how experience with accented speech modulates mono- and bilingual children's (mean age: 9;10) ease of speech comprehension for two unfamiliar accents in German, one foreign and one regional. More experience with regional accents helped children repeat sentences correctly in the regional condition and in the standard condition. More experience with foreign accents did not help in either accent condition. The results suggest that type and amount of accent experience co-determine processing ease of accented speech.
Affiliation(s)
- Helena Levy: GRK 'Frequency effects in language', University of Freiburg, Germany
- Adriana Hanulíková: University of Freiburg, Germany; Freiburg Institute for Advanced Studies (FRIAS), Freiburg, Germany

126
Roberts B, Summers RJ. Dichotic integration of acoustic-phonetic information: Competition from extraneous formants increases the effect of second-formant attenuation on intelligibility. J Acoust Soc Am 2019; 145:1230. [PMID: 31067923 DOI: 10.1121/1.5091443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/27/2018] [Accepted: 02/01/2019] [Indexed: 06/09/2023]
Abstract
Differences in ear of presentation and level do not prevent effective integration of concurrent speech cues such as formant frequencies. For example, presenting the higher formants of a consonant-vowel syllable in the opposite ear to the first formant protects them from upward spread of masking, allowing them to remain effective speech cues even after substantial attenuation. This study used three-formant (F1+F2+F3) analogues of natural sentences and extended the approach to include competitive conditions. Target formants were presented dichotically (F1+F3; F2), either alone or accompanied by an extraneous competitor for F2 (i.e., F1+F2C+F3; F2) that listeners must reject to optimize recognition. F2C was created by inverting the F2 frequency contour and using the F2 amplitude contour without attenuation. In experiment 1, F2C was always absent and intelligibility was unaffected until F2 attenuation exceeded 30 dB; F2 still provided useful information at 48-dB attenuation. In experiment 2, attenuating F2 by 24 dB caused considerable loss of intelligibility when F2C was present, but had no effect in its absence. Factors likely to contribute to this interaction include informational masking from F2C acting to swamp the acoustic-phonetic information carried by F2, and interaural inhibition from F2C acting to reduce the effective level of F2.
Affiliation(s)
- Brian Roberts: Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
- Robert J Summers: Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom

127
Walenski M, Europa E, Caplan D, Thompson CK. Neural networks for sentence comprehension and production: An ALE-based meta-analysis of neuroimaging studies. Hum Brain Mapp 2019; 40:2275-2304. [PMID: 30689268 DOI: 10.1002/hbm.24523] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2018] [Revised: 12/14/2018] [Accepted: 12/26/2018] [Indexed: 12/24/2022] Open
Abstract
Comprehending and producing sentences is a complex endeavor requiring the coordinated activity of multiple brain regions. We examined three issues related to the brain networks underlying sentence comprehension and production in healthy individuals: First, which regions are recruited for sentence comprehension and sentence production? Second, are there differences for auditory sentence comprehension vs. visual sentence comprehension? Third, which regions are specifically recruited for the comprehension of syntactically complex sentences? Results from activation likelihood estimation (ALE) analyses (from 45 studies) implicated a sentence comprehension network occupying bilateral frontal and temporal lobe regions. Regions implicated in production (from 15 studies) overlapped with the set of regions associated with sentence comprehension in the left hemisphere, but did not include inferior frontal cortex, and did not extend to the right hemisphere. Modality differences between auditory and visual sentence comprehension were found principally in the temporal lobes. Results from the analysis of complex syntax (from 37 studies) showed engagement of left inferior frontal and posterior temporal regions, as well as the right insula. The involvement of the right hemisphere in the comprehension of these structures has potentially important implications for language treatment and recovery in individuals with agrammatic aphasia following left hemisphere brain damage.
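Activation likelihood estimation treats each study's reported foci as centres of three-dimensional Gaussian probability distributions and combines the resulting modeled-activation (MA) maps across studies as a probabilistic union, ALE = 1 - prod_i(1 - MA_i). The toy sketch below shows only that core computation; the grid size and kernel width are illustrative, and real ALE additionally works in MNI space, scales kernels by sample size, and thresholds via permutation testing.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ale_map(foci_per_study, shape, fwhm_vox=3.0):
    """Toy ALE: smooth each study's foci into a modeled-activation
    (MA) map, then combine maps as a probabilistic union.
    Assumes every study contributes at least one focus."""
    sigma = fwhm_vox / 2.355                 # convert FWHM to sigma
    survival = np.ones(shape)                # running product of (1 - MA_i)
    for foci in foci_per_study:
        ma = np.zeros(shape)
        for x, y, z in foci:
            ma[x, y, z] = 1.0
        ma = gaussian_filter(ma, sigma)
        ma = np.clip(ma / (ma.max() + 1e-12), 0.0, 1.0)  # treat as probability
        survival *= 1.0 - ma
    return 1.0 - survival

# e.g. ale = ale_map([[(10, 12, 8)], [(11, 12, 9), (30, 5, 20)]],
#                    shape=(64, 64, 48))
```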
Affiliation(s)
- Matthew Walenski: Center for the Neurobiology of Language Recovery, Northwestern University, Evanston, Illinois; Department of Communication Sciences and Disorders, School of Communication, Northwestern University, Evanston, Illinois
- Eduardo Europa: Department of Neurology, University of California, San Francisco
- David Caplan: Department of Neurology, Harvard Medical School, Massachusetts General Hospital, Boston, Massachusetts
- Cynthia K Thompson: Center for the Neurobiology of Language Recovery, Northwestern University, Evanston, Illinois; Department of Communication Sciences and Disorders, School of Communication, Northwestern University, Evanston, Illinois; Department of Neurology, Feinberg School of Medicine, Northwestern University, Evanston, Illinois

128
Wu M. Effect of F0 contour on perception of Mandarin Chinese speech against masking. PLoS One 2019; 14:e0209976. [PMID: 30605452 PMCID: PMC6317796 DOI: 10.1371/journal.pone.0209976] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2018] [Accepted: 12/14/2018] [Indexed: 11/19/2022] Open
Abstract
Intonation has many perceptually significant functions in language that contribute to speech recognition. This study aimed to investigate whether intonation cues affect the unmasking of Mandarin Chinese speech in the presence of interfering sounds. Specifically, the intelligibility of multi-tone Mandarin Chinese sentences with maskers consisting of either two-talker speech or steady-state noise was measured in three intonation conditions (flattened, typical, and exaggerated). Unlike most previous studies, the present study manipulated and modified only the intonation information while preserving tone information. The results showed that recognition of the final keywords in multi-tone Mandarin Chinese sentences was much better under the original F0 contour condition than under the flattened or exaggerated F0 contour conditions with either a noise or a speech masker, and an exaggerated F0 contour reduced the intelligibility of Mandarin Chinese more under the speech masker than under the noise masker. These results suggest that speech in a tone language (Mandarin Chinese) is harder to understand when the intonation is unnatural, even if the tone information is preserved, and that an unnatural intonation contour reduces the release of Mandarin Chinese speech from masking, especially in a multi-talker environment.
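Flattening or exaggerating an intonation contour can be expressed as scaling F0 excursions around the utterance mean on a logarithmic axis. The sketch below shows that contour arithmetic on a frame-by-frame F0 track; it is only a schematic, since the abstract does not specify the study's analysis/resynthesis pipeline (e.g., PSOLA or STRAIGHT).

```python
import numpy as np

def scale_f0_contour(f0_hz, factor):
    """Scale F0 excursions about the utterance's mean log-F0:
    factor = 0 flattens the contour, 1 leaves it unchanged, and
    > 1 exaggerates it. Unvoiced frames (F0 == 0) are untouched."""
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0
    log_f0 = np.log2(f0[voiced])
    out = f0.copy()
    out[voiced] = 2.0 ** (log_f0.mean() + factor * (log_f0 - log_f0.mean()))
    return out
```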
Affiliation(s)
- Meihong Wu: School of Information Science and Engineering, Xiamen University, Fujian, China

129
Borrie SA, Barrett TS, Yoho SE. Autoscore: An open-source automated tool for scoring listener perception of speech. J Acoust Soc Am 2019; 145:392. [PMID: 30710955 PMCID: PMC6347573 DOI: 10.1121/1.5087276] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/12/2018] [Revised: 11/26/2018] [Accepted: 12/10/2018] [Indexed: 05/19/2023]
Abstract
Speech perception studies typically rely on trained research assistants to score orthographic listener transcripts for words correctly identified. While the accuracy of the human scoring protocol has been validated with strong intra- and inter-rater reliability, the process of hand-scoring the transcripts is time-consuming and resource intensive. Here, an open-source computer-based tool for automated scoring of listener transcripts is built (Autoscore) and validated on three different human-scored data sets. Results show that Autoscore is not only highly accurate, achieving approximately 99% accuracy, but also extremely efficient. Thus, Autoscore affords a practical research tool, with clinical application, for scoring listener intelligibility of speech.
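Autoscore itself is distributed as an open-source R package and web app; the hypothetical Python fragment below reproduces only the core idea (normalize both strings, then credit each target word found in the response) and none of the published tool's configurable scoring rules or actual API.

```python
import re

def score_transcript(target, response):
    """Hypothetical word-overlap scorer in the spirit of Autoscore:
    lowercase and strip punctuation, then count target words that
    appear in the response, consuming each response word once."""
    norm = lambda s: re.sub(r"[^a-z' ]", "", s.lower()).split()
    pool = norm(response)
    hits = 0
    for word in norm(target):
        if word in pool:
            pool.remove(word)   # each response word matches at most once
            hits += 1
    return hits, len(norm(target))

# score_transcript("the cat ran", "a cat ran home") -> (2, 3)
```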
Affiliation(s)
- Stephanie A Borrie: Department of Communicative Disorders and Deaf Education, Utah State University, Logan, Utah 84322, USA
- Tyson S Barrett: Department of Psychology, Utah State University, Logan, Utah 84322, USA
- Sarah E Yoho: Department of Communicative Disorders and Deaf Education, Utah State University, Logan, Utah 84322, USA

130

131
Nourski KV, Steinschneider M, Rhone AE, Kovach CK, Kawasaki H, Howard MA. Differential responses to spectrally degraded speech within human auditory cortex: An intracranial electrophysiology study. Hear Res 2018; 371:53-65. [PMID: 30500619 DOI: 10.1016/j.heares.2018.11.009] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/13/2018] [Revised: 11/15/2018] [Accepted: 11/19/2018] [Indexed: 12/28/2022]
Abstract
Understanding cortical processing of spectrally degraded speech in normal-hearing subjects may provide insights into how sound information is processed by cochlear implant (CI) users. This study investigated electrocorticographic (ECoG) responses to noise-vocoded speech and related these responses to behavioral performance in a phonemic identification task. Subjects were neurosurgical patients undergoing chronic invasive monitoring for medically refractory epilepsy. Stimuli were utterances /aba/ and /ada/, spectrally degraded using a noise vocoder (1-4 bands). ECoG responses were obtained from Heschl's gyrus (HG) and superior temporal gyrus (STG), and were examined within the high gamma frequency range (70-150 Hz). All subjects performed at chance accuracy with speech degraded to 1 and 2 spectral bands, and at or near ceiling for clear speech. Inter-subject variability was observed in the 3- and 4-band conditions. High gamma responses in posteromedial HG (auditory core cortex) were similar for all vocoded conditions and clear speech. A progressive preference for clear speech emerged in anterolateral segments of HG, regardless of behavioral performance. On the lateral STG, responses to all vocoded stimuli were larger in subjects with better task performance. In contrast, both behavioral and neural responses to clear speech were comparable across subjects regardless of their ability to identify degraded stimuli. Findings highlight differences in representation of spectrally degraded speech across cortical areas and their relationship to perception. The results are in agreement with prior non-invasive results. The data provide insight into the neural mechanisms associated with variability in perception of degraded speech and potentially into sources of such variability in CI users.
Affiliation(s)
- Kirill V Nourski: Department of Neurosurgery, The University of Iowa, Iowa City, IA, USA; Iowa Neuroscience Institute, The University of Iowa, Iowa City, IA, USA
- Mitchell Steinschneider: Departments of Neurology and Neuroscience, Albert Einstein College of Medicine, Bronx, NY, USA
- Ariane E Rhone: Department of Neurosurgery, The University of Iowa, Iowa City, IA, USA
- Hiroto Kawasaki: Department of Neurosurgery, The University of Iowa, Iowa City, IA, USA
- Matthew A Howard: Department of Neurosurgery, The University of Iowa, Iowa City, IA, USA; Iowa Neuroscience Institute, The University of Iowa, Iowa City, IA, USA; Pappajohn Biomedical Institute, The University of Iowa, Iowa City, IA, USA

132
Archer-Boyd AW, Southwell RV, Deeks JM, Turner RE, Carlyon RP. Development and validation of a spectro-temporal processing test for cochlear-implant listeners. J Acoust Soc Am 2018; 144:2983. [PMID: 30522311 PMCID: PMC6805218 DOI: 10.1121/1.5079636] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/09/2018] [Accepted: 11/01/2018] [Indexed: 06/06/2023]
Abstract
Psychophysical tests of spectro-temporal resolution may aid the evaluation of methods for improving hearing by cochlear implant (CI) listeners. Here the STRIPES (Spectro-Temporal Ripple for Investigating Processor EffectivenesS) test is described and validated. Like speech, the test requires both spectral and temporal processing to perform well. Listeners discriminate between complexes of sine sweeps which increase or decrease in frequency; difficulty is controlled by changing the stimulus spectro-temporal density. Care was taken to minimize extraneous cues, forcing listeners to perform the task only on the direction of the sweeps. Vocoder simulations with normal hearing listeners showed that the STRIPES test was sensitive to the number of channels and temporal information fidelity. An evaluation with CI listeners compared a standard processing strategy with one having very wide filters, thereby spectrally blurring the stimulus. Psychometric functions were monotonic for both strategies and five of six participants performed better with the standard strategy. An adaptive procedure revealed significant differences, all in favour of the standard strategy, at the individual listener level for six of eight CI listeners. Subsequent measures validated a faster version of the test, and showed that STRIPES could be performed by recently implanted listeners having no experience of psychophysical testing.
Affiliation(s)
- Alan W. Archer-Boyd: MRC Cognition & Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom
- Rosy V. Southwell: MRC Cognition & Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom
- John M. Deeks: MRC Cognition & Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom
- Richard E. Turner: MRC Cognition & Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom
- Robert P. Carlyon: MRC Cognition & Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom

133
Shen J, Souza PE. On Dynamic Pitch Benefit for Speech Recognition in Speech Masker. Front Psychol 2018; 9:1967. [PMID: 30405476 PMCID: PMC6204388 DOI: 10.3389/fpsyg.2018.01967] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2018] [Accepted: 09/25/2018] [Indexed: 12/03/2022] Open
Abstract
Previous work demonstrated that dynamic pitch (i.e., pitch variation in speech) aids speech recognition in various types of noise. While this finding suggests that dynamic pitch enhancement of target speech can benefit speech recognition in noise, it is important to know which noise characteristics affect dynamic pitch benefit, and who will benefit from enhanced dynamic pitch cues. Following our recent finding that temporal modulation in noise influences dynamic pitch benefit, we examined the effect of speech masker characteristics on dynamic pitch benefit. Specifically, the first goal of the study was to test the hypothesis that dynamic pitch benefit varies depending on the availability of pitch cues in the masker and the intelligibility of the masker. The second goal was to investigate whether older listeners as a group can benefit from dynamic pitch for speech recognition in speech maskers. In addition, the individual factors of hearing loss and working memory capacity were examined for their impact on older listeners' dynamic pitch benefit. Twenty-three younger listeners with normal hearing and 37 older listeners with varying levels of hearing sensitivity participated in the study, in which speech reception thresholds were measured with sentences in speech maskers. While we did not find an effect of masker characteristics on dynamic pitch benefit, the results showed that older listeners can benefit from dynamic pitch for recognizing speech in speech maskers. The data also suggest that among older listeners with hearing loss, dynamic pitch benefit is stronger for individuals with higher working memory capacity. This can be attributed to their ability to exploit facilitated lexical access when processing a degraded speech signal.
Affiliation(s)
- Jing Shen: Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL, United States
- Pamela E. Souza: Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL, United States; Knowles Hearing Center, Northwestern University, Evanston, IL, United States

134
Ishida M, Arai T, Kashino M. Perceptual Restoration of Temporally Distorted Speech in L1 vs. L2: Local Time Reversal and Modulation Filtering. Front Psychol 2018; 9:1749. [PMID: 30283390 PMCID: PMC6156149 DOI: 10.3389/fpsyg.2018.01749] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2018] [Accepted: 08/29/2018] [Indexed: 11/13/2022] Open
Abstract
Speech is intelligible even when its temporal envelope is distorted. The current study investigates how native and non-native speakers perceptually restore temporally distorted speech. Participants were native English speakers (NS) and native Japanese speakers who spoke English as a second language (NNS). In Experiment 1, participants listened to “locally time-reversed speech”, in which every x ms of the speech signal was reversed on the temporal axis. Here, the local time reversal shifted the constituents of the speech signal forward or backward from their original positions, and the amplitude envelope of speech was altered as a function of reversed segment length. In Experiment 2, participants listened to “modulation-filtered speech”, in which the modulation frequency components of speech were low-pass filtered at a particular cutoff frequency. Here, the temporal envelope of speech was altered as a function of cutoff frequency. The results suggest that speech becomes gradually unintelligible as the length of reversed segments increases (Experiment 1) and as a lower cutoff frequency is imposed (Experiment 2). Both experiments showed equivalent levels of speech intelligibility across the six levels of degradation for native and non-native speakers respectively, which raises the question of whether the regular occurrence of local time reversal can be discussed in the modulation frequency domain by simply converting the length of reversed segments (ms) into frequency (Hz).
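Locally time-reversed speech is simple to generate: the waveform is cut into fixed-length segments and each segment is played backwards. A minimal sketch:

```python
import numpy as np

def locally_time_reverse(signal, fs, segment_ms):
    """Reverse every consecutive segment_ms-long chunk of the
    waveform; the final, shorter chunk (if any) is reversed too."""
    seg = max(1, int(round(fs * segment_ms / 1000.0)))
    out = np.array(signal, dtype=float, copy=True)
    for start in range(0, len(out), seg):
        out[start:start + seg] = out[start:start + seg][::-1].copy()
    return out
```

On the closing question, one candidate mapping treats a reversed segment of x ms as half a modulation period, i.e., roughly 1000/(2x) Hz, but the abstract deliberately leaves this conversion open.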
Affiliation(s)
- Mako Ishida: NTT Communication Science Laboratories, Atsugi, Japan; Japan Society for the Promotion of Science, Tokyo, Japan; Department of Information and Communication Sciences, Sophia University, Tokyo, Japan
- Takayuki Arai: Department of Information and Communication Sciences, Sophia University, Tokyo, Japan
- Makio Kashino: NTT Communication Science Laboratories, Atsugi, Japan

135
Frequency specificity of amplitude envelope patterns in noise-vocoded speech. Hear Res 2018; 367:169-181. [DOI: 10.1016/j.heares.2018.06.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/12/2017] [Revised: 06/03/2018] [Accepted: 06/08/2018] [Indexed: 11/22/2022]

136
Fairchild S, Papafragou A. Sins of omission are more likely to be forgiven in non-native speakers. Cognition 2018; 181:80-92. [PMID: 30149264 DOI: 10.1016/j.cognition.2018.08.010] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2016] [Revised: 08/12/2018] [Accepted: 08/17/2018] [Indexed: 11/24/2022]
Abstract
Utterances produced by foreign-accented speakers are often judged as less credible, more vague, and more difficult to understand than those produced by native speakers. Some theoretical accounts argue that listeners have different expectations about the speech of non-native speakers. Other accounts argue that non-native speech is processed differently to the extent that a foreign accent taxes intelligibility and introduces additional processing load. Here we test the role of expectations in the processing of native vs. non-native speech using written texts, where accents cannot be directly perceived (and thus cannot affect processing load). In Experiment 1, native comprehenders gave higher ratings to the meaning of under-informative sentences ("Some people have noses with two nostrils") when they believed that the sentences were produced by non-native compared to native speakers. This difference was larger the more likely individual participants were to interpret under-informative sentences pragmatically (as opposed to logically). In Experiment 2, the tendency to forgive sins of information omission was shown to depend on the presumed L2 proficiency of non-native speakers. Experiment 3 replicated and extended the major finding. Since the intelligibility of the sentences was identical across types of speakers, these findings provide support for the role of expectations in non-native speech comprehension, as well as for broader models of language processing that argue for a role of speaker identity.
Affiliation(s)
- Sarah Fairchild: Department of Psychological & Brain Sciences, University of Delaware, USA
- Anna Papafragou: Department of Psychological & Brain Sciences, University of Delaware, USA

137
Maintaining information about speech input during accent adaptation. PLoS One 2018; 13:e0199358. [PMID: 30086140 PMCID: PMC6080756 DOI: 10.1371/journal.pone.0199358] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2018] [Accepted: 06/06/2018] [Indexed: 11/19/2022] Open
Abstract
Speech understanding can be thought of as inferring progressively more abstract representations from a rapidly unfolding signal. One common view of this process holds that lower-level information is discarded as soon as higher-level units have been inferred. However, there is evidence that subcategorical information about speech percepts is not immediately discarded, but is maintained past word boundaries and integrated with subsequent input. Previous evidence for such subcategorical information maintenance has come from paradigms that lack many of the demands typical to everyday language use. We ask whether information maintenance is also possible under more typical constraints, and in particular whether it can facilitate accent adaptation. In a web-based paradigm, participants listened to isolated foreign-accented words in one of three conditions: subtitles were displayed concurrently with the speech, after speech offset, or not displayed at all. The delays between speech offset and subtitle presentation were manipulated. In a subsequent test phase, participants then transcribed novel words in the same accent without the aid of subtitles. We find that subtitles facilitate accent adaptation, even when displayed with a 6 second delay. Listeners thus maintained subcategorical information for sufficiently long to allow it to benefit adaptation. We close by discussing what type of information listeners maintain: subcategorical phonetic information, or just uncertainty about speech categories.

138
Whitmal NA. Effects of vowel context and discriminability on band independence in nonsense syllable recognition. J Acoust Soc Am 2018; 144:678. [PMID: 30180683 DOI: 10.1121/1.5049375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/01/2017] [Accepted: 07/20/2018] [Indexed: 06/08/2023]
Abstract
The Speech Intelligibility Index algorithm [(1997). ANSI S3.5-1997] models cues in disjoint frequency bands for consonants and vowels as additive, independent contributions to intelligibility. Data from other studies examining only consonants in single-vowel nonsense stimuli exhibit synergistic and redundant band contributions that challenge the band independence assumption. The present study tested the hypotheses that (a) band independence is present for multi-vowel stimuli, and (b) dependent band contributions are artifacts of confounding stimulus administration and testing methods. Data were measured in two experiments in which subjects identified filtered nonsense consonant-vowel-consonant syllables using a variety of randomly selected vowels. The measured data were used in simulations that further characterized the range of subject responses. Results of testing and simulation suggest that, where present, band independence is fostered by low broadband error, high vowel diversity, and high vowel discriminability. Synergistic band contributions were observed for confusable vowels that were most susceptible to filtering; redundant contributions were observed for the least susceptible vowels. Implications for intelligibility prediction and enhancement are discussed.
Affiliation(s)
- Nathaniel A Whitmal: Department of Communication Disorders, University of Massachusetts, Amherst, Massachusetts 01003, USA

139
Abstract
OBJECTIVES: It is well known from previous research that when listeners are told what they are about to hear before a degraded or partially masked auditory signal is presented, the speech signal "pops out" of the background and becomes considerably more intelligible. The goal of this research was to explore whether this priming effect is as strong in older adults as in younger adults.
DESIGN: Fifty-six adults (28 older and 28 younger) listened to "nonsense" sentences spoken by a female talker in the presence of a 2-talker speech masker (also female) or a fluctuating speech-like noise masker at 5 signal-to-noise ratios. Just before, or just after, the auditory signal was presented, a typed caption was displayed on a computer screen. The caption sentence was either identical to the auditory sentence or differed by one key word. The subjects' task was to decide whether the caption and auditory messages were the same or different. Discrimination performance was reported in d'. The strength of the pop-out perception was inferred from the improvement in performance that was expected from the caption-before order of presentation. A subset of 12 subjects from each group made confidence judgments as they gave their responses, and also completed several cognitive tests.
RESULTS: Data showed a clear order effect for both subject groups and both maskers, with better same-different discrimination performance for the caption-before condition than the caption-after condition. However, for the two-talker masker, the younger adults obtained a larger and more consistent benefit from the caption-before order than the older adults across signal-to-noise ratios. Especially at the poorer signal-to-noise ratios, older subjects showed little evidence that they experienced the pop-out effect that is presumed to make the discrimination task easier. On average, older subjects also appeared to approach the task differently, being more reluctant than younger subjects to report that the captions and auditory sentences were the same. Correlation analyses indicated a significant negative association between age and priming benefit in the two-talker masker and nonsignificant associations between priming benefit in this masker and either high-frequency hearing loss or performance on the cognitive tasks.
CONCLUSIONS: Previous studies have shown that older adults are at least as good, if not better, at exploiting context in speech recognition, as compared with younger adults. The current results are not in disagreement with those findings but suggest that, under some conditions, the automatic priming process that may contribute to benefits from context is not as strong in older as in younger adults.
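Discrimination performance here is reported in d'. In its simplest yes/no form, d' is the difference between the z-transformed hit and false-alarm rates; the sketch below adds a log-linear correction to keep rates away from 0 and 1. Same-different designs are often analyzed with a differencing-model variant instead, so treat this as the minimal formula only.

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear
    correction so that extreme counts do not yield infinities."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# d_prime(45, 5, 12, 38) -> about 1.93
```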

140
Panouillères MTN, Boyles R, Chesters J, Watkins KE, Möttönen R. Facilitation of motor excitability during listening to spoken sentences is not modulated by noise or semantic coherence. Cortex 2018; 103:44-54. [PMID: 29554541 PMCID: PMC6002609 DOI: 10.1016/j.cortex.2018.02.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2017] [Revised: 11/27/2017] [Accepted: 02/08/2018] [Indexed: 11/15/2022]
Abstract
Comprehending speech can be particularly challenging in a noisy environment and in the absence of semantic context. It has been proposed that the articulatory motor system would be recruited especially in difficult listening conditions. However, it remains unknown how signal-to-noise ratio (SNR) and semantic context affect the recruitment of the articulatory motor system when listening to continuous speech. The aim of the present study was to address the hypothesis that involvement of the articulatory motor cortex increases when the intelligibility and clarity of the spoken sentences decreases, because of noise and the lack of semantic context. We applied Transcranial Magnetic Stimulation (TMS) to the lip and hand representations in the primary motor cortex and measured motor evoked potentials from the lip and hand muscles, respectively, to evaluate motor excitability when young adults listened to sentences. In Experiment 1, we found that the excitability of the lip motor cortex was facilitated during listening to both semantically anomalous and coherent sentences in noise relative to non-speech baselines, but neither SNR nor semantic context modulated the facilitation. In Experiment 2, we replicated these findings and found no difference in the excitability of the lip motor cortex between sentences in noise and clear sentences without noise. Thus, our results show that the articulatory motor cortex is involved in speech processing even in optimal and ecologically valid listening conditions and that its involvement is not modulated by the intelligibility and clarity of speech.
Affiliation(s)
- Rowan Boyles: Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Jennifer Chesters: Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Kate E Watkins: Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Riikka Möttönen: Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom; School of Psychology, University of Nottingham, Nottingham, United Kingdom

141
Hawthorne K. Prosody-driven syntax learning is robust to impoverished pitch and spectral cues. J Acoust Soc Am 2018; 143:2756. [PMID: 29857717 DOI: 10.1121/1.5031130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Across languages, prosodic boundaries tend to align with syntactic boundaries, and both infant and adult language learners capitalize on these correlations to jump-start syntax acquisition. However, it is unclear which prosodic cues-pauses, final-syllable lengthening, and/or pitch resets across boundaries-are necessary for prosodic bootstrapping to occur. It is also unknown how syntax acquisition is impacted when listeners do not have access to the full range of prosodic or spectral information. These questions were addressed using 14-channel noise-vocoded (spectrally degraded) speech. While pre-boundary lengthening and pauses are well-transmitted through noise-vocoded speech, pitch is not; overall intelligibility is also decreased. In two artificial grammar experiments, adult native English speakers showed a similar ability to use English-like prosody to bootstrap unfamiliar syntactic structures from degraded speech and natural, unmanipulated speech. Contrary to previous findings that listeners may require pitch resets and final lengthening to co-occur if no pause cue is present, participants in the degraded speech conditions were able to detect prosodic boundaries from lengthening alone. Results suggest that pitch is not necessary for adult English speakers to perceive prosodic boundaries associated with syntactic structures, and that prosodic bootstrapping is robust to degraded spectral information.
Affiliation(s)
- Kara Hawthorne: Department of Communication Sciences and Disorders, University of Mississippi, 304 George Hall, University, Mississippi 38677, USA

142
Huyck JJ. Comprehension of Degraded Speech Matures During Adolescence. J Speech Lang Hear Res 2018; 61:1012-1022. [PMID: 29625427 DOI: 10.1044/2018_jslhr-h-17-0252] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/29/2017] [Accepted: 01/12/2018] [Indexed: 06/08/2023]
Abstract
PURPOSE: The aim of the study was to compare comprehension of spectrally degraded (noise-vocoded [NV]) speech and perceptual learning of NV speech between adolescents and young adults and to examine the role of phonological processing and executive functions in this perception.
METHOD: Sixteen younger adolescents (11-13 years), 16 older adolescents (14-16 years), and 16 young adults (18-22 years) listened to 40 NV sentences and repeated back what they heard. They also completed tests assessing phonological processing and a variety of executive functions.
RESULTS: Word-report scores were generally poorer for younger adolescents than for the older age groups. Phonological processing also predicted initial word-report scores. Learning (i.e., improvement across training times) did not differ with age. Starting performance and processing speed predicted learning, with greater learning for those who started with the lowest scores and those with faster processing speed.
CONCLUSIONS: Degraded (NV) speech comprehension is not mature even by early adolescence; however, like adults, adolescents are able to improve their comprehension of degraded speech with training. Thus, although adolescents may have initial difficulty in understanding degraded speech or speech as presented through hearing aids or cochlear implants, they are able to improve their perception with experience. Processing speed and phonological processing may play a role in degraded speech comprehension in these age groups.

143
Xie X, Weatherholtz K, Bainton L, Rowe E, Burchill Z, Liu L, Jaeger TF. Rapid adaptation to foreign-accented speech and its transfer to an unfamiliar talker. J Acoust Soc Am 2018; 143:2013. [PMID: 29716296 PMCID: PMC5895469 DOI: 10.1121/1.5027410] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/08/2017] [Revised: 02/28/2018] [Accepted: 03/01/2018] [Indexed: 05/31/2023]
Abstract
How fast can listeners adapt to unfamiliar foreign accents? Clarke and Garrett [J. Acoust. Soc. Am. 116, 3647-3658 (2004)] (CG04) reported that native-English listeners adapted to foreign-accented English within a minute, demonstrating improved processing of spoken words. In two web-based experiments that closely follow the design of CG04, the effects of rapid accent adaptation are examined and its generalization is explored across talkers. Experiment 1 replicated the core finding of CG04 that initial perceptual difficulty with foreign-accented speech can be attenuated rapidly by a brief period of exposure to an accented talker. Importantly, listeners showed both faster (replicating CG04) and more accurate (extending CG04) comprehension of this talker. Experiment 2 revealed evidence that such adaptation transferred to a different talker of a same accent. These results highlight the rapidity of short-term accent adaptation and raise new questions about the underlying mechanism. It is suggested that the web-based paradigm provides a useful tool for investigations in speech adaptation.
Affiliation(s)
- Xin Xie: Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
- Kodi Weatherholtz: Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
- Larisa Bainton: Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
- Emily Rowe: Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
- Zachary Burchill: Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
- Linda Liu: Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
- T Florian Jaeger: Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA

144
Abstract
Phonemes play a central role in traditional theories as units of speech perception and access codes to lexical representations. Phonemes have two essential properties: they are 'segment-sized' (the size of a consonant or vowel) and abstract (a single phoneme may have different acoustic realisations). Nevertheless, there is a long history of challenging the phoneme hypothesis, with some theorists arguing for differently sized phonological units (e.g. features or syllables) and others rejecting abstract codes in favour of representations that encode detailed acoustic properties of the stimulus. The phoneme hypothesis is the minority view today. We defend the phoneme hypothesis in two complementary ways. First, we show that rejection of phonemes is based on a flawed interpretation of empirical findings. For example, it is commonly argued that the failure to find acoustic invariances for phonemes rules out phonemes. However, the lack of invariance is only a problem on the assumption that speech perception is a bottom-up process. If learned sublexical codes are modified by top-down constraints (which they are), then this argument loses all force. Second, we provide strong positive evidence for phonemes on the basis of linguistic data. Almost all findings that are taken (incorrectly) as evidence against phonemes are based on psycholinguistic studies of single words. However, phonemes were first introduced in linguistics, and the best evidence for phonemes comes from linguistic analyses of complex word forms and sentences. In short, the rejection of phonemes is based on a false analysis and a too-narrow consideration of the relevant data.
Affiliation(s)
- Nina Kazanina: School of Experimental Psychology, University of Bristol, 12a Priory Road, Bristol, BS8 1TU, UK
- Jeffrey S Bowers: School of Experimental Psychology, University of Bristol, 12a Priory Road, Bristol, BS8 1TU, UK
- William Idsardi: Department of Linguistics, University of Maryland, 1401 Marie Mount Hall, College Park, MD, 20742, USA

145
Some Neurocognitive Correlates of Noise-Vocoded Speech Perception in Children With Normal Hearing: A Replication and Extension. Ear Hear 2018; 38:344-356. [PMID: 28045787 DOI: 10.1097/aud.0000000000000393] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVES: Noise-vocoded speech is a valuable research tool for testing experimental hypotheses about the effects of spectral degradation on speech recognition in adults with normal hearing (NH). However, very little research has utilized noise-vocoded speech with children with NH. Earlier studies with children with NH focused primarily on the amount of spectral information needed for speech recognition without assessing the contribution of neurocognitive processes to speech perception and spoken word recognition. In this study, we first replicated the seminal findings of an earlier study that investigated effects of lexical density and word frequency on noise-vocoded speech perception in a small group of children with NH. We then extended the research to investigate relations between noise-vocoded speech recognition abilities and five neurocognitive measures: auditory attention (AA) and response set, talker discrimination, and verbal and nonverbal short-term working memory.
DESIGN: Thirty-one children with NH between 5 and 13 years of age were assessed on their ability to perceive lexically controlled words in isolation and in sentences that were noise-vocoded to four spectral channels. Children were also administered vocabulary assessments (Peabody Picture Vocabulary Test-4th Edition and Expressive Vocabulary Test-2nd Edition) and measures of AA (NEPSY AA and response set and a talker discrimination task) and short-term memory (visual digit and symbol spans).
RESULTS: Consistent with the findings reported in the original study, we found that children perceived noise-vocoded lexically easy words better than lexically hard words. Words in sentences were also recognized better than the same words presented in isolation. No significant correlations were observed between noise-vocoded speech recognition scores and the Peabody Picture Vocabulary Test-4th Edition using language quotients to control for age effects. However, children who scored higher on the Expressive Vocabulary Test-2nd Edition recognized lexically easy words better than lexically hard words in sentences. Older children perceived noise-vocoded speech better than younger children. Finally, we found that measures of AA and short-term memory capacity were significantly correlated with a child's ability to perceive noise-vocoded isolated words and sentences.
CONCLUSIONS: First, we successfully replicated the major findings from the earlier study. Because familiarity, phonological distinctiveness, and lexical competition affect word recognition, these findings provide additional support for the proposal that several foundational elementary neurocognitive processes underlie the perception of spectrally degraded speech. Second, we found strong and significant correlations between performance on neurocognitive measures and children's ability to recognize words and sentences noise-vocoded to four spectral channels. These findings extend earlier research suggesting that perception of spectrally degraded speech reflects early peripheral auditory processes, as well as additional contributions of executive function, specifically selective attention and short-term memory processes in spoken word recognition. The present findings suggest that AA and short-term memory support robust spoken word recognition in children with NH even under compromised and challenging listening conditions. These results are relevant to research carried out with listeners who have hearing loss, because they are routinely required to encode, process, and understand spectrally degraded acoustic signals.

146
Alain C, Du Y, Bernstein LJ, Barten T, Banai K. Listening under difficult conditions: An activation likelihood estimation meta-analysis. Hum Brain Mapp 2018. [PMID: 29536592 DOI: 10.1002/hbm.24031] [Citation(s) in RCA: 73] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023] Open
Abstract
The brain networks supporting speech identification and comprehension under difficult listening conditions are not well specified. The networks hypothesized to underlie effortful listening include regions responsible for executive control. We conducted meta-analyses of auditory neuroimaging studies to determine whether a common activation pattern of the frontal lobe supports effortful listening under different speech manipulations. Fifty-three functional neuroimaging studies investigating speech perception were divided into three independent Activation Likelihood Estimation analyses based on the type of speech manipulation paradigm used: speech-in-noise (SIN, 16 studies, involving 224 participants); spectrally degraded speech using filtering techniques (15 studies, involving 270 participants); and linguistic complexity (i.e., levels of syntactic, lexical and semantic intricacy/density, 22 studies, involving 348 participants). Meta-analysis of the SIN studies revealed that higher effort was associated with activation in the left inferior frontal gyrus (IFG), left inferior parietal lobule, and right insula. Studies using spectrally degraded speech demonstrated increased activation of the insula bilaterally and the left superior temporal gyrus (STG). Studies manipulating linguistic complexity showed activation in the left IFG, right middle frontal gyrus, left middle temporal gyrus, and bilateral STG. Planned contrasts revealed left IFG activation in linguistic complexity studies, which differed from activation patterns observed in SIN or spectral degradation studies. Although there was no significant overlap in prefrontal activation across these three speech manipulation paradigms, SIN and spectral degradation showed overlapping regions in the left and right insula. These findings provide evidence that there is regional specialization within the left IFG and that differential executive networks underlie effortful listening.
Affiliation(s)
- Claude Alain: Rotman Research Institute, Baycrest Health Centre, Toronto, Ontario, Canada; Department of Psychology, University of Toronto, Toronto, Ontario, Canada
- Yi Du: CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- Lori J Bernstein: Department of Supportive Care, Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada; Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
- Thijs Barten: Rotman Research Institute, Baycrest Health Centre, Toronto, Ontario, Canada
- Karen Banai: Department of Communication Sciences and Disorders, University of Haifa, Haifa, Israel

147
Senan TU, Jelfs S, Kohlrausch A. Cognitive disruption by noise-vocoded speech stimuli: Effects of spectral variation. J Acoust Soc Am 2018; 143:1407. [PMID: 29604682 DOI: 10.1121/1.5026619] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
The effect of irrelevant sounds on short-term memory was investigated in two experiments using noise-vocoded speech stimuli (NVSS). Speech samples were systematically modified by a noise-vocoder and a set of stimuli varying from amplitude-modulated white noise to intelligible speech was created. Eight NVSS conditions, composed of 1-, 2-, 4-, 6-, 9-, 12-, 15-, and 18-bands, were used as the distracting stimuli in a digit-recall task next to the speech and silence conditions. The results showed that performance decreased with the number of frequency bands up to the 6-bands condition, but there was no influence of number of bands on performance beyond six bands. The results were analyzed using four acoustic metrics proposed in the literature: the frequency domain correlation coefficient (FDCC), the fluctuation strength, the speech transmission index (STI), and the normalized covariance measure (NCM). None of the metrics successfully predicted the results. However, the parameter values of the FDCC, the STI, and the NCM indicated that a prediction model for irrelevant sound effect should account for both temporal and spectral features of the irrelevant sounds.
Affiliation(s)
- Toros Ufuk Senan: Royal Philips, High Tech Campus 36, Eindhoven, 5656 AE, The Netherlands
- Sam Jelfs: Royal Philips, High Tech Campus 36, Eindhoven, 5656 AE, The Netherlands
- Armin Kohlrausch: Human-Technology Interaction, Eindhoven University of Technology, Eindhoven, 5600 MB, The Netherlands

148
Abstract
OBJECTIVES: The purpose of the present study was to quantify age-related differences in executive control as it relates to dual-task performance, which is thought to represent listening effort, during degraded speech recognition.
DESIGN: Twenty-five younger adults (YA; 18-24 years) and 21 older adults (OA; 56-82 years) completed a dual-task paradigm that consisted of a primary speech recognition task and a secondary visual monitoring task. Sentence material in the primary task was either unprocessed or spectrally degraded into 8, 6, or 4 spectral channels using noise-band vocoding. Performance on the visual monitoring task was assessed by the accuracy and reaction time of participants' responses. Performance on the primary and secondary task was quantified in isolation (i.e., single task) and during the dual-task paradigm. Participants also completed a standardized psychometric measure of executive control, including attention and inhibition. Statistical analyses were implemented to evaluate changes in listeners' performance on the primary and secondary tasks (1) per condition (unprocessed vs. vocoded conditions); (2) per task (single task vs. dual task); and (3) per group (YA vs. OA).
RESULTS: Speech recognition declined with increasing spectral degradation for both YA and OA when they performed the task in isolation or concurrently with the visual monitoring task. OA were slower and less accurate than YA on the visual monitoring task when performed in isolation, which paralleled age-related differences in standardized scores of executive control. When compared with single-task performance, OA experienced greater declines in secondary-task accuracy, but not reaction time, than YA. Furthermore, results revealed that age-related differences in executive control significantly contributed to age-related differences on the visual monitoring task during the dual-task paradigm.
CONCLUSIONS: OA experienced significantly greater declines in secondary-task accuracy during degraded speech recognition than YA. These findings are interpreted as suggesting that OA expended greater listening effort than YA, which may be partially attributed to age-related differences in executive control.

149
Roberts B, Summers RJ. Informational masking of speech by time-varying competitors: Effects of frequency region and number of interfering formants. J Acoust Soc Am 2018; 143:891. [PMID: 29495741 DOI: 10.1121/1.5023476] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
This study explored the extent to which informational masking of speech depends on the frequency region and number of extraneous formants in an interferer. Target formants, monotonized three-formant (F1+F2+F3) analogues of natural sentences, were presented monaurally, with target ear assigned randomly on each trial. Interferers were presented contralaterally. In experiment 1, single-formant interferers were created using the time-reversed F2 frequency contour and constant amplitude, root-mean-square (RMS)-matched to F2. Interferer center frequency was matched to that of F1, F2, or F3, while maintaining the extent of formant-frequency variation (depth) on a log scale. Adding an interferer lowered intelligibility; the effect of frequency region was small and broadly tuned around F2. In experiment 2, interferers comprised either one formant (F1, the most intense) or all three, created using the time-reversed frequency contours of the corresponding targets and RMS-matched constant amplitudes. Interferer formant-frequency variation was scaled to 0%, 50%, or 100% of the original depth. Increasing the depth of formant-frequency variation and number of formants in the interferer had independent and additive effects. These findings suggest that the impact on intelligibility depends primarily on the overall extent of frequency variation in each interfering formant (up to ∼100% depth) and the number of extraneous formants.
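The interfering formant tracks are built by time-reversing the target's frequency contour and scaling its variation to 0%, 50%, or 100% of the original depth on a log scale. The sketch below follows that recipe; scaling about the geometric mean of the track is an assumption, since the abstract specifies only the logarithmic axis.

```python
import numpy as np

def make_interferer_track(target_hz, depth):
    """Time-reverse a formant frequency contour and scale its
    excursions about the geometric mean on a log axis
    (depth = 0.0, 0.5, or 1.0)."""
    reversed_track = np.asarray(target_hz, dtype=float)[::-1]
    log_track = np.log(reversed_track)
    mean_log = log_track.mean()
    return np.exp(mean_log + depth * (log_track - mean_log))
```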
Affiliation(s)
- Brian Roberts: Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
- Robert J Summers: Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom

150
Di Liberto GM, Lalor EC, Millman RE. Causal cortical dynamics of a predictive enhancement of speech intelligibility. Neuroimage 2018; 166:247-258. [DOI: 10.1016/j.neuroimage.2017.10.066] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2017] [Revised: 10/04/2017] [Accepted: 10/30/2017] [Indexed: 11/28/2022] Open