251
Creel SC, Aslin RN, Tanenhaus MK. Word learning under adverse listening conditions: Context-specific recognition. Lang Cogn Process 2012. DOI: 10.1080/01690965.2011.610597.
252
Borrie SA, McAuliffe MJ, Liss JM, Kirk C, O'Beirne GA, Anderson T. Familiarisation conditions and the mechanisms that underlie improved recognition of dysarthric speech. Lang Cogn Process 2012;27:1039-1055. PMID: 24009401; DOI: 10.1080/01690965.2011.610596.
Abstract
This investigation evaluated the familiarisation conditions required to promote subsequent and more long-term improvements in perceptual processing of dysarthric speech and examined the cognitive-perceptual processes that may underlie the experience-evoked learning response. Sixty listeners were randomly allocated to one of three experimental groups and were familiarised under the following conditions: (1) neurologically intact speech (control), (2) dysarthric speech (passive familiarisation), and (3) dysarthric speech coupled with written information (explicit familiarisation). All listeners completed an identical phrase transcription task immediately following familiarisation, and listeners familiarised with dysarthric speech also completed a follow-up phrase transcription task 7 days later. Listener transcripts were analysed for a measure of intelligibility (percent words correct), as well as error patterns at a segmental (percent syllable resemblance) and suprasegmental (lexical boundary errors) level of perceptual processing. The study found that intelligibility scores for listeners familiarised with dysarthric speech were significantly greater than those of the control group, with the greatest and most robust gains afforded by the explicit familiarisation condition. Relative perceptual gains in detecting phonetic and prosodic aspects of the signal varied depending on the familiarisation conditions, suggesting that passive familiarisation may recruit a different learning mechanism to that of a more explicit familiarisation experience involving supplementary written information. It appears that decisions regarding resource allocation during subsequent processing of dysarthric speech may be informed by the information afforded by the conditions of familiarisation.
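As a rough illustration of the primary intelligibility measure (percent words correct), the sketch below scores a transcript against its target phrase by exact word matching; the study's actual scoring conventions (e.g., treatment of homophones or morphological variants) are not stated in the abstract, so exact matching is an assumption.

```python
# Hedged sketch: percent-words-correct scoring by exact (multiset) word match.
from collections import Counter

def percent_words_correct(target: str, transcript: str) -> float:
    t = Counter(target.lower().split())
    r = Counter(transcript.lower().split())
    hits = sum(min(t[w], r[w]) for w in t)   # each target word credited at most once
    return 100.0 * hits / max(1, sum(t.values()))

# Example: 3 of 5 target words recovered -> 60.0
print(percent_words_correct("the boat sailed at dawn", "a boat sailed at dusk"))
```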
Affiliation(s)
- Stephanie A Borrie
- Department of Communication Disorders, University of Canterbury, Christchurch, New Zealand; New Zealand Institute of Language, Brain and Behaviour, University of Canterbury, Christchurch, New Zealand
253
Clos M, Langner R, Meyer M, Oechslin MS, Zilles K, Eickhoff SB. Effects of prior information on decoding degraded speech: an fMRI study. Hum Brain Mapp 2012;35:61-74. PMID: 22936472; PMCID: PMC6868994; DOI: 10.1002/hbm.22151.
Abstract
Expectations and prior knowledge are thought to support the perceptual analysis of incoming sensory stimuli, as proposed by the predictive-coding framework. The current fMRI study investigated the effect of prior information on brain activity during the decoding of degraded speech stimuli. When prior information enabled the comprehension of the degraded sentences, the left middle temporal gyrus and the left angular gyrus were activated, highlighting a role of these areas in meaning extraction. In contrast, the activation of the left inferior frontal gyrus (area 44/45) appeared to reflect the search for meaningful information in degraded speech material that could not be decoded because of mismatches with the prior information. Our results show that degraded sentences instantaneously evoke different percepts and activation patterns depending on the type of prior information, in line with prediction-based accounts of perception.
Affiliation(s)
- Mareike Clos
- Institute of Neuroscience and Medicine (INM-1, INM-2), Research Center Jülich, Germany
254
Shafiro V, Sheft S, Gygi B, Ho KTN. The influence of environmental sound training on the perception of spectrally degraded speech and environmental sounds. Trends Amplif 2012;16:83-101. PMID: 22891070; DOI: 10.1177/1084713812454225.
Abstract
Perceptual training with spectrally degraded environmental sounds results in improved environmental sound identification, with benefits shown to extend to untrained speech perception as well. The present study extended those findings to examine longer-term training effects as well as effects of mere repeated exposure to sounds over time. Participants received two pretests (1 week apart) prior to a week-long environmental sound training regimen, which was followed by two posttest sessions, separated by another week without training. Spectrally degraded stimuli, processed with a four-channel vocoder, consisted of a 160-item environmental sound test, word and sentence tests, and a battery of basic auditory abilities and cognitive tests. Results indicated significant improvements in all speech and environmental sound scores between the initial pretest and the last posttest with performance increments following both exposure and training. For environmental sounds (the stimulus class that was trained), the magnitude of positive change that accompanied training was much greater than that due to exposure alone, with improvement for untrained sounds roughly comparable to the speech benefit from exposure. Additional tests of auditory and cognitive abilities showed that speech and environmental sound performance were differentially correlated with tests of spectral and temporal-fine-structure processing, whereas working memory and executive function were correlated with speech, but not environmental sound perception. These findings indicate generalizability of environmental sound training and provide a basis for implementing environmental sound training programs for cochlear implant (CI) patients.
Affiliation(s)
- Valeriy Shafiro
- Department of Communication Disorders and Sciences, Rush University Medical Center, 600 S. Paulina Str., 1015 AAC, Chicago, IL 60612, USA.
255
Dubno JR, Ahlstrom JB, Wang X, Horwitz AR. Level-dependent changes in perception of speech envelope cues. J Assoc Res Otolaryngol 2012;13:835-852. PMID: 22872414; DOI: 10.1007/s10162-012-0343-2.
Abstract
Level-dependent changes in temporal envelope fluctuations in speech and related changes in speech recognition may reveal effects of basilar-membrane nonlinearities. As a result of compression in the basilar-membrane response, the "effective" magnitude of envelope fluctuations may be reduced as speech level increases from lower level (more linear) to mid-level (more compressive) regions. With further increases to a more linear region, speech envelope fluctuations may become more pronounced. To assess these effects, recognition of consonants and key words in sentences was measured as a function of speech level for younger adults with normal hearing. Consonant-vowel syllables and sentences were spectrally degraded using "noise vocoder" processing to maximize perceptual effects of changes to the speech envelope. Broadband noise at a fixed signal-to-noise ratio maintained constant audibility as speech level increased. Results revealed significant increases in scores and envelope-dependent feature transmission from 45 to 60 dB SPL and decreasing scores and feature transmission from 60 to 85 dB SPL. This quadratic pattern, with speech recognition maximized at mid levels and poorer at lower and higher levels, is consistent with a role of cochlear nonlinearities in perception of speech envelope cues.
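The "noise vocoder" processing used here can be sketched as follows: speech is filtered into bands, each band's envelope is extracted and smoothed, and the envelope modulates band-limited noise before the bands are summed. All parameters below (band count, edges, filter orders) are illustrative assumptions, not the authors' exact settings.

```python
# Minimal noise-vocoder sketch (illustrative parameters only).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_bands=6, f_lo=100.0, f_hi=5000.0, env_cut=30.0):
    x = np.asarray(x, dtype=float)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)        # log-spaced band edges
    lp = butter(4, env_cut, btype="lowpass", fs=fs, output="sos")
    rng = np.random.default_rng(0)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        bp = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(bp, x)
        env = sosfiltfilt(lp, np.abs(hilbert(band)))     # smoothed Hilbert envelope
        carrier = sosfiltfilt(bp, rng.standard_normal(len(x)))  # matched noise band
        out += env * carrier
    return out / (np.max(np.abs(out)) + 1e-12)           # normalize peak level
```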
Affiliation(s)
- Judy R Dubno
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, SC 29425-5500, USA.
256
Travis KE, Leonard MK, Chan AM, Torres C, Sizemore ML, Qu Z, Eskandar E, Dale AM, Elman JL, Cash SS, Halgren E. Independence of early speech processing from word meaning. Cereb Cortex 2012;23:2370-2379. PMID: 22875868; DOI: 10.1093/cercor/bhs228.
Abstract
We combined magnetoencephalography (MEG) with magnetic resonance imaging and electrocorticography to separate, in anatomy and latency, two fundamental stages underlying speech comprehension. The first acoustic-phonetic stage is selective for words relative to control stimuli individually matched on acoustic properties. It begins ∼60 ms after stimulus onset and is localized to middle superior temporal cortex. It was replicated in another experiment, but is strongly dissociated from the response to tones in the same subjects. Within the same task, semantic priming of the same words by a related picture modulates cortical processing in a broader network, but this does not begin until ∼217 ms. The earlier onset of acoustic-phonetic processing compared with lexico-semantic modulation was significant in each individual subject. The MEG source estimates were confirmed with intracranial local field potential and high gamma power responses acquired in two additional subjects performing the same task. These recordings further identified sites within superior temporal cortex that responded only to the acoustic-phonetic contrast at short latencies, or only to the lexico-semantic contrast at long latencies. The independence of the early acoustic-phonetic response from semantic context suggests a limited role for lexical feedback in early speech perception.
257
Goslin J, Duffy H, Floccia C. An ERP investigation of regional and foreign accent processing. Brain Lang 2012;122:92-102. PMID: 22694999; DOI: 10.1016/j.bandl.2012.04.017.
Abstract
This study used event-related potentials (ERPs) to examine whether we employ the same normalisation mechanisms when processing words spoken with a regional accent or foreign accent. Our results showed that the Phonological Mapping Negativity (PMN) following the onset of the final word of sentences spoken with an unfamiliar regional accent was greater than for those produced in the listener's own accent, whilst PMN for foreign accented speech was reduced. Foreign accents also resulted in a reduction in N400 amplitude when compared to both unfamiliar regional accents and the listener's own accent, with no significant difference found between the N400 of the regional and home accents. These results suggest that regional accent related variations are normalised at the earliest stages of spoken word recognition, requiring less top-down lexical intervention than foreign accents.
258
Borrie SA, McAuliffe MJ, Liss JM, O'Beirne GA, Anderson TJ. A follow-up investigation into the mechanisms that underlie improved recognition of dysarthric speech. J Acoust Soc Am 2012;132:EL102-EL108. PMID: 22894306; PMCID: PMC7888335; DOI: 10.1121/1.4736952.
Abstract
Differences in perceptual strategies for lexical segmentation of moderate hypokinetic dysarthric speech, apparently related to the conditions of the familiarization procedure, have been previously reported [Borrie et al., Language and Cognitive Processes (2012)]. The current follow-up investigation examined whether this difference was also observed when familiarization stimuli highlighted syllabic strength contrast cues. Forty listeners completed an identical transcription task following familiarization with dysarthric phrases presented under either passive or explicit learning conditions. Lexical boundary error patterns revealed that syllabic strength cues were exploited in both familiarization conditions. Comparisons with data previously reported afford further insight into perceptual learning of dysarthric speech.
Affiliation(s)
- Stephanie A Borrie
- Department of Communication Disorders and New Zealand Institute of Language, Brain and Behaviour, University of Canterbury, Private Bag 4800, Christchurch 8140, New Zealand.
259
Auditory skills and brain morphology predict individual differences in adaptation to degraded speech. Neuropsychologia 2012;50:2154-2164. DOI: 10.1016/j.neuropsychologia.2012.05.013.
260
Abstract
We investigated comprehension of and adaptation to speech in an unfamiliar accent in older adults. Participants performed a speeded sentence verification task for accented sentences: one group upon auditory-only presentation, and the other group upon audiovisual presentation. Our questions were whether audiovisual presentation would facilitate adaptation to the novel accent, and which cognitive and linguistic measures would predict adaptation. Participants were therefore tested on a range of background tests: hearing acuity, auditory verbal short-term memory, working memory, attention-switching control, selective attention, and vocabulary knowledge. Both auditory-only and audiovisual groups showed improved accuracy and decreasing response times over the course of the experiment, effectively showing accent adaptation. Even though the total amount of improvement was similar for the auditory-only and audiovisual groups, initial rate of adaptation was faster in the audiovisual group. Hearing sensitivity and short-term and working memory measures were associated with efficient processing of the novel accent. Analysis of the relationship between accent comprehension and the background tests revealed furthermore that selective attention and vocabulary size predicted the amount of adaptation over the course of the experiment. These results suggest that vocabulary knowledge and attentional abilities facilitate the attention-shifting strategies proposed to be required for perceptual learning.
Affiliation(s)
- Esther Janse
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands.
261
Wild CJ, Davis MH, Johnsrude IS. Human auditory cortex is sensitive to the perceived clarity of speech. Neuroimage 2012;60:1490-1502. PMID: 22248574; DOI: 10.1016/j.neuroimage.2012.01.035.
Affiliation(s)
- Conor J Wild
- Centre for Neuroscience Studies, Queen's University, Kingston ON, Canada.
262
Summers RJ, Bailey PJ, Roberts B. Effects of the rate of formant-frequency variation on the grouping of formants in speech perception. J Assoc Res Otolaryngol 2012;13:269-280. PMID: 22160754; PMCID: PMC3298615; DOI: 10.1007/s10162-011-0307-y.
Abstract
How speech is separated perceptually from other speech remains poorly understood. Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the modulation of its frequency, but not its amplitude, contour. This study further examined the effect of formant-frequency variation on intelligibility by manipulating the rate of formant-frequency change. Target sentences were synthetic three-formant (F1 + F2 + F3) analogues of natural utterances. Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3C; F2 + F3), where F2C + F3C constitute a competitor for F2 and F3 that listeners must reject to optimize recognition. Competitors were derived using formant-frequency contours extracted from extended passages spoken by the same talker and processed to alter the rate of formant-frequency variation, such that rate scale factors relative to the target sentences were 0, 0.25, 0.5, 1, 2, and 4 (0 = constant frequencies). Competitor amplitude contours were either constant, or time-reversed and rate-adjusted in parallel with the frequency contour. Adding a competitor typically reduced intelligibility; this reduction increased with competitor rate until the rate was at least twice that of the target sentences. Similarity in the results for the two amplitude conditions confirmed that formant amplitude contours do not influence across-formant grouping. The findings indicate that competitor efficacy is not tuned to the rate of the target sentences; most probably, it depends primarily on the overall rate of frequency variation in the competitor formants. This suggests that, when segregating the speech of concurrent talkers, differences in speech rate may not be a significant cue for across-frequency grouping of formants.
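One way to picture the rate manipulation: a competitor's formant-frequency contour is read out of a longer extracted contour at a scaled rate, with rate 0 collapsing to a constant (mean) frequency. The sketch below makes that concrete; the interpolation scheme is an assumption, not the authors' synthesis procedure.

```python
# Hedged sketch: rate-scaling a formant-frequency contour.
import numpy as np

def rate_scale_contour(long_contour, rate, out_len):
    long_contour = np.asarray(long_contour, dtype=float)
    if rate == 0:
        # Rate-0 condition: constant frequency at the contour mean
        return np.full(out_len, long_contour.mean())
    # Read the source contour `rate` times faster (or slower) than real time
    idx = np.clip(np.arange(out_len) * rate, 0, len(long_contour) - 1)
    return np.interp(idx, np.arange(len(long_contour)), long_contour)
```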
Affiliation(s)
- Robert J. Summers
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, UK
- Peter J. Bailey
- Department of Psychology, University of York, Heslington, York YO10 5DD, UK
- Brian Roberts
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, UK
263
Huyck JJ, Johnsrude IS. Rapid perceptual learning of noise-vocoded speech requires attention. J Acoust Soc Am 2012;131:EL236-EL242. PMID: 22423814; DOI: 10.1121/1.3685511.
Abstract
Humans are able to adapt to unfamiliar forms of speech (such as accented, time-compressed, or noise-vocoded speech) quite rapidly. Can such perceptual learning occur when attention is directed away from the speech signal? Here, participants were simultaneously exposed to noise-vocoded sentences, auditory distractors, and visual distractors. One group attended to the speech, listening to each sentence and reporting what they heard. Two other groups attended to either the auditory or visual distractors, performing a target-detection task. Only the attend-speech group benefited from the exposure when subsequently reporting noise-vocoded sentences. Thus, attention to noise-vocoded speech appears necessary for learning.
Affiliation(s)
- Julia Jones Huyck
- Department of Psychology and Centre for Neuroscience Studies, Queen's University, 62 Arch Street, Kingston, Ontario K7L 3N6, Canada.
264
Abstracts of the British Society of Audiology annual conference (incorporating the Experimental and Clinical Short Papers meetings). Int J Audiol 2012. DOI: 10.3109/14992027.2012.653103.
265
Borrie SA, McAuliffe MJ, Liss JM. Perceptual learning of dysarthric speech: a review of experimental studies. J Speech Lang Hear Res 2012;55:290-305. PMID: 22199185; PMCID: PMC3738172; DOI: 10.1044/1092-4388(2011/10-0349).
Abstract
PURPOSE: This review article provides a theoretical overview of the characteristics of perceptual learning, reviews perceptual learning studies that pertain to dysarthric populations, and identifies directions for future research that consider the application of perceptual learning to the management of dysarthria. METHOD: A critical review of the literature was conducted that summarized and synthesized previously published research in the area of perceptual learning with atypical speech. Literature related to perceptual learning of neurologically degraded speech was emphasized with the aim of identifying key directions for future research with this population. CONCLUSIONS: Familiarization with unfamiliar or ambiguous speech signals can facilitate perceptual learning of that same speech signal. There is a small but growing body of evidence that perceptual learning also occurs for listeners familiarized with dysarthric speech. Perceptual learning of the dysarthric signal is both theoretically and clinically significant. In order to establish the efficacy of exploiting perceptual learning paradigms for rehabilitative gain in dysarthria management, research is required to build on existing empirical evidence and develop a theoretical framework for learning to better recognize neurologically degraded speech.
266
Brain 'talks over' boring quotes: top-down activation of voice-selective areas while listening to monotonous direct speech quotations. Neuroimage 2012;60:1832-1842. PMID: 22306805; DOI: 10.1016/j.neuroimage.2012.01.111.
Abstract
In human communication, direct speech (e.g., Mary said, "I'm hungry") is perceived as more vivid than indirect speech (e.g., Mary said that she was hungry). This vividness distinction has previously been found to underlie silent reading of quotations: Using functional magnetic resonance imaging (fMRI), we found that direct speech elicited higher brain activity in the temporal voice areas (TVA) of the auditory cortex than indirect speech, consistent with an "inner voice" experience in reading direct speech. Here we show that listening to monotonously spoken direct versus indirect speech quotations also engenders differential TVA activity. This suggests that individuals engage in top-down simulations or imagery of enriched supra-segmental acoustic representations while listening to monotonous direct speech. The findings shed new light on the acoustic nature of the "inner voice" in understanding direct speech.
267
Banai K, Amitay S. Stimulus uncertainty in auditory perceptual learning. Vision Res 2012;61:83-88. PMID: 22289646; DOI: 10.1016/j.visres.2012.01.009.
Abstract
Stimulus uncertainty, produced by variations in a target stimulus to be detected or discriminated, impedes perceptual learning under some, but not all, experimental conditions. To account for those discrepancies, it has been proposed that uncertainty is detrimental to learning when the interleaved stimuli or tasks are similar to each other but not when they are sufficiently distinct, or when it obstructs the downstream search required to gain access to fine-grained sensory information, as suggested by the Reverse Hierarchy Theory (RHT). The focus of the current review is on the effects of uncertainty on the perceptual learning of speech and non-speech auditory signals. Taken together, the findings from the auditory modality suggest that, in addition to the accounts already described, uncertainty may contribute to learning when categorization of stimuli into phonological or acoustic categories is involved. Therefore, it appears that the differences reported between the learning of non-speech and speech-related parameters are not an outcome of inherent differences between those two domains, but rather due to the nature of the tasks often associated with those different stimuli.
268
Abstract
OBJECTIVE: While auditory training in quiet has been shown to improve cochlear implant (CI) users' speech understanding in quiet, it is unclear whether training in noise will benefit speech understanding in noise. The present study investigated whether auditory training could improve CI users' speech recognition in noise and whether training with familiar stimuli in an easy listening task (closed-set digit recognition) would improve recognition of unfamiliar stimuli in a more difficult task (open-set sentence recognition). DESIGN: CI users' speech understanding in noise was assessed before, during, and after auditory training with a closed-set recognition task (digit identification) in speech babble. Before training was begun, recognition of digits, Hearing in Noise Test (HINT) sentences, and IEEE sentences presented in steady speech-shaped noise or multitalker speech babble was repeatedly measured to establish a stable estimate of baseline performance. After completing baseline measures, participants trained at home on their personal computers using custom software for approximately 30 mins/day, 5 days/wk, for 4 wks, for a total of 10 hrs of training. Participants were trained only to identify random sequences of three digits presented in speech babble, using a closed-set task. During training, the signal-to-noise ratio was adjusted according to subject performance; auditory and visual feedback was provided. Recognition of digits, HINT sentences, and IEEE sentences in steady noise and speech babble was remeasured after the second and fourth week of training. Training was stopped after the fourth week, and subjects returned to the laboratory 1 mo later for follow-up testing to see whether any training benefits had been retained. RESULTS: Mean results showed that the digit training in babble significantly improved digit recognition in babble (which was trained) and in steady noise (which was not trained). The training benefit generalized to improved HINT and IEEE sentence recognition in both types of noise. Training benefits were largely retained in follow-up measures made 1 mo after training was stopped. CONCLUSIONS: The results demonstrated that auditory training in noise significantly improved CI users' speech performance in noise, and that training with simple stimuli using an easy closed-set listening task improved performance with difficult stimuli and a difficult open-set listening task.
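The abstract says only that "the signal-to-noise ratio was adjusted according to subject performance"; a simple 1-up/1-down staircase of the kind sketched below is one common way to implement such a rule (the step size and rule here are assumptions, not the study's actual procedure).

```python
# Hedged sketch: adaptive SNR tracking via an assumed 1-up/1-down staircase.
def next_snr(snr_db, correct, step_db=2.0, floor_db=-10.0, ceil_db=20.0):
    snr_db += -step_db if correct else step_db   # harder after a hit, easier after a miss
    return max(floor_db, min(ceil_db, snr_db))

# Example: SNR converges toward the level yielding ~50% correct
snr = 10.0
for response in [True, True, False, True, False]:
    snr = next_snr(snr, response)
print(snr)  # 8.0 dB after this response sequence
```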
269
270
Davis MH, Ford MA, Kherif F, Johnsrude IS. Does semantic context benefit speech understanding through "top-down" processes? Evidence from time-resolved sparse fMRI. J Cogn Neurosci 2011;23:3914-3932. PMID: 21745006; DOI: 10.1162/jocn_a_00084.
Abstract
When speech is degraded, word report is higher for semantically coherent sentences (e.g., her new skirt was made of denim) than for anomalous sentences (e.g., her good slope was done in carrot). Such increased intelligibility is often described as resulting from “top–down” processes, reflecting an assumption that higher-level (semantic) neural processes support lower-level (perceptual) mechanisms. We used time-resolved sparse fMRI to test for top–down neural mechanisms, measuring activity while participants heard coherent and anomalous sentences presented in speech envelope/spectrum noise at varying signal-to-noise ratios (SNR). The timing of BOLD responses to more intelligible speech provides evidence of hierarchical organization, with earlier responses in peri-auditory regions of the posterior superior temporal gyrus than in more distant temporal and frontal regions. Despite Sentence content × SNR interactions in the superior temporal gyrus, prefrontal regions respond after auditory/perceptual regions. Although we cannot rule out top–down effects, this pattern is more compatible with a purely feedforward or bottom–up account, in which the results of lower-level perceptual processing are passed to inferior frontal regions. Behavioral and neural evidence that sentence content influences perception of degraded speech does not necessarily imply “top–down” neural processes.
Affiliation(s)
- Matthew H. Davis
- Medical Research Council Cognition and Brain Sciences Unit, Cambridge, UK
- Michael A. Ford
- Medical Research Council Cognition and Brain Sciences Unit, Cambridge, UK
- University of East Anglia
271
Pilling M, Thomas S. Audiovisual cues and perceptual learning of spectrally distorted speech. Lang Speech 2011;54:487-497. PMID: 22338788; DOI: 10.1177/0023830911404958.
Abstract
Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight-channel noise vocoder that shifted the spectral envelope of the speech signal to simulate the properties of a cochlear implant with a 6 mm place mismatch. Experiment 1 found that participants showed significantly greater improvement in perceiving noise-vocoded speech when training gave AV cues than when it gave auditory cues alone. Experiment 2 compared training with AV cues with training that gave written feedback. These two methods did not significantly differ in the pattern of learning they produced. Suggestions are made about the types of circumstances in which the two training methods might be found to differ in facilitating auditory perceptual learning of speech.
272
Idemaru K, Holt LL. Word recognition reflects dimension-based statistical learning. J Exp Psychol Hum Percept Perform 2011;37:1939-1956. PMID: 22004192; DOI: 10.1037/a0025641.
Abstract
Speech processing requires sensitivity to long-term regularities of the native language yet demands that listeners flexibly adapt to perturbations that arise from talker idiosyncrasies such as nonnative accent. The present experiments investigate whether listeners exhibit dimension-based statistical learning of correlations between acoustic dimensions defining perceptual space for a given speech segment. While engaged in a word recognition task in which perceptually unambiguous voice-onset time (VOT) acoustics signaled beer, pier, deer, or tear, listeners were exposed incidentally to an artificial "accent" deviating from English norms in its correlation of the pitch onset of the following vowel (F0) with VOT. Results across four experiments are indicative of rapid, dimension-based statistical learning; reliance on the F0 dimension in word recognition was rapidly down-weighted in response to the perturbation of the correlation between F0 and VOT dimensions. However, listeners did not simply mirror the short-term input statistics. Instead, response patterns were consistent with a lingering influence of sensitivity to the long-term regularities of English. This suggests that the very acoustic dimensions defining perceptual space are not fixed and, rather, are dynamically and rapidly adjusted to the idiosyncrasies of local experience, such as might arise from nonnative accent, dialect, or dysarthria. The current findings extend demonstrations of "object-based" statistical learning across speech segments to include incidental, online statistical learning of regularities residing within a speech segment.
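A logistic-regression read-out is one way to picture the "perceptual weights" on VOT and F0: fitting it to responses before versus after exposure to the artificial accent should show the F0 coefficient shrinking. This is an illustrative model under assumed inputs, not the authors' analysis.

```python
# Hedged sketch: estimating cue weights on VOT and F0 from binary responses.
import numpy as np

def fit_cue_weights(vot, f0, voiceless, lr=0.1, steps=2000):
    # vot, f0: standardized cue values; voiceless: 0/1 category responses
    X = np.column_stack([np.ones_like(vot), vot, f0])   # bias, VOT, F0
    w = np.zeros(3)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))                # P(voiceless response)
        w += lr * X.T @ (voiceless - p) / len(vot)      # gradient ascent step
    return w  # w[1] and w[2] index reliance on VOT and F0, respectively
```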
Affiliation(s)
- Kaori Idemaru
- Department of East Asian Languages and Literatures, University of Oregon, Eugene, OR 97403, USA.
273
Hazan V, Baker R. Acoustic-phonetic characteristics of speech produced with communicative intent to counter adverse listening conditions. J Acoust Soc Am 2011;130:2139-2152. PMID: 21973368; DOI: 10.1121/1.3623753.
Abstract
This study investigated whether speech produced in spontaneous interactions when addressing a talker experiencing actual challenging conditions differs in acoustic-phonetic characteristics from speech produced (a) with communicative intent under more ideal conditions and (b) without communicative intent under imaginary challenging conditions (read, clear speech). It also investigated whether acoustic-phonetic modifications made to counteract the effects of a challenging listening condition are tailored to the condition under which communication occurs. Forty talkers were recorded in pairs while engaged in "spot the difference" picture tasks in good and challenging conditions. In the challenging conditions, one talker heard the other (1) via a three-channel noise vocoder (VOC); (2) with simultaneous babble noise (BABBLE). Read, clear speech showed more extreme changes in median F0, F0 range, and speaking rate than speech produced to counter the effects of a challenging listening condition. In the VOC condition, where F0 and intensity enhancements are unlikely to aid intelligibility, talkers did not change their F0 median and range; mean energy and vowel F1 increased less than in the BABBLE condition. This suggests that speech production is listener-focused, and that talkers modulate their speech according to their interlocutors' needs, even when not directly experiencing the challenging listening condition.
Affiliation(s)
- Valerie Hazan
- Speech, Hearing, and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London WC1E 1PF, United Kingdom.
274
Bent T, Loebach JL, Phillips L, Pisoni DB. Perceptual adaptation to sinewave-vocoded speech across languages. J Exp Psychol Hum Percept Perform 2011;37:1607-1616. PMID: 21688936; PMCID: PMC3179795; DOI: 10.1037/a0024281.
Abstract
Listeners rapidly adapt to many forms of degraded speech. What level of information drives this adaptation, however, remains unresolved. The current study exposed listeners to sinewave-vocoded speech in one of three languages, which manipulated the type of information shared between the training languages (German, Mandarin, or English) and the testing language (English) in an audio-visual (AV) or an audio plus still frames modality (A + Stills). Three control groups were included to assess procedural learning effects. After training, listeners' perception of novel sinewave-vocoded English sentences was tested. Listeners exposed to German-AV materials performed equivalently to listeners exposed to English AV or A + Stills materials and significantly better than two control groups. The Mandarin groups and German-A + Stills group showed an intermediate level of performance. These results suggest that full lexical access is not absolutely necessary for adaptation to degraded speech, but providing AV-training in a language that is similar phonetically to the testing language can facilitate adaptation.
Affiliation(s)
- Tessa Bent
- Department of Speech and Hearing Sciences, Indiana University, 200 S. Jordan Ave., Bloomington, IN 47405, USA.
275
Lansford KL, Liss JM, Caviness JN, Utianski RL. A cognitive-perceptual approach to conceptualizing speech intelligibility deficits and remediation practice in hypokinetic dysarthria. Parkinsons Dis 2011;2011:150962. PMID: 21918728; PMCID: PMC3171761; DOI: 10.4061/2011/150962.
Abstract
Hypokinetic dysarthria is a common manifestation of Parkinson's disease, which negatively influences quality of life. Behavioral techniques that aim to improve speech intelligibility constitute the bulk of intervention strategies for this population, as the dysarthria does not often respond vigorously to medical interventions. Although several case and group studies generally support the efficacy of behavioral treatment, much work remains to establish a rigorous evidence base. This absence of definitive research leaves both the speech-language pathologist and referring physician with the task of determining the feasibility and nature of therapy for intelligibility remediation in PD. The purpose of this paper is to introduce a novel framework for medical practitioners in which to conceptualize and justify potential targets for speech remediation. The most commonly targeted deficits (e.g., speaking rate and vocal loudness) can be supported by this approach, as well as underutilized and novel treatment targets that aim at the listener's perceptual skills.
Affiliation(s)
- Kaitlin L Lansford
- Motor Speech Disorders Laboratory, Department of Speech and Hearing Science, Arizona State University, P.O. Box 870102, Tempe, AZ 85287-0102, USA
276
Understanding of spoken language under challenging listening conditions in younger and older listeners: A combined behavioral and electrophysiological study. Brain Res 2011;1415:8-22. DOI: 10.1016/j.brainres.2011.08.001.
277
Dynamic changes in superior temporal sulcus connectivity during perception of noisy audiovisual speech. J Neurosci 2011;31:1704-1714. PMID: 21289179; DOI: 10.1523/JNEUROSCI.4853-10.2011.
Abstract
Humans are remarkably adept at understanding speech, even when it is contaminated by noise. Multisensory integration may explain some of this ability: combining independent information from the auditory modality (vocalizations) and the visual modality (mouth movements) reduces noise and increases accuracy. Converging evidence suggests that the superior temporal sulcus (STS) is a critical brain area for multisensory integration, but little is known about its role in the perception of noisy speech. Behavioral studies have shown that perceptual judgments are weighted by the reliability of the sensory modality: more reliable modalities are weighted more strongly, even if the reliability changes rapidly. We hypothesized that changes in the functional connectivity of STS with auditory and visual cortex could provide a neural mechanism for perceptual reliability weighting. To test this idea, we performed five blood oxygenation level-dependent functional magnetic resonance imaging and behavioral experiments in 34 healthy subjects. We found increased functional connectivity between the STS and auditory cortex when the auditory modality was more reliable (less noisy) and increased functional connectivity between the STS and visual cortex when the visual modality was more reliable, even when the reliability changed rapidly during presentation of successive words. This finding matched the results of a behavioral experiment in which the perception of incongruent audiovisual syllables was biased toward the more reliable modality, even with rapidly changing reliability. Changes in STS functional connectivity may be an important neural mechanism underlying the perception of noisy speech.
278
Obleser J, Kotz SA. Multiple brain signatures of integration in the comprehension of degraded speech. Neuroimage 2011;55:713-723. PMID: 21172443; DOI: 10.1016/j.neuroimage.2010.12.020.
Affiliation(s)
- Jonas Obleser
- Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.
279
Aydelott J, Leech R, Crinion J. Normal adult aging and the contextual influences affecting speech and meaningful sound perception. Trends Amplif 2011;14:218-232. PMID: 21307006; DOI: 10.1177/1084713810393751.
Abstract
It is widely accepted that hearing loss increases markedly with age, beginning in the fourth decade (ISO 7029, 2000). Age-related hearing loss is typified by high-frequency threshold elevation and associated reductions in speech perception because speech sounds, especially consonants, become inaudible. Nevertheless, older adults often report additional and progressive difficulties in the perception and comprehension of speech, often highlighted in adverse listening conditions that exceed those reported by younger adults with a similar degree of high-frequency hearing loss (Dubno, Dirks, & Morgan), leading to communication difficulties and social isolation (Weinstein & Ventry). Some of the age-related decline in speech perception can be accounted for by peripheral sensory problems, but cognitive aging can also be a contributing factor. In this article, we review findings from the psycholinguistic literature, predominantly over the last four years, and present a pilot study illustrating how normal age-related changes in cognition and the linguistic context can influence speech-processing difficulties in older adults. For significant progress to be made in understanding and improving the auditory performance of aging listeners, we discuss how future research will have to be much more specific not only about which interactions between auditory and cognitive abilities are critical but also about how they are modulated in the brain.
280
Affiliation(s)
- Arthur G. Samuel
- Department of Psychology, Stony Brook University, Stony Brook, New York 11794-2500
- Basque Center on Cognition, Brain and Language, Donostia-San Sebastian 20009 Spain
- IKERBASQUE, Basque Foundation for Science, Bilbao 48011, Spain
281
Summers RJ, Bailey PJ, Roberts B. Effects of differences in fundamental frequency on across-formant grouping in speech perception. J Acoust Soc Am 2010;128:3667-3677. PMID: 21218899; DOI: 10.1121/1.3505119.
Abstract
In an isolated syllable, a formant will tend to be segregated perceptually if its fundamental frequency (F0) differs from that of the other formants. This study explored whether similar results are found for sentences, and specifically whether differences in F0 (ΔF0) also influence across-formant grouping in circumstances where the exclusion or inclusion of the manipulated formant critically determines speech intelligibility. Three-formant (F1 + F2 + F3) analogues of almost continuously voiced natural sentences were synthesized using a monotonous glottal source (F0 = 150 Hz). Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3; F2), where F2C is a competitor for F2 that listeners must resist to optimize recognition. Competitors were created using time-reversed frequency and amplitude contours of F2, and F0 was manipulated (ΔF0 = ± 8, ± 2, or 0 semitones relative to the other formants). Adding F2C typically reduced intelligibility, and this reduction was greatest when ΔF0 = 0. There was an additional effect of absolute F0 for F2C, such that competitor efficacy was greater for higher F0s. However, competitor efficacy was not due to energetic masking of F3 by F2C. The results are consistent with the proposal that a grouping "primitive" based on common F0 influences the fusion and segregation of concurrent formants in sentence perception.
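The ΔF0 conditions translate into competitor F0s via the standard semitone relation F0 · 2^(Δ/12); the short sketch below lists the values implied by the abstract's ±2 and ±8 semitone shifts of the 150 Hz source.

```python
# Semitone-to-frequency conversion for the ΔF0 conditions (standard formula).
def shift_semitones(f0_hz, delta_st):
    return f0_hz * 2.0 ** (delta_st / 12.0)

for d in (-8, -2, 0, 2, 8):
    print(d, round(shift_semitones(150.0, d), 1))
# -8 -> 94.5 Hz, -2 -> 133.6 Hz, 0 -> 150.0 Hz, +2 -> 168.4 Hz, +8 -> 238.1 Hz
```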
Affiliation(s)
- Robert J Summers
- School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
282
Roberts B, Summers RJ, Bailey PJ. The intelligibility of noise-vocoded speech: spectral information available from across-channel comparison of amplitude envelopes. Proc Biol Sci 2010;278:1595-1600. PMID: 21068039; DOI: 10.1098/rspb.2010.1554.
Abstract
Noise-vocoded (NV) speech is often regarded as conveying phonetic information primarily through temporal-envelope cues rather than spectral cues. However, listeners may infer the formant frequencies in the vocal-tract output-a key source of phonetic detail-from across-band differences in amplitude when speech is processed through a small number of channels. The potential utility of this spectral information was assessed for NV speech created by filtering sentences into six frequency bands, and using the amplitude envelope of each band (≤30 Hz) to modulate a matched noise-band carrier (N). Bands were paired, corresponding to F1 (≈N1 + N2), F2 (≈N3 + N4) and the higher formants (F3' ≈ N5 + N6), such that the frequency contour of each formant was implied by variations in relative amplitude between bands within the corresponding pair. Three-formant analogues (F0 = 150 Hz) of the NV stimuli were synthesized using frame-by-frame reconstruction of the frequency and amplitude of each formant. These analogues were less intelligible than the NV stimuli or analogues created using contours extracted from spectrograms of the original sentences, but more intelligible than when the frequency contours were replaced with constant (mean) values. Across-band comparisons of amplitude envelopes in NV speech can provide phonetically important information about the frequency contours of the underlying formants.
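The across-band cue described here can be made concrete with a toy read-out: within a band pair spanning a formant's range, an amplitude-weighted mean of the band centre frequencies tracks the implied formant frequency. The weighting rule below is an assumption for illustration, not the authors' analysis.

```python
# Hedged sketch: formant frequency implied by relative band amplitudes.
def implied_formant(fc_low, fc_high, a_low, a_high):
    # fc_*: band centre frequencies (Hz); a_*: band envelope amplitudes
    return (fc_low * a_low + fc_high * a_high) / (a_low + a_high)

# Example: hypothetical F1 pair with centres at 300 and 600 Hz;
# as energy shifts to the upper band, the implied F1 rises.
for a_high in (0.2, 1.0, 4.0):
    print(round(implied_formant(300.0, 600.0, 1.0, a_high), 1))  # 350.0, 450.0, 540.0
```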
Affiliation(s)
- Brian Roberts
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, UK.
283
Mitterer H, Chen Y, Zhou X. Phonological abstraction in processing lexical-tone variation: Evidence from a learning paradigm. Cogn Sci 2010;35:184-197. DOI: 10.1111/j.1551-6709.2010.01140.x.
284
Hopkins K, Moore BCJ, Stone MA. The effects of the addition of low-level, low-noise noise on the intelligibility of sentences processed to remove temporal envelope information. J Acoust Soc Am 2010;128:2150-2161. PMID: 20968385; DOI: 10.1121/1.3478773.
Abstract
The intelligibility of sentences processed to remove temporal envelope information, as far as possible, was assessed. Sentences were filtered into N analysis channels, and each channel signal was divided by its Hilbert envelope to remove envelope information but leave temporal fine structure (TFS) intact. Channel signals were combined to give TFS speech. The effect of adding low-level low-noise noise (LNN) to each channel signal before processing was assessed. The addition of LNN reduced the amplification of low-level signal portions that contained large excursions in instantaneous frequency, and improved the intelligibility of simple TFS speech sentences, but not more complex sentences. It also reduced the time needed to reach a stable level of performance. The recovery of envelope cues by peripheral auditory filtering was investigated by measuring the intelligibility of 'recovered-envelope speech', formed by filtering TFS speech with an array of simulated auditory filters, and using the envelopes at the output of these filters to modulate sinusoids with frequencies equal to the filter center frequencies (i.e., tone vocoding). The intelligibility of TFS speech and recovered-envelope speech fell as N increased, although TFS speech was still highly intelligible for values of N for which the intelligibility of recovered-envelope speech was low.
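The TFS processing described above amounts to flattening each channel's envelope by dividing the band signal by its Hilbert envelope before recombining. The sketch below follows that recipe; the band layout is an assumption, and plain Gaussian noise stands in for the specially constructed low-noise noise (LNN).

```python
# Hedged sketch of TFS-speech processing (illustrative parameters).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def tfs_speech(x, fs, n_bands=8, f_lo=100.0, f_hi=5000.0, lnn_db=-30.0):
    x = np.asarray(x, dtype=float)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    rng = np.random.default_rng(0)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        bp = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(bp, x)
        # Low-level added noise limits amplification of near-silent portions
        # (true LNN is specially constructed; Gaussian noise is a stand-in).
        noise = sosfiltfilt(bp, rng.standard_normal(len(x)))
        band = band + 10.0 ** (lnn_db / 20.0) * (band.std() + 1e-12) * noise
        out += band / (np.abs(hilbert(band)) + 1e-12)  # unit-envelope TFS band
    return out
```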
Affiliation(s)
- Kathryn Hopkins
- Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom.
285
Interactions between unsupervised learning and the degree of spectral mismatch on short-term perceptual adaptation to spectrally shifted speech. Ear Hear 2010;30:238-249. PMID: 19194293; DOI: 10.1097/AUD.0b013e31819769ac.
Abstract
OBJECTIVES: Cochlear implant listeners are able to at least partially adapt to the spectral mismatch associated with the implant device and speech processor via daily exposure and/or explicit training. The overall goal of this study was to investigate interactions between short-term unsupervised learning (i.e., passive adaptation) and the degree of spectral mismatch in normal-hearing listeners' adaptation to spectrally shifted vowels. DESIGN: Normal-hearing subjects were tested while listening to acoustic cochlear implant simulations. Unsupervised learning was measured by testing vowel recognition repeatedly over a 5 day period; no feedback or explicit training was provided. In experiment 1, subjects listened to 8-channel, sine-wave vocoded speech. The spectral envelope was compressed to simulate a 16 mm cochlear implant electrode array. The analysis bands were fixed and the compressed spectral envelope was linearly shifted toward the base by 3.6, 6, or 8.3 mm to simulate different insertion depths of the electrode array, resulting in a slight, moderate, or severe spectral shift. In experiment 2, half the subjects were exclusively exposed to a severe shift with 8 or 16 channels (exclusive groups), and half the subjects were exposed to 8-channel severely shifted speech, 16-channel severely shifted speech, and 8-channel moderately shifted speech, alternately presented within each test session (mixed group). The region of stimulation in the cochlea was fixed (16 mm in extent and 15 mm from the apex) and the analysis bands were manipulated to create the spectral shift conditions. To determine whether increased spectral resolution would improve adaptation, subjects were exposed to 8- or 16-channel severely shifted speech. RESULTS: In experiment 1, at the end of the adaptation period, there was no significant difference between 8-channel speech that was spectrally matched and that shifted by 3.6 mm. There was a significant, but less-complete, adaptation to the 6 mm shift and no adaptation to the 8.3 mm shift. In experiment 2, for the mixed exposure group, there was significant adaptation to severely shifted speech with 8 channels and even greater adaptation with 16 channels. For the exclusive exposure group, there was no significant adaptation to severely shifted speech with either 8 or 16 channels. CONCLUSIONS: These findings suggest that listeners are able to passively adapt to spectral shifts up to 6 mm. For spectral shifts beyond 6 mm, some passive adaptation was observed with mixed exposure to a smaller spectral shift, even at the expense of some low frequency information. Mixed exposure to the smaller shift may have enhanced listeners' access to spectral envelope details that were not accessible when listening exclusively to severely shifted speech. The results suggest that the range of spectral mismatch that can support passive adaptation may be larger than previously reported. Some amount of passive adaptation may be possible with severely shifted speech by exposing listeners to a relatively small mismatch in conjunction with the severe mismatch.
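The millimetre figures here refer to place along the basilar membrane; a shift in mm maps to frequency via the Greenwood frequency-position function. The sketch below uses the standard human parameters (whether the simulations used exactly this mapping is an assumption).

```python
# Hedged sketch: Greenwood (1990) map between cochlear place and frequency.
import math

def greenwood_freq(x_mm, length_mm=35.0):
    # x_mm: distance from the apex; returns characteristic frequency (Hz)
    return 165.4 * (10.0 ** (2.1 * x_mm / length_mm) - 0.88)

def basal_shift(f_hz, shift_mm, length_mm=35.0):
    # Invert the map to a place, shift toward the base, map back to frequency
    x = length_mm / 2.1 * math.log10(f_hz / 165.4 + 0.88)
    return greenwood_freq(x + shift_mm, length_mm)

print(round(basal_shift(500.0, 6.0)))  # a 6 mm basal shift moves 500 Hz to ~1333 Hz
```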
286
Roberts B, Summers RJ, Bailey PJ. The perceptual organization of sine-wave speech under competitive conditions. J Acoust Soc Am 2010;128:804-817. PMID: 20707450; DOI: 10.1121/1.3445786.
Abstract
Speech comprises dynamic and heterogeneous acoustic elements, yet it is heard as a single perceptual stream even when accompanied by other sounds. The relative contributions of grouping "primitives" and of speech-specific grouping factors to the perceptual coherence of speech are unclear, and the acoustical correlates of the latter remain unspecified. The parametric manipulations possible with simplified speech signals, such as sine-wave analogues, make them attractive stimuli to explore these issues. Given that the factors governing perceptual organization are generally revealed only where competition operates, the second-formant competitor (F2C) paradigm was used, in which the listener must resist competition to optimize recognition [Remez, R. E., et al. (1994). Psychol. Rev. 101, 129-156]. Three-formant (F1+F2+F3) sine-wave analogues were derived from natural sentences and presented dichotically (one ear=F1+F2C+F3; opposite ear=F2). Different versions of F2C were derived from F2 using separate manipulations of its amplitude and frequency contours. F2Cs with time-varying frequency contours were highly effective competitors, regardless of their amplitude characteristics. In contrast, F2Cs with constant frequency contours were completely ineffective. Competitor efficacy was not due to energetic masking of F3 by F2C. These findings indicate that modulation of the frequency, but not the amplitude, contour is critical for across-formant grouping.
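Sine-wave speech of the kind used here replaces each formant with a single sinusoid that follows the formant's frequency and amplitude tracks. The sketch below synthesizes one such analogue; the frame rate and linear interpolation are assumptions, not the authors' synthesis settings.

```python
# Hedged sketch: one sine-wave formant analogue from frame-wise tracks.
import numpy as np

def sinewave_formant(freqs_hz, amps, frame_rate, fs):
    n = int(len(freqs_hz) * fs / frame_rate)
    t_frames = np.arange(len(freqs_hz)) / frame_rate
    t = np.arange(n) / fs
    f = np.interp(t, t_frames, freqs_hz)     # per-sample frequency track
    a = np.interp(t, t_frames, amps)         # per-sample amplitude track
    phase = 2.0 * np.pi * np.cumsum(f) / fs  # integrate frequency to phase
    return a * np.sin(phase)

# A three-formant analogue is the sum of three such tracks (F1 + F2 + F3).
```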
Affiliation(s)
- Brian Roberts
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom.
287
Abstract
Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has resulted from these challenges. We focus here on issues and experiments that define open research questions relevant to phoneme categorization, arguing that SP is best understood as perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition.
Affiliation(s)
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
288
Loebach JL, Pisoni DB, Svirsky MA. Effects of semantic context and feedback on perceptual learning of speech processed through an acoustic simulation of a cochlear implant. J Exp Psychol Hum Percept Perform 2010; 36:224-34. [PMID: 20121306] [DOI: 10.1037/a0017609]
Abstract
The effect of feedback and materials on perceptual learning was examined in listeners with normal hearing who were exposed to cochlear implant simulations. Generalization was most robust when feedback paired the spectrally degraded sentences with their written transcriptions, promoting mapping between the degraded signal and its acoustic-phonetic representation. Transfer-appropriate processing theory suggests that such feedback was most successful because the original learning conditions were reinstated at testing: performance was facilitated when both training and testing contained degraded stimuli. In addition, the effect of semantic context on generalization was assessed by training listeners on meaningful or anomalous sentences. Training with anomalous sentences was as effective as training with meaningful sentences, suggesting that listeners were encouraged to use acoustic-phonetic information to identify speech rather than to make predictions from semantic context.
Affiliation(s)
- Jeremy L Loebach
- Department of Psychological and Brain Sciences, Indiana University, USA.
289
Stacey PC, Raine CH, O'Donoghue GM, Tapper L, Twomey T, Summerfield AQ. Effectiveness of computer-based auditory training for adult users of cochlear implants. Int J Audiol 2010; 49:347-56. [DOI: 10.3109/14992020903397838]
290
Siciliano CM, Faulkner A, Rosen S, Mair K. Resistance to learning binaurally mismatched frequency-to-place maps: implications for bilateral stimulation with cochlear implants. J Acoust Soc Am 2010; 127:1645-60. [PMID: 20329863] [DOI: 10.1121/1.3293002]
Abstract
Simulations of monaural cochlear implants in normal-hearing listeners have shown that the deleterious effects of upward spectral shifting on speech perception can be overcome with training. This study simulates bilateral stimulation with a unilateral spectral shift to investigate whether listeners can adapt to upward-shifted speech information presented together with contralateral unshifted information. A six-channel, dichotic, interleaved sine-carrier vocoder simulated a binaurally mismatched frequency-to-place map. Odd channels were presented to one ear with an upward frequency shift equivalent to 6 mm on the basilar membrane, while even channels were presented to the contralateral ear unshifted. In Experiment 1, listeners were trained for 5.3 h with either the binaurally mismatched processor or with just the shifted monaural bands. In Experiment 2, the duration of training was 10 h, and the trained condition alternated between those of Experiment 1. While listeners showed learning in both experiments, intelligibility with the binaurally mismatched processor never exceeded intelligibility with just the three unshifted bands, suggesting that listeners did not benefit from combining the mismatched maps, even though there was clear scope to do so. Frequency-place map alignment may thus be of importance when optimizing bilateral devices of the type studied here.
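The dichotic, interleaved configuration reduces to a simple channel-allocation rule. The sketch below renders only that rule; the ear assignment, log-spaced band centers, and default parameters are illustrative assumptions, and the actual filterbank and synthesis are omitted:

```python
import numpy as np

def dichotic_plan(n_channels=6, lo=100.0, hi=5000.0, shift_mm=6.0):
    """Odd channels go to one ear with a basalward place shift; even
    channels go to the other ear unshifted."""
    edges = np.geomspace(lo, hi, n_channels + 1)
    centers = np.sqrt(edges[:-1] * edges[1:])  # geometric-mean band centers
    plan = []
    for i, fc in enumerate(centers, start=1):
        ear = "left" if i % 2 else "right"     # odd -> shifted ear (arbitrary)
        plan.append((i, ear, round(float(fc), 1), shift_mm if i % 2 else 0.0))
    return plan

for ch, ear, fc, shift in dichotic_plan():
    print(f"channel {ch}: {ear} ear, center {fc} Hz, shift {shift} mm")
```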
Affiliation(s)
- Catherine M Siciliano
- Speech, Hearing and Phonetic Sciences, Division of Psychology and Language Sciences, UCL, Chandler House, 2 Wakefield Street, London WC1N 1PF, United Kingdom.
291

292
Loebach JL, Pisoni DB. Transfer of auditory perceptual learning with spectrally reduced speech to speech and nonspeech tasks: implications for cochlear implants. Ear Hear 2010; 30:662-74. [PMID: 19773659] [DOI: 10.1097/aud.0b013e3181b9c92d]
Abstract
OBJECTIVE The objective of this study was to assess whether training on speech processed with an eight-channel noise vocoder to simulate the output of a cochlear implant would produce transfer of auditory perceptual learning to the recognition of nonspeech environmental sounds, the identification of speaker gender, and the discrimination of talkers by voice.

DESIGN Twenty-four normal-hearing subjects were trained to transcribe meaningful English sentences processed with a noise vocoder simulation of a cochlear implant. An additional 24 subjects served as an untrained control group and transcribed the same sentences in their unprocessed form. All subjects completed pre- and post-test sessions in which they transcribed vocoded sentences to provide an assessment of training efficacy. Transfer of perceptual learning was assessed using a series of closed-set, nonlinguistic tasks: subjects identified talker gender, discriminated the identity of pairs of talkers, and identified ecologically significant environmental sounds from a closed set of alternatives.

RESULTS Although both groups of subjects showed significant pre- to post-test improvements, subjects who transcribed vocoded sentences during training performed significantly better at post-test than those in the control group. Both groups performed equally well on gender identification and talker discrimination. Subjects who received explicit training on the vocoded sentences, however, performed significantly better on environmental sound identification than the untrained subjects. Moreover, across both groups, pre-test speech performance and, to a greater degree, post-test speech performance were significantly correlated with environmental sound identification. For both groups, environmental sounds characterized as having more salient temporal information were identified more often than those characterized as having more salient spectral information.

CONCLUSIONS Listeners trained to identify noise-vocoded sentences showed evidence of transfer of perceptual learning to the identification of environmental sounds. In addition, the correlation between environmental sound identification and sentence transcription indicates that subjects who were better able to use the degraded acoustic information to identify the environmental sounds were also better able to transcribe the linguistic content of novel sentences. Both trained and untrained groups performed equally well (approximately 75% correct) on the gender-identification task, indicating that training did not affect the ability to identify the gender of talkers. Although better than chance, performance on the talker discrimination task was poor overall (approximately 55%), suggesting either that explicit training is required to discriminate talkers' voices reliably or that additional information (perhaps spectral in nature) not present in the vocoded speech is required to excel in such tasks. Taken together, the results suggest that although transfer of auditory perceptual learning with spectrally degraded speech does occur, explicit task-specific training may be necessary for tasks that cannot rely on temporal information alone.
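A noise vocoder of this general type band-splits the speech, extracts each band's temporal envelope, and uses the envelope to modulate band-limited noise. A generic sketch of the technique (the logarithmic band spacing, fourth-order Butterworth filters, and Hilbert envelopes are illustrative choices, not the study's exact parameters):

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocode(x, fs, n_channels=8, lo=100.0, hi=5000.0):
    """Replace the fine structure in each band of x with noise while
    preserving the band's temporal envelope."""
    edges = np.geomspace(lo, hi, n_channels + 1)
    rng = np.random.default_rng(0)
    out = np.zeros(len(x))
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, x)                              # analysis band
        env = np.abs(hilbert(band))                         # temporal envelope
        noise = sosfilt(sos, rng.standard_normal(len(x)))   # band-limited noise
        out += env * noise                                  # re-modulated channel
    return out / max(1e-9, float(np.max(np.abs(out))))

# Usage: y = noise_vocode(speech_samples, fs=16000)
```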
293
Arlinger S, Lunner T, Lyxell B, Pichora-Fuller MK. The emergence of cognitive hearing science. Scand J Psychol 2010; 50:371-84. [PMID: 19778385] [DOI: 10.1111/j.1467-9450.2009.00753.x]
Abstract
Cognitive Hearing Science or Auditory Cognitive Science is an emerging field of interdisciplinary research concerning the interactions between hearing and cognition. It follows a trend over the last half century for interdisciplinary fields to develop, beginning with Neuroscience, then Cognitive Science, then Cognitive Neuroscience, and then Cognitive Vision Science. A common theme is that an interdisciplinary approach is necessary to understand complex human behaviors, to develop technologies incorporating knowledge of these behaviors, and to find solutions for individuals with impairments that undermine typical behaviors. Accordingly, researchers in traditional academic disciplines, such as Psychology, Physiology, Linguistics, Philosophy, Anthropology, and Sociology benefit from collaborations with each other, and with researchers in Computer Science and Engineering working on the design of technologies, and with health professionals working with individuals who have impairments. The factors that triggered the emergence of Cognitive Hearing Science include the maturation of the component disciplines of Hearing Science and Cognitive Science, new opportunities to use complex digital signal-processing to design technologies suited to performance in challenging everyday environments, and increasing social imperatives to help people whose communication problems span hearing and cognition. Cognitive Hearing Science is illustrated in research on three general topics: (1) language processing in challenging listening conditions; (2) use of auditory communication technologies or the visual modality to boost performance; (3) changes in performance with development, aging, and rehabilitative training. Future directions for modeling and the translation of research into practice are suggested.
Affiliation(s)
- Stig Arlinger
- Linnaeus Centre HEAD, Swedish Institute for Disability Research, Linköping University, Sweden
294
Van Engen KJ, Baese-Berk M, Baker RE, Choi A, Kim M, Bradlow AR. The Wildcat Corpus of native- and foreign-accented English: communicative efficiency across conversational dyads with varying language alignment profiles. Lang Speech 2010; 53:510-40. [PMID: 21313992] [PMCID: PMC3537227] [DOI: 10.1177/0023830910372495]
Abstract
This paper describes the development of the Wildcat Corpus of native- and foreign-accented English, a corpus containing scripted and spontaneous speech recordings from 24 native speakers of American English and 52 non-native speakers of English. The core element of this corpus is a set of spontaneous speech recordings, for which a new method of eliciting dialogue-based, laboratory-quality speech recordings was developed (the Diapix task). Dialogues between two native speakers of English, between two non-native speakers of English (with either shared or different L1s), and between one native and one non-native speaker of English are included and analyzed in terms of general measures of communicative efficiency. The overall finding was that pairs of native talkers were most efficient, followed by mixed native/non-native pairs and non-native pairs with a shared L1. Non-native pairs with different L1s were least efficient. These results support the hypothesis that successful speech communication depends both on the alignment of talkers to the target language and on the alignment of talkers to one another in terms of native language background.
Affiliation(s)
- Kristin J Van Engen
- Department of Linguistics, Northwestern University, Evanston, IL 60208-4090, USA.
295
Sebastián-Gallés N, Vera-Constán F, Larsson JP, Costa A, Deco G. Lexical plasticity in early bilinguals does not alter phoneme categories: II. Experimental evidence. J Cogn Neurosci 2009; 21:2343-57. [DOI: 10.1162/jocn.2008.21152]
Abstract
When listening to modified speech, either naturally or artificially altered, the human perceptual system rapidly adapts to it. There is some debate about the nature of the mechanisms underlying this adaptation. Although some authors propose that listeners modify their prelexical representations, others assume changes at the lexical level. Recently, Larsson, Vera, Sebastian-Galles, and Deco [Lexical plasticity in early bilinguals does not alter phoneme categories: I. Neurodynamical modelling. Journal of Cognitive Neuroscience, 20, 76–94, 2008] proposed a biologically plausible computational model to account for some existing data, one which successfully modeled how long-term exposure to a dialect triggers the creation of new lexical entries. One specific prediction of the model was that prelexical (phoneme) representations should not be affected by dialectal exposure (as long as the listener is exposed to both standard and dialectal pronunciations). Here we present a series of experiments testing the predictions of the model. Native listeners of Catalan, with extended exposure to Spanish-accented Catalan, were tested on different auditory lexical decision tasks and phoneme discrimination tasks. Behavioral and electrophysiological recordings were obtained. The results supported the predictions of our model. On the one hand, both error rates and N400 measurements indicated the existence of alternative lexical entries for dialectal varieties. On the other hand, no evidence of alterations at the phoneme level, either in the behavioral discrimination task or in the electrophysiological measurement (MMN), could be detected. The results of the present study are compared with those obtained in short-term laboratory exposures in an attempt to provide an integrative account.
Affiliation(s)
- Núria Sebastián-Gallés
- Universitat de Barcelona, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Albert Costa
- Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Gustavo Deco
- Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
296
Mitterer H, McQueen JM. Foreign subtitles help but native-language subtitles harm foreign speech perception. PLoS One 2009; 4:e7785. [PMID: 19918371] [PMCID: PMC2775720] [DOI: 10.1371/journal.pone.0007785]
Abstract
Understanding foreign speech is difficult, in part because of unusual mappings between sounds and words. It is known that listeners in their native language can use lexical knowledge (about how words ought to sound) to learn how to interpret unusual speech-sounds. We therefore investigated whether subtitles, which provide lexical information, support perceptual learning about foreign speech. Dutch participants, unfamiliar with Scottish and Australian regional accents of English, watched Scottish or Australian English videos with Dutch, English or no subtitles, and then repeated audio fragments of both accents. Repetition of novel fragments was worse after Dutch-subtitle exposure but better after English-subtitle exposure. Native-language subtitles appear to create lexical interference, but foreign-language subtitles assist speech learning by indicating which words (and hence sounds) are being spoken.
297
Bent T, Buchwald A, Pisoni DB. Perceptual adaptation and intelligibility of multiple talkers for two types of degraded speech. J Acoust Soc Am 2009; 126:2660-9. [PMID: 19894843] [PMCID: PMC2787077] [DOI: 10.1121/1.3212930]
Abstract
Talker intelligibility and perceptual adaptation under cochlear implant (CI)-simulation and speech in multi-talker babble were compared. The stimuli consisted of 100 sentences produced by 20 native English talkers. The sentences were processed to simulate listening with an eight-channel CI or were mixed with multi-talker babble. Stimuli were presented to 400 listeners in a sentence transcription task (200 listeners in each condition). Perceptual adaptation was measured for each talker by comparing intelligibility in the first 20 sentences of the experiment to intelligibility in the last 20 sentences. Perceptual adaptation patterns were also compared across the two degradation conditions by comparing performance in blocks of ten sentences. The most intelligible talkers under CI-simulation also tended to be the most intelligible talkers in multi-talker babble. Furthermore, listeners demonstrated a greater degree of perceptual adaptation in the CI-simulation condition compared to the multi-talker babble condition although the extent of adaptation varied widely across talkers. Listeners reached asymptote later in the experiment in the CI-simulation condition compared with the multi-talker babble condition. Overall, these two forms of degradation did not differ in their effect on talker intelligibility, although they did result in differences in the amount and time-course of perceptual adaptation.
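The adaptation measure used here, intelligibility in the first 20 sentences versus the last 20, plus block-by-block means for the time course, is straightforward to compute. A small sketch with synthetic per-sentence scores (the saturating learning curve below is invented purely for illustration):

```python
import numpy as np

def adaptation_summary(scores, block=10):
    """scores: per-sentence proportion of words correct, in presentation
    order. Returns the first-20 vs. last-20 gain and per-block means."""
    scores = np.asarray(scores, dtype=float)
    gain = scores[-20:].mean() - scores[:20].mean()
    n_blocks = len(scores) // block
    blocks = scores[:n_blocks * block].reshape(n_blocks, block).mean(axis=1)
    return gain, blocks

# Synthetic listener: intelligibility rises toward an asymptote over 100 sentences.
rng = np.random.default_rng(1)
scores = np.clip(0.4 + 0.3 * (1 - np.exp(-np.arange(100) / 25))
                 + rng.normal(0, 0.05, 100), 0, 1)
gain, blocks = adaptation_summary(scores)
print(f"adaptation gain: {gain:.3f}")
print("block means:", np.round(blocks, 2))
```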
Affiliation(s)
- Tessa Bent
- Department of Speech and Hearing Sciences, Indiana University, Bloomington, IN 47405, USA.
298
Samuel AG, Kraljic T. Perceptual learning for speech. Atten Percept Psychophys 2009; 71:1207-1218.
Abstract
Adult language users have an enormous amount of experience with speech in their native language. As a result, they have very well-developed processes for categorizing the sounds of speech that they hear. Despite this very high level of experience, recent research has shown that listeners are capable of redeveloping their speech categorization to bring it into alignment with new variation in their speech input. This reorganization of phonetic space is a type of perceptual learning, or recalibration, of speech processes. In this article, we review several recent lines of research on perceptual learning for speech.
299
Rudner M, Foo C, Rönnberg J, Lunner T. Cognition and aided speech recognition in noise: Specific role for cognitive factors following nine-week experience with adjusted compression settings in hearing aids. Scand J Psychol 2009; 50:405-18. [DOI: 10.1111/j.1467-9450.2009.00745.x]
300
Connine CM, Darnieder LM. Perceptual learning of co-articulation in speech. J Mem Lang 2009; 61:412-422. [PMID: 20160986] [PMCID: PMC2754852] [DOI: 10.1016/j.jml.2009.07.003]
Abstract
Four experiments investigated the novel issue of learning to accommodate the coarticulated nature of speech. Experiment 1 established a co-articulatory mismatch effect for a set of vowel-consonant (VC) syllables (reaction times were faster for co-articulation matching than for mismatching stimuli). A rhyme judgment training task on words (Experiment 2) or VC stimuli (Experiment 3) with mismatching information was followed by a phoneme monitoring task on a set of VC stimuli; training and test stimuli contained physically identical (same condition) or new (different condition) mismatching coarticulatory information (along with a set containing matching coarticulatory information). A third group received no training. A coarticulatory mismatch effect was found without training but not when the same mismatching tokens were used at training and test. Both word (Experiment 2) and syllable (Experiment 3) training stimuli eliminated the mismatch effect; overall reaction times were somewhat slower when the training stimuli were words. Perceptual learning generalized to new tokens only when the acoustic manifestation of the critical co-articulatory information in the training stimuli was sufficiently large (Experiments 3 and 4). The results are discussed in terms of speech processing and perceptual learning in speech perception.