1. Yi HG, Leonard MK, Chang EF. The Encoding of Speech Sounds in the Superior Temporal Gyrus. Neuron 2019; 102:1096-1110. PMID: 31220442; PMCID: PMC6602075; DOI: 10.1016/j.neuron.2019.04.023. Cited in RCA: 223.
Abstract
The human superior temporal gyrus (STG) is critical for extracting meaningful linguistic features from speech input. Local neural populations are tuned to acoustic-phonetic features of all consonants and vowels and to dynamic cues for intonational pitch. These populations are embedded throughout broader functional zones that are sensitive to amplitude-based temporal cues. Beyond speech features, STG representations are strongly modulated by learned knowledge and perceptual goals. Currently, a major challenge is to understand how these features are integrated across space and time in the brain during natural speech comprehension. We present a theory that temporally recurrent connections within STG generate context-dependent phonological representations, spanning longer temporal sequences relevant for coherent percepts of syllables, words, and phrases.
2. Besson M, Chobert J, Marie C. Transfer of Training between Music and Speech: Common Processing, Attention, and Memory. Front Psychol 2011; 2:94. PMID: 21738519; PMCID: PMC3125524; DOI: 10.3389/fpsyg.2011.00094. Cited in RCA: 173.
Abstract
After a brief historical perspective of the relationship between language and music, we review our work on transfer of training from music to speech that aimed at testing the general hypothesis that musicians should be more sensitive than non-musicians to speech sounds. In light of recent results in the literature, we argue that when long-term experience in one domain influences acoustic processing in the other domain, results can be interpreted as common acoustic processing. But when long-term experience in one domain influences the building-up of abstract and specific percepts in another domain, results are taken as evidence for transfer of training effects. Moreover, we also discuss the influence of attention and working memory on transfer effects and we highlight the usefulness of the event-related potentials method to disentangle the different processes that unfold in the course of music and speech perception. Finally, we give an overview of an on-going longitudinal project with children aimed at testing transfer effects from music to different levels and aspects of speech processing.
3. Drennan WR, Rubinstein JT. Music perception in cochlear implant users and its relationship with psychophysical capabilities. J Rehabil Res Dev 2008; 45:779-89. PMID: 18816426; PMCID: PMC2628814; DOI: 10.1682/jrrd.2007.08.0118. Cited in RCA: 172.
Abstract
This article describes issues concerning music perception with cochlear implants, discusses why music perception is usually poor in cochlear implant users, reviews relevant data, and describes approaches for improving music perception with cochlear implants. Pitch discrimination ability ranges from the ability to hear a one-semitone difference to a two-octave difference. The ability to hear rhythm and tone duration is near normal in implantees. Timbre perception is usually poor, but about two-thirds of listeners can identify instruments in a closed set better than chance. Cochlear implant recipients typically have poor melody perception but are aided with rhythm and lyrics. Without rhythm or lyrics, only about one-third of implantees can identify common melodies in a closed set better than chance. Correlations have been found between music perception ability and speech understanding in noisy environments. Thus, improving music perception might also provide broader clinical benefit. A number of approaches have been proposed to improve music perception with implant users, including encoding fundamental frequency with modulation, "current-steering," MP3-like processing, and nerve "conditioning." If successful, these approaches could improve the quality of life for implantees by improving communication and musical and environmental awareness.
4.
Abstract
Several dual route models of human speech processing have been proposed suggesting a large-scale anatomical division between cortical regions that support motor-phonological aspects vs. lexical-semantic aspects of speech processing. However, to date, there is no complete agreement on what areas subserve each route or the nature of interactions across these routes that enable human speech processing. Relying on an extensive behavioral and neuroimaging assessment of a large sample of stroke survivors, we used a data-driven approach, principal components analysis of lesion-symptom mapping, to identify brain regions crucial for performance on clusters of behavioral tasks without a priori separation into task types. Distinct anatomical boundaries were revealed between a dorsal frontoparietal stream and a ventral temporal-frontal stream associated with separate components. Collapsing over the tasks primarily supported by these streams, we characterize the dorsal stream as a form-to-articulation pathway and the ventral stream as a form-to-meaning pathway. This characterization of the division in the data reflects both the overlap between tasks supported by the two streams and the observation that there is a bias for phonological production tasks supported by the dorsal stream and lexical-semantic comprehension tasks supported by the ventral stream. As such, our findings show a division between two processing routes that underlie human speech processing and provide an empirical foundation for studying potential computational differences that distinguish between the two routes.
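To illustrate the data-driven logic described in this abstract, the hedged sketch below runs a principal components analysis over toy behavioral scores and then correlates patient component loadings with voxel-wise lesion status. All array names, dimensions, and the simple correlation statistic are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_tasks, n_voxels = 100, 12, 500                # toy dimensions (hypothetical)
behavior = rng.normal(size=(n_patients, n_tasks))           # task scores per patient
lesions = rng.integers(0, 2, size=(n_patients, n_voxels))   # binary lesion maps

# 1) PCA over behavioral tasks, with no a priori grouping of task types
B = behavior - behavior.mean(axis=0)
U, S, Vt = np.linalg.svd(B, full_matrices=False)
scores = U[:, :2] * S[:2]                                   # patient loadings on the first two components
print("variance explained:", (S[:2] ** 2 / (S ** 2).sum()).round(2))

# 2) Voxel-wise lesion-symptom mapping: relate lesion status to each component
for k in range(2):
    r = np.array([np.corrcoef(lesions[:, v], scores[:, k])[0, 1]
                  for v in range(n_voxels)])
    print(f"component {k}: strongest lesion-behaviour correlation {np.nanmax(np.abs(r)):.2f}")
```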
5
|
Jung YH, Hong SK, Wang HS, Han JH, Pham TX, Park H, Kim J, Kang S, Yoo CD, Lee KJ. Flexible Piezoelectric Acoustic Sensors and Machine Learning for Speech Processing. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2020; 32:e1904020. [PMID: 31617274 DOI: 10.1002/adma.201904020] [Citation(s) in RCA: 103] [Impact Index Per Article: 20.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/25/2019] [Revised: 08/28/2019] [Indexed: 05/22/2023]
Abstract
Flexible piezoelectric acoustic sensors have been developed to generate multiple sound signals with high sensitivity, shifting the paradigm of future voice technologies. Speech recognition based on advanced acoustic sensors and optimized machine learning software will serve as an innovative interface for artificial intelligence (AI) services. Collaboration between smart sensors and speech algorithms, and novel approaches to both, should be pursued to realize a hyperconnected society, which can offer personalized services such as biometric authentication, AI secretaries, and home appliances. Here, representative developments in speech recognition are reviewed in terms of flexible piezoelectric materials, self-powered sensors, machine learning algorithms, and speaker recognition.
6. Neural Speech Tracking in the Theta and in the Delta Frequency Band Differentially Encode Clarity and Comprehension of Speech in Noise. J Neurosci 2019; 39:5750-5759. PMID: 31109963; PMCID: PMC6636082; DOI: 10.1523/jneurosci.1828-18.2019. Cited in RCA: 102.
Abstract
Humans excel at understanding speech even in adverse conditions such as background noise. Speech processing may be aided by cortical activity in the delta and theta frequency bands, which have been found to track the speech envelope. However, the rhythm of non-speech sounds is tracked by cortical activity as well. It therefore remains unclear which aspects of neural speech tracking represent the processing of acoustic features, related to the clarity of speech, and which aspects reflect higher-level linguistic processing related to speech comprehension. Here we disambiguate the roles of cortical tracking for speech clarity and comprehension through recording EEG responses to native and foreign language at different levels of background noise, for which clarity and comprehension vary independently. We then use both a decoding and an encoding approach to relate clarity and comprehension to the neural responses. We find that cortical tracking in the theta frequency band is mainly correlated with clarity, whereas the delta band contributes most to speech comprehension. Moreover, we uncover an early neural component in the delta band that informs on comprehension and that may reflect a predictive mechanism for language processing. Our results disentangle the functional contributions of cortical speech tracking in the delta and theta bands to speech processing. They also show that both speech clarity and comprehension can be accurately decoded from relatively short segments of EEG recordings, which may have applications in future mind-controlled auditory prostheses. SIGNIFICANCE STATEMENT Speech is a highly complex signal whose processing requires analysis from lower-level acoustic features to higher-level linguistic information. Recent work has shown that neural activity in the delta and theta frequency bands tracks the rhythm of speech, but the role of this tracking for speech processing remains unclear. Here we disentangle the roles of cortical entrainment in different frequency bands and at different temporal lags for speech clarity, reflecting the acoustics of the signal, and speech comprehension, related to linguistic processing. We show that cortical speech tracking in the theta frequency band encodes mostly speech clarity, and thus acoustic aspects of the signal, whereas speech tracking in the delta band encodes higher-level speech comprehension.
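As a toy illustration of band-specific speech tracking of the kind measured in this study, the sketch below band-pass filters a simulated EEG channel into delta and theta bands and correlates each with a stand-in speech envelope. The sampling rate, band edges, and single-channel correlation metric are simplifying assumptions; the paper's actual decoding and encoding models are not reproduced here.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 100.0                                            # sampling rate in Hz (assumed)
t = np.arange(0, 60, 1 / fs)                          # one minute of toy data
envelope = np.abs(hilbert(np.random.randn(t.size)))   # stand-in speech envelope
eeg = 0.5 * envelope + np.random.randn(t.size)        # toy EEG that partially tracks the envelope

def band_corr(x, y, lo, hi, fs):
    """Correlation between x and y after band-pass filtering both into [lo, hi] Hz."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return np.corrcoef(filtfilt(b, a, x), filtfilt(b, a, y))[0, 1]

print("delta-band tracking:", band_corr(eeg, envelope, 1, 4, fs))
print("theta-band tracking:", band_corr(eeg, envelope, 4, 8, fs))
```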
7. VanRullen R, Zoefel B, Ilhan B. On the cyclic nature of perception in vision versus audition. Philos Trans R Soc Lond B Biol Sci 2014; 369:20130214. PMID: 24639585; PMCID: PMC3965168; DOI: 10.1098/rstb.2013.0214. Cited in RCA: 96.
Abstract
Does our perceptual awareness consist of a continuous stream, or a discrete sequence of perceptual cycles, possibly associated with the rhythmic structure of brain activity? This has been a long-standing question in neuroscience. We review recent psychophysical and electrophysiological studies indicating that part of our visual awareness proceeds in approximately 7-13 Hz cycles rather than continuously. On the other hand, experimental attempts at applying similar tools to demonstrate the discreteness of auditory awareness have been largely unsuccessful. We argue and demonstrate experimentally that visual and auditory perception are not equally affected by temporal subsampling of their respective input streams: video sequences remain intelligible at sampling rates of two to three frames per second, whereas audio inputs lose their fine temporal structure, and thus their intelligibility, below 20-30 samples per second. This does not mean, however, that our auditory perception must proceed continuously. Instead, we propose that audition could still involve perceptual cycles, but the periodic sampling should happen only after the stage of auditory feature extraction. In addition, although visual perceptual cycles can follow one another at a spontaneous pace largely independent of the visual input, auditory cycles may need to sample the input stream more flexibly, by adapting to the temporal structure of the auditory inputs.
8. de la Fuente Garcia S, Ritchie CW, Luz S. Artificial Intelligence, Speech, and Language Processing Approaches to Monitoring Alzheimer's Disease: A Systematic Review. J Alzheimers Dis 2020; 78:1547-1574. PMID: 33185605; PMCID: PMC7836050; DOI: 10.3233/jad-200888. Cited in RCA: 94.
Abstract
BACKGROUND: Language is a valuable source of clinical information in Alzheimer's disease, as it declines concurrently with neurodegeneration. Consequently, speech and language data have been extensively studied in connection with its diagnosis. OBJECTIVE: Firstly, to summarize the existing findings on the use of artificial intelligence, speech, and language processing to predict cognitive decline in the context of Alzheimer's disease. Secondly, to detail current research procedures, highlight their limitations, and suggest strategies to address them. METHODS: Systematic review of original research between 2000 and 2019, registered in PROSPERO (reference CRD42018116606). An interdisciplinary search covered six databases on engineering (ACM and IEEE), psychology (PsycINFO), medicine (PubMed and Embase), and Web of Science. Bibliographies of relevant papers were screened until December 2019. RESULTS: From 3,654 search results, 51 articles were selected against the eligibility criteria. Four tables summarize their findings: study details (aim, population, interventions, comparisons, methods, and outcomes), data details (size, type, modalities, annotation, balance, availability, and language of study), methodology (pre-processing, feature generation, machine learning, evaluation, and results), and clinical applicability (research implications, clinical potential, risk of bias, and strengths/limitations). CONCLUSION: Promising results are reported across nearly all 51 studies, but very few have been implemented in clinical research or practice. The main limitations of the field are poor standardization, limited comparability of results, and a degree of disconnect between study aims and clinical applications. Active attempts to close these gaps will support translation of future research into clinical practice.
9. Donhauser PW, Baillet S. Two Distinct Neural Timescales for Predictive Speech Processing. Neuron 2020; 105:385-393.e9. PMID: 31806493; PMCID: PMC6981026; DOI: 10.1016/j.neuron.2019.10.019. Cited in RCA: 89.
Abstract
During speech listening, the brain could use contextual predictions to optimize sensory sampling and processing. We asked if such predictive processing is organized dynamically into separate oscillatory timescales. We trained a neural network that uses context to predict speech at the phoneme level. Using this model, we estimated contextual uncertainty and surprise of natural speech as factors to explain neurophysiological activity in human listeners. We show, first, that speech-related activity is hierarchically organized into two timescales: fast responses (theta: 4-10 Hz), restricted to early auditory regions, and slow responses (delta: 0.5-4 Hz), dominating in downstream auditory regions. Neural activity in these bands is selectively modulated by predictions: the gain of early theta responses varies according to the contextual uncertainty of speech, while later delta responses are selective to surprising speech inputs. We conclude that theta sensory sampling is tuned to maximize expected information gain, while delta encodes only non-redundant information. VIDEO ABSTRACT.
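The two contextual quantities used here, uncertainty and surprise, can be illustrated with a toy predictive distribution over phonemes. The probability vector and phoneme set below are hypothetical stand-ins for the trained network's output, shown only to make the information-theoretic definitions concrete.

```python
import numpy as np

# Hypothetical next-phoneme distribution given the preceding context
phonemes = ["p", "t", "k", "a"]
p_next = np.array([0.6, 0.2, 0.15, 0.05])

# Contextual uncertainty = entropy of the predictive distribution (bits)
entropy = -np.sum(p_next * np.log2(p_next))

# Surprise of the phoneme that actually occurs = its negative log probability
observed = "t"
surprise = -np.log2(p_next[phonemes.index(observed)])

print(f"contextual uncertainty: {entropy:.2f} bits, surprise of '{observed}': {surprise:.2f} bits")
```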
10. Mild Cognitive Impairment Is Characterized by Deficient Brainstem and Cortical Representations of Speech. J Neurosci 2017; 37:3610-3620. PMID: 28270574; DOI: 10.1523/jneurosci.3700-16.2017. Cited in RCA: 67.
Abstract
Mild cognitive impairment (MCI) is recognized as a transitional phase in the progression toward more severe forms of dementia and is an early precursor to Alzheimer's disease. Previous neuroimaging studies reveal that MCI is associated with aberrant sensory-perceptual processing in cortical brain regions subserving auditory and language function. However, whether the pathophysiology of MCI extends to speech processing before conscious awareness (brainstem) is unknown. Using a novel electrophysiological approach, we recorded both brainstem and cortical speech-evoked brain event-related potentials (ERPs) in older, hearing-matched human listeners who did and did not present with subtle cognitive impairment revealed through behavioral neuropsychological testing. We found that MCI was associated with changes in neural speech processing, characterized by hypersensitive (larger) brainstem and cortical speech encoding in MCI compared with controls in the absence of any perceptual speech deficits. Group differences also interacted with age differentially across the auditory pathway; brainstem responses became larger and cortical ERPs smaller with advancing age. Multivariate classification revealed that dual brainstem-cortical speech activity correctly identified MCI listeners with 80% accuracy, suggesting its application as a biomarker of early cognitive decline. Brainstem responses were also a more robust predictor of individuals' MCI severity than cortical activity. Our findings suggest that MCI is associated with poorer encoding and transfer of speech signals between functional levels of the auditory system and advance the pathophysiological understanding of cognitive aging by identifying subcortical deficits in auditory sensory processing mere milliseconds (<10 ms) after sound onset and before the emergence of perceptual speech deficits. SIGNIFICANCE STATEMENT Mild cognitive impairment (MCI) is a precursor to dementia marked by declines in communication skills. Whether MCI pathophysiology extends below cerebral cortex to affect speech processing before conscious awareness (brainstem) is unknown. By recording neuroelectric brain activity to speech from brainstem and cortex, we show that MCI hypersensitizes the normal encoding of speech information across the hearing brain. Deficient neural responses to speech (particularly those generated from the brainstem) predicted the presence of MCI with high accuracy and before behavioral deficits. Our findings advance the neurological understanding of MCI by identifying a subcortical biomarker in auditory-sensory processing before conscious awareness, which may be a precursor to declines in speech understanding.
11.
Abstract
The extent to which the sleeping brain processes sensory information remains unclear. This is particularly true for continuous and complex stimuli such as speech, in which information is organized into hierarchically embedded structures. Recently, novel metrics for assessing the neural representation of continuous speech have been developed using noninvasive brain recordings that have thus far only been tested during wakefulness. Here we investigated, for the first time, the sleeping brain's capacity to process continuous speech at different hierarchical levels using a newly developed Concurrent Hierarchical Tracking (CHT) approach that allows monitoring the neural representation and processing-depth of continuous speech online. Speech sequences were compiled with syllables, words, phrases, and sentences occurring at fixed time intervals such that different linguistic levels correspond to distinct frequencies. This enabled us to distinguish their neural signatures in brain activity. We compared the neural tracking of intelligible versus unintelligible (scrambled and foreign) speech across states of wakefulness and sleep using high-density EEG in humans. We found that neural tracking of stimulus acoustics was comparable across wakefulness and sleep and similar across all conditions regardless of speech intelligibility. In contrast, neural tracking of higher-order linguistic constructs (words, phrases, and sentences) was only observed for intelligible speech during wakefulness and could not be detected at all during nonrapid eye movement or rapid eye movement sleep. These results suggest that, whereas low-level auditory processing is relatively preserved during sleep, higher-level hierarchical linguistic parsing is severely disrupted, thereby revealing the capacity and limits of language processing during sleep. SIGNIFICANCE STATEMENT Despite the persistence of some sensory processing during sleep, it is unclear whether high-level cognitive processes such as speech parsing are also preserved. We used a novel approach for studying the depth of speech processing across wakefulness and sleep while tracking neuronal activity with EEG. We found that responses to the auditory sound stream remained intact; however, the sleeping brain did not show signs of hierarchical parsing of the continuous stream of syllables into words, phrases, and sentences. The results suggest that sleep imposes a functional barrier between basic sensory processing and high-level cognitive processing. This paradigm also holds promise for studying residual cognitive abilities in a wide array of unresponsive states.
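A hedged sketch of the frequency-tagging logic behind such designs: if syllables are presented at 4 Hz and grouped into two-syllable words and four-syllable phrases, linguistic parsing should add spectral peaks at 2 Hz and 1 Hz in the neural response, on top of the acoustic peak at the syllable rate. The rates, the simulated signal, and the peak-to-baseline test below are illustrative assumptions, not the CHT analysis itself.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, dur = 100.0, 200.0                         # sampling rate (Hz) and duration (s), assumed
t = np.arange(0, dur, 1 / fs)
# Toy neural signal: syllable-rate acoustic tracking plus weaker word- and phrase-rate components
signal = (np.sin(2 * np.pi * 4 * t)
          + 0.3 * np.sin(2 * np.pi * 2 * t)
          + 0.2 * np.sin(2 * np.pi * 1 * t)
          + rng.standard_normal(t.size))

freqs = np.fft.rfftfreq(t.size, 1 / fs)
power = np.abs(np.fft.rfft(signal)) ** 2

for f_tag, level in [(4.0, "syllable"), (2.0, "word"), (1.0, "phrase")]:
    idx = np.argmin(np.abs(freqs - f_tag))
    baseline = np.concatenate([power[idx - 5:idx], power[idx + 1:idx + 6]]).mean()
    print(f"{level} rate ({f_tag} Hz): peak/baseline ratio {power[idx] / baseline:.1f}")
```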
12. Lotte F, Brumberg JS, Brunner P, Gunduz A, Ritaccio AL, Guan C, Schalk G. Electrocorticographic representations of segmental features in continuous speech. Front Hum Neurosci 2015; 9:97. PMID: 25759647; PMCID: PMC4338752; DOI: 10.3389/fnhum.2015.00097. Cited in RCA: 53.
Abstract
Acoustic speech output results from coordinated articulation of dozens of muscles, bones and cartilages of the vocal mechanism. While we commonly take the fluency and speed of our speech productions for granted, the neural mechanisms facilitating the requisite muscular control are not completely understood. Previous neuroimaging and electrophysiology studies of speech sensorimotor control have typically concentrated on speech sounds (i.e., phonemes, syllables and words) in isolation; sentence-length investigations have largely been used to inform coincident linguistic processing. In this study, we examined the neural representations of segmental features (place and manner of articulation, and voicing status) in the context of fluent, continuous speech production. We used recordings from the cortical surface [electrocorticography (ECoG)] to simultaneously evaluate the spatial topography and temporal dynamics of the neural correlates of speech articulation that may mediate the generation of hypothesized gestural or articulatory scores. We found that the representation of place of articulation involved broad networks of brain regions during all phases of speech production: preparation, execution and monitoring. In contrast, manner of articulation and voicing status were dominated by auditory cortical responses after speech had been initiated. These results provide new insight into the articulatory and auditory processes underlying speech production in terms of their motor requirements and acoustic correlates.
13. Lopez-Poveda EA. Why do I hear but not understand? Stochastic undersampling as a model of degraded neural encoding of speech. Front Neurosci 2014; 8:348. PMID: 25400543; PMCID: PMC4214224; DOI: 10.3389/fnins.2014.00348. Cited in RCA: 52.
Abstract
Hearing impairment is a serious disease with increasing prevalence. It is defined based on increased audiometric thresholds but increased thresholds are only partly responsible for the greater difficulty understanding speech in noisy environments experienced by some older listeners or by hearing-impaired listeners. Identifying the additional factors and mechanisms that impair intelligibility is fundamental to understanding hearing impairment but these factors remain uncertain. Traditionally, these additional factors have been sought in the way the speech spectrum is encoded in the pattern of impaired mechanical cochlear responses. Recent studies, however, are steering the focus toward impaired encoding of the speech waveform in the auditory nerve. In our recent work, we gave evidence that a significant factor might be the loss of afferent auditory nerve fibers, a pathology that comes with aging or noise overexposure. Our approach was based on a signal-processing analogy whereby the auditory nerve may be regarded as a stochastic sampler of the sound waveform and deafferentation may be described in terms of waveform undersampling. We showed that stochastic undersampling simultaneously degrades the encoding of soft and rapid waveform features, and that this degrades speech intelligibility in noise more than in quiet without significant increases in audiometric thresholds. Here, we review our recent work in a broader context and argue that the stochastic undersampling analogy may be extended to study the perceptual consequences of various different hearing pathologies and their treatment.
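A minimal sketch of the stochastic-sampler analogy, assuming each auditory nerve fiber acts as a Bernoulli sampler of the half-wave rectified waveform and modeling deafferentation as a reduction in fiber count. The fiber numbers, firing-probability rule, and error metric are illustrative choices rather than the model used in the cited work.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 16000
t = np.arange(0, 0.05, 1 / fs)
wave = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 10 * t))  # toy sound
target = np.clip(wave, 0, None)
target = target / target.max()               # half-wave rectified, normalised waveform

def stochastic_sample(p, n_fibers):
    """Average of n_fibers Bernoulli samplers whose firing probability follows p
    (an assumed, simplified rate rule for auditory nerve fibers)."""
    spikes = rng.random((n_fibers, p.size)) < p      # each row = one fiber's spike train
    return spikes.mean(axis=0)                       # population estimate of the waveform

# Fewer fibers -> noisier encoding of the waveform, especially its soft and rapid features
for name, n in [("healthy (200 fibers)", 200), ("deafferented (10 fibers)", 10)]:
    est = stochastic_sample(target, n)
    print(f"{name}: mean squared encoding error {np.mean((est - target) ** 2):.4f}")
```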
14
|
Irino T, Patterson RD. A Dynamic Compressive Gammachirp Auditory Filterbank. IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 2006; 14:2222-2232. [PMID: 19330044 PMCID: PMC2661063 DOI: 10.1109/tasl.2006.874669] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
It is now common to use knowledge about human auditory processing in the development of audio signal processors. Until recently, however, such systems were limited by their linearity. The auditory filter system is known to be level-dependent as evidenced by psychophysical data on masking, compression, and two-tone suppression. However, there were no analysis/synthesis schemes with nonlinear filterbanks. This paper describes such a scheme based on the compressive gammachirp (cGC) auditory filter. It was developed to extend the gammatone filter concept to accommodate the changes in psychophysical filter shape that are observed to occur with changes in stimulus level in simultaneous, tone-in-noise masking. In models of simultaneous noise masking, the temporal dynamics of the filtering can be ignored. Analysis/synthesis systems, however, are intended for use with speech sounds where the glottal cycle can be long with respect to auditory time constants, and so they require specification of the temporal dynamics of the auditory filter. In this paper, we describe a fast-acting level control circuit for the cGC filter and show how psychophysical data involving two-tone suppression and compression can be used to estimate the parameter values for this dynamic version of the cGC filter (referred to as the "dcGC" filter). One important advantage of analysis/synthesis systems with a dcGC filterbank is that they can inherit previously refined signal processing algorithms developed with conventional short-time Fourier transforms (STFTs) and linear filterbanks.
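For orientation, the sketch below computes the impulse response of a passive gammachirp filter, an envelope t^(n-1)·exp(-2πb·ERB(fr)·t) multiplied by a carrier cos(2πfr·t + c·ln t). The parameter values n, b, and c are typical illustrative choices, and the level-dependent compressive and dynamic extensions described in the paper are not modeled.

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth (Glasberg & Moore approximation), in Hz."""
    return 24.7 + 0.108 * f

def gammachirp(fr, fs=16000, dur=0.025, n=4, b=1.019, c=-2.0):
    """Impulse response of a passive gammachirp filter centred near fr (Hz).
    n, b, and c are illustrative parameter choices, not fitted values."""
    t = np.arange(1, int(dur * fs)) / fs                      # start one sample in to avoid ln(0)
    env = t ** (n - 1) * np.exp(-2 * np.pi * b * erb(fr) * t)  # gamma envelope
    carrier = np.cos(2 * np.pi * fr * t + c * np.log(t))       # chirping carrier
    g = env * carrier
    return g / np.max(np.abs(g))

ir = gammachirp(1000.0)
print("impulse response samples:", ir.size)
```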
15. Schomers MR, Kirilina E, Weigand A, Bajbouj M, Pulvermüller F. Causal Influence of Articulatory Motor Cortex on Comprehending Single Spoken Words: TMS Evidence. Cereb Cortex 2014; 25:3894-902. PMID: 25452575; PMCID: PMC4585521; DOI: 10.1093/cercor/bhu274. Cited in RCA: 46.
Abstract
Classic wisdom had been that motor and premotor cortex contribute to motor execution but not to higher cognition and language comprehension. In contrast, mounting evidence from neuroimaging, patient research, and transcranial magnetic stimulation (TMS) suggests sensorimotor interaction and, specifically, that the articulatory motor cortex is important for classifying meaningless speech sounds into phonemic categories. However, whether these findings speak to the comprehension issue is unclear, because language comprehension does not require explicit phonemic classification and previous results may therefore relate to factors alien to semantic understanding. We here used the standard psycholinguistic test of spoken word comprehension, the word-to-picture-matching task, and concordant TMS to articulatory motor cortex. TMS pulses were applied to primary motor cortex controlling either the lips or the tongue as subjects heard critical word stimuli starting with bilabial lip-related or alveolar tongue-related stop consonants (e.g., “pool” or “tool”). A significant cross-over interaction showed that articulatory motor cortex stimulation delayed comprehension responses for phonologically incongruent words relative to congruous ones (i.e., lip area TMS delayed “tool” relative to “pool” responses). As local TMS to articulatory motor areas differentially delays the comprehension of phonologically incongruous spoken words, we conclude that motor systems can take a causal role in semantic comprehension and, hence, higher cognition.
16. Ivansic D, Guntinas-Lichius O, Müller B, Volk GF, Schneider G, Dobel C. Impairments of Speech Comprehension in Patients with Tinnitus-A Review. Front Aging Neurosci 2017; 9:224. PMID: 28744214; PMCID: PMC5504093; DOI: 10.3389/fnagi.2017.00224. Cited in RCA: 42.
Abstract
Tinnitus describes the subjective perception of a sound despite the absence of external stimulation. As tinnitus is a sensory symptom, the majority of studies focus on the auditory pathway. In recent years, a series of studies has suggested a crucial involvement of the limbic system in the manifestation and development of chronic tinnitus. Regarding cognitive symptoms, several reviews have addressed the presence of cognitive impairments in tinnitus as well and concluded that attention and memory processes are affected. Despite its importance for social communication and its reliance on a highly functional auditory system, speech comprehension remains a largely neglected field in tinnitus research. We therefore review the existing literature on speech and language functions in tinnitus patients. The reviewed studies suggest that speech comprehension is impaired in patients with tinnitus, especially in the presence of competing noise. This is even the case in tinnitus patients with normal hearing thresholds. Additionally, speech comprehension measures seem independent of other measures such as tinnitus severity and perceived tinnitus loudness. According to the majority of authors, the speech comprehension difficulties arise as a result of central processes or dysfunctional neuroplasticity.
17. Mishra S, Lunner T, Stenfelt S, Rönnberg J, Rudner M. Seeing the talker's face supports executive processing of speech in steady state noise. Front Syst Neurosci 2013; 7:96. PMID: 24324411; PMCID: PMC3840300; DOI: 10.3389/fnsys.2013.00096. Cited in RCA: 39.
Abstract
Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity (WMC). Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.
18. Jochaut D, Lehongre K, Saitovitch A, Devauchelle AD, Olasagasti I, Chabane N, Zilbovicius M, Giraud AL. Atypical coordination of cortical oscillations in response to speech in autism. Front Hum Neurosci 2015; 9:171. PMID: 25870556; PMCID: PMC4376066; DOI: 10.3389/fnhum.2015.00171. Cited in RCA: 39.
Abstract
Subjects with autism often show language difficulties, but it is unclear how they relate to neurophysiological anomalies of cortical speech processing. We used combined EEG and fMRI in 13 subjects with autism and 13 control participants and show that in autism, gamma and theta cortical activity do not engage synergistically in response to speech. Theta activity in left auditory cortex fails to track speech modulations, and to down-regulate gamma oscillations in the group with autism. This deficit predicts the severity of both verbal impairment and autism symptoms in the affected sample. Finally, we found that oscillation-based connectivity between auditory and other language cortices is altered in autism. These results suggest that the verbal disorder in autism could be associated with an altered balance of slow and fast auditory oscillations, and that this anomaly could compromise the mapping between sensory input and higher-level cognitive representations.
19. Tune S, Wöstmann M, Obleser J. Probing the limits of alpha power lateralisation as a neural marker of selective attention in middle-aged and older listeners. Eur J Neurosci 2018; 48:2537-2550. PMID: 29430736; DOI: 10.1111/ejn.13862. Cited in RCA: 37.
Abstract
In recent years, hemispheric lateralisation of alpha power has emerged as a neural mechanism thought to underpin spatial attention across sensory modalities. Yet, how healthy ageing, beginning in middle adulthood, impacts the modulation of lateralised alpha power supporting auditory attention remains poorly understood. In the current electroencephalography study, middle-aged and older adults (N = 29; ~40-70 years) performed a dichotic listening task that simulates a challenging, multitalker scenario. We examined the extent to which the modulation of 8-12 Hz alpha power would serve as a neural marker of listening success across age. To account for the increase in interindividual variability with age, we examined an extensive battery of behavioural, perceptual and neural measures. Similar to findings on younger adults, middle-aged and older listeners' auditory spatial attention induced robust lateralisation of alpha power, which synchronised with the speech rate. Notably, the observed relationship between this alpha lateralisation and task performance did not co-vary with age. Instead, task performance was strongly related to an individual's attentional and working memory capacity. Multivariate analyses revealed a separation of neural and behavioural variables independent of age. Our results suggest that in age-varying samples such as the present one, the lateralisation of alpha power is neither a sufficient nor a necessary neural strategy for an individual's auditory spatial attention, as higher age might come with increased use of alternative, compensatory mechanisms. Our findings emphasise that explaining interindividual variability will be key to understanding the role of alpha oscillations in auditory attention in the ageing listener.
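The alpha-lateralisation measure at issue can be illustrated with a simple index contrasting alpha power ipsilateral versus contralateral to the attended side. The toy channels, Welch-based power estimate, and the particular index formula below are common conventions assumed for illustration, not the study's exact pipeline.

```python
import numpy as np
from scipy.signal import welch

fs = 250.0
rng = np.random.default_rng(2)
t = np.arange(0, 10, 1 / fs)
left_ch = rng.standard_normal(t.size)                               # toy left-hemisphere channel
right_ch = rng.standard_normal(t.size) + 0.5 * np.sin(2 * np.pi * 10 * t)  # toy right-hemisphere channel

def alpha_power(x, fs, lo=8.0, hi=12.0):
    """Mean power spectral density in the alpha band, via Welch's method."""
    f, pxx = welch(x, fs=fs, nperseg=int(2 * fs))
    return pxx[(f >= lo) & (f <= hi)].mean()

# Attend-left trial: the right-hemisphere channel is contralateral to attention.
# One common convention: ALI = (ipsilateral - contralateral) / (ipsilateral + contralateral)
p_ipsi, p_contra = alpha_power(left_ch, fs), alpha_power(right_ch, fs)
ali = (p_ipsi - p_contra) / (p_ipsi + p_contra)
print(f"alpha lateralisation index: {ali:.2f}")
```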
20. Har-shai Yahav P, Zion Golumbic E. Linguistic processing of task-irrelevant speech at a cocktail party. eLife 2021; 10:e65096. PMID: 33942722; PMCID: PMC8163500; DOI: 10.7554/elife.65096. Cited in RCA: 33.
Abstract
Paying attention to one speaker in a noisy place can be extremely difficult, because to-be-attended and task-irrelevant speech compete for processing resources. We tested whether this competition is restricted to acoustic-phonetic interference or if it extends to competition for linguistic processing as well. Neural activity was recorded using Magnetoencephalography as human participants were instructed to attend to natural speech presented to one ear, and task-irrelevant stimuli were presented to the other. Task-irrelevant stimuli consisted either of random sequences of syllables, or syllables structured to form coherent sentences, using hierarchical frequency-tagging. We find that the phrasal structure of structured task-irrelevant stimuli was represented in the neural response in left inferior frontal and posterior parietal regions, indicating that selective attention does not fully eliminate linguistic processing of task-irrelevant speech. Additionally, neural tracking of to-be-attended speech in left inferior frontal regions was enhanced when competing with structured task-irrelevant stimuli, suggesting inherent competition between them for linguistic processing.
21. Porter BA, Rosenthal TR, Ranasinghe KG, Kilgard MP. Discrimination of brief speech sounds is impaired in rats with auditory cortex lesions. Behav Brain Res 2011; 219:68-74. PMID: 21167211; PMCID: PMC3062672; DOI: 10.1016/j.bbr.2010.12.015. Cited in RCA: 31.
Abstract
Auditory cortex (AC) lesions impair complex sound discrimination. However, a recent study demonstrated spared performance on an acoustic startle response test of speech discrimination following AC lesions (Floody et al., 2010). The current study reports the effects of AC lesions on two operant speech discrimination tasks. AC lesions caused a modest and quickly recovered impairment in the ability of rats to discriminate consonant-vowel-consonant speech sounds. This result seems to suggest that AC does not play a role in speech discrimination. However, the speech sounds used in both studies differed in many acoustic dimensions, and an adaptive change in discrimination strategy could allow the rats to use an acoustic difference that does not require an intact AC to discriminate. Based on our earlier observation that the first 40 ms of the spatiotemporal activity patterns elicited by speech sounds best correlate with behavioral discriminations of these sounds (Engineer et al., 2008), we predicted that eliminating additional cues by truncating speech sounds to the first 40 ms would render the stimuli indistinguishable to a rat with AC lesions. Although the initial discrimination of truncated sounds took longer to learn, the final performance paralleled that of rats using full-length consonant-vowel-consonant sounds. After 20 days of testing, half of the rats using speech onsets received bilateral AC lesions. Lesions severely impaired speech onset discrimination for at least one month post-lesion. These results support the hypothesis that auditory cortex is required to accurately discriminate the subtle differences between similar consonant and vowel sounds.
22. Auditory-frontal Channeling in α and β Bands is Altered by Age-related Hearing Loss and Relates to Speech Perception in Noise. Neuroscience 2019; 423:18-28. PMID: 31705894; DOI: 10.1016/j.neuroscience.2019.10.044. Cited in RCA: 28.
Abstract
Difficulty understanding speech-in-noise (SIN) is a pervasive problem faced by older adults particularly those with hearing loss. Previous studies have identified structural and functional changes in the brain that contribute to older adults' speech perception difficulties. Yet, many of these studies use neuroimaging techniques that evaluate only gross activation in isolated brain regions. Neural oscillations may provide further insight into the processes underlying SIN perception as well as the interaction between auditory cortex and prefrontal linguistic brain regions that mediate complex behaviors. We examined frequency-specific neural oscillations and functional connectivity of the EEG in older adults with and without hearing loss during an active SIN perception task. Brain-behavior correlations revealed listeners who were more resistant to the detrimental effects of noise also demonstrated greater modulation of α phase coherence between clean and noise-degraded speech, suggesting α desynchronization reflects release from inhibition and more flexible allocation of neural resources. Additionally, we found top-down β connectivity between prefrontal and auditory cortices strengthened with poorer hearing thresholds despite minimal behavioral differences. This is consistent with the proposal that linguistic brain areas may be recruited to compensate for impoverished auditory inputs through increased top-down predictions to assist SIN perception. Overall, these results emphasize the importance of top-down signaling in low-frequency brain rhythms that help compensate for hearing-related declines and facilitate efficient SIN processing.
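Band-limited phase coherence between two brain signals, the kind of α/β connectivity measure referred to here, is often summarised by the phase-locking value. The sketch below computes it between two toy channels via the Hilbert transform, with the band edges, filter order, and simulated signals as assumptions rather than the study's actual method.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def plv(x, y, fs, lo, hi):
    """Phase-locking value between x and y within the [lo, hi] Hz band."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    phase_x = np.angle(hilbert(filtfilt(b, a, x)))
    phase_y = np.angle(hilbert(filtfilt(b, a, y)))
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

fs = 250.0
t = np.arange(0, 4, 1 / fs)
rng = np.random.default_rng(3)
shared = np.sin(2 * np.pi * 20 * t)                    # common beta-band drive
frontal = shared + 0.5 * rng.standard_normal(t.size)   # toy "prefrontal" channel
auditory = shared + 0.5 * rng.standard_normal(t.size)  # toy "auditory" channel

print(f"beta-band PLV: {plv(frontal, auditory, fs, 15, 30):.2f}")
```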
23. Travis KE, Leonard MK, Chan AM, Torres C, Sizemore ML, Qu Z, Eskandar E, Dale AM, Elman JL, Cash SS, Halgren E. Independence of early speech processing from word meaning. Cereb Cortex 2012; 23:2370-9. PMID: 22875868; DOI: 10.1093/cercor/bhs228. Cited in RCA: 28.
Abstract
We combined magnetoencephalography (MEG) with magnetic resonance imaging and electrocorticography to separate in anatomy and latency 2 fundamental stages underlying speech comprehension. The first acoustic-phonetic stage is selective for words relative to control stimuli individually matched on acoustic properties. It begins ∼60 ms after stimulus onset and is localized to middle superior temporal cortex. It was replicated in another experiment, but is strongly dissociated from the response to tones in the same subjects. Within the same task, semantic priming of the same words by a related picture modulates cortical processing in a broader network, but this does not begin until ∼217 ms. The earlier onset of acoustic-phonetic processing compared with lexico-semantic modulation was significant in each individual subject. The MEG source estimates were confirmed with intracranial local field potential and high gamma power responses acquired in 2 additional subjects performing the same task. These recordings further identified sites within superior temporal cortex that responded only to the acoustic-phonetic contrast at short latencies, or the lexico-semantic at long. The independence of the early acoustic-phonetic response from semantic context suggests a limited role for lexical feedback in early speech perception.
24. Thorpe K, Fernald A. Knowing what a novel word is not: Two-year-olds 'listen through' ambiguous adjectives in fluent speech. Cognition 2006; 100:389-433. PMID: 16125688; PMCID: PMC3214592; DOI: 10.1016/j.cognition.2005.04.009. Cited in RCA: 24.
Abstract
Three studies investigated how 24-month-olds and adults resolve temporary ambiguity in fluent speech when encountering prenominal adjectives potentially interpretable as nouns. Children were tested in a looking-while-listening procedure to monitor the time course of speech processing. In Experiment 1, the familiar and unfamiliar adjectives preceding familiar target nouns were accented or deaccented. Target word recognition was disrupted only when lexically ambiguous adjectives were accented like nouns. Experiment 2 measured the extent of interference experienced by children when interpreting prenominal words as nouns. In Experiment 3, adults used prosodic cues to identify the form class of adjective/noun homophones in string-identical sentences before the ambiguous words were fully spoken. Results show that children and adults use prosody in conjunction with lexical and distributional cues to 'listen through' prenominal adjectives, avoiding costly misinterpretation.
25. Worschech F, Marie D, Jünemann K, Sinke C, Krüger THC, Großbach M, Scholz DS, Abdili L, Kliegel M, James CE, Altenmüller E. Improved Speech in Noise Perception in the Elderly After 6 Months of Musical Instruction. Front Neurosci 2021; 15:696240. PMID: 34305522; PMCID: PMC8299120; DOI: 10.3389/fnins.2021.696240. Cited in RCA: 22.
Abstract
Understanding speech in background noise poses a challenge in daily communication, which is a particular problem among the elderly. Although musical expertise has often been suggested to be a contributor to speech intelligibility, the associations are mostly correlative. In the present multisite study conducted in Germany and Switzerland, 156 healthy, normal-hearing elderly participants were randomly assigned to either piano playing or music listening/musical culture groups. The speech reception threshold was assessed using the International Matrix Test before and after a 6 month intervention. Bayesian multilevel modeling revealed an improvement of both groups over time under binaural conditions. Additionally, the speech reception threshold of the piano group decreased for stimuli presented to the left ear. A right-ear improvement only occurred in the German piano group. Furthermore, improvements were predominantly found in women. These findings are discussed in the light of current neuroscientific theories on hemispheric lateralization and biological sex differences. The study indicates a positive transfer from musical training to speech processing, probably supported by the enhancement of auditory processing and improvement of general cognitive functions.