1
Sutojo S, van de Par S, Schoenmaker E. Contribution of binaural masking release to improved speech intelligibility for different masker types. Eur J Neurosci 2020; 51:1339-1352. DOI: 10.1111/ejn.13980
Affiliation(s)
- Sarinah Sutojo
- Acoustics Group, Cluster of Excellence Hearing4all, Carl von Ossietzky University, Oldenburg, Germany
- Steven van de Par
- Acoustics Group, Cluster of Excellence Hearing4all, Carl von Ossietzky University, Oldenburg, Germany
- Esther Schoenmaker
- Acoustics Group, Cluster of Excellence Hearing4all, Carl von Ossietzky University, Oldenburg, Germany
2
Silva DMR, Rothe-Neves R, Melges DB. Long-latency event-related responses to vowels: N1-P2 decomposition by two-step principal component analysis. Int J Psychophysiol 2019; 148:93-102. PMID: 31863852. DOI: 10.1016/j.ijpsycho.2019.11.010
Abstract
The N1-P2 complex of the auditory event-related potential (ERP) has been used to examine neural activity associated with speech sound perception. Since it is thought to reflect multiple generator processes, its functional significance is difficult to infer. In the present study, a temporospatial principal component analysis (PCA) was used to decompose the N1-P2 response into latent factors underlying covariance patterns in ERP data recorded during passive listening to pairs of successive vowels. In each trial, one of six sounds drawn from an /i/-/e/ vowel continuum was followed either by an identical sound, a different token of the same vowel category, or a token from the other category. Responses were examined as to how they were modulated by within- and across-category vowel differences and by adaptation (repetition suppression) effects. Five PCA factors were identified as corresponding to three well-known N1 subcomponents and two P2 subcomponents. Results added evidence that the N1 peak reflects both generators that are sensitive to spectral information and generators that are not. For later latency ranges, different patterns of sensitivity to vowel quality were found, including category-related effects. Particularly, a subcomponent identified as the Tb wave showed release from adaptation in response to an /i/ followed by an /e/ sound. A P2 subcomponent varied linearly with spectral shape along the vowel continuum, while the other was stronger the closer the vowel was to the category boundary, suggesting separate processing of continuous and category-related information. Thus, the PCA-based decomposition of the N1-P2 complex was functionally meaningful, revealing distinct underlying processes at work during speech sound perception.
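The two-step (temporospatial) PCA the abstract describes can be sketched as below. This is a minimal illustration, not the authors' pipeline: the array sizes, component counts, and random data are assumptions made for the sketch.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic ERP data: trials x channels x timepoints (sizes are illustrative)
n_trials, n_channels, n_times, n_tfac = 120, 32, 200, 5
erp = rng.standard_normal((n_trials, n_channels, n_times))

# Step 1, temporal PCA: variables are timepoints, observations are
# every (trial, channel) waveform.
temporal = PCA(n_components=n_tfac)
t_scores = temporal.fit_transform(erp.reshape(n_trials * n_channels, n_times))
t_scores = t_scores.reshape(n_trials, n_channels, n_tfac)

# Step 2, spatial PCA: for each temporal factor, variables are channels,
# yielding temporospatial factor scores per trial.
spatial_factors = []
for k in range(n_tfac):
    spatial = PCA(n_components=2)
    spatial_factors.append(spatial.fit_transform(t_scores[:, :, k]))

print(len(spatial_factors), spatial_factors[0].shape)  # -> 5 (120, 2)
```

With real data, the factor scores would then be compared across conditions (e.g., within- vs. across-category vowel pairs) to interpret each latent subcomponent.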
Affiliation(s)
- Daniel M R Silva
- Phonetics Lab, Faculty of Letters, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Rui Rothe-Neves
- Phonetics Lab, Faculty of Letters, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Danilo B Melges
- Graduate Program in Electrical Engineering, Department of Electrical Engineering, Federal University of Minas Gerais, Belo Horizonte, Brazil
3
Fisher JM, Dick FK, Levy DF, Wilson SM. Neural representation of vowel formants in tonotopic auditory cortex. Neuroimage 2018; 178:574-582. PMID: 29860083. DOI: 10.1016/j.neuroimage.2018.05.072
Abstract
Speech sounds are encoded by distributed patterns of activity in bilateral superior temporal cortex. However, it is unclear whether speech sounds are topographically represented in cortex, or which acoustic or phonetic dimensions might be spatially mapped. Here, using functional MRI, we investigated the potential spatial representation of vowels, which are largely distinguished from one another by the frequencies of their first and second formants, i.e. peaks in their frequency spectra. This allowed us to generate clear hypotheses about the representation of specific vowels in tonotopic regions of auditory cortex. We scanned participants as they listened to multiple natural tokens of the vowels [ɑ] and [i], which we selected because their first and second formants overlap minimally. Formant-based regions of interest were defined for each vowel based on spectral analysis of the vowel stimuli and independently acquired tonotopic maps for each participant. We found that perception of [ɑ] and [i] yielded differential activation of tonotopic regions corresponding to formants of [ɑ] and [i], such that each vowel was associated with increased signal in tonotopic regions corresponding to its own formants. This pattern was observed in Heschl's gyrus and the superior temporal gyrus, in both hemispheres, and for both the first and second formants. Using linear discriminant analysis of mean signal change in formant-based regions of interest, the identity of untrained vowels was predicted with ∼73% accuracy. Our findings show that cortical encoding of vowels is scaffolded on tonotopy, a fundamental organizing principle of auditory cortex that is not language-specific.
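The classification step in the abstract (linear discriminant analysis over mean signal change in formant-based ROIs) can be illustrated with simulated numbers. Everything here is assumed for the sketch: the four ROI labels, trial counts, effect sizes, and noise level are invented, so the printed accuracy is not the ~73% reported in the study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)
# Simulated mean signal change per trial in four formant-based ROIs:
# [F1-of-[ɑ], F2-of-[ɑ], F1-of-[i], F2-of-[i]].  Each vowel boosts the
# ROIs matching its own formants, as in the reported activation pattern.
n = 30
a_trials = rng.normal([0.6, 0.5, 0.1, 0.1], 0.3, size=(n, 4))
i_trials = rng.normal([0.1, 0.1, 0.6, 0.5], 0.3, size=(n, 4))
X = np.vstack([a_trials, i_trials])
y = np.array([0] * n + [1] * n)  # 0 = [ɑ], 1 = [i]

# Leave-one-out LDA, analogous to predicting held-out vowels from ROI signal
acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut()).mean()
print(f"leave-one-out accuracy: {acc:.2f}")
```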
Affiliation(s)
- Julia M Fisher
- Department of Linguistics, University of Arizona, Tucson, AZ, USA; Statistics Consulting Laboratory, BIO5 Institute, University of Arizona, Tucson, AZ, USA
- Frederic K Dick
- Department of Psychological Sciences, Birkbeck College, University of London, UK; Birkbeck-UCL Center for Neuroimaging, London, UK; Department of Experimental Psychology, University College London, UK
- Deborah F Levy
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Stephen M Wilson
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
4
Silva DMR, Melges DB, Rothe-Neves R. N1 response attenuation and the mismatch negativity (MMN) to within- and across-category phonetic contrasts. Psychophysiology 2017; 54:591-600. PMID: 28169421. DOI: 10.1111/psyp.12824
Abstract
According to the neural adaptation model of the mismatch negativity (MMN), the sensitivity of this event-related response to both acoustic and categorical information in speech sounds can be accounted for by assuming that (a) the degree of overlap between neural representations of two sounds depends on both the acoustic difference between them and whether or not they belong to distinct phonetic categories, and (b) a release from stimulus-specific adaptation causes an enhanced N1 obligatory response to infrequent deviant stimuli. On the basis of this view, we tested in Experiment 1 whether the N1 response to the second sound of a pair (S2) would be more attenuated in pairs of identical vowels compared with pairs of different vowels, and in pairs of exemplars of the same vowel category compared with pairs of exemplars of different categories. The psychoacoustic distance between S1 and S2 was the same for all within-category and across-category pairs. While N1 amplitudes decreased markedly from S1 to S2, responses to S2 were quite similar across pair types, indicating that the attenuation effect in such conditions is not stimulus specific. In Experiment 2, a pronounced MMN was elicited by a deviant vowel sound in an across-category oddball sequence, but not when the exact same deviant vowel was presented in a within-category oddball sequence. This adds evidence that the MMN reflects categorical phonetic processing. Taken together, the results suggest that different neural processes underlie the attenuation of the N1 response to S2 and the MMN to vowels.
Affiliation(s)
- Daniel M R Silva
- Graduate Program in Neuroscience, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Danilo B Melges
- Graduate Program in Electrical Engineering, Department of Electrical Engineering, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Rui Rothe-Neves
- Phonetics Lab, Faculty of Letters, Federal University of Minas Gerais, Belo Horizonte, Brazil
5
Manca AD, Grimaldi M. Vowels and Consonants in the Brain: Evidence from Magnetoencephalographic Studies on the N1m in Normal-Hearing Listeners. Front Psychol 2016; 7:1413. PMID: 27713712. PMCID: PMC5031792. DOI: 10.3389/fpsyg.2016.01413
Abstract
Speech sound perception is one of the most fascinating tasks performed by the human brain. It involves a mapping from continuous acoustic waveforms onto the discrete phonological units computed to store words in the mental lexicon. In this article, we review the magnetoencephalographic studies that have explored the timing and morphology of the N1m component to investigate how vowels and consonants are computed and represented within the auditory cortex. The neurons involved in the N1m act to construct a sensory memory of the stimulus through spatially and temporally distributed activation patterns within the auditory cortex. Indeed, localization of auditory field maps in animals and humans has suggested two levels of sound coding: a tonotopy dimension for spectral properties and a tonochrony dimension for temporal properties of sounds. When the stimulus is a complex speech sound, tonotopy and tonochrony data may give important information to assess whether speech sound parsing and decoding are generated by pure bottom-up reflection of acoustic differences or whether they are additionally affected by top-down processes related to phonological categories. Hints supporting pure bottom-up processing coexist with hints supporting top-down abstract phoneme representation. At present, N1m data (amplitude, latency, source generators, and hemispheric distribution) are limited and do not disentangle the issue. The nature of these limitations is discussed. Moreover, neurophysiological studies on animals and neuroimaging studies on humans are taken into consideration. We also compare the N1m findings with investigations of the magnetic mismatch negativity (MMNm) component and with the analogous electrical components, the N1 and the MMN. We conclude that the N1 seems more sensitive than the N1m in capturing lateralization and hierarchical processes, although the data are very preliminary. Finally, we suggest that MEG data should be integrated with EEG data in light of the neural oscillations framework, and we raise some concerns that should be addressed by future investigations if we want to closely align language research with issues at the core of functional brain mechanisms.
Affiliation(s)
- Anna Dora Manca
- Dipartimento di Studi Umanistici, Centro di Ricerca Interdisciplinare sul Linguaggio, University of Salento, Lecce, Italy; Laboratorio Diffuso di Ricerca Interdisciplinare Applicata alla Medicina, Lecce, Italy
- Mirko Grimaldi
- Dipartimento di Studi Umanistici, Centro di Ricerca Interdisciplinare sul Linguaggio, University of Salento, Lecce, Italy; Laboratorio Diffuso di Ricerca Interdisciplinare Applicata alla Medicina, Lecce, Italy
6
Nakamura I, Hirano Y, Ohara N, Hirano S, Ueno T, Tsuchimoto R, Kanba S, Onitsuka T. Early integration processing between faces and vowel sounds in human brain: an MEG investigation. Neuropsychobiology 2016; 71:187-195. PMID: 26044647. DOI: 10.1159/000377680
Abstract
OBJECTIVE Unconscious, fast integration of face and voice information is a crucial brain function necessary for communicating effectively with others. Here, we looked for evidence of rapid face-voice integration in the auditory cortex. METHODS Magnetic fields (P50m and N100m) evoked by visual stimuli (V), auditory stimuli (A) and audiovisual stimuli (AV), i.e. by face, vowel and simultaneous vowel-face stimuli, were recorded in 22 healthy subjects. Magnetoencephalographic data from 28 channels around bilateral auditory cortices were analyzed. RESULTS In both hemispheres, AV - V showed significantly larger P50m amplitudes than A. Additionally, compared with A, the N100m amplitudes and dipole moments of AV - V were significantly smaller in the left hemisphere, but not in the right hemisphere. CONCLUSIONS Differential changes in P50m (bilateral) and N100m (left hemisphere) that occur when V (faces) are associated with A (vowel sounds) indicate that AV (face-voice) integration occurs in early processing, likely enabling us to communicate effectively in our lives.
Affiliation(s)
- Itta Nakamura
- Department of Neuropsychiatry, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
7
Zhang C, Pugh KR, Mencl WE, Molfese PJ, Frost SJ, Magnuson JS, Peng G, Wang WSY. Functionally integrated neural processing of linguistic and talker information: An event-related fMRI and ERP study. Neuroimage 2015; 124:536-549. PMID: 26343322. DOI: 10.1016/j.neuroimage.2015.08.064
Abstract
Speech signals contain information about both linguistic content and a talker's voice. Conventionally, linguistic and talker processing are thought to be mediated by distinct neural systems in the left and right hemispheres respectively, but there is growing evidence that linguistic and talker processing interact in many ways. Previous studies suggest that talker-related vocal tract changes are processed integrally with phonetic changes in the bilateral posterior superior temporal gyrus/superior temporal sulcus (STG/STS), because the vocal tract parameter influences the perception of phonetic information. It is yet unclear whether the bilateral STG is also activated by the integral processing of another parameter - pitch, which influences the perception of lexical tone information and is related to talker differences in tone languages. In this study, we conducted separate functional magnetic resonance imaging (fMRI) and event-related potential (ERP) experiments to examine the spatial and temporal loci of interactions of lexical tone and talker-related pitch processing in Cantonese. We found that the STG was activated bilaterally during the processing of talker changes when listeners attended to lexical tone changes in the stimuli, and during the processing of lexical tone changes when listeners attended to talker changes, suggesting that lexical tone and talker processing are functionally integrated in the bilateral STG. This extends the previous study, providing evidence for a general neural mechanism of integral phonetic and talker processing in the bilateral STG. The ERP results show interactions of lexical tone and talker processing 500-800 ms after auditory word onset (a simultaneous posterior P3b and a frontal negativity). Moreover, there is some asymmetry in the interaction, such that unattended talker changes affect linguistic processing more than vice versa, which may be related to the ambiguity that talker changes cause in speech perception and/or an attention bias toward talker changes. Our findings have implications for understanding the neural encoding of linguistic and talker information.
Affiliation(s)
- Caicai Zhang
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China; Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
- Kenneth R Pugh
- Haskins Laboratories, New Haven, CT, USA; Department of Psychology, University of Connecticut, Storrs, CT, USA; Department of Linguistics, Yale University, New Haven, CT, USA
- W Einar Mencl
- Haskins Laboratories, New Haven, CT, USA; Department of Linguistics, Yale University, New Haven, CT, USA
- Peter J Molfese
- Haskins Laboratories, New Haven, CT, USA; Department of Psychology, University of Connecticut, Storrs, CT, USA
- James S Magnuson
- Haskins Laboratories, New Haven, CT, USA; Department of Psychology, University of Connecticut, Storrs, CT, USA
- Gang Peng
- Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; CUHK-PKU-UST Joint Research Centre for Language and Human Complexity, The Chinese University of Hong Kong, Hong Kong, China; Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Hong Kong, China
- William S-Y Wang
- Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; CUHK-PKU-UST Joint Research Centre for Language and Human Complexity, The Chinese University of Hong Kong, Hong Kong, China; Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Hong Kong, China; Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China
8
Takanen M, Raitio T, Santala O, Alku P, Pulkki V. Fusion of spatially separated vowel formant cues. J Acoust Soc Am 2013; 134:4508. PMID: 25669261. DOI: 10.1121/1.4826181
Abstract
Previous studies on fusion in speech perception have demonstrated the ability of the human auditory system to group separate components of speech-like sounds together and consequently to enable the identification of speech despite the spatial separation between the components. Typically, the spatial separation has been implemented using headphone reproduction where the different components evoke auditory images at different lateral positions. In the present study, a multichannel loudspeaker system was used to investigate whether the correct vowel is identified and whether two auditory events are perceived when a noise-excited vowel is divided into two components that are spatially separated. The two components consisted of the even and odd formants. Both the amount of spatial separation between the components and the directions of the components were varied. Neither the spatial separation nor the directions of the components affected the vowel identification. Interestingly, an additional auditory event not associated with any vowel was perceived at the same time when the components were presented symmetrically in front of the listener. In such scenarios, the vowel was perceived from the direction of the odd formant components.
Affiliation(s)
- Marko Takanen
- Department of Signal Processing and Acoustics, Aalto University School of Electrical Engineering, P.O. Box 13000, FI-00076 Aalto, Finland
- Tuomo Raitio
- Department of Signal Processing and Acoustics, Aalto University School of Electrical Engineering, P.O. Box 13000, FI-00076 Aalto, Finland
- Olli Santala
- Department of Signal Processing and Acoustics, Aalto University School of Electrical Engineering, P.O. Box 13000, FI-00076 Aalto, Finland
- Paavo Alku
- Department of Signal Processing and Acoustics, Aalto University School of Electrical Engineering, P.O. Box 13000, FI-00076 Aalto, Finland
- Ville Pulkki
- Department of Signal Processing and Acoustics, Aalto University School of Electrical Engineering, P.O. Box 13000, FI-00076 Aalto, Finland
9
Tuomainen J, Savela J, Obleser J, Aaltonen O. Attention modulates the use of spectral attributes in vowel discrimination: behavioral and event-related potential evidence. Brain Res 2012; 1490:170-183. PMID: 23174416. DOI: 10.1016/j.brainres.2012.10.067
Abstract
Speech contains a variety of acoustic cues to auditory and phonetic contrasts that are exploited by the listener in decoding the acoustic signal. In three experiments, we tried to elucidate whether listeners rely on formant peak frequencies or whole spectrum attributes in vowel discrimination. We created two vowel continua in which the acoustic distance in formant frequencies was constant but the continua differed in spectral moments (i.e., the whole spectrum modeled as a probability density function). In Experiment 1, we measured reaction times and response accuracy while listeners performed a go/no-go discrimination task. The results indicated that the performance of the listeners was based on the spectral moments (especially the first and second moments), and not on formant peaks. Behavioral results in Experiment 2 showed that, when the stimuli were presented in noise eliminating differences in spectral moments between the two continua, listeners employed formant peak frequencies. In Experiment 3, using the same listeners and stimuli as in Experiment 1, we measured an automatic brain potential, the mismatch negativity (MMN), when listeners did not attend to the auditory stimuli. Results showed that the MMN reflects sensitivity only to the formant structure of the vowels. We suggest that the auditory cortex automatically and pre-attentively encodes formant peak frequencies, whereas attention can be deployed for processing additional spectral information, such as spectral moments, to enhance vowel discrimination.
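The spectral moments the abstract refers to treat the spectrum as a probability density function over frequency: the first moment is the spectral centroid and the (square root of the) second central moment is the spectral spread. A minimal sketch follows; using the magnitude rather than the power spectrum is an assumption, since the abstract does not specify.

```python
import numpy as np

def spectral_moments(signal, fs):
    """First two spectral moments, with the magnitude spectrum
    treated as a probability density over frequency."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    p = mag / mag.sum()                                    # normalize to a PDF
    centroid = np.sum(freqs * p)                           # first moment
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * p))  # sqrt of 2nd central moment
    return centroid, spread

fs = 16000
t = np.arange(0, 0.05, 1.0 / fs)      # 50 ms, an integer number of cycles
tone = np.sin(2 * np.pi * 500 * t)    # 500 Hz sinusoid as a toy signal
centroid, spread = spectral_moments(tone, fs)
print(round(centroid))  # -> 500
```

For a pure tone the centroid sits at the tone frequency and the spread is near zero; for vowel spectra, the moments summarize the overall spectral shape rather than individual formant peaks.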
Affiliation(s)
- J Tuomainen
- Department of Speech, Hearing and Phonetic Sciences, University College London, UK.
10
Scharinger M, Monahan PJ, Idsardi WJ. Asymmetries in the processing of vowel height. J Speech Lang Hear Res 2012; 55:903-918. PMID: 22232394. DOI: 10.1044/1092-4388(2011/11-0065)
Abstract
PURPOSE Speech perception can be described as the transformation of continuous acoustic information into discrete memory representations. Therefore, research on neural representations of speech sounds is particularly important for a better understanding of this transformation. Speech perception models make specific assumptions regarding the representation of mid vowels (e.g., [ε]) that are articulated with a neutral position with regard to height. One hypothesis is that their representation is less specific than the representation of vowels with a more specific position (e.g., [æ]). METHOD In a magnetoencephalography study, we tested the underspecification of the mid vowel in American English. Using a mismatch negativity (MMN) paradigm, mid and low lax vowels ([ε]/[æ]), and high and low lax vowels ([ɪ]/[æ]), were opposed, and M100/N1 dipole source parameters as well as MMN latency and amplitude were examined. RESULTS Larger MMNs occurred when the mid vowel [ε] was a deviant to the standard [æ], a result consistent with less specific representations for mid vowels. MMNs of equal magnitude were elicited in the high-low comparison, consistent with more specific representations for both high and low vowels. M100 dipole locations support early vowel categorization on the basis of linguistically relevant acoustic-phonetic features. CONCLUSION We take our results to reflect an abstract long-term representation of vowels that does not include redundant specifications at very early stages of processing the speech signal. Moreover, the dipole locations indicate extraction of distinctive features and their mapping onto representationally faithful cortical locations (i.e., a feature map).
11
Sjerps MJ, Mitterer H, McQueen JM. Listening to different speakers: On the time-course of perceptual compensation for vocal-tract characteristics. Neuropsychologia 2011; 49:3831-3846. DOI: 10.1016/j.neuropsychologia.2011.09.044
12
Scharinger M, Idsardi WJ, Poe S. A comprehensive three-dimensional cortical map of vowel space. J Cogn Neurosci 2011; 23:3972-3982. PMID: 21568638. DOI: 10.1162/jocn_a_00056
Abstract
Mammalian cortex is known to contain various kinds of spatial encoding schemes for sensory information including retinotopic, somatosensory, and tonotopic maps. Tonotopic maps are especially interesting for human speech sound processing because they encode linguistically salient acoustic properties. In this study, we mapped the entire vowel space of a language (Turkish) onto cortical locations by using the magnetic N1 (M100), an auditory-evoked component that peaks approximately 100 msec after auditory stimulus onset. We found that dipole locations could be structured into two distinct maps, one for vowels produced with the tongue positioned toward the front of the mouth (front vowels) and one for vowels produced in the back of the mouth (back vowels). Furthermore, we found spatial gradients in lateral-medial, anterior-posterior, and inferior-superior dimensions that encoded the phonetic, categorical distinctions between all the vowels of Turkish. Statistical model comparisons of the dipole locations suggest that the spatial encoding scheme is not entirely based on acoustic bottom-up information but crucially involves featural-phonetic top-down modulation. Thus, multiple areas of excitation along the unidimensional basilar membrane are mapped into higher dimensional representations in auditory cortex.
13
Scharinger M, Merickel J, Riley J, Idsardi WJ. Neuromagnetic evidence for a featural distinction of English consonants: sensor- and source-space data. Brain Lang 2011; 116:71-82. PMID: 21185073. PMCID: PMC3031676. DOI: 10.1016/j.bandl.2010.11.002
Abstract
Speech sounds can be classified on the basis of their underlying articulators or on the basis of the acoustic characteristics resulting from particular articulatory positions. Research in speech perception suggests that distinctive features are based on both articulatory and acoustic information. In recent years, neuroelectric and neuromagnetic investigations provided evidence for the brain's early sensitivity to distinctive features and their acoustic consequences, particularly for place of articulation distinctions. Here, we compare English consonants in a Mismatch Field design across two broad and distinct places of articulation - labial and coronal - and provide further evidence that early evoked auditory responses are sensitive to these features. We further add to the findings of asymmetric consonant processing, although we do not find support for coronal underspecification. Labial glides (Experiment 1) and fricatives (Experiment 2) elicited larger Mismatch responses than their coronal counterparts. Interestingly, their M100 dipoles differed along the anterior/posterior dimension in the auditory cortex that has previously been found to spatially reflect place of articulation differences. Our results are discussed with respect to acoustic and articulatory bases of featural speech sound classifications and with respect to a model that maps distinctive phonetic features onto long-term representations of speech sounds.
Affiliation(s)
- Mathias Scharinger
- Department of Linguistics, University of Maryland, College Park, MD 20742-7505, USA.
14
Effects of various articulatory features of speech on cortical event-related potentials and behavioral measures of speech-sound processing. Ear Hear 2010; 31:491-504. PMID: 20453651. DOI: 10.1097/aud.0b013e3181d8683d
Abstract
OBJECTIVE To investigate the effects of three articulatory features of speech (i.e., vowel-space contrast, place of articulation of stop consonants, and voiced/voiceless distinctions) on cortical event-related potentials (ERPs) (waves N1, mismatch negativity, N2b, and P3b) and their related behavioral measures of discrimination (d-prime sensitivity and reaction time [RT]) in normal-hearing adults, to increase our knowledge of how the brain responds to acoustical differences that occur within an articulatory speech feature and across articulatory features of speech. DESIGN Cortical ERPs were recorded to three sets of consonant-vowel speech stimuli (/bi/ versus /bu/, /ba/ versus /da/, /da/ versus /ta/) presented at 65 and 80 dB peak-to-peak equivalent SPL from 20 normal-hearing adults. All speech stimuli were presented in an oddball paradigm. Cortical ERPs were recorded from 10 individuals in the active-listening condition and another 10 individuals in the passive-listening condition. All listeners were tested at both stimulus intensities. RESULTS Mean amplitudes for all ERP components were considerably larger for the responses to the vowel contrast than for the responses to the two consonant contrasts. Similarly, the mean mismatch negativity, P3b, and RT latencies were significantly shorter for the responses to the vowel versus consonant contrasts. For the majority of ERP components, only small, nonsignificant differences occurred in either the ERP amplitude or the latency measurements for stimuli within a particular articulatory feature of speech. CONCLUSIONS The larger response amplitudes and earlier latencies for the cortical ERPs to the vowel versus consonant stimuli are likely related, in part, to the large spectral differences present in these speech contrasts. The measurements of response strength (amplitudes and d-prime scores) and response timing (ERP and RT latencies) for the various cortical ERPs suggest that the brain may have an easier task processing the steady-state information present in the vowel stimuli than the rapidly changing formant transitions in the consonant stimuli.
15
Yrttiaho S, Tiitinen H, Alku P, Miettinen I, May PJC. Temporal integration of vowel periodicity in the auditory cortex. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 128:224-234. [PMID: 20649218 DOI: 10.1121/1.3397622] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]
Abstract
Cortical sensitivity to the periodicity of speech sounds has been evidenced by larger, more anterior responses to periodic than to aperiodic vowels in several non-invasive studies of the human brain. The current study investigated the temporal integration underlying the cortical sensitivity to speech periodicity by studying the increase in periodicity-specific cortical activation with growing stimulus duration. Periodicity-specific activation was estimated from magnetoencephalography as the differences between the N1m responses elicited by periodic and aperiodic vowel stimuli. The duration of the vowel stimuli with a fundamental frequency (F0=106 Hz) representative of typical male speech was varied in units corresponding to the vowel fundamental period (9.4 ms) and ranged from one to ten units. Cortical sensitivity to speech periodicity, as reflected by larger and more anterior responses to periodic than to aperiodic stimuli, was observed when stimulus duration was 3 cycles or more. Further, for stimulus durations of 5 cycles and above, response latency was shorter for the periodic than for the aperiodic stimuli. Together the current results define a temporal window of integration for the periodicity of speech sounds in the F0 range of typical male speech. The length of this window is 3-5 cycles, or 30-50 ms.
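The reported window length follows directly from the stimulus F0; a quick check of the arithmetic converting cycles to milliseconds:

```python
F0 = 106.0               # stimulus fundamental frequency in Hz
period_ms = 1000.0 / F0  # one fundamental period, ~9.4 ms

# Periodicity sensitivity emerged at 3 cycles; latency effects at 5 cycles.
window_ms = [round(n * period_ms, 1) for n in (3, 5)]  # [28.3, 47.2]
```

This reproduces, to rounding, the 30-50 ms integration window quoted in the abstract.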
Affiliation(s)
- Santeri Yrttiaho
- Department of Signal Processing and Acoustics, Aalto University School of Science and Technology, PO Box 13000, Aalto FI-00076, Finland.
16
Monahan PJ, Idsardi WJ. Auditory Sensitivity to Formant Ratios: Toward an Account of Vowel Normalization. LANGUAGE AND COGNITIVE PROCESSES 2010; 25:808-839. [PMID: 20606713 PMCID: PMC2893733 DOI: 10.1080/01690965.2010.490047] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
A long-standing question in speech perception research is how listeners extract linguistic content from a highly variable acoustic input. In the domain of vowel perception, formant ratios, or the calculation of relative bark differences between vowel formants, have been a sporadically proposed solution. We propose a novel formant ratio algorithm in which the first (F1) and second (F2) formants are compared against the third formant (F3). Results from two magnetoencephalographic (MEG) experiments are presented that suggest auditory cortex is sensitive to formant ratios. Our findings also demonstrate that the perceptual system shows heightened sensitivity to formant ratios for tokens located in more crowded regions of the vowel space. Additionally, we present statistical evidence that this algorithm eliminates speaker-dependent variation based on age and gender from vowel productions. We conclude that these results present an impetus to reconsider formant ratios as a legitimate mechanistic component in the solution to the problem of speaker normalization.
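An algorithm of this kind can be sketched as computing bark-scale distances of F1 and F2 from F3. The sketch below assumes the common Traunmüller Hz-to-bark approximation, and the formant values are typical textbook numbers, not the paper's stimuli:

```python
def hz_to_bark(f_hz: float) -> float:
    """Traunmueller's approximation of the bark (critical-band) scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def formant_ratios(f1: float, f2: float, f3: float) -> tuple[float, float]:
    """Compare F1 and F2 against F3 as bark differences, yielding cues
    that are largely independent of overall vocal-tract length."""
    return (hz_to_bark(f3) - hz_to_bark(f1),
            hz_to_bark(f3) - hz_to_bark(f2))

# Typical adult-male /i/: F1 ~270 Hz, F2 ~2290 Hz, F3 ~3010 Hz.
d31, d32 = formant_ratios(270.0, 2290.0, 3010.0)
```

Because F3 scales with the same vocal tract that produces F1 and F2, the differences partially cancel speaker-dependent variation, which is the normalization idea at issue.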
Affiliation(s)
- Philip J. Monahan
- Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- William J. Idsardi
- Department of Linguistics, University of Maryland, USA
- Neuroscience and Cognitive Science Program University of Maryland, USA
17
The effects of cortical ischemic stroke on auditory processing in humans as indexed by transient brain responses. Clin Neurophysiol 2010; 121:912-20. [DOI: 10.1016/j.clinph.2010.03.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2009] [Revised: 03/01/2010] [Accepted: 03/04/2010] [Indexed: 11/19/2022]
18
Miettinen I, Tiitinen H, Alku P, May PJC. Sensitivity of the human auditory cortex to acoustic degradation of speech and non-speech sounds. BMC Neurosci 2010; 11:24. [PMID: 20175890 PMCID: PMC2837048 DOI: 10.1186/1471-2202-11-24] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2009] [Accepted: 02/22/2010] [Indexed: 12/04/2022] Open
Abstract
Background Recent studies have shown that the human right-hemispheric auditory cortex is particularly sensitive to reduction in sound quality, with an increase in distortion resulting in an amplification of the auditory N1m response measured with magnetoencephalography (MEG). Here, we examined whether this sensitivity is specific to the processing of acoustic properties of speech or whether it can also be observed in the processing of sounds with a simple spectral structure. We degraded speech stimuli (vowel /a/), complex non-speech stimuli (a composite of five sinusoids), and sinusoidal tones by decreasing the amplitude resolution of the signal waveform. The amplitude resolution was impoverished by reducing the number of bits used to represent the signal samples. Auditory evoked magnetic fields (AEFs) were measured in the left and right hemispheres of sixteen healthy subjects. Results We found that the AEF amplitudes increased significantly with stimulus distortion for all stimulus types, which indicates that the right-hemispheric N1m sensitivity is not related exclusively to the degradation of acoustic properties of speech. In addition, the P1m and P2m responses were amplified with increasing distortion similarly in both hemispheres. The AEF latencies were not systematically affected by the distortion. Conclusions We propose that the increased AEF activity reflects cortical processing of acoustic properties common to both speech and non-speech stimuli. More specifically, the enhancement is most likely caused by spectral changes brought about by the decrease in amplitude resolution, in particular the introduction of periodic, signal-dependent distortion to the original sound. Converging evidence suggests that the observed AEF amplification could reflect cortical sensitivity to periodic sounds.
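The degradation described, reducing the number of bits per sample, amounts to uniform scalar quantization of the waveform; a minimal sketch of the idea (illustrative only, not the authors' processing chain):

```python
import math

def quantize(samples: list[float], n_bits: int) -> list[float]:
    """Crush amplitude resolution: snap each sample in [-1, 1] onto a
    uniform grid with a step of 2 / 2**n_bits."""
    step = 2.0 / (2 ** n_bits)
    return [round(s / step) * step for s in samples]

# A 1 kHz sinusoid sampled at 16 kHz, reduced to 4-bit resolution.
t = [i / 16000.0 for i in range(160)]
clean = [math.sin(2.0 * math.pi * 1000.0 * x) for x in t]
degraded = quantize(clean, 4)
```

The quantization error is bounded by half a step but depends deterministically on the input, so for a periodic signal the added distortion is itself periodic, matching the signal-dependent distortion discussed in the conclusions.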
Affiliation(s)
- Ismo Miettinen
- Department of Biomedical Engineering and Computational Science, Aalto University School of Science and Technology, Espoo, Finland.
19
May PJC, Tiitinen H. Mismatch negativity (MMN), the deviance-elicited auditory deflection, explained. Psychophysiology 2010; 47:66-122. [DOI: 10.1111/j.1469-8986.2009.00856.x] [Citation(s) in RCA: 374] [Impact Index Per Article: 26.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
20
Abstract
OBJECTIVE To evaluate the response of the human auditory cortex to the temporal amplitude-envelope of speech. Responses to the speech envelope could be useful for validating the neural encoding of intelligible speech, particularly during hearing aid fittings--because hearing aid gain and compression characteristics for ongoing speech should more closely resemble real world performance than for isolated brief syllables. DESIGN The speech envelope comprises energy changes corresponding to phonemic and syllabic transitions. Envelope frequencies between 2 and 20 Hz are important for speech intelligibility. Human event-related potentials were recorded to six different sentences and the sources of these potentials in the auditory cortex were determined. To improve the signal to noise ratio over ongoing electroencephalographic recordings, we averaged the responses over multiple presentations, and derived source waveforms from multichannel scalp recordings. Source analysis led to bilateral, symmetrical, vertical, and horizontal dipoles in the posterior auditory cortices. The source waveforms were then cross-correlated with the low frequency log-envelopes of the sentences. The significance and latency of the maximum correlation for each sentence demonstrated the presence and latency of the brain's response. The source waveforms were also cross-correlated with a simple model based on a series of overlapping transient responses to stimulus change (the derivative of the log-envelope). RESULTS Correlations between the log-envelope and vertical dipole source waveforms were significant for all sentences and for all but one of the participants (mean r = 0.35), at an average delay of 175 (left) to 180 (right) msec. Correlations between the transient response model (P1 at 68 msec, N1 at 124 msec, and P2 at 208 msec) and the vertical dipole source waveforms were detected for all sentences and all participants (mean r = 0.30), at an average delay of 6 (right) to 10 (left) msec. 
CONCLUSIONS These results show that the human auditory cortex either directly follows the speech envelope or consistently reacts to changes in this envelope. The delay between the envelope and the response is approximately 180 msec.
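The lag-of-maximum-correlation analysis described above can be sketched in a few lines of pure Python; the signals here are synthetic stand-ins for the stimulus log-envelope and the source waveform:

```python
import math

def best_lag(env: list[float], resp: list[float], max_lag: int) -> int:
    """Lag (in samples) at which resp correlates best with env, i.e. the
    delay of the response relative to the stimulus envelope."""
    def corr_at(lag: int) -> float:
        xs = env[:len(env) - lag]
        ys = resp[lag:lag + len(xs)]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        dx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        dy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return num / (dx * dy) if dx and dy else 0.0
    return max(range(max_lag + 1), key=corr_at)

# Synthetic "envelope" and a "response" delayed by 18 samples
# (e.g. 180 ms at a 100 Hz envelope sampling rate).
env = [math.sin(0.07 * i) + 0.3 * math.sin(0.21 * i) for i in range(400)]
resp = [0.0] * 18 + env[:-18]
delay = best_lag(env, resp, max_lag=40)  # -> 18
```

The significance of the peak correlation, assessed per sentence as in the study, then indicates whether the cortex is tracking the envelope at that delay.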
21
Tavabi K, Obleser J, Dobel C, Pantev C. Auditory evoked fields differentially encode speech features: an MEG investigation of the P50m and N100m time courses during syllable processing. Eur J Neurosci 2007; 25:3155-62. [PMID: 17561829 DOI: 10.1111/j.1460-9568.2007.05572.x] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
The functional organization of speech sound processing in the human brain and its unfolding over time are still not well understood. While the N100/N100m is a comparatively well-studied, and quite late, component of the auditory evoked field elicited by speech, earlier processes such as those reflected in the P50m remain to be resolved. Using magnetoencephalography, the present study follows up on previous reports of N100m-centred spatiotemporal encoding of phonological features and coarticulatory processes in the auditory cortex during consonant-vowel syllable perception. Our results indicate that the time course and response strength of the P50m and N100m components of evoked magnetic fields are differentially influenced by mutually exclusive place-of-articulation features of a syllable's stop consonant and vowel segments. Topographical differences in P50m generators were driven by place contrasts between consonants in syllables, with spatial gradients orthogonal to the ones previously reported for N100m. Peak latency results replicated previous findings for the N100m and revealed a reverse pattern for the earlier P50m (shorter latencies depending on the presence of a back vowel [o]). Our findings allow attribution of a role in basic feature extraction to the comparatively early P50m time window. Moreover, the observations substantiate the assumption that the N100m response reflects a more abstract phonological representational stage during speech perception.
Affiliation(s)
- Kambiz Tavabi
- Institute for Biomagnetism and Biosignalanalysis, University of Münster, Germany.
22
Abstract
Recently, Haenschel et al. (2005) suggested that a positive event-related potential wave called 'repetition positivity' (RP) is a direct neural correlate of the formation of sensory memory traces. We investigated whether RP is elicited by familiar vowels that have previously been suggested to form accurate memory representations faster than unfamiliar sounds. No evidence for RP elicitation was found, however, even though memory representations for the vowels were formed. Instead of finding an increasing positive response along with stimulus repetition predicted by the RP hypothesis, we found that N1 and sustained potential were enhanced in amplitude for the familiar vowels as compared with unfamiliar sounds, indicating stronger activation at the auditory cortex.
Affiliation(s)
- Sari Ylinen
- Cognitive Brain Research Unit, Department of Psychology, University of Helsinki, Finland.
23
Liikkanen LA, Tiitinen H, Alku P, Leino S, Yrttiaho S, May PJC. The right-hemispheric auditory cortex in humans is sensitive to degraded speech sounds. Neuroreport 2007; 18:601-5. [PMID: 17413665 DOI: 10.1097/wnr.0b013e3280b07bde] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Abstract
We investigated how degraded speech sounds activate the auditory cortices of the left and right hemisphere. To degrade the stimuli, we introduced uniform scalar quantization, a controlled and replicable manipulation not previously used in cognitive neuroscience. Three Finnish vowels (/a/, /e/ and /u/) were used as stimuli for 10 participants in magnetoencephalography recordings. Compared with the original vowel sounds, the degraded sounds increased the amplitude of the right-hemispheric N1m without affecting its latency, whereas the amplitude and latency of the N1m in the left hemisphere remained unaffected. Although the participants were able to identify the stimuli correctly, increased degradation led to increased reaction times, which correlated positively with the N1m amplitude. Thus, the auditory cortex of the right hemisphere might be particularly involved in processing degraded speech, possibly compensating for the poor signal quality by increasing its activity.
Affiliation(s)
- Lassi A Liikkanen
- Apperception and Cortical Dynamics (ACD), Cognitive Science Unit, Department of Psychology, University of Helsinki, Finland
24
Kuriki S, Ohta K, Koyama S. Persistent responsiveness of long-latency auditory cortical activities in response to repeated stimuli of musical timbre and vowel sounds. Cereb Cortex 2007; 17:2725-32. [PMID: 17289776 DOI: 10.1093/cercor/bhl182] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
Long-latency auditory-evoked magnetic fields and potentials show strong attenuation of N1m/N1 responses, due to adaptation of auditory cortical neurons, when an identical stimulus is presented repeatedly. This adaptation is weak in the subsequently occurring P2m/P2 responses, and weaker for piano chords than for single piano notes. The adaptation of P2m is more suppressed in musicians with long-term musical training than in nonmusicians, whereas the amplitude of P2 is enhanced preferentially in musicians as the spectral complexity of musical tones increases. To address the key issues of whether such high responsiveness of P2m/P2 responses to complex sounds is intrinsic and common to nonmusical sounds, we conducted a magnetoencephalographic study on participants who had no experience of musical training, using consecutive trains of piano and vowel sounds. The dipole moment of the P2m sources located in the auditory cortex indicated significantly suppressed adaptation in the right hemisphere to both piano and vowel sounds. Thus, the persistent responsiveness of the P2m activity may be inherent, not induced by intensive training, and common to spectrally complex sounds. The right-hemisphere dominance of the responsiveness to musical and speech sounds suggests that the analysis of acoustic features of object sounds is a significant function of P2m activity.
Affiliation(s)
- Shinya Kuriki
- Research Institute for Electronic Science, Hokkaido University, Sapporo, Japan
25
Kaganovich N, Francis AL, Melara RD. Electrophysiological evidence for early interaction between talker and linguistic information during speech perception. Brain Res 2006; 1114:161-72. [PMID: 16920083 DOI: 10.1016/j.brainres.2006.07.049] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2006] [Revised: 07/12/2006] [Accepted: 07/14/2006] [Indexed: 10/24/2022]
Abstract
This study combined behavioral and electrophysiological measurements to investigate interactions during speech perception between native phonemes and talker's voice. In a Garner selective attention task, participants either classified each sound as one of two native vowels ([ɛ] and [æ]), ignoring the talker, or as one of two male talkers, ignoring the vowel. The dimension to be ignored was held constant in baseline tasks and changed randomly across trials in filtering tasks. Irrelevant variation in talker produced as much filtering interference (i.e., poorer performance in filtering relative to baseline) in classifying vowels as vice versa, suggesting that the two dimensions strongly interact. Event-related potentials (ERPs) were recorded to identify the processing origin of the interference: an early disruption in extracting dimension-specific information or a later disruption in selecting appropriate responses. Processing in the filtering task was characterized by a sustained negativity starting 100 ms after stimulus onset and peaking 200 ms later. The early onset of this negativity suggests that interference originates in the cognitive effort required by listeners to extract dimension-specific information, a process that precedes response selection. In agreement with these findings, our results revealed numerous dimension-specific effects, most prominently in the filtering tasks.
Affiliation(s)
- Natalya Kaganovich
- Linguistics Program, Purdue University, West Lafayette, IN 47907-1353, USA.
26
Obleser J, Boecker H, Drzezga A, Haslinger B, Hennenlotter A, Roettinger M, Eulitz C, Rauschecker JP. Vowel sound extraction in anterior superior temporal cortex. Hum Brain Mapp 2006; 27:562-71. [PMID: 16281283 PMCID: PMC6871493 DOI: 10.1002/hbm.20201] [Citation(s) in RCA: 130] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022] Open
Abstract
We investigated the functional neuroanatomy of vowel processing. We compared attentive auditory perception of natural German vowels to perception of nonspeech band-passed noise stimuli using functional magnetic resonance imaging (fMRI). More specifically, the mapping in auditory cortex of first and second formants was considered, which spectrally characterize vowels and are linked closely to phonological features. Multiple exemplars of natural German vowels were presented in sequences alternating either mainly along the first formant (e.g., [u]-[o], [i]-[e]) or along the second formant (e.g., [u]-[i], [o]-[e]). In fixed-effects and random-effects analyses, vowel sequences elicited more activation than did nonspeech noise in the anterior superior temporal cortex (aST) bilaterally. Partial segregation of different vowel categories was observed within the activated regions, suggestive of a speech sound mapping across the cortical surface. Our results add to the growing evidence that speech sounds, as one of the behaviorally most relevant classes of auditory objects, are analyzed and categorized in aST. These findings also support the notion of an auditory "what" stream, with highly object-specialized areas anterior to primary auditory cortex.
Affiliation(s)
- Jonas Obleser
- Fachgruppen Psychologie und Linguistik, Universität Konstanz, Konstanz, Germany
27
Ogata E, Yumoto M, Itoh K, Sekimoto S, Karino S, Kaga K. A magnetoencephalographic study of Japanese vowel processing. Neuroreport 2006; 17:1127-31. [PMID: 16837840 DOI: 10.1097/01.wnr.0000230503.47973.b7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Magnetic brain responses were recorded to clarify the cortical representation of vowel processing in Japanese. We investigated the peak latencies and equivalent current dipoles of the auditory N1m responses to the Japanese vowels [a], [i], [o], and [u]. In intraindividual analyses for a single participant, well-replicated results for the dipole parameters supported the existence of phoneme-specific cortical maps for vowels. In the interindividual analyses for the eight participants, [a] and [i] elicited significantly earlier N1m responses than [u], and the dipole for [i] was more posteriorly oriented than [a] in the left hemisphere. The results of the current study suggest left hemispheric predominance in vowel processing and that factors associated with a different language system may modify the cortical map.
Affiliation(s)
- Erika Ogata
- Department of Sensory and Motor Neuroscience, Graduate School of Medicine, University of Tokyo, Tokyo, Japan.
28
Tiitinen H, Mäkelä AM, Mäkinen V, May PJC, Alku P. Disentangling the effects of phonation and articulation: hemispheric asymmetries in the auditory N1m response of the human brain. BMC Neurosci 2005; 6:62. [PMID: 16225699 PMCID: PMC1280927 DOI: 10.1186/1471-2202-6-62] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2005] [Accepted: 10/15/2005] [Indexed: 11/16/2022] Open
Abstract
Background The cortical activity underlying the perception of vowel identity has typically been addressed by manipulating the first and second formant frequency (F1 & F2) of the speech stimuli. These two values, originating from articulation, are already sufficient for the phonetic characterization of vowel category. In the present study, we investigated how the spectral cues caused by articulation are reflected in cortical speech processing when combined with phonation, the other major part of speech production manifested as the fundamental frequency (F0) and its harmonic integer multiples. To study the combined effects of articulation and phonation we presented vowels with either high (/a/) or low (/u/) formant frequencies which were driven by three different types of excitation: a natural periodic pulseform reflecting the vibration of the vocal folds, an aperiodic noise excitation, or a tonal waveform. The auditory N1m response was recorded with whole-head magnetoencephalography (MEG) from ten human subjects in order to resolve whether brain events reflecting articulation and phonation are specific to the left or right hemisphere of the human brain. Results The N1m responses for the six stimulus types displayed a considerable dynamic range of 115–135 ms, and were elicited faster (~10 ms) by the high-formant /a/ than by the low-formant /u/, indicating an effect of articulation. While excitation type had no effect on the latency of the right-hemispheric N1m, the left-hemispheric N1m elicited by the tonally excited /a/ was some 10 ms earlier than that elicited by the periodic and the aperiodic excitation. The amplitude of the N1m in both hemispheres was systematically stronger to stimulation with natural periodic excitation. Also, stimulus type had a marked (up to 7 mm) effect on the source location of the N1m, with periodic excitation resulting in more anterior sources than aperiodic and tonal excitation. 
Conclusion The auditory brain areas of the two hemispheres exhibit differential tuning to natural speech signals, observable already in the passive recording condition. The variations in the latency and strength of the auditory N1m response can be traced back to the spectral structure of the stimuli. More specifically, the combined effects of the harmonic comb structure originating from the natural voice excitation caused by the fluctuating vocal folds and the location of the formant frequencies originating from the vocal tract lead to asymmetric behaviour of the left and right hemisphere.
Affiliation(s)
- Hannu Tiitinen
- Apperception & Cortical Dynamics (ACD), Department of Psychology, P.O.B. 9, FIN-00014 University of Helsinki, Finland
- BioMag Laboratory, Engineering Centre, Helsinki University Central Hospital, Finland
- Anna Mari Mäkelä
- Apperception & Cortical Dynamics (ACD), Department of Psychology, P.O.B. 9, FIN-00014 University of Helsinki, Finland
- BioMag Laboratory, Engineering Centre, Helsinki University Central Hospital, Finland
- Ville Mäkinen
- Apperception & Cortical Dynamics (ACD), Department of Psychology, P.O.B. 9, FIN-00014 University of Helsinki, Finland
- BioMag Laboratory, Engineering Centre, Helsinki University Central Hospital, Finland
- Patrick JC May
- Apperception & Cortical Dynamics (ACD), Department of Psychology, P.O.B. 9, FIN-00014 University of Helsinki, Finland
- BioMag Laboratory, Engineering Centre, Helsinki University Central Hospital, Finland
- Paavo Alku
- Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Espoo, Finland
29
Obleser J, Scott SK, Eulitz C. Now you hear it, now you don't: transient traces of consonants and their nonspeech analogues in the human brain. Cereb Cortex 2005; 16:1069-76. [PMID: 16207930 DOI: 10.1093/cercor/bhj047] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
The apparently effortless identification of speech is one of the human auditory cortex's finest and least understood functions. This is partly due to the difficulty of teasing apart the effects of acoustic and phonetic attributes of speech sounds. Here we present evidence from magnetic source imaging that the auditory cortex represents speech sounds (such as [g] and [t]) in a topographically orderly fashion that is based on phonetic features. Moreover, this mapping is dependent on intelligibility: only when consonants are identifiable as members of a native speech sound category is topographical spreading observed in the auditory cortex. Feature separation in the cortex also varies with a listener's ability to tell these easy-to-confuse consonants from one another. This is the first demonstration that speech-specific maps of features can be identified in human auditory cortex, and it will further help us to delineate speech processing pathways based on models from functional neuroimaging and non-human primates.
Affiliation(s)
- Jonas Obleser
- Institute of Cognitive Neuroscience, University College London, 17 Queen Square, London, WC1N 3AR, UK.
30
Sittiprapaporn W, Tervaniemi M, Chindaduangratn C, Kotchabhakdi N. Preattentive discrimination of across-category and within-category change in consonant–vowel syllable. Neuroreport 2005; 16:1513-8. [PMID: 16110281 DOI: 10.1097/01.wnr.0000175618.46677.07] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
Abstract
Event-related potentials to infrequently presented spoken deviant syllables /pi/ and /po/ among repetitive standard [see text] syllables were recorded in Thai study participants who ignored these stimuli while reading books of their choice. The vowel across-category and within-category changes elicited a change-specific mismatch negativity response. The across-category and within-category discrimination of vowel changes in consonant-vowel syllables was also assessed using low-resolution electromagnetic tomography. The results of the low-resolution electromagnetic tomography mismatch negativity generator analysis suggest that the within-category change perception of vowels is analyzed as a change in the physical features of the stimuli, thus predominantly activating the right temporal cortex. In contrast, the left temporal cortex is predominantly activated in the across-category change perception of vowels, emphasizing the role of the left hemisphere in speech processing already at a preattentive processing level, also for consonant-vowel syllables. The results support the hypothesis that a part of the superior temporal gyrus contains neurons specialized for speech perception.
Affiliation(s)
- Wichian Sittiprapaporn
- Neuro-Behavioural Biology Center, Institute of Science and Technology for Research and Development, Mahidol University, Salaya, Nakhonpathom, Thailand.
31
Alain C, Reinke K, He Y, Wang C, Lobaugh N. Hearing Two Things at Once: Neurophysiological Indices of Speech Segregation and Identification. J Cogn Neurosci 2005; 17:811-8. [PMID: 15904547 DOI: 10.1162/0898929053747621] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
The discrimination of concurrent sounds is paramount to speech perception. During social gatherings, listeners must extract information from a composite acoustic wave, which sums multiple individual voices that are simultaneously active. The observers' ability to identify two simultaneously presented vowels improves with increasing separation between the fundamental frequencies (f0) of the two vowels. Event-related potentials to stimuli presented during attend and ignore conditions revealed activity between 130 and 170 msec after sound onset that reflected the f0 differences between the two vowels. Another, more posterior and right-lateralized, negative wave maximal at 250 msec, and a central-parietal slow negativity were observed only during vowel identification and may index stimulus categorization. This sequence of neural events supports a multistage model of auditory scene analysis in which the spectral pattern of each vowel constituent is automatically extracted and then matched against representations of those vowels in working memory.
Affiliation(s)
- Claude Alain
- Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, Ontario, Canada.
32
Shestakova A, Brattico E, Soloviev A, Klucharev V, Huotilainen M. Orderly cortical representation of vowel categories presented by multiple exemplars. Cogn Brain Res 2004; 21:342-50. [PMID: 15511650 DOI: 10.1016/j.cogbrainres.2004.06.011] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/19/2004] [Indexed: 11/17/2022]
Abstract
This study aimed at determining how the human brain automatically processes phoneme categories irrespective of the large acoustic inter-speaker variability. Subjects were presented with 450 different speech stimuli, equally distributed across the [a], [i], and [u] vowel categories, and each uttered by a different male speaker. A 306-channel magnetoencephalogram (MEG) was used to record the N1m, the magnetic counterpart of the N1 component of the auditory event-related potential (ERP). The N1m amplitude and source locations differed between vowel categories. We also found that spectral dissimilarities were reproduced in the cortical representations of the large set of phonemes used in this study: vowels with similar spectral envelopes had closer cortical representations than those whose spectral differences were largest. Our data further extend the notion of differential cortical representations in response to vowel categories, previously demonstrated using only one or a few tokens representing each category.
Affiliation(s)
- Anna Shestakova
- Cognitive Brain Research Unit, Department of Psychology, PO Box 9, FIN-00014 University of Helsinki, Helsinki, Finland.
33
Roberts TPL, Flagg EJ, Gage NM. Vowel categorization induces departure of M100 latency from acoustic prediction. Neuroreport 2004; 15:1679-82. [PMID: 15232306 DOI: 10.1097/01.wnr.0000134928.96937.10] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
MEG studies have shown that the timing (latency) of the evoked response that peaks approximately 100 ms post-stimulus onset (M100) decreases as frequency increases for sinusoidal tones. We investigated M100 latency using a continuum of synthesized vowel stimuli in which the dominant formant frequency increases from 250 Hz (perceived /u/) to 750 Hz (perceived /a/) in 50 Hz steps. While M100 latency did vary inversely with formant frequency overall, this frequency-driven latency modulation was flattened within each vowel category. However, for mid-continuum ambiguous tokens (i.e. those with increased reaction time/decreased accuracy in the concurrent behavioral identification task), M100 latency reverted to tracking formant frequency differences, agreeing with previous findings of frequency-dependence. A theory is proposed in which phonological categorization emerges from a specific spatial distribution of frequency-tuned neurons.
Affiliation(s)
- Timothy P L Roberts
- Department of Medical Imaging, University of Toronto, 150 College Street, Toronto, Ontario M5S 3E2, Canada