1.
Ngan VSH, Cheung LYT, Ng HTY, Yip KHM, Wong YK, Wong ACN. An early perceptual locus of absolute pitch. Psychophysiology 2023; 60:e14170. PMID: 36094011. DOI: 10.1111/psyp.14170.
Abstract
Absolute pitch (AP) refers to the naming of a musical tone without an external reference. The influential two-component model holds that AP is limited only by the late-emerging pitch-labeling process, not by the earlier perceptual and memory processes. Over the years, however, support for this model at the neural level has been mixed, and many studies have had methodological limitations. Here, the electroencephalography responses of 27 AP possessors and 27 non-AP possessors were recorded. During both name verification and passive listening, event-related potential analyses showed a difference between AP and non-AP possessors at about 200 ms in their responses to tones compared with noise stimuli. Multivariate pattern analyses suggested that pitch naming was subserved by a series of transient processes for the first 250 ms, followed by a stage-like process, for both AP and non-AP possessors, with no group differences between them. These findings are inconsistent with the predictions of the two-component model and instead suggest the existence of an early perceptual locus of AP.
Affiliation(s)
- Vince S H Ngan
- Department of Psychology, The Chinese University of Hong Kong, Shatin, Hong Kong
- Leo Y T Cheung
- Department of Educational Psychology, Faculty of Education, The Chinese University of Hong Kong, Shatin, Hong Kong
- Hezul T Y Ng
- Department of Psychology, The Chinese University of Hong Kong, Shatin, Hong Kong
- Ken H M Yip
- Department of Psychology, The Chinese University of Hong Kong, Shatin, Hong Kong
- Yetta Kwailing Wong
- Department of Educational Psychology, Faculty of Education, The Chinese University of Hong Kong, Shatin, Hong Kong
- Alan C-N Wong
- Department of Psychology, The Chinese University of Hong Kong, Shatin, Hong Kong
2.
Mai G, Howell P. The possible role of early-stage phase-locked neural activities in speech-in-noise perception in human adults across age and hearing loss. Hear Res 2023; 427:108647. PMID: 36436293. DOI: 10.1016/j.heares.2022.108647.
Abstract
Ageing affects auditory phase-locked neural activities, which could increase the challenges older adults experience during speech-in-noise (SiN) perception. However, evidence for how ageing affects SiN perception through these phase-locked activities is still lacking. It is also unclear whether the influences of ageing on phase-locked activities in response to different acoustic properties affect SiN perception through similar or different mechanisms. The present study addressed these issues by measuring early-stage phase-locked encoding of speech under quiet and noisy backgrounds (speech-shaped noise (SSN) and multi-talker babble) in adults across a wide age range (19-75 years old). Participants passively listened to a repeated vowel whilst the frequency-following response (FFR) to the fundamental frequency, which has primarily subcortical sources, and the cortical phase-locked response to slowly fluctuating acoustic envelopes were recorded. We studied how these activities are affected by age and age-related hearing loss and how they relate to SiN performance (word recognition in sentences in noise). First, the effects of age and hearing loss differed for the FFR and slow-envelope phase-locking: the FFR decreased significantly with age and with high-frequency (≥ 2 kHz) hearing loss but increased with low-frequency (< 2 kHz) hearing loss, whilst slow-envelope phase-locking increased significantly with age and with hearing loss across frequencies. Second, the relationships between the two types of phase-locked activity and SiN performance also differed: the FFR and slow-envelope phase-locking positively predicted SiN performance under multi-talker babble and SSN, respectively. Finally, we investigated how age and hearing loss affect SiN perception through phase-locked activities via mediation analyses.
We showed that both types of activities significantly mediated the relation between age/hearing loss and SiN perception but in distinct manners. Specifically, FFR decreased with age and high-frequency hearing loss which in turn contributed to poorer SiN performance but increased with low-frequency hearing loss which in turn contributed to better SiN performance under multi-talker babbles. Slow-envelope phase-locking increased with age and hearing loss which in turn contributed to better SiN performance under both SSN and multi-talker babbles. Taken together, the present study provided evidence for distinct neural mechanisms of early-stage auditory phase-locked encoding of different acoustic properties through which ageing affects SiN perception.
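The mediation analyses referred to above follow the standard product-of-coefficients logic: the indirect effect of a predictor (e.g., age) on an outcome (SiN performance) through a mediator (a phase-locked response) is the a-path coefficient times the b-path coefficient. A minimal single-mediator sketch on synthetic data (variable names are illustrative; this is not the authors' actual model, which also involved covariates and significance testing):

```python
import numpy as np

def indirect_effect(x, m, y):
    """Product-of-coefficients mediation: indirect effect = a * b,
    where a is the slope of m ~ x and b is the coefficient of m in y ~ x + m."""
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    a = np.polyfit(x, m, 1)[0]                    # a-path: mediator regressed on predictor
    X = np.column_stack([np.ones_like(x), x, m])  # design matrix for y ~ 1 + x + m
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    b = coefs[2]                                  # b-path: mediator coefficient
    return a * b

# Synthetic data where x drives m (a ~ 0.8) and m drives y (b ~ 0.5),
# so the indirect effect should come out near 0.8 * 0.5 = 0.4
rng = np.random.default_rng(1)
x = rng.normal(size=500)
m = 0.8 * x + rng.normal(scale=0.5, size=500)
y = 0.5 * m + rng.normal(scale=0.5, size=500)
ie = indirect_effect(x, m, y)
```

The sign pattern in the abstract (e.g., FFR decreasing with age while predicting better performance) corresponds to a and b having opposite signs, yielding a negative indirect effect.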
Affiliation(s)
- Guangting Mai
- National Institute for Health Research Nottingham Biomedical Research Centre, Nottingham NG1 5DU, UK; Academic Unit of Mental Health and Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham NG7 2UH, UK; Department of Experimental Psychology, University College London, London WC1H 0AP, UK
- Peter Howell
- Department of Experimental Psychology, University College London, London WC1H 0AP, UK
3.
Waselius T, Xu W, Sparre JI, Penttonen M, Nokia MS. Cardiac cycle and respiration phase affect responses to the conditioned stimulus in young adults trained in trace eyeblink conditioning. J Neurophysiol 2022; 127:767-775. PMID: 35138956. DOI: 10.1152/jn.00298.2021.
Abstract
Rhythms of breathing and heartbeat are linked to each other as well as to rhythms of the brain. Our recent studies suggest that presenting the conditioned stimulus during expiration or during the diastolic phase of the cardiac cycle facilitates neural processing of that stimulus and improves learning in an eyeblink classical conditioning task. To date, it has not been examined whether utilizing information from both respiration and cardiac-cycle phases simultaneously allows even more efficient modulation of learning. Here we studied whether timing the conditioned stimulus to different cardiorespiratory phase combinations affects learning of trace eyeblink conditioning in healthy young adults. The results were consistent with previous reports: timing the conditioned stimulus to diastole during expiration was more beneficial for learning than timing it to systole during inspiration. Cardiac cycle phase seemed to explain most of this variation in learning at the behavioral level. Brain evoked potentials (N1) elicited by the conditioned stimulus and recorded using electroencephalography were larger when the conditioned stimulus was presented during diastole in expiration than during systole in inspiration; breathing phase explained the variation in N1 amplitude. To conclude, our findings suggest that noninvasive monitoring of bodily rhythms combined with closed-loop control of stimulation can be used to promote learning in humans. The next step will be to test whether performance can also be improved in people with compromised cognitive ability, such as older adults with memory impairments.
Affiliation(s)
- Tomi Waselius
- Department of Psychology and Centre for Interdisciplinary Brain Research, University of Jyväskylä, Jyväskylä, Finland
- Weiyong Xu
- Department of Psychology and Centre for Interdisciplinary Brain Research, University of Jyväskylä, Jyväskylä, Finland
- Julia Isabella Sparre
- Department of Psychology and Centre for Interdisciplinary Brain Research, University of Jyväskylä, Jyväskylä, Finland
- Markku Penttonen
- Department of Psychology and Centre for Interdisciplinary Brain Research, University of Jyväskylä, Jyväskylä, Finland
- Miriam S Nokia
- Department of Psychology and Centre for Interdisciplinary Brain Research, University of Jyväskylä, Jyväskylä, Finland
4.
Whitten A, Key AP, Mefferd AS, Bodfish JW. Auditory event-related potentials index faster processing of natural speech but not synthetic speech over nonspeech analogs in children. Brain Lang 2020; 207:104825. PMID: 32563764. DOI: 10.1016/j.bandl.2020.104825.
Abstract
Given the crucial role of speech sounds in human language, it may be beneficial for speech to be supported by more efficient auditory and attentional neural processing mechanisms compared to nonspeech sounds. However, previous event-related potential (ERP) studies have found either no differences or slower auditory processing of speech than nonspeech, as well as inconsistent attentional processing. We hypothesized that this may be due to the use of synthetic stimuli in past experiments. The present study measured ERP responses during passive listening to both synthetic and natural speech and complexity-matched nonspeech analog sounds in 22 children aged 8-11 years. We found that although children were more likely to show immature auditory ERP responses to the more complex natural stimuli, ERP latencies were significantly faster for natural speech than for cow vocalizations, but significantly slower for synthetic speech than for tones. The attentional results indicated a P3a orienting response only to the cow sound, and we discuss potential methodological reasons for this. We conclude that our results support more efficient auditory processing of natural speech sounds in children, though more research with a wider array of stimuli will be necessary to confirm these results. Our results also highlight the importance of using natural stimuli in research investigating the neurobiology of language.
Affiliation(s)
- Allison Whitten
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, 1215 21st Ave S., Nashville, TN, USA
- Alexandra P Key
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, 1215 21st Ave S., Nashville, TN, USA; Department of Psychiatry and Behavioral Sciences, Vanderbilt Psychiatric Hospital, 1601 23rd Ave. S, Nashville, TN, USA; Vanderbilt Kennedy Center, 110 Magnolia Cir, Nashville, TN, USA
- Antje S Mefferd
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, 1215 21st Ave S., Nashville, TN, USA; Vanderbilt Kennedy Center, 110 Magnolia Cir, Nashville, TN, USA
- James W Bodfish
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, 1215 21st Ave S., Nashville, TN, USA; Department of Psychiatry and Behavioral Sciences, Vanderbilt Psychiatric Hospital, 1601 23rd Ave. S, Nashville, TN, USA; Vanderbilt Kennedy Center, 110 Magnolia Cir, Nashville, TN, USA; Vanderbilt Brain Institute, 6133 Medical Research Building III, 465 21st Avenue S., Nashville, TN, USA
5.
Modulation of phase-locked neural responses to speech during different arousal states is age-dependent. Neuroimage 2019; 189:734-744. DOI: 10.1016/j.neuroimage.2019.01.049.
6.
Mathias B, Gehring WJ, Palmer C. Electrical Brain Responses Reveal Sequential Constraints on Planning during Music Performance. Brain Sci 2019; 9:E25. PMID: 30696038. PMCID: PMC6406892. DOI: 10.3390/brainsci9020025.
Abstract
Elements in speech and music unfold sequentially over time. To produce sentences and melodies quickly and accurately, individuals must plan upcoming sequence events as well as monitor outcomes via auditory feedback. We investigated the neural correlates of sequential planning and monitoring processes by manipulating auditory feedback during music performance. Pianists performed isochronous melodies from memory at an initially cued rate while their electroencephalogram was recorded. Pitch feedback was occasionally altered to match either an immediately upcoming Near-Future pitch (the next sequence event) or a more distant Far-Future pitch (two events ahead of the current event). Near-Future, but not Far-Future, altered feedback perturbed the timing of pianists' performances, suggesting greater interference of Near-Future sequential events with current planning processes. Near-Future feedback also triggered a greater reduction in auditory sensory suppression (an enhanced response) than Far-Future feedback, reflected in the P2 component elicited by the pitch event following the unexpected pitch change. Greater timing perturbations were associated with enhanced cortical sensory processing of the pitch event following the Near-Future altered feedback. Both types of feedback alterations elicited feedback-related negativity (FRN) and P3a potentials and amplified spectral power in the theta frequency range. These findings suggest constraints on producers' sequential planning similar to those reported in speech production.
Affiliation(s)
- Brian Mathias
- Department of Psychology, McGill University, Montreal, QC H3A 1B1, Canada; Research Group Neural Mechanisms of Human Communication, Max Planck Institute for Human Cognitive and Brain Sciences, 04103 Leipzig, Germany
- William J Gehring
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA
- Caroline Palmer
- Department of Psychology, McGill University, Montreal, QC H3A 1B1, Canada
7.
Abbott NT, Shahin AJ. Cross-modal phonetic encoding facilitates the McGurk illusion and phonemic restoration. J Neurophysiol 2018; 120:2988-3000. PMID: 30303762. DOI: 10.1152/jn.00262.2018.
Abstract
In spoken language, audiovisual (AV) perception occurs when the visual modality influences encoding of acoustic features (e.g., phonetic representations) at the auditory cortex. We examined how visual speech (mouth movements) transforms phonetic representations, indexed by changes to the N1 auditory evoked potential (AEP). EEG was acquired while human subjects watched and listened to videos of a speaker uttering consonant vowel (CV) syllables, /ba/ and /wa/, presented in auditory-only or AV congruent or incongruent contexts or in a context in which the consonants were replaced by white noise (noise replaced). Subjects reported whether they heard "ba" or "wa." We hypothesized that the auditory N1 amplitude during illusory perception (caused by incongruent AV input, as in the McGurk illusion, or white noise-replaced consonants in CV utterances) should shift to reflect the auditory N1 characteristics of the phonemes conveyed visually (by mouth movements) as opposed to acoustically. Indeed, the N1 AEP became larger and occurred earlier when listeners experienced illusory "ba" (video /ba/, audio /wa/, heard as "ba") and vice versa when they experienced illusory "wa" (video /wa/, audio /ba/, heard as "wa"), mirroring the N1 AEP characteristics for /ba/ and /wa/ observed in natural acoustic situations (e.g., auditory-only setting). This visually mediated N1 behavior was also observed for noise-replaced CVs. Taken together, the findings suggest that information relayed by the visual modality modifies phonetic representations at the auditory cortex and that similar neural mechanisms support the McGurk illusion and visually mediated phonemic restoration. NEW & NOTEWORTHY Using a variant of the McGurk illusion experimental design (using the syllables /ba/ and /wa/), we demonstrate that lipreading influences phonetic encoding at the auditory cortex. We show that the N1 auditory evoked potential morphology shifts to resemble the N1 morphology of the syllable conveyed visually. 
We also show similar N1 shifts when the consonants are replaced by white noise, suggesting that the McGurk illusion and the visually mediated phonemic restoration rely on common mechanisms.
Affiliation(s)
- Noelle T Abbott
- Center for Mind and Brain, University of California, Davis, California; San Diego State University-University of California, San Diego Joint Doctoral Program in Language and Communicative Disorders, San Diego, California
- Antoine J Shahin
- Center for Mind and Brain, University of California, Davis, California; Department of Cognitive and Information Sciences, University of California, Merced, California
8.
Peter V, Kalashnikova M, Burnham D. Weighting of Amplitude and Formant Rise Time Cues by School-Aged Children: A Mismatch Negativity Study. J Speech Lang Hear Res 2018; 61:1322-1333. PMID: 29800360. DOI: 10.1044/2018_jslhr-h-17-0334.
Abstract
PURPOSE An important skill in the development of speech perception is to apply optimal weights to acoustic cues so that phonemic information is recovered from speech with minimum effort. Here, we investigated the development of acoustic cue weighting of amplitude rise time (ART) and formant rise time (FRT) cues in children as measured by mismatch negativity (MMN). METHOD Twelve adults and 36 children aged 6-12 years listened to a /ba/-/wa/ contrast in an oddball paradigm in which the standard stimulus had the ART and FRT cues of /ba/. In different blocks, the deviant stimulus had either the ART or FRT cues of /wa/. RESULTS The results revealed that children younger than 10 years were sensitive to both ART and FRT cues whereas 10- to 12-year-old children and adults were sensitive only to FRT cues. Moreover, children younger than 10 years generated a positive mismatch response, whereas older children and adults generated MMN. CONCLUSION These results suggest that preattentive adultlike weighting of ART and FRT cues is attained only by 10 years of age and accompanies the change from mismatch response to the more mature MMN response. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.6207608.
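In an oddball paradigm like the one described above, the MMN (or the child positive mismatch response) is conventionally computed as a deviant-minus-standard difference wave, with amplitude read off within a latency window. A minimal sketch of that computation on synthetic single-channel epochs (function and variable names are illustrative, not taken from this study's pipeline):

```python
import numpy as np

def mismatch_wave(standard_trials, deviant_trials):
    """Difference wave: mean deviant ERP minus mean standard ERP, per time point."""
    return np.mean(deviant_trials, axis=0) - np.mean(standard_trials, axis=0)

def peak_negativity(diff_wave, times, t_min, t_max):
    """Most negative amplitude of the difference wave within a latency window (seconds)."""
    window = (times >= t_min) & (times <= t_max)
    return diff_wave[window].min()

# Synthetic epochs: 50 trials x 401 samples (0-400 ms at 1 kHz)
times = np.linspace(0.0, 0.4, 401)
standard = np.zeros((50, 401))
# Deviant trials carry a negative deflection peaking at 200 ms (amplitude -1),
# mimicking an MMN; a positive deflection would mimic the child mismatch response
deviant = np.tile(-np.exp(-((times - 0.2) ** 2) / (2 * 0.02 ** 2)), (50, 1))
mmn = mismatch_wave(standard, deviant)
```

A positive-going difference in the same window, as reported here for children under 10, would simply flip the sign of the extracted peak.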
Affiliation(s)
- Varghese Peter
- MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales, Australia
- Marina Kalashnikova
- MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales, Australia
- Denis Burnham
- MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales, Australia
9.
Mai G, Tuomainen J, Howell P. Relationship between speech-evoked neural responses and perception of speech in noise in older adults. J Acoust Soc Am 2018; 143:1333. PMID: 29604686. DOI: 10.1121/1.5024340.
Abstract
Speech-in-noise (SPIN) perception involves neural encoding of temporal acoustic cues. These cues include the temporal fine structure (TFS) and envelopes that modulate at syllable (Slow-rate ENV) and fundamental-frequency (F0-rate ENV) rates. Here, the relationship between speech-evoked neural responses to these cues and SPIN perception was investigated in older adults. Theta-band phase-locking values (PLVs), which reflect cortical sensitivity to Slow-rate ENV, and peripheral/brainstem frequency-following responses phase-locked to F0-rate ENV (FFRENV_F0) and TFS (FFRTFS) were measured from scalp-electroencephalography responses to a repeated speech syllable in steady-state speech-shaped noise (SpN) and 16-speaker babble noise (BbN). The results showed that (1) SPIN performance and PLVs were significantly higher under SpN than BbN, implying that differential cortical encoding may serve as a neural mechanism for SPIN performance that varies as a function of noise type; (2) PLVs and FFRTFS at resolved harmonics were significantly related to good SPIN performance, supporting the importance of phase-locked neural encoding of Slow-rate ENV and the TFS of resolved harmonics during SPIN perception; and (3) FFRENV_F0 was not associated with SPIN performance until audiometric threshold was controlled for, indicating that hearing loss should be carefully controlled for when studying the role of neural encoding of F0-rate ENV. Implications are drawn with respect to fitting auditory prostheses.
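The phase-locking value used in studies like this one has a standard definition: the length of the mean resultant vector of per-trial phases at a given frequency and time, ranging from 0 (random phase across trials) to 1 (perfect phase-locking). A minimal sketch of that core computation (illustrative only; real PLV pipelines first extract per-trial phase, e.g., via a filter-Hilbert or wavelet transform):

```python
import numpy as np

def phase_locking_value(trial_phases):
    """PLV: length of the mean unit phasor across trials
    (0 = phases random across trials, 1 = perfect phase-locking)."""
    phases = np.asarray(trial_phases, dtype=float)
    return np.abs(np.mean(np.exp(1j * phases)))

# Identical phase on every trial -> perfect locking (PLV = 1)
locked = np.full(100, 0.7)
# Uniformly random phases -> PLV near zero for large trial counts
rng = np.random.default_rng(0)
scattered = rng.uniform(-np.pi, np.pi, size=10000)
```

Because the PLV is biased upward for small trial counts, comparisons across conditions or groups are typically made with matched numbers of trials.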
Affiliation(s)
- Guangting Mai
- Department of Experimental Psychology, Division of Psychology and Language Sciences, University College London, London, WC1H 0AP, England
- Jyrki Tuomainen
- Department of Speech, Hearing and Phonetic Sciences, Division of Psychology and Language Sciences, University College London, London, WC1N 1PF, England
- Peter Howell
- Department of Experimental Psychology, Division of Psychology and Language Sciences, University College London, London, WC1H 0AP, England
10.
Abstract
OBJECTIVES Formant rise time (FRT) and amplitude rise time (ART) are acoustic cues that inform phonetic identity. FRT represents the rate of transition of the formant(s) to a steady state, while ART represents the rate at which the sound reaches its peak amplitude. Normal-hearing (NH) native English speakers weight FRT more than ART during perceptual labeling of the /ba/-/wa/ contrast. This weighting strategy is reflected neurophysiologically in the magnitude of the mismatch negativity (MMN): the MMN is larger for the FRT distinction than for the ART distinction. The present study examined the neurophysiological basis of acoustic cue weighting in adult cochlear implant (CI) listeners using the MMN design. It was hypothesized that individuals with CIs who weight ART more in behavioral labeling (ART users) would show larger MMNs during the ART than the FRT contrast, and the opposite would be seen for FRT users. DESIGN Electroencephalography was recorded while 20 adults with CIs listened passively to combinations of three synthetic speech stimuli: a /ba/ with /ba/-like FRT and ART; a /wa/ with /wa/-like FRT and ART; and a /ba/ stimulus with /ba/-like FRT and /wa/-like ART. The MMN response was elicited during the FRT contrast by having participants passively listen to a train of /wa/ stimuli interrupted occasionally by /ba/ stimuli, and vice versa. For the ART contrast, the same procedure was implemented using the two /ba/ stimuli, which differed only in ART. RESULTS Both ART and FRT users with CIs elicited MMNs of equal magnitude during the FRT and ART contrasts, with the exception that FRT users exhibited MMNs for the ART and FRT contrasts that were temporally segregated: their MMNs occurred significantly earlier during the ART contrast (~100 msec following sound onset) than during the FRT contrast (~200 msec). In contrast, the MMNs of ART users for both contrasts occurred later and were not significantly separable in time (~230 msec). Interestingly, the temporal segregation observed in FRT users is consistent with the MMN behavior of NH listeners. CONCLUSIONS The results suggest that listeners with CIs who learn to classify phonemes based on formant dynamics, consistent with NH listeners, develop a strategy similar to that of NH listeners, in which the amplitude and spectral representations of phonemes in auditory memory are temporally segregated.
11.
Neural Mechanisms Underlying Cross-Modal Phonetic Encoding. J Neurosci 2017; 38:1835-1849. PMID: 29263241. DOI: 10.1523/jneurosci.1566-17.2017.
Abstract
Audiovisual (AV) integration is essential for speech comprehension, especially in adverse listening situations. Divergent, but not mutually exclusive, theories have been proposed to explain the neural mechanisms underlying AV integration. One theory advocates that this process occurs via interactions between the auditory and visual cortices, as opposed to fusion of AV percepts in a multisensory integrator. Building upon this idea, we proposed that AV integration in spoken language reflects visually induced weighting of phonetic representations at the auditory cortex. EEG was recorded while male and female human subjects watched and listened to videos of a speaker uttering the consonant-vowel (CV) syllables /ba/ and /fa/, presented in Auditory-only, AV congruent, or AV incongruent contexts. Subjects reported whether they heard /ba/ or /fa/. We hypothesized that vision alters phonetic encoding by dynamically weighting which phonetic representation in the auditory cortex is strengthened or weakened. That is, when subjects are presented with visual /fa/ and acoustic /ba/ and hear /fa/ (illusion-fa), the visual input strengthens the weighting of the phone /f/ representation; when subjects are presented with visual /ba/ and acoustic /fa/ and hear /ba/ (illusion-ba), the visual input weakens the weighting of the phone /f/ representation. Indeed, we found an enlarged N1 auditory evoked potential when subjects perceived illusion-ba, and a reduced N1 when they perceived illusion-fa, mirroring the N1 behavior for /ba/ and /fa/ in Auditory-only settings. These effects were especially pronounced in individuals with more robust illusory perception. These findings provide evidence that visual speech modifies phonetic encoding at the auditory cortex. SIGNIFICANCE STATEMENT The current study presents evidence that audiovisual integration in spoken language occurs when one modality (vision) acts on representations of a second modality (audition).
Using the McGurk illusion, we show that visual context primes phonetic representations at the auditory cortex, altering the auditory percept, evidenced by changes in the N1 auditory evoked potential. This finding reinforces the theory that audiovisual integration occurs via visual networks influencing phonetic representations in the auditory cortex. We believe that this will lead to the generation of new hypotheses regarding cross-modal mapping, particularly whether it occurs via direct or indirect routes (e.g., via a multisensory mediator).
12.
Cognitive basis of individual differences in speech perception, production and representations: The role of domain general attentional switching. Atten Percept Psychophys 2017; 79:945-963. PMID: 28144832. DOI: 10.3758/s13414-017-1283-z.
Abstract
This study investigated whether individual differences in cognitive functions, attentional abilities in particular, were associated with individual differences in the quality of phonological representations, resulting in variability in speech perception and production. To do so, we took advantage of a tone-merging phenomenon in Cantonese and identified three groups of typically developed speakers who could differentiate the two rising tones (high and low rising) in both perception and production [+Per+Pro], only in perception [+Per-Pro], or in neither modality [-Per-Pro]. Perception and production were reflected, respectively, by discrimination sensitivity d' and by acoustic measures of pitch offset and rise time differences. Components of the event-related potential (ERP), namely the mismatch negativity (MMN) and the ERPs to amplitude rise time, were taken to reflect the representations of the acoustic cues of tones. Components of attention and working memory in the auditory and visual modalities were assessed with published test batteries. The results show that individual differences in both perception and production are linked to how listeners encode and represent the acoustic cues (pitch contour and rise time), as reflected by ERPs. The present study advances our knowledge from previous work by integrating measures of perception, production, attention, and quality of representation to offer a comprehensive account of the cognitive factors underlying individual differences in speech processing. In particular, it is proposed that domain-general attentional switching affects the quality of perceptual representations of the acoustic cues, giving rise to individual differences in perception and production.
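The discrimination sensitivity d' used above is the standard signal-detection index: the z-transformed hit rate minus the z-transformed false-alarm rate. A stdlib-only sketch (the log-linear style correction for extreme rates is an assumption for illustration, not necessarily the correction this study applied):

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate).
    Adding 0.5 to each count keeps both rates strictly inside (0, 1),
    so the inverse normal CDF stays finite even for perfect scores."""
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)
```

A listener at chance (equal hit and false-alarm rates) gets d' = 0; better discrimination of the two rising tones yields a larger positive d'.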
13.
Baart M, Lindborg A, Andersen TS. Electrophysiological evidence for differences between fusion and combination illusions in audiovisual speech perception. Eur J Neurosci 2017; 46:2578-2583. PMID: 28976045. PMCID: PMC5725699. DOI: 10.1111/ejn.13734.
Abstract
Incongruent audiovisual speech stimuli can lead to perceptual illusions such as fusions or combinations. Here, we investigated the underlying audiovisual integration process by measuring ERPs. We observed that the visual speech-induced suppression of P2 amplitude (generally taken as a measure of audiovisual integration) for fusions was similar to the suppression obtained with fully congruent stimuli, whereas P2 suppression for combinations was larger. We argue that these effects arise because the phonetic incongruency is resolved differently for the two types of stimuli.
Affiliation(s)
- Martijn Baart
- Department of Cognitive Neuropsychology, Tilburg University, Warandelaan 2, Tilburg, 5000 LE, The Netherlands; BCBL, Basque Center on Cognition, Brain and Language, Donostia, Spain
- Alma Lindborg
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
- Tobias S Andersen
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
14.
Ou J, Law SP. Individual differences in processing pitch contour and rise time in adults: A behavioral and electrophysiological study of Cantonese tone merging. J Acoust Soc Am 2016; 139:3226. PMID: 27369146. DOI: 10.1121/1.4954252.
Abstract
One way to understand the relationship between speech perception and production is to examine cases where the two dissociate. This study investigates the hypothesis that perceptual acuity to the rise time of the sound amplitude envelope and to pitch contour, reflected in event-related potentials (ERPs) and the mismatch negativity (MMN) respectively, may be associated with individual differences in production among speakers with otherwise comparable perceptual abilities. To test this hypothesis, we took advantage of an ongoing sound change, tone merging in Cantonese, and compared the ERPs of two groups of typically developing native speakers who could discriminate the high rising and low rising tones with equivalent accuracy but differed in how distinctly they produced these tones. Using a passive oddball paradigm, early positive-going EEG components to rise time and the MMN to pitch contour were elicited during perception of the two tones. Significant group differences were found in neural responses to rise time but not to pitch contour. More importantly, individual differences in the efficiency of tone discrimination (response latency) and in the magnitude of neural responses to rise time were correlated with acoustic measures of F0 offset and rise-time differences in productions of the two rising tones.
Affiliation(s)
- Jinghua Ou: Division of Speech and Hearing Science, The University of Hong Kong, Hong Kong Special Administrative Region
- Sam-Po Law: Division of Speech and Hearing Science, The University of Hong Kong, Hong Kong Special Administrative Region
15
Lockwood G, Tuomainen J. Ideophones in Japanese modulate the P2 and late positive complex responses. Front Psychol 2015; 6:933. [PMID: 26191031] [PMCID: PMC4488605] [DOI: 10.3389/fpsyg.2015.00933]
Abstract
Sound-symbolism, or the direct link between sound and meaning, is typologically and behaviorally attested across languages. However, neuroimaging research has mostly focused on artificial non-words or individual segments, which do not represent sound-symbolism in natural language. We used EEG to compare Japanese ideophones, which are phonologically distinctive sound-symbolic lexical words, and arbitrary adverbs during a sentence reading task. Ideophones elicited a larger visual P2 response than arbitrary adverbs, as well as a sustained late positive complex. Our results and previous literature suggest that the larger P2 may indicate the integration of sound and sensory information by association in response to the distinctive phonology of ideophones. The late positive complex may reflect the facilitated lexical retrieval of arbitrary words in comparison to ideophones. This account provides new evidence that ideophones exhibit cross-modal correspondences similar to those proposed for non-words and individual sounds.
Affiliation(s)
- Gwilym Lockwood: Department of Neurobiology of Language, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Division of Psychology and Language Sciences, University College London, UK
- Jyrki Tuomainen: Division of Psychology and Language Sciences, University College London, UK
16
Nittrouer S, Lowenstein JH. Weighting of acoustic cues to a manner distinction by children with and without hearing loss. J Speech Lang Hear Res 2015; 58:1077-92. [PMID: 25813201] [PMCID: PMC4583325] [DOI: 10.1044/2015_jslhr-h-14-0263]
Abstract
PURPOSE Children must develop optimal perceptual weighting strategies for processing speech in their first language. Hearing loss can interfere with that development, especially if cochlear implants are required. The three goals of this study were to measure, for children with and without hearing loss: (a) cue weighting for a manner distinction, (b) sensitivity to those cues, and (c) real-world communication functions. METHOD One hundred and seven children (43 with normal hearing [NH], 17 with hearing aids [HAs], and 47 with cochlear implants [CIs]) performed several tasks: labeling of stimuli from /bɑ/-to-/wɑ/ continua varying in formant and amplitude rise time (FRT and ART), discrimination of ART, word recognition, and phonemic awareness. RESULTS Children with hearing loss were less attentive overall to acoustic structure than children with NH. Children with CIs, but not those with HAs, weighted FRT less and ART more than children with NH. Sensitivity could not explain cue weighting. FRT cue weighting explained significant amounts of variability in word recognition and phonemic awareness; ART cue weighting did not. CONCLUSION Signal degradation inhibits access to spectral structure for children with CIs, but cannot explain their delayed development of optimal weighting strategies. Auditory training could strengthen the weighting of spectral cues for children with CIs, thus aiding spoken language acquisition.
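The cue-weighting measure used in this line of work can be sketched computationally. In a minimal version, binary labeling responses (/bɑ/ vs. /wɑ/) are regressed on the standardized step values of each cue (FRT and ART), and the normalized coefficient magnitudes serve as relative cue weights. The sketch below is an illustration with synthetic data and made-up stimulus values, not the authors' analysis pipeline.

```python
# Illustrative cue-weighting sketch (synthetic data, not the study's code):
# regress /wa/-labeling counts on the FRT and ART step values of a
# hypothetical 5x5 continuum, then normalize the coefficient magnitudes.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical continuum: FRT and ART each vary over 5 z-scored steps.
frt, art = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
X = np.column_stack([frt.ravel(), art.ravel()])

# Simulate a listener who weights FRT heavily and ART only weakly.
true_w = np.array([3.0, 0.5])
p_wa = 1.0 / (1.0 + np.exp(-X @ true_w))
y = rng.binomial(20, p_wa)      # /wa/ responses out of 20 trials per stimulus
n = np.full_like(y, 20)

# Fit a binomial logistic regression by Newton's method.
w = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    grad = X.T @ (y - n * p)                      # gradient of log-likelihood
    H = -(X * (n * p * (1 - p))[:, None]).T @ X   # Hessian (negative definite)
    w -= np.linalg.solve(H, grad)

weights = np.abs(w) / np.abs(w).sum()  # normalized relative cue weights
print(weights)                         # FRT weight dominates ART weight
```

Studies in this family additionally relate such weights to outcome measures (word recognition, phonemic awareness); that step would be an ordinary regression of the outcome on the recovered weights.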
17
Lowenstein JH, Nittrouer S. All cues are not created equal: the case for facilitating the acquisition of typical weighting strategies in children with hearing loss. J Speech Lang Hear Res 2015; 58:466-80. [PMID: 25611214] [PMCID: PMC4398599] [DOI: 10.1044/2015_jslhr-h-14-0254]
Abstract
PURPOSE One task of childhood involves learning to optimally weight acoustic cues in the speech signal in order to recover phonemic categories. This study examined the extent to which spectral degradation, as associated with cochlear implants, might interfere. The 3 goals were to measure, for adults and children, (a) cue weighting with spectrally degraded signals, (b) sensitivity to degraded cues, and (c) word recognition for degraded signals. METHOD Twenty-three adults and 36 children (10 and 8 years old) labeled spectrally degraded stimuli from /bɑ/-to-/wɑ/ continua varying in formant and amplitude rise time (FRT and ART). They also discriminated degraded stimuli from FRT and ART continua, and recognized words. RESULTS A developmental increase in the weight assigned to FRT in labeling was clearly observed, with a slight decrease in weight assigned to ART. Sensitivity to these degraded cues measured by the discrimination task could not explain variability in cue weighting. FRT cue weighting explained significant variability in word recognition; ART cue weighting did not. CONCLUSION Spectral degradation affects children more than adults, but that degradation cannot explain the greater diminishment in children's weighting of FRT. It is suggested that auditory training could strengthen the weighting of spectral cues for implant recipients.
18
Souza PE, Wright RA, Blackburn MC, Tatman R, Gallun FJ. Individual sensitivity to spectral and temporal cues in listeners with hearing impairment. J Speech Lang Hear Res 2015; 58:520-34. [PMID: 25629388] [PMCID: PMC4462137] [DOI: 10.1044/2015_jslhr-h-14-0138]
Abstract
PURPOSE The present study was designed to evaluate use of spectral and temporal cues under conditions in which both types of cues were available. METHOD Participants included adults with normal hearing and hearing loss. We focused on 3 categories of speech cues: static spectral (spectral shape), dynamic spectral (formant change), and temporal (amplitude envelope). Spectral and/or temporal dimensions of synthetic speech were systematically manipulated along a continuum, and recognition was measured using the manipulated stimuli. Level was controlled to ensure cue audibility. Discriminant function analysis was used to determine to what degree spectral and temporal information contributed to the identification of each stimulus. RESULTS Listeners with normal hearing were influenced to a greater extent by spectral cues for all stimuli. Listeners with hearing impairment generally utilized spectral cues when the information was static (spectral shape) but used temporal cues when the information was dynamic (formant transition). The relative use of spectral and temporal dimensions varied among individuals, especially among listeners with hearing loss. CONCLUSION Information about spectral and temporal cue use may aid in identifying listeners who rely to a greater extent on particular acoustic cues and applying that information toward therapeutic interventions.
Affiliation(s)
- Pamela E. Souza: Northwestern University, Evanston, IL; Knowles Hearing Center, Northwestern University, Evanston, IL
- Frederick J. Gallun: National Center for Rehabilitative Auditory Research, Portland VA Medical Center, OR; Oregon Health & Science University, Portland
19
Bhat J, Miller LM, Pitt MA, Shahin AJ. Putative mechanisms mediating tolerance for audiovisual stimulus onset asynchrony. J Neurophysiol 2015; 113:1437-50. [PMID: 25505102] [DOI: 10.1152/jn.00200.2014]
Abstract
Audiovisual (AV) speech perception is robust to temporal asynchronies between visual and auditory stimuli. We investigated the neural mechanisms that facilitate tolerance for audiovisual stimulus onset asynchrony (AVOA) with EEG. Individuals were presented with AV words whose voice and mouth-movement onsets were asynchronous and judged whether they were synchronous or not. Behaviorally, individuals tolerated (perceived as synchronous) longer AVOAs for stimuli in which mouth movement preceded the voice (V-A) than for those in which the voice preceded mouth movement (A-V). Neurophysiologically, the P1-N1-P2 auditory evoked potentials (AEPs), time-locked to sound onsets and known to arise in and surrounding the primary auditory cortex (PAC), were smaller for the in-sync than the out-of-sync percepts. Spectral power of oscillatory activity in the beta band (14–30 Hz) following the AEPs was larger during in-sync than out-of-sync perception for both A-V and V-A conditions. However, alpha power (8–14 Hz), also following AEPs, was larger for the in-sync than out-of-sync percepts only in the V-A condition. These results demonstrate that AVOA tolerance is enhanced by inhibiting low-level auditory activity (e.g., AEPs representing generators in and surrounding PAC) that codes for acoustic onsets. By reducing sensitivity to acoustic onsets, visual-to-auditory onset mapping is weakened, allowing for greater AVOA tolerance. In contrast, the beta and alpha results suggest the involvement of higher-level neural processes that may code for language cues (phonetic, lexical), selective attention, and binding of AV percepts, allowing for wider neural windows of temporal integration, i.e., greater AVOA tolerance.
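The band-power comparison described above (alpha, 8–14 Hz, vs. beta, 14–30 Hz, following the AEPs) can be sketched with a simple periodogram. This is a minimal illustration on a synthetic epoch with an assumed sampling rate; the paper's actual analysis used time-frequency methods on real EEG.

```python
# Illustrative band-power sketch (synthetic epoch, not the study's pipeline):
# estimate mean power spectral density in the alpha and beta bands via the FFT.
import numpy as np

fs = 250.0                      # sampling rate in Hz (assumed for the demo)
t = np.arange(0, 2.0, 1 / fs)   # a 2-s analysis window following the AEPs

# Synthetic epoch: a strong 10 Hz (alpha) and weaker 20 Hz (beta) component.
rng = np.random.default_rng(1)
epoch = (2.0 * np.sin(2 * np.pi * 10 * t)
         + 1.0 * np.sin(2 * np.pi * 20 * t)
         + 0.5 * rng.standard_normal(t.size))

def band_power(x, fs, lo, hi):
    """Mean power spectral density between lo and hi Hz (raw periodogram)."""
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * x.size)
    band = (freqs >= lo) & (freqs < hi)
    return psd[band].mean()

alpha = band_power(epoch, fs, 8, 14)    # 8-14 Hz
beta = band_power(epoch, fs, 14, 30)    # 14-30 Hz
print(alpha > beta)                     # the stronger 10 Hz component dominates
```

In practice a tapered estimator (e.g., Welch or multitaper) and per-trial baselining would replace the raw periodogram, but the band-averaging step is the same.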
Affiliation(s)
- Jyoti Bhat: Department of Otolaryngology—Head and Neck Surgery, The Ohio State University College of Medicine, Columbus, Ohio
- Lee M. Miller: Center for Mind and Brain, University of California, Davis, California; Department of Neurobiology, Physiology, and Behavior, University of California, Davis, California
- Mark A. Pitt: Department of Psychology, The Ohio State University, Columbus, Ohio
- Antoine J. Shahin: Department of Otolaryngology—Head and Neck Surgery, The Ohio State University College of Medicine, Columbus, Ohio; Center for Mind and Brain, University of California, Davis, California
20
Moberly AC, Bhat J, Welling DB, Shahin AJ. Neurophysiology of spectrotemporal cue organization of spoken language in auditory memory. Brain Lang 2014; 130:42-49. [PMID: 24576808] [PMCID: PMC3989417] [DOI: 10.1016/j.bandl.2014.01.007]
Abstract
Listeners assign different weights to spectral dynamics, such as formant rise time (FRT), and temporal dynamics, such as amplitude rise time (ART), during phonetic judgments. We examined the neurophysiological basis of FRT and ART weighting in the /ba/-/wa/ contrast. Electroencephalography was recorded for thirteen adult English speakers during a mismatch negativity (MMN) design using synthetic stimuli: a /ba/ with /ba/-like FRT and ART; a /wa/ with /wa/-like FRT and ART; and a /ba/(wa) with /ba/-like FRT and /wa/-like ART. We hypothesized that because of stronger reliance on FRT, subjects would encode a stronger memory trace and exhibit larger MMN during the FRT than the ART contrast. Results supported this hypothesis. The effect was most robust in the later portion of MMN. Findings suggest that MMN is generated by multiple sources, differentially reflecting acoustic change detection (earlier MMN, bottom-up process) and perceptual weighting of ART and FRT (later MMN, top-down process).
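The MMN quantification underlying designs like this one is conventionally the deviant-minus-standard difference wave, with mean amplitude taken over earlier and later latency windows. The sketch below illustrates that computation on synthetic epochs; the window bounds and all signal parameters are assumptions for the demo, not values from the paper.

```python
# Illustrative MMN sketch (synthetic epochs, not the study's data):
# average standard and deviant trials, subtract to get the difference wave,
# and take mean amplitude in an earlier and a later latency window.
import numpy as np

fs = 500.0                          # sampling rate in Hz (assumed)
t = np.arange(-0.1, 0.4, 1 / fs)    # epoch: -100 to 400 ms

rng = np.random.default_rng(2)
n_trials = 200

def make_epochs(neg_amp):
    """Synthetic trials: a negative deflection peaking near 150 ms plus noise."""
    erp = -neg_amp * np.exp(-((t - 0.15) ** 2) / (2 * 0.03 ** 2))
    return erp + 2.0 * rng.standard_normal((n_trials, t.size))

standard = make_epochs(0.5).mean(axis=0)   # small response to standards
deviant = make_epochs(2.5).mean(axis=0)    # larger negativity to deviants

diff = deviant - standard                  # MMN difference wave

def mean_amp(wave, lo, hi):
    """Mean amplitude of a waveform between lo and hi seconds."""
    win = (t >= lo) & (t < hi)
    return wave[win].mean()

early = mean_amp(diff, 0.10, 0.15)   # earlier MMN portion (window assumed)
late = mean_amp(diff, 0.15, 0.25)    # later MMN portion (window assumed)
print(early < 0 and late < 0)        # both windows show a negativity
```

Comparing `early` and `late` across contrast conditions (FRT vs. ART) is the kind of analysis that lets a study separate early acoustic change detection from later, weighting-related processing.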
Affiliation(s)
- Aaron C Moberly: Department of Otolaryngology - Head & Neck Surgery, The Ohio State University Wexner Medical Center, United States
- Jyoti Bhat: Department of Otolaryngology - Head & Neck Surgery, The Ohio State University Wexner Medical Center, United States
- D Bradley Welling: Department of Otolaryngology - Head & Neck Surgery, The Ohio State University Wexner Medical Center, United States
- Antoine J Shahin: Department of Otolaryngology - Head & Neck Surgery, The Ohio State University Wexner Medical Center, United States