1. Cusimano M, Hewitt LB, McDermott JH. Listening with generative models. Cognition 2024; 253:105874. PMID: 39216190. DOI: 10.1016/j.cognition.2024.105874.
Abstract
Perception has long been envisioned to use an internal model of the world to explain the causes of sensory signals. However, such accounts have historically not been testable, typically requiring intractable search through the space of possible explanations. Using auditory scenes as a case study, we leveraged contemporary computational tools to infer explanations of sounds in a candidate internal generative model of the auditory world (ecologically inspired audio synthesizers). Model inferences accounted for many classic illusions. Unlike traditional accounts of auditory illusions, the model is applicable to any sound, and exhibited human-like perceptual organization for real-world sound mixtures. The combination of stimulus-computability and interpretable model structure enabled 'rich falsification', revealing additional assumptions about sound generation needed to account for perception. The results show how generative models can account for the perception of both classic illusions and everyday sensory signals, and illustrate the opportunities and challenges involved in incorporating them into theories of perception.
Affiliation(s)
- Maddie Cusimano: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, United States of America.
- Luke B Hewitt: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, United States of America.
- Josh H McDermott: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology; McGovern Institute, Massachusetts Institute of Technology; Center for Brains Minds and Machines, Massachusetts Institute of Technology; Speech and Hearing Bioscience and Technology, Harvard University, United States of America.
2. Calcus A. Development of auditory scene analysis: a mini-review. Front Hum Neurosci 2024; 18:1352247. PMID: 38532788. PMCID: PMC10963424. DOI: 10.3389/fnhum.2024.1352247.
Abstract
Most auditory environments contain multiple sound waves that are mixed before reaching the ears. In such situations, listeners must disentangle individual sounds from the mixture, performing auditory scene analysis. Analyzing complex auditory scenes relies on listeners' ability to segregate acoustic events into different streams and to selectively attend to the stream of interest. Both segregation and selective attention are challenging for adults with normal hearing, and appear to be even more difficult for children. Here, we review the recent literature on the development of auditory scene analysis, presenting behavioral and neurophysiological results. In short, the cognitive and neural mechanisms supporting stream segregation are functional from birth but continue to develop until adolescence. Similarly, from 6 months of age, infants can orient their attention toward a target in the presence of distractors. However, selective auditory attention in the presence of interfering streams only reaches maturity in late childhood at the earliest. Methodological limitations are discussed, and a new paradigm is proposed to clarify the relationship between auditory scene analysis and speech perception in noise throughout development.
Affiliation(s)
- Axelle Calcus: Center for Research in Cognitive Neuroscience (CRCN), ULB Neuroscience Institute (UNI), Université Libre de Bruxelles, Brussels, Belgium.
3. Gohari N, Hosseini Dastgerdi Z, Bernstein LJ, Alain C. Neural correlates of concurrent sound perception: A review and guidelines for future research. Brain Cogn 2022; 163:105914. PMID: 36155348. DOI: 10.1016/j.bandc.2022.105914.
Abstract
The perception of concurrent sound sources depends on processes (i.e., auditory scene analysis) that fuse and segregate acoustic features according to harmonic relations, temporal coherence, and binaural cues (encompassing dichotic pitch, location differences, and simulated echoes). The object-related negativity (ORN) and P400 are electrophysiological indices of concurrent sound perception. Here, we review the different paradigms used to study concurrent sound perception and the brain responses obtained from these paradigms. Recommendations regarding the design and recording parameters of the ORN and P400 are made, and their clinical applications in assessing central auditory processing ability in different populations are discussed.
Affiliation(s)
- Nasrin Gohari: Department of Audiology, School of Rehabilitation, Hamadan University of Medical Sciences, Hamadan, Iran.
- Zahra Hosseini Dastgerdi: Department of Audiology, School of Rehabilitation, Isfahan University of Medical Sciences, Isfahan, Iran.
- Lori J Bernstein: Department of Supportive Care, University Health Network, and Department of Psychiatry, University of Toronto, Toronto, Canada.
- Claude Alain: Rotman Research Institute, Baycrest Centre for Geriatric Care, and Department of Psychology, University of Toronto, Canada.
4. Mehrkian S, Moossavi A, Gohari N, Nazari MA, Bakhshi E, Alain C. Long Latency Auditory Evoked Potentials and Object-Related Negativity Based on Harmonicity in Hearing-Impaired Children. Neurosci Res 2022; 178:52-59. PMID: 35007647. DOI: 10.1016/j.neures.2022.01.001.
Abstract
Hearing-impaired children (HIC) have difficulty understanding speech in noise, which may be due to difficulty parsing concurrent sound objects based on harmonicity cues. Using long latency auditory evoked potentials (LLAEPs) and the object-related negativity (ORN), a neural metric of concurrent sound segregation, this study investigated the sensitivity of HIC in processing harmonic relations. The participants were 14 normal-hearing children (NHC) with an average age of 7.82 ± 1.31 years and 17 HIC with an average age of 7.98 ± 1.25 years. They were presented with a sequence of 200 Hz harmonic complex tones that had either all harmonics in tune or the third harmonic mistuned by 2%, 4%, 8%, or 16% of its original value while neuroelectric brain activity was recorded. The analysis of scalp-recorded LLAEPs revealed lower N2 amplitudes elicited by the tuned stimuli in HIC than in controls. The ORN, isolated in the difference wave between LLAEPs elicited by tuned and mistuned stimuli, was delayed and smaller in HIC than in NHC. This study revealed deficits in processing harmonic relations in HIC, which may contribute to their difficulty in understanding speech in noise. As a result, top-down and bottom-up rehabilitation approaches aiming to improve the processing of basic acoustic characteristics, including harmonics, are recommended for children with hearing loss.
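The stimulus manipulation described above, a 200 Hz harmonic complex whose third harmonic is mistuned by a fixed percentage, can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' stimulus code; the number of harmonics, duration, and sampling rate are assumptions.

```python
import numpy as np

def harmonic_complex(f0=200.0, n_harmonics=12, mistuned_harmonic=None,
                     mistuning_pct=0.0, fs=44100, dur=1.0):
    """Synthesize a harmonic complex tone; optionally mistune one harmonic.

    mistuning_pct shifts the chosen harmonic by that percentage of its
    nominal frequency (e.g. 8 -> an 8% upward shift).
    """
    t = np.arange(int(fs * dur)) / fs
    signal = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        f = k * f0
        if k == mistuned_harmonic:
            f *= 1.0 + mistuning_pct / 100.0
        signal += np.sin(2 * np.pi * f * t)
    return signal / n_harmonics  # rough peak normalization

tuned = harmonic_complex()
# third harmonic (600 Hz) shifted by 8% to 648 Hz
mistuned = harmonic_complex(mistuned_harmonic=3, mistuning_pct=8.0)
```

Listeners typically hear `tuned` as a single pitch at 200 Hz, whereas in `mistuned` the shifted component tends to pop out as a second sound.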
Affiliation(s)
- Saeideh Mehrkian: Department of Audiology, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
- Abdollah Moossavi: Department of Otolaryngology and Head and Neck Surgery, School of Medicine, Iran University of Medical Sciences, Tehran, Iran.
- Nasrin Gohari: Department of Audiology, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
- Mohammad Ali Nazari: Department of Neuroscience, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran, Iran.
- Enayatollah Bakhshi: Department of Biostatistics and Epidemiology, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
- Claude Alain: Rotman Research Institute, Baycrest Centre for Geriatric Care, and Department of Psychology, University of Toronto, Toronto, Canada.
5. Cortical Processing of Binaural Cues as Shown by EEG Responses to Random-Chord Stereograms. J Assoc Res Otolaryngol 2021; 23:75-94. PMID: 34904205. PMCID: PMC8783002. DOI: 10.1007/s10162-021-00820-4.
Abstract
Spatial hearing facilitates the perceptual organization of complex soundscapes into accurate mental representations of sound sources in the environment. Yet, the role of binaural cues in auditory scene analysis (ASA) has received relatively little attention in recent neuroscientific studies employing novel, spectro-temporally complex stimuli. This may be because a stimulation paradigm that provides binaurally derived grouping cues of sufficient spectro-temporal complexity has not yet been established for neuroscientific ASA experiments. Random-chord stereograms (RCS) are a class of auditory stimuli that exploit spectro-temporal variations in the interaural envelope correlation of noise-like sounds with interaurally coherent fine structure; they evoke salient auditory percepts that emerge only under binaural listening. Here, our aim was to assess the usability of the RCS paradigm for indexing binaural processing in the human brain. To this end, we recorded EEG responses to RCS stimuli from 12 normal-hearing subjects. The stimuli consisted of an initial 3-s noise segment with interaurally uncorrelated envelopes, followed by another 3-s segment, where envelope correlation was modulated periodically according to the RCS paradigm. Modulations were applied either across the entire stimulus bandwidth (wideband stimuli) or in temporally shifting frequency bands (ripple stimulus). Event-related potentials and inter-trial phase coherence analyses of the EEG responses showed that the introduction of the 3- or 5-Hz wideband modulations produced a prominent change-onset complex and ongoing synchronized responses to the RCS modulations. In contrast, the ripple stimulus elicited a change-onset response but no response to ongoing RCS modulation. Frequency-domain analyses revealed increased spectral power at the fundamental frequency and the first harmonic of wideband RCS modulations. RCS stimulation yields robust EEG measures of binaurally driven auditory reorganization and has the potential to provide a flexible stimulation paradigm suitable for isolating binaural effects in ASA experiments.
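The inter-trial phase coherence (ITPC) analysis mentioned above has a compact definition: the magnitude of the mean unit phasor across trials at a given time-frequency point, ranging from ~0 (random phases) to 1 (perfect phase locking). A minimal sketch, not the study's analysis pipeline; the trial counts and phase distributions below are invented for illustration:

```python
import numpy as np

def itpc(phases):
    """Inter-trial phase coherence: length of the mean unit phasor.

    phases: array of per-trial instantaneous phases (radians) at one
    time-frequency point. 1.0 = perfect phase locking, ~0 = no locking.
    """
    return np.abs(np.mean(np.exp(1j * np.asarray(phases))))

rng = np.random.default_rng(0)
locked = rng.normal(0.0, 0.2, size=200)      # clustered phases: stimulus-locked
unlocked = rng.uniform(-np.pi, np.pi, 200)   # uniform phases: no locking
```

An ongoing synchronized response to a 3- or 5-Hz modulation would show up as elevated ITPC at that frequency across stimulation trials.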
6. Segal O, Kligler N, Kishon-Rabin L. Infants' Preference for Child-Directed Speech Over Time-Reversed Speech in On-Channel and Off-Channel Masking. J Speech Lang Hear Res 2021; 64:2897-2908. PMID: 34157233. DOI: 10.1044/2021_jslhr-20-00279.
Abstract
Purpose: This study aims to examine the development of auditory selective attention to speech in noise by examining the ability of infants to prefer child-directed speech (CDS) over time-reversed speech (TRS) presented in "on-channel" and "off-channel" noise.
Method: A total of 32 infants participated in the study. Sixteen typically developing infants were tested at 7 and 11 months of age using the central fixation procedure with CDS and TRS in two types of noise at a +10 dB signal-to-noise ratio. One type of noise was an "on-channel" masker with a spectrum overlapping that of the CDS (energetic masking); the second was an "off-channel" masker with frequencies outside the spectrum of the CDS (distractive masking). An additional group of sixteen 11-month-old infants was tested in quiet and served as controls for the "off-channel" masker condition.
Results: Infants preferred CDS over TRS in both age groups, but this preference was more pronounced with the "off-channel" masker regardless of age. Also, older infants demonstrated longer looking times for the target stimuli when presented with the "off-channel" masker compared to the "on-channel" masker. Looking time in quiet was similar to looking time in the "off-channel" condition, and looking time for CDS was longer in quiet than in the "on-channel" condition.
Conclusions: These findings support the notion that (a) infants as young as 7 months of age already show a preference for speech in noise, regardless of the type of masker, and (b) by 11 months of age, listening in the "off-channel" condition did not yield different results than listening in quiet. Thus, by 11 months of age, infants' cognitive-attentional abilities may be more developed.
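The +10 dB signal-to-noise ratio used above is set by scaling the masker's power relative to the target's. A generic sketch of that scaling; the sine-tone "speech" and Gaussian masker here are placeholders, not the CDS/TRS stimuli:

```python
import numpy as np

def scale_to_snr(signal, noise, snr_db):
    """Rescale noise so that 10*log10(P_signal / P_noise) == snr_db."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_signal / (10 ** (snr_db / 10))
    return noise * np.sqrt(target_p_noise / p_noise)

rng = np.random.default_rng(1)
speech = np.sin(2 * np.pi * 220 * np.arange(44100) / 44100)  # placeholder target
masker = scale_to_snr(speech, rng.normal(size=44100), 10.0)  # +10 dB SNR
mixture = speech + masker
```

The same routine covers both masker types: only the masker's spectral content (on- vs. off-channel) changes, not the power calibration.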
Affiliation(s)
- Osnat Segal: Department of Communication Disorders, The Stanley Steyer School of Health Professions, Sackler Faculty of Medicine, Tel Aviv University, Israel.
- Nitzan Kligler: Department of Communication Disorders, The Stanley Steyer School of Health Professions, Sackler Faculty of Medicine, Tel Aviv University, Israel.
- Liat Kishon-Rabin: Department of Communication Disorders, The Stanley Steyer School of Health Professions, Sackler Faculty of Medicine, Tel Aviv University, Israel.
7. Oster MM, Werner LA. Infants' use of isolated and combined temporal cues in speech sound segregation. J Acoust Soc Am 2020; 148:401. PMID: 32752747. PMCID: PMC7386947. DOI: 10.1121/10.0001582.
Abstract
This paper investigates infants' and adults' use of envelope cues and combined onset asynchrony and envelope cues in the segregation of concurrent vowels. Listeners heard superimposed vowel pairs consisting of two different vowels spoken by a male and a female talker and were trained to respond to one specific target vowel, either the male /u:/ or male /i:/. Vowel detection was measured in three conditions. In the baseline condition the two superimposed vowels had similar amplitude envelopes and synchronous onset. In the envelope cue condition, the amplitude envelopes of the two vowels differed. In the combined cue condition, both the onset time and amplitude envelopes of the two vowels differed. Seven-month-old infants' concurrent vowel segregation improved both with envelope and with combined onset asynchrony and envelope cues to the same extent as adults'. A preliminary investigation with 3-month-old infants suggested that neither envelope cues nor combined asynchrony and envelope cues improved their ability to detect the target vowel. Taken together, these results suggest that envelope and combined onset-asynchrony cues are available to infants as they attempt to process competing speech sounds, at least after 7 months of age.
Affiliation(s)
- Monika-Maria Oster: Listen and Talk, 8610 8th Avenue Northeast, Seattle, Washington 98115, USA.
- Lynne A Werner: Department of Speech and Hearing Sciences, University of Washington, 1417 Northeast 42nd Street, Seattle, Washington 98105, USA.
8. Oster MM, Werner LA. Infants use onset asynchrony cues in auditory scene analysis. J Acoust Soc Am 2018; 144:2052. PMID: 30404496. PMCID: PMC6181648. DOI: 10.1121/1.5058397.
Abstract
This experiment investigated the effect of onset asynchrony on the segregation of concurrent vowels in infants and adults. Two vowels, randomly chosen from seven American-English vowels, were superimposed. Each vowel pair contained one vowel by a male and one by a female talker. A train of such vowel pairs was presented to listeners, who were trained to respond to the male target vowel /i:/ or /u:/. The ability to identify the target vowel was compared among three conditions: synchronous onset, 100-, and 200-ms onset asynchrony. Experiment 1 measured performance, in d', in 7-month-old infants and adults. Infants and adults performed better with asynchronous than synchronous vowel onset, regardless of asynchrony duration. Experiment 2 compared the proportion of 3-month-old infants achieving an 80% correct criterion with and without onset asynchrony. Significantly more infants reached criterion with asynchronous than with synchronous vowel onset. Asynchrony duration did not influence performance. These experiments show that infants, as young as 3 months old, benefit from onset asynchrony.
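Performance in Experiment 1 is reported as d', the signal detection theory sensitivity index, computed from hit and false-alarm rates. A minimal sketch using only the standard library; the example rates are made up, not the study's data:

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: d' = z(hit rate) - z(false-alarm rate), where z
    is the inverse standard normal CDF. Rates must lie strictly between
    0 and 1 (apply a correction to observed rates of 0 or 1 first)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# e.g. 84% hits with 16% false alarms gives d' of about 2
sensitivity = d_prime(0.84, 0.16)
```

Chance performance (hit rate equal to false-alarm rate) gives d' = 0, so better-than-zero d' with asynchronous onsets is what indicates a segregation benefit.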
Affiliation(s)
- Monika-Maria Oster: Department of Speech and Hearing Sciences, University of Washington, 1417 Northeast 42nd Street, Seattle, Washington 98105, USA.
- Lynne A Werner: Department of Speech and Hearing Sciences, University of Washington, 1417 Northeast 42nd Street, Seattle, Washington 98105, USA.
9. Smith NA, Folland NA, Martinez DM, Trainor LJ. Multisensory object perception in infancy: 4-month-olds perceive a mistuned harmonic as a separate auditory and visual object. Cognition 2017; 164:1-7. PMID: 28346869. PMCID: PMC5429982. DOI: 10.1016/j.cognition.2017.01.016.
Abstract
Infants learn to use auditory and visual information to organize the sensory world into identifiable objects with particular locations. Here we use a behavioural method to examine infants' use of harmonicity cues to auditory object perception in a multisensory context. Sounds emitted by different objects sum in the air, and the auditory system must figure out which parts of the complex waveform belong to different sources (auditory objects). One important cue to this source separation is that complex tones with pitch typically contain a fundamental frequency and harmonics at integer multiples of the fundamental. Consequently, adults hear a mistuned harmonic in a complex sound as a distinct auditory object (Alain, Theunissen, Chevalier, Batty, & Taylor, 2003). Previous work by our group demonstrated that 4-month-old infants are also sensitive to this cue. They behaviourally discriminate a complex tone with a mistuned harmonic from the same complex with in-tune harmonics, and show an object-related event-related potential (ERP) response in electroencephalographic (EEG) recordings to the stimulus with mistuned harmonics. In the present study we use an audiovisual procedure to investigate whether infants perceive a complex tone with an 8% mistuned harmonic as emanating from two objects, rather than merely detecting the mistuned cue. We paired in-tune and mistuned complex tones with visual displays that contained either one or two bouncing balls. Four-month-old infants showed surprise at the incongruous pairings, looking longer at the display of two balls when paired with the in-tune complex and at the display of one ball when paired with the mistuned harmonic complex. We conclude that infants use harmonicity as a cue for source separation when integrating auditory and visual information in object perception.
Affiliation(s)
- Nicholas A Smith: Perceptual Development Laboratory, Boys Town National Research Hospital, 555 N. 30th Street, Omaha, NE 68131, United States.
- Nicole A Folland: Department of Psychology, Neuroscience and Behaviour, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada.
- Diana M Martinez: Department of Psychology, Neuroscience and Behaviour, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada.
- Laurel J Trainor: Department of Psychology, Neuroscience and Behaviour, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada; McMaster Institute for Music and the Mind, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada; Rotman Research Institute, Baycrest, University of Toronto, 3560 Bathurst Street, Toronto, Ontario M6A 2E1, Canada.
10. Tóth B, Kocsis Z, Háden GP, Szerafin Á, Shinn-Cunningham BG, Winkler I. EEG signatures accompanying auditory figure-ground segregation. Neuroimage 2016; 141:108-119. PMID: 27421185. PMCID: PMC5656226. DOI: 10.1016/j.neuroimage.2016.07.028.
Abstract
In everyday acoustic scenes, figure-ground segregation typically requires one to group together sound elements over both time and frequency. Electroencephalogram was recorded while listeners detected repeating tonal complexes composed of a random set of pure tones within stimuli consisting of randomly varying tonal elements. The repeating pattern was perceived as a figure over the randomly changing background. It was found that detection performance improved both as the number of pure tones making up each repeated complex (figure coherence) increased, and as the number of repeated complexes (duration) increased - i.e., detection was easier when either the spectral or temporal structure of the figure was enhanced. Figure detection was accompanied by the elicitation of the object related negativity (ORN) and the P400 event-related potentials (ERPs), which have been previously shown to be evoked by the presence of two concurrent sounds. Both ERP components had generators within and outside of auditory cortex. The amplitudes of the ORN and the P400 increased with both figure coherence and figure duration. However, only the P400 amplitude correlated with detection performance. These results suggest that 1) the ORN and P400 reflect processes involved in detecting the emergence of a new auditory object in the presence of other concurrent auditory objects; 2) the ORN corresponds to the likelihood of the presence of two or more concurrent sound objects, whereas the P400 reflects the perceptual recognition of the presence of multiple auditory objects and/or preparation for reporting the detection of a target object.
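ERP components like the ORN and P400 reported above are read out of difference waves: the average response to figure-present stimuli minus the average response to figure-absent stimuli. A toy sketch with simulated single-trial data; the sampling rate, trial counts, noise level, and component latency are invented for illustration, not the study's parameters:

```python
import numpy as np

fs = 500                                  # Hz, assumed EEG sampling rate
t = np.arange(-0.1, 0.6, 1.0 / fs)        # epoch from -100 to 600 ms
rng = np.random.default_rng(7)

def simulate_trials(n_trials, component_amp):
    """Single-trial EEG: Gaussian noise plus an optional negative
    deflection peaking near 200 ms (a toy ORN-like component)."""
    noise = rng.normal(0.0, 5.0, size=(n_trials, t.size))
    component = component_amp * np.exp(-((t - 0.2) ** 2) / (2 * 0.03 ** 2))
    return noise + component

background = simulate_trials(100, 0.0)    # figure-absent trials
figure = simulate_trials(100, -3.0)       # figure-present trials

# difference wave isolates the figure-related activity from shared responses
diff_wave = figure.mean(axis=0) - background.mean(axis=0)
orn_amplitude = diff_wave[(t > 0.1) & (t < 0.3)].min()
```

Subtracting conditions cancels activity common to both, so only the component tied to hearing the figure survives in `diff_wave`; its amplitude can then be related to coherence, duration, or detection performance.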
Affiliation(s)
- Brigitta Tóth: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Center for Computational Neuroscience and Neural Technology, Boston University, Boston, USA.
- Zsuzsanna Kocsis: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Department of Cognitive Science, Faculty of Natural Sciences, Budapest University of Technology and Economics, Budapest, Hungary.
- Gábor P Háden: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary.
- Ágnes Szerafin: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Department of Cognitive Science, Faculty of Natural Sciences, Budapest University of Technology and Economics, Budapest, Hungary.
- István Winkler: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Department of Cognitive and Neuropsychology, Institute of Psychology, University of Szeged, Szeged, Hungary.
11. Theta oscillations accompanying concurrent auditory stream segregation. Int J Psychophysiol 2016; 106:141-51. PMID: 27170058. DOI: 10.1016/j.ijpsycho.2016.05.002.
Abstract
The ability to isolate a single sound source among concurrent sources is crucial for veridical auditory perception. The present study investigated the event-related oscillations evoked by complex tones that could be perceived as a single sound, and by tonal complexes with cues promoting the perception of two concurrent sounds: inharmonicity, onset asynchrony, and/or a perceived difference in the source location of the component tones. In separate task conditions, participants performed a visual change detection task (visual control), watched a silent movie (passive listening), or reported for each tone whether they perceived one or two concurrent sounds (active listening). The amplitude of theta oscillations was modulated by the presence vs. absence of the cues in two time windows: 60-350 ms at 6-8 Hz (early) and 350-450 ms at 4-8 Hz (late). The early response appeared in both the passive and the active listening conditions; it did not closely match task performance; and it had a fronto-central scalp distribution. The late response was only elicited in the active listening condition; it closely matched task performance; and it had a centro-parietal scalp distribution. The neural processes reflected by these responses are probably involved in the processing of concurrent sound segregation cues, in sound categorization, and in response preparation and monitoring. The current results are compatible with the notion that theta oscillations mediate some of the processes involved in concurrent sound segregation.
12. Folland NA, Butler BE, Payne JE, Trainor LJ. Cortical Representations Sensitive to the Number of Perceived Auditory Objects Emerge between 2 and 4 Months of Age: Electrophysiological Evidence. J Cogn Neurosci 2015; 27:1060-7. DOI: 10.1162/jocn_a_00764.
Abstract
Sound waves emitted by two or more simultaneous sources reach the ear as one complex waveform. Auditory scene analysis involves parsing a complex waveform into separate perceptual representations of the sound sources [Bregman, A. S. Auditory scene analysis: The perceptual organization of sounds. London: MIT Press, 1990]. Harmonicity provides an important cue for auditory scene analysis. Normally, harmonics at integer multiples of a fundamental frequency are perceived as one sound with a pitch corresponding to the fundamental frequency. However, when one harmonic in such a complex, pitch-evoking sound is sufficiently mistuned, that harmonic emerges from the complex tone and is perceived as a separate auditory object. Previous work has shown that the percept of two objects is indexed in both children and adults by the object-related negativity component of the ERP derived from EEG recordings [Alain, C., Arnott, S. T., & Picton, T. W. Bottom–up and top–down influences on auditory scene analysis: Evidence from event-related brain potentials. Journal of Experimental Psychology: Human Perception and Performance, 27, 1072–1089, 2001]. Here we examine the emergence of object-related responses to an 8% harmonic mistuning in infants between 2 and 12 months of age. Two-month-old infants showed no significant object-related response. However, in 4- to 12-month-old infants, a significant frontally positive component was present, and by 8–12 months, a significant frontocentral object-related negativity was present, similar to that seen in older children and adults. This is in accordance with previous research demonstrating that infants younger than 4 months of age do not integrate harmonic information to perceive pitch when the fundamental is missing [He, C., Hotson, L., & Trainor, L. J. Maturation of cortical mismatch responses to occasional pitch change in early infancy: Effects of presentation rate and magnitude of change. Neuropsychologia, 47, 218–229, 2009]. The results indicate that the ability to use harmonic information to segregate simultaneous sounds emerges at the cortical level between 2 and 4 months of age.
13. Trainor LJ. The origins of music in auditory scene analysis and the roles of evolution and culture in musical creation. Philos Trans R Soc Lond B Biol Sci 2015; 370:20140089. PMID: 25646512. PMCID: PMC4321130. DOI: 10.1098/rstb.2014.0089.
Abstract
Whether music was an evolutionary adaptation that conferred survival advantages or a cultural creation has generated much debate. Consistent with an evolutionary hypothesis, music is unique to humans, emerges early in development and is universal across societies. However, the adaptive benefit of music is far from obvious. Music is highly flexible, generative and changes rapidly over time, consistent with a cultural creation hypothesis. In this paper, it is proposed that much of musical pitch and timing structure adapted to preexisting features of auditory processing that evolved for auditory scene analysis (ASA). Thus, music may have emerged initially as a cultural creation made possible by preexisting adaptations for ASA. However, some aspects of music, such as its emotional and social power, may have subsequently proved beneficial for survival and led to adaptations that enhanced musical behaviour. Ontogenetic and phylogenetic evidence is considered in this regard. In particular, enhanced auditory-motor pathways in humans that enable movement entrainment to music and consequent increases in social cohesion, and pathways enabling music to affect reward centres in the brain should be investigated as possible musical adaptations. It is concluded that the origins of music are complex and probably involved exaptation, cultural creation and evolutionary adaptation.
Affiliation(s)
- Laurel J Trainor: Department of Psychology, Neuroscience and Behaviour, McMaster University, Hamilton, Ontario, Canada; McMaster Institute for Music and the Mind, McMaster University, Hamilton, Ontario, Canada; Rotman Research Institute, Baycrest Hospital, Toronto, Ontario, Canada.
14. Bendixen A, Háden GP, Németh R, Farkas D, Török M, Winkler I. Newborn Infants Detect Cues of Concurrent Sound Segregation. Dev Neurosci 2015; 37:172-81. DOI: 10.1159/000370237.
Abstract
Separating concurrent sounds is fundamental for a veridical perception of one's auditory surroundings. Sound components that are harmonically related and start at the same time are usually grouped into a common perceptual object, whereas components that are not in harmonic relation or have different onset times are more likely to be perceived in terms of separate objects. Here we tested whether neonates are able to pick up the cues supporting this sound organization principle. We presented newborn infants with a series of complex tones with their harmonics in tune (creating the percept of a unitary sound object) and with manipulated variants, which gave the impression of two concurrently active sound sources. The manipulated variant had either one mistuned partial (single-cue condition) or the onset of this mistuned partial was also delayed (double-cue condition). Tuned and manipulated sounds were presented in random order with equal probabilities. Recording the neonates' electroencephalographic responses allowed us to evaluate their processing of the sounds. Results show that, in both conditions, mistuned sounds elicited a negative displacement of the event-related potential (ERP) relative to tuned sounds from 360 to 400 ms after sound onset. The mistuning-related ERP component resembles the object-related negativity (ORN) component in adults, which is associated with concurrent sound segregation. Delayed onset additionally led to a negative displacement from 160 to 200 ms, which was probably more related to the physical parameters of the sounds than to their perceptual segregation. The elicitation of an ORN-like response in newborn infants suggests that neonates possess the basic capabilities of segregating concurrent sounds by detecting inharmonic relations between the co-occurring sounds.
15
Marie C, Trainor LJ. Early development of polyphonic sound encoding and the high voice superiority effect. Neuropsychologia 2014; 57:50-8. [PMID: 24613759] [DOI: 10.1016/j.neuropsychologia.2014.02.023]
Abstract
Previous research suggests that when two streams of pitched tones are presented simultaneously, adults process each stream in a separate memory trace, as reflected by mismatch negativity (MMN), a component of the event-related potential (ERP). Furthermore, a superior encoding of the higher tone or voice in polyphonic sounds has been found for 7-month-old infants and both musician and non-musician adults in terms of a larger amplitude MMN in response to pitch deviant stimuli in the higher than the lower voice. These results, in conjunction with modeling work, suggest that the high voice superiority effect might originate in characteristics of the peripheral auditory system. If this is the case, the high voice superiority effect should be present in infants younger than 7 months. In the present study we tested 3-month-old infants as there is no evidence at this age of perceptual narrowing or specialization of musical processing according to the pitch or rhythmic structure of music experienced in the infant's environment. We presented two simultaneous streams of tones (high and low) with 50% of trials modified by 1 semitone (up or down), either on the higher or the lower tone, leaving 50% standard trials. Results indicate that like the 7-month-olds, 3-month-old infants process each tone in a separate memory trace and show greater saliency for the higher tone. Although MMN was smaller and later in both voices for the group of sixteen 3-month-olds compared to the group of sixteen 7-month-olds, the size of the difference in MMN for the high compared to low voice was similar across ages. These results support the hypothesis of an innate peripheral origin of the high voice superiority effect.
Affiliation(s)
- Céline Marie
- Department of Psychology, Neuroscience & Behaviour, McMaster University, 1280 Main Street West, Hamilton, Ontario, Canada L8S 4K1; McMaster Institute for Music and the Mind, Hamilton, Ontario, Canada
- Laurel J Trainor
- Department of Psychology, Neuroscience & Behaviour, McMaster University, 1280 Main Street West, Hamilton, Ontario, Canada L8S 4K1; McMaster Institute for Music and the Mind, Hamilton, Ontario, Canada; Rotman Research Institute, Baycrest Centre, Toronto, Ontario, Canada.
16
Lodhia V, Brock J, Johnson BW, Hautus MJ. Reduced object related negativity response indicates impaired auditory scene analysis in adults with autistic spectrum disorder. PeerJ 2014; 2:e261. [PMID: 24688845] [PMCID: PMC3940479] [DOI: 10.7717/peerj.261]
Abstract
Auditory Scene Analysis provides a useful framework for understanding atypical auditory perception in autism. Specifically, a failure to segregate the incoming acoustic energy into distinct auditory objects might explain the aversive reaction autistic individuals have to certain auditory stimuli or environments. Previous research with non-autistic participants has demonstrated the presence of an Object Related Negativity (ORN) in the auditory event-related potential that indexes pre-attentive processes associated with auditory scene analysis. Also evident is a later P400 component that is attention dependent and thought to be related to decision-making about auditory objects. We sought to determine whether there are differences between individuals with and without autism in the levels of processing indexed by these components. Electroencephalography (EEG) was used to measure brain responses from a group of 16 autistic adults, and 16 age- and verbal-IQ-matched typically developing adults. Auditory responses were elicited using lateralized dichotic pitch stimuli in which inter-aural timing differences create the illusory perception of a pitch that is spatially separated from a carrier noise stimulus. As in previous studies, control participants produced an ORN in response to the pitch stimuli. However, this component was significantly reduced in the participants with autism. In contrast, processing differences were not observed between the groups at the attention-dependent level (P400). These findings suggest that autistic individuals have difficulty segregating auditory stimuli into distinct auditory objects, and that this difficulty arises at an early pre-attentive level of processing.
Affiliation(s)
- Veema Lodhia
- Research Centre for Cognitive Neuroscience, School of Psychology, The University of Auckland, New Zealand
- Jon Brock
- ARC Centre of Excellence in Cognition and its Disorders, Australia; Department of Cognitive Science, Macquarie University, Sydney, Australia
- Blake W Johnson
- ARC Centre of Excellence in Cognition and its Disorders, Australia; Department of Cognitive Science, Macquarie University, Sydney, Australia
- Michael J Hautus
- Research Centre for Cognitive Neuroscience, School of Psychology, The University of Auckland, New Zealand
17
Alain C, Zendel BR, Hutka S, Bidelman GM. Turning down the noise: The benefit of musical training on the aging auditory brain. Hear Res 2014. [DOI: 10.1016/j.heares.2013.06.008]
18
Explaining the high voice superiority effect in polyphonic music: evidence from cortical evoked potentials and peripheral auditory models. Hear Res 2013; 308:60-70. [PMID: 23916754] [DOI: 10.1016/j.heares.2013.07.014]
Abstract
Natural auditory environments contain multiple simultaneously-sounding objects and the auditory system must parse the incoming complex sound wave they collectively create into parts that represent each of these individual objects. Music often similarly requires processing of more than one voice or stream at the same time, and behavioral studies demonstrate that human listeners show a systematic perceptual bias in processing the highest voice in multi-voiced music. Here, we review studies utilizing event-related brain potentials (ERPs), which support the notions that (1) separate memory traces are formed for two simultaneous voices (even without conscious awareness) in auditory cortex and (2) adults show more robust encoding (i.e., larger ERP responses) to deviant pitches in the higher than in the lower voice, indicating better encoding of the former. Furthermore, infants also show this high-voice superiority effect, suggesting that the perceptual dominance observed across studies might result from neurophysiological characteristics of the peripheral auditory system. Although musically untrained adults show smaller responses in general than musically trained adults, both groups similarly show a more robust cortical representation of the higher than of the lower voice. Finally, years of experience playing a bass-range instrument reduces but does not reverse the high voice superiority effect, indicating that although it can be modified, it is not highly neuroplastic. Results of new modeling experiments examined the possibility that characteristics of middle-ear filtering and cochlear dynamics (e.g., suppression) reflected in auditory nerve (AN) firing patterns might account for the higher-voice superiority effect. Simulations show that both place and temporal AN coding schemes predict a high-voice superiority across a wide range of interval spacings and registers. Collectively, we infer an innate, peripheral origin for the higher-voice superiority observed in human ERP and psychophysical music listening studies.
19
Alain C, Zendel BR, Hutka S, Bidelman GM. Turning down the noise: the benefit of musical training on the aging auditory brain. Hear Res 2013; 308:162-73. [PMID: 23831039] [DOI: 10.1016/j.heares.2013.06.008]
Abstract
Age-related decline in hearing abilities is a ubiquitous part of aging, and commonly impacts speech understanding, especially when there are competing sound sources. While such age effects are partially due to changes within the cochlea, difficulties typically exist beyond measurable hearing loss, suggesting that central brain processes, as opposed to simple peripheral mechanisms (e.g., hearing sensitivity), play a critical role in governing hearing abilities late into life. Current training regimens aimed at improving central auditory processing abilities have had limited success in promoting listening benefits. Interestingly, recent studies suggest that in young adults, musical training positively modifies neural mechanisms, providing robust, long-lasting improvements to hearing abilities as well as to non-auditory tasks that engage cognitive control. These results offer the encouraging possibility that musical training might be used to counteract age-related changes in auditory cognition commonly observed in older adults. Here, we reviewed studies that have examined the effects of age and musical experience on auditory cognition with an emphasis on auditory scene analysis. We infer that musical training may offer potential benefits to complex listening and might be utilized as a means to delay or even attenuate declines in auditory perception and cognition that often emerge later in life.
Affiliation(s)
- Claude Alain
- Rotman Research Institute, Baycrest Centre for Geriatric Care, Canada; Department of Psychology, University of Toronto, Canada.
- Benjamin Rich Zendel
- International Laboratory for Brain, Music and Sound Research (BRAMS), Département de Psychologie, Université de Montréal, Québec, Canada; Centre de Recherche, Institut Universitaire de Gériatrie de Montréal, Québec, Canada
- Stefanie Hutka
- Rotman Research Institute, Baycrest Centre for Geriatric Care, Canada; Department of Psychology, University of Toronto, Canada
- Gavin M Bidelman
- Institute for Intelligent Systems & School of Communication Sciences and Disorders, University of Memphis, USA
20
Herholz S, Zatorre R. Musical Training as a Framework for Brain Plasticity: Behavior, Function, and Structure. Neuron 2012; 76:486-502. [PMID: 23141061] [DOI: 10.1016/j.neuron.2012.10.011]
21
DePape AMR, Hall GBC, Tillmann B, Trainor LJ. Auditory processing in high-functioning adolescents with Autism Spectrum Disorder. PLoS One 2012; 7:e44084. [PMID: 22984462] [PMCID: PMC3440400] [DOI: 10.1371/journal.pone.0044084]
Abstract
Autism Spectrum Disorder (ASD) is a pervasive developmental disorder including abnormalities in perceptual processing. We measured perception in a battery of tests across speech (filtering, phoneme categorization, multisensory integration) and music (pitch memory, meter categorization, harmonic priming). We found that compared to controls, the ASD group showed poorer filtering, less audio-visual integration, less specialization for native phonemic and metrical categories, and a higher incidence of absolute pitch. No group differences were found in harmonic priming. Our results are discussed in a developmental framework where culture-specific knowledge acquired early compared to late in development is most impaired, perhaps because of early-accelerated brain growth in ASD. These results suggest that early auditory remediation is needed for good communication and social functioning.
Affiliation(s)
- Anne-Marie R. DePape
- Psychology, Neuroscience & Behaviour, McMaster University, Hamilton, Ontario, Canada
- Geoffrey B. C. Hall
- Psychology, Neuroscience & Behaviour, McMaster University, Hamilton, Ontario, Canada
- Psychiatry and Behavioural Neurosciences, McMaster University, Hamilton, Ontario, Canada
- Barbara Tillmann
- Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics Team, CNRS-UMR 5292, INSERM U1028, Université Lyon 1, Lyon, Rhône-Alpes, France
- Laurel J. Trainor
- Psychology, Neuroscience & Behaviour, McMaster University, Hamilton, Ontario, Canada
- Rotman Research Institute, Baycrest Hospital, Toronto, Ontario, Canada
22
Marie C, Trainor LJ. Development of simultaneous pitch encoding: infants show a high voice superiority effect. Cereb Cortex 2012; 23:660-9. [PMID: 22419678] [DOI: 10.1093/cercor/bhs050]
Abstract
Infants must learn to make sense of real-world auditory environments containing simultaneous and overlapping sounds. In adults, event-related potential studies have demonstrated the existence of separate preattentive memory traces for concurrent note sequences and revealed perceptual dominance for encoding of the voice with higher fundamental frequency of 2 simultaneous tones or melodies. Here, we presented 2 simultaneous streams of notes (15 semitones apart) to 7-month-old infants. On 50% of trials, either the higher or the lower note was modified by one semitone, up or down, leaving 50% standard trials. Infants showed mismatch negativity (MMN) to changes in both voices, indicating separate memory traces for each voice. Furthermore, MMN was earlier and larger for the higher voice as in adults. When in the context of a second voice, representation of the lower voice was decreased and that of the higher voice increased compared with when each voice was presented alone. Additionally, correlations between MMN amplitude and amount of weekly music listening suggest that experience affects the development of auditory memory. In sum, the ability to process simultaneous pitches and the dominance of the highest voice emerge early during infancy and are likely important for the perceptual organization of sound in realistic environments.
Affiliation(s)
- Céline Marie
- Department of Psychology, Neuroscience & Behaviour, McMaster University, Hamilton, Ontario L8S 4K1, Canada
23
Trainor LJ. Predictive information processing is a fundamental learning mechanism present in early development: Evidence from infants. Int J Psychophysiol 2012; 83:256-8. [DOI: 10.1016/j.ijpsycho.2011.12.008]