1
Chang A, Teng X, Assaneo MF, Poeppel D. The human auditory system uses amplitude modulation to distinguish music from speech. PLoS Biol 2024; 22:e3002631. PMID: 38805517; PMCID: PMC11132470; DOI: 10.1371/journal.pbio.3002631.
Abstract
Music and speech are complex and distinct auditory signals that are both foundational to the human experience. The mechanisms underpinning each domain are widely investigated. However, what perceptual mechanism transforms a sound into music or speech, and what basic acoustic information is required to distinguish between them, remain open questions. Here, we hypothesized that a sound's amplitude modulation (AM), an essential temporal acoustic feature driving the auditory system across processing levels, is critical for distinguishing music and speech. Specifically, in contrast to paradigms using naturalistic acoustic signals (which can be challenging to interpret), we used a noise-probing approach to untangle the auditory mechanism: if AM rate and regularity are critical for perceptually distinguishing music and speech, then judgments of artificially noise-synthesized, ambiguous audio signals should align with their AM parameters. Across 4 experiments (N = 335), signals with a higher peak AM frequency tended to be judged as speech, and those with a lower peak AM frequency as music. Interestingly, this principle is consistently used by all listeners for speech judgments, but only by musically sophisticated listeners for music. In addition, signals with more regular AM are judged as music over speech, and this feature is more critical for music judgment, regardless of musical sophistication. The data suggest that the auditory system can rely on an acoustic property as low-level as AM to distinguish music from speech, a simple principle that invites both neurophysiological and evolutionary experiments and speculations.
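The "peak AM frequency" measure central to this abstract is not reproduced from the authors' pipeline, but the general idea (extract an amplitude envelope, take its spectrum, find the dominant modulation rate) can be illustrated with a minimal numpy sketch. The function name, the 10 ms smoothing window, and the 32 Hz modulation-band cutoff are illustrative choices, not the paper's parameters:

```python
import numpy as np

def peak_am_frequency(sound, fs, max_am_hz=32.0):
    """Estimate the dominant amplitude-modulation rate of a signal (Hz)."""
    # Amplitude envelope: rectify, then smooth with a 10 ms moving average.
    env = np.abs(sound)
    win = max(int(fs * 0.01), 1)
    env = np.convolve(env, np.ones(win) / win, mode="same")
    env -= env.mean()  # remove DC so the 0 Hz bin does not dominate
    # Modulation spectrum: magnitude FFT of the envelope.
    spectrum = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(env.size, d=1.0 / fs)
    band = freqs <= max_am_hz
    return freqs[band][np.argmax(spectrum[band])]

# Synthetic check: white noise amplitude-modulated at 4 Hz, a rate the
# abstract associates with speech-like judgments.
fs = 16000
t = np.arange(2 * fs) / fs  # 2 s of audio
rng = np.random.default_rng(0)
noise = rng.standard_normal(t.size)
modulated = (1.0 + np.sin(2.0 * np.pi * 4.0 * t)) * noise
print(peak_am_frequency(modulated, fs))
```

Run on the synthetic stimulus, the estimate recovers a modulation rate near the imposed 4 Hz; on real recordings one would expect speech to peak at higher AM rates than music, per the abstract.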
Affiliation(s)
- Andrew Chang: Department of Psychology, New York University, New York, New York, United States of America
- Xiangbin Teng: Department of Psychology, Chinese University of Hong Kong, Hong Kong SAR, China
- M. Florencia Assaneo: Instituto de Neurobiología, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, México
- David Poeppel: Department of Psychology, New York University, New York, New York, United States of America; Ernst Struengmann Institute for Neuroscience, Frankfurt am Main, Germany; Center for Language, Music, and Emotion (CLaME), New York University, New York, New York, United States of America; Music and Audio Research Lab (MARL), New York University, New York, New York, United States of America
2
Yu CY, Cabildo A, Grahn JA, Vanden Bosch der Nederlanden CM. Perceived rhythmic regularity is greater for song than speech: examining acoustic correlates of rhythmic regularity in speech and song. Front Psychol 2023; 14:1167003. PMID: 37303916; PMCID: PMC10250601; DOI: 10.3389/fpsyg.2023.1167003.
Abstract
Rhythm is a key feature of music and language, but the way rhythm unfolds within each domain differs. Music induces perception of a beat, a regular repeating pulse spaced by roughly equal durations, whereas speech does not have the same isochronous framework. Although rhythmic regularity is a defining feature of music and language, it is difficult to derive acoustic indices of the differences in rhythmic regularity between domains. The current study examined whether participants could provide subjective ratings of rhythmic regularity for acoustically matched (syllable-, tempo-, and contour-matched) and acoustically unmatched (varying in tempo, syllable number, semantics, and contour) exemplars of speech and song. We used subjective ratings to index the presence or absence of an underlying beat and correlated ratings with stimulus features to identify acoustic metrics of regularity. Experiment 1 highlighted that ratings based on the term "rhythmic regularity" did not yield consistent definitions of regularity across participants, with opposite ratings for participants who adopted a beat-based definition (song greater than speech), a normal-prosody definition (speech greater than song), or an unclear definition (no difference). Experiment 2 instead defined rhythmic regularity as how easy it would be to tap or clap to the utterances. Participants rated song as easier to clap or tap to than speech for both acoustically matched and unmatched datasets. Subjective regularity ratings from Experiment 2 showed that stimuli with longer syllable durations and less spectral flux were rated as more rhythmically regular across domains. Our findings demonstrate that rhythmic regularity distinguishes speech from song and that several key acoustic features can be used to predict listeners' perception of rhythmic regularity within and across domains.
Affiliation(s)
- Chu Yi Yu: The Brain and Mind Institute, Western University, London, ON, Canada; Department of Psychology, Western University, London, ON, Canada
- Anne Cabildo: Department of Psychology, University of Toronto, Mississauga, ON, Canada
- Jessica A. Grahn: The Brain and Mind Institute, Western University, London, ON, Canada; Department of Psychology, Western University, London, ON, Canada
- Christina M. Vanden Bosch der Nederlanden: The Brain and Mind Institute, Western University, London, ON, Canada; Department of Psychology, Western University, London, ON, Canada; Department of Psychology, University of Toronto, Mississauga, ON, Canada
3
Haiduk F, Fitch WT. Understanding Design Features of Music and Language: The Choric/Dialogic Distinction. Front Psychol 2022; 13:786899. PMID: 35529579; PMCID: PMC9075586; DOI: 10.3389/fpsyg.2022.786899.
Abstract
Music and spoken language share certain characteristics: both consist of sequences of acoustic elements that are combinatorially combined, and these elements partition the same continuous acoustic dimensions (frequency, formant space and duration). However, the resulting categories differ sharply: scale tones and note durations of small integer ratios appear in music, while speech uses phonemes, lexical tone, and non-isochronous durations. Why did music and language diverge into the two systems we have today, differing in these specific features? We propose a framework based on information theory and a reverse-engineering perspective, suggesting that design features of music and language are a response to their differential deployment along three different continuous dimensions. These include the familiar propositional-aesthetic ('goal') and repetitive-novel ('novelty') dimensions, and a dialogic-choric ('interactivity') dimension that is our focus here. Specifically, we hypothesize that music exhibits specializations enhancing coherent production by several individuals concurrently: the 'choric' context. In contrast, language is specialized for exchange in tightly coordinated turn-taking: 'dialogic' contexts. We examine the evidence for our framework, both from humans and non-human animals, and conclude that many proposed design features of music and language follow naturally from their use in distinct dialogic and choric communicative contexts. Furthermore, the hybrid nature of intermediate systems like poetry, chant, or solo lament follows from their deployment in less typical interactive contexts.
Affiliation(s)
- Felix Haiduk: Department of Behavioral and Cognitive Biology, University of Vienna, Vienna, Austria
- W. Tecumseh Fitch: Department of Behavioral and Cognitive Biology, University of Vienna, Vienna, Austria; Vienna Cognitive Science Hub, University of Vienna, Vienna, Austria
4
The influence of memory on the speech-to-song illusion. Mem Cognit 2022; 50:1804-1815. PMID: 35083717; PMCID: PMC9767999; DOI: 10.3758/s13421-021-01269-9.
Abstract
In the speech-to-song illusion, a spoken phrase is presented repeatedly and begins to sound as if it is being sung. Anecdotal reports suggest that subsequent presentations of a previously heard phrase enhance the illusion, even if several hours or days have elapsed between presentations. In Experiment 1, we examined in a controlled laboratory setting whether memory traces for a previously heard phrase would influence song-like ratings of a subsequent presentation of that phrase. The results showed that word lists that were played several times throughout the experimental session were rated as more song-like at the end of the experiment than word lists that were played only once. In Experiment 2, we examined whether the memory traces that influenced the speech-to-song illusion were abstract in nature or exemplar-based, by playing some word lists several times during the experiment in the same voice and playing other word lists several times in different voices. The results showed that word lists played in the same voice were rated as more song-like at the end of the experiment than word lists played in different voices. Many previous studies have examined how various aspects of the stimulus itself influence the perception of the speech-to-song illusion. The present experiments demonstrate that memory traces of the stimulus also influence the illusion.
5
Mullin HAC, Norkey EA, Kodwani A, Vitevitch MS, Castro N. Does age affect perception of the Speech-to-Song Illusion? PLoS One 2021; 16:e0250042. PMID: 33872326; PMCID: PMC8055000; DOI: 10.1371/journal.pone.0250042.
Abstract
The Speech-to-Song Illusion is an auditory illusion that occurs when a spoken phrase is repeatedly presented. After several presentations, listeners report that the phrase seems to be sung rather than spoken. Previous work [1] indicates that the mechanisms of priming, activation, and satiation found in the language-processing model Node Structure Theory (NST) may account for the Speech-to-Song Illusion. NST also accounts for other language-related phenomena, including the increased experience in older adults of the tip-of-the-tongue state (where you know a word but cannot retrieve it). Based on the mechanism in NST used to account for the age-related increase in the tip-of-the-tongue phenomenon, we predicted that older adults may be less likely to experience the Speech-to-Song Illusion than younger adults. Adults across a wide range of ages heard a stimulus known to evoke the Speech-to-Song Illusion. They were then asked to indicate whether they experienced the illusion (Study 1), to respond on a 5-point song-likeness rating scale (Study 2), or to indicate when the percept changed from speech to song (Study 3). The results of these studies suggest that the illusion is experienced with similar frequency and strength, and after the same number of repetitions, by adult listeners regardless of age.
Affiliation(s)
- Evan A. Norkey: University of Kansas, Lawrence, KS, United States of America
- Anisha Kodwani: University of Kansas, Lawrence, KS, United States of America
- Nichol Castro: University at Buffalo, Buffalo, NY, United States of America
6
Vitevitch MS, Ng JW, Hatley E, Castro N. Phonological but not semantic influences on the speech-to-song illusion. Q J Exp Psychol (Hove) 2021; 74:585-597. PMID: 33089742; PMCID: PMC8287799; DOI: 10.1177/1747021820969144.
Abstract
In the speech-to-song illusion, a spoken phrase begins to sound as if it is being sung after several repetitions. Castro et al. (2018) used Node Structure Theory (NST; MacKay, 1987), a model of speech perception and production, to explain how the illusion occurs. Two experiments further test the mechanisms found in NST (priming, activation, and satiation) as an account of the speech-to-song illusion. In Experiment 1, words varying in phonological clustering coefficient influenced how quickly a lexical node could recover from satiation, thereby influencing song-like ratings of lists of words that were high versus low in phonological clustering coefficient. In Experiment 2, we used equivalence testing (i.e., the TOST procedure) to demonstrate that once lexical nodes are satiated, the higher-level semantic information associated with a word cannot differentially influence song-like ratings of lists of words varying in emotional arousal. The results of these two experiments further support the NST account of the speech-to-song illusion.