1. Winter B. The size and shape of sound: The role of articulation and acoustics in iconicity and crossmodal correspondences. J Acoust Soc Am 2025; 157:2636-2656. [PMID: 40202363] [DOI: 10.1121/10.0036362]
Abstract
Onomatopoeias like hiss and peep are iconic because their forms resemble their meanings. Iconicity can also involve forms and meanings in different modalities, such as when people match the nonce words bouba and kiki to round and angular objects, and mil and mal to small and large ones, also known as "sound symbolism." This paper focuses on what specific analogies motivate such correspondences in spoken language: do people associate shapes and size with how phonemes sound (auditory), or how they are produced (articulatory)? Based on a synthesis of empirical evidence probing the cognitive mechanisms underlying different types of sound symbolism, this paper argues that analogies based on acoustics alone are often sufficient, rendering extant articulatory explanations for many iconic phenomena superfluous. This paper further suggests that different types of crossmodal iconicity in spoken language can fruitfully be understood as an extension of onomatopoeia: when speakers iconically depict such perceptual characteristics as size and shape, they mimic the acoustics that are correlated with these characteristics in the natural world.
Affiliation(s)
- Bodo Winter
- Department of Linguistics and Communication, University of Birmingham, Birmingham B15 2TT, United Kingdom
2. Bradshaw AR. Universally memorable voices. Nat Hum Behav 2025; 9:648-649. [PMID: 40011685] [DOI: 10.1038/s41562-025-02113-9]
Affiliation(s)
- Abigail R Bradshaw
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK.
3. Anikin A, Barreda S, Reby D. A practical guide to calculating vocal tract length and scale-invariant formant patterns. Behav Res Methods 2024; 56:5588-5604. [PMID: 38158551] [PMCID: PMC11525281] [DOI: 10.3758/s13428-023-02288-x]
Abstract
Formants (vocal tract resonances) are increasingly analyzed not only by phoneticians in speech but also by behavioral scientists studying diverse phenomena such as acoustic size exaggeration and articulatory abilities of non-human animals. This often involves estimating vocal tract length acoustically and producing scale-invariant representations of formant patterns. We present a theoretical framework and practical tools for carrying out this work, including open-source software solutions included in R packages soundgen and phonTools. Automatic formant measurement with linear predictive coding is error-prone, but formant_app provides an integrated environment for formant annotation and correction with visual and auditory feedback. Once measured, formants can be normalized using a single recording (intrinsic methods) or multiple recordings from the same individual (extrinsic methods). Intrinsic speaker normalization can be as simple as taking formant ratios and calculating the geometric mean as a measure of overall scale. The regression method implemented in the function estimateVTL calculates the apparent vocal tract length assuming a single-tube model, while its residuals provide a scale-invariant vowel space based on how far each formant deviates from equal spacing (the schwa function). Extrinsic speaker normalization provides more accurate estimates of speaker- and vowel-specific scale factors by pooling information across recordings with simple averaging or mixed models, which we illustrate with example datasets and R code. The take-home messages are to record several calls or vowels per individual, measure at least three or four formants, check formant measurements manually, treat uncertain values as missing, and use the statistical tools best suited to each modeling context.
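The single-tube regression described in this abstract can be illustrated with a short numerical sketch: for a uniform tube closed at the glottis and open at the lips, the n-th resonance falls at F_n = (2n - 1)c / 4L, so the apparent vocal tract length follows from a zero-intercept regression of the measured formants on the odd numbers 1, 3, 5, .... The Python below is a minimal sketch of that textbook model, not the estimateVTL implementation in soundgen, and the example formant values are hypothetical.

import numpy as np

def estimate_vtl_single_tube(formants_hz, speed_of_sound=35400.0):
    """Apparent vocal tract length (cm) from formant frequencies (Hz).

    Assumes a uniform tube closed at the glottis and open at the lips, so the
    n-th resonance is F_n = (2n - 1) * c / (4 * L). A zero-intercept fit of
    F_n on (2n - 1) has slope c / (4 * L), hence L = c / (4 * slope).
    speed_of_sound is in cm/s, so the returned length is in cm.
    """
    formants = np.asarray(formants_hz, dtype=float)
    odd = 2 * np.arange(1, len(formants) + 1) - 1   # 1, 3, 5, ... (quarter-wave resonances)
    slope = np.sum(odd * formants) / np.sum(odd ** 2)
    return speed_of_sound / (4 * slope)

# Hypothetical, roughly schwa-like formants for an adult speaker
print(round(estimate_vtl_single_tube([500, 1500, 2500, 3500]), 1))  # 17.7 (cm)

The residuals of such a fit, i.e., how far each measured formant deviates from the equally spaced pattern implied by the slope, are what the abstract describes as a scale-invariant vowel space (the schwa function).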
Affiliation(s)
- Andrey Anikin
- Division of Cognitive Science, Department of Philosophy, Lund University, Box 192, SE-221 00, Lund, Sweden.
- ENES Bioacoustics Research Laboratory, CRNL Center for Research in Neuroscience in Lyon, University of Saint Étienne, 42023, St-Étienne, France.
- Santiago Barreda
- Department of Linguistics, University of California, Davis, Davis, CA, USA
- David Reby
- ENES Bioacoustics Research Laboratory, CRNL Center for Research in Neuroscience in Lyon, University of Saint Étienne, 42023, St-Étienne, France
- Institut Universitaire de France, 75005, Paris, France
4. Belyk M, Carignan C, McGettigan C. An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images. Behav Res Methods 2024; 56:2623-2635. [PMID: 37507650] [PMCID: PMC10990993] [DOI: 10.3758/s13428-023-02171-9]
Abstract
Real-time magnetic resonance imaging (rtMRI) is a technique that provides high-contrast videographic data of human anatomy in motion. Applied to the vocal tract, it is a powerful method for capturing the dynamics of speech and other vocal behaviours by imaging structures internal to the mouth and throat. These images provide a means of studying the physiological basis for speech, singing, expressions of emotion, and swallowing that are otherwise not accessible for external observation. However, taking quantitative measurements from these images is notoriously difficult. We introduce a signal processing pipeline that produces outlines of the vocal tract from the lips to the larynx as a quantification of the dynamic morphology of the vocal tract. Our approach performs simple tissue classification, but constrained to a researcher-specified region of interest. This combination facilitates feature extraction while retaining the domain-specific expertise of a human analyst. We demonstrate that this pipeline generalises well across datasets covering behaviours such as speech, vocal size exaggeration, laughter, and whistling, as well as producing reliable outcomes across analysts, particularly among users with domain-specific expertise. With this article, we make this pipeline available for immediate use by the research community, and further suggest that it may contribute to the continued development of fully automated methods based on deep learning algorithms.
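As a rough companion to the pipeline described above, the snippet below shows the general idea of tissue classification constrained to a researcher-specified region of interest: intensities inside a polygonal ROI are thresholded and the resulting binary mask is outlined. It is a generic scikit-image sketch under assumed inputs (a single 2-D rtMRI frame and hand-picked ROI vertices), not the published toolbox, whose segmentation and post-processing steps are more involved.

import numpy as np
from skimage.draw import polygon2mask
from skimage.filters import threshold_otsu
from skimage.measure import find_contours

def outline_within_roi(frame, roi_vertices):
    """Outline bright (tissue) pixels inside a polygonal region of interest.

    frame        : 2-D array, one rtMRI image (higher intensity = soft tissue).
    roi_vertices : sequence of (row, col) points delimiting the ROI, e.g. a
                   rough region running from the lips to the larynx.
    Returns a list of contours, each an (M, 2) array of (row, col) coordinates.
    """
    roi = polygon2mask(frame.shape, np.asarray(roi_vertices))
    level = threshold_otsu(frame[roi])        # threshold computed from ROI pixels only
    tissue = (frame >= level) & roi           # simple tissue classification within the ROI
    return find_contours(tissue.astype(float), 0.5)

# Hypothetical usage with a synthetic frame and a hand-drawn rectangular ROI
frame = np.random.default_rng(0).random((84, 84))
contours = outline_within_roi(frame, [(10, 10), (10, 70), (70, 70), (70, 10)])
print(len(contours))

Restricting both the threshold and the classification to the ROI is what keeps the analyst's anatomical judgement in the loop, which is the design point the abstract emphasises.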
Affiliation(s)
- Michel Belyk
- Department of Psychology, Edge Hill University, Ormskirk, UK.
- Christopher Carignan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Carolyn McGettigan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
5. Ruthven M, Peplinski AM, Adams DM, King AP, Miquel ME. Real-time speech MRI datasets with corresponding articulator ground-truth segmentations. Sci Data 2023; 10:860. [PMID: 38042857] [PMCID: PMC10693552] [DOI: 10.1038/s41597-023-02766-z]
Abstract
The use of real-time magnetic resonance imaging (rt-MRI) of speech is increasing in clinical practice and speech science research. Analysis of such images often requires segmentation of articulators and the vocal tract, and the community is turning to deep-learning-based methods to perform this segmentation. While there are publicly available rt-MRI datasets of speech, these do not include ground-truth (GT) segmentations, a key requirement for the development of deep-learning-based segmentation methods. To begin to address this barrier, this work presents rt-MRI speech datasets of five healthy adult volunteers with corresponding GT segmentations and velopharyngeal closure patterns. The images were acquired using standard clinical MRI scanners, coils and sequences to facilitate acquisition of similar images in other centres. The datasets include manually created GT segmentations of six anatomical features including the tongue, soft palate and vocal tract. In addition, this work makes code and instructions to implement a current state-of-the-art deep-learning-based method to segment rt-MRI speech datasets publicly available, thus providing the community and others with a starting point for developing such methods.
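Methods developed against such GT segmentations are usually scored by the overlap between predicted and ground-truth label maps, most often with the Dice coefficient. The sketch below is a minimal, generic implementation of that metric for multi-class masks; the class labels, array shapes, and example data are assumptions for illustration and do not reflect the dataset's actual label encoding.

import numpy as np

def dice_per_class(prediction, ground_truth, n_classes):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for each label in two label maps.

    prediction, ground_truth : integer arrays of identical shape in which each
    pixel holds a class index (e.g. 0 = background, 1 = tongue, 2 = soft palate).
    Classes absent from both maps are returned as NaN.
    """
    scores = np.full(n_classes, np.nan)
    for c in range(n_classes):
        pred_c = prediction == c
        gt_c = ground_truth == c
        denom = pred_c.sum() + gt_c.sum()
        if denom > 0:
            scores[c] = 2.0 * np.logical_and(pred_c, gt_c).sum() / denom
    return scores

# Hypothetical label maps with three classes (background, tongue, soft palate)
gt = np.random.default_rng(1).integers(0, 3, size=(64, 64))
pred = gt.copy()
pred[:8] = 0                      # simulate an error along the top of the image
print(dice_per_class(pred, gt, n_classes=3))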
Affiliation(s)
- Matthieu Ruthven
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
- School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London, SE1 7EH, UK
- David M Adams
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
- Andrew P King
- School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London, SE1 7EH, UK
- Marc Eric Miquel
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK.
- Digital Environment Research Institute (DERI), Empire House, 67-75 New Road, Queen Mary University of London, London, E1 1HH, UK.
- Advanced Cardiovascular Imaging, Barts NIHR BRC, Queen Mary University of London, London, EC1M 6BQ, UK.
6. Belyk M, McGettigan C. Real-time magnetic resonance imaging reveals distinct vocal tract configurations during spontaneous and volitional laughter. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210511. [PMID: 36126659] [PMCID: PMC9489295] [DOI: 10.1098/rstb.2021.0511]
Abstract
A substantial body of acoustic and behavioural evidence points to the existence of two broad categories of laughter in humans: spontaneous laughter that is emotionally genuine and somewhat involuntary, and volitional laughter that is produced on demand. In this study, we tested the hypothesis that these are also physiologically distinct vocalizations, by measuring and comparing them using real-time magnetic resonance imaging (rtMRI) of the vocal tract. Following Ruch and Ekman (2001, in Emotions, qualia, and consciousness (ed. A Kaszniak), pp. 426-443), we further predicted that spontaneous laughter should be relatively less speech-like (i.e. less articulate) than volitional laughter. We collected rtMRI data from five adult human participants during spontaneous laughter, volitional laughter and spoken vowels. We report distinguishable vocal tract shapes during the vocalic portions of these three vocalization types, where volitional laughs were intermediate between spontaneous laughs and vowels. Inspection of local features within the vocal tract across the different vocalization types offers some additional support for Ruch and Ekman's predictions. We discuss our findings in light of a dual pathway hypothesis for the neural control of human volitional and spontaneous vocal behaviours, identifying tongue shape and velum lowering as potential biomarkers of spontaneous laughter to be investigated in future research. This article is part of the theme issue 'Cracking the laugh code: laughter through the lens of biology, psychology and neuroscience'.
Affiliation(s)
- Michel Belyk
- Department of Psychology, Edge Hill University, Ormskirk L39 4QP, UK
- Department of Speech, Hearing and Phonetic Sciences, University College London, London WC1N 1PF, UK
- Carolyn McGettigan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London WC1N 1PF, UK