1
Assaneo MF, Orpella J. Rhythms in Speech. Adv Exp Med Biol 2024; 1455:257-274. [PMID: 38918356] [DOI: 10.1007/978-3-031-60183-5_14]
Abstract
Speech can be defined as the human ability to communicate through a sequence of vocal sounds. Consequently, speech requires an emitter (the speaker) capable of generating the acoustic signal and a receiver (the listener) able to successfully decode the sounds produced by the emitter (i.e., the acoustic signal). Time plays a central role at both ends of this interaction. On the one hand, speech production requires precise and rapid coordination, typically on the order of milliseconds, of the upper vocal tract articulators (i.e., tongue, jaw, lips, and velum), their composite movements, and the activation of the vocal folds. On the other hand, the generated acoustic signal unfolds in time, carrying information at different timescales. This information must be parsed and integrated by the receiver for the correct transmission of meaning. This chapter describes the temporal patterns that characterize the speech signal and reviews research exploring the neural mechanisms underlying the generation of these patterns and the role they play in speech comprehension.
Affiliation(s)
- M Florencia Assaneo
- Instituto de Neurobiología, Universidad Nacional Autónoma de México, Santiago de Querétaro, Mexico.
- Joan Orpella
- Department of Neuroscience, Georgetown University Medical Center, Washington, DC, USA
2
Modulation transfer functions for audiovisual speech. PLoS Comput Biol 2022; 18:e1010273. [PMID: 35852989] [PMCID: PMC9295967] [DOI: 10.1371/journal.pcbi.1010273]
Abstract
Temporal synchrony between facial motion and acoustic modulations is a hallmark feature of audiovisual speech. The moving face and mouth during natural speech are known to be correlated with low-frequency acoustic envelope fluctuations (below 10 Hz), but the precise rates at which envelope information is synchronized with motion in different parts of the face are less clear. Here, we used regularized canonical correlation analysis (rCCA) to learn speech envelope filters whose outputs correlate with motion in different parts of the speaker's face. We leveraged recent advances in video-based 3D facial landmark estimation, allowing us to examine statistical envelope-face correlations across a large number of speakers (∼4000). Specifically, rCCA was used to learn modulation transfer functions (MTFs) for the speech envelope that significantly predict correlation with facial motion across different speakers. The AV analysis revealed bandpass speech envelope filters at distinct temporal scales. A first set of MTFs showed peaks around 3-4 Hz and were correlated with mouth movements. A second set of MTFs captured envelope fluctuations in the 1-2 Hz range correlated with more global face and head motion. These two distinctive timescales emerged only as a property of natural AV speech statistics across many speakers. A similar analysis of fewer speakers performing a controlled speech task highlighted only the well-known temporal modulations around 4 Hz correlated with orofacial motion. The different bandpass ranges of AV correlation align notably with the average rates at which syllables (3-4 Hz) and phrases (1-2 Hz) are produced in natural speech. Whereas periodicities at the syllable rate are evident in the envelope spectrum of the speech signal itself, slower 1-2 Hz regularities thus only become prominent when considering crossmodal signal statistics. This may indicate a motor origin of temporal regularities at the timescales of syllables and phrases in natural speech.
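The envelope-face pipeline is straightforward to prototype. Below is a minimal sketch of the kind of analysis the abstract describes, not the authors' code: the list `speakers` and its contents are hypothetical inputs, and scikit-learn's plain CCA stands in for the regularized CCA (rCCA) reported in the paper.

```python
# Sketch only: `speakers` is a hypothetical list of (audio, sr, face) tuples,
# where `face` holds per-frame facial-landmark motion (n_frames x n_landmarks).
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt, resample
from sklearn.cross_decomposition import CCA  # plain CCA standing in for rCCA

def envelope_bands(audio, sr, n_frames, edges=(0.5, 1, 2, 4, 8)):
    """Split the amplitude envelope into candidate modulation bands,
    resampled to the video frame count."""
    env = np.abs(hilbert(audio))  # broadband amplitude envelope
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(2, (lo, hi), btype="bandpass", fs=sr, output="sos")
        bands.append(resample(sosfiltfilt(sos, env), n_frames))
    return np.column_stack(bands)  # (n_frames, n_bands)

# Pool time samples across speakers, then learn envelope-band weightings whose
# projections covary with facial motion -- an analogue of the learned MTFs.
X = np.vstack([envelope_bands(a, sr, f.shape[0]) for a, sr, f in speakers])
Y = np.vstack([f for _, _, f in speakers])
cca = CCA(n_components=2).fit(X, Y)
print("canonical envelope-band weights:\n", cca.x_weights_)
```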
3
Canault M, Yamaguchi N, Paillereau N, Krzonowski J, Roy JP, Dos Santos C, Kern S. Syllable duration changes during babbling: a longitudinal study of French infant productions. J Child Lang 2020; 47:1207-1227. [PMID: 32347197] [DOI: 10.1017/s030500092000015x]
Abstract
At the babbling stage, the syllable does not yet have the temporal characteristics of adult syllables because of the infant's limited oro-motor skills. This research aims to further our knowledge of syllable duration and temporal variability and of their evolution with age as indicators of the development of articulatory skills. The possible impact on these parameters of syllable position, of the type of intrasyllabic association, and of intersyllabic articulatory changes was also tested. Oral productions of 22 French infants were recorded monthly from 8 to 14 months, and 11,261 consonant-vowel (CV) syllables were annotated and temporally analyzed. Mean duration varied with syllable position, but not with intrasyllabic or intersyllabic articulatory changes. Moreover, syllable duration decreased significantly from the age of 10 months onwards, whereas temporal variability remained the same.
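As a rough illustration of the duration measures reported above (not the study's own pipeline), here is a minimal sketch assuming a hypothetical list `syllables` of (age_months, duration_s) pairs taken from such annotations; the coefficient of variation stands in for the temporal-variability measure.

```python
# Sketch only: `syllables` is a hypothetical list of (age_months, duration_s)
# pairs taken from the annotated CV syllables.
import numpy as np
from collections import defaultdict

by_age = defaultdict(list)
for age, dur in syllables:
    by_age[age].append(dur)

for age in sorted(by_age):
    d = np.asarray(by_age[age])
    cv = d.std(ddof=1) / d.mean()  # coefficient of variation ~ temporal variability
    print(f"{age} mo: mean {d.mean() * 1000:.0f} ms, CV {cv:.2f} (n={len(d)})")
```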
Affiliation(s)
- Mélanie Canault
- Laboratoire Dynamique du Langage, UMR 5596 CNRS, Université Lumière Lyon 2, Institut des Sciences et Techniques de la Réadaptation, 69008 Lyon, France
- Naomi Yamaguchi
- Laboratoire de Phonétique et Phonologie, UMR 7018 (Sorbonne-Nouvelle & CNRS), 75005 Paris, France
- Nikola Paillereau
- Laboratoire de Phonétique et Phonologie, UMR 7018 (Sorbonne-Nouvelle & CNRS), 75005 Paris, France
- Institute of Psychology, Czech Academy of Sciences, Prague, Czech Republic
- Jennifer Krzonowski
- Laboratoire Dynamique du Langage, UMR 5596 CNRS, Université Lumière Lyon 2, Institut des Sciences et Techniques de la Réadaptation, 69008 Lyon, France
- Johanna-Pascale Roy
- Laboratoire de phonétique, Département de langues, linguistique et traduction, Université Laval, Québec, Canada
- Sophie Kern
- Laboratoire Dynamique du Langage, UMR 5596 CNRS, Université Lumière Lyon 2, Institut des Sciences et Techniques de la Réadaptation, 69008 Lyon, France
4
Alexandrou AM, Saarinen T, Kujala J, Salmelin R. A multimodal spectral approach to characterize rhythm in natural speech. J Acoust Soc Am 2016; 139:215-226. [PMID: 26827019] [DOI: 10.1121/1.4939496]
Abstract
Human utterances demonstrate temporal patterning, also referred to as rhythm. While simple oromotor behaviors (e.g., chewing) feature a salient periodical structure, conversational speech displays a time-varying quasi-rhythmic pattern. Quantification of periodicity in speech is challenging. Unimodal spectral approaches have highlighted rhythmic aspects of speech. However, speech is a complex multimodal phenomenon that arises from the interplay of articulatory, respiratory, and vocal systems. The present study addressed the question of whether a multimodal spectral approach, in the form of coherence analysis between electromyographic (EMG) and acoustic signals, would allow one to characterize rhythm in natural speech more efficiently than a unimodal analysis. The main experimental task consisted of speech production at three speaking rates; a simple oromotor task served as control. The EMG-acoustic coherence emerged as a sensitive means of tracking speech rhythm, whereas spectral analysis of either EMG or acoustic amplitude envelope alone was less informative. Coherence metrics seem to distinguish and highlight rhythmic structure in natural speech.
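For concreteness, here is a minimal sketch of this kind of multimodal coherence analysis, not the study's implementation: `emg`, `audio`, and `fs` are hypothetical inputs, and scipy's magnitude-squared coherence between the rectified EMG and the acoustic amplitude envelope stands in for the coherence metric used in the paper.

```python
# Sketch only: `emg` and `audio` are hypothetical recordings sampled at a
# common rate `fs`; coherence is computed between the rectified EMG and the
# acoustic amplitude envelope.
import numpy as np
from scipy.signal import hilbert, coherence

emg_rect = np.abs(emg - emg.mean())   # rectified EMG
audio_env = np.abs(hilbert(audio))    # acoustic amplitude envelope

f, coh = coherence(emg_rect, audio_env, fs=fs, nperseg=int(4 * fs))  # ~0.25 Hz bins
low = f <= 10                         # speech-rhythm range
print("peak EMG-acoustic coherence at %.2f Hz" % f[low][np.argmax(coh[low])])
```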
Affiliation(s)
- Anna Maria Alexandrou
- Department of Neuroscience and Biomedical Engineering, Aalto University, FI-00076 AALTO, Finland
- Timo Saarinen
- Department of Neuroscience and Biomedical Engineering, Aalto University, FI-00076 AALTO, Finland
- Jan Kujala
- Department of Neuroscience and Biomedical Engineering, Aalto University, FI-00076 AALTO, Finland
- Riitta Salmelin
- Department of Neuroscience and Biomedical Engineering, Aalto University, FI-00076 AALTO, Finland
5
Ghazanfar AA, Takahashi DY. The evolution of speech: vision, rhythm, cooperation. Trends Cogn Sci 2014; 18:543-553. [PMID: 25048821] [PMCID: PMC4177957] [DOI: 10.1016/j.tics.2014.06.004]
Abstract
A full account of human speech evolution must consider its multisensory, rhythmic, and cooperative characteristics. Humans, apes, and monkeys recognize the correspondence between vocalizations and their associated facial postures, and gain behavioral benefits from them. Some monkey vocalizations even have a speech-like acoustic rhythmicity but lack the concomitant rhythmic facial motion that speech exhibits. We review data showing that rhythmic facial expressions such as lip-smacking may have been linked to vocal output to produce an ancestral form of rhythmic audiovisual speech. Finally, we argue that human vocal cooperation (turn-taking) may have arisen through a combination of volubility and prosociality, and provide comparative evidence from one species to support this hypothesis.
Affiliation(s)
- Asif A Ghazanfar
- Princeton Neuroscience Institute, Departments of Psychology and Ecology & Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA.
- Daniel Y Takahashi
- Princeton Neuroscience Institute, Department of Psychology, Princeton University, Princeton, NJ 08544, USA
6
Ghazanfar AA, Takahashi DY. Facial expressions and the evolution of the speech rhythm. J Cogn Neurosci 2014; 26:1196-1207.
Abstract
In primates, different vocalizations are produced, at least in part, by making different facial expressions. Not surprisingly, humans, apes, and monkeys all recognize the correspondence between vocalizations and the facial postures associated with them. However, one major dissimilarity between monkey vocalizations and human speech is that, in the latter, the acoustic output and associated movements of the mouth are both rhythmic (in the 3- to 8-Hz range) and tightly correlated, whereas monkey vocalizations have a similar acoustic rhythmicity but lack the concomitant rhythmic facial motion. This raises the question of how we evolved from a presumptive ancestral acoustic-only vocal rhythm to one that is audiovisual, with improved perceptual sensitivity. According to one hypothesis, this bisensory speech rhythm evolved through the rhythmic facial expressions of ancestral primates. If this hypothesis has any validity, we expect that extant nonhuman primates produce at least some facial expressions with a speech-like rhythm in the 3- to 8-Hz frequency range. Lip smacking, an affiliative signal observed in many genera of primates, satisfies this criterion. We review a series of studies using developmental, x-ray cineradiographic, EMG, and perceptual approaches with macaque monkeys producing lip smacks to further investigate this hypothesis. We then explore its putative neural basis and remark on important differences between lip smacking and speech production. Overall, the data support the hypothesis that lip smacking may have been an ancestral expression that was linked to vocal output to produce the original rhythmic audiovisual speech-like utterances in the human lineage.
7
Yamashita Y, Nakajima Y, Ueda K, Shimada Y, Hirsh D, Seno T, Smith BA. Acoustic analyses of speech sounds and rhythms in Japanese- and English-learning infants. Front Psychol 2013; 4:57. [PMID: 23450824] [PMCID: PMC3584442] [DOI: 10.3389/fpsyg.2013.00057]
Abstract
The purpose of this study was to explore developmental changes in spectral fluctuations and temporal periodicity in the speech of Japanese- and English-learning infants. Three age groups (15, 20, and 24 months) were selected, because infants diversify their phonetic inventories with age. Natural speech of the infants was recorded. We utilized a critical-band-filter bank, which simulated the frequency resolution of the adult auditory periphery. First, the correlations between the power fluctuations of the critical-band outputs, represented by factor analysis, were examined to see how the critical bands should be connected to each other if a listener is to differentiate sounds in infants' speech. We then analyzed the temporal fluctuations of the factor scores by calculating autocorrelations. The analysis identified, at 24 months of age in both linguistic environments, the same three factors that had been observed in adult speech. These factors were shifted to a higher frequency range, corresponding to the smaller vocal tract size of the infants. The results suggest that the vocal tract structures of the infants had developed into an adult-like configuration by 24 months of age in both language environments. The proportion of utterances with short-term periodicity increased with age in both environments, and this trend was clearer in the Japanese environment.
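A minimal sketch of this analysis chain follows; it is an illustration under stated assumptions, not the study's code. `audio` and `sr` are hypothetical inputs, third-octave bands approximate the critical-band filter bank, and scikit-learn's FactorAnalysis stands in for the factor analysis of band-power fluctuations.

```python
# Sketch only: `audio` is a hypothetical recording at rate `sr`; third-octave
# bands approximate the critical-band filter bank (band edges must stay below
# sr / 2), and FactorAnalysis stands in for the study's factor analysis.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert
from sklearn.decomposition import FactorAnalysis

centers = 250 * 2 ** np.arange(0, 5, 1 / 3)   # ~250-6350 Hz band centers
power = []
for fc in centers:
    sos = butter(2, (fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)),
                 btype="bandpass", fs=sr, output="sos")
    power.append(np.abs(hilbert(sosfiltfilt(sos, audio))) ** 2)  # band power
X = np.log(np.column_stack(power) + 1e-12)    # (n_samples, n_bands)

fa = FactorAnalysis(n_components=3).fit(X)    # three spectral factors
scores = fa.transform(X)

s = scores[:, 0] - scores[:, 0].mean()        # periodicity of factor 1:
ac = np.correlate(s, s, mode="full")[len(s) - 1:] / np.dot(s, s)  # autocorrelation
```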
Affiliation(s)
- Yuko Yamashita
- Graduate School of Design, Kyushu University, Fukuoka, Japan
- Yoshitaka Nakajima
- Department of Human Science, Center for Applied Perceptual Research, Kyushu University, Fukuoka, Japan
- Kazuo Ueda
- Department of Human Science, Center for Applied Perceptual Research, Kyushu University, Fukuoka, Japan
- Yohko Shimada
- Graduate School of Asian and African Studies, Kyoto University, Kyoto, Japan
- David Hirsh
- Faculty of Education and Social Work, University of Sydney, Sydney, NSW, Australia
- Takeharu Seno
- Faculty of Design, Institute for Advanced Study, Kyushu University, Fukuoka, Japan
8
Ghazanfar AA. Multisensory vocal communication in primates and the evolution of rhythmic speech. Behav Ecol Sociobiol 2013; 67. [PMID: 24222931] [DOI: 10.1007/s00265-013-1491-z]
Abstract
The integration of the visual and auditory modalities during human speech perception is the default mode of speech processing. That is, visual speech perception is not a capacity that is "piggybacked" onto auditory-only speech perception. Visual information from the mouth and other parts of the face is used by all perceivers to enhance auditory speech. This integration is ubiquitous and automatic, and is similar across individuals and cultures. The two modalities seem to be integrated even at the earliest stages of human cognitive development. If multisensory speech is the default mode of perception, then this should be reflected in the evolution of vocal communication. The purpose of this review is to describe the data revealing that human speech is not uniquely multisensory. In fact, the default mode of communication is multisensory in nonhuman primates as well, though perhaps emerging with a different developmental trajectory. Speech production, however, exhibits a unique bimodal rhythmic structure in that both the acoustic output and the movements of the mouth are rhythmic and tightly correlated. This structure is absent in most monkey vocalizations. One hypothesis is that the bimodal speech rhythm may have evolved through the rhythmic facial expressions of ancestral primates, as indicated by mounting comparative evidence focusing on the lip-smacking gesture.
Affiliation(s)
- Asif A Ghazanfar
- Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA; Department of Psychology, Princeton University, Princeton, NJ 08540, USA; Department of Ecology & Evolutionary Biology, Princeton University, Princeton, NJ 08540, USA
9
Ghazanfar AA, Takahashi DY, Mathur N, Fitch WT. Cineradiography of monkey lip-smacking reveals putative precursors of speech dynamics. Curr Biol 2012; 22:1176-1182. [PMID: 22658603] [PMCID: PMC3569518] [DOI: 10.1016/j.cub.2012.04.055]
Abstract
A key feature of speech is its stereotypical 5 Hz rhythm. One theory posits that this rhythm evolved through the modification of rhythmic facial movements in ancestral primates. If this hypothesis has any validity, a comparative approach may shed some light on it. We tested this idea by using cineradiography (X-ray movies) to characterize and quantify the internal dynamics of the macaque monkey vocal tract during lip-smacking (a rhythmic facial expression) versus chewing. Previous human studies showed that speech movements are faster than chewing movements and that the functional coordination between vocal tract structures differs between the two behaviors. If rhythmic speech evolved through a rhythmic ancestral facial movement, then monkey lip-smacking versus chewing should also exhibit these differences. We found that the lips, tongue, and hyoid move with a speech-like 5 Hz rhythm during lip-smacking, but not during chewing. Most importantly, the functional coordination between these structures was distinct for each behavior. These data provide empirical support for the idea that the human speech rhythm evolved from the rhythmic facial expressions of ancestral primates.
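The rhythm quantification can be illustrated with a minimal sketch, again not the authors' code: `lip_y` is a hypothetical vertical lip-position trace extracted from the cineradiographic frames, sampled at a hypothetical frame rate `fps`, and the dominant movement frequency is read off a Welch power spectrum.

```python
# Sketch only: `lip_y` is a hypothetical vertical lip-position trace extracted
# from the cineradiographic frames, sampled at frame rate `fps`.
import numpy as np
from scipy.signal import welch, detrend

f, pxx = welch(detrend(lip_y), fs=fps, nperseg=min(len(lip_y), int(8 * fps)))
band = (f >= 1) & (f <= 10)   # search the 1-10 Hz movement range
print("dominant movement rhythm: %.1f Hz" % f[band][np.argmax(pxx[band])])
```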
Affiliation(s)
- Asif A. Ghazanfar
- Neuroscience Institute, Departments of Psychology and Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08540, USA
- Daniel Y. Takahashi
- Neuroscience Institute, Department of Psychology, Princeton University, Princeton, NJ 08540, USA
- Neil Mathur
- Neuroscience Institute, Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08540, USA
- W. Tecumseh Fitch
- Department of Cognitive Biology, Faculty of Life Sciences, University of Vienna, Althanstrasse 14, A-1090 Vienna, Austria
10
Morrill RJ, Paukner A, Ferrari PF, Ghazanfar AA. Monkey lipsmacking develops like the human speech rhythm. Dev Sci 2012; 15:557-568.
Abstract
Across all languages studied to date, audiovisual speech exhibits a consistent rhythmic structure. This rhythm is critical to speech perception. Some have suggested that the speech rhythm evolved de novo in humans. An alternative account--the one we explored here--is that the rhythm of speech evolved through the modification of rhythmic facial expressions. We tested this idea by investigating the structure and development of macaque monkey lipsmacks and found that their developmental trajectory is strikingly similar to the one that leads from human infant babbling to adult speech. Specifically, we show that: (1) younger monkeys produce slower, more variable mouth movements, and as they get older, these movements become faster and less variable; and (2) this developmental pattern does not occur for another cyclical mouth movement--chewing. These patterns parallel human developmental patterns for speech and chewing. They suggest that, in both species, the two types of rhythmic mouth movements use different underlying neural circuits that develop in different ways. Ultimately, both lipsmacking and speech converge on a ~5 Hz rhythm, the frequency that characterizes the speech rhythm of human adults. We conclude that monkey lipsmacking and human speech share a homologous developmental mechanism, lending strong empirical support to the idea that the human speech rhythm evolved from the rhythmic facial expressions of our primate ancestors.
Affiliation(s)
- Ryan J. Morrill
- Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
- Department of Ecology & Evolutionary Biology, Princeton University, Princeton, NJ 08540, USA
- Annika Paukner
- Laboratory of Comparative Ethology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892-7971, USA
- Pier F. Ferrari
- Dipartimento di Biologia Evolutiva e Funzionale and Dipartimento di Neuroscienze, Università di Parma, Via Usberti 11A, 43100 Parma, Italy
- Asif A. Ghazanfar
- Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
- Department of Ecology & Evolutionary Biology, Princeton University, Princeton, NJ 08540, USA
- Department of Psychology, Princeton University, Princeton, NJ 08540, USA
11
Ruspantini I, Saarinen T, Belardinelli P, Jalava A, Parviainen T, Kujala J, Salmelin R. Corticomuscular coherence is tuned to the spontaneous rhythmicity of speech at 2-3 Hz. J Neurosci 2012; 32:3786-3790.
Abstract
Human speech features rhythmicity that frames distinctive, fine-grained speech patterns. Speech can thus be counted among rhythmic motor behaviors that generally manifest characteristic spontaneous rates. However, the critical neural evidence for tuning of articulatory control to a spontaneous rate of speech has not been uncovered. The present study examined the spontaneous rhythmicity in speech production and its relationship to cortex-muscle neurocommunication, which is essential for speech control. Our MEG results show that, during articulation, coherent oscillatory coupling between the mouth sensorimotor cortex and the mouth muscles is strongest at the frequency of spontaneous rhythmicity of speech at 2-3 Hz, which is also the typical rate of word production. Corticomuscular coherence, a measure of efficient cortex-muscle neurocommunication, thus reveals behaviorally relevant oscillatory tuning for spoken language.
12
Kovelman I, Mascho K, Millott L, Mastic A, Moiseff B, Shalinsky MH. At the rhythm of language: Brain bases of language-related frequency perception in children. Neuroimage 2012; 60:673-682. [DOI: 10.1016/j.neuroimage.2011.12.066]