1. Boorom O, Alviar C, Zhang Y, Muñoz VA, Kello CT, Lense MD. Child language and autism diagnosis impact hierarchical temporal structure of parent-child vocal interactions in early childhood. Autism Res 2022; 15:2099-2111. [PMID: 36056678 PMCID: PMC9995224 DOI: 10.1002/aur.2804]
Abstract
Timing is critical to successful social interactions. The temporal structure of dyadic vocal interactions emerges from the rhythm, timing, and frequency of each individual's vocalizations and reflects how the dyad dynamically organizes and adapts during an interaction. This study investigated the temporal structure of vocal interactions longitudinally in parent-child dyads of typically developing (TD) infants (n = 49; 9-18 months; 48% male) and toddlers with ASD (n = 23; 27.2 ± 5.0 months; 91.3% male) to identify how developing language and social skills impact the temporal dynamics of the interaction. Acoustic hierarchical temporal structure (HTS), a measure of the nested clustering of acoustic events across multiple timescales, was measured in free play interactions using Allan Factor. HTS reflects a signal's temporal complexity and variability, with greater HTS indicating reduced flexibility of the dyadic system. Child expressive language significantly predicted HTS (β = -0.2) longitudinally across TD infants, with greater dyadic HTS associated with lower child language skills. ASD dyads exhibited greater HTS (i.e., more rigid temporal structure) than nonverbal-matched (d = 0.41) and expressive-language-matched TD dyads (d = 0.28). Increased HTS in ASD dyads occurred at timescales >1 s, suggesting greater structuring of pragmatic aspects of interaction. Results provide a new window into how language development and social reciprocity serve as constraints that shape parent-child interaction dynamics, and showcase a novel automated approach to characterizing vocal interactions across multiple timescales during early childhood.
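The Allan Factor computation behind the HTS measure is straightforward once vocalizations have been reduced to a list of event times. Below is a minimal Python sketch, assuming events are given as onset times in seconds; the function name and the simple histogram binning are illustrative, and the published pipeline involves additional steps (envelope extraction, surrogate comparisons) not shown here.

```python
import numpy as np

def allan_factor(event_times, timescales, duration):
    """Allan factor AF(T) = E[(N_{k+1} - N_k)^2] / (2 E[N_k]),
    where N_k counts events in consecutive windows of width T.
    AF ~ 1 for a Poisson process; AF grows with T when events
    cluster hierarchically across timescales."""
    event_times = np.asarray(event_times, dtype=float)
    af = []
    for T in timescales:
        edges = np.arange(0.0, duration + T, T)
        counts, _ = np.histogram(event_times, bins=edges)
        af.append(np.mean(np.diff(counts) ** 2) / (2.0 * counts.mean()))
    return np.array(af)
```

Plotting AF against T on log-log axes yields the AF curve whose elevation at timescales >1 s is what distinguished the ASD dyads in this study.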
Affiliation(s)
- Olivia Boorom
- Department of Speech-Language-Hearing: Sciences and Disorders, University of Kansas, Lawrence, KS, USA
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Camila Alviar
- Department of Otolaryngology - Head & Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Cognitive and Information Sciences, University of California, Merced, Merced, CA, USA
- Yumeng Zhang
- Department of Otolaryngology - Head & Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Medicine, Health, and Society, Vanderbilt University, Nashville, TN, USA
- Valerie A. Muñoz
- Department of Otolaryngology - Head & Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, USA
- Christopher T. Kello
- Cognitive and Information Sciences, University of California, Merced, Merced, CA, USA
- Miriam D. Lense
- Department of Otolaryngology - Head & Neck Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN, USA
2. From Boltzmann to Zipf through Shannon and Jaynes. Entropy 2020; 22:e22020179. [PMID: 33285954 PMCID: PMC7516604 DOI: 10.3390/e22020179]
Abstract
The word-frequency distribution provides the fundamental building blocks that generate discourse in natural language. It is well known, from empirical evidence, that the word-frequency distribution of almost any text is described by Zipf’s law, at least approximately. Following Stephens and Bialek (2010), we interpret the frequency of any word as arising from the interaction potentials between its constituent letters. Indeed, Jaynes’ maximum-entropy principle, with the constraints given by every empirical two-letter marginal distribution, leads to a Boltzmann distribution for word probabilities, with an energy-like function given by the sum of the all-to-all pairwise (two-letter) potentials. The so-called improved iterative-scaling algorithm allows us to find the potentials from the empirical two-letter marginals. We considerably extend Stephens and Bialek’s results, applying this formalism to words of up to six letters from the English subset of the recently created Standardized Project Gutenberg Corpus. We find that the model is able to reproduce Zipf’s law, but with some limitations: the general Zipf power-law regime is obtained, but the probability of individual words shows considerable scatter. In this way, a pure statistical-physics framework is used to describe the probabilities of words. As a by-product, we find that both the empirical two-letter marginal distributions and the interaction-potential distributions follow well-defined statistical laws.
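As a rough illustration of the modelling pipeline described here, the sketch below builds a Boltzmann distribution over words from all-to-all pairwise letter potentials and applies a simplified, damped scaling update toward empirical two-letter marginals. The toy alphabet size, word length, and the update rule are assumptions made for brevity; the actual improved iterative-scaling (IIS) algorithm uses a different, provably convergent update, and `empirical` is assumed to hold the corpus two-letter marginals.

```python
import itertools
import numpy as np

ALPHABET_SIZE = 4   # toy alphabet (assumption; the paper uses 26 letters)
L = 3               # word length (the paper goes up to six)

def boltzmann(potentials):
    """p(w) ∝ exp(-E(w)), with E(w) the sum of all-to-all pairwise potentials."""
    words = list(itertools.product(range(ALPHABET_SIZE), repeat=L))
    energy = np.array([sum(potentials[(i, j)][w[i], w[j]]
                           for i in range(L) for j in range(i + 1, L))
                       for w in words])
    p = np.exp(-energy)
    return words, p / p.sum()

def model_marginal(words, p, i, j):
    """Two-letter marginal at positions (i, j) under the model distribution."""
    m = np.zeros((ALPHABET_SIZE, ALPHABET_SIZE))
    for w, pw in zip(words, p):
        m[w[i], w[j]] += pw
    return m

def scaling_sweep(potentials, empirical, lr=0.5, eps=1e-12):
    """One damped sweep nudging every pairwise marginal toward its
    empirical target (a schematic stand-in for IIS, not the real thing)."""
    words, p = boltzmann(potentials)
    for (i, j), V in potentials.items():
        m = model_marginal(words, p, i, j)
        potentials[(i, j)] = V - lr * np.log((empirical[(i, j)] + eps) / (m + eps))
    return potentials

# initialize flat potentials; `empirical` (same dict shape) is assumed given
init = {(i, j): np.zeros((ALPHABET_SIZE, ALPHABET_SIZE))
        for i in range(L) for j in range(i + 1, L)}
```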
3.
Abstract
In this work we consider the Glissando Corpus, an oral corpus of Catalan and Spanish, and empirically analyze the presence of the four classical linguistic laws (Zipf’s law, Herdan’s law, Brevity law, and Menzerath–Altmann’s law) in oral communication, further complementing this with the analysis of two recently formulated laws: the lognormality law and the size-rank law. By aligning the acoustic signal of speech production with the speech transcriptions, we are able to measure and compare how well each of these laws holds when measured in physical and in symbolic units. Our results show that all six laws are recovered in both languages, but considerably more strongly when examined in physical units, reinforcing the so-called ‘physical hypothesis’ according to which linguistic laws might indeed have a physical origin, with the patterns recovered in written texts being just a byproduct of the regularities already present in the acoustic signals of oral communication.
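Two of the laws at issue are easy to check once tokens and their physical durations are aligned. The Python sketch below estimates a Zipf exponent from token counts and a brevity-law association between type frequency and mean duration; the estimators (least-squares on log-log ranks, a Pearson correlation) are simple stand-ins for the more careful fitting used in this literature.

```python
import numpy as np
from collections import Counter

def zipf_exponent(tokens):
    """Negative slope of log frequency vs. log rank (Zipf's law: ~1)."""
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
    ranks = np.arange(1, freqs.size + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope

def brevity_association(tokens, durations):
    """Brevity law: frequent types should be shorter, so the correlation
    between log type frequency and mean duration should come out negative."""
    by_type = {}
    for tok, dur in zip(tokens, durations):
        by_type.setdefault(tok, []).append(dur)
    logf = np.log([len(v) for v in by_type.values()])
    mean_dur = np.array([np.mean(v) for v in by_type.values()])
    return np.corrcoef(logf, mean_dur)[0, 1]
```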
4. Torre IG, Luque B, Lacasa L, Kello CT, Hernández-Fernández A. On the physical origin of linguistic laws and lognormality in speech. R Soc Open Sci 2019; 6:191023. [PMID: 31598263 PMCID: PMC6731709 DOI: 10.1098/rsos.191023]
Abstract
Physical manifestations of linguistic units include sources of variability due to factors of speech production which are by definition excluded from counts of linguistic symbols. In this work, we examine whether linguistic laws hold with respect to the physical manifestations of linguistic units in spoken English. The data we analyse come from a phonetically transcribed database of acoustic recordings of spontaneous speech known as the Buckeye Speech Corpus. First, we verify with unprecedented accuracy that acoustically transcribed durations of linguistic units at several scales comply with a lognormal distribution, and we quantitatively justify this 'lognormality law' using a stochastic generative model. Second, we explore the four classical linguistic laws (Zipf's Law, Herdan's Law, Brevity Law and Menzerath-Altmann's Law (MAL)) in oral communication, both in physical units and in symbolic units measured from the speech transcriptions, and find that the validity of these laws is typically stronger in physical units than in their symbolic counterparts. Additional results include (i) coining a Herdan's Law in physical units, (ii) a precise mathematical formulation of Brevity Law, which we show to be connected to optimal compression principles in information theory and which allows us to formulate and validate yet another law, the size-rank law, and (iii) a mathematical derivation of MAL which also highlights an additional regime where the law is inverted. Altogether, these results support the hypothesis that statistical laws in language have a physical origin.
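The 'lognormality law' amounts to the claim that log-durations are normally distributed at every linguistic scale. A minimal check, assuming a flat array of unit durations in seconds, is to fit the lognormal by maximum likelihood and score the fit with a Kolmogorov-Smirnov statistic; the paper's own validation via a stochastic generative model is more thorough than this sketch.

```python
import numpy as np
from scipy import stats

def lognormality_check(durations):
    """MLE lognormal fit (a normal on log-durations) plus a KS statistic
    against the fitted normal; small KS values support lognormality.
    (Note: fitting and testing on the same data biases the KS p-value.)"""
    logs = np.log(np.asarray(durations, dtype=float))
    mu, sigma = logs.mean(), logs.std(ddof=1)
    ks = stats.kstest(logs, "norm", args=(mu, sigma))
    return mu, sigma, ks.statistic
```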
Affiliation(s)
- Iván G. Torre
- Departamento de Matemática Aplicada, ETSIAE, Universidad Politécnica de Madrid, Plaza Cardenal Cisneros, 28040 Madrid, Spain
- Cognitive and Information Sciences, University of California, Merced, 5200 North Lake Road, Merced, CA 95343, USA
- Bartolo Luque
- Departamento de Matemática Aplicada, ETSIAE, Universidad Politécnica de Madrid, Plaza Cardenal Cisneros, 28040 Madrid, Spain
- Lucas Lacasa
- School of Mathematical Sciences, Queen Mary University of London, Mile End Road, E1 4NS London, UK
- Christopher T. Kello
- Cognitive and Information Sciences, University of California, Merced, 5200 North Lake Road, Merced, CA 95343, USA
- Antoni Hernández-Fernández
- Complexity and Quantitative Linguistics Lab, Laboratory for Relational Algorithmics, Complexity and Learning (LARCA), Institut de Ciències de l’Educació; Universitat Politècnica de Catalunya, Barcelona, Spain
5. Kello CT, Dalla Bella S, Médé B, Balasubramaniam R. Hierarchical temporal structure in music, speech and animal vocalizations: jazz is like a conversation, humpbacks sing like hermit thrushes. J R Soc Interface 2017; 14:20170231. [PMID: 29021158 DOI: 10.1098/rsif.2017.0231]
Abstract
Humans talk, sing and play music. Some species of birds and whales sing long and complex songs. All these behaviours and sounds exhibit hierarchical structure: syllables and notes are positioned within words and musical phrases, words and motives within sentences and musical phrases, and so on. We developed a new method to measure and compare hierarchical temporal structure in speech, song and music. The method identifies temporal events as peaks in the sound amplitude envelope, and quantifies event clustering across a range of timescales using Allan factor (AF) variance. AF variances were analysed and compared for over 200 different recordings from more than 16 different categories of signals, including recordings of speech in different contexts and languages, and musical compositions and performances from different genres. Non-human vocalizations from two bird species and two types of marine mammals were also analysed for comparison. The resulting patterns of AF variance across timescales were distinct for each of four natural categories of complex sound: speech, popular music, classical music and complex animal vocalizations. Comparisons within and across categories indicated that nested clustering at longer timescales was more prominent when prosodic variation was greater, and when sounds came from interactions among individuals, including interactions between speakers, musicians, and even killer whales. Nested clustering was also more prominent for music compared with speech, reflecting beat structure in popular music and self-similarity across timescales in classical music. In summary, hierarchical temporal structures reflect the behavioural and social processes underlying complex vocalizations and musical performances.
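The event-identification step of the method can be sketched as follows: take the Hilbert amplitude envelope, smooth it, and pick peaks. The function name, the smoothing window, and the peak-picking thresholds below are illustrative choices, not the paper's exact parameters.

```python
import numpy as np
from scipy.signal import hilbert, find_peaks

def envelope_events(signal, sr, min_separation=0.01, rel_height=0.1):
    """Return event times (s) as peaks of the smoothed amplitude envelope."""
    env = np.abs(hilbert(signal))
    win = max(1, int(0.005 * sr))            # ~5 ms moving average (assumed)
    env = np.convolve(env, np.ones(win) / win, mode="same")
    peaks, _ = find_peaks(env,
                          height=rel_height * env.max(),
                          distance=max(1, int(min_separation * sr)))
    return peaks / sr
```

The resulting event times feed directly into the allan_factor sketch given under entry 1.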
Affiliation(s)
- Christopher T Kello
- Cognitive and Information Sciences, University of California, Merced, 5200 North Lake Rd., Merced, CA 95343, USA
- Simone Dalla Bella
- EuroMov Laboratory, Université de Montpellier, 700 Avenue du Pic Saint-Loup, 34090 Montpellier, France; Institut Universitaire de France, 1 Rue Descartes, 75231 Paris, France; International Laboratory for Brain, Music and Sound Research (BRAMS), 1430 Boulevard du Mont-Royal, Montreal, Quebec, Canada H2V 2J2; Department of Cognitive Psychology, WSFiZ in Warsaw, 55 Pawia Street, 01-030 Warsaw, Poland
- Butovens Médé
- Cognitive and Information Sciences, University of California, Merced, 5200 North Lake Rd., Merced, CA 95343, USA
- Ramesh Balasubramaniam
- Cognitive and Information Sciences, University of California, Merced, 5200 North Lake Rd., Merced, CA 95343, USA
6. Abney DH, Dale R, Louwerse MM, Kello CT. The Bursts and Lulls of Multimodal Interaction: Temporal Distributions of Behavior Reveal Differences Between Verbal and Non-Verbal Communication. Cogn Sci 2018; 42:1297-1316. [PMID: 29630740 DOI: 10.1111/cogs.12612]
Abstract
Recent studies of naturalistic face-to-face communication have demonstrated coordination patterns such as the temporal matching of verbal and non-verbal behavior, which provides evidence for the proposal that verbal and non-verbal communicative control derives from one system. In this study, we argue that the observed relationship between verbal and non-verbal behaviors depends on the level of analysis. In a reanalysis of a corpus of naturalistic multimodal communication (Louwerse, Dale, Bard, & Jeuniaux), we focus on measuring the temporal patterns of specific communicative behaviors in terms of their burstiness. We examined burstiness estimates across different speaker roles and different communicative modalities. We observed more burstiness for verbal versus non-verbal channels, and for more versus less informative language subchannels. Using this new method for analyzing temporal patterns in communicative behaviors, we show that there is a complex relationship between verbal and non-verbal channels. We propose a "temporal heterogeneity" hypothesis to explain how the language system adapts to the demands of dialog.
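Burstiness of a behavioral channel can be summarized with the Goh-Barabási coefficient computed on inter-event intervals, a standard estimator for this kind of analysis (the paper's exact estimator may differ in detail):

```python
import numpy as np

def burstiness(event_times):
    """Goh-Barabási burstiness B = (sigma - mu) / (sigma + mu) of
    inter-event intervals: B -> -1 for periodic trains, ~0 for a
    Poisson process, and -> +1 for highly bursty behavior."""
    intervals = np.diff(np.sort(np.asarray(event_times, dtype=float)))
    mu, sigma = intervals.mean(), intervals.std()
    return (sigma - mu) / (sigma + mu)
```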
Affiliation(s)
- Drew H Abney
- Department of Psychological and Brain Sciences, Indiana University
- Rick Dale
- Department of Communication, University of California, Los Angeles
- Max M Louwerse
- Cognitive Science and Artificial Intelligence, Tilburg University
7. Borges AFT, Giraud AL, Mansvelder HD, Linkenkaer-Hansen K. Scale-Free Amplitude Modulation of Neuronal Oscillations Tracks Comprehension of Accelerated Speech. J Neurosci 2018; 38:710-722. [PMID: 29217685 PMCID: PMC6596185 DOI: 10.1523/JNEUROSCI.1515-17.2017]
Abstract
Speech comprehension is preserved up to a threefold acceleration, but deteriorates rapidly at higher speeds. Current models posit that perceptual resilience to accelerated speech is limited by the brain's ability to parse speech into syllabic units using δ/θ oscillations. Here, we investigated whether the involvement of neuronal oscillations in processing accelerated speech also relates to their scale-free amplitude modulation, as indexed by the strength of long-range temporal correlations (LRTC). We recorded MEG while 24 human subjects (12 females) listened to radio news uttered at different comprehensible rates, at a mostly unintelligible rate, and at this same speed interleaved with silence gaps. δ, θ, and low-γ oscillations followed the nonlinear variation of comprehension, with LRTC rising only at the highest speed. In contrast, increasing the rate was associated with a monotonic increase in LRTC in high-γ activity. When intelligibility was restored with the insertion of silence gaps, LRTC in the δ, θ, and low-γ oscillations resumed the low levels observed for intelligible speech. Remarkably, the lower an individual subject's scaling exponents of δ/θ oscillations, the greater their comprehension of the fastest speech rate. Moreover, the strength of LRTC of the speech envelope decreased at the maximal rate, suggesting an inverse relationship with the LRTC of brain dynamics when comprehension halts. Our findings show that the scale-free amplitude modulations of cortical oscillations and speech signals are tightly coupled to speech uptake capacity.

SIGNIFICANCE STATEMENT: One may read this statement in 20-30 s, but reading it in less than five seconds leaves us clueless. Our minds limit how much information we grasp in an instant. Understanding the neural constraints on our capacity for sensory uptake is a fundamental question in neuroscience. Here, MEG was used to investigate neuronal activity while subjects listened to radio news played faster and faster until becoming unintelligible. We found that speech comprehension is related to the scale-free dynamics of the δ and θ bands, whereas this property in high-γ fluctuations mirrors speech rate. We propose that successful speech processing imposes constraints on the self-organization of synchronous cell assemblies, whose scale-free dynamics adjust to the temporal properties of spoken language.
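LRTC strength is conventionally indexed by the scaling exponent from detrended fluctuation analysis (DFA) of an oscillation's amplitude envelope; exponents near 0.5 indicate no temporal memory, while values approaching 1 indicate strong LRTC. The compact sketch below is a generic textbook implementation, not the authors' analysis code; window sizes are assumed to be integers ≥ 4.

```python
import numpy as np

def dfa_exponent(x, window_sizes):
    """Detrended fluctuation analysis: slope of log F(n) vs log n,
    where F(n) is the RMS of linearly detrended cumulative sums
    over non-overlapping windows of length n."""
    profile = np.cumsum(x - np.mean(x))
    flucts = []
    for n in window_sizes:
        k = len(profile) // n
        segments = profile[:k * n].reshape(k, n)
        t = np.arange(n)
        resid = [seg - np.polyval(np.polyfit(t, seg, 1), t) for seg in segments]
        flucts.append(np.sqrt(np.mean(np.square(resid))))
    alpha, _ = np.polyfit(np.log(window_sizes), np.log(flucts), 1)
    return alpha
```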
Affiliation(s)
- Ana Filipa Teixeira Borges
- Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, Amsterdam, Netherlands
- Amsterdam Neuroscience, Amsterdam, Netherlands
- Anne-Lise Giraud
- Department of Neuroscience, University of Geneva, Biotech Campus, Geneva 1211, Switzerland
- Huibert D Mansvelder
- Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, Amsterdam, Netherlands
- Amsterdam Neuroscience, Amsterdam, Netherlands
- Klaus Linkenkaer-Hansen
- Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, Amsterdam, Netherlands
- Amsterdam Neuroscience, Amsterdam, Netherlands
8. Falk S, Kello CT. Hierarchical organization in the temporal structure of infant-direct speech and song. Cognition 2017; 163:80-86. [PMID: 28292666 DOI: 10.1016/j.cognition.2017.02.017]
Abstract
Caregivers alter the temporal structure of their utterances when talking and singing to infants compared with adult communication. The present study tested whether temporal variability in infant-directed registers serves to emphasize the hierarchical temporal structure of speech. Fifteen German-speaking mothers sang a play song and told a story to their 6-month-old infants or to an adult. Recordings were analyzed using a recently developed method that determines the degree of nested clustering of temporal events in speech. Events were defined as peaks in the amplitude envelope, and clusters of various sizes related to periods of acoustic speech energy at varying timescales. Infant-directed speech and song clearly showed greater event clustering compared with adult-directed registers, at multiple timescales from hundreds of milliseconds to tens of seconds. We discuss the relation of this newly discovered acoustic property to temporal variability in linguistic units and its potential implications for parent-infant communication and for infants learning the hierarchical structures of speech and language.
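Combining the sketches from entries 1 and 5, an infant-directed versus adult-directed comparison reduces to computing AF curves for the two registers. The snippet below uses synthetic stand-ins for the recordings (an assumption; real audio replaces them) and presumes envelope_events() and allan_factor() from the earlier sketches are in scope.

```python
import numpy as np

sr = 16000
t = np.arange(0, 30, 1 / sr)
rng = np.random.default_rng(0)
ids_like = rng.standard_normal(t.size) * (1 + np.sin(2 * np.pi * 0.5 * t))  # slow bursts
ads_like = rng.standard_normal(t.size)                                      # flatter

timescales = np.logspace(-1.5, 1.5, 20)   # ~30 ms to ~30 s
for label, wav in [("IDS-like", ids_like), ("ADS-like", ads_like)]:
    events = envelope_events(wav, sr)
    af = allan_factor(events, timescales, duration=30.0)
    print(label, np.round(af, 2))
```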
Affiliation(s)
- Simone Falk
- Ludwig-Maximilians-University, Munich, Germany; Laboratoire Parole et Langage, UMR 7309, CNRS / Aix-Marseille University, Aix-en-Provence, France; Laboratoire Phonétique et Phonologie, UMR 7018, CNRS / Université Sorbonne Nouvelle Paris-3, Paris, France.
9. Torre IG, Luque B, Lacasa L, Luque J, Hernández-Fernández A. Emergence of linguistic laws in human voice. Sci Rep 2017; 7:43862. [PMID: 28272418 PMCID: PMC5341060 DOI: 10.1038/srep43862]
Abstract
Linguistic laws constitute one of the quantitative cornerstones of modern cognitive science and have been routinely investigated in written corpora, or in the equivalent transcriptions of oral corpora. This means that inferences about statistical patterns of language in acoustics are biased by the arbitrary, language-dependent segmentation of the signal, which virtually precludes the possibility of making comparative studies between human voice and other animal communication systems. Here we bridge this gap by proposing a method that allows such patterns to be measured in acoustic signals of arbitrary origin, without needing access to the underlying language corpus. The method has been applied to sixteen different human languages, successfully recovering some well-known laws of human communication at timescales even below the phoneme and finding yet another link between complexity and criticality in a biological system. These methods further pave the way for new comparative studies in animal communication and the analysis of signals of unknown code.
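The key move in this method is segmenting the acoustic signal into units without any transcription, e.g., by thresholding the amplitude envelope and treating supra-threshold runs as 'voice events' whose durations can then be tested against linguistic laws. The sketch below shows that segmentation step only; the threshold parameter and envelope are assumed inputs, and the paper sweeps many thresholds rather than fixing one.

```python
import numpy as np

def threshold_segments(env, sr, theta):
    """Durations (s) of maximal runs where the envelope exceeds theta."""
    above = env > theta
    changes = np.flatnonzero(np.diff(above.astype(int)))
    if above[0]:                       # signal starts above threshold
        changes = np.r_[0, changes]
    if above[-1]:                      # signal ends above threshold
        changes = np.r_[changes, above.size - 1]
    onsets, offsets = changes[::2], changes[1::2]
    return (offsets - onsets) / sr
```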
Affiliation(s)
- Iván González Torre
- Department of Applied Mathematics and Statistics, EIAE, Technical University of Madrid, Plaza Cardenal Cisneros, 28040, Madrid, Spain
- Bartolo Luque
- Department of Applied Mathematics and Statistics, EIAE, Technical University of Madrid, Plaza Cardenal Cisneros, 28040, Madrid, Spain
- School of Mathematical Sciences, Queen Mary University of London, Mile End Road, E1 4NS, London, UK
- Lucas Lacasa
- School of Mathematical Sciences, Queen Mary University of London, Mile End Road, E1 4NS, London, UK
- Jordi Luque
- Telefonica Research, Edificio Telefonica-Diagonal 00, Barcelona, Spain
- Antoni Hernández-Fernández
- Complexity and Quantitative Linguistics Lab, Laboratory for Relational Algorithmics, Complexity and Learning (LARCA), Institut de Ciències de l’Educació, Universitat Politècnica de Catalunya, Barcelona, Spain
10. Abney DH, Warlaumont AS, Oller DK, Wallot S, Kello CT. Multiple Coordination Patterns in Infant and Adult Vocalizations. Infancy 2016; 22:514-539. [PMID: 29375276 DOI: 10.1111/infa.12165]
Abstract
The study of vocal coordination between infants and adults has led to important insights into the development of social, cognitive, emotional and linguistic abilities. We used an automatic system to identify vocalizations produced by infants and adults over the course of the day for fifteen infants studied longitudinally during the first two years of life. We measured three different types of vocal coordination: coincidence-based, rate-based, and cluster-based. Coincidence-based and rate-based coordination are established measures in the developmental literature. Cluster-based coordination is new and measures the strength of matching in the degree to which vocalization events occur in hierarchically nested clusters. We investigated whether various coordination patterns differ as a function of vocalization type, whether different coordination patterns provide unique information about the dynamics of vocal interaction, and how the various coordination patterns each relate to infant age. All vocal coordination patterns displayed greater coordination for infant speech-related vocalizations, adults adapted the hierarchical clustering of their vocalizations to match that of infants, and each of the three coordination patterns had unique associations with infant age. Altogether, our results indicate that vocal coordination between infants and adults is multifaceted, suggesting a complex relationship between vocal coordination and the development of vocal communication.
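Of the three coordination types, cluster-based coordination is the novel one; conceptually it scores how closely the infant's and the adult's hierarchical clustering profiles match. A schematic version, assuming each speaker's clustering is summarized by an Allan factor curve (as in the Kello et al. method above), is simply a similarity measure between the two curves; the paper's actual matching statistic may well differ.

```python
import numpy as np

def cluster_based_coordination(af_infant, af_adult):
    """Schematic cluster-based coordination: correlation between the two
    speakers' log Allan factor curves across matched timescales."""
    a = np.log(np.asarray(af_infant, dtype=float))
    b = np.log(np.asarray(af_adult, dtype=float))
    return np.corrcoef(a, b)[0, 1]
```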
Affiliation(s)
- Drew H Abney
- Cognitive and Information Sciences, University of California, Merced
- Anne S Warlaumont
- Cognitive and Information Sciences, University of California, Merced
- D Kimbrough Oller
- School of Communication Sciences and Disorders, University of Memphis
11.
Abstract
Quantifying how patterns of behavior relate across multiple levels of measurement typically requires long time series for reliable parameter estimation. We describe a novel analysis that estimates patterns of variability across multiple scales of analysis and is suitable for time series of short duration. The multiscale coefficient of variation (MSCV) measures the distance between local coefficient of variation estimates within particular time windows and the overall coefficient of variation across all time samples. We first describe the MSCV analysis and provide an example analytical protocol with a corresponding MATLAB implementation and code. Next, we present a simulation study testing the new analysis using time series generated by ARFIMA models that span white noise, short-term correlations, and long-term correlations. The MSCV analysis was observed to be sensitive to specific parameters of ARFIMA models varying in the type of temporal structure and in time series length. We then apply the MSCV analysis to short time series of speech phrases and musical themes to show commonalities in their multiscale structure. The simulation and application studies provide evidence that the MSCV analysis can discriminate between time series varying in multiscale structure and length.
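From the definition in this abstract, the MSCV can be sketched directly: compute the coefficient of variation (CV) in windows of a given size, and measure how far those local estimates sit from the global CV. The aggregation below (mean absolute distance) is one reasonable reading of 'distance'; consult the paper's MATLAB protocol for the exact formulation.

```python
import numpy as np

def mscv(series, window_sizes):
    """Multiscale coefficient of variation: for each window size, the mean
    absolute distance between windowed CV estimates and the global CV.
    Window sizes are assumed to be integers >= 2."""
    x = np.asarray(series, dtype=float)
    global_cv = x.std(ddof=1) / x.mean()
    out = []
    for w in window_sizes:
        k = x.size // w
        segs = x[:k * w].reshape(k, w)
        local_cv = segs.std(axis=1, ddof=1) / segs.mean(axis=1)
        out.append(np.mean(np.abs(local_cv - global_cv)))
    return np.array(out)
```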