1.
Sorokowski P, Pisanski K, Frąckowiak T, Kobylarek A, Groyecka-Bernard A. Voice-based judgments of sex, height, weight, attractiveness, health, and psychological traits based on free speech versus scripted speech. Psychon Bull Rev 2024; 31:1680-1689. [PMID: 38238560] [DOI: 10.3758/s13423-023-02445-5]
Abstract
How do we perceive others based on their voices? This question has attracted research and media attention for decades, producing hundreds of studies showing that the voice is socially and biologically relevant, but these studies vary in methodology and ecological validity. Here we test whether vocalizers producing read versus free speech are judged similarly by listeners on ten biological and/or psychosocial traits. In perception experiments using speech from 208 men and women and ratings from 4,088 listeners, we show that listeners' assessments of vocalizer sex and age are highly accurate, regardless of speech type. Assessments of body size, femininity-masculinity and women's health also did not differ between free and read speech. In contrast, read speech elicited higher ratings of attractiveness, dominance and trustworthiness in both sexes and of health in males compared to free speech. Importantly, these differences were small, and we additionally show moderate to strong correlations between ratings of the same vocalizers based on their read and free speech for all ten traits, indicating that voice-based judgments are highly consistent within speakers, whether or not speech is spontaneous. Our results provide evidence that the human voice can communicate various biological and psychosocial traits via both read and free speech, with theoretical and practical implications.
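The within-speaker consistency reported here is, in essence, a correlation across vocalizers between mean ratings obtained from read versus free speech. As an illustrative sketch only (the rating values below are invented, not the study's data), a Pearson correlation can be computed as:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical mean attractiveness ratings per vocalizer (illustration only):
read_speech = [3.1, 4.2, 2.8, 3.9, 4.5, 2.5]
free_speech = [2.9, 4.0, 3.0, 3.6, 4.4, 2.7]
print(round(pearson_r(read_speech, free_speech), 2))
```

A high r here would mean the same vocalizers who are rated highly on read speech are also rated highly on free speech, which is the pattern the abstract reports for all ten traits.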
Affiliation(s)
- Piotr Sorokowski
- Institute of Psychology, University of Wroclaw, ul. Dawida 1, 50-527, Wroclaw, Poland
- Katarzyna Pisanski
- Institute of Psychology, University of Wroclaw, ul. Dawida 1, 50-527, Wroclaw, Poland
- ENES Bioacoustics Research Lab, CRNL, CNRS UMR5292, University Jean Monnet, Saint-Etienne, France
- Centre National de la Recherche Scientifique, Laboratoire Dynamique du Langage, Université Lyon 2, Lyon, France
- Tomasz Frąckowiak
- Institute of Psychology, University of Wroclaw, ul. Dawida 1, 50-527, Wroclaw, Poland
2.
Corvin S, Fauchon C, Patural H, Peyron R, Reby D, Theunissen F, Mathevon N. Pain cues override identity cues in baby cries. iScience 2024; 27:110375. [PMID: 39055954] [PMCID: PMC11269312] [DOI: 10.1016/j.isci.2024.110375]
Abstract
Baby cries can convey both static information related to individual identity and dynamic information related to the baby's emotional and physiological state. How do these dimensions interact? Are they transmitted independently, or do they compete against one another? Here we show that the universal acoustic expression of pain in distress cries overrides individual differences, at the expense of identity signaling. Our acoustic analyses show that pain cries, compared with discomfort cries, are characterized by a more unstable source, thus interfering with the production of identity cues. Machine learning analyses and psychoacoustic experiments reveal that while the baby's identity remains encoded in pain cries, it is considerably weaker than in discomfort cries. Our results are consistent with the prediction that the costs of failing to signal distress outweigh the costs of weakening cues to identity.
Affiliation(s)
- Siloé Corvin
- ENES Bioacoustics Research Lab, CRNL, University of Saint-Etienne, CNRS, Inserm, Saint-Etienne, France
- Université Jean-Monnet-Saint-Etienne, INSERM, CNRS, UCBL, CRNL U1028, NeuroPain team, 42023 Saint-Etienne, France
- Camille Fauchon
- Université Jean-Monnet-Saint-Etienne, INSERM, CNRS, UCBL, CRNL U1028, NeuroPain team, 42023 Saint-Etienne, France
- Université Clermont Auvergne, CHU de Clermont-Ferrand, Inserm, Neuro-Dol, Clermont-Ferrand, France
- Hugues Patural
- Neonatal and Pediatric Intensive Care Unit, SAINBIOSE laboratory, Inserm, University Hospital of Saint-Etienne, University of Saint-Etienne, Saint-Etienne, France
- Roland Peyron
- Université Jean-Monnet-Saint-Etienne, INSERM, CNRS, UCBL, CRNL U1028, NeuroPain team, 42023 Saint-Etienne, France
- David Reby
- ENES Bioacoustics Research Lab, CRNL, University of Saint-Etienne, CNRS, Inserm, Saint-Etienne, France
- Institut universitaire de France, Paris, France
- Frédéric Theunissen
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA
- Department of Psychology, University of California, Berkeley, Berkeley, CA 94720, USA
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Nicolas Mathevon
- ENES Bioacoustics Research Lab, CRNL, University of Saint-Etienne, CNRS, Inserm, Saint-Etienne, France
- Institut universitaire de France, Paris, France
- Ecole Pratique des Hautes Etudes, CHArt lab, PSL University, Paris, France
3.
Pisanski K, Reby D, Oleszkiewicz A. Humans need auditory experience to produce typical volitional nonverbal vocalizations. Commun Psychol 2024; 2:65. [PMID: 39242947] [PMCID: PMC11332021] [DOI: 10.1038/s44271-024-00104-6]
Abstract
Human nonverbal vocalizations such as screams and cries often reflect their evolved functions. Although the universality of these putatively primordial vocal signals and their phylogenetic roots in animal calls suggest a strong reflexive foundation, many of the emotional vocalizations that we humans produce are under our voluntary control. This suggests that, like speech, volitional vocalizations may require auditory input to develop typically. Here, we acoustically analyzed hundreds of volitional vocalizations produced by profoundly deaf adults and typically hearing controls. We show that deaf adults produce unconventional and homogeneous vocalizations of aggression and pain that are unusually high-pitched, unarticulated, and with extremely few harsh-sounding nonlinear phenomena compared to controls. In contrast, fear vocalizations of deaf adults are relatively acoustically typical. In four lab experiments involving a range of perception tasks with 444 participants, listeners were less accurate in identifying the intended emotions of vocalizations produced by deaf vocalizers than by controls, perceived their vocalizations as less authentic, and reliably detected deafness. Vocalizations of congenitally deaf adults with zero auditory experience were most atypical, suggesting additive effects of auditory deprivation. Vocal learning in humans may thus be required not only for speech, but also to acquire the full repertoire of volitional non-linguistic vocalizations.
Affiliation(s)
- Katarzyna Pisanski
- ENES Bioacoustics Research Laboratory, CRNL Center for Research in Neuroscience in Lyon, University of Saint-Étienne, 42023, Saint-Étienne, France
- CNRS French National Centre for Scientific Research, DDL Dynamics of Language Lab, University of Lyon 2, 69007, Lyon, France
- Institute of Psychology, University of Wrocław, 50-527, Wrocław, Poland
- David Reby
- ENES Bioacoustics Research Laboratory, CRNL Center for Research in Neuroscience in Lyon, University of Saint-Étienne, 42023, Saint-Étienne, France
- Institut Universitaire de France, Paris, France
- Anna Oleszkiewicz
- Institute of Psychology, University of Wrocław, 50-527, Wrocław, Poland
- Department of Otorhinolaryngology, Smell and Taste Clinic, Carl Gustav Carus Medical School, Technische Universitaet Dresden, 01307, Dresden, Germany
4.
Lockhart-Bouron M, Anikin A, Pisanski K, Corvin S, Cornec C, Papet L, Levréro F, Fauchon C, Patural H, Reby D, Mathevon N. Infant cries convey both stable and dynamic information about age and identity. Commun Psychol 2023; 1:26. [PMID: 39242685] [PMCID: PMC11332224] [DOI: 10.1038/s44271-023-00022-z]
Abstract
What information is encoded in the cries of human babies? While it is widely recognized that cries can encode distress levels, whether cries reliably encode the cause of crying remains disputed. Here, we collected 39,201 cries from 24 babies recorded in their homes longitudinally, from 15 days to 3.5 months of age, a database we share publicly for reuse. Based on the parental action that stopped the crying, which matched the parental evaluation of cry cause in 75% of cases, each cry was classified as caused by discomfort, hunger, or isolation. Our analyses show that baby cries provide reliable information about age and identity. Baby voices become more tonal and less shrill with age, while individual acoustic signatures drift throughout the first months of life. In contrast, neither machine learning algorithms nor trained adult listeners can reliably recognize the causes of crying.
Affiliation(s)
- Marguerite Lockhart-Bouron
- Neonatal and Pediatric Intensive Care Unit, SAINBIOSE laboratory, Inserm, University Hospital of Saint-Etienne, University of Saint-Etienne, Saint-Etienne, France
- Andrey Anikin
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France
- Division of Cognitive Science, Lund University, Lund, Sweden
- Katarzyna Pisanski
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France
- Laboratoire Dynamique du Langage DDL, CNRS, University of Lyon 2, Lyon, France
- Siloé Corvin
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France
- Central Integration of Pain-Neuropain Laboratory, CRNL, CNRS, Inserm, UCB Lyon 1, University of Saint-Etienne, Saint-Etienne, France
- Clément Cornec
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France
- Léo Papet
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France
- Florence Levréro
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France
- Institut universitaire de France, Paris, France
- Camille Fauchon
- Central Integration of Pain-Neuropain Laboratory, CRNL, CNRS, Inserm, UCB Lyon 1, University of Saint-Etienne, Saint-Etienne, France
- Hugues Patural
- Neonatal and Pediatric Intensive Care Unit, SAINBIOSE laboratory, Inserm, University Hospital of Saint-Etienne, University of Saint-Etienne, Saint-Etienne, France
- David Reby
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France
- Institut universitaire de France, Paris, France
- Nicolas Mathevon
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France
- Institut universitaire de France, Paris, France
- Ecole Pratique des Hautes Etudes, CHArt Lab, PSL Research University, Paris, France
5.
Yurdum L, Singh M, Glowacki L, Vardy T, Atkinson QD, Hilton CB, Sauter D, Krasnow MM, Mehr SA. Universal interpretations of vocal music. Proc Natl Acad Sci U S A 2023; 120:e2218593120. [PMID: 37676911] [PMCID: PMC10500275] [DOI: 10.1073/pnas.2218593120]
Abstract
Despite the variability of music across cultures, some types of human songs share acoustic characteristics. For example, dance songs tend to be loud and rhythmic, and lullabies tend to be quiet and melodious. Human perceptual sensitivity to the behavioral contexts of songs, based on these musical features, suggests that basic properties of music are mutually intelligible, independent of linguistic or cultural content. Whether these effects reflect universal interpretations of vocal music, however, is unclear because prior studies focus almost exclusively on English-speaking participants, a group that is not representative of humans. Here, we report shared intuitions concerning the behavioral contexts of unfamiliar songs produced in unfamiliar languages, in participants living in Internet-connected industrialized societies (n = 5,516 native speakers of 28 languages) or smaller-scale societies with limited access to global media (n = 116 native speakers of three non-English languages). Participants listened to songs randomly selected from a representative sample of human vocal music, originally used in four behavioral contexts, and rated the degree to which they believed the song was used for each context. Listeners in both industrialized and smaller-scale societies inferred the contexts of dance songs, lullabies, and healing songs, but not love songs. Within and across cohorts, inferences were mutually consistent. Further, increased linguistic or geographical proximity between listeners and singers only minimally increased the accuracy of the inferences. These results demonstrate that the behavioral contexts of three common forms of music are mutually intelligible cross-culturally and imply that musical diversity, shaped by cultural evolution, is nonetheless grounded in some universal perceptual phenomena.
Affiliation(s)
- Lidya Yurdum
- Child Study Center, Yale University, New Haven, CT 06520
- Department of Psychology, University of Amsterdam, Amsterdam 1018 WT, Netherlands
- Manvir Singh
- Department of Anthropology, University of California, Davis, Davis, CA 95616
- Luke Glowacki
- Department of Anthropology, Boston University, Boston, MA 02215
- Thomas Vardy
- School of Psychology, University of Auckland, Auckland 1010, New Zealand
- Disa Sauter
- Department of Psychology, University of Amsterdam, Amsterdam 1018 WT, Netherlands
- Max M. Krasnow
- Division of Continuing Education, Harvard University, Cambridge, MA 02138
- Samuel A. Mehr
- Child Study Center, Yale University, New Haven, CT 06520
- School of Psychology, University of Auckland, Auckland 1010, New Zealand
6.
Groyecka-Bernard A, Pisanski K, Frąckowiak T, Kobylarek A, Kupczyk P, Oleszkiewicz A, Sabiniewicz A, Wróbel M, Sorokowski P. Do voice-based judgments of socially relevant speaker traits differ across speech types? J Speech Lang Hear Res 2022; 65:3674-3694. [PMID: 36167068] [DOI: 10.1044/2022_jslhr-21-00690]
Abstract
PURPOSE The human voice is a powerful and evolved social tool, with hundreds of studies showing that nonverbal vocal parameters robustly influence listeners' perceptions of socially meaningful speaker traits, ranging from perceived gender and age to attractiveness and trustworthiness. However, these studies have utilized a wide variety of voice stimuli to measure listeners' voice-based judgments of these traits. Here, in the largest scale study known to date, we test whether listeners judge the same unseen speakers differently depending on the complexity of the neutral speech stimulus, from single vowel sounds to a full paragraph. METHOD In a playback experiment testing 2,618 listeners, we examine whether commonly studied voice-based judgments of attractiveness, trustworthiness, dominance, likability, femininity/masculinity, and health differ if listeners hear isolated vowels, a series of vowels, single words, single sentences (greeting), counting from 1 to 10, or a full paragraph recited aloud (Rainbow Passage), recorded from the same 208 men and women. Data were collected using a custom-designed interface in which vocalizers and traits were randomly assigned to raters. RESULTS Linear mixed models show that the type of voice stimulus does indeed consistently affect listeners' judgments. Overall, ratings of attractiveness, trustworthiness, dominance, likability, health, masculinity among men, and femininity among women increase as speech duration increases. At the same time, speaker-level regression analyses show that interindividual differences in perceived speaker traits are largely preserved across voice stimuli, especially among those of a similar duration. CONCLUSIONS Socially relevant perceptions of speakers are not wholly changed but rather moderated by the length of their speech. Indeed, the same vocalizer is perceived in a similar way regardless of which neutral statements they speak, with the caveat that longer utterances explain the most shared variance in listeners' judgments and elicit the highest ratings on all traits, possibly by providing additional nonverbal information to listeners. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.21158890.
Affiliation(s)
- Katarzyna Pisanski
- Institute of Psychology, University of Wrocław, Poland
- ENES Bioacoustics Research Laboratory, University of Saint-Etienne, France
- CNRS Centre National de la Recherche Scientifique, Laboratoire Dynamique du Langage, Université Lyon 2, France
- Piotr Kupczyk
- Institute of Psychology, University of Wrocław, Poland
- Anna Oleszkiewicz
- Institute of Psychology, University of Wrocław, Poland
- Smell and Taste Clinic, Department of Otolaryngology, Technische Universität Dresden, Germany
- Agnieszka Sabiniewicz
- Institute of Psychology, University of Wrocław, Poland
- Smell and Taste Clinic, Department of Otolaryngology, Technische Universität Dresden, Germany
- Monika Wróbel
- Institute of Psychology, University of Wrocław, Poland
7.
Reybrouck M, Eerola T. Musical enjoyment and reward: from hedonic pleasure to eudaimonic listening. Behav Sci (Basel) 2022; 12:bs12050154. [PMID: 35621451] [PMCID: PMC9137732] [DOI: 10.3390/bs12050154]
Abstract
This article is a hypothesis and theory paper. It elaborates on the possible relation between music as a stimulus and its possible effects, with a focus on the question of why listeners experience pleasure and reward. Though it is tempting to seek a causal relationship, this has proven to be elusive given the many intermediary variables that intervene between the actual impingement on the senses and the reactions/responses by the listener. A distinction can be made, however, between three elements: (i) an objective description of the acoustic features of the music and their possible role as elicitors; (ii) a description of the possible modulating factors, both external/exogenous and internal/endogenous ones; and (iii) a continuous and real-time description of the responses by the listener, both in terms of their psychological reactions and their physiological correlates. Music listening, in this broadened view, can be considered a multivariate phenomenon of biological, psychological, and cultural factors that, together, shape the overall, full-fledged experience. In addition to an overview of the current and extant research on musical enjoyment and reward, we draw attention to some key methodological problems that still complicate a full description of the musical experience. We further elaborate on how listening may entail both adaptive and maladaptive ways of coping with the sounds, with the former allowing a gentle transition from mere hedonic pleasure to eudaimonic enjoyment.
Affiliation(s)
- Mark Reybrouck
- Musicology Research Group, Faculty of Arts, KU Leuven—University of Leuven, 3000 Leuven, Belgium
- Department of Art History, Musicology and Theatre Studies, Institute for Psychoacoustics and Electronic Music (IPEM), 9000 Ghent, Belgium
- Tuomas Eerola
- Department of Music, Durham University, Durham DH1 3RL, UK
8.
Lee Y, Kreiman J. Acoustic voice variation in spontaneous speech. J Acoust Soc Am 2022; 151:3462. [PMID: 35649890] [PMCID: PMC9135459] [DOI: 10.1121/10.0011471]
Abstract
This study replicates and extends the recent findings of Lee, Keating, and Kreiman [J. Acoust. Soc. Am. 146(3), 1568-1579 (2019)] on acoustic voice variation in read speech, which showed remarkably similar acoustic voice spaces for groups of female and male talkers and the individual talkers within these groups. Principal component analysis was applied to acoustic indices of voice quality measured from phone conversations for 99 of the 100 talkers studied previously. The acoustic voice spaces derived from spontaneous speech are highly similar to those based on read speech, except that, unlike read speech, variability in fundamental frequency accounted for significant acoustic variability. Implications of these findings for prototype models of speaker recognition and discrimination are considered.
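Principal component analysis of this kind finds the directions of greatest variance in a talker-by-feature matrix. As a minimal stdlib-only sketch (the feature names and values below are invented for illustration, not taken from the study), the first principal component can be extracted by power iteration on the covariance matrix:

```python
from math import sqrt

def first_pc(rows):
    """First principal component of mean-centered data via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1)
            for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(200):  # power iteration converges to the top eigenvector
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Invented per-talker acoustic measures: [mean F0 (Hz), H1-H2 (dB), CPP (dB)]
talkers = [[210.0, 4.1, 14.2], [118.0, 1.9, 16.8], [195.0, 3.6, 13.9],
           [125.0, 2.2, 17.1], [230.0, 4.8, 13.5]]
pc1 = first_pc(talkers)
```

With unscaled features, the component is dominated by F0 simply because its variance in Hz dwarfs the dB-scale measures; real analyses would standardize each feature first.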
Affiliation(s)
- Yoonjeong Lee
- Department of Head and Neck Surgery, David Geffen School of Medicine at UCLA, Los Angeles, California 90095-1794, USA
- Jody Kreiman
- Department of Head and Neck Surgery, David Geffen School of Medicine at UCLA, Los Angeles, California 90095-1794, USA
9.
Zhang L, Fujiki RB, Brookes S, Calcagno H, Awonusi O, Kluender K, Berry K, Venkatraman A, Maulden A, Sivasankar MP, Voytik-Harbin S, Halum S. Eliciting and characterizing porcine vocalizations: when pigs fly. J Voice 2022:S0892-1997(22)00062-5. [PMID: 35504794] [PMCID: PMC9617810] [DOI: 10.1016/j.jvoice.2022.02.023]
Abstract
BACKGROUND/OBJECTIVES While voice-related therapeutic interventions are often researched preclinically in the porcine model, there are no well-established methods to induce porcine glottic phonation. Described approaches, such as training animals to phonate for positive reinforcement, are time-consuming and plagued by inherent variability in the type of phonation produced and by contamination from background noise. Thus, a reliable method of assessing glottic phonation in the porcine model is needed. METHODS In this study, we created a novel pulley-based apparatus with a harness for "pig-lifting", with surrounding acoustic insulation and a highly directional microphone with a digital recorder for recording phonation. Praat and MATLAB were used to analyze all porcine vocalizations for fundamental frequency (F0), intensity, duration of phonation, and cepstral peak prominence (CPP). Glottic phonation was detected using F0 (≥2000 Hz), duration (≥3 seconds), and researcher perceptual judgment. Partial-glottic phonations were also analyzed. Reliability between researcher judgment and acoustic measures for glottic phonation detection was high. RESULTS Acoustic analysis demonstrated that glottic and partial-glottic phonation was consistently elicited, with no formal training of the minipigs required. Glottic vocalizations increased with multiple lifts. Glottic phonation continued to be elicited after multiple days but became less frequent. Glottic and partial-glottic phonations had similar CPP values over the 6 experimental days. CONCLUSION Our method of inducing and recording glottic phonation in the porcine model may provide a reliable, cost-effective preclinical tool for voice research.
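The acoustic detection rule quoted in METHODS (F0 at or above 2000 Hz and duration of at least 3 seconds) amounts to a simple two-threshold filter; the abstract's third criterion, researcher perceptual judgment, is omitted here. A sketch with invented call records, using only the thresholds stated above:

```python
def is_glottic(f0_hz, duration_s):
    """Flag a vocalization as glottic phonation per the stated thresholds."""
    return f0_hz >= 2000.0 and duration_s >= 3.0

# Hypothetical recorded calls: (mean F0 in Hz, duration in seconds)
calls = [(2350.0, 4.2),   # passes both thresholds
         (950.0, 5.0),    # F0 too low
         (2100.0, 1.8),   # too short
         (2600.0, 3.1)]   # passes both thresholds
glottic = [c for c in calls if is_glottic(*c)]
print(len(glottic))  # prints 2
```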
Affiliation(s)
- Lujuan Zhang
- Department of Otolaryngology-Head and Neck Surgery, Indiana University School of Medicine, Indianapolis, Indiana
- Robert Brinton Fujiki
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana
- Sarah Brookes
- Department of Basic Medical Sciences, Purdue University, West Lafayette, Indiana
- Haley Calcagno
- Department of Otolaryngology-Head and Neck Surgery, Indiana University School of Medicine, Indianapolis, Indiana
- Oluwaseyi Awonusi
- Department of Otolaryngology-Head and Neck Surgery, Indiana University School of Medicine, Indianapolis, Indiana
- Keith Kluender
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana
- Kevin Berry
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana
- Anumitha Venkatraman
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana
- Amanda Maulden
- Department of Animal Science, Purdue University, West Lafayette, Indiana
- M Preeti Sivasankar
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana
- Sherry Voytik-Harbin
- Department of Basic Medical Sciences, Purdue University, West Lafayette, Indiana; Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana
- Stacey Halum
- Department of Otolaryngology-Head and Neck Surgery, Indiana University School of Medicine, Indianapolis, Indiana; Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana
10.
Lefter I, Baird A, Stappen L, Schuller BW. A cross-corpus speech-based analysis of escalating negative interactions. Front Comput Sci 2022. [DOI: 10.3389/fcomp.2022.749804]
Abstract
The monitoring of an escalating negative interaction has several benefits, particularly in security, (mental) health, and group management. The speech signal is particularly suited to this, as aspects of escalation, including emotional arousal, are readily captured by the audio signal. A challenge of applying trained systems in real-life applications is their strong dependence on the training material and limited generalization abilities. For this reason, in this contribution, we perform an extensive analysis of three corpora in the Dutch language. All three corpora are high in escalation behavior content and are annotated on alternative dimensions related to escalation. A process of label mapping resulted in two possible ground truth estimations for the three datasets as low, medium, and high escalation levels. To observe class behavior and inter-corpus differences more closely, we perform acoustic analysis of the audio samples, finding that derived labels perform similarly across each corpus, with escalating interactions increasing in pitch (F0) and intensity (dB). We explore the suitability of different speech features, data augmentation, merging corpora for training, and testing on actor and non-actor speech through our experiments. We find that the extent to which merging corpora is successful depends greatly on the similarities between label definitions before label mapping. Finally, we see that the escalation recognition task can be performed in a cross-corpus setup with hand-crafted speech features, obtaining up to 63.8% unweighted average recall (UAR) for a cross-corpus analysis, an increase from the inter-corpus results of 59.4% UAR.
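Unweighted average recall (UAR), the metric reported above, is the mean of the per-class recalls, so each escalation level counts equally regardless of how many samples it has. A stdlib sketch with made-up labels (not the paper's data) showing how UAR diverges from plain accuracy on imbalanced classes:

```python
def uar(y_true, y_pred, classes=("low", "medium", "high")):
    """Unweighted average recall: mean of per-class recalls."""
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# Invented example: the majority "low" class inflates accuracy but not UAR
y_true = ["low"] * 6 + ["medium"] * 2 + ["high"] * 2
y_pred = ["low"] * 6 + ["low", "medium"] + ["high", "low"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                       # prints 0.8
print(round(uar(y_true, y_pred), 3))  # prints 0.667
```

Here 8 of 10 labels are correct (accuracy 0.8), but recalls of 1.0, 0.5, and 0.5 average to a UAR of about 0.667.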
11.
Pisanski K, Bryant GA, Cornec C, Anikin A, Reby D. Form follows function in human nonverbal vocalisations. Ethol Ecol Evol 2022. [DOI: 10.1080/03949370.2022.2026482]
Affiliation(s)
- Katarzyna Pisanski
- ENES Sensory Neuro-Ethology Lab, CRNL, Jean Monnet University of Saint Étienne, UMR 5293, St-Étienne 42023, France
- CNRS French National Centre for Scientific Research, DDL Dynamics of Language Lab, University of Lyon 2, Lyon 69007, France
- Gregory A. Bryant
- Department of Communication, Center for Behavior, Evolution, and Culture, University of California, Los Angeles, California, USA
- Clément Cornec
- ENES Sensory Neuro-Ethology Lab, CRNL, Jean Monnet University of Saint Étienne, UMR 5293, St-Étienne 42023, France
- Andrey Anikin
- ENES Sensory Neuro-Ethology Lab, CRNL, Jean Monnet University of Saint Étienne, UMR 5293, St-Étienne 42023, France
- Division of Cognitive Science, Lund University, Lund 22100, Sweden
- David Reby
- ENES Sensory Neuro-Ethology Lab, CRNL, Jean Monnet University of Saint Étienne, UMR 5293, St-Étienne 42023, France
12.
Reybrouck M, Podlipniak P, Welch D. Music listening and homeostatic regulation: surviving and flourishing in a sonic world. Int J Environ Res Public Health 2021; 19:278. [PMID: 35010538] [PMCID: PMC8751057] [DOI: 10.3390/ijerph19010278]
Abstract
This paper argues for a biological conception of music listening as an evolutionary achievement that is related to a long history of cognitive and affective-emotional functions, which are grounded in basic homeostatic regulation. Starting from three levels of description (the acoustic description of sounds, the neurological level of processing, and the psychological correlates of neural stimulation), it conceives of listeners as open systems that are in continuous interaction with the sonic world. By monitoring and altering their current state, they can try to stay within the limits of operating set points in the pursuit of a controlled state of dynamic equilibrium, which is fueled by interoceptive and exteroceptive sources of information. Listening, in this homeostatic view, can be adaptive and goal-directed with the aim of maintaining the internal physiology and directing behavior towards conditions that make it possible to thrive by seeking out stimuli that are valued as beneficial and worthy, or by attempting to avoid those that are annoying and harmful. This calls forth the mechanisms of pleasure and reward, the distinction between pleasure and enjoyment, the twin notions of valence and arousal, the affect-related consequences of music listening, the role of affective regulation and visceral reactions to the sounds, and the distinction between adaptive and maladaptive listening.
Affiliation(s)
- Mark Reybrouck
- Faculty of Arts, University of Leuven, 3000 Leuven, Belgium
- Department of Art History, Musicology and Theater Studies, IPEM Institute for Psychoacoustics and Electronic Music, 9000 Ghent, Belgium
- Piotr Podlipniak
- Institute of Musicology, Adam Mickiewicz University in Poznań, 61-712 Poznan, Poland
- David Welch
- Institute Audiology Section, School of Population Health, University of Auckland, Auckland 2011, New Zealand
13
Kleisner K, Leongómez JD, Pisanski K, Fiala V, Cornec C, Groyecka-Bernard A, Butovskaya M, Reby D, Sorokowski P, Akoko RM. Predicting strength from aggressive vocalizations versus speech in African bushland and urban communities. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200403. [PMID: 34719250 PMCID: PMC8558769 DOI: 10.1098/rstb.2020.0403] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Accepted: 06/23/2021] [Indexed: 02/03/2023]
Abstract
The human voice carries information about a vocalizer's physical strength that listeners can perceive and that may influence mate choice and intrasexual competition. Yet, reliable acoustic correlates of strength in human speech remain unclear. Compared to speech, aggressive nonverbal vocalizations (roars) may function to maximize perceived strength, suggesting that their acoustic structure has been selected to communicate formidability, similar to the vocal threat displays of other animals. Here, we test this prediction in two non-WEIRD African samples: an urban community of Cameroonians and rural nomadic Hadza hunter-gatherers in the Tanzanian bushlands. Participants produced standardized speech and volitional roars and provided handgrip strength measures. Using acoustic analysis and information-theoretic multi-model inference and averaging techniques, we show that strength can be measured from both speech and roars, and as predicted, strength is more reliably gauged from roars than vowels, words or greetings. The acoustic structure of roars explains 40-70% of the variance in actual strength within adults of either sex. However, strength is predicted by multiple acoustic parameters whose combinations vary by sex, sample and vocal type. Thus, while roars may maximally signal strength, more research is needed to uncover consistent and likely interacting acoustic correlates of strength in the human voice. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part I)'.
Affiliation(s)
- Karel Kleisner
- Department of Philosophy and History of Science, Charles University, Prague, 12800, Czech Republic
- Juan David Leongómez
- Human Behaviour Lab (LACH), Faculty of Psychology, Universidad El Bosque, Bogota, DC, 110121, Colombia
- Katarzyna Pisanski
- Equipe de Neuro-Ethologie Sensorielle, Centre de Recherche en Neurosciences de Lyon, Jean Monnet University of Saint-Etienne, 42100, France
- CNRS/Centre National de la Recherche Scientifique, Laboratoire Dynamique du Langage, Université Lyon 2, Lyon, 69363, France
- Institute of Psychology, University of Wroclaw, 50-527, Poland
- Vojtěch Fiala
- Department of Philosophy and History of Science, Charles University, Prague, 12800, Czech Republic
- Clément Cornec
- Equipe de Neuro-Ethologie Sensorielle, Centre de Recherche en Neurosciences de Lyon, Jean Monnet University of Saint-Etienne, 42100, France
- Marina Butovskaya
- Institute of Ethnology and Anthropology, Russian Academy of Sciences, Russia
- Russian State University for the Humanities, Moscow, 125047, Russia
- David Reby
- Equipe de Neuro-Ethologie Sensorielle, Centre de Recherche en Neurosciences de Lyon, Jean Monnet University of Saint-Etienne, 42100, France
- Robert Mbe Akoko
- Department of Communication and Development Studies, University of Bamenda, PO Box 39, Bambili, Bamenda, Cameroon
14
Pisanski K, Groyecka-Bernard A, Sorokowski P. Human voice pitch measures are robust across a variety of speech recordings: methodological and theoretical implications. Biol Lett 2021; 17:20210356. [PMID: 34582736 DOI: 10.1098/rsbl.2021.0356] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
Abstract
Fundamental frequency (fo), perceived as voice pitch, is the most sexually dimorphic, perceptually salient and intensively studied voice parameter in human nonverbal communication. Thousands of studies have linked human fo to biological and social speaker traits and life outcomes, from reproductive to economic. Critically, researchers have used myriad speech stimuli to measure fo and infer its functional relevance, from individual vowels to longer bouts of spontaneous speech. Here, we acoustically analysed fo in nearly 1000 affectively neutral speech utterances (vowels, words, counting, greetings, read paragraphs and free spontaneous speech) produced by the same 154 men and women, aged 18-67, with two aims: first, to test the methodological validity of comparing fo measures from diverse speech stimuli, and second, to test the prediction that the vast inter-individual differences in habitual fo found between same-sex adults are preserved across speech types. Indeed, despite differences in linguistic content, duration, scripted or spontaneous production and within-individual variability, we show that 42-81% of inter-individual differences in fo can be explained between any two speech types. Beyond methodological implications, together with recent evidence that inter-individual differences in fo are remarkably stable across the lifespan and generalize to emotional speech and nonverbal vocalizations, our results further substantiate voice pitch as a robust and reliable biomarker in human communication.
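The "42-81% of inter-individual differences explained" figure corresponds to the squared Pearson correlation (r²) between speakers' mean fo measured from two different speech types. A minimal sketch of that computation, assuming hypothetical speaker values (fo_read, fo_free, and the speaker data below are illustrative, not from the study):

```python
import math

def pearson_r(x, y):
    # Pearson correlation between paired per-speaker measurements
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical mean fo (Hz) for five speakers, one value per speech type
fo_read = [120.0, 135.0, 110.0, 180.0, 210.0]
fo_free = [118.0, 140.0, 115.0, 175.0, 205.0]

r = pearson_r(fo_read, fo_free)
shared_variance = r ** 2  # proportion of inter-individual variance explained
```

Here r² close to 1 would indicate that speakers who are high-pitched in read speech are also high-pitched in free speech, which is the within-speaker consistency the abstract reports.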
Affiliation(s)
- Katarzyna Pisanski
- University of Wroclaw, Wroclaw, Poland
- CNRS/Centre National de la Recherche Scientifique, Laboratoire Dynamique du Langage, Université Lyon 2, Lyon, France
- Equipe de Neuro-Ethologie Sensorielle, Centre de Recherche en Neurosciences de Lyon, Jean Monnet University of Saint-Etienne, France
- Agata Groyecka-Bernard
- University of Wroclaw, Wroclaw, Poland
- Johannes Gutenberg-Universität Mainz, Mainz, Germany
15
Fletcher MD, Verschuur CA. Electro-Haptic Stimulation: A New Approach for Improving Cochlear-Implant Listening. Front Neurosci 2021; 15:581414. [PMID: 34177440 PMCID: PMC8219940 DOI: 10.3389/fnins.2021.581414] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/08/2020] [Accepted: 04/29/2021] [Indexed: 12/12/2022]
Abstract
Cochlear implants (CIs) have been remarkably successful at restoring speech perception for severely to profoundly deaf individuals. Despite their success, several limitations remain, particularly in CI users' ability to understand speech in noisy environments, locate sound sources, and enjoy music. A new multimodal approach has been proposed that uses haptic stimulation to provide sound information that is poorly transmitted by the implant. This augmenting of the electrical CI signal with haptic stimulation (electro-haptic stimulation; EHS) has been shown to improve speech-in-noise performance and sound localization in CI users. There is also evidence that it could enhance music perception. We review the evidence of EHS enhancement of CI listening and discuss key areas where further research is required. These include understanding the neural basis of EHS enhancement, understanding the effectiveness of EHS across different clinical populations, and the optimization of signal-processing strategies. We also discuss the significant potential for a new generation of haptic neuroprosthetic devices to aid those who cannot access hearing-assistive technology, either because of biomedical or healthcare-access issues. While significant further research and development is required, we conclude that EHS represents a promising new approach that could, in the near future, offer a non-invasive, inexpensive means of substantially improving clinical outcomes for hearing-impaired individuals.
Affiliation(s)
- Mark D. Fletcher
- Faculty of Engineering and Physical Sciences, University of Southampton Auditory Implant Service, University of Southampton, Southampton, United Kingdom
- Faculty of Engineering and Physical Sciences, Institute of Sound and Vibration Research, University of Southampton, Southampton, United Kingdom
- Carl A. Verschuur
- Faculty of Engineering and Physical Sciences, University of Southampton Auditory Implant Service, University of Southampton, Southampton, United Kingdom
16
Low fundamental and formant frequencies predict fighting ability among male mixed martial arts fighters. Sci Rep 2021; 11:905. [PMID: 33441596 PMCID: PMC7806622 DOI: 10.1038/s41598-020-79408-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/16/2020] [Accepted: 12/08/2020] [Indexed: 01/29/2023]
Abstract
Human voice pitch is highly sexually dimorphic and eminently quantifiable, making it an ideal phenotype for studying the influence of sexual selection. In both traditional and industrial populations, lower pitch in men predicts mating success, reproductive success, and social status and shapes social perceptions, especially those related to physical formidability. Due to practical and ethical constraints however, scant evidence tests the central question of whether male voice pitch and other acoustic measures indicate actual fighting ability in humans. To address this, we examined pitch, pitch variability, and formant position of 475 mixed martial arts (MMA) fighters from an elite fighting league, with each fighter's acoustic measures assessed from multiple voice recordings extracted from audio or video interviews available online (YouTube, Google Video, podcasts), totaling 1312 voice recording samples. In four regression models each predicting a separate measure of fighting ability (win percentages, number of fights, Elo ratings, and retirement status), no acoustic measure significantly predicted fighting ability above and beyond covariates. However, after fight statistics, fight history, height, weight, and age were used to extract underlying dimensions of fighting ability via factor analysis, pitch and formant position negatively predicted "Fighting Experience" and "Size" factor scores in a multivariate regression model, explaining 3-8% of the variance. Our findings suggest that lower male pitch and formants may be valid cues of some components of fighting ability in men.
17
Keenan S, Mathevon N, Stevens JM, Nicolè F, Zuberbühler K, Guéry JP, Levréro F. The reliability of individual vocal signature varies across the bonobo's graded repertoire. Anim Behav 2020. [DOI: 10.1016/j.anbehav.2020.08.024] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/24/2022]