1
Lavan N. Left-handed voices? Examining the perceptual learning of novel person characteristics from the voice. Q J Exp Psychol (Hove) 2024; 77:2325-2338. [PMID: 38229446 DOI: 10.1177/17470218241228849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2024]
Abstract
We regularly form impressions of who a person is from their voice, such that we can readily categorise people as being female or male, child or adult, trustworthy or not, and can furthermore recognise who specifically is speaking. How we establish mental representations for such categories of person characteristics has, however, only been explored in detail for voice identity learning. In a series of experiments, we therefore set out to examine whether and how listeners can learn to recognise a novel person characteristic. We specifically asked how diagnostic acoustic properties underpinning category distinctions inform perceptual judgements. We manipulated recordings of voices to create acoustic signatures for a person's handedness (left-handed vs. right-handed) in their voice. After training, we found that listeners were able to successfully learn to recognise handedness from voices with above-chance accuracy, although no significant differences in accuracy between the different types of manipulation emerged. Listeners were, furthermore, sensitive to the specific distributions of acoustic properties that underpinned the category distinctions. We, however, also find evidence for perceptual biases that may reflect long-term prior exposure to how voices vary in naturalistic settings. These biases shape how listeners use acoustic information in the voices when forming representations for distinguishing handedness from voices. This study is thus a first step to examine how representations for novel person characteristics are established, outside of voice identity perception. We discuss our findings in light of theoretical accounts of voice perception and speculate about potential mechanisms that may underpin our results.
Affiliation(s)
- Nadine Lavan
  - Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK

2
Dolquist DV, Munson B. Clinical Focus: The Development and Description of a Palette of Transmasculine Voices. AMERICAN JOURNAL OF SPEECH-LANGUAGE PATHOLOGY 2024; 33:1113-1126. [PMID: 38501906 DOI: 10.1044/2024_ajslp-23-00398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
PURPOSE The study of gender and speech has historically excluded studies of transmasculine individuals. Consequently, generalizations about speech and gender are based on cisgender individuals. This lack of representation hinders clinical training and clinical service delivery, particularly by speech-language pathologists providing gender-affirming communication services. This letter describes a new corpus of the speech of American English-speaking transmasculine men, transmasculine nonbinary people, and cisgender men that is open and available to clinicians and researchers. METHOD Twenty masculine-presenting native English speakers from the Upper Midwestern United States (including cisgender men, transmasculine men, and transmasculine nonbinary people) were recorded, producing three sets of speech materials: Consensus Auditory-Perceptual Evaluation of Voice sentences, the Rainbow Passage, and a novel set of sentences developed for this project. Acoustic measures were made of vowels (overall formant frequency scaling, vowel-space dispersion, fundamental frequency, breathiness), consonants (voice onset time of word-initial voiceless stops, spectral moments of word-initial /s/), and the entire sentence (rate of speech). RESULTS The acoustic measures reveal a wide range for all dependent measures and low correlations among the measures. Results show that many of the voices depart considerably from the norms for men's speech in published studies. CONCLUSION This new corpus can be used to illustrate different ways of sounding masculine by speech-language pathologists performing gender-affirming communication services and by higher education teachers as examples of diverse ways of sounding masculine.
Affiliation(s)
- Devin V Dolquist
  - Department of Speech-Language-Hearing Sciences, University of Minnesota-Twin Cities, Minneapolis
  - School of Music, University of Minnesota-Twin Cities, Minneapolis
- Benjamin Munson
  - Department of Speech-Language-Hearing Sciences, University of Minnesota-Twin Cities, Minneapolis

3
Busquet F, Efthymiou F, Hildebrand C. Voice analytics in the wild: Validity and predictive accuracy of common audio-recording devices. Behav Res Methods 2024; 56:2114-2134. [PMID: 37253958 PMCID: PMC10228884 DOI: 10.3758/s13428-023-02139-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/27/2023] [Indexed: 06/01/2023]
Abstract
The use of voice recordings in both research and industry practice has increased dramatically in recent years, from diagnosing a COVID-19 infection based on patients' self-recorded voice samples to predicting customer emotions during a service center call. Crowdsourced audio data collection in participants' natural environment using their own recording device has opened up new avenues for researchers and practitioners to conduct research at scale across a broad range of disciplines. The current research examines whether fundamental properties of the human voice are reliably and validly captured through common consumer-grade audio-recording devices in current medical, behavioral science, business, and computer science research. Specifically, this work provides evidence from a tightly controlled laboratory experiment analyzing 1800 voice samples and subsequent simulations that recording devices with high proximity to a speaker (such as a headset or a lavalier microphone) lead to inflated measures of amplitude compared to a benchmark studio-quality microphone while recording devices with lower proximity to a speaker (such as a laptop or a smartphone in front of the speaker) systematically reduce measures of amplitude and can lead to biased measures of the speaker's true fundamental frequency. We further demonstrate through simulation studies that these differences can lead to biased and ultimately invalid conclusions in, for example, an emotion detection task. Finally, we outline a set of recording guidelines to ensure reliable and valid voice recordings and offer initial evidence for a machine-learning approach to bias correction in the case of distorted speech signals.
Affiliation(s)
- Francesc Busquet
  - Institute of Behavioral Science and Technology, University of St. Gallen, Torstrasse 25, St. Gallen, 9000, Switzerland
- Fotis Efthymiou
  - Institute of Behavioral Science and Technology, University of St. Gallen, Torstrasse 25, St. Gallen, 9000, Switzerland
- Christian Hildebrand
  - Institute of Behavioral Science and Technology, University of St. Gallen, Torstrasse 25, St. Gallen, 9000, Switzerland

4
Yang J, Sidhu J, Totino G, McKim S, Xu L. Accent rating of vocoded foreign-accented speech by native listeners. JASA EXPRESS LETTERS 2023; 3:095204. [PMID: 37747319 DOI: 10.1121/10.0020989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Accepted: 08/23/2023] [Indexed: 09/26/2023]
Abstract
This study examined accent rating of speech samples collected from 12 Mandarin-accented English talkers and two native English talkers. The speech samples were processed with noise- and tone-vocoders at 1, 2, 4, 8, and 16 channels. The accentedness of the vocoded and unprocessed signals was judged by 53 native English listeners on a 9-point scale. The foreign-accented talkers were judged as having a less strong accent in the vocoded conditions than in the unprocessed condition. The native talkers and foreign-accented talkers with varying degrees of accentedness demonstrated different patterns of accent rating changes as a function of the number of channels.
Affiliation(s)
- Jing Yang
  - Communication Sciences and Disorders, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin 53201, USA
- Jaskirat Sidhu
  - Communication Sciences and Disorders, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin 53201, USA
- Gabrielle Totino
  - Hearing, Speech and Language Sciences, Ohio University, Athens, Ohio 45701
- Sarah McKim
  - Hearing, Speech and Language Sciences, Ohio University, Athens, Ohio 45701
- Li Xu
  - Hearing, Speech and Language Sciences, Ohio University, Athens, Ohio 45701

5
Mailhos A, Egea-Caparrós DA, Cabana Á, Martínez-Sánchez F. Voice pitch is negatively associated with sociosexual behavior in males but not in females. Front Psychol 2023; 14:1200065. [PMID: 37496795 PMCID: PMC10367086 DOI: 10.3389/fpsyg.2023.1200065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 06/12/2023] [Indexed: 07/28/2023] Open
Abstract
Acoustic cues play a major role in social interactions in many animal species. In addition to the semantic contents of human speech, voice attributes - e.g., voice pitch, formant position, formant dispersion, etc. - have been proposed to provide critical information for the assessment of potential rivals and mates. However, prior studies exploring the association of acoustic attributes with reproductive success, or some of its proxies, have produced mixed results. Here, we investigate whether the mean fundamental frequency (F0), formant position (Pf), and formant dispersion (Df) - dimorphic attributes of the human voice - are related to sociosexuality, as measured by the Revised Sociosexual Orientation Inventory (SOI-R) - a trait also known to exhibit sex differences - in a sample of native Spanish-speaking students (101 males, 147 females). Analyses showed a significant negative correlation between F0 and sociosexual behavior, and between Pf and sociosexual desire in males but not in females. These correlations remained significant after correcting for false discovery rate (FDR) and controlling for age, a potential confounding variable. Our results are consistent with a role of F0 and Pf serving as cues in the mating domain in males but not in females. Alternatively, the association of voice attributes and sociosexual orientation might stem from the parallel effect of male sex hormones both on the male brain and the anatomical structures involved in voice production.
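Two of the quantities named in this abstract can be illustrated with a short sketch. It assumes, without confirmation from the abstract, that formant dispersion (Df) is computed as the mean spacing between adjacent formants, which is a common definition, and that the FDR correction is the Benjamini-Hochberg step-up procedure. The function names, formant values, and p-values are hypothetical and are not data from the study.

```python
def formant_dispersion(formants_hz):
    """Mean spacing between adjacent formants (assumed definition of Df)."""
    gaps = [hi - lo for lo, hi in zip(formants_hz, formants_hz[1:])]
    return sum(gaps) / len(gaps)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return True for each hypothesis rejected at FDR level alpha.

    Step-up rule: find the largest rank k whose sorted p-value satisfies
    p_(k) <= (k / m) * alpha, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= max_rank
    return rejected


# Hypothetical formant frequencies F1..F4 in Hz for one speaker.
df = formant_dispersion([500.0, 1500.0, 2500.0, 3500.0])

# Hypothetical p-values for six voice-attribute correlations.
flags = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.20, 0.74])
```

In this toy input only the two smallest p-values survive correction, which shows how a nominally significant result (for example p = 0.039) can fail once multiple comparisons are controlled.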
Affiliation(s)
- Alvaro Mailhos
  - Facultad de Psicología, Universidad de la República, Montevideo, Uruguay
- Álvaro Cabana
  - Facultad de Psicología, Universidad de la República, Montevideo, Uruguay

6
Zaltz Y. The effect of stimulus type and testing method on talker discrimination of school-age children. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2023; 153:2611. [PMID: 37129674 DOI: 10.1121/10.0017999] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/01/2022] [Accepted: 04/12/2023] [Indexed: 05/03/2023]
Abstract
Efficient talker discrimination (TD) improves speech understanding under multi-talker conditions. So far, TD of children has been assessed using various testing parameters, making it difficult to draw comparative conclusions. This study explored the effects of the stimulus type and variability on children's TD. Thirty-two children (7-10 years old) underwent eight TD assessments with fundamental frequency + formant changes using an adaptive procedure. Stimuli included consonant-vowel-consonant words or three-word sentences and were either fixed by run or by trial (changing throughout the run). Cognitive skills were also assessed. Thirty-one adults (18-35 years old) served as controls. The results showed (1) poorer TD for the fixed-by-trial than the fixed-by-run method, with both stimulus types for the adults but only with the words for the children; (2) poorer TD for the words than the sentences with the fixed-by-trial method only for the children; and (3) significant correlations between the children's age and TD. These results support a developmental trajectory in the use of perceptual anchoring for TD and in its reliance on comprehensive acoustic and linguistic information. The finding that the testing parameters may influence the top-down and bottom-up processing for TD should be considered when comparing data across studies or when planning new TD experiments.
Affiliation(s)
- Yael Zaltz
  - Department of Communication Disorders, The Steyer School of Health Professions, Sackler Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel

7
Lavan N. How do we describe other people from voices and faces? Cognition 2023; 230:105253. [PMID: 36215763 DOI: 10.1016/j.cognition.2022.105253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Revised: 07/29/2022] [Accepted: 08/06/2022] [Indexed: 11/07/2022]
Abstract
When seeing someone's face or hearing their voice, perceivers routinely infer information about a person's age, sex and social traits. While many experiments have explored how individual person characteristics are perceived in isolation, less is known about which person characteristics are described spontaneously from voices and faces and how descriptions may differ across modalities. In Experiment 1, participants provided free descriptions for voices and faces. These free descriptions followed similar patterns for voices and faces - and for individual identities: Participants spontaneously referred to a wide range of descriptors. Psychological descriptors, such as character traits, were used most frequently; physical characteristics, such as age and sex, were notable as they were mentioned earlier than other types of descriptors. After finding primarily similarities between modalities when analysing person descriptions across identities, Experiment 2 asked whether free descriptions encode how individual identities differ. For this purpose, the measures derived from the free descriptions were linked to voice/face discrimination judgements that are known to describe differences in perceptual properties between identity pairs. Significant relationships emerged within and across modalities, showing that free descriptions indeed encode differences between identities - information that is shared with discrimination judgements. This suggests that the two tasks tap into similar, high-level person representations. These findings show that free description data can offer valuable insights into person perception and underline that person perception is a multivariate process during which perceivers rapidly and spontaneously infer many different person characteristics to form a holistic impression of a person.
Affiliation(s)
- Nadine Lavan
  - Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, United Kingdom

8
Li A, Purse R, Holliday N. Variation in global and intonational pitch settings among black and white speakers of Southern American English. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:2617. [PMID: 36456281 DOI: 10.1121/10.0014906] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/20/2022] [Accepted: 10/04/2022] [Indexed: 06/17/2023]
Abstract
This article revisits classic questions about how pitch varies between groups by examining global and intonational pitch differences between black and white speakers from Memphis, Tennessee, using data from read speech to control for stylistic and segmental variables. Results from both mixed-effects regression modeling and smoothing spline analysis of variance find no difference between black and white men in mean F0 and pitch range measures. However, black women produced consistently lower mean F0 than white women. These findings suggest that while pitch patterns in black women's speech remain underexplored in the literature, they may play an important role in shaping attitudes and ideological associations concerning black American speakers in general. Moreover, vocal pitch may be a linguistic variable subject to variation, especially in a context of racialized and gendered linguistic standards.
Affiliation(s)
- Aini Li
  - Department of Linguistics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Ruaridh Purse
  - Department of Linguistics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Nicole Holliday
  - Department of Linguistics and Cognitive Science, Pomona College, Claremont, California 91711, USA

9
Mills HE, Shorey AE, Theodore RM, Stilp CE. Context effects in perception of vowels differentiated by F1 are not influenced by variability in talkers' mean F1 or F3. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:55. [PMID: 35931547 DOI: 10.1121/10.0011920] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/08/2021] [Accepted: 06/08/2022] [Indexed: 06/15/2023]
Abstract
Spectral properties of earlier sounds (context) influence recognition of later sounds (target). Acoustic variability in context stimuli can disrupt this process. When mean fundamental frequencies (f0's) of preceding context sentences were highly variable across trials, shifts in target vowel categorization [due to spectral contrast effects (SCEs)] were smaller than when sentence mean f0's were less variable; when sentences were rearranged to exhibit high or low variability in mean first formant frequencies (F1) in a given block, SCE magnitudes were equivalent [Assgari, Theodore, and Stilp (2019) J. Acoust. Soc. Am. 145(3), 1443-1454]. However, since sentences were originally chosen based on variability in mean f0, stimuli underrepresented the extent to which mean F1 could vary. Here, target vowels (/ɪ/-/ɛ/) were categorized following context sentences that varied substantially in mean F1 (experiment 1) or mean F3 (experiment 2) with variability in mean f0 held constant. In experiment 1, SCE magnitudes were equivalent whether context sentences had high or low variability in mean F1; the same pattern was observed in experiment 2 for new sentences with high or low variability in mean F3. Variability in some acoustic properties (mean f0) can be more perceptually consequential than others (mean F1, mean F3), but these results may be task-dependent.
Affiliation(s)
- Hannah E Mills
  - Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Anya E Shorey
  - Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Rachel M Theodore
  - Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06269, USA
- Christian E Stilp
  - Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA

10
Kapolowicz MR, Guest DR, Montazeri V, Baese-Berk MM, Assmann PF. Effects of Spectral Envelope and Fundamental Frequency Shifts on the Perception of Foreign-Accented Speech. LANGUAGE AND SPEECH 2022; 65:418-443. [PMID: 34240630 DOI: 10.1177/00238309211029679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
To investigate the role of spectral pattern information in the perception of foreign-accented speech, we measured the effects of spectral shifts on judgments of talker discrimination, perceived naturalness, and intelligibility when listening to Mandarin-accented English and native-accented English sentences. In separate conditions, the spectral envelope and fundamental frequency (F0) contours were shifted up or down in three steps using coordinated scale factors (multiples of 8% and 30%, respectively). Experiment 1 showed that listeners perceive spectrally shifted sentences as coming from a different talker for both native-accented and foreign-accented speech. Experiment 2 demonstrated that downward shifts applied to male talkers and the largest upward shifts applied to all talkers reduced the perceived naturalness, regardless of accent. Overall, listeners rated foreign-accented speech as sounding less natural even for unshifted speech. In Experiment 3, introducing spectral shifts further lowered the intelligibility of foreign-accented speech. When speech from the same foreign-accented talker was shifted to simulate five different talkers, increased exposure failed to produce an improvement in intelligibility scores, similar to the pattern observed when listeners actually heard five foreign-accented talkers. Intelligibility of spectrally shifted native-accented speech was near ceiling performance initially, and no further improvement or decrement was observed. These experiments suggest a mechanism that utilizes spectral envelope and F0 cues in a talker-dependent manner to support the perception of foreign-accented speech.

11
Lim SJ, Thiel C, Sehm B, Deserno L, Lepsien J, Obleser J. Distributed networks for auditory memory differentially contribute to recall precision. Neuroimage 2022; 256:119227. [PMID: 35452804 DOI: 10.1016/j.neuroimage.2022.119227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Revised: 03/13/2022] [Accepted: 04/17/2022] [Indexed: 11/25/2022] Open
Abstract
Re-directing attention to objects in working memory can enhance their representational fidelity. However, how this attentional enhancement of memory representations is implemented across distinct sensory and cognitive-control brain networks is unspecified. The present fMRI experiment leverages psychophysical modelling and multivariate auditory-pattern decoding as behavioral and neural proxies of mnemonic fidelity. Listeners performed an auditory syllable pitch-discrimination task and received retro-active cues to selectively attend to a to-be-probed syllable in memory. Accompanied by increased neural activation in fronto-parietal and cingulo-opercular networks, valid retro-cues yielded faster and more perceptually sensitive responses in recalling acoustic detail of memorized syllables. Information about the cued auditory object was decodable from hemodynamic response patterns in superior temporal sulcus (STS), fronto-parietal, and sensorimotor regions. However, among these regions retaining auditory memory objects, neural fidelity in the left STS and its enhancement through attention-to-memory best predicted individuals' gain in auditory memory recall precision. Our results demonstrate how functionally discrete brain regions differentially contribute to the attentional enhancement of memory representations.
Affiliation(s)
- Sung-Joo Lim
  - Department of Psychology, University of Lübeck, Maria-Goeppert-Str. 9a, Lübeck 23562, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany; Department of Psychology, Binghamton University, State University of New York, 4400 Vestal Parkway E, Vestal, Binghamton, NY 13902, USA; Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, USA
- Christiane Thiel
  - Department of Psychology, Carl von Ossietzky University of Oldenburg, Oldenburg 26129, Germany
- Bernhard Sehm
  - Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Lorenz Deserno
  - Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Jöran Lepsien
  - Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Jonas Obleser
  - Department of Psychology, University of Lübeck, Maria-Goeppert-Str. 9a, Lübeck 23562, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany; Center of Brain, Behavior, and Metabolism, University of Lübeck, Lübeck 23562, Germany

12
Linhart P, Mahamoud-Issa M, Stowell D, Blumstein DT. The potential for acoustic individual identification in mammals. Mamm Biol 2022. [DOI: 10.1007/s42991-021-00222-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/06/2023]

13
Merritt B, Bent T. Revisiting the acoustics of speaker gender perception: A gender expansive perspective. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 151:484. [PMID: 35105035 DOI: 10.1121/10.0009282] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/23/2021] [Accepted: 12/21/2021] [Indexed: 06/14/2023]
Abstract
Examinations of speaker gender perception have primarily focused on the roles of fundamental frequency (fo) and formant frequencies from structured speech tasks using cisgender speakers. Yet, there is evidence to suggest that fo and formants do not fully account for listeners' perceptual judgements of gender, particularly from connected speech. This study investigated the perceptual importance of fo, formant frequencies, articulation, and intonation in listeners' judgements of gender identity and masculinity/femininity from spontaneous speech from cisgender male and female speakers as well as transfeminine and transmasculine speakers. Stimuli were spontaneous speech samples from 12 speakers who are cisgender (6 female and 6 male) and 12 speakers who are transgender (6 transfeminine and 6 transmasculine). Listeners performed a two-alternative forced choice (2AFC) gender identification task and masculinity/femininity rating task in two experiments that manipulated which acoustic cues were available. Experiment 1 confirmed that fo and formant frequency manipulations were insufficient to alter listener judgements across all speakers. Experiment 2 demonstrated that articulatory cues had greater weighting than intonation cues on the listeners' judgements when the fo and formant frequencies were in a gender ambiguous range. These findings counter the assumptions that fo and formant manipulations are sufficient to effectively alter perceived speaker gender.
Affiliation(s)
- Brandon Merritt
  - Department of Speech, Language, and Hearing Sciences, Indiana University, Bloomington, Indiana 47408, USA
- Tessa Bent
  - Department of Speech, Language, and Hearing Sciences, Indiana University, Bloomington, Indiana 47408, USA

14
Melchor J, Vergara J, Figueroa T, Morán I, Lemus L. Formant-Based Recognition of Words and Other Naturalistic Sounds in Rhesus Monkeys. Front Neurosci 2021; 15:728686. [PMID: 34776842 PMCID: PMC8586527 DOI: 10.3389/fnins.2021.728686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2021] [Accepted: 10/08/2021] [Indexed: 11/21/2022] Open
Abstract
In social animals, identifying sounds is critical for communication. In humans, the acoustic parameters involved in speech recognition, such as the formant frequencies derived from the resonance of the supralaryngeal vocal tract, have been well documented. However, how formants contribute to recognizing learned sounds in non-human primates remains unclear. To determine this, we trained two rhesus monkeys to discriminate target and non-target sounds presented in sequences of 1–3 sounds. After training, we performed three experiments testing (1) the monkeys’ accuracy and reaction times during the discrimination of various acoustic categories; (2) their ability to discriminate morphing sounds; and (3) their ability to identify sounds consisting of formant 1 (F1), formant 2 (F2), or F1 and F2 (F1F2) pass filters. Our results indicate that macaques can learn diverse sounds and discriminate from morphs and formants F1 and F2, suggesting that information from few acoustic parameters suffices for recognizing complex sounds. We anticipate that future neurophysiological experiments in this paradigm may help elucidate how formants contribute to the recognition of sounds.
Affiliation(s)
- Jonathan Melchor
  - Department of Cognitive Neuroscience, Institute of Cell Physiology, Universidad Nacional Autónoma de México, Mexico City, Mexico
- José Vergara
  - Department of Neuroscience, Baylor College of Medicine, Houston, TX, United States
- Tonatiuh Figueroa
  - Department of Cognitive Neuroscience, Institute of Cell Physiology, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Isaac Morán
  - Department of Cognitive Neuroscience, Institute of Cell Physiology, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Luis Lemus
  - Department of Cognitive Neuroscience, Institute of Cell Physiology, Universidad Nacional Autónoma de México, Mexico City, Mexico

15
Barreda S, Assmann PF. Perception of gender in children's voices. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 150:3949. [PMID: 34852594 DOI: 10.1121/10.0006785] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/30/2020] [Accepted: 09/30/2021] [Indexed: 06/13/2023]
Abstract
To investigate the perception of gender from children's voices, adult listeners were presented with /hVd/ syllables, in isolation and in sentence context, produced by children between 5 and 18 years. Half the listeners were informed of the age of the talker during trials, while the other half were not. Correct gender identifications increased with talker age; however, performance was above chance even for age groups where the cues most often associated with gender differentiation (i.e., average fundamental frequency and formant frequencies) were not consistently different between boys and girls. The results of acoustic models suggest that cues were used in an age-dependent manner, whether listeners were explicitly told the age of the talker or not. Overall, results are consistent with the hypothesis that talker age and gender are estimated jointly in the process of speech perception. Furthermore, results show that the gender of individual talkers can be identified accurately well before reliable anatomical differences arise in the vocal tracts of females and males. In general, results support the notion that the transmission of gender information from voice depends substantially on gender-dependent patterns of articulation, rather than following deterministically from anatomical differences between male and female talkers.
Affiliation(s)
- Santiago Barreda
- Department of Linguistics, University of California, Davis, California 95616, USA
- Peter F Assmann
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, Texas 75080, USA
|
16
|
Tripp A, Munson B. Perceiving gender while perceiving language: Integrating psycholinguistics and gender theory. Wiley Interdisciplinary Reviews. Cognitive Science 2021; 13:e1583. [PMID: 34716654 DOI: 10.1002/wcs.1583] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/02/2021] [Revised: 09/10/2021] [Accepted: 09/29/2021] [Indexed: 11/11/2022]
Abstract
There is a substantial body of literature showing that men and women speak differently and that these differences are endemic to the speech signal. However, the psycholinguistic mechanisms underlying the integration of social category perception and language are still poorly understood. Speaker attributes such as emotional state, age, sex, and race have often been treated in the literature as dissociable, but perceptual systems for social categories demonstrably rely on interdependent cognitive processes. We introduce a diversity science framework for evaluating the existing literature on gender and speech perception, arguing that differences in beliefs about gender may be defined as differences in beliefs about differences. Treating individual, group, and societal level contrasts in ideological patterns as phenomenologically distinctive, we enumerate six ideological arenas which define claims about gender and examine the literature for treatment of these issues. We argue that both participants and investigators predictably show evidence of differences in ideological attitudes toward the normative definition of persons. The influence of social knowledge on linguistic perception therefore occurs in the context of predictable variation in both attention and inattention to people and the distinguishing features which mark them salient as kinds. We link experiences of visibility, invisibility, and hypervisibility with ideological variation regarding the significance of physiological, linguistic, and social features, concluding that gender ideologies are implicated both in linguistic processing and in social judgments of value between groups. We conclude with a summary of the key gaps in the current literature and recommendations for best practices that studies may use in future investigations of socially meaningful variation in speech perception.
This article is categorized under: Linguistics > Language in Mind and Brain; Psychology > Language; Linguistics > Language Acquisition; Psychology > Perception and Psychophysics.
Affiliation(s)
- Alayo Tripp
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota, USA
- Benjamin Munson
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota, USA
|
17
|
Domestic dogs (Canis lupus familiaris) are sensitive to the correlation between pitch and timbre in human speech. Anim Cogn 2021; 25:545-554. [PMID: 34714438 PMCID: PMC9107418 DOI: 10.1007/s10071-021-01567-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2021] [Revised: 09/14/2021] [Accepted: 10/15/2021] [Indexed: 12/01/2022]
Abstract
The perceived pitch of human voices is highly correlated with the fundamental frequency (f0) of the laryngeal source, which is determined largely by the length and mass of the vocal folds. The vocal folds are larger in adult males than in adult females, and men’s voices consequently have a lower pitch than women’s. The length of the supralaryngeal vocal tract (vocal-tract length; VTL) affects the resonant frequencies (formants) of speech which characterize the timbre of the voice. Men’s longer vocal tracts produce lower frequency, and less dispersed, formants than women’s shorter vocal tracts. Pitch and timbre combine to influence the perception of speaker characteristics such as size and age. Together, they can be used to categorize speaker sex with almost perfect accuracy. While it is known that domestic dogs can match a voice to a person of the same sex, there has been no investigation into whether dogs are sensitive to the correlation between pitch and timbre. We recorded a female voice giving three commands (‘Sit’, ‘Lay down’, ‘Come here’), and manipulated the recordings to lower the fundamental frequency (thus lowering pitch), increase simulated VTL (hence affecting timbre), or both (synthesized adult male voice). Dogs responded to the original adult female and synthesized adult male voices equivalently. Their tendency to obey the commands was, however, reduced when either pitch or timbre was manipulated alone. These results suggest that dogs are sensitive to both the pitch and timbre of human voices, and that they learn about the natural covariation of these perceptual attributes.
|
18
|
Nuyen BA, Qian ZJ, Campbell RD, Erickson-DiRenzo E, Thomas J, Sung CK. Feminization Laryngoplasty: 17-Year Review on Long-Term Outcomes, Safety, and Technique. Otolaryngol Head Neck Surg 2021; 167:112-117. [PMID: 34399638 DOI: 10.1177/01945998211036870] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/17/2022]
Abstract
OBJECTIVES Transfeminine patients can experience significant gender dysphoria in vocal communication. Feminization laryngoplasty (FL) is a gender-affirming surgery developed to elevate speaking vocal range, as well as alter vocal resonance and laryngeal cosmesis. The purpose here was to appraise FL's long-term voice outcomes across a 17-year review period. STUDY DESIGN Level III, retrospective study and description of technique. SETTING A single-institution transfeminine voice clinic. METHODS Voice data (speaking fundamental frequency [F0], lowest F0, highest F0, F0 range in both Hertz and semitones, and maximum phonation time [MPT]) were collected and assessed. Self-assessment of voice femininity and complications were documented. RESULTS The 162 patients, all transfeminine women, had a mean age of 40 years with 36-month mean follow-up. There were significant increases in mean speaking F0 (Δ = 50 ± 30 Hz, Δ = 6 ± 3 semitones; P < .001) and mean change in lowest F0 (Δ = 58 ± 31 Hz, Δ = 8 ± 4 semitones; P < .001). There was no significant difference in mean change in highest F0 or MPT. There was significant improvement (Δ = 60% ± 39%; P < .001) in perceptual self-assessment of vocal femininity. There was a 1.2% rate of major postoperative complications requiring inpatient admission or operative intervention. There were no differences in vocal outcomes between those patients who had less than 1-year follow-up and those who had 5-year follow-up. CONCLUSION FL in this cohort was a safe and effective technique for increasing mean speaking F0, mean lowest F0, and voice gender perception over a prolonged follow-up period. These findings add to the possible treatments aimed at addressing the morbid dysphoria related to voice and communication for our transfeminine patients.
Affiliation(s)
- Brian A Nuyen
- Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
- Z Jason Qian
- Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
- Ross D Campbell
- Department of Otolaryngology-Head and Neck Surgery, University of Ottawa, Ontario, Canada
- Elizabeth Erickson-DiRenzo
- Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, California, USA; Division of Laryngology, Stanford University School of Medicine, Stanford, California, USA
- James Thomas
- James P. Thomas, MD Voicedoctor Clinic, Portland, Oregon, USA
- C Kwang Sung
- Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, California, USA; Division of Laryngology, Stanford University School of Medicine, Stanford, California, USA
|
19
|
Nuyen B, Kandathil C, McDonald D, Thomas J, Most SP. The impact of living with transfeminine vocal gender dysphoria: Health utility outcomes assessment. International Journal of Transgender Health 2021; 24:99-107. [PMID: 36713148 PMCID: PMC9879186 DOI: 10.1080/26895269.2021.1919277] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/18/2023]
Abstract
Background: The voice signals a tremendous amount of gender cues. Transfeminine individuals report debilitating quality-of-life deficits as a result of their vocal gender dysphoria. Aims: We aimed to quantify the potential impact of this dysphoria experienced with quality-adjusted life years (QALYs), as well as associated treatments, through validated health utility measures. Methods: Peri-operative phonometric audio recordings of a consented transfeminine patient volunteer with a history of vocal gender dysphoria aided in the description of two transfeminine health states, pre- and post-vocal feminization gender dysphoria; monocular and binocular blindness were health state controls. Survey responses from general population adults rate these four health states via visual analogue scale (VAS), standard gamble (SG), and time tradeoff (TTO). Results: Survey respondents totaled 206 with a mean age of 35.8 years. Through VAS measures, these general adult respondents on average perceived a year of life with transfeminine vocal gender dysphoria as approximately three-quarters of a life-year of perfect health. Respondents also on average would have risked a 15%-20% chance of death on SG analysis and would have sacrificed 10 years of their remaining life on TTO measures to cure the condition. The QALY scores for the post-gender affirming treatments for vocal gender dysphoria (+0.09 VAS, p < 0.01) were significantly higher compared to the pretreatment state. There were no differences in the severity of these QALY scores by survey respondent's political affiliation or gender identity. Conclusions: To our knowledge, this study is the first to quantify how the general population perceives the health burden of vocal gender dysphoria experienced by transfeminine patients. Feminization treatments including voice therapy with feminization laryngoplasty appear to significantly increase health utility scores.
Affiliation(s)
- Brian Nuyen
- Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
- Cherian Kandathil
- Division of Facial Plastic and Reconstructive Surgery, Stanford University School of Medicine, Stanford, California, USA
- Daniella McDonald
- Medical Scientist Training Program, University of California, San Diego School of Medicine, La Jolla, California, USA
- James Thomas
- Clinic for Voice Disorders, Portland, Oregon, USA
- Sam P. Most
- Division of Facial Plastic and Reconstructive Surgery, Stanford University School of Medicine, Stanford, California, USA
|
20
|
|
21
|
Hodges-Simeon CR, Grail GPO, Albert G, Groll MD, Stepp CE, Carré JM, Arnocky SA. Testosterone therapy masculinizes speech and gender presentation in transgender men. Sci Rep 2021; 11:3494. [PMID: 33568701 PMCID: PMC7876019 DOI: 10.1038/s41598-021-82134-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/05/2020] [Accepted: 12/09/2020] [Indexed: 11/18/2022] Open
Abstract
Voice is one of the most noticeably dimorphic traits in humans and plays a central role in gender presentation. Transgender males seeking to align internal identity and external gender expression frequently undergo testosterone (T) therapy to masculinize their voices and other traits. We aimed to determine the importance of changes in vocal masculinity for transgender men and to determine the effectiveness of T therapy at masculinizing three speech parameters: fundamental frequency (i.e., pitch) mean and variation (fo and fo-SD) and estimated vocal tract length (VTL) derived from formant frequencies. Thirty transgender men aged 20 to 40 rated their satisfaction with traits prior to and after T therapy and contributed speech samples and salivary T. Similar-aged cisgender men and women contributed speech samples for comparison. We show that transmen viewed voice change as critical to transition success compared to other masculine traits. However, T therapy may not be sufficient to fully masculinize speech: while fo and fo-SD were largely indistinguishable from cismen, VTL was intermediate between cismen and ciswomen. fo was correlated with salivary T, and VTL associated with T therapy duration. This argues for additional approaches, such as behavior therapy and/or longer duration of hormone therapy, to improve speech transition.
Affiliation(s)
- Carolyn R Hodges-Simeon
- Department of Anthropology, Boston University, 232 Bay State Rd., Room 102-B, Boston, MA, 02215, USA
- Graham P O Grail
- Department of Anthropology, Boston University, 232 Bay State Rd., Room 102-B, Boston, MA, 02215, USA
- Department of Forensic Sciences, George Washington University, Washington, D.C., USA
- Graham Albert
- Department of Anthropology, Boston University, 232 Bay State Rd., Room 102-B, Boston, MA, 02215, USA
- Matti D Groll
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, USA
- Department of Biomedical Engineering, Boston University, Boston, MA, USA
- Cara E Stepp
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, USA
- Department of Otolaryngology - Head and Neck Surgery, Boston University School of Medicine, Boston, MA, USA
- Department of Biomedical Engineering, Boston University, Boston, MA, USA
- Justin M Carré
- Department of Psychology, Nipissing University, North Bay, ON, Canada
- Steven A Arnocky
- Department of Psychology, Nipissing University, North Bay, ON, Canada
|
22
|
Abstract
The adult voice is a strong bio-social marker for masculinity and femininity. In this study we investigated whether children make gender stereotypical judgments about adults’ occupational competence on the basis of their voice. Forty-eight 8- to 10-year-olds were asked to rate the competence of adult voices that varied in vocal masculinity (by artificially manipulating voice pitch) and were randomly paired with 9 occupations (3 stereotypically male, 3 female, 3 gender-neutral). In line with gender stereotypes, children rated men as more competent for the male occupations and women as more competent for the female occupations. Moreover, children rated speakers of both sexes with feminine (high-pitched) voices as more competent for the female occupations. Finally, children rated men (but not women) with masculine (low-pitched) voices as more competent for stereotypically male occupations. Our results thus indicate that stereotypical voice-based judgments of occupational competence previously identified in adults are already present in children, and likely to affect how they consider adults and interact with them in their social environment.
|
23
|
Heeren WFL. The effect of word class on speaker-dependent information in the Standard Dutch vowel /aː/. The Journal of the Acoustical Society of America 2020; 148:2028. [PMID: 33138546 DOI: 10.1121/10.0002173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2019] [Accepted: 09/24/2020] [Indexed: 06/11/2023]
Abstract
Linguistic structure co-determines how a speech sound is produced. This study therefore investigated whether the speaker-dependent information in the vowel [aː] varies when uttered in different word classes. From two spontaneous speech corpora, [aː] tokens were sampled and annotated for word class (content, function word). This was done for 50 male adult speakers of Standard Dutch in face-to-face speech (N = 3128 tokens), and another 50 male adult speakers in telephone speech (N = 3136 tokens). First, the effect of word class on various acoustic variables in spontaneous speech was tested. Results showed that [aː]'s were shorter and more centralized in function than content words. Next, tokens were used to assess their speaker-dependent information as a function of word class, by using acoustic-phonetic variables to (a) build speaker classification models and (b) compute the strength-of-evidence, a technique from forensic phonetics. Speaker-classification performance was somewhat better for content than function words, whereas forensic strength-of-evidence was comparable between the word classes. This seems explained by how these methods weigh between- and within-speaker variation. Because these two sources of variation co-varied in size with word class, acoustic word-class variation is not expected to affect the sampling of tokens in forensic speaker comparisons.
Affiliation(s)
- Willemijn F L Heeren
- Leiden University Centre for Linguistics, Leiden University, Reuvensplaats 3-4, 2311 BE Leiden, the Netherlands
|
24
|
Charlton BD, Newman C, Macdonald DW, Buesching CD. Male European badger churrs: insights into call function and motivational basis. Mamm Biol 2020. [DOI: 10.1007/s42991-020-00033-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
|
25
|
Sulpizio S, Fasoli F, Antonio R, Eyssel F, Paladino MP, Diehl C. Auditory Gaydar: Perception of Sexual Orientation Based on Female Voice. Language and Speech 2020; 63:184-206. [PMID: 30773985 DOI: 10.1177/0023830919828201] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]
Abstract
We investigated auditory gaydar (i.e., the ability to recognize sexual orientation) in female speakers, addressing three related issues: whether auditory gaydar is (1) accurate, (2) language-dependent (i.e., occurs only in some languages, but not in others), and (3) ingroup-specific (i.e., occurs only when listeners judge speakers of their own language, but not when they judge foreign language speakers). In three experiments, we asked Italian, Portuguese, and German participants (total N = 466) to listen to voices of Italian, Portuguese, and German women, and to rate their sexual orientation. Our results showed that auditory gaydar was not accurate; listeners were not able to identify speakers' sexual orientation correctly. The same pattern emerged consistently across all three languages and when listeners rated foreign-language speakers.
Affiliation(s)
- Simone Sulpizio
- Department of Psychology and Cognitive Science, University of Trento, Italy
- Faculty of Psychology, Vita-Salute San Raffaele University, Italy
- Fabio Fasoli
- Centro de Investigação e Intervenção Social, Instituto Universitário de Lisboa, Portugal
- School of Psychology, University of Surrey, Guildford, UK
- Raquel Antonio
- Centro de Investigação e Intervenção Social, Instituto Universitário de Lisboa, Portugal
- Friederike Eyssel
- Center of Excellence, Cognitive Interaction Technology, Bielefeld University, Germany
- Charlotte Diehl
- Center of Excellence, Cognitive Interaction Technology, Bielefeld University, Germany
|
26
|
Wallentin M. Gender differences in language are small but matter for disorders. Handbook of Clinical Neurology 2020; 175:81-102. [DOI: 10.1016/b978-0-444-64123-6.00007-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 02/04/2023]
|
27
|
Root-Gutteridge H, Ratcliffe VF, Korzeniowska AT, Reby D. Dogs perceive and spontaneously normalize formant-related speaker and vowel differences in human speech sounds. Biol Lett 2019; 15:20190555. [PMID: 31795850 DOI: 10.1098/rsbl.2019.0555] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022] Open
Abstract
Domesticated animals have been shown to recognize basic phonemic information from human speech sounds and to recognize familiar speakers from their voices. However, whether animals can spontaneously identify words across unfamiliar speakers (speaker normalization) or spontaneously discriminate between unfamiliar speakers across words remains to be investigated. Here, we assessed these abilities in domestic dogs using the habituation-dishabituation paradigm. We found that while dogs habituated to the presentation of a series of different short words from the same unfamiliar speaker, they significantly dishabituated to the presentation of a novel word from a new speaker of the same gender. This suggests that dogs spontaneously categorized the initial speaker across different words. Conversely, dogs who habituated to the same short word produced by different speakers of the same gender significantly dishabituated to a novel word, suggesting that they had spontaneously categorized the word across different speakers. Our results indicate that the ability to spontaneously recognize both the same phonemes across different speakers, and cues to identity across speech utterances from unfamiliar speakers, is present in domestic dogs and thus not a uniquely human trait.
Affiliation(s)
- Holly Root-Gutteridge
- Mammal Vocal Communication and Cognition Research Group, School of Psychology, University of Sussex, Brighton BN1 9QH, UK
- Anna T Korzeniowska
- Mammal Vocal Communication and Cognition Research Group, School of Psychology, University of Sussex, Brighton BN1 9QH, UK
- David Reby
- Mammal Vocal Communication and Cognition Research Group, School of Psychology, University of Sussex, Brighton BN1 9QH, UK; Equipe de Neuro-Ethologie Sensorielle ENES/CRNL, University of Lyon/Saint-Etienne, CNRS UMR5292, INSERM UMR_S 1028, Saint-Etienne, France
|
28
|
Choi JY, Perrachione TK. Time and information in perceptual adaptation to speech. Cognition 2019; 192:103982. [PMID: 31229740 PMCID: PMC6732236 DOI: 10.1016/j.cognition.2019.05.019] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 10/30/2018] [Revised: 05/11/2019] [Accepted: 05/25/2019] [Indexed: 11/18/2022]
Abstract
Perceptual adaptation to a talker enables listeners to efficiently resolve the many-to-many mapping between variable speech acoustics and abstract linguistic representations. However, models of speech perception have not delved into the variety or the quantity of information necessary for successful adaptation, nor how adaptation unfolds over time. In three experiments using speeded classification of spoken words, we explored how the quantity (duration), quality (phonetic detail), and temporal continuity of talker-specific context contribute to facilitating perceptual adaptation to speech. In single- and mixed-talker conditions, listeners identified phonetically-confusable target words in isolation or preceded by carrier phrases of varying lengths and phonetic content, spoken by the same talker as the target word. Word identification was always slower in mixed-talker conditions than single-talker ones. However, interference from talker variability decreased as the duration of preceding speech increased but was not affected by the amount of preceding talker-specific phonetic information. Furthermore, efficiency gains from adaptation depended on temporal continuity between preceding speech and the target word. These results suggest that perceptual adaptation to speech may be understood via models of auditory streaming, where perceptual continuity of an auditory object (e.g., a talker) facilitates allocation of attentional resources, resulting in more efficient perceptual processing.
Affiliation(s)
- Ja Young Choi
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, United States; Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, United States
- Tyler K Perrachione
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, United States
|
29
|
Falagiarda F, Collignon O. Time-resolved discrimination of audio-visual emotion expressions. Cortex 2019; 119:184-194. [DOI: 10.1016/j.cortex.2019.04.017] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/30/2018] [Revised: 04/05/2019] [Accepted: 04/29/2019] [Indexed: 10/26/2022]
|
30
|
Ogg M, Slevc LR. Acoustic Correlates of Auditory Object and Event Perception: Speakers, Musical Timbres, and Environmental Sounds. Front Psychol 2019; 10:1594. [PMID: 31379658 PMCID: PMC6650748 DOI: 10.3389/fpsyg.2019.01594] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/27/2019] [Accepted: 06/25/2019] [Indexed: 11/13/2022] Open
Abstract
Human listeners must identify and orient themselves to auditory objects and events in their environment. What acoustic features support a listener's ability to differentiate the great variety of natural sounds they might encounter? Studies of auditory object perception typically examine identification (and confusion) responses or dissimilarity ratings between pairs of objects and events. However, the majority of this prior work has been conducted within single categories of sound. This separation has precluded a broader understanding of the general acoustic attributes that govern auditory object and event perception within and across different behaviorally relevant sound classes. The present experiments take a broader approach by examining multiple categories of sound relative to one another. This approach bridges critical gaps in the literature and allows us to identify (and assess the relative importance of) features that are useful for distinguishing sounds within, between and across behaviorally relevant sound categories. To do this, we conducted behavioral sound identification (Experiment 1) and dissimilarity rating (Experiment 2) studies using a broad set of stimuli that leveraged the acoustic variability within and between different sound categories via a diverse set of 36 sound tokens (12 utterances from different speakers, 12 instrument timbres, and 12 everyday objects from a typical human environment). Multidimensional scaling solutions as well as analyses of item-pair-level responses as a function of different acoustic qualities were used to understand what acoustic features informed participants' responses. In addition to the spectral and temporal envelope qualities noted in previous work, listeners' dissimilarity ratings were associated with spectrotemporal variability and aperiodicity. Subsets of these features (along with fundamental frequency variability) were also useful for making specific within or between sound category judgments. 
Dissimilarity ratings largely paralleled sound identification performance; however, the results of these tasks did not completely mirror one another. In addition, musical training was related to improved sound identification performance.
Affiliation(s)
- Mattson Ogg
- Neuroscience and Cognitive Science Program, University of Maryland, College Park, College Park, MD, United States
- Department of Psychology, University of Maryland, College Park, College Park, MD, United States
- L. Robert Slevc
- Neuroscience and Cognitive Science Program, University of Maryland, College Park, College Park, MD, United States
- Department of Psychology, University of Maryland, College Park, College Park, MD, United States
|
31
|
Suire A, Raymond M, Barkat-Defradas M. Male Vocal Quality and Its Relation to Females' Preferences. Evolutionary Psychology 2019; 17:1474704919874675. [PMID: 31564128 PMCID: PMC10367192 DOI: 10.1177/1474704919874675] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/09/2019] [Accepted: 08/16/2019] [Indexed: 11/16/2022] Open
Abstract
In both correlational and experimental settings, studies on women's vocal preferences have reported negative relationships between perceived attractiveness and men's vocal pitch, emphasizing the idea of an adaptive preference. However, the research behind this consensus on vocal attractiveness has mostly been conducted with native English speakers, and some evidence suggests that it may be culture-dependent. Moreover, other overlooked acoustic components of vocal quality, such as intonation, perceived breathiness and roughness, may influence vocal attractiveness. In this context, the present study aims to contribute to the literature by investigating vocal attractiveness in an underrepresented language (i.e., French) as well as shedding light on its relationship with understudied acoustic components of vocal quality. More specifically, we investigated the relationships between attractiveness ratings as assessed by female raters and male voice pitch, its variation, the formants' dispersion and position, and the harmonics-to-noise and jitter ratios. Results show that women were significantly more attracted to lower vocal pitch and higher intonation patterns. However, they did not show any directional preferences for any of the other acoustic features. We discuss our results in light of the adaptive functions of vocal preferences in a mate choice context.
Affiliation(s)
- Alexandre Suire
- ISEM, University Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Michel Raymond
- ISEM, University Montpellier, CNRS, EPHE, IRD, Montpellier, France
|
32
|
Sliwa J, Takahashi D, Shepherd S. Mécanismes neuronaux pour la communication chez les primates. Revue de Primatologie 2018. [DOI: 10.4000/primatologie.2950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
|
33
|
Lavan N, Domone A, Fisher B, Kenigzstein N, Scott SK, McGettigan C. Speaker Sex Perception from Spontaneous and Volitional Nonverbal Vocalizations. Journal of Nonverbal Behavior 2018; 43:1-22. [PMID: 31148883 PMCID: PMC6514200 DOI: 10.1007/s10919-018-0289-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
Abstract
In two experiments, we explore how speaker sex recognition is affected by vocal flexibility, introduced by volitional and spontaneous vocalizations. In Experiment 1, participants judged speaker sex from two spontaneous vocalizations, laughter and crying, and volitionally produced vowels. Striking effects of speaker sex emerged: For male vocalizations, listeners' performance was significantly impaired for spontaneous vocalizations (laughter and crying) compared to a volitional baseline (repeated vowels), a pattern that was also reflected in longer reaction times for spontaneous vocalizations. Further, performance was less accurate for laughter than crying. For female vocalizations, a different pattern emerged. In Experiment 2, we largely replicated the findings of Experiment 1 using spontaneous laughter, volitional laughter and (volitional) vowels: here, performance for male vocalizations was impaired for spontaneous laughter compared to both volitional laughter and vowels, providing further evidence that differences in volitional control over vocal production may modulate our ability to accurately perceive speaker sex from vocal signals. For both experiments, acoustic analyses showed relationships between stimulus fundamental frequency (F0) and the participants' responses. The higher the F0 of a vocal signal, the more likely listeners were to perceive a vocalization as being produced by a female speaker, an effect that was more pronounced for vocalizations produced by males. We discuss the results in terms of the availability of salient acoustic cues across different vocalizations.
Affiliation(s)
- Nadine Lavan
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX UK
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Abigail Domone
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX UK
- Betty Fisher
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX UK
- Noa Kenigzstein
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX UK
- Carolyn McGettigan
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX UK
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
|
34
|
Abstract
Amati and Stradivari violins are highly appreciated by musicians and collectors, but the objective understanding of their acoustic qualities is still lacking. By applying speech analysis techniques, we found early Italian violins to emulate the vocal tract resonances of male singers, comparable to basses or baritones. Stradivari pushed these resonance peaks higher to resemble the shorter vocal tract lengths of tenors or altos. Stradivari violins also exhibit vowel qualities that correspond to lower tongue height and backness. These properties may explain the characteristic brilliance of Stradivari violins. The ideal for violin tone in the Baroque era was to imitate the human voice, and we found that Cremonese violins are capable of producing the formant features of human singers. The shape and design of the modern violin are largely influenced by two makers from Cremona, Italy: The instrument was invented by Andrea Amati and then improved by Antonio Stradivari. Although the construction methods of Amati and Stradivari have been carefully examined, the underlying acoustic qualities which contribute to their popularity are little understood. According to Geminiani, a Baroque violinist, the ideal violin tone should “rival the most perfect human voice.” To investigate whether Amati and Stradivari violins produce voice-like features, we recorded the scales of 15 antique Italian violins as well as male and female singers. The frequency response curves are similar between the Andrea Amati violin and human singers, up to ∼4.2 kHz. By linear predictive coding analyses, the first two formants of the Amati exhibit vowel-like qualities (F1/F2 = 503/1,583 Hz), mapping to the central region on the vowel diagram. Its third and fourth formants (F3/F4 = 2,602/3,731 Hz) resemble those produced by male singers. 
Using F1 to F4 values to estimate the corresponding vocal tract length, we observed that antique Italian violins generally resemble basses/baritones, but Stradivari violins are closer to tenors/altos. Furthermore, the vowel qualities of Stradivari violins show reduced backness and height. The unique formant properties displayed by Stradivari violins may represent the acoustic correlate of their distinctive brilliance perceived by musicians. Our data demonstrate that the pioneering designs of Cremonese violins exhibit voice-like qualities in their acoustic output.
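The abstract above mentions estimating a vocal tract length from F1 to F4. As a rough illustration only, the sketch below uses the textbook uniform-tube (quarter-wavelength resonator) approximation, F_n = (2n - 1)c / (4L). This is an assumption and not necessarily the estimator used in the study; the function name `estimate_vtl_cm` and the speed of sound of 35,000 cm/s are illustrative choices.

```python
from statistics import mean

# Assumed speed of sound in warm, humid air (cm/s); not taken from the source.
SPEED_OF_SOUND_CM_S = 35_000.0


def estimate_vtl_cm(formants_hz, c=SPEED_OF_SOUND_CM_S):
    """Average the tube length implied by each formant.

    Assumes a uniform tube closed at one end, where the n-th resonance is
    F_n = (2n - 1) * c / (4 * L), so each formant gives
    L = (2n - 1) * c / (4 * F_n).
    """
    if not formants_hz:
        raise ValueError("at least one formant is required")
    lengths = [
        (2 * n - 1) * c / (4.0 * f)
        for n, f in enumerate(formants_hz, start=1)
    ]
    return mean(lengths)


# Formant values reported in the abstract for the Andrea Amati violin (Hz).
amati_formants = [503.0, 1583.0, 2602.0, 3731.0]
amati_vtl = estimate_vtl_cm(amati_formants)
print(f"Estimated equivalent tract length: {amati_vtl:.1f} cm")
```

Under these assumptions the four Amati formants imply an equivalent tube length of about 16.8 cm. Averaging across formants is one simple design choice; weighting the higher formants more heavily is a common alternative because they are less affected by vowel articulation.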
|
35
|
Abstract
Infants and adults learn new phonological varieties better when exposed to multiple rather than a single speaker. This article tests whether having a larger social network similarly facilitates phonological performance. Experiment 1 shows that people with larger social networks are better at vowel perception in noise, indicating that the benefit of laboratory exposure to multiple speakers extends to real life experience and to adults tested in their native language. Furthermore, the experiment shows that this association is not due to differences in amount of input or to cognitive differences between people with different social network sizes. Follow-up computational simulations reveal that the benefit of larger social networks is mostly due to increased input variability. Additionally, the simulations show that the boost that larger social networks provide is independent of the amount of input received but is larger if the population is more heterogeneous. Finally, a comparison of "adult" and "child" simulations reconciles previous conflicting findings by suggesting that input variability along the relevant dimension might be less useful at the earliest stages of learning. Together, this article shows when and how the size of our social network influences our speech perception. It thus shows how aspects of our lifestyle can influence our linguistic performance.
Affiliation(s)
- Shiri Lev-Ari
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Royal Holloway, University of London, Egham, UK
|
36
|
Rilliard A, d'Alessandro C, Evrard M. Paradigmatic variation of vowels in expressive speech: Acoustic description and dimensional analysis. The Journal of the Acoustical Society of America 2018; 143:109. [PMID: 29390730 DOI: 10.1121/1.5018433] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
Acoustic variation in expressive speech at the syllable level is studied. As emotions or attitudes can be conveyed by short spoken words, analysis of paradigmatic variations in vowels is an important issue to characterize the expressive content of such speech segments. The corpus contains 160 sentences produced under seven expressive conditions (Neutral, Anger, Fear, Surprise, Sensuality, Joy, Sadness) acted by a French female speaker (a total of 1120 sentences, 13 140 vowels). Eleven base acoustic parameters are selected for voice source and vocal tract related feature analysis. An acoustic description of the expressions is drawn, using the dimensions of melodic range, intensity, noise, spectral tilt, vocalic space, and dynamic features. The first three functions of a discriminant analysis explain 95% of the variance in the data. These statistical dimensions are consistently associated with acoustic dimensions. Covariation of intensity and F0 explains over 80% of the variance, followed by noise features (8%), covariation of spectral tilt, and F0 (7%). On the basis of isolated vowels alone, expressions are classified with a mean accuracy of 78%.
Affiliation(s)
- Albert Rilliard
- Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, Centre National de la Recherche Scientifique, Université Paris-Saclay, F-91405 Orsay, France
- Christophe d'Alessandro
- Sorbonne Universités, Université Pierre-et-Marie-Curie, University Paris 06, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 7190, Institut Jean Le Rond d'Alembert, 4 Place Jussieu, F-75005 Paris, France
- Marc Evrard
- Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, Centre National de la Recherche Scientifique, Université Paris Sud, Université Paris-Saclay, F-91405 Orsay, France
|
37
|
Furuyama T, Kobayasi KI, Riquimaroux H. Acoustic characteristics used by Japanese macaques for individual discrimination. J Exp Biol 2017; 220:3571-3578. [PMID: 28778999 PMCID: PMC5665434 DOI: 10.1242/jeb.154765] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 12/22/2016] [Accepted: 07/31/2017] [Indexed: 11/20/2022]
Abstract
The vocalizations of primates contain information about speaker individuality. Many primates, including humans, are able to distinguish conspecifics based solely on vocalizations. The purpose of this study was to investigate the acoustic characteristics used by Japanese macaques in individual vocal discrimination. Furthermore, we tested human subjects using monkey vocalizations to evaluate species specificity with respect to such discriminations. Two monkeys and five humans were trained to discriminate the coo calls of two unfamiliar monkeys. We created a stimulus continuum between the vocalizations of the two monkeys as a set of probe stimuli (whole morph). We also created two sets of continua in which only one acoustic parameter, fundamental frequency (f0) or vocal tract characteristic (VTC), was changed from the coo call of one monkey to that of another while the other acoustic feature remained the same (f0 morph and VTC morph, respectively). According to the results, the reaction times both of monkeys and humans were correlated with the morph proportion under the whole morph and f0 morph conditions. The reaction time to the VTC morph was correlated with the morph proportion in both monkeys, whereas the reaction time in humans, on average, was not correlated with morph proportion. Japanese monkeys relied more consistently on VTC than did humans for discriminating monkey vocalizations. Our results support the idea that the auditory system of primates is specialized for processing conspecific vocalizations and suggest that VTC is a significant acoustic feature used by Japanese macaques to discriminate conspecific vocalizations.
Affiliation(s)
- Takafumi Furuyama
- Graduate School of Life and Medical Sciences, Doshisha University, Kyoto, Japan
- Kohta I Kobayasi
- Graduate School of Life and Medical Sciences, Doshisha University, Kyoto, Japan
- Hiroshi Riquimaroux
- Graduate School of Life and Medical Sciences, Doshisha University, Kyoto, Japan
|
38
|
Facial biases on vocal perception and memory. Acta Psychol (Amst) 2017; 177:54-68. [PMID: 28477455 DOI: 10.1016/j.actpsy.2017.04.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/19/2016] [Revised: 04/17/2017] [Accepted: 04/27/2017] [Indexed: 11/22/2022] Open
Abstract
Does a speaker's face influence the way their voice is heard and later remembered? This question was addressed through two experiments where in each, participants listened to middle-aged voices accompanied by faces that were either age-appropriate, younger or older than the voice or, as a control, no face at all. In Experiment 1, participants evaluated each voice on various acoustical dimensions and speaker characteristics. The results showed that facial displays influenced perception such that the same voice was heard differently depending on the age of the accompanying face. Experiment 2 further revealed that facial displays led to memory distortions that were age-congruent in nature. These findings illustrate that faces can activate certain social categories and preconceived stereotypes that then influence vocal and person perception in a corresponding fashion. Processes of face/voice integration are very similar to those of music/film, indicating that the two areas can mutually inform one another and perhaps, more generally, reflect a centralized mechanism of cross-sensory integration.
|
39
|
Šebesta P, Kleisner K, Tureček P, Kočnar T, Akoko RM, Třebický V, Havlíček J. Voices of Africa: acoustic predictors of human male vocal attractiveness. Anim Behav 2017. [DOI: 10.1016/j.anbehav.2017.03.014] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 11/29/2022]
|
40
|
Bang HY, Clayards M, Goad H. Compensatory Strategies in the Developmental Patterns of English /s/: Gender and Vowel Context Effects. Journal of Speech, Language, and Hearing Research: JSLHR 2017; 60:571-591. [PMID: 28241209 DOI: 10.1044/2016_jslhr-l-15-0381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2015] [Accepted: 07/11/2016] [Indexed: 06/06/2023]
Abstract
PURPOSE The developmental trajectory of English /s/ was investigated to determine the extent to which children's speech productions are acoustically fine-grained. Given the hypothesis that young children have adultlike phonetic knowledge of /s/, the following were examined: (a) whether this knowledge manifests itself in acoustic spectra that match the gender-specific patterns of adults, (b) whether vowel context affects the spectra of /s/ in adults and children similarly, and (c) whether children adopt compensatory production strategies to match adult acoustic targets. METHOD Several acoustic variables were measured from word-initial /s/ (and /t/) and the following vowel in the productions of children aged 2 to 5 years and adult controls using 2 sets of corpora from the Paidologos database. RESULTS Gender-specific patterns in the spectral distribution of /s/ were found. Acoustically, more canonical /s/ was produced before vowels with higher F1 (i.e., lower vowels) in children, a context where lingual articulation is challenging. Measures of breathiness and vowel intrinsic F0 provide evidence that children use a compensatory aerodynamic mechanism to achieve their acoustic targets in articulatorily challenging contexts. CONCLUSION Together, these results provide evidence that children's phonetic knowledge is acoustically detailed and gender specified and that speech production goals are acoustically oriented at early stages of speech development.
Affiliation(s)
- Hye-Young Bang
- Department of Linguistics, McGill University, Montreal, QC, Canada
- Meghan Clayards
- Department of Linguistics, McGill University, Montreal, QC, Canada
- School of Communication Sciences and Disorders, McGill University, Montreal, QC, Canada
- Heather Goad
- Department of Linguistics, McGill University, Montreal, QC, Canada
|
41
|
Bent T, Holt RF. Representation of speech variability. Wiley Interdisciplinary Reviews. Cognitive Science 2017; 8. [DOI: 10.1002/wcs.1434] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/12/2016] [Revised: 10/20/2016] [Accepted: 11/27/2016] [Indexed: 11/07/2022]
Affiliation(s)
- Tessa Bent
- Department of Speech and Hearing Sciences, Indiana University, Bloomington, IN, USA
- Rachael F. Holt
- Department of Speech and Hearing Science, Ohio State University, Columbus, OH, USA
|
42
|
Abstract
For both humans and other animals, the ability to combine information obtained through different senses is fundamental to the perception of the environment. It is well established that humans form systematic cross-modal correspondences between stimulus features that can facilitate the accurate combination of sensory percepts. However, the evolutionary origins of the perceptual and cognitive mechanisms involved in these cross-modal associations remain surprisingly underexplored. In this review we outline recent comparative studies investigating how non-human mammals naturally combine information encoded in different sensory modalities during communication. The results of these behavioural studies demonstrate that various mammalian species are able to combine signals from different sensory channels when they are perceived to share the same basic features, either because they can be redundantly sensed and/or because they are processed in the same way. Moreover, evidence that a wide range of mammals form complex cognitive representations about signallers, both within and across species, suggests that animals also learn to associate different sensory features which regularly co-occur. Further research is now necessary to determine how multisensory representations are formed in individual animals, including the relative importance of low level feature-related correspondences. Such investigations will generate important insights into how animals perceive and categorise their environment, as well as provide an essential basis for understanding the evolution of multisensory perception in humans.
|
43
|
Schochat E, Rocha-Muniz CN, Filippini R. Understanding Auditory Processing Disorder Through the FFR. The Frequency-Following Response 2017. [DOI: 10.1007/978-3-319-47944-6_9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/30/2022]
|
44
|
Role of vocal tract characteristics in individual discrimination by Japanese macaques (Macaca fuscata). Sci Rep 2016; 6:32042. [PMID: 27550840 PMCID: PMC4994087 DOI: 10.1038/srep32042] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/21/2016] [Accepted: 08/01/2016] [Indexed: 11/08/2022] Open
Abstract
The Japanese macaque (Macaca fuscata) exhibits a species-specific communication sound called the “coo call” to locate group members and maintain within-group contact. Monkeys have been demonstrated to be capable of discriminating between individuals based only on their voices, but there is still debate regarding how the fundamental frequencies (F0) and filter properties of the vocal tract characteristics (VTC) contribute to individual discrimination in nonhuman primates. This study was performed to investigate the acoustic keys used by Japanese macaques in individual discrimination. Two animals were trained with standard Go/NoGo operant conditioning to distinguish the coo calls of two unfamiliar monkeys. The subjects were required to continue depressing a lever until the stimulus changed from one monkey to the other. The test stimuli were synthesized by combining the F0s and VTC from each individual. Both subjects released the lever when the VTC changed, whereas they did not when the F0 changed. The reaction times to the test stimuli were not significantly different from that to the training stimuli that shared the same VTC. Our data suggest that vocal tract characteristics are important for the identification of individuals by Japanese macaques.
|
45
|
Kriengwatana B, Terry J, Chládková K, Escudero P. Speaker and Accent Variation Are Handled Differently: Evidence in Native and Non-Native Listeners. PLoS One 2016; 11:e0156870. [PMID: 27309889 PMCID: PMC4911083 DOI: 10.1371/journal.pone.0156870] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/23/2015] [Accepted: 05/20/2016] [Indexed: 11/22/2022] Open
Abstract
Listeners are able to cope with between-speaker variability in speech that stems from anatomical sources (i.e. individual and sex differences in vocal tract size) and sociolinguistic sources (i.e. accents). We hypothesized that listeners adapt to these two types of variation differently because prior work indicates that adapting to speaker/sex variability may occur pre-lexically while adapting to accent variability may require learning from attention to explicit cues (i.e. feedback). In Experiment 1, we tested our hypothesis by training native Dutch listeners and Australian-English (AusE) listeners without any experience with Dutch or Flemish to discriminate between the Dutch vowels /I/ and /ε/ from a single speaker. We then tested their ability to classify /I/ and /ε/ vowels of a novel Dutch speaker (i.e. speaker or sex change only), or vowels of a novel Flemish speaker (i.e. speaker or sex change plus accent change). We found that both Dutch and AusE listeners could successfully categorize vowels if the change involved a speaker/sex change, but not if the change involved an accent change. When AusE listeners were given feedback on their categorization responses to the novel speaker in Experiment 2, they were able to successfully categorize vowels involving an accent change. These results suggest that adapting to accents may be a two-step process, whereby the first step involves adapting to speaker differences at a pre-lexical level, and the second step involves adapting to accent differences at a contextual level, where listeners have access to word meaning or are given feedback that allows them to appropriately adjust their perceptual category boundaries.
Affiliation(s)
- Buddhamas Kriengwatana
- Institute for Biology Leiden, Leiden University, Leiden, the Netherlands
- Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands
- Josephine Terry
- The MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Sydney, Australia
- ARC Centre of Excellence for the Dynamics of Language, Australian National University, Canberra, Australia
- Kateřina Chládková
- Amsterdam Center for Language and Communication, Phonetic Sciences, University of Amsterdam, Amsterdam, the Netherlands
- Institute of Psychology, University of Leipzig, Leipzig, Germany
- Paola Escudero
- The MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Sydney, Australia
- ARC Centre of Excellence for the Dynamics of Language, Australian National University, Canberra, Australia
|
46
|
Stoeger AS, Baotic A. Information content and acoustic structure of male African elephant social rumbles. Sci Rep 2016; 6:27585. [PMID: 27273586 PMCID: PMC4897791 DOI: 10.1038/srep27585] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 01/12/2016] [Accepted: 05/17/2016] [Indexed: 11/14/2022] Open
Abstract
Until recently, the prevailing theory about male African elephants (Loxodonta africana) was that, once adult and sexually mature, males are solitary and targeted only at finding estrous females. While this is true during the state of 'musth' (a condition characterized by aggressive behavior and elevated androgen levels), 'non-musth' males exhibit a social system seemingly based on companionship, dominance and established hierarchies. Research on elephant vocal communication has so far focused on females, and very little is known about the acoustic structure and the information content of male vocalizations. Using the source and filter theory approach, we analyzed social rumbles of 10 male African elephants. Our results reveal that male rumbles encode information about individuality and maturity (age and size), with formant frequencies and absolute fundamental frequency values having the most informative power. This first comprehensive study on male elephant vocalizations gives important indications on their potential functional relevance for male-male and male-female communication. Our results suggest that, similar to the highly social females, future research on male elephant vocal behavior will reveal a complex communication system in which social knowledge, companionship, hierarchy, reproductive competition and the need to communicate over long distances play key roles.
Affiliation(s)
- Angela S. Stoeger
- Mammal Communication Lab, Department of Cognitive Biology, University of Vienna, Vienna, 1090, Austria
- Anton Baotic
- Mammal Communication Lab, Department of Cognitive Biology, University of Vienna, Vienna, 1090, Austria
|
47
|
Meister H, Fürsen K, Streicher B, Lang-Roth R, Walger M. The Use of Voice Cues for Speaker Gender Recognition in Cochlear Implant Recipients. Journal of Speech, Language, and Hearing Research: JSLHR 2016; 59:546-556. [PMID: 27135985 DOI: 10.1044/2015_jslhr-h-15-0128] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 04/08/2015] [Accepted: 09/23/2015] [Indexed: 06/05/2023]
Abstract
PURPOSE The focus of this study was to examine the influence of fundamental frequency (F0) and vocal tract length (VTL) modifications on speaker gender recognition in cochlear implant (CI) recipients for different stimulus types. METHOD Single words and sentences were manipulated using isolated or combined F0 and VTL cues. Using an 11-point rating scale, CI recipients and listeners with normal hearing rated the maleness/femaleness of the corresponding voice. RESULTS Speaker gender ratings for combined F0 and VTL modifications were similar across all stimulus types in both CI recipients and listeners with normal hearing, although the CI recipients showed a somewhat larger ambiguity. In contrast to listeners with normal hearing, F0-VTL and F0-only modifications revealed similar ratings in the CI recipients when using words as stimuli. However, when sentences were used, a difference was found between F0-VTL-based and F0-based ratings. Modifying VTL cues alone did not affect ratings in the CI group. CONCLUSIONS Whereas speaker gender ratings by listeners with normal hearing relied on combined VTL and F0 cues, CI recipients made only limited use of VTL cues, which might be one reason behind problems with identifying the speaker on the basis of voice. However, use of the voice cues depended on stimulus type, with the greater information in sentences allowing a more detailed analysis than single words in both listener groups.
|
48
|
McCordic JA, Root-Gutteridge H, Cusano DA, Denes SL, Parks SE. Calls of North Atlantic right whales Eubalaena glacialis contain information on individual identity and age class. Endanger Species Res 2016. [DOI: 10.3354/esr00735] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 11/23/2022] Open
|
49
|
Reby D, Wyman MT, Frey R, Passilongo D, Gilbert J, Locatelli Y, Charlton BD. Evidence of biphonation and source–filter interactions in the bugles of male North American wapiti (Cervus canadensis). J Exp Biol 2016; 219:1224-36. [DOI: 10.1242/jeb.131219] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 09/16/2015] [Accepted: 02/12/2016] [Indexed: 11/20/2022]
Abstract
With an average male body mass of 320 kg, the wapiti, Cervus canadensis, is the largest extant species of Old World deer (Cervinae). Despite this large body size, male wapiti produce whistle-like sexual calls called bugles characterised by an extremely high fundamental frequency. Investigations of the biometry and physiology of the male wapiti's relatively large larynx have so far failed to account for the production of such a high fundamental frequency. Our examination of spectrograms of male bugles suggested that the complex harmonic structure is best explained by a dual-source model (biphonation), with one source oscillating at a mean of 145 Hz (F0) and the other oscillating independently at an average of 1426 Hz (G0). A combination of anatomical investigations and acoustical modelling indicated that the F0 of male bugles is consistent with the vocal fold dimensions reported in this species, whereas the secondary, much higher source at G0 is more consistent with an aerodynamic whistle produced as air flows rapidly through a narrow supraglottic constriction. We also report a possible interaction between the higher frequency G0 and vocal tract resonances, as G0 transiently locks onto individual formants as the vocal tract is extended. We speculate that male wapiti have evolved such a dual-source phonation to advertise body size at close range (with a relatively low-frequency F0 providing a dense spectrum to highlight size-related information contained in formants) while simultaneously advertising their presence over greater distances using the very high-amplitude G0 whistle component.
Affiliation(s)
- D. Reby
- Mammal Vocal Communication and Cognition Research Group, School of Psychology, University of Sussex, Brighton BN1 9QH, UK
- M. T. Wyman
- Mammal Vocal Communication and Cognition Research Group, School of Psychology, University of Sussex, Brighton BN1 9QH, UK
- R. Frey
- Leibniz Institute for Zoo and Wildlife Research (IZW), Berlin 10315, Germany
- D. Passilongo
- Department of Science for Nature and Environmental Resources, University of Sassari, Sassari 07100, Italy
- J. Gilbert
- Laboratoire d'Acoustique de l'Université du Maine – UMR CNRS, le Mans 72085, France
- Y. Locatelli
- Réserve de la Haute Touche, Muséum National d'Histoire Naturelle, Obterre 36290, France
- B. D. Charlton
- School of Biology and Environmental Science, Science Centre West, University College Dublin (UCD), Belfield, Dublin 4, Ireland
|
50
|
Patel R, Threats TT. One's Voice: A Central Component of Personal Factors in Augmentative and Alternative Communication. Perspectives of the ASHA Special Interest Groups 2016. [DOI: 10.1044/persp1.sig12.94] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]
Abstract
Augmentative and alternative communication (AAC) devices have opened the gates to interaction for those with severe communication impairments. In the assessment and intervention, all components of the World Health Organization's International Classification of Functioning, Disability, and Health (ICF) should be addressed. However, an important Personal Factor to full integration has been largely ignored—that of one's voice. Each one of us has a unique voice that conveys our age, cultural background and personality—it's how people know and remember you. These affordances of the natural voice are not available to those who express themselves using AAC devices. A personalized digital voice brings the field of AAC to a closer realization of the social model of disability in which individuals are not defined by their disability and it is just one aspect of them. Access to a personalized voice uplifts the AAC user and provides an opportunity for social and emotional engagement that enhances quality of life.
Affiliation(s)
- Rupal Patel
- Departments of Communication Sciences and Disorders and Computer and Information Science, Northeastern University, Boston, MA
- VocaliD Inc., Belmont, MA
- Travis T. Threats
- Department of Communication Sciences and Disorders, Saint Louis University, St. Louis, MO
|