1
Wheatley T, Thornton MA, Stolk A, Chang LJ. The Emerging Science of Interacting Minds. Perspect Psychol Sci 2024; 19:355-373. [PMID: 38096443] [PMCID: PMC10932833] [DOI: 10.1177/17456916231200177]
Abstract
For over a century, psychology has focused on uncovering mental processes of a single individual. However, humans rarely navigate the world in isolation. The most important determinants of successful development, mental health, and our individual traits and preferences arise from interacting with other individuals. Social interaction underpins who we are, how we think, and how we behave. Here we discuss the key methodological challenges that have limited progress in establishing a robust science of how minds interact and the new tools that are beginning to overcome these challenges. A deep understanding of the human mind requires studying the context within which it originates and exists: social interaction.
Affiliation(s)
- Thalia Wheatley: Consortium for Interacting Minds, Psychological and Brain Sciences, Dartmouth, Hanover, NH, USA; Santa Fe Institute
- Mark A. Thornton: Consortium for Interacting Minds, Psychological and Brain Sciences, Dartmouth, Hanover, NH, USA
- Arjen Stolk: Consortium for Interacting Minds, Psychological and Brain Sciences, Dartmouth, Hanover, NH, USA
- Luke J. Chang: Consortium for Interacting Minds, Psychological and Brain Sciences, Dartmouth, Hanover, NH, USA
2
Arias Sarah P, Hall L, Saitovitch A, Aucouturier JJ, Zilbovicius M, Johansson P. Pupil dilation reflects the dynamic integration of audiovisual emotional speech. Sci Rep 2023; 13:5507. [PMID: 37016041] [PMCID: PMC10073148] [DOI: 10.1038/s41598-023-32133-2]
Abstract
Emotional speech perception is a multisensory process. When speaking with someone, we concurrently integrate information from their voice and face to decode, e.g., their feelings, moods, and emotions. However, the physiological reactions associated with these processes, such as the reflexive dilation of the pupil, remain mostly unknown. The aim of the current article is to investigate whether pupillary reactions can index the processes underlying the audiovisual integration of emotional signals. To investigate this question, we used an algorithm able to increase or decrease the smiles seen in a person's face or heard in their voice, while preserving the temporal synchrony between visual and auditory channels. Using this algorithm, we created congruent and incongruent audiovisual smiles and investigated participants' gaze and pupillary reactions to the manipulated stimuli. We found that pupil reactions can reflect emotional information mismatch in audiovisual speech. In our data, when participants were explicitly asked to extract emotional information from the stimuli, the first fixation within emotionally mismatching areas (i.e., the mouth) triggered pupil dilation. These results reveal that pupil dilation can reflect the dynamic integration of audiovisual emotional speech and provide insights into how these reactions are triggered during stimulus perception.
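For readers who want to prototype this kind of analysis, the standard first step is to epoch and baseline-correct the pupil trace around an event of interest (here, the first fixation on the mouth). A minimal Python sketch; the sampling rate and window lengths are illustrative defaults, not values taken from the paper:

```python
import numpy as np

def event_locked_pupil(trace, event_idx, fs=500, baseline_s=0.2, window_s=2.0):
    """Baseline-correct pupil-size epochs around event samples.

    trace: 1-D array of pupil size over time.
    event_idx: sample indices of events (e.g., first fixation on the mouth).
    fs: sampling rate in Hz (illustrative; eye trackers vary).
    """
    n_base = int(baseline_s * fs)
    n_win = int(window_s * fs)
    epochs = []
    for i in event_idx:
        if i - n_base < 0 or i + n_win > len(trace):
            continue  # skip events too close to the recording edges
        epoch = trace[i - n_base : i + n_win]
        baseline = epoch[:n_base].mean()
        epochs.append(epoch - baseline)  # subtractive baseline correction
    return np.mean(epochs, axis=0)       # grand-average event-locked response
```

The event-locked average returned here is what one would compare between congruent and incongruent trials.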
Affiliation(s)
- Pablo Arias Sarah: Lund University Cognitive Science, Lund University, Lund, Sweden; STMS Lab, UMR 9912 (IRCAM/CNRS/SU), Paris, France; School of Neuroscience and Psychology, Glasgow University, Glasgow, UK
- Lars Hall: STMS Lab, UMR 9912 (IRCAM/CNRS/SU), Paris, France
- Ana Saitovitch: U1000 Brain Imaging in Psychiatry, INSERM-CEA, Pediatric Radiology Service, Necker Enfants Malades Hospital, Paris V René Descartes University, Paris, France
- Jean-Julien Aucouturier: Department of Robotics and Automation, FEMTO-ST Institute (CNRS/Université de Bourgogne Franche-Comté), Besançon, France
- Monica Zilbovicius: U1000 Brain Imaging in Psychiatry, INSERM-CEA, Pediatric Radiology Service, Necker Enfants Malades Hospital, Paris V René Descartes University, Paris, France
3
Anikin A. The honest sound of physical effort. PeerJ 2023; 11:e14944. [PMID: 37033726] [PMCID: PMC10078454] [DOI: 10.7717/peerj.14944]
Abstract
Acoustic correlates of physical effort are still poorly understood, even though effort is vocally communicated in a variety of contexts with crucial fitness consequences, including both confrontational and reproductive social interactions. In this study, 33 lay participants spoke during a brief but intense isometric hold (L-sit), first without any voice-related instructions, and then were asked either to conceal their effort or to imitate it without actually performing the exercise. Listeners in two perceptual experiments then rated 383 recordings on perceived level of effort (n = 39 listeners) or categorized them as relaxed speech, actual effort, pretended effort, or concealed effort (n = 102 listeners). As expected, vocal effort increased compared to baseline, but the accompanying acoustic changes (increased loudness, pitch, and tense voice quality) were under voluntary control, so that they could be largely suppressed or imitated at will. In contrast, vocal tremor at approximately 10 Hz was most pronounced under actual load, and its experimental addition to relaxed baseline recordings created the impression of concealed effort. In sum, a brief episode of intense physical effort causes pronounced vocal changes, some of which are difficult to control. Listeners can thus estimate the true level of exertion, whether to judge the condition of their opponent in a fight or to monitor a partner's investment in cooperative physical activities.
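A common way to quantify low-frequency vocal tremor of the kind reported here is to extract the amplitude envelope of a recording and inspect its modulation spectrum around 10 Hz. A minimal sketch, assuming an 8-12 Hz band of interest; this is a generic approach, not the author's exact pipeline:

```python
import numpy as np
from scipy.signal import hilbert
from scipy.io import wavfile

def modulation_peak(wav_path, band=(8.0, 12.0)):
    """Locate the strongest amplitude-modulation component near 10 Hz."""
    fs, x = wavfile.read(wav_path)
    x = x.astype(float)
    if x.ndim > 1:
        x = x.mean(axis=1)                 # mix down to mono
    envelope = np.abs(hilbert(x))          # amplitude envelope via the analytic signal
    envelope -= envelope.mean()            # remove the DC component before the FFT
    spectrum = np.abs(np.fft.rfft(envelope))
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[mask][np.argmax(spectrum[mask])], float(spectrum[mask].max())
```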
4
Nakai T, Rachman L, Arias Sarah P, Okanoya K, Aucouturier JJ. Algorithmic voice transformations reveal the phonological basis of language-familiarity effects in cross-cultural emotion judgments. PLoS One 2023; 18:e0285028. [PMID: 37134091] [PMCID: PMC10156011] [DOI: 10.1371/journal.pone.0285028]
Abstract
People have a well-described advantage in identifying individuals and emotions in their own culture, a phenomenon also known as the other-race and language-familiarity effect. However, it is unclear whether native-language advantages arise from genuinely enhanced capacities to extract relevant cues in familiar speech or, more simply, from cultural differences in emotional expressions. Here, to rule out production differences, we used algorithmic voice transformations to create French and Japanese stimulus pairs that differed by exactly the same acoustic characteristics. In two cross-cultural experiments, participants performed better in their native language when categorizing vocal emotional cues and detecting non-emotional pitch changes. This advantage persisted across three types of stimulus degradation (jabberwocky, shuffled, and reversed sentences), which disturbed semantics, syntax, and supra-segmental patterns, respectively. These results provide evidence that production differences are not the sole drivers of the language-familiarity effect in cross-cultural emotion perception. Listeners' unfamiliarity with the phonology of another language, rather than with its syntax or semantics, impairs the detection of pitch prosodic cues and, in turn, the recognition of expressive prosody.
Affiliation(s)
- Tomoya Nakai: Lyon Neuroscience Research Center (CRNL; INSERM/CNRS/University of Lyon), Bron, France; Center for Information and Neural Networks, National Institute of Information and Communications Technology, Suita, Japan
- Laura Rachman: Department of Otorhinolaryngology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Pablo Arias Sarah: Lund University Cognitive Science, Lund University, Lund, Sweden; Sciences et Technologies de la Musique et du Son (IRCAM/CNRS/Sorbonne Université), Paris, France; School of Psychology & Neuroscience, University of Glasgow, Glasgow, United Kingdom
- Kazuo Okanoya: The University of Tokyo, Graduate School of Arts and Sciences, Tokyo, Japan; Advanced Comprehensive Research Organization, Teikyo University, Tokyo, Japan
- Jean-Julien Aucouturier: Sciences et Technologies de la Musique et du Son (IRCAM/CNRS/Sorbonne Université), Paris, France; FEMTO-ST Institute (CNRS/Université de Bourgogne Franche-Comté), Besançon, France
5
Guerouaou N, Vaiva G, Aucouturier JJ. The shallow of your smile: the ethics of expressive vocal deep-fakes. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210083. [PMID: 34775820] [PMCID: PMC8591385] [DOI: 10.1098/rstb.2021.0083]
Abstract
Rapid technological advances in artificial intelligence are creating opportunities for real-time algorithmic modulation of a person's facial and vocal expressions, or 'deep-fakes'. These developments raise unprecedented societal and ethical questions which, despite much recent public awareness, are still poorly understood from the point of view of moral psychology. We report here on an experimental ethics study conducted on a sample of N = 303 participants (predominantly young, Western, and educated), who evaluated the acceptability of vignettes describing potential applications of expressive voice-transformation technology. We found that vocal deep-fakes were generally well accepted in this population, notably in a therapeutic context and for emotions judged otherwise difficult to control, and, surprisingly, even when the user lies to their interlocutors about using them. Unlike for other emerging technologies such as autonomous vehicles, there was no evidence of a social dilemma in which one would, for example, accept for others what they resent for themselves. The only real obstacle to the large-scale deployment of vocal deep-fakes appears to be situations where they are applied to a speaker without their knowledge, but even the acceptability of such situations was modulated by individual differences in moral values and attitudes towards science fiction. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part II)'.
Affiliation(s)
- Nadia Guerouaou: Science and Technology of Music and Sound, IRCAM/CNRS/Sorbonne Université, Paris, France; Lille Neuroscience and Cognition Center (LiNC), Team PSY, INSERM U-1172/CHRU Lille, France
- Guillaume Vaiva: Lille Neuroscience and Cognition Center (LiNC), Team PSY, INSERM U-1172/CHRU Lille, France
6
Bedoya D, Arias P, Rachman L, Liuni M, Canonne C, Goupil L, Aucouturier JJ. Even violins can cry: specifically vocal emotional behaviours also drive the perception of emotions in non-vocal music. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200396. [PMID: 34719254] [PMCID: PMC8558776] [DOI: 10.1098/rstb.2020.0396]
Abstract
A wealth of theoretical and empirical arguments suggest that music triggers emotional responses by resembling the inflections of expressive vocalizations, but they have done so using low-level acoustic parameters (pitch, loudness, speed) that may not, in fact, be processed by the listener in reference to the human voice. Here, we take advantage of recently available computational models that allow the simulation of three specifically vocal emotional behaviours: smiling, vocal tremor, and vocal roughness. When applied to musical material, these three acoustic manipulations trigger emotional perceptions that are remarkably similar to those observed for speech and scream sounds, and that are identical across musician and non-musician listeners. Strikingly, this holds not only for singing voice with and without musical background but also for purely instrumental material. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part I)'.
Affiliation(s)
- D Bedoya: Science and Technology of Music and Sound, IRCAM/CNRS/Sorbonne Université, Paris, France
- P Arias: Science and Technology of Music and Sound, IRCAM/CNRS/Sorbonne Université, Paris, France; Department of Cognitive Science, Lund University, Lund, Sweden
- L Rachman: Faculty of Medical Sciences, University of Groningen, Groningen, The Netherlands
- M Liuni: Alta Voce SAS, Houilles, France
- C Canonne: Science and Technology of Music and Sound, IRCAM/CNRS/Sorbonne Université, Paris, France
- L Goupil: BabyDevLab, University of East London, London, UK
- J-J Aucouturier: FEMTO-ST Institute, Université de Bourgogne Franche-Comté/CNRS, Besançon, France
7
Jeganathan J, Breakspear M. An active inference perspective on the negative symptoms of schizophrenia. Lancet Psychiatry 2021; 8:732-738. [PMID: 33865502] [DOI: 10.1016/s2215-0366(20)30527-7]
Abstract
Predictive coding has played a transformative role in the study of psychosis, casting delusions and hallucinations as statistical inference in a system with abnormal precision. However, the negative symptoms of schizophrenia, such as affective blunting, avolition, and asociality, remain poorly understood. We propose a computational framework for emotional expression based on active inference: namely, that affective behaviours such as smiling are driven by predictions about the social consequences of smiling. Just as delusions and hallucinations can be explained by predictive uncertainty in sensory circuits, negative symptoms naturally arise from uncertainty in social prediction circuits. This perspective draws on computational principles to explain blunted facial expressiveness and apathy-anhedonia in schizophrenia. Its phenomenological consequences also shed light on the content of paranoid delusions and the indistinctness of self-other boundaries. Close links are highlighted between social prediction, facial affect mirroring, and the fledgling study of interoception. Advances in automated analysis of facial expressions and acoustic speech patterns will allow empirical testing of these computational models of the negative symptoms of schizophrenia.
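The computational claim rests on precision-weighted prediction errors: the weight given to new evidence scales with its precision relative to the prior. A toy sketch of that single update, with made-up numbers; this is a generic illustration of precision weighting, not a reproduction of the authors' model:

```python
def precision_weighted_update(prior_mean, prior_precision, obs, obs_precision):
    """One Bayesian belief update: the gain on the prediction error
    grows with the relative precision of the observation."""
    gain = obs_precision / (obs_precision + prior_precision)
    return prior_mean + gain * (obs - prior_mean)

# Abnormally low precision on social evidence shrinks the gain, so social
# outcomes barely update expectations (a gloss on avolition/asociality):
print(precision_weighted_update(0.0, prior_precision=1.0, obs=1.0, obs_precision=4.0))  # 0.8
print(precision_weighted_update(0.0, prior_precision=1.0, obs=1.0, obs_precision=0.1))  # ~0.09
```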
Affiliation(s)
- Jayson Jeganathan: School of Psychology, College of Engineering, Science, and the Environment, The University of Newcastle, Newcastle, NSW, Australia; Hunter Medical Research Institute, Newcastle, NSW, Australia
- Michael Breakspear: School of Psychology, College of Engineering, Science, and the Environment, The University of Newcastle, Newcastle, NSW, Australia; School of Medicine and Public Health, College of Health and Medicine, The University of Newcastle, Newcastle, NSW, Australia; Hunter Medical Research Institute, Newcastle, NSW, Australia
8
Hosaka T, Kimura M, Yotsumoto Y. Neural representations of own-voice in the human auditory cortex. Sci Rep 2021; 11:591. [PMID: 33436798] [PMCID: PMC7804419] [DOI: 10.1038/s41598-020-80095-6]
Abstract
We have a keen sensitivity when it comes to the perception of our own voices. We can detect not only the differences between ourselves and others, but also slight modifications of our own voices. Here, we examined the neural correlates underlying such sensitive perception of one's own voice. In the experiments, we modified the subjects' own voices by using five types of filters. The subjects rated the similarity of the presented voices to their own. We compared BOLD (blood-oxygen-level-dependent) signals between the voices that subjects rated as least similar to their own voice and those they rated as most similar. The contrast revealed that the bilateral superior temporal gyrus exhibited greater activation while listening to the voice least similar to their own and weaker activation while listening to the voice most similar to their own. Our results suggest that the superior temporal gyrus is involved in neural sharpening for one's own voice. The weaker activation observed for voices similar to one's own indicates that these areas respond not only to the differences between self and other, but also to the finer details of one's own voice.
Affiliation(s)
- Taishi Hosaka: Department of Life Sciences, The University of Tokyo, Tokyo, Japan
- Marino Kimura: Department of Life Sciences, The University of Tokyo, Tokyo, Japan
- Yuko Yotsumoto: Department of Life Sciences, The University of Tokyo, Tokyo, Japan
9
Goupil L, Johansson P, Hall L, Aucouturier JJ. Vocal signals only impact speakers' own emotions when they are self-attributed. Conscious Cogn 2021; 88:103072. [PMID: 33406449] [DOI: 10.1016/j.concog.2020.103072]
Abstract
Emotions are often accompanied by vocalizations whose acoustic features provide information about the physiological state of the speaker. Here, we ask whether perceiving these affective signals in one's own voice has an impact on one's own emotional state, and whether it is necessary to identify these signals as self-originated for the emotional effect to occur. Participants had to deliberate out loud about how they would feel in various familiar emotional scenarios, while we covertly manipulated their voices in order to make them sound happy or sad. Perceiving the artificial affective signals in their own voice altered participants' judgements about how they would feel in these situations. Crucially, this effect disappeared when participants detected the vocal manipulation, either explicitly or implicitly. The original valence of the scenarios also modulated the vocal feedback effect. These results highlight the role of the exteroception of self-attributed affective signals in the emergence of emotional feelings.
Affiliation(s)
- Louise Goupil: STMS UMR 9912 (CNRS/IRCAM/SU), Paris, France; University of East London, London, UK
- Petter Johansson: Lund University Cognitive Science, Lund University, Lund, Sweden
- Lars Hall: Lund University Cognitive Science, Lund University, Lund, Sweden
10
Abstract
Researchers examining the nonverbal communication of emotion are becoming increasingly interested in differentiating between positive emotional states such as interest, relief, and pride. But despite the importance of the voice in communicating emotion in general, and positive emotion in particular, there is to date no systematic review of what characterizes vocal expressions of different positive emotions. Furthermore, integration and synthesis of current findings are lacking. In this review, we comprehensively review studies (N = 108) investigating acoustic features relating to specific positive emotions in speech prosody and nonverbal vocalizations. We find that happy voices are generally loud with considerable variability in loudness, have high and variable pitch, and are high in the first two formant frequencies. When specific positive emotions are directly compared with each other, pitch mean, loudness mean, and speech rate differ across positive emotions, with patterns mapping onto clusters of emotions, so-called emotion families. For instance, pitch is higher for epistemological emotions (amusement, interest, relief), moderate for savouring emotions (contentment and pleasure), and lower for a prosocial emotion (admiration). Some, but not all, of the differences in acoustic patterns also map onto differences in arousal levels. We end by pointing to limitations in extant work and making concrete proposals for future research on positive emotions in the voice.
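The acoustic features this review aggregates (pitch mean and variability, loudness mean) can be approximated from a recording with standard tools. A minimal sketch using librosa; the pitch search range and the frame-wise RMS loudness proxy are illustrative choices, not the settings of the reviewed studies:

```python
import numpy as np
import librosa

def basic_prosody_features(wav_path):
    """Rough per-recording estimates of pitch and loudness statistics."""
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)  # speech-range F0
    f0 = f0[~np.isnan(f0)]                                          # keep voiced frames only
    rms = librosa.feature.rms(y=y)[0]                               # frame-wise loudness proxy
    return {
        "pitch_mean_hz": float(np.mean(f0)),
        "pitch_sd_hz": float(np.std(f0)),
        "loudness_mean": float(np.mean(rms)),
        "loudness_sd": float(np.std(rms)),
    }
```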
11
Arias P, Rachman L, Liuni M, Aucouturier JJ. Beyond Correlation: Acoustic Transformation Methods for the Experimental Study of Emotional Voice and Speech. Emotion Review 2020. [DOI: 10.1177/1754073920934544]
Abstract
While acoustic analysis methods have become commonplace in voice emotion research, experiments that attempt not only to describe but to computationally manipulate expressive cues in emotional voice and speech have remained relatively rare. We give here a nontechnical overview of voice-transformation techniques from the audio signal-processing community that we believe are ripe for adoption in this context. We provide sound examples of what they can achieve, examples of experimental questions for which they can be used, and links to open-source implementations. We point out a number of methodological properties of these algorithms, such as being specific, parametric, exhaustive, and real-time, and describe the new possibilities that these open up for the experimental study of the emotional voice.
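As a generic illustration of the "specific, parametric" property discussed here, a pitch shift can be applied identically to any recording with off-the-shelf tools. This sketch uses librosa and is not one of the open-source implementations the article links to; the +0.5 semitone value is an arbitrary example of a subtle manipulation:

```python
import librosa
import soundfile as sf

def shift_pitch(in_path, out_path, semitones=0.5):
    """Apply a small, parametric pitch shift while leaving duration intact."""
    y, sr = librosa.load(in_path, sr=None)
    y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=semitones)
    sf.write(out_path, y_shifted, sr)
```

Because the same parameter value can be applied to every stimulus, such transformations support the within-stimulus experimental contrasts the article advocates.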
Affiliation(s)
- Pablo Arias: STMS UMR 9912, IRCAM/CNRS/Sorbonne Université, France
- Laura Rachman: STMS UMR 9912, IRCAM/CNRS/Sorbonne Université, France
- Marco Liuni: STMS UMR 9912, IRCAM/CNRS/Sorbonne Université, France
12
Rachman L, Dubal S, Aucouturier JJ. Happy you, happy me: expressive changes on a stranger's voice recruit faster implicit processes than self-produced expressions. Soc Cogn Affect Neurosci 2020; 14:559-568. [PMID: 31044241] [PMCID: PMC6545538] [DOI: 10.1093/scan/nsz030]
Abstract
In social interactions, people have to pay attention both to the 'what' and the 'who'. In particular, expressive changes heard in speech signals have to be integrated with speaker identity, differentiating, e.g., self- and other-produced signals. While previous research has shown that the processing of self-related visual information is facilitated compared to non-self stimuli, evidence in the auditory modality remains mixed. Here, we compared electroencephalography (EEG) responses to expressive changes in sequences of self- or other-produced speech sounds using a mismatch negativity (MMN) passive oddball paradigm. Critically, to control for speaker differences, we used programmable acoustic transformations to create voice deviants that differed from standards in exactly the same manner, making EEG responses to such deviations comparable between sequences. Our results indicate that expressive changes on a stranger's voice are highly prioritized in auditory processing compared to identical changes on the self-voice. Other-voice deviants generated earlier MMN onset responses and involved stronger cortical activations in a left motor and somatosensory network, suggestive of an increased recruitment of resources for less internally predictable, and therefore perhaps more socially relevant, signals.
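In such oddball designs, the MMN is typically quantified as the deviant-minus-standard difference wave. A minimal sketch with MNE-Python, assuming epochs have already been constructed and labeled "standard" and "deviant"; the event names and the analysis window are illustrative, not taken from the paper:

```python
import mne

def mmn_difference_wave(epochs: mne.Epochs) -> mne.Evoked:
    """Compute the deviant-minus-standard difference wave (the MMN)."""
    evoked_std = epochs["standard"].average()
    evoked_dev = epochs["deviant"].average()
    # weights=[1, -1] subtracts the standard ERP from the deviant ERP
    return mne.combine_evoked([evoked_dev, evoked_std], weights=[1, -1])

# Typical follow-up: inspect the fronto-central negativity, e.g.,
# mmn_difference_wave(epochs).copy().crop(tmin=0.1, tmax=0.25).plot()
```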
Affiliation(s)
- Laura Rachman: Social and Affective Neuroscience Lab, Institut du Cerveau et de la Moelle épinière (Inserm/CNRS/Sorbonne Université), Paris, France; Science & Technology of Music and Sound, UMR (CNRS/IRCAM/Sorbonne Université), Paris, France
- Stéphanie Dubal: Social and Affective Neuroscience Lab, Institut du Cerveau et de la Moelle épinière (Inserm/CNRS/Sorbonne Université), Paris, France
- Jean-Julien Aucouturier: Science & Technology of Music and Sound, UMR (CNRS/IRCAM/Sorbonne Université), Paris, France
13
Abstract
Voice synthesis is a useful method for investigating the communicative role of different acoustic features. Although many text-to-speech systems are available, researchers of human nonverbal vocalizations and bioacousticians may profit from a dedicated, simple tool for synthesizing and manipulating natural-sounding vocalizations. Soundgen (https://CRAN.R-project.org/package=soundgen) is an open-source R package that synthesizes nonverbal vocalizations based on meaningful acoustic parameters, which can be specified from the command line or in an interactive app. This tool was validated by comparing the perceived emotion, valence, arousal, and authenticity of 60 recorded human nonverbal vocalizations (screams, moans, laughs, and so on) and their approximate synthetic reproductions. Each synthetic sound was created by manually specifying only a small number of high-level control parameters, such as syllable length and a few anchors for the intonation contour. Nevertheless, the valence and arousal ratings of synthetic sounds were similar to those of the original recordings, and authenticity ratings remained on par with the originals for less complex vocalizations. Manipulating the precise acoustic characteristics of synthetic sounds may shed light on the salient predictors of emotion in the human voice. More generally, soundgen may prove useful for any studies that require precise control over the acoustic features of nonspeech sounds, including research on animal vocalizations and auditory perception.
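Soundgen itself is an R package; as a language-neutral illustration of its core idea, driving synthesis from a few high-level anchors such as syllable length and an intonation contour, here is a toy Python sketch that renders a harmonic "vocalization" from interpolated pitch anchors. This is not soundgen's API, and all parameter values are arbitrary examples:

```python
import numpy as np

def toy_vocalization(pitch_anchors=(300, 250, 180), syl_len_ms=300, fs=44100):
    """Render a harmonic tone whose F0 follows interpolated pitch anchors,
    mimicking anchor-based parametric control."""
    n = int(fs * syl_len_ms / 1000)
    anchor_pos = np.linspace(0, 1, len(pitch_anchors))
    f0 = np.interp(np.linspace(0, 1, n), anchor_pos, pitch_anchors)  # intonation contour
    phase = 2 * np.pi * np.cumsum(f0) / fs                # integrate F0 into phase
    sound = sum(np.sin(k * phase) / k for k in (1, 2, 3))  # three harmonics, 1/k rolloff
    envelope = np.hanning(n)                               # smooth onset and offset
    return (sound * envelope).astype(np.float32)
```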
14
Abstract
People perceive their recorded voice differently from their actively spoken voice. The uncanny valley theory proposes that as an object approaches humanlike characteristics, there is an increase in the sense of familiarity; eventually, however, a point is reached where the object becomes strangely similar and makes us feel uneasy. The feeling of discomfort experienced when people hear their recorded voice may correspond to the floor of the proposed uncanny valley. To overcome the eeriness of own-voice recordings, previous studies have suggested equalizing the recorded voice with various types of filters, such as step, bandpass, and low-pass, yet the effectiveness of these filters has not been evaluated. To address this, the aim of experiment 1 was to identify what type of voice recording is most representative of one's own voice. The voice recordings were presented in five conditions: unadjusted recorded voice, step-filtered voice, bandpass-filtered voice, low-pass-filtered voice, and a voice for which the participants freely adjusted the parameters. We found large individual differences in the most representative own-voice filter. To consider the role of the sense of agency, experiment 2 investigated whether lip-synching would influence own-voice ratings. The results suggested that lip-synching did not affect own-voice ratings. In experiment 3, based on the assumption that the voices used in the previous experiments formed a continuum from non-own voice to own voice, the existence of an uncanny valley was examined. Familiarity, eeriness, and the sense of own voice were rated. The results did not support the existence of an uncanny valley. Taken together, the experiments led us to the following conclusions: there is no general filter that can represent own voice for everyone, the sense of agency has no effect on own-voice ratings, and the uncanny valley does not exist for own voice specifically.
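Filter conditions like those described here can be prototyped with a standard zero-phase Butterworth design. A minimal sketch; the cutoff frequency and filter order below are illustrative values, not ones reported in the study:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_voice(x, fs, cutoff_hz=1000.0, order=4):
    """Zero-phase low-pass filtering of a voice recording.

    cutoff_hz is illustrative; studies differ on how to approximate
    the bone-conducted component of one's own spoken voice.
    """
    b, a = butter(order, cutoff_hz / (fs / 2), btype="low")  # normalized cutoff
    return filtfilt(b, a, np.asarray(x, dtype=float))        # forward-backward filter
```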
15
Brain mechanisms involved in angry prosody change detection in school-age children and adults, revealed by electrophysiology. Cogn Affect Behav Neurosci 2018; 18:748-763. [DOI: 10.3758/s13415-018-0602-8]