1. Mehrabi A, Dixon S, Sandler M. Vocal imitation of percussion sounds: On the perceptual similarity between imitations and imitated sounds. PLoS One 2019; 14:e0219955. PMID: 31344080; PMCID: PMC6657857; DOI: 10.1371/journal.pone.0219955
Abstract
Recent studies have demonstrated the effectiveness of the voice for communicating sonic ideas, and the accuracy with which it can be used to imitate acoustic instruments, synthesised sounds and environmental sounds. However, there has been little research on vocal imitation of percussion sounds, particularly concerning the perceptual similarity between imitations and the sounds being imitated. In the present study we address this by investigating how accurately musicians can vocally imitate percussion sounds, in terms of whether listeners consider the imitations 'more similar' to the imitated sounds than to other same-category sounds. In a vocal production task, 14 musicians imitated 30 drum sounds from five categories (cymbals, hats, kicks, snares, toms). Listeners were then asked to rate the similarity between the imitations and same-category drum sounds via a web-based listening test. We found that imitated sounds received the highest similarity ratings for 16 of the 30 sounds. The similarity between a given drum sound and its imitation was generally rated higher than the similarity to imitations of other same-category sounds; however, for some drum categories (snares and toms) certain sounds were consistently considered most similar to the imitations, irrespective of the sound being imitated. Finally, we apply an existing auditory-image-based measure of perceptual similarity between same-category drum sounds to model the similarity ratings using linear mixed-effects regression. The results indicate that this measure is a better predictor of perceptual similarity between imitations and imitated sounds than acoustic features capturing only temporal or spectral information.
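The modelling step described above can be sketched in a few lines. Below is a minimal, hypothetical example of a linear mixed-effects regression predicting similarity ratings from a single acoustic distance predictor using statsmodels; the file and column names (similarity_ratings.csv, rating, distance, listener) are assumptions for illustration, not the study's materials.

```python
# Minimal sketch: predict listener similarity ratings from an acoustic
# distance between each imitation and drum sound. The CSV layout and
# column names (rating, distance, listener) are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

ratings = pd.read_csv("similarity_ratings.csv")  # one row per rating trial

# A random intercept per listener absorbs individual rating biases; the
# fixed effect of `distance` tests whether the acoustic measure predicts
# perceived similarity.
model = smf.mixedlm("rating ~ distance", data=ratings, groups=ratings["listener"])
result = model.fit()
print(result.summary())
```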
Affiliations
- Adib Mehrabi: Department of Linguistics and School of Electronic Engineering and Computer Science, Queen Mary University of London, London, England
- Simon Dixon: School of Electronic Engineering and Computer Science, Queen Mary University of London, London, England
- Mark Sandler: School of Electronic Engineering and Computer Science, Queen Mary University of London, London, England
2. Mehrabi A, Dixon S, Sandler MB. Vocal imitation of synthesised sounds varying in pitch, loudness and spectral centroid. J Acoust Soc Am 2017; 141:783. PMID: 28253682; DOI: 10.1121/1.4974825
Abstract
Vocal imitations are often used to convey sonic ideas [Lemaitre, Dessein, Susini, and Aura (2011). Ecol. Psych. 23(4), 267-307]. For computer-based systems to interpret these vocalisations, it is advantageous to know how people vocalise sounds whose acoustic features have different temporal envelopes. In the present study, 19 experienced musicians and music producers were asked to imitate 44 sounds with one or two feature envelopes applied. The study addresses two main questions: (1) How accurately can people imitate ramp and modulation envelopes for pitch, loudness, and spectral centroid? (2) What happens to this accuracy when people are asked to imitate two feature envelopes simultaneously? The results show that experienced musicians can imitate pitch, loudness, and spectral centroid accurately, and that imitation accuracy is generally preserved when the imitated stimuli combine two features that are not necessarily congruent. This demonstrates the viability of using the voice as a natural means of expressing time series of two features simultaneously.
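As an illustration of the feature envelopes involved, here is a rough sketch (not the authors' pipeline) that extracts pitch, loudness, and spectral-centroid envelopes with librosa and compares a stimulus against an imitation via Pearson correlation; file names and parameter values are placeholders.

```python
# Sketch: extract pitch, loudness, and spectral centroid envelopes from a
# stimulus and an imitation, then compare them frame by frame.
import librosa
import numpy as np
from scipy.stats import pearsonr

def envelopes(path, sr=16000, hop=256):
    y, sr = librosa.load(path, sr=sr)
    f0, _, _ = librosa.pyin(y, fmin=65.0, fmax=1000.0, sr=sr, hop_length=hop)
    loud = librosa.feature.rms(y=y, hop_length=hop)[0]
    cent = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop)[0]
    return f0, loud, cent

stim = envelopes("stimulus.wav")    # placeholder file names
imit = envelopes("imitation.wav")

for name, s, i in zip(("pitch", "loudness", "centroid"), stim, imit):
    n = min(len(s), len(i))                       # crude length matching
    mask = ~(np.isnan(s[:n]) | np.isnan(i[:n]))   # pyin yields NaN when unvoiced
    r, _ = pearsonr(s[:n][mask], i[:n][mask])
    print(f"{name}: r = {r:.2f}")
```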
Affiliations
- Adib Mehrabi: Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom
- Simon Dixon: Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom
- Mark B Sandler: Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom
3. Lemaitre G, Houix O, Voisin F, Misdariis N, Susini P. Vocal Imitations of Non-Vocal Sounds. PLoS One 2016; 11:e0168167. PMID: 27992480; PMCID: PMC5161510; DOI: 10.1371/journal.pone.0168167
Abstract
Imitative behaviors are widespread in humans, in particular whenever two persons communicate and interact. Several tokens of spoken languages (onomatopoeias, ideophones, and phonesthemes) also display different degrees of iconicity between the sound of a word and what it refers to. Thus, it probably comes as no surprise that human speakers use many imitative vocalizations and gestures when they communicate about sounds, as sounds are notably difficult to describe. What is more surprising is that vocal imitations of non-vocal everyday sounds (e.g. the sound of a car passing by) are in practice very effective: listeners identify sounds better from vocal imitations than from verbal descriptions, despite the fact that vocal imitations are inaccurate reproductions of a sound created by a particular mechanical system (e.g. a car driving by) through a different system (the vocal apparatus). The present study investigated the semantic representations evoked by vocal imitations of sounds by experimentally quantifying how well listeners could match sounds to category labels. The experiment used three different types of sounds: recordings of easily identifiable sounds (sounds of human actions and manufactured products), human vocal imitations, and computational "auditory sketches" (created by algorithmic computations). The results show that performance with the best vocal imitations was similar to that with the best auditory sketches for most categories of sounds, and in some cases even to that with the referent sounds themselves. More detailed analyses showed that the acoustic distance between a vocal imitation and a referent sound is not sufficient to account for such performance. The analyses suggested that instead of trying to reproduce the referent sound as accurately as vocally possible, vocal imitations focus on a few important features, which depend on each particular sound category. These results offer perspectives for understanding how human listeners store and access long-term sound representations, and set the stage for the development of human-computer interfaces based on vocalizations.
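One simple way to compute the kind of acoustic distance examined here is to align MFCC sequences with dynamic time warping. The sketch below is an assumption about a reasonable implementation, not the measure used in the study; file names are placeholders.

```python
# Sketch: a crude "acoustic distance" between a vocal imitation and a
# referent sound, as MFCC sequences aligned with dynamic time warping.
import librosa

def acoustic_distance(path_a, path_b, sr=16000):
    a, _ = librosa.load(path_a, sr=sr)
    b, _ = librosa.load(path_b, sr=sr)
    mfcc_a = librosa.feature.mfcc(y=a, sr=sr, n_mfcc=13)
    mfcc_b = librosa.feature.mfcc(y=b, sr=sr, n_mfcc=13)
    # DTW aligns the two sequences; the accumulated cost at the end of the
    # optimal path serves as a rough dissimilarity score.
    D, wp = librosa.sequence.dtw(X=mfcc_a, Y=mfcc_b, metric="euclidean")
    return D[-1, -1] / len(wp)  # normalise by path length

print(acoustic_distance("imitation.wav", "referent.wav"))
```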
Affiliations
- Guillaume Lemaitre: Equipe Perception et Design Sonores, STMS-IRCAM-CNRS-UPMC, Institut de Recherche et de Coordination Acoustique Musique, Paris, France
- Olivier Houix: Equipe Perception et Design Sonores, STMS-IRCAM-CNRS-UPMC, Institut de Recherche et de Coordination Acoustique Musique, Paris, France
- Frédéric Voisin: Equipe Perception et Design Sonores, STMS-IRCAM-CNRS-UPMC, Institut de Recherche et de Coordination Acoustique Musique, Paris, France
- Nicolas Misdariis: Equipe Perception et Design Sonores, STMS-IRCAM-CNRS-UPMC, Institut de Recherche et de Coordination Acoustique Musique, Paris, France
- Patrick Susini: Equipe Perception et Design Sonores, STMS-IRCAM-CNRS-UPMC, Institut de Recherche et de Coordination Acoustique Musique, Paris, France
4. Lemaitre G, Jabbari A, Misdariis N, Houix O, Susini P. Vocal imitations of basic auditory features. J Acoust Soc Am 2016; 139:290-300. PMID: 26827025; DOI: 10.1121/1.4939738
Abstract
Describing complex sounds with words is a difficult task. In fact, previous studies have shown that vocal imitations of sounds are more effective than verbal descriptions [Lemaitre and Rocchesso (2014). J. Acoust. Soc. Am. 135, 862-873]. The current study investigated how vocal imitations of sounds enable their recognition by studying how two expert and two lay participants reproduced four basic auditory features: pitch, tempo, sharpness, and onset. It used four sets of sixteen referent sounds (modulated narrowband noises and pure tones), each set based on one feature or crossing two of the four features. Dissimilarity rating experiments and multidimensional scaling analyses confirmed that listeners could accurately perceive the four features composing the four sets of referent sounds. The four participants recorded vocal imitations of the four sets of sounds. Analyses identified three strategies: (1) vocal imitations of pitch and tempo faithfully reproduced the absolute value of the feature; (2) vocal imitations of sharpness transposed the feature into the participants' registers; (3) vocal imitations of onsets categorized the continuum of onset values into two discrete morphological profiles. Overall, these results highlight that vocal imitations do not simply mimic the referent sounds, but seek to emphasize the characteristic features of the referent sounds within the constraints of human vocal production.
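The dissimilarity-rating-plus-MDS analysis can be illustrated with scikit-learn. In this hypothetical sketch, a small precomputed dissimilarity matrix stands in for averaged listener judgements.

```python
# Sketch: recover a perceptual space from pairwise dissimilarity ratings
# with metric MDS. The matrix below holds toy values, not study data.
import numpy as np
from sklearn.manifold import MDS

# Symmetric dissimilarity matrix for 4 sounds (toy values in [0, 1]).
dissim = np.array([
    [0.0, 0.2, 0.7, 0.8],
    [0.2, 0.0, 0.6, 0.7],
    [0.7, 0.6, 0.0, 0.3],
    [0.8, 0.7, 0.3, 0.0],
])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)  # one 2-D point per sound
print(coords)
```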
Affiliations
- Guillaume Lemaitre: STMS-IRCAM-CNRS-UPMC, Equipe Perception et Design Sonores, Paris, France
- Ali Jabbari: STMS-IRCAM-CNRS-UPMC, Equipe Perception et Design Sonores, Paris, France
- Nicolas Misdariis: STMS-IRCAM-CNRS-UPMC, Equipe Perception et Design Sonores, Paris, France
- Olivier Houix: STMS-IRCAM-CNRS-UPMC, Equipe Perception et Design Sonores, Paris, France
- Patrick Susini: STMS-IRCAM-CNRS-UPMC, Equipe Perception et Design Sonores, Paris, France
5. Bezat MC, Kronland-Martinet R, Roussarie V, Ystad S. From acoustic descriptors to evoked quality of car door sounds. J Acoust Soc Am 2014; 136:226-241. PMID: 24993209; DOI: 10.1121/1.4883364
Abstract
This article describes the first part of a study aiming to adapt mechanical car door construction to drivers' expectations of perceived car quality, as conveyed by car door sounds. A perceptual cartography of car door sounds was obtained from various listening tests designed to reveal both ecological and analytical properties linked to evoked car quality. In the first test, naive listeners performed absolute evaluations of five ecological properties (solidity, quality, weight, closure energy, and success of closure). Then experts in the area of automobile doors categorized the sounds according to organic constituents (lock, joints, door panel), in particular whether or not the lock mechanism could be perceived. Further, a sensory panel of naive listeners identified sensory descriptors, such as classical descriptors or onomatopoeia, that characterize the sounds, thereby providing an analytic description of the sounds. Finally, acoustic descriptors were calculated after decomposition of the signal into a lock and a closure component by the Empirical Mode Decomposition (EMD) method. A statistical relationship between the acoustic descriptors and the perceptual evaluations of the car door sounds could then be obtained through linear regression analysis.
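The analysis chain (EMD decomposition, acoustic descriptors, linear regression) might look roughly like the following sketch. It assumes the PyEMD package (published as EMD-signal on PyPI); the file names, the RMS-per-IMF descriptor, and the ratings are placeholders, not the study's data or descriptors.

```python
# Sketch: decompose each car door recording with Empirical Mode
# Decomposition, derive a crude descriptor per intrinsic mode function,
# and regress perceptual ratings on the descriptors.
import numpy as np
import librosa
from PyEMD import EMD
from sklearn.linear_model import LinearRegression

def emd_descriptors(path, n_imfs=3):
    """RMS energy of the first few IMFs (a signal may yield fewer IMFs)."""
    y, _ = librosa.load(path, sr=None)
    imfs = EMD().emd(y)
    return [float(np.sqrt(np.mean(imf**2))) for imf in imfs[:n_imfs]]

# Placeholder recordings and ratings (e.g. perceived solidity, 1-10 scale).
files = ["door_a.wav", "door_b.wav", "door_c.wav", "door_d.wav", "door_e.wav"]
X = np.array([emd_descriptors(f) for f in files])
ratings = np.array([6.1, 5.4, 7.0, 4.8, 6.5])

model = LinearRegression().fit(X, ratings)
print(model.coef_, model.score(X, ratings))
```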
Affiliations
- Vincent Roussarie: PSA Peugeot-Citroen, 2 route de Gisy, 78943 Velizy-Villacoublay, France
- Sølvi Ystad: LMA, CNRS, UPR 7051, Aix-Marseille Univ, Centrale Marseille, F-13402 Marseille Cedex 20, France
6. Lemaitre G, Rocchesso D. On the effectiveness of vocal imitations and verbal descriptions of sounds. J Acoust Soc Am 2014; 135:862-873. PMID: 25234894; DOI: 10.1121/1.4861245
Abstract
Describing unidentified sounds with words is a frustrating task, and vocally imitating them is often a convenient way to address the issue. This article reports on a study that compared the effectiveness of vocal imitations and verbalizations in communicating different referent sounds. The stimuli included mechanical and synthesized sounds and were selected on the basis of participants' confidence in identifying the cause of the sounds, ranging from easy-to-identify to unidentifiable sounds. The study used a selection of vocal imitations and verbalizations deemed adequate descriptions of the referent sounds. These descriptions were used in a nine-alternative forced-choice experiment: participants listened to a description and picked one sound from a list of nine possible referent sounds. Results showed that recognition based on verbalizations was maximally effective when the referent sounds were identifiable; recognition accuracy with verbalizations dropped as identifiability of the sounds decreased. Conversely, recognition accuracy with vocal imitations did not depend on the identifiability of the referent sounds and was as high as with the best verbalizations. This shows that vocal imitations are an effective means of representing and communicating sounds and suggests that they could be used in a number of applications.
Affiliations
- Guillaume Lemaitre: Dipartimento di Culture del progetto, Università Iuav di Venezia, Dorsoduro 2206, 30123 Venezia, Italy
- Davide Rocchesso: Dipartimento di Culture del progetto, Università Iuav di Venezia, Dorsoduro 2206, 30123 Venezia, Italy
7. Grassi M, Pastore M, Lemaitre G. Looking at the world with your ears: How do we get the size of an object from its sound? Acta Psychol (Amst) 2013; 143:96-104. PMID: 23542810; DOI: 10.1016/j.actpsy.2013.02.005
Abstract
Identifying the properties of ongoing events by the sounds they produce is crucial for our interaction with the environment when visual information is not available. Here, we investigated the ability of listeners to estimate the size of an object (a ball) dropped on a plate under ecological listening conditions (balls were dropped in real time) and with an ecological response method (listeners estimated ball size by drawing a disk). Previous studies had shown that listeners can veridically estimate the size of objects from the sounds they produce, but it remains unclear which acoustical index listeners use to produce their estimates. In particular, it is unclear whether listeners rely on amplitude cues (related to loudness) or frequency cues (related to the sound's brightness). In the current study, in order to understand which cue listeners use to recover the size of the object, we manipulated the sound source event in such a way that frequency and amplitude cues provided contrasting size information (balls were dropped from various heights). Results showed that listeners' estimates were accurate regardless of the experimental manipulations. Moreover, the results suggest that listeners were likely integrating frequency and amplitude cues to produce their estimates, even though these cues often provided contrasting size information.
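The two candidate cues can be extracted from a recording in a few lines. This hypothetical sketch uses librosa; the file name is a placeholder, and the exact cue definitions (peak RMS level, spectral centroid at the impact frame) are assumptions for illustration.

```python
# Sketch: extract the amplitude and frequency cues discussed above from
# an impact sound recording. Larger balls tend to produce louder
# (amplitude cue) and darker, lower-centroid (frequency cue) impacts.
import librosa

y, sr = librosa.load("ball_drop.wav", sr=None)  # placeholder file name

# Amplitude cue: peak short-time RMS level, related to loudness.
rms = librosa.feature.rms(y=y)[0]
amplitude_cue = float(rms.max())

# Frequency cue: spectral centroid at the impact frame, related to brightness.
cent = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
frequency_cue = float(cent[rms.argmax()])

print(f"amplitude cue (peak RMS): {amplitude_cue:.4f}")
print(f"frequency cue (centroid at impact): {frequency_cue:.1f} Hz")
```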
Affiliations
- Massimo Grassi: Dipartimento di Psicologia Generale, Università di Padova, Via Venezia 8, 35131 Padova, Italy
8. Evidence for a basic level in a taxonomy of everyday action sounds. Exp Brain Res 2013; 226:253-264. PMID: 23411674; DOI: 10.1007/s00221-013-3430-7
Abstract
We searched for evidence that the auditory organization of categories of sounds produced by actions includes a privileged or "basic" level of description. The sound events consisted of single objects (or substances) undergoing simple actions. Performance on sound events was measured in two ways: sounds were directly verified as belonging to a category, or sounds were used to create lexical priming. The category verification experiment measured the accuracy of, and reaction time to, brief excerpts of these sounds. The lexical priming experiment measured reaction time benefits and costs caused by the presentation of these sounds prior to a lexical decision. The level of description of a sound varied in how specifically it described the physical properties of the action producing the sound. Both identification and priming effects were superior when a label described the specific interaction causing the sound (e.g. trickling), in comparison to (1) more general descriptions (e.g. pour, liquid; trickling is a specific manner of pouring liquid) and (2) more detailed descriptions using adverbs to specify the manner of the action (e.g. trickling evenly). These results are consistent with neuroimaging studies showing that auditory representations of sounds produced by actions familiar to the listener activate motor representations of the gestures involved in sound production.