1. Belyk M, Carignan C, McGettigan C. An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images. Behav Res Methods 2024; 56:2623-2635. [PMID: 37507650 PMCID: PMC10990993 DOI: 10.3758/s13428-023-02171-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/14/2023] [Indexed: 07/30/2023]
Abstract
Real-time magnetic resonance imaging (rtMRI) is a technique that provides high-contrast videographic data of human anatomy in motion. Applied to the vocal tract, it is a powerful method for capturing the dynamics of speech and other vocal behaviours by imaging structures internal to the mouth and throat. These images provide a means of studying the physiological basis for speech, singing, expressions of emotion, and swallowing that are otherwise not accessible for external observation. However, taking quantitative measurements from these images is notoriously difficult. We introduce a signal processing pipeline that produces outlines of the vocal tract from the lips to the larynx as a quantification of the dynamic morphology of the vocal tract. Our approach performs simple tissue classification, but constrained to a researcher-specified region of interest. This combination facilitates feature extraction while retaining the domain-specific expertise of a human analyst. We demonstrate that this pipeline generalises well across datasets covering behaviours such as speech, vocal size exaggeration, laughter, and whistling, as well as producing reliable outcomes across analysts, particularly among users with domain-specific expertise. With this article, we make this pipeline available for immediate use by the research community, and further suggest that it may contribute to the continued development of fully automated methods based on deep learning algorithms.
Affiliation(s)
- Michel Belyk
  - Department of Psychology, Edge Hill University, Ormskirk, UK.
- Christopher Carignan
  - Department of Speech Hearing and Phonetic Sciences, University College London, London, UK
- Carolyn McGettigan
  - Department of Speech Hearing and Phonetic Sciences, University College London, London, UK
2. Pouw W, Harrison SJ, Dixon JA. The importance of visual control and biomechanics in the regulation of gesture-speech synchrony for an individual deprived of proprioceptive feedback of body position. Sci Rep 2022; 12:14775. [PMID: 36042321 PMCID: PMC9428168 DOI: 10.1038/s41598-022-18300-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2021] [Accepted: 08/09/2022] [Indexed: 11/17/2022]
Abstract
Do communicative actions such as gestures fundamentally differ in their control mechanisms from other actions? Evidence for such fundamental differences comes from a classic gesture-speech coordination experiment performed with a person (IW) with deafferentation (McNeill, 2005). Although IW has lost both his primary source of information about body position (i.e., proprioception) and discriminative touch from the neck down, his gesture-speech coordination has been reported to be largely unaffected, even if his vision is blocked. This is surprising because, without vision, his object-directed actions almost completely break down. We examine the hypothesis that IW's gesture-speech coordination is supported by the biomechanical effects of gesturing on head posture and speech. We find that when vision is blocked, there are micro-scale increases in gesture-speech timing variability, consistent with IW's reported experience that gesturing is difficult without vision. Supporting the hypothesis that IW exploits biomechanical consequences of the act of gesturing, we find that: (1) gestures with larger physical impulses co-occur with greater head movement, (2) gesture-speech synchrony relates to larger gesture-concurrent head movements (i.e. for bimanual gestures), (3) when vision is blocked, gestures generate more physical impulse, and (4) moments of acoustic prominence couple more with peaks of physical impulse when vision is blocked. It can be concluded that IW's gesturing ability is not based on a specialized language-based feedforward control as originally concluded from previous research, but is still dependent on a varied means of recurrent feedback from the body.
Affiliation(s)
- Wim Pouw
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, The Netherlands.
  - Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands.
- Steven J Harrison
  - Center for the Ecological Study of Perception and Action, University of Connecticut, Storrs, USA
  - Department of Kinesiology, University of Connecticut, Storrs, USA
- James A Dixon
  - Center for the Ecological Study of Perception and Action, University of Connecticut, Storrs, USA
  - Department of Psychological Sciences, University of Connecticut, Storrs, USA
3. Pouw W, Fuchs S. Origins Of Vocal-Entangled Gesture. Neurosci Biobehav Rev 2022; 141:104836. [PMID: 36031008 DOI: 10.1016/j.neubiorev.2022.104836] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/21/2022] [Revised: 08/12/2022] [Accepted: 08/21/2022] [Indexed: 01/13/2023]
Abstract
Gestures during speaking are typically understood in a representational framework: they represent absent or distal states of affairs by means of pointing, resemblance, or symbolic replacement. However, humans also gesture along with the rhythm of speaking, which is amenable to a non-representational perspective. Such a perspective centers on the phenomenon of vocal-entangled gestures and builds on evidence showing that when an upper limb with a certain mass decelerates/accelerates sufficiently, it yields impulses on the body that cascade in various ways into the respiratory-vocal system. It entails a physical entanglement between body motions, respiration, and vocal activities. It is shown that vocal-entangled gestures are realized in infant vocal-motor babbling before any representational use of gesture develops. Similarly, an overview is given of vocal-entangled processes in non-human animals. They can frequently be found in rats, bats, birds, and a range of other species that developed even earlier in the phylogenetic tree. Thus, the origins of human gesture lie in biomechanics, emerging early in ontogeny and running deep in phylogeny.
Affiliation(s)
- Wim Pouw
  - Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands.
- Susanne Fuchs
  - Leibniz Center General Linguistics, Berlin, Germany.
4. Pearson L, Pouw W. Gesture-vocal coupling in Karnatak music performance: A neuro-bodily distributed aesthetic entanglement. Ann N Y Acad Sci 2022; 1515:219-236. [PMID: 35730069 DOI: 10.1111/nyas.14806] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/30/2022]
Abstract
In many musical styles, vocalists manually gesture while they sing. Coupling between gesture kinematics and vocalization has been examined in speech contexts, but it is an open question how these couple in music making. We examine this in a corpus of South Indian, Karnatak vocal music that includes motion-capture data. Through peak magnitude analysis (linear mixed regression) and continuous time-series analyses (generalized additive modeling), we assessed whether vocal trajectories around peaks in vertical velocity, speed, or acceleration were coupling with changes in vocal acoustics (namely, F0 and amplitude). Kinematic coupling was stronger for F0 change versus amplitude, pointing to F0's musical significance. Acceleration was the most predictive for F0 change and had the most reliable magnitude coupling, showing a one-third power relation. That acceleration, rather than other kinematics, is maximally predictive for vocalization is interesting because acceleration entails force transfers onto the body. As a theoretical contribution, we argue that gesturing in musical contexts should be understood in relation to the physical connections between gesturing and vocal production that are brought into harmony with the vocalists' (enculturated) performance goals. Gesture-vocal coupling should, therefore, be viewed as a neuro-bodily distributed aesthetic entanglement.
Affiliation(s)
- Lara Pearson
  - Department of Music, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Wim Pouw
  - Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, The Netherlands
  - Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
5. Aaen M, Sadolin C, White A, Nouraei R, McGlashan J. Extreme Vocals-A Retrospective Longitudinal study of Vocal Health in 20 Professional Singers Performing and Teaching Rough Vocal Effects. J Voice 2022:S0892-1997(22)00134-5. [PMID: 35667986 DOI: 10.1016/j.jvoice.2022.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Revised: 04/29/2022] [Accepted: 05/02/2022] [Indexed: 11/24/2022]
Abstract
BACKGROUND Rough vocal effects, extreme, or extended vocal techniques to sound intentionally hoarse or rough are an integral part of many genres and styles, and research has recently demonstrated the involvement of supraglottic narrowing and vibrations to produce such sounds. The vocal health of singing with rough vocal effects is poorly documented, especially in a longitudinal manner, while much vocal pedagogy continuously treats the sounds as harmful to or dangerous for the vocal mechanism. OBJECTIVE To longitudinally investigate the vocal health of professional singers who perform the five rough-sounding vocal effects Distortion, Growl, Grunt, Rattle, and Creaking as part of their singing and teaching. METHODS Twenty singers underwent nasoendoscopic examination, filled in SVHI questionnaires, and were assessed by GRBAS with a 14-year interval in a retrospective longitudinal study (from 2007 to 2021). Endoscopic materials were assessed by Reflux Finding Score and a hybrid version of the Stroboscopy Rating Scale. RESULTS Singers presented at initiation of study with an average SVHI of 9.2 (±9), which decreased at time of follow up 14 years later to an average of 5.12 (±6). Laryngeal assessments (RFS and SRS) revealed low averages at initiation of study as well as at conclusion of the study with only small fluctuations in averages, with findings mainly relating to arytenoid asymmetry. CONCLUSION The participating singers perform and teach rough vocal effects continually and present with healthy laryngeal mechanisms and within-normal SVHI and GRBAS scores. The findings suggest that controlled supraglottic narrowing and techniques to allow for supraglottic structures to engage in vibration as an additional noise source can be performed sustainably and in a healthy manner if performed with correct vocal technique.
Affiliation(s)
- Mathias Aaen
  - Department of Otorhinolaryngology, Queen's Medical Centre Campus, Nottingham University Hospitals, Nottingham, United Kingdom.
- Anna White
  - Department of Otorhinolaryngology, Queen's Medical Centre Campus, Nottingham University Hospitals, Nottingham, United Kingdom
- Reza Nouraei
  - University of Southampton, University Road, Southampton, United Kingdom
- Julian McGlashan
  - Department of Otorhinolaryngology, Queen's Medical Centre Campus, Nottingham University Hospitals, Nottingham, United Kingdom
6. Pouw W, de Jonge‐Hoekstra L, Harrison SJ, Paxton A, Dixon JA. Gesture-speech physics in fluent speech and rhythmic upper limb movements. Ann N Y Acad Sci 2021; 1491:89-105. [PMID: 33336809 PMCID: PMC8246948 DOI: 10.1111/nyas.14532] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/23/2020] [Revised: 10/15/2020] [Accepted: 10/23/2020] [Indexed: 12/18/2022]
Abstract
It is commonly understood that hand gesture and speech coordination in humans is culturally and cognitively acquired, rather than having a biological basis. Recently, however, the biomechanical physical coupling of arm movements to speech vocalization has been studied in steady-state vocalization and monosyllabic utterances, where forces produced during gesturing are transferred onto the tensioned body, leading to changes in respiratory-related activity and thereby affecting vocalization F0 and intensity. In the current experiment (n = 37), we extend this previous line of work to show that gesture-speech physics also impacts fluent speech. Compared with nonmovement, participants who are producing fluent self-formulated speech while rhythmically moving their limbs demonstrate heightened F0 and amplitude envelope, and such effects are more pronounced for higher-impulse arm versus lower-impulse wrist movement. We replicate that acoustic peaks arise especially during moments of peak impulse (i.e., the beat) of the movement, namely around deceleration phases of the movement. Finally, higher deceleration rates of higher-mass arm movements were related to higher peaks in acoustics. These results confirm a role for physical impulses of gesture affecting the speech system. We discuss the implications of gesture-speech physics for understanding of the emergence of communicative gesture, both ontogenetically and phylogenetically.
Affiliation(s)
- Wim Pouw
  - Center for the Ecological Study of Perception and Action, University of Connecticut, Storrs, Connecticut
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, the Netherlands
  - Institute for Psycholinguistics, Max Planck Nijmegen, Nijmegen, the Netherlands
- Lisette de Jonge‐Hoekstra
  - Center for the Ecological Study of Perception and Action, University of Connecticut, Storrs, Connecticut
  - Faculty of Behavioral and Social Sciences, University of Groningen, Groningen, the Netherlands
  - Royal Dutch Kentalis, Sint‐Michielsgestel, the Netherlands
- Steven J. Harrison
  - Center for the Ecological Study of Perception and Action, University of Connecticut, Storrs, Connecticut
  - Department of Kinesiology, University of Connecticut, Storrs, Connecticut
- Alexandra Paxton
  - Center for the Ecological Study of Perception and Action, University of Connecticut, Storrs, Connecticut
  - Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut
- James A. Dixon
  - Center for the Ecological Study of Perception and Action, University of Connecticut, Storrs, Connecticut
  - Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut
7. Knight EJ, Austin SF. The Effect of Head Flexion/Extension on Acoustic Measures of Singing Voice Quality. J Voice 2020; 34:964.e11-964.e21. [DOI: 10.1016/j.jvoice.2019.06.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/13/2019] [Revised: 06/26/2019] [Accepted: 06/27/2019] [Indexed: 10/26/2022]
8. Cardoso R, Lumini-Oliveira J, Meneses RF. Associations between Posture, Voice, and Dysphonia: A Systematic Review. J Voice 2019; 33:124.e1-124.e12. [DOI: 10.1016/j.jvoice.2017.08.030] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 06/04/2017] [Revised: 08/30/2017] [Accepted: 08/30/2017] [Indexed: 11/28/2022]
9. Delmoral JC, Rua Ventura SM, Tavares JMR. Segmentation of tongue shapes during vowel production in magnetic resonance images based on statistical modelling. Proc Inst Mech Eng H 2018; 232:271-281. [PMID: 29350087 DOI: 10.1177/0954411917751000] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022]
Abstract
Quantification of the anatomic and functional aspects of the tongue is pertinent to analyse the mechanisms involved in speech production. Speech requires dynamic and complex articulation of the vocal tract organs, and the tongue is one of the main articulators during speech production. Magnetic resonance imaging has been widely used in speech-related studies. Moreover, the segmentation of such images of speech organs is required to extract reliable statistical data. However, standard solutions to analyse a large set of articulatory images have not yet been established. Therefore, this article presents an approach to segment the tongue in two-dimensional magnetic resonance images and statistically model the segmented tongue shapes. The proposed approach assesses the articulator morphology based on an active shape model, which captures the shape variability of the tongue during speech production. To validate this new approach, a dataset of mid-sagittal magnetic resonance images acquired from four subjects was used, and key aspects of the shape of the tongue during the vocal production of relevant European Portuguese vowels were evaluated.
Affiliation(s)
- Jessica C Delmoral
  - Instituto de Ciência e Inovação em Engenharia Mecânica e Engenharia Industrial, Faculdade de Engenharia, Universidade do Porto, Porto, Portugal
- Sandra M Rua Ventura
  - Centro de Estudo do Movimento e Atividade Humana, Escola Superior de Tecnologia da Saúde do Porto, Instituto Politécnico do Porto, Porto, Portugal
- João Manuel Rs Tavares
  - Departamento de Engenharia Mecânica, Instituto de Ciência e Inovação em Engenharia Mecânica e Engenharia Industrial, Faculdade de Engenharia, Universidade do Porto, Porto, Portugal
10.
11. Gilman M, Johns MM. The Effect of Head Position and/or Stance on the Self-perception of Phonatory Effort. J Voice 2016; 31:131.e1-131.e4. [PMID: 26778325 DOI: 10.1016/j.jvoice.2015.11.024] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/07/2015] [Accepted: 11/30/2015] [Indexed: 11/18/2022]
Abstract
BACKGROUND Vocal fatigue is a common but poorly defined complaint of patients presenting with voice disorders. Definitions of vocal fatigue generally include increased self-perceived phonatory effort resulting from references to vocal loading or prolonged voice use resulting in deterioration of function. The present study looks at the role of posture, specifically head position and stance, in self-perceived phonatory effort. METHODS Forty-six healthy adults, 13 males and 33 females (mean age was 27.5), with no history of vocal problems/disorders within the past year were recruited. Subjects were asked to sustain the vowel /a/ at a comfortable pitch and loudness for 5-10 seconds in each of six positions: sitting and standing in the manner habitual for each subject, two exaggerated positions of the head (head back and head forward), and two exaggerated positions in standing (standing with knees locked and with knees soft). Each position was repeated three times in randomized order, resulting in 18 trials for each subject. After each repetition of the sustained /a/, subjects were asked to rate their experience of vocal effort using a 100-mm visual analog scale (0-40 least effort, 40-60 habitual effort, and 60-100 increased effort). RESULTS Repeated measures analysis of variance revealed significant difference in the self-perceived phonatory effort levels across positions (P value < 0.001). The exaggerated forward and back head positions in both sitting and standing positions showed the greatest significance on the Tukey post hoc tests (P < 0.000). CONCLUSIONS Based on the findings, posture may play a more important role in vocal fatigue than previously thought.
Affiliation(s)
- Marina Gilman
  - Speech-Language Pathology, The Emory Voice Center, Department of Otolaryngology, Emory University, Atlanta, Georgia.
- Michael M Johns
  - USC Voice Center, University of Southern California, Los Angeles, California
12. Varzi D, Coupaud SAF, Purcell M, Allan DB, Gregory JS, Barr RJ. Bone morphology of the femur and tibia captured by statistical shape modelling predicts rapid bone loss in acute spinal cord injury patients. Bone 2015; 81:495-501. [PMID: 26341577 DOI: 10.1016/j.bone.2015.08.026] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 04/28/2015] [Revised: 08/18/2015] [Accepted: 08/30/2015] [Indexed: 01/13/2023]
Abstract
After spinal cord injury (SCI), bone loss in the paralysed limbs progresses at variable rates. Decreases in bone mineral density (BMD) in the first year range from 1% (slow) to 40% (rapid). In chronic SCI, fragility fractures commonly occur around the knee, with significant associated morbidity. Osteoporosis treatments await full evaluation in SCI, but should be initiated early and targeted towards patients exhibiting rapid bone loss. The potential to predict rapid bone loss from a single bone scan within weeks of a SCI was investigated using statistical shape modelling (SSM) of bone morphology. Hypothesis: baseline bone shape predicts bone loss at 12 months post-injury at fracture-prone sites. In this retrospective cohort study, 25 SCI patients (median age, 33 years) were scanned at the distal femur and proximal tibia using peripheral Quantitative Computed Tomography at <5 weeks (baseline), 4, 8 and 12 months post-injury. An SSM was made for each bone. Links between the baseline shape-modes and 12-month total and trabecular BMD loss were analysed using multiple linear regression. One mode from each SSM significantly predicted bone loss (age-adjusted P < 0.05, R² = 0.37-0.61) at baseline. An elongated intercondylar femoral notch (femur mode 4, +1 SD from the mean) was associated with 8.2% additional loss of femoral trabecular BMD at 12 months. A more concave posterior tibial fossa (tibia mode 3, +1 SD) was associated with 9.4% additional 12-month tibial trabecular BMD loss. Baseline bone shape determined from a single bone scan is a valid imaging biomarker for the prediction of 12-month bone loss in SCI patients.
Affiliation(s)
- Delaram Varzi
  - Musculoskeletal Research Programme, University of Aberdeen, Aberdeen, UK
- Sylvie A F Coupaud
  - Department of Biomedical Engineering, University of Strathclyde, Glasgow, UK
  - Scottish Centre for Innovation in Spinal Cord Injury, Queen Elizabeth National Spinal Injuries Unit, Southern General Hospital, Glasgow, UK
- Mariel Purcell
  - Scottish Centre for Innovation in Spinal Cord Injury, Queen Elizabeth National Spinal Injuries Unit, Southern General Hospital, Glasgow, UK
- David B Allan
  - Scottish Centre for Innovation in Spinal Cord Injury, Queen Elizabeth National Spinal Injuries Unit, Southern General Hospital, Glasgow, UK
- Jennifer S Gregory
  - Musculoskeletal Research Programme, University of Aberdeen, Aberdeen, UK
- Rebecca J Barr
  - Musculoskeletal Research Programme, University of Aberdeen, Aberdeen, UK.