1
Ikävalko T, Laukkanen AM, McAllister A, Eklund R, Lammentausta E, Leppävuori M, Nieminen MT. Three Professional Singers' Vocal Tract Dimensions in Operatic Singing, Kulning, and Edge-A Multiple Case Study Examining Loud Singing. J Voice 2024; 38:1253.e11-1253.e27. [PMID: 35277318 DOI: 10.1016/j.jvoice.2022.01.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2021] [Revised: 01/20/2022] [Accepted: 01/26/2022] [Indexed: 10/18/2022]
Abstract
OBJECTIVE A comprehensive understanding of how vocal tract dimensions vary among different types of loud voice productions has not yet been fully formed. This study aims to expand the existing knowledge on the topic. METHODS Three trained professional singers together practiced the vocal techniques underlying Opera and Kulning singing styles for one hour and, afterwards, phonated using these techniques on vowel [iː] at pitch C5 (523 Hz), while their vocal tracts were scanned via MRI. One of the participants also produced the samples in the Edge vocal mode using [ɛː]. Several dimensional vocal tract measurements were calculated from the MRIs. Spectral analysis was conducted on the filtered audio recorded during the MRI. RESULTS The Operatic technique demonstrated a lower larynx, a larger tongue-palate distance, and larger epilaryngeal and pharyngeal tube diameters compared to Kulning. Edge showed the highest laryngeal position, narrowest pharynx and epilarynx tubes, and the least forward-tilted larynx out of the styles studied. The spectra of Opera and Kulning showed a dominant first harmonic, while in Edge, the second harmonic was the strongest. CONCLUSIONS The results shed light on the magnitude of vocal tract changes necessary for genre-typical vocal projection. This information can be pedagogically helpful.
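As a rough illustration of the spectral result in the abstract above, the harmonics of a tone sung at C5 lie at integer multiples of the fundamental, and the "dominant" harmonic is simply the one with the highest level. A minimal Python sketch; the function names and the level values are hypothetical and are not taken from the study:

```python
def harmonic_frequencies(f0_hz, count):
    """Return the first `count` harmonic frequencies (Hz) of a fundamental f0_hz."""
    return [k * f0_hz for k in range(1, count + 1)]


def strongest_harmonic(levels_db):
    """Return the 1-based number of the harmonic with the highest level."""
    return max(range(len(levels_db)), key=lambda i: levels_db[i]) + 1


# C5 as given in the abstract (523 Hz).
freqs = harmonic_frequencies(523, 4)

# Hypothetical harmonic levels in dB, invented purely for illustration.
first_dominant = [0.0, -6.0, -12.0, -15.0]   # pattern reported for Opera and Kulning
second_dominant = [-4.0, 0.0, -9.0, -14.0]   # pattern reported for Edge

print(freqs, strongest_harmonic(first_dominant), strongest_harmonic(second_dominant))
```

With a 523 Hz fundamental the second harmonic falls at 1046 Hz, which is the component the abstract reports as strongest in Edge.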
Affiliation(s)
- Tero Ikävalko
  - Speech and Voice Research Laboratory, Faculty of Social Sciences, Tampere University, Tampere, Finland
- Anne-Maria Laukkanen
  - Speech and Voice Research Laboratory, Faculty of Social Sciences, Tampere University, Tampere, Finland
- Anita McAllister
  - Medical Unit Speech and Language Pathology, CLINTEC, Karolinska Institutet, Stockholm, Sweden; Functional Area Speech and Language Pathology, Karolinska University Hospital, Stockholm, Sweden
- Robert Eklund
  - Department of Culture and Communication (IDA), Department of Computer Science (IDA), Linköping University, Linköping, Sweden
- Mari Leppävuori
  - Community of Research in Education, Music, and the Arts, Faculty of Education, University of Oulu, Oulu, Finland; Research Unit of Medical Imaging, Physics and Technology, Faculty of Medicine, University of Oulu, Oulu, Finland
- Miika T Nieminen
  - Department of Diagnostic Radiology, Oulu University Hospital, Oulu, Finland; Research Unit of Medical Imaging, Physics and Technology, Faculty of Medicine, University of Oulu, Oulu, Finland; Medical Research Center, Oulu University Hospital and University of Oulu, Oulu, Finland
2
Vampola T, Horáček J, Laukkanen AM. Three-Dimensional Finite Element Modeling of the Singer's Formant Cluster Optimization by Epilaryngeal Narrowing With and Without Velopharyngeal Opening. J Voice 2024:S0892-1997(24)00248-0. [PMID: 39218756 DOI: 10.1016/j.jvoice.2024.07.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 07/31/2024] [Accepted: 07/31/2024] [Indexed: 09/04/2024]
Abstract
This study aimed to find the optimal geometrical configuration of the vocal tract (VT) to increase the total acoustic energy output of the human voice in the frequency interval 2-3.5 kHz, the "singer's formant cluster" (SFC), for vowels [a:] and [i:], considering epilaryngeal changes and the velopharyngeal opening (VPO). The study applied 3D volume models of the vocal and nasal tract based on computed tomography images of a female speaker. Epilaryngeal narrowing (EN) increased the total sound pressure level (SPL) and the SPL of the SFC by diminishing the frequency difference between acoustic resonances F3 and F4 for [a:] and between F2 and F3 for [i:]. The effect reached its maximum at a low pharynx/epilarynx cross-sectional area ratio of 11.4:1 for [a:] and 25:1 for [i:]. The acoustic results obtained with the model optimization are in good agreement with the results of an internationally recognized operatic alto singer. With the EN and the VPO, the VT input reactance was positive over the entire fo singing range (ca 75-1500 Hz). The VPO increased the strength of the SFC and diminished the SPL of F1 for both vowels, but with EN, the SPL decrease was compensated. The effect of EN is not linear and depends on the vowel. Both the EN and the VPO, alone and together, can support (singing) voice production.
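To give a feel for the area ratios quoted in the abstract above, they can be converted to diameter ratios. This is a back-of-envelope sketch that assumes both cross-sections are idealised as circles (an assumption made here for illustration only; the study itself used 3D volume models), so that area scales with the square of the diameter:

```latex
A = \frac{\pi d^{2}}{4}
\quad\Longrightarrow\quad
\frac{d_{\text{pharynx}}}{d_{\text{epilarynx}}}
  = \sqrt{\frac{A_{\text{pharynx}}}{A_{\text{epilarynx}}}},
\qquad
\sqrt{11.4} \approx 3.38 \ \text{for [a:]},
\qquad
\sqrt{25} = 5 \ \text{for [i:]}.
```

Under that idealisation, the reported optimum corresponds to a low pharynx roughly 3.4 times (for [a:]) or 5 times (for [i:]) as wide as the epilaryngeal tube.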
Affiliation(s)
- Tomáš Vampola
  - Department of Mechanics, Biomechanics and Mechatronics, Faculty of Mechanical Engineering, Czech Technical University in Prague, Prague, Czech Republic
- Jaromír Horáček
  - Institute of Thermomechanics, Academy of Sciences of the Czech Republic, Prague, Czech Republic
3
Aaen M, Sadolin C. Towards Improved Auditory-Perceptual Assessment of Timbres: Comparing Accuracy and Reliability of Four Deconstructed Timbre Assessment Models. J Voice 2024:S0892-1997(24)00117-6. [PMID: 38755075 DOI: 10.1016/j.jvoice.2024.03.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 03/27/2024] [Accepted: 03/28/2024] [Indexed: 05/18/2024]
Abstract
Timbre is a central quality of singing, yet remains a complex notion poorly understood in psychoacoustic studies. Previous studies note how no single acoustic variable or combinations of variables consistently predict timbre dimensions. Timbre varies on a continuum from darkest to lightest. These extremes are associated with laryngeal and vocal tract adjustments related to smaller and larger vocal tract area and variations in vocal fold vibratory characteristics. Perceptually, timbre assessment is influenced by spectral characteristics and formant frequency adjustments, though these dimensions are not independently perceived. Perceptual studies repeatedly demonstrate difficulties in correlating variations in timbre stimuli to specific measures. A recent study demonstrated how acoustic predictive salience of voice category and voice weight across pitches contribute to timbre assessments and concludes that timbre may be related to as-of-yet unknown factor(s). The purpose of this study was to test four different models for assessing timbre; one model focused on specific anatomy, one on listener intuition, one utilizing auditory anchors, and one using expert raters in a deconstructed timbre model with five specific dimensions. METHODS Four independent panels were conducted with separate cohorts of professional singing teachers. Forty-one assessors took part in the anatomically focused panel, 54 in the intuition-based panel, 30 in the anchored panel, and 12 in the expert listener panel. Stimuli taken from live performances of well-known singers were used for all panels, representing all genders, genres, and styles across a large pitch range. All stimuli are available as Supplementary Materials. Fleiss' kappa values, descriptive statistics, and significance tests are reported for all panel assessments. RESULTS Panels 1 through 4 varied in overall accuracy and agreement. 
The intuition-based model showed overall 45% average accuracy (SD ± 4%), k = 0.289 (P < 0.001) compared to overall 71% average accuracy (SD ± 3%), k = 0.368 (P < 0.001) of the anatomically focused panel. The auditory-anchored model showed overall 75% average accuracy (SD ± 8%), k = 0.54 (P < 0.001) compared with overall 83% average accuracy and agreement of k = 0.63 (P < 0.001) for panel 4. Results revealed that the highest accuracy and reliability were achieved in a deconstructed timbre model and that providing anchoring improved reliability but with no further increase in accuracy. CONCLUSION Deconstructing timbre into specific parameters improved auditory perceptual accuracy and overall agreement. Assessing timbre along with other perceptual dimensions improves accuracy and reliability. Panel assessors' expert level of listening skills remains an important factor in obtaining reliable and accurate assessments of auditory stimuli for timbre dimensions. Anchoring improved reliability but with no further increase in accuracy. The study suggests that timbre assessment can be improved by approaching the percept through a prism of five specific dimensions, each related to specific physiology and auditory-perceptual subcategories. Further tests are needed with framework-naïve listeners, nonmusically educated listeners, artificial intelligence comparisons, and synthetic stimuli to further test the reliability.
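The agreement figures in the abstract above are Fleiss' kappa values. As a minimal sketch of how that statistic is computed from a table of rating counts, the following Python function implements the standard formula; the function name and the toy rating tables are hypothetical and unrelated to the study's data:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = number of raters placing item i in category j.

    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of ratings")

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_categories)]
    p_expected = sum((t / (n_items * n_raters)) ** 2 for t in totals)

    # Undefined (division by zero) if every rating falls in one category.
    return (p_observed - p_expected) / (1 - p_expected)


# Toy tables: 4 stimuli, 3 raters, 2 categories (invented for illustration).
unanimous = [[3, 0], [0, 3], [3, 0], [0, 3]]
split = [[2, 1], [1, 2], [2, 1], [1, 2]]
print(fleiss_kappa(unanimous), fleiss_kappa(split))
```

Kappa is 1 when all raters agree on every item and falls to zero or below when agreement is no better than chance, which is why it is reported alongside accuracy rather than instead of it.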
Affiliation(s)
- Mathias Aaen
  - Nottingham University Hospitals, NHS Trust, Queen's Medical, ENT Department, Nottingham, United Kingdom; Complete Vocal Institute, Copenhagen K, Denmark
4
Ruthven M, Peplinski AM, Adams DM, King AP, Miquel ME. Real-time speech MRI datasets with corresponding articulator ground-truth segmentations. Sci Data 2023; 10:860. [PMID: 38042857 PMCID: PMC10693552 DOI: 10.1038/s41597-023-02766-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 11/20/2023] [Indexed: 12/04/2023] Open
Abstract
The use of real-time magnetic resonance imaging (rt-MRI) of speech is increasing in clinical practice and speech science research. Analysis of such images often requires segmentation of articulators and the vocal tract, and the community is turning to deep-learning-based methods to perform this segmentation. While there are publicly available rt-MRI datasets of speech, these do not include ground-truth (GT) segmentations, a key requirement for the development of deep-learning-based segmentation methods. To begin to address this barrier, this work presents rt-MRI speech datasets of five healthy adult volunteers with corresponding GT segmentations and velopharyngeal closure patterns. The images were acquired using standard clinical MRI scanners, coils and sequences to facilitate acquisition of similar images in other centres. The datasets include manually created GT segmentations of six anatomical features including the tongue, soft palate and vocal tract. In addition, this work makes code and instructions to implement a current state-of-the-art deep-learning-based method to segment rt-MRI speech datasets publicly available, thus providing the community and others with a starting point for developing such methods.
Affiliation(s)
- Matthieu Ruthven
  - Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
  - School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London, SE1 7EH, UK
- David M Adams
  - Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
- Andrew P King
  - School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London, SE1 7EH, UK
- Marc Eric Miquel
  - Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
  - Digital Environment Research Institute (DERI), Empire House, 67-75 New Road, Queen Mary University of London, London, E1 1HH, UK
  - Advanced Cardiovascular Imaging, Barts NIHR BRC, Queen Mary University of London, London, EC1M 6BQ, UK
5
Burk F, Traser L, Burdumy M, Richter B, Echternach M. Dynamic changes of vocal tract dimensions with sound pressure level during messa di voce. The Journal of the Acoustical Society of America 2023; 154:3595-3603. [PMID: 38038612 DOI: 10.1121/10.0022582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Accepted: 11/14/2023] [Indexed: 12/02/2023]
Abstract
The messa di voce (MdV), which consists of a continuous crescendo and subsequent decrescendo on one pitch, is one of the more difficult exercises of the technical repertoire of Western classical singing. With rising lung pressure, regulatory adjustments both on the level of the glottis and the vocal tract are required to keep the pitch stable. The dynamic changes of vocal tract dimensions with the bidirectional variation of sound pressure level (SPL) during MdV were analyzed by two-dimensional real-time magnetic resonance imaging (25 frames/s) and synchronous audio recordings in 12 professional singers. Close associations in the respective articulatory kinetics were found between SPL and lip opening, jaw opening, pharynx width, uvula elevation, and vertical larynx position. However, changes in vocal tract dimensions during plateaus of SPL suggest that perceived loudness could have been varied beyond the dimension of SPL. Further multimodal investigation, including the analysis of sound spectra, is needed for a better understanding of the role of vocal tract resonances in the control of vocal loudness in human phonation.
Affiliation(s)
- Fabian Burk
  - Department of Otorhinolaryngology and Plastic Surgery, SRH Wald-Klinikum Gera, Gera, Germany
  - Institute of Musicians' Medicine, University Medical Center Freiburg, Freiburg im Breisgau, Germany
- Louisa Traser
  - Institute of Musicians' Medicine, University Medical Center Freiburg, Freiburg im Breisgau, Germany
- Michael Burdumy
  - Department of Radiology, Medical Physics, University Medical Center Freiburg, Freiburg im Breisgau, Germany
- Bernhard Richter
  - Institute of Musicians' Medicine, University Medical Center Freiburg, Freiburg im Breisgau, Germany
- Matthias Echternach
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany
6
Sol J, Aaen M, Sadolin C, Ten Bosch L. Towards Automated Vocal Mode Classification in Healthy Singing Voice-An XGBoost Decision Tree-Based Machine Learning Classifier. J Voice 2023:S0892-1997(23)00281-3. [PMID: 37953088 DOI: 10.1016/j.jvoice.2023.09.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 09/07/2023] [Indexed: 11/14/2023]
Abstract
Auditory-perceptual assessment is widely used in clinical and pedagogical practice for speech and singing voice, yet several studies have shown poor intra- and inter-rater reliability in both clinical and singing voice contexts. Recent advances in artificial intelligence and machine learning offer models for automated classification and have demonstrated discriminatory power in both pathological and healthy voice. This study develops and tests an XGBoost decision-tree-based machine learning classifier for automated vocal mode classification in healthy singing voice. Classification models trained on mel-frequency cepstrum coefficients (MFCCs), MFCC-Zero-Time Windowing, glottal features, voice quality features, and α-ratios demonstrated a 92% average F1-score in distinguishing metallic and non-metallic singing for male singers and an 87% average F1-score for female singers. The model distinguished vocal modes with 70% and 69% average F1-score for male and female samples, respectively. Model performance was compared to human auditory-perceptual assessments of 64 corresponding samples performed by 41 professional singers. The model's performance approached or fell below that of human assessors on task-matched problems. The XGBoost gains observed across tested features reveal that the most important attributes for the tested classification problems were MFCCs and α-ratios between high and low frequency energy, with models trained on only these features achieving performance not statistically significantly different from the best tested models. The best automated models in this study do not yet match human auditory-perceptual discrimination but improve on previously reported average F1-scores in automated classification in singing voice.
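The abstract above reports classifier quality as an average F1-score. As a minimal sketch of that metric, the following Python code computes a per-class F1 and its unweighted (macro) mean. Whether the study averaged in exactly this way is an assumption, and the function names and toy labels are hypothetical:

```python
def f1_per_class(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN) for each label."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels, invented for illustration only.
truth = ["metallic", "metallic", "non-metallic", "non-metallic"]
guess = ["metallic", "non-metallic", "non-metallic", "non-metallic"]
print(f1_per_class(truth, guess), macro_f1(truth, guess))
```

A macro average weights every class equally, so a rare vocal mode that is classified poorly pulls the score down as much as a common one would.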
Affiliation(s)
- Jeroen Sol
  - Institute for Computing and Information Sciences, Radboud University, Nijmegen, the Netherlands
- Mathias Aaen
  - Research & Development, Complete Vocal Institute, Copenhagen K, Denmark; Nottingham University Hospitals, NHS Trust, Queen's Medical, ENT Department, Nottingham, United Kingdom
- Cathrine Sadolin
  - Research & Development, Complete Vocal Institute, Copenhagen K, Denmark
- Louis Ten Bosch
  - Department of Language and Communication, Centre for Language Studies, Radboud University, Nijmegen, the Netherlands
7
McGlashan J, Aaen M, White A, Sadolin C. A mixed-method feasibility study of the use of the Complete Vocal Technique (CVT), a pedagogic method to improve the voice and vocal function in singers and actors, in the treatment of patients with muscle tension dysphonia: a study protocol. Pilot Feasibility Stud 2023; 9:88. [PMID: 37226281 DOI: 10.1186/s40814-023-01317-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Accepted: 05/02/2023] [Indexed: 05/26/2023] Open
Abstract
BACKGROUND Muscle tension dysphonia (MTD) results from inefficient or ineffective voice production and is the cause of voice and throat complaints in up to 40% of patients presenting with hoarseness. Standard treatment is voice therapy (SLT-VT) delivered by specialist speech therapists in voice disorders (SLT-V). The Complete Vocal Technique (CVT) is a structured, pedagogic method which helps healthy singers and other performers optimise their vocal function, enabling them to produce any sound required. The aim of this feasibility study is to investigate whether CVT administered by a trained, non-clinical CVT practitioner (CVT-P) can be applied to patients with MTD before progressing to a pilot randomised controlled study of CVT voice therapy (CVT-VT) versus SLT-VT. METHODS/DESIGN In this feasibility study, we use a mixed-method, single-arm, prospective cohort design. The primary aim is to demonstrate whether CVT-VT can improve the voice and vocal function in patients with MTD in a pilot study using multidimensional assessment methods. Secondary aims are to assess whether (1) a CVT-VT study is feasible to perform; (2) it is acceptable to patients, the CVT-P and SLT-VTs; and (3) CVT-VT differs from existing SLT-VT techniques. A minimum of 10 consecutive patients with a clinical diagnosis of primary MTD (types I-III) will be recruited over a 6-month period. Up to 6 video sessions of CVT-VT will be delivered by a CVT-P using a video link. The primary outcome will be a change in pre-/post-therapy scores of a self-reported patient questionnaire (Voice Handicap Index (VHI)). Secondary outcomes include changes in throat symptoms (Vocal Tract Discomfort Scale), acoustic/electroglottographic and auditory-perceptual measures of voice. Acceptability of the CVT-VT will be assessed prospectively, concurrently and retrospectively both quantitatively and qualitatively. 
Differences from SLT-VT will be assessed by performing a deductive thematic analysis of CVT-P transcripts of therapy sessions. CONCLUSION This feasibility study will provide important data to support whether to proceed with a randomised controlled pilot study focusing on the effectiveness of the intervention compared to standard SLT-VT. Progression criteria will be based on demonstrating a positive outcome in treatment, successful delivery of the pilot study protocol, acceptability to all stakeholders and satisfactory recruitment rates. TRIAL REGISTRATION ClinicalTrials.gov website (NCT05365126; Unique Protocol ID: 19ET004). Registered on 06 May 2022.
Affiliation(s)
- Julian McGlashan
  - Ear, Nose and Throat Department, Queen's Medical Centre Campus, Nottingham University Hospitals, Nottingham, NG7 2UH, UK
- Mathias Aaen
  - Complete Vocal Institute, Kompagnistraede 32A, 1208, Copenhagen K, Denmark
  - Honorary Researcher, Ear, Nose and Throat Department, Queen's Medical Centre Campus, Nottingham University Hospitals, Nottingham, NG7 2UH, UK
- Anna White
  - Ear, Nose and Throat Department, Queen's Medical Centre Campus, Nottingham University Hospitals, Nottingham, NG7 2UH, UK
- Cathrine Sadolin
  - Complete Vocal Institute, Kompagnistraede 32A, 1208, Copenhagen K, Denmark
  - Honorary Researcher, Ear, Nose and Throat Department, Queen's Medical Centre Campus, Nottingham University Hospitals, Nottingham, NG7 2UH, UK
8
Aaen M, Sadolin C, White A, Nouraei R, McGlashan J. Extreme Vocals-A Retrospective Longitudinal study of Vocal Health in 20 Professional Singers Performing and Teaching Rough Vocal Effects. J Voice 2022:S0892-1997(22)00134-5. [PMID: 35667986 DOI: 10.1016/j.jvoice.2022.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Revised: 04/29/2022] [Accepted: 05/02/2022] [Indexed: 11/24/2022]
Abstract
BACKGROUND Rough vocal effects, extreme, or extended vocal techniques to sound intentionally hoarse or rough are an integral part of many genres and styles, and research has recently demonstrated the involvement of supraglottic narrowing and vibrations to produce such sounds. The vocal health of singing with rough vocal effects is poorly documented, especially in a longitudinal manner, while much vocal pedagogy continues to treat the sounds as harmful to or dangerous for the vocal mechanism. OBJECTIVE To longitudinally investigate the vocal health of professional singers who perform the five rough-sounding vocal effects Distortion, Growl, Grunt, Rattle, and Creaking as part of their singing and teaching. METHODS Twenty singers underwent nasoendoscopic examination, filled in SVHI questionnaires, and were assessed by GRBAS with a 14-year interval in a retrospective longitudinal study (from 2007 to 2021). Endoscopic materials were assessed by Reflux Finding Score and a hybrid version of the Stroboscopy Rating Scale. RESULTS Singers presented at initiation of study with an average SVHI of 9.2 (±9), which decreased at time of follow-up 14 years later to an average of 5.12 (±6). Laryngeal assessments (RFS and SRS) revealed low averages at initiation of study as well as at conclusion of the study with only small fluctuations in averages, with findings mainly relating to arytenoid asymmetry. CONCLUSION The participating singers perform and teach rough vocal effects continually and present with healthy laryngeal mechanisms and within-normal SVHI and GRBAS scores. The findings suggest that controlled supraglottic narrowing and techniques to allow for supraglottic structures to engage in vibration as an additional noise source can be performed sustainably and in a healthy manner if performed with correct vocal technique.
Affiliation(s)
- Mathias Aaen
  - Department of Otorhinolaryngology, Queen's Medical Centre Campus, Nottingham University Hospitals, Nottingham, United Kingdom
- Anna White
  - Department of Otorhinolaryngology, Queen's Medical Centre Campus, Nottingham University Hospitals, Nottingham, United Kingdom
- Reza Nouraei
  - University of Southampton, University Road, Southampton, United Kingdom
- Julian McGlashan
  - Department of Otorhinolaryngology, Queen's Medical Centre Campus, Nottingham University Hospitals, Nottingham, United Kingdom
9
Ruthven M, Miquel ME, King AP. Deep-learning-based segmentation of the vocal tract and articulators in real-time magnetic resonance images of speech. Computer Methods and Programs in Biomedicine 2021; 198:105814. [PMID: 33197740 PMCID: PMC7732702 DOI: 10.1016/j.cmpb.2020.105814] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/08/2020] [Accepted: 10/19/2020] [Indexed: 06/01/2023]
Abstract
BACKGROUND AND OBJECTIVE Magnetic resonance (MR) imaging is increasingly used in studies of speech as it enables non-invasive visualisation of the vocal tract and articulators, thus providing information about their shape, size, motion and position. Extraction of this information for quantitative analysis is achieved using segmentation. Methods have been developed to segment the vocal tract; however, none of these also fully segment any articulators. The objective of this work was to develop a method to fully segment multiple groups of articulators as well as the vocal tract in two-dimensional MR images of speech, thus overcoming the limitations of existing methods. METHODS Five speech MR image sets (392 MR images in total), each of a different healthy adult volunteer, were used in this work. A fully convolutional network with an architecture similar to the original U-Net was developed to segment the following six regions in the image sets: the head, soft palate, jaw, tongue, vocal tract and tooth space. A five-fold cross-validation was performed to investigate the segmentation accuracy and generalisability of the network. The segmentation accuracy was assessed using standard overlap-based metrics (Dice coefficient and general Hausdorff distance) and a novel clinically relevant metric based on velopharyngeal closure. RESULTS The segmentations created by the method had a median Dice coefficient of 0.92 and a median general Hausdorff distance of 5 mm. The method segmented the head most accurately (median Dice coefficient of 0.99), and the soft palate and tooth space least accurately (median Dice coefficients of 0.92 and 0.93 respectively). The segmentations created by the method correctly showed 90% (27 out of 30) of the velopharyngeal closures in the MR image sets. CONCLUSIONS An automatic method to fully segment multiple groups of articulators as well as the vocal tract in two-dimensional MR images of speech was successfully developed. 
The method is intended for use in clinical and non-clinical speech studies which involve quantitative analysis of the shape, size, motion and position of the vocal tract and articulators. In addition, a novel clinically relevant metric for assessing the accuracy of vocal tract and articulator segmentation methods was developed.
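The Dice coefficient named in the abstract above measures the overlap between a predicted segmentation and its ground truth as twice the intersection divided by the sum of the two mask sizes. A minimal Python sketch on small binary masks follows; the function name and the toy masks are hypothetical, and the study's own evaluation code may differ:

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap of two equally sized binary masks given as nested lists of 0/1."""
    flat_a = [v for row in mask_a for v in row]
    flat_b = [v for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same size")

    intersection = sum(1 for a, b in zip(flat_a, flat_b) if a and b)
    total = sum(1 for a in flat_a if a) + sum(1 for b in flat_b if b)

    # Convention chosen for this sketch: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks, invented for illustration only.
predicted = [[0, 1, 1],
             [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
print(dice_coefficient(predicted, truth))
```

A value of 1 means the masks coincide exactly and 0 means they do not overlap at all, so the reported median of 0.92 indicates close but not perfect agreement with the manual segmentations.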
Affiliation(s)
- Matthieu Ruthven
  - Clinical Physics, Barts Health NHS Trust, West Smithfield, London EC1A 7BE, United Kingdom; School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London SE1 7EH, United Kingdom
- Marc E Miquel
  - Clinical Physics, Barts Health NHS Trust, West Smithfield, London EC1A 7BE, United Kingdom; Centre for Advanced Cardiovascular Imaging, NIHR Barts Biomedical Research Centre, William Harvey Institute, Queen Mary University of London, London EC1M 6BQ, United Kingdom
- Andrew P King
  - School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London SE1 7EH, United Kingdom