1
Roon KD, Chen WR, Iwasaki R, Kang J, Kim B, Shejaeya G, Tiede MK, Whalen DH. Comparison of auto-contouring and hand-contouring of ultrasound images of the tongue surface. Clin Linguist Phon 2022; 36:1112-1131. [PMID: 34974782 PMCID: PMC9250540 DOI: 10.1080/02699206.2021.1998633] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/01/2020] [Revised: 10/13/2021] [Accepted: 10/13/2021] [Indexed: 06/04/2023]
Abstract
Contours traced by trained phoneticians have been considered to be the most accurate way to identify the midsagittal tongue surface from ultrasound video frames. In this study, inter-measurer reliability was evaluated using measures that quantified both how closely human-placed contours approximated each other as well as how consistent measurers were in defining the start and end points of contours. High reliability across three measurers was found for all measures, consistent with treating contours placed by trained phoneticians as the 'gold standard.' However, due to the labour-intensive nature of hand-placing contours, automatic algorithms that detect the tongue surface are increasingly being used to extract tongue-surface data from ultrasound videos. Contours placed by six automatic algorithms (SLURP, EdgeTrak, EPCS, and three different configurations of the algorithm provided in Articulate Assistant Advanced) were compared to human-placed contours, with the same measures used to evaluate the consistency of the trained phoneticians. We found that contours defined by SLURP, EdgeTrak, and two of the AAA configurations closely matched the hand-placed contours along sections of the image where the algorithms and humans agreed that there was a discernible contour. All of the algorithms were much less reliable than humans in determining the anterior (tongue-tip) edge of tongue contours. Overall, the contours produced by SLURP, EdgeTrak, and AAA should be useable in a variety of clinical applications, subject to spot-checking. Additional practical considerations of these algorithms are also discussed.
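The abstract does not name the specific measures used, so the following is only an illustrative sketch of the two kinds of comparison it describes: how closely two contours approximate each other, and how far apart their start and end points lie. The mean nearest-neighbour distance used here is a common choice for comparing traced contours and is an assumption, not necessarily the measure reported in the paper. The function names and the sample coordinates are hypothetical.

```python
import math

def mean_nearest_distance(a, b):
    """Mean distance from each point in contour a to its nearest point in contour b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def symmetric_contour_distance(a, b):
    """Average of the two directed mean nearest-neighbour distances."""
    return 0.5 * (mean_nearest_distance(a, b) + mean_nearest_distance(b, a))

def endpoint_offsets(a, b):
    """Distances between the first points and between the last points of two contours."""
    return math.dist(a[0], b[0]), math.dist(a[-1], b[-1])

# Hypothetical contours as (x, y) points ordered along the tongue surface.
hand = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.5), (3.0, 1.0)]
auto = [(0.0, 0.2), (1.0, 1.2), (2.0, 1.7)]  # stops short at one end

print(symmetric_contour_distance(hand, auto))
print(endpoint_offsets(hand, auto))
```

In this toy example the two contours agree closely where both are defined, while the end-point offset exposes the section that one contour leaves untraced, which is the kind of disagreement the abstract reports at the tongue-tip edge.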
Affiliation(s)
- Kevin D. Roon
  - Program in Speech-Language-Hearing Sciences, CUNY Graduate Center, New York, USA
  - Haskins Laboratories, New Haven, Connecticut, USA
- Rion Iwasaki
  - Program in Speech-Language-Hearing Sciences, CUNY Graduate Center, New York, USA
  - Haskins Laboratories, New Haven, Connecticut, USA
- Jaekoo Kang
  - Program in Speech-Language-Hearing Sciences, CUNY Graduate Center, New York, USA
  - Haskins Laboratories, New Haven, Connecticut, USA
- Boram Kim
  - Haskins Laboratories, New Haven, Connecticut, USA
  - Program in Linguistics, CUNY Graduate Center, New York, USA
- Ghada Shejaeya
  - Program in Speech-Language-Hearing Sciences, CUNY Graduate Center, New York, USA
  - Haskins Laboratories, New Haven, Connecticut, USA
- D. H. Whalen
  - Program in Speech-Language-Hearing Sciences, CUNY Graduate Center, New York, USA
  - Haskins Laboratories, New Haven, Connecticut, USA
  - Department of Linguistics, Yale University, New Haven, Connecticut, USA
2
Zhao Z, Lim Y, Byrd D, Narayanan S, Nayak KS. Improved 3D real-time MRI of speech production. Magn Reson Med 2021; 85:3182-3195. [PMID: 33452722 DOI: 10.1002/mrm.28651] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/27/2020] [Revised: 10/29/2020] [Accepted: 11/26/2020] [Indexed: 01/21/2023]
Abstract
PURPOSE To provide 3D real-time MRI of speech production with improved spatio-temporal sharpness using randomized, variable-density, stack-of-spiral sampling combined with a 3D spatio-temporally constrained reconstruction. METHODS We evaluated five candidate (k, t) sampling strategies using a previously proposed gradient-echo stack-of-spiral sequence and a 3D constrained reconstruction with spatial and temporal penalties. Regularization parameters were chosen by expert readers based on qualitative assessment. We experimentally determined the effect of spiral angle increment and kz temporal order. The strategy yielding highest image quality was chosen as the proposed method. We evaluated the proposed and original 3D real-time MRI methods in 2 healthy subjects performing speech production tasks that invoke rapid movements of articulators seen in multiple planes, using interleaved 2D real-time MRI as the reference. We quantitatively evaluated tongue boundary sharpness in three locations at two speech rates. RESULTS The proposed data-sampling scheme uses a golden-angle spiral increment in the kx-ky plane and variable-density, randomized encoding along kz. It provided a statistically significant improvement in tongue boundary sharpness score (P < .001) in the blade, body, and root of the tongue during normal and 1.5-times speeded speech. Qualitative improvements were substantial during natural speech tasks of alternating high, low tongue postures during vowels. The proposed method was also able to capture complex tongue shapes during fast alveolar consonant segments. Furthermore, the proposed scheme allows flexible retrospective selection of temporal resolution. CONCLUSION We have demonstrated improved 3D real-time MRI of speech production using randomized, variable-density, stack-of-spiral sampling with a 3D spatio-temporally constrained reconstruction.
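As a rough illustration of the sampling idea named in the results, the sketch below generates golden-angle rotation angles for successive spiral arms and a randomized, centre-weighted order of kz encodes. The abstract gives no numerical details, so everything specific here is an assumption: the golden angle is taken as 2π(1 − 1/φ) ≈ 137.51° (some MRI work uses 180°/φ ≈ 111.25° instead), the Gaussian density along kz is a stand-in for whatever variable-density profile the authors used, and the function names and parameters are hypothetical.

```python
import math
import random

PHI = (1 + math.sqrt(5)) / 2                  # golden ratio
GOLDEN_ANGLE = 2 * math.pi * (1 - 1 / PHI)    # about 137.51 degrees, in radians

def spiral_rotation_angles(n_arms):
    """Rotation angle (radians) of each successive spiral arm in the kx-ky plane."""
    return [(i * GOLDEN_ANGLE) % (2 * math.pi) for i in range(n_arms)]

def random_kz_order(n_kz, n_samples, seed=0, width=0.25):
    """Randomized kz indices, drawn more often near the centre of k-space.

    The Gaussian weighting is an assumed variable-density profile.
    """
    rng = random.Random(seed)
    centre = (n_kz - 1) / 2
    weights = [math.exp(-((k - centre) / (width * n_kz)) ** 2) for k in range(n_kz)]
    return rng.choices(range(n_kz), weights=weights, k=n_samples)

angles = spiral_rotation_angles(8)
kz_indices = random_kz_order(n_kz=12, n_samples=8)
for angle, kz in zip(angles, kz_indices):
    print(f"kz={kz:2d}  rotation={math.degrees(angle):7.2f} deg")
```

Because successive golden-angle rotations never repeat, any contiguous window of arms covers the kx-ky plane fairly evenly, which is one reason such schemes are compatible with choosing the temporal resolution retrospectively.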
Affiliation(s)
- Ziwei Zhao
  - Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
- Yongwan Lim
  - Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
- Dani Byrd
  - Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, CA, USA
- Shrikanth Narayanan
  - Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
  - Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, CA, USA
- Krishna S Nayak
  - Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
3
Heyne M, Derrick D, Al-Tamimi J. Native Language Influence on Brass Instrument Performance: An Application of Generalized Additive Mixed Models (GAMMs) to Midsagittal Ultrasound Images of the Tongue. Front Psychol 2019; 10:2597. [PMID: 31827453 PMCID: PMC6890863 DOI: 10.3389/fpsyg.2019.02597] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/03/2019] [Accepted: 11/01/2019] [Indexed: 02/06/2023]
Abstract
This paper presents the findings of an ultrasound study of 10 New Zealand English and 10 Tongan-speaking trombone players, to determine whether there is an influence of native language speech production on trombone performance. Trombone players' midsagittal tongue shapes were recorded while reading wordlists and during sustained note productions, and tongue surface contours traced. After normalizing to account for differences in vocal tract shape and ultrasound transducer orientation, we used generalized additive mixed models (GAMMs) to estimate average tongue surface shapes used by the players from the two language groups when producing notes at different pitches and intensities, and during the production of the monophthongs in their native languages. The average midsagittal tongue contours predicted by our models show a statistically robust difference at the back of the tongue distinguishing the two groups, where the New Zealand English players display an overall more retracted tongue position; however, tongue shape during playing does not directly map onto vowel tongue shapes as prescribed by the pedagogical literature. While the New Zealand English-speaking participants employed a playing tongue shape approximating schwa and the vowel used in the word 'lot,' the Tongan participants used a tongue shape loosely patterning with the back vowels /o/ and /u/. We argue that these findings represent evidence for native language influence on brass instrument performance; however, this influence seems to be secondary to more basic constraints of brass playing related to airflow requirements and acoustical considerations, with the vocal tract configurations observed across both groups satisfying these conditions in different ways. 
Our findings furthermore provide evidence for the functional independence of various sections of the tongue and indicate that speech production, itself an acquired motor skill, can influence another skilled behavior via motor memory of vocal tract gestures forming the basis of local optimization processes to arrive at a suitable tongue shape for sustained note production.
Affiliation(s)
- Matthias Heyne
  - Speech Laboratory, Department of Speech, Language & Hearing Sciences, College of Health & Rehabilitation Sciences: Sargent College, Boston University, Boston, MA, United States
  - New Zealand Institute of Language Brain and Behaviour, University of Canterbury, Christchurch, New Zealand
- Donald Derrick
  - New Zealand Institute of Language Brain and Behaviour, University of Canterbury, Christchurch, New Zealand
- Jalal Al-Tamimi
  - Speech and Language Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
4
Abstract
Purpose
Speech production is a complex 3-dimensional (3D) process, and yet most of what is known about it is derived from 2D midsagittal data. The relatively recent development of safe 3D imaging technologies (including magnetic resonance imaging and ultrasound) provides new opportunities to revisit and reformulate what is already known and to push the boundaries of current knowledge still further. A particularly useful imaging modality for this purpose is 3D/4D ultrasound, which until very recently was not well suited for studies in speech research. This technical report presents an overview of what 3D/4D ultrasound can contribute to speech research, with a focus on 2 demonstrations.
Conclusion
The 1st demonstration illustrates how 3D/4D ultrasound makes it possible to image certain vocal tract anatomical structures and planes that conventional 2D ultrasound is not capable of imaging. The 2nd demonstration illustrates how 3D/4D ultrasound can be combined with static 3D magnetic resonance imaging to provide new insight into the temporal pervasiveness and spatial extensiveness of lateral contact between the tongue and palate–teeth during speech production.
Affiliation(s)
- Steven M. Lulich
  - Department of Speech & Hearing Sciences, Indiana University, Bloomington
- William G. Pearson
  - Department of Cellular Biology and Anatomy, Medical College of Georgia, Augusta
5
Bressmann T, Harper S, Zhylich I, Kulkarni GV. Perceptual, durational and tongue displacement measures following articulation therapy for rhotic sound errors. Clin Linguist Phon 2016; 30:345-362. [PMID: 26979162 DOI: 10.3109/02699206.2016.1140227] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 06/05/2023]
Abstract
Outcomes of articulation therapy for rhotic errors are usually assessed perceptually. However, our understanding of associated changes of tongue movement is limited. This study described perceptual, durational and tongue displacement changes over 10 sessions of articulation therapy for /ɹ/ in six children. Four of the participants also received ultrasound biofeedback of their tongue shape. Speech and tongue movement were recorded pre-therapy, after 5 sessions, in the final session and at a one month follow-up. Perceptually, listeners perceived improvement and classified more productions as /ɹ/ in the final and follow-up assessments. The durations of VɹV syllables at the midway point of the therapy were longer. Cumulative tongue displacement increased in the final session. The average standard deviation was significantly higher in the middle and final assessments. The duration and tongue displacement measures illustrated how articulation therapy affected tongue movement and may be useful for outcomes research about articulation therapy.
Affiliation(s)
- Tim Bressmann
  - Department of Speech-Language Pathology, University of Toronto, Toronto, ON, Canada
  - Faculty of Dentistry, University of Toronto, Toronto, ON, Canada
- Susan Harper
  - Department of Speech-Language Pathology, University of Toronto, Toronto, ON, Canada
- Irina Zhylich
  - Department of Speech-Language Pathology, University of Toronto, Toronto, ON, Canada
6
Heyne M, Derrick D. Using a radial ultrasound probe's virtual origin to compute midsagittal smoothing splines in polar coordinates. J Acoust Soc Am 2015; 138:EL509-EL514. [PMID: 26723359 DOI: 10.1121/1.4937168] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
Tongue surface measurements from midsagittal ultrasound scans are effectively arcs with deviations representing tongue shape, but smoothing-spline analyses of variance (SSANOVAs) assume variance around a horizontal line. Therefore, calculating SSANOVA average curves of tongue traces in Cartesian coordinates [Davidson, J. Acoust. Soc. Am. 120(1), 407-415 (2006)] creates errors that are compounded at the tongue tip and root, where average tongue shape deviates most from a horizontal line. This paper introduces a method for transforming data into polar coordinates similar to the technique by Mielke [J. Acoust. Soc. Am. 137(5), 2858-2869 (2015)], but using the virtual origin of a radial ultrasound transducer as the polar origin, allowing data conversion in a manner that is robust against between-subject and between-session variability.
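The coordinate change described above can be sketched as follows. This is a minimal illustration of converting contour points to polar coordinates about a chosen origin and back; it is not the authors' implementation, and it leaves out the spline fitting itself. The function names and the origin coordinates are hypothetical. In practice the virtual origin would come from the geometry of the radial probe, which the abstract does not detail.

```python
import math

def to_polar(points, origin):
    """Convert (x, y) points to (theta, r) about the given origin (theta in radians)."""
    ox, oy = origin
    return [(math.atan2(y - oy, x - ox), math.hypot(x - ox, y - oy)) for x, y in points]

def to_cartesian(polar_points, origin):
    """Convert (theta, r) points back to (x, y) relative to the same origin."""
    ox, oy = origin
    return [(ox + r * math.cos(t), oy + r * math.sin(t)) for t, r in polar_points]

# Hypothetical virtual origin below the image and a roughly arc-shaped tongue trace.
virtual_origin = (0.0, -20.0)
trace = [(-30.0, 10.0), (-15.0, 25.0), (0.0, 30.0), (15.0, 25.0), (30.0, 10.0)]

polar_trace = to_polar(trace, virtual_origin)
for theta, r in polar_trace:
    print(f"theta={math.degrees(theta):7.2f} deg  r={r:6.2f}")
```

After the conversion, an arc-shaped trace becomes a function of angle with a radius that varies comparatively little, so a smoother that models variation around a flat line is a better fit to the data than it is in Cartesian coordinates.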
Affiliation(s)
- Matthias Heyne
  - Department of Linguistics, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
- Donald Derrick
  - New Zealand Institute of Language, Brain and Behaviour, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
7
Zharkova N, Hewlett N, Hardcastle WJ, Lickley RJ. Spatial and temporal lingual coarticulation and motor control in preadolescents. J Speech Lang Hear Res 2014; 57:374-388. [PMID: 24686467 DOI: 10.1044/2014_jslhr-s-11-0350] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/03/2023]
Abstract
PURPOSE In this study, the authors compared coarticulation and lingual kinematics in preadolescents and adults in order to establish whether preadolescents had a greater degree of random variability in tongue posture and whether their patterns of lingual coarticulation differed from those of adults. METHOD High-speed ultrasound tongue contour data synchronized with the acoustic signal were recorded from 15 children (ages 10-12 years) and 15 adults. Tongue shape contours were analyzed at 9 normalized time points during the fricative phase of schwa-fricative-/a/ and schwa-fricative-/i/ sequences with the consonants /s/ and /ʃ/. RESULTS There was no significant age-related difference in random variability. Where a significant vowel effect occurred, the amount of coarticulation was similar in the 2 groups. However, the onset of the coarticulatory effect on preadolescent /ʃ/ was significantly later than on preadolescent /s/, and also later than on adult /s/ and /ʃ/. CONCLUSIONS Preadolescents have adult-like precision of tongue control and adult-like anticipatory lingual coarticulation with respect to spatial characteristics of tongue posture. However, there remains some immaturity in the motor programming of certain complex tongue movements.
8
Zharkova N. A normative-speaker validation study of two indices developed to quantify tongue dorsum activity from midsagittal tongue shapes. Clin Linguist Phon 2013; 27:484-496. [PMID: 23651147 DOI: 10.3109/02699206.2013.778903] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/02/2023]
Abstract
This study reported adult scores on two measures of tongue shape, based on midsagittal tongue shape data from ultrasound imaging. One of the measures quantified the extent of tongue dorsum excursion, and the other measure represented the place of maximal excursion. Data from six adult speakers of Scottish Standard English without speech disorders were analyzed. The stimuli included a range of consonants in consonant-vowel sequences, with the vowels /a/ and /i/. The measures reliably distinguished between articulations with and without tongue dorsum excursion, and produced robust results on lingual coarticulation of the consonants. The reported data can be used as a starting point for collecting more typical data and for analyzing disordered speech. The measurements do not require head-to-transducer stabilization. Possible applications of the measures include studying tongue dorsum overuse in people with cleft palate, and typical and disordered development of coarticulation.
Affiliation(s)
- Natalia Zharkova
  - Clinical Audiology, Speech and Language Research Centre, Queen Margaret University, Edinburgh, UK