1
Shahid MS, French AP, Valstar MF, Yakubov GE. Research in methodologies for modelling the oral cavity. Biomed Phys Eng Express 2024; 10:032001. [PMID: 38350128] [DOI: 10.1088/2057-1976/ad28cc]
Abstract
The paper aims to explore the current state of understanding surrounding in silico oral modelling. This involves exploring methodologies, technologies and approaches pertaining to the modelling of the whole oral cavity: both internally and externally visible structures that may be relevant or appropriate to oral actions. Such a model could be referred to as a 'complete model', which includes consideration of a full set of facial features (i.e. not only the mouth) as well as synergistic stimuli such as audio and facial thermal data. 3D modelling technologies capable of accurately and efficiently capturing a complete representation of the mouth for an individual have broad applications in the study of oral actions, due to their cost-effectiveness and time efficiency. This review delves into the field of clinical phonetics to classify oral actions pertaining to both speech and non-speech movements, identifying how the various vocal organs play a role in the articulatory and masticatory processes. Vitally, it provides a summation of 12 articulatory recording methods, forming a tool researchers can use to identify which method of recording is appropriate for their work. After addressing the cost and resource-intensive limitations of existing methods, a new system of modelling is proposed that leverages external-to-internal correlation modelling techniques to create more efficient models of the oral cavity. The vision is that the outcomes will be applicable to a broad spectrum of oral functions related to physiology, health and wellbeing, including speech, oral processing of foods and dental health. Applications may range from speech correction to designing foods for the ageing population, whilst in the dental field information about patients' oral actions could become part of creating a personalised dental treatment plan.
Affiliation(s)
- Andrew P French
- School of Computer Science, University of Nottingham, NG8 1BB, United Kingdom
- School of Biosciences, University of Nottingham, LE12 5RD, United Kingdom
- Michel F Valstar
- School of Computer Science, University of Nottingham, NG8 1BB, United Kingdom
- Gleb E Yakubov
- School of Biosciences, University of Nottingham, LE12 5RD, United Kingdom
2
Ruthven M, Peplinski AM, Adams DM, King AP, Miquel ME. Real-time speech MRI datasets with corresponding articulator ground-truth segmentations. Sci Data 2023; 10:860. [PMID: 38042857] [PMCID: PMC10693552] [DOI: 10.1038/s41597-023-02766-z]
Abstract
The use of real-time magnetic resonance imaging (rt-MRI) of speech is increasing in clinical practice and speech science research. Analysis of such images often requires segmentation of articulators and the vocal tract, and the community is turning to deep-learning-based methods to perform this segmentation. While there are publicly available rt-MRI datasets of speech, these do not include ground-truth (GT) segmentations, a key requirement for the development of deep-learning-based segmentation methods. To begin to address this barrier, this work presents rt-MRI speech datasets of five healthy adult volunteers with corresponding GT segmentations and velopharyngeal closure patterns. The images were acquired using standard clinical MRI scanners, coils and sequences to facilitate acquisition of similar images in other centres. The datasets include manually created GT segmentations of six anatomical features including the tongue, soft palate and vocal tract. In addition, this work makes code and instructions to implement a current state-of-the-art deep-learning-based method to segment rt-MRI speech datasets publicly available, thus providing the community and others with a starting point for developing such methods.
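Ground-truth segmentations such as these are typically used to evaluate deep-learning models via the Dice overlap score. A minimal numpy sketch (illustrative only, not code from the paper; masks are toy data):

```python
import numpy as np

def dice_coefficient(pred, gt):
    """Dice overlap between two binary masks (1.0 = perfect agreement)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, gt).sum() / denom

# Toy example: a predicted tongue mask vs. a ground-truth mask
gt = np.zeros((6, 6), dtype=bool)
gt[2:5, 2:5] = True          # 9 ground-truth pixels
pred = np.zeros((6, 6), dtype=bool)
pred[2:5, 2:4] = True        # 6 predicted pixels, all inside gt
score = dice_coefficient(pred, gt)  # 2*6 / (6+9) = 0.8
```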
Affiliation(s)
- Matthieu Ruthven
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
- School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London, SE1 7EH, UK
- David M Adams
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
- Andrew P King
- School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London, SE1 7EH, UK
- Marc Eric Miquel
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
- Digital Environment Research Institute (DERI), Empire House, 67-75 New Road, Queen Mary University of London, London, E1 1HH, UK
- Advanced Cardiovascular Imaging, Barts NIHR BRC, Queen Mary University of London, London, EC1M 6BQ, UK
3
Serrurier A, Neuschaefer-Rube C. Morphological and acoustic modeling of the vocal tract. J Acoust Soc Am 2023; 153:1867. [PMID: 37002095] [DOI: 10.1121/10.0017356]
Abstract
In speech production, the anatomical morphology forms the substrate on which the speakers build their articulatory strategy to reach specific articulatory-acoustic goals. The aim of this study is to characterize morphological inter-speaker variability by building a shape model of the full vocal tract including hard and soft structures. Static magnetic resonance imaging data from 41 speakers articulating altogether 1947 phonemes were considered, and the midsagittal articulator contours were manually outlined. A phoneme-independent average-articulation representative of morphology was calculated as the speaker mean articulation. A principal component analysis-driven shape model was derived from average-articulations, leading to five morphological components, which explained 87% of the variance. Almost three-quarters of the variance was related to independent variations of the horizontal oral and vertical pharyngeal lengths, the latter capturing male-female differences. The three additional components captured shape variations related to head tilt and palate shape. Plane wave propagation acoustic simulations were run to characterize morphological components. A lengthening of 1 cm of the vocal tract in the vertical or horizontal directions led to a decrease in formant values of 7%-8%. Further analyses are required to analyze three-dimensional variability and to understand the morphological-acoustic relationships per phoneme. Average-articulations and model code are publicly available (https://github.com/tonioser/VTMorphologicalModel).
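A PCA-driven shape model of this kind can be sketched with plain numpy SVD. The data below are synthetic stand-ins for the 41 speaker-mean contour vectors (the study's actual inputs are manually outlined midsagittal contours); all names and dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 41 "speakers", each an average midsagittal contour
# flattened to a coordinate vector (real data would be outlined contours).
n_speakers, n_coords = 41, 100
latent = rng.normal(size=(n_speakers, 5))            # 5 underlying factors
basis = rng.normal(size=(5, n_coords))
contours = latent @ basis + 0.1 * rng.normal(size=(n_speakers, n_coords))

# PCA via SVD of the mean-centred data matrix
X = contours - contours.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
explained = s**2 / np.sum(s**2)          # variance explained per component

# With 5 true factors, the first five components capture most variance
top5 = explained[:5].sum()

# Reconstruct all speakers from the first 5 components only
coeffs = X @ Vt[:5].T
recon = coeffs @ Vt[:5] + contours.mean(axis=0)
```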
Affiliation(s)
- Antoine Serrurier
- Clinic for Phoniatrics, Pedaudiology, and Communication Disorders, University Hospital and Medical Faculty of the RWTH Aachen University, 52057 Aachen, Germany
- Christiane Neuschaefer-Rube
- Clinic for Phoniatrics, Pedaudiology, and Communication Disorders, University Hospital and Medical Faculty of the RWTH Aachen University, 52057 Aachen, Germany
4
Al-hammuri K, Gebali F, Thirumarai Chelvan I, Kanan A. Tongue Contour Tracking and Segmentation in Lingual Ultrasound for Speech Recognition: A Review. Diagnostics (Basel) 2022; 12:2811. [PMID: 36428870] [PMCID: PMC9689563] [DOI: 10.3390/diagnostics12112811]
Abstract
Lingual ultrasound imaging is essential in linguistic research and speech recognition. It has been used widely in different applications as visual feedback to enhance language learning for non-native speakers, study speech-related disorders and remediation, articulation research and analysis, swallowing study, tongue 3D modelling, and silent speech interface. This article provides a comparative analysis and review based on quantitative and qualitative criteria of the two main streams of tongue contour segmentation from ultrasound images. The first stream utilizes traditional computer vision and image processing algorithms for tongue segmentation. The second stream uses machine and deep learning algorithms for tongue segmentation. The results show that tongue tracking using machine learning-based techniques is superior to traditional techniques, considering the performance and algorithm generalization ability. Meanwhile, traditional techniques are helpful for implementing interactive image segmentation to extract valuable features during training and postprocessing. We recommend using a hybrid approach to combine machine learning and traditional techniques to implement a real-time tongue segmentation tool.
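To illustrate the "traditional" image-processing stream, here is a deliberately simplified per-column ridge extractor on a synthetic frame. This is a toy sketch, not a method from the review; real ultrasound segmentation must contend with speckle noise and contour discontinuities:

```python
import numpy as np

def extract_contour(frame, min_intensity=0.5):
    """Toy 'traditional' tongue-contour extractor: for each image column,
    take the row of the brightest pixel, discarding columns with no
    sufficiently bright ridge. Returns (columns, rows) of the contour."""
    rows = np.argmax(frame, axis=0)
    peak = frame.max(axis=0)
    cols = np.nonzero(peak >= min_intensity)[0]
    return cols, rows[cols]

# Synthetic frame: a bright arc roughly where a tongue surface would be
h, w = 64, 128
frame = np.zeros((h, w))
cols = np.arange(w)
arc = (32 + 10 * np.sin(cols / w * np.pi)).astype(int)
frame[arc, cols] = 1.0

cs, rs = extract_contour(frame)  # recovers the synthetic ridge
```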
Affiliation(s)
- Khalid Al-hammuri
- Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC V8W 2Y2, Canada
- Fayez Gebali
- Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC V8W 2Y2, Canada
- Awos Kanan
- Department of Computer Engineering, Princess Sumaya University for Technology, Amman 11941, Jordan
5
3D Dynamic Spatiotemporal Atlas of the Vocal Tract during Consonant–Vowel Production from 2D Real Time MRI. J Imaging 2022; 8:227. [PMID: 36135393] [PMCID: PMC9504642] [DOI: 10.3390/jimaging8090227]
Abstract
In this work, we address the problem of creating a 3D dynamic atlas of the vocal tract that captures the dynamics of the articulators in all three dimensions, in order to create a global speaker model independent of speaker-specific characteristics. The core steps of the proposed method are the temporal alignment of the real-time MR images acquired in several sagittal planes and their combination with adaptive kernel regression. As a preprocessing step, a reference space was created and used to remove anatomical information of the speakers, keeping only the variability in speech production for the construction of the atlas. The adaptive kernel regression makes the choice of atlas time points independent of the time points of the input frames. The atlas construction method was evaluated by mapping two new speakers to the atlas and checking how similar the resulting mapped images are. The use of the atlas helps in reducing subject variability. The results show that the proposed atlas can capture the dynamic behavior of the articulators and is able to generalize the speech production process by creating a universal-speaker reference space.
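The kernel-regression step can be illustrated with a basic Nadaraya-Watson estimator on toy one-pixel "frames". This is an assumption-laden sketch (fixed Gaussian bandwidth, synthetic data); the paper's regression is adaptive and operates on registered image volumes:

```python
import numpy as np

def kernel_regress(t_out, t_frames, frames, bandwidth=0.05):
    """Nadaraya-Watson kernel regression: estimate an atlas frame at each
    requested time t_out as a Gaussian-weighted average of input frames,
    so atlas time points need not coincide with acquired frame times."""
    t_out = np.asarray(t_out)[:, None]          # (T_out, 1)
    t_frames = np.asarray(t_frames)[None, :]    # (1, T_in)
    w = np.exp(-0.5 * ((t_out - t_frames) / bandwidth) ** 2)
    w /= w.sum(axis=1, keepdims=True)           # normalise weights per output time
    return w @ frames                           # (T_out, n_pixels)

# Toy 1-pixel "frames" following a smooth trajectory, sampled irregularly
t_in = np.sort(np.random.default_rng(1).uniform(0, 1, 50))
frames = np.sin(2 * np.pi * t_in)[:, None]
t_atlas = np.linspace(0.1, 0.9, 9)
atlas = kernel_regress(t_atlas, t_in, frames)   # smooth estimates at atlas times
```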
6
Lu Y, Wiltshire CEE, Watkins KE, Chiew M, Goldstein L. Characteristics of articulatory gestures in stuttered speech: A case study using real-time magnetic resonance imaging. J Commun Disord 2022; 97:106213. [PMID: 35397388] [DOI: 10.1016/j.jcomdis.2022.106213]
Abstract
INTRODUCTION Most of the previous articulatory studies of stuttering have focussed on the fluent speech of people who stutter. However, to better understand what causes the actual moments of stuttering, it is necessary to probe articulatory behaviors during stuttered speech. We examined the supralaryngeal articulatory characteristics of stuttered speech using real-time structural magnetic resonance imaging (RT-MRI). We investigated how articulatory gestures differ across stuttered and fluent speech of the same speaker. METHODS Vocal tract movements of an adult man who stutters during a pseudoword reading task were recorded using RT-MRI. Four regions of interest (ROIs) were defined on RT-MRI image sequences around the lips, tongue tip, tongue body, and velum. The variation of pixel intensity in each ROI over time provided an estimate of the movement of these four articulators. RESULTS All disfluencies occurred on syllable-initial consonants. Three articulatory patterns were identified. Pattern 1 showed smooth gestural formation and release like fluent speech. Patterns 2 and 3 showed delayed release of gestures due to articulator fixation or oscillation respectively. Block and prolongation corresponded to either pattern 1 or 2. Repetition corresponded to pattern 3 or a mix of patterns. Gestures for disfluent consonants typically exhibited a greater constriction than fluent gestures, which was rarely corrected during disfluencies. Gestures for the upcoming vowel were initiated and executed during these consonant disfluencies, achieving a tongue body position similar to the fluent counterpart. CONCLUSION Different perceptual types of disfluencies did not necessarily result from distinct articulatory patterns, highlighting the importance of collecting articulatory data of stuttering. Disfluencies on syllable-initial consonants were related to the delayed release and the overshoot of consonant gestures, rather than the delayed initiation of vowel gestures. This suggests that stuttering does not arise from problems with planning the vowel gestures, but rather with releasing the overly constricted consonant gestures.
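The ROI pixel-intensity approach described in the Methods can be sketched in a few lines. The data, ROI coordinates, and names below are toy/hypothetical; the point is only that mean intensity per region per frame yields a movement trace for each articulator:

```python
import numpy as np

def roi_traces(frames, rois):
    """Mean pixel intensity per ROI per frame; its variation over time gives
    a simple estimate of articulator movement into/out of each region."""
    return {name: frames[:, r0:r1, c0:c1].mean(axis=(1, 2))
            for name, (r0, r1, c0, c1) in rois.items()}

# Toy image sequence: a bright 'articulator' patch moves rightwards
frames = np.zeros((3, 10, 30))
for t, c in enumerate((0, 10, 20)):
    frames[t, :, c:c + 10] = 1.0

rois = {"lips": (0, 10, 0, 10),
        "tongue_tip": (0, 10, 10, 20),
        "velum": (0, 10, 20, 30)}
traces = roi_traces(frames, rois)
# traces["lips"] is [1., 0., 0.]: the patch occupies the lips ROI only at t=0
```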
Affiliation(s)
- Yijing Lu
- Department of Linguistics, University of Southern California, United States
- Charlotte E E Wiltshire
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, United Kingdom
- Kate E Watkins
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, United Kingdom
- Mark Chiew
- Wellcome Centre for Integrative Neuroimaging, Nuffield Department of Clinical Neurosciences, University of Oxford, United Kingdom
- Louis Goldstein
- Department of Linguistics, University of Southern California, United States
7
Nayak KS, Lim Y, Campbell-Washburn AE, Steeden J. Real-Time Magnetic Resonance Imaging. J Magn Reson Imaging 2022; 55:81-99. [PMID: 33295674] [PMCID: PMC8435094] [DOI: 10.1002/jmri.27411]
Abstract
Real-time magnetic resonance imaging (RT-MRI) allows for imaging dynamic processes as they occur, without relying on any repetition or synchronization. This is made possible by modern MRI technology such as fast-switching gradients and parallel imaging. It is compatible with many (but not all) MRI sequences, including spoiled gradient echo, balanced steady-state free precession, and single-shot rapid acquisition with relaxation enhancement. RT-MRI has earned an important role in both diagnostic imaging and image guidance of invasive procedures. Its unique diagnostic value is prominent in areas of the body that undergo substantial and often irregular motion, such as the heart, gastrointestinal system, upper airway vocal tract, and joints. Its value in interventional procedure guidance is prominent for procedures that require multiple forms of soft-tissue contrast, as well as flow information. In this review, we discuss the history of RT-MRI, fundamental tradeoffs, enabling technology, established applications, and current trends. LEVEL OF EVIDENCE: 5 TECHNICAL EFFICACY STAGE: 1.
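As a back-of-the-envelope illustration of the fundamental frame-rate tradeoff the review discusses: without gating or repetition, the achievable frame time is set by the repetition time and the number of readouts combined per image. The numbers below are hypothetical, not taken from the review:

```python
# Illustrative (hypothetical) parameters for an RT-MRI sequence
TR_ms = 6.0           # repetition time per spiral interleaf
interleaves = 13      # interleaves combined into one image

frame_time_ms = TR_ms * interleaves   # 78 ms per fully sampled frame
fps = 1000.0 / frame_time_ms          # ~12.8 frames/s
# Sliding-window reconstruction can display at a higher apparent rate,
# but the true temporal footprint of each frame remains frame_time_ms.
```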
Affiliation(s)
- Krishna S. Nayak
- Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California, USA
- Address reprint requests to: K.S.N., 3740 McClintock Ave, EEB 400C, Los Angeles, CA 90089-2564, USA
- Yongwan Lim
- Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California, USA
- Adrienne E. Campbell-Washburn
- Cardiovascular Branch, Division of Intramural Research, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA
- Jennifer Steeden
- Institute of Cardiovascular Science, Centre for Cardiovascular Imaging, University College London, London, UK
8
Xing F, Jin R, Gilbert IR, Perry JL, Sutton BP, Liu X, El Fakhri G, Shosted RK, Woo J. 4D magnetic resonance imaging atlas construction using temporally aligned audio waveforms in speech. J Acoust Soc Am 2021; 150:3500. [PMID: 34852570] [PMCID: PMC8580575] [DOI: 10.1121/10.0007064]
Abstract
Magnetic resonance (MR) imaging is becoming an established tool in capturing articulatory and physiological motion of the structures and muscles throughout the vocal tract and enabling visual and quantitative assessment of real-time speech activities. Although motion capture speed has been regularly improved by the continual developments in high-speed MR technology, quantitative analysis of multi-subject group data remains challenging due to variations in speaking rate and imaging time among different subjects. In this paper, a workflow of post-processing methods that matches different MR image datasets within a study group is proposed. Each subject's recorded audio waveform during speech is used to extract temporal domain information and generate temporal alignment mappings from their matching pattern. The corresponding image data are resampled by deformable registration and interpolation of the deformation fields, achieving inter-subject temporal alignment between image sequences. A four-dimensional dynamic MR speech atlas is constructed using aligned volumes from four human subjects. Similarity tests between subject and target domains using the squared error, cross correlation, and mutual information measures all show an overall score increase after spatiotemporal alignment. The amount of image variability in atlas construction is reduced, indicating a quality increase in the multi-subject data for groupwise quantitative analysis.
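Audio-based temporal alignment of this kind is in the spirit of dynamic time warping: match two recordings' waveform features, then use the resulting mapping to resample one image sequence onto the other's timeline. A minimal DTW sketch on 1-D sequences (a stand-in, not the paper's matching algorithm):

```python
import numpy as np

def dtw_path(a, b):
    """Classic dynamic time warping between two 1-D sequences; returns the
    warping path as (index_into_a, index_into_b) pairs."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(a[i - 1] - b[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from (n, m) to (1, 1)
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# The same 'audio envelope' spoken at two rates aligns along a monotone path,
# which could then drive resampling of the corresponding image frames.
fast = np.sin(np.linspace(0, np.pi, 10))
slow = np.sin(np.linspace(0, np.pi, 20))
path = dtw_path(fast, slow)
```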
Affiliation(s)
- Fangxu Xing
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Boston, Massachusetts 02114, USA
- Riwei Jin
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Champaign, Illinois 61801, USA
- Imani R Gilbert
- Department of Communication Sciences and Disorders, East Carolina University, Greenville, North Carolina 27858, USA
- Jamie L Perry
- Department of Communication Sciences and Disorders, East Carolina University, Greenville, North Carolina 27858, USA
- Bradley P Sutton
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Champaign, Illinois 61801, USA
- Xiaofeng Liu
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Boston, Massachusetts 02114, USA
- Georges El Fakhri
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Boston, Massachusetts 02114, USA
- Ryan K Shosted
- Department of Linguistics, University of Illinois at Urbana-Champaign, Champaign, Illinois 61801, USA
- Jonghye Woo
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Boston, Massachusetts 02114, USA
9
Real-time magnetic resonance imaging: mechanics of oral and facial function. Br J Oral Maxillofac Surg 2021; 60:596-603. [DOI: 10.1016/j.bjoms.2021.10.008]
10
Boustani M, Lunn S, Visser U, Lisetti C. Development, Feasibility, Acceptability, and Utility of an Expressive Speech-Enabled Digital Health Agent to Deliver Online, Brief Motivational Interviewing for Alcohol Misuse: Descriptive Study. J Med Internet Res 2021; 23:e25837. [PMID: 34586074] [PMCID: PMC8515230] [DOI: 10.2196/25837]
Abstract
Background Digital health agents — embodied conversational agents designed specifically for health interventions — provide a promising alternative or supplement to behavioral health services by reducing barriers to access to care. Objective Our goals were to (1) develop an expressive, speech-enabled digital health agent operating in a 3-dimensional virtual environment to deliver a brief behavioral health intervention over the internet to reduce alcohol use and to (2) understand its acceptability, feasibility, and utility with its end users. Methods We developed an expressive, speech-enabled digital health agent with facial expressions and body gestures operating in a 3-dimensional virtual office and able to deliver a brief behavioral health intervention over the internet to reduce alcohol use. We then asked 51 alcohol users to report on the digital health agent acceptability, feasibility, and utility. Results The developed digital health agent uses speech recognition and a model of empathetic verbal and nonverbal behaviors to engage the user, and its performance enabled it to successfully deliver a brief behavioral health intervention over the internet to reduce alcohol use. Descriptive statistics indicated that participants had overwhelmingly positive experiences with the digital health agent, including engagement with the technology, acceptance, perceived utility, and intent to use the technology. Illustrative qualitative quotes provided further insight about the potential reach and impact of digital health agents in behavioral health care. Conclusions Web-delivered interventions delivered by expressive, speech-enabled digital health agents may provide an exciting complement or alternative to traditional one-on-one treatment. They may be especially helpful for hard-to-reach communities with behavioral workforce shortages.
Affiliation(s)
- Maya Boustani
- Department of Psychology, Loma Linda University, Loma Linda, CA, United States
- Stephanie Lunn
- Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL, United States
- Ubbo Visser
- Department of Computer Science, University of Miami, Miami, FL, United States
- Christine Lisetti
- Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL, United States
11
Nakao Y, Uchiyama Y, Honda K, Hasegawa Y, Nanto T, Jomoto W, Domen K. Tongue pressure waveform analysis for ascertaining the influence of tongue muscle composition on articulation. J Oral Rehabil 2021; 48:1347-1353. [PMID: 34491591] [DOI: 10.1111/joor.13255]
Abstract
BACKGROUND Rate of force development is associated with performance and muscle composition in whole-body muscles. Although the rate of force development of the tongue can be examined using the tongue pressure waveform, there have been only a few investigations on this topic. OBJECTIVES This study's main purpose was to investigate the reliability of tongue pressure waveform analysis and its relationship with articulation and tongue muscle composition. In addition, we also investigated the association between tongue muscle composition and articulation. METHODS Forty-five community-dwelling individuals aged >20 years participated. We analysed the tongue pressure waveform, including maximum tongue pressure (MTP), time to peak, mean rate of tongue force development and peak rate of tongue force development (PRTFD). We also assessed oral diadochokinesis. Magnetic resonance imaging of the tongue provided data on tongue muscle composition, including tongue volume, fat mass, lean muscle mass and fat percentage. We evaluated the reliability of tongue pressure waveform analysis and examined the correlations between the waveform measures, oral diadochokinesis and tongue composition. RESULTS We detected a high reliability of MTP and PRTFD. MTP and PRTFD were significantly correlated with tongue muscle composition. MTP was not significantly correlated with oral diadochokinesis, whereas PRTFD was significantly positively correlated with it. Tongue fat mass and fat percentage were negatively correlated with oral diadochokinesis of /ta/ and /ka/. CONCLUSIONS Peak rate of tongue force development is a highly reliable measure for tongue pressure analysis and is useful for elucidating the functional importance of tongue muscle function in articulation. We speculate that fatty infiltration of the tongue adversely affects articulation.
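The waveform measures can be operationalised roughly as follows. This is a sketch on a toy pressure trace; the exact definitions of the mean and peak rates here (MTP divided by time-to-peak, and the maximum derivative up to the peak) are plausible assumptions, not taken from the paper:

```python
import numpy as np

def waveform_metrics(pressure, fs):
    """Tongue-pressure waveform metrics: maximum tongue pressure (MTP),
    time to peak, an assumed mean rate of tongue force development
    (MTP / time-to-peak), and an assumed peak rate (max derivative
    up to the peak sample)."""
    pressure = np.asarray(pressure, dtype=float)
    i_peak = int(np.argmax(pressure))
    mtp = pressure[i_peak]
    time_to_peak = i_peak / fs
    rate = np.gradient(pressure, 1.0 / fs)   # derivative in units/s
    return {"MTP": mtp,
            "time_to_peak": time_to_peak,
            "mean_RTFD": mtp / time_to_peak if time_to_peak > 0 else np.nan,
            "peak_RTFD": rate[:i_peak + 1].max()}

# Toy trace sampled at 100 Hz: linear rise to 30 kPa over 0.5 s, then decay
fs = 100
rise = np.linspace(0, 30, 51)
decay = np.linspace(30, 0, 50)[1:]
m = waveform_metrics(np.concatenate([rise, decay]), fs)
# m["MTP"] is 30.0 kPa and m["time_to_peak"] is 0.5 s
```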
Affiliation(s)
- Yuta Nakao
- Department of Rehabilitation, Hyogo College of Medicine College Hospital, Nishinomiya, Japan
- Department of Rehabilitation Therapy, Kurashiki Central Hospital, Kurashiki, Japan
- Yuki Uchiyama
- Department of Rehabilitation Medicine, Hyogo College of Medicine, Nishinomiya, Japan
- Kosuke Honda
- Department of Dentistry and Oral Surgery, Hyogo College of Medicine, Nishinomiya, Japan
- Yoko Hasegawa
- Department of Dentistry and Oral Surgery, Hyogo College of Medicine, Nishinomiya, Japan
- Division of Comprehensive Prosthodontics, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Tomoki Nanto
- Department of Rehabilitation, Hyogo College of Medicine College Hospital, Nishinomiya, Japan
- Wataru Jomoto
- Department of Radiological Technology, Hyogo College of Medicine College Hospital, Nishinomiya, Japan
- Kazuhisa Domen
- Department of Rehabilitation Medicine, Hyogo College of Medicine, Nishinomiya, Japan
12
Lim Y, Toutios A, Bliesener Y, Tian Y, Lingala SG, Vaz C, Sorensen T, Oh M, Harper S, Chen W, Lee Y, Töger J, Monteserin ML, Smith C, Godinez B, Goldstein L, Byrd D, Nayak KS, Narayanan SS. A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images. Sci Data 2021; 8:187. [PMID: 34285240 PMCID: PMC8292336 DOI: 10.1038/s41597-021-00976-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2021] [Accepted: 06/22/2021] [Indexed: 12/11/2022] Open
Abstract
Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is however limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving articulators and dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to-date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically-relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 participants performing linguistically motivated speech tasks, alongside the corresponding public domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each participant.
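Raw multi-coil data of the kind this dataset provides is commonly combined by root-sum-of-squares after per-coil reconstruction. A Cartesian toy sketch (the dataset's spiral acquisitions would additionally require gridding or a non-uniform FFT before this step; all data below are synthetic):

```python
import numpy as np

def sos_reconstruction(kspace):
    """Root-sum-of-squares coil combination from Cartesian multi-coil
    k-space of shape (n_coils, ny, nx): per-coil inverse FFT, then
    magnitude combination across coils."""
    coil_images = np.fft.fftshift(
        np.fft.ifft2(np.fft.ifftshift(kspace, axes=(-2, -1))),
        axes=(-2, -1))
    return np.sqrt((np.abs(coil_images) ** 2).sum(axis=0))

# Toy object seen by two coils with different (flat) sensitivities
obj = np.zeros((32, 32))
obj[12:20, 12:20] = 1.0
sens = np.stack([np.ones((32, 32)), 0.5 * np.ones((32, 32))])
kspace = np.fft.fftshift(
    np.fft.fft2(np.fft.ifftshift(sens * obj, axes=(-2, -1))),
    axes=(-2, -1))

img = sos_reconstruction(kspace)
# The bright square is recovered with intensity sqrt(1^2 + 0.5^2)
```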
Collapse
Affiliation(s)
- Yongwan Lim
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
| | - Asterios Toutios
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Yannick Bliesener
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Ye Tian
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Sajan Goud Lingala
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Colin Vaz
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Tanner Sorensen
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Miran Oh
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Sarah Harper
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Weiyi Chen
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Yoonjeong Lee
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Johannes Töger
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Mairym Lloréns Monteserin
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Caitlin Smith
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Bianca Godinez
- Department of Linguistics, California State University Long Beach, Long Beach, California, USA
- Louis Goldstein
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Dani Byrd
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Krishna S Nayak
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Shrikanth S Narayanan
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
13
Wiltshire CEE, Chiew M, Chesters J, Healy MP, Watkins KE. Speech Movement Variability in People Who Stutter: A Vocal Tract Magnetic Resonance Imaging Study. J Speech Lang Hear Res 2021; 64:2438-2452. [PMID: 34157239 PMCID: PMC8323486 DOI: 10.1044/2021_jslhr-20-00507]
Abstract
Purpose People who stutter (PWS) have more unstable speech motor systems than people who are typically fluent (PWTF). Here, we used real-time magnetic resonance imaging (MRI) of the vocal tract to assess variability and duration of movements of different articulators in PWS and PWTF during fluent speech production. Method The vocal tracts of 28 adults with moderate to severe stuttering and 20 PWTF were scanned using MRI while repeating simple and complex pseudowords. Midsagittal images of the vocal tract from lips to larynx were reconstructed at 33.3 frames per second. For each participant, we measured the variability and duration of movements across multiple repetitions of the pseudowords in three selected articulators: the lips, tongue body, and velum. Results PWS showed significantly greater speech movement variability than PWTF during fluent repetitions of pseudowords. The group difference was most evident for measurements of lip aperture using these stimuli, as reported previously, but here, we report that movements of the tongue body and velum were also affected during the same utterances. Variability was not affected by phonological complexity. Speech movement variability was unrelated to stuttering severity within the PWS group. PWS also showed longer speech movement durations relative to PWTF for fluent repetitions of multisyllabic pseudowords, and this group difference was even more evident as complexity increased. Conclusions Using real-time MRI of the vocal tract, we found that PWS produced more variable movements than PWTF even during fluent productions of simple pseudowords. PWS also took longer to produce multisyllabic words relative to PWTF, particularly when words were more complex. This indicates general, trait-level differences in the control of the articulators between PWS and PWTF. Supplemental Material https://doi.org/10.23641/asha.14782092.
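The variability measure summarized in this abstract — the spread of an articulator's trajectory across repetitions of the same pseudoword — can be illustrated with a minimal sketch. This is not the authors' exact metric; the time-normalization step, trajectory shape, and noise levels below are illustrative assumptions on synthetic data:

```python
import numpy as np

def time_normalize(traj, n_points=100):
    """Linearly resample a 1-D articulator trajectory to a fixed length."""
    x_old = np.linspace(0.0, 1.0, len(traj))
    x_new = np.linspace(0.0, 1.0, n_points)
    return np.interp(x_new, x_old, traj)

def movement_variability(repetitions, n_points=100):
    """Time-average of the between-repetition standard deviation."""
    resampled = np.stack([time_normalize(r, n_points) for r in repetitions])
    return float(np.mean(np.std(resampled, axis=0)))

# Synthetic example: three noisy repetitions of one lip-aperture gesture
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 120)
reps = [np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(t.size)
        for _ in range(3)]
print(f"variability: {movement_variability(reps):.3f} (arbitrary units)")
```

Under this definition, a speaker whose repetitions overlay closely scores near zero, while more dispersed trajectories score higher, which is the direction of the PWS-versus-PWTF contrast reported here.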
Affiliation(s)
- Charlotte E. E. Wiltshire
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, Radcliffe Observatory Quarter, University of Oxford, United Kingdom
- Mark Chiew
- Wellcome Centre for Integrative Neuroimaging, Nuffield Department of Clinical Neurosciences, University of Oxford, United Kingdom
- Jennifer Chesters
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, Radcliffe Observatory Quarter, University of Oxford, United Kingdom
- Máiréad P. Healy
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, Radcliffe Observatory Quarter, University of Oxford, United Kingdom
- Kate E. Watkins
- Wellcome Centre for Integrative Neuroimaging, Department of Experimental Psychology, Radcliffe Observatory Quarter, University of Oxford, United Kingdom
14
Tian Y, Lim Y, Zhao Z, Byrd D, Narayanan S, Nayak KS. Aliasing artifact reduction in spiral real-time MRI. Magn Reson Med 2021; 86:916-925. [PMID: 33728700 DOI: 10.1002/mrm.28746]
Abstract
PURPOSE To mitigate a common artifact in spiral real-time MRI caused by aliasing of signal from outside the desired FOV. This artifact frequently occurs in midsagittal speech real-time MRI. METHODS Simulations were performed to determine the likely origin of the artifact. Two methods to mitigate the artifact are proposed. The first approach, denoted "large FOV" (LF), keeps an FOV during reconstruction that is large enough to include the artifact signal source. The second approach, denoted "estimation-subtraction" (ES), estimates the artifact signal source and then subtracts a synthetic signal representing that source from the multicoil k-space raw data. Twenty-five midsagittal speech-production real-time MRI data sets were used to evaluate both proposed methods. Reconstructions without and with corrections were evaluated by two expert readers using a 5-level Likert scale assessing artifact severity. Reconstruction time was also compared. RESULTS The origin of the artifact was found to be a combination of gradient nonlinearity and imperfect anti-aliasing in spiral sampling. The LF and ES methods were both able to substantially reduce the artifact, with average qualitative score improvements of 1.25 and 1.35 Likert levels for LF and ES correction, respectively. Average reconstruction times without correction, with LF correction, and with ES correction were 160.69 ± 1.56, 526.43 ± 5.17, and 171.47 ± 1.71 ms/frame, respectively. CONCLUSION Both proposed methods reduced the spiral aliasing artifacts, with the ES method being more effective and more time efficient.
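The two corrections described in this abstract can be sketched in a 1-D Cartesian analogy (the paper works with spiral, multicoil data; grid sizes, the 2x undersampling, and the threshold-based source estimate below are illustrative assumptions). A source outside the desired FOV folds into the image; the "large FOV" reconstruction includes it, and "estimation-subtraction" removes its synthesized signal from the acquired k-space:

```python
import numpy as np

# 1-D analogy: a bright source outside the desired FOV aliases in;
# estimate it on a larger grid, synthesize its k-space signal, subtract.
n_full = 256                      # "large FOV" grid
obj = np.zeros(n_full)
obj[60:70] = 1.0                  # anatomy inside the desired FOV
obj[200:205] = 2.0                # off-FOV source that will fold in

k_full = np.fft.fft(obj)
k_acq = k_full[::2]               # 2x undersampling -> FOV halved
img_aliased = np.fft.ifft(k_acq)  # source folds onto pixels 72:77

# ES step 1: estimate the source from a large-FOV reconstruction
img_large = np.fft.ifft(k_full)
source_est = np.where(np.abs(img_large) > 1.5, img_large, 0.0)

# ES step 2: subtract the synthetic source signal from the acquired k-space
k_clean = k_acq - np.fft.fft(source_est)[::2]
img_corrected = np.fft.ifft(k_clean)
```

In this toy setting the subtraction removes the fold-in exactly; in the paper the estimate is imperfect, which is why artifact severity was scored by expert readers rather than assumed eliminated.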
Affiliation(s)
- Ye Tian
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Yongwan Lim
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Ziwei Zhao
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Dani Byrd
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Shrikanth Narayanan
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Krishna S Nayak
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
15
Zhao Z, Lim Y, Byrd D, Narayanan S, Nayak KS. Improved 3D real-time MRI of speech production. Magn Reson Med 2021; 85:3182-3195. [PMID: 33452722 DOI: 10.1002/mrm.28651]
Abstract
PURPOSE To provide 3D real-time MRI of speech production with improved spatio-temporal sharpness using randomized, variable-density, stack-of-spiral sampling combined with a 3D spatio-temporally constrained reconstruction. METHODS We evaluated five candidate (k, t) sampling strategies using a previously proposed gradient-echo stack-of-spiral sequence and a 3D constrained reconstruction with spatial and temporal penalties. Regularization parameters were chosen by expert readers based on qualitative assessment. We experimentally determined the effect of spiral angle increment and kz temporal order. The strategy yielding the highest image quality was chosen as the proposed method. We evaluated the proposed and original 3D real-time MRI methods in 2 healthy subjects performing speech-production tasks that invoke rapid movements of articulators seen in multiple planes, using interleaved 2D real-time MRI as the reference. We quantitatively evaluated tongue boundary sharpness in three locations at two speech rates. RESULTS The proposed data-sampling scheme uses a golden-angle spiral increment in the kx-ky plane and variable-density, randomized encoding along kz. It provided a statistically significant improvement in tongue boundary sharpness score (P < .001) in the blade, body, and root of the tongue during normal and 1.5-times speeded speech. Qualitative improvements were substantial during natural speech tasks of alternating high and low tongue postures during vowels. The proposed method was also able to capture complex tongue shapes during fast alveolar consonant segments. Furthermore, the proposed scheme allows flexible retrospective selection of temporal resolution. CONCLUSION We have demonstrated improved 3D real-time MRI of speech production using randomized, variable-density, stack-of-spiral sampling with a 3D spatio-temporally constrained reconstruction.
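The winning (k, t) scheme — golden-angle spiral increments in kx-ky plus variable-density randomized kz encoding — can be sketched as below. The exact angle convention, density function, and matrix sizes used in the paper are not specified here, so the golden-angle constant and Gaussian kz density are illustrative assumptions:

```python
import numpy as np

PHI = (1 + np.sqrt(5)) / 2
GOLDEN_ANGLE = 2 * np.pi * (1 - 1 / PHI)   # ~137.5 degrees, one common choice

def spiral_angles(n_interleaves):
    """Rotation angle of each successive spiral interleaf in the kx-ky plane."""
    return (np.arange(n_interleaves) * GOLDEN_ANGLE) % (2 * np.pi)

def randomized_kz(n_kz, n_shots, seed=0):
    """Variable-density randomized kz encoding: centre planes sampled more often."""
    rng = np.random.default_rng(seed)
    kz = np.arange(n_kz) - n_kz // 2
    weights = np.exp(-(kz / (n_kz / 4.0)) ** 2)   # Gaussian density, centre-weighted
    return rng.choice(kz, size=n_shots, p=weights / weights.sum())
```

Golden-angle ordering keeps any contiguous run of interleaves roughly uniformly distributed, which is what permits the "flexible retrospective selection of temporal resolution" mentioned in the conclusion.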
Affiliation(s)
- Ziwei Zhao
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
- Yongwan Lim
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
- Dani Byrd
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, CA, USA
- Shrikanth Narayanan
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, CA, USA
- Krishna S Nayak
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
16
Martin J, Ruthven M, Boubertakh R, Miquel ME. Realistic Dynamic Numerical Phantom for MRI of the Upper Vocal Tract. J Imaging 2020; 6:86. [PMID: 34460743 PMCID: PMC8320850 DOI: 10.3390/jimaging6090086]
Abstract
Dynamic and real-time MRI (rtMRI) of human speech is an active field of research, with interest from both the linguistics and clinical communities. At present, different research groups are investigating a range of rtMRI acquisition and reconstruction approaches to visualise the speech organs. As with other moving organs, it is difficult to create a physical phantom of the speech organs with which to optimise these approaches; the optimisation therefore requires extensive scanner access and imaging of volunteers. As previously demonstrated in cardiac imaging, realistic numerical phantoms can be useful tools for optimising rtMRI approaches, reducing reliance on scanner access and the imaging of volunteers. However, no such speech rtMRI phantom currently exists. In this work, a numerical phantom for optimising speech rtMRI approaches was developed and tested on different reconstruction schemes. The novel phantom comprised a dynamic image series and corresponding k-space data of a single mid-sagittal slice with a temporal resolution of 30 frames per second (fps). The phantom was developed from images of a volunteer acquired at a frame rate of 10 fps. Its creation involved the following steps: image acquisition, image enhancement, segmentation, mask optimisation, through-time and spatial interpolation and, finally, derivation of the k-space phantom. The phantom was used to: (1) test different k-space sampling schemes (Cartesian, radial and spiral); (2) create lower-frame-rate acquisitions by simulating segmented k-space acquisitions; and (3) simulate parallel imaging reconstructions (SENSE and GRAPPA). This demonstrated how such a numerical phantom could be used to optimise images and test multiple sampling strategies without extensive scanner access.
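Use (1) of the phantom — testing different k-space sampling schemes against known ground-truth k-space — can be sketched as follows. These are generic Cartesian and radial mask generators applied to a stand-in frame, not the authors' implementation; the matrix size, acceleration factor, and spoke count are illustrative assumptions:

```python
import numpy as np

def cartesian_mask(n, accel):
    """Keep every accel-th phase-encode line, always including the centre line."""
    mask = np.zeros((n, n), dtype=bool)
    mask[::accel, :] = True
    mask[n // 2, :] = True
    return mask

def radial_mask(n, n_spokes):
    """n_spokes equally spaced full spokes through the centre of k-space."""
    mask = np.zeros((n, n), dtype=bool)
    c = n // 2
    r = np.arange(-c, c)
    for theta in np.linspace(0.0, np.pi, n_spokes, endpoint=False):
        x = np.clip(np.round(c + r * np.cos(theta)).astype(int), 0, n - 1)
        y = np.clip(np.round(c + r * np.sin(theta)).astype(int), 0, n - 1)
        mask[x, y] = True
    return mask

# Apply a scheme to one frame of a (here random, stand-in) phantom's k-space
rng = np.random.default_rng(1)
frame = rng.random((64, 64))
kspace = np.fft.fftshift(np.fft.fft2(frame))
undersampled = kspace * cartesian_mask(64, 4)
```

Because the phantom supplies fully sampled k-space for every frame, each candidate mask can be applied retrospectively and the reconstruction compared against the known image series, which is what removes the need for scanner time.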
Affiliation(s)
- Joe Martin
- MR Physics, Guy’s and St Thomas’ NHS Foundation Trust, St Thomas’s Hospital, London SE1 7EH, UK
- Matthieu Ruthven
- Clinical Physics, Barts Health NHS Trust, St Bartholomew’s Hospital, London EC1A 7BE, UK
- Redha Boubertakh
- Singapore Bioimaging Consortium (SBIC), Singapore 138667, Singapore
- Marc E. Miquel
- Clinical Physics, Barts Health NHS Trust, St Bartholomew’s Hospital, London EC1A 7BE, UK
- Centre for Advanced Cardiovascular Imaging, NIHR Barts Biomedical Research Centre (BRC), William Harvey Research Institute, Queen Mary University of London, London EC1M 6BQ, UK
17
Automatic vocal tract landmark localization from midsagittal MRI data. Sci Rep 2020; 10:1468. [PMID: 32001739 PMCID: PMC6992757 DOI: 10.1038/s41598-020-58103-6]
Abstract
The various speech sounds of a language are obtained by varying the shape and position of the articulators surrounding the vocal tract. Analyzing their variations is crucial for understanding speech production, diagnosing speech disorders and planning therapy. Identifying key anatomical landmarks of these structures on medical images is a prerequisite for any quantitative analysis, and the rising amount of data generated in the field calls for an automatic solution. The challenge lies in the high inter- and intra-speaker variability, the mutual interaction between the articulators and the moderate quality of the images. This study addresses this issue for the first time and tackles it by means of deep learning. It proposes a dedicated network architecture named Flat-net, whose performance is evaluated and compared with eleven state-of-the-art methods from the literature. The dataset contains midsagittal anatomical magnetic resonance images for 9 speakers sustaining 62 articulations, with 21 annotated anatomical landmarks per image. Results show that the Flat-net approach outperforms the former methods, yielding an overall root mean square error of 3.6 pixels/0.36 cm in a leave-one-out procedure over the speakers. The implementation code is shared publicly on GitHub.
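The evaluation protocol here — RMSE on 21 landmarks under a leave-one-speaker-out procedure — can be sketched independently of the Flat-net architecture itself. The `mean_landmark_baseline` predictor below is a hypothetical stand-in for a trained model, and the exact RMSE aggregation used by the authors may differ:

```python
import numpy as np

def rmse(pred, true):
    """Root mean square Euclidean error over landmarks, in pixels."""
    return float(np.sqrt(np.mean(np.sum((pred - true) ** 2, axis=-1))))

def leave_one_out_rmse(data, fit):
    """data: {speaker: (images, landmarks)}; fit(train) -> predictor(images)."""
    errors = []
    for held_out in data:
        train = {s: v for s, v in data.items() if s != held_out}
        predict = fit(train)
        images, true_pts = data[held_out]
        errors.append(rmse(predict(images), true_pts))
    return float(np.mean(errors))

def mean_landmark_baseline(train):
    """Trivial baseline: always predict the mean training landmark configuration."""
    all_pts = np.concatenate([pts for _, pts in train.values()])
    mean_pts = all_pts.mean(axis=0)                  # shape (21, 2)
    return lambda images: np.broadcast_to(mean_pts, (len(images),) + mean_pts.shape)
```

Holding out whole speakers, rather than random images, is what makes the reported 3.6-pixel figure an estimate of generalization to unseen anatomy.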
18
Kappert KDR, van Alphen MJA, Smeele LE, Balm AJM, van der Heijden F. Quantification of tongue mobility impairment using optical tracking in patients after receiving primary surgery or chemoradiation. PLoS One 2019; 14:e0221593. [PMID: 31454385 PMCID: PMC6711543 DOI: 10.1371/journal.pone.0221593]
Abstract
PURPOSE Tongue mobility has been shown to be a clinically interesting parameter for functional outcomes after tongue cancer treatment, and it can be objectified by measuring the range of motion (ROM). Reliable measurements of ROM would enable us to quantify the severity of functional impairments and use these for shared decision making in treatment choices and in the rehabilitation of speech and swallowing disturbances after treatment. METHOD Nineteen healthy participants, eighteen post-chemoradiation patients and seventeen post-surgery patients were asked to perform standardized tongue maneuvers in front of a 3D camera system; these were subsequently tracked and corrected for head and jaw motion. Indicators such as the left-right tongue range and the deflection angle with the horizontal axis were extracted from the tongue trajectory to serve as quantitative measures of impaired tongue mobility. RESULTS The range and deflection angle showed excellent intra- and interrater reliability (ICC 0.9). The repeatability experiment showed an average standard deviation of 2.5 mm to 3.5 mm for every movement except the upward movement. The post-surgery patient group showed a smaller tongue range and higher deflection angle overall than the healthy participants. Post-chemoradiation patients showed less difference in tongue ROM compared with healthy participants. Only a few patients showed asymmetrical movement after treatment, which could not always be explained by T-stage or the side of treatment alone. CONCLUSION We introduced a reliable and reproducible method for measuring the ROM and quantifying motion impairments, which was able to show differences in tongue ROM between healthy subjects and patients after chemoradiation or surgery. Future research should focus on measuring patients with oral cancer pre- and post-treatment, in combination with the collection of detailed information about individual tongue anatomy, so that the full ROM trajectory can be used to identify changes over time and to quantify functional impairment.
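The two indicators named in this abstract — left-right tongue range and deflection angle with the horizontal — can be sketched from a tracked 2-D trajectory. The paper does not give its exact definitions, so the PCA-based movement axis below is a plausible assumption, not the authors' formula:

```python
import numpy as np

def tongue_range(traj):
    """Extent of the tracked marker along the left-right (x) axis, e.g. in mm."""
    return float(traj[:, 0].max() - traj[:, 0].min())

def deflection_angle(traj):
    """Angle (degrees) between the principal movement axis and the horizontal."""
    centred = traj - traj.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(centred.T))
    main = vecs[:, -1]                 # eigenvector of the largest eigenvalue
    return float(np.degrees(np.arctan2(abs(main[1]), abs(main[0]))))

# A mostly horizontal left-right sweep with a slight upward tilt
t = np.linspace(-10.0, 10.0, 50)
sweep = np.column_stack([t, 0.1 * t])
print(tongue_range(sweep), round(deflection_angle(sweep), 1))
```

Under these definitions, the post-surgery pattern reported above corresponds to a shorter sweep (smaller range) whose movement axis tilts further from horizontal (larger deflection angle).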
Affiliation(s)
- K. D. R. Kappert
- Head & Neck Oncology and Surgery, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Robotics and Mechatronics, University of Twente, Enschede, The Netherlands
- M. J. A. van Alphen
- Head & Neck Oncology and Surgery, Netherlands Cancer Institute, Amsterdam, The Netherlands
- L. E. Smeele
- Head & Neck Oncology and Surgery, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Oral and Maxillofacial Surgery, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- A. J. M. Balm
- Head & Neck Oncology and Surgery, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Robotics and Mechatronics, University of Twente, Enschede, The Netherlands
- Oral and Maxillofacial Surgery, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- F. van der Heijden
- Head & Neck Oncology and Surgery, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Robotics and Mechatronics, University of Twente, Enschede, The Netherlands
19
Kim YC. Fast upper airway magnetic resonance imaging for assessment of speech production and sleep apnea. Precision and Future Medicine 2018. [DOI: 10.23838/pfm.2018.00100]
20
Lim Y, Zhu Y, Lingala SG, Byrd D, Narayanan S, Nayak KS. 3D dynamic MRI of the vocal tract during natural speech. Magn Reson Med 2018; 81:1511-1520. [PMID: 30390319 DOI: 10.1002/mrm.27570]
Abstract
PURPOSE To develop and evaluate a technique for 3D dynamic MRI of the full vocal tract at high temporal resolution during natural speech. METHODS We demonstrate 2.4 × 2.4 × 5.8 mm3 spatial resolution, 61-ms temporal resolution, and a 200 × 200 × 70 mm3 FOV. The proposed method uses 3D gradient-echo imaging with a custom upper-airway coil, a minimum-phase slab excitation, stack-of-spirals readout, pseudo golden-angle view order in kx-ky, linear Cartesian order along kz, and spatiotemporal finite-difference constrained reconstruction, with 13-fold acceleration. This technique is evaluated using in vivo vocal tract airway data from 2 healthy subjects acquired on a 1.5T scanner, 1 with synchronized audio, performing 2 tasks during production of natural speech, and via comparison with interleaved multislice 2D dynamic MRI. RESULTS This technique captured known dynamics of vocal tract articulators during natural speech tasks, including tongue gestures during the production of the consonants "s" and "l" and of consonant-vowel syllables, and was additionally consistent with 2D dynamic MRI. Coordination of lingual (tongue) movements for consonants is demonstrated via volume-of-interest analysis. Vocal tract area function dynamics revealed critical lingual constriction events along the length of the vocal tract for consonants and vowels. CONCLUSION We demonstrate feasibility of 3D dynamic MRI of the full vocal tract, with spatiotemporal resolution adequate to visualize lingual movements for consonants and vocal tract shaping during natural productions of consonant-vowel syllables, without requiring multiple repetitions.
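The vocal tract area function analyzed in this abstract is the cross-sectional airway area at successive stations along the tract. A minimal sketch, assuming the airway has already been segmented and resampled into binary slices along a precomputed centreline (neither step is shown, and the toy mask below is invented for illustration):

```python
import numpy as np

def area_function(airway_mask, voxel_area_mm2):
    """Cross-sectional airway area at each station along the vocal tract.

    airway_mask: boolean array (n_stations, h, w), one binary slice per plane
    sampled along an assumed, already-computed tract centreline.
    """
    n = airway_mask.shape[0]
    return airway_mask.reshape(n, -1).sum(axis=1) * voxel_area_mm2

# Toy example: a "tract" that narrows to a constriction at station 3
mask = np.zeros((5, 4, 4), dtype=bool)
for i, width in enumerate([4, 3, 2, 1, 3]):
    mask[i, :width, :width] = True
areas = area_function(mask, voxel_area_mm2=1.0)
constriction = int(np.argmin(areas))   # station of maximal constriction
```

Local minima of this function over time are the "critical lingual constriction events" the authors track for consonants and vowels.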
Affiliation(s)
- Yongwan Lim
- Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
- Yinghua Zhu
- Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
- Sajan Goud Lingala
- Department of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, Iowa
- Dani Byrd
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California
- Shrikanth Narayanan
- Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California
- Krishna Shrinivas Nayak
- Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California