1
Lim Y, Toutios A, Bliesener Y, Tian Y, Lingala SG, Vaz C, Sorensen T, Oh M, Harper S, Chen W, Lee Y, Töger J, Monteserin ML, Smith C, Godinez B, Goldstein L, Byrd D, Nayak KS, Narayanan SS. A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images. Sci Data 2021; 8:187. PMID: 34285240; PMCID: PMC8292336; DOI: 10.1038/s41597-021-00976-x
Abstract
Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI, however, remains limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. Imaging the rapidly moving articulators and dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 participants performing linguistically motivated speech tasks, alongside the corresponding public-domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each participant.
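As a rough illustration of the kind of processing the raw multi-coil data invite, the sketch below performs a simple sum-of-squares coil combination. It assumes the per-coil data have already been reconstructed into per-coil image frames (the released raw data are non-Cartesian k-space acquisitions, so a gridding or iterative reconstruction step would precede this); the array layout and variable names are illustrative, not the dataset's actual file format.

```python
import numpy as np

def sum_of_squares_combine(coil_images: np.ndarray) -> np.ndarray:
    """Combine per-coil complex images into a single magnitude video.

    coil_images: complex array of shape (n_coils, n_frames, ny, nx),
    e.g. obtained after gridding each coil's k-space data.
    Returns a real array of shape (n_frames, ny, nx).
    """
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))

# Illustrative usage with random data standing in for reconstructed coil images.
rng = np.random.default_rng(0)
coil_images = rng.standard_normal((8, 10, 84, 84)) + 1j * rng.standard_normal((8, 10, 84, 84))
video = sum_of_squares_combine(coil_images)
print(video.shape)  # (10, 84, 84)
```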
Affiliation(s)
- Yongwan Lim
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Asterios Toutios
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Yannick Bliesener
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Ye Tian
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Sajan Goud Lingala
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Colin Vaz
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Tanner Sorensen
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Miran Oh
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Sarah Harper
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Weiyi Chen
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Yoonjeong Lee
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Johannes Töger
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Mairym Lloréns Monteserin
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Caitlin Smith
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Bianca Godinez
- Department of Linguistics, California State University Long Beach, Long Beach, California, USA
- Louis Goldstein
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Dani Byrd
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Krishna S Nayak
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Shrikanth S Narayanan
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
2
Alexander R, Sorensen T, Toutios A, Narayanan S. A modular architecture for articulatory synthesis from gestural specification. J Acoust Soc Am 2019; 146:4458. PMID: 31893678; PMCID: PMC7043897; DOI: 10.1121/1.5139413
Abstract
This paper proposes a modular architecture for articulatory synthesis from a gestural specification, comprising relatively simple models for the vocal tract, the glottis, aero-acoustics, and articulatory control. The vocal tract module combines a midsagittal statistical articulatory model, derived by factor analysis of air-tissue boundaries in real-time magnetic resonance imaging data, with an αβ model for converting midsagittal sections to area-function specifications. The aero-acoustics and glottis models were based on a software implementation of classic work by Maeda. The articulatory control module uses dynamical systems, which implement articulatory gestures, to animate the statistical articulatory model, inspired by the task dynamics model. Results are presented for synthesizing vowel-consonant-vowel sequences with plosive consonants, using models that were built on data from, and simulate the behavior of, two different speakers.
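The αβ conversion mentioned above maps a midsagittal distance d(x) at position x along the vocal tract to a cross-sectional area via A(x) = α(x)·d(x)^β(x). A minimal sketch of that step, with made-up coefficient values (the speaker-specific α and β profiles used in the paper are not reproduced here):

```python
import numpy as np

def alpha_beta_area(d: np.ndarray, alpha: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Convert midsagittal distances (cm) to cross-sectional areas (cm^2)
    using the alpha-beta model A = alpha * d**beta, elementwise along the tract."""
    return alpha * np.power(d, beta)

# Hypothetical values for a tract sampled at 5 sections (not from the paper).
d = np.array([0.3, 0.8, 1.2, 1.0, 0.5])   # midsagittal distances, cm
alpha = np.full_like(d, 1.5)               # illustrative coefficients
beta = np.full_like(d, 1.4)
print(alpha_beta_area(d, alpha, beta))
```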
Affiliation(s)
- Rachel Alexander
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
- Tanner Sorensen
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
- Asterios Toutios
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
- Shrikanth Narayanan
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
3
Berner S, Schmidt AB, Zimmermann M, Pravdivtsev AN, Glöggler S, Hennig J, von Elverfeldt D, Hövener J. SAMBADENA Hyperpolarization of 13C-Succinate in an MRI: Singlet-Triplet Mixing Causes Polarization Loss. ChemistryOpen 2019; 8:728-736. PMID: 31275794; PMCID: PMC6587320; DOI: 10.1002/open.201900139
Abstract
The signal enhancement provided by the hyperpolarization of nuclear spins of biological molecules is a highly promising technique for diagnostic imaging. To date, most 13C contrast agents had to be polarized in a separate, complex, or cost-intensive polarizer. Recently, the in situ hyperpolarization of a 13C contrast agent to >20% was demonstrated without a polarizer, within the bore of an MRI system. This approach addresses some of the challenges of MRI with hyperpolarized tracers, i.e., elevated cost, long production times, and loss of polarization during transfer to the detection site. Here, we demonstrate the first hyperpolarization of a biomolecule in aqueous solution in the bore of an MRI system at a field strength of 7 T within seconds. The 13C nucleus of 1-13C,2,3-2H2-succinate was polarized to 11%, corresponding to a signal enhancement of approximately 18,000. Effects during the hydrogenation reaction that lead to a significant loss of polarization were observed.
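For orientation, the quoted enhancement of roughly 18,000 is consistent with comparing the achieved 11% polarization against the thermal equilibrium polarization of 13C at 7 T. A back-of-the-envelope check, assuming room temperature and the high-temperature (Boltzmann) approximation (these assumptions are mine, not stated in the abstract):

```python
import math

# Physical constants and 13C parameters (values assumed, not taken from the paper).
hbar = 1.054571817e-34               # J*s
k_B = 1.380649e-23                   # J/K
gamma_13C = 2 * math.pi * 10.7084e6  # rad/(s*T), 13C gyromagnetic ratio
B0 = 7.0                             # T
T = 298.0                            # K, assumed room temperature

# Thermal polarization in the high-temperature approximation.
P_thermal = hbar * gamma_13C * B0 / (2 * k_B * T)

P_hyper = 0.11                       # 11 % polarization reported in the abstract
enhancement = P_hyper / P_thermal
print(f"thermal polarization ~ {P_thermal:.2e}, enhancement ~ {enhancement:,.0f}")
# prints an enhancement on the order of 18,000
```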
Affiliation(s)
- Stephan Berner
- Department of Radiology, Medical Physics, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Killianstraße 5a, 79106 Freiburg, Germany
- German Consortium for Cancer Research (DKTK), partner site Freiburg
- German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Andreas B. Schmidt
- Department of Radiology, Medical Physics, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Killianstraße 5a, 79106 Freiburg, Germany
- Department of Radiology and Neuroradiology, Section Biomedical Imaging, MOIN CC, University Medical Center Schleswig-Holstein, University of Kiel, Am Botanischen Garten 14, 24118 Kiel, Germany
- Mirko Zimmermann
- Department of Radiology, Medical Physics, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Killianstraße 5a, 79106 Freiburg, Germany
- Andrey N. Pravdivtsev
- Department of Radiology and Neuroradiology, Section Biomedical Imaging, MOIN CC, University Medical Center Schleswig-Holstein, University of Kiel, Am Botanischen Garten 14, 24118 Kiel, Germany
- Stefan Glöggler
- Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
- Center for Biostructural Imaging of Neurodegeneration, Von-Siebold-Straße 3a, 37075 Göttingen, Germany
- Jürgen Hennig
- Department of Radiology, Medical Physics, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Killianstraße 5a, 79106 Freiburg, Germany
- Dominik von Elverfeldt
- Department of Radiology, Medical Physics, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Killianstraße 5a, 79106 Freiburg, Germany
- Jan-Bernd Hövener
- Department of Radiology and Neuroradiology, Section Biomedical Imaging, MOIN CC, University Medical Center Schleswig-Holstein, University of Kiel, Am Botanischen Garten 14, 24118 Kiel, Germany
4
Hamdan AL, Khalifee E, Ziade G, Semaan S. Sexual Dimorphism in Laryngeal Volumetric Measurements Using Magnetic Resonance Imaging. Ear Nose Throat J 2019; 99:132-136. PMID: 31018691; DOI: 10.1177/0145561319840568
Abstract
The objective of this study is to investigate dimensional and volumetric measurements of the thyroarytenoid (TA) muscle in men and women using magnetic resonance imaging (MRI). The hypothesis is that there is a gender-related difference in these measurements. A retrospective chart review of 76 patients who underwent MRI of the neck at the American University of Beirut Medical Center was conducted. The dimensions and volume of the right and left TA muscle were measured on axial and coronal short tau inversion recovery (STIR) images. Male and female groups were compared with respect to demographic data and MRI findings using parametric and nonparametric tests. The mean length of the TA muscle in males was larger than that in females on the right (males 2.44 [0.29] cm vs females 1.70 [0.22] cm) and on the left (males 2.50 [0.28] cm vs females 1.72 [0.24] cm), reaching statistical significance (P < .001). The mean width of the TA muscle in males was larger than that in females on the right (males 0.68 [0.13] cm vs females 0.59 [0.11] cm) and on the left (males 0.68 [0.12] cm vs females 0.57 [0.12] cm), reaching statistical significance (P < .001). The mean height of the TA muscle in males was larger than that in females on the right (males 1.05 [0.21] cm vs females 0.95 [0.12] cm) and on the left (males 1.05 [0.21] cm vs females 0.95 [0.12] cm), reaching statistical significance (P < .01 on the right and P < .05 on the left). The volume of the TA muscle in males was larger than that in females on the right (males 0.86 [0.25] mL vs females 0.48 [0.15] mL) and on the left (males 0.89 [0.27] mL vs females 0.48 [0.17] mL), reaching statistical significance (P < .001). The results of this investigation clearly indicate a significant difference in these measurements between men and women.
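As an illustration of the parametric comparison described above, the sketch below runs a two-sample t-test from the reported summary statistics for right TA length. The group sizes are hypothetical (the abstract reports 76 patients in total but not the male/female split), so the exact statistic and p-value are for illustration only.

```python
from scipy import stats

# Reported means and standard deviations for right TA length (cm).
male_mean, male_sd = 2.44, 0.29
female_mean, female_sd = 1.70, 0.22

# Hypothetical equal split of the 76 patients; the true split is not given.
n_male, n_female = 38, 38

t, p = stats.ttest_ind_from_stats(male_mean, male_sd, n_male,
                                  female_mean, female_sd, n_female,
                                  equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.2g}")
```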
Affiliation(s)
- Abdul-Latif Hamdan
- Department of Otolaryngology and Head & Neck Surgery, American University of Beirut Medical Center, Beirut, Lebanon
- Elie Khalifee
- Department of Otolaryngology and Head & Neck Surgery, American University of Beirut Medical Center, Beirut, Lebanon
- Georges Ziade
- Department of Otolaryngology and Head & Neck Surgery, American University of Beirut Medical Center, Beirut, Lebanon
- Sahar Semaan
- Department of Clinical Radiology, American University of Beirut Medical Center, Beirut, Lebanon
5
Sorensen T, Toutios A, Goldstein L, Narayanan S. Task-dependence of articulator synergies. J Acoust Soc Am 2019; 145:1504. PMID: 31067947; PMCID: PMC6910022; DOI: 10.1121/1.5093538
Abstract
In speech production, the motor system organizes articulators such as the jaw, tongue, and lips into synergies whose function is to produce speech sounds by forming constrictions at the phonetic places of articulation. The present study tests whether synergies for different constriction tasks differ in terms of inter-articulator coordination. The test is conducted on utterances [ɑpɑ], [ɑtɑ], [ɑiɑ], and [ɑkɑ] with a real-time magnetic resonance imaging biomarker that is computed using a statistical model of the forward kinematics of the vocal tract. The present study is the first to estimate the forward kinematics of the vocal tract from speech production data. Using the imaging biomarker, the study finds that the jaw contributes least to the velar stop for [k], more to pharyngeal approximation for [ɑ], still more to palatal approximation for [i], and most to the coronal stop for [t]. Additionally, the jaw contributes more to the coronal stop for [t] than to the bilabial stop for [p]. Finally, the study investigates how this pattern of results varies by participant. The study identifies differences in inter-articulator coordination by constriction task, which support the claim that inter-articulator coordination differs depending on the active articulator synergy.
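A highly simplified sketch of the idea of apportioning a constriction change among articulators: if the forward kinematic map from articulator parameters to constriction degree is locally linear, each articulator's contribution is its Jacobian entry times its displacement, expressed as a fraction of the total change. The linear map and all numbers below are invented for illustration; the paper's biomarker is built from a statistical model fitted to real-time MRI data, not reproduced here.

```python
import numpy as np

def articulator_contributions(jacobian: np.ndarray, delta_w: np.ndarray) -> np.ndarray:
    """Fraction of a constriction-degree change attributable to each articulator,
    assuming a locally linear forward map: delta_cd = jacobian @ delta_w."""
    per_articulator = jacobian * delta_w          # elementwise contributions
    return per_articulator / per_articulator.sum()

# Hypothetical numbers: [jaw, tongue body, lips] for a coronal stop gesture.
jacobian = np.array([0.6, 1.2, 0.1])   # d(constriction degree)/d(articulator parameter)
delta_w = np.array([0.4, 0.8, 0.05])   # articulator parameter displacements
print(articulator_contributions(jacobian, delta_w))
```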
Affiliation(s)
- Tanner Sorensen
- Signal Analysis and Interpretation Laboratory, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA
- Asterios Toutios
- Signal Analysis and Interpretation Laboratory, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA
- Louis Goldstein
- Department of Linguistics, University of Southern California, Los Angeles, California 90089, USA
- Shrikanth Narayanan
- Signal Analysis and Interpretation Laboratory, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA
6
Oh M, Lee Y. ACT: An Automatic Centroid Tracking tool for analyzing vocal tract actions in real-time magnetic resonance imaging speech production data. J Acoust Soc Am 2018; 144:EL290. PMID: 30404513; PMCID: PMC6192793; DOI: 10.1121/1.5057367
Abstract
Real-time magnetic resonance imaging (MRI) speech production data have expanded our understanding of vocal tract actions. This letter presents an Automatic Centroid Tracking tool, ACT, which obtains both spatial and temporal information characterizing multi-directional articulatory movement. ACT automatically segments an articulatory object composed of connected pixels in a real-time MRI video by finding its intensity centroids over time, and it returns kinematic profiles that include direction and magnitude information for the object. This letter discusses the utility of ACT, which outperforms other similar object-tracking techniques, by demonstrating its successful online tracking of vertical larynx movement. ACT can be deployed generally for dynamic image processing and analysis.
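The core operation ACT relies on, computing an intensity-weighted centroid of a connected pixel region in each frame and following it over time, can be sketched as follows. This is a generic illustration of centroid tracking with synthetic data, not a reproduction of ACT's actual segmentation or tracking pipeline:

```python
import numpy as np

def intensity_centroid(frame: np.ndarray, mask: np.ndarray) -> tuple[float, float]:
    """Intensity-weighted centroid (row, col) of the pixels selected by mask."""
    rows, cols = np.nonzero(mask)
    weights = frame[rows, cols].astype(float)
    total = weights.sum()
    return float((rows * weights).sum() / total), float((cols * weights).sum() / total)

def track_centroids(video: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Centroid trajectory over a (n_frames, ny, nx) video with per-frame masks;
    frame-to-frame differences give direction and magnitude of movement."""
    return np.array([intensity_centroid(f, m) for f, m in zip(video, masks)])

# Illustrative usage with a synthetic bright blob drifting downward.
video = np.zeros((5, 32, 32))
for t in range(5):
    video[t, 10 + t : 14 + t, 10:14] = 1.0
masks = video > 0.5
trajectory = track_centroids(video, masks)
print(np.diff(trajectory, axis=0))  # per-frame displacement vectors
```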
Affiliation(s)
- Miran Oh
- Department of Linguistics, University of Southern California, Los Angeles, California 90089, USA
- Yoonjeong Lee
- Department of Linguistics, University of Southern California, Los Angeles, California 90089, USA
7
Carey D, Miquel ME, Evans BG, Adank P, McGettigan C. Vocal Tract Images Reveal Neural Representations of Sensorimotor Transformation During Speech Imitation. Cereb Cortex 2018; 27:3064-3079. PMID: 28334401; PMCID: PMC5939209; DOI: 10.1093/cercor/bhx056
Abstract
Imitating speech necessitates the transformation from sensory targets to vocal tract motor output, yet little is known about the representational basis of this process in the human brain. Here, we address this question by using real-time MR imaging (rtMRI) of the vocal tract and functional MRI (fMRI) of the brain in a speech imitation paradigm. Participants trained on imitating a native vowel and a similar nonnative vowel that required lip rounding. Later, participants imitated these vowels and an untrained vowel pair during separate fMRI and rtMRI runs. Univariate fMRI analyses revealed that regions including the left inferior frontal gyrus were more active during sensorimotor transformation (ST) and production of nonnative vowels than of native vowels; further, ST for nonnative vowels activated somatomotor cortex bilaterally, compared with ST of native vowels. Using representational similarity analysis (RSA) test models constructed from participants' vocal tract images and from stimulus formant distances, searchlight RSA analyses of the fMRI data showed that either type of model could be represented in somatomotor, temporal, cerebellar, and hippocampal neural activation patterns during ST. We thus provide the first evidence of widespread and robust cortical and subcortical neural representation of vocal tract and/or formant parameters during prearticulatory ST.
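The logic of an RSA comparison can be illustrated in a few lines: build a representational dissimilarity matrix (RDM) from a candidate model (here, vectorized vocal tract images) and another from neural activation patterns, then correlate their condensed forms. This is a generic RSA sketch with synthetic data, not the authors' actual searchlight implementation:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_conditions = 8

# Hypothetical feature vectors: one vocal tract image and one neural pattern per condition.
vocal_tract_features = rng.standard_normal((n_conditions, 500))
neural_patterns = rng.standard_normal((n_conditions, 120))

# RDMs as condensed distance vectors (upper triangle of the dissimilarity matrix).
model_rdm = pdist(vocal_tract_features, metric="correlation")
neural_rdm = pdist(neural_patterns, metric="correlation")

# Spearman correlation between model and neural RDMs (the RSA test statistic).
rho, p = spearmanr(model_rdm, neural_rdm)
print(f"rho = {rho:.2f}, p = {p:.2g}")
```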
Affiliation(s)
- Daniel Carey
- Department of Psychology, Royal Holloway, University of London, London TW20 0EX, UK
- Combined Universities Brain Imaging Centre, Royal Holloway, University of London, London TW20 0EX, UK
- The Irish Longitudinal Study on Ageing (TILDA), Department of Medical Gerontology, Trinity College Dublin, Dublin, Ireland
- Marc E Miquel
- William Harvey Research Institute, Queen Mary, University of London, London EC1M 6BQ, UK
- Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, UK
- Bronwen G Evans
- Department of Speech, Hearing & Phonetic Sciences, University College London, London WC1E 6BT, UK
- Patti Adank
- Department of Speech, Hearing & Phonetic Sciences, University College London, London WC1E 6BT, UK
- Carolyn McGettigan
- Department of Psychology, Royal Holloway, University of London, London TW20 0EX, UK
- Combined Universities Brain Imaging Centre, Royal Holloway, University of London, London TW20 0EX, UK
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AR, UK