1
Xia Z, Yuan R, Cao Y, Sun T, Xiong Y, Xu K. A systematic review of the application of machine learning techniques to ultrasound tongue imaging analysis. J Acoust Soc Am 2024; 156:1796-1819. PMID: 39287468. DOI: 10.1121/10.0028610.
Abstract
B-mode ultrasound has emerged as a prevalent tool for observing tongue motion in speech production, gaining traction in speech therapy applications. However, the effective analysis of ultrasound tongue image frame sequences (UTIFs) encounters many challenges, such as the presence of high levels of speckle noise and obscured views. Recently, the application of machine learning, especially deep learning techniques, to UTIF interpretation has shown promise in overcoming these hurdles. This paper presents a thorough examination of the existing literature, focusing on UTIF analysis. The scope of our work encompasses four key areas: a foundational introduction to deep learning principles, an exploration of motion tracking methodologies, a discussion of feature extraction techniques, and an examination of cross-modality mapping. The paper concludes with a detailed discussion of insights gleaned from the comprehensive literature review, outlining potential trends and challenges that lie ahead in the field.
Affiliation(s)
- Zhen Xia: National Key Lab of Parallel and Distributed Processing, National University of Defense Technology, Changsha, Hunan, China
- Ruicheng Yuan: College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China
- Yuan Cao: National Key Lab of Parallel and Distributed Processing, National University of Defense Technology, Changsha, Hunan, China
- Tao Sun: National Key Lab of Parallel and Distributed Processing, National University of Defense Technology, Changsha, Hunan, China
- Yunsheng Xiong: National Key Lab of Parallel and Distributed Processing, National University of Defense Technology, Changsha, Hunan, China
- Kele Xu: National Key Lab of Parallel and Distributed Processing, National University of Defense Technology, Changsha, Hunan, China
2
Karakuzu A, Boudreau M, Stikov N. Reproducible Research Practices in Magnetic Resonance Neuroimaging: A Review Informed by Advanced Language Models. Magn Reson Med Sci 2024; 23:252-267. PMID: 38897936. PMCID: PMC11234949. DOI: 10.2463/mrms.rev.2023-0174.
Abstract
MRI has progressed significantly with the introduction of advanced computational methods and novel imaging techniques, but their wider adoption hinges on their reproducibility. This concise review synthesizes reproducible research insights from recent MRI articles to examine the current state of reproducibility in neuroimaging, highlighting key trends and challenges. It also provides a custom generative pretrained transformer (GPT) model, designed specifically for aiding in an automated analysis and synthesis of information pertaining to the reproducibility insights associated with the articles at the core of this review.
Affiliation(s)
- Agah Karakuzu: NeuroPoly Lab, Institute of Biomedical Engineering, Polytechnique Montréal, Montréal, Quebec, Canada; Montréal Heart Institute, Montréal, Quebec, Canada
- Mathieu Boudreau: NeuroPoly Lab, Institute of Biomedical Engineering, Polytechnique Montréal, Montréal, Quebec, Canada
- Nikola Stikov: NeuroPoly Lab, Institute of Biomedical Engineering, Polytechnique Montréal, Montréal, Quebec, Canada; Montréal Heart Institute, Montréal, Quebec, Canada; Center for Advanced Interdisciplinary Research, Ss. Cyril and Methodius University, Skopje, North Macedonia
3
Tian Y, Nayak KS. Real-time water/fat imaging at 0.55T with spiral out-in-out-in sampling. Magn Reson Med 2024; 91:649-659. PMID: 37815020. PMCID: PMC10841523. DOI: 10.1002/mrm.29885.
Abstract
PURPOSE: To develop an efficient and flexible water/fat separated real-time MRI (RT-MRI) method using spiral out-in-out-in (OIOI) sampling and balanced SSFP (bSSFP) at 0.55T. METHODS: A bSSFP sequence with golden-angle spiral OIOI readout was developed, capturing three echoes to allow water/fat separation. A low-latency reconstruction that combines all echoes was available for online visualization. An offline reconstruction provided water and fat RT-MRI in two steps: (1) image reconstruction with spatiotemporally constrained reconstruction (STCR) and (2) water/fat separation with hierarchical iterative decomposition of water and fat with echo asymmetry and least-squares estimation (HIDEAL). In healthy volunteers, spiral OIOI was acquired in the wrist during a radial-to-ulnar deviation maneuver, in the heart without breath-hold or cardiac gating, and in the lower abdomen during free breathing for visualizing small bowel motility. RESULTS: We demonstrate successful water/fat separated RT-MRI for all tested applications. In the wrist, the resulting images provided clear depiction of ligament gaps and their interactions during the radial-to-ulnar deviation maneuver. In the heart, water/fat RT-MRI depicted epicardial fat, provided improved delineation of epicardial coronary arteries, and provided high blood-myocardial contrast for ventricular function assessment. In the abdomen, water-only RT-MRI captured small bowel motility clearly with improved water-fat contrast. CONCLUSIONS: We have demonstrated a novel and flexible bSSFP spiral OIOI sequence at 0.55T that can provide water/fat separated RT-MRI with a variety of application-specific temporal and spatial resolution requirements.
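The HIDEAL separation used here resolves water and fat from three spiral echoes; the core Dixon idea it builds on can be illustrated with a minimal two-echo (in-phase/opposed-phase) sketch. This is not the authors' implementation, and the image sizes and signal values below are synthetic.

```python
import numpy as np

# Minimal two-point Dixon illustration (not the three-echo HIDEAL algorithm of
# the paper): with one echo where water and fat are in phase and one where they
# are opposed, water and fat images follow from a sum and a difference.
rng = np.random.default_rng(0)
shape = (64, 64)                      # hypothetical image size
water_true = rng.random(shape)        # synthetic ground-truth water image
fat_true = rng.random(shape)          # synthetic ground-truth fat image

s_in = water_true + fat_true          # in-phase echo:      W + F
s_out = water_true - fat_true         # opposed-phase echo: W - F (ideal, no B0 error)

water = 0.5 * (s_in + s_out)          # recovered water
fat = 0.5 * (s_in - s_out)            # recovered fat
print(np.allclose(water, water_true), np.allclose(fat, fat_true))
```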
Affiliation(s)
- Ye Tian: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
- Krishna S. Nayak: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA; Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
4
Ribeiro V, Isaieva K, Leclere J, Felblinger J, Vuissoz PA, Laprie Y. Automatic segmentation of vocal tract articulators in real-time magnetic resonance imaging. Comput Methods Programs Biomed 2024; 243:107907. PMID: 37976615. DOI: 10.1016/j.cmpb.2023.107907.
Abstract
BACKGROUND AND OBJECTIVES: The characterization of vocal tract geometry during speech interests various research topics, including speech production modeling, motor control analysis, and speech therapy design. Real-time MRI is a reliable and non-invasive tool for this purpose. In most cases, it is necessary to know the contours of the individual articulators from the glottis to the lips. Several techniques have been proposed for segmenting vocal tract articulators, but most are limited to specific applications. Moreover, they often do not provide individualized contours for all soft-tissue articulators in a multi-speaker configuration. METHODS: A Mask R-CNN network was trained to detect and segment the vocal tract articulator contours in two real-time MRI (RT-MRI) datasets with speech recordings of multiple speakers. Two post-processing algorithms were then proposed to convert the network's outputs into geometrical curves. Nine articulators were considered: the two lips, tongue, soft palate, pharynx, arytenoid cartilage, epiglottis, thyroid cartilage, and vocal folds. A leave-one-out cross-validation protocol was used to evaluate inter-speaker generalization. The evaluation metrics were the point-to-closest-point (P2CP) distance and the Jaccard index (for articulators annotated as closed contours). RESULTS: The proposed method accurately segmented the vocal tract articulators, with an average root mean square point-to-closest-point distance (P2CPRMS) of less than 2.2 mm for all articulators in the leave-one-out cross-validation setting. The minimum P2CPRMS was 0.91 mm for the upper lip, and the maximum was 2.18 mm for the tongue. The Jaccard indices for the thyroid cartilage and vocal folds were 0.60 and 0.61, respectively. Additionally, the method adapted to a new subject with only ten annotated samples. CONCLUSIONS: Our research introduced a method for individually segmenting nine non-rigid vocal tract articulators in real-time MRI movies. The software is openly available to the speech community as an installable package. It is designed to support speech applications and clinical and non-clinical research in fields that require vocal tract geometry, such as speech, singing, and human beatboxing.
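The two evaluation metrics are straightforward to compute; the sketch below shows one plausible Python implementation of a symmetric RMS point-to-closest-point distance and the Jaccard index on toy data. The exact definitions used in the paper (e.g., symmetric versus one-sided distance, contour resampling) may differ.

```python
import numpy as np

def p2cp_rms(a, b):
    """Symmetric RMS point-to-closest-point distance between contours a and b,
    given as (N, 2) and (M, 2) arrays of (x, y) points in mm."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise distances
    d_ab = d.min(axis=1)            # each point of a to its closest point of b
    d_ba = d.min(axis=0)            # each point of b to its closest point of a
    return np.sqrt(np.mean(np.concatenate([d_ab, d_ba]) ** 2))

def jaccard(mask_a, mask_b):
    """Jaccard index between two boolean masks (closed-contour articulators)."""
    union = np.logical_or(mask_a, mask_b).sum()
    return np.logical_and(mask_a, mask_b).sum() / union if union else 1.0

# Toy example: a unit circle vs. a slightly shifted copy, and two square masks.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
print(p2cp_rms(circle, circle + [0.1, 0.0]))

mask1 = np.zeros((32, 32), dtype=bool); mask1[8:24, 8:24] = True
mask2 = np.roll(mask1, 3, axis=1)
print(jaccard(mask1, mask2))
```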
Affiliation(s)
- Vinicius Ribeiro: Université de Lorraine, CNRS, Inria, LORIA, Nancy, F-54000, France
- Karyna Isaieva: Université de Lorraine, INSERM, U1254, IADI, Nancy, F-54000, France
- Justine Leclere: Université de Lorraine, INSERM, U1254, IADI, Nancy, F-54000, France; Service de Médecine Bucco-dentaire, Hôpital Maison Blanche, Reims, F-51100, France
- Jacques Felblinger: Université de Lorraine, INSERM, U1254, IADI, Nancy, F-54000, France; CIC-IT 1433, INSERM, CHRU, Nancy, F-54000, France
- Yves Laprie: Université de Lorraine, CNRS, Inria, LORIA, Nancy, F-54000, France
5
Isaieva K, Odille F, Laprie Y, Drouot G, Felblinger J, Vuissoz PA. Super-Resolved Dynamic 3D Reconstruction of the Vocal Tract during Natural Speech. J Imaging 2023; 9:233. PMID: 37888339. PMCID: PMC10607793. DOI: 10.3390/jimaging9100233.
Abstract
MRI is the gold-standard modality for speech imaging. However, it remains relatively slow, which complicates imaging of fast movements. Thus, MRI of the vocal tract is often performed in 2D. While 3D MRI provides more information, the quality of such images is often insufficient. The goal of this study was to test the applicability of super-resolution algorithms to dynamic vocal tract MRI. In total, 25 sagittal slices of 8 mm with an in-plane resolution of 1.6 × 1.6 mm² were acquired consecutively using a highly undersampled radial 2D FLASH sequence. The volunteers read a text in French under two different protocols. The slices were aligned using the simultaneously recorded sound. The super-resolution strategy was used to reconstruct 1.6 × 1.6 × 1.6 mm³ isotropic volumes. The resulting images were less sharp than the native 2D images but demonstrated a higher signal-to-noise ratio. It was also shown that super-resolution eliminates inter-slice inconsistencies, leading to smooth transitions between the slices. Additionally, it was demonstrated that using visual stimuli and shorter text fragments improves the inter-slice consistency and the sharpness of the super-resolved images. Therefore, with an appropriate choice of speech task, the proposed method allows for the reconstruction of high-quality dynamic 3D volumes of the vocal tract during natural speech.
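Slice-to-slice alignment via the simultaneously recorded sound can be illustrated with a simple cross-correlation of audio envelopes; this is only a conceptual sketch, and the sampling rate, signals, and alignment strategy below are hypothetical (the paper's actual alignment pipeline is more involved).

```python
import numpy as np

def audio_lag(ref_audio, slice_audio, sr):
    """Estimate the time lag (s) of slice_audio relative to ref_audio by
    cross-correlating their normalized amplitude envelopes."""
    env_r = np.abs(ref_audio); env_r = (env_r - env_r.mean()) / (env_r.std() + 1e-12)
    env_s = np.abs(slice_audio); env_s = (env_s - env_s.mean()) / (env_s.std() + 1e-12)
    xcorr = np.correlate(env_s, env_r, mode="full")
    return (np.argmax(xcorr) - (len(env_r) - 1)) / sr

# Hypothetical example: the same short "speech" burst recorded 0.25 s later.
sr = 8000
ref = np.zeros(sr); ref[1000:1400] = 1.0
delayed = np.roll(ref, int(0.25 * sr))
print(audio_lag(ref, delayed, sr))   # ~0.25
```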
Affiliation(s)
- Karyna Isaieva: IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France
- Freddy Odille: IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France; CIC-IT 1433, CHRU de Nancy, INSERM, Université de Lorraine, F-54000 Nancy, France
- Yves Laprie: LORIA, Université de Lorraine, CNRS, INRIA, F-54000 Nancy, France
- Guillaume Drouot: CIC-IT 1433, CHRU de Nancy, INSERM, Université de Lorraine, F-54000 Nancy, France
- Jacques Felblinger: IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France; CIC-IT 1433, CHRU de Nancy, INSERM, Université de Lorraine, F-54000 Nancy, France
- Pierre-André Vuissoz: IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France
6
Guo K, Xiao Y, Deng W, Zhao G, Zhang J, Liang Y, Yang L, Liao G. Speech disorders in patients with tongue squamous cell carcinoma: a longitudinal observational study based on a questionnaire and acoustic analysis. BMC Oral Health 2023; 23:192. PMID: 37005608. PMCID: PMC10068158. DOI: 10.1186/s12903-023-02888-1.
Abstract
BACKGROUND: Speech disorders are common dysfunctions in patients with tongue squamous cell carcinoma (TSCC) that can diminish their quality of life. There are few studies with multidimensional and longitudinal assessments of speech function in TSCC patients. METHODS: This longitudinal observational study was conducted at the Hospital of Stomatology, Sun Yat-sen University, China, from January 2018 to March 2021. A cohort of 92 patients (53 males, age range: 24-77 years) diagnosed with TSCC participated in this study. Speech function was assessed from preoperatively to one year postoperatively using the Speech Handicap Index questionnaire and acoustic parameters. The risk factors for postoperative speech disorder were analyzed by a linear mixed-effects model. A t test or Mann-Whitney U test was applied to analyze differences in acoustic parameters under the influence of risk factors, in order to determine the pathophysiological mechanisms of speech disorders in patients with TSCC. RESULTS: The incidence of preoperative speech disorders was 58.7%, which increased to 91.4% after surgery. Higher T stage (P < 0.001) and a larger range of tongue resection (P = 0.002) were risk factors for postoperative speech disorders. Among the acoustic parameters, F2 of /i/ decreased markedly with higher T stage (P = 0.021) and larger range of tongue resection (P = 0.009), indicating restricted tongue movement in the anterior-posterior direction. Acoustic analysis during the follow-up period showed that F1 and F2 did not differ significantly over time in patients with subtotal or total glossectomy. CONCLUSIONS: Speech disorders in TSCC patients are common and persistent. Less residual tongue volume led to worse speech-related quality of life, indicating that surgically restoring the length of the tongue and strengthening tongue extension postoperatively may be important.
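As a small illustration of the kind of group comparison described (a Mann-Whitney U test on an acoustic parameter such as F2 of /i/), here is a Python sketch on synthetic values; the numbers are invented and do not reproduce the study's data, and SciPy is assumed to be available.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical F2 values (Hz) of the vowel /i/, grouped by extent of tongue
# resection; purely illustrative, not the study's data.
rng = np.random.default_rng(1)
f2_smaller_resection = rng.normal(2100, 150, size=30)
f2_subtotal_or_total = rng.normal(1850, 150, size=25)

stat, p = mannwhitneyu(f2_smaller_resection, f2_subtotal_or_total,
                       alternative="two-sided")
print(f"U = {stat:.1f}, P = {p:.4g}")
# A lower F2 of /i/ in the larger-resection group would be consistent with
# restricted anterior-posterior tongue movement, as reported above.
```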
Affiliation(s)
- Kaixin Guo: Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Hospital of Stomatology, Sun Yat-sen University, 56th Lingyuanxi Road, Guangzhou, Guangdong, 510055, China; Guangdong Provincial Key Laboratory of Stomatology, No. 74, 2nd Zhongshan Road, Guangzhou, Guangdong, 510080, China
- Yudong Xiao: Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Hospital of Stomatology, Sun Yat-sen University, 56th Lingyuanxi Road, Guangzhou, Guangdong, 510055, China; Guangdong Provincial Key Laboratory of Stomatology, No. 74, 2nd Zhongshan Road, Guangzhou, Guangdong, 510080, China
- Wei Deng: Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Hospital of Stomatology, Sun Yat-sen University, 56th Lingyuanxi Road, Guangzhou, Guangdong, 510055, China; Guangdong Provincial Key Laboratory of Stomatology, No. 74, 2nd Zhongshan Road, Guangzhou, Guangdong, 510080, China
- Guiyi Zhao: Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Hospital of Stomatology, Sun Yat-sen University, 56th Lingyuanxi Road, Guangzhou, Guangdong, 510055, China; Guangdong Provincial Key Laboratory of Stomatology, No. 74, 2nd Zhongshan Road, Guangzhou, Guangdong, 510080, China
- Jie Zhang: Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Hospital of Stomatology, Sun Yat-sen University, 56th Lingyuanxi Road, Guangzhou, Guangdong, 510055, China; Guangdong Provincial Key Laboratory of Stomatology, No. 74, 2nd Zhongshan Road, Guangzhou, Guangdong, 510080, China
- Yujie Liang: Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Hospital of Stomatology, Sun Yat-sen University, 56th Lingyuanxi Road, Guangzhou, Guangdong, 510055, China; Guangdong Provincial Key Laboratory of Stomatology, No. 74, 2nd Zhongshan Road, Guangzhou, Guangdong, 510080, China
- Le Yang: Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Hospital of Stomatology, Sun Yat-sen University, 56th Lingyuanxi Road, Guangzhou, Guangdong, 510055, China; Guangdong Provincial Key Laboratory of Stomatology, No. 74, 2nd Zhongshan Road, Guangzhou, Guangdong, 510080, China
- Guiqing Liao: Department of Oral and Maxillofacial Surgery, Guanghua School of Stomatology, Hospital of Stomatology, Sun Yat-sen University, 56th Lingyuanxi Road, Guangzhou, Guangdong, 510055, China; Guangdong Provincial Key Laboratory of Stomatology, No. 74, 2nd Zhongshan Road, Guangzhou, Guangdong, 510080, China
7
Feng L. 4D Golden-Angle Radial MRI at Subsecond Temporal Resolution. NMR Biomed 2023; 36:e4844. PMID: 36259951. PMCID: PMC9845193. DOI: 10.1002/nbm.4844.
Abstract
Intraframe motion blurring, a major challenge in free-breathing dynamic MRI, can be reduced if high temporal resolution can be achieved. To address this challenge, this work proposes a highly accelerated 4D (3D + time) dynamic MRI framework with subsecond temporal resolution that does not require explicit motion compensation. The method combines standard stack-of-stars golden-angle radial sampling and tailored GRASP-Pro (Golden-angle RAdial Sparse Parallel imaging with imProved performance) reconstruction. Specifically, 4D dynamic MRI acquisition is performed continuously without motion gating or sorting. The k-space centers in stack-of-stars radial data are organized to guide estimation of a temporal basis, with which GRASP-Pro reconstruction is employed to enforce joint low-rank subspace and sparsity constraints. This basis estimation strategy is the new feature proposed in this work for subspace-based reconstruction to achieve high temporal resolution (e.g., less than one second per 3D volume). It does not require sequence modification to acquire additional navigation data, it is compatible with commercially available stack-of-stars sequences, and it does not need an intermediate reconstruction step. The proposed 4D dynamic MRI approach was tested in an abdominal motion phantom, free-breathing abdominal MRI, and dynamic contrast-enhanced MRI (DCE-MRI). Our results show that GRASP-Pro reconstruction with the new basis estimation strategy enables highly accelerated 4D dynamic imaging at subsecond temporal resolution (with five spokes or fewer for each dynamic frame per image slice) for both free-breathing non-DCE-MRI and DCE-MRI. In the abdominal phantom, better image quality with lower root mean square error and higher structural similarity index was achieved using GRASP-Pro compared with standard GRASP. With the ability to acquire each 3D image in less than 1 s, intraframe respiratory blurring can be intrinsically reduced for body applications with our approach, which eliminates the need for explicit motion detection and motion compensation.
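The key step described, estimating a temporal basis directly from the k-space centers of stack-of-stars data, amounts to an SVD of a Casorati matrix of navigator-like signals. The sketch below shows that step on synthetic data; the dimensions, signal model, and the subsequent subspace-constrained reconstruction are simplified stand-ins, not the published GRASP-Pro implementation.

```python
import numpy as np

# Organize k-space center samples (coil x kz channels, over time frames) into a
# Casorati matrix and take its SVD to obtain a temporal basis for low-rank,
# subspace-constrained reconstruction. All dimensions here are hypothetical.
rng = np.random.default_rng(0)
n_coils, n_kz, n_frames, rank = 8, 32, 400, 4

t = np.linspace(0, 1, n_frames)
true_modes = np.stack([np.sin(2 * np.pi * (k + 1) * t) for k in range(rank)])
mixing = rng.normal(size=(n_coils * n_kz, rank))
centers = mixing @ true_modes + 0.05 * rng.normal(size=(n_coils * n_kz, n_frames))

_, _, vh = np.linalg.svd(centers, full_matrices=False)
basis = vh[:rank]                 # (rank, n_frames) temporal basis

# A dynamic series x(r, t) is then modeled as coeffs(r) @ basis, so the
# reconstruction only needs to solve for `rank` coefficient maps per voxel.
print(basis.shape)
```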
Affiliation(s)
- Li Feng: Biomedical Engineering and Imaging Institute and Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
8
Herbst CT, Emerich K, Mayr MA, Rudisch A, Kremser C, Talasz H, Kofler M. Time-Synchronized MRI-Assessment of Respiratory Apparatus Subsystems: A Feasibility Study. J Voice 2023:S0892-1997(22)00358-7. PMID: 36642590. DOI: 10.1016/j.jvoice.2022.11.006.
Abstract
The thorax (TH), the thoracic diaphragm (TD), and the abdominal wall (AW) are three sub-systems of the respiratory apparatus whose displacement motion has been well studied with magnetic resonance imaging (MRI). Another sub-system, which has received less research attention with respect to breathing, is the pelvic floor (PF). In particular, no study has investigated the displacement of all four sub-systems simultaneously. Addressing this issue, the purpose of this feasibility study was to establish a data acquisition paradigm for time-synchronous quantitative analysis of dynamic MRI data from these four major contributors to respiration and phonation (TH, TD, AW, and PF). Three healthy females were asked to breathe in and out forcefully while being recorded in a 1.5 T whole-body MR scanner. Forty MRI data frames were acquired over a sequence spanning 15.12 seconds. Each data frame contained two slices, simultaneously documenting the mid-sagittal (TH, TD, PF) and transversal (AW) planes. The displacement motion of the four anatomical structures of interest was documented using kymographic analysis, resulting in time-varying calibrated structure displacement data. After computing the fundamental frequency of the cyclical breathing motion, the phase offsets of the TH, PF, and AW with respect to the TD were computed. Data analysis revealed three fundamentally different displacement patterns. Total structure displacement ranged from 0.94 cm (TH) to 4.27 cm (TD). Phase delays of up to 90° (i.e., a quarter of a breathing cycle) between different structures were found. Motion offsets in the range of -28.30° to 14.90° were computed for the PF with respect to the TD. The diversity of results in only three investigated participants suggests a variety of possible breathing strategies, warranting further research.
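Computing the breathing fundamental and the phase offsets between structure displacements can be illustrated with a small FFT-based sketch; the traces, frame rate, and frequencies below are synthetic stand-ins for the kymographic displacement data, not the study's measurements.

```python
import numpy as np

def fundamental_and_phase(reference, signal, fs):
    """Breathing fundamental (Hz) from `reference`, and the phase offset (deg)
    of `signal` relative to `reference` at that frequency."""
    ref_f = np.fft.rfft(reference - reference.mean())
    sig_f = np.fft.rfft(signal - signal.mean())
    k = np.argmax(np.abs(ref_f[1:])) + 1             # dominant non-DC bin
    phase = np.degrees(np.angle(sig_f[k]) - np.angle(ref_f[k]))
    phase = (phase + 180) % 360 - 180                 # wrap to [-180, 180)
    return np.fft.rfftfreq(len(reference), d=1 / fs)[k], phase

# Hypothetical traces: 40 frames over 15.12 s, with a second structure lagging
# the diaphragm by a quarter of a breathing cycle.
fs = 40 / 15.12
t = np.arange(40) / fs
diaphragm = np.sin(2 * np.pi * 0.33 * t)
abdominal_wall = np.sin(2 * np.pi * 0.33 * t - np.pi / 2)
print(fundamental_and_phase(diaphragm, abdominal_wall, fs))  # ~(0.33 Hz, -90 deg)
```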
Affiliation(s)
- Christian T Herbst: Department of Vocal Studies, Mozarteum University, Salzburg, Austria; Janette Ogg Voice Research Center, Shenandoah Conservatory, Winchester, VA, USA
- Kate Emerich: University of Denver, Lamont School of Music, Newman Center for the Performing Arts, Denver, CO, USA; Vocal Essentials, LLC, Denver, CO, USA
- Michaela A Mayr: Antonio Salieri Department of Vocal Studies and Vocal Research in Music Education, University of Music and Performing Arts, Vienna, Austria
- Ansgar Rudisch: Department of Radiology, Medical University of Innsbruck, Austria
- Helena Talasz: Department of Internal Medicine, Hochzirl Hospital, Zirl, Austria
- Markus Kofler: Department of Neurology, Hochzirl Hospital, Zirl, Austria
9
Al-hammuri K, Gebali F, Thirumarai Chelvan I, Kanan A. Tongue Contour Tracking and Segmentation in Lingual Ultrasound for Speech Recognition: A Review. Diagnostics (Basel) 2022; 12:2811. PMID: 36428870. PMCID: PMC9689563. DOI: 10.3390/diagnostics12112811.
Abstract
Lingual ultrasound imaging is essential in linguistic research and speech recognition. It has been used widely in applications such as visual feedback to enhance language learning for non-native speakers, the study and remediation of speech-related disorders, articulation research and analysis, swallowing studies, 3D tongue modelling, and silent speech interfaces. This article provides a comparative analysis and review, based on quantitative and qualitative criteria, of the two main streams of tongue contour segmentation from ultrasound images. The first stream utilizes traditional computer vision and image processing algorithms for tongue segmentation. The second stream uses machine learning and deep learning algorithms. The results show that tongue tracking using machine learning-based techniques is superior to traditional techniques, considering performance and algorithm generalization ability. Meanwhile, traditional techniques are helpful for implementing interactive image segmentation to extract valuable features during training and postprocessing. We recommend a hybrid approach that combines machine learning and traditional techniques to implement a real-time tongue segmentation tool.
Affiliation(s)
- Khalid Al-hammuri: Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC V8W 2Y2, Canada
- Fayez Gebali: Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC V8W 2Y2, Canada
- Awos Kanan: Department of Computer Engineering, Princess Sumaya University for Technology, Amman 11941, Jordan
10
3D Dynamic Spatiotemporal Atlas of the Vocal Tract during Consonant–Vowel Production from 2D Real Time MRI. J Imaging 2022; 8:227. PMID: 36135393. PMCID: PMC9504642. DOI: 10.3390/jimaging8090227.
Abstract
In this work, we address the problem of creating a 3D dynamic atlas of the vocal tract that captures the dynamics of the articulators in all three dimensions, in order to create a global speaker model independent of speaker-specific characteristics. The core steps of the proposed method are the temporal alignment of the real-time MR images acquired in several sagittal planes and their combination with adaptive kernel regression. As a preprocessing step, a reference space was created in order to remove the speakers' anatomical information and keep only the variability in speech production for the construction of the atlas. The adaptive kernel regression makes the choice of atlas time points independent of the time points of the frames used as input for the construction. The atlas construction method was evaluated by mapping two new speakers to the atlas and assessing how similar the resulting mapped images are. The use of the atlas helps in reducing subject variability. The results show that the proposed atlas can capture the dynamic behavior of the articulators and is able to generalize the speech production process by creating a universal-speaker reference space.
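The combination step, adaptive kernel regression over temporally aligned frames, can be sketched as a Nadaraya-Watson estimator whose bandwidth adapts to the local density of time samples. This is a simplified stand-in for the method in the paper; the frames, times, and bandwidth rule below are hypothetical.

```python
import numpy as np

def kernel_regression_frame(frames, times, tau, k_nearest=5):
    """Estimate an atlas frame at normalized time `tau` by Gaussian kernel
    regression over frames acquired at (aligned) times `times`. The bandwidth
    adapts to the distance of the k-th nearest time sample, a simple stand-in
    for an adaptive scheme."""
    d = np.abs(times - tau)
    h = np.sort(d)[min(k_nearest, len(d) - 1)] + 1e-6   # adaptive bandwidth
    w = np.exp(-0.5 * (d / h) ** 2)
    w /= w.sum()
    return np.tensordot(w, frames, axes=(0, 0))          # weighted frame average

# Hypothetical input: 50 frames of 64 x 64 images at irregular normalized times.
rng = np.random.default_rng(0)
times = np.sort(rng.random(50))
frames = rng.random((50, 64, 64))
print(kernel_regression_frame(frames, times, tau=0.5).shape)   # (64, 64)
```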
11
Isaieva K, Laprie Y, Leclère J, Douros IK, Felblinger J, Vuissoz PA. Multimodal dataset of real-time 2D and static 3D MRI of healthy French speakers. Sci Data 2021; 8:258. PMID: 34599194. PMCID: PMC8486854. DOI: 10.1038/s41597-021-01041-3.
Abstract
The study of articulatory gestures has a wide spectrum of applications, notably in speech production and recognition. Sets of phonemes, as well as their articulation, are language-specific; however, existing MRI databases mostly include English speakers. In our present work, we introduce a dataset acquired with MRI from 10 healthy native French speakers. A corpus consisting of synthetic sentences was used to ensure a good coverage of the French phonetic context. A real-time MRI technology with a temporal resolution of 20 ms was used to acquire vocal tract images of the participants speaking. The sound was recorded simultaneously with MRI, denoised and temporally aligned with the images. The speech was transcribed to obtain phoneme-wise segmentation of the sound. We also acquired static 3D MR images for a wide list of French phonemes. In addition, we include annotations of spontaneous swallowing.
Measurement(s): Vocal tract images; Speech
Technology Type(s): Magnetic Resonance Imaging; Microphone Device
Sample Characteristic - Organism: Homo sapiens
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.16404453
Affiliation(s)
- Karyna Isaieva: Université de Lorraine, INSERM, IADI, Nancy, F-54000, France
- Yves Laprie: Université de Lorraine, CNRS, Inria, LORIA, Nancy, F-54000, France
- Justine Leclère: Université de Lorraine, INSERM, IADI, Nancy, F-54000, France; Oral Medicine Department, University Hospital of Reims, 45 rue Cognacq-Jay, 51092 Reims Cedex, France
- Ioannis K Douros: Université de Lorraine, INSERM, IADI, Nancy, F-54000, France; Université de Lorraine, CNRS, Inria, LORIA, Nancy, F-54000, France
- Jacques Felblinger: Université de Lorraine, INSERM, IADI, Nancy, F-54000, France; CIC-IT, INSERM, CHRU de Nancy, Nancy, F-54000, France
12
Fischer J, Özen AC, Ilbey S, Traser L, Echternach M, Richter B, Bock M. Sub-millisecond 2D MRI of the vocal fold oscillation using single-point imaging with rapid encoding. Magn Reson Mater Phys Biol Med 2021; 35:301-310. PMID: 34542771. PMCID: PMC8995286. DOI: 10.1007/s10334-021-00959-4.
Abstract
OBJECTIVE: The slow spatial encoding of MRI has precluded its application to rapid physiologic motion in the past. The purpose of this study is to introduce a new fast acquisition method and to demonstrate the feasibility of encoding the rapid two-dimensional motion of human vocal folds with sub-millisecond resolution. METHOD: In our previous work, we achieved high temporal resolution by applying a rapidly switched phase encoding gradient along the direction of motion. In this work, we extend phase encoding to the second image direction by using single-point imaging with rapid encoding (SPIRE) to image the two-dimensional vocal fold oscillation in the coronal view. Image data were gated using electroglottography (EGG) and motion corrected. An iterative reconstruction with a total variation (TV) constraint was used, and the sequence was also simulated using a motion phantom. RESULTS: Dynamic images of the vocal folds during phonation at pitches of 150 and 165 Hz were acquired in two volunteers, showing the periodic motion of the vocal folds at a temporal resolution of about 600 µs. The simulations emphasize the necessity of SPIRE for two-dimensional motion encoding. DISCUSSION: SPIRE is a new MRI method for imaging rapidly oscillating structures and for the first time provides dynamic images of vocal fold oscillations in the coronal plane.
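A total-variation-constrained iterative reconstruction of the kind mentioned can be sketched with plain gradient descent on a smoothed TV penalty; the toy below uses a single-coil Cartesian undersampled FFT model rather than the SPIRE acquisition, and all sizes and weights are hypothetical.

```python
import numpy as np

def tv_grad(x, eps=1e-2):
    """Gradient of a smoothed isotropic total-variation penalty (periodic BCs)."""
    dx = np.roll(x, -1, axis=0) - x
    dy = np.roll(x, -1, axis=1) - x
    mag = np.sqrt(dx ** 2 + dy ** 2 + eps)
    px, py = dx / mag, dy / mag
    return -((px - np.roll(px, 1, axis=0)) + (py - np.roll(py, 1, axis=1)))

def tv_recon(y, mask, lam=0.02, step=0.5, n_iter=200):
    """Gradient descent on 0.5*||mask*FFT(x) - y||^2 + lam*TV(x)."""
    x = np.real(np.fft.ifft2(y, norm="ortho"))            # zero-filled start
    for _ in range(n_iter):
        resid = mask * np.fft.fft2(x, norm="ortho") - y
        grad_data = np.real(np.fft.ifft2(mask * resid, norm="ortho"))
        x = x - step * (grad_data + lam * tv_grad(x))
    return x

# Toy example: piecewise-constant phantom, ~3x undersampled Cartesian k-space.
rng = np.random.default_rng(0)
img = np.zeros((64, 64)); img[16:48, 16:48] = 1.0; img[24:40, 24:40] = 0.5
mask = rng.random((64, 64)) < 1 / 3
y = mask * np.fft.fft2(img, norm="ortho")
print(np.abs(tv_recon(y, mask) - img).mean())
```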
Affiliation(s)
- Johannes Fischer: Department of Radiology, Medical Physics, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Ali Caglar Özen: Department of Radiology, Medical Physics, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; German Consortium for Translational Cancer Research Partner Site Freiburg, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Serhat Ilbey: Department of Radiology, Medical Physics, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Louisa Traser: Freiburg Institute for Musicians' Medicine, Freiburg University Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Matthias Echternach: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Ludwig-Maximilians-University, Munich, Germany
- Bernhard Richter: Freiburg Institute for Musicians' Medicine, Freiburg University Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Michael Bock: Department of Radiology, Medical Physics, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
13
Tian Y, Lim Y, Zhao Z, Byrd D, Narayanan S, Nayak KS. Aliasing artifact reduction in spiral real-time MRI. Magn Reson Med 2021; 86:916-925. PMID: 33728700. DOI: 10.1002/mrm.28746.
Abstract
PURPOSE: To mitigate a common artifact in spiral real-time MRI caused by aliasing of signal outside the desired FOV. This artifact frequently occurs in midsagittal speech real-time MRI. METHODS: Simulations were performed to determine the likely origin of the artifact. Two methods to mitigate the artifact are proposed. The first approach, denoted "large FOV" (LF), keeps an FOV that is large enough to include the artifact signal source during reconstruction. The second approach, denoted "estimation-subtraction" (ES), estimates the artifact signal source and then subtracts a synthetic signal representing that source from the multicoil k-space raw data. Twenty-five midsagittal speech-production real-time MRI data sets were used to evaluate both of the proposed methods. Reconstructions without and with corrections were evaluated by two expert readers using a 5-level Likert scale assessing artifact severity. Reconstruction time was also compared. RESULTS: The origin of the artifact was found to be a combination of gradient nonlinearity and imperfect anti-aliasing in spiral sampling. The LF and ES methods were both able to substantially reduce the artifact, with average qualitative score improvements of 1.25 and 1.35 Likert levels for LF and ES correction, respectively. Average reconstruction times without correction, with LF correction, and with ES correction were 160.69 ± 1.56, 526.43 ± 5.17, and 171.47 ± 1.71 ms/frame. CONCLUSION: Both proposed methods were able to reduce the spiral aliasing artifacts, with the ES method being more effective and more time-efficient.
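The estimation-subtraction idea, modeling the out-of-FOV source in k-space and removing a synthetic copy of it, can be caricatured in a 1D Cartesian multicoil toy; the real method handles spiral sampling and gradient nonlinearity, so everything below (source position, coil count, signal model) is hypothetical.

```python
import numpy as np

# A point-like source at a known position outside the nominal FOV contributes a
# complex exponential to every coil's k-space data; estimate its per-coil
# amplitude by least squares and subtract a synthetic copy of that signal.
rng = np.random.default_rng(0)
n_k, n_coils = 256, 4
k = np.arange(n_k) - n_k // 2                   # Cartesian k-space indices
x_src = 0.75                                    # source position (outside +/- 0.5 FOV)

model = np.exp(-2j * np.pi * k * x_src)         # k-space signature of the source
wanted = np.fft.fft(rng.normal(size=n_k))       # stand-in for the desired object signal
amps = rng.normal(size=n_coils) + 1j * rng.normal(size=n_coils)
data = wanted[None, :] + amps[:, None] * model[None, :]    # multicoil k-space

est = data @ np.conj(model) / np.vdot(model, model)        # least-squares amplitudes
cleaned = data - est[:, None] * model[None, :]             # subtract synthetic source

before = np.linalg.norm(data - wanted[None, :])
after = np.linalg.norm(cleaned - wanted[None, :])
print(f"residual artifact energy: {before:.1f} -> {after:.1f}")
```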
Affiliation(s)
- Ye Tian: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Yongwan Lim: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Ziwei Zhao: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Dani Byrd: Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Shrikanth Narayanan: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA; Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Krishna S Nayak: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA