1
Heckel R, Jacob M, Chaudhari A, Perlman O, Shimron E. Deep learning for accelerated and robust MRI reconstruction. MAGMA 2024; 37:335-368. [PMID: 39042206] [DOI: 10.1007/s10334-024-01173-8]
Abstract
Deep learning (DL) has recently emerged as a pivotal technology for enhancing magnetic resonance imaging (MRI), a critical tool in diagnostic radiology. This review paper provides a comprehensive overview of recent advances in DL for MRI reconstruction, and focuses on various DL approaches and architectures designed to improve image quality, accelerate scans, and address data-related challenges. It explores end-to-end neural networks, pre-trained and generative models, and self-supervised methods, and highlights their contributions to overcoming traditional MRI limitations. It also discusses the role of DL in optimizing acquisition protocols, enhancing robustness against distribution shifts, and tackling biases. Drawing on the extensive literature and practical insights, it outlines current successes, limitations, and future directions for leveraging DL in MRI reconstruction, while emphasizing the potential of DL to significantly impact clinical imaging practices.
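As a toy illustration of the kind of physics-based building block such reconstruction networks interleave with learned components (a generic sketch, not code from the review itself), a single gradient step on the k-space data-fidelity term ||M F x - y||² can be written as:

```python
import numpy as np

def data_consistency_step(x, y, mask, step=1.0):
    """One gradient-descent step on ||mask * FFT(x) - y||^2, the data-fidelity
    term that unrolled reconstruction networks alternate with learned denoisers."""
    residual = mask * np.fft.fft2(x) - y               # error at sampled k-space locations
    return x - step * np.fft.ifft2(mask * residual)

rng = np.random.default_rng(0)
img = rng.random((16, 16))                             # toy "ground truth" image
mask = rng.random((16, 16)) < 0.4                      # random undersampling pattern
y = mask * np.fft.fft2(img)                            # simulated undersampled measurements

x = data_consistency_step(np.zeros((16, 16), dtype=complex), y, mask)
# x now agrees with the measurements at every sampled k-space location
```

In a full unrolled network this step would alternate with a learned denoiser; here the single step simply enforces consistency with the sampled k-space data.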
Affiliation(s)
- Reinhard Heckel
- Department of Computer Engineering, Technical University of Munich, Munich, Germany
- Mathews Jacob
- Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA 52242, USA
- Akshay Chaudhari
- Department of Radiology, Stanford University, Stanford, CA 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Or Perlman
- Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Efrat Shimron
- Department of Electrical and Computer Engineering, Technion-Israel Institute of Technology, Haifa 3200004, Israel
- Department of Biomedical Engineering, Technion-Israel Institute of Technology, Haifa 3200004, Israel
2
Badin P, Sawallis TR, Tabain M, Lamalle L. Bilinguals from Larynx to Lips: Exploring Bilingual Articulatory Strategies with Anatomic MRI Data. Language and Speech 2024:238309231224790. [PMID: 38680040] [DOI: 10.1177/00238309231224790]
Abstract
The goal of this article is to illustrate the use of MRI for exploring bi- and multi-lingual articulatory strategies. One male and one female speaker recorded sets of static midsagittal MRIs of the whole vocal tract, producing vowels as well as consonants in various vowel contexts in either the male's two or the female's three languages. Both speakers were native speakers of English (American and Australian English, respectively), and both were fluent L2 speakers of French. In addition, the female speaker was a heritage speaker of Croatian. Articulatory contours extracted from the MRIs were subsequently used at three progressively more compact and abstract levels of analysis. (1) Direct comparison of overlaid contours was used to assess whether phones analogous across L1 and L2 are similar or dissimilar, both overall and in specific vocal tract regions. (2) Consonant contour variability along the vocal tract due to vowel context was determined using dispersion ellipses and used to explore the variable resistance to coarticulation for non-analogous rhotics and analogous laterals in Australian, French, and Croatian. (3) Articulatory modeling was used to focus on specific articulatory gestures (tongue position and shape, lip protrusion, laryngeal height, etc.) and then to explore the articulatory strategies in the speakers' interlanguages for production of the French front rounded vowel series. This revealed that the Australian and American speakers used different strategies to produce the non-analogous French vowel series. We conclude that MRI-based articulatory data constitute a very rich and underused source of information that amply deserves applications to the study of L2 articulation and bilingual and multi-lingual speech.
Affiliation(s)
- Pierre Badin
- Institute of Engineering, Université Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, France
- Marija Tabain
- Department of Languages and Linguistics, La Trobe University, Australia
- Laurent Lamalle
- Université Grenoble Alpes and CHU de Grenoble, Inserm US 17, CNRS UMS 3552, UMS IRMaGe, France
3
Shahid MS, French AP, Valstar MF, Yakubov GE. Research in methodologies for modelling the oral cavity. Biomed Phys Eng Express 2024; 10:032001. [PMID: 38350128] [DOI: 10.1088/2057-1976/ad28cc]
Abstract
The paper aims to explore the current state of understanding surrounding in silico oral modelling. This involves exploring methodologies, technologies and approaches pertaining to the modelling of the whole oral cavity: both internally and externally visible structures that may be relevant or appropriate to oral actions. Such a model could be referred to as a 'complete model', one that includes a full set of facial features (i.e., not only the mouth) as well as synergistic stimuli such as audio and facial thermal data. 3D modelling technologies capable of accurately and efficiently capturing a complete representation of the mouth for an individual have broad applications in the study of oral actions, due to their cost-effectiveness and time efficiency. This review delves into the field of clinical phonetics to classify oral actions pertaining to both speech and non-speech movements, identifying how the various vocal organs play a role in the articulatory and masticatory processes. Vitally, it provides a summation of 12 articulatory recording methods, forming a tool researchers can use to identify which recording method is appropriate for their work. After addressing the cost- and resource-intensive limitations of existing methods, a new system of modelling is proposed that leverages external-to-internal correlation modelling techniques to create more efficient models of the oral cavity. The vision is that the outcomes will be applicable to a broad spectrum of oral functions related to physiology, health and wellbeing, including speech, oral processing of foods, and dental health. The applications may span from speech correction to designing foods for the ageing population, while in the dental field such a model could provide information about a patient's oral actions as part of creating a personalised dental treatment plan.
Affiliation(s)
- Andrew P French
- School of Computer Science, University of Nottingham, NG8 1BB, United Kingdom
- School of Biosciences, University of Nottingham, LE12 5RD, United Kingdom
- Michel F Valstar
- School of Computer Science, University of Nottingham, NG8 1BB, United Kingdom
- Gleb E Yakubov
- School of Biosciences, University of Nottingham, LE12 5RD, United Kingdom
4
Belyk M, Carignan C, McGettigan C. An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images. Behav Res Methods 2024; 56:2623-2635. [PMID: 37507650] [PMCID: PMC10990993] [DOI: 10.3758/s13428-023-02171-9]
Abstract
Real-time magnetic resonance imaging (rtMRI) is a technique that provides high-contrast videographic data of human anatomy in motion. Applied to the vocal tract, it is a powerful method for capturing the dynamics of speech and other vocal behaviours by imaging structures internal to the mouth and throat. These images provide a means of studying the physiological basis for speech, singing, expressions of emotion, and swallowing that are otherwise not accessible for external observation. However, taking quantitative measurements from these images is notoriously difficult. We introduce a signal processing pipeline that produces outlines of the vocal tract from the lips to the larynx as a quantification of the dynamic morphology of the vocal tract. Our approach performs simple tissue classification, but constrained to a researcher-specified region of interest. This combination facilitates feature extraction while retaining the domain-specific expertise of a human analyst. We demonstrate that this pipeline generalises well across datasets covering behaviours such as speech, vocal size exaggeration, laughter, and whistling, as well as producing reliable outcomes across analysts, particularly among users with domain-specific expertise. With this article, we make this pipeline available for immediate use by the research community, and further suggest that it may contribute to the continued development of fully automated methods based on deep learning algorithms.
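The ROI-constrained tissue classification described above can be sketched as a simple intensity threshold applied only inside a researcher-drawn mask (purely illustrative; this is not the toolbox's actual implementation):

```python
import numpy as np

def classify_airway(frame, roi_mask, threshold=None):
    """Classify pixels inside a region of interest as airway (dark) vs tissue (bright).

    frame:     2D array of image intensities (one rtMRI frame)
    roi_mask:  2D boolean array marking the researcher-specified region of interest
    threshold: intensity cutoff; if None, use the midpoint of the ROI intensity range
    """
    roi_vals = frame[roi_mask]
    if threshold is None:
        # crude two-class split: midpoint between darkest and brightest ROI pixels
        threshold = 0.5 * (roi_vals.min() + roi_vals.max())
    airway = np.zeros_like(roi_mask, dtype=bool)
    airway[roi_mask] = frame[roi_mask] < threshold
    return airway

# toy frame: bright "tissue" everywhere, a dark 2x2 "airway" patch inside the ROI
frame = np.full((6, 6), 100.0)
frame[2:4, 2:4] = 10.0
roi = np.zeros((6, 6), dtype=bool)
roi[1:5, 1:5] = True

airway = classify_airway(frame, roi)
```

Constraining the classification to the ROI is what lets a human analyst's domain expertise (where the vocal tract can plausibly be) guide an otherwise very simple classifier.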
Affiliation(s)
- Michel Belyk
- Department of Psychology, Edge Hill University, Ormskirk, UK
- Christopher Carignan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Carolyn McGettigan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
5
Lim Y, Kumar P, Nayak KS. Speech production real-time MRI at 0.55 T. Magn Reson Med 2024; 91:337-343. [PMID: 37799039] [DOI: 10.1002/mrm.29843]
Abstract
PURPOSE: To demonstrate speech-production real-time MRI (RT-MRI) using a contemporary 0.55T system, and to identify opportunities for improved performance compared with conventional field strengths. METHODS: Experiments were performed on healthy adult volunteers using a 0.55T MRI system with high-performance gradients and a custom 8-channel upper-airway coil. Imaging was performed using spiral-based balanced SSFP (bSSFP) and gradient-recalled echo (GRE) pulse sequences with a temporal finite-difference constrained reconstruction. Speech-production RT-MRI was performed with three spiral readout durations (8.90, 5.58, and 3.48 ms) to determine trade-offs with respect to articulator contrast, blurring, banding artifacts, and overall image quality. RESULTS: Both spiral GRE and bSSFP captured tongue boundary dynamics during rapid consonant-vowel syllables. Although bSSFP provided substantially higher SNR in all vocal tract articulators than GRE, it suffered from banding artifacts at TR > 10.9 ms. Spiral bSSFP with the shortest readout duration (3.48 ms, TR = 5.30 ms) had the best image quality, with a 1.54-times boost in SNR compared with an equivalent GRE sequence. Longer readout durations increased SNR efficiency but also increased blurring in both bSSFP and GRE. CONCLUSION: High-performance 0.55T MRI systems can be used for speech-production RT-MRI. Spiral bSSFP can be used without banding artifacts in the vocal tract articulators, and provides better SNR efficiency and image quality than is typically achieved at 1.5 T or 3 T.
Affiliation(s)
- Yongwan Lim
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Prakash Kumar
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Krishna S Nayak
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
6
Bell LC, Shimron E. Sharing Data Is Essential for the Future of AI in Medical Imaging. Radiol Artif Intell 2024; 6:e230337. [PMID: 38231036] [PMCID: PMC10831510] [DOI: 10.1148/ryai.230337]
Abstract
If we want artificial intelligence to succeed in radiology, we must share data and learn how to share data.
Affiliation(s)
- Laura C. Bell
- Clinical Imaging Group, Genentech, 1 DNA Way, South San Francisco, CA 94080, USA
- Efrat Shimron
- Department of Electrical and Computer Engineering and Department of Biomedical Engineering, Technion-Israel Institute of Technology, Haifa, Israel
7
Ruthven M, Peplinski AM, Adams DM, King AP, Miquel ME. Real-time speech MRI datasets with corresponding articulator ground-truth segmentations. Sci Data 2023; 10:860. [PMID: 38042857] [PMCID: PMC10693552] [DOI: 10.1038/s41597-023-02766-z]
Abstract
The use of real-time magnetic resonance imaging (rt-MRI) of speech is increasing in clinical practice and speech science research. Analysis of such images often requires segmentation of articulators and the vocal tract, and the community is turning to deep-learning-based methods to perform this segmentation. While there are publicly available rt-MRI datasets of speech, these do not include ground-truth (GT) segmentations, a key requirement for the development of deep-learning-based segmentation methods. To begin to address this barrier, this work presents rt-MRI speech datasets of five healthy adult volunteers with corresponding GT segmentations and velopharyngeal closure patterns. The images were acquired using standard clinical MRI scanners, coils and sequences to facilitate acquisition of similar images in other centres. The datasets include manually created GT segmentations of six anatomical features including the tongue, soft palate and vocal tract. In addition, this work makes code and instructions to implement a current state-of-the-art deep-learning-based method to segment rt-MRI speech datasets publicly available, thus providing the community and others with a starting point for developing such methods.
Affiliation(s)
- Matthieu Ruthven
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
- School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London, SE1 7EH, UK
- David M Adams
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
- Andrew P King
- School of Biomedical Engineering & Imaging Sciences, King's College London, King's Health Partners, St Thomas' Hospital, London, SE1 7EH, UK
- Marc Eric Miquel
- Clinical Physics, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK
- Digital Environment Research Institute (DERI), Empire House, 67-75 New Road, Queen Mary University of London, London, E1 1HH, UK
- Advanced Cardiovascular Imaging, Barts NIHR BRC, Queen Mary University of London, London, EC1M 6BQ, UK
8
Zhang Y, Liu J, Yu D, Ding H, Wu Y. Articulation distortion in Mandarin-speaking individuals with complete arch maxillary implant-supported fixed dental prostheses. J Prosthet Dent 2023:S0022-3913(23)00685-6. [PMID: 37978009] [DOI: 10.1016/j.prosdent.2023.10.008]
Abstract
STATEMENT OF PROBLEM: Implant-supported fixed dental prostheses (IFPs) have been increasingly used to restore edentulous jaws, yet few studies have used acoustic analysis for objective evaluation of post-restoration speech outcomes. PURPOSE: The purpose of this clinical study was to assess speech articulation in edentulous individuals before and after the provision of IFPs by combining the results of subjective evaluations and objective acoustic analysis parameters. MATERIAL AND METHODS: The study included 34 individuals who had an edentulous maxilla and had been provided with an IFP for over 6 months, along with 6 dentate controls. Acoustic analysis was conducted, and mean opinion scores (MOS) were rated from recordings. The participants were interviewed about perceived speech changes. Changes in the parameters were evaluated using the paired t test or Wilcoxon signed-rank test (α=.05). A comparison between dentate controls and edentulous individuals (with or without prostheses) was made using an independent t test or Mann-Whitney U test (α=.025). RESULTS: Following restoration, center of gravity (CoG) changes occurred in 11 of 12 consonants in edentulous individuals (P<.05). Prosthesis use allowed the CoG of all affricates and fricatives to become larger and closer to control values. Before restoration, the CoG of 9 of 12 consonants in edentulous individuals differed from controls (P<.01); after restoration, this reduced to 3 of 12 (P<.01). MOS improved in 10 of 12 consonants (P<.01), nearing a score of 4. Despite restoration, the CoG of the alveolo-palatals [tɕh], [tɕ], and [ɕ] remained different from controls (P<.01). Most participants were satisfied with the improvement, with few reporting discomfort with the alveolars [s] and [tsh]. CONCLUSIONS: IFPs can enhance speech in edentulous individuals, yet articulation distortions of alveolar and alveolo-palatal consonants persist. The improper palatal shape of IFPs or an abrupt junction between the IFP and atrophic natural bone may contribute to these distortions.
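The centre of gravity (CoG) used in this study is the amplitude-weighted mean frequency of a consonant's spectrum. A minimal sketch (the study's exact windowing and weighting settings are not specified here) is:

```python
import numpy as np

def spectral_cog(signal, fs):
    """Spectral centre of gravity: amplitude-weighted mean frequency of the spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

fs = 16000                                  # 16 kHz sampling rate
t = np.arange(0, 0.1, 1.0 / fs)             # 100 ms analysis window
tone = np.sin(2 * np.pi * 4000 * t)         # pure 4 kHz test tone
cog = spectral_cog(tone, fs)                # should sit near 4000 Hz
```

For a pure tone the CoG coincides with the tone frequency; for a fricative like [s], a lower CoG reflects energy shifted toward lower frequencies, which is how the articulation changes above are quantified.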
Affiliation(s)
- Yun Zhang
- Doctoral student, Department of 2nd Dental Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine; College of Stomatology, Shanghai Jiao Tong University; National Center for Stomatology; National Clinical Research Center for Oral Diseases; Shanghai Key Laboratory of Stomatology; Shanghai Research Institute of Stomatology, Shanghai, PR China
- Jie Liu
- Postgraduate student, Speech-Language-Hearing Center, School of Foreign Languages, Shanghai Jiao Tong University, Shanghai, PR China
- Dedong Yu
- Associate Professor, Department of 2nd Dental Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine; College of Stomatology, Shanghai Jiao Tong University; National Center for Stomatology; National Clinical Research Center for Oral Diseases; Shanghai Key Laboratory of Stomatology; Shanghai Research Institute of Stomatology, Shanghai, PR China
- Hongwei Ding
- Professor, Speech-Language-Hearing Center, School of Foreign Languages, Shanghai Jiao Tong University, Shanghai, PR China
- Yiqun Wu
- Professor, Department of 2nd Dental Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine; College of Stomatology, Shanghai Jiao Tong University; National Center for Stomatology; National Clinical Research Center for Oral Diseases; Shanghai Key Laboratory of Stomatology, Shanghai, PR China
9
Isaieva K, Odille F, Laprie Y, Drouot G, Felblinger J, Vuissoz PA. Super-Resolved Dynamic 3D Reconstruction of the Vocal Tract during Natural Speech. J Imaging 2023; 9:233. [PMID: 37888339] [PMCID: PMC10607793] [DOI: 10.3390/jimaging9100233]
Abstract
MRI is the gold-standard modality for speech imaging. However, it remains relatively slow, which complicates imaging of fast movements; thus, MRI of the vocal tract is often performed in 2D. While 3D MRI provides more information, the quality of such images is often insufficient. The goal of this study was to test the applicability of super-resolution algorithms to dynamic vocal tract MRI. In total, 25 consecutive sagittal slices of 8 mm thickness with an in-plane resolution of 1.6 × 1.6 mm² were acquired using a highly undersampled radial 2D FLASH sequence while the volunteers read a text in French under two different protocols. The slices were aligned using the simultaneously recorded sound. The super-resolution strategy was used to reconstruct 1.6 × 1.6 × 1.6 mm³ isotropic volumes. The resulting images were less sharp than the native 2D images but demonstrated a higher signal-to-noise ratio. It was also shown that super-resolution eliminates inconsistencies, yielding regular transitions between the slices. Additionally, using visual stimuli and shorter text fragments was shown to improve the inter-slice consistency and the super-resolved image sharpness. Therefore, with an appropriate choice of speech task, the proposed method allows for the reconstruction of high-quality dynamic 3D volumes of the vocal tract during natural speech.
Affiliation(s)
- Karyna Isaieva
- IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France
- Freddy Odille
- IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France
- CIC-IT 1433, CHRU de Nancy, INSERM, Université de Lorraine, F-54000 Nancy, France
- Yves Laprie
- LORIA, Université de Lorraine, CNRS, INRIA, F-54000 Nancy, France
- Guillaume Drouot
- CIC-IT 1433, CHRU de Nancy, INSERM, Université de Lorraine, F-54000 Nancy, France
- Jacques Felblinger
- IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France
- CIC-IT 1433, CHRU de Nancy, INSERM, Université de Lorraine, F-54000 Nancy, France
- Pierre-André Vuissoz
- IADI, Université de Lorraine, U1254 INSERM, F-54000 Nancy, France
10
Erattakulangara S, Kelat K, Meyer D, Priya S, Lingala SG. Automatic Multiple Articulator Segmentation in Dynamic Speech MRI Using a Protocol Adaptive Stacked Transfer Learning U-NET Model. Bioengineering (Basel) 2023; 10:623. [PMID: 37237693] [DOI: 10.3390/bioengineering10050623]
Abstract
Dynamic magnetic resonance imaging has emerged as a powerful modality for investigating upper-airway function during speech production. Analyzing the changes in the vocal tract airspace, including the position of soft-tissue articulators (e.g., the tongue and velum), enhances our understanding of speech production. The advent of various fast speech MRI protocols based on sparse sampling and constrained reconstruction has led to the creation of dynamic speech MRI datasets on the order of 80-100 image frames/second. In this paper, we propose a stacked transfer learning U-NET model to segment the deforming vocal tract in 2D mid-sagittal slices of dynamic speech MRI. Our approach leverages (a) low- and mid-level features and (b) high-level features. The low- and mid-level features are derived from models pre-trained on labeled open-source brain tumor MR and lung CT datasets, and an in-house airway-labeled dataset. The high-level features are derived from labeled protocol-specific MR images. The applicability of our approach to segmenting dynamic datasets is demonstrated in data acquired from three fast speech MRI protocols. Protocol 1: a 3 T radial acquisition scheme coupled with a non-linear temporal regularizer, where speakers produced French speech tokens; Protocol 2: a 1.5 T uniform-density spiral acquisition scheme coupled with temporal finite difference (FD) sparsity regularization, where speakers produced fluent speech tokens in English; and Protocol 3: a 3 T variable-density spiral acquisition scheme coupled with manifold regularization, where speakers produced various speech tokens from the International Phonetic Alphabet (IPA). Segmentations from our approach were compared to those from an expert human user (a vocologist) and to those from a conventional U-NET model without transfer learning; segmentations from a second expert human user (a radiologist) served as ground truth. Evaluations were performed using the quantitative DICE similarity metric, the Hausdorff distance metric, and a segmentation count metric. The approach was successfully adapted to different speech MRI protocols with only a handful of protocol-specific images (on the order of 20), and provided accurate segmentations similar to those of an expert human.
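As a reference for the evaluation metrics mentioned above, the DICE similarity coefficient between a predicted and a ground-truth binary mask can be computed as follows (an illustrative sketch, not the authors' code):

```python
import numpy as np

def dice(a, b):
    """DICE similarity between two boolean masks: 2|A ∩ B| / (|A| + |B|)."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

pred = np.zeros((4, 4), bool); pred[:2, :2] = True   # 4 predicted pixels
gt   = np.zeros((4, 4), bool); gt[:2, :4] = True     # 8 ground-truth pixels, overlap 4
score = dice(pred, gt)                               # 2*4 / (4 + 8) = 2/3
```

A DICE of 1 means perfect overlap with the ground truth; the Hausdorff distance complements it by measuring the worst-case boundary disagreement.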
Affiliation(s)
- Subin Erattakulangara
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
- Karthika Kelat
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
- David Meyer
- Janette Ogg Voice Research Center, Shenandoah University, Winchester, VA 22601, USA
- Sarv Priya
- Department of Radiology, University of Iowa, Iowa City, IA 52242, USA
- Sajan Goud Lingala
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
- Department of Radiology, University of Iowa, Iowa City, IA 52242, USA
11
Lyu M, Mei L, Huang S, Liu S, Li Y, Yang K, Liu Y, Dong Y, Dong L, Wu EX. M4Raw: A multi-contrast, multi-repetition, multi-channel MRI k-space dataset for low-field MRI research. Sci Data 2023; 10:264. [PMID: 37164976] [PMCID: PMC10172399] [DOI: 10.1038/s41597-023-02181-4]
Abstract
Recently, low-field magnetic resonance imaging (MRI) has gained renewed interest to promote MRI accessibility and affordability worldwide. The presented M4Raw dataset aims to facilitate methodology development and reproducible research in this field. The dataset comprises multi-channel brain k-space data collected from 183 healthy volunteers using a 0.3 Tesla whole-body MRI system, and includes T1-weighted, T2-weighted, and fluid attenuated inversion recovery (FLAIR) images with in-plane resolution of ~1.2 mm and through-plane resolution of 5 mm. Importantly, each contrast contains multiple repetitions, which can be used individually or to form multi-repetition averaged images. After excluding motion-corrupted data, the partitioned training and validation subsets contain 1024 and 240 volumes, respectively. To demonstrate the potential utility of this dataset, we trained deep learning models for image denoising and parallel imaging tasks and compared their performance with traditional reconstruction methods. This M4Raw dataset will be valuable for the development of advanced data-driven methods specifically for low-field MRI. It can also serve as a benchmark dataset for general MRI reconstruction algorithms.
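The value of the multiple repetitions can be illustrated with the standard signal-averaging argument: averaging N repetitions with independent noise improves SNR by roughly √N (a generic sketch, not tied to the dataset's actual noise statistics):

```python
import numpy as np

rng = np.random.default_rng(0)
truth = np.ones(100_000)                                   # noise-free "image"
reps = [truth + rng.normal(0, 1, truth.shape) for _ in range(4)]

snr_single = truth.mean() / (reps[0] - truth).std()        # SNR of one repetition
avg = np.mean(reps, axis=0)                                # multi-repetition average
snr_avg = truth.mean() / (avg - truth).std()

gain = snr_avg / snr_single                                # expect ~sqrt(4) = 2
```

This is why the repetitions can be used individually (to study low-SNR reconstruction) or averaged (to form higher-SNR reference images).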
Affiliation(s)
- Mengye Lyu
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China
- Lifeng Mei
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China
- Shoujin Huang
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China
- Sixing Liu
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China
- Yi Li
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China
- Kexin Yang
- College of Health Science and Environmental Engineering, Shenzhen Technology University, Shenzhen, China
- Yilong Liu
- Guangdong-Hongkong-Macau Institute of CNS Regeneration, Key Laboratory of CNS Regeneration (Ministry of Education), Jinan University, Guangzhou, China
- Yu Dong
- Department of Neurosurgery, Shenzhen Samii Medical Center, Shenzhen, China
- Linzheng Dong
- Department of Neurosurgery, Shenzhen Samii Medical Center, Shenzhen, China
- Ed X Wu
- Laboratory of Biomedical Imaging and Signal Processing, The University of Hong Kong, Hong Kong, China
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China
12
Deveshwar N, Rajagopal A, Sahin S, Shimron E, Larson PEZ. Synthesizing Complex-Valued Multicoil MRI Data from Magnitude-Only Images. Bioengineering (Basel) 2023; 10:358. [PMID: 36978749] [PMCID: PMC10045391] [DOI: 10.3390/bioengineering10030358]
Abstract
Despite the proliferation of deep learning techniques for accelerated MRI acquisition and enhanced image reconstruction, the construction of large and diverse MRI datasets continues to pose a barrier to effective clinical translation of these technologies. One major challenge is in collecting the MRI raw data (required for image reconstruction) from clinical scanning, as only magnitude images are typically saved and used for clinical assessment and diagnosis. The image phase and multi-channel RF coil information are not retained when magnitude-only images are saved in clinical imaging archives. Additionally, preprocessing used for data in clinical imaging can lead to biased results. While several groups have begun concerted efforts to collect large amounts of MRI raw data, current databases are limited in the diversity of anatomy, pathology, annotations, and acquisition types they contain. To address this, we present a method for synthesizing realistic MR data from magnitude-only data, allowing for the use of diverse data from clinical imaging archives in advanced MRI reconstruction development. Our method uses a conditional GAN-based framework to generate synthetic phase images from input magnitude images. We then applied ESPIRiT to derive RF coil sensitivity maps from fully sampled real data to generate multi-coil data. The synthetic data generation method was evaluated by comparing image reconstruction results from training Variational Networks either with real data or synthetic data. We demonstrate that the Variational Network trained on synthetic MRI data from our method, consisting of GAN-derived synthetic phase and multi-coil information, outperformed Variational Networks trained on data with synthetic phase generated using current state-of-the-art methods. Additionally, we demonstrate that the Variational Networks trained with synthetic k-space data from our method perform comparably to image reconstruction networks trained on undersampled real k-space data.
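The overall recipe — magnitude image combined with a synthetic phase map, multiplied by coil sensitivity maps, then Fourier transformed — can be sketched as follows (toy uniform coil maps stand in for the ESPIRiT-derived ones, and a placeholder phase stands in for the GAN output):

```python
import numpy as np

def synthesize_multicoil(magnitude, phase, coil_maps):
    """Combine a magnitude image, a (here: synthetic) phase map, and coil
    sensitivity maps into multi-coil complex k-space data."""
    complex_img = magnitude * np.exp(1j * phase)        # complex-valued image
    coil_imgs = coil_maps * complex_img[None, ...]      # apply per-coil sensitivities
    return np.fft.fft2(coil_imgs, axes=(-2, -1))        # per-coil k-space

mag = np.random.default_rng(1).random((8, 8))           # magnitude-only input
phase = np.zeros((8, 8))                                # stand-in for the GAN phase
maps = np.ones((4, 8, 8)) / 2.0                         # 4 toy uniform coil maps
kspace = synthesize_multicoil(mag, phase, maps)
```

In the paper the phase comes from a conditional GAN and the sensitivity maps from ESPIRiT applied to fully sampled real data; this sketch only shows how those pieces compose into synthetic raw k-space.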
Affiliation(s)
- Nikhil Deveshwar: UC Berkeley-UCSF Graduate Program in Bioengineering, Berkeley, CA 94701, USA; Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA 94016, USA; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94701, USA
- Abhejit Rajagopal: Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA 94016, USA
- Sule Sahin: UC Berkeley-UCSF Graduate Program in Bioengineering, Berkeley, CA 94701, USA; Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA 94016, USA
- Efrat Shimron: Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94701, USA
- Peder E. Z. Larson: UC Berkeley-UCSF Graduate Program in Bioengineering, Berkeley, CA 94701, USA; Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA 94016, USA
13
Shimron E, Tamir JI, Wang K, Lustig M. Implicit data crimes: Machine learning bias arising from misuse of public data. Proc Natl Acad Sci U S A 2022; 119:e2117203119. [PMID: 35312366 PMCID: PMC9060447 DOI: 10.1073/pnas.2117203119] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2021] [Accepted: 02/01/2022] [Indexed: 02/01/2023] Open
Abstract
Significance: Public databases are an important resource for machine learning research, but their growing availability sometimes leads to "off-label" usage, where data published for one task are used for another. This work reveals that such off-label usage can lead to biased, overly optimistic results for machine-learning algorithms. The underlying cause is that public data are processed with hidden pipelines that alter the data features. Here we study three well-known algorithms developed for image reconstruction from magnetic resonance imaging measurements and show that they can produce biased results, with up to 48% artificial improvement, when applied to public databases. We refer to the publication of such results as implicit "data crimes" to raise community awareness of this growing big-data problem.
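The bias mechanism, where hidden preprocessing makes retrospectively undersampled reconstruction look artificially easy, can be illustrated with a small numerical toy. This is a deliberately simplified sketch (a zero-filled reconstruction and a synthetic low-pass "preprocessing" step), not the paper's actual experiment:

```python
import numpy as np

def zero_filled_nrmse(img, keep=48):
    """Retrospectively keep only a central keep x keep block of (centered)
    k-space, reconstruct by zero-filling, and return the NRMSE."""
    k = np.fft.fftshift(np.fft.fft2(img))
    mask = np.zeros(img.shape, dtype=bool)
    c0, c1 = img.shape[0] // 2, img.shape[1] // 2
    mask[c0 - keep // 2:c0 + keep // 2, c1 - keep // 2:c1 + keep // 2] = True
    recon = np.fft.ifft2(np.fft.ifftshift(k * mask))
    return np.linalg.norm(recon - img) / np.linalg.norm(img)

rng = np.random.default_rng(0)
raw = rng.standard_normal((128, 128))  # stand-in for unprocessed image content

# hidden preprocessing: low-pass filtering of the kind baked into many archives
kc = np.fft.fftshift(np.fft.fft2(raw))
lowpass = np.zeros_like(kc)
c = 128 // 2
lowpass[c - 16:c + 16, c - 16:c + 16] = kc[c - 16:c + 16, c - 16:c + 16]
smoothed = np.fft.ifft2(np.fft.ifftshift(lowpass)).real

err_raw = zero_filled_nrmse(raw)            # large: high frequencies are lost
err_smoothed = zero_filled_nrmse(smoothed)  # near zero: looks deceptively good
print(err_smoothed < err_raw)  # True
```

The identical algorithm scores far better on the preprocessed data even though nothing about the algorithm changed; this is the kind of off-label bias the paper quantifies.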
Affiliation(s)
- Efrat Shimron: Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720
- Jonathan I. Tamir: Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712; Department of Diagnostic Medicine, Dell Medical School, The University of Texas at Austin, Austin, TX 78712; Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX 78712
- Ke Wang: Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720
- Michael Lustig: Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720
14
Wrench A, Balch-Tomes J. Beyond the Edge: Markerless Pose Estimation of Speech Articulators from Ultrasound and Camera Images Using DeepLabCut. SENSORS 2022; 22:s22031133. [PMID: 35161879 PMCID: PMC8838804 DOI: 10.3390/s22031133] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/29/2021] [Revised: 01/25/2022] [Accepted: 01/28/2022] [Indexed: 01/18/2023]
Abstract
Automatic feature extraction from images of speech articulators is currently achieved by detecting edges. Here, we investigate the use of pose-estimation deep neural nets with transfer learning to perform markerless estimation of speech articulator keypoints using only a few hundred hand-labelled images as training input. Midsagittal ultrasound images of the tongue, jaw, and hyoid and camera images of the lips were hand-labelled with keypoints, used to train DeepLabCut models, and evaluated on unseen speakers and systems. Tongue surface contours interpolated from estimated and hand-labelled keypoints produced an average mean sum of distances (MSD) of 0.93 mm (s.d. 0.46 mm), compared with 0.96 mm (s.d. 0.39 mm) for two human labellers and 2.3 mm (s.d. 1.5 mm) for the best-performing edge detection algorithm. A pilot set of simultaneous electromagnetic articulography (EMA) and ultrasound recordings demonstrated partial correlation between three physical sensor positions and the corresponding estimated keypoints; this requires further investigation. The accuracy of estimating lip aperture from camera video was high, with a mean MSD of 0.70 mm (s.d. 0.56 mm) compared with 0.57 mm (s.d. 0.48 mm) for two human labellers. DeepLabCut was found to be a fast, accurate, and fully automatic method of providing unique kinematic data for the tongue, hyoid, jaw, and lips.
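The MSD used to compare contours above is a symmetric mean nearest-neighbour distance between two point sets; the abstract does not give the exact formula, so the following is one common variant, as a sketch:

```python
import numpy as np

def mean_sum_distance(contour_a, contour_b):
    """Symmetric mean nearest-neighbour distance between two contours,
    each given as an (N, 2) array of (x, y) points (e.g. in mm)."""
    # pairwise Euclidean distances between all points of the two contours
    d = np.linalg.norm(contour_a[:, None, :] - contour_b[None, :, :], axis=-1)
    # average the a->b and b->a nearest-neighbour distances
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

a = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
b = a + [0.0, 0.5]                  # the same contour shifted 0.5 up
print(mean_sum_distance(a, b))      # 0.5
```

In practice the keypoints are first interpolated into dense contours (as the paper does for tongue surfaces) before computing the distance, so the metric is insensitive to how many keypoints each labeller placed.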
Affiliation(s)
- Alan Wrench: Clinical Audiology, Speech and Language Research Centre, Queen Margaret University, Musselburgh EH21 6UU, UK; Articulate Instruments Ltd., Musselburgh EH21 6UU, UK
- Correspondence: ; Tel.: +44-131-474-0000
15
Isaieva K, Laprie Y, Leclère J, Douros IK, Felblinger J, Vuissoz PA. Multimodal dataset of real-time 2D and static 3D MRI of healthy French speakers. Sci Data 2021; 8:258. [PMID: 34599194 PMCID: PMC8486854 DOI: 10.1038/s41597-021-01041-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2021] [Accepted: 08/25/2021] [Indexed: 12/28/2022] Open
Abstract
The study of articulatory gestures has a wide spectrum of applications, notably in speech production and recognition. Sets of phonemes, as well as their articulation, are language-specific; however, existing MRI databases mostly include English speakers. In our present work, we introduce a dataset acquired with MRI from 10 healthy native French speakers. A corpus consisting of synthetic sentences was used to ensure a good coverage of the French phonetic context. A real-time MRI technology with temporal resolution of 20 ms was used to acquire vocal tract images of the participants speaking. The sound was recorded simultaneously with MRI, denoised and temporally aligned with the images. The speech was transcribed to obtain phoneme-wise segmentation of sound. We also acquired static 3D MR images for a wide list of French phonemes. In addition, we include annotations of spontaneous swallowing.
Measurement(s): Vocal tract images • Speech
Technology Type(s): Magnetic Resonance Imaging • Microphone Device
Sample Characteristic - Organism: Homo sapiens
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.16404453
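With a fixed 20 ms frame period, aligning the phoneme-wise segmentation of the audio to the real-time MRI frames is a simple index computation. This helper is a hypothetical sketch under that assumption, not part of the released dataset tools:

```python
import math

def frames_for_interval(start_s, end_s, frame_period_s=0.020):
    """Indices of real-time MRI frames overlapping a phoneme interval
    [start_s, end_s), assuming frame i spans [i*T, (i+1)*T)."""
    first = math.floor(start_s / frame_period_s)
    last = math.ceil(end_s / frame_period_s) - 1
    return list(range(first, last + 1))

# a phoneme transcribed from 53 ms to 108 ms overlaps frames 2..5
print(frames_for_interval(0.053, 0.108))  # [2, 3, 4, 5]
```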
Affiliation(s)
- Karyna Isaieva: Université de Lorraine, INSERM, IADI, Nancy, F-54000, France
- Yves Laprie: Université de Lorraine, CNRS, Inria, LORIA, Nancy, F-54000, France
- Justine Leclère: Université de Lorraine, INSERM, IADI, Nancy, F-54000, France; Oral Medicine Department, University Hospital of Reims, 45 rue Cognacq-Jay, 51092 Reims Cedex, France
- Ioannis K Douros: Université de Lorraine, INSERM, IADI, Nancy, F-54000, France; Université de Lorraine, CNRS, Inria, LORIA, Nancy, F-54000, France
- Jacques Felblinger: Université de Lorraine, INSERM, IADI, Nancy, F-54000, France; CIC-IT, INSERM, CHRU de Nancy, Nancy, F-54000, France