1
Carl M, Rudyk E, Shapira Y, Rusiewicz HL, Icht M. Accuracy of Speech Sound Analysis: Comparison of an Automatic Artificial Intelligence Algorithm With Clinician Assessment. J Speech Lang Hear Res 2024; 67:3004-3021. [PMID: 39173066] [DOI: 10.1044/2024_jslhr-24-00009]
Abstract
PURPOSE Automatic speech analysis (ASA) and automatic speech recognition systems are increasingly being used in the treatment of speech sound disorders (SSDs). When utilized as a home practice tool or in the absence of the clinician, the ASA system has the potential to facilitate treatment gains. However, the feedback accuracy of such systems varies, a factor that may impact these gains. The current research analyzes the feedback accuracy of a novel ASA algorithm (Amplio Learning Technologies) in comparison to clinician judgments. METHOD A total of 3,584 consonant stimuli, produced by 395 American English-speaking children and adolescents with SSDs (age range: 4-18 years), were analyzed with respect to the ASA algorithm's automatic classification, clinician-ASA agreement, and interclinician agreement. Results were further analyzed in relation to phoneme acquisition categories (early-, middle-, and late-acquired phonemes). RESULTS Agreement between clinicians and ASA classification for sounds produced accurately was above 80% for all phonemes, with some variation based on phoneme acquisition category (early, middle, late). This variation was also noted for ASA classification into "acceptable," "unacceptable," and "unknown" (i.e., no determination of phoneme accuracy) categories, as well as for interclinician agreement. Clinician-ASA agreement was reduced for misarticulated sounds. CONCLUSIONS The initial findings for Amplio's novel algorithm are promising for its potential use within the context of home practice, as it demonstrates high feedback accuracy for correctly produced sounds. Furthermore, the complexity of a sound influences the consistency of its perception, both by clinicians and by automated platforms, indicating variable performance of the ASA algorithm across phonemes. Taken together, the ASA algorithm may be effective in facilitating speech sound practice for children with SSDs, even in the absence of the clinician.
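The clinician-ASA and interclinician agreement figures above are proportions of matching labels over paired ratings. A minimal sketch of how such rater agreement might be computed, including a chance-corrected variant (Cohen's kappa); the labels and data below are hypothetical, not from the study:

```python
from collections import Counter

def percent_agreement(labels_a, labels_b):
    """Proportion of stimuli on which the two raters assign the same label."""
    assert len(labels_a) == len(labels_b)
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters."""
    n = len(labels_a)
    po = percent_agreement(labels_a, labels_b)  # observed agreement
    ca, cb = Counter(labels_a), Counter(labels_b)
    # expected agreement if raters labeled independently at their base rates
    pe = sum((ca[k] / n) * (cb[k] / n) for k in set(ca) | set(cb))
    return (po - pe) / (1 - pe)

# Hypothetical clinician vs. ASA classifications for six stimuli
clinician = ["acceptable", "acceptable", "unacceptable",
             "acceptable", "unacceptable", "acceptable"]
asa = ["acceptable", "unknown", "unacceptable",
       "acceptable", "acceptable", "acceptable"]
print(percent_agreement(clinician, asa))  # 4 of 6 match -> 0.666...
```

Percent agreement alone can be inflated when one label dominates, which is why chance-corrected measures like kappa are often reported alongside it.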
Affiliation(s)
- Micalle Carl
- Department of Communication Disorders, Ariel University, Israel
- Michal Icht
- Department of Communication Disorders, Ariel University, Israel
2
Lim Y, Kumar P, Nayak KS. Speech production real-time MRI at 0.55 T. Magn Reson Med 2024; 91:337-343. [PMID: 37799039] [DOI: 10.1002/mrm.29843]
Abstract
PURPOSE To demonstrate speech-production real-time MRI (RT-MRI) using a contemporary 0.55T system, and to identify opportunities for improved performance compared with conventional field strengths. METHODS Experiments were performed on healthy adult volunteers using a 0.55T MRI system with high-performance gradients and a custom 8-channel upper airway coil. Imaging was performed using spiral-based balanced SSFP (bSSFP) and gradient-recalled echo (GRE) pulse sequences with a temporal finite-difference constrained reconstruction. Speech-production RT-MRI was performed with three spiral readout durations (8.90, 5.58, and 3.48 ms) to determine trade-offs with respect to articulator contrast, blurring, banding artifacts, and overall image quality. RESULTS Both spiral GRE and bSSFP captured tongue boundary dynamics during rapid consonant-vowel syllables. Although bSSFP provided substantially higher SNR in all vocal tract articulators than GRE, it suffered from banding artifacts at TR > 10.9 ms. Spiral bSSFP with the shortest readout duration (3.48 ms, TR = 5.30 ms) had the best image quality, with a 1.54-times boost in SNR compared with an equivalent GRE sequence. Longer readout durations led to increased SNR efficiency and blurring in both bSSFP and GRE. CONCLUSION High-performance 0.55T MRI systems can be used for speech-production RT-MRI. Spiral bSSFP can be used without banding artifacts in the vocal tract articulators, provides better SNR efficiency, and achieves better image quality than is typically possible at 1.5 T or 3 T.
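The TR dependence of banding reported above follows from a standard bSSFP property (not spelled out in the abstract): signal nulls recur every 1/TR of off-resonance frequency, with the first null at ±1/(2·TR), so shortening TR pushes the bands farther from resonance. A small illustrative calculation using the TR values quoted in the abstract:

```python
def band_null_offset_hz(tr_s):
    """Off-resonance frequency (Hz) of the first bSSFP signal null,
    +/- 1/(2*TR) -- the standard bSSFP banding relation."""
    return 1.0 / (2.0 * tr_s)

# TRs from the abstract: the banding threshold (10.9 ms) and the
# shortest-readout sequence (5.30 ms)
for tr_ms in (10.9, 5.30):
    offset = band_null_offset_hz(tr_ms / 1000.0)
    print(f"TR = {tr_ms} ms -> first band null at +/-{offset:.0f} Hz")
```

Because off-resonance in hertz also scales down with field strength, a 0.55 T system keeps the vocal tract's susceptibility-induced offsets well inside the band-free passband, which is what makes bSSFP practical here.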
Affiliation(s)
- Yongwan Lim
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Prakash Kumar
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Krishna S Nayak
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
3
Lim Y, Toutios A, Bliesener Y, Tian Y, Lingala SG, Vaz C, Sorensen T, Oh M, Harper S, Chen W, Lee Y, Töger J, Monteserin ML, Smith C, Godinez B, Goldstein L, Byrd D, Nayak KS, Narayanan SS. A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images. Sci Data 2021; 8:187. [PMID: 34285240] [PMCID: PMC8292336] [DOI: 10.1038/s41597-021-00976-x]
Abstract
Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is, however, limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving articulators and dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 participants performing linguistically motivated speech tasks, alongside the corresponding public domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each participant.
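Raw multi-coil k-space data such as this is typically reconstructed per coil and then combined, most simply by root-sum-of-squares. A simplified Cartesian sketch of that combination step (the dataset itself uses spiral sampling, which needs gridding first; the coil count and matrix size below are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical raw frame: 8 coils x 84 x 84 Cartesian k-space
kspace = rng.standard_normal((8, 84, 84)) + 1j * rng.standard_normal((8, 84, 84))

# Per-coil inverse FFT (with the usual shift bookkeeping) ...
coil_images = np.fft.fftshift(
    np.fft.ifft2(np.fft.ifftshift(kspace, axes=(-2, -1)), axes=(-2, -1)),
    axes=(-2, -1),
)
# ... then root-sum-of-squares across the coil axis
magnitude = np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))
print(magnitude.shape)  # one combined 84 x 84 magnitude image
```

Access to the raw multi-coil data, rather than only combined images, is what allows researchers to swap this step for constrained or model-based reconstructions of their own.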
Affiliation(s)
- Yongwan Lim
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Asterios Toutios
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Yannick Bliesener
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Ye Tian
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Sajan Goud Lingala
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Colin Vaz
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Tanner Sorensen
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Miran Oh
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Sarah Harper
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Weiyi Chen
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Yoonjeong Lee
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Johannes Töger
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Mairym Lloréns Monteserin
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Caitlin Smith
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Bianca Godinez
- Department of Linguistics, California State University Long Beach, Long Beach, California, USA
- Louis Goldstein
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Dani Byrd
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
- Krishna S Nayak
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Shrikanth S Narayanan
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA
4
Bakst S. Palate shape influence depends on the segment: Articulatory and acoustic variability in American English /ɹ/ and /s/. J Acoust Soc Am 2021; 149:960. [PMID: 33639819] [DOI: 10.1121/10.0003379]
Abstract
This ultrasound and acoustics study of American English /ɹ/ and /s/ investigates whether variability in production as measured in the midsagittal plane is related to individual differences in the shape of the hard palate in the coronal plane. Both token-to-token variability and variability between different phonetic contexts were investigated. While no direct relationship was found between palate flatness and articulatory variability, a secondary analysis revealed that speakers' articulatory variability for one segment was related to their variability in the other. Speakers with flatter palates tended towards lower articulatory variability scores, but speakers with more domed palates showed both high and low variability scores.
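Token-to-token variability of the kind measured above is commonly summarized as the dispersion of a repeated articulatory measurement within a speaker. A minimal sketch under that assumption; the measurement name and values are hypothetical, not from the study:

```python
import statistics

def token_variability(measurements_mm):
    """Token-to-token variability as the sample standard deviation
    of a repeated articulatory measurement (one speaker, one segment)."""
    return statistics.stdev(measurements_mm)

# Hypothetical midsagittal tongue-height measurements (mm) for five
# repeated /s/ tokens from a single speaker
tokens = [42.1, 41.8, 42.5, 41.9, 42.3]
print(round(token_variability(tokens), 3))  # 0.286
```

Comparing such per-speaker scores across segments is one way to ask, as the study does, whether a speaker who is variable for /s/ is also variable for /ɹ/.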
Affiliation(s)
- Sarah Bakst
- Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, Wisconsin 53703, USA