1. Ali ASM, Masaki K, Hattori M, Sumita YI, Wakabayashi N. Maxillectomy patients' speech and performance of contemporary speaker-independent automatic speech recognition platforms in Japanese. J Oral Rehabil 2024;51:2361-2367. [PMID: 39135293] [DOI: 10.1111/joor.13832]
Abstract
BACKGROUND Automatic speech recognition (ASR) can potentially help older adults and people with disabilities reduce their dependence on others and increase their participation in society. However, maxillectomy patients with reduced speech intelligibility may encounter problems when using such technologies. OBJECTIVES To investigate the accuracy of three commonly used ASR platforms when used by Japanese maxillectomy patients with and without their obturator placed. METHODS Speech samples were obtained from 29 maxillectomy patients with and without their obturator and 17 healthy volunteers. The samples were input into three speaker-independent speech recognition platforms, and the transcribed text was compared with the original text to calculate the syllable error rate (SER). All participants also completed a conventional speech intelligibility test, graded using Taguchi's method. A comprehensive articulation assessment of patients without their obturator was also performed. RESULTS Significant differences in SER were observed between the healthy and maxillectomy groups. Maxillectomy patients with an obturator showed a significant negative correlation between speech intelligibility scores and SER. However, for those without an obturator, no significant correlations were observed. Furthermore, for maxillectomy patients without an obturator, significant differences were found between syllables grouped by vowels: syllables containing /i/, /u/ and /e/ exhibited higher error rates than those containing /a/ and /o/. Additionally, significant differences were observed when syllables were grouped by consonant place of articulation and manner of articulation. CONCLUSION The three platforms performed well for healthy volunteers and maxillectomy patients with their obturator, but the SER for maxillectomy patients without their obturator was high, rendering the platforms unusable. System improvement is needed to increase accuracy for maxillectomy patients.
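The SER reported here is an edit-distance measure computed over syllable (or mora) sequences rather than words. A minimal Python sketch of that computation follows; the tokenization into syllables and the example strings are illustrative assumptions, not the authors' code.

```python
# Sketch of a syllable error rate (SER): Levenshtein distance over syllable
# sequences, normalized by reference length. Assumes reference and ASR
# hypothesis are already tokenized into syllables/morae.

def edit_distance(ref, hyp):
    """Standard dynamic-programming Levenshtein distance over token lists."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[-1][-1]

def syllable_error_rate(ref_syllables, hyp_syllables):
    return edit_distance(ref_syllables, hyp_syllables) / len(ref_syllables)

# Hypothetical example: romanized morae of a reference vs. an ASR output.
print(syllable_error_rate(["ka", "ra", "su"], ["ka", "da", "su"]))  # ~0.33
```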
Affiliation(s)
- Ahmed Sameir Mohamed Ali: Department of Advanced Prosthodontics, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Keita Masaki: Speech Clinic, Tokyo Medical and Dental University Hospital, Tokyo, Japan
- Mariko Hattori: Department of Advanced Prosthodontics, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Yuka I Sumita: Department of Advanced Prosthodontics, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan; Department of Partial and Complete Denture, School of Life Dentistry at Tokyo, The Nippon Dental University, Tokyo, Japan
- Noriyuki Wakabayashi: Department of Advanced Prosthodontics, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
2. Stipancic KL, Brenk F, Qiu M, Tjaden K. Progress Toward Estimating the Minimal Clinically Important Difference of Intelligibility: A Crowdsourced Perceptual Experiment. J Speech Lang Hear Res 2024:1-15. [PMID: 39453526] [DOI: 10.1044/2024_jslhr-24-00354]
Abstract
PURPOSE The purpose of the current study was to estimate the minimal clinically important difference (MCID) of sentence intelligibility in control speakers and in speakers with dysarthria due to multiple sclerosis (MS) and Parkinson's disease (PD). METHOD Sixteen control speakers, 16 speakers with MS, and 16 speakers with PD were audio-recorded reading sentences aloud in habitual, clear, fast, loud, and slow speaking conditions. Two hundred forty nonexpert crowdsourced listeners heard paired conditions of the same sentence content from a speaker and indicated whether one condition was more understandable than the other. Listeners then used the Global Ratings of Change (GROC) Scale to indicate how much more understandable that condition was. Listener ratings were compared with objective intelligibility scores obtained previously via orthographic transcriptions from nonexpert listeners. Receiver operating characteristic (ROC) curves and the average magnitude of intelligibility difference per level of the GROC Scale were evaluated to determine the sensitivity, specificity, and accuracy of potential intelligibility cutoff scores for establishing thresholds of important change. RESULTS MCIDs derived from the ROC curves were invalid. However, the average magnitude of intelligibility difference yielded valid and useful thresholds: the MCID of intelligibility was determined to be about 7% for a small amount of difference and about 15% for a large amount of difference. CONCLUSIONS This work demonstrates the feasibility of the novel experimental paradigm for collecting crowdsourced perceptual data to estimate MCIDs. Results provide empirical evidence that clinical tools for the perception of intelligibility by nonexpert listeners could consist of the three categories that emerged from the data ("no difference," "a little bit of difference," "a lot of difference"). The current work is a critical step toward development of a universal language with which to evaluate changes in intelligibility as a result of neurological injury, disease progression, and speech-language therapy.
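The two analysis routes described in the METHOD can be sketched as follows: an ROC-derived cutoff (here via Youden's J) and the mean magnitude of intelligibility difference per GROC level. The data frame, file name, and column names are hypothetical stand-ins, not the authors' data.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_curve

# Hypothetical table: one row per paired comparison, with the objective
# intelligibility difference (percentage points), a binary "important
# change" label, and the listener's GROC rating.
df = pd.read_csv("paired_ratings.csv")  # columns: intel_diff, important, groc

# Route 1: ROC-based cutoff, maximizing Youden's J = sensitivity + specificity - 1.
fpr, tpr, thresholds = roc_curve(df["important"], df["intel_diff"])
cutoff = thresholds[np.argmax(tpr - fpr)]
print(f"ROC-derived MCID candidate: {cutoff:.1f} percentage points")

# Route 2: average magnitude of intelligibility difference per GROC level.
per_level = df.assign(mag=df["intel_diff"].abs()).groupby("groc")["mag"].mean()
print(per_level)
```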
Affiliation(s)
- Kaila L Stipancic: Department of Communicative Disorders and Sciences, University at Buffalo, New York
- Frits Brenk: Department of Communicative Disorders and Sciences, University at Buffalo, New York
- Mengyang Qiu: Department of Psychology, Trent University, Peterborough, Ontario, Canada
- Kris Tjaden: Department of Communicative Disorders and Sciences, University at Buffalo, New York
3. Carl M, Rudyk E, Shapira Y, Rusiewicz HL, Icht M. Accuracy of Speech Sound Analysis: Comparison of an Automatic Artificial Intelligence Algorithm With Clinician Assessment. J Speech Lang Hear Res 2024;67:3004-3021. [PMID: 39173066] [DOI: 10.1044/2024_jslhr-24-00009]
Abstract
PURPOSE Automatic speech analysis (ASA) and automatic speech recognition systems are increasingly being used in the treatment of speech sound disorders (SSDs). When utilized as a home practice tool or in the absence of the clinician, an ASA system has the potential to facilitate treatment gains. However, the feedback accuracy of such systems varies, a factor that may impact these gains. The current research analyzes the feedback accuracy of a novel ASA algorithm (Amplio Learning Technologies) in comparison to clinician judgments. METHOD A total of 3,584 consonant stimuli, produced by 395 American English-speaking children and adolescents with SSDs (age range: 4-18 years), were analyzed with respect to automatic classification by the ASA algorithm, clinician-ASA agreement, and interclinician agreement. Results were further analyzed in relation to phoneme acquisition categories (early-, middle-, and late-acquired phonemes). RESULTS Agreement between clinicians and ASA classification for sounds produced accurately was above 80% for all phonemes, with some variation based on phoneme acquisition category (early, middle, late). This variation was also noted for ASA classification into "acceptable," "unacceptable," and "unknown" (no determination of phoneme accuracy) categories, as well as for interclinician agreement. Clinician-ASA agreement was reduced for misarticulated sounds. CONCLUSIONS The initial findings for Amplio's novel algorithm are promising for its potential use within the context of home practice, as it demonstrates high feedback accuracy for correctly produced sounds. Furthermore, the complexity of a sound influences the consistency of its perception, both by clinicians and by automated platforms, indicating variable performance of the ASA algorithm across phonemes. Taken together, the ASA algorithm may be effective in facilitating speech sound practice for children with SSDs, even in the absence of the clinician.
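A minimal sketch of the clinician-ASA agreement tabulation implied by the METHOD, grouped by phoneme acquisition category; the file name and column names are assumptions, not the study's materials.

```python
import pandas as pd

# Hypothetical ratings table: one row per consonant stimulus, with the
# clinician's judgment and the ASA classification
# ('acceptable' / 'unacceptable' / 'unknown'), plus the phoneme's
# acquisition category ('early' / 'middle' / 'late').
df = pd.read_csv("stimulus_ratings.csv")

agreement = (
    df.assign(agree=df["clinician_label"] == df["asa_label"])
      .groupby("acquisition")["agree"]
      .mean()
      .mul(100)
      .round(1)
)
print(agreement)  # percent clinician-ASA agreement per acquisition category
```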
Affiliation(s)
- Micalle Carl: Department of Communication Disorders, Ariel University, Israel
- Michal Icht: Department of Communication Disorders, Ariel University, Israel
4. Tetzloff KA, Wiepert D, Botha H, Duffy JR, Clark HM, Whitwell JL, Josephs KA, Utianski RL. Automatic Speech Recognition in Primary Progressive Apraxia of Speech. J Speech Lang Hear Res 2024;67:2964-2976. [PMID: 39265154] [PMCID: PMC11427443] [DOI: 10.1044/2024_jslhr-24-00049]
Abstract
INTRODUCTION Transcribing disordered speech can be useful when diagnosing motor speech disorders such as primary progressive apraxia of speech (PPAOS), in which speakers produce sound additions, deletions, substitutions, and distortions and/or slow, segmented speech. Because transcribing speech is laborious and requires an experienced listener, using automatic speech recognition (ASR) systems for diagnosis and treatment monitoring is appealing. This study evaluated the efficacy of a readily available ASR system (wav2vec 2.0) in transcribing the speech of PPAOS patients to determine whether the word error rate (WER) output by the ASR can differentiate between healthy speech and PPAOS and/or among its subtypes, whether WER correlates with AOS severity, and how the ASR's errors compare to those noted in manual transcriptions. METHOD Forty-five patients with PPAOS and 22 healthy controls were recorded repeating 13 words, 3 times each; recordings were transcribed manually and using wav2vec 2.0. The WER and phonetic and prosodic speech errors were compared between groups, and ASR results were compared against manual transcriptions. RESULTS Mean overall WER was 0.88 for patients and 0.33 for controls. WER significantly correlated with AOS severity and accurately distinguished between patients and controls but not between AOS subtypes. The phonetic and prosodic errors from the ASR transcriptions were also unable to distinguish between subtypes, whereas errors calculated from human transcriptions were. There was poor agreement in the number of phonetic and prosodic errors between the ASR and human transcriptions. CONCLUSIONS This study demonstrates that ASR can be useful in differentiating healthy from disordered speech and evaluating PPAOS severity but does not distinguish PPAOS subtypes. ASR transcriptions showed weak agreement with human transcriptions; thus, ASR may be a useful tool for the transcription of speech in PPAOS, but the research questions posed must be carefully considered within the context of its limitations. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.26359417.
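The abstract names wav2vec 2.0 as the ASR system. A minimal transcription-plus-WER sketch using an open-source checkpoint follows; the specific checkpoint, the audio file name, and the jiwer scoring step are assumptions, not the authors' pipeline.

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
from jiwer import wer

# A publicly available wav2vec 2.0 checkpoint (assumed; the paper's exact
# model configuration may differ).
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Load one word-repetition recording (hypothetical file) and resample to
# the model's 16 kHz input rate.
waveform, sr = torchaudio.load("catastrophe_rep1.wav")
waveform = torchaudio.functional.resample(waveform, sr, 16000).squeeze(0)

inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
pred_ids = torch.argmax(logits, dim=-1)
hypothesis = processor.batch_decode(pred_ids)[0].lower()

print(wer("catastrophe", hypothesis))  # per-item word error rate
```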
Affiliation(s)
- Hugo Botha: Department of Neurology, Mayo Clinic, Rochester, MN
5. Tobin J, Nelson P, MacDonald B, Heywood R, Cave R, Seaver K, Desjardins A, Jiang PP, Green JR. Automatic Speech Recognition of Conversational Speech in Individuals With Disordered Speech. J Speech Lang Hear Res 2024:1-10. [PMID: 38963790] [DOI: 10.1044/2024_jslhr-24-00045]
Abstract
PURPOSE This study examines the effectiveness of automatic speech recognition (ASR) for individuals with speech disorders, addressing the gap in performance between read and conversational ASR. We analyze the factors influencing this disparity and the effect of speech mode-specific training on ASR accuracy. METHOD Recordings of read and conversational speech from 27 individuals with various speech disorders were analyzed using both (a) one speaker-independent ASR system trained and optimized for typical speech and (b) multiple ASR models personalized to the speech of the participants with disordered speech. Word error rates (WERs) were calculated for each speech model, speech mode (read vs. conversational), and subject. Linear mixed-effects models were used to assess the impact of speech mode and disorder severity on ASR accuracy. We investigated nine variables, classified as technical, linguistic, or speech impairment factors, for their potential influence on the performance gap. RESULTS We found a significant performance gap between read and conversational speech in both personalized and unadapted ASR models. Speech impairment severity notably impacted recognition accuracy in unadapted models for both speech modes and in personalized models for read speech. Linguistic attributes of utterances were the most influential on accuracy, though atypical speech characteristics also played a role. Including conversational speech samples in model training notably improved recognition accuracy. CONCLUSIONS We observed a significant performance gap in ASR accuracy between read and conversational speech for individuals with speech disorders. This gap was largely due to the linguistic complexity and unique characteristics of disordered speech in conversation. Training personalized ASR models on conversational speech significantly improved recognition accuracy, demonstrating the importance of domain-specific training and highlighting the need for further research into ASR systems capable of handling disordered conversational speech effectively.
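The linear mixed-effects analysis described in the METHOD (WER predicted by speech mode and severity, with speaker as a random effect) can be sketched with statsmodels; the data frame, file name, and column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per (speaker, utterance) with the
# ASR word error rate, speech mode (read/conversational), and rated severity.
df = pd.read_csv("wer_by_utterance.csv")  # columns: speaker, wer, mode, severity

# Random intercept per speaker; fixed effects for mode, severity, and
# their interaction.
model = smf.mixedlm("wer ~ mode * severity", df, groups=df["speaker"])
result = model.fit()
print(result.summary())
```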
Affiliation(s)
- Jordan R Green: MGH Institute of Health Professions, Boston, MA; Harvard University, Cambridge, MA
6. Ziegler W, Staiger A, Schölderle T. Profiles of Dysarthria: Clinical Assessment and Treatment. Brain Sci 2023;14:11. [PMID: 38248226] [PMCID: PMC10813547] [DOI: 10.3390/brainsci14010011]
Abstract
In recent decades, we have witnessed a wealth of theoretical work and proof-of-principle studies on dysarthria, including descriptions and classifications of dysarthric speech patterns, new and refined assessment methods, and innovative experimental intervention trials [...].
Affiliation(s)
- Wolfram Ziegler, A. Staiger, T. Schölderle: Clinical Neuropsychology Research Group (EKN), Institute of Phonetics and Speech Processing, Ludwig-Maximilians-University, 80799 Munich, Germany
7. Gutz SE, Maffei MF, Green JR. Feedback From Automatic Speech Recognition to Elicit Clear Speech in Healthy Speakers. Am J Speech Lang Pathol 2023;32:2940-2959. [PMID: 37824377] [PMCID: PMC10721250] [DOI: 10.1044/2023_ajslp-23-00030]
Abstract
PURPOSE This study assessed the effectiveness of feedback generated by automatic speech recognition (ASR) for eliciting clear speech from young, healthy individuals. As a preliminary step toward exploring a novel method for eliciting clear speech in patients with dysarthria, we investigated the effects of ASR feedback in healthy controls. If successful, ASR feedback has the potential to facilitate independent, at-home clear speech practice. METHOD Twenty-three healthy control speakers (ages 23-40 years) read sentences aloud in three speaking modes: Habitual, Clear (over-enunciated), and in response to ASR feedback (ASR). In the ASR condition, we used Mozilla DeepSpeech to transcribe speech samples and provide participants with a value indicating the accuracy of the ASR's transcription. For speakers who achieved sufficiently high ASR accuracy, noise was added to their speech at a participant-specific signal-to-noise ratio to ensure that each participant had to over-enunciate to achieve high ASR accuracy. RESULTS Compared to habitual speech, speech produced in the ASR and Clear conditions was clearer, as rated by speech-language pathologists, and more intelligible, per speech-language pathologist transcriptions. Speech in the Clear and ASR conditions aligned on several acoustic measures, particularly those associated with increased vowel distinctiveness and decreased speaking rate. However, ASR accuracy, intelligibility, and clarity were each correlated with different speech features, which may have implications for how people change their speech for ASR feedback. CONCLUSIONS ASR successfully elicited outcomes similar to clear speech in healthy speakers. Future work should investigate its efficacy in eliciting clear speech in people with dysarthria.
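The participant-specific noise-mixing step described in the METHOD amounts to scaling a noise signal so the speech-to-noise power ratio hits a target SNR in decibels. Below is a generic formulation of that idea, not the study's exact implementation; the function name and the synthetic signals are assumptions.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Mix noise into speech at a target signal-to-noise ratio (dB)."""
    noise = np.resize(noise, speech.shape)  # tile/trim noise to match length
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    return speech + noise * np.sqrt(target_noise_power / noise_power)

# Hypothetical usage: a lower SNR makes the ASR task harder, so the speaker
# must over-enunciate before transcription accuracy becomes high.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)  # stand-in for a 1 s utterance at 16 kHz
babble = rng.standard_normal(16000)  # stand-in for recorded noise
mixed = add_noise_at_snr(speech, babble, snr_db=0.0)
```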
Affiliation(s)
- Sarah E. Gutz: Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, MA; Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA
- Marc F. Maffei: Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, MA
- Jordan R. Green: Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, MA; Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA
8. Avantaggiato F, Farokhniaee A, Bandini A, Palmisano C, Hanafi I, Pezzoli G, Mazzoni A, Isaias IU. Intelligibility of speech in Parkinson's disease relies on anatomically segregated subthalamic beta oscillations. Neurobiol Dis 2023;185:106239. [PMID: 37499882] [DOI: 10.1016/j.nbd.2023.106239]
Abstract
BACKGROUND Speech impairment is commonly reported in Parkinson's disease and is not consistently improved by available therapies, including deep brain stimulation of the subthalamic nucleus (STN-DBS), which can worsen communication performance in some patients. Improving the outcome of STN-DBS on speech is difficult due to our incomplete understanding of the contribution of the STN to fluent speaking. OBJECTIVE To assess the relationship between subthalamic neural activity and speech production and intelligibility. METHODS We investigated bilateral STN local field potentials (LFPs) in nine parkinsonian patients chronically implanted with DBS during overt reading. LFP spectral features were correlated with clinical scores and measures of speech intelligibility. RESULTS Overt reading was associated with increased beta-low ([12-20) Hz) power in the left STN, whereas speech intelligibility correlated positively with beta-high ([20-30) Hz) power in the right STN. CONCLUSION We identified separate contributions of frequency band and brain lateralization of the STN to the execution of an overt reading motor task and its intelligibility. This subcortical organization could be exploited for new adaptive stimulation strategies capable of identifying the occurrence of speaking behavior and facilitating its functional execution.
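The beta sub-band power measures in the RESULTS can be sketched as a Welch power spectral density integrated over half-open frequency intervals. The sampling rate, window length, and synthetic signal below are assumptions, not the authors' recording parameters.

```python
import numpy as np
from scipy.signal import welch

def band_power(lfp, fs, band):
    """Integrate the Welch PSD over a half-open [lo, hi) frequency band."""
    freqs, psd = welch(lfp, fs=fs, nperseg=int(2 * fs))  # 2 s windows (assumed)
    lo, hi = band
    mask = (freqs >= lo) & (freqs < hi)
    return np.trapz(psd[mask], freqs[mask])

# Hypothetical usage on one STN LFP channel sampled at 1 kHz:
rng = np.random.default_rng(1)
lfp = rng.standard_normal(60_000)  # stand-in for 60 s of LFP
beta_low = band_power(lfp, fs=1000, band=(12, 20))   # beta-low [12-20) Hz
beta_high = band_power(lfp, fs=1000, band=(20, 30))  # beta-high [20-30) Hz
```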
Affiliation(s)
- Federica Avantaggiato: Department of Neurology, University Hospital of Würzburg and Julius Maximilian University of Würzburg, Josef-Schneider-Straße 11, 97080 Würzburg, Germany
- AmirAli Farokhniaee: Fondazione Grigioni per il Morbo di Parkinson, Via Gianfranco Zuretti 35, 20125 Milano, Italy
- Andrea Bandini: The BioRobotics Institute, Department of Excellence in Robotics and AI, Scuola Superiore Sant'Anna, Viale Rinaldo Piaggio 34, Pontedera, Pisa, Italy; KITE Research Institute, Toronto Rehabilitation Institute, University Health Network, Toronto, ON, Canada; Health Science Interdisciplinary Center, Scuola Superiore Sant'Anna, Viale Rinaldo Piaggio 34, Pontedera, Pisa, Italy
- Chiara Palmisano: Department of Neurology, University Hospital of Würzburg and Julius Maximilian University of Würzburg, Josef-Schneider-Straße 11, 97080 Würzburg, Germany; Parkinson Institute Milan, ASST G. Pini-CTO, via Bignami 1, 20126 Milano, Italy
- Ibrahem Hanafi: Department of Neurology, University Hospital of Würzburg and Julius Maximilian University of Würzburg, Josef-Schneider-Straße 11, 97080 Würzburg, Germany
- Gianni Pezzoli: Fondazione Grigioni per il Morbo di Parkinson, Via Gianfranco Zuretti 35, 20125 Milano, Italy; Parkinson Institute Milan, ASST G. Pini-CTO, via Bignami 1, 20126 Milano, Italy
- Alberto Mazzoni: Health Science Interdisciplinary Center, Scuola Superiore Sant'Anna, Viale Rinaldo Piaggio 34, Pontedera, Pisa, Italy
- Ioannis U Isaias: Department of Neurology, University Hospital of Würzburg and Julius Maximilian University of Würzburg, Josef-Schneider-Straße 11, 97080 Würzburg, Germany; Parkinson Institute Milan, ASST G. Pini-CTO, via Bignami 1, 20126 Milano, Italy
9. Wolfrum V, Lehner K, Heim S, Ziegler W. Clinical Assessment of Communication-Related Speech Parameters in Dysarthria: The Impact of Perceptual Adaptation. J Speech Lang Hear Res 2023:1-21. [PMID: 37486782] [DOI: 10.1044/2023_jslhr-23-00105]
Abstract
PURPOSE In current clinical practice, the intelligibility of dysarthric speech is commonly assessed by speech-language therapists (SLTs), in most cases by the therapist caring for the patient being diagnosed. Because SLTs are familiar with dysarthria in general and with the speech of the individual patient to be assessed in particular, they have an adaptation advantage in understanding the patient's utterances. We examined whether and how listeners' assessments of communication-related speech parameters vary as a function of their familiarity with dysarthria in general and with the diagnosed patients in particular. METHOD Intelligibility, speech naturalness, and perceived listener effort were assessed in 20 persons with dysarthria (PWD). Patients' speech samples were judged by the individual treating therapists, five dysarthria experts who were unfamiliar with the patients, and crowdsourced naïve listeners. Adaptation effects were analyzed using (a) linear mixed models of overall scoring levels, (b) regression models of severity dependence, (c) network analyses of between-listener and between-parameter relationships, and (d) measures of intra- and interobserver consistency. RESULTS Significant advantages of dysarthria experts over laypeople were found on all parameters. An overall advantage of the treating therapists over the unfamiliar experts was seen only in perceived listener effort. Severity-dependent adaptation effects occurred in all parameters. The therapists' responses were heterogeneous and inconsistent with those of the unfamiliar experts and the naïve listeners. CONCLUSIONS The way SLTs evaluate communication-relevant speech parameters of the PWD whom they care for is influenced not only by adaptation benefits but also by therapeutic biases. This finding weakens the validity of assessments of communication-relevant speech parameters by the treating therapists themselves and encourages the development and use of alternative methods.
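Interobserver consistency of the kind reported here is often quantified with intraclass correlations; a sketch using pingouin follows. The data layout and file name are hypothetical, and ICC is one of several consistency measures the authors may have used.

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format ratings: one row per (patient, listener) pair.
df = pd.read_csv("intelligibility_ratings.csv")  # columns: patient, listener, score

# Intraclass correlation across listeners rating the same patients.
icc = pg.intraclass_corr(data=df, targets="patient",
                         raters="listener", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])
```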
Affiliation(s)
- Vera Wolfrum: Department of Neurology, Faculty of Medicine, RWTH Aachen University, Germany
- Katharina Lehner: Clinical Neuropsychology Research Group, Institute for Phonetics and Speech Processing, Ludwig Maximilian University of Munich, Germany
- Stefan Heim: Department of Psychiatry, Psychotherapy, and Psychosomatics, Faculty of Medicine, RWTH Aachen University, Germany; Research Center Jülich, Institute of Neurosciences and Medicine (INM-1), Germany; JARA - Translational Brain Medicine, Aachen, Germany
- Wolfram Ziegler: Clinical Neuropsychology Research Group, Institute for Phonetics and Speech Processing, Ludwig Maximilian University of Munich, Germany
10. Maffei MF, Chenausky KV, Gill SV, Tager-Flusberg H, Green JR. Oromotor skills in autism spectrum disorder: A scoping review. Autism Res 2023;16:879-917. [PMID: 37010327] [PMCID: PMC10365059] [DOI: 10.1002/aur.2923]
Abstract
Oromotor functioning plays a foundational role in spoken communication and feeding, two areas of significant difficulty for many autistic individuals. However, despite years of research and established differences in gross and fine motor skills in this population, there is currently no clear consensus regarding the presence or nature of oral motor control deficits in autistic individuals. In this scoping review, we summarize research published between 1994 and 2022 to answer the following research questions: (1) What methods have been used to investigate oromotor functioning in autistic individuals? (2) Which oromotor behaviors have been investigated in this population? and (3) What conclusions can be drawn regarding oromotor skills in this population? Seven online databases were searched, resulting in 107 studies meeting our inclusion criteria. Included studies varied widely in sample characteristics, behaviors analyzed, and research methodology. The large majority (81%) of included studies report a significant oromotor abnormality related to speech production, nonspeech oromotor skills, or feeding within a sample of autistic individuals based on age norms or in comparison to a control group. We examine these findings to identify trends, address methodological aspects hindering cross-study synthesis and generalization, and provide suggestions for future research.
Affiliation(s)
- Marc F. Maffei: Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, Massachusetts, USA
- Karen V. Chenausky: Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, Massachusetts, USA; Neurology Department, Harvard Medical School, Boston, Massachusetts, USA
- Simone V. Gill: College of Health and Rehabilitation Sciences, Sargent College, Boston University, Boston, Massachusetts, USA
- Helen Tager-Flusberg: Department of Psychological and Brain Sciences, Boston University, Boston, Massachusetts, USA
- Jordan R. Green: Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, Massachusetts, USA; Speech and Hearing Biosciences and Technology Program, Harvard University, Cambridge, Massachusetts, USA
11. Moya-Galé G, Walsh SJ, Goudarzi A. Automatic Assessment of Intelligibility in Noise in Parkinson's Disease: A Validation Method. J Med Internet Res 2022;24:e40567. [PMID: 36264608] [PMCID: PMC9634525] [DOI: 10.2196/40567]
Abstract
Background Most individuals with Parkinson disease (PD) experience a degradation in their speech intelligibility. Research on the use of automatic speech recognition (ASR) to assess intelligibility is still sparse, especially when trying to replicate communication challenges in real-life conditions (ie, noisy backgrounds). Developing technologies to automatically measure intelligibility in noise can ultimately assist patients in self-managing their voice changes due to the disease. Objective The goal of this study was to pilot-test and validate the use of a customized web-based app to assess speech intelligibility in noise in individuals with dysarthria associated with PD. Methods In total, 20 individuals with dysarthria associated with PD and 20 healthy controls (HCs) recorded a set of sentences using their phones. The Google Cloud ASR API was used to automatically transcribe the speakers' sentences. An algorithm was created to embed speakers' sentences in +6 dB signal-to-noise ratio multitalker babble. Results from ASR performance were compared to those from 30 listeners who orthographically transcribed the same set of sentences. Data were reduced to a single event, defined as a success if the artificial intelligence (AI) system transcribed a random speaker or sentence as well as or better than the average of 3 randomly chosen human listeners. These data were further analyzed by logistic regression to assess whether AI success differed by speaker group (HCs or speakers with dysarthria) or was affected by sentence length. A discriminant analysis was conducted on the human listener data and AI transcriber data independently to compare the ability of each data set to discriminate between HCs and speakers with dysarthria. Results The data analysis indicated a 0.8 probability (95% CI 0.65-0.91) that AI performance would be as good as or better than that of the average human listener. AI transcriber success probability was not found to depend on speaker group. AI transcriber success was found to decrease with sentence length, losing an estimated 0.03 probability of transcribing as well as the average human listener for each one-word increase in sentence length. The AI transcriber data were found to offer the same discrimination of speakers into categories (HCs and speakers with dysarthria) as the human listener data. Conclusions ASR has the potential to assess intelligibility in noise in speakers with dysarthria associated with PD. Our results hold promise for the use of AI with this clinical population, although a full range of speech severity needs to be evaluated in future work, as well as the effect of different speaking tasks on ASR.
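The data-reduction and modeling steps in the Methods can be sketched as follows: define a per-sentence success indicator (ASR at least as accurate as the mean of 3 randomly drawn human listeners), then fit a logistic regression on speaker group and sentence length. The data frame, file name, and column names are hypothetical stand-ins for the study's data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-sentence accuracy table: ASR accuracy plus the accuracies
# of all 30 human listeners (columns h0..h29), speaker group, and length.
df = pd.read_csv("sentence_accuracy.csv")
listener_cols = [f"h{i}" for i in range(30)]

# Success: ASR transcribes as well as or better than the average of
# 3 randomly chosen human listeners.
sampled_mean = df[listener_cols].apply(lambda row: row.sample(3).mean(), axis=1)
df["success"] = (df["asr_accuracy"] >= sampled_mean).astype(int)

# Does success depend on speaker group or sentence length (in words)?
fit = smf.logit("success ~ group + n_words", data=df).fit()
print(fit.summary())
```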
Affiliation(s)
- Gemma Moya-Galé: Department of Communication Sciences & Disorders, Long Island University, Brooklyn, NY, United States
- Stephen J Walsh: Department of Mathematics and Statistics, Utah State University, Logan, UT, United States