1
Chen SHK, Saeli C, Hu G. A proof-of-concept study for automatic speech recognition to transcribe AAC speakers' speech from high-technology AAC systems. Assist Technol 2024; 36:319-326. PMID: 37748185. DOI: 10.1080/10400435.2023.2260860.
Abstract
Automatic speech recognition (ASR) is an emerging technology that has been used to recognize the non-typical speech of people with speech impairment and to streamline the language sample transcription process in communication sciences and disorders. However, the feasibility of using ASR to recognize speech samples produced by high-tech Augmentative and Alternative Communication (AAC) systems has not been investigated. This proof-of-concept paper investigates the feasibility of using an AAC-ASR model to transcribe language samples generated by high-tech AAC systems and compares its recognition accuracy with that of two published ASR models: CMU Sphinx and Google Speech-to-text. An AAC-ASR model was developed to transcribe simulated AAC speaker language samples, and its word error rate (WER) was compared with those of CMU Sphinx and Google Speech-to-text. On the test files, the AAC-ASR model achieved a lower WER (28.6%) than CMU Sphinx (70.7%) and Google Speech-to-text (86.2%). These results demonstrate the feasibility of using an ASR model to automatically transcribe high-technology AAC-simulated language samples to support language sample analysis. Future steps will focus on training the model with diverse AAC speech datasets and on understanding the speech patterns of individual AAC users to refine the AAC-ASR model.
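The comparison above is stated in terms of word error rate (WER), the standard ASR accuracy metric: the minimum number of word substitutions, insertions, and deletions needed to turn the hypothesis into the reference transcript, divided by the reference length. As an illustration only (not the authors' implementation), a minimal WER computation via Levenshtein alignment might look like:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = word-level edit distance between the first i reference
    # words and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat on mat")` is one deletion over six reference words, about 16.7%. Production toolkits such as jiwer implement the same metric, typically with added text normalization before alignment.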
Affiliation(s)
- Szu-Han Kay Chen
- Department of Communication Sciences and Disorders, The University of New Hampshire, Durham, New Hampshire, USA
- Conner Saeli
- The State University of New York Buffalo State University, Buffalo, New York, USA
- Gang Hu
- The State University of New York Buffalo State University, Buffalo, New York, USA
2
Baker E, Li W, Hodges R, Masso S, Jones C, Guo Y, Alt M, Antoniou M, Afshar S, Tosi K, Munro N. Harnessing automatic speech recognition to realise Sustainable Development Goals 3, 9, and 17 through interdisciplinary partnerships for children with communication disability. Int J Speech Lang Pathol 2023; 25:125-129. PMID: 36511655. DOI: 10.1080/17549507.2022.2146194.
Abstract
PURPOSE To showcase how applications of automatic speech recognition (ASR) technology could help solve challenges in speech-language pathology practice with children with communication disability, and contribute to the realisation of the Sustainable Development Goals (SDGs). RESULT ASR technologies have been developed to address the need for equitable, efficient, and accurate assessment and diagnosis of communication disability in children by automating the transcription and analysis of speech and language samples and supporting dual-language assessment of bilingual children. ASR tools can automate the measurement of and help optimise intervention fidelity. ASR tools can also be used by children to engage in independent speech production practice without relying on feedback from speech-language pathologists (SLPs), thus bridging the long-standing gap between recommended and received intervention intensity. These innovative technologies and tools have been generated from interdisciplinary partnerships between SLPs, engineers, data scientists, and linguists. CONCLUSION To advance equitable, efficient, and effective speech-language pathology services for children with communication disability, SLPs would benefit from integrating ASR solutions into their clinical practice. Ongoing interdisciplinary research is needed to further advance ASR technologies to optimise children's outcomes. This commentary paper focusses on industry, innovation and infrastructure (SDG 9) and partnerships for the goals (SDG 17). It also addresses SDG 1, SDG 3, SDG 4, SDG 8, SDG 10, SDG 11, and SDG 16.
Affiliation(s)
- Elise Baker
- School of Health Sciences, Western Sydney University, Campbelltown, Australia
- South Western Sydney Local Health District, Liverpool, Australia
- Ingham Institute for Applied Medical Research, Liverpool, Australia
- Weicong Li
- School of Health Sciences, Western Sydney University, Campbelltown, Australia
- MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Westmead, Australia
- Rosemary Hodges
- School of Health Sciences, Western Sydney University, Campbelltown, Australia
- Western Sydney Speech Pathology, Blacktown, Australia
- Sarah Masso
- Sydney School of Health Sciences, The University of Sydney, Sydney, Australia
- Caroline Jones
- MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Westmead, Australia
- Yi Guo
- School of Computing, Engineering and Mathematics, Western Sydney University, Penrith, Australia
- Mary Alt
- School of Mind, Brain, and Behaviour, The University of Arizona, Tucson, AZ, USA
- Mark Antoniou
- MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Westmead, Australia
- Saeed Afshar
- MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Westmead, Australia
- School of Computing, Engineering and Mathematics, Western Sydney University, Penrith, Australia
- International Centre for Neuromorphic Systems, Western Sydney University, Penrith, Australia
- Katrina Tosi
- South Western Sydney Local Health District, Liverpool, Australia
- Natalie Munro
- Sydney School of Health Sciences, The University of Sydney, Sydney, Australia
3
Scott A, Gillon G, McNeill B, Kopach A. The Evolution of an Innovative Online Task to Monitor Children's Oral Narrative Development. Front Psychol 2022; 13:903124. PMID: 35967638. PMCID: PMC9364821. DOI: 10.3389/fpsyg.2022.903124.
Abstract
Oral narrative abilities are an important measure of children's language competency and have predictive value for children's later academic performance. Research and development underway in New Zealand is advancing an innovative online oral narrative task. This task uses audio recordings of children's story retells, speech-to-text software, and language analysis to record, transcribe, analyse, and present oral narrative and listening comprehension data back to class teachers. The task has been designed for class teachers' use, with support from speech-language pathology (SLP) or literacy specialists in data interpretation. Teachers are upskilled and supported to interpret these data and implement teaching practices through online professional learning and development modules, within the context of a broader evidence-based approach to early literacy instruction. This article describes the development of this innovative, culturally relevant, online tool for monitoring children's oral narrative ability and listening comprehension in their first year of school. Three phases of development are outlined, showing the progression of the tool from a researcher-administered task in controlled research trials to wide-scale implementation with thousands of students throughout New Zealand. The current iteration uses an automatic speech recognition system with specifically trained transcription models, with research assistants checking the transcription and then coding and analysing the oral narrative. This reduces transcription and analysis time to ~7 min, with a word error rate of around 20%. Future development will aim to increase the accuracy of automatic transcription and to embed basic language analysis into the tool, removing the need for research assistant support.
Affiliation(s)
- Amy Scott
- Faculty of Education, Child Well-Being Research Institute, University of Canterbury, Christchurch, New Zealand
- *Correspondence: Amy Scott
- Gail Gillon
- Faculty of Education, Child Well-Being Research Institute, University of Canterbury, Christchurch, New Zealand
- Brigid McNeill
- Faculty of Education, Child Well-Being Research Institute, University of Canterbury, Christchurch, New Zealand
- Faculty of Education, School of Teacher Education, University of Canterbury, Christchurch, New Zealand
- Alex Kopach
- Global Office Limited, Christchurch, New Zealand
4
An English Teaching Pronunciation Detection and Recognition Algorithm Based on Cluster Analysis and Improved SSD. J Electr Comput Eng 2022. DOI: 10.1155/2022/1626229.
Abstract
The accuracy of English pronunciation is a key index for evaluating the quality of English teaching, and correct pronunciation with fluent delivery is what every student of English aims for. To address the poor accuracy and slow speed of the original SSD (Single Shot MultiBox Detector) algorithm in English teaching pronunciation detection, this paper proposes a clustering-based, improved SSD algorithm for pronunciation detection and recognition. The algorithm refines its feature-extraction module to enhance the network's feature extraction ability and increase detection speed. It also fuses multiscale features to achieve multilayer reuse and balancing of features, improving detection of small targets. A channel attention mechanism is introduced to extract more features, increasing detection accuracy while reducing computation. To optimize the network's ability to extract target location information, K-means clustering is used to set default box parameters that better match the characteristics of the target samples. Experimental results show that the proposed algorithm can accurately evaluate the pronunciation quality of read-aloud speech, effectively reflecting the reader's oral English level.
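The abstract does not give implementation details for the K-means step, but a common approach to choosing detector default boxes clusters the ground-truth box sizes with a 1 − IoU distance, so the selected box shapes match those actually present in the data. The sketch below is a hypothetical illustration of that general idea; the function name, the plain-average centroid update, and the example data are all assumptions, not this paper's method:

```python
import random

def iou_wh(box, centroid):
    """IoU of two boxes given as (width, height), aligned at a common corner."""
    w1, h1 = box
    w2, h2 = centroid
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) pairs with a 1 - IoU distance to pick k default box shapes."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        # assign each box to the centroid it overlaps most (max IoU = min 1 - IoU)
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, centroids[i]))
            clusters[best].append(b)
        # update each centroid to the mean shape of its cluster
        new = []
        for i, c in enumerate(clusters):
            if c:
                new.append((sum(w for w, _ in c) / len(c),
                            sum(h for _, h in c) / len(c)))
            else:
                new.append(centroids[i])  # keep an empty cluster's centroid
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)
```

Using 1 − IoU rather than Euclidean distance matters here: two boxes can differ greatly in raw width and height while overlap, which is what detection quality actually depends on, tells a different story.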