1
Marin E, Unsihuay N, Abarca VE, Elias DA. Identification of the Biomechanical Response of the Muscles That Contract the Most during Disfluencies in Stuttered Speech. Sensors (Basel). 2024;24:2629. PMID: 38676246; PMCID: PMC11053464; DOI: 10.3390/s24082629.
Abstract
Stuttering, affecting approximately 1% of the global population, is a complex speech disorder significantly impacting individuals' quality of life. Prior studies using electromyography (EMG) to examine orofacial muscle activity in stuttering have presented mixed results, highlighting the variability in neuromuscular responses during stuttering episodes. Fifty-five participants with stuttering and 30 individuals without stuttering, aged between 18 and 40, participated in the study. EMG signals from five facial and cervical muscles were recorded during speech tasks and analyzed for mean amplitude and frequency activity in the 5-15 Hz range to identify significant differences. Upon analysis of the 5-15 Hz frequency range, a higher average amplitude was observed in the zygomaticus major muscle for participants while stuttering (p < 0.05). Additionally, when assessing the overall EMG signal amplitude, a higher average amplitude was observed in samples obtained from disfluencies in participants who did not stutter, particularly in the depressor anguli oris muscle (p < 0.05). Significant differences in muscle activity were observed between the two groups, particularly in the depressor anguli oris and zygomaticus major muscles. These results suggest that the underlying neuromuscular mechanisms of stuttering might involve subtle aspects of timing and coordination in muscle activation. Therefore, these findings may contribute to the field of biosensors by providing valuable perspectives on neuromuscular mechanisms and the relevance of electromyography in stuttering research. Further research in this area has the potential to advance the development of biosensor technology for language-related applications and therapeutic interventions in stuttering.
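The band-limited amplitude analysis described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the sampling rate, the synthetic signal, and the use of an FFT-based band estimate are all assumptions.

```python
import numpy as np

def band_mean_amplitude(emg, fs, f_lo=5.0, f_hi=15.0):
    """Mean spectral amplitude of an EMG signal within [f_lo, f_hi] Hz."""
    spectrum = np.abs(np.fft.rfft(emg - emg.mean()))
    freqs = np.fft.rfftfreq(emg.size, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return spectrum[band].mean()

# Synthetic example: a 10 Hz component riding on broadband noise.
fs = 1000                                  # sampling rate in Hz (assumed)
t = np.arange(0, 2.0, 1.0 / fs)
rng = np.random.default_rng(0)
emg = 0.5 * np.sin(2 * np.pi * 10 * t) + 0.05 * rng.standard_normal(t.size)
quiet = 0.05 * rng.standard_normal(t.size)
print(band_mean_amplitude(emg, fs) > band_mean_amplitude(quiet, fs))  # True
```

A between-group comparison like the paper's would then apply a statistical test (e.g., a t-test) to these per-sample band amplitudes.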
Affiliation(s)
- Victoria E. Abarca
- Biomechanics and Applied Robotics Research Laboratory, Pontificia Universidad Católica del Perú, Lima 15088, Peru; (E.M.); (N.U.); (D.A.E.)
2
Dong P, Li Y, Chen S, Grafstein JT, Khan I, Yao S. Decoding silent speech commands from articulatory movements through soft magnetic skin and machine learning. Mater Horiz. 2023;10:5607-5620. PMID: 37751158; DOI: 10.1039/d3mh01062g.
Abstract
Silent speech interfaces have been pursued to restore spoken communication for individuals with voice disorders and to facilitate intuitive communications when acoustic-based speech communication is unreliable, inappropriate, or undesired. However, the current methodology for silent speech faces several challenges, including bulkiness, obtrusiveness, low accuracy, limited portability, and susceptibility to interferences. In this work, we present a wireless, unobtrusive, and robust silent speech interface for tracking and decoding speech-relevant movements of the temporomandibular joint. Our solution employs a single soft magnetic skin placed behind the ear for wireless and socially acceptable silent speech recognition. The developed system alleviates several concerns associated with existing interfaces based on face-worn sensors, including a large number of sensors, highly visible interfaces on the face, and obtrusive interconnections between sensors and data acquisition components. With machine learning-based signal processing techniques, good speech recognition accuracy is achieved (93.2% accuracy for phonemes, and 87.3% for a list of words from the same viseme groups). Moreover, the reported silent speech interface demonstrates robustness against noises from both ambient environments and users' daily motions. Finally, its potential in assistive technology and human-machine interactions is illustrated through two demonstrations - silent speech enabled smartphone assistants and silent speech enabled drone control.
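As a hedged illustration of the decoding step, the sketch below trains a deliberately simple nearest-centroid classifier on synthetic feature vectors. The paper's actual features and model are not specified in this abstract, so every name, dimension, and value here is hypothetical.

```python
import numpy as np

# Hypothetical stand-in for a learned decoder: nearest-centroid
# classification over per-utterance feature vectors.
def fit_centroids(X, y):
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def predict(centroids, X):
    labels = list(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[l], axis=1) for l in labels])
    return np.array(labels)[dists.argmin(axis=0)]

rng = np.random.default_rng(1)
# Two synthetic "phoneme" classes, 8-dimensional features.
X0 = rng.normal(0.0, 0.3, size=(50, 8))
X1 = rng.normal(1.0, 0.3, size=(50, 8))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)
centroids = fit_centroids(X, y)
acc = (predict(centroids, X) == y).mean()
print(acc)  # near 1.0 on these well-separated synthetic classes
```

Real magnetic-skin signals would of course be far less separable, which is why the authors report class-wise accuracies (93.2% for phonemes, 87.3% within viseme groups) rather than near-perfect figures.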
Affiliation(s)
- Penghao Dong
- Department of Mechanical Engineering, Stony Brook University, Stony Brook, New York 11794, USA.
- Yizong Li
- Department of Mechanical Engineering, Stony Brook University, Stony Brook, New York 11794, USA.
- Si Chen
- Department of Mechanical Engineering, Stony Brook University, Stony Brook, New York 11794, USA.
- Justin T Grafstein
- Department of Mechanical Engineering, Stony Brook University, Stony Brook, New York 11794, USA.
- Irfaan Khan
- Department of Electrical and Computer Engineering, Stony Brook University, Stony Brook, New York 11794, USA.
- Shanshan Yao
- Department of Mechanical Engineering, Stony Brook University, Stony Brook, New York 11794, USA.
3
Dong P, Song Y, Yu S, Zhang Z, Mallipattu SK, Djurić PM, Yao S. Electromyogram-Based Lip-Reading via Unobtrusive Dry Electrodes and Machine Learning Methods. Small. 2023;19:e2205058. PMID: 36703524; DOI: 10.1002/smll.202205058.
Abstract
Lip-reading provides an effective speech communication interface for people with voice disorders and for intuitive human-machine interactions. Existing systems are generally challenged by bulkiness, obtrusiveness, and poor robustness against environmental interferences. The lack of a truly natural and unobtrusive system for converting lip movements to speech precludes the continuous use and wide-scale deployment of such devices. Here, the design of a hardware-software architecture to capture, analyze, and interpret lip movements associated with either normal or silent speech is presented. The system can recognize different and similar visemes. It is robust in a noisy or dark environment. Self-adhesive, skin-conformable, and semi-transparent dry electrodes are developed to track high-fidelity speech-relevant electromyogram signals without impeding daily activities. The resulting skin-like sensors can form seamless contact with the curvilinear and dynamic surfaces of the skin, which is crucial for a high signal-to-noise ratio and minimal interference. Machine learning algorithms are employed to decode electromyogram signals and convert them to spoken words. Finally, the applications of the developed lip-reading system in augmented reality and medical service are demonstrated, which illustrate the great potential in immersive interaction and healthcare applications.
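Windowed RMS is a common first step when turning raw sEMG into features for a classifier; the sketch below shows one such featurizer on synthetic multichannel data. It is a generic illustration, not the feature set used in the paper, and the window length and channel count are arbitrary.

```python
import numpy as np

def rms_features(emg, fs, win_s=0.2):
    """Windowed RMS per channel; emg has shape (n_channels, n_samples)."""
    win = int(win_s * fs)
    n_win = emg.shape[1] // win
    trimmed = emg[:, : n_win * win].reshape(emg.shape[0], n_win, win)
    return np.sqrt((trimmed ** 2).mean(axis=2))   # shape (n_channels, n_win)

fs = 1000                                         # assumed sampling rate in Hz
rng = np.random.default_rng(2)
emg = rng.standard_normal((4, 3000))              # 4 channels, 3 s of signal
feats = rms_features(emg, fs)
print(feats.shape)  # (4, 15)
```

Feature matrices of this shape would then be fed to the machine learning stage that maps muscle activity to visemes or words.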
Affiliation(s)
- Penghao Dong
- Department of Mechanical Engineering, Stony Brook University, Stony Brook, NY, 11794, USA
- Yuanqing Song
- Department of Electrical and Computer Engineering, Stony Brook University, Stony Brook, NY, 11794, USA
- Shangyouqiao Yu
- Department of Mechanical Engineering, Stony Brook University, Stony Brook, NY, 11794, USA
- Zimeng Zhang
- Department of Mechanical Engineering, Stony Brook University, Stony Brook, NY, 11794, USA
- Sandeep K Mallipattu
- Department of Medicine, Stony Brook University, Stony Brook, NY, 11794, USA
- Renal Section, Northport VA Medical Center, Northport, NY, 11768, USA
- Petar M Djurić
- Department of Electrical and Computer Engineering, Stony Brook University, Stony Brook, NY, 11794, USA
- Shanshan Yao
- Department of Mechanical Engineering, Stony Brook University, Stony Brook, NY, 11794, USA
4
Deep Learning Methods for Arabic Autoencoder Speech Recognition System for Electro-Larynx Device. Adv Hum Comput Interact. 2023. DOI: 10.1155/2023/7398538.
Abstract
Recent advances in speech recognition have achieved performance comparable to that of human transcribers. However, this performance does not extend to all spoken languages, and Arabic is one of them: Arabic speech recognition is limited by the lack of suitable datasets, although artificial intelligence algorithms have shown promising capabilities for the task. Arabic is the official language of 22 countries, and an estimated 400 million people speak it worldwide. Speech disabilities have become an increasingly widespread problem in recent decades, even among children. Some devices can generate speech for affected individuals; one of these is the Servox Digital Electro-Larynx (EL). In this research, we developed an autoencoder combining long short-term memory (LSTM) and gated recurrent unit (GRU) models to recognize signals recorded from the Servox Digital EL. The proposed framework consists of three steps: denoising, feature extraction, and Arabic speech recognition. Experimental results show 95.31% accuracy for Arabic speech recognition with the proposed model. We evaluated different combinations of LSTM and GRU layers to construct the best autoencoder; a rigorous evaluation process indicated better performance with GRU in both the encoder and decoder structures. The proposed model achieved a 4.69% word error rate (WER). These results confirm that the proposed model can be used to develop a real-time app for recognizing common Arabic spoken words.
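The reported 4.69% WER sits alongside the 95.31% accuracy figure; note that WER counts word-level edit operations, so in general it is not simply the complement of accuracy. A minimal word-error-rate implementation via Levenshtein distance:

```python
def word_error_rate(reference, hypothesis):
    """WER: word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

print(word_error_rate("open the door", "open a door"))  # 1 substitution / 3 words ≈ 0.333
```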
5
Vojtech JM, Mitchell CL, Raiff L, Kline JC, De Luca G. Prediction of Voice Fundamental Frequency and Intensity from Surface Electromyographic Signals of the Face and Neck. Vibration. 2022;5:692-710. PMID: 36299552; PMCID: PMC9592063; DOI: 10.3390/vibration5040041.
Abstract
Silent speech interfaces (SSIs) enable speech recognition and synthesis in the absence of an acoustic signal. Yet the archetypal SSI fails to convey the expressive attributes of prosody, such as pitch and loudness, leading to lexical ambiguities. The aim of this study was to determine the efficacy of using surface electromyography (sEMG) as an approach for predicting continuous acoustic estimates of prosody. Ten participants performed a series of vocal tasks, including sustained vowels, phrases, and monologues, while acoustic data were recorded simultaneously with sEMG activity from muscles of the face and neck. A battery of time-, frequency-, and cepstral-domain features extracted from the sEMG signals was used to train deep regression neural networks to predict fundamental frequency and intensity contours from the acoustic signals. We achieved an average accuracy of 0.01 ST and precision of 0.56 ST for the estimation of fundamental frequency, and an average accuracy of 0.21 dB SPL and precision of 3.25 dB SPL for the estimation of intensity. This work highlights the importance of using sEMG as an alternative means of detecting prosody and shows promise for improving SSIs in future development.
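The accuracy and precision figures above are expressed in semitones (ST) for pitch and dB SPL for intensity. The error definitions behind those units can be written out as below; these are the standard conventions for such metrics, not code from the study.

```python
import math

def semitone_error(f_pred_hz, f_true_hz):
    """Signed fundamental-frequency error in semitones (ST)."""
    return 12.0 * math.log2(f_pred_hz / f_true_hz)

def db_spl_error(pred_db, true_db):
    """Signed intensity error in dB SPL (intensity is already on a log scale)."""
    return pred_db - true_db

print(semitone_error(220.0, 220.0))   # 0.0 (perfect prediction)
print(semitone_error(440.0, 220.0))   # 12.0 (one octave high)
```

Averaging these signed errors over frames gives an accuracy-style measure, while their spread (e.g., standard deviation) gives a precision-style measure, which is one plausible reading of the ST and dB SPL figures reported above.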
Affiliation(s)
- Laura Raiff
- Delsys, Inc., Natick, MA 01760, USA
- Altec, Inc., Natick, MA 01760, USA
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Joshua C. Kline
- Delsys, Inc., Natick, MA 01760, USA
- Altec, Inc., Natick, MA 01760, USA
- Gianluca De Luca
- Delsys, Inc., Natick, MA 01760, USA
- Altec, Inc., Natick, MA 01760, USA