1

Cui Q, Zhang X, Zhang Y, Zheng C, Xie L, Yan Y, Wu EQ, Yin E. A simplified adversarial architecture for cross-subject silent speech recognition using electromyography. J Neural Eng 2024; 21:056001. PMID: 39178906. DOI: 10.1088/1741-2552/ad7321.
Abstract
Objective. The decline in the performance of electromyography (EMG)-based silent speech recognition is widely attributed to disparities in speech patterns, articulation habits, and individual physiology among speakers. Feature alignment by learning a discriminative network that resolves domain offsets across speakers is an effective way to address this problem. The prevailing adversarial network, with a branching discriminator specializing in domain discrimination, contributes insufficiently directly to the classifier's categorical predictions. Approach. To this end, we propose a simplified discrepancy-based adversarial network with a streamlined end-to-end structure for EMG-based cross-subject silent speech recognition. Highly aligned features across subjects are obtained by introducing a nuclear-norm Wasserstein discrepancy metric at the back end of the classification network, which can be used for both classification and domain discrimination. Given the low-level and implicitly noisy nature of myoelectric signals, we devise a cascaded adaptive rectification network as the front-end feature extraction network, adaptively reshaping the intermediate feature map with automatically learnable channel-wise thresholds. The resulting features effectively filter out domain-specific information between subjects while retaining the domain-invariant features critical for cross-subject recognition. Main results. A series of sentence-level classification experiments with 100 Chinese sentences demonstrates the efficacy of our method, which achieves an average accuracy of 89.46% on 40 new subjects after training with data from 60 subjects. Notably, our method achieves a 10.07% improvement over the state-of-the-art model when tested on 10 new subjects with 20 subjects used for training, surpassing that model's result even when it is trained on three times as many subjects. Significance. Our study demonstrates the improved classification performance of the proposed adversarial architecture on cross-subject myoelectric signals, a promising prospect for EMG-based interactive speech applications.
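The abstract does not spell out the nuclear-norm Wasserstein discrepancy; as a rough, hypothetical illustration of the nuclear-norm ingredient only, the sketch below compares the nuclear norms of softmax prediction matrices from source- and target-subject batches (the function names, shapes, and use of an absolute difference are assumptions, not the authors' formulation).

```python
import numpy as np

def softmax(logits):
    # Row-wise softmax over class logits.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nuclear_norm(mat):
    # Sum of singular values of a (batch x classes) prediction matrix.
    return np.linalg.svd(mat, compute_uv=False).sum()

def nuclear_norm_discrepancy(src_logits, tgt_logits):
    # Absolute difference of the two batches' nuclear norms; a large value
    # suggests the classifier treats the two domains very differently.
    return abs(nuclear_norm(softmax(src_logits)) - nuclear_norm(softmax(tgt_logits)))

rng = np.random.default_rng(0)
src = rng.normal(size=(32, 100))   # 32 samples, 100 sentence classes
tgt = rng.normal(size=(32, 100))
d = nuclear_norm_discrepancy(src, tgt)
print(round(float(d), 4))
```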
Affiliation(s)
- Qiang Cui
- Defense Innovation Institute, Academy of Military Sciences (AMS), Beijing 100071, People's Republic of China
- Intelligent Game and Decision Laboratory, Beijing 100071, People's Republic of China
- Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin 300450, People's Republic of China
- Xingyu Zhang
- Defense Innovation Institute, Academy of Military Sciences (AMS), Beijing 100071, People's Republic of China
- Intelligent Game and Decision Laboratory, Beijing 100071, People's Republic of China
- Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin 300450, People's Republic of China
- Yakun Zhang
- Defense Innovation Institute, Academy of Military Sciences (AMS), Beijing 100071, People's Republic of China
- Intelligent Game and Decision Laboratory, Beijing 100071, People's Republic of China
- Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin 300450, People's Republic of China
- Changyan Zheng
- Defense Innovation Institute, Academy of Military Sciences (AMS), Beijing 100071, People's Republic of China
- High-tech Institute, Weifang 261000, People's Republic of China
- Liang Xie
- Defense Innovation Institute, Academy of Military Sciences (AMS), Beijing 100071, People's Republic of China
- Intelligent Game and Decision Laboratory, Beijing 100071, People's Republic of China
- Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin 300450, People's Republic of China
- Ye Yan
- Defense Innovation Institute, Academy of Military Sciences (AMS), Beijing 100071, People's Republic of China
- Intelligent Game and Decision Laboratory, Beijing 100071, People's Republic of China
- Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin 300450, People's Republic of China
- Edmond Q Wu
- Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, People's Republic of China
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai Jiao Tong University, Shanghai 200240, People's Republic of China
- Shanghai Engineering Research Center of Intelligent Control and Management, Shanghai Jiao Tong University, Shanghai 200240, People's Republic of China
- Erwei Yin
- Defense Innovation Institute, Academy of Military Sciences (AMS), Beijing 100071, People's Republic of China
- Intelligent Game and Decision Laboratory, Beijing 100071, People's Republic of China
- Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin 300450, People's Republic of China

2

Kwon J, Hwang J, Sung JE, Im CH. Speech synthesis from three-axis accelerometer signals using conformer-based deep neural network. Comput Biol Med 2024; 182:109090. PMID: 39232406. DOI: 10.1016/j.compbiomed.2024.109090.
Abstract
Silent speech interfaces (SSIs) have emerged as innovative non-acoustic communication methods, and our previous study demonstrated the significant potential of three-axis accelerometer-based SSIs to identify silently spoken words with high classification accuracy. The developed accelerometer-based SSI, with only four accelerometers and a small training dataset, outperformed a conventional surface electromyography (sEMG)-based SSI. In this study, motivated by these promising initial results, we investigated the feasibility of synthesizing spoken speech from three-axis accelerometer signals, aiming to assess the potential of accelerometer-based SSIs for practical silent communication. Nineteen healthy individuals participated in our experiments. Five accelerometers were attached to the face to acquire speech-related facial movements while the participants read 270 Korean sentences aloud. For speech synthesis, we used a convolution-augmented Transformer (Conformer)-based deep neural network to convert the accelerometer signals into a Mel spectrogram, from which an audio waveform was synthesized using HiFi-GAN. To evaluate the quality of the generated Mel spectrograms, ten-fold cross-validation was performed with the Mel cepstral distortion (MCD) as the evaluation metric. An average MCD of 5.03 ± 0.65 was achieved using the four accelerometer positions optimized in our previous study. Furthermore, the quality of the generated Mel spectrograms was significantly enhanced by adding one more accelerometer under the chin, achieving an average MCD of 4.86 ± 0.65 (p < 0.001, Wilcoxon signed-rank test). Although an objective comparison is difficult, these results surpass those obtained with conventional SSIs based on sEMG, electromagnetic articulography, and electropalatography, while using the fewest sensors and a similar or smaller number of sentences to train the model. Our proposed approach will contribute to the widespread adoption of accelerometer-based SSIs, leveraging the advantages of accelerometers such as low power consumption, invulnerability to physiological artifacts, and high portability.
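The Mel cepstral distortion used as the evaluation metric here has a standard closed form; a minimal sketch (excluding the energy coefficient, a common convention that the abstract does not state explicitly):

```python
import numpy as np

def mel_cepstral_distortion(ref_mcep, syn_mcep):
    """Frame-averaged MCD in dB between two time-aligned mel-cepstral
    sequences of shape (frames, coeffs); coefficient 0 (energy) is
    conventionally excluded, which we assume here."""
    diff = ref_mcep[:, 1:] - syn_mcep[:, 1:]
    const = 10.0 * np.sqrt(2.0) / np.log(10.0)   # ≈ 6.142 dB
    return const * np.mean(np.sqrt(np.sum(diff ** 2, axis=1)))

ref = np.zeros((4, 25))
syn = np.zeros((4, 25))
syn[:, 1] = 1.0   # one coefficient off by 1 in every frame
print(round(mel_cepstral_distortion(ref, syn), 4))
```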
Affiliation(s)
- Jinuk Kwon
- Department of Electronic Engineering, Hanyang University, Seoul, South Korea
- Jihun Hwang
- Department of Electronic Engineering, Hanyang University, Seoul, South Korea
- Jee Eun Sung
- Department of Communication Disorders, Ewha Womans University, Seoul, South Korea
- Chang-Hwan Im
- Department of Electronic Engineering, Hanyang University, Seoul, South Korea; Department of Biomedical Engineering, Hanyang University, Seoul, South Korea; Department of Artificial Intelligence, Hanyang University, Seoul, South Korea; Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, South Korea

3

Elbourhamy DM. Automated sentiment analysis of visually impaired students' audio feedback in virtual learning environments. PeerJ Comput Sci 2024; 10:e2143. PMID: 38983237. PMCID: PMC11232573. DOI: 10.7717/peerj-cs.2143.
Abstract
This research introduces an intelligent model for predicting and analyzing sentiment in audio feedback from students with visual impairments in a virtual learning environment. Sentiment is divided into five classes: highly positive, positive, neutral, negative, and highly negative. The model sources data from educational platforms used after the COVID-19 outbreak (Microsoft Teams) and offers automated evaluation and visualization of audio feedback, which enhances students' performance. It also gives educators better insight into the sentiment of visually impaired e-learning students. Support vector machine (SVM) and artificial neural network (ANN) algorithms were largely successful at using the sentiment responses from the assessment to point out deficiencies in computer literacy and to forecast performance. The model performed well in predicting student performance with ANN algorithms on combined structured and unstructured data, especially by the ninth week, compared with unstructured data alone. Overall, the findings carry inclusive policy implications for educating students with visual impairments and highlight the role of technology in enhancing their learning experience.
Affiliation(s)
- Doaa Mohamed Elbourhamy
- Educational Technology and Computer Department, Faculty of Specific Education, Kafrelshiekh University, Egypt

4

Zhou C, Li X, Feng F, Zhang J, Lyu H, Wu W, Tang X, Luo B, Li D, Xiang W, Yao D. Inter-patient ECG heartbeat classification for arrhythmia classification: a new approach of multi-layer perceptron with weight capsule and sequence-to-sequence combination. Front Physiol 2023; 14:1247587. PMID: 37841320. PMCID: PMC10569428. DOI: 10.3389/fphys.2023.1247587.
Abstract
Objective: The objective of this research is to construct a method that alleviates the problem of sample imbalance in classification, especially arrhythmia classification, and improves model performance without using data augmentation. Methods: We developed a new multi-layer perceptron (MLP) block and used a Weight Capsule (WCapsule) network with MLP combined with a sequence-to-sequence (Seq2Seq) network to classify arrhythmias. Our work is based on the MIT-BIH arrhythmia database; the original electrocardiogram (ECG) data are classified according to the criteria recommended by the Association for the Advancement of Medical Instrumentation (AAMI). Results: The proposed model is evaluated using the inter-patient paradigm and shows an accuracy (ACC) of 99.88% under sample imbalance. For class N, sensitivity (SEN) is 99.79%, positive predictive value (PPV) is 99.90%, and specificity (SPEC) is 99.19%. For class S, SEN is 97.66%, PPV is 96.14%, and SPEC is 99.85%. For class V, SEN is 99.97%, PPV is 99.07%, and SPEC is 99.94%. For class F, SEN is 97.94%, PPV is 98.70%, and SPEC is 99.99%. When using only half of the training samples, the SEN of classes N and V is 0.97% and 5.27% higher, respectively, than that of a traditional machine learning algorithm. Conclusion: The proposed method, which combines MLP and a weight capsule network with a Seq2Seq network, effectively addresses sample imbalance in arrhythmia classification and performs well. It also shows promising potential with fewer samples.
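The per-class SEN, PPV, and SPEC values reported above all derive from a confusion matrix; a generic sketch with toy numbers (not the paper's data):

```python
import numpy as np

def class_metrics(cm):
    """Per-class sensitivity, positive predictive value, and specificity
    from a confusion matrix cm[true, predicted]."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # missed members of each class
    fp = cm.sum(axis=0) - tp          # false alarms for each class
    tn = cm.sum() - tp - fn - fp
    return tp / (tp + fn), tp / (tp + fp), tn / (tn + fp)

# Toy 3-class example (rows: true N, S, V; columns: predicted).
cm = [[90,  5,  5],
      [ 2, 45,  3],
      [ 1,  4, 45]]
sen, ppv, spec = class_metrics(cm)
print(np.round(sen, 3))   # per-class sensitivity
```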
Affiliation(s)
- Chenchen Zhou
- Key Laboratory of Electronic and Information Engineering, State Ethnic Affairs Commission, Southwest Minzu University, Chengdu, China
- Guangxi Key Laboratory of Digital Infrastructure, Guangxi Information Center, Nanning, China
- Xiangkui Li
- Key Laboratory of Electronic and Information Engineering, State Ethnic Affairs Commission, Southwest Minzu University, Chengdu, China
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Fan Feng
- Guangxi Key Laboratory of Digital Infrastructure, Guangxi Information Center, Nanning, China
- Jian Zhang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
- He Lyu
- Key Laboratory of Electronic and Information Engineering, State Ethnic Affairs Commission, Southwest Minzu University, Chengdu, China
- Weixuan Wu
- Key Laboratory of Electronic and Information Engineering, State Ethnic Affairs Commission, Southwest Minzu University, Chengdu, China
- Xuezhi Tang
- Key Laboratory of Electronic and Information Engineering, State Ethnic Affairs Commission, Southwest Minzu University, Chengdu, China
- Bin Luo
- Sichuan Huhui Software Co., Ltd., Mianyang, China
- Dong Li
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
- Med-X Center for Informatics, Sichuan University, Chengdu, China
- Wei Xiang
- Key Laboratory of Electronic and Information Engineering, State Ethnic Affairs Commission, Southwest Minzu University, Chengdu, China
- Dengju Yao
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China

5

Cao B, Ravi S, Sebkhi N, Bhavsar A, Inan OT, Xu W, Wang J. MagTrack: a wearable tongue motion tracking system for silent speech interfaces. J Speech Lang Hear Res 2023; 66:3206-3221. PMID: 37146629. PMCID: PMC10555459. DOI: 10.1044/2023_jslhr-22-00319.
Abstract
Purpose: Current electromagnetic tongue tracking devices are not amenable to daily use and are thus unsuitable for silent speech interfaces and other applications. We recently developed MagTrack, a novel wearable electromagnetic articulograph for tongue tracking. This study aimed to validate MagTrack for potential silent speech interface applications. Method: We conducted two experiments: (a) classification of eight isolated vowels in consonant-vowel-consonant form and (b) continuous silent speech recognition. In these experiments, we used data from healthy adult speakers collected with MagTrack. Vowel classification performance was measured by accuracy, and continuous silent speech recognition by phoneme error rate. The performance was then compared with results using data collected with a commercial electromagnetic articulograph in a prior study. Results: Isolated vowel classification using MagTrack achieved an average accuracy of 89.74% when leveraging all MagTrack signals (x, y, z coordinates; orientation; and magnetic signals), which outperformed the accuracy obtained with commercial electromagnetic articulograph data (only y, z coordinates) in our previous study. Continuous speech recognition from two subjects using MagTrack achieved phoneme error rates of 73.92% and 66.73%, respectively; the commercial electromagnetic articulograph achieved 64.53% from the same subject (66.73% using MagTrack data). Conclusions: MagTrack showed results comparable to the commercial electromagnetic articulograph when using the same localized information, and adding raw magnetic signals would improve MagTrack's performance. Our preliminary testing demonstrated MagTrack's potential as a lightweight wearable device for silent speech interfaces. This work also lays the foundation for MagTrack's potential in other applications, including visual feedback-based speech therapy and second language learning.
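Phoneme error rate, the metric used for continuous recognition here, is conventionally the Levenshtein edit distance divided by the reference length; a minimal sketch (the example phone strings are made up):

```python
def edit_distance(ref, hyp):
    # Levenshtein distance over token sequences (substitutions,
    # insertions, and deletions all cost 1), single rolling row.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def phoneme_error_rate(ref_phones, hyp_phones):
    # PER = edit distance / reference length, usually reported in percent.
    return 100.0 * edit_distance(ref_phones, hyp_phones) / len(ref_phones)

ref = "HH AH L OW".split()
hyp = "HH AA L OW W".split()
print(round(phoneme_error_rate(ref, hyp), 1))  # 1 substitution + 1 insertion
```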
Affiliation(s)
- Beiming Cao
- Department of Electrical and Computer Engineering, The University of Texas at Austin
- Department of Speech, Language, and Hearing Sciences, The University of Texas at Austin
- Shravan Ravi
- Department of Computer Science, The University of Texas at Austin
- Nordine Sebkhi
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta
- Arpan Bhavsar
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta
- Omer T. Inan
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta
- Wen Xu
- Division of Computer Science, Texas Woman's University, Denton
- Jun Wang
- Department of Speech, Language, and Hearing Sciences, The University of Texas at Austin
- Department of Neurology, The University of Texas at Austin

6

Csapó TG, Gosztolya G, Tóth L, Shandiz AH, Markó A. Optimizing the ultrasound tongue image representation for residual network-based articulatory-to-acoustic mapping. Sensors (Basel) 2022; 22:8601. PMID: 36433196. PMCID: PMC9696288. DOI: 10.3390/s22228601.
Abstract
Within speech processing, articulatory-to-acoustic mapping (AAM) methods can use ultrasound tongue imaging (UTI) as input. (Micro)convex transducers, which provide a wedge-shaped visual image, are most commonly used. However, this image is optimized for visual inspection by the human eye, and the signal is often post-processed by the equipment. With newer ultrasound equipment, it is now possible to access the raw scanline data (i.e., the ultrasound echo return) without any internal post-processing. In this study, we compared the raw scanline representation with the wedge-shaped processed UTI as input to the residual network applied for AAM, and we also investigated the optimal input image size. We found no significant difference between the performance attained using the raw data and the wedge-shaped image extrapolated from it. The optimal pixel size was 64 × 43 for the raw scanline input and 64 × 64 when transformed to a wedge. It is therefore unnecessary to use the full original 64 × 842 pixel raw scanline; a smaller image is sufficient. This allows smaller networks to be built and will benefit the development of session- and speaker-independent methods for practical applications. The target application of AAM systems is a "silent speech interface", which could aid communication for the speech-impaired, in military applications, or in extremely noisy conditions.
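How a wedge image is extrapolated from raw scanlines is not detailed in the abstract; the sketch below shows one plausible nearest-neighbour polar-to-Cartesian conversion (the field of view, apex placement, and output size are assumptions, not the equipment's actual geometry):

```python
import numpy as np

def scanlines_to_wedge(raw, fov_deg=90.0, out_size=64):
    """Nearest-neighbour conversion of raw scanline data (lines x samples)
    to a wedge-shaped image, roughly as a (micro)convex probe displays it.
    The probe apex is placed at the top-centre of the output image."""
    n_lines, n_samples = raw.shape
    half_fov = np.deg2rad(fov_deg) / 2.0
    out = np.zeros((out_size, out_size), dtype=raw.dtype)
    ys, xs = np.mgrid[0:out_size, 0:out_size]
    x = xs - (out_size - 1) / 2.0        # horizontal offset from the apex
    y = ys.astype(float) + 1.0           # depth (shifted to avoid r == 0)
    r_norm = np.hypot(x, y) / np.hypot(x, y).max()
    theta = np.arctan2(x, y)             # angle from the vertical axis
    line_idx = np.round((theta / half_fov + 1) / 2 * (n_lines - 1)).astype(int)
    samp_idx = np.round(r_norm * (n_samples - 1)).astype(int)
    valid = np.abs(theta) <= half_fov    # pixels inside the wedge only
    out[valid] = raw[line_idx[valid], samp_idx[valid]]
    return out

raw = np.random.default_rng(1).random((64, 842))   # 64 scanlines, 842 echoes
wedge = scanlines_to_wedge(raw)
print(wedge.shape)  # (64, 64)
```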
Affiliation(s)
- Tamás Gábor Csapó
- Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, H-1117 Budapest, Hungary
- Gábor Gosztolya
- ELRN-SZTE Research Group on Artificial Intelligence, H-6720 Szeged, Hungary
- László Tóth
- Institute of Informatics, University of Szeged, H-6720 Szeged, Hungary
- Alexandra Markó
- MTA-ELTE Lendület Lingual Articulation Research Group, H-1088 Budapest, Hungary

7

Wu J, Zhang Y, Xie L, Yan Y, Zhang X, Liu S, An X, Yin E, Ming D. A novel silent speech recognition approach based on parallel inception convolutional neural network and Mel frequency spectral coefficient. Front Neurorobot 2022; 16:971446. PMID: 36119717. PMCID: PMC9478652. DOI: 10.3389/fnbot.2022.971446.
Abstract
Silent speech recognition breaks the limitations of automatic speech recognition when acoustic signals cannot be produced or captured clearly, but it still has a long way to go before being ready for real-life applications. To address this issue, we propose a novel silent speech recognition framework based on surface electromyography (sEMG) signals. In our approach, a new deep learning architecture, the parallel inception convolutional neural network (PICNN), is proposed and implemented in our silent speech recognition system, with six inception modules processing the six channels of sEMG data separately and simultaneously. Meanwhile, Mel frequency spectral coefficients (MFSCs) are employed to extract speech-related sEMG features for the first time. We further design and generate a 100-class dataset containing daily-life assistance demands for elderly and disabled individuals. Experimental results obtained from 28 subjects confirm that our silent speech recognition method outperforms state-of-the-art machine learning algorithms and deep learning architectures, achieving a best recognition accuracy of 90.76%. With sEMG data collected from four new subjects, efficient subject-based transfer learning is conducted to further improve the cross-subject recognition ability of the proposed model. These promising results show that our sEMG-based silent speech recognition system can achieve high recognition accuracy and steady performance in practical applications.
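MFSCs are commonly computed as log mel-filterbank energies, i.e., MFCCs without the final DCT step; a generic sketch (the sampling rate and filter count are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def mfsc(frame, sr=1000, n_filters=26):
    # Log mel-filterbank energies of one windowed frame (no DCT,
    # unlike MFCCs). sr = 1000 Hz is an assumed sEMG sampling rate.
    power = np.abs(np.fft.rfft(frame)) ** 2
    fb = mel_filterbank(n_filters, len(frame), sr)
    return np.log(fb @ power + 1e-10)

frame = np.sin(2 * np.pi * 100 * np.arange(256) / 1000.0) * np.hamming(256)
print(mfsc(frame).shape)  # (26,)
```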
Affiliation(s)
- Jinghan Wu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin, China
- Yakun Zhang
- Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin, China
- Defense Innovation Institute, Academy of Military Sciences (AMS), Beijing, China
- Liang Xie
- Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin, China
- Defense Innovation Institute, Academy of Military Sciences (AMS), Beijing, China
- Ye Yan
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin, China
- Defense Innovation Institute, Academy of Military Sciences (AMS), Beijing, China
- Xu Zhang
- Department of Electronic Science and Technology, University of Science and Technology of China, Hefei, China
- Shuang Liu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Xingwei An
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Correspondence: Xingwei An
- Erwei Yin
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin, China
- Defense Innovation Institute, Academy of Military Sciences (AMS), Beijing, China
- Dong Ming
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China

8

Cao B, Wisler A, Wang J. Speaker adaptation on articulation and acoustics for articulation-to-speech synthesis. Sensors (Basel) 2022; 22:6056. PMID: 36015817. PMCID: PMC9416444. DOI: 10.3390/s22166056.
Abstract
Silent speech interfaces (SSIs) convert non-audio bio-signals, such as articulatory movement, to speech. This technology has the potential to recover the speech ability of individuals who have lost their voice but can still articulate (e.g., laryngectomees). Articulation-to-speech (ATS) synthesis is an SSI design with the advantages of easy implementation and low latency, and it is therefore becoming more popular. Current ATS studies focus on speaker-dependent (SD) models to avoid the large variations in articulatory patterns and acoustic features across speakers. However, these designs are limited by the small amount of data available from individual speakers. Speaker-adaptation designs that include multiple speakers' data can address this limitation, yet few prior studies have investigated their performance in ATS. In this paper, we investigated speaker adaptation on both the input articulation and the output acoustic signals (with and without direct inclusion of data from test speakers) using a publicly available electromagnetic articulography (EMA) dataset. We used Procrustes matching for articulation adaptation and voice conversion for voice adaptation. ATS model performance was measured objectively by mel-cepstral distortion (MCD); the synthetic speech samples were generated and are provided in the supplementary material. The results demonstrate the improvement brought by both Procrustes matching and voice conversion to speaker-independent ATS. With direct inclusion of target-speaker data in training, speaker-adaptive ATS achieved performance comparable to speaker-dependent ATS. To our knowledge, this is the first study to demonstrate that speaker-adaptive ATS can achieve performance not statistically different from that of speaker-dependent ATS.
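Procrustes matching, used here for articulation adaptation, aligns one speaker's sensor positions onto another's with a rigid transform; a minimal orthogonal-Procrustes sketch (rotation plus translation only; whether the authors also fit scaling is not stated in the abstract):

```python
import numpy as np

def procrustes_align(source, target):
    """Rigid alignment (rotation + translation, no scaling here) of one
    speaker's articulator positions onto another's via orthogonal Procrustes."""
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    a, b = source - mu_s, target - mu_t
    u, _, vt = np.linalg.svd(a.T @ b)
    rot = u @ vt
    if np.linalg.det(rot) < 0:    # guard against a reflection solution
        u[:, -1] *= -1
        rot = u @ vt
    return (source - mu_s) @ rot + mu_t

rng = np.random.default_rng(2)
target = rng.normal(size=(50, 2))            # e.g. tongue-sensor y/z positions
angle = np.deg2rad(30)
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
source = target @ R + np.array([1.0, -2.0])  # rotated + shifted copy
aligned = procrustes_align(source, target)
print(np.allclose(aligned, target, atol=1e-8))  # True
```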
Affiliation(s)
- Beiming Cao
- Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX 78712, USA
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, Austin, TX 78712, USA
- Alan Wisler
- Department of Mathematics and Statistics, Utah State University, Logan, UT 84322, USA
- Jun Wang
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, Austin, TX 78712, USA
- Department of Neurology, Dell Medical School, University of Texas at Austin, Austin, TX 78712, USA

9

Sang Y, Chen X. Human-computer interactive physical education teaching method based on speech recognition engine technology. Front Public Health 2022; 10:941083. PMID: 35923977. PMCID: PMC9339716. DOI: 10.3389/fpubh.2022.941083.
Abstract
With the advent of the era of artificial intelligence, speech recognition engine technology has had a profound impact on social production, daily life, education, and other fields. Voice interaction is the most basic and practical type of human-computer interaction. To build an intelligent, automated physical education teaching mode, this paper combines human-computer interaction based on speech recognition technology with physical education teaching. Students provide input through voice signals, and the system receives, analyzes, and recognizes the signals and feeds information back to the students in multiple forms. To process the external speech signal, the system extracts speech information using the Mel cepstral coefficient algorithm. Comparing the speech recognition rate and anti-noise rate of a hidden Markov model, a probabilistic statistical neural network, and a hybrid model (a combination of the two), the speech recognition engine adopts the hybrid model, whose recognition rate is 98.3% and whose average anti-noise rate reaches 85%. Compared with traditional teaching, the human-computer interactive physical education method is superior in the acquisition of physical knowledge and skills, satisfaction with physical education courses, and active learning ability. It effectively resolves the drawbacks of traditional physical education and uses human-computer interaction technology rationally: without departing from the principles of physical education, it diversifies instruction, improves teaching quality, and strengthens students' individual development and autonomous learning ability. The combination of human-computer interaction and physical education based on recognition engine technology is therefore a trend in today's physical education development.
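The HMM recognizer compared in this abstract scores observation sequences with the forward algorithm; a toy discrete-observation sketch (the model parameters are made up, not the paper's acoustic models):

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Log-likelihood of a discrete observation sequence under an HMM
    with initial probs pi, transition matrix A (rows: from-state), and
    emission matrix B, using the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        loglik += np.log(s)
        alpha /= s        # rescale to avoid underflow on long sequences
    return loglik

# Toy 2-state, 3-symbol model; a word recognizer would score each
# candidate word's HMM and pick the highest likelihood.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
obs = [0, 1, 2, 1]
print(round(forward_log_likelihood(pi, A, B, obs), 4))
```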
Affiliation(s)
- Yunpeng Sang
- Sport Department, Changshu Institute of Technology, Suzhou, China
- Xingquan Chen
- Physical Education College, Sichuan University, Chengdu, China
- Correspondence: Xingquan Chen

10

Wagner C, Schaffer P, Amini Digehsara P, Bärhold M, Plettemeier D, Birkholz P. Silent speech command word recognition using stepped frequency continuous wave radar. Sci Rep 2022; 12:4192. PMID: 35273225. PMCID: PMC8913675. DOI: 10.1038/s41598-022-07842-9.
Abstract
Recovering speech in the absence of the acoustic speech signal itself, i.e., silent speech, holds great potential for restoring or enhancing oral communication in those who have lost it. Radar is a relatively unexplored silent speech sensing modality, even though it has the advantage of being fully non-invasive. We therefore built custom stepped-frequency continuous-wave radar hardware to measure the changes in the transmission spectra during speech between three antennas, located on both cheeks and the chin, with a measurement update rate of 100 Hz. We then recorded a command-word corpus of 40 phonetically balanced, two-syllable German words and the German digits zero to nine for two individual speakers, and evaluated both the speaker-dependent multi-session and inter-session recognition accuracies on this 50-word corpus using a bidirectional long short-term memory network. We obtained recognition accuracies of 99.17% and 88.87% for the speaker-dependent multi-session and inter-session cases, respectively. These results show that the transmission spectra are very well suited to discriminating individual words from one another, even across sessions, which is one of the key challenges for fully non-invasive silent speech interfaces.
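The bidirectional LSTM used for classification concatenates a forward and a backward pass over the feature sequence; a minimal NumPy sketch of that forward computation with random, untrained weights (the dimensions are illustrative, not the paper's network):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x_seq, W, U, b):
    """Forward pass of one LSTM layer. W: (4h, d), U: (4h, h), b: (4h,).
    Gate order in the stacked weights: input, forget, cell, output."""
    hidden = U.shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    outputs = []
    for x in x_seq:
        z = W @ x + U @ h + b
        i = sigmoid(z[:hidden])
        f = sigmoid(z[hidden:2 * hidden])
        g = np.tanh(z[2 * hidden:3 * hidden])
        o = sigmoid(z[3 * hidden:])
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.array(outputs)

def blstm_forward(x_seq, params_fwd, params_bwd):
    # Run the sequence forwards and backwards, concatenate per-step states.
    fwd = lstm_forward(x_seq, *params_fwd)
    bwd = lstm_forward(x_seq[::-1], *params_bwd)[::-1]
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(3)
d, hdim, T = 6, 8, 100   # e.g. 6 spectral features per 100 Hz frame
mk = lambda: (rng.normal(size=(4 * hdim, d)) * 0.1,
              rng.normal(size=(4 * hdim, hdim)) * 0.1,
              np.zeros(4 * hdim))
x = rng.normal(size=(T, d))
y = blstm_forward(x, mk(), mk())
print(y.shape)  # (100, 16)
```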
Affiliation(s)
- Christoph Wagner
- Institute of Acoustics and Speech Communication, Chair for Speech Technology and Cognitive Systems, Technische Universität Dresden, 01069 Dresden, Germany
- Petr Schaffer
- Institute of Communication Technology, Chair of Radio Frequency and Photonics Engineering, Technische Universität Dresden, 01069 Dresden, Germany
- Pouriya Amini Digehsara
- Institute of Acoustics and Speech Communication, Chair for Speech Technology and Cognitive Systems, Technische Universität Dresden, 01069 Dresden, Germany
- Michael Bärhold
- Institute of Communication Technology, Chair of Radio Frequency and Photonics Engineering, Technische Universität Dresden, 01069 Dresden, Germany
- Dirk Plettemeier
- Institute of Communication Technology, Chair of Radio Frequency and Photonics Engineering, Technische Universität Dresden, 01069 Dresden, Germany
- Peter Birkholz
- Institute of Acoustics and Speech Communication, Chair for Speech Technology and Cognitive Systems, Technische Universität Dresden, 01069 Dresden, Germany

11
|
Zhou Z, Tam VWL, Lam EY. A Portable Sign Language Collection and Translation Platform with Smart Watches Using a BLSTM-Based Multi-Feature Framework. MICROMACHINES 2022; 13:mi13020333. [PMID: 35208457 PMCID: PMC8877205 DOI: 10.3390/mi13020333] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/12/2022] [Revised: 02/12/2022] [Accepted: 02/17/2022] [Indexed: 11/16/2022]
Abstract
Continuous sign language recognition (CSLR) using different types of sensors to precisely recognize sign language in real time is a very challenging but important research direction in sensor technology. Many previous methods are vision-based, with computationally intensive algorithms to process a large number of image/video frames possibly contaminated with noise, which can result in a large translation delay. On the other hand, gesture-based CSLR relying on hand movement data captured on wearable devices may require fewer computation resources and less translation time, making it more efficient for providing instant translation during real-world communication. However, the insufficient amount of information provided by the wearable sensors often affects the overall performance of such systems. To tackle this issue, we propose a bidirectional long short-term memory (BLSTM)-based multi-feature framework for conducting gesture-based CSLR precisely with two smart watches. In this framework, multiple sets of input features are extracted from the collected gesture data to provide a diverse spectrum of valuable information to the underlying BLSTM model for CSLR. To demonstrate the effectiveness of the proposed framework, we test it on an extremely challenging and radically new dataset of Hong Kong sign language (HKSL), in which hand movement data are collected from 6 individual signers for 50 different sentences. The experimental results reveal that the proposed framework attains a much lower word error rate compared with other existing machine learning or deep learning approaches for gesture-based CSLR. Based on this framework, we further propose a portable sign language collection and translation platform, which can simplify the procedure of collecting gesture-based sign language datasets and recognize sign language from smart watch data in real time, in order to break the communication barrier for sign language users.
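The multi-feature idea in this abstract is simply to derive several complementary feature sets from each window of wearable-sensor data before feeding the sequence model. A minimal sketch, with illustrative feature choices that are not the paper's exact sets:

```python
import numpy as np

def multi_feature_sets(imu):
    """Extract several complementary feature sets from a (T, C) IMU window.
    The specific features below are illustrative placeholders."""
    stats = np.concatenate([imu.mean(0), imu.std(0),
                            imu.min(0), imu.max(0)])   # per-channel summary stats
    deltas = np.diff(imu, axis=0).mean(0)              # mean first difference (trend)
    energy = (imu ** 2).mean(0)                        # per-channel signal energy
    return {"stats": stats, "deltas": deltas, "energy": energy}

# Hypothetical 100-sample window from a 6-axis smartwatch IMU.
window = np.random.default_rng(1).standard_normal((100, 6))
feats = multi_feature_sets(window)
```

Each feature set would then be concatenated (or fed as a separate stream) into the BLSTM per time window.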
12
Wang N, Xu H, Xu F, Cheng L. The algorithmic composition for music copyright protection under deep learning and blockchain. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107763] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
13
Dam Deformation Interpretation and Prediction Based on a Long Short-Term Memory Model Coupled with an Attention Mechanism. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11146625] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
An accurate dam deformation prediction model is vital to a dam safety monitoring system, as it helps assess and manage dam risks. Most traditional dam deformation prediction algorithms ignore the interpretation and evaluation of variables and lack qualitative measures. This paper proposes a data processing framework that uses a long short-term memory (LSTM) model coupled with an attention mechanism to predict the deformation response of a dam structure. First, the random forest (RF) model is introduced to assess the relative importance of impact factors and screen input variables. Second, the density-based spatial clustering of applications with noise (DBSCAN) method is used to identify and filter equipment-based abnormal values to reduce the random error in the measurements. Finally, the coupled model is used to focus on important factors in the time dimension in order to obtain more accurate nonlinear prediction results. The results of the case study show that, of all tested methods, the proposed coupled method performed best. In addition, it was found that temperature and water level both have significant impacts on dam deformation and can serve as reliable metrics for dam management.
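The attention coupling described here reduces, in its simplest form, to weighting the LSTM's hidden states across time and summing them into a context vector. A minimal numpy sketch of that step, with an illustrative dot-product scoring function (the paper's exact scoring form is not specified in the abstract):

```python
import numpy as np

def temporal_attention(H, w):
    """Attention over time steps: alpha_t = softmax(w . h_t),
    context = sum_t alpha_t * h_t.  H: (T, d) hidden states, w: (d,) query."""
    scores = H @ w
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                 # attention weights sum to 1
    context = alpha @ H                  # (d,) weighted summary for prediction
    return context, alpha

rng = np.random.default_rng(2)
H = rng.standard_normal((30, 8))         # 30 monitoring time steps, 8 features
context, alpha = temporal_attention(H, rng.standard_normal(8))
```

The context vector then feeds the final regression layer, letting the model emphasize the time steps most relevant to the deformation response.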
14
Vojtech JM, Chan MD, Shiwani B, Roy SH, Heaton JT, Meltzner GS, Contessa P, De Luca G, Patel R, Kline JC. Surface Electromyography-Based Recognition, Synthesis, and Perception of Prosodic Subvocal Speech. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:2134-2153. [PMID: 33979177 PMCID: PMC8740708 DOI: 10.1044/2021_jslhr-20-00257] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/12/2023]
Abstract
Purpose This study aimed to evaluate a novel communication system designed to translate surface electromyographic (sEMG) signals from articulatory muscles into speech using a personalized, digital voice. The system was evaluated for word recognition, prosodic classification, and listener perception of synthesized speech. Method sEMG signals were recorded from the face and neck as speakers with (n = 4) and without (n = 4) laryngectomy subvocally recited (silently mouthed) a speech corpus comprising 750 phrases (150 phrases with variable phrase-level stress). Corpus tokens were then translated into speech via personalized voice synthesis (n = 8 synthetic voices) and compared against phrases produced by each speaker when using their typical mode of communication (n = 4 natural voices, n = 4 electrolaryngeal [EL] voices). Naïve listeners (n = 12) evaluated synthetic, natural, and EL speech for acceptability and intelligibility in a visual sort-and-rate task, as well as phrasal stress discriminability via a classification mechanism. Results Recorded sEMG signals were processed to translate sEMG muscle activity into lexical content and categorize variations in phrase-level stress, achieving a mean accuracy of 96.3% (SD = 3.10%) and 91.2% (SD = 4.46%), respectively. Synthetic speech was significantly higher in acceptability and intelligibility than EL speech, also leading to greater phrasal stress classification accuracy, whereas natural speech was rated as the most acceptable and intelligible, with the greatest phrasal stress classification accuracy. Conclusion This proof-of-concept study establishes the feasibility of using subvocal sEMG-based alternative communication not only for lexical recognition but also for prosodic communication in healthy individuals, as well as those living with vocal impairments and residual articulatory function. Supplemental Material https://doi.org/10.23641/asha.14558481.
Affiliation(s)
- James T. Heaton
- Massachusetts General Hospital Department of Surgery, Boston
- Rupal Patel
- VocaliD, Inc., Belmont, MA
- Northeastern University, Boston, MA

15
Zhang R, Guo Z, Meng Y, Wang S, Li S, Niu R, Wang Y, Guo Q, Li Y. Comparison of ARIMA and LSTM in Forecasting the Incidence of HFMD Combined and Uncombined with Exogenous Meteorological Variables in Ningbo, China. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph18116174. [PMID: 34200378 PMCID: PMC8201362 DOI: 10.3390/ijerph18116174] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/16/2021] [Revised: 05/26/2021] [Accepted: 06/03/2021] [Indexed: 11/30/2022]
Abstract
Background: This study intends to identify the best model for predicting the incidence of hand, foot and mouth disease (HFMD) in Ningbo by comparing Autoregressive Integrated Moving Average (ARIMA) and Long Short-Term Memory Neural Network (LSTM) models combined and uncombined with exogenous meteorological variables. Methods: The data of daily HFMD incidence in Ningbo from January 2014 to November 2017 were set as the training set, and the data of December 2017 were set as the test set. ARIMA and LSTM models combined and uncombined with exogenous meteorological variables were adopted to fit the daily incidence of HFMD by using the data of the training set. The forecasting performances of the four fitted models were verified by using the data of the test set. Root mean square error (RMSE) was selected as the main measure to evaluate the performance of the models. Results: The RMSE for multivariate LSTM, univariate LSTM, ARIMA and ARIMAX (Autoregressive Integrated Moving Average Model with Exogenous Input Variables) was 10.78, 11.20, 12.43 and 14.73, respectively. The LSTM model with exogenous meteorological variables has the best performance among the four models, and meteorological variables can increase the prediction accuracy of the LSTM model. For the ARIMA model, exogenous meteorological variables did not increase the prediction accuracy but became an interference factor of the model. Conclusions: Multivariate LSTM is the best among the four models to fit the daily incidence of HFMD in Ningbo. It can provide a scientific method to build an HFMD early warning system, and the methodology can also be applied to other communicable diseases.
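The model comparison here rests entirely on RMSE over the held-out December data. For reference, the metric and the selection step look like this (the forecast numbers below are toy values, not the study's data):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, the study's model-selection metric."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy hold-out series: two hypothetical forecasts of daily HFMD counts.
observed   = [12, 15, 11, 9, 14]
forecast_a = [11, 14, 12, 10, 13]   # e.g. a multivariate LSTM
forecast_b = [15, 10, 15, 6, 18]    # e.g. a univariate ARIMA
best = min(("A", rmse(observed, forecast_a)),
           ("B", rmse(observed, forecast_b)),
           key=lambda kv: kv[1])    # lower RMSE wins
```

Here forecast A is off by exactly 1 each day, so its RMSE is 1.0 and it is selected.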
Affiliation(s)
- Rui Zhang
- Chinese Center for Disease Control and Prevention, Beijing 102206, China
- Zhen Guo
- Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China
- Yujie Meng
- Chinese Center for Disease Control and Prevention, Beijing 102206, China
- Songwang Wang
- Chinese Center for Disease Control and Prevention, Beijing 102206, China
- Shaoqiong Li
- Chinese Center for Disease Control and Prevention, Beijing 102206, China
- Ran Niu
- National Institute for Nutrition and Health, Chinese Center for Disease Control and Prevention, Beijing 100050, China
- Yu Wang
- National Institute of Environmental Health, Chinese Center for Disease Control and Prevention, Beijing 100021, China
- Qing Guo
- Chinese Center for Disease Control and Prevention, Beijing 102206, China
- Correspondence: (Q.G.); (Y.L.); Tel.: +86-10-5890-0410 (Q.G.); Fax: +86-10-5890-0445 (Q.G.)
- Yonghong Li
- National Institute of Environmental Health, Chinese Center for Disease Control and Prevention, Beijing 100021, China
- Correspondence: (Q.G.); (Y.L.); Tel.: +86-10-5890-0410 (Q.G.); Fax: +86-10-5890-0445 (Q.G.)

16
Sebkhi N, Bhavsar A, Anderson DV, Wang J, Inan OT. Inertial Measurements for Tongue Motion Tracking Based on Magnetic Localization with Orientation Compensation. IEEE SENSORS JOURNAL 2021; 21:7964-7971. [PMID: 33746627 PMCID: PMC7978385 DOI: 10.1109/jsen.2020.3046469] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/23/2023]
Abstract
Permanent magnet localization (PML) is designed for applications requiring non-line-of-sight motion tracking with millimetric accuracy. Current PML-based tongue tracking is not only impractical for daily use due to the many sensors placed around the mouth, but also requires a large training set of tracer motion. Our method was designed to overcome these shortcomings by generating a local magnetic field and removing the need for the localization to be trained with tracer rotations. An inertial measurement unit (IMU) is used as a tracer that moves in a local magnetic field generated by a magnet strip. The magnetic strength can be optimized to enable the strip to be placed further away from the tracer, thus hidden from view. The tracer is small (6×6×0.8 mm³) to reduce hindrance to natural tongue movements, and the strip is designed to be worn as a neckband. The IMU's magnetometer measures the local magnetic field, which is compensated for the tracer's orientation by using the IMU's accelerometer and gyroscope. The orientation-compensated magnetic measurements are then fed into a localization algorithm that estimates the tracer's 3D position. The objective of this study is to evaluate the tracking accuracy of our method. In an 8×8×5 cm³ volume, positional errors of 1.6 mm (median) and 2.4 mm (third quartile, Q3) were achieved on a tracer being rotated ±50° along both pitch and roll. These results indicate this technology is promising for tongue tracking applications.
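The orientation-compensation step amounts to estimating the tracer's tilt (pitch/roll from the accelerometer and gyroscope) and rotating the magnetometer reading back into a tilt-free frame before localization. A minimal sketch, with an illustrative rotation convention and made-up field values:

```python
import numpy as np

def rot_xy(pitch, roll):
    """Rotation applying roll about x then pitch about y (angles in rad).
    The axis convention here is illustrative, not the paper's."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])   # roll
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])   # pitch
    return Ry @ Rx

def compensate(b_sensor, pitch, roll):
    """Undo the tracer's tilt: rotate the magnetometer reading back to the
    reference frame (inverse of a rotation matrix is its transpose)."""
    return rot_xy(pitch, roll).T @ b_sensor

b_ref = np.array([20.0, -5.0, 43.0])          # local field in reference frame (uT)
pitch, roll = np.deg2rad(40.0), np.deg2rad(-35.0)
b_meas = rot_xy(pitch, roll) @ b_ref          # what the tilted tracer measures
b_rec = compensate(b_meas, pitch, roll)       # recovered tilt-free measurement
```

With the tilt known, the compensated reading matches the untilted field, which is what lets the localization model skip training over tracer rotations.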
Affiliation(s)
- Nordine Sebkhi
- School of Electrical & Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Arpan Bhavsar
- School of Electrical & Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- David V Anderson
- School of Electrical & Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Jun Wang
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, TX, USA
- Omer T Inan
- School of Electrical & Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA

17
Lee W, Seong JJ, Ozlu B, Shim BS, Marakhimov A, Lee S. Biosignal Sensors and Deep Learning-Based Speech Recognition: A Review. SENSORS (BASEL, SWITZERLAND) 2021; 21:1399. [PMID: 33671282 PMCID: PMC7922488 DOI: 10.3390/s21041399] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/28/2020] [Revised: 02/01/2021] [Accepted: 02/12/2021] [Indexed: 11/16/2022]
Abstract
Voice is one of the essential mechanisms for communicating and expressing one's intentions as a human being. There are several causes of voice inability, including disease, accident, vocal abuse, medical surgery, ageing, and environmental pollution, and the risk of voice loss continues to increase. Novel approaches to speech recognition and production are needed because voice loss seriously undermines quality of life and sometimes leads to isolation from society. In this review, we survey mouth interface technologies, which are mouth-mounted devices for speech recognition, production, and volitional control, and the corresponding research to develop artificial mouth technologies based on various sensors, including electromyography (EMG), electroencephalography (EEG), electropalatography (EPG), electromagnetic articulography (EMA), permanent magnet articulography (PMA), gyros, images and 3-axial magnetic sensors, especially with deep learning techniques. We examine in particular the deep learning technologies related to voice recognition, including visual speech recognition and silent speech interfaces, analyze their workflows, and systematize them into a taxonomy. Finally, we discuss methods to solve the communication problems of people with speaking disabilities and future research with respect to deep learning components.
Affiliation(s)
- Wookey Lee
- Biomedical Science and Engineering & Dept. of Industrial Security Governance & IE, Inha University, 100 Inharo, Incheon 22212, Korea
- Jessica Jiwon Seong
- Department of Industrial Security Governance, Inha University, 100 Inharo, Incheon 22212, Korea
- Busra Ozlu
- Biomedical Science and Engineering & Department of Chemical Engineering, Inha University, 100 Inharo, Incheon 22212, Korea
- Bong Sup Shim
- Biomedical Science and Engineering & Department of Chemical Engineering, Inha University, 100 Inharo, Incheon 22212, Korea
- Suan Lee
- School of Computer Science, Semyung University, Jecheon 27136, Korea

18
Automated Atrial Fibrillation Detection using a Hybrid CNN-LSTM Network on Imbalanced ECG Datasets. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2020.102194] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
19
Tang Z, Chai X, Wang Y, Cao S. Gene Regulatory Network Construction Based on a Particle Swarm Optimization of a Long Short-term Memory Network. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191023115224] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Background:
The Gene Regulatory Network (GRN) is a model for studying the function and behavior of genes by treating the genome as a whole, which can reveal the gene expression mechanism. However, due to the dynamics, nonlinearity, and complexity of gene expression data, it is a challenging task to construct a GRN precisely. In the circulating cooling water system, the Slime-Forming Bacteria (SFB) is one of the bacteria that helps to form dirt. In order to explore the microbial fouling mechanism of SFB, constructing a GRN for the fouling-forming genes of SFB is significant.
Objective:
Propose an effective GRN construction method and construct a GRN for the fouling-forming genes of SFB.
Methods:
In this paper, a combination method of Long Short-Term Memory Network (LSTM) and Mean Impact Value (MIV) was applied for GRN reconstruction. Firstly, LSTM was employed to establish a gene expression prediction model. To improve the performance of LSTM, a Particle Swarm Optimization (PSO) was introduced to optimize the weight and learning rate. Then, the MIV was used to infer the regulation among genes. In view of the fouling-forming problem of SFB, we have designed electromagnetic field experiments and transcriptome sequencing experiments to locate the fouling-forming genes and obtain gene expression data.
Results:
In order to test the proposed approach, the proposed method was applied to three datasets: a simulated dataset and two real biology datasets. By comparing with other methods, the experimental results indicate that the proposed method has higher modeling accuracy and it can be used to effectively construct a GRN. At last, a GRN for fouling-forming genes of SFB was constructed using the proposed approach.
Conclusion:
The experiments indicated that the proposed approach can reconstruct a GRN precisely, and compared with other approaches, the proposed approach performs better in extracting the regulations among genes.
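The PSO step in the Methods amounts to a population of candidate settings (here, weight and learning rate) moving toward the best-found minimum of a validation loss. A minimal global-best PSO sketch; the quadratic loss is a toy surrogate for the LSTM's validation error, and all coefficients are conventional defaults rather than the paper's values:

```python
import numpy as np

def pso(loss, dim, n=20, iters=60, lo=-5.0, hi=5.0, seed=3):
    """Minimal global-best particle swarm optimizer."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n, dim))             # particle positions
    v = np.zeros((n, dim))                        # particle velocities
    pbest = x.copy()                              # each particle's best position
    pbest_f = np.array([loss(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()            # swarm's best position
    for _ in range(iters):
        r1, r2 = rng.random((n, dim)), rng.random((n, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([loss(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, float(pbest_f.min())

# Toy surrogate loss with optimum at (1, 2), standing in for validation error
# as a function of (initial weight scale, learning rate).
best_x, best_f = pso(lambda p: (p[0] - 1) ** 2 + (p[1] - 2) ** 2, dim=2)
```

In the paper's setting each loss evaluation would be an LSTM training/validation run, so the swarm size and iteration count trade search quality against compute.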
Affiliation(s)
- Zhenhao Tang
- School of Automation Engineering, Northeast Electric Power University, Jilin, China
- Xiangying Chai
- School of Automation Engineering, Northeast Electric Power University, Jilin, China
- Yu Wang
- School of Automation Engineering, Northeast Electric Power University, Jilin, China
- Shengxian Cao
- School of Automation Engineering, Northeast Electric Power University, Jilin, China

20
Kolokas N, Drosou A, Tzovaras D. Text synthesis from keywords: a comparison of recurrent-neural-network-based architectures and hybrid approaches. Neural Comput Appl 2020. [DOI: 10.1007/s00521-019-04435-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
21
Bozec A, Culié D, Poissonnet G, Dassonville O. Current Role of Total Laryngectomy in the Era of Organ Preservation. Cancers (Basel) 2020; 12:cancers12030584. [PMID: 32138168 PMCID: PMC7139381 DOI: 10.3390/cancers12030584] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2020] [Revised: 02/26/2020] [Accepted: 02/27/2020] [Indexed: 01/02/2023] Open
Abstract
In this article, we aimed to discuss the role of total laryngectomy (TL) in the management of patients with larynx cancer (LC) in the era of organ preservation. Before the 1990s, TL followed by radiotherapy (RT) was the standard treatment for patients with locally advanced LC. Over the last 30 years, various types of larynx preservation (LP) programs associating induction or concurrent chemotherapy (CT) with RT have been developed, with the aim of treating locally advanced LC patients while preserving the larynx and its functions. Overall, more than two-thirds of patients included in an LP program will not require TL and will preserve a functional larynx. However, despite these advances, the larynx is the only tumor site in the upper aero-digestive tract for which prognosis has not improved during recent decades. Indeed, none of these LP protocols have shown any survival advantage compared to primary radical surgery, and it appears that certain LC patients do not benefit from an LP program. This is the case for patients with T4a LC (extra-laryngeal tumor extension through the thyroid cartilage) or with poor pretreatment laryngeal function, for whom primary TL is still the preferred therapeutic option. Moreover, TL is the standard salvage therapy for patients with recurrent tumor after an LP protocol.
Affiliation(s)
- Alexandre Bozec
- Correspondence: Tel.: +0033-4-92-03-17-66; Fax: +0033-4-92-03-17-64

22
Oh SL, Ng EYK, Tan RS, Acharya UR. Automated beat-wise arrhythmia diagnosis using modified U-net on extended electrocardiographic recordings with heterogeneous arrhythmia types. Comput Biol Med 2018; 105:92-101. [PMID: 30599317 DOI: 10.1016/j.compbiomed.2018.12.012] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2018] [Revised: 12/18/2018] [Accepted: 12/18/2018] [Indexed: 01/10/2023]
Abstract
Abnormality of the cardiac conduction system can induce arrhythmia - abnormal heart rhythm - that can frequently lead to other cardiac diseases and complications, and is sometimes life-threatening. These conduction system perturbations can manifest as morphological changes on the surface electrocardiographic (ECG) signal. Assessment of these morphological changes can be challenging and time-consuming, as ECG signal features are often low in amplitude and subtle. The main aim of this study is to develop an automated computer-aided diagnostic (CAD) system that can expedite the process of arrhythmia diagnosis, as an aid to clinicians to provide appropriate and timely intervention to patients. We propose an autoencoder of ECG signals that can diagnose normal sinus beats, atrial premature beats (APB), premature ventricular contractions (PVC), left bundle branch block (LBBB) and right bundle branch block (RBBB). Apart from the first, the rest are morphological beat-to-beat elements that characterize and constitute complex arrhythmia. The novelty of this work lies in how we modified the U-net model to perform beat-wise analysis on heterogeneously segmented ECGs of variable lengths derived from the MIT-BIH arrhythmia database. The proposed system has demonstrated self-learning ability in generating class activation maps, and these generated maps faithfully reflect the cardiac conditions in each ECG cardiac cycle. It has attained a high classification accuracy of 97.32% in diagnosing cardiac conditions, and 99.3% for R peak detection using a ten-fold cross validation strategy. Our developed model can help physicians to screen ECG accurately, potentially resulting in timely intervention of patients with arrhythmia.
Affiliation(s)
- Shu Lih Oh
- Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore
- Eddie Y K Ng
- School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore
- Ru San Tan
- National Heart Centre Singapore, Singapore
- U Rajendra Acharya
- Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore; Department of Biomedical Engineering, School of Science and Technology, Singapore University of Social Sciences, Singapore; School of Medicine, Faculty of Health and Medical Sciences, Taylor's University, 47500 Subang Jaya, Malaysia

23
Automated diagnosis of arrhythmia using combination of CNN and LSTM techniques with variable length heart beats. Comput Biol Med 2018; 102:278-287. [PMID: 29903630 DOI: 10.1016/j.compbiomed.2018.06.002] [Citation(s) in RCA: 227] [Impact Index Per Article: 37.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2018] [Revised: 06/01/2018] [Accepted: 06/02/2018] [Indexed: 11/22/2022]
Abstract
Arrhythmia is a cardiac conduction disorder characterized by irregular heartbeats. Abnormalities in the conduction system can manifest in the electrocardiographic (ECG) signal. However, it can be challenging and time-consuming to visually assess the ECG signals due to the very low amplitudes. Implementing an automated system in the clinical setting can potentially help expedite diagnosis of arrhythmia and improve accuracy. In this paper, we propose an automated system using a combination of convolutional neural network (CNN) and long short-term memory (LSTM) for diagnosis of normal sinus rhythm, left bundle branch block (LBBB), right bundle branch block (RBBB), atrial premature beats (APB) and premature ventricular contraction (PVC) on ECG signals. The novelty of this work is that we used ECG segments of variable length from the MIT-BIH arrhythmia PhysioBank database. The proposed system demonstrated high classification performance in the handling of variable-length data, achieving an accuracy of 98.10%, sensitivity of 97.50% and specificity of 98.70% using a ten-fold cross validation strategy. Our proposed model can aid clinicians to detect common arrhythmias accurately on routine screening ECG.
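The handling of variable-length segments is the crux here: convolutional filters slide over however many samples a beat contains, and a global pooling step absorbs the remaining length difference before the recurrent/classification stage. A minimal numpy sketch of that front end, with illustrative kernel sizes and random weights rather than the paper's trained CNN:

```python
import numpy as np

def conv1d_valid(x, k):
    """'Valid' 1-D convolution (cross-correlation) of signal x with kernel k."""
    n = len(x) - len(k) + 1
    return np.array([x[i:i + len(k)] @ k for i in range(n)])

def beat_embedding(beat, kernels):
    """Fixed-size embedding of a variable-length ECG beat: convolve each
    kernel over the beat, rectify, then global-average-pool per channel
    (one simple way to absorb differing beat lengths)."""
    return np.array([np.maximum(conv1d_valid(beat, k), 0).mean()
                     for k in kernels])

rng = np.random.default_rng(4)
kernels = [rng.standard_normal(7) for _ in range(8)]       # 8 learned filters
emb_short = beat_embedding(rng.standard_normal(120), kernels)  # short beat
emb_long = beat_embedding(rng.standard_normal(300), kernels)   # longer beat
```

Both beats, despite different lengths, map to embeddings of identical size, which is what lets a downstream LSTM or dense classifier treat them uniformly.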