1
Yang J, Lü J, Qiu Z, Zhang M, Yan H. Risk prediction of pulse wave for hypertensive target organ damage based on frequency-domain feature map. Med Eng Phys 2024; 126:104161. [PMID: 38621841 DOI: 10.1016/j.medengphy.2024.104161]
Abstract
The application of deep learning to the classification of pulse waves in Traditional Chinese Medicine (TCM) related to hypertensive target organ damage (TOD) is hindered by challenges such as low classification accuracy and inadequate generalization performance. To address these challenges, we introduce a lightweight transfer learning model named MobileNetV2SCP. This model transforms time-domain pulse waves into 36-dimensional frequency-domain waveform feature maps and establishes a dedicated pre-training network based on these maps to enhance learning from small samples. To improve global feature correlation, we incorporate a novel fusion attention mechanism (SAS) into the inverted residual structure and use 3 × 3 convolutional layers and BatchNorm layers to mitigate overfitting. The proposed model is evaluated using cross-validation on 805 cases of pulse waves associated with hypertensive TOD. The assessment metrics, including Accuracy (92.74%), F1-score (91.47%), and Area Under the Curve (AUC) (97.12%), demonstrate superior classification accuracy and generalization performance compared with various state-of-the-art models. Furthermore, this study investigates the correlations between time-domain and frequency-domain features of pulse waves and their classification in hypertensive TOD, and analyzes key factors influencing pulse wave classification, providing valuable insights for the clinical diagnosis of TOD.
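The paper's exact 36-dimensional map construction is not reproduced here; as a rough, hypothetical illustration of moving a pulse wave from the time domain to a fixed-length frequency-domain representation (band-averaged FFT magnitudes, an assumption rather than the authors' method):

```python
import numpy as np

def pulse_wave_to_freq_features(pulse: np.ndarray, n_bins: int = 36) -> np.ndarray:
    """Average FFT magnitudes into n_bins bands to get a fixed-length
    frequency-domain feature vector (illustrative, not the paper's scheme)."""
    spectrum = np.abs(np.fft.rfft(pulse))      # one-sided magnitude spectrum
    bands = np.array_split(spectrum, n_bins)   # n_bins roughly equal bands
    return np.array([band.mean() for band in bands])

# A 2-D "feature map" for a CNN could then be formed by stacking the
# vectors from consecutive pulse periods, e.g. np.stack(list_of_vectors).
```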
Affiliation(s)
- Jingdong Yang
- Autonomous Robot Lab, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China.
- Jiangtao Lü
- Autonomous Robot Lab, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Zehao Qiu
- Autonomous Robot Lab, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Mengchu Zhang
- Shanghai Key Laboratory of Health Identification and Assessment, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Haixia Yan
- Shanghai Key Laboratory of Health Identification and Assessment, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China.
2
Kumar Y, Koul A, Kamini, Woźniak M, Shafi J, Ijaz MF. Automated detection and recognition system for chewable food items using advanced deep learning models. Sci Rep 2024; 14:6589. [PMID: 38504098 PMCID: PMC10951243 DOI: 10.1038/s41598-024-57077-z]
Abstract
Identifying and recognizing food on the basis of its eating sounds is a challenging task, and the ability to do so plays an important role in avoiding allergenic foods, supporting dietary preferences for people restricted to a particular diet, showcasing cultural significance, etc. The aim of this research paper is to design a novel methodology that identifies food items by analyzing their eating sounds using various deep learning models. To achieve this objective, a system is proposed that extracts meaningful features from food-eating sounds with the help of signal processing techniques and deep learning models, classifying them into their respective food classes. Initially, 1200 labeled audio files for 20 food items were collected and visualized to find relationships between the sound files of different food items. Then, techniques such as spectrograms, spectral rolloff, spectral bandwidth, and mel-frequency cepstral coefficients were used to clean the audio files and to capture the unique characteristics of different food items. In the next phase, deep learning models including GRU, LSTM, InceptionResNetV2, and a customized CNN were trained to learn spectral and temporal patterns in the audio signals. The models were also hybridized (Bidirectional LSTM + GRU, RNN + Bidirectional LSTM, and RNN + Bidirectional GRU) and their performance analyzed on the same labeled data, associating particular sound patterns with their corresponding food classes. During evaluation, the highest accuracy was obtained by GRU (99.28%), the highest precision and F1 score by Bidirectional LSTM + GRU (97.7% and 97.3%), and the highest recall by RNN + Bidirectional LSTM (97.45%). The results of this study demonstrate that deep learning models have the potential to precisely identify foods on the basis of their sound.
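A minimal sketch of this kind of feature-extraction stage using librosa (the sampling rate, MFCC count, and mean-over-frames summary are assumptions, not details from the paper):

```python
import librosa
import numpy as np

def eating_sound_features(path: str, sr: int = 22050, n_mfcc: int = 13) -> np.ndarray:
    """Summarize one eating-sound clip as a fixed-length feature vector."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)       # timbre
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)       # spectral shape
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)   # spectral spread
    # Collapse the time axis with per-feature means (one common convention).
    return np.concatenate([mfcc.mean(axis=1),
                           rolloff.mean(axis=1),
                           bandwidth.mean(axis=1)])
```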
Affiliation(s)
- Yogesh Kumar
- Department of CSE, School of Technology, Pandit Deendayal Energy University, Gandhinagar, Gujarat, India
- Apeksha Koul
- Department of Computer Science and Engineering, Punjabi University, Patiala, Punjab, India
- Kamini
- Southern Alberta Institute of Technology, Calgary, Alberta, Canada
- Marcin Woźniak
- Faculty of Applied Mathematics, Silesian University of Technology, Kaszubska 23, 44100, Gliwice, Poland.
- Jana Shafi
- Department of Computer Engineering and Information, College of Engineering in Wadi Al Dawasir, Prince Sattam Bin Abdulaziz University, 11991, Wadi Al Dawasir, Saudi Arabia
- Muhammad Fazal Ijaz
- School of IT and Engineering, Melbourne Institute of Technology, Melbourne, 3000, Australia.
3
Alzakari SA, Hassairi S, Ali Alhussan A, Ejbali R. A mobile Deep Sparse Wavelet autoencoder for Arabic acoustic unit modeling and recognition. Heliyon 2024; 10:e26583. [PMID: 38434048 PMCID: PMC10906401 DOI: 10.1016/j.heliyon.2024.e26583]
Abstract
In this manuscript, we introduce a novel methodology for modeling acoustic units within a mobile architecture, employing a synergistic combination of several techniques, including deep learning, sparse coding, and wavelet networks. The core concept involves constructing a Deep Sparse Wavelet Network (DSWN) through the integration of stacked wavelet autoencoders. The DSWN is designed to classify a specific class and discern it from the other classes within a dataset of acoustic units. Mel-frequency cepstral coefficients (MFCC) and perceptual linear predictive (PLP) features are utilized for encoding speech units. This approach is tailored to leverage the computational capabilities of mobile devices by establishing deep networks with minimal connections, thereby reducing computational overhead. The experimental findings demonstrate the efficacy of our system when applied to a segmented corpus of Arabic words. Notwithstanding these promising results, our methodology has limitations. One concerns the use of a specific dataset of Arabic words: the generalizability of the DSWN to other contexts requires further investigation. We will also evaluate the impact of speech variations, such as accents, on the performance of our model for a more nuanced understanding.
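As a loose sketch of one building block, here is a plain sparse autoencoder in Keras with an L1 activity penalty; the paper's wavelet activations are not standard Keras components, so this stands in for, rather than reproduces, the DSWN layer:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def sparse_autoencoder(input_dim: int, code_dim: int) -> tf.keras.Model:
    """One sparse autoencoder; a stack is built by training each layer on
    the codes of the previous one (illustrative, not the exact DSWN)."""
    inp = layers.Input(shape=(input_dim,))
    code = layers.Dense(code_dim, activation="relu",
                        activity_regularizer=regularizers.l1(1e-4))(inp)
    out = layers.Dense(input_dim)(code)     # linear reconstruction
    model = tf.keras.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

# e.g. ae1 = sparse_autoencoder(39, 128)  # 39 = 13 MFCC + deltas (assumed)
```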
Affiliation(s)
- Sarah A. Alzakari
- Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia
- Salima Hassairi
- Research Team in Intelligent Machines, National School of Engineers of Gabes, B.P. W 6072, Gabes, Tunisia
- Amel Ali Alhussan
- Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia
- Ridha Ejbali
- Research Team in Intelligent Machines, National School of Engineers of Gabes, B.P. W 6072, Gabes, Tunisia
4
Pei W, Li Y, Wen P, Yang F, Ji X. An automatic method using MFCC features for sleep stage classification. Brain Inform 2024; 11:6. [PMID: 38340211 DOI: 10.1186/s40708-024-00219-w]
Abstract
Sleep stage classification is a necessary step in diagnosing sleep disorders. Generally, experts use traditional methods based on every 30 seconds (s) of biological signals, such as electrooculograms (EOGs), electrocardiograms (ECGs), electromyograms (EMGs), and electroencephalograms (EEGs), to classify sleep stages. Recently, various state-of-the-art approaches based on deep learning models have demonstrated efficient and accurate outcomes in sleep stage classification. In this paper, a novel deep convolutional neural network (CNN) combined with a long short-term memory (LSTM) model is proposed for sleep scoring tasks. A key frequency-domain feature, the Mel-frequency cepstral coefficient (MFCC), is extracted from EEG and EMG signals. The proposed method can learn frequency-domain features from different bio-signal channels. It first extracts the MFCC features from multi-channel signals and inputs them to several convolutional layers and an LSTM layer. The learned representations are then fed to a fully connected layer and a softmax classifier for sleep stage classification. Experiments are conducted on two widely used sleep datasets, the Sleep Heart Health Study (SHHS) and St. Vincent's University Hospital/University College Dublin Sleep Apnoea (UCDDB), to test the effectiveness of the method. The results indicate that the model performs well in sleep stage classification using 2-dimensional (2D) MFCC features. One advantage of this feature is that the 2D input stream retains information about each sleep stage while taking less time to process than a one-dimensional stream. Another advantage is that it reduces the need for very deep layers, which helps the performance of the model: the seven-layer model structure takes around 400 s to train and test 100 subjects in the SHHS1 dataset. Its best accuracy and Cohen's kappa are 82.35% and 0.75 for the SHHS dataset, and 73.07% and 0.63 for the UCDDB dataset, respectively.
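A hedged Keras sketch of a CNN-plus-LSTM over 2D MFCC inputs; the layer sizes and frame counts below are illustrative assumptions, not the paper's exact seven-layer design:

```python
import tensorflow as tf
from tensorflow.keras import layers

def cnn_lstm_sleep_model(n_frames: int = 60, n_mfcc: int = 40,
                         n_channels: int = 2, n_stages: int = 5) -> tf.keras.Model:
    """CNN + LSTM over a (time, MFCC, channel) tensor per 30-s epoch."""
    inp = layers.Input(shape=(n_frames, n_mfcc, n_channels))
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    # Flatten the frequency and channel axes so the LSTM sees a time sequence.
    x = layers.Reshape((n_frames // 4, (n_mfcc // 4) * 64))(x)
    x = layers.LSTM(64)(x)
    out = layers.Dense(n_stages, activation="softmax")(x)
    model = tf.keras.Model(inp, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```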
Affiliation(s)
- Wei Pei
- School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD, 4350, Australia.
- Yan Li
- School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD, 4350, Australia
- Peng Wen
- School of Engineering, University of Southern Queensland, Toowoomba, QLD, 4350, Australia
- Fuwen Yang
- School of Engineering and Built Environment, Griffith University, Gold Coast, QLD, 4222, Australia
- Xiaopeng Ji
- School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD, 4350, Australia
5
Wang Q, Yang H, Pan J, Tian Y, Guo T, Wang W. [Heart sound classification algorithm based on time-frequency combination feature and adaptive fuzzy neural network]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi 2023; 40:1152-1159. [PMID: 38151938 PMCID: PMC10753303 DOI: 10.7507/1001-5515.202301015]
Abstract
Feature extraction and classifier selection are two critical steps in heart sound classification. To capture the pathological features of heart sound signals, this paper introduces a feature extraction method that combines mel-frequency cepstral coefficients (MFCC) and power spectral density (PSD). Unlike conventional classifiers, the adaptive neuro-fuzzy inference system (ANFIS) was chosen as the classifier for this study. In terms of experimental design, we compared PSDs computed over various time intervals and frequency ranges, selecting the characteristics with the most effective classification outcomes, and compared four statistical summaries: mean PSD, standard deviation of PSD, variance of PSD, and median PSD. Through experimental comparison, we found that combining MFCC with the median PSD of the heart sound systolic period over the 100-300 Hz range yielded the best results. The accuracy, precision, sensitivity, specificity, and F1 score were 96.50%, 99.27%, 93.35%, 99.60%, and 96.35%, respectively. These results demonstrate the algorithm's significant potential for aiding the diagnosis of congenital heart disease.
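A brief sketch of computing the winning feature combination, the median PSD of the systole in the 100-300 Hz band plus MFCC means (the 2 kHz sampling rate and Welch settings are assumptions; the ANFIS classifier itself is not sketched):

```python
import numpy as np
from scipy.signal import welch
import librosa

def heart_sound_features(systole: np.ndarray, fs: int = 2000) -> np.ndarray:
    """MFCC means plus the median PSD of the systolic segment in 100-300 Hz."""
    freqs, psd = welch(systole, fs=fs, nperseg=256)
    band = (freqs >= 100) & (freqs <= 300)
    median_psd = np.median(psd[band])
    mfcc = librosa.feature.mfcc(y=systole.astype(np.float32), sr=fs,
                                n_mfcc=13, n_fft=256)
    return np.append(mfcc.mean(axis=1), median_psd)
```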
Affiliation(s)
- Qin Wang
- School of Information Science and Engineering, Yunnan University, Kunming 650504, P. R. China
- Hongbo Yang
- School of Information Science and Engineering, Yunnan University, Kunming 650504, P. R. China
- Kunming Medical University, Kunming 650500, P. R. China
- Jiahua Pan
- School of Information Science and Engineering, Yunnan University, Kunming 650504, P. R. China
- Yingjie Tian
- School of Information Science and Engineering, Yunnan University, Kunming 650504, P. R. China
- Tao Guo
- School of Information Science and Engineering, Yunnan University, Kunming 650504, P. R. China
- Kunming Medical University, Kunming 650500, P. R. China
- Weilian Wang
- School of Information Science and Engineering, Yunnan University, Kunming 650504, P. R. China
6
Singh RB, Zhuang H. Measurements, Analysis, Classification, and Detection of Gunshot and Gunshot-like Sounds. Sensors (Basel) 2022; 22:9170. [PMID: 36501869 PMCID: PMC9737970 DOI: 10.3390/s22239170]
Abstract
Gun violence has been on the rise in recent years. To help curb this negative influence on communities, machine learning strategies for gunshot detection can be developed and deployed. After outlining the procedure by which a typical class of gunshot-like sounds was measured, this paper focuses on the analysis of feature importance for gunshot and gunshot-like sounds. The random forest mean decrease in impurity and the SHapley Additive exPlanations (SHAP) feature importance analyses were employed for this task, and feature reduction was then carried out based on them. Features extracted from 1-s audio clips via the Mel-frequency cepstral coefficients process were reduced to a more manageable quantity using the above-mentioned feature reduction processes and sent to a random forest classifier. The SHAP feature importance output was compared with that of the mean decrease in impurity. The results show which Mel-frequency cepstral coefficient features are important in discriminating gunshot sounds from various gunshot-like sounds. Together with the feature importance/reduction processes, the recent uniform manifold approximation and projection (UMAP) method was used to compare the closeness of various gunshot-like sounds to gunshot sounds in the feature space. Finally, the approach presented in this paper provides a viable means of making gunshot sounds more discernible from other sounds.
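A compact sklearn/shap sketch of the two importance analyses on MFCC features, using synthetic placeholder data (the feature count and top-k cut are assumptions):

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: rows are 1-s clips, columns are MFCC-derived features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = rng.integers(0, 2, size=200)        # 1 = gunshot, 0 = gunshot-like

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

mdi = rf.feature_importances_           # mean decrease in impurity
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X)  # per-sample, per-feature attributions
# (Array layout differs across shap versions; average |values| over samples
# and classes to obtain one global importance score per feature.)

# Feature reduction: keep the top-k features under the MDI ranking.
k = 10
keep = np.argsort(mdi)[::-1][:k]
X_reduced = X[:, keep]
```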
7
Wu YC, Han CC, Chang CS, Chang FL, Chen SF, Shieh TY, Chen HM, Lin JY. Development of an Electronic Stethoscope and a Classification Algorithm for Cardiopulmonary Sounds. Sensors (Basel) 2022; 22:4263. [PMID: 35684884 PMCID: PMC9185316 DOI: 10.3390/s22114263]
Abstract
With conventional stethoscopes, auscultation results may vary from one doctor to another, owing to age-related decline in hearing or differences in professional training, and problematic cardiopulmonary sounds cannot be recorded for later analysis. To resolve these issues, an electronic stethoscope was developed, consisting of a traditional stethoscope with a condenser microphone embedded in the head to collect cardiopulmonary sounds, and an AI-based classifier for cardiopulmonary sounds was proposed. Different deployments of the microphone in the stethoscope head, with amplification and filter circuits, were explored and analyzed using the fast Fourier transform (FFT) to evaluate noise reduction; after testing, the microphone placed in the stethoscope head surrounded by cork was found to have better noise reduction. For classifying normal (healthy) and abnormal (pathological) cardiopulmonary sounds, each sample is first segmented into several small frames, and a principal component analysis (PCA) is performed on each frame. A difference signal is obtained by subtracting the PCA reconstruction from the original signal. MFCC (Mel-frequency cepstral coefficients) and statistical measures are used for feature extraction from the difference signal, and ensemble learning is used as the classifier. The final result is determined by voting over the classification results of the individual frames. After testing, two distinct classifiers are proposed, one for heart sounds and one for lung sounds. The best voting threshold for heart sounds falls at 5-45% and that for lung sounds at 5-65%. The best accuracy of 86.9%, sensitivity of 81.9%, specificity of 91.8%, and F1 score of 86.1% are obtained for heart sounds using 2 s frame segmentation with a 20% overlap, whereas the best accuracy of 73.3%, sensitivity of 66.7%, specificity of 80%, and F1 score of 71.5% are yielded for lung sounds using 5 s frame segmentation with a 50% overlap.
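A small sketch of the difference-signal and voting steps as described (interpreting "subtracting PCA" as subtracting the PCA reconstruction; the component count and threshold are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_difference_frames(frames: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Subtract each frame's PCA reconstruction, keeping the residual.

    frames: (n_frames, frame_len) array of equally sized signal frames.
    """
    pca = PCA(n_components=n_components)
    reconstruction = pca.inverse_transform(pca.fit_transform(frames))
    return frames - reconstruction

def vote(frame_labels: np.ndarray, threshold: float = 0.45) -> int:
    """Declare a recording abnormal when the abnormal-frame fraction
    reaches the threshold (e.g. within the 5-45% band for heart sounds)."""
    return int(frame_labels.mean() >= threshold)
```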
Affiliation(s)
- Yu-Chi Wu
- Department of Electrical Engineering, National United University, Miaoli City 36003, Taiwan
- Chin-Chuan Han
- Department of Computer Science and Information Engineering, National United University, Miaoli City 36003, Taiwan
- Chao-Shu Chang
- Department of Information Management, National United University, Miaoli City 36003, Taiwan
- Fu-Lin Chang
- Department of Electrical Engineering, National United University, Miaoli City 36003, Taiwan
- Shi-Feng Chen
- Department of Electrical Engineering, National United University, Miaoli City 36003, Taiwan
- Tsu-Yi Shieh
- Section of Clinical Training, Department of Medical Education, Taichung Veterans General Hospital, Taichung City 40705, Taiwan
- Division of Allergy, Immunology and Rheumatology, Taichung Veterans General Hospital, Taichung City 40705, Taiwan
- Hsian-Min Chen
- Center for Quantitative Imaging in Medicine (CQUIM), Department of Medical Research, Taichung Veterans General Hospital, Taichung City 40705, Taiwan
- Jin-Yuan Lin
- Department of Electrical Engineering, National United University, Miaoli City 36003, Taiwan
8
Baliram Singh R, Zhuang H, Pawani JK. Data Collection, Modeling, and Classification for Gunshot and Gunshot-like Audio Events: A Case Study. Sensors (Basel) 2021; 21:7320. [PMID: 34770635 PMCID: PMC8587567 DOI: 10.3390/s21217320]
Abstract
Distinguishing between a dangerous audio event like a gun firing and other non-life-threatening events, such as a plastic bag bursting, can mean the difference between life and death and, therefore, between the necessary and unnecessary deployment of public safety personnel. Sounds generated by plastic bag explosions are often confused with real gunshot sounds, by either humans or computer algorithms. As a case study, the research reported in this paper offers insight into the sounds of plastic bag explosions and gunshots. An experimental study in this research reveals that a deep learning-based classification model trained with a popular urban sound dataset containing gunshot sounds cannot distinguish plastic bag pop sounds from gunshot sounds. This study further shows that the same deep learning model, if trained with a dataset containing plastic bag pop sounds, can effectively detect the non-life-threatening sounds. For this purpose, a collection of plastic bag-popping sounds was first recorded in different environments with varying parameters, such as plastic bag size and distance from the recording microphones; the audio clips’ durations ranged from 400 ms to 600 ms. This collection was then used, together with a gunshot sound dataset, to train a classification model based on a convolutional neural network (CNN) to differentiate life-threatening gunshot events from non-life-threatening plastic bag explosion events. A comparison between two feature extraction methods, Mel-frequency cepstral coefficients (MFCC) and Mel-spectrograms, was also conducted. The experimental studies show that once the plastic bag pop sounds are injected into model training, the CNN classification model performs well in distinguishing actual gunshot sounds from plastic bag sounds.
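A short librosa sketch of the two compared front ends, MFCC and Mel-spectrogram, applied to the 400-600 ms clips (the sampling rate, padding length, and bin counts are assumptions):

```python
import librosa
import numpy as np

def clip_features(path: str, sr: int = 16000, kind: str = "mel") -> np.ndarray:
    """Load one short clip and return a 2-D feature map for a CNN."""
    y, _ = librosa.load(path, sr=sr)
    y = librosa.util.fix_length(y, size=int(0.6 * sr))  # pad 400-600 ms clips
    if kind == "mel":
        S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
        return librosa.power_to_db(S)       # log-Mel spectrogram
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
```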
Affiliation(s)
- Rajesh Baliram Singh
- EECS Department, Florida Atlantic University, Boca Raton, FL 33431, USA
- Hanqi Zhuang
- EECS Department, Florida Atlantic University, Boca Raton, FL 33431, USA
9
Al-Dhlan KA. An adaptive speech signal processing for COVID-19 detection using deep learning approach. Int J Speech Technol 2021; 25:641-649. [PMID: 34456611 PMCID: PMC8380014 DOI: 10.1007/s10772-021-09878-0]
Abstract
Researchers and scientists have been conducting plenty of research on COVID-19 since its outbreak. Healthcare professionals, laboratory technicians, and front-line workers such as sanitary workers and data collectors are putting tremendous effort into limiting the prevalence of the COVID-19 pandemic. Currently, the reverse transcription polymerase chain reaction (RT-PCR) testing strategy is used to detect the COVID-19 virus, but RT-PCR processing is expensive, time-consuming, and involves violations of social distancing rules. Therefore, this research work introduces generative adversarial network (GAN) deep learning to quickly detect COVID-19 from speech signals. The proposed system consists of two stages, pre-processing and classification. This work uses the least mean square (LMS) filter algorithm to remove noise and artifacts from the input speech signals. After noise removal, the proposed generative adversarial network classification method analyses mel-frequency cepstral coefficient (MFCC) features and classifies COVID-19 signals and non-COVID-19 signals. The results show a prominent correlation of MFCCs with various COVID-19 cough and breathing sounds, and robust discrimination between COVID-19 and non-COVID-19 models. Compared with existing Artificial Neural Network, Convolutional Neural Network, and Recurrent Neural Network approaches, the proposed GAN method obtains the best results: its precision, recall, accuracy, and F-measure are 96.54%, 96.15%, 98.56%, and 0.96, respectively.
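The LMS pre-processing stage is a standard algorithm; a minimal NumPy sketch, assuming a separate noise-reference input and illustrative filter order and step size (the paper does not state its settings):

```python
import numpy as np

def lms_denoise(d: np.ndarray, x: np.ndarray,
                n_taps: int = 32, mu: float = 0.01) -> np.ndarray:
    """Least-mean-squares adaptive filter.

    d: noisy speech (desired signal); x: noise reference input.
    Returns the error signal e, which approximates the cleaned speech.
    """
    w = np.zeros(n_taps)
    e = np.zeros(len(d))
    for n in range(n_taps, len(d)):
        x_n = x[n - n_taps:n][::-1]      # most recent samples first
        y_n = w @ x_n                    # filter output (noise estimate)
        e[n] = d[n] - y_n                # error = desired - estimate
        w += 2 * mu * e[n] * x_n         # LMS weight update
    return e
```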
Affiliation(s)
- Kawther A. Al-Dhlan
- Information and Computer Science Department, University of Ha’il, Hail, Kingdom of Saudi Arabia
10
Abstract
Imprecise articulation is the major issue reported in various types of dysarthria, and detection of articulation errors can help in diagnosis. Cues derived from both the burst and the formant transitions contribute to discrimination of the place of articulation of stops. It is believed that acoustic deviations in stops due to articulation errors can be analyzed by deriving features around the burst and voicing onsets, and the derived features can be used to discriminate normal from dysarthric speech. In this work, a method is proposed to differentiate the voiceless stops produced by normal speakers from those produced by dysarthric speakers, by deriving spectral moments, the two-dimensional discrete cosine transform of the linear prediction spectrum, and Mel-frequency cepstral coefficient features. These features and a cosine-distance-based classifier are used for the classification of normal and dysarthric speech.
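A small sketch of the feature and decision stages of this entry's method (standard spectral-moment formulas; the reference templates and variable names are hypothetical, with the cosine-distance rule following the abstract):

```python
import numpy as np
from scipy.spatial.distance import cosine

def spectral_moments(mag: np.ndarray, freqs: np.ndarray) -> np.ndarray:
    """First four spectral moments of a magnitude spectrum around the burst."""
    p = mag / mag.sum()                      # treat spectrum as a distribution
    mean = np.sum(freqs * p)
    var = np.sum((freqs - mean) ** 2 * p)
    skew = np.sum((freqs - mean) ** 3 * p) / var ** 1.5
    kurt = np.sum((freqs - mean) ** 4 * p) / var ** 2
    return np.array([mean, var, skew, kurt])

def classify(feat: np.ndarray, normal_ref: np.ndarray, dys_ref: np.ndarray) -> str:
    """Nearest-reference decision using cosine distance."""
    return "normal" if cosine(feat, normal_ref) < cosine(feat, dys_ref) else "dysarthric"
```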
Affiliation(s)
- Upashana Goswami
- Department of ECE, Gauhati University, Guwahati, Assam, 781014, India
- S R Nirmala
- Department of ECE, Gauhati University, Guwahati, Assam, 781014, India.
- School of ECE, KLE Technological University, Hubballi, Karnataka, 580031, India.
- C M Vikram
- Department of EEE, Indian Institute of Technology Guwahati, Guwahati, Assam, 781039, India
- Sishir Kalita
- Department of EEE, Indian Institute of Technology Guwahati, Guwahati, Assam, 781039, India
- S R M Prasanna
- Department of EEE, Indian Institute of Technology Guwahati, Guwahati, Assam, 781039, India
11
Ali H, Ahmad N, Zhou X, Iqbal K, Ali SM. DWT features performance analysis for automatic speech recognition of Urdu. Springerplus 2014; 3:204. [PMID: 25674450 PMCID: PMC4320178 DOI: 10.1186/2193-1801-3-204]
Abstract
This paper presents work on automatic speech recognition of the Urdu language, using a comparative analysis of Discrete Wavelet Transform (DWT) based features and Mel-Frequency Cepstral Coefficients (MFCC). These features were extracted for one hundred isolated Urdu words, each uttered by ten different speakers. The words were selected from the most frequently used words of Urdu, and a variety of ages and dialects was covered by using a balanced-corpus approach. After feature extraction, classification was performed using Linear Discriminant Analysis. The confusion matrix obtained for the DWT features was then compared with the one obtained for MFCC-based speech recognition. The framework was trained and tested on speech data recorded under controlled environments. The experimental results are useful for determining the optimum features for the speech recognition task.
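A brief sketch of DWT feature extraction with PyWavelets followed by LDA classification (sub-band log-energy is an assumed feature summary; the paper does not specify its exact DWT-derived features):

```python
import numpy as np
import pywt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def dwt_features(y: np.ndarray, wavelet: str = "db4", level: int = 5) -> np.ndarray:
    """Log-energy of each DWT sub-band as a fixed-length feature vector."""
    coeffs = pywt.wavedec(y, wavelet, level=level)
    return np.array([np.log(np.sum(c ** 2) + 1e-12) for c in coeffs])

# X = np.stack([dwt_features(w) for w in waveforms]); y = word labels
# lda = LinearDiscriminantAnalysis().fit(X, y)
# Confusion matrix: sklearn.metrics.confusion_matrix(y_test, lda.predict(X_test))
```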
Affiliation(s)
- Hazrat Ali
- Machine Learning Group, Department of Computing, City University London, Northampton Square, EC1V 0HB London, UK
- School of Computer and Communication Engineering, University of Science and Technology Beijing, 100083 Beijing, China
- Nasir Ahmad
- Department of Computer Systems Engineering, University of Engineering and Technology Peshawar, 25120 Peshawar, Pakistan
- Xianwei Zhou
- School of Computer and Communication Engineering, University of Science and Technology Beijing, 100083 Beijing, China
- Khalid Iqbal
- School of Computer and Communication Engineering, University of Science and Technology Beijing, 100083 Beijing, China
- Sahibzada Muhammad Ali
- Department of Electrical and Computer Engineering, North Dakota State University, Fargo, ND 58108-6050 USA